Comprehensive Python Programming for Data Science

Abstract:

With tools like ChatGPT, it has never been easier to look up Python for data science, spend less time on programming, and spend more time deriving meaningful insights from data. However, we still need to ask the right questions so that ChatGPT looks in the right place and gives a professional response. It is therefore important to know what to ask and in what order. That is the focus of this page. Skim down the page to familiarize yourself with, or refresh your memory of, the core of the Python programming language and the libraries that data scientists use in their daily work. Note: since language models may return wrong answers, we also provide the right answers to these questions below. After reviewing this page, you should have enough background to ask ChatGPT the right questions and fill in whatever is not covered here.

Table of Contents:


  • Core Data Types and Structures
    • Numbers
      • Integers, floats, and complex numbers
      • Operations and conversions
    • Strings
      • String methods and slicing
      • Formatting strings
    • Booleans
      • Logical operators and truthiness
    • Built-in collections
      • Lists
        • Methods and comprehensions
      • Tuples
        • Immutability and use cases
      • Sets
        • Set operations (union, intersection, difference)
      • Dictionaries
        • Key-value pairs, methods, and dictionary comprehensions
  • Control Flow
    • Conditional statements (if, elif, else)
    • Loops
      • for loops with range()
      • while loops
      • Loop control (break, continue, else)
    • Comprehensions
      • List, set, and dictionary comprehensions
  • Functions
    • Defining and calling functions
    • Arguments and parameters
      • Positional, keyword, and default arguments
    • *args and **kwargs
    • Anonymous functions with lambda
    • Scope and the global keyword
    • Higher-order functions
      • map, filter, and reduce
  • Modules and Packages
    • Importing modules
      • import, from ... import ..., as keyword
    • Standard library overview
      • math, random, os, sys, time, datetime
    • Writing and using custom modules
    • Exploring the dir() and help() functions
  • File Handling
    • Opening and closing files
    • Reading and writing files
    • File modes (r, w, a, rb, etc.)
    • Using with statement for file handling
    • Working with os and shutil modules
  • Error Handling
    • try, except, else, and finally blocks
    • Raising exceptions
    • Built-in exceptions
    • Custom exception classes
  • Iterators and Generators
    • Creating and using iterators
    • Generator functions and expressions
    • yield keyword
  • Object-Oriented Programming (OOP)
    • Classes and objects
    • Methods and attributes
    • __init__ method
    • Inheritance and polymorphism
    • Special methods (__str__, __repr__, __len__, etc.)
    • Encapsulation and property decorators
  • Built-in Functions and Utilities
    • Mathematical functions
      • abs(), pow(), round(), etc.
    • Iterable functions
      • len(), max(), min(), sorted(), enumerate()
    • Type-related functions
      • type(), isinstance(), id()
    • I/O functions
      • print(), input()
    • Conversion functions
      • int(), float(), str(), list(), etc.
  • Decorators and Context Managers
    • Creating and applying decorators
    • Built-in decorators (@staticmethod, @classmethod, @property)
    • Creating custom context managers with __enter__ and __exit__
  • Concurrency and Parallelism
    • Introduction to threading and multiprocessing
    • Async programming with asyncio
  • Exploring Advanced Features
    • Working with collections module
      • Counter, defaultdict, OrderedDict, deque
    • itertools for functional programming
    • functools utilities
      • lru_cache, partial, reduce
    • Handling JSON and other formats
      • json module
      • pickle for serialization
  • Getting Started with IPython
    • Accessing documentation (?, ??)
    • Exploring modules with Tab completion
    • Keyboard shortcuts in IPython Shell
      • Navigation shortcuts
      • Text entry shortcuts
      • Command history shortcuts
    • IPython magic commands
      • %paste, %cpaste, %run
      • %time, %timeit
      • Accessing magic commands (%magic, %lsmagic)
  • Working with the Shell
    • Shell commands in IPython
    • Passing values to and from the shell
    • Shell-related magic commands
  • Debugging and Profiling
    • Controlling exceptions (%xmode)
    • Debugging tracebacks
    • Timing code execution (%timeit, %time)
    • Profiling scripts (%prun, %lprun, %memit, %mprun)
  • Introduction to NumPy
    • Understanding data types in Python
    • Creating and manipulating arrays
      • Arrays from Python lists
      • Arrays from scratch
      • NumPy standard data types
    • Basics of NumPy arrays
      • Attributes, indexing, and slicing
      • Reshaping arrays
      • Array concatenation and splitting
    • Computation on arrays
      • Universal functions (UFuncs)
      • Broadcasting
    • Aggregations
      • Summing, min, max, and averages
      • Boolean masks and comparisons
    • Advanced indexing
      • Fancy indexing
      • Sorting and binning data
    • Structured arrays
  • Data Manipulation with Pandas
    • Introduction to Pandas objects
      • Series, DataFrames, and Index objects
    • Data selection and indexing
    • Handling missing data
    • Combining datasets
    • Aggregation and grouping
    • Working with time series
  • Data Visualization
    • Visualization with Matplotlib
      • Basic plots: Line, scatter, and bar plots
      • Plot customization: Colors, styles, labels, legends
      • Subplots and grids
      • Histograms and density plots
      • Three-dimensional plotting
      • Text, annotations, and tick customization
    • Visualization with Seaborn
      • Statistical plots
      • Pair plots, distribution plots, and categorical plots

Core Data Types and Structures

Numbers

Python supports different types of numbers including integers, floats, and complex numbers. These types support various operations and can be converted from one type to another.

# Integer
int_num = 10

# Float
float_num = 10.5

# Complex
complex_num = 3 + 4j

# Operations
sum_result = int_num + float_num  # Addition
product_result = int_num * 2  # Multiplication
division_result = int_num / 3  # True division (always returns a float)

# Type conversions
float_to_int = int(float_num)  # Convert float to int (truncates toward zero, giving 10)
int_to_float = float(int_num)  # Convert int to float
  • Integers are whole numbers, e.g., 10, -5, 0.
  • Floats are numbers with decimal points, e.g., 10.5, -3.14.
  • Complex numbers have a real and imaginary part, e.g., 3 + 4j.
  • Conversions between types can be done using int() and float().
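
A few division and conversion details are easy to trip over. The short sketch below is illustrative only (the variable names are made up for this example) and shows how true division, floor division, and int() truncation behave:

```python
# True division always returns a float; floor division rounds down.
true_div = 7 / 2         # 3.5
floor_div = 7 // 2       # 3
remainder = 7 % 2        # 1

# int() truncates toward zero; it does not round.
truncated_pos = int(10.9)    # 10
truncated_neg = int(-10.9)   # -10

# Mixing an int with a float promotes the result to float.
mixed = 10 + 10.5            # 20.5

# abs() of a complex number returns its magnitude.
magnitude = abs(3 + 4j)      # 5.0
```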

Strings

Strings are sequences of characters. Python provides methods for manipulating strings, slicing, and formatting them.

# String declaration
greeting = "Hello, World!"

# Slicing
substring = greeting[0:5]  # 'Hello'

# String methods
uppercase = greeting.upper()  # 'HELLO, WORLD!'
lowercase = greeting.lower()  # 'hello, world!'
replaced = greeting.replace("World", "Python")  # 'Hello, Python!'

# String formatting
name = "John"
age = 25
formatted_string = f"My name is {name} and I am {age} years old."
  • Strings are immutable, meaning their content cannot be changed in place.
  • Use square brackets to slice strings, e.g., string[start:end].
  • Common string methods include .upper(), .lower(), and .replace().
  • String interpolation can be done using f-strings or the str.format() method.
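
The points above about slicing, immutability, and formatting can be sketched as follows. This is a minimal illustration; the variable names are chosen for this example only.

```python
greeting = "Hello, World!"

# Negative indices count from the end; a third slice value is the step.
last_char = greeting[-1]          # '!'
every_other = greeting[::2]       # 'Hlo ol!'
reversed_text = greeting[::-1]    # '!dlroW ,olleH'

# Strings are immutable: item assignment raises TypeError.
try:
    greeting[0] = "J"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new string instead of modifying the old one.
renamed = "J" + greeting[1:]      # 'Jello, World!'

# str.format() and f-string format specifiers.
formatted = "{} is {} years old".format("John", 25)
price = f"{3.14159:.2f}"          # '3.14'
```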

Booleans

Booleans represent one of two possible values: True or False. Logical operators are used to perform logical operations.

# Boolean values
is_python_fun = True
is_raining = False

# Logical operators
and_result = is_python_fun and is_raining  # False
or_result = is_python_fun or is_raining  # True
not_result = not is_python_fun  # False

# Truthiness
truthy_check = bool(1)  # True, since 1 is considered True
falsy_check = bool(0)  # False, since 0 is considered False
  • Logical operators include and, or, and not.
  • Python considers non-zero numbers, non-empty collections, and True as truthy.
  • Falsy values include False, 0, None, empty strings, and empty collections.
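
Truthiness also affects what and and or return, since both operators short-circuit and hand back one of their operands. A small sketch, with illustrative variable names:

```python
# Empty containers and empty strings are falsy; non-empty ones are truthy,
# even when their contents look "false".
checks = [bool([]), bool(""), bool({}), bool([0]), bool("False")]
# [False, False, False, True, True]

# `or` returns the first truthy operand, which gives a simple fallback idiom.
display_name = "" or "Anonymous"    # 'Anonymous'

# `and` stops at the first falsy operand, so the division is never evaluated.
denominator = 0
safe_ratio = denominator and 10 / denominator    # 0
```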

Built-in Collections

Lists

Lists are mutable collections that store ordered elements. They allow duplication and can be modified in place.

# List declaration
fruits = ["apple", "banana", "cherry"]

# List methods
fruits.append("orange")  # Add an item
fruits.remove("banana")  # Remove an item
fruits.sort()  # Sort the list alphabetically

# List comprehension
squares = [x**2 for x in range(5)]  # [0, 1, 4, 9, 16]
  • Lists are mutable, meaning they can be changed in place.
  • Common methods include .append(), .remove(), and .sort().
  • List comprehensions provide a concise way to create new lists.

Tuples

Tuples are immutable, ordered collections often used for fixed sets of items.

# Tuple declaration
coordinates = (10, 20)

# Accessing tuple elements
x = coordinates[0]  # 10
y = coordinates[1]  # 20

# Tuples are immutable, so this would raise an error:
# coordinates[0] = 15
  • Tuples are immutable, meaning their content cannot be modified after creation.
  • Useful for representing fixed collections of related data (e.g., coordinates).
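
As a brief illustration of immutability and typical tuple use cases, the sketch below shows unpacking, the error raised on assignment, and a tuple used as a dictionary key. The dictionary and its values are made up for this example.

```python
coordinates = (10, 20)

# Tuple unpacking assigns each element to its own name.
x, y = coordinates    # x = 10, y = 20

# Item assignment is not allowed on a tuple.
try:
    coordinates[0] = 15
    modified = True
except TypeError:
    modified = False

# Because tuples of immutable values are hashable, they can be dictionary keys.
city_by_location = {(40.7, -74.0): "New York"}
city = city_by_location[(40.7, -74.0)]    # 'New York'
```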

Sets

Sets are unordered collections of unique elements.

# Set declaration
fruits = {"apple", "banana", "cherry"}

# Set operations
union_result = fruits.union({"orange", "grape"})  # Combine sets
intersection_result = fruits.intersection({"banana", "kiwi"})  # Items present in both sets
difference_result = fruits.difference({"banana", "cherry"})  # Items in the first set but not the second
  • Sets only store unique values, and duplicates are ignored.
  • Supports operations like union(), intersection(), and difference().
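
The same set operations are also available as operators. A minimal sketch, reusing the fruits set from above with illustrative variable names:

```python
fruits = {"apple", "banana", "cherry"}

# Duplicates are dropped when a set is built.
unique_letters = set(["a", "b", "a"])              # {'a', 'b'}

# Operator forms of union, intersection, and difference.
combined = fruits | {"orange", "grape"}
common = fruits & {"banana", "kiwi"}               # {'banana'}
only_in_fruits = fruits - {"banana", "cherry"}     # {'apple'}

# Membership tests use the `in` operator.
has_apple = "apple" in fruits                      # True
```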

Dictionaries

Dictionaries store key-value pairs. Keys must be unique, and values can be of any type.

# Dictionary declaration
person = {
    "name": "John",
    "age": 30,
    "city": "New York"
}

# Accessing values
name = person["name"]  # 'John'

# Adding a new key-value pair
person["job"] = "Engineer"

# Updating values
person["age"] = 31

# Dictionary methods
keys = person.keys()  # Get all keys
values = person.values()  # Get all values
items = person.items()  # Get all key-value pairs

# Dictionary comprehension
squares = {x: x**2 for x in range(5)}  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
  • Dictionaries use key-value pairs to store data.
  • Access values using dict[key] or with the get() method.
  • Common methods include .keys(), .values(), and .items().
  • Dictionary comprehensions provide a concise way to create dictionaries.
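
The difference between dict[key] and get() matters when a key may be missing. The sketch below illustrates it; the "country" key and the default value are made up for this example.

```python
person = {"name": "John", "age": 30}

# get() returns None, or a supplied default, when the key is missing.
country = person.get("country")                       # None
country_or_default = person.get("country", "Unknown") # 'Unknown'

# Square-bracket access raises KeyError for a missing key.
try:
    person["country"]
    key_found = True
except KeyError:
    key_found = False

# items() yields (key, value) pairs, which unpack neatly in a loop.
summary = [f"{key}={value}" for key, value in person.items()]
# ['name=John', 'age=30']
```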

Control Flow

Conditional Statements

Conditional statements allow you to control the flow of your program based on conditions using if, elif, and else statements.

# Example of conditional statements
x = 10

if x > 0:
    result = "Positive"
elif x == 0:
    result = "Zero"
else:
    result = "Negative"
  • Use if to check a condition.
  • Use elif (else if) to check multiple conditions.
  • Use else to specify a block of code to run if all conditions are false.

Loops

For Loops with range()

A for loop iterates over a sequence (like a list, string, or range). The range() function generates a sequence of numbers.

# For loop using range
for i in range(5):  # Loops from 0 to 4
    print(f"Iteration {i}")
  • range(n) generates numbers from 0 up to (but not including) n.
  • You can also specify start, stop, and step: range(start, stop, step).
  • Useful for iterating a fixed number of times.
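
To make the start, stop, and step arguments concrete, here is a small sketch that converts a few ranges to lists so their contents are visible:

```python
# Start at 0, stop before 10, step by 2.
evens = list(range(0, 10, 2))        # [0, 2, 4, 6, 8]

# A negative step counts down; the stop value is still excluded.
countdown = list(range(5, 0, -1))    # [5, 4, 3, 2, 1]

# With the default step of 1, a start above the stop gives an empty range.
empty = list(range(5, 0))            # []
```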

While Loops

A while loop repeats as long as a given condition is True.

# While loop example
count = 0

while count < 5:
    print(f"Count is {count}")
    count += 1  # Increment count
  • The loop continues as long as the condition count < 5 is True.
  • Be cautious of infinite loops, where the condition is never False.
  • Use break to exit a loop early.

Loop Control (break, continue, else)

Loop control statements modify the behavior of loops. They allow you to exit or skip iterations.

# Using break, continue, and else
for i in range(5):
    if i == 3:
        break  # Exit the loop when i is 3
    if i % 2 == 0:
        continue  # Skip even numbers
    print(f"Odd number: {i}")
else:
    print("Loop completed without break")  # Not printed here, because the loop exits via break
  • break stops the loop immediately.
  • continue skips the current iteration and moves to the next one.
  • The else block runs if the loop is not terminated by a break.
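
A common use of a loop's else block is a search that needs a fallback when nothing is found. The function below is a hypothetical helper written for this illustration:

```python
def find_first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for number in numbers:
        if number < 0:
            found = number
            break          # Skips the else block
    else:
        found = None       # Runs only when the loop finished without break
    return found

with_negative = find_first_negative([3, -1, 4, -5])    # -1
without_negative = find_first_negative([3, 1, 4])      # None
```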

Comprehensions

List Comprehensions

List comprehensions provide a concise way to create lists from iterables.

# List comprehension
squares = [x**2 for x in range(5)]  # [0, 1, 4, 9, 16]
  • Syntax: [expression for item in iterable if condition].
  • List comprehensions are often more concise, and for simple transformations more readable, than the equivalent for loop.
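
The optional if condition in the syntax above filters items. The sketch below shows a filtered comprehension next to the equivalent loop; the variable names are illustrative.

```python
# Comprehension with a filter: squares of the even numbers below 10.
even_squares = [x**2 for x in range(10) if x % 2 == 0]
# [0, 4, 16, 36, 64]

# The equivalent traditional loop.
even_squares_loop = []
for x in range(10):
    if x % 2 == 0:
        even_squares_loop.append(x**2)
```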

Set Comprehensions

Set comprehensions create sets, ensuring all items are unique.

# Set comprehension
unique_squares = {x**2 for x in range(-3, 4)}  # {0, 1, 4, 9}
  • Similar to list comprehensions, but sets do not allow duplicate values.
  • Syntax: {expression for item in iterable if condition}.

Dictionary Comprehensions

Dictionary comprehensions create dictionaries from iterables.

# Dictionary comprehension
squares_dict = {x: x**2 for x in range(5)}  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
  • Syntax: {key_expression: value_expression for item in iterable}.
  • Useful for building dictionaries dynamically from other iterables.

Functions

Defining and Calling Functions

Functions are reusable blocks of code that perform specific tasks. They are defined using the def keyword.

# Function definition
def greet(name):
    return f"Hello, {name}!"

# Function call
message = greet("John")
  • Use def function_name(parameters): to define a function.
  • Call the function using function_name(arguments).
  • The return statement sends a result back to the caller.

Arguments and Parameters

Functions can have positional, keyword, and default arguments, which define how arguments are passed.

# Function with positional, keyword, and default arguments
def introduce(name, age=30, city="New York"):
    return f"My name is {name}, I am {age} years old, and I live in {city}."

# Positional arguments
intro1 = introduce("Alice", 25, "Boston")

# Keyword arguments
intro2 = introduce(name="Bob", age=40, city="Chicago")

# Using default arguments
intro3 = introduce("Charlie")
  • Positional arguments must be passed in the correct order.
  • Keyword arguments explicitly specify parameter names when calling a function.
  • Default arguments are used when no argument is provided for a parameter.

*args and **kwargs

*args and **kwargs allow functions to accept a variable number of arguments.

# Using *args for variable-length arguments
def sum_numbers(*args):
    return sum(args)

total = sum_numbers(1, 2, 3, 4)  # 10

# Using **kwargs for variable-length keyword arguments
def print_info(**kwargs):
    for key, value in kwargs.items():
        print(f"{key}: {value}")

print_info(name="Alice", age=25, city="Boston")
  • *args allows a function to accept any number of positional arguments as a tuple.
  • **kwargs allows a function to accept any number of keyword arguments as a dictionary.
  • Both *args and **kwargs can be used in the same function.
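
As a sketch of using both in one function, the hypothetical describe() function below returns what it received, so the packing is visible. The same * and ** syntax also unpacks a list or dictionary at the call site.

```python
def describe(title, *args, **kwargs):
    # args is a tuple of extra positional arguments;
    # kwargs is a dict of extra keyword arguments.
    return title, args, kwargs

packed = describe("report", 1, 2, sep=",")
# ('report', (1, 2), {'sep': ','})

# Unpacking an existing list and dict into a call.
values = [1, 2, 3]
options = {"sep": "-"}
unpacked = describe("report", *values, **options)
# ('report', (1, 2, 3), {'sep': '-'})
```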

Anonymous Functions with lambda

Lambda functions are anonymous (unnamed) functions that can have multiple arguments but only one expression.

# Lambda function to square a number
square = lambda x: x ** 2

# Call the lambda function
result = square(4)  # 16

# Lambda function with multiple arguments
add = lambda x, y: x + y
result2 = add(3, 5)  # 8
  • Lambda functions are defined as lambda arguments: expression.
  • They are best suited to short, temporary, one-time-use functions.
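
A typical one-time use is passing a lambda as the key argument of sorted() or max(). The data below is made up for this illustration:

```python
people = [("Alice", 30), ("Bob", 25), ("Charlie", 35)]

# Sort by the second element of each tuple (the age).
by_age = sorted(people, key=lambda person: person[1])
# [('Bob', 25), ('Alice', 30), ('Charlie', 35)]

# Pick the entry with the longest name.
longest_name = max(people, key=lambda person: len(person[0]))
# ('Charlie', 35)
```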

Scope and the global Keyword

Scope determines where variables are accessible. The global keyword lets a function assign to (rebind) a global variable; simply reading a global variable inside a function does not require it.

# Global variable
counter = 0

def increment():
    global counter  # Required to reassign the global variable inside the function
    counter += 1

increment()
increment()
  • Variables defined inside a function have local scope and are only accessible within that function.
  • Global variables are defined outside functions. They can be read inside a function directly, but assigning to them requires the global keyword.
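
The sketch below illustrates the difference between reading and assigning a variable from an outer scope. The function names are hypothetical. Without a global declaration, an assignment inside a function makes the name local, so the augmented assignment fails.

```python
counter = 0

def read_counter():
    # Reading a variable from the enclosing scope needs no declaration.
    return counter

def broken_increment():
    # Assignment makes `counter` local to this function, so reading it
    # before it has a local value raises UnboundLocalError.
    counter += 1

value_read = read_counter()    # 0

try:
    broken_increment()
    failed = False
except UnboundLocalError:
    failed = True
```

Declaring the name with global at the top of the function, as in the increment() example above, removes the error.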

Higher-Order Functions

Higher-order functions are functions that accept other functions as arguments or return functions as results.

map

The map() function applies a given function to each item in an iterable (like a list) and returns a map object.

# Using map to square a list of numbers
numbers = [1, 2, 3, 4, 5]
squared_numbers = list(map(lambda x: x ** 2, numbers))  # [1, 4, 9, 16, 25]
  • map(function, iterable) applies a function to every item in the iterable.
  • Returns a map object, which can be converted to a list or other collections.

filter

The filter() function filters elements from an iterable based on a condition.

# Using filter to get even numbers from a list
numbers = [1, 2, 3, 4, 5, 6]
even_numbers = list(filter(lambda x: x % 2 == 0, numbers))  # [2, 4, 6]
  • filter(function, iterable) returns elements where the function returns True.
  • Returns a filter object, which can be converted to a list or other collections.

reduce

The reduce() function from the functools module reduces a sequence to a single value using a function.

# Using reduce to compute the product of a list of numbers
from functools import reduce

numbers = [1, 2, 3, 4]
product = reduce(lambda x, y: x * y, numbers)  # 24
  • reduce(function, iterable) applies a rolling computation to elements in an iterable.
  • Requires importing reduce from the functools module.
  • Useful for cumulative operations like summing, multiplying, or concatenating elements.
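
reduce() also accepts an optional third argument, an initial value, which is returned for an empty iterable. A minimal sketch using operator.mul from the standard library in place of a lambda:

```python
from functools import reduce
import operator

numbers = [1, 2, 3, 4]

# The initial value 1 starts the rolling computation.
product = reduce(operator.mul, numbers, 1)    # 24

# With an initial value, an empty iterable is handled gracefully.
empty_product = reduce(operator.mul, [], 1)   # 1

# Without one, reducing an empty iterable raises TypeError.
try:
    reduce(operator.mul, [])
    raised = False
except TypeError:
    raised = True
```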

Modules and Packages

Importing Modules

Modules allow you to organize Python code into reusable files. You can import entire modules or specific parts of them.

# Import the entire module
import math
result1 = math.sqrt(16)  # 4.0

# Import specific functions from a module
from math import sqrt, pi
result2 = sqrt(25)  # 5.0
circle_area = pi * (5 ** 2)  # Area of a circle with radius 5

# Import a module with an alias
import random as rnd
random_number = rnd.randint(1, 10)  # Random integer between 1 and 10 (inclusive)
  • import module_name imports the entire module.
  • from module_name import specific_function imports specific parts of a module.
  • import module_name as alias gives the module a shorter name (alias) for convenience.

Standard Library Overview

Python has a rich standard library that provides modules for many common tasks. Here are some key modules:

math

import math
result = math.factorial(5)  # 120
  • Provides mathematical functions like sqrt(), factorial(), and constants like pi.

random

import random
random_choice = random.choice(['apple', 'banana', 'cherry'])
  • Generates random numbers, selects random elements, and shuffles sequences.

os

import os
current_directory = os.getcwd()  # Get the current working directory
  • Interacts with the operating system, allowing access to files, directories, and environment variables.

sys

import sys
print(sys.version)  # Print Python version
  • Provides system-specific parameters and functions, such as command-line arguments and Python version info.

time

import time
current_time = time.time()  # Get the current time in seconds since epoch
  • Provides functions for time-related tasks like sleeping, measuring execution time, and working with timestamps.

datetime

import datetime
current_date = datetime.datetime.now()  # Get the current date and time
  • Used for date and time manipulation, such as getting the current date or formatting dates.
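
As an illustration of formatting and manipulating dates, the sketch below uses a fixed, made-up date so the results are reproducible:

```python
from datetime import datetime, timedelta

release = datetime(2024, 1, 15, 9, 30)

# strftime() formats a datetime as a string.
formatted = release.strftime("%Y-%m-%d %H:%M")    # '2024-01-15 09:30'

# timedelta represents a duration that can be added or subtracted.
one_week_later = release + timedelta(days=7)      # 2024-01-22 09:30

# strptime() parses a string into a datetime.
parsed = datetime.strptime("2024-03-01", "%Y-%m-%d")
days_between = (parsed.date() - release.date()).days    # 46
```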

Writing and Using Custom Modules

You can create your own Python module by saving functions and classes in a .py file and importing it.

# File: my_module.py
def greet(name):
    return f"Hello, {name}!"

def add(a, b):
    return a + b

# Main script to import and use the custom module
import my_module

message = my_module.greet("Alice")
sum_result = my_module.add(3, 4)
  • Create a file named my_module.py containing functions and classes.
  • Import it using import my_module or from my_module import function_name.
  • Custom modules help organize and reuse code in multiple files.

Exploring the dir() and help() Functions

The dir() and help() functions are useful for exploring Python modules, objects, and functions.

# Using dir() to list attributes and methods of a module
import math
print(dir(math))  # Lists all attributes, methods, and constants in the math module

# Using help() to display the documentation for a function or module
help(math.sqrt)  # Shows documentation for the sqrt() function
  • dir(object) returns a list of all attributes, methods, and constants of the object.
  • help(object) displays detailed documentation and usage information for an object.
  • Use dir() to explore what is available and help() to learn how to use it.

File Handling

Opening and Closing Files

File handling allows you to read from and write to files on your system. Files must be opened before they can be read or written, and closed afterward to free system resources.

# Opening and closing a file
file = open("example.txt", "w")  # Open the file in write mode
file.write("Hello, World!")
file.close()  # Close the file
  • Use open(filename, mode) to open a file.
  • Use file.close() to close the file after use.
  • It's a good practice to close files to avoid resource leaks.

Reading and Writing Files

You can read from and write to files using the read(), readline(), readlines(), and write() methods.

# Writing to a file
with open("example.txt", "w") as file:
    file.write("Hello, World!\n")
    file.write("Welcome to Python file handling.\n")

# Reading from a file
with open("example.txt", "r") as file:
    content = file.read()  # Read the entire file
    print(content)
  • file.write(data) writes a string to the file.
  • file.read() reads the entire file as a string.
  • file.readline() reads one line at a time.
  • file.readlines() reads all lines and returns a list of strings.
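
The differences between readline(), readlines(), and iterating over the file can be sketched as follows. The example writes to a temporary directory (an assumption made here so it does not touch your working directory).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")

    with open(path, "w") as file:
        file.write("first\nsecond\nthird\n")

    with open(path) as file:
        first_line = file.readline()     # 'first\n'
        remaining = file.readlines()     # ['second\n', 'third\n']

    # Iterating over the file object reads one line at a time.
    with open(path) as file:
        stripped = [line.rstrip("\n") for line in file]
        # ['first', 'second', 'third']
```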

File Modes

File modes determine how files are opened. Here are the most common file modes:

# File modes
with open("example.txt", "r") as file:  # Read mode
    content = file.read()

with open("example.txt", "w") as file:  # Write mode (overwrites the file)
    file.write("This will overwrite the file.")

with open("example.txt", "a") as file:  # Append mode (adds to the end of the file)
    file.write("This will be appended to the file.")

with open("example.txt", "rb") as file:  # Read binary mode
    binary_content = file.read()
  • r - Read mode (default) - Opens the file for reading.
  • w - Write mode - Overwrites the file if it exists, or creates a new one.
  • a - Append mode - Adds new content to the end of the file.
  • rb - Read binary - Reads binary files like images or executables.
  • wb - Write binary - Writes binary data to the file.

Using the with Statement

The with statement is used to manage file resources. It automatically closes the file once the block is finished, even if an error occurs.

# Using 'with' to automatically close the file
with open("example.txt", "r") as file:
    content = file.read()
    print(content)  # The file is automatically closed after this block
  • The with statement ensures the file is properly closed after use.
  • It eliminates the need to explicitly call file.close().
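
To check that the file really is closed even when an error occurs, the sketch below raises a simulated error inside the with block and then inspects file.closed. The temporary directory and the error are assumptions made for this illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")
    try:
        with open(path, "w") as file:
            file.write("partial data")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass

    # The with statement closed the file before the exception propagated.
    closed_after_error = file.closed    # True
```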

Working with os and shutil Modules

The os and shutil modules provide functions for file and directory manipulation.

os Module

import os

# Get the current working directory
current_directory = os.getcwd()

# List files in the current directory
files = os.listdir()

# Create a new directory
os.mkdir("new_folder")

# Remove a directory
os.rmdir("new_folder")

# Check if a file exists
file_exists = os.path.exists("example.txt")
  • os.getcwd() returns the current working directory.
  • os.listdir() lists files and directories in the current directory.
  • os.mkdir() creates a new directory.
  • os.rmdir() removes an empty directory.
  • os.path.exists() checks if a file or directory exists.

shutil Module

import os
import shutil

# Copy a file
shutil.copy("example.txt", "copy_example.txt")

# Move a file (the destination directory must already exist)
os.makedirs("new_folder", exist_ok=True)
shutil.move("copy_example.txt", "new_folder/copy_example.txt")

# Remove a file
os.remove("example.txt")
  • shutil.copy(src, dst) copies a file from src to dst.
  • shutil.move(src, dst) moves a file from src to dst.
  • os.remove() deletes a file from the filesystem.

Error Handling

try, except, else, and finally Blocks

Error handling allows you to gracefully handle exceptions that might occur in your program. Python provides try, except, else, and finally blocks to catch and handle exceptions.

# Example of try, except, else, and finally
try:
    num = int(input("Enter a number: "))
    result = 10 / num  # May raise ZeroDivisionError
except ZeroDivisionError:
    print("Error: Cannot divide by zero.")
except ValueError:
    print("Error: Invalid input. Please enter a number.")
else:
    print(f"Division successful, result is {result}")
finally:
    print("This block runs no matter what.")
  • try: Defines a block of code to test for exceptions.
  • except: Handles specific exceptions that occur in the try block.
  • else: Runs if no exceptions occur in the try block.
  • finally: Always runs, regardless of what happens in try or except.
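
The order in which the blocks run can be traced with a small helper. safe_divide() is a hypothetical function written for this illustration; it records each block it passes through.

```python
def safe_divide(a, b):
    steps = []
    try:
        steps.append("try")
        result = a / b
    except ZeroDivisionError:
        steps.append("except")
        result = None
    else:
        steps.append("else")       # Only when no exception occurred
    finally:
        steps.append("finally")    # Always runs
    return result, steps

ok = safe_divide(10, 2)      # (5.0, ['try', 'else', 'finally'])
failed = safe_divide(10, 0)  # (None, ['try', 'except', 'finally'])
```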

Raising Exceptions

Python allows you to raise exceptions explicitly using the raise statement.

# Example of raising exceptions
def check_age(age):
    if age < 0:
        raise ValueError("Age cannot be negative.")
    print(f"Age is {age}")

try:
    check_age(-5)
except ValueError as e:
    print(f"Exception occurred: {e}")
  • Use raise ExceptionType("message") to raise an exception.
  • Raising exceptions allows you to enforce rules and validate input.

Built-in Exceptions

Python has several built-in exceptions that are raised when errors occur. Here are some commonly used exceptions:

  • ValueError: Raised when a function receives an argument of the right type but with an inappropriate value.
  • TypeError: Raised when an operation or function is applied to an object of an inappropriate type.
  • IndexError: Raised when an index is out of range for lists, tuples, etc.
  • KeyError: Raised when a dictionary key is not found.
  • ZeroDivisionError: Raised when division or modulo by zero occurs.
  • FileNotFoundError: Raised when trying to open a file that does not exist.

# Examples of built-in exceptions
try:
    my_list = [1, 2, 3]
    print(my_list[5])  # Raises IndexError
except IndexError as e:
    print(f"Exception occurred: {e}")

try:
    result = 10 / 0  # Raises ZeroDivisionError
except ZeroDivisionError as e:
    print(f"Exception occurred: {e}")
  • Common exceptions include TypeError, ValueError, IndexError, and KeyError.
  • Use try and except to catch and handle these exceptions.

Custom Exception Classes

You can create custom exceptions by subclassing Python's Exception class. This allows you to define your own error types and raise them when needed.

# Custom exception class
class CustomError(Exception):
    def __init__(self, message):
        super().__init__(message)

# Raising and handling a custom exception
try:
    raise CustomError("This is a custom error message.")
except CustomError as e:
    print(f"Custom exception occurred: {e}")
  • Create a custom exception by subclassing the Exception class.
  • Define an __init__() method to customize the error message.
  • Use raise CustomError("message") to raise your custom exception.
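
A custom exception can also carry extra data for the handler to use. The class and function below are hypothetical names for this sketch:

```python
class InsufficientFundsError(Exception):
    def __init__(self, balance, amount):
        super().__init__(f"Cannot withdraw {amount}; balance is only {balance}.")
        # Store the details so handlers can inspect them.
        self.balance = balance
        self.amount = amount

def withdraw(balance, amount):
    if amount > balance:
        raise InsufficientFundsError(balance, amount)
    return balance - amount

try:
    withdraw(100, 150)
except InsufficientFundsError as error:
    shortfall = error.amount - error.balance    # 50
    message = str(error)
```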

Iterators and Generators

Creating and Using Iterators

An iterator is an object that produces the elements of a sequence one at a time. Iterators implement the __iter__() and __next__() methods.

# Creating an iterator from a list
my_list = [1, 2, 3, 4]
iterator = iter(my_list)  # Get an iterator from the list

# Access elements using next()
first_element = next(iterator)  # 1
second_element = next(iterator)  # 2

# Custom iterator class
class MyIterator:
    def __init__(self, max_value):
        self.max_value = max_value
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current < self.max_value:
            result = self.current
            self.current += 1
            return result
        else:
            raise StopIteration

# Using the custom iterator
for num in MyIterator(5):
    print(num)  # Outputs: 0, 1, 2, 3, 4
  • An iterator is an object with __iter__() (returns the iterator) and __next__() (returns the next item) methods.
  • Use iter() to get an iterator from an iterable like a list, tuple, or string.
  • Use next() to get the next item from an iterator, and StopIteration is raised when no more items are available.
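
The sketch below shows what happens when an iterator runs out of items, and that an iterator can be consumed only once. Variable names are illustrative.

```python
numbers = iter([1, 2])
first = next(numbers)     # 1
second = next(numbers)    # 2

# A third call raises StopIteration.
try:
    next(numbers)
    exhausted = False
except StopIteration:
    exhausted = True

# next() accepts a default that is returned instead of raising.
fallback = next(numbers, "no more items")

# An iterator is single-pass: a second pass yields nothing.
letters = iter("ab")
first_pass = list(letters)     # ['a', 'b']
second_pass = list(letters)    # []
```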

Generator Functions and Expressions

Generators are functions that yield items one at a time instead of returning them all at once. They are memory-efficient since they generate values on demand.

# Generator function using yield
def my_generator():
    yield 1
    yield 2
    yield 3

# Using the generator
for value in my_generator():
    print(value)  # Outputs: 1, 2, 3

# Infinite generator
def infinite_counter():
    num = 0
    while True:
        yield num
        num += 1

# Access the first 5 values from the infinite generator
counter = infinite_counter()
for _ in range(5):
    print(next(counter))  # Outputs: 0, 1, 2, 3, 4
  • Generator functions use the yield keyword instead of return to produce values one at a time.
  • Generators maintain their state between yields, allowing the computation to be resumed.
  • Generators are memory-efficient as they only produce one value at a time, unlike lists that store all values in memory.
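
Generator expressions offer the same on-demand behavior with comprehension-like syntax, using parentheses instead of square brackets. A minimal sketch, with sys.getsizeof() used only as a rough indication of the memory difference:

```python
import sys

squares_list = [x**2 for x in range(1000)]    # All values stored in memory
squares_gen = (x**2 for x in range(1000))     # Values produced on demand

# The generator object itself is much smaller than the list.
gen_is_smaller = sys.getsizeof(squares_gen) < sys.getsizeof(squares_list)

# Consuming the generator, for example with sum().
total = sum(squares_gen)          # 332833500

# A generator is exhausted after one pass.
total_again = sum(squares_gen)    # 0
```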

yield Keyword

The yield keyword is used in generator functions to yield a value and pause execution until the next call to next().

# Generator using yield
def countdown(n):
    while n > 0:
        yield n
        n -= 1

# Using the generator
for value in countdown(5):
    print(value)  # Outputs: 5, 4, 3, 2, 1

# Generator that generates Fibonacci numbers
def fibonacci(limit):
    a, b = 0, 1
    while a < limit:
        yield a
        a, b = b, a + b

# Print Fibonacci numbers less than 20
for num in fibonacci(20):
    print(num)  # Outputs: 0, 1, 1, 2, 3, 5, 8, 13
  • yield pauses the generator function and returns a value to the caller.
  • When next() is called, execution resumes from the last yield point.
  • Generators maintain the state of local variables between yield calls, unlike functions that reset their state on each call.

Object-Oriented Programming (OOP)

Classes and Objects

Classes define blueprints for creating objects. An object is an instance of a class with specific properties and behavior.

# Define a class
class Dog:
    def __init__(self, name, breed):
        self.name = name
        self.breed = breed

# Create an object (instance) of the class
dog1 = Dog("Buddy", "Golden Retriever")
dog2 = Dog("Max", "German Shepherd")

# Access object attributes
print(dog1.name)  # Buddy
print(dog2.breed)  # German Shepherd
  • Classes are defined using the class keyword.
  • Objects are instances of a class, and they have attributes and methods.
  • Use object.attribute to access an attribute of an object.

Methods and Attributes

Attributes store object data, while methods define actions that objects can perform.

# Class with attributes and methods
class Dog:
    def __init__(self, name, breed):
        self.name = name
        self.breed = breed
    
    def bark(self):
        return f"{self.name} says Woof!"

dog = Dog("Buddy", "Golden Retriever")
print(dog.bark())  # Buddy says Woof!
  • Attributes store data related to an object (e.g., name and breed).
  • Methods are functions defined in a class that operate on the object's data.

__init__ Method

The __init__ method is the initializer (often loosely called the constructor): Python calls it on a newly created object to set up its attributes.

# __init__ method initializes attributes
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

person = Person("Alice", 30)
print(person.name)  # Alice
print(person.age)  # 30
  • __init__ is called automatically when an object is created.
  • It initializes the attributes of the object.
  • Parameters passed during object creation are assigned to the object's attributes.

Inheritance and Polymorphism

Inheritance allows a class to inherit attributes and methods from another class. Polymorphism allows objects of different classes to be treated as objects of a common superclass.

# Parent (Base) class
class Animal:
    def __init__(self, name):
        self.name = name
    
    def speak(self):
        return "I am an animal"

# Child (Derived) class
class Dog(Animal):
    def speak(self):
        return f"{self.name} says Woof!"

# Child (Derived) class
class Cat(Animal):
    def speak(self):
        return f"{self.name} says Meow!"

dog = Dog("Buddy")
cat = Cat("Whiskers")

print(dog.speak())  # Buddy says Woof!
print(cat.speak())  # Whiskers says Meow!
  • Inheritance allows one class to inherit the properties and methods of another class.
  • Polymorphism allows objects of different classes to be used interchangeably.
  • Child classes can override parent methods (method overriding).
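
A child class that needs extra attributes usually extends the parent's __init__ through super() rather than replacing it. The sketch below assumes two hypothetical subclasses, Puppy and Fish, and shows polymorphism by calling speak() on a mixed list.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        return "I am an animal"

class Puppy(Animal):
    def __init__(self, name, age_months):
        super().__init__(name)          # reuse the parent's initializer
        self.age_months = age_months

    def speak(self):                    # overrides Animal.speak
        return f"{self.name} says Yip!"

class Fish(Animal):
    pass                                # inherits speak() unchanged

animals = [Puppy("Rex", 4), Fish("Nemo")]
sounds = [animal.speak() for animal in animals]
print(sounds)  # ['Rex says Yip!', 'I am an animal']
```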

Special Methods

Special methods (also called "magic" or "dunder" methods) allow classes to define behavior for built-in Python operations.

# Special methods for string representation
class Book:
    def __init__(self, title, author, pages):
        self.title = title
        self.author = author
        self.pages = pages

    def __str__(self):
        return f"'{self.title}' by {self.author}"

    def __len__(self):
        return self.pages

book = Book("1984", "George Orwell", 328)
print(str(book))  # '1984' by George Orwell
print(len(book))  # 328
  • __str__() returns a string representation of the object (used by print()).
  • __repr__() returns an unambiguous representation of the object (used in debugging).
  • __len__() allows the object to be used with len() to get its "length".
  • Other special methods include __add__(), __eq__(), and __getitem__().
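
The other special methods listed above can be sketched on a small, hypothetical Vector2D class. Returning NotImplemented for unsupported operand types lets Python fall back to its default handling.

```python
class Vector2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Vector2D({self.x}, {self.y})"

    def __eq__(self, other):
        if not isinstance(other, Vector2D):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __add__(self, other):
        if not isinstance(other, Vector2D):
            return NotImplemented
        return Vector2D(self.x + other.x, self.y + other.y)

    def __getitem__(self, index):
        return (self.x, self.y)[index]

total = Vector2D(1, 2) + Vector2D(3, 4)
print(repr(total))               # Vector2D(4, 6)
print(total == Vector2D(4, 6))   # True
print(total[0])                  # 4
```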

Encapsulation and Property Decorators

Encapsulation hides internal details of an object from the outside. Property decorators (@property) provide controlled access to attributes.

# Encapsulation using private attributes and property decorators
class Account:
    def __init__(self, balance):
        self.__balance = balance  # Name-mangled ("private") attribute

    @property
    def balance(self):
        return self.__balance

    @balance.setter
    def balance(self, amount):
        if amount >= 0:
            self.__balance = amount
        else:
            raise ValueError("Balance cannot be negative")

account = Account(1000)
print(account.balance)  # 1000

account.balance = 1200  # Update balance
print(account.balance)  # 1200

# The following line raises an exception
# account.balance = -500  # ValueError: Balance cannot be negative
  • Attributes prefixed with a double underscore (e.g., __balance) are name-mangled to _Account__balance, which discourages, but does not prevent, access from outside the class.
  • The @property decorator defines a getter for an attribute.
  • The @property_name.setter decorator defines a setter to control how attributes are updated.

Built-in Functions and Utilities

Mathematical Functions

Python provides several built-in functions for mathematical operations.

# Mathematical functions
absolute_value = abs(-10)  # 10
power_value = pow(2, 3)  # 2**3 = 8 (note: ^ is bitwise XOR in Python, not exponentiation)
rounded_value = round(3.14159, 2)  # 3.14 (rounds to 2 decimal places)
  • abs(x): Returns the absolute value of x.
  • pow(x, y): Returns x raised to the power y (same as x**y).
  • round(x, n): Rounds x to n decimal places.
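
A few edge cases of these functions are easy to get wrong, so the short sketch below records them. The variable names are illustrative only.

```python
half_down = round(2.5)         # ties go to the nearest even integer
half_up = round(3.5)
binary_float = round(2.675, 2) # 2.675 is stored as slightly less than 2.675
modular = pow(2, 10, 1000)     # three-argument form: (2**10) % 1000
magnitude = abs(3 + 4j)        # abs() of a complex number is its magnitude

print(half_down, half_up)      # 2 4
print(binary_float)            # 2.67
print(modular)                 # 24
print(magnitude)               # 5.0
```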

Iterable Functions

These functions operate on iterables such as lists, tuples, and sets.

# Iterable functions
my_list = [10, 20, 30, 40]

length = len(my_list)  # 4
maximum = max(my_list)  # 40
minimum = min(my_list)  # 10
sorted_list = sorted(my_list, reverse=True)  # [40, 30, 20, 10]

# Using enumerate to get index and value
for index, value in enumerate(my_list):
    print(f"Index: {index}, Value: {value}")
  • len(x): Returns the number of elements in the iterable x.
  • max(x): Returns the largest item in the iterable x.
  • min(x): Returns the smallest item in the iterable x.
  • sorted(x, reverse=True): Returns a new list with the items of x sorted in descending order; x itself is not modified.
  • enumerate(x): Returns both index and value while iterating over x.

Type Functions

These functions let you inspect and check the type of objects.

# Type-related functions
variable = 42

variable_type = type(variable)  # <class 'int'>
is_instance = isinstance(variable, int)  # True
variable_id = id(variable)  # Unique identifier for the object in memory
  • type(x): Returns the type of the object x.
  • isinstance(x, type): Checks if x is an instance of the specified type.
  • id(x): Returns an integer that is unique to the object for its lifetime (in CPython, its memory address).

I/O Functions

Input/Output functions are used to take input from the user and display output on the screen.

# I/O functions
# Taking input from the user
name = input("What is your name? ")

# Printing a message
print(f"Hello, {name}!")
  • input(prompt): Takes input from the user as a string and returns it.
  • print(x): Prints the string representation of x to the console.

Conversion Functions

These functions convert values from one type to another.

# Conversion functions
num_str = "123"
int_value = int(num_str)  # 123 (string to integer)
float_value = float(num_str)  # 123.0 (string to float)
str_value = str(456)  # '456' (integer to string)
list_value = list("hello")  # ['h', 'e', 'l', 'l', 'o'] (string to list)
tuple_value = tuple([1, 2, 3])  # (1, 2, 3) (list to tuple)
set_value = set([1, 2, 2, 3])  # {1, 2, 3} (list to set, removes duplicates)
  • int(x): Converts x to an integer.
  • float(x): Converts x to a float.
  • str(x): Converts x to a string.
  • list(x): Converts x to a list.
  • tuple(x): Converts x to a tuple.
  • set(x): Converts x to a set (removes duplicates).
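
Conversions raise an exception when the input is not valid for the target type, which matters when cleaning real data. The helper to_int below is a hypothetical example, not a built-in.

```python
def to_int(text, default=None):
    """Return int(text), or `default` if text is not a valid integer literal."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return default

print(to_int("123"))        # 123
print(to_int("abc"))        # None
print(to_int("3.7"))        # None: int() does not parse decimal strings
print(int(float("3.7")))    # 3: convert via float, which truncates toward zero
print(int("ff", 16))        # 255: optional base argument
```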

Decorators and Context Managers

Creating and Applying Decorators

Decorators are functions that modify the behavior of other functions or methods. They are applied using the @decorator_name syntax.

# Simple decorator to log function calls
def logger(func):
    def wrapper(*args, **kwargs):
        print(f"Calling function: {func.__name__}")
        result = func(*args, **kwargs)
        print(f"Function {func.__name__} finished execution")
        return result
    return wrapper

# Applying the decorator using @
@logger
def say_hello(name):
    print(f"Hello, {name}!")

say_hello("Alice")
  • Decorators modify the behavior of functions or methods.
  • They are defined as functions that return a "wrapper" function.
  • Apply a decorator using @decorator_name before the function definition.
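
One practical detail: a plain wrapper hides the decorated function's name and docstring. The sketch below is a variant of the logger decorator that uses functools.wraps from the standard library to preserve them; the add function is illustrative.

```python
import functools

def logger(func):
    @functools.wraps(func)   # copy __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        print(f"Calling function: {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@logger
def add(a, b):
    """Return the sum of a and b."""
    return a + b

print(add(2, 3))      # prints "Calling function: add", then 5
print(add.__name__)   # add (without wraps this would be "wrapper")
```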

Built-in Decorators

Python provides several built-in decorators for commonly used patterns, such as @staticmethod, @classmethod, and @property.

@staticmethod

Defines a method that does not require access to the instance or class.

# Static method example
class Math:
    @staticmethod
    def add(a, b):
        return a + b

result = Math.add(5, 3)  # 8
  • @staticmethod defines a method that receives neither self nor cls, so it can be called on the class without creating an instance (calling it on an instance also works).

@classmethod

Defines a method that takes the class itself (cls) as the first argument instead of the instance.

# Class method example
class Person:
    population = 0

    def __init__(self, name):
        self.name = name
        Person.population += 1

    @classmethod
    def get_population(cls):
        return cls.population

person1 = Person("Alice")
person2 = Person("Bob")

total_population = Person.get_population()  # 2
  • @classmethod allows access to class variables and methods from within the method.

@property

Converts a method into a read-only attribute using the @property decorator.

# Property decorator example
class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def area(self):
        return 3.14159 * self._radius ** 2

circle = Circle(5)
circle_area = circle.area  # Access as an attribute, not a method
  • @property allows you to access methods like attributes without calling them as functions.
  • It is used to create read-only attributes.

Creating Custom Context Managers

Context managers are used to manage resources like file handling. They ensure that setup and cleanup actions are always performed.

Using __enter__ and __exit__ Methods

# Custom context manager using __enter__ and __exit__
class MyContextManager:
    def __enter__(self):
        print("Entering context")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        print("Exiting context")

# Using the custom context manager
with MyContextManager() as context:
    print("Inside the context")
  • __enter__ runs when entering the with block; its return value (here, the context manager itself) is bound to the name after as.
  • __exit__ runs when exiting the with block, even if an exception occurs.
  • __exit__ receives exc_type, exc_value, and traceback, which describe the exception (all None if the block finished normally); returning True suppresses the exception.
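
As a slightly more practical sketch, the hypothetical Timer class below uses __enter__ and __exit__ to measure how long a block takes and to record whether it raised.

```python
import time

class Timer:
    """Context manager that records how long the with block took."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self                      # bound to the name after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        self.failed = exc_type is not None
        return False                     # False: do not suppress exceptions

with Timer() as timer:
    total = sum(range(100_000))

print(f"elapsed: {timer.elapsed:.6f}s, failed: {timer.failed}")

# __exit__ still runs when the block raises, and the error propagates.
try:
    with Timer() as failing_timer:
        raise ValueError("boom")
except ValueError:
    print("failed:", failing_timer.failed)  # failed: True
```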

Creating Context Managers Using contextlib

Another way to create context managers is by using the @contextmanager decorator from the contextlib module.

from contextlib import contextmanager

@contextmanager
def my_context():
    print("Entering context")
    try:
        yield
    finally:
        # Runs even if the with block raises an exception
        print("Exiting context")

# Using the context manager
with my_context():
    print("Inside the context")
  • The @contextmanager decorator makes it easier to create context managers using yield.
  • Code before yield runs when entering the context, and code after yield runs when exiting; wrap the yield in try/finally so the cleanup also runs if the block raises an exception.

Concurrency and Parallelism

Introduction to threading and multiprocessing

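Concurrency lets a program make progress on several tasks at once by interleaving them, while parallelism runs tasks simultaneously on multiple CPU cores. Python supports concurrency with threading and parallelism with multiprocessing. In standard CPython builds, the global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time, so threads mainly help I/O-bound work and processes suit CPU-bound work.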

Threading

The threading module allows you to run multiple threads concurrently. Threads run in the same memory space and share data, making them lightweight but requiring synchronization.

import threading
import time

def worker():
    print("Starting thread")
    time.sleep(2)
    print("Finished thread")

# Create and start two threads
thread1 = threading.Thread(target=worker)
thread2 = threading.Thread(target=worker)

thread1.start()
thread2.start()

thread1.join()  # Wait for thread1 to finish
thread2.join()  # Wait for thread2 to finish
  • Threads run concurrently, sharing the same memory space.
  • Use threading.Thread(target=function_name) to create a thread that runs function_name.
  • thread.start() begins execution of the thread, and thread.join() waits for it to complete.
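
Because threads share memory, updates to shared data should be protected. The sketch below uses threading.Lock for that; the state dictionary and increment function are illustrative names.

```python
import threading

state = {"count": 0}
state_lock = threading.Lock()

def increment(times):
    for _ in range(times):
        with state_lock:              # only one thread updates the count at a time
            state["count"] += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(state["count"])  # 40000
```

Without the lock, the read-modify-write in += can interleave between threads and lose updates.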

Multiprocessing

The multiprocessing module allows you to run multiple processes in parallel. Each process runs in its own memory space, so data is not shared directly.

import multiprocessing
import time

def worker():
    print("Starting process")
    time.sleep(2)
    print("Finished process")

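# Create and start two processes.
# The __main__ guard is needed on platforms that start child processes by
# launching a fresh interpreter (for example Windows and macOS).
if __name__ == "__main__":
    process1 = multiprocessing.Process(target=worker)
    process2 = multiprocessing.Process(target=worker)

    process1.start()
    process2.start()

    process1.join()  # Wait for process1 to finish
    process2.join()  # Wait for process2 to finish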
  • Processes run in separate memory spaces, making them safer but more resource-intensive than threads.
  • Use multiprocessing.Process(target=function_name) to create a process that runs function_name.
  • process.start() begins execution of the process, and process.join() waits for it to complete.

Async Programming with asyncio

Asynchronous programming allows for non-blocking execution. The asyncio module enables running asynchronous tasks using async and await keywords.

Asynchronous Functions

Asynchronous functions use async def to define a function that can be paused using await.

import asyncio

async def greet(name):
    print(f"Hello, {name}!")
    await asyncio.sleep(2)  # Simulate a network delay
    print(f"Goodbye, {name}!")

# Run the asynchronous function
asyncio.run(greet("Alice"))
  • Use async def to define an asynchronous function.
  • Use await to pause execution until the awaited task is complete.
  • Use asyncio.run() to run an asynchronous function from synchronous code.

Running Multiple Tasks Concurrently

To run multiple asynchronous tasks at the same time, use asyncio.gather().

import asyncio

async def task1():
    print("Task 1 starting")
    await asyncio.sleep(2)
    print("Task 1 finished")

async def task2():
    print("Task 2 starting")
    await asyncio.sleep(1)
    print("Task 2 finished")

# Run multiple tasks concurrently
async def main():
    await asyncio.gather(task1(), task2())

asyncio.run(main())
  • Use asyncio.gather() to run multiple asynchronous tasks concurrently.
  • Tasks are started together, and the program waits for all of them to finish.
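
asyncio.gather() also collects return values. In the sketch below, fetch is a hypothetical coroutine that stands in for a network call; results come back in the order the coroutines were passed, not the order they finished.

```python
import asyncio

async def fetch(label, delay):
    await asyncio.sleep(delay)   # stand-in for waiting on I/O
    return f"{label} done"

async def main():
    # The slower coroutine is listed first; results keep argument order.
    return await asyncio.gather(fetch("slow", 0.2), fetch("fast", 0.1))

results = asyncio.run(main())
print(results)  # ['slow done', 'fast done']
```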

Using Async Context Managers

Async context managers are used to manage asynchronous resources, such as network connections.

import asyncio

class AsyncContextManager:
    async def __aenter__(self):
        print("Entering async context")
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        print("Exiting async context")

# Using the async context manager
async def main():
    async with AsyncContextManager():
        print("Inside the context")

asyncio.run(main())
  • Use __aenter__() and __aexit__() to create an async context manager.
  • Async context managers are useful for managing network connections, files, and other asynchronous resources.

Exploring Advanced Features

Working with collections Module

The collections module provides specialized data structures like Counter, defaultdict, OrderedDict, and deque for enhanced data manipulation.

Counter

from collections import Counter

# Count the occurrences of each character in a string
counter = Counter("hello world")
print(counter)  # Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})
  • Counter counts the occurrences of elements in an iterable.
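
Beyond counting, Counter offers a few helpers that come up often in data work. A brief sketch with an illustrative word list:

```python
from collections import Counter

words = "the quick brown fox jumps over the lazy dog the end".split()
word_counts = Counter(words)

print(word_counts.most_common(1))   # [('the', 3)]
print(word_counts["cat"])           # 0: missing keys count as zero, no KeyError

word_counts.update(["fox", "fox"])  # add more observations
print(word_counts["fox"])           # 3
```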

defaultdict

from collections import defaultdict

# Create a defaultdict with a default type of list
dd = defaultdict(list)
dd['fruits'].append('apple')
dd['fruits'].append('banana')
print(dd)  # defaultdict(<class 'list'>, {'fruits': ['apple', 'banana']})
  • defaultdict provides default values for missing keys, avoiding KeyError.

OrderedDict

from collections import OrderedDict

# Create an ordered dictionary
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
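print(od)  # OrderedDict([('a', 1), ('b', 2), ('c', 3)]); Python 3.12+ prints OrderedDict({'a': 1, 'b': 2, 'c': 3})
  • OrderedDict maintains the order in which items are added. Regular dicts also preserve insertion order since Python 3.7; OrderedDict additionally offers move_to_end() and order-sensitive equality.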

deque

from collections import deque

# Create a deque and append elements
dq = deque([1, 2, 3])
dq.appendleft(0)
dq.append(4)
print(dq)  # deque([0, 1, 2, 3, 4])

# Pop elements from both ends
dq.pop()  # 4
dq.popleft()  # 0
  • deque is a double-ended queue that allows fast appends and pops from both ends.
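
A deque created with maxlen discards the oldest item automatically, which makes a simple sliding window. The sketch below computes a moving average over illustrative readings.

```python
from collections import deque

window = deque(maxlen=3)
averages = []
for reading in [10, 20, 30, 40, 50]:
    window.append(reading)            # the oldest item is dropped when full
    if len(window) == window.maxlen:
        averages.append(sum(window) / len(window))

print(list(window))  # [30, 40, 50]
print(averages)      # [20.0, 30.0, 40.0]
```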

itertools for Functional Programming

The itertools module provides iterator-building functions for efficient looping.

import itertools

# Infinite counter
counter = itertools.count(start=1, step=2)
print(next(counter))  # 1
print(next(counter))  # 3

# Cartesian product
product = list(itertools.product([1, 2], ['A', 'B']))
print(product)  # [(1, 'A'), (1, 'B'), (2, 'A'), (2, 'B')]

# Group elements by condition
grouped = itertools.groupby('AAAABBBCCDAA')
for key, group in grouped:
    print(key, list(group))  # Groups consecutive identical elements
  • count(start, step): Infinite counter starting from start.
  • product(): Cartesian product of input iterables.
  • groupby(iterable): Groups consecutive identical elements together.
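
Since groupby only merges adjacent items, grouping by a key usually means sorting by that key first. The sketch below shows that pattern, plus islice and chain, on illustrative data.

```python
import itertools

words = ["banana", "apple", "cherry", "avocado", "blueberry"]
# Sort first so that words sharing a first letter end up next to each other.
by_letter = {
    letter: list(group)
    for letter, group in itertools.groupby(sorted(words), key=lambda w: w[0])
}
print(by_letter)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

first_three_even = list(itertools.islice(itertools.count(0, 2), 3))
print(first_three_even)  # [0, 2, 4]

merged = list(itertools.chain([1, 2], [3], [4, 5]))
print(merged)  # [1, 2, 3, 4, 5]
```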

functools Utilities

The functools module provides higher-order functions that act on other functions.

lru_cache

from functools import lru_cache

@lru_cache(maxsize=100)
def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)

print(factorial(10))  # 3628800
  • @lru_cache caches function results to avoid redundant calculations.
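
The benefit of caching is clearer on a function with overlapping sub-calls. The sketch below uses a recursive Fibonacci function (an illustrative choice) and inspects the cache with cache_info().

```python
from functools import lru_cache

@lru_cache(maxsize=None)   # maxsize=None means the cache never evicts entries
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))             # 832040
info = fib.cache_info()
print(info.misses)         # 31: each n from 0 to 30 is computed only once
print(info.hits > 0)       # True: repeated sub-calls were served from the cache
```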

partial

from functools import partial

# Create a partial function
def multiply(x, y):
    return x * y

double = partial(multiply, 2)
print(double(5))  # 10
  • partial(func, *args, **kwargs) creates a new function with pre-filled arguments.

reduce

from functools import reduce

# Reduce a list to a single value
numbers = [1, 2, 3, 4]
product = reduce(lambda x, y: x * y, numbers)
print(product)  # 24
  • reduce(func, iterable) applies a function cumulatively to the items in the iterable.

Handling JSON and Other Formats

json Module

import json

# Convert Python object to JSON string
data = {'name': 'Alice', 'age': 25}
json_string = json.dumps(data)
print(json_string)  # {"name": "Alice", "age": 25}

# Convert JSON string to Python object
parsed_data = json.loads(json_string)
print(parsed_data)  # {'name': 'Alice', 'age': 25}

# Write JSON to a file
with open('data.json', 'w') as file:
    json.dump(data, file)

# Read JSON from a file
with open('data.json', 'r') as file:
    loaded_data = json.load(file)
  • json.dumps(): Convert a Python object to a JSON string.
  • json.loads(): Parse a JSON string to a Python object.
  • json.dump(): Write a Python object to a file as JSON.
  • json.load(): Load JSON data from a file into a Python object.
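
json.dumps() accepts options for readable output and for types it cannot serialize by default. In the sketch below, encode_extra is a hypothetical fallback function passed through the default parameter.

```python
import json
from datetime import date

record = {"name": "Alice", "joined": date(2024, 1, 15), "scores": [90, 85]}

def encode_extra(obj):
    """Fallback for objects json does not know how to serialize."""
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")

text = json.dumps(record, default=encode_extra, indent=2, sort_keys=True)
print(text)

restored = json.loads(text)
print(restored["joined"])  # 2024-01-15 (now a plain string, not a date)
```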

pickle for Serialization

import pickle

# Serialize an object and save to a file
data = {'name': 'Bob', 'age': 30}
with open('data.pkl', 'wb') as file:
    pickle.dump(data, file)

# Deserialize the object from the file
with open('data.pkl', 'rb') as file:
    loaded_data = pickle.load(file)

print(loaded_data)  # {'name': 'Bob', 'age': 30}
  • pickle.dump(obj, file): Serialize an object and save it to a binary file.
  • pickle.load(file): Load a pickled object from a file.
  • Pickle is used for serializing and deserializing Python objects, but it should not be used for untrusted data.

Getting Started with IPython

Accessing Documentation (? and ??)

IPython allows you to access documentation for functions, classes, and objects using ? and ??.

# Access documentation for a function
len?

# Access source code along with the documentation
len??
  • object?: Displays documentation for the object.
  • object??: Displays documentation and source code (if available) for the object.

Exploring Modules with Tab Completion

IPython supports tab completion, allowing you to explore available attributes, methods, and variables.

# Press Tab after typing the module name and a dot (.)
import math
math.  # Press Tab to see available attributes and methods
  • Type part of a module name or variable and press Tab to view suggestions.
  • Useful for discovering available functions, methods, and variables in a module.

Keyboard Shortcuts in IPython Shell

Navigation Shortcuts

  • Ctrl + A: Move cursor to the beginning of the line.
  • Ctrl + E: Move cursor to the end of the line.
  • Ctrl + L: Clear the screen.

Text Entry Shortcuts

  • Ctrl + K: Delete from cursor to the end of the line.
  • Ctrl + U: Delete from cursor to the beginning of the line.
  • Ctrl + W: Delete the word before the cursor.

Command History Shortcuts

  • Up/Down Arrow: Navigate through command history.
  • Ctrl + R: Search command history (reverse search).
  • Ctrl + P: Previous command in history.
  • Ctrl + N: Next command in history.

IPython Magic Commands

%paste and %cpaste

These commands allow you to paste and execute multiple lines of code.

# Paste code directly into the IPython shell
%paste  # Pastes the content from the system clipboard

# Paste code interactively, line by line
%cpaste
# Paste the code, then press Ctrl + D to execute
  • %paste: Pastes and executes clipboard content.
  • %cpaste: Opens an interactive prompt to paste multi-line code.

%run

The %run command allows you to run Python scripts directly from the IPython shell.

# Run a Python script
%run my_script.py
  • %run script.py: Executes a Python script as if it were run using python script.py.
  • You can pass arguments to the script: %run script.py arg1 arg2.

%time and %timeit

These commands measure the execution time of a single statement or an entire block of code.

# Measure execution time of a single statement
%time sum([i for i in range(100000)])

# Measure execution time multiple times for better accuracy
%timeit sum([i for i in range(100000)])
  • %time: Measures execution time of a single statement.
  • %timeit: Repeats the execution to provide a more accurate measure of execution time.

Accessing Magic Commands (%magic and %lsmagic)

These commands provide information about all available magic commands.

# List all magic commands
%lsmagic

# View documentation for all magic commands
%magic
  • %lsmagic: Lists all available magic commands.
  • %magic: Displays detailed documentation for all magic commands.

Working with the Shell

Shell Commands in IPython

IPython allows you to run shell commands directly from the IPython shell. You can execute any command you would run in the terminal by prefixing it with an exclamation mark (!).

# List files in the current directory (Linux/Mac)
!ls

# List files in the current directory (Windows)
!dir

# Check the current working directory
!pwd

# Create a new directory
!mkdir my_new_directory

# Remove a file
!rm myfile.txt
  • Prefix shell commands with ! to execute them in IPython.
  • Common commands include !ls (list files) and !pwd (print working directory).
  • Works similarly to how you run commands in a command-line shell (like Bash, Zsh, or Command Prompt).

Passing Values to and from the Shell

You can pass values from IPython to the shell and capture shell output into Python variables.

Passing Python Variables to the Shell

# Define a Python variable
filename = "example.txt"

# Use the variable in a shell command using {}
!echo "This is a test" > {filename}
  • Use curly braces {} to insert Python variables into shell commands.
  • For example, !echo "Hello" > {filename} creates a file with the name stored in filename.

Capturing Shell Output in Python Variables

# Capture output of shell command
files = !ls
print(files)  # List of files as Python list

# Get the current working directory
current_directory = !pwd
print(current_directory[0])  # Print the first line of the output
  • Use output = !command to capture the output of a shell command as a list of strings.
  • Each line of the shell output becomes an element in the list.

%alias

Use %alias to create custom shortcuts for shell commands.

# Create an alias
%alias ll ls -l

# Use the alias
ll
  • %alias name command defines an alias for a shell command.
  • For example, %alias ll ls -l creates a shortcut ll for ls -l.

%env

The %env command displays and modifies environment variables.

# View all environment variables
%env

# Set a new environment variable
%env MY_VAR=hello

# Access the environment variable in Python
import os
print(os.getenv('MY_VAR'))  # hello
  • %env displays all environment variables.
  • Use %env VAR_NAME=value to set an environment variable.
  • Environment variables can be accessed in Python using os.getenv().

%sc (Shell Capture)

Use %sc to run a shell command and capture its output in a Python variable. IPython documents %sc as deprecated in favor of the variable = !command syntax.

# Run a shell command and capture the output as a list of lines (-l)
%sc -l files = ls
print(files)  # e.g. ['file1.txt', 'file2.txt']
  • %sc variable = command captures the output as a single string; add -l to split it into a list of lines.
  • Prefer variable = !command in new code, since %sc is deprecated.

%cd (Change Directory)

The %cd command changes the current working directory.

# Change the current working directory
%cd /path/to/directory

# Change to the parent directory
%cd ..
  • %cd /path/to/directory changes the current directory.
  • Use %cd .. to move to the parent directory.
  • Similar to the shell cd command.

%pushd and %popd

These commands allow you to navigate directories while remembering the previous location.

# Change directory and save the previous directory
%pushd /path/to/directory

# Return to the previous directory
%popd
  • %pushd changes to a directory and stores the previous directory on a stack.
  • %popd returns to the most recently stored directory.

%pwd

Prints the current working directory.

# Print the current directory
%pwd
  • Displays the current directory path.

%who and %whos

These commands list the variables defined in the current namespace.

# List all variables
%who

# List all variables with more details
%whos
  • %who: Lists the names of all user-defined variables.
  • %whos: Displays a detailed view of each variable, including type and value.

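%reset

Clears all user-defined variables from the namespace. (%clear, by contrast, only clears the terminal screen.)

# Clear all variables (asks for confirmation; use %reset -f to skip the prompt)
%reset
  • Removes every name defined in the session, including imported modules; built-in objects are unaffected.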

Debugging and Profiling

Controlling Exceptions (%xmode)

IPython allows you to control how exceptions are displayed using the %xmode magic command. The three most common modes are Plain, Context, and Verbose; a Minimal mode is also available.

# Set the exception display mode to 'Verbose' (more detailed tracebacks)
%xmode Verbose

# Set the exception display mode to 'Context' (default view)
%xmode Context

# Set the exception display mode to 'Plain' (minimal traceback)
%xmode Plain
  • Plain: Minimal traceback information.
  • Context: Default traceback view with context around each call.
  • Verbose: Full traceback with variable values at each step.

Debugging Tracebacks

IPython provides an interactive debugger that allows you to step through the traceback of an error and inspect variables.

# Trigger an error and start the debugger
def divide(a, b):
    return a / b

divide(10, 0)  # This will raise a ZeroDivisionError

Once an error occurs, you can activate the debugger using %debug or automatically enter the debugger on exceptions with %pdb on.

# Enable automatic debugging on errors
%pdb on

# Trigger an error
divide(10, 0)  # The debugger will activate automatically
  • %debug: Starts an interactive debugging session after an error occurs.
  • %pdb on: Automatically enters the debugger when an exception occurs.
  • While in the debugger, you can use commands like n (next), c (continue), and q (quit).

Timing Code Execution (%time and %timeit)

IPython provides tools to measure the execution time of code. Use %time for a single execution and %timeit for multiple executions to get an average time.

%time

# Measure execution time of a single statement
%time sum([i for i in range(1000000)])
  • Use %time to measure the time it takes to execute a single line of code.
  • It displays wall time and CPU time.

%timeit

# Measure execution time with multiple iterations for better accuracy
%timeit sum([i for i in range(1000000)])
  • Use %timeit to measure execution time by running the statement multiple times.
  • Provides more reliable measurements, since repeating the statement reduces the effect of noise from background processes.
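
Outside IPython, the standard-library timeit module offers comparable measurements. The sketch below is a rough equivalent of %timeit, with repeat and number values chosen arbitrarily for illustration.

```python
import timeit

statement = "sum([i for i in range(1000)])"
# 3 repeats of 100 executions each; the minimum is the least noisy estimate.
timings = timeit.repeat(statement, repeat=3, number=100)
best_per_loop = min(timings) / 100
print(f"best of 3: {best_per_loop * 1e6:.1f} microseconds per loop")
```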

Profiling Scripts (%prun, %lprun, %memit, %mprun)

%prun (Profile the execution time of a program)

Use %prun to get detailed statistics about the execution time of each function call in your script.

# Profile a function
def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)

%prun factorial(10)
  • Use %prun to profile the execution of an entire function or script.
  • It shows the number of calls, total time, and time per call for each function.

%lprun (Line-by-line profiling)

Line-by-line profiling allows you to see how much time is spent on each line of a function. To use %lprun, you need to install the line_profiler package.

# Example of line-by-line profiling (requires line_profiler)
%load_ext line_profiler
from time import sleep

def example_function():
    sleep(1)
    result = [i ** 2 for i in range(10000)]
    sleep(2)
    return sum(result)

%lprun -f example_function example_function()
  • Use %lprun -f function_name to profile individual lines of a function.
  • Displays the time spent on each line of code in the function.
  • Requires the line_profiler package (install via pip install line_profiler).

%memit (Memory usage measurement)

Measure the memory usage of a function or statement using %memit. This requires the memory_profiler package.

# Example of measuring memory usage
%load_ext memory_profiler

# Measure the memory usage of this list comprehension
%memit [i ** 2 for i in range(100000)]
  • Use %memit to measure the peak memory usage of a statement.
  • Requires the memory_profiler package (install via pip install memory_profiler).

%mprun (Memory profiling line-by-line)

Profile memory usage line-by-line in a function. Similar to %lprun, but for memory instead of time.

# Example of memory profiling line-by-line
from memory_profiler import profile

@profile
def example_function():
    x = [i for i in range(100000)]
    y = [i ** 2 for i in x]
    del x
    return y

example_function()
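  • The @profile decorator shown above prints a line-by-line memory report when the file is run as a script (python script.py).
  • In IPython, after %load_ext memory_profiler, %mprun -f function_name function_name() produces the same kind of report, but the function must be imported from a file rather than defined interactively.
  • Requires the memory_profiler package (install via pip install memory_profiler).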

Introduction to NumPy

Understanding Data Types in Python

Python supports several data types, but NumPy introduces more efficient data types for numerical computation. NumPy arrays have fixed types, allowing for faster computations and less memory usage.

  • Common NumPy data types include int32, float64, and bool.
  • NumPy arrays require all elements to be of the same data type.

Creating and Manipulating Arrays

Arrays from Python Lists

import numpy as np

# Create an array from a Python list
array_from_list = np.array([1, 2, 3, 4, 5])

Arrays from Scratch

# Create arrays using NumPy functions
zeros_array = np.zeros((2, 3))  # 2x3 array of zeros
ones_array = np.ones((2, 3))  # 2x3 array of ones
random_array = np.random.rand(2, 3)  # 2x3 array of random floats drawn uniformly from [0, 1)

NumPy Standard Data Types

# Specify the data type of an array
int_array = np.array([1, 2, 3], dtype=np.int32)
float_array = np.array([1.2, 3.4, 5.6], dtype=np.float64)
  • Use np.array() to create arrays from Python lists.
  • Functions like np.zeros() and np.ones() create arrays from scratch.
  • Specify the data type using the dtype parameter.

Basics of NumPy Arrays

Attributes, Indexing, and Slicing

# Create an array
array = np.array([[1, 2, 3], [4, 5, 6]])

# Array attributes
shape = array.shape  # (2, 3)
size = array.size  # 6 (total number of elements)
dtype = array.dtype  # Data type of the elements

# Indexing and slicing
element = array[1, 2]  # 6 (row index 1, column index 2; indices start at 0)
sub_array = array[:, 1]  # [2, 5] (column index 1 from every row)

Reshaping Arrays

# Reshape an array
array = np.array([1, 2, 3, 4, 5, 6])
reshaped = array.reshape((2, 3))  # Reshape to 2x3 array

Array Concatenation and Splitting

# Concatenate arrays
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
concatenated = np.concatenate([array1, array2])  # [1, 2, 3, 4, 5, 6]

# Split an array
split_array = np.split(array1, 3)  # Split into 3 equal parts: [1], [2], [3]
  • Access array elements using array[row, col].
  • Use array.reshape() to change the shape of an array.
  • Concatenate arrays with np.concatenate() and split them with np.split().
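
A few related helpers are worth knowing alongside these. The sketch below assumes NumPy is installed and uses illustrative array names.

```python
import numpy as np

flat = np.arange(12)                 # [0, 1, ..., 11]
grid = flat.reshape(3, -1)           # -1 lets NumPy infer the 4 columns
print(grid.shape)                    # (3, 4)

top, bottom = np.array([[1, 2]]), np.array([[3, 4]])
stacked_rows = np.vstack([top, bottom])   # shape (2, 2)
stacked_cols = np.hstack([top, bottom])   # shape (1, 4)

# array_split tolerates sizes that do not divide evenly; np.split raises.
parts = np.array_split(np.arange(5), 2)
print([p.tolist() for p in parts])   # [[0, 1, 2], [3, 4]]
```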

Computation on Arrays

Universal Functions (UFuncs)

# Perform element-wise operations
array = np.array([1, 2, 3])
squared = np.square(array)  # [1, 4, 9]

Broadcasting

# Broadcast a scalar to an array
array = np.array([1, 2, 3])
result = array + 5  # [6, 7, 8]
  • UFuncs perform element-wise computations (e.g., np.square()).
  • Broadcasting allows operations on arrays of different shapes.
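
Broadcasting also applies between arrays, not only between an array and a scalar. The sketch below assumes NumPy is installed and uses small illustrative shapes to show a row and a column being stretched across a 2x3 matrix.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])            # shape (3,)
column = np.array([[100], [200]])       # shape (2, 1)

plus_row = matrix + row        # row is added to every row of matrix
plus_column = matrix + column  # column is added to every column of matrix
print(plus_row)                # [[11 22 33] [14 25 36]]
print(plus_column)             # [[101 102 103] [204 205 206]]

# Centering each column: the (3,) column means broadcast across both rows.
centered = matrix - matrix.mean(axis=0)
print(centered)                # [[-1.5 -1.5 -1.5] [ 1.5  1.5  1.5]]
```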

Aggregations

Summing, Min, Max, and Averages

# Aggregate functions
array = np.array([1, 2, 3, 4, 5])
total = np.sum(array)  # 15
minimum = np.min(array)  # 1
maximum = np.max(array)  # 5
average = np.mean(array)  # 3.0

Boolean Masks and Comparisons

# Use boolean masks
array = np.array([1, 2, 3, 4, 5])
mask = array > 3  # [False, False, False, True, True]
filtered = array[mask]  # [4, 5]
  • Aggregation functions like np.sum() and np.mean() compute summary statistics.
  • Boolean masks filter elements in an array based on a condition.
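On multi-dimensional arrays, aggregations usually take an axis argument, and masks are often built from more than one condition. A short sketch with illustrative names:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

col_sums = matrix.sum(axis=0)    # collapse rows    -> [5, 7, 9]
row_means = matrix.mean(axis=1)  # collapse columns -> [2.0, 5.0]

array = np.array([1, 2, 3, 4, 5])
# Combine conditions with & (and), | (or), ~ (not); the parentheses are required
in_range = array[(array > 1) & (array < 5)]   # [2, 3, 4]
n_large = np.count_nonzero(array > 3)         # 2
```

Python's and/or keywords do not work element-wise on arrays, which is why the bitwise operators are used here.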

Advanced Indexing

Fancy Indexing

# Fancy indexing
array = np.array([10, 20, 30, 40, 50])
selected = array[[0, 2, 4]]  # [10, 30, 50]

Sorting and Binning Data

# Sort an array
array = np.array([3, 1, 4, 1, 5, 9])
sorted_array = np.sort(array)  # [1, 1, 3, 4, 5, 9]
  • Fancy indexing allows selecting multiple elements using an array of indices.
  • Sort arrays using np.sort().
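The heading above also mentions binning, which the snippet does not show. One way to bin values, sketched here with bin edges chosen purely for illustration, is np.histogram() for counts per bin and np.digitize() for the bin number of each element. np.argsort() is included because it is the usual companion to np.sort().

```python
import numpy as np

array = np.array([3, 1, 4, 1, 5, 9])

# argsort returns the indices that would sort the array;
# kind='stable' keeps tied values in their original order
order = np.argsort(array, kind='stable')      # [1, 3, 0, 2, 4, 5]

# Bin edges chosen for illustration: [0, 3), [3, 6), [6, 10]
edges = [0, 3, 6, 10]
counts, _ = np.histogram(array, bins=edges)   # [2, 3, 1]

# digitize gives the bin number of each element (1 = first bin)
bin_ids = np.digitize(array, edges)           # [2, 1, 2, 1, 2, 3]
```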

Structured Arrays

Structured arrays allow you to define arrays with multiple fields, similar to a database table.

# Create a structured array
data = np.array([(25, 'Alice', 55.0), (30, 'Bob', 60.5)], 
                dtype=[('age', 'i4'), ('name', 'U10'), ('weight', 'f4')])

# Access data by field name
ages = data['age']  # [25, 30]
  • Structured arrays are arrays with fields that have names and data types.
  • Access fields using array['field_name'].
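Field access combines with the masking and sorting shown earlier. The sketch below reuses the same two records; the names older, by_weight_desc, and first are illustrative.

```python
import numpy as np

data = np.array([(25, 'Alice', 55.0), (30, 'Bob', 60.5)],
                dtype=[('age', 'i4'), ('name', 'U10'), ('weight', 'f4')])

# A boolean mask on one field selects whole records
older = data[data['age'] > 26]
print(older['name'])           # ['Bob']

# Sort records by a field, then reverse for descending order
by_weight_desc = np.sort(data, order='weight')[::-1]
print(by_weight_desc['name'])  # ['Bob' 'Alice']

# A single record behaves like a row; fields are read by name
first = data[0]
print(first['name'], first['age'])   # Alice 25
```

For anything beyond simple record storage, a Pandas DataFrame is usually the more convenient tool.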

Data Manipulation with Pandas

Introduction to Pandas Objects

Series

A Series is a one-dimensional array-like object with labeled indices.

import pandas as pd

# Create a Series from a list
series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(series)
  • Series are similar to NumPy arrays but with labels for each element.
  • Access elements using labels, e.g., series['a'].
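The labels are what set a Series apart from a plain array. A minimal sketch of label-based access, filtering, and label alignment follows; the second Series, other, is made up for illustration.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

by_label = series['b']          # 20
by_position = series.iloc[1]    # 20
label_slice = series['b':'c']   # label slices include both endpoints: b and c

large = series[series > 20]     # keeps labels 'c' and 'd'

# Arithmetic aligns on labels; labels present on only one side give NaN
other = pd.Series([1, 2], index=['a', 'z'])
total = series + other          # 'a' -> 11.0, every other label -> NaN
```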

DataFrame

A DataFrame is a two-dimensional, table-like data structure with labeled rows and columns.

# Create a DataFrame from a dictionary
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})
print(df)
  • DataFrames are like Excel spreadsheets or SQL tables.
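  • Access columns using df['column_name'] or df.column_name; the attribute form works only when the name is a valid Python identifier and does not clash with a DataFrame attribute or method.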

Index

An Index is an immutable array of row or column labels in a Series or DataFrame.

# Access DataFrame index
index = df.index
  • The Index is used to label rows or columns in a Series or DataFrame.

Data Selection and Indexing

Selection in Series and DataFrames

# Select elements from a Series
element = series['a']

# Select a row from a DataFrame
row = df.loc[0]  # Access by index label (here the default integer label 0)
row = df.iloc[0]  # Access by integer position (first row)

# Select multiple columns
columns = df[['Name', 'Age']]
  • Use loc to select rows by label and iloc to select by position.
  • Use df['column_name'] or df[['col1', 'col2']] to select columns.
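With the default index, loc and iloc return the same row, which hides the difference between them. The sketch below uses a hypothetical frame whose integer labels differ from the row positions.

```python
import pandas as pd

# Integer labels that differ from positions (chosen for illustration)
people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30]
)

by_label = people.loc[10]       # row labeled 10 (Alice)
by_position = people.iloc[0]    # first row (also Alice)

label_slice = people.loc[10:20]     # label slices include the end label: 2 rows
position_slice = people.iloc[0:1]   # position slices exclude the end: 1 row

# Boolean condition for rows plus a column list, in one .loc call
over_28 = people.loc[people['Age'] > 28, ['Name']]
```

Here people.loc[0] would raise a KeyError, because no row carries the label 0.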

Hierarchical Indexing

# Create a DataFrame with a hierarchical index
multi_index_df = pd.DataFrame(
    {'A': ['foo', 'foo', 'bar', 'bar'],
      'B': ['one', 'two', 'one', 'two'],
      'C': [1, 2, 3, 4]}
).set_index(['A', 'B'])

print(multi_index_df)
  • Hierarchical indices allow for multi-level indexing in rows or columns.
  • Access elements using a tuple of indices, e.g., multi_index_df.loc[('foo', 'one')].
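A few more ways to read from a hierarchical index, sketched on the same data. Sorting the index first is an addition of this sketch; it changes the row order to bar, foo.

```python
import pandas as pd

multi_index_df = pd.DataFrame(
    {'A': ['foo', 'foo', 'bar', 'bar'],
     'B': ['one', 'two', 'one', 'two'],
     'C': [1, 2, 3, 4]}
).set_index(['A', 'B']).sort_index()   # a sorted index keeps label lookups efficient

foo_rows = multi_index_df.loc['foo']               # all rows under outer label 'foo'
single = multi_index_df.loc[('foo', 'one'), 'C']   # one cell -> 1
ones = multi_index_df.xs('one', level='B')         # cross-section on the inner level

flat = multi_index_df.reset_index()                # back to ordinary columns A, B, C
```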

Handling Missing Data

Identifying and Filling Null Values

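# Identify missing values (returns a boolean DataFrame of the same shape)
missing_values = df.isnull()

# Fill missing values with 0 (returns a new DataFrame; df itself is unchanged)
filled_df = df.fillna(0)
  • Use isnull() (an alias of isna()) to detect missing values.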
  • Use fillna() to fill missing values with a specified value.
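The df used above has no gaps, so the calls have nothing to act on. The sketch below builds a small hypothetical frame that does contain missing values and shows counting, filling per column, and dropping.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps, for illustration only
people = pd.DataFrame({
    'Age': [25, np.nan, 35],
    'City': ['New York', None, 'Chicago']
})

missing_per_column = people.isnull().sum()   # Age: 1, City: 1

# Fill each column with a value that suits its type
filled = people.fillna({'Age': people['Age'].mean(), 'City': 'Unknown'})

# Or drop rows that contain any missing value
complete_rows = people.dropna()              # 2 rows remain
```

Whether to fill or drop depends on the analysis; filling with the mean is only one of several possible choices.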

Combining Datasets

Concatenation

# Concatenate two DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
concatenated_df = pd.concat([df1, df2])
  • Use pd.concat() to concatenate multiple DataFrames along rows (axis=0, the default) or columns (axis=1).
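Two details of pd.concat() are easy to miss: the original row labels are kept, and axis=1 aligns rows on the index. A minimal sketch, where df3 is an extra frame introduced only for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

stacked = pd.concat([df1, df2])                         # index is 0, 1, 0, 1
renumbered = pd.concat([df1, df2], ignore_index=True)   # index is 0, 1, 2, 3

# axis=1 places frames side by side, aligning rows on the index
df3 = pd.DataFrame({'C': [9, 10]})                      # illustrative extra frame
side_by_side = pd.concat([df1, df3], axis=1)            # columns A, B, C
```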

Merge and Join Operations

# Merge two DataFrames
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})
merged_df = pd.merge(df1, df2, on='key')  # Inner join by default: keeps only keys 'B' and 'C'
  • Use pd.merge() to merge DataFrames on a common key.
  • Use df.join() to join DataFrames by their indices.
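The how parameter controls which keys survive a merge. The sketch below runs the same two frames through inner, left, and outer joins, and shows the index-based join() for comparison.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})

inner = pd.merge(df1, df2, on='key')                 # keys B, C
left = pd.merge(df1, df2, on='key', how='left')      # keys A, B, C; value2 is NaN for A
outer = pd.merge(df1, df2, on='key', how='outer')    # keys A, B, C, D

# join() matches on the index, so move 'key' into the index first
joined = df1.set_index('key').join(df2.set_index('key'), how='inner')
```

Rows without a match on the other side get NaN in the columns that come from that side.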

Aggregation and Grouping

GroupBy Operations

# Group by and aggregate
grouped = df.groupby('City')['Age'].mean()
  • Use groupby() to group by a column and perform aggregations.
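In the df above every city appears once, so each group holds a single row. The sketch below uses hypothetical data with repeated cities and shows agg() computing several statistics at once.

```python
import pandas as pd

# Hypothetical data with repeated cities so that each group has rows to aggregate
people = pd.DataFrame({
    'City': ['Chicago', 'Chicago', 'New York', 'New York', 'New York'],
    'Age': [35, 45, 25, 30, 32]
})

mean_age = people.groupby('City')['Age'].mean()   # Chicago 40.0, New York 29.0

# agg() computes several statistics per group in one call
summary = people.groupby('City')['Age'].agg(['count', 'min', 'max'])
```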

Pivot Tables

# Create a pivot table
pivot_table = df.pivot_table(values='Age', index='City', aggfunc='mean')
  • Use pivot_table() to create summary tables with aggregations.

Working with Time Series

Indexing by Time

# Create a time-indexed DataFrame
time_df = pd.DataFrame(
    {'value': [1, 2, 3, 4]},
    index=pd.date_range('2023-01-01', periods=4)
)
  • Use pd.date_range() to create a DatetimeIndex (daily frequency by default).

Resampling, Shifting, and Windowing

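# Resample data into 2-day bins (the index is already daily, so 'D' would change nothing)
resampled = time_df.resample('2D').mean()  # 1.5, 3.5

# Shift data forward by one row; the first row becomes NaN
shifted = time_df.shift(1)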
  • Use resample() to change the frequency of time-series data.
  • Use shift() to shift data forward or backward.
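The heading above also mentions windowing, which the snippet does not show. A minimal sketch using rolling() on the same daily data, plus a common use of shift() for day-over-day change:

```python
import pandas as pd

time_df = pd.DataFrame(
    {'value': [1, 2, 3, 4]},
    index=pd.date_range('2023-01-01', periods=4)
)

# Windowing: mean over the current and previous row; the first window is incomplete (NaN)
rolling_mean = time_df['value'].rolling(window=2).mean()   # NaN, 1.5, 2.5, 3.5

# shift() pairs each value with the previous one, e.g. for day-over-day change
change = time_df['value'] - time_df['value'].shift(1)      # NaN, 1.0, 1.0, 1.0
```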

Data Visualization

Visualization with Matplotlib

Basic Plots: Line, Scatter, and Bar Plots

import matplotlib.pyplot as plt

# Line plot
x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]
plt.plot(x, y, label='Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Plot Example')
plt.legend()
plt.show()

# Scatter plot
plt.scatter(x, y, color='red', label='Scatter Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot Example')
plt.legend()
plt.show()

# Bar plot
plt.bar(x, y, color='green', label='Bar Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Bar Plot Example')
plt.legend()
plt.show()
  • Use plt.plot() for line plots.
  • Use plt.scatter() for scatter plots.
  • Use plt.bar() for bar plots.

Plot Customization: Colors, Styles, Labels, Legends

# Customizing line style, color, and markers
plt.plot(x, y, color='purple', linestyle='--', marker='o', label='Customized Line')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Customized Line Plot')
plt.legend()
plt.show()
  • Change line style using linestyle (e.g., '--', ':').
  • Change color using color (e.g., 'red', 'blue').
  • Set markers using marker (e.g., 'o', '^').

Subplots and Grids

# Create multiple subplots
fig, axs = plt.subplots(2, 2)  # 2x2 grid of subplots

# Plot on each subplot
axs[0, 0].plot(x, y)
axs[0, 1].scatter(x, y)
axs[1, 0].bar(x, y)
axs[1, 1].hist(y)

plt.tight_layout()
plt.show()
  • Use plt.subplots() to create grids of subplots.
  • Access individual subplots using axs[row, col].

Histograms and Density Plots

# Histogram
data = [1, 1, 2, 2, 2, 3, 3, 4, 4, 5]
plt.hist(data, bins=5, color='orange', edgecolor='black')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Histogram Example')
plt.show()
  • Use plt.hist() to plot histograms.
  • Use the bins parameter to set the number of bins or to pass explicit bin edges.

Three-Dimensional Plotting

from mpl_toolkits.mplot3d import Axes3D  # Only needed on Matplotlib older than 3.2 to register the 3D projection

# Create 3D plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]
z = [5, 10, 15, 20, 25]
ax.plot(x, y, z)
plt.show()
  • Use projection='3d' in add_subplot() to create 3D plots.

Text, Annotations, and Tick Customization

# Customizing ticks, adding annotations, and text
plt.plot(x, y)
plt.xticks(ticks=[1, 3, 5], labels=['One', 'Three', 'Five'])
plt.yticks(ticks=[10, 30, 50], labels=['Low', 'Medium', 'High'])
plt.text(3, 30, 'Midpoint', fontsize=12, color='red')
plt.show()
  • Use plt.xticks() and plt.yticks() to customize tick labels.
  • Use plt.text() to place a text label at given data coordinates; plt.annotate() can additionally draw an arrow to the point.
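For an annotation with an arrow, annotate() takes the point to mark (xy) and the label position (xytext). The sketch below uses the Axes-level methods; the coordinates are illustrative, and the matplotlib.use("Agg") line is an addition so that the sketch runs without a display (in a notebook, drop it and call plt.show()).

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]

fig, ax = plt.subplots()
ax.plot(x, y)

# Custom tick positions and labels on the x-axis
ax.set_xticks([1, 3, 5])
ax.set_xticklabels(['One', 'Three', 'Five'])

# xy is the point being annotated; xytext is where the label is drawn
note = ax.annotate('Midpoint', xy=(3, 30), xytext=(3.5, 15),
                   arrowprops=dict(arrowstyle='->'))
```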

Visualization with Seaborn

Statistical Plots

import seaborn as sns

# Distribution plot: histogram with a KDE curve (data is the list from the histogram example above)
sns.histplot(data, kde=True)
plt.show()
  • Seaborn's histplot() provides enhanced histograms with KDE curves.

Pair Plots, Distribution Plots, and Categorical Plots

# Pair plot (only numeric columns are plotted; df has a single numeric column, 'Age')
sns.pairplot(df)
plt.show()

# Categorical plot
sns.catplot(x='City', y='Age', kind='bar', data=df)
plt.show()
  • Use sns.pairplot() to visualize pairwise relationships between the numeric columns of a DataFrame.
  • Use sns.catplot() to plot categorical data.
