Comprehensive Python Programming for Data Science
Abstract:
With tools like ChatGPT, it has never been easier to look up Python for data science, spend less time on programming, and spend more time deriving meaningful insights from data. However, we still need to ask the right questions so that ChatGPT looks in the right place and gives a professional response. It is therefore important to know what to ask and in what order. That is the focus of this page. By moving quickly down the page, you can familiarize yourself, or refresh your memory, with the core of the Python programming language and the libraries that data scientists use in their daily work. Note: since language models may return wrong answers, we also provide the right answers to these questions below. After reviewing this page, you should have enough background to ask ChatGPT the right questions and fill in whatever is not covered here.
Table of Contents:
- Core Data Types and Structures
- Numbers
- Integers, floats, and complex numbers
- Operations and conversions
- Strings
- String methods and slicing
- Formatting strings
- Booleans
- Logical operators and truthiness
- Built-in collections
- Lists
- Methods and comprehensions
- Tuples
- Immutability and use cases
- Sets
- Set operations (union, intersection, difference)
- Dictionaries
- Key-value pairs, methods, and dictionary comprehensions
- Control Flow
- Conditional statements (if, elif, else)
- Loops
- for loops with range()
- while loops
- Loop control (break, continue, else)
- Comprehensions
- List, set, and dictionary comprehensions
- Functions
- Defining and calling functions
- Arguments and parameters
- Positional, keyword, and default arguments
- *args and **kwargs
- Anonymous functions with lambda
- Scope and the global keyword
- Higher-order functions
- map, filter, and reduce
- Modules and Packages
- Importing modules
- import, from ... import ..., as keyword
- Standard library overview
- math, random, os, sys, time, datetime
- Writing and using custom modules
- Exploring the dir() and help() functions
- File Handling
- Opening and closing files
- Reading and writing files
- File modes (r, w, a, rb, etc.)
- Using with statement for file handling
- Working with os and shutil modules
- Error Handling
- try, except, else, and finally blocks
- Raising exceptions
- Built-in exceptions
- Custom exception classes
- Iterators and Generators
- Creating and using iterators
- Generator functions and expressions
- yield keyword
- Object-Oriented Programming (OOP)
- Classes and objects
- Methods and attributes
- __init__ method
- Inheritance and polymorphism
- Special methods (__str__, __repr__, __len__, etc.)
- Encapsulation and property decorators
- Built-in Functions and Utilities
- Mathematical functions
- abs(), pow(), round(), etc.
- Iterable functions
- len(), max(), min(), sorted(), enumerate()
- Type-related functions
- type(), isinstance(), id()
- I/O functions
- print(), input()
- Conversion functions
- int(), float(), str(), list(), etc.
- Decorators and Context Managers
- Creating and applying decorators
- Built-in decorators (@staticmethod, @classmethod, @property)
- Creating custom context managers with __enter__ and __exit__
- Concurrency and Parallelism
- Introduction to threading and multiprocessing
- Async programming with asyncio
- Exploring Advanced Features
- Working with collections module
- Counter, defaultdict, OrderedDict, deque
- itertools for functional programming
- functools utilities
- lru_cache, partial, reduce
- Handling JSON and other formats
- json module
- pickle for serialization
- Getting Started with IPython
- Accessing documentation (?, ??)
- Exploring modules with Tab completion
- Keyboard shortcuts in IPython Shell
- Navigation shortcuts
- Text entry shortcuts
- Command history shortcuts
- IPython magic commands
- %paste, %cpaste, %run
- %time, %timeit
- Accessing magic commands (%magic, %lsmagic)
- Working with the Shell
- Shell commands in IPython
- Passing values to and from the shell
- Shell-related magic commands
- Debugging and Profiling
- Controlling exceptions (%xmode)
- Debugging tracebacks
- Timing code execution (%timeit, %time)
- Profiling scripts (%prun, %lprun, %memit, %mprun)
- Introduction to NumPy
- Understanding data types in Python
- Creating and manipulating arrays
- Arrays from Python lists
- Arrays from scratch
- NumPy standard data types
- Basics of NumPy arrays
- Attributes, indexing, and slicing
- Reshaping arrays
- Array concatenation and splitting
- Computation on arrays
- Universal functions (UFuncs)
- Broadcasting
- Aggregations
- Summing, min, max, and averages
- Boolean masks and comparisons
- Advanced indexing
- Fancy indexing
- Sorting and binning data
- Structured arrays
- Data Manipulation with Pandas
- Introduction to Pandas objects
- Series, DataFrames, and Index objects
- Data selection and indexing
- Handling missing data
- Combining datasets
- Aggregation and grouping
- Working with time series
- Data Visualization
- Visualization with Matplotlib
- Basic plots: Line, scatter, and bar plots
- Plot customization: Colors, styles, labels, legends
- Subplots and grids
- Histograms and density plots
- Three-dimensional plotting
- Text, annotations, and tick customization
- Visualization with Seaborn
- Statistical plots
- Pair plots, distribution plots, and categorical plots
Core Data Types and Structures
Numbers
Python supports different types of numbers including integers, floats, and complex numbers. These types support various operations and can be converted from one type to another.
# Integer
int_num = 10
# Float
float_num = 10.5
# Complex
complex_num = 3 + 4j
# Operations
sum_result = int_num + float_num # Addition
product_result = int_num * 2 # Multiplication
division_result = int_num / 3 # Division
# Type conversions
float_to_int = int(float_num) # Convert float to int
int_to_float = float(int_num) # Convert int to float
- Integers are whole numbers, e.g., 10, -5, 0.
- Floats are numbers with decimal points, e.g., 10.5, -3.14.
- Complex numbers have a real and imaginary part, e.g., 3 + 4j.
- Conversions between types can be done using int() and float().
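The operations above can be extended with a short sketch of floor division, modulo, and how int() treats the fractional part. The variable names are illustrative only.

```python
# True division always returns a float; floor division and modulo stay integral for ints
true_division = 10 / 3      # 3.3333333333333335
floor_division = 10 // 3    # 3
remainder = 10 % 3          # 1

# int() truncates toward zero; it does not round
truncated_positive = int(10.9)    # 10
truncated_negative = int(-10.9)   # -10

# Complex numbers expose their parts, and abs() gives the magnitude
z = 3 + 4j
parts = (z.real, z.imag)    # (3.0, 4.0)
magnitude = abs(z)          # 5.0
```

Use round() instead of int() when rounding to the nearest value is what you want.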
Strings
Strings are sequences of characters. Python provides methods for manipulating strings, slicing, and formatting them.
# String declaration
greeting = "Hello, World!"
# Slicing
substring = greeting[0:5] # 'Hello'
# String methods
uppercase = greeting.upper() # 'HELLO, WORLD!'
lowercase = greeting.lower() # 'hello, world!'
replaced = greeting.replace("World", "Python") # 'Hello, Python!'
# String formatting
name = "John"
age = 25
formatted_string = f"My name is {name} and I am {age} years old."
- Strings are immutable, meaning their content cannot be changed in place.
- Use square brackets to slice strings, e.g., string[start:end].
- Common string methods include .upper(), .lower(), and .replace().
- String interpolation can be done using f-strings or format().
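The bullets mention format() and slicing in general terms. The sketch below shows format() next to an f-string format specifier, plus negative and stepped slices; the variable names are illustrative.

```python
name = "John"
age = 25

# str.format() fills placeholders positionally or by name
with_format = "My name is {} and I am {} years old.".format(name, age)
with_names = "{n} is {a}".format(n=name, a=age)   # 'John is 25'

# Format specifiers work in both f-strings and format()
two_decimals = f"{3.14159:.2f}"                   # '3.14'

# Slices accept negative indices and a step
greeting = "Hello, World!"
last_six = greeting[-6:]         # 'World!'
every_other = greeting[::2]      # 'Hlo ol!'
reversed_text = greeting[::-1]   # '!dlroW ,olleH'
```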
Booleans
Booleans represent one of two possible values: True or False. Logical operators are used to perform logical operations.
# Boolean values
is_python_fun = True
is_raining = False
# Logical operators
and_result = is_python_fun and is_raining # False
or_result = is_python_fun or is_raining # True
not_result = not is_python_fun # False
# Truthiness
truthy_check = bool(1) # True, since 1 is considered True
falsy_check = bool(0) # False, since 0 is considered False
- Logical operators include and, or, and not.
- Python considers non-zero numbers, non-empty collections, and True as truthy.
- Falsy values include False, 0, None, and empty collections (including the empty string).
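As a quick check of the truthiness rules above, the sketch below applies bool() to a handful of values and shows that and/or short-circuit and return one of their operands. The names are illustrative.

```python
values = [0, 0.0, "", [], {}, None, 1, "text", [0]]
truthiness = [bool(v) for v in values]
# [False, False, False, False, False, False, True, True, True]

# 'or' returns the first truthy operand, which makes it handy for defaults
display_name = "" or "anonymous"    # 'anonymous'

# 'and' stops at the first falsy operand, so 1 / 0 is never evaluated here
short_circuit = 0 and 1 / 0         # 0
```

Note that [0] is truthy: a list is judged by whether it is empty, not by its contents.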
Built-in Collections
Lists
Lists are mutable collections that store ordered elements. They allow duplicates and can be modified in place.
# List declaration
fruits = ["apple", "banana", "cherry"]
# List methods
fruits.append("orange") # Add an item
fruits.remove("banana") # Remove an item
fruits.sort() # Sort the list alphabetically
# List comprehension
squares = [x**2 for x in range(5)] # [0, 1, 4, 9, 16]
- Lists are mutable, meaning they can be changed in place.
- Common methods include .append(), .remove(), and .sort().
- List comprehensions provide a concise way to create new lists.
Tuples
Tuples are immutable, ordered collections often used for fixed sets of items.
# Tuple declaration
coordinates = (10, 20)
# Accessing tuple elements
x = coordinates[0] # 10
y = coordinates[1] # 20
# Tuples are immutable, so this would raise an error:
# coordinates[0] = 15
- Tuples are immutable, meaning their content cannot be modified after creation.
- Useful for representing fixed collections of related data (e.g., coordinates).
Sets
Sets are unordered collections of unique elements.
# Set declaration
fruits = {"apple", "banana", "cherry"}
# Set operations
union_result = fruits.union({"orange", "grape"}) # Combine sets
intersection_result = fruits.intersection({"banana", "kiwi"}) # Items present in both sets
difference_result = fruits.difference({"banana", "cherry"}) # Items in the first set but not the second
- Sets only store unique values, and duplicates are ignored.
- Sets support operations like union(), intersection(), and difference().
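The same set operations are also available as operators. The following sketch shows them alongside duplicate removal and membership testing; the set contents are illustrative.

```python
fruits = {"apple", "banana", "cherry"}
others = {"banana", "kiwi"}

union_result = fruits | others           # same as fruits.union(others)
intersection_result = fruits & others    # {'banana'}
difference_result = fruits - others      # {'apple', 'cherry'}
symmetric_result = fruits ^ others       # items in exactly one of the two sets

# Converting a list to a set drops duplicates
unique_numbers = set([1, 2, 2, 3, 3, 3])   # {1, 2, 3}

# Membership tests use 'in'
has_kiwi = "kiwi" in fruits                # False
```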
Dictionaries
Dictionaries store key-value pairs. Keys must be unique, and values can be of any type.
# Dictionary declaration
person = {
    "name": "John",
    "age": 30,
    "city": "New York"
}
# Accessing values
name = person["name"] # 'John'
# Adding a new key-value pair
person["job"] = "Engineer"
# Updating values
person["age"] = 31
# Dictionary methods
keys = person.keys() # Get all keys
values = person.values() # Get all values
items = person.items() # Get all key-value pairs
# Dictionary comprehension
squares = {x: x**2 for x in range(5)} # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
- Dictionaries use key-value pairs to store data.
- Access values using dict[key] or with the get() method.
- Common methods include .keys(), .values(), and .items().
- Dictionary comprehensions provide a concise way to create dictionaries.
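The get() method is mentioned above but not shown. A minimal sketch of how it differs from square-bracket access when a key is missing (the dictionary contents are illustrative):

```python
person = {"name": "John", "age": 30}

# Square-bracket access raises KeyError for a missing key
try:
    job = person["job"]
except KeyError:
    job = "missing"

# get() returns None, or a supplied default, instead of raising
job_or_none = person.get("job")                 # None
job_or_default = person.get("job", "Unknown")   # 'Unknown'

# 'in' checks for a key without reading the value
has_age = "age" in person                       # True
```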
Control Flow
Conditional Statements
Conditional statements allow you to control the flow of your program based on conditions using if, elif, and else statements.
# Example of conditional statements
x = 10
if x > 0:
    result = "Positive"
elif x == 0:
    result = "Zero"
else:
    result = "Negative"
- Use if to check a condition.
- Use elif (else if) to check multiple conditions.
- Use else to specify a block of code to run if all conditions are false.
Loops
For Loops with range()
A for loop iterates over a sequence (like a list, string, or range). The range() function generates a sequence of numbers.
# For loop using range
for i in range(5): # Loops from 0 to 4
    print(f"Iteration {i}")
- range(n) generates numbers from 0 up to (but not including) n.
- You can also specify start, stop, and step: range(start, stop, step).
- Useful for iterating a fixed number of times.
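The three-argument form of range() can be sketched as follows. Wrapping it in list() is only there to make the generated values visible.

```python
# stop is always excluded, whatever the step
every_third = list(range(2, 10, 3))    # [2, 5, 8]

# A negative step counts down
countdown = list(range(5, 0, -1))      # [5, 4, 3, 2, 1]

# With the default step of 1, start >= stop yields nothing
empty = list(range(5, 0))              # []

# Summing 1 through 5 with an explicit start
total = 0
for i in range(1, 6):
    total += i                         # ends at 15
```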
While Loops
A while loop repeats as long as a given condition is True.
# While loop example
count = 0
while count < 5:
    print(f"Count is {count}")
    count += 1 # Increment count
- The loop continues as long as the condition count < 5 is True.
- Be cautious of infinite loops, where the condition is never False.
- Use break to exit a loop early.
Loop Control (break, continue, else)
Loop control statements modify the behavior of loops. They allow you to exit or skip iterations.
# Using break, continue, and else
for i in range(5):
    if i == 3:
        break # Exit the loop when i is 3
    if i % 2 == 0:
        continue # Skip even numbers
    print(f"Odd number: {i}")
else:
    # Not reached in this example, because the loop exits via break at i == 3
    print("Loop completed without break")
- break stops the loop immediately.
- continue skips the current iteration and moves to the next one.
- The else block runs only if the loop is not terminated by a break.
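To see both outcomes of a loop's else block, here is a small sketch using a hypothetical helper, find_first_negative, which is not part of the original page. The else branch runs only when the loop finishes without hitting break.

```python
def find_first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            found = n
            break          # leaving via break skips the else block
    else:
        found = None       # runs only when the loop was never broken
    return found

with_negative = find_first_negative([3, -1, 4, -5])   # -1
without_negative = find_first_negative([3, 1, 4])     # None
empty_input = find_first_negative([])                 # None
```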
Comprehensions
List Comprehensions
List comprehensions provide a concise way to create lists from iterables.
# List comprehension
squares = [x**2 for x in range(5)] # [0, 1, 4, 9, 16]
- Syntax: [expression for item in iterable if condition].
- List comprehensions are often more concise, and for simple cases more readable, than the equivalent loop.
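The syntax line includes an optional condition that the example does not use. A minimal sketch comparing a filtered comprehension with the equivalent loop:

```python
# Comprehension with a condition: squares of even numbers only
even_squares = [x**2 for x in range(10) if x % 2 == 0]   # [0, 4, 16, 36, 64]

# The same result with a traditional loop
even_squares_loop = []
for x in range(10):
    if x % 2 == 0:
        even_squares_loop.append(x**2)
```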
Set Comprehensions
Set comprehensions create sets, ensuring all items are unique.
# Set comprehension
unique_squares = {x**2 for x in range(-3, 4)} # {0, 1, 4, 9}
- Similar to list comprehensions, but sets do not allow duplicate values.
- Syntax: {expression for item in iterable if condition}.
Dictionary Comprehensions
Dictionary comprehensions create dictionaries from iterables.
# Dictionary comprehension
squares_dict = {x: x**2 for x in range(5)} # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
- Syntax: {key_expression: value_expression for item in iterable}.
- Useful for building dictionaries dynamically from other iterables.
Functions
Defining and Calling Functions
Functions are reusable blocks of code that perform specific tasks. They are defined using the def keyword.
# Function definition
def greet(name):
    return f"Hello, {name}!"
# Function call
message = greet("John")
- Use def function_name(parameters): to define a function.
- Call the function using function_name(arguments).
- The return statement sends a result back to the caller.
Arguments and Parameters
Functions can have positional, keyword, and default arguments, which define how arguments are passed.
# Function with positional, keyword, and default arguments
def introduce(name, age=30, city="New York"):
    return f"My name is {name}, I am {age} years old, and I live in {city}."
# Positional arguments
intro1 = introduce("Alice", 25, "Boston")
# Keyword arguments
intro2 = introduce(name="Bob", age=40, city="Chicago")
# Using default arguments
intro3 = introduce("Charlie")
- Positional arguments must be passed in the correct order.
- Keyword arguments explicitly specify parameter names when calling a function.
- Default arguments are used when no argument is provided for a parameter.
*args and **kwargs
*args and **kwargs allow functions to accept a variable number of arguments.
# Using *args for variable-length arguments
def sum_numbers(*args):
    return sum(args)
total = sum_numbers(1, 2, 3, 4) # 10
# Using **kwargs for variable-length keyword arguments
def print_info(**kwargs):
    for key, value in kwargs.items():
        print(f"{key}: {value}")
print_info(name="Alice", age=25, city="Boston")
- *args allows a function to accept any number of positional arguments as a tuple.
- **kwargs allows a function to accept any number of keyword arguments as a dictionary.
- Both *args and **kwargs can be used in the same function.
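The last point, using both in one function, can be sketched as follows. The function summarize and its argument names are hypothetical and chosen only for illustration.

```python
def summarize(label, *args, **kwargs):
    # args arrives as a tuple, kwargs as a dictionary
    return {"label": label, "values": args, "options": kwargs}

summary = summarize("scores", 90, 85, unit="points")
# {'label': 'scores', 'values': (90, 85), 'options': {'unit': 'points'}}

# The same * and ** syntax unpacks existing collections at the call site
numbers = [1, 2, 3]
settings = {"unit": "items"}
unpacked = summarize("counts", *numbers, **settings)
```

In a signature, *args must come after regular positional parameters and **kwargs must come last.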
Anonymous Functions with lambda
Lambda functions are anonymous (unnamed) functions that can have multiple arguments but only one expression.
# Lambda function to square a number
square = lambda x: x ** 2
# Call the lambda function
result = square(4) # 16
# Lambda function with multiple arguments
add = lambda x, y: x + y
result2 = add(3, 5) # 8
- Lambda functions are defined as lambda arguments: expression.
- They are often used as short, one-line functions.
- Useful for short, temporary, or one-time-use functions.
Scope and the global Keyword
Scope determines where variables are accessible. The global keyword lets a function assign to a global (module-level) variable instead of creating a new local one.
# Global variable
counter = 0
def increment():
    global counter # Rebind the global variable rather than a local one
    counter += 1
increment()
increment() # counter is now 2
- Variables defined inside a function have local scope and are only accessible within that function.
- Global variables are defined outside functions. A function can read them directly; the global keyword is only needed to assign to them.
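A short sketch of the three cases that tend to cause confusion: reading a global, shadowing it with a local, and trying to modify it without declaring it global. The function names are illustrative.

```python
threshold = 10

def is_large(value):
    # Reading a global needs no declaration
    return value > threshold

def shadow_threshold():
    # Assignment creates a new local variable; the global is untouched
    threshold = 99
    return threshold

def broken_update():
    # Assigning without 'global' makes threshold local, so reading it first fails
    threshold += 1

large = is_large(42)                # True
local_value = shadow_threshold()    # 99, while the global threshold is still 10

try:
    broken_update()
    error_name = None
except UnboundLocalError as error:
    error_name = type(error).__name__   # 'UnboundLocalError'
```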
Higher-Order Functions
Higher-order functions are functions that accept other functions as arguments or return functions as results.
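Before the built-in examples, here is a minimal sketch of both halves of that definition: a function that returns a function and a function that takes one as an argument. The names make_multiplier and apply_twice are hypothetical.

```python
def make_multiplier(factor):
    # Returns a new function that remembers 'factor'
    def multiply(x):
        return x * factor
    return multiply

def apply_twice(func, value):
    # Accepts a function as an argument and calls it two times
    return func(func(value))

double = make_multiplier(2)
doubled = double(5)                     # 10
quadrupled = apply_twice(double, 3)     # 12
```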
map
The map() function applies a given function to each item in an iterable (like a list) and returns a map object.
# Using map to square a list of numbers
numbers = [1, 2, 3, 4, 5]
squared_numbers = list(map(lambda x: x ** 2, numbers)) # [1, 4, 9, 16, 25]
- map(function, iterable) applies a function to every item in the iterable.
- Returns a map object, which can be converted to a list or other collections.
filter
The filter() function filters elements from an iterable based on a condition.
# Using filter to get even numbers from a list
numbers = [1, 2, 3, 4, 5, 6]
even_numbers = list(filter(lambda x: x % 2 == 0, numbers)) # [2, 4, 6]
- filter(function, iterable) keeps the elements for which the function returns a truthy value.
- Returns a filter object, which can be converted to a list or other collections.
reduce
The reduce() function from the functools module reduces a sequence to a single value using a function.
# Using reduce to compute the product of a list of numbers
from functools import reduce
numbers = [1, 2, 3, 4]
product = reduce(lambda x, y: x * y, numbers) # 24
- reduce(function, iterable) applies a rolling computation to elements in an iterable.
- Requires importing reduce from the functools module.
- Useful for cumulative operations like summing, multiplying, or concatenating elements.
Modules and Packages
Importing Modules
Modules allow you to organize Python code into reusable files. You can import entire modules or specific parts of them.
# Import the entire module
import math
result1 = math.sqrt(16) # 4.0
# Import specific functions from a module
from math import sqrt, pi
result2 = sqrt(25) # 5.0
circle_area = pi * (5 ** 2) # Area of a circle with radius 5
# Import a module with an alias
import random as rnd
random_number = rnd.randint(1, 10) # Random number between 1 and 10
- import module_name imports the entire module.
- from module_name import specific_function imports specific parts of a module.
- import module_name as alias gives the module a shorter name (alias) for convenience.
Standard Library Overview
Python has a rich standard library that provides modules for many common tasks. Here are some key modules:
math
import math
result = math.factorial(5) # 120
- Provides mathematical functions like sqrt(), factorial(), and constants like pi.
random
import random
random_choice = random.choice(['apple', 'banana', 'cherry'])
- Generates random numbers, selects random elements, and shuffles sequences.
os
import os
current_directory = os.getcwd() # Get the current working directory
- Interacts with the operating system, allowing access to files, directories, and environment variables.
sys
import sys
print(sys.version) # Print Python version
- Provides system-specific parameters and functions, such as command-line arguments and Python version info.
time
import time
current_time = time.time() # Get the current time in seconds since epoch
- Provides functions for time-related tasks like sleeping, measuring execution time, and working with timestamps.
datetime
import datetime
current_date = datetime.datetime.now() # Get the current date and time
- Used for date and time manipulation, such as getting the current date or formatting dates.
Writing and Using Custom Modules
You can create your own Python module by saving functions and classes in a .py file and importing it.
# File: my_module.py
def greet(name):
    return f"Hello, {name}!"
def add(a, b):
    return a + b
# Main script to import and use the custom module
import my_module
message = my_module.greet("Alice")
sum_result = my_module.add(3, 4)
- Create a file named my_module.py containing functions and classes.
- Import it using import my_module or from my_module import function_name.
- Custom modules help organize and reuse code in multiple files.
Exploring the dir() and help() Functions
The dir() and help() functions are useful for exploring Python modules, objects, and functions.
# Using dir() to list attributes and methods of a module
import math
print(dir(math)) # Lists all attributes, methods, and constants in the math module
# Using help() to display the documentation for a function or module
help(math.sqrt) # Shows documentation for the sqrt() function
- dir(object) returns a list of all attributes, methods, and constants of the object.
- help(object) displays detailed documentation and usage information for an object.
- Use dir() to explore what is available and help() to learn how to use it.
File Handling
Opening and Closing Files
File handling allows you to read from and write to files on your system. Files must be opened before they can be read or written, and closed afterward to free system resources.
# Opening and closing a file
file = open("example.txt", "w") # Open the file in write mode
file.write("Hello, World!")
file.close() # Close the file
- Use open(filename, mode) to open a file.
- Use file.close() to close the file after use.
- It's a good practice to close files to avoid resource leaks.
Reading and Writing Files
You can read from and write to files using the read(), readline(), readlines(), and write() methods.
# Writing to a file
with open("example.txt", "w") as file:
    file.write("Hello, World!\n")
    file.write("Welcome to Python file handling.\n")
# Reading from a file
with open("example.txt", "r") as file:
    content = file.read() # Read the entire file
    print(content)
- file.write(data) writes a string to the file.
- file.read() reads the entire file as a string.
- file.readline() reads one line at a time.
- file.readlines() reads all lines and returns a list of strings.
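The example above only uses read(). The following sketch shows readline(), readlines(), and direct iteration over the file object. It writes to a temporary directory (an assumption made here so the example does not touch your working directory).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")

    with open(path, "w") as file:
        file.write("first\nsecond\nthird\n")

    with open(path, "r") as file:
        first_line = file.readline()        # 'first\n'
        remaining_lines = file.readlines()  # ['second\n', 'third\n']

    # Iterating over the file object reads it line by line
    with open(path, "r") as file:
        stripped_lines = [line.rstrip("\n") for line in file]
```

Iterating over the file is usually the better choice for large files, because it avoids loading every line into memory at once.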
File Modes
File modes determine how files are opened. Here are the most common file modes:
# File modes
with open("example.txt", "r") as file: # Read mode
    content = file.read()
with open("example.txt", "w") as file: # Write mode (overwrites the file)
    file.write("This will overwrite the file.")
with open("example.txt", "a") as file: # Append mode (adds to the end of the file)
    file.write("This will be appended to the file.")
with open("example.txt", "rb") as file: # Read binary mode
    binary_content = file.read()
- r: Read mode (default). Opens the file for reading.
- w: Write mode. Overwrites the file if it exists, or creates a new one.
- a: Append mode. Adds new content to the end of the file.
- rb: Read binary. Reads binary files like images or executables.
- wb: Write binary. Writes binary data to the file.
Using the with Statement
The with statement is used to manage file resources. It automatically closes the file once the block is finished, even if an error occurs.
# Using 'with' to automatically close the file
with open("example.txt", "r") as file:
    content = file.read()
    print(content) # The file is automatically closed after this block
- The with statement ensures the file is properly closed after use.
- It eliminates the need to explicitly call file.close().
Working with os and shutil Modules
The os and shutil modules provide functions for file and directory manipulation.
os Module
import os
# Get the current working directory
current_directory = os.getcwd()
# List files in the current directory
files = os.listdir()
# Create a new directory
os.mkdir("new_folder")
# Remove a directory
os.rmdir("new_folder")
# Check if a file exists
file_exists = os.path.exists("example.txt")
- os.getcwd() returns the current working directory.
- os.listdir() lists files and directories in the current directory.
- os.mkdir() creates a new directory.
- os.rmdir() removes an empty directory.
- os.path.exists() checks if a file or directory exists.
shutil Module
import os
import shutil
# Copy a file
shutil.copy("example.txt", "copy_example.txt")
# Move a file (the destination folder must already exist)
os.makedirs("new_folder", exist_ok=True)
shutil.move("copy_example.txt", "new_folder/copy_example.txt")
# Remove a file
os.remove("example.txt")
- shutil.copy(src, dst) copies a file from src to dst.
- shutil.move(src, dst) moves a file from src to dst.
- os.remove() deletes a file from the filesystem.
Error Handling
try, except, else, and finally Blocks
Error handling allows you to gracefully handle exceptions that might occur in your program. Python provides try, except, else, and finally blocks to catch and handle exceptions.
# Example of try, except, else, and finally
try:
    num = int(input("Enter a number: "))
    result = 10 / num # May raise ZeroDivisionError
except ZeroDivisionError:
    print("Error: Cannot divide by zero.")
except ValueError:
    print("Error: Invalid input. Please enter a number.")
else:
    print(f"Division successful, result is {result}")
finally:
    print("This block runs no matter what.")
- try: Defines a block of code to test for exceptions.
- except: Handles specific exceptions that occur in the try block.
- else: Runs if no exceptions occur in the try block.
- finally: Always runs, regardless of what happens in try or except.
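Because the example above depends on input(), it is hard to see every path at once. The sketch below wraps the same structure in a hypothetical function, safe_divide, that records which blocks ran.

```python
def safe_divide(numerator, text):
    """Divide numerator by int(text), recording which blocks executed."""
    events = []
    try:
        result = numerator / int(text)
    except ZeroDivisionError:
        events.append("zero")
        result = None
    except ValueError:
        events.append("invalid")
        result = None
    else:
        events.append("ok")        # only when no exception was raised
    finally:
        events.append("done")      # always runs
    return result, events

success = safe_divide(10, "2")      # (5.0, ['ok', 'done'])
by_zero = safe_divide(10, "0")      # (None, ['zero', 'done'])
bad_text = safe_divide(10, "abc")   # (None, ['invalid', 'done'])
```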
Raising Exceptions
Python allows you to raise exceptions explicitly using the raise statement.
# Example of raising exceptions
def check_age(age):
    if age < 0:
        raise ValueError("Age cannot be negative.")
    print(f"Age is {age}")
try:
    check_age(-5)
except ValueError as e:
    print(f"Exception occurred: {e}")
- Use raise ExceptionType("message") to raise an exception.
- Raising exceptions allows you to enforce rules and validate input.
Built-in Exceptions
Python has several built-in exceptions that are raised when errors occur. Here are some commonly used exceptions:
- ValueError: Raised when a function receives an argument of the right type but with an inappropriate value.
- TypeError: Raised when an operation or function is applied to an object of an inappropriate type.
- IndexError: Raised when an index is out of range for lists, tuples, etc.
- KeyError: Raised when a dictionary key is not found.
- ZeroDivisionError: Raised when division or modulo by zero occurs.
- FileNotFoundError: Raised when an attempt is made to open a file that does not exist.
# Examples of built-in exceptions
try:
    my_list = [1, 2, 3]
    print(my_list[5]) # Raises IndexError
except IndexError as e:
    print(f"Exception occurred: {e}")
try:
    result = 10 / 0 # Raises ZeroDivisionError
except ZeroDivisionError as e:
    print(f"Exception occurred: {e}")
- Common exceptions include TypeError, ValueError, IndexError, and KeyError.
- Use try and except to catch and handle these exceptions.
Custom Exception Classes
You can create custom exceptions by subclassing Python's Exception class. This allows you to define your own error types and raise them when needed.
# Custom exception class
class CustomError(Exception):
    def __init__(self, message):
        super().__init__(message)
# Raising and handling a custom exception
try:
    raise CustomError("This is a custom error message.")
except CustomError as e:
    print(f"Custom exception occurred: {e}")
- Create a custom exception by subclassing the Exception class.
- Optionally define an __init__() method to customize the error message or store extra data.
- Use raise CustomError("message") to raise your custom exception.
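A custom __init__() becomes useful when the exception should carry data beyond the message. The sketch below uses a hypothetical InsufficientFundsError and withdraw function, neither of which is part of the original page.

```python
class InsufficientFundsError(Exception):
    """Raised when a withdrawal exceeds the available balance."""

    def __init__(self, balance, amount):
        super().__init__(f"Cannot withdraw {amount}; balance is only {balance}.")
        # Extra attributes let the handler inspect what went wrong
        self.balance = balance
        self.amount = amount

def withdraw(balance, amount):
    if amount > balance:
        raise InsufficientFundsError(balance, amount)
    return balance - amount

try:
    withdraw(100, 250)
except InsufficientFundsError as error:
    shortfall = error.amount - error.balance   # 150
    message = str(error)

remaining = withdraw(100, 40)                  # 60
```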
Iterators and Generators
Creating and Using Iterators
An iterator is an object that produces the elements of a sequence one at a time. Iterators implement the __iter__() and __next__() methods.
# Creating an iterator from a list
my_list = [1, 2, 3, 4]
iterator = iter(my_list) # Get an iterator from the list
# Access elements using next()
first_element = next(iterator) # 1
second_element = next(iterator) # 2
# Custom iterator class
class MyIterator:
    def __init__(self, max_value):
        self.max_value = max_value
        self.current = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.current < self.max_value:
            result = self.current
            self.current += 1
            return result
        else:
            raise StopIteration
# Using the custom iterator
for num in MyIterator(5):
    print(num) # Outputs: 0, 1, 2, 3, 4
- An iterator is an object with __iter__() (returns the iterator) and __next__() (returns the next item) methods.
- Use iter() to get an iterator from an iterable like a list, tuple, or string.
- Use next() to get the next item from an iterator; StopIteration is raised when no more items are available.
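What happens at the end of an iterator can be sketched as follows: StopIteration is raised, next() accepts an optional default, and an exhausted iterator does not restart. Variable names are illustrative.

```python
iterator = iter([1, 2])
consumed = [next(iterator), next(iterator)]   # [1, 2]

# A third call has nothing left to return
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True

# Supplying a default avoids the exception
fallback = next(iterator, "no more items")

# Iterators are single-pass: a second sweep yields nothing
letters = iter("ab")
first_pass = list(letters)    # ['a', 'b']
second_pass = list(letters)   # []
```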
Generator Functions and Expressions
Generators are functions that yield items one at a time instead of returning them all at once. They are memory-efficient since they generate values on demand.
# Generator function using yield
def my_generator():
    yield 1
    yield 2
    yield 3
# Using the generator
for value in my_generator():
    print(value) # Outputs: 1, 2, 3
# Infinite generator
def infinite_counter():
    num = 0
    while True:
        yield num
        num += 1
# Access the first 5 values from the infinite generator
counter = infinite_counter()
for _ in range(5):
    print(next(counter)) # Outputs: 0, 1, 2, 3, 4
- Generator functions use the yield keyword instead of return to produce values one at a time.
- Generators maintain their state between yields, allowing the computation to be resumed.
- Generators are memory-efficient as they only produce one value at a time, unlike lists that store all values in memory.
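The heading also mentions generator expressions, which the examples above do not show. A minimal sketch: they look like list comprehensions with parentheses, and they produce values lazily.

```python
# A list comprehension builds every value up front
squares_list = [x**2 for x in range(5)]   # [0, 1, 4, 9, 16]

# A generator expression produces values on demand
squares_gen = (x**2 for x in range(5))
total = sum(squares_gen)                  # 30

# Once consumed, the generator is empty
leftover = list(squares_gen)              # []

# When passed as the only argument, the extra parentheses can be dropped
any_large = any(x > 10 for x in squares_list)   # True
```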
yield Keyword
The yield keyword is used in generator functions to yield a value and pause execution until the next call to next().
# Generator using yield
def countdown(n):
    while n > 0:
        yield n
        n -= 1
# Using the generator
for value in countdown(5):
    print(value) # Outputs: 5, 4, 3, 2, 1
# Generator that generates Fibonacci numbers
def fibonacci(limit):
    a, b = 0, 1
    while a < limit:
        yield a
        a, b = b, a + b
# Print Fibonacci numbers less than 20
for num in fibonacci(20):
    print(num) # Outputs: 0, 1, 1, 2, 3, 5, 8, 13
- yield pauses the generator function and returns a value to the caller.
- When next() is called, execution resumes from the last yield point.
- Generators keep the state of their local variables between yield statements, unlike regular functions, which start fresh on each call.
Object-Oriented Programming (OOP)
Classes and Objects
Classes define blueprints for creating objects. An object is an instance of a class with specific properties and behavior.
# Define a class
class Dog:
    def __init__(self, name, breed):
        self.name = name
        self.breed = breed
# Create an object (instance) of the class
dog1 = Dog("Buddy", "Golden Retriever")
dog2 = Dog("Max", "German Shepherd")
# Access object attributes
print(dog1.name) # Buddy
print(dog2.breed) # German Shepherd
- Classes are defined using the class keyword.
- Objects are instances of a class, and they have attributes and methods.
- Use object.attribute to access an attribute of an object.
Methods and Attributes
Attributes store object data, while methods define actions that objects can perform.
# Class with attributes and methods
class Dog:
    def __init__(self, name, breed):
        self.name = name
        self.breed = breed
    def bark(self):
        return f"{self.name} says Woof!"
dog = Dog("Buddy", "Golden Retriever")
print(dog.bark()) # Buddy says Woof!
- Attributes store data related to an object (e.g., name and breed).
- Methods are functions defined in a class that operate on the object's data.
__init__ Method
The __init__ method is a constructor that initializes the object's attributes when it is created.
# __init__ method initializes attributes
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
person = Person("Alice", 30)
print(person.name) # Alice
print(person.age) # 30
- __init__ is called automatically when an object is created.
- It initializes the attributes of the object.
- Parameters passed during object creation are assigned to the object's attributes.
Inheritance and Polymorphism
Inheritance allows a class to inherit attributes and methods from another class. Polymorphism allows objects of different classes to be treated as objects of a common superclass.
# Parent (Base) class
class Animal:
    def __init__(self, name):
        self.name = name
    def speak(self):
        return "I am an animal"
# Child (Derived) class
class Dog(Animal):
    def speak(self):
        return f"{self.name} says Woof!"
# Child (Derived) class
class Cat(Animal):
    def speak(self):
        return f"{self.name} says Meow!"
dog = Dog("Buddy")
cat = Cat("Whiskers")
print(dog.speak()) # Buddy says Woof!
print(cat.speak()) # Whiskers says Meow!
- Inheritance allows one class to inherit the properties and methods of another class.
- Polymorphism allows objects of different classes to be used interchangeably.
- Child classes can override parent methods (method overriding).
Special Methods
Special methods (also called "magic" or "dunder" methods) allow classes to define behavior for built-in Python operations.
# Special methods for string representation
class Book:
    def __init__(self, title, author, pages):
        self.title = title
        self.author = author
        self.pages = pages
    def __str__(self):
        return f"'{self.title}' by {self.author}"
    def __len__(self):
        return self.pages
book = Book("1984", "George Orwell", 328)
print(str(book)) # '1984' by George Orwell
print(len(book)) # 328
- __str__() returns a string representation of the object (used by print()).
- __repr__() returns an unambiguous representation of the object (used in debugging).
- __len__() allows the object to be used with len() to get its "length".
- Other special methods include __add__(), __eq__(), and __getitem__().
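The special methods named above but not shown in the Book example can be sketched as follows. The Vector class is a hypothetical illustration, assuming simple two-component vectors; it is not taken from the original page.

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Unambiguous form, shown in the REPL and in debuggers
        return f"Vector({self.x}, {self.y})"

    def __eq__(self, other):
        # Called for the == operator
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __add__(self, other):
        # Called for the + operator
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.x + other.x, self.y + other.y)

    def __getitem__(self, index):
        # Called for square-bracket indexing
        return (self.x, self.y)[index]


total = Vector(1, 2) + Vector(3, 4)
print(repr(total))            # Vector(4, 6)
print(total == Vector(4, 6))  # True
print(total[0])               # 4
```

Returning NotImplemented for unsupported operand types lets Python try the other operand or raise a TypeError, rather than silently producing a wrong result.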
Encapsulation and Property Decorators
Encapsulation hides internal details of an object from the outside. Property decorators (@property) provide controlled access to attributes.
# Encapsulation using private attributes and property decorators
class Account:
    def __init__(self, balance):
        self.__balance = balance # Private (name-mangled) attribute
    @property
    def balance(self):
        return self.__balance
    @balance.setter
    def balance(self, amount):
        if amount >= 0:
            self.__balance = amount
        else:
            raise ValueError("Balance cannot be negative")
account = Account(1000)
print(account.balance) # 1000
account.balance = 1200 # Update balance
print(account.balance) # 1200
# The following line raises an exception
# account.balance = -500 # ValueError: Balance cannot be negative
- Private attributes are prefixed with a double underscore (e.g., __balance). Python renames them internally (name mangling), which discourages outside access but does not strictly prevent it.
- The @property decorator defines a getter for an attribute.
- The @property_name.setter decorator defines a setter to control how attributes are updated.
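To make the privacy rules concrete, here is a small sketch of what name mangling does. It assumes a cut-down Account class with a getter only; the helper variable names are hypothetical.

```python
class Account:
    def __init__(self, balance):
        self.__balance = balance  # stored on the instance as _Account__balance

    @property
    def balance(self):            # getter only, so the property is read-only
        return self.__balance


account = Account(1000)

# Outside the class, the double-underscore name does not resolve
try:
    account.__balance
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

# The mangled name is still reachable: privacy is a convention, not a lock
mangled_value = account._Account__balance

# Assigning to a property that has no setter raises AttributeError
try:
    account.balance = 5
    assignment_failed = False
except AttributeError:
    assignment_failed = True

print(direct_access_failed, mangled_value, assignment_failed)  # True 1000 True
```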
Built-in Functions and Utilities
Mathematical Functions
Python provides several built-in functions for mathematical operations.
# Mathematical functions
absolute_value = abs(-10) # 10
power_value = pow(2, 3) # 2^3 = 8
rounded_value = round(3.14159, 2) # 3.14 (rounds to 2 decimal places)
- abs(x): Returns the absolute value of x.
- pow(x, y): Returns x raised to the power y (same as x**y).
- round(x, n): Rounds x to n decimal places.
Iterable Functions
These functions operate on iterables such as lists, tuples, and sets.
# Iterable functions
my_list = [10, 20, 30, 40]
length = len(my_list) # 4
maximum = max(my_list) # 40
minimum = min(my_list) # 10
sorted_list = sorted(my_list, reverse=True) # [40, 30, 20, 10]
# Using enumerate to get index and value
for index, value in enumerate(my_list):
    print(f"Index: {index}, Value: {value}")
- len(x): Returns the number of elements in x.
- max(x): Returns the largest item in the iterable x.
- min(x): Returns the smallest item in the iterable x.
- sorted(x, reverse=True): Returns a new list with the items of x sorted in descending order.
- enumerate(x): Yields (index, value) pairs while iterating over x.
Type-Related Functions
These functions allow you to check and manipulate the type of objects.
# Type-related functions
variable = 42
variable_type = type(variable) # <class 'int'>
is_instance = isinstance(variable, int) # True
variable_id = id(variable) # Unique identifier for the object in memory
- type(x): Returns the type of the object x.
- isinstance(x, type): Checks if x is an instance of the specified type.
- id(x): Returns an integer that is unique to the object for its lifetime (in CPython, its memory address).
I/O Functions
Input/Output functions are used to take input from the user and display output on the screen.
# I/O functions
# Taking input from the user
name = input("What is your name? ")
# Printing a message
print(f"Hello, {name}!")
- input(prompt): Takes input from the user as a string and returns it.
- print(x): Prints the string representation of x to the console.
Conversion Functions
These functions convert values from one type to another.
# Conversion functions
num_str = "123"
int_value = int(num_str) # 123 (string to integer)
float_value = float(num_str) # 123.0 (string to float)
str_value = str(456) # '456' (integer to string)
list_value = list("hello") # ['h', 'e', 'l', 'l', 'o'] (string to list)
tuple_value = tuple([1, 2, 3]) # (1, 2, 3) (list to tuple)
set_value = set([1, 2, 2, 3]) # {1, 2, 3} (list to set, removes duplicates)
- int(x): Converts x to an integer.
- float(x): Converts x to a float.
- str(x): Converts x to a string.
- list(x): Converts x to a list.
- tuple(x): Converts x to a tuple.
- set(x): Converts x to a set (removes duplicates).
Decorators and Context Managers
Creating and Applying Decorators
Decorators are functions that modify the behavior of other functions or methods. They are applied using the @decorator_name syntax.
# Simple decorator to log function calls
def logger(func):
    def wrapper(*args, **kwargs):
        print(f"Calling function: {func.__name__}")
        result = func(*args, **kwargs)
        print(f"Function {func.__name__} finished execution")
        return result
    return wrapper
# Applying the decorator using @
@logger
def say_hello(name):
    print(f"Hello, {name}!")
say_hello("Alice")
- Decorators modify the behavior of functions or methods.
- They are defined as functions that return a "wrapper" function.
- Apply a decorator using @decorator_name before the function definition.
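One practical detail the logger example leaves out: a plain wrapper replaces the decorated function's name and docstring with its own. The sketch below uses functools.wraps from the standard library to preserve them; the add function is a hypothetical example.

```python
import functools


def logger(func):
    @functools.wraps(func)  # copy __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        print(f"Calling function: {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


@logger
def add(a, b):
    """Return the sum of a and b."""
    return a + b


result = add(2, 3)
print(result)        # 5
print(add.__name__)  # add (would be 'wrapper' without functools.wraps)
print(add.__doc__)   # Return the sum of a and b.
```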
Built-in Decorators
Python provides several built-in decorators for commonly used patterns, such as @staticmethod, @classmethod, and @property.
@staticmethod
Defines a method that does not require access to the instance or class.
# Static method example
class Math:
    @staticmethod
    def add(a, b):
        return a + b
result = Math.add(5, 3) # 8
- @staticmethod allows the method to be called on the class itself, without creating an instance.
@classmethod
Defines a method that takes the class itself (cls) as the first argument instead of the instance.
# Class method example
class Person:
    population = 0
    def __init__(self, name):
        self.name = name
        Person.population += 1
    @classmethod
    def get_population(cls):
        return cls.population
person1 = Person("Alice")
person2 = Person("Bob")
total_population = Person.get_population() # 2
- @classmethod allows access to class variables and methods from within the method.
@property
Converts a method into a read-only attribute using the @property decorator.
# Property decorator example
class Circle:
    def __init__(self, radius):
        self._radius = radius
    @property
    def area(self):
        return 3.14159 * self._radius ** 2
circle = Circle(5)
circle_area = circle.area # Access as an attribute, not a method
- @property allows you to access methods like attributes without calling them as functions.
- Without a matching setter, the attribute is read-only.
Creating Custom Context Managers
Context managers are used to manage resources like file handling. They ensure that setup and cleanup actions are always performed.
Using __enter__ and __exit__ Methods
# Custom context manager using __enter__ and __exit__
class MyContextManager:
    def __enter__(self):
        print("Entering context")
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        print("Exiting context")
# Using the custom context manager
with MyContextManager() as context:
    print("Inside the context")
- __enter__ runs when entering the with block. Its return value (here, the context manager object itself) is bound to the name after as.
- __exit__ runs when exiting the with block, even if an exception occurs.
- It takes three arguments: exc_type, exc_value, and traceback, which describe the exception, if any. They are all None when the block exits normally.
Creating Context Managers Using contextlib
Another way to create context managers is by using the @contextmanager decorator from the contextlib module.
from contextlib import contextmanager
@contextmanager
def my_context():
    print("Entering context")
    yield
    print("Exiting context")
# Using the context manager
with my_context():
    print("Inside the context")
- The @contextmanager decorator makes it easier to create context managers using yield.
- Code before yield runs when entering the context, and code after yield runs when exiting.
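One caveat worth illustrating: if the body of the with block raises, the code after yield is skipped unless the yield is wrapped in try/finally. The sketch below shows that pattern; the tracked function and the events list are hypothetical names used only to record what ran.

```python
from contextlib import contextmanager

events = []  # hypothetical log of what happened, in order


@contextmanager
def tracked(name):
    events.append(f"enter {name}")
    try:
        yield name  # the yielded value is bound by "as"
    finally:
        # Runs on normal exit and when the block raises
        events.append(f"exit {name}")


try:
    with tracked("job") as label:
        events.append(f"working on {label}")
        raise RuntimeError("something failed")
except RuntimeError:
    events.append("error handled")

print(events)
# ['enter job', 'working on job', 'exit job', 'error handled']
```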
Concurrency and Parallelism
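Introduction to threading and multiprocessing
Concurrency and parallelism let a program make progress on several tasks at once, which can improve performance. Python supports concurrency with threading and parallelism with multiprocessing. In standard CPython builds, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so threads mainly help with I/O-bound work, while CPU-bound work usually benefits more from multiprocessing.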
Threading
The threading module allows you to run multiple threads concurrently. Threads run in the same memory space and share data, making them lightweight but requiring synchronization.
import threading
import time
def worker():
    print("Starting thread")
    time.sleep(2)
    print("Finished thread")
# Create and start two threads
thread1 = threading.Thread(target=worker)
thread2 = threading.Thread(target=worker)
thread1.start()
thread2.start()
thread1.join() # Wait for thread1 to finish
thread2.join() # Wait for thread2 to finish
- Threads run concurrently, sharing the same memory space.
- Use threading.Thread(target=function_name) to create a thread that runs function_name.
- thread.start() begins execution of the thread, and thread.join() waits for it to complete.
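Because threads share memory, updates to shared data need synchronization. The following is a minimal sketch using threading.Lock; the counter variable and the increment function are hypothetical names chosen for illustration.

```python
import threading

counter = 0
lock = threading.Lock()


def increment(times):
    global counter
    for _ in range(times):
        with lock:        # only one thread at a time may update counter
            counter += 1


# Four threads each add 10,000 to the shared counter
threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)  # 40000
```

Without the lock, the read-modify-write in counter += 1 can interleave between threads and lose updates.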
Multiprocessing
The multiprocessing module allows you to run multiple processes in parallel. Each process runs in its own memory space, so data is not shared directly.
import multiprocessing
import time
def worker():
    print("Starting process")
    time.sleep(2)
    print("Finished process")
# The __main__ guard is needed on platforms that spawn fresh interpreters (e.g., Windows and macOS)
if __name__ == "__main__":
    # Create and start two processes
    process1 = multiprocessing.Process(target=worker)
    process2 = multiprocessing.Process(target=worker)
    process1.start()
    process2.start()
    process1.join() # Wait for process1 to finish
    process2.join() # Wait for process2 to finish
- Processes run in separate memory spaces, making them safer but more resource-intensive than threads.
- Use multiprocessing.Process(target=function_name) to create a process that runs function_name.
- process.start() begins execution of the process, and process.join() waits for it to complete.
Async Programming with asyncio
Asynchronous programming allows for non-blocking execution. The asyncio module enables running asynchronous tasks using the async and await keywords.
Asynchronous Functions
Asynchronous functions use async def to define a function that can be paused using await.
import asyncio
async def greet(name):
    print(f"Hello, {name}!")
    await asyncio.sleep(2) # Simulate a network delay
    print(f"Goodbye, {name}!")
# Run the asynchronous function
asyncio.run(greet("Alice"))
- Use async def to define an asynchronous function.
- Use await to pause execution until the awaited task is complete.
- Use asyncio.run() to run an asynchronous function from synchronous code.
Running Multiple Tasks Concurrently
To run multiple asynchronous tasks at the same time, use asyncio.gather().
import asyncio
async def task1():
    print("Task 1 starting")
    await asyncio.sleep(2)
    print("Task 1 finished")
async def task2():
    print("Task 2 starting")
    await asyncio.sleep(1)
    print("Task 2 finished")
# Run multiple tasks concurrently
async def main():
    await asyncio.gather(task1(), task2())
asyncio.run(main())
- Use asyncio.gather() to run multiple asynchronous tasks concurrently.
- Tasks are started together, and the program waits for all of them to finish.
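asyncio.gather() also collects return values. As a small sketch (the fetch coroutine and its labels are hypothetical), the results come back in the order the awaitables were passed in, not the order in which they finished:

```python
import asyncio


async def fetch(label, delay):
    await asyncio.sleep(delay)  # stand-in for a network call
    return f"{label} done"


async def main():
    # "fast" finishes first, but results follow argument order
    return await asyncio.gather(fetch("slow", 0.2), fetch("fast", 0.1))


results = asyncio.run(main())
print(results)  # ['slow done', 'fast done']
```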
Using Async Context Managers
Async context managers are used to manage asynchronous resources, such as network connections.
import asyncio
class AsyncContextManager:
    async def __aenter__(self):
        print("Entering async context")
        return self
    async def __aexit__(self, exc_type, exc_value, traceback):
        print("Exiting async context")
# Using the async context manager
async def main():
    async with AsyncContextManager():
        print("Inside the context")
asyncio.run(main())
- Use __aenter__() and __aexit__() to create an async context manager.
- Async context managers are useful for managing network connections, files, and other asynchronous resources.
Exploring Advanced Features
Working with collections Module
The collections module provides specialized data structures like Counter, defaultdict, OrderedDict, and deque for enhanced data manipulation.
Counter
from collections import Counter
# Count the occurrences of each character in a string
counter = Counter("hello world")
print(counter) # Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})
- Counter counts the occurrences of elements in an iterable.
defaultdict
from collections import defaultdict
# Create a defaultdict with a default type of list
dd = defaultdict(list)
dd['fruits'].append('apple')
dd['fruits'].append('banana')
print(dd) # defaultdict(<class 'list'>, {'fruits': ['apple', 'banana']})
- defaultdict provides default values for missing keys, avoiding KeyError.
OrderedDict
from collections import OrderedDict
# Create an ordered dictionary
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
print(od) # OrderedDict([('a', 1), ('b', 2), ('c', 3)])
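- OrderedDict maintains the order in which items are added. Regular dictionaries also preserve insertion order since Python 3.7; OrderedDict adds order-aware features such as move_to_end().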
deque
from collections import deque
# Create a deque and append elements
dq = deque([1, 2, 3])
dq.appendleft(0)
dq.append(4)
print(dq) # deque([0, 1, 2, 3, 4])
# Pop elements from both ends
dq.pop() # 4
dq.popleft() # 0
- deque is a double-ended queue that allows fast appends and pops from both ends.
itertools for Functional Programming
The itertools module provides iterator-building functions for efficient looping.
import itertools
# Infinite counter
counter = itertools.count(start=1, step=2)
print(next(counter)) # 1
print(next(counter)) # 3
# Cartesian product
product = list(itertools.product([1, 2], ['A', 'B']))
print(product) # [(1, 'A'), (1, 'B'), (2, 'A'), (2, 'B')]
# Group elements by condition
grouped = itertools.groupby('AAAABBBCCDAA')
for key, group in grouped:
    print(key, list(group)) # Groups consecutive identical elements
- count(start, step): Infinite counter starting from start.
- product(): Cartesian product of input iterables.
- groupby(iterable): Groups consecutive identical elements together.
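Because groupby() only merges consecutive items, grouping by a computed key usually requires sorting by that key first. A minimal sketch, assuming a hypothetical list of words grouped by first letter:

```python
import itertools

words = ["apple", "avocado", "banana", "blueberry", "cherry", "apricot"]


def first_letter(word):
    return word[0]


# Without sorting, "apricot" ends up in a second, separate "a" group
unsorted_groups = [
    (key, list(group))
    for key, group in itertools.groupby(words, key=first_letter)
]

# Sorting by the same key first yields one group per letter
sorted_groups = {
    key: list(group)
    for key, group in itertools.groupby(sorted(words, key=first_letter), key=first_letter)
}

print(len(unsorted_groups))  # 4
print(sorted_groups)
# {'a': ['apple', 'avocado', 'apricot'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```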
functools Utilities
The functools module provides higher-order functions that act on other functions.
lru_cache
from functools import lru_cache
@lru_cache(maxsize=100)
def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)
print(factorial(10)) # 3628800
- @lru_cache caches function results to avoid redundant calculations.
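The benefit of caching is easier to see on a function that repeats work. The sketch below applies lru_cache to a naive recursive Fibonacci function (a hypothetical example, not from the original page) and inspects the cache with cache_info():

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded cache
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


value = fib(30)
info = fib.cache_info()

print(value)        # 832040
print(info.misses)  # 31: each argument from 0 to 30 is computed once
print(info.hits)    # 28: repeated sub-calls are answered from the cache
```

Without the cache, the same call would recompute the smaller values more than a million times in total.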
partial
from functools import partial
# Create a partial function
def multiply(x, y):
    return x * y
double = partial(multiply, 2)
print(double(5)) # 10
- partial(func, *args, **kwargs) creates a new function with pre-filled arguments.
reduce
from functools import reduce
# Reduce a list to a single value
numbers = [1, 2, 3, 4]
product = reduce(lambda x, y: x * y, numbers)
print(product) # 24
- reduce(func, iterable) applies a function cumulatively to the items in the iterable.
Handling JSON and Other Formats
json Module
import json
# Convert Python object to JSON string
data = {'name': 'Alice', 'age': 25}
json_string = json.dumps(data)
print(json_string) # {"name": "Alice", "age": 25}
# Convert JSON string to Python object
parsed_data = json.loads(json_string)
print(parsed_data) # {'name': 'Alice', 'age': 25}
# Write JSON to a file
with open('data.json', 'w') as file:
    json.dump(data, file)
# Read JSON from a file
with open('data.json', 'r') as file:
    loaded_data = json.load(file)
- json.dumps(): Convert a Python object to a JSON string.
- json.loads(): Parse a JSON string to a Python object.
- json.dump(): Write a Python object to a file as JSON.
- json.load(): Load JSON data from a file into a Python object.
pickle for Serialization
import pickle
# Serialize an object and save to a file
data = {'name': 'Bob', 'age': 30}
with open('data.pkl', 'wb') as file:
    pickle.dump(data, file)
# Deserialize the object from the file
with open('data.pkl', 'rb') as file:
    loaded_data = pickle.load(file)
print(loaded_data) # {'name': 'Bob', 'age': 30}
- pickle.dump(obj, file): Serialize an object and save it to a binary file.
- pickle.load(file): Load a pickled object from a file.
- Pickle serializes and deserializes Python objects, but never unpickle data from an untrusted source: loading a pickle can execute arbitrary code.
Getting Started with IPython
Accessing Documentation (? and ??)
IPython allows you to access documentation for functions, classes, and objects using ? and ??.
# Access documentation for a function
len?
# Access source code along with the documentation
len??
- object?: Displays documentation for the object.
- object??: Displays documentation and source code (if available) for the object.
Exploring Modules with Tab Completion
IPython supports tab completion, allowing you to explore available attributes, methods, and variables.
# Press Tab after typing the module name and a dot (.)
import math
math. # Press Tab to see available attributes and methods
- Type part of a module name or variable and press Tab to view suggestions.
- Useful for discovering available functions, methods, and variables in a module.
Keyboard Shortcuts in IPython Shell
Navigation Shortcuts
- Ctrl + A: Move cursor to the beginning of the line.
- Ctrl + E: Move cursor to the end of the line.
- Ctrl + L: Clear the screen.
Text Entry Shortcuts
- Ctrl + K: Delete from cursor to the end of the line.
- Ctrl + U: Delete from cursor to the beginning of the line.
- Ctrl + W: Delete the word before the cursor.
Command History Shortcuts
- Up/Down Arrow: Navigate through command history.
- Ctrl + R: Search command history (reverse search).
- Ctrl + P: Previous command in history.
- Ctrl + N: Next command in history.
IPython Magic Commands
%paste and %cpaste
These commands allow you to paste and execute multiple lines of code.
# Paste code directly into the IPython shell
%paste # Pastes the content from the system clipboard
# Paste code interactively, line by line
%cpaste
# Paste the code, then press Ctrl + D to execute
- %paste: Pastes and executes clipboard content.
- %cpaste: Opens an interactive prompt to paste multi-line code.
%run
The %run command allows you to run Python scripts directly from the IPython shell.
# Run a Python script
%run my_script.py
- %run script.py: Executes a Python script as if it were run using python script.py.
- You can pass arguments to the script: %run script.py arg1 arg2.
%time and %timeit
These commands measure the execution time of a single statement or an entire block of code.
# Measure execution time of a single statement
%time sum([i for i in range(100000)])
# Measure execution time multiple times for better accuracy
%timeit sum([i for i in range(100000)])
- %time: Measures execution time of a single statement.
- %timeit: Repeats the execution to provide a more accurate measure of execution time.
Accessing Magic Commands (%magic and %lsmagic)
These commands provide information about all available magic commands.
# List all magic commands
%lsmagic
# View documentation for all magic commands
%magic
- %lsmagic: Lists all available magic commands.
- %magic: Displays detailed documentation for all magic commands.
Working with the Shell
Shell Commands in IPython
IPython allows you to run shell commands directly from the IPython shell. You can execute any command you would run in the terminal by prefixing it with an exclamation mark (!).
# List files in the current directory (Linux/Mac)
!ls
# List files in the current directory (Windows)
!dir
# Check the current working directory
!pwd
# Create a new directory
!mkdir my_new_directory
# Remove a file
!rm myfile.txt
- Prefix shell commands with ! to execute them in IPython.
- Common commands include !ls (list files) and !pwd (print working directory).
- Works similarly to how you run commands in a command-line shell (like Bash, Zsh, or Command Prompt).
Passing Values to and from the Shell
You can pass values from IPython to the shell and capture shell output into Python variables.
Passing Python Variables to the Shell
# Define a Python variable
filename = "example.txt"
# Use the variable in a shell command using {}
!echo "This is a test" > {filename}
- Use curly braces {} to insert Python variables into shell commands.
- For example, !echo "Hello" > {filename} creates a file with the name stored in filename.
Capturing Shell Output in Python Variables
# Capture output of shell command
files = !ls
print(files) # List of files as Python list
# Get the current working directory
current_directory = !pwd
print(current_directory[0]) # Print the first line of the output
- Use output = !command to capture the output of a shell command as a list of strings.
- Each line of the shell output becomes an element in the list.
Shell-Related Magic Commands
%alias
Use %alias to create custom shortcuts for shell commands.
# Create an alias
%alias ll ls -l
# Use the alias
ll
- %alias name command defines an alias for a shell command.
- For example, %alias ll ls -l creates a shortcut ll for ls -l.
%env
The %env command displays and modifies environment variables.
# View all environment variables
%env
# Set a new environment variable
%env MY_VAR=hello
# Access the environment variable in Python
import os
print(os.getenv('MY_VAR')) # hello
- %env displays all environment variables.
- Use %env VAR_NAME=value to set an environment variable.
- Environment variables can be accessed in Python using os.getenv().
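%sc (Shell Capture)
Use %sc to run shell commands and capture the output as a Python variable. The IPython documentation marks %sc as deprecated in favor of the variable = !command syntax.
# Run a shell command and capture the output as a list of lines (-l)
%sc -l files = ls
print(files) # ['file1.txt', 'file2.txt']
- %sc variable = command captures the output of a shell command in the variable (as a single string by default, or as a list of lines with -l).
- Similar to !command but stores the result directly in a variable.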
%cd (Change Directory)
The %cd command changes the current working directory.
# Change the current working directory
%cd /path/to/directory
# Change to the parent directory
%cd ..
- %cd /path/to/directory changes the current directory.
- Use %cd .. to move to the parent directory.
- Similar to the shell cd command.
%pushd and %popd
These commands allow you to navigate directories while remembering the previous location.
# Change directory and save the previous directory
%pushd /path/to/directory
# Return to the previous directory
%popd
- %pushd changes to a directory and stores the previous directory on a stack.
- %popd returns to the most recently stored directory.
%pwd
Prints the current working directory.
# Print the current directory
%pwd
- Displays the current directory path.
%who and %whos
These commands list the variables defined in the current namespace.
# List all variables
%who
# List all variables with more details
%whos
- %who: Lists the names of all user-defined variables.
- %whos: Displays a detailed view of each variable, including type and value.
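%reset
Clears all user-defined names from the namespace. (%clear, by contrast, only clears the terminal screen.)
# Clear all variables (asks for confirmation; use %reset -f to skip the prompt)
%reset
- Removes every name defined in the interactive namespace, including imported names. Built-in objects are not affected.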
Debugging and Profiling
Controlling Exceptions (%xmode)
IPython allows you to control how exceptions are displayed using the %xmode magic command. The main modes are Plain, Context, and Verbose (recent IPython versions also accept Minimal).
# Set the exception display mode to 'Verbose' (more detailed tracebacks)
%xmode Verbose
# Set the exception display mode to 'Context' (default view)
%xmode Context
# Set the exception display mode to 'Plain' (minimal traceback)
%xmode Plain
- Plain: Compact traceback without surrounding source context.
- Context: Default traceback view with context around each call.
- Verbose: Full traceback with variable values at each step.
Debugging Tracebacks
IPython provides an interactive debugger that allows you to step through the traceback of an error and inspect variables.
# Trigger an error and start the debugger
def divide(a, b):
    return a / b
divide(10, 0) # This will raise a ZeroDivisionError
Once an error occurs, you can activate the debugger using %debug or automatically enter the debugger on exceptions with %pdb on.
# Enable automatic debugging on errors
%pdb on
# Trigger an error
divide(10, 0) # The debugger will activate automatically
- %debug: Starts an interactive debugging session after an error occurs.
- %pdb on: Automatically enters the debugger when an exception occurs.
- While in the debugger, you can use commands like n (next), c (continue), and q (quit).
Timing Code Execution (%time and %timeit)
IPython provides tools to measure the execution time of code. Use %time for a single execution and %timeit for multiple executions to get an average time.
%time
# Measure execution time of a single statement
%time sum([i for i in range(1000000)])
- Use %time to measure the time it takes to execute a single line of code.
- It displays wall time and CPU time.
%timeit
# Measure execution time with multiple iterations for better accuracy
%timeit sum([i for i in range(1000000)])
- Use %timeit to measure execution time by running the statement multiple times.
- Provides more reliable measurements, since averaging over many runs reduces noise from background processes.
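Outside IPython, the standard library's timeit module offers a comparable measurement. This is a minimal sketch rather than an exact equivalent of %timeit; the total_squares function and the run counts are arbitrary choices for illustration.

```python
import timeit


def total_squares(n):
    return sum(i * i for i in range(n))


# Three timing rounds, each calling the function 100 times
runs = timeit.repeat(lambda: total_squares(1_000), number=100, repeat=3)

# The fastest round is usually the least disturbed by background noise
best_per_call = min(runs) / 100
print(f"best of 3: {best_per_call * 1e6:.1f} microseconds per call")
```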
Profiling Scripts (%prun, %lprun, %memit, %mprun)
%prun (Profile the execution time of a program)
Use %prun to get detailed statistics about the execution time of each function call in your script.
# Profile a function
def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)
%prun factorial(10)
- Use %prun to profile the execution of an entire function or script.
- It shows the number of calls, total time, and time per call for each function.
%lprun (Line-by-line profiling)
Line-by-line profiling allows you to see how much time is spent on each line of a function. To use %lprun, you need to install the line_profiler package and load its IPython extension.
# Example of line-by-line profiling (requires line_profiler)
%load_ext line_profiler
from time import sleep
def example_function():
    sleep(1)
    result = [i ** 2 for i in range(10000)]
    sleep(2)
    return sum(result)
%lprun -f example_function example_function()
- Use %lprun -f function_name to profile individual lines of a function.
- Displays the time spent on each line of code in the function.
- Requires the line_profiler package (install via pip install line_profiler).
%memit (Memory usage measurement)
Measure the memory usage of a function or statement using %memit. This requires the memory_profiler package.
# Example of measuring memory usage
%load_ext memory_profiler
# Measure the memory usage of this list comprehension
%memit [i ** 2 for i in range(100000)]
- Use %memit to measure the peak memory usage of a statement.
- Requires the memory_profiler package (install via pip install memory_profiler).
%mprun (Memory profiling line-by-line)
Profile memory usage line-by-line in a function. Similar to %lprun, but for memory instead of time. The example below uses the @profile decorator from memory_profiler, which prints a line-by-line report when the script is run.
# Example of memory profiling line-by-line
from memory_profiler import profile
@profile
def example_function():
    x = [i for i in range(100000)]
    y = [i ** 2 for i in x]
    del x
    return y
example_function()
- In IPython, use %mprun -f function_name function_name() (after %load_ext memory_profiler) to measure memory usage of each line in a function.
- Requires the memory_profiler package (install via pip install memory_profiler).
- The profiled function must be defined in a file (a script or an imported module) rather than typed interactively, so that its source lines can be located.
Introduction to NumPy
Understanding Data Types in Python
Python supports several data types, but NumPy introduces more efficient data types for numerical computation. NumPy arrays have fixed types, allowing for faster computations and less memory usage.
- Common NumPy data types include int32, float64, and bool.
- NumPy arrays require all elements to be of the same data type.
Creating and Manipulating Arrays
Arrays from Python Lists
import numpy as np
# Create an array from a Python list
array_from_list = np.array([1, 2, 3, 4, 5])
Arrays from Scratch
# Create arrays using NumPy functions
zeros_array = np.zeros((2, 3)) # 2x3 array of zeros
ones_array = np.ones((2, 3)) # 2x3 array of ones
random_array = np.random.rand(2, 3) # 2x3 array of random numbers
NumPy Standard Data Types
# Specify the data type of an array
int_array = np.array([1, 2, 3], dtype=np.int32)
float_array = np.array([1.2, 3.4, 5.6], dtype=np.float64)
- Use np.array() to create arrays from Python lists.
- Functions like np.zeros() and np.ones() create arrays from scratch.
- Specify the data type using the dtype parameter.
Basics of NumPy Arrays
Attributes, Indexing, and Slicing
# Create an array
array = np.array([[1, 2, 3], [4, 5, 6]])
# Array attributes
shape = array.shape # (2, 3)
size = array.size # 6 (total number of elements)
dtype = array.dtype # Data type of the elements
# Indexing and slicing
element = array[1, 2] # Access element at row 1, column 2
sub_array = array[:, 1] # Get all rows for column 1
Reshaping Arrays
# Reshape an array
array = np.array([1, 2, 3, 4, 5, 6])
reshaped = array.reshape((2, 3)) # Reshape to 2x3 array
Array Concatenation and Splitting
# Concatenate arrays
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
concatenated = np.concatenate([array1, array2]) # [1, 2, 3, 4, 5, 6]
# Split an array
split_array = np.split(array1, 3) # Split into 3 parts
- Access array elements using array[row, col].
- Use array.reshape() to change the shape of an array.
- Concatenate arrays with np.concatenate() and split them with np.split().
Computation on Arrays
Universal Functions (UFuncs)
# Perform element-wise operations
array = np.array([1, 2, 3])
squared = np.square(array) # [1, 4, 9]
Broadcasting
# Broadcast a scalar to an array
array = np.array([1, 2, 3])
result = array + 5 # [6, 7, 8]
- UFuncs perform element-wise computations (e.g., np.square()).
- Broadcasting allows operations on arrays of different but compatible shapes by stretching dimensions of size 1 (or missing dimensions) to match.
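Broadcasting also applies between arrays, not only between an array and a scalar. The sketch below (with hypothetical variable names) adds a row and a column to a 2x3 matrix; NumPy compares shapes from the trailing dimension and stretches any dimension of size 1.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)   -> added to every row
column = np.array([[100], [200]])   # shape (2, 1) -> added to every column

plus_row = matrix + row
plus_column = matrix + column

print(plus_row)     # [[11 22 33], [14 25 36]]
print(plus_column)  # [[101 102 103], [204 205 206]]
```

Shapes that cannot be aligned this way, such as (2, 3) and (2,), raise a ValueError instead of broadcasting.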
Aggregations
Summing, Min, Max, and Averages
# Aggregate functions
array = np.array([1, 2, 3, 4, 5])
total = np.sum(array) # 15
minimum = np.min(array) # 1
maximum = np.max(array) # 5
average = np.mean(array) # 3.0
Boolean Masks and Comparisons
# Use boolean masks
array = np.array([1, 2, 3, 4, 5])
mask = array > 3 # [False, False, False, True, True]
filtered = array[mask] # [4, 5]
- Aggregation functions like np.sum() and np.mean() compute summary statistics.
- Boolean masks filter elements in an array based on a condition.
Advanced Indexing
Fancy Indexing
# Fancy indexing
array = np.array([10, 20, 30, 40, 50])
selected = array[[0, 2, 4]] # [10, 30, 50]
Sorting and Binning Data
# Sort an array
array = np.array([3, 1, 4, 1, 5, 9])
sorted_array = np.sort(array) # [1, 1, 3, 4, 5, 9]
- Fancy indexing allows selecting multiple elements using an array of indices.
- Sort arrays using np.sort().
Structured Arrays
Structured arrays allow you to define arrays with multiple fields, similar to a database table.
# Create a structured array
data = np.array([(25, 'Alice', 55.0), (30, 'Bob', 60.5)],
                dtype=[('age', 'i4'), ('name', 'U10'), ('weight', 'f4')])
# Access data by field name
ages = data['age'] # [25, 30]
- Structured arrays are arrays with fields that have names and data types.
- Access fields using array['field_name'].
Data Manipulation with Pandas
Introduction to Pandas Objects
Series
A Series is a one-dimensional array-like object with labeled indices.
import pandas as pd
# Create a Series from a list
series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(series)
- Series are similar to NumPy arrays but with labels for each element.
- Access elements using labels, e.g., series['a'].
DataFrame
A DataFrame is a two-dimensional, table-like data structure with labeled rows and columns.
# Create a DataFrame from a dictionary
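df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})
print(df)
- DataFrames are like Excel spreadsheets or SQL tables.
- Access columns using df['column_name'] or df.column_name. Attribute access works only when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.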
Index
An Index is an immutable array of row or column labels in a Series or DataFrame.
# Access DataFrame index
index = df.index
- The Index is used to label rows or columns in a Series or DataFrame.
Data Selection and Indexing
Selection in Series and DataFrames
# Select elements from a Series
element = series['a']
# Select a row from a DataFrame
row = df.loc[0] # Access by label
row = df.iloc[0] # Access by integer position
# Select multiple columns
columns = df[['Name', 'Age']]
- Use loc to select rows by label and iloc to select by position.
- Use df['column_name'] or df[['col1', 'col2']] to select columns.
Hierarchical Indexing
# Create a DataFrame with a hierarchical index
multi_index_df = pd.DataFrame(
{'A': ['foo', 'foo', 'bar', 'bar'],
'B': ['one', 'two', 'one', 'two'],
'C': [1, 2, 3, 4]}
).set_index(['A', 'B'])
print(multi_index_df)
- Hierarchical indices allow for multi-level indexing in rows or columns.
- Access elements using a tuple of indices, e.g., multi_index_df.loc[('foo', 'one')].
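Beyond full-tuple lookup, a hierarchical index supports partial keys, cross-sections on an inner level, and reshaping. A short sketch over the same data, with the frame renamed mi here for brevity:

```python
import pandas as pd

mi = pd.DataFrame(
    {'A': ['foo', 'foo', 'bar', 'bar'],
     'B': ['one', 'two', 'one', 'two'],
     'C': [1, 2, 3, 4]}
).set_index(['A', 'B'])

single = mi.loc[('foo', 'one'), 'C']   # full key: a single value
foo_rows = mi.loc['foo']               # partial key: all rows under 'foo'
ones = mi.xs('one', level='B')         # cross-section on the inner level
wide = mi['C'].unstack('B')            # move level B into the columns
```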
Handling Missing Data
Identifying and Filling Null Values
# Identify missing values
missing_values = df.isnull()
# Fill missing values
filled_df = df.fillna(0)
- Use isnull() to detect missing values.
- Use fillna() to fill missing values with a specified value.
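The df defined earlier has no gaps, so the calls above return nothing interesting on it. The sketch below uses a hypothetical frame named scores that does contain nulls, counts them per column, fills each column with a value suited to its type, and shows dropna() as the alternative.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, np.nan, 35],
    'City': ['New York', None, 'Chicago'],
})

# Count the nulls in each column
missing_per_column = scores.isnull().sum()

# A dict passed to fillna() applies a different fill value per column
filled = scores.fillna({'Age': scores['Age'].mean(), 'City': 'Unknown'})

# Alternatively, drop every row that contains a null
complete_rows = scores.dropna()
```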
Combining Datasets
Concatenation
# Concatenate two DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
concatenated_df = pd.concat([df1, df2])
- Use pd.concat() to concatenate multiple DataFrames along rows or columns.
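Row-wise concatenation keeps each frame's original index, so labels can repeat. The sketch below shows that, the ignore_index option that renumbers the rows, and column-wise concatenation with axis=1 (the '_2' suffix is an arbitrary choice to keep column names distinct).

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

stacked = pd.concat([df1, df2])                         # index: 0, 1, 0, 1
renumbered = pd.concat([df1, df2], ignore_index=True)   # index: 0, 1, 2, 3

# axis=1 places the frames side by side, aligning on the row index
side_by_side = pd.concat([df1, df2.add_suffix('_2')], axis=1)
```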
Merge and Join Operations
# Merge two DataFrames
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})
merged_df = pd.merge(df1, df2, on='key')
- Use pd.merge() to merge DataFrames on a common key.
- Use df.join() to join DataFrames by their indices.
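By default pd.merge() performs an inner join, so keys found in only one frame are dropped. The how parameter changes that, and indicator=True records where each row came from. A sketch on the same two frames, renamed left and right here for clarity:

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})

inner = pd.merge(left, right, on='key')                   # default how='inner'
left_join = pd.merge(left, right, on='key', how='left')   # keep all left keys
outer = pd.merge(left, right, on='key', how='outer', indicator=True)

# join() aligns on the index, so move the key into the index first
joined = left.set_index('key').join(right.set_index('key'), how='inner')
```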
Aggregation and Grouping
GroupBy Operations
# Group by and aggregate
grouped = df.groupby('City')['Age'].mean()
- Use groupby() to group by a column and perform aggregations.
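A group can be summarized with several aggregations at once, and transform() broadcasts a group-level result back onto the original rows. The sketch below uses a made-up frame with two people per city so that each group has more than one member.

```python
import pandas as pd

people = pd.DataFrame({
    'City': ['New York', 'Chicago', 'New York', 'Chicago'],
    'Age': [25, 30, 35, 40],
})

# Named aggregation: output column = (input column, function)
summary = people.groupby('City').agg(
    mean_age=('Age', 'mean'),
    oldest=('Age', 'max'),
    n=('Age', 'size'),
)

# transform() returns a result aligned with the original rows
people['city_mean'] = people.groupby('City')['Age'].transform('mean')
```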
Pivot Tables
# Create a pivot table
pivot_table = df.pivot_table(values='Age', index='City', aggfunc='mean')
- Use pivot_table() to create summary tables with aggregations.
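A pivot table becomes more useful with a second grouping dimension in the columns. The sketch below uses an invented sales frame (all names and numbers are illustrative) and adds totals via margins=True.

```python
import pandas as pd

sales = pd.DataFrame({
    'City': ['New York', 'New York', 'Chicago', 'Chicago'],
    'Year': [2022, 2023, 2022, 2023],
    'Revenue': [100, 120, 80, 90],
})

table = sales.pivot_table(
    values='Revenue', index='City', columns='Year',
    aggfunc='sum', margins=True,   # margins adds an 'All' row and column
)

chicago_2022 = table.loc['Chicago', 2022]
grand_total = table.loc['All', 'All']
```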
Working with Time Series
Indexing by Time
# Create a time-indexed DataFrame
time_df = pd.DataFrame(
{'value': [1, 2, 3, 4]},
index=pd.date_range('2023-01-01', periods=4)
)
- Use pd.date_range() to create a DatetimeIndex.
Resampling, Shifting, and Windowing
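# Resample the daily data into 2-day bins and average each bin
# (resampling to 'D' would leave already-daily data unchanged)
resampled = time_df.resample('2D').mean()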
# Shift data
shifted = time_df.shift(1)
- Use resample() to change the frequency of time-series data.
- Use shift() to shift data forward or backward.
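Windowing, the third topic in this heading, is handled by rolling(). The sketch below rebuilds the same four-day series under the hypothetical name ts and shows a rolling mean next to a shift-based difference and a downsampling step.

```python
import pandas as pd

ts = pd.DataFrame(
    {'value': [1, 2, 3, 4]},
    index=pd.date_range('2023-01-01', periods=4),
)

# Mean over a sliding window of 2 rows; the first row has no full window
rolling_mean = ts['value'].rolling(window=2).mean()   # NaN, 1.5, 2.5, 3.5

# Day-over-day change computed with shift()
daily_change = ts['value'] - ts['value'].shift(1)     # NaN, 1, 1, 1

# Downsample to 2-day bins and sum each bin
two_day_sum = ts['value'].resample('2D').sum()        # 3, 7

# Look up a single day by its timestamp
jan_2 = ts.loc[pd.Timestamp('2023-01-02'), 'value']
```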
Data Visualization
Visualization with Matplotlib
Basic Plots: Line, Scatter, and Bar Plots
import matplotlib.pyplot as plt
# Line plot
x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]
plt.plot(x, y, label='Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Plot Example')
plt.legend()
plt.show()
# Scatter plot
plt.scatter(x, y, color='red', label='Scatter Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot Example')
plt.legend()
plt.show()
# Bar plot
plt.bar(x, y, color='green', label='Bar Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Bar Plot Example')
plt.legend()
plt.show()
- Use plt.plot() for line plots.
- Use plt.scatter() for scatter plots.
- Use plt.bar() for bar plots.
Plot Customization: Colors, Styles, Labels, Legends
# Customizing line style, color, and markers
plt.plot(x, y, color='purple', linestyle='--', marker='o', label='Customized Line')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Customized Line Plot')
plt.legend()
plt.show()
- Change line style using linestyle (e.g., '--', ':').
- Change color using color (e.g., 'red', 'blue').
- Set markers using marker (e.g., 'o', '^').
Subplots and Grids
# Create multiple subplots
fig, axs = plt.subplots(2, 2) # 2x2 grid of subplots
# Plot on each subplot
axs[0, 0].plot(x, y)
axs[0, 1].scatter(x, y)
axs[1, 0].bar(x, y)
axs[1, 1].hist(y)
plt.tight_layout()
plt.show()
- Use plt.subplots() to create grids of subplots.
- Access individual subplots using axs[row, col].
Histograms and Density Plots
# Histogram
data = [1, 1, 2, 2, 2, 3, 3, 4, 4, 5]
plt.hist(data, bins=5, color='orange', edgecolor='black')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Histogram Example')
plt.show()
- Use plt.hist() to plot histograms.
- Use the bins parameter to define the number of bins.
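The example above plots raw counts. For a density-style view using Matplotlib alone, density=True rescales the bars so their total area is 1; a smoothed kernel density curve needs an additional library, such as Seaborn with kde=True. The sketch below is written for a headless run, which is an assumption: the Agg backend line can be removed in a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt

values = [1, 1, 2, 2, 2, 3, 3, 4, 4, 5]

fig, ax = plt.subplots()
# density=True rescales bar heights so that the total bar area equals 1
heights, edges, _ = ax.hist(values, bins=5, density=True,
                            color='orange', edgecolor='black')
ax.set_xlabel('Values')
ax.set_ylabel('Density')
ax.set_title('Normalized Histogram')

# Verify the normalization: sum of (height * bin width) over all bins
bin_width = edges[1] - edges[0]
total_area = float((heights * bin_width).sum())
# plt.show()  # uncomment in an interactive session
```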
Three-Dimensional Plotting
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection; only required on Matplotlib < 3.2
# Create 3D plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]
z = [5, 10, 15, 20, 25]
ax.plot(x, y, z)
plt.show()
- Use projection='3d' in add_subplot() to create 3D plots.
Text, Annotations, and Tick Customization
# Customizing ticks, adding annotations, and text
plt.plot(x, y)
plt.xticks(ticks=[1, 3, 5], labels=['One', 'Three', 'Five'])
plt.yticks(ticks=[10, 30, 50], labels=['Low', 'Medium', 'High'])
plt.text(3, 30, 'Midpoint', fontsize=12, color='red')
plt.show()
- Use plt.xticks() and plt.yticks() to customize tick labels.
- Use plt.text() to annotate specific points.
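plt.text() places a label at a coordinate but draws no pointer. When the label should sit away from the point and point back at it, annotate() is the usual tool. A minimal sketch using the Axes interface; the label position (1.5, 42) is an arbitrary choice, and the Agg backend line assumes a headless run.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xticks([1, 3, 5])
ax.set_xticklabels(['One', 'Three', 'Five'])

# xy is the point being annotated; xytext is where the label is drawn
note = ax.annotate(
    'Midpoint', xy=(3, 30), xytext=(1.5, 42),
    arrowprops=dict(arrowstyle='->', color='red'),
)
# plt.show()  # uncomment in an interactive session
```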
Visualization with Seaborn
Statistical Plots
import seaborn as sns
# Distribution plot
sns.histplot(data, kde=True)
plt.show()
- Seaborn's histplot() draws a histogram and, with kde=True, overlays a kernel density estimate (KDE) curve.
Pair Plots, Distribution Plots, and Categorical Plots
# Pair plot
sns.pairplot(df)
plt.show()
# Categorical plot
sns.catplot(x='City', y='Age', kind='bar', data=df)
plt.show()
- Use sns.pairplot() to visualize pairwise relationships between the numeric columns of a DataFrame.
- Use sns.catplot() to plot categorical data.