Intermediate
Pandas
Master the Pandas library for data manipulation — from creating DataFrames to grouping, merging, pivot tables, and method chaining.
Series and DataFrame
Python
import pandas as pd # Series - 1D labeled array s = pd.Series([10, 20, 30], index=["a", "b", "c"]) # DataFrame - 2D labeled table df = pd.DataFrame({ "name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35], "salary": [70000, 80000, 90000] })
Loading Data
Python
# Read from files df = pd.read_csv("data.csv") df = pd.read_excel("data.xlsx", sheet_name="Sheet1") df = pd.read_json("data.json") # Quick inspection print(df.head()) # First 5 rows print(df.info()) # Column types, non-null counts print(df.describe()) # Statistical summary print(df.shape) # (rows, columns) print(df.columns) # Column names
Indexing: loc and iloc
Python
# loc - label-based indexing df.loc[0, "name"] # Single value df.loc[0:2, ["name", "age"]] # Rows 0-2, specific columns df.loc[df["age"] > 25] # Boolean filter # iloc - integer position-based indexing df.iloc[0, 1] # Row 0, column 1 df.iloc[0:2, 0:2] # First 2 rows, first 2 cols df.iloc[-1] # Last row
Filtering and Querying
Python
# Boolean filtering high_salary = df[df["salary"] > 75000] adults = df[(df["age"] >= 25) & (df["age"] <= 35)] # query() method (more readable) result = df.query("age > 25 and salary > 75000") # isin() for membership selected = df[df["name"].isin(["Alice", "Bob"])]
Sorting
Python
df.sort_values("salary", ascending=False) df.sort_values(["age", "salary"], ascending=[True, False]) df.sort_index() # Sort by index
GroupBy and Aggregation
Python
# Group and aggregate df.groupby("department")["salary"].mean() df.groupby("department").agg({ "salary": ["mean", "max", "min"], "age": "mean" }) # Value counts df["department"].value_counts()
Merge, Join, Concat
Python
# Merge (SQL-style join) merged = pd.merge(df1, df2, on="id", how="left") merged = pd.merge(df1, df2, left_on="emp_id", right_on="id") # Concatenate (stack) combined = pd.concat([df1, df2], axis=0) # Stack rows combined = pd.concat([df1, df2], axis=1) # Stack columns
Pivot Tables
Python
pivot = df.pivot_table(
values="salary",
index="department",
columns="role",
aggfunc="mean",
fill_value=0
)
Method Chaining
Python
result = (
df
.query("age > 25")
.assign(bonus=lambda x: x["salary"] * 0.1)
.sort_values("salary", ascending=False)
.reset_index(drop=True)
)
Method chaining makes your data pipeline readable and declarative. Each step transforms the data and passes it to the next step, similar to piping in Unix or dplyr in R.
Lilly Tech Systems