Overview
Want to quickly master Python data analytics through working examples? This by-example guide teaches 95% of practical data analytics through 85 annotated examples organized by complexity level, targeting pandas 3.0.2, numpy 2.4.4, scikit-learn 1.8.0, and polars 1.40.1.
What Is By-Example Learning?
By-example learning is an example-first approach where you learn through annotated, runnable Python code rather than narrative explanations. Each example is self-contained, immediately executable, and heavily commented to show:
- What each operation does - Inline # => comments explain the purpose and mechanism
- Expected outputs - Show actual values, shapes, and types after each operation
- Breaking changes - Critical version differences for pandas 3.0.2 and numpy 2.4.4
- Key takeaways - 1-2 sentence summaries of core concepts
This approach is ideal for experienced developers who know at least one programming language and want to quickly understand Python data analytics syntax, library conventions, and patterns through working code.
Learning Path
The tutorial guides you through 85 examples organized into three progressive levels.
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
graph TD
A["Beginner (Examples 1-28)<br/>Loading data, pandas basics,<br/>numpy, basic visualization"]
B["Intermediate (Examples 29-57)<br/>Advanced pandas, polars, DuckDB,<br/>scikit-learn pipelines, stats"]
C["Advanced (Examples 58-85)<br/>ML models, time series, financial,<br/>production pipelines, dashboards"]
A -->|Master foundations| B
B -->|Advanced patterns| C
style A fill:#0173B2,stroke:#000,color:#fff
style B fill:#DE8F05,stroke:#000,color:#fff
style C fill:#029E73,stroke:#000,color:#fff
Version Landscape
This tutorial targets the current major versions as of 2026. Understanding breaking changes is critical — several libraries have major API shifts that break older code.
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
graph LR
A["pandas 3.0.2<br/>CoW enforced<br/>New string dtype"]
B["numpy 2.4.4<br/>np.float64 not np.float_<br/>np.nan not np.NaN"]
C["scikit-learn 1.8.0<br/>set_output pandas<br/>HistGradientBoosting"]
D["polars 1.40.1<br/>group_by not groupby<br/>Lazy evaluation"]
style A fill:#0173B2,stroke:#000,color:#fff
style B fill:#DE8F05,stroke:#000,color:#fff
style C fill:#029E73,stroke:#000,color:#fff
style D fill:#CC78BC,stroke:#000,color:#fff
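Before running any example, it helps to confirm which versions are actually installed. The sketch below compares installed distributions against the targets listed above. It uses importlib.metadata from the standard library. The helper name check_versions and the exact-match rule are illustrative choices, not part of any library.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Target versions taken from this tutorial; adjust if you pin differently
TARGETS = {
    "pandas": "3.0.2",
    "numpy": "2.4.4",
    "scikit-learn": "1.8.0",
    "polars": "1.40.1",
}


def check_versions(targets: dict[str, str]) -> dict[str, tuple[str | None, bool]]:
    """Map each distribution name to (installed version or None, exact match?)."""
    report: dict[str, tuple[str | None, bool]] = {}
    for dist, wanted in targets.items():
        try:
            installed = version(dist)  # reads installed package metadata
        except PackageNotFoundError:
            installed = None  # distribution is not installed at all
        report[dist] = (installed, installed == wanted)
    return report


for name, (installed, ok) in check_versions(TARGETS).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name:<14} installed={installed!s:<10} target={TARGETS[name]} {status}")
```

An exact string match is deliberately strict. Loosen it (for example, compare only the major version) if patch releases are acceptable for your work.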
What This Tutorial Covers
Core Libraries
- pandas 3.0.2 - DataFrame operations, Copy-on-Write, new string dtype, renamed frequency aliases
- numpy 2.4.4 - Arrays, broadcasting, Generator-based random API, updated type names
- scikit-learn 1.8.0 - Preprocessing, pipelines, classification, regression, clustering
- polars 1.40.1 - High-performance DataFrames with lazy evaluation and SIMD acceleration
Visualization
- matplotlib 3.10.x - Foundation plotting with the fig, ax = plt.subplots() pattern
- seaborn 0.13.2 - Statistical visualization, including the new sns.objects interface
- plotly 6.x - Interactive charts with px.scatter(), px.line(), and go.Figure()
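The fig, ax = plt.subplots() pattern creates the figure and axes explicitly, so every call targets a known object instead of pyplot's implicit "current axes". This sketch assumes the non-interactive Agg backend so it runs without a display. The data and labels are made up.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend; no window is opened
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar"]
revenue = [120.0, 135.5, 128.0]

# Explicit figure/axes objects instead of implicit pyplot state
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(months, revenue, marker="o")
ax.set_title("Monthly revenue")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue (k$)")
fig.tight_layout()

# Render to an in-memory PNG rather than a file on disk
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)  # free the figure's memory
```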
Data Sources and Storage
- DuckDB 1.2.x - In-process SQL analytics on DataFrames and Parquet files
- PyArrow 20.0.0 - Apache Arrow format, Parquet reading, Arrow-backed DataFrames
- yfinance 0.2.x - Financial data fetching from Yahoo Finance
Statistical Analysis
- scipy 1.15.x - Statistical hypothesis tests and optimization
- statsmodels 0.14.x - OLS regression, time series ARIMA, seasonal decomposition
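As a taste of the hypothesis-testing side, the sketch below runs Welch's two-sample t-test with scipy.stats.ttest_ind. The A/B data is synthetic, drawn with a seeded Generator. Setting equal_var=False means the test does not assume both groups share a variance.

```python
import numpy as np
from scipy import stats

# Synthetic A/B data for illustration only
rng = np.random.default_rng(seed=42)
control = rng.normal(loc=100.0, scale=10.0, size=200)
treatment = rng.normal(loc=105.0, scale=10.0, size=200)

# Welch's t-test: equal_var=False drops the equal-variance assumption
result = stats.ttest_ind(treatment, control, equal_var=False)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4g}")
```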
Production Patterns
- Reproducible pipelines with type hints and docstrings
- Data validation with pandera
- Streamlit dashboards
- Packaging with pyproject.toml and uv
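"Reproducible pipelines with type hints and docstrings" can start as small as one well-typed function that validates its input before transforming it. The following is a minimal sketch. The function name summarize_sales and the region/amount columns are hypothetical, and pandera-based validation would replace the hand-written column check shown here.

```python
from __future__ import annotations

import pandas as pd


def summarize_sales(df: pd.DataFrame, group_col: str = "region") -> pd.DataFrame:
    """Return total and mean `amount` per group, largest total first.

    Raises ValueError if a required column is missing, so bad input fails early.
    """
    required = {group_col, "amount"}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return (
        df.groupby(group_col, as_index=False)
        .agg(total=("amount", "sum"), mean=("amount", "mean"))  # named aggregations
        .sort_values("total", ascending=False, ignore_index=True)
    )


sales = pd.DataFrame({"region": ["north", "south", "north"], "amount": [120.0, 80.0, 40.0]})
print(summarize_sales(sales))
```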
Critical Breaking Changes to Know
Before writing any code, understand these breaking changes in pandas 3.0.2 and numpy 2.4.4:
pandas 3.0.2 Breaking Changes
Copy-on-Write (CoW) is now enforced:
# WRONG in pandas 3.x - chained assignment never updates df (emits a ChainedAssignmentError warning)
df["col"][mask] = value
# CORRECT - use df.loc for modifications
df.loc[mask, "col"] = value
New string dtype - strings no longer use object dtype:
# pandas 3.x infers the new string dtype for text columns by default
df["name"].dtype # => StringDtype (not object)
Frequency aliases renamed:
# WRONG (old aliases, deprecated since pandas 2.2)
df.resample("M") # Month-end was "M"
df.resample("Y") # Year-end was "Y"
df.resample("Q") # Quarter-end was "Q"
# CORRECT (pandas 3.0.2 aliases)
df.resample("ME") # Month-end is now "ME"
df.resample("YE") # Year-end is now "YE"
df.resample("QE") # Quarter-end is now "QE"
applymap() replaced by map():
# WRONG (deprecated in pandas 2.1, removed in pandas 3.0)
df.applymap(lambda x: x * 2)
# CORRECT (pandas 3.0.2)
df.map(lambda x: x * 2)
numpy 2.4.4 Breaking Changes
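The sketch below uses only the numpy spellings this section recommends: explicit dtype names, np.nan, and a local Generator in place of global seeding. It should run on both numpy 1.x (1.17 or later) and 2.x. The prices data and the sample helper are hypothetical.

```python
import numpy as np

# Explicit dtype names that exist in both numpy 1.x and 2.x
prices = np.array([10.5, np.nan, 12.0], dtype=np.float64)
counts = np.arange(3, dtype=np.intp)

# np.nan never equals itself, so detect missing values with np.isnan
n_missing = int(np.isnan(prices).sum())
mean_price = float(np.nanmean(prices))  # mean that skips NaN


def sample(seed: int, size: int = 5) -> np.ndarray:
    """Draw reproducible normal samples from a local Generator (no global state)."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=1.0, size=size)


same = np.array_equal(sample(42), sample(42))  # same seed, same draws

# Independent child streams, e.g. one per parallel worker
child_seeds = np.random.SeedSequence(42).spawn(3)
streams = [np.random.default_rng(s) for s in child_seeds]
draws = [g.integers(0, 100, size=3) for g in streams]  # high bound is exclusive
```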
Type alias names changed:
# WRONG (numpy 1.x aliases, removed in numpy 2.0)
np.float_ # Removed, use np.float64
np.complex_ # Removed, use np.complex128
np.NaN # Removed (was an alias for np.nan)
# CHANGED (still exists, different meaning)
np.int_ # Now equivalent to np.intp (was C long, 32-bit on Windows)
# CORRECT (numpy 2.4.4)
np.float64 # Explicit 64-bit float
np.intp # Platform pointer-sized integer
np.complex128 # 128-bit complex
np.nan # Lowercase nan (a Python float)
Generator-based random API (use this instead of the legacy np.random.seed()):
# LEGACY (global state; discouraged in new code)
np.random.seed(42)
np.random.normal(0, 1, 100)
# CORRECT (numpy 2.x style: a local Generator object)
rng = np.random.default_rng(seed=42)
rng.normal(0, 1, 100)
Prerequisites
This tutorial assumes you:
- Know at least one programming language (Python experience helpful but not required)
- Understand basic programming concepts (loops, functions, data structures)
- Have Python 3.11+ installed with pip or uv
Installation:
pip install pandas==3.0.2 numpy==2.4.4 scikit-learn==1.8.0 polars==1.40.1
pip install matplotlib seaborn plotly scipy statsmodels
pip install pyarrow duckdb yfinance pandera streamlit
Or with uv (faster):
uv pip install pandas==3.0.2 numpy==2.4.4 scikit-learn==1.8.0 polars==1.40.1 matplotlib seaborn plotly scipy statsmodels pyarrow duckdb yfinance pandera streamlit
How to Use This Guide
Each example follows a consistent five-part structure:
- Brief explanation - What this example demonstrates (1-3 sentences)
- Optional diagram - Visual representation for complex concepts
- Heavily annotated code - Working Python with # => comments showing values and outputs
- Key takeaway - 1-2 sentence lesson summary
- Why It Matters - 50-100 words on practical significance
The # => annotation pattern documents what happens at each step:
import pandas as pd # => pandas 3.0.2
df = pd.read_csv("sales.csv") # => loads CSV into DataFrame
print(df.shape) # => (1000, 8) - rows, columns
print(df.dtypes) # => dtype of each column
Work through examples sequentially within each level, or jump directly to the example that covers your immediate need. Each example is self-contained.
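The annotation snippet above reads a sales.csv file that may not exist on your machine. The variant below follows the same # => pattern but builds the CSV in memory with io.StringIO, so it runs anywhere. The columns and values are made up.

```python
import io

import pandas as pd

# Inline CSV stands in for a file such as sales.csv (hypothetical data)
csv_text = """order_id,region,amount
1,north,120.5
2,south,80.0
3,north,42.25
"""
df = pd.read_csv(io.StringIO(csv_text))  # => DataFrame with 3 rows
print(df.shape)  # => (3, 3) - rows, columns
print(df["amount"].sum())  # => 242.75
print(df.groupby("region")["amount"].sum().to_dict())  # => {'north': 162.75, 'south': 80.0}
```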
Last updated April 28, 2026