Overview
Want to quickly master Python data analytics through working examples? This by-example guide teaches 95% of practical data analytics through 85 annotated examples organized by complexity level, targeting pandas 3.0.2, numpy 2.4.4, scikit-learn 1.8.0, and polars 1.40.1.
What Is By-Example Learning?
By-example learning is an example-first approach where you learn through annotated, runnable Python code rather than narrative explanations. Each example is self-contained, immediately executable, and heavily commented to show:
- What each operation does - Inline # => comments explain the purpose and mechanism
- Expected outputs - Show actual values, shapes, and types after each operation
- Breaking changes - Critical version differences for pandas 3.0.2 and numpy 2.4.4
- Key takeaways - 1-2 sentence summaries of core concepts
This approach is ideal for experienced developers who know at least one programming language and want to quickly understand Python data analytics syntax, library conventions, and patterns through working code.
Learning Path
The tutorial guides you through 85 examples organized into three progressive levels.
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
graph TD
A["Beginner (Examples 1-28)<br/>Loading data, pandas basics,<br/>numpy, basic visualization"]
B["Intermediate (Examples 29-57)<br/>Advanced pandas, polars, DuckDB,<br/>scikit-learn pipelines, stats"]
C["Advanced (Examples 58-85)<br/>ML models, time series, financial,<br/>production pipelines, dashboards"]
A -->|Master foundations| B
B -->|Advanced patterns| C
style A fill:#0173B2,stroke:#000,color:#fff
style B fill:#DE8F05,stroke:#000,color:#fff
style C fill:#029E73,stroke:#000,color:#fff
Version Landscape
This tutorial targets the current major versions as of 2026. Understanding breaking changes is critical — several libraries have major API shifts that break older code.
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
graph LR
A["pandas 3.0.2<br/>CoW enforced<br/>New string dtype"]
B["numpy 2.4.4<br/>np.float64 not np.float_<br/>np.nan not np.NaN"]
C["scikit-learn 1.8.0<br/>set_output pandas<br/>HistGradientBoosting"]
D["polars 1.40.1<br/>group_by not groupby<br/>Lazy evaluation"]
style A fill:#0173B2,stroke:#000,color:#fff
style B fill:#DE8F05,stroke:#000,color:#fff
style C fill:#029E73,stroke:#000,color:#fff
style D fill:#CC78BC,stroke:#000,color:#fff
What This Tutorial Covers
Core Libraries
- pandas 3.0.2 - DataFrame operations, Copy-on-Write, new string dtype, deprecated freq aliases
- numpy 2.4.4 - Arrays, broadcasting, the Generator-based random API (default_rng), updated type names
- scikit-learn 1.8.0 - Preprocessing, pipelines, classification, regression, clustering
- polars 1.40.1 - High-performance DataFrames with lazy evaluation and SIMD acceleration
Visualization
- matplotlib 3.10.x - Foundation plotting with the fig, ax = plt.subplots() pattern
- seaborn 0.13.2 - Statistical visualization, including the sns.objects interface
- plotly 6.x - Interactive charts with px.scatter(), px.line(), and go.Figure()
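The fig, ax = plt.subplots() pattern creates the Figure and Axes explicitly and calls methods on the Axes, instead of relying on pyplot's implicit "current axes" state. The sketch below uses made-up random-walk data and the non-interactive Agg backend so it runs without a display. It saves to an in-memory buffer rather than a file.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the script runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
x = np.arange(30)
y = np.cumsum(rng.normal(size=x.size))  # illustrative random walk

# Explicit object-oriented pattern: create Figure and Axes, then call Axes methods.
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, y, label="random walk")
ax.set_xlabel("day")
ax.set_ylabel("value")
ax.set_title("fig, ax pattern")
ax.legend()
fig.tight_layout()

buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)  # write the PNG to memory instead of disk
plt.close(fig)  # release the figure's memory
```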
Data Sources and Storage
- DuckDB 1.2.x - In-process SQL analytics on DataFrames and Parquet files
- PyArrow 20.0.0 - Apache Arrow format, Parquet reading, Arrow-backed DataFrames
- yfinance 0.2.x - Financial data fetching from Yahoo Finance
Statistical Analysis
- scipy 1.15.x - Statistical tests, hypothesis testing, optimization
- statsmodels 0.14.x - OLS regression, time series ARIMA, seasonal decomposition
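For the hypothesis-testing entry, scipy's two-sample t-test takes two arrays. It returns a result object with statistic and pvalue attributes. This sketch runs on synthetic data (the group means and sizes are invented) and uses Welch's variant via equal_var=False.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)
# Synthetic samples for illustration: group B is drawn with a slightly higher mean.
group_a = rng.normal(loc=50.0, scale=5.0, size=200)
group_b = rng.normal(loc=52.0, scale=5.0, size=200)

# Welch's t-test (equal_var=False) does not assume equal variances.
result = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")

alpha = 0.05  # significance level chosen for this sketch
if result.pvalue < alpha:
    print("Reject H0: the group means differ at the 5% level.")
else:
    print("Fail to reject H0 at the 5% level.")
```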
Production Patterns
- Reproducible pipelines with type hints and docstrings
- Data validation with pandera
- Streamlit dashboards
- Packaging with pyproject.toml and uv
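One way to make a pipeline reproducible is to split it into small, pure functions with type hints and docstrings, so you can test each stage in isolation. The sketch below is a minimal example. The order data, column names, and function names are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical raw input; order 2 has a missing amount.
RAW_CSV = """order_id,region,amount
1,north,120.5
2,south,
3,north,80.0
4,east,200.25
"""


def load_orders(source: str) -> pd.DataFrame:
    """Read raw order data from CSV text."""
    return pd.read_csv(io.StringIO(source))


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with a missing amount and normalize region labels to upper case."""
    return (
        df.dropna(subset=["amount"])
        .assign(region=lambda d: d["region"].str.upper())
        .reset_index(drop=True)
    )


def summarize_by_region(df: pd.DataFrame) -> pd.DataFrame:
    """Total amount and order count per region, sorted by region."""
    return (
        df.groupby("region", as_index=False)
        .agg(total=("amount", "sum"), orders=("order_id", "count"))
        .sort_values("region", ignore_index=True)
    )


def run_pipeline(source: str) -> pd.DataFrame:
    """Load, clean, and summarize; each stage returns a new DataFrame."""
    return summarize_by_region(clean_orders(load_orders(source)))


if __name__ == "__main__":
    print(run_pipeline(RAW_CSV))
```

Each function takes and returns a DataFrame without touching global state, so you can unit-test any stage with a tiny hand-written input.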
Critical Breaking Changes to Know
Before writing any code, understand these breaking changes in pandas 3.0.2 and numpy 2.4.4:
pandas 3.0.2 Breaking Changes
Copy-on-Write (CoW) is now enforced:
# WRONG in pandas 3.0.2 - chained assignment emits a ChainedAssignmentError warning and never updates df
df[mask]["col"] = value
# CORRECT - use a single df.loc call for modifications
df.loc[mask, "col"] = value
New string dtype - strings no longer use object dtype:
# pandas 3.0.2 default string storage
df["name"].dtype # => StringDtype (not object)
Frequency aliases renamed (the old aliases are deprecated):
# WRONG (pandas 2.x aliases)
df.resample("M") # Month-end was "M"
df.resample("Y") # Year-end was "Y"
df.resample("Q") # Quarter-end was "Q"
# CORRECT (pandas 3.0.2 aliases)
df.resample("ME") # Month-end is now "ME"
df.resample("YE") # Year-end is now "YE"
df.resample("QE") # Quarter-end is now "QE"
applymap() replaced by map():
# WRONG (deprecated since pandas 2.1)
df.applymap(lambda x: x * 2)
# CORRECT (pandas 3.0.2)
df.map(lambda x: x * 2)
numpy 2.4.4 Breaking Changes
Type alias names changed:
# WRONG (numpy 1.x aliases, removed in 2.x)
np.float_ # Removed
np.complex_ # Removed
np.NaN # Removed (was alias for float("nan"))
# CHANGED, not removed
np.int_ # Still exists, but now equals np.intp (64-bit on all 64-bit platforms, including Windows)
# CORRECT (numpy 2.4.4)
np.float64 # Explicit 64-bit float
np.intp # Platform pointer-sized integer
np.int64 # Explicit 64-bit integer when the width matters
np.complex128 # 128-bit complex
np.nan # Lowercase nan (a plain Python float)
Generator random API (use this, not the legacy np.random.seed()):
# LEGACY (global state shared by the whole program; not recommended for new code)
np.random.seed(42)
np.random.normal(0, 1, 100)
# RECOMMENDED (Generator API, available since numpy 1.17)
rng = np.random.default_rng(seed=42)
rng.normal(0, 1, 100)
Prerequisites
This tutorial assumes you:
- Know at least one programming language (Python experience helpful but not required)
- Understand basic programming concepts (loops, functions, data structures)
- Have Python 3.11+ installed with pip or uv
Installation:
pip install pandas==3.0.2 numpy==2.4.4 scikit-learn==1.8.0 polars==1.40.1
pip install matplotlib seaborn plotly scipy statsmodels
pip install pyarrow duckdb yfinance pandera streamlit
Or with uv (faster):
uv pip install pandas==3.0.2 numpy==2.4.4 scikit-learn==1.8.0 polars==1.40.1 matplotlib seaborn plotly scipy statsmodels pyarrow duckdb yfinance pandera streamlit
How to Use This Guide
Each example follows a consistent five-part structure:
- Brief explanation - What this example demonstrates (1-3 sentences)
- Optional diagram - Visual representation for complex concepts
- Heavily annotated code - Working Python with # => comments showing values and outputs
- Key takeaway - 1-2 sentence lesson summary
- Why It Matters - 50-100 words on practical significance
The # => annotation pattern documents what happens at each step:
import pandas as pd # => pandas 3.0.2
df = pd.read_csv("sales.csv") # => loads CSV into DataFrame
print(df.shape) # => (1000, 8) - rows, columns
print(df.dtypes) # => dtype of each column
Work through examples sequentially within each level, or jump directly to the example that covers your immediate need. Each example is self-contained.
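The annotation example above assumes a sales.csv file on disk. The variant below is self-contained: it reads invented sample rows from an in-memory string with io.StringIO and follows the same # => style.

```python
import io

import pandas as pd

# Invented sample data standing in for sales.csv.
csv_text = """date,product,units,price
2026-01-05,widget,3,9.99
2026-01-05,gadget,1,24.50
2026-01-06,widget,5,9.99
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])  # => DataFrame from in-memory CSV
print(df.shape)  # => (3, 4) - rows, columns
print(df["units"].sum())  # => 9
df["revenue"] = df["units"] * df["price"]  # => new float column, computed row by row
print(df.dtypes)  # => product uses the new string dtype under pandas 3.0, object under 2.x
```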
Examples by Level
Beginner (Examples 1–28)
- Example 1: Loading CSV with pandas 3.0.2
- Example 2: DataFrame Basics — Shape, Info, Describe
- Example 3: Selecting Columns and Rows — loc, iloc, Single/Multi-Column
- Example 4: Copy-on-Write in pandas 3.0.2 — ChainedAssignmentError
- Example 5: Filtering Rows — Boolean Indexing
- Example 6: Handling Missing Values
- Example 7: Data Types — astype, StringDtype, Safe Conversion
- Example 8: Sorting — sort_values with Single and Multiple Keys
- Example 9: Adding and Removing Columns
- Example 10: Aggregations with groupby
- Example 11: Renaming and Reindexing
- Example 12: Merging DataFrames — pd.merge Join Types
- Example 13: Concatenating DataFrames — pd.concat
- Example 14: String Operations with pandas
- Example 15: Date and Time — pd.to_datetime, dt Accessor, New Freq Aliases
- Example 16: numpy Arrays — Creating and Inspecting
- Example 17: numpy Arithmetic — Element-wise Operations and Broadcasting
- Example 18: numpy 2.4.4 Random — default_rng (New API)
- Example 19: numpy 2.4.4 Type Names — Breaking Changes from 1.x
- Example 20: Basic matplotlib Plot — fig, ax Pattern
- Example 21: Scatter Plot with matplotlib and pandas
- Example 22: Bar Chart — Grouped and Stacked
- Example 23: Histogram — Understanding Distributions
- Example 24: seaborn Basics — scatterplot, lineplot, histplot
- Example 25: seaborn Heatmap — Correlation Matrix
- Example 26: Saving Plots — savefig with DPI and Layout
- Example 27: Jupyter Notebook Setup — Display and Magic Commands
- Example 28: Data Summary Report — Combining Describe, value_counts, and Missing Analysis
Intermediate (Examples 29–57)
- Example 29: DataFrame.map() — Replacing Deprecated applymap()
- Example 30: pd.col() Expression Syntax — Lazy Column References
- Example 31: apply() vs map() vs Vectorized Operations — Performance
- Example 32: Pivot Tables — pd.pivot_table
- Example 33: Melt and Stack/Unstack — Wide to Long Reshaping
- Example 34: Window Functions — Rolling, Expanding, EWM
- Example 35: Time Series Resampling — New Freq Aliases (pandas 3.0.2)
- Example 36: Multi-level Indexing — MultiIndex Creation and Slicing
- Example 37: Reading Parquet with PyArrow
- Example 38: polars 1.40.1 Basics — API Differences from pandas
- Example 39: polars Lazy Evaluation — scan_csv + collect
- Example 40: polars Expressions — str, cast, when/then/otherwise
- Example 41: polars vs pandas Performance — When to Use Each
- Example 42: DuckDB 1.2.x — In-process SQL on DataFrames
- Example 43: DuckDB Reading Parquet and CSV Directly
- Example 44: Feature Engineering — Binning with pd.cut and pd.qcut
- Example 45: Outlier Detection — IQR Method and Z-Score
- Example 46: Encoding Categorical Variables
- Example 47: Scaling Features — StandardScaler and MinMaxScaler
- Example 48: scikit-learn Pipeline — Preprocessing + Model in One Object
- Example 49: scikit-learn set_output — pandas-native Pipeline Output
- Example 50: Train/Test Split — Stratification and Random State
- Example 51: Cross-Validation — cross_val_score and StratifiedKFold
- Example 52: plotly Interactive Charts — px.scatter, px.line, px.bar
- Example 53: plotly Subplots — Multiple Chart Types in One Figure
- Example 54: seaborn objects Interface — sns.objects (0.13.2)
- Example 55: Statistical Testing — scipy.stats.ttest_ind
- Example 56: Chi-Square Test — Categorical Independence
- Example 57: Correlation Analysis — Pearson and Spearman
Advanced (Examples 58–85)
- Example 58: Linear Regression with scikit-learn
- Example 59: Logistic Regression — Classification Report and Confusion Matrix
- Example 60: HistGradientBoostingClassifier — NaN-Native Gradient Boosting
- Example 61: Hyperparameter Tuning — GridSearchCV and RandomizedSearchCV
- Example 62: Feature Importance — permutation_importance
- Example 63: Unsupervised Learning — KMeans and Silhouette Score
- Example 64: Dimensionality Reduction — PCA and t-SNE
- Example 65: Time Series Analysis with statsmodels — ARIMA
- Example 66: ARIMA Forecasting and Residual Diagnostics
- Example 67: Seasonal Decomposition — Trend, Seasonality, Residual
- Example 68: yfinance 0.2.x — Fetching Financial Data
- Example 69: Financial Analysis — Returns, Rolling Volatility, Sharpe Ratio
- Example 70: Working with Large Datasets — Chunked Reading and Dask Basics
- Example 71: PyArrow 20.0.0 — Reading Parquet and Schema Inspection
- Example 72: Arrow-backed DataFrames — 2-5x Memory Reduction
- Example 73: Data Validation with pandera
- Example 74: Advanced Aggregations — Named Agg and Custom Functions
- Example 75: String Matching and Fuzzy Join
- Example 76: Geospatial Basics with geopandas
- Example 77: NetworkX — Graph Analytics
- Example 78: A/B Testing Analysis — t-test, Effect Size, Sample Size
- Example 79: Survival Analysis Basics with lifelines
- Example 80: Reproducible Analytics Pipeline — Functions, Type Hints, Tests
- Example 81: Exporting Results — CSV, Excel, Parquet, Styled HTML
- Example 82: Scheduling Analytics — schedule Library and Cron Patterns
- Example 83: Streamlit Analytics Dashboard
- Example 84: Packaging an Analytics Project — pyproject.toml and uv
- Example 85: Data Analytics Production Checklist
Last updated April 28, 2026