FROM ABSOLUTE ZERO TO ML ENGINEER

Farouz Ultimate Data Science Roadmap

The most comprehensive, open-source learning path for Data Science & AI. From your first line of code to deploying production ML systems.

Roadmap Overview

Phase 0

Absolute Zero

Computer Basics · Python Fundamentals · Logic

Phase 1

Data Analysis

Excel · SQL · Pandas · EDA · Power BI

Phase 2

Math & Engineering

Linear Algebra · Calculus · Probability · DSA

Phase 3

Machine Learning

Regression · Trees · SVMs · Clustering · SHAP

Phase 4

Deep Learning

CNNs · Transformers · LLMs · GenAI

Phase 5

MLOps & Cloud

Docker · FastAPI · AWS · MLflow · CI/CD

Phase 0

Absolute Zero — The Starting Point

Never written a line of code? Perfect. This is where legends begin.

Computer & Digital Literacy

How Computers Work
  • CPU, RAM, Storage — what happens when you press "Run"
  • Operating Systems (Windows, macOS, Linux basics)
  • File Systems — organizing projects professionally
  • The Terminal / Command Line (navigating, creating files, running scripts)
The Internet & Data
  • Client-Server architecture
  • HTTP, URLs, APIs — what happens when you open a website
  • Data formats: CSV, JSON, XML
Essential Tools Setup
  • Installing Python (Anaconda distribution)
  • Jupyter Notebooks & Google Colab
  • VS Code — your professional code editor
  • Creating a GitHub account

Computational Thinking & Logic

Thinking Like a Programmer
  • Decomposition — breaking big problems into tiny steps
  • Pattern Recognition — spotting repeating structures
  • Abstraction — ignoring irrelevant details
  • Algorithm Design — step-by-step solutions (flowcharts)
Pre-Math Foundations
  • Number systems (integers, floats, binary basics)
  • Order of operations, fractions, percentages
  • Basic algebra (variables, equations, functions)
  • Reading graphs and charts

Python — Your First Language

Core Syntax
  • Variables & Data Types (int, float, str, bool)
  • Operators (arithmetic, comparison, logical)
  • Strings (slicing, formatting, methods)
  • Input / Output — your first interactive program
Control Flow
  • if / elif / else — decision making
  • for loops, while loops — repetition
  • break, continue, pass
  • Nested loops & loop patterns
Data Collections
  • Lists — ordered, mutable sequences
  • Tuples — immutable sequences
  • Dictionaries — key-value storage
  • Sets — unique elements
  • List Comprehensions
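
As a small illustration of the four collection types and a list comprehension, here is a minimal sketch. The names and values (`scores`, `ages`, and so on) are made up for the example.

```python
# Hypothetical sample data, used only for illustration.
scores = [72, 88, 95, 88]        # list: ordered and mutable
scores.append(60)                # lists can grow in place

point = (3, 4)                   # tuple: ordered but immutable
ages = {"ana": 31, "ben": 27}    # dict: key-value storage
ages["cy"] = 45                  # add a new key

unique_scores = set(scores)      # set: duplicates are dropped

# List comprehension: keep only the scores of 80 or more.
high_scores = [s for s in scores if s >= 80]

print(high_scores)         # [88, 95, 88]
print(len(unique_scores))  # 4
print(sorted(ages))        # ['ana', 'ben', 'cy']
```
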
Functions & Modules
  • Defining functions (parameters, return values)
  • *args, **kwargs
  • Lambda functions
  • Importing modules (math, random, os)
  • Installing packages with pip
File & Error Handling
  • Reading / Writing text and CSV files
  • try / except / finally
  • Common Python errors and debugging
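
One way to combine CSV reading with try / except / finally is sketched below. The helper `read_rows` and the file names are hypothetical; the example writes its own tiny CSV to a temporary folder so it can run anywhere.

```python
import csv
import os
import tempfile

def read_rows(path):
    """Return the rows of a CSV file as dicts, or [] if the file is missing."""
    try:
        with open(path, newline="") as handle:
            return list(csv.DictReader(handle))
    except FileNotFoundError:
        print(f"Could not find {path}")
        return []
    finally:
        # Runs whether or not an exception was raised.
        print("Finished read attempt")

# Write a tiny CSV to a temporary folder, then read it back.
folder = tempfile.mkdtemp()
csv_path = os.path.join(folder, "people.csv")
with open(csv_path, "w", newline="") as handle:
    csv.writer(handle).writerows([["name", "age"], ["Ana", "31"], ["Ben", "27"]])

rows = read_rows(csv_path)                             # two dict rows
missing = read_rows(os.path.join(folder, "nope.csv"))  # [] after the error message
```

Note that `csv.DictReader` returns every value as a string, so `"31"` would still need `int(...)` before doing arithmetic.
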
Mini-Projects
  • Calculator app
  • Number guessing game
  • Simple CSV data reader

Phase 1

Data Analysis Mastery

Master business logic, data wrangling, and dashboarding

The Business Engine

Advanced Excel
  • Pivot Tables & Pivot Charts
  • XLOOKUP, INDEX/MATCH, INDIRECT
  • Power Query (ETL inside Excel)
  • Power Pivot & Data Modeling
  • Dynamic Arrays & LAMBDA Functions
  • Conditional Formatting & Sparklines
Business Metrics
  • CAC (Customer Acquisition Cost)
  • LTV (Lifetime Value) & LTV:CAC Ratio
  • Churn Rate, Retention Rate
  • Gross Margin, Net Margin, EBITDA
  • Unit Economics & Break-even Analysis
  • MRR / ARR (Recurring Revenue)
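
The metrics above are mostly simple ratios. The sketch below uses one common simplified set of definitions (for example, LTV as margin-adjusted revenue per user divided by churn); real companies define these differently, and every number here is hypothetical.

```python
# All figures are hypothetical monthly numbers for a subscription business.
marketing_spend = 50_000.0     # sales and marketing spend
new_customers = 250
customers_at_start = 2_000
customers_lost = 100
arpu = 40.0                    # average revenue per user per month
gross_margin = 0.75

cac = marketing_spend / new_customers              # cost to acquire one customer
churn_rate = customers_lost / customers_at_start   # share of customers lost
retention_rate = 1 - churn_rate
ltv = arpu * gross_margin / churn_rate             # simplified lifetime value
ltv_to_cac = ltv / cac
mrr = customers_at_start * arpu                    # monthly recurring revenue
arr = 12 * mrr                                     # annualized recurring revenue

print(f"CAC={cac:.0f}  LTV={ltv:.0f}  LTV:CAC={ltv_to_cac:.1f}  churn={churn_rate:.0%}")
```
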
SQL Mastery
  • SELECT, WHERE, ORDER BY, LIMIT
  • GROUP BY, HAVING, COUNT, SUM, AVG
  • Joins: INNER, LEFT, RIGHT, FULL, CROSS, SELF
  • Subqueries & Correlated Subqueries
  • CTEs (Common Table Expressions)
  • Window Functions: ROW_NUMBER, RANK, DENSE_RANK
  • LEAD, LAG, Running Totals, Moving Averages
  • CASE WHEN, COALESCE, NULLIF
  • Query Optimization & Indexing
  • Database: PostgreSQL / MySQL
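
To try a CTE and window functions without installing a database server, the sketch below runs the SQL through Python's built-in `sqlite3` module. SQLite is swapped in here purely for convenience (the roadmap targets PostgreSQL / MySQL), it assumes a SQLite build with window-function support (3.25 or later), and the `orders` table is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, order_day INTEGER, amount REAL);
    INSERT INTO orders VALUES
        ('ana', 1, 20.0), ('ana', 2, 35.0), ('ana', 3, 10.0),
        ('ben', 1, 50.0), ('ben', 4, 5.0);
""")

query = """
    WITH ranked AS (
        SELECT customer, order_day, amount,
               -- rank each customer's orders from largest to smallest
               ROW_NUMBER() OVER (PARTITION BY customer ORDER BY amount DESC) AS amount_rank,
               -- running total of spend per customer over time
               SUM(amount) OVER (PARTITION BY customer ORDER BY order_day) AS running_total
        FROM orders
    )
    SELECT customer, order_day, amount, amount_rank, running_total
    FROM ranked
    ORDER BY customer, order_day;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```
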

Python Data Engine & Statistics

NumPy
  • ndarray — creation, indexing, slicing
  • Broadcasting & Vectorized Operations
  • Math functions (sum, mean, std, dot)
  • Random number generation
  • Reshaping, stacking, splitting
Pandas
  • Series & DataFrames
  • Reading data (CSV, Excel, JSON, SQL)
  • Filtering, Sorting, Boolean Indexing
  • GroupBy, Aggregation, Pivot Tables
  • Merge, Join, Concat
  • Missing Data (fillna, dropna, interpolate)
  • apply(), map(), lambda
  • Time Series basics (datetime, resample)
Descriptive Statistics
  • Mean, Median, Mode
  • Variance, Standard Deviation
  • Skewness, Kurtosis
  • Percentiles, Quartiles, IQR
  • Correlation (Pearson, Spearman)
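
Most of these summaries are available in Python's standard `statistics` module. The sketch below uses made-up data and writes Pearson correlation by hand so the formula stays visible; `pearson` is a helper defined only for this example.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]   # hypothetical sample

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
variance = statistics.variance(data)          # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles, default "exclusive" method
iqr = q3 - q1

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)

hours = [1, 2, 3, 4, 5]
exam_scores = [52, 55, 61, 70, 72]
r = pearson(hours, exam_scores)   # close to 1: strong positive relationship
```
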
Probability Distributions
  • Normal (Gaussian) Distribution
  • Binomial & Bernoulli
  • Poisson, Uniform Distribution
  • Central Limit Theorem
Visualization & Storytelling
  • Matplotlib (figures, axes, subplots)
  • Seaborn (heatmaps, pair plots, violin plots)
  • Plotly (interactive charts)
  • Choosing the right chart for the data

Analytics & Intelligence

Inferential Statistics
  • Population vs. Sample
  • Hypothesis Testing (Null vs. Alternative)
  • Z-test, T-test (one-sample, two-sample, paired)
  • Chi-Square test, ANOVA
  • P-values, Significance Level (α)
  • Confidence Intervals
  • Effect Size (Cohen's d)
  • Statistical Power & Sample Size
  • A/B Testing — design, execution, interpretation
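
As one possible way to interpret an A/B test on conversion rates, the sketch below runs a two-proportion z-test using only the standard library. The function name and the visitor counts are hypothetical, and the test assumes samples large enough for the normal approximation.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert equally."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)

# Hypothetical experiment: 10% vs. 12.5% conversion on 2,000 visitors each.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
alpha = 0.05
significant = p_value < alpha   # reject H0 only if p falls below alpha
print(f"z={z:.2f}, p={p_value:.4f}, significant={significant}")
```
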
Business Intelligence — Power BI
  • Data Import & Power Query
  • Data Modeling (Star Schema, Relationships)
  • DAX (Measures, CALCULATE, Time Intelligence)
  • Row-Level Security (RLS)
  • Interactive Report Design
Advanced Business Analytics
  • Funnel Analysis (Conversion Funnels)
  • Cohort Analysis
  • RFM Segmentation
  • Customer Segmentation
  • Pareto Analysis (80/20 Rule)
Capstone Projects
  • SQL Audit Report on a real database
  • Full Python EDA on a messy dataset
  • Interactive Power BI Executive Dashboard

Phase 2

Mathematics & Engineering Foundations

The mathematical backbone of AI and the engineering skills for production code

Linear Algebra & Core Programming

Git & Version Control
  • init, add, commit, push, pull
  • Branching, Merging, Rebasing
  • Conflict Resolution
  • GitHub: Pull Requests, Issues, Collaboration
Python OOP
  • Classes & Objects, __init__, self
  • Inheritance & Polymorphism
  • Encapsulation & Abstraction
  • Magic Methods (__str__, __repr__, __len__)
  • SOLID Principles
  • Design Patterns (Factory, Singleton)
  • Decorators & Context Managers
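
A compact sketch of three of these ideas together: magic methods, a decorator, and a context manager. `Dataset`, `count_calls`, and `timer` are illustrative names, not part of any library.

```python
import time
from contextlib import contextmanager
from functools import wraps

def count_calls(func):
    """Decorator: count how many times the wrapped function is called."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@contextmanager
def timer(label, results):
    """Context manager: store the elapsed seconds under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

class Dataset:
    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):                 # enables len(dataset)
        return len(self.rows)

    def __repr__(self):                # readable debugging output
        return f"Dataset(n_rows={len(self)})"

    @count_calls
    def head(self, n=2):
        return self.rows[:n]

timings = {}
dataset = Dataset([10, 20, 30])
with timer("head", timings):
    first_rows = dataset.head()
```
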
Linear Algebra
  • Scalars, Vectors, Matrices, Tensors
  • Dot Product, Cross Product
  • Matrix Multiplication, Transpose, Inverse
  • Determinants & Rank
  • Linear Independence & Span
  • Eigenvalues & Eigenvectors
  • Singular Value Decomposition (SVD)
  • Applications: PCA, Image Compression

Calculus & Data Structures

Calculus
  • Limits & Continuity
  • Derivatives & Differentiation Rules
  • Partial Derivatives & Gradients
  • The Chain Rule & Computational Graphs
  • Gradient Descent (Batch, SGD, Mini-Batch)
  • Convexity, Local vs. Global Minima
  • Integrals & Area Under Curves
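
To make gradient descent concrete, here is a minimal one-dimensional sketch that minimizes f(x) = (x - 3)^2 using its derivative 2(x - 3). The learning rate and step count are arbitrary choices for illustration; this is the plain full-gradient version, not SGD or mini-batch.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

def grad_f(x):
    # Derivative of f(x) = (x - 3)^2
    return 2 * (x - 3)

minimum = gradient_descent(grad_f, start=0.0)
print(round(minimum, 4))   # approaches 3, the minimizer of f
```

Because f is convex, the only local minimum is also the global one, so the starting point does not change where the iteration ends up.
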
Data Structures
  • Big O Notation (Time & Space Complexity)
  • Arrays & Dynamic Arrays
  • Linked Lists (Singly, Doubly)
  • Stacks & Queues
  • Hash Maps / Hash Tables
  • Trees: Binary Trees, BST, AVL
  • Heaps & Priority Queues
  • Tries (Prefix Trees)
Algorithms
  • Binary Search
  • Sorting: Merge Sort, Quick Sort, Heap Sort
  • Recursion & Backtracking
  • Two Pointers & Sliding Window
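
Binary search is a good first algorithm to implement by hand. The sketch below assumes the input list is already sorted and returns -1 when the target is absent.

```python
def binary_search(items, target):
    """Return the index of target in a sorted list, or -1 if it is absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1      # target can only be in the right half
        else:
            high = mid - 1     # target can only be in the left half
    return -1

sorted_values = [1, 3, 5, 7, 9, 11]
print(binary_search(sorted_values, 7))   # 3
print(binary_search(sorted_values, 4))   # -1
```

Each iteration halves the search range, which is where the O(log n) running time comes from.
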

Probability, Graphs & System Design

Probability Theory
  • Conditional Probability
  • Bayes' Theorem & Bayesian Thinking
  • Random Variables (Discrete & Continuous)
  • Expectation, Variance
  • Joint, Marginal & Conditional Distributions
  • MLE & MAP
  • Information Theory (Entropy, KL Divergence)
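
Bayes' theorem, entropy, and KL divergence each fit in a few lines. The sketch below uses a hypothetical medical-test scenario (1% prevalence, 95% sensitivity, 5% false-positive rate); the function names are chosen for this example only.

```python
import math

def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q is nonzero wherever p is nonzero."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p_sick = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(p_sick, 3))              # about 0.161, far below the 95% sensitivity
print(entropy([0.5, 0.5]))           # 1.0 bit for a fair coin
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```
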
Graph Algorithms
  • Graph Representations (Adjacency List / Matrix)
  • BFS & DFS
  • Dijkstra's Shortest Path
  • Topological Sort
  • Dynamic Programming Patterns
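
As an example of a graph algorithm on an adjacency list, here is a minimal sketch of Dijkstra's shortest path using `heapq` as the priority queue. The graph is invented, and the approach assumes all edge weights are non-negative.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distance from source to every node (non-negative weights)."""
    distances = {node: float("inf") for node in graph}
    distances[source] = 0
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances[node]:
            continue   # stale entry: a shorter path was already found
        for neighbor, weight in graph[node]:
            candidate = dist + weight
            if candidate < distances[neighbor]:
                distances[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return distances

# Adjacency list: node -> list of (neighbor, edge weight)
graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
shortest = dijkstra(graph, "A")
print(shortest)   # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```
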
System Design Basics
  • REST APIs & HTTP Methods
  • Load Balancing & Caching
  • Database Design (SQL vs. NoSQL)
  • Microservices vs. Monolith

Phase 3

Machine Learning Mastery

Train predictive models — understand both the math and the practical intuition

Linear Models & Optimization

Linear Regression
  • Simple & Multiple Linear Regression
  • Cost Function (MSE, RMSE, MAE)
  • Normal Equation (closed-form)
  • Gradient Descent from Scratch (NumPy)
  • Polynomial Regression
  • Regularization: Ridge (L2), Lasso (L1), ElasticNet
  • Bias-Variance Tradeoff
  • Cross-Validation (K-Fold, Stratified)
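
For simple (one-feature) linear regression, the closed-form solution reduces to slope = S_xy / S_xx. The sketch below fits a line that way and scores it with MSE and RMSE, in plain Python rather than NumPy so it runs without extra packages; the data points are made up.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for one feature (closed form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x

def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]   # roughly y = 2x + 1 with noise
slope, intercept = fit_line(xs, ys)
error = mse(xs, ys, slope, intercept)
rmse = math.sqrt(error)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, RMSE={rmse:.3f}")
```
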
Logistic Regression & Classification
  • Sigmoid & Decision Boundary
  • Binary Cross-Entropy Loss
  • Multi-class: OvR, Softmax
  • Confusion Matrix (TP, TN, FP, FN)
  • Precision, Recall, F1, ROC/AUC
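
Precision, recall, and F1 all follow from the four confusion-matrix counts. A minimal sketch with hypothetical labels (1 is the positive class):

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)   # of predicted positives, how many were right
recall = tp / (tp + fn)      # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
print(tp, tn, fp, fn, precision, recall, f1)   # 3 3 1 1 0.75 0.75 0.75
```
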
Support Vector Machines
  • Maximum Margin Classifier
  • Soft Margin & C parameter
  • Kernel Trick (RBF, Polynomial)
Scikit-Learn Pipelines
  • Pipeline & ColumnTransformer
  • Custom Transformers (TransformerMixin)
  • Encoding: OneHot, Label, Ordinal
  • Scaling: Standard, MinMax, Robust

Ensembles & Unsupervised Learning

Tree-Based Models
  • Decision Trees (Entropy, Gini, Info Gain)
  • Random Forests (Bagging)
  • AdaBoost, Gradient Boosting
  • XGBoost, LightGBM, CatBoost
  • Stacking & Blending Ensembles
Hyperparameter Tuning
  • GridSearchCV, RandomizedSearchCV
  • Optuna (Bayesian Optimization)
Dimensionality Reduction
  • PCA — math & code
  • t-SNE, UMAP for Visualization
  • Feature Selection methods
Clustering
  • K-Means, K-Means++
  • Elbow Method, Silhouette Score
  • DBSCAN, HDBSCAN
  • Gaussian Mixture Models (GMM)
  • Hierarchical Clustering

Advanced ML & Production

Anomaly Detection
  • Isolation Forest, One-Class SVM
  • Local Outlier Factor (LOF)
Recommender Systems
  • Collaborative Filtering (User & Item)
  • Matrix Factorization (SVD, NMF)
  • Content-Based & Hybrid
Intro to Neural Networks
  • Perceptron & MLPs
  • Activation Functions (ReLU, Sigmoid, Tanh, Softmax)
  • Forward Pass & Backpropagation
Production Practices
  • OOP ML Pipelines
  • Imbalanced Data (SMOTE, Class Weights)
  • Feature Importance
  • Explainability: SHAP & LIME
  • Model Serialization (joblib, pickle)
Grand ML Capstone
  • End-to-End OOP ML project with mathematical whitepaper

Phase 4

Deep Learning & Neural Architectures

Master modern AI — CNNs, Transformers, LLMs, and Generative AI

DNNs & Optimization

Frameworks
  • TensorFlow & Keras (Sequential, Functional, Subclassing)
  • PyTorch (Tensors, Autograd, nn.Module)
Training Challenges
  • Vanishing & Exploding Gradients
  • Weight Initialization (Xavier, He)
  • Batch Normalization, Layer Normalization
  • Gradient Clipping
Optimizers
  • SGD + Momentum, RMSProp
  • Adam, AdamW
  • Learning Rate Scheduling (Cosine Annealing, Warm Restarts)
Regularization
  • Dropout & Spatial Dropout
  • Early Stopping, L2 Weight Decay
  • Data Augmentation as regularization
Custom Training
  • tf.GradientTape / PyTorch training loop
  • Custom Callbacks & Metrics
  • TensorBoard visualization

Computer Vision — CNNs

CNN Fundamentals
  • Convolution (Filters, Stride, Padding)
  • Pooling (Max, Average, Global Average)
  • Feature Maps & Receptive Field
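
Filter size, stride, and padding together determine the size of a feature map. For one spatial dimension, the usual formula is floor((n + 2p - k) / s) + 1; the helper below is a sketch of that arithmetic, not a framework API.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial dimension."""
    return (n + 2 * padding - kernel) // stride + 1

print(conv_output_size(32, kernel=3, stride=1, padding=1))    # 32 ("same" size)
print(conv_output_size(32, kernel=5, stride=2, padding=0))    # 14
print(conv_output_size(224, kernel=7, stride=2, padding=3))   # 112
```
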
Architectures
  • LeNet, AlexNet, VGG
  • ResNet (Skip / Residual Connections)
  • Inception, EfficientNet, MobileNet
Transfer Learning
  • Pre-trained Models (ImageNet)
  • Feature Extraction vs. Fine-Tuning
  • Freezing / Unfreezing Layers
Augmentation & Pipelines
  • Geometric, Color, Cutout, Mixup
  • tf.data API / PyTorch DataLoader
  • TFRecords & Prefetching
Advanced CV
  • Object Detection (YOLO basics)
  • Semantic Segmentation (U-Net basics)

NLP, Transformers, LLMs & GenAI

Sequence Models
  • RNNs, LSTMs, GRUs
  • Bidirectional & Stacked RNNs
  • Seq2Seq Encoder-Decoder
Time-Series
  • ARIMA, SARIMA, Prophet
  • DeepAR, N-BEATS
NLP Foundations
  • Tokenization, Stemming, Lemmatization
  • Bag of Words, TF-IDF
  • Word2Vec, GloVe, FastText
  • Attention (Bahdanau, Luong)
The Transformer Revolution
  • Self-Attention & Multi-Head Attention
  • Positional Encoding
  • "Attention Is All You Need" paper
  • BERT, GPT, T5
  • HuggingFace Transformers Library
  • Tokenizers (BPE, WordPiece)
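
The core of self-attention is softmax(QK^T / sqrt(d_k))V. The sketch below computes it with plain Python lists for a single head. As a simplifying assumption, the same toy token vectors serve as queries, keys, and values; a real Transformer first applies learned projection matrices.

```python
import math

def softmax(row):
    peak = max(row)                              # subtract max for stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention; returns (outputs, attention weights)."""
    d_k = len(keys[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights.append(softmax(scores))
    width = len(values[0])
    # Each output is a weighted average of the value vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(width)]
        for row in weights
    ]
    return outputs, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # three toy token vectors
outputs, weights = self_attention(tokens, tokens, tokens)
```

Every row of `weights` sums to 1, so each output vector is a blend of all the tokens, weighted by similarity to the query.
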
Large Language Models
  • Prompt Engineering (Zero/Few-shot, CoT)
  • Fine-Tuning: LoRA, QLoRA, PEFT
  • RAG (Retrieval-Augmented Generation)
  • Vector DBs (ChromaDB, Pinecone, Weaviate)
  • LangChain / LlamaIndex
  • AI Agents & Agentic Workflows
Generative AI
  • Autoencoders & VAEs
  • GANs (DCGAN, Conditional, StyleGAN)
  • Diffusion Models (Stable Diffusion)
Multimodal AI
  • Vision-Language Models (CLIP, LLaVA)
Deep Learning Capstone
  • Deploy a custom DNN on unstructured data (images, text, or time-series)

Phase 5

MLOps & Cloud Architecture

Take models out of notebooks and deploy them to the cloud at scale

Containers & APIs

Docker
  • Images, Containers, Volumes, Networks
  • Dockerfile & Multi-Stage Builds
  • Docker Compose
  • Environment Management (Poetry, venv)
FastAPI — Model Serving
  • REST APIs for ML models
  • Pydantic validation
  • Async endpoints & batch inference
  • Swagger / OpenAPI docs
CI/CD
  • GitHub Actions workflows
  • Automated Testing (pytest)
  • Linting (ruff) & Formatting (black)
  • Pre-commit Hooks

Cloud & Big Data

AWS Fundamentals
  • IAM (Users, Roles, Policies)
  • S3 (Object Storage, Buckets)
  • EC2 (Virtual Machines)
  • Lambda (Serverless)
  • SageMaker (Training & Endpoints)
  • ECR (Container Registry)
  • CloudWatch (Monitoring)
Distributed Computing
  • Apache Spark (PySpark DataFrames)
  • Data Warehousing (Redshift, BigQuery)
  • ETL vs. ELT patterns
Orchestration
  • Apache Airflow (DAGs, Operators)
  • Prefect (modern alternative)

Production ML Lifecycle

Experiment Tracking
  • MLflow (Tracking, Registry, Serving)
  • Weights & Biases (W&B)
  • DVC (Data Version Control)
Model Monitoring
  • Data Drift (Evidently AI)
  • Concept Drift
  • Performance Monitoring & alerting
Deployment Strategies
  • Shadow Deployment
  • Canary Releases
  • Blue/Green Deployment
  • A/B Testing in Production
Responsible AI
  • Fairness & Bias Auditing
  • Model Cards & Documentation
  • AI Governance & Compliance
Final Grand Capstone
  • Continuous Training Pipeline: auto-fetch → retrain → evaluate → auto-deploy (GitHub Actions + AWS)

Books & Resources

Curated by phase: books, courses, and platforms widely recommended by practicing engineers.

Phase 0–1: Foundation Books

Automate the Boring Stuff with Python (FREE)
Al Sweigart
The perfect first Python book. Practical and freely available online.
Python for Data Analysis (ESSENTIAL)
Wes McKinney (creator of Pandas)
The bible of data manipulation — NumPy, Pandas, and data cleaning.
Naked Statistics
Charles Wheelan
Statistics made intuitive and fun. No heavy math required.
Storytelling with Data
Cole Nussbaumer Knaflic
Master data visualization and compelling presentations.
Data Science for Business
Foster Provost & Tom Fawcett
Bridges business thinking and data-driven decision-making.

Phase 2–3: Math & ML Books

Mathematics for Machine Learning (FREE PDF)
Deisenroth, Faisal & Ong
Linear algebra, calculus, probability — written specifically for ML.
An Introduction to Statistical Learning (ISLR) (FREE PDF)
James, Witten, Hastie & Tibshirani
Gold standard ML theory intro. A Python edition (ISLP) is also available.
Hands-On ML with Scikit-Learn, Keras & TF (ESSENTIAL)
Aurélien Géron
The most practical ML book. Classical ML + Deep Learning end-to-end.
Pattern Recognition & Machine Learning (ADVANCED)
Christopher Bishop
Heavy mathematical ML theory. The gold reference for serious learners.
The Hundred-Page Machine Learning Book
Andriy Burkov
Concise but rigorous. Perfect for revision and interview prep.

Phase 4–5: Deep Learning & MLOps

Deep Learning (FREE ONLINE)
Goodfellow, Bengio & Courville
The comprehensive DL textbook. Heavy theory — the industry standard.
Deep Learning with Python (ESSENTIAL)
François Chollet (creator of Keras)
Practical DL with Keras. Intuitive, beautifully written.
NLP with Transformers (ESSENTIAL)
Tunstall, von Werra & Wolf (HuggingFace)
THE book on Transformers. Written by the HuggingFace team.
Designing ML Systems
Chip Huyen
Production ML — deployment, monitoring, and system design.
Machine Learning Engineering
Andriy Burkov
MLOps best practices from prototype to production.

Online Courses

Machine Learning Specialization (ESSENTIAL)
Andrew Ng — DeepLearning.AI / Stanford (Coursera)
One of the most popular ML courses. Rebuilt in 2022 with Python.
Deep Learning Specialization (ESSENTIAL)
Andrew Ng — DeepLearning.AI (Coursera)
5-course specialization. CNNs, RNNs, Transformers, and more.
Practical Deep Learning for Coders (FREE)
Jeremy Howard — fast.ai
Top-down practical approach. Build real models from day one.
MIT 6.S191: Intro to Deep Learning (FREE)
MIT
Research-level, concise. Covers latest architectures yearly.
CS50 AI with Python (FREE)
Harvard (edX)
Search, optimization, neural nets, NLP — excellent fundamentals.
Google ML Crash Course (FREE)
Google Developers
ML fundamentals with TensorFlow and interactive Colab notebooks.

Practice Platforms

Kaggle (FREE)
Competitions, Datasets, Micro-Courses, Notebooks
The #1 data science competition platform. Free GPU/TPU access.
LeetCode
DSA & Algorithm Practice
Essential for coding interviews. Focus on Blind 75 + NeetCode 150.
StrataScratch / DataLemur
SQL Practice Platforms
Real SQL interview questions from FAANG companies.
HuggingFace (FREE)
Models, Datasets, Spaces
The GitHub of AI. Pre-trained models, datasets, and deployment.
Papers With Code (FREE)
Research Papers + Implementations
Every ML paper with its code. Stay at the cutting edge.

YouTube Channels

3Blue1Brown
Grant Sanderson
The best math visualizations on the internet. Linear Algebra & Calculus series are legendary.
StatQuest
Josh Starmer
Statistics & ML explained with crystal clarity. BAM!
Sentdex
Harrison Kinsley
Python, ML, Deep Learning — practical project-based tutorials.
Andrej Karpathy
Former Tesla AI Director
Neural Networks from scratch. "Let's build GPT" is a masterpiece.
Yannic Kilcher
ML Paper Reviews
Deep dives into cutting-edge research papers explained clearly.
Two Minute Papers
Károly Zsolnai-Fehér
Quick exciting overviews of latest AI research breakthroughs.

Cutting-Edge Topics (2025–2026)

AI Agents & Agentic Workflows
Graph Neural Networks (GNNs)
Reinforcement Learning (PPO, DQN)
Federated & Privacy-Preserving ML
AutoML (Auto-Sklearn, FLAML, H2O)
Explainable AI (XAI)
Edge AI & TinyML
Zero-Shot / Few-Shot Learning
Vector Databases (Pinecone, Weaviate)
Text-to-Image & Text-to-Video
Multimodal Foundation Models
AI Ethics, Safety & Alignment