Program Starts
July 8, 2025 - Sept 25, 2025
Cohort sessions: Tuesdays and Thursdays
6:30 pm - 8:00 pm CST
Sessions are hosted on Google Meet with live teaching, Q&A, and labs
How it Works
You will have access to exclusive, curated training videos from previous cohorts, followed by live coaching
Hi, I'm Mezue, your instructor
I am a seasoned Azure Databricks Data Engineer with 10 years of experience. I have consulted for clients across different industries, and I am currently employed at Capgemini, a top IT consulting firm.
Weekly Topics
Week 1: SQL for Data Engineering
- Advanced SQL: joins, window functions, CTEs, aggregations
- Query optimization and performance tips
- Hands-on SQL exercises using realistic datasets
🛠️ Lab: Clean and join customer and transaction tables using complex SQL
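As a rough preview of the kind of query this lab works toward, the sketch below combines a CTE, a join, and a window function. It runs against an in-memory SQLite database (version 3.25 or later is assumed for window function support); the table names, column names, and sample rows are hypothetical and are not the course dataset.

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, name TEXT);
CREATE TABLE transactions (txn_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, ' Ada '), (2, 'Grace'), (2, 'Grace');
INSERT INTO transactions VALUES (10, 1, 20.0), (11, 1, 35.0), (12, 2, 15.0);
""")

QUERY = """
WITH clean_customers AS (          -- CTE: trim names and drop duplicate rows
    SELECT DISTINCT customer_id, TRIM(name) AS name
    FROM customers
)
SELECT c.name,
       t.txn_id,
       t.amount,
       SUM(t.amount) OVER (        -- window function: running total per customer
           PARTITION BY c.customer_id ORDER BY t.txn_id
       ) AS running_total
FROM clean_customers AS c
JOIN transactions AS t ON t.customer_id = c.customer_id
ORDER BY c.customer_id, t.txn_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Deduplicating inside the CTE before the join keeps the duplicate customer row from doubling that customer's transactions in the result.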
Week 2: Python for Data Engineering
- Python core concepts for data engineers
- Working with files, APIs, exceptions
- Intro to Pandas vs PySpark
🛠️ Lab: Write a script to pull data from an API and clean it using Pandas
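The shape of such a script can be sketched with the standard library alone. Note the swap: the lab itself uses Pandas, while this sketch uses plain Python so that it runs without extra packages. The field names (`id`, `email`) and the cleaning rules are assumptions chosen for illustration.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Download and decode a JSON payload; raises on network or HTTP errors."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def clean_records(records):
    """Drop rows without an id, normalize emails, and skip repeated ids."""
    cleaned, seen = [], set()
    for rec in records:
        user_id = rec.get("id")
        if user_id is None or user_id in seen:
            continue
        seen.add(user_id)
        email = (rec.get("email") or "").strip().lower()
        cleaned.append({"id": user_id, "email": email or None})
    return cleaned


# Hypothetical payload standing in for an API response.
sample = [
    {"id": 1, "email": " Ada@Example.COM "},
    {"id": 1, "email": "ada@example.com"},
    {"id": None, "email": "ghost@example.com"},
    {"id": 2, "email": None},
]
cleaned = clean_records(sample)
print(cleaned)
```

Against a real endpoint the two functions would be chained, for example `clean_records(fetch_json(url))`, with `url` pointing at whichever API the lab provides.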
Week 3: Introduction to Azure Data Engineering
- Overview of cloud data platforms
- What Azure is and the role of a Data Engineer
- Azure services ecosystem: Synapse, Data Factory, Databricks, ADLS, Key Vault, Azure Monitor
- Resource Groups, IAM, Networking basics
🛠️ Mini-project: Create and provision Azure resources (Databricks, ADLS Gen2, Key Vault)
Week 3: Deep Dive into Azure Storage (ADLS Gen2)
- ADLS Gen2 architecture
- Introduction to Big Data
- File formats: CSV, JSON, Parquet, Avro, Delta
- Storage security (access keys, SAS tokens, service principals)
- Mounting ADLS in Databricks
🛠️ Hands-on: Upload raw datasets and explore access/mounting via Databricks
Week 4: Introduction to Azure Data Factory (ELT)
- Data Factory basics: linked services, datasets, copy activity
- Extracting data from APIs and databases into ADLS
- Loading data from JSON/CSV files in Blob Storage into a database or Delta table
🛠️ Lab: Use ADF to load CSV/JSON from the web into a raw ADLS folder
Week 5: PySpark Essentials in Databricks
- Spark execution model
- Distributed computing with Spark
- Spark DataFrames and RDDs
- Transformations and actions
- UDFs and performance tips
- Schema inference
- Partitioning
🛠️ Lab: Load data from ADLS and perform batch transforms using PySpark
Week 6: Databricks Delta Lake and Incremental Processing
- Delta Lake concepts (ACID, versioning, schema evolution)
- Time Travel, Upserts (MERGE)
- Change Data Capture (CDC) design
- Designing efficient incremental loads
- Delta vs Parquet: differences, benefits
- Table maintenance: VACUUM, OPTIMIZE
- Medallion architecture: Bronze, Silver, Gold design
🛠️ Project: Implement full + incremental load using Delta Lake
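The idea behind an incremental load with an upsert can be previewed in plain Python before touching Delta Lake. The sketch below is a simulation of the semantics only, not the Delta Lake `MERGE` API; the function names, the `updated_at` watermark column, and the sample rows are all hypothetical.

```python
from datetime import datetime


def incremental_batch(source, last_watermark, ts_col="updated_at"):
    """Select only source rows changed after the previous load's watermark."""
    return [row for row in source if row[ts_col] > last_watermark]


def merge_upsert(target, updates, key="id"):
    """Return new rows: matching keys are updated, unseen keys are inserted."""
    merged = {row[key]: row for row in target}
    for row in updates:
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())


# Hypothetical target table after the initial full load.
target = [
    {"id": 1, "status": "new", "updated_at": datetime(2025, 1, 1)},
    {"id": 2, "status": "new", "updated_at": datetime(2025, 1, 1)},
]
# Hypothetical source system state one day later.
source = [
    {"id": 1, "status": "new", "updated_at": datetime(2025, 1, 1)},
    {"id": 2, "status": "shipped", "updated_at": datetime(2025, 1, 2)},
    {"id": 3, "status": "new", "updated_at": datetime(2025, 1, 2)},
]

watermark = max(row["updated_at"] for row in target)  # high-water mark of the last load
batch = incremental_batch(source, watermark)          # only ids 2 and 3 qualify
target = merge_upsert(target, batch)                  # id 2 updated, id 3 inserted
print(target)
```

Filtering by watermark first keeps the merge small: unchanged rows such as id 1 are never reprocessed.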
Week 7: Advanced Databricks - Unity Catalog & Data Governance
- What is Unity Catalog? Benefits
- Managing users, permissions, and access control
- Lineage tracking and auditing
🛠️ Lab: Register Delta tables to Unity Catalog with access controls
Week 8: Spark Streaming in Databricks - Auto Loader & Ingestion Patterns
- File-based incremental ingestion using Auto Loader
- File notifications vs directory listing
- Checkpointing and schema evolution in Auto Loader
🛠️ Lab: Set up Auto Loader to ingest JSON data into a Bronze Delta table
- Structured Streaming concepts
- Triggers, watermarks, streaming joins
- Sink formats: Delta, console, memory
🛠️ Lab: Stream Event Hubs data into Bronze layer using Auto Loader & Delta
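Checkpointing with directory listing, as covered this week, boils down to remembering which files were already processed so that each run picks up only new arrivals. The sketch below imitates that behavior with the standard library; it is a conceptual stand-in, not Auto Loader itself, and the function name, folder layout, and checkpoint file format are assumptions.

```python
import json
import tempfile
from pathlib import Path


def ingest_new_files(landing_dir, checkpoint_path):
    """Read JSON files not yet recorded in the checkpoint; return their records."""
    checkpoint = Path(checkpoint_path)
    seen = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    records = []
    for path in sorted(Path(landing_dir).glob("*.json")):  # directory listing
        if path.name in seen:
            continue  # already ingested in an earlier run
        records.extend(json.loads(path.read_text()))
        seen.add(path.name)
    checkpoint.write_text(json.dumps(sorted(seen)))        # persist progress
    return records


with tempfile.TemporaryDirectory() as tmp:
    landing = Path(tmp) / "landing"
    landing.mkdir()
    checkpoint_file = Path(tmp) / "checkpoint.json"

    (landing / "a.json").write_text(json.dumps([{"event": 1}]))
    first_run = ingest_new_files(landing, checkpoint_file)   # picks up a.json

    (landing / "b.json").write_text(json.dumps([{"event": 2}]))
    second_run = ingest_new_files(landing, checkpoint_file)  # picks up only b.json

    third_run = ingest_new_files(landing, checkpoint_file)   # nothing new

print(first_run, second_run, third_run)
```

Listing the whole directory on every run gets slower as the folder grows, which is the motivation for the file notification mode listed in the topics.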
Week 9: Data Modeling & Data Warehousing
- OLTP vs OLAP
- Dimensional modeling: Star vs Snowflake schemas
- Slowly Changing Dimensions (SCD Types)
- Data vault and modern alternatives
- Choosing the right grain, surrogate keys
🛠️ Exercise: Design a dimensional model for a sample e-commerce dataset
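To make Slowly Changing Dimensions and surrogate keys more concrete, here is a minimal Type 2 sketch in plain Python: when a tracked attribute changes, the current row is expired and a new version is appended under a fresh surrogate key. The column names (`customer_sk`, `valid_from`, `valid_to`, `is_current`) are one common convention and are assumed here, as is the sample data.

```python
from datetime import date


def apply_scd2(dimension, incoming, today, key="customer_id", tracked=("city",)):
    """Apply SCD Type 2 changes to `dimension` in place and return it."""
    current = {row[key]: row for row in dimension if row["is_current"]}
    next_sk = max((row["customer_sk"] for row in dimension), default=0) + 1
    for rec in incoming:
        old = current.get(rec[key])
        if old is not None and all(old[col] == rec[col] for col in tracked):
            continue  # nothing changed, keep the current version
        if old is not None:
            old["valid_to"] = today  # expire the previous version
            old["is_current"] = False
        dimension.append({
            "customer_sk": next_sk,  # surrogate key, independent of the business key
            key: rec[key],
            **{col: rec[col] for col in tracked},
            "valid_from": today,
            "valid_to": None,
            "is_current": True,
        })
        next_sk += 1
    return dimension


# Hypothetical customer dimension with one existing row.
dim = [{
    "customer_sk": 1, "customer_id": 100, "city": "Austin",
    "valid_from": date(2025, 1, 1), "valid_to": None, "is_current": True,
}]
incoming = [
    {"customer_id": 100, "city": "Dallas"},   # changed attribute
    {"customer_id": 200, "city": "Houston"},  # brand-new customer
]
apply_scd2(dim, incoming, today=date(2025, 7, 8))
for row in dim:
    print(row)
```

Because history is kept as separate rows, fact tables can join on `customer_sk` and still report each sale against the city that was current at the time.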
Week 10: Orchestration & Monitoring + CI/CD
- ADF pipelines + triggers + parameterization
- Invoking Databricks notebooks from ADF
- Databricks Workflows
- ADF + Git integration
- Deploying pipelines via Azure DevOps
- Logging, alerting, retry policies
🛠️ Project: Full orchestration from raw → gold using ADF + Databricks
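Retry policies are configured in the orchestration tool rather than hand-written, but the underlying logic is easy to sketch. The example below retries a failing task with exponential backoff; the function names and the simulated failure are hypothetical, and the `sleep` parameter is injectable so the sketch runs instantly.

```python
import time


def run_with_retry(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task(); on failure wait base_delay * 2**(attempt - 1) seconds and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # out of attempts, surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay}s")
            sleep(delay)


# Hypothetical task that fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky_copy():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "succeeded"


delays = []  # record the waits instead of actually sleeping
result = run_with_retry(flaky_copy, max_attempts=4, base_delay=0.5, sleep=delays.append)
print(result, delays)
```

Doubling the delay between attempts gives a struggling source system time to recover instead of hammering it at a fixed interval.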
Week 11: Capstone Project - Architecture & Build
- Teams select domain (finance, IoT, sales)
- Apply Medallion architecture
- Use Auto Loader, Delta, Unity Catalog, ADF
- Document and present architecture
🛠️ Goal: Production-quality pipeline with reusable patterns
Week 12: Final Reviews + Job Preparation
- Capstone project demos and critique
- Resume reviews + LinkedIn optimization
- Python algorithm drills (loops, recursion, dict, list ops)
- Mock interviews with feedback
🛠️ Prep: Whiteboarding 2-3 Python challenges + system design questions
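Two drills of the kind listed above, one using recursion and one using dict and list operations, might look like the following. These are illustrative examples only, not the actual challenges used in the sessions.

```python
def flatten(nested):
    """Recursively flatten arbitrarily nested lists into one flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))  # recurse into the sub-list
        else:
            flat.append(item)
    return flat


def group_anagrams(words):
    """Group words that share the same letters, keyed by their sorted letters."""
    groups = {}
    for word in words:
        groups.setdefault("".join(sorted(word)), []).append(word)
    return list(groups.values())


print(flatten([1, [2, [3, 4]], 5]))
print(group_anagrams(["eat", "tea", "tan", "ate", "nat"]))
```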