Program Dates
July 8, 2025 - Sept 25, 2025


Cohort Sessions: Tuesdays and Thursdays

6:30 pm - 8:00 pm CST

Sessions will be hosted on Google Meet with live teaching, Q&A, and labs


How it Works


You will have access to exclusive, curated training videos from previous cohorts, followed by live coaching










Hi, I’m Mezue, your instructor




I am a seasoned Azure Databricks Data Engineer with 10 years of experience. I have consulted for clients across different industries, and I currently work at Capgemini, a top IT consulting firm.

Weekly Topics


✅ Week 1: SQL for Data Engineering

  • Advanced SQL: joins, window functions, CTEs, aggregations
  • Query optimization and performance tips
  • Hands-on SQL exercises using realistic datasets

🛠️ Lab: Clean and join customer + transaction tables using complex SQL
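
To give a flavor of this lab, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. The table names, columns, and sample rows are made up for illustration and are not the course datasets. It assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE transactions (txn_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, ' Ada '), (2, 'Grace'), (2, 'Grace');
    INSERT INTO transactions VALUES (10, 1, 50.0), (11, 1, 70.0), (12, 2, 20.0);
""")

query = """
WITH clean_customers AS (          -- CTE: trim names and drop duplicate rows
    SELECT DISTINCT customer_id, TRIM(name) AS name
    FROM customers
)
SELECT c.name,
       t.txn_id,
       t.amount,
       SUM(t.amount) OVER (        -- window function: running total per customer
           PARTITION BY c.customer_id ORDER BY t.txn_id
       ) AS running_total
FROM clean_customers AS c
JOIN transactions AS t ON t.customer_id = c.customer_id
ORDER BY c.customer_id, t.txn_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The CTE handles the cleaning step, the join attaches transactions to customers, and the window function adds a per-customer running total without collapsing rows the way GROUP BY would.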

✅ Week 2: Python for Data Engineering

  • Python core concepts for data engineers
  • Working with files, APIs, exceptions
  • Intro to Pandas vs PySpark

🛠️ Lab: Write a script to pull data from an API and clean using Pandas
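
As a rough preview of the shape of such a script, here is a sketch that uses only the Python standard library, so it runs without installing anything; the lab itself names Pandas for the cleaning step, which this sketch does not use. The function names, the field names (`name`, `amount`), and the canned payload are all hypothetical.

```python
import json
import urllib.request


def fetch_records(url: str, timeout: float = 10.0) -> list:
    """Pull a JSON array from an API endpoint (URL supplied by the caller)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # URLError subclasses OSError; JSONDecodeError subclasses ValueError.
        raise RuntimeError(f"could not fetch {url}: {exc}") from exc


def clean_records(records: list) -> list:
    """Trim names, coerce amounts to float, and skip rows that cannot be parsed."""
    cleaned = []
    for rec in records:
        try:
            cleaned.append({
                "name": str(rec["name"]).strip().title(),
                "amount": float(rec["amount"]),
            })
        except (KeyError, TypeError, ValueError):
            continue  # malformed row: missing field or non-numeric amount
    return cleaned


# Canned payload standing in for an API response, so the sketch runs offline.
sample = json.loads('[{"name": " ada ", "amount": "12.5"},'
                    ' {"name": "grace", "amount": null},'
                    ' {"name": "linus"}]')
print(clean_records(sample))
```

Against a real endpoint, `clean_records(fetch_records(url))` would chain the two steps; only the first sample row survives cleaning because the other two lack a usable amount.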

✅ Week 3: Introduction to Azure Data Engineering

  • Overview of cloud data platforms
  • What Azure is and the role of a Data Engineer
  • Azure services ecosystem: Synapse, Data Factory, Databricks, ADLS, Key Vault, Azure Monitor
  • Resource Groups, IAM, Networking basics

🛠️ Mini-project: Create and provision Azure resources (Databricks, ADLS Gen2, Key Vault)



✅ Week 3 (continued): Deep Dive into Azure Storage (ADLS Gen2)

  • ADLS Gen2 architecture
  • Introduction to Big Data
  • File formats: CSV, JSON, Parquet, Avro, Delta
  • Storage security (access keys, SAS tokens, service principals)
  • Mounting ADLS in Databricks

🛠️ Hands-on: Upload raw datasets and explore access/mounting via Databricks

✅ Week 4: Introduction to Azure Data Factory (ELT)

  • Data Factory basics: linked services, datasets, copy activity
  • Extracting data from APIs and databases into ADLS
  • Loading data from JSON/CSV files in Blob Storage into a database or Delta table

🛠️ Lab: Use ADF to load CSV/JSON from the web into a raw ADLS folder



✅ Week 5: PySpark Essentials in Databricks

  • Spark execution model
  • Distributed computing with Spark
  • Spark DataFrames, RDDs
  • Transformations and actions
  • UDFs and performance tips
  • Schema inference and partitioning

🛠️ Lab: Load data from ADLS and perform batch transforms using PySpark

✅ Week 6: Databricks Delta Lake and Incremental Processing

  • Delta Lake concepts (ACID, versioning, schema evolution)
  • Time Travel, Upserts (MERGE)
  • Change Data Capture (CDC) design
  • Designing efficient incremental loads
  • Delta vs Parquet: differences, benefits
  • VACUUM and OPTIMIZE
  • Medallion architecture: Bronze, Silver, Gold design

🛠️ Project: Implement full + incremental load using Delta Lake
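
The idea behind a full load followed by incremental loads can be sketched without a Spark cluster. The example below is not Delta Lake: SQLite's `INSERT ... ON CONFLICT DO UPDATE` (SQLite 3.24 or later) stands in for MERGE, and a simple `updated_at` watermark decides which rows count as changed. The table, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE target (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
)


def incremental_load(source_rows, last_watermark):
    """Upsert only rows changed after last_watermark; return the new watermark."""
    changed = [r for r in source_rows if r[2] > last_watermark]
    conn.executemany(
        """
        INSERT INTO target (id, status, updated_at) VALUES (?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            status = excluded.status,
            updated_at = excluded.updated_at
        """,
        changed,
    )
    conn.commit()
    return max((r[2] for r in changed), default=last_watermark)


# Full load: an empty watermark sorts before every ISO date string.
day1 = [(1, "new", "2025-07-08"), (2, "new", "2025-07-08")]
watermark = incremental_load(day1, "")

# Incremental load: row 2 changed, row 3 is new, row 1 is untouched.
day2 = day1[:1] + [(2, "shipped", "2025-07-09"), (3, "new", "2025-07-09")]
watermark = incremental_load(day2, watermark)

result = conn.execute("SELECT * FROM target ORDER BY id").fetchall()
print(result, watermark)
```

The second run touches only the two rows newer than the stored watermark, which is the property that keeps incremental loads cheap as the source grows.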

✅ Week 7: Advanced Databricks - Unity Catalog & Data Governance

  • What is Unity Catalog? Benefits
  • Managing users, permissions, and access control
  • Lineage tracking and auditing

🛠️ Lab: Register Delta tables to Unity Catalog with access controls



✅ Week 8: Spark Streaming in Databricks - Auto Loader & Ingestion Patterns

  • File-based incremental ingestion using Auto Loader
  • File notifications vs directory listing
  • Checkpointing and schema evolution in Auto Loader

🛠️ Lab: Set up Auto Loader to ingest JSON data into a Bronze Delta table

  • Structured Streaming concepts
  • Triggers, watermarks, streaming joins
  • Sink formats: Delta, console, memory

🛠️ Lab: Stream Event Hubs data into the Bronze layer using Auto Loader & Delta




✅ Week 9: Data Modeling & Data Warehousing

  • OLTP vs OLAP
  • Dimensional modeling: Star vs Snowflake schemas
  • Slowly Changing Dimensions (SCD Types)
  • Data vault and modern alternatives

  • Choosing the right grain, surrogate keys


🛠️ Exercise: Design a dimensional model for a sample e-commerce dataset
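
To make surrogate keys and Slowly Changing Dimensions concrete, here is a small plain-Python sketch of a Type 2 change on a customer dimension. The class, field names, and sample values are hypothetical, and a real warehouse would do this in SQL or Spark rather than on an in-memory list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CustomerDim:
    surrogate_key: int        # warehouse-generated key, unique per version
    customer_id: str          # natural (business) key from the source system
    city: str                 # the tracked attribute
    valid_from: str
    valid_to: Optional[str]   # None marks the current version


def apply_scd2(dim: List[CustomerDim], customer_id: str, city: str, as_of: str) -> None:
    """Type 2 change: close the current row and append a new version."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == city:
            return                      # nothing changed, keep the current row
        current.valid_to = as_of        # expire the old version
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    dim.append(CustomerDim(next_key, customer_id, city, as_of, None))


dim: List[CustomerDim] = []
apply_scd2(dim, "C-100", "Austin", "2025-01-01")
apply_scd2(dim, "C-100", "Dallas", "2025-06-01")
apply_scd2(dim, "C-100", "Dallas", "2025-07-01")   # no-op: city unchanged
for row in dim:
    print(row)
```

The same customer ends up with two rows and two surrogate keys, so facts recorded before the move can still point at the Austin version.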



✅ Week 10: Orchestration & Monitoring + CI/CD

  • ADF pipelines + triggers + parameterization
  • Invoking Databricks notebooks from ADF
  • Databricks Workflows
  • ADF + Git integration
  • Deploying pipelines via Azure DevOps
  • Logging, alerting, retry policies

🛠️ Project: Full orchestration from raw → gold using ADF + Databricks
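
Orchestration tools generally expose retries as settings rather than code, but the underlying idea is easy to show. The sketch below is a generic retry loop with exponential backoff in plain Python; it is not how ADF or Databricks implements retries, and `run_with_retry` and `flaky_copy` are hypothetical names.

```python
import time


def run_with_retry(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run task(); on failure wait base_delay * 2**(attempt - 1) and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_attempts:
                raise                     # out of attempts: surface the error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay}s")
            sleep(delay)


# Hypothetical flaky task: fails twice, then succeeds.
calls = {"count": 0}


def flaky_copy():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "copied"


delays = []
# Record the delays instead of actually sleeping, so the demo runs instantly.
outcome = run_with_retry(flaky_copy, sleep=delays.append)
print(outcome, delays)
```

Passing the sleep function as a parameter keeps the loop easy to test; in production the default `time.sleep` applies the real waits.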



✅ Week 11: Capstone Project - Architecture & Build

  • Teams select domain (finance, IoT, sales)
  • Apply Medallion architecture
  • Use Auto Loader, Delta, Unity Catalog, ADF
  • Document and present architecture

🛠️ Goal: Production-quality pipeline with reusable patterns



✅ Week 12: Final Reviews + Job Preparation

  • Capstone project demos and critique
  • Resume reviews + LinkedIn optimization
  • Python algorithm drills (loops, recursion, dict, list ops)
  • Mock interviews with feedback

🛠️ Prep: Whiteboarding 2-3 Python challenges + system design questions
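
For a sense of the level, here are two illustrative drills covering recursion, list operations, and dict-style counting. They are examples of the genre chosen for this page, not actual problems from the course.

```python
from collections import Counter


def flatten(nested):
    """Recursively flatten arbitrarily nested lists into one flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))   # recursion handles deeper levels
        else:
            flat.append(item)
    return flat


def top_k_words(text, k):
    """Return the k most frequent words, ties broken alphabetically."""
    counts = Counter(text.lower().split())
    return sorted(counts, key=lambda w: (-counts[w], w))[:k]


print(flatten([1, [2, [3, 4]], 5]))
print(top_k_words("spark delta spark adf delta spark", 2))
```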






Register NOW

Join the Summer 2025 Cohort NOW