Bootcamp Azure Data Engineering

Program Starts
July 8, 2025 - Sept 25, 2025

Cohort Sessions: Tuesdays and Thursdays

6:30 pm - 8 pm CST

Sessions will be hosted on Google Meet with live teaching, Q&A, and labs.

How it Works

You will have access to exclusive, curated training videos from previous cohorts, followed by live coaching.

Hi, I’m Mezue, your instructor

I am a seasoned Azure Databricks Data Engineer with 10 years of experience. I have consulted for clients across different industries, and I am currently employed at Capgemini, a top IT consulting firm.

Weekly Topics

✅ Week 1: SQL for Data Engineering

Advanced SQL: joins, window functions, CTEs, aggregations
Query optimization and performance tips
Hands-on SQL exercises using realistic datasets

🛠️ Lab: Clean and join customer + transaction tables using complex SQL
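
As a rough preview of the kind of query this lab works toward, here is a minimal sketch that combines a CTE, a join, and a window function. It runs against an in-memory SQLite database through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The tables, columns, and rows are hypothetical and are not the course datasets.

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transactions (
        txn_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL,
        txn_date TEXT
    );
    INSERT INTO customers VALUES (1, ' Ada '), (2, 'Grace');
    INSERT INTO transactions VALUES
        (10, 1, 50.0, '2025-07-01'),
        (11, 1, 75.0, '2025-07-03'),
        (12, 2, 20.0, '2025-07-02'),
        (13, 2, NULL, '2025-07-04');
""")

query = """
WITH clean_txn AS (              -- CTE: drop transactions with no amount
    SELECT txn_id, customer_id, amount, txn_date
    FROM transactions
    WHERE amount IS NOT NULL
)
SELECT
    TRIM(c.name) AS name,        -- clean stray whitespace in names
    t.txn_date,
    t.amount,
    SUM(t.amount) OVER (         -- window function: running total per customer
        PARTITION BY c.customer_id ORDER BY t.txn_date
    ) AS running_total
FROM customers AS c
JOIN clean_txn AS t ON t.customer_id = c.customer_id
ORDER BY c.customer_id, t.txn_date;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The CTE keeps the cleaning step separate from the join, which tends to make longer queries easier to read and test.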

✅ Week 2: Python for Data Engineering

Python core concepts for data engineers
Working with files, APIs, exceptions
Intro to Pandas vs PySpark

🛠️ Lab: Write a script to pull data from an API and clean using Pandas
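
The sketch below illustrates the shape of such a script using only the Python standard library, as a stand-in for the Pandas-based version built in the lab. The function names (fetch_json, clean_records) and the record fields are hypothetical, and a canned payload replaces a real API response so the example runs without network access.

```python
import json
import urllib.error
import urllib.request


def fetch_json(url, timeout=10):
    """Return parsed JSON from url, or None if the request or parsing fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, TimeoutError, ValueError) as exc:
        print(f"request failed: {exc}")
        return None


def clean_records(records):
    """Trim names, cast amounts to float, and skip rows that cannot be repaired."""
    cleaned = []
    for rec in records:
        name = (rec.get("name") or "").strip()
        try:
            amount = float(rec.get("amount"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric amount
        if name:
            cleaned.append({"name": name, "amount": amount})
    return cleaned


# Canned payload standing in for a real API response (hypothetical fields).
payload = json.loads(
    '[{"name": " Ada ", "amount": "12.5"},'
    ' {"name": "Grace", "amount": null},'
    ' {"name": "", "amount": "3"}]'
)
result = clean_records(payload)
print(result)
```

In a Pandas version, the cleaning loop would typically become a few DataFrame operations, while the exception handling around the request stays much the same.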

✅ Week 3: Introduction to Azure Data Engineering

Overview of cloud data platforms
What is Azure, role of a Data Engineer
Azure services ecosystem: Synapse, Data Factory, Databricks, ADLS, Key Vault, Azure Monitor
Resource Groups, IAM, Networking basics

🛠️ Mini-project: Create and provision Azure resources (Databricks, ADLS Gen2, Key Vault)

✅ Week 3: Deep Dive into Azure Storage (ADLS Gen2)

ADLS Gen2 architecture
Introduction to Big Data
File formats: CSV, JSON, Parquet, Avro, Delta
Storage security (access keys, SAS tokens, service principals)
Mounting ADLS in Databricks

🛠️ Hands-on: Upload raw datasets and explore access/mounting via Databricks

✅ Week 4: Introduction to Azure Data Factory (ELT)

Data Factory basics: linked services, datasets, copy activity
Extracting data from APIs and databases into ADLS
Loading data from JSON/CSV files in Blob Storage into a database or Delta table

🛠️ Lab: Use ADF to load CSV/JSON from web into raw ADLS folder

✅ Week 5: PySpark Essentials in Databricks

Spark execution model
Distributed computing with Spark
Spark DataFrames, RDDs, and schema inference
Partitioning, transformations, and actions
UDFs and performance tips

🛠️ Lab: Load data from ADLS and perform batch transforms using PySpark

✅ Week 6: Databricks Delta Lake and Incremental Processing

Delta Lake concepts (ACID, versioning, schema evolution)
Upserts (MERGE)
Change Data Capture (CDC) design
Designing efficient incremental loads
Delta vs Parquet: differences, benefits
VACUUM, OPTIMIZE, Time Travel
Medallion architecture: Bronze, Silver, Gold design

🛠️ Project: Implement full + incremental load using Delta Lake
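
To make the idea of an incremental load concrete, here is a minimal plain-Python sketch of a high-watermark upsert. It only mimics the semantics of a MERGE (update matching keys, insert new ones) on an in-memory dict and does not use any Delta Lake API. The function name, the row fields, and the ISO-date watermark are assumptions made for illustration.

```python
def incremental_upsert(target, source_rows, last_watermark):
    """Apply rows changed after last_watermark to target; return the new watermark.

    target:      dict keyed by row id (stands in for the destination table)
    source_rows: iterable of dicts with "id" and "updated_at" (ISO date string)
    """
    new_watermark = last_watermark
    for row in source_rows:
        if row["updated_at"] <= last_watermark:
            continue  # already loaded in a previous run
        target[row["id"]] = row  # update if the key exists, insert otherwise
        new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark


# Hypothetical state after an earlier full load.
target = {1: {"id": 1, "status": "new", "updated_at": "2025-07-01"}}
source = [
    {"id": 1, "status": "new", "updated_at": "2025-07-01"},      # unchanged
    {"id": 1, "status": "shipped", "updated_at": "2025-07-03"},  # update
    {"id": 2, "status": "new", "updated_at": "2025-07-02"},      # insert
]

watermark = incremental_upsert(target, source, "2025-07-01")
print(watermark, target)
```

Storing the returned watermark between runs is what lets the next run skip rows it has already processed.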

✅ Week 7: Advanced Databricks - Unity Catalog & Data Governance

What is Unity Catalog? Benefits
Managing users, permissions, and access control
Lineage tracking and auditing

🛠️ Lab: Register Delta tables to Unity Catalog with access controls

✅ Week 8: Spark Streaming in Databricks - Auto Loader & Ingestion Patterns

File-based incremental ingestion using Auto Loader
File notifications vs directory listing
Checkpointing and schema evolution in Auto Loader

🛠️ Lab: Set up Auto Loader to ingest JSON data into Bronze Delta table

Structured Streaming concepts
Triggers, watermarks, streaming joins
Sink formats: Delta, console, memory

🛠️ Lab: Stream Event Hubs data into the Bronze layer using Auto Loader & Delta

✅ Week 9: Data Modeling & Data Warehousing

OLTP vs OLAP
Dimensional modeling: Star vs Snowflake schemas
Slowly Changing Dimensions (SCD Types)
Data Vault and modern alternatives
Choosing the right grain, surrogate keys

🛠️ Exercise: Design a dimensional model for a sample e-commerce dataset
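
As one illustration of the topics above, the following sketch applies Slowly Changing Dimension Type 2 logic to a small customer dimension in plain Python: a changed attribute closes the current row and adds a new version under a fresh surrogate key. The column names (sk, customer_id, city, valid_from, valid_to, is_current) and the apply_scd2 helper are hypothetical. In the course stack this logic would more likely be expressed in SQL or PySpark.

```python
from datetime import date


def apply_scd2(dimension, incoming, load_date):
    """Return a new dimension list with Type 2 history applied.

    dimension: list of row dicts (sk, customer_id, city, valid_from, valid_to, is_current)
    incoming:  dict mapping customer_id -> latest city from the source system
    """
    next_sk = max((row["sk"] for row in dimension), default=0) + 1
    result = []
    seen = set()

    def new_version(key, city):
        nonlocal next_sk
        row = {"sk": next_sk, "customer_id": key, "city": city,
               "valid_from": load_date, "valid_to": None, "is_current": True}
        next_sk += 1
        return row

    for row in dimension:
        key = row["customer_id"]
        if row["is_current"]:
            seen.add(key)
        if row["is_current"] and key in incoming and incoming[key] != row["city"]:
            # Close the old version, then add the new current version.
            result.append({**row, "valid_to": load_date, "is_current": False})
            result.append(new_version(key, incoming[key]))
        else:
            result.append(dict(row))

    # Business keys with no current row get their first version.
    for key, city in incoming.items():
        if key not in seen:
            result.append(new_version(key, city))
    return result


dim = [{"sk": 1, "customer_id": "C1", "city": "Austin",
        "valid_from": date(2025, 1, 1), "valid_to": None, "is_current": True}]
updated = apply_scd2(dim, {"C1": "Dallas", "C2": "Houston"}, date(2025, 7, 8))
for row in updated:
    print(row)
```

The surrogate key (sk) identifies each version of a customer, while customer_id stays the stable business key shared by all versions.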

✅ Week 10: Orchestration & Monitoring + CI/CD

ADF pipelines + triggers + parameterization
Invoking Databricks notebooks from ADF
Databricks Workflows
ADF + Git integration
Deploying pipelines via Azure DevOps
Logging, alerting, retry policies

🛠️ Project: Full orchestration from raw → gold using ADF + Databricks
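
The logging and retry ideas from this week can be sketched in generic Python. This is a minimal illustration of a retry policy with exponential backoff, not the retry settings of ADF or Databricks Workflows. The run_with_retry helper, its parameters, and the flaky task are all hypothetical.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def run_with_retry(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run task(), retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_attempts:
                log.error("task failed after %d attempts: %s", attempt, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1x, 2x, 4x, ...
            log.warning("attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay)
            sleep(delay)


# Hypothetical task that fails twice before succeeding.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"


delays = []  # record delays instead of actually sleeping
outcome = run_with_retry(flaky, max_attempts=3, base_delay=1.0, sleep=delays.append)
print(outcome, delays)
```

Passing the sleep function as a parameter keeps the example fast to run and makes the backoff schedule easy to check in a test.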

✅ Week 11: Capstone Project – Architecture & Build

Teams select domain (finance, IoT, sales)
Apply Medallion architecture
Use Auto Loader, Delta, Unity Catalog, ADF
Document and present architecture

🛠️ Goal: Production-quality pipeline with reusable patterns

✅ Week 12: Final Reviews + Job Preparation

Capstone project demos and critique
Resume reviews + LinkedIn optimization
Python algorithm drills (loops, recursion, dict, list ops)
Mock interviews with feedback

🛠️ Prep: Whiteboarding 2-3 Python challenges + system design Qs

Fundamentals of Data Engineering

Cohort Summer 2025
