1. SQL for Azure Data Engineers with Databricks
Presented by: [Your Name]
Team: [Your Team Name]
Date: [Presentation Date]
2. What is SQL?
• Structured Query Language (SQL): the standard language for querying and managing data in relational databases.
• The core query language for Azure SQL, Synapse Analytics, and Databricks.
3. SQL SELECT Statement
• Used to retrieve data from a table.
• Example:
• SELECT FirstName, LastName FROM Employees;
4. SQL WHERE Clause
• Filters rows based on a condition.
• Example:
• SELECT * FROM Sales WHERE Region = 'West';
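The SELECT and WHERE examples above can be tried without an Azure workspace. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in engine, and the Sales columns and sample rows are invented, not taken from a real dataset.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Azure SQL / Databricks (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (OrderID INTEGER, Region TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, "West", 120.0), (2, "East", 80.0), (3, "West", 45.5)],  # hypothetical rows
)

# SELECT picks the columns; WHERE keeps only the rows for the West region.
west_sales = conn.execute(
    "SELECT OrderID, Amount FROM Sales WHERE Region = 'West' ORDER BY OrderID"
).fetchall()
print(west_sales)
conn.close()
```

Naming the columns instead of using SELECT * keeps the result stable if the table later gains columns.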
5. SQL JOINs
• Used to combine rows from two or more tables.
• Example:
• SELECT o.OrderID, c.CustomerName FROM Orders o JOIN Customers c ON o.CustomerID = c.CustomerID;
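To make the effect of the join condition concrete, here is a small sketch that runs the same query against made-up data, using Python's sqlite3 as a stand-in engine (an assumption; the table contents are hypothetical). It also contrasts the default inner join with a LEFT JOIN, which keeps orders that have no matching customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, not Azure SQL or Databricks
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'Contoso'), (2, 'Fabrikam');
    -- Order 102 points at a customer that does not exist.
    INSERT INTO Orders VALUES (100, 1), (101, 1), (102, 3);
""")

# Inner join: only orders with a matching customer are returned.
inner_rows = conn.execute("""
    SELECT o.OrderID, c.CustomerName
    FROM Orders o
    JOIN Customers c ON o.CustomerID = c.CustomerID
    ORDER BY o.OrderID
""").fetchall()

# Left join: every order is returned; missing customers show up as NULL (None).
left_rows = conn.execute("""
    SELECT o.OrderID, c.CustomerName
    FROM Orders o
    LEFT JOIN Customers c ON o.CustomerID = c.CustomerID
    ORDER BY o.OrderID
""").fetchall()

print(inner_rows)
print(left_rows)
conn.close()
```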
6. SQL GROUP BY and HAVING
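• GROUP BY collects rows into groups so that aggregate functions (COUNT, SUM, AVG) return one value per group.
• HAVING filters groups after aggregation; WHERE filters rows before it.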
• Example:
• SELECT Region, COUNT(*) FROM Sales GROUP BY Region HAVING COUNT(*) > 10;
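A minimal sketch of the GROUP BY and HAVING query above, run on a handful of invented rows with Python's sqlite3 as a stand-in engine (an assumption). The threshold is lowered from 10 to 1 so the tiny sample produces a visible result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine for illustration only
conn.execute("CREATE TABLE Sales (Region TEXT)")
conn.executemany(
    "INSERT INTO Sales (Region) VALUES (?)",
    [("West",), ("West",), ("West",), ("East",), ("North",), ("North",)],
)

# One output row per region; HAVING then drops regions with a single sale.
busy_regions = conn.execute("""
    SELECT Region, COUNT(*) AS OrderCount
    FROM Sales
    GROUP BY Region
    HAVING COUNT(*) > 1
    ORDER BY Region
""").fetchall()
print(busy_regions)
conn.close()
```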
7. SQL INSERT, UPDATE, DELETE
• INSERT INTO Employees (Name, Role) VALUES ('Alex', 'Analyst');
• UPDATE Employees SET Role = 'Manager' WHERE Name = 'Alex';
• DELETE FROM Employees WHERE Name = 'Alex';
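The three statements above can be followed step by step in the sketch below. It assumes Python's sqlite3 as a stand-in engine and adds a second, hypothetical employee ("Sam") to show that the WHERE clause limits each change to the matching row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, assumption
conn.execute("CREATE TABLE Employees (Name TEXT, Role TEXT)")

def snapshot():
    """Return the current table contents in a stable order."""
    return conn.execute("SELECT Name, Role FROM Employees ORDER BY Name").fetchall()

conn.execute("INSERT INTO Employees (Name, Role) VALUES ('Alex', 'Analyst')")
conn.execute("INSERT INTO Employees (Name, Role) VALUES ('Sam', 'Engineer')")
after_insert = snapshot()

# rowcount reports how many rows the UPDATE touched.
updated = conn.execute(
    "UPDATE Employees SET Role = 'Manager' WHERE Name = 'Alex'"
).rowcount
after_update = snapshot()

# Without the WHERE clause this DELETE would empty the whole table.
conn.execute("DELETE FROM Employees WHERE Name = 'Alex'")
after_delete = snapshot()

print(after_insert, after_update, after_delete, sep="\n")
conn.close()
```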
8. SQL Window Functions
• Compute a value across related rows (rankings, running totals) without collapsing them into one row per group; supported in Databricks and Synapse.
• Example:
• SELECT Name, Salary, RANK() OVER (ORDER BY Salary DESC) AS SalaryRank FROM Employees;
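The behaviour of RANK() on ties is easiest to see with data. The sketch below assumes Python's sqlite3 (SQLite 3.25 or later, which added window functions) as a stand-in engine, with invented salaries; the alias SalaryRank is chosen here to avoid reusing the keyword RANK as a column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine; needs SQLite >= 3.25
conn.execute("CREATE TABLE Employees (Name TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Ana", 90000), ("Ben", 90000), ("Cy", 70000)],  # hypothetical rows
)

# Every input row is kept; RANK() adds a position based on salary.
# Tied salaries share a rank and the next rank is skipped (1, 1, 3).
ranked = conn.execute("""
    SELECT Name, Salary, RANK() OVER (ORDER BY Salary DESC) AS SalaryRank
    FROM Employees
    ORDER BY SalaryRank, Name
""").fetchall()
print(ranked)
conn.close()
```

If gaps after ties are not wanted, DENSE_RANK() is the usual alternative.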
9. SQL with Azure Databricks
• Use the `%sql` magic command to run SQL in a notebook cell whose default language is not SQL.
• Query Delta tables stored in ADLS:
• %sql SELECT * FROM delta.`/mnt/data/sales`;
10. Common Use Cases in Azure
• Ingest and validate data using SQL in ADF data flows.
• Run batch queries in Synapse and Databricks.
• Use SQL in Power BI for dashboards.
• Implement SCD (Slowly Changing Dimensions) using SQL.
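As one illustration of the SCD use case, the sketch below shows a Type 2 update: the current row is closed and a new version is inserted, so history is preserved. Everything here is an assumption made for the example: the DimCustomer table, its columns, the helper apply_scd2, and the use of Python's sqlite3 as a stand-in engine. On Delta tables the same idea is often expressed with MERGE INTO instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, assumption
conn.executescript("""
    CREATE TABLE DimCustomer (
        CustomerID INTEGER, City TEXT,
        ValidFrom TEXT, ValidTo TEXT, IsCurrent INTEGER
    );
    INSERT INTO DimCustomer VALUES (1, 'Seattle', '2023-01-01', NULL, 1);
""")

def apply_scd2(conn, customer_id, new_city, change_date):
    """Hypothetical helper: version an existing customer's city (SCD Type 2).

    Only handles customers that already have a current row; brand-new
    customers would need a separate insert.
    """
    with conn:  # one transaction: both statements commit or neither does
        closed = conn.execute(
            "UPDATE DimCustomer SET ValidTo = ?, IsCurrent = 0 "
            "WHERE CustomerID = ? AND IsCurrent = 1 AND City <> ?",
            (change_date, customer_id, new_city),
        ).rowcount
        if closed:  # insert a new version only when something changed
            conn.execute(
                "INSERT INTO DimCustomer VALUES (?, ?, ?, NULL, 1)",
                (customer_id, new_city, change_date),
            )
    return closed

first = apply_scd2(conn, 1, "Denver", "2024-06-01")   # city changed
second = apply_scd2(conn, 1, "Denver", "2024-06-01")  # no change, no new row

history = conn.execute(
    "SELECT City, ValidFrom, ValidTo, IsCurrent FROM DimCustomer ORDER BY ValidFrom"
).fetchall()
print(history)
conn.close()
```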
11. Data Types in SQL
• INT, VARCHAR, FLOAT, DATE, BOOLEAN (Databricks; Azure SQL and Synapse use BIT instead of BOOLEAN)
• Example:
• CREATE TABLE Customers (ID INT, Name VARCHAR(100), SignupDate DATE);
12. Creating and Altering Tables
• CREATE TABLE Orders (ID INT, Amount FLOAT);
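• ALTER TABLE Orders ADD COLUMN OrderDate DATE;
• In Azure SQL and Synapse (T-SQL), omit the COLUMN keyword: ALTER TABLE Orders ADD OrderDate DATE;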
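A quick way to see what ALTER TABLE does to existing data is to add the column after a row is already present. This sketch assumes Python's sqlite3 as a stand-in engine (it accepts the ADD COLUMN form shown above); the sample row is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, assumption
conn.execute("CREATE TABLE Orders (ID INT, Amount FLOAT)")
conn.execute("INSERT INTO Orders VALUES (1, 19.99)")

# Add a column to a table that already holds data.
conn.execute("ALTER TABLE Orders ADD COLUMN OrderDate DATE")

# PRAGMA table_info is SQLite-specific; column 1 of each row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Orders)")]

# Rows that existed before the change get NULL (None) in the new column.
existing = conn.execute("SELECT ID, Amount, OrderDate FROM Orders").fetchall()
print(columns, existing)
conn.close()
```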
13. Working with NULLs
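• Test for missing values with IS NULL and IS NOT NULL; a comparison such as = NULL never matches.
• COALESCE returns its first non-NULL argument, so it can substitute a default for NULLs.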
• Example:
• SELECT COALESCE(CreditScore, 0) FROM Customers;
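The sketch below runs the COALESCE example and also shows why IS NULL is needed instead of = NULL. It assumes Python's sqlite3 as a stand-in engine, with two invented customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, assumption
conn.execute("CREATE TABLE Customers (Name TEXT, CreditScore INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("Ana", 710), ("Ben", None)],  # None is stored as NULL
)

# COALESCE substitutes 0 wherever CreditScore is NULL.
scores = conn.execute(
    "SELECT Name, COALESCE(CreditScore, 0) FROM Customers ORDER BY Name"
).fetchall()

# IS NULL finds the missing value...
missing = conn.execute(
    "SELECT Name FROM Customers WHERE CreditScore IS NULL"
).fetchall()

# ...while '= NULL' evaluates to unknown and matches no rows.
equals_null = conn.execute(
    "SELECT Name FROM Customers WHERE CreditScore = NULL"
).fetchall()

print(scores, missing, equals_null)
conn.close()
```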
14. CTEs and Temp Tables
• CTE example (Databricks SQL; in Azure SQL and Synapse use SELECT TOP 10 instead of LIMIT 10):
• WITH TopSales AS (SELECT * FROM Sales ORDER BY Amount DESC LIMIT 10)
• SELECT * FROM TopSales;
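The slide title also mentions temp tables, so the sketch below puts the two side by side: a CTE exists only for the single statement that defines it, while a temp table persists for the session and can be queried repeatedly. Python's sqlite3 is assumed as a stand-in engine, and the table name TopSalesTmp and the sample amounts are hypothetical; temp-table syntax differs across Azure SQL, Synapse, and Databricks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, assumption
conn.execute("CREATE TABLE Sales (OrderID INTEGER, Amount REAL)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [(1, 50.0), (2, 300.0), (3, 120.0), (4, 200.0)],
)

# CTE: a named subquery scoped to this one statement.
# The outer ORDER BY makes the final row order explicit.
cte_rows = conn.execute("""
    WITH TopSales AS (
        SELECT OrderID, Amount FROM Sales ORDER BY Amount DESC LIMIT 2
    )
    SELECT OrderID, Amount FROM TopSales ORDER BY Amount DESC
""").fetchall()

# Temp table: materialized once, visible until the connection closes.
conn.execute("""
    CREATE TEMP TABLE TopSalesTmp AS
    SELECT OrderID, Amount FROM Sales ORDER BY Amount DESC LIMIT 2
""")
temp_count = conn.execute("SELECT COUNT(*) FROM TopSalesTmp").fetchone()[0]
temp_total = conn.execute("SELECT SUM(Amount) FROM TopSalesTmp").fetchone()[0]

print(cte_rows, temp_count, temp_total)
conn.close()
```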
15. SQL Best Practices for Azure Engineers
• Use parameterized queries in notebooks.
• Avoid SELECT * in production.
• Optimize joins and indexes in Azure SQL.
• Use Delta Lake for ACID transactions and schema enforcement.
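The first two practices can be sketched together: pass user input as a bound parameter instead of pasting it into the SQL string, and name the columns you need. The example assumes Python's sqlite3 as a stand-in engine, where the placeholder is "?"; the placeholder syntax differs by driver, and sales_for_region is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, assumption
conn.execute("CREATE TABLE Sales (OrderID INTEGER, Region TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, "West", 120.0), (2, "East", 80.0)],
)

def sales_for_region(conn, region):
    """Hypothetical helper: the region is bound as data, never parsed as SQL."""
    return conn.execute(
        "SELECT OrderID, Amount FROM Sales WHERE Region = ? ORDER BY OrderID",
        (region,),
    ).fetchall()

safe = sales_for_region(conn, "West")

# An injection-style input is compared as a literal string and matches nothing.
hostile = sales_for_region(conn, "West' OR '1'='1")

print(safe, hostile)
conn.close()
```

Had the input been concatenated into the query text, the second call would have returned every row.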