SQL for Data Scientists and Data Analysts: Advanced
1. What is Data?
Data is just information in its raw form. It can be numbers, words, pictures, or
anything that helps us understand something.
For example:
Your name and age are data.
A list of students' marks in a school is data.
A photo on your phone is also data.
2. Types of Data
Structured Data: Organized in rows and columns.
Unstructured Data: No predefined format.
Semi-structured Data: Contains some structure but is not strictly tabular.
3. What is a Database?
A database is a structured collection of data that allows for efficient
storage, retrieval, and management.
Key Features of Databases:
Organizes large amounts of data efficiently
Allows fast retrieval of specific data
Supports multi-user access
Ensures data integrity and security
Examples:
1. E-commerce databases (Amazon, Flipkart)
2. Banking databases (customer transactions)
4. Types of Databases
1. Relational Databases
Data is stored in structured tables with rows and columns.
Uses SQL for querying and data management.
Examples: MySQL, PostgreSQL, SQL Server.
2. Non-Relational Databases
Data is stored in various formats (key-value, document, graph).
Used for unstructured or semi-structured data.
Examples: MongoDB, Redis, Cassandra.
3. Object-Oriented Database
Data is stored in the form of objects (similar to OOP concepts).
Supports classes, inheritance, and encapsulation.
Examples: ObjectDB, db4o.
4. Hierarchical Database
Data is organized in a tree-like structure with parent-child relationships.
Each child record has only one parent.
Examples: IBM IMS, Windows Registry.
5. Network Database
Allows complex relationships with many-to-many links.
Data is organized in a graph where entities can have multiple relationships.
Examples: Integrated Data Store (IDS), TurboIMAGE.
6. Graph Database
Focuses on relationships using nodes and edges.
Ideal for social networks, recommendation engines, etc.
Examples: Neo4j, Amazon Neptune.
7. Time-Series Database
Specially designed to handle time-stamped or time-series data.
Ideal for monitoring, IoT, or financial data trends.
Examples: InfluxDB, TimescaleDB.
8. What is a DBMS?
A Database Management System (DBMS) is software that manages databases
and allows users to create, retrieve, update, and delete data.
Examples: MySQL, PostgreSQL, Microsoft Access
9. SQL - Structured Query Language
SQL (Structured Query Language) is the standard language for accessing,
manipulating, and managing relational databases.
Why is SQL Important?
Standardized language for databases.
Helps store, retrieve, and manipulate data.
Used in various industries like finance, e-commerce, healthcare.
SQL syntax example
SELECT * FROM Table_name;
10. Client - Server Architecture
SQL databases follow a client-server architecture where clients send queries to
the database server, which processes and returns the data.
Components:
Client (User or Application) – Sends SQL queries.
Database Server – Stores and manages data, executes queries.
Network Connection – Communication between client and server.
11. SQL Languages
SQL is categorized into five types of languages, each serving a specific purpose:
Data Definition Language (DDL) – Defines the structure of the database (e.g., CREATE,
ALTER, DROP, RENAME, TRUNCATE).
Data Manipulation Language (DML) – Modifies and manages data in tables (e.g., INSERT,
UPDATE, DELETE).
Data Query Language (DQL) – Retrieves data from the database (SELECT statement).
Data Control Language (DCL) – Manages access permissions (e.g., GRANT, REVOKE).
Transaction Control Language (TCL) – Controls transactions to maintain data integrity
(e.g., COMMIT, ROLLBACK, SAVEPOINT).
12. Data Definition Language (DDL)
DDL is a subset of SQL (Structured Query Language) used to define, manage, and
modify database structures such as tables, indexes, views, and schemas.
1. CREATE: Used to define and create databases, tables, views, indexes, and
other database objects.
MySQL
1. To create a new database:
Syntax:- CREATE DATABASE <databasename>;
2. To create the database only if it does not already exist, which prevents the
error raised by an attempt to create a duplicate database:
Syntax:- CREATE DATABASE IF NOT EXISTS <databasename>;
SQL Server
1. To create a new database:
Syntax:- CREATE DATABASE <databasename>;
2. To create the database only if it does not already exist, which prevents the
error raised by an attempt to create a duplicate database:
Syntax:-
IF NOT EXISTS (SELECT name FROM sys.databases WHERE name = '<databasename>')
BEGIN
CREATE DATABASE <databasename>;
END
Note: SQL keywords are not case-sensitive in either MySQL or SQL Server, so
lowercase and uppercase keywords are treated the same. Whether database and
table names are case-sensitive depends on the server configuration.
2. DROP: Used to drop a database or a table.
MySQL
Syntax:-
DROP DATABASE <databasename>;
DROP TABLE <table_name>;
SQL Server
Syntax:-
DROP DATABASE <databasename>;
DROP TABLE <table_name>;
3. ALTER TABLE: Used to make changes to an existing table structure.
ADD – Add a new column to the table
MySQL
Syntax:- ALTER TABLE table_name ADD COLUMN column_name data_type;
SQL Server
Syntax:- ALTER TABLE table_name ADD column_name data_type;
DROP – Remove a column from the table
MySQL
Syntax:- ALTER TABLE table_name DROP COLUMN column_name;
SQL Server
Syntax:- ALTER TABLE table_name DROP COLUMN column_name;
MODIFY – Change the data type or size of a column
MySQL
Syntax:- ALTER TABLE table_name MODIFY COLUMN column_name data_type;
SQL Server
Syntax:- ALTER TABLE table_name ALTER COLUMN column_name data_type;
RENAME – Rename a column or the table itself
MySQL
Syntax:- ALTER TABLE table_name RENAME COLUMN old_column TO new_column;
SQL Server
Syntax:- EXEC sp_rename 'table_name.old_column', 'new_column', 'COLUMN';
4. TRUNCATE: Deletes all records from a table but keeps the table structure.
MySQL
Syntax:- TRUNCATE TABLE table_name;
SQL Server
Syntax:- TRUNCATE TABLE table_name;
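The DDL statements above can be tried without installing a database server. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL or SQL Server, so the dialect differs in places (for example, SQLite has no CREATE DATABASE or TRUNCATE). The students table is a hypothetical example.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# CREATE: define a new table (IF NOT EXISTS avoids an error on re-runs).
conn.execute("CREATE TABLE IF NOT EXISTS students (id INTEGER, name TEXT)")

# ALTER TABLE ... ADD COLUMN: add a column to the existing table.
conn.execute("ALTER TABLE students ADD COLUMN age INTEGER")

# PRAGMA table_info lists the columns; field 1 of each row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(students)")]
print(columns)

# DROP: remove the table and its data entirely.
conn.execute("DROP TABLE students")
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining)
```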
18. CONSTRAINTS: Set of rules applied to table columns to ensure data accuracy and
integrity in a database.
Types of Constraints:
1. NOT NULL: Ensures a column cannot have empty (NULL) values
Syntax:-
CREATE TABLE students (
id INT NOT NULL,
name VARCHAR(50) NOT NULL
);
2. UNIQUE: Ensures all values in a column are different
Syntax:-
CREATE TABLE users (
email VARCHAR(100) UNIQUE
);
3. PRIMARY KEY: Uniquely identifies each row and cannot be NULL
Syntax:-
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
name VARCHAR(100)
);
4. FOREIGN KEY: Connects a column to another table's primary key
Syntax:-
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
FOREIGN KEY (customer_id) REFERENCES customers(id)
);
5. CHECK: Ensures the values in a column meet a specific condition
Syntax:-
CREATE TABLE products (
price DECIMAL(10, 2),
CHECK (price > 0)
);
6. DEFAULT: Sets a default value for a column if no value is provided
Syntax:-
CREATE TABLE accounts (
status VARCHAR(10) DEFAULT 'active'
);
7. AUTO_INCREMENT: Automatically generates a unique number for each
new record, usually for primary key columns. AUTO_INCREMENT is the MySQL
keyword; SQL Server uses IDENTITY for the same purpose.
Syntax:-
CREATE TABLE students (
id INT AUTO_INCREMENT,
name VARCHAR(100),
PRIMARY KEY (id)
);
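To see constraints reject bad data, here is a minimal sketch using Python's sqlite3 module. It assumes SQLite as a stand-in for MySQL or SQL Server (SQLite only enforces foreign keys after PRAGMA foreign_keys = ON). The tables, the sample values, and the is_rejected helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER,
        price       REAL CHECK (price > 0),
        status      TEXT DEFAULT 'active',
        FOREIGN KEY (customer_id) REFERENCES customers(id)
    )
""")
conn.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO orders (customer_id, price) VALUES (1, 9.99)")

# DEFAULT filled in the status column because no value was supplied.
status = conn.execute("SELECT status FROM orders").fetchone()[0]

def is_rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

violations = {
    "NOT NULL": is_rejected("INSERT INTO customers (id, email) VALUES (2, NULL)"),
    "UNIQUE": is_rejected("INSERT INTO customers (id, email) VALUES (3, 'a@example.com')"),
    "CHECK": is_rejected("INSERT INTO orders (customer_id, price) VALUES (1, -5)"),
    "FOREIGN KEY": is_rejected("INSERT INTO orders (customer_id, price) VALUES (99, 5)"),
}
print(status, violations)
```

Each rejected statement leaves the tables unchanged, which is how constraints protect data integrity.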
22. Data Manipulation Language (DML)
Used to manage and manipulate data within database tables, such as INSERT,
UPDATE, DELETE.
1. INSERT: Adds new records to a table
Syntax (MySQL and SQL Server):-
INSERT INTO table_name (column1, column2)
VALUES (value1, value2);
2. UPDATE: Modifies existing records in a table
Syntax (MySQL and SQL Server):-
UPDATE table_name
SET column1 = value1, column2 = value2
WHERE condition;
Without a WHERE clause, UPDATE changes every row in the table.
3. DELETE: Removes specific records from a table
Syntax (MySQL and SQL Server):-
DELETE FROM table_name
WHERE condition;
Without a WHERE clause, DELETE removes every row in the table.
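The three DML statements can be sketched end to end as follows. This is an illustrative example with Python's sqlite3 module (SQLite standing in for MySQL or SQL Server); the students table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# INSERT: executemany runs the same statement once per parameter tuple.
conn.executemany(
    "INSERT INTO students (name, age) VALUES (?, ?)",
    [("Asha", 17), ("Ravi", 19), ("Meena", 22)],
)

# UPDATE with WHERE: only the matching row changes.
updated = conn.execute("UPDATE students SET age = 20 WHERE name = 'Ravi'").rowcount

# DELETE with WHERE: only the matching row is removed.
deleted = conn.execute("DELETE FROM students WHERE age < 18").rowcount

rows = conn.execute("SELECT name, age FROM students ORDER BY id").fetchall()
print(updated, deleted, rows)
```

The rowcount attribute reports how many rows each statement affected, a quick check that a WHERE clause matched what was intended.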
24. Data Query Language (DQL)
Used to fetch data from the database using the SELECT statement.
SELECT: Retrieves data from one or more tables
Syntax (MySQL and SQL Server):-
SELECT column1, column2
FROM table_name;
WHERE: Used to filter records based on a condition.
Syntax:- SELECT * FROM students WHERE age > 18;
25. Comparison Operators:
= (Equal to): Used to match an exact value.
SELECT * FROM students WHERE age = 18;
!= or <> (Not equal to): Checks if the value is not equal.
SELECT * FROM employees WHERE name != 'John';
> (Greater than): Filters values greater than a number.
SELECT * FROM marks WHERE score > 75;
< (Less than): Filters values less than a number.
SELECT * FROM students WHERE age < 20;
>= (Greater than or equal to): Includes values that are equal to or above a number.
SELECT * FROM staff WHERE salary >= 30000;
<= (Less than or equal to): Includes values that are equal to or below a number.
SELECT * FROM students WHERE age <= 25;
27. Logical Operators:
AND (Both conditions must be true): Combines multiple conditions.
SELECT * FROM students WHERE age > 18 AND city = 'Delhi';
OR (At least one condition must be true): Selects rows if any condition is true.
SELECT * FROM students WHERE age < 18 OR city = 'Mumbai';
BETWEEN: Checks if a value is within a range (inclusive).
SELECT * FROM table_name WHERE column_name BETWEEN value1 AND value2;
NOT BETWEEN: Checks if a value is outside the range.
SELECT * FROM table_name WHERE column_name NOT BETWEEN value1 AND value2;
IN: Filters rows whose value matches any value in a list.
SELECT * FROM table_name WHERE column_name IN (value1, value2, value3);
NOT IN: Checks if a value does not exist in the list.
SELECT * FROM table_name WHERE column_name NOT IN (value1, value2, value3);
IS NULL: Checks for a NULL value.
SELECT * FROM table_name WHERE column_name IS NULL;
IS NOT NULL: Checks that a value is not NULL.
SELECT * FROM table_name WHERE column_name IS NOT NULL;
LIKE: Checks for a pattern (using % or _).
SELECT * FROM table_name WHERE column_name LIKE 'A%'; -- Starts with A
SELECT * FROM table_name WHERE column_name LIKE '%A'; -- Ends with A
SELECT * FROM table_name WHERE column_name LIKE '%A%'; -- Contains A
NOT LIKE: Finds values that don’t match the pattern.
SELECT * FROM table_name WHERE column_name NOT LIKE 'A%';
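The operators above can be compared side by side on a small table. This sketch uses Python's sqlite3 module with SQLite as a stand-in database; the students rows and the names helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("Asha", 17, "Delhi"), ("Amit", 19, "Mumbai"),
     ("Ravi", 22, "Delhi"), ("Meena", 25, None)],
)

def names(where):
    """Return the names matching a WHERE condition, in alphabetical order."""
    sql = f"SELECT name FROM students WHERE {where} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

results = {
    "and": names("age > 18 AND city = 'Delhi'"),
    "between": names("age BETWEEN 19 AND 22"),   # both bounds are included
    "in": names("city IN ('Delhi', 'Pune')"),
    "is_null": names("city IS NULL"),
    "like": names("name LIKE 'A%'"),             # starts with A
    "not_equal": names("city != 'Delhi'"),       # a NULL city is not matched
}
print(results)
```

Note that the != comparison skips the row whose city is NULL: comparisons with NULL are never true, which is why IS NULL and IS NOT NULL exist.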
30. Functions: Functions perform operations on data and return a single value.
Numeric Functions :
Used to perform mathematical operations on numeric values.
ABS(number) – Returns the absolute value.
Syntax: SELECT ABS(-15);
CEIL(number) – Rounds up to the nearest integer.
Syntax: SELECT CEIL(4.3);
FLOOR(number) – Rounds down to the nearest integer.
Syntax: SELECT FLOOR(4.8);
ROUND(number, decimals) – Rounds a number to specified decimals.
Syntax: SELECT ROUND(3.456, 2);
POWER(x, y) – Returns x raised to the power y.
Syntax: SELECT POWER(2, 3);
MOD(x, y) – Returns the remainder of division.
Syntax: SELECT MOD(10, 3);
SQRT(number) – Returns the square root of a number.
Syntax: SELECT SQRT(25);
32. String Functions:
Used to perform operations on text or string values.
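LENGTH(string) – Returns the length of a string. In MySQL the length is counted
in bytes (CHAR_LENGTH counts characters); SQL Server uses LEN instead.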
Syntax: SELECT LENGTH('Hello');
UPPER(string) – Converts text to uppercase.
Syntax: SELECT UPPER('hello');
LOWER(string) – Converts text to lowercase.
Syntax: SELECT LOWER('HELLO');
CONCAT(str1, str2) – Joins two or more strings.
Syntax: SELECT CONCAT('Hello', 'World');
SUBSTRING(str, start, len) – Extracts part of a string.
Syntax: SELECT SUBSTRING('Hello', 1, 3);
TRIM(string) – Removes spaces from ends.
Syntax: SELECT TRIM(' Hello ');
REPLACE(str, from, to) – Replaces part of a string.
Syntax: SELECT REPLACE('cat', 'c', 'b');
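The string functions above can be checked with a single query. This is a sketch run through Python's sqlite3 module, with SQLite standing in for MySQL; two dialect differences are assumed here: SQLite joins strings with the || operator rather than CONCAT, and SUBSTR is used as its long-standing spelling of SUBSTRING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute("""
    SELECT UPPER('hello'),
           LOWER('HELLO'),
           LENGTH('Hello'),
           'Hello' || 'World',        -- SQLite's concatenation operator
           SUBSTR('Hello', 1, 3),     -- positions start at 1
           TRIM('  Hello  '),
           REPLACE('cat', 'c', 'b')
""").fetchone()
print(row)
```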
34. Date Functions
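Used to work with date and time values. The functions below use MySQL syntax;
SQL Server has its own equivalents (for example GETDATE() and DATEADD()).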
CURRENT_DATE – Returns the current date.
Syntax: SELECT CURRENT_DATE;
CURRENT_TIME – Returns the current time.
Syntax: SELECT CURRENT_TIME;
NOW() – Returns current date and time.
Syntax: SELECT NOW();
DATE(datetime) – Extracts only date part.
Syntax: SELECT DATE(NOW());
DAY(date) – Extracts the day.
Syntax: SELECT DAY('2025-04-21');
MONTH(date) – Extracts the month.
Syntax: SELECT MONTH('2025-04-21');
YEAR(date) – Extracts the year.
Syntax: SELECT YEAR('2025-04-21');
DATEDIFF(d1, d2) – Returns the number of days from d2 to d1 (d1 minus d2).
Syntax: SELECT DATEDIFF('2025-01-01', '2024-01-01');
DATE_ADD(date, INTERVAL x DAY) – Adds days to a date.
Syntax: SELECT DATE_ADD('2025-01-01', INTERVAL 10 DAY);
DATE_SUB(date, INTERVAL x DAY) – Subtracts days from a date.
Syntax: SELECT DATE_SUB('2025-01-01', INTERVAL 5 DAY);
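To check the arithmetic behind the DATEDIFF, DATE_ADD, and DATE_SUB examples, the sketch below reproduces them with SQLite's date functions through Python's sqlite3 module. SQLite does not have the MySQL functions themselves, so julianday() and date() with a modifier are used here as assumed equivalents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

days_between, plus_ten, minus_five = conn.execute("""
    SELECT julianday('2025-01-01') - julianday('2024-01-01'),  -- like DATEDIFF
           date('2025-01-01', '+10 days'),                      -- like DATE_ADD
           date('2025-01-01', '-5 days')                        -- like DATE_SUB
""").fetchone()
print(days_between, plus_ten, minus_five)
```

The difference is 366 days rather than 365 because 2024 is a leap year.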
36. GROUP BY: Groups rows that have the same values in specified columns.
Syntax:-
SELECT department, COUNT(*)
FROM employees
GROUP BY department;
HAVING: Filters groups based on a condition (used with GROUP BY).
Syntax:-
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;
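The difference between grouping and filtering groups can be sketched on a tiny table. The example uses Python's sqlite3 module (SQLite as a stand-in database); the employees rows are hypothetical, and the HAVING threshold is lowered to 1 to suit the small sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", "Sales"), ("Ravi", "Sales"), ("Meena", "Sales"),
     ("Amit", "HR"), ("Neha", "IT"), ("John", "IT")],
)

# GROUP BY: one output row per department.
per_department = conn.execute("""
    SELECT department, COUNT(*)
    FROM employees
    GROUP BY department
    ORDER BY department
""").fetchall()

# HAVING: keep only the groups whose count passes the condition.
larger_departments = conn.execute("""
    SELECT department, COUNT(*)
    FROM employees
    GROUP BY department
    HAVING COUNT(*) > 1
    ORDER BY department
""").fetchall()
print(per_department, larger_departments)
```

WHERE filters rows before grouping, while HAVING filters the groups after the aggregate has been computed.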
37. ORDER BY: Sorts the result set in ascending (ASC) or descending (DESC) order.
Syntax:-
SELECT name, age FROM students
ORDER BY age DESC;
LIMIT: Restricts the number of records returned. LIMIT is MySQL syntax;
SQL Server uses TOP or OFFSET ... FETCH instead.
Syntax:- SELECT * FROM students
LIMIT 5;
OFFSET: Skips a specified number of records before starting to return rows.
Syntax:-
SELECT * FROM students
LIMIT 5 OFFSET 10;
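Sorting and paging can be sketched together. The example runs through Python's sqlite3 module; SQLite accepts the same LIMIT ... OFFSET form as MySQL, and the students rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Asha", 17), ("Amit", 19), ("Ravi", 22), ("Meena", 25)],
)

# Oldest first, first page of two rows.
page_one = conn.execute(
    "SELECT name FROM students ORDER BY age DESC LIMIT 2"
).fetchall()

# Same ordering, but skip one row before returning two.
shifted = conn.execute(
    "SELECT name FROM students ORDER BY age DESC LIMIT 2 OFFSET 1"
).fetchall()
print(page_one, shifted)
```

LIMIT and OFFSET are only predictable when combined with ORDER BY, since rows otherwise have no guaranteed order.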
38. JOINS
A JOIN is used to combine rows from two or more tables based
on a related column between them.
Types of JOINS:
1. INNER JOIN
Returns only the matching rows from both tables.
Syntax:-
SELECT a.name, b.salary FROM employees a
INNER JOIN salary b ON a.id = b.emp_id;
2. LEFT JOIN (LEFT OUTER JOIN)
Returns all rows from the left table and the matched rows from the right table.
Syntax:-
SELECT a.name, b.salary FROM employees a
LEFT JOIN salary b ON a.id = b.emp_id;
3. RIGHT JOIN (RIGHT OUTER JOIN)
Returns all rows from the right table and the matched rows from the left table.
Syntax:-
SELECT a.name, b.salary FROM employees a
RIGHT JOIN salary b ON a.id = b.emp_id;
4. FULL JOIN (FULL OUTER JOIN)
Returns all rows from both tables, whether matched or not. MySQL does not
support FULL JOIN; there, combine a LEFT JOIN and a RIGHT JOIN with UNION.
Syntax:-
SELECT a.name, b.salary FROM employees a
FULL JOIN salary b ON a.id = b.emp_id;
5. CROSS JOIN
Returns the cartesian product of both tables (every combination).
Syntax:-
SELECT a.name, b.salary FROM employees a
CROSS JOIN salary b;
41. Rules & Characteristics of SQL JOINS
Work on two or more tables.
Usually require a related column to connect tables (CROSS JOIN is the exception).
Use the ON keyword to define the joining condition.
Result set may vary depending on the type of JOIN used.
JOINS can be combined with WHERE, GROUP BY, ORDER BY, etc.
You can join tables using table aliases for better readability.
If no match is found:
INNER JOIN will exclude that row.
LEFT JOIN will include the row with NULL from the right table.
RIGHT JOIN will include the row with NULL from the left table.
FULL JOIN will include all rows, filling unmatched parts with NULL.
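The effect of unmatched rows can be sketched with two small tables. This example uses Python's sqlite3 module with SQLite as a stand-in; the data is hypothetical, and RIGHT JOIN and FULL JOIN are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE salary (emp_id INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(1, "Asha"), (2, "Ravi"), (3, "Meena")])
# emp_id 4 has no matching employee; employee 3 has no salary row.
conn.executemany("INSERT INTO salary VALUES (?, ?)",
                 [(1, 50000), (2, 60000), (4, 70000)])

inner_rows = conn.execute("""
    SELECT a.name, b.salary FROM employees a
    INNER JOIN salary b ON a.id = b.emp_id
    ORDER BY a.id
""").fetchall()

left_rows = conn.execute("""
    SELECT a.name, b.salary FROM employees a
    LEFT JOIN salary b ON a.id = b.emp_id
    ORDER BY a.id
""").fetchall()

cross_count = conn.execute(
    "SELECT COUNT(*) FROM employees a CROSS JOIN salary b"
).fetchone()[0]
print(inner_rows, left_rows, cross_count)
```

The LEFT JOIN keeps Meena with a NULL salary (None in Python), while the INNER JOIN drops her; the CROSS JOIN pairs every employee with every salary row.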
42. UNION
Combines result sets of two or more SELECT statements and removes duplicate rows.
Syntax:-
SELECT name FROM students_a
UNION
SELECT name FROM students_b;
UNION ALL
Combines result sets of two or more SELECT statements including duplicates.
Syntax:-
SELECT name FROM students_a
UNION ALL
SELECT name FROM students_b;
43. Difference Between UNION and UNION ALL
UNION removes duplicates.
UNION ALL keeps duplicates.
UNION is slower because it has to check for duplicates.
UNION ALL is faster because it skips the duplicate check.
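A short sketch of the difference, using Python's sqlite3 module with SQLite as a stand-in database. The two tables are hypothetical and share one name (Ravi) on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students_a (name TEXT)")
conn.execute("CREATE TABLE students_b (name TEXT)")
conn.executemany("INSERT INTO students_a VALUES (?)", [("Asha",), ("Ravi",)])
conn.executemany("INSERT INTO students_b VALUES (?)", [("Ravi",), ("Meena",)])

# UNION: the shared name appears once.
union_names = [row[0] for row in conn.execute("""
    SELECT name FROM students_a
    UNION
    SELECT name FROM students_b
    ORDER BY name
""")]

# UNION ALL: every row from both queries is kept.
union_all_names = [row[0] for row in conn.execute("""
    SELECT name FROM students_a
    UNION ALL
    SELECT name FROM students_b
    ORDER BY name
""")]
print(union_names, union_all_names)
```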
44. SUB QUERIES
A subquery is a query inside another query to fetch data.
Types of Subqueries
1. Correlated Subquery: Subquery depends on outer query for its value.
Syntax:- SELECT name
FROM employees e
WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept_id = e.dept_id);
2. Non-Correlated Subquery: Subquery executes independently of the
outer query.
Syntax:- SELECT name
FROM employees
WHERE dept_id IN (SELECT id FROM departments WHERE location = 'New York');
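The two kinds of subquery can give different answers on the same data. This sketch uses Python's sqlite3 module (SQLite as a stand-in database) with a hypothetical employees table: the correlated version compares each salary with the average of that employee's own department, while the non-correlated version compares with one overall average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", 1, 50000), ("Ravi", 1, 70000),
     ("Meena", 2, 40000), ("Amit", 2, 45000), ("Neha", 2, 50000)],
)

# Correlated: the inner query is re-evaluated for each outer row (uses e.dept_id).
above_dept_avg = [row[0] for row in conn.execute("""
    SELECT name
    FROM employees e
    WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept_id = e.dept_id)
    ORDER BY name
""")]

# Non-correlated: the inner query runs once, independently of the outer rows.
above_overall_avg = [row[0] for row in conn.execute("""
    SELECT name
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""")]
print(above_dept_avg, above_overall_avg)
```

Neha earns more than the department 2 average (45000) but less than the overall average (51000), so she appears only in the correlated result.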
46. Cases of Subqueries
With One Table: Subquery uses the same table as the outer query.
Syntax:-
SELECT name
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
With Multiple Tables: Subquery and outer query use different tables.
Syntax:-
SELECT name
FROM employees
WHERE dept_id IN (SELECT id FROM departments WHERE location = 'Delhi');
48. Subqueries can be used in:
1. SELECT Clause: Used to fetch a value for each row
Syntax:-
SELECT name,
(SELECT department_name FROM departments WHERE departments.id =
employees.dept_id) AS dept_name
FROM employees;
2. FROM Clause: Treats subquery result as a temporary table
Syntax:- SELECT avg_salary
FROM (
SELECT AVG(salary) AS avg_salary
FROM employees
) AS temp;
3. WHERE Clause: Used to filter records
Syntax:-
SELECT name
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
50. CASE Statement
Used to apply conditional logic in SQL queries, similar to IF-ELSE.
Syntax:-
SELECT column_name,
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
ELSE default_result
END AS alias_name
FROM table_name;
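A concrete instance of the CASE pattern, sketched with Python's sqlite3 module (SQLite as a stand-in database). The marks table, the score thresholds, and the grade labels are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO marks VALUES (?, ?)",
                 [("Asha", 82), ("Ravi", 61), ("Meena", 35)])

# Conditions are tested top to bottom; the first one that is true wins.
grades = conn.execute("""
    SELECT name,
           CASE
               WHEN score >= 75 THEN 'Distinction'
               WHEN score >= 40 THEN 'Pass'
               ELSE 'Fail'
           END AS grade
    FROM marks
    ORDER BY name
""").fetchall()
print(grades)
```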
51. WINDOW FUNCTIONS
Used to calculate values across multiple rows without merging them into one row.
OVER Clause: Tells SQL to apply a window function.
Used with functions like SUM(), RANK(), LEAD(), etc.
Syntax:- SUM(salary) OVER ()
PARTITION BY Clause: Divides rows into groups to apply the function
separately within each group.
Syntax:- SUM(salary) OVER (PARTITION BY department)
ORDER BY Clause: Defines the order of rows within each partition.
Important for functions like RANK(), LEAD(), ROW_NUMBER().
Syntax:- RANK() OVER (ORDER BY salary DESC)
53. Types of Window Functions
1. Aggregate Window Functions:
Perform calculations like total, average, count, etc., across a window of rows.
SUM() - SUM(salary) OVER (PARTITION BY department)
AVG() - AVG(salary) OVER (PARTITION BY department)
MIN() - MIN(salary) OVER (PARTITION BY department)
MAX() - MAX(salary) OVER (PARTITION BY department)
COUNT() - COUNT(*) OVER (PARTITION BY department)
2. Ranking Window Functions:
Assign a rank or number to each row based on a specific order.
ROW_NUMBER() – Gives unique row numbers
ROW_NUMBER() OVER (ORDER BY salary DESC)
RANK() – Same rank for ties; the ranks that follow a tie are skipped
RANK() OVER (ORDER BY salary DESC)
DENSE_RANK() – Same rank for ties, no gaps
DENSE_RANK() OVER (ORDER BY salary DESC)
NTILE(n) – Divides rows into n groups of as equal size as possible
NTILE(4) OVER (ORDER BY salary)
3. Value Window Functions:
Access values from other rows relative to the current row.
LEAD() – Next row value
LEAD(salary) OVER (ORDER BY salary)
LAG() – Previous row value
LAG(salary) OVER (ORDER BY salary)
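FIRST_VALUE() – First value in the window
FIRST_VALUE(salary) OVER (ORDER BY salary)
LAST_VALUE() – Last value in the window. With ORDER BY, the default window ends
at the current row, so extend the frame to reach the true last value:
LAST_VALUE(salary) OVER (ORDER BY salary
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)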
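The sketch below shows window functions keeping every row while adding a per-group total and two kinds of rank. It uses Python's sqlite3 module and assumes the bundled SQLite is version 3.25 or newer, the first release with window functions. The employees data is hypothetical and includes a salary tie on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "Sales", 300), ("Ravi", "Sales", 200),
     ("Meena", "HR", 200), ("Amit", "HR", 100)],
)

rows = conn.execute("""
    SELECT name,
           SUM(salary)  OVER (PARTITION BY department) AS dept_total,
           RANK()       OVER (ORDER BY salary DESC)    AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)    AS dense_rnk
    FROM employees
    ORDER BY name
""").fetchall()
for row in rows:
    print(row)
```

Ravi and Meena tie on salary, so both get rank 2; RANK() then jumps to 4 for Amit, while DENSE_RANK() continues with 3.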
58. CTE (Common Table Expression)
A CTE is a temporary named result set that you can reference within
a SELECT, INSERT, UPDATE, or DELETE query.
Syntax:- WITH cte_name AS (
SELECT column1, column2
FROM table_name
WHERE condition
)
SELECT * FROM cte_name;
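As an illustration of the WITH syntax, the sketch below names a per-department average once and then joins it back to the table. It runs through Python's sqlite3 module with SQLite as a stand-in database; the dept_avg name and the employees rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "Sales", 300), ("Ravi", "Sales", 200),
     ("Meena", "HR", 200), ("Amit", "HR", 100)],
)

# The CTE (dept_avg) exists only for the duration of this one statement.
above_average = [row[0] for row in conn.execute("""
    WITH dept_avg AS (
        SELECT department, AVG(salary) AS avg_salary
        FROM employees
        GROUP BY department
    )
    SELECT e.name
    FROM employees e
    JOIN dept_avg d ON e.department = d.department
    WHERE e.salary > d.avg_salary
    ORDER BY e.name
""")]
print(above_average)
```

Compared with an equivalent nested subquery, the CTE gives the intermediate result a name, which tends to make longer queries easier to read.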