Post content
Complete SQL guide for Data Analytics 1. Introduction to SQL What is SQL? • SQL (Structured Query Language) is a domain-specific language used for managing and manipulating relational databases. It allows you to interact with data by querying, inserting, updating, and deleting records in a database. • SQL is essential for Data Analytics because it enables analysts to retrieve and manipulate data for analysis, reporting, and decision-making. Applications in Data Analytics • Data Retrieval: SQL is used to pull data from databases for analysis. • Data Transformation: SQL helps clean, aggregate, and transform data into a usable format for analysis. • Reporting: SQL can be used to create reports by summarizing data or applying business rules. • Data Modeling: SQL helps in preparing datasets for further analysis or machine learning. 2. SQL Basics Data Types SQL supports various data types that define the kind of data a column can hold: • Numeric Data Types: • INT: Integer numbers, e.g., 123. • DECIMAL(p,s): Exact numbers with a specified precision and scale, e.g., DECIMAL(10,2) for numbers like 12345.67. • FLOAT: Approximate numbers, e.g., 123.456. • String Data Types: • CHAR(n): Fixed-length strings, e.g., CHAR(10) will always use 10 characters. • VARCHAR(n): Variable-length strings, e.g., VARCHAR(50) can store up to 50 characters. • TEXT: Long text data, e.g., descriptions or long notes. • Date/Time Data Types: • DATE: Stores date values, e.g., 2024-12-01. • DATETIME: Stores both date and time, e.g., 2024-12-01 12:00:00. Creating and Modifying Tables You can create, alter, and drop tables using SQL commands: -- Create a table with columns for ID, name, salary, and hire date CREATE TABLE employees ( id INT PRIMARY KEY, name VARCHAR(50), salary DECIMAL(10, 2), hire_date DATE ); -- Alter an existing table to add a new column for department ALTER TABLE employees ADD department VARCHAR(50); -- Drop a table (delete it from the database) DROP TABLE employees; Data Insertion, Updating, and Deletion SQL allows you to manipulate data using INSERT, UPDATE, and DELETE commands: -- Insert a new employee record INSERT INTO employees (id, name, salary, hire_date, department) VALUES (1, 'Alice', 75000.00, '2022-01-15', 'HR'); -- Update the salary of employee with id 1 UPDATE employees SET salary = 80000 WHERE id = 1; -- Delete the employee record with id 1 DELETE FROM employees WHERE id = 1; 3. Data Retrieval SELECT Statement The SELECT statement is used to retrieve data from a database: SELECT * FROM employees; -- Retrieve all columns SELECT name, salary FROM employees; -- Retrieve specific columns Filtering Data with WHERE The WHERE clause filters data based on specific conditions: SELECT * FROM employees WHERE salary > 60000 AND department = 'HR'; -- Filter records based on salary and department Sorting Data with ORDER BY The ORDER BY clause sorts the result set by one or more columns: SELECT * FROM employees ORDER BY salary DESC; -- Sort by salary in descending order Aliasing You can use aliases to rename columns or tables for clarity: SELECT name AS employee_name, salary AS monthly_salary FROM employees; 4. Aggregate Functions Aggregate functions perform calculations on a set of values and return a single result. Common Aggregate Functions SELECT COUNT(*) AS total_employees, AVG(salary) AS average_salary FROM employees; -- Count total employees and calculate the average salary GROUP BY and HAVING • GROUP BY is used to group rows sharing the same value in a column. • HAVING filters groups based on aggregate conditions. -- Find average salary by department SELECT department, AVG(salary) AS average_salary FROM employees GROUP BY department; -- Filter groups with more than 5 employees SELECT department, COUNT(*) AS employee_count FROM employees GROUP BY department HAVING COUNT(*) > 5; 5. Joins Joins are used to combine rows from two or more tables based on related columns. Types of Joins