Data Analytics Interview Questions with Answers Part-1 📱

1. What is the difference between data analysis and data analytics?
⦁ Data analysis involves inspecting, cleaning, and modeling data to discover useful information and patterns for decision-making.
⦁ Data analytics is a broader process that includes data collection, transformation, analysis, and interpretation, often involving predictive and prescriptive techniques to drive business strategies.

2. Explain the data cleaning process you follow.
⦁ Identify missing, inconsistent, or corrupt data.
⦁ Handle missing data by imputation (mean, median, mode) or removal if appropriate.
⦁ Standardize formats (dates, strings).
⦁ Remove duplicates.
⦁ Detect and treat outliers.
⦁ Validate cleaned data against known business rules.

3. How do you handle missing or duplicate data?
⦁ Missing data: Identify the pattern of missingness. If values are missing at random, impute using statistical methods or predictive modeling; otherwise, use domain knowledge to decide how to handle them before removing anything.
⦁ Duplicate data: Detect with key fields; remove exact duplicates, and merge fuzzy duplicates based on context.

4. What is a primary key in a database?
A primary key uniquely identifies each record in a table, ensuring entity integrity and enabling relationships between tables via foreign keys.

5. Write a SQL query to find the second highest salary in a table.
SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees);
This returns the second highest distinct salary, or NULL if the table has no second distinct value.

6. Explain INNER JOIN vs LEFT JOIN with examples.
⦁ INNER JOIN: Returns only the rows that match in both tables.
⦁ LEFT JOIN: Returns all rows from the left table, plus matching rows from the right; if there is no match, the right table's columns are NULL.
Example:
SELECT * FROM A INNER JOIN B ON A.id = B.id;
SELECT * FROM A LEFT JOIN B ON A.id = B.id;

7. What are outliers? How do you detect and treat them?
⦁ Outliers are data points significantly different from the others that can skew analysis.
⦁ Detect with boxplots, z-scores (|z| > 3), or the IQR method (values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR).
⦁ Treat by investigating causes, correcting errors, transforming data, or removing them if they’re noise.

8. Describe what a pivot table is and how you use it.
A pivot table is a data summarization tool that groups and aggregates data (sum, average, count) and displays the results across categories in rows and columns. It is used in Excel and BI tools for quick insights and reporting.

9. How do you validate a data model’s performance?
⦁ Use relevant metrics (accuracy, precision, recall for classification; RMSE, MAE for regression).
⦁ Perform cross-validation to check generalizability.
⦁ Test on holdout or unseen data sets.

10. What is hypothesis testing? Explain t-test and z-test.
⦁ Hypothesis testing assesses whether sample data provides enough evidence to support a claim about a population.
⦁ t-test: Used when the population variance is unknown and has to be estimated from the sample, which matters most for small samples; typically used to compare means.
⦁ z-test: Used when the population variance is known, or when the sample is large enough that the sample estimate is a good stand-in.

React ♥️ for Part-2
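
To make question 10 concrete, here is a minimal Python sketch of the one-sample t and z statistics using only the standard library. The function names, the sample values, the hypothesized mean `mu0`, and the "known" `sigma` are all made up for illustration; they are not from any real dataset.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic: population variance unknown, estimated from the sample."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (statistics.mean(sample) - mu0) / (s / math.sqrt(n))


def one_sample_z(sample, mu0, sigma):
    """z statistic: population standard deviation sigma is assumed known."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))


def two_sided_p_from_z(z):
    """Two-sided p-value for a z statistic under the standard normal."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))


# Illustrative data only (hypothetical measurements).
sample = [5.1, 4.9, 5.3, 5.0, 5.2]
t_stat = one_sample_t(sample, mu0=5.0)
z_stat = one_sample_z(sample, mu0=5.0, sigma=0.2)
p_z = two_sided_p_from_z(z_stat)
print(f"t = {t_stat:.3f}, z = {z_stat:.3f}, two-sided p (z) = {p_z:.3f}")
```

With these numbers the t statistic is about 1.414 and the z statistic about 1.118. The only difference between the two formulas is where the spread comes from: the t statistic uses the sample's own standard deviation, the z statistic uses a known population value. A p-value for the t statistic needs the Student's t distribution with n - 1 degrees of freedom, which the standard library does not provide; a library such as SciPy (`scipy.stats.ttest_1samp`) is one option.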