Post content
πΌ Top 20 Frequently Asked Data Analyst Interview Questions π§ 1) Can you walk me through the tools you use for data analysis? π Answer: Absolutely! For data extraction I use SQL to query databases like MySQL and PostgreSQL. For cleaning and analysis, Python with pandas and NumPy is my go-to. Excel for quick pivots and Power BI/Tableau for interactive dashboards. I pick the right tool based on data size and stakeholder needs. π― 2) Write a SQL query to find the 2nd highest salary from employees table. π Answer: SELECT MAX(salary) as second_highest FROM employees WHERE salary < (SELECT MAX(salary) FROM employees); Follow-up: Or using window functions: DENSE_RANK() OVER (ORDER BY salary DESC) π 3) Explain INNER JOIN vs LEFT JOIN with a business example. π Answer: INNER JOIN gives only matching records. LEFT JOIN gives all from left table + matches from right. Example: Customer orders analysis - LEFT JOIN keeps customers with zero orders to see churn patterns. π 4) How would you handle missing values in a sales dataset? π Answer: Step 1: df.isnull().sum() to assess impact. Step 2: For numbers - impute median (df.fillna(df.median())). For categories - mode. Step 3: Flag imputed values for transparency. Never drop >5% without business justification. π§© 5) What's pandas groupby() and write an example? π Answer: # Sales by region + month df.groupby(['region', 'month'])['revenue'].agg({ 'mean': 'mean', 'total': 'sum', 'records': 'count' }).round(2) Split -> Apply -> Combine pattern! π 6) When would you normalize vs denormalize a database? π Answer: Normalize for transactional systems (OLTP) to save storage. Denormalize for analytics (OLAP) for faster queries. Example: Star schema with fact/dimension tables. π’ 7) VLOOKUP vs INDEX+MATCH - which is better and why? π Answer: INDEX+MATCH wins! VLOOKUP breaks if columns shift and only looks right. =INDEX(sales_range, MATCH(A2, id_range, 0)) Dynamic, safer, 2-way lookup. π 8) Difference between COUNT() vs COUNT(column_name)? π Answer: COUNT(): Total rows including NULLs. COUNT(column): Non-null values only. Use COUNT() for total records, COUNT(sales) to exclude null sales. βοΈ 9) How do you identify and remove duplicates in pandas? π Answer: # Find duplicates dupe_count = df.duplicated(subset=['email']).sum() print(f"Found {dupe_count} duplicates") # Remove (keep first) df_clean = df.drop_duplicates(subset=['email'], keep='first') Always check business logic first! π§ 10) Name 4 SQL aggregate functions with a practical example. π Answer: SELECT dept, COUNT() as headcount, AVG(salary) as avg_salary, MAX(salary) as top_earner, SUM(salary) as payroll FROM employees GROUP BY dept; π 11) Sales dropped 20% last quarter. Walk me through your analysis. π Answer: Framework: 1οΈβ£ Segment - Product/Category/Region/Customer 2οΈβ£ Trends - YoY, MoM, seasonality 3οΈβ£ Funnel - Where drop occurs 4οΈβ£ External - Competitor pricing, marketing Dashboard: Drill-down + alerts for anomalies. π― 12) What's the difference between Data Analyst and Data Scientist? π Answer: DA: SQL/Excel/Dashboards = 'What happened?' DS: ML/Python/R = 'What will happen?' Analogy: DA = Rearview mirror, DS = Crystal ball. Most value from clean DA first! π 13) Write a SQL window function to rank salaries by department. π Answer: SELECT name, dept, salary, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) as dept_rank FROM employees; π§© 14) How do you create a pivot table showing sales by region/month? π Answer: Excel: Insert -> PivotTable -> Rows: Region -> Columns: Month -> Values: Sum of Sales -> Slicers for filters. Power BI: Drag-drop + matrix visual. π 15) Explain correlation vs causation with an example. π Answer: Classic: Ice cream sales correlate with drownings (both peak summer) Correlation β Causation. Need experiments to prove cause-effect.