Post #2706

@sqlspecialist

Data Analytics

Views4,730Post view count

PostedApr 1404/14/2026, 11:26 AM

Post content

💼 Top 20 Frequently Asked Data Analyst Interview Questions 🧠 1) Can you walk me through the tools you use for data analysis? 👉 Answer: Absolutely! For data extraction I use SQL to query databases like MySQL and PostgreSQL. For cleaning and analysis, Python with pandas and NumPy is my go-to. Excel for quick pivots and Power BI/Tableau for interactive dashboards. I pick the right tool based on data size and stakeholder needs. 🎯 2) Write a SQL query to find the 2nd highest salary from employees table. 👉 Answer: SELECT MAX(salary) as second_highest FROM employees WHERE salary < (SELECT MAX(salary) FROM employees); Follow-up: Or using window functions: DENSE_RANK() OVER (ORDER BY salary DESC) 📊 3) Explain INNER JOIN vs LEFT JOIN with a business example. 👉 Answer: INNER JOIN gives only matching records. LEFT JOIN gives all from left table + matches from right. Example: Customer orders analysis - LEFT JOIN keeps customers with zero orders to see churn patterns. 🔍 4) How would you handle missing values in a sales dataset? 👉 Answer: Step 1: df.isnull().sum() to assess impact. Step 2: For numbers - impute median (df.fillna(df.median())). For categories - mode. Step 3: Flag imputed values for transparency. Never drop >5% without business justification. 🧩 5) What's pandas groupby() and write an example? 👉 Answer: # Sales by region + month df.groupby(['region', 'month'])['revenue'].agg({ 'mean': 'mean', 'total': 'sum', 'records': 'count' }).round(2) Split -> Apply -> Combine pattern! 📈 6) When would you normalize vs denormalize a database? 👉 Answer: Normalize for transactional systems (OLTP) to save storage. Denormalize for analytics (OLAP) for faster queries. Example: Star schema with fact/dimension tables. 🔢 7) VLOOKUP vs INDEX+MATCH - which is better and why? 👉 Answer: INDEX+MATCH wins! VLOOKUP breaks if columns shift and only looks right. =INDEX(sales_range, MATCH(A2, id_range, 0)) Dynamic, safer, 2-way lookup. 📉 8) Difference between COUNT() vs COUNT(column_name)? 👉 Answer: COUNT(): Total rows including NULLs. COUNT(column): Non-null values only. Use COUNT() for total records, COUNT(sales) to exclude null sales. ⚙️ 9) How do you identify and remove duplicates in pandas? 👉 Answer: # Find duplicates dupe_count = df.duplicated(subset=['email']).sum() print(f"Found {dupe_count} duplicates") # Remove (keep first) df_clean = df.drop_duplicates(subset=['email'], keep='first') Always check business logic first! 🧠 10) Name 4 SQL aggregate functions with a practical example. 👉 Answer: SELECT dept, COUNT() as headcount, AVG(salary) as avg_salary, MAX(salary) as top_earner, SUM(salary) as payroll FROM employees GROUP BY dept; 📊 11) Sales dropped 20% last quarter. Walk me through your analysis. 👉 Answer: Framework: 1️⃣ Segment - Product/Category/Region/Customer 2️⃣ Trends - YoY, MoM, seasonality 3️⃣ Funnel - Where drop occurs 4️⃣ External - Competitor pricing, marketing Dashboard: Drill-down + alerts for anomalies. 🎯 12) What's the difference between Data Analyst and Data Scientist? 👉 Answer: DA: SQL/Excel/Dashboards = 'What happened?' DS: ML/Python/R = 'What will happen?' Analogy: DA = Rearview mirror, DS = Crystal ball. Most value from clean DA first! 🔍 13) Write a SQL window function to rank salaries by department. 👉 Answer: SELECT name, dept, salary, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) as dept_rank FROM employees; 🧩 14) How do you create a pivot table showing sales by region/month? 👉 Answer: Excel: Insert -> PivotTable -> Rows: Region -> Columns: Month -> Values: Sum of Sales -> Slicers for filters. Power BI: Drag-drop + matrix visual. 📈 15) Explain correlation vs causation with an example. 👉 Answer: Classic: Ice cream sales correlate with drownings (both peak summer) Correlation ≠ Causation. Need experiments to prove cause-effect.