🚀 Data Analyst Interview Questions with Answers: Part 6

🛠️ Python for Data Analysis

51. Why do data analysts use Python instead of (or along with) Excel?

Python can handle datasets larger than an Excel worksheet's limit of 1,048,576 rows. It can also automate repetitive tasks and run advanced analysis more efficiently than Excel.

Benefits of Python:
✔️ Faster processing
✔️ Automation capabilities
✔️ Advanced analytics
✔️ Better scalability
✔️ Integration with databases and APIs
✔️ Powerful libraries like "pandas", "numpy", and "matplotlib"

Excel is great for quick, ad hoc analysis, while Python is better for repeatable, scalable workflows.

52. How do you load data from CSV or SQL into a "pandas" DataFrame?

✅Load a CSV file:
import pandas as pd
df = pd.read_csv("sales_data.csv")

✅Load data from SQL:
import sqlite3
import pandas as pd
conn = sqlite3.connect("company.db")
df = pd.read_sql("SELECT * FROM employees", conn)

"pandas" makes data loading and manipulation simple. For databases other than SQLite, pandas expects a SQLAlchemy engine or connection rather than a raw DBAPI connection.

53. How do you inspect the first/last rows, shape, data types, and missing values?

Useful methods and attributes for quick inspection:
df.head()           # first 5 rows
df.tail()           # last 5 rows
df.shape            # (number of rows, number of columns)
df.dtypes           # data type of each column
df.isnull().sum()   # number of missing values per column

These help analysts understand a dataset's structure quickly.

54. How do you clean missing values ("dropna", "fillna", interpolation)?

✅Remove rows with missing values:
df = df.dropna()

✅Fill missing values with a constant:
df = df.fillna(0)

✅Fill with the column mean:
df["salary"] = df["salary"].fillna(df["salary"].mean())

✅Interpolate (estimate gaps from neighboring values in a numeric column):
df["sales"] = df["sales"].interpolate()

These methods return a new object instead of changing df in place, so assign the result back. The right method depends on business context and data quality requirements: dropping rows loses information, while filling or interpolating introduces estimated values.

55. How do you filter, sort, and group data with "pandas"?

✅Filter rows:
high_sales = df[df["sales"] > 5000]

✅Sort values:
df.sort_values("sales", ascending=False)

✅Group data:
df.groupby("region")["sales"].sum()

These operations are the building blocks of most real-world analysis.

56. How do you calculate aggregates and pivots with "groupby" and "pivot_table"?
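✅Aggregation using "groupby":
df.groupby("department")["salary"].mean()

✅Create a pivot table:
pd.pivot_table(
    df,
    values="sales",
    index="region",
    columns="category",
    aggfunc="sum"
)

A pivot table reshapes the summary into a grid (here, regions as rows and categories as columns), much like a pivot table in Excel.

57. How do you merge/join multiple DataFrames?

DataFrames can be combined using "merge()". Example:
pd.merge(customers, orders, on="customer_id", how="inner")

Join types (set with the "how" parameter):
✔️ Inner join: only rows whose key appears in both tables
✔️ Left join: all rows from the left table, plus matches from the right
✔️ Right join: all rows from the right table, plus matches from the left
✔️ Outer join: all rows from both tables

This works like SQL joins.

58. How do you create basic visualizations with "matplotlib" or "seaborn"?

✅Line chart using "matplotlib":
import matplotlib.pyplot as plt
plt.plot(df["month"], df["sales"])
plt.show()

✅Bar chart using "seaborn":
import matplotlib.pyplot as plt
import seaborn as sns
sns.barplot(x="region", y="sales", data=df)
plt.show()

By default, sns.barplot plots the mean of "sales" for each region (with an error bar), not the total; the estimator parameter changes the aggregation. Visualizations help identify trends and patterns quickly.

59. How do you save processed data back to CSV or database?

✅Save to CSV:
df.to_csv("cleaned_data.csv", index=False)

✅Save to a SQL database (using the connection from question 52):
df.to_sql("employees", conn, if_exists="replace", index=False)

if_exists="replace" drops and recreates the table if it already exists (use "append" to add rows instead), and index=False stops pandas from writing the DataFrame index as an extra column. Saving processed data supports reporting and further analysis.

60. How do you write reusable Python functions for common analysis patterns?

Reusable functions reduce repetition and improve code quality. Example:
def calculate_growth(old, new):
    """Return the percentage growth from old to new."""
    if old == 0:
        raise ValueError("old must be non-zero to compute percentage growth")
    return ((new - old) / old) * 100

calculate_growth(200, 250)  # 25.0

Benefits of reusable functions:
✔️ Cleaner code
✔️ Faster development
✔️ Easier debugging
✔️ Better collaboration

🚀Double Tap ❤️ For Part-7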
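To try questions 52, 53 and 59 without any files on disk, here is a minimal sketch that uses an in-memory CSV string and an in-memory SQLite database. The table name, column names, and values are made up for illustration; in practice you would pass a real file path and database.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content standing in for a file such as "sales_data.csv"
csv_text = """region,month,sales
North,Jan,5200
South,Jan,
North,Feb,6100
South,Feb,4300
"""

# Q52: read_csv accepts any file-like object, not only a path
df = pd.read_csv(io.StringIO(csv_text))

# Q53: quick inspection
print(df.shape)            # (rows, columns)
print(df.dtypes)           # "sales" is float64 because it contains a missing value
print(df.isnull().sum())   # missing values per column

# Q59: write to an in-memory SQLite database without the index column
conn = sqlite3.connect(":memory:")
df.to_sql("sales", conn, if_exists="replace", index=False)

# Q52: read rows back with a SQL query
north = pd.read_sql("SELECT * FROM sales WHERE region = 'North'", conn)
print(north)
conn.close()
```

Because index=False was used, the table read back has exactly the three original columns.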
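The trade-off in question 54 is easier to see on a tiny series. This sketch (with hypothetical values) applies each option to the same data: dropping shortens the data, filling with 0 drags totals and averages down, the mean ignores any trend, and linear interpolation assumes values change smoothly between neighbors.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales with two gaps
s = pd.Series([100.0, np.nan, 120.0, np.nan, 140.0])

dropped = s.dropna()               # removes the gaps: only 3 values remain
zero_filled = s.fillna(0)          # gaps become 0
mean_filled = s.fillna(s.mean())   # gaps become the mean of observed values (120.0)
interpolated = s.interpolate()     # linear by default: gaps become 110.0 and 130.0

print(dropped)
print(pd.DataFrame({
    "original": s,
    "zero": zero_filled,
    "mean": mean_filled,
    "interpolated": interpolated,
}))
```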
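For question 56, it can help to see that a pivot table is essentially a groupby over two keys, reshaped into a grid. This sketch uses a small hypothetical dataset and builds the same grid both ways.

```python
import pandas as pd

# Hypothetical sales records
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "category": ["A", "B", "A", "A", "B"],
    "sales": [100, 200, 150, 50, 300],
})

# One aggregate per group: total sales per region
by_region = df.groupby("region")["sales"].sum()

# Two grouping keys shown as a grid: regions as rows, categories as columns
pivot = pd.pivot_table(df, values="sales", index="region",
                       columns="category", aggfunc="sum")

# The same grid built with groupby, then unstack() moves "category" into columns
same_grid = df.groupby(["region", "category"])["sales"].sum().unstack()

print(by_region)
print(pivot)
print(same_grid)
```

pivot_table is usually the more readable choice for grids, while groupby is more flexible when you need several different aggregations at once.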
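The join types in question 57 differ mainly in which unmatched rows they keep. This sketch uses two hypothetical tables where customer 3 has no orders and order 104 belongs to a customer missing from the customers table, then counts the rows each join returns. indicator=True adds a "_merge" column showing where each row came from.

```python
import pandas as pd

# Hypothetical tables: customer 3 has no orders; order 104 has an unknown customer 4
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Asha", "Ben", "Chen"],
})
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104],
    "customer_id": [1, 1, 2, 4],
    "amount": [250, 120, 300, 80],
})

row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(customers, orders, on="customer_id", how=how)
    row_counts[how] = len(merged)
    print(how, len(merged))

# Label each row as "both", "left_only", or "right_only"
outer = pd.merge(customers, orders, on="customer_id", how="outer", indicator=True)
print(outer[["customer_id", "name", "order_id", "_merge"]])
```

Checking "_merge" after an outer join is a quick way to find keys that fail to match between two sources.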
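A reusable function like the one in question 60 pays off when it is applied to many values. This sketch (with hypothetical monthly figures) applies a growth function month over month and checks the result against pandas' built-in Series.pct_change, which returns a fraction rather than a percentage. The zero check is an added assumption, since percentage growth from 0 is undefined.

```python
import pandas as pd


def calculate_growth(old, new):
    """Return the percentage growth from old to new; raise if old is zero."""
    if old == 0:
        raise ValueError("old must be non-zero to compute percentage growth")
    return ((new - old) / old) * 100


# Hypothetical monthly sales
sales = pd.Series([200.0, 250.0, 225.0], index=["Jan", "Feb", "Mar"])

# Apply the reusable function to each consecutive pair of months
values = sales.tolist()
growth = [calculate_growth(old, new) for old, new in zip(values, values[1:])]
print(growth)

# Vectorized equivalent: pct_change() gives a fraction, so scale by 100
pct = sales.pct_change() * 100
print(pct)
```

The loop is easy to read and reuse elsewhere; for large columns, the vectorized pct_change version is typically faster.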