Beta Version

ShodhSarthi

DULS Guide to

Data Analysis & Visualization

Master Data Analytics with R and Python: From Basics to Advanced Visualization

What is Data Analysis & Visualization?

Data analysis is the process of inspecting, cleaning, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making. Data visualization is the graphical representation of information and data using visual elements like charts, graphs, and maps to make complex data more accessible and understandable.

Core Components of Data Analytics

Modern data analytics encompasses several key areas:

  • Data Collection: Gathering raw data from various sources
  • Data Cleaning: Identifying and correcting errors in datasets
  • Exploratory Data Analysis: Understanding patterns and relationships
  • Statistical Modeling: Applying mathematical models to data
  • Data Visualization: Creating visual representations of insights
  • Interpretation & Communication: Translating findings into actionable insights

Why R and Python?

R and Python are the leading languages for data analysis in 2024-2025:

  • R: Specifically designed for statistical computing and graphics
  • Python: Versatile language with powerful data science libraries
  • Open Source: Both are free and have extensive community support
  • Industry Standard: Used by data scientists at major companies
  • Rich Ecosystems: Thousands of packages for specialized analysis

Data Analysis Process

1. Problem Definition

Clearly define the business problem or research question you're trying to solve. This step determines the entire analysis approach.

Example: E-commerce Analysis

Problem: "Why has our online store's conversion rate dropped by 15% over the past quarter?"

  • Define metrics: conversion rate, traffic sources, user behavior (see the sketch after this list)
  • Identify stakeholders: marketing, UX, product teams
  • Set success criteria: identify root causes and recommendations
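
A quick pandas sketch of how the headline metric might be computed is shown below. The events.csv file and its traffic_source and converted columns are hypothetical stand-ins for whatever tracking data the store actually collects.

import pandas as pd

# Hypothetical session-level export: one row per visit
events = pd.read_csv("events.csv")  # columns: session_id, traffic_source, converted (0/1)

# Overall conversion rate
print(f"Overall conversion rate: {events['converted'].mean():.2%}")

# Conversion rate by traffic source, to spot where the drop is concentrated
print(events.groupby("traffic_source")["converted"].mean().sort_values())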

2. Data Collection & Preparation

Gather relevant data from various sources and prepare it for analysis through cleaning and transformation.

Online Courses

  • Coursera: Data Science Specializations
  • edX: MIT and Harvard analytics courses
  • Udacity: Data analyst nanodegree
  • DataCamp: Hands-on R and Python
  • Pluralsight: Technology skills platform

Price Range: $29-99/month

Books & Documentation

  • "R for Data Science" by Wickham & Grolemund
  • "Python for Data Analysis" by Wes McKinney
  • "The Elements of Statistical Learning"
  • Official Documentation: R-project.org, Python.org
  • Stack Overflow: Community Q&A

Video Resources

  • YouTube: StatQuest, 3Blue1Brown
  • Khan Academy: Statistics fundamentals
  • Fast.ai: Practical deep learning
  • Towards Data Science: Medium publication
  • R-bloggers: R community blog

Data Sources
  • Databases (SQL, NoSQL)
  • APIs and web scraping (see the sketch after this list)
  • CSV/Excel files
  • Surveys and forms
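
Flat files and databases are covered in the import sections later in this guide. As a minimal sketch of pulling data from an API (the endpoint URL is a placeholder, and the response is assumed to be a JSON list of records):

import requests
import pandas as pd

# Placeholder endpoint; substitute a real API that returns JSON records
response = requests.get("https://api.example.com/orders")
response.raise_for_status()

# Flatten the JSON records into a DataFrame
df = pd.json_normalize(response.json())
print(df.head())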

Data Cleaning
  • Handle missing values
  • Remove duplicates
  • Standardize formats
  • Detect outliers (see the sketch after this list)
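
Missing values, duplicates, and format fixes are shown in the pandas cleaning section later on. Outlier detection is not covered there, so here is a minimal sketch using the common 1.5 * IQR rule; the sales.csv file and its sales column are hypothetical.

import pandas as pd

df = pd.read_csv("sales.csv")  # hypothetical file with a numeric 'sales' column

# Flag values more than 1.5 * IQR beyond the quartiles
q1, q3 = df["sales"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df["sales"] < lower) | (df["sales"] > upper)]
print(f"{len(outliers)} potential outliers out of {len(df)} rows")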

3. Exploratory Data Analysis

Explore the data to understand its structure, patterns, and relationships using statistical summaries and visualizations.
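
In pandas, a first pass often amounts to a handful of calls (df here stands for whatever dataset you have loaded); the R and Python sections below walk through this step in much more detail.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")        # hypothetical dataset

print(df.describe())                # numeric summaries
print(df.corr(numeric_only=True))   # pairwise correlations
df.hist(figsize=(10, 6))            # quick look at distributions
plt.show()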

4. Modeling & Analysis

Apply appropriate statistical methods, machine learning algorithms, or analytical techniques to answer your research questions.

5. Interpretation & Communication

Interpret results, create visualizations, and communicate findings to stakeholders in an actionable format.

Choosing Between R and Python

R Language

Best for: Statistical analysis, data visualization, academic research

Strengths:

  • Exceptional statistical capabilities
  • Outstanding visualization (ggplot2)
  • Comprehensive statistical packages
  • Strong academic community
  • Built-in data analysis functions
R Example:
# Load data and create visualization
library(ggplot2)
data <- read.csv("sales.csv")
ggplot(data, aes(x=month, y=sales)) +
  geom_line() + theme_minimal()

Python

Best for: Machine learning, web scraping, general programming, production systems

Strengths:

  • Versatile general-purpose language
  • Excellent machine learning libraries
  • Great for automation and scripting
  • Strong industry adoption
  • Easy integration with other systems
Python Example:
# Load data and create visualization
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('sales.csv')
data.plot(x='month', y='sales')

Introduction to R Programming

R is a programming language and software environment for statistical computing and graphics. Created by statisticians for statisticians, R provides an extensive catalog of statistical and graphical methods.

Why Learn R?

  • Statistical Computing: Built specifically for data analysis
  • Data Visualization: Exceptional graphics capabilities
  • Reproducible Research: R Markdown for reports and presentations
  • Extensive Packages: Over 18,000 packages on CRAN
  • Active Community: Strong support from statisticians and data scientists
  • Free and Open Source: No licensing costs

Getting Started with R

1. Installation and Setup

Download and Install

  1. Download R: Visit CRAN and download R for your operating system
  2. Download RStudio: Get the free RStudio IDE from RStudio.com
  3. Install Both: Install R first, then RStudio
  4. Verify Installation: Open RStudio and run version (or R.version.string) in the console to confirm the installed version

2. Basic R Syntax

Variables and Assignment:
# Assign values to variables
x <- 5
y <- 10
name <- "John"
is_student <- TRUE

# Print values
print(x)
cat("Hello", name)
Basic Operations:
# Arithmetic operations
sum <- x + y
product <- x * y
division <- y / x

# Logical operations
is_greater <- x > y
is_equal <- x == y

3. Data Types and Structures

Basic Data Types
# Numeric
num <- 3.14

# Integer
int <- 42L

# Character
char <- "Hello"

# Logical
bool <- TRUE

Data Structures
# Vector
vec <- c(1, 2, 3, 4, 5)

# List
lst <- list(a=1, b=2)

# Matrix
mat <- matrix(1:6, nrow=2)

4. Working with Data Frames

Creating Data Frames:
# Create a data frame
students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(20, 22, 19),
  grade = c(85, 92, 78)
)

# View the data frame
print(students)
head(students)
str(students)

Essential R Packages

Data Manipulation

  • dplyr: Grammar of data manipulation
  • tidyr: Tidy messy data
  • readr: Fast and friendly data import
  • stringr: String manipulation
# Install and load
install.packages("dplyr")
library(dplyr)

Visualization

  • ggplot2: Grammar of graphics
  • plotly: Interactive plots
  • lattice: Trellis graphics
  • corrplot: Correlation matrices
# Install ggplot2
install.packages("ggplot2")
library(ggplot2)

Statistical Analysis

  • stats: Built-in statistical functions
  • car: Companion to Applied Regression
  • psych: Psychometric analysis
  • forecast: Time series forecasting
# Load built-in stats
library(stats)
mean(c(1,2,3,4,5))

Data Analytics with R

R excels at statistical analysis and data exploration. This section covers practical data analytics techniques from data import to advanced statistical modeling.

1. Data Import and Export

Reading Different File Formats:
# CSV files
data <- read.csv("data.csv", header = TRUE)

# Excel files (requires readxl)
library(readxl)
excel_data <- read_excel("data.xlsx")

# From URL
url_data <- read.csv("https://example.com/data.csv")

# Export data
write.csv(data, "output.csv", row.names = FALSE)

2. Data Exploration and Summary Statistics

Basic Data Exploration:
# Load sample dataset
data(mtcars)

# Basic information
dim(mtcars) # Dimensions
names(mtcars) # Column names
head(mtcars, 6) # First 6 rows
tail(mtcars, 6) # Last 6 rows
str(mtcars) # Structure

# Summary statistics
summary(mtcars)
mean(mtcars$mpg)
median(mtcars$mpg)
sd(mtcars$mpg) # Standard deviation

Example: Car Performance Analysis

# Analyze car fuel efficiency
data(mtcars)

# Basic statistics
cat("Average MPG:", mean(mtcars$mpg))
cat("Range:", range(mtcars$mpg))

# Correlation analysis
cor(mtcars$mpg, mtcars$wt) # Correlation with weight
cor(mtcars[, c("mpg", "wt", "hp")])

3. Data Manipulation with dplyr

The dplyr Grammar:
library(dplyr)

# Filter rows
high_mpg <- mtcars %>%
  filter(mpg > 20)

# Select columns
car_basics <- mtcars %>%
  select(mpg, wt, hp)

# Create new columns
mtcars_enhanced <- mtcars %>%
  mutate(power_to_weight = hp / wt)

# Group and summarize
cylinder_summary <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    avg_mpg = mean(mpg),
    avg_hp = mean(hp),
    count = n()
  )

4. Statistical Analysis

Descriptive Statistics
# Central tendency
mean(mtcars$mpg)
median(mtcars$mpg)
# Note: base R's mode() returns the storage type, not the statistical mode
names(which.max(table(mtcars$mpg))) # most frequent value

# Variability
var(mtcars$mpg)
sd(mtcars$mpg)
IQR(mtcars$mpg)

# Distribution shape
library(moments)
skewness(mtcars$mpg)
kurtosis(mtcars$mpg)

Inferential Statistics
# T-test
t.test(mpg ~ am, data = mtcars)

# ANOVA (treat cylinder count as a factor)
model <- aov(mpg ~ factor(cyl), data = mtcars)
summary(model)

# Chi-square test
chisq.test(table(mtcars$cyl, mtcars$am))

5. Linear Regression

Building and Evaluating Models:
# Simple linear regression
model1 <- lm(mpg ~ wt, data = mtcars)
summary(model1)

# Multiple regression
model2 <- lm(mpg ~ wt + hp + cyl, data = mtcars)
summary(model2)

# Model diagnostics
plot(model2) # Diagnostic plots
anova(model1, model2) # Compare models

# Predictions
predictions <- predict(model2, newdata = mtcars)
residuals <- residuals(model2)

Data Visualization with ggplot2

1. Grammar of Graphics

ggplot2 is based on the Grammar of Graphics, a systematic approach to building visualizations by combining components.

Basic ggplot Structure:
library(ggplot2)

# Basic scatter plot
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Add layers
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Car Weight vs MPG",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon") +
  theme_minimal()

2. Common Plot Types

Scatter Plots
# Basic scatter
ggplot(mtcars, aes(wt, mpg)) +
  geom_point()

# With color grouping
ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point(size = 3)

# With size mapping
ggplot(mtcars, aes(wt, mpg, size = hp)) +
  geom_point(alpha = 0.7)

Bar Charts
# Simple bar chart
ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# Grouped bar chart
ggplot(mtcars, aes(factor(cyl), fill = factor(am))) +
  geom_bar(position = "dodge")

# Horizontal bars
ggplot(mtcars, aes(factor(cyl))) +
  geom_bar() +
  coord_flip()

Introduction to Python for Data Analysis

Python is a versatile, high-level programming language that has become the go-to choice for data science, machine learning, and analytics. Its readable syntax and extensive ecosystem make it ideal for both beginners and experienced programmers.

Why Python for Data Analysis?

  • Readable Syntax: Easy to learn and understand
  • Rich Ecosystem: Powerful libraries like pandas, NumPy, scikit-learn
  • Versatility: Data analysis, web development, automation
  • Industry Standard: Widely used in tech companies
  • Machine Learning: Excellent ML and AI capabilities
  • Community Support: Large, active community

Getting Started with Python

1. Installation and Environment Setup

Installation Options

  1. Anaconda Distribution: Includes Python + data science packages
  2. Python.org: Official Python installer
  3. Package Managers: pip for packages, conda for environments
  4. IDEs: Jupyter Notebook, PyCharm, VS Code, Spyder
Setting up Environment:
# Install packages using pip
pip install pandas numpy matplotlib seaborn scikit-learn

# Or using conda
conda install pandas numpy matplotlib seaborn scikit-learn

# Create virtual environment
python -m venv data_analysis_env
source data_analysis_env/bin/activate # On Windows: data_analysis_env\Scripts\activate

2. Python Basics for Data Analysis

Variables and Data Types:
# Basic data types
name = "Alice" # String
age = 25 # Integer
height = 5.6 # Float
is_student = True # Boolean

# Check data type
print(type(name))
print(f"{name} is {age} years old")
Data Structures:
# Lists (ordered, mutable)
numbers = [1, 2, 3, 4, 5]
mixed_list = ["apple", 42, True, 3.14]

# Dictionaries (key-value pairs)
person = {
  "name": "Bob",
  "age": 30,
  "city": "New York"
}

# Tuples (ordered, immutable)
coordinates = (10.5, 20.3)

# Sets (unordered, unique elements)
unique_numbers = {1, 2, 3, 4, 5}

3. Control Flow and Functions

Control Structures
# If statements
score = 85
if score >= 90:
  grade = "A"
elif score >= 80:
  grade = "B"
else:
  grade = "C"

# For loops
for i in range(5):
  print(f"Number: {i}")

# While loops
count = 0
while count < 3:
  print(count)
  count += 1

Functions
# Define functions
def calculate_bmi(weight, height):
  """Calculate BMI given weight and height"""
  bmi = weight / (height ** 2)
  return bmi

# Call function
my_bmi = calculate_bmi(70, 1.75)
print(f"BMI: {my_bmi:.2f}")

# Lambda functions
square = lambda x: x ** 2
print(square(5))

Essential Python Libraries for Data Analysis

NumPy

Fundamental package for scientific computing with Python

  • N-dimensional arrays
  • Mathematical functions
  • Linear algebra operations
  • Foundation for other libraries
import numpy as np

# Create arrays
arr = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2], [3, 4]])

# Basic operations
print(arr.mean())
print(arr.sum())
print(np.sqrt(arr))

Pandas

Data manipulation and analysis library

  • DataFrames and Series
  • Data cleaning and transformation
  • File I/O operations
  • Grouping and merging data
import pandas as pd

# Create DataFrame
df = pd.DataFrame({
  'name': ['Alice', 'Bob'],
  'age': [25, 30]
})

# Basic operations
print(df.head())
print(df.describe())

Matplotlib

Comprehensive plotting library

  • Static, animated, interactive visualizations
  • Publication-quality figures
  • Extensive customization options
  • Integration with NumPy and pandas
import matplotlib.pyplot as plt

# Simple plot
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.xlabel('X values')
plt.ylabel('Y values')
plt.show()

Seaborn

Statistical data visualization based on matplotlib

  • Beautiful default styles
  • Statistical plotting functions
  • Integration with pandas DataFrames
  • Complex visualizations made simple
import seaborn as sns

# Load sample data
tips = sns.load_dataset('tips')

# Create visualization
sns.scatterplot(data=tips,
                x='total_bill',
                y='tip')
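
In practice these four libraries are used together: NumPy generates or stores the raw numbers, pandas organizes them, and Matplotlib/Seaborn draws them. A minimal sketch with simulated data (the numbers are made up purely for illustration):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Simulate a small advertising-vs-sales dataset with NumPy
rng = np.random.default_rng(42)
ad_spend = rng.uniform(100, 1000, 50)
sales = 3 * ad_spend + rng.normal(0, 200, 50)

# Organize it with pandas
df = pd.DataFrame({"ad_spend": ad_spend, "sales": sales})
print(df.describe())

# Visualize it with Seaborn/Matplotlib
sns.scatterplot(data=df, x="ad_spend", y="sales")
plt.title("Ad Spend vs Sales (simulated)")
plt.show()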

Data Analytics with Python

Python provides a comprehensive ecosystem for data analytics, from data manipulation with pandas to machine learning with scikit-learn. This section covers practical analytics workflows.

1. Data Loading and Exploration

Loading Data from Different Sources:
import pandas as pd
import numpy as np

# CSV files
df = pd.read_csv('data.csv')

# Excel files
df_excel = pd.read_excel('data.xlsx', sheet_name='Sheet1')

# JSON files
df_json = pd.read_json('data.json')

# From URL
url = 'https://example.com/data.csv'
df_url = pd.read_csv(url)

# Database connection
import sqlite3
conn = sqlite3.connect('database.db')
df_db = pd.read_sql_query("SELECT * FROM my_table", conn)
Initial Data Exploration:
# Basic information
print(df.shape) # Dimensions
print(df.info()) # Data types and null values
print(df.describe()) # Summary statistics

# First look at data
print(df.head(10)) # First 10 rows
print(df.tail(5)) # Last 5 rows
print(df.columns.tolist()) # Column names

# Check for missing values
print(df.isnull().sum())
print(df.duplicated().sum()) # Duplicate rows

2. Data Cleaning and Preprocessing

Handling Missing Data
# Remove rows with any missing values
df_clean = df.dropna()

# Remove rows with missing values in a specific column
df_clean = df.dropna(subset=['important_column'])

# Fill missing values (plain assignment avoids pandas' deprecated chained inplace pattern)
df['column'] = df['column'].fillna(df['column'].mean())
df['category'] = df['category'].fillna('Unknown')

# Forward fill (the method= argument is deprecated; use ffill()/bfill())
df = df.ffill()

Data Transformation
# Remove duplicates
df_unique = df.drop_duplicates()

# Convert data types
df['date'] = pd.to_datetime(df['date'])
df['category'] = df['category'].astype('category')

# Create new columns
df['total'] = df['price'] * df['quantity']
df['month'] = df['date'].dt.month

3. Data Manipulation with Pandas

Filtering and Selecting Data:
# Boolean indexing
high_sales = df[df['sales'] > 1000]
recent_data = df[df['date'] >= '2024-01-01']

# Multiple conditions
filtered = df[(df['sales'] > 500) & (df['region'] == 'North')]

# Select specific columns
subset = df[['name', 'sales', 'profit']]

# Query method (alternative syntax)
result = df.query('sales > 1000 and region == "North"')
Grouping and Aggregation:
# Group by single column
by_region = df.groupby('region')['sales'].sum()

# Group by multiple columns
by_region_month = df.groupby(['region', 'month'])['sales'].mean()

# Multiple aggregations
summary = df.groupby('region').agg({
  'sales': ['sum', 'mean', 'count'],
  'profit': ['sum', 'max']
})

# Apply custom functions
custom_stats = df.groupby('category')['price'].apply(lambda x: x.max() - x.min())

4. Statistical Analysis with Python

Example: Sales Performance Analysis

import pandas as pd
import numpy as np
from scipy import stats

# Load sample sales data (hypothetical file name)
df = pd.read_csv('sales_data.csv')
# Assume columns: date, product, sales, advertising, price, region

# Descriptive statistics
print("Sales Summary:")
print(df['sales'].describe())

# Correlation analysis
correlation_matrix = df[['sales', 'advertising', 'price']].corr()
print("Correlation Matrix:")
print(correlation_matrix)

# Hypothesis testing
north_sales = df[df['region'] == 'North']['sales']
south_sales = df[df['region'] == 'South']['sales']
t_stat, p_value = stats.ttest_ind(north_sales, south_sales)
print(f"T-test results: t-statistic = {t_stat:.4f}, p-value = {p_value:.4f}")

5. Machine Learning with Scikit-learn

Linear Regression Example:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# Prepare data
X = df[['advertising', 'price']] # Features
y = df['sales'] # Target variable

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.2f}")
print(f"Rยฒ: {r2:.2f}")

# Plot results
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, alpha=0.6)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
plt.xlabel('Actual Sales')
plt.ylabel('Predicted Sales')
plt.title('Actual vs Predicted Sales')
plt.show()
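
A step most real projects add on top of the example above is bundling preprocessing and the model into a single pipeline and scoring it with cross-validation. A minimal sketch, assuming the same df with advertising, price, and sales columns used earlier:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X = df[['advertising', 'price']]
y = df['sales']

# Scale the features, then fit the linear model, as one estimator
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LinearRegression())
])

# 5-fold cross-validated R-squared scores
scores = cross_val_score(pipe, X, y, cv=5, scoring='r2')
print(f"R-squared per fold: {scores.round(2)}")
print(f"Mean R-squared: {scores.mean():.2f}")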

Time Series Analysis

1. Working with Time Series Data

Time Series Basics:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create time series
dates = pd.date_range('2023-01-01', periods=365, freq='D')
ts = pd.Series(np.random.randn(365).cumsum(), index=dates)

# Basic time series operations
monthly_mean = ts.resample('M').mean() # Monthly averages
rolling_avg = ts.rolling(window=30).mean() # 30-day moving average

# Plot time series
plt.figure(figsize=(12, 6))
plt.plot(ts.index, ts.values, label='Original', alpha=0.7)
plt.plot(rolling_avg.index, rolling_avg.values, label='30-day MA', linewidth=2)
plt.legend()
plt.title('Time Series with Moving Average')
plt.show()

Data Visualization Mastery

Effective data visualization is crucial for communicating insights and patterns in your data. This section covers visualization techniques in both R and Python.

Principles of Effective Visualization

  • Choose the Right Chart Type: Match visualization to data type and purpose
  • Clear Labels and Titles: Make visualizations self-explanatory
  • Appropriate Color Usage: Use color meaningfully and accessibly (see the sketch after this list)
  • Avoid Chart Junk: Remove unnecessary elements that distract
  • Tell a Story: Guide viewers to key insights
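
To make the labeling and color points concrete, here is a small Python sketch (the ggplot2 equivalents follow in the next section); it uses the built-in tips dataset and a colorblind-friendly palette:

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Colorblind-friendly palette, plus an explicit title and axis labels
sns.set_palette('colorblind')
ax = sns.barplot(data=tips, x='day', y='total_bill', hue='sex')
ax.set_title('Average Bill by Day and Sex')
ax.set_xlabel('Day of Week')
ax.set_ylabel('Average Total Bill ($)')
plt.show()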

Visualization in R with ggplot2

Scatter Plots
library(ggplot2)

# Basic scatter plot
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 3, alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Car Weight vs Fuel Efficiency",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon") +
  theme_minimal()

Bar Charts
# Grouped bar chart
mtcars$cyl_factor <- factor(mtcars$cyl)
mtcars$am_factor <- factor(mtcars$am,
  labels = c("Automatic", "Manual"))

ggplot(mtcars, aes(x = cyl_factor, fill = am_factor)) +
  geom_bar(position = "dodge") +
  labs(title = "Car Count by Cylinders and Transmission",
       x = "Number of Cylinders",
       y = "Count",
       fill = "Transmission") +
  theme_minimal()

1. Advanced ggplot2 Techniques

Faceting (Small Multiples):
# Create subplots by category
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(am))) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ cyl, scales = "free") +
  labs(title = "Weight vs MPG by Cylinder Count",
       color = "Transmission") +
  theme_minimal()

Visualization in Python

Matplotlib Fundamentals

import matplotlib.pyplot as plt
import numpy as np

# Create figure and axis
fig, ax = plt.subplots(figsize=(10, 6))

# Sample data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Create plot
ax.plot(x, y, linewidth=2, label='sin(x)')
ax.set_xlabel('X values')
ax.set_ylabel('Y values')
ax.set_title('Sine Wave')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()

Seaborn Statistical Plots

import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load sample dataset
tips = sns.load_dataset('tips')

# Create correlation heatmap
plt.figure(figsize=(8, 6))
correlation_matrix = tips.select_dtypes(include=[np.number]).corr()
sns.heatmap(correlation_matrix,
            annot=True,
            cmap='coolwarm',
            center=0)
plt.title('Tips Dataset Correlation Matrix')
plt.show()

1. Interactive Visualizations

Plotly for Interactive Charts:
import plotly.express as px
import plotly.graph_objects as go

# Interactive scatter plot
fig = px.scatter(tips,
                 x='total_bill',
                 y='tip',
                 color='day',
                 size='size',
                 hover_data=['sex', 'smoker'],
                 title='Restaurant Tips Analysis')
fig.show()
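
To see how the same data looks in different chart types (the first principle listed earlier), here is a short matplotlib sketch with made-up monthly sales figures:

import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [120, 135, 128, 150, 170, 165]

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Line chart: emphasizes the trend over time
axes[0].plot(months, sales, marker='o')
axes[0].set_title('Line: trend')

# Bar chart: emphasizes month-to-month comparison
axes[1].bar(months, sales)
axes[1].set_title('Bar: comparison')

# Scatter plot: emphasizes individual observations
axes[2].scatter(months, sales)
axes[2].set_title('Scatter: individual points')

plt.tight_layout()
plt.show()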

Tools and Resources for Data Analysis

Comprehensive collection of tools, platforms, and resources for data analysis and visualization in 2024-2025.

  • Development Environments: IDEs, notebooks, and development platforms
  • Data Sources: Public datasets and data collection tools
  • Cloud Platforms: Cloud-based analytics and ML services
  • Learning Resources: Courses, books, and tutorials

Data Analysis Knowledge Assessment

Question 1 of 12: Which Python library is specifically designed for data manipulation and analysis?

  • NumPy
  • Pandas
  • Matplotlib
  • Scikit-learn