🛠️ Working with Missing Data: Filling and Dropping Missing Values in Pandas
🔎 Introduction
Handling missing data is crucial for accurate analysis and modeling. In Pandas, missing values can be either filled (imputed) or dropped, depending on the context and data requirements.
Key methods for handling missing values in Pandas include:
- 📌 fillna() – Replace missing values with a specified value or method (e.g., mean, median, or forward fill).
- 📌 dropna() – Remove rows or columns containing missing values.
- 📌 interpolate() – Estimate missing values using interpolation techniques.
In this tutorial, we will explore different ways to fill and drop missing values using Pandas.
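Because interpolate() is listed above but not used in the numbered examples, here is a minimal sketch of its default (linear) behavior. The Series `s` is a hypothetical example, not data from this tutorial.

```python
import pandas as pd

# Hypothetical series with two missing values between known endpoints
s = pd.Series([10.0, None, None, 40.0])

# Default linear interpolation spaces the missing values evenly
# between the nearest known neighbors
filled = s.interpolate()
print(filled.tolist())  # [10.0, 20.0, 30.0, 40.0]
```

Linear interpolation assumes the values change steadily between neighbors, so it tends to suit ordered data such as time series better than unordered records.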
📌 Example 1: Filling Missing Values Using fillna()
The fillna() function replaces NaN values with a specific value or a statistical measure.
import pandas as pd
# Creating a DataFrame with missing values
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, None, 35, 40],
        'Score': [85, 90, None, 78]}
df = pd.DataFrame(data)
# Filling missing values with a default value
df_filled = df.fillna(0)
print(df_filled)
✅ Output:
      Name   Age  Score
0    Alice  25.0   85.0
1      Bob   0.0   90.0
2  Charlie  35.0    0.0
3    David  40.0   78.0
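Here, every missing value is replaced with 0. The Age and Score columns display as floats (0.0) because the NaN values made them float64 when the DataFrame was created.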
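A single default rarely suits every column. As a minimal sketch, fillna() also accepts a dictionary that maps column names to fill values; the choice of median for Age and 0 for Score below is purely illustrative.

```python
import pandas as pd

# Same data as Example 1, rebuilt so this snippet runs on its own
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, None, 35, 40],
                   'Score': [85, 90, None, 78]})

# Fill each column with its own value: the median for Age, 0 for Score
df_custom = df.fillna({'Age': df['Age'].median(), 'Score': 0})
print(df_custom)
```

Columns not named in the dictionary are left untouched, which makes it easy to fill only the columns where a default is meaningful.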
📌 Example 2: Filling Missing Values with Mean or Forward Fill
We can also fill missing values with column means or propagate previous values forward.
# Filling missing values with the mean of the column
df_mean_filled = df.fillna(df.mean(numeric_only=True))
print(df_mean_filled)
✅ Output:
      Name        Age      Score
0    Alice  25.000000  85.000000
1      Bob  33.333333  90.000000
2  Charlie  35.000000  84.333333
3    David  40.000000  78.000000
Alternatively, forward filling propagates previous values down the column:
# Forward fill (propagate last valid value forward)
# Note: fillna(method='ffill') is deprecated since pandas 2.1; use ffill() instead
df_ffill = df.ffill()
print(df_ffill)
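To make the effect of forward filling concrete, the sketch below rebuilds the same DataFrame and compares ffill() with its counterpart bfill(), which pulls the next valid value backward. It assumes a pandas version that provides DataFrame.ffill() and DataFrame.bfill().

```python
import pandas as pd

# Same data as Example 1, rebuilt so this snippet runs on its own
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, None, 35, 40],
                   'Score': [85, 90, None, 78]})

# Forward fill: each gap takes the last valid value above it
df_ffill = df.ffill()
print(df_ffill)   # Bob's Age becomes 25.0, Charlie's Score becomes 90.0

# Backward fill: each gap takes the next valid value below it
df_bfill = df.bfill()
print(df_bfill)   # Bob's Age becomes 35.0, Charlie's Score becomes 78.0
```

Both methods depend on row order, so they only make sense when neighboring rows are related, for example in sorted time-series data.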
📌 Example 3: Dropping Missing Values Using dropna()
To remove rows or columns with missing values, use dropna():
# Dropping rows with missing values
df_dropped = df.dropna()
print(df_dropped)
✅ Output:
    Name   Age  Score
0  Alice  25.0   85.0
3  David  40.0   78.0
Only rows with complete data are retained.
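Dropping every incomplete row can discard more data than necessary. The sketch below shows two ways to narrow the rule, using the `subset` and `how` parameters of dropna(); the choice of the Age column is illustrative.

```python
import pandas as pd

# Same data as Example 1, rebuilt so this snippet runs on its own
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, None, 35, 40],
                   'Score': [85, 90, None, 78]})

# Drop a row only when Age is missing; gaps in other columns are tolerated
df_age_known = df.dropna(subset=['Age'])
print(df_age_known.index.tolist())  # [0, 2, 3]

# Drop a row only when every value in it is missing
df_not_empty = df.dropna(how='all')
print(len(df_not_empty))  # 4, since no row is entirely empty
```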
To drop columns with missing values:
# Dropping columns with missing values
df_dropped_cols = df.dropna(axis=1)
print(df_dropped_cols)
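With this data, dropping columns is drastic: only Name has no missing values. As a sketch of a softer rule, the `thresh` parameter keeps any column that has at least a given number of non-missing values; the threshold of 3 is an arbitrary choice for illustration.

```python
import pandas as pd

# Same data as Example 1, rebuilt so this snippet runs on its own
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, None, 35, 40],
                   'Score': [85, 90, None, 78]})

# Dropping any column that contains a missing value leaves only Name
df_dropped_cols = df.dropna(axis=1)
print(df_dropped_cols.columns.tolist())  # ['Name']

# Keep columns with at least 3 non-missing values: all three qualify
df_thresh = df.dropna(axis=1, thresh=3)
print(df_thresh.columns.tolist())  # ['Name', 'Age', 'Score']
```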
📌 Summary
🔹 fillna() fills missing values with a specific value or a statistic such as the mean; ffill() and bfill() propagate neighboring values forward or backward.
🔹 dropna() removes rows or columns containing missing values.
🔹 Choose the strategy based on the data: dropping discards information, while filling introduces assumed values.
Handling missing data effectively ensures data consistency and improves the quality of analysis. 🚀