🔍 Working with Missing Data: Identifying Missing Values in Pandas
🔎 Introduction
Missing data is a common challenge in real-world datasets. Incomplete or null values can affect data analysis, visualization, and machine learning models. Pandas provides several methods to detect and handle missing values efficiently.
The key functions for identifying missing values in Pandas include:
- 📌 isna() / isnull() – Detect missing values in a DataFrame.
- 📌 sum() with isna() – Count missing values in each column.
- 📌 info() – Get an overview of missing values in the dataset.
In this tutorial, we will explore different ways to identify missing values with practical examples.
📌 Example 1: Checking for Missing Values
The isna() method (isnull() is an alias for it) returns a Boolean mask with the same shape as the DataFrame, marking each missing value.
import pandas as pd
# Creating a DataFrame with missing values
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, None, 35, 40],
'Score': [85, 90, None, 78]}
df = pd.DataFrame(data)
# Identifying missing values
print(df.isna())
✅ Output:
    Name    Age  Score
0  False  False  False
1  False   True  False
2  False  False   True
3  False  False  False
Here, True represents a missing value (NaN).
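As a quick check, the sketch below rebuilds the same DataFrame, confirms that isna() and isnull() produce identical masks, and uses any() to list the columns and rows that contain at least one missing value. The variable names (same_mask, cols_with_missing, rows_with_missing) are illustrative and not part of the original example.

```python
import pandas as pd

# Same sample data as in Example 1
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, None, 35, 40],
    'Score': [85, 90, None, 78],
})

# isnull() is an alias of isna(), so both return the same Boolean mask
same_mask = df.isna().equals(df.isnull())
print(same_mask)

# Names of the columns that contain at least one missing value
cols_with_missing = df.columns[df.isna().any()].tolist()
print(cols_with_missing)

# Rows that contain at least one missing value
rows_with_missing = df[df.isna().any(axis=1)]
print(rows_with_missing)
```

Calling any() with axis=1 checks across columns for each row, which is a convenient way to pull out the incomplete records for inspection.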
📌 Example 2: Counting Missing Values in Each Column
To get the number of missing values per column, call sum() on the result of isna(). Each True counts as 1, so the sum is the number of missing values:
# Counting missing values per column
missing_counts = df.isna().sum()
print(missing_counts)
✅ Output:
Name     0
Age      1
Score    1
dtype: int64
This output shows that the Age and Score columns each have one missing value, while Name has none.
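The same Boolean mask can also give a grand total or a per-column percentage. The following is a minimal sketch on the sample data; total_missing and missing_pct are illustrative names.

```python
import pandas as pd

# Same sample data as in Example 1
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, None, 35, 40],
    'Score': [85, 90, None, 78],
})

# Total number of missing cells in the whole DataFrame:
# the first sum() counts per column, the second adds the columns together
total_missing = int(df.isna().sum().sum())
print(total_missing)

# Percentage of missing values per column:
# the mean of a Boolean column is the fraction of True values
missing_pct = df.isna().mean() * 100
print(missing_pct)
```

Percentages are often easier to compare than raw counts when columns are long or datasets differ in size.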
📌 Example 3: Using info() to Get an Overview
The info() method prints a quick summary of the DataFrame, including the non-null count and dtype of each column.
# Checking dataset info
df.info()
✅ Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    4 non-null      object
 1   Age     3 non-null      float64
 2   Score   3 non-null      float64
This output shows the number of non-null values in each column. Any count below the 4 entries reported by the index (here, Age and Score with 3 each) means that column has missing data.
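Because info() prints its report and returns None, its numbers cannot be reused directly in code. One way to get the same non-null counts as data is count(), sketched below on the sample DataFrame; non_null and missing are illustrative names.

```python
import pandas as pd

# Same sample data as in Example 1
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, None, 35, 40],
    'Score': [85, 90, None, 78],
})

# Non-null values per column (the figures info() reports as "Non-Null Count")
non_null = df.count()
print(non_null)

# Missing values per column: total rows minus non-null values
missing = len(df) - non_null
print(missing)
```

The result matches what df.isna().sum() returns, so either approach can be used depending on which reads more naturally in context.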
📌 Summary
🔹 isna() / isnull() detect missing values.
🔹 Chaining sum() onto isna() provides a count of missing values per column.
🔹 The info() method gives an overview of missing data in the dataset.
Identifying missing values is the first step in data cleaning. In the next steps, we will explore how to handle them effectively. 🚀