
5.1. Working with Missing Data – Identifying Missing Values in Pandas


🔎 Introduction

Missing data is a common challenge in real-world datasets. Missing entries (which pandas represents as NaN, None, or NaT) can skew summary statistics, distort visualizations, and cause errors in many machine learning libraries. Pandas provides several methods to detect and handle missing values efficiently.

The key functions for identifying missing values in Pandas include:

  1. 📌 isna() / isnull() – Return a boolean mask marking each missing value in a DataFrame or Series.
  2. 📌 sum() with isna() – Count missing values in each column.
  3. 📌 info() – Show the non-null count of each column, which reveals where values are missing.

In this tutorial, we will explore different ways to identify missing values with practical examples.

📌 Example 1: Checking for Missing Values

The isna() method returns a DataFrame of the same shape as the original, with True wherever a value is missing. isnull() is an alias and behaves identically.

import pandas as pd

# Creating a DataFrame with missing values
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, None, 35, 40],
        'Score': [85, 90, None, 78]}
df = pd.DataFrame(data)

# Identifying missing values
print(df.isna())

✅ Output:

    Name    Age  Score
0  False  False  False
1  False   True  False
2  False  False   True
3  False  False  False

Here, True marks a missing value. Pandas stores the None entries in the numeric Age and Score columns as NaN, and isna() flags them.
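The sketch below, built on a small hypothetical Series, checks that isna() and isnull() produce identical masks and shows which values pandas counts as missing: None and NaN are flagged, but an empty string is not. The variable names s and mask are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values: a string, None, an empty string, and NaN
s = pd.Series(['a', None, '', np.nan])

# None and NaN are treated as missing; the empty string is a real value
mask = s.isna()
print(mask.tolist())  # [False, True, False, True]

# isnull() is an alias of isna(), so both masks are identical
print(mask.equals(s.isnull()))  # True

# notna() is the inverse: True where a value is present
print(s.notna().tolist())  # [True, False, True, False]
```

If empty strings should count as missing in your data, they have to be converted to NaN first; isna() will not catch them on its own.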

📌 Example 2: Counting Missing Values in Each Column

Because sum() treats True as 1 and False as 0, chaining it onto isna() counts the missing values in each column:

# Counting missing values per column
missing_counts = df.isna().sum()
print(missing_counts)

✅ Output:

Name     0
Age      1
Score    1
dtype: int64

This output shows that the Age and Score columns each have one missing value, while Name has none.
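The same boolean mask supports a few related checks. This sketch rebuilds the Example 1 DataFrame so it runs on its own; the names total_missing, missing_pct and rows_with_missing are illustrative choices, not part of the original tutorial. It relies on mean() of a boolean column giving the fraction of True values, and on any(axis=1) flagging rows that contain at least one missing value.

```python
import pandas as pd

# Same DataFrame as in Example 1
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, None, 35, 40],
        'Score': [85, 90, None, 78]}
df = pd.DataFrame(data)

# Total number of missing cells across the whole DataFrame
total_missing = df.isna().sum().sum()
print(total_missing)  # 2

# Percentage of missing values per column
# (the mean of a boolean column is the fraction of True values)
missing_pct = df.isna().mean() * 100
print(missing_pct)  # Name 0.0, Age 25.0, Score 25.0

# Rows that contain at least one missing value (Bob and Charlie)
rows_with_missing = df[df.isna().any(axis=1)]
print(rows_with_missing)
```

Percentages are often more useful than raw counts on large datasets, since they show at a glance whether a column is mostly empty or only has a few gaps.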

📌 Example 3: Using info() to Get an Overview

The info() method prints a concise summary of the DataFrame, including the number of non-null values and the dtype of each column.

# Checking dataset info
df.info()

✅ Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Name    4 non-null      object 
 1   Age     3 non-null      float64
 2   Score   3 non-null      float64

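The Non-Null Count column shows how many values are present in each column. Comparing it with the total number of entries (4) reveals the gaps: Age and Score each have 3 non-null values, so each is missing one.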
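info() prints its summary to the console and returns None, so its counts cannot be reused directly in code. As a sketch, DataFrame.count() returns the same per-column non-null counts as a Series; subtracting them from len(df) yields the missing counts, which should agree with the isna().sum() result. The example rebuilds the Example 1 DataFrame so it runs on its own; the names non_null and missing_from_count are illustrative.

```python
import pandas as pd

# Same DataFrame as in Example 1
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, None, 35, 40],
        'Score': [85, 90, None, 78]}
df = pd.DataFrame(data)

# count() returns the number of non-null values per column,
# the same figures shown in the "Non-Null Count" column of info()
non_null = df.count()

# Missing values = total rows minus non-null values
missing_from_count = len(df) - non_null
print(missing_from_count)  # Name 0, Age 1, Score 1

# Element-wise comparison with the isna().sum() approach
print((missing_from_count == df.isna().sum()).all())  # True
```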

📌 Summary

🔹 isna() / isnull() return a boolean mask that marks missing values.
🔹 Chaining sum() onto isna() counts the missing values per column.
🔹 The info() method shows the non-null count of each column, giving an overview of missing data in the dataset.

Identifying missing values is the first step in data cleaning. In the next steps, we will explore how to handle them effectively. 🚀
