🛠️ Working with Missing Data: Handling NaN in Pandas DataFrames

🔎 Introduction

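Missing data, which Pandas represents as NaN (Not a Number) in numeric columns, can skew summary statistics and cause errors in machine learning models that do not accept missing inputs. Pandas provides several ways to handle NaN values: filling them with specific values, removing the rows or columns that contain them, replacing them with another marker, or estimating them from neighboring values.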

Key methods for handling NaN values in Pandas include:

📌 fillna() – Replacing NaN values with a specified value, such as a constant or a column mean.
📌 dropna() – Removing rows or columns containing NaN values.
📌 replace() – Replacing NaN values with alternative representations.
📌 interpolate() – Estimating missing values using interpolation.

In this tutorial, we will explore different ways to handle NaN values in Pandas DataFrames.

📌 Example 1: Checking for NaN Values

Before handling NaN values, it’s useful to check where they exist.

import pandas as pd
import numpy as np

# Creating a DataFrame with NaN values
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, np.nan, 35, 40],
        'Score': [85, 90, np.nan, 78]}
df = pd.DataFrame(data)

# Checking for NaN values
print(df.isna())

✅ Output:

    Name    Age  Score
0  False  False  False
1  False   True  False
2  False  False   True
3  False  False  False

Here, True represents a missing (NaN) value.
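On a larger DataFrame the full True/False grid is hard to read, so it often helps to summarize it. The sketch below rebuilds the same sample data and counts the missing values per column and in total. The variable names nan_per_column, total_nan, and rows_with_nan are only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 35, 40],
    'Score': [85, 90, np.nan, 78],
})

# Number of missing values in each column
nan_per_column = df.isna().sum()
print(nan_per_column)

# Total number of missing values in the whole DataFrame
total_nan = int(df.isna().sum().sum())
print(total_nan)

# Only the rows that contain at least one missing value
rows_with_nan = df[df.isna().any(axis=1)]
print(rows_with_nan)
```

Here each of Age and Score has one missing value, and the rows with index 1 and 2 are the ones affected.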

📌 Example 2: Filling NaN Values Using `fillna()`

We can replace NaN values with a specific value, such as 0:

# Filling NaN values with zero
df_filled = df.fillna(0)
print(df_filled)

✅ Output:

      Name   Age  Score
0    Alice  25.0   85.0
1      Bob   0.0   90.0
2  Charlie  35.0    0.0
3    David  40.0   78.0

Alternatively, we can use column means to fill NaN values:

# Filling NaN values with column means
df_mean_filled = df.fillna(df.mean(numeric_only=True))
print(df_mean_filled)
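A single fill value does not always suit every column. As a minimal sketch, fillna() also accepts a dictionary that maps column names to fill values, so each column can get its own strategy. Using the median for Age and the mean for Score is an assumption made for illustration, not a recommendation from the tutorial.

```python
import numpy as np
import pandas as pd

# Same sample data as in the examples above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 35, 40],
    'Score': [85, 90, np.nan, 78],
})

# One fill value per column (illustrative choice of statistics)
fill_values = {
    'Age': df['Age'].median(),     # median of 25, 35, 40
    'Score': df['Score'].mean(),   # mean of 85, 90, 78
}

df_custom_filled = df.fillna(fill_values)
print(df_custom_filled)
```

Columns that are not listed in the dictionary are left unchanged, so any NaN values in them would remain.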

📌 Example 3: Dropping NaN Values Using `dropna()`

To remove rows containing NaN values:

# Dropping rows with NaN values
df_dropped = df.dropna()
print(df_dropped)

✅ Output:

    Name   Age  Score
0  Alice  25.0   85.0
3  David  40.0   78.0
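Dropping every row that has any NaN can discard more data than intended. The sketch below shows a few dropna() parameters that narrow the rule: subset, axis, and how. The result names are hypothetical and the sample data is the same as above.

```python
import numpy as np
import pandas as pd

# Same sample data as in the examples above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 35, 40],
    'Score': [85, 90, np.nan, 78],
})

# Drop a row only when 'Age' is missing; NaN in 'Score' is tolerated
df_age_known = df.dropna(subset=['Age'])
print(df_age_known)

# Drop columns (axis=1) that contain any NaN instead of rows
df_complete_columns = df.dropna(axis=1)
print(df_complete_columns)

# Drop a row only when every value in it is NaN
df_not_empty = df.dropna(how='all')
print(df_not_empty)
```

With this data, subset=['Age'] removes only Bob's row, axis=1 keeps only the Name column, and how='all' removes nothing because no row is entirely empty.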

📌 Example 4: Replacing NaN Values Using `replace()`

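We can replace NaN values with a custom label. Because the label is a string, Age and Score are stored as object columns afterward, so this approach is best kept for display or export rather than further numeric work: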

# Replacing NaN values with 'Unknown'
df_replaced = df.replace(np.nan, 'Unknown')
print(df_replaced)

✅ Output:

      Name      Age    Score
0    Alice     25.0     85.0
1      Bob  Unknown     90.0
2  Charlie     35.0  Unknown
3    David     40.0     78.0
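The introduction also lists interpolate(), which estimates missing values from their neighbors instead of substituting a fixed value. Interpolation only makes sense when the row order is meaningful, so this sketch uses a hypothetical series of evenly spaced readings rather than the Name/Age/Score table. The name readings and its values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical ordered measurements with gaps
readings = pd.Series([10.0, np.nan, 30.0, np.nan, np.nan, 60.0])

# The default method is linear: gaps are filled along a straight
# line between the nearest known values on either side
readings_filled = readings.interpolate()
print(readings_filled)
```

The gap between 10 and 30 is filled with 20, and the two gaps between 30 and 60 are filled with 40 and 50.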

Question: If we can write a NaN value directly, why use NumPy?

We use np.nan from the NumPy library because Pandas internally represents missing values as NaN (Not a Number), and np.nan is the standard way to introduce missing values in a DataFrame.

Why `np.nan`?

Standard Representation: Python has no dedicated literal for a missing value. NaN is an ordinary floating-point value that can also be written as float('nan') or math.nan, and np.nan is the conventional name for it in NumPy and Pandas code.
Compatibility: Pandas is built on top of NumPy and recognizes np.nan as a missing value.
Operations: Pandas functions such as .isna(), .fillna(), and .dropna() treat np.nan as missing, along with None and, in datetime columns, NaT.
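A quick check can make these points concrete. The sketch below shows that np.nan is a plain Python float, that NaN never compares equal to itself (which is why isna() is used instead of ==), and that Pandas detects NaN regardless of how it was written. The series name values is only illustrative.

```python
import math

import numpy as np
import pandas as pd

# np.nan is an ordinary float, not a special NumPy type
print(type(np.nan))        # <class 'float'>
print(math.isnan(np.nan))  # True

# NaN is never equal to anything, including itself
print(np.nan == np.nan)    # False

# Pandas treats np.nan, None, and float('nan') alike in a numeric Series
values = pd.Series([1.0, np.nan, None, float('nan')])
missing_mask = values.isna()
print(missing_mask)
```

Only the first element of missing_mask is False. The other three are reported as missing.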

Example Without `np.nan`

If you use None instead of np.nan in a numerical column, Pandas automatically converts it to NaN. None therefore works, but np.nan is preferred because it is explicitly meant for numerical operations.
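The original code for this example is not shown, so the following is a minimal sketch of the None-to-NaN conversion described above, not the author's exact example. The DataFrame name df_none and its values are assumptions.

```python
import pandas as pd

# Hypothetical numeric data written with None instead of np.nan
df_none = pd.DataFrame({
    'Age': [25, None, 35],
    'Score': [85, 90, None],
})

# Both columns are stored as float64, with None converted to NaN
print(df_none.dtypes)

# The converted values are detected like any other NaN
print(df_none.isna())
```

Both columns come back as float64, and isna() reports True in the positions where None was written.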

📌 Summary

🔹 fillna() helps replace NaN values with specific values like mean or zero.
🔹 dropna() removes rows or columns containing NaN values.
🔹 replace() allows flexible replacement of NaN values.
🔹 Choosing the right method depends on the data context to maintain data integrity.

Effectively handling NaN values ensures cleaner and more reliable datasets for analysis and machine learning. 🚀
