
6.2. Data Cleaning and Transformation – Replacing Values in Pandas

🔧 Data Cleaning and Transformation: Replacing Values in Pandas

🔍 Introduction

Replacing values is a common task in data cleaning, allowing you to correct, standardize, or transform data. Pandas offers flexible methods to replace values in DataFrames and Series efficiently.

Key methods for replacing values in Pandas include:

  1. 📌 replace() – Replace specific values with others.
  2. 📌 where() – Keep values where a condition is True and replace the rest.
  3. 📌 mask() – Replace values where a condition is True and keep the rest.

Let's explore these methods with practical examples.

📌 Example 1: Using replace() for Value Substitution

import pandas as pd

# Creating a DataFrame with categorical values
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Status': ['Active', 'Inactive', 'Active']}
df = pd.DataFrame(data)

# Replacing 'Active' with the integer 1 and 'Inactive' with the integer 0
df['Status'] = df['Status'].replace({'Active': 1, 'Inactive': 0})
print(df)

✅ Output:

      Name  Status
0    Alice       1
1      Bob       0
2  Charlie       1
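replace() also accepts other forms of mapping. The following is a minimal sketch, not part of the original example: the short codes 'A' and 'I' and the variable names by_column and no_prefix are made up for illustration. It shows a nested dict that restricts the replacement to one column, and regex=True for pattern-based substitution.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Status': ['Active', 'Inactive', 'Active'],
})

# A nested dict limits the replacement to one column: {column: {old: new}}
# ('A' and 'I' are illustrative codes, not from the original example)
by_column = df.replace({'Status': {'Active': 'A', 'Inactive': 'I'}})

# With regex=True, each dict key is treated as a regular expression
no_prefix = df['Status'].replace({r'^In': 'Not '}, regex=True)

print(by_column['Status'].tolist())  # ['A', 'I', 'A']
print(no_prefix.tolist())            # ['Active', 'Not active', 'Active']

# replace() returns a new object; the original DataFrame is unchanged
print(df['Status'].tolist())         # ['Active', 'Inactive', 'Active']
```

Because replace() does not modify the data in place by default, assign the result back (as Example 1 does) when you want to keep the change.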

📌 Example 2: Using where() for Conditional Replacement

# Replacing scores less than 90 with 'Below Average'
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Score': [85, 90, 75]}
df = pd.DataFrame(data)

df['Performance'] = df['Score'].where(df['Score'] >= 90, 'Below Average')
print(df)

✅ Output:

      Name  Score    Performance
0    Alice     85  Below Average
1      Bob     90             90
2  Charlie     75  Below Average
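Note that the Performance column above mixes integers and strings. As a rough sketch of what happens to the data type (the variable names scores, kept, and labeled are illustrative), the snippet below compares where() without a replacement value, which fills with NaN, against where() with a string replacement.

```python
import pandas as pd

scores = pd.Series([85, 90, 75], name='Score')

# Without a replacement value, rows that fail the condition become NaN,
# so the integer column is converted to float
kept = scores.where(scores >= 90)
print(kept.tolist())     # [nan, 90.0, nan]

# With a string replacement, the result holds both ints and strings,
# so pandas stores it with the generic object dtype
labeled = scores.where(scores >= 90, 'Below Average')
print(labeled.tolist())  # ['Below Average', 90, 'Below Average']
print(labeled.dtype)     # object
```

A mixed-type column like this is fine for display, but numeric operations on it will generally fail or behave unexpectedly, so it is usually best kept separate from the original Score column, as in the example.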

📌 Example 3: Using mask() for Conditional Replacement

# Replacing scores less than 80 with 'Low'
df['Performance'] = df['Score'].mask(df['Score'] < 80, 'Low')
print(df)

✅ Output:

      Name  Score Performance
0    Alice     85          85
1      Bob     90          90
2  Charlie     75         Low
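where() and mask() are mirror images of each other: masking on a condition gives the same result as calling where() on the inverted condition. A small sketch to illustrate (the names low, masked, and inverted are chosen here for the demonstration):

```python
import pandas as pd

scores = pd.Series([85, 90, 75])
low = scores < 80  # boolean condition: True only for 75

# mask() replaces values where the condition is True
masked = scores.mask(low, 'Low')

# where() replaces values where the condition is False,
# so inverting the condition with ~ yields the same result
inverted = scores.where(~low, 'Low')

print(masked.tolist())          # [85, 90, 'Low']
print(masked.equals(inverted))  # True
```

Which one to use is mostly a matter of readability: pick the method whose condition reads most naturally for the rule you are expressing.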

🔖 Summary

🔹 replace() is perfect for direct value substitution.
🔹 where() conditionally replaces values where the condition is False.
🔹 mask() replaces values where the condition is True.

Understanding and using these methods appropriately enhances data consistency and reliability. 🚀
