Data Cleaning and Transformation: Replacing Values in Pandas
Introduction
Replacing values is a common task in data cleaning, allowing you to correct, standardize, or transform data. Pandas offers flexible methods to replace values in DataFrames and Series efficiently.
Key methods for replacing values in Pandas include:
- replace(): Replace specific values with others.
- where(): Keep values where a condition is True and replace the rest.
- mask(): Replace values where a condition is True.
Let’s explore these methods with practical examples.
Example 1: Using replace() for Value Substitution
import pandas as pd
# Creating a DataFrame with categorical values
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Status': ['Active', 'Inactive', 'Active']}
df = pd.DataFrame(data)
# Replacing 'Active' with 1 and 'Inactive' with 0 (integers, not strings)
df['Status'] = df['Status'].replace({'Active': 1, 'Inactive': 0})
print(df)
Output:
      Name  Status
0    Alice       1
1      Bob       0
2  Charlie       1
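replace() also works on a whole DataFrame. The sketch below assumes a nested dictionary, where the outer key names the column and the inner dictionary maps old values to new ones. The Plan column and the 'A'/'I' codes are hypothetical, added only to show that other columns are left alone.

```python
import pandas as pd

# Hypothetical sample data; 'Plan' is an extra column added for illustration
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Status': ['Active', 'Inactive', 'Active'],
    'Plan': ['Active', 'Trial', 'Trial'],
})

# Outer keys select columns; inner dicts map old values to new ones.
# Only 'Status' is targeted, so 'Active' in 'Plan' stays unchanged.
cleaned = df.replace({'Status': {'Active': 'A', 'Inactive': 'I'}})
print(cleaned)
```

Because replace() returns a new object by default, df itself is not modified here.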
Example 2: Using where() for Conditional Replacement
# Replacing scores less than 90 with 'Below Average'
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Score': [85, 90, 75]}
df = pd.DataFrame(data)
df['Performance'] = df['Score'].where(df['Score'] >= 90, 'Below Average')
print(df)
Output:
      Name  Score    Performance
0    Alice     85  Below Average
1      Bob     90             90
2  Charlie     75  Below Average
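One thing worth noticing in the output above is that the new column mixes numbers and strings. The following minimal sketch, using the same scores, illustrates how the replacement value affects the resulting dtype. The variable names kept and labeled are only illustrative.

```python
import pandas as pd

scores = pd.Series([85, 90, 75], name='Score')

# With no replacement value, where() fills non-matching rows with NaN,
# which upcasts the integer column to float64.
kept = scores.where(scores >= 90)
print(kept)
print(kept.dtype)

# Supplying a string replacement mixes types, so the result is object dtype.
labeled = scores.where(scores >= 90, 'Below Average')
print(labeled.dtype)
```

If the column needs to stay numeric for later calculations, a numeric replacement value is likely the safer choice.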
Example 3: Using mask() for Conditional Replacement
# Replacing scores less than 80 with 'Low'
df['Performance'] = df['Score'].mask(df['Score'] < 80, 'Low')
print(df)
Output:
      Name  Score  Performance
0    Alice     85           85
1      Bob     90           90
2  Charlie     75          Low
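Since mask() and where() are mirror images of each other, the same result can be produced with either one by negating the condition. Here is a small sketch of that equivalence, using a numeric replacement (0) chosen only for illustration.

```python
import pandas as pd

scores = pd.Series([85, 90, 75], name='Score')
low = scores < 80

# mask() replaces values where the condition is True
masked = scores.mask(low, 0)

# where() keeps values where the condition is True, so negate it
kept = scores.where(~low, 0)

# Both approaches give the same Series
print(masked.equals(kept))
```

Which one to use is mostly a readability choice: pick the method whose condition reads most naturally for the rule you are expressing.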
Summary
- replace() is well suited to direct value substitution.
- where() keeps values where the condition is True and replaces them where it is False.
- mask() replaces values where the condition is True.
Understanding and using these methods appropriately enhances data consistency and reliability.