🔧 Data Cleaning and Transformation: String Operations in Pandas
🔍 Introduction
String operations are a core part of data cleaning, especially when dealing with textual data. Pandas exposes vectorized string methods through the .str accessor of a Series, so a single call cleans, transforms, or tests every value in a column at once.
Key string operations in Pandas include:
- 📌 Lowercasing and Uppercasing – Standardize text data.
- 📌 Replacing Substrings – Modify or correct text.
- 📌 Checking for Substrings – Filter data based on conditions.
Let’s explore these operations with practical examples.
📌 Example 1: Changing Case with str.lower() and str.upper()
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie']}
df = pd.DataFrame(data)
# Convert names to lowercase
df['Name_lower'] = df['Name'].str.lower()
# Convert names to uppercase
df['Name_upper'] = df['Name'].str.upper()
print(df)
✅ Output:
      Name  Name_lower  Name_upper
0    Alice       alice       ALICE
1      Bob         bob         BOB
2  Charlie     charlie     CHARLIE
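Case conversion matters most when the same value has been entered with inconsistent capitalization. The following is a minimal sketch under that assumption; the City column and its values are hypothetical and not part of the example above.

```python
import pandas as pd

# Hypothetical data: the same city typed with inconsistent capitalization
cities = pd.DataFrame({'City': ['Paris', 'PARIS', 'paris', 'Rome']})

# Raw values are compared case-sensitively, so each spelling counts as distinct
raw_unique = cities['City'].nunique()

# After lowercasing, the three spellings of Paris collapse into one value
cities['City_clean'] = cities['City'].str.lower()
clean_unique = cities['City_clean'].nunique()

print(raw_unique, clean_unique)
```

Keeping the cleaned values in a separate column, as Example 1 does, preserves the original text in case it is needed later.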
📌 Example 2: Replacing Substrings with str.replace()
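# Replace the literal substring 'Alice' with 'Alicia'
# (regex=False makes the literal match explicit; Name_lower and Name_upper are not recomputed)
df['Name'] = df['Name'].str.replace('Alice', 'Alicia', regex=False)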
print(df)
✅ Output:
      Name  Name_lower  Name_upper
0   Alicia       alice       ALICE
1      Bob         bob         BOB
2  Charlie     charlie     CHARLIE
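str.replace() can treat its pattern either as literal text or as a regular expression, controlled by the regex parameter. The default has differed between pandas versions, so passing it explicitly is the safer habit. The sketch below uses a hypothetical Series of names with titles to contrast the two modes.

```python
import pandas as pd

# Hypothetical data with a literal dot and irregular spacing
names = pd.Series(['Dr. Alice', 'Dr.  Bob', 'Charlie'])

# regex=False: 'Dr.' is matched literally, so the dot is not a wildcard
no_title = names.str.replace('Dr.', '', regex=False)

# regex=True: the pattern is a regular expression; this one removes leading whitespace
cleaned = no_title.str.replace(r'^\s+', '', regex=True)

print(cleaned.tolist())
```

With regex=True, characters such as . and + have special meaning, which is why the literal mode is used for the title in the first step.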
📌 Example 3: Filtering Data with str.contains()
# Filtering rows where the name contains 'li'
filtered_df = df[df['Name'].str.contains('li')]
print(filtered_df)
✅ Output:
      Name  Name_lower  Name_upper
0   Alicia       alice       ALICE
2  Charlie     charlie     CHARLIE
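Real columns often contain missing values and mixed capitalization, both of which affect str.contains(). The sketch below assumes a hypothetical Series with a missing entry and shows the case and na parameters, which make the match case-insensitive and treat missing values as non-matches.

```python
import pandas as pd

# Hypothetical data including a missing value and mixed case
names = pd.Series(['Alicia', 'Bob', None, 'CHARLIE'])

# case=False ignores capitalization; na=False counts missing values as non-matches
mask = names.str.contains('li', case=False, na=False)

print(names[mask].tolist())
```

Without na=False, the missing entry would yield a missing value in the mask instead of False, and indexing with such a mask can raise an error.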
🔖 Summary
🔹 Use str.lower() and str.upper() to standardize text data.
🔹 Use str.replace() to modify or clean textual data.
🔹 Use str.contains() for filtering data based on substring matches.
Mastering these string operations simplifies data cleaning and ensures consistent and accurate datasets for analysis. 🚀