6.4. Data Cleaning and Transformation – String Operations in Pandas

🔍 Introduction

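String operations are a vital part of data cleaning, especially when dealing with textual data. Pandas exposes them through the .str accessor of a Series. Methods such as str.lower(), str.replace(), and str.contains() are applied to every element at once, and missing values (NaN) are passed through instead of raising an error.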
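A minimal sketch of how the .str accessor treats missing values, assuming a small hypothetical Series with one None entry. Each method runs element-wise, and the missing entry comes back as a missing value rather than raising an error the way calling .upper() on None directly would.

```python
import pandas as pd

# Hypothetical Series with a missing entry (None) mixed in with strings
s = pd.Series(['Alice', None, 'Charlie'])

# .str applies each method element-wise; the missing entry stays missing
upper = s.str.upper()
lengths = s.str.len()

print(upper)
print(lengths)
```

Checking results with pd.isna() is the safest way to detect the missing entry, since pandas may represent it as None or NaN depending on the column's dtype.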

Key string operations in Pandas include:

  1. 📌 Lowercasing and Uppercasing – Standardize text data.
  2. 📌 Replacing Substrings – Modify or correct text.
  3. 📌 Checking for Substrings – Filter rows by whether they contain a given substring.

Let’s explore these operations with practical examples.

📌 Example 1: Changing Case with str.lower() and str.upper()

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie']}
df = pd.DataFrame(data)

# Convert names to lowercase
df['Name_lower'] = df['Name'].str.lower()

# Convert names to uppercase
df['Name_upper'] = df['Name'].str.upper()
print(df)

✅ Output:

      Name Name_lower Name_upper
0    Alice      alice      ALICE
1      Bob        bob        BOB
2  Charlie    charlie    CHARLIE
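Standardizing case matters most when the same value has been typed in different ways. The sketch below uses hypothetical city names: without lowercasing, pandas counts every spelling variant as a distinct value, while str.lower() collapses them onto one canonical form before counting.

```python
import pandas as pd

# Hypothetical survey responses with inconsistent capitalization
cities = pd.Series(['Paris', 'PARIS', 'paris', 'London', 'london'])

# Without standardizing, each spelling counts as a distinct value
raw_unique = cities.nunique()

# Lowercasing first maps every variant onto one canonical form
normalized = cities.str.lower()
normalized_unique = normalized.nunique()
counts = normalized.value_counts()

print(raw_unique)         # 5
print(normalized_unique)  # 2
print(counts)
```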

📌 Example 2: Replacing Substrings with str.replace()

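# Replace the literal substring 'Alice' with 'Alicia'
# (regex=False treats the pattern as plain text, not a regular expression)
df['Name'] = df['Name'].str.replace('Alice', 'Alicia', regex=False)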
print(df)

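✅ Output (only Name changes; Name_lower and Name_upper were computed earlier and keep their original values):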

      Name Name_lower Name_upper
0   Alicia      alice      ALICE
1      Bob        bob        BOB
2  Charlie    charlie    CHARLIE
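str.replace() can also apply a regular-expression pattern, which is useful when the text to remove varies from row to row. In pandas 2.0 the default of the regex parameter changed from True to False, so passing it explicitly keeps behavior the same across versions. A sketch with hypothetical phone numbers, contrasting a regex pattern with a literal one:

```python
import pandas as pd

# Hypothetical phone numbers entered in different formats
phones = pd.Series(['(555) 123-4567', '555.123.4567', '555 123 4567'])

# regex=True: the pattern r'\D' matches any non-digit character
digits_only = phones.str.replace(r'\D', '', regex=True)

# regex=False: '.' is a literal dot here, not "any character"
no_dots = phones.str.replace('.', '', regex=False)

print(digits_only.tolist())  # ['5551234567', '5551234567', '5551234567']
print(no_dots.tolist())      # ['(555) 123-4567', '5551234567', '555 123 4567']
```

With regex=True, the same '.' pattern would match every character and erase the whole string, which is why the literal form is the safer default for plain substitutions.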

📌 Example 3: Filtering Data with str.contains()

# Keep rows whose name contains 'li' (case-sensitive; the pattern is treated as a regular expression by default)
filtered_df = df[df['Name'].str.contains('li')]
print(filtered_df)

✅ Output:

      Name Name_lower Name_upper
0   Alicia      alice      ALICE
2  Charlie    charlie    CHARLIE
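Real columns often mix capitalization and contain missing entries. A sketch assuming a hypothetical people DataFrame: case=False makes the match ignore capitalization, and na=False turns missing names into non-matches. On an object-dtype column, the missing entry would otherwise put NaN into the mask, and boolean indexing with such a mask raises an error.

```python
import pandas as pd

# Hypothetical names, including an all-caps match and a missing value
people = pd.DataFrame({'Name': ['Alicia', 'Bob', 'LINDA', None, 'Charlie']})

# Default matching is case-sensitive, so 'LINDA' does not match 'li'
case_sensitive = people['Name'].str.contains('li', na=False)

# case=False ignores capitalization; na=False treats missing names as non-matches
mask = people['Name'].str.contains('li', case=False, na=False)
matches = people[mask]

print(matches)
```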

🔖 Summary

🔹 Use str.lower() and str.upper() to standardize text data.
🔹 Use str.replace() to modify or clean textual data.
🔹 Use str.contains() to filter rows based on substring matches.
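The three operations are often chained into a small cleaning pipeline. A sketch with hypothetical product labels: lowercasing runs first, so the typo fix only has to handle one spelling ('tae') instead of every capitalization of it, and the filter then runs on the cleaned column.

```python
import pandas as pd

# Hypothetical product labels with inconsistent case and a known typo
products = pd.DataFrame({'Label': ['Green Tea', 'GREEN TAE', 'black tea', 'Coffee']})

# 1. Standardize case
products['Label_clean'] = products['Label'].str.lower()

# 2. Correct the typo with a literal substring replacement
products['Label_clean'] = products['Label_clean'].str.replace('tae', 'tea', regex=False)

# 3. Keep only rows whose cleaned label mentions tea
tea_rows = products[products['Label_clean'].str.contains('tea', regex=False)]

print(tea_rows)
```

Writing the cleaned text to a new column keeps the raw Label intact, which makes it easy to audit what each step changed.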

Together, these string operations cover much of routine text cleaning and help keep datasets consistent for analysis. 🚀

 
