
3.3. Pandas DataFrame: Adding and Removing Columns in Python (Step-by-Step Guide)

📖 Introduction

Pandas is a powerful Python library for data analysis and manipulation. When working with DataFrames, adding and removing columns is a common task. This tutorial will guide you through various ways to add new columns and remove existing ones using Pandas.

🗂️ 1. Creating a Sample DataFrame

Before we dive into adding and removing columns, let’s create a sample DataFrame to work with.

import pandas as pd

# Creating a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}

df = pd.DataFrame(data)
print(df)

✅ Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston
4      Eve   28      Phoenix

Now, let’s explore how to add and remove columns in this DataFrame.

➕ 2. Adding Columns

There are multiple ways to add new columns to a Pandas DataFrame.

📌 Adding a Column with a Fixed Value

Assigning a single value to a new column name creates the column and repeats that value in every row.

# Adding a new column with a fixed value
df['Country'] = 'USA'
print(df)

✅ Output:

      Name  Age         City Country
0    Alice   25     New York     USA
1      Bob   30  Los Angeles     USA
2  Charlie   35      Chicago     USA
3    David   40      Houston     USA
4      Eve   28      Phoenix     USA
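A single value is repeated for every row, but a list has to supply exactly one value per row. The sketch below illustrates both cases on a separate small frame so the tutorial's df is left untouched; the names cities and Row_Number and the values assigned are made up for the example.

```python
import pandas as pd

# Hypothetical frame used only for this illustration
cities = pd.DataFrame({'City': ['Chicago', 'Houston', 'Phoenix']})

# A scalar is broadcast to all three rows
cities['Country'] = 'USA'

# A list works when its length equals the number of rows
cities['Row_Number'] = [1, 2, 3]

# A list of the wrong length is rejected with a ValueError
length_error = None
try:
    cities['Population'] = [1, 2]
except ValueError as exc:
    length_error = str(exc)

print(cities)
print("Rejected:", length_error)
```

If the lengths differ, pandas raises the error before adding anything, so the frame keeps its previous columns.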

📌 Adding a Column Based on Another Column

Arithmetic on a column is applied to every row at once, so you can derive a new column directly from an existing one.

# Adding a column that calculates years to retirement
df['Years_to_Retire'] = 65 - df['Age']
print(df)
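As a complement to direct assignment, pandas also offers assign(), which returns a new DataFrame instead of changing the existing one. The sketch below shows both side by side on a separate small frame; the names ages, with_months, and Months_to_Retire are made up for the example.

```python
import pandas as pd

# Hypothetical frame used only for this illustration
ages = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Direct assignment: the subtraction runs element-wise and ages is modified
ages['Years_to_Retire'] = 65 - ages['Age']

# assign(): builds a new DataFrame and leaves ages as it was
with_months = ages.assign(
    Months_to_Retire=lambda frame: frame['Years_to_Retire'] * 12
)

print(ages)
print(with_months)
```

assign() can be convenient when several steps are chained, since each step produces a new DataFrame rather than mutating a shared one.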

📌 Adding a Column Using apply()

apply() calls a function on each value of a column and collects the results into a new Series, which you can then store as a column.

# Adding a column based on a function
def categorize_age(age):
    return 'Young' if age < 30 else 'Old'

df['Age_Group'] = df['Age'].apply(categorize_age)
print(df)
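For a simple two-way split like this, a vectorized condition can produce the same labels without calling a Python function per row. The sketch below compares apply() with numpy.where() on a separate frame; using NumPy here is an alternative suggested for illustration, not something the tutorial's own code requires.

```python
import numpy as np
import pandas as pd

# Hypothetical frame used only for this illustration
ages = pd.DataFrame({'Age': [25, 30, 35, 40, 28]})

# Row-by-row: the lambda is called once per value
via_apply = ages['Age'].apply(lambda age: 'Young' if age < 30 else 'Old')

# Vectorized: the comparison and the selection run over the whole column
via_where = np.where(ages['Age'] < 30, 'Young', 'Old')

print(via_apply.tolist())
print(via_where.tolist())
```

apply() remains the more flexible choice when the logic does not reduce to a simple condition.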

📌 Adding a Column Using insert()

The insert() method adds a column at a specific zero-based position. It modifies the DataFrame in place and returns None, so do not assign its result back to df.

# Adding a column at index 1
df.insert(1, 'Gender', ['F', 'M', 'M', 'M', 'F'])
print(df)
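One detail worth knowing is that insert() refuses to add a column whose name already exists, unless allow_duplicates=True is passed. A minimal sketch of that behavior, using a separate frame named people that is made up for the example:

```python
import pandas as pd

# Hypothetical frame used only for this illustration
people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Position 1 places the new column between 'Name' and 'Age'
people.insert(1, 'Gender', ['F', 'M'])
print(list(people.columns))  # ['Name', 'Gender', 'Age']

# Inserting the same name a second time raises ValueError
error_message = None
try:
    people.insert(0, 'Gender', ['F', 'M'])
except ValueError as exc:
    error_message = str(exc)

print("insert() refused:", error_message)
```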

❌ 3. Removing Columns

You can remove columns using different methods.

📌 Removing a Column Using drop()

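The drop() method returns a new DataFrame without the specified columns. The original is left untouched, which is why the result is assigned back to df.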

# Removing a single column
df = df.drop(columns=['Country'])
print(df)
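To make the return behavior concrete, the sketch below keeps the result of drop() in a second variable and compares the two frames. The names original and trimmed are made up for the example.

```python
import pandas as pd

# Hypothetical frame used only for this illustration
original = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Country': ['USA', 'USA']})

# drop() hands back a new DataFrame
trimmed = original.drop(columns=['Country'])

print(list(original.columns))  # ['Name', 'Country']
print(list(trimmed.columns))   # ['Name']
```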

📌 Removing Multiple Columns

You can pass a list of column names to drop().

# Removing multiple columns
df = df.drop(columns=['Age_Group', 'Years_to_Retire'])
print(df)
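By default, drop() raises a KeyError if any name in the list is not a column. Passing errors='ignore' skips missing names instead. The sketch below shows both outcomes; the frame and the 'Salary' column name are made up for the example.

```python
import pandas as pd

# Hypothetical frame used only for this illustration
frame = pd.DataFrame({'Name': ['Alice'], 'Age': [25], 'City': ['New York']})

# 'Salary' does not exist, so the default behavior is a KeyError
missing_error = None
try:
    frame.drop(columns=['Age', 'Salary'])
except KeyError as exc:
    missing_error = exc

# errors='ignore' drops the names that exist and skips the rest
lenient = frame.drop(columns=['Age', 'Salary'], errors='ignore')

print(list(lenient.columns))  # ['Name', 'City']
```

Silently ignoring missing names can hide typos, so this option is probably best reserved for cleanup code where a column may legitimately be absent.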

📌 Removing a Column Using del

The del keyword removes a column in place, so no reassignment is needed.

# Removing a column using del
del df['Gender']
print(df)

📌 Removing a Column Using pop()

The pop() method removes a column from the DataFrame in place and returns it as a Series.

# Removing a column using pop
age_column = df.pop('Age')
print(df)
print("Removed column:", age_column)
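Because pop() returns the removed data, it is handy for moving a column elsewhere. The sketch below pops a column and then reattaches it at the end of the frame; the name people is made up for the example.

```python
import pandas as pd

# Hypothetical frame used only for this illustration
people = pd.DataFrame({'Age': [25, 30], 'Name': ['Alice', 'Bob']})

# pop() removes 'Age' from people and returns it as a Series
age_column = people.pop('Age')
print(type(age_column).__name__)  # Series
print(list(people.columns))       # ['Name']

# Assigning the Series back appends it as the last column
people['Age'] = age_column
print(list(people.columns))       # ['Name', 'Age']
```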

🎯 Conclusion

Adding and removing columns in a Pandas DataFrame is essential for data manipulation. You can add columns using direct assignment, insert(), and apply(). You can remove them using drop(), which returns a new DataFrame, or del and pop(), which modify the DataFrame in place. Mastering these techniques will help you efficiently manage your datasets in Python.
