Data Science Pandas

4.3.Data Manipulation with Pandas – Merging, Joining, and Concatenation

🔗 Data Manipulation with Pandas: Merging, Joining, and Concatenation

🔍 Introduction

Pandas provides powerful functions for efficiently combining multiple datasets. Whether you need to merge data based on a common key, join tables similar to SQL, or concatenate datasets along rows or columns, Pandas has built-in methods to handle these operations seamlessly.

The three main methods for combining data in Pandas are:

  1. 📌 Merging (merge()) – Combining datasets based on common columns.
  2. 📌 Joining (join()) – A convenient way to combine DataFrames using index values.
  3. 📌 Concatenation (concat()) – Stacking DataFrames vertically or horizontally.

In this tutorial, we will explore each method with practical examples.

📌 Example 1: Merging DataFrames

The merge() function is useful when combining datasets based on a shared column (like SQL joins).

import pandas as pd

# Creating two DataFrames
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Score': [85, 90, 78]})

# Merging on 'ID' column
df_merged = pd.merge(df1, df2, on='ID', how='inner')
print(df_merged)

✅ Output:

   ID   Name  Score
0   1  Alice     85
1   2    Bob     90

Here, an inner join is performed, returning only matching IDs. You can use how='left', how='right', or how='outer' for different types of joins.

📌 Example 2: Joining DataFrames on Index

The join() method works similarly to merge(), but it joins DataFrames based on their index values.

# Creating DataFrames with custom index
df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']}, index=[1, 2, 3])
df2 = pd.DataFrame({'Score': [85, 90, 78]}, index=[1, 2, 4])

# Joining DataFrames
df_joined = df1.join(df2, how='inner')
print(df_joined)

✅ Output:

      Name  Score
1   Alice     85
2     Bob     90

Here, join() combines the DataFrames based on their index values.

📌 Example 3: Concatenating DataFrames

The concat() function allows you to stack DataFrames along rows (default) or columns.

# Creating two DataFrames
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [78, 92]})

# Concatenating DataFrames
df_concat = pd.concat([df1, df2], ignore_index=True)
print(df_concat)

✅ Output:

      Name  Score
0   Alice     85
1     Bob     90
2  Charlie     78
3   David     92

By setting ignore_index=True, we reset the index after concatenation.

📌 Summary

🔹 Merging allows you to combine datasets based on shared columns with different join types. 🔹 Joining is a similar approach that merges data based on index values. 🔹 Concatenation stacks datasets vertically or horizontally.

Mastering these techniques in Pandas will enable you to efficiently handle and manipulate structured data for analysis. 🚀

Leave a Reply

Your email address will not be published. Required fields are marked *