🔗 Data Manipulation with Pandas: Merging, Joining, and Concatenation
🔍 Introduction
Pandas provides powerful functions for efficiently combining multiple datasets. Whether you need to merge data based on a common key, join tables similar to SQL, or concatenate datasets along rows or columns, Pandas has built-in methods to handle these operations seamlessly.
The three main methods for combining data in Pandas are:
- 📌 Merging (
merge()
) – Combining datasets based on common columns. - 📌 Joining (
join()
) – A convenient way to combine DataFrames using index values. - 📌 Concatenation (
concat()
) – Stacking DataFrames vertically or horizontally.
In this tutorial, we will explore each method with practical examples.
📌 Example 1: Merging DataFrames
The merge()
function is useful when combining datasets based on a shared column (like SQL joins).
import pandas as pd
# Creating two DataFrames
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Score': [85, 90, 78]})
# Merging on 'ID' column
df_merged = pd.merge(df1, df2, on='ID', how='inner')
print(df_merged)
✅ Output:
ID Name Score
0 1 Alice 85
1 2 Bob 90
Here, an inner join is performed, returning only matching IDs. You can use how='left'
, how='right'
, or how='outer'
for different types of joins.
📌 Example 2: Joining DataFrames on Index
The join()
method works similarly to merge()
, but it joins DataFrames based on their index values.
# Creating DataFrames with custom index
df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']}, index=[1, 2, 3])
df2 = pd.DataFrame({'Score': [85, 90, 78]}, index=[1, 2, 4])
# Joining DataFrames
df_joined = df1.join(df2, how='inner')
print(df_joined)
✅ Output:
Name Score
1 Alice 85
2 Bob 90
Here, join()
combines the DataFrames based on their index values.
📌 Example 3: Concatenating DataFrames
The concat()
function allows you to stack DataFrames along rows (default) or columns.
# Creating two DataFrames
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [78, 92]})
# Concatenating DataFrames
df_concat = pd.concat([df1, df2], ignore_index=True)
print(df_concat)
✅ Output:
Name Score
0 Alice 85
1 Bob 90
2 Charlie 78
3 David 92
By setting ignore_index=True
, we reset the index after concatenation.
📌 Summary
🔹 Merging allows you to combine datasets based on shared columns with different join types. 🔹 Joining is a similar approach that merges data based on index values. 🔹 Concatenation stacks datasets vertically or horizontally.
Mastering these techniques in Pandas will enable you to efficiently handle and manipulate structured data for analysis. 🚀