📊 Data Manipulation with Pandas: Grouping and Aggregations
🔍 Introduction
Pandas is a powerful Python library for data manipulation and analysis. One of its key techniques for summarizing data is grouping and aggregation. The groupby() function splits a dataset into groups based on one or more keys, after which aggregation functions such as sum, mean, and count reduce each group to a summary value. This is especially useful with large datasets, where per-group summaries are far easier to interpret than raw rows.
In this tutorial, we will explore how to use grouping and aggregation in Pandas with two practical examples:
- 📌 Grouping by a single column and applying aggregation.
- 📌 Grouping by multiple columns and using different aggregation functions.
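Before the examples, the split-then-aggregate idea described above can be sketched by hand. This is only an illustration: the Team and Points columns and the variable names are hypothetical toy data, not part of the sales dataset used later. The loop does the split and apply steps explicitly, and the one-line groupby() call is expected to give the same totals.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({
    "Team": ["A", "B", "A", "B"],
    "Points": [10, 20, 30, 40],
})

# Split: iterating over a groupby object yields (key, sub-DataFrame) pairs
manual_totals = {}
for team, group in df.groupby("Team"):
    # Apply: aggregate each sub-DataFrame on its own
    manual_totals[team] = int(group["Points"].sum())

# Combine: groupby(...).sum() performs the same steps in a single call
totals = df.groupby("Team")["Points"].sum()

print(manual_totals)
print(totals.to_dict())
```

In practice the one-line form is preferable; the loop is shown only to make the individual steps visible.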
📌 Example 1: Grouping by a Single Column
Let’s consider a small sales dataset:
import pandas as pd
# Creating a DataFrame
data = {'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Furniture'],
        'Sales': [200, 150, 300, 100, 250]}
df = pd.DataFrame(data)
# Grouping by Category and summing sales
df_grouped = df.groupby('Category')['Sales'].sum()
print(df_grouped)
✅ Output:
Category
Clothing       250
Electronics    500
Furniture      250
Name: Sales, dtype: int64
Here, the groupby() function groups the data by ‘Category’ and calculates the total sales for each category using sum(). The result is a Series indexed by category.
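Because the grouped result uses Category as its index, it can be awkward to sort or export. As a minimal sketch, assuming the same sales data as in Example 1, passing as_index=False to groupby() keeps Category as an ordinary column. The names totals and ranked are illustrative only.

```python
import pandas as pd

# Same sales data as in Example 1
df = pd.DataFrame({
    "Category": ["Electronics", "Clothing", "Electronics", "Clothing", "Furniture"],
    "Sales": [200, 150, 300, 100, 250],
})

# as_index=False keeps 'Category' as a regular column instead of the index
totals = df.groupby("Category", as_index=False)["Sales"].sum()

# Sort so the highest-selling category comes first
ranked = totals.sort_values("Sales", ascending=False).reset_index(drop=True)
print(ranked)
```

Calling reset_index() on the Series from Example 1 should give an equivalent DataFrame; which form to use is mostly a matter of taste.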
📌 Example 2: Grouping by Multiple Columns with Different Aggregations
Now, let’s expand our dataset and apply multiple aggregation functions:
# Creating an extended dataset
data = {'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Furniture', 'Furniture'],
        'Region': ['North', 'South', 'North', 'East', 'West', 'North'],
        'Sales': [200, 150, 300, 100, 250, 300]}
df = pd.DataFrame(data)
# Grouping by Category and Region with multiple aggregations
df_grouped = df.groupby(['Category', 'Region']).agg({'Sales': ['sum', 'mean', 'count']})
print(df_grouped)
✅ Output:
                   Sales
                     sum   mean count
Category    Region
Clothing    East     100  100.0     1
            South    150  150.0     1
Electronics North    500  250.0     2
Furniture   North    300  300.0     1
            West     250  250.0     1
Here, we grouped by both ‘Category’ and ‘Region’ and applied multiple aggregation functions:
- sum(): Total sales per group
- mean(): Average sales per group
- count(): Number of entries per group
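The dictionary form of agg() used above produces two-level column labels (Sales / sum, mean, count). As a hedged alternative, the sketch below uses pandas named aggregation to produce flat column names instead. It assumes the same extended dataset; the output names total_sales, avg_sales, and n_orders are made up for illustration.

```python
import pandas as pd

# Same extended dataset as in Example 2
df = pd.DataFrame({
    "Category": ["Electronics", "Clothing", "Electronics", "Clothing", "Furniture", "Furniture"],
    "Region": ["North", "South", "North", "East", "West", "North"],
    "Sales": [200, 150, 300, 100, 250, 300],
})

# Named aggregation: new_column=(source_column, aggregation_function)
summary = (
    df.groupby(["Category", "Region"])
      .agg(
          total_sales=("Sales", "sum"),
          avg_sales=("Sales", "mean"),
          n_orders=("Sales", "count"),
      )
      .reset_index()  # turn the group keys back into regular columns
)
print(summary)
```

Flat column names like these are usually easier to reference in later steps than the nested labels, at the cost of spelling out each output column.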
📌 Summary
🔹 Grouping and aggregation help summarize and analyze large datasets efficiently.
🔹 The groupby() function allows you to split data into groups based on column values.
🔹 Aggregation functions like sum(), mean(), and count() can be applied to extract insights.
Mastering grouping and aggregation in Pandas makes it much easier to summarize and compare structured data in your analysis workflow. 🚀