4.2. Data Manipulation with Pandas – Grouping and Aggregations

🔍 Introduction

Pandas is a powerful Python library for data manipulation and analysis. One of its key techniques for summarizing data is grouping and aggregation. The groupby() method splits a DataFrame into groups based on the values in one or more columns, and aggregation functions such as sum, mean, and count then reduce each group to a summary value. This lets you answer questions like "what are the total sales per category?" without writing explicit loops, which is especially useful on large datasets.

In this tutorial, we will explore how to use grouping and aggregation in Pandas with two practical examples:

  1. 📌 Grouping by a single column and applying aggregation.
  2. 📌 Grouping by multiple columns and using different aggregation functions.

📌 Example 1: Grouping by a Single Column

Let’s consider a dataset of sales data:

import pandas as pd

# Creating a DataFrame
data = {'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Furniture'],
        'Sales': [200, 150, 300, 100, 250]}
df = pd.DataFrame(data)

# Grouping by Category and summing sales
df_grouped = df.groupby('Category')['Sales'].sum()
print(df_grouped)

✅ Output:

Category
Clothing       250
Electronics    500
Furniture      250
Name: Sales, dtype: int64

Here, the groupby() method groups the rows by ‘Category’, and sum() adds up the ‘Sales’ values within each group. The result is a Series indexed by category, with the categories sorted alphabetically by default.
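
If you would rather get a regular DataFrame back instead of a Series indexed by ‘Category’, one option is to pass as_index=False. The sketch below rebuilds the same sales data and also cross-checks one group total by hand with a boolean filter. The variable names totals and electronics_total are only illustrative.

```python
import pandas as pd

# Same sales data as in Example 1
df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Furniture'],
    'Sales': [200, 150, 300, 100, 250],
})

# as_index=False keeps 'Category' as an ordinary column instead of the index
totals = df.groupby('Category', as_index=False)['Sales'].sum()
print(totals)

# Cross-check one group by hand: filter the rows, then sum them
electronics_total = df.loc[df['Category'] == 'Electronics', 'Sales'].sum()
print(electronics_total)
```

The manual filter gives the same number that groupby() reports for Electronics, which is a quick way to convince yourself what the grouped sum is doing.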

📌 Example 2: Grouping by Multiple Columns with Different Aggregations

Now, let’s expand our dataset and apply multiple aggregation functions:

# Creating an extended dataset
data = {'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Furniture', 'Furniture'],
        'Region': ['North', 'South', 'North', 'East', 'West', 'North'],
        'Sales': [200, 150, 300, 100, 250, 300]}
df = pd.DataFrame(data)

# Grouping by Category and Region with multiple aggregations
df_grouped = df.groupby(['Category', 'Region']).agg({'Sales': ['sum', 'mean', 'count']})
print(df_grouped)

✅ Output:

                      Sales              
                        sum  mean count
Category    Region                      
Clothing    East       100  100.0     1
            South      150  150.0     1
Electronics North      500  250.0     2
Furniture   North      300  300.0     1
            West       250  250.0     1

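Here, we grouped by both ‘Category’ and ‘Region’, so each unique pair forms one group. The two Electronics/North rows are combined, which is why that group has a count of 2. Three aggregation functions were applied to ‘Sales’:

  • sum(): Total sales per group
  • mean(): Average sales per group
  • count(): Number of non-null entries per group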
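
The nested column header in the output above (Sales on top, then sum, mean, and count) can be awkward to work with afterwards. A possible alternative is named aggregation, available in pandas 0.25 and later, which lets you choose flat column names yourself. The names total_sales, avg_sales, and num_sales below are illustrative choices, not part of the original example.

```python
import pandas as pd

# Same extended dataset as in Example 2
df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Furniture', 'Furniture'],
    'Region': ['North', 'South', 'North', 'East', 'West', 'North'],
    'Sales': [200, 150, 300, 100, 250, 300],
})

# Named aggregation: new_column=(source_column, aggregation)
summary = (
    df.groupby(['Category', 'Region'])
      .agg(
          total_sales=('Sales', 'sum'),
          avg_sales=('Sales', 'mean'),
          num_sales=('Sales', 'count'),
      )
      .reset_index()  # turn the group keys back into ordinary columns
)
print(summary)
```

The result holds the same numbers as before, but with single-level column names that are easier to select, sort, or export.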

📌 Summary

🔹 Grouping and aggregation help summarize and analyze large datasets efficiently.
🔹 The groupby() method splits data into groups based on the values in one or more columns.
🔹 Aggregation functions like sum(), mean(), and count() reduce each group to a summary value.

Once you are comfortable with groupby() and agg(), most summary questions about structured data (totals per category, averages per region, counts per group) come down to a line or two of code. 🚀
