
3.2. Pandas DataFrame: Selecting Columns and Rows in Python (Complete Guide)

📚 Introduction

Pandas is a powerful data analysis library in Python, and its DataFrame is widely used for handling structured data. One of the key operations in Pandas is selecting specific columns and rows for analysis. This tutorial will cover different ways to select columns and rows in a Pandas DataFrame using various methods, including label-based indexing, integer-based indexing, and conditional selection.

๐Ÿ—‚๏ธ 1. Creating a Sample DataFrame

Before we dive into selecting columns and rows, let’s first create a sample DataFrame to work with.

import pandas as pd

# Creating a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [50000, 60000, 75000, 80000, 65000]
}

df = pd.DataFrame(data)
print(df)

✅ Output:

      Name  Age         City  Salary
0    Alice   25     New York   50000
1      Bob   30  Los Angeles   60000
2  Charlie   35      Chicago   75000
3    David   40      Houston   80000
4      Eve   28      Phoenix   65000
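The numbers 0 to 4 on the left of the output are row labels: pandas assigns a default RangeIndex when no index is given. A quick check on the same sample data makes the row and column labels explicit, which matters later when .iloc[] (positions) and .loc[] (labels) are compared:

```python
import pandas as pd

# Same sample data as above
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [50000, 60000, 75000, 80000, 65000]
}
df = pd.DataFrame(data)

# Row labels: the default RangeIndex pandas creates when no index is passed
print(df.index)          # RangeIndex(start=0, stop=5, step=1)

# Column labels, in the order of the dictionary keys
print(list(df.columns))  # ['Name', 'Age', 'City', 'Salary']

# (number of rows, number of columns)
print(df.shape)          # (5, 4)
```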

Now, let’s explore different ways to select specific columns and rows from this DataFrame.

📄 2. Selecting Columns

You can select columns with square brackets or, for simple column names, with dot notation.

📌 Selecting a Single Column

Use the column name inside square brackets [] to extract a single column.

# Selecting a single column
df['Name']

✅ Output:

0      Alice
1        Bob
2    Charlie
3      David
4        Eve
Name: Name, dtype: object

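Alternatively, you can use dot notation. It only works when the column name is a valid Python identifier (no spaces or special characters) and does not clash with an existing DataFrame attribute or method: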

# Selecting a single column using dot notation
df.Name
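Dot notation is a convenience with limits. The sketch below uses two hypothetical columns, 'First Name' and 'count', which are not part of the sample data, purely to show where it stops working and brackets are required:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    # Hypothetical columns, added only to illustrate the limits of dot notation
    'First Name': ['Alice', 'Bob'],
    'count': [1, 2],
})

# For a simple identifier, dot notation and brackets give the same Series
print(df.Name.equals(df['Name']))  # True

# 'First Name' contains a space, so only bracket access can reach it
print(df['First Name'].tolist())   # ['Alice', 'Bob']

# 'count' clashes with the DataFrame.count() method: df.count is the method
print(callable(df.count))          # True
print(df['count'].tolist())        # [1, 2]
```

Brackets work for every column name, which is why they are the safer default in scripts.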

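📌 Selecting Multiple Columns

To select multiple columns, pass a list of column names inside the brackets. The outer brackets do the indexing and the inner brackets form the list, so the result is a DataFrame rather than a Series.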

# Selecting multiple columns
df[['Name', 'Age']]

✅ Output:

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    David   40
4      Eve   28
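The return type depends on what goes inside the brackets: a single label gives a Series, while a list of labels gives a DataFrame, even when the list holds only one name. A short check on the same sample data:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [50000, 60000, 75000, 80000, 65000]
}
df = pd.DataFrame(data)

single = df['Name']         # one label -> Series
double = df[['Name']]       # a list with one label -> DataFrame
pair = df[['Name', 'Age']]  # a list of labels -> DataFrame

print(type(single).__name__)  # Series
print(type(double).__name__)  # DataFrame
print(pair.shape)             # (5, 2)
```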

📊 3. Selecting Rows

Rows can be selected by integer position (.iloc[]), by index label (.loc[]), or by a condition (Boolean indexing).

📌 Selecting a Single Row by Position

Use .iloc[] to select rows by integer position. Positions start at 0 and do not depend on the index labels.

# Selecting the first row
df.iloc[0]

✅ Output:

Name         Alice
Age             25
City      New York
Salary       50000
Name: 0, dtype: object
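.iloc[] also accepts negative positions (counted from the end) and an optional second argument for column positions. A minimal sketch on the same sample data:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [50000, 60000, 75000, 80000, 65000]
}
df = pd.DataFrame(data)

# Negative positions count from the end: -1 is the last row
last_row = df.iloc[-1]
print(last_row['Name'])   # Eve

# Row position 0, column position 1 ('Age') returns a single value
print(df.iloc[0, 1])      # 25

# Rows at positions 0 and 2, columns at positions 0 ('Name') and 3 ('Salary')
subset = df.iloc[[0, 2], [0, 3]]
print(subset)
```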

📌 Selecting Multiple Rows by Position

Pass a list of integer positions to .iloc[] to select several rows at once.

# Selecting multiple rows
df.iloc[[0, 2, 4]]

✅ Output:

      Name  Age      City  Salary
0    Alice   25  New York   50000
2  Charlie   35   Chicago   75000
4      Eve   28   Phoenix   65000
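.iloc[] also accepts slices, which follow Python's usual list rule: the start position is included and the stop position is excluded. A short sketch on the same sample data:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [50000, 60000, 75000, 80000, 65000]
}
df = pd.DataFrame(data)

# Positions 0, 1 and 2; position 3 is excluded
first_three = df.iloc[0:3]
print(first_three['Name'].tolist())  # ['Alice', 'Bob', 'Charlie']

# A step of 2 picks every other row
every_other = df.iloc[::2]
print(every_other['Name'].tolist())  # ['Alice', 'Charlie', 'Eve']
```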

📌 Selecting Rows by Label with .loc[]

.loc[] selects rows by their index labels, whatever those labels are (integers, strings, or dates), rather than by their numerical positions.

🔹 Example 1: Default Numeric Index (Same Result as .iloc[])

# Selecting row by label (default index 0)
print(df.loc[0])  # Same as df.iloc[0] here, because the default labels match the positions

✅ Output:

Name         Alice
Age             25
City      New York
Salary       50000
Name: 0, dtype: object
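Even when labels and positions coincide, slicing behaves differently: a .loc[] slice includes its end label, while an .iloc[] slice excludes its stop position. .loc[] also takes column labels as a second argument. A sketch on the same sample data:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [50000, 60000, 75000, 80000, 65000]
}
df = pd.DataFrame(data)

# .loc slice: labels 1 through 3, end label included
by_label = df.loc[1:3, 'Name']
print(by_label.tolist())     # ['Bob', 'Charlie', 'David']

# .iloc slice: positions 1 and 2, stop position 3 excluded
by_position = df.iloc[1:3, 0]
print(by_position.tolist())  # ['Bob', 'Charlie']

# Row labels 0 and 4, with two named columns
picked = df.loc[[0, 4], ['Name', 'City']]
print(picked)
```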

🔹 Example 2: Custom Index Labels

df_custom = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e'])

# Selecting row by custom label (a new variable keeps the original df unchanged)
print(df_custom.loc['a'])

✅ Output:

Name         Alice
Age             25
City      New York
Salary       50000
Name: a, dtype: object
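With custom labels, .loc[] and .iloc[] diverge: the label 'a' and the position 0 reach the same row, but 0 is no longer a valid label. The sketch below keeps the custom-labeled frame under its own illustrative name, df_custom, and assumes a recent pandas (1.x or 2.x), where a missing label raises KeyError:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [50000, 60000, 75000, 80000, 65000]
}
df_custom = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e'])

# Label 'a' and position 0 point to the same row
print(df_custom.loc['a'].equals(df_custom.iloc[0]))  # True

# Label slices include the end label
print(df_custom.loc['b':'d', 'Name'].tolist())  # ['Bob', 'Charlie', 'David']

# 0 is a position, not a label, so .loc cannot find it
try:
    df_custom.loc[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True
print(label_lookup_failed)  # True
```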

🔹 Example 3: Date Index (Time Series)

df_dates = pd.DataFrame(data, index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05']))

# Selecting row by date label
print(df_dates.loc['2024-01-02'])

✅ Output:

Name              Bob
Age                30
City      Los Angeles
Salary          60000
Name: 2024-01-02 00:00:00, dtype: object
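A DatetimeIndex also supports slicing with date strings, inclusive at both ends, and partial strings such as a whole month. A sketch on the same data, stored under the illustrative name df_dates:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [50000, 60000, 75000, 80000, 65000]
}
df_dates = pd.DataFrame(
    data,
    index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03',
                          '2024-01-04', '2024-01-05']),
)

# Date-string slice: both endpoints are included
window = df_dates.loc['2024-01-02':'2024-01-04']
print(window['Name'].tolist())  # ['Bob', 'Charlie', 'David']

# A coarser string (a month) selects every row in that period
january = df_dates.loc['2024-01']
print(len(january))             # 5
```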

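📌 Selecting Rows Based on Conditions

You can filter rows with Boolean indexing: a comparison such as df['Age'] > 30 returns a Series of True/False values, one per row, and passing that Series inside df[] keeps only the rows marked True.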

📌 Selecting Rows Where Age is Greater Than 30

# Selecting rows where Age is greater than 30
df[df['Age'] > 30]

✅ Output:

      Name  Age     City  Salary
2  Charlie   35  Chicago   75000
3    David   40  Houston   80000
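Conditions can be combined with & (and) and | (or). Each comparison needs its own parentheses, because these operators bind more tightly than > or ==. .loc[] accepts the same mask together with a list of columns. A sketch on the same sample data:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [50000, 60000, 75000, 80000, 65000]
}
df = pd.DataFrame(data)

# Both conditions must hold
older_high_earners = df[(df['Age'] > 30) & (df['Salary'] > 75000)]
print(older_high_earners['Name'].tolist())  # ['David']

# Either condition may hold
young_or_top = df[(df['Age'] < 28) | (df['Salary'] > 75000)]
print(young_or_top['Name'].tolist())        # ['Alice', 'David']

# Boolean mask for rows, list of labels for columns
names_and_salaries = df.loc[df['Age'] > 30, ['Name', 'Salary']]
print(names_and_salaries)
```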

📌 Selecting Rows Where City is ‘New York’

# Selecting rows where City is 'New York'
df[df['City'] == 'New York']

✅ Output:

    Name  Age      City  Salary
0  Alice   25  New York   50000
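To match any of several values, .isin() builds the Boolean mask, and ~ inverts any mask. A short sketch on the same sample data:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [50000, 60000, 75000, 80000, 65000]
}
df = pd.DataFrame(data)

# Rows whose City is any of the listed values
selected = df[df['City'].isin(['New York', 'Chicago'])]
print(selected['Name'].tolist())  # ['Alice', 'Charlie']

# ~ flips True/False: every row except New York
not_ny = df[~(df['City'] == 'New York')]
print(len(not_ny))                # 4
```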

🎯 Conclusion

Selecting specific columns and rows in a Pandas DataFrame is an essential skill for data analysis. You can use bracket or dot notation for columns, .iloc[] for position-based selection, .loc[] for label-based selection, and Boolean indexing for conditional selection. Mastering these techniques lets you analyze and manipulate your data in Python efficiently.

📌 Key Takeaways:

  • Use df['column'] or df.column for single-column selection (dot notation only works for names that are valid identifiers and do not clash with DataFrame methods).
  • Use df[['col1', 'col2']] for selecting multiple columns.
  • Use .iloc[] for selecting rows by integer position.
  • Use .loc[] for selecting rows by label (custom or default index).
  • Boolean indexing allows filtering rows based on conditions.
  • Custom string labels and date (DatetimeIndex) labels let .loc[] select rows by meaningful keys such as IDs or dates.
