Introduction
Pandas is a powerful data analysis library in Python, and its DataFrame is widely used for handling structured data. One of the key operations in Pandas is selecting specific columns and rows for analysis. This tutorial will cover different ways to select columns and rows in a Pandas DataFrame using various methods, including label-based indexing, integer-based indexing, and conditional selection.
1. Creating a Sample DataFrame
Before we dive into selecting columns and rows, let’s first create a sample DataFrame to work with.
import pandas as pd
# Creating a sample DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 30, 35, 40, 28],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
'Salary': [50000, 60000, 75000, 80000, 65000]
}
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age         City  Salary
0    Alice   25     New York   50000
1      Bob   30  Los Angeles   60000
2  Charlie   35      Chicago   75000
3    David   40      Houston   80000
4      Eve   28      Phoenix   65000
Now, let’s explore different ways to select specific columns and rows from this DataFrame.
2. Selecting Columns
You can select columns in Pandas using different methods.
Selecting a Single Column
Use the column name inside square brackets [] to extract a single column.
# Selecting a single column
df['Name']
Output:
0      Alice
1        Bob
2    Charlie
3      David
4        Eve
Name: Name, dtype: object
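Alternatively, you can use dot notation, provided the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method: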
# Selecting a single column using dot notation
df.Name
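Dot notation has limits that bracket notation does not. The sketch below uses a small hypothetical DataFrame (the names demo, 'First Name', and 'count' are made up for illustration) to show two cases where only brackets reach the column.

```python
import pandas as pd

# Hypothetical frame: 'count' collides with a DataFrame method,
# and 'First Name' is not a valid Python identifier.
demo = pd.DataFrame({
    'First Name': ['Alice', 'Bob'],
    'count': [3, 7],
    'Age': [25, 30],
})

# Bracket notation works for any column name.
first_names = demo['First Name']
counts = demo['count']

# Dot notation works for identifier-like names that do not
# shadow an existing attribute or method.
ages = demo.Age

# demo.count resolves to the DataFrame.count method, not the column.
print(callable(demo.count))  # True
print(counts.tolist())       # [3, 7]
```

For that reason, bracket notation is usually the safer default in reusable code.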
Selecting Multiple Columns
To select multiple columns, pass a list of column names inside [].
# Selecting multiple columns
df[['Name', 'Age']]
Output:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    David   40
4      Eve   28
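The two bracket forms return different types, which matters when later code expects one or the other. This is a minimal sketch on a trimmed copy of the sample data. The variable names as_series, as_frame, and subset are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

as_series = df['Age']          # single brackets -> Series
as_frame = df[['Age']]         # list with one name -> one-column DataFrame
subset = df[['City', 'Name']]  # columns come back in the order listed

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(list(subset.columns))      # ['City', 'Name']
```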
3. Selecting Rows
Rows can be selected by integer position, by index label, or by conditional filtering.
Selecting a Single Row by Index
Use .iloc[] for integer-based (positional) selection.
# Selecting the first row
df.iloc[0]
Output:
Name         Alice
Age             25
City      New York
Salary       50000
Name: 0, dtype: object
Selecting Multiple Rows by Index
You can pass a list of integer positions to .iloc[].
# Selecting multiple rows
df.iloc[[0, 2, 4]]
Output:
      Name  Age      City  Salary
0    Alice   25  New York   50000
2  Charlie   35   Chicago   75000
4      Eve   28   Phoenix   65000
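Besides single positions and lists, .iloc[] also accepts slices and a second argument for columns. The following sketch rebuilds the sample DataFrame and shows a few such forms. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [50000, 60000, 75000, 80000, 65000],
})

first_three = df.iloc[0:3]  # positions 0, 1, 2 (the end is exclusive)
last_row = df.iloc[-1]      # negative positions count from the end
block = df.iloc[1:3, 0:2]   # rows 1-2, columns 0-1 (Name and Age)

print(first_three['Name'].tolist())  # ['Alice', 'Bob', 'Charlie']
print(last_row['Name'])              # Eve
print(block.shape)                   # (2, 2)
```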
Selecting Rows by Label with .loc[]
.loc[] selects rows by index label rather than by integer position.
Example 1: Default Numeric Index (Same Result as .iloc[])
With the default integer index, labels and positions coincide, so .loc[0] and .iloc[0] return the same row.
# Selecting row by label (default index 0)
print(df.loc[0]) # Same as df.iloc[0]
Output:
Name         Alice
Age             25
City      New York
Salary       50000
Name: 0, dtype: object
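The match between .loc[0] and .iloc[0] holds only while labels and positions line up. As a small sketch, filtering the sample data keeps the original labels, so the two accessors diverge. The name older is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
})

# Filtering keeps the original labels: the result is indexed by 2 and 3.
older = df[df['Age'] > 30]

print(older.iloc[0]['Name'])  # Charlie (first row by position)
print(older.loc[2]['Name'])   # Charlie (row whose label is 2)

# Label 0 no longer exists, so .loc[0] raises KeyError here.
try:
    older.loc[0]
except KeyError:
    print('label 0 is not in the filtered index')
```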
Example 2: Custom Index Labels
# Build a separate DataFrame so the original df keeps its default index
df_labeled = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e'])
# Selecting row by custom label
print(df_labeled.loc['a'])
Output:
Name         Alice
Age             25
City      New York
Salary       50000
Name: a, dtype: object
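Label-based selection also supports slices and simultaneous row and column selection. One detail worth a quick sketch: label slices include both endpoints, unlike positional slices. The variable names below are illustrative.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [50000, 60000, 75000, 80000, 65000],
}
labeled = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e'])

# Label slices include BOTH endpoints; positional slices exclude the end.
by_label = labeled.loc['a':'c']  # rows a, b, c
by_position = labeled.iloc[0:2]  # rows a, b

# Rows and columns can be selected together by label.
cell = labeled.loc['b', 'City']
part = labeled.loc[['a', 'e'], ['Name', 'Salary']]

print(by_label.index.tolist())     # ['a', 'b', 'c']
print(by_position.index.tolist())  # ['a', 'b']
print(cell)                        # Los Angeles
```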
Example 3: Date Index (Time-Series)
# Again use a separate DataFrame, this time indexed by dates
df_dates = pd.DataFrame(data, index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05']))
# Selecting row by date label
print(df_dates.loc['2024-01-02'])
Output:
Name              Bob
Age                30
City      Los Angeles
Salary          60000
Name: 2024-01-02 00:00:00, dtype: object
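A date index also allows selecting ranges of dates with .loc[]. The sketch below assumes a daily index built with pd.date_range and uses a trimmed copy of the sample data. The names ts, window, and january are illustrative.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=5, freq='D')
ts = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [50000, 60000, 75000, 80000, 65000],
}, index=dates)

# Date-string slices are label slices, so both endpoints are included.
window = ts.loc['2024-01-02':'2024-01-04']

# A partial date string selects every row that falls inside that period.
january = ts.loc['2024-01']

print(window['Name'].tolist())  # ['Bob', 'Charlie', 'David']
print(len(january))             # 5
```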
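Selecting Rows Based on Conditions
You can filter rows with Boolean indexing: a comparison on a column produces a True/False mask, and indexing with that mask keeps only the rows where it is True. The examples below use the original df from section 1, with its default integer index.
Selecting Rows Where Age is Greater Than 30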
# Selecting rows where Age is greater than 30
df[df['Age'] > 30]
Output:
      Name  Age     City  Salary
2  Charlie   35  Chicago   75000
3    David   40  Houston   80000
Selecting Rows Where City is ‘New York’
# Selecting rows where City is 'New York'
df[df['City'] == 'New York']
Output:
    Name  Age      City  Salary
0  Alice   25  New York   50000
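Conditions can also be combined. As a brief sketch on the same sample data, the element-wise operators & and | join masks (each condition needs its own parentheses), .isin() tests membership, and .loc[] accepts a mask together with a column list. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [50000, 60000, 75000, 80000, 65000],
})

# Combine masks with & (and) or | (or); wrap each condition in parentheses.
senior_high = df[(df['Age'] > 30) & (df['Salary'] >= 80000)]

# isin() keeps rows whose value appears in the given list.
coastal = df[df['City'].isin(['New York', 'Los Angeles'])]

# .loc[] takes a Boolean mask for rows and a list of labels for columns.
under_30 = df.loc[df['Age'] < 30, ['Name', 'City']]

print(senior_high['Name'].tolist())  # ['David']
print(coastal['Name'].tolist())      # ['Alice', 'Bob']
print(under_30['Name'].tolist())     # ['Alice', 'Eve']
```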
Conclusion
Selecting specific columns and rows in a Pandas DataFrame is an essential skill for data analysis. You can use simple column selection, .iloc[] for index-based selection, .loc[] for label-based selection, and Boolean indexing for conditional selection. Mastering these techniques will allow you to efficiently analyze and manipulate your data in Python.
Key Takeaways:
- Use df['column'] or df.column for single-column selection.
- Use df[['col1', 'col2']] for selecting multiple columns.
- Use .iloc[] for selecting rows by integer position.
- Use .loc[] for selecting rows by label (custom or default index).
- Boolean indexing allows filtering rows based on conditions.
- Custom index labels and time-series indices enhance selection flexibility.