3.1. Pandas DataFrame: Creating DataFrame from Dictionary, CSV, Excel, and JSON

📖 Introduction

Pandas is a powerful data analysis and manipulation library for Python. One of its core structures is the DataFrame, which is a two-dimensional, tabular data structure similar to a spreadsheet or SQL table. In this tutorial, we will explore how to create a Pandas DataFrame from different data sources such as dictionaries, CSV files, Excel files, and JSON.

🗂️ 1. Creating a DataFrame from a Dictionary

A Python dictionary consists of key-value pairs. When you pass one to pd.DataFrame, each key becomes a column name and each value (here, a list) becomes that column's data. Here’s an example:

import pandas as pd

# Creating a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

# Creating DataFrame
df = pd.DataFrame(data)
print(df)

✅ Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
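
One detail worth checking in practice: when the dictionary values are lists, they need to be the same length, otherwise pandas raises a ValueError. The sketch below illustrates this with a deliberately uneven dictionary, and also shows that a list of row dictionaries is another accepted input. The names uneven, rows, and df_rows are made up for this illustration.

```python
import pandas as pd

# Hypothetical data: the 'Age' list is one item short on purpose.
uneven = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30],
}

try:
    pd.DataFrame(uneven)
    error_message = None
except ValueError as exc:
    # pandas refuses to build a DataFrame from columns of different lengths
    error_message = str(exc)

print(error_message)

# A list of dictionaries (one per row) also works as input.
rows = [
    {'Name': 'Alice', 'Age': 25},
    {'Name': 'Bob', 'Age': 30},
]
df_rows = pd.DataFrame(rows)
print(df_rows)
```

The dictionary-of-lists form is column-oriented, while the list-of-dictionaries form is row-oriented; which one is more convenient usually depends on how the data arrives.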

📄 2. Creating a DataFrame from a CSV File

CSV (Comma-Separated Values) files are commonly used to store tabular data. You can read a CSV file into a Pandas DataFrame using the pd.read_csv function.

📌 Sample CSV Data (data.csv):

Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago

# Reading a CSV file
df = pd.read_csv('data.csv')
print(df.head())  # Display first five rows

Ensure that the data.csv file is present in the working directory or provide the full file path.
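
To try this without depending on the current working directory, here is a minimal sketch that writes the sample data to a temporary folder and reads it back through an explicit path. The temporary directory and the names csv_text, csv_path, and df_csv are assumptions made for the example, not part of the original tutorial.

```python
import tempfile
from pathlib import Path

import pandas as pd

# The same sample data as data.csv above, kept inline so the example is self-contained.
csv_text = (
    "Name,Age,City\n"
    "Alice,25,New York\n"
    "Bob,30,Los Angeles\n"
    "Charlie,35,Chicago\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a full path instead of relying on the working directory.
    csv_path = Path(tmp_dir) / "data.csv"
    csv_path.write_text(csv_text, encoding="utf-8")

    # read_csv accepts a path object as well as a plain string.
    df_csv = pd.read_csv(csv_path)

print(df_csv.head())
print(df_csv.dtypes)  # Age is parsed as an integer column
```

Passing a full path (or a pathlib.Path) avoids "file not found" errors when the script is launched from a different folder.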

📊 3. Creating a DataFrame from an Excel File

Excel files are widely used in data analysis. Pandas provides read_excel to read Excel files into a DataFrame.

📌 Sample Excel Data (data.xlsx):

Name     Age  City
Alice    25   New York
Bob      30   Los Angeles
Charlie  35   Chicago

# Reading an Excel file
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
print(df.head())

Reading .xlsx files requires the openpyxl package: pip install openpyxl. The xlrd package is only needed for legacy .xls files: pip install xlrd.
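
If you do not have a data.xlsx file at hand, the following sketch creates one in a temporary folder and reads it back, naming the engine explicitly. It assumes openpyxl may or may not be installed, so it checks first and returns None when the package is missing. The helper name excel_round_trip is hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path

import pandas as pd


def excel_round_trip():
    """Write a small DataFrame to .xlsx and read it back.

    Returns None when openpyxl is not installed.
    """
    if importlib.util.find_spec("openpyxl") is None:
        return None

    frame = pd.DataFrame({
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago'],
    })

    with tempfile.TemporaryDirectory() as tmp_dir:
        xlsx_path = Path(tmp_dir) / "data.xlsx"
        # index=False keeps the row labels out of the spreadsheet.
        frame.to_excel(xlsx_path, sheet_name="Sheet1", index=False, engine="openpyxl")
        # Naming the engine makes the dependency explicit.
        return pd.read_excel(xlsx_path, sheet_name="Sheet1", engine="openpyxl")


df_excel = excel_round_trip()
print(df_excel if df_excel is not None else "openpyxl is not installed")
```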

🔗 4. Creating a DataFrame from a JSON File

JSON (JavaScript Object Notation) is a lightweight data format that is widely used in web applications. You can create a DataFrame from a JSON file using read_json.

📌 Sample JSON Data (data.json):

[
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30, "City": "Los Angeles"},
    {"Name": "Charlie", "Age": 35, "City": "Chicago"}
]

# Reading a JSON file
df = pd.read_json('data.json')
print(df.head())

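The file must contain valid JSON. With an array of objects like the sample above, each object becomes a row and each key becomes a column. Malformed JSON (for example, a missing bracket) makes read_json raise a ValueError.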
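
As a rough illustration of what well-formed versus malformed input looks like, the sketch below feeds read_json two in-memory strings through io.StringIO instead of a file. The variable names valid_json, broken_json, and parse_failed are made up for the example.

```python
from io import StringIO

import pandas as pd

# Well-formed JSON: an array of objects, one object per row.
valid_json = '[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]'
df_json = pd.read_json(StringIO(valid_json))
print(df_json)

# Malformed JSON: the array is never closed.
broken_json = '[{"Name": "Alice", "Age": 25},'
try:
    pd.read_json(StringIO(broken_json))
    parse_failed = False
except ValueError:
    # pandas reports unparseable JSON as a ValueError
    parse_failed = True

print(parse_failed)
```

Wrapping the call in try/except like this is one way to give a clearer message when a data file turns out to be truncated or hand-edited incorrectly.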

🎯 Conclusion

Pandas provides multiple ways to create a DataFrame from different data sources, making it a flexible tool for data analysis. Whether working with dictionaries, CSV, Excel, or JSON files, Pandas makes data manipulation easy and efficient. By mastering these techniques, you can seamlessly work with various data formats in Python.
