
8. Pandas Input and Output: Read & Write CSV, Excel, JSON, SQL, and Handle Large Datasets

Introduction

Input and output (I/O) operations are fundamental when working with data in Python. Whether you’re analyzing sales figures, sensor logs, or web traffic, the ability to seamlessly load and export data is crucial for effective data workflows. The Pandas library offers robust and flexible tools to handle various file formats like CSV, Excel, JSON, and SQL databases.

In this tutorial, you’ll learn how to:

  • Read data from different file formats into DataFrames

  • Write processed data back to those formats

  • Work with large datasets using memory-efficient techniques

We’ll walk through a read and a write example for each format so you can confidently integrate these operations into your data pipeline. By the end, you’ll be able to handle input and output tasks with ease, even when working with large files that don’t fit into memory.


1. Reading Data

Example 1: Read CSV

import pandas as pd

df_csv = pd.read_csv('data/products.csv')
print(df_csv.head())

Output (example):

   id   name  price
0   1  Chair     50
1   2   Desk    100
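
read_csv accepts many optional parameters that are worth knowing. The sketch below reuses the column names from the sample output above; the dtypes and missing-value markers are illustrative assumptions about the file, not facts about it:

# Illustrative read_csv options; dtypes and NA markers are assumptions about the file
df_products = pd.read_csv(
    'data/products.csv',
    sep=',',                           # column delimiter (use ';' or '\t' for other files)
    usecols=['id', 'name', 'price'],   # load only the columns you need
    dtype={'price': 'float64'},        # declare dtypes up front instead of re-casting later
    na_values=['N/A', ''],             # extra strings to treat as missing values
)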

Example 2: Read Excel

df_excel = pd.read_excel('data/sales.xlsx', sheet_name='Sheet1')
print(df_excel.head())

Output:

         Date Product  Sales
0  2025-01-01   Chair    200
1  2025-01-02    Desk    150
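
If the workbook contains several sheets, you can load them all at once: passing sheet_name=None returns a dictionary of DataFrames keyed by sheet name (reading .xlsx files also requires an Excel engine such as openpyxl to be installed). A minimal sketch:

# sheet_name=None loads every sheet into a dict of {sheet name: DataFrame}
all_sheets = pd.read_excel('data/sales.xlsx', sheet_name=None)
for name, sheet_df in all_sheets.items():
    print(name, sheet_df.shape)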

Example 3: Read JSON

df_json = pd.read_json('data/users.json')
print(df_json.head())

Output:

   id   name      email
0   1  Alice  a@xyz.com
1   2    Bob  b@xyz.com
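
Many APIs export newline-delimited JSON (one record per line) rather than a single JSON array. For such files, pass lines=True; the file name below is a hypothetical example:

# lines=True reads one JSON object per line (newline-delimited JSON)
df_jsonl = pd.read_json('data/users.jsonl', lines=True)
print(df_jsonl.head())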

Example 4: Read from SQL

import sqlite3

# Open a connection to the local SQLite database file
conn = sqlite3.connect('data/shop.db')
df_sql = pd.read_sql_query("SELECT * FROM orders", conn)
print(df_sql.head())

Output:

   order_id customer  amount
0         1    Alice     250
1         2      Bob     180
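
When a query depends on runtime values, pass them through the params argument rather than formatting them into the SQL string; SQLite uses ? placeholders. The amount threshold below is purely illustrative:

# Parameterized query: the driver substitutes the placeholder safely
df_big_orders = pd.read_sql_query(
    "SELECT * FROM orders WHERE amount > ?",
    conn,
    params=(100,),
)
print(df_big_orders.head())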

2. Writing Data

Example 1: Write to CSV

df_csv.to_csv('output/products_output.csv', index=False)

Creates: products_output.csv


Example 2: Write to Excel

df_excel.to_excel('output/sales_output.xlsx', index=False)

Creates: sales_output.xlsx


Example 3: Write to JSON

df_json.to_json('output/users_output.json', orient='records', lines=True)

Creates: users_output.json


Example 4: Write to SQL

df_sql.to_sql('orders_backup', conn, if_exists='replace', index=False)

Effect: the table orders_backup is created (or replaced, if it already exists) in shop.db
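
In practice it is worth verifying the write and releasing the connection when you are done. A minimal sketch:

# Optional sanity check, then close the database handle
print(pd.read_sql_query("SELECT COUNT(*) AS n FROM orders_backup", conn))
conn.close()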


3. Handling Large Datasets

Example 1: Read Large CSV in Chunks

# Process the file in blocks of 1,000 rows instead of loading it all at once
chunk_iter = pd.read_csv('data/large_sales.csv', chunksize=1000)

for chunk in chunk_iter:
    print(chunk.head(1))

Output: the first row of each chunk is printed; only one chunk is held in memory at a time.
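
Chunking pays off most when you combine it with aggregation, accumulating a result as each chunk is processed so the full file never sits in memory. A minimal sketch, assuming the file has a numeric sales column (as in the selective-loading example below):

total_sales = 0
for chunk in pd.read_csv('data/large_sales.csv', chunksize=1000):
    total_sales += chunk['sales'].sum()   # accumulate per chunk, then discard the chunk
print('Total sales:', total_sales)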


Example 2: Selective Column Loading

df_filtered = pd.read_csv('data/large_sales.csv', usecols=['product', 'sales'])
print(df_filtered.head())

Output:

  product  sales
0   Chair    200
1    Desk    150
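
The two techniques combine naturally: load only the columns you need and process them in chunks, merging partial results as you go. A sketch under the same column-name assumptions:

per_product = None
for chunk in pd.read_csv('data/large_sales.csv',
                         usecols=['product', 'sales'], chunksize=1000):
    grouped = chunk.groupby('product')['sales'].sum()
    # merge this chunk's totals into the running per-product totals
    per_product = grouped if per_product is None else per_product.add(grouped, fill_value=0)
print(per_product)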

Summary

Input and output operations are the building blocks of every data science project. With Pandas, you can effortlessly read and write data across CSV, Excel, JSON, and SQL formats. We also explored strategies to manage large datasets using chunking and column filtering — techniques that are essential for working in memory-constrained environments.

Now that you’re equipped with these powerful I/O tools, you can integrate them into any data pipeline, automate reporting tasks, or scale your analytics workflows efficiently.

 
