Chapter 5 – Bar Charts and Histograms in Matplotlib
Bar charts and histograms are among the most commonly used visualizations in data analysis. Both represent numerical data using rectangular bars, but they serve slightly different purposes. In this chapter, we’ll explore how to create, customize, and interpret bar charts and histograms in Matplotlib step by step.
✅ Difference Between Bar Charts and Histograms
Before diving into code, it’s important to understand the conceptual difference between these two chart types:
- Bar Chart: Used for categorical data (e.g., sales by product, population by country). Each bar represents a category.
- Histogram: Used for numerical data distribution (e.g., ages of people, marks of students). Bars represent ranges (bins) instead of categories.
In short, bar charts compare categories, while histograms show how data is distributed across intervals.
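The same distinction shows up in the data preparation, before anything is drawn. The sketch below uses invented sample data (the purchases and ages lists are illustrative only): categories are counted one label at a time, while numbers must first be grouped into bins.

```python
from collections import Counter

import numpy as np

# Categorical data: each distinct label would become one bar.
purchases = ['Apples', 'Bananas', 'Apples', 'Cherries', 'Bananas', 'Apples']
category_counts = Counter(purchases)
print(category_counts)

# Numerical data: values are grouped into ranges (bins) before counting.
ages = [21, 22, 25, 31, 34, 38, 41, 45, 47, 52]
bin_counts, bin_edges = np.histogram(ages, bins=[20, 30, 40, 50, 60])
print(bin_counts)   # how many ages fall into each range
print(bin_edges)    # the boundaries of those ranges
```

The first result maps naturally onto plt.bar(), the second onto plt.hist().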
✅ Creating a Basic Bar Chart
Let’s start with a simple bar chart showing sales of different products.
import matplotlib.pyplot as plt
products = ['Apples', 'Bananas', 'Cherries', 'Dates', 'Elderberries']
sales = [50, 75, 30, 90, 60]
plt.bar(products, sales)
plt.title("Fruit Sales Report")
plt.xlabel("Product")
plt.ylabel("Units Sold")
plt.show()
This code creates a vertical bar chart where each bar’s height corresponds to the number of units sold. By default, Matplotlib uses solid blue bars, but you can easily change colors and styles.
✅ Customizing Bar Colors and Width
You can control bar colors, width, and edge style using additional parameters:
plt.bar(products, sales, color='skyblue', edgecolor='black', width=0.6)
Here’s what each argument does:
- color: changes the fill color of the bars.
- edgecolor: adds a border color.
- width: adjusts the bar thickness (default is 0.8).
Experimenting with these options helps you create cleaner and more professional charts for reports or dashboards.
✅ Adding Value Labels on Bars
To make your chart more informative, you can display the value on top of each bar:
bars = plt.bar(products, sales, color='lightgreen')
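for bar in bars:
    # Height of the bar is the value we want to print
    yval = bar.get_height()
    # Center the label horizontally and place it slightly above the bar
    plt.text(bar.get_x() + bar.get_width()/2, yval + 2, str(yval), ha='center', va='bottom')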
plt.show()
This loop iterates over each bar, retrieves its height using get_height(), and places the numeric label above it with plt.text(). This makes your chart self-explanatory even without referring to the y-axis.
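Recent Matplotlib releases also offer a shortcut for the same task. The sketch below assumes Matplotlib 3.4 or later, where bar_label() is available, and uses the object-oriented interface (fig, ax) instead of the plt functions used elsewhere in this chapter.

```python
import matplotlib.pyplot as plt

products = ['Apples', 'Bananas', 'Cherries', 'Dates', 'Elderberries']
sales = [50, 75, 30, 90, 60]

fig, ax = plt.subplots()
bars = ax.bar(products, sales, color='lightgreen')

# bar_label() adds one text label per bar; padding is the gap in points.
labels = ax.bar_label(bars, padding=2)

# Leave some headroom so the label on the tallest bar is not clipped.
ax.set_ylim(0, max(sales) * 1.15)
```

Call plt.show() afterwards to display the figure. On older Matplotlib versions, the manual loop with plt.text() remains the way to go.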
✅ Horizontal Bar Charts
Sometimes, horizontal bars look better — especially when category names are long. You can create one easily with barh() instead of bar():
plt.barh(products, sales, color='orange')
plt.title("Fruit Sales Report (Horizontal)")
plt.xlabel("Units Sold")
plt.ylabel("Product")
plt.show()
Horizontal charts are ideal when you have many categories or long labels that overlap in a vertical layout.
✅ Grouped Bar Charts
When you want to compare the same categories across different groups (e.g., sales in two different periods), use a grouped bar chart. Let’s compare fruit sales for two months:
import numpy as np
products = ['Apples', 'Bananas', 'Cherries', 'Dates', 'Elderberries']
sales_Jan = [50, 75, 30, 90, 60]
sales_Feb = [55, 80, 35, 100, 70]
x = np.arange(len(products))
width = 0.35
plt.bar(x - width/2, sales_Jan, width, label='January', color='skyblue')
plt.bar(x + width/2, sales_Feb, width, label='February', color='lightcoral')
plt.xlabel("Product")
plt.ylabel("Units Sold")
plt.title("Monthly Sales Comparison")
plt.xticks(x, products)
plt.legend()
plt.show()
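This method offsets the bars on the x-axis so that they sit side by side for each category. The width variable sets the thickness of each bar, and shifting the two series by width/2 in opposite directions centers each pair on its tick. xticks() then puts the category labels back on the x-axis.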
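The offset arithmetic generalizes to more than two series. The helper below is a hypothetical function written for this illustration (group_offsets is not part of Matplotlib); it returns the shift for each series so that the whole group stays centered on its tick.

```python
import numpy as np


def group_offsets(n_series, bar_width):
    """Return one x-offset per series, centered around zero."""
    return (np.arange(n_series) - (n_series - 1) / 2) * bar_width


two = group_offsets(2, 0.35)    # same shifts as x - width/2 and x + width/2
three = group_offsets(3, 0.25)  # left, center, right
print(two)
print(three)
```

Each series would then be drawn with plt.bar(x + offset, values, bar_width). Keeping n_series * bar_width below 1 leaves a gap between neighboring groups.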
✅ Stacked Bar Charts
A stacked bar chart is useful when you want to show both total and part-wise contribution in one view.
plt.bar(products, sales_Jan, color='lightblue', label='January')
plt.bar(products, sales_Feb, bottom=sales_Jan, color='salmon', label='February')
plt.title("Stacked Sales Comparison")
plt.xlabel("Product")
plt.ylabel("Units Sold")
plt.legend()
plt.show()
In this case, February’s bars are stacked on top of January’s. The bottom parameter tells Matplotlib where each new bar should start, creating a cumulative effect.
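With more than two layers, each new layer has to start at the sum of all layers below it. One way to sketch this is a running total kept in a NumPy array; the dictionary layout and variable names here are choices made for this example, not something Matplotlib requires.

```python
import matplotlib.pyplot as plt
import numpy as np

products = ['Apples', 'Bananas', 'Cherries', 'Dates', 'Elderberries']
monthly_sales = {
    'January': [50, 75, 30, 90, 60],
    'February': [55, 80, 35, 100, 70],
}

fig, ax = plt.subplots()
running_total = np.zeros(len(products))

for month, values in monthly_sales.items():
    # Each layer starts where the previous layers ended.
    ax.bar(products, values, bottom=running_total, label=month)
    running_total = running_total + np.array(values)

ax.set_title("Stacked Sales Comparison")
ax.legend()
```

After the loop, running_total holds the full height of each stack, which is handy for placing total labels. Adding another month only requires another dictionary entry.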
✅ Changing Bar Order and Orientation
Sometimes, you may want to reorder bars by value for a cleaner presentation. Python makes this easy:
sorted_pairs = sorted(zip(sales, products))
sales_sorted, products_sorted = zip(*sorted_pairs)
plt.barh(products_sorted, sales_sorted, color='limegreen')
plt.title("Sorted Fruit Sales")
plt.show()
This simple technique helps viewers quickly identify which categories performed best or worst.
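Two details are worth knowing here. sorted(zip(sales, products)) sorts in ascending order and falls back to comparing product names when two sales values are equal, and barh() draws the first item at the bottom, so ascending data puts the largest bar on top. For a vertical chart, descending order usually reads better. A small sketch with an explicit sort key:

```python
products = ['Apples', 'Bananas', 'Cherries', 'Dates', 'Elderberries']
sales = [50, 75, 30, 90, 60]

# Sort (product, sales) pairs by the sales value only, largest first.
pairs = sorted(zip(products, sales), key=lambda pair: pair[1], reverse=True)
products_desc, sales_desc = zip(*pairs)

print(products_desc)
print(sales_desc)
```

Passing products_desc and sales_desc to plt.bar() would place the best-selling product at the left edge.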
✅ Adding Custom Colors for Each Bar
If you want to give each bar a different color, you can pass a list of colors:
colors = ['red', 'yellow', 'pink', 'blue', 'green']
plt.bar(products, sales, color=colors)
plt.show()
This is particularly useful when each bar represents a different group or when you want to highlight specific bars.
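To highlight specific bars, the color list can be built from the data instead of typed by hand. This sketch emphasizes the best-selling product; the color names are arbitrary choices.

```python
import matplotlib.pyplot as plt

products = ['Apples', 'Bananas', 'Cherries', 'Dates', 'Elderberries']
sales = [50, 75, 30, 90, 60]

# One color per bar: the maximum stands out, everything else is muted.
best = max(sales)
colors = ['tomato' if value == best else 'lightgray' for value in sales]

fig, ax = plt.subplots()
ax.bar(products, sales, color=colors)
ax.set_title("Best-Selling Product Highlighted")
```

The same pattern works for any condition, such as values below a target.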
✅ Adding Gridlines and Axis Limits
Adding subtle gridlines improves readability. You can also control axis limits:
plt.bar(products, sales, color='teal')
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.ylim(0, 120)
plt.show()
axis='y' limits gridlines to horizontal lines only. The alpha value makes them semi-transparent for a clean look.
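With the default settings, gridlines are drawn on top of the bars. If that looks distracting, one option is set_axisbelow(True), shown here with the object-oriented interface as a minimal sketch.

```python
import matplotlib.pyplot as plt

products = ['Apples', 'Bananas', 'Cherries', 'Dates', 'Elderberries']
sales = [50, 75, 30, 90, 60]

fig, ax = plt.subplots()
ax.bar(products, sales, color='teal')
ax.grid(axis='y', linestyle='--', alpha=0.6)
ax.set_axisbelow(True)   # draw gridlines behind the bars
ax.set_ylim(0, 120)
```

Call plt.show() afterwards to display the figure.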
✅ Creating a Histogram
Histograms show how data values are distributed. Suppose you have test scores for 100 students:
import numpy as np
scores = np.random.normal(70, 10, 100)
plt.hist(scores, bins=10, color='lightblue', edgecolor='black')
plt.title("Distribution of Test Scores")
plt.xlabel("Score Range")
plt.ylabel("Number of Students")
plt.show()
Here’s how the histogram works:
- scores: the data (in this case, generated randomly with mean 70 and standard deviation 10).
- bins: divides the data range into 10 equal-width intervals.
- plt.hist(): automatically counts how many values fall into each bin and draws the bars.
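The counting step can be inspected on its own with np.histogram(), which returns the counts and bin edges without drawing anything. The seed below is an arbitrary choice so that the numbers are repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed, chosen arbitrarily
scores = rng.normal(70, 10, 100)

counts, edges = np.histogram(scores, bins=10)

# Ten bins need eleven edges; print each range with its count.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f}: {count}")
```

The counts always add up to the number of data points, because the default bins span from the smallest to the largest value.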
✅ Adjusting Bin Size and Style
The number of bins greatly affects the appearance and interpretation of a histogram. Try changing bins from 5 to 20 to see the difference:
plt.hist(scores, bins=20, color='lightcoral', edgecolor='black')
More bins show finer details, while fewer bins show broader patterns. Choose based on the level of detail you want to present.
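Besides an integer, bins also accepts explicit edges or the name of an automatic rule. The sketch below compares fixed 5-point bins with NumPy's 'auto' rule; the 40 to 100 range is an assumption about plausible scores, and values outside it would simply be left out of the first chart.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)       # fixed seed, chosen arbitrarily
scores = rng.normal(70, 10, 100)

# Explicit edges: 40, 45, ..., 100 gives twelve bins of width 5.
edges_fixed = np.arange(40, 101, 5)

# Edges that the 'auto' rule would pick for this data.
edges_auto = np.histogram_bin_edges(scores, bins='auto')

fig, (ax_fixed, ax_auto) = plt.subplots(1, 2, figsize=(10, 4))
ax_fixed.hist(scores, bins=edges_fixed, color='lightcoral', edgecolor='black')
ax_fixed.set_title("Fixed 5-point bins")
ax_auto.hist(scores, bins='auto', color='lightblue', edgecolor='black')
ax_auto.set_title("bins='auto'")
```

Fixed edges give round, readable boundaries, while an automatic rule adapts to the spread and size of the data.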
✅ Adding Density and Normalization
By default, histograms show frequency counts. If you want to display probability density instead, so that the total area of the bars equals 1, use the density=True parameter:
plt.hist(scores, bins=10, color='orange', density=True)
This is helpful when comparing two datasets with different sample sizes.
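A quick numerical check makes the normalization concrete: bar height times bin width, summed over all bins, comes out as 1 regardless of sample size. The two sample sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed, chosen arbitrarily
samples = {
    'small': rng.normal(70, 10, 100),
    'large': rng.normal(70, 10, 1000),
}

areas = {}
for name, sample in samples.items():
    density, edges = np.histogram(sample, bins=10, density=True)
    # Area of each bar = height (density) * width of its bin.
    areas[name] = float(np.sum(density * np.diff(edges)))
    print(name, round(areas[name], 6))
```

With raw counts, the larger sample would tower over the smaller one; with density, both are on the same scale.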
✅ Overlaying a Density Curve
You can overlay a smooth density curve (using NumPy) to make your histogram visually appealing and informative:
import numpy as np
count, bins, ignored = plt.hist(scores, bins=15, density=True, color='lightgreen', edgecolor='black')
plt.plot(bins, 1/(10 * np.sqrt(2 * np.pi)) * np.exp(- (bins - 70)**2 / (2 * 10**2)), linewidth=2, color='red')
plt.title("Histogram with Normal Distribution Curve")
plt.show()
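Here, we manually plot the normal distribution curve using the mean (70) and standard deviation (10) that were used to generate the data. Because the curve is evaluated only at the bin edges, it is drawn as a series of short straight segments. This is useful for visualizing how closely data follows a bell curve.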
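With real data, the true mean and standard deviation are usually unknown. A variation, sketched here under the assumption that a normal curve is a reasonable model, estimates both from the sample and evaluates the curve on a fine grid so that it looks smooth.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)       # fixed seed, chosen arbitrarily
scores = rng.normal(70, 10, 100)

# Estimate the parameters from the data instead of hard-coding 70 and 10.
mu = scores.mean()
sigma = scores.std(ddof=1)

# 200 evenly spaced points give a smooth curve.
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 200)
pdf = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

fig, ax = plt.subplots()
ax.hist(scores, bins=15, density=True, color='lightgreen', edgecolor='black')
ax.plot(x, pdf, linewidth=2, color='red')
ax.set_title("Histogram with Fitted Normal Curve")
```

The histogram must use density=True here; otherwise the bars show counts and the curve would appear flat along the bottom of the chart.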
✅ Comparing Multiple Histograms
When comparing two groups, such as male vs. female students’ scores, you can overlay their histograms on the same axes:
male_scores = np.random.normal(72, 8, 100)
female_scores = np.random.normal(68, 10, 100)
plt.hist(male_scores, bins=10, alpha=0.5, label='Male', color='blue')
plt.hist(female_scores, bins=10, alpha=0.5, label='Female', color='pink')
plt.legend()
plt.title("Score Distribution by Gender")
plt.show()
The alpha value adds transparency, allowing both distributions to remain visible when overlapping.
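One caveat: with bins=10 passed to each call separately, every dataset gets its own bin edges, so the bars of the two groups do not line up. A possible fix, sketched below, computes one set of edges from the combined data and reuses it for both calls.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)       # fixed seed, chosen arbitrarily
male_scores = rng.normal(72, 8, 100)
female_scores = rng.normal(68, 10, 100)

# One set of edges covering both groups.
combined = np.concatenate([male_scores, female_scores])
shared_edges = np.histogram_bin_edges(combined, bins=10)

fig, ax = plt.subplots()
male_counts, _, _ = ax.hist(male_scores, bins=shared_edges, alpha=0.5,
                            label='Male', color='blue')
female_counts, _, _ = ax.hist(female_scores, bins=shared_edges, alpha=0.5,
                              label='Female', color='pink')
ax.legend()
ax.set_title("Score Distribution by Gender (Shared Bins)")
```

Because both histograms now use identical intervals, bar heights can be compared bin by bin.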
✅ Styling Histograms
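You can use built-in style sheets to make your histograms more polished. Note that plt.style.use() changes the style for every plot created afterwards in the same session: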
plt.style.use('ggplot')
plt.hist(scores, bins=15, color='purple', edgecolor='white')
plt.title("Styled Histogram of Test Scores")
plt.xlabel("Score")
plt.ylabel("Frequency")
plt.show()
The ggplot style imitates the look of R’s ggplot2 library, with a gray background and white gridlines.
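If only one figure should be styled, plt.style.context() applies a style temporarily and restores the previous settings when the block ends. A minimal sketch:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)       # fixed seed, chosen arbitrarily
scores = rng.normal(70, 10, 100)

facecolor_before = plt.rcParams['axes.facecolor']

# The ggplot style is active only inside this block.
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.hist(scores, bins=15, color='purple', edgecolor='white')
    ax.set_title("Styled Histogram of Test Scores")
    ax.set_xlabel("Score")
    ax.set_ylabel("Frequency")

facecolor_after = plt.rcParams['axes.facecolor']
print(facecolor_before == facecolor_after)
```

To be safe, call plt.show() or fig.savefig() inside the with block so that the figure is rendered while the style is still active.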
✅ Recap
In this chapter, you learned how to create and customize bar charts and histograms using Matplotlib. Here’s what we covered:
- Creating vertical and horizontal bar charts
- Customizing colors, borders, and labels
- Building grouped and stacked bar charts
- Sorting and coloring bars dynamically
- Creating and styling histograms for data distribution
- Overlaying multiple histograms and density curves
Bar charts and histograms are fundamental in data visualization. They help you summarize data effectively, compare groups, and understand overall patterns. Once you master these, you’ll be able to represent almost any data story visually and clearly.
✅ What’s Next?
The next chapter, Chapter 6 – Pie Charts and Donut Charts, covers how to represent proportions, customize slices, add percentages, and create professional pie visuals using Matplotlib.
