
📘 Chapter 8: Boolean Indexing and Masking in NumPy: Smart Filtering Made Simple

🧠 “Boolean indexing in NumPy is like having a magical magnifying glass: you only see the data you care about.”

Welcome to Chapter 8 of your NumPy learning series!

You’ve already mastered creating, reshaping, and manipulating arrays. Now, let’s move into one of the most powerful features of NumPy: Boolean indexing and masking.

This concept is a game-changer for:

  • Filtering data

  • Making condition-based selections

  • Applying vectorized logic without loops

Whether you’re cleaning data, extracting rows that meet a condition, or highlighting specific values, Boolean indexing helps you do it quickly, efficiently, and cleanly.


What You’ll Learn

  • How to create Boolean masks using conditions

  • How to select elements that meet one or more criteria

  • How to combine multiple conditions using logical operators

  • Real-life examples: filtering, cleaning, and subsetting data

Let’s begin!


✅ 1. What is Boolean Indexing?

Boolean indexing means selecting data based on whether a condition is True or False.

Let’s start simple:

import numpy as np

arr = np.array([10, 20, 30, 40, 50])
mask = arr > 30
print(mask)

 

Output:

[False False False  True  True]

 

Now apply the mask:

filtered = arr[mask]
print(filtered)  # Output: [40 50]

 

🎯 This is Boolean indexing in action: filtering without loops.
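
A mask is itself just an array, so it can be inspected like any other. The sketch below reuses the same `arr` and `mask` from the example above. The variable `n_matches` is only an illustrative name.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
mask = arr > 30

# A mask is an ordinary NumPy array of booleans with the same shape as arr.
print(mask.dtype)   # bool
print(mask.shape)   # (5,)

# True counts as 1, so summing the mask counts the matching elements.
n_matches = int(mask.sum())
print(n_matches)    # 2

# any() / all() answer "does at least one match?" and "do all match?".
print(mask.any(), mask.all())  # True False
```

Counting with `mask.sum()` is often handy as a quick sanity check before filtering.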


🧠 2. Creating Boolean Masks

You can use any comparison operator:

Operator   Description
>          Greater than
<          Less than
==         Equal to
!=         Not equal to
>=         Greater than or equal to
<=         Less than or equal to

📌 Examples

arr = np.array([5, 10, 15, 20, 25])

print(arr < 20)   # [ True  True  True False False]
print(arr == 10)  # [False  True False False False]
print(arr != 15)  # [ True  True False  True  True]
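
The comparisons above use a single number on the right-hand side, but the same operators also work between two arrays of the same shape. Here is a small sketch. The arrays `actual` and `target` are made-up data for illustration.

```python
import numpy as np

# Hypothetical measured values and the values we expected to see.
actual = np.array([5, 10, 15, 20, 25])
target = np.array([5, 12, 15, 18, 25])

# Comparisons are applied position by position.
matches = actual == target
print(matches)          # [ True False  True False  True]

exceeds = actual > target
print(exceeds)          # [False False False  True False]
```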

 


🧊 3. Selecting Elements That Meet Criteria

Use a Boolean mask directly inside the array’s brackets:

arr = np.array([12, 23, 34, 45, 56])

# Select values > 30
print(arr[arr > 30])  # [34 45 56]

# Select even numbers
print(arr[arr % 2 == 0])  # [12 34 56]

In practice, this is useful for:

  • Selecting rows with high sales

  • Removing missing data

  • Filtering scores above average
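
Two of the uses listed above can be sketched in a few lines. The `sales` figures and the threshold of 300 are assumptions made up for this example. `np.isnan` is the standard NumPy function for detecting missing (NaN) values.

```python
import numpy as np

# Hypothetical sales figures; np.nan marks a missing entry.
sales = np.array([120.0, np.nan, 340.0, 90.0, np.nan, 510.0])

# np.isnan builds a mask of the missing entries; ~ inverts it.
clean = sales[~np.isnan(sales)]
print(clean)  # [120. 340.  90. 510.]

# "High sales" filter on the cleaned data (the threshold is an assumption).
high = clean[clean > 300]
print(high)   # [340. 510.]
```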


🔗 4. Combining Multiple Conditions

Just like in logic, you can combine multiple conditions using:

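Operator   Description
&          AND
|          OR
~          NOT

Wrap each condition in parentheses! The & and | operators bind more tightly than comparisons such as > or ==, so unparenthesized conditions are grouped incorrectly.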

🔸 Example: AND

arr = np.array([10, 15, 20, 25, 30])

# Select values between 15 and 30
filtered = arr[(arr >= 15) & (arr <= 30)]
print(filtered)  # [15 20 25 30]

 

🔸 Example: OR

print(arr[(arr < 15) | (arr > 25)])  # [10 30]

 

🔸 Example: NOT

print(arr[~(arr == 20)]) # [10 15 25 30]
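
To see why the parentheses matter, here is a minimal sketch of what happens without them. The names `error_message` and `and_failed` exist only for this demonstration.

```python
import numpy as np

arr = np.array([10, 15, 20, 25, 30])

# Without parentheses, & binds tighter than >= and <=, so Python reads this as
# arr >= (15 & arr) <= 30, a chained comparison that needs a single True/False.
try:
    arr[arr >= 15 & arr <= 30]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message is not None)  # True

# Python's `and` fails for the same reason: it asks an array for one truth value.
try:
    (arr >= 15) and (arr <= 30)
    and_failed = False
except ValueError:
    and_failed = True

print(and_failed)  # True
```

Both attempts raise a ValueError, which is why element-wise conditions should be written as `(arr >= 15) & (arr <= 30)`.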

🧾 5. Boolean Indexing with 2D Arrays

Let’s take a matrix:

matrix = np.array([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90]
])

# Select values > 50 (a full-size mask returns a flattened 1D array)
print(matrix[matrix > 50])  # [60 70 80 90]
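
A mask does not have to match the whole matrix. As a sketch, a 1D mask built from a single column selects entire rows instead of individual values. Filtering on the first column with a threshold of 30 is an arbitrary choice for illustration.

```python
import numpy as np

matrix = np.array([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90]
])

# A full-size mask returns the matching values as a 1D array.
flat = matrix[matrix > 50]
print(flat.shape)  # (4,)

# A 1D mask built from one column selects whole rows and keeps them 2D.
rows = matrix[matrix[:, 0] > 30]
print(rows)
# [[40 50 60]
#  [70 80 90]]
```

Row selection like this is the usual pattern when each row is a record and each column is a field.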

 


6. Updating Elements Based on Conditions

You can also modify values using Boolean masks:

arr = np.array([1, 2, 3, 4, 5])
arr[arr > 3] = 99
print(arr)  # [ 1  2  3 99 99]

Useful for:

  • Replacing outliers

  • Censoring negative numbers

  • Standardizing data ranges
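
The first two uses above might look like the following sketch. The `readings` data and the upper limit of 100 are assumptions for illustration. Working on a copy keeps the raw data available for comparison.

```python
import numpy as np

# Hypothetical sensor readings with negatives and one large outlier.
readings = np.array([-5, 12, 250, 40, -1, 99])

cleaned = readings.copy()      # keep the raw data untouched
cleaned[cleaned < 0] = 0       # censor negative numbers
cleaned[cleaned > 100] = 100   # cap outliers at an assumed upper limit

print(cleaned)   # [  0  12 100  40   0  99]
print(readings)  # [ -5  12 250  40  -1  99]
```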


💡 7. Where to Use Boolean Indexing in Real Life?

Application        Use Case
Data Cleaning      Remove NaNs or negative values
Image Processing   Mask pixels below a threshold
Finance            Select trades > ₹10,000
Machine Learning   Extract samples with a target class
Health Data        Patients with high BP or low sugar

8. Bonus: Using np.where()

Sometimes you don’t just want to filter; you want to choose between two values based on a condition.

Syntax:

np.where(condition, value_if_true, value_if_false)

Example:

arr = np.array([10, 20, 30, 40])
new_arr = np.where(arr > 25, 1, 0)
print(new_arr)  # [0 0 1 1]

 

This is especially useful in label encoding, binarizing data, or applying simple transformations.
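
The two replacement values do not have to be plain numbers. As a brief sketch, they can also be arrays or strings. The placeholder `-1` and the labels "high" and "low" are arbitrary choices for illustration.

```python
import numpy as np

arr = np.array([10, 20, 30, 40])

# Pass the array itself as the "true" value to keep matching elements
# and replace everything else with a placeholder.
kept = np.where(arr > 25, arr, -1)
print(kept)    # [-1 -1 30 40]

# The values can also be strings, which gives a simple labeling step.
labels = np.where(arr > 25, "high", "low")
print(labels)  # ['low' 'low' 'high' 'high']
```

Unlike `arr[arr > 25]`, the result always has the same shape as the input.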


🧪 Real-World Use Case: Filter Students Above Average

scores = np.array([55, 70, 90, 60, 85])
avg = np.mean(scores)

above_avg = scores[scores > avg]
print("Above Average Scores:", above_avg)

You can also extract their indices:

indices = np.where(scores > avg)
print("Indices:", indices)  # (array([2, 4]),)  since the average is 72.0
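
With one argument, `np.where` returns a tuple of index arrays. If a plain array of positions is more convenient, `np.flatnonzero` is one option, and the same mask can filter a parallel array. The student `names` below are invented for this sketch.

```python
import numpy as np

scores = np.array([55, 70, 90, 60, 85])
avg = scores.mean()  # 72.0

# np.flatnonzero returns the positions of True values as a plain 1D array.
positions = np.flatnonzero(scores > avg)
print(positions)  # [2 4]

# Hypothetical names stored in the same order as the scores.
names = np.array(["Asha", "Ben", "Chen", "Dia", "Eli"])
top_students = names[scores > avg]
print(top_students)  # ['Chen' 'Eli']
```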

 


⚠️ Common Mistakes to Avoid

Mistake                                  Correction
Not using parentheses with & or |        Wrap each condition in parentheses, e.g. (a > 1) & (a < 5)
Confusing and / or with & / |            Use & and | for element-wise logic on arrays
Expecting shape preservation             Boolean indexing with a full-size mask returns a flattened 1D result
Modifying original array without copy    Use .copy() if you need the original unchanged
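
The last row deserves a closer look, because reading and writing through a mask behave differently. The sketch below illustrates this under the usual NumPy rules: selecting with a mask produces a new array, while assigning through a mask changes the original in place.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Selecting with a mask returns a new array, not a view of arr.
subset = arr[arr > 3]
subset[:] = 0
print(np.shares_memory(arr, subset))  # False
print(arr)  # [1 2 3 4 5]

# Assigning through a mask modifies arr itself.
arr[arr > 3] = 0
print(arr)  # [1 2 3 0 0]
```

So a filtered result is safe to edit, but masked assignment on data you still need should be done on a `.copy()`.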

📌 Summary Table: Boolean Indexing Essentials

Concept       Example                  Result
Create mask   arr > 20                 [False, False, True, ...]
Apply mask    arr[arr > 20]            Values > 20
AND           (a > 10) & (a < 50)      Values between 10 and 50
OR            (a == 0) | (a == 1)      Values equal to 0 or 1
NOT           ~(a < 5)                 Values ≥ 5
Replace       arr[arr < 0] = 0         Set negatives to zero
np.where()    np.where(a > 0, 1, 0)    Choose between two values by condition

🔚 Wrapping Up Chapter 8

And that’s a wrap! 🎉
You’ve just unlocked one of the most powerful tools in NumPy: Boolean indexing and masking.

This feature allows you to:

  • Filter smartly

  • Replace conditionally

  • Combine logic in a clean, vectorized way

You’ll reach for Boolean indexing in most NumPy projects, whether you’re analyzing data, building AI, or visualizing stats.


🔜 Coming Next in Chapter 9: Sorting and Searching in NumPy

We’ll explore:

  • Sorting arrays with np.sort(), argsort()

  • Finding values with searchsorted(), nonzero()

  • Real-world examples with data processing
