Intermediate

Data Visualization

A great visualization is worth a thousand data points. Learn to create clear, compelling charts with Matplotlib, Seaborn, and Plotly that communicate insights effectively.

Why Visualization Matters

Visualization transforms raw numbers into visual patterns that humans can quickly understand. Good visualizations help you:

  • Discover patterns that are invisible in raw data tables
  • Communicate findings to stakeholders who may not understand statistics
  • Validate assumptions before and after building models
  • Tell a story that drives action and decision-making
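
To illustrate the first point, here is a minimal sketch using two of the four datasets from Anscombe's quartet, a well-known teaching example that is not part of the original lesson. Both datasets have nearly the same mean and correlation, yet a scatter plot shows that one is roughly linear and the other follows a curve. The variable names are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

# Datasets I and II from Anscombe's quartet (they share the same x values).
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5])
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

# The summary statistics are almost indistinguishable...
for name, y in [('Dataset I', y1), ('Dataset II', y2)]:
    r = np.corrcoef(x, y)[0, 1]
    print(f'{name}: mean={y.mean():.2f}, correlation={r:.2f}')

# ...but side-by-side scatter plots reveal two very different shapes.
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
axes[0].scatter(x, y1)
axes[0].set_title('Dataset I')
axes[1].scatter(x, y2)
axes[1].set_title('Dataset II')
for ax in axes:
    ax.set_xlabel('x')
axes[0].set_ylabel('y')
fig.tight_layout()
```

Call plt.show() (or fig.savefig with a file name of your choice) to view the result.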

Choosing the Right Chart

Chart Type   | Best For                            | Example Use
Bar Chart    | Comparing categories                | Sales by region, counts by category
Line Chart   | Trends over time                    | Revenue over months, stock prices
Scatter Plot | Relationships between two variables | Height vs weight, price vs demand
Histogram    | Distribution of one variable        | Age distribution, exam scores
Heatmap      | Correlations or matrix data         | Feature correlations, time-based patterns
Box Plot     | Distribution and outliers           | Salary ranges by department
Pie Chart    | Parts of a whole (use sparingly)    | Market share, budget allocation

Matplotlib — The Foundation

Matplotlib is Python's most widely used plotting library. It gives you full control over every element of your chart.

Python
import matplotlib.pyplot as plt
import numpy as np

# Bar chart
categories = ['Q1', 'Q2', 'Q3', 'Q4']
revenue = [45000, 52000, 48000, 61000]

plt.figure(figsize=(8, 5))
plt.bar(categories, revenue, color='#4f46e5')
plt.title('Quarterly Revenue', fontsize=14)
plt.xlabel('Quarter')
plt.ylabel('Revenue ($)')
plt.tight_layout()
plt.show()

# Line chart with multiple series
months = np.arange(1, 13)
product_a = [20, 25, 30, 28, 35, 40, 38, 45, 42, 50, 55, 60]
product_b = [15, 18, 22, 20, 25, 28, 30, 35, 33, 38, 40, 45]

plt.figure(figsize=(10, 5))
plt.plot(months, product_a, marker='o', label='Product A')
plt.plot(months, product_b, marker='s', label='Product B')
plt.title('Monthly Sales Comparison')
plt.xlabel('Month')
plt.ylabel('Units Sold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
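
The examples above use the pyplot interface, where each call acts on the current figure. Matplotlib also offers an object-oriented interface through plt.subplots(), which tends to be easier to manage once a figure has several panels. The sketch below uses it to draw the histogram and scatter plot listed in the chart table. The data is synthetic and the variable names are illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration only (fixed seed for reproducibility).
rng = np.random.default_rng(42)
scores = rng.normal(loc=70, scale=10, size=200)        # exam scores
hours = rng.uniform(0, 10, size=200)                   # hours studied
results = 50 + 4 * hours + rng.normal(0, 5, size=200)  # noisy linear relation

# One figure, two axes: each Axes object is styled independently.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(11, 4))

ax_hist.hist(scores, bins=20, color='#4f46e5', edgecolor='white')
ax_hist.set_title('Exam Score Distribution')
ax_hist.set_xlabel('Score')
ax_hist.set_ylabel('Number of Students')

ax_scatter.scatter(hours, results, alpha=0.6, color='#4f46e5')
ax_scatter.set_title('Study Hours vs Result')
ax_scatter.set_xlabel('Hours Studied')
ax_scatter.set_ylabel('Result')

fig.tight_layout()
```

As with the earlier examples, plt.show() displays the figure.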

Seaborn — Statistical Visualization

Seaborn builds on Matplotlib with beautiful defaults and specialized statistical plots. It integrates directly with Pandas DataFrames.

Python
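import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to be a pandas DataFrame with 'department',
# 'experience', and 'salary' columns.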

# Distribution plot
sns.histplot(df['salary'], bins=30, kde=True)
plt.title('Salary Distribution')
plt.show()

# Box plot by category
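# Passing hue with legend=False avoids the palette deprecation warning (seaborn 0.13+)
sns.boxplot(x='department', y='salary', hue='department', data=df,
            palette='Set2', legend=False)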
plt.title('Salary by Department')
plt.xticks(rotation=45)
plt.show()

# Heatmap of correlations
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(numeric_only=True),
            annot=True, cmap='RdYlBu_r', center=0)
plt.title('Feature Correlations')
plt.show()

# Scatter with regression line
sns.regplot(x='experience', y='salary', data=df)
plt.title('Experience vs Salary')
plt.show()
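
The Seaborn snippets above assume an existing DataFrame named df with department, experience, and salary columns. To try them without a real dataset, one option is to generate a small synthetic one. Every value below, including the department names and the salary model, is made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the data is reproducible
n = 300

departments = rng.choice(['Engineering', 'Sales', 'Support'], size=n)
experience = rng.integers(0, 21, size=n)  # years of experience, 0 to 20

# Hypothetical salary model: base pay per department + raise per year + noise.
base = pd.Series(departments).map(
    {'Engineering': 70000, 'Sales': 55000, 'Support': 45000}
)
salary = base + 2500 * experience + rng.normal(0, 8000, size=n)

df = pd.DataFrame({
    'department': departments,
    'experience': experience,
    'salary': salary.round(0),
})

print(df.head())
print(df.corr(numeric_only=True))
```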

Plotly — Interactive Charts

Plotly creates interactive, web-based visualizations. Users can zoom, hover for details, and toggle data series. It is ideal for dashboards and presentations.

Python
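import plotly.express as px

# df is a placeholder: each example below assumes a pandas DataFrame
# containing the columns it references (the three examples use different data).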

# Interactive scatter plot
fig = px.scatter(df, x='gdp_per_capita', y='life_expectancy',
                 size='population', color='continent',
                 hover_name='country',
                 title='GDP vs Life Expectancy')
fig.show()

# Interactive line chart
fig = px.line(df, x='date', y='revenue', color='product',
              title='Revenue Trends by Product')
fig.show()

# Interactive bar chart
fig = px.bar(df.groupby('region')['sales'].sum().reset_index(),
             x='region', y='sales',
             title='Total Sales by Region')
fig.show()
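
The bar chart example above aggregates the data inside the px.bar call. It can help to run that aggregation on its own first and inspect the result before plotting. A small sketch with made-up rows (the regions and sales figures are illustrative):

```python
import pandas as pd

# Made-up order-level data: one row per sale.
df = pd.DataFrame({
    'region': ['North', 'South', 'North', 'West', 'South', 'North'],
    'sales':  [1200, 800, 400, 950, 300, 150],
})

# Same aggregation as in the bar chart example: one row per region.
totals = df.groupby('region')['sales'].sum().reset_index()

# Sorting makes the ranking obvious once the bars are drawn.
totals = totals.sort_values('sales', ascending=False).reset_index(drop=True)
print(totals)
```

The resulting totals frame can then be passed to px.bar(totals, x='region', y='sales') in the same way as in the example above.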

Storytelling with Data

Great data visualization is about telling a story, not just displaying numbers. Follow these principles:

  1. Start with the Question

    Every chart should answer a specific question. If you cannot state the question, reconsider the visualization.

  2. Choose the Right Chart Type

    Match your chart to your data and message. Do not use a pie chart for 15 categories or a scatter plot for time series.

  3. Remove Clutter

    Eliminate unnecessary gridlines, borders, and decorations. Every element should serve a purpose.

  4. Use Color Intentionally

    Use color to highlight key findings, not to make things "pretty." Ensure accessibility for colorblind viewers.

  5. Add Context

    Include clear titles, axis labels, legends, and annotations. A chart should be understandable without additional explanation.
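
Several of these principles can be applied in a few lines of Matplotlib. The sketch below reuses the quarterly revenue figures from the Matplotlib section, mutes every bar except the one the chart is about, removes the top and right borders, and annotates the key value. The accent color is the blue from the Okabe-Ito colorblind-friendly palette. Highlighting the best quarter is only one possible choice, made here for illustration.

```python
import matplotlib.pyplot as plt

categories = ['Q1', 'Q2', 'Q3', 'Q4']
revenue = [45000, 52000, 48000, 61000]

# Use color intentionally: gray for context, one accent for the finding.
highlight = revenue.index(max(revenue))
colors = ['#999999'] * len(revenue)
colors[highlight] = '#0072B2'  # Okabe-Ito blue

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(categories, revenue, color=colors)

# Remove clutter: drop borders that carry no information.
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Add context: the title states the takeaway, the annotation gives the number.
ax.set_title('Q4 Was the Strongest Quarter', fontsize=14)
ax.set_xlabel('Quarter')
ax.set_ylabel('Revenue ($)')
ax.annotate(f'${revenue[highlight]:,}',
            xy=(highlight, revenue[highlight]),
            xytext=(0, 5), textcoords='offset points',
            ha='center', fontweight='bold')

fig.tight_layout()
```

Note that the title answers the question the chart was built for, rather than merely naming the data.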

Recommended reading: Storytelling with Data by Cole Nussbaumer Knaflic is an excellent resource for learning how to create effective data visualizations that drive action.

Dashboard Creation

Dashboards combine multiple visualizations into a single interactive view. Python tools for dashboards include:

  • Plotly Dash — Build web-based analytical applications with Python
  • Streamlit — Turn scripts into shareable web apps with minimal code
  • Panel — Create dashboards from Jupyter notebooks
  • Tableau / Power BI — Enterprise-grade BI tools (not Python, but widely used)

Python (Streamlit)
import streamlit as st
import pandas as pd
import plotly.express as px

st.title('Sales Dashboard')

# Load data
df = pd.read_csv('sales.csv')

# Sidebar filters
region = st.sidebar.selectbox('Region', df['region'].unique())
filtered = df[df['region'] == region]

# Metrics
col1, col2, col3 = st.columns(3)
col1.metric('Total Sales', f"${filtered['sales'].sum():,.0f}")
col2.metric('Orders', len(filtered))
col3.metric('Avg Order', f"${filtered['sales'].mean():,.0f}")

# Chart
fig = px.line(filtered, x='date', y='sales')
st.plotly_chart(fig)
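
One design choice worth considering is to keep the filtering and metric calculations in a plain function, separate from the Streamlit calls, so the numbers can be checked without launching the app. The function name summarize_region and the sample rows below are hypothetical; they are not part of Streamlit or of the example above.

```python
import pandas as pd


def summarize_region(df: pd.DataFrame, region: str) -> dict:
    """Return the three dashboard metrics for one region."""
    filtered = df[df['region'] == region]
    return {
        'total_sales': float(filtered['sales'].sum()),
        'orders': len(filtered),
        'avg_order': float(filtered['sales'].mean()),
    }


# Made-up rows standing in for sales.csv.
sample = pd.DataFrame({
    'region': ['North', 'North', 'South'],
    'sales': [100.0, 300.0, 250.0],
})

metrics = summarize_region(sample, 'North')
print(metrics)
```

In the app, each value can then be formatted and passed to the metric calls as before.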

Common mistake: Do not overcomplicate your visualizations. A simple, well-labeled bar chart is almost always more effective than a 3D exploded pie chart with 20 slices.