Data Visualization
A great visualization is worth a thousand data points. Learn to create clear, compelling charts with Matplotlib, Seaborn, and Plotly that communicate insights effectively.
Why Visualization Matters
Visualization transforms raw numbers into visual patterns that humans can quickly understand. Good visualizations help you:
- Discover patterns that are invisible in raw data tables
- Communicate findings to stakeholders who may not understand statistics
- Validate assumptions before and after building models
- Tell a story that drives action and decision-making
Choosing the Right Chart
| Chart Type | Best For | Example Use |
|---|---|---|
| Bar Chart | Comparing categories | Sales by region, counts by category |
| Line Chart | Trends over time | Revenue over months, stock prices |
| Scatter Plot | Relationships between two variables | Height vs weight, price vs demand |
| Histogram | Distribution of one variable | Age distribution, exam scores |
| Heatmap | Correlations or matrix data | Feature correlations, time-based patterns |
| Box Plot | Distribution and outliers | Salary ranges by department |
| Pie Chart | Parts of a whole (use sparingly) | Market share, budget allocation |
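As a rough illustration of how the table can be applied, the sketch below suggests a chart type from the data types of the columns involved. The function name `suggest_chart` and the sample columns are hypothetical, and the rules are a simplified heuristic rather than a complete decision procedure.

```python
import pandas as pd
from pandas.api import types as ptypes


def suggest_chart(df, x, y=None):
    """Return a chart-type suggestion based on column dtypes (rough heuristic)."""
    if y is None:
        # One variable: numeric -> distribution, otherwise category counts
        return "histogram" if ptypes.is_numeric_dtype(df[x]) else "bar chart"
    if ptypes.is_datetime64_any_dtype(df[x]):
        return "line chart"        # trend over time
    x_num = ptypes.is_numeric_dtype(df[x])
    y_num = ptypes.is_numeric_dtype(df[y])
    if x_num and y_num:
        return "scatter plot"      # relationship between two variables
    if y_num:
        return "box plot"          # numeric distribution per category
    return "heatmap"               # two categorical columns: plot a count matrix


# Made-up sample data for illustration only
sample = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01"]),
    "region": ["North", "South", "North"],
    "sales": [120.0, 95.5, 130.25],
    "units": [12, 9, 14],
})

print(suggest_chart(sample, "sales"))            # histogram
print(suggest_chart(sample, "region"))           # bar chart
print(suggest_chart(sample, "date", "sales"))    # line chart
print(suggest_chart(sample, "units", "sales"))   # scatter plot
print(suggest_chart(sample, "region", "sales"))  # box plot
```

For a category paired with a number, a box plot shows the raw distribution; if you aggregate first (for example, total sales per region), a bar chart is usually the better fit.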
Matplotlib — The Foundation
Matplotlib is Python's most widely used plotting library. It gives you full control over every element of your chart.
```python
import matplotlib.pyplot as plt
import numpy as np

# Bar chart
categories = ['Q1', 'Q2', 'Q3', 'Q4']
revenue = [45000, 52000, 48000, 61000]

plt.figure(figsize=(8, 5))
plt.bar(categories, revenue, color='#4f46e5')
plt.title('Quarterly Revenue', fontsize=14)
plt.xlabel('Quarter')
plt.ylabel('Revenue ($)')
plt.tight_layout()
plt.show()

# Line chart with multiple series
months = np.arange(1, 13)
product_a = [20, 25, 30, 28, 35, 40, 38, 45, 42, 50, 55, 60]
product_b = [15, 18, 22, 20, 25, 28, 30, 35, 33, 38, 40, 45]

plt.figure(figsize=(10, 5))
plt.plot(months, product_a, marker='o', label='Product A')
plt.plot(months, product_b, marker='s', label='Product B')
plt.title('Monthly Sales Comparison')
plt.xlabel('Month')
plt.ylabel('Units Sold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
```
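The "full control" mentioned above is easiest to see with Matplotlib's object-oriented interface, where each chart is an `Axes` object you configure directly. The sketch below reuses the monthly sales figures from the example and places two charts side by side; the second panel (the gap between the two products) and the output filename are illustrative choices, not part of the original example.

```python
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
product_a = [20, 25, 30, 28, 35, 40, 38, 45, 42, 50, 55, 60]
product_b = [15, 18, 22, 20, 25, 28, 30, 35, 33, 38, 40, 45]

# One figure holding two side-by-side axes
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(12, 4))

# Left panel: both series as lines
ax_line.plot(months, product_a, marker='o', label='Product A')
ax_line.plot(months, product_b, marker='s', label='Product B')
ax_line.set_title('Monthly Sales')
ax_line.set_xlabel('Month')
ax_line.set_ylabel('Units Sold')
ax_line.set_xticks(months)
ax_line.legend()

# Right panel: how far Product A is ahead each month
gap = np.array(product_a) - np.array(product_b)
ax_bar.bar(months, gap, color='#4f46e5')
ax_bar.set_title('Lead of Product A over Product B')
ax_bar.set_xlabel('Month')
ax_bar.set_ylabel('Units')
ax_bar.set_xticks(months)

fig.tight_layout()
fig.savefig('monthly_sales.png', dpi=150)  # hypothetical output filename
```

Working with `fig` and `ax` objects instead of the global `plt` state tends to scale better once a figure has more than one panel.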
Seaborn — Statistical Visualization
Seaborn builds on Matplotlib with beautiful defaults and specialized statistical plots. It integrates directly with Pandas DataFrames.
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Assumes df is a pandas DataFrame with 'salary', 'department',
# and 'experience' columns.

# Distribution plot
sns.histplot(df['salary'], bins=30, kde=True)
plt.title('Salary Distribution')
plt.show()

# Box plot by category
# (seaborn 0.13+ warns when palette is passed without hue)
sns.boxplot(x='department', y='salary', data=df, palette='Set2')
plt.title('Salary by Department')
plt.xticks(rotation=45)
plt.show()

# Heatmap of correlations
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='RdYlBu_r', center=0)
plt.title('Feature Correlations')
plt.show()

# Scatter with regression line
sns.regplot(x='experience', y='salary', data=df)
plt.title('Experience vs Salary')
plt.show()
```
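The Seaborn calls above expect a DataFrame named `df` with `salary`, `department`, and `experience` columns, which the lesson does not define. If you want to try them without real data, a synthetic dataset along these lines should work. The department names and the salary formula are invented purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)   # fixed seed so the data is reproducible
n = 200

departments = rng.choice(['Engineering', 'Sales', 'Marketing', 'Support'], size=n)
experience = rng.integers(0, 21, size=n)   # years of experience, 0 to 20 inclusive

# Made-up relationship: base pay + raise per year of experience + random noise
salary = 40_000 + 2_500 * experience + rng.normal(0, 8_000, size=n)

df = pd.DataFrame({
    'department': departments,
    'experience': experience,
    'salary': salary.round(2),
})

print(df.head())
print(df.describe())
```

Because salary is generated from experience, the regression plot and the correlation heatmap should both show a clear positive relationship.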
Plotly — Interactive Charts
Plotly creates interactive, web-based visualizations. Users can zoom, hover for details, and toggle data series. It is ideal for dashboards and presentations.
```python
import plotly.express as px

# Each example assumes df is a pandas DataFrame that holds the columns it names.

# Interactive scatter plot
fig = px.scatter(df, x='gdp_per_capita', y='life_expectancy',
                 size='population', color='continent',
                 hover_name='country',
                 title='GDP vs Life Expectancy')
fig.show()

# Interactive line chart
fig = px.line(df, x='date', y='revenue', color='product',
              title='Revenue Trends by Product')
fig.show()

# Interactive bar chart
sales_by_region = df.groupby('region')['sales'].sum().reset_index()
fig = px.bar(sales_by_region, x='region', y='sales',
             title='Total Sales by Region')
fig.show()
```
Storytelling with Data
Great data visualization is about telling a story, not just displaying numbers. Follow these principles:
- Start with the Question: Every chart should answer a specific question. If you cannot state the question, reconsider the visualization.
- Choose the Right Chart Type: Match your chart to your data and message. Do not use a pie chart for 15 categories or a scatter plot for time series.
- Remove Clutter: Eliminate unnecessary gridlines, borders, and decorations. Every element should serve a purpose.
- Use Color Intentionally: Use color to highlight key findings, not to make things "pretty." Ensure accessibility for colorblind viewers.
- Add Context: Include clear titles, axis labels, legends, and annotations. A chart should be understandable without additional explanation.
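One way these principles might look in practice is sketched below, using the quarterly revenue figures from the Matplotlib section. The question being answered, the grey-plus-accent color scheme, and the output filename are choices made for this example; `Axes.bar_label` requires Matplotlib 3.4 or later.

```python
import matplotlib.pyplot as plt

# Start with the question: "Which quarter had the highest revenue?"
categories = ['Q1', 'Q2', 'Q3', 'Q4']
revenue = [45000, 52000, 48000, 61000]

best = revenue.index(max(revenue))   # position of the top quarter

# Use color intentionally: grey for context, one accent for the key finding
colors = ['#b0b0b0'] * len(revenue)
colors[best] = '#0072B2'             # blue from the colorblind-safe Okabe-Ito palette

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(categories, revenue, color=colors)

# Remove clutter: hide the top and right borders and the gridlines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.grid(False)

# Add context: a title that states the finding, axis labels, and value labels
ax.set_title(f'{categories[best]} had the highest quarterly revenue')
ax.set_xlabel('Quarter')
ax.set_ylabel('Revenue ($)')
ax.bar_label(bars, labels=[f'${v:,.0f}' for v in revenue], padding=3)

fig.tight_layout()
fig.savefig('quarterly_revenue_story.png', dpi=150)  # hypothetical filename
```

Stating the finding in the title lets a reader get the message before studying the bars.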
Dashboard Creation
Dashboards combine multiple visualizations into a single interactive view. Python tools for dashboards include:
- Plotly Dash — Build web-based analytical applications with Python
- Streamlit — Turn scripts into shareable web apps with minimal code
- Panel — Create dashboards from Jupyter notebooks
- Tableau / Power BI — Enterprise-grade BI tools (not Python, but widely used)
```python
import streamlit as st
import pandas as pd
import plotly.express as px

st.title('Sales Dashboard')

# Load data
df = pd.read_csv('sales.csv')

# Sidebar filters
region = st.sidebar.selectbox('Region', df['region'].unique())
filtered = df[df['region'] == region]

# Metrics
col1, col2, col3 = st.columns(3)
col1.metric('Total Sales', f"${filtered['sales'].sum():,.0f}")
col2.metric('Orders', len(filtered))
col3.metric('Avg Order', f"${filtered['sales'].mean():,.0f}")

# Chart
fig = px.line(filtered, x='date', y='sales')
st.plotly_chart(fig)
```
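A Streamlit script like the one above is started from the command line with `streamlit run`, for example `streamlit run app.py` if the script is saved under that (hypothetical) name. Because the filtering and metric logic runs inside the UI code, it can be hard to check on its own. One possible approach, sketched here with a hypothetical helper `summarize_region` and made-up sample rows standing in for `sales.csv`, is to move that logic into a plain function that needs only pandas.

```python
import pandas as pd


def summarize_region(df, region):
    """Return the filtered rows and the three headline metrics for one region."""
    filtered = df[df['region'] == region]
    metrics = {
        'total_sales': float(filtered['sales'].sum()),
        'orders': len(filtered),
        # mean() of an empty selection is NaN, so report 0.0 instead
        'avg_order': float(filtered['sales'].mean()) if len(filtered) else 0.0,
    }
    return filtered, metrics


# Tiny made-up dataset standing in for sales.csv
sample = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-05', '2024-01-06', '2024-01-07', '2024-01-08']),
    'region': ['North', 'South', 'North', 'South'],
    'sales': [100.0, 250.0, 300.0, 50.0],
})

filtered, metrics = summarize_region(sample, 'North')
print(metrics)
```

In the dashboard, the metric calls would then read from the returned dictionary, for example `col1.metric('Total Sales', f"${metrics['total_sales']:,.0f}")`.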
Lilly Tech Systems