Which Python libraries can you use for data visualization? In which cases do you use each of them?
- Matplotlib for highly customizable plots.
- Seaborn for automated plots based on Matplotlib.
- Plotly for interactive plots.
How do you import these libraries into the Python environment?
For Matplotlib and Plotly, you import the submodule that contains the main plotting functions (pyplot and express, respectively). Seaborn is imported directly.
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
Enough theory! Let’s look at some examples using the same plot for each library.
Data
Using the famous gapminder dataset, we will plot data from countries in 2007.
import pandas as pd
df = pd.read_excel('gapminder.xlsx')
![Basic raw data plot of gapminder dataset using Python visualization libraries](https://images.datons.ai/DATAVIZ-01/D_raw.png)
Scatter plot
Matplotlib
Matplotlib creates a simple scatter plot without labels on the axes.
plt.scatter(x='gdpPercap', y='lifeExp', data=df)
![Simple scatter plot created with Matplotlib showing GDP per capita vs life expectancy](https://images.datons.ai/DATAVIZ-01/output_12_0.png)
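The missing labels can be added by hand. Here is a minimal sketch using `plt.xlabel` and `plt.ylabel`; the small DataFrame holds made-up values so the snippet runs on its own, and you would use the gapminder `df` instead.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values for illustration only, not real gapminder figures.
df = pd.DataFrame({
    'gdpPercap': [1000, 5000, 20000, 40000],
    'lifeExp': [55, 65, 75, 80],
})

plt.scatter(x='gdpPercap', y='lifeExp', data=df)

# Matplotlib does not label the axes for you, so set them explicitly.
plt.xlabel('gdpPercap')
plt.ylabel('lifeExp')
```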
Seaborn
On the other hand, seaborn creates a scatter plot with labels on the axes and even adds borders to the points.
sns.scatterplot(x='gdpPercap', y='lifeExp', data=df)
![Scatter plot with axes labels and styled points, created using Seaborn](https://images.datons.ai/DATAVIZ-01/output_14_0.png)
Now, what if you wanted to color the points by continent?
With matplotlib, you get an error because the c parameter expects colors (or numbers to map to colors), not a column of continent names.
plt.scatter(x='gdpPercap', y='lifeExp', data=df, c='continent')
# ValueError: Invalid RGBA argument: 'Asia'
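Matplotlib can still color by category, just not automatically. One possible workaround, sketched here with made-up values, is to draw one scatter call per continent so each group takes the next color in the default cycle and gets its own legend entry.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values for illustration only, not real gapminder figures.
df = pd.DataFrame({
    'continent': ['Asia', 'Asia', 'Europe', 'Africa'],
    'gdpPercap': [3000, 30000, 35000, 1500],
    'lifeExp': [65, 82, 80, 55],
})

fig, ax = plt.subplots()

# One scatter call per continent: each call takes the next color in
# Matplotlib's default cycle and carries a label for the legend.
for continent, group in df.groupby('continent'):
    ax.scatter(group['gdpPercap'], group['lifeExp'], label=continent)

ax.set_xlabel('gdpPercap')
ax.set_ylabel('lifeExp')
ax.legend(title='continent')
```

This works, but it takes several extra lines to do what seaborn does with a single parameter.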
On the other hand, seaborn automates creating a scatter plot with a color-coded legend for the continent column, using the hue parameter.
sns.scatterplot(x='gdpPercap', y='lifeExp', data=df, hue='continent')
![Color-coded scatter plot by continent using Seaborn's hue parameter](https://images.datons.ai/DATAVIZ-01/output_18_0.png)
Now, how can you identify the country represented by each point?
This process is not automatic when using matplotlib, nor is it when using seaborn.
You’d need to create a horrible for loop to add the country names to the points, ending up with a cluttered plot.
sns.scatterplot(x='gdpPercap', y='lifeExp', data=df, hue='continent')
# Write each country's name next to its point
for idx, data in df.iterrows():
    plt.text(x=data['gdpPercap'], y=data['lifeExp'], s=data['country'])
![Seaborn scatter plot with overlaid country names creating a cluttered appearance](https://images.datons.ai/DATAVIZ-01/output_20_0.png)
You can combine seaborn with matplotlib to add new elements to the plot because seaborn is built on top of matplotlib.
For example, you can add a title, labels, or a legend by calling matplotlib functions on a new line of code.
sns.scatterplot(x='gdpPercap', y='lifeExp', data=df, hue='continent')
# Write each country's name next to its point
for idx, data in df.iterrows():
    plt.text(x=data['gdpPercap'], y=data['lifeExp'], s=data['country'])
plt.title('Cluttered plot')
![Highly cluttered scatter plot with country names and a title, showcasing the drawbacks of excessive labeling](https://images.datons.ai/DATAVIZ-01/output_22_0.png)
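Another way to combine the two libraries is to keep the Matplotlib Axes object that `sns.scatterplot` returns and call its methods directly. The sketch below uses made-up values and illustrative label text; neither comes from the gapminder file.

```python
import pandas as pd
import seaborn as sns

# Made-up values for illustration only, not real gapminder figures.
df = pd.DataFrame({
    'continent': ['Asia', 'Asia', 'Europe', 'Africa'],
    'gdpPercap': [3000, 30000, 35000, 1500],
    'lifeExp': [65, 82, 80, 55],
})

# sns.scatterplot returns the Matplotlib Axes it drew on.
ax = sns.scatterplot(x='gdpPercap', y='lifeExp', data=df, hue='continent')

# From here on, these are plain Matplotlib calls.
ax.set_title('GDP per capita vs. life expectancy')
ax.set_xlabel('GDP per capita')
ax.set_ylabel('Life expectancy (years)')
```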
How can we declutter the mess of country names?
Plotly
Instead of adding the country names to the plot, you can hover over the points to see the country names.
px.scatter(df, x='gdpPercap', y='lifeExp', color='continent', hover_name='country')
![Interactive Plotly scatter plot with hoverable points showing country names](https://images.datons.ai/DATAVIZ-01/plotly.png)
I love plotly because it's interactive: you can zoom in, zoom out, click on a legend entry to hide that continent's countries, and even save the plot as an image.
But don’t get me wrong, plotly is not perfect.
If I wanted a highly customized plot, I’d use matplotlib because it’s more flexible, and you can control every plot detail.
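To give a feel for that level of control, here is a rough sketch that sets the figure size, marker style, axis scale, and grid one by one. The data values and every styling choice are arbitrary picks for the example, not recommendations.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values for illustration only, not real gapminder figures.
df = pd.DataFrame({
    'gdpPercap': [1000, 5000, 20000, 40000],
    'lifeExp': [55, 65, 75, 80],
})

fig, ax = plt.subplots(figsize=(8, 5))

# Marker size, transparency, fill color, and border are all set explicitly.
ax.scatter(df['gdpPercap'], df['lifeExp'],
           s=80, alpha=0.6, color='teal', edgecolors='black')

# A log scale spreads out the low-income end of the x-axis.
ax.set_xscale('log')
ax.set_xlabel('GDP per capita (log scale)')
ax.set_ylabel('Life expectancy (years)')
ax.grid(True, which='both', linestyle='--', linewidth=0.5)
```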
These three libraries contain the necessary functions to create any other type of plot. Just choose the one that best suits your needs.
To learn more about these libraries, visit their official documentation. Each one has an examples gallery where you can see what is possible and replicate the plots in your projects with your own data.
Conclusions
- If you are a beginner, use plotly for simple plots.
- If you are an intermediate user, use seaborn with matplotlib for more complex plots.
- If you are an advanced user, use matplotlib for highly customizable plots.
Further questions? Guidance? Suggestions for the next tutorial? It'll be my pleasure to discuss them in the comments below.