There are vast amount of data visualization libraries in Python. Here take note of a few.

# Pandas, matplotlib, and Jupyter

To make things simple, the data to plot is the distribution of mean and standard deviation of the sum of two normally distributed and correlated random variables in different weight. This is how we generate the data (50 data points):

import numpy as np
import pandas as pd

rho = -0.8 # correlation
mu1,sigma1 = 0.1,0.15
mu2,sigma2 = 0.3,0.45

def blend(mu1, sigma1, mu2, sigma2, rho):
'''Assume RVs X1 ~ N(mu1,sigma1) and X2 ~ N(mu2,sigma2),
return X = w1*X1+w2*X2 ~ N(mu,sigma) for w1+w2=1, w1 in [0,1]'''
w1 = np.arange(0,1.001, 0.02)
w2 = 1-w1
mu = mu1*w1 + mu2*w2
sigma = np.sqrt(sigma1*sigma1*w1*w1 + 2*rho*sigma1*sigma2*w1*w2 + sigma2*sigma2*w2*w2)
return mu, sigma

mu,sigma = blend(mu1, sigma1, mu2, sigma2, rho)
df = pd.DataFrame(data=dict(mu=mu, sigma=sigma))


The $$(x,y)$$ coordinates are in vectors mu and sigma, as well as combined into a dataframe df. Python pandas can plot the data directly, using matplotlib:

df.plot(x='mu', y='sigma', kind='scatter') If plotting this in Jupyter, we need to issue this magic in the first line of the Python code:

%matplotlib inline


which inline is for static charts. We can use notebook or ipympl instead (if appropriate Jupyter plugins are installed) to get interactive charts.

On the other hand, we can also invoke matplotlib directly to plot:

import matplotlib.pyplot as plt
plt.scatter(mu,sigma) # plot for line plot
plt.ylabel('sigma')
plt.xlabel('mu')
plt.show() The bad thing about matplotlib is its verbosity. It takes too much polishing to make the graph appealing. The plotting function in pandas is a rather thin wrapper to it.

# Seaborn

Seaborn is also a wrapper to matplotlib but it make the default visualization more appealing. For example:

import seaborn as sns
sns.regplot(mu, sigma, fit_reg=False) # fit_reg: do not do linear regression In the above example, seaborn by default will give the linear regression line together with scatter plot. This is what seaborn does in setting up the default: make common things easier to use.

# ggplot

ggplot is also on top of matplotlib to improve the visual appeal. It is a port of ggplot2 form R, therefore its API is not pythonic.

In my use, I found that ggplot uses deprecated functions from pandas. I need to replace pandas.tslib.Timestamp into pandas.Timestamp in a few places after I installed the python module.

import ggplot as gg
p = gg.ggplot(df, gg.aes(x="mu", y="sigma")) \
+ gg.geom_point() \
+ gg.xlab("mu") \
+ gg.ylab("sigma")
print(p) # bokeh

Bokeh is an open-source package for browsers. The graph is produced in HTML and it will be interactive by default.

import bokeh.plotting as bk
from bokeh.io import output_notebook
output_notebook() # need this for jupyter inline display
# jupyterlab: need jupyter labextension install jupyterlab_bokeh
p = bk.figure(plot_width=400, plot_height=400)
p.circle(mu,sigma,size=2)
bk.show(p) Using bokeh in Jupyter notebook needs to have the Jupyter extension installed and execute the output_notebook() call before plotting. The extension is installed with

jupyter labextension install jupyterlab_bokeh


# pygal

pygal is also a library for HTML-based chartting. It produces charts in SVG format. To produce a chart in Jupyter notebook, we use the following:

import pygal
from IPython.display import HTML
chart = pygal.XY()
htmltemplate = '''
<!DOCTYPE html>
<html>
<script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/svg.jquery.js"></script>
<script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/pygal-tooltips.js"></script>
<body>
<figure>
{pygal_render}
</figure>
</body>
</html>
'''
HTML(htmltemplate.format(pygal_render=chart.render())) In order to display the SVG chart in Jupyter, we need to make use of HTML() from IPython.display and embed the SVG code into a HTML template, as above.

# Plot.ly

This is the proprietry library. It has its own documentation and it needs to have its extension installed for Jupyter to work:

jupyter labextension install @jupyterlab/plotly-extension


The code to use plot.ly is:

import plotly.offline as po
import plotly.graph_objs as go

data = go.Data([
go.Scatter(
x=mu,
y=sigma,
showlegend=False,
mode='markers'
)
])
layout = go.Layout(
xaxis = go.XAxis(title = "mu", tickangle=45),
yaxis = go.YAxis(title = "sigma")
)
po.iplot(dict(data=data, layout=layout))


! And the chart is by default interactive.

# Altair

Altair is new and exciting. It supports Vega! (may be indeed the more concise form, Vega-lite)

import altair as alt
alt.Chart(df).mark_point().encode(x='mu',y='sigma') 