There are vast amount of data visualization libraries in Python. Here take note of a few.
Pandas, matplotlib, and Jupyter
To make things simple, the data to plot is the distribution of mean and standard deviation of the sum of two normally distributed and correlated random variables in different weight. This is how we generate the data (50 data points):
import numpy as np import pandas as pd rho = -0.8 # correlation mu1,sigma1 = 0.1,0.15 mu2,sigma2 = 0.3,0.45 def blend(mu1, sigma1, mu2, sigma2, rho): '''Assume RVs X1 ~ N(mu1,sigma1) and X2 ~ N(mu2,sigma2), return X = w1*X1+w2*X2 ~ N(mu,sigma) for w1+w2=1, w1 in [0,1]''' w1 = np.arange(0,1.001, 0.02) w2 = 1-w1 mu = mu1*w1 + mu2*w2 sigma = np.sqrt(sigma1*sigma1*w1*w1 + 2*rho*sigma1*sigma2*w1*w2 + sigma2*sigma2*w2*w2) return mu, sigma mu,sigma = blend(mu1, sigma1, mu2, sigma2, rho) df = pd.DataFrame(data=dict(mu=mu, sigma=sigma))
The \((x,y)\) coordinates are in vectors
sigma, as well as combined
into a dataframe
pandas can plot the data directly, using
df.plot(x='mu', y='sigma', kind='scatter')
If plotting this in Jupyter, we need to issue this magic in the first line of the Python code:
inline is for static charts. We can use
(if appropriate Jupyter plugins are installed) to get interactive charts.
On the other hand, we can also invoke
matplotlib directly to plot:
import matplotlib.pyplot as plt plt.scatter(mu,sigma) # plot for line plot plt.ylabel('sigma') plt.xlabel('mu') plt.show()
The bad thing about
matplotlib is its verbosity. It takes too much polishing
to make the graph appealing. The plotting function in
pandas is a rather thin
wrapper to it.
Seaborn is also a wrapper to
matplotlib but it make the default visualization
more appealing. For example:
import seaborn as sns sns.regplot(mu, sigma, fit_reg=False) # fit_reg: do not do linear regression
In the above example, seaborn by default will give the linear regression line together with scatter plot. This is what seaborn does in setting up the default: make common things easier to use.
ggplot is also on top of
matplotlib to improve the
visual appeal. It is a port of
ggplot2 form R, therefore its API is not pythonic.
In my use, I found that
ggplot uses deprecated functions from
need to replace
pandas.Timestamp in a few
places after I installed the python module.
import ggplot as gg p = gg.ggplot(df, gg.aes(x="mu", y="sigma")) \ + gg.geom_point() \ + gg.xlab("mu") \ + gg.ylab("sigma") print(p)
Bokeh is an open-source package for browsers. The graph is produced in HTML and it will be interactive by default.
import bokeh.plotting as bk from bokeh.io import output_notebook output_notebook() # need this for jupyter inline display # jupyterlab: need `jupyter labextension install jupyterlab_bokeh` p = bk.figure(plot_width=400, plot_height=400) p.circle(mu,sigma,size=2) bk.show(p)
Using bokeh in Jupyter notebook needs to have the Jupyter extension installed
and execute the
output_notebook() call before plotting. The extension is installed with
jupyter labextension install jupyterlab_bokeh
pygal is also a library for HTML-based chartting. It produces charts in SVG format. To produce a chart in Jupyter notebook, we use the following:
In order to display the SVG chart in Jupyter, we need to make use of
IPython.display and embed the SVG code into a HTML template, as above.
This is the proprietry library. It has its own documentation and it needs to have its extension installed for Jupyter to work:
jupyter labextension install @jupyterlab/plotly-extension
The code to use
import plotly.offline as po import plotly.graph_objs as go data = go.Data([ go.Scatter( x=mu, y=sigma, showlegend=False, mode='markers' ) ]) layout = go.Layout( xaxis = go.XAxis(title = "mu", tickangle=45), yaxis = go.YAxis(title = "sigma") ) po.iplot(dict(data=data, layout=layout))
And the chart is by default interactive.
Altair is new and exciting. It supports Vega! (may be indeed the more concise form, Vega-lite)
import altair as alt alt.Chart(df).mark_point().encode(x='mu',y='sigma')