Guide#

Plotting packages#

  • matplotlib
  • seaborn
  • plotly
  • d3.js
  • ggplot
  • plotnine
  • altair
  • bqplot
  • https://www.anaconda.com/python-data-visualization-2018-why-so-many-libraries/

Dashboarding#

  • Dash*
  • Bokeh
  • Voila*
  • Panel

Matplotlib#

Matplotlib is the de facto scientific plotting tool for Python users. It follows Python's object-oriented model, so every element of a figure can be customized by the user. However, this flexibility comes at the cost of a steep learning curve. Matplotlib does provide an extensive gallery of common charts with their code, so for most use cases you can start from a gallery example and adapt it.
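
As a minimal sketch of the object-oriented style described above, the example below builds a figure and then customizes individual elements (title, axis labels, line style, spines) through their own objects. The data values are made up for illustration, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt

# Illustrative data (made up for this sketch)
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# A Figure holds one or more Axes; each Axes holds lines, labels, spines, ...
fig, ax = plt.subplots(figsize=(4, 3))
(line,) = ax.plot(x, y, marker="o", label="y = x^2")

# Every element is an object that can be customized individually
ax.set_title("A Line Chart")
ax.set_xlabel("x")
ax.set_ylabel("y")
line.set_linestyle("--")
ax.spines["top"].set_visible(False)
ax.legend()

# In an interactive session, call plt.show() or fig.savefig("line_chart.png")
```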

Seaborn#

Seaborn is built on top of matplotlib, so its figures are essentially matplotlib objects. This means users can create beautiful plots without the overhead of matplotlib while preserving its flexibility. Unfortunately, Seaborn offers only a limited number of ready-made charts, but the available ones are easy to use. Note: Seaborn interfaces very well with pandas DataFrames as long as you know when to change your data from long form to wide form and vice versa.
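
The long-form/wide-form conversion mentioned in the note can be sketched with pandas alone. The table and column names below (`month`, `city`, `temp_c`) are made up for illustration; the commented-out Seaborn call at the end shows where the long-form table would typically be used.

```python
import pandas as pd

# Wide form: one column per city (made-up numbers)
wide_df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar"],
    "Oslo": [-3.0, -2.5, 1.0],
    "Rome": [8.0, 9.0, 12.0],
})

# Wide -> long: one row per (month, city) observation
long_df = wide_df.melt(id_vars="month", var_name="city", value_name="temp_c")

# Long -> wide again
wide_again = (
    long_df.pivot(index="month", columns="city", values="temp_c")
    .reindex(wide_df["month"].tolist())  # pivot sorts the index; restore the month order
    .reset_index()
)
wide_again.columns.name = None  # drop the leftover "city" label on the columns

# Long form is what Seaborn's semantic mappings usually expect, e.g.:
# sns.lineplot(data=long_df, x="month", y="temp_c", hue="city")
```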

Plotly#

Plotly is a JavaScript graphing library built on top of D3.js (see below). It has wrappers for both Python and R. The aim is to produce beautiful, interactive figures that work across platforms. The number of readily available figures is even smaller than Seaborn's, but what it lacks in variety it makes up for with data interactivity. Since Plotly.py is a wrapper after all, it can't leverage all the features available in D3.js. That said, if you spend the necessary effort learning the wrapper, it does offer a decent level of flexibility. When combined with Dash and chart_studio, plotly figures can be published online for readers to play and interact with. Plotly Express supports grammar of graphics!

While plotly figures are interactive, they are quite limited in the number and types of interactions. You can zoom in and out of a figure, hide and unhide traces, or get information on hover. You can also select items from dropdown menus, buttons and sliders, where each selection corresponds to a different view of the figure. Each plotly figure is self-contained: it holds the figure data for all possible button selections before the figure is even rendered to HTML, which creates a large file. In contrast, Dash and other dashboard-style charts process the raw data only when a selection is made, resulting in longer load times but a smaller file.

Plotly guides and references#

Understanding plotly#

There are three parts: a figure, which contains data and a layout. The data are made up of one or more traces. There are multiple ways to define a trace, as follows:

# The following snippets all generate the same graph. Trace and layout specifications can be either dictionaries or graph objects.
import plotly.graph_objects as go

################### Dictionary only
fig = {
    "data": [{"type": "bar",
              "x": [1, 2, 3],
              "y": [1, 3, 2]}],
    "layout": {"title": {"text": "A Bar Chart"}}
}

################### Using a Figure Constructor
fig = go.Figure({
    "data": [{"type": "bar",
              "x": [1, 2, 3],
              "y": [1, 3, 2]}],
    "layout": {"title": {"text": "A Bar Chart"}}
})

################## Using graph objects for data
fig = go.Figure(
    data=[go.Bar(x=[1, 2, 3], y=[1, 3, 2])],
    layout=dict(title=dict(text="A Bar Chart"))
)

################## Using graph objects for data and layout
fig = go.Figure(
    data=[go.Bar(x=[1, 2, 3], y=[1, 3, 2])],
    layout=go.Layout(title=go.layout.Title(text="A Bar Chart")
    )
)

################# Add traces after figure is created (even if the figure has other traces)
fig = go.Figure()
fig.add_trace(go.Bar(x=[1, 2, 3], y=[1, 3, 2]))

################# There are special add_{type} methods to simplify the code
fig = go.Figure()
fig.add_bar(x=[1, 2, 3], y=[1, 3, 2])
fig.show()

################# Check figure representation
fig.to_dict()
fig.to_json()

Once the traces and layout have been defined, we can update them as follows:

# How to create subplots?
from plotly.subplots import make_subplots
fig = make_subplots(rows=1, cols=2)
fig.add_scatter(y=[4, 2, 3.5], mode="markers",
                marker=dict(size=20, color="LightSeaGreen"),
                name="a", row=1, col=1)
fig.add_bar(y=[2, 1, 3],
            marker=dict(color="MediumPurple"),
            name="b", row=1, col=1)
fig.add_scatter(y=[2, 3.5, 4], mode="markers",
                marker=dict(size=20, color="MediumPurple"),
                name="c", row=1, col=2)
fig.add_bar(y=[1, 3, 2],
            marker=dict(color="LightSeaGreen"),
            name="d", row=1, col=2)
fig.show()

#=========== Update all traces
fig.update_traces(marker=dict(color="RoyalBlue"))

#=========== Update selected traces
# Note that update will iterate through all traces, and select those that match selector
fig.update_traces(marker=dict(color="RoyalBlue"),
                  selector=dict(type="bar"))

#=========== Using nested magic underscore notation
fig.update_traces(marker_color="RoyalBlue",
                  selector=dict(marker_color="MediumPurple"))

#=========== Selecting a particular subplot
fig.update_traces(marker=dict(color="RoyalBlue"),
                  col=2)

#=========== Overwrite existing trace properties
# By default, update_traces() will only add, not overwrite properties
fig.update_traces(overwrite=True, marker={"opacity": 0.4})
# The following does not reset the other marker properties, because the overwrite is applied at the marker_opacity level, not the marker level.
fig.update_traces(overwrite=True, marker_opacity=0.4)

#=========== Change trace properties using iteration
#To change trace properties based on the existing properties, access the traces
#by iterating through them using .for_each_trace()
import pandas as pd
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species")

fig.for_each_trace(
    lambda trace: trace.update(hovertemplate = trace.hovertemplate.replace("=", ": ")),
)
fig.show()

#=========== Other Update functions
fig.update_layout_images
fig.update_annotations
fig.update_shapes
fig.update_xaxes
fig.update_yaxes

#=========== Chaining Updates
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species",
            facet_col="species", trendline="ols", title="Iris Dataset")

# Wrap the chain in parentheses so it can span several lines
(fig.update_layout(title_font_size=24)
    .update_xaxes(showgrid=False)
    .update_traces(line=dict(dash="dot", width=4),
                   selector=dict(type="scatter", mode="lines"))
).show()

A plotly figure is largely configured by its layout. For most basic charts, you only need to change a few options in Layout. More intricate plots, though, require a good understanding of all the options available:

  • options for go.Layout
  • options for go.layout.{objects}

Creating buttons, sliders, dropdown menu etc.#

Generating Vector Plots#

Orca

D3.js#

Have you wondered how The New York Times creates beautiful illustrations for its website and apps? They hired the creator of D3.js! D3.js is a rich graphing language that offers a tonne of customizability to create interactive graphics that can be shared and updated easily. Really, you are limited only by your imagination and coding skills. In fact, it has the steepest learning curve among all the graphing software listed here, so I would recommend learning it only if you need to create novel visualizations of data that cannot be done with other software, and you know JavaScript well.

ggplot#

ggplot is based on the grammar of graphics, which is primarily about mapping data to layers of a plot. If a dataset has multiple dimensions, i.e. characteristics, we first use the x and y axes to plot 2 dimensions. An x, y, z layout can be used for 3-dimensional plots, but it limits the other features that can be stacked on top of it. Marker color, opacity, size and shape can provide additional dimensions. Facets, or subplots, can add further dimensions if the plots are arranged vertically, horizontally or both. Plots can also change over time if the dataset has a temporal element to it. Despite the many ways to add layers, most charts become unnecessarily complicated beyond 4-5 dimensions, especially if the chart type is uncommon. R implements a version of the grammar of graphics via ggplot. Note: There is a Python equivalent known as ggplot.

plotnine#

plotnine is a Python package based on R's ggplot2, and it adopts the grammar of graphics.