Sunburst Plots with Seaborn: Visualizing Hierarchical Data

By: Adam Richardson

Introduction

Sunburst plots visualize hierarchical, multi-level data as nested rings, which makes them useful for large datasets with several category levels. In this article, we’ll walk through building sunburst-like visualizations with Seaborn, a popular Python data visualization library. These views let you explore how categories and subcategories relate to one another, which supports data analysis and decision-making.

Properties and Parameters of Sunburst Plots

Seaborn does not provide a dedicated sunburst function. The closest built-in tool is clustermap(), which draws a hierarchically clustered heatmap with dendrograms along its rows and columns. The result is rectangular rather than radial, but the function is highly configurable and can be tailored to give a sunburst-like view of nested categories. Here are some important parameters:

  • data: The rectangular input data, typically a Pandas DataFrame. It cannot contain missing values.
  • pivot_kws: A dictionary of keyword arguments passed to the pivot function. It is only needed when data is a tidy (long-form) DataFrame that must be reshaped into a rectangular one.
  • method: The linkage method used to build the clusters. Options include ‘single’, ‘complete’, ‘average’, ‘weighted’, ‘centroid’, ‘median’, ‘ward’.
  • metric: The distance metric used for the pairwise comparisons. Can be ‘correlation’, ‘euclidean’, ‘hamming’, etc.
  • z_score: Standardize the data by subtracting the mean and dividing by the standard deviation. Set it to 0 to standardize each row or 1 to standardize each column.
  • standard_scale: Rescale the data to the 0-1 range by subtracting the minimum and dividing by the range (max-min). Set it to 0 to rescale each row or 1 to rescale each column.
  • col_colors, row_colors: Colors used to label the columns or rows, given as a list-like object, a DataFrame, or a Series.
  • linewidths, linecolor: Line properties for separating the heatmap cells.
  • cmap: The colormap to use for the heatmap cells.
  • xticklabels, yticklabels: Configure the tick labels along the x and y axis.
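
To make the difference between z_score and standard_scale concrete, here is a minimal sketch of the arithmetic described in the list, applied column-wise with plain Pandas. The toy DataFrame and its column names are made up for illustration, and the sketch only mirrors the formulas above; it is not Seaborn's own code.

```python
import pandas as pd

# Hypothetical toy data: two columns on very different scales
toy = pd.DataFrame({"a": [0.0, 5.0, 10.0], "b": [100.0, 200.0, 400.0]})

# Min-max scaling per column (the idea behind standard_scale=1):
# subtract the column minimum, then divide by the column range
min_max = (toy - toy.min()) / (toy.max() - toy.min())

# Z-scoring per column (the idea behind z_score=1):
# subtract the column mean, then divide by the column standard deviation
z_scored = (toy - toy.mean()) / toy.std()

print(min_max)
print(z_scored)
```

After min-max scaling every column runs from 0 to 1, while z-scoring centers every column on 0. Either way, columns measured on different scales become comparable in a single colormap.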

A Simplified Real-life Example

Let’s say you want to visualize the sales data of an electronics store. You have a dataset comprising items, their categories, subcategories, and the number of units sold.

import pandas as pd
import seaborn as sns

# Sample data
data = {
    'Category': ['Phones', 'Computers', 'Phones', 'Accessories', 'Computers'],
    'Subcategory': ['Smartphones', 'Laptops', 'Tablets', 'Cables', 'Desktops'],
    'Item': ['iPhone', 'MacBook Pro', 'iPad', 'USB-C Cable', 'iMac'],
    'Units Sold': [250, 50, 180, 500, 35],
}

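# Create a DataFrame, then pivot it so that rows are (Category, Subcategory)
# pairs and columns are items; missing combinations are filled with 0
df = pd.DataFrame(data)
df = df.pivot_table(index=['Category', 'Subcategory'], columns='Item', values='Units Sold', fill_value=0)

# Draw a clustered heatmap (a sunburst-like view) with Seaborn's clustermap function;
# standard_scale=1 rescales each column to the 0-1 range
sns.clustermap(df, cmap="coolwarm", linewidths=1, linecolor="grey", standard_scale=1)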

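In this example, we first create a Pandas DataFrame, then pivot it to have a multi-level index based on the Category and Subcategory columns. Finally, we use Seaborn’s clustermap() function to create a sunburst-like visualization, with standard_scale=1 rescaling each item column to the 0-1 range. Because each item in this small dataset belongs to a single subcategory, every column has exactly one non-zero cell, which scales to 1. Drop standard_scale if you want the colors to reflect raw unit counts.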
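
To see what the pivot step produces before any plotting happens, the following sketch rebuilds the same sample data and inspects the result. The variable names sales, wide, and category_totals are introduced here for illustration only. The per-category totals at the end are the kind of values the innermost ring of a sunburst would encode.

```python
import pandas as pd

# Same sample data as in the example above
sales = pd.DataFrame({
    'Category': ['Phones', 'Computers', 'Phones', 'Accessories', 'Computers'],
    'Subcategory': ['Smartphones', 'Laptops', 'Tablets', 'Cables', 'Desktops'],
    'Item': ['iPhone', 'MacBook Pro', 'iPad', 'USB-C Cable', 'iMac'],
    'Units Sold': [250, 50, 180, 500, 35],
})

# Rows become (Category, Subcategory) pairs, columns become items
wide = sales.pivot_table(index=['Category', 'Subcategory'], columns='Item',
                         values='Units Sold', fill_value=0)

print(list(wide.index.names))  # the two index levels
print(wide.shape)              # (number of row pairs, number of items)

# Units sold per top-level category
category_totals = sales.groupby('Category')['Units Sold'].sum()
print(category_totals)
```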

A Complex Real-life Example

Consider a more complex dataset comprising sales data of various items across different regions, categories, and subcategories over multiple years.

import numpy as np
import pandas as pd
import seaborn as sns

# Simulate a large and complex dataset
np.random.seed(42)
region_list = ['North', 'South', 'East', 'West']
category_list = ['Phones', 'Computers', 'Accessories']
subcategory_list = ['Smartphones', 'Laptops', 'Tablets', 'Cables', 'Desktops']
years_list = list(range(2010, 2021))

data = {
    'Region': np.random.choice(region_list, 1000),
    'Category': np.random.choice(category_list, 1000),
    'Subcategory': np.random.choice(subcategory_list, 1000),
    'Year': np.random.choice(years_list, 1000),
    'Units Sold': np.random.randint(1, 500, 1000),
}

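# Sum units sold for each (Region, Category, Subcategory) row and each Year column
df = pd.DataFrame(data)
df = df.pivot_table(index=['Region', 'Category', 'Subcategory'],
                    columns='Year', values='Units Sold', aggfunc='sum', fill_value=0)

# Cluster rows and columns; standard_scale=1 rescales each year column to the 0-1 range
sns.clustermap(df, cmap="coolwarm", linewidths=1, linecolor="grey", standard_scale=1, figsize=(10, 10))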

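In this example, we generate synthetic data using NumPy, sum the units sold for every Region, Category, and Subcategory combination per year, and pass the resulting multi-level index DataFrame to clustermap(). The clustering reorders the rows so that combinations with similar sales profiles across years sit next to each other. The columns are clustered as well, so the years may not appear in chronological order; pass col_cluster=False to keep them sorted.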
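
With three index levels the heatmap can end up with dozens of rows. One way to simplify it, sketched below under the assumption that subcategory detail is not needed, is to collapse the innermost level by summing over it before plotting. The names sales, full, and by_category are hypothetical, and the data is regenerated with NumPy's default_rng, so the numbers differ from the example above.

```python
import numpy as np
import pandas as pd

# Synthetic data with the same structure as the example above
rng = np.random.default_rng(42)
n = 1000
sales = pd.DataFrame({
    'Region': rng.choice(['North', 'South', 'East', 'West'], n),
    'Category': rng.choice(['Phones', 'Computers', 'Accessories'], n),
    'Subcategory': rng.choice(['Smartphones', 'Laptops', 'Tablets', 'Cables', 'Desktops'], n),
    'Year': rng.choice(np.arange(2010, 2021), n),
    'Units Sold': rng.integers(1, 500, n),
})

# Full three-level table: one row per (Region, Category, Subcategory)
full = sales.pivot_table(index=['Region', 'Category', 'Subcategory'],
                         columns='Year', values='Units Sold',
                         aggfunc='sum', fill_value=0)

# Collapse the Subcategory level by summing over it
by_category = full.groupby(level=['Region', 'Category']).sum()

print(full.shape, by_category.shape)
```

The collapsed table keeps the same totals but has far fewer rows, so it can be passed to clustermap() in place of the full table when the labels become unreadable.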

Personal Tips for Sunburst Plots

  • When working with large datasets, avoid overwhelming the visualization with too many categories, subcategories, or layers.
  • Use meaningful colormaps that highlight differences in data values while maintaining readability.
  • Experiment with different distance metrics and linkage methods to find the representation best suited to your dataset and analysis goals.
  • Adjust the figure size with figsize to ensure all labels are legible and the structure is clear.
  • Standardize your data using either the z_score or the standard_scale parameter (one or the other, not both) for better interpretation of the relationships between categories.
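
The method and metric options follow the naming used by SciPy's hierarchical clustering routines, so their effect can be explored without drawing anything. The sketch below is a rough illustration on a made-up four-row matrix: it calls SciPy directly and prints the row order each combination produces. It is not a reproduction of what clustermap() does internally.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

# Hypothetical toy matrix: rows 0 and 1 are similar, rows 2 and 3 are similar
rows = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [9.0, 8.0, 7.0],
    [9.2, 7.9, 7.1],
])

# (linkage method, distance metric) pairs to compare;
# 'ward' is only defined for the Euclidean metric
combinations = [
    ("average", "euclidean"),
    ("complete", "euclidean"),
    ("average", "correlation"),
    ("ward", "euclidean"),
]

orders = {}
for method, metric in combinations:
    # linkage() builds the cluster tree; leaves_list() reads off the row order
    tree = linkage(rows, method=method, metric=metric)
    orders[(method, metric)] = leaves_list(tree).tolist()
    print(method, metric, orders[(method, metric)])
```

On well-separated data like this, every combination keeps similar rows adjacent. On real data the orderings can differ noticeably, which is why it is worth trying several.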