Stacked bar chart with 3 variables python

So I'm struggling to solve a basic problem with some of my data and I can't seem to get around it. I have this data table:

Amino Acid  Agaricus bisporus  Aspergillus nidulans  Bipolaris maydis
CYS         0                  0                     0
ASP         0                  0                     0
GLU         0                  0                     0
PHE         0                  0                     0
GLY         0                  0                     0
HIS         0                  0                     0
ILE         0                  0                     0
LYS         10                 7                     16
LEU         0                  0                     0
MET         0                  0                     0
ASN         9                  15                    15
PRO         0                  0                     0
GLN         11                 13                    4
ARG         13                 16                    21
SER         11                 13                    8
THR         9                  11                    9
VAL         0                  0                     0
TRP         8                  7                     6
TYR         9                  6                     7

I can't for the life of me figure out how to convert this into a stacked bar chart that looks like this.

I need the colours to represent the different Amino acids. Honestly I've been awake for 30 hours at this point so any help would be appreciated.

I've tried converting to a long format, but that still creates the same issue as before. When I use the default plot settings, this is what I get:


asked Mar 28 at 22:42


For a stacked barplot via pandas, each of the columns will be converted to a layer of bars. The index of the dataframe will be used as the x-axis.

In the given dataframe, you seem to want the columns on the x-axis. Transposing the dataframe with .T (exchanging rows and columns) will help. First, you'll need to set the amino acids as the index.

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_html('https://stackoverflow.com/questions/71654486/stacked-bar-chart-with-multiple-variables-in-python')[0]

ax = df.set_index('Amino Acid').T.plot.bar(stacked=True, rot=0, cmap='tab20', figsize=(10, 7))
ax.legend(bbox_to_anchor=(1.01, 1.02), loc='upper left')
plt.tight_layout()
plt.show()
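
Because pd.read_html depends on the question page staying reachable and unchanged, the same idea can also be tried offline. The sketch below rebuilds the table by hand, keeping only the non-zero rows from the question (that shortcut, the variable name `wide` and the Agg backend are my own choices), and then applies the same set_index and .T steps.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Non-zero rows copied from the table in the question
df = pd.DataFrame({
    "Amino Acid": ["LYS", "ASN", "GLN", "ARG", "SER", "THR", "TRP", "TYR"],
    "Agaricus bisporus": [10, 9, 11, 13, 11, 9, 8, 9],
    "Aspergillus nidulans": [7, 15, 13, 16, 13, 11, 7, 6],
    "Bipolaris maydis": [16, 15, 4, 21, 8, 9, 6, 7],
})

# After transposing, the species form the index (x-axis)
# and each amino acid is a column (one coloured layer per column)
wide = df.set_index("Amino Acid").T

ax = wide.plot.bar(stacked=True, rot=0, cmap="tab20", figsize=(10, 7))
ax.legend(bbox_to_anchor=(1.01, 1.02), loc="upper left")
plt.tight_layout()
# call plt.show() here when using an interactive backend
```

Dropping the all-zero rows only shortens the legend; they would contribute zero-height segments anyway.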


answered Mar 28 at 23:44


JohanC


An excellent way to visualize proportions and composition

Bar charts are by far my favourite visualization technique. They are very versatile, usually easy to read, and relatively straightforward to build.


Stacked Bar Chart Example — Image by Author

Just like any visualization, they do have some disadvantages as well. For example, they struggle with scalability.

Too many bars in a bar chart make it confusing and hard to read. That happens a lot when we’re working with hierarchical categories, in other words, when we have groups and subgroups that we need to visualize.


Clustered Bar Chart Example — Image by Author

Stacked bars are a great alternative in those cases, allowing us to compare and analyze those groups' composition.


100% Stacked Bar Chart Example — Image by Author

In this article, we’ll explore how to build those visualizations with Python’s Matplotlib.

I’ll be using a simple dataset that holds data on video game copies sold worldwide. The dataset is quite outdated, but it’s suitable for the following examples.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Let’s read it and take a look.

df = pd.read_csv('../data/vgsales.csv')
df.head()


First five rows of data — Image by Author

I want to visualize the total number of copies sold by platform and analyze the regions where they were sold.

Having the regions already separated into columns helps a lot; we only need to group the records by ‘Platform’ and sum the values from NA_Sales to Global_Sales.

Groupby → Sum → Select Fields

# numeric_only=True skips any non-numeric columns when summing
df_grouped = df.groupby('Platform').sum(numeric_only=True)[['NA_Sales','EU_Sales','JP_Sales','Other_Sales', 'Global_Sales']]
df_grouped


Some of the records on the data frame — Image by Author

That is too many values; even if we ignore the empty records, there will be too many bars in our chart.

Let’s plot a bar for each platform and region and take a look at the result.

# define figure
fig, ax = plt.subplots(1, figsize=(16, 6))
# numerical x
x = np.arange(0, len(df_grouped.index))
# plot bars
plt.bar(x - 0.3, df_grouped['NA_Sales'], width = 0.2, color = '#1D2F6F')
plt.bar(x - 0.1, df_grouped['EU_Sales'], width = 0.2, color = '#8390FA')
plt.bar(x + 0.1, df_grouped['JP_Sales'], width = 0.2, color = '#6EAF46')
plt.bar(x + 0.3, df_grouped['Other_Sales'], width = 0.2, color = '#FAC748')
# remove spines
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# x y details
plt.ylabel('Millions of copies')
plt.xticks(x, df_grouped.index)
plt.xlim(-0.5, 31)
# grid lines
ax.set_axisbelow(True)
ax.yaxis.grid(color='gray', linestyle='dashed', alpha=0.2)
# title and legend
plt.title('Video Game Sales By Platform and Region', loc ='left')
plt.legend(['NA', 'EU', 'JP', 'Others'], loc='upper left', ncol = 4)
plt.show()


Clustered Bar Chart — Image by Author

As expected, the chart is hard to read. Let’s try the stacked bar chart and add a few adjustments.

First, we can sort the values before plotting, giving us a better sense of order and making it easier to compare the bars. We’ll do so with the ‘Global_Sales’ column since it has the total.

## sort values
df_grouped = df_grouped.sort_values('Global_Sales')
df_grouped


Some of the records on the data frame — Image by Author

Earlier, to build a clustered bar chart, we used a plot for each region, where the width parameter and small shifts along the x-axis helped us fit the four regions of each platform side by side.
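
The shifts used in the clustered chart (-0.3, -0.1, 0.1, 0.3) can also be derived instead of typed by hand. The helper below is a hypothetical convenience function, not part of Matplotlib; it centres any number of bars of a given width around each tick.

```python
import numpy as np

def cluster_offsets(n_series, width):
    """Return the x offset of each series so the cluster is centred on the tick."""
    # positions 0..n-1, shifted so their mean is zero, then scaled by the bar width
    return (np.arange(n_series) - (n_series - 1) / 2) * width

# four regions, each bar 0.2 wide, as in the clustered chart
offsets = cluster_offsets(4, 0.2)
print(offsets)
```

Each region would then be drawn with plt.bar(x + offsets[i], ...), which keeps the code unchanged if a fifth region is added.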

Similarly, for plotting stacked bar charts, we’ll use a plot for each region. This time we’ll use the bottom parameter (left for horizontal bars) to tell Matplotlib where the bars we’re drawing should start.

plt.bar([1,2,3,4], [10,30,20,5])
plt.bar([1,2,3,4], [3,4,5,6], bottom = [10,30,20,5])
plt.show()
plt.barh([1,2,3,4], [10,30,20,5])
plt.barh([1,2,3,4], [3,4,5,6], left = [10,30,20,5])
plt.show()


Stacked Bar Charts (Vertical/ Horizontal) — Image by Author

Cool. We can use a loop to plot the bars, passing a list of zeros for the ‘bottom’ parameter in the first set and accumulating the following values for the next regions.

fields = ['NA_Sales','EU_Sales','JP_Sales','Other_Sales']
colors = ['#1D2F6F', '#8390FA', '#6EAF46', '#FAC748']
labels = ['NA', 'EU', 'JP', 'Others']
# figure and axis
fig, ax = plt.subplots(1, figsize=(12, 10))
# plot bars
left = len(df_grouped) * [0]
for idx, name in enumerate(fields):
    plt.barh(df_grouped.index, df_grouped[name], left = left, color=colors[idx])
    left = left + df_grouped[name]
# title, legend, labels
plt.title('Video Game Sales By Platform and Region\n', loc='left')
plt.legend(labels, bbox_to_anchor=([0.55, 1, 0, 0]), ncol=4, frameon=False)
plt.xlabel('Millions of copies of all games')
# remove spines
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
# adjust limits and draw grid lines
plt.ylim(-0.5, ax.get_yticks()[-1] + 0.5)
ax.set_axisbelow(True)
ax.xaxis.grid(color='gray', linestyle='dashed')
plt.show()


Stacked Bar Chart — Image by Author

Great, this is way more readable than the last one.
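
To see what the loop accumulates in left, the starting point of every segment can be computed without plotting. This is a small sketch on invented numbers (the three platforms A, B and C and their values are made up for illustration): each segment starts where the previous ones end.

```python
import pandas as pd

# Invented sales figures, only to illustrate the offsets
toy = pd.DataFrame(
    {"NA_Sales": [5.0, 3.0, 1.0], "EU_Sales": [2.0, 4.0, 1.5], "JP_Sales": [1.0, 0.5, 2.0]},
    index=["A", "B", "C"],
)

# The cumulative sum across columns gives where each segment ends;
# subtracting the segment's own length gives where it starts (its `left`)
ends = toy.cumsum(axis=1)
starts = ends - toy

print(starts)
```

The first column always starts at zero, which is why only the first region shares a common baseline across bars.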

It’s important to remember the purpose of this chart before trying to extract any insights. The idea here is to compare the platforms' total sales and understand each platform's composition.

Comparing totals across bars and comparing regions inside one bar is ok. Comparing regions from different bars, on the other hand, can be very misleading.

In this case, we can compare the NA region across the bars since it has the same starting point for every bar, but it isn't so easy to compare the others. Take the X360, for example: it has a lower value for JP than the PS2, but it’s hard to tell whether its Others value is higher or lower than the Wii’s.


Comparable value — Image by Author


Uncomparable value — Image by Author

Suppose we change the stack's order, with Other Sales as the first bar, and sort the records by Other Sales. It should then be easier to tell which platform has the higher Others value.

## sort values
df_grouped = df_grouped.sort_values('Other_Sales')
fields = ['Other_Sales', 'NA_Sales','EU_Sales','JP_Sales']
colors = ['#1D2F6F', '#8390FA', '#6EAF46', '#FAC748']
labels = ['Others', 'NA', 'EU', 'JP']


Stacked Bar Chart, emphasizing the Others category — Image by Author

There are two essential elements in this visualization: the order of the categories in the stack of bars and the order of the rows.

If we want to emphasize one region, we can sort the records with the chosen field and use it as the left-most bar.

If we don’t, we can sort the records by the total and order the stacks with the categories that have higher values first.
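
Both rules can be written as a small helper. The function below is hypothetical (its name and the emphasize parameter are mine, and the data is invented): given a field to emphasize, it sorts the rows by that field and puts it first in the stack; otherwise it sorts the rows by their total and orders the fields from largest to smallest overall.

```python
import pandas as pd

def order_for_stack(df, emphasize=None):
    """Return (sorted_df, field_order) for a horizontal stacked bar chart."""
    fields = df.columns.tolist()
    if emphasize is not None:
        # the emphasized field becomes the left-most segment and drives the row order
        field_order = [emphasize] + [f for f in fields if f != emphasize]
        sorted_df = df.sort_values(emphasize)
    else:
        # larger categories come first in the stack, rows are ordered by their total
        field_order = df.sum().sort_values(ascending=False).index.tolist()
        sorted_df = df.loc[df.sum(axis=1).sort_values().index]
    return sorted_df[field_order], field_order

# Invented figures for three platforms
toy = pd.DataFrame(
    {"NA_Sales": [5.0, 3.0, 1.0], "EU_Sales": [2.0, 4.0, 1.5], "Other_Sales": [0.5, 0.2, 0.9]},
    index=["A", "B", "C"],
)
by_total, order_total = order_for_stack(toy)
by_other, order_other = order_for_stack(toy, emphasize="Other_Sales")
```

The returned field order can be passed straight to the plotting loop as fields, with the colours and labels reordered to match.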

Stacked bar charts are excellent for comparing categories and visualizing their composition, but we can take even more advantage of them.

We can focus on displaying parts of a whole. To achieve that we’ll have to prepare our data and calculate the proportion of sales for each region.

fields = ['Other_Sales', 'NA_Sales','EU_Sales','JP_Sales']
# numeric_only=True skips any non-numeric columns when summing
df_grouped = df.groupby('Platform').sum(numeric_only=True)
# in some cases global sales is not equal to the sum of all regions
# so I'll recalculate it
df_grouped['Global_Sales'] = df_grouped[fields].sum(axis=1)
# create a column for each region's proportion of global sales
for i in fields:
    df_grouped['{}_Percent'.format(i)] = df_grouped[i] / df_grouped['Global_Sales']
df_grouped.sort_values('NA_Sales_Percent', inplace=True)
df_grouped


Some of the records on the data frame — Image by Author
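
The same proportions can also be computed without a loop by dividing every row by its own total. This is a minimal sketch on invented numbers, assuming the data frame holds only the regional columns:

```python
import pandas as pd

# Invented figures for two platforms
toy = pd.DataFrame(
    {"NA_Sales": [6.0, 2.0], "EU_Sales": [3.0, 2.0], "JP_Sales": [1.0, 4.0]},
    index=["A", "B"],
)

# divide each row by its row total; axis=0 aligns the totals with the index
shares = toy.div(toy.sum(axis=1), axis=0)

print(shares)
```

Every row of the result sums to 1, which is what makes all bars in a 100% stacked chart end at the same point.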

Now we can pretty much repeat what we did earlier with some small tweaks. Let’s also wrap our code in a function so we can reuse it.

# variables
labels = ['NA', 'EU', 'JP', 'Others']
colors = ['#1D2F6F', '#8390FA', '#6EAF46', '#FAC748']
title = 'Video Game Sales By Platform and Region\n'
subtitle = 'Proportion of Games Sold by Region'
def plot_stackedbar_p(df, labels, colors, title, subtitle):
    fields = df.columns.tolist()

    # figure and axis
    fig, ax = plt.subplots(1, figsize=(12, 10))

    # plot bars
    left = len(df) * [0]
    for idx, name in enumerate(fields):
        plt.barh(df.index, df[name], left = left, color=colors[idx])
        left = left + df[name]
    # title and subtitle
    plt.title(title, loc='left')
    plt.text(0, ax.get_yticks()[-1] + 0.75, subtitle)
    # legend
    plt.legend(labels, bbox_to_anchor=([0.58, 1, 0, 0]), ncol=4, frameon=False)
    # remove spines
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)
    ax.spines['top'].set_visible(False)
    ax.spines['bottom'].set_visible(False)
    # format x ticks
    xticks = np.arange(0,1.1,0.1)
    xlabels = ['{}%'.format(i) for i in np.arange(0,101,10)]
    plt.xticks(xticks, xlabels)
    # adjust limits and draw grid lines
    plt.ylim(-0.5, ax.get_yticks()[-1] + 0.5)
    ax.xaxis.grid(color='gray', linestyle='dashed')
    plt.show()

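# assumption: df_filter holds the four proportion columns, in the same order as labels
df_filter = df_grouped[['NA_Sales_Percent', 'EU_Sales_Percent', 'JP_Sales_Percent', 'Other_Sales_Percent']]
plot_stackedbar_p(df_filter, labels, colors, title, subtitle)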


100% Stacked Bar Chart — Image by Author

That’s a great way to visualize the proportion of sales for each region. It’s also easier to compare the Others category since all the bars end at the same point.

In my opinion, visualizing proportion with 100% stacked bar charts looks even better when we have only two categories. Since we have a fixed start for the first and a fixed end for the other, it’s effortless to visualize the differences and compare the values.

Lastly, let’s try building a stacked bar chart with both positive and negative values.

We’ll create a dummy data frame for this example.

# lists
sales_revenue = [1230, 1240, 1170, 1050, 1380, 1480, 1400, 1410, 1360, 1415, 1530]
interest_revenue = [150, 155, 159, 176, 290, 240, 195, 146, 180, 182, 210]
fixed_costs = [-810, -810, -815, -815, -780, -780, -750, -750, -750, -770, -910]
variable_costs =[-380, -410, -415, -370, -520, -655, -715, -670, -515, -510, -420]
# lists to dict
my_dict = {'sales_revenue': sales_revenue, 'interest_revenue': interest_revenue,
'fixed_costs': fixed_costs, 'variable_costs': variable_costs}
# dict to df
result_df = pd.DataFrame(my_dict)
result_df


Dummy Data Frame — Image by Author

The plan is to have a positive bar, divided into sales and interest revenue, and a negative bar, divided into fixed and variable costs, for each month.

We want interest_revenue on top, so we use sales_revenue as the ‘bottom’ argument when plotting.

For the negative values, we want fixed_costs at the top, closest to zero, so we plot variable_costs with fixed_costs as the ‘bottom’ argument. The variable costs then extend further down from where the fixed costs end.

plt.bar(result_df.index, result_df['interest_revenue'], bottom = result_df['sales_revenue'], color = '#5E96E9', width =0.5)
plt.bar(result_df.index, result_df['variable_costs'], bottom = result_df['fixed_costs'], color = '#E17979', width =0.5)


Upper and Lower ends of the stacked bar chart — Image by Author

Again, the rest is very similar to what we already did.

fig, ax = plt.subplots(1, figsize=(16, 8))
plt.bar(result_df.index, result_df['sales_revenue'], color = '#337AE3', width =0.5)
plt.bar(result_df.index, result_df['interest_revenue'], bottom = result_df['sales_revenue'], color = '#5E96E9', width =0.5)
plt.bar(result_df.index, result_df['fixed_costs'], color = '#DB4444', width =0.5)
plt.bar(result_df.index, result_df['variable_costs'], bottom = result_df['fixed_costs'], color = '#E17979', width =0.5)
# x and y limits
plt.xlim(-0.6, 10.5)
plt.ylim(-1600, 2000)
# remove spines
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
#grid
ax.set_axisbelow(True)
ax.yaxis.grid(color='gray', linestyle='dashed', alpha=0.7)
# x ticks
xticks_labels = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov']
plt.xticks(result_df.index , labels = xticks_labels)
# title and legend
# labels follow the plotting order: sales, interest, fixed costs, variable costs
legend_label = ['Sales Revenue', 'Interest Revenue', 'Fixed Costs', 'Variable Costs']
plt.legend(legend_label, ncol = 4, bbox_to_anchor=([1, 1.05, 0, 0]), frameon = False)
plt.title('My Company - 2020\n', loc='left')
plt.show()


Stacked bar chart with positive and negative values — Image by Author
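
The manual pairing of ‘bottom’ arguments generalizes: keep one running offset for positive values and another for negative ones, and start each value at the offset that matches its sign. The function below is a sketch of that idea, not something Matplotlib provides; its name is made up, and the sample values are the first two months of the dummy data.

```python
import numpy as np

def signed_bottoms(columns):
    """For a list of equal-length sequences, return the `bottom` array for each one."""
    pos = np.zeros(len(columns[0]))  # running top of the positive stack
    neg = np.zeros(len(columns[0]))  # running bottom of the negative stack
    bottoms = []
    for values in columns:
        values = np.asarray(values, dtype=float)
        # positive values start at the positive offset, negative ones at the negative offset
        bottoms.append(np.where(values >= 0, pos, neg))
        pos = pos + np.maximum(values, 0)
        neg = neg + np.minimum(values, 0)
    return bottoms

sales = [1230, 1240]
interest = [150, 155]
fixed = [-810, -810]
variable = [-380, -410]
bottoms = signed_bottoms([sales, interest, fixed, variable])
```

Each column would then be drawn with plt.bar(x, values, bottom=bottoms[i]), so adding a third revenue or cost series needs no extra bookkeeping.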

That’s it. We stacked many bars and tried different applications for this convenient visualization technique.

Thanks for reading my article!