Stacked bar chart with 3 variables python
So I'm struggling to solve a basic problem with some of the data I have and I can't seem to get round it. I have this data table
I can't for the life of me figure out how to convert this into a stacked bar chart that looks like this. I need the colours to represent the different amino acids. Honestly, I've been awake for 30 hours at this point, so any help would be appreciated. I've tried converting to a long format; however, that still creates the same issue as before. When I use the default plot setting, this is what I get.
asked Mar 28 at 22:42
For a stacked barplot via pandas, each of the columns will be converted to a layer of bars. The index of the dataframe will be used as the x-axis. In the given dataframe, you seem to want the columns for the x-axis. Using
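As a rough sketch of the description above (not necessarily the answerer's exact code), transposing with `.T` puts the columns on the x-axis and turns each amino acid into a coloured layer. The table below is made up, since the question's data is not reproduced here; the sample and amino acid names are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the question's table: rows are amino acids,
# columns are samples. The names and numbers are made up for illustration.
df = pd.DataFrame(
    {"Sample1": [10, 5, 3], "Sample2": [7, 8, 2], "Sample3": [4, 6, 9]},
    index=["Ala", "Gly", "Ser"],
)

# pandas draws one bar per index entry and one coloured layer per column,
# so transpose to put samples on the x-axis and amino acids in the stack.
ax = df.T.plot(kind="bar", stacked=True, figsize=(8, 5))
ax.set_xlabel("Sample")
ax.set_ylabel("Count")
ax.legend(title="Amino acid")
plt.tight_layout()
```

Calling `plt.show()` afterwards displays the figure in an interactive session.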
answered Mar 28 at 23:44
JohanC

Bar charts are by far my favourite visualization technique. They are versatile, usually easy to read, and relatively straightforward to build.

Just like any visualization, they have some disadvantages as well. For example, they struggle with scalability: too many bars make a chart confusing and hard to read. That happens often when we’re working with hierarchical categories, in other words, when we have groups and subgroups that we need to visualize.

[Figure: Clustered Bar Chart Example. Image by Author]

Stacked bars are a great alternative in those cases, allowing us to compare and analyze the composition of those groups.

[Figure: 100% Stacked Bar Chart Example. Image by Author]

In this article, we’ll explore how to build those visualizations with Python’s Matplotlib. I’ll be using a simple dataset that holds data on video game copies sold worldwide. The dataset is quite outdated, but it’s suitable for the following examples.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

Let’s read it and take a look.

    df = pd.read_csv('../data/vgsales.csv')
[Figure: First five rows of data. Image by Author]

I want to visualize the total number of copies sold by platform and analyze the regions where they were sold. Having the regions already separated into columns helps a lot; we only need to group the records by ‘Platform’ and sum the values from NA_Sales to Global_Sales.

Groupby → Sum → Select Fields

    # numeric_only=True skips non-numeric columns when summing
    df_grouped = df.groupby('Platform').sum(numeric_only=True)[['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales']]
    df_grouped

[Figure: Some of the records on the data frame. Image by Author]

That is too many values; even considering the empty records, there will be too many bars in our chart. Let’s plot a bar for each platform and region and take a look at the result.

    # define figure

[Figure: Clustered Bar Chart. Image by Author]

As expected, the chart is hard to read. Let’s try the stacked bar chart and add a few adjustments.

First, we can sort the values before plotting, giving us a better sense of order and making it easier to compare the bars. We’ll do so with the ‘Global_Sales’ column since it has the total.

    ## sort values

[Figure: Some of the records on the data frame. Image by Author]

Earlier, to build a clustered bar chart, we used a plot for each region, where the width parameter and adjustments to the x-axis helped us fit the four regions of each platform. Similarly, for plotting stacked bar charts, we’ll use a plot for each region. This time we’ll use the bottom/left parameter to tell Matplotlib what comes before the bars we’re drawing.

    plt.bar([1,2,3,4], [10,30,20,5])

[Figure: Stacked Bar Charts (Vertical/Horizontal). Image by Author]

Cool. We can use a loop to plot the bars, passing a list of zeros for the ‘bottom’ parameter in the first set and accumulating the following values for the next regions.

    fields = ['NA_Sales','EU_Sales','JP_Sales','Other_Sales']

[Figure: Stacked Bar Chart. Image by Author]

Great, this is way more readable than the last one. It’s important to remember the purpose of this chart before trying to extract any insights.
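The loop described above can be sketched as follows. This is a minimal illustration, not the article's original code: the `demo` frame and its numbers are made up to stand in for `df_grouped`, and horizontal bars with the `left` parameter are an assumption (vertical bars would use `ax.bar` with `bottom` in the same way).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up numbers standing in for df_grouped; not taken from vgsales.csv.
demo = pd.DataFrame(
    {"NA_Sales": [5.0, 3.0, 4.0], "EU_Sales": [2.0, 2.5, 1.0],
     "JP_Sales": [1.0, 0.5, 2.0], "Other_Sales": [0.5, 1.0, 0.5]},
    index=["A", "B", "C"],
)
fields = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Sort rows by their total so the bars appear in order of size.
demo = demo.loc[demo[fields].sum(axis=1).sort_values().index]

fig, ax = plt.subplots(figsize=(8, 4))
left = np.zeros(len(demo))          # nothing comes before the first segment
for field in fields:
    ax.barh(demo.index, demo[field], left=left, label=field)
    left += demo[field].to_numpy()  # next segment starts where this one ends
ax.legend()
```

After the loop, `left` holds the end position of each bar, which equals the row total.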
The idea here is to compare the platforms' total sales and understand each platform's composition. Comparing totals across platforms and comparing regions inside one bar is fine. Comparing regions from different bars, on the other hand, can be very misleading.

In this case, we can compare the NA region across the bars since it has the same starting point for every bar, but it isn't so easy to compare the others. Take the X360, for example: it has a lower value for JP than the PS2, but it’s hard to tell whether its Others value is higher or lower than the Wii’s.

[Figure: Comparable value. Image by Author]

[Figure: Uncomparable value. Image by Author]

Suppose we change the stack's order, with Other Sales as the first bar, and sort the records by Other Sales. It should then be easier to tell which is larger.

    ## sort values

[Figure: Stacked Bar Chart, emphasizing the Others category. Image by Author]

There are two essential elements in this visualization: the order of the categories in the stack of bars and the order of the rows. If we want to emphasize one region, we can sort the records by the chosen field and use it as the left-most bar. If we don’t, we can sort the records by the total and order the stacks so the categories with higher values come first.

Stacked bar charts are excellent for comparing categories and visualizing their composition, but we can take even more advantage of them. We can focus on displaying parts of a whole. To achieve that, we’ll have to prepare our data and calculate the proportion of sales for each region.

    fields = ['Other_Sales', 'NA_Sales','EU_Sales','JP_Sales']

[Figure: Some of the records on the data frame. Image by Author]

Now we can pretty much repeat what we did earlier with some small tweaks. Let’s also move our code into a function so we can reuse it.

    # variables

[Figure: 100% Stacked Bar Chart. Image by Author]

That’s a great way to visualize the proportion of sales for each region. It’s also easier to compare the Others category since all the bars end at the same point.
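The proportion step and the reusable plotting function described above might look roughly like this. It is a sketch under assumptions: the helper names `to_percent` and `plot_stacked_barh` are hypothetical, and the `demo` values are made up rather than taken from the dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def to_percent(df, fields):
    """Return each field as a percentage of the row total over `fields`."""
    totals = df[fields].sum(axis=1)
    return df[fields].div(totals, axis=0) * 100

def plot_stacked_barh(df, fields, ax=None):
    """Draw one horizontal stacked bar per row; return the axes and bar ends."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 4))
    left = np.zeros(len(df))
    for field in fields:
        ax.barh(df.index, df[field], left=left, label=field)
        left = left + df[field].to_numpy()
    ax.legend()
    return ax, left

# Made-up values for illustration only.
fields = ["Other_Sales", "NA_Sales", "EU_Sales", "JP_Sales"]
demo = pd.DataFrame(
    {"Other_Sales": [1.0, 0.5], "NA_Sales": [5.0, 2.5],
     "EU_Sales": [3.0, 1.5], "JP_Sales": [1.0, 0.5]},
    index=["A", "B"],
)
demo_pct = to_percent(demo, fields)
ax, ends = plot_stacked_barh(demo_pct, fields)
ax.set_xlim(0, 100)
```

Because every row is divided by its own total, each bar ends at 100, which is what makes the last category comparable across bars.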
In my opinion, visualizing proportions with 100% stacked bar charts looks even better when we have only two categories. Since we have a fixed start for the first and a fixed end for the other, it’s effortless to visualize the differences and compare the values.

Lastly, let’s try building a stacked bar chart with both positive and negative values. We’ll create a dummy data frame for this example.

    # lists

[Figure: Dummy Data Frame. Image by Author]

The plan is to have, for each month, a positive bar divided into sales and interest revenue, and a negative bar divided into fixed and variable costs.

We want interest_revenue on top, so we use sales_revenue as the ‘bottom’ argument when plotting it. For the negative values, we want fixed_costs at the top, closest to zero, so we plot variable_costs with fixed_costs as the ‘bottom’ argument, which extends the bar further down.

    plt.bar(result_df.index, result_df['interest_revenue'], bottom=result_df['sales_revenue'], color='#5E96E9', width=0.5)
    plt.bar(result_df.index, result_df['variable_costs'], bottom=result_df['fixed_costs'], color='#E17979', width=0.5)

[Figure: Upper and Lower ends of the stacked bar chart. Image by Author]

Again, the rest is very similar to what we already did.

    fig, ax = plt.subplots(1, figsize=(16, 8))
    # positive stack: sales revenue first, interest revenue on top of it
    plt.bar(result_df.index, result_df['sales_revenue'], color='#337AE3', width=0.5)
    plt.bar(result_df.index, result_df['interest_revenue'], bottom=result_df['sales_revenue'], color='#5E96E9', width=0.5)
    # negative stack: fixed costs first, variable costs below them
    plt.bar(result_df.index, result_df['fixed_costs'], color='#DB4444', width=0.5)
    plt.bar(result_df.index, result_df['variable_costs'], bottom=result_df['fixed_costs'], color='#E17979', width=0.5)
    # x and y limits

[Figure: Stacked bar chart with positive and negative values. Image by Author]

That’s it. We stacked many bars and tried different applications for this convenient visualization technique.

Thanks for reading my article!