Drawing a Bee Swarm Plot in R/ Python

1. What is a bee swarm plot?

A bee swarm plot is a categorical scatter plot.

A normal scatter plot runs into trouble when you compare groups of data side by side: too many points land at nearly the same position, so you cannot read anything useful from them. In short, the overlapping (overplotting) is very heavy.

For example, without the bee swarm idea, all points of a category fall on one vertical line and hide each other.

2. Why should I care?

A traditional fix for this problem is the boxplot. A boxplot uses a few summary statistics to draw a "box" rather than showing the full series of scatter points. This way, people can read and compare series of data easily.

However, it still has a disadvantage. A boxplot is an overview of the whole series and omits the information about individual points. What if I care about some key individuals? This is where the bee swarm plot comes to help.

For example, a boxplot shows only each group's median, quartiles, whiskers and outliers, so most individual points are missing.
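
To see what gets lost, here is a minimal sketch in plain Python (standard library only). The helper name five_number_summary and both toy samples are my own illustration and are not taken from the tips data. The two samples contain different points, yet they share the five numbers a boxplot is usually built from, so their boxes would look the same.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a boxplot is usually drawn from."""
    data = sorted(values)
    # method="inclusive" interpolates between the sorted data points
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])

# Two hypothetical samples with different individual points
sample_a = [1, 2, 2, 3, 5, 7, 8, 8, 9]
sample_b = [1, 1, 2, 5, 5, 5, 8, 9, 9]

print(five_number_summary(sample_a))
print(five_number_summary(sample_b))
```

Both calls print the same tuple, (1, 2.0, 5.0, 8.0, 9), although the samples differ. A plot that draws every point would show the difference right away.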

3. How to draw a bee swarm plot in R?

Let's go back to the traditional scatter plot with categories. The main challenge is that too many points share a similar position. So people came up with the idea of adding a random value to each point's horizontal position. As long as a point stays inside its own category, its exact horizontal position carries no meaning anyway.

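# a normal scatter plot, showing the overplotting problem
library(ggplot2)
library(dplyr)                     # provides the %>% pipe
data(tips, package = "reshape2")   # assumption: tips is the dataset shipped with reshape2
tips %>%
  ggplot() +
  geom_point(aes(x = day, y = tip))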

The code above produces the problematic categorical scatter plot described in section 1.

In R you can use ggplot2's geom_point() function just as for a normal scatter plot, but add the position argument.

# try a bee swarm plot
tips %>%
  ggplot() +
  geom_point(aes(x = day, y = tip, color = day), position = "jitter")
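
To make the jitter idea concrete outside of ggplot2, here is a minimal sketch in plain Python. The function name jitter_positions, its width and seed parameters, and the toy list of days are all hypothetical and only for illustration. Each category gets an integer slot on the x axis, and each point is moved by a random offset smaller than 0.5, so it never leaves its own category.

```python
import random

def jitter_positions(categories, width=0.4, seed=0):
    """Return one x coordinate per point: the category's slot plus a random offset."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    # Give each distinct category an integer slot, in order of first appearance
    slots = {}
    for c in categories:
        slots.setdefault(c, len(slots))
    # Shift every point sideways by a uniform random amount in [-width, width]
    return [slots[c] + rng.uniform(-width, width) for c in categories]

# Hypothetical toy input: one entry per point
days = ["Thur", "Thur", "Fri", "Sat", "Sat", "Sat", "Sun"]
xs = jitter_positions(days)
print(xs)
```

The returned x values can be passed to any scatter function together with the untouched y values. Only the horizontal position is randomized, which is safe because it carries no meaning inside a category.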

I know that so far this is not very useful, but don't forget why we came here: the individual points are still there, so every technique that works in a scatter plot also works in a bee swarm plot. This is the kind of information a boxplot cannot easily give you.

For example, there are fewer non-smokers on Friday and more non-smokers on Sunday (could that be related to families with kids?).

# try bee swarm plot with more information
tips %>%
  ggplot() +
  geom_point(aes(x = day, y = tip, color = smoker), position = "jitter")

Of course, if you like the boxplot's summary statistics, you can easily add a boxplot as another layer.

# try bee swarm plot with boxplot
tips %>%
  ggplot() +
  geom_point(aes(x = day, y = tip, color = smoker), position = "jitter") +
  geom_boxplot(aes(x = day, y = tip), alpha = 0.5)

(Interestingly, in the real world bee swarms are often seen with a box as well...)

 

There is also another kind of bee swarm plot that does not simply add a random number to the horizontal position but uses a more detailed method. Where few points share a position, they stay where they are. Where many points share a position, they are piled up one after another, like a stack lying horizontally.
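
The stacking idea can be sketched in a few lines of plain Python. This is a simplified greedy layout of my own, not the algorithm ggbeeswarm actually uses, and the names swarm_offsets, candidate_offsets and diameter are hypothetical. Points are placed one by one, and each point takes the horizontal offset closest to the center where it does not overlap an already placed point.

```python
def candidate_offsets(step):
    """Yield 0, +step, -step, +2*step, -2*step, ... moving away from the center."""
    yield 0.0
    k = 1
    while True:
        yield k * step
        yield -k * step
        k += 1

def swarm_offsets(values, diameter=1.0):
    """Return a horizontal offset per value so that no two points overlap."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    placed = []                      # (x, y) of points already positioned
    offsets = [0.0] * len(values)
    for i in order:
        y = values[i]
        for x in candidate_offsets(diameter):
            # Two points overlap if their centers are closer than one diameter
            if all((px - x) ** 2 + (py - y) ** 2 >= diameter ** 2 for px, py in placed):
                placed.append((x, y))
                offsets[i] = x
                break
    return offsets

# Hypothetical toy values: three identical points and one isolated point
print(swarm_offsets([1.0, 1.0, 1.0, 5.0]))  # [0.0, 1.0, -1.0, 0.0]
```

The three identical values spread out to the center, right and left, while the isolated value stays in the middle. That is why the width of a swarm tells you how many points sit at a given height.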

First you need to install the ggbeeswarm package. There are other packages as well, but the "gg" prefix means it works well with the ggplot2 package.

# install.packages("ggbeeswarm")  # run once if the package is not installed
library(ggbeeswarm)

After that, we can use the geom_beeswarm() function.

# another kind of bee swarm
tips %>%
  ggplot() +
  geom_beeswarm(aes(x = day, y = tip, color = smoker))

In some situations, this kind of bee swarm can save you the boxplot layer, because the shape of the swarm already shows part of what the summary statistics tell you, such as where the points concentrate. Whether that is enough depends on your data and your readers' background.

# bee swarm colored by time, with a boxplot layer on top
tips %>%
  ggplot() +
  geom_beeswarm(aes(x = day, y = tip, color = time)) +
  geom_boxplot(aes(x = day, y = tip), alpha = 0.5)

4. How to draw a bee swarm plot in Python?

I won't repeat the ideas above, since they are all the same. In Python, one easy way is to use the seaborn package. It has a plotting function called catplot() with an argument called kind.

import seaborn as sns

tips = sns.load_dataset("tips")  # seaborn's example dataset, downloaded on first use
sns.catplot(x="day", y="total_bill", hue="sex", kind="swarm", data=tips)

If you are interested in Python/seaborn, you can check its official tutorial here: http://seaborn.pydata.org/tutorial/categorical.html
