Data visualization is critical to sharing your science, but if you use a color scheme that isn't color-blind friendly, 1 in 12 men and 1 in 200 women won't be able to see your data!

A thread from a color-blind data scientist on making #dataviz more color-blind #accessible:

1/
There's also a great thread by @LoanReads showing "what you see/what I see" based on some photos from nature.

https://twitter.com/LoanReads/status/1313974210151305218

3/
These are all great, but I want to give concrete examples using different data modalities common in biomedical science.

Disclaimer: this is based on my own perspectives and opinions. I am red-green color blind, but what I show may not apply to blue-yellow color blindness.

4/
First up: heatmaps! These are used for large and continuous data, such as sequencing data. To give concrete examples, I've clustered the wine data set.

Most color blind folks can't differentiate red from green, but a red-green map is common. What are some alternatives?

5/
One color blind friendly option is to use red-blue. A middle color of white instead of black helps maintain high contrast, which is super helpful!

This "diverging" color map is useful when the middle value is important, e.g. zero is the mean, blue is bigger, red is smaller.
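If you work in matplotlib, here is a minimal sketch of a red-blue diverging heatmap. The data are random numbers standing in for a z-scored matrix (not the wine data set), and the variable names are my own for illustration. The one detail that matters is making the color limits symmetric so the middle value lands on the white midpoint.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up data standing in for a z-scored matrix (an assumption for this sketch).
rng = np.random.default_rng(0)
data = rng.normal(size=(10, 8))

# Symmetric limits put zero exactly at the near-white center of the color map.
limit = np.abs(data).max()

fig, ax = plt.subplots()
# "RdBu" runs red (low) -> white (middle) -> blue (high).
image = ax.imshow(data, cmap="RdBu", vmin=-limit, vmax=limit, aspect="auto")
fig.colorbar(image, ax=ax, label="z-score")
# Call fig.savefig("heatmap.png") here to write the figure to disk.
```

Without the symmetric `vmin`/`vmax`, matplotlib scales the colors to the data range, and white no longer corresponds to zero.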

6/
If you don't care about the middle value and only need to show high vs. low, use a "sequential" color map. I like the Reds color map (white to red), but really anything will work here so long as one color is white. Maps like purple-red and blue-purple are a bit trickier.

7/
"Perceptually uniform" color maps are also great because a step in the data has a constant change in color. In other words, an increase from 5 to 10 and 20 to 25 will have the same relative increase in color.

However, not every perceptually uniform map is made equal.

8/
Here are four perceptually uniform color maps: viridis, plasma, inferno, and magma. I think viridis is the best.

Notice the two areas I boxed on the heatmaps. To me, viridis has the most contrast while other maps look more uniform.
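One rough way to put numbers on these four maps is to sample each one and compute the lightness of every sample. This sketch uses CIELAB L* computed from sRGB, which is a simpler stand-in for the color space the maps were designed in, so treat it as an approximation. The helper name `srgb_to_lightness` is my own.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def srgb_to_lightness(rgb):
    """Approximate CIELAB L* (0 to 100) for sRGB values in the range 0 to 1."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB gamma curve to get linear light.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Relative luminance Y for the sRGB primaries (D65 white point).
    y = linear @ np.array([0.2126, 0.7152, 0.0722])
    # CIE lightness formula, with its linear segment for very dark colors.
    return np.where(y > 216 / 24389, 116 * np.cbrt(y) - 16, (24389 / 27) * y)

lightness = {}
for name in ["viridis", "plasma", "inferno", "magma"]:
    samples = plt.get_cmap(name)(np.linspace(0, 1, 16))[:, :3]  # drop alpha
    lightness[name] = srgb_to_lightness(samples)
    steps = np.diff(lightness[name])
    print(f"{name}: L* from {lightness[name][0]:.1f} to {lightness[name][-1]:.1f}, "
          f"step size between {steps.min():.1f} and {steps.max():.1f}")
```

Lightness is only part of the picture. It says nothing about which hues a color-blind viewer can tell apart, so it doesn't replace looking at the figure yourself.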

9/
Next data modality: DNA sequence! When we visualize raw DNA sequence, each nucleotide (letter) gets a different color.

We usually visualize DNA sequence in two ways: position weight matrices (PWMs) or heatmaps where each row is a different sequence.

10/
The classic color scheme is green=A, blue=C, orange=G, red=T.

A better color scheme uses hex codes 0x009980=A, 0x59B3E6=C, 0xE69B04=G, 0x1A1A1A=T.

It's not so bad when we use PWMs, such as this one for the transcription factor CRX, because we can see the letters...

11/
...but things get more complicated when we visualize DNA as a heatmap. Here are 100 different sequences.

In the classic color scheme, I can differentiate orange from green, but I don't know which one is which.

With the other color scheme, I can tell the difference.
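Here is a minimal sketch of turning sequences into a color grid with the second color scheme. The hex codes are the ones listed above; the function names are hypothetical, and the sketch assumes every sequence has the same length and contains only A, C, G, and T.

```python
# Hex codes from the thread, keyed by nucleotide.
NUCLEOTIDE_COLORS = {
    "A": "009980",
    "C": "59B3E6",
    "G": "E69B04",
    "T": "1A1A1A",
}

def hex_to_rgb(hex_code):
    """Convert 'RRGGBB' (with or without a '#' or '0x' prefix) to 0-255 integers."""
    digits = hex_code[-6:]  # keep only the six hex digits
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

def sequences_to_image(sequences):
    """Return one row per sequence and one (R, G, B) tuple per position."""
    return [
        [hex_to_rgb(NUCLEOTIDE_COLORS[base]) for base in sequence.upper()]
        for sequence in sequences
    ]

image = sequences_to_image(["ACGT", "ttga"])
```

The nested list can be passed to an image-plotting function such as matplotlib's `imshow`. Any other character, such as N, raises a `KeyError`, so add an entry for it if your data need one.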

12/
Last data modality: categorical data. Oftentimes, we want to put two categories on the same plot.

I'll use the iris dataset and show scatter plots of the sepal width vs. sepal length. The setosas will be one color and the versicolors and virginicas will be a second color.

13/
We often think of red as a bright, attention-drawing color, but some color blind folks have a hard time seeing red against black. This is also true for text -- red text in a presentation just looks black to me.

Instead of red vs. black, use blue vs. black.

14/
I already told you red vs. green is bad, but it's super common with microscopy data because red and green are good colors for fluorescent tags.

I suggest replacing red with magenta and using a lighter shade of green. I don't have microscopy data, but here's a scatterplot.
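For two-channel images, one common way to get magenta instead of red is to copy the first channel into both the red and blue slots of the RGB image. This is a sketch under my own assumptions: the function name is made up, both channels are assumed to be scaled from 0 to 1, and it does not lighten the green.

```python
import numpy as np

def merge_magenta_green(channel_a, channel_b):
    """Combine two grayscale channels into an RGB image: a is magenta, b is green."""
    a = np.clip(np.asarray(channel_a, dtype=float), 0.0, 1.0)
    b = np.clip(np.asarray(channel_b, dtype=float), 0.0, 1.0)
    # Red and blue both carry channel a (red + blue = magenta); green carries channel b.
    return np.stack([a, b, a], axis=-1)

# Tiny made-up example: one pixel bright only in a, one bright only in b.
merged = merge_magenta_green([[1.0, 0.0]], [[0.0, 1.0]])
```

Pixels where both channels are bright come out white, so overlap stays visible too.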

15/
Another one I see sometimes is blue vs. purple. This one is hit or miss. Sometimes I can see two colors, but even then, I can't tell you which is which.

An alternative approach is to use a dark blue like 0x1F78B4 and light blue like 0xA6CEE3.

16/
Green vs. orange has the same problem as blue vs. purple. Instead, use a dark green like 0x33A02C against a light green like 0xB2DF8A.

In general, using two shades of the same color is a safer bet than using two different colors.
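A minimal matplotlib sketch of a two-category scatter plot using the dark blue and light blue hex codes from above. The points are random numbers I made up for illustration (not the iris data), and the group names are placeholders.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

DARK_BLUE = "#1F78B4"
LIGHT_BLUE = "#A6CEE3"

# Two made-up clusters of (x, y) points, an assumption for this sketch.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=[5.0, 3.4], scale=0.3, size=(50, 2))
group_b = rng.normal(loc=[6.3, 2.9], scale=0.4, size=(100, 2))

fig, ax = plt.subplots()
points_a = ax.scatter(group_a[:, 0], group_a[:, 1], color=DARK_BLUE, label="group A")
points_b = ax.scatter(group_b[:, 0], group_b[:, 1], color=LIGHT_BLUE, label="group B")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```

Swapping in the dark green and light green codes works the same way.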

17/
That's all I’ll cover for now, but there are lots of great resources out there!

For comparisons of several different color maps, check out this tutorial on the matplotlib website: https://matplotlib.org/tutorials/colors/colormaps.html

18/
For other color maps, go to Color Brewer 2. I especially like this map because I can actually distinguish all six colors, even though it includes red and green: https://colorbrewer2.org/#type=qualitative&scheme=Paired&n=6

Viz Palette will help you visualize color schemes on different plots https://projects.susielu.com/viz-palette 

19/
If you’re an R user, the khroma package has multiple color-blind friendly maps: https://github.com/nfrerebeau/khroma

Tools like ColorOracle can help you color-blind proof your figures (or ask a color-blind friend): https://colororacle.org/ 

20/
If you want to see how I made all these figures, you can find the Jupyter notebook on my GitHub: https://doi.org/10.5281/zenodo.4263384

Finally, thanks to @stpierreceline and @tripleEkotnik for making sure the color maps here look good!

#DataVisualization #DataScience #Accessibility

21/21
You can follow @rfriedman22.