Elegant way to match a string to a random color matplotlib
Choose a color map, such as viridis:
cmap = plt.get_cmap('viridis')
The colormap, cmap, is a function which can take an array of values from 0 to 1 and map them to RGBA colors. np.linspace(0, 1, len(names)) produces an array of equally spaced numbers from 0 to 1 of length len(names). Thus,
colors = cmap(np.linspace(0, 1, len(names)))
selects equally spaced colors from the viridis color map.
Note that this does not use the value of the string; it only uses the ordinal position of the string in the list to select a color. Note also that these are not random colors. This is just an easy way to generate unique colors from an arbitrary list of strings.
So:
import numpy as np
import matplotlib.pyplot as plt
cmap = plt.get_cmap('viridis')
names = ["bob", "joe", "andrew", "pete"]
colors = cmap(np.linspace(0, 1, len(names)))
print(colors)
# [[ 0.267004 0.004874 0.329415 1. ]
# [ 0.190631 0.407061 0.556089 1. ]
# [ 0.20803 0.718701 0.472873 1. ]
# [ 0.993248 0.906157 0.143936 1. ]]
x = np.linspace(0, np.pi*2, 100)
for i, (name, color) in enumerate(zip(names, colors), 1):
    plt.plot(x, np.sin(x)/i, label=name, c=color)
plt.legend()
plt.show()
The problem with
clr = {names[i]: colors[i] for i in range(len(names))}
ax.scatter(x, y, z, c=clr)
is that the c parameter of ax.scatter expects a sequence of RGB(A) values of the same length as x, or a single color. clr is a dict, not a sequence. So if colors is the same length as x, then you could use
ax.scatter(x, y, z, c=colors)
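If you would rather keep the dict, one option is to expand it into a per-point list before plotting. Here is a minimal sketch, assuming each point carries a string label; the labels list and the x, y, z data are made up for illustration, and the Agg backend is only there so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: one string label per point
labels = ["bob", "joe", "bob", "pete", "joe"]
x = np.arange(len(labels))
y = x ** 2
z = x + 1

# One color per distinct label, keyed by the label itself
names = sorted(set(labels))
colors = plt.get_cmap("viridis")(np.linspace(0, 1, len(names)))
clr = dict(zip(names, colors))

# Expand the dict into a sequence with one RGBA entry per point,
# which is the shape the c parameter accepts
point_colors = [clr[label] for label in labels]

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(x, y, z, c=point_colors)
```

Points that share a label now share a color, regardless of where they sit in the list.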
I use the hash function to get numbers between 0 and 1. You can use this even when you don't know all the labels in advance:
x = [1, 2, 3, 4, 5]
labels = ["a", "a", "b", "b", "a"]
y = [1, 2, 3, 4, 5]
colors = [float(hash(s) % 256) / 256 for s in labels]
plt.scatter(x, y, c=colors, cmap="jet")
plt.show()
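One caveat with the built-in hash: Python randomizes string hashes per process (unless PYTHONHASHSEED is set), so the same label can get a different color on every run. A possible workaround, sketched here with hashlib, is to derive the number from a digest instead. The helper name stable_unit_value is my own and not part of any library, and two labels can still collide when they land in the same bucket:

```python
import hashlib

def stable_unit_value(label, buckets=256):
    """Map a string to a float in [0, 1) that is identical in every Python process."""
    digest = hashlib.md5(label.encode("utf-8")).digest()
    # Use the first four bytes of the digest as an integer, then bucket it
    return (int.from_bytes(digest[:4], "big") % buckets) / buckets

labels = ["a", "a", "b", "b", "a"]
colors = [stable_unit_value(s) for s in labels]
```

The resulting list can be passed to plt.scatter(x, y, c=colors, cmap="jet") exactly like the hash-based one.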
This upset me so much that I wrote get_cmap_string, which returns a function that works exactly like cmap but also accepts strings.
data = ["bob", "joe", "pete", "andrew", "pete"]
cmap = get_cmap_string(palette='viridis', domain=data)
cmap("joe")
# (0.20803, 0.718701, 0.472873, 1.0)
cmap("joe", alpha=0.5)
# (0.20803, 0.718701, 0.472873, 0.5)
1. Implementation
The basic idea, as mentioned in the other answers, is that we need a hash table: a mapping from each of our strings to a unique integer. In Python this is just a dictionary.
The reason hash(str) won't work is that, even though matplotlib's cmap accepts any integer, two different strings can end up with the same color. Integers outside the lookup table are clipped to its first or last entry. For example, if hash(str1) is 12 and hash(str2) is 18, and we initialize cmap as get_cmap(name=palette, lut=10), then cmap(hash(str1)) will be the same as cmap(hash(str2)).
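To see the collision for yourself, here is a small check. The integers are made-up stand-ins for hash values, not real hashes, and I am relying on matplotlib's default behavior of clipping out-of-range integer indices to the ends of the lookup table:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Colormap resampled to a 10-entry lookup table (valid indices 0..9)
cmap = plt.get_cmap("viridis", lut=10)

# Stand-in "hash values" that fall outside the table
too_big_a, too_big_b = 12, 18
negative_a, negative_b = -3, -7

same_high = cmap(too_big_a) == cmap(too_big_b)   # both clipped to the last entry
same_low = cmap(negative_a) == cmap(negative_b)  # both clipped to the first entry
print(same_high, same_low)
```

Since real string hashes are huge positive or negative integers, nearly every label would end up on one of the two end colors.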
Code
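import numpy as np
import matplotlib.pyplot as plt
def get_cmap_string(palette, domain):
    # Sorted unique labels, each assigned a distinct integer index
    domain_unique = np.unique(domain)
    hash_table = {key: i_str for i_str, key in enumerate(domain_unique)}
    # Colormap resampled to one entry per unique label
    # (matplotlib.cm.get_cmap was removed in matplotlib 3.9)
    mpl_cmap = plt.get_cmap(palette, lut=len(domain_unique))
    def cmap_out(X, **kwargs):
        return mpl_cmap(hash_table[X], **kwargs)
    return cmap_out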
2. Usage
Example as in other answers, but now note that the name pete appears twice.
import matplotlib.pyplot as plt
# data
names = ["bob", "joe", "pete", "andrew", "pete"]
# color map for the data
cmap = get_cmap_string(palette='viridis', domain=names)
# example usage
x = np.linspace(0, np.pi*2, 100)
# start counting at 1 so the first curve is not divided by zero
for i_name, name in enumerate(names, 1):
    plt.plot(x, np.sin(x)/i_name, label=name, c=cmap(name))
plt.legend()
plt.show()
You can see that the entries in the legend are duplicated. Solving this is another challenge, see here. Or use a custom legend instead, as explained here.
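For what it's worth, one common way to collapse the duplicated legend entries is to rebuild the legend from a dict keyed by label. This is only a sketch of that idea, not necessarily what the linked answers do; it builds its own name-to-color dict (color_of is an illustrative name) so that it runs on its own:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

names = ["bob", "joe", "pete", "andrew", "pete"]

# One color per distinct name
unique_names = sorted(set(names))
palette = plt.get_cmap("viridis", lut=len(unique_names))
color_of = {name: palette(i) for i, name in enumerate(unique_names)}

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 100)
for i, name in enumerate(names, 1):
    ax.plot(x, np.sin(x) / i, label=name, c=color_of[name])

# A dict keeps a single handle per label; a later duplicate
# replaces the earlier handle stored under the same label
handles, labels = ax.get_legend_handles_labels()
by_label = dict(zip(labels, handles))
ax.legend(by_label.values(), by_label.keys())
```

Both pete curves keep the same color, so a single legend entry still describes them correctly.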
3. Alternatives
As far as the discussion among the matplotlib devs goes, they recommend using Seaborn. See discussion here and an example usage here.