# N-color

Here I will argue that many of the errors I see in ground-truth datasets can be most kindly attributed to a lack of good label visualization. To illustrate, I will use the following cell microcolony.

## The insufficiency of cell outlines

```python
import matplotlib.pyplot as plt
plt.style.use('dark_background')
import matplotlib as mpl
%matplotlib inline
mpl.rcParams['figure.dpi'] = 600
import numpy as np
import omnipose
from omnipose.utils import rescale, crop_bbox
from omnipose.plot import imshow
import fastremap

from pathlib import Path
import os
from cellpose_omni import io, plot
omnidir = Path(omnipose.__file__).parent.parent
basedir = os.path.join(omnidir,'docs','test_files') # first run the mono_channel_bact notebook to generate masks
masks = io.imread(os.path.join(basedir,'masks','ec_5I_t141xy5c1_cp_masks.tif'))
img = io.imread(os.path.join(basedir,'ec_5I_t141xy5c1.tif'))
imshow(plot.outline_view(img,masks),3,interpolation='None')
```
_images/700ad413c6cf9f0c1a9e62d96ab75f1f5c9a8d7c13b54d919a2d31b20f6eb633.png

This outline view clearly distinguishes cells from each other, and it requires just one color (one channel). As ground truth, binary maps like this are among the easiest annotations to generate and are therefore quite common in public datasets (see MiSiC, DeLTA, and SuperSegger, to name just a few in the realm of bacterial microscopy).

Despite the ease of drawing reasonable cell outlines, it is exceptionally difficult to guarantee that these monochromatic boundaries between cells are precisely 2 pixels thick. Without this property, the resulting label matrix will either exclude boundary pixels or asymmetrically incorporate them into one of the two cells. This is a primary reason why label matrices, not boundary maps, should be used to train and evaluate any segmentation algorithm (labels can fail in self-contact scenarios, but Omnipose now accepts affinity graphs or linked label matrices just for those cases).
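A toy example makes the failure mode concrete. The array below is a hypothetical, minimal stand-in for a real boundary map: two small "cells" separated by a boundary that is only 1 pixel thick, so the shared outline pixels cannot be assigned to either cell when the map is converted back to a label matrix:

```python
import numpy as np
from scipy import ndimage

# Two 2x4-pixel "cells" stacked vertically, separated by a 1-pixel boundary row.
boundary = np.zeros((5, 4), dtype=bool)
boundary[2] = True  # the shared outline between the two cells

# Recover a label matrix by labeling the non-boundary pixels (4-connectivity).
labels, n = ndimage.label(~boundary)

print(n)                   # 2 cells recovered...
print((labels > 0).sum())  # ...but only 16 of 20 pixels are labeled:
                           # the boundary row belongs to neither cell
```

With a 2-pixel boundary, each cell would own its half of the outline; with 1 pixel, the boundary is simply dropped (or, with other conversion schemes, assigned arbitrarily to one side).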

## Not enough colors to go around

However, creating and editing label matrices has its own set of issues. If you have too many cells in an image, you quickly run out of distinct colors to distinguish adjacent cells:

```python
bbx = crop_bbox(masks) # in omni
slc = bbx[0]
m,_ = fastremap.renumber(masks[slc]) # make sure masks go from 0 to N
print('number of masks: ', np.max(m))

cmap = mpl.colormaps.get_cmap('viridis')
pic1 = cmap(rescale(m))
pic1[:,:,-1] = m>0 # alpha
imshow(pic1,3,interpolation='None')
```
number of masks:  161
_images/4a066e381504ceec677f2be2f4cb26454be0e796ac5f7c1f14018b4658a2f41d.png

This perceptually uniform color map is our best bet for distinguishing cells from each other, but some nearby cells are too similar in color to tell apart. The standard technique is to randomly shuffle the labels:

```python
import fastremap
keys = fastremap.unique(m)
vals = keys.copy()
np.random.seed(42)
np.random.shuffle(keys)
d = dict(zip(keys,vals))
m_shuffle = fastremap.remap(m,d)
pic2 = cmap(rescale(m_shuffle))
pic2[:,:,-1] = m>0 # alpha
imshow(pic2,3,interpolation='None')
```
_images/cc2efb22c406137d80e522fdd1993d617ccc119b27e4953bd9e6069850a4f561.png

This doesn't fix the problem. You might think that adding more colors would help...

```python
from omnipose.utils import sinebow
from matplotlib.colors import ListedColormap

cmap = ListedColormap([color for color in list(sinebow(m.max()).values())[1:]])
pic3 = cmap(m_shuffle)
pic3[:,:,-1] = m>0 # alpha
imshow(pic3,3,interpolation='None')
```
_images/e2c98c9770f62b8405ae0c03c357d539485379eec35d91d21554245bd7834942.png

... but since random shuffling does not guarantee that spatially adjacent cells receive numerically distant labels, neighbors that were hard to tell apart under a perceptually uniform color map like viridis are often even harder to tell apart under any kind of unicorn-vomit color map.

Worse still, similar colors make it easy to edit the wrong cell by accident (e.g., painting with color 11 inside cell 12 when both are shades of yellow), ruining the segmentation with an error that is imperceptible to the human eye (this may account for many of the "errant pixels" we observe across ground-truth datasets of dense cells).

## 4-color in theory, N-color in practice

To solve this problem, I developed the ncolor package, which converts \(K\)-integer label matrices to \(N \ll K\)-color labels. The four color theorem guarantees that 4 unique labels suffice to distinguish all touching cells, but my algorithm opts for 5 if a 4-color solution is not found quickly. This was integral in developing the BPCIS dataset, and I subsequently incorporated it into Cellpose and Omnipose. By default, the GUI and plot commands display N-color masks for easier visualization and editing:

```python
import ncolor
cmap = mpl.colormaps.get_cmap('viridis')
pic4 = cmap(rescale(ncolor.label(m)))
pic4[:,:,-1] = m>0 # alpha
imshow(pic4,3,interpolation='None')
```
_images/137768f2c1f2570e971beb64939d0d94473786e7d4fdc3723d380f59a82c7d3a.png
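Conceptually, this is a graph-coloring problem: build the adjacency graph of touching labels, then give each label the smallest color not used by any of its neighbors. The following is a minimal greedy sketch of that idea in plain numpy (a simplified illustration, not the actual ncolor algorithm, which works harder to keep \(N\) small):

```python
import numpy as np

def greedy_ncolor(lab):
    # Collect pairs of distinct nonzero labels that touch (4-connectivity),
    # by comparing the label matrix against itself shifted by one pixel.
    pairs = set()
    for a, b in ((lab[:, :-1], lab[:, 1:]), (lab[:-1, :], lab[1:, :])):
        touch = (a != b) & (a > 0) & (b > 0)
        pairs.update(zip(a[touch].tolist(), b[touch].tolist()))

    # Adjacency list over label IDs.
    adj = {}
    for u, v in pairs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Greedily assign each label the smallest color unused by its neighbors.
    color = {}
    for k in sorted(set(lab.ravel().tolist()) - {0}):
        used = {color[n] for n in adj.get(k, ()) if n in color}
        color[k] = next(c for c in range(1, len(used) + 2) if c not in used)

    out = np.zeros_like(lab)
    for k, c in color.items():
        out[lab == k] = c
    return out

# Four labels in a 2x2 block pattern: each touches two others side-on.
lab = np.array([[1, 1, 2, 2],
                [1, 1, 2, 2],
                [3, 3, 4, 4],
                [3, 3, 4, 4]])
print(greedy_ncolor(lab).max())  # 2 colors suffice here
```

Greedy coloring in this style can exceed 4 colors on unlucky orderings, which is why a production implementation needs retries or backtracking to stay near the four-color bound.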

Interesting note: my code works on 3D volume labels as well, but there is no analogous theorem guaranteeing any upper bound \(N<K\) in 3D. In 3D, you could in principle have cells that each touch every other cell, in which case \(N=K\) and you cannot "recolor your map". On the dense but otherwise well-behaved volumes I have tested, my algorithm ends up needing 6-7 unique labels. I am curious whether some bound on \(N\) can be formulated for constrained volumes, e.g., packed spheres of mixed and arbitrary diameter...

## Getting uniform colors for non-contacting or sparse objects

Final note: thanks to Ryan Peters for suggesting a fix for displaying segmentations that (a) come from ground-truth sets with pixel-separated (boundary-map-generated) label matrices or (b) contain many sparse, disjoint objects. By expanding labels before coloring them (a step that actually takes longer than the coloring itself), we get a much more pleasing distribution of colors, which makes it easier to assess segmentations when images are zoomed out. For example,

```python
from omnipose import plot
masks = io.imread(os.path.join(basedir,'masks','caulo_15_cp_masks.tif'))
exp = ncolor.expand_labels(masks)
ims = [plot.apply_ncolor(masks,expand=False),
       plot.apply_ncolor(exp),
       plot.apply_ncolor(masks)]

titles = ['Original masks','Intermediate expansion', 'Masked result']
N = len(titles)
f = 1.5
c = [0.5]*3
fontsize = 10
dpi = mpl.rcParams['figure.dpi']
Y,X = masks.shape[-2:]
szX = max(X//dpi,2)*f
szY = max(Y//dpi,2)*f

fig, axes = plt.subplots(1,N, figsize=(szX*N,szY))
fig.patch.set_facecolor([0]*4)
for i,ax in enumerate(axes):
    ax.imshow(ims[i])
    ax.axis('off')
    ax.set_title(titles[i],c=c,fontsize=fontsize,fontweight="bold")

plt.subplots_adjust(wspace=0.1)
plt.show()
```
_images/bf6aba6d22fad5f38f6821885a8ccf08a2611c49187990f21900fd85f159e1d9.png

Left: ncolor applied to raw masks. Middle: ncolor expanded masks. Right: resulting ncolor masks with more uniform color distribution.

Note that the expansion step takes about twice as long as the ncolor algorithm itself, but the extra milliseconds are worth it. If you know of a faster way to compute a feature transform than scipy.ndimage, please let me know.
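The feature transform in question can be sketched with scipy.ndimage.distance_transform_edt's `return_indices` option, which reports, for every background pixel, the coordinates of the nearest labeled pixel; indexing the label matrix with those coordinates floods every label outward. This is a simplified stand-in (the helper name `expand_all_labels` is mine, not the package's API) for what ncolor.expand_labels does:

```python
import numpy as np
from scipy import ndimage

def expand_all_labels(lab):
    # For each background pixel, get the coordinates of the nearest
    # labeled pixel (the "feature transform"), then copy that label over.
    idx = ndimage.distance_transform_edt(lab == 0,
                                         return_distances=False,
                                         return_indices=True)
    return lab[tuple(idx)]

lab = np.zeros((5, 5), dtype=int)
lab[0, 0], lab[4, 4] = 1, 2
print(expand_all_labels(lab))  # every pixel now carries its nearest label
```

For display purposes the expanded result is recolored and then masked back down to the original foreground, which is why the expansion never appears in the final image.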

```python
import string
fontsize = 11

axcol = [0.5]*3
# Set up the figure and subplots
images = [pic1,pic2,pic4]
N = len(images)
M = 1

h,w = images[0].shape[:2]

sf = w
p = 0.05 # needs to be defined as fraction of width for aspect ratio to work?
h /= sf
w /= sf
offset = 0.05
# Calculate positions of subplots
left = np.array([i*(w+p) for i in range(N)])*1.+offset
bottom = np.array([0]*N)*1.
width = np.array([w]*N)*1.
height = np.array([h]*N)*1.

max_w = left[-1]+width[-1]
max_h = bottom[-1]+height[-1]

sw = max_w
sh = max_h

sf = max(sw,sh)
left /= sw
bottom /= sh
width /= sw
height /= sh

# Create figure
s = 6
fig = plt.figure(figsize=(s,s*sh/sw), frameon=False, dpi=600)

# Add subplots
axes = []
for i in range(N):
    ax = fig.add_axes([left[i], bottom[i], width[i], height[i]])
    ax.imshow(images[i])
    axes.append(ax)

    ax.annotate(string.ascii_lowercase[i], xy=(-offset, 1), xycoords='axes fraction',
                xytext=(0, 0), textcoords='offset points', va='top', c=axcol,
                fontsize=fontsize)

    ax.axis('off')

datadir = omnidir.parent
file = os.path.join(datadir,'Dissertation','figures','ncolor.pdf')
if os.path.isfile(file): os.remove(file)
fig.savefig(file,transparent=True,pad_inches=0)

m.max(),ncolor.label(m).max()
```
(161, 4)
_images/e2ec9cd0c4393c9cca269c0a108cad7a9a92e99231087c825b00efcb584281ef.png