# N-color

Here I will argue that many of the errors I see in ground-truth datasets can be most kindly attributed to a lack of good label visualization. To illustrate, I will use the following cell microcolony.

## The insufficiency of cell outlines

```python
import matplotlib.pyplot as plt
plt.style.use('dark_background')
import matplotlib as mpl
%matplotlib inline
mpl.rcParams['figure.dpi'] = 600
import numpy as np
import omnipose
from omnipose.utils import rescale, crop_bbox
from omnipose.plot import imshow
import fastremap

from pathlib import Path
import os
from cellpose_omni import io, plot
omnidir = Path(omnipose.__file__).parent.parent
basedir = os.path.join(omnidir,'docs','test_files') # first run the mono_channel_bact notebook to generate masks
masks = io.imread(os.path.join(basedir,'masks','ec_5I_t141xy5c1_cp_masks.tif'))
img = io.imread(os.path.join(basedir,'ec_5I_t141xy5c1.tif'))
imshow(plot.outline_view(img,masks),3,interpolation='None')
```
_images/700ad413c6cf9f0c1a9e62d96ab75f1f5c9a8d7c13b54d919a2d31b20f6eb633.png

This outline view clearly distinguishes cells from each other, and it requires just one color (one channel). As ground truth, binary maps like this are among the easiest annotations to generate and are therefore quite common in public datasets (see MiSiC, DeLTA, and SuperSegger, to name just a few in the realm of bacterial microscopy).

Despite the ease of drawing reasonable cell outlines, it is exceptionally difficult to guarantee that these monochromatic boundaries between cells are precisely 2 pixels thick. Without this property, the resulting label matrix will either exclude boundary pixels or asymmetrically incorporate them into one of the two cells. This is a primary reason why label matrices, not boundary maps, should be used to train and evaluate any segmentation algorithm (labels can fail in self-contact scenarios, but Omnipose now accepts affinity graphs or linked label matrices just for those cases).
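A toy example makes the failure mode concrete. The array below is a hypothetical, minimal stand-in for a real boundary map: two small "cells" separated by a boundary that is only 1 pixel thick, so the shared outline pixels cannot be assigned to either cell when the map is converted back to a label matrix:

```python
import numpy as np
from scipy import ndimage

# Two 2x4-pixel "cells" stacked vertically, separated by a 1-pixel boundary row.
boundary = np.zeros((5, 4), dtype=bool)
boundary[2] = True  # the shared outline between the two cells

# Recover a label matrix by labeling the non-boundary pixels (4-connectivity).
labels, n = ndimage.label(~boundary)

print(n)                   # 2 cells recovered...
print((labels > 0).sum())  # ...but only 16 of 20 pixels are labeled:
                           # the boundary row belongs to neither cell
```

With a 2-pixel boundary, each cell would own its half of the outline; with 1 pixel, the boundary is simply dropped (or, with other conversion schemes, assigned arbitrarily to one side).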

## Not enough colors to go around

However, creating and editing label matrices has its own set of issues. If you have too many cells in an image, you quickly run out of distinct colors to distinguish adjacent cells:

```python
bbx = crop_bbox(masks) # in omni
slc = bbx[0]
m,_ = fastremap.renumber(masks[slc]) # make sure masks go from 0 to N
print('number of masks: ', np.max(m))

cmap = mpl.colormaps.get_cmap('viridis')
pic1 = cmap(rescale(m))
pic1[:,:,-1] = m>0 # alpha
imshow(pic1,3,interpolation='None')
```
number of masks:  161
_images/4a066e381504ceec677f2be2f4cb26454be0e796ac5f7c1f14018b4658a2f41d.png

This perceptually uniform color map is our best bet for distinguishing cells from each other, but some nearby cells are too similar in color to tell apart. The standard technique is to randomly shuffle the labels:

```python
import fastremap
keys = fastremap.unique(m)
vals = keys.copy()
np.random.seed(42)
np.random.shuffle(keys)
d = dict(zip(keys,vals))
m_shuffle = fastremap.remap(m,d)
pic2 = cmap(rescale(m_shuffle))
pic2[:,:,-1] = m>0 # alpha
imshow(pic2,3,interpolation='None')
```
_images/cc2efb22c406137d80e522fdd1993d617ccc119b27e4953bd9e6069850a4f561.png

This doesn't fix the problem. You might think that adding more colors would help...

```python
from omnipose.utils import sinebow
from matplotlib.colors import ListedColormap

cmap = ListedColormap([color for color in list(sinebow(m.max()).values())[1:]])
pic3 = cmap(m_shuffle)
pic3[:,:,-1] = m>0 # alpha
imshow(pic3,3,interpolation='None')
```
_images/e2c98c9770f62b8405ae0c03c357d539485379eec35d91d21554245bd7834942.png

... but since random shuffling does not guarantee that spatially adjacent cells receive numerically distant labels, neighbors that were hard to tell apart under a perceptually uniform color map like viridis are often even harder to tell apart under any kind of unicorn-vomit color map.

Worse still, similar colors make it easy to edit the wrong cell by accident (e.g., painting with color 11 inside cell 12 when both are shades of yellow), ruining the segmentation with an error that is imperceptible to the human eye (this may account for many of the "errant pixels" we observe across ground-truth datasets of dense cells).

## 4-color in theory, N-color in practice

To solve this problem, I developed the ncolor package, which converts \(K\)-integer label matrices to \(N \ll K\)-color labels. The four color theorem guarantees that 4 unique labels suffice to distinguish all touching cells, but my algorithm opts for 5 if a 4-color solution is not found quickly. This was integral in developing the BPCIS dataset, and I subsequently incorporated it into Cellpose and Omnipose. By default, the GUI and plot commands display N-color masks for easier visualization and editing:

```python
import ncolor
cmap = mpl.colormaps.get_cmap('viridis')
pic4 = cmap(rescale(ncolor.label(m)))
pic4[:,:,-1] = m>0 # alpha
imshow(pic4,3,interpolation='None')
```
_images/137768f2c1f2570e971beb64939d0d94473786e7d4fdc3723d380f59a82c7d3a.png
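Conceptually, this is a graph-coloring problem: build the adjacency graph of touching labels, then give each label the smallest color not used by any of its neighbors. The following is a minimal greedy sketch of that idea in plain numpy (a simplified illustration, not the actual ncolor algorithm, which works harder to keep \(N\) small):

```python
import numpy as np

def greedy_ncolor(lab):
    # Collect pairs of distinct nonzero labels that touch (4-connectivity),
    # by comparing the label matrix against itself shifted by one pixel.
    pairs = set()
    for a, b in ((lab[:, :-1], lab[:, 1:]), (lab[:-1, :], lab[1:, :])):
        touch = (a != b) & (a > 0) & (b > 0)
        pairs.update(zip(a[touch].tolist(), b[touch].tolist()))

    # Adjacency list over label IDs.
    adj = {}
    for u, v in pairs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Greedily assign each label the smallest color unused by its neighbors.
    color = {}
    for k in sorted(set(lab.ravel().tolist()) - {0}):
        used = {color[n] for n in adj.get(k, ()) if n in color}
        color[k] = next(c for c in range(1, len(used) + 2) if c not in used)

    out = np.zeros_like(lab)
    for k, c in color.items():
        out[lab == k] = c
    return out

# Four labels in a 2x2 block pattern: each touches two others side-on.
lab = np.array([[1, 1, 2, 2],
                [1, 1, 2, 2],
                [3, 3, 4, 4],
                [3, 3, 4, 4]])
print(greedy_ncolor(lab).max())  # 2 colors suffice here
```

Greedy coloring in this style can exceed 4 colors on unlucky orderings, which is why a production implementation needs retries or backtracking to stay near the four-color bound.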

Interesting note: my code works on 3D volume labels as well, but there is no analogous theorem guaranteeing any upper bound \(N<K\) in 3D. In 3D, you could in principle have cells that each touch every other cell, in which case \(N=K\) and you cannot "recolor your map". On the dense but otherwise well-behaved volumes I have tested, my algorithm ends up needing 6-7 unique labels. I am curious whether some bound on \(N\) can be formulated for constrained volumes, e.g., packed spheres of mixed and arbitrary diameter...

## Getting uniform colors for non-contacting or sparse objects

Final note: thanks to Ryan Peters for suggesting a fix for displaying segmentations that (a) come from ground-truth sets with pixel-separated (boundary-map-generated) label matrices or (b) contain many sparse, disjoint objects. By expanding labels before coloring them (a step that actually takes longer than the coloring itself), we get a much more pleasing distribution of colors, which makes it easier to assess segmentations when images are zoomed out. For example,

```python
from omnipose import plot
masks = io.imread(os.path.join(basedir,'masks','caulo_15_cp_masks.tif'))
exp = ncolor.expand_labels(masks)
ims = [plot.apply_ncolor(masks,expand=False),
       plot.apply_ncolor(exp),
       plot.apply_ncolor(masks)]

titles = ['Original masks','Intermediate expansion', 'Masked result']
N = len(titles)
f = 1.5
c = [0.5]*3
fontsize = 10
dpi = mpl.rcParams['figure.dpi']
Y,X = masks.shape[-2:]
szX = max(X//dpi,2)*f
szY = max(Y//dpi,2)*f

fig, axes = plt.subplots(1,N, figsize=(szX*N,szY))
fig.patch.set_facecolor([0]*4)
for i,ax in enumerate(axes):
    ax.imshow(ims[i])
    ax.axis('off')
    ax.set_title(titles[i],c=c,fontsize=fontsize,fontweight="bold")

plt.subplots_adjust(wspace=0.1)
plt.show()
```
_images/bf6aba6d22fad5f38f6821885a8ccf08a2611c49187990f21900fd85f159e1d9.png

Left: ncolor applied to raw masks. Middle: ncolor expanded masks. Right: resulting ncolor masks with more uniform color distribution.

Note that the expansion step takes about twice as long as the ncolor algorithm itself, but the extra milliseconds are worth it. If you know of a faster way to compute a feature transform than scipy.ndimage, please let me know.
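The feature transform in question can be sketched with scipy.ndimage.distance_transform_edt's `return_indices` option, which reports, for every background pixel, the coordinates of the nearest labeled pixel; indexing the label matrix with those coordinates floods every label outward. This is a simplified stand-in (the helper name `expand_all_labels` is mine, not the package's API) for what ncolor.expand_labels does:

```python
import numpy as np
from scipy import ndimage

def expand_all_labels(lab):
    # For each background pixel, get the coordinates of the nearest
    # labeled pixel (the "feature transform"), then copy that label over.
    idx = ndimage.distance_transform_edt(lab == 0,
                                         return_distances=False,
                                         return_indices=True)
    return lab[tuple(idx)]

lab = np.zeros((5, 5), dtype=int)
lab[0, 0], lab[4, 4] = 1, 2
print(expand_all_labels(lab))  # every pixel now carries its nearest label
```

For display purposes the expanded result is recolored and then masked back down to the original foreground, which is why the expansion never appears in the final image.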

```python
import string
fontsize = 11

axcol = [0.5]*3
# Set up the figure and subplots
images = [pic1,pic2,pic4]
N = len(images)
M = 1

h,w = images[0].shape[:2]

sf = w
p = 0.05 # needs to be defined as fraction of width for aspect ratio to work?
h /= sf
w /= sf
offset = 0.05
# Calculate positions of subplots
left = np.array([i*(w+p) for i in range(N)])*1.+offset
bottom = np.array([0]*N)*1.
width = np.array([w]*N)*1.
height = np.array([h]*N)*1.

max_w = left[-1]+width[-1]
max_h = bottom[-1]+height[-1]

sw = max_w
sh = max_h

sf = max(sw,sh)
left /= sw
bottom /= sh
width /= sw
height /= sh

# Create figure
s = 6
fig = plt.figure(figsize=(s,s*sh/sw), frameon=False, dpi=600)

# Add subplots
axes = []
for i in range(N):
    ax = fig.add_axes([left[i], bottom[i], width[i], height[i]])
    ax.imshow(images[i])
    axes.append(ax)

    ax.annotate(string.ascii_lowercase[i], xy=(-offset, 1), xycoords='axes fraction',
                xytext=(0, 0), textcoords='offset points', va='top', c=axcol,
                fontsize=fontsize)

    ax.axis('off')

datadir = omnidir.parent
file = os.path.join(datadir,'Dissertation','figures','ncolor.pdf')
if os.path.isfile(file): os.remove(file)
fig.savefig(file,transparent=True,pad_inches=0)

m.max(),ncolor.label(m).max()
```
(161, 4)
_images/e2ec9cd0c4393c9cca269c0a108cad7a9a92e99231087c825b00efcb584281ef.png