Reading .PNGs, how do you identify clusters of color and rewrite the image file so that every cluster has a unique RGB code?

Issue

Continued from this question: How could you rewrite a list of lists so that "islands" of values are unique from one another?

Brief: How would you parse an image, for example:

example input

in such a way that you identify the several clusters of distinct pixels and rewrite the file so that each cluster has a unique color, for example:

example output

Here’s how I have tried to implement it, with assistance from a few sources including Stack Overflow user @Rabinzel (detailed reasoning below the main code block):

    from scipy import ndimage
    import numpy as np
    from PIL import Image
    
    #set the file path to wherever your provinces.png is located
    im = Image.open(r"C:\\Users\\scoop\\Desktop\\prov_test.png")
    
    print('-------------------------------------------')
    #DEBUGGING: simply prints the format, size, and mode of your file
    print(im.format, im.size, im.mode)
    #saves the width and height of the file
    im_xsize = im.size[0]
    im_ysize = im.size[1]
    #DEBUGGING: prints it
    print(im_xsize, im_ysize)
    #DEBUGGING: prints data bands, should be R, G, B
    print(im.getbands())
    
    #DEBUGGING: prints RGB value of pixel of choice
    print(im.getpixel((0,0)))
    print('-------------------------------------------')
    
    #creates array for pixel RGBs
    rgb_array = [[None] * im_ysize for length in range(0,im_xsize)]
    
    #fills pixel RGB array
    for x in range(0,im_xsize):
        for y in range(0,im_ysize):
            rgb_array[x][y] = im.getpixel((x,y))
       
    #find unique clusters of identical RGB codes
    def find_clusters(array):
        clustered = np.empty_like(array)
        unique_vals = np.unique(array)
        cluster_count = 0
        for val in unique_vals:
            labelling, label_count = ndimage.label(array == val)
            for k in range(1, label_count + 1):
                clustered[labelling == k] = cluster_count
                cluster_count += 1
        return clustered, cluster_count
    
    clusters, cluster_count = find_clusters(rgb_array)
    print("Found {} clusters:".format(cluster_count))
    #print(clusters)
    
    #defining a list of unique colors
    province_color_list = [[0] * 3 for length in range(0,cluster_count)] 
    
    #DEBUGGING
    print('province count...', cluster_count)
    #variables
    r = 255
    g = 0
    b = 0
    count = 0
    
    #generating colors
    for length in range(0,cluster_count):
        province_color_list[length][0] = r
        province_color_list[length][1] = g
        province_color_list[length][2] = b
        g += 25
        b += 25
        count += 1
        if count >= 11:
            r -= 1
            g = 0
            b = 0
            count = 0
    
    #DEBUGGING
    print('# of colors... ', len(province_color_list))
    print(province_color_list)
    print('-------------------------------------------')
    
    #writing colors to pixels
    for x in range(0,im_xsize):
        for y in range(0,im_ysize):
            #places province color based on which province current pixel is assigned to
            im.putpixel((x,y),   (province_color_list[0][0],   province_color_list[0][1],   province_color_list[0][2]))
    
    #im.save(r"C:\\Users\\scoop\\Desktop\\prov_test.png", im.format)

I load the image using PIL:

im = Image.open(r"C:\\Users\\scoop\\Desktop\\prov_test.png")

I create a nested list to (hopefully) make the pixel data easier to access; it stores each pixel’s color as an RGB tuple. The find_clusters function is then meant to identify the relevant pixel clusters.

rgb_array = [[None] * im_ysize for length in range(0,im_xsize)]

#fills pixel RGB array
for x in range(0,im_xsize):
    for y in range(0,im_ysize):
        rgb_array[x][y] = im.getpixel((x,y))
   
#find unique clusters of identical RGB codes
def find_clusters(array):
    clustered = np.empty_like(array)
    unique_vals = np.unique(array)
    cluster_count = 0
    for val in unique_vals:
        labelling, label_count = ndimage.label(array == val)
        for k in range(1, label_count + 1):
            clustered[labelling == k] = cluster_count
            cluster_count += 1
    return clustered, cluster_count

clusters, cluster_count = find_clusters(rgb_array)
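
One possible reason this step misbehaves: np.unique flattens a nested list of RGB tuples, so it sees individual channel values rather than whole colors, and comparing a plain Python list with == does not give a per-pixel mask. A minimal sketch of one workaround, assuming a hypothetical helper named image_to_int_array (not part of PIL or NumPy), packs the three channels of each pixel into one integer so that find_clusters receives a 2D integer array. The demo image is synthetic, built only for illustration.

```python
import numpy as np
from PIL import Image

def image_to_int_array(im):
    """Hypothetical helper: pack each RGB pixel into a single integer."""
    # Shape is (height, width, 3); widen from uint8 so the shifts cannot overflow.
    rgb = np.asarray(im.convert("RGB")).astype(np.uint32)
    return (rgb[..., 0] << 16) | (rgb[..., 1] << 8) | rgb[..., 2]

# Synthetic 3x2 (width x height) red image with one blue pixel at x=2, y=1.
demo = Image.new("RGB", (3, 2), (255, 0, 0))
demo.putpixel((2, 1), (0, 0, 255))

packed = image_to_int_array(demo)
print(packed.shape)  # (rows, columns) = (height, width)
```

Note that the packed array is indexed [row][column], that is [y][x], which is the reverse of the [x][y] order used in the nested list above.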

Then I create a list of unique RGB codes, one for each pixel cluster found.

province_color_list = [[0] * 3 for length in range(0,cluster_count)] 

#DEBUGGING
print('province count...', cluster_count)
#variables
r = 255
g = 0
b = 0
count = 0

#generating colors
for length in range(0,cluster_count):
    province_color_list[length][0] = r
    province_color_list[length][1] = g
    province_color_list[length][2] = b
    g += 25
    b += 25
    count += 1
    if count >= 11:
        r -= 1
        g = 0
        b = 0
        count = 0
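
For what it's worth, the scheme above produces 11 colors per red value, so with r running from 255 down to 0 it yields 11 × 256 = 2816 distinct colors before r goes negative. A hedged alternative sketch, using a hypothetical unique_colors helper, derives each color from the cluster index instead. It relies on the fact that multiplying by an odd constant modulo 2**24 is a bijection, so no two indices can share a color; the constant itself is an arbitrary choice, not anything prescribed.

```python
def unique_colors(n):
    """Hypothetical helper: return n distinct (r, g, b) tuples."""
    if n > 1 << 24:
        raise ValueError("24-bit RGB cannot hold that many distinct colors")
    colors = []
    for i in range(n):
        # Odd multiplier modulo 2**24 permutes the 24-bit codes, and the
        # offset makes index 0 start at pure red like the original script.
        code = (i * 0x9E3779 + 0xFF0000) % (1 << 24)
        colors.append(((code >> 16) & 255, (code >> 8) & 255, code & 255))
    return colors

province_colors = unique_colors(5000)
print(province_colors[0])  # (255, 0, 0)
```

Consecutive indices land far apart in color space, which may make neighbouring clusters easier to tell apart by eye, though that is not guaranteed.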

Finally, I rewrite each pixel with the new RGB code associated with its cluster from earlier (and save the image).

#writing colors to pixels
for x in range(0,im_xsize):
    for y in range(0,im_ysize):
        #places province color based on which province current pixel is assigned to
        im.putpixel((x,y),   (province_color_list[clusters[x][y]][0],   province_color_list[clusters[x][y]][1],   province_color_list[clusters[x][y]][2]))
         
#im.save(r"C:\\Users\\scoop\\Desktop\\prov_test.png", im.format)
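
Calling putpixel once per pixel can get slow on a large image. A rough sketch of an alternative, assuming clusters is a 2D NumPy integer array of shape (height, width) and that a hypothetical paint_clusters helper is acceptable, looks up every color in one indexing step and builds the image from the result.

```python
import numpy as np
from PIL import Image

def paint_clusters(clusters, palette):
    """Hypothetical helper: map each cluster id to its color in one step."""
    palette = np.asarray(palette, dtype=np.uint8)  # shape (n_clusters, 3)
    # Indexing with the 2D id array gives a (height, width, 3) RGB array.
    return Image.fromarray(palette[clusters])

# Tiny demo: two clusters laid out in a checkerboard.
demo_clusters = np.array([[0, 1],
                          [1, 0]])
demo_palette = [(255, 0, 0), (0, 0, 255)]
painted = paint_clusters(demo_clusters, demo_palette)
print(painted.size, painted.getpixel((1, 0)))
```

The returned image could then be saved with painted.save(...) in place of the nested loops.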

Unfortunately, there are multiple issues with this script, and I get the feeling it has degenerated into a bit of nonsense. The chief issues seem to be (1) accessing the RGB tuples of the PNG via the Image class and converting them to integers so they can be identified properly, and (2) differentiating between distinct clusters, not just distinct colors. So far, I haven’t been able to get the script to write the image as anything but a flat color.

For reference, I hope to be able to scale this up to handle an image like this:

[image]

and give each of those little clusters a unique color. Any and all help appreciated.

Solution

OK, let’s see if this works for you. If I understood correctly what you are trying to achieve, here is my (beginner) solution.

Essentially, I load the image as a 3D array, find all unique colors in the picture, and replace each one with an integer (function: arr_to_int). Then I find all the clusters with the function find_clusters and create a dictionary of new colors with as many entries as there are clusters.
At the end, every cluster integer is replaced with its new color and the picture is saved.
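
To make the clustering step concrete, here is a small sketch on a made-up 3x3 grid of color ids. It reuses the find_clusters logic from the code below, with one addition that is my own assumption: an optional structure argument passed through to ndimage.label, which by default only joins pixels that touch along an edge.

```python
import numpy as np
from scipy import ndimage

def find_clusters(array, structure=None):
    """Give every connected region of equal values its own integer id."""
    clustered = np.empty_like(array)
    cluster_count = 0
    for val in np.unique(array):
        # Label the connected regions of the mask for this one value.
        labelling, label_count = ndimage.label(array == val, structure=structure)
        for k in range(1, label_count + 1):
            clustered[labelling == k] = cluster_count
            cluster_count += 1
    return clustered, cluster_count

# Made-up grid: two color ids, split into several islands.
grid = np.array([[0, 0, 1],
                 [1, 0, 1],
                 [1, 1, 0]])

edge_clusters, edge_count = find_clusters(grid)
# A full 3x3 structure also joins pixels that only touch diagonally.
diag_clusters, diag_count = find_clusters(grid, np.ones((3, 3), dtype=int))
print(edge_count, diag_count)
```

With edge-only connectivity the lone 0 in the bottom-right corner is its own cluster, whereas with diagonal connectivity it merges with the other 0s. Which behaviour is right depends on how the map is drawn.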

This was the image I used to start with:

[image]

and that’s the new picture I got as output:

[image]

If you change how the specific colors you want to use are assigned to the clusters, I think this is pretty close to what you are trying to achieve (hope so 🙂).

import numpy as np
import cv2
from scipy import ndimage

# array of BGR colors to single int
def arr_to_int(arr, col_mask):
    out = np.ndarray(shape=arr.shape[:2], dtype=int)
    out[:,:] = -1
    for rgb, idx in col_mask.items():
        out[(arr==rgb).all(2)] = idx
    return out

# find unique clusters of identical RGB codes
def find_clusters(array):
    clustered = np.empty_like(array)
    unique_vals = np.unique(array)
    cluster_count = 0
    for val in unique_vals:
        labelling, label_count = ndimage.label(array == val)
        for k in range(1, label_count + 1):
            clustered[labelling == k] = cluster_count
            cluster_count += 1
    return clustered, cluster_count
# Load image
im = cv2.imread("prov_test.png")
#im = cv2.resize(im, (2, 3)) #resize for debugging
#print('original image: \n', im, '\n')

#find all unique colors in image (cv2 presents in BGR format!!!)
unique_col_BGR = list(set(tuple(v) for m2d in im for v in m2d))
print('unique values: ', unique_col_BGR, '\n')

#create dict with BGR colors as keys and unique integers as value
mask_GBR_int = {color:idx for idx,color in enumerate(unique_col_BGR)}
print('mask dict: ', mask_GBR_int, '\n')

#change all color values in im to a single int (mask)
im_with_ints = arr_to_int(im, mask_GBR_int)
#print('pic with mask values: \n', im_with_ints, '\n')

# due to replacing array of 3 values to a single int, new array has one dimension less
print('orig pic resized shape', im.shape)
print('Mask int pic shape', im_with_ints.shape, '\n')

clusters, cluster_count = find_clusters(im_with_ints)
print(f'Found {cluster_count} clusters', '\n')
#print(clusters)

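#create dict with length equal to number of clusters and choose color of list_of_colors (random from the internet)
#note: list_of_colors holds only 14 entries, so an image with more than 14 clusters needs a longer list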
list_of_colors = [[192,192,192],[128,128,128],[128,0,0],[128,128,0],[0,128,0],[128,0,128],[0,128,128],[0,0,128],[255,0,0],[0,255,0],[0,0,255],[255,255,0],[0,255,255],[255,0,255]]
new_color_dict = {idx:val for idx,val in enumerate(list_of_colors[:cluster_count])}
print('new_color_dict: ', new_color_dict,'\n')

#change arr with int to colors again
res = np.array([*new_color_dict.values()])[clusters]
#print('image array with new colors: \n', res)

cv2.imwrite("prov_test_output.png", res)


Output:

unique values:  [(0, 255, 0), (255, 0, 0), (0, 0, 255), (0, 255, 255)] 

mask dict:  {(0, 255, 0): 0, (255, 0, 0): 1, (0, 0, 255): 2, (0, 255, 255): 3} 

orig pic resized shape (100, 100, 3)
Mask int pic shape (100, 100) 

Found 9 clusters 

new_color_dict:  {0: [192, 192, 192], 1: [128, 128, 128], 2: [128, 0, 0], 3: [128, 128, 0], 4: [0, 128, 0], 5: [128, 0, 128], 6: [0, 128, 128], 7: [0, 0, 128], 8: [255, 0, 0]} 
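
For a larger image with many colors, looping over every unique color in arr_to_int may become the slow part. A possible alternative, sketched here with a hypothetical colors_to_ints helper, lets np.unique do the mapping in one call: with axis=0 it treats each pixel's three channels as one row, and return_inverse gives the index of each pixel's color in the resulting palette. The demo array is made up for illustration.

```python
import numpy as np

def colors_to_ints(image):
    """Hypothetical helper: replace each pixel's color with a palette index."""
    h, w, channels = image.shape
    flat = image.reshape(-1, channels)  # one row per pixel
    palette, inverse = np.unique(flat, axis=0, return_inverse=True)
    # Reshape explicitly, since the shape of inverse differs between NumPy versions.
    return inverse.reshape(h, w), palette

# Made-up 2x2 image with two colors in a checkerboard.
demo_im = np.array([[[0, 0, 255], [0, 255, 0]],
                    [[0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

ids, palette = colors_to_ints(demo_im)
print(ids.shape, len(palette))
```

The ids array can be passed to find_clusters in place of im_with_ints, and palette[ids] rebuilds the original image if needed.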


Answered By – Rabinzel

Answer Checked By – Marilyn (AngularFixing Volunteer)
