How to scan two images for differences?

Issue

I’m trying to scan 2 images (32bppArgb format), identify when there is a difference and store the difference block’s bounds in a list of rectangles.

Suppose these are the images:
enter image description here

second:
enter image description here

I want to get the different rectangle bounds (the opened directory window in our case).

This is what I’ve done:

private unsafe List<Rectangle> CodeImage(Bitmap bmp, Bitmap bmp2)
{

    List<Rectangle> rec = new List<Rectangle>();
    bmData = bmp.LockBits(new System.Drawing.Rectangle(0, 0, 1920, 1080), System.Drawing.Imaging.ImageLockMode.ReadOnly, bmp.PixelFormat);
    bmData2 = bmp2.LockBits(new System.Drawing.Rectangle(0, 0, 1920, 1080), System.Drawing.Imaging.ImageLockMode.ReadOnly, bmp2.PixelFormat);

    IntPtr scan0 = bmData.Scan0;
    IntPtr scan02 = bmData2.Scan0;
    int stride = bmData.Stride;
    int stride2 = bmData2.Stride;
    int nWidth = bmp.Width;
    int nHeight = bmp.Height;
    int minX = int.MaxValue;;
    int minY = int.MaxValue;
    int maxX = 0;
    bool found = false;

    for (int y = 0; y < nHeight; y++) 
    {
        byte* p = (byte*)scan0.ToPointer();
        p += y * stride;
        byte* p2 = (byte*)scan02.ToPointer();
        p2 += y * stride2;
        for (int x = 0; x < nWidth; x++) 
        {

            if (p[0] != p2[0] || p[1] != p2[1] || p[2] != p2[2] || p[3] != p2[3]) //found differences-began to store positions.
            {
                found = true;
                if (x < minX)
                    minX = x;
                if (x > maxX)
                    maxX = x;
                if (y < minY)
                    minY = y;

            } 
            else 
            {

                if (found) 
                {

                    int height = getBlockHeight(stride, scan0, maxX, minY, scan02, stride2);
                    found = false;
                    Rectangle temp = new Rectangle(minX, minY, maxX - minX, height);
                    rec.Add(temp);
                    //x += minX;
                    y += height;
                    minX = int.MaxValue;
                    minY = int.MaxValue;
                    maxX = 0;
                }
            } 
            p += 4;
            p2 += 4;
        }
    }

    return rec;
}

public unsafe int getBlockHeight(int stride, IntPtr scan, int x, int y1, IntPtr scan02, int stride2) //a function to get  an existing block height.
{
    int height = 0;;
    for (int y = y1; y < 1080; y++) //only for example- in our case its 1080 height.
    {
        byte* p = (byte*)scan.ToPointer();
        p += (y * stride) + (x * 4); //set the pointer to a specific potential point. 
        byte* p2 = (byte*)scan02.ToPointer();
        p2 += (y * stride2) + (x * 4); //set the pointer to a specific potential point. 
        if (p[0] != p2[0] || p[1] != p2[1] || p[2] != p2[2] || p[3] != p2[3]) //still change on the height in the increasing **y** of the block.
            height++;
    }

    return height;
}

This is actually how I call the method:

Bitmap a = Image.FromFile(@"C:\Users\itapi\Desktop\1.png") as Bitmap;//generates a 32bppRgba bitmap;
Bitmap b = Image.FromFile(@"C:\Users\itapi\Desktop\2.png") as Bitmap;//

List<Rectangle> l1 = CodeImage(a, b);
int i = 0;
foreach (Rectangle rec in l1)
{
    i++;
    Bitmap tmp = b.Clone(rec, a.PixelFormat);
    tmp.Save(i.ToString() + ".png");
}

But I’m not getting the exact rectangle.. I’m getting only half of that and sometimes even worse. I think something in the code’s logic is wrong.

Code for @nico

private unsafe List<Rectangle> CodeImage(Bitmap bmp, Bitmap bmp2) 
{
    List<Rectangle> rec = new List<Rectangle>();
    var bmData1 = bmp.LockBits(new System.Drawing.Rectangle(0, 0, bmp.Width, bmp.Height), System.Drawing.Imaging.ImageLockMode.ReadOnly, bmp.PixelFormat);
    
    var bmData2 = bmp2.LockBits(new System.Drawing.Rectangle(0, 0, bmp.Width, bmp.Height), System.Drawing.Imaging.ImageLockMode.ReadOnly, bmp2.PixelFormat);

    int bytesPerPixel = 3;

    IntPtr scan01 = bmData1.Scan0;
    IntPtr scan02 = bmData2.Scan0;
    int stride1 = bmData1.Stride;
    int stride2 = bmData2.Stride;
    int nWidth = bmp.Width;
    int nHeight = bmp.Height;

    bool[] visited = new bool[nWidth * nHeight];

    byte* base1 = (byte*)scan01.ToPointer();
    byte* base2 = (byte*)scan02.ToPointer();

    for (int y = 0; y < nHeight; y += 5) 
    {
        byte* p1 = base1;
        byte* p2 = base2;

        for (int x = 0; x < nWidth; x += 5) 
        {
            if (!ArePixelsEqual(p1, p2, bytesPerPixel) && !(visited[x + nWidth * y])) 
            {
                // fill the different area
                int minX = x;
                int maxX = x;
                int minY = y;
                int maxY = y;

                var pt = new Point(x, y);

                Stack<Point> toBeProcessed = new Stack<Point> ();
                visited[x + nWidth * y] = true;
                toBeProcessed.Push(pt);
                
                while (toBeProcessed.Count > 0) 
                {
                    var process = toBeProcessed.Pop();
                    var ptr1 = (byte*)scan01.ToPointer() + process.Y * stride1 + process.X * bytesPerPixel;
                    var ptr2 = (byte*) scan02.ToPointer() + process.Y * stride2 + process.X * bytesPerPixel;
                    //Check pixel equality
                    if (ArePixelsEqual(ptr1, ptr2, bytesPerPixel))
                        continue;

                    //This pixel is different
                    //Update the rectangle
                    if (process.X < minX) minX = process.X;
                    if (process.X > maxX) maxX = process.X;
                    if (process.Y < minY) minY = process.Y;
                    if (process.Y > maxY) maxY = process.Y;

                    Point n;
                    int idx;
                    
                    //Put neighbors in stack
                    if (process.X - 1 >= 0) 
                    {
                        n = new Point(process.X - 1, process.Y);
                        idx = n.X + nWidth * n.Y;
                        if (!visited[idx]) 
                        {
                            visited[idx] = true;
                            toBeProcessed.Push(n);
                        }
                    }

                    if (process.X + 1 < nWidth) 
                    {
                        n = new Point(process.X + 1, process.Y);
                        idx = n.X + nWidth * n.Y;
                        if (!visited[idx]) 
                        {
                            visited[idx] = true;
                            toBeProcessed.Push(n);
                        }
                    }

                    if (process.Y - 1 >= 0) 
                    {
                        n = new Point(process.X, process.Y - 1);
                        idx = n.X + nWidth * n.Y;
                        if (!visited[idx]) 
                        {
                            visited[idx] = true;
                            toBeProcessed.Push(n);
                        }
                    }

                    if (process.Y + 1 < nHeight) 
                    {
                        n = new Point(process.X, process.Y + 1);
                        idx = n.X + nWidth * n.Y;
                        if (!visited[idx]) 
                        {
                            visited[idx] = true;
                            toBeProcessed.Push(n);
                        }
                    }
                }

                if (((maxX - minX + 1) > 5) & ((maxY - minY + 1) > 5))
                    rec.Add(new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1));
            }

            p1 += 5 * bytesPerPixel;
            p2 += 5 * bytesPerPixel;
        }

        base1 += 5 * stride1;
        base2 += 5 * stride2;
    }

    bmp.UnlockBits(bmData1);
    bmp2.UnlockBits(bmData2);

    return rec;
    
}

Solution

I see a couple of problems with your code. If I understand it correctly, you

  1. find a pixel that’s different between the two images.
  2. then you continue to scan from there to the right, until you find a position where both images are identical again.
  3. then you scan from the last “different” pixel to the bottom, until you find a position where both images are identical again.
  4. then you store that rectangle and start at the next line below it

enter image description here

Am I right so far?

Two obvious things can go wrong here:

  • If two rectangles have overlapping y-ranges, you’re in trouble: You’ll find the first rectangle fine, then skip to the bottom Y-coordinate, ignoring all the pixels left or right of the rectangle you just found.
  • Even if there is only one rectangle, you assume that every pixel on the rectangle’s border is different, and all the other pixels are identical. If that assumption isn’t valid, you’ll stop searching too early, and only find parts of rectangles.

If your images come from a scanner or digital camera, or if they contain lossy compression (jpeg) artifacts, the second assumption will almost certainly be wrong. To illustrate this, here’s what I get when I mark every identical pixel the two jpg images you linked black, and every different pixel white:

enter image description here

What you see is not a rectangle. Instead, a lot of pixels around the rectangles you’re looking for are different:

enter image description here

That’s because of jpeg compression artifacts. But even if you used lossless source images, pixels at the borders might not form perfect rectangles, because of antialiasing or because the background just happens to have a similar color in that region.

You could try to improve your algorithm, but if you look at that border, you will find all kinds of ugly counterexamples to any geometric assumptions you’ll make.

It would probably be better to implement this “the right way”. Meaning:

  • Either implement a flood fill algorithm that erases different pixels (e.g. by setting them to identical or by storing a flag in a separate mask), then recursively checks if the 4 neighbor pixels.
  • Or implement a connected component labeling algorithm, that marks each different pixel with a temporary integer label, using clever data structures to keep track which temporary labels are connected. If you’re only interested in a bounding box, you don’t even have to merge the temporary labels, just merge the bounding boxes of adjacent labeled areas.

Connected component labeling is in general a bit faster, but is a bit trickier to get right than flood fill.

One last advice: I would rethink your “no 3rd party libraries” policy if I were you. Even if your final product will contain no 3rd party libraries, development might by a lot faster if you used well-documented, well-tested, useful building blocks from a library, then replaced them one by one with your own code. (And who knows, you might even find an open source library with a suitable license that’s so much faster than your own code that you’ll stick with it in the end…)


ADD: In case you want to rethink your “no libraries” position: Here’s a quick and simple implementation using AForge (which has a more permissive library than emgucv):

private static void ProcessImages()
{
    (* load images *)
    var img1 = AForge.Imaging.Image.FromFile(@"compare1.jpg");
    var img2 = AForge.Imaging.Image.FromFile(@"compare2.jpg");

    (* calculate absolute difference *)
    var difference = new AForge.Imaging.Filters.ThresholdedDifference(15)
        {OverlayImage = img1}
        .Apply(img2);

    (* create and initialize the blob counter *)
    var bc = new AForge.Imaging.BlobCounter();
    bc.FilterBlobs = true;
    bc.MinWidth = 5;
    bc.MinHeight = 5;

    (* find blobs *)
    bc.ProcessImage(difference);

    (* draw result *)
    BitmapData data = img2.LockBits(
       new Rectangle(0, 0, img2.Width, img2.Height),
          ImageLockMode.ReadWrite, img2.PixelFormat);

    foreach (var rc in bc.GetObjectsRectangles())
        AForge.Imaging.Drawing.FillRectangle(data, rc, Color.FromArgb(128,Color.Red));

    img2.UnlockBits(data);
    img2.Save(@"compareResult.jpg");
}

The actual difference + blob detection part (without loading and result display) takes about 43ms, for the second run (this first time takes longer of course, due to JITting, cache, etc.)

Result (the rectangle is larger due to jpeg artifacts):

enter image description here

Answered By – Niki

Answer Checked By – Candace Johnson (AngularFixing Volunteer)

Leave a Reply

Your email address will not be published.