-------------------------------------------------------------------------------
Hints, Tips and Notes on bulk sorting of images

All additions and information welcome...  Please mail Anthony Thyssen
-------------------------------------------------------------------------------
Image comparing thoughts...

Specifically, locating images which are close duplicates or show the same
subject.  This is very difficult when faced with JPEG distortion or resizing.
For example, it should match images with borders or spam text added.

Generally I want lists of close matches for eyeball comparison.

Comparisons should be fuzzy.  What I mean by a "fuzzy" compare is to report
the difference of the colors in the regions between the two images.  If this
difference is too big, the images aren't identical, and further, more
intensive tests can be skipped.

Basic color matching
  Strip border
  reduce colors
  histogram to get the top two or three colors and areas involved
  to fuzzy match

General region colors.
  Compare the general color of small regions.  Say 5 areas of 1/3 of the
  width and height, in a loose checkerboard over the image.  Specifically
  I would avoid the image edges where borders or "spam" text may have been
  added.

  Smoothing the image as a pre-step may get rid of a lot of 'artifacts' in
  the image, particularly at the borders of test regions.

Edge Detect
  resize images to same size (say 120x120)
  do edge detection, and line generation
  compare line lists for closeness (How?)

Histogram Shape detect

FFT of sub-regions
  Divide the (scaled) image into 8x8 square subregions, calculate the FFT
  (or cosine transform) of every region, and fuzzy compare only the first
  few coefficients.  This will essentially give "identical" on
  JPEG-compressed images.
                                               Peter Vanroose

Fractal Dimensions
  Generate a rough fractal dimension of your images.  Those with similar
  fractal dimensions should be images which are also similar in general
  appearance.
  See http://www.weihenstephan.de/ane/algorithms/algorithms.html
                                               Hugh Brackett

  I advise you to use the "box counting method" to estimate the fractal
  index; it is the simplest one.  It is, however, for binary images.
                                               Mazzone Fabio

RGB space Vectors
  Calculate the pixel color differences as RGB space vectors.  Then get a
  3 dimensional standard deviation of those vector differences.

Kevin Myers wrote...
  Some other things that you might consider are various color reduction
  algorithms (including simple level reductions) and blurring the images or
  lowering their resolution prior to comparison.  By reducing the amount of
  detail in the images, you will increase your chances of being able to
  detect a match in images that have minor variations.

-------------------------------------------------------------------------------
Known Image Comparison Applications Available..
Mail me if you discover anything more...

ImageMagick Compare
  This gives user access to the ImageMagick "compare()" functions.

    compare image1 image2 x: &

  The unchanged image is given as a dimmed image in the background, while
  all the changes are highlighted in a red tint.

  You can get an actual difference value using the -metric option

    compare -metric MAE image1 image2 null:

  There are a number of different metrics to choose from.  With the same set
  of test images (mostly the same) I produced the following results...

    _metric_|____result_______|__black_vs_white__
     MAE    | 137.478 dB      | 65535 dB
     MSE    | 4.65489e+06 dB  | 4.29484e+09 dB
     PSE    | 63479 dB        | 65535 dB
     PSNR   | 29.6504 dB      | 0 dB
     RMSE   | 2157.52 dB      | 65535 dB

    result:          a compare of an image with large JPEG-like differences
    black vs white:  a compare of a solid black versus a solid white image

  The dB is a logarithmic scale.  For every metric except PSNR, 0 = no
  differences.

  The e+06 is scientific notation, giving how many places to shift the
  decimal point.   EG: 4.65489e+06 --> 4654890

  Notice that PSNR runs the opposite way to the other metrics: larger values
  mean closer images, and identical images have no finite PSNR at all.  The
  0 dB for black versus white is therefore the worst possible score, not a
  match.
  However, for actual images it can produce a very good indicator.

  I have NOT figured out the "-define" options to the "compare" function.

ImageMagick Difference...
  Using the Alpha Composite "difference"... (images must be the same size!)

  This is exactly the same as using "compare", but without the heavily
  tinted copy of the original images.  The differences can be more
  colorful, and black in the result means no difference.

  First a straight difference image

    convert image1 image2 -compose difference -composite x: &

  If the images are similar you will probably hardly see any differences.
  If the images are very different you will see a huge range of colors.

  What is interesting is that if the two images are similar but slightly
  offset, you will get a mostly black image with all the edges glaringly
  white.  Almost (but not quite) like the result of edge detection.
  Possibly useful in automatic attempts to 'align' the offset images.

  For similar images (mostly black), normalize the result to see what areas
  are different

    convert image1 image2 -compose difference -composite -normalize x: &

  Basic statistics

    convert image1 image2 -compose difference -composite miff:- |\
      identify -verbose - |\
        sed -n '/statistics:/,/^ [^ ]/ p'

  The numbers in parentheses (if present) are normalized values between
  zero and one, so that they are independent of the Q level of your IM.

  Reduce it to an average percentage above zero

    convert image1 image2 \
            -compose difference -composite -fx '(r+g+b)/3' miff:- |\
      identify -verbose - |\
        sed -n '/^.*Mean: */{s//scale=2;/;s/(.*)//;s/$/*100\/32768/;p;q;}' | bc

  With the same two test images I used for "compare" I had a result of...
    .55
  As you can see it is a VERY low number, so the images are very similar.

  For an example of making use of difference images such as those generated
  by the above see...
    http://ecocam.evsc.virginia.edu/change_detection_description/change_detection_description.htm

Programmed use of 'Compare()' in PerlMagick (see below)

GQview
  Has a built-in "Similar" function which in later GQview versions is
  available to users under GUI control (it is NOT accessible from the
  command line, so the process cannot be automated).

  The function, called "similar()" in the source, consists of just the
  averaged difference between the individual red, green and blue pixel
  values (a flat comparison of pixel values).  It also inverts the result
  so that 1.0 = exact.

  However it is a great tool for comparing images, as you can generate an
  ordered collection of images such that similar images are next to each
  other, allowing you to 'flip' between the two to see the changes.

  I have yet to find another GUI tool that will preserve the order of images
  from multiple directories so simply.

  Geeqie is its successor, with similar capabilities.

-------------------------------------------------------------------------------
Comparing difference in pixel colors of two images

Results to be expected of image difference comparing methods

NOTE: grey is exactly half of white and black (RGB = .5,.5,.5)

  image1   image2    flat comp   RGB squared   RGB space
  -------------------------------------------------------------
  white    white     0.00000     0.000000      0.000000
  white    black     0.99999     0.999999      0.999999
  red      black     0.33333     0.333333      0.577350
  red      blue      0.66667     0.666667      0.816497
  white    grey      0.50000     0.250650      0.500000

  flat comp
     Average difference in the separate RGB values, divided by 3
     This is used by the internal GQview "Similar" image comparator

  RGB squared
     The squared RGB differences.
     This is used by the compare() API of ImageMagick

  RGB space
     As "RGB squared" but then the square root is taken to return to the
     most correct RGB space distance (linear, rather than squared)

Each of these is applied between each pixel of the two images, then the
results are averaged over the area of the image being compared.

What is interesting is the dramatic drop in the "RGB squared" value in the
last row, white to grey (this result is also true for all pure primary and
secondary colors in RGB space).  As most colors in an image aren't a primary
or secondary color, most colors when compared to a pure grey image will have
values less than 25%.  As such this comparison seems to make images match a
solid grey image much more than the other two methods.

The comparison of images in this way can also be improved by returning not
only the average of the compared pixels, but also the standard deviation.
A low deviation, even when the average pixel difference is high, could mean
the image is actually the same but with some general global color
modification (the image is darker, or lighter, or redder overall).

Two methods missing from the above are the average color vector difference,
and an average color difference.

  Average Color Vector Difference (the Tint)
     This is essentially the "tint" that was applied to two images that are
     otherwise similar.  It is an average of all the red, green, and blue
     differences, which are returned separately, and each may be a negative
     color change.

  Average Color Difference.
     This is the average of the color vector lengths.  It requires a square
     root to be applied for each pixel comparison, before the resulting
     value is averaged.

With both of the above applied the "RGB space" error return is very similar
to a standard deviation (or error distortion) of the Average Color Difference
value (though with a per pixel, rather than a per value, division before the
final square root).

-------------------------------------------------------------------------------
Taking "tint" into account.
None of the above, however, takes a 'tint' change between two images into
account.

For example consider two images, one a modern-looking photo, and the other a
faked 'yellowed' photograph style of the original image.  All the above
methods will generally produce a large difference between the two images.

To take this sort of thing into account, you really need to calculate the
"Average Color Vector Difference" (see above) and then "tint" the first
image to remove it from the equation.  You can then do the comparison as
before, to get a better comparison value between the two images.

-------------------------------------------------------------------------------
Comparison Function Trials (ImageMagick)...

ImageMagick Compare()

Example script
=======8<--------
#!/usr/bin/perl
#
# Return Compared differences between two images.
#
use Image::Magick;

$i1 = Image::Magick->new;
$i2 = Image::Magick->new;

$i1->Read( filename=> shift );     # Read Images
$i2->Read( filename=> shift );

$i1->Scale(geometry=>"32x32\!");   # Convert to thumbs
$i2->Scale(geometry=>"32x32\!");

$i3 = $i1->Compare(image=>$i2, metric=>'MSE');  # Compare
die "Compare Failure\n" unless @$i3;            # size mis-match?
printf "Errors is %f\n", $i3->Get('error');
=======8<--------

This function does a straight numerical comparison of the two images,
using the RGB space distance squared.

"mean-error" returned the difference as a value from 0.0 to 1.0

Problems with method....
  * only images of the same size can be compared
  * SPAM images added to the image can really make a big difference
    in one small area of the image
  * cropped images often do not compare well
  * it will error if any transparency is present in the image

To solve the first problem, my final solution thumbnails the images being
compared to 32x32 pixels regardless of their size.

The second required more work.  I ended up capturing 5 areas of the image
(corners and middle) and comparing them separately.
I then discard the 2 most different regions before averaging the results of
the remaining three regions.

My current program now compares two directories of images to each other,
showing the images it considers to be "related" in some way.  Even minor
cropping of an image will usually come out as irrelevant to the result!
Email me if you would like to see it.

As ImageMagick is currently using "RGB squared" for comparison, a solid
grey image seems to closely match a lot of images you wouldn't think
matched.  I have seen this result in my image comparisons so far.  :-(

-------------------------------------------------------------------------------
Detecting spam text or other additions to a photo image

Build a Histogram and look for single color spikes

  A photo of a person will have a more-or-less continuous histogram
  while a photo with overlaid computer graphics and text will have a
  spiky one.  Sorry I don't have data to support this assertion.

  Clearly experimental, but images happen, and even a marginal
  ***Spam Analyzer*** would be quite valuable.
                                               Glenn Randers-Pehrson

  I would add that a spike is only valid if a spectrum of other colors is
  present.  Otherwise you will also match titles, buttons, and other images
  with only a few colors (icons) or diagrams.

-------------------------------------------------------------------------------
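Sketch of the histogram spike test

  A minimal sketch of the spike idea above, written in Python rather than
  the shell and Perl used elsewhere in these notes.  It is untested on real
  photos.  The function name find_color_spikes and both thresholds
  (spike_factor, min_distinct) are assumptions made for illustration, not
  tuned values.  Pixels are assumed to be already decoded into a flat list
  of (r, g, b) tuples by whatever image library you use.

```python
from collections import Counter
from statistics import median


def find_color_spikes(pixels, spike_factor=20.0, min_distinct=64):
    """Return the colors whose pixel count towers over a typical bin.

    pixels       -- iterable of hashable colors, e.g. (r, g, b) tuples
    spike_factor -- assumed threshold: a spike has more than this many
                    times the median bin count
    min_distinct -- assumed threshold: images with fewer distinct colors
                    (icons, buttons, diagrams) are never reported
    """
    hist = Counter(pixels)

    # A spike only counts if a spectrum of other colors is present.
    if len(hist) < min_distinct:
        return []

    typical = median(hist.values())
    return sorted(c for c, n in hist.items() if n > spike_factor * typical)


if __name__ == "__main__":
    # Synthetic "photo": a smooth grey ramp, every level used 4 times.
    photo = [(v, v, v) for v in range(256)] * 4
    # The same photo with 500 pixels of flat yellow "spam" overlaid.
    spammed = photo + [(255, 255, 0)] * 500
    # A two-color icon, which must not be reported.
    icon = [(0, 0, 0)] * 1000 + [(255, 255, 255)] * 1000

    print("photo  :", find_color_spikes(photo))
    print("spammed:", find_color_spikes(spammed))
    print("icon   :", find_color_spikes(icon))
```

  Using the median as the "typical" bin keeps one huge spike from raising
  its own threshold.  Real JPEG photos would probably need their colors
  reduced first so that near-identical shades fall into the same bin.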