
Feature Extraction Using RGB Colorspace

More often than not, the only access to certain spatial data is via an article or a PDF report. If the authors are kind enough, they'll publish a rich PDF which includes spatial data, but that is seldom the case. Take this KKL-JNF map delimiting the 2010 Carmel Forest fire:

How can one extract this polygon's outline?

The obvious solution is to digitize it: georeference the map, create a new polygon vector layer, and trace the red outline. While this method is pretty fast in most cases, it can be time-consuming for more complex polygons.

The solution I offer here is one that I've perfected over years of extracting data.

The main factor in extracting the polygon is the color. We need a method that "knows" it's seeing the color red, without any machine vision.

We do that by using the RGB colorspace.

In short: any color image is basically 3 greyscale images, one each for red, green and blue. In the most common 8-bit representation, each greyscale image ranges from 0 to 255, with 0 being complete black (or in some cases no data) and 255 being white. Mix these images together into a multi-band image, and you've got yourself an RGB image. (0,0,0) is black, (255,255,255) is white, (242,187,102) is Mac n' Cheese, (255,0,0) is red, and all the other 16.7 million or so colors are combinations of these 3.
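To make the band idea concrete, here is a minimal Python sketch that splits a handful of RGB pixels into three greyscale bands and stacks them back together. The pixel list and variable names are made up for illustration; a real image would be read with an imaging library or a GIS, which this sketch deliberately avoids.

```python
# A tiny "image": a flat list of (R, G, B) pixels, each channel in 0-255.
pixels = [
    (0, 0, 0),        # black
    (255, 255, 255),  # white
    (242, 187, 102),  # mac n' cheese
    (255, 0, 0),      # red
]

# Split into three greyscale bands, one per channel.
red_band = [p[0] for p in pixels]
green_band = [p[1] for p in pixels]
blue_band = [p[2] for p in pixels]

# Stacking the three bands back together restores the RGB pixels.
restacked = list(zip(red_band, green_band, blue_band))

# Number of distinct colors with 8 bits (256 levels) per band.
total_colors = 256 ** 3

print(red_band)
print(restacked == pixels)
print(total_colors)
```

Note that white and red both score 255 in the red band, so a single band cannot tell them apart.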

Back to the image above: the outline is red, which makes it very easy to extract. In fact, the more basic the color, the easier it is. But the red isn't perfect. Due to JPG compression, the pixelated edges vary in how red they are:

Here's the image's Red band:

Red obviously has a high value, close to 255, but so do all other light hues like pink, light gray and, of course, white.

In order to extract the red, we need to use the chromatic coordinates in the RGB colorspace, where: X = red, Y = green and Z = blue.

By using the formula R/(R+G+B), I am able to extract red hues from the image. Any pure red pixel will be 1 (255/(255+0+0)), near-red pixels will be close to 1, and all other pixel values across the image will be distributed more or less around 0.33. Note that I chose to divide red by the sum of all 3 bands because the polygon is red. I could do the same for green or blue, as well as any other combination, as long as I choose the dominant base color. If the color we need to extract sits on an edge of the RGB colorspace and is equidistant from two of the base colors, like cyan, magenta or yellow, we can convert the image to CMYK and proceed with the extraction in much the same way.
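As a rough illustration of the formula, the sketch below computes R/(R+G+B) for a few sample pixels in plain Python. The function name and the sample colors are my own, and returning 0.0 for pure black (where the sum is zero) is an assumption, since the formula is undefined there.

```python
def red_chromaticity(r, g, b):
    """Return R / (R + G + B) for one pixel, or 0.0 for pure black."""
    total = r + g + b
    if total == 0:
        # Black has no defined chromaticity; 0.0 is an arbitrary choice here.
        return 0.0
    return r / total


# Hypothetical sample pixels, not taken from the actual map.
samples = {
    "pure red": (255, 0, 0),
    "near red": (250, 10, 10),
    "white": (255, 255, 255),
    "mid grey": (128, 128, 128),
    "pink": (255, 192, 203),
}

for name, rgb in samples.items():
    print(f"{name:9s} {red_chromaticity(*rgb):.2f}")
```

Pure red comes out as exactly 1, while white and grey land on one third regardless of how bright they are, which is what separates red from the other light hues.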

After calculating the chromatic red value from all 3 bands, the result highlights all the red pixels in the original map:

Now we can clearly see how the reds have a very high value, and all other pixels are 30%-40% grey. I can ignore the artifacts around the text labels; as I mentioned, they are the result of the JPG compression.

Now comes the trial-and-error part of the process: finding the value range that represents only the specific pixels we need. In this case, it's pretty easy: X > 0.95. In some other cases I've dealt with, the range was so narrow that even 0.01-wide value ranges would cause false positives and omission errors.

You can see in the raster histogram window that the mean value is 0.33, with the majority of values distributed around it, and that there is a small percentage of values close to 1. These are the red pixels we're after.
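The same check can be done by hand. The sketch below bins a short list of chromatic-red values and reports the share above a candidate cut-off; the values are invented for illustration and the `histogram` helper is hypothetical, not a GIS function.

```python
# Hypothetical chromatic-red values for a tiny raster, flattened to a list.
values = [0.33, 0.34, 0.33, 0.31, 0.36, 0.33, 0.97, 1.0, 0.33, 0.35]


def histogram(vals, bins=10):
    """Count values into equal-width bins covering 0.0 to 1.0."""
    counts = [0] * bins
    for v in vals:
        # Clamp so that a value of exactly 1.0 falls in the last bin.
        index = min(int(v * bins), bins - 1)
        counts[index] += 1
    return counts


counts = histogram(values)

# Share of pixels above the candidate cut-off of 0.95.
share_above = sum(v > 0.95 for v in values) / len(values)

print("bin counts:", counts)
print("share above 0.95:", share_above)
```

Two well-separated clusters, one near 0.33 and one near 1, suggest that a single cut-off will do; a smeared histogram means more trial and error.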

A basic conditional statement later and we've binarized the raster to reds and others:
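The conditional can be sketched in a few lines of plain Python. This is only an illustration of the logic, assuming the 0.95 cut-off from above and a made-up 3x3 patch of pixels; in a GIS the same test would be typed into a raster calculator.

```python
THRESHOLD = 0.95  # cut-off taken from the text; tune it per image


def binarize(rgb_rows, threshold=THRESHOLD):
    """Map each (R, G, B) pixel to 1 if R/(R+G+B) exceeds threshold, else 0."""
    result = []
    for row in rgb_rows:
        out_row = []
        for r, g, b in row:
            total = r + g + b
            # Assumption: black pixels (sum of zero) count as "other".
            share = r / total if total else 0.0
            out_row.append(1 if share > threshold else 0)
        result.append(out_row)
    return result


# Hypothetical patch: a red diagonal stroke on white, plus one pink pixel.
WHITE, RED, PINK = (255, 255, 255), (254, 3, 2), (255, 192, 203)
patch = [
    [RED, WHITE, WHITE],
    [PINK, RED, WHITE],
    [WHITE, WHITE, RED],
]

mask = binarize(patch)
for row in mask:
    print(row)
```

Only the three red pixels end up as 1; pink and white both fall well below the cut-off.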

The next step is to vectorize this image by the raster value: 1 for red, 0 for all other pixels. This produces a polygon feature spanning the extent of the original image.
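Under the hood, vectorizing by value means grouping touching pixels that share a value into one feature. The sketch below shows only that grouping step on a small made-up mask, using a breadth-first flood fill; it does not build polygon geometry, and the choice of 4-connectivity (edge neighbors only) is an assumption, since a GIS tool may treat diagonal neighbors differently.

```python
from collections import deque


def label_regions(mask):
    """Group 4-connected cells of value 1 into numbered regions (1, 2, ...)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or labels[r][c]:
                continue  # background, or already part of a region
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not labels[ny][nx]:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels, next_label


# Hypothetical binarized raster: 1 for red, 0 for everything else.
mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]

labels, count = label_regions(mask)
print(count)
for row in labels:
    print(row)
```

Each numbered region corresponds to one candidate feature, which is why stray red specks from labels or compression noise show up as many small polygons next to the outline we want.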

At this point, we can manually select the outline feature we are seeking. The result is this polygon layer, full of holes, irregularities, and two gaps where the labels overlapped.

If there are any gaps in the shape of the feature, like the ones caused by the labels, we must close these gaps with two simple polygon features, using the base map as a guide.

Now we need to merge all the little polygons together, as well as the doughnut fill itself. To do this, we first convert the polygon features into line features. This has the added benefit of eliminating all the irregularities caused by the raster vectorization, as well as encircling the doughnut fill with lines.

Next, we convert the line features back into polygons. This operation forms a polygon "inside" any outline, including the doughnut.

The last step is to dissolve all the features into one polygon:

And there you have it: from a print map to a vector feature in just a few steps.

The end result can be further refined with a negative buffer of half the outline width, and even with some smoothing to hide any remaining small kinks.
