Computer vision is easier than it seems thanks to libraries like OpenCV.

Here we will take a look at a previous assignment of mine, which implemented a rather rudimentary algorithm for detecting a group of hand signs.

First, let’s define what it means to recognize an object. In the context of this project, we’ll define it as successfully distinguishing hand signs from a given image or video feed and properly identifying the specific name of the hand sign if one exists.

This means that our program needs to complete essentially two main tasks:

  1. Distinguish signs
  2. Identify signs

Distinguishing objects from their surroundings is something we as humans do really well. Automating it, however, seems rather confusing unless we know exactly what we are looking for. Luckily, we have a fairly good idea of what to look for! We know that a hand sign is made with a hand, which is an extension of a human being. It turns out that, despite vast ethnic differences, humans share a narrow range of skin colors, which can be used to distinguish human body parts from the background.

Hand Sign Recognition

To distinguish objects, we apply a threshold function that filters out everything that is not skin colored. We do this by checking every pixel’s RGB value and creating a black & white image in which white pixels mark possible skin and black pixels mark the background. Thanks to prior research[1][2], we also know which RGB values to look for:

  • Red > 95
  • Blue > 20
  • Green > 40
  • max(Red, Green, Blue) – min(Red, Green, Blue) > 15
  • abs(Red – Green) > 15
  • Red > Green
  • Red > Blue
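These rules can be sketched as a vectorized NumPy function (a sketch, not the exact code from the assignment; it assumes the image array is in RGB channel order, whereas OpenCV’s `imread` returns BGR):

```python
import numpy as np

def skin_mask(image):
    """Return a binary mask (255 = possible skin, 0 = background).

    `image` is assumed to be an (H, W, 3) array in RGB order; note that
    OpenCV loads images as BGR, so the channels would need swapping.
    """
    img = image.astype(np.int16)  # widen so subtractions don't wrap around
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    rules = (
        (r > 95) & (g > 40) & (b > 20)
        & (img.max(axis=-1) - img.min(axis=-1) > 15)
        & (np.abs(r - g) > 15)
        & (r > g)
        & (r > b)
    )
    return rules.astype(np.uint8) * 255
```

Applied to every frame, this yields the black & white image the next steps work on.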

Once we have a thresholded image, we can have OpenCV draw contours around the skin colored objects. With the contours drawn, we have successfully distinguished objects that could plausibly be a body part such as a hand.

After that, we keep a limited number of only the biggest contours and discard the rest. Since we repeat all of these steps many times per second, we need to be conscious of our system resources and execution time. Considering only the biggest contours lets us focus on the objects most likely to be hand signs: they tend to be closer to the camera than background objects, so their contours are bigger.

Each remaining contour is then compared to every pre-computed hand sign contour in our system. Before we start processing images or a video feed, we must process sample images for the hand signs we’d like to recognize. Below are processed images for some of the hand signs used in our system.

[Processed sample images: the Paper sign and the OK sign]

The comparison is done with OpenCV’s matchShapes function. The result is a number representing the dissimilarity of two contours: the closer it is to zero, the more similar the contours are. In our case, we experimentally determined acceptable dissimilarity values that yielded reliable results; there are also far more rigorous theoretical methods for choosing them.

Different hand signs are inherently similar to one another, which can lead to false positives (e.g. identifying a thumbs up when it is in fact a high five). We alleviate this by picking the least dissimilar identification that is also below its predetermined acceptance value.
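The selection rule can be sketched in plain Python (the sign names and threshold numbers are illustrative; the scores would come from matchShapes, lower meaning more similar):

```python
def identify(scores, thresholds):
    """Return the sign whose dissimilarity is lowest AND under its cutoff.

    `scores` maps sign name -> matchShapes dissimilarity for one contour;
    `thresholds` maps sign name -> experimentally chosen acceptance value.
    Returns None when no sign qualifies.
    """
    accepted = {name: s for name, s in scores.items() if s < thresholds[name]}
    if not accepted:
        return None
    return min(accepted, key=accepted.get)

# Both signs score under their cutoffs, but the least dissimilar wins:
best = identify({"thumbs_up": 0.30, "high_five": 0.10},
                {"thumbs_up": 0.50, "high_five": 0.20})  # -> "high_five"
```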

Here is example code with in-line comments; it walks through the procedures outlined above step by step.

[1]Vezhnevets, Vladimir, Vassili Sazonov, and Alla Andreeva. “A survey on pixel-based skin color detection techniques.” Proc. Graphicon. Vol. 3. 2003.

[2]Kakumanu, Praveen, Sokratis Makrogiannis, and Nikolaos Bourbakis. “A survey of skin-color modeling and detection methods.” Pattern Recognition 40.3 (2007): 1106-1122.

Special thanks to Professor Margrit Betke, Ajjen Joshi and my teammates Maria Kromis and Abesary Woldeyesus.