Real-Time Sign Language Detection

This was my next ML project after the Depression Detection project. I completed it in a group of three as part of my Machine Learning class. I was curious to try my hand at computer vision and thought learning some sign language on the side could be fun as well. The result was a program that uses my laptop's webcam to detect several American Sign Language gestures in real time using OpenCV and TensorFlow's Object Detection API. This project was completed in November 2021.

Data Collection

To collect our data, I ran a quick loop that snapped pictures of each gesture with my webcam. In total, the project uses 18 different gestures: letters A-G, numbers 1-5, 'hello', 'yes', 'no', 'thank you', 'i love you', and 'birds up' (also known as the 'hang loose' or shaka sign, and the official UTSA hand sign). We recorded 15 images for each gesture, ending up with a total dataset of 270 images.
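For anyone curious what that capture loop looks like, here is a minimal sketch using OpenCV's VideoCapture. The folder layout and the subset of label names shown here are placeholders for illustration, not necessarily what we used:

```python
# Sketch of a webcam capture loop for collecting gesture images.
# The image directory and label names below are hypothetical.
import os
import time
import uuid

import cv2

LABELS = ['hello', 'yes', 'no', 'thankyou', 'iloveyou', 'birdsup']  # subset for illustration
IMAGES_PER_LABEL = 15
IMAGE_DIR = os.path.join('workspace', 'images', 'collectedimages')  # placeholder path

cap = cv2.VideoCapture(0)  # open the default webcam
for label in LABELS:
    os.makedirs(os.path.join(IMAGE_DIR, label), exist_ok=True)
    print(f'Collecting images for {label}')
    time.sleep(5)  # time to get the gesture ready
    for _ in range(IMAGES_PER_LABEL):
        ret, frame = cap.read()
        if not ret:
            continue
        filename = os.path.join(IMAGE_DIR, label, f'{label}.{uuid.uuid1()}.jpg')
        cv2.imwrite(filename, frame)
        cv2.imshow('frame', frame)
        # waitKey both renders the window and pauses ~2 s between snapshots
        if cv2.waitKey(2000) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()
```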

To obtain labels for our images, we used the LabelImg graphical annotation tool. This is a standalone Qt-based app that lets us draw a bounding box around the gesture in each image and tag it with the correct class, which is saved as a Pascal VOC XML annotation alongside the image. After drawing and labeling a box for every image in our dataset, the data was ready to be used.
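Besides the annotations, the Object Detection API also needs a label map that assigns an integer ID to each class. A quick sketch of generating one for our 18 gestures might look like the following (the output path and the exact class name strings are assumptions):

```python
# Sketch of writing a label_map.pbtxt for the 18 gestures.
# The file path and exact class names are hypothetical.
GESTURES = (
    [chr(c) for c in range(ord('A'), ord('G') + 1)]   # letters A-G
    + [str(n) for n in range(1, 6)]                   # numbers 1-5
    + ['hello', 'yes', 'no', 'thankyou', 'iloveyou', 'birdsup']
)

with open('label_map.pbtxt', 'w') as f:
    for idx, name in enumerate(GESTURES, start=1):    # IDs must start at 1
        f.write('item {\n')
        f.write(f"  name: '{name}'\n")
        f.write(f'  id: {idx}\n')
        f.write('}\n')
```

The XML annotations themselves are then typically converted into TFRecord files with a generate-tfrecord style script before training.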

Left: marking a border for the 'Thank you' gesture within LabelImg. Right: labelling the 'thank you' image.

Training and Testing the Model

With our data neatly labeled, we were ready to train the model. For this project, I used transfer learning, making use of the pre-trained SSD MobileNet V2 model available through TensorFlow's Object Detection API. This is an object detection model pre-trained on the MS-COCO dataset, which contains over 300,000 images. Starting from this model, we adapted it to our problem by fine-tuning it on our sign language dataset.
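Concretely, adapting the pre-trained model mostly comes down to editing its pipeline config: setting the number of classes to 18, pointing it at our label map and TFRecords, and telling it which checkpoint to fine-tune from. A rough sketch of that step using the API's config utilities, with hypothetical file paths, could look like this:

```python
# Sketch of adapting the pre-trained SSD MobileNet V2 pipeline config
# for our 18 gestures. All file paths here are placeholders.
from object_detection.utils import config_util

PIPELINE_CONFIG = 'models/my_ssd_mobnet/pipeline.config'

configs = config_util.get_configs_from_pipeline_file(PIPELINE_CONFIG)

configs['model'].ssd.num_classes = 18                 # our gesture classes
configs['train_config'].batch_size = 4                # small batch for a laptop
configs['train_config'].fine_tune_checkpoint = (
    'pre-trained-models/ssd_mobilenet_v2_320x320_coco17_tpu-8/checkpoint/ckpt-0')
configs['train_config'].fine_tune_checkpoint_type = 'detection'
configs['train_input_config'].label_map_path = 'annotations/label_map.pbtxt'
configs['train_input_config'].tf_record_input_reader.input_path[:] = [
    'annotations/train.record']

pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, 'models/my_ssd_mobnet')
```

From there, training is kicked off with the API's model_main_tf2.py script pointed at this config.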


After training the model for over 20,000 steps, we ended up with a classification loss of 0.026, a regularization loss of 0.071, and a total loss of 0.109. This indicates that our model is fairly good not only at detecting that a gesture is present on screen, but also at identifying which gesture it is. We could then use the model to recognize gestures in real time. Running the program, we found accuracy values of around 89-96% across all gestures, and we measured a precision of 76.9% and a recall of 80.2% for the detection boxes, indicating fair detection performance.
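The real-time part is essentially a loop: grab a frame from the webcam, run it through the trained detector, and draw the predicted boxes and labels back onto the frame. A minimal sketch of that loop, assuming the fine-tuned model has been exported as a SavedModel and using placeholder paths and thresholds:

```python
# Minimal sketch of the real-time detection loop. Paths and the score
# threshold are hypothetical; they are not taken from the project itself.
import cv2
import numpy as np
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils

detect_fn = tf.saved_model.load('exported-model/saved_model')
category_index = label_map_util.create_category_index_from_labelmap(
    'annotations/label_map.pbtxt')

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Exported detection models expect a batched uint8 image tensor
    input_tensor = tf.convert_to_tensor(np.expand_dims(frame, 0), dtype=tf.uint8)
    detections = detect_fn(input_tensor)

    boxes = detections['detection_boxes'][0].numpy()
    classes = detections['detection_classes'][0].numpy().astype(np.int64)
    scores = detections['detection_scores'][0].numpy()

    # Draw boxes and class labels directly onto the frame
    viz_utils.visualize_boxes_and_labels_on_image_array(
        frame, boxes, classes, scores, category_index,
        use_normalized_coordinates=True,
        min_score_thresh=0.8)  # only draw confident detections

    cv2.imshow('Sign language detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```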


I was happy with the results we found in our project. With the limited resources available, we managed to create a well-performing, easily adaptable real-time sign language detection program. To further improve performance, we could collect more data under different lighting conditions and camera angles to make the model more robust to noise. In the end though, this was a great project for introducing myself to computer vision and object detection. I learned about the importance of precision, recall, and good-quality data in tasks like these.

To run the program for yourself and play with the model, check it out on my GitHub: https://github.com/simonrendon/Real-time-sign-language