SmileDetector: Detecting and Rating Smiles

Introduction

Smile! Ever wonder if machine learning models can recognize facial features the way humans can? Well, here’s your answer! Project director Noah Cristino and his team created a multi-step approach for detecting smiles. Rather than using a standard CNN, the developers significantly reduced the model’s input size using a novel approach. The result is a model that performs smile classification with high accuracy in a fraction of the time of comparable models. Its name? SmileDetector!

SmileDetector has the potential to improve consumers’ quality of life in a variety of settings, including engagement analysis, marketing, and photography. The feature can be integrated into camera apps to detect smiles, helping photograph children and babies, who struggle to hold a smile for long. If this technology is extended to detect a wider range of facial expressions in the future, it has massive potential for rating user engagement on video streaming platforms.

Smile detection with lip keypoints

Framework

Dataset

The GENKI-4K dataset compiled by the MPLab at UC San Diego was used to train the model. It consists of 4,000 images, each containing a face that is either smiling or not smiling, tagged with 1 = smiling, 0 = not smiling.

GENKI-4K Dataset
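As a rough illustration of the label format, the sketch below reads a labels file where the first field on each line is the 1/0 smile tag. The file name and layout here are hypothetical; the actual GENKI-4K distribution may be organized differently.

```python
from pathlib import Path

# Hypothetical layout: one line per image, where the first field is the
# smile tag (1 = smiling, 0 = not smiling). The real GENKI-4K files may
# be organized differently.
def load_labels(label_file="labels.txt"):
    return [int(line.split()[0])
            for line in Path(label_file).read_text().splitlines()]
```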

Preprocessing

A program takes the input image, resizes it, and converts it to grayscale. The image is then passed into the DLib model, which generates facial keypoint vectors. The vector along the bridge of the nose is used to rotate the keypoints so that the nose bridge is perfectly vertical. The keypoints are then normalized to account for the face appearing at a different size in each image. Finally, the keypoints around the mouth are collected into a list, which is passed into the SVC model to detect whether the image contains a smile or not.

Face Keypoints
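A minimal sketch of this preprocessing pipeline is shown below, assuming DLib’s standard 68-point landmark predictor (nose-bridge points 27–30, mouth points 48–67). The function name, file names, image size, and exact normalization are illustrative assumptions rather than the project’s own code.

```python
import cv2
import dlib
import numpy as np

# DLib's pretrained face detector plus its standard 68-point landmark
# predictor (the .dat file is distributed separately on dlib.net).
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def mouth_features(image_path, size=256):
    # Resize the input image and convert it to grayscale.
    img = cv2.resize(cv2.imread(image_path), (size, size))
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    faces = detector(gray)
    if not faces:
        return None  # no face found
    shape = predictor(gray, faces[0])
    pts = np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float64)

    # Rotate the keypoints so the nose bridge (points 27-30 in the
    # 68-point scheme) is perfectly vertical.
    dx, dy = pts[30] - pts[27]
    angle = np.arctan2(dx, dy)
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    pts = (pts - pts.mean(axis=0)) @ rot.T

    # Localize: rescale so face-size differences between images wash out.
    pts /= np.linalg.norm(pts, axis=1).max()

    # Collect the mouth keypoints (48-67) into a flat feature list.
    return pts[48:68].flatten()
```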

Architecture

Network Structure

The DLib face-detection model uses a Histogram of Oriented Gradients (HOG) feature combined with a linear classifier, an image pyramid, and a sliding-window detection scheme, as described in this paper. This model is pretrained and provided in the DLib Python library. The SVC model was trained on lists of keypoint vectors as input data, mapped to boolean labels indicating smiling or not smiling.

Model Architecture
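A minimal sketch of the classifier stage using scikit-learn is shown below. The feature arrays, file names, train/test split, and linear kernel are all assumptions; the source does not specify the SVC’s hyperparameters.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# `features` is an (N, D) array of flattened mouth-keypoint coordinates
# from the preprocessing step; `labels` holds 1 = smiling, 0 = not
# smiling. The file names here are hypothetical.
features = np.load("features.npy")
labels = np.load("labels.npy")

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42)

clf = SVC(kernel="linear")  # kernel choice is an assumption
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```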

Additional Features

In addition to the basic smile detector, the same model was used to build a video-parsing feature. With it, a user can upload a video and receive back an edited version containing only the sections where someone is smiling. Rather than processing the entire video, the feature analyzes keyframes to check for smiles, which makes for faster processing. Using this technique, large video files can be processed in a relatively short amount of time, increasing the usefulness of the model.

Video parsing tool
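The sketch below shows one way such keyframe-based parsing could work, using OpenCV to sample every Nth frame as a stand-in for true keyframe extraction. The `is_smiling` callback, which would wrap the landmark extraction and SVC prediction above, and the sampling interval are hypothetical.

```python
import cv2

# Sketch of keyframe-style parsing: classify one frame per `step` frames
# and collect the time spans where a smile is detected.
def smiling_segments(video_path, is_smiling, step=15):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    segments, start, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            if is_smiling(frame):
                if start is None:
                    start = idx / fps              # smiling span begins
            elif start is not None:
                segments.append((start, idx / fps))  # smiling span ends
                start = None
        idx += 1
    if start is not None:
        segments.append((start, idx / fps))
    cap.release()
    return segments  # list of (start_sec, end_sec) smiling spans
```

Sampling only every `step` frames is what keeps processing time low: classification cost scales with the number of sampled frames rather than the full frame count.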

Conclusion

Our project resulted in a model that is much faster than competing smile-detection models while maintaining high accuracy. This speed allowed us to perform live video analysis at up to 720 FPS, making it possible to analyze webcam footage in real time and to use the model for applications that were previously impractical with competing models. Furthermore, because the model is small, we can analyze higher-resolution cameras in real time and run it alongside other models, generating additional data that can be combined with the user’s emotion for analyzing engagement. Reducing the resources the model requires has allowed us to use this technology in new ways and outperform existing models.