As the capabilities of devices increase, so do the boundaries of what those devices can be used for. A recent DSTIL project required an application that would detect a human face within a selfie image and apply a face-altering mask over the top, all within a very short timeframe. The application had additional features, but they have no bearing on the face detection aspects discussed here. All processing had to occur on the device rather than on an external server.
The application targeted phones and tablets on both Android and iOS. Given the project time frame, the target devices and the graphical requirements of the application, we chose to develop in Unity 3D. This allowed us to deploy to both platforms from a single project and to include the required visual effects with relative ease. We also decided to make use of the extensive library of available Unity plugins to reduce development time wherever possible.
As with all projects, there were a number of issues and pain points encountered during development. This post is intended to highlight those encountered regarding the implementation of the image processing and face detection libraries within a mobile application and the way in which they were resolved in this scenario. The issues identified include the initial implementation of OpenCV within the project, dealing with hardware capabilities/restrictions, the accuracy required of the face detection itself as well as the specific modifications that had to be applied to the images.
The first stop for most people doing image processing is OpenCV. It gives you (relatively) free rein over what you want to accomplish through image processing. To use it within Unity you have two potential methods: install and configure OpenCV by hand, or purchase a Unity plugin that wraps it.
Both of these methods give you full access to OpenCV within Unity and its range of capabilities. The significant differences between the two are time and price. Installing by hand is the cheaper option but requires you to run through the full setup process whereas for $95 USD the Unity plugin gives you full access to everything straight away. As this project was on a tight schedule we opted to go with the paid plugin.
When targeting face detection in particular, there are a few methods that can be used. Two were trialed during this project: a Haar Cascade Classifier and the Dlib Face Landmark Detector.
We tried a Haar Cascade Classifier first and found that it worked quite well for detecting faces within images. Its weakness became apparent when it could not reliably detect eyes within those same images. It also did not seem well suited to mobile use, as it processed the images fairly slowly.
In this scenario we also needed to place an image mask directly over the user's face, so more specific details of the detected face were required in order to line up the two images correctly. That level of detail can be achieved with the Dlib Face Landmark Detector, which retrieves 68 points around the face to map out its features.
This allows matching the points detected from the image masks directly with the corresponding points detected in the faces.
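To make the point matching concrete, here is a minimal sketch of one way to line up mask landmarks with face landmarks. It assumes a similarity transform (uniform scale, rotation and translation) fitted by least squares; the project itself may well have used a different warp. The function names `fit_similarity` and `apply_similarity` are hypothetical, and the sketch uses plain Python rather than the Unity plugin API.

```python
def fit_similarity(src_points, dst_points):
    """Fit a least-squares similarity transform mapping src onto dst.

    Points are (x, y) tuples. Treating each point as a complex number z,
    a similarity transform is w = a * z + b, where a encodes scale and
    rotation and b encodes translation.
    """
    if len(src_points) != len(dst_points) or len(src_points) < 2:
        raise ValueError("need at least two matching point pairs")
    src = [complex(x, y) for x, y in src_points]
    dst = [complex(x, y) for x, y in dst_points]
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    # Closed-form solution of the complex linear regression.
    num = sum((z - src_mean).conjugate() * (w - dst_mean)
              for z, w in zip(src, dst))
    den = sum(abs(z - src_mean) ** 2 for z in src)
    if den == 0:
        raise ValueError("source points must not all coincide")
    a = num / den
    b = dst_mean - a * src_mean
    return a, b


def apply_similarity(transform, point):
    """Map a single (x, y) point through a fitted transform."""
    a, b = transform
    w = a * complex(point[0], point[1]) + b
    return (w.real, w.imag)


# Hypothetical data: three mask landmarks and where they were found on a face.
mask_landmarks = [(0, 0), (1, 0), (0, 1)]
face_landmarks = [(10, 5), (10, 7), (8, 5)]
transform = fit_similarity(mask_landmarks, face_landmarks)
```

In practice the two lists would each hold the 68 corresponding landmarks, and every pixel position of the mask would be mapped through `apply_similarity` (or the equivalent matrix) before drawing it over the face.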
As this product was required to run on as many phones and tablets as possible, we had to cater for devices of varying power and capability. Once we got into the testing phase we discovered a rather large variation in camera responsiveness depending on which device was used. It was not simply a matter of older devices performing worse than newer ones: some of the most recent phones actually had more camera issues than their earlier models. To resolve the majority of these camera performance issues we adjusted a number of the preview image settings, as well as the quality of the stored image, to ensure a consistent result on all devices.
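As an illustration of the kind of adjustment involved, the sketch below caps the resolution of an image while preserving its aspect ratio. This is an assumption about what such a setting might look like, not the settings we actually changed; `capped_size` and the default of 1280 pixels are hypothetical.

```python
def capped_size(width, height, max_side=1280):
    """Return (width, height) scaled down so the longest side is at most
    max_side, keeping the aspect ratio. Smaller images are left unchanged."""
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    # Multiply before dividing to limit floating point error.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height
```

Applying the same cap on every device means the detector always sees roughly the same amount of data, regardless of how large the camera's native output is.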
A key point to note is that the orientation of the image is critically important when running face detection. On mobile devices, the camera sensor is physically mounted in landscape orientation. When a portrait photo is taken, the image is still saved as landscape and the intended orientation is recorded in the metadata stored with the image. The on-device preview honours that metadata and displays correctly, but the image processing read the raw landscape pixels as though they were portrait. The image was stretched incorrectly as a result, which caused the face detection to fail. The fix was relatively simple: we determined the correct orientation and rotated the saved file itself before running the Dlib detection.
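The rotation step can be sketched as follows. The sketch assumes the orientation metadata is the standard EXIF Orientation tag and, to stay self-contained, represents the image as a list of pixel rows; `upright` and its helpers are hypothetical names rather than anything from the plugin. Only the four non-mirrored EXIF values are handled.

```python
def rotate_cw(pixels):
    """Rotate a row-major pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


def rotate_ccw(pixels):
    """Rotate a row-major pixel grid 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*pixels)][::-1]


def rotate_180(pixels):
    """Rotate a row-major pixel grid 180 degrees."""
    return [list(row[::-1]) for row in pixels[::-1]]


def upright(pixels, exif_orientation):
    """Return the pixel grid rotated so that it displays upright.

    EXIF Orientation values handled (assumed metadata format):
      1 = already upright, 3 = rotate 180,
      6 = rotate 90 clockwise, 8 = rotate 90 counter-clockwise.
    """
    if exif_orientation == 1:
        return [list(row) for row in pixels]
    if exif_orientation == 3:
        return rotate_180(pixels)
    if exif_orientation == 6:
        return rotate_cw(pixels)
    if exif_orientation == 8:
        return rotate_ccw(pixels)
    raise ValueError("mirrored or unknown orientation: %r" % exif_orientation)


# A 3x2 landscape grid tagged as a portrait shot (orientation 6).
landscape = [[1, 2, 3],
             [4, 5, 6]]
portrait = upright(landscape, 6)
```

Rotating the pixels themselves, rather than relying on the metadata, means every later stage sees a 2x3 portrait grid here and no longer needs to know about orientation at all.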
Because the application had to do all detection processing on the mobile device rather than on an external server, different devices also had drastically different processing times. Trialing the two face detection techniques made it clear very quickly which one processed images faster. Happily, that was Dlib, which we already needed for the specific face regions. In addition to using the faster method, we added a loading screen with a minimum display time, showing the client's campaign information, to mask the majority of the unavoidable processing time.
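The minimum load time idea can be sketched like this: run the processing, then wait out whatever remains of a fixed minimum so the loading screen never flashes past on fast devices. The function name `run_with_minimum_duration` and its parameters are hypothetical, and this is a blocking sketch; in Unity the work would need to run without freezing the UI.

```python
import time


def run_with_minimum_duration(task, min_seconds,
                              clock=time.monotonic, sleep=time.sleep):
    """Run task() and return its result, taking at least min_seconds overall.

    clock and sleep are injectable so the timing logic can be tested
    without really waiting.
    """
    start = clock()
    result = task()
    remaining = min_seconds - (clock() - start)
    if remaining > 0:
        # Processing finished early: keep the loading screen up a bit longer.
        sleep(remaining)
    return result
```

On a slow device the processing time exceeds the minimum and no extra wait is added, so the screen only ever hides time that would have been spent anyway.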
All in all, it was a relatively simple process to get OpenCV set up in a mobile application thanks to the tools available these days. The need for specific facial region markers determined which detection method we used, and it is always key to keep the target devices in mind to ensure the desired performance can be achieved. Ultimately OpenCV is a very powerful tool for image processing, and the possibilities for what can be achieved with it are endless. As device capabilities are expected to keep increasing, it will be interesting to see what application concepts are created in response.
Thanks to Shannon Pace, James Gardner, Simon Vajda and Antonio Giardina for proofreading and providing suggestions.