Face recognition using deep learning
Some time ago, I had the idea of making a hand-gesture recognition glove using the MPU6050 gyroscope and accelerometer sensor. Though it was possible to detect some basic gestures, I was unable to use it to recognize complex hand gestures. So I turned my attention towards image processing. Though I had not taken a university course in this area, after talking with my supervisor I started following some online lessons on "Processing." Then I came across the "OpenCV" library and started following the tutorials on docs.opencv.org. Though OpenCV can be used in a wide range of applications, I was particularly interested in "Object and feature detection" and "Machine learning." After extensive reading and playing around with feature algorithms like SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features) and FAST, and basic machine learning algorithms like k-Nearest Neighbor (kNN), Support Vector Machines (SVM) and k-Means clustering, I wanted to combine those two branches and try a project that recognizes faces using machine learning.
There I came across a library named "face_recognition", written for Python by Adam Geitgey. It is a wrapper around "dlib", another library, written in C++ by Davis King, which contains a range of machine learning algorithms.
[Figure: Test Image]
Face_Recognition
Face_recognition divides the task of recognizing a face into several steps. In the first step it finds the locations of all the faces in a photo or a frame, using an algorithm called HOG (Histograms of Oriented Gradients). The second step solves the problem of faces being turned in different directions. Here an algorithm called face landmark estimation is used: a machine learning model is trained to find 68 particular points (landmarks) on a face, which are then used to align it. The third step is encoding, where a deep neural network is trained to take 128 separate measurements of a face (an encoding) and write them into a file; OpenFace is the library used here. In the fourth step, once an example face is given, finding the matching encoding and name is done by a linear SVM classifier.
[Figure: 128 measurements]
[Figure: Embedding]
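To make those steps concrete, here is a minimal sketch of how they map onto the face_recognition API; the image file name is just a placeholder, and the "known" encodings are stand-ins for a real reference set.

```python
import face_recognition

# Step 1: load an image and locate every face with the HOG detector
image = face_recognition.load_image_file("example.jpg")        # placeholder file name
face_locations = face_recognition.face_locations(image, model="hog")

# Step 2: estimate the 68 landmark points for each detected face
landmarks = face_recognition.face_landmarks(image, face_locations)

# Step 3: compute the 128-measurement encoding of each face
encodings = face_recognition.face_encodings(image, face_locations)

# Step 4: compare an encoding against known encodings.
# compare_faces itself is a simple distance check; the final classifier
# described in the article is a linear SVM trained on these encodings.
if encodings:
    known_encodings = encodings          # stand-in for a real reference set
    matches = face_recognition.compare_faces(known_encodings, encodings[0], tolerance=0.6)
    distances = face_recognition.face_distance(known_encodings, encodings[0])
    print(len(face_locations), "face(s) found:", matches, distances)
```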
PyImageSearch
Reading further on this topic brought me to PyImageSearch, a fantastic website by Adrian Rosebrock, who has written a large collection of articles and tutorials on computer vision. His article titled "Face recognition with OpenCV, Python, and deep learning" provided me with a complete guide on how to use the face_recognition library effectively along with machine learning.
Installing necessary libraries
The first step was installing all the necessary libraries. So far I had been using OpenCV on Windows and could run my programs with no hassle, but it proved to be a mammoth task getting dlib and face_recognition running on Windows, so I shifted to Ubuntu. As recommended by many authors, I created a virtual environment and isolated my project from the system Python. From there onwards it was much easier to install dlib, face_recognition and imutils; the first two packages I built from source rather than installing with pip.
Creating the deep learning image data set
To create the image dataset I referred to a separate article, once again written by Adrian, in which he uses Microsoft's Bing Image Search API to download images matching a given query. After registering at Microsoft Cognitive Services it was possible to get a free trial API key for a week. Using the Python "requests" library, a program was run to fetch the images. Since I'm a big fan of the popular sitcom "The Big Bang Theory", I created a dataset of the main characters Sheldon, Leonard, Penny, Howard, Amy, Bernadette and Raj, with about 40 pictures per character.
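The download loop looked roughly like the sketch below; the endpoint, header and parameter names follow the Bing Image Search v7 API as it was offered at the time, and the API key, query and output folder are placeholders.

```python
import os
import requests

API_KEY = "YOUR_BING_API_KEY"     # placeholder: free trial key from Microsoft Cognitive Services
URL = "https://api.cognitive.microsoft.com/bing/v7.0/images/search"
QUERY = "sheldon cooper"          # one query per character
OUTPUT_DIR = "dataset/sheldon"    # placeholder output folder

headers = {"Ocp-Apim-Subscription-Key": API_KEY}
params = {"q": QUERY, "offset": 0, "count": 50}

# ask Bing for a page of image results matching the query
response = requests.get(URL, headers=headers, params=params)
response.raise_for_status()
results = response.json()

os.makedirs(OUTPUT_DIR, exist_ok=True)

# save every image URL returned in this page of results
for i, item in enumerate(results["value"]):
    try:
        img = requests.get(item["contentUrl"], timeout=30)
        img.raise_for_status()
        with open(os.path.join(OUTPUT_DIR, f"{i:03d}.jpg"), "wb") as f:
            f.write(img.content)
    except requests.RequestException:
        continue   # skip images that fail to download
```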
The program
For the code I mostly referred to PyImageSearch.com and adapted it with slight changes. A Python program called "encode_faces" was written to write the encodings into a file. To detect the faces I used the HOG algorithm, which is faster but a little lower in accuracy. I also tried the CNN-based detector, which is supposedly more accurate, but since I was running without a GPU my computer got stuck, so I decided to stick with HOG. The program loops through all the images in the dataset, locates the faces in each image, and builds two lists: known encodings and known names. An encoding is a set of 128 measurements of a face. Finally, it writes this information into a .pickle file.
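In condensed form, the encoding script looked something like this; the dataset layout (dataset/&lt;name&gt;/*.jpg) and the file names are assumptions.

```python
# encode_faces.py -- condensed sketch of the encoding step
import os
import pickle

import cv2
import face_recognition
from imutils import paths

known_encodings = []
known_names = []

for image_path in paths.list_images("dataset"):
    # the folder name is used as the person's label
    name = image_path.split(os.path.sep)[-2]

    # OpenCV loads images as BGR; face_recognition expects RGB
    image = cv2.imread(image_path)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # HOG-based detection: faster than the CNN detector, works without a GPU
    boxes = face_recognition.face_locations(rgb, model="hog")
    encodings = face_recognition.face_encodings(rgb, boxes)

    for encoding in encodings:
        known_encodings.append(encoding)
        known_names.append(name)

# serialise the encodings and names to disk
data = {"encodings": known_encodings, "names": known_names}
with open("encodings.pickle", "wb") as f:
    pickle.dump(data, f)
```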
Then another program was written to test and match new faces against the encodings that had already been written. First, a single image was tested. The initial step was to find the locations of the faces in the test image; as in the first program, this was done with the HOG algorithm, which is relatively fast. Then the encodings of those faces were calculated. Once that was done, the encodings had to be compared with the previously acquired data to see whether there is a matching name.
The .pickle file containing the data was loaded and the compare_faces function was used to do the comparison. This function returns a Boolean value (True/False) for each sample encoding in the dataset. After looping through the whole dataset, the number of votes (the number of True values) for each name can be counted. Since there can be more than one face in a single picture, the names whose vote counts exceed a threshold are collected in another list.
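A minimal sketch of that matching and voting logic is shown below; the file names are placeholders, and instead of a fixed vote threshold this version simply takes the top-voted name for each face.

```python
import pickle

import cv2
import face_recognition

# load the known encodings and names written by encode_faces
with open("encodings.pickle", "rb") as f:
    data = pickle.load(f)

# detect and encode the faces in the test image (HOG again)
image = cv2.imread("test.jpg")                  # placeholder test image
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
boxes = face_recognition.face_locations(rgb, model="hog")
encodings = face_recognition.face_encodings(rgb, boxes)

names = []
for encoding in encodings:
    # one True/False per known encoding in the dataset
    matches = face_recognition.compare_faces(data["encodings"], encoding)
    name = "Unknown"

    if True in matches:
        # count the votes each known name received
        counts = {}
        for i, matched in enumerate(matches):
            if matched:
                counts[data["names"][i]] = counts.get(data["names"][i], 0) + 1
        # take the name with the most votes
        name = max(counts, key=counts.get)

    names.append(name)
```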
Then another loop was written to go through this list, draw rectangles around the faces, and display the names of the matches found.
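Continuing directly from the sketch above (so image, boxes and names are already defined), the drawing loop with OpenCV looks roughly like this:

```python
# face_recognition boxes are (top, right, bottom, left) tuples
for (top, right, bottom, left), name in zip(boxes, names):
    cv2.rectangle(image, (left, top), (right, bottom), (0, 255, 0), 2)
    y = top - 15 if top - 15 > 15 else top + 15    # keep the label inside the image
    cv2.putText(image, name, (left, y), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)

cv2.imshow("Image", image)
cv2.waitKey(0)
```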
A similar approach was followed to detect and recognize faces in a video.
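For video, the same matching logic is applied frame by frame. A rough, self-contained sketch using OpenCV's VideoCapture and imutils (the webcam index and window handling are assumptions):

```python
import pickle

import cv2
import face_recognition
import imutils

with open("encodings.pickle", "rb") as f:
    data = pickle.load(f)

stream = cv2.VideoCapture(0)    # 0 = default webcam; a video file path also works

while True:
    grabbed, frame = stream.read()
    if not grabbed:
        break

    # smaller frames keep HOG detection fast enough for video
    frame = imutils.resize(frame, width=500)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    boxes = face_recognition.face_locations(rgb, model="hog")
    encodings = face_recognition.face_encodings(rgb, boxes)

    for (top, right, bottom, left), encoding in zip(boxes, encodings):
        matches = face_recognition.compare_faces(data["encodings"], encoding)
        name = "Unknown"
        if True in matches:
            counts = {}
            for i, matched in enumerate(matches):
                if matched:
                    counts[data["names"][i]] = counts.get(data["names"][i], 0) + 1
            name = max(counts, key=counts.get)

        cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
        cv2.putText(frame, name, (left, top - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)

    cv2.imshow("Video", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

stream.release()
cv2.destroyAllWindows()
```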
[Figures: Test 2]