Classifying images of important personalities

Alifia Ghantiwala
6 min read · Apr 4, 2022


Introduction

Using machine learning and deep learning techniques, computers can perform tasks that once required the human eye.

These capabilities are used in many applications, such as automatic FASTag scanners at toll plazas on Indian highways, camera-based phone locks that check whether the device is being unlocked by its owner, and so on.

This article attempts to classify images of well-known people.

As part of this article, we will:

1) Collect the data

2) Analyse the data

3) Use a Convolutional Neural Network for image classification

Dataset Collection

The dataset includes images of

  • Sania Mirza (tennis champion),
  • APJ Abdul Kalam (former President of India and an aerospace scientist),
  • Salman Khan (founder of Khan Academy),
  • Muhammad Ali (one of the best heavyweight boxers of all time), and
  • Bismillah Khan (Indian musician and recipient of the Bharat Ratna)

For data gathering, you could use any of the following methods:

  • Web scraping
  • Using the Fatkun extension on your browser
  • Buying images from a third-party vendor

We used the Fatkun extension for this case study to download images available on Google.

We load the image in Python using OpenCV:

import cv2
import matplotlib.pyplot as plt

img = cv2.imread("../input/saniamirza/Adelaide International 2022 tennis.jpg")
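
A note on colors: OpenCV loads images in BGR channel order, while matplotlib expects RGB, so plotted faces can look blue-tinted. A quick conversion before displaying fixes this:

# OpenCV reads in BGR order; convert to RGB so matplotlib shows true colors
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img_rgb)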

Data Cleaning

For identifying a person from an image, the most important feature is the face; other features such as height may help, but the face alone is enough for classification.

We will use OpenCV to crop faces from the original images. Even within faces, the angle matters a lot: consider an image in which the face of Muhammad Ali is not clearly visible.

Such an image would not help the model learn his facial features correctly. We will therefore keep only those images in which both eyes are visible in the cropped face. We will use Haar cascade models for both the cropping and the discarding of images.

Code breakdown and explanation:

We will walk through the code on one sample. First, we view the image loaded above:

plt.imshow(img)
plt.show()
img.shape

We convert the image to grayscale to make the computation simpler:

gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
plt.imshow(gray,cmap='gray')
plt.show()
gray.shape

We load the Haar cascade models for detecting frontal faces and eyes:

face_cascade = cv2.CascadeClassifier('../input/haarcascades/haarcascade_frontalface_alt.xml')
eye_cascade = cv2.CascadeClassifier('../input/haarcascades/haarcascade_eye.xml')
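
One caveat worth knowing: cv2.CascadeClassifier does not raise an error when the XML path is wrong; it silently loads an empty classifier. A small sanity check (an addition here, not part of the original notebook) can save debugging time:

# An empty classifier means the XML file was not found or failed to parse
assert not face_cascade.empty(), "face cascade failed to load"
assert not eye_cascade.empty(), "eye cascade failed to load"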

To detect the face in the image using the face_cascade model we defined, use:

faces = face_cascade.detectMultiScale(gray)

It returns one (x, y, w, h) tuple per detected face: the top-left corner of the bounding rectangle, plus its width and height.
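
detectMultiScale also accepts tuning parameters. If it misses faces or returns false positives, values like the ones below are a common starting point (illustrative defaults, not the ones used in this case study):

# scaleFactor: how much the image is shrunk at each detection scale
# minNeighbors: how many overlapping detections are needed to keep a face
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)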

Plotting the same on our original image:

(x, y, w, h) = faces[0]  # take the first detected face
face_img = cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
plt.imshow(face_img)

To detect the eyes within each detected face region:

cv2.destroyAllWindows()
for (x, y, w, h) in faces:
    face_img = cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = face_img[y:y+h, x:x+w]
    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 2)
plt.figure()
plt.imshow(face_img, cmap='gray')
plt.show()

Viewing the cropped image

%matplotlib inline
plt.imshow(roi_color, cmap='gray')

Combining all of the above into a single cropping pipeline, we get:

import os
import shutil

path_to_cr_data = "/kaggle/working/cropped_images/"
if os.path.exists(path_to_cr_data):
    shutil.rmtree(path_to_cr_data)
os.mkdir(path_to_cr_data)

def get_cropped_image_if_2_eyes(image_path):
    # Return the cropped face only when at least two eyes are detected
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    for (x, y, w, h) in faces:
        roi_gray = gray[y:y+h, x:x+w]
        roi_color = img[y:y+h, x:x+w]
        eyes = eye_cascade.detectMultiScale(roi_gray)
        if len(eyes) >= 2:
            return roi_color

cropped_image_dirs = []
celebrity_file_names_dict = {}
Y = []
X = []
# img_dirs is assumed to hold one source folder per celebrity,
# e.g. a downloaded-images directory per person
for img_dir in img_dirs:
    count = 1
    celebrity_name = img_dir.split('/')[-1]
    print(celebrity_name)
    celebrity_file_names_dict[celebrity_name] = []
    for entry in os.scandir(img_dir):
        roi_color = get_cropped_image_if_2_eyes(entry.path)
        if roi_color is not None:
            cropped_folder = path_to_cr_data + celebrity_name
            if not os.path.exists(cropped_folder):
                os.makedirs(cropped_folder)
                cropped_image_dirs.append(cropped_folder)
                print("Generating cropped images in folder: ", cropped_folder)
            cropped_file_name = celebrity_name + str(count) + ".png"
            cropped_file_path = cropped_folder + "/" + cropped_file_name
            Y.append(celebrity_name)
            X.append(cropped_file_path)
            cv2.imwrite(cropped_file_path, roi_color)
            celebrity_file_names_dict[celebrity_name].append(cropped_file_path)
            count += 1
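
As a quick check, the helper can be run directly on the sample image from earlier:

# Returns the cropped face (BGR) if at least two eyes were detected, else None
cropped = get_cropped_image_if_2_eyes("../input/saniamirza/Adelaide International 2022 tennis.jpg")
if cropped is not None:
    plt.imshow(cv2.cvtColor(cropped, cv2.COLOR_BGR2RGB))
    plt.show()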

Acknowledgments:

https://www.youtube.com/playlist?list=PLeo1K3hjS3uvaRHZLl-jLovIjBP14QTXc

Using a Convolutional Neural Network for classification

We create the training and validation datasets:

import tensorflow as tf

width = 128
height = 128
batch_size = 32
path = "/kaggle/working/cropped_images"

train_ds = tf.keras.utils.image_dataset_from_directory(
    path,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(width, height),
    batch_size=batch_size)

val_ds = tf.keras.utils.image_dataset_from_directory(
    path,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(width, height),
    batch_size=batch_size)

We resize the images to 128×128 pixels and set the batch size to 32.
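
To confirm what the pipeline yields, you can peek at a single batch (a small sanity check added here, not part of the original walkthrough):

for image_batch, labels_batch in train_ds:
    print(image_batch.shape)   # expected: (32, 128, 128, 3)
    print(labels_batch.shape)  # expected: (32,)
    break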

image_dataset_from_directory also generates the class names from the folder names, so you need to keep a separate folder for each class.

class_names = train_ds.class_names
print(class_names)
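
For reference, the cropping step earlier produces a layout along these lines (the folder names below are examples; yours depend on your source directories):

cropped_images/
    sania_mirza/
        sania_mirza1.png
        sania_mirza2.png
        ...
    muhammad_ali/
        muhammad_ali1.png
        ...

Each top-level folder becomes one entry in class_names.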

We then create a sequential model to classify the images:

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

num_classes = len(class_names)

model = Sequential([
    layers.Rescaling(1./255, input_shape=(width, height, 3)),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)  # raw logits; softmax is applied via the loss
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.summary()

After training the model for 25 epochs and plotting the training and validation loss and accuracy, we see that the model is overfitting.

epochs = 25
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

The training loss decreases almost linearly with the epochs, but the validation loss does not; after the 5th epoch it actually starts increasing.

To overcome overfitting, we will use data augmentation and Dropout.

Handling Overfitting

Data Augmentation

Complex models such as neural networks require a lot of training data; in its absence, the model tends to memorize unnecessary details of the training set and fails to perform adequately on unseen data. Data augmentation transforms the original data to create more samples for each class.

Examples are

layers.RandomFlip: flips the image horizontally or vertically depending on the parameter passed. Since vertical flipping would not suit our current scenario, we use only horizontal flipping.

layers.RandomRotation: rotates the image randomly based on the parameter; a negative value rotates the image clockwise, and a positive value rotates it counterclockwise.

layers.RandomZoom: zooms into or out of the image randomly.

Viewing the code in action:

data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal",
                          input_shape=(height, width, 3)),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
    ]
)
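
To see what these layers actually produce, you can apply them repeatedly to one training image; this visualization sketch follows the standard Keras image-classification tutorial:

# Each call re-samples the random flip/rotation/zoom, giving nine variants
plt.figure(figsize=(8, 8))
for images, _ in train_ds.take(1):
    for i in range(9):
        augmented_images = data_augmentation(images, training=True)
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented_images[0].numpy().astype("uint8"))
        plt.axis("off")
plt.show()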

Dropout:

Dropout can be very useful in tackling overfitting. If you specify a dropout rate of 0.1, then at each training step a random 10% of the layer's units are dropped (their outputs set to zero), so they do not contribute to that update.
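
A standalone sketch (not from the original notebook) makes this concrete; note that Keras also rescales the surviving units by 1/(1 - rate) during training, so the expected activation stays the same:

drop = tf.keras.layers.Dropout(0.5)
data = tf.ones((1, 10))
print(drop(data, training=True))   # roughly half the entries zeroed, the rest scaled to 2.0
print(drop(data, training=False))  # identity at inference time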

Adding both augmentation and Dropout, our model looks like this:

model = Sequential([
    data_augmentation,
    layers.Rescaling(1./255),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])

Calling model.summary() shows the updated architecture. Plotting the training and validation curves again, with the same plotting code as before, lets us check whether the gap between training and validation has narrowed.
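
Finally, the retrained model can classify a new image. A minimal sketch, assuming a cropped face saved at a hypothetical path:

# "some_cropped_face.png" is a placeholder; point it at one of your cropped images
img = tf.keras.utils.load_img("some_cropped_face.png", target_size=(width, height))
img_array = tf.keras.utils.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # build a batch of one

predictions = model.predict(img_array)
scores = tf.nn.softmax(predictions[0])  # the model outputs logits
print(class_names[int(tf.argmax(scores))], float(tf.reduce_max(scores)))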

Conclusion

  • The above model can be further improved by using more input data.
  • Data Augmentation and Dropout can be used to handle overfitting.

A tiny bit about me:

I am Alifia, currently working as an analyst. By writing these articles I try to deepen my understanding of applied machine learning.
