Classifying images of important personalities
Introduction
Using machine learning and deep learning techniques, computers can perform tasks that once required the human eye. This has use cases in many applications, such as automatic FASTag scanners at toll plazas on Indian highways, in-camera locks that check whether a phone is being unlocked by its owner, and so on.
This article attempts to classify images of well-known people.
As part of this article, we will:
1) Collect the data
2) Analyse and clean the data
3) Use a Convolutional Neural Network (CNN) for image classification
Dataset Collection
The dataset includes images of
- Sania Mirza (tennis champion),
- APJ Abdul Kalam (former President of India and an aerospace scientist),
- Salman Khan (founder of Khan Academy),
- Muhammad Ali (one of the best heavyweight boxers of all time), and
- Bismillah Khan (Indian musician and recipient of the Bharat Ratna)
For data gathering, you could use any of the following methods:
- Web scraping (a minimal sketch follows below)
- Using the Fatkun extension in your browser
- Buying images from a third-party vendor
For this case study, we used the Fatkun extension to download images available on Google.
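If you prefer scripted collection over a browser extension, a minimal download sketch could look like the following. Note that urls.txt and raw_images are hypothetical names for a file of image URLs and an output folder, and you should respect each site's terms and image licensing:

import os
import requests

os.makedirs("raw_images", exist_ok=True)
with open("urls.txt") as f:
    for i, url in enumerate(f):
        resp = requests.get(url.strip(), timeout=10)
        if resp.ok:
            # Save each successfully downloaded image with a sequential name
            with open(f"raw_images/img_{i}.jpg", "wb") as out:
                out.write(resp.content)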
We load the image in Python using OpenCV:

import cv2
import matplotlib.pyplot as plt

img = cv2.imread("../input/saniamirza/Adelaide International 2022 tennis.jpg")
Data Cleaning
For identifying a person from an image, the most important feature is the face; other features such as height may help, but the face alone is enough for classification. We will use OpenCV to crop the faces out of the original images. Even within faces, the angle of the face matters a lot. Consider an image like the next one.
The face of Muhammad Ali is not clearly visible in it, so it would not help the model learn his facial features correctly. We will therefore keep only those images in which both eyes are visible in the cropped face. We will use Haar cascade models for both the cropping and the discarding of images.
Code breakdown and explanation:
We will walk through the code using one sample, viewing the image we loaded above.
plt.imshow(img)  # note: OpenCV loads images in BGR order, so the colours may look swapped
plt.show()
img.shape
We convert the image to grayscale to make the computation simpler.
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
plt.imshow(gray,cmap='gray')
plt.show()
gray.shape
We load Haar cascade models for detecting frontal faces and eyes.
face_cascade = cv2.CascadeClassifier('../input/haarcascades/haarcascade_frontalface_alt.xml')
eye_cascade = cv2.CascadeClassifier('../input/haarcascades/haarcascade_eye.xml')
To detect the face in the image using the face_cascade model we defined, use:
faces = face_cascade.detectMultiScale(gray)
It returns the coordinates of a bounding rectangle for each detected face, as rows of (x, y, w, h).
Plotting the first detection on our original image:

(x, y, w, h) = faces[0]
face_img = cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)
plt.imshow(face_img)
To detect the eyes within each face region:
cv2.destroyAllWindows()
for (x, y, w, h) in faces:
    face_img = cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)
    roi_gray = gray[y:y+h, x:x+w]        # face region in the grayscale image
    roi_color = face_img[y:y+h, x:x+w]   # face region in the colour image
    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(roi_color, (ex, ey), (ex+ew, ey+eh), (0, 255, 0), 2)

plt.figure()
plt.imshow(face_img, cmap='gray')
plt.show()
Viewing the cropped image
%matplotlib inline
plt.imshow(roi_color, cmap='gray')
Combining all of the above into a single function:

import os
import shutil

path_to_cr_data = "/kaggle/working/cropped_images/"
if os.path.exists(path_to_cr_data):
    shutil.rmtree(path_to_cr_data)
os.mkdir(path_to_cr_data)
def get_cropped_image_if_2_eyes(image_path):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    for (x, y, w, h) in faces:
        roi_gray = gray[y:y+h, x:x+w]
        roi_color = img[y:y+h, x:x+w]
        eyes = eye_cascade.detectMultiScale(roi_gray)
        if len(eyes) >= 2:
            return roi_color
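As a quick sanity check, we can run the function on the sample image we loaded earlier (reusing the same path; OpenCV returns BGR, so we convert to RGB before displaying):

cropped = get_cropped_image_if_2_eyes("../input/saniamirza/Adelaide International 2022 tennis.jpg")
if cropped is not None:
    plt.imshow(cv2.cvtColor(cropped, cv2.COLOR_BGR2RGB))
    plt.show()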
cropped_image_dirs = []
celebrity_file_names_dict = {}
Y = []
X = []
# img_dirs is assumed to hold one folder of raw images per celebrity;
# the dataset path below is illustrative:
img_dirs = [entry.path for entry in os.scandir("../input/dataset") if entry.is_dir()]
for img_dir in img_dirs:
    count = 1
    celebrity_name = img_dir.split('/')[-1]
    print(celebrity_name)
    celebrity_file_names_dict[celebrity_name] = []
    for entry in os.scandir(img_dir):
        roi_color = get_cropped_image_if_2_eyes(entry.path)
        if roi_color is not None:
            cropped_folder = path_to_cr_data + celebrity_name
            if not os.path.exists(cropped_folder):
                os.makedirs(cropped_folder)
                cropped_image_dirs.append(cropped_folder)
                print("Generating cropped images in folder: ", cropped_folder)
            cropped_file_name = celebrity_name + str(count) + ".png"
            cropped_file_path = cropped_folder + "/" + cropped_file_name
            Y.append(celebrity_name)
            X.append(cropped_file_path)
            cv2.imwrite(cropped_file_path, roi_color)
            celebrity_file_names_dict[celebrity_name].append(cropped_file_path)
            count += 1
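To check how much usable data survives the cleaning step, we can count the cropped faces kept per celebrity (a small addition on top of the dictionary built above):

for name, files in celebrity_file_names_dict.items():
    print(name, ":", len(files), "cropped images")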
Acknowledgments:
https://www.youtube.com/playlist?list=PLeo1K3hjS3uvaRHZLl-jLovIjBP14QTXc
Using a Convolutional Neural Network for classification
We create the training and validation datasets:
import tensorflow as tf

width = 128
height = 128
batch_size = 32
path = "/kaggle/working/cropped_images"
train_ds = tf.keras.utils.image_dataset_from_directory(
    path,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(width, height),
    batch_size=batch_size)

val_ds = tf.keras.utils.image_dataset_from_directory(
    path,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(width, height),
    batch_size=batch_size)
We resize the images to 128x128 pixels and set the batch size to 32.
The class names are generated from the folder names, so you have to create a separate folder for each class.
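For example, the cropped_images folder might be laid out like this (the folder names here are illustrative):

cropped_images/
    sania_mirza/
    apj_abdul_kalam/
    salman_khan/
    muhammad_ali/
    bismillah_khan/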
class_names = train_ds.class_names
print(class_names)
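To sanity-check the input pipeline, we can display a few images from the first training batch along with their labels (a short sketch in the style of the TensorFlow image-classification tutorial):

plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
    for i in range(min(9, images.shape[0])):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")
plt.show()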
We then create a sequential model to classify the images:
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

num_classes = len(class_names)

model = Sequential([
    layers.Rescaling(1./255, input_shape=(width, height, 3)),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.summary()
After training the model for 25 epochs and plotting the training and validation loss and accuracy, we see that the model is overfitting.
epochs = 25
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
The training loss keeps decreasing with the epochs, but the validation loss does not follow: after about the 5th epoch it starts increasing instead. To overcome this overfitting, we will use data augmentation and Dropout.
Handling Overfitting
Data Augmentation
Complex models such as neural networks require a lot of training data. In its absence, the model tends to learn unnecessary details of the training data and fails to perform adequately on unseen data. Data augmentation creates additional samples for each class by applying small, label-preserving transformations to the original images.
Examples are:
- layers.RandomFlip: flips the image horizontally or vertically, depending on the argument passed. Since vertical flipping would not suit our current scenario, we use only horizontal flipping.
- layers.RandomRotation: rotates the image randomly based on the argument; a negative factor rotates the image clockwise, while a positive factor rotates it counterclockwise.
- layers.RandomZoom: zooms in or out on the image by a random factor.
Viewing the code in action:
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal",
                          input_shape=(height, width, 3)),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
    ]
)
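To see what the augmentation layer actually produces, we can apply it repeatedly to the first image of a training batch (a hedged sketch; the output varies from run to run because the transformations are random):

plt.figure(figsize=(10, 10))
for images, _ in train_ds.take(1):
    for i in range(9):
        augmented = data_augmentation(images)
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented[0].numpy().astype("uint8"))
        plt.axis("off")
plt.show()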
Dropout:
Dropout is very useful in tackling overfitting. If you specify a dropout rate of 0.1, then at each training step a random 10% of the layer's units have their outputs set to zero, so they do not contribute to that step. This prevents the network from relying too heavily on any single unit.
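A quick illustration of this behaviour on a toy tensor (which units get zeroed is random; the surviving values are scaled up by 1/(1 - rate) so the expected sum stays the same):

layer = tf.keras.layers.Dropout(0.1)
data = tf.ones((1, 10))
print(layer(data, training=True))  # roughly 10% of entries are zeroed, the rest become ~1.11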
Adding both augmentation and Dropout to our model, it looks like this:
model = Sequential([
    data_augmentation,
    layers.Rescaling(1./255),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])
Model Summary
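The new model is compiled and trained the same way as before; a brief sketch reusing the earlier settings:

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.summary()
history = model.fit(train_ds, validation_data=val_ds, epochs=epochs)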
Plotting the training and validation loss curves now, we see that the two track each other more closely and the overfitting is reduced.
Conclusion
- The above model can be further improved by using more input data.
- Data Augmentation and Dropout can be used to handle overfitting.
A tiny bit about me:
I am Alifia, currently working as an analyst. By writing these articles I try to deepen my understanding of applied machine learning.