PetFaces
Classification of Pet's Faces
Lab Assignment from AI for Beginners Curriculum.
Getting the Data
In this assignment, we will focus on a relatively simple classification task: classifying pet faces. We will use the Oxford-IIIT Pet Dataset, which contains images of 37 different breeds of dogs and cats. Let's start by downloading and visualizing the dataset.
Note: The Oxford-IIIT Pet Dataset contains full pet images. The images will be organized by breed in the extracted folder.
--2022-02-17 12:32:43-- https://thor.robots.ox.ac.uk/~vgg/data/pets/images.tar.gz
Resolving mslearntensorflowlp.blob.core.windows.net... 20.150.90.68
Connecting to mslearntensorflowlp.blob.core.windows.net|20.150.90.68|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24483412 (23M) [application/x-gzip]
Saving to: ‘images.tar.gz’
images.tar.gz 100%[===================>] 23.35M 12.5MB/s in 1.9s
2022-02-17 12:32:45 (12.5 MB/s) - ‘images.tar.gz’ saved [24483412/24483412]
We will define a generic function to display a series of images from a list:
Now let's traverse all class subdirectories and plot the first few images of each class:
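The traversal step can be sketched with the standard library alone. The function name `first_images_per_class`, the extension list, and the one-subdirectory-per-class layout are assumptions made for illustration; plotting the returned paths is left to whatever display function you defined.

```python
from pathlib import Path

# Assumption: images are stored as <root>/<class name>/<image file>.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def first_images_per_class(root, n=5):
    """Map each class subdirectory name to its first n image paths (sorted by name)."""
    result = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        images = sorted(
            f for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
        result[class_dir.name] = images[:n]
    return result
```

With such a dictionary, the number of classes is simply its length, and each value can be passed to the display function one class at a time.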
Let's also define the number of classes in our dataset:
35
Preparing dataset for Deep Learning
To start training our neural network, we need to convert all images to tensors, and also create tensors corresponding to labels (class numbers). Most neural network frameworks contain simple tools for dealing with images:
- In TensorFlow, use tf.keras.preprocessing.image_dataset_from_directory
- In PyTorch, use torchvision.datasets.ImageFolder
As you can see from the pictures above, all of them are close to a square aspect ratio, so we need to resize all images to the same square size. We can also organize the images into minibatches.
Now we need to separate the dataset into train and test portions:
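A framework-independent sketch of the split is shown here. The helper name `split_indices`, the 20% test fraction, and the seed are assumptions, not values prescribed by the assignment. In PyTorch, `torch.utils.data.random_split` plays a similar role, and in Keras `image_dataset_from_directory` accepts `validation_split` and `subset` arguments.

```python
import random

def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle sample indices reproducibly and split them into (train, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # a fixed seed keeps the split repeatable
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(100)
```

Shuffling before splitting matters here because images are stored class by class; taking the last 20% without shuffling would leave some breeds out of the training set entirely.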
Now let's print the size of tensors in our dataset. If you have done everything correctly, the size of training elements should be
- (batch_size,image_size,image_size,3) for TensorFlow, (batch_size,3,image_size,image_size) for PyTorch
- batch_size for labels
Labels should contain class numbers.
Define a neural network
For image classification, you should probably define a convolutional neural network with several layers. What to keep an eye on:
- Keep in mind the pyramid architecture, i.e. the number of filters should increase as you go deeper
- Do not forget activation functions between layers (ReLU) and Max Pooling
- The final classifier can be with or without hidden layers, but the number of output neurons should be equal to the number of classes.
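One possible PyTorch network that follows these points is sketched below. The class name `PetNet`, the filter counts (16, 32, 64), the 3x3 kernels, and the 128x128 input size are all illustrative assumptions; the assignment does not prescribe them.

```python
import torch
from torch import nn

class PetNet(nn.Module):
    """Small CNN sketch; layer sizes and the default image size are assumptions."""

    def __init__(self, num_classes, image_size=128):
        super().__init__()
        # Pyramid: the number of filters grows (16 -> 32 -> 64) while each
        # max pooling halves the spatial size.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        side = image_size // 8  # three 2x2 poolings halve the side three times
        self.classifier = nn.Linear(64 * side * side, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        return self.classifier(x)  # one raw score (logit) per class

model = PetNet(num_classes=35)  # 35 is the class count printed earlier
out = model(torch.zeros(4, 3, 128, 128))  # a dummy batch of 4 images
```

Here `out` has shape `(4, 35)`: one row per image and one column per class, which is the shape the loss function expects.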
An important thing is to get the activation function on the last layer + loss function right:
- In TensorFlow, you can use softmax as the activation, and sparse_categorical_crossentropy as the loss. The difference between sparse categorical cross-entropy and the non-sparse one is that the former expects the target as a class number, and not as a one-hot vector.
- In PyTorch, you can have the final layer without an activation function, and use the CrossEntropyLoss loss function. This function applies softmax automatically.
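To make the sparse versus one-hot distinction concrete, here is a small pure-Python sketch. The function names and the example logits are assumptions chosen for illustration; real frameworks compute the same quantities on tensors and in a numerically safer fused form.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_cross_entropy(probs, class_index):
    """Target given as a class number."""
    return -math.log(probs[class_index])

def categorical_cross_entropy(probs, one_hot):
    """Target given as a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

logits = [2.0, 0.5, -1.0]  # hypothetical network outputs for 3 classes
probs = softmax(logits)
loss_sparse = sparse_cross_entropy(probs, 0)
loss_dense = categorical_cross_entropy(probs, [1, 0, 0])
```

Both losses give the same value for the same target; the sparse form only saves you from building the one-hot vectors.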
Train the Neural Network
Now we are ready to train the neural network. During training, please collect accuracy on train and test data on each epoch, and then plot the accuracy to see if there is overfitting.
To speed up training, use a GPU if one is available. While TensorFlow/Keras will automatically use the GPU, in PyTorch you need to move both the model and the data to the GPU during training using the .to() method in order to take advantage of GPU acceleration.
What can you say about overfitting? What can be done to improve the accuracy of the model?
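A minimal PyTorch training loop that moves the model and each batch to the available device and records accuracy per epoch might look like this. The function names, the Adam optimizer, the learning rate, and the epoch count are assumptions; the loaders can be any iterables that yield `(images, labels)` batches.

```python
import torch
from torch import nn

def accuracy(model, loader, device):
    """Fraction of correctly classified samples over all batches in loader."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.size(0)
    return correct / total

def train(model, train_loader, test_loader, epochs=5, lr=1e-3):
    """Train the model and return per-epoch train and test accuracy."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)  # move the model parameters once
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    history = {"train_acc": [], "test_acc": []}
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)  # move every batch
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        history["train_acc"].append(accuracy(model, train_loader, device))
        history["test_acc"].append(accuracy(model, test_loader, device))
    return history
```

Plotting `history["train_acc"]` against `history["test_acc"]` shows overfitting as a widening gap: training accuracy keeps rising while test accuracy flattens or drops.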
Optional: Calculate Top3 Accuracy
In this exercise, we were dealing with classification with quite a high number of classes (35), so our result, around 50% validation accuracy, is pretty good. The standard ImageNet dataset has even more: 1000 classes.
In such cases it is difficult to ensure that the model always predicts the class correctly. There are cases when two breeds are very similar to each other, and the model returns very similar probabilities (e.g., 0.45 and 0.43). If we measure standard accuracy, such a prediction is counted as wrong, even though the model made a very small mistake. Thus, we often measure another metric: accuracy within the top 3 most probable predictions of the model.
We consider a prediction accurate if the target label is contained within the top 3 model predictions.
To compute top-3 accuracy on the test dataset, you need to manually go over the dataset, apply the neural network to get the prediction, and then do the calculations. Some hints:
- In TensorFlow, use the tf.nn.in_top_k function to see if the predictions (output of the model) are in top-k (pass k=3 as parameter), with respect to targets. This function returns a tensor of boolean values, which can be converted to int using tf.cast, and then accumulated using tf.reduce_sum.
- In PyTorch, you can use the torch.topk function to get the indices of the classes with the highest probabilities, and then see if the correct class belongs to them. See this for more hints.
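The calculation itself is independent of the framework and can be sketched in plain Python. The function name `top_k_accuracy` and the toy probabilities are assumptions for illustration; in practice the predictions would come from the model, batch by batch.

```python
def top_k_accuracy(predictions, targets, k=3):
    """Fraction of samples whose target is among the k highest-scoring classes."""
    hits = 0
    for scores, target in zip(predictions, targets):
        # Class indices sorted by score, highest first; keep the first k.
        top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)

predictions = [
    [0.45, 0.43, 0.07, 0.05],  # target 1 is ranked second: top-1 miss, top-3 hit
    [0.10, 0.20, 0.30, 0.40],  # target 0 is ranked last: miss in both
    [0.70, 0.10, 0.10, 0.10],  # target 0 is ranked first: hit in both
]
targets = [1, 0, 0]

top1 = top_k_accuracy(predictions, targets, k=1)
top3 = top_k_accuracy(predictions, targets, k=3)
```

On this toy data, one of three samples is correct under standard accuracy and two of three under top-3 accuracy; the first row is exactly the near-miss case described above.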
Optional: Build Cats vs. Dogs classification
We also want to see how accurate our binary cats vs. dogs classification would be on the same dataset. To do so, we need to adjust the labels:
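One way to adjust the labels is to map every breed index to 0 (cat) or 1 (dog). The helper name `to_binary_labels`, the 0/1 convention, and the four-breed subset below are assumptions for illustration; you would supply the full list of class names and the full set of cat breeds for your copy of the dataset.

```python
def to_binary_labels(labels, class_names, cat_breeds):
    """Map breed labels (class indices) to 0 for cats and 1 for dogs."""
    # Lookup table: position i holds the binary label of breed i.
    is_dog = [0 if name in cat_breeds else 1 for name in class_names]
    return [is_dog[label] for label in labels]

class_names = ["Abyssinian", "beagle", "Bengal", "pug"]  # illustrative subset
cat_breeds = {"Abyssinian", "Bengal"}  # assumption: you list the cat breeds yourself
binary = to_binary_labels([0, 1, 2, 3, 1], class_names, cat_breeds)
```

After this change the network needs only two output neurons, and the rest of the training code can stay the same.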