PetFaces


Classification of Pet's Faces

Lab Assignment from AI for Beginners Curriculum.

Getting the Data

In this assignment, we will focus on a relatively simple classification task: classifying pets' faces. We will use the Oxford-IIIT Pet Dataset, which contains images of 37 different breeds of dogs and cats. Let's start by downloading and visualizing the dataset.

Note: The Oxford-IIIT Pet Dataset contains full pet images. The images will be organized by breed in the extracted folder.

[1]
--2022-02-17 12:32:43--  https://thor.robots.ox.ac.uk/~vgg/data/pets/images.tar.gz
Resolving mslearntensorflowlp.blob.core.windows.net... 20.150.90.68
Connecting to mslearntensorflowlp.blob.core.windows.net|20.150.90.68|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24483412 (23M) [application/x-gzip]
Saving to: ‘images.tar.gz’

images.tar.gz     100%[===================>]  23.35M  12.5MB/s    in 1.9s    

2022-02-17 12:32:45 (12.5 MB/s) - ‘images.tar.gz’ saved [24483412/24483412]

We will define a generic function to display a series of images from a list:

[2]

Now let's traverse all class subdirectories and plot the first few images of each class:

[ ]
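As a rough sketch of the traversal step, the helper below collects the first few image files from each class subdirectory. It assumes the layout described in the note above (one subfolder per breed). The function name first_images_per_class and the list of file extensions are illustrative choices, not part of the original notebook.

```python
from pathlib import Path

# Assumed set of image file extensions; adjust to match the extracted data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def first_images_per_class(root, n=3):
    """Map each class subdirectory name to its first n image paths (sorted by name)."""
    result = {}
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        images = sorted(
            f for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
        result[class_dir.name] = images[:n]
    return result
```

Calling first_images_per_class on the extracted folder gives a dictionary whose values can be passed to the display function, and whose length is the number of classes found on disk.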

Let's also define the number of classes in our dataset:

[4]
35

Preparing dataset for Deep Learning

To start training our neural network, we need to convert all images to tensors, and also create tensors corresponding to labels (class numbers). Most neural network frameworks contain simple tools for dealing with images:

  • In TensorFlow, use tf.keras.preprocessing.image_dataset_from_directory
  • In PyTorch, use torchvision.datasets.ImageFolder

As you have seen from the pictures above, the images are all close to square, so resizing them all to one common square size introduces little distortion. We can also group the images into minibatches.

[5]

Now we need to split the dataset into train and test portions:

[6]
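One framework-independent way to think about the split is as a shuffled partition of sample indices. The sketch below is only an illustration: the function name split_indices, the 20% test fraction, and the seed are assumptions, not values from the original notebook. In PyTorch the resulting index lists could be wrapped with torch.utils.data.Subset, while tf.keras.preprocessing.image_dataset_from_directory offers the validation_split and subset arguments for the same purpose.

```python
import random

def split_indices(n_items, test_fraction=0.2, seed=42):
    """Shuffle indices 0..n_items-1 reproducibly and split them into (train, test)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)   # local RNG keeps the split repeatable
    n_test = int(n_items * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(1000)
print(len(train_idx), len(test_idx))
```

Fixing the seed matters here: if the split changes between runs, images seen during training can leak into the test portion and inflate the measured accuracy.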

Now let's print the size of tensors in our dataset. If you have done everything correctly, the sizes of the training elements should be:

  • (batch_size, image_size, image_size, 3) for images in TensorFlow
  • (batch_size, 3, image_size, image_size) for images in PyTorch
  • batch_size for labels

Labels should contain class numbers.

[1]
[9]

Define a neural network

For image classification, you should probably define a convolutional neural network with several layers. Things to keep an eye on:

  • Keep in mind the pyramid architecture, i.e. the number of filters should increase as you go deeper
  • Do not forget activation functions (ReLU) between layers, and max pooling
  • The final classifier can be with or without hidden layers, but the number of output neurons should equal the number of classes.

It is important to get the combination of last-layer activation function and loss function right:

  • In TensorFlow, you can use softmax as the activation and sparse_categorical_crossentropy as the loss. Sparse categorical cross-entropy differs from the non-sparse version in that it expects the target as a class number rather than a one-hot vector.
  • In PyTorch, you can leave the final layer without an activation function and use the CrossEntropyLoss loss function, which applies (log-)softmax internally.
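The points above can be sketched in PyTorch roughly as follows. This is one possible network, not the reference solution: the class name PetNet, the filter counts (16, 32, 64), and the 128x128 input size are assumptions made for illustration; the 35 classes match the count printed earlier in this notebook.

```python
import torch
from torch import nn

NUM_CLASSES = 35   # class count reported earlier in this notebook
IMAGE_SIZE = 128   # assumed square input size (not specified in the text)

class PetNet(nn.Module):
    """Small pyramid-style CNN: filter count grows while spatial size shrinks."""

    def __init__(self, num_classes=NUM_CLASSES, image_size=IMAGE_SIZE):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        side = image_size // 8  # three 2x2 poolings halve each side three times
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * side * side, num_classes),  # raw logits, no softmax here
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = PetNet()
loss_fn = nn.CrossEntropyLoss()  # takes logits and integer class numbers

# Random stand-in batch, only to check that shapes and loss line up.
batch = torch.randn(4, 3, IMAGE_SIZE, IMAGE_SIZE)
labels = torch.randint(0, NUM_CLASSES, (4,))
logits = model(batch)
loss = loss_fn(logits, labels)
print(logits.shape, loss.item())
```

Because CrossEntropyLoss works on logits, adding a softmax layer at the end of this model would apply the normalization twice and typically slow down learning.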
[2]

Train the Neural Network

Now we are ready to train the neural network. During training, please record the accuracy on train and test data for each epoch, and then plot both curves to see whether the model is overfitting.

To speed up training, use a GPU if one is available. TensorFlow/Keras uses the GPU automatically, whereas in PyTorch you need to move both the model and the data to the GPU with the .to() method in order to benefit from GPU acceleration.
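A minimal PyTorch sketch of such a training loop is shown below. The helper names run_epoch and fit, the Adam optimizer, the learning rate, and the epoch count are all assumptions for illustration; the loop works with any model that returns logits and any data loaders that yield (images, labels) batches.

```python
import torch
from torch import nn

def run_epoch(model, loader, loss_fn, device, optimizer=None):
    """One pass over loader; trains if an optimizer is given, otherwise evaluates.

    Returns the accuracy over all samples seen in this pass.
    """
    training = optimizer is not None
    model.train(training)
    correct, total = 0, 0
    with torch.set_grad_enabled(training):
        for images, labels in loader:
            # Data must live on the same device as the model.
            images, labels = images.to(device), labels.to(device)
            logits = model(images)
            if training:
                loss = loss_fn(logits, labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return correct / total

def fit(model, train_loader, test_loader, epochs=5, lr=1e-3):
    """Train for several epochs and record train/test accuracy after each one."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = {"train_acc": [], "test_acc": []}
    for _ in range(epochs):
        history["train_acc"].append(run_epoch(model, train_loader, loss_fn, device, optimizer))
        history["test_acc"].append(run_epoch(model, test_loader, loss_fn, device))
    return history
```

Plotting history["train_acc"] against history["test_acc"] makes overfitting visible: training accuracy keeps climbing while test accuracy flattens or drops.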

[11]
[13]

What can you say about overfitting? What can be done to improve the accuracy of the model?

Optional: Calculate Top3 Accuracy

In this exercise, we dealt with classification over a fairly large number of classes (35), so our result, around 50% validation accuracy, is pretty good. The standard ImageNet dataset has even more: 1000 classes.

In such cases it is difficult to ensure that the model always predicts the correct class. Sometimes two breeds are very similar to each other, and the model returns very similar probabilities (e.g., 0.45 and 0.43). Under standard accuracy, such a prediction counts as wrong even though the model made only a very small mistake. Thus, we often measure another metric: accuracy within the top 3 most probable predictions of the model.

We consider a prediction accurate if the target label is among the model's top 3 predictions.

To compute top-3 accuracy on the test dataset, you need to go over the dataset manually, apply the neural network to get the predictions, and then do the calculations. Some hints:

  • In TensorFlow, use the tf.nn.in_top_k function to check whether the targets are among the top-k predictions (outputs of the model), passing k=3 as a parameter. This function returns a tensor of boolean values, which can be converted to int using tf.cast and then accumulated using tf.reduce_sum.
  • In PyTorch, you can use the torch.topk function to get the indices of the classes with the highest probabilities, and then check whether the correct class is among them. See this for more hints.
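To make the metric itself concrete, here is a framework-free sketch in plain Python. The function name top_k_accuracy and the sample numbers are made up for illustration; a version built on tf.nn.in_top_k or torch.topk should produce the same values on the same inputs, so this can serve as a sanity check.

```python
def top_k_accuracy(probabilities, targets, k=3):
    """Fraction of samples whose target index is among the k highest-scoring classes."""
    hits = 0
    for scores, target in zip(probabilities, targets):
        # Indices of the k largest scores, highest first.
        top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)

# Made-up predictions for three samples over four classes.
probs = [
    [0.45, 0.43, 0.07, 0.05],  # near tie between classes 0 and 1
    [0.10, 0.20, 0.30, 0.40],
    [0.70, 0.15, 0.10, 0.05],
]
targets = [1, 3, 3]

top1 = top_k_accuracy(probs, targets, k=1)  # only the second sample counts
top3 = top_k_accuracy(probs, targets, k=3)  # the near tie now counts as a hit
print(top1, top3)
```

The first sample shows the motivation for the metric: the correct class scores 0.43 against 0.45, so it is a miss under standard accuracy but a hit under top-3 accuracy.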
[3]

Optional: Build Cats vs. Dogs classification

We also want to see how accurate binary cats vs. dogs classification would be on the same dataset. To do this, we need to adjust the labels:

[5]
[4]
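One way to adjust the labels is to build a lookup table from breed index to species and apply it to every label. The sketch below assumes, purely for illustration, that class folder names start with "cat_" or "dog_"; that naming convention, the sample class names, and the helper names are hypothetical and must be adapted to how the classes are actually named on disk.

```python
CAT_PREFIX = "cat_"  # hypothetical naming convention, adjust to the real folder names

def binary_label_map(class_names, cat_prefix=CAT_PREFIX):
    """Map each breed index to 0 (cat) or 1 (dog) based on the class name prefix."""
    return [0 if name.startswith(cat_prefix) else 1 for name in class_names]

def to_binary(labels, label_map):
    """Replace each breed label with its cat/dog label."""
    return [label_map[label] for label in labels]

# Made-up class names in the order the dataset loader would assign indices.
class_names = ["cat_Abyssinian", "cat_Bengal", "dog_beagle", "dog_pug"]
label_map = binary_label_map(class_names)

binary_labels = to_binary([0, 3, 2, 1], label_map)
print(binary_labels)  # [0, 1, 1, 0]
```

With the labels remapped, the network needs only two output neurons (or a single sigmoid output), and the rest of the training code can stay the same.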