Hi all,
I am trying to develop a solution for binary image classification. My training set consists of two folders: folder 1 holds around 1000 images and folder 2 around 400 images (all images look like a Pac-Man maze, in color, 224 x 224 pixels). The images in folder 1 are examples of one specific strategy (called strategy 1), and those in folder 2 represent a different strategy (called strategy 2). I used image augmentation (cropping and rotation) to generate more images for both folders, so folders 1 and 2 now contain 1900 images each. For the test set, I randomly picked 20 images from each folder.
I have attached my solution for this binary classification task.
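For reference, the hold-out step described above can be sketched in plain Python roughly as follows. This is only an illustration, not the attached process: the function name pick_test_images, the seed, and the file names are made up for the example.

```python
import random


def pick_test_images(filenames, n_test=20, seed=0):
    """Randomly split one folder's file names into (train, test) lists."""
    rng = random.Random(seed)                  # fixed seed so the split is repeatable
    test = set(rng.sample(filenames, n_test))  # n_test distinct files for testing
    train = [f for f in filenames if f not in test]
    return train, sorted(test)


# Hypothetical file names standing in for the 1900 images of one folder.
folder1 = [f"strategy1_{i:04d}.png" for i in range(1900)]
train1, test1 = pick_test_images(folder1)
```

The same call would be repeated for folder 2, giving 40 test images in total.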
As you can see in the process, I import a pre-trained ResNet-50 model for transfer learning. To fine-tune the model, I freeze all layers up to "avg_pool" and remove "flatten_1" and "fc1000". I then add a fully connected layer, dropout, a second fully connected layer, dropout, and an output layer, and retrain the model.
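Written directly in Keras, the architecture described above might look roughly like the sketch below. It is not the attached process: the function name build_transfer_model, the layer sizes (256 and 128 units), the dropout rate, the sigmoid output, and the optimizer settings are all assumptions chosen for illustration. Here include_top=False with pooling="avg" is used to end the network at the global average pooling layer, which drops the original 1000-class head.

```python
def build_transfer_model(weights="imagenet", hidden_units=256, dropout_rate=0.5):
    """Frozen ResNet-50 base plus a small trainable head (sizes are assumed)."""
    # Imported here so the function can be defined without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    # Pre-trained base without its classification head, ending in average pooling.
    base = keras.applications.ResNet50(
        weights=weights,
        include_top=False,
        pooling="avg",
        input_shape=(224, 224, 3),
    )
    base.trainable = False  # freeze every layer of the base

    # Inputs are expected to be already preprocessed for ResNet-50.
    inputs = keras.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)  # keep batch norm in inference mode
    x = layers.Dense(hidden_units, activation="relu")(x)
    x = layers.Dropout(dropout_rate)(x)
    x = layers.Dense(hidden_units // 2, activation="relu")(x)
    x = layers.Dropout(dropout_rate)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # one unit for two classes

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling build_transfer_model() downloads the ImageNet weights on first use; passing weights=None builds the same architecture with random weights.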
The result I get (after 14 hours of training on a CPU with 16 GB of RAM) is that the model predicts only one of the classes for all images, for instance only strategy 1 or only strategy 2 (see attached).
My questions:
1) Should I change the way I retrain the ResNet-50 (e.g., freezing or retraining more or fewer layers) so that the model predicts both classes correctly? If so, how?
2) When I place the "import existing model" and "fine-tune model" operators inside a split or cross-validation, I get errors complaining about the tensor input. How should I revise the solution so that the model is cross-validated?
3) Do you have any suggestions for other pre-trained models, such as the VGG networks, and for how to adjust their architecture (which layers to freeze, which to retrain, etc.) to possibly get better results in less time?
Thank you,
Liina
PS: I took an image from the PuzzleNation blog to show roughly what my training images look like.