Skin Cancer prediction by image processing through CNN

The objective of this project is to classify skin cancer from images. Around 1.98% of people in the world are affected by skin cancer, and an automated classifier could help the community diagnose it at an early stage, especially where clinical expertise is limited.

Complete code for this project can be found here:

This project uses the HAM10000 dataset, which contains 10012 images labelled with 7 types of cancer. For this problem, each image has been resized to a 64*64 RGB image and paired with its label.


This work is part of my experiments with the HAM10000 dataset using various Machine Learning models. The objective is to identify the correct skin cancer category from the given images using several candidate models, and to compare their results (performance measures/scores) to arrive at the best one. Analyzing the dataset shows that it is highly imbalanced, so we use augmentation (zoom, flip, rotation, width and height shift) to build a balanced dataset with almost the same number of images in each category. After augmentation we have a total of 42650 images, each a 64*64 RGB image.
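The balancing step can be sketched roughly as follows. This is a minimal NumPy illustration, not the code used in this project: it covers only flips, 90-degree rotations and wrap-around shifts (zoom and arbitrary-angle rotation are left out), and the function names `augment` and `balance`, the toy class sizes and the target count are all assumptions made for the example.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped, rotated and shifted copy of an H x W x 3 image."""
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)                          # horizontal flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))    # rotate by 0/90/180/270 degrees
    dy, dx = rng.integers(-4, 5, size=2)              # shift by up to 4 pixels
    # np.roll wraps pixels around the border; a real pipeline would pad instead.
    return np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))

def balance(images_by_class, target, rng):
    """Top up every class with augmented copies until it holds `target` images."""
    balanced = {}
    for label, images in images_by_class.items():
        missing = max(0, target - len(images))
        extra = [augment(images[i % len(images)], rng) for i in range(missing)]
        balanced[label] = list(images) + extra
    return balanced

rng = np.random.default_rng(0)
# Hypothetical toy data: 5 random "images" for one class and 2 for another.
toy = {
    "akiec": [rng.integers(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(5)],
    "vasc": [rng.integers(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(2)],
}
balanced = balance(toy, target=5, rng=rng)
print({label: len(images) for label, images in balanced.items()})
```

Only the minority class receives augmented copies here, which is the idea behind bringing every category to roughly the same size.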

Acknowledgements: –

I’ve used the HAM10000 dataset for this Machine Learning experiment. HAM10000 is a collection of skin cancer images hosted on Harvard Dataverse. Thanks for hosting this dataset.

Understanding and Analyzing dataset: –

The dataset consists of a metadata file and the images, so our first task is to link each image to its label. We did this by adding one more column (image_path) to the metadata file. We’ll save this data frame to a csv file to avoid repeating this step.

The final dataset contains 42650 images. We’ll use the train_test_split function with stratify=y and test_size=0.2 to split it into training and test datasets, giving 34120 training images and 8530 test images. Each image has 4096 pixels (64*64), i.e. 12288 values across the three RGB channels. Each value ranges from 0 to 255 and describes the intensity: 0 is the darkest (black) and 255 the brightest (white). We’ll save each dataset to its own csv file to avoid repeating these steps.
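To show what stratify=y does, here is a minimal pure-Python sketch of a stratified 80:20 split. It is a stand-in for train_test_split, not a replacement for it; the helper name `stratified_split`, the seed and the toy labels are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with every class split in the same ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                        # shuffle within the class
        n_test = round(len(indices) * test_size)    # same fraction for each class
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical balanced labels: 7 classes with 100 images each.
labels = [cls for cls in range(7) for _ in range(100)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 560 140
```

Because the split is done class by class, the class proportions in the training and test sets match those of the full dataset.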

The original and augmented class labels, along with their value counts, are as follows.

Category | Original Count | After Augmentation Count

Let us have a look at one instance (a skin lesion image) of the training dataset.

Problem definition: –

The ‘target’ column has 7 class labels, as we can see from above (0 – akiec, 1 – bcc, …, 6 – vasc).

Given the skin lesion images, we need to classify each one into one of these classes; hence, it is essentially a ‘Multi-class Classification’ problem.

Preparing the dataset: –

We’ve already split the data into training and test datasets in the ratio 80:20 (34120:8530), and we’ll use the same split for this problem. First we need to convert the images into numpy arrays and add one extra dimension (batch) for the CNN. We’ll then divide the numpy arrays by 255 so that every value lies between 0 and 1. Note that this is simple min-max scaling, not a StandardScaler.
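A minimal sketch of the scaling and batch-dimension step, assuming each image is already loaded as a 64*64*3 uint8 array. The helper name `to_model_input` and the random stand-in image are hypothetical.

```python
import numpy as np

def to_model_input(image):
    """Scale a uint8 H x W x 3 image to [0, 1] and add a leading batch axis."""
    scaled = np.asarray(image, dtype=np.float32) / 255.0
    return np.expand_dims(scaled, axis=0)

# Hypothetical stand-in for one 64*64 RGB image loaded from disk.
image = np.random.default_rng(0).integers(0, 256, (64, 64, 3), dtype=np.uint8)
batch = to_model_input(image)
print(batch.shape, batch.dtype)  # (1, 64, 64, 3) float32
```

Several such arrays can be joined along the first axis (for example with np.concatenate) to form the full training array.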

Note: Due to limited resources I was unable to train the CNN on the whole training dataset, so I further reduced it to a training subset of up to 12000 images.

So my final training data is as follows; the number of images in each class is fairly evenly balanced.

Convert data into numpy array

Since our target variable is categorical, we apply the OneHotEncoder to the train_df1[‘dx’].values column.
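What one-hot encoding produces can be sketched in plain Python. This is an illustration only, not the encoder used in this project; the helper name `one_hot` and the short sample of dx values are assumptions. Columns are ordered by sorted label name, which is also how scikit-learn's OneHotEncoder orders automatically detected categories.

```python
def one_hot(labels):
    """Map each label to a 0/1 vector; columns follow the sorted label names."""
    categories = sorted(set(labels))
    index = {name: i for i, name in enumerate(categories)}
    encoded = [
        [1.0 if index[label] == col else 0.0 for col in range(len(categories))]
        for label in labels
    ]
    return categories, encoded

# Hypothetical sample of the dx column.
dx = ["bcc", "akiec", "vasc", "bcc"]
categories, encoded = one_hot(dx)
print(categories)   # ['akiec', 'bcc', 'vasc']
print(encoded[0])   # [0.0, 1.0, 0.0]
```

With all 7 classes present, each row would have 7 entries, exactly one of which is 1.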

Now our data is ready to apply the ML algorithm on it.

Training the Deep Learning Model: –

For this problem we use the pre-trained VGG16 model with its top (fully connected) layers excluded (include_top=False in Keras). We initialize the model with ImageNet weights.

Since the model predicts 7 values for each instance, we use “softmax” as the activation function of the output layer. Softmax turns the 7 outputs into probabilities that sum to 1, and the predicted category is the one with the highest probability.
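A small sketch of softmax followed by picking the most probable class. The 7 raw output values are made up for illustration and do not come from the trained model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                               # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs for the 7 classes of one image.
logits = [0.5, 2.0, -1.0, 0.1, 3.2, 0.0, 1.1]
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)   # index of the largest probability
print(round(sum(probs), 6), predicted)  # 1.0 4
```

Softmax itself does not select a class; the selection is the separate argmax step on the last line before the print.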

We used the train_test_split function to further divide the training dataset into training and validation datasets. The loss function was categorical_crossentropy and the optimizer was “nadam”.

We also tried other optimizers, “adam” and “rmsprop”; the results are shown in the table below.


From the results in the above table, we conclude that the “nadam” optimizer performs best in both the training and validation phases. So, we’ll carry out the next steps with this optimizer only.

Evaluating the Final Model on the Test Dataset: –

We’ll convert the test dataset into a numpy array with one more dimension (batch) and divide it by 255, the same scaling used for the training data, so that the values lie between 0 and 1.

Then we’ll apply the OneHotEncoder to the test_df[‘dx’] column to convert the categorical values into vectors of 0s and 1s.

As we can see, the accuracy on the test set is around 89.50%.

Accuracy | Precision | Recall | F1-Score

Below is the Confusion matrix.

Classification Report: –
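The per-class figures in a classification report can be derived from a confusion matrix. Below is a minimal sketch using a small made-up 3-class matrix, not the results of this project; the function name `per_class_report` is an assumption.

```python
def per_class_report(confusion):
    """confusion[i][j] = number of class-i samples predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    report = []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))   # column sum
        actual_k = sum(confusion[k])                           # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report.append((precision, recall, f1))
    return correct / total, report

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
confusion = [[8, 1, 1],
             [2, 7, 1],
             [0, 2, 8]]
accuracy, report = per_class_report(confusion)
for k, (precision, recall, f1) in enumerate(report):
    print(k, round(precision, 3), round(recall, 3), round(f1, 3))
print("accuracy", round(accuracy, 3))
```

Precision uses the column sum (everything predicted as a class), while recall uses the row sum (everything that truly belongs to it).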

Conclusion: –

(90.24 - 89.46) * 100 / 90.24 = 0.864. As we can see, the relative difference between the training and test scores is very low (about 0.86%), so we conclude that our final model (CNN with the nadam optimizer) is good enough and shows no sign of overfitting or underfitting.

Although the model does well, we can still try it on a larger dataset and with higher-resolution images. We can also make more of the layers trainable (trainable=True) to see how the performance changes.

Again, for reference, the complete code for this project can be found here.