
Approach


Base Model: Adapting Methods from Current Literature for Multi-class Outputs

The initial goal was to build on the feedforward CNN-SVM hybrid approach presented by Chakrabarty and Chatterjee [6], which was applied to a small dataset of 30 high-resolution retinal images, 15 healthy and 15 with diabetic retinopathy. Figure 1 below shows a schematic of the algorithm Chakrabarty and Chatterjee proposed.

[Figure 1: Schematic of the CNN-SVM hybrid algorithm proposed by Chakrabarty and Chatterjee]

The algorithm begins with preprocessing steps such as binarization and thresholding (depicted in the schematic by the grayscale input image), followed by a simple convolutional neural network for feature extraction. Lastly, the algorithm applies a kernel-based SVM to the extracted features to classify a retinal image as either healthy or showing diabetic retinopathy. On this small dataset, the authors reported a 100% success rate in identifying diabetic retinopathy.


For our project, using the five-category Kaggle dataset [4], our aim was to first reproduce the structure proposed by Chakrabarty and Chatterjee [6], and then modify the SVM component of the CNN-SVM hybrid to allow for multi-class rather than binary classification. The goal of this modification was to see whether the methods proposed by Chakrabarty generalize well to larger and more varied datasets, such as our Kaggle dataset. In addition, adapting this model gives insight into whether more precise diagnostics (classifying the stage of DR on a scale of 0-4, as opposed to healthy vs. non-healthy) can be obtained using methods similar to those used for binary classification.


Base Model: Our Approach

1. Organizing the Data

The first step was to organize our entire dataset into training and testing batches. Creating batches has several benefits. First, the dataset is large: loading and training on all of it, or even large portions of it, is not feasible given limited RAM. Second, batching allows training to proceed one batch at a time and supports cross-validation, reducing the likelihood of overfitting. Lastly, separating training data from testing data keeps the performance measurements of the model unbiased (i.e., validation and training data do not get mixed).


In this experiment, we chose an 80/20 split between training and test data. To create the training and testing batches, the data is first randomly shuffled. Each batch is then built by randomly selecting samples such that its class distribution matches the overall class distribution of the entire dataset; the batches are mutually exclusive. We chose to match each batch's class distribution to the overall distribution because this better represents the real-world environment and may lead to better performance in practical applications. However, such an imbalanced dataset may also introduce bias into the model and lead to sub-optimal results. We therefore created a second train-test split whose batches have a uniform class distribution (each class appears with the same frequency in each batch). We plan to explore model performance under both training regimes (non-uniform and uniform class distribution). The limitations of both strategies are discussed on the Results and Discussion page.
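A minimal sketch of how the two splitting strategies could be implemented with scikit-learn and NumPy is shown below; the batch count, function names, and the assumption that the image paths and labels are stored as arrays are our own illustration, not a copy of the repository code.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

def stratified_batches(paths, labels, n_train_batches=9, seed=0):
    """80/20 train/test split, then mutually exclusive training batches that
    each mirror the overall class distribution."""
    paths, labels = np.asarray(paths), np.asarray(labels)
    train_p, test_p, train_y, test_y = train_test_split(
        paths, labels, test_size=0.2, stratify=labels, random_state=seed)

    # The test folds of StratifiedKFold partition the training set into
    # class-proportional, non-overlapping batches.
    skf = StratifiedKFold(n_splits=n_train_batches, shuffle=True, random_state=seed)
    batches = [(train_p[idx], train_y[idx]) for _, idx in skf.split(train_p, train_y)]
    return batches, (test_p, test_y)

def uniform_batches(paths, labels, n_batches=9, seed=0):
    """Alternative split: every class appears with equal frequency in each
    batch (down-samples the majority classes)."""
    paths, labels = np.asarray(paths), np.asarray(labels)
    rng = np.random.default_rng(seed)
    per_class = min(np.bincount(labels)) // n_batches  # samples per class per batch
    batches = [[] for _ in range(n_batches)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        for b in range(n_batches):
            batches[b].extend(idx[b * per_class:(b + 1) * per_class])
    return [(paths[np.array(b)], labels[np.array(b)]) for b in batches]
```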


2. Preprocessing of the Data

After the dataset was organized into batches, the next step was preprocessing the images before sending them to the CNN. We followed the steps described in the reference paper [6] as closely as possible; they are shown in the figure below. Note that after normalization all pixel values lie in [0, 1], so there is nothing meaningful to display for that step.

[Figure: Preprocessing steps]

For each sample, we performed the following steps; a sketch of these steps in code appears after the list. The code for the preprocessing steps can be found in the GitHub repository.

  1. Step 1: Conversion from RGB to grayscale. Note that the ideal way to perform this conversion is typically to use luminance weights for the different color channels; however, to stay in line with the approach in the paper, a simple average of the three color channels was used to obtain the grayscale image.

  2. Step 2: Adaptive thresholding with a 25x25 window. Adaptive thresholding was used here because it was best at capturing the damaged and newly grown blood vessels. Note that the paper did not specify the offset or the type of adaptive thresholding used, so the defaults provided by the OpenCV package (an offset of 0 and mean adaptive thresholding) were used.

  3. Step 3: Resizing to 1000x1000 pixels to create uniformity across the retina scans.

  4. Step 4: Normalization into the [0, 1] range. This was done by dividing each pixel value by 255, as directed by the paper.
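The sketch below shows how these four steps might look with OpenCV and NumPy; the function name and the exact OpenCV arguments (beyond the 25x25 window, 0 offset, and mean thresholding stated above) are our assumptions rather than a copy of the repository code.

```python
import cv2
import numpy as np

def preprocess(rgb_image):
    """Preprocess one retina scan (HxWx3 uint8 array) following the four steps above."""
    # Step 1: grayscale as a simple average of the three color channels,
    # matching the paper rather than the usual luminance-weighted conversion.
    gray = rgb_image.mean(axis=2).astype(np.uint8)

    # Step 2: mean adaptive thresholding with a 25x25 window and 0 offset
    # (the OpenCV defaults assumed in the text above).
    thresh = cv2.adaptiveThreshold(gray, 255,
                                   cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY,
                                   blockSize=25, C=0)

    # Step 3: resize to a uniform 1000x1000 resolution.
    resized = cv2.resize(thresh, (1000, 1000))

    # Step 4: normalize pixel values into [0, 1].
    return resized.astype(np.float32) / 255.0
```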


3. Convolutional Neural Network

Our CNN model has three convolutional layers followed by two fully connected layers. We used ReLU as the activation function after each layer and applied the softmax function to the last layer to compute the final probability for each category. The loss function is the cross entropy between the output of the softmax layer and the one-hot encoded label.

[Figure: CNN architecture]

We used Adam as the optimizer to guide the network training. Before training, the dataset was shuffled. In each training step, the network took as input a mini-batch of images and their corresponding labels, then updated its parameters through backpropagation. When the training stage was done, we evaluated the accuracy and recall of the network on the test set and saved the parameters to disk. This lets us load pre-trained models whenever we need to reuse the parameters, e.g., to extract features for the SVM described in the next step.
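As a rough illustration of this setup, the sketch below builds a three-convolutional-layer, two-dense-layer network in Keras (assuming the tensorflow.keras API) with ReLU activations, a softmax output over the five DR grades, categorical cross-entropy loss, and the Adam optimizer. The filter counts, kernel sizes, and pooling are placeholder choices of ours, not necessarily those of the original network.

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape=(1000, 1000, 1), n_classes=5):
    """Three conv layers + two fully connected layers, softmax output."""
    model = models.Sequential([
        layers.Conv2D(16, 3, activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(4),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(4),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(4),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),   # features later fed to the SVM
        layers.Dense(n_classes, activation="softmax"),
    ])
    # Cross entropy between the softmax output and the one-hot labels,
    # optimized with Adam as described above.
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn()
# model.fit(x_batch, y_batch_onehot, epochs=5, batch_size=32)  # per training batch
# model.save("cnn_weights.h5")                  # persist parameters for later reuse
# model = models.load_model("cnn_weights.h5")   # reload before feature extraction
```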


4. SVM

From the first fully connected layer after the convolutional layers of the CNN described above, we extracted feature vectors to use as inputs to the multi-class kernel-based SVM model we designed. Each of the hyperparameters was tuned using 5-fold cross-validation on the training set.
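A sketch of how this might be wired together, assuming the Keras model sketched above and scikit-learn's SVC (which handles the multi-class case internally); the RBF kernel and parameter grid are illustrative assumptions, and x_train, x_test, y_train, y_test stand for the preprocessed images and their integer labels.

```python
from tensorflow.keras import models
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Truncate the trained CNN at the first dense layer after the conv stack
# and use its activations as feature vectors.
feature_extractor = models.Model(inputs=model.input,
                                 outputs=model.layers[-2].output)
X_train_feat = feature_extractor.predict(x_train)
X_test_feat = feature_extractor.predict(x_test)

# Kernel SVM with hyperparameters tuned by 5-fold cross-validation on the
# training set; SVC handles multi-class classification out of the box.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]}
svm = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
svm.fit(X_train_feat, y_train)          # y_train: integer DR grades 0-4
print(svm.best_params_, svm.score(X_test_feat, y_test))
```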


Additional Model Iteration: Adapting a State-of-the-Art Deep Learning Architecture to Replace the CNN Structure from the Literature

Generally, a deeper convolutional neural network is better suited to a larger dataset, since it has more parameters with which to capture more diverse information. As a result, our team decided to investigate how adopting a state-of-the-art deep learning architecture (ResNet) as the CNN component would compare with the performance of the original structure.
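A sketch of what this swap could look like using Keras' built-in ResNet50 is shown below; the 224x224 three-channel input (e.g., the grayscale image repeated across three channels, or the original RGB scan), the ImageNet weight initialization, and the dense head sizes are all our assumptions rather than the exact configuration we report elsewhere.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

def build_resnet_classifier(input_shape=(224, 224, 3), n_classes=5):
    """ResNet50 backbone in place of the small CNN, with the same
    softmax head over the five DR grades."""
    backbone = ResNet50(include_top=False, weights="imagenet",
                        input_shape=input_shape, pooling="avg")
    return models.Sequential([
        backbone,
        layers.Dense(128, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])

resnet_model = build_resnet_classifier()
resnet_model.compile(optimizer="adam",
                     loss="categorical_crossentropy",
                     metrics=["accuracy"])
```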


Tools and Implementations

Our entire model was created in Python; our code can be found in the GitHub repository [9]. We chose Python because it is free, open-source software with an abundance of packages and libraries that make image manipulation and machine learning methods easy to implement. There are multiple excellent Python libraries available for image manipulation, such as Pillow and OpenCV. For the convolutional neural network, we used Keras to construct, train, and fine-tune networks. Keras is widely used for customizing machine learning models, especially deep CNNs. Apart from building and training a network, Keras provides APIs for loading existing pre-trained parameters and then fine-tuning them on user-provided datasets. Finally, we used the scikit-learn (sklearn) library to perform the multi-class SVM.


Evaluation Methods

We evaluated our model using the same statistics reported by Chakrabarty [6]: training accuracy, testing accuracy, precision, sensitivity/recall, and F1-score. In addition, we examined the performance of our model across the multiple classes using confusion matrices. This gave us insight into how precisely the model distinguishes the different forms of the disease and where its potential limitations may lie in a medical context. Our results and analysis are shown on the Results and Discussion page.
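A brief sketch of how these statistics could be computed with scikit-learn, assuming the trained model from above and integer test labels y_test:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_pred = np.argmax(model.predict(x_test), axis=1)   # predicted DR grade 0-4

# Per-class precision, recall (sensitivity), and F1-score.
print(classification_report(y_test, y_pred, digits=3))

# Rows = true grade, columns = predicted grade; off-diagonal entries show
# which disease stages the model confuses with one another.
print(confusion_matrix(y_test, y_pred))
```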


Early Problems: Large Dataset

The largest challenge has been working with a dataset of this size. After the data was downloaded, our team found saving and loading it difficult (almost 80 GB in total). Because of the dataset's size, we decided it was best to run our programs on Google Colab, which gives us access to Google's cloud servers. With UW's unlimited storage on Google Drive, this solved the issue of storage space. However, we faced many struggles managing Google Colab's RAM limit (12 GB).

 

We designed several mitigation strategies to get around this issue. The first was to split our dataset into 11 evenly sized batches (2 for testing, 9 for training), yielding more manageable NumPy arrays of about 6 GB each. After preprocessing, we were able to shrink each batch to about 3-4 GB. This still posed an issue when training the CNN, so we had to train the CNN one batch at a time and force Python's garbage collector (gc.collect()) to remove the previous batch from memory. Though these steps did get us around the original issue, they imposed a large and unavoidable time cost.
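A sketch of this batch-at-a-time training loop is shown below; the file names, epoch count, and mini-batch size are placeholders for illustration.

```python
import gc
import numpy as np

def train_in_batches(model, n_batches=9, epochs_per_batch=5):
    """Train on one preprocessed batch at a time to stay within Colab's RAM."""
    for i in range(n_batches):
        x = np.load(f"train_batch_{i}_images.npy")
        y = np.load(f"train_batch_{i}_labels.npy")   # assumed saved as one-hot labels
        model.fit(x, y, epochs=epochs_per_batch, batch_size=32)

        # Drop references to the batch and force garbage collection so the
        # next 3-4 GB batch can be loaded without exhausting RAM.
        del x, y
        gc.collect()
    return model
```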
