Dataset

The data for our project comes from a featured prediction competition on Kaggle. It consists of a large set of high-resolution retina images taken under a variety of imaging conditions. Both the left and right fields are provided for each subject. The images come from a variety of camera models and types, and some may be inverted (such as those viewed through a microscope condensing lens).

Additionally, a clinician has rated the presence of diabetic retinopathy (DR) in each image on a scale of 0 to 4, according to the scale below. Note that no information is given about how many clinicians were involved in the original labeling of the dataset, which leaves room for bias in our study.
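As a minimal sketch, the 0 to 4 rating can be represented in code as an enumeration. Grades 0 (no DR) and 4 (proliferative DR) are named in this write-up; the names for grades 1 to 3 follow the Kaggle competition's description. The identifiers `DRGrade` and `parse_grade` are hypothetical and chosen only for illustration.

```python
from enum import IntEnum


class DRGrade(IntEnum):
    """Clinician rating of diabetic retinopathy (hypothetical helper type)."""
    NO_DR = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3
    PROLIFERATIVE = 4


def parse_grade(value):
    """Convert a raw label (int or numeric string) to a DRGrade.

    Raises ValueError if the label falls outside the 0 to 4 scale,
    which helps surface corrupted or noisy label entries early.
    """
    return DRGrade(int(value))
```

Using an integer-backed enumeration keeps the labels usable as class indices while making out-of-range values fail loudly.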

As with any dataset, some noise is expected in both the images and the labels, which makes a robust algorithm all the more important.

The original dataset contains approximately 35,000 images. Due to memory and time limitations, we will be using a resized version of this dataset, which is also available on Kaggle. The resized dataset still contains all the original images, but each has been resized to a maximum of 1024x1024. Images that were originally smaller than 1024x1024 were not altered.
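The resizing rule can be sketched as follows. This is an illustration only, not the code used to build the resized dataset: it assumes the aspect ratio is preserved when an image is shrunk, which is not stated above, and `resized_dimensions` is a hypothetical helper name.

```python
def resized_dimensions(width, height, max_side=1024):
    """Return the (width, height) an image would have after resizing.

    Images that already fit inside max_side x max_side are left unchanged.
    Larger images are scaled down so the longer side equals max_side
    (assumption: aspect ratio is preserved).
    """
    if width <= max_side and height <= max_side:
        return width, height
    scale = max_side / max(width, height)
    # Guard against a dimension rounding down to zero for extreme ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, under these assumptions a 4096x2048 image would become 1024x512, while an 800x600 image would stay at 800x600.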

It is also important to examine the dataset to understand the differences both between classes and within a single class.

Looking at the figure below, we can see the differences between classes. Note that the well-lit pictures in this figure were individually picked to help highlight those differences. Classes 0 and 4 clearly show no DR and proliferative DR, respectively (as expected). However, the incremental changes across classes 1 to 3 are much harder to distinguish. In order to expand our model to perform early diagnosis, we will need to be able to detect these changes accurately.

Additionally, we want to analyze how images within the same class can differ. The figure below highlights these differences for images that all belong to class 0. Note the general lack of consistency, from brightness to size to blurriness. For example, some of these images show spots near the retina that may be perceived as signs of DR; however, they are all just variations in lighting conditions. The model must account for these variations.
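One possible way to quantify this within-class variation is to compute simple per-image statistics. The sketch below is an illustration under stated assumptions, not part of our pipeline: it assumes each image has already been converted to grayscale and is stored as a 2D list of pixel intensities, and the helper names `mean_brightness` and `laplacian_variance` are hypothetical. The variance of the Laplacian is a common heuristic for sharpness, where lower values suggest a blurrier image.

```python
from statistics import mean, pvariance


def mean_brightness(gray):
    """Average pixel intensity of a 2D grayscale image (list of rows)."""
    return mean(pixel for row in gray for pixel in row)


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over the interior pixels.

    Assumes the image is at least 3x3; lower values suggest less
    fine detail, i.e. a blurrier image.
    """
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)
```

Comparing these two values across images of the same class gives a rough picture of how much brightness and blurriness vary, which can inform later preprocessing choices.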