The AI Commons Project is a proof of concept of a new methodology for developing Artificial Intelligence solutions that allows anyone, anywhere to benefit from the possibilities that AI can provide. The project aims to improve the accessibility, reproducibility, contextualization and enhancement of Artificial Intelligence solutions globally and especially in emerging markets.
The project aims to demonstrate how a global community of AI experts can learn and co-create mutually beneficial solutions with the opportunity for cross-country incremental enhancement.
Data Science Nigeria
Gbemileke Onilude, Data Science Nigeria and Wuraola Oyewusi, Data Science Nigeria
Plasmodium falciparum malaria is one of the greatest world health burdens, with over 200 million cases globally and almost half a million deaths yearly, most of them in Sub-Saharan Africa. Thick Film Microscopy (TFM) remains the gold standard for diagnosing malaria in Sub-Saharan regions. TFM relies on the availability of a trained human microscopist to visually inspect Giemsa-stained blood smears under a light microscope to identify and count the P. falciparum parasites. This is time-consuming and subject to human error. Without a trained human microscopist, diagnosis is almost impossible.
People from Sub-Saharan Africa are mostly affected. Pregnant women and children face higher fatality from the parasite.
A blood sample is taken from the patient, and a trained microscopist examines it under a microscope to detect malaria parasites.
Yes. Malaria parasite detection and cancer detection using machine learning have both been developed.
The solution developed is a computer vision technique able to accurately detect Malaria Parasites (MP) and White Blood Cell nuclei (WBC) in color brightfield digitized images of Giemsa-stained thick blood films captured with a 100x/1.4 oil-immersion objective lens.
Presentation deck: Here
The output of the solution is the input image with bounding boxes drawn around each detected malaria parasite and white blood cell.
A trained microscopist to create the training dataset and a machine learning engineer to build the model.
Small amount of training data and low computational resources
Get more training data.
For use in hospitals and labs to aid in the fast detection of malaria parasites.
Yes. Expert microscopists labeled the training dataset.
The dataset consists of 239 images captured using an upright brightfield microscope, all of which were already annotated by expert microscopists. The annotations comprise 2986 malaria parasites and 1272 white blood cell nuclei from 13 unique blood smears.
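The counts above imply a rough annotation density and class balance. The short sketch below only does arithmetic on the figures stated in the dataset description; the variable names are illustrative.

```python
# Counts stated in the dataset description.
NUM_IMAGES = 239
NUM_SMEARS = 13
NUM_PARASITES = 2986
NUM_WBC_NUCLEI = 1272

total_annotations = NUM_PARASITES + NUM_WBC_NUCLEI
annotations_per_image = total_annotations / NUM_IMAGES
images_per_smear = NUM_IMAGES / NUM_SMEARS
parasite_to_wbc_ratio = NUM_PARASITES / NUM_WBC_NUCLEI

print(f"total annotations: {total_annotations}")
print(f"mean annotations per image: {annotations_per_image:.1f}")
print(f"mean images per smear: {images_per_smear:.1f}")
print(f"parasites per WBC nucleus: {parasite_to_wbc_ratio:.2f}")
```

These are averages only; the description does not say how the annotations are distributed across individual images or smears.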
Creation of malaria parasites detection model
The dataset was created by researchers from University College London and University College Hospital, Ibadan.
Images and JSON files.
Images = 239, JSON files = 13
No, it does not contain all possible instances. The complete set would be every individual blood smear taken for malaria diagnosis.
Each instance consists of an image of a Giemsa-stained thick blood film captured with a 100x/1.4 oil-immersion objective lens and an associated JSON file that describes the image.
A sample image in the dataset is shown here.
JSON file: an annotation file containing the locations of malaria parasites and white blood cell nuclei.
The dataset is self-contained.
Labeling was done by expert microscopists.
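The annotation schema is not documented here, so the following is only a sketch of how such a file might be read. It assumes a hypothetical layout in which the JSON maps each image file name to a list of objects with a `label` ("MP" or "WBC") and a `bbox` of pixel coordinates. The file names, field names, and coordinates are all invented for illustration.

```python
import json
from collections import Counter

# Hypothetical annotation content; the real schema may differ.
SAMPLE_JSON = """
{
  "smear_01_img_001.png": [
    {"label": "MP",  "bbox": [120, 340, 150, 372]},
    {"label": "MP",  "bbox": [800, 911, 828, 940]},
    {"label": "WBC", "bbox": [1500, 600, 1580, 690]}
  ],
  "smear_01_img_002.png": [
    {"label": "WBC", "bbox": [40, 55, 130, 140]}
  ]
}
"""

def load_annotations(text):
    """Parse annotation text into {image_name: [(label, (x1, y1, x2, y2)), ...]}."""
    raw = json.loads(text)
    return {
        image_name: [(obj["label"], tuple(obj["bbox"])) for obj in objects]
        for image_name, objects in raw.items()
    }

def count_labels(annotations):
    """Count how many objects of each class appear across all images."""
    return Counter(label for objects in annotations.values() for label, _ in objects)

annotations = load_annotations(SAMPLE_JSON)
label_counts = count_labels(annotations)
print(label_counts)
```

Counting labels per class in this way is a quick check that a loaded file agrees with the totals reported for the dataset.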
Tasks related to the solution.
Cancer prediction, blood type prediction, etc.
Model date: 2019. The model is an object detection model built using Mask R-CNN, which uses ResNet50 for feature extraction.
The dataset was loaded into the workspace from a file directory. Each image was matched with its annotation file and fed into the model as input. The input image shape was 2560 by 2160 by 3.
Model training started from a weights file pre-trained on the COCO dataset. COCO has about 120,000 images and contains no micrographs or biomedical images, but it is large enough for the model to have learned features from objects with shapes similar to the regions of interest.
The dataset was split in the ratio 80:10:10 (train, validation, and test). The performance metric used is mean average precision (mAP), a standard metric for evaluating object detection models.
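An 80:10:10 split can be sketched as follows. This is a minimal illustration, not the project's actual procedure: the random seed, the rounding rule, and the choice to split at the image level (rather than by blood smear) are assumptions, and the image identifiers are placeholders.

```python
import random

def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle items and split them into train, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed (assumed) for repeatability
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Placeholder identifiers standing in for the 239 images.
image_ids = [f"image_{i:03d}" for i in range(239)]
train_ids, val_ids, test_ids = split_dataset(image_ids)
print(len(train_ids), len(val_ids), len(test_ids))
```

With 239 images and this rounding rule, the split comes out as 191, 23, and 25 images. Splitting by image can place images from the same blood smear in both the training and test sets; grouping by smear is an alternative when that matters.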
The model achieved a mean average precision of 0.6 with a training time of more than 2 hours.
The model was trained for 30 epochs with 100 steps per epoch. The other hyperparameters were held constant, as inherited from the pre-trained model. Trial and error was used to determine the best number of epochs.
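The training schedule can be written down as a small configuration object. This is a sketch only: the class and field names are hypothetical, and just the two values stated above are included, since the remaining hyperparameters are not listed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingSchedule:
    """Hypothetical container for the two schedule values reported above."""
    epochs: int = 30
    steps_per_epoch: int = 100

    @property
    def total_steps(self) -> int:
        # Total number of training steps over the whole run.
        return self.epochs * self.steps_per_epoch

schedule = TrainingSchedule()
print(schedule.total_steps)
```

The run therefore covers 3000 training steps in total. How many images each step processes depends on the batch size, which is not reported.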
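Mean average precision for a set of queries is the mean of the average precision scores for each query. In object detection, the object classes (here, malaria parasites and white blood cell nuclei) play the role of the queries.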
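The definition can be illustrated with a small sketch. It uses one common, uninterpolated form of average precision: the mean of the precision values at each rank where a correct detection occurs, divided by the number of ground-truth objects. The AP variant and the overlap threshold used to decide whether a detection counts as correct are not stated for this project, so both are assumptions, and the toy detections below are invented, not project results.

```python
def average_precision(is_true_positive, num_ground_truth):
    """Average precision for one class.

    is_true_positive: booleans for detections sorted by descending confidence.
    num_ground_truth: number of annotated objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, hit in enumerate(is_true_positive, start=1):
        if hit:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / num_ground_truth

def mean_average_precision(per_class):
    """Mean of per-class AP; per_class maps class name -> (flags, num_ground_truth)."""
    aps = [average_precision(flags, n_gt) for flags, n_gt in per_class.values()]
    return sum(aps) / len(aps)

# Toy example: ranked detections flagged as correct or incorrect per class.
toy_detections = {
    "MP": ([True, False, True, True], 4),
    "WBC": ([True, True, False], 2),
}
toy_map = mean_average_precision(toy_detections)
print(round(toy_map, 3))
```

Ground-truth objects that are never detected lower the score because the sum is divided by the number of annotated objects, not by the number of detections.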
Google Colab was used with a GPU enabled.
The solution was implemented using Google Colab. It can be reproduced by running all cells in the Colab notebook in the link Here
No, the outputs are not easily explainable.
Expected performance on unseen data is between 0.6 and 0.7 mAP.
An expert microscopist can check the output image and determine whether the prediction was right.