Application of Active Learning on Medical Images to Enhance Machine Learning Models

Artificial intelligence has made major advances in healthcare, particularly in medical imaging. However, data and annotations in this area are often scarce and expensive. Although labeled images are essential for machine learning models, labeling them is a tedious and time-consuming task. Active learning addresses this challenge by selecting informative samples, that is, a subset of the unlabeled data whose labels the model is likely to have more difficulty predicting, which are then given to experts to annotate. The goal is to use less annotated data while still achieving good model performance.

Breast cancer is one of the most common cancers in women. The proposed solution uses the PatchCamelyon dataset, which contains patches from histopathologic scans of sentinel lymph node sections, for the detection of metastatic tissue in breast cancer patients. This work proposes an active learning approach that divides the unlabeled data into clusters and classifies each cluster by its level of informativeness. Several samples are then selected from each cluster according to that informativeness level, and each sample is scored with a formula that combines entropy and the Euclidean distance to the cluster centroid. Finally, the samples with the lowest uncertainty scores are added to the training dataset, labeled with the model's prediction. The proposed method therefore takes into account both model uncertainty and data distribution.

The solution showed promising results when compared with a random sampling approach. To evaluate it, greyscale and Macenko normalization techniques were used in all approaches (random sampling, a variation of the proposed solution without the pseudo-labeling task, and the full proposed solution). In some iterations, the difference between the F1 scores of the proposed active learning solution and random sampling was more than 0.20.
With this method, experts can spend less time annotating images while still obtaining a high-performing model.
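The abstract states that each sample is scored with a formula combining entropy and the Euclidean distance to its cluster centroid, but it does not give the formula itself. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the model outputs class probabilities, that every sample has a feature vector (for example, an embedding) in the same space as the centroid, and that both terms are normalized to [0, 1] and combined as a weighted sum with a hypothetical weight `alpha`. The function names `normalized_entropy` and `uncertainty_scores` are illustrative only.

```python
import math
from typing import List, Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a predicted class distribution, divided by log(k) so it lies in [0, 1]."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)


def uncertainty_scores(
    probs: Sequence[Sequence[float]],
    features: Sequence[Sequence[float]],
    centroid: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Score the members of one cluster; higher means more uncertain or atypical.

    Hypothetical formula (an assumption, not taken from the paper):
    alpha * normalized entropy + (1 - alpha) * distance to the centroid
    divided by the largest distance in the cluster.
    """
    if not features:
        return []
    dists = [math.dist(f, centroid) for f in features]
    max_d = max(dists) or 1.0  # all points on the centroid: avoid dividing by zero
    return [
        alpha * normalized_entropy(p) + (1.0 - alpha) * (d / max_d)
        for p, d in zip(probs, dists)
    ]


# Three samples from one cluster whose centroid is the origin
scores = uncertainty_scores(
    probs=[[0.98, 0.02], [0.5, 0.5], [0.7, 0.3]],
    features=[[0.1, 0.0], [1.0, 1.0], [0.0, 0.5]],
    centroid=[0.0, 0.0],
)
print([round(s, 3) for s in scores])  # [0.106, 1.0, 0.617]
```

With `alpha = 0.5`, the confident prediction that lies close to the centroid receives the lowest score, while the 50/50 prediction on the cluster's farthest point reaches the maximum of 1.0. Normalizing both terms keeps either one from dominating merely because of its scale.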
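The cluster-level steps (grading clusters by informativeness, drawing a level-dependent number of samples from each, and pseudo-labeling the lowest-scoring samples) can be sketched as follows. This illustration rests on several assumptions because the abstract does not specify these details: clusters are graded by splitting them into thirds by mean score, the per-level counts in `QUOTAS` are arbitrary, and the `Sample` record and `select_batch` function are hypothetical names. Each sample's combined uncertainty score is taken as given. The sketch also assumes, in line with the active-learning setting described above, that the most uncertain samples of each cluster go to experts for annotation, while the lowest-scoring remaining samples are added to the training set with the model's prediction as their label.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple, Tuple


class Sample(NamedTuple):
    index: int      # position in the unlabeled pool
    cluster: int    # cluster id from the clustering step
    score: float    # combined uncertainty score; higher = more uncertain
    predicted: int  # model's predicted class, reused as a pseudo-label


# Hypothetical number of samples queried per cluster at each informativeness level
QUOTAS = {"high": 3, "medium": 2, "low": 1}


def cluster_levels(samples: List[Sample]) -> Dict[int, str]:
    """Rank clusters by mean score and split them into thirds: high, medium, low."""
    scores: Dict[int, List[float]] = defaultdict(list)
    for s in samples:
        scores[s.cluster].append(s.score)
    ranked = sorted(scores, key=lambda c: mean(scores[c]), reverse=True)
    return {c: ("high", "medium", "low")[3 * rank // len(ranked)] for rank, c in enumerate(ranked)}


def select_batch(
    samples: List[Sample], n_pseudo: int
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Return (indices to send to experts, (index, pseudo_label) pairs for the training set)."""
    levels = cluster_levels(samples)
    members: Dict[int, List[Sample]] = defaultdict(list)
    for s in samples:
        members[s.cluster].append(s)

    query: List[int] = []
    for c, group in members.items():
        # The most uncertain members of each cluster, as many as its level allows
        group.sort(key=lambda s: s.score, reverse=True)
        query.extend(s.index for s in group[: QUOTAS[levels[c]]])

    # Among the samples not sent to experts, the lowest-scoring keep the model's prediction
    queried = set(query)
    rest = sorted((s for s in samples if s.index not in queried), key=lambda s: s.score)
    pseudo = [(s.index, s.predicted) for s in rest[:n_pseudo]]
    return query, pseudo


pool = [
    Sample(0, 0, 0.9, 1), Sample(1, 0, 0.8, 1), Sample(2, 0, 0.7, 0), Sample(3, 0, 0.6, 1),
    Sample(4, 1, 0.5, 0), Sample(5, 1, 0.4, 0), Sample(6, 1, 0.3, 1),
    Sample(7, 2, 0.2, 0), Sample(8, 2, 0.1, 1), Sample(9, 2, 0.05, 0),
]
query, pseudo = select_batch(pool, n_pseudo=2)
print(query)   # [0, 1, 2, 4, 5, 7]
print(pseudo)  # [(9, 0), (8, 1)]
```

Excluding the queried samples before pseudo-labeling keeps any sample from being both sent to an expert and auto-labeled in the same round. Ignoring the `pseudo` list corresponds to the variation without the pseudo-labeling task that the abstract uses as a comparison.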

Maria Santos
DevScope/ISEP
Portugal

Goreti Marreiros
ISEP/GECAD
Portugal