ML in breast cancer IHC: Pilot evaluation on non-ideal slides in Kazakhstan
Arshat Urazbayev 1, Bakytzhan Amangeldinovna Issakhanova 2,3, Zhanas Baimagambet 4,5, Zamart Ramazanova 1,6, Yeldar Baiken 1,7, Askhat Myngbay 1,8*
1 PI National Laboratory Astana, Nazarbayev University, Astana, KAZAKHSTAN
2 Department of Anatomic Pathology, RSE “Medical Centre Hospital of the President’s Affairs Administration of the Republic of Kazakhstan”, Astana, KAZAKHSTAN
3 QazGene LLP, Astana, KAZAKHSTAN
4 School of Medicine, Nazarbayev University, Astana, KAZAKHSTAN
5 University of Cape Town, Faculty of Health Sciences, SOUTH AFRICA
6 Department of Electrical and Computer Engineering, School of Engineering and Digital Sciences, Nazarbayev University, Astana, KAZAKHSTAN
7 Center for BioEnergy Research LLP, Astana, KAZAKHSTAN
8 Department of Biology, K. Zhubanov Aktobe Regional University, Aktobe, KAZAKHSTAN
* Corresponding Author

Abstract

Objectives: Automating the quantitative analysis of breast cancer (BC) immunohistochemistry (IHC) specimens is important for optimizing pathologists’ workflow and improving diagnostic reproducibility. This matters most in low- and middle-income countries, where highly trained pathologists are in short supply. However, existing approaches struggle to deliver fully automated quantitative IHC because tumor areas, including discrete areas, are difficult to delineate, especially on poor-quality IHC slides. Moreover, quantitative analysis depends on two things: accurately identifying invasive carcinoma areas and accurately counting the positive and negative cells in the specimen.
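
The quantification step above is commonly summarized as the percentage of positively stained tumor cells. The following minimal Python sketch assumes that per-cell labels ("positive"/"negative") already come from an upstream segmentation and classification step. The function name `positivity_index` and the label strings are hypothetical illustrations, not the authors' implementation or scoring rules.

```python
from collections import Counter
from typing import Iterable


def positivity_index(cell_labels: Iterable[str]) -> float:
    """Return the percentage of positive cells among all classified tumor cells.

    Assumes each label is "positive" or "negative" (hypothetical labels
    produced by an upstream nucleus classifier).
    """
    counts = Counter(cell_labels)
    unknown = set(counts) - {"positive", "negative"}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    # Counter returns 0 for labels that never occurred.
    total = counts["positive"] + counts["negative"]
    if total == 0:
        raise ValueError("no tumor cells were classified")
    return 100.0 * counts["positive"] / total


labels = ["positive"] * 37 + ["negative"] * 63
print(f"{positivity_index(labels):.1f}%")  # 37.0%
```

Raising an error on unknown labels or an empty input is a deliberate choice here. It prevents a silently wrong percentage when the upstream classifier fails on a poor-quality slide.
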
Methods and results: This study presents a method to automatically identify types of carcinoma areas in whole slide IHC images of BC, focusing on the quantification of IHC images from Kazakhstan. The model combines morphological characteristics and boundary features, which yields highly accurate segmentation of tumor zones in images of mild and low quality. We used several methods, including a convolutional neural network (CNN) built with the Keras framework, k-nearest neighbors (k-NN) machine learning, and self-developed image analysis methods. The developed model showed high accuracy, and its results agreed with pathologists’ diagnoses. As expected, the method was ineffective on severely degraded slides, such as those with insufficient staining or inadequate washing. Slides of inferior quality were excluded from the analysis, which reduced its statistical robustness. On slides of moderate quality, the reliability of nucleus segmentation dropped significantly.
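
The abstract names a k-NN classifier that uses morphological and boundary features but does not specify those features. The following pure-Python sketch illustrates the general technique under stated assumptions. The feature set (area, pixel-edge perimeter, circularity), the z-score scaling, k = 3, the helper names, and the toy "tumor"/"stroma" training vectors are all hypothetical and are not the authors' pipeline. The Keras CNN component is not reproduced here.

```python
import math
from collections import Counter
from typing import Callable

# Hypothetical feature vector: (area, perimeter, circularity).
Features = tuple[float, float, float]


def region_features(mask: list[list[int]]) -> Features:
    """Compute (area, perimeter, circularity) of one binary region (1 = region pixel).

    Perimeter is approximated by counting pixel edges that touch background or
    the image border (4-connectivity); this is a coarse boundary estimate.
    """
    rows, cols = len(mask), len(mask[0])
    area = perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not mask[nr][nc]:
                    perimeter += 1
    # In the continuous case 4*pi*A/P^2 is 1.0 for a perfect circle;
    # with this pixel-edge perimeter it stays well below 1.
    circularity = 4 * math.pi * area / perimeter**2 if perimeter else 0.0
    return float(area), float(perimeter), circularity


def fit_standardizer(vectors: list[Features]) -> Callable[[Features], tuple[float, ...]]:
    """Return a function that z-scores vectors using the per-feature mean and std of `vectors`."""
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[i] for v in vectors) / n for i in range(dims)]
    # Fall back to 1.0 for constant features to avoid division by zero.
    stds = [math.sqrt(sum((v[i] - means[i]) ** 2 for v in vectors) / n) or 1.0
            for i in range(dims)]
    return lambda v: tuple((v[i] - means[i]) / stds[i] for i in range(dims))


def knn_predict(train: list[tuple[Features, str]], query: Features, k: int = 3) -> str:
    """Classify `query` by majority vote of its k nearest training samples (Euclidean, z-scored)."""
    scale = fit_standardizer([features for features, _ in train])
    q = scale(query)
    nearest = sorted(train, key=lambda item: math.dist(scale(item[0]), q))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up training data; labels and values are purely illustrative.
train = [
    ((120.0, 44.0, 0.78), "tumor"),
    ((135.0, 48.0, 0.74), "tumor"),
    ((110.0, 42.0, 0.78), "tumor"),
    ((30.0, 40.0, 0.24), "stroma"),
    ((25.0, 36.0, 0.24), "stroma"),
    ((35.0, 44.0, 0.23), "stroma"),
]

mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
print(region_features(mask)[:2])                # (12.0, 16.0)
print(knn_predict(train, (125.0, 46.0, 0.75)))  # tumor
```

Standardizing the features before computing distances matters for k-NN. Without it, area has the largest numeric range and would dominate the Euclidean distance.
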
Conclusions: The combination of models we used showed high accuracy in differentiating BC cells by basal-like subtype, invasiveness, and recurrence in Kazakhstan. However, IHC specimens with low DPI or low-quality IHC require further optimization and improvements in algorithm design. The main issue appears to be a methodological difference between AI and human approaches. AI processes a large number of cases (more than 10,000) but with relatively low accuracy. Humans, by contrast, work with far fewer cases but achieve a level of precision that AI cannot currently match. This discrepancy calls for revising the methodology of IHC analysis for AI, including developing new requirements, methods, and thresholds from scratch. Such an automated approach analyzes the entire area of the slide, speeds up the interpretation of IHC results, and reduces human error in diagnosis, especially in low- and middle-income countries.

License

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Article Type: Original Article

ELECTRON J GEN MED, Volume 23, Issue 3, June 2026, Article No: em732

https://doi.org/10.29333/ejgm/18520

Publication date: 05 May 2026
