This paper presents a potent approach for the automated classification of cervical cancer from Pap smears using an enhanced fuzzy c-means algorithm. A sequential elimination method is proposed for debris removal, Trainable Weka Segmentation for cell segmentation, and an efficient approach for feature selection to generate a feature subset that minimises the classification error.
Informatics in Medicine Unlocked 14 (2019) 23–33
Fig. 1. The approach to achieve cervical cancer classification from Pap smear images.
2. Materials and methods
Cervical cancer classification was achieved in our study through a sequential approach (depicted in Fig. 1). The approach was assessed using two DTU/Herlev datasets. Dataset 1 consists of 917 single-cell Pap smear images prepared by Jantzen et al. The dataset contains Pap smear images taken at a resolution of 0.201 μm/pixel by skilled cytopathologists using a microscope connected to a frame grabber. The images were segmented using CHAMP commercial software (developed by DIMAC Imaging Systems) and then classified into seven classes. Of these, 200 images were used for training and 717 images for testing. Dataset 2 consists of 497 full-slide Pap smear images prepared by Norup et al. Of these, 200 images were used for training and 297 images for testing.
Furthermore, the performance of the classifier was evaluated on samples of 98 Pap smears (49 normal and 49 abnormal) obtained from Mbarara Regional Referral Hospital (MRRH). Specimens were imaged using an Olympus BX51 bright-field microscope equipped with a 40×, 0.95 NA lens and a Hamamatsu ORCA-05G 1.4 Mpx monochrome camera, giving a pixel size of 0.25 μm with 8-bit gray depth. Each image was then divided into 300 areas, with each area containing between 200 and 400 cells. Based on the opinions of the cytopathologists, 10,000 objects in images derived from the 98 different Pap smear slides were selected, of which 500 were free-lying cervical epithelial cells (250 normal cells from normal smears and 250 abnormal cells from abnormal smears) and the remaining 9,500 were debris objects. This Pap smear segmentation was achieved using the Trainable Weka Segmentation toolkit.
Fig. 2. Application of CLAHE (B) on the original Pap smear image (A). Original histogram (C) and enhanced histogram (D).
Fig. 4. Generation of the feature vector from the training images.
2.2. Image enhancement
Image enhancement is very useful where the subjective quality of images is important for human and computer interpretation. Contrast limited adaptive histogram equalization (CLAHE) was applied to the grayscale image. A clip limit of 2.0 was determined to be appropriate for providing adequate image enhancement while preserving the dark features. Conversion to grayscale was achieved using a grayscale technique implemented using Equation (1) as defined in Ref. .
where R = Red, G = Green and B = Blue colour contributions to the new image.
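A grayscale conversion of this kind is commonly a weighted sum of the R, G and B contributions. The sketch below illustrates the idea in plain Python. The weights (0.299, 0.587, 0.114, the ITU-R BT.601 luma coefficients) and the names `WEIGHTS` and `to_grayscale` are assumptions made for illustration and are not necessarily those of Equation (1).

```python
# Assumed weights: ITU-R BT.601 luma coefficients (not confirmed by the source).
WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples (0-255) into rows of 8-bit gray values."""
    wr, wg, wb = WEIGHTS
    return [
        # Weighted sum of the three colour contributions, rounded and capped at 255.
        [min(255, int(round(wr * r + wg * g + wb * b))) for (r, g, b) in row]
        for row in rgb_image
    ]


# Hypothetical 2 x 2 colour image: white, black, pure red, pure blue.
image = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]]
gray = to_grayscale(image)
```

Green receives the largest weight because the eye is most sensitive to it, so a pure red or pure blue pixel maps to a much darker gray than a white one.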
The contrast limited adaptive histogram equalization algorithm, applied with a clip limit of 2.0, produced noticeable changes in the images (as shown in Fig. 2): image intensities were adjusted so that the darkened nucleus and the cytoplasm boundaries became easily identifiable. After CLAHE, the intensities in the images were better distributed, which facilitates further image analysis.
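To make the role of the clip limit concrete, the following is a minimal sketch of the contrast-limiting step of CLAHE applied to a single tile. A full CLAHE implementation additionally splits the image into tiles and interpolates between the per-tile mappings, which is omitted here. The function name `clipped_equalization` is hypothetical, and treating the clip limit of 2.0 as a multiple of the mean histogram bin count is an assumption; the source does not state which implementation or convention was used.

```python
def clipped_equalization(tile, clip_limit=2.0, n_bins=256):
    """Contrast-limited histogram equalization of one tile (rows of 0-255 ints)."""
    pixels = [p for row in tile for p in row]
    n = len(pixels)

    # Histogram of gray levels in the tile.
    hist = [0] * n_bins
    for p in pixels:
        hist[p] += 1

    # Assumption: the clip limit is expressed as a multiple of the mean bin count.
    limit = max(1.0, clip_limit * n / n_bins)

    # Clip every bin at the limit and spread the clipped excess evenly over
    # all bins (single pass, so the total pixel count is preserved).
    excess = sum(max(0.0, h - limit) for h in hist)
    clipped = [min(h, limit) + excess / n_bins for h in hist]

    # The cumulative sum of the clipped histogram gives the gray-level mapping.
    lut, running = [], 0.0
    for h in clipped:
        running += h
        lut.append(int(round((n_bins - 1) * running / n)))

    return [[lut[p] for p in row] for row in tile]


# Hypothetical low-contrast 16 x 16 tile with gray levels between 100 and 115.
tile = [[100 + (r + c) % 16 for c in range(16)] for r in range(16)]
enhanced = clipped_equalization(tile, clip_limit=2.0)
```

Clipping caps the slope of the mapping, so the narrow intensity range of the tile is stretched without the extreme amplification that unclipped equalization would apply, which is consistent with the stated aim of preserving dark features.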
2.3. Pap smear segmentation
The majority of cells observed in a Pap smear are, not surprisingly, cervical epithelial cells. In addition, varying numbers of leukocytes, erythrocytes and bacteria are usually evident, while small numbers of other contaminating cells and microorganisms are sometimes observed. However, the Pap smear contains four major types of squamous cervical cells (superficial, intermediate, parabasal and basal), of which superficial and intermediate cells represent the overwhelming majority in a conventional smear; hence these two types are often used for a conventional Pap smear analysis. Trainable Weka Segmentation (TWS) was utilized to identify and segment the different objects on the slide. At this stage, a pixel-level classifier was trained to identify cell nuclei, cytoplasm, background and debris with the help of a skilled cytopathologist, using the TWS toolkit. This was achieved by drawing lines/selections through the areas of interest and assigning them to a particular class. The pixels under the lines were taken to be representative of the nuclei, cytoplasm, background and debris (depicted in Fig. 3).
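The annotation-driven training described above can be illustrated with a deliberately simplified stand-in. TWS itself trains a Weka classifier on a stack of filter-derived pixel features; the sketch below instead uses a nearest-centroid rule on raw gray intensity, purely to show how pixels under the drawn lines become training samples for a pixel-level classifier. The function names, the toy image and the scribble coordinates are all hypothetical.

```python
def train_centroids(image, scribbles):
    """Learn one mean intensity per class from the annotated pixels.

    `scribbles` maps a class name to the (row, col) coordinates of the
    pixels lying under the lines drawn for that class.
    """
    centroids = {}
    for label, coords in scribbles.items():
        values = [image[r][c] for r, c in coords]
        centroids[label] = sum(values) / len(values)
    return centroids


def classify_pixels(image, centroids):
    """Assign every pixel to the class whose mean intensity is closest."""
    return [
        [min(centroids, key=lambda k: abs(centroids[k] - p)) for p in row]
        for row in image
    ]


# Hypothetical 4 x 4 grayscale patch: bright background, mid-gray cytoplasm,
# a dark nucleus and a small piece of debris.
image = [
    [230, 230, 230, 230],
    [230, 120, 120, 230],
    [230, 120, 40, 80],
    [230, 230, 230, 80],
]
scribbles = {
    "nucleus": [(2, 2)],
    "cytoplasm": [(1, 1), (1, 2)],
    "background": [(0, 0), (0, 3)],
    "debris": [(2, 3)],
}
centroids = train_centroids(image, scribbles)
labels = classify_pixels(image, centroids)
```

Intensity alone cannot separate debris from cells in real smears, which is why a feature-rich classifier such as the one in TWS is used in practice; the sketch only mirrors the labelling workflow.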
The outlines drawn within each class were used to generate a feature vector, F, which was derived from the number of pixels belonging to each outline. The feature vector from each image (200 from Dataset 1