Yu-Hang Zhang, Lin-Jie Guo, Xiang-Lei Yuan, Bing Hu
Abstract Esophageal cancer poses diagnostic, therapeutic and economic burdens in high-risk regions. Artificial intelligence (AI) has been developed for diagnosis and outcome prediction using various features, including clinicopathologic, radiologic, and genetic variables, and has achieved encouraging results. One of the most recent tasks of AI is to use state-of-the-art deep learning techniques to detect both early esophageal squamous cell carcinoma and esophageal adenocarcinoma in Barrett’s esophagus. In this review, we aim to provide a comprehensive overview of the ways in which AI may help physicians diagnose advanced cancer and make clinical decisions based on predicted outcomes, and use endoscopic images to detect precancerous lesions or early cancer. The number of pertinent studies has surged in the past two years; these studies used large datasets and external validation from multiple centers, and some reported that AI achieved expert-level performance in real time. Future studies should improve pre-trained computer-aided diagnosis algorithms with larger training and external validation datasets and aim at real-time video processing, in order to produce a diagnostic efficacy similar or even superior to that of experienced endoscopists. Meanwhile, supervised randomized controlled trials in real clinical practice are essential for reaching solid, patient-centered conclusions. Notably, ethical and legal issues regarding the black-box nature of computer algorithms should be addressed, for both clinicians and regulators.
Key Words: Artificial intelligence; Computer-aided diagnosis; Deep learning; Esophageal squamous cell cancer; Barrett’s esophagus; Endoscopy
Esophageal cancer (EC) is one of the ten most prevalent malignancies worldwide, ranking seventh in incidence and sixth in mortality in 2018[1]. The major histological types are squamous cell carcinoma (SCC), which is predominant worldwide, and adenocarcinoma (AC), which is more prevalent in Caucasian people[2-4]. Data collected from 12 countries have indicated that AC will possibly experience a dramatic increase in incidence up to 2030, while the incidence of SCC will continuously decrease[5]. It is estimated that EC causes absolute years of life lost reduction of 7.8 (95%CI: 2.3-12.7)[6]. Although EC is not the most common cause of admission or readmission to hospital[7], it certainly imposes economic burdens. A cohort study conducted in the United Kingdom showed that the mean net costs of care per 30 patient-days of AC were $1016, $669, and $8678 for the initial phase, continuing care phase and terminal phase, respectively[8]. The cost increases with a higher tumor node metastasis (TNM) stage at first diagnosis[8].
Clearly, EC is a serious health threat, imposing an economic burden on both high-income and low-income countries. Therefore, early diagnosis and evidence-based expert opinions on selecting the optimal treatment modality are crucial for reducing this burden. Although various diagnostic methodologies [including endoscopic ultrasonography (EUS), chromoendoscopy, optical coherence tomography (OCT), high-resolution microendoscopy (HRM), confocal laser endomicroscopy (CLE), volumetric laser endomicroscopy (VLE), and positron emission tomography (PET)] as well as serum and genetic predictors have been developed to improve diagnostic accuracy and predict outcomes, inter-observer variability in interpreting images and heavy workloads limit their clinical efficiency[9-11]. A practical tool that can improve accuracy and reduce workload is urgently needed in clinical practice.
Artificial intelligence (AI), which mimics the cognitive behavior of the human mind, has become an emerging hot spot globally in various disciplines. Numerous models have been attempted for machine learning (ML), and the terminology is explained in previous studies[12,13]. ML models are trained on datasets to extract and transform features, thereby achieving the goal of classification and prediction by self-learning[13-15]. In gastroenterology, AI-based technologies, which are characterized by deep learning (DL) as the state-of-the-art machine learning algorithm, have been mainly developed to identify dysplasia in Barrett’s esophagus (BE), SCC, gastric cancers, and Helicobacter pylori in the upper gastrointestinal (UGI) tract[16], and to diagnose polyps, inflammatory bowel diseases, celiac disease, and gastrointestinal (GI) bleeding in the lower GI tract[17]. Various models have been developed and studied to detect anatomical structures, discriminate dysplasia, and predict therapeutic and survival outcomes of EC. The ultimate goal of AI is to assist physicians and patients in making a superior data-based diagnosis or decision. In the following sections, we will (1) provide an overview of AI applications in diagnosis and prediction of advanced cancer; (2) specify computer-aided diagnosis (CAD) for early detection of esophageal SCC (ESCC) and esophageal adenocarcinoma (EAC) based on optical imaging; and (3) outline limitations of the existing studies and future perspectives. We searched the PubMed database using the terms “esophageal cancer” and “artificial intelligence” for papers published up to March 1, 2020, and initially obtained 172 studies. After exclusion of 128 items, 44 research articles that provided detailed data were included in the review and discussion (Figure 1).
EC is highly malignant, and the 5-year survival rate of late-stage EC is less than 25%[18]. Radical therapies, including surgery, chemotherapy, radiotherapy or their combination, are essential to improve survival outcome. Accurate diagnosis, precise staging, optimal modality selection, as well as responsiveness and survival outcome prediction are necessary for making sound clinical decisions. However, these decisions are made mainly based on the current guidelines and expertise in clinical practice. AI technologies have therefore been developed to enhance the reliability of those decisions in an individualized manner.
One of the important roles of AI is to detect malignant lesions. In 1996, Liu et al[19] proposed a tree-based algorithm called PREDICTOR to identify EC among patients with dyspeptic symptoms, which achieved a discriminating accuracy of 61.3%, with sensitivity (SEN) and specificity (SPE) of 94.9% and 39.8%, respectively. In 2002, a probabilistic network-based decision-support system was developed, which could correctly predict the cancer stage of 85% of tested data in reasonable time[20]. In the same year, a robust classifier, the artificial neural network (ANN), which imitates the neural network of the human brain, was adopted to distinguish BE from EC[21]. The ANN was trained using 160 genes selected by significance analysis of microarrays (SAM) from cDNA microarray data of esophageal lesions. This ANN outperformed cluster analysis by correctly diagnosing all the tested samples. Kan et al[22] also combined ANN with 60 SAM-extracted gene clones to accurately predict lymph node metastasis in 86% of all SCC cases, with SEN and SPE of 88% and 82%, respectively, which was better than clustering or predictive scoring. Kan et al[22] suggested that AI was a potential tool to detect lymph node metastasis when the SEN of computed tomography (CT), EUS, and PET is insufficient[23,24]. Since tumor risk factors have complex nonlinear correlations, a fuzzy neural network, trained on a hybridization of a chaotic optimization algorithm and error back propagation (EBP), was able to correctly diagnose 87.36% of ESCC and 70.53% of dysplasia[25]. This fuzzy-logic based model outperformed traditional statistics, such as the multivariate logistic regression model that was previously described by Etemadi et al[26].
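Because accuracy, SEN, and SPE are quoted throughout this review, a minimal sketch of how these three measures follow from the four cells of a binary confusion table may be useful. The class name, function names, and counts below are hypothetical and chosen only for illustration; they are not data from any cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary test compared against the reference standard."""
    tp: int  # diseased, test positive
    fn: int  # diseased, test negative
    tn: int  # non-diseased, test negative
    fp: int  # non-diseased, test positive


def sensitivity(c: ConfusionCounts) -> float:
    # Share of diseased cases that the test flags as positive.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of non-diseased cases that the test flags as negative.
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    # Share of all cases that are classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# Hypothetical counts for illustration only.
example = ConfusionCounts(tp=90, fn=10, tn=160, fp=40)
print(sensitivity(example), specificity(example), accuracy(example))
```

Accuracy depends on the mix of diseased and non-diseased cases in the test set, which is one reason a model can combine a high SEN with a much lower overall accuracy, as in the PREDICTOR results quoted above.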
While symptoms are not very reliable and gene analysis or PET scans are expensive, a simpler noninvasive detection method may be more practical. Li et al[27] combined the support vector machine (SVM), a traditional classifier, with surface-enhanced Raman spectroscopy in order to distinguish serum spectra of EC patients from those of healthy controls. Eventually, a combination of SVM with principal component analysis (PCA) on the basis of a radial basis function (RBF), namely the RBF PCA-SVM algorithm, exhibited the greatest efficacy among the tested algorithms, with accuracy, SEN, and SPE of 85.2%, 83.3% and 86.7%, respectively.
Another significant role of AI is to predict prognosis of EC based on various demographic, clinicopathologic, hematologic, radiologic, and genetic variables. Surgery and neoadjuvant chemotherapy, radiotherapy or chemoradiotherapy are important definitive modalities for advanced EC. Selecting the optimal strategy with superior predictive outcome is of vital importance.
Traditionally, the TNM staging system is used as a predictor. However, a previous study showed that it was not very accurate[28]. Hence, multiple computational algorithms were developed to support more reliable predictions. In 2005, Sato et al[29] trained an ANN to predict survival outcome. They found that the best predictive accuracy was obtained with 65 clinicopathologic, genetic and biologic variables for 1-year survival and 60 variables for 5-year survival. The area under the ROC curve (AUC), SEN, and SPE were 0.883, 78.1%, and 84.7% for 1-year survival and 0.884, 80.7%, and 86.5% for 5-year survival, respectively. Similar results with higher SEN and SPE could be achieved in another ANN model to predict the 1- and 3-year post-operative survival of EC and esophagogastric junction cancer[30]. These two ANNs both outperformed the TNM staging system[29,30].
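The AUC values quoted here can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes the AUC in that pairwise form; the function name and the scores are hypothetical, and the cited studies may have used other estimators.

```python
def auc_from_scores(pos_scores, neg_scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    Counts how often a positive case outscores a negative case;
    ties contribute half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores, for illustration only.
died_within_one_year = [0.9, 0.8, 0.4]
survived_one_year = [0.7, 0.3, 0.2]
auc = auc_from_scores(died_within_one_year, survived_one_year)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so values near 0.88 indicate that the model ranks most pairs correctly.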
Figure 1 Flow chart of study selection and logic arrangement of review. BE: Barrett’s esophagus; EAC: Esophageal adenocarcinoma; OCT: Optical coherence tomography; ESCC: Esophageal squamous cell carcinoma.
In addition to ANN, other models were also proposed to solve certain problems. A prognostic scoring system, using serum C-reactive protein and albumin concentrations, was fused with expertise by fuzzy logic[31]. The proposed model could perform 1-year survival prediction with an AUC of 0.773. Another method, hierarchical forward selection (HFS), a wrapper feature selection method, was developed to solve the problem of small sample size[32]. In this SVM-validated model, clinical and PET features were learned to predict disease-free survival. The results showed that HFS achieved the highest accuracy of 94%, with robustness of 96%. Robustness could be further increased to 98% if prior knowledge was incorporated into HFS (pHFS).
Based on the condition and prognosis of the patient, an individualized treatment strategy is needed. For instance, when chemotherapy is prescribed for a patient, what is the optimal medication, with what dosage and for what period? Generally, clinicians make decisions according to their own experience, guidelines or consensus. However, those recommendations are often fixed and human errors are sometimes inevitable. A group of Iranian experts attempted to train a multilayer neural network with particle swarm optimization and EBP algorithms, in order to determine the dosage of chemotherapy[33]. Encouragingly, the accuracy of both particle swarm optimization and EBP was 77.3%. Zahedi et al[33] were positive about its future application as a supplementary decision-making system.
While the majority of decisions are made before treatment, is it possible to make real-time treatment decisions? The answer is yes. Maktabi et al[34] tested a relatively new hyperspectral imaging system. They found that SVM was able to detect cancerous tissue with 63% SEN and 69% SPE within 1 s. Hyperspectral imaging may therefore assist surgeons in identifying tumor borders intra-operatively in real time.
A good treatment response is crucial for subsequent therapeutic decisions and outcome prediction[35]. Endeavors have been made to select candidate factors that correlate with responsiveness to treatment. However, it is often extremely laborious to test these numerous variables in clinical trials. AI technologies are potentially powerful tools for this selection.
One indicator is the genetic biomarker. In 2010, Warnecke-Eberz et al[36] reported their use of ANN to predict the histopathologic responsiveness of treatment-naïve patients to neoadjuvant chemoradiotherapy by analyzing 17 genes using TaqMan low-density arrays. Their results were promising, with 85.4% accuracy, 80% SEN, and 90.5% SPE. Radiology is another important indicator to assess tumor regression after treatment. One rationale for this exploration is that tumor heterogeneity exists within radiologic images[37]. The standardized uptake values of 18F-fluorodeoxyglucose in PET imaging were reported to have predictive potential[38]. However, this predictive power is limited[39] due to some confounding factors, such as intra-observer variations. Ypsilantis et al[40] adopted a three-slice convolutional neural network that could extract features from pre-treatment PET scans automatically to predict response to chemotherapy. It achieved a moderate accuracy of 73.4%, with SEN and SPE of 80.7% and 81.6%, respectively, which outperformed other ML algorithms trained on handcrafted PET scan features. Recently, CT radiomics three months after chemoradiotherapy were combined with dosimetric features of gross tumor volume and organs at risk to identify non-responders. Jin et al[41] found that a model of extreme gradient boosting plus PCA trained on these combined features achieved an accuracy of 70.8%, with an AUC of 0.541.
While tumor regression is an important indicator to assess responsiveness, post-treatment distant metastasis is also vital to evaluate responsiveness, and is correlated with survival outcome. In order to predict post-operative distant metastasis of ESCC, SVM models incorporating clinicopathological and immunohistological variables were established[42]. The SVM model with four clinicopathological features and nine immunomarkers had the best performance, with accuracy, SEN, SPE, positive predictive value, and negative predictive value of 78.7%, 56.6%, 97.7%, 95.6%, and 72.3%, respectively. A least squares SVM model was also proposed to predict post-operative lymph node metastasis in patients who received chemotherapy preoperatively, by exploiting preoperative CT radiomics[43]. Tumor length, thickness, CT value, and the long axis and short axis size of the largest regional lymph node were analyzed. The model reached an AUC of 0.887.
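The five figures reported for the distant metastasis model are linked by simple identities, which gives a quick consistency check. The sketch below assumes that all five were computed on the same test set (an assumption, not something stated in the text); it back-solves the implied proportion of metastatic cases from accuracy, SEN, and SPE and then derives the predictive values. All function names are illustrative.

```python
def implied_prevalence(acc: float, sen: float, spe: float) -> float:
    """Solve acc = sen * p + spe * (1 - p) for the prevalence p."""
    if sen == spe:
        raise ValueError("prevalence cannot be identified when SEN equals SPE")
    return (acc - spe) / (sen - spe)


def positive_predictive_value(sen: float, spe: float, p: float) -> float:
    # True positives divided by all positive calls.
    return sen * p / (sen * p + (1 - spe) * (1 - p))


def negative_predictive_value(sen: float, spe: float, p: float) -> float:
    # True negatives divided by all negative calls.
    return spe * (1 - p) / (spe * (1 - p) + (1 - sen) * p)


# Accuracy, SEN and SPE as quoted above for the SVM model[42].
p = implied_prevalence(acc=0.787, sen=0.566, spe=0.977)
ppv = positive_predictive_value(0.566, 0.977, p)
npv = negative_predictive_value(0.566, 0.977, p)
print(round(p, 3), round(ppv, 3), round(npv, 3))
```

Under that assumption the implied prevalence is roughly 46%, and the derived predictive values agree with the reported 95.6% and 72.3% to within rounding.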
In addition to its diagnostic and predictive value, AI has learned to identify meaningful alterations at the molecular and genetic levels. In 2017, Lin et al[44] compared the serum chemical element concentrations between ESCC patients and healthy controls, and found that nearly half of the elements were different between the two groups. They then trained several classifiers to perform the discrimination, with Random Forest being the best (98.38% accuracy) and SVM the second (96.56% accuracy). Later, Mourikis et al[45] developed a robust sysSVM algorithm to identify 952 genes that promoted EAC development, using 34 biological features of known cancer genes. They called these rare and highly individualized genes “helper” genes, which function alongside known drivers.
AI may be a feasible option to help determine an optimal treatment strategy. This was previously evidenced by a study of 13 365 EACs from 33 cancer centers worldwide, which incorporated a random forest algorithm and found that the predicted survival of AI-generated therapy was superior to that of actual human decisions[46]. However, most of the above-mentioned ML algorithms were developed for advanced cancer. Diagnosing EC at an early stage contributes to a far better outcome when treatment is undertaken appropriately. This is highly dependent on the development of optical imaging technologies that can directly visualize the morphology of esophageal lesions.
In recent decades, endoscopic optical imaging techniques have advanced rapidly, providing endoscopists with a fine inspection of the morphology of esophageal mucosa, micro-vessels, and even cells. In addition to white light imaging (WLI) and magnifying endoscopy (ME), emerging OCT, CLE, VLE, and HRM techniques have been developed to diagnose BE[47-49]. Meanwhile, the diagnosis of SCC relies more on chromoendoscopy and on intra-epithelial papillary capillary loops (IPCLs) observed under narrow band imaging (NBI) plus ME[50]. Although these modalities have yielded favorable diagnostic value, the interpretation of these images requires expert experience and is subject to inter- and intra-observer variability[51], and processing large datasets is laborious and time consuming. Researchers in medicine and information engineering have collaborated to develop different AI models for this purpose.
The current screening and surveillance recommendation for BE is endoscopic examination plus random biopsy[52], which is limited by sampling error. AI models trained with various endoscopic modalities and pathologies aimed to overcome these shortcomings (Table 1).
Endoscopy: In 2009, German experts developed a content-based image retrieval framework[53]. In this framework, novel color-texture features were combined with an interactive feedback loop. The algorithm could correctly recognize 95% of normal mucosa and 70% of BE from 390 training images, with a moderate inter-rater reliability of 0.71. The authors thought that the CAD system might be incorporated into the endoscopic system to help less experienced clinicians. In 2013, van der Sommen et al[54] tried an SVM algorithm, which could automatically identify and locate irregularities of the esophagus on high-definition endoscopy with an accuracy of 95.9% and an AUC of 0.99, taking a first step towards CAD. Later, these authors used a CAD system to automatically recognize the region of interest (ROI) in dysplastic BE[55]. The SVM-based classification yielded a SEN and SPE of 83% each at the per-image level, and of 86% and 87%, respectively, at the patient level. However, the F-score of the system, which indicates the similarity with the gold standard, was lower than that of experts.
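The F-score mentioned above summarizes how closely an automatically delineated region matches the reference delineation. The text does not state how the cited study computed it; the sketch below assumes a pixel-level comparison in which each region is a set of (row, column) coordinates, and the function name and example regions are hypothetical.

```python
def f1_overlap(predicted: set, reference: set) -> float:
    """F1 score between two pixel sets: harmonic mean of precision and recall."""
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)  # share of predicted pixels that are correct
    recall = overlap / len(reference)     # share of reference pixels that are found
    return 2 * precision * recall / (precision + recall)


# Hypothetical 4 x 4 pixel regions, shifted by two columns against each other.
reference_region = {(r, c) for r in range(4) for c in range(0, 4)}
predicted_region = {(r, c) for r in range(4) for c in range(2, 6)}
score = f1_overlap(predicted_region, reference_region)
print(score)
```

In this pixel-set form the F1 score equals the Dice coefficient, so a value of 1 means the two delineations coincide and a value of 0 means they do not overlap.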
In order to improve the outcome, Horie et al[56] were the first to adopt a deep CNN (Single Shot MultiBox Detector, SSD) model to detect EC from WLI and NBI images in 2018. Only 8 EACs were used in that study. The diagnostic accuracy for EAC was 90%, and the SEN at the patient level was 88% for both WLI and NBI. The system processed one image in only 0.02 s, which is promising for real-time use. The ability of SSD to detect EAC was assessed in another study, in which it outperformed regional-based CNN (R-CNN), Fast R-CNN and Faster R-CNN in both precision and speed, achieving an F-measure, SEN, and SPE of 0.94, 96% and 92%, respectively[57]. The authors stated that SSD worked faster due to its single forward pass network nature. CNN was then validated in a more recent study to detect early dysplastic BE[58]. The system was pretrained on ImageNet, and was then trained with 1853 images and tested with 458 images. The CNN accurately detected 95.4% of the dysplasia, with 96.4% SEN and 94.2% SPE. One highlight of this study is that it included WLI and NBI images, as well as images with standard focus and near focus. Another highlight is its ability to deal with real-time videos.
In addition to the above-mentioned CNNs, another CNN built upon the residual net (ResNet) was introduced. Ebigbo et al[59] tested this system in two databases, Augsburg and Medical Image Computing and Computer-Assisted Intervention, with SEN over 90% in both. Later, de Groof et al[60] used a custom-made hybrid ResNet/U-Net, which was pretrained on GastroNet, to distinguish non-dysplastic BE from dysplasia. The system was trained using state-of-the-art ML techniques (transfer learning and ensemble learning) and validated sequentially in five datasets, with an accuracy of 89% and 88% for the two external validation datasets, which was slightly superior to that of the model pre-trained with ImageNet in the supplementary ablation experiment.
Endomicroscopy: In 2017, Hong et al[61] reported their experience in adopting CNN as a classifier to distinguish intestinal metaplasia (IM), gastric metaplasia (GM) and neoplasia (NPL) of BE using endomicroscopic images. The total accuracy was 80.77%. It performed well for IM and NPL. However, it could not identify GM in the tested samples. VLE is an advanced imaging technique that can provide a 3-mm deep scan of the esophagus in full circumference, and is commercially available (Nvision VLE™ Imaging System). In the same year, Swager et al[62] reported the first attempt at using CAD to detect NPL by adopting histology-correlated ex-vivo VLE. The authors used eight separate ML algorithms that were trained with clinically inspired features. They found that the “layering and signal decay statistics” feature performed the best, with AUC, SEN, and SPE of 0.95, 90%, and 93%, respectively. Similar results were obtained by van der Sommen et al[63], with a maximum AUC of 0.93 in identifying early EAC in BE. Notably, the authors discovered that a scanning depth of 0.5-1 mm was the most appropriate range for classifying tissue categories.
Table 1 Computer-aided endoscopic diagnosis for dysplastic Barrett’s esophagus
AdaBoost: Adaptive boost; AUC: Area under ROC curve; BE: Barrett’s esophagus; CAD: Computer-aided diagnosis; CBIR: Content-based image retrieval; CC: Mucosa of cardia; CNN: Convolutional neural network; DA: Discriminant analysis; EAC: Esophageal adenocarcinoma; EP: Epithelium; HGD: High-grade dysplasia; HRM: High-resolution microendoscopy; Knn: K-nearest neighbor; LDA: Linear discriminant analysis; LogReg: Logistic regression; LOO: Leave-one-out cross-validation; LR: Linear regression; NA: Not available; NB: Naïve Bayes; NBI: Narrow band imaging; OCT: Optical coherence tomography; PCA: Principal component analysis; PDE: Principal dimension encoding; R-CNN: Regional-based CNN; RF: Random forest; SEN: Sensitivity; SPE: Specificity; SSD: Single shot multibox detector; SVM: Support vector machine; VLE: Volumetric laser endomicroscopy; WLI: White light imaging.
Since VLE produces an overwhelming number of images in a short time, a real-time CAD system is more helpful in actual clinical practice. In 2019, Trindade et al[64] reported a video case illustrating an intelligent real-time image segmentation system which employed three established features to dynamically enhance abnormal VLE images with color during the endoscopic procedure. They are now conducting a multicenter RCT (NCT03814824) to further validate this CAD system. While most studies use a single frame to include the ROI, a recent study tried to add neighboring VLE images to the pathology-correlated ROI[65]. Encouragingly, this so-called multi-frame analysis combined with PCA outperformed single-frame analysis, improving the AUC from 0.83 to 0.91. Meanwhile, the novel CAD system needs only 1.3 ms to automatically differentiate non-dysplastic BE from dysplasia in one image, which is also a promising result for a real-time setting.
While previous studies employed ex-vivo scan images, the following study conducted by van der Putten et al[66] used in-vivo histology-correlated images. In addition, they used principal dimension encoding (PDE) to encode images into a score vector. They combined this PDE with traditional ML algorithms, e.g., random forest and SVM, to classify the degree of dysplasia (high-grade dysplasia vs early EAC). They obtained an AUC of 0.93 and an F1 score of 87.4%, which outperformed some traditional DL classifiers, such as Squeezenet and Inception.
Another kind of endomicroscopic technique is HRM. Shin et al[67] designed an automated image processing algorithm that extracts epithelium morphology and BE glandular architecture features, together with a classification algorithm, which distinguished NPL from dysplasia in BE with an accuracy, SEN and SPE of 84.9%, 88%, and 85% in the validation dataset, respectively. This quantitative CAD is cost-effective and may be applied in clinical settings after improvement of image acquisition quality and processing speed.
OCT: OCT is also a noninvasive imaging technique that can detect BE, dysplasia and early EAC, as a complement to routine endoscopy. In 2006, Qi et al[68] extracted image features using a center-symmetric auto-correlation method and used a PCA-based CAD algorithm for classification. A total of 106 pathology-paired images were included for training, which resulted in an accuracy of 83%, SEN of 82% and SPE of 74% in distinguishing non-dysplastic BE from dysplasia. In general, the accuracy of OCT in identifying dysplasia is not satisfactory, which limits its application[69].
In addition to endoscopic images, pathologic morphology has also been studied. Sabo et al[70] employed an ANN-validated computerized nuclear morphometry (pseudostratification, pleomorphism, chromatin texture, symmetry and orientation) model to discriminate the degree of dysplasia in BE. The model was able to differentiate non-dysplastic BE from low-grade dysplasia with an accuracy of 89%, and low-grade dysplasia from high-grade dysplasia with an accuracy of 86%.
ESCC is the dominant histological type of esophageal cancer worldwide. Diagnosing early cancer mainly depends on endoscopic screening, which also produces a large number of images that need special training to interpret. AI technologies have also been explored globally to address this issue (Table 2).
Endoscopy: In 2016, Liu et al[71] designed an algorithm called joint diagonalization principal component analysis, which correctly detected 90.75% of EC with an AUC of 0.9471. To improve the performance of the CAD system, Horie et al[56] made the first attempt to use DL to diagnose ESCC with a large number of endoscopic images. The CNN had a diagnostic accuracy of 99% for ESCC, 99% for superficial cancer, and 92% for advanced cancer. The SEN of the CNN was 97% at the per-patient level and 77% at the per-image level. Later in 2019, Cai et al[72] proposed a novel CAD system based on a deep neural network (DNN). They used only standard WLI images to train the model. The DNN-CAD model could detect 91.4% of early ESCC, which was higher than senior endoscopists. With the help of this model, the average diagnostic performance of endoscopists improved in terms of accuracy, SEN, and SPE. However, these studies excluded magnified images.
Later, Ohmori et al[73] evaluated both ME and non-ME images [including WLI and NBI/blue laser imaging (BLI)] using a CNN based on SSD to recognize SCC. The accuracy for ME, non-ME + WLI, and non-ME + NBI/BLI was 77%, 81%, and 77%, respectively, all with high SEN and moderate SPE. The result was similar to that of the experienced endoscopists tested in this study. Zhao et al[74] conducted another study and evaluated ME + NBI images by employing a double-labeling fully convolutional network. This system used an ROI-label and a segmentation-label to delineate IPCLs based on the AB classification by the Japan Esophageal Society[75]. The study showed that senior observers had significantly higher diagnostic accuracy than mid-level and junior ones. The model reached a diagnostic accuracy of 89.2% and 93% at the lesion and pixel level, respectively, for distinguishing type A, B1 and B2 IPCLs, which was similar to that of the senior group. Specifically, the model had a higher sensitivity for type A IPCL than clinicians (71.5% vs 28.2%-64.9%), which might avoid unnecessary radical treatment. Instead of identifying IPCL patterns, the study conducted by Nakagawa et al[76] aimed to predict invasion depths. The authors developed two separate SSD-based CNNs for ME and non-ME images. The ability of the system to correctly distinguish EP/submucosal (SM) 1 cancers from SM2/SM3 cancers was 91%, 92.9%, and 89.7% for the ME + non-ME, non-ME and ME images, respectively. Regarding M and SM cancers, the differentiating accuracy was 89.7%, 90.3%, and 92.3% for the total, non-ME, and ME images, respectively. The performance of this CAD model was also comparable to that of experts, but much faster.
A processing speed over 30 images/s is necessary for dynamic video analysis[56]. Although Horie et al[56], Ohmori et al[73] and Nakagawa et al[76] reported that their systems could process one image in 0.02, 0.027, and 0.033 s, respectively, they did not test the systems on real-time videos. After Cai et al[72] had validated the efficacy of their DNN-CAD model, they split the video into images and then reassembled them, enabling the model to delineate early cancer in real time. Everson et al[77] validated another CNN investigating IPCLs using sequential still images in real time, at 0.026 to 0.037 s per image. The CNN could differentiate type A from type B IPCLs with a mean accuracy of 93.3%. Last year, Luo et al[78] reported a multicenter, comparative study, exploiting 1 036 496 endoscopic images to construct a gastrointestinal artificial intelligence diagnostic system (GRAIDS) based on the concept of DeepLab’s V3+. GRAIDS yielded a diagnostic accuracy for UGI cancer ranging from 91.5% to 97.7% for internal, external, and prospective validation datasets, with favorable sensitivities, which were similar to those of experts and superior to those of non-experts. They also incorporated the CAD model into endoscopic videos in real time, with the highest speed of 0.008 s per image and latency less than 0.04 s. However, they did not report their outcomes by distinct histology. Recently, Guo et al[79] specially developed a CNN-CAD system built on the SegNet architecture, aiming at real-time application in clinical settings. In this study, 13 144 NBI (ME + non-ME) images and 80 video clips were employed. In the image dataset, the SEN, SPE, and AUC were 98.04%, 95.03% and 0.989, respectively. For the video dataset, the per-frame SEN for non-ME and ME was 60.8% and 96.1%, respectively, and the per-lesion SEN was 100% for both non-ME and ME. When they analyzed 33 original videos of full-range normal esophagus, they acquired a SPE of 99.95% and 90.9% for per-frame and per-case analysis, respectively. The ability of this model to process each frame with a maximum time of 0.04 s and latency less than 0.1 s set a good example for future model optimization for real-time applications[80].
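The per-image processing times quoted in this paragraph can be converted into frame rates and compared with the threshold of more than 30 images/s. The sketch below makes the simplifying assumption that throughput is the reciprocal of the per-image time, ignoring latency, pipelining, and hardware differences; the dictionary labels are only shorthand for the studies cited above.

```python
REAL_TIME_FPS = 30.0  # threshold for dynamic video analysis quoted in the text


def frames_per_second(seconds_per_image: float) -> float:
    """Throughput implied by a per-image processing time (simplified)."""
    if seconds_per_image <= 0:
        raise ValueError("processing time must be positive")
    return 1.0 / seconds_per_image


# Per-image times in seconds, as reported in the paragraph above.
reported_times = {
    "Horie[56]": 0.02,
    "Ohmori[73]": 0.027,
    "Nakagawa[76]": 0.033,
    "Luo[78], fastest": 0.008,
    "Guo[79], maximum per frame": 0.04,
}

exceeds_threshold = {
    name: frames_per_second(t) > REAL_TIME_FPS
    for name, t in reported_times.items()
}

for name, t in reported_times.items():
    print(f"{name}: {frames_per_second(t):.1f} images/s, "
          f"over threshold: {exceeds_threshold[name]}")
```

Note that the 0.04 s figure is a reported maximum per frame, so it bounds the worst case rather than the typical speed of that system.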
Table 2 Computer-aided endoscopic diagnosis for early esophageal squamous cell cancer
Endomicroscopy: In 2007, Kodashima et al[81] used ImageJ software to label the border of nuclei in endo-cytologic images from 10 ESCC patients. They found that the computer-labelled nuclear area of ESCC was significantly different from that of normal tissues, which demonstrated the feasibility of computer-based diagnosis. HRM is another low-cost tool that can visualize the esophageal epithelium at the cellular level, which compensates for the low specificity of iodine staining and is also more cost-effective than CLE. In 2015, Shin et al[82] developed a 2-class linear classification algorithm using nuclei-related features to identify neoplastic squamous mucosa (HGD + cancer). It resulted in an AUC, SEN, and SPE of 0.95, 87%, 97% and 0.93, 84%, 95% for the test and validation datasets, respectively. However, the analysis speed of this system needs to be increased before it can be applied in real-time practice. To solve this problem and reduce the cost of equipment, a smaller, tablet-interfaced HRM with a real-time algorithm was developed by Quang et al[83]. The algorithm was able to automatically identify SCC with an AUC, SEN, and SPE of 0.937, 95%, and 91%, respectively, which is comparable to the result achieved by the first-generation bulky laptop-interfaced HRM[82] or the combination of Lugol chromoendoscopy and HRM[84].
The exciting and promising findings of various CAD models have been summarized in detail above. Research is ongoing worldwide because none of the studies was perfect. The limitations and problems are driving forces for evolution and innovation. We hereby discuss several major drawbacks that limit the strength of the studies.
Firstly, the most frequently mentioned drawback is insufficient training sample size. The number of endoscopic images that the majority of studies employed ranged from 248 to about 7000 (Tables 1 and 2). The limited number of training data, lack of imaging variability, and single-center nature are likely to cause overfitting[85], which attenuates the ability of AI models to perform well in unseen datasets and leads to unstable results[12,55]. To overcome this problem, various methods have been used, such as segmenting the image or using cross validation with 5 folds or even 10 folds to augment the datasets. Recently, the size of datasets has been greatly enlarged in several studies[56,73,79], the largest of which included over one million UGI images from six centers[78]. Therefore, further multicenter studies including large datasets with different kinds of images (i.e., WLI, NBI, ME and non-ME) harvested by different endoscopic systems for SCC and AC are likely to produce robust and externally generalizable results. In addition, different AI algorithms tested in prospective external datasets need to be developed to increase the diversity of AI technology[13].
Secondly, selection bias is another contributor to limited generalizability. Most of the previous studies were retrospective and used only high-quality images. Suboptimal-quality images with mucus, blur, or blood were excluded. Additionally, unbalanced distribution of lesion types (SCC vs AC, type B1 vs B2 and B3 IPCLs), different numbers of images for each patient, and non-uniform processing methods for different lesions might all cause bias in the results. Further prospective RCTs will be required in the future.
Thirdly, almost all of the studies used still images to train the AI models. Only recently have researchers validated the efficacy of these models in processing endoscopic videos in real time. Future video-based studies are needed to narrow the gap between research and clinical practice.
Gold standard: Consensus-based ground truth for lesions is preferred over a single expert’s annotation. Committees of expert endoscopists and pathologists from different countries need to be formed to improve the precision of annotation. In addition, AI should help endoscopists recognize lesions and target biopsies for gold-standard pathological examination, rather than replace them.
Hardware upgrade: Computers equipped with powerful GPUs are needed to run more sophisticated algorithms and process large volumes of image data in order to achieve real-time recognition.
Pre-training database: ImageNet and GastroNet, which store massive datasets of manually labeled images, have been introduced. These databases should be continually enriched, since CAD models with prior knowledge tend to have better discriminative ability[60].
Cost-effectiveness analysis: When a novel diagnostic method is introduced into clinical practice, whether it is cost-effective is an important issue. A recent multi-center add-on analysis revealed that AI is able to reduce the cost of colonoscopic management of polyps[86]. Since medical cost is one of the major concerns for both patients and governments, it is necessary to assess whether AI can improve the diagnostic performance for EC while reducing the cost of unnecessary examinations and radical therapies. Future studies concerning medical cost and reimbursement should be conducted in different countries with different healthcare and insurance systems to address this issue.
Ethics and legality: To believe or not to believe, that is the real question. Although AI technology in medicine has taken a giant leap, with the potential to improve the performance of clinicians with different levels of experience and to reduce errors, the black-box[87] nature of ML algorithms raises genuine doubts[88]. Can we trust the results of AI, since they lack explainability? What should we do with these computer-generated results? Can they be accepted as legal evidence? Challenges for legislation, regulation, insurance and clinical practice are inevitable. Supervised RCTs and AI participation in the clinical workflow are needed to provide solid evidence that AI is acceptable within legal and ethical limits[89]. Nevertheless, the trend toward AI is irreversible. The ultimate role of AI in medicine might be that of a supervised task performer[90].
In this manuscript, we have provided a comprehensive review of AI technology in diagnosis, treatment decision-making and outcome prediction for EC. We searched only the PubMed database for clinical studies and applications. Issues regarding computer science and image processing are beyond the scope of this review. CAD systems have evolved from traditional ML algorithms to neural network-based DL, and from still image analysis to real-time video processing. AI can improve the performance of non-experts while correcting erroneous classifications by experts[78]. Studies with larger datasets and more reliable CAD models are being conducted worldwide. It is promising that AI may facilitate early cancer screening, surveillance and treatment in high-risk regions. However, it is noteworthy that patients’ consent and satisfaction are the first priority.
World Journal of Gastroenterology, 2020, Issue 35