Introduction
In this chapter, we look at more advanced machine learning programs for assessing normal and diseased lungs on chest CT (lung CT AI). These programs go beyond reactive machine methods and use the more advanced limited memory AI methods. Several reactive machine AI approaches to assessing emphysema, air trapping, and lung fibrosis were discussed in detail in Chapters 5 and 6. Even reactive machine AI algorithms are preceded by a learning component that selects the best analytical lung CT metric for a given task, as seen with LAA −950 in Chapter 5 and LAA −856 in Chapter 6. That learning is done by trial and error, often using linear regression, itself a form of machine learning, to decide which analytical lung CT metric best assesses emphysema on a total lung capacity (TLC) chest CT scan. The −950 HU threshold was “learned” by examining several thresholds and determining which one corresponds best to other independent measures of emphysema (e.g., lung pathology or pulmonary function testing).
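The trial-and-error selection of a threshold described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in any particular study: the function names, the candidate thresholds, the voxel values, and the reference scores are all hypothetical, and a simple Pearson correlation stands in for whatever regression analysis a real study would use.

```python
from math import sqrt

def laa_percent(hu_values, threshold):
    """Percent of lung voxels with attenuation below `threshold` (in HU)."""
    below = sum(1 for hu in hu_values if hu < threshold)
    return 100.0 * below / len(hu_values)

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def best_threshold(scans, reference, candidates):
    """Return (threshold, |r|) for the candidate whose LAA% tracks `reference` best."""
    scored = []
    for t in candidates:
        laa = [laa_percent(scan, t) for scan in scans]
        scored.append((abs(pearson(laa, reference)), t))
    r, t = max(scored)
    return t, r

# Hypothetical toy data: lung voxel values (HU) for four scans and an
# independent emphysema measure (e.g., a pathology score) for each scan.
scans = [
    [-980, -960, -940, -900, -870, -850],
    [-990, -970, -955, -930, -880, -860],
    [-930, -910, -890, -870, -850, -820],
    [-995, -985, -975, -960, -940, -900],
]
reference = [30.0, 45.0, 5.0, 70.0]
threshold, r = best_threshold(scans, reference, candidates=[-970, -960, -950, -910])
print(threshold, round(r, 3))
```

Once a threshold has been chosen this way, the agent that applies it no longer learns anything, which is why the resulting metric is treated as a reactive machine method.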
All lung CT AI methods link together a series of four lung CT AI agents that work together to accomplish a final objective: detecting and assessing diffuse lung disease (Fig. 7.1). The first step is the CT scanner AI program that generates 3D CT images of the entire thorax. The second step sends these 3D CT images to a lung segmentation AI program that separates the lungs, airways, and pulmonary vessels from the rest of the thoracic anatomy (e.g., heart, aorta, spine, chest wall muscles). This lung segmentation AI agent is a key enabler, making it possible to analyze a large number of chest CT scans with little or no human intervention. The more advanced lung segmentation programs use limited memory AI approaches, including deep learning. In the third step, the lung segmentation AI program passes the images of the lungs to another AI agent that looks for image features that can be used to predict certain tissue states (e.g., normal, emphysema, pulmonary fibrosis). The lung features extracted by reactive machine AI agents were described in Chapters 5 and 6. In those agents, the feature extraction mechanism is hardwired into the reactive machine AI agent, which does not need to learn anything about the lung CT image data; an example is identifying the percentage of lung tissue that is <−950 HU on TLC chest CT scans as a feature of emphysema. The fourth step sends the extracted lung CT features to an AI program that detects and assesses diffuse lung disease based on those features. The detection and assessment can be a simple lookup program that checks whether any lung CT voxels are <−950 HU and, if so, counts them and expresses the count as a percentage of the total lung tissue (e.g., LAA −950) to assess the amount of emphysematous tissue present. The LAA −950 metric has been previously validated as a viable measure of emphysema, as described in Chapter 5.
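The hardwired third and fourth steps described above amount to counting voxels inside the segmented lung. The sketch below assumes the CT volume and the lung mask are nested lists indexed as slice, row, column. The function name and the toy values are hypothetical, and the mask stands in for the output of a real lung segmentation agent.

```python
def laa_percent(volume, lung_mask, threshold=-950):
    """Percent of segmented lung voxels with attenuation below `threshold` HU."""
    lung_voxels = 0
    low_voxels = 0
    for slice_hu, slice_mask in zip(volume, lung_mask):
        for row_hu, row_mask in zip(slice_hu, slice_mask):
            for hu, in_lung in zip(row_hu, row_mask):
                if in_lung:
                    lung_voxels += 1
                    if hu < threshold:
                        low_voxels += 1
    if lung_voxels == 0:
        raise ValueError("lung mask is empty")
    return 100.0 * low_voxels / lung_voxels

# Toy volume: 2 slices x 2 rows x 3 columns of HU values (hypothetical).
volume = [
    [[-1000, -960, -900], [-955, -800, 40]],
    [[-970, -940, -850], [-990, -700, 30]],
]
# 1 marks voxels that the segmentation step assigned to lung tissue.
lung_mask = [
    [[0, 1, 1], [1, 1, 0]],
    [[1, 1, 1], [1, 1, 0]],
]
laa950 = laa_percent(volume, lung_mask)
print(f"LAA-950 = {laa950:.1f}%")
```

The −1000 HU voxel in the first slice is excluded because it lies outside the mask, which shows why the segmentation step has to come before the feature extraction step.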
In this chapter, we discuss limited memory (also known as machine learning) lung CT AI agents that train themselves to extract the features from lung CT images that best detect and assess diseased lung tissue. In supervised training, an expert imaging physician first identifies the important CT image features, and the limited memory lung CT AI agent then trains itself to recognize the key image features that identify the diseased lung tissue labeled by the expert (Fig. 7.2). In unsupervised training, the limited memory lung CT AI agent has no expert labels and learns on its own which lung CT image features best detect and assess the presence of diseased lung tissue (Fig. 7.3).
Supervised training methods for limited memory AI algorithms include support vector machine, decision tree, linear regression, logistic regression, naïve Bayes, k-nearest neighbor, random forest, AdaBoost, and neural network methods. Unsupervised training methods include K-means, mean shift, affinity propagation, hierarchical clustering, DBSCAN (density-based spatial clustering of applications with noise), Gaussian mixture modeling, Markov random fields, ISODATA (iterative self-organizing data analysis technique), and fuzzy C-means. Deep learning methods such as convolutional neural networks (CNN), which can be trained in a supervised or unsupervised manner, have recently been applied to the detection and assessment of emphysema and COVID-19 pneumonia; more on this later in the chapter.
After the limited memory lung CT AI agent is trained to detect and assess a class of diseased lung tissue, such as emphysema, pulmonary fibrosis, or pneumonia, it is tested on a new set of chest CT cases that have been labeled by an independent method (e.g., by a human who has visually inspected the CT images for evidence of normal lung tissue, emphysema, or pulmonary fibrosis). The results of this testing, or validation, step determine how well the supervised or unsupervised machine learning algorithm detects and quantifies important features of lung disease. The AI agent's measurements of the amount of an important feature in the CT images, such as emphysema, are often correlated with other measures of disease severity or outcomes (e.g., physiology testing and mortality).
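The testing step described above compares the agent's output with independent labels on cases it has not seen. A minimal sketch of that comparison for a two-class task follows. The function name, the label strings, and the six toy cases are hypothetical.

```python
def evaluate(predicted, reference, positive="emphysema"):
    """Accuracy, sensitivity, and specificity of predicted labels versus reference labels."""
    tp = fp = tn = fn = 0
    for p, r in zip(predicted, reference):
        if r == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Hypothetical held-out cases: agent output versus independent visual labels.
predicted = ["emphysema", "normal", "emphysema", "normal", "normal", "emphysema"]
reference = ["emphysema", "normal", "normal", "normal", "emphysema", "emphysema"]
metrics = evaluate(predicted, reference)
print(metrics)
```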
Limited Memory Lung CT AI and the Assessment of Emphysema
Adaptive Multiple Feature Method (AMFM) AI Agent (Supervised, Bayesian Classifier)
The adaptive multiple feature method (AMFM), first described by Uppaluri et al. in 1997, was one of the first lung CT AI methods to use limited memory AI in the assessment of normal and emphysematous lung tissue from chest CT scans. The approach used supervised learning to train the AMFM AI agent. The study had 9 normal subjects and 10 subjects with emphysema. Normal subjects were scanned in the prone position, since they were part of another study of interstitial lung disease in which prone scanning was done. The emphysematous subjects were scanned in the supine position, since they had advanced COPD and were also being evaluated for lung volume reduction surgery to treat their emphysema. The CT protocol acquired four 3-mm-thick axial images of the lungs using the Imatron Fastrac C-150 XL electron beam CT scanner. Two of the axial CT images were obtained at the level of the carina (tracheal bifurcation), and two were obtained halfway between the carina and the diaphragm.
The steps that the limited memory AI program AMFM uses are summarized in Fig. 7.4. The four AMFM steps are, in order: (1) acquire four 2D lung CT images; (2) automatically segment the lung tissue from the rest of the thoracic anatomy on the four 2D CT images; (3) have an expert imaging physician select regions of interest (ROI) of normal and emphysematous tissue, extract multiple statistical and fractal texture features from the training ROI, and learn which of these features are optimal for distinguishing normal from emphysematous lung tissue; and (4) detect and assess normal versus emphysematous tissue in the test ROI based on the probability that the extracted features match normal or emphysematous lung.
The third step began with an expert imaging physician labeling six predefined regions of both the right and left lung on each of the four lung CT images as definitely normal or definitely emphysematous. The CT images were then processed so that neighboring voxels of similar values were all assigned the average of those neighboring voxels; the resulting regions are referred to as the preprocessed ROI. The ROI corresponding to the visually labeled regions of normal and emphysematous lung were matched between the unprocessed and processed images. The unprocessed training ROI were assessed using five first-order statistical features (mean, variance, skewness, kurtosis, and grey level entropy) and the geometric fractal dimension. The preprocessed training ROI were assessed using eleven second-order statistical features. Five of these were run-length features: short-run emphasis, long-run emphasis, grey level nonuniformity, run-length nonuniformity, and run percentage. The remaining six were based on the co-occurrence matrix: angular second moment, entropy, inertia, contrast, correlation, and inverse difference moment. All of the features were normalized for pixel size and the size of the lung in the CT image. The ROI that were either definitely normal or definitely emphysematous were randomly split into two groups: a training set used to train the AMFM AI agent and a test set used to evaluate its ability to distinguish normal from emphysematous ROI. The optimal set of features was selected from the training ROI using the divergence measure along with correlation analysis. Classification into normal and emphysema was done using a Bayesian classifier, whose parameters were determined from the optimal features of the training ROI. The Bayesian classifier then classified each test ROI as normal or emphysematous lung tissue. This process could be repeated to improve the performance/learning of the AMFM lung CT AI agent by adding more labeled normal and emphysematous training and testing ROI. Additional statistical features could also be included to try to improve its performance.
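The train-then-classify pattern described above can be illustrated with a small Bayesian classifier. The sketch makes assumptions the text does not state: each feature is modeled as an independent Gaussian within each class (a naïve Bayes simplification), and the two-feature vectors (mean HU and a short-run emphasis value) are hypothetical toy numbers rather than data from the study.

```python
from math import log, pi

def fit(features, labels):
    """Estimate the prior, feature means, and feature variances for each class."""
    model = {}
    for cls in set(labels):
        rows = [f for f, lab in zip(features, labels) if lab == cls]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small variance floor avoids division by zero for constant features.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                     for col, m in zip(columns, means)]
        model[cls] = (n / len(labels), means, variances)
    return model

def classify(model, x):
    """Return the class with the highest posterior probability for feature vector x."""
    def log_posterior(cls):
        prior, means, variances = model[cls]
        log_likelihood = sum(
            -0.5 * log(2 * pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )
        return log(prior) + log_likelihood
    return max(model, key=log_posterior)

# Hypothetical training ROI: [mean HU, short-run emphasis] with expert labels.
train_features = [
    [-850, 0.60], [-845, 0.62], [-860, 0.58],
    [-930, 0.80], [-940, 0.83], [-925, 0.78],
]
train_labels = ["normal"] * 3 + ["emphysema"] * 3
model = fit(train_features, train_labels)

# Hypothetical test ROI that were held out from training.
print(classify(model, [-935, 0.81]))
print(classify(model, [-852, 0.61]))
```

Adding more labeled ROI changes only the estimated means, variances, and priors, which is the sense in which the agent's performance can improve with more training data.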
The optimal set of features that the AMFM AI agent obtained from the training ROI to separate normal from emphysematous lung tissue consisted of mean lung density and two run-length features: short-run emphasis and grey level nonuniformity. It is important to note that the mean lung density of the voxels in the image was an important feature identified by the AMFM AI agent in the training process. Mean lung density is a simple concept that was discussed in Chapter 5. The fundamental lung tissue parameter measured by CT scans of the lung is lung density, and lung density is known to decrease in patients with emphysema.
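The two run-length features named above can be computed from a table of grey level runs. The sketch below uses the standard run-length definitions, counts runs along image rows only, and works on a small image that has already been quantized to a few grey levels. The scan direction, the quantization, and the toy image are assumptions made for illustration; the text does not specify how AMFM handled them.

```python
from collections import Counter

def horizontal_runs(image):
    """Map (grey_level, run_length) to the number of such runs, scanning each row."""
    runs = Counter()
    for row in image:
        start = 0
        for i in range(1, len(row) + 1):
            # A run ends at the end of the row or where the grey level changes.
            if i == len(row) or row[i] != row[start]:
                runs[(row[start], i - start)] += 1
                start = i
    return runs

def short_run_emphasis(runs):
    """Average of 1 / length^2 over all runs; larger when short runs dominate."""
    total = sum(runs.values())
    return sum(count / length ** 2 for (_, length), count in runs.items()) / total

def grey_level_nonuniformity(runs):
    """Sum of squared run counts per grey level, divided by the total number of runs."""
    total = sum(runs.values())
    per_level = Counter()
    for (level, _), count in runs.items():
        per_level[level] += count
    return sum(c ** 2 for c in per_level.values()) / total

# Toy ROI quantized to three grey levels (hypothetical values).
roi = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 2, 2],
]
runs = horizontal_runs(roi)
sre = short_run_emphasis(runs)
gln = grey_level_nonuniformity(runs)
print(round(sre, 4), round(gln, 4))
```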
The AMFM AI agent was compared with the mean lung density (MLD) and 5th percentile histogram methods for identifying normal versus emphysematous lung. The histogram method used the CT number, in HU, below which the lowest 5% of lung voxels fall. In distinguishing normal from emphysematous lung, the AMFM method was 100% accurate, the 5th percentile histogram method was 97.4% accurate, and the MLD method was 94.7% accurate. In this study, the AMFM AI agent thus achieved a modest improvement in identifying normal versus emphysematous lung tissue over the simpler reactive machine lung CT AI methods.
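The two simpler comparison metrics are straightforward to compute from the lung voxel values. The sketch below assumes a nearest-rank definition of the percentile, which the text does not specify, and uses hypothetical voxel values for a normal-like and an emphysema-like lung.

```python
def mean_lung_density(hu_values):
    """Mean attenuation (HU) of the lung voxels."""
    return sum(hu_values) / len(hu_values)

def percentile_hu(hu_values, percent=5):
    """Nearest-rank percentile: the HU value with `percent`% of voxels at or below it."""
    ordered = sorted(hu_values)
    # Integer ceiling of percent * n / 100, with a minimum rank of 1.
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# Hypothetical lung voxel values (HU), 20 voxels each.
normal_lung = list(range(-900, -800, 5))
emphysema_lung = list(range(-990, -890, 5))

for name, voxels in (("normal", normal_lung), ("emphysema", emphysema_lung)):
    print(name, mean_lung_density(voxels), percentile_hu(voxels))
```

Both metrics shift toward lower HU values in the emphysema-like example, which is the behavior the reactive machine methods rely on.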
Deep Learning Enables Automatic Classification of Emphysema Pattern at CT
Humphries et al. in 2020 reported a deep learning algorithm that used both a CNN and long short-term memory (LSTM) to train a lung CT AI agent to classify patterns of emphysema according to the Fleischner emphysema criteria (Fig. 7.5). The Fleischner whitepaper, published in 2015, described in detail the visual features of smoking-induced emphysema on chest CT scans. The Fleischner system uses a six-point ordinal scale to visually assess increasing grades of emphysema on chest CT scans. The labels, in increasing order of severity, are absent, trace, mild, moderate, confluent, and advanced destructive. Between 2007 and 2011, 9652 subjects from the COPDGene study had baseline chest CT scans, which were assessed visually using the Fleischner system. These 9652 subjects also had follow-up mortality information through 2018. The deep learning algorithm was developed using Python version 3.6 and PyTorch. The input to the CNN was 25 axial CT images evenly spaced along the z-axis (head-to-toe) length of the lung. The CNN, which extracted the chest CT image features, comprised four blocks with a total of eight layers: four 2D convolutional layers, each followed by a pooling layer. The output of the last pooling layer is passed to a concatenation layer that creates a vector of chest CT features, which is passed to an LSTM layer (a kind of recurrent neural network). The output of the LSTM is passed to a dense layer that outputs the probability of each of the six categories, with the probabilities summing to 1.0. The final classification score is the probability-weighted average of the categories, rounded to the nearest integer.
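Two of the steps described above, choosing 25 evenly spaced slices and turning the six output probabilities into a single grade, can be sketched without a deep learning library. The sketch makes several assumptions: the slice-selection rule, the use of a softmax to obtain probabilities that sum to 1.0, the numbering of grades from 0 to 5, and the example logits are all illustrative and are not taken from the published implementation.

```python
from math import exp

GRADES = ["absent", "trace", "mild", "moderate", "confluent", "advanced destructive"]

def evenly_spaced_indices(n_slices, n_samples=25):
    """Indices of `n_samples` slices spread evenly from the first to the last slice."""
    if n_slices < n_samples:
        raise ValueError("fewer slices than requested samples")
    step = (n_slices - 1) / (n_samples - 1)
    return [round(i * step) for i in range(n_samples)]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1.0."""
    peak = max(logits)
    exps = [exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_grade(probabilities):
    """Probability-weighted average of the grade indices, rounded to the nearest integer."""
    score = sum(index * p for index, p in enumerate(probabilities))
    return int(score + 0.5)

# Slice selection for a hypothetical 300-slice chest CT.
indices = evenly_spaced_indices(300)

# Hypothetical output probabilities for one scan.
probabilities = [0.05, 0.10, 0.20, 0.40, 0.20, 0.05]
grade = weighted_grade(probabilities)
print(indices[:3], indices[-1], GRADES[grade])

# Hypothetical raw scores from a dense layer, converted with softmax.
print([round(p, 3) for p in softmax([0.2, 1.0, 2.5, 1.5, 0.3, -1.0])])
```

Using the weighted average rather than the single most probable category lets the neighboring categories pull the final score up or down, which suits an ordinal scale.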
To train this AI agent, 2407 COPDGene chest CT scans were used. The AI agent was then validated on 7143 separate COPDGene chest CT scans and further tested on 1962 subjects from the ECLIPSE research study. The computation time was approximately 60 seconds per chest CT scan. In the 7143 COPDGene validation scans, there was moderate agreement between the deep learning emphysema score and the visual emphysema score. The AI agent classified 34% of the cases as one category more severe than the visual score and 13% as one category less severe. The greatest discordance was in cases where the visual score was normal and the AI agent score indicated trace emphysema. Subjects with normal AI and visual emphysema scores had better measures of airflow (FEV1 % predicted and FEV1/FVC ratio) than subjects with normal visual scores and trace AI agent scores, suggesting that the AI agent was detecting real disease with functional significance that the visual score did not. They also had less evidence of emphysema according to the less sophisticated AI agent described in Chapter 5, which measures emphysema as the amount of lung <−950 HU on TLC CT scans.
The AI agent assessment of emphysema severity correlated significantly with the severity of airflow limitation (FEV1 % predicted and FEV1/FVC ratio), 6-minute walk test (6MWT), mMRC dyspnea score (shortness of breath), and the St. George's Respiratory Questionnaire (SGRQ) quality-of-life score. Similarly, the AI agent emphysema scores were significantly correlated with the clinical stage of COPD. Adding the AI agent emphysema score to the visual emphysema score improved the fit of models predicting FEV1 % predicted, FEV1/FVC, 6MWT, and SGRQ quality-of-life score, after adjusting for age, race, sex, height, weight, smoking history, current smoking status, education level, and study site. This indicates that the AI agent emphysema score provides information beyond the visual emphysema score. The AI agent emphysema score predicted increasing mortality with increasing severity scores, and it separately predicted the mortality risk for Fleischner grades 5 and 6, whereas the visual emphysema score could not resolve differences in mortality between grades 5 and 6. The AI agent emphysema score predicted increasing mortality even after adjusting for the amount of lung <−950 HU, a simpler AI approach to the assessment of emphysema (see Chapter 5).
With training and validation of the CNN AI agent complete, the agent was tested on a completely separate cohort: the 1962 patients in the ECLIPSE study, who had TLC chest CT scans, LAA <−950 HU emphysema scores, pulmonary physiology measurements, 6MWT, mMRC dyspnea scores, and SGRQ scores. No visual readings were done for the ECLIPSE chest CT scans. The CNN AI agent emphysema scores correlated well with increasing severity of FEV1 % predicted, FEV1/FVC, 6MWT, mMRC dyspnea score, SGRQ score, and increasing LAA <−950 HU scores. This successful test suggests that the CNN AI method can be applied to other chest CT scans performed to assess smoking-related lung disease, both to assess the presence and severity of emphysema and to predict mortality risk. Tying the CNN AI agent to the visual grades of emphysema established by the Fleischner whitepaper makes it intuitive for interpreting and treating physicians to understand what the CNN AI agent is doing.
The AI agent did better than visual scoring in the COPDGene validation cohort in assessing physiological impairment, 6MWT, mMRC dyspnea score, and SGRQ quality-of-life score, as well as in predicting mortality, suggesting that it captures additional features of COPD lung disease that visual scoring does not. This could be because there are lung CT features that the human visual process cannot detect.
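The comparison of AI and visual grades reported above (cases scored one category higher or lower) can be tabulated as follows. The function name and the ten toy cases are hypothetical and are not drawn from COPDGene; grades are numbered 0 to 5 here for convenience.

```python
from collections import Counter

def grade_differences(ai_grades, visual_grades):
    """Percent of cases at each value of (AI grade minus visual grade)."""
    diffs = Counter(a - v for a, v in zip(ai_grades, visual_grades))
    n = len(ai_grades)
    return {d: 100.0 * count / n for d, count in sorted(diffs.items())}

# Hypothetical grades for ten scans.
ai_grades     = [0, 1, 1, 2, 3, 3, 4, 5, 2, 1]
visual_grades = [0, 0, 1, 2, 2, 3, 4, 5, 3, 1]

summary = grade_differences(ai_grades, visual_grades)
for diff, percent in summary.items():
    print(f"AI minus visual = {diff:+d}: {percent:.0f}% of cases")
```

A positive difference means the AI agent graded the scan as more severe than the visual reader did.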
Limited Memory Lung CT AI and the Assessment of Interstitial Lung Disease (ILD)
AMFM AI Method for Assessing Interstitial Lung Disease
The AMFM method was described in detail above (see Adaptive Multiple Feature Method (AMFM) AI Agent (Supervised, Bayesian Classifier)) for its application in assessing normal versus emphysematous lung tissue. The AMFM method has been used to assess not only normal and emphysematous tissue but also ILD due to idiopathic pulmonary fibrosis (IPF) and sarcoidosis.
Uppaluri et al. (1999) reported that the AMFM AI method distinguished between normal lung tissue, IPF-related lung disease, sarcoid-related lung disease, and emphysema significantly better than the MLD and histogram methods. This further supported the notion that the limited memory AI agent approach improves on the reactive machine AI agent approach in identifying and distinguishing between normal lung and three important lung diseases.
CALIPER (Computer-Aided Lung Informatics for Pathology Evaluation and Rating)
CALIPER is a software program implementing an AI agent, developed at the Mayo Biomedical Imaging Resource Core, that takes chest CT images and, in near real time, identifies features of COPD (emphysema) and ILD (ground-glass opacities (GGO), reticular opacities (RO), and honeycombing (HC)) and quantifies the amount of lung affected by these features (Fig. 7.6). The CALIPER AI agent first segments the lungs from the rest of the thoracic anatomy and then segments the airways and pulmonary vessels from the rest of the lung tissue. The CALIPER AI agent uses limited memory AI to learn how to classify different tissue types, including normal, emphysema, GGO, RO, and HC. Chest CT scans of 14 subjects from the Lung Tissue Research Consortium (LTRC) with a diagnosis of COPD or ILD were used to train the CALIPER AI agent. On these 14 chest CT cases, expert thoracic imaging physicians identified 15 mm × 15 mm × 15 mm volumes of interest (VOI) that were considered to contain 70% or more of one of the following features: normal, emphysema, GGO, RO, or HC. This process identified a total of 976 VOI: 265 normal VOI, 80 emphysema VOI, 150 GGO VOI, 294 RO VOI, and 187 HC VOI. Pairwise dissimilarity metrics based on the voxel histograms of the visually classified VOI were assessed using multidimensional scaling (MDS). The optimal metric was the MDS representation of the Cramér-von Mises distance (CVM), which is the L2 metric between cumulative distribution functions. CVM was the most consistent in matching the VOI voxel values with the visual classification of the VOI. Concordance between the human expert VOI labels (columns) and the unsupervised affinity propagation clustering of the four pairwise CVM dissimilarity metric (rows) for the 976 VOI used to train CALIPER was assessed using the κ (kappa) statistic, with a K × K table (columns × rows) used to assess the agreement for each visual label. The results showed that the CALIPER AI agent's grouping of the VOI agreed well with the visual VOI labels.
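The two quantities named above, the distance between cumulative distributions and the agreement statistic, can be sketched briefly. This sketch makes several assumptions: the histograms share the same bins, the distance is taken as the square root of the summed squared differences between the two cumulative distributions, and the agreement statistic is Cohen's kappa computed from a square table. The histograms and the 2 × 2 table are hypothetical toy values, not CALIPER data.

```python
from math import sqrt

def cdf(histogram):
    """Cumulative distribution (values from 0 to 1) of a histogram of voxel counts."""
    total = sum(histogram)
    running, out = 0, []
    for count in histogram:
        running += count
        out.append(running / total)
    return out

def cvm_distance(hist_a, hist_b):
    """L2 distance between the cumulative distributions of two histograms."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(cdf(hist_a), cdf(hist_b))))

def cohen_kappa(table):
    """Cohen's kappa from a square table (rows: cluster labels, columns: expert labels)."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical 4-bin HU histograms for two VOI (low HU on the left).
emphysema_like = [8, 2, 0, 0]
fibrosis_like = [0, 0, 2, 8]
print(round(cvm_distance(emphysema_like, fibrosis_like), 4))
print(cvm_distance(emphysema_like, emphysema_like))

# Hypothetical agreement table for two classes.
kappa = cohen_kappa([[20, 5], [10, 15]])
print(round(kappa, 3))
```

VOI with similar voxel histograms have a distance near zero, so clustering on this distance tends to group VOI of the same tissue type without using the expert labels.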