Deep Learning Approach Readily Differentiates Papilledema, Non-Arteritic Anterior Ischemic Optic Neuropathy, and Healthy Eyes





OBJECTIVE


Deep learning (DL) has been used to differentiate a range of ophthalmic conditions. We describe a model to distinguish among fundus photographs of acquired optic disc swelling due to idiopathic intracranial hypertension (IIH), non-arteritic anterior ischemic optic neuropathy (NAION), and healthy eyes.


DESIGN


Development and validation of a DL diagnostic algorithm.


SUBJECTS, PARTICIPANTS, AND/OR CONTROLS


Our model was trained and validated on 15 088 fundus photos from 5866 eyes, including eyes with IIH with a Frisén grade ≥1 (418), acute NAION (780), and healthy controls (4668). We performed external validation on an additional 1126 photos from 928 eyes across these groups. All images were obtained from randomized and nonrandomized clinical trials, publicly available datasets, and real-world clinical sources.


METHODS


After preprocessing images to standardize optic disc position, we fine-tuned a ResNet-50 model. Performance was evaluated using 5-fold cross-validation, with accuracy, area under the receiver operating characteristic curve (AUC-ROC), precision, recall, F1 scores, and confusion matrices calculated. We applied gradient-weighted class activation mapping to generate visual activation maps highlighting the regions of interest in the fundus images. External validation was performed with majority voting across our cross-validated models.


MAIN OUTCOME MEASURES


The primary outcome measures were the model’s overall accuracy, class-wide AUC-ROC, precision, recall, F1 scores, and confusion matrices.


RESULTS


The model achieved an internal validation accuracy of 96.2%, with a macro-average AUC-ROC of 0.995. F1 scores ranged from 0.90 to 0.97 for all classes. The external validation set had an accuracy of 93.6%, F1 scores from 0.90 to 0.95, and a macro-average AUC-ROC of 0.980. Activation maps consistently highlighted the optic disc, with emphasis on the inferior optic disc for IIH, superior optic disc for NAION, and the entire optic disc for healthy eyes.


CONCLUSIONS


Our study demonstrates the potential of fundus-based DL models to accurately distinguish among IIH, NAION, and healthy eyes, providing a potentially valuable diagnostic aid. With its strong discriminative performance, this model could become an important tool for neuro-ophthalmic assessment, particularly when access to specialized neuro-ophthalmologists is limited.


Non-arteritic anterior ischemic optic neuropathy (NAION) and idiopathic intracranial hypertension (IIH) are distinct disorders that cause optic disc swelling and vision loss.


Although neuro-ophthalmologists can typically differentiate these disorders through clinical assessment, nonexpert ophthalmologists, optometrists, and neurologists may find it more challenging. NAION is primarily identified based on clinical presentation, fundoscopic findings, and the exclusion of other potential causes of optic neuropathy. Moreover, cases of bilateral or sequential NAION can be diagnostically difficult even for experts because these presentations are rarer and no definitive test exists to confirm the diagnosis.


Given these diagnostic challenges, leveraging advanced image analysis techniques offers a promising avenue for distinguishing among IIH, NAION, and healthy eyes. Fundus photography captures detailed images of the optic disc and peripapillary retina, including vessel dilation and obscuration, hemorrhages, cotton wool spots, optic nerve head (ONH) swelling and pallor, peripapillary retinal nerve fiber layer opacification, and retinal wrinkles, folds, and exudates. Deep learning (DL) models built on fundus images have already demonstrated success in providing preliminary diagnoses for conditions such as diabetic retinopathy, glaucoma, and age-related macular degeneration.


DL has also been used to determine the severity of neuro-ophthalmologic conditions such as IIH. These advancements in fundus-based DL highlight the potential to extend such approaches to preliminary diagnosis of neuro-ophthalmic conditions, such as IIH and NAION, where subtle differences in optic disc and retinal structures may aid in differentiating these disorders from each other and from healthy eyes. In IIH, asymmetric optic disc edema often presents with a characteristic C-shaped swelling, typically sparing the temporal disc margin, which may help distinguish it from other causes of optic neuropathy. Milea and associates showed that a DenseNet-based model was effective at detecting papilledema versus nonpapilledema ONH abnormalities, which is useful for triaging patients with headache or other neurological symptoms but does not discriminate between specific ONH diseases.


Despite prior artificial intelligence advancements, significant diagnostic challenges remain. Accurate differentiation among IIH, NAION, and healthy eyes often depends on clinical assessment and experience, which may lead to inconsistent results, especially in borderline cases. Additionally, the scarcity of experienced specialists and the variability of early-stage presentations further complicate diagnosis. A DL approach holds the potential to reduce diagnostic uncertainty in these diseases by standardizing the analysis of the optic disc and peripapillary retina. By leveraging this technology, we aim to further improve automated diagnosis and enhance clinical workflow in a variety of settings using fundus photography.


METHODS


This study was approved by the Institutional Review Board of the Icahn School of Medicine at Mount Sinai and required no additional consent because the data were de-identified and derived from participants who had consented to research use of their data, collected in clinical trials or at multiple study institutions. The study was conducted according to the tenets of the Declaration of Helsinki and in accordance with Health Insurance Portability and Accountability Act (HIPAA) regulations.


PHOTO PREPROCESSING


Because color fundus photos were captured using various imaging modalities, we performed automated preprocessing for standardization. Although the images differed in file size and resolution, we assumed they were taken with approximately the same field of view. Photos of left eyes were horizontally flipped to a right-eye orientation. If the ONH or peripapillary retina was blurry, or the photo was under- or overexposed, it was excluded from our analysis. To ensure that the DL model focused on the features within the images rather than on aspects such as scope or resolution, we aimed to crop the optic disc along with the surrounding peripapillary retina. We accomplished this by first automatically locating the optic disc. Next, we identified the shorter dimension of the image and drew a square centered on the optic disc, with each side measuring 60% of that dimension. If this square extended beyond the boundaries of the image or included areas outside the field of view, such as black pixels outside the photo, we excluded those images. We also excluded any photos in which the optic disc was partially obstructed by the edges of the image frame. Because we excluded all images with noncentered optic discs, we then cropped a square centered on the photo, using the smallest side length to ensure a uniform, centered crop. We then resized all photos to 256 × 256 pixels. This process ensured that all photos were standardized to feature a centered optic disc while including some area of the peripheral retina (Figure 1). Our earlier attempts at DL using the entire photo found that the model was learning cues related to photographic technique, such as peripheral artifacts or the field of view, rather than clinically meaningful retinal features. The 60% side length was a compromise between including enough retinal detail for meaningful classification and minimizing peripheral artifacts that would bias the model.
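

To make the cropping geometry concrete, the exclusion and centering logic can be sketched as below. This is a minimal illustration rather than our production code; it assumes the optic disc center (disc_xy) has already been found by an upstream localization step, and all function and variable names are ours for this sketch.

```python
from PIL import Image, ImageOps

CROP_FRACTION = 0.60   # disc-centered square side, as a fraction of the shorter side
OUTPUT_SIZE = (256, 256)

def preprocess(path, disc_xy, is_left_eye):
    """Standardize one fundus photo; returns None if it should be excluded."""
    img = Image.open(path).convert("RGB")
    if is_left_eye:
        img = ImageOps.mirror(img)                      # flip to right-eye orientation
        disc_xy = (img.width - disc_xy[0], disc_xy[1])  # mirror the disc coordinate too

    w, h = img.size
    short = min(w, h)

    # Exclude photos where a disc-centered square (60% of the shorter side)
    # would run past the image bounds, i.e., the disc is too peripheral.
    half = int(CROP_FRACTION * short / 2)
    x, y = disc_xy
    if x - half < 0 or y - half < 0 or x + half > w or y + half > h:
        return None

    # Remaining photos have roughly centered discs, so a center crop on the
    # shorter side yields a uniform, disc-centered square.
    left, top = (w - short) // 2, (h - short) // 2
    square = img.crop((left, top, left + short, top + short))
    return square.resize(OUTPUT_SIZE, Image.BILINEAR)
```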




FIGURE 1


Fundus photo preprocessing pipeline. We standardized fundus photos with the following steps: horizontally flipping left-eye images to right-eye orientation and locating the optic disc, ensuring it is not too close to the periphery (which would exclude the peripapillary retina), and then applying a center crop followed by resizing. We excluded low-quality photos and photos where the optic disc was not roughly centered, which ensured uniformity for deep learning (DL) analysis.


IIH


We obtained IIH fundus photos from 3 sources. The first was the Idiopathic Intracranial Hypertension Treatment Trial, a multicenter, randomized, double-blind, placebo-controlled study evaluating the effectiveness of a weight reduction and low-sodium diet combined with acetazolamide compared with the same diet plus placebo. We included any photos over the course of the study that had an expert grader–determined Frisén grade ≥1. Of the 14 026 total photos, 5803 met our preprocessing criteria: 1698 photos of grade 1, 2103 of grade 2, 1220 of grade 3, 726 of grade 4, and 56 of grade 5 papilledema. Photographs were taken with a digital imaging sensor with a minimum resolution of 3 megapixels capable of imaging at either a 30° or 35° field of view. The second was a collection of Frisén grade 1 (33), grade 2 (17), grade 3 (13), grade 4 (50), and grade 5 (45) photos from the University of Iowa neuro-ophthalmology clinic, providing a dataset encompassing the entire range of severity of optic disc edema. Third, for our external validation dataset, we collected 388 photos of active IIH at presentation to the New York Eye and Ear Infirmary clinic. Although these images were not labeled by Frisén grade, they all exhibited swollen optic nerves.


NAION


NAION fundus photos were obtained from the QRK207 trial, a multinational, prospective, randomized controlled trial conducted across 80 sites in 8 countries (Australia, China, Germany, India, Israel, Italy, Singapore, and the United States). The study investigated the safety and efficacy of a synthetic small interfering RNA blocking caspase 2 in participants aged 50 to 80 years diagnosed with acute NAION who met study entry criteria. We collected photos of acute NAION at day 1 of enrollment (4618, of which 1473 fulfilled preprocessing criteria), before study medication administration. Photos were collected with a standard single-lens reflex camera system. We also included 80 images from 80 eyes with acute NAION at presentation, sourced from the University of Iowa neuro-ophthalmology clinic. For our external validation dataset, we used 265 photos of acute NAION from the New York Eye and Ear Infirmary clinic.


HEALTHY


We obtained fundus photos of healthy eyes from 3 sources. The first was the Diabetic Retinopathy Disease dataset on Kaggle, acquired via EyePACS, a platform used in hospitals across the United States, and from 3 eye hospitals in India. The dataset is graded for diabetic retinopathy from 0 to 4, with grade 0 indicating absence of the condition. Given the large size of the dataset, we assumed that the grade 0 photos (25 810) represented a general population sample, in which optic nerve–swelling disease unrelated to diabetes is relatively rare. Preprocessing to select images of adequate quality with a centered optic disc yielded a total of 4130 usable photos. The second source consisted of fellow eyes, excluding eyes with prior NAION, from the QRK207 trial (3474). For our external validation set, we used healthy eyes from the ORIGA dataset, which consists of ONH-centered, high-resolution images of healthy eyes of individuals aged 40 to 80 years.


ACTIVATION MAPS


To visualize the regions of interest for each group, we used gradient-weighted class activation mapping (Grad-CAM), a technique that highlights the areas in the input images that contribute most to the model’s predictions. For our external validation set, we generated activation maps for each class and overlaid them on a representative fundus photo. We also computed class-wide average activation maps by manually identifying the center of each optic disc and applying the same translation to each activation map as would be needed to center the optic disc in the photo. This standardized alignment across images, enabling a clearer visualization of activation patterns across disease groups.
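

A minimal sketch of the Grad-CAM computation is shown below, written directly with PyTorch hooks rather than any particular library; the choice of model.layer4[-1] as the target layer is a common convention for ResNet-50 and is our assumption here, not a detail stated above.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, class_idx, target_layer):
    """Gradient-weighted class activation map for one image tensor x (1, 3, H, W)."""
    store = {}
    h1 = target_layer.register_forward_hook(
        lambda m, inp, out: store.update(act=out))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gin, gout: store.update(grad=gout[0]))

    model.eval()
    score = model(x)[0, class_idx]   # logit of the class being explained
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()

    # Channel weights = spatial average of the gradients; CAM = ReLU of the
    # weighted sum of activations, upsampled to the input resolution.
    weights = store["grad"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * store["act"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0]  # H x W heatmap in [0, 1], ready to overlay on the photo

# e.g., heatmap = grad_cam(model, img, class_idx=0, target_layer=model.layer4[-1])
```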


MODEL ARCHITECTURE


In this study, we used a fine-tuned ResNet-50 architecture for our image classification task, focusing on differentiating among IIH, NAION, and healthy eyes. Our data augmentation pipeline incorporated a range of transformations to enhance model robustness. Images were randomly cropped to 224 × 224 pixels and resized back to 256 × 256, followed by random rotations up to 15°, with empty areas filled with black to avoid introducing artifacts. We applied random translations up to ±10 pixels in both the x and y directions, again filling any gaps with black, and randomly flipped the images horizontally with a probability of 50%. To account for lighting and color variations, color jittering was applied, adjusting brightness, contrast, and saturation (each up to ±30%) and hue (up to ±0.15). Finally, images were normalized using the ImageNet dataset’s mean and SD.
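

In torchvision terms, the pipeline reads roughly as follows; the exact parameterization (e.g., expressing the ±10-pixel translation as a fraction of the 256-pixel image for RandomAffine) is our reading of the description above, not a verbatim excerpt of our training script.

```python
from torchvision import transforms

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

train_transforms = transforms.Compose([
    transforms.RandomCrop(224),                        # random 224 x 224 crop...
    transforms.Resize(256),                            # ...resized back to 256 x 256
    transforms.RandomRotation(15, fill=0),             # up to 15 degrees, black fill
    transforms.RandomAffine(degrees=0,                 # +/- 10 px translation,
                            translate=(10 / 256, 10 / 256),
                            fill=0),                   # gaps filled with black
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.3, hue=0.15),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD), # ImageNet statistics
])
```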


We implemented a 5-fold cross-validation strategy using the StratifiedGroupKFold function from Python’s scikit-learn library to maintain a consistent class distribution across training and validation splits. In each iteration, 4 folds are used for training and the remaining fold serves as validation; the process is repeated 5 times so that every data point is used for both training and validation at different stages, maximizing the robustness of our model evaluation. Notably, several participants had multiple images taken of the same eye, either on the same day or at different times. To prevent data leakage, we ensured that all images from the same eye were assigned to the same fold. This approach allowed us to evaluate the model’s performance more reliably while mitigating the risk of overfitting due to repeated measures.
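

A sketch of the grouped, stratified split is below; y and eye_ids are illustrative stand-ins for the per-photo class labels and eye identifiers.

```python
import numpy as np
from sklearn.model_selection import StratifiedGroupKFold

rng = np.random.default_rng(0)
eye_ids = np.repeat(np.arange(100), 2)          # toy data: 100 eyes, 2 photos each
y = np.repeat(rng.integers(0, 3, size=100), 2)  # one class label per eye

sgkf = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(sgkf.split(np.zeros(len(y)), y, eye_ids)):
    # Grouping by eye guarantees no eye appears in both train and validation.
    assert not set(eye_ids[train_idx]) & set(eye_ids[val_idx])
    print(f"fold {fold}: {len(train_idx)} train photos, {len(val_idx)} validation photos")
```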


For external validation, we used the 5 cross-validated models to predict each image in the external dataset, applying a majority voting strategy to determine the final class label. The class receiving the most votes was assigned as the final prediction. In situations where there was a tie, we summed the softmax probabilities from all models and selected the class with the highest combined probability as the final prediction. This method ensured a balanced and robust approach to external validation.
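

The voting rule can be written compactly as below, where probs holds the 5 models’ softmax outputs for one image (names and numbers are illustrative).

```python
import numpy as np

def ensemble_predict(probs):
    """probs: (n_models, n_classes) softmax outputs for one image."""
    votes = probs.argmax(axis=1)                          # each model's prediction
    counts = np.bincount(votes, minlength=probs.shape[1])
    winners = np.flatnonzero(counts == counts.max())
    if len(winners) == 1:
        return int(winners[0])                            # clear majority
    return int(probs.sum(axis=0).argmax())                # tie: highest summed softmax

# Example: 5 models, 3 classes (IIH, NAION, healthy); models disagree 2-2-1.
probs = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1],
                  [0.3, 0.5, 0.2], [0.1, 0.1, 0.8]])
print(ensemble_predict(probs))  # summed softmax breaks the IIH/NAION tie -> 0 (IIH)
```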


The ResNet-50 model was initialized with pretrained weights from ImageNet, a large-scale dataset containing more than 14 million labeled images, using weights trained on its 1000-category classification subset. The final fully connected layer was adapted to classify our 3 classes of interest. We trained the model for 15 epochs using a stochastic gradient descent optimizer with a learning rate of 0.001, a batch size of 64, and a momentum of 0.9. Because our dataset was moderately imbalanced, we weighted the loss function inversely proportional to the number of images in each class. This prevented underrepresentation of classes with fewer samples, such as NAION, and overrepresentation of more abundant classes, such as healthy eyes. We evaluated the performance of the model with area under the receiver operating characteristic curve (AUC-ROC) scores, overall accuracy, and class-wide precision, recall, and F1 scores.
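

A sketch of this training setup follows; the class counts used for the loss weights are illustrative placeholders, not the exact dataset tallies.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 3  # IIH, NAION, healthy

# ResNet-50 with ImageNet weights; swap the 1000-way head for a 3-way one.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Loss weights inversely proportional to class frequency (counts illustrative).
counts = torch.tensor([6000.0, 1500.0, 7500.0])
class_weights = counts.sum() / (NUM_CLASSES * counts)
criterion = nn.CrossEntropyLoss(weight=class_weights)

optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# Training loop: 15 epochs at batch size 64, as described above (omitted here).
```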


True positives (TPs) refer to instances predicted as positive that are actually positive, true negatives (TNs) are instances predicted as negative that are actually negative, false positives (FPs) are instances predicted as positive but are actually negative, and false negatives (FNs) are instances predicted as negative but are actually positive. Accuracy [(TP + TN)/(TP + TN + FP + FN)] measures what fraction of all predictions the model gets correct. Precision [TP/(TP + FP)], also called “positive predictive value,” focuses on those instances labeled as positive and indicates how many of them were truly positive. Recall [TP/(TP + FN)], also called “sensitivity,” reflects how many of the truly positive instances the model successfully identifies. F1 score [2 × (precision × recall)/(precision + recall)] combines both precision and recall into a single metric by taking their harmonic mean.
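

With scikit-learn, all of these metrics, along with the confusion matrix and a macro-average AUC-ROC, come from a few calls; the toy arrays below are illustrative.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)

y_true = np.array([0, 0, 1, 1, 2, 2])             # toy labels: IIH, NAION, healthy
y_prob = np.array([[0.8, 0.1, 0.1], [0.6, 0.3, 0.1],
                   [0.2, 0.7, 0.1], [0.3, 0.5, 0.2],
                   [0.1, 0.2, 0.7], [0.1, 0.1, 0.8]])
y_pred = y_prob.argmax(axis=1)

print(accuracy_score(y_true, y_pred))             # fraction of correct predictions
print(classification_report(y_true, y_pred,       # per-class precision/recall/F1
                            target_names=["IIH", "NAION", "Healthy"]))
print(confusion_matrix(y_true, y_pred))
print(roc_auc_score(y_true, y_prob,               # macro-average AUC-ROC,
                    multi_class="ovr", average="macro"))  # one-vs-rest
```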


RESULTS


We investigated 15 088 fundus photos from 5866 eyes: 418 eyes had IIH, 780 had acute NAION, and 4668 were healthy. Each group’s demographics are shown in Table 1. Our model achieved an overall internal validation accuracy of 96.2%, with individual folds ranging from 95.9% to 96.4%, and a macro-average AUC-ROC of 0.995 (Figure 2). IIH, NAION, and healthy eyes had precisions of 0.99, 0.88, and 0.96; recalls of 0.95, 0.93, and 0.98; and F1 scores of 0.97, 0.90, and 0.97, respectively (Figure 3 and Table 2).

