Development of a Consensus Document to Improve Multireader Concordance and Accuracy of Aortic Regurgitation Severity Grading by Echocardiography Versus Cardiac Magnetic Resonance Imaging




Current guidelines recommend a multiparametric echocardiographic assessment of aortic regurgitation (AR). However, the absence of a hierarchical weighting of discordant parameters could cause interobserver variability. In the present study, we sought to quantify the interobserver variability of AR assessment and to reduce it. Seventeen level 3 readers graded 20 randomly selected patients with AR. The readers also provided a usefulness score for each parameter, depending on its influence on their choice of AR severity grade. A consensus strategy was subsequently formulated and validated against cardiac magnetic resonance imaging in a separate group of 80 patients. The readers were then updated with the consensus document and recalibrated using the same cases. Agreement was statistically assessed using Randolph's free-marginal multirater kappa. At baseline, no uniform approach was used to combine the individual parameters, contributing to interobserver variability (overall kappa 0.5). A consensus strategy to categorize AR severity was developed in which the left ventricular volume took precedence over the other parameters and was used to differentiate chronic severe AR from less severe categories. Recalibration of the readers using this consensus strategy improved concordance (kappa increased to 0.7). The new strategy also improved accuracy relative to cardiac magnetic resonance imaging, as evidenced by full agreement on severe AR between consensus document-based grading and AR severity defined by cardiac magnetic resonance imaging in the separate validation group of 80 patients. In conclusion, grading of chronic AR using a multiparametric approach has suboptimal consistency between readers, and a left ventricular volume-based consensus document improved both concordance and accuracy.


Echocardiographic grading of the severity of aortic regurgitation (AR) has relied on a multiparametric assessment, which increases the risk of interobserver variability. Previous attempts to standardize AR grading using either a composite score or an integrated approach have not resolved the problem, because the proposed strategies were complex and did not incorporate robust quantitative parameters such as the left ventricular (LV) volume. Recent recommendations for quality echocardiographic laboratory operations have highlighted the need to assess interobserver variability in regurgitant valvular lesions. The aims of the present study were to (1) assess interobserver variability in grading the severity of AR, (2) identify the causes of interobserver variability, (3) formulate a simple consensus document to facilitate uniform reading and validate it against cardiac magnetic resonance imaging (MRI) in a separate validation group, and (4) evaluate whether the consensus document improved multireader concordance and accuracy.


Methods


The study was divided into 4 phases. In the calibration phase, we performed a baseline assessment of interobserver agreement and majority accuracy against a reference standard. The consensus phase involved formulation of a consensus document to standardize the grading of AR. In the validation phase, we checked the accuracy of the consensus document-based AR grading against the cardiac MRI findings. Finally, the recalibration phase involved updating all readers with the consensus document and a reassessment of the interobserver agreement and accuracy.


The institutional review board of the Cleveland Clinic approved the study, with a waiver of individual informed consent. For the calibration group, we selected 20 outpatients who had been referred to our echocardiography laboratory for assessment and management of AR involving a native aortic valve. Twelve of these patients (patients 9 to 20) also underwent cardiac MRI within 24 hours of their echocardiogram. For the validation cohort, we recruited 51 consecutive men and 29 consecutive women with AR who underwent echocardiography and MRI within 24 hours of each other, starting January 1, 2009. This gave an overall sample size of 100 patients (including the 20 patients in the calibration group), with an approximately 2:1 distribution of men and women (Table 1). Hemodynamics were recorded at both echocardiography and cardiac MRI. All patients were in sinus rhythm and had images of satisfactory quality for interpretation.



Table 1

Baseline clinical and echocardiographic characteristics of the calibration and validation groups

Variable | Calibration Group (n = 20) | Validation Group (n = 80)
Age (years) | 53 ± 18 | 49 ± 18
Body surface area (m²) | 2.0 ± 0.3 | 1.9 ± 0.3
Men/women | 16/4 | 51/29
Leaflet morphology (trileaflet/bicuspid) | 13/7 | 36/44
Aortic regurgitation jet (eccentric/central) | 6/14 | 27/53
Aortic root (dilated/normal) | 4/16 | 41/39
Left ventricular ejection fraction (%) | 57 ± 4 | 55 ± 9
Echocardiographic left ventricular end-diastolic volume index (ml/m²) | 91 ± 31 | 83 ± 31

Data are presented as mean ± SD or numbers.


Echocardiography was performed by experienced sonographers using standard commercially available equipment. The key qualitative parameters recommended by the American Society of Echocardiography for AR assessment were gathered. LV size was assessed from images in the parasternal long-axis view, and LV volumes were obtained using the biplane Simpson method. LV size was classified as normal or dilated according to the American Society of Echocardiography guidelines on native valvular regurgitation. A parasternal long-axis image depicting all 3 components of the jet (flow convergence, vena contracta, and jet area) at a Nyquist limit of 50 to 60 cm/s was identified and used to evaluate the vena contracta and the jet width/LV outflow tract ratio. The proximal isovelocity surface area radius was to be measured only if the flow convergence zone was deemed spherical.
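The biplane Simpson method (biplane method of discs) treats the left ventricle as a stack of elliptical discs whose orthogonal diameters are traced in the apical 4- and 2-chamber views. The following is a minimal sketch of that standard formula, V = (π/4) Σ aᵢbᵢ (L/n), together with the indexing step (volume divided by body surface area) that yields the LVEDVI values in Table 1. The function names are hypothetical, the apical views and equal disc counts are assumptions, and the text does not state how many discs were used (20 is the usual convention).

```python
import math


def biplane_disc_volume(a4c_diameters_cm, a2c_diameters_cm, long_axis_cm):
    """LV volume (ml) by the biplane method of discs (modified Simpson's rule).

    a4c_diameters_cm / a2c_diameters_cm: orthogonal disc diameters traced in
    the apical 4- and 2-chamber views (assumed: same number of discs in each).
    long_axis_cm: LV long-axis length; each disc has height L / n.
    """
    if len(a4c_diameters_cm) != len(a2c_diameters_cm) or not a4c_diameters_cm:
        raise ValueError("need the same, non-zero number of discs in both views")
    n_discs = len(a4c_diameters_cm)
    disc_height_cm = long_axis_cm / n_discs
    # Each disc is an elliptical cylinder with cross-sectional area (pi / 4) * a * b
    area_sum = sum(math.pi / 4 * a * b for a, b in zip(a4c_diameters_cm, a2c_diameters_cm))
    return area_sum * disc_height_cm  # cm^3 == ml


def volume_index(volume_ml, bsa_m2):
    """Volume indexed to body surface area (ml/m^2), e.g. LVEDVI = LVEDV / BSA."""
    return volume_ml / bsa_m2


# Sanity check: 20 discs of 4 cm diameter over an 8 cm long axis form a
# cylinder of radius 2 cm, i.e. 32 * pi (about 100.5) ml.
edv = biplane_disc_volume([4.0] * 20, [4.0] * 20, 8.0)
lvedvi = volume_index(edv, 2.0)
```

The cylinder check is a convenient way to confirm the disc arithmetic before applying it to traced diameters.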


Images for jet density and pressure halftime were obtained by continuous-wave Doppler in the best-aligned views. Holodiastolic flow reversal (HDFR) was assessed using pulsed-wave Doppler in the descending thoracic and abdominal aorta.


Seventeen expert readers in the United States, Japan, Belgium, Italy, and Australia were asked to complete a data entry sheet for each case. Each comprehensive study included a standardized collection of images covering all the key parameters required for echocardiographic assessment of AR severity. Readers accessed all moving and still images, including measurements, through a secure online folder and were able to freeze the images and view them frame by frame. In addition to grading AR severity, the readers gave each parameter a usefulness score according to its influence on their overall grade of AR.


Eleven United States readers formulated an LV volume-based consensus document with a hierarchy of key parameters, ranked according to their opinion of the reliability, robustness, and reproducibility of each parameter based on the published data. This document was then circulated to the international readers. None of the cases used in the initial calibration phase was reviewed or discussed. The consensus document-based AR grade was validated against cardiac MRI in a total of 80 patients.


Cardiac MRI was performed with commercially available equipment (Achieva XR 1.5-T, Philips Medical Systems, Best, the Netherlands). Velocity-encoded MRI was performed at the mid-ascending aorta on the basis of axial images. HDFR was evaluated at the mid-descending thoracic aorta, and the regurgitant volume was measured in the ascending aorta at the level of the pulmonary artery bifurcation. Cine steady-state free precession images were also obtained in the short-axis orientation with coverage of the entire left ventricle. Endocardial contours were drawn manually at end-diastole and end-systole to calculate LV volumes and ejection fraction. The prespecified criteria for severe AR were a composite of HDFR in the descending aorta, a regurgitant fraction >27%, and an elevated LV end-diastolic volume index (LVEDVI; >2 SD greater than normal for age and gender). In patients with a normal LV size, a regurgitant fraction of ≥15% was used as the criterion for moderate AR.


Three weeks after the initial calibration, the readers were updated with the final consensus document and asked to rescore all the cases.
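The prespecified cardiac MRI reference criteria can be written as a small decision function. This is a sketch, assuming the usual phase-contrast definition of regurgitant fraction (backward flow divided by forward flow in the ascending aorta). It also assumes that "normal LV size" can be approximated as "LVEDVI not elevated"; the text does not say so explicitly. Age- and gender-specific reference means and SDs are not given in the text, so `lvedvi_is_elevated` takes them as caller-supplied inputs. Combinations the text does not classify (for example, a dilated LV without the full severe composite) return `None` rather than a guessed grade. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MriFindings:
    forward_flow_ml: float    # forward (systolic) volume per beat, ascending aorta
    backward_flow_ml: float   # backward (diastolic) volume per beat = regurgitant volume
    descending_hdfr: bool     # holodiastolic flow reversal in the descending aorta
    lvedvi_elevated: bool     # LVEDVI > 2 SD above normal for age and gender


def regurgitant_fraction_pct(forward_flow_ml, backward_flow_ml):
    """Regurgitant fraction (%) = backward flow / forward flow * 100."""
    if forward_flow_ml <= 0:
        raise ValueError("forward flow must be positive")
    return 100.0 * backward_flow_ml / forward_flow_ml


def lvedvi_is_elevated(lvedvi, normal_mean, normal_sd):
    """True if LVEDVI exceeds the age/gender reference mean by more than 2 SD.

    normal_mean and normal_sd must come from a published normal table
    (not provided in the text)."""
    return lvedvi > normal_mean + 2 * normal_sd


def mri_grade(m: MriFindings) -> Optional[str]:
    rf = regurgitant_fraction_pct(m.forward_flow_ml, m.backward_flow_ml)
    # Severe AR: all three components of the composite must be present
    if m.descending_hdfr and rf > 27 and m.lvedvi_elevated:
        return "severe"
    # Moderate AR: RF >= 15% with a normal LV size
    # (assumption: normal LV size approximated as LVEDVI not elevated)
    if not m.lvedvi_elevated and rf >= 15:
        return "moderate"
    # The stated criteria do not cover the remaining combinations
    return None
```

Returning `None` for unclassified combinations keeps the sketch from silently inventing a mild or moderate threshold that the text does not state.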


Statistical analysis was performed with JMP, version 9 (SAS, Cary, North Carolina). Agreement was assessed using Randolph's chance-adjusted free-marginal multirater kappa coefficient. We assessed both overall agreement and agreement within each severity category (i.e., mild, moderate, or severe AR). The usefulness score for each parameter was expressed as a percentage of the overall score. Univariate and multivariate analyses were performed to identify predictors of poor agreement (kappa <0.5). Accuracy was evaluated as sensitivity and specificity using a 2 × 2 table. The correlation between cardiac MRI and consensus document-based grading was evaluated using Pearson's correlation coefficient.
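Randolph's free-marginal kappa uses the same observed agreement P_o as Fleiss' kappa (the proportion of agreeing reader pairs, averaged over cases) but fixes chance agreement at 1/k for k categories: κ = (P_o − 1/k) / (1 − 1/k). The authors used JMP. The following independent Python sketch of the overall coefficient is not their code. The per-category kappas reported in the study are not reproduced here. Here k = 3 corresponds to mild, moderate, and severe AR.

```python
def free_marginal_kappa(counts, n_categories):
    """Randolph's free-marginal multirater kappa.

    counts[i][j] = number of readers who placed case i in category j.
    Every case must be rated by the same number of readers (>= 2).
    """
    n_cases = len(counts)
    if n_cases == 0:
        raise ValueError("no cases")
    n_readers = sum(counts[0])
    if n_readers < 2 or any(sum(row) != n_readers for row in counts):
        raise ValueError("each case needs the same number (>= 2) of readers")
    # Observed agreement: agreeing (ordered) reader pairs / all reader pairs, over all cases
    agreeing_pairs = sum(c * (c - 1) for row in counts for c in row)
    p_o = agreeing_pairs / (n_cases * n_readers * (n_readers - 1))
    # Free-marginal chance agreement: readers are free to use any of the k categories
    p_e = 1.0 / n_categories
    return (p_o - p_e) / (1.0 - p_e)


# Toy example (columns: mild, moderate, severe), 4 readers, 2 cases:
# one unanimous case and one split 2/2 gives P_o = 2/3 and kappa = 0.5.
toy_kappa = free_marginal_kappa([[4, 0, 0], [2, 2, 0]], n_categories=3)
```

Because chance agreement does not depend on how often each grade was used, this coefficient suits designs in which readers are not told in advance how many cases belong to each category.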




Results


The clinical and echocardiographic characteristics of the studied patients are listed in Table 1. The mean ejection fraction was within the normal range, but the ventricles were generally enlarged. Baseline concordance among the readers was suboptimal, with an average kappa of 0.5 and the lowest kappa (0.4) for moderate AR. More than 80% of readers agreed for only 13 of the 20 patients (Figure 1). Logistic regression analysis showed no statistically significant association between interobserver variability and bicuspid valve (p = 0.07) or jet eccentricity (p = 0.18), suggesting that discordant readings may have been attributable less to valve characteristics than to reader characteristics.
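The text does not define per-case raw agreement. A natural reading, assumed here, is the share of readers who chose the most common (modal) grade for that case. With 17 readers, the strict ">80%" threshold is reached only when at least 14 readers concur (14/17 is about 82%, whereas 13/17 is about 76%). The helper names below are hypothetical.

```python
from collections import Counter


def raw_agreement(grades):
    """Share of readers who chose the modal grade for one case (assumed definition)."""
    if not grades:
        raise ValueError("no grades")
    return max(Counter(grades).values()) / len(grades)


def min_readers_above_80pct(n_readers):
    """Smallest number of concordant readers m with m / n > 80%, i.e. 5m > 4n.

    Integer arithmetic avoids floating-point edge cases when 0.8 * n is whole."""
    return 4 * n_readers // 5 + 1


threshold_17 = min_readers_above_80pct(17)  # 14 of 17 readers
```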




Figure 1


Baseline raw agreement in all 20 cases. These results show >80% raw agreement on AR severity in only 13 of 20 patients.


To identify the causes of interobserver variability, the readers were asked to give each parameter a usefulness score. The relative usefulness scores of all key parameters were within 10% of one another, indicating that readers applied no consistent preference or hierarchy among the key parameters, which contributed to interobserver variability (Figure 2). Figure 3 illustrates case 7, which was graded as severe by 70% of readers and as moderate by 30% at baseline. This example shows how, in the absence of a hierarchy, incongruent key parameters can result in interobserver variability.




Figure 2


Relative usefulness score of key parameters, reflecting how strongly each parameter influenced the readers' severity grade (expressed as a percentage of the maximum total score). The proximity of these relative scores (within 10% of each other) attests to the lack of hierarchy.



Figure 3


Ambiguity in the classification of AR as moderate or severe. Case 7 was a typical case in which incongruent parameters led to interobserver variability. The jet width/left ventricular outflow tract ratio and pressure halftime (PHT) indicate moderate AR, whereas the dilated left ventricle and holodiastolic flow reversal indicate severe AR. Under the consensus document, this case would be graded as severe.


To address this source of interobserver variability, an LV volume-based consensus document was developed in which the LV volume was given precedence over the other parameters and was used to differentiate chronic severe AR from less severe categories (Figure 4). All parameters were selected according to their reliability, robustness, and reproducibility. This consensus document was then validated in 80 cases against cardiac MRI, with severe AR by cardiac MRI defined as the composite of LVEDVI >2 SD greater than normal for age and gender, regurgitant fraction >27%, and descending aortic HDFR. Overall, a strong correlation was found between the cardiac MRI findings and consensus document-based grading across all categories of AR (R = 0.91; p <0.001). Furthermore, consensus document-based grading and cardiac MRI agreed in 100% of cases on the distinction between severe and nonsevere AR in both the calibration and validation groups.




Figure 4


Consensus schema for hierarchical grouping of key echocardiographic parameters in chronic AR. The key parameters are divided into 2 hierarchical groups: the diagnostic parameter (LV size) and the specific parameters (indexes of regurgitant volume). Note that the LV size/volume criteria might not be valid for acute AR or for LV dilation from other causes.
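The full consensus algorithm (Figure 5) is not reproduced in this text, so the following is only a hypothetical sketch of what "LV size takes precedence" could look like in code. Several rules are assumptions chosen to reproduce the case 7 example, not the authors' thresholds: a modal vote across the specific parameters, ties broken toward the more severe grade, a dilated LV plus any specific parameter indicating severe being enough to grade severe, and a non-dilated LV capping the grade at moderate. Following the Figure 4 caption, the sketch refuses to apply LV-size logic to acute AR.

```python
from collections import Counter

# Hypothetical ordinal scale for the three grades used in the study
ORDER = {"mild": 0, "moderate": 1, "severe": 2}


def consensus_grade(lv_dilated, specific_grades, chronic=True):
    """Hypothetical LV-size-first grading of chronic AR.

    lv_dilated: diagnostic parameter (LV size), True if dilated.
    specific_grades: grades suggested by the specific parameters
    (e.g. jet width/LVOT ratio, pressure halftime, HDFR).
    """
    if not chronic:
        raise ValueError("LV size criteria may not apply to acute AR")
    if not specific_grades or any(g not in ORDER for g in specific_grades):
        raise ValueError("need at least one grade from mild/moderate/severe")
    counts = Counter(specific_grades)
    # Modal grade of the specific parameters; ties go to the more severe grade (assumption)
    modal = max(counts, key=lambda g: (counts[g], ORDER[g]))
    if lv_dilated:
        # Dilated LV: one specific parameter supporting severe suffices (assumption)
        return "severe" if "severe" in counts else modal
    # Non-dilated LV: chronic severe AR is excluded, so cap the grade at moderate
    return "moderate" if ORDER[modal] >= ORDER["moderate"] else "mild"


# Case 7 pattern: dilated LV; jet width and PHT suggest moderate, HDFR suggests severe
case7 = consensus_grade(True, ["moderate", "moderate", "severe"])
```

The point of the structure, rather than the specific thresholds, is that the LV-size check runs first and bounds what the specific parameters can conclude.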


The readers were provided with a consensus document-based algorithm to assist them in grading AR severity in the recalibration phase (Figure 5). Multireader concordance increased in all categories, with the overall kappa increasing from 0.5 to 0.7 (Figure 6). Raw agreement exceeded 80% in all but 2 cases (Figure 7). The consensus document-based algorithm also improved accuracy relative to cardiac MRI. At baseline, most readers identified only 4 of the 7 cases categorized as severe by cardiac MRI. After recalibration, sensitivity increased from 60% to 100%, without affecting the baseline specificity.
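Sensitivity and specificity come from the 2 × 2 table of echocardiographic grade (severe vs nonsevere) against the cardiac MRI reference. As a worked check, the majority read of 4 of 7 MRI-severe cases gives a sensitivity of about 57%, close to the 60% reported (the exact reader-level computation behind that figure is not described). Correct identification of all 7 cases gives 100%. Specificity counts are not reported, so the specificity helper is shown but not applied to study data.

```python
def sensitivity(true_pos, false_neg):
    """Share of reference-positive (MRI-severe) cases graded severe."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of reference-negative (MRI non-severe) cases graded non-severe."""
    return true_neg / (true_neg + false_pos)


baseline_sens = sensitivity(true_pos=4, false_neg=3)      # 4 of 7, about 0.571
recalibrated_sens = sensitivity(true_pos=7, false_neg=0)  # 7 of 7, i.e. 1.0
```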

