Accurate assessment of right ventricular (RV) size (RVS) and RV systolic function (RVSF) is vital in the management of various conditions, but both are challenging to assess with echocardiography. The aim of this study was to determine the accuracy and interobserver concordance of qualitative and quantitative RV echocardiography.
Fifteen readers evaluated RV function in 12 patients (360 readings) who underwent echocardiography and cardiac magnetic resonance for RV assessment. Readers qualitatively estimated RVS and RVSF as normal, mild, moderate, or severe and then reassessed quantitatively by adding RV dimensions, fractional area change, S′, tricuspid annular plane systolic excursion, and RV index of myocardial performance. Cardiac magnetic resonance was used as the reference standard for grading RVS and RVSF.
Quantitative measurements increased accuracy and interreader agreement compared to qualitative assessment alone, especially in normal categories. Readers’ accuracy for diagnosing normal and severe RVS increased from 38% to 78% (P = .001) and from 70% to 97% (P = .018), and readers’ accuracy for diagnosing normal and mild RVSF increased from 52% to 84% (P < .001) and from 36% to 56% (P = .001). Interreader agreement for classification of the subjects as normal or abnormal improved from a κ value of 0.40 to 0.77 (fair to good agreement) for RVS and from 0.43 to 0.66 (moderate to good agreement) for RVSF.
Visual estimation of RVS and RVSF is inaccurate and has wide interobserver variability. Quantitation improves accuracy and reliability, especially in distinguishing normal from abnormal. The reliability of mild and moderate grades remains inadequate, and further guidance is needed for the classification of abnormal categories.
The right ventricle is important in a number of cardiopulmonary diseases, and its accurate assessment is vital in the management of these conditions. Currently, cardiac magnetic resonance (CMR) imaging is considered the “gold standard” for right ventricular (RV) assessment, but the availability of this test is limited. In contrast, echocardiography is widely available and also provides hemodynamic data. Unfortunately, echocardiographic assessment of the right ventricle is challenging because of its complex geometry. Many physicians and echocardiography laboratories still depend on visual estimation as a rapid method for assessment of RV size (RVS) and RV systolic function (RVSF). The current guidelines for echocardiographic assessment of the right heart are based on both quantitative and qualitative parameters using multiple acoustic windows. The impact of these guidelines on the accuracy and interobserver concordance of RVS and RVSF assessment is undefined. In this study, we sought to determine the accuracy and interobserver concordance of echocardiographic assessment of RVS and RVSF using visual estimation and quantitative methods, with CMR imaging as the reference standard.
Studies from 12 patients (mean age, 52 ± 19 years; six women) undergoing RV assessment with both echocardiography and CMR imaging performed within 24 hours were selected retrospectively. Five patients had grade 3 or 4+ tricuspid regurgitation, two had grade 3 or 4+ pulmonary regurgitation, three had undergone right-sided valvular repair, one presented for shunt evaluation, and one presented for evaluation of possible constriction. The institutional review board approved this study.
Experienced sonographers performed transthoracic echocardiography using commercially available equipment (Vivid 7 or E9, GE Medical Systems, Milwaukee, WI; or iE33, Philips Medical Systems, Andover, MA). The sonographers had no prior knowledge that the studies would be used for a quality control exercise. Complete studies were acquired on the basis of the standard protocol in use in our laboratory. Stored images were used to measure two-dimensional RV parameters, including RV basal, mid cavity, and longitudinal dimensions; RV outflow tract proximal and distal dimension; fractional area change (FAC); tricuspid annular plane systolic excursion (TAPSE); tissue Doppler–derived tricuspid lateral annular systolic velocity (S′); and RV index of myocardial performance (RIMP).
CMR imaging was performed on a standard scanner (Achieva 1.5 T; Philips Medical Systems, Best, The Netherlands). Turbo spin-echo and gradient-echo images were obtained for anatomic definition. Cine images were obtained for the evaluation of cardiac function and flow. RVS and RVSF were assessed by measuring RV volumes from steady-state free precession bright blood cine. All studies were interpreted by qualified personnel. The normal values of RV volume and RVSF were based on published 95% confidence intervals (CIs) of the mean value in normal population studies, which depend on gender, age, and body surface area. RV dilatation was diagnosed when RV volume exceeded the upper 95% confidence limit, and systolic dysfunction was diagnosed when RVSF fell below the lower 95% confidence limit. In the absence of a predefined classification of the severity of RV volume and RVSF by CMR imaging, we categorized the severity of RVSF using the same criteria as for left ventricular ejection fraction (40% to the lower limit of normal as mild dysfunction, 30%–39% as moderate dysfunction, and <30% as severe dysfunction).
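The RVSF severity thresholds described above can be sketched as a small grading function. This is only an illustration: the function name, the `lower_limit_normal` parameter, and the example value of 47% are hypothetical, since the study takes the lower limit of normal from published reference ranges that depend on gender, age, and body surface area. The sketch also assumes that the lower limit of normal lies above 40% and that ejection fractions between 39% and 40% are graded as moderate.

```python
def grade_rvsf(ef_percent: float, lower_limit_normal: float) -> str:
    """Grade RV systolic function from a CMR ejection fraction (percent).

    lower_limit_normal is patient specific (hypothetical input here);
    it is assumed to be greater than 40.
    """
    if ef_percent >= lower_limit_normal:
        return "normal"
    if ef_percent >= 40:
        return "mild"       # 40% up to the lower limit of normal
    if ef_percent >= 30:
        return "moderate"   # 30% to 39%
    return "severe"         # below 30%


# Hypothetical lower limit of normal of 47%, for illustration only.
for ef in (55, 45, 35, 25):
    print(ef, grade_rvsf(ef, lower_limit_normal=47))
```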
The accuracy of echocardiographic readings of RVS and RVSF was assessed by comparison with CMR imaging. Because there are no criteria for defining mild, moderate, and severe RV enlargement by CMR, these were defined by a consensus of three experts (T.H.M., L.R., Z.P.), informed by the normal RV literature.
Fifteen readers (nine level 3 and six level 2) evaluated the anonymized echocardiographic images. The only clinical data provided were age, gender, and the indication for the study. In the first part of the study, left parasternal long-axis, RV inflow, parasternal short-axis, apical four-chamber, and subcostal views were shown. The readers were asked to visually estimate RVS and RVSF as normal, mild, moderate, or severe in each case. The readers were blinded to one another’s readings and completed individual data entry sheets. At the end of the first part of the exercise, the current guidelines were briefly presented and distributed to all readers. The readers met again for the second part of the study 2 weeks later. In addition to the previous images from multiple windows, the readers were shown images with quantitative measurements (RV dimension, RV outflow tract dimension, FAC, TAPSE, S′, and RIMP) performed by an observer who was blinded to the CMR results. The readers were then asked to grade RVS and RVSF for each case. The data entry sheets were collected after each session.
Statistical analysis was performed using standard software (SPSS version 17.0, SPSS, Inc., Chicago, IL; and SAS version 9.2, SAS Institute Inc., Cary, NC). Accuracy of grading of RVS and RVSF was estimated by comparison with the reference standard, CMR imaging. The data were assessed in two ways: first as four ordinal categories and then as binary categories, treating normal as negative for disease and mild, moderate, or severe as diseased. For each value of the reference standard, logistic regression analysis was used to compare the accuracy of visual estimation alone with that of visual estimation plus quantification. The dependent variable in the models was a dichotomous variable for whether the reader correctly or incorrectly scored RVS (or RVSF); the independent variable was the method of scoring (visual estimation or addition of quantification). Generalized estimating equations with an exchangeable working correlation structure were used to account for multiple observations on the same patient. The null hypothesis was that the method of scoring was not associated with reader accuracy; the alternative hypothesis was that the method of scoring was associated with reader accuracy (two-tailed test). A Wald test was used to evaluate whether the regression coefficient for the method of scoring differed from zero; a significance level of .05 was applied. Ninety-five percent CIs for the difference in accuracy between the two methods were constructed from the fitted models.
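The regression model described above can be written compactly. The notation here is illustrative and not taken from the study: $Y_{ij}$ equals 1 when reading $j$ on patient $i$ matches the CMR grade and 0 otherwise, and $X_{ij}$ equals 0 for visual estimation and 1 when quantification is added. One such model would be fitted for each reference-standard category.

```latex
\[
\operatorname{logit}\,\Pr(Y_{ij} = 1) = \beta_0 + \beta_1 X_{ij},
\qquad
\Pr(Y_{ij} = 1 \mid X_{ij} = x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
\]
\[
H_0\colon \beta_1 = 0
\quad \text{versus} \quad
H_1\colon \beta_1 \neq 0,
\qquad
z = \frac{\hat{\beta}_1}{\widehat{\operatorname{SE}}(\hat{\beta}_1)}
\]
```

Under this sketch, the Wald statistic $z$ is referred to a standard normal distribution, so the null hypothesis is rejected at the .05 level when $|z| > 1.96$. The generalized estimating equations affect the standard error in the denominator, which is estimated so as to allow for correlation among readings of the same patient.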
Overall accuracy (i.e., the probability of a correct diagnosis), which is a weighted average of sensitivity and specificity, was estimated for each method of scoring for prevalence rates of 5%, 10%, 25%, 50%, and 67% (the observed prevalence in the study sample) for RVS and similarly for RVSF (the observed prevalence was 50% in the study sample).
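The prevalence weighting described above can be illustrated with a short sketch. The sensitivity and specificity values below are assumptions: they are simple pooled proportions recomputed from the binary collapse of the RVS counts in Table 1 (normal versus any abnormal grade), not the study's model-based estimates, and the function name is hypothetical.

```python
def overall_accuracy(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Probability of a correct binary diagnosis at a given disease prevalence."""
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


# Pooled proportions from the Table 1 counts (illustrative, not model based).
# Sensitivity: abnormal cases read as any abnormal grade, out of 120 readings.
# Specificity: normal cases read as normal, out of 60 readings.
VISUAL = {"sensitivity": 104 / 120, "specificity": 23 / 60}
QUANTITATIVE = {"sensitivity": 117 / 120, "specificity": 47 / 60}

# Prevalence rates named in the text; 120/180 is the observed 67% for RVS.
for prevalence in (0.05, 0.10, 0.25, 0.50, 120 / 180):
    visual = overall_accuracy(prevalence, **VISUAL)
    quantitative = overall_accuracy(prevalence, **QUANTITATIVE)
    print(f"prevalence {prevalence:.2f}: visual {visual:.2f}, quantitative {quantitative:.2f}")
```

At the observed prevalence, these pooled proportions reduce to the raw fraction of correct binary readings: 127 of 180 (about 0.71) for visual estimation and 164 of 180 (about 0.91) with quantification.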
Interreader agreement was characterized as the mean of all readers’ pairwise Cohen’s unweighted κ statistics; values of κ range from −1 to 1, with values <0.2 indicating poor agreement, 0.20 to 0.40 fair agreement, 0.40 to 0.60 moderate agreement, 0.60 to 0.80 good agreement, and 0.80 to 1.0 excellent agreement.
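A minimal sketch of the mean pairwise unweighted κ calculation follows. The function names and the toy ratings are hypothetical, and the study's own computation was done in statistical software. Because the stated bands share their boundary values, the sketch assigns a boundary value to the lower band, which matches the description of κ = 0.40 as fair agreement elsewhere in the text.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two readers rating the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("both readers must rate the same non-empty set of cases")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each reader's marginal category frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both readers use a single category")
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(ratings):
    """Mean of Cohen's kappa over all pairs of readers."""
    return mean(cohen_kappa(a, b) for a, b in combinations(ratings, 2))


def describe(kappa):
    """Map kappa to the agreement bands in the text (boundary goes to the lower band)."""
    if kappa < 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "excellent"


# Toy binary ratings (N = normal, A = abnormal) for three hypothetical readers.
readers = [list("NNAA"), list("NAAA"), list("NNAA")]
kappa = mean_pairwise_kappa(readers)
print(round(kappa, 2), describe(kappa))
```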
The readers’ overall accuracy for diagnosing normal, mild, moderate, and severe RVS and normal, mild, and moderate RVSF is shown in Tables 1 and 2. With only visual scoring, readers had poor accuracy in characterizing RVS as normal, mild, and even moderate. Over a quarter of normal cases (28%) were reported as moderate or severe RVS. Visual scoring had 70% accuracy for diagnosis of severe RVS, with 27% of severe cases interpreted as moderate. The addition of quantification correctly classified 78% of patients with normal RVS, with only 8% misdiagnosed as moderately or severely enlarged, and 97% of severe RVS was correctly staged.
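Table 1. Echocardiographic readings of RVS against the CMR reference standard, by visual estimation alone and with added quantification.

|Reference standard (CMR)|Normal, visual|Normal, quantitative|Mild, visual|Mild, quantitative|Moderate, visual|Moderate, quantitative|Severe, visual|Severe, quantitative|
|---|---|---|---|---|---|---|---|---|
|Normal (n = 60)|23 (38%)|47 (78%)|20 (33%)|8 (13%)|11 (18%)|5 (8%)|6 (10%)|0 (0%)|
|Mild (n = 60)|12 (20%)|2 (3%)|32 (53%)|30 (50%)|16 (27%)|28 (47%)|0 (0%)|0 (0%)|
|Moderate (n = 30)|3 (10%)|1 (3%)|8 (27%)|7 (23%)|8 (27%)|11 (37%)|11 (37%)|11 (37%)|
|Severe (n = 30)|1 (3%)|0 (0%)|0 (0%)|0 (0%)|8 (27%)|1 (3%)|21 (70%)|29 (97%)|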
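Table 2. Echocardiographic readings of RVSF against the CMR reference standard, by visual estimation alone and with added quantification.

|Reference standard (CMR)|Normal, visual|Normal, quantitative|Mild, visual|Mild, quantitative|Moderate, visual|Moderate, quantitative|Severe, visual|Severe, quantitative|
|---|---|---|---|---|---|---|---|---|
|Normal (n = 90)|47 (52%)|76 (84%)|31 (34%)|10 (11%)|12 (13%)|4 (4%)|0 (0%)|0 (0%)|
|Mild (n = 45)|16 (36%)|7 (16%)|16 (36%)|25 (56%)|11 (24%)|11 (24%)|2 (4%)|2 (4%)|
|Moderate (n = 45)|2 (4%)|0 (0%)|4 (9%)|7 (16%)|23 (51%)|26 (58%)|16 (36%)|12 (27%)|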
Analogous results are shown for RVSF in Table 2. With visual scoring alone, readers demonstrated poor accuracy in diagnosing normal RVSF, with 48% of normal cases scored as mild or moderate. With quantification, only 16% of normal cases were scored as mild or moderate.
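The percentages quoted for RVSF can be recomputed from the Table 2 counts, as in the sketch below. It assumes that the table columns alternate between visual and quantitative readings for each grade, an interpretation consistent with the row totals; the variable and function names are hypothetical.

```python
GRADES = ("normal", "mild", "moderate", "severe")

# Table 2 counts: for each CMR grade, readings per echocardiographic grade.
# Column assignment (visual versus quantitative) is an assumed reading of the table.
TABLE_2 = {
    "normal": {"visual": (47, 31, 12, 0), "quantitative": (76, 10, 4, 0)},
    "mild": {"visual": (16, 16, 11, 2), "quantitative": (7, 25, 11, 2)},
    "moderate": {"visual": (2, 4, 23, 16), "quantitative": (0, 7, 26, 12)},
}


def percent_correct(reference: str, method: str) -> float:
    """Percentage of readings that match the CMR grade for one table row."""
    counts = TABLE_2[reference][method]
    return 100.0 * counts[GRADES.index(reference)] / sum(counts)


def percent_normal_overcalled(method: str) -> float:
    """Percentage of CMR-normal cases read as any abnormal grade."""
    counts = TABLE_2["normal"][method]
    return 100.0 * sum(counts[1:]) / sum(counts)


for method in ("visual", "quantitative"):
    print(
        method,
        round(percent_correct("normal", method)),
        round(percent_correct("mild", method)),
        round(percent_normal_overcalled(method)),
    )
```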
Figures 1 and 2 illustrate the comparison of visual scoring to quantification. Severe RV enlargement showed the highest accuracy: 70% with visual estimation and 97% with the addition of quantification (P = .018; 95% CI for difference, 0.04–0.75). Readers also demonstrated significant improvement in diagnosing normal RVS with the addition of quantification (from 38% to 78%; P = .001; 95% CI for difference, 0.19–0.66). The recognition of normal function also improved substantially: normal RVSF showed the highest accuracy among the RVSF grades with quantification (from 52% to 84%; P < .001; 95% CI for difference, 0.21–0.45). There was also an improvement in the diagnosis of mild RVSF with the addition of quantification (from 36% to 56%; P = .001; 95% CI for difference, 0.13–0.26). Accuracy for diagnosing mild and moderate RV enlargement and moderate dysfunction was similar with and without the addition of quantification.