Background
Although not recommended in isolation, visual estimation of echocardiographic ejection fraction (EF) is widely applied to confirm quantitative EF. However, interobserver variability for EF estimation has been reported to be as high as 14%. The aim of this study was to determine whether self-directed education could improve the accuracy and interobserver variability of visual estimation of EF and whether a multireader estimate improves measurement precision.
Methods
Thirty-one participants provided single-point EF estimates for 30 echocardiograms spanning a range of EFs, image quality, and clinical contexts, obtained in patients who underwent cardiac magnetic resonance (CMR) within 48 hours. Each participant received feedback on the case-by-case difference between his or her estimates and the CMR EF, and the 10 cases with the largest reader variability were discussed along with the corresponding CMR images. Self-directed learning was undertaken by side-by-side review of echocardiographic and CMR images. Two months later, 20 new cases were shown to the same 31 participants using the same methodology.
Results
The baseline interobserver variability of ±0.120 improved to ±0.097 after the intervention. EF misclassification (defined as a difference of more than ±0.05 from CMR EF) was reduced from 56% to 47% (P < .001), and the intervention also decreased the absolute difference between CMR and echocardiographic EF across all cases and readers (from 0.07 ± 0.01 to 0.06 ± 0.01, P = .0001). This improvement was most prominent in readers with lower baseline accuracy. A combined physician-sonographer EF estimate improved the precision of EF determination by 25% compared with individual reads.
Conclusions
In readers with varying levels of experience, a simple, mostly self-directed intervention modestly decreased interobserver variability and improved the accuracy of EF measurements. Combined physician-sonographer EF reporting improved the precision of EF estimates.
Continuous quality assessment and the implementation of methods to improve the interpretation of echocardiograms are important responsibilities of all echocardiography laboratories. The quality improvement process should include a baseline assessment of performance, an external reference standard, a review and educational process, and a reassessment of performance. The American Society of Echocardiography recommends the ejection fraction (EF) as an important target for quality exercises, with specific suggestions for group-based case reviews and cross-modality comparisons.
Although the American and European societies of echocardiography do not recommend visual estimation as the primary method of EF quantification, visual confirmation of the quantitative EF is still recommended.
Two metrics for quality improvement of visual EF estimation seem important. Interobserver variability (IOV) is a measure of reliability (i.e., the consistency and stability of test scores across readers). Accuracy refers to proximity to a reference standard (e.g., EF calculated from cardiac magnetic resonance [CMR]), with the term “precision” used as the inverse of the variance from such a standard. The literature on methods to assess and/or improve the IOV and accuracy of visual EF quantification is limited. In one study, immediate feedback with radionuclide EF improved the correlation of visual estimates with radionuclide ventriculography in three readers of varying experience. A more recent study showed improvements in IOV and in accuracy relative to expert-derived biplane EF after structured educational sessions targeting both. However, a lengthy structured educational session may be organizationally challenging at many institutions, and delivering an intervention uniformly to all members of a laboratory can be difficult, so it may be important to identify the individuals who would benefit most from an intervention. In addition, other practical methods to improve the precision of visual EF determination have not been explored, and an external reference standard (as suggested by the American Society of Echocardiography) may be a valuable addition. Therefore, the aims of our study were to (1) assess whether a mostly self-directed educational session using CMR as a reference standard would improve the IOV and accuracy of visual EF estimation by echocardiography, (2) identify the subgroup of individuals who may benefit most from the intervention, and (3) assess the use of combined physician-sonographer EF estimation to improve the precision of visual EF estimates.
Methods
Baseline Assessment
Thirty two-dimensional transthoracic echocardiograms with (n = 8) and without (n = 22) contrast administration for left ventricular (LV) opacification were chosen on the basis of the availability of a CMR study with images appropriate for LVEF measurement, acquired within 48 hours of echocardiography and without any intervening procedures or changes in clinical status. Selected clips (three apical views as well as basal, mid, and apical short-axis views as available, displayed for equal time in a quad format) were shown to the 31 participants in a group setting, allowing 2 minutes per case. The participants consisted of sonographers (n = 11) and cardiologists (n = 20) with level II or level III certification in echocardiography. Participants were required to provide visual estimates of LVEF as single integers and were blinded to one another’s interpretations. All data were collected on an identified answer sheet, with each case coded separately.
To avoid bias, no clinical data about the cases were provided. All cases were drawn from images acquired >3 months before the date of the exercise, to minimize the possibility of recollection in case any of the cases had originally been performed or reported by the participants. The cases spanned a range of EFs, image quality, and clinical indications. Image quality was scored on the basis of visualization of the LV walls and endocardium: 1 = excellent acoustic detail, 2 = good acoustic detail, 3 = adequate acoustic detail, and 4 = technically difficult study with poor or inadequate visualization of the LV walls and/or endocardium.
CMR Quantification
CMR was used as the reference method because of its well-defined accuracy and its value as a teaching tool: its high contrast-to-noise and signal-to-noise ratios allow confident assessment of wall motion and regional function abnormalities. All CMR studies were analyzed by an expert (S.D.F.) with >15 years’ experience with CMR. Steady-state free precession short-axis cine images were used for manual planimetry to obtain LVEF. The reader did not participate in the EF exercise and was blinded to the echocardiographic interpretation, the original CMR endocardial contours, and the clinical report. Previously published guidelines and recommendations were followed for endocardial contouring, with the papillary muscles included in the ventricular volumes and careful selection of the basal segments for inclusion or exclusion.
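For readers less familiar with CMR volumetry, the following minimal sketch shows how an EF can be derived from short-axis planimetry by summation of discs. It is not the analysis software used in this study: the function names, the slice thickness and interslice gap, and the example areas are hypothetical, and the traced areas are assumed to already include the papillary muscles, per the contouring convention described above.

```python
"""Sketch: LVEF from short-axis planimetry by summation of discs.

Assumptions (not from the study): each traced endocardial area is in cm^2,
and every slice contributes area * (slice thickness + interslice gap).
"""


def lv_volume_ml(areas_cm2, slice_thickness_mm, gap_mm=0.0):
    """Sum slice areas times slice spacing; 1 cm^3 equals 1 mL."""
    spacing_cm = (slice_thickness_mm + gap_mm) / 10.0
    return sum(areas_cm2) * spacing_cm


def ejection_fraction(ed_areas_cm2, es_areas_cm2, slice_thickness_mm, gap_mm=0.0):
    """Return EF as a decimal: (EDV - ESV) / EDV."""
    edv = lv_volume_ml(ed_areas_cm2, slice_thickness_mm, gap_mm)
    esv = lv_volume_ml(es_areas_cm2, slice_thickness_mm, gap_mm)
    if edv <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return (edv - esv) / edv


# Hypothetical traced areas (cm^2), base to apex, 8-mm slices with 2-mm gaps.
ed = [10, 14, 16, 15, 12, 8]
es = [5, 8, 9, 8, 6, 4]
print(round(ejection_fraction(ed, es, slice_thickness_mm=8, gap_mm=2), 2))
```

With these hypothetical areas summing to 75 cm² at end diastole and 40 cm² at end systole, and 1-cm slice spacing, EDV is 75 mL, ESV is 40 mL, and the EF is about 0.47.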
Self-Directed Education
Two weeks after the initial session, each participant received a summary of his or her LVEF estimates compared with the CMR LVEF values, shown as a scatter diagram with a line of unity. The 10 cases with the largest discrepancies in LVEF among the group were then reviewed in a group setting. The 1-hour session consisted of a review of the echocardiograms alongside the CMR images and CMR EF, with discussion of the reasons for the discrepancies and of ways to improve accuracy and reduce the IOV of visual LVEF determination. The consensus recommendations for improving LVEF estimation were (1) greater emphasis on the LV base than on the apex (e.g., by weighting short-axis over apical images), (2) avoidance of zoomed images, and (3) careful identification of, and active accounting for, segmental variations in ventricular function.
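The selection of review cases can be sketched as ranking cases by the spread of the group’s estimates. This sketch assumes that “largest discrepancies among the group” means the largest between-reader standard deviation (the abstract describes these as the cases with the largest reader variability); the function name and the data layout are hypothetical.

```python
import statistics


def most_discrepant_cases(estimates, n=10):
    """Return the n case IDs with the largest between-reader SD.

    estimates: dict mapping a (hypothetical) case ID to the list of all
    readers' visual EF estimates for that case, as decimals.
    """
    spread = {case: statistics.stdev(values) for case, values in estimates.items()}
    # Sort case IDs by their SD, largest first, and keep the top n.
    return sorted(spread, key=spread.get, reverse=True)[:n]


example = {
    "A": [0.30, 0.30, 0.30],
    "B": [0.20, 0.40, 0.60],
    "C": [0.30, 0.35, 0.40],
}
print(most_discrepant_cases(example, n=2))
```

Ranking by deviation from CMR EF instead of by between-reader spread would be an equally plausible reading of the text and would need only a different key function.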
Subsequently, the readers were given all 30 echocardiographic cases, the corresponding CMR cine images (three long-axis images and a short-axis stack), and the CMR EFs for self-review over a 2-week period. To confirm that all cases had been reviewed, participants were asked to provide a second set of EF values for all 30 cases; these values served only to confirm that the exercise was performed and were not analyzed. The goal of the self-directed exercise was to recalibrate LVEF estimation against CMR images and to understand why visual LVEF may have been overestimated or underestimated.
Posteducation Assessment
One month later, 20 new cases were shown to all participants in a group setting using the same methodology as the baseline assessment. The cases were selected to have pathologies, image quality, and an EF range similar to those of the baseline assessment. At the beginning of the session, the principles for improving visual EF estimation agreed on in the original review session were briefly restated before the cases were presented. IOV was calculated after the intervention and compared with the preintervention value. Six readers (five physicians, one sonographer) participated in the baseline and follow-up sessions but could not attend the review session or review the CMR images; these six readers were analyzed separately.
Assessment of Accuracy
For each participant, the visual LVEF for each case was compared with the CMR EF. A difference of more than ±0.05 absolute EF points between visual EF and CMR EF was defined as misclassification; additional analyses used thresholds of more than ±0.08 and more than ±0.10. The proportion of cases misclassified by the readers was compared between the baseline and postintervention cases. In addition, the mean absolute differences between CMR and visual EF for all readers were compared before and after the intervention. A subgroup comparison was made according to baseline reader accuracy, which was determined for each reader using the coefficient of variation of the difference between echocardiographic and CMR EF across all cases. Readers with a coefficient of variation above the median were classified as having lower baseline accuracy, and those below the median as having higher baseline accuracy.
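The accuracy metrics above can be sketched as follows. This is an illustrative reconstruction, not the study’s SPSS analysis: the function names are hypothetical, EF values are decimals, and the coefficient of variation is computed on absolute rather than signed differences, which is an assumption (a coefficient of variation of signed differences is unstable when their mean is near zero). A reader exactly at the median falls into neither subgroup, following the “above”/“below” wording.

```python
import statistics


def misclassification_rate(visual, cmr, threshold=0.05):
    """Fraction of reads whose |visual - CMR| exceeds the threshold."""
    misses = sum(abs(v - c) > threshold for v, c in zip(visual, cmr))
    return misses / len(visual)


def mean_absolute_difference(visual, cmr):
    """Mean of |visual - CMR| over all cases for one reader."""
    return statistics.fmean([abs(v - c) for v, c in zip(visual, cmr)])


def coefficient_of_variation(visual, cmr):
    """SD / mean of the absolute differences (assumption: absolute, not signed)."""
    diffs = [abs(v - c) for v, c in zip(visual, cmr)]
    return statistics.stdev(diffs) / statistics.fmean(diffs)


def split_by_baseline_accuracy(reads, cmr):
    """reads: dict reader ID -> estimates aligned with cmr.

    Returns (lower_accuracy, higher_accuracy) sets split at the median CV.
    """
    cv = {r: coefficient_of_variation(v, cmr) for r, v in reads.items()}
    median = statistics.median(cv.values())
    lower = {r for r, x in cv.items() if x > median}
    higher = {r for r, x in cv.items() if x < median}
    return lower, higher


# Hypothetical example: three readers, four cases.
cmr = [0.30, 0.40, 0.50, 0.60]
reads = {
    "steady": [0.31, 0.41, 0.51, 0.61],
    "mid": [0.35, 0.50, 0.55, 0.70],
    "erratic": [0.31, 0.60, 0.51, 0.80],
}
for threshold in (0.05, 0.08, 0.10):
    print(threshold, misclassification_rate(reads["mid"], cmr, threshold))
print(split_by_baseline_accuracy(reads, cmr))
```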
Use of Combined Physician and Sonographer EF Estimation to Improve Precision
In many echocardiography laboratories, the EF is determined by two readers: the sonographer performing the study and the physician finalizing the report. When the two disagree, the physician usually reports his or her own EF value. We hypothesized that in a laboratory with experienced sonographers and physicians, averaging the two visual EF values would improve the precision of EF measurement relative to CMR. We tested this hypothesis using the postintervention data. First, we determined the precision of each physician (staff cardiologists only, n = 16) and each sonographer (n = 11) as the standard deviation of the absolute differences between visual EF and CMR EF across all cases. The baseline precision for the group was calculated as the mean of these standard deviations across all readers. Next, each sonographer’s estimates were averaged with those of each of the 16 physicians, giving 16 combined reads per sonographer, and each physician’s estimates were combined with those of each of the 11 sonographers, giving 11 combined reads per physician. The absolute difference between the CMR EF and the combined EF was then calculated for all cases for each “combined read,” and the standard deviation of these differences was used as the measure of “combined read” precision.
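The combined-read procedure can be sketched as follows. This minimal sketch assumes that every physician-sonographer pair forms one combined read (the 16 × 11 pairs described above) and that the group’s combined precision is the mean of the per-pair standard deviations; the text does not state how the combined-read standard deviations were summarized, and the function names are hypothetical.

```python
import statistics
from itertools import product


def precision_sd(estimates, cmr):
    """SD of |visual - CMR| across cases for one (individual or combined) reader."""
    return statistics.stdev([abs(e - c) for e, c in zip(estimates, cmr)])


def individual_precision(readers, cmr):
    """Group precision: mean of each individual reader's SD."""
    return statistics.fmean([precision_sd(r, cmr) for r in readers])


def combined_precision(physicians, sonographers, cmr):
    """Average each physician-sonographer pair case by case, then take the mean SD."""
    sds = []
    for phys, sono in product(physicians, sonographers):
        combined = [(p + s) / 2 for p, s in zip(phys, sono)]
        sds.append(precision_sd(combined, cmr))
    return statistics.fmean(sds)


def relative_improvement(individual, combined):
    """Fractional reduction in SD achieved by combining reads."""
    return 1 - combined / individual


# Hypothetical example: errors in opposite directions partly cancel when averaged.
cmr = [0.30, 0.40, 0.50, 0.60]
physician = [0.40, 0.40, 0.60, 0.60]
sonographer = [0.30, 0.50, 0.50, 0.70]
ind = individual_precision([physician, sonographer], cmr)
comb_sd = combined_precision([physician], [sonographer], cmr)
print(ind, comb_sd, relative_improvement(ind, comb_sd))
```

Averaging helps most when the two readers’ errors are not strongly correlated, which is the intuition behind the hypothesis stated above.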
Statistical Analysis
Continuous data are expressed as mean ± SD and categorical data as frequencies or percentages. All absolute EF values are represented as decimals (e.g., 10% is represented as 0.10), while any relative change in EF is represented as a percentage. The IOV for visual assessment before and after the intervention was calculated with two separate analyses of variance. First, to compare our findings with those of a previous study, we used EF as the dependent variable and readers as a fixed factor, with the IOV calculated as the square root of the observer mean squared error (MSE). We then performed a second analysis using cases (each echocardiogram shown was taken as a “case”) as fixed factors, with the IOV taken as the square root of the MSE for the within-cases variance (error term). Preintervention and postintervention IOVs were compared using the approximate F-distribution ratio, calculated as preintervention observer MSE/postintervention observer MSE for the first analysis and as preintervention within-cases MSE/postintervention within-cases MSE for the second analysis. Fisher’s exact test was used to determine differences in misclassification rates before and after the intervention. All statistical analysis was performed using SPSS version 19.0.0 (IBM Corporation, Chicago, IL).
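The second IOV analysis (cases as the factor) and the Fisher’s exact comparison can be sketched with the Python standard library alone. This is an illustrative reimplementation, not the SPSS procedure: the function names are hypothetical, the first (readers-as-factor) analysis is not reproduced, and a P value for the F ratio would require an F-distribution survival function such as `scipy.stats.f.sf(F, df1, df2)`. The 2 × 2 table is assumed to be laid out as [[misclassified, correct] before, [misclassified, correct] after].

```python
import math
from math import comb


def within_case_mse(estimates):
    """estimates: dict case ID -> list of reader EFs. Returns (MSE_within, df)."""
    ss, n_reads = 0.0, 0
    for values in estimates.values():
        mean = sum(values) / len(values)
        ss += sum((v - mean) ** 2 for v in values)
        n_reads += len(values)
    df = n_reads - len(estimates)  # total reads minus number of cases
    return ss / df, df


def iov(estimates):
    """IOV as the square root of the within-case (error) MSE."""
    return math.sqrt(within_case_mse(estimates)[0])


def compare_iov(pre, post):
    """F ratio MSE_pre / MSE_post with numerator and denominator df."""
    mse_pre, df_pre = within_case_mse(pre)
    mse_post, df_post = within_case_mse(post)
    return mse_pre / mse_post, df_pre, df_post


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = 0.0
    for x in range(lo, hi + 1):
        px = prob(x)
        if px <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p += px
    return min(1.0, p)


# Hypothetical toy data: two cases, three readers each.
pre = {"c1": [0.30, 0.40, 0.50], "c2": [0.60, 0.60, 0.60]}
post = {"c1": [0.35, 0.40, 0.45], "c2": [0.60, 0.60, 0.60]}
print(iov(pre), iov(post), compare_iov(pre, post))
print(fisher_exact_two_sided(6, 2, 1, 4))
```

With the study’s design, the within-case error df would be 930 − 30 = 900 before and 620 − 20 = 600 after the intervention.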
Results
Cases and Participants
Among the 31 participants, 11 were sonographers and 20 were physicians. The median duration of experience among the sonographers was 3.5 years (interquartile range, 24.3 years; range, 1–32 years), while among the physicians it was 9.5 years (interquartile range, 19.5 years; range, 1–36 years). All sonographers were full-time clinical staff members who perform on average 8–10 echocardiographic studies per day and provide complete preliminary reports, including LVEF estimation. The physicians were all cardiologists with subspecialty echocardiography training (level II or III training).
The cases used before and after the intervention are summarized in Table 1. There were a total of 930 visual EF responses before the intervention (31 readers × 30 cases) and 620 after the intervention (31 readers × 20 cases). The EF range and image quality of the cases were similar at both time points, and equal numbers of cases at both time points involved contrast administration.
Table 1

| Preintervention group mean ± SD | Preintervention CMR EF | Preintervention study quality | Postintervention group mean ± SD | Postintervention CMR EF | Postintervention study quality |
|---|---|---|---|---|---|
| 0.16 ± 0.05 | 0.11 | 2 | 0.16 ± 0.04 | 0.11 ∗ | 3 |
| 0.15 ± 0.04 | 0.17 ∗ | 1 | 0.17 ± 0.04 | 0.18 | 2 |
| 0.23 ± 0.06 | 0.19 | 1 | 0.18 ± 0.05 | 0.23 ∗ | 2 |
| 0.24 ± 0.05 | 0.19 | 1 | 0.30 ± 0.08 | 0.26 ∗ | 2 |
| 0.28 ± 0.08 | 0.22 | 1 | 0.23 ± 0.06 | 0.31 | 3 |
| 0.20 ± 0.04 | 0.25 | 2 | 0.29 ± 0.06 | 0.33 | 2 |
| 0.12 ± 0.04 | 0.25 | 1 | 0.43 ± 0.08 | 0.36 | 2 |
| 0.18 ± 0.04 | 0.27 | 1 | 0.40 ± 0.06 | 0.37 ∗ | 3 |
| 0.37 ± 0.06 | 0.32 | 2 | 0.35 ± 0.08 | 0.42 | 2 |
| 0.33 ± 0.07 | 0.32 ∗ | 2 | 0.36 ± 0.05 | 0.43 | 1 |
| 0.26 ± 0.06 | 0.32 ∗ | 2 | 0.37 ± 0.06 | 0.44 | 1 |
| 0.44 ± 0.06 | 0.33 ∗ | 3 | 0.50 ± 0.06 | 0.46 | 2 |
| 0.36 ± 0.06 | 0.37 ∗ | 2 | 0.49 ± 0.05 | 0.51 | 2 |
| 0.27 ± 0.06 | 0.39 | 1 | 0.49 ± 0.07 | 0.53 ∗ | 2 |
| 0.45 ± 0.06 | 0.42 | 3 | 0.52 ± 0.05 | 0.56 ∗ | 2 |
| 0.41 ± 0.07 | 0.42 ∗ | 2 | 0.63 ± 0.05 | 0.57 ∗ | 2 |
| 0.42 ± 0.07 | 0.43 | 3 | 0.56 ± 0.05 | 0.60 | 2 |
| 0.43 ± 0.08 | 0.48 | 2 | 0.67 ± 0.04 | 0.63 ∗ | 2 |
| 0.39 ± 0.05 | 0.49 | 3 | 0.64 ± 0.04 | 0.66 | 2 |
| 0.49 ± 0.06 | 0.50 ∗ | 2 | 0.64 ± 0.04 | 0.68 | 1 |
| 0.52 ± 0.09 | 0.53 | 2 | | | |
| 0.57 ± 0.08 | 0.53 | 2 | | | |
| 0.49 ± 0.06 | 0.54 | 1 | | | |
| 0.56 ± 0.04 | 0.58 | 2 | | | |
| 0.61 ± 0.04 | 0.58 | 2 | | | |
| 0.55 ± 0.05 | 0.58 | 2 | | | |
| 0.49 ± 0.07 | 0.58 ∗ | 1 | | | |
| 0.51 ± 0.05 | 0.64 | 3 | | | |
| 0.62 ± 0.04 | 0.68 | 2 | | | |
| 0.57 ± 0.05 | 0.69 | 2 | | | |