To make no mistakes is not in the power of man; but from their errors and mistakes the wise and good learn wisdom for the future. —Plutarch (Greek historian and biographer, ad 46-120)
In the current era of sophisticated cardiac imaging, echocardiography remains a unique modality in several ways. Beyond the obvious biomedical and logistic considerations, echocardiography is also unique because the acquisition and interpretation of images are critically dependent on the human factor, or at least more so than with modalities that involve automated acquisition (computed tomography, magnetic resonance, radionuclide imaging). As such, echocardiography is bound to yield results that are less reproducible than those of other cardiac imaging modalities. On the other hand, it is the very same human factor that renders echocardiography the most versatile imaging modality in clinical practice. In fact, a major advantage of echocardiography is the possibility of rapid visual assessment of left ventricular wall motion abnormalities and global systolic function, an extremely useful tool in various clinical scenarios, but one that depends heavily on both the sonographer or physician who acquires the images and the echocardiographer who interprets them. It is exactly this reliance on the human factor that has spurred multiple studies on the reproducibility of visual assessment of segmental and global left ventricular function. And although it is hard for echocardiography to claim victory in the reproducibility wars (at least in statistical terms), visual assessment has notably withstood the test of time. The question is whether visual assessment dominates clinical practice because the visual approach is clinically adequate or merely because inertia is innate to human nature.
In this issue of JASE, Blondheim et al report their findings on the reproducibility of visual assessment of left ventricular wall motion abnormalities and ejection fraction in a dedicated, carefully designed study. In this investigation, classification of segmental wall motion was modestly consistent among 12 highly experienced echocardiographers, with classification of hypokinetic segments, not unexpectedly, demonstrating higher variability than that of normal or akinetic segments. Wall motion score index and ejection fraction exhibited better nominal reproducibility among readers. Surprisingly, despite the expertise of the readers involved, intraobserver variability was also considerable. Finally, the echocardiographic detection of the culprit lesion in the subset of patients with acute myocardial infarction was good for lesions in the left anterior descending coronary artery but only fair for lesions in the circumflex or right coronary artery. However, this part of the study needs to be interpreted with utmost caution, because wall motion abnormalities are the result of a complex interplay of multiple factors, not simply a function of angiographic disease alone, even in the case of a recent myocardial infarction.
Where do these findings leave us? Ultimately, is the visual interpretation of wall motion abnormalities and ejection fraction adequate for clinical decision making, or should we turn exclusively to digital quantification? To answer this question, it is important to differentiate between the methodologic and clinical perspectives of reproducibility analyses. In this context, several aspects of the study by Blondheim et al merit further discussion.
First, what constitutes acceptable reproducibility in medicine for any given method is determined by the clinical application of the corresponding results. For example, discovering left ventricular systolic dysfunction in a patient presenting with symptoms suggestive of heart failure would prompt treatment for heart failure and clinical workup to elucidate the cause of systolic dysfunction. If the originally estimated ejection fraction was 25%, and a second reader thought that the ejection fraction looked closer to 35%, it is unlikely that the course of action for this patient would change. Of note, this interobserver variability scenario lies in the extreme 5% to 10% of observations of Blondheim et al; for ejection fraction, the standard deviation of readings was 5.8%, effectively meaning that 95% of paired measures had <11.6% discrepancy. Most of the time, discrepancies were far below that level. In the same sense, demonstrating regional wall motion abnormalities in the setting of acute chest pain with ambiguous presentation would direct clinical management toward the acute coronary syndrome pathway. This decision would probably not be affected by having, for example, 4 segments classified as abnormal instead of 6; however, the same discrepancy would cause a considerable drop in reproducibility statistics. Thus, visual assessment is probably adequate when decisions rely mainly on qualitative assessment, exactly because quantitative estimates have only minor influence on decisions in such cases; this probably applies to the majority of clinical scenarios.
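The 11.6% figure quoted above is twice the reported standard deviation. The short Python sketch below reproduces that arithmetic. Purely for comparison, it also shows the limit that would follow under a different assumption: that 5.8% describes individual readings with independent, normally distributed errors, rather than the paired differences. Which reading applies depends on how the original study defined its standard deviation, which is not specified here; the function names are illustrative only.

```python
import math

SD_EF = 5.8  # standard deviation of ejection fraction readings quoted in the text (EF points)


def two_sd_margin(sd: float) -> float:
    """Rule of thumb used in the text: about 95% of values lie within 2 SD."""
    return 2 * sd


def paired_difference_limit(sd: float, z: float = 1.96) -> float:
    """95% limit for the difference between two readings.

    ASSUMPTION (not from the source): each reading carries an independent,
    normally distributed error with the given SD, so the difference between
    two readings has SD equal to sqrt(2) * sd.
    """
    return z * math.sqrt(2) * sd


print(round(two_sd_margin(SD_EF), 1))            # 11.6
print(round(paired_difference_limit(SD_EF), 1))  # 16.1
```

Under either reading, the 25% versus 35% example (a 10-point gap) sits near the edge of the expected range rather than in its middle, which is consistent with the point that most discrepancies are considerably smaller.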
From this point of view, the findings of Blondheim et al are reassuring: the reported margins of reproducibility imply that in the vast majority of patients, clinical decisions would be unlikely to change drastically as a result of interobserver variability. Thus, the use of visual assessment of left ventricular function in everyday clinical workflow is justified. In cases in which some form of quantitative documentation would be desirable (for example, the implantation of a defibrillator for primary prevention because of a severely impaired ejection fraction, the replacement of the mitral valve because of declining systolic function, or a decision for revascularization on the basis of the number of affected segments), the options of quantitative echocardiographic assessment or confirmation with alternative imaging modalities are always available. Again, clinical judgment should always prevail in such cases; thresholds are provided by guidelines as a general guide to clinicians, and numerical values alone should not be the decisive point for any decision.
Second, in the study of Blondheim et al, short-axis views were not considered during the interpretation process. In practice, short-axis views (which take advantage of the superior axial resolution of ultrasound and often provide clearer views of most myocardial segments) are always acquired and help substantially in the assessment of both segmental wall motion and global ventricular function. Most likely, the inclusion of short-axis views would have led to more consistent classification of segments. Along the same lines, contemporary echocardiography provides an array of options to improve visualization and aid decision making in challenging cases. For example, intravenous echocardiographic contrast can enhance the visualization of wall motion in patients with suboptimal acoustic windows; in fact, the use of contrast is known to improve the reproducibility of wall motion assessment and ejection fraction. Furthermore, the use of contemporary strain imaging applications can potentially facilitate the detection of segmental abnormalities in challenging cases (eg, in patients with global ventricular dysfunction or in those with paced rhythms or extreme heart rates). Therefore, when wall motion cannot be ascertained with confidence from standard views, the echocardiographer should always seek ways to improve visualization and the strength of evidence. However, it is important to stress that quantitative approaches, even the semiautomated or automated ones, do not guarantee better reproducibility than visual estimation. It is only constant quality improvement efforts and diligent teamwork that reduce interobserver and intraobserver variability in an echocardiography lab, not automation per se.
Third, all reproducibility statistics are highly dependent on the classification method and the case mix. For example, the higher reproducibility of wall motion score index and ejection fraction compared with that of segment classification should not come as a surprise, because continuous measures allow for finer degrees of classification. It is not uncommon for a “borderline” segment to be classified, for example, as hypokinetic by reader A and as normal by reader B, which would count as a hefty discrepancy for reproducibility calculations, exactly because there is no “intermediate” category to choose from. This situation is very much like being forced to classify an ejection fraction as either 60% or 40% when in fact it looks closer to 50%; these two options would effectively lead to the classification of systolic function into normal or abnormal, respectively, a critical difference. However, for ejection fraction, the options 45%, 50%, and 55% are always there, allowing for the fine-tuning of ratings and leading to better reproducibility values. Similarly, a discrepancy in the classification of a single left ventricular segment between readers would lead to minimal discrepancy in wall motion score index values, because the effect of this segment on overall wall motion score would be diluted by the concordance in the rest of the segments in this summary metric. This would be further exaggerated in populations with a very high proportion of normal left ventricles because of the low prevalence of abnormal segments; that is, the statistics would look great even if disagreement were substantial for the few abnormal segments. Therefore, both the metric used and the population being studied need to be taken into account when interpreting reproducibility results. Preferably, reproducibility studies should refer to populations naturally occurring in clinical practice for their results to be readily interpretable by clinicians.
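The dilution and case-mix effects described above can be made concrete with a small Python sketch. Everything in it is hypothetical and chosen only for illustration: the 16-segment layout, the scoring scheme (1 = normal, 2 = hypokinetic), the two readers' scores, and the helper names. It is not the method or the data of the study under discussion.

```python
# Hypothetical scoring for illustration only: 1 = normal, 2 = hypokinetic.
NORMAL = 1


def wmsi(scores):
    """Wall motion score index: mean of the per-segment scores."""
    return sum(scores) / len(scores)


def agreement(a, b):
    """Fraction of segments given the same score by both readers."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def agreement_on_abnormal(a, b):
    """Agreement restricted to segments that at least one reader called abnormal."""
    pairs = [(x, y) for x, y in zip(a, b) if x != NORMAL or y != NORMAL]
    if not pairs:
        raise ValueError("no segment was scored abnormal by either reader")
    return sum(x == y for x, y in pairs) / len(pairs)


# One patient, 16 segments (assumed model): the readers differ on a single segment.
reader_a = [2] + [1] * 15
reader_b = [1] * 16
print(wmsi(reader_a), wmsi(reader_b))  # 1.0625 1.0

# Mostly normal population: 320 segments, only 8 called abnormal by reader A,
# and reader B agrees on just 4 of those 8.
pop_a = [1] * 312 + [2] * 8
pop_b = [1] * 312 + [2] * 4 + [1] * 4
print(agreement(pop_a, pop_b))              # 0.9875
print(agreement_on_abnormal(pop_a, pop_b))  # 0.5
```

In this made-up example, a one-category disagreement on a single segment shifts the summary index by only 1/16, and an overall agreement close to 99% coexists with 50% agreement on the segments that matter clinically.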
Finally, the case of echocardiographic assessment of left ventricular function in clinical trials merits a special note. In trial settings, quantitative methods should always be preferred over semiquantitative or qualitative assessments (in this sense, the visual estimation of ejection fraction falls under “semiquantitative” because only a small number of distinct values can be assigned). The reason is simple and has little to do with reproducibility: as the metric gets coarser, the power to detect a treatment effect in a clinical trial or a clinically relevant association in an observational study diminishes. As an exaggerated example, it would be more difficult to detect an average improvement of the magnitude “from severely impaired to moderately impaired” in a phase I trial enrolling 30 patients than to detect an average 5% quantitative improvement in ejection fraction (at the same level of statistical confidence). In most trial scenarios, the gain in power will repay the additional resources (additional time for analysis, software modules, etc) required for quantitative assessment. The clinical perception of the trial results, however, would be heavily influenced by what is confidently detectable in clinical practice: it is unlikely that a treatment effect of, for example, 2% improvement in ejection fraction with a new agent or device would sound convincing to clinicians, no matter how statistically significant, exactly because such an effect is well within the reproducibility margins of all current imaging methods.
In summary, the advantages and disadvantages of the long-surviving visual assessment of left ventricular function depend on the application and on the experience of the interpreter. In clinical practice, the visual approach will be around for quite a while, because of its proven clinical usefulness and because the incremental value of quantitative information for decision making cannot justify the additional resources required for semiautomated or automated quantitative assessment of segmental and global left ventricular function, at least in the majority of cases. That said, echocardiographers should keep an open mind and use the full armamentarium of modern equipment in challenging cases or when quantitative documentation is important. To this end, only standardization, continuous education, and quality improvement (ie, teamwork) in the echocardiography lab can bring about the integration of newer echocardiographic modalities into clinical practice while ensuring consistency of results obtained with either traditional or digital approaches.