Accurate assessment of left ventricular (LV) size and function is among the most important requisites of clinical cardiology. Echocardiographic quantitation of LV volumes and ejection fraction (EF) is of paramount importance for the diagnosis, risk stratification, and treatment of cardiovascular disease, and, not surprisingly, it remains the most frequent indication for echocardiography. However, the question of which ultrasound method is most effective for quantitative LV analysis remains unsettled.
Although the modified Simpson’s biplane method has been widely adopted by clinicians and researchers, many sources of error have been demonstrated, and both its interobserver reproducibility and its test-retest reliability are rather unsatisfactory. Moreover, the time required to trace the endocardium manually in the four- and two-chamber views at both end-diastole and end-systole can become an issue at clinical or research core laboratories with high workloads. Three-dimensional (3D) echocardiography (3DE), being free from errors related to image plane position and geometric assumptions, is now emerging as a superior echocardiographic technique for LV quantitation compared with two-dimensional (2D) echocardiography (2DE). The adoption of 3DE for measuring LV volumes and EF offers significant advantages over 2D imaging in terms of accuracy, test-retest reproducibility, and outcome prediction. However, 2DE remains the most widely used echocardiographic technique in mainstream clinical practice.
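For readers less familiar with the arithmetic behind the biplane method, the computation can be sketched as follows. This is a minimal illustration, not clinical software; it assumes the usual convention of 20 disks with diameters measured in centimeters, and the function and variable names are our own.

```python
import math

def biplane_simpson_volume(diam_4ch, diam_2ch, length):
    """Modified Simpson's rule (biplane method of disks).

    The LV cavity is modeled as a stack of elliptical disks: the paired
    diameters (cm) of each disk come from the apical four- and
    two-chamber views, and `length` (cm) is the LV long axis.
    Returns the cavity volume in mL. Illustrative sketch only.
    """
    n = len(diam_4ch)
    disk_height = length / n
    return sum(math.pi * a * b / 4.0 * disk_height
               for a, b in zip(diam_4ch, diam_2ch))

def ejection_fraction(edv, esv):
    """EF (%) from end-diastolic and end-systolic volumes (mL)."""
    return (edv - esv) / edv * 100.0
```

The sketch makes the method's weaknesses concrete: every disk diameter depends on a manually traced border in a particular image plane, so tracing error, foreshortening, and plane positioning all propagate directly into the summed volume.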
An important milestone for LV quantitation by ultrasound was the recent development of endocardial border–tracking algorithms designed for both 2D and 3D echocardiographic data sets, which allow the detection of the LV endocardial surface in a semiautomated or fully automated manner. These tools have been developed to improve the reproducibility and minimize the effort and time required for measurements. Several research centers have embraced this new way to quantify the left ventricle, demonstrating the accuracy and reproducibility of such algorithms in (often selected) cohorts. However, in the everyday practice of most clinical echocardiography laboratories, a clear superiority of these algorithms must be demonstrated before they can be recommended for routine clinical implementation.
In this issue of JASE, Aurich et al. report their experience with semiautomated 2D and 3D software packages in experimental and clinical settings and their accuracy in comparison with the conventional Simpson’s biplane method and cardiac magnetic resonance (CMR) imaging in a sample of 47 patients with various cardiac pathologies and acoustic windows. First, they tested the standard Simpson’s method, semiautomated 3D echocardiographic software, and CMR imaging on water-filled latex balloons and found that 3DE had the best accuracy for in vitro volume measurements. Then, they tested both 2D and 3D echocardiographic software packages in patients and compared the semiautomated LV measurements with those provided by CMR imaging and by conventional 2DE with Simpson’s method. Although they found fairly good agreement of the semiautomated methods with CMR imaging and conventional 2DE (reflected by intertechnique correlations for LV EF >0.7), they reported considerable underestimation of LV volumes and EF by the semiautomated 2D and 3D echocardiographic software, with wide limits of agreement and unacceptable individual errors with respect to CMR imaging.
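The "bias" and "limits of agreement" language above refers to standard Bland-Altman analysis, which can be sketched in a few lines. The paired values in the usage example are hypothetical, not data from the study.

```python
import statistics

def bland_altman(method_a, method_b):
    """Bias and 95% limits of agreement between two measurement methods.

    Paired inputs could be, e.g., EF by semiautomated echocardiography
    and EF by CMR imaging for the same patients. The limits of
    agreement are bias +/- 1.96 SD of the paired differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```

For example, `bland_altman([50, 55, 60, 45], [55, 60, 63, 50])` gives a bias of -4.5 EF points with limits of agreement of about -6.5 to -2.5: a systematic underestimation, analogous in kind (not magnitude) to the echocardiography-versus-CMR bias discussed here.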
A relevant aspect of this investigation in comparison with previous studies is that the investigators intentionally did not exclude patients with suboptimal image quality, aiming to reproduce a “real-life” clinical setting. Although this approach allowed them to avoid the typical selection bias related to image quality, they might well have introduced a different selection bias by enrolling patients with clinical indications for evaluation by CMR imaging rather than consecutive patients referred for echocardiographic studies. However, the results of this study regarding the accuracy of semiautomated software in such a clinical setting are certainly interesting and relevant to the echocardiography community, and they allow us to make a few observations.
Role of Image Quality in Endocardial Delineation
Among the most important benefits of echocardiography is the ability to measure key parameters, such as LV EF. In daily clinical practice, this capability cannot be taken for granted, because it depends heavily on image quality. A clear delineation of the blood pool–endocardial interface is of paramount importance for accurate and reproducible quantitation by echocardiography. Yet the physics of echocardiographic imaging (apical views rely on lateral and/or azimuthal rather than axial ultrasound resolution; lung interference; dropout or rib artifacts; etc.) and the irregular nature of this interface (consisting of interdigitated trabeculae and columnae carneae) can make this delineation particularly challenging in routine practice.
In the study by Aurich et al., the main culprits for LV volume underestimation by both semiautomated 2D and 3D echocardiographic algorithms compared with CMR imaging were (1) poor endocardial boundary definition (“45% of patients with more than 3 segments not visible by conventional 2D echocardiography”), requiring manual interpolation of nonvisible parts, and (2) the inclusion of all quantitative measurements in the statistical analysis, irrespective of the image quality of the data sets from which they had been obtained (“volume and EF calculation was possible in all included subjects”). The considerable bias of echocardiography versus CMR imaging reported by Aurich et al. reflects the large extent to which we tend to rely on our ability to extrapolate the LV boundary when it cannot be clearly visualized. The concept “if you can’t see it clearly, then you can’t trace it accurately” is once again reinforced and seems pertinent for all imaging modalities. Indeed, the gold standard, CMR imaging, showed low accuracy in measuring latex balloon volume in vitro, because the images could not demonstrate the balloon-water interface clearly enough. On the other hand, the 3D echocardiographic algorithm was less accurate in patients than in balloons, suggesting once more that in vivo situations rarely resemble the in vitro settings used for the validation of these algorithms. Of note, border-tracing error is the only source of error in 2DE that has not been solved by introducing 3DE.
Sources of Error in Endocardial Border Identification
Manual tracing of the LV endocardium introduces considerable latitude for echocardiographers to include varying amounts of endocardial trabeculae and papillary muscles as part of the LV cavity. This results in large measurement variability when parts of the LV cavity borders are difficult to recognize or completely missing. As a consequence, we have become used to interpreting serial changes in LV EF detected by 2DE with a significant amount of tolerance and to being cautious before concluding that a 5% to 10% decrease in EF represents a true worsening in LV performance rather than random measurement variability. We know to expect slightly lower volumes with 3DE than with CMR imaging, mainly because the lower spatial resolution (image quality) of 3DE does not allow the human eye to make a reliable discrimination between the trabecular and the compacted layers of the myocardium. Finally, we have also learned to tolerate greater underestimation of LV volumes by 2DE, attributed to apical foreshortening, and to interpret LV volumes against specific reference ranges for each imaging modality.
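The tolerance described above, deciding whether a 5% to 10% EF change exceeds measurement noise, can be formalized with the test-retest repeatability coefficient. A minimal sketch, assuming paired test-retest EF measurements; the data and names are hypothetical:

```python
import statistics

def repeatability_coefficient(test, retest):
    """Smallest serial change distinguishable from test-retest noise.

    The SD of paired test-retest differences equals sqrt(2) times the
    within-subject SD, so the repeatability coefficient reduces to
    1.96 * SD(differences): with ~95% confidence, a serial EF change
    smaller than this cannot be separated from measurement variability.
    """
    diffs = [a - b for a, b in zip(test, retest)]
    return 1.96 * statistics.stdev(diffs)
```

If a laboratory's own test-retest data yield a coefficient of, say, 8 EF points, then a measured 5-point drop in EF on a follow-up study is within measurement noise, which is exactly the caution the editorial recommends.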
Quantitative approaches based on semiautomated or fully automated computer algorithms for LV endocardial border identification have been developed with the hope of reducing human error and the time required to analyze 2D or 3D echocardiographic data sets. Recent technological advances in acquisition have improved the spatial and temporal resolution of 3DE data sets, stimulating the development of high-performing border identification algorithms able to handle the increasing amount of data available from a single cardiac cycle. Although computer-aided LV quantitation is generally deemed a faster, more reproducible, and reliable alternative to standard manual methods, this should not be taken for granted in any circumstance. Paraphrasing the pioneer of portable computers, Adam Osborne, computers are not meant to prevent humans from making mistakes, but “with computers you only make mistakes faster.”
In general, automated algorithms are very attractive, because they are effective at repetitive tasks and extremely predictable. Indeed, automated algorithms used for LV quantification show excellent reproducibility when applied to the same data sets. However, in unusual circumstances, an algorithm cannot react as well as a human operator. When LV boundaries have a different shape than predicted (mitral valve flail, LV aneurysm, hourglass-shaped cavity in hypertrophic cardiomyopathy, etc.) or when they are poorly defined (low signal-to-noise ratio, dropout, etc.), the automated contouring algorithm performs poorly. In addition, automated algorithms using grayscale or gradient information do not make a reliable distinction between papillary muscles and myocardium, because they have the same echogenicity. In these situations, input from the human operator is needed. On the other hand, humans are highly unpredictable and cannot perform repetitive tasks as well and as fast as an automated computer-based algorithm. Although computer-aided quantitative methods for LV quantitation by 2DE and 3DE have considerably improved over the years, they cannot be the answer to the problems of manual methods when images are of poor quality. This concept is highlighted in the study by Aurich et al., who report that semiautomated LV measurements were indeed more closely correlated with those by CMR imaging in patients with good-quality images than in patients with suboptimal images. Furthermore, as expected, the use of these semiautomated tools did not show any benefit in comparison with standard 2D echocardiographic analysis when the images were suboptimal.
The importance of image quality in echocardiography has been established in studies showing the benefit of contrast enhancement for measuring LV volumes and function in difficult-to-image patients. We could certainly argue that the scenario of Aurich et al.’s study may not exactly reflect the “real life” of many clinical echocardiography laboratories, because a significant proportion of the enrolled patients met accepted criteria for the administration of an echocardiographic contrast agent, which would arguably have been preferable to applying semiautomated border-detection algorithms to native images of limited quality. However, this “imperfect” scenario set by the investigators is extremely useful for understanding the potential risks one could take by using semiautomated algorithms improperly.
A frequent misconception is that high-performing, fully automated software should necessarily lead to a superior outcome compared with semiautomated software, because the latter is still affected by the unpredictability and the subjectivity of the human operator who is required to set landmarks or trace borders. This belief currently fuels extensive research by engineers and investment from industry to bring these algorithms one step closer to fully automated analysis. Indeed, with fully automated algorithms, operators are required only to select the data set and press a couple of buttons to trigger the automated analysis (with or without alignment for 3D data sets), which is completed entirely by the software. But because the operators are not forced to remain involved in the control of the algorithm, they are at risk for a false sense of self-satisfaction, and ultimately they may produce more errors than if actively intervening in the computer-aided analysis (a condition called “operator’s dropout”). In addition, fully automated LV analysis, despite being faster than semiautomated analysis, may lead to a considerable underestimation of LV volumes, especially if image quality is suboptimal (Figure 1).