12 Digital Image Processing and Automated Image Analysis in Echocardiography
Digital Image Processing and Automated Analysis: Definition and Motivation
In clinical practice, quantitative measurements are still underemployed: visual estimation of parameters such as ejection fraction, or semiquantitative classification (e.g., wall motion scoring for stress echocardiography), still plays a dominant role. Such eyeballing is fast and requires little effort, and some experts achieve admirable accuracy. In general, however, it is inaccurate, irreproducible, subjective, and hard to learn.1 Visual estimation of quantifiable measures should be discouraged for any purpose beyond rough classification whenever a quantitative alternative is available. Quantitative analysis is advisable when interpretations are performed repetitively, when more subtle differences are sought, when interpretation experience is limited, and whenever scientific research is the goal.
Procedures such as stress echocardiography that currently rely on visual scoring of wall motion could benefit enormously from automated analysis; the lack of quantification and the large interobserver, intraobserver, and interinstitution variabilities2 are perceived as important limitations. The potential of real-time 3D echocardiography for volume estimation and regional wall motion quantification also depends largely on automated analysis.
Digital Image Storage, Communication, and Compression
Digital Images
Inside any modern ultrasound system, images are created as digital images. The generation of the ultrasound images (ultrasound physics, signal processing, and instrumentation) is beyond the scope of this chapter; excellent descriptions can be found in many handbooks.3–5 Digital images consist of pixels whose brightness or color is represented by a numeric (digital) value. Brightness level is also referred to as intensity or gray value. A cineloop or movie is a sequence of such images, typically at a frame rate of 20 to 200 images per second. The digital representation makes it possible to store and process images in a computer; hence the term digital image processing. Modern ultrasound systems are fully digital and support the storage and communication of digital images and cineloops. For display on a monitor and recording on VCR, these digital images are converted into an analog video signal. The use of analog video output or VCR tape should be strongly discouraged for analysis purposes. Although it is possible to redigitize the analog video with devices such as frame grabbers, this results in severe, irreversible loss of information and image quality: spatial resolution, frame rate, and intensity accuracy are degraded; the separation among image, graphics, and color overlays is lost; and calibration and patient information disappear.
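To make the pixel representation concrete, the following minimal sketch (in Python with NumPy; the chapter prescribes no particular software, so this choice is an assumption) builds a grayscale image as a two-dimensional array of brightness values and a cineloop as a stack of such images.

    import numpy as np

    # A digital grayscale image: a 2D array of pixels, each holding a
    # numeric brightness (gray) value; 8-bit storage gives 256 levels (0-255).
    rows, cols = 480, 640
    image = np.zeros((rows, cols), dtype=np.uint8)
    image[200:280, 300:340] = 180          # a brighter rectangular region

    # A cineloop: a time sequence of such images. At 50 images per second
    # (within the 20 to 200 range mentioned above), consecutive frames
    # are 20 ms apart.
    frame_rate = 50.0
    cineloop = np.zeros((50, rows, cols), dtype=np.uint8)
    print(f"frame interval: {1000.0 / frame_rate:.1f} ms")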
Storage Formats and Image Communication
DICOM
The method of choice for digital image storage and exchange is DICOM (Digital Imaging and Communications in Medicine6). DICOM is a generally accepted international standard for medical images of all modalities, including all types of ultrasound imaging. The DICOM standard (current version: 2009) is still being extended and improved to better support new developments in medical imaging. As its name implies, DICOM is a communication standard rather than a file format: it defines how medical imaging devices such as ultrasound systems, Picture Archiving and Communication Systems (PACS) servers, and printers communicate to transport, store, retrieve, find, or print images and associated patient information. All major manufacturers have committed themselves to supporting DICOM. Ultimately, this should lead to the integrated electronic patient record, which contains the full patient file, including patient history, laboratory reports, images of all modalities, and other information. As a result of this broad scope, DICOM is a very complicated standard: the full description covers several thousand pages.6 A very readable explanation of DICOM for echocardiographers is given by Thomas.7
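As an illustration, the open-source pydicom library can read a DICOM object exported by an ultrasound system. This is a minimal sketch rather than a complete DICOM workflow, and the file name is hypothetical.

    import pydicom

    # A DICOM object bundles the image data with patient and study
    # information in a single file (the file name here is hypothetical).
    ds = pydicom.dcmread("echo_study.dcm")

    print(ds.PatientName)                  # patient information
    print(ds.Modality)                     # "US" for ultrasound
    print(ds.Rows, ds.Columns)             # image dimensions in pixels
    n_frames = int(getattr(ds, "NumberOfFrames", 1))  # >1 for a cineloop
    pixels = ds.pixel_array                # pixel data as a NumPy array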
Image Compression
To reduce data storage requirements, image compression can be employed. Lossless compression techniques (such as run-length encoding [RLE] and lossless JPEG) can reduce file sizes by a factor of 2 to 5, and decompression produces a perfect copy of the original image. Lossy compression achieves much higher compression ratios (up to 20 to 100) by eliminating information to which the eye is least sensitive, at the cost of some irreversible image degradation. This degradation is visually acceptable (JPEG compression by a factor of 20 has been found to produce no diagnostically significant degradation8) and is marginal compared with the degradation associated with VCR storage. However, the compression artifacts may certainly influence digital image processing and analysis. Severely lossy compression is not advisable for archiving or when digital image postprocessing is foreseen. Lossy compression techniques include lossy JPEG, fractal and wavelet compression, and MPEG. DICOM currently6 supports the RLE, JPEG (lossless and lossy), JPEG2000, and MPEG2 compression schemes.
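The principle behind lossless compression is easy to demonstrate with run-length encoding. The toy sketch below illustrates the idea only; it is not the RLE codec defined by the DICOM standard.

    def rle_encode(values):
        """Collapse a pixel sequence into (value, run length) pairs."""
        runs = []
        for v in values:
            if runs and runs[-1][0] == v:
                runs[-1][1] += 1           # extend the current run
            else:
                runs.append([v, 1])        # start a new run
        return runs

    def rle_decode(runs):
        """Expand the pairs again; decoding is lossless by construction."""
        out = []
        for v, n in runs:
            out.extend([v] * n)
        return out

    row = [0, 0, 0, 180, 180, 180, 180, 0, 0]  # one image row
    assert rle_decode(rle_encode(row)) == row  # perfect reconstruction

Long runs of identical values compress well; the speckle-rich content of ultrasound images contains few such runs, which is one reason lossless gains remain in the modest range of 2 to 5 mentioned above.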
Medical Image Processing
Medical image processing is a thriving subdiscipline of digital image processing.9,10 Several good handbooks on medical image processing, with special attention to ultrasound, are available.11,12
Image Enhancement: Level Manipulations, Filtering
Brightness level manipulations include all one-to-one conversions of image brightness levels (input) to display brightness levels (output), either linear or nonlinear. Examples are digital contrast/brightness adjustments, image inversion, and gamma correction. Some examples are given in Figure 12-1. Note that many level manipulations may result in clipping (see Fig. 12-1, C, D, and E) and in a reduction of the number of brightness levels effectively used. The extreme example is thresholding (see Fig. 12-1, E), in which all brightness levels above a threshold are set to white and all below to black.
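The following minimal NumPy sketch (an illustration, not any vendor's implementation) expresses four such one-to-one mappings for an 8-bit image: linear contrast/brightness adjustment with clipping, inversion, gamma correction, and thresholding.

    import numpy as np

    def adjust(img, gain, offset):
        # Linear contrast/brightness mapping; values outside 0-255 are
        # clipped, reducing the number of brightness levels effectively used.
        out = gain * img.astype(np.float32) + offset
        return np.clip(out, 0, 255).astype(np.uint8)

    def invert(img):
        return 255 - img                   # image inversion

    def gamma_correct(img, gamma):
        # Nonlinear one-to-one mapping of input to display brightness.
        return (255.0 * (img / 255.0) ** gamma).astype(np.uint8)

    def threshold(img, t):
        # The extreme case: all levels above t become white, the rest black.
        return np.where(img > t, 255, 0).astype(np.uint8)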
Pseudocoloring involves a direct conversion of brightness levels to a color scale, generally labeled with names such as Rainbow, Ocean, or Harvest. Because the eye is more sensitive to color differences than to intensity differences, this may reveal subtle intensity differences. It can be visually pleasing but may also be highly suggestive, as it clusters similar gray values into color groups. Because brightness levels in ultrasound are highly dependent on signal attenuation and local gain settings, the borders suggested visually by these colors have little practical significance.13 Pseudocoloring is also sometimes applied to highlight, with color, brightness differences with respect to some baseline value (e.g., an increase above the local brightness level in a baseline image), for example to visualize the arrival of contrast agents in perfusion imaging. This is an effective tool, but one should be aware that tissue motion may also induce a brightness change and show up as color.
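Pseudocoloring amounts to passing each gray value through a color lookup table. The sketch below uses an invented rainbow-like ramp; the vendor color scales named above are proprietary, so this particular mapping is purely illustrative.

    import numpy as np

    # A 256-entry lookup table (LUT) assigns an RGB color to each gray value.
    g = np.arange(256, dtype=np.float32) / 255.0
    lut = np.stack([
        np.clip(2 * g - 1, 0, 1),          # red grows in the bright half
        1 - np.abs(2 * g - 1),             # green peaks at mid-gray
        np.clip(1 - 2 * g, 0, 1),          # blue dominates the dark half
    ], axis=1)

    def pseudocolor(img):
        """Map an 8-bit grayscale image to RGB through the lookup table."""
        return (255 * lut[img]).astype(np.uint8)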
Image Interpretation: The Interpretation Pyramid
The interpretation of medical images is an extremely complicated task that is very hard to transfer to a computer. For us humans, vision is a natural task that we perform instantly and automatically. From the study of human perception, however, we know that vision is anything but a simple, straightforward process. Think of the many well-known optical illusions: there is a lot of hidden interpretation going on. In the interpretation of images, several levels of information abstraction can be distinguished. This is generally known as the image interpretation pyramid (Fig. 12-2). The levels of this pyramid give us more insight into the mechanisms of different automated techniques and their limitations. A good analogy is found in the interpretation of handwriting or spoken language; this analogy is described in Table 12-1. For interpretation of a written text, one has to know about the alphabet, spelling, vocabulary, syntax, and semantics, and ultimately about the subject of the text, the intentions of the source, and adornments such as humor, sarcasm, and metaphor. These last aspects are not about language; they refer to the real-world domain that the text is discussing. Interpretation is not a simple bottom-up process of combining letters into words into sentences into significance. Text can be fragmented, and there are imperfections such as misspellings and ambiguities, as well as missing domain knowledge, that necessitate feedback among all levels, and even guessing, to arrive at a consistent interpretation.
Cardiac Image Interpretation
Automated methods for cardiac image interpretation cannot yet emulate the higher levels of this interpretation pyramid. In practice, they cope with this complexity in three ways:
1. They use simplifying assumptions regarding the objects. For example, the LV is considered to be a dark, round object in the middle of the image; the endocardial contour is convex; the endocardium is the strongest edge in the image; and the cardiac wall will not move more than x pixels per frame. Most such assumptions hold only to some extent or are overly general (a deliberately naive sketch of such assumption-driven detection follows this list).
2. They limit themselves to a subset of the problem domain, such as certain standard views (e.g., only short-axis at midpapillary level), image quality (no dropouts, low noise), anatomy (e.g., no congenital defects), or imaging equipment type or settings (scale, gain, frequency).
3. They require the user to handle the high-level aspects by initializing, guiding, or correcting the system.
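As announced in item 1, the following deliberately naive sketch shows how such simplifying assumptions translate into a detection procedure: starting from a user-supplied cavity center, it searches along each radial ray for the strongest dark-to-bright transition and labels that point the endocardial border. It represents no published ABD algorithm, and it fails exactly where its assumptions fail (dropouts, artifacts, a noncentral or noncircular cavity).

    import numpy as np

    def radial_edge_search(img, center, n_rays=64, max_radius=100):
        """Naive border search: assume a dark cavity around a known center
        and take the strongest dark-to-bright step on each ray as the
        endocardial edge."""
        cy, cx = center
        contour = []
        for k in range(n_rays):
            angle = 2.0 * np.pi * k / n_rays
            radii = np.arange(1, max_radius)
            ys = np.clip((cy + radii * np.sin(angle)).astype(int),
                         0, img.shape[0] - 1)
            xs = np.clip((cx + radii * np.cos(angle)).astype(int),
                         0, img.shape[1] - 1)
            profile = img[ys, xs].astype(np.float32)
            r = int(np.argmax(np.diff(profile)))    # largest brightness step
            contour.append((ys[r + 1], xs[r + 1]))  # (row, column) of edge
        return contour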
Rules for a Well-Behaved Automated Border Detection Method
1. The method should generate “correct” contours. Because this judgment may be subjective (in the light of multiple possible interpretations), a system should preferably be able to adapt to the expert user’s general ideas about correct contours.
2. The contours should be reproducible. This seems obvious for an automatic system, but almost all systems require some type of user interaction (parameter choices, indication of a starting point or region, corrections), which will lead to some variability in results. This interobserver and intraobserver variability should preferably be smaller than the interobserver and intraobserver variabilities associated with similar manual work.
3. The method should be user friendly. It should engage the user only for high-level expert decisions, not for handling “stupid” mistakes or for making repetitive corrections.
Automated Border Detection in Echocardiography
Problems and Pitfalls of Border Detection in Echography
1. Pixel intensity does not directly reflect any physical property of the tissue visualized, in contrast to the Lambert-Beer law for radiography or the Hounsfield units for computed tomography. In ultrasound, images are formed by sound reflection and scattering, resulting in a combination of interference patterns (ultrasound speckle patterns) and reflections at tissue transitions or inhomogeneities. Different tissues are often distinguishable only by subtle differences in texture (speckle patterns) or by the coherent behavior of this texture over time, rather than by different intensity values.
2. Ultrasonographic image information is very anisotropic and position-dependent. Reflection intensity, lateral and radial point-spread functions, and signal-to-noise ratio are strongly dependent on both the depth and the angle of incidence of the ultrasound beam, as well as on the user-controlled time gain compensation settings.
3. Image disturbances (artifacts) are caused by factors such as side lobes, reverberations, clutter, and aberration. Many of these problems are especially prominent with high gain settings, which are often necessary in obese or older patients.
4. Parts of the anatomy are not imaged, because of dropouts (for structures parallel to the ultrasound beam), shadowing (behind acoustically impermeable structures such as bone or lung), scan sector limitations, and limited echocardiographic windows. Still-frame images generally miss some information; the human eye compensates for this when viewing a sequence of images. It resolves ambiguities and interpolates the missing parts by exploiting the temporal coherence of structures and speckle, which allows discrimination among noise, artifacts, and anatomy.
5. In specific cases (especially in 3D imaging) the limited temporal resolution and the scanning process may introduce artifacts. The sequential scanning of lines combines information from different moments into one image. For quickly moving structures, this may lead to spatial distortion. Sharp transitions between “older” and “newer” image parts may appear. This is particularly prominent in real-time 3D ultrasound, where information from different heartbeats is stitched together to include a complete object (such as the LV).
6. In 2D ultrasound, the exact spatial localization of the cross-sectional plane is generally not known. This is in contrast to 3D techniques (3D ultrasound, magnetic resonance imaging, or computed tomography), where the 3D context is known, and this information is often employed in model positioning for detection. In 2D cardiac ultrasound, the choice of the imaged cross section depends both on the skill and precision of the sonographer and on the available echocardiographic window, which is limited by ribs, lungs, and so on. Apart from volume measurement errors, this may also result in detection problems if the ABD method relies on assumptions about shape and the presence or absence of other structures such as valves or papillary muscles.
Practical Considerations for Automated Border Detection
Practical considerations for appropriate border detection (either automatic or manual) are listed in Box 12-1, subdivided into three categories. A few explanations follow.
Box 12-1 Practical Considerations for Automated Border Detection