Reliability of Visual Assessment of Global and Segmental Left Ventricular Function: A Multicenter Study by the Israeli Echocardiography Research Group


The purpose of this multicenter study was to determine the reliability of visual assessments of segmental wall motion (WM) abnormalities and global left ventricular function among highly experienced echocardiographers using contemporary echocardiographic technology in patients with a variety of cardiac conditions.


The reliability of visual determinations of left ventricular WM and global function was calculated from assessments made by 12 experienced echocardiographers on 105 echocardiograms recorded using contemporary echocardiographic equipment. Ten studies were reread independently to determine intraobserver reliability.


Interobserver reliability for visual differentiation between normal, hypokinetic, and akinetic segments had an intraclass correlation coefficient of 0.70. The intraclass correlation coefficient for dichotomizing segments into normal versus abnormal was 0.63, for hypokinetic versus other scores was 0.26, and for akinetic versus other scores was 0.58. Similar results were found for intraobserver reliability. Interobserver reliability was 0.84 for the WM score index and 0.78 for left ventricular ejection fraction, and similar values were obtained for intraobserver reliability. Compared with angiographic data, the accuracy of segmental WM assessments was 85%, and the culprit artery was correctly determined in 59% of patients with myocardial infarction.


Among experienced readers using contemporary echocardiographic equipment, interobserver and intraobserver reliability was moderate to good for the visual classification of normal and akinetic segments but poor for hypokinetic segments. Reliability was good for the visual assessment of global left ventricular function by WM score index and ejection fraction.

The accurate evaluation of segmental left ventricular (LV) wall motion (WM) abnormalities is of paramount importance in the interpretation of echocardiograms. The detection of abnormally contracting segments facilitates the diagnosis of myocardial damage, the distribution of the abnormal segments suggests the coronary artery involved, the number of abnormal segments indicates the extent of myocardial damage, and the severity of the contraction abnormalities bears on myocardial viability. The evaluation of WM abnormalities relies primarily on visual assessment of the displacement and thickening of LV myocardial segments and on the classification of each segment according to the severity of the detected abnormalities. Visual estimation is also used to evaluate LV ejection fraction (LVEF), a measure of systolic function with important prognostic and therapeutic implications. Recent advances in echocardiographic technology (eg, second harmonic imaging and new transducers) have enhanced endocardial visualization and improved the detection of WM abnormalities.

The purpose of this multicenter study was to determine the reliability and variability of the visual assessment of segmental WM abnormalities and global LV function among highly experienced echocardiographers using contemporary echocardiographic technology in patients with a variety of cardiac conditions.



Echocardiograms of 105 patients (mean age, 59 ± 14 years; 68% men) were reviewed for this study. Ninety patients hospitalized for chest pain (mean age, 57 ± 14 years; 68% men) were catheterized and underwent standard echocardiographic studies within 48 hours of admission. Of these, 62 patients (mean age, 59.2 ± 13.1 years; 75% men) met electrocardiographic and enzymatic criteria for acute myocardial infarction (MI; 58 with single-vessel disease), and in 28 patients (mean age, 53.4 ± 15.7 years; 54% men) ischemic disease was excluded on the basis of history, nonspecific electrocardiographic changes, absence of cardiac enzyme elevation, and negative findings on coronary angiography (performed in all but 3 patients). Echocardiograms of an additional 15 patients with known nonischemic dilated cardiomyopathy (mean age, 70.5 ± 10.9 years; 60% men) were also evaluated.

Analysis by Experienced Physicians

Twelve physicians with extensive experience in reading echocardiograms, for whom echocardiographic interpretation is their main daily activity, participated in this study. Eight of them are heads of echocardiography laboratories or noninvasive units, and together they represent 9 major cardiology centers. Each reader received compact discs with the studies of the 105 patients for blinded assessment. For each patient, 2-dimensional clips of a single cardiac cycle (to ensure that all readers assessed the same cycle) recorded from the 4-chamber, 2-chamber, and apical long-axis views were provided. All echocardiograms were recorded using a Vivid 7 digital ultrasound scanner (GE Vingmed Ultrasound AS, Horten, Norway). The readers were instructed to assess WM abnormalities and LV function as they would in their routine clinical work, so that the study would reflect real-world practice. Segments considered inadequate for analysis (“unreadable”) were not scored. The analysis was performed on dedicated workstations running EchoPAC software (GE Vingmed Ultrasound AS). The left ventricle was automatically divided into 18 segments (6 basal, 6 middle, and 6 apical) to facilitate comparison among readers, and the readers scored each segment (1 = normal, 2 = hypokinetic, 3 = akinetic, 4 = dyskinetic, 5 = aneurysmatic). Because dyskinetic and aneurysmatic segments were few, they were grouped with the akinetic segments in the analysis. All scores were automatically stored in an Excel file (Microsoft Corporation, Redmond, WA) that was later sent to a central laboratory for statistical analysis.
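The grouping of scores described above can be expressed as a small recoding step. The sketch below is an illustration, not the study's software: it assumes unreadable segments are stored as `None`, collapses dyskinetic (4) and aneurysmatic (5) scores into the akinetic category (3), and adds a hypothetical helper `dichotomize` for the category-versus-other binary variables used in the statistical analysis.

```python
from typing import Optional

# Reader scores as defined in the text:
# 1 = normal, 2 = hypokinetic, 3 = akinetic, 4 = dyskinetic, 5 = aneurysmatic.
# Assumption of this sketch: an unreadable segment is represented as None.


def to_analysis_category(score: Optional[int]) -> Optional[int]:
    """Collapse the 5-point score into normal (1), hypokinetic (2), akinetic or worse (3)."""
    if score is None:
        return None  # unreadable segments stay missing
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"invalid wall motion score: {score}")
    return min(score, 3)  # 4 and 5 are grouped with akinetic segments


def dichotomize(category: Optional[int], target: int) -> Optional[int]:
    """Hypothetical helper: 1 if the segment is in `target`, 0 otherwise (target vs other)."""
    return None if category is None else int(category == target)


# Usage: recode one reader's scores, then build the "akinetic vs other" variable.
recoded = [to_analysis_category(s) for s in [1, 2, 3, 4, 5, None]]
akinetic_vs_other = [dichotomize(c, target=3) for c in recoded]
```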

Estimation of Systolic LV Function

For each patient, each reader calculated a WM score index (WMSI) as the average score of all readable segments. Each reader also visually estimated the LVEF of each patient solely on the basis of the apical views, and the mean and standard deviation of the 12 readers' estimates were assigned to each patient. WMSI and LVEF assessments were repeated in 10 patients by 11 of the readers.
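As a minimal sketch, assuming a reader's segmental scores for one patient are held in a list with `None` for unreadable segments, the WMSI is the mean of the readable scores, and the per-patient LVEF summary is the mean and sample standard deviation of the readers' estimates. The function names are hypothetical, and because the text does not state whether the original 5-point or the grouped scores entered the index, the function simply averages whatever scores it is given.

```python
import statistics
from typing import Optional, Sequence


def wall_motion_score_index(scores: Sequence[Optional[int]]) -> float:
    """Average score of all readable segments (None marks an unreadable segment)."""
    readable = [s for s in scores if s is not None]
    if not readable:
        raise ValueError("no readable segments")
    return sum(readable) / len(readable)


def summarize_lvef(estimates: Sequence[float]) -> tuple[float, float]:
    """Mean and sample SD of one patient's visual LVEF estimates across readers."""
    return statistics.mean(estimates), statistics.stdev(estimates)


# Usage: 17 readable segments (15 normal, 1 hypokinetic, 1 akinetic), 1 unreadable.
wmsi = wall_motion_score_index([1] * 15 + [2, 3, None])  # 20 / 17 ≈ 1.18
```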

Relation of WM Abnormalities to the Culprit Artery

We assessed the ability of experienced readers to detect segmental WM abnormalities correctly by comparing their findings with angiographic data from the catheterized patients. A WM abnormality was considered correctly identified when the segment was located in a territory supplied by the culprit artery and incorrectly identified when it was located in a territory supplied by a normal artery. Sensitivity, specificity, positive and negative predictive values, and accuracy were calculated for each reader (the readers' means, standard deviations, and ranges are reported) and for each segment (segments were classified according to the score assigned to them by the majority of readers).

The ability of each reader to predict a patient's angiographically determined culprit artery was also assessed. The artery supplying the territory that contained the largest number of segments with WM abnormalities was considered the culprit artery. By this method, the culprit artery could not be determined in patients who had no WM abnormalities or an equal number of abnormal segments in >1 coronary territory. The averages, standard deviations, and ranges of successful culprit artery determinations by the individual readers are presented, as well as the rate of correct identification using the segmental majority score.
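The culprit-artery rule above amounts to a plurality vote over coronary territories, with no prediction when there are no abnormal segments or when the leading territories tie. The sketch assumes the caller supplies a segment-to-artery mapping; `TOY_TERRITORY` is a hypothetical 4-segment example for illustration only, not the modified American Society of Echocardiography scheme used in the study.

```python
from collections import Counter
from typing import Mapping, Optional, Sequence

# Hypothetical toy mapping (segment number -> supplying artery), illustration only.
TOY_TERRITORY = {1: "LAD", 2: "LAD", 3: "LCx", 4: "RCA"}


def predicted_culprit(abnormal_segments: Sequence[int],
                      territory: Mapping[int, str]) -> Optional[str]:
    """Artery whose territory holds the most abnormal segments; None if none or tied."""
    counts = Counter(territory[s] for s in abnormal_segments)
    if not counts:
        return None  # no WM abnormalities: culprit cannot be determined
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # equal counts in >1 territory: undetermined
    return ranked[0][0]


# Usage: two abnormal LAD segments and one LCx segment point to the LAD.
culprit = predicted_culprit([1, 2, 3], TOY_TERRITORY)
```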

For the purpose of these analyses, we adopted the American Society of Echocardiography’s classification of territories supplied by each coronary artery with necessary modifications.

Statistical Analysis

Data were analyzed using MATLAB (The MathWorks Inc, Natick, MA) and SAS version 9.1 (SAS Institute, Inc, Cary, NC). Categorical data are presented as counts and percentages and continuous data as mean ± SD. The intraclass correlation coefficient (ICC) was used as an index of interobserver and intraobserver reliability. Interobserver reliability was assessed by fitting several random-effects models with the SAS PROC MIXED procedure, in which the score was modeled with physician and segment within subject entered as random effects. The score was first modeled as an ordinal variable by classifying WM abnormalities into a trichotomous variable (normal = 1, hypokinetic = 2, and akinetic or worse = 3) and then as 3 separate binary variables (normal vs other, hypokinetic vs other, and akinetic or worse vs other). Unreadable segments were treated as missing values. The ICCs were then calculated from each model's variance components as the ratio of the variance attributable to differences between segments to the total variance (which also includes the between-reader and residual error variances). For intraobserver reliability, similar models were used, but the data entered into the models were the segments of a sample of 10 randomly selected subjects that 11 of the physicians scored a second time, enabling measurement of the ICC for repeated measurements (ie, read-reread reliability). The average intraobserver reliability was calculated from each reader's individual reliability.
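The study estimated ICCs from variance components of random-effects models fitted in SAS PROC MIXED. As a simpler stand-in (not the study's model, which nested segments within subjects and allowed missing segments), the sketch below computes the Shrout and Fleiss two-way random-effects, single-rater ICC(2,1) from ANOVA mean squares for a complete targets-by-readers matrix. In this formulation the ICC reflects the share of total variance attributable to differences between the rated targets rather than to readers or residual error. The example data are the 6-target, 4-judge illustration from Shrout and Fleiss (1979).

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """Two-way random-effects, single-rater ICC (Shrout & Fleiss ICC(2,1)).

    ratings[i][j] is rater j's score for target i; no missing values allowed.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]                       # per target
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]  # per rater
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Shrout & Fleiss (1979) example: 6 targets rated by 4 judges.
SHROUT_FLEISS = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(SHROUT_FLEISS), 2))  # 0.29
```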

ICCs were interpreted in the same manner as correlation coefficients: ICC > 0.80, excellent; 0.60 ≤ ICC ≤ 0.80, good; 0.40 ≤ ICC < 0.60, moderate; and ICC < 0.40, poor.
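The qualitative scale above can be applied mechanically. This hypothetical helper treats the boundary values 0.60 and 0.80 as "good" (only values strictly above 0.80 are "excellent"); applied to the interobserver ICCs in Table 1, it labels 0.70 and 0.63 as good, 0.58 as moderate, and 0.26 as poor.

```python
def interpret_icc(icc: float) -> str:
    """Map an ICC to a qualitative label; 0.60 and 0.80 are treated as 'good' (assumption)."""
    if icc > 0.80:
        return "excellent"
    if icc >= 0.60:
        return "good"
    if icc >= 0.40:
        return "moderate"
    return "poor"


# Usage: interobserver ICCs from Table 1.
labels = {name: interpret_icc(v) for name, v in {
    "normal, hypokinetic, or akinetic": 0.70,
    "normal vs other": 0.63,
    "hypokinetic vs other": 0.26,
    "akinetic vs other": 0.58,
}.items()}
```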

The interobserver and intraobserver ICCs for WMSI and LVEF measurements were calculated using models similar to those described above. Because these variables are continuous, we also estimated interobserver and intraobserver variability (expressed as standard deviations) from the models' variance components; the mean WMSI of the study sample, adjusted for physicians, was obtained from the estimated intercept and its 95% confidence interval. Interobserver variability was calculated as the square root of the interobserver variance component, whereas intraobserver variability was calculated as the square root of each reader's error variance component (from 11 separate models), and these values were then averaged.
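The variability estimates are square roots of variance components. The sketch below assumes the coefficient of variation is the standard deviation divided by the sample mean; with the values reported in the Results (interobserver SD 0.19 and mean WMSI 1.5; SD 5.8% and mean LVEF 51.1%), this gives about 12.7% and 11.4%, consistent with the reported 13% and 11%. The function names are hypothetical.

```python
import math
from typing import Sequence


def variability_from_variance(variance_component: float, mean: float) -> tuple[float, float]:
    """SD as the square root of a variance component, and CV (%) relative to `mean`."""
    sd = math.sqrt(variance_component)
    return sd, 100 * sd / mean


def mean_intraobserver_sd(error_variances: Sequence[float]) -> float:
    """Average of per-reader SDs, each the square root of that reader's error variance."""
    sds = [math.sqrt(v) for v in error_variances]
    return sum(sds) / len(sds)


# Usage: back-calculate the reported coefficients of variation.
wmsi_sd, wmsi_cv = variability_from_variance(0.19 ** 2, 1.5)   # CV ≈ 12.7%
lvef_sd, lvef_cv = variability_from_variance(5.8 ** 2, 51.1)   # CV ≈ 11.4%
```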

The sensitivity, specificity, positive and negative predictive values, and overall accuracy of the visual determination of normal, hypokinetic, and akinetic segments by the “average” reader in the study were estimated using a bootstrapping method. At each step, the segmental scores of 5 randomly selected physicians were drawn, and their majority score was taken as the “true” score of each segment for that run. If no majority existed, the segment was considered “unreadable” and dropped from that run. The WM classifications of the remaining 7 physicians were then compared against this reference score, and all 5 measures of accuracy were calculated. This process was repeated 10,000 times, and the means and 95% bootstrap confidence intervals of sensitivity, specificity, positive and negative predictive values, and overall accuracy were calculated.
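The bootstrap procedure above can be sketched as follows. This is an illustration under stated assumptions, not the study's MATLAB or SAS code: scores are held in a hypothetical readers-by-segments list (`None` = unreadable), a "majority" means at least 3 of the 5 reference readers, each category is evaluated as category versus other with the 7 remaining readers' readable calls pooled, and the 95% interval is taken as the 2.5th to 97.5th percentile of the runs.

```python
import math
import random
from collections import Counter

# Hypothetical layout: scores[r][s] is reader r's score for segment s, using
# 1 = normal, 2 = hypokinetic, 3 = akinetic or worse, None = unreadable.
CATEGORIES = (1, 2, 3)


def majority_score(votes, quorum=3):
    """Score given by at least `quorum` of the reference readers, else None."""
    counts = Counter(v for v in votes if v is not None)
    for score, n in counts.items():
        if n >= quorum:
            return score
    return None


def measures(tp, fp, fn, tn):
    """The 5 accuracy measures for one category-vs-other comparison (NaN if undefined)."""
    def ratio(num, den):
        return num / den if den else math.nan
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


def bootstrap_runs(scores, n_ref=5, n_iter=10_000, seed=0):
    """Per-run accuracy measures of the non-reference readers for each category."""
    rng = random.Random(seed)
    readers = range(len(scores))
    n_segments = len(scores[0])
    runs = {c: [] for c in CATEGORIES}
    for _ in range(n_iter):
        ref = set(rng.sample(readers, n_ref))  # reference panel for this run
        tally = {c: Counter() for c in CATEGORIES}
        for s in range(n_segments):
            truth = majority_score([scores[r][s] for r in ref])
            if truth is None:
                continue  # no majority: segment dropped from this run
            for r in readers:
                call = scores[r][s]
                if r in ref or call is None:
                    continue  # only non-reference, readable calls are evaluated
                for c in CATEGORIES:
                    if call == c:
                        tally[c]["tp" if truth == c else "fp"] += 1
                    else:
                        tally[c]["fn" if truth == c else "tn"] += 1
        for c, t in tally.items():
            runs[c].append(measures(t["tp"], t["fp"], t["fn"], t["tn"]))
    return runs


def summarize(runs):
    """Mean and 95% percentile interval per category and measure (NaN runs skipped)."""
    summary = {}
    for c, results in runs.items():
        summary[c] = {}
        for name in results[0]:
            vals = sorted(r[name] for r in results if not math.isnan(r[name]))
            if vals:
                lo = vals[int(0.025 * (len(vals) - 1))]
                hi = vals[int(0.975 * (len(vals) - 1))]
                summary[c][name] = (sum(vals) / len(vals), lo, hi)
    return summary
```

Calling `summarize(bootstrap_runs(scores))` returns, for each category, a (mean, lower, upper) tuple for each of the 5 measures.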


Interobserver Reliability

A total of 1890 segments (18 segments in each of 105 patients) were visually read, and WM was scored by 12 readers (Table 1). The readers scored an average of 17.4 ± 1.3 segments per patient (range, 15.8-17.8). There were 78 “unreadable” segments; most were apical (69%), and 22% were basal. According to the average of the 10,000 bootstrap iterations, 74.3% of segments were normal, 11.2% were hypokinetic, and 14.5% were akinetic.

Table 1

Interobserver and intraobserver reliability of visual assessment of WM assessed by ICCs

Variable | Interobserver ICC | Intraobserver mean ICC ± SD | Intraobserver ICC range
Normal, hypokinetic, or akinetic | 0.70 | 0.79 ± 0.06 | 0.68-0.88
Normal vs other (abnormal) | 0.63 | 0.71 ± 0.12 | 0.50-0.91
Hypokinetic vs other | 0.26 | 0.37 ± 0.19 | 0.03-0.62
Akinetic vs other | 0.58 | 0.70 ± 0.10 | 0.56-0.83

Interobserver and intraobserver reliability for the determination of WM abnormalities using trichotomized ordinal segmental scores (normal = 1, hypokinetic = 2, and akinetic = 3) and dichotomized segmental scores (normal vs abnormal, hypokinetic vs other, and akinetic vs other). Note the low values for hypokinetic segments.

Interobserver reliability for classifying segments into 3 categories had an ICC of 0.70. The ICC for dichotomizing segments into normal versus abnormal was 0.63 and for akinetic versus others was 0.58. The reliability for separating hypokinetic from normal or akinetic segments was poor (ICC, 0.26).

Intraobserver Reliability

Ten studies (180 segments) were arbitrarily selected and reread by 11 of the readers to assess intraobserver variability and reliability; the average number of segments scored per reader was 146.5 ± 6.3 (range, 138-159). In total, 182 WM scores changed between the duplicate readings (11.3% of segments), and the number of changed scores per reader ranged from 4 to 31 (mean, 16.5 ± 8.9; median, 8.5). These changes between the first and second readings reclassified a segment from normal to abnormal or vice versa in an average of 8.5 ± 7.4 segments per reader (5.8% of segments per reader).
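The reported percentages follow from the per-reader averages, assuming the denominators are the mean of 146.5 segments scored per reader by each of the 11 readers:

```latex
\begin{aligned}
\text{segments scored (mean per reader} \times \text{readers)} &= 146.5 \times 11 = 1611.5 \\
\text{share of scores changed} &= \frac{182}{1611.5} \approx 0.113 = 11.3\% \\
\text{mean changes per reader} &= \frac{182}{11} \approx 16.5 \\
\text{normal} \leftrightarrow \text{abnormal changes per reader} &= \frac{8.5}{146.5} \approx 0.058 = 5.8\%
\end{aligned}
```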

Intraobserver reliability for classifying segments into 3 categories had a mean ICC of 0.79 (Table 1, Figure 1). Mean reliability for dichotomizing segments into normal versus abnormal was 0.71. The lowest reliability was for differentiating hypokinetic from normal and akinetic segments (mean ICC, 0.37).

Figure 1

ICCs for intraobserver reliability of trichotomous ordinal scoring (normal, hypokinetic, and akinetic segments) and of dichotomous scoring (each score category vs the other scores) for each reader. The ranges of ICCs considered poor, moderate, good, and excellent are indicated. Colors represent individual readers. Note the worst results (ie, lowest reliability) for hypokinetic segments.

Effect of Segment Location on Interobserver Reliability

Comparison of basal, middle, and apical segments showed a trend toward lower concordance and reliability in basal segments. However, at every segmental level, the interobserver reliability coefficient for hypokinetic segments (range, 0.23-0.29) was roughly half of the corresponding coefficients for normal or akinetic segments (range, 0.49-0.73).

Diagnostic Accuracy of Visual Segmental Scores

Table 2 presents the sensitivity, specificity, positive and negative predictive values, and accuracy of the visual determination of normal, hypokinetic, and akinetic segments by individual readers, compared with a reference segmental score determined by bootstrapping (see “Methods”). Whereas overall sensitivity, positive predictive value, and accuracy for the qualitative determination of segmental WM were high, specificity and negative predictive value were lower. The lowest specificity and negative predictive value were for hypokinetic segments (49% and 38%, respectively). Specificity for akinetic segments (74%) and the negative predictive value for normal segments (75%) were also only moderate.

Table 2

Diagnostic accuracy of the visual assessment of WM quality versus the reference score determined for each segment by the bootstrap procedure applied to the 12 experienced readers

Variable | Normal, average % (95% CI) | Hypokinetic, average % (95% CI) | Akinetic, average % (95% CI)
Sensitivity | 90 (87-93) | 90 (88-92) | 95 (93-97)
Specificity | 85 (80-90) | 49 (44-54) | 74 (64-83)
PPV | 95 (92-97) | 93 (91-95) | 96 (93-98)
NPV | 75 (68-82) | 38 (31-46) | 73 (62-83)
Accuracy | 89 (87-90) | 85 (84-86) | 92 (91-93)

CI , Confidence interval; NPV , negative predictive value; PPV , positive predictive value.

Assessment of WMSI

The average WMSI assigned by the readers to the 105 patients was 1.5 ± 0.2 (range of averages, 1.4-1.6; Figure 2). The mean WMSI was 1.0 ± 0.0 for patients with normal left ventricles, 1.3 ± 0.7 for patients with ischemic heart disease, and 2.0 ± 0.9 for patients with dilated cardiomyopathy (P < .001 for differences between all pairs). Among patients with MIs, the WMSI was 1.4 ± 0.8 for those with left anterior descending coronary artery (LAD) disease, compared with 1.1 ± 0.4 for those with circumflex coronary artery disease (P < .02) and 1.3 ± 0.6 for those with right coronary artery disease (P = NS).

Figure 2

Visual assessment of WMSI by each individual reader for each patient versus the average of all readers' scores for that patient. Some points represent several superimposed determinations.

Interobserver variability (by standard deviation) was 0.19 (coefficient of variation, 13%), and reliability (by ICC) was 0.84. Similarly, the mean intraobserver variability was 0.13 ± 0.04 (range, 0.06-0.21), and mean reliability was 0.90 ± 0.06 (range, 0.77-0.98). The relation between individual readers' WMSIs assigned to each patient and their mean is presented in Figure 2.

Visual Assessment of LVEF

The average LVEF was 51.1 ± 5.8% (range of averages, 43.2%-59.5%; Figure 3). The interobserver variability was 5.8% (by standard deviation; coefficient of variation, 11%), and reliability was 0.78 (by ICC). The mean intraobserver variability was 5.1 ± 3.4% (range, 2.0%-12.1% for individual readers), and mean reliability was 0.72 (range, 0.16-0.96). The differences between individual readers' LVEF estimates (for each of the 105 patients) and the mean LVEF assigned to each patient by the 12 readers ranged from −8.1 to +4.4 percentage points (Figure 3), and the mean difference between duplicate LVEF estimates by individual readers was 3.4 percentage points (range, 0.2-12.7 percentage points).

Jun 16, 2018 | Posted by in CARDIOLOGY | Comments Off on Reliability of Visual Assessment of Global and Segmental Left Ventricular Function: A Multicenter Study by the Israeli Echocardiography Research Group
