## Background

Coronary artery (CA) *Z* scores are commonly used for clinical decisions in Kawasaki disease, including treatment, anticoagulation, and duration and frequency of follow-up. The aim of this study was to evaluate CA measurement reproducibility, *Z* score calculation variability, and the impact of variability on management.

## Methods

Twenty-one patients with Kawasaki disease with right CA (RCA) or left anterior descending CA (LAD) *Z* scores of 1.5 to 3 (group 1) were randomly selected, and all patients with Kawasaki disease with *Z* scores of 7 to 14 for either the RCA or LAD (*n* = 20; group 2) were included from March 2008 to May 2014. Two echocardiographers measured left main CA, LAD, and RCA dimensions. The inter- and intraobserver reliability of absolute measurements was calculated, and the CA *Z* scores derived from three commonly used formulas were compared.

## Results

Median age at echocardiography was 1.2 years (range, 0.2–11.5 years), and 68% of subjects (*n* = 28) were male. Interobserver reliability was high for the LAD (intraclass correlation coefficient [ICC], 96.79%) and RCA (ICC, 93.31%) and lower for the left main CA (ICC, 73.54%). Intraobserver reliability was also high for the LAD and RCA (ICC, 99.08% and 97.74%, respectively) and lower for the left main CA (ICC, 80.88%). Calculated *Z* scores were similar among the three formulas for group 1 but varied markedly in group 2. Calculated *Z* scores using the same CA measurement in each of the three formulas resulted in different clinical management in up to seven of 21 group 1 patients (33%) and in up to 10 of 20 group 2 patients (50%).

## Conclusions

Although CA measurements have high inter- and intraobserver agreement, CA *Z* scores vary dramatically on the basis of the *Z* score formula at larger CA dimensions. Discrepancies in CA *Z* score calculators may affect clinical decision making.

## Highlights

- Calculated *Z* scores were similar among the three formulas for coronary artery *Z* scores of 1.5 to 3.
- Calculated *Z* scores varied markedly among the three formulas for coronary artery *Z* scores of 7 to 14.
- Discrepancies in CA *Z* score calculators may affect clinical decision making.

Kawasaki disease (KD) is an acute vasculitis of childhood manifested by fever together with characteristic clinical symptoms including nonexudative conjunctivitis, erythema of the oral mucosa, extremity changes, cervical lymphadenopathy, rash, and signs of inflammation. Coronary artery (CA) aneurysms (CAAs) develop in 20% to 25% of untreated patients and in about 5% of patients treated with intravenous immunoglobulin. Because the etiology of KD is unknown, and no definitive diagnostic test is available, KD diagnosis is dependent on clinical assessment and echocardiographic evaluation of CA involvement. The 2004 American Heart Association (AHA) guidelines incorporate echocardiographic criteria into the diagnostic algorithm for cases in which clinical criteria are incomplete for KD. The guidelines recommend treatment for incomplete KD if the *Z* score for the left anterior descending CA (LAD) or right CA (RCA) is ≥2.5, in addition to other clinical criteria.

In addition to KD diagnosis, CA *Z* scores are now commonly used for clinical decision making, including risk assessment in patients with giant CAAs. Patients with giant CAAs are at risk for adverse cardiac events, including CA thrombus and occlusion, myocardial infarction, and death. Currently, AHA KD guidelines recommend anticoagulation with aspirin and warfarin or enoxaparin for patients with giant CAAs (defined as any CA segment ≥8 mm), but the guidelines are not definitive on anticoagulation criteria for patients with moderate-size aneurysms (5–8 mm). Previous studies have suggested that in smaller children, absolute CA diameter may underestimate the severity of CAA and may not be adequate for risk assessment for CA events. CA *Z* score criteria for chronic anticoagulation (*Z* score ≥ 10) have been proposed and may provide better risk assessment. Because CA *Z* scores now play an important role both in KD diagnosis (particularly at *Z* scores of about 2) and in clinical decision making about anticoagulation (at a *Z* score of about 10), it is critical to understand the variation in both CA measurements and CA *Z* score calculation. Currently, three easily accessible CA *Z* score calculators are used in the United States: those of Colan, Olivieri *et al.*, and Dallaire and Dahdah. The purpose of this study was to evaluate CA measurement reproducibility and *Z* score calculation variability in KD. In addition, we assessed how often different results in the *Z* score calculations for a given CA dimension could lead to a difference in clinical management for *Z* scores of about 2 and *Z* scores of about 10.

## Methods

## Subjects

We searched our clinical database for patients diagnosed with KD between March 2008 and May 2014 who had RCA or LAD *Z* scores between 1.5 and 3.0 or between 7 and 14 using the Colan *Z* score formula. All patients in the database were treated for KD at Boston Children’s Hospital and received intravenous immunoglobulin. We randomly selected 21 patients with KD with echocardiograms that demonstrated either an RCA or an LAD *Z* score of 1.5 to 3 (group 1). We also selected all patients with KD with *Z* scores of 7 to 14 (*n* = 20; group 2) during the same time period. We did not use the left main CA (LMCA) for patient selection, because the AHA guidelines use only the LAD and RCA, given prior reports showing wider variation in LMCA measurement and anatomy. Basic demographics were collected on all patients and included gender, body surface area (BSA) (calculated using the Haycock method), age at echocardiography, and use of sedation during echocardiography (given that patient movement during echocardiography, especially in infants and young children, may increase variability in CA measurement).
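Because every *Z* score in this study is indexed to BSA, the Haycock calculation can be sketched as follows. This is a minimal illustration using the commonly cited Haycock coefficients, with weight in kilograms and height in centimeters; the function name and input checks are assumptions made here and do not describe the study's own software.

```python
def haycock_bsa(weight_kg: float, height_cm: float) -> float:
    """Body surface area in m^2 by the Haycock formula.

    BSA = 0.024265 * weight^0.5378 * height^0.3964
    (weight in kg, height in cm). Hypothetical helper for illustration.
    """
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    return 0.024265 * weight_kg ** 0.5378 * height_cm ** 0.3964


if __name__ == "__main__":
    # Illustrative values only (not patient data from the study).
    print(f"BSA: {haycock_bsa(10.0, 75.0):.2f} m^2")
```

Small errors in recorded weight or height propagate into BSA and therefore into every BSA-indexed *Z* score, which is one reason to compute BSA once and reuse it across all three calculators.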

## Echocardiography

Two-dimensional echocardiographic examinations are routinely performed at KD diagnosis, 1 to 2 weeks after treatment, and 4 to 6 weeks after treatment, and more frequently if clinically indicated. For each patient, we chose the first echocardiogram chronologically that met our inclusion criteria. Two echocardiographers (K.G.F. and A.H.-O.) retrospectively remeasured the LMCA, LAD, and RCA in all patients (*n* = 41) using the standard method demonstrated in Figure 1. The internal diameters of the RCA, LMCA, and LAD were measured from trailing edge to leading edge on digital images. The LMCA was measured in the midposition, distal to the flaring often seen near the aortic orifice and before the first bifurcation. The LAD was measured at its largest diameter distal to the bifurcation and before the first marginal branch. The RCA was measured at its largest diameter in the relatively straight section of artery just after the initial rightward turn from the anterior facing sinus of Valsalva. Vessels were visualized in multiple planes using a combination of low and high parasternal long-axis and short-axis views, apical four-chamber and two-chamber views, and subcostal projections. In patients in whom multiple CAAs were present in a vessel segment, only the largest CAA was measured. The left circumflex CA and distal segments of the RCA, LMCA, and LAD were not evaluated in this analysis. At two separate time points, both echocardiographers measured each of the three CA segments three times, and the mean value of these measurements was used for analyses. All echocardiograms were performed on Philips iE33 machines using 8- and/or 12-MHz transducers (Koninklijke Philips N.V., Amsterdam, The Netherlands).

## *Z* Scores

*Z* scores for all of the CA measurements (RCA, LAD, and LMCA) were calculated using three commonly used *Z* score calculators on the basis of their published formulas. These *Z* scores were then checked with easily accessible versions of the formulas at http://parameterz.blogspot.com.
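For orientation, each calculator can be read as an instance of the same general definition, sketched below. The symbols $d_{\text{obs}}$ (measured CA diameter), $\mu(\mathrm{BSA})$ (predicted mean diameter at the patient's BSA), and $\sigma(\mathrm{BSA})$ (predicted standard deviation at that BSA) are notation assumed here for illustration. The specific regression models of the three published formulas, and any transformation they apply to the measurement or to BSA before standardizing, are not reproduced.

$$
Z = \frac{d_{\text{obs}} - \mu(\mathrm{BSA})}{\sigma(\mathrm{BSA})}
$$

Under this reading, two calculators can agree near the mean yet diverge at large diameters if their estimates of $\sigma(\mathrm{BSA})$ differ, because the same excess over the predicted mean is divided by a different denominator.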

## Statistical Analysis

Inter- and intraobserver reliability was calculated for each CA segment measurement using intraclass correlation coefficients (ICCs). Analysis of variance was used to compare CA *Z* scores using three commonly used formulas (those of Colan, Olivieri *et al.*, and Dallaire and Dahdah). Bland-Altman plots were used to analyze the agreement between inter- and intraobserver CA measurements and among the three *Z* score formulas to evaluate systematic and proportional bias. Linear regression was performed to assess for proportional bias, and *t* statistics are reported. Plots of CA dimensions versus BSA for *Z* score = 2 and *Z* score = 10 were created with a curve for each of the three *Z* score formulas to evaluate variation in these *Z* score cut points.
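The Bland-Altman and regression steps described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the SPSS procedure used in the study: systematic bias is taken as the mean paired difference, limits of agreement as the bias ± 1.96 SD, and proportional bias as the slope (with its *t* statistic) from ordinary least squares regression of the paired differences on the paired means. The names `BlandAltman` and `bland_altman` are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import NamedTuple, Sequence


class BlandAltman(NamedTuple):
    bias: float        # mean paired difference (systematic bias)
    loa_lower: float   # bias - 1.96 * SD of differences
    loa_upper: float   # bias + 1.96 * SD of differences
    slope: float       # regression slope of difference on pair mean
    t_stat: float      # t statistic for the slope (proportional bias)


def bland_altman(a: Sequence[float], b: Sequence[float]) -> BlandAltman:
    """Summarize agreement between two paired series of values."""
    n = len(a)
    if n != len(b) or n < 3:
        raise ValueError("need two paired series with at least 3 observations")

    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2 for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)

    # Ordinary least squares: difference regressed on pair mean.
    mean_x = mean(means)
    sxx = sum((m - mean_x) ** 2 for m in means)
    if sxx == 0:
        raise ValueError("pair means are all identical; slope is undefined")
    sxy = sum((m - mean_x) * (d - bias) for m, d in zip(means, diffs))
    slope = sxy / sxx
    intercept = bias - slope * mean_x

    # Standard error of the slope from the residual sum of squares.
    rss = sum((d - (intercept + slope * m)) ** 2 for m, d in zip(means, diffs))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se > 0:
        t_stat = slope / se
    else:
        t_stat = 0.0 if slope == 0 else math.copysign(math.inf, slope)

    return BlandAltman(bias, bias - 1.96 * sd, bias + 1.96 * sd, slope, t_stat)


if __name__ == "__main__":
    # Made-up Z scores from two hypothetical formulas, for illustration only.
    formula_a = [2.0, 4.0, 6.0, 8.0, 10.0]
    formula_b = [1.9, 3.6, 5.5, 7.0, 8.9]
    print(bland_altman(formula_a, formula_b))
```

A nonzero bias with a slope near zero suggests a constant offset between two formulas, whereas a slope clearly different from zero indicates that the disagreement grows (or shrinks) with the magnitude of the *Z* score.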

Using the larger CA measurement (RCA or LAD) for each patient, we analyzed how often variation in measurement and *Z* score formula would lead to differences in clinical management for both groups 1 and 2. Using the Colan *Z* score formula, we compared inter- and intraobserver values for each patient and determined how often paired measurements would lead to a clinical discrepancy. For group 1, we considered a clinical discrepancy to be present if *Z* scores were on opposite sides of the upper limit of normal (*Z* score = 2). For group 2, we considered a clinical discrepancy to be present if *Z* scores were on opposite sides of the proposed cutoff for anticoagulation with warfarin or enoxaparin (*Z* score = 10). Comparisons of differences in clinical management due to variation in measurement were made using *Z* scores derived from the Colan formula. The first observation of each echocardiographer was used to examine interobserver variation in clinical management. All statistical analyses were two sided and had a significance level of .05. Analyses were performed using IBM SPSS Statistics version 20 for Windows (IBM, Armonk, NY).
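The discrepancy rule above can be expressed as a short sketch. It assumes that a *Z* score exactly equal to the cut point counts as at or above it; the text specifies only "opposite sides," so that tie-breaking convention, the function names, and the example values are assumptions for illustration.

```python
from typing import Iterable, Sequence


def is_discrepant(z_a: float, z_b: float, cutoff: float) -> bool:
    """True if two Z scores fall on opposite sides of the cut point.

    Assumption: a value equal to the cutoff is treated as at/above it.
    """
    return (z_a >= cutoff) != (z_b >= cutoff)


def count_discrepancies(z_by_patient: Iterable[Sequence[float]],
                        cutoff: float) -> int:
    """Count patients whose Z scores (one per formula or observer)
    do not all fall on the same side of the cut point."""
    count = 0
    for scores in z_by_patient:
        above = [z >= cutoff for z in scores]
        if any(above) and not all(above):
            count += 1
    return count


if __name__ == "__main__":
    # Made-up Z scores for three hypothetical patients, three formulas each.
    group1 = [(1.8, 2.3, 1.5), (2.5, 2.2, 2.1), (1.0, 1.2, 1.9)]
    print(count_discrepancies(group1, cutoff=2.0))
```

The same helper applies to both groups by changing the cut point: 2 for the upper limit of normal in group 1 and 10 for the proposed anticoagulation threshold in group 2.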

## Results

Demographic data are presented in Table 1. More than half of the patients <3 years of age (62%) were sedated for their echocardiographic studies. Measured CA dimensions for each CA segment and their respective *Z* scores for groups 1 and 2 are shown in Table 2. Inter- and intraobserver agreement was high for LAD and RCA measurements (Table 3). Although measurement agreement for the LMCA was good, it was lower than for the LAD and RCA. Bland-Altman plots (Figure 2) for inter- and intraobserver CA measurements demonstrate that there was good agreement and no systematic or proportional bias for either RCA or LAD measurements.

| Variable | Value |
|---|---|
| Male | 28 (68%) |
| Age (y) | 1.2 (0.2–11.5) |
| Age < 3 y | 29 (71%) |
| Sedated echocardiographic study (age < 3 y) | 18 (62%) |
| BSA (m²) | 0.56 (0.31–1.17) |

| CA | Group 1 (*n* = 21): dimension (mm) | Group 1 (*n* = 21): *Z* score∗ | Group 2 (*n* = 20): dimension (mm) | Group 2 (*n* = 20): *Z* score∗ |
|---|---|---|---|---|
| LMCA | 2.3 (1.5 to 4.6) | 0.1 (−1.2 to 4.2) | 3.4 (1.9 to 6.5) | 2.7 (−0.2 to 7.5) |
| RCA | 1.9 (1.3 to 3.2) | 0.3 (−1.9 to 2.7) | 3.9 (1.5 to 7.0) | 6.7 (−0.6 to 15.8) |
| LAD | 2.0 (1.5 to 3.0) | 0.9 (−1.2 to 9.9) | 3.65 (2.2 to 9.6) | 8.6 (1.0 to 28.7) |

| CA | Interobserver ICC (%) | Intraobserver ICC (%) |
|---|---|---|
| LMCA | 73.54 | 80.88 |
| RCA | 93.31 | 97.74 |
| LAD | 96.79 | 99.08 |

Calculated CA *Z* scores for LAD measurements using three different *Z* score formulas are shown in Figure 3 (all LAD measurements [Figure 3A], *P* = .011; LAD measurements with *Z* scores > 3 [Figure 3B], *P* = .002; and LAD measurements with *Z* scores < 3 [Figure 3C], *P* = .012). There was better agreement between *Z* score formulas at low *Z* scores and wider variation between *Z* score formulas at higher CA values, as shown in the Bland-Altman plots in Figure 4. The Bland-Altman plots are indicative of both systematic and proportional bias. Specifically, systematic bias is demonstrated by *Z* scores being highest when calculated by the Colan *Z* score formula, lower by the Dallaire and Dahdah formula, and lowest by the Olivieri *et al.* formula. Proportional bias was present for all three comparisons because the degree of agreement depends on the actual measurement; the difference in *Z* scores calculated by each of the formulas was considerably higher as mean *Z* score increased (*t* = 23.9, *P* < .001 [Figure 3A]; *t* = 30.2, *P* < .001 [Figure 3B]; and *t* = 17.4, *P* < .001 [Figure 3C]). Systematic and proportional biases were lowest using the Colan and Dallaire and Dahdah *Z* score formulas (Figure 3A) compared with the other two combinations (i.e., Colan and Olivieri *et al.*, Olivieri *et al.* and Dallaire and Dahdah). This pattern was similar for LMCA and RCA *Z* scores.