Background
There is significant disparity in the reported incidence of moderate and severe paravalvular aortic regurgitation (PAR) between the Placement of Aortic Transcatheter Valves (PARTNER) I and PARTNER II trials, which may be related to the echocardiographic methodologies used by separate core laboratories. To further explore the variability in echocardiographic interpretation of PAR, agreement between the grading of PAR by the core laboratory of PARTNER IIB was compared with that by a consortium of echocardiography core laboratory directors.
Methods
The PARTNER IIB core laboratory reevaluated patients, grading PAR primarily by the circumferential extent of the regurgitant jet. A consortium of echocardiography core laboratory directors was formed to evaluate the echocardiographic images and to grade PAR and central and total aortic regurgitation in a randomly chosen subset of the randomized patients in the PARTNER IIB trial using a multiwindow, multiparametric approach. Both a four-class scale (none or trace, mild, moderate, and severe) and a seven-class scale (none, trace, mild, mild to moderate, moderate, moderate to severe, and severe) were used. Levels of grading agreement between the consortium and the original core laboratory on both scales were determined using weighted κ statistics.
Results
Only 87 patients assessed for PAR by the consortium could be paired with readings by the PARTNER IIB core laboratory. Using the four-class grading scheme, the weighted κ statistic for PAR was 0.481 (95% confidence limits, 0.367, 0.595). Using the seven-class scale, the weighted κ statistic for PAR was 0.517 (95% confidence limits, 0.431, 0.607). For either grading scheme, 15.9% of patients graded by the PARTNER IIB core laboratory as having moderate PAR would have been graded as having mild PAR using the multiparametric approach. Similar results were seen for central and total aortic regurgitation assessments.
Conclusions
Using primarily the circumferential extent criteria, the PARTNER IIB core laboratory overestimated the severity of PAR compared with the consortium's multiparametric approach. Although a more granular classification scheme for PAR may slightly improve concordance between core laboratories, differences in the incidence of moderate or severe PAR are likely related to differences in grading methodology. A multiparametric approach is advocated, and other echocardiographic methods for assessing PAR deserve further study.
Transcatheter aortic valve replacement (TAVR) has rapidly emerged as a reasonable alternative to surgical aortic valve replacement in high-risk and inoperable patients with severe, symptomatic aortic stenosis (AS). Multiple studies have shown a higher incidence of paravalvular aortic regurgitation (PAR) in the TAVR population, with moderate or severe PAR seen in 0% to 24% of patients. The inconsistencies in reported incidences of PAR are multifactorial. These differences may be due in large part to differences in the methods of assessment (cine angiography vs hemodynamics vs cardiac magnetic resonance [CMR] vs echocardiography). Pitfalls inherent in assessing prosthetic valve regurgitation differ for each of these modalities: ventricular size and function for cine angiography, heart rate and diastolic function for hemodynamics, flow turbulence for CMR, and acoustic shadowing for echocardiography, to name just a few.
The inherent differences between surgical and transcatheter PAR jets make assessment by echocardiographic methods challenging. Post-TAVR PAR jets are frequently multiple, irregular in shape, and eccentric in direction. Echocardiographic guidelines for assessing the severity of prosthetic regurgitation were developed for surgical prostheses and have not been well validated for transcatheter valves. The accuracy of published echocardiographic guidelines has recently been called into question, particularly by studies comparing echocardiographic assessment with other modalities such as CMR. These studies have shed light on methodologic differences in quantifying PAR by echocardiography, which may help explain some of the between-study inconsistencies in reported PAR incidence. The incidence of moderate or severe PAR in the Placement of Aortic Transcatheter Valves (PARTNER) IB (inoperable) trial for the Edwards SAPIEN valve (Edwards Lifesciences, Irvine, CA) was 11.8% of patients at 30 days and 10.5% at 1 year. The incidence of moderate or severe PAR in the PARTNER IIB trial for the Edwards SAPIEN valve was 16.9% of patients at 30 days and 20.9% at 1 year. This important difference in the incidence of PAR reported by trials using the same model of transcatheter valve and targeting similar populations may be related to differences in the methods used to grade PAR or to the grading scheme itself. In addition, the three-class grading scheme suggested by the American Society of Echocardiography (ASE) guidelines (mild, moderate, and severe) and used by core laboratories for a number of randomized trials may be difficult to translate into the four-class clinical or angiographic grading scheme (grades 1, 2, 3, and 4) used by many other studies. A new grading scheme that subdivides the broader categories of mild and moderate, thus generating a seven-class grading scheme ( Table 1 ), may improve agreement between laboratories.
The objective of this study was thus to examine the variability in reported incidences of PAR by (1) assessing differences in PAR grading that may be attributable to the broad three-class grading scheme and (2) assessing differences in PAR grading that may be attributable to differences in methods of assessing PAR.
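Table 1. Seven-class and four-class PAR grading schemes with proposed quantitative equivalents
Seven-class grading scheme | Four-class grading scheme (PARTNER I/ASE guidelines) | Regurgitant volume (mL) | RF (%) | EROA (mm²) |
---|---|---|---|---|
1. None | 1. None/trace | Not quantifiable ∗ | Not quantifiable ∗ | Not quantifiable ∗ |
2. Trace | 1. None/trace | Not quantifiable ∗ | Not quantifiable ∗ | Not quantifiable ∗ |
3. Mild | 2. Mild | <15 | <15 | <5 |
4. Mild to moderate | 2. Mild | 15–30 | 15–30 | 5–10 |
5. Moderate | 3. Moderate | 31–44 | 31–39 | 11–19 |
6. Moderate to severe | 3. Moderate | 45–60 | 40–50 | 20–30 |
7. Severe | 4. Severe | >60 | >50 | >30 |
∗ Not quantifiable or within the error of the methods.
EROA, Effective regurgitant orifice area; RF, regurgitant fraction.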
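The proposed quantitative equivalents in Table 1 amount to a threshold lookup for each parameter. The Python sketch below is illustrative only: the function and dictionary names are hypothetical, it grades one parameter at a time, and because none and trace are not quantifiable it returns only grades 3 to 7. Table 1 lists integer ranges with gaps (for example, between 30 and 31 mL), so the sketch assumes that a value falling between ranges is assigned to the higher class.

```python
# Upper bounds for (mild, mild to moderate, moderate, moderate to severe), from Table 1.
# The "mild" bound is exclusive (<15); the others are inclusive (e.g., 15-30).
THRESHOLDS = {
    "regurgitant_volume_ml": (15, 30, 44, 60),
    "regurgitant_fraction_pct": (15, 30, 39, 50),
    "eroa_mm2": (5, 10, 19, 30),
}

GRADE_NAMES = {
    3: "mild",
    4: "mild to moderate",
    5: "moderate",
    6: "moderate to severe",
    7: "severe",
}


def seven_class_grade(parameter: str, value: float) -> int:
    """Return the seven-class grade (3-7) for one quantitative parameter.

    Hypothetical helper. Assumption: a value that falls between the
    tabulated integer ranges is assigned to the higher class.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    mild, mild_mod, moderate, mod_severe = THRESHOLDS[parameter]
    if value < mild:
        return 3
    if value <= mild_mod:
        return 4
    if value <= moderate:
        return 5
    if value <= mod_severe:
        return 6
    return 7


print(GRADE_NAMES[seven_class_grade("regurgitant_fraction_pct", 35)])  # moderate
```

In practice the text describes integrating several parameters into one grade; a per-parameter lookup like this only shows how the tabulated ranges map onto the seven classes.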
Methods
PARTNER Trial Design
The PARTNER IIB trial was a multicenter, randomized trial evaluating the comparative efficacy of TAVR with the Edwards SAPIEN valve versus the SAPIEN XT valve for severe symptomatic AS in patients considered inoperable. Inclusion and exclusion criteria were similar to those previously reported for the PARTNER IB trial. In the PARTNER IIB trial, 560 inoperable patients were randomized to either transfemoral SAPIEN ( n = 276) or SAPIEN XT ( n = 284) prostheses. The results of this trial were presented at the American College of Cardiology Scientific Sessions on March 10, 2013.
Study Design
To assess the between-trial discordance in reported PAR incidence, 100 studies previously interpreted by the PARTNER IIB core laboratory were reread by (1) the original core laboratory using its original methods for assessing PAR severity but then using a seven-class grading scheme ( Table 1 ) and (2) a consortium of core laboratory directors using the multiparametric approach to PAR assessment with the seven-class grading scheme. Thus, each study was interpreted three times: twice by the original core laboratory and once by the consortium. Comparisons of variability within and between laboratories were then performed.
Echocardiography Core Laboratory Methodology
The PARTNER IIB echocardiography core laboratory used transthoracic echocardiographic acquisition protocols and reliability testing similar to those previously reported for PARTNER I. Unlike the PARTNER I core laboratory, however, the PARTNER IIB core laboratory assessed PAR using the ASE guidelines with the lower circumferential extent criteria (i.e., PAR was considered severe when the regurgitant jet occupied >20% of the short-axis [SAX] annular circumference). In addition, circumferential extent was weighted more heavily than other parameters.
Two readers (W.A.J. and L.R.) from the PARTNER IIB core laboratory, who had performed the original interpretations, reinterpreted a randomly chosen subset of 100 PARTNER IIB 30-day post-TAVR echocardiograms using their usual methods, which included preferential use of the SAX annular circumference criteria: mild, <10%; moderate, 10% to 20%; and severe, >20%. For the original reads, the severity of PAR was graded as none, mild, moderate, or severe. To align the grading scheme with the commonly used clinical scheme, a more granular grading scheme was used for the repeat reads. Table 1 shows this quantitative grading scheme, which separates none from trace PAR, divides mild PAR into mild and mild to moderate, and divides moderate PAR into moderate and moderate to severe. The result is a seven-class grading scheme that can be easily collapsed into the four-class scheme used in the ASE guidelines and the PARTNER I trial. The proposed quantitative equivalents of each grade are also listed in Table 1 .
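The core laboratory's preferential SAX circumferential extent criteria, and the collapse of seven-class grades into four classes, can be sketched as follows. The helper names are hypothetical; grading an extent of 0% as none is an assumption, and because the laboratory also considered other parameters, a single-criterion function simplifies its actual reads.

```python
def core_lab_par_grade(circumferential_extent_pct: float) -> str:
    """Grade PAR from the jet's share of the SAX annular circumference (%).

    Hypothetical helper using the stated thresholds: mild <10%,
    moderate 10%-20%, severe >20%. Assumption: 0% means no PAR.
    """
    if not 0 <= circumferential_extent_pct <= 100:
        raise ValueError("extent must be between 0 and 100 percent")
    if circumferential_extent_pct == 0:
        return "none"
    if circumferential_extent_pct < 10:
        return "mild"
    if circumferential_extent_pct <= 20:
        return "moderate"
    return "severe"


# Seven-class grade (Table 1) -> four-class grade: 1-2 -> 1 (none/trace),
# 3-4 -> 2 (mild), 5-6 -> 3 (moderate), 7 -> 4 (severe).
SEVEN_TO_FOUR = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4}

for extent in (5, 10, 20, 25):
    print(f"{extent}% -> {core_lab_par_grade(extent)}")
print(SEVEN_TO_FOUR[4])  # 2 (mild to moderate collapses to mild)
```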
A consortium of echocardiographic core laboratory directors (R.T.H., P.P., and N.J.W.) evaluated PAR in the same subset of 100 PARTNER IIB 30-day post-TAVR echocardiograms, independent of the PARTNER IIB core laboratory. The consortium used the ASE guidelines and Valve Academic Research Consortium 2 criteria to assess PAR with a multiwindow, multiparametric approach, relying heavily on color Doppler parameters: jet width just beneath the stent and in relation to the left ventricular outflow tract diameter, circumferential extent of the jet in the SAX views (taking care to image the vena contracta and not jet spray), vena contracta width, and the number of jets (integrating all views). The readers relied on their experience to subcategorize each parameter into the seven-class scheme and to integrate all the parameters into a final interpretation. Importantly, in accordance with guideline recommendations, jet length and total jet area were not used. The multiparametric method weighs each parameter individually, taking into account the quality of the information from each echocardiographic view; as such, no single color Doppler criterion was consistently weighted more heavily than another. Continuous-wave Doppler parameters of jet density and pressure half-time, as well as pulsed-wave Doppler of flow reversal in the descending aorta, were also integrated into the approach, but as the guidelines make clear, these parameters lack specificity because they are influenced by other hemodynamic factors such as ventricular or aortic compliance. Each laboratory director initially interpreted the studies independently, blinded to the other readers' interpretations. If unanimous agreement was not obtained on unblinding, the case was discussed openly and a consensus interpretation was reached. A consensus interpretation was required in <5% of the reads.
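The blinded-read rule described above can be sketched as a small function. This is an illustrative assumption of how such reads might be tallied, not the consortium's software; the helper names are hypothetical, and the open discussion that resolves disagreements is not modeled.

```python
from typing import Optional, Sequence, Tuple


def combine_blinded_reads(grades: Sequence[int]) -> Tuple[Optional[int], bool]:
    """Combine independent, blinded seven-class grades from the readers.

    Returns (grade, needs_consensus). A unanimous grade is accepted directly;
    any disagreement returns (None, True) so the case can be discussed.
    """
    if not grades:
        raise ValueError("at least one read is required")
    if len(set(grades)) == 1:
        return grades[0], False
    return None, True


def consensus_rate(cases: Sequence[Sequence[int]]) -> float:
    """Fraction of cases whose blinded reads were not unanimous."""
    flagged = sum(combine_blinded_reads(grades)[1] for grades in cases)
    return flagged / len(cases)


# Hypothetical reads from three readers for four studies.
example_cases = [(3, 3, 3), (4, 4, 5), (1, 1, 1), (2, 2, 2)]
print(consensus_rate(example_cases))  # 0.25
```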
In addition, the consortium rated its level of confidence in each study's PAR grade as high, intermediate, or low, or classified the study as uninterpretable. Studies most frequently could not be interpreted because of missing views, which prevented accurate assessment of the severity of aortic regurgitation (AR). The core laboratory's repeat readings were then compared with the consortium reads; only studies graded with high or intermediate confidence were included in the analysis. Central AR was also interpreted using a multiwindow, multiparametric approach, with total AR integrating the severity of PAR and central AR.
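The inclusion rule above (high or intermediate confidence only) can be expressed as a simple filter. The record type, field names, and study identifiers below are hypothetical and serve only to illustrate the rule.

```python
from dataclasses import dataclass
from typing import List, Optional

# Confidence levels retained for the agreement analysis.
ANALYZED_CONFIDENCE = {"high", "intermediate"}


@dataclass
class ConsortiumRead:
    study_id: str
    confidence: str            # "high", "intermediate", "low", or "uninterpretable"
    par_grade: Optional[int]   # seven-class grade 1-7; None if uninterpretable


def analyzable(reads: List[ConsortiumRead]) -> List[ConsortiumRead]:
    """Keep only reads graded with high or intermediate confidence."""
    return [r for r in reads if r.confidence in ANALYZED_CONFIDENCE]


reads = [
    ConsortiumRead("A", "high", 3),
    ConsortiumRead("B", "intermediate", 2),
    ConsortiumRead("C", "low", 5),
    ConsortiumRead("D", "uninterpretable", None),
]
print([r.study_id for r in analyzable(reads)])  # ['A', 'B']
```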
Statistical Analysis
Concordance between initial and repeat PAR grading by the core laboratory was assessed using Kendall's τ coefficient of concordance, a nonparametric approach. Weighted κ statistics were used to assess grading agreement between the consortium and the core laboratory.
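Kendall's τ compares the ordering of paired ordinal grades, here an original read and a repeat read of the same study. The plain-Python sketch below uses hypothetical names and assumes the τ-b variant, which adjusts for the many tied pairs that occur when grades take only a few values. The source does not state the variant or software used, and Kendall's coefficient of concordance (W) is a related but distinct statistic, so this is illustrative only.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def kendall_tau_b(x: Sequence[int], y: Sequence[int]) -> float:
    """Kendall's tau-b between two paired sequences of ordinal grades."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of equal length >= 2")
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1          # pair tied on the first read
        if dy == 0:
            tied_y += 1          # pair tied on the second read
        if dx * dy > 0:
            concordant += 1      # both reads order the pair the same way
        elif dx * dy < 0:
            discordant += 1      # reads order the pair in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when either sequence is constant")
    return (concordant - discordant) / denom


original = [1, 2, 2, 3, 4]   # hypothetical original grades
repeat = [1, 2, 3, 3, 4]     # hypothetical repeat grades
print(round(kendall_tau_b(original, repeat), 3))  # 0.889
```

In this hypothetical example, eight of ten pairs are concordant, none is discordant, and each sequence has one tied pair, so τ-b = 8/9.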
Results
Of the 100 randomly chosen patients from the PARTNER IIB cohort, five studies could not be interpreted for PAR by the consortium. In addition, the consortium’s level of confidence was low in eight studies, intermediate in 26 studies, and high in 62 studies. Thus, 13 (five uninterpretable and eight low-confidence) studies were eliminated from analysis.
Kendall’s coefficient of concordance for the PARTNER IIB core laboratory repeat reads was 0.903 ( P < .0001), confirming that the repeat reads were consistent with the original grading reported for the PARTNER IIB trial. The incidence of PAR from both the core laboratory and consortium reads is shown in Figure 1 for the four-class grading scheme ( Figure 1 A) and seven-class grading scheme ( Figure 1 B). Tables 2 and 3 show the number of patients in each PAR grade for the PARTNER IIB core laboratory and the consortium. Using the four-class grading scheme ( Table 2 ), exact agreement between the PARTNER IIB core laboratory and consortium was seen in 53 of 87 patients (60.9%). The core laboratory underestimated the consortium’s grading in one patient but overestimated the severity of PAR in 33 of 87 (37.9%). Of those with overestimated grades, 14 patients (42.4%) were graded as having moderate PAR by the core laboratory but mild PAR by the consortium, 17 (51.5%) were graded as having mild PAR by the core laboratory but no or trace PAR by the consortium, and two (6.1%) were graded as having severe PAR by the core laboratory but moderate PAR by the consortium.
Using the seven-class grading scheme ( Table 3 ), exact agreement was seen in 34 of 87 patients (39.1%). The core laboratory underestimated the consortium’s grading in five of 87 patients (5.7%) but overestimated the severity of PAR by one grade in 40 of 87 (46.0%) and by two grades in eight of 87 (9.2%). Of the 48 patients with overestimated grades, 14 (29.2%) were graded as having moderate PAR by the core laboratory but mild or mild to moderate PAR by the consortium, eight (16.7%) were graded as having mild to moderate PAR by the core laboratory but no, trace, or mild PAR by the consortium, 16 (33.3%) were graded as having mild PAR by the core laboratory but trace PAR by the consortium, eight (16.7%) were graded as having trace PAR by the core laboratory but no PAR by the consortium, and two (4.2%) were graded as having severe PAR by the core laboratory but moderate PAR by the consortium.
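Table 2. Number of patients in each four-class PAR grade: PARTNER IIB core laboratory (rows) versus consortium (columns). Grades: 1, none/trace; 2, mild; 3, moderate; 4, severe.
Grade | 1 | 2 | 3 | 4 |
---|---|---|---|---|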
1 | 31 ∗ | 1 † | 0 | 0 |
2 | 17 ‡ | 21 ∗ | 0 | 0 |
3 | 0 | 14 ‡ | 1 ∗ | 0 |
4 | 0 | 0 | 2 ‡ | 0 ∗ |
∗ Agreement between core laboratory and consortium.
† Underestimation by the core laboratory.
‡ One-grade overestimation by the core laboratory.
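Table 3. Number of patients in each seven-class PAR grade: PARTNER IIB core laboratory (rows) versus consortium (columns). Grades as in the seven-class scheme of Table 1.
Grade | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|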
1 | 11 ∗ | 1 † | 0 | 0 | 0 | 0 | 0 |
2 | 8 ‡ | 11 ∗ | 1 † | 0 | 0 | 0 | 0 |
3 | 0 | 16 ‡ | 9 ∗ | 3 † | 0 | 0 | 0 |
4 | 0 | 1 § | 7 ‡ | 2 ∗ | 0 | 0 | 0 |
5 | 0 | 0 | 5 § | 9 ‡ | 1 ∗ | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 0 ∗ | 0 |
7 | 0 | 0 | 0 | 0 | 2 § | 0 | 0 ∗ |
∗ Agreement between core laboratory and consortium.
† Underestimation by the core laboratory.
‡ One-grade overestimation by the core laboratory.
§ Two-grade overestimation by the core laboratory.
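The agreement tallies reported in the text can be reproduced from the cell counts. The sketch below uses hypothetical helper names. It counts patients by the difference between the core laboratory and consortium grades, and it merges seven-class categories into four-class ones following Table 1. Collapsing Table 3 in this way reproduces Table 2 exactly, which serves as a consistency check between the two tables.

```python
from collections import Counter
from typing import Dict, List

# Table 3 counts: rows = PARTNER IIB core laboratory grade (1-7),
# columns = consortium grade (1-7).
TABLE_3 = [
    [11, 1, 0, 0, 0, 0, 0],
    [8, 11, 1, 0, 0, 0, 0],
    [0, 16, 9, 3, 0, 0, 0],
    [0, 1, 7, 2, 0, 0, 0],
    [0, 0, 5, 9, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 2, 0, 0],
]

# 0-based seven-class index -> 0-based four-class index (Table 1).
SEVEN_TO_FOUR = [0, 0, 1, 1, 2, 2, 3]


def grade_differences(table: List[List[int]]) -> Dict[int, int]:
    """Count patients by (core laboratory grade - consortium grade).

    0 = exact agreement; positive = core laboratory graded higher
    (overestimation relative to the consortium); negative = lower.
    """
    diffs: Counter = Counter()
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count:
                diffs[i - j] += count
    return dict(sorted(diffs.items()))


def collapse(table: List[List[int]], mapping: List[int]) -> List[List[int]]:
    """Merge rows and columns of a contingency table according to `mapping`."""
    size = max(mapping) + 1
    out = [[0] * size for _ in range(size)]
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            out[mapping[i]][mapping[j]] += count
    return out


print(grade_differences(TABLE_3))                           # {-1: 5, 0: 34, 1: 40, 2: 8}
print(grade_differences(collapse(TABLE_3, SEVEN_TO_FOUR)))  # {-1: 1, 0: 53, 1: 33}
```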
Tables 4 and 5 show the percentage of patients in each grade for the PARTNER IIB core laboratory and the consortium. By either the four-class ( Table 4 ) or seven-class ( Table 5 ) grading scheme, the incidence of moderate or severe PAR, central AR, and total AR was consistently higher when graded by the core laboratory; increasing the number of classes did not change the percentages graded as having moderate or greater AR. Figure 2 A shows exact and partial agreement using the four-class scale (weighted κ = 0.481; 95% confidence limits, 0.367, 0.595). Figure 2 B shows exact and partial agreement using the seven-class scale (weighted κ = 0.517; 95% confidence limits, 0.431, 0.607). The seven-class scheme may have slightly improved concordance, as suggested by the higher weighted κ estimate, although the 95% confidence limits of the two estimates overlap.
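The four-class weighted κ can be checked against the counts in Table 2. The methods do not state which weights were used, so the sketch below assumes linear (Cicchetti-Allison) weights; with the Table 2 counts this gives κ ≈ 0.481, matching the reported value. The function name and the `TABLE_2` constant are illustrative. Applied to the Table 3 counts, the same weights give about 0.519 rather than the reported 0.517, so the original analysis may have differed in detail. Confidence limits are not computed here.

```python
from typing import List

# Table 2 counts: rows = PARTNER IIB core laboratory grade (1-4),
# columns = consortium grade (1-4).
TABLE_2 = [
    [31, 1, 0, 0],
    [17, 21, 0, 0],
    [0, 14, 1, 0],
    [0, 0, 2, 0],
]


def weighted_kappa(table: List[List[int]]) -> float:
    """Linearly weighted Cohen's kappa for a square table of counts.

    Assumption: linear weights. Using disagreement weights |i - j| gives the
    same kappa as agreement weights 1 - |i - j| / (k - 1).
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Observed weighted disagreement, in patient counts.
    observed = sum(abs(i - j) * table[i][j] for i in range(k) for j in range(k))
    # Weighted disagreement expected by chance from the marginal totals.
    expected = sum(
        abs(i - j) * row_totals[i] * col_totals[j] for i in range(k) for j in range(k)
    ) / n
    return 1 - observed / expected


print(round(weighted_kappa(TABLE_2), 3))  # 0.481
```

Computing κ as one minus the ratio of observed to chance-expected weighted disagreement is algebraically equivalent to the familiar (p_o − p_e)/(1 − p_e) form with agreement weights.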
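Table 4. Percentage of patients in each four-class grade, PARTNER IIB core laboratory (CL) versus consortium, for PAR, central AR (CAR), and total AR (TAR)
Grading (% of patients) | None/trace | Mild | Moderate | Severe |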
---|---|---|---|---|
CL PAR | 37 | 32 | 18 | 2 |
Consortium PAR | 52 | 43 | 4 | 0 |
CL CAR | 81 | 17 | 2 | 0 |
Consortium CAR | 95 | 4 | 1 | 0 |
CL TAR | 30 | 48 | 20 | 2 |
Consortium TAR | 51.5 | 44.5 | 4 | 0 |