Background
Multiparametric scoring of valvular regurgitation may compromise interobserver agreement, as readers weight parameters differently. The aims of this study were to quantify interobserver variability in the grading of chronic tricuspid regurgitation (TR), develop an algorithm for grading TR, and assess the effect of this algorithm on concordance and accuracy.
Methods
On the basis of current guidelines, two experts graded the severity of TR by consensus in 40 patients with a spectrum of TR severity. A subgroup of patients (n = 18) also had TR severity assessed by cardiac magnetic resonance. Sixteen cardiologists independently graded the first 20 cases as severe or nonsevere TR. After group review, a grading algorithm to differentiate severe and nonsevere TR was devised by consensus. The same observers used the algorithm to grade the second set of cases.
Results
Baseline differentiation of severe from nonsevere TR showed modest reliability and accuracy compared with an expert read (multirater κ = 0.55; overall agreement, 78%; accuracy, 81%). The consensus algorithm for severe TR was a suggestive color jet and at least one of (1) right atrial area > 18 cm² and inferior vena cava diameter > 2.5 cm; (2) vena contracta width > 0.7 cm and jet area > 10 cm²; (3) a dense, triangular TR Doppler profile; and (4) holosystolic reversal of hepatic vein flow. Application of this algorithm improved the multirater κ coefficient to 0.80, the level of agreement to 90% (P = .033), and mean reader accuracy to 92% (P = .001).
Conclusions
Only modest baseline agreement was found between readers on the distinction of severe and nonsevere TR. An objective, structured grading algorithm improved both interrater agreement and accuracy.
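The consensus algorithm reported in the Results can be written as a simple decision rule. The following Python sketch is illustrative only: the class and field names are hypothetical, the thresholds are those quoted in the abstract, and the handling of missing or discrepant measurements (which the consensus document addressed) is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class TRFindings:
    """Hypothetical container for one study's findings (names are illustrative)."""
    suggestive_color_jet: bool
    ra_area_cm2: float
    ivc_diameter_cm: float
    vena_contracta_cm: float
    jet_area_cm2: float
    dense_triangular_doppler_profile: bool
    holosystolic_hepatic_vein_reversal: bool


def is_severe_tr(f: TRFindings) -> bool:
    """Return True when the findings meet the rule quoted in the abstract."""
    # A suggestive color jet is required before any supporting criterion counts.
    if not f.suggestive_color_jet:
        return False
    supporting_criteria = (
        # (1) right atrial area > 18 cm^2 AND inferior vena cava diameter > 2.5 cm
        f.ra_area_cm2 > 18 and f.ivc_diameter_cm > 2.5,
        # (2) vena contracta width > 0.7 cm AND jet area > 10 cm^2
        f.vena_contracta_cm > 0.7 and f.jet_area_cm2 > 10,
        # (3) dense, triangular TR Doppler profile
        f.dense_triangular_doppler_profile,
        # (4) holosystolic reversal of hepatic vein flow
        f.holosystolic_hepatic_vein_reversal,
    )
    # At least one supporting criterion must be present.
    return any(supporting_criteria)
```

For example, a study with a suggestive color jet, a vena contracta width of 0.8 cm, and a jet area of 12 cm² would be classed as severe under this sketch even if the right atrium and inferior vena cava were of normal size.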
Methods
Study Design Overview
A three-phase process was carried out to build and evaluate a consensus document on TR severity grading. In the first phase, a group of observers was presented with a set of 20 echocardiographic cases and asked to grade TR severity in a group setting. Their grading was compared with that of two expert readers, whose interpretations were used as the reference standard for the purposes of this study. The second phase consisted of an open forum discussion among the participants about the value and role of each of the TR severity grading parameters outlined in the American College of Cardiology guidelines for valvular disease management and the American Society of Echocardiography (ASE) recommendations for native valvular regurgitation assessment. Particular emphasis was placed on cases from the first phase for which there was significant disagreement between observers. The association between each of the severity parameters and the expert grading was discussed in a group setting with all participants present. After this meeting, a consensus document was generated to outline an agreed means of grading TR severity. This consensus document was then verified by all participants. In the third phase, after the consensus document was circulated, the same group of observers was presented with another set of 20 cases and asked to grade TR severity in a similar fashion. The overall agreement of observers, as well as their accuracy in relation to the two expert readers, was assessed in both the first and third phases, and the two phases were then compared. In a subgroup of patients in whom cardiac magnetic resonance imaging (MRI) was available within 48 hours of echocardiography, TR severity was also determined using MRI by an experienced reader. This provided an external standard against which the participants and expert readers were compared, in keeping with the ASE’s suggestion of using an external standard whenever possible in quality improvement exercises.
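The agreement and accuracy measures compared between the first and third phases can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the text reports a "multirater κ" without naming the statistic, so Fleiss' κ is assumed here; mean pairwise agreement is only one possible definition of "overall agreement"; and the function names and the 0/1 coding (1 = severe, 0 = nonsevere) are hypothetical.

```python
def mean_pairwise_agreement(ratings):
    """Mean proportion of agreeing reader pairs per case.

    ratings: list of cases, each a list of labels with one label per reader
    (every case must be graded by the same number of readers, at least two).
    """
    n_raters = len(ratings[0])
    per_case = []
    for case in ratings:
        counts = [case.count(label) for label in set(case)]
        agreeing = sum(k * k for k in counts) - n_raters
        per_case.append(agreeing / (n_raters * (n_raters - 1)))
    return sum(per_case) / len(per_case)


def fleiss_kappa(ratings):
    """Fleiss' kappa (assumed here to be the multirater statistic)."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    p_observed = mean_pairwise_agreement(ratings)
    # Chance agreement from the overall proportion of each label.
    labels = {label for case in ratings for label in case}
    total = n_cases * n_raters
    p_expected = sum(
        (sum(case.count(label) for case in ratings) / total) ** 2
        for label in labels
    )
    if p_expected == 1.0:
        # Every reader gave the same label to every case; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


def mean_reader_accuracy(ratings, reference):
    """Average, over readers, of the fraction of cases matching the reference."""
    n_raters = len(ratings[0])
    per_reader = [
        sum(case[r] == ref for case, ref in zip(ratings, reference)) / len(reference)
        for r in range(n_raters)
    ]
    return sum(per_reader) / n_raters
```

In use, `ratings` would hold one row per case with 16 labels per row, and `reference` would hold the expert consensus grade for each case; the same functions would be applied to the first-phase and third-phase data sets and the results compared.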
Baseline Assessment
Twenty two-dimensional transthoracic echocardiograms in patients with a spectrum of TR severity were shown to 16 participants, all cardiologists with level 2 (n = 4) or level 3 (n = 12) certification in echocardiography. Each participant was asked to grade the degree of TR in each case as severe or nonsevere. This initial assessment was performed in a group format, with images displayed on a single screen in front of the whole group and with participants marking their answers on scoring sheets that were circulated at the beginning of the session. For each case, the images were advanced at a standard rate, allowing 3 min per case to grade TR severity. The 3-min time per case was chosen to allow review of 20 cases within a 1-hour period. All data were collected on an identified answer sheet, with each case coded separately.
Cases were selected retrospectively from the archived echocardiography database in an effort to include a range of pathology and TR severity. All cases were selected to represent chronic rather than acute TR. In the selection of images, cases were chosen preferentially if a comprehensive cardiovascular MRI study with appropriate sequences for assessment of TR severity had been performed within 48 hours of echocardiography.
Images shown to the participants included (1) two-dimensional and color Doppler cine clips from each of the parasternal right ventricular (RV) inflow, parasternal short-axis, apical four-chamber, and subcostal views; (2) still-frame images of the TR color jet taken in systole from an apical view used to measure color jet area and vena contracta width; (3) images of the inferior vena cava (IVC), including two-dimensional, Doppler color flow, and M-mode images with respiration; and (4) hepatic vein pulsed-wave Doppler recordings and TR jet continuous-wave Doppler recordings as available. All echocardiographic studies were performed by experienced clinical sonographers using conventional equipment according to standard laboratory protocol. For color flow images, gain was increased to a level that maximized color mapping but eliminated random color speckles in nonmoving regions. An aliasing velocity of 45 to 65 cm/sec was used in all cases.
To avoid bias, no clinical data about the cases were provided. The presented cases spanned a spectrum of image quality and clinical indications. All images were scored for image quality on the basis of visualization of the tricuspid valve leaflet morphology, definition of the right atrial (RA) and RV walls, adequacy of color Doppler and spectral Doppler images, and adequacy of visualization of the IVC. A study was given a score of 1 (complete study with excellent acoustic detail), 2 (adequate acoustic detail but limited color flow or pulsed Doppler assessment), or 3 (technically difficult study with poor or inadequate visualization of tricuspid valve morphology and/or RA and RV walls). On the basis of these scores, an effort was made to balance the first and second sets of 20 cases with respect to study image quality, so as not to bias the results.
TR Severity Consensus Building
After this initial phase of evaluation, each of the parameters outlined in the American College of Cardiology guidelines on valvular heart disease and the ASE statement on valvular regurgitation was discussed in an open forum. All of the participants were involved in this discussion, as well as the two expert readers. The parameters evaluated included RA area, RV basal diameter, RV mid diameter, IVC size, IVC response to respiration, hepatic vein flow profile, TR jet area by color, TR jet vena contracta width, hepatic vein Doppler profile, and multiple combinations of these parameters. The sensitivity and specificity of these echocardiographic parameters in predicting expert-determined and cardiac MRI–determined TR severity (described below) in the baseline set of cases were calculated and discussed by the group of all the initial readers. On the basis of this analysis and discussion, a consensus was developed with the input of all observers on which parameters should be used to distinguish severe from nonsevere TR, especially when there were incomplete or discrepant data (Figure 1).
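The sensitivity and specificity calculation described above can be illustrated with a short Python sketch. The function name and the example values are hypothetical and are not study data; the sketch assumes each parameter has been dichotomized (for instance, RA area above or below a threshold) and is compared case by case with the reference grade.

```python
def sensitivity_specificity(predicted_severe, reference_severe):
    """Sensitivity and specificity of a dichotomized parameter.

    predicted_severe: list of bools, True when the parameter suggests severe TR.
    reference_severe: list of bools, True when the reference grade is severe.
    Returns (sensitivity, specificity); a value is None when it is undefined
    because the reference contains no severe (or no nonsevere) cases.
    """
    pairs = list(zip(predicted_severe, reference_severe))
    true_pos = sum(1 for p, r in pairs if p and r)
    false_neg = sum(1 for p, r in pairs if not p and r)
    true_neg = sum(1 for p, r in pairs if not p and not r)
    false_pos = sum(1 for p, r in pairs if p and not r)
    sensitivity = (
        true_pos / (true_pos + false_neg) if (true_pos + false_neg) else None
    )
    specificity = (
        true_neg / (true_neg + false_pos) if (true_neg + false_pos) else None
    )
    return sensitivity, specificity


# Illustrative values only (not study data): RA area in cm^2 for five cases,
# dichotomized at a hypothetical threshold of 18 cm^2.
ra_area_cm2 = [22.0, 19.5, 14.0, 16.5, 25.0]
reference = [True, False, False, False, True]
predicted = [area > 18 for area in ra_area_cm2]
sens, spec = sensitivity_specificity(predicted, reference)
```

The same function could be applied to a combination of parameters by passing the combined true/false result for each case as `predicted_severe`, and to either reference (expert consensus or cardiac MRI) by changing `reference_severe`.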