Chapter 9

Evidence-Based Surgery: Critically Assessing Surgical Literature

Not long ago, case series published by a single surgeon or group of surgeons reporting the results of a novel management strategy or new technique were the mainstay of communication in the surgical community. These reports highlighted surgical advances that could be applied to patients, but often reflected the best surgeons reporting their best results. Such reports represented much of the evidence base that guided surgical practice. However, with growing recognition that almost everyone will require surgery at some point in their lives, surgical disease is being increasingly considered in the context of the public’s health. From this perspective, the published experience of one surgeon becomes less relevant than evidence that describes how surgical procedures actually work in the general community, how their effectiveness compares with other strategies, and the full spectrum of outcomes needed to assess a procedure’s impact on patients and the health care system. Over the last decade, surgical health services and outcomes research has emerged as an essential approach for informing the modern surgical era with evidence. Surgical investigators apply a range of research methods to draw truths from the collective surgical experience, with the goal of integrating the best available evidence into what surgeons do in general practice. Distinct from the past era of surgical research, current efforts aim to move beyond reporting what can be done to patients to establishing what should be done for patients.


Outcomes and health services research are broad terms for scientific inquiries evaluating health care outcomes, care delivery, and the systems delivering that care. This enterprise does not focus on outcomes alone, but also considers the daily actions performed by health care teams and surgeons (processes of care) as well as the environment in which services are delivered (structures of care). With the medical community facing increasing regulatory oversight and a drive for more accountable care, it is essential that surgeons understand and embrace the approach of evidence-based surgery so they can improve the care of their patients and maintain a leadership role in health policy and quality improvement activities. The goal of this chapter is to help the reader become a more critical evaluator of the surgical literature and advance the use of better evidence in surgical practice. To that end, this chapter is framed through questions that a critical reader should ask when reading a research study.



What is the Purpose of the Study?


Assessing the value of a study requires an understanding of the investigator's intended purpose. Most studies can be placed into one of two general categories: descriptive (or exploratory) and analytic (Fig. 9-1). Descriptive studies should generally be considered hypothesis-generating rather than causality-focused, whereas analytic studies test a prespecified hypothesis. A study's purpose should drive the selection of study groups, outcomes of interest, data sources, study design, and analytic plan. Unfortunately, many studies fall short in linking purpose and methodology; investigators sometimes try to draw causal conclusions from descriptive studies. For example, in a study describing trends in the misdiagnosis of appendicitis during a period of increased use of diagnostic testing, an attempt to establish a causal link between the two findings (i.e., that the trend in misdiagnosis was caused by the trend in diagnostic testing) would overreach the descriptive nature of the study.1 The intent of descriptive studies should be to identify possible associations and to serve as an impetus for future investigations using more rigorous analytic approaches.




What is being Compared?


Many surgical studies evaluate outcomes (e.g., complications, cost, efficacy, effectiveness, quality of life, functional status, patient satisfaction) of one intervention or strategy compared with another. The method of classifying subjects into one group or another and the fact that some exposures vary with time pose important methodologic challenges to be considered when evaluating the strength of evidence provided by a study.



Misclassification


Misclassification is the incorrect categorization of a subject into a study group. This issue is important because, in the context of misclassification, even a properly performed analysis with an appropriate study design will yield biased results. There are two types of misclassification: nondifferential and differential. Nondifferential misclassification indicates an equal and random chance that any one subject will be misclassified (i.e., included as part of the wrong study group). With differential misclassification, the chance that a subject is misclassified is nonrandom.


Stage migration, also known as the Will Rogers phenomenon, is a classic example of misclassification.2 Cancer stage has a well-defined relationship with long-term survival. Patients may be staged through clinical examination, radiographic assessment, invasive procedures, or pathologic tissue examination (the gold standard). Staging techniques other than pathology-based approaches may be inaccurate. It is not uncommon for higher accuracy staging modalities to be associated with higher observed survival rates when compared with lower accuracy methods (e.g., clinical examination). Patients assessed only clinically might be understaged—categorized as early-stage cancer when they actually have late-stage cancer. Survival rates for early-stage patients would then appear worse than they really are because misclassified late-stage patients lower the group average. Similarly, if overstaged patients were grouped with truly late-stage patients, survival would appear better than in actuality. This phenomenon has been demonstrated in a study of lung cancer patients in which those who underwent pathologic staging had better 5-year survival rates compared with those who underwent clinical staging.3
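
To make the arithmetic of stage migration concrete, the short sketch below works through a hypothetical cohort (all numbers are invented for illustration and are not from the cited study): reclassifying a handful of understaged patients raises the observed survival of both stage groups even though no individual patient's prognosis changes.

```python
# Hypothetical illustration of stage migration (Will Rogers phenomenon).

# True 5-year survival probabilities by true stage
early_true = [0.80] * 90          # 90 truly early-stage patients
late_misclassified = [0.40] * 10  # 10 late-stage patients understaged as "early"
late_true = [0.35] * 100          # 100 correctly classified late-stage patients

def mean(xs):
    return sum(xs) / len(xs)

# Observed group averages under clinical (inaccurate) staging
early_observed = mean(early_true + late_misclassified)   # 0.76 -- dragged down
late_observed = mean(late_true)                          # 0.35

# After accurate (pathologic) staging, the 10 patients migrate to the late group
early_accurate = mean(early_true)                        # 0.80 -- improves
late_accurate = mean(late_true + late_misclassified)     # ~0.355 -- also improves

print(f"Clinical staging:   early {early_observed:.2f}, late {late_observed:.2f}")
print(f"Pathologic staging: early {early_accurate:.2f}, late {late_accurate:.3f}")
# Both stage-specific survival rates rise, yet no individual patient's
# prognosis changed: the apparent improvement is an artifact of reclassification.
```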


If a difference in outcome truly exists between two groups, nondifferential misclassification will bias the results toward the null hypothesis, a conservative bias. With differential misclassification, the bias may be conservative or anticonservative, depending on the manner in which patients were misclassified and the true relationship between group assignment and outcome. Because a conservative bias is preferable to an anticonservative one, which can produce false-positive findings, differential misclassification is the more serious concern. Consider a hypothetical study of a surgical intervention for cancer involving two groups of patients, one classified based on clinical staging and the other on pathologic staging. In this case, it would be a mistake to assume that both groups have been classified with equal accuracy. If the study demonstrated a significant survival benefit for the surgical intervention in the pathologically staged group, the reader would have to wonder whether the observed difference was attributable to the intervention or to differential misclassification (understaging) of patients in the clinically staged group. By comparison, if staging in both study groups were based on radiographic evaluation, each patient would have an equal chance of being overstaged or understaged. Failure to demonstrate a difference in outcome between the two interventions might then be a false-negative finding attributable to nondifferential misclassification.
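
The attenuating effect of nondifferential misclassification can also be demonstrated by simulation. The sketch below uses hypothetical complication risks and a hypothetical misclassification rate: group labels are swapped at the same rate in both arms, and the observed risk difference shrinks toward the null.

```python
# Minimal simulation (invented rates) of nondifferential misclassification
# biasing an observed treatment effect toward the null.
import random

random.seed(0)
N = 200_000
TRUE_RISK = {"A": 0.10, "B": 0.20}  # assumed true complication risks
MISCLASS_RATE = 0.15                # each subject's label is wrong 15% of the time

counts = {"A": [0, 0], "B": [0, 0]}  # [events, total] per *recorded* group
for _ in range(N):
    true_group = random.choice(["A", "B"])
    event = random.random() < TRUE_RISK[true_group]
    recorded = true_group
    if random.random() < MISCLASS_RATE:  # nondifferential: same rate in both groups
        recorded = "B" if true_group == "A" else "A"
    counts[recorded][0] += event
    counts[recorded][1] += 1

risk = {g: e / n for g, (e, n) in counts.items()}
print(f"True risk difference:     {TRUE_RISK['B'] - TRUE_RISK['A']:.3f}")
print(f"Observed risk difference: {risk['B'] - risk['A']:.3f}")  # attenuated toward 0
```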



Time-Varying Exposures


Time-varying (or time-dependent) exposures refer to predictors whose value may vary with time (e.g., smoking status, transplantation status). Failure to account for time-varying exposures in the analysis of an observational study may lead to biased results and incorrect conclusions. An example of potential bias arising from time-varying covariates is an analysis of heart transplantation survival data.4 The impact of heart transplantation on survival was assessed by comparing patients who received a transplant with those who did not. Although the initial analysis revealed a survival benefit associated with transplantation, the manner in which patients were grouped (treating transplantation as a fixed variable) led to bias in favor of transplanted patients.


Transplantation wait times are often long, and many patients die while awaiting a donor organ; patients who were placed on the wait list but died shortly after listing therefore never had the opportunity to undergo transplantation. When the investigators retrospectively assigned patients to the two study groups (transplanted versus not transplanted), the patients who survived long enough to receive a new heart introduced selection bias in favor of transplantation because their survival times were, on average, longer than those in the nontransplantation group. In actuality, each subject's exposure status (transplanted versus not transplanted) was time-dependent. While on the wait list and prior to transplantation, a subject could contribute survival time to the nontransplantation group; subsequent to transplantation, the same subject could then contribute survival time to the transplantation group. Reanalysis of the data evaluating exposure status in a time-dependent fashion revealed no association between transplantation and survival.5
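
In practice, handling a time-dependent exposure means restructuring the data so that each patient's follow-up is split at the moment the exposure changes. The sketch below, which assumes the pandas and lifelines Python libraries and uses invented toy records with illustrative column names, shows one way to build such a counting-process dataset and fit a time-varying Cox model.

```python
# Restructuring survival records so transplantation is a time-dependent
# exposure: each patient's follow-up is split at the transplant date, so
# pre-transplant person-time counts toward the "not transplanted" state.
import pandas as pd
from lifelines import CoxTimeVaryingFitter

# (id, day of transplant or None, day of death/censoring, died?)
patients = [
    (1, 50, 300, 1),    # transplanted on day 50, died on day 300
    (2, None, 100, 1),  # died on the wait list, never transplanted
    (3, 200, 600, 0),   # transplanted, then censored alive
    (4, None, 365, 0),  # never transplanted, censored alive
    (5, None, 250, 1),  # died on the wait list
]

rows = []
for pid, tx_day, end_day, died in patients:
    if tx_day is None:
        rows.append(dict(id=pid, start=0, stop=end_day, transplanted=0, event=died))
    else:
        # Pre-transplant interval contributes unexposed person-time...
        rows.append(dict(id=pid, start=0, stop=tx_day, transplanted=0, event=0))
        # ...and the post-transplant interval contributes exposed person-time.
        rows.append(dict(id=pid, start=tx_day, stop=end_day, transplanted=1, event=died))

df = pd.DataFrame(rows)
print(df)

# Fit a Cox model with the time-varying exposure (toy data; a real
# analysis would require far more patients).
ctv = CoxTimeVaryingFitter()
ctv.fit(df, id_col="id", event_col="event", start_col="start", stop_col="stop")
ctv.print_summary()
```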



What is the Outcome of Interest?


Concluding that operation A is better than operation B must be supported by evidence of a difference in outcomes. But what does “better” mean? What if operation A is better with regard to one type of outcome but worse in terms of another? Outcomes assessment cannot determine which procedure is better for the patient, but it can inform patients and providers about differences between two or more competing therapeutic options. Readers judging a study’s value should determine which outcomes were assessed, from what perspective, and whether the chosen outcomes were consistent with the study’s stated aims.





Patient-Reported Outcomes


Patient-reported outcomes (PROs) measure subjective outcomes (termed concepts in the PRO literature) reported directly by the patient, without interpretation of the response by a provider or researcher. Like outcomes informing safety, efficacy, or effectiveness, PROs are measurable study outcomes. Examples of common PRO concepts are health-related quality of life (HRQOL), satisfaction with care, functional status, well-being, and health status. A PRO usually comprises several more discrete concepts (or domains). HRQOL, for example, should at a minimum include domains that measure physical (e.g., pain), psychological (e.g., depression), and social functioning (e.g., the ability to carry out activities of daily living). Specific items contained within these domains might include pain, sleep problems, sexual function, and vitality and energy, any or all of which may be relevant to the research question and are certainly of interest to patients.


PRO data are collected through survey instruments composed of individual questions, statements, or tasks evaluated by a patient. PRO instruments use a clearly defined method of administration, collect data in a standardized format, and should have had their scoring, analysis, and interpretation validated in the study population. In general, researchers are advised to use existing instruments to measure PROs (rather than creating their own) because the appropriate development of an instrument requires significant time, resources, testing, and validation before application.6 Knowing whether the chosen instrument has been validated in the population of interest is also essential when interpreting the results and should be questioned when reading a study reporting PROs.
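
As an illustration of what standardized scoring entails, the sketch below scores a hypothetical five-item domain answered on a 1-to-5 response scale. The items, reverse-coding scheme, and 0-to-100 transformation are invented for demonstration; real instruments (e.g., the SF-36 or PROMIS measures) publish their own validated scoring manuals.

```python
# Scoring a hypothetical PRO domain in a standardized, reproducible way.

ITEMS = 5
SCALE_MIN, SCALE_MAX = 1, 5   # each item answered on a 1-5 Likert scale
REVERSE_CODED = {2, 4}        # hypothetical items phrased in the opposite direction

def score_domain(responses):
    """Return a 0-100 domain score from five item responses."""
    assert len(responses) == ITEMS
    adjusted = []
    for i, r in enumerate(responses):
        if not SCALE_MIN <= r <= SCALE_MAX:
            raise ValueError(f"item {i + 1} out of range: {r}")
        # Flip reverse-coded items so all items point the same direction
        adjusted.append(SCALE_MAX + SCALE_MIN - r if i + 1 in REVERSE_CODED else r)
    raw = sum(adjusted)
    lowest, highest = ITEMS * SCALE_MIN, ITEMS * SCALE_MAX
    return 100 * (raw - lowest) / (highest - lowest)  # linear 0-100 transform

print(score_domain([3, 2, 4, 5, 1]))  # -> 40.0
```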


Although PROs represent a useful, informative, and important outcome, they are difficult to measure accurately and can be controversial. For example, there is often a disconnect between what clinicians and patients perceive to be the HRQOL associated with a chronic condition. When patients actually experience a chronic health condition that once seemed intolerable, they may shift their frame of reference, and there is a degree of adaptation that is difficult to quantify. For instance, the quality of life reported by a newly wheelchair-bound patient and by one who has used a wheelchair for a number of years could be drastically different—the former might be quite low, whereas the latter might be higher than anticipated. Part of the difficulty is that PROs are more subjective, less tangible outcomes than mortality or readmission. Nevertheless, incorporating these measures into outcomes assessment is paramount when counseling future patients.



Resource Utilization


Resource utilization refers to the use of health services related to an intervention. In the context of surgical care, this includes utilization of hospital resources—length of stay, hospital readmission, use of outpatient, pharmacy, and durable medical equipment (e.g., wheelchairs and oxygen) services, and emergency room use. Defining criteria for expected utilization is challenging, and average use often serves as the benchmark. Excess resource utilization, relative to the average, is considered an inferior outcome and is often associated with some form of complication. It can be challenging to determine how much resource utilization is related to the intervention or procedure under study and how much is attributable to a patient's baseline clinical conditions (e.g., chronic disease, adverse events) and nonclinical factors (e.g., patient-level social support, patient preference for in-hospital versus out-of-hospital care, insurance status precluding use of home nursing). For example, an investigator might use Medicare data to study readmission after pancreatic resection for cancer. Although readmission events are readily identified, it is not possible to know whether a readmission was planned (for chemotherapy administration) or unplanned (because of a complication).


The chosen timeline for assessing health care utilization is also critical. Measuring only the immediate health care utilization associated with a diagnostic test would miss its potential downstream impact on future diagnostic and therapeutic care, and limiting assessment to brief periods might miss important implications over a patient's lifetime. For example, as the quality and use of high-resolution imaging studies (e.g., computed tomography [CT]) have risen, the number of incidentalomas identified (e.g., adrenal, lung, or liver lesions too small to be diagnosed accurately on imaging) has concurrently increased. An investigator hoping to describe the impact of CT scanning as a cancer screening modality who measured only the individual screening study would fail to capture the downstream effect in the form of multiple, costly follow-up studies and/or biopsies to evaluate an incidentaloma further.



Costs


Charges are the amount of money requested for health services and supplies. By comparison, costs are the actual amount of money required to deliver care. Differentiating the two is critical because health economic studies should aim to characterize the costs of care, yet most data used for health economic analyses provide information on health care charges. If charges are evaluated instead of costs, an intervention or management strategy will appear more expensive than it actually is. When reading the methods section of such a study, the critical reader must look for several important points. First, the investigators should describe if and how they converted charges to costs, generally through the use of a cost-to-charge ratio. Second, costs should be discounted (typically at 3% to 5% per year) to account for the fact that a dollar in the future is worth less than a dollar today. Finally, studies spanning several years should adjust for inflation.
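
The three adjustments just described reduce to a few lines of arithmetic. The sketch below uses assumed figures for the cost-to-charge ratio, discount rate, and inflation rate; none of these values comes from the text.

```python
# Converting charges to costs, adjusting for inflation, and discounting
# future costs to present value (all rates hypothetical).

COST_TO_CHARGE_RATIO = 0.60   # hospital-specific; hypothetical value
DISCOUNT_RATE = 0.03          # 3%, within the 3%-5% range cited above
INFLATION_RATE = 0.025        # hypothetical annual medical inflation

def charges_to_cost(charges):
    return charges * COST_TO_CHARGE_RATIO

def inflate_to_base_year(amount, years_before_base):
    """Express a past year's dollars in base-year dollars."""
    return amount * (1 + INFLATION_RATE) ** years_before_base

def present_value(future_cost, years_from_now):
    """Discount a future cost: a dollar in the future is worth less today."""
    return future_cost / (1 + DISCOUNT_RATE) ** years_from_now

# Example: $10,000 in charges incurred 5 years from now
cost = charges_to_cost(10_000)                            # $6,000 in cost terms
print(f"Present value: ${present_value(cost, 5):,.2f}")   # about $5,175.66

# Example: a $6,000 cost recorded 3 years before the base year
print(f"In base-year dollars: ${inflate_to_base_year(6_000, 3):,.2f}")  # ~$6,461.34
```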


The perceived relationship between health care utilization and costs depends on the perspective (e.g., patient, provider, hospital, payer, or societal) taken by the investigator. A hospital may be reimbursed a prespecified amount for performing a procedure and for all patient care associated with that operation for the subsequent 90 days. If the patient experiences a complication and requires multiple clinic visits to manage it, this health care utilization may be viewed as a poor outcome from the perspective of the patient, surgeon, and hospital or clinic. From the payer's perspective, however, the cost of all complication-related care within 90 days of the operation would be irrelevant because the payer owes nothing beyond the prespecified amount. Alternatively, some types of hospitals (e.g., critical access hospitals) can receive greater reimbursement for greater care delivery; for them, increased utilization may not be an adverse outcome, even though it may be for the health care system as a whole. The perspective of the study defines which costs must be ascertained and included in the analysis. For example, whereas a societal perspective would include the costs of care as well as the direct and indirect monetary costs associated with care (e.g., travel and boarding expenses, lost productivity at work, caretaker expenses), a hospital's perspective would be more selective, not considering the patient's out-of-pocket expenses but certainly considering whether delivered care is covered by a global payment to the hospital.


There are several methods for comparative health economic analyses. All consider the costs of care in dollars but differ in how they quantify health benefit. A cost-benefit analysis quantifies health benefit in terms of dollars. Although such results are easy to compare and interpret, the great challenge with this approach is assigning a dollar value to a life or to a specific health outcome. A cost-utility analysis quantifies health benefit in terms of quality-adjusted life-years (QALYs). Utilities are a measure of overall quality of life, usually scaled between 0 and 1, with 1 representing perfect health, and are ascertained using a visual analogue scale, the time trade-off, or standard gamble techniques.7 Utilities are multiplied by survival time to determine QALYs. When this outcome metric is evaluated as cost/QALY, it is readily comparable between interventions. An intervention with a cost/QALY of $50,000 or less has traditionally been considered cost-effective, a threshold derived from the cost of dialysis determined under the original Medicare law that made dialysis a publicly funded treatment. However, there is ongoing debate about the validity of this threshold, and a range of $20,000 to $100,000 per QALY has been proposed as more reasonable.8 Cost-effectiveness analyses measure health benefit using an outcome metric called the incremental cost-effectiveness ratio (ICER): the difference in costs between two competing therapeutic options divided by the difference in health outcome. If the ICER comparing a treatment with a standard reveals that the treatment is more expensive and less efficacious, it is said to be dominated by the standard and is not favored, whereas a less expensive and more efficacious treatment dominates the standard and is favored. Circumstances in which an intervention is more expensive and more efficacious, or less expensive and less efficacious, represent a trade-off.
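
The QALY and ICER arithmetic described above is straightforward, as the following sketch with invented costs, utilities, and survival times illustrates.

```python
# Cost-utility arithmetic: QALYs, the ICER, and dominance (invented numbers).

def qalys(utility, years):
    # Utility (0 = death, 1 = perfect health) multiplied by survival time
    return utility * years

def icer(cost_new, eff_new, cost_std, eff_std):
    """Incremental cost-effectiveness ratio, with dominance handling."""
    d_cost, d_eff = cost_new - cost_std, eff_new - eff_std
    if d_cost <= 0 and d_eff >= 0:
        return "new treatment dominates the standard"
    if d_cost >= 0 and d_eff <= 0:
        return "new treatment is dominated by the standard"
    return f"trade-off: ${d_cost / d_eff:,.0f} per QALY gained"

# Hypothetical comparison: new operation versus standard
std_q = qalys(utility=0.70, years=6.0)   # 4.2 QALYs
new_q = qalys(utility=0.80, years=6.5)   # 5.2 QALYs
print(icer(cost_new=90_000, eff_new=new_q, cost_std=60_000, eff_std=std_q))
# -> trade-off: $30,000 per QALY gained (below the traditional $50,000 threshold)
```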



Surrogate End Points


Interest in surrogate end points has emerged because definitive clinical outcomes may be difficult to assess owing to the infrequency of a chosen clinical end point, the cost of its ascertainment, or a long lag time to its development. Surrogate end points are commonly used in studies of new pharmaceutical interventions, in which efficient data gathering about treatment effect is essential to move a product to the marketplace rapidly.9 Because the true clinical benefits of an intervention may take years to recognize, it may be desirable to identify an intermediate outcome that can serve as a surrogate for the actual clinical effect. Unfortunately, the problem with surrogate end points is that an intervention may influence an outcome through various, and potentially unintended or unanticipated, pathways.


A classic example illustrating the dangers of using surrogate end points was the Cardiac Arrhythmia Suppression Trial.10 This study hypothesized that the incidence of sudden cardiac death could be reduced through the administration of flecainide or encainide. These drugs became popular because they had been designed to reduce the rate of ventricular ectopy, a common rhythm aberrancy thought to cause sudden cardiac death. Although these drugs had been shown to reduce ventricular ectopy, when mortality (a clinical, nonsurrogate end point) was measured in this trial, administration of these drugs was found to result in a threefold increase in the rate of death. Suppression of ventricular ectopy was therefore a poor surrogate for the intended clinical impact (improved survival) of these agents.


When evaluating a study, the reader must ask not only whether the selected outcome can answer the research question, but also whether that outcome is a meaningful clinical end point or simply a more easily measured surrogate. Criteria for validating a surrogate end point have been proposed: the surrogate should be correlated with the clinical end point of interest and should fully capture the net effect of the intervention on that end point.9 For example, with stage III colon cancer, there was interest in using adjuvant chemotherapy to improve survival, and disease-free survival was proposed as a surrogate for overall survival. Clearly, these two end points are correlated, satisfying the first criterion. Using meta-analysis, adjuvant chemotherapy was shown to result in similar relative improvements in both disease-free and overall survival.9 In other words, disease-free survival fully captured the net effect of adjuvant chemotherapy for stage III colon cancer, suggesting that it might be a valid surrogate for assessing overall survival benefit. Unless a chosen surrogate outcome has been validated and vetted in other surgical studies, the results and conclusions should be interpreted with caution.
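
The second criterion, that the surrogate capture the net effect of the intervention, can be checked by comparing relative treatment effects on the surrogate and clinical end points, as in the sketch below. All event rates here are hypothetical and are not the meta-analytic values from reference 9.

```python
# Checking whether a surrogate end point tracks the clinical end point.

def relative_risk(rate_treated, rate_control):
    return rate_treated / rate_control

# Hypothetical 3-year event rates with and without adjuvant chemotherapy
rr_recurrence = relative_risk(0.30, 0.40)  # surrogate: disease recurrence
rr_death = relative_risk(0.24, 0.32)       # clinical end point: death

print(f"RR, disease-free survival surrogate: {rr_recurrence:.2f}")
print(f"RR, overall survival:                {rr_death:.2f}")
# Similar relative effects (both 0.75 here) are consistent with the surrogate
# capturing the net treatment effect; divergent effects would argue against it.
```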
