Measurement and documentation are critical components of patient care. Measurements (numerical or categorical assignments based on testing or measuring) form the basis for deciding intervention strategy and therefore influence patient response to therapeutic interventions.1 Measurements are also used during treatment sessions to determine the rate of progression and the appropriateness of exercise prescriptions. Typically, therapists obtain a series of measures and, in combination with those made by other health care professionals, formulate a clinical hypothesis. The hypothesis includes both physical and psychosocial aspects. If parts of the hypothesis are incorrect because of inaccurate measures, interventions may be misdirected, which can result in treatment that is ineffective or unsafe. Consequently, knowledge of the qualities of measurements that relate to the cardiovascular and pulmonary systems is essential for effective patient care.

Measurements can be described according to their type or level of measurement. There are four levels of measurement: nominal, ordinal, interval, and ratio (Table 7-1).2 Recognizing the level of measurement aids understanding and interpretation of the result.

Table 7-1 Examples of Commonly Used Measurements and Their Respective Level of Measurement

The categories of a nominal measurement scale are defined using objective indicators that are universally understood. For example, the classification of patients with heart failure could be based on the primary cause for the development of the condition (Box 7-1). In each case, the cause would be determined by diagnostic testing such as angiography or echocardiography. Clear descriptions of the criteria for inclusion in each category are necessary to facilitate clinicians’ agreement on the assignment of patients to categories. A high percentage of agreement indicates high interrater reliability.
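The percentage of agreement mentioned above can be illustrated with a short calculation. The following is a minimal sketch, assuming two therapists have each assigned the same five patients to a nominal category; the function name, category labels, and assignments are hypothetical and are not taken from Box 7-1.

```python
from typing import Sequence


def percent_agreement(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Return the percentage of patients that two raters placed in the same category."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must classify the same patients.")
    if not rater_a:
        raise ValueError("At least one patient is required.")
    # Count the patients for whom both raters chose the same category.
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical category labels and assignments, for illustration only.
therapist_1 = ["ischemic", "valvular", "ischemic", "hypertensive", "ischemic"]
therapist_2 = ["ischemic", "valvular", "hypertensive", "hypertensive", "ischemic"]

agreement = percent_agreement(therapist_1, therapist_2)
print(f"Interrater agreement: {agreement:.0f}%")
```

In this made-up data set the therapists agree on four of five patients. Simple percent agreement does not account for agreement that would occur by chance, so it is best read as a first look rather than a complete reliability analysis.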
Ordinal measurements are similar to nominal measurements with the exception that the categories are ordered or ranked. The categories in an ordinal scale indicate more or less of a certain attribute. The scale for rating angina is an example of an ordinal scale (Table 7-2). Each category is defined, and a rating of grade 1 angina is less than a rating of grade 4. In an ordinal scale, the differences between consecutive ratings are not necessarily equal. The difference between grade 1 angina and grade 2 is not necessarily the same as that between grade 3 and grade 4 angina. Consequently, if numbers are assigned to categories, they can be used to represent rank but cannot be subjected to mathematical operations. Averaging angina scores is incorrect because averaging assumes that there are equal intervals between categories. A group of ordinal data could instead be reported as the percentage of each response (e.g., 80% of clients reported exercise-induced angina as 3 before a cardiac rehabilitation program).

Table 7-2 From American College of Sports Medicine: ACSM’s guidelines for exercise testing and prescription. Philadelphia, 2010, Lippincott Williams & Wilkins.

Categorical measurements are considered ordinal if being assigned to a specific category is considered better than or worse than being in another category. For example, patients with angina could be classified as having either stable or unstable angina. This measurement would be considered ordinal, because stable angina is considered a better condition to have compared with unstable angina.3

Ratio measurements have scales with units that are equal in size and have a zero point that indicates absence of the attribute being measured. Examples of ratio measurements that are used in cardiopulmonary physical therapy include heart rate, cardiac output, oxygen consumption, and 6-minute walk distance (6-MWD). Ratio measurements are never negative and can be subjected to all arithmetic operations.
For example, an aerobic capacity of 4 L/min is twice as great as an aerobic capacity of 2 L/min. The Borg CR10 scale and visual analogue scales are also examples of ratio level measurements.4,5 The zero point of these scales is “nothing at all” or no perception of exertion. The CR10 scale may be preferable for use with patients who experience strong symptoms during testing or training.6

For a measurement to be of value to the therapist, the measure should be both reliable (reproducible) and valid (meaningful). When selecting and performing tests and measures, it is important to remember that measures can be reliable but not valid for a specific application.7

A third factor contributing to measurement variability is the difference in the methods therapists use to obtain measurements. If a measurement is consistent when the same therapist repeats a test, then the measurement is said to have high intrarater reliability. Measurements that are consistent when multiple therapists perform the test under the same conditions are said to have high interrater reliability. Often, measurements have high intrarater reliability but lower interrater reliability because of variations in the specific methods used by therapists to obtain the same measurement. Auscultation of breath sounds, a commonly used method of assessing patients in cardiopulmonary settings, has been shown to have only poor to fair interrater reliability.8 Interrater reliability is important in clinical settings in which a patient may be evaluated and treated by more than one therapist. If the interrater reliability of a measurement is low, changes in the patient over time may not be accurately reflected. Interrater reliability for auscultation of breath sounds can be improved through education of the persons performing the measurements.8

Valid measurements are those that provide meaningful information and accurately reflect the characteristic for which the measure is intended.
For a measure to be useful in a clinical setting, it must possess a certain degree of validity. Measurements can be reliable but not valid. For example, the ankle-brachial index (ABI) is reliable but not necessarily valid in all populations.9

There are various types of validity. Of importance in clinical practice are concurrent, predictive, and prescriptive validity.

A measurement has concurrent validity when it accurately reflects measurements made with an accepted standard. Comparing a measurement made with a heart rate monitor with an ECG recording is an example of determining concurrent validity. In this example, the ECG recording would be considered the gold or accepted standard. Another example is using pulse oximetry during exercise testing. Yamaya and colleagues (2002)10 compared pulse oximetry with directly measured arterial oxygen saturation (the gold standard) and reported that a forehead sensor was more valid than a finger sensor.

Measurements with predictive validity can be used to estimate the probability of occurrence of a future event. Screening tests often involve measurements that are used to predict future events. For example, identifying people with risk factors for coronary artery disease (CAD) leads to a prediction that their likelihood of developing CAD is higher than normal.

Measures with prescriptive validity provide guidance to the direction of treatment. The categorical measurement of a person’s risk for a future coronary event is a measurement that would need to have prescriptive validity. By classifying patients into high- versus low-risk categories on the basis of results of a diagnostic exercise test, the intensity and rate of progression of treatment are determined.
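The concurrent validity comparison described above, a heart rate monitor checked against an ECG recording, can be sketched numerically. This is an illustrative example only: the paired readings are hypothetical, and the two summary values (mean difference and largest absolute difference) are one simple way to describe agreement, not a method prescribed by the text. Heart rate is a ratio measurement, so these arithmetic operations are permissible.

```python
from statistics import mean

# Hypothetical paired heart rates (beats/min) recorded at the same moments.
ecg_hr = [72, 95, 110, 128, 141]        # accepted (gold) standard
monitor_hr = [73, 94, 112, 127, 144]    # device under evaluation

# Difference of each monitor reading from the ECG reading taken at the same time.
differences = [m - e for m, e in zip(monitor_hr, ecg_hr)]

# Average tendency of the monitor to read high (positive) or low (negative).
mean_difference = mean(differences)

# Worst single disagreement, regardless of direction.
largest_error = max(abs(d) for d in differences)

print(f"Differences (monitor - ECG): {differences}")
print(f"Mean difference: {mean_difference:.1f} beats/min")
print(f"Largest absolute difference: {largest_error} beats/min")
```

How small these differences must be for the monitor to be considered valid is a clinical judgment that the text does not specify.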
Measurement and Documentation
Characteristics of Measurements and Outcomes
Levels or Types of Measurements
| Patient Characteristic | Test or Other Measure | Level of Measurement |
|---|---|---|
| Gender | Male/female | Nominal |
| Range of motion | Goniometry | Ratio |
| Muscle strength | Manual muscle testing (MMT) | Ordinal |
| | Isokinetic dynamometry | Interval |
| Functional status | Functional independence measure | Ordinal |
| | Timed Up and Go (TUG) | Ratio |
| Angina | Angina rating scale | Ordinal |
| | Borg CR10 | Ratio |
| Dyspnea | MRC scale | Ordinal |
| | Borg CR10 | Ratio |
| | Visual Analog Scale | Ratio |
Nominal
Ordinal
| Rating | Description |
|---|---|
| 1 | Mild, barely noticeable |
| 2 | Moderate, bothersome |
| 3 | Moderately severe, very uncomfortable |
| 4 | Most severe or intense pain ever experienced |
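Because the angina ratings above are ordinal, a group of ratings is summarized by the percentage of clients reporting each grade rather than by an average. The following is a minimal sketch of that tabulation; the ten ratings are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical angina ratings (grades 1-4) reported by ten clients.
ratings = [3, 3, 2, 3, 3, 3, 3, 1, 3, 3]

# Tally how many clients reported each grade.
counts = Counter(ratings)

# Express each grade as a percentage of all responses, including grades nobody chose.
percentages = {
    grade: 100.0 * counts.get(grade, 0) / len(ratings)
    for grade in (1, 2, 3, 4)
}

for grade, pct in percentages.items():
    print(f"Grade {grade}: {pct:.0f}% of clients")
```

No mean is computed here, because averaging would treat the steps between grades as equal in size.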
Ratio
Reliability and Validity of Measurements
Reliability
Validity