Chief Complaints and ICD Codes


1. INTRODUCTION

A chief complaint is a concise statement in English or other natural language of the symptoms that caused a patient to seek medical care. A triage nurse or registration clerk records a patient's chief complaint at the very beginning of the medical care process (Figure 23.1).

FIGURE 23.1

Points in the healthcare process at which chief complaints and ICD codes are recorded and transmitted to a health department or other biosurveillance organization. This figure illustrates a hypothetical patient with anthrax who seeks care at an emergency department (ED) and is subsequently admitted to a hospital. The patient's chief complaint is recorded at the time of registration and transmitted immediately to a health department via an HL7 message router. When the patient is discharged from the ED and admitted to the hospital, a professional coder reads the patient's ED chart and assigns ICD codes for billing purposes. The delay in transmission to a health department is indicated by a slanted arrow. Another ICD code may be assigned at the time of hospital admission. Finally, ICD codes are assigned by professional coders at the time of hospital discharge. These codes are transmitted to third party payers, who may submit them to data aggregators (e.g., commercial companies that analyze healthcare trends or health departments that assemble hospital discharge data sets for statistical purposes). In general, the diagnostic precision of the data available to a health department increases over time (moving from left to right in the figure). Note that there is variability from healthcare system to healthcare system. In some settings, the chief complaints are coded directly into ICD codes by physicians at the time of service (e.g., U.S. military).

In contrast, an ICD code is a number (e.g., 558.9) that a clinician or professional coder uses to represent a medical diagnosis, syndrome, or symptom–usually for the purpose of billing. The ICD-coding system allows physicians and professional coders to express their diagnostic impression of a patient at different levels of diagnostic precision, ranging from very precise (e.g., ICD code 022.1 for inhalational anthrax) to syndrome (e.g., ICD code 079.99 for viral syndrome) to symptom (e.g., ICD code 780.6 for fever). The diagnosis may be a working diagnosis (a provisional diagnosis) or a definitive diagnosis, although ICD does not allow the clinician or coder to indicate this distinction. A clinician or professional coder may record an ICD code early in the process of medical care. Professional coders, not clinicians, invariably encode hospital discharge diagnoses, which are not available until after a patient is discharged from a hospital. The important points to remember about ICD coding are the heterogeneity in diagnostic precision, who does the encoding, and when the encoding is done.

Chief complaints and ICD codes are used ubiquitously in medical care in the United States in both the civilian and military healthcare systems. Medicare and other third party payers require these data for billing and claims. As a result, the healthcare industry has built significant electronic infrastructure to capture chief complaints and ICD codes.

Over the past six years, researchers have studied methods to obtain and analyze patient chief complaints and ICD codes for the purpose of early detection of outbreaks. The intensity of research on these data has been motivated in part by their availability. The objective of research is to test hypotheses that these data can be used either alone or in conjunction with other data to improve the timeliness of outbreak detection (Wagner et al., 2001). As a result of the research, many health departments are now routinely monitoring chief complaints and ICD codes.

For clarity of exposition, we discuss chief complaints and ICD codes separately in this chapter. However, we do not wish to reinforce a somewhat prevalent impression that they are competing alternatives. Both types of data contain information that is useful in biosurveillance and together they are complementary. In the future, we expect that biosurveillance systems will collect both types of data routinely. They will link these data to other data about a patient to support more accurate inference about a patient's true disease state. We explore the future roles of chief complaints and ICD codes and their synergies in the final section of this chapter.

2. CHIEF COMPLAINTS

The concept of a chief complaint is important in medicine. It is a statement of the reason that a patient seeks medical care. Medical and nursing schools teach future clinicians to begin their verbal presentations of patient cases with a statement of the chief complaint. They teach them to record the chief complaint using the patient's words and to avoid replacing the patient's words with their diagnostic interpretation. It is considered bad form to proffer a diagnostic impression in a chief complaint. 1 As a result, the chief complaint usually states the key symptoms that a patient is experiencing.

During the process of medical care, a patient's chief complaint is recorded many times. Triage nurses and registration clerks create the first record at the time of initial registration for service at a clinic or emergency department (ED). Clinicians also record chief complaints in daily progress notes and discharge, transfer, and patient acceptance summary notes.

The research that we will discuss has shown that chief complaints contain information that may be very useful in biosurveillance. This result is not surprising. If a patient is ill with an infectious disease and presents to a physician, we would expect her chief complaint to reflect the nature of the illness.

2.1. Description of Chief Complaint Data Used in Biosurveillance

The recorded chief complaint of most interest to biosurveillance is the one recorded at the time a patient initially presents for medical care. 2 This chief complaint is often recorded directly into a registration computer by triage nurses or registration clerks and is highly available for biosurveillance purposes.

Table 23.1 is a sample of chief complaints from a registration computer in an ED. The chief complaints are more terse (four or five words) than those recorded in physician notes. They also contain misspellings, unconventional abbreviations, and unorthodox punctuation. Only two of these chief complaints contain diagnoses (finger lac and uti, which are abbreviations for finger laceration and urinary tract infection, respectively). The rest describe the patient's symptoms. The second column of the table shows the syndromes that a human expert assigned to the patient for purposes of training a Bayesian natural language processor. We will discuss syndromes shortly.

TABLE 23.1

Examples of Chief Complaints Recorded in an Emergency Department

Chief complaint	CoCo Syndrome
diff breathing	Respiratory
chest pain	Other
abd pain nausea vomiting	Gastrointestinal
Finger lac	Other
resp dist	Respiratory
Fever	Constitutional
nausea diarhea chest tightness sob	Gastrointestinal, Respiratory
chest pain vomiting	Gastrointestinal
r side pain	Other
rectal bleeding walkin	Hemorrhagic
chest pain	Other
Uti	Other
urinary problems	Other
abd pain	Gastrointestinal

Notes: These 14 examples come from a file used to train a Bayesian natural language processing (NLP) program called CoCo (described in Chapter 17). The second column shows the syndromes that a physician assigned to the chief complaints. For clarity, we adopt a typographical convention of italicizing syndromes.

2.1.1. Natural Language Processing of Free Text Chief Complaints

Before chief complaints can be analyzed by computerized biosurveillance systems, they must be converted from English (or other natural language) into computer-interpretable format. Biosurveillance systems typically use natural language processing (NLP) to convert chief complaints into computer-interpretable format. We are aware of one system that takes advantage of a routine translation of chief complaints into computer-interpretable form (Beitel et al., 2004).

There are two basic NLP methods for converting free-text chief complaints into computer-interpretable format—keyword parsing and probabilistic. We discussed these methods in Chapter 17 and will not repeat the discussion here.

The NLP component of a biosurveillance system analyzes a recorded chief complaint to classify a patient into a syndrome category. Some biosurveillance systems use NLP to identify syndromes directly and others use NLP to identify symptoms in the chief complaint and subsequently use Boolean (AND, OR, NOT) or probabilistic combinations of symptoms to assign a syndrome.

The subsequent (non-NLP) analysis performed by a biosurveillance system searches for clusters of syndromes in space, time, and/or demographic strata of a population, as discussed in Part III.

2.1.2. Syndromes

The concept of a syndrome is important in medical care (and in epidemiology). A syndrome is a constellation of symptoms, possibly combined with risk factors and demographic characteristics of patients (e.g., age and gender). Familiar examples of syndromes are SARS (severe acute respiratory syndrome) and AIDS (acquired immune deficiency syndrome). A syndrome plays the same role as a diagnosis in medical care—it guides the physician in selection of treatments for patients.

In this chapter, we will be discussing syndromes such as respiratory that are far less diagnostically precise than SARS or AIDS. The syndromes used in automated analysis of chief complaints and ICD codes are diagnostically imprecise by intent. The developers of these syndromes recognize that chief complaints (and ICD-coded diagnoses obtained close to the time of admission) in general do not contain sufficient diagnostic information to classify a patient as having SARS or another more diagnostically precise syndrome. They create syndrome definitions that are sufficiently precise to be useful, but not so precise that few if any patients will be assigned to them automatically, based solely on information contained in a four- or five-word chief complaint (or ICD code assigned early during the process of medical care).

Table 23.2 is the set of syndromes used by the RODS system. The table shows each syndrome name (which is just a convenient handle to reference the syndrome definition) and its definition. The RODS system uses the syndrome definitions in two ways. First, it makes them available to epidemiologists and other users of the system to assist in interpreting time series and maps of chief complaint data that have been aggregated by syndrome. If the user sees an increase in a syndrome such as respiratory, his interpretation of the increase should be that it could be due to any disease that is consistent with the definition of the respiratory syndrome. Second, RODS provides the definitions to individuals who are developing training sets (discussed in Chapter 17) for the CoCo parser. 3

TABLE 23.2

CoCo Syndrome Definitions

CoCo Syndrome	Definition
Gastrointestinal	Includes pain or cramps anywhere in the abdomen, nausea, vomiting, diarrhea, and abdominal distension or swelling.
Constitutional	Is made up of nonlocalized, systemic problems including fever, chills, body aches, flu symptoms (viral syndrome), weakness, fatigue, anorexia, malaise, irritability, weight loss, lethargy, sweating (diaphoresis), light headedness, faintness and fussiness. Includes all of the “vaguely unwell” terms: doesn't feel well, feels ill, feeling sick or sick feeling, feels bad all over, not feeling right, sick, in pain, poor vital signs. Shaking or shaky or trembling (not chills) are not constitutional but are other (8). However, tremor(s) is neurological (7). Note: cold usually means a URI (cold symptoms; 3), not chills. Weakness, especially localized, is often neurological (7), rather than constitutional.
Respiratory	Includes the nose (coryza) and throat (pharyngitis), as well as the lungs. Examples of respiratory include congestion, sore throat, tonsillitis, sinusitis, cold symptoms, bronchitis, cough, shortness of breath, asthma, chronic obstructive pulmonary disease (COPD), pneumonia, hoarseness, aspiration, throat swelling, pulmonary edema (by itself; if combined with congestive heart failure, it is 8). If both cold symptoms and flu symptoms are present, the syndrome is respiratory. Note: “Sore throat trouble swallowing” is respiratory, not respiratory and botulinic. That is, the difficulty in swallowing is assumed to be an aspect of the sore throat.
Rash	Includes any description of a rash, such as macular, papular, vesicular, petechial, purpuric or hives. Ulcerations are not normally considered a rash unless consistent with cutaneous anthrax (an ulcer with a black eschar). Note: Itch or itchy by itself is not a rash.
Hemorrhagic	Is bleeding from any site except the central nervous system, e.g., vomiting blood (hematemesis), nose bleed (epistaxis), hematuria, gastrointestinal bleeding (site unspecified), rectal bleeding and vaginal bleeding. Bleeding from a site for which we have a syndrome should be classified as hemorrhagic and as the relevant syndrome (e.g., hematochezia is gastrointestinal and hemorrhagic; hemoptysis is respiratory and hemorrhagic). Bleeding from a site for which we have a syndrome should be classified as hemorrhagic only without reference to the relevant syndrome, except hematochezia… hemoptysis. Note: “Spitting up blood” is assumed to be hemoptysis.
Botulinic	Includes ocular abnormalities (diplopia, blurred vision, photophobia), difficulty speaking (dysphonia, dysarthria, slurred speech) and difficulty swallowing (dysphagia).
Neurological	Covers nonpsychiatric complaints which relate to brain function. Included are headache, head pain, migraine, facial pain or numbness, seizure, tremor, convulsion, loss of consciousness, syncope, fainting, ataxia, confusion, disorientation, altered mental status, vertigo, concussion, meningitis, stiff neck, tingling, numbness, cerebrovascular accident (CVA; cerebral bleed), tremor(s), vision loss or blindness (but changed or blurred vision or vision problem is botulinic). Dizziness is constitutional and neurological. Note: headache can be constitutional in some contexts, for example, “headache cold sxs achey” or “headache flu sxs.”
Other	Is a pain or process in a system or area we are not monitoring. For example, flank pain most likely arises from the genitourinary system, which we are not modeling, and would be considered other. Chest pain with no mention of the source of the pain is considered other (e.g., chest pain [other] versus pleuritic chest pain [respiratory]). Earache or ear pain is other. Trauma is other. Hepatic encephalopathy (not neurological), dehydration (not constitutional), difficulty sleeping or inability to sleep (not constitutional), constipation (not constitutional), and choking (but aspiration is respiratory) are all other.

Note: A physician or other medical expert refers to these definitions when creating training examples for the CoCo Bayesian classifier (described in Chapter 17).

Tables 23.3 and 23.4 are syndrome classification systems used by the Maryland Department of Health and Mental Hygiene and the New York City Department of Health and Mental Hygiene, respectively.

TABLE 23.3

Syndrome Classification System Used by Maryland Department of Health and Mental Hygiene

Syndrome
Death
Gastrointestinal
Neurologic
Rash
Respiratory
Sepsis
Unspecified
Other

From Sniegoski, C. (2004). Automated syndromic classification of chief complaint records. Johns Hopkins University APL Technical Digest 25:68-75, with permission.

TABLE 23.4

Syndrome Classification System Used by New York City Department of Health and Mental Hygiene

Syndrome	Includes	Excludes
Common cold	Nasal drip, congestion, stuffiness	Chest congestion, sore throat
Sepsis	Sepsis, cardiac arrest, unresponsive, unconscious, dead on arrival
Respiratory	Cough, shortness of breath, difficulty breathing, croup, dyspnea, bronchitis, pneumonia, hypoxia, upper respiratory illness, chest congestion	Cold
Diarrhea	Diarrhea, enteritis, gastroenteritis, stomach virus
Fever	Fever, chills, flu, viral syndrome, body ache and pain, malaise	Hay fever
Rash	Vesicles, chicken pox, folliculitis, herpes, shingles	Thrush, diaper and genital rash
Asthma	Asthma, wheezing, reactive airway, chronic obstructive airway disease
Vomiting	Vomiting, food poisoning

Heffernan, R., Mostashari, F., Das, D., et al. (2004b). Syndromic surveillance in public health practice, New York City. Emerg Infect Dis 10:858-64. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15200820 with permission.

A final subtle point about the definition of syndromes applies to both the chief complaint syndrome definitions and the ICD-code sets described later: The field of artificial intelligence distinguishes between intensional and extensional definitions. Tables 23.2 and 23.4 are intensional definitions of syndrome categories. They express the intent of the system designer. The extensional definition of each syndrome is the actual set of chief complaints that NLP parsers, trained or configured based on the intensional definitions, assign to the categories.

2.1.3. Symptoms

Some researchers have divided the one-step chief-complaint-to-syndrome assignment process into two steps. Rather than using NLP to assign a syndrome directly to a patient based on the chief complaint, they use NLP to find all of the symptoms embedded in the chief complaint. They define a syndrome as a Boolean or probabilistic combination of symptoms. The two-step process then is: (1) NLP extracts symptoms from the chief complaint, and (2) a non-NLP process determines whether the symptoms satisfy a Boolean or probabilistic syndrome definition. As an example, consider the chief complaint “n/v/d.” With a two-step process, an NLP system would extract three symptoms from the chief complaint: nausea, vomiting, and diarrhea. Any syndrome definition that required nausea, vomiting, or diarrhea would be satisfied by this chief complaint. Biosurveillance systems operated by Washington State, New York State, and those using the CoCo classifier translate free text directly to syndromes; ESSENCE, SyCo2, and MPLUS classify to symptoms first.

The two-step approach has several potential advantages. It is more natural for epidemiologists and physicians, who conceive of a syndrome as a combination of symptoms. In fact, case definitions (discussed in Chapter 3) are Boolean combinations of symptoms. Additionally, it is possible to create a new syndrome definition “on the fly” without retraining a Bayesian classifier or restructuring lists of keywords. To do this, one simply defines a Boolean or probabilistic combination of symptoms. This advantage is important because new syndromes emerge with regularity, so it is important to be able to create a new syndrome definition quickly. Human biology changes very slowly, so new symptoms do not occur and the NLP conversion from free-text to symptom will be relatively stable, except as the language patients and triage nurses use to record chief complaints slowly evolves (e.g., the first time a patient uttered “I think I have SARS”).

A limitation of the two-step approach is that it has not been validated. A real concern is that users of such systems can define syndromes for which the system's sensitivity and specificity may be extremely poor. A user may create a syndromic definition that is rational from an epidemiological standpoint but is not well-suited to the input data being classified. Without deep knowledge of the underlying processing method and its assumptions, a user will be completely unaware of this phenomenon and may be falsely reassured by the absence of disease activity when the newly created syndrome is put into operational use. As an extreme example of this problem, consider that a user might define a syndrome as a Boolean conjunction (AND statement) of five symptoms. Since the average registration chief complaint comprises four words, it is almost inconceivable that any patient would match such a syndrome definition. 4
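The two-step process described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of ESSENCE, SyCo2, MPLUS, or any other system named above: the lexicon, the helper names (extract_symptoms, any_of, all_of, classify), and the syndrome definitions are all hypothetical, and the "NLP" step is reduced to a keyword lookup.

```python
import re

# Hypothetical token-to-symptom lexicon; the entries are illustrative only
# and are not taken from any deployed biosurveillance system.
SYMPTOM_LEXICON = {
    "n": "nausea", "nausea": "nausea",
    "v": "vomiting", "vomiting": "vomiting",
    "d": "diarrhea", "diarrhea": "diarrhea", "diarhea": "diarrhea",
    "sob": "shortness of breath",
    "cough": "cough",
    "fever": "fever",
}


def extract_symptoms(chief_complaint):
    """Step 1: map every recognized token in the chief complaint to a symptom."""
    tokens = re.split(r"[^a-z]+", chief_complaint.lower())
    return {SYMPTOM_LEXICON[t] for t in tokens if t in SYMPTOM_LEXICON}


def any_of(*symptoms):
    """Boolean OR over symptoms."""
    return lambda found: any(s in found for s in symptoms)


def all_of(*symptoms):
    """Boolean AND over symptoms."""
    return lambda found: all(s in found for s in symptoms)


# Step 2: syndromes as Boolean combinations of symptoms (assumed definitions).
SYNDROME_DEFINITIONS = {
    "gastrointestinal": any_of("nausea", "vomiting", "diarrhea"),
    "respiratory": any_of("cough", "shortness of breath"),
}


def classify(chief_complaint, definitions=SYNDROME_DEFINITIONS):
    """Return the names of all syndrome definitions the complaint satisfies."""
    found = extract_symptoms(chief_complaint)
    return sorted(name for name, rule in definitions.items() if rule(found))


# A new syndrome can be added "on the fly" without touching step 1.
SYNDROME_DEFINITIONS["febrile respiratory"] = lambda found: (
    "fever" in found and any_of("cough", "shortness of breath")(found)
)

# An over-specific conjunction of five symptoms, as in the extreme example above.
over_specific = all_of("nausea", "vomiting", "diarrhea", "fever", "cough")

print(classify("n/v/d"))
print(over_specific(extract_symptoms("n/v/d")))
```

In this sketch the complaint “n/v/d” satisfies the gastrointestinal definition, while the five-symptom conjunction is not satisfied by it. A definition of that kind would rarely if ever fire on four-word registration complaints, which is the silent failure mode the preceding paragraph warns about.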

2.2. Availability and Time Latencies of Registration Chief Complaints

The chief complaints recorded at the time of registration are among the earliest data available electronically from a patient's interaction with the healthcare system. They are typically recorded before a doctor sees a patient. If the ED or a clinic is busy, many hours may pass before a registered patient is seen by a clinician.

The time latency between recording of a chief complaint and its availability to a biosurveillance organization can range from seconds to days, depending on whether the data collection system utilizes the HL7-messaging capability of a healthcare system for real-time communication or batch transfer of files (Figure 23.2). Hospitals are frequently capable of real-time transmission whereas office practices are not, unless they are associated with a larger organization (e.g., the Veterans Administration, U.S. military, or a large healthcare system).

FIGURE 23.2

Comparison of time latencies of real-time and batch feeds. The negative time latencies associated with the real-time feed are due to slight clock differences between the biosurveillance system and the ED registration system.

Real-time transmission is possible when a healthcare system has a pre-existing Health Level 7 (HL7) messaging capability. Several publications describe the technical approach to HL7-based data collection and chief-complaint processing (Tsui et al., 2002, 2003; Gesteland et al., 2002, 2003; Olszewski, 2003a). Briefly, when a patient registers for care at an ED, a triage nurse or registration clerk enters the chief complaint into a registration system. This step is part of normal workflow in many U.S. hospitals (Travers et al., 2003). The registration system almost always transmits chief complaints in the form of HL7 messages (Tsui et al., 2003) to an HL7-message router located in the healthcare system. To transmit these data to a biosurveillance organization, the healthcare system would configure the HL7-message router to de-identify these messages and transmit them via the Internet to a biosurveillance organization as they are received from the registration system. This configuration process is a native capability of commercial HL7-message routers and it is a routine task for an HL7 engineer or other information technology staff working in or for a healthcare system.
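The de-identification and extraction step can be illustrated with a small Python sketch. It is a simplified illustration, not the configuration of any commercial HL7-message router. The sample message is fabricated, the choice of PV2-3 (Admit Reason) as the location of the chief complaint is an assumption (the field used varies among registration systems), and the list of PID fields to blank is likewise an assumption. The function names are hypothetical.

```python
# Fabricated HL7 v2 registration message, for illustration only.
# Segments are separated by carriage returns and fields by "|".
SAMPLE_ADT = "\r".join([
    "MSH|^~\\&|REG|HOSP_A|BIOSURV|HEALTH_DEPT|20050901120000||ADT^A04|MSG0001|P|2.3",
    "PID|1||123456^^^HOSP_A||DOE^JANE||19700101|F|||1 MAIN ST^^PITTSBURGH^PA^15213",
    "PV1|1|E",
    "PV2|||^abd pain nausea vomiting",
])

# Assumed set of identifying PID fields to blank: patient ID (3), name (5),
# date of birth (7), address (11), phone (13), SSN (19).
IDENTIFYING_PID_FIELDS = (3, 5, 7, 11, 13, 19)


def deidentify(message):
    """Blank the identifying PID fields and return the message otherwise unchanged."""
    out = []
    for line in message.strip().split("\r"):
        fields = line.split("|")
        if fields[0] == "PID":
            for n in IDENTIFYING_PID_FIELDS:
                if n < len(fields):
                    fields[n] = ""
        out.append("|".join(fields))
    return "\r".join(out)


def chief_complaint(message):
    """Return the free-text chief complaint, assumed here to be in PV2-3."""
    for line in message.strip().split("\r"):
        fields = line.split("|")
        if fields[0] == "PV2" and len(fields) > 3:
            # PV2-3 is a coded element: identifier ^ text. Prefer the text part.
            components = fields[3].split("^")
            if len(components) > 1 and components[1]:
                return components[1]
            return components[0]
    return ""


clean = deidentify(SAMPLE_ADT)
print(chief_complaint(clean))
```

A production feed would usually retain some derived demographic data (for example, age computed from the date of birth) before the identifying fields are removed. That refinement is omitted here.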

Batch transfer can be either automatic or manual. Automatic means that a computer program periodically queries the registration computer (or other system in which the chief complaint data are stored) for recent registrations, writes a file, and transmits the file to the biosurveillance organization via the Internet. The transmission may use a secure file transfer protocol, a nonsecure file transfer protocol, a web transfer protocol, or PHIN MS. 5 Manual means that someone working in the healthcare system must run a query and attach the results of the query to an email to the biosurveillance organization, or upload the file to a computer in the biosurveillance organization. In the past, manual data transfer often involved faxing of paper log files.

Tsui et al. (2005) studied the time latencies, data loss, and reliability associated with real-time HL7 feeds and batch feeds. Figure 23.2 compares the distribution of time delays between the time that a chief complaint was recorded during registration and its receipt by a biosurveillance system. The median time delay was 0.033 hours (about 2 minutes) for a real-time feed and 23.95 hours for a batch feed.
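The latency measure reported above is the difference between two timestamps, summarized by its median. The following Python sketch shows one way to compute it. The timestamps are fabricated for illustration and are not data from Tsui et al. (2005), and the function names are hypothetical.

```python
from datetime import datetime
from statistics import median


def latency_hours(recorded, received):
    """Delay between registration and receipt, in hours.

    The value can be slightly negative when the clocks of the registration
    system and the biosurveillance system are not synchronized.
    """
    return (received - recorded).total_seconds() / 3600.0


def median_latency_hours(pairs):
    """Median delay over (recorded, received) timestamp pairs."""
    return median(latency_hours(recorded, received) for recorded, received in pairs)


# Fabricated real-time feed sample: delays of 2, 1, and 3 minutes.
REAL_TIME_SAMPLE = [
    (datetime(2005, 9, 1, 12, 0), datetime(2005, 9, 1, 12, 2)),
    (datetime(2005, 9, 1, 12, 10), datetime(2005, 9, 1, 12, 11)),
    (datetime(2005, 9, 1, 12, 20), datetime(2005, 9, 1, 12, 23)),
]

print(round(median_latency_hours(REAL_TIME_SAMPLE), 3))
```

The median is used rather than the mean because a few very late messages (for example, after a network outage) would otherwise dominate the summary.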

FIGURE 23.3

Detection of outbreaks in pediatric population from chief-complaint analysis, Utah 1998–2001.

Based on our experience, approximately 84% of U.S. hospitals appear to be capable of sending a real-time HL7 feed. Table 23.5 summarizes our experience with hospitals in the United States.

TABLE 23.5

Numbers of Hospitals Using Real-Time Versus Batch Connections to the RODS System, September 2005

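Jurisdiction (project inception)	Healthcare facilities, real time	Healthcare facilities, batch	Location of server
Pennsylvania (1999)	123	2	Solaris/Linux/Oracle (running at the RODS Public Health Data Center, University of Pittsburgh)
Utah (2002)	26	0	Solaris/Linux/Oracle (running at the RODS Public Health Data Center, University of Pittsburgh)
Ohio (2004)	50	14	Solaris/Linux/Oracle (running at the RODS Public Health Data Center, University of Pittsburgh)
Kentucky (2005)	0	5	Solaris/Linux/Oracle (running at the RODS Public Health Data Center, University of Pittsburgh)
Nevada (2005)	3	0	Solaris/Linux/Oracle (running at the RODS Public Health Data Center, University of Pittsburgh)
Atlantic City NJ (2004)	3	0	Solaris/Linux/Oracle (running at the RODS Public Health Data Center, University of Pittsburgh)
California (2005)	1	1	Solaris/Linux/Oracle (running at the RODS Public Health Data Center, University of Pittsburgh)
Illinois (2005)	2	0	Solaris/Linux/Oracle (running at the RODS Public Health Data Center, University of Pittsburgh)
Kentucky (2005)	0	5	Solaris/Linux/Oracle (running at the RODS Public Health Data Center, University of Pittsburgh)
Michigan (2005)	2	0	Solaris/Linux/Oracle (running at the RODS Public Health Data Center, University of Pittsburgh)
Los Angeles (2004)	1	4	Los Angeles County DOH
Houston TX (2004)	14	2	Houston DOH
Dallas Fort Worth Area TX (2004)	16	14	Tarrant County DOH
El Paso TX (2004)	2	0	El Paso
Totals	243	47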

Note: Many of these projects are statewide or citywide efforts that have an objective to connect every hospital to the health department. We provide the year of project inception to indicate the rate at which biosurveillance organizations are able to develop chief complaint data feeds from hospitals. DOH, Department of Health; RODS, Real-Time Outbreak and Disease Surveillance Laboratory, Center for Biomedical Informatics, University of Pittsburgh.
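As a check on the arithmetic, and assuming that the 84% figure quoted in the text is derived from the totals row of Table 23.5 (the text does not say so explicitly), the proportion of connections that are real time works out as follows:

```latex
\[
\frac{\text{real time}}{\text{real time} + \text{batch}}
  = \frac{243}{243 + 47}
  = \frac{243}{290}
  \approx 0.838 \approx 84\%
\]
```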

2.3. Studies of Informational Value of Chief Complaints

Researchers have studied the ability of algorithms to detect syndromes and outbreaks of disease from chief complaints. These studies contribute to our understanding of the informational value of chief complaints for the detection of cases and of outbreaks. The studies utilized experimental methods that we discussed in Chapter 21. In this section, we review these studies and discuss how they address the following three hypotheses of interest:

Hypothesis 1: A chief complaint can discriminate between whether a patient has syndrome or disease X or not (stated as the null hypothesis: the chief complaint cannot discriminate).

Hypothesis 2: When aggregated with the chief complaints of other patients in a region, chief complaints can discriminate between whether there is an outbreak of type Y or not.

Hypothesis 3: When aggregated with the chief complaints of other patients in a region, algorithmic monitoring of chief complaints can detect an outbreak of type Y earlier than current best practice (or some other reference method).

It is important to note that the experiments that we will discuss differ in many details. They differ in the hypothesis being tested; the syndrome or type of outbreak studied; the NLP method; the detection algorithm; and the reference standard used in the experiment. Thus, achieving a “meta-analytic” synthesis about the informational value and role of chief complaints in biosurveillance requires that we pay attention to these distinctions. The one thing these studies share in common, however, is that they are all studies of the informational value of chief complaints.

2.3.1. Detection of Cases

Table 23.6 summarizes the results of studies that are informative about Hypothesis 1: A chief complaint can discriminate between whether a patient has syndrome or disease X or not.

TABLE 23.6

Performance of Bayesian and other classifiers in detecting syndromes

Classifier Being Tested	Reference Standard for Comparison	Sensitivity (95% CI)	Specificity (95% CI)	Positive Likelihood Ratio (95% CI)	Negative Likelihood Ratio (95% CI)
Respiratory Syndrome
Chief Complaint Bayesian Classifier Respiratory (CCBC) a	Utah Department of Health (UDOH) Respiratory with fever	0.52 (0.51–0.54)	0.89 (0.89–0.90)	5.0 (4.74–5.22)	0.54 (0.52–0.56)
CCBC b	Human review of ED reports	0.77 (0.59–0.88)	0.90 (0.88–0.92)	7.9 (5.8–10.8)	0.26 (0.13–0.49)
CCBC a	Utah ICD-9 list	0.60 (0.59–0.62)	0.94 (0.94–0.95)	10.45 (9.99–10.96)	0.25 (0.13–0.49)
Manual Assignment c	Human review of ED reports for Pediatric respiratory illness	0.47 (0.38–0.55)	0.99 (0.97–0.99)	56.73 (18.12–177.59)	0.53 (0.46–0.63)
CCBC d	ICD-9 list	0.63 (0.63–0.64)	0.94 (0.94–0.94)	11.14 (11.00–11.30)	0.39 (0.39–0.40)
CCBC e	Human review of ED reports	0.34 (0.30–0.38)	0.98 (0.97–0.99)	18.0 (11.24–28.82)	0.67 (0.63–0.71)
CCBC Respiratory and Keyword Fever e	Human review of ED reports for Febrile respiratory	0.02 (0.01–0.04)	0.99 (0.99–1.0)	20.83 (2.18–199.28)	0.99 (0.97–1.0)
Gastrointestinal (GI) Syndrome
CCBC GI a	UDOH Gastroenteritis without blood	0.71 (0.69–0.74)	0.90 (0.90–0.90)	7.34 (6.98–7.72)	0.32 (0.29–0.35)
CCBC GI f	Human review of ED reports for Acute infectious GI	0.63 (0.35–0.85)	0.94 (0.92–0.96)	7.77 (4.77–12.65)	0.40 (0.20–0.80)
CCBC a	Utah ICD-9 list	0.74 (0.72–0.76)	0.92 (0.92–0.92)	9.5 (9.04–9.94)	0.28 (0.26–0.30)
CCBC d	ICD-9 list	0.69 (0.68–0.70)	0.96 (0.96–0.96)	15.70 (15.46–15.95)	0.32 (0.32–0.33)
CCBC e	Human review of ED reports	0.22 (0.16–0.29)	0.90 (0.88–0.91)	2.09 (1.51–2.91)	0.87 (0.80–0.95)
CCBC GI and Keyword Fever e	Human review of ED reports for Febrile GI Syndrome	0.04 (0.02–0.08)	0.99 (0.99–1.0)	60.82 (7.65–483.45)	0.96 (0.93–0.99)
Neurologic/Encephalitic
CCBC Neurologic a	UDOH Meningitis/Encephalitis	0.47 (0.32–0.63)	0.93 (0.93–0.94)	4383.26 g (1394.21–13780.56)	0.53 (0.39–0.72)
CCBC a	Utah ICD-9 list	0.72 (0.69–0.76)	0.95 (0.94–0.95)	13.5 (12.57–14.41)	0.29 (0.26–0.33)
CCBC d	ICD-9 list	0.68 (0.67–0.69)	0.93 (0.93–0.93)	9.25 (9.076–9.418)	0.35 (0.34–0.36)
CCBC e	Human review of ED reports	0.31 (0.27–0.35)	0.97 (0.95–0.98)	8.89 (6.25–12.65)	0.72 (0.68–0.76)
CCBC Neurologic and Keyword Fever e	Human review of ED reports for Febrile Neurologic	0.03 (0.01–0.07)	0.99 (0.99–1.0)	12.79 (2.89–56.59)	0.98 (0.95–1.0)
Constitutional Syndrome
CCBC d	ICD-9 list	0.46 (0.45–0.47)	0.97 (0.97–0.98)	13.65 (13.30–14.0)	0.56 (0.55–0.57)
CCBC e	Human review of ED reports	0.27 (0.23–0.32)	0.95 (0.93–0.96)	5.12 (3.82–6.85)	0.77 (0.72–0.82)
Rash syndrome
CCBC Rash a	UDOH Febrile illness with rash	0.50 (0.40–0.59)	0.99 (0.99–0.99)	55.6 (44.25–69.91)	0.51 (0.42–0.61)
CCBC a	Utah ICD-9	0.60 (0.52–0.67)	0.99 (0.99–0.99)	80.9 (67.43–97.07)	0.40 (0.33–0.49)
CCBC d	ICD-9 list	0.47 (0.45–0.49)	0.99 (0.99–0.99)	65.25 (61.79–68.90)	0.54 (0.52–0.56)
CCBC e	Human review of ED reports	0.31 (0.24–0.39)	0.99 (0.98–1.0)	34.01 (18.76–61.68)	0.70 (0.62–0.78)
CCBC Rash and Keyword Fever e	Human review of ED reports for Febrile rash	0.12 (0.05–0.27)	1.0 (0.99–1.0)	h	0.88 (0.78–1.0)
Hemorrhagic Syndrome
CCBC d	ICD-9 list	0.75 (0.74–0.76)	0.99 (0.98–0.99)	49.01 (47.79–50.25)	0.25 (0.24–0.26)
CCBC e	Human review of ED reports	0.39 (0.34–0.44)	0.99 (0.98–0.99)	36.61 (20.96–63.93)	0.62 (0.57–0.68)
CCBC Hemorrhagic and Keyword Fever e	Human review of ED reports for Febrile hemorrhagic	0.0 (0–0.07)	1.0 (1.0–1.0)	h	1.0 (1.0–1.0)
Botulinic syndrome
CCBC a	UDOH Botulism-like	0.17 (0.05–0.45)	0.998 (0.998–0.999)	104.45 (28.57–381.86)	0.83 (0.64–1.07)
CCBC a	Utah ICD-9 list	0.22 (0.13–0.36)	0.999 (0.998–0.999)	166.94 (89.07–312.90)	0.78 (0.67–0.91)
CCBC d	ICD-9 list	0.30 (0.28–0.32)	0.99 (0.99–0.99)	44.26 (41.06–47.70)	0.70 (0.68–0.72)
CCBC e	Human review of ED reports	0.10 (0.06–0.17)	0.99 (0.99–1.0)	10.96 (5.11–23.48)	0.91 (0.86–0.97)
Fever
Keyword g	Human review of ED reports	0.61 (0.51–0.69)	1.0 (0.96–1.0)	h	0.39 (0.31–0.49)

a From Wagner, M., Espino, J., Tsui, F.-C., et al. (2004). Syndrome and outbreak detection from chief complaints: the experience of the Real-Time Outbreak and Disease Surveillance Project. MMWR Morb Mortal Wkly Rep 53(Suppl):28–31, with permission.

b From Chapman, W. W., Espino, J. U., Dowling, J. N., et al. (2003). Detection of Acute Lower Respiratory Syndrome from Chief Complaints and ICD-9 Codes. Technical Report, CBMI Report Series 2003. Pittsburgh, PA: Center for Biomedical Informatics, University of Pittsburgh, with permission.

c From Beitel, A. J., Olson, K. L., Reis, B. Y., et al. (2004). Use of emergency department chief complaint and diagnostic codes for identifying respiratory illness in a pediatric population. Pediatr Emerg Care 20:355–60, with permission.

d From Chapman, W. W., Dowling, J. N., and Wagner, M. M. (2005). Classification of emergency department chief complaints into seven syndromes: a retrospective analysis of 527,228 patients. Ann Emerg Med 46(5):445–455.

e From Chapman WW, unpublished results.

f From Ivanov, O., Wagner, M. M., Chapman, W. W., et al. (2002). Accuracy of three classifiers of acute gastrointestinal syndrome for syndromic surveillance. In: Proceedings of American Medical Informatics Association Symposium, 345–9, with permission.

g From Chapman, W. W., Dowling, J. N., Wagner, M. M. (2004). Fever detection from free-text clinical records for biosurveillance. J Biomed Inform 2004;120–7, with permission.

g Large positive likelihood ratio due to specificity of 0.9999.

h Not able to calculate (denominator is zero). CCBC, chief complaint Bayesian classifier; CI, confidence interval.

Methodologically, the studies measure the sensitivity and specificity with which different NLP methods (which we refer to as “classifiers”) identify patients with a variety of syndromes using only the recorded chief complaints. The reference syndrome (“gold standard”) for patients in these studies was developed by physician review of narrative medical records, such as ED reports, or automatically from ICD-9 primary discharge diagnoses. Most studies have evaluated detection of syndromes in adults, whereas a single study examined detection of syndromes in pediatric patients (Beitel et al., 2004).

Table 23.6 groups the experiments by syndrome because many experiments studied the same or similar syndromes. Each row in Table 23.6 reports the sensitivity and specificity of a classifier for a particular syndrome, and the likelihood ratio positive and negative. The likelihood ratio positive is the purest measure of the informational content of a chief complaint for detecting a syndrome (i.e., its ability to discriminate between a person with the syndrome and one without the syndrome). In a Bayesian analysis, it is a number that indicates the degree to which a system should update its belief that a patient has the syndrome, given the chief complaint (see Chapter 13).
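The relationship between the columns of Table 23.6 can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the sensitivity and specificity (0.60 and 0.95) are the approximate summary values quoted later in this section, and the 5% prior is an assumed value chosen for the example.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) computed from sensitivity and specificity.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    A ratio whose denominator is zero is returned as None, mirroring the
    "not able to calculate" entries in the table.
    """
    lr_pos = None if specificity >= 1.0 else sensitivity / (1.0 - specificity)
    lr_neg = None if specificity <= 0.0 else (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def posterior_probability(prior, likelihood_ratio):
    """Update a prior probability with a likelihood ratio (odds form of Bayes' rule)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Approximate summary values for respiratory or gastrointestinal syndromes.
lr_pos, lr_neg = likelihood_ratios(0.60, 0.95)

# Assumed (hypothetical) prior: 5% of ED patients have the syndrome.
posterior = posterior_probability(0.05, lr_pos)
```

Under these assumptions the likelihood ratio positive is about 12, and a chief complaint classified as positive raises the probability of the syndrome from 5% to roughly 39%.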

The gold standard used in these studies varied. The most valid standard used was classification based on review of patients' ED reports using random selection of patients. The earliest studies evaluating the ability of chief complaints to identify syndromes were able to use this gold standard because they studied common syndromes, such as respiratory (Chapman et al., 2003, Beitel et al., 2004) or gastrointestinal (Ivanov et al., 2002). When a syndrome is common, a pool of randomly selected patients will produce a sufficient sample of actual cases of that syndrome.

Later studies examined less common syndromes. To obtain a sufficient sample of patients with uncommon syndromes, researchers searched ICD-9 discharge diagnoses to find cases (Wagner et al., 2004, Chapman et al., 2005b, Mundorff et al., 2004). Using a patient's discharge diagnosis as the gold standard enabled these studies to acquire large numbers of patients—even for rare syndromes, such as botulinic. Chart review, however, probably provides more accurate gold standard classifications than ICD-9 codes (Chang et al., 2005).

A few recent studies have used chart review as the gold standard for evaluating a variety of syndromes, including syndromes of low prevalence (Chang et al., 2005, Chapman et al., 2005c). One study compared chief complaint classification during the 2002 Winter Olympic Games against gold-standard classification of potentially positive cases selected by Utah Department of Health employees who performed drop-in surveillance (Wagner et al., 2004).

Chapman et al. (2005c) used ICD-9 searching to find a set of patients with discharge diagnoses of concern in biosurveillance. Physicians then reviewed ED reports for each of the cases to finalize a reference syndrome assignment. Using ICD-9 codes to select patients made it possible to use chart review on a fairly small sample of patients while still acquiring a reasonably sized set of patients for seven different syndromes.

An important issue is whether the same classification accuracy observed in a study of chief complaints from hospital X will be observed for chief complaints from hospital Y. Levy et al. (2005) showed that classification accuracy of a keyword-based parser differed from hospital to hospital for gastrointestinal syndrome. Chapman et al. (2005b), however, showed that the classification accuracy of a Bayesian chief complaint classifier was no different when it was used on a set of chief complaints from a geographic region other than the one that it had been trained on.

There are a number of studies in the literature that we did not include in Table 23.6 because they measured the sensitivity and specificity of an NLP program's syndrome assignment relative to a physician who is classifying a patient only from the chief complaint (Chapman et al., 2005a, Olszewski, 2003a, Sniegoski, 2004). These studies report much higher sensitivities and specificities than those in Table 23.6. These studies represent formative studies of NLP algorithms. The accuracy of syndrome classification should always be measured relative to the actual syndrome of the patient as determined by a method at least as rigorous as medical record review or discharge diagnoses when accepting or rejecting Hypothesis 1 for a syndrome under study.

In summary, the experiments in Table 23.6, although somewhat heterogeneous methodologically, are similar enough to be considered meta-analytically. They made the same measurements (sensitivity and specificity), studied similar syndromes, used simple techniques for classifying chief complaints into syndromic categories, and used similar gold standards.

With respect to Hypothesis 1, these experiments demonstrate that:

Chief complaint data contain information about syndromic presentations of patients and various NLP techniques including a naïve Bayesian classifier and keyword methods can extract that information.

For syndromes that are at the level of diagnostic precision of respiratory or gastrointestinal it is possible to automatically classify ED patients (both pediatric and adult) from chief complaints with a sensitivity of approximately 0.60 and a specificity of approximately 0.95.

Sensitivity of classification is better for some syndromes than for others.

When syndromes are more diagnostically precise (e.g., respiratory with fever), the discrimination ability declines quickly.

The specificity of syndrome classification from chief complaints is less than 100%, meaning that daily aggregate counts will have false positives among them due to falsely classified patients. 6
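The effect of imperfect specificity on daily aggregate counts can be illustrated with simple arithmetic. The sketch below is not taken from any of the studies; the function name, the daily visit volume (1,000), and the prevalence (10%) are hypothetical, while the sensitivity and specificity are the approximate values quoted above.

```python
def expected_daily_counts(n_visits, prevalence, sensitivity, specificity):
    """Return the expected (true positives, false positives) in one day's counts."""
    true_cases = n_visits * prevalence
    non_cases = n_visits - true_cases
    true_positives = sensitivity * true_cases
    false_positives = (1.0 - specificity) * non_cases
    return true_positives, false_positives


# Hypothetical day: 1,000 ED visits, 10% of which truly have the syndrome.
true_pos, false_pos = expected_daily_counts(1000, 0.10, 0.60, 0.95)
```

Under these assumptions a daily count of about 105 would contain roughly 60 correctly classified patients and 45 falsely classified ones, so the baseline against which an outbreak must be detected is substantially inflated by false positives.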

2.3.2. Detection of Outbreaks

This section describes studies that address Hypotheses 2 and 3, which we reproduce here for convenient reference:

Hypothesis 2: When aggregated with the chief complaints of other patients in a region, chief complaints can discriminate between whether there is an outbreak of type Y or not.

As discussed in Chapter 20, studies of outbreak detection are more difficult to conduct than studies of syndrome (case) detection. These studies require chief complaint data collected from multiple healthcare facilities in a region affected by an outbreak. Research groups often expend significant time and effort building biosurveillance systems to collect the chief complaint data needed to conduct this type of research. Because outbreaks are rare events, many of the studies we will discuss are of common diseases such as influenza or of seasonal outbreaks (winter) that may be caused by multiple organisms.

Adequate sample size (of outbreaks) is difficult to achieve in studies of outbreak detection. Only one of the studies computed a confidence interval on its measurements of sensitivity or time of outbreak detection. The scientific importance of adequate sample size cannot be overstated. There are two possible approaches to increasing sample size: (1) conduct research in biosurveillance systems that span sufficiently large geographic regions that they are expected to encounter sufficient numbers of outbreaks, or (2) meta-analysis. To emphasize the importance of adequate sample size, we divide this section into studies of multiple outbreaks (labeled N>1), prospective studies, and studies of single outbreaks (N=1).

N>1 Studies.

Ivanov and colleagues used the detection-algorithm method and correlation analysis to study detection of six seasonal outbreaks in children from CoCo classifications of patients into respiratory and gastrointestinal based on chief complaints (Ivanov et al., 2003). They studied the daily visits to the ED of a pediatric hospital during annual winter outbreaks due to diseases such as rotavirus gastroenteritis and influenza for the four-year period 1998–2001.

The researchers identified outbreaks for study using the following procedure: They created two ICD-9 code sets, corresponding to infectious diseases of children that are respiratory or gastrointestinal, and used them to create two reference time series from ICD-9-coded hospital discharge diagnoses. Figure 23.3 (top) from the publication is a plot of the daily respiratory time series and the reference time series of respiratory disease created from hospital discharge diagnoses of children under age five. Figure 23.3 (bottom) is a similar comparison of gastrointestinal time series and the reference time series of infectious gastrointestinal conditions.

The detection algorithm (exponentially weighted moving average) identified three respiratory and three gastrointestinal outbreaks in the hospital discharge data (the reference standard for outbreaks). The detection from chief complaints preceded detection from automatic analysis of hospital discharge diagnoses by a mean of 10.3 days (95% confidence interval [CI], −15.15 to 35.5) for respiratory outbreaks and 29 days (95% CI, 4.23 to 53.7) for gastrointestinal outbreaks (Table 23.7). The researchers used the date of admission rather than the date of discharge in constructing the reference time series.
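For readers unfamiliar with the exponentially weighted moving average (EWMA), the following is a minimal sketch of an EWMA detector applied to daily counts. It is not the implementation used by Ivanov and colleagues: the smoothing weight `lam`, the 28-day trailing baseline, and the threshold multiplier `k` are all assumed values chosen for illustration.

```python
from statistics import mean, stdev


def ewma_alarms(counts, lam=0.4, baseline_days=28, k=3.0):
    """Return the day indices on which the EWMA of `counts` exceeds its control limit.

    The control limit is the trailing-baseline mean plus k standard deviations
    of the EWMA statistic, using the asymptotic variance factor lam / (2 - lam).
    """
    alarms = []
    ewma = None
    for t, x in enumerate(counts):
        # Smooth today's count into the running EWMA.
        ewma = float(x) if ewma is None else lam * x + (1.0 - lam) * ewma
        if t < baseline_days:
            continue  # not enough history to estimate a baseline yet
        window = counts[t - baseline_days:t]
        mu, sd = mean(window), stdev(window)
        sigma_ewma = sd * (lam / (2.0 - lam)) ** 0.5
        if sd > 0 and ewma > mu + k * sigma_ewma:
            alarms.append(t)
    return alarms


# Hypothetical series: four quiet weeks followed by a sharp rise.
quiet = [10, 12, 9, 11] * 10
outbreak = [10, 12, 9, 11] * 7 + [30, 35, 40]
```

Timeliness in a study of this kind is then the difference between the first alarm date in the chief complaint series and the first alarm date in the reference series.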

TABLE 23.7

Detection Algorithm Analysis of Timeliness of Detection from Chief Complaints

Syndrome	Gold Standard Outbreak	Sensitivity	Specificity	Timeliness (95% CI)
Respiratory	Seasonal outbreaks of pediatric respiratory illness (bronchiolitis, P&I)	100%	100%	10 days (–15–35)
Gastrointestinal	Seasonal outbreaks of pediatric gastrointestinal illness (rotavirus gastroenteritis)	100%	100%	29 days (4–53)

CI, confidence interval; P&I, pneumonia and influenza.

The correlation analysis of three respiratory outbreaks showed that on average the chief complaint time series was 7.4 days earlier (95% CI, −8.34 to 43.3), although the 95% CI included zero. For the three gastrointestinal outbreaks, the chief complaint time series was 17.6 days earlier (95% CI, 3.4 to 46.7).
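A correlation analysis of this kind asks by how many days one time series must be shifted to best match the other. The sketch below shows one common way to estimate that lead, by maximizing the Pearson correlation over a range of lags. It is an illustration under assumptions, not the published method: the function names, the lag range, and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_lead_days(chief, reference, max_lag=14):
    """Return (lag, r): the lag at which chief[t] best correlates with reference[t + lag].

    A positive lag means the chief complaint series leads the reference series.
    """
    best_lag, best_r = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            pairs = list(zip(chief, reference[lag:]))
        else:
            pairs = list(zip(chief[-lag:], reference))
        if len(pairs) < 3:
            continue
        x, y = zip(*pairs)
        r = pearson(x, y)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Toy data: the same bump appears five days earlier in the chief complaint series.
reference = [0] * 20 + [1, 3, 6, 3, 1] + [0] * 20
chief = reference[5:] + [0] * 5
lag, r = best_lead_days(chief, reference)
```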

Prospective Studies.

Prospective studies are field evaluations of a biosurveillance system. In a prospective evaluation, the detection algorithms are operated at a fixed detection threshold for an extended period and the ability of the biosurveillance system to detect known outbreaks or to identify new outbreaks is measured.

Heffernan et al. (2004b) used the detection-algorithm method prospectively to study respiratory and fever syndrome monitoring in New York City. They studied the New York City Department of Health and Mental Hygiene (DOHMH) syndromic system for the one-year period November 2001–November 2002. They also report the DOHMH one-year experience monitoring diarrhea and vomiting; however, the paper by Balter et al., which we discuss next, included that year in a three-year analysis, so we do not discuss it here.

In New York City, EDs transmit chief complaints to the DOHMH on a daily basis as email attachments or via FTP. The researchers estimated that the DOHMH system received chief complaint data for approximately 75% of ED visits in New York City. The NLP program was a keyword-based system that assigned each patient to exactly one syndrome from the set: common cold, sepsis/dead on arrival, respiratory, diarrhea, fever, rash, asthma, vomiting, and other (Table 23.4). The NLP program was greedy, which means that the algorithm assigned a patient to the first syndrome from the list of syndromes whose definition was satisfied and did not attempt further assignment.
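The greedy assignment just described can be sketched as an ordered list of syndrome definitions that is scanned until the first match. The code below is illustrative only: the keyword lists are invented for the example and are not the DOHMH definitions, and the ordering simply follows the order in which the syndromes are listed above, which is an assumption about the actual hierarchy.

```python
import re

# Ordered (syndrome, keywords) pairs. Keywords are hypothetical placeholders.
SYNDROME_ORDER = [
    ("common cold", {"cold", "congestion"}),
    ("sepsis/dead on arrival", {"sepsis", "doa"}),
    ("respiratory", {"cough", "pneumonia", "sob"}),
    ("diarrhea", {"diarrhea"}),
    ("fever", {"fever", "febrile"}),
    ("rash", {"rash"}),
    ("asthma", {"asthma", "wheezing"}),
    ("vomiting", {"vomiting", "emesis"}),
]


def classify(chief_complaint):
    """Assign exactly one syndrome: the first in SYNDROME_ORDER whose keywords match."""
    tokens = set(re.findall(r"[a-z]+", chief_complaint.lower()))
    for syndrome, keywords in SYNDROME_ORDER:
        if tokens & keywords:
            return syndrome  # greedy: stop at the first satisfied definition
    return "other"
```

With this ordering, `classify("FEVER AND COUGH")` returns `"respiratory"` rather than `"fever"`, because the respiratory definition is tested first. The choice of ordering therefore determines how patients with several symptoms are counted.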

DOHMH used the detection-algorithm method to identify potential outbreaks from daily counts of respiratory and fever. They used a univariate detection algorithm on data aggregated for the entire city (citywide), and spatial scanning for data aggregated by patient home zip code and by hospital (separate analyses).

The citywide monitoring of respiratory found 22 above-threshold anomalies (called signals), of which the researchers stated that 14 (64%) occurred during periods of peak influenza activity. The first citywide signal occurred in December 2001 and it was followed by additional respiratory and fever signals on the six successive days. The authors commented that these signals coincided with a sharp increase in positive influenza test results, but did not report a correlation analysis. They also commented that the reports of influenza-like illness (ILI) from the existing sentinel physician ILI system showed increases three weeks after the first signal. Three other respiratory signals occurred during periods of known increases in asthma activity. The remaining five signals occurred during periods of increasing visits for respiratory disease. Thus, there were no signals that could not be attributed to known disease activity.

The citywide monitoring of fever generated 22 signals, of which 21 (95%) occurred during periods of peak influenza activity.

The hospital monitoring of respiratory and fever produced 25 signals. The home zip code monitoring of these two syndromes produced 18 signals. Investigations of these 43 (25+18) signals found no associated increases in disease activity.

Balter and colleagues analyzed the DOHMH three-year experience (November 2001–August 2004) monitoring diarrhea and vomiting using the same biosurveillance system as described in the previous paragraphs (Balter et al., 2005). The authors estimate that by the end of the study period, the monitoring system received data for approximately 90% of ED visits in New York City.

During the three years, the DOHMH system signaled 236 times (98 citywide and 138 hospital or zip code) for diarrhea or vomiting. Of 98 citywide signals, 73 (75%) occurred during what the authors referred to as “seasonal” outbreaks likely due to norovirus (fall and winter) and rotavirus (spring). One citywide signal after the August 2003 blackout was believed to have represented a true increase in diarrheal illness. Their investigations of the 138 hospital or zip code signals found no increased disease activity.

During the same period, DOHMH investigated 49 GI outbreaks involving ten or more cases, none of which were detected by monitoring of diarrhea or vomiting. In 36 of these outbreaks, few or no patients went to EDs. In two outbreaks, the victims were visitors to New York City who returned to their homes before onset of symptoms. In three outbreaks, victims visited EDs not participating in the monitoring system. In three outbreaks, victims visited EDs over a period of “days or weeks” (the algorithms used by DOHMH were sensitive to rapid increases, not gradual increases in daily counts of syndromes). In two outbreaks, the victims presented to the ED as a group and their chief complaints were recorded by reference to the group (e.g., “school incident”). In two outbreaks, a combination of the above causes explained the failure.

N=1 Studies.

Irvin and colleagues (Irvin et al., 2003) used the detection-algorithm method to retrospectively study the ability of their anthrax syndrome to detect a single influenza outbreak. The paper is not explicit about the anthrax syndrome, but states, “The presence of any of the following symptoms were sufficient to categorize a patient into anthrax: cough, dyspnea, fever, lethargy, pleuritic chest pain, vomiting, generalized abdominal pain, or headache,” suggesting that the researchers included symptoms with which pulmonary anthrax may present. They studied an atypical monitoring system based on numeric chief-complaint codes from a commercial ED charting system. This charting system, called E/Map (Lynx Medical Systems, Bellevue WA, http://www.lynxmed.com), offers clinicians charting templates for approximately 800 chief complaints. Each template has a numerical code. A clinician's selection of charting template reflects the patient's chief complaint. The detection algorithm used a fixed detection threshold set at two standard deviations from a recent two-month mean. The algorithm signaled when two of the previous three days exceeded the threshold. The reference standard was the Centers for Disease Control and Prevention (CDC) defined peak week of influenza activity. The system signaled one week prior to the CDC peak and signaled one false positive.
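The signaling rule described above can be sketched as follows. Several details are assumptions made for the example rather than facts from the paper: the baseline is taken as a trailing 60-day window recomputed each day, the threshold is the mean plus two standard deviations, and "two of the previous three days" is read as the three most recent days including the current one.

```python
from statistics import mean, stdev


def two_of_three_signals(counts, baseline_days=60, n_sd=2.0):
    """Return day indices on which at least two of the last three days exceeded threshold."""
    exceeded = []  # one boolean per evaluated day
    signals = []
    for t in range(baseline_days, len(counts)):
        window = counts[t - baseline_days:t]
        threshold = mean(window) + n_sd * stdev(window)
        exceeded.append(counts[t] > threshold)
        if sum(exceeded[-3:]) >= 2:
            signals.append(t)
    return signals


# Hypothetical series: 60 quiet days, then one isolated spike, then a sustained rise.
series = [10, 12, 9, 11] * 15 + [11, 20, 10, 21, 22]
signals = two_of_three_signals(series)
```

Requiring two exceedances in three days suppresses alarms from a single aberrant day (day 61 in the toy series) at the cost of at least one day of added delay.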

Yuan et al. (2004) used the detection-algorithm method to study the timeliness of detection of one influenza outbreak in southeastern Virginia. They manually assigned chief complaints to seven syndromes (fever, respiratory distress, vomiting, diarrhea, rash, disorientation, and sepsis). The detection algorithm was CUSUM, operated at three different moving averages (7-day window, window days 3–9, and 3-day window) and set at a threshold of 3 S.D. They reported that the CUSUM algorithm detected trends in fever and respiratory in one hospital that preceded the local sentinel influenza surveillance system by one week.

A key limitation of N=1 studies is that any correlation found may be spurious. Meta-analysis could address this problem if differences among analytic and reporting methods used by studies were reduced so that studies of single outbreaks could be merged analytically. In 2003, the RODS Laboratory developed a case-study series that encourages the use of a standard method of studying single outbreaks that would enable the application of uniform analytic methods across outbreaks (or alerts) occurring in different regions (Rizzo et al., 2005). The objectives of the case report series are to: (1) ensure complete description of outbreak and analytic methods, and (2) collect the raw surveillance data and information about the outbreak in a way that future re-analyses are possible.
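The exact CUSUM formulation used by Yuan et al. is not given here, so the following is only a generic sketch of a one-sided standardized CUSUM with a moving-average baseline. The `window` and `guard` parameters are one reading of the "7-day window" and "window days 3–9" baselines (a guard of 2 skips the two most recent days), the reference value `k` is an assumed tuning constant, and `h=3.0` stands in for the 3 S.D. threshold.

```python
from statistics import mean, stdev


def cusum_signals(counts, window=7, guard=0, k=0.5, h=3.0):
    """Return day indices on which a one-sided standardized CUSUM exceeds h.

    Each day's count is standardized against the mean and standard deviation of
    a baseline of `window` days ending `guard` days before the current day.
    """
    s = 0.0
    signals = []
    for t in range(window + guard, len(counts)):
        base = counts[t - guard - window:t - guard]
        mu, sd = mean(base), stdev(base)
        z = (counts[t] - mu) / sd if sd > 0 else 0.0
        # Accumulate only upward deviations larger than k standard deviations.
        s = max(0.0, s + z - k)
        if s > h:
            signals.append(t)
            s = 0.0  # reset after signaling
    return signals


# Hypothetical series: a quiet week, one ordinary day, then a large jump.
signals = cusum_signals([10, 12, 9, 11, 10, 12, 9, 11, 25])
```

Because the statistic accumulates, a CUSUM of this form can also respond to several consecutive moderate excesses that would not individually cross a 3 S.D. limit.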

Each case study describes the effect of a single outbreak or other public health event, such as low air quality due to forest fires, on surveillance of data available for the event. At present, these case studies are available only to authorized public health users of the NRDM system because of legal agreements with organizations that provide surveillance data (employees of governmental public health organizations can access case studies through the RODS interface by sending e-mail to nrdmaccounts@cbmi.pitt.edu).

Of the 15 case studies developed to date, eight are examples of outbreaks considered “detectable” from available surveillance data; six were not detectable. These case studies include outbreaks of influenza, salmonella, norovirus, rotavirus, shigella, and hepatitis A. One case study describes a false alarm investigation that resulted from a retailer recording error.

Figure 23.4 is taken from a case study of a large spike in CoCo respiratory cases in a single county outside Pittsburgh that resulted in an alert being sent automatically on Friday July 18, 2003 at 8 PM to an on-call epidemiologist. Normally, daily counts of respiratory cases numbered 10, but on that day they numbered 60 by 8 PM. The epidemiologist logged into the RODS web interface, reviewed the verbatim chief complaints of affected patients and discovered that the cases were related to carbon-monoxide exposure, which a phone call to an ED revealed to be related to a faulty furnace at a day-care center.

FIGURE 23.4

Daily counts of respiratory cases, Washington County, Pennsylvania, June–July 2003. The small increase in early June 2003 corresponds to new hospitals being added to the surveillance system.

The case studies include three studies of the effect of influenza on emergency room visits for the CoCo constitutional and respiratory syndromes. Figure 23.5 illustrates the size of the influenza effect in 2003–2004 in Utah (middle spike) on constitutional and respiratory, as well as on sales of thermometers by pharmacies participating in the National Retail Data Monitor.

FIGURE 23.5

Daily counts of constitutional and respiratory syndrome. January 2003–September 2005. The largest spikes correspond to the 2003–2004 influenza outbreak, which was more severe (involving more people) than the previous or following year's outbreak.

These case studies add the following to the previously described studies: Influenza has a strong early effect on free-text chief complaints in the constitutional and respiratory categories. Air pollution and small carbon monoxide events may have marked effects on chief complaints in the respiratory category. The results for gastrointestinal outbreaks have been negative for relatively small, protracted outbreaks of norovirus and shigella.

Summary of Studies of Outbreak Detection from Chief Complaints.

With respect to Hypotheses 2 and 3, the studies we reviewed demonstrate that:

Some large outbreaks causing respiratory, constitutional, or gastrointestinal symptoms can be detected from aggregate analysis of chief complaints. Small outbreaks of gastrointestinal illness generally cannot (Hypothesis 2).

Research to date is suggestive but not conclusive that influenza can be detected earlier by chief complaint monitoring than current best practice (Hypothesis 3).

The false-alarm rates associated with such monitoring can be low. In New York City, citywide monitoring of respiratory and fever produced few signals that did not correspond to disease activity. Monitoring of diarrhea and vomiting produced more signals that did not correspond to disease activity. In contrast, none of the signals from spatial monitoring by hospital or zip code correlated with known disease activity.

The methodological weaknesses in the studies included failure to describe or measure time latencies involved in data collection. Some studies did not report sampling completeness, the method by which chief complaints are parsed, or details of the syndrome categories.

In general, the number of published studies is small, perhaps due to the fact that chief complaint monitoring systems are still being constructed. We expect more studies to be published in the near future.

The answers to Hypotheses 2 and 3 for surveillance of chief complaints in isolation may not be as important long term as the question of whether chief complaints contain diagnostic information (Hypothesis 1). The reason is that chief complaints can be used with other surveillance data to detect outbreaks, either through linking at the level of the individual patient or as a second source of evidence. Nevertheless, because of their availability and earliness, and the threat of bioterrorism and large outbreaks, it is important to understand the ability to detect outbreaks solely from this type of data.

3. ICD CODES

The International Classification of Diseases, 9th Revision, Clinical Modification (ICD) is a standard vocabulary for diagnoses, symptoms, and syndromes (see Chapter 32). ICD has a code for each class of diagnoses, syndromes, and symptoms that it covers. For example, the ICD code 034.0 Streptococcal sore throat includes tonsillitis, pharyngitis, and laryngitis caused by any species of Streptococcus bacteria. There are more than 12,000 ICD codes. Internationally, some countries use the 10th revision of the International Classification of Diseases, or a modification of it.

Data encoded using ICD are widely available in the United States. Most visits to physicians or other healthcare providers and hospitalizations result in one or more ICD codes. The reason is that healthcare insurance corporations require providers of care to use ICD codes when submitting insurance claims to receive reimbursement for their services.

ICD codes range in diagnostic precision from the very imprecise level of symptoms to very precise diagnoses. There are precise codes for infectious diseases, specifying both the causative organism and the disease process (e.g., 481 Pneumococcal pneumonia). However, there are less precise codes that providers can use if the organism is unknown or not documented (e.g., 486 Pneumonia, organism unspecified). There are also ICD codes for syndromes (e.g., 079.99 is the code for Viral syndrome) and even for symptoms (e.g., 786.2 Cough and 780.6 Fever).

ICD codes may be assigned at different times during the course of care (Figure 23.1). As you go from left to right in Figure 23.1, who assigns the ICD code, how, and when vary. Physicians, when they do assign ICD codes to office or ED visits, usually do so during or within hours to days of the visit. They either enter ICD codes into a point-of-care system or record them on an encounter form (also sometimes known as a “superbill”). Professional coders usually assign the final, billing ICD codes for ED and office visits days later. They also assign ICD codes to hospital discharge diagnoses, typically days to weeks after the patient leaves the hospital. Professional coders often enter ICD codes into specialized billing software. ICD-coded data from organizations that collect large volumes of insurance claims data (we discuss these “data aggregators” in more detail below) are usually not available for months after visits or hospital stays.

The diagnostic precision of ICD codes generally increases with time, as you go from left to right in Figure 23.1. The reason that discharge diagnoses generally have higher diagnostic precision relative to visit diagnoses is that discharge diagnoses typically represent the outcome of a greater amount of diagnostic testing that leads to greater diagnostic certainty (i.e., providers are more likely to order, and have the results available from, laboratory tests, microbiology cultures, x-rays, and so on).

Health services researchers have established that the accuracy of ICD-coded data is highly variable and often only moderately high (O'Malley et al., 2005, Hsia et al., 1988). They have identified several causes for inaccuracy (Peabody et al., 2004, O'Malley et al., 2005). One cause is that two different, highly trained, experienced coders may assign different codes to the same hospitalization (Fisher et al., 1992, Lloyd and Rissing, 1985, MacIntyre et al., 1997). Another is that coders work from the patient chart, which is an imperfect representation of the patient's true medical history and is subject to variable interpretation. In addition, professional coders are typically not physicians or nurses, so their level of understanding of the medical process is imperfect. Finally, the rules for assigning codes are complex and change at least annually. 7

The problem of correct assignment of ICD codes is compounded when clinicians encode the diagnoses (Yao et al., 1999). Clinicians rarely have formal training in the rules for assigning codes. They typically have little time to ensure that the codes they assign are accurate. They often view the assignment of ICD codes to patient encounters as a distraction from patient care. To address these problems, physicians often use preprinted encounter forms that have check boxes for an extremely small subset of commonly used codes. Although these forms typically include a blank space to write in additional ICD codes, clinicians are extremely busy so an open question is how often they use the space and how accurate are the ICD codes that are hand entered. Another question is whether busy clinicians, who do not use the data on the encounter form for subsequent patient care, completely code all diagnoses made during a patient visit. One study found that, during patient visits, physicians addressed an average 3.05 patient problems but documented only 1.97 on billing forms (they documented nearly as many problems in the paper record as they addressed) (Beasley et al., 2004).

ICD codes, because of their inaccuracy and the fact that their primary use and purpose is billing, are likely to be less than ideal when used for other purposes. One study found low accuracy of billing data about cardiac diseases relative to a clinical research database (Jollis et al., 1993). Another study found that one-third of patients who received an ICD code that indicated the presence of a notifiable disease did not truly have the notifiable disease (Campos-Outcalt, 1990). A third study found that data about prescriptions identified patients with tuberculosis more accurately than all 60 ICD codes for tuberculosis combined (Yokoe et al., 1999).

ICD codes might be less ideal for biosurveillance than other coding systems such as SNOMED-CT. The designers of ICD did not design it with biosurveillance requirements in mind. One study found that SNOMED-CT was superior to ICD for coding ED chief complaints (McClay and Campbell, 2002). SNOMED-CT had a term that was a precise match for 93% of chief complaints; ICD had a precise match for only 40% of chief complaints.

In summary, billing ICD codes from insurance claims and hospital discharge data sets are widely available, but at long time latencies (weeks to months). ICD codes at shorter time latencies (within 24 hours of an ED or office visit) are less available. Who assigns ICD codes, and when and how they do so, influence the time latency, diagnostic precision, and accuracy of ICD-coded data. Thus, it is essential that studies describe the process that generated the ICD codes and measure time latency.

3.1. Categories of ICD Codes (“Code Sets”)

Despite the potential for high diagnostic precision, biosurveillance researchers and developers group ICD codes into categories (“code sets”) such as “respiratory illness.” The set of all 60 ICD codes for tuberculosis mentioned above is an example of an ICD code set. It is not necessary to group ICD codes into codes sets, although it is almost always done. For example, we could monitor for the single ICD code for inhalational anthrax (022.1). Creators of code sets usually group codes of diseases and syndromes that share similar early presentations to form syndrome code sets. Respiratory, gastrointestinal, neurological, rash and febrile illnesses are representative of code sets in common use.

A key reason that developers create code sets is to improve the sensitivity of case detection, because patients with the same disease may be assigned different ICD codes. This variability may arise because coders differ in how they assign codes or because patients are at different stages in their diagnostic work-ups. For example, a patient with influenza who has not yet undergone definitive testing may be coded as 780.6 Fever (or any of a number of other ICD codes for symptoms of influenza), 079.99 Viral syndrome, 465.9 Acute upper respiratory infection NOS, or 486 Pneumonia, organism unspecified.
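The effect of grouping codes can be sketched in a few lines. The example below is a minimal illustration, assuming a hypothetical code set built only from the four influenza-related codes named above and a made-up list of visits; it is not an official or validated code set, and the function name `count_daily_matches` is invented for the example.

```python
from collections import Counter

# Illustrative code set assembled from the influenza example in the text.
# It is an assumption for this sketch, not a published code set.
INFLUENZA_LIKE = {"780.6", "079.99", "465.9", "486"}

# Monitoring a single code, for comparison.
FEVER_ONLY = {"780.6"}


def count_daily_matches(visits, code_set):
    """Count visits per day whose ICD code belongs to code_set."""
    counts = Counter()
    for day, code in visits:
        if code in code_set:
            counts[day] += 1
    return dict(counts)


# Hypothetical visits: (date of visit, ICD code assigned).
visits = [
    ("2005-01-03", "780.6"),
    ("2005-01-03", "486"),
    ("2005-01-03", "845.00"),  # a code outside both sets
    ("2005-01-04", "465.9"),
    ("2005-01-04", "079.99"),
    ("2005-01-04", "780.6"),
]

set_counts = count_daily_matches(visits, INFLUENZA_LIKE)
single_counts = count_daily_matches(visits, FEVER_ONLY)
print(set_counts)
print(single_counts)
```

In this made-up data the code set captures five of the six visits, whereas the single fever code captures only two, which is the gain in sensitivity that motivates code sets. The price is that a broader set may also capture visits unrelated to the disease of interest.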

The most difficult and “art-more-than-science” aspect of ICD-code monitoring is the development of code sets. The next sections describe several code sets used in biosurveillance. Our purpose is to illustrate how a code set is developed. It is important to note that one code set could be superior (i.e., have better case detection and/or outbreak detection performance) to others for the detection of one disease (e.g., influenza), but inferior to other code sets for the detection of another disease (e.g., bronchiolitis due to respiratory syncytial virus). To date, only one study (Tsui et al., 2001) has compared the accuracy of two alternative code sets to determine their differential ability to detect the same set of cases or outbreaks. That study lacks generalizability because no other research groups have found the data it studied (ICD codes for chief complaints assigned by registration clerks) to be available at other institutions. 8 Therefore, which code sets are better than others for the detection of various outbreaks such as influenza or cryptosporidiosis remains unknown.
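A comparison of two code sets on the same cases might be set up as in the sketch below. Both code sets, the patient records, and the resulting figures are fabricated for illustration and do not reproduce the method or data of Tsui et al. (2001); the sketch also assumes that the reference standard contains at least one patient with and one without the disease.

```python
# Two hypothetical code sets for influenza, using only codes mentioned in the text.
NARROW_SET = {"465.9", "486"}
BROAD_SET = {"780.6", "079.99", "465.9", "486"}

# Hypothetical patients: (ICD code assigned, truly has influenza per a reference standard).
patients = [
    ("780.6", True),
    ("486", True),
    ("465.9", True),
    ("079.99", True),
    ("780.6", False),
    ("486", False),
    ("845.00", False),  # a code in neither set
    ("845.00", False),
]


def sensitivity_specificity(patients, code_set):
    """Return (sensitivity, specificity) of case detection for code_set."""
    tp = fn = tn = fp = 0
    for code, has_disease in patients:
        flagged = code in code_set
        if has_disease and flagged:
            tp += 1
        elif has_disease:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


narrow_result = sensitivity_specificity(patients, NARROW_SET)
broad_result = sensitivity_specificity(patients, BROAD_SET)
print("narrow:", narrow_result)
print("broad:", broad_result)
```

With these invented records the broader set gains sensitivity and loses specificity, which shows why neither set can be called better without reference to a particular disease and data source.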

3.1.1. ESSENCE Code Sets

The Department of Defense (DoD) developed the first ICD code sets for use in biosurveillance as part of its Global Emerging Infections System (DoD-GEIS). The ESSENCE biosurveillance system 9 uses these code sets (DoD-GEIS, 2005) to aggregate ambulatory visits at DoD outpatient clinics into seven syndromes ( Table 23.8 ). A number of researchers and system developers have used these code sets, or slight modifications of them, in their work (Lazarus et al., 2002, Reis and Mandl, 2004, Lazarus et al., 2001, Lewis et al., 2002, Mocny et al., 2003, Magruder et al., 2005, Buckeridge et al., 2005, Henry et al., 2004).

TABLE 23.8

The Seven ESSENCE ICD Code Sets