Hlt.utdallas.edu
Clinical Data-Driven Probabilistic Graph Processing
Travis Goodwin and Sanda Harabagiu
Human Language Technology Research Institute
University of Texas at Dallas
Richardson, TX 75083-0688, USA
Electronic Medical Records (EMRs) encode an extraordinary amount of medical knowledge. Collecting and interpreting this knowledge, however, demands a significant level of clinical understanding. Automatically capturing the clinical information is crucial for performing comparative effectiveness research. In this paper, we present a data-driven approach to model semantic dependencies between medical concepts, qualified by the beliefs of physicians. The dependencies, captured in a patient cohort graph of clinical pictures and therapies, are further refined into a probabilistic graphical model which enables efficient inference of patient-centered treatment or test recommendations (based on probabilities). To perform inference on the graphical model, we describe a technique of smoothing the conditional likelihood of medical concepts by their semantically-similar belief values. The experimental results, as compared against clinical guidelines, are very promising.
Keywords: Information Retrieval, Bioinformatics, Patient Cohort
An increasing abundance of clinical data is available through massive warehouses of Electronic Medical Records (EMRs). Both within the United States and across the world, hospitals generate millions of EMRs each year. These EMRs include rich clinical information, consisting of detailed notes on patients' medical history, physical exam findings, lab reports, radiology reports, operative reports, and discharge summaries. Clinical information contains multiple mentions of medical problems, including observations resulting from a physical exam (known as signs), features that the patient observed first-hand (known as symptoms), historical and present medical problems (known as co-morbidities), in addition to diagnostic information. We have used the ontological definitions of medical concepts related to diseases outlined in to capture the semantics of clinical information. Hence, we have considered the fact that EMRs also document the medical interventions performed during the patient's hospital stay, including medical tests and their results, as well as all the medical treatments performed as part of the patient's therapy. These forms of clinical information are crucial for performing comparative effectiveness research. As shown in capturing the clinical information from EMRs enables the discovery of alternative methods to prevent, diagnose, treat, or monitor a medical problem.
It has been shown that clinical information – medical concepts (e.g. problems, tests and treatments) – can be automatically identified from clinical texts, as described in However, because medical science centers around posing hypotheses, experimenting with new methods of care, and evaluating medical evidence, medical concepts are associated with different degrees of belief, or assertions. As such, clinical writing entails a large number of speculative statements indicating the physician's belief at the time, rather than strictly quantifying a fact. In order to take into account the physicians' beliefs when automatically processing the clinical information from EMRs, we also recognized the assertions formulated by physicians when discussing any of the medical concepts.
The 2010 i2b2/VA challenge evaluated the task of automatically inferring six types of assertions, or belief states, used to qualify medical problems in EMRs However, those assertions correspond to clinical information found in only one type of EMR: discharge summaries. Because we consider more types of EMRs, we have extended the problem of classifying medical assertions by considering additional types of assertions. The new assertion values were selected based on discussions with practicing clinicians, and by following the guidelines outlined in
Medical concepts and their assertions were cast as nodes in a graph which encodes a patient's clinical picture and therapy along with the potential dependencies between them. We called this graph the clinical graph (CG). As in the clinical picture is defined as the clinical which contains the clinical findings (e.g. medical problems, signs, symptoms and tests). Likewise, we use Scheuermann's definition of therapy as all the treatments, cures, and preventions included within the management plan for an individual patient. Figure illustrates our representation of the CG for a patient. Given the patient's hospital visit, we automatically discover the medical problems along with the tests and treatments documented during the patient's hospital course. Medical problems, tests, and treatments are qualified by their assertions and connected by their dependencies (e.g. when cellulitis was a present diagnostic, a blood culture test was conducted). Moreover, as reported in the clinical picture may vary widely between patients with the same disease and even for the same patient during the course of his or her disease. Therefore, in order to capture the variation in the corresponding clinical graphs (CGs), we have
1 While the clinical phenotype refers to the set of observations related to a medical condition, the clinical phenome is the set of observations pertaining to a single patient.
[Figure 1 labels: Hospital Visit; Medical Problems; Diagnostic & Co-Morbidities; Treatments; Atrial Fibrillation]
Figure 1: The Clinical picture & therapy Graph (CG).
[Figure 2 labels: Hospital Visits; Clinical Picture & Therapy; Medical Problems; Treatments]
Figure 2: The combined Cohort Clinical Graph (CCG).
considered a patient cohort which we obtained by using the system reported in Patient cohort retrieval results in an ordered set of hospital visits which correspond to a cohort of patients sharing the same diagnosis (e.g. patients with As illustrated in Figure this enabled us to access all the clinical pictures and therapies from all the clinical graphs (CGs) of all patients within a cohort. This clinical information regarding a patient cohort constitutes the set of all hospital visits (V), the set of all medical problems (M), the set of all medical tests (E), and the set of all treatments (R), across the CGs of all the patients belonging to the cohort. We refer to the graph that combines all CGs as the Cohort Clinical Graph (CCG).
Given a patient cohort, the corresponding CCG was cast as a k-partite graph (where k = 4) because there are four types of nodes (V, M, E and R), as illustrated in Figure It is to be noted that the edges from the CCG originate from the CGs of patients from the cohort. We also noticed that, crucially, the CCG can also be viewed as a factorization of a Markov network. In this way, we were able to transform the CCG into a probabilistic graphical model. Probabilistic graphical models are known to be a state-of-the-art representation for producing probabilistic inference, which we used for finding recommendations for the most adequate tests or treatments for a patient, given inference on the CCG.
The remainder of this paper is organized as follows. In Section 2, we describe the clinical language processing required for generating the CGs. Section 3 describes the construction of the CCG, as well as how it can be transformed into a probabilistic graphical model. Section 4 presents the inference mechanisms we considered and how they may be used for clinical test and treatment recommendation. Section 5 discusses the experimental results, and Section 6 summarizes the conclusions.
Medical Language Processing
Open-source software, such as MetaMap or, more recently, cTakes can parse EMRs to determine concept unique identifiers (CUIs) which correspond to entries in the Unified Medical Language System (UMLS) However, UMLS includes many concepts that were authored according to ontological principles and, thus, it is too fine-grained for our purpose of data-driven probabilistic processing of EMRs. In selecting a conceptual representation, we also evaluated the more general framework developed by the i2b2/VA challenge in 2010 This framework was designed to detect medical concepts within clinical text and assign one of several distinct assertions indicating the state of the author's belief for each concept. This i2b2 challenge helped popularize the notion that recognizing medical concepts alone is not sufficient for clinical reasoning, because, when medical concepts are used in clinical texts, physicians also express their belief state about such concepts, e.g. that a medical problem is present or absent, or that a treatment is conditional on a test. The i2b2 challenge, however, considered assertions only for medical problems. In our aim to build the CCG, we have extended the problem of assertion classification in two ways: (1) we have produced assertions (or belief values) for all medical concepts (including treatments and tests) that we have automatically identified; and (2) we have introduced 6 additional values which are defined in Table
Medical Concept Recognition
To recognize the nodes of the CCG, we have partitioned medical concepts within three categories: (1) medical problems (e.g. ATRIAL FIBRILLATION – an irregular heart beat); (2) medical treatments (e.g. ABLATION – the removal of undesired tissue); and (3) medical tests (e.g. ECG – an electrocardiogram). We detect these medical concepts using the methods reported in Further, we distinguish four sub-classes of medical problems: (a) signs (observations from a physical exam), (b) symptoms (observations by the patient), (c) co-morbidities (diseases or disorders), and (d) the diagnostic. Our method recognizes medical concepts in three steps:
Step 1: Identification of the boundaries within text that refers to a medical concept;
Step 2: Classification of the medical concept into (1) medical problems, (2) medical treatments, or (3) medical tests;
Step 3: Classification of medical problems into (a) signs, (b) symptoms, (c) co-morbidities, or (d) diagnostic.
Medical concepts were recognized both within the narrative (i.e. report text) and structured sections (e.g. CHIEF COMPLAINT) of EMRs. To do this, we used two conditional random fields (CRFs), trained on the i2b2 annotations as well as our own set of 2,349 EMR annotations. As illustrated in Figure we incorporated knowledge from many lexico-semantic resources. In this research, we used the feature set reported in Additionally, we have normalized the detected medical concepts by (1) converting the surface string to lowercase, (2) filtering words belonging to words, and (3) ignoring word order.
Medical Assertion Classification
In order to encode the medical knowledge from EMRs with the clinical graph (CG) of each patient, we needed to automatically qualify each medical concept with one of the assertions given in Table We performed this automatic classification using an SVM classifier which considers information from: (a) the medical concept to be classified, (b) the section header where the assertion is implied, (c) features available from UMLS (extracted by MetaMap), (d) features reflective of negated statements, disclosed through the NegEx negation detection package, and (e) belief values available from the Harvard General Inquirer's category information Additional details of the automatic assertion identification techniques are provided
Generating the Graphical Model
For clinical decision support, it is critical to analyse the relationships between medical problems, medical tests, and associated treatments across patients' hospital visits. As such, we must move beyond merely identifying the textual mentions of medical concepts and their associated belief values. To this end, we present a framework for modelling the data-driven interactions between problems, treatments, and tests. We first create a CG in which connections between medical concepts are not only inferred, but their strength is also quantified by a weight. Because of the economy of language, relations between medical concepts are rarely explicitly stated, but they are rather implied. To capture these implications, we postulate that co-occurrence statistics can inform these relations, and further that they can also inform the strength of these relations.
After we create complete CGs, we can then transform the combined CGs for a cohort of patients (the CCG) into a probabilistic graphical model.
Inferring Edges in the Cohort Clinical Graph
The nodes of the CCG are automatically discovered by the language processing techniques described in Section 2. In addition, we needed to infer the edges of the CCG and the weights of the edges indicating semantics used in the clinical picture and therapy ontological definition. The observations from the clinical picture of a patient connect hospital visits (nodes from V) to the observed medical problems (nodes from M), generating edges of type TVM. In the clinical picture of patients, connections between the observed
2 Abscess is an infectious disease of the skin and soft tissue.
3 In linguistics, a closed-class of words is a class of words for which new words are rarely introduced, for example pronouns, determiners, prepositions, etc.
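The three normalization steps applied to detected medical concepts (lowercasing, filtering closed-class words, and ignoring word order) can be illustrated with a minimal sketch. The word list `CLOSED_CLASS_WORDS` and the function name `normalize_concept` are hypothetical; the paper does not list the closed-class words it filtered, and whitespace tokenization is an assumption.

```python
from typing import FrozenSet, Tuple

# Hypothetical closed-class word list; the paper does not enumerate the one it used.
CLOSED_CLASS_WORDS: FrozenSet[str] = frozenset({
    "a", "an", "the", "of", "in", "on", "at", "to", "for", "with",
    "and", "or", "his", "her", "their", "this", "that",
})


def normalize_concept(surface: str,
                      closed_class: FrozenSet[str] = CLOSED_CLASS_WORDS) -> Tuple[str, ...]:
    """Return an order-insensitive key for a detected medical concept."""
    tokens = surface.lower().split()                      # (1) lowercase, whitespace tokens
    kept = [t for t in tokens if t not in closed_class]   # (2) drop closed-class words
    return tuple(sorted(kept))                            # (3) sorting discards word order


# Both surface forms map to the same key.
key_a = normalize_concept("Pressure of the Blood")
key_b = normalize_concept("blood pressure")
```

Sorting the remaining tokens is one simple way to ignore word order; a bag-of-words multiset would serve equally well.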
Definition | Example excerpt
occurred during a previous hospital visit | the patient's past medical history is significant for CONGESTIVE HEART FAILURE
occurs only during certain conditions | readmit him for REHAB once the WOUND
has been assigned and will occur | she was given ROCEPHIN and ZITHRO-
 | the patient denies any CHEST PAIN at this
has been advised, but cannot be assumed to occur | was recommended that he be on ALLOP-
is currently happening | there is a moderate PERICARDIAL EFFU-
may occur in the future | she is to return for any WORSENING PAIN
has been scheduled and will occur in the future | we will do a PULMONARY FUNCTION
not associated with the patient | father died of LUNG CANCER
may occur, but there is uncertainty | I believe that this may represent worsening for PULMONARY HYPERTENSION
currently exists and can be assumed to persist | continue DIALYSIS
has been performed and completed | UNASYN 3 GRAMS IV was given
Table 1: Assertion values for medical concepts (typeset in SMALLCAPS) in each excerpt; "moment" refers to the specific instant when the medical concept was mentioned. Newly defined assertions are marked with an '*'.
[Figure 3 labels: Medical Concept Type Classifier; Prose Concept; Pattern-based Entity; Boundary Detector; Non-prose Concept; Problem, Test, Treatment; Boundary Detector; Section Header Extractor; External Resources for Concept Classification; Medical Assertion; Lemmas, Part-Of-Speech Tags, Phrase Chunks, PropBank-Based; External Resources for Assertion Classification]
Figure 3: Language processing used for constructing the CGs and CCG.
medical problems (i.e. nodes from M) and results of tests (i.e. nodes from E) exist as well, giving rise to edges of type TME in the CCG. In addition, connections between both types of nodes (medical problems and tests) in the clinical picture and therapies exist. Thus, we shall also have edges in the CCG between medical problems (i.e. nodes from M) and treatments (nodes from R), generating edges of type TMR. Similarly, we have edges between tests (i.e. nodes from E) and treatments (nodes from R), generating edges of type TER. The weight of edges of each type is computed as follows:
• The weight of an edge of type TVM between a visit v ∈ V and a medical problem m ∈ M is computed as the number of EMRs associated with v which also
• The weight of an edge of type TME between a medical problem m ∈ M and test e ∈ E is computed by the number of EMRs in which both m and e co-occur (regardless of the patient).
• The weight of an edge of type TMR between a medical problem m ∈ M and treatment r ∈ R is computed by the number of EMRs in which both m and r co-occur (regardless of the patient).
• The weight of an edge of type TER between a test e ∈ E and treatment r ∈ R is computed by the number of EMRs in which both e and r co-occur (regardless of the patient).
The Probabilistic Graphical Model
In Section 3.1 we presented a co-occurrence-based method of building a cohort clinical graph (CCG). The observation that this graph is in fact a k-partite graph (where k = 4) enables us to build the factorized Markov network illustrated in Figure which we call the Clinical Markov Network (CMN).
Figure 4: The factorized Clinical Markov Network (CMN).
In the CMN, we assume that each vertex class (V, M, E, or R) represents a distinct random variable in the induced Markov network. Similarly, the four types of weighted edges (TVM, TME, TMR, TER) are associated with four different factors that indicate the strength of each edge:
• Φ1(v, m) = weight of edge {v, m} ∈ TVM
• Φ2(m, e) = weight of edge {m, e} ∈ TME
• Φ3(m, r) = weight of edge {m, r} ∈ TMR
• Φ4(e, r) = weight of edge {e, r} ∈ TER
This factorization allows us to perform efficient probabilistic inference by defining the joint probability as the Gibbs distribution given in Equation
\[ P(v, m, e, r) = \frac{1}{Z}\,\Phi_1(v, m)\,\Phi_2(m, e)\,\Phi_3(m, r)\,\Phi_4(e, r) \]
Note that Z is the typical normalization constant equal to the partition function, as given in Equation
\[ Z = \sum_{v, m, e, r} \Phi_1(v, m)\,\Phi_2(m, e)\,\Phi_3(m, r)\,\Phi_4(e, r) \]
Probabilistic Inference
By modelling the CCG as a probabilistic graphical model, we have gained access to a broad range of probabilistic information through probabilistic inference. We can use this probabilistic information to construct a recommendation engine enumerating the most probable treatments for a given patient given their medical problems and/or their medical tests.
We can use this joint distribution to calculate the posterior probability of conducting a medical test during a particular patient's hospital visit (i.e. P(E = e | V = v)) as shown in
\[ P(E = e \mid V = v) \propto \sum_{m \in M} \Phi_2(m, e)\,\Phi_1(v, m) \]
Likewise, we can infer the posterior distribution of medical treatments for a given set of N medical problems, m0, m1, . . , mN ∈ M, as the conjunction of each problem's posterior distribution, as shown in Equation
Φ3(mi, r)Φ2(mi, r)
Although this straightforward approach yields precise results, it suffers from significant sparsity problems induced by our decision to qualify all medical concepts by the physician's belief state. Rather than restricting ourselves to the interactions between concepts exactly matching the specified belief states (e.g. the likelihood that a test is conducted given that a problem is present), we also consider the interaction between the same concepts with semantically similar belief states (e.g. suggested, ordered, prescribed, conditional). For example, consider that assertions ONGOING and CONDUCTED both imply a strong degree of certainty that the medical concept occurred and are likely to have similar semantic relationships despite having different temporal groundings. Thus, they are semantically coherent. Based on this observation, we introduce an assertion smoothing factor, S, that encodes the degree to which two assertions are semantically coherent, as given in Equation
This smoothing factor, S(a1, a2), captures the degree by which occurrences for a certain medical concept labeled with the assertion a2 may be relevant to probabilistic queries targeting the same medical concept with assertion a1. We estimate this value as the number of two-step paths in the CMN from any concept with assertion a1 to any concept with assertion a2.
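The co-occurrence edge weights, the four factors, and the Gibbs joint distribution described in "The Probabilistic Graphical Model" can be sketched on toy data as follows. Everything here is illustrative: the records in `TOY_EMRS`, the function names, and the brute-force enumeration are assumptions rather than the authors' implementation. The TVM weight is assumed to count the EMRs of a visit that mention the problem, and the test posterior is taken to be proportional to the factor product Φ2·Φ1 summed over problems and normalized over tests, which is also an assumption.

```python
from collections import Counter
from itertools import product

# Toy EMRs, each as (visit id, problems, tests, treatments). Purely illustrative.
TOY_EMRS = [
    ("v1", {"cellulitis"}, {"blood culture", "temperature"}, {"vancomycin"}),
    ("v1", {"cellulitis"}, {"temperature"}, {"vancomycin"}),
    ("v2", {"abscess"}, {"temperature"}, {"drainage"}),
]


def build_factors(emrs):
    """Count per-EMR co-occurrences to obtain the four edge-weight factors."""
    phi1, phi2, phi3, phi4 = Counter(), Counter(), Counter(), Counter()
    for visit, problems, tests, treatments in emrs:
        for m in problems:
            phi1[(visit, m)] += 1                     # edges of type TVM
        for m, e in product(problems, tests):
            phi2[(m, e)] += 1                         # edges of type TME
        for m, r in product(problems, treatments):
            phi3[(m, r)] += 1                         # edges of type TMR
        for e, r in product(tests, treatments):
            phi4[(e, r)] += 1                         # edges of type TER
    return phi1, phi2, phi3, phi4


def unnormalized_joint(factors, v, m, e, r):
    """Product of the four factors; a missing edge has weight 0."""
    phi1, phi2, phi3, phi4 = factors
    return phi1[(v, m)] * phi2[(m, e)] * phi3[(m, r)] * phi4[(e, r)]


def partition_function(factors):
    """Z: sum of the factor product over every (v, m, e, r) assignment."""
    phi1, phi2, phi3, _ = factors
    visits = {v for v, _ in phi1}
    problems = {m for _, m in phi1}
    tests = {e for _, e in phi2}
    treatments = {r for _, r in phi3}
    return sum(unnormalized_joint(factors, v, m, e, r)
               for v, m, e, r in product(visits, problems, tests, treatments))


def posterior_over_tests(factors, visit):
    """P(E = e | V = visit), assumed proportional to sum_m phi2(m, e) * phi1(visit, m)."""
    phi1, phi2, _, _ = factors
    problems = {m for _, m in phi1}
    tests = {e for _, e in phi2}
    scores = {e: sum(phi2[(m, e)] * phi1[(visit, m)] for m in problems)
              for e in tests}
    total = sum(scores.values())
    return {e: s / total for e, s in scores.items()} if total else {}


factors = build_factors(TOY_EMRS)
z = partition_function(factors)
posterior = posterior_over_tests(factors, "v1")
```

Exhaustive enumeration is only practical for toy graphs; it is used here to keep the correspondence with the equations visible.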
This assertion smoothing factor allows us to make recommendations for a query concept given an evidence concept (e.g. P((qc, qa) | (ec, ea))), by considering information across all belief values weighted by their semantic similarity to the given belief values. We accomplish this by smoothing the co-occurrence probability as a mixture model of three components as shown in Equation (1) the direct probability, P, that the exact concepts co-occurred; (2) the total probability that the exact query concept co-occurred with the evidence concept qualified by any possible assertion (i.e. Σi P((qc, qa) | (ec, ai))), scaled by the smoothing factor between the encountered evidence assertion and the desired evidence assertion, i.e. S(ea, ai); and (3) the total probability that the query concept qualified by any assertion co-occurred with the exact evidence concept (i.e. Σi P((qc, ai) | (ec, ea))), scaled by the smoothing factor between the encountered query assertion and the desired query assertion, i.e. S(ai, qa).
\[
P\big((c, a) \mid (d, b);\, \delta\big) =
\begin{cases}
\lambda_0\, P\big((c, a) \mid (d, b)\big)
 + \lambda_1 \sum_{\beta} P\big((c, a) \mid (d, \beta);\, \delta - 1\big)\, S(b, \beta)
 + \lambda_2 \sum_{\alpha} P\big((c, \alpha) \mid (d, b);\, \delta - 1\big)\, S(\alpha, a) & \text{if } \delta > 0 \\
P\big((c, a) \mid (d, b)\big) & \text{if } \delta = 0
\end{cases}
\]
In order to limit the length of transitive paths considered, we introduce a limiting parameter, δ, which limits the recursive depth by which medical concepts will be smoothed (if δ = 0, no smoothing will occur). This smoothing allows us to predict the likelihood of a certain medical test or treatment for a given patient by considering the dependencies encoded in the EMRs across all assertion values without disregarding the semantics of each assertion.
Experimental Results
To produce the data-driven Clinical Markov Network (CMN), we used the same EMRs that enabled us to build a patient cohort retrieval system for the medical records track (TRECMed) of the Text REtrieval Conference (TREC) in This dataset includes 95,703 de-identified EMRs which were generated from multiple hospitals during 2007. The EMRs were grouped into hospital visits consisting of one or more medical reports from each patient's hospital stay. Thus, the EMRs were organized into 17,199 different patient hospital visits. Each visit had the patient's admission diagnoses, discharge diagnoses, and related ICD-9 codes. We also used the 826 discharge summaries used during the 2010 i2b2/VA challenge which contained 72,896 medical concepts and their assertions.
As illustrated in Figure in addition to the hospital visits and associated EMRs, we have also used annotations which we produced on the EMRs resulting from three patient cohorts targeted by the queries (Q1) "patients who presented with cellulitis," (Q2) "patients diagnosed with abscess," and (Q3) "patients suffering from both cellulitis and abscess."
We annotated these EMRs with the medical concepts and assertions described in Section 2.
By automatically processing the medical language in this subset of EMRs, we were able to generate the Clinical Markov Network (CMN) described in Section 4, which corresponds to a cohort of patients with cellulitis or abscess. The distribution of edge classes in the CMN for these cohorts is not uniform, as illustrated in Figure
Figure 5: Distribution of edges in the CCG.
Figure plots the distribution of edges in the CCG by type. Note that the distribution of edges in the CCG corresponds to the un-normalized probability mass of each factor in the CMN. It is clear from this distribution that the majority of edges involve medical problems, with a nearly equal number of inferred dependencies between medical problems and tests. In Figure the number of edges between medical problems and tests, TME (denoted as M ↔ E), and between medical problems and treatments, TMR (denoted as M ↔ R), are nearly equal. As such, the number of edges between medical tests and treatments, TER (denoted as E ↔ R), makes up a smaller portion, indicating that there is an abundance of medical problems listed in each EMR. This reinforces the fact that physicians typically document all the historical, possible, and related or even unrelated medical problems observed during a patient's physical or other examinations.
In order to evaluate the validity of the inference that the CMN enables, we asked two inferential questions: (1) "what are the most probable medical treatments for a certain patient cohort?" and (2) "which tests are most likely to be conducted on patients with the given medical problem(s)?". We answered the first question by computing the conditional probability distribution for all treatments conditioned on the medical problems associated with the cohort retrieved for Q1, Q2, and Q3. These probability distributions are computed according to Equation
The second question was answered by calculating the conditional probability distribution over all tests conditioned on the hospital visits associated with each cohort, as computed
[Table 2 column label: Cellulitis & Abscess]
Table 2: Precision and accuracy for the top 15 treatments for each cohort.
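The recursive assertion smoothing used for inference, a three-component mixture whose recursion depth is limited by δ, can be sketched as below. This is a minimal sketch under stated assumptions: the assertion inventory `ASSERTIONS`, the mixture weights in `lambdas`, the toy co-occurrence counts, and the smoothing table `S` are all hypothetical, and the direct probability is taken to be a simple relative frequency.

```python
from collections import Counter

# Hypothetical inventory; the paper's full set of assertion values is larger.
ASSERTIONS = ("ONGOING", "CONDUCTED", "HISTORICAL")


def direct_prob(counts, query, evidence):
    """Relative-frequency estimate of P(query | evidence) from co-occurrence counts."""
    total = sum(n for (_, ev), n in counts.items() if ev == evidence)
    return counts[(query, evidence)] / total if total else 0.0


def smoothed_prob(counts, S, query, evidence, delta, lambdas=(0.6, 0.2, 0.2)):
    """Mixture of the direct estimate and two assertion-smoothed estimates.

    query = (c, a) and evidence = (d, b); S[(x, y)] is the smoothing factor
    between assertions x and y. delta bounds the recursion depth.
    """
    direct = direct_prob(counts, query, evidence)
    if delta == 0:
        return direct                      # no smoothing at depth 0
    (c, a), (d, b) = query, evidence
    lam0, lam1, lam2 = lambdas             # assumed mixture weights
    # Component 2: same query, evidence concept under every assertion beta.
    over_evidence = sum(
        smoothed_prob(counts, S, (c, a), (d, beta), delta - 1, lambdas)
        * S.get((b, beta), 0.0)
        for beta in ASSERTIONS)
    # Component 3: same evidence, query concept under every assertion alpha.
    over_query = sum(
        smoothed_prob(counts, S, (c, alpha), (d, b), delta - 1, lambdas)
        * S.get((alpha, a), 0.0)
        for alpha in ASSERTIONS)
    return lam0 * direct + lam1 * over_evidence + lam2 * over_query


# Toy counts: vancomycin/ONGOING was only ever seen with cellulitis/ONGOING.
counts = Counter({
    (("vancomycin", "ONGOING"), ("cellulitis", "ONGOING")): 3,
    (("drainage", "CONDUCTED"), ("cellulitis", "ONGOING")): 1,
})
S = {("HISTORICAL", "ONGOING"): 0.5}       # hypothetical smoothing factor
query, evidence = ("vancomycin", "ONGOING"), ("cellulitis", "HISTORICAL")

unsmoothed = smoothed_prob(counts, S, query, evidence, delta=0)
smoothed = smoothed_prob(counts, S, query, evidence, delta=1)
```

With δ = 0 the unseen pair receives zero probability, while δ = 1 borrows mass from the semantically similar evidence assertion. The sketch does not renormalize the mixture, so its outputs are scores rather than a guaranteed probability distribution.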
[Figure 6 panels: ranked Treatments and Tests lists for the cellulitis, abscess, and Cellulitis & Abscess cohorts; entries take the form concept/ASSERTION with a likelihood, e.g. vancomycin/ONGOING, physical examination/CONDUCTED 11.39%]
Figure 6: Treatment and test recommendations for present medical problems "cellulitis", "abscess", and both "cellulitis & abscess."
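Ranked recommendation distributions such as those in Figure 6 are evaluated in this section with two measures: precision within the top-ranked treatments, and accuracy defined as the proportion of probability mass assigned to relevant treatments. A minimal sketch of both follows; the function names, the toy ranking, and the choice to normalize the mass over the top k items only are assumptions, not details given in the paper.

```python
def precision_at_k(ranked, relevant, k=15):
    """Fraction of the top-k recommended items judged relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for name, _ in top if name in relevant) / len(top)


def mass_accuracy(ranked, relevant, k=15):
    """Share of the top-k probability mass that falls on relevant items."""
    top = ranked[:k]
    total = sum(p for _, p in top)
    if not total:
        return 0.0
    return sum(p for name, p in top if name in relevant) / total


# Toy ranking of (treatment, probability); values are illustrative only.
ranked = [("vancomycin", 0.5), ("pain control", 0.3), ("drainage", 0.2)]
relevant = {"vancomycin", "drainage"}      # hypothetical guideline judgments

precision = precision_at_k(ranked, relevant, k=3)
accuracy = mass_accuracy(ranked, relevant, k=3)
```

The two measures can diverge: a ranking with few relevant items still scores a high mass-based accuracy when those items carry most of the probability.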
The distributions of the 15 most-likely treatments and 10
The most common treatment across all patient cohorts is
most-likely tests for each cohort are illustrated in Figure
Vancomycin which is the most recommended treatment for
We have evaluated the recommendations, as shown in Ta-
methicillin-resistant Staphylococcus aureus (MRSA), the
ble based on (1) the Infectious Diseases Society of Amer-
most common cause of cellulitis and abscess. However,
ican (IDSA)'s Practice Guidelines for the Diagnosis and
after Vancomycin, the treatment distributions begin to dif-
Management of Skin and Soft-Tissue Infectious
fer. We have highlighted the treatment Zosyn (a mixture
(2) Howe and Jones Guidelines for the Man-
of Piperacillin and Tazobactam) which is an antibiotic ap-
agement of Periorbital Cellulitis/Abscess
proved to treat for infections such as cellulitis and abscess.
(3) Uzcategui et. al's Clinical Practice Guidelines
Despite being commonly given to patients with cellulitis
for the Management of Orbital Cellulitis
(4.46%, the second highest-ranked treatment), it is ranked
and (4) the National Library of Medicine's MED-
twentieth for treating abscess, at only 0.49%. This corre-
LINEplus Web Service
sponds to the most typical treatment for abscessing concern-ing draining the cyst, corresponding to entries four and six.
According to these sources, we achievement a precision
Additionally, more general antibiotics, such as Linezolid and
within the first 15 treatments of 50% for cellulitis, 71% for
Ciprofloxacin are more commonly given for abscess, as they
cellulitis & abscess, and 64% for abscess. In this measure-
treat a variety of underlying infections.
ment, we considered a treatment as relevant if it should bedirectly associated with the patient cohort. Note: we do not
However, for the cohort of patients suffering from both
consider treatments for associated symptoms (e.g. pain) as
conditions, Zosyn rises to position 7 at 1.83% reflecting
relevant. Additionally, because precision does not take into
the fact that it is able to effectively treat both conditions.
the probability associated with each item, we have also cal-
This shows the ability of the CMN to capture the interaction
culated the accuracy of each distribution as the proportion
between treatments for combinations of medical problems.
of probability mass assigned to relevant treatments. Using
As our dataset is represented by primarily hospitalized pa-
this definition, we achieve an accuracy of 58.2% for celluli-
tients (rather than outpatient procedures), many of the rec-
tis, 98.1% for cellulitis & abscess, and 83.6% for abscess.
ommended treatments are general purpose medications per-
Before discussing specific treatments, we list the following
scriped during the patients hospital stay, such as pain reliev-
abridged definitions from MEDLINEplus:
ers (e.g. aspirin, ibuprofen, pain control), stool softeners
abscess a pocket of white blood cells, germs, and dead
(e.g. colace), diaretics (e.g. lasix) and blood thinners (e.g.
tissues on the skin resulting from an infection.
cellulitis an infection of the skin and underlying tissues
We have also evaluated the top 10 tests most likely to be
caused by bacteria (typically streptococcal).
conducted for patients in each cohort, as illustrated in Fig-
ure We observed that the likelihood of conducting a
Howe, L. and Jones, N. (2004). Guidelines for the manage-
physical examination has a distribution rank which varies
ment of periorbital cellulitis/abscess. Clinical Otolaryn-
across all cohorts. Although it is ranked second for cel-
gology & Allied Sciences, 29(6):725–728.
lulitis (at 11.39% likelihood), it is ranked much lower for
Koller, D. and Friedman, N. (2009). Probabilistic graphical
abscess at position 12 (at 2.12% likelihood). This reflects
models: principles and techniques. MIT press.
the recommendation in the guidelines for cellulitis: because
Miller, N., Lacroix, E.-M., and Backus, J. E. (2000). Med-
cellulitis leaves a patient vulnerable to secondary conditions,
lineplus: building and maintaining the national library of
a thorough physical examination should be performed. As
such, for patients suffering from both cellulitis and abscess, the likelihood of conducting a physical examination moves up to rank 5 (6.20%), reflecting the interaction between the two conditions in EMRs.
We also observed that the three most commonly conducted tests (i.e., blood pressure, pulse, and vital signs) constitute the majority of the probability mass. This reflects a critical observation on the utility of medical test annotations: the mere mention of a medical test is not sufficient for statistical reasoning. EMRs document a wide battery of tests and their results for each patient, allowing physicians to assess not only the patient's primary medical problem, but also any secondary conditions or co-morbidities. To improve the clinical reasoning enabled by the CMN, the value of each test should be considered and associated with the mention of that test.
In this paper, we show how medical language processing enables the automatic derivation of clinical pictures and therapies for entire patient cohorts. We explain how this knowledge can inform a data-driven probabilistic graphical model on which inference can be performed in a rigorous way to determine the most probable treatments for a given set of medical conditions. Further, we observe that the utility offered by medical test mentions is limited for probabilistic reasoning. Despite this, we evaluated the most likely treatments against (1) the Infectious Diseases Society of America (IDSA)'s Practice Guidelines for the Diagnosis and Management of Skin and Soft-Tissue Infections, (2) Howe and Jones' Guidelines for the Management of Periorbital Cellulitis/Abscess, (3) Uzcategui et al.'s Clinical Practice Guidelines for the Management of Orbital Cellulitis, and (4) the National Library of Medicine's MEDLINEplus Web Service, and confirmed the validity of the probabilistic information encoded by our
medicine's consumer health web service. Bulletin of the Medical Library Association, 88(1):11.
Ratner, R., Eden, J., Wolman, D., Greenfield, S., and Sox, H. (2009). Initial national priorities for comparative effectiveness research. National Academies Press.
Roberts, K. and Harabagiu, S. (2011). A flexible framework for deriving assertions from electronic medical records. Journal of the American Medical Informatics Association,
Savova, G. K., Masanz, J. J., Ogren, P. V., Zheng, J., Sohn, S., Kipper-Schuler, K. C., and Chute, C. G. (2010). Mayo clinical text analysis and knowledge extraction system (ctakes): architecture, component evaluation and applications. Journal of the American Medical Informatics
Scheuermann, R. H., Ceusters, W., and Smith, B. (2009). Toward an ontological treatment of disease and diagnosis. Proceedings of the 2009 AMIA Summit on Translational
Stevens, D. L., Bisno, A. L., Chambers, H. F., Everett, E. D., Dellinger, P., Goldstein, E. J., Gorbach, S. L., Hirschmann, J. V., Kaplan, E. L., Montoya, J. G., et al. (2005). Practice guidelines for the diagnosis and management of skin and soft-tissue infections. Clinical Infectious
Stone, P. J., Dunphy, D. C., and Smith, M. S. (1966). The general inquirer: A computer approach to content analy-
Uzcategui, N., Warman, R., Smith, A., and Howard, C. (1997). Clinical practice guidelines for the management of orbital cellulitis. Journal of pediatric ophthalmology and strabismus, 35(2):73–9.
O., South, B. R., Shen, S., and DuVall, S. L. (2011). 2010 i2b2/va challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association, 18(5):552–556.
Voorhees, E. and Hersh, W. (2012). Overview of the trec 2012 medical records track. In The Twenty-First Text REtrieval Conference Proceedings (TREC 2012), Gaithersburg, MD. National Institute for Standards and Technology. Unpublished. Draft available at http://trec.nist.gov/.
Voorhees, E. and Tong, R. (2011). Overview of the trec 2011 medical records track. In The Twentieth Text REtrieval Conference Proceedings (TREC 2011), Gaithersburg, MD. National Institute for Standards and Technol-
Aronson, A. R. (2001). Effective mapping of biomedical text to the umls metathesaurus: the metamap program. In Proceedings of the AMIA Symposium, page 17. American Medical Informatics Association.
Bodenreider, O. (2004). The unified medical language system (umls): integrating biomedical terminology. Nucleic acids research, 32(suppl 1):D267.
Goodwin, T. and Harabagiu, S. M. (2013). The impact of belief values on the identification of patient cohorts. In Information Access Evaluation. Multilinguality, Multimodality, and Visualization, pages 155–166. Springer Berlin Heidelberg.
Source: http://www.hlt.utdallas.edu/~travis/papers/lrec_2014.pdf