Automated identification of type 2 diabetes mellitus: code versus text

University of South Carolina

Vanessa L. Congdon, University of South Carolina - Columbia

Recommended Citation: Congdon, V. L. (2014). AUTOMATED IDENTIFICATION OF TYPE 2 DIABETES MELLITUS: CODE VERSUS TEXT. (Doctoral dissertation).

This Open Access Dissertation is brought to you for free and open access by Scholar Commons. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of Scholar Commons.
Vanessa L. Congdon
Bachelor of Science, Longwood University, 2007

Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Public Health in The Norman J. Arnold School of Public Health, University of South Carolina

Anwar T. Merchant, Director of Thesis
Robert Moran, Reader
Linda J. Hazlett, Reader
Lacy Ford, Vice Provost and Dean of Graduate Studies

Copyright by Vanessa L. Congdon, 2014. All Rights Reserved.

DEDICATION

This work is dedicated to my family and friends. Thank you all for believing in me and continually encouraging me to achieve my dreams.

ACKNOWLEDGEMENTS

This thesis would not have been possible without the continued support and guidance from a number of people. First, I would like to thank my committee chair, Dr. Anwar Merchant, for his knowledge, guidance, and flexibility to work with me from afar. I am also indebted to my thesis committee members, Dr. Linda Hazlett and Dr. Robert Moran, for their time, honest critiques, and willingness to guide me through the entire process. I would also like to acknowledge my PPRNet mentors, Dr. Steven Ornstein and Dr. Ruth Jenkins, for providing me with the PPRNet data and for molding me into a true [...]

My success as a student would not have been possible without the unwavering support of my family and friends. A special thanks to my parents for their unconditional love and support through my darkest of days during this long process. And lastly, thank you to my biggest cheerleader and best friend, Jason, for your limitless support and for making every day of this journey a lot more enjoyable.

ABSTRACT

Background: A growing emphasis in the healthcare industry today is being placed on
demonstrating meaningful use of one's Electronic Health Record (EHR) system. As rates of chronic disease, including diabetes mellitus (DM), rise, it has become clear that accurate and timely disease surveillance could be greatly improved by utilizing the technologies available to clinicians today. As the Centers for Medicare and Medicaid Services (CMS) meaningful use incentive program deadlines fast approach, it remains unclear whether the program's limited attestation criteria clearly reflect its end goal of improving patient care. The objective of this research was to determine the diagnostic accuracy of an automated text-based algorithm for identifying patients with diabetes mellitus from the longitudinal PPRNet database.

Methods: The longitudinal PPRNet database comprises McKesson's Practice
Partner, Lytec or Medisoft EHR system users nationwide. The analysis included data from the 115 PPRNet practices that submitted their 4th quarter data extract in January 2014. An unstructured free-text algorithm was used to determine the number of type 2 diabetics among all active adult patients. This algorithm, which examines unstructured free-text data documented within the EHR title lines, was compared to a previously established protocol that used a combination of ICD-9 diagnostic codes and/or active DM prescriptions.

Results: Across all algorithm comparisons, the patients identified as having diabetes
varied considerably. Using the combination of ICD-9 diagnostic codes and/or active DM prescriptions as the comparison method, the resulting sensitivity was 77.8% and specificity was 97.2% for the free-text definition. Using diagnostic codes alone as the standard for comparison resulted in a much higher sensitivity (99.3%) and a lower specificity (91.9%). However, when we compared the free-text definition to the ICD-9 diagnostic codes alone, 70% of free-text identified cases were found to be un-coded.

Conclusions: As EHR use continues to rise, it is crucial that we continue to develop
ways to accurately translate patient data out of these systems in order to meaningfully utilize these powerful technologies. This thesis has helped clarify the need for further development of accurate data translation platforms in order to capture each patient's full and unique health story as well as for monitoring treatment and outcomes, all while minimizing physician burden.

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER I – Introduction
    1.1 Statement of the Problem
    1.2 Purpose and Objectives
    1.3 Significance of Research
CHAPTER II – Literature Review
    2.1 Diabetes Mellitus
    2.2 U.S. Healthcare's Transition to Electronic Health Record Systems
    2.3 Data Structure
CHAPTER III – Methods
    3.1 Study Design
    3.2 Measurement
    3.3 Statistical Analysis
CHAPTER IV – Results
    4.1 Sample Characteristics
    4.2 Sample Characteristics of Test Identified Diabetes Mellitus Population
    4.3 Algorithm Evaluation: DM Prevalence, Sensitivity and Specificity
CHAPTER V – Discussion
    5.1 Strengths of Study
    5.2 Limitations of Study
    5.3 Future Research
    5.3 Conclusions

LIST OF TABLES

TABLE 2.1: Description of comparative studies that examine the reliability and validity of EHR derived algorithms for clinical quality measurement
TABLE 3.1: Drugs for treatment of Type 2 Diabetes Mellitus
TABLE 4.1: Sample Characteristics of PPRNet Population and Adults with Text-Identified Type 2 Diabetes Mellitus
TABLE 4.2: 2-year DM Prevalence among All Active Adult Patients in 115 PPRNet Practice Sites by Algorithm
TABLE 4.3: Sensitivity and Specificity of Unstructured Free-Text Algorithm Using Different Standards of Comparison

LIST OF ABBREVIATIONS

CDC: Centers for Disease Control and Prevention
HITECH: Health Information Technology for Economic and Clinical Health
PBRN: Primary Care Based Research Network
PPRNet: Practice Partner Research Network

Statement of the Problem
Diabetes mellitus (DM) is one of the most prevalent, costly and burdensome chronic illnesses in the U.S., with nearly 10% of the entire population diagnosed with diabetes and 35% with prediabetes. The American Diabetes Association predicts that as many as 1 in 3 Americans will have diabetes by 2050. As Americans become increasingly plagued by diabetes, accurate and timely disease surveillance is becoming increasingly important for clinicians, clinical researchers, policy makers and health plan administrators. Historically, disease surveillance required manual review of paper charts or large national surveys, both of which are time consuming and costly; however, the nationwide shift to electronic health records (EHRs) provides the potential for a more efficient alternative.

The Health Information Technology for Economic and Clinical Health (HITECH) Act, passed by the U.S. Congress in 2009, is investing billions of dollars in incentives for clinicians who can demonstrate meaningful use of their EHR systems over the next several years. This act was set into motion with hopes of molding EHRs from data graveyards into data warehouses. Ideally, these warehouses will contain extractable, secure, comprehensive, and standardized health information.

Meaningful use includes both a core set and a menu set of objectives that are specific to eligible providers, hospitals and critical access hospitals (CAHs). There are a total of 24 meaningful use objectives for eligible providers, and 23 objectives for eligible hospitals and CAHs. To qualify for an incentive payment, 19 of these 24 or 18 of the 23 objectives must be met. Due to the significant requirements for meaningful use attestation, the program is divided into 3 stages for qualification. In the first stage of participation, providers must demonstrate meaningful use for a 90-day EHR reporting period; in subsequent stages, providers will demonstrate meaningful use for a full-year EHR reporting period.
Programs are not required to demonstrate meaningful use in consecutive years; however, there are deadlines for attesting to each stage. All hospitals and practices that choose not to participate in the program will face reductions in Medicare reimbursement rates.

The overarching goal of this meaningful use incentive program is to push the U.S. health care system to exploit and expand health information technology; however, this major overhaul presents many challenges to all parties involved. As the deadlines for qualifying as a stage 2 meaningful use vendor quickly approach, EHR software companies struggle to keep up, preventing proper usability assessments during development. A certified stage 2 meaningful use EHR vendor must enable providers to record data in a structured format, allowing for data to be more easily retrieved and transferred, with hopes of optimizing health technology to improve patient care. Meanwhile, practitioners continue to struggle with insufficient interfaces, and clinical researchers suffer from a lack of standardized terminologies, yet both have little say in future system developments.

EHRs contain two types of data: structured, coded data and unstructured, free-text data. Both types of data contain important information about the patient's unique health story. Many providers find that entering standardized data, rather than free text, takes more time and effort. Some feel that current software lacks standardized matches for many common chronic conditions. West et al. highlighted that the fragmentation of the U.S. healthcare system hinders chronic disease management as well as longitudinal research on these diseased populations. Because patients see multiple providers in their lifetime, tracking a patient's care remains extremely difficult. Researchers advise further validation of electronic database extraction techniques before using them to assess quality of care.
Diabetes surveillance remains a top priority of the CDC, which developed and maintains the world's first diabetes surveillance system. These surveillance data rely on national and state-based household, telephone, and hospital-based surveys and vital statistics to monitor diabetes trends. In collaboration with the NIH, the CDC has also initiated the SEARCH for Diabetes in Youth study, the largest major surveillance system to quantify and track the diabetes burden in Americans under 20 years of age. The SEARCH study provides population-based information on the underlying factors, trends, impact and level of care provided, and allows researchers to clarify the degree to which type 2 diabetes is affecting youth of different racial and ethnic backgrounds. Overall, the CDC's surveillance data are used to understand the diabetes epidemic, identify vulnerable at-risk populations, set prevention objectives and monitor the success of programs over time, all at the national level.

Purpose and Objectives
The purpose of this thesis is to optimize methods for identification of patients with type 2 diabetes mellitus (DM) from de-identified EHRs of primary care practices in the Practice Partner Research Network (PPRNet). PPRNet is a practice based research network (PBRN) that was established in 1995 as a collaborative effort between the Department of Family Medicine at the Medical University of South Carolina (MUSC), McKesson in Seattle, WA, and participating primary care or internal medicine practices nationwide. The PPRNet database contains historical clinical data from 1987 through 2013 from 340 practices and more than 5 million patients. Currently PPRNet has 151 active member practices that electronically submit quarterly data extracts to PPRNet for aggregation and analysis.

Our structured coded-data algorithm used for comparison was developed from the previously established definition that Miller et al. used in 2004 to auto-identify DM patients in the Department of Veterans Affairs database to calculate best estimates of DM prevalence and incidence rates. Our unstructured text data algorithm uses a developed data dictionary based on natural language processing to identify cases of DM through evaluation of unstructured text data from the title lines within the EHR. This thesis will test the diagnostic accuracy of the unstructured text algorithm in comparison with Miller's identification protocol. The specific aims for this thesis are:

Specific Aim 1: Unstructured text data
• Identify cases of DM from de-identified EHRs of primary care practices participating in PPRNet using developed algorithms based on natural language processing to identify cases of DM through evaluation of unstructured text data from the title lines within the EHR.

Specific Aim 2: Structured coded data
• Identify cases of DM from de-identified EHRs of primary care practices participating in PPRNet using an algorithm established by Miller et al. that assesses ICD-9 codes and diabetes medications from structured diagnostic [...]

Specific Aim 3: Diagnostic accuracy
• Compare the unstructured text-based algorithm versus Miller's algorithm that assesses ICD-9 codes and diabetes medication prescriptions for identifying patients with diabetes.

Significance of Research
The specific aims of this thesis assess the diagnostic accuracy of a new unstructured text-based algorithm in comparison to an established structured code-based algorithm. Several studies have been conducted to evaluate methods for estimating disease prevalence or identifying high-risk patients from structured EHR data or claims data. Much existing research focuses on the use of automated data retrieval strategies to assess quality of care, although a study comparing the data documented within structured, coded fields with unstructured, narrative fields has yet to be performed. As the goals of the meaningful use EHR incentive program continue to propel the U.S. healthcare system forward at a rapid rate, it is important to evaluate current system operations in order to monitor the impact these changes have on achieving desired long-term outcomes. This thesis intends not only to present the diagnostic accuracy of this proposed diagnostic tool, but also to highlight the fundamental differences between data recorded in structured and unstructured formats.

Literature Review

Diabetes Mellitus
The prevalence of type 2 DM in the United States is increasing at a rapid rate, along with health care costs and other associated complications. From 1980 to 2011, the crude prevalence of diagnosed diabetes rose 176% (from 2.5% to 6.9%). The American Diabetes Association (ADA) reported that, as of March 2013, 25.8 million (8.3%) Americans have diabetes, 7.0 million of whom are undiagnosed. The total annual costs attributable to diabetes are estimated to be nearly 245 billion dollars, accounting for 20% of all health care expenditures in the U.S. Another 79 million Americans have prediabetes, of whom only 7.3% have been told by their physician. Prediabetes, also commonly referred to as impaired glucose tolerance (IGT) or impaired fasting glucose (IFG), almost always precedes the development of type 2 diabetes. While risk factors such as genetics, ethnicity, birth weight and metabolic syndrome certainly play a role in the development of diabetes, several controllable lifestyle factors, such as one's weight, diet, exercise regimen and smoking status, also influence a person's probability of acquiring the disease. The ADA reported that 85.2% of people with type 2 diabetes are overweight or obese. Given the magnitude of this problem, the U.S. healthcare system needs accurate, automated data retrieval methods to estimate and monitor its prevalence and evaluate the quality of care.

U.S. Healthcare's Transition to Electronic Health Record Systems
Many large institutions nationwide have adopted EHR systems, while fewer small clinics and primary care practices, which treat a majority of Americans, have integrated health information technology (HIT) into their practices. Among these early adopters, few properly utilized advanced features such as clinical decision support, point of care alerts, patient activation, and overdue service reminder letter generation. While clinical decision support has been shown to improve measures such as preventive care screening rates among primary care doctors, an unintended inverse effect of alert fatigue has surfaced when it is used too frequently. The lack of standard data definitions and interoperability hinders nationwide implementation of comprehensive Personal Health Records (PHR), highlighting the urgent need for clinical informatics. These patient portals are currently utilized by less than 1% of the U.S. population. The healthcare system recognizes the potential these portals could have for stimulating patient engagement. This platform would allow patients access to their personal health information, as well as educational material and tools, empowering them to become active participants in the management of their own health.

The U.S. Congress enacted the Health Information Technology for Economic and Clinical Health (HITECH) Act as part of the American Recovery and Reinvestment Act of 2009 to allow the Centers for Medicare and Medicaid Services to provide incentives to clinicians and hospitals that demonstrate meaningful use of their EHR system. The requirements for participation gradually increase throughout the three stages, qualifying providers that attest to each stage for significant incentive payments, and penalizing those that do not successfully attest to stage two requirements at least three months before the end of the 2014 payment year.
2.2.1 Electronic Health Records and Quality Clinical Care and Measurement

As clinicians across the country strive to earn these meaningful use incentives, greater emphasis has been placed on the validity of current EHR-derived clinical quality measures. Although the potential rewards are enormous, the accompanying challenges should not be underestimated. Historically, clinical researchers, health plan administrators and policymakers have relied on administrative, claims-based databases and self-report to deduce clinical context, often producing misleading results that underestimate quality-of-care measures. Self-report has been shown to overestimate diabetes quality of care measures. Claims databases were developed to collect insurance payments, not to track clinical information. Consequently, much relevant health information that is unnecessary for processing payments may not be collected or recorded accurately. Pharmacy claims often fail to identify chronic conditions like diabetes and hypertension that are being controlled by diet alone. The comparison of claims with medical record data produced complementary information on diabetes quality of care measures, with mixed reliability: agreement was highest for microalbumin testing and lowest for eye examination.

A later study compared a claims-based strategy and an EHR-based method with a manual review reference group in the identification of pharyngitis. Overall, a larger proportion of cases were correctly identified by the EHR-based strategy than by the administrative data-based strategy. The administrative data-based strategy did, however, boast a higher specificity than the EHR-based method, emphasizing the need for more rigorously defined EMR-based retrieval strategies before utilizing them for quality of care measurement. In 2012, Ganz et al. extracted structured coded data on falls in the elderly and compared it with manual review.
They found that only 54% of falls were identified within the coded data, and that much documentation regarding the care surrounding each event was recorded in non-structured form. In conclusion, because the accuracy of quality of care measures varies greatly between the types of care process being evaluated, and each presents unique challenges, future validation studies comparing automated algorithms to manual review will be beneficial.

2.2.2 Chronic Disease Identification within the Electronic Health Record

Accurate chronic disease identification within the EHR is essential to surveillance efforts, the development of patient care plans, and clinical research advancements. Clinician documentation style remains the essential focus for improvement. Chronic disease management often requires the coordination of many physicians. Due to incompatible EHR systems, much treatment documentation from specialists fails to be entered into the EHR utilized by the patient's primary care providers. Most information that is relayed winds up in the free text portion of office notes, which automated searches do not detect. Shifting to a more team-based care approach is necessary for improved identification and care of chronic illness.

Strict algorithms for identification also prove to be important. In 2004, a study was conducted to estimate DM rates over a three-year period within the Department of Veterans Affairs DEpic electronic database. This study compared varying combinations of EHR-derived DM criteria to self-reported DM cases. The algorithm with the highest sensitivity (93%) and specificity (98%) used DM medication prescription records in the current year and/or 2 diabetes codes from inpatient and/or outpatient visits (VA and Medicare) over a 24-month period. When similar algorithms were applied to claims databases in 2006, Solberg et al. reported final positive predictive values (PPV) between 0.965 and 1.0.
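The sensitivity, specificity and PPV figures quoted above all derive from a 2x2 table of algorithm result versus reference standard. The following is a minimal sketch of those standard definitions, not the code used in any of the cited studies; the class name, function names and counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 table: algorithm result versus reference standard."""
    true_pos: int   # algorithm positive, reference positive
    false_pos: int  # algorithm positive, reference negative
    false_neg: int  # algorithm negative, reference positive
    true_neg: int   # algorithm negative, reference negative


def sensitivity(c: ConfusionCounts) -> float:
    """Share of reference-positive patients that the algorithm flags."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Share of reference-negative patients that the algorithm leaves unflagged."""
    return c.true_neg / (c.true_neg + c.false_pos)


def ppv(c: ConfusionCounts) -> float:
    """Share of algorithm-flagged patients that are reference-positive."""
    return c.true_pos / (c.true_pos + c.false_pos)


# Hypothetical counts, not taken from the thesis or any cited study.
example = ConfusionCounts(true_pos=90, false_pos=5, false_neg=10, true_neg=895)
print(sensitivity(example), specificity(example), ppv(example))
```

Note that sensitivity and specificity depend on which method is treated as the reference standard, which is why the same algorithm can score differently against self-report, chart review, or another algorithm.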
All algorithms were tested on a small sample population and then adapted, producing a final algorithm with the following inclusion criteria: 2 or more outpatient or 1 inpatient ICD-9 codes for diabetes within one year, or a filled prescription for diabetes-specific medication in the same calendar year. After initial chart review, metformin was found to be used to treat other conditions, such as polycystic ovary syndrome, infertility and reactive hyperglycemia, and was removed as a diabetes-specific medication from the final algorithm.

Data Structure
The data contained in an EHR can be classified into one of two types: structured, coded data or unstructured, free-text data. Much recent research has focused on comparing the type of data stored in each form and its relation to clinical quality measurement. The meaningful use incentive program has identified many of the limitations in using unstructured data for these purposes, thus encouraging clinicians to document in structured, coded formats in order to attest in both stage 2 and stage 3. Many structured fields successfully capture all relevant information needed for some quality measures, such as blood pressure recorded in vital signs for hypertension measures. However, much of the literature suggests that the completeness of the medical records and ease of extractability vary greatly depending on the clinical area of focus. The literature referenced in the following sections presents the positive and negative attributes of both data types.

2.3.1 Unstructured Data

Unstructured, narrative text provides unique insight into the quality of care because it represents a provider's thought process, unrestricted by structured vocabularies. This extensive narrative data is made valuable through the use of natural language processing (NLP). Most challenges in NLP arise in the process of deriving meaning from human or natural language input. Although NLP continues to improve, recall and precision rates vary significantly between systems. Narrowly and consistently defined variables, such as gender, race and test results, tend to demonstrate the highest rates of both, while variables with multiple definitions remain difficult to capture and [...]

Studies that have only evaluated structured data fields have regularly stated that the algorithms missed recognition because relevant information, such as exclusion criteria, was only documented in narrative form.
Another study found that its NLP system consistently outperformed the use of ICD-9 billing codes in identifying the condition of interest. Overall, the condition of interest being evaluated has the largest impact on NLP results. Existing literature highlights the limitations associated with manual review, the use of administrative data, EHR data structure and format, and extraction procedures. One major issue with auto-extracted data stems from under-recording in reasonably accessible fields such as medication lists. This type of automated recognition software has been applied to discharge summaries, radiology reports, and other qualitative data from limited sections of the patient's EHR, resulting in validity ranging from low to high. Zeng et al. found that accuracy improved when NLP was used in combination with ICD-9 codes. NLP systems have been shown to accurately identify risk factors and diagnostic criteria associated with certain medical conditions. Byrd et al. successfully developed NLP algorithms using Framingham criteria for early detection of heart failure patients.

2.3.2 Structured Data

Structured, coded data allows for interoperability between systems. This type of data also makes accurate secondary use easier. Readily available and directly analyzable EHR data reduces the need for extensive manual chart review, thus allowing performance measures to be more easily assessed on a larger proportion of patients in care. When structured data was compared with full chart review results from the Veterans Health Administration's External Peer Review Program (EPRP) on several measures, over 80% of the data on these selected measures was found in a directly analyzable format within the EHR. While the EPRP data were found to be more complete, the correlation of measures between sources was very high (0.89-0.98). Much focus has been placed on standardizing EHR output, while very little emphasis, until recently, has been placed on standardizing EHR data inputs.
All clinicians are initially trained on proper documentation techniques in their EHR training. These techniques are often reinforced by quality improvement specialists; however, no mechanism within the EHR forces providers to document in a particular location in the chart. Intensive training, automatic prompts and proper feedback are necessary to standardize their documentation habits to reflect the care given in EHR-derived quality [...]

Even standardized data comes with drawbacks. Botsis et al. found much inaccuracy within coded data. Oftentimes a non-specific ICD-9 code is selected, such as 250 for diabetes, when a more accurate diagnosis is actually made at the point of care. Inconsistencies within the data also prove to be troublesome, with records sometimes displaying both 250.01 and 250.02, for type 1 and type 2 diabetes respectively. They also highlight the lack of contextual information that the current ICD-9 coding system supports.

Table 2.1: Description of Comparative Studies that Examine the Reliability and Validity of EHR-Derived Algorithms for Clinical Quality Measurement

Citation: Baker et al.
Study population: [...] failure patient with 2 or more clinic visits within the 18 [...]
Findings: Automated review of the EHR was comparable to manual review for left ventricular ejection fraction (LVEF) measurement (94.6% vs. 97.3%), prescription of beta blockers (90.9% vs. 92.8%), and prescription of ACE inhibitors or ARBs (93.9% vs. 98.7%). Performance was lower for prescription of warfarin for atrial fibrillation (70.4% vs. 93.6%).

Citation: Baldwin et al.
Study design: Accuracy [...]
Study population: N=60; Women ≥ 40 years [...] structured convenience sample [...] Health Center in 2001
Findings: A significant difference between Natural Language Processing (NLP) methods and manual review was found. The NLP method found a false positive rate of 0 and a false negative rate of .035.

Citation: Benin et al.
Study population: N=479; possible [...] analyzed using (1) EMR-based, (2) administrative data-based, and (3) manual review reference strategies
Findings: When comparing each group to the reference, 91% of EMR-based strategy episodes were confirmed and 59% of the administrative data-based strategy.

Citation: Fowles et al.
Study population: Cross-sectional [...] with Diabetes, aged [...] Minnesota health maintenance organization
Findings: Reliability between primary medical record and claims varied by measure: eye examination (K=0.371), oral agents (K=0.699), insulin (K=0.548), HbA1c (K=0.678) and microalbumin (K=0.748).

Citation: Ganz et al.
Study population: N=215; Falls data [...] initiative in primary care medical groups
Findings: A structured visit note was found in 54% of charts within 3 months of the date patients had been identified as falling. The reliability of the codable-data algorithm was good (K=0.61) compared with full medical record review for three care processes.

Citation: Goulet et al.
Study population: VA patients with [...]
Findings: Over 80% of the selected measures were found in directly analyzable form within the EMR. The degree of correlation between automated algorithms assessing structured fields in comparison to the Veterans Health Administration's External Peer Review Program (EPRP) was high (0.89-0.98).

Citation: Hivert et al.
Study population: N=122,715; Active adult patients from [...] practices in eastern [...]
Findings: Directly measured EHR-defined MetS had 73% sensitivity and 91% specificity. DM incidence was 1.4% in the No MetS group vs. 4% in the At-Risk-for-MetS [...]

Citation: Miller et al.
Study population: [...] Veterans Affairs patients recorded in the longitudinal, national database
Findings: The most accurate criterion was a prescription for diabetes medication in the current year and/or 2+ diabetes codes from inpatient and/or outpatient visits (VA and Medicare) over a 24-month period (Se=93% and Sp=98%) against patient self-report.

Citation: Owen et al.
Study population: [...] sample of inpatient and outpatient visits for Schizophrenia patients from the [...] Administration database (VistA)
Findings: The percent agreement between automated algorithms and manual review among patients with chlorpromazine equivalents < 300, 300-1,000, and > 1,000 was .11, .41, and .21, respectively, for inpatients, and .19, .21 and .40 for outpatients. The overall weighted Kappa was K=0.55 for inpatients and K=0.63 for outpatients.

Citation: Parsons et al.
Study design: Accuracy; [...]
Study population: N=4,081; patient EHR records from [...]
Findings: The majority of diagnoses for chronic conditions had information documented in the problem list (a structured field) and were recognized by the automated quality measures, including diabetes (>91.4% across measures), hypertension (89.3%), ischemic cardiovascular disease (>78.8% across measures) and dyslipidemia (75.1%).

Citation: Persell et al.
Study population: N=1,006; All CAD [...] medicine practice
Findings: Performance on 7 quality measures varied from 81.6% for lipid measurement to 97.6% for blood pressure measurement. After including free-text data, the adherence rate increased, ranging from 87.5% for lipid measurement and low-density lipoprotein cholesterol to 99.2% for blood pressure measurement.

Study Design
3.1.1 PPRNet

We used a cross-sectional diagnostic accuracy study design, analyzing data from the longitudinal PPRNet database. PPRNet was established in 1995 as a collaborative effort between the Department of Family Medicine at the Medical University of South Carolina (MUSC), Practice Partner/McKesson in Seattle, WA, and participating primary care and internal medicine practices. PPRNet is a practice based research network (PBRN) that strives to improve the quality of healthcare in its member practices by turning clinical data into actionable information, empirically testing theoretically sound quality improvement interventions, and disseminating successful interventions to primary care providers across the country. Currently PPRNet has 151 physician practices, representing over 1068 health care providers and approximately 1.4 million patients located in 38 states. All of PPRNet's member practices currently use McKesson's Practice Partner, Lytec or Medisoft EHR systems. These data are extracted and sent to PPRNet on a quarterly basis. Data are then cleaned, appended to the longitudinal database and analyzed to produce quality improvement reports on 65 clinical quality measures (CQM). These quality measures include ten diabetes mellitus measures and track the quality of care for several other common conditions such as cardiovascular disease and respiratory disease, with other focuses on women's health, cancer screening, immunizations, mental health, substance abuse, and medication safety.

3.1.2 Study population

The eligible patient population comprised active patients from 115 PPRNet practices that sent their fourth quarter data extract in January 2014. A patient was defined as active if he/she had a visit within 1 year and was not designated with a deceased or inactive status. A visit was determined by a progress note title that did not include text indicating a cancelled appointment or no show.
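The active-patient definition above can be sketched as a simple filter. This is an illustrative sketch only, not PPRNet's extraction code: the `Patient` record layout, the status values, the cancellation marker strings and the 365-day window are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    patient_id: str
    status: str  # assumed values: "active", "inactive", "deceased"
    # (note date, progress note title) pairs
    note_titles: list = field(default_factory=list)


# Assumed markers for titles that do not represent a real visit.
CANCELLATION_MARKERS = ("cancel", "no show", "no-show")


def is_visit(title: str) -> bool:
    """A progress note title counts as a visit unless it signals a cancellation or no show."""
    lowered = title.lower()
    return not any(marker in lowered for marker in CANCELLATION_MARKERS)


def is_active(patient: Patient, as_of: date, window_days: int = 365) -> bool:
    """Active = not deceased or inactive, and at least one visit within the window."""
    if patient.status in ("deceased", "inactive"):
        return False
    earliest = as_of - timedelta(days=window_days)
    return any(
        earliest <= noted <= as_of and is_visit(title)
        for noted, title in patient.note_titles
    )


as_of = date(2013, 12, 31)
seen = Patient("a", "active", [(date(2013, 6, 1), "Office visit")])
cancelled = Patient("b", "active", [(date(2013, 6, 1), "Cancelled appointment")])
print(is_active(seen, as_of), is_active(cancelled, as_of))
```

Matching on substrings of the title is a deliberate simplification; a real extract would likely need a reviewed list of cancellation phrases.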
Similarly, in either approach, the recorded data must not be designated with an inactive status or a resolved

3.1.3 Inclusion and exclusion criteria

The electronic health records of all active patients ≥ 18 years of age were evaluated for an active diagnosis of type 2 diabetes mellitus made within the last 2 years.

Measurement
The aims of this study were to assess DM diagnosis in a database of electronic medical records using 3 methods: NLP, Miller's protocol, and ICD-9 codes. NLP is a newer method that uses an algorithm based on unstructured text data, while the other two methods have been used in the past.

3.2.1 Unstructured text evaluation

The unstructured text algorithm utilizes NLP techniques for automated identification of diagnoses. We first developed common text variations of DM, including full diagnosis names, ICD-9 codes, abbreviations, synonyms, and common misspellings. These 341 text string variations were then compared with the free-text data, flagging possible diagnoses of type 2 DM and suggesting a corresponding ICD-9 code. All flagged diagnoses with a frequency of 4 or more were then manually reviewed by a research assistant for correctness. Text strings were either classified as definite diagnoses of type 2 DM or excluded from future analysis. These text string classifications were then reviewed by a clinician for accuracy. This review process is conducted on a quarterly basis. Each quarter, only new text variations with a frequency greater than 3 are flagged for manual review. Currently, the PPRNet database contains 13,231 text variants included as DM.

3.2.2 Structured data evaluation

The coded, structured data evaluation algorithm we used is based on Miller's definition for DM identification in a VA population [Miller 2004]. This definition requires a prescription for a diabetes medication in the current year and/or 2 or more recorded type 2 diabetes ICD-9 diagnostic codes within a 24-month period. As of January 2014, the PPRNet database contained data through December 31, 2013 from 115 practices. The DM codes included for analysis comprised the following ICD-9 codes: 250 (excluding type 1 codes), 357.2, 362.01, 362.02, and 366.41. These were extracted from the 4 code fields within the EHR.
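The flag-and-review step of the unstructured text evaluation (Section 3.2.1) can be sketched roughly as below. This is an illustration under stated assumptions, not the PPRNet implementation: `DM_PATTERNS` is a tiny hypothetical stand-in for the 341 text string variations (which the text does not list), and `flag_for_review` is a hypothetical helper name. Only the frequency threshold of 4 and the rule that previously reviewed strings are not re-flagged come from the description above.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the study's 341 variations: full names,
# abbreviations, an ICD-9 code pattern, and one common misspelling.
DM_PATTERNS = [
    r"\bdiabetes\b",
    r"\bdiabetic\b",
    r"\bdm\b",
    r"\bniddm\b",
    r"\bt2dm\b",
    r"\b250\.\d{1,2}\b",
    r"\bdiabetis\b",
]
DM_REGEX = re.compile("|".join(DM_PATTERNS), re.IGNORECASE)

MIN_FREQUENCY = 4  # strings seen fewer than 4 times are not sent for review


def normalize(title):
    """Lower-case and collapse whitespace so trivial variants count together."""
    return " ".join(title.lower().split())


def flag_for_review(title_lines, already_reviewed=frozenset()):
    """Return {text string: frequency} for new DM-like strings needing manual review."""
    counts = Counter(normalize(t) for t in title_lines if DM_REGEX.search(t))
    return {
        text: n
        for text, n in counts.items()
        if n >= MIN_FREQUENCY and text not in already_reviewed
    }
```

Matching is deliberately permissive in this sketch (an abbreviation such as "DM" can have other meanings), which is why flagged strings go to a human reviewer rather than being counted as diagnoses directly.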
The medications included for DM treatment were taken from the most current Treatment Guidelines from The Medical Letter. The DM medications included in the analysis are listed in Table 3.1.
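Miller's structured definition from Section 3.2.2 can be sketched as a simple rule. The function names and input shapes below are hypothetical. The sketch assumes that prescription dates have already been restricted to the DM medications in the study's list, that "current year" means the 365 days before the extract date, and that each recorded code counts once. Type 1 codes are recognized by the ICD-9-CM fifth digit (1 or 3) within category 250.

```python
from datetime import date, timedelta

# Non-250 codes listed in the study's DM code set.
OTHER_DM_CODES = {"357.2", "362.01", "362.02", "366.41"}


def is_type2_dm_code(code):
    """True for the study's DM codes; 250.x1 and 250.x3 (type 1) are excluded."""
    code = code.strip()
    if code in OTHER_DM_CODES:
        return True
    if code == "250" or code.startswith("250."):
        digits = code[4:]  # characters after "250."
        # Assumption: codes without a fifth digit are kept as not type 1.
        return not (len(digits) == 2 and digits[1] in "13")
    return False


def meets_miller_definition(code_records, rx_dates, as_of):
    """code_records: iterable of (date, icd9_code); rx_dates: DM prescription dates."""
    rx_start = as_of - timedelta(days=365)
    code_start = as_of - timedelta(days=730)
    has_rx = any(rx_start <= d <= as_of for d in rx_dates)
    n_codes = sum(
        1
        for d, code in code_records
        if code_start <= d <= as_of and is_type2_dm_code(code)
    )
    return has_rx or n_codes >= 2
```

Dropping the `has_rx` term gives the codes-only comparison algorithm (2 or more ICD-9 codes within the previous 2 years).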
Table 3.1: Drugs for Treatment of Type 2 Diabetes Mellitus

Biguanides
Metformin – generic 500, 850, 1000 mg tabs
Glucophage 500, 850, 1000 mg tabs
extended-release – generic 500, 750 mg tabs
Glucophage XR 500, 750 mg tabs
500, 1000 mg tabs
Fortamet 500, 1000 mg tabs
Riomet (liquid) 500 mg/5 mL (4, 16 oz)

Second-Generation Sulfonylureas
Glimepiride – generic
Glipizide – generic
Glucotrol
extended-release – generic 2.5, 5, 10 mg tabs
Glucotrol XL
Glyburide – generic 1.25, 1.5, 2.5, 3, 5, 6 mg tabs
1.25, 2.5, 5 mg tabs
Micronase 1.25, 2.5, 5 mg tabs
micronized tablets – generic 1.5, 3, 4.5, 6 mg tabs
Glynase Prestab 1.5, 3, 6 mg tabs

Non-Sulfonylurea Secretagogues
Nateglinide – generic
Repaglinide – Prandin 0.5, 1, 2 mg tabs

Thiazolidinediones
Pioglitazone – Actos 15, 30, 45 mg tabs
Rosiglitazone – Avandia

Alpha-Glucosidase Inhibitors
Acarbose – generic 25, 50, 100 mg tabs
25, 50, 100 mg tabs
25, 50, 100 mg tabs

DPP-4 Inhibitors
Sitagliptin – Januvia 25, 50, 100 mg tabs
Saxagliptin – Onglyza
Linagliptin – Tradjenta

GLP-1 Agonists
Exenatide – Byetta 250 mcg/mL (1.2, 2.4 mL)
Liraglutide – Victoza 6 mg/mL (3 mL prefilled pen)

Colesevelam – Welchol
Bromocriptine – Cycloset
Pramlintide – Symlin 1000 mcg/mL (1.5, 2.7 mL)

Combination Products
Metformin/glipizide – generic
Metformin/glyburide – Glucovance
Metformin/pioglitazone 500/15, 850/15 mg tabs
Actoplus Met 500/15, 850/15 mg tabs
Actoplus Met XR 1000/15, 1000/30 mg tabs
Metformin/repaglinide – Prandimet 500/1, 500/2 mg tabs
Metformin/rosiglitazone – Avandamet 500/2, 500/4, 1000/2, 1000/4 mg tabs
Glimepiride/rosiglitazone – Avandaryl 1/4, 2/4, 4/4, 2/8, 4/8 mg tabs
Glimepiride/pioglitazone – Duetact 2/30, 4/30 mg tabs
Metformin/sitagliptin – Janumet 500/50, 1000/50 mg tabs
Metformin/saxagliptin – Kombiglyze 500/5, 1000/2.5, 1000/5 mg

Statistical analysis
Statistical analysis was performed using SAS software version 9.2 (SAS Institute, Cary, NC). The number of type 2 DM cases was calculated using both algorithms described above, as well as an algorithm that evaluated ICD-9 diagnostic codes alone. The accuracy of the unstructured text algorithm was compared with Miller's approach and with the ICD-9 diagnostic code algorithm by calculating sensitivity and specificity. The unstructured text algorithm was used to calculate the 2-year prevalence of DM in PPRNet. Rates are presented overall and in population subsets defined by patient characteristics (age, sex, and body mass index [BMI]) and by practice type (internal medicine, family practice, a mix of both, multi-specialty, or "other").

Sample Characteristics
There were a total of 368,384 active adult patients among the 115 practices that sent their fourth-quarter data extracts to PPRNet in January 2014 (Table 4.1). More than half of the population was female (57.5%). Within the sample, 36.6% were 18-44 years old, 18.6% were 45-54 years old, 19.5% were 55-64 years old, 13.9% were 65-74 years old, 7.6% were 75-84 years old, and 3.2% were 85-108 years old. Nearly a quarter of the population was underweight or normal weight (24.7%), while 29.8% were overweight and 38.9% were obese. A majority of PPRNet practices are family practices, accounting for 70.5% of the patient sample. Most of the remaining patients belong to internal medicine practices (17.1%). A small share of patients belongs to mixed practices made up of both family practitioners and internists. Rounding out the sample are multispecialty practices (2.6%) and "other" practices (4.5%), which consist of rheumatology, pulmonary, gynecology, neurology, urology, and pediatric practices.

Sample Characteristics of Text-identified Diabetes Mellitus Population
Just over half of adult diabetics are female (51.1%). The percentage of diabetics increases with age before leveling off at age 74 and declining thereafter. As expected, most of these type 2 diabetics fell in the overweight (23.7%) or obese (63.0%) BMI categories. Fewer than 10% of PPRNet's diabetic patients are underweight (0.8%) or normal weight (8.6%). The DM patient sample was representative of the full population with regard to practice type, as displayed in Table 4.1.

Algorithm Evaluation: DM Prevalence, Sensitivity and Specificity
Table 4.2 presents 2-year DM prevalence estimates based on each of the three algorithms (described in detail in Section 3.2). The unstructured free-text algorithm and Miller's algorithm produced the same prevalence (11.1%), while the ICD-9 diagnostic code algorithm identified far fewer cases of DM, resulting in a prevalence of 3.4%. Across all algorithm comparisons, the patients identified as having diabetes varied considerably. When we compared the unstructured free-text algorithm with Miller's, each protocol found close to 10,000 patients that were missed by the other definition. Using Miller's protocol as the standard of comparison, the sensitivity of the free-text algorithm was 77.8% and its specificity was 97.2%. However, when we compared the free-text definition with the ICD-9 diagnostic codes alone, 70% of free-text-identified cases were found to be uncoded. Only 86 additional patients had 2 or more recorded ICD-9 diagnostic codes but were not identified by the free-text algorithm. All 86 cases identified by the code definition alone were missed because of the low frequency of the corresponding text string: as described in the methodology, only unstructured text diagnoses that occur 4 or more times within the data are reviewed and can be counted as a definite diagnosis of DM. Using diagnostic codes alone as the standard of comparison resulted in a much higher sensitivity (99.3%) and a lower specificity (91.9%).

Table 4.1: Sample Characteristics of PPRNet Population and Adults with Text-Identified Type 2 Diabetes Mellitus
Columns: All Adult Patients (≥18); Overall Number and DM Prevalence
Rows: Age (years); Underweight (< 18.5), Normal (18.5-25), Overweight (25-30); Practice Type (Family Practice/Internal Medicine, Internal Medicine)

Table 4.2: 2-year DM Prevalence among All Active Adult Patients (N = 368,384) in 115 PPRNet Practice Sites by Algorithm Definition
Miller's structured-coded (active medication prescription and/or 2+ ICD-9 codes recorded within the previous 2 years): prevalence 11.1%
Unstructured free-text (active text diagnoses recorded in unstructured title lines within the previous 2 years): prevalence 11.1%
ICD-9 diagnostic codes (2+ ICD-9 diagnostic codes recorded within the previous 2 years): prevalence 3.4%

Table 4.3: Sensitivity and Specificity of Unstructured Free-Text Algorithm Using Different Standards of Comparison
Standard of comparison, Miller's structured-coded: sensitivity 77.8%, specificity 97.2%
Standard of comparison, ICD-9 diagnostic codes: sensitivity 99.3%, specificity 91.9%

The first aim of this study was to replicate, in PPRNet, the best definition for automated DM identification within EHR data from Miller's 2004 study, which compared various definitions for DM identification using the Department of Veterans Affairs electronic health record database. We found that although this method identified the same overall percentage of diabetic patients as the free-text method, several thousand patients had clear evidence of a free-text diagnosis but lacked a corresponding diagnostic code and were not on an active prescription for a DM medication. Similarly, close to the same number of diabetic patients were identified by Miller's definition alone and not by the free-text algorithm. Miller's best definition includes an active prescription for a DM medication recorded within the last year, or 2 or more ICD-9 diagnostic codes recorded within the last 2 years. One of the main limitations of this definition is that some commonly used DM medications are also prescribed for other conditions. Metformin, for example, is the first-line drug of choice for type 2 diabetics who are overweight or obese and have normal kidney function, but it is also used in the treatment of polycystic ovary syndrome and other diseases in which insulin resistance may be an important factor.

Secondly, this paper aimed to test a newly developed unstructured free-text algorithm for accurate identification of DM cases within an active PPRNet patient population. One overarching limitation was our inability to access and manually review each individual patient record, which left us with no true gold standard for comparison. We chose Miller's definition because it had been found to be quite accurate when compared with patient survey data. Using this standard of comparison, the free-text definition showed fair sensitivity (77.8%) and very good specificity (97.2%).
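For readers who want the arithmetic behind these figures, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP), where the "standard of comparison" plays the role of the reference classification. The sketch below computes both from sets of patient identifiers. The function name and the toy identifiers are hypothetical, and the numbers in the usage lines are illustrative only, not study data.

```python
def sensitivity_specificity(test_positive, reference_positive, population):
    """Compare a test algorithm with a reference standard over one population.

    All three arguments are sets of patient identifiers; both positive sets
    are assumed to be subsets of the population, and the reference standard
    is assumed to contain at least one positive and one negative patient.
    """
    tp = len(test_positive & reference_positive)   # flagged by both
    fn = len(reference_positive - test_positive)   # missed by the test
    fp = len(test_positive - reference_positive)   # flagged by the test only
    tn = len(population - test_positive - reference_positive)
    return tp / (tp + fn), tn / (tn + fp)


# Toy illustration with made-up identifiers (not PPRNet data).
population = set(range(100))
reference = set(range(10))      # e.g. patients meeting the comparison definition
text_flagged = set(range(2, 14))  # e.g. patients flagged by the text algorithm
sens, spec = sensitivity_specificity(text_flagged, reference, population)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Because neither comparison definition is a true gold standard, values computed this way describe agreement with that definition rather than accuracy against confirmed diagnoses.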
Although we did not manually review each patient record, each unique text string with a frequency of 4 or more that was flagged for review by our automated DM text string dictionary, which consists of 341 unique text strings, was reviewed by a trained research assistant. Text diagnoses that were unclear were then also reviewed by a physician. While we cannot say with certainty that every case of DM identified by the text algorithm is an actual case of DM, we are confident that the rate of misclassification is very low because of this extensive processing. Comparison of our algorithm with ICD-9 diagnostic codes alone also suggests that we are missing very few coded cases of DM, resulting in a very high sensitivity (99.3%) and a specificity of 91.9%. Several more cases were identified when prescriptions for DM medications were added to the definition but, as previously stated, we cannot be sure that the medication is being used to treat DM.

Strengths of the Study
A major strength of this study is the large sample size. This sample represents the differing documentation styles of hundreds of physicians nationwide treating hundreds of thousands of patients in both urban and rural practice settings.

Limitations of the Study
PPRNet has very little variation in practice type and practice size, consisting mostly of small to mid-size family practices and internal medicine clinics. Another limitation is that all PPRNet practices use one common EHR software product in an ever-growing marketplace of products with varying configurations. Lastly, we did not compare our free-text algorithm with a gold standard (physician diagnosis), which prevented estimation of its true sensitivity and specificity. However, the development of the NLP algorithm is an iterative process: after a query is used to identify diabetes cases, a physician reviews the identified cases for accuracy, the query is modified, and the process is repeated on an ongoing basis. This rather efficient NLP algorithm was used to identify cases in this study.

Future Research
We recommend that similar studies in the future use databases that contain data from several EHR software systems to reduce bias. It would be interesting to replicate this study in a more diverse research network, stratifying by practice site characteristics such as size, location, and specialty, as well as provider characteristics such as degree and specialty. By looking at both practice and provider characteristics, we could better understand which factors most influence physician EHR documentation styles. It would also be useful to obtain patient records for manual chart review to serve as a gold standard when testing new algorithms that could aid in a variety of arenas, such as population health. In a similarly large research network, one could review a random sample comprising a small percentage of the total population rather than manually reviewing the charts of the entire population.

Conclusions
Our unstructured free-text evaluation performed quite well in accurately identifying Type 2 DM patients within the PPRNet active patient population. As EHR use is on the rise, it is crucial that we continue to develop ways to accurately translate patient data out of these systems in order to meaningfully utilize these powerful technologies. This paper has helped clarify the need for further development of accurate data translation platforms in order to capture each patient's full and unique health story as well as for monitoring treatment and outcomes all while minimizing physician burden.

References

1. FAST FACTS Data and Statistics about Diabetes. In. 3/1/2013 ed: American Diabetes Association; 2013. p. 2.
2. Holmes C. The problem list beyond meaningful use. Part I: The problems with problem lists. J AHIMA 2011;82(2):30-3; quiz 34.
3. Prokosch HU, Ganslandt T. Perspectives for medical informatics. Reusing the electronic medical record for clinical research. Methods Inf Med 2009;48(1):38-44.
4. EHR Incentive Programs: Meaningful Use. In: Centers for Medicare and Medicaid Services; 2013.
5. Lobach DF, Detmer DE. Research challenges for electronic health records. Am J Prev Med 2007;32(5 Suppl):S104-11.
6. Richesson RL, Krischer J. Data standards in clinical research: gaps, overlaps, challenges and future directions. J Am Med Inform Assoc 2007;14(6):687-96.
7. West S, Blake C, Zhiwen L, McKoy J, Oertel M, Carey T. Reflections on the use of electronic health record data for clinical research. Health Informatics Journal 2009;15(2):108-21.
8. Benin AL, Vitkauskas G, Thornquist E, Shapiro ED, Concato J, Aslan M, et al. Validity of using an electronic medical record for assessing quality of care in an outpatient setting. Med Care 2005;43(7):691-8.
9. Miller DR, Safford MM, Pogach LM. Who has diabetes? Best estimates of diabetes prevalence in the Department of Veterans Affairs based on computerized patient data. Diabetes Care 2004;27 Suppl 2:B10-21.
10. Diabetes Data & Trends: Crude and Age-adjusted Percentage of Civilian Non-institutionalized Adults with Diagnosed Diabetes, United States, 1980-2011. In. Atlanta, GA: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention; 2011.
11. Goetz Goldberg D, Kuzel AJ, Feng LB, DeShazo JP, Love LE. EHRs in primary care practices: benefits, challenges, and successful strategies. Am J Manag Care 2012;18(2):e48-54.
12. Greiver M, Barnsley J, Glazier RH, Moineddin R, Harvey BJ. Implementation of electronic medical records: effect on the provision of preventive services in a pay-for-performance environment. Can Fam Physician 2011;57(10):e381-9.
13. O'Connor PJ, Crain AL, Rush WA, Sperl-Hillen JM, Gutenkauf JJ, Duncan JE. Impact of an electronic medical record on diabetes quality of care. Ann Fam Med 2005;3(4):300-6.
14. Harrison MI, Koppel R, Bar-Lev S. Unintended consequences of information technologies in health care--an interactive sociotechnical analysis. J Am Med Inform Assoc 2007;14(5):542-9.
15. DeJesus RS, Angstman KB, Kesman R, Stroebel RJ, Bernard ME, Scheitel SM, et al. Use of a clinical decision support system to increase osteoporosis screening. J Eval Clin Pract 2010;18(1):89-92.
16. Katzan IL, Rudick RA. Time to integrate clinical and research informatics. Sci Transl Med 2012;4(162):162fs41.
17. Tang PC, Lansky D. The missing link: bridging the patient-provider health information gap. Health Aff (Millwood) 2005;24(5):1290-5.
18. Nagykaldi Z, Aspy CB, Chou A, Mold JW. Impact of a Wellness Portal on the delivery of patient-centered preventive care. J Am Board Fam Med 2012;25(2):158-67.
19. Blumenthal D, Tavenner M. The "meaningful use" regulation for electronic health records. New England Journal of Medicine 2010;363(6):501-4.
20. Pawlson LG, Scholle SH, Powers A. Comparison of administrative-only versus administrative plus chart review data for reporting HEDIS hybrid measures. Am J Manag Care 2007;13(10):553-8.
21. Tang PC, Ralston M, Arrigotti MF, Qureshi L, Graham J. Comparison of Methodologies for Calculating Quality Measures Based on Administrative Data versus Clinical Data from an Electronic Health Record System: Implications for Performance Measures. Journal of the American Medical Informatics Association 2007;14(1):10-15.
22. Fowles JB, Rosheim K, Fowler EJ, Craft C, Arrichiello L. The validity of self-reported diabetes quality of care measures. Int J Qual Health Care 1999;11(5):407-12.
23. Rector TS, Wickstrom SL, Shah M, Thomas Greeenlee N, Rheault P, Rogowski J, et al. Specificity and sensitivity of claims-based algorithms for identifying members of Medicare+Choice health plans that have chronic medical conditions. Health Serv Res 2004;39(6 Pt 1):1839-57.
24. Ganz DA, Almeida S, Roth CP, Reuben DB, Wenger NS. Can structured data fields accurately measure quality of care? The example of falls. J Rehabil Res Dev 2012;49(9):1411-20.
25. Roth CP, Lim YW, Pevnick JM, Asch SM, McGlynn EA. The challenge of measuring quality of care from the electronic health record. Am J Med Qual 2009;24(5):385-94.
26. Persell SD, Wright JM, Thompson JA, Kmetik KS, Baker DW. Assessing the validity of national quality measures for coronary artery disease using an electronic health record. Arch Intern Med 2006;166(20):2272-7.
27. Solberg LI, Engebretson KI, Sperl-Hillen JM, Hroscikoski MC, O'Connor PJ. Are claims data accurate enough to identify patients for performance measures or quality improvement? The case of diabetes, heart disease, and depression. Am J Med Qual 2006;21(4):238-45.
28. Borzecki AM, Wong AT, Hickey EC, Ash AS, Berlowitz DR. Can we use automated data to assess quality of hypertension care? Am J Manag Care 2004;10(7 Pt 2):473-9.
29. Weiskopf NG, Hripcsak G, Swaminathan S, Weng C. Defining and measuring completeness of electronic health records for secondary use. J Biomed Inform 2013.
30. Baldwin KB. Evaluating healthcare quality using natural language processing. J Healthc Qual 2008;30(4):24-9.
31. Baker DW, Persell SD, Thompson JA, Soman NS, Burgner KM, Liss D, et al. Automated review of electronic health records to assess quality of care for outpatients with heart failure. Annals of Internal Medicine 2007;146(4):270-7.
32. Pakhomov SS, Hemingway H, Weston SA, Jacobsen SJ, Rodeheffer R, Roger VL. Epidemiology of angina pectoris: role of natural language processing of the medical record. Am Heart J 2007;153(4):666-73.
33. Chan KS, Fowles JB, Weiner JP. Review: electronic health records and the reliability and validity of quality measures: a review of the literature. [Review]. Medical Care Research & Review 2010;67(5):503-27.
34. Parsons A, McCullough C, Wang J, Shih S. Validity of electronic health record-derived quality measurement for performance monitoring. J Am Med Inform Assoc 2011.
35. Tu K, Mitiku T, Lee DS, Guo H, Tu JV. Validation of physician billing and hospitalization data to identify patients with ischemic heart disease using data from the Electronic Medical Record Administrative data Linked Database (EMRALD). Canadian Journal of Cardiology 2010;26(7):e225-8.
36. Owen RR, Thrush CR, Cannon D, Sloan KL, Curran G, Hudson T, et al. Use of electronic medical record data for quality improvement in schizophrenia treatment. J Am Med Inform Assoc 2004;11(5):351-7.
37. Chapman WW, Fizman M, Chapman BE, Haug PJ. A comparison of classification algorithms to automatically identify chest X-ray reports that support pneumonia. J Biomed Inform 2001;34(1):4-14.
38. Denny JC, Peterson JF, Choma NN, Xu H, Miller RA, Bastarache L, et al. Extracting timing and status descriptors for colonoscopy testing from electronic medical records. J Am Med Inform Assoc 2010;17(4):383-8.
39. Hripcsak G, Friedman C, Alderson PO, DuMouchel W, Johnson SB, Clayton PD. Unlocking clinical data from narrative reports: a study of natural language processing. Ann Intern Med 1995;122(9):681-8.
40. Melton GB, Hripcsak G. Automated detection of adverse events using natural language processing of discharge summaries. J Am Med Inform Assoc 2005;12(4):448-57.
41. Zeng QT, Goryachev S, Weiss S, Sordo M, Murphy SN, Lazarus R. Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system. BMC Med Inform Decis Mak 2006;6:30.
42. Jain NL, Knirsch CA, Friedman C, Hripcsak G. Identification of suspected tuberculosis patients based on natural language processing of chest radiograph reports. Proc AMIA Annu Fall Symp 1996:542-6.
43. Byrd RJ, Steinhubl SR, Sun J, Ebadollahi S, Stewart WF. Automatic identification of heart failure diagnostic criteria, using text analysis of clinical notes from electronic health records. Int J Med Inform 2013.
44. Goulet JL, Erdos J, Kancir S, Levin FL, Wright SM, Daniels SM, et al. Measuring performance directly using the veterans health administration electronic medical record: a comparison with external peer review. Med Care 2007;45(1):73-9.
45. Botsis T, Hartvigsen G, Chen F, Weng C. Secondary Use of EHR: Data Quality Issues and Informatics Opportunities. AMIA Summits Transl Sci Proc 2010;2010:1-5.
46. Treatment Guidelines from the Medical Letter. The Medical Letter, Inc

