Friday, October 30, 2009

Widening the Use of Electronic Health Records Data for Research

Wisconsin North, Oct 30, 2009

A symposium hosted by the National Center for Research Resources (NCRR). Louise Ramm, Deputy Director of NCRR, provided the framing challenges and welcome. Zak Kohane introduced use cases and sources of data, and reviewed the false dichotomy between health-record-based research and clinical trials.

Gary Gibbons provided a perspective on disease in the African-American population as an exemplar of a complex orphan disease, in the sense that, like rare orphan diseases, it is understudied and insufficiently treated. He also pointed out that in many parts of the country underserved minorities are located away from the academic health centers that have made the most inroads in the use of electronic health records. Therefore, institutions such as Morehouse School of Medicine have their work cut out for them (and not a lot of resources) to integrate data from a large number of only lightly affiliated practices. That same challenge presents an opportunity to be even more impactful in an orphan disease of epidemic proportions. Professor Gibbons also urged a broadening of the captured context beyond what is conventionally captured in a standard (brief) healthcare visit: environmental variables that are highly penetrant, much more so than many genomic markers, are poorly captured. He concluded by reviewing the current compelling information about pharmacogenomic differences and population genetic risks, and also the wide gaps in our knowledge of these as they pertain to various groups within the USA.

Andrew Auerbach from UCSF addressed comparative effectiveness research (CER) and its translation into "health system innovation research." He described how much can be done with charge data, and how additional codified data types (e.g., medications) can further improve the quality of that data. He closed with a discussion of whether the various stakeholders in using EHR data for research (e.g., NIH, payors, health system leaders, physicians, and patients) are well aligned. He put us on notice that IRBs are unfamiliar with distributed query systems and/or grids, and that this is becoming an obstacle for many CER studies. He summarized several use cases, such as: What is the optimal length of treatment for pneumonia? Can a patient-focused discharge checklist reduce the risk of readmission?

Robert Plenge described his use of i2b2 and electronic health records for genotypic research and discovery of endophenotypes.
John Brownstein reviewed non-traditional public health research using institutional data and non-traditional, non-institutional healthcare data extraction and analysis.

Wednesday, October 28, 2009

i2b2 Academic Users' Group, Natcher Building, NIH

Zak Kohane summarized several recently announced i2b2-based projects: a South Carolina consortium (a GO grant funds the automated consent component); a pediatric rheumatology research network of 60 sites (NIH GO grant); an imaging consortium with which Shawn Murphy is working to better integrate images into i2b2 instances using XNAT/BIRN infrastructure (CTSA administrative supplement); and a GO grant to B.U. and U. Mass to study health disparities using i2b2 as infrastructure. Lynn Bry received a 2-year R01 under ARRA for the Crimson-i2b2 integration project.

Susanne Churchill welcomed the group and noted that a consortium of European i2b2 users/implementors/refiners is putting together a proposal for joint work across the European Union (for EU funding). She noted that the AUG now numbers over 100 members and represents 30 healthcare/academic institutions, including 5 international ones.

Shawn Murphy summarized several new developments, including:

  • Release Candidate 1.4 to support the Enterprise including: analysis views, improved role-based access and auditing, replacement of Gridsphere with webservices and AJAX client, Microsoft Active Directory integration, obfuscation of results for privacy purposes.
  • In the future, i2b2 will be distributed as configurable VMs, so that the functionality can be attached to local databases without requiring a full install. The full source code compile and install will still be available.
  • An Eclipse plug-in "store" where developers can contribute their own plug-ins and where users can download the plug-ins they want.
  • More support for derived data (e.g., to systematically return NLP concepts derived from the clinical notes).

Lynn Bry described the Crimson system and how discarded samples linked to i2b2 data have far higher utilization rates than other samples (e.g., those in biorepositories). She also described the Sample Ontology and how she is borrowing from WHO and SNOMED to standardize it. The system includes an ontology manager to allow local ontology management and updating of samples. She also described the IRB permissions data from RC 1.4. The project will work towards multisite studies within the two-year implementation time frame. Lynn described the Enterprise Master Specimen Index, which tracks samples and patient relationships at various levels of identity (consented identified, de-identified, and anonymous samples).

Dan Housman and Peter Emerson from Recombinant were invited into the AUG for the discussion segment on sample management (the AUG wishes to keep companies at arm's length) to discuss their own efforts in sample management. They made it clear that all the developments they are involved in will be contributed back to the i2b2 community as fully open source code.

Andy McMurry summarized the status of the distributed querying system called SHRINE, which is now implemented at several Harvard-affiliated hospitals and several West Coast academic health centers (e.g., UCSF and UW). It is now fully IRB approved (at Harvard) for queries returning aggregate numbers (across demographics, laboratory results, medications and diagnoses). Andy also clearly described several technical hurdles that were overcome, including the on-the-fly ontology matching process. Finally, he announced the availability of the SHRINE code as a fully open source codebase.
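The federated pattern behind SHRINE (each site translates a shared query term into its own local codes and returns only an aggregate count) can be illustrated with a toy sketch. Everything below is an assumption for illustration: the `Site` class, `federated_count`, the site names, and the code mappings are hypothetical and are not SHRINE's code or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Site:
    """A hypothetical participating hospital with its own local vocabulary."""
    name: str
    # Maps a shared, network-wide concept name to this site's local codes.
    term_map: Dict[str, List[str]]
    # Patient-level data that never leaves the site: patient id -> local codes.
    patients: Dict[str, Set[str]] = field(default_factory=dict)

    def count(self, shared_term: str) -> int:
        """Translate the shared term to local codes and return only a patient count."""
        local_codes = set(self.term_map.get(shared_term, []))
        return sum(1 for codes in self.patients.values() if codes & local_codes)


def federated_count(sites: List[Site], shared_term: str) -> Dict[str, int]:
    """Send the same query to every site and collect aggregate counts per site."""
    return {site.name: site.count(shared_term) for site in sites}


site_a = Site("A", {"diabetes": ["250.00", "250.02"]}, {"p1": {"250.00"}, "p2": {"401.9"}})
site_b = Site("B", {"diabetes": ["DM2"]}, {"x1": {"DM2"}, "x2": {"DM2"}, "x3": set()})
print(federated_count([site_a, site_b], "diabetes"))  # {'A': 1, 'B': 2}
```

The design point the sketch tries to capture is that the term translation happens locally at each site, so only counts cross institutional boundaries.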

Nick Anderson presented i2b2 at CICTR. They have been able to query multi-institutional "anonymized PHI data." Applications are in diabetes and cardiovascular disease. He described the technical, governance, ontology and evaluation processes that CICTR is driving, as well as the heterogeneous systems that CICTR has to query across. Nick distinguished two requirements: high-level institutional support, which is a sine qua non for success, and a broad range of paid technical personnel.

Keith Marsolo from Cincinnati Children's reported on their Epic rollout and how it relates to their i2b2 instance. He described their quality assurance efforts. He noted the challenge of the data "firehose" and made the acute observation that most investigators just want a spreadsheet, and anything more complicated than that tends to get ignored. Keith also emphasized their goal of allowing research data to be added to the clinical data in a streamlined way. He made the important point that "age at FACT" is essential for pediatric applications, so that a patient's age at the time of each observation can be easily accessed in the i2b2 workbench. Keith mentioned using i2b2 for research databases for eosinophilic esophagitis and IBD.
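"Age at fact" means the patient's age on the date of each observation, not today's age. A minimal sketch, assuming both a birth date and an observation date are available (in i2b2's star schema these correspond to `patient_dimension.birth_date` and `observation_fact.start_date`); the function names and the choice of whole months as the pediatric unit are illustrative assumptions.

```python
from datetime import date


def age_at_fact_months(birth_date: date, fact_date: date) -> int:
    """Whole months of age on the date an observation was recorded."""
    if fact_date < birth_date:
        raise ValueError("observation predates birth")
    months = (fact_date.year - birth_date.year) * 12 + (fact_date.month - birth_date.month)
    # If the birth day-of-month has not been reached yet, the last month is incomplete.
    if fact_date.day < birth_date.day:
        months -= 1
    return months


def age_at_fact_years(birth_date: date, fact_date: date) -> int:
    """Whole years of age on the observation date."""
    return age_at_fact_months(birth_date, fact_date) // 12


print(age_at_fact_months(date(2005, 3, 15), date(2009, 10, 30)))  # 55
```

Storing this value alongside each fact (rather than computing it from the current date) keeps queries such as "labs drawn before 12 months of age" simple.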

Phil Reeder from UT Houston talked about medication mapping. It is a challenge: they chose to map to RxNorm and then had to map manually into SNOMED CT (perhaps their database was out of date). They started from an Allscripts database and used a semi-automated process. He noted that every year there are at least 1,000 new drug products (i.e., different packaging, pill sizes, etc.). He brought up the thorny (and, IMHO, annoying) issue of proprietary mappings to standard vocabularies.

Ralph Zottola and Edward Westrick described the effort at U. Mass (data sourced from the Meditech system, REDCap EDC, biorepository, EMPI, Allscripts, and departmental systems), where they are up to 2,000,000 patients. Edward described the managed care network (1,000 physicians) that plugs into U. Mass and how quality measures inform the discussions and bargaining with payors. He reviewed different measures, including HEDIS and patient experience, as well as the increasingly important Relative Resource Utilization (efficiency). He demonstrated how knowing what is going on in the healthcare institution allows for a sober and leveraged discussion with payors. The healthcare system approached the medical school and settled on i2b2, and they have already seen that they can accurately forecast their performance and provide a feedback loop (with financial incentives) to healthcare providers. Ralph pointed out that the fact that clinical operations are using i2b2 is also improving the quality of the data being delivered to the data marts.

Iain Sanderson and Jihad Obeid presented the South Carolina effort. Iain started by describing a very comprehensive informatics initiative in South Carolina. They have a unified IRB with a goal of clinical trials across the state. There is both a scientific and a funding motivation in this. Three informatics initiatives have dovetailed (CTSA biomedical informatics, the HSSC IT business plan, and a GO grant on consent). This has resulted in the South Carolina Integrated Platform for Research (SCIPR), which uses i2b2 for the clinical research data warehouse. In the process they are adopting a wide range of open source solutions, including Sun Microsystems' Java CAPS. Iain reports that the data sharing agreements between the 6 centers across HSSC are under way and likely to result in an MoU in short order. Iain also described the beginnings of the consent management/gathering system, the permissions ontology, and the documentation of the different consent processes at the institutional members of the HSSC. Finally, Iain discussed how personal patient health portals may be used to provide the patient-facing part of the network.

Bethesda, Oct 28, 2009; BOS, Oct 28, 2009

Friday, October 23, 2009

Rheumatoid Arthritis

Plenge et al.,

3/4 of the genotyping is complete.

Manuscript to be submitted today regarding phenotyping accuracy.

What's next

Discussed possible priority areas for i2b2 in Core 2 for the upcoming competitive renewal.

The RFA has not yet been announced, so the discussion was, of necessity, wide-ranging.

Major Depressive Disorder (and Bipolar Disease)

Smoller, Perlis, et al.,

Discussed the challenge of merging note-level NLP conclusions into holistic patient-level evaluations.

Reviewed various Bayesian and (more broadly) machine learning approaches to defining which notes contribute most to the accurate classification of the patient phenotype.
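One simple Bayesian way to roll note-level NLP evidence up to a patient-level judgment is a naive Bayes update: start from the prior odds of the phenotype and multiply by each note's likelihood ratio. This is a minimal sketch, not necessarily one of the approaches reviewed; it assumes notes are conditionally independent given the phenotype, which is a strong simplification because notes about the same patient are correlated. The function name and inputs are hypothetical.

```python
import math
from typing import Sequence


def patient_posterior(prior: float, note_likelihood_ratios: Sequence[float]) -> float:
    """Combine note-level evidence into a patient-level probability.

    Each ratio is P(note evidence | phenotype) / P(note evidence | no phenotype).
    Assumes notes are conditionally independent given the phenotype (naive Bayes).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    for lr in note_likelihood_ratios:
        if lr <= 0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += math.log(lr)  # working in log-odds avoids multiplying many ratios
    return 1.0 / (1.0 + math.exp(-log_odds))


# Prior odds 0.25, one supportive note (LR 4) and one weakly contrary note (LR 0.5).
print(round(patient_posterior(0.2, [4.0, 0.5]), 4))  # 0.3333
```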

Major Depressive Disorder: 10/16/09

Courtesy Patience Gallagher

Minutes:

Collapsing terms

o   McLean admission → change to psych admission

o   Terms on the list Jordan gave Margarita that we didn’t necessarily annotate: Margarita will add them afterward.

o   Terms that have the same regular expression will be filtered at the end

o   Medication category: fga

  • We annotated it as IW bipolar, but last time we said it was CW; which one is it?
  • Sergey says it doesn’t matter right now, because the computer will decide
  • We will keep it labeled as IW

o   Li/VPA/Lamictal → change to mood stabilizer

  • Add term mood stabilizer(s)

o   Neuro/cognitive impairment includes

  • Confused and disoriented
  • Gross cognitive impairment
  • Significant cognitive deficits
  • dementia

o   Agitation includes

  • Hyperactivity
  • Hyper
  • pacing

o   dx depression includes:

  • depressive disorder
  • dysthymia

o   bipolar disorder includes:

  • dx bipolar disorder
  • 296.0, .1, .4, .5, .6, .7
    • .2, .3 are MDD, so we cannot say 296.x = BPD

o   Inappropriate behavior category, leave in:

  • Wanting to disrobe
  • Inappropriate sexual contacts (do not put with excessive pleasurable activities)

o   Rapid cycling

  • 4 or more episodes per year

o   Mania will include

  • Category now called “mania/manic”
  • Category now called “manic episode”
  • Cycling, cycles (not specified as rapid/less than 4 per year)

o   Typos will include variations on spellings of:

  • Grandiose
  • distractibility

o   “loose associations” will be its own category

o   Depressed episode → put with hx depression

o   “unable to read, study, concentrate” à put with distractibility

o   “tx bipolar disorder” à add in term “rx”

o   Hypomania will be own category

o   hyper mood → mood elevation

o   Delete:

§ Line 310: dx substance abuse, schizophrenia, paranoid schizoaffective disorder

§ Out of control
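The collapsing decisions above amount to a lookup from surface terms to categories, plus a rule for ICD-9 296.x codes and the rapid-cycling threshold. A minimal sketch using a subset of the terms from these minutes; the table layout and function names are hypothetical and are not the project's actual code.

```python
import re
from typing import Optional

# Hypothetical lookup table holding a subset of the collapsing decisions
# (surface term, lowercased -> category).
CATEGORY_OF = {
    "mclean admission": "psych admission",
    "li": "mood stabilizer",
    "vpa": "mood stabilizer",
    "lamictal": "mood stabilizer",
    "mood stabilizer": "mood stabilizer",
    "mood stabilizers": "mood stabilizer",
    "confused and disoriented": "neuro/cognitive impairment",
    "dementia": "neuro/cognitive impairment",
    "hyperactivity": "agitation",
    "pacing": "agitation",
    "depressive disorder": "dx depression",
    "dysthymia": "dx depression",
    "depressed episode": "hx depression",
    "hyper mood": "mood elevation",
}

# ICD-9 296.x: the digit after the decimal separates bipolar from MDD codes.
BIPOLAR_DIGITS = {"0", "1", "4", "5", "6", "7"}
MDD_DIGITS = {"2", "3"}
ICD9_296 = re.compile(r"^296\.(\d)")


def collapse(term: str) -> Optional[str]:
    """Return the category for a surface term, or None if it is not in the table."""
    return CATEGORY_OF.get(term.strip().lower())


def classify_icd9_296(code: str) -> Optional[str]:
    """Map a 296.x code to 'bipolar disorder' or 'MDD'; None if not covered."""
    match = ICD9_296.match(code.strip())
    if match is None:
        return None
    digit = match.group(1)
    if digit in BIPOLAR_DIGITS:
        return "bipolar disorder"
    if digit in MDD_DIGITS:
        return "MDD"
    return None  # e.g. 296.8x and 296.9x were not assigned in these minutes


def is_rapid_cycling(episodes_per_year: int) -> bool:
    """Rapid cycling as recorded in the minutes: 4 or more episodes per year."""
    return episodes_per_year >= 4
```

This makes the point in the minutes explicit: because 296.2x and 296.3x are MDD codes, a blanket "296.x means bipolar" rule would misclassify patients.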

Big group: 9:30-10:30

·       Protocol in the grant

  • Are we doing what we said in the grant? For example, step #3
    • The grant describes the classic way of doing NLP
    • You come up with a list of terms from your head without looking at notes.
    • Then expand these terms, add regular expressions, negation
  • Our method
    • Is a hybrid of the 2. We annotated notes, but we also came up with a list of terms
    • Annotate notes, generate list of terms, create regular expression, group, add in a priori terms, Tianxi goes through them and takes out terms that don’t matter (lasso), then feed to the computer
    • The IRB will not really care that this is slightly different.
  • Which method is better?
    • Unknown
    • Classic model relies on what’s in your head, our method relies on # notes reviewed
    • We could try to compare, but it’s a little too late because our list of terms wasn’t blind to the notes
  • Dr. Savova will be coming on board as NLP team lead
    • Would be good for her to give a mini-lecture about NLP so we are prepared to present at conferences
    • Should consult with her about project
  • Negation
    • So far Margarita has only added negation to some terms
    • Should modify algorithm so that every term also has corresponding negation term
    • Negation terms are very common in psych notes
    • Sergey to see if this can be done, although it is a lot of work
    • Would be useful for future i2b2 projects
    • Margarita explained that she plans on conducting a small validation study
      • For a particular patient, she can provide an idea of how strong the diagnosis we assign (e.g. BP2) is by showing what are the arguments for/against placing a person in a particular class
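The idea of pairing every term with a negation check can be illustrated with a simple window-based detector in the spirit of NegEx: a term mention counts as negated if a negation cue appears within a few words before it in the same sentence. The cue list, window size, and `find_mentions` function are assumptions for illustration; this is not the team's algorithm.

```python
import re
from typing import List, Tuple

# Assumed cue list and window size; a real system would use a curated lexicon.
NEGATION_CUES = ("no", "not", "denies", "denied", "without", "negative for")
WINDOW = 5  # number of preceding words (within the same sentence) to scan


def find_mentions(text: str, term_pattern: str) -> List[Tuple[str, bool]]:
    """Return (matched text, is_negated) for each match of a lowercase term regex."""
    lowered = text.lower()
    results = []
    for match in re.finditer(term_pattern, lowered):
        # Only look back to the start of the current sentence or clause.
        sentence_start = max(lowered.rfind(p, 0, match.start()) for p in ".;!?") + 1
        preceding = " ".join(lowered[sentence_start:match.start()].split()[-WINDOW:])
        negated = any(
            re.search(r"\b" + re.escape(cue) + r"\b", preceding) for cue in NEGATION_CUES
        )
        results.append((match.group(0), negated))
    return results


print(find_mentions("No evidence of mania. Pacing noted.", r"mania"))  # [('mania', True)]
```

Limiting the look-back to the current sentence keeps a cue such as "denies" in one sentence from negating a term in the next, a common failure mode in psych notes that chain many short statements.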

·       Next steps:

o   Jordan, Vivian, Margarita, Sergey, Victor, Roy to meet on Tuesday, October 20th 10:00 – 11:00 to finish collapsing terms.

Friday, October 9, 2009

Rheumatoid Arthritis

Plenge et al,

Genotyping of 384 risk alleles from prior studies.

A subset of the genes will shortly also have their exons resequenced.

Discussed adding antibody studies to the RA datamart as a preliminary to adding the SNP and other data sets.

Major Depressive Disorder

Perlis, Smoller, Iosifescu et al.,

Discussed the new privacy guidelines that come with the new Recovery Act. This may make it harder for those of us who are trying to advance medical science, and it is unclear if it will slow down those who are trying to commercially exploit patient data. What is certain is that it will increase the workload/opportunities for lawyers specializing in medical privacy.

Victor reported that he is running Tianxi's logistic regression model of the NLP features on the 2M notes in the MDD data mart.

Discussed the cross-DBP NLP challenges worth investing some hardening effort in. So far these include cigarette smoking, alcoholism, obesity, and other substance abuse.

Open source

Discussed planning for going from the current open source license to open source community support.

Friday, October 2, 2009

MDD

Perlis, Iosifescu, Smoller et al,

Compared claims data alone vs. (claims data + NLP) and saw a very large increase in AUC for treatment response.
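AUC here is the area under the ROC curve, which equals the probability that a randomly chosen responder receives a higher model score than a randomly chosen non-responder. A minimal rank-based (Mann-Whitney) computation, useful for comparing a claims-only model with a claims-plus-NLP model on the same patients; the function name and the toy scores in the test are illustrative, not study data.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney statistic; ties between classes count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic and meant for clarity; for large cohorts a sort-based rank computation gives the same value faster.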

Also discussed recruitment rates with much tighter inclusion criteria.

i2b2 AUG prep

Shawn, Diane, Zak, Susanne, Griffin

Discussed provisioning new DBPs

Discussed the GO grants that other sites have obtained using i2b2 infrastructure.