i2b2Blog: 2012

Sunday, December 9, 2012

Outcomes in dental medicine and i2b2

From our colleagues at UT Houston and the Harvard School of Dental Medicine, we just heard about a couple of interesting projects that leverage i2b2 for dental medicine. The most recent is one that studies patient safety in dentistry. Also i2b2 has been used to create a multi-institutional dental study. Hats off to Dr. Walji and colleagues.

Monday, December 3, 2012

New i2b2 plugins from U. Mass

The family relationship analysis tools seem particularly interesting.

See http://micard.umassmed.edu/software.html

Tuesday, November 20, 2012

i2b2 Annual Retreat with External Advisory Board

We met with our external advisory board, which includes Dan Masys, Eric Perakslis, Lisa A. Cannon-Albright, Peter Tarczy-Hornoch, Valerie Florance and George Hripcsak.

Isaac Kohane reviewed i2b2 impact and questions for the EAB.

Kat Liao presented the RA DBP and the genetic risk score and its nice superimposition on larger, more expensive, longer studies. She also described NLP algorithm portability (to Northwestern U. and Vanderbilt) and interesting interactions between lipid levels, lipid-informing genotypes, RA risk genotypes and heart disease.

Ashwin N. Ananthakrishnan described the early work on the IBD DBP that is already accumulating samples and some intriguing early findings regarding vitamin D.

Zongqi Xia described the multiple sclerosis DBP which brings together EMR textual contents (through NLP), volumetric analyses (from clinical imaging of the CNS) and clinical outcomes.

Stan Shaw provided an overview of the challenge of characterizing and stratifying type 2 diabetes mellitus genetically and phenotypically. He also demonstrated how many of the biomarkers we are used to studying for cardiovascular risk are typically not studied longitudinally in patients akin to the virtual cohort.

Tianxi Cai painted the road ahead. She articulated two questions of interest: who are the likely treatment responders and who are members of high risk populations? By way of example, she described investigations she has performed with conventional cohort studies (as a prelude to EHR studies) to define what is the incremental value of a biomarker (e.g. CRP). She demonstrated how subgroup analyses are clarifying with regard to highlighting the actual contribution of any specified biomarker to clinical decision-making.

Mark Natter spoke of the huge national investment in registries and the very large tail of ongoing investment in pre-existing registries. He also summarized the problem with existing registries and the CARRAnet i2b2 registry experience with over 8000 pediatric rheumatology patients (20K total) with 31,000 detailed visit records so far. Each investigator has full access to their own data and aggregate views across the entire registry's network.

Shawn Murphy gave a broad technical update on i2b2 and a glimpse of the future roadmap. This included the clinical trials capabilities, augmentation with SMART apps, episode of care analyses, and additional new modules.

The EAB then went into executive session.

i2b2 EAB Meeting 2012

Friday, November 16, 2012

Inflammation/Autoimmune

Zongqi et al. reviewed the current manuscript draft on the MS cohort and computational and clinical estimates of brain volume.

We also reviewed the agenda for the upcoming External Scientific Advisory Board meeting and retreat.

Friday, October 19, 2012

All Hands Meeting: Bette Phimister.

Bette Phimister, Deputy Editor of NEJM, discussed the challenges of being an editor of the New England Journal of Medicine and what the process towards publication looks like. The most vivid challenge described was that of assigning causality at the clinical level to variants discovered in genome-scale studies.

Friday, October 12, 2012

Inflammation DBP

Kat Liao reviewed the interaction between LDL genetic risks and RA genetic risks in the determination of lipid profile.

Type 2 DM and Coronary Artery Disease study

Stan Shaw led the discussion on what studies we will initially perform to validate the cohort relative to other previously studied cohorts.

Friday, September 28, 2012

DM DBP

Stan Shaw led a discussion of early validation exercises (e.g. mortality prediction from various clinical characteristics).

This led to a discussion of the wholesale representation of the NLP concept derivatives of a data mart within the i2b2 database. This included, for example, whether we should store modifiers with concepts or create new compound concepts (that bury the explicit modifiers). The most important part of the discussion was the remarkably small contribution that negation seems to make to the performance of the NLP algorithms.

Sunday, September 23, 2012

Roy Perlis: All Hands Meeting on Major Depression and Treatment Resistance (genetic basis)

Why study genetics of TRD?

Heritability of TRD is estimated to be ~60% (styled in the manner of Visscher).

The Challenge:

1) Lasagna's Law

2) Establishing TRD requires multiple treatment trials (costly).

During the closed i2b2 MDD TRD DBP:

a) Developed high specificity for extremes of response (specificity of 95%)

b) Studied changes in CNV burden and MDD/TRD. Results are still preliminary.

c) Clinical Goal: Risk Stratification

d) Discussed CNV changes

Friday, September 7, 2012

DM and CAD DBPs

Liao et al.

Several new faces were introduced (see photo below) and Kat gave a quick update on Coronary Artery Disease phenotyping.

Ashwin discussed depression and anxiety in IBD (sample size ~5000). About 20% of the patients were depressed, and various covariates were assessed (for example, TNF use was not significantly correlated with depression).

Zongqi presented the characteristics of the Multiple Sclerosis patients in the MS i2b2 Virtual Cohort.

Wednesday, September 5, 2012

Summary and videos from the SHRINE Conference

Thanks to Katia and Kerry Ann Foley, we now have a site summarizing the very successful SHRINE meeting that was held this Summer at Harvard. It includes videos of all the presentations.

Friday, August 31, 2012

Write up of successful examples of research IT

For those of you in the i2b2 community, this piece in Science Translational Medicine may be obvious. I am pretty sure it's not an obvious conclusion to all (particularly some of the vendors) so you might want to read it for reference.

Wednesday, July 25, 2012

SHRINE National Meeting

Mark Overhage, formerly of the Regenstrief Institute and now of Siemens, gave the keynote. He emphasized what can be done today with Big Data and how much of it can (and cannot) be extracted from observational data from healthcare. He cast the challenge of analysis in the context of accountable care. He also reviewed how coarse-grained guidelines often can run counter to the best population-evidence-based decision models.

Keith Marsolo described the comprehensive efforts at CCHMC to harness healthcare data to improve quality and then how he has employed the SHRINE infrastructure to allow for large inter-institutional registries (particularly in Inflammatory Bowel Disease).

Mark Natter described the value of large registries and illustrated how to create these cost-effectively using SHRINE for the CARRAnet pediatric rheumatological disease registry.

David Ortiz then described the technical details of the self-scaling system that implements the CARRAnet SHRINE.

Rebecca Miksad described her provocative study of IBD, colon cancer and diabetes using SHRINE.

Lisa Dahm described the very impressive 11 million patient UCReX SHRINE Implementation that is currently likely the largest near-real-time clinical patient data resource.

John Hutton summarized what it takes for an institution like Cincinnati Children's Hospital Medical Center to participate in the eMERGE network (along with Boston Children's Hospital) using the SHRINE mechanism as a means to provide real-time phenotyping for genomic studies.

Day 2 SHRINE Meeting

There was a highly productive conversation about what it will take to scale SHRINE for clinical trials nationwide. We had a longer agenda, but the time went to the intense interest in the regulatory implications and in the different network topologies, loci of control, levels of identification, models of user registration, and models of reduplication of patient identities.

Tuesday, July 24, 2012

i2b2 AUG

Attendance was as good as (slightly better than) last year's despite this date being in the midst of the Summer doldrums.

I spoke about the positive disruptive effect i2b2 was having on data sharing, code sharing, clinical domain script sharing (e.g. upload from ETL), and even electronic health record functionality.

Shawn Murphy brought us up to date on temporal reasoning, text searching and other 1.7 features.

Mike Mendis reviewed new infrastructure components (JBoss 7, Bamboo, etc.).

The parallel track of poster sessions was well attended.

Ulrich Sax gave an impressive review of all the i2b2 activity occurring in Germany.

AUG i2b2

Sebastian Mate described and then demonstrated(!) the i2b2 install wizard.

Andrew Post from Emory described the use of i2b2 for quality improvement, temporal abstraction, the creation of individual i2b2 instances in the AWS cloud, predictive modeling and the Minority Health Grid.

Michael Buck, NYC Dept of Public Health, described relevant efforts in the Query Health effort. He provided interesting insights into the QDM data model from the National Quality Forum and how it fits into the i2b2 ontology. These codes (now represented in the i2b2 ontology) will determine the operational definitions of the characteristics of phenotypes that are linked to quality outcomes (and therefore reimbursement).

Day 2:

Shawn Murphy introduced the next step of SHRINE, SHRINE-CT (CT = Clinical Trials) to enable a distributed mechanism of identifying patients meeting criteria for trials and then recruiting them. This is important because so many patients "have" a pathology and yet do not meet criteria for a trial. SHRINE-CT allows you to (with the right credentials and IRB permissions) browse across multiple patient populations to find the "right" subjects for the clinical trial. He also spoke about the SMART EMR view and other mechanisms for community contributions.

Nich Wattanasin gave us the details on his work on making SMART work atop i2b2 and the workflows that this enables.

Lori Phillips described the ontology mapping cell and focused on the complexity of mapping one large terminology to another (e.g. ICD9 <-> ICD10).

Brian Wilson from Tufts University described the GARLIC genomic integration cell that includes annotation, search (by variant ontology) and display.

Friday, June 29, 2012

Reproducibility of i2b2 NLP selection of RA patients across multiple academic health centers

A very nice result of i2b2, Vanderbilt and Northwestern teams with remarkable reproducibility (much better than I have seen for inter-expert variability in other studies) of the automated natural language processing-driven selection of patients with Rheumatoid Arthritis.

Tuesday, June 26, 2012

i2b2-based Registry for Juvenile Rheumatoid Arthritis

This article: http://jamia.bmj.com/content/early/2012/06/24/amiajnl-2012-001042.full.pdf+html describes what appears to be the largest U.S. registry for pediatric rheumatoid arthritides. It uses a multi-site SHRINE query system atop dozens of i2b2 instances representing most large sites in this country.

Friday, June 15, 2012

Pharmacoepidemiology

Dan Solomon gave a very comprehensive overview of the art and science of pharmaco-epidemiology in our monthly all-hands meeting. Special emphasis on the particular problems of using EHR data.

Thursday, June 14, 2012

SHRINE (Distributed Queries Across Hospitals) for the Study of Peripartum Cardiomyopathy

Recently published in Nature is a study entitled "Cardiac angiogenic imbalance leads to peripartum cardiomyopathy." On close review we were pleasantly surprised to see:

Retrospective analyses of PPCM and pre-eclampsia in the Harvard teaching hospitals were performed using the Harvard Shared Health Research Information Network (SHRINE)[ref], a de-identified repository of aggregate patient information.

This is all the more exciting because none of the i2b2 core team were aware of this study that used this distributed query mechanism across multiple health centers. There are now a half dozen SHRINE networks operational nationwide (some of which go coast to coast) and so we anticipate that there will be many more such unanticipated results.

Friday, May 25, 2012

Defining cardiovascular disease phenotypes

Shaw, Liao et al.

Non-fatal MI, cardiac revascularization procedures, CHF, all CVD death

Defining DM

Cai et al.

The data mart contains 314,292 patients who may have DM.

Patients in the training sets had more than 2 notes, but the test set was not thresholded in the same way. That explains why our first iteration of the algorithm was so unstable!

Friday, May 11, 2012

Diabetes

Report on the diabetes data mart:

13,000,000 notes automatically read by the NLP engine and parsed into SNOMED. That's BIG DATA! This is perhaps the largest phenotypically detailed such database about diabetes.

Identity Management

Discussed a variety of techniques to include different quality master patient indices unified with the identity system within i2b2.

Friday, May 4, 2012

Multiple Sclerosis DBP & Inflammation

Zongqi Xia led a discussion of his projects regarding 1) EHR correlates of MS disease activity 2) Pre-symptomatic MS 3) Pharmacogenomics 4) MS comorbidities.

The most extensive discussion was about how to evaluate functional status.

Friday, April 27, 2012

DM--> Cardiovascular disease

Guest Visitor: Allison Goldfine

Regulars: Including Shaw, Murphy, Churchill, Kohane, Cai

Reviewed the current state of the clean up of the DM database so that we can answer some very pressing and important questions regarding cardiovascular disease and cardiovascular mortality.

Security Policy and Identity Management

Churchill, Kohane, Murphy, Wattanasin, Bickel, Simons

Discussed different configurations of identity management to support a range of i2b2 database solutions along the spectrum from identified to de-identified.

Friday, April 20, 2012

Review of methodology

Friday, March 16, 2012

All Hands Meeting

Cynthia Morton, PhD spoke about the DGAP project http://www.bwhpathology.org/dgap/

Among the fascinating cases she presented was that of Brenden Adams.

Also discussed the aCGH studies of the microdeletions found in the DGAP. Most of these arrays are used for neurodevelopmental screening and they found single gene insufficiency in several of these cases.

Reviewed the amusingly horrific nomenclature demands for mutations that might arise from massively complex genomic rearrangements.

Friday, March 9, 2012

Diabetes DBP

Discussed the analysis of various inflammatory markers.

cTAKES discussion

Savova et al.,

Discussed tighter i2b2 integration with cTAKES. Discussed representation of modifiers and how to communicate to the community which subset of which standardized ontology was used for each term. Pei demonstrated a very impressive preliminary integration.

Inflammatory Bowel Disease DBP

Ashwin et al.,

Discussed policy and security details about joining an existing registry of patients who have prospectively consented to an identified study with a data mart of IBD patients that contains those very same patients.

Reviewed NLP performance with ~800 gold-standard annotated cases (399 Crohn's and 378 Ulcerative Colitis) and 201 gold-standard annotated controls (each with approximately 8 years follow-up).

AUG meeting and other planning

Kohane, Churchill, Murphy, Weber, Mendis, Bickel et al.,

Discussed the agenda for this Summer's AUG and for next week's TBI Summit at AMIA.

Discussed impending very LARGE implementations of SHRINE (independent of this core group).

Friday, February 24, 2012

NLP and i2b2 Status Report

Guergana led a discussion on the current state. Total team is currently 7-9 individuals (depending on how directly they are involved for a given project).

Currently, in i2b2, NLP is cast as a supervised classification task that is applied to patients who are filtered according to various criteria (e.g. lab values, ICD-9 codes). Domain experts typically annotate (extensively) 100+ charts as part of the gold standard (used for supervised learning). A subset of these annotations is done twice (at least) to assess inter-annotator agreement.

cTAKES has processed 28M documents at Partners Healthcare System to date.
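As a rough illustration of how agreement on the doubly annotated subset might be quantified, here is a minimal sketch using Cohen's kappa. The notes above do not say which agreement statistic is used, so the choice of kappa, the function name and the example labels are all assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same charts."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of charts given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for eight charts annotated by two reviewers.
annotator_1 = ["RA", "RA", "not RA", "RA", "not RA", "not RA", "RA", "not RA"]
annotator_2 = ["RA", "RA", "not RA", "not RA", "not RA", "not RA", "RA", "RA"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label dominates the annotated charts.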

Reviewed recent results with IBD (NLP alone almost as good as NLP+Codified data), Multiple Sclerosis (NLP helped but the combination of NLP + codified is significantly better).

Finding the timing of the myocardial infarction

Liao, Shaw, Tsai, Kohane, Churchill, Savova (Murphy absent)

Discussed various ways to annotate the timing of heart attacks. Did a little review of the temporal utility package.

New version of Excel/tab-delimited file i2b2 Export plugin

This can be found here: https://community.i2b2.org/wiki/display/ExportXLS/ExportXLS+Home

Congratulations to the Pavia and U. Mass teams.

Friday, February 17, 2012

All Hands Meeting-Aurel Cami

Aurel Cami described the work he performed under an NLM K99 award, that was recently published in Science Translational Medicine. He used safety information on existing drugs up to 2005, chemical properties of the drugs and taxonomic characteristics of the drugs to create a network predictive model of adverse events in 2010. This model achieved accuracies in the 80% range.

Governance models

We met this morning to discuss several alternative governance models. Given the spreading use and an increasingly vibrant community, we will be developing an even more pro-active and long-term plan to support i2b2's continued growth. Details to be presented at the Summer i2b2 AUG meeting.

Friday, February 10, 2012

Topic models and unsupervised NLP

Very nice presentation from Tim Miller today about work at i2b2 on this.

Review of Adaptive Lasso

Professor Cai reviewed the Adaptive Lasso function for us. She also provided the framework to understand the similarities between various machine learning formalisms (e.g. Hinge Loss function and SVM).
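For reference, the textbook forms of the two formalisms mentioned above can be written side by side. This is a sketch in standard notation, not taken from the presentation; the symbols (responses y_i, covariates x_i, tuning parameters lambda and gamma, and an initial estimate such as least squares for the weights) are assumptions.

```latex
% Adaptive lasso: squared-error loss with a coefficient-specific L1 penalty.
\hat{\beta}^{\mathrm{AL}}
  = \arg\min_{\beta} \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^{2}
  + \lambda \sum_{j=1}^{p} \hat{w}_j \,\lvert \beta_j \rvert,
\qquad
\hat{w}_j = \frac{1}{\lvert \tilde{\beta}_j \rvert^{\gamma}}, \quad \gamma > 0,

% where \tilde{\beta} is an initial consistent estimate (e.g. least squares).

% Linear SVM: hinge loss with an L2 penalty, for labels y_i \in \{-1, +1\}.
(\hat{\beta}_0, \hat{\beta})
  = \arg\min_{\beta_0,\,\beta} \sum_{i=1}^{n}
    \max\bigl(0,\; 1 - y_i(\beta_0 + x_i^{\top}\beta)\bigr)
  + \lambda \lVert \beta \rVert_2^{2}.
```

Written this way, both are instances of "loss plus penalty": they differ in the loss (squared error versus hinge) and in the penalty (weighted L1 versus L2).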

Also discussed the fundamental bias vs variance tradeoff. Approximation error (i.e. from selection from the feature space). Variance (i.e. sample error). If you have lower approximation error, for example, you can afford a high sample error (and vice versa).

Inflammation (DM,CAD,CVD)

Kat, Stan et al.,

Stan took a bird's-eye view of the various cardiovascular morbidities of our cohorts, such as non-fatal MI, non-fatal strokes, hospitalization for heart failure, cardiovascular death (fatal MI, fatal stroke), peripheral vascular disease, and revascularization.

SHRINE & i2b2

Murphy, Churchill, Kohane et al.

Reviewed the next steps in disseminating SHRINE and necessary architecture additions.

Friday, January 27, 2012

DM

Shaw walked us through the NLP analysis, using a gold standard of 391 patients. We saw extremely good performance, with NLP + codified data clearly dominating NLP or codified data alone. Area Under the Curve was at least 98%. Quite remarkable.
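For readers less familiar with the metric, the area under the ROC curve can be read as the probability that a randomly chosen true case receives a higher algorithm score than a randomly chosen non-case. The sketch below computes it that way; the function name, scores and gold-standard labels are made up for illustration and are not the study data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # case ranked above non-case
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical gold-standard labels (1 = case) and classifier scores.
gold = [1, 1, 1, 0, 0, 0]
combined_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
area = auc(combined_scores, gold)
print(f"AUC = {area:.3f}")
```

The pairwise form is quadratic in the number of patients, which is fine for a gold standard of a few hundred charts; a rank-based computation would be preferable for much larger sets.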

Friday, January 6, 2012

HAPPY NEW YEAR 2012 - Both DBP Meeting

Shaw, Murphy, Plenge, Kohane, Karlson, Prilutsky, Doshi, Szolovits, Liao, Churchill, et al.

Reviewed the NLP on the cardiovascular data mart (340,000 patients).

Reviewed what is the right sample prep for our new studies (tending towards spinning down and storing the buffy coat and plasma fraction separately at -70).

Kat presented results on PheWAS on the autoimmune panel.