Friday, February 24, 2012

NLP and i2b2 Status Report

Guergana led a discussion on the current state. The team currently totals 7-9 individuals (depending on how directly they are involved in a given project).

Currently, in i2b2, NLP is cast as a supervised classification task that is applied to patients who are filtered according to various criteria (e.g. lab values, ICD-9 codes). Domain experts typically annotate (extensively) 100+ charts as part of the gold standard (used for supervised learning). A subset of these annotations is done at least twice to assess inter-annotator agreement.
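The notes do not say which agreement statistic is used on the double-annotated charts. As a minimal sketch, assuming Cohen's kappa (one common choice for two annotators), agreement could be computed as below. The function name and the chart labels are made up for illustration and are not i2b2 data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for 10 double-annotated charts (not real data).
annotator_1 = ["case", "case", "control", "case", "control",
               "control", "case", "control", "case", "control"]
annotator_2 = ["case", "control", "control", "case", "control",
               "control", "case", "control", "case", "case"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Here the annotators agree on 8 of 10 charts, but half of that agreement would be expected by chance, which is why kappa is lower than the raw agreement rate.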

cTAKES has processed 28M documents at Partners Healthcare System to date.

Reviewed recent results with IBD (NLP alone almost as good as NLP + codified data) and Multiple Sclerosis (NLP helped, but the combination of NLP + codified data is significantly better).

Finding the timing of the myocardial infarction

Liao, Shaw, Tsai, Kohane, Churchill, Savova (Murphy absent)

Discussed various ways to annotate the timing of heart attacks. Did a brief review of the temporal utility package.

New version of Excel/tab-delimited file i2b2 Export plugin

This can be found here: https://community.i2b2.org/wiki/display/ExportXLS/ExportXLS+Home

Congratulations to the Pavia and U. Mass teams.

Friday, February 17, 2012

All Hands Meeting-Aurel Cami

Aurel Cami described the work he performed under an NLM K99 award, which was recently published in Science Translational Medicine. He used safety information on existing drugs up to 2005, chemical properties of the drugs, and taxonomic characteristics of the drugs to create a network predictive model of adverse events in 2010. This model achieved accuracies in the 80% range.

Aurel Cami

Governance models

We met this morning to discuss several alternative governance models. Given the spreading use and an increasingly vibrant community, we will be developing an even more proactive and long-term plan to support i2b2's continued growth. Details to be presented at the Summer i2b2 AUG meeting.

Friday, February 10, 2012

Topic models and unsupervised NLP

Very nice presentation from Tim Miller today about work at i2b2 on this.

Review of Adaptive Lasso

Professor Cai reviewed the Adaptive Lasso for us. She also provided a framework for understanding the similarities between various machine learning formalisms (e.g. the hinge loss function and SVM).
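The notes do not record the exact formulation presented. For reference, the standard textbook forms are sketched here: the adaptive lasso as a weighted L1-penalized least-squares problem, and the linear SVM as hinge loss plus an L2 penalty. Both share the same "loss + penalty" shape, which is one way to see the similarity. The symbols (response y, design matrix X, initial estimate \tilde{\beta}, tuning parameters \lambda and \gamma, labels y_i in {-1, +1}) are notation assumed for this sketch.

```latex
% Adaptive lasso: squared-error loss + weighted L1 penalty
\hat{\beta}^{\mathrm{AL}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
  + \lambda \sum_{j=1}^{p} \hat{w}_j \lvert \beta_j \rvert,
\qquad \hat{w}_j = \frac{1}{\lvert \tilde{\beta}_j \rvert^{\gamma}}, \quad \gamma > 0

% Linear SVM: hinge loss + L2 penalty
\min_{\beta_0,\, \beta} \; \frac{1}{n} \sum_{i=1}^{n}
  \max\bigl(0,\; 1 - y_i(\beta_0 + x_i^{\top}\beta)\bigr)
  + \lambda \lVert \beta \rVert_2^2
```

The weights \hat{w}_j penalize coefficients with small initial estimates more heavily, which is what distinguishes the adaptive lasso from the ordinary lasso (all weights equal to 1).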

Also discussed the fundamental bias vs. variance tradeoff: approximation error (i.e. error from the selection of the feature space) versus variance (i.e. sample error). If you have lower approximation error, for example, you can afford a higher sample error (and vice versa).
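One standard way to write this tradeoff down, offered as a sketch and not necessarily the form used in the discussion, splits the excess risk of a fitted model into the two terms named above. Here R is the expected loss, R^* the best achievable risk, \mathcal{F} the model class implied by the chosen features, and \hat{f}_n the model fitted on n samples; all of this notation is assumed for illustration.

```latex
R(\hat{f}_n) - R^{*}
  = \underbrace{\Bigl(\inf_{f \in \mathcal{F}} R(f) - R^{*}\Bigr)}_{\text{approximation error}}
  + \underbrace{\Bigl(R(\hat{f}_n) - \inf_{f \in \mathcal{F}} R(f)\Bigr)}_{\text{sample (estimation) error}}
```

A richer feature space shrinks the first term but typically inflates the second for a fixed sample size, so a fixed total error budget can be met with different mixes of the two.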

Inflammation (DM,CAD,CVD)

Kat, Stan et al.

Stan took a bird's-eye view of the various cardiovascular morbidities of our various cohorts, such as non-fatal MI, non-fatal stroke, hospitalization for heart failure, cardiovascular death (fatal MI, fatal stroke), peripheral vascular disease, and revascularization.

DBP-inflammation

SHRINE & i2b2

Murphy, Churchill, Kohane et al.

Reviewed the next steps in disseminating SHRINE and necessary architecture additions.