i2b2Blog

Friday, January 27, 2012

DM

Shaw walked us through the NLP analysis, using a gold standard of 391 patients. Performance was extremely good, with NLP + codified data clearly dominating either NLP or codified data alone. The area under the curve (AUC) was at least 0.98. Quite remarkable.
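
For reference, the AUC can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as half. The minimal Python sketch below computes it that way. The scores and labels are made-up toy values for illustration only, not the DM gold-standard data.

```python
def auc(scores, labels):
    """AUC as P(score of a random case > score of a random control); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
codified_only = [0.9, 0.4, 0.7, 0.5, 0.2, 0.8]
nlp_plus_codified = [0.95, 0.8, 0.85, 0.3, 0.1, 0.4]

print(auc(codified_only, labels), auc(nlp_plus_codified, labels))
```

This pairwise version is O(cases × controls), which is fine for a gold standard of a few hundred patients. A rank-based formula gives the same value faster on larger sets.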

Friday, January 6, 2012

HAPPY NEW YEAR 2012 - Both DBP Meeting

Shaw, Murphy, Plenge, Kohane, Karlson, Prilutsky, Doshi, Szolovits, Liao, Churchill, et al.

Reviewed the NLP on the cardiovascular data mart (340,000 patients).

Reviewed the right sample preparation for our new studies (tending towards spinning down and storing the buffy coat and plasma fraction separately at -70 °C).

Kat presented results on PheWAS on the autoimmune panel.

Friday, December 16, 2011

All Hands - Physio-MIMI

Susan Redline et al.

Described the next version of this sleep repository/analytic warehouse, called Hybrid, and its Ruby on Rails implementation.

Data types are reconciled across multiple ontologies by locally mapping each member data source into a data dictionary. The decision was made explicitly to preserve transparency into the source data, at the cost of requiring the investigator to determine the validity of the mappings and their semantics for each data source.
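
To make the trade-off concrete, here is a minimal Python sketch of local mapping into a shared data dictionary that keeps each value's provenance visible. It is an illustration under assumptions, not the Hybrid implementation (which is Ruby on Rails): the source names (`site_a`, `site_b`), field names, and dictionary terms are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedValue:
    dictionary_term: str  # shared data-dictionary variable
    value: object         # value exactly as recorded at the source
    source: str           # member data source it came from
    source_field: str     # original local field name, kept for transparency


# Hypothetical per-source mappings: local field name -> data-dictionary term.
# site_b's "rdi" is mapped to the same term as site_a's "AHI_total"; the two
# indices are not necessarily equivalent, and judging that equivalence is
# exactly what is left to the investigator.
LOCAL_MAPPINGS = {
    "site_a": {"AHI_total": "apnea_hypopnea_index", "bmi": "body_mass_index"},
    "site_b": {"rdi": "apnea_hypopnea_index", "BMI_kg_m2": "body_mass_index"},
}


def map_record(source, record):
    """Map one source record into data-dictionary terms, skipping unmapped fields."""
    mapping = LOCAL_MAPPINGS[source]
    out = []
    for field, value in record.items():
        term = mapping.get(field)
        if term is None:
            continue  # no local mapping defined for this field
        out.append(MappedValue(term, value, source, field))
    return out


rows = map_record("site_b", {"rdi": 12.0, "BMI_kg_m2": 31.2, "notes": "x"})
```

Because every mapped value carries `source` and `source_field`, an investigator can always trace a dictionary term back to what each site actually recorded.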

Report back from the Psychiatry DBPs

Perlis et al.

3000 blood samples so far on the bipolar project.

The identification of controls has turned out to be a big edge for EDGR, as controls are the most difficult to ascertain in conventional cohorts (where the control for one study may be contaminated by disease for another study).

~1000 samples for the MDD study.

Friday, December 9, 2011

Multiple sclerosis

Xia et al. reviewed NLP oddities from the prior presentation (e.g., different predictive weights for dysarthria and slurred speech). However, manually collapsing terms into manually identified superconcepts did not seem to improve performance significantly.
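
Collapsing terms into superconcepts amounts to merging feature columns before the model is fit, so that, for example, dysarthria and slurred speech share one weight instead of getting two. The minimal Python sketch below shows that step on per-patient mention counts. The superconcept map and the counts are hypothetical.

```python
from collections import Counter

# Hypothetical manually curated map from NLP terms to superconcepts.
SUPERCONCEPTS = {
    "dysarthria": "speech_disturbance",
    "slurred speech": "speech_disturbance",
}


def collapse(term_counts):
    """Sum mention counts of terms sharing a superconcept; unmapped terms pass through."""
    collapsed = Counter()
    for term, n in term_counts.items():
        collapsed[SUPERCONCEPTS.get(term, term)] += n
    return dict(collapsed)


patient = {"dysarthria": 2, "slurred speech": 1, "optic neuritis": 1}
print(collapse(patient))
```

The model then sees a single `speech_disturbance` feature. As noted above, this did not significantly improve performance in the MS analysis.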

Diabetes

Shaw et al. reviewed NLP results for the DM DBP. Overall, it seems to be working well. Some descriptions of neuropathy are rarer than expected.

Crohn's and Ulcerative colitis

Ashwin et al. discussed the AUC for UC and Crohn's using NLP and codified data. In this domain, once the cohort is roughly identified with ICD-9 codes (which have low specificity), the NLP terms are much better than the codified terms at accurately identifying who has the disease and who does not (as measured by AUC). Based on these analyses, PHS appears to have 12,000 UC and Crohn's patients, from whom about 3,600 samples can be accumulated in one year.
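
The two-stage approach described here can be sketched in Python as a broad ICD-9 screen followed by NLP-based scoring of the screened cohort. The ICD-9-CM categories 555 (regional enteritis, i.e. Crohn's) and 556 (ulcerative colitis) are standard. The NLP feature weights and the patient records are hypothetical, and the weighted sum is only a stand-in for whatever fitted model was actually used.

```python
# ICD-9-CM categories: 555.x regional enteritis (Crohn's), 556.x ulcerative colitis.
IBD_ICD9_PREFIXES = ("555", "556")

# Hypothetical NLP feature weights, for illustration only.
NLP_WEIGHTS = {"crohn's disease": 2.0, "ulcerative colitis": 2.0, "colonoscopy": 0.5}


def screen(patients):
    """Stage 1: broad, low-specificity screen on ICD-9 codes."""
    return [p for p in patients
            if any(code.startswith(IBD_ICD9_PREFIXES) for code in p["icd9"])]


def nlp_score(patient):
    """Stage 2: weighted sum of NLP mention counts (stand-in for a fitted model)."""
    return sum(NLP_WEIGHTS.get(term, 0.0) * n
               for term, n in patient["nlp_counts"].items())


patients = [
    {"id": 1, "icd9": ["555.9"], "nlp_counts": {"crohn's disease": 3}},
    {"id": 2, "icd9": ["401.9"], "nlp_counts": {}},
    {"id": 3, "icd9": ["556.9"], "nlp_counts": {"colonoscopy": 1}},
]
cohort = screen(patients)
ranked = sorted(cohort, key=nlp_score, reverse=True)
print([(p["id"], nlp_score(p)) for p in ranked])
```

The screen keeps recall high and cheap; the NLP stage then does the discriminating work within the screened cohort, which is where its AUC advantage over codified terms was observed.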