Friday, December 16, 2011

All Hands - Physio-MIMI

Susan Redline et al.,

Described the next version of this sleep repository/analytic warehouse, called Hybrid, and a Ruby on Rails implementation.

Reconciliation of data types across multiple ontologies is done by local mapping of member data sources into a data dictionary. The decision was explicitly made to allow transparency into these source data at the cost of having the investigator determine the validity of the mappings and their semantics for each of the data sources.
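A minimal sketch of what such a local mapping might look like, assuming each member data source declares which of its own fields corresponds to a term in the shared data dictionary. The source names, field names, dictionary terms, and the `sources_for` helper are all hypothetical and only illustrate the idea; the notes do not describe the actual schema. Keeping the original field name next to the dictionary term is what gives the investigator the transparency mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source: str           # hypothetical member data source
    source_field: str     # field name as it appears in that source
    dictionary_term: str  # term in the shared data dictionary


# Hypothetical mappings, for illustration only.
MAPPINGS = [
    Mapping("site_a", "AHI_total", "apnea_hypopnea_index"),
    Mapping("site_b", "ahi", "apnea_hypopnea_index"),
    Mapping("site_a", "bmi_kg_m2", "body_mass_index"),
]


def sources_for(term, mappings=MAPPINGS):
    """Return every (source, source_field) pair mapped to a dictionary term."""
    return [(m.source, m.source_field) for m in mappings if m.dictionary_term == term]
```

Calling `sources_for("apnea_hypopnea_index")` lists each local field behind that term, so the investigator can judge for each source whether the mapping is valid.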

Report back from the Psychiatry DBPs

Perlis et al.,

3000 blood samples so far on the bipolar project.

The identification of controls has turned out to be a big edge for EDGR, as controls are the most difficult to ascertain in conventional cohorts (where the control for one study may be contaminated by disease for another study).

~1000 samples for the MDD study.

Friday, December 9, 2011

Multiple sclerosis

Xia et al., Reviewed NLP oddities from the prior presentation (i.e., different predictive weights for dysarthria and slurred speech). Yet manually collapsing terms into hand-identified superconcepts did not seem to improve performance significantly.

Diabetes

Shaw et al., Reviewed NLP results for the DM DBP. Overall it seems to be working well. Some descriptions of neuropathy are rarer than expected.

Crohn's and Ulcerative colitis

Ashwin et al., Discussed the AUC for UC and Crohn's with NLP and codified data. In this domain, once the cohort is roughly identified with the ICD-9 codes (low specificity), the NLP terms are much better than the codified terms at accurately identifying who has the disease and who does not (as measured by AUC). Based on these analyses, it looks like PHS has 12000 (UC + Crohn's), of which about 3600 samples can be accumulated in one year.
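For reference, AUC can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen non-case, with ties counted as half. The sketch below computes it directly from that definition. The `auc` function name and the toy labels and scores are made up for illustration and are not results from the DBP.

```python
def auc(labels, scores):
    """AUC as the fraction of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data only: 1 = has disease, 0 = does not.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # separates the classes perfectly
scores_b = [0.9, 0.4, 0.6, 0.5, 0.2, 0.1]  # one case scores below one non-case
```

Two feature sets (for example NLP terms versus codified terms) can be compared by scoring the same labeled patients with each model and comparing the two AUC values.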

Friday, December 2, 2011

Review NLP processes for our DBPs

Guergana et al.,

Learning environment

1. Training/test set creation (labeling of data; currently using Knowtator, which itself uses Protege 3.1)
2. Automated toolset (cTAKES + automated feature selection flow)
3. Loop 1-2
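As a rough illustration of step 1, the sketch below splits a set of already-labeled documents into training and test sets with a fixed seed, so that each pass through the loop evaluates against the same held-out documents. The `split_train_test` helper, the 20% test fraction, and the document tuples are assumptions for illustration; the notes do not say how the split is actually made, and no Knowtator or cTAKES calls are shown.

```python
import random


def split_train_test(labeled_docs, test_fraction=0.2, seed=0):
    """Shuffle labeled documents reproducibly and return (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    docs = list(labeled_docs)
    random.Random(seed).shuffle(docs)  # fixed seed keeps the split stable across loops
    n_test = max(1, round(len(docs) * test_fraction))
    return docs[n_test:], docs[:n_test]


# Hypothetical labeled documents: (document id, label).
docs = [(f"note_{i}", i % 2) for i in range(10)]
train, test = split_train_test(docs)
```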
