Friday, December 16, 2011

All Hands - Physio-MIMI

Susan Redline et al.,

Described the next version of this Sleep-Repository/Analytic Warehouse, called Hybrid, and its Ruby on Rails implementation.

Reconciliation of data types across multiple ontologies is done by local mapping of member data sources into a data dictionary. The decision was explicitly made to allow transparency into these source data at the cost of having the investigator determine the validity of the mappings and their semantics for each of the data sources.

Report back from the Psychiatry DBPs

Perlis et al,

3000 blood samples so far on the bipolar project.

The identification of controls has turned out to be a big edge for EDGR, as controls are the most difficult to ascertain in conventional cohorts (where the control for one study may be contaminated by disease for another study).

~1000 samples for the MDD study.

Friday, December 9, 2011

Multiple sclerosis

Xia et al., Reviewed NLP oddities from the prior presentation (i.e. different predictive weights for dysarthria and slurred speech). Yet manually collapsing terms into manually identified superconcepts did not seem to improve performance significantly.

Diabetes

Shaw et al., Reviewed NLP results for the DM DBP. Overall it seems to be working well. Some descriptions of neuropathy are rarer than expected.

Crohn's and Ulcerative colitis

Ashwin et al., Discussed the AUC for UC and Crohn's with NLP and codified data. In this domain, once the cohort is roughly identified with the ICD-9 codes (low specificity), the NLP terms are much better than the codified terms in accurately identifying who has disease and who does not (as measured by AUC). Based on these analyses, it looks like PHS has 12000 (UC + Crohn's) of which about 3600 samples can be accumulated in one year.
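As a reminder of what the AUC comparison measures, here is a minimal sketch in Python. The scores are made-up toy values, not DBP data, and the function name `auc` is ours; it simply computes the probability that a randomly chosen case is scored higher than a randomly chosen non-case, which is what the AUC is.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], noncase_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, non-case) pairs in which the case
    receives the higher score; ties count as half a win.
    """
    if not case_scores or not noncase_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Hypothetical classifier scores, for illustration only.
toy_cases = [0.9, 0.8, 0.6, 0.4]
toy_noncases = [0.7, 0.3, 0.2, 0.1]
print(auc(toy_cases, toy_noncases))
```

Running the same function on the scores from an NLP-based model and a codified-only model, against the same chart-reviewed labels, gives the kind of side-by-side comparison discussed here.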

Friday, December 2, 2011

Review NLP processes for our DBPs

Guergana et al,

Learning environment

1. Training/test set creation (labeling of data; currently using Knowtator (which itself used Protege 3.1))
2. Automated toolset (cTAKES + automated feature selection flow)
3. Loop 1-2

Friday, November 18, 2011

Inna Dubchak visit

Inna Dubchak from the LANL Joint Genome Institute gave an overview of her VISTA i2b2 docking project. This allows the VISTA i2b2 cell to provide analyses of the "meaning" of variants based largely on conservation data. She also gave an update on the rviewer tool.

Technical Futures

Shawn, Kohane, Nich, Mike, Zak, Susanne, Bickel. Discussed several important minimal commitment abstractions useful for temporal reasoning and for suggesting related queries.

Friday, November 4, 2011

Diabetes Mellitus + Autoimmune NLP definitions for cardiovascular disease

Shaw, Liao, Plenge et al., discussed our common needs on cardiovascular disease. The question of how detailed to go as opposed to a broad definition was reviewed. The tendency was to go in both directions.

Distributed queries

Griffin Weber, Shawn Murphy, Bill Simons, Susanne Churchill, Zak Kohane. Discussed the latest national developments for SHRINE queries. Very significant support and enthusiasm from the AUG. Reviewed temporal query capabilities and roadmap.

Friday, October 7, 2011

Inflammatory Bowel Disease

Ashwin presented

Reviewed the NLP approach to date.

Showed the distinctions made based on terms of varying specificity. The NLP processing increased the accuracy as expected but probably by only 10% relative to the codified data. However the codified data alone had much less stable performance, probably because those are easier to overfit to local billing practice and demographics of the underlying hospital population.

Also reviewed when to start sample collection.

i2b2 Version 1.6 release preparations

Kohane, Churchill, Murphy, Mendis, Wattanasin. Last-minute preparation of documentation and squashing of any remaining bugs.

Saturday, September 24, 2011

DM/CVD

Stan Shaw, Tianxi Cai, Raoul, Shawn, Zak, Susanne et al.,

Discussed whether to expand the data mart to include other hospitals.

Reviewed study design for measures of inflammation.

Friday, September 23, 2011

SMART integration with i2b2 reviewed

Discussed how additional SHRINE influences future directions of i2b2. Then demonstrated the latest i2b2/SMART integration (quite impressive!).

Thursday, September 22, 2011

MDD

(minutes courtesy Caitlin)


Attendance: Jordan Smoller, Roy Perlis, Shawn Murphy, Susanne Churchill, Isaac Kohane, Tianxi Cai, Victor Castro, Alison Hoffnagle, Patience Gallagher, Sydney Weill, Caitlin Clements

Minutes:

- ICCBD → Refresh: Victor has performed a preliminary refresh of the data mart; however, this did not include the NLP cohort. Victor will finish NLP classification and will perform another refresh by the end of the following week that includes this data. The current updated numbers for the other cohorts may change a bit as well when he performs this second refresh; however, they will not change by much.

  • The numbers for the cohort sizes increased after the refresh, particularly because the group added McLean patients to the data mart (McLean patients must come to MGH/BWH for blood draws)
    • While the cohort sizes are relatively large, the number of people the group has actually collected (through Crimson) is very low → only about 10% of the possible cohort size
    • Cohort MRP/MRP-SV numbers are abysmal. PPV for these cohorts is poor and getting worse. Could the group possibly run a simple regression for these cohorts? → No, because the numbers are too low
  • Where does the group stand with projection?
    • The group is in the 4th year of our 5-year grant, so the group needs to figure out where we stand quickly.

Next Steps →

1. Final refresh of data mart
2. The group needs to decide what we are going to do with the MRP and MRP-SV cohorts

StudyGroup  Cohort   ClassificationType  N   TP  FP  PPV    PPV 95% CI Lower  PPV 95% CI Upper
Case        95NLP    NLP+EMR             45  38  7   0.844  0.712             0.923
Case        NoNLP    EMR                 26  24  2   0.923  0.759             0.979
Case        MRP      EMR                 25  14  11  0.560  0.371             0.733
Case        MRP-SV   EMR                 5   1   4   0.200  0.036             0.624

StudyGroup  Cohort   ClassificationType  N   TN  FN  NPV    NPV 95% CI Lower  NPV 95% CI Upper
Control     MDD      Advertising         11  10  1   0.909  0.623             0.984
Control     SCZ      Advertising         15  13  2   0.867  0.621             0.963
Control     Control  EMR                 13  13  0   1.000  0.772             1.000
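The minutes do not say how the 95% confidence limits above were computed. They appear to be consistent with a Wilson score interval, so the sketch below uses that method as an assumption; the function name `wilson_interval` and the choice of z = 1.96 are ours.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval. Assumed here;
    the minutes do not name the method actually used.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# 95NLP case cohort: 38 true positives among 45 reviewed charts.
ppv = 38 / 45
lower, upper = wilson_interval(38, 45)
print(f"PPV = {ppv:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

For the 95NLP row (38 of 45) this gives limits that round to 0.712 and 0.923, matching the reported values. For very small cohorts such as MRP-SV (N = 5) the interval is necessarily wide, which is the numerical side of the "numbers are too low" comment in the minutes.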

Friday, September 16, 2011

All hands meetings

Topic: Local i2b2-initiated federally funded spinoffs.

1. Kat Liao presented the RA DBP and the spinoff RA-CVD project. What is the relationship between the risk for CAD/CVD in the general population (and the risk alleles) and that in RA? Out of 4453 in the RA cohort (i2b2), 335 have CAD.

2. Jordan Smoller presented the Bipolar Disease NIMH grant. Reviewed the challenges of a very specific case definition. Claims codes alone seem to give positive predictive values (PPV) in the 20-50% range and the PPV of NLP is about 85% (with specificity of 95%).

3. Roy Perlis spoke about his R01 derived from the MDD-resistant-to-SSRI DBP. He is studying an MDD clinical and genetic risk algorithm/classifier. He is on the way to a population of 1500, which would represent 1/2 of all the drug response GWAS currently published in this particular domain(?).

4. Robert Plenge described the spinoff of the i2b2 RA DBP to the PGRN project. The project looks at genetic informants of treatment response and treatment toxicities. Involves Vanderbilt, Northwestern and Partners. Early results show the portability of the NLP RA classifier.

Review of drug exposure methodology

Given intermittent recording of drug exposure and clinical events, there is going to be a lot of censoring in the EHR data stream. This kind of data censoring is a methodological challenge and we reviewed various heuristics and comparisons that can be used to correct/account for this censoring. A lot of focus on modeling the frequency of observations.

Discussion of which pieces have to be locally vs centrally hosted

For the next generation of i2b2, we are trying to minimize how many cells have to be locally hosted. Even more crucially, how many cells have to be locally customized and supported, and how many can be generic? In other words, can we or should we reposition any abstraction barriers?

Friday, September 9, 2011

Multiple sclerosis DBP

Xia et al,

Very extensive discussion of randomization procedures in selecting patient samples.

Back for Summer: CVD DBP

Shaw, Cai, Churchill, Liao, Murphy, Kohane, Sordo, Savova

Reviewed the subsamples to see if we are getting the right DM rate.

Thursday, June 30, 2011

First Annual SHRINE meeting

After an illuminating keynote by David Blumenthal, we heard of SHRINE implementations at Harvard, on the West Coast, and in Europe (the latter embryonic) as well as large national registries (IBD and RA) powered by SHRINE. On the second day, with a kickoff keynote by Patrick Taylor, we addressed the thorny issue of the regulatory, ethical and institutional aspects of SHRINE data sharing. The video and slides from this meeting will be available shortly.

Wednesday, June 29, 2011

i2b2 AUG in Boston

The first annual i2b2 meeting was kicked off today (6/28) at noon at Harvard Medical School. After my introduction, Shawn Murphy provided the current roadmap and answered specific technical questions. Robert Plenge described the genomic studies in rheumatoid arthritis and preliminary results regarding cardiovascular disease.

Bill Adams described his HOME (Health Outcome Monitoring and Evaluation) cell for i2b2 and had the temerity to run a live demonstration. They developed 21 measures around healthcare outcomes (e.g. lipid control, smoking cessation) and implemented these within i2b2. There was a good sidebar conversation with Shawn Murphy of the role of the core i2b2 team in supporting these new developments.

Keith Marsolo talked about chart review with i2b2 and the use of i2b2 as a registry (for liver transplant). He too ran a live demonstration.

The onco-i2b2 project was described by our colleagues from the University of Pavia.

Brian Wilson described a new data exchange infrastructure and message queues to support LIMS of various stripes as well as genomic annotation pipelines. He also bravely demonstrated his listener dispatching/queueing functionality.

There was then a flurry of very technical discussions which engaged the developer and user community in detail. By mid-morning on Wednesday, we started to discuss the relevance to i2b2 of the SMART project, which is independently generating a large developer community.

Friday, June 24, 2011

SHRINE, next generation functionality

Weber, Churchill, Kohane, Mendis, Murphy

Discussed how to augment the payload of SHRINE (for v2) to include 1 by 1 viewable limited data sets rather than merely aggregate counts to enable subject recruitment across multiple sites.

cTAKES web service implementation

Guergana demonstrated a prototype web service for cTAKES (the NLP pipeline), which will be available soon.

SHRINE enhancements

Murphy, Kohane, Churchill, Simons, Weber, Mendis

Reviewed the differences between the XML payloads of i2b2 and the payloads of SHRINE and what additional calls to the underlying i2b2 instance would be required to support patient-by-patient limited data set access across i2b2 instances.

Friday, June 3, 2011

Developing the next generation of SHRINE functionality

Murphy, Kohane, Churchill, Weber, Simons, Bickel, Mendis,

Discussed the next set of functionalities in SHRINE, the distributed query system across i2b2 instances. Right now in production it returns aggregate numbers of patients meeting search criteria. There are several additional next steps that will be reviewed at the SHRINE meeting (6/29-6/30; keynote day 1: David Blumenthal, keynote day 2: Patrick Taylor) following the AUG meeting in Boston.
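To make the "aggregate numbers only" behavior concrete, here is a toy sketch in Python. It is not the SHRINE API or its message format: the site names, patient records, and the functions `local_count` and `broadcast_query` are all hypothetical stand-ins, with in-memory lists playing the role of each site's i2b2 instance.

```python
from typing import Callable, Dict, List

Patient = Dict[str, object]
Criteria = Callable[[Patient], bool]


def local_count(patients: List[Patient], criteria: Criteria) -> int:
    """Run the query inside one site and return only a count."""
    return sum(1 for p in patients if criteria(p))


def broadcast_query(sites: Dict[str, List[Patient]], criteria: Criteria) -> Dict[str, int]:
    """Send the same criteria to every site; collect per-site counts.

    No patient-level rows leave a site in this model, only integers.
    """
    return {name: local_count(patients, criteria) for name, patients in sites.items()}


# Hypothetical data for illustration only.
toy_sites = {
    "site_a": [{"dx": "RA", "age": 54}, {"dx": "IBD", "age": 31}],
    "site_b": [{"dx": "RA", "age": 67}, {"dx": "RA", "age": 45}, {"dx": "MS", "age": 29}],
}
counts = broadcast_query(toy_sites, lambda p: p["dx"] == "RA")
print(counts, "total:", sum(counts.values()))
```

Anything beyond counts would mean changing what `local_count` is allowed to return, which is why the governance questions come up alongside the technical ones.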

Friday, May 27, 2011

A review of electronic health record-driven genomic research

I recently wrote a review published in Nature Reviews Genetics that discusses how electronic health records can be used to drive genomics studies. It also includes a survey of international efforts.

Cardiovascular Disease DBP

Shaw, Cai et al.

The performance of the diabetes filter based on 7 variables was reviewed. It was surprisingly good. We then discussed what features NLP would be best applied to and the agreement was that medications would be the highest value in that regard.

Friday, April 29, 2011

Pre-competitive data sharing between pharma and with academia

As part of NIH's Public-Private Partnership Program, a meeting was held in Washington, DC (4/28/2011) that brought together many from industry (particularly the pharmaceutical companies) and academia around the idea of pre-competitive data sharing for opportunistic acceleration of discovery pipelines while remaining respectful of privacy and property. i2b2 serves as the foundational platform proposed for this effort, in its implementation within TransMart. It remains to be seen whether the governance and competitive concerns can be managed successfully in this initiative triggered by Eric Perakslis.

Friday, April 8, 2011

Modeling Disease Activity in RA

Karlson et al.,

Discussed the current "standard of care" for disease activity (DAS28), for which there is a straightforward calculation. Can we estimate a highly correlated measure using NLP on the clinical record? This will require extracting temporal relations (e.g. activity AFTER anti-TNF therapy). With regard to the disease activity/state itself, we will start by identifying four states: Remission, Low, Moderate, High (also Indeterminate if the note is unclear or too short), which will first have to be identified by our "gold standard" reviewers and annotations (using Knowtator).
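For reference, the "straightforward calculation" can be sketched as follows. The notes do not say which DAS28 variant or cutoffs the group had in mind; the sketch assumes the commonly published ESR-based formula and the conventional thresholds (2.6, 3.2, 5.1) for the four states. The function names are ours.

```python
import math


def das28_esr(tender28: int, swollen28: int, esr_mm_per_h: float, global_health: float) -> float:
    """DAS28 using ESR (assumed variant).

    tender28 / swollen28: tender and swollen joint counts out of 28.
    esr_mm_per_h: erythrocyte sedimentation rate, must be > 0.
    global_health: patient global assessment on a 0-100 scale.
    """
    if esr_mm_per_h <= 0:
        raise ValueError("ESR must be positive to take its logarithm")
    return (
        0.56 * math.sqrt(tender28)
        + 0.28 * math.sqrt(swollen28)
        + 0.70 * math.log(esr_mm_per_h)
        + 0.014 * global_health
    )


def activity_state(score: float) -> str:
    """Map a DAS28 score to a state using conventional cutoffs (assumed)."""
    if score < 2.6:
        return "Remission"
    if score <= 3.2:
        return "Low"
    if score <= 5.1:
        return "Moderate"
    return "High"


# Hypothetical patient, for illustration only.
score = das28_esr(tender28=4, swollen28=2, esr_mm_per_h=20, global_health=50)
print(round(score, 2), activity_state(score))
```

The NLP task is harder than this because the joint counts and global assessment are rarely recorded as numbers in the note; the states have to be inferred from narrative text, hence the Indeterminate category.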

Several machine learning approaches to predicting the DAS28 using NLP were reviewed.

Tuesday, April 5, 2011

Cerner adopts the i2b2 model

This is an interesting piece of news seen on a mailing list. We were surprised to see this announcement regarding Cerner using i2b2 for its clients. As we do not have any relationship with Cerner in this regard, the precise nature of the implementation and its license are not clear. Nonetheless, for those of us working on secondary use of healthcare data using i2b2, this is another signal that our community is growing.

---------------------------------------------------------------------------------------------------------------------

Cerner i2b2 Node - Enabling Research Illumination

A client session of this Illumination is scheduled for Wednesday, April 20, 2011 12:00 PM CT.

Description of Session

The Cerner i2b2 node is a Cerner-hosted solution that improves the performance of i2b2 and facilitates secure data collaborations across multiple institutions. I2b2 (Informatics for Integrating Biology and the Bedside) is a National Center for Biomedical Computing. The informatics initiative is funded as a cooperative agreement with the National Institutes of Health (NIH). Using the Cerner i2b2 node, clients can perform cohort discovery queries on a de-identified dataset, save the query results, access patient identifiers for the saved query with IRB approval, and partner with other participating organizations to share data for cohort discovery queries. The service uses proprietary tools to map client data and improve i2b2 performance. The Cerner-hosted model reduces client investment in IT infrastructure, support, and maintenance

Benefits:

• Data mapping and security capabilities improve performance of i2b2

• Cerner-hosted model facilitates inter-institutional research

• Eliminates costs related to data mapping, security, system maintenance and IT infrastructure

Registration

To register for the session, click the link below. You will need a Cerner.com user name and password.

https://applications.cerner.com/members/illuminations/IllumDetails.aspx?illumid=4158

Friday, March 25, 2011

NLP

Guergana Savova led a discussion on the annotation tool, with a focus on functional status (e.g. SF-36). Pete Szolovits provided ongoing commentary on the applicability of specific UMLS relations to these annotations.

Diabetes and CVD

Stan Shaw led a discussion of inflammatory markers in the diabetes population.

i2b2 EAC

On March 24th, the External Advisory Committee came to Boston for a visit.

The EAC is constituted of:

Daniel Masys (chair), Vanderbilt University

Elmer Bernstam, University of Texas at Houston

Lisa Cannon-Albright, University of Utah

Peter Tarczy-Hornoch, University of Washington, Seattle

George Hripcsak, Columbia University

NIH representative: Valerie Florance, NLM project officer

Very productive meeting with useful directional advice from the EAC.

Sunday, March 20, 2011

i2b2 All Hands Meeting

Eric Perakslis PhD presented his work on i2b2 in the J&J context and beyond with tranSMART. Interestingly, this i2b2 instance is hosted on the Amazon EC2 cloud. He had the temerity to run a live demo which worked flawlessly. Good Karma! Also the involvement of tranSMART with the Innovative Medicines Initiative in Europe with significant uptake by pharmaceutical industry (i.e. GSK/ECLIPSE, AstraZeneca, Pfizer, Novartis) was intriguing and encouraging.

His slide show is available here.

Other i2b2 business was reviewed, including the very successful i2b2/NCBO tutorial last week at AMIA in San Francisco and the other SHRINE implementations that are occurring nationally.

Monday, February 28, 2011

Epigenetics

Shaw, Kohane, Kasif, Plenge, Churchill, Liao, Murphy, Savova

Extensive discussions about which epigenetic marks are best captured from clinical discards and under what hypotheses.

Friday, February 25, 2011

Autoimmune DBP

Savova reported that the new cTAKES pipeline is up and running against the production system.

Plenge reported on the state of the new RA datamart.

Liao has finished the chart reviews of a random sample of the 4500 RA patients. 10% had coronary artery disease. Of that 10%, 41% have definite CAD and 27% have probable CAD.

Feena provided an update on the sequencing project: 500 cases and 650 controls. Discussed a lot of the travails of de-"noising" and de-"batching" the variation found in the pooled samples (done without barcoding).

i2b2 Mapping Cell

Friday, February 4, 2011

Review of the annotation procedures for NLP (in context of RA-CVD and multiple sclerosis).

Present: Murphy, Savova, Szolovits, Liao, Cai, Xia, Gainer, Raoul, Kohane, Churchill

In a discussion led by Guergana Savova, we reviewed the Knowtator package that is built on top of PROTÉGÉ. The focus of the discussion was on the "squishy" and detailed assessments of patient functioning (e.g. "bending", "grasping", "doing housework" etc). We also discussed comparing the "bag of words" classification approach vs. the fine-grained sentence-level or concept-level annotation.

Friday, January 28, 2011

i2b2 phenotyping challenges

Plenge, Bickel, Weber, Kohane, McGaw, Liao

Discussed the effortful process required to train an NLP filter for a given phenotype.

Currently:

Step 1: Experts galore: (clinical, NLP, modeling/statistics, programmers)

Step 2: Build study-specific data mart

Step 3: Extract information

Step 4: Define "Gold Standards" from the extracted knowledge

Step 5: Build filter algorithm and apply to study at hand

Step 6: Forget how it was done, with loss of institutional memory regarding phenotypes

We reviewed several alternative, more fully automated models.

Friday, January 14, 2011

Autoimmunity and CVD

Karlson reviewed the i2b2/PGRN project on predicting response to disease-modifying drugs in rheumatoid arthritis. This included adding functional status assessment to cTAKES. She articulated that the first task is finding the best combination of structured and NLP-derived clinical variables that are predictive of disease activity. As an aside, Elizabeth noted that "Biologicals" have become first-line therapeutics in RA. Quite a change from what is in our EMR from 15 years ago.

Most of the session was a review of the use of the Knowtator tool on the rheumatoid arthritis corpus.

The remainder was focused on a discussion of the selection of the cohort with multiple sclerosis (MS) and a radiologically isolated precursor of MS (i.e. a radiologically made incidental diagnosis).

CVD and Diabetes

Present: Rich Grant, Stan Shaw, Rahul, Murphy, Vivian Gainer, Plenge

As the meeting was gathering, Robert Plenge asked the right question: Given how slow the NLP optimization process is, what can we do to make it faster? He drew an interesting analogy between the boutique-like era of contigs, BACs and YACs and the industrial whole-genome process.

Discussed Richard Grant's latest CVD algorithm, which heavily uses the text of the clinician's problem lists (not the billing codes). This was implemented as a very large and complex logical predicate, rather than a probabilistic or statistical model.

Temporal queries

Present: Murphy, Weber, Kohane, McGow, Wilson, Mendis, Bickel

Reviewed Griffin's extensive proposal for a user interface. Batted around the implications for the underlying tables (while maintaining backwards compatibility).

Friday, January 7, 2011

Rheumatoid arthritis and PGRN

Discussed the NLP challenge for defining Low/Medium/High Disease Activity Score (DAS).

CVD and Diabetes Mellitus

Shaw, Kohane, Gainer, Liao, Churchill, Margarita, Murphy, Savova

Discussed the logistics of loading clinical notes on the 380K patients. Also discussed several control populations.