Friday, September 25, 2009

Rheumatoid Arthritis

Kat Liao et al.

Reviewed our edits to the manuscript that Kat is pulling together. Discussed the underlying hypothesis.

MDD

Smoller, Perlis, Iosifescu et al.

Tianxi showed that the incremental value of billing codes to NLP'd characterizations of depressed patients was minimal at a wide range of false positive rates.
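
As a rough illustration of what comparing two classifiers "at a range of false positive rates" involves, the sketch below computes the best sensitivity a score cutoff can reach while keeping the false positive rate under a cap. It is not Tianxi's analysis: the function name, the score lists, and the labels are all made up for illustration.

```python
def sensitivity_at_fpr(scores, labels, max_fpr):
    """Best sensitivity reachable by a score cutoff whose FPR is <= max_fpr.

    Assumes labels are 1 (case) or 0 (non-case) and that both classes
    are present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for cutoff in sorted(set(scores), reverse=True):
        # Everything scoring at or above the cutoff is called a case.
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        if fp / n_neg <= max_fpr:
            best = max(best, tp / n_pos)
    return best


# Hypothetical toy data: 4 chart-confirmed cases, 4 non-cases.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
nlp_only = [0.90, 0.80, 0.60, 0.30, 0.70, 0.40, 0.20, 0.10]
nlp_plus_codes = [0.95, 0.85, 0.65, 0.35, 0.70, 0.40, 0.20, 0.10]

for cap in (0.0, 0.25, 0.5):
    a = sensitivity_at_fpr(nlp_only, labels, cap)
    b = sensitivity_at_fpr(nlp_plus_codes, labels, cap)
    print(f"FPR <= {cap}: NLP only {a:.2f}, NLP + codes {b:.2f}")
```

If the second column barely moves relative to the first across the caps, the added feature set contributes little, which is the kind of pattern described above.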

Then a longer discussion regarding controls.

  • Utilization is a bad match, because increased utilization is confounded with psychiatric overlay.
  • Completely healthy individuals are a bad match, because we would be selecting for populations with a lower burden of risk alleles for numerous diseases.

(The following courtesy of Patience Gallagher):


Isaac Kohane, Susanne Churchill, Jordan Smoller, Roy Perlis, Sergey Goryachev, Shawn Murphy, Dan Iosifescu, Victor Castro, Tianxi Cai, Vivian Gainer, Wouter Hoogenboom, Margarita Sordo, Stefanie Block, Patience Gallagher

Friday, September 18, 2009

Short term planning

Zak, Shawn, Griffin, Susanne

Committed to the details of AUG meeting at the CTSA

Reviewed additional hires required for ancillary i2b2

Discussed storage needs.

Wednesday, September 16, 2009

MDD

Minutes courtesy Patience Gallagher

I2B2

Meeting Minutes

Date: Friday, September 11, 2009

Attendees:

Vivian Gainer

Sergey Goryachev

Dan Iosifescu

Shawn Murphy

Roy Perlis

Holly Sciortino

Jordan Smoller

Margarita Sordo



Minutes:

  • Scheduling a recurring meeting to discuss the R01s (ICCBD & Roy’s R01)
    • Roy and Jordan will forward suggested days/times to Susanne Churchill to decide on
  • Imaging (Dan):
    • Update on analyses:
      • Good data from the structural images
      • Continuing to work on DTI
    • Dan is still waiting on healthy volunteer data from Vivian
      • Vivian reported that the delay has been due to the source system changing the format of the RPDR data. However, since the team only wants the reports and not the images, Vivian anticipates being able to provide the data to Sergey by Monday (September 14, 2009).
  • Bipolar
    • The best model is the B0, B1, B2 vs. NB2
      • Exact features used include: bipolar disorder, Depakote, grandiose thinking, lithium, major depressive disorder, irritability
      • Grandiose thinking appears to predict a non-bipolar diagnosis
    • Predictors
      • Each term highlighted by the diagnosticians is being treated as a separate predictor
        • For example: Although “Family History: Bipolar disorder” and “Family history of bipolar disorder” mean the same thing, they are being counted/grouped separately.
        • Margarita will review the terms and group the expressions which will increase the frequency of the predictors
    • Improving the performance of the model
      • Layering additional requirements on top of the model would increase the features used to determine the diagnosis
        • For example: Require that in order to be labeled as Bipolar, a prescription of “Lithium” would need to be found in at least 3 separate notes
        • Jordan will provide additional features to layer on top of the model
    • Interpreting the model (B0, B1, B2 vs. NB2)
      • Why isn’t B2 vs. NB2 the best model?
        • Could be due to the number of instances
          • 69 cases included in the B2 vs. NB2 model versus 84 cases in the NB2 vs. B0, B1, B2 model
      • The current “best model” isn’t doing what we ultimately want
        • The “best model” could change once Margarita groups the features
        • If this doesn’t work, we may need to determine if additional notes need to be reviewed by the diagnosticians
    • Predicting cases versus controls
      • Will we be able to determine controls without an algorithm?
        • Everybody not identified by the bipolar algorithm as a B2 would be considered a control
          • This is not a good solution, since we want people that don’t have evidence of any psychiatric conditions
        • If we only identified individuals who don’t have psychiatry notes, this would not necessarily mean that they don’t have a psychiatric condition; rather, we may be identifying individuals who don’t use the health system
        • Shawn suggested matching controls by age and gender and then excluding certain diagnoses
      • When reporting the algorithm’s performance at selecting Bipolar cases and distinguishing them from controls, we should report the percentage of individuals with Bipolar disorder we would expect from the MGH population and compare that to the population prevalence
        • Roy suggested identifying 200 individuals classified as controls by Shawn’s method (above) and using a bipolar screening instrument to determine if they are truly controls
          • Unsure of how we would recruit these individuals
      • If a few true cases are included in the control sample, the effect would be small. Controls that are misidentified as cases, however, have a huge effect in research on diseases with low population prevalence (such as bipolar disorder)
    • Things to look into:
      • Margarita will look into why grandiose thinking is not predicting bipolar disorder
      • Margarita/Sergey will reconfirm that the model is not just looking at diagnostic codes
      • Margarita will group the features
      • Margarita/Sergey will layer further features on top of model (after Jordan provides the additional features)
      • Team will determine if we need to review more cases
  • MDD
    • How will we determine responder vs non-responder?
      • Vivian ran analyses per subject based on billing codes and created patterns according to the timeline of the diagnoses, visits to psychiatry, and prescriptions
        • Potential Issues:
          • The diagnostic codes are not used consistently in the notes
          • When the visits to psychiatry end, unsure if the patient is lost to follow-up, in remission, or seeing a private psychiatrist
          • If ECT is present at the same time as a prescription, unsure of which treatment the patient is responding to
          • Additional criteria may be needed such as death
        • Potential benefits:
          • Once above issues are addressed, we may be able to look at the patterns to determine responders vs non-responders
        • It would be informative to add NLP to this visual
          • Tianxi will take the last digit of the diagnostic code, extract the status, and incorporate it into the model/algorithm
          • If the results are very different, we may have evidence that research based on billing codes is missing data and therefore inaccurate
      • Categories/Parameters
        • Roy and Dan drafted parameters which may be able to add complexity to the classifier
        • Roy will send these to Shawn
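
Two of the Bipolar items above lend themselves to a small sketch: grouping differently worded expressions into one predictor, and layering a "found in at least 3 separate notes" requirement on top of the model's output. The code below is only an illustration under stated assumptions. The function names, the normalization rule (lowercase, strip punctuation, drop "of"), and the sample notes are hypothetical, and none of this is the team's actual pipeline.

```python
import re


def normalize_term(term):
    """Collapse wording variants: lowercase, strip punctuation, drop 'of'.

    Hypothetical rule for illustration; real grouping would be reviewed by hand.
    """
    words = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(w for w in words if w != "of")


def meets_note_threshold(notes, term, min_notes=3):
    """True if `term` appears in at least `min_notes` distinct notes.

    `notes` is a list of (note_id, text) pairs for one patient.
    """
    needle = term.lower()
    hits = {note_id for note_id, text in notes if needle in text.lower()}
    return len(hits) >= min_notes


def layered_label(model_says_bipolar, notes):
    """Keep the model's Bipolar call only if the extra requirement also holds."""
    return model_says_bipolar and meets_note_threshold(notes, "lithium", 3)


# The two expressions from the minutes map to the same predictor.
print(normalize_term("Family History: Bipolar disorder"))
print(normalize_term("Family history of bipolar disorder"))

# Made-up notes for one patient.
patient_notes = [
    (1, "Started Lithium 300 mg."),
    (2, "Continues lithium, mood stable."),
    (3, "No medication changes."),
    (4, "Lithium level checked."),
]
print(layered_label(True, patient_notes))
```

Counting distinct note IDs rather than raw mentions keeps a single note that repeats the drug name from satisfying the requirement on its own.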


NEXT MEETING: 9/18/09

Friday, September 11, 2009

Review of Core 1 needs from Core 2

Murphy et al.

Reviewed data marts required for Core 1 activities

  • Normals
  • NLP
  • Predictive medicine
  • Relevance networks
  • Inflammation as an underlying process across diseases (the systems approach)

Friday, September 4, 2009

Rheumatoid arthritis

Plenge et al.

How to go from GWAS to prediction.

Discussed how to get better insight into the genetic "dark matter" that might explain additional inherited variation compared to published variants.

Major Depressive Disorder

Perlis, Smoller et al.

Discussed the NLP again and reviewed Tianxi's analysis of the NLP PPV and NPV and AUC
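
For reference, the three metrics named above can be computed as in the sketch below. It uses the standard definitions (PPV = TP / (TP + FP), NPV = TN / (TN + FN), AUC as the probability that a randomly chosen case scores higher than a randomly chosen non-case, counting ties as half). The data are made up and the 0.5 cutoff is an arbitrary assumption; this is not a reproduction of Tianxi's analysis.

```python
def ppv_npv(predicted, actual):
    """Positive and negative predictive value from 0/1 predictions and labels.

    Assumes at least one positive and one negative prediction.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    return tp / (tp + fp), tn / (tn + fn)


def auc(scores, actual):
    """AUC by pairwise comparison of case scores against non-case scores."""
    pos = [s for s, a in zip(scores, actual) if a]
    neg = [s for s, a in zip(scores, actual) if not a]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores and chart-review labels.
scores = [0.90, 0.80, 0.60, 0.30, 0.70, 0.40, 0.20, 0.10]
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed cutoff

ppv, npv = ppv_npv(predicted, actual)
print(f"PPV={ppv:.2f} NPV={npv:.2f} AUC={auc(scores, actual):.4f}")
```

PPV and NPV depend on the chosen cutoff and on how common the condition is in the sample, while AUC summarizes ranking quality across all cutoffs.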

Sergey and Jordan reviewed the results of logistic regression trained on several hundred expert review cases.

Working on Web Client for PM Cell

Present: Shawn, Griffin, Susanne, Diane, Zak, Mike

Discussed a bare-bones web client for PM management.