Guergana led a discussion on the current state. Total team is currently 7-9 individuals (depending on how directly they are involved for a given project).
Currently, in i2b2, NLP is cast as a supervised classification task that is applied to patients that are filtered according to various criteria (e.g. lab values, ICD-9 codes). Domain experts typically annotate (extensively) 100+ charts as part of the gold-standard (used for supervised learning). A subset of these annotations are done twice (at least) to assess inter-annotator agreement.
The cTAKES has processed 28M documents at Partners Healthcare System to date.
Reviewed recent results with IBD (NLP alone almost as good as NLP+Codified data), Multiple Sclerosis (NLP helped but the combination of NLP + codified is significantly better).