CBioC: Collaborative Bio Curation
The volume of existing biomedical articles is huge and it grows day by day. From 1994 to 2004, close to 3 million biomedical articles were published by US and European researchers alone. That is over 800 new articles per day, on top of the approximately 15 million abstracts already in PubMed, and each one adds a myriad of individual new facts to survey for information relevant to a particular research question.
Two approaches are currently pursued to extract and combine facts from biomedical publications. The first, hiring human curators, is expensive and therefore does not scale up; it also leads to bias. The second, using automated information extraction systems, achieves recall and precision of only around 60%.
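To make the recall and precision figures quoted above concrete, here is a minimal sketch of how the two measures are computed for an extraction system. The function name `precision_recall` and the toy interaction pairs (`P1` to `P7`) are illustrative assumptions, not part of CBioC or any real dataset.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def precision_recall(extracted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Return (precision, recall) of extracted facts against a gold standard."""
    true_positives = len(extracted & gold)
    # Precision: fraction of extracted facts that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: fraction of correct facts that were actually extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: five true interactions, five extracted ones,
# three of which overlap.
gold = {("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5"), ("P5", "P6")}
extracted = {("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P7"), ("P2", "P7")}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.0%} recall={recall:.0%}")
```

In this toy case the system finds three of the five true interactions and two of its five outputs are wrong, which puts both measures at 60%.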
We present here a new approach to the problem through mass collaboration, where the community of researchers that writes and reads the biomedical texts will be able to contribute to the curation process, dictating the pace at which it is done.
Overview of our Approach
Automated text extraction is used as a starting point to bootstrap the database, but then it is up to biologists to improve upon the extracted data, "ironing out" inconsistencies through subsequent edits on a massive scale.
CBioC runs as a web browser extension and allows unobtrusive use of the system during the regular course of research in PubMed. It can also be accessed directly (without having to install a plug-in).
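The bootstrap-then-refine workflow described in this overview can be pictured with a small data model. This is only a sketch under stated assumptions: the class names `Revision` and `CuratedFact`, the `bootstrap` helper, and the example identifiers are all hypothetical and do not reflect CBioC's actual schema or interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Revision:
    author: str     # "extractor" for the automated pass, otherwise a user name
    statement: str  # the fact as currently worded


@dataclass
class CuratedFact:
    abstract_id: str
    revisions: List[Revision] = field(default_factory=list)

    @property
    def current(self) -> str:
        """The latest revision is treated as the current version of the fact."""
        return self.revisions[-1].statement

    def edit(self, author: str, statement: str) -> None:
        """Record a community edit while keeping the full history."""
        self.revisions.append(Revision(author, statement))


def bootstrap(abstract_id: str, extracted_statement: str) -> CuratedFact:
    """Seed a fact from automated extraction (hypothetical helper)."""
    return CuratedFact(abstract_id, [Revision("extractor", extracted_statement)])


# Automated extraction seeds the record; a researcher later corrects it.
fact = bootstrap("abstract-001", "protein X inhibits protein Y")
fact.edit("biologist_1", "protein X activates protein Y")
print(fact.current)
```

Keeping every revision, rather than overwriting the extracted value, is one simple way to let later readers see what the automated system proposed and how the community changed it.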
Statistics for CBioC
| Abstracts | | Integrated Data | |
| Total Processed: | 1,804,264 | BIND Interactions: | 114,684 |
| With Interactions: | 53% | GRID Interactions: | 58,366 |
| Interactions | | MINT Interactions: | 51,721 |
| Total Protein/Protein: | 1,274,798 | DIP Interactions: | 52,068 |
| Total Gene/Disease: | 376,425 | IntAct Interactions: | 93,148 |
| Total Gene/Bio-Process: | 287,414 | | |