Harmonization of gene/protein annotations: towards a gold standard MEDLINE.

David Campos, Sérgio Matos, Ian Lewin, José Luís Oliveira, Dietrich Rebholz-Schuhmann

University of Aveiro, IEETA/DETI, Campus Universitário de Santiago, Aveiro, Portugal. david.campos@ua.pt

Bioinformatics (Oxford, England) 2012 May 1

Named entity recognition (NER) is an elementary task in biomedical text mining. A number of NER solutions have been proposed in recent years, taking advantage of available annotated corpora, terminological resources and machine-learning techniques. Currently, the best-performing solutions combine the outputs of selected annotation solutions and are measured against a single corpus. However, little effort has been spent on systematically analysing methods that harmonize annotation results and on measuring them against a combination of Gold Standard Corpora (GSCs). We present Totum, a machine-learning solution that harmonizes gene/protein annotations provided by heterogeneous NER solutions. It has been optimized and measured against a combination of manually curated GSCs. Our experiments show that this approach improves the F-measure of state-of-the-art solutions by up to 10% (achieving ≈70%) in exact alignment and 22% (achieving ≈82%) in nested alignment. We demonstrate that our solution delivers reliable annotation results across the GSCs and that it is an important contribution towards a homogeneous annotation of MEDLINE abstracts. Availability and implementation: Totum is implemented in Java and its resources are available at http://bioinformatics.ua.pt/totum
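The abstract reports F-measure under two matching regimes, exact alignment and nested alignment, without defining them. The following Python sketch illustrates how such scores could be computed. It is not Totum's evaluation code: the nested rule used here (a predicted span counts if it lies inside the gold span or contains it) is an assumption about what the abstract means, and the function names, the character-offset span representation and the greedy one-to-one pairing are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Tuple

# A span is (start, end) in character offsets, end exclusive (assumed convention).
Span = Tuple[int, int]


def exact_match(pred: Span, gold: Span) -> bool:
    """Exact alignment: both boundaries must coincide."""
    return pred == gold


def nested_match(pred: Span, gold: Span) -> bool:
    """Nested alignment (assumed definition): one span fully contains the other."""
    pred_in_gold = gold[0] <= pred[0] and pred[1] <= gold[1]
    gold_in_pred = pred[0] <= gold[0] and gold[1] <= pred[1]
    return pred_in_gold or gold_in_pred


def f_measure(preds: List[Span], golds: List[Span],
              match: Callable[[Span, Span], bool]) -> float:
    """Balanced F-measure with greedy one-to-one pairing of predictions to gold spans."""
    unmatched = list(golds)
    true_positives = 0
    for pred in preds:
        for gold in unmatched:
            if match(pred, gold):
                unmatched.remove(gold)  # each gold span can be matched only once
                true_positives += 1
                break
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(golds) if golds else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
gold_spans = [(0, 5), (10, 20), (30, 40)]
pred_spans = [(0, 5), (12, 20), (50, 55)]

print(f_measure(pred_spans, gold_spans, exact_match))   # only (0, 5) matches
print(f_measure(pred_spans, gold_spans, nested_match))  # (12, 20) also counts
```

On this toy input the nested score is higher than the exact score because the prediction (12, 20) lies inside the gold span (10, 20). This is the same direction as the gap between the ≈70% and ≈82% figures quoted above, although the toy numbers say nothing about the size of that gap.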

Citation

David Campos, Sérgio Matos, Ian Lewin, José Luís Oliveira, Dietrich Rebholz-Schuhmann. Harmonization of gene/protein annotations: towards a gold standard MEDLINE. Bioinformatics (Oxford, England). 2012 May 1;28(9):1253-61

PMID: 22419783
