To enable live collaboration on the annotation of Covid-19-related literature.
The outbreak of Covid-19 has motivated NLP researchers to explore ways to mine scientific literature, in order to potentially identify information that can be useful in fighting the virus. To this end, multiple Covid-19 datasets have been released, including LitCovid by NCBI and CORD-19 by Allen Institute for AI, and several research groups are producing and releasing annotations of these data that identify relevant entities mentioned within them. To avoid duplication of effort and, crucially, provide a communal collection of annotations over Covid-19-related literature, we have established Covid19-PubAnnotation, an open collaboration to collect annotations from researchers around the world in order to produce a set of integrated annotations that can be used for further research to fight the disease.
PubAnnotation is an open repository of annotations of biomedical literature whose goal is to collect and integrate annotations contributed by the global NLP community. The PubAnnotation project has set up a collaborative annotation environment to enable a focused community effort and has initiated a virtual bio-hackathon group involving annotators and annotation software developers from around the world. The hackathon is taking place from April 5 to 11, after which a sizable set of richly annotated Covid-19 data will be released for public use. Contribution of annotation datasets, both during and after the hackathon, is open to anyone. All contributions will be integrated and made accessible via search, visualization, and fine-grained access from the PubAnnotation platform.
Proof of concept example
Individual vs. Aggregated annotations
The following example demonstrates the benefit of aggregating various annotations over Covid-19 literature:
- Annotation using MONDO
Search over aggregated annotations
Once integrated into PubAnnotation, annotations such as those shown above can be searched for individual entities as well as groups of entities. The example below shows the result of a search for sentences mentioning SARS (severe acute respiratory syndrome) and anatomical locations:
Search results for sentences mentioning SARS and anatomical locations
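To make the idea of searching over aggregated annotations concrete, here is a minimal local sketch. It is not PubAnnotation's search implementation: the sample text, the two annotation sets, the placeholder labels (`disease:SARS`, `anatomy:...`), and the naive sentence splitter are all assumptions made for illustration. The denotation layout (a character span plus an `obj` label) is assumed to mirror PubAnnotation-style JSON.

```python
import re

TEXT = ("SARS coronavirus primarily infects the lung. "
        "Patients were monitored for two weeks. "
        "Severe acute respiratory syndrome can also damage the kidney.")


def denote(text, phrase, obj):
    """Build one denotation (character span plus label) for the first occurrence of phrase."""
    begin = text.index(phrase)
    return {"span": {"begin": begin, "end": begin + len(phrase)}, "obj": obj}


# Two hypothetical annotation sets over the same text; labels are placeholders.
TRACKS = {
    "disease-annotations": [
        denote(TEXT, "SARS", "disease:SARS"),
        denote(TEXT, "Severe acute respiratory syndrome", "disease:SARS"),
    ],
    "anatomy-annotations": [
        denote(TEXT, "lung", "anatomy:lung"),
        denote(TEXT, "kidney", "anatomy:kidney"),
    ],
}


def sentence_spans(text):
    """Naive sentence splitter returning (begin, end) character offsets."""
    spans, start = [], 0
    for m in re.finditer(r"[.!?]\s+", text):
        spans.append((start, m.start() + 1))
        start = m.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def search_sentences(text, tracks, entity, group_prefix):
    """Return sentences containing the given entity and any label from the given group."""
    hits = []
    for begin, end in sentence_spans(text):
        labels = [d["obj"]
                  for denotations in tracks.values()
                  for d in denotations
                  if begin <= d["span"]["begin"] and d["span"]["end"] <= end]
        if entity in labels and any(l.startswith(group_prefix) for l in labels):
            hits.append(text[begin:end])
    return hits


hits = search_sentences(TEXT, TRACKS, "disease:SARS", "anatomy:")
for sentence in hits:
    print(sentence)
```

Neither annotation set alone can answer this query; the match only becomes possible once both sets refer to the same text with shared character offsets.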
PubAnnotation has set up a public repository of Covid-19 literature annotations to which anyone can contribute. All contributed annotations are automatically aligned to the canonical texts in PubAnnotation. Search facilities are enabled over contributed annotations.
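The alignment step can be pictured as remapping character spans from a contributor's copy of a text onto the canonical copy. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in; the actual alignment method used by PubAnnotation is not described here, and the function name `align_span` and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def align_span(source_text, canonical_text, begin, end):
    """Map a [begin, end) span on source_text onto canonical_text.

    Returns the remapped (begin, end), or None when the span does not fall
    entirely inside a block of text that both versions share.
    """
    matcher = SequenceMatcher(None, source_text, canonical_text, autojunk=False)
    for block in matcher.get_matching_blocks():
        # block.a and block.b are the block's start offsets in the source and
        # canonical texts; block.size is the length of the shared block.
        if block.a <= begin and end <= block.a + block.size:
            offset = block.b - block.a
            return begin + offset, end + offset
    return None


# The contributor's copy has an extra space that the canonical text lacks.
source = "SARS  affects the lungs."
canonical = "SARS affects the lungs."

begin = source.index("lungs")
aligned = align_span(source, canonical, begin, begin + len("lungs"))
if aligned is not None:
    print(canonical[aligned[0]:aligned[1]])
```

Returning `None` for spans that touch changed text is a deliberate choice in this sketch: it flags annotations that need review rather than silently guessing a new position.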
Literature data sets
- LitCovid (NCBI) : Home | PubAnnotation
- CORD-19 (Allen Institute for AI) : Home | PubAnnotation
More will be added.
Participants, contributions, and progress so far
|Participants||Contribution||Progress|
|Mariana Neves (BfR)||LitCovid-ArguminSci - Discourse elements||V.1 added for LitCovid|
|Nico Colic (UZH), Fabio Rinaldi (IDSIA)||OGER-BB - Biomedical term normalization||V.2 added for LitCovid|
|Simon Suster (UMelbourne)||CORD-PICO - PICO categories||V.1 added for CORD-19|
|Mayla Boguslav, William Baumgartner, Larry Hunter (UColorado)||Epistemic_Statements - Epistemic statements||V.1 added for CORD-19|
|Zhiyong Lu (NCBI)||PubTator Central||Annotation service ready to cooperate with PubAnnotation|
|Keith Suderman, Nancy Ide (Vassar College)||LAPPS Grid biomedical analysis software (LappsGridBioNER, LappsGridGeneTagger, LappsGridStanfordPOSTagger, LappsGridTimeML)||Annotation service ready to cooperate with PubAnnotation|
- The literature collections (LitCovid and CORD-19) will be kept up to date, as the collections themselves are growing.
- Some texts that are already stored in PubAnnotation may be changed.
- In that case, annotations made to those texts will be migrated to the updated texts.
- Because the literature collections will keep growing, contributing automatic annotation tools is more sustainable than contributing pre-computed annotation datasets.
- For a contributed automatic annotation service that conforms to this API, the tool will be set up to run automatically whenever the literature collections are updated.
- Annotation datasets which are at ‘Production’ status will be automatically converted to RDF statements and fed into a SPARQL endpoint, so that they can be immediately explored through SPARQL-based search interfaces.
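As a rough illustration of why an automatic annotation tool scales better than a pre-computed dataset, the sketch below shows the core of a dictionary-based annotator that can simply be rerun on each new or updated text. It is only a sketch under stated assumptions: the term dictionary and its labels are hypothetical placeholders, the output layout (`text` plus `denotations` with `id`, `span`, and `obj`) is assumed to follow PubAnnotation-style JSON, and the HTTP layer that a conforming annotation service would need is omitted, since the API requirements are defined in the linked specification rather than here.

```python
import json
import re

# Hypothetical term dictionary; labels are placeholders, not real ontology IDs.
DICTIONARY = {"SARS-CoV-2": "virus", "lung": "anatomy", "fever": "symptom"}


def annotate(text, dictionary=DICTIONARY):
    """Return the text with dictionary matches as character-offset denotations."""
    denotations = []
    for term, label in dictionary.items():
        # Match the term only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            denotations.append(
                {"span": {"begin": m.start(), "end": m.end()}, "obj": label}
            )
    # Order by position in the text, then assign sequential identifiers.
    denotations.sort(key=lambda d: d["span"]["begin"])
    for i, d in enumerate(denotations, start=1):
        d["id"] = f"T{i}"
    return {"text": text, "denotations": denotations}


result = annotate("SARS-CoV-2 infection often causes fever and affects the lung.")
print(json.dumps(result, indent=2))
```

Because the function takes only the text as input, the same code can be applied unchanged each time a collection is updated, which is the property that makes tool contributions sustainable.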
Anyone can subscribe to be informed, to discuss, to contribute, and more.
Aiko T. Hiraki (DBCLS)
Jin-Dong Kim (DBCLS)