
Kaggle, an online community for data scientists and a platform for data science competitions, has unveiled a new and timely bounty-paying challenge: the Covid-19 Open Research Dataset Challenge, or CORD-19.
CORD-19 asks artificial intelligence (AI) and machine learning researchers to develop text and data mining tools to analyse a dataset comprising tens of thousands of articles on virology and infectious disease. The goal is to help provide answers for 10 tasks, or lines of inquiry, about Covid-19.
The prize for each of the tasks in the CORD-19 challenge is $1,000, delivered as cash or as a charitable donation to research and relief efforts.
The world isn’t lacking for research about Covid-19. Kaggle’s dataset contains “over 29,000 scholarly articles, including over 13,000 with full text, about Covid-19, SARS-CoV-2, and related coronaviruses,” according to the challenge introduction.
But there’s little time to paw manually through those haystacks of research for the needles we need, so Kaggle is encouraging the use of machine learning techniques like natural language processing to get relevant data into the right hands more quickly.
The CORD-19 tasks revolve around common questions about Covid-19. Each of the high-level tasks (e.g. what do we know about Covid-19 risk factors?) includes a number of subtasks (e.g. what populations are more susceptible? What roles do smoking or pre-existing pulmonary diseases play?).
Other Covid-19-related datasets are also available on Kaggle. These include a full RNA sequence of the virus and details about previous infectious disease outbreaks such as Ebola and SARS.
Previous Kaggle challenges related to medical science featured projects with longer and less urgent time frames, such as devising better ways to screen for cervical cancer. Because the Covid-19 outbreak requires answers immediately, the Kaggle community is facing its first major test in real time.