RedHOT: A Corpus of Annotated Medical Questions, Experiences, and Claims on Social Media
(EACL 2023)
About
Reddit Health Online Talk (RedHOT) is a large-scale corpus of over 22k richly annotated social media posts from Reddit spanning 24 health conditions. Annotations mark the spans that correspond to medical claims, personal experiences, and questions. Additionally, claims are annotated for medically relevant Populations, Interventions, and Outcomes (PIO elements).
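To make the annotation layers described above concrete, here is a minimal sketch of how one annotated post could be represented in memory. The class names, field names, and the example post are all hypothetical and invented for illustration; they are not the released file format, so consult the dataset files for the actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PIO:
    # Hypothetical container for the PIO elements attached to a claim.
    populations: List[str] = field(default_factory=list)
    interventions: List[str] = field(default_factory=list)
    outcomes: List[str] = field(default_factory=list)


@dataclass
class Span:
    start: int                 # character offset, inclusive
    end: int                   # character offset, exclusive
    label: str                 # "claim", "experience", or "question"
    pio: Optional[PIO] = None  # only meaningful for claim spans


@dataclass
class AnnotatedPost:
    text: str
    condition: str
    spans: List[Span] = field(default_factory=list)

    def spans_with_label(self, label: str) -> List[str]:
        """Return the text of every span carrying the given label."""
        return [self.text[s.start:s.end] for s in self.spans if s.label == label]


def make_span(text: str, snippet: str, label: str, pio: Optional[PIO] = None) -> Span:
    """Build a span by locating `snippet` inside `text` (first occurrence)."""
    start = text.index(snippet)
    return Span(start, start + len(snippet), label, pio)


# Invented example post, not taken from the corpus.
text = "Metformin reduces weight in adults with diabetes. Has anyone tried it?"
post = AnnotatedPost(
    text=text,
    condition="diabetes",
    spans=[
        make_span(
            text,
            "Metformin reduces weight in adults with diabetes.",
            "claim",
            PIO(
                populations=["adults with diabetes"],
                interventions=["metformin"],
                outcomes=["weight"],
            ),
        ),
        make_span(text, "Has anyone tried it?", "question"),
    ],
)

claims = post.spans_with_label("claim")
questions = post.spans_with_label("question")
```

Storing character offsets rather than copied strings keeps each span tied to the original post, which is one common choice for span annotation; the corpus itself may use a different convention.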
We introduce the task of retrieving trustworthy evidence relevant to a given claim made on social media. We also propose a novel method to automatically derive (noisy) supervision for this task, which we use to train a dense retrieval model. This model outperforms baselines when evaluated by medical doctors (MDs).
The Evidence Retrieval Task
We describe the task of evidence retrieval as follows:
Given a natural language medical claim, its identified PIO elements, and a very large corpus of medical abstracts from randomized controlled trials (RCTs), identify the relevant abstracts that help users make an informed decision about the underlying claim.
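The input/output shape of this task can be sketched as a ranking function. The example below is only an illustration: it swaps in a simple bag-of-words cosine similarity in place of the trained dense retrieval model, and the function names, the toy abstracts, and the parameter `k` are assumptions made for this sketch rather than part of the released code.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    """Stand-in 'embedding': a sparse term-frequency vector.

    A dense retriever would use a learned encoder here instead.
    """
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(claim: str, pio_elements: List[str], abstracts: List[str], k: int = 2) -> List[int]:
    """Return indices of the top-k abstracts for a claim and its PIO elements."""
    query = embed(" ".join([claim, *pio_elements]))
    scored = [(cosine(query, embed(a)), i) for i, a in enumerate(abstracts)]
    # Highest score first; ties are broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


# Toy abstracts invented for illustration, not real RCT abstracts.
abstracts = [
    "metformin weight loss trial in adults with diabetes",
    "aspirin for headache relief in adolescents",
    "vitamin d supplementation and bone density",
]
top = retrieve(
    claim="metformin helps with weight loss",
    pio_elements=["adults", "metformin", "weight loss"],
    abstracts=abstracts,
)
```

Concatenating the PIO elements onto the claim is just one simple way to let them influence the query; how the actual model consumes them is described in the paper.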
Paper
Link to paper
@inproceedings{redhot23,
  author = {Somin Wadhwa and Vivek Khetan and Silvio Amir and Byron Wallace},
  booktitle = {European Chapter of the Association for Computational Linguistics (EACL)},
  year = {2023},
  title = {RedHOT: A Corpus of Annotated Medical Questions, Experiences, and Claims on Social Media}
}