PHM2017 is a new dataset consisting of 7,192 English tweets across six diseases and conditions: Alzheimer’s disease, heart attack (any severity), Parkinson’s disease, cancer (any type), depression (any severity), and stroke. The Twitter search API was used to retrieve the data, with the colloquial disease names as search keywords and the expectation of obtaining a high-recall, low-precision dataset. After retweets and replies were removed, the remaining tweets were manually annotated. The labels are:
self-mention. The tweet contains a health mention that is a self-report by the Twitter account owner, e.g., "However, I worked hard and ran for Tokyo Mayer Election Campaign in January through February, 2014, without publicizing the cancer."
other-mention. The tweet contains a health mention that reports on someone other than the account owner, e.g., "Designer with Parkinson’s couldn’t work then engineer invents bracelet + changes her world"
awareness. The tweet contains the disease name, but does not mention a specific person, e.g., "A Month Before a Heart Attack, Your Body Will Warn You With These 8 Signals"
non-health. The tweet contains the disease name, but the tweet topic is not about health, e.g., "Now I can have cancer on my wall for all to see <3"
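The label scheme and the filtering step described above can be sketched in a few lines of Python. This is only an illustration, not the code used to build PHM2017: the `Tweet` record, its `is_retweet` and `is_reply` fields, and the `keep_for_annotation` helper are hypothetical names introduced here, and the actual Twitter API payload is structured differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Label(Enum):
    """The four annotation labels listed above."""
    SELF_MENTION = "self-mention"    # health self-report by the account owner
    OTHER_MENTION = "other-mention"  # health report about someone else
    AWARENESS = "awareness"          # disease named, no specific person
    NON_HEALTH = "non-health"        # disease named, topic is not health


@dataclass
class Tweet:
    # Hypothetical record; field names are assumptions, not the Twitter API schema.
    text: str
    is_retweet: bool = False
    is_reply: bool = False
    label: Optional[Label] = None    # filled in later by a human annotator


def keep_for_annotation(tweet: Tweet) -> bool:
    """Keep only original tweets: drop retweets and replies."""
    return not (tweet.is_retweet or tweet.is_reply)


tweets = [
    Tweet("A Month Before a Heart Attack, Your Body Will Warn You With These 8 Signals"),
    Tweet("RT: A Month Before a Heart Attack ...", is_retweet=True),
    Tweet("@someone hope you feel better", is_reply=True),
]

# Only the first tweet survives the filter and goes on to manual annotation.
candidates = [t for t in tweets if keep_for_annotation(t)]
candidates[0].label = Label.AWARENESS
```

Representing the labels as an enumeration keeps the four classes closed and prevents spelling variants from appearing during annotation.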