Journal Article
Print(0)
Journal of medical Internet research
J.Med.Internet Res.
29-Aug
15
8
e174
LR: 20150423; GR: U01 CA154280/CA/NCI NIH HHS/United States; GR: U01CA154280/CA/NCI NIH HHS/United States; GR: U54 HL108460/HL/NHLBI NIH HHS/United States; GR: U54HL108460/HL/NHLBI NIH HHS/United States; JID: 100959882; OID: NLM: PMC3758063; OTO: NOTNLM;
Canada
1438-8871; 1438-8871
PMID: 23989137
eng
Journal Article; Research Support, N.I.H., Extramural; IM
10.2196/jmir.2534 [doi]
Unknown(0)
23989137
BACKGROUND: Social media platforms such as Twitter are rapidly becoming key resources for public health surveillance applications, yet little is known about Twitter users' levels of informedness and sentiment toward tobacco, especially with regard to the emerging tobacco control challenges posed by hookah and electronic cigarettes. OBJECTIVE: To develop a content and sentiment analysis of tobacco-related Twitter posts and build machine learning classifiers to detect tobacco-relevant posts and sentiment towards tobacco, with a particular focus on new and emerging products like hookah and electronic cigarettes. METHODS: We collected 7362 tobacco-related Twitter posts at 15-day intervals from December 2011 to July 2012. Each tweet was manually classified using a triaxial scheme, capturing genre, theme, and sentiment. Using the collected data, machine-learning classifiers were trained to detect tobacco-related vs irrelevant tweets as well as positive vs negative sentiment, using Naive Bayes, k-nearest neighbors, and Support Vector Machine (SVM) algorithms. Finally, phi contingency coefficients were computed between each of the categories to discover emergent patterns. RESULTS: The most prevalent genres were first- and second-hand experience and opinion, and the most frequent themes were hookah, cessation, and pleasure. Sentiment toward tobacco was overall more positive (1939/4215, 46% of tweets) than negative (1349/4215, 32%) or neutral among tweets mentioning it, even excluding the 9% of tweets categorized as marketing. Three separate metrics converged to support an emergent distinction between, on one hand, hookah and electronic cigarettes corresponding to positive sentiment, and on the other hand, traditional tobacco products and more general references corresponding to negative sentiment. 
These metrics included correlations between categories in the annotation scheme (phi(hookah)-positive=0.39; phi(e-cigs)-positive=0.19); correlations between search keywords and sentiment (chi(2)(4)=414.50, P<.001, Cramer's V=0.36), and the most discriminating unigram features for positive and negative sentiment ranked by log odds ratio in the machine learning component of the study. In the automated classification tasks, SVMs using a relatively small number of unigram features achieved best performance in discriminating tobacco-related from unrelated tweets (F score=0.85). CONCLUSIONS: Novel insights available through Twitter for tobacco surveillance are attested through the high prevalence of positive sentiment. This positive sentiment is correlated in complex ways with social image, personal experience, and recently popular products such as hookah and electronic cigarettes. Several apparent perceptual disconnects between these products and their health effects suggest opportunities for tobacco control education. Finally, machine classification of tobacco-related posts shows a promising edge over strictly keyword-based approaches, yielding an improved signal-to-noise ratio in Twitter data and paving the way for automated tobacco surveillance applications.
Myslin,M., Zhu,S.H., Chapman,W., Conway,M.
Department of Linguistics, University of California, San Diego, La Jolla, CA 92093, USA.
20130829
PMC3758063
http://vp9py7xf3h.search.serialssolutions.com/?charset=utf-8&pmid=23989137
2013