Domain Adaptation for Signal Extraction from Large Social Media Datasets
33 Pages Posted: 19 Aug 2020
Date Written: July 17, 2018
Interest in using social media data for quantitative research is growing across many domains. Although these datasets hold significant promise, mounting evidence suggests that many of the results produced from them may be distorted by various types of noise. We introduce novel and efficient techniques that combine Natural Language Processing (NLP) and Machine Learning (ML) to extract signals from social media text. The proposed framework makes a significant methodological contribution by developing a domain adaptation method based on feature augmentation and sample reweighting, which reduces the training effort for signal extraction by reusing previously annotated data. The framework was tested on several large real-world social media datasets and outperformed the baseline methods by a large margin. It can be used for a variety of purposes to improve analyses of social media and to contribute to predictive analytics.
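The abstract does not spell out how feature augmentation and sample reweighting are implemented, so the following is only a minimal sketch of two commonly used generic forms of these ideas, not the paper's method. It assumes a block-copy augmentation (shared, source-only, and target-only copies of each feature vector) and density-ratio weights derived from a hypothetical domain classifier; the names `augment`, `importance_weights`, `p_target`, and `clip` are illustrative assumptions.

```python
from typing import List, Sequence


def augment(x: Sequence[float], domain: str) -> List[float]:
    """Map x to three blocks: [shared, source-only, target-only].

    Assumption: a generic block-copy augmentation, not necessarily
    the scheme used in the paper.
    """
    x = list(x)
    zeros = [0.0] * len(x)
    if domain == "source":
        return x + x + zeros
    if domain == "target":
        return x + zeros + x
    raise ValueError(f"unknown domain: {domain!r}")


def importance_weights(
    p_target: Sequence[float],
    n_source: int,
    n_target: int,
    clip: float = 10.0,
) -> List[float]:
    """Estimate weights p_T(x) / p_S(x) for previously annotated source samples.

    p_target[i] is the probability, from a hypothetical domain classifier,
    that source sample i comes from the target domain. By Bayes' rule the
    density ratio equals (n_source / n_target) * p / (1 - p).
    Weights are clipped at `clip` to limit the influence of any one sample.
    """
    prior_ratio = n_source / n_target
    weights = []
    for p in p_target:
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # keep the ratio finite
        weights.append(min(prior_ratio * p / (1.0 - p), clip))
    return weights
```

In a setup like this, the augmented source vectors and their weights would be passed to any classifier that accepts per-sample weights, so that annotated source posts resembling the new target domain count more during training.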
Keywords: Signal Extraction, Machine Learning, Natural Language Processing, Domain Adaptation, Social Media Analysis
JEL Classification: C89