Domain Adaptation for Signal Extraction from Large Social Media Datasets

33 Pages, Posted: 19 Aug 2020

Wenli Zhang

Iowa State University

Sudha Ram

University of Arizona - Department of Management Information Systems

Date Written: July 17, 2018

Abstract

Interest in using social media data for quantitative research is increasing across many domains. Although these datasets hold significant promise, mounting evidence suggests that many of the results they produce could be misrepresented because of various types of noise. We introduce novel and efficient techniques that combine Natural Language Processing (NLP) and Machine Learning (ML) to extract signals from social media text. The proposed framework makes a significant methodological contribution by developing a domain adaptation method based on feature augmentation and sample reweighting. It reduces the training effort for signal extraction by reusing previously annotated data. The proposed framework was tested on several large real-world social media datasets and outperformed the baseline methods by a large margin. The framework described in this paper can be used for a variety of purposes to yield improved analyses of social media and to contribute to predictive analytics.

Keywords: Signal Extraction, Machine Learning, Natural Language Processing, Domain Adaptation, Social Media Analysis

JEL Classification: C89

Suggested Citation

Zhang, Wenli and Ram, Sudha, Domain Adaptation for Signal Extraction from Large Social Media Datasets (July 17, 2018). Available at SSRN: https://ssrn.com/abstract=3653884 or http://dx.doi.org/10.2139/ssrn.3653884

Wenli Zhang (Contact Author)

Iowa State University

Ames, IA 50011-2063
United States

Sudha Ram

University of Arizona - Department of Management Information Systems

AZ
United States
