Volume 70, Issue 7 p. 675-684
RESEARCH ARTICLE

Automatic detection of influencers in social networks: Authority versus domain signals

Javier Rodríguez-Vidal

Corresponding Author

Department of Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain

E-mail: [email protected]

Julio Gonzalo

Department of Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain

Laura Plaza

Department of Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain

Henry Anaya Sánchez

Séntisis Analytics, Calle Gran Vía 62, 28013 Madrid, Spain

First published: 07 January 2019
Citations: 11

Abstract

Given the task of finding influencers (opinion makers) for a given domain in a social network, we investigate (a) the relative importance of domain and authority signals; (b) the most effective way of combining signals (voting, classification, learning to rank, etc.) and how best to model the vocabulary signal; and (c) how large the gap between supervised and unsupervised methods is, and what its practical consequences are. Our best result on the RepLab dataset, which improves on the state of the art, uses language models to learn the domain-specific vocabulary used by influencers and combines domain and authority models using a Learning to Rank algorithm. Our experiments show that (a) both authority and domain evidence can be trained from the vocabulary of influencers; (b) once the language of influencers is modeled as a likelihood signal, further supervised learning and additional network-based signals provide only marginal improvements; and (c) the availability of training data sets is crucial to obtaining competitive results in the task. Our most remarkable finding is that influencers do use a distinctive vocabulary, which is a more reliable signal than nontextual network indicators such as the number of followers or retweets.
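To make the idea of a vocabulary "likelihood signal" concrete, the following is a minimal sketch, not the authors' model. It assumes Laplace-smoothed unigram language models, whitespace tokenization, and a log-likelihood ratio between an influencer model and a non-influencer model as the ranking score. The toy texts, the user names, and the helper names `train_unigram` and `llr_score` are hypothetical and are not taken from the article or from RepLab.

```python
import math
from collections import Counter


def train_unigram(docs, vocab, alpha=1.0):
    """Laplace-smoothed unigram log-probabilities over a fixed vocabulary."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: math.log((counts[w] + alpha) / total) for w in vocab}


def llr_score(text, lm_pos, lm_neg):
    """Sum of per-token log-likelihood ratios; out-of-vocabulary tokens are skipped."""
    return sum(lm_pos[t] - lm_neg[t] for t in text.lower().split() if t in lm_pos)


# Toy training texts (assumption: invented for illustration only).
influencer_docs = ["new engine review and test drive", "exclusive engine launch review"]
other_docs = ["selling my old car cheap", "traffic was bad today"]

# Shared vocabulary so both models assign a probability to every known token.
vocab = {tok for doc in influencer_docs + other_docs for tok in doc.lower().split()}
lm_inf = train_unigram(influencer_docs, vocab)
lm_oth = train_unigram(other_docs, vocab)

# Rank hypothetical candidate users by how influencer-like their text is.
candidates = {"user_a": "engine review exclusive", "user_b": "traffic was cheap today"}
ranking = sorted(
    candidates,
    key=lambda u: llr_score(candidates[u], lm_inf, lm_oth),
    reverse=True,
)
print(ranking)
```

In a setup like the one the abstract describes, a score of this kind would be one feature among several (for example, alongside authority indicators) passed to a learning-to-rank model; that combination step is not shown here.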