sentometrics research repository

An open-source repository about sentiment+econometrics research.

View the Project on GitHub sborms/econometrics-meets-sentiment

Glossary

Corpus A corpus in linguistics jargon designates the collection of textual data units (e.g. documents) to be analyzed. It can be generalized to indicate the collection of data units from textual, audio, or visual data.

Features A feature is a broad term for any type of metadata attached to the original textual, audio, or visual data stored in a corpus. Examples are source, expresser, entity, location, and topic. This definition differs slightly from, but is in line with, the use of the term in machine learning, where features refer to the set of explanatory variables. In video and audio data, (low-level) features are compact, mathematical representations of the physical properties of the data (Wang et al., 2003).

Lexicon A lexicon is a list of tokens (e.g. words, word sequences, facial expressions, or sounds) in which each token has an associated score that represents its average sentiment. It is also called, interchangeably, a sentiment lexicon, a sentiment word list, or a sentiment dictionary.
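To make the lexicon definition concrete, here is a minimal Python sketch of lexicon-based scoring for text. The toy lexicon, its scores, and the helper names `tokenize` and `lexicon_score` are hypothetical and chosen for illustration only. Real lexicons are much larger, and real pipelines usually also handle negation and other valence shifters.

```python
import re

# Hypothetical toy lexicon: token -> sentiment score (values are illustrative only).
LEXICON = {
    "good": 1.0,
    "great": 2.0,
    "bad": -1.0,
    "terrible": -2.0,
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexicon_score(text, lexicon=LEXICON):
    """Sum the lexicon scores of all tokens; tokens not in the lexicon score 0."""
    return sum(lexicon.get(token, 0.0) for token in tokenize(text))


score = lexicon_score("The results were great, not bad.")
```

In this sketch the word "not" is ignored, so "not bad" still contributes the negative score of "bad". That limitation is one reason practical implementations go beyond a plain lookup.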

Natural language processing (NLP) The broad subfield of artificial intelligence concerned with the understanding, interpretation, and manipulation of human language. It draws from computer science, computational linguistics, and machine learning.

Polarity The polarity (or semantic orientation) of an expression (whether it is a text, a sound, or something else) represents its degree of positivity. Polarity ranges from very negative to very positive and can be measured on a discrete or a continuous scale.

Sentiment Sentiment is the disposition of an entity toward an entity, expressed via a certain medium. This working definition has three components: (1) an entity expresses its disposition, in the form of verbal or nonverbal communication; (2) the expression has a polarity or a semantic orientation measurable on a discrete or a continuous scale; and (3) the expression is oriented toward (an aspect of) an entity.

Sentiment analysis Sentiment analysis is the extraction of sentiment from the medium through which it is expressed. Multimodal sentiment analysis covers textual, audio, and visual media.

Sentometrics The term “sentometrics” is a portmanteau of sentiment and econometrics. It deals with the computation of sentiment from any type of qualitative data, the evolution of sentiment, and the application of sentiment in an economic analysis using econometric methods.
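The three stages named in the sentometrics definition (computing sentiment, tracking its evolution, and applying it econometrically) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the document-level scores, the economic `target` series, and the helpers `aggregate_by_period` and `ols_fit` are invented. A simple average per period and a one-regressor least-squares fit stand in for the richer aggregation and econometric models used in practice.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical document-level sentiment scores: (period, score).
doc_scores = [
    ("2020-01", 0.5), ("2020-01", 1.5),
    ("2020-02", -1.0), ("2020-02", 0.0),
    ("2020-03", 2.0),
]

# Hypothetical economic variable observed once per period (illustrative values).
target = {"2020-01": 3.0, "2020-02": 0.0, "2020-03": 5.0}


def aggregate_by_period(scores):
    """Average the document scores within each period to form a sentiment index."""
    by_period = defaultdict(list)
    for period, score in scores:
        by_period[period].append(score)
    return {period: mean(values) for period, values in sorted(by_period.items())}


def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one regressor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Stage 2: the evolution of sentiment as a time series.
index = aggregate_by_period(doc_scores)

# Stage 3: relate the sentiment index to the economic variable.
periods = list(index)
intercept, slope = ols_fit([index[p] for p in periods], [target[p] for p in periods])
```

The toy data are constructed so that the target is an exact linear function of the sentiment index. Real data would be noisy and would call for proper inference, such as standard errors and out-of-sample evaluation.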

Supervised learning Supervised learning is a branch of machine learning that requires an annotated data set (i.e. a set of input data with associated output values) to train a model.
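As an illustration of what "annotated data set" means here, the following Python sketch trains a deliberately simple classifier on labeled texts. The training examples and the functions `train` and `predict` are hypothetical, and the learning rule (add one to a token's weight for each positive document it appears in, subtract one for each negative document) is a toy stand-in for standard supervised methods such as logistic regression or naive Bayes.

```python
from collections import Counter

# Hypothetical annotated data set: (input text, output label).
TRAINING_DATA = [
    ("profits rise strongly", "positive"),
    ("strong growth ahead", "positive"),
    ("profits fall sharply", "negative"),
    ("weak growth and losses", "negative"),
]


def train(examples):
    """Learn one weight per token from the labeled examples."""
    weights = Counter()
    for text, label in examples:
        step = 1 if label == "positive" else -1
        for token in text.lower().split():
            weights[token] += step
    return weights


def predict(weights, text):
    """Classify a text by the sign of its summed token weights (ties count as positive)."""
    total = sum(weights[token] for token in text.lower().split())
    return "positive" if total >= 0 else "negative"


model = train(TRAINING_DATA)
label = predict(model, "losses and weak demand")
```

The learned weights behave like a small data-driven lexicon: tokens that appear equally often under both labels, such as "profits" in this toy data, end up with a weight of zero.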

Unsupervised learning Unsupervised learning is a branch of machine learning in which the input data alone determine the output categories or representation. In practice, most unsupervised methods are hybrid or semi-supervised, as the modeler often needs to supply certain minimal inputs.
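The point about minimal modeler inputs can be seen in a small clustering sketch. The Python code below runs a basic one-dimensional k-means on unlabeled sentiment scores. The data and the function `k_means_1d` are hypothetical. The groups are found from the data alone, but the number of clusters `k` still has to be chosen by the modeler.

```python
def k_means_1d(values, k, iterations=10):
    """Cluster 1-D values into k groups (k >= 2); k is the modeler-supplied input."""
    # Deterministic initialization: spread the initial centers over the sorted values.
    ordered = sorted(values)
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda j: abs(value - centers[j]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster (keep it if empty).
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[j]
            for j, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled sentiment scores.
scores = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
centers, clusters = k_means_1d(scores, k=2)
```

No labels are provided, yet the scores separate into a low group and a high group. Interpreting those groups as "negative" and "positive" is again a decision left to the modeler.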