Comparing Text Classification Methods

Posted on June 24, 2022 by neilbendle

Marketing research, especially academic research, now assesses a lot of unstructured text data. (Unstructured data is data that does not come in the neat rows-and-columns form of a database or spreadsheet.) Classifying such text is a task that computers excel at. So, how do we go about comparing text classification methods to find the one that best fits our purposes?

Service To The Field

Four researchers did a service to the field by assessing ten different types of text classification. They looked at data a marketer might want to analyze and saw what worked well on that particular type of text.

How do we decide what counts as good performance? The researchers compare the performance of the models to human judgment.

  • Does a human think the text is positive, negative, or neutral?
  • Does the human think the text contains information, emotion, or a combination of both?

The idea is that humans understand text really rather well. We want our algorithms to generate a similar understanding to ours.

So why don’t people just read the text themselves, if man be the measure of a machine?

There certainly can be quality benefits to automated methods, e.g., consistency. Yet that probably isn’t the main reason. Have you seen how much text exists online? The greatest benefits of setting machines to analyze text data often come in speed, cost savings, and lack of boredom. (Computers don’t get bored, or at least we don’t think they do. We may be storing up resentments that help explain why nearly every science fiction movie has robots rebelling. Occasionally, I too feel angry at the world after reading enough Twitter posts. As did many Trump supporters, I hear.)

Out-of-sample Accuracy Test

The authors use an out-of-sample accuracy test. They test a variety of text sources, e.g., Twitter, corporate blogs, Yelp, and Facebook.

They train the models on 80% of each dataset; this is called the training data (or training sample). For that 80%, the authors tell the model the right answer: ‘This blog has a positive tone according to our human judges’. Then the models are set loose on the remaining 20% of each sample, called the hold-out data (or hold-out sample).
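
To make the setup concrete, here is a minimal sketch of an 80/20 split and an out-of-sample accuracy check. It assumes Python with scikit-learn, and the texts and labels are made-up stand-ins for human-coded data; it illustrates the idea rather than reproducing the authors’ pipeline.

```python
# Sketch of the 80/20 train/hold-out protocol (illustrative; not the paper's code).
from collections import Counter
from sklearn.model_selection import train_test_split

# Toy stand-ins for texts and the labels human judges assigned to them.
texts = ["great stay", "never again", "it was okay", "loved the staff"] * 25
labels = ["positive", "negative", "neutral", "positive"] * 25

# 80% training sample, 20% hold-out sample.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    texts, labels, test_size=0.2, random_state=0, stratify=labels)

# A trivial baseline: always guess the most common label in the training sample.
most_common = Counter(y_train).most_common(1)[0][0]
predictions = [most_common for _ in X_holdout]

# Out-of-sample accuracy: the share of hold-out texts where the prediction
# matches the human label. A real model should beat this baseline.
accuracy = sum(p == y for p, y in zip(predictions, y_holdout)) / len(y_holdout)
print(f"Hold-out accuracy of the baseline: {accuracy:.2f}")
```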

Comparing Text Classification Methods With An Out-of-sample Accuracy Test

Comparing Text Classification Methods

They compared ten text classification methods. Five of these were lexicon-based. Such methods basically use dictionaries of words tied to the relevant categories. For example, “great” might be classed as a positive word, so if a text contains a load of words like “great”, the machine will say the text is positive. The five lexicon-based methods were LIWC, AFINN, BING, NRC, and VADER. None of these did very well, so you don’t really need to remember them.
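
To see what “lexicon-based” means in practice, here is a toy sketch: count how many words from a positive list versus a negative list appear in the text. The word lists below are invented for illustration; real lexicons such as LIWC or VADER are far larger and handle things like negation and intensifiers.

```python
# A toy lexicon-based sentiment classifier (illustration only; real lexicons
# such as LIWC, AFINN, BING, NRC, and VADER are much richer than this).
POSITIVE = {"great", "love", "loved", "excellent", "friendly", "amazing"}
NEGATIVE = {"terrible", "awful", "rude", "slow", "hate", "worst"}

def lexicon_sentiment(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("great food and friendly staff"))      # -> positive
print(lexicon_sentiment("the service was terrible and slow"))  # -> negative
```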

Five models were machine learning approaches. ANN (Artificial Neural Network), kNN (k-Nearest Neighbors), NB (Naive Bayes), SVM (Support Vector Machine), and RF (Random Forest). These methods learn patterns, seeing what the characteristics of the various texts in the training samples were classed by the humans as, for example, positive. They then look for those characteristics in the hold-out sample. The models ‘guess’ what the sample is from pattern matching on characteristics.
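
A hedged sketch of that machine-learning workflow follows: turn the texts into simple word-frequency (TF-IDF) features, fit each model on the 80% training sample, and score it on the 20% hold-out sample. It assumes scikit-learn, and the data, features, and model settings are illustrative choices rather than the authors’ setup.

```python
# Comparing several machine-learning text classifiers on a hold-out sample
# (illustrative sketch; not the configuration used in the paper).
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

texts = ["great service, loved it", "terrible wait, never again",
         "it was fine, nothing special", "awful food but friendly staff"] * 25
labels = ["positive", "negative", "neutral", "negative"] * 25  # human judgments

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels)

# Learn the vocabulary on the training sample only, then reuse it for the hold-out.
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

models = {
    "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=42),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "NB": MultinomialNB(),
    "SVM": LinearSVC(),
    "RF": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Fit each model on the training features, then report out-of-sample accuracy.
for name, model in models.items():
    model.fit(X_train_vec, y_train)
    acc = accuracy_score(y_test, model.predict(X_test_vec))
    print(f"{name}: hold-out accuracy = {acc:.2f}")
```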

Performance varies across contexts. Still, they did have winners.

Nevertheless, across all different contexts, ANN, NB, and RF consistently achieve the highest performances.

Hartmann, Huppertz, Schamp, Heitmann (2019) page 27

The authors explain that it is important to have the right methods for the data at hand. Yet Random Forest and Naive Bayes do consistently well, so it would be hard to go wrong with them. (The latter being especially impressive given it is Naive, and Bayes was long dead before the first tweet was ever tweeted).

It is useful to know what tools work best for each task. The authors do the field a favor by seeing what works best with the various marketing-relevant data samples.

Read: Jochen Hartmann, Juliana Huppertz, Christina Schamp, and Mark Heitmann (2019), “Comparing automated text classification methods,” International Journal of Research in Marketing, 36(1), pages 20-38.
