I was impressed by Annalyn Ng and Kenneth Soo’s short book Numsense, which I discussed in a prior post. Today I will look at how they discuss clustering, a technique central to a lot of marketing analysis.
Numsense Covers A Lot Of Basic Data Science
The subtitle, Data Science for the Layman (No Math Added), gives you a good sense of what you are going to get. They cover an awful lot of concepts in a very succinct book: social network analysis, regression, and clustering, plus fancier techniques such as Support Vector Machines, Random Forests, and Neural Networks. To fit all this into a short book, they cut out extraneous description. They don’t bother with humor, but that lets them cover a lot of ground.
It is also true that if you want to apply some of the techniques, you will need to go further. Still, Ng and Soo’s book is a nice introduction. Making the terms seem a bit less scary is helpful by itself, and it should help readers judge whether they want to dig further into any particular area.
Clustering
Clustering has been a mainstay of marketing for many years, and the same principles apply in data science. (A lot of data science is just traditional quantitative marketing with more data and cooler computer equipment.)
The authors explain what clustering is about and why you might want to do it: “…by identifying common preferences or characteristics, it is possible to sort customers into groups, which retailers may then use for targeted advertisement” (Ng and Soo, 2017).
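One common way to do this kind of sorting is k-means clustering, sketched here in plain Python. Picking k-means is our choice for illustration, not a claim about how the book presents it. The customer data, the two scores, and the function names (`farthest_first_init`, `nearest`, `kmeans`) are hypothetical. To keep the result reproducible, the sketch seeds the centroids with a deterministic farthest-first rule rather than a randomized start such as k-means++.

```python
import math


def farthest_first_init(points, k):
    # Deterministic seeding (an assumption for reproducibility): start from the
    # first point, then keep adding the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def nearest(p, centroids):
    # Index of the centroid closest to point p (Euclidean distance).
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points, k, max_iter=100):
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        # Assignment step: every customer joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[i])  # leave an empty cluster's centroid in place
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return [nearest(p, centroids) for p in points], centroids


# Hypothetical customers: (spend score, visit score), each on a 0-10 scale.
customers = [
    (1, 1), (2, 1), (1, 2), (2, 2),  # low spend, rare visits
    (8, 8), (9, 8), (8, 9), (9, 9),  # high spend, frequent visits
    (1, 9), (2, 9), (1, 8), (2, 8),  # low spend, frequent visits
]
labels, centroids = kmeans(customers, k=3)
print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
print(centroids)  # [(1.5, 1.5), (8.5, 8.5), (1.5, 8.5)]
```

Each pass alternates two steps: assign every customer to the nearest centroid, then move each centroid to the average of its customers, stopping once the centroids no longer move. The three toy groups are well separated, so k = 3 recovers them exactly; real customer data is rarely this tidy.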
How Many Clusters Are Enough
The authors highlight a major challenge of clustering: we never really know how many clusters are enough. “The number of clusters should be large enough to enable us to extract meaningful patterns that can inform business decisions, but also small enough to ensure that clusters remain clearly distinct.” This is a Goldilocks definition: we want something just right.
Sadly, we often can’t prove what “right” is. Still, we can often argue that some choices are clearly better than others.
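One common way to argue that some choices of k are better than others is the “elbow” heuristic. The within-cluster sum of squares (WCSS) always tends to fall as k grows, so its minimum is no guide; instead you look for the k after which extra clusters stop buying much. The sketch below applies this to a small set of hypothetical customer scores with a self-contained k-means (deterministic farthest-first seeding, then the usual assign-and-update loop). It illustrates one heuristic and is not necessarily how Ng and Soo choose the number of clusters; the data and function names are hypothetical.

```python
import math


def nearest(p, centroids):
    # Index of the centroid closest to point p (Euclidean distance).
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points, k, max_iter=100):
    # Farthest-first seeding (an assumption for reproducibility).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # Move the centroid to its members' mean; keep it if the cluster is empty.
            updated.append(
                tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            )
        if updated == centroids:
            break
        centroids = updated
    return [nearest(p, centroids) for p in points], centroids


def wcss(points, labels, centroids):
    # Within-cluster sum of squares: squared distance from each point to its own centroid.
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )


# Hypothetical customers: (spend score, visit score), each on a 0-10 scale.
customers = [
    (1, 1), (2, 1), (1, 2), (2, 2),
    (8, 8), (9, 8), (8, 9), (9, 9),
    (1, 9), (2, 9), (1, 8), (2, 8),
]

scores = {}
for k in range(1, 6):
    labels, centroids = kmeans(customers, k)
    scores[k] = wcss(customers, labels, centroids)
    print(f"k={k}: WCSS={scores[k]:.2f}")
```

On this toy data the WCSS drops sharply up to k = 3 and only slightly after that, which points to three clusters. On real data the elbow is often far less obvious, which is exactly the judgment call the authors describe.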
It is useful to remember that data science still has a bit of art in it.
Read: Annalyn Ng and Kenneth Soo (2017), Numsense! Data Science for the Layman (No Math Added).