Hierarchical Clustering¶
Clustering can be used to group customers or markets based on similarities. Once customers are segmented, an appropriate marketing strategy can be designed for each segment. In this post, we will look at customer segmentation using a beer data set.
Hierarchical clustering, in its agglomerative form, is a clustering algorithm that builds a hierarchy from the bottom up. It uses the following steps to develop clusters:
1. Start with each data point in its own cluster.
2. Find the two clusters with the shortest distance between them (using an appropriate distance measure) and merge them into one cluster.
3. Repeat step 2 until all data points are merged into a single cluster.
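The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for this post: the function names `centroid` and `agglomerate` and the toy points are made up for the example, and the distance between two clusters is measured between their centroids, which is only one of several possible linkage rules.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def agglomerate(points):
    """Merge the two closest clusters until only one remains.

    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]  # step 1: one cluster per point
    history = []
    while len(clusters) > 1:
        # step 2: find the pair of clusters whose centroids are closest
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(centroid([points[i] for i in clusters[a]]),
                         centroid([points[i] for i in clusters[b]]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = sorted(clusters[a] + clusters[b])
        # step 3: replace the two clusters with their union and repeat
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history


# Hypothetical toy data: two tight pairs that are far apart from each other
toy = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
for left, right, d in agglomerate(toy):
    print(left, right, round(d, 3))
```

On the toy data the two tight pairs are merged first and the two resulting clusters are joined last. In practice a library routine (for example from SciPy's `scipy.cluster.hierarchy` module) would normally be used instead of this quadratic search.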
Beer data set¶
The beer data set contains 20 records, one per beer brand, with each brand's calories, sodium content, alcohol content and cost. It is taken from Machine Learning Using Python by Manaranjan Pradhan and U Dinesh Kumar.
name | calories | sodium | alcohol | cost |
---|---|---|---|---|
Budweiser | 144 | 15 | 4.7 | 0.43 |
Schlitz | 151 | 19 | 4.9 | 0.43 |
Lowenbrau | 157 | 15 | 0.9 | 0.48 |
Kronenbourg | 170 | 7 | 5.2 | 0.73 |
Heineken | 152 | 11 | 5.0 | 0.77 |
Old_Milwaukee | 145 | 23 | 4.6 | 0.28 |
Augsberger | 175 | 24 | 5.5 | 0.40 |
Srohs_Bohemian_Style | 149 | 27 | 4.7 | 0.42 |
Miller_Lite | 99 | 10 | 4.3 | 0.43 |
Budweiser_Light | 113 | 8 | 3.7 | 0.40 |
Coors | 140 | 18 | 4.6 | 0.44 |
Coors_Light | 102 | 15 | 4.1 | 0.46 |
Michelob_Light | 135 | 11 | 4.2 | 0.50 |
Becks | 150 | 19 | 4.7 | 0.76 |
Kirin | 149 | 6 | 5.0 | 0.79 |
Pabst_Extra_Light | 68 | 15 | 2.3 | 0.38 |
Hamms | 139 | 19 | 4.4 | 0.43 |
Heilemans_Old_Style | 144 | 24 | 4.9 | 0.43 |
Olympia_Goled_Light | 72 | 6 | 2.9 | 0.46 |
Schlitz_Light | 97 | 7 | 4.2 | 0.47 |
Find distances between all points¶
As the features are on different scales, they should be normalized before any distances are computed; otherwise the feature with the largest numeric range (here, calories) would dominate. The distance metric should be selected based on the type of features. In this case all four variables are continuous, so Euclidean distance is a natural choice. The distance between every pair of normalized points is shown in the matrix below.
## Budweiser Schlitz Lowenbrau Kronenbourg Heineken
## Schlitz 0.6757423
## Lowenbrau 3.5360570 3.7478149
## Kronenbourg 2.5913185 2.8431126 4.5013847
## Heineken 2.4544248 2.6450186 4.3136034 0.9125425
## Old_Milwaukee 1.5998120 1.2477763 3.8868316 4.0677186 3.8672105
## Augsberger 1.8712535 1.2459169 4.5173402 3.4590922 3.3487149
## Srohs_Bohemian_Style 1.8321163 1.2330993 3.9706743 3.8087854 3.4400718
## Miller_Lite 1.7089224 2.2633337 3.7591742 3.2676911 3.0014940
## Budweiser_Light 1.7512699 2.3722713 3.1892420 3.2644236 3.1333976
## Coors 0.4883132 0.4856244 3.4879470 2.8437559 2.5716098
## Coors_Light 1.5068182 1.8897230 3.4596559 3.3190336 2.8912688
## Michelob_Light 0.9499789 1.5505681 3.1807422 2.2518883 2.0808508
## Becks 2.3660808 2.2857313 4.0446639 2.0037202 1.2501115
## Kirin 2.8547419 3.1765983 4.5521713 0.8422027 0.7785034
## Pabst_Extra_Light 3.3591412 3.7029356 3.2816965 5.0759627 4.6336682
## Hamms 0.6875341 0.6068278 3.3454136 3.0335154 2.7340487
## Heilemans_Old_Style 1.3798178 0.7941165 3.9612923 3.4313954 3.0804259
## Olympia_Goled_Light 3.2098342 3.7589113 3.6258537 4.2940397 3.9826335
## Schlitz_Light 2.0429773 2.6447024 3.8221315 3.1427848 2.9150574
## Old_Milwaukee Augsberger Srohs_Bohemian_Style Miller_Lite
## Schlitz
## Lowenbrau
## Kronenbourg
## Heineken
## Old_Milwaukee
## Augsberger 1.5411174
## Srohs_Bohemian_Style 1.1529724 1.2266573
## Miller_Lite 2.7124475 3.4760348 3.0884077
## Budweiser_Light 2.7716223 3.5832060 3.2575686 0.8081580
## Coors 1.3507153 1.7109934 1.4092318 1.8415655
## Coors_Light 2.2910705 3.0835607 2.4725914 0.8146724
## Michelob_Light 2.4239167 2.7478844 2.5768934 1.2954515
## Becks 3.3741567 2.8241049 2.6434210 3.1671867
## Kirin 4.3840798 3.9594358 4.0965521 3.1121580
## Pabst_Extra_Light 3.5900677 4.7984133 3.9270245 2.2635759
## Hamms 1.2307318 1.7480134 1.2913002 1.9034640
## Heilemans_Old_Style 1.0828051 1.1810662 0.5230776 2.6528073
## Olympia_Goled_Light 4.0581785 4.9931362 4.4113827 1.6920939
## Schlitz_Light 3.2059710 3.8688057 3.5374917 0.5448379
## Budweiser_Light Coors Coors_Light Michelob_Light
## Schlitz
## Lowenbrau
## Kronenbourg
## Heineken
## Old_Milwaukee
## Augsberger
## Srohs_Bohemian_Style
## Miller_Lite
## Budweiser_Light
## Coors 1.9657761
## Coors_Light 1.2529869 1.4186610
## Michelob_Light 1.1930283 1.2104952 1.2812243
## Becks 3.3626474 2.2406464 2.7340113 2.2706129
## Kirin 3.1908882 3.0636458 3.1863491 2.3107286
## Pabst_Extra_Light 2.2392841 3.2405907 2.0743545 3.0000793
## Hamms 1.9968978 0.2504784 1.4075077 1.3275409
## Heilemans_Old_Style 2.8666790 0.9640584 2.0921695 2.1535201
## Olympia_Goled_Light 1.6240658 3.2905014 2.0169536 2.5316147
## Schlitz_Light 0.8642703 2.2333413 1.2321060 1.4095448
## Becks Kirin Pabst_Extra_Light Hamms
## Schlitz
## Lowenbrau
## Kronenbourg
## Heineken
## Old_Milwaukee
## Augsberger
## Srohs_Bohemian_Style
## Miller_Lite
## Budweiser_Light
## Coors
## Coors_Light
## Michelob_Light
## Becks
## Kirin 2.0054519
## Pabst_Extra_Light 4.4101287 4.8160486
## Hamms 2.3232863 3.2390072 3.1162778
## Heilemans_Old_Style 2.4165933 3.7003060 3.7415005 0.9031475
## Olympia_Goled_Light 4.1907285 3.9218100 1.5800965 3.2772677
## Schlitz_Light 3.2567746 2.8969220 2.4146863 2.3147609
## Heilemans_Old_Style Olympia_Goled_Light
## Schlitz
## Lowenbrau
## Kronenbourg
## Heineken
## Old_Milwaukee
## Augsberger
## Srohs_Bohemian_Style
## Miller_Lite
## Budweiser_Light
## Coors
## Coors_Light
## Michelob_Light
## Becks
## Kirin
## Pabst_Extra_Light
## Hamms
## Heilemans_Old_Style
## Olympia_Goled_Light 4.0688402
## Schlitz_Light 3.0937449 1.4619234
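The matrix above can be reproduced, at least approximately, with a short standard-library sketch. The normalization code is not shown in this post, so the choice below is an assumption: each feature is z-scored using the sample standard deviation, which appears to match the printed values. The variable names are illustrative only.

```python
from math import dist
from statistics import mean, stdev  # stdev is the sample (n - 1) standard deviation

# (calories, sodium, alcohol, cost) copied from the beer table
beers = {
    "Budweiser": (144, 15, 4.7, 0.43),
    "Schlitz": (151, 19, 4.9, 0.43),
    "Lowenbrau": (157, 15, 0.9, 0.48),
    "Kronenbourg": (170, 7, 5.2, 0.73),
    "Heineken": (152, 11, 5.0, 0.77),
    "Old_Milwaukee": (145, 23, 4.6, 0.28),
    "Augsberger": (175, 24, 5.5, 0.40),
    "Srohs_Bohemian_Style": (149, 27, 4.7, 0.42),
    "Miller_Lite": (99, 10, 4.3, 0.43),
    "Budweiser_Light": (113, 8, 3.7, 0.40),
    "Coors": (140, 18, 4.6, 0.44),
    "Coors_Light": (102, 15, 4.1, 0.46),
    "Michelob_Light": (135, 11, 4.2, 0.50),
    "Becks": (150, 19, 4.7, 0.76),
    "Kirin": (149, 6, 5.0, 0.79),
    "Pabst_Extra_Light": (68, 15, 2.3, 0.38),
    "Hamms": (139, 19, 4.4, 0.43),
    "Heilemans_Old_Style": (144, 24, 4.9, 0.43),
    "Olympia_Goled_Light": (72, 6, 2.9, 0.46),
    "Schlitz_Light": (97, 7, 4.2, 0.47),
}

names = list(beers)
columns = list(zip(*beers.values()))  # one tuple per feature
centers = [mean(col) for col in columns]
spreads = [stdev(col) for col in columns]

# Assumed normalization: z-score each feature (value minus mean, over std dev)
scaled = {
    name: tuple((x - m) / s for x, m, s in zip(row, centers, spreads))
    for name, row in beers.items()
}

# Euclidean distance for every unordered pair of beers
distances = {
    (a, b): dist(scaled[a], scaled[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

closest = min(distances, key=distances.get)
print(closest, round(distances[closest], 4))
print(round(distances[("Budweiser", "Schlitz")], 4))
```

Under that assumption the Budweiser-Schlitz distance comes out at about 0.6757, in line with the first entry of the matrix.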
The minimum distance, 0.2505, is between rows 11 and 17, which are Coors and Hamms. These two beers are combined into one cluster, and the centroid of that cluster is treated as a single point in the next step. The next two closest points or clusters are then combined into a bigger cluster, and this continues until all the points have been merged into one big cluster.
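To make the "centroid as a point" step concrete: for Euclidean distance, the distance from a third beer to the centroid (midpoint) of a merged pair follows from the three pairwise distances alone, via the median-length identity d(k, c)^2 = (d(k, i)^2 + d(k, j)^2) / 2 - d(i, j)^2 / 4. The sketch below applies it to values read from the matrix; the helper name `distance_to_midpoint` is hypothetical.

```python
from math import sqrt


def distance_to_midpoint(d_ki, d_kj, d_ij):
    """Distance from point k to the midpoint (centroid) of points i and j,
    given the three pairwise Euclidean distances."""
    return sqrt((d_ki ** 2 + d_kj ** 2) / 2 - d_ij ** 2 / 4)


coors_hamms = 0.2504784  # Coors-Hamms distance, read from the matrix

# Two candidates for joining the new Coors-Hamms cluster.
# Arguments: distance to Coors, distance to Hamms, Coors-Hamms distance.
schlitz = distance_to_midpoint(0.4856244, 0.6068278, coors_hamms)
budweiser = distance_to_midpoint(0.4883132, 0.6875341, coors_hamms)

print(round(schlitz, 3), round(budweiser, 3))
```

Schlitz ends up roughly 0.535 from the Coors-Hamms centroid, slightly more than the Srohs_Bohemian_Style to Heilemans_Old_Style distance (0.5231) and slightly less than the Miller_Lite to Schlitz_Light distance (0.5448). The identity as written only covers the merge of two single points; larger clusters need size-weighted versions.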
Dendrogram¶
A dendrogram is a tree diagram that shows the order in which cases are merged as the Euclidean distance threshold is increased. The distance is rescaled to a scale between 0 and 4. By drawing a vertical line at different values of the rescaled distance, one can identify the clusters. The dendrogram for the beer data set is shown below.
From the above plot, we can observe that Coors and Hamms were the closest and thus were clustered first.
Then Srohs_Bohemian_Style and Heilemans_Old_Style were merged into one cluster.
Subsequently, the centroid of the Coors-Hamms cluster was close to Schlitz, so all three beers were clustered together.
This continues until all the beers are finally merged into one cluster.
Using the above dendrogram, I want to segment customers for an effective marketing strategy. How many clusters are ideal?
If I cut the dendrogram at a distance of 2.5, we get 4 clusters, but if I take a smaller cut-off of 1.5, the number of clusters increases to 12, which is likely too many segments to target separately. So 4 (or 5) clusters seem to be an appropriate number.
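The link between a cut-off and the number of clusters can be sketched as a simple count: every merge at or below the cut-off removes one cluster. The merge heights below are hypothetical (they are not read from the beer dendrogram), the function name `clusters_at` is made up for the example, and the count assumes merge heights never decrease from one merge to the next, which not every linkage rule guarantees.

```python
def clusters_at(cutoff, merge_heights, n_points):
    """Number of clusters left after applying every merge at or below `cutoff`.

    Assumes merge heights are non-decreasing from one merge to the next.
    """
    merges_done = sum(1 for h in merge_heights if h <= cutoff)
    return n_points - merges_done


# Hypothetical merge heights for a 6-point data set (5 merges in total)
heights = [0.3, 0.5, 1.1, 2.0, 3.4]

for cutoff in (0.4, 1.5, 2.5, 4.0):
    print(cutoff, clusters_at(cutoff, heights, n_points=6))
```

Lowering the cut-off can only keep or increase the cluster count, which is the same pattern as going from 4 clusters at 2.5 to 12 clusters at 1.5 in the beer dendrogram.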
Let us look at three of these clusters.
Cluster 1¶
Cluster 1 contains Becks, Kronenbourg, Heineken and Kirin. These are brands imported into the US. They have high alcohol content, low sodium content and high cost. The target customers are brand sensitive, and these beers are promoted as premium brands.
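The "high alcohol, low sodium, high cost" description can be checked against the table by comparing the cluster's feature means with the means over all 20 beers. This is a minimal sketch: the overall means are hard-coded after being computed from the beer table, and the variable names are illustrative.

```python
from statistics import mean

features = ("calories", "sodium", "alcohol", "cost")

# Cluster 1 rows copied from the beer table
cluster_1 = {
    "Kronenbourg": (170, 7, 5.2, 0.73),
    "Heineken": (152, 11, 5.0, 0.77),
    "Becks": (150, 19, 4.7, 0.76),
    "Kirin": (149, 6, 5.0, 0.79),
}

# Column means over all 20 beers, computed from the same table
overall = {"calories": 132.55, "sodium": 14.95, "alcohol": 4.24, "cost": 0.4945}

# Mean of each feature within the cluster
profile = {
    name: mean(column)
    for name, column in zip(features, zip(*cluster_1.values()))
}

for name in features:
    direction = "above" if profile[name] > overall[name] else "not above"
    print(f"{name}: {profile[name]:.3f} ({direction} the overall mean {overall[name]})")
```

The same comparison could be repeated for the other clusters once their full membership is read off the dendrogram.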
Cluster 2¶
Cluster 2 contains Budweiser, Schlitz, Coors, Hamms, Augsberger and other beers. They have medium alcohol content and medium cost. This is the largest customer segment.
Cluster 3¶
Cluster 3 contains light beers such as Coors_Light, Budweiser_Light and Miller_Lite. These beers have low calorie, low sodium and low alcohol content. The target customers are those who want to drink but are also health conscious.
References¶
- Business Analytics: The Science of Data-Driven Decision Making - U Dinesh Kumar (textbook for reference)
- Machine Learning Using Python - Manaranjan Pradhan and U Dinesh Kumar (textbook for reference)
- Exploratory Data Analysis with R - Roger D. Peng (online)
- UC Business Analytics R Guide - University of Cincinnati (online)