
Market Expansion Using Cluster Analysis

  • Writer: Yash Raj
  • Jun 2, 2020

Updated: Jun 3, 2020

Have you ever wondered how companies like Dunkin and Starbucks use data analytics to select markets for expansion or to place their stores in strategic locations?


With data now abundant, data analytics techniques are proving decisive in identifying potential markets for future expansion. This blog covers one such method for picking the best location from a set of available locations.


Problem Statement

The problem is to select the best market for expansion, given each market's features, e.g. competitors' sales, the average experience of dealers, and average operating margins.


Clustering to filter out markets with growth potential


Cluster Analysis

Cluster analysis is an unsupervised machine learning technique that groups objects based on the similarities or dissimilarities of their features.


Different clustering techniques and distance measures are used depending on the objective and the data type. For example, hierarchical clustering is often used for dichotomous (binary) data, two-step clustering when the data has both categorical and numerical variables, and K-means clustering when the data is metric.


Methodology

The idea is to classify locations, based on their features, into preferably two classes: "Good markets" and "Bad markets". Good markets are locations with higher earning potential or favorable features; Bad markets are the opposite.


The objects in the data don't have labels attached to them, so supervised learning algorithms can't be employed to classify the locations. Cluster analysis can be used in such cases to split the objects into two groups.


The objective of clustering is formulated using two dummy locations, the "Ideal location" and the "Worst location". The ideal location is a dummy location that has the best possible value for every feature, while the worst location is one that has the worst possible value for every feature.
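
One way to build the two dummy locations is sketched below. It is only an illustration: the market names, the feature values, and the choice to take the best and worst observed value per feature are all assumptions, as is the direction of each feature (here lower competitor sales is treated as favorable).

```python
# Hypothetical feature table. Assumed columns:
# competitor sales, average dealer experience (years), average operating margin (%)
markets = {
    "Market A": [120.0, 8.0, 14.0],
    "Market B": [300.0, 3.0, 6.0],
    "Market C": [150.0, 7.0, 12.0],
    "Market D": [280.0, 2.0, 5.0],
}

# Assumed direction of each feature: True means "higher is better".
higher_is_better = [False, True, True]


def dummy_locations(rows, higher_is_better):
    """Return (ideal, worst) rows built from the best/worst observed value per feature."""
    columns = list(zip(*rows))  # transpose rows into per-feature columns
    ideal = [max(c) if hib else min(c) for c, hib in zip(columns, higher_is_better)]
    worst = [min(c) if hib else max(c) for c, hib in zip(columns, higher_is_better)]
    return ideal, worst


ideal, worst = dummy_locations(list(markets.values()), higher_is_better)
print("Ideal location:", ideal)
print("Worst location:", worst)
```

Both dummy rows are then appended to the data set before clustering, so that each ends up with a cluster label like any real market.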


The clustering technique needs to be chosen based on the nature of the data. I chose K-means since the available data was numeric. Once the cluster membership vector is generated, objects that fall in the same cluster as the "Worst location" are rejected, and only the remaining objects are considered for the ranking of markets.
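
The filtering step can be sketched in plain Python as follows. This is a minimal illustration, not the code behind the original analysis: the data is made up, the min-max scaling is an added assumption (Euclidean distance is sensitive to feature scale), and seeding the two centroids at the dummy locations is a convenience that keeps the run deterministic.

```python
import math


def min_max_scale(rows):
    """Rescale each column to [0, 1] so no single feature dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def k_means(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    centroids = [list(c) for c in centroids]  # work on a copy
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties out
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical data: the first two rows are the dummy locations.
names = ["Ideal", "Worst", "Market A", "Market B", "Market C", "Market D"]
rows = [
    [100.0, 10.0, 15.0],  # Ideal location (assumed values)
    [300.0, 1.0, 4.0],    # Worst location (assumed values)
    [120.0, 8.0, 14.0],
    [290.0, 3.0, 6.0],
    [150.0, 7.0, 12.0],
    [280.0, 2.0, 5.0],
]

scaled = min_max_scale(rows)
labels = k_means(scaled, centroids=[scaled[0], scaled[1]])

# Reject every market that shares a cluster with the Worst location.
worst_cluster = labels[names.index("Worst")]
kept = [
    name
    for name, lab in zip(names, labels)
    if lab != worst_cluster and name not in ("Ideal", "Worst")
]
print(kept)
```

With real data, a library implementation of K-means with several random restarts would be a more robust choice than this hand-rolled loop.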


The ranking is based on how similar the remaining markets are to the "Ideal location": the higher the similarity, the better the rank. Similarity was measured with Euclidean distance, the measure K-means uses, so a smaller distance to the ideal location means a better rank (the distance measure should be chosen based on the data type).
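
A small sketch of the ranking step, assuming the shortlisted markets and the ideal location have already been scaled to a common range. The names and numbers are hypothetical.

```python
import math

# Hypothetical, already-scaled feature vectors (each feature in [0, 1]).
ideal = [0.0, 1.0, 1.0]
shortlisted = {
    "Market A": [0.10, 0.78, 0.91],
    "Market C": [0.25, 0.67, 0.73],
    "Market E": [0.05, 0.90, 0.60],
}


def rank_by_distance(candidates, reference):
    """Sort candidates by Euclidean distance to the reference, closest first."""
    distances = {name: math.dist(features, reference) for name, features in candidates.items()}
    return sorted(distances.items(), key=lambda item: item[1])


ranking = rank_by_distance(shortlisted, ideal)
for rank, (name, distance) in enumerate(ranking, start=1):
    print(rank, name, round(distance, 3))
```

The market closest to the ideal location gets rank 1.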


Note


  • There are two types of clustering: hard clustering and soft clustering. I have used hard clustering, where each object belongs to exactly one cluster, whereas in soft (fuzzy) clustering an object can be a partial member of multiple clusters.


  • The choice of clustering technique depends on the objective of the clustering as well as the data set available.


  • Different distance measures have different properties and need to be chosen to suit the data, e.g. Euclidean distance for numeric variables, Jaccard distance for dichotomous variables, etc.
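
To make the last note concrete, here is a minimal sketch of the two distance measures it mentions. The function names are illustrative, and the Jaccard version assumes the dichotomous variables are coded as 0/1.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_distance(a, b):
    """1 - (positions where both are 1) / (positions where at least one is 1)."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return 0.0 if either == 0 else 1.0 - both / either


print(euclidean_distance([0, 0], [3, 4]))            # numeric features
print(jaccard_distance([1, 1, 0, 0], [1, 0, 1, 0]))  # dichotomous features
```

Jaccard distance ignores positions where both objects are 0, which is why it suits presence/absence features better than Euclidean distance does.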


Give its implementation a try and let us know your results along with the challenges faced.



©2020 by rampkart
