Fundamentals of Unsupervised Learning with Python

Unsupervised Learning

Unsupervised learning is a branch of machine learning, which in turn is closely tied to AI. In plain terms, an unsupervised algorithm learns from unsorted, unlabeled data without supervision: nobody tells it the right answers, so it has to find patterns and structure in the data on its own. It covers a variety of techniques, from clustering to factorization and density estimation. Here we will look at its fundamentals, the linkage matrix used in hierarchical clustering, and the implementation of a few algorithms.

To understand unsupervised learning, let's first look at its building blocks: clustering, prediction, and labels.

Clustering: This is a basic technique for structuring and grouping data. In plain words, clustering collects data points into groups based on their similarities and dissimilarities. The data is divided into subsets, known as clusters, and these clusters are then used for further data processing. Below is a simple illustration of clusters.


So, how exactly are predictions made in an unsupervised process?

  • First, an input is given to the machine.
  • The input is then checked against the existing clusters.
  • If the input matches a cluster, a prediction is made; otherwise the prediction fails.
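The steps above can be sketched with scikit-learn's KMeans, assuming the clusters have already been learned from some data. The toy points and the new input are made up for illustration and are not from the original example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (an assumption for illustration): two well-separated groups.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Learn two clusters from the unlabeled points.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# A new input is assigned to the cluster whose centroid is nearest.
new_input = np.array([[7.5, 8.4]])
predicted = model.predict(new_input)[0]

# Check whether the input landed in the same cluster as the points near (8, 8).
same_cluster_as_far_group = predicted == model.labels_[3]
print(predicted, same_cluster_as_far_group)
```

Note that KMeans.predict always returns the nearest cluster, so treating an input as matching no cluster would need an extra rule, such as a distance threshold.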

K-Means and Mean Shift are the two main algorithms used to form clusters. Let's take a close look at both.

K-means is one of the best-known algorithms for clustering data. It requires us to assume the number of clusters up front, and since it produces a single flat partition rather than a hierarchy, it is also called flat clustering. The steps below outline how flat clustering works.

  • Set the desired number of K subgroups.
  • With the number of clusters fixed, assign each data point to a cluster.

This is an iterative algorithm, so the centroid locations are updated until they settle at their optimal positions. To understand this better, we have an example below.
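The iteration described above can be sketched in plain NumPy. This is a minimal illustration, not the scikit-learn implementation; the function name kmeans_sketch and the toy data are hypothetical.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Plain K-means loop: assign points, then move centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Toy data (assumed for illustration): two tight groups.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])
centroids, labels = kmeans_sketch(X, k=2)
print(centroids)
print(labels)
```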

The code below can be used to generate a two-dimensional dataset.
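A minimal sketch of such a dataset, assuming scikit-learn's make_blobs with four centers. The sample count, spread, and seed are illustrative choices rather than values taken from this article.

```python
from sklearn.datasets import make_blobs

# Assumed settings: 500 points, 4 blobs, small spread, fixed seed.
X, y_true = make_blobs(n_samples=500, centers=4,
                       cluster_std=0.40, random_state=0)

print(X.shape)  # (500, 2)
```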

Use the code below to visualize the dataset.
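One way to plot the points is a matplotlib scatter plot. The sketch below regenerates an assumed four-blob dataset so that it runs on its own; the axis labels and the output file name are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to use a GUI window
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# Assumed dataset: 500 points in 4 blobs.
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)

fig, ax = plt.subplots()
points = ax.scatter(X[:, 0], X[:, 1], s=50)  # one marker per data point
ax.set_xlabel("feature 1")
ax.set_ylabel("feature 2")
fig.savefig("blobs.png")  # hypothetical output file name
```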

Next, initialize kmeans as a KMeans object. The key parameter to set is the number of clusters (n_clusters).

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters = 4)
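Once the estimator exists, it still has to be fitted to data. A minimal sketch of that step, assuming a four-blob dataset from make_blobs; the n_init and random_state values are added here only to make the run repeatable.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumed dataset: 500 points in 4 blobs.
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
kmeans.fit(X)                      # learn the four centroids
y_kmeans = kmeans.predict(X)       # cluster index for every point
centers = kmeans.cluster_centers_  # array of shape (4, 2)

print(centers.shape)
```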

Mean Shift Algorithm:

Mean shift is another powerful clustering algorithm. Unlike K-means, it makes no assumption about the number of clusters, because it is non-parametric.

Below are the basic steps of the mean shift algorithm.

  • First, each data point is assigned to a cluster of its own.
  • The algorithm then computes the centroids and moves them to new locations; with each update they shift toward regions of higher density.
  • Once a density peak is reached, the centroid stops moving.
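The movement of a single centroid toward a density peak can be sketched in NumPy. This is a simplified illustration with a flat (uniform) window, not the scikit-learn implementation; the function name shift_to_peak, the bandwidth, and the toy data are assumptions.

```python
import numpy as np

def shift_to_peak(start, X, bandwidth, n_iter=100, tol=1e-5):
    """Move one centroid uphill in density using a flat (uniform) kernel."""
    centroid = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        # Points that fall inside the window around the current centroid.
        in_window = np.linalg.norm(X - centroid, axis=1) <= bandwidth
        # The new location is the mean of those points.
        new_centroid = X[in_window].mean(axis=0)
        # Stop once the centroid no longer moves.
        if np.linalg.norm(new_centroid - centroid) < tol:
            break
        centroid = new_centroid
    return centroid

# Toy data (assumed): one Gaussian blob centered at (2, 2).
rng = np.random.default_rng(0)
X = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(200, 2))

# Start from a data point and follow it to the density peak.
peak = shift_to_peak(X[0], X, bandwidth=1.0)
print(peak)
```

In the full algorithm every data point is shifted this way, and points that end up at the same peak form one cluster.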

Now for the coding part! The following code shows how to implement the mean shift clustering algorithm in Python.

make_blobs from sklearn.datasets helps generate a two-dimensional dataset with one blob per center.

from sklearn.datasets import make_blobs

Now generate the data to be clustered with the following code:

centers = [[2,2],[4,5],[3,10]]
X, _ = make_blobs(n_samples = 500, centers = centers, cluster_std = 1)
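With the data in place, the clustering step can be sketched with scikit-learn's MeanShift. The quantile passed to estimate_bandwidth and the random_state values are assumptions added for repeatability, and the number of clusters found depends on the bandwidth.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

centers = [[2, 2], [4, 5], [3, 10]]
X, _ = make_blobs(n_samples=500, centers=centers, cluster_std=1, random_state=0)

# The window size is estimated from the data; quantile is an assumed setting.
bandwidth = estimate_bandwidth(X, quantile=0.2, random_state=0)
ms = MeanShift(bandwidth=bandwidth)
ms.fit(X)

labels = ms.labels_                    # cluster index for every point
cluster_centers = ms.cluster_centers_  # one row per density peak found
n_clusters = len(set(labels))
print("Estimated clusters:", n_clusters)
```

Unlike KMeans, no n_clusters argument is passed: the number of clusters follows from the density peaks the algorithm finds.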

Hierarchical Clustering

Hierarchical clustering is another way of clustering, one that builds a hierarchy of clusters. It starts with each data point in its own cluster and repeatedly merges the two closest clusters. The algorithm ends when only a single cluster is left. There are two main types of hierarchical clustering: agglomerative and divisive.

Agglomerative: also known as the bottom-up approach. Each observation starts in its own cluster, and clusters are merged step by step according to a linkage criterion. This approach is particularly useful when the clusters of interest are made of only a few observations. When the number of clusters is large, it can be much more computationally efficient than K-means.

Divisive: also known as the top-down approach. All observations start in one cluster, which is split again and again as one moves down the hierarchy. This process is generally slower than agglomerative clustering and K-means.
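The bottom-up variant can be tried with scikit-learn's AgglomerativeClustering. The sketch below uses made-up toy points and assumes Ward linkage with two clusters.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy data (assumed for illustration): two tight groups of three points.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]])

# Merge clusters bottom-up until only two remain.
agg = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = agg.fit_predict(X)
print(labels)
```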

Now let's look at an example of hierarchical clustering with very simple data:

In [4]

from scipy.cluster.hierarchy import linkage
# generate the linkage matrix
Z = linkage(X, 'ward')

In [5]

from scipy.cluster.hierarchy import cophenet
from scipy.spatial.distance import pdist

c, coph_dists = cophenet(Z, pdist(X))
c  # cophenetic correlation coefficient

Out: 0.98001483875742679
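The value returned by cophenet is the cophenetic correlation coefficient: the closer it is to 1, the better the hierarchy preserves the original pairwise distances. The cells above can be put together into a self-contained sketch. The toy data here is an assumption, so the coefficient it prints will differ from the output shown above; fcluster is added to show one way of cutting the tree into flat clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Assumed toy data: two Gaussian groups of 20 points each.
X = np.vstack([rng.normal([0, 0], 0.5, size=(20, 2)),
               rng.normal([6, 6], 0.5, size=(20, 2))])

Z = linkage(X, "ward")                 # (n - 1) x 4 linkage matrix
c, coph_dists = cophenet(Z, pdist(X))  # c is the cophenetic correlation
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters

print(Z.shape, round(c, 3), sorted(set(labels)))
```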

With these simple examples, one can see the roles that data, clusters, predictions, and inputs play in unsupervised learning. All of these examples can be run on any platform with Python installed.

