Unsupervised Learning in Machine Learning Algorithms

Giribone - September

Edited by Pier Giuseppe Giribone

In the previous article in this series, we discussed the computational paradigms associated with supervised learning in machine learning systems. This article continues that discussion of how algorithms are classified by learning type, focusing in particular on unsupervised learning.

Unlike a supervised learning problem, the training dataset of an unsupervised learning algorithm contains only the inputs, without specifying the correct outputs.

These methods are generally used to explore the characteristics of the data and to preprocess it.

Unsupervised learning can be compared to a student who, not knowing the correct answers, simply organizes problems by their structure and intrinsic attributes without working out how to solve them.

In unsupervised learning, the training dataset therefore has no labels: the system tries to learn without a teacher.

For example, suppose we have a large amount of data about blog visitors. We could use a clustering algorithm to identify groups of similar visitors and then assign each new visitor to one of those groups. The algorithm finds these connections within the data on its own, without any supervision.

The algorithm might find, for instance, that 60% of visitors to a classic car blog are people between the ages of 40 and 50 who tend to read new posts in the late afternoon, while the motorcycle racing column is followed mainly by a younger audience, between the ages of 15 and 25, who visit the site in the early afternoon, likely after school. This information is valuable for the webmaster, who can choose the most appropriate times to publish posts or the type of advertising to target at different visitor segments.

An example in the banking sector could be customer segmentation: the analysis can be based on age, wealth, investment frequency, and other variables deemed significant to identify which investment products are most attractive to different clusters. Based on these preferences, new offerings can be proposed to the same target audience, or existing products can be offered to customers with similar characteristics and habits.
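The segmentation idea above can be sketched with k-means, one common clustering algorithm (the article does not name a specific one, so this choice is an assumption). The customer data, the two features, and the initial centroids below are all hypothetical. In a real analysis, features measured in different units would normally be rescaled first.

```python
from math import dist  # available from Python 3.8


def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))


def k_means(points, initial_centroids, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical customers: (age in years, number of investments per year).
customers = [(22, 2), (25, 3), (27, 1), (58, 12), (61, 15), (64, 11)]

# Assumption: the first and last customers serve as initial centroids.
labels, centroids = k_means(customers, [customers[0], customers[-1]])
print("cluster of each customer:", labels)

# A new customer is attributed to the existing group with the nearest centroid.
new_customer = (30, 2)
print("new customer joins cluster", nearest(new_customer, centroids))
```

No labels are supplied at any point: the two groups emerge only from the distances between customers.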

Clustering is often combined with visualization algorithms. The results are displayed in two or three dimensions, which gives a quick and intuitive understanding of how the data is organized and makes anomalous patterns easier to spot.

Another task that can be handled with unsupervised algorithms is dimensionality reduction, whose goal is to simplify the data without losing too much relevant information. One approach is to combine multiple correlated features into a single one. For example, in the supervised task of car price regression, a car's mileage is typically strongly correlated with its age, so the two can be merged into a single feature that represents the vehicle's wear.

In banking, when valuing a property pledged as collateral for a loan, the property's zip code is strongly correlated with its geographic location, so the two carry largely the same information and the number of variables influencing the price can reasonably be reduced. This process is known as feature extraction and is part of the preprocessing applied to the data before it is passed to a predictive model.

It is often considered good practice to reduce the dimensionality of the training data with such an algorithm before applying a supervised machine learning algorithm. The model then trains faster, the data takes up less memory, and in some cases the results improve.
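As an illustration of merging two correlated features into one, the sketch below applies principal component analysis (PCA) to the car example. PCA is an assumption here, since the article does not name a technique, and the ages and mileages are hypothetical. With only two features, the direction of largest variance has a closed-form angle, so no linear algebra library is needed.

```python
from math import atan2, cos, sin
from statistics import mean, pstdev, pvariance


def standardize(values):
    """Rescale a feature to zero mean and unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def merge_two_features(xs, ys):
    """Project two standardized features onto their first principal component."""
    zx, zy = standardize(xs), standardize(ys)
    n = len(zx)
    var_x = sum(a * a for a in zx) / n
    var_y = sum(b * b for b in zy) / n
    cov_xy = sum(a * b for a, b in zip(zx, zy)) / n
    # Angle of the leading eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * atan2(2 * cov_xy, var_x - var_y)
    direction = (cos(theta), sin(theta))
    # One score per car replaces the original pair of values.
    scores = [a * direction[0] + b * direction[1] for a, b in zip(zx, zy)]
    # Share of the total variance retained by the single merged feature.
    explained = pvariance(scores) / (var_x + var_y)
    return scores, explained


# Hypothetical cars: age in years and mileage in thousands of kilometres.
ages = [1, 3, 5, 8, 10]
mileages = [15, 40, 70, 110, 150]

wear, explained = merge_two_features(ages, mileages)
print("merged 'wear' feature:", [round(w, 2) for w in wear])
print("share of variance kept:", round(explained, 3))
```

When the retained share of variance is close to 1, as it is for strongly correlated inputs, the single merged feature loses little information compared with the original pair.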

Another problem in this category that an unsupervised machine learning algorithm can solve is anomaly detection, which is useful for discovering anomalous transactions to prevent fraud, detecting defects in industrial products, or, more generally, removing anomalous values (outliers) from a data set.

The basic idea is that if the system is presented with mostly normal instances during the unsupervised learning phase, it learns to recognize them; consequently, when a new instance is presented, the system can assess whether it resembles a normal one or whether it exhibits abnormal characteristics.

A related task is novelty detection, the goal of which is to detect new instances that appear different from all those used in the training set. This requires a very "clean" training set, devoid of any instances the algorithm is intended to detect.
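The idea of learning what normal instances look like and then scoring a new one can be sketched with a very simple detector based on the distance from the mean, measured in standard deviations (a z-score). This is only one of many possible approaches and is not taken from the article. The class name, the transaction amounts, and the threshold of 3 are illustrative assumptions.

```python
from statistics import mean, stdev


class ZScoreDetector:
    """Flags values that lie far from the mean of the training data."""

    def fit(self, values):
        # "Learning" here just means summarizing the normal instances.
        self.mean_ = mean(values)
        self.std_ = stdev(values)
        return self

    def score(self, value):
        """Distance from the training mean, in standard deviations."""
        return abs(value - self.mean_) / self.std_

    def is_anomaly(self, value, threshold=3.0):
        # Assumption: anything beyond `threshold` deviations counts as abnormal.
        return self.score(value) > threshold


# Hypothetical transaction amounts, assumed to contain only normal activity.
normal_amounts = [42.0, 38.5, 51.0, 47.2, 40.1, 55.3, 44.8, 49.9]
detector = ZScoreDetector().fit(normal_amounts)

for amount in (48.0, 900.0):
    verdict = "anomalous" if detector.is_anomaly(amount) else "normal"
    print(amount, "looks", verdict)
```

The sketch also shows why novelty detection needs a clean training set: a few extreme amounts in `normal_amounts` would inflate the standard deviation and make similar extremes look normal.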

Finally, another common unsupervised task is association rule learning, in which the goal is to explore large amounts of data to find interesting relationships between attributes. For example, analyzing sales logs from a home improvement store might reveal that customers interested in purchasing gardening tools (such as shears) also frequently purchase protective gear (such as gloves). The machine learning system could then suggest placing these items close together, as they are often purchased together.

Similarly, in a banking context, a client who regularly invests in structured bonds might also be interested in investment certificates, which combine a bond component with an options strategy. A financial advisor could point out this connection to the client after receiving an automatic alert from an unsupervised algorithm based on association rules.
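A minimal sketch of association rule learning, restricted to pairs of items, is shown below. It uses the two standard measures, support (how often the pair appears) and confidence (how often the second item appears given the first). The baskets and both thresholds are hypothetical, and algorithms used in practice handle larger itemsets far more efficiently.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) for frequent item pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: among baskets with lhs, the fraction that also has rhs.
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical sales log from a home improvement store.
baskets = [
    ["shears", "gloves"],
    ["shears", "gloves", "hose"],
    ["shears", "gloves"],
    ["paint", "brush"],
    ["gloves", "paint"],
    ["shears", "hose"],
]

rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

The same function could be applied to client portfolios, with each basket listing the product types a client holds.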

The next article will cover the last two major categories of learning-based algorithms: semi-supervised learning and reinforcement learning.
