Predictive Maintenance with Machine Learning: Process, Challenges, and Insights

Introduction

For the first project in my project-based re-introduction to machine learning, I chose the Machine Predictive Maintenance Classification dataset on Kaggle. Classification remains one of the key techniques in machine learning, and I wanted a dataset on which I could perform the two main types of classification tasks: binary and multi-class classification.

By the end of this project, I want to have gone through all the steps of a basic machine learning pipeline, with a Streamlit page for easy presentation and an article describing these steps.

Predictive maintenance is an engineering technique that uses data analysis to predict machine failure ahead of time. This ensures smooth operation, minimal downtime, and cost savings.

Quick Dataset Insights

Dataset Overview

A quick look at the first five rows brought forward the following observations:

  • There are two columns used purely for identification, UDI and Product ID. They carry no predictive information and will be removed (see the sketch after this list).

  • The dataset has separate target columns for the binary and multi-class classification tasks: Target and failure_type respectively.
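
Here is a minimal sketch of that initial setup, assuming the data has been downloaded from Kaggle as a CSV. The file name and the raw column names may differ between dataset versions; the renaming simply maps them to the snake_case names used throughout this article.

```python
import pandas as pd

# Load the data (file name is an assumption) and drop the identifier columns
df = pd.read_csv("predictive_maintenance.csv")
df = df.drop(columns=["UDI", "Product ID"])

# Rename the remaining columns to the snake_case names used in this article
# (the raw names shown here may differ slightly between dataset versions)
df = df.rename(columns={
    "Type": "type",
    "Air temperature [K]": "air_temperature",
    "Process temperature [K]": "process_temperature",
    "Rotational speed [rpm]": "rotational_speed",
    "Torque [Nm]": "torque",
    "Tool wear [min]": "tool_wear",
    "Failure Type": "failure_type",
})
```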

Dataset Cleaning

Null Values

There are 10,000 data points in the dataset with no missing or null values.

Dataset Anomalies

From prior knowledge of this dataset, there are known anomalies:

  • Some rows are classified as a failure in the Target column but as “No Failure” in the failure_type column.

  • Some rows are classified as “Random Failures” in the failure_type column but as no failure in the Target column.

Investigating this reveals 27 data points falling under these anomalies. Removing them will not cause a major imbalance, as they make up just 0.27% of the data. Eliminating them leaves the dataset with 9,973 data points.
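
A sketch of how these anomalies could be filtered out with pandas, assuming the Target column uses 1 for failure and 0 for no failure:

```python
# Flag the two anomaly patterns described above
mask_a = (df["Target"] == 1) & (df["failure_type"] == "No Failure")
mask_b = (df["Target"] == 0) & (df["failure_type"] == "Random Failures")

print("Anomalous rows:", (mask_a | mask_b).sum())   # expected: 27

# Drop them, leaving 9,973 data points
df = df[~(mask_a | mask_b)].reset_index(drop=True)
```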

Detecting Outliers

From the summary statistics, we can see there could be outliers in rotational_speed, torque, and tool_wear, because the gap between the 75th percentile and the maximum is much larger than the standard deviation. These outliers will be investigated and dealt with if need be.
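
One way to inspect this, assuming the snake_case column names used above, is to look at the summary statistics and count how many points fall outside the usual 1.5 × IQR fences:

```python
suspect_cols = ["rotational_speed", "torque", "tool_wear"]
print(df[suspect_cols].describe())

# Count points outside the 1.5 * IQR fences for each suspect column
for col in suspect_cols:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
    print(f"{col}: {mask.sum()} potential outliers")
```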

EDA and Data Preprocessing

Addressing Class Imbalances

Further analysis of the dataset reveals class imbalances which could affect the models if not addressed:

  • In the failure_type column, 97% of the data is “No Failure” while just 3% represents failures. This imbalance will not be addressed because it is representative of real-life results.

  • Within the failure classes themselves, the imbalance is smaller, but it will be corrected. My reasoning is that for multi-class classification, where the different failure types must be distinguished, this imbalance would hurt the minority classes.

    SMOTE (Synthetic Minority Over-sampling Technique) is used to address the imbalance here. This technique generates synthetic samples from existing minority-class cases, boosting the model’s exposure to those classes without simply duplicating data points. This helps reduce overfitting and improves the model’s ability to learn minority-class patterns (a minimal sketch follows this list).
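
A minimal sketch of applying SMOTE with imbalanced-learn. Here X_train and y_train stand for the multi-class training split created later in the pipeline, and the resampling is applied to the training data only so synthetic samples never leak into the validation or test sets:

```python
import pandas as pd
from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=42)
X_train_res, y_train_res = smote.fit_resample(X_train, y_train)

# The failure classes are now balanced in the resampled training data
print(pd.Series(y_train_res).value_counts())
```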

Feature Scaling

Feature scaling is the process of adjusting the values in a dataset so that features lie on comparable ranges. This improves model training and ensures no feature dominates the others simply because of its scale.

A cheatsheet I found for scaling data:

  • Use MinMaxScaler as your default

  • Use RobustScaler if you have outliers and can handle a larger range

  • Use StandardScaler if you need standardized features (zero mean, unit variance)

  • Use Normalizer sparingly - it normalizes rows, not columns

I made use of the StandardScaler for my dataset.
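
A short sketch of the scaling step; the column list follows this article’s naming. In practice the scaler should be fit on the training split only and then applied to the validation and test splits to avoid leakage.

```python
from sklearn.preprocessing import StandardScaler

numeric_cols = ["air_temperature", "process_temperature",
                "rotational_speed", "torque", "tool_wear"]

# Standardize the numeric features to zero mean and unit variance
scaler = StandardScaler()
X_scaled = df.copy()
X_scaled[numeric_cols] = scaler.fit_transform(df[numeric_cols])
```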

Encoding

Data encoding is the process of converting data into a format that machine learning algorithms can process. The dataset contains categorical (text) columns, so we encode them so the algorithms can be trained on them.
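
One possible way to do this, assuming the machine-type feature is the only categorical input column and failure_type is the multi-class target (both names follow this article’s conventions):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# One-hot encode the categorical machine-type feature
df = pd.get_dummies(df, columns=["type"])

# Label-encode the multi-class target so models receive integer classes
failure_encoder = LabelEncoder()
df["failure_type"] = failure_encoder.fit_transform(df["failure_type"])
```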

PCA (Principal Component Analysis)

Principal Component Analysis (PCA) reduces the number of variables while retaining as much variance as possible. By transforming the original features into a set of principal components, PCA helps identify the most significant patterns in the data, simplifies the dataset, and makes it easier to visualize and analyze. It also helps reduce multicollinearity, which occurs when two or more variables are highly correlated. The explained-variance results guide the decision on how many components to keep for further analysis.
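
A minimal sketch with scikit-learn, reusing X_scaled and numeric_cols from the scaling step above:

```python
from sklearn.decomposition import PCA

# Fit PCA on the scaled numeric features and inspect the variance
# explained by each principal component
pca = PCA()
pca.fit(X_scaled[numeric_cols])

for i, ratio in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"PC{i}: {ratio:.2%} of the variance")
```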

From this PCA, we see that each variable contributes meaningfully to the overall variance, so none of them will be removed.

Correlation Heatmap

The heatmap shows strong positive correlations between variables such as air_temperature and process_temperature, which is to be expected. The negative correlation between rotational_speed and torque also makes sense: the two have an inverse relationship, so the higher the rotational speed, the lower the torque. Finding the balance between the two helps avoid a decline in machine performance.
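
A heatmap like the one described here can be produced with seaborn; the column names are again this article’s snake_case assumptions.

```python
import matplotlib.pyplot as plt
import seaborn as sns

features = ["air_temperature", "process_temperature",
            "rotational_speed", "torque", "tool_wear"]

plt.figure(figsize=(8, 6))
sns.heatmap(df[features].corr(), annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Feature correlation heatmap")
plt.show()
```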

Model Selection and Training

This stage covers:

  • Choosing the right classification models (XGBoost, etc.).

  • Hyperparameter tuning strategies.

  • Training the model and evaluating performance (accuracy, precision-recall, AUC-ROC).

Before training any models, the first step is to split the dataset into train, test, and validation sets. This was done with a 70/20/10 split.
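
A sketch of that split, chaining two calls to train_test_split so the remaining 30% is divided into 20% test and 10% validation. X and y here are the prepared features and the binary target; the multi-class case swaps in failure_type.

```python
from sklearn.model_selection import train_test_split

X = df.drop(columns=["Target", "failure_type"])
y = df["Target"]

# First split off 70% for training, then divide the remaining 30%
# into 20% test and 10% validation
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
X_test, X_val, y_test, y_val = train_test_split(
    X_temp, y_temp, test_size=1/3, stratify=y_temp, random_state=42)
```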

Data Augmentation

Data augmentation is a technique that creates new data samples from existing data to improve machine learning models. It's used to increase the size and variety of training datasets, which can help models generalize better to new data.

Augmenting data is not always necessary, even though it is often useful. I tested different feature engineering techniques to see which would work best (a sketch of the engineered features follows this list), and these were the results:

  • A new feature temperature was created as the product of air_temperature and process_temperature. These two variables were dropped.

  • A feature power was created as the product of rotational_speed and torque. These two variables were dropped.

  • Both features power and temperature were created and the original 4 variables were dropped.
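
A sketch of the engineered features described above, using this article’s column names:

```python
# Variant 3: create both engineered features and drop the four originals
df_eng = df.copy()
df_eng["temperature"] = df_eng["air_temperature"] * df_eng["process_temperature"]
df_eng["power"] = df_eng["rotational_speed"] * df_eng["torque"]
df_eng = df_eng.drop(columns=["air_temperature", "process_temperature",
                              "rotational_speed", "torque"])
```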

From the charts, it is clear that the original dataset works best without any form of augmentation.

Using Logistic Regression as the Benchmark

For classification tasks, logistic regression is often used as a benchmark because it is easy to implement. The model provides a coefficient for each feature showing its impact on the predicted log-odds, which makes it easy to understand how features relate to the outcome.

It is also a simple model: if a more complex model does not significantly outperform logistic regression, the added complexity may not be worth it. It is quick to train and computationally efficient, making it great for large datasets, and it is a great starting point for further exploration.
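
A rough sketch of the baseline, assuming the train/validation splits from the previous step and the binary Target:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Fit the benchmark model and score it on the validation set
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)

val_pred = baseline.predict(X_val)
val_proba = baseline.predict_proba(X_val)[:, 1]

print("ACC:", accuracy_score(y_val, val_pred))
print("F1: ", f1_score(y_val, val_pred))
print("AUC:", roc_auc_score(y_val, val_proba))
```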

[Input the dataframe showing the validation metrics in here]

Now that we have a benchmark for validating the models, we can start creating them.

Hyperparameter Tuning

In machine learning, there are parameters and hyperparameters. Parameters are variables adjusted during the training phase so the model performs at its best; they are learned from the data, such as weights and biases.

Hyperparameters are variables set before training. They need to be tuned because they control the learning process itself and determine the structure the model learns with. Proper tuning can lead to better generalization on unseen data.

Once hyperparameter tuning is done, the models can be trained. For this project, I will be training KNN, SVC, Random Forest (RFC), and XGBoost classifiers.
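
As an illustration, here is how a grid search over a few XGBoost hyperparameters might look; the grid itself is a placeholder, not the exact one used in the project.

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
}

# Cross-validated search for the best combination on the training data
search = GridSearchCV(XGBClassifier(eval_metric="logloss"),
                      param_grid, scoring="f1", cv=5, n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_)
```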

Data Validation

Validating the trained models comes next. Metrics such as accuracy (ACC), AUC, F1, and F2 give a detailed picture of which models performed best on the dataset.
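
For the binary task, those metrics can be computed like this for any fitted classifier (model stands for one of the trained classifiers):

```python
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score, fbeta_score

pred = model.predict(X_val)
proba = model.predict_proba(X_val)[:, 1]

print("ACC:", accuracy_score(y_val, pred))
print("AUC:", roc_auc_score(y_val, proba))
print("F1: ", f1_score(y_val, pred))
print("F2: ", fbeta_score(y_val, pred, beta=2))  # F2 weights recall over precision
```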

We notice a few things from the visualization above:

  • XGBoost model performed best for all metrics.

  • There is not much deviation between the validation and test scores, showing there is minimal overfitting.

The same steps hold for both the binary classification and the multi-class classification.

Deployment with Streamlit

With model training done, the final steps involve saving the models and deploying them on a Streamlit page for easy access and usage. Streamlit is great to use because once you upload the code to GitHub, sign up for a Streamlit account, and connect your repo to a page, you’re up and rolling.

Deploying models makes them accessible for real-time predictions, making sure they’re useful and practical.
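
A stripped-down sketch of what such a page could look like. The model file name, feature names, and default values here are illustrative, and a real app would also need to apply the same scaling and encoding used during training.

```python
import joblib
import pandas as pd
import streamlit as st

# Load a previously saved model (file name is an assumption)
model = joblib.load("xgb_binary_model.pkl")

st.title("Machine Predictive Maintenance")
air_temp = st.number_input("Air temperature [K]", value=300.0)
process_temp = st.number_input("Process temperature [K]", value=310.0)
speed = st.number_input("Rotational speed [rpm]", value=1500.0)
torque = st.number_input("Torque [Nm]", value=40.0)
tool_wear = st.number_input("Tool wear [min]", value=100.0)

if st.button("Predict"):
    features = pd.DataFrame(
        [[air_temp, process_temp, speed, torque, tool_wear]],
        columns=["air_temperature", "process_temperature",
                 "rotational_speed", "torque", "tool_wear"])
    result = model.predict(features)[0]
    st.write("Failure predicted" if result == 1 else "No failure predicted")
```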

Conclusion

This article walked step by step through my process and reasoning for working on a classification project with the Predictive Maintenance dataset. Some key things I learned are the importance of data preprocessing, hyperparameter tuning, and model selection. You can check out the GitHub repo for all code and models. You can also visit the Streamlit page to test the project.

Thank you for reading!