MindsDB Tutorial: How to Predict the Cost of Weather and Climate Disasters

A Step-by-Step Guide to Predicting Disaster Costs

Introduction

The world we live in today is dominated by an explosion of data, with enormous amounts of information generated every second from various sources such as social media, web applications, sensors, and IoT devices. As a result, traditional approaches to data analysis and decision-making are no longer adequate to deal with the increasing volume, variety, and velocity of data. This is where machine learning enters the picture.

Machine learning is a subfield of AI that uses statistical models and algorithms to gain insights from data. Patterns and relationships are extracted from the data and then used to make predictions, which in turn inform the actions taken on that data.

The use cases of machine learning are numerous, and their importance cannot be overstated. Because gathering data and other business expenses already consume resources, tools now exist that let you apply machine learning to your data, extract insights from it, and propel your business forward.

Introducing MindsDB! MindsDB is an open-source tool that allows developers and regular users to create and use machine learning models without requiring a deep understanding of machine learning. MindsDB employs algorithms to identify patterns and relationships in data.

MindsDB and machine learning use cases can be viewed here. They apply across industries, from fraud reduction to inventory management to risk assessment. MindsDB provides the tooling and capacity to build models that support your processes for a wide range of tasks.

Explanation of the problem statement

This article explores how MindsDB can be used to create a model that predicts the cost of weather and climate disasters in the US.

Explanation of the dataset used

The dataset was obtained from Kaggle and can be found here. It covers disasters that occurred between 1980 and 2021 and is provided in CSV format. For each disaster, it records the name, the start date, the end date, the CPI-adjusted cost, and the number of related deaths.

Explanation of the machine learning model used

For this problem, we use a regression model. For more information about machine learning models, see the MindsDB documentation.

Data Exploration

As stated earlier, the data was downloaded from Kaggle and needs a little bit of cleaning before it goes into MindsDB for a model to be created.

Data preprocessing steps

After the file has been downloaded and placed in your working folder, the pandas Python library is used to clean the data and prepare it for use.

import pandas as pd
# Take the column names from the second row of the file (zero-based index 1).
df = pd.read_csv("events-US-1980-2021.csv", header=1)

The header is set to the second row because the first row of the file is redundant.
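To see what the header offset does, here is a minimal sketch using a small in-memory CSV. The file contents are made-up placeholders, not rows from the Kaggle dataset, and the variable names are chosen for illustration only.

```python
import io

import pandas as pd

# Made-up stand-in for the real file: a descriptive first row, then the true header.
raw = io.StringIO(
    "Some descriptive title row\n"
    "Name,Begin Date,End Date\n"
    "Example Event,19800410,19800417\n"
)

# header=1 takes the column names from the second row (zero-based index 1);
# the row above it is skipped.
sample = pd.read_csv(raw, header=1)
print(list(sample.columns))
```

If the real file had more than one leading row, the header index would need to be adjusted to match.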

Data cleaning steps

df.head()

Results after importing the dataframe. The columns are now properly represented

The data still needs cleaning: the dates need to be converted to a date type, and the columns need to be renamed to make it easier to create models.

Changing the dates to a date-type format

df['Begin Date'] = pd.to_datetime(df['Begin Date'], format='%Y%m%d')
df['End Date'] = pd.to_datetime(df['End Date'], format='%Y%m%d')
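The conversion can be tried in isolation with the sketch below. It assumes the dates are stored in YYYYMMDD form, as the format string in the tutorial implies. The two rows are placeholder values, and the duration_days column is a hypothetical addition used only as a quick check that parsing worked.

```python
import pandas as pd

# Placeholder values in the YYYYMMDD layout that the format string expects.
dates = pd.DataFrame(
    {"Begin Date": [19800410, 20010405], "End Date": [19800417, 20010409]}
)

for col in ["Begin Date", "End Date"]:
    # Cast to str first so the format applies whether the column was read as int or text.
    dates[col] = pd.to_datetime(dates[col].astype(str), format="%Y%m%d")

# Subtracting two datetime columns gives a timedelta; .dt.days extracts whole days.
dates["duration_days"] = (dates["End Date"] - dates["Begin Date"]).dt.days
print(dates)
```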

Renaming the Columns

df.rename(columns = {'Begin Date':'Begin_Date', 'End Date':'End_Date', 'Total CPI-Adjusted Cost (Millions of Dollars)':'total_CPI_adjusted_cost_millions_of_dollars'}, inplace = True)

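# index=False keeps the pandas row index from being written as an extra unnamed column.
df.to_csv("clean_events_US_disaster.csv", index=False)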

The data is now clean and ready for MindsDB to work on.
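Before uploading, a quick sanity check can catch leftover problems. The sketch below is one possible check, not part of the original workflow: check_ready is a hypothetical helper, and the sample frame holds placeholder values. It only assumes the Begin_Date and End_Date column names produced by the rename step.

```python
import pandas as pd


def check_ready(frame: pd.DataFrame) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    # Column names containing spaces are awkward to reference in SQL.
    for col in frame.columns:
        if " " in str(col):
            problems.append(f"column name contains a space: {col!r}")
    # Both date columns should exist and hold a datetime dtype.
    for col in ("Begin_Date", "End_Date"):
        if col not in frame.columns:
            problems.append(f"missing column: {col}")
        elif not pd.api.types.is_datetime64_any_dtype(frame[col]):
            problems.append(f"column is not a datetime type: {col}")
    # Only compare the dates when the checks above passed.
    if not problems and (frame["End_Date"] < frame["Begin_Date"]).any():
        problems.append("some rows end before they begin")
    return problems


# Placeholder frame standing in for the cleaned data.
good = pd.DataFrame(
    {
        "Begin_Date": pd.to_datetime(["1980-04-10"]),
        "End_Date": pd.to_datetime(["1980-04-17"]),
    }
)
problems = check_ready(good)
print(problems)
```

Calling check_ready(df) on the cleaned frame before df.to_csv would flag any column that still needs attention.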

Introducing MindsDB

MindsDB was introduced earlier in this article; it is now time to explore what the tool can do. MindsDB brings machine learning into databases through the concept of AI Tables.

AI Tables are machine learning models stored as virtual tables inside a database. They facilitate making predictions based on your data. You can perform time series, regression, and classification predictions within your database and get the output almost instantly by querying an AI Table with simple SQL statements.

For this article, we'll use the MindsDB web client to perform all our tasks. To get started, go to the MindsDB website, create an account, and sign in.

Once logged in, you should see this page:

MindsDB Web Cloud GUI Page

Uploading the Data

MindsDB is an extensive tool with integrations for almost any data source, including MongoDB, PostgreSQL, MySQL, Airtable, DB2, and many more.

For this project though, we are uploading the CSV file that we have worked on.

Clicking on Upload file leads to this page, where we import the file and give it a name of our choice. The only naming rule is that underscores are used instead of spaces. For this project, it is named clean_events_US_disaster.

This name is important because it is used later in the queries.

Upload file page

Exploring the data.

SELECT * FROM files.clean_events_US_disaster LIMIT 10;

Exploring the data with a SQL SELECT query, using the LIMIT clause to return only 10 rows

The result of this query is shown in the image above.

Creating the Model

Now that our data is in MindsDB and we have checked that it exists, it is time to create a model that takes the data in and helps make predictions.

CREATE MODEL 
  mindsdb.predict_us_disaster_cost
FROM files
  (SELECT * FROM files.clean_events_US_disaster)
PREDICT total_CPI_adjusted_cost_millions_of_dollars;

Let's break this query down. The CREATE MODEL statement creates a model, which can have any name. The FROM clause specifies the training data, here the dataset we just uploaded, and the columns in that dataset become the features the model trains on. Finally, the PREDICT clause names the target column we want to predict, in this case total_CPI_adjusted_cost_millions_of_dollars. Running the query starts training the predictor. Congratulations! You have now created a model.
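The three parts of the statement (model name, training data, target column) can be made explicit with a small string-building sketch. create_model_sql is a hypothetical helper written for this illustration, not a MindsDB function; it only reproduces the statement shown above, which you would still paste into the MindsDB SQL editor.

```python
import re

# Accept only plain identifiers so that nothing unexpected ends up in the SQL text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_model_sql(model_name: str, file_table: str, target: str) -> str:
    """Build the CREATE MODEL statement used in this tutorial from its three parts."""
    for ident in (model_name, file_table, target):
        if not _IDENT.match(ident):
            raise ValueError(f"not a plain identifier: {ident!r}")
    return (
        f"CREATE MODEL mindsdb.{model_name}\n"         # the name of the new model
        f"FROM files\n"                                # uploaded files live under 'files'
        f"  (SELECT * FROM files.{file_table})\n"      # every column becomes a feature
        f"PREDICT {target};"                           # the column to be predicted
    )


statement = create_model_sql(
    "predict_us_disaster_cost",
    "clean_events_US_disaster",
    "total_CPI_adjusted_cost_millions_of_dollars",
)
print(statement)
```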

Monitoring the Model

Although the model has now been created, it is not always immediately usable, because training takes time. You can run the following query to track the model's training progress.

SELECT *
FROM mindsdb.models
WHERE name='predict_us_disaster_cost';

This provides information about the model you recently created. The status will read "generating" if it is still in training, and "complete" when it is finished.

This query returns information on the model name, the MindsDB ML engine that was used to perform the operation, the accuracy, the predict target, the training options, and the precise time it was created.
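If you want to automate the wait rather than rerun the query by hand, the polling logic could look like the sketch below. It is deliberately independent of any MindsDB client: fetch_status is a hypothetical callable that you would implement to run the query above and return the status column. Treating "error" as a second terminal status is an assumption, and the poll interval and limit are arbitrary defaults.

```python
import time
from typing import Callable


def wait_for_model(
    fetch_status: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> str:
    """Call fetch_status repeatedly until the model reaches a terminal status."""
    for _ in range(max_polls):
        status = fetch_status()
        # "complete" is the finished status named in this article;
        # "error" is assumed here to signal a failed training run.
        if status in ("complete", "error"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"model not ready after {max_polls} polls")


# Stand-in for a real status query: two pending polls, then done.
fake_statuses = iter(["generating", "generating", "complete"])
final_status = wait_for_model(lambda: next(fake_statuses), poll_seconds=0.0)
print(final_status)
```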

Using the Model for Prediction

Once the model is complete, you can use it to make predictions. As an illustration, we use the model to estimate the cost of a fictitious disaster: flooding that takes place between April 5 and April 9 of 2001, resulting in 20 fatalities.

SELECT total_CPI_adjusted_cost_millions_of_dollars,
       total_CPI_adjusted_cost_millions_of_dollars_explain
FROM mindsdb.predict_us_disaster_cost
WHERE disaster = 'flooding'
AND Deaths = 20
AND Begin_Date = '2001-04-05'
AND End_Date = '2001-04-09';

The result is shown below:

Results of the query above

The output includes the predicted value, the confidence of the prediction, and the lower and upper bounds between which the actual value is expected to fall.
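As a rough sketch of how the explain column could be read in Python, assume it arrives as a JSON string with keys named predicted_value, confidence, confidence_lower_bound, and confidence_upper_bound. Both the key names and every number below are illustrative assumptions, not output from the model trained in this article.

```python
import json

# Placeholder JSON standing in for one value of the *_explain column.
explain_raw = (
    '{"predicted_value": 1000.0, "confidence": 0.9, '
    '"confidence_lower_bound": 800.0, "confidence_upper_bound": 1200.0}'
)

explain = json.loads(explain_raw)

# The width of the interval shows how precise the estimate is.
interval_width = explain["confidence_upper_bound"] - explain["confidence_lower_bound"]

print(
    f"Predicted cost: {explain['predicted_value']} million USD "
    f"(bounds {explain['confidence_lower_bound']} to "
    f"{explain['confidence_upper_bound']}, confidence {explain['confidence']})"
)
```

A wide interval relative to the predicted value suggests the estimate should be treated with caution.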

Conclusion

In the field of machine learning, MindsDB is a game-changer. Its user-friendly interface and strong capabilities make it a go-to tool for anyone looking to explore, experiment with, and use machine learning to solve complex problems. By following the instructions in this article, you can begin using MindsDB right away to benefit from the global data explosion and discover insights that were previously hard to reach. So why wait? Start using MindsDB today.

It deserves a place in your toolkit for making ML tasks easier. MindsDB offers many more features, and its documentation can help you use the tool more effectively.