More info at https://cryptocurrents.net/
Over the past few months, I’ve had the opportunity to mentor aspiring data scientists and enthusiasts of the field. As a data scientist myself, I couldn’t help but notice that all the mentees started off their journey by learning SQL!
It is not a secret that the amount of data science tools out there makes it very hard for people to quickly jump in to the field which is in high demand. Overall, this makes transitioning into data science really tough and scary!
If you were to ask any current data scientists about their most used tools, they will probably answer “SQL” and “Python”. However, what happends if you are transitioning without any programming background? MindsDB makes transitioning even easier for you as it’s an open-source tool that allows you to generate predictions using SQL.
Let’s dive in more into the tool by solving a time-series forecasting example with the famous Bitcoin Historical Data from Kaggle.
As previously mentioned, MindsDB allows you to make predictions directly in your database. That’s it. That simple. This is the reason why this is a great tool for mentees who are transitioning into data science, without the need to overcomplicate the learning with disproportionate number of tools.
In this tutorial, I want to showcase how to easily solve a time-series problem by using this tool. You can set up MindsDB in many ways, and they have amazing docs in their website explaining how to do so. We will be following the next steps:
Add data to existing/new MySQL database.
Deploy MindsDB Cloud.
Connect MindsDB using MySQL client.
Create a predictor on the data.
Visualize the results.
In this hands-on tutorial we will solely focus on MySQL, but you can use any of their supported databases. If you don’t have MySQL installed, you can do so by downloading the installer here and follow MySQL’s getting started guide.
Once you have MySQL installed, we can simply log in and create a database with the data that you can find here.
Let’s first log in mysql using the following command:mysql —local_infile=1 -u root -p
We will then create a database btc_data and populated with the data previously downloaded. The data contains a cleaned version of the daily weighted_price (weighted bitcoin price) of the original bitcoin historical dataset mentioned above. We have two columns: ds which corresponds to the (daily) timestamps, and y which corresponds to the weighted price of bitcoin.CREATE DATABASE btc_data; USE btc_data; CREATE TABLE btc (ds DATE NOT NULL, y FLOAT); LOAD DATA LOCAL INFILE '/path/to/btc_data.csv' INTO TABLE btc FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 ROWS (ds, y);
Where /path/to/btc_data.csv corresponds to the path where you have downloaded and stored the btc_data.csv file.
We can now check that our data has been loaded successfully by running the following command:SELECT * FROM btc LIMIT 10;
The next step that you need to do is to create a MindsDB account here. Then you need to connect your database. If your database is hosted locally, as mine is, you need to allow remote connection. I have done this by exposing it using ngrok .
You will use the selected host and port as your input to connect your local DB to MindsDB Cloud, with extra parameters:
Supported Database: MySQL in this case - choose any other depending on the database that you are using.
Integration Name: this name is chosen by you and how you want to name the integration between your database and MindsDB. I chose to call it btc_data.
Database Name: in this case btc_data.
Host: the host given by ngrok, in my case 2.tcp.ngrok.io.
Port: the port given by ngrok, in my case 17421.
Username: since I have my local database, I will use root.
Password: password to root user.
We can now connect to out MindsDB throughout MySQL client by running the following command:
We can now see the databases in MindsDB Cloud:
To create and train the predictive model, we will use the mindsdb database. You should see something like this when running the following command:USE mindsdb; SHOW tables;
We will create the predictor by running:CREATE PREDICTOR btc_pred FROM btc_data (SELECT * FROM btc) PREDICT y ORDER BY ds WINDOW 1;
Where btc_pred is the name of the predictor/model, btc_data corresponds to the integration database/name, and y is the name of the feature we want to predict from our data. For time-series forcasting, MindsDB also allows us to add additional flags such as ORDER BY or WINDOW, which allows us to specify the number of rows to “look back” into when making a prediction. You can see the status of your predictor by running:SHOW predictors;
One the training has completed; you should see something like this with status complete:
For you to have an idea, at this point we have spent couple of minutes setting up our environment and the training took approximately less than 1 minute. We can now proceed to run the predictive model on our data.
We can now run the predictive model to get the output of our predictions. You can do this by running the following custom query:SELECT orig_table.ds AS DATE, pred_table.y AS PREDICTED_Y, orig_table.y AS ACTUAL_Y FROM btc_data.btc AS orig_table JOIN mindsdb.btc_pred AS pred_table WHERE orig_table.ds '2011-12-30' ORDER BY orig_table.ds DESC LIMIT 10;
What we are doing here is a simple query to get back a table with the timeframe and predicted vs. actual weighted bitcoin price. You should get something like this (results are limited for the sake of representation):
You can also use the keyword LATEST as:SELECT orig_table.ds AS DATE, pred_table.y AS PREDICTED_Y, orig_table.y AS ACTUAL_Y FROM btc_data.btc AS orig_table JOIN mindsdb.btc_pred AS pred_table WHERE orig_table.ds LATEST;
You should then get the following table in return:
Now that we have or predictive model, let’s create empty rows on our original dataset to retrieve future predictions for representation. We are going to insert them using out MySQL server instance. An easy query statement to add the next 7 days after the last would look like this:INSERT INTO btc(ds, y) VALUES ('2020-09-15', NULL), ('2020-09-16', NULL), ('2020-09-17', NULL), ('2020-09-18', NULL), ('2020-09-19', NULL), ('2020-09-20', NULL), ('2020-09-21', NULL);
If you run the following command:SELECT * from btc ORDER BY ds DESC LIMIT 10;
You should see the output:
For representing the predictions, we are going to add dates until 2020-12-31. Using MySQL Workbench, I will query the predictions and export it into csv file to visualize it. You can use any tool to visualize the results, for example by integrating MindsDB with Tableau.
In this tutorial, we have learned how to easily create and apply a Machine Learning model for time-series forecasting. MindsDB is a great new and innovative tool to be used by those who are just starting off in the field of data science.
Data Science is lacking experts with inter-disciplinary backgrounds and with domain expertise outside of the programming world. MindsDB is really helping the field of data science by providing such easy tool to use for non-experts and can potentially help experts in the field with tasks such as experimentation and overall making the development and production environment less complex.
There are many other use cases that you can explore in their website. I believe their community will keep growing over the coming years, so don’t doubt to try it yourself and join the community through their Slack channel!
Share yor thoughts!
Leave a comment
Subscribe for more statistically significant topics!