With just a few kilobytes of resources, you'll be surprised how accurate you can get: Decision Tree, Random Forest, and XGBoost classifiers are now available for microcontrollers, so you can build highly RAM-optimized applications for super-fast classification on embedded devices.
Decision Tree is undoubtedly one of the best-known classification algorithms. It's so easy to understand that it's probably the first classifier you encounter in any Machine Learning tutorial.
We won't go into the details of how a Decision Tree classifier trains and selects the splits for the input features: here we will explain how such a classifier makes efficient use of RAM.
Since we are willing to sacrifice program space (a.k.a. flash) in favor of memory (a.k.a. RAM), and RAM is the scarcest resource in the vast majority of microcontrollers, the smart way to port a Decision Tree classifier from Python to C is to hard-code the splits directly in the code, without referencing any variables.
Since no variables are allocated, we use 0 bytes of RAM to compute the classification result. On the other hand, program space will grow almost linearly with the number of splits.
Because program space is often much larger than RAM on microcontrollers, this approach exploits that abundance to run larger models. How large? It will depend on your flash size: many new-generation boards (Arduino Nano 33 BLE Sense, ESP32, ST Nucleo…) have 1 MB of flash, which can hold tens of thousands of splits.
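As an illustration, a tree with three leaves compiles down to a plain chain of if/else statements. The feature indices and thresholds below are made up for the sketch, not taken from a real trained model:

```cpp
// A hard-coded decision tree: every split threshold is a compile-time
// constant, so the whole classifier lives in flash and uses 0 bytes of
// RAM beyond the call stack. Thresholds here are hypothetical.
int predict(const float *x) {
    if (x[0] <= 2.45f) {
        return 0;              // leaf: class 0
    } else {
        if (x[1] <= 1.75f) {
            return 1;          // leaf: class 1
        } else {
            return 2;          // leaf: class 2
        }
    }
}
```

Every additional split adds one more comparison to the chain, which is why flash usage grows roughly linearly with the number of splits.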
Random Forest consists of numerous Decision Trees combined in a voting scheme. The core idea is the "wisdom of the crowd": if many trees vote for a given class (having been trained on different subsets of the training set), that class is probably the true class.
Towards Data Science has a more detailed guide on Random Forest and how it is combined with the bagging technique.
Porting Random Forest is as easy as porting a Decision Tree, and it takes the same 0 bytes of RAM (strictly speaking, it takes as many bytes as the number of classes to store the votes, but that is really negligible): it simply hard-codes all the constituent trees.
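A minimal sketch of the voting scheme: three hypothetical hard-coded trees plus a vote counter of one byte per class, which is the only RAM the ensemble needs (splits below are invented for illustration):

```cpp
#include <cstdint>

#define N_CLASSES 3

// Three tiny hard-coded trees (hypothetical splits, for illustration).
int tree0(const float *x) { return x[0] <= 2.5f ? 0 : 2; }
int tree1(const float *x) { return x[1] <= 1.0f ? 0 : 1; }
int tree2(const float *x) { return x[0] <= 4.0f ? (x[1] <= 1.5f ? 0 : 1) : 2; }

// Majority vote: the only RAM used is one counter per class.
int predict(const float *x) {
    uint8_t votes[N_CLASSES] = {0};
    votes[tree0(x)]++;
    votes[tree1(x)]++;
    votes[tree2(x)]++;
    int best = 0;
    for (int i = 1; i < N_CLASSES; i++)
        if (votes[i] > votes[best]) best = i;
    return best;
}
```

The `votes` array is the "as many bytes as the number of classes" mentioned above; everything else sits in flash.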
XGBoost (Extreme Gradient Boosting)
Extreme Gradient Boosting is "gradient boosting on steroids" and has received great attention from the Machine Learning community due to its top results in many data competitions.
- "Gradient boosting" refers to the process of chaining a sequence of trees so that each tree tries to learn from the errors of the previous one.
- "Extreme (X)" refers to the many software and hardware optimizations that greatly reduce the time required to train the model.
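The chaining can be sketched as follows: each hard-coded tree stage returns a small correction that is added to a running score, and the final score is squashed into a probability. The stages, thresholds, and leaf values below are invented purely for illustration:

```cpp
#include <cmath>

// Each boosting stage is a tiny hard-coded tree whose leaves hold
// corrections ("residuals") to the running score. Values are made up.
float stage0(const float *x) { return x[0] <= 1.0f ? -0.4f : 0.6f; }
float stage1(const float *x) { return x[1] <= 2.0f ? -0.1f : 0.3f; }

// Binary-classification prediction: sum the stage outputs (log-odds),
// then apply the sigmoid to get a probability.
float predict_proba(const float *x) {
    float score = 0.0f;     // initial prediction
    score += stage0(x);     // each stage corrects the previous sum
    score += stage1(x);
    return 1.0f / (1.0f + expf(-score));
}
```

On a microcontroller this stays just as RAM-friendly as Random Forest: the stages are hard-coded, and only the running score lives on the stack.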
You can read the original article about XGBoost here.
Porting to plain C
If you're new to this, you'll need a few things first:
- Install the micromlgen package with:
pip install micromlgen
If you want to use Extreme Gradient Boosting, also install the xgboost package with:
pip install xgboost
We can generate the plain C code using the micromlgen.port function:
You can then copy the generated C code and paste it into your project.
Using it in an Arduino sketch
Once you have the classifier code, create a new project named TreeClassifierExample, for example, and copy the classifier code into a file named DecisionTree.h (or randomforest.h or XGBoost.h, depending on the model you chose).
Copy the following to the main .ino file.
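The sketch body isn't reproduced here, so below is a minimal, hedged sketch of how the generated header is typically used. micromlgen emits a C++ class exposing an `int predict(float *x)` method; the stand-in class here mimics that shape with a made-up split, purely so the example is self-contained (check your generated file for the exact class and namespace names):

```cpp
// Stand-in for the generated DecisionTree.h: a class with an
// int predict(float *x) method. The single split below is hypothetical;
// in a real sketch you would #include your generated header instead.
class DecisionTree {
public:
    int predict(float *x) {
        return x[0] <= 0.5f ? 0 : 1;   // hypothetical split
    }
};

DecisionTree clf;   // one global instance, as you would in a sketch

// In the .ino file you would call this on each new sample, e.g. from
// loop(), and act on the returned class index.
int classify(float *features) {
    return clf.predict(features);
}
```

Because the generated code is plain C++, you can also compile and test it on your desktop before flashing it to the board.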
To compare the three classifiers, we will consider these key points:
- time to train
- required RAM
- required Flash size
for each classifier on various datasets. RAM and flash figures are reported for the old-generation Arduino Nano, so you should treat them as relative rather than absolute numbers.
| Dataset | Classifier | Training time (s) | Accuracy | RAM (bytes) | Flash (bytes) |
|---|---|---|---|---|---|
| Gas Sensor Dataset | Decision Tree | 1.6 | 0.781 ± 0.12 | 290 | 5722 |
| 13910 samples × 128 features | Random Forest | 3 | 0.865 ± 0.083 | 290 | 6438 |
| 6 classes | XGBoost | 18.8 | 0.878 ± 0.074 | 290 | 6506 |
| Transaction Segmentation Dataset | Decision Tree | 0.1 | 0.943 ± 0.005 | 290 | 5638 |
| 10000 samples × 19 features | Random Forest | 0.7 | 0.970 ± 0.004 | 306 | 6466 |
| 5 classes | XGBoost | 18.9 | 0.699 ± 0.003 | 306 | 6536 |
| Driver Diagnostics Dataset | Decision Tree | 0.6 | 0.946 ± 0.005 | 306 | 5850 |
| 10000 samples × 48 features | Random Forest | 2.6 | 0.983 ± 0.003 | 306 | 6526 |
| 11 classes | XGBoost | 68.9 | 0.977 ± 0.005 | 306 | 6698 |
All datasets are retrieved from the UCI Machine Learning datasets archive.
We could collect more data for a thorough comparison, but from the table you can see that Random Forest and XGBoost are on par in accuracy; XGBoost, however, takes 5 to 25 times longer to train.
You may receive a TemplateNotFound error when using
micromlgen, in which case you can work around the problem by uninstalling and reinstalling the library:
pip uninstall micromlgen
Then go to GitHub, download the package as a zip, and copy the
micromlgen folder into your project.