Embedded machine learning
Machine learning models are everywhere, and we don't even know about them 👻
They hide in our smartphones and smartwatches, but also in less obvious places such as street furniture, industrial tools and even washing machines (yes, for real).
Why would I run an algorithm on a device rather than in the cloud?
Well, there are many arguments in favour of embedded algorithms. To me, the most important one is privacy. Smart objects are a nightmare for security: any data sent to a server can be intercepted on the way or exposed if that server is breached. With an embedded model, the data is stored and processed on the device and never needs to leave it.
Some other arguments in favour of embedded algorithms could be:
- Low energy consumption: Models are tweaked to be as light as possible, which makes them energy efficient. Even some computer vision models, known for being heavy and computation intensive, can run on a device that draws 2 watts.
- Latency: The model lives in the same place as the data. There is no round trip to a server, where network latency can become a real bottleneck.
However, this comes at a cost. As you can imagine, it’s hard to fit a model into low-memory and low-performance hardware.
With small devices comes small
Limited memory and compute are the biggest problem for embedded machine learning. How do you deal with low performance? Well, there are many techniques. The best one is still to choose the smallest algorithm available from the start. 😅
Linear regression might be the most used algorithm in data science, and for good reason. It is one of the cheapest algorithms to run, and it's good enough in many cases. However, linear regression isn't an all-in-one solution, and sometimes you need a more sophisticated algorithm. Whatever the model, reducing the number of input features is often the most effective way to cut its size and latency.
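To make the point about features concrete, here is a minimal sketch in plain Python. The weights, bias and sample values are made up for illustration. It shows that linear regression inference costs one multiply-add per feature, so dropping a feature that carries no signal shrinks both the stored model and the work done per prediction.

```python
def predict(weights, bias, features):
    """Linear regression inference: one multiply-add per feature."""
    return bias + sum(w * x for w, x in zip(weights, features))

# Hypothetical model with four input features (values are illustrative).
weights = [0.5, -1.2, 0.0, 2.0]
bias = 0.1
sample = [1.0, 2.0, 3.0, 4.0]
full = predict(weights, bias, sample)

# The third feature has a weight of 0.0, so it contributes nothing.
# Keeping only the other three means fewer parameters to store
# and fewer operations per prediction.
kept = [0, 1, 3]
small_weights = [weights[i] for i in kept]
small_sample = [sample[i] for i in kept]
reduced = predict(small_weights, bias, small_sample)

print(len(weights), len(small_weights))
print(full == reduced)
```

In practice a feature's weight is rarely exactly zero, so removing it changes the predictions slightly and the model should be retrained and re-evaluated afterwards.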
What if I have a deep learning model?
Deep learning models are known for their size and lengthy inference time. Indeed, without a modern GPU, many models can take several seconds to run a single inference.
There are ways to shrink a model's size and inference time with minimal impact on its accuracy. One of them, called quantization, converts the weights of a model from float32 to a smaller type such as float16 or int8. This divides the size of the model by 2 or 4 respectively, and it can drastically speed up inference on hardware that supports the smaller type.
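Here is a minimal sketch of the idea, in plain Python so it runs anywhere. It uses symmetric int8 quantization with a single scale factor for all weights, which is only one of several possible schemes and an assumption of this example. The function names and the random weights are illustrative, and a real project would rely on the quantization tooling of its framework rather than on code like this.

```python
from array import array
import random

def quantize_int8(weights):
    """Symmetric quantization: map floats onto the int8 range [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    quantized = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the int8 weights."""
    return [q * scale for q in quantized]

random.seed(0)
# array("f") stores each value as a 4-byte float, like float32 weights.
weights = array("f", (random.gauss(0.0, 0.1) for _ in range(1000)))
q_weights, scale = quantize_int8(weights)

float_bytes = len(weights) * weights.itemsize
int8_bytes = len(q_weights) * q_weights.itemsize
max_error = max(abs(w - r) for w, r in zip(weights, dequantize(q_weights, scale)))

print(float_bytes, int8_bytes)   # storage before and after
print(max_error <= scale / 2 + 1e-9)
```

The stored weights take a quarter of the space, and the rounding error on each weight is bounded by half a quantization step. Whether that error is acceptable depends on the model, so accuracy should be measured again after quantizing.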
Another option, called pruning, removes some weights from the model according to a given strategy. For example, a weight that is zero or close to zero can be deleted. There are also pruning algorithms that target entire layers of a network when they contribute little or nothing to the model's accuracy.
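The "close to zero" strategy can be sketched as magnitude pruning: set the smallest weights, by absolute value, to zero. The function name, the `sparsity` parameter and the weight values below are assumptions made for this illustration, not a reference implementation.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest absolute value."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from the smallest to the largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

# Illustrative weights: half of them are already close to zero.
weights = [0.8, -0.01, 0.3, 0.002, -0.6, 0.05, -0.0004, 0.9]
pruned = prune_by_magnitude(weights, sparsity=0.5)

print(pruned)
```

Zeroing weights does not make the model smaller or faster by itself. The gain comes when the pruned model is stored in a sparse format, or when whole neurons or layers are removed, so that the zeros are no longer stored or computed.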
Last but not least, hardware
The last step to embed a model in a connected device is to have hardware that is up to the task. There are many hardware options for running a model. If inference time is not really a problem, then a simple Raspberry Pi can handle most of your algorithms. If, on the other hand, you have speed constraints, you can turn to boards such as Google Coral or Jetson Nano, which include a dedicated accelerator for deep learning models (an Edge TPU and a GPU respectively) and remain relatively inexpensive.
You're now all set to run your first model on an embedded device. Have fun!
Thanks for reading! You can reach me on LinkedIn or contact the IALab team at ia-lab.fr.