Binary Neural Networks: A Game Changer in Machine Learning


Vikas Kumar Ojha

Published in Geek Culture · 9 min read · Feb 19


Photo by Kier... in Sight on Unsplash

Whenever we think of neural networks, GPUs automatically pop into our minds, because these networks are compute-heavy and usually require powerful GPUs to train. Most state-of-the-art models need GPUs even for fast inference, which makes them expensive for end users. That is why quantization is so widely used when deploying neural networks to speed up inference.

In quantization, the precision of weights and activations is reduced from float32 to float16 (typically on GPUs) or int8 (typically on CPUs). This reduction in precision not only saves memory but also speeds up the network, since lower precision reduces memory-access time. Even so, the weights still occupy at least 8 bits each.
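As a concrete illustration, here is a minimal sketch of post-training dynamic quantization; the post does not name a framework, so PyTorch is assumed here. It converts the weights of Linear layers from float32 to int8 for CPU inference:

import torch
import torch.nn as nn

# A toy float32 model; in practice this would be a trained network.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Dynamic quantization stores Linear weights as int8 and dequantizes
# on the fly during CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized(x).shape)  # same output shape, smaller and faster model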

In this blog, we will explore a type of neural network called the Binary Neural Network, which stores weights as the binary values 1 and -1; this is also termed 1-bit quantization. Thanks to 1-bit quantization, these networks are extremely efficient to train and run, and they are well suited for deployment on embedded devices and microcontrollers.

Some Important Theoretical Concepts

Any process that happens in a computer can be classified as either computation or memory access. Computation is costly, but memory access is even more energy-demanding. This is why performing calculations at high precision is often slower than doing the same at low precision, and why quantization improves both the inference speed and the size of a neural network. In binary neural networks, the weights and activations are converted to the binary values -1 and 1. Let's understand how this is done, along with several other improvements, in detail.
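To make the memory savings concrete, here is a minimal sketch (using NumPy, which the post itself does not reference) comparing float32 storage with packed 1-bit storage for the same number of weights:

import numpy as np

# One million weights stored as float32.
weights = np.random.randn(1_000_000).astype(np.float32)
print(weights.nbytes)   # 4000000 bytes

# Binarize to {-1, +1} by sign, then pack the sign bits 8 per byte.
bits = (weights >= 0).astype(np.uint8)
packed = np.packbits(bits)
print(packed.nbytes)    # 125000 bytes, a 32x reduction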

Binarizing Weights and Activations

As we know, the weights of a neural network are generally initialized with very small values. In binary neural networks, these weights are binarized with a suitable binarizing function. Below are the two functions used to binarize the weights and activations.

The sign function is given by:

x_b = Sign(x) = +1 if x ≥ 0, and -1 otherwise
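A minimal PyTorch sketch of this deterministic binarization (the function name binarize_deterministic is my own; note that torch.sign maps 0 to 0, so the zero case is handled explicitly to match the definition above):

import torch

def binarize_deterministic(x: torch.Tensor) -> torch.Tensor:
    # Map x >= 0 to +1 and x < 0 to -1, so zeros become +1.
    return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

w = torch.tensor([0.7, -0.3, 0.0, -1.2])
print(binarize_deterministic(w))  # tensor([ 1., -1.,  1., -1.])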

And stochastic binarization is given by:

x_b = +1 with probability p = σ(x), and -1 with probability 1 - p,

where σ is the "hard sigmoid": σ(x) = clip((x + 1)/2, 0, 1).
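A matching sketch of stochastic binarization, again with hypothetical helper names:

import torch

def hard_sigmoid(x: torch.Tensor) -> torch.Tensor:
    # clip((x + 1) / 2, 0, 1)
    return torch.clamp((x + 1.0) / 2.0, 0.0, 1.0)

def binarize_stochastic(x: torch.Tensor) -> torch.Tensor:
    # Each value becomes +1 with probability hard_sigmoid(x), else -1.
    p = hard_sigmoid(x)
    return torch.where(torch.rand_like(x) < p,
                       torch.ones_like(x), -torch.ones_like(x))

w = torch.tensor([0.7, -0.3, 0.0, -1.2])
print(binarize_stochastic(w))  # random, e.g. tensor([ 1., -1.,  1., -1.])

Large weights almost always keep their sign, while weights near zero flip randomly; in expectation the binarized value equals 2·σ(x) − 1, which is what makes this scheme act as a regularizer during training.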
