Optimization and Quantization of Models for better performance

Understanding the advantages of INT8 quantization

Deep learning models are typically available in floating-point precisions such as FP32 or FP16. By calibrating a model to the 8-bit integer format (INT8), we can obtain an accurate 8-bit model that substantially speeds up inference, improves performance, and reduces the required memory bandwidth. Such models are called quantized models: models that were trained in floating-point precision and then transformed to an integer representation, with floating/fixed-point quantization operations inserted between the layers. This transformation can be done using the Post-Training Optimization Tool (POT), which we will briefly cover in the next topic.
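To make the idea concrete, below is a minimal NumPy sketch of symmetric per-tensor linear quantization, the basic arithmetic behind mapping FP32 values to INT8. The sample weights and the scale choice are illustrative only, not the exact calibration scheme POT uses.

```python
import numpy as np

# Illustrative FP32 weights (not from any real model).
weights_fp32 = np.array([0.82, -1.73, 0.05, 2.41, -0.66], dtype=np.float32)

# Choose a per-tensor scale so the largest absolute value maps to the INT8 limit (127).
scale = np.abs(weights_fp32).max() / 127.0

# Quantize: divide by the scale, round to the nearest integer, and clip to the INT8 range.
weights_int8 = np.clip(np.round(weights_fp32 / scale), -128, 127).astype(np.int8)

# Dequantize to see how closely the 8-bit values approximate the originals.
weights_dequant = weights_int8.astype(np.float32) * scale

print("FP32:       ", weights_fp32)
print("INT8:       ", weights_int8)
print("Dequantized:", weights_dequant)
```

Because each weight is stored in 1 byte instead of 4, the memory footprint and bandwidth drop by roughly 4x, and integer arithmetic lets supported hardware execute the layers faster, at the cost of a small approximation error visible in the dequantized values.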

The table in INT8 vs FP32 Comparison illustrates the performance gain achieved by switching the model from an FP32 representation to an INT8 representation.

