Improving the Performance of Deep-Learning based Flask App with ZMQ

Introduction

Deep learning models are heavy: their many layers carry a large number of weights. Loading the model every time a prediction is needed adds considerable overhead, which makes each request costly in execution time.

In this project, we address this issue by using the networking functionality of the ZMQ library. We will build a client-server architecture in which the model is loaded exactly once, when the app starts. A model server keeps the model in memory and serves predictions, listening for requests from its Flask client, which asks it for a prediction on an input image.
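The load-once idea can be illustrated with a minimal sketch using pyzmq's REQ/REP sockets. This is not the project's actual code: the `inproc://model` endpoint, the function names, and the placeholder "model" (which only reports the size of the bytes it receives) are assumptions made for illustration, and the server runs in a thread here only so the example is self-contained.

```python
import threading
import zmq


def model_server(ctx, endpoint, n_requests, ready, load_model):
    """Load the model once, then answer n_requests prediction requests."""
    model = load_model()           # the expensive step happens exactly once
    sock = ctx.socket(zmq.REP)     # REP: receive a request, send a reply
    sock.bind(endpoint)
    ready.set()                    # tell the client the endpoint is bound
    for _ in range(n_requests):
        image_bytes = sock.recv()
        sock.send_json(model(image_bytes))
    sock.close()


def request_prediction(ctx, endpoint, image_bytes):
    """Client side: what a Flask view could call for each incoming image."""
    sock = ctx.socket(zmq.REQ)     # REQ: send a request, wait for the reply
    sock.connect(endpoint)
    sock.send(image_bytes)
    reply = sock.recv_json()
    sock.close()
    return reply


def run_demo():
    loads = []                     # records how many times the model is loaded

    def load_model():
        # Hypothetical stand-in for loading real network weights.
        loads.append(1)
        return lambda image_bytes: {"label": "placeholder",
                                    "num_bytes": len(image_bytes)}

    ctx = zmq.Context()
    endpoint = "inproc://model"    # assumed endpoint; a real setup might use tcp://
    ready = threading.Event()
    server = threading.Thread(
        target=model_server, args=(ctx, endpoint, 3, ready, load_model))
    server.start()
    ready.wait()
    replies = [request_prediction(ctx, endpoint, b"x" * n) for n in (1, 2, 3)]
    server.join()
    ctx.term()
    return replies, len(loads)
```

Calling `run_demo()` sends three requests and returns the three replies together with the number of model loads, which stays at one no matter how many requests are served. In a real deployment the server would presumably run as its own long-lived process, with the Flask app acting as the client.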
