Welcome to the project How to Build a Low-Latency Deep-Learning-Based Flask App. In this project, we will refactor the entire codebase of the project How to Deploy an Image Classification Model using Flask. That monolithic code will be split into two microservices: the Flask service and the model service. The model service acts as a server that exposes a pretrained TensorFlow model as a deep learning API and keeps listening for incoming requests. The Flask service sends requests to the model service and displays the response it gets back. This way, we write cleaner code and promote service isolation.
Further, we will introduce asynchrony using the ZMQ networking library in order to reduce the response time of the deep learning API and thus make the app faster. Integrating ZMQ into this client-server architecture improves the latency of the microservice-based app. By the end of this project, we will have optimized the inference time for classifying an input image, turning the app into a low-latency web application that is suitable for deployment in real-time production environments.
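To make the split between the two services more concrete, here is a minimal sketch of a model service and a client exchanging messages over ZMQ, assuming the `pyzmq` package. It is not the project's actual code: the `classify` function is a hypothetical stand-in for the TensorFlow model, the endpoint name and helper names are made up for illustration, and the sketch uses a plain request/reply socket pair over the in-process transport so that it runs in a single script. Two real services would more likely bind and connect to a `tcp://` address.

```python
import threading

import zmq


def classify(image_bytes: bytes) -> dict:
    """Hypothetical stand-in for the pretrained model's prediction step."""
    return {"label": "placeholder", "num_bytes": len(image_bytes)}


def model_service(ctx, endpoint, ready, max_requests):
    """Model service: bind a REP socket and answer incoming requests."""
    sock = ctx.socket(zmq.REP)
    sock.bind(endpoint)
    ready.set()  # signal that the service is listening
    for _ in range(max_requests):
        image_bytes = sock.recv()              # wait for a request
        sock.send_json(classify(image_bytes))  # reply with the prediction
    sock.close()


def request_prediction(ctx, endpoint, image_bytes, timeout_ms=2000):
    """Client side (what a Flask view could call): send bytes, wait for JSON."""
    sock = ctx.socket(zmq.REQ)
    sock.setsockopt(zmq.LINGER, 0)  # do not block on close if nobody answers
    sock.connect(endpoint)
    try:
        sock.send(image_bytes)
        if sock.poll(timeout_ms) == 0:  # no reply arrived within the timeout
            raise TimeoutError("model service did not respond")
        return sock.recv_json()
    finally:
        sock.close()


def demo():
    """Run the service in a background thread and send it one request."""
    ctx = zmq.Context.instance()
    endpoint = "inproc://model-service"  # assumed name; use tcp:// across processes
    ready = threading.Event()
    worker = threading.Thread(
        target=model_service, args=(ctx, endpoint, ready, 1), daemon=True
    )
    worker.start()
    ready.wait()
    result = request_prediction(ctx, endpoint, b"fake-image-bytes")
    worker.join()
    return result


if __name__ == "__main__":
    print(demo())
```

The sketch only shows the message passing between the two roles. The timeout on the client side keeps a web request from hanging forever if the model service is down, which matters once the two run as separate processes.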
Blog: Improving the Performance of Deep-Learning based Flask App with ZMQ
This project depends on another project: How to Deploy an Image Classification Model using Flask
Skills you will develop: