Project - How to Build a Low-Latency Deep-Learning-Based Flask App


Introduction





This project is a continuation of Project - How to Deploy an Image Classification Model using Flask.

As you may have observed, the time taken to see the predictions for a given image was as high as nearly 10 seconds. This is unacceptable in real-time applications, where users expect apps to be fast.

In this project, we will work exclusively on improving the performance of that image classification app and bring the execution time down to roughly 0.2 seconds on average.
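A large share of that latency typically comes from reloading the model inside every request handler. The sketch below is not the project's actual code, but it illustrates the general pattern of loading an expensive model once at startup and reusing it across requests; the `load_model` stub and its 0.5-second delay are hypothetical stand-ins for a real deep-learning model load.

```python
import time

def load_model():
    """Hypothetical stand-in for an expensive deep-learning model load."""
    time.sleep(0.5)          # simulate reading large weights from disk
    return lambda image: "predicted_label"

# Slow pattern: reload the model inside every request handler.
def predict_slow(image):
    model = load_model()     # pays the full load cost on every call
    return model(image)

# Fast pattern: load once at startup, reuse across requests.
MODEL = load_model()

def predict_fast(image):
    return MODEL(image)

start = time.perf_counter()
predict_slow("img.jpg")
slow = time.perf_counter() - start

start = time.perf_counter()
predict_fast("img.jpg")
fast = time.perf_counter() - start

print(f"per-request load: {slow:.2f}s, preloaded: {fast:.4f}s")
```

The same idea applies to a Flask app: construct the model at module import time (or in an app factory), not inside the view function.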

Further, we will learn how to refactor the monolith service built earlier into separate, smaller services. A monolith is a structure carved out of a single stone. The previous service we built is a monolith service: it runs the web application and performs deep learning inference in the same process.

It makes more sense to extract the deep learning model into a separate service, because deep learning models have a different execution pattern: they may need to run on GPUs, their CPU consumption can be very high, and they may serve a larger number of users. If we make the deep learning model a separate service, we can scale it independently (run it on many machines using Docker and Kubernetes - you can learn about these in our DevOps course). Also, if the deep learning model, such as an ImageNet classifier, is a separate service, other teams in your company can use it as well.
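To make the split concrete, here is a minimal sketch of the two-service idea using only the Python standard library (the project itself uses Flask; the endpoint path, the JSON payload shape, and the `golden_retriever` label are all hypothetical). One process plays the model service and answers prediction requests over HTTP; the web app then calls it like any other API.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class ModelHandler(BaseHTTPRequestHandler):
    """Hypothetical model service: a real one would wrap the ImageNet model."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        # A real service would run inference on payload["image"] here.
        result = {"label": "golden_retriever", "input": payload["image"]}
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

# Start the model service on any free local port, in the background.
server = HTTPServer(("127.0.0.1", 0), ModelHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The web app (the other service) calls the model service over HTTP.
def classify(image_name):
    req = urllib.request.Request(
        f"http://127.0.0.1:{server.server_port}/predict",
        data=json.dumps({"image": image_name}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

print(classify("dog.jpg"))
```

Because the model now sits behind an HTTP boundary, it can be scaled, deployed, and reused independently of the web app, which is the point of the refactor described above.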

So it is expected that you are already aware of the working of the code of the Project - How to Deploy an Image Classification Model using Flask.

  • Please do not delete the virtual environment as mentioned in the last step of that project since we will be considering that project as the very first step for this project.
