Project Architecture

In our project, the model is loaded when we run the file that defines the server.

Then the server starts and listens for client requests. Note that the model is loaded only once (at import time), not once per request. It is the server, not the model, that listens for requests, and the single loaded model is reused to respond to every request the server receives.
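To make the load-once behaviour concrete, here is a minimal sketch using only the Python standard library. `DummyModel`, `load_model`, `handle_request`, and `LOAD_COUNT` are hypothetical names chosen for illustration; they are not the project's actual code, and a real model would load weights from disk instead of sleeping.

```python
import time

LOAD_COUNT = 0  # counts how many times the (hypothetical) model is loaded


class DummyModel:
    """Stand-in for a real deep-learning model (hypothetical)."""

    def predict(self, image_bytes: bytes) -> dict:
        # A real model would run inference; here we only report the input size.
        return {"label": "placeholder", "num_bytes": len(image_bytes)}


def load_model() -> DummyModel:
    """Simulate an expensive, one-time model load."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    time.sleep(0.1)  # pretend that loading the weights is slow
    return DummyModel()


# Executed once, when the server module is run or imported.
MODEL = load_model()


def handle_request(image_bytes: bytes) -> dict:
    """Serve one request with the already-loaded model."""
    return MODEL.predict(image_bytes)


if __name__ == "__main__":
    for payload in (b"abc", b"defgh"):
        print(handle_request(payload))
    print("model loads:", LOAD_COUNT)
```

Because `MODEL` is created at module level, every call to `handle_request` pays only the inference cost, which is where the latency saving comes from.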

The user uploads an image in the web app.
The Flask Server acts as a client of the Model Server. It sends the input image (in some encoded form) to the frontend of the Model Server.
The Model Server invokes the RequestHandler for predictions. A worker connects to the backend of the Model Server.
The model yields predictions, and the worker sends them to the backend of the Model Server.
The frontend of the Model Server responds to the Flask Server with the predictions.
The Flask Server renders an HTML template with the predictions displayed.
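The request flow described above can be simulated with standard-library queues and threads. This is only a sketch under stated assumptions: the two queues stand in for the Model Server's frontend and backend, base64 is an assumed choice for the "encoded form" of the image, `predict` is a hypothetical placeholder for the loaded model, and the reply travels over a per-request queue rather than back through the backend and frontend. It does not reproduce the project's actual transport.

```python
import base64
import queue
import threading

frontend = queue.Queue()  # requests arriving from the Flask Server (the client)
backend = queue.Queue()   # requests handed over to workers


def predict(image_bytes: bytes) -> dict:
    """Hypothetical stand-in for the loaded model."""
    return {"label": "placeholder", "num_bytes": len(image_bytes)}


def broker() -> None:
    """Model Server: forward each frontend request to the backend."""
    while True:
        item = frontend.get()
        backend.put(item)
        if item is None:  # shutdown signal, passed on to the worker
            break


def worker() -> None:
    """Worker: decode the image, run the model, reply to the client."""
    while True:
        item = backend.get()
        if item is None:
            break
        encoded_image, reply_queue = item
        reply_queue.put(predict(base64.b64decode(encoded_image)))


def flask_side_request(image_bytes: bytes) -> dict:
    """What the Flask Server would do for one uploaded image."""
    reply_queue = queue.Queue(maxsize=1)
    frontend.put((base64.b64encode(image_bytes), reply_queue))
    return reply_queue.get(timeout=5)


# Start the simulated Model Server and a single worker.
threads = [
    threading.Thread(target=broker, daemon=True),
    threading.Thread(target=worker, daemon=True),
]
for t in threads:
    t.start()

if __name__ == "__main__":
    print(flask_side_request(b"fake image bytes"))
    frontend.put(None)  # stop the broker and the worker
    for t in threads:
        t.join()
```

In a real deployment the Flask view would pass the returned predictions to its HTML template, and more workers could be attached to the backend to serve requests in parallel.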

Project- How to build low-latency deep-learning-based flask app
