Inference

Introduction

The fourth swimlane in Fig. 1 illustrates the Inference work stream, in which the inference application is developed and deployed. When the ML model is updated, the inference application loads the new model from the registry and then serves it.

| Activity | Description | Inputs | Outputs |
| --- | --- | --- | --- |
| Develop Inference Application | The inference application is developed using the coded ML model from the Modeling work stream. | Coded model | Inference app |
| Deploy Inference Application | The developed inference application is deployed to the server infrastructure. | Developed inference app | Deployed inference app |
| Update Model | The inference application loads the updated ML model from the registry. | Deployed inference app & updated model | Deployed inference app loaded with updated model |
| Serve Model | The inference application serves the loaded ML model. The inference inputs are fed from the Pipelining work stream. | Inference inputs & deployed inference app loaded with updated model | Inference outputs |

Develop Inference Application

The software engineer designs and implements the inference application that runs the coded ML model, basing the design on the business requirements. For example, the application may process inference input files in batches or expose RESTful APIs to infer on a message stream.
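As a rough illustration of the RESTful variant, the sketch below wraps a model behind a single prediction endpoint. The framework choice (FastAPI), the pickled scikit-learn model file `model.pkl`, and the flat feature-vector payload are all assumptions for the sake of the example, not requirements of this work stream.

```python
# Minimal RESTful inference app (sketch).
# Assumptions: the coded model is a scikit-learn estimator pickled to
# model.pkl; a real app would add input validation, logging, and auth.
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="inference-app")

with open("model.pkl", "rb") as f:  # hypothetical artifact name
    model = pickle.load(f)

class PredictRequest(BaseModel):
    features: list[float]  # one feature vector per request

class PredictResponse(BaseModel):
    prediction: float

@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    # model.predict expects a 2-D array: one row per sample
    y = model.predict([req.features])
    return PredictResponse(prediction=float(y[0]))
```

Assuming the file is named `main.py`, such an app could be run locally with `uvicorn main:app`. A batch-oriented design would replace the endpoint with a scheduled job that reads an input file and writes the predictions out.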

Deploy Inference Application

The inference application is deployed to the server infrastructure on-premises or in the cloud. The deployment can be performed manually or automated with Infrastructure as Code (IaC).
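Dedicated IaC tools such as Terraform or Pulumi are the usual choice for automating this step; purely as a small illustration of scripted deployment, the sketch below builds and runs a containerized inference app with the Docker SDK for Python. The image tag, container name, and port are assumptions.

```python
# Scripted deployment sketch using the Docker SDK for Python
# (pip install docker). The image tag, port, and container name are
# illustrative assumptions; production setups typically rely on
# dedicated IaC tools (Terraform, Pulumi, CloudFormation) instead.
import docker

client = docker.from_env()

# Build the inference app image from the current directory's Dockerfile.
image, _ = client.images.build(path=".", tag="inference-app:latest")

# Run the container, exposing the app's port on the host.
container = client.containers.run(
    "inference-app:latest",
    name="inference-app",
    ports={"8000/tcp": 8000},
    detach=True,
)
print(f"Deployed container {container.short_id}")
```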

Update Model

The inference application loads the ML model from the registry. When the ML model is updated in the Training work stream, the inference application will be triggered to load the updated model automatically.
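The exact mechanics depend on the registry tooling. As one concrete possibility, the sketch below polls an MLflow model registry and hot-swaps the in-memory model when a newer version is promoted; the model name, the "Production" stage, and the polling interval are assumptions, and an event-driven trigger (webhook or message queue) could replace the polling loop.

```python
# Model-update sketch against an MLflow registry (pip install mlflow).
# The model name "churn-model" and the "Production" stage are
# illustrative assumptions.
import time

import mlflow
from mlflow.tracking import MlflowClient

MODEL_NAME = "churn-model"
MODEL_URI = f"models:/{MODEL_NAME}/Production"

client = MlflowClient()
model = mlflow.pyfunc.load_model(MODEL_URI)
current = client.get_latest_versions(MODEL_NAME, stages=["Production"])[0].version

while True:
    time.sleep(60)  # poll the registry once a minute
    latest = client.get_latest_versions(MODEL_NAME, stages=["Production"])[0].version
    if latest != current:
        # A newer version was promoted: reload it without redeploying the app.
        model = mlflow.pyfunc.load_model(MODEL_URI)
        current = latest
```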

Serve Model

After the inference application has loaded the ML model, it processes the inference inputs from the Pipelining work stream and produces the inference outputs. The inference outputs are regularly fed back to the Pipelining work stream for validation, and some of them can be selected to prepare training data for future re-training.
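To make this flow concrete, the sketch below serves a batch of inference inputs and samples a small fraction of the outputs as re-training candidates. The CSV file paths, the pickled model, and the 5% sample rate are illustrative assumptions.

```python
# Batch serving sketch. Input/output paths, the pandas CSV format,
# and the 5% re-training sample rate are illustrative assumptions.
import pickle

import pandas as pd

with open("model.pkl", "rb") as f:  # model loaded in the previous step
    model = pickle.load(f)

# Inference inputs arrive from the Pipelining work stream.
inputs = pd.read_csv("inference_inputs.csv")
outputs = inputs.copy()
outputs["prediction"] = model.predict(inputs)

# Feed all outputs back to the Pipelining work stream for validation.
outputs.to_csv("inference_outputs.csv", index=False)

# Select a small sample to help prepare future training data.
outputs.sample(frac=0.05, random_state=42).to_csv(
    "retraining_candidates.csv", index=False
)
```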