Inference

Introduction

The fourth swimlane in Fig. 1 illustrates the Inference work stream, in which the inference application is developed and deployed. When the ML model is updated, the inference application loads the new model from the registry and then serves it.

| Activity | Description | Inputs | Outputs |
| --- | --- | --- | --- |
| Develop Inference Application | The inference application is developed using the coded ML model from the Modeling work stream. | Coded model | Inference app |
| Deploy Inference Application | The developed inference application is deployed to the server infrastructure. | Developed inference app | Deployed inference app |
| Update Model | The inference application loads the updated ML model from the registry. | Deployed inference app & updated model | Deployed inference app loaded with updated model |
| Serve Model | The inference application serves the loaded ML model. The inference inputs are fed from the Pipelining work stream. | Inference inputs & deployed inference app loaded with updated model | Inference outputs |

Develop Inference Application

The software engineer designs and implements the inference application that runs the coded ML model, basing the design on the business requirements. For example, the application may process inference input files in batches or expose RESTful APIs to infer on a message stream.
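As a rough illustration of the RESTful variant, the sketch below wraps a model behind a single prediction endpoint. The framework choice (FastAPI), the pickled scikit-learn model file `model.pkl`, and the flat feature-vector payload are all assumptions for the sake of the example, not requirements of this work stream.

```python
# Minimal RESTful inference app (sketch).
# Assumptions: the coded model is a scikit-learn estimator pickled to
# model.pkl; a real app would add input validation, logging, and auth.
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="inference-app")

with open("model.pkl", "rb") as f:  # hypothetical artifact name
    model = pickle.load(f)

class PredictRequest(BaseModel):
    features: list[float]  # one feature vector per request

class PredictResponse(BaseModel):
    prediction: float

@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    # model.predict expects a 2-D array: one row per sample
    y = model.predict([req.features])
    return PredictResponse(prediction=float(y[0]))
```

Assuming the file is named `main.py`, such an app could be run locally with `uvicorn main:app`. A batch-oriented design would replace the endpoint with a scheduled job that reads an input file and writes the predictions out.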

Deploy Inference Application

The inference application is deployed to the server infrastructure on-premises or in the cloud. The deployment can be performed manually or automated with Infrastructure as Code (IaC).
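Dedicated IaC tools such as Terraform or Pulumi are the usual choice for automating this step; purely as a small illustration of scripted deployment, the sketch below builds and runs a containerized inference app with the Docker SDK for Python. The image tag, container name, and port are assumptions.

```python
# Scripted deployment sketch using the Docker SDK for Python
# (pip install docker). The image tag, port, and container name are
# illustrative assumptions; production setups typically rely on
# dedicated IaC tools (Terraform, Pulumi, CloudFormation) instead.
import docker

client = docker.from_env()

# Build the inference app image from the current directory's Dockerfile.
image, _ = client.images.build(path=".", tag="inference-app:latest")

# Run the container, exposing the app's port on the host.
container = client.containers.run(
    "inference-app:latest",
    name="inference-app",
    ports={"8000/tcp": 8000},
    detach=True,
)
print(f"Deployed container {container.short_id}")
```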

Update Model

The inference application loads the ML model from the registry. When the ML model is updated in the Training work stream, the inference application will be triggered to load the updated model automatically.
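The exact mechanics depend on the registry tooling. As one concrete possibility, the sketch below polls an MLflow model registry and hot-swaps the in-memory model when a newer version is promoted; the model name, the "Production" stage, and the polling interval are assumptions, and an event-driven trigger (webhook or message queue) could replace the polling loop.

```python
# Model-update sketch against an MLflow registry (pip install mlflow).
# The model name "churn-model" and the "Production" stage are
# illustrative assumptions.
import time

import mlflow
from mlflow.tracking import MlflowClient

MODEL_NAME = "churn-model"
MODEL_URI = f"models:/{MODEL_NAME}/Production"

client = MlflowClient()
model = mlflow.pyfunc.load_model(MODEL_URI)
current = client.get_latest_versions(MODEL_NAME, stages=["Production"])[0].version

while True:
    time.sleep(60)  # poll the registry once a minute
    latest = client.get_latest_versions(MODEL_NAME, stages=["Production"])[0].version
    if latest != current:
        # A newer version was promoted: reload it without redeploying the app.
        model = mlflow.pyfunc.load_model(MODEL_URI)
        current = latest
```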

Serve Model

After the inference application has loaded the ML model, it processes the inference inputs from the Pipelining work stream and produces the inference outputs. The inference outputs are regularly fed back to the Pipelining work stream for validation, and some of them can be selected to prepare training data for future re-training.
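To make this flow concrete, the sketch below serves a batch of inference inputs and samples a small fraction of the outputs as re-training candidates. The CSV file paths, the pickled model, and the 5% sample rate are illustrative assumptions.

```python
# Batch serving sketch. Input/output paths, the pandas CSV format,
# and the 5% re-training sample rate are illustrative assumptions.
import pickle

import pandas as pd

with open("model.pkl", "rb") as f:  # model loaded in the previous step
    model = pickle.load(f)

# Inference inputs arrive from the Pipelining work stream.
inputs = pd.read_csv("inference_inputs.csv")
outputs = inputs.copy()
outputs["prediction"] = model.predict(inputs)

# Feed all outputs back to the Pipelining work stream for validation.
outputs.to_csv("inference_outputs.csv", index=False)

# Select a small sample to help prepare future training data.
outputs.sample(frac=0.05, random_state=42).to_csv(
    "retraining_candidates.csv", index=False
)
```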