A service definition is the embodiment of Service Oriented Architecture (SOA), a core building block in BentoML, where the user defines the service runtime architecture and models the logic of the service.
This article will dissect and explain the key components in a service definition. This will give you a comprehensive understanding of what constitutes a service definition and the responsibilities of each key component.
Components
The model service definition created in our previous quickstart guide is shown below.
```python
# bento.py
import bentoml
import numpy as np
from bentoml.io import NumpyNdarray

# Load the runner for the latest ScikitLearn model we just saved
runner = bentoml.sklearn.load_runner("iris_classifier_model:latest")

# Create the iris_classifier_service with the ScikitLearn runner
svc = bentoml.Service("iris_classifier_service", runners=[runner])

# Create an API function with pre- and post-processing logic
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_array: np.ndarray) -> np.ndarray:
    # Define pre-processing logic here
    result = runner.run(input_array)
    # Define post-processing logic here
    return result
```
As the example shows, a BentoML service consists of three components:
- Inference APIs
- Runners
- Service
Inference APIs
The inference API defines how to remotely access service functionality and customize preprocessing and postprocessing logic.
```python
# Create an API function with pre- and post-processing logic
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_array: np.ndarray) -> np.ndarray:
    # Define pre-processing logic here
    result = runner.run(input_array)
    # Define post-processing logic here
    return result
```
Decorating a function with `@svc.api` declares it part of an API that can be accessed remotely. A service can have one or more APIs. The `input` and `output` parameters of the `@svc.api` decorator further define the expected IO (input/output) format of the API.

In the example above, the API defines its IO type as `numpy.ndarray` through the `NumpyNdarray` IO descriptor. IO descriptors help verify that the input and output conform to the expected format and schema, and convert them to and from the underlying types. BentoML supports a variety of IO descriptors, including `PandasDataFrame`, `String`, `Image`, and `File`.
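Conceptually, an IO descriptor does two jobs: validate the incoming payload against an expected schema, and convert it to and from the native Python type. A minimal sketch of that idea in plain NumPy (the `validate_ndarray` helper is hypothetical, not BentoML's actual implementation):

```python
import numpy as np

def validate_ndarray(payload, dtype=np.float64, expected_ndim=2):
    """Hypothetical helper: convert a raw JSON-like payload to a NumPy
    array and check it against an expected schema, roughly as an IO
    descriptor would before handing it to the API function."""
    arr = np.asarray(payload, dtype=dtype)  # convert from the raw type
    if arr.ndim != expected_ndim:
        raise ValueError(f"expected {expected_ndim} dimensions, got {arr.ndim}")
    return arr

# A JSON body like [[5.1, 3.5, 1.4, 0.2]] becomes a (1, 4) float batch
batch = validate_ndarray([[5.1, 3.5, 1.4, 0.2]])
print(batch.shape)  # (1, 4)
```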
The API is also a natural place to define pre-processing and post-processing logic for the model service. In the example above, the logic defined in the `predict` function will be packaged and deployed as part of the service.
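For instance, pre-processing might coerce the request into a well-formed batch, and post-processing might map raw class indices to human-readable labels. A minimal sketch in plain NumPy (the label names and transformations are illustrative assumptions, not taken from the original service):

```python
import numpy as np

IRIS_LABELS = ["setosa", "versicolor", "virginica"]  # illustrative labels

def preprocess(input_array) -> np.ndarray:
    # Example pre-processing: ensure a 2-D float batch for the model
    return np.atleast_2d(np.asarray(input_array, dtype=np.float64))

def postprocess(class_indices: np.ndarray) -> list:
    # Example post-processing: map numeric class indices to label strings
    return [IRIS_LABELS[int(i)] for i in class_indices]

batch = preprocess([5.1, 3.5, 1.4, 0.2])
print(batch.shape)                      # (1, 4)
print(postprocess(np.array([0, 2])))    # ['setosa', 'virginica']
```

In a real service these steps would wrap the `runner.run(...)` call inside the `predict` function.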
BentoML is designed to parallelize API logic by launching multiple instances of the API server based on available system resources. For best performance, we recommend defining an asynchronous API. For more information, refer to API and IO Descriptors .
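An asynchronous API keeps the same shape, but uses `async def` and awaits the inference call so the API server can serve other requests while waiting. The sketch below uses a stub runner with an awaitable `async_run` method standing in for a real BentoML runner, so the pattern runs without a saved model:

```python
import asyncio
import numpy as np

class StubRunner:
    """Stand-in for a BentoML runner with an awaitable run method."""
    async def async_run(self, input_array: np.ndarray) -> np.ndarray:
        await asyncio.sleep(0)  # yield control, as real inference I/O would
        return input_array.sum(axis=1)  # dummy "prediction"

runner = StubRunner()

# In BentoML this function would be decorated with @svc.api(...)
async def predict(input_array: np.ndarray) -> np.ndarray:
    result = await runner.async_run(input_array)
    return result

out = asyncio.run(predict(np.array([[1.0, 2.0], [3.0, 4.0]])))
print(out)  # [3. 7.]
```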
Runners
Runners represent a logical unit of service that can be scaled horizontally to maximize throughput.
```python
# Load the runner for the latest ScikitLearn model we just saved
runner = bentoml.sklearn.load_runner("iris_classifier_model:latest")
```
Runners can be created either by calling a framework-specific `load_runner()` function or by decorating an implementation class with the `@svc.runner` decorator.

The framework-specific functions intelligently load the runner with the optimal configuration for that ML framework. For example, if the ML framework releases the Python GIL and natively supports concurrent access, BentoML creates a single global instance of the runner and routes all API requests to it; otherwise, BentoML creates multiple runner instances based on available system resources. Users can also customize the runtime configuration to fine-tune runner performance.
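The instance-count decision described above can be sketched as follows (the `plan_runner_instances` function and its `framework_releases_gil` flag are illustrative, not BentoML's actual internals):

```python
import os

def plan_runner_instances(framework_releases_gil, cpu_count=None):
    """Sketch of the scheduling idea: one shared global runner when the
    framework releases the GIL and supports concurrent access, otherwise
    one runner instance per available CPU."""
    if framework_releases_gil:
        return 1  # single global instance shared by all API workers
    return cpu_count or os.cpu_count() or 1

print(plan_runner_instances(True))      # 1
print(plan_runner_instances(False, 4))  # 4
```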
The argument to `load_runner()` is the name and version of the model we saved earlier. Using the `latest` tag ensures that the most recent version of the model is loaded. Loading a runner also declares to the builder that this specific model and version should be packaged into the bento when the service is built. A service can also define multiple runners.
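Defining multiple runners follows the same pattern, and a single API can chain them. The sketch below uses plain-Python stubs in place of `load_runner()` results so it runs without saved models; the two-stage pipeline itself is an illustrative assumption:

```python
import numpy as np

class StubRunner:
    """Stand-in for a BentoML runner: exposes run() over a NumPy batch."""
    def __init__(self, fn):
        self._fn = fn

    def run(self, input_array):
        return self._fn(input_array)

# Two "runners", e.g. a feature extractor followed by a classifier
extractor = StubRunner(lambda x: x / x.max())        # normalize features
classifier = StubRunner(lambda x: x.argmax(axis=1))  # pick the best class

# In BentoML, both would be passed to bentoml.Service(..., runners=[...])
def predict(input_array: np.ndarray) -> np.ndarray:
    features = extractor.run(input_array)
    return classifier.run(features)

print(predict(np.array([[0.2, 0.9], [0.8, 0.1]])))  # [1 0]
```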
To learn more, see the Runner Advanced Guide .
Service
A service is composed of APIs and runners, and is initialized via `bentoml.Service()`.
```python
# Create the iris_classifier_service with the ScikitLearn runner
svc = bentoml.Service("iris_classifier_service", runners=[runner])
```
The first argument to the service is its name, which becomes the name of the bento once the service is built.

Runners are made part of a service by passing them in via the `runners` keyword argument. The build-time and runtime behavior of the service can then be customized through the `svc` instance.