BentoML Core Concepts (1): Service Definition


A service definition embodies Service Oriented Architecture (SOA) and is the core building block in BentoML. It is where the user defines the service's runtime architecture and its model-serving logic.

This article dissects and explains the key components of a service definition. By the end, you should understand what makes up a service definition and what each component is responsible for.

Components

The model service definition created in our previous quickstart guide is shown below.

# bento.py
import bentoml
import numpy as np

from bentoml.io import NumpyNdarray

# Load the runner for the latest ScikitLearn model we just saved
runner = bentoml.sklearn.load_runner("iris_classifier_model:latest")

# Create the iris_classifier_service with the ScikitLearn runner
svc = bentoml.Service("iris_classifier_service", runners=[runner])

# Create API function with pre- and post- processing logic
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_array: np.ndarray) -> np.ndarray:
    # Define pre-processing logic
    result = runner.run(input_array)
    # Define post-processing logic
    return result

As the example shows, a BentoML service consists of three components:

  • Inference APIs
  • Runners
  • Service
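To make the relationship between these three pieces concrete, here is a minimal plain-Python toy model. It is a sketch under stated assumptions, not BentoML's implementation. ToyRunner and ToyService are hypothetical names. The toy api decorator omits the input/output descriptors that the real @svc.api takes, and handle() stands in for the HTTP layer that receives remote requests.

```python
# Toy model of how a service ties APIs and runners together.
# ToyRunner and ToyService are hypothetical; this is NOT BentoML's code.
from typing import Callable, Dict, List


class ToyRunner:
    """Wraps a model's predict function, standing in for a BentoML runner."""

    def __init__(self, name: str, predict_fn: Callable[[list], list]):
        self.name = name
        self._predict_fn = predict_fn

    def run(self, batch: list) -> list:
        return self._predict_fn(batch)


class ToyService:
    """Holds a name, the runners it depends on, and its registered APIs."""

    def __init__(self, name: str, runners: List[ToyRunner]):
        self.name = name
        self.runners = runners
        self.apis: Dict[str, Callable] = {}

    def api(self, func: Callable) -> Callable:
        # Register the decorated function under its own name so that it
        # can be looked up when a "remote" request arrives.
        self.apis[func.__name__] = func
        return func

    def handle(self, api_name: str, payload):
        # Stand-in for the HTTP layer: route a request to the matching API.
        return self.apis[api_name](payload)


# A fake "model": class 0 if the first feature is below 5.5, else class 1.
runner = ToyRunner("toy_model", lambda rows: [0 if r[0] < 5.5 else 1 for r in rows])
svc = ToyService("toy_service", runners=[runner])


@svc.api
def predict(rows: list) -> list:
    return runner.run(rows)


print(svc.handle("predict", [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]))  # [0, 1]
```

The service only knows its name, its runners and its API table. The API function is the one place that calls into a runner, which mirrors the split described in the sections that follow.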

Inference APIs

An inference API defines how the service's functionality is accessed remotely, and it is where you customize the pre-processing and post-processing logic.

# Create API function with pre- and post- processing logic
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_array: np.ndarray) -> np.ndarray:
    # Define pre-processing logic
    result = runner.run(input_array)
    # Define post-processing logic
    return result

Decorating a function with @svc.api declares that the function is part of the APIs that can be accessed remotely. A service can have one or more APIs. The input and output arguments of the @svc.api decorator further define the expected IO (input/output) format of the API.

In the above example, the API uses the NumpyNdarray IO descriptor to declare numpy.ndarray as its input and output type. IO descriptors help validate that the input and output conform to the expected format and schema, and they convert data to and from native types. BentoML supports several IO descriptors, including PandasDataFrame, String, Image and File.
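The two jobs of an IO descriptor, validation and conversion, can be sketched in a few lines. The example below is hypothetical: ToyMatrixDescriptor is not BentoML's NumpyNdarray. It assumes the wire format is JSON text, and it uses a plain list of float rows as the "native" type so that the sketch needs only the standard library.

```python
# Hypothetical sketch of an IO descriptor: validate the incoming payload
# against an expected schema, then convert between the wire format (JSON
# text here) and a native Python type. NOT BentoML's NumpyNdarray.
import json
from typing import List, Optional


class ToyMatrixDescriptor:
    def __init__(self, n_columns: Optional[int] = None):
        # Expected number of columns per row; None disables the check.
        self.n_columns = n_columns

    def from_request(self, body: str) -> List[List[float]]:
        data = json.loads(body)  # invalid JSON raises JSONDecodeError (a ValueError)
        if not isinstance(data, list) or not all(isinstance(row, list) for row in data):
            raise ValueError("expected a 2-D JSON array")
        if self.n_columns is not None:
            for row in data:
                if len(row) != self.n_columns:
                    raise ValueError(f"expected {self.n_columns} columns, got {len(row)}")
        # Convert every element to float, the "native" type in this sketch.
        return [[float(x) for x in row] for row in data]

    def to_response(self, result: list) -> str:
        # Convert the API's return value back to the wire format.
        return json.dumps(result)


desc = ToyMatrixDescriptor(n_columns=4)
rows = desc.from_request("[[5.1, 3.5, 1.4, 0.2]]")
print(rows)                   # [[5.1, 3.5, 1.4, 0.2]]
print(desc.to_response([0]))  # [0]
```

Because the descriptor rejects malformed input before the API function runs, the API body can assume that it always receives well-shaped data.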

The API is also a good place to define pre-processing and post-processing logic for model serving. In the above example, the logic defined in the predict function will be packaged and deployed as part of the service logic.
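As an illustration of what might replace the placeholder comments in predict, here is a sketch with a fake runner. fake_runner_run and the clipping rule are hypothetical. The label order matches scikit-learn's load_iris() target names.

```python
# Sketch of pre-/post-processing around a runner call inside an API function.
# fake_runner_run and the clipping rule are hypothetical; the label order
# matches scikit-learn's load_iris() target_names.
IRIS_LABELS = ["setosa", "versicolor", "virginica"]


def fake_runner_run(rows):
    # Stand-in for runner.run(): a crude rule on petal length (column 2).
    return [0 if r[2] < 2.5 else (1 if r[2] < 5.0 else 2) for r in rows]


def predict(rows):
    # Pre-processing: treat negative measurements as invalid, clip them to 0.
    cleaned = [[max(0.0, float(x)) for x in row] for row in rows]
    # Model inference, delegated to the runner.
    class_ids = fake_runner_run(cleaned)
    # Post-processing: turn class indices into human-readable labels.
    return [IRIS_LABELS[i] for i in class_ids]


print(predict([[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]))
# ['setosa', 'virginica']
```

Keeping this logic in the API function, rather than in the runner, means the runner stays a pure model call that can be scaled independently.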

BentoML is designed to parallelize API logic by starting multiple instances of the API server based on the available system resources. For the best performance, we recommend defining asynchronous APIs. To learn more, see API and IO Descriptors.
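The benefit of an asynchronous API shows up when a request spends time waiting, for example on a call to a runner. The pure-asyncio sketch below does not use BentoML. fake_async_runner is hypothetical, and asyncio.sleep simulates the wait. In a BentoML service, the equivalent would be declaring the API function with async def.

```python
# Why async APIs help: while one request awaits an I/O-bound step (such as
# a call to a runner), the event loop can serve other requests.
# fake_async_runner is hypothetical; asyncio.sleep simulates the latency.
import asyncio
import time


async def fake_async_runner(rows):
    await asyncio.sleep(0.1)  # simulated inference/network latency
    return [len(r) for r in rows]


async def predict(rows):
    # Pre-processing would go here.
    result = await fake_async_runner(rows)
    # Post-processing would go here.
    return result


async def main():
    start = time.perf_counter()
    # Ten concurrent requests overlap their waiting instead of queuing.
    results = await asyncio.gather(*(predict([[1, 2, 3]]) for _ in range(10)))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results[0], f"{elapsed:.2f}s")  # elapsed is close to 0.1s, not 1s
```

With a synchronous function, each call would block its worker for the full wait. Here the ten waits overlap within a single event loop.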

Runners

Runners represent a unit of serving logic that can be scaled horizontally to maximize throughput.

# Load the runner for the latest ScikitLearn model we just saved
runner = bentoml.sklearn.load_runner("iris_classifier_model:latest")

Runners can be created either by calling a framework-specific load_runner() function or by decorating an implementation class with the @svc.runner decorator.

Framework-specific functions intelligently load runners with a runtime configuration optimized for the given ML framework, so the runner makes the best use of the framework and the underlying hardware.

For example, if the ML framework releases the Python GIL and natively supports concurrent access, BentoML creates a single global instance of the runner and routes all API requests to it. Otherwise, BentoML creates multiple runner instances based on the available system resources.
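The decision above, plus the request routing that follows from it, can be sketched in plain Python. This is an illustration under assumptions, not BentoML's scheduler. plan_runner_instances and RoundRobinRouter are hypothetical names. The "one instance per CPU" rule and the round-robin routing are simplifications chosen for the sketch.

```python
# Hypothetical sketch of the runner-instance decision described above. The
# names and the "one instance per CPU" rule are illustrative assumptions,
# not BentoML's actual scheduling code.
import itertools
import os


def plan_runner_instances(releases_gil: bool, cpu_count: int) -> int:
    """Return how many runner instances to start."""
    if releases_gil:
        # The framework runs concurrently outside the GIL, so a single
        # global instance can serve every API request.
        return 1
    # Otherwise the GIL serializes Python-level work, so start several
    # instances (one per available CPU in this sketch).
    return max(1, cpu_count)


class RoundRobinRouter:
    """Spreads incoming requests evenly across runner instances."""

    def __init__(self, n_instances: int):
        self.handled = [0] * n_instances  # requests handled per instance
        self._order = itertools.cycle(range(n_instances))

    def route(self) -> int:
        idx = next(self._order)  # pick the next instance in turn
        self.handled[idx] += 1
        return idx


n = plan_runner_instances(releases_gil=False, cpu_count=os.cpu_count() or 1)
router = RoundRobinRouter(n)
for _ in range(2 * n):
    router.route()
print(n, router.handled)  # every instance handles exactly 2 requests
```

Adding instances is what "scaling horizontally" means here. Throughput grows with the number of replicas instead of depending on one instance getting faster.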

Users can also customize the runtime configuration to fine-tune runner performance.

The arguments to load_runner() are the name and version of the model we saved earlier. Using the latest keyword ensures that the most recent version of the model is loaded. load_runner() also declares to the builder which model and version should be packaged into the Bento when the service is built. A service can also define multiple runners.
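How a "name:latest" tag might resolve to a concrete version can be sketched as follows. The in-memory store, the version names v1/v2 and their timestamps are hypothetical. BentoML's real model store generates its own version strings and keeps its own metadata. Treating a tag without a version as "latest" is also an assumption of this sketch.

```python
# Hypothetical sketch of resolving a "name:version" tag, where "latest"
# picks the most recently saved version. The store layout, versions and
# timestamps are made up for illustration; BentoML's model store differs.
from datetime import datetime
from typing import Dict

# model name -> {version: creation time}
MODEL_STORE: Dict[str, Dict[str, datetime]] = {
    "iris_classifier_model": {
        "v1": datetime(2022, 4, 1, 9, 0),
        "v2": datetime(2022, 4, 10, 15, 30),
    }
}


def resolve_tag(tag: str) -> str:
    name, _, version = tag.partition(":")
    versions = MODEL_STORE[name]  # unknown model name raises KeyError
    if version in ("", "latest"):
        # Pick the version with the newest creation time.
        version = max(versions, key=versions.get)
    elif version not in versions:
        raise KeyError(f"{name} has no version {version!r}")
    return f"{name}:{version}"


print(resolve_tag("iris_classifier_model:latest"))  # iris_classifier_model:v2
```

Resolving the tag to a concrete version is what lets the builder record exactly which model to package, even though the source code only says latest.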

To learn more, see the Runner Advanced Guide.

Service

A service is composed of APIs and runners, and it is initialized with bentoml.Service().

# Create the iris_classifier_service with the ScikitLearn runner
svc = bentoml.Service("iris_classifier_service", runners=[runner])

The first argument to bentoml.Service() is the service name, which becomes the name of the Bento once the service is built.

Runners should be part of a Service and are passed in through the runners keyword argument. The build-time and runtime behavior of the service can be customized through the svc instance.


Origin juejin.im/post/7085489020637020167