How to reduce the call cost of large AI models and improve compliance through a gateway

Author: Zhao Weiji (Zhao Wei)

AIGC technology, represented by ChatGPT, has brought great changes to enterprise production and earned a place in enterprise application development. With their powerful learning ability, large AI models can help people complete a variety of complex tasks: helping developers write and debug code, helping researchers quickly get up to speed in a research field, writing product descriptions for marketers, creating new designs for designers, and so on. Many enterprises are exploring how to reduce the cost of using large AI models, and managing large-model APIs through a gateway has become a common requirement.

How does Higress reduce the cost of using large AI models?

Taking OpenAI as an example: OpenAI's API calls are billed not by request count or subscription time, but by the usage of each request. For large AI models, the number of tokens input to and output by the model is a good measure of the complexity of the current inference task, so billing by token is OpenAI's standard billing policy. Token prices also differ across models: a more complex model produces better results but costs more per token. OpenAI handles user authentication and billing by issuing API keys to users.
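
As a back-of-the-envelope illustration, the cost of a single call can be estimated from its token usage. The minimal sketch below assumes a hypothetical price per 1,000 tokens; real rates vary by model and are listed on OpenAI's pricing page.

package main

import "fmt"

// A minimal sketch for estimating per-request cost from token usage.
// pricePer1K is a hypothetical price per 1,000 tokens, not an official rate.
func estimateCostUSD(promptTokens, completionTokens int, pricePer1K float64) float64 {
  return float64(promptTokens+completionTokens) / 1000.0 * pricePer1K
}

func main() {
  // e.g. 1200 prompt tokens and 300 completion tokens at $0.002 per 1K tokens
  fmt.Printf("$%.4f\n", estimateCostUSD(1200, 300, 0.002)) // prints $0.0030
}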


It is obviously impractical for an organization to apply for a separate API key for each member. Scattered API keys make it harder for the organization to meter, manage, and pay for API usage, which raises the cost of using large AI models. Moreover, organizations need to manage which AI models are selected, how often they are called, which members may access them, and what data is exposed to the models.

Building on its rich plugin capabilities, Higress provides authentication and authorization, request filtering, traffic control, usage monitoring, and security protection, making an organization's API interactions with large AI models safer, more reliable, and more observable. With Higress's authentication and authorization capabilities, an organization can manage call volume and billing for AI models through a single unified API key and grant team members differentiated access to the models. With Higress's traffic-control capabilities, it can set differentiated rate limits for different models and users, effectively lowering the cost of using AI models. With Higress's request-interception capabilities, it can filter requests containing sensitive information and keep internal site resources from external exposure, safeguarding internal data. And with the out-of-the-box metrics and logging of the commercial version of Higress [1], it can observe and analyze each user's AI model usage and formulate a more reasonable usage policy.

Hands-on: connecting Higress to the OpenAI large language model

Next, taking Higress connecting to the OpenAI large language model as an example, we introduce how Higress seamlessly integrates with large AI models. The overall scheme is shown in the figure below. Based on WASM, we extend a Higress plugin that proxies and forwards requests to the OpenAI language model. With the Key Auth authentication plugin provided by Higress, we implement multi-tenant authentication under a unified API key. With the request-filtering capability of the Request Block plugin, we intercept requests containing sensitive information to protect user data.

[Figure: overall solution architecture]

Prerequisites

  1. Install Higress; see the Higress installation and deployment document [2]
  2. Prepare a development environment for writing Wasm plugins in Go; see Developing Wasm plugins in Go [3]

WASM-based AI Proxy Plugin

Below we present an API proxy plugin solution for large AI models based on Higress and WASM. Higress supports external extension through WASM; the multi-language ecosystem and hot-swapping mechanism of WASM plugins make them easy to implement and deploy. Higress also allows plugins to call external services, providing an efficient path for implementing an AI proxy plugin.

Implementation example

We give an example implementation of the OpenAI API proxy plugin; see the AI proxy plugin [4] for details. Once the plugin is configured, the code below automatically forwards user requests to the OpenAI API over HTTP and receives its responses, completing the AI model call. The implementation steps are as follows:

  1. Use the RouteCluster method to specify the target OpenAI API host, determine the concrete path to which user requests are forwarded, and create a new HTTP client for proxying requests.
func parseConfig(json gjson.Result, config *MyConfig, log wrapper.Log) error {
  chatgptUri := json.Get("chatgptUri").String()
  var chatgptHost string
  if chatgptUri == "" {
    config.ChatgptPath = "/v1/completions"
    chatgptHost = "api.openai.com"
  } // by default, requests are forwarded to the OpenAI API
  ...
  config.client = wrapper.NewClusterClient(wrapper.RouteCluster{
    Host: chatgptHost,
  }) // determine the concrete forwarding host via the RouteCluster method
  ...
}
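
For context, the snippets above and below reference several configuration fields. A hypothetical sketch of the MyConfig struct they imply is shown here; the authoritative definition lives in the plugin source [4]:

// Hypothetical shape of the plugin configuration, inferred from the
// surrounding snippets rather than copied from the plugin source.
type MyConfig struct {
  ApiKey      string // unified OpenAI API key held by the gateway
  Model       string // target model name, e.g. "curie"
  PromptParam string // URL query parameter that carries the user prompt
  ChatgptPath string // OpenAI API path, e.g. "/v1/completions"
  HumainId    string // stop word marking the human speaker's turns
  AIId        string // stop word marking the AI speaker's turns
  client      wrapper.HttpClient // HTTP client created in parseConfig
}
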
  2. Encapsulate the user request in the OpenAI API format, forward it and receive the response through the HTTP client, and relay the response to the user.
// Request body template accepted by the OpenAI API; see: https://platform.openai.com/docs/api-reference/chat
const bodyTemplate string = `
{
  "model": "%s",
  "prompt": "%s",
  "temperature": 0.9,
  "max_tokens": 150,
  "top_p": 1,
  "frequency_penalty": 0.0,
  "presence_penalty": 0.6,
  "stop": ["%s", "%s"]
}
`
func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig, log wrapper.Log) types.Action {
  ...
  // wrap the user's input into an OpenAI API request body
  body := fmt.Sprintf(bodyTemplate, config.Model, prompt[0], config.HumainId, config.AIId)
  // forward the request via the HTTP client
  err = config.client.Post(config.ChatgptPath, [][2]string{
    {"Content-Type", "application/json"},
    {"Authorization", "Bearer " + config.ApiKey},
  }, []byte(body),
    func(statusCode int, responseHeaders http.Header, responseBody []byte) {
      var headers [][2]string
      for key, value := range responseHeaders {
        headers = append(headers, [2]string{key, value[0]})
      }
      // receive the OpenAI API response and relay it to the user
      proxywasm.SendHttpResponse(uint32(statusCode), headers, responseBody, -1)
    }, 10000)
  ...
}
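
For completeness, a wasm-go plugin also needs an entry point that registers these two callbacks. A minimal sketch following the open-source wasm-go wrapper conventions (the plugin name here is illustrative):

package main

import (
  "github.com/alibaba/higress/plugins/wasm-go/pkg/wrapper"
)

func main() {
  // register the config parser and the request-header callback with Higress
  wrapper.SetCtx(
    "ai-proxy", // illustrative plugin name
    wrapper.ParseConfigBy(parseConfig),
    wrapper.ProcessRequestHeadersBy(onHttpRequestHeaders),
  )
}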

The process for enabling the custom AI-proxy Wasm plugin in Higress is as follows:

[Figure: enabling the custom Wasm plugin in the Higress console]

This example provides the compiled AI-proxy-plugin Wasm file, with the corresponding Docker image already built and pushed. The recommended configuration is as follows:

[Figure: recommended plugin configuration]

Plugin configuration instructions

The plugin is easy to configure and supports proxy forwarding at the global, domain, or route level. Route-level configuration is recommended: select the corresponding route, open its policies, and enable the plugin. The configuration fields include:

[Figure: plugin configuration fields]

An example configuration is as follows:

AI-Proxy-Plugin-Config

apiKey: "xxxxxxxxxxxxxxxxxx"
model: "curie"
promptParam: "text"

With this configuration, the gateway proxies requests to the curie model of the OpenAI API, and the user passes the input through the text query parameter in the URL.

curl "http://{GatewayIP}/?text=Say,hello"

And receive the response from the OpenAI API:

[Figure: sample response from the OpenAI API]
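
For reference, a completions-style response from the OpenAI API has roughly the following shape (all values here are illustrative):

{
  "id": "cmpl-xxxx",
  "object": "text_completion",
  "created": 1690000000,
  "model": "curie",
  "choices": [
    {
      "text": "Hello! How can I help you today?",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 4, "completion_tokens": 9, "total_tokens": 13 }
}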

Multi-tenant authentication based on Key Auth

Rather than issuing an AI API key to every member, an enterprise can rely on the authentication and authorization capabilities of the Higress gateway, using internal credentials (such as Key Auth) to manage members' access to AI models, restricting which services and models each member may use, and relying on a single unified AI API key for proxying requests, thus managing API usage in one place. Below we take Key Auth as an example to introduce Higress's multi-tenant authentication capabilities.

The Key Auth plugin implements API-key-based authentication and authorization in the gateway. It supports parsing the API key from a URL parameter or request header of the HTTP request and verifies whether that key is allowed to access the service. Multi-tenant authentication in the Higress gateway is achieved by completing a global configuration and a route-level configuration under Higress console - Plugin Market - Key Auth.

[Figure: Key Auth plugin in the Higress console plugin market]

Key Auth global configuration example

# the following defines two consumers of the AI model service
consumers:
- credential: "xxxxxx"
  name: "consumer1"
- credential: "yyyyyy"
  name: "consumer2"
global_auth: false
in_header: true
keys:
- "apikey"

[Figure: Key Auth route-level configuration in the console]

Key Auth route-level configuration example

allow: [consumer1]

The above configuration defines the consumers of the AI model service; under the current route, only consumer1 is allowed to access it.

curl "http://{GatewayIP}/?text=Say,hello"
#请求未提供 API Key,返回401

curl "http://{GatewayIP}/?text=Say,hello" -H "apikey:zzzzzz"
#请求提供的 API Key 未在消费者组内,无权访问,返回401

curl  "http://{GatewayIP}/?text=Say,hello" -H "apikey:yyyyyy"
#根据请求提供的 API Key匹配到的调用者无AI模型服务的访问权限,返回403

curl "http://{GatewayIP}/?text=Say,hello" -H "apikey:xxxxxx"
#请求合法且有AI模型服务访问权限,请求将被代理到AI模型,正常得到OpenAI API的响应

Beyond gateway-level multi-tenant authentication, Higress also provides capabilities such as rate limiting. The Key Rate Limit plugin can throttle request rates per consumer, limiting how much of a high-cost large-AI-model service any single key can consume; a configuration sketch is shown below. Combining multi-tenant authentication with plugins such as rate limiting, Higress gives an organization full control over the access rights, call volume, and calling cost of large-AI-model APIs.
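
As a sketch, a route-level Key Rate Limit configuration might look like the following. The field names follow the open-source plugin's documentation, and the credentials reuse those from the Key Auth example above; verify both against your Higress version:

# throttle by the same request header that Key Auth reads
limit_by_header: apikey
limit_keys:
- key: xxxxxx              # consumer1's credential
  query_per_second: 10     # at most 10 requests per second
- key: yyyyyy              # consumer2's credential
  query_per_second: 1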

Data security based on Request Block

For large AI models, especially language models, getting good output often requires the user to supply a sufficiently detailed prompt as model input, which means organizations and individuals may risk leaking data in the process of providing prompts. Ensuring data security while using AI models is therefore an important issue for API callers. It requires strict control over the API call channels to the models: one approach is to use only specific approved models through their published APIs; another is to intercept user requests that contain sensitive information. Both can be achieved through request interception at the gateway layer. Higress provides this capability via the Request Block plugin, which can keep unapproved models from receiving user data and prevent requests containing sensitive information from being exposed to the Internet.

The Request Block plugin blocks HTTP requests based on the URL, request headers, and other characteristics, and can be used to keep certain site resources from being exposed. By configuring blocking rules under Higress console - Plugin Market - Request Block, requests containing sensitive fields can be stopped before they leave the organization.

[Figure: Request Block plugin configuration]

Request Block route-level configuration example

blocked_code: 404
block_urls:
- password
- pw
case_sensitive: false

The above configuration defines URL-based blocking keywords under the current route; requests containing sensitive fields (such as password or pw) will be blocked.

curl "http://{GatewayIP}/?text=Mypassword=xxxxxx" -H "apikey:xxxxxx"
curl "http://{GatewayIP}/?text=pw=xxxxxx" -H "apikey:xxxxxx"
#上述请求将被禁止访问,返回404

Usage observation and analysis based on the commercial version of Higress

For an organization, observing and analyzing each user's AI model calls helps it understand usage and costs; individual users likewise need to know their own call volume and spend. Observation and analysis of calls at the gateway layer is therefore a necessary capability for managing large-model APIs. The commercial version of Higress integrates deeply with various metrics and logging systems and offers an out-of-the-box mechanism for building usage reports, so the usage of each API can be viewed in real time and filtered by various parameters.

Taking per-user observation of calls to the OpenAI Curie model as an example: in the MSE management console, under Cloud-native Gateway - Gateway Instances - Parameter Configuration - Log Format Adjustment, set the request header that distinguishes users, x-mse-consumer, as an observability parameter and add it to the observation list. Then go to Observation Analysis - Log Center and configure a statistics chart to complete the API usage observation and analysis. As shown in the figure below, the call volumes of consumer1 and consumer2 against the OpenAI Curie model are presented as a pie chart.

[Figure: pie chart of per-consumer call volume to the OpenAI Curie model]

Bonus: a chatbot on the Higress console sample

The Higress team has deployed an easter-egg chatbot based on a GPT model on the Higress console sample [5]. If you have any questions while using Higress, feel free to ask it!

[Figure: the chatbot on the Higress console sample]

If you find Higress helpful, please visit us on GitHub: Higress [6] and give us a star!

Related Links:

[1] Commercial version of Higress

https://www.alibabacloud.com/zh/product/microservices-engine

[2] Higress installation and deployment document

https://higress.io/zh-cn/docs/ops/deploy-by-helm/#%E6%94%AF%E6%8C%81-istio-crd%E5%8F%AF%E9%80%89

[3] Developing Wasm plugins in Go

https://higress.io/zh-cn/docs/user/wasm-go/

[4] AI proxy plugin

https://github.com/alibaba/higress/tree/main/plugins/wasm-go/extensions/chatgpt-proxy

[5] Higress console sample

http://demo.higress.io/login?redirect=/route

[6] github: Higress

https://github.com/alibaba/higress
