OpenAI update: API adds new function calling capabilities

In the few months since the launch of ChatGPT, OpenAI has added many new features to GPT-3.5 Turbo, GPT-4 and its other models. On June 13, OpenAI released another major update, introducing function calling along with several other API changes.

1 Updated content

(1) Function calling is now available in the Chat Completions API

(2) Updated and more steerable versions of gpt-4 and gpt-3.5-turbo

(3) A new 16k-context version of gpt-3.5-turbo (compared with the standard 4k version)

(4) A 75% price reduction on the embeddings model

(5) A 25% price reduction on input tokens for gpt-3.5-turbo

(6) An announced deprecation timeline for the gpt-3.5-turbo-0301 and gpt-4-0314 models

Items (2) through (5) are about cost and billing, and (6) means the older model snapshots will gradually stop being served. For developers, the most important update is (1): the Chat Completions API now supports function calling. With this feature, the model can decide when an external function is needed and output a JSON object describing the call, which lets developers use the GPT API to understand user intent more precisely, obtain structured data, and convert natural language into external function calls. Put plainly: based on the function description supplied by the developer, the GPT API now understands the function's input and output format requirements and converts the user's natural language into the function's input parameters. A local execution script then runs the corresponding function with those parameters, obtains a result, and returns it to the GPT API, which finally answers the user's question in natural language based on that result. The walkthrough below follows the official function calling example from OpenAI.com.
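To make the term concrete before the walkthrough: a "function description" is just structured metadata, with the parameters declared as a JSON Schema, that the developer sends alongside the conversation. Below is a minimal Python sketch of such a description for the get_current_weather function used in the example; the exact wording of the description strings is illustrative rather than copied from the official example.

# How the developer might describe get_current_weather to the Chat Completions API.
# The "parameters" field is a JSON Schema telling the model what arguments to produce.
get_current_weather_function = {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city to query, e.g. Boston",
            },
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}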

2 The GPT API function calling process

The user asks about the weather: What's the weather like in Boston right now?

Step 1: Call the model with functions and user input via the OpenAI API

① The developer gives GPT a function for querying the weather, get_current_weather, whose required parameter is location.

② The user asks GPT: What's the weather like in Boston right now?

③ GPT understands that the user wants to query the weather, and that the parameter is Boston.

④ GPT returns the chosen function and its parameters to the execution script: 'function_call': {'name': 'get_current_weather', 'arguments': '{"location": "Boston"}'}

Return example:
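A minimal sketch of this first call and the kind of message it returns, written in Python against the pre-1.0 openai library; the API key is a placeholder, and the response shown in the comment is the expected shape rather than captured output.

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

functions = [{
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "The city to query, e.g. Boston"},
        },
        "required": ["location"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What's the weather like in Boston right now?"}],
    functions=functions,
    function_call="auto",  # let the model decide whether a function call is needed
)

message = response["choices"][0]["message"]
# Expected shape when the model decides to call the function:
# {"role": "assistant", "content": None,
#  "function_call": {"name": "get_current_weather",
#                    "arguments": '{"location": "Boston"}'}}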

Note: If in ② the user does not ask "What's the weather like in Boston today?" but simply "What's the weather like today?", GPT does not know which location to query and will ask a follow-up such as "Please provide your location, or tell me which city you want to check the weather for." Once the user gives a city name, GPT proceeds to ③.

Step 2: Call the third-party API based on the response returned by the model

⑤ The script automatically executes the get_current_weather function (checking the weather through some API) with the query parameter Boston, and returns the weather result: { "temperature": 22, "unit": "celsius", "description": "Sunny" }

 Return example:
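A minimal sketch of the execution-script side of this step. The get_current_weather implementation below is a stand-in that returns fixed data; a real script would call an actual third-party weather service here.

import json

def get_current_weather(location, unit="celsius"):
    # Stand-in for a call to a real weather API; fixed data for illustration only.
    return json.dumps({"temperature": 22, "unit": unit, "description": "Sunny"})

# The arguments string comes from the model's function_call in Step 1.
args = json.loads('{"location": "Boston"}')
function_result = get_current_weather(**args)
print(function_result)  # {"temperature": 22, "unit": "celsius", "description": "Sunny"}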

Step 3: Send the return results of the third-party API to the model and summarize them

GPT converts the returned result into natural language and answers the user: it is currently sunny in Boston and the temperature is 22 degrees Celsius.

Return example:
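A minimal sketch of this final round trip, again using the pre-1.0 openai library with a placeholder API key: the assistant's function_call message and the function's result (under role "function") are appended to the conversation, and the model replies in natural language.

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

second_response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[
        {"role": "user", "content": "What's the weather like in Boston right now?"},
        # The assistant message from Step 1 that requested the function call
        {"role": "assistant", "content": None,
         "function_call": {"name": "get_current_weather",
                           "arguments": '{"location": "Boston"}'}},
        # The result produced by the local script in Step 2
        {"role": "function", "name": "get_current_weather",
         "content": '{"temperature": 22, "unit": "celsius", "description": "Sunny"}'},
    ],
)

print(second_response["choices"][0]["message"]["content"])
# e.g. "It is currently sunny in Boston with a temperature of 22 degrees Celsius."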

The process above is not an interaction between the developer and GPT alone, but a three-party interaction among the developer, GPT, and the user, with GPT acting as the "translator" between developer and user. This process solves three problems:

First, it understands the user's intent from natural language and matches it against the available functions to choose the appropriate one;

Second, it extracts the parameters the chosen function requires from natural language and hands them to the developer's function execution script in the prescribed format;

Third, it turns the data (usually JSON) returned by the developer's function execution script back into natural language for the user.

The figure below shows the logic of the GPT API calling a function. The blue part on the right side is imperceptible to the user: what the user actually experiences is only a natural-language question and a natural-language answer.

Note: The GPT API itself has no ability to call functions; what it returns is only a function name and parameter values. The function call is carried out by an execution script written by the developer, whose job is to automatically run the corresponding local function based on the "which function to use" (function name) and "what parameters to pass in" returned by the GPT API. A third-party interface is not required; it simply depends on whether the local function itself needs to call one.
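A minimal sketch of what such an execution script can look like: a registry mapping function names to local Python functions, plus a dispatcher that runs whichever function the model named. The registry contents here are illustrative.

import json

def get_current_weather(location, unit="celsius"):
    # Stand-in implementation; see Step 2 above.
    return json.dumps({"temperature": 22, "unit": unit, "description": "Sunny"})

# The execution script maps function names returned by the API to local functions.
available_functions = {
    "get_current_weather": get_current_weather,
    # register further local functions here
}

def dispatch(message):
    """Run the local function named in the model's message, if it requested one."""
    function_call = message.get("function_call")
    if function_call is None:
        return None  # the model answered directly in natural language
    func = available_functions[function_call["name"]]
    args = json.loads(function_call["arguments"])
    return func(**args)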

Once more, a flow chart describes the process of the GPT API calling external functions:

3 Similarities and differences between the GPT API and GPT plugins

The plugin feature that OpenAI opened up earlier can also call external functions: when the user mentions certain capabilities while talking to ChatGPT, the corresponding external plugin is called automatically. That process happens only within the ChatGPT interface and extends the capabilities of ChatGPT. The GPT API discussed here is used in local interactions, applying GPT's semantic understanding to our own functions: what it extends is the capability of local functions.

4 Summary

The GPT API can understand a function's input and output format requirements from the function description given by the developer, and convert the user's natural language into the input parameters of a local or third-party function. After the developer executes the function, the result is returned to the GPT API, and GPT answers the user's question based on that result. This calling capability can invoke externally packaged tools to handle complex problems. It solves many of the problems that sit between users and functions, and it also offers the following inspirations for GIS developers:

(1) When solving a problem, users often do not know which function to call or which method to use, and they cannot describe their needs in the form of a function signature. The result is that users have needs but do not know which function would meet them, while the function could solve the problem but never gets called. The GPT API now acts as a "translator" between users and functions, understanding both sides.

(2) Functions and methods require fixed, standardized data formats, but extracting the data and formats a function needs from the user's natural language is often a fairly complex task. The GPT API can serve as this bridge, seamlessly connecting natural language and functions. In the past, querying a database meant writing query statements; with the GPT API, you only need to describe in plain language the information you want. Take an SQL query on area and population as an example (with area stored in units of 10,000 square kilometers and population in units of 10,000 people): SELECT area, population FROM region WHERE area > 14 AND population > 2000; → "Query regions with an area greater than 140,000 square kilometers and a population greater than 20 million." A sketch of how this query could be wrapped as a callable function is given below.
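A sketch of how the region query could be exposed through function calling. The function name query_regions, its parameters, and the table layout are hypothetical; the point is that the model only has to fill in min_area and min_population, while the script itself builds and runs the SQL.

# Hypothetical function description for the region query in the example above.
query_regions_function = {
    "name": "query_regions",
    "description": "Query regions by minimum area and minimum population. "
                   "Area is in units of 10,000 km^2, population in units of 10,000 people.",
    "parameters": {
        "type": "object",
        "properties": {
            "min_area": {"type": "number", "description": "Minimum area, in 10,000 km^2"},
            "min_population": {"type": "number", "description": "Minimum population, in 10,000 people"},
        },
        "required": ["min_area", "min_population"],
    },
}

def query_regions(min_area, min_population):
    # The script assembles the SQL itself; in real code use parameterized queries.
    return ("SELECT area, population FROM region "
            f"WHERE area > {min_area} AND population > {min_population};")

# For "regions with an area greater than 140,000 km^2 and a population greater than
# 20 million", the model would ideally return {"min_area": 14, "min_population": 2000}.
print(query_regions(14, 2000))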

(3) Answers become more reliable. Functions that developers have already implemented return results computed by exact logic rather than "confident nonsense." For example, GPT-3.5 cannot reliably perform arithmetic, but a developer can write a local calculation script and expose it this way, solving the problem of GPT-3.5's inaccurate calculations; a sketch is given below.
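A sketch of the calculation case. The function name calculate and its single expression parameter are hypothetical, and the restricted eval is for illustration only.

import json

# Hypothetical description of a local calculation function exposed to the model.
calculate_function = {
    "name": "calculate",
    "description": "Evaluate a basic arithmetic expression and return the exact result",
    "parameters": {
        "type": "object",
        "properties": {
            "expression": {"type": "string", "description": "e.g. '123456 * 789'"},
        },
        "required": ["expression"],
    },
}

def calculate(expression):
    # Only digits, operators, parentheses and spaces are allowed; a real script
    # should use a proper expression parser instead of eval.
    if not set(expression) <= set("0123456789+-*/(). "):
        raise ValueError("unsupported expression")
    return json.dumps({"result": eval(expression)})

print(calculate("123456 * 789"))  # {"result": 97406784}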

However, calling functions in this way also has some limitations:

(1) OpenAI's restrictions on accounts and API quotas are increasingly strict, and it is difficult for users in mainland China to register accounts and obtain API quota. In addition, calling the API requires a VPN throughout, so timeliness and stability are hard to guarantee.

(2) Although the price of API calls has dropped and the maximum number of input/output tokens per request has increased, calling the API locally still requires transmitting the function descriptions in Step 1 and the function results in Step 3 to the API and then receiving the results back, which consumes a large number of tokens. This is workable when there are only a few local functions, but if the model has to choose among hundreds of functions, the demand on API quota becomes enormous.

(3) The semantic understanding of other lightweight models (LLaMA, Alpaca, ChatGLM, etc.), and even of most domestic vendors' large language models, may not yet reach the level of GPT-3.5 and GPT-4, so they cannot understand user intent as accurately. Implementing local function calls with other language models' APIs may therefore be a challenge.

References:

https://openai.com/blog/function-calling-and-other-api-updates

https://blog.csdn.net/fyfugoyfa/article/details/131216745

https://www.zhihu.com/question/606520916

https://www.zhihu.com/people/loveQt

For technical exchanges, research cooperation, guest internships, or joint training, please contact: [email protected], noting your purpose in the email title.

 "Future GIS Laboratory", as the upstream scientific research institution of SuperMap Research Institute, is committed to gaining insight into the future development direction of the GIS industry and verifying cutting-edge technologies Feasibility of implementation and rapid transformation of latest research results into key products. The department focuses on scientific research and innovation. The team atmosphere is free and harmonious, and the scientific research atmosphere is relatively strong. Everyone has the opportunity to delve into the cutting-edge directions that interest them.

