OpenAI ChatGPT API + FastAPI SSE streaming relay, with a front-end Fetch streaming request example

As usual, let's fill in a pit first before getting started.

If you want nginx to support SSE, a few parameters need to be adjusted.

The conf file below was given by AI; I haven't configured it myself, but it should be correct.

# nginx.conf
worker_processes  1;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile        on;
    keepalive_timeout  65;

    server {
        listen       8000;
        server_name  localhost;

        location /sse {
            # SSE needs responses delivered to the client unbuffered and uncached
            proxy_buffering       off;
            proxy_cache           off;

            # HTTP/1.1 keeps the proxied connection open; the Upgrade/Connection
            # headers additionally allow WebSocket-style protocol upgrades
            proxy_http_version    1.1;
            proxy_set_header      Upgrade         $http_upgrade;
            proxy_set_header      Connection      "upgrade";

            proxy_pass   http://127.0.0.1:9000;
        }
    }
}

The main points of this configuration:
1. Requests to the /sse path are proxied to local port 9000.
2. proxy_buffering and proxy_cache are turned off, so responses are neither buffered nor cached and events reach the client immediately.
3. proxy_http_version is set to 1.1, which supports HTTP protocol upgrades.
4. The Upgrade and Connection headers are added, which allow an upgrade to the WebSocket protocol if a client asks for one.
5. Requests are proxied to the upstream server, and the long-lived connection lets the server keep pushing content.

The other items such as worker_processes and events are not directly related to SSE; they are only there to show a complete configuration.

The workflow is roughly:
1. The client sends a request to the /sse path on nginx.
2. nginx recognizes that the request needs to be proxied to the upstream.
3. The Upgrade and Connection headers are passed along so the upstream server knows protocol upgrades are supported.
4. The upstream server receives the request and keeps the HTTP/1.1 connection open (or, for a WebSocket request, completes the protocol upgrade).
5. With the long-lived connection established, the server can continuously push data to the client.
6. nginx forwards everything the upstream pushes back to the client in real time.
7. The result is the SSE effect: the server actively and continuously pushes data to the client.

So this configuration disables the relevant buffering and caching and keeps the proxied connection open (it also supports upgrading from HTTP to WebSocket), which is what lets the proxy stream server-pushed data to the client. Clients can connect to this server through the standard EventSource API to receive the pushed messages. Hopefully this complete configuration example deepens your understanding of how SSE is handled by nginx.

Much of this process is very similar to WebSocket, so why is SSE necessary at all?

Personally, I don't find the reasoning Claude gave convincing, because WebSocket is what people usually use, and the very mature socket.io already exists. Combined with FastAPI it handles two-way communication very well and rarely drops the connection. OpenAI is popularizing a technology that isn't actually used much, haha!

 

Officially getting started

OpenAI's official documentation is super brief and does everything directly with curl; they really economize wherever they can. You can use Apifox or Postman to convert the curl into fetch, requests, or whatever code you can read, or of course learn the curl command yourself. If you can access OpenAI, click the link below and take a look:

https://platform.openai.com/docs/api-reference/chat/create
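If curl isn't your thing, here is a rough Python sketch of the same request in its plain, non-streaming form. The use of the requests library, the model name and the prompt are my own placeholder choices, not taken from the official example:

# A minimal sketch of the docs' curl example rewritten with the requests library.
# Model name and prompt are placeholders; set OPENAI_API_KEY in your environment.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])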

If you are curious about the bilingual translation shown above, I recommend a free plugin from a well-known developer, Immersive Translate:

https://chrome.google.com/webstore/detail/immersive-translate/bpoadfkcbjbfhfodiogcnhhhpibjhbnh

The docs also explain the use of stream. I had never used streaming before; after looking into it, I realized this thing has always existed as a content type, we just never paid attention to it. We all use urlencoded or JSON formats to handle data, but you can actually send it in binary form and then process it yourself.

Of course, SSE itself was completely new to me, and then I found out that the developer mentioned above had even open-sourced a plugin, from which I got a glimpse of SSE use cases. If you are interested, you can read my other SSE learning case, which goes deeper into the front-end side:

ChatGPT API SSE (server push technology) and a Fetch request with the Accept: text/event-stream header. SSE is normally handled with the EventSource API, but by using fetch() and adding Accept: text/event-stream to the request headers we tell the server we want a stream in Server-Sent Events (SSE) format. fetch() has good support for stream processing: we can read SSE messages from the body attribute, and we can also use other fetch() features such as timeout control and request retries. The drawback is that you have to parse the data yourself. https://blog.csdn.net/wangsenling/article/details/130490769

On the Python side, OpenAI officially provides the openai library, which is also open source, and you can find it here:

GitHub - openai/openai-python: The OpenAI Python library provides convenient access to the OpenAI API from applications written in the Python language. https://github.com/openai/openai-python/ I didn't read much of it, but it doesn't seem to include streaming examples, only plain request examples. If all you need is a plain request it is actually quite simple, and there is not much to say.
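For what it's worth, a streaming call through the openai library is also only a few lines. This is a sketch assuming the 0.x-era interface (openai.ChatCompletion.create); newer versions of the library expose a different client, so treat it as illustrative rather than definitive:

# Sketch assuming the 0.x-era openai library (openai.ChatCompletion.create);
# newer library versions expose a different client interface.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# stream=True makes the call return an iterator of incremental chunks
# instead of one complete response.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in response:
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)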

SSE is a transport

At first I tried to understand this thing alongside WebSocket, but that was the wrong direction; it really is different from WebSocket. WebSocket is inherently asynchronous: once the long-lived connection is established, any process on the server can send a message to the client. For example, after A establishes a WebSocket with the server, B sends an HTTP request, and B's request can be turned into a message pushed to client A. That process is purely asynchronous.

With SSE I didn't find a way to communicate with other processes on the server, and I don't know whether nginx will cut the connection at some point. Since nginx has the protocol-upgrade settings it should keep the HTTP connection long-lived and not break it easily, but without a heartbeat it will presumably be interrupted eventually; the server just keeps sending data to the front end. This approach suits a typewriter effect: for each request, OpenAI pushes the answer bit by bit, so the user doesn't have to wait a long time for a large block of text. If you need a real long-lived connection, WebSocket offers far more options.

One approach I thought of is to start a sub-thread that pushes data into a queue, while the main process keeps popping data off and sending it to the front end. Many people instead show the generator approach, which also works; a generator plus a Future can implement asynchronous task handling too, and its communication is much more flexible than a queue.
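Here is a minimal sketch of that queue idea using asyncio.Queue. The producer, the sentinel value and all the names are made up for illustration, and the wiring into an endpoint is only shown in comments:

# A minimal sketch of the queue idea: a background task produces data,
# an async generator drains the queue and yields SSE events.
# Function and variable names here are made up for illustration.
import asyncio

async def producer(queue: asyncio.Queue):
    # Stand-in for a worker that receives chunks (e.g. from the OpenAI stream)
    for i in range(5):
        await queue.put(f"chunk {i}")
        await asyncio.sleep(0.5)
    await queue.put(None)  # sentinel: no more data

async def event_generator(queue: asyncio.Queue):
    # Drain the queue and yield events until the sentinel arrives
    while True:
        item = await queue.get()
        if item is None:
            break
        yield {"event": "message", "data": item}

# Wiring inside a FastAPI endpoint would look roughly like:
#   queue = asyncio.Queue()
#   asyncio.create_task(producer(queue))
#   return EventSourceResponse(event_generator(queue))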

Instead of the official openai library, you can also send the request directly according to the API documentation. Below is the request approach from another developer, using httpx, which you can study yourself. There is also a comparison article worth knowing about:

A quick review: requests, aiohttp or httpx, which one should you use? - Zhihu. Among the many HTTP clients in Python, the best known are requests, aiohttp and httpx. Without help from other third-party libraries, requests can only send synchronous requests and aiohttp can only send asynchronous ones, while httpx can send both synchronous and asynchronous requests. https://zhuanlan.zhihu.com/p/103711201
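For reference, a rough sketch of a streaming request sent directly with httpx (no openai library) could look like the following. The parsing of the "data:" lines is simplified, and the endpoint and payload fields are just the documented chat-completions parameters:

# Rough sketch of a streaming request sent directly with httpx,
# without the openai library. "data:" line parsing is simplified.
import os
import json
import httpx

async def stream_chat(prompt: str):
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    payload = {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST",
            "https://api.openai.com/v1/chat/completions",
            headers=headers,
            json=payload,
        ) as resp:
            async for line in resp.aiter_lines():
                if not line.startswith("data:"):
                    continue
                data = line[len("data:"):].strip()
                if data == "[DONE]":
                    break
                delta = json.loads(data)["choices"][0]["delta"]
                yield delta.get("content", "")

# Usage (e.g. inside your own generator):
#   async for piece in stream_chat("Hello!"):
#       print(piece, end="", flush=True)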

Screenshot of core parameters

Why don't I post the code? Typing it out yourself is healthier; don't fall into a copy-paste mentality. Writing it out once is good for memory and understanding.

 Screenshot of request body code

 Explained by AI Claude

 

This is a generator function built around yield. yield is explained obscurely in many places; "You Don't Know JS" puts it very concisely: it is a return, except that for a generator the return has to happen many times, so yield is used to distinguish it from a synchronous function's return. There is a slight difference in meaning, because return also means "stop executing the code that follows", but essentially yield is a return that can happen multiple times.
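A tiny Python example makes the point: a generator function can "return" many times, once per yield, whereas return ends the function immediately:

# yield lets a function "return" more than once: each next() resumes
# where the last yield left off.
def counter():
    yield 1
    yield 2
    yield 3   # three "returns" from a single call

def once():
    return 1  # return ends the function immediately

gen = counter()
print(next(gen))  # 1
print(next(gen))  # 2
print(next(gen))  # 3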

Core library EventSourceResponse

from sse_starlette import EventSourceResponse

 

Below is the AI's interpretation of EventSourceResponse. You can also paste the source code of EventSourceResponse into Claude, let it read it, and have it explain it to you; it is a good approach.

 

 

The following is a simple FastAPI example using EventSourceResponse for you to play with:

import uvicorn
import asyncio
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from sse_starlette.sse import EventSourceResponse

times = 0
app = FastAPI()

origins = [
    "*"
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.post("/sse/data")
async def root(request: Request):
    event_generator = status_event_generator(request)
    return EventSourceResponse(event_generator)


status_stream_delay = 1  # second
status_stream_retry_timeout = 30000  # millisecond


# The event generator: it just keeps looping, yielding an event each second
async def status_event_generator(request):
    global times
    while True:
        if not await request.is_disconnected():
            yield {
                "event": "message",
                "retry": status_stream_retry_timeout,
                "data": "data:" + "times" + str(times) + "\n\n"
            }
        print("alive")
        times += 1
        await asyncio.sleep(status_stream_delay)


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000, log_level='info')

Based on the explanation above, you can put together the front-end code:

fetch('http://localhost:8000/sse/data', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
    },
    body: JSON.stringify({
        text: "hello"
    })
}).then(async response=>{
    // Decode the byte stream to text and read it chunk by chunk
    const reader = response.body.pipeThrough(new TextDecoderStream()).getReader()
    while (true) {
        let {value, done} = await reader.read();
        if (done)
            break;
        // Skip keep-alive pings and chunks that carry no data frame
        if (value.indexOf("data:") < 0 || value.indexOf('event: ping') >= 0)
            continue;
        // console.log('Received~~:', value);
        // One chunk may contain several SSE lines separated by \r\n
        let values = value.split("\r\n")
        for (let i = 0; i < values.length; i++) {
            let _v = values[i].replace("data:", "")
            // console.log(_v)
            if (_v.trim() === '')
                continue
            console.log(_v)
        }
    }
}
).catch(error=>{
    console.error(error);
}
);

Because my own business code contains a lot of authentication and database operations, it isn't convenient to publish it. Based on this simple example, you can write the generator function according to your own needs.
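Just to tie the pieces together, here is a rough sketch of what the relay endpoint could look like: the generator consumes the OpenAI stream and forwards each delta to the browser through EventSourceResponse. It assumes the 0.x-era openai library (ChatCompletion.acreate) and leaves out authentication, error handling and database code, so treat it as a starting point rather than a drop-in implementation:

# Rough sketch of the relay idea: the endpoint forwards each chunk from the
# OpenAI stream to the browser as an SSE event. Assumes the 0.x openai library;
# authentication, error handling and database code are left out.
import os
import openai
from fastapi import FastAPI, Request
from pydantic import BaseModel
from sse_starlette.sse import EventSourceResponse

openai.api_key = os.environ["OPENAI_API_KEY"]
app = FastAPI()

class Question(BaseModel):
    text: str

async def chat_event_generator(request: Request, prompt: str):
    response = await openai.ChatCompletion.acreate(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    async for chunk in response:
        # Stop pushing if the browser has gone away
        if await request.is_disconnected():
            break
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield {"event": "message", "data": delta["content"]}

@app.post("/sse/chat")
async def chat(question: Question, request: Request):
    return EventSourceResponse(chat_event_generator(request, question.text))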


Original article: blog.csdn.net/wangsenling/article/details/130911465