How to upload audio synthesized by Alibaba Cloud Intelligent Speech Interaction directly to OSS

1. Background

This post records a recent requirement. I have been working on voice features for an AI live broadcast room, and for this I used two services: Alibaba Cloud's Intelligent Speech Interaction and Alibaba Cloud's OSS (Object Storage Service).

The first is Alibaba Cloud's Intelligent Speech Interaction; its console is at nls-portal.console.aliyun.com/overview .

OSS is widely used, so just search for it on the Alibaba Cloud official site.

2. Technical overview

2.1. Overview

  1. When a user enters the live room, the front end detects that user xxx has entered, captures this event, and sends the user's nickname to the server.
  2. The server splices the nickname into the configured welcome copy and calls Alibaba Cloud Intelligent Speech Interaction to synthesize audio from the resulting text.
  3. Alibaba Cloud Intelligent Speech Interaction generates the audio according to the specified synthesis voice.
  4. The server receives the audio stream and calls the OSS service to upload the stream to OSS.
  5. After the upload completes, the server returns the audio link to the front end, and the front end plays the audio file.
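
Step 2 above can be sketched in plain Java. This is a minimal sketch, not the author's code: the class name `WelcomeMessage`, the `{nickname}` placeholder, the `guest` fallback, and the object key layout are all assumptions made for illustration.

```java
import java.util.UUID;

/** Hypothetical helper for step 2: builds the text sent to speech synthesis. */
final class WelcomeMessage {

    private WelcomeMessage() {
    }

    /**
     * Splices the nickname into the configured welcome copy.
     * The "{nickname}" placeholder is an assumed convention, not something the article specifies.
     */
    static String build(String template, String nickname) {
        // Fall back to a neutral name when the front end sends nothing usable.
        String safeName = (nickname == null || nickname.trim().isEmpty()) ? "guest" : nickname.trim();
        return template.replace("{nickname}", safeName);
    }

    /**
     * Builds a unique OSS object key of the form "<prefix>/<uuid>.<format>".
     * The layout is an assumption; any key that is unique per audio file works.
     */
    static String objectKey(String prefix, String format) {
        return prefix + "/" + UUID.randomUUID() + "." + format;
    }
}
```

For example, `WelcomeMessage.build("Welcome {nickname} to the live room!", "Alice")` yields `Welcome Alice to the live room!`, which would be passed as the text to synthesize, while `objectKey` gives a path under which the resulting audio could be stored.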

2.2. Creating an audio project

  1. After logging in to Alibaba Cloud, open nls-portal.console.aliyun.com/applist and enable Intelligent Speech Interaction.
  2. Create the corresponding project.
  3. Change the project settings.
  4. Choose the voice you want to use for synthesis.

3. Technical implementation

3.1. Generate token

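    /**
     * Gets an access token for the speech service.
     *
     * @return the access token string
     * @throws IOException if the token request fails
     */
    private static String getToken() throws IOException {
        // Replace the placeholders with your own AccessKey ID and AccessKey secret.
        AccessToken accessToken = new AccessToken("accessKeyId", "accessKeySecret");
        accessToken.apply();
        return accessToken.getToken();
    }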
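
The method above requests a new token on every call. Tokens are valid for a period of time, so a server that greets many users can reuse one until it is about to expire. The following is a minimal JDK-only sketch of that idea, not part of the original code: `CachedToken`, `Entry`, and the safety margin are hypothetical names, and it assumes the expiry time is available in epoch seconds (the official samples read it from the `AccessToken` object after `apply()`; check the SDK documentation for the exact accessor).

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical wrapper that reuses a token until shortly before it expires. */
final class CachedToken {

    /** A token value together with its expiry time in epoch seconds. */
    static final class Entry {
        final String value;
        final long expiresAtEpochSeconds;

        Entry(String value, long expiresAtEpochSeconds) {
            this.value = value;
            this.expiresAtEpochSeconds = expiresAtEpochSeconds;
        }
    }

    private final Supplier<Entry> fetcher;   // requests a fresh token, e.g. by wrapping getToken()
    private final LongSupplier clock;        // returns the current time in epoch seconds
    private final long safetyMarginSeconds;  // refresh this long before the real expiry
    private Entry current;

    CachedToken(Supplier<Entry> fetcher, LongSupplier clock, long safetyMarginSeconds) {
        this.fetcher = fetcher;
        this.clock = clock;
        this.safetyMarginSeconds = safetyMarginSeconds;
    }

    /** Returns the cached token, fetching a new one only when none exists or it is about to expire. */
    synchronized String get() {
        long now = clock.getAsLong();
        if (current == null || now >= current.expiresAtEpochSeconds - safetyMarginSeconds) {
            current = fetcher.get();
        }
        return current.value;
    }
}
```

The clock is injected so the refresh logic can be tested without waiting; in production it could be `() -> System.currentTimeMillis() / 1000`.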

3.2. Generate the corresponding audio stream file

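    // Uses okhttp3.HttpUrl, okhttp3.OkHttpClient, okhttp3.Request and okhttp3.Response.
    private void processGETRequest(String appkey, String accessToken, String text, String format, int sampleRate, String voice, String filePath) {
        /*
         * Build the HTTPS GET request:
         * 1. Use the HTTPS protocol.
         * 2. Speech synthesis service domain: nls-gateway-cn-shanghai.aliyuncs.com
         * 3. Speech synthesis request path: /stream/v1/tts
         * 4. Required request parameters: appkey, token, text, format, sample_rate
         * 5. Optional request parameters: voice, volume, speech_rate, pitch_rate
         */
        HttpUrl url = HttpUrl.parse("https://nls-gateway-cn-shanghai.aliyuncs.com/stream/v1/tts").newBuilder()
            // addQueryParameter percent-encodes each value, so nicknames containing
            // spaces, '&' or non-ASCII characters do not break the query string.
            .addQueryParameter("appkey", appkey)
            .addQueryParameter("token", accessToken)
            .addQueryParameter("text", text)
            .addQueryParameter("format", format)
            .addQueryParameter("voice", voice)
            .addQueryParameter("sample_rate", String.valueOf(sampleRate))
            .build();

        // Send the HTTPS GET request and handle the server's response.
        Request request = new Request.Builder().url(url).get().build();
        OkHttpClient client = new OkHttpClient();
        // try-with-resources closes the response even if the upload throws.
        try (Response response = client.newCall(request).execute()) {
            String contentType = response.header("Content-Type");
            if ("audio/mpeg".equals(contentType)) {
                // Success: hand the response stream to OSS without buffering the whole file first.
                ossService.ossUploadByStream(filePath, response.body().byteStream());
            } else {
                // Failure: the body carries the error description, so log it instead of discarding it.
                String errorMessage = response.body().string();
                System.err.println("Speech synthesis request failed: " + errorMessage);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }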
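
Because the nickname is user input, the text parameter can contain spaces, `&`, or non-ASCII characters, all of which must be percent-encoded before they go into a query string. If you build the query string by hand, a JDK-only sketch could look like the following. `TtsUrlBuilder` is a hypothetical name; the endpoint and parameter names are the ones used in this article.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that builds the speech synthesis GET URL with encoded parameters. */
final class TtsUrlBuilder {

    private static final String ENDPOINT = "https://nls-gateway-cn-shanghai.aliyuncs.com/stream/v1/tts";

    private TtsUrlBuilder() {
    }

    static String build(String appkey, String token, String text, String format, int sampleRate, String voice) {
        return ENDPOINT
            + "?appkey=" + encode(appkey)
            + "&token=" + encode(token)
            + "&text=" + encode(text)
            + "&format=" + encode(format)
            + "&voice=" + encode(voice)
            + "&sample_rate=" + sampleRate;
    }

    /**
     * Percent-encodes a value as UTF-8.
     * URLEncoder produces form encoding, so '+' (space) is rewritten to "%20",
     * '*' to "%2A", and "%7E" back to '~' to get standard percent-encoding.
     */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8")
                .replace("+", "%20")
                .replace("*", "%2A")
                .replace("%7E", "~");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every Java platform.
            throw new IllegalStateException("UTF-8 is not supported", e);
        }
    }
}
```

With this, a text such as `Hi Tom & Li` is sent as `Hi%20Tom%20%26%20Li`, so the `&` inside the nickname is no longer mistaken for a parameter separator.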

3.3. Upload the audio stream to OSS

    public void ossUploadByStream(String filePath, InputStream input) {
        OSS ossClient = new OSSClientBuilder().build(aLiYunConfig.getEndpoint(), aLiYunConfig.getAccessKeyId(),
            aLiYunConfig.getSecretAccessKey());
        try {
            // filePath is the object key (path) under which the audio is stored in the bucket.
            PutObjectRequest putObjectRequest = new PutObjectRequest(aLiYunConfig.getBucketName(), filePath, input);
            ossClient.putObject(putObjectRequest);
        } finally {
            // Always release the client's resources, even if the upload fails.
            ossClient.shutdown();
        }
    }
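
The upload method returns nothing, but step 5 of the overview needs an audio link for the front end. For a bucket whose objects are publicly readable, the link follows the pattern `https://<bucket>.<endpoint>/<object key>`. The sketch below builds it under that assumption; `OssUrls` is a hypothetical helper, and a private bucket would need a signed URL generated through the OSS SDK instead.

```java
/** Hypothetical helper that derives the public link of an uploaded object. */
final class OssUrls {

    private OssUrls() {
    }

    /**
     * Builds "https://<bucket>.<endpoint host>/<object key>".
     * Assumes the object is publicly readable and the key needs no further encoding.
     */
    static String publicUrl(String bucket, String endpoint, String objectKey) {
        // The configured endpoint may or may not include a scheme or trailing slash.
        String host = endpoint.replaceFirst("^https?://", "");
        if (host.endsWith("/")) {
            host = host.substring(0, host.length() - 1);
        }
        // Object keys do not start with '/', so drop one if the caller passed it.
        String key = objectKey.startsWith("/") ? objectKey.substring(1) : objectKey;
        return "https://" + bucket + "." + host + "/" + key;
    }
}
```

The server could call this with the same bucket name, endpoint, and `filePath` it used for the upload and return the result to the front end.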

4. Summary

  1. The generated audio is uploaded straight to OSS, so the server does not need to keep the audio file locally, and the front end only needs the OSS link to play it.
  2. Speech synthesis has many options, so read the official documentation carefully. An official SDK is also provided and can be used directly.

Origin juejin.im/post/7229368008363999292