Spring Boot project (Baidu AI integration): using speech file recognition in Spring Boot, plus installing and using ffmpeg

Preface

In practice you often need to follow the official examples, but they do not always carry over directly. Sometimes the tooling differs, such as IDEA versus Eclipse, or a plain Java project versus a Spring project. Sometimes, at my level, the official documentation is hard to follow in full. And sometimes the demo files I use for testing do not meet the requirements stated on the official website. As a result, the feature does not always work.

This blog series works through the official examples to explain how to use the components of the Baidu AI open platform. The focus is on getting started quickly in a Spring project.

This article introduces how to use speech file recognition in Spring Boot, and how to install and use ffmpeg.

Overview


1. From the official demo to running it in IDEA;
2. From running it in IDEA to integrating it into a Spring Boot project;
3. Basic installation and use of ffmpeg;

1. How to use speech recognition

1. Official SDK

https://ai.baidu.com/ai-doc/SPEECH/plbxfq24s

This is the page I usually land on when looking for the reference documentation; sometimes the checkout fails after clicking through to the Git repository.

https://github.com/Baidu-AIP/java-sdk

The GitHub repository actually has better instructions, but there is no guarantee that GitHub can be opened reliably.

https://ai.baidu.com/sdk

The various SDK packages can be downloaded here.

2. Download the SDK

Download the compressed package and unzip it.

The official documentation describes the setup for Eclipse; its step 3 is: in Eclipse, right-click the project and choose "Properties -> Java Build Path -> Add JARs".

However, I am more comfortable with IDEA and not familiar with Eclipse.

2. How to run the demo in IDEA

1. Create a new project and a new lib folder

The lib folder holds the JAR files from the package that was just downloaded and unzipped.

Copy the JAR files and paste them into the lib directory.

2. Add the JARs as a library

Click OK and the import is successful.

3. Copy the official example

Copy the example code from the official website and add the imports for the classes in the JAR files.

Add a configuration file for logging.

4. Obtain a sample file for speech recognition

I then discovered that speech recognition requires the audio file to be in pcm format, so I found a tool to convert files in other formats to pcm. However, the calls kept failing with the files I converted. After some twists and turns, I finally found a sample file in the Python example on the official website.

There were various bugs along the way, and I finally got it working with the sample file I found.

3. How to use it with Spring Boot

Understanding jar packages

Overview of building a simple Spring project

1. Import dependencies

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.tianju</groupId>
    <artifactId>baidu-api</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <!-- Starter parent -->
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.6.13</version>
    </parent>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <!-- Baidu AI Java SDK -->
        <dependency>
            <groupId>com.baidu.aip</groupId>
            <artifactId>java-sdk</artifactId>
            <version>4.16.16</version>
        </dependency>

        <!-- JSON utility -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>2.0.12</version>
        </dependency>

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>

    </dependencies>

</project>

2. Configure

BaiduPro configuration class

package com.tianju.config.baidu;


import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.PropertySource;
import org.springframework.stereotype.Component;

/**
 * Holds the values read from the configuration file
 */
@Component
@ConfigurationProperties(prefix = "baidu")
@PropertySource("classpath:config/baiduAip.properties")
@Data
@NoArgsConstructor
@AllArgsConstructor
public class BaiduPro {

    private String appId;
    private String apiKey;
    private String secretKey;
}
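
The article does not show the properties file that BaiduPro reads. The sketch below illustrates what config/baiduAip.properties might contain, assuming the keys are the baidu prefix followed by the field names of BaiduPro. The values are placeholders, and the class name BaiduPropsSketch is hypothetical. It uses only java.util.Properties so that it runs without Spring.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class BaiduPropsSketch {

    // Hypothetical contents of config/baiduAip.properties; the values are placeholders.
    static final String SAMPLE =
            "baidu.appId=your-app-id\n" +
            "baidu.apiKey=your-api-key\n" +
            "baidu.secretKey=your-secret-key\n";

    // Parses properties text the same way a .properties file on disk would be parsed.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(SAMPLE);
        // Each key is the "baidu" prefix plus a field name of BaiduPro.
        System.out.println(props.getProperty("baidu.appId"));
        System.out.println(props.getProperty("baidu.apiKey"));
        System.out.println(props.getProperty("baidu.secretKey"));
    }
}
```

In the real project the three lines of SAMPLE would live in src/main/resources/config/baiduAip.properties, with the credentials taken from the Baidu AI console.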

Configuration class that registers the client in the Spring container

package com.tianju.config.baidu;

import com.baidu.aip.speech.AipSpeech;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Baidu-related configuration
 */
@Configuration
public class BaiduConfig {

    @Autowired
    private BaiduPro baiduPro;

    /**
     * Speech client (AipSpeech)
     * @return the AipSpeech instance placed in the container
     */
    @Bean
    public AipSpeech aipSpeech(){

        // Initialize an AipSpeech client
        AipSpeech client = new AipSpeech(baiduPro.getAppId(), baiduPro.getApiKey(), baiduPro.getSecretKey());

        // Optional: set network connection parameters
        client.setConnectionTimeoutInMillis(2000);
        client.setSocketTimeoutInMillis(60000);
        return client;
    }
}

3. Calling from the controller layer

GET http://localhost:10050/api/baidu/hello

package com.tianju.config.controller;

import com.baidu.aip.speech.AipSpeech;
import com.tianju.config.resp.HttpResp;

import lombok.extern.slf4j.Slf4j;
import org.json.JSONObject;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/baidu")
@Slf4j
public class BaiduApiController {

    @Autowired
    private AipSpeech aipSpeech;

    @GetMapping("/hello")
    public HttpResp hello(){
        // Recognize a local pcm file sampled at 16000 Hz.
        // The absolute path below is specific to the author's machine.
        JSONObject pcm = aipSpeech.asr(
                "D:\\Myprogram\\springboot-workspace\\spring-project\\baidu-api\\src\\main\\resources\\static\\helloAipSpeech.pcm",
                "pcm", 16000, null);
        log.debug("get response:"+pcm.toString());
        return HttpResp.success(pcm.toString());
    }
}
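
Since many of the failures described earlier came from audio files that did not meet the requirements, it can help to check a pcm file before sending it. The sketch below estimates the duration of a raw pcm file from its size, assuming 16-bit mono samples. The class name PcmPreflight and the maxSeconds parameter are hypothetical, and the actual length limit should be taken from the current Baidu documentation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class PcmPreflight {

    // Assumption: 16-bit samples, i.e. 2 bytes per sample.
    static final int BYTES_PER_SAMPLE = 2;

    /** Duration in seconds of raw 16-bit PCM data of the given size. */
    static double durationSeconds(long byteCount, int sampleRate, int channels) {
        return (double) byteCount / ((long) sampleRate * channels * BYTES_PER_SAMPLE);
    }

    /** True if the file exists and, read as mono PCM, lasts at most maxSeconds. */
    static boolean isShortEnough(Path pcmFile, int sampleRate, double maxSeconds) throws IOException {
        if (!Files.isRegularFile(pcmFile)) {
            return false;
        }
        return durationSeconds(Files.size(pcmFile), sampleRate, 1) <= maxSeconds;
    }

    public static void main(String[] args) {
        // 1,568,000 bytes at 16000 Hz, mono, 16-bit
        System.out.println(durationSeconds(1_568_000L, 16000, 1)); // prints 49.0
    }
}
```

A controller could call isShortEnough before aipSpeech.asr and return an error response early instead of waiting for the remote call to fail.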

4. Introduction to the use of ffmpeg

1. Basic knowledge

Official website: https://ffmpeg.org/

Introduction to FFmpeg

The name FFmpeg comes from the MPEG video encoding standard; the "FF" in front of it stands for "Fast Forward". FFmpeg is a set of open source computer programs that can record, convert, and stream digital audio and video. It makes it easy to convert between many video formats.
FFmpeg users include Google, Facebook, YouTube, Youku, iQiyi, Tudou, etc.

Basic concepts

1. Container/File: Multimedia files in specific formats, such as mp4, flv, mkv, etc.

2. Media stream (Stream): A piece of continuous data on the timeline, such as a piece of sound data, a piece of video data, or a piece of subtitle data. It can be compressed or uncompressed. Compressed data needs to be associated with a specific codec.

3. Data frame/packet (Frame/Packet): Usually, a media stream is composed of a large number of data frames. For compressed data, a frame corresponds to the minimum processing unit of the codec. Data frames belonging to different media streams are stored interleaved in the container.

In general:

Frame corresponds to the data before compression, and Packet corresponds to the data after compression.

4. Codec (Codec): realizes the mutual conversion between compressed data and original data in frame units.

5. Multiplexing (mux): Putting different streams into one container according to that container's rules.

6. Demultiplexing (demux): Parsing the different streams out of a container.

7. Bit rate and frame rate are the most important basic characteristics of a video file, and their settings determine the video quality. If we know the bit rate and the duration, we can easily calculate the size of the output file.

8. Frame rate: Also called frame frequency, the frame rate is the number of frames per second in a video file. At least 15 frames per second are needed for the naked eye to see continuous motion.

9. Bit rate: Also called data rate, the bit rate is the number of bits processed per second, and it determines the overall video/audio quality. A higher bit rate generally means higher quality. In video files, the bit rate is expressed in bps.
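
The relationship between bit rate, duration, and file size mentioned in item 7 can be sketched as follows. This is a minimal illustration that assumes a constant bit rate and ignores container overhead; the class and method names are hypothetical.

```java
class BitrateMath {

    /** Approximate size in bytes of a stream with a constant bit rate. */
    static long sizeBytes(long bitsPerSecond, double seconds) {
        return Math.round(bitsPerSecond * seconds / 8.0);
    }

    /** Bit rate of uncompressed PCM audio, in bits per second. */
    static long pcmBitRate(int sampleRate, int bitsPerSample, int channels) {
        return (long) sampleRate * bitsPerSample * channels;
    }

    public static void main(String[] args) {
        // 16000 Hz, 16-bit, mono PCM
        long rate = pcmBitRate(16000, 16, 1);
        System.out.println(rate);                // prints 256000
        // One minute of audio at that rate
        System.out.println(sizeBytes(rate, 60)); // prints 1920000
    }
}
```

So one minute of 16000 Hz, 16-bit, mono PCM takes about 1.9 MB, which is a quick way to sanity-check a converted file.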

2. Installation and use of commands

After downloading, unzip it to the specified location, and then configure the environment variables.

Running the command in cmd confirms that the installation succeeded.

3. Using the commands

ffmpeg -i wjs.aac -acodec pcm_s16le -ar 44100 output.pcm
ffmpeg -i wjs.aac -acodec pcm_s16le -f s16le -ac 2 -ar 16000 16k.pcm

ffmpeg -i wjs.aac -ss 00:00:10 -to 00:00:59 -f s16le -ar 16000 16.pcm

This FFmpeg command extracts the portion of the audio file wjs.aac from second 10 to second 59, converts it to 16-bit signed PCM with a sampling rate of 16000 Hz, and saves it as 16.pcm.

The parameters are explained as follows:

  • -i wjs.aac: Specifies wjs.aac as the input file.
  • -ss 00:00:10: Starts the extraction at the 10th second.
  • -to 00:00:59: Ends the extraction at the 59th second.
  • -f s16le: Sets the output format to raw 16-bit signed little-endian PCM.
  • -ar 16000: Sets the sampling rate of the output audio to 16000 Hz.
  • 16.pcm: Specifies 16.pcm as the output file name.

So, what this command means is to extract the audio part from the 10th second to the 59th second in the wjs.aac file, convert it to 16-bit signed PCM format, with a sampling rate of 16000Hz, and save it as a 16.pcm file.
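
If the conversion needs to happen inside the Java application rather than by hand, the same command can be assembled and launched with ProcessBuilder. This is a minimal sketch, assuming ffmpeg is on the PATH; the class FfmpegPcmCommand and its methods are hypothetical, and the -y flag (overwrite without prompting) is an addition that is not part of the command above.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class FfmpegPcmCommand {

    /** Builds the argument list for cutting a clip and converting it to 16000 Hz raw PCM. */
    static List<String> build(String input, String start, String end, String output) {
        return Arrays.asList(
                "ffmpeg",
                "-y",            // overwrite the output file without prompting (added for non-interactive use)
                "-i", input,     // input file
                "-ss", start,    // start time
                "-to", end,      // end time
                "-f", "s16le",   // raw 16-bit signed little-endian PCM
                "-ar", "16000",  // 16000 Hz sampling rate
                output);
    }

    /** Runs the command and returns ffmpeg's exit code (0 means success). */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO()     // show ffmpeg's console output in this process
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Prints the assembled command line without executing it.
        System.out.println(String.join(" ", build("wjs.aac", "00:00:10", "00:00:59", "16.pcm")));
    }
}
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.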

Command to extract a 1-minute pcm audio file


Summary

1. From the official demo to running it in IDEA;
2. From running it in IDEA to integrating it into a Spring Boot project;
3. Basic installation and use of ffmpeg;

Origin blog.csdn.net/Pireley/article/details/133211336