Preface
In practice I often need to follow the official examples, but they do not always work for me. Sometimes the tools differ (IDEA versus Eclipse, a plain project versus a Spring project); sometimes my own level is limited and I cannot fully understand every document; sometimes the demo files I test with do not meet the requirements stated on the official website. As a result, the feature I want often cannot be made to work.
This blog series walks through the official examples to explain how to use the components of the Baidu AI open platform. The focus is on getting started quickly in a Spring project.
This article covers speech file recognition in Spring Boot, plus the installation and use of ffmpeg.
Introduction
1. From the official demo to running it in IDEA;
2. From running it in IDEA to integrating it into a Spring Boot project;
3. Basic installation and use of ffmpeg.
1. How to use speech recognition
1. Official website sdk
https://ai.baidu.com/ai-doc/SPEECH/plbxfq24s
This is the page I usually land on when looking for reference documentation. Checking out the code from the git link on it sometimes fails.
https://github.com/Baidu-AIP/java-sdk
The GitHub repository actually has better instructions, but GitHub is not always reachable.
The various SDK packages can be downloaded here.
2. Download sdk
Download the compressed package and unzip it
The official documentation describes the Eclipse workflow, for example its step 3: in Eclipse, right-click the project and choose "Properties -> Java Build Path -> Add JARs".
I am more comfortable with IDEA, though, and not familiar with Eclipse.
2. How to run the demo in IDEA
1. Create a new project and a new lib folder
The lib folder holds the jar packages that were just downloaded and unzipped.
Copy and paste them into the lib directory.
2. Import the folder as a library
Click OK and the import is complete.
3. Copy the official example
Copy the example from the official website and import the classes it needs from the jar packages.
Add a configuration file for logging.
4. Obtain a sample file for speech recognition
I then discovered that speech recognition requires files in pcm format, so I found a tool to convert files in other formats to pcm. The files I converted kept failing every time I called the API. After some twists and turns, I finally found a sample file in the Python example on the official website.
I ran into various bugs along the way and finally resolved them using this sample file.
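One quick sanity check that would have helped here: a raw pcm file has no header, so its playing time follows directly from its size and the recording parameters. The sketch below is a hypothetical helper (the class and method names are mine, not part of the Baidu SDK) and it assumes the file is 16000 Hz, mono, 16-bit, which is the format used in the rest of this article. If the reported duration looks wrong, the file was probably converted with different parameters.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the Baidu SDK: estimates how long a
// headerless PCM file plays, given the parameters it was recorded with.
class PcmCheck {

    // Raw PCM has no header, so duration = bytes / (rate * channels * bytesPerSample).
    static double durationSeconds(long byteCount, int sampleRate, int channels, int bytesPerSample) {
        return (double) byteCount / ((long) sampleRate * channels * bytesPerSample);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PcmCheck <file.pcm>");
            return;
        }
        Path pcm = Paths.get(args[0]);
        long size = Files.size(pcm);
        // Assumed format: 16000 Hz, mono, 16-bit (2 bytes per sample).
        System.out.printf("%d bytes, about %.2f s at 16 kHz mono 16-bit%n",
                size, durationSeconds(size, 16000, 1, 2));
    }
}
```

For example, a 32000-byte file in this format corresponds to exactly one second of audio.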
3. How to use it with springboot
Understanding jar packages
Overview of simple spring project construction
1. Import dependencies
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.tianju</groupId>
<artifactId>baidu-api</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<!-- Spring Boot starter parent -->
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.6.13</version>
</parent>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- Baidu AI Java SDK -->
<dependency>
<groupId>com.baidu.aip</groupId>
<artifactId>java-sdk</artifactId>
<version>4.16.16</version>
</dependency>
<!-- JSON utility -->
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>2.0.12</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>
2. Configure
BaiduPro configuration class
package com.tianju.config.baidu;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.PropertySource;
import org.springframework.stereotype.Component;
/**
* Holds the values read from the configuration file
*/
@Component
@ConfigurationProperties(prefix = "baidu")
@PropertySource("classpath:config/baiduAip.properties")
@Data
@NoArgsConstructor
@AllArgsConstructor
public class BaiduPro {
private String appId;
private String apiKey;
private String secretKey;
}
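The properties file itself is not shown above, so here is a minimal sketch of what config/baiduAip.properties would need to contain for the binding to work. With the prefix "baidu" and the three fields of BaiduPro, the expected keys are baidu.appId, baidu.apiKey and baidu.secretKey. The values are placeholders, and the class name BaiduPropsSketch is hypothetical; the code only uses java.util.Properties to show that the keys parse as expected, it does not go through Spring.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: shows the keys that the "baidu" prefix binds to.
class BaiduPropsSketch {

    // Example contents for config/baiduAip.properties; the values are placeholders.
    static final String SAMPLE =
            "baidu.appId=your-app-id\n"
            + "baidu.apiKey=your-api-key\n"
            + "baidu.secretKey=your-secret-key\n";

    // Parses properties text the same way a .properties file would be read.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(SAMPLE);
        for (String field : new String[] {"appId", "apiKey", "secretKey"}) {
            System.out.println("baidu." + field + " -> " + props.getProperty("baidu." + field));
        }
    }
}
```

The real values come from the application you create in the Baidu AI console and should stay out of version control.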
Configuration class that registers the client in the Spring container
package com.tianju.config.baidu;
import com.baidu.aip.speech.AipSpeech;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
/**
* Baidu-related configuration
*/
@Configuration
public class BaiduConfig {
@Autowired
private BaiduPro baiduPro;
/**
* Speech-related client: AipSpeech
* @return the AipSpeech instance to register in the container
*/
@Bean
public AipSpeech aipSpeech(){
// Initialize an AipSpeech client
AipSpeech client = new AipSpeech(baiduPro.getAppId(), baiduPro.getApiKey(), baiduPro.getSecretKey());
// Optional: set network connection parameters
client.setConnectionTimeoutInMillis(2000);
client.setSocketTimeoutInMillis(60000);
return client;
}
}
3. Calling from the controller layer
GET http://localhost:10050/api/baidu/hello
package com.tianju.config.controller;
import com.baidu.aip.speech.AipSpeech;
import com.tianju.config.resp.HttpResp;
import lombok.extern.slf4j.Slf4j;
import org.json.JSONObject;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
@RestController
@RequestMapping("/api/baidu")
@Slf4j
public class BaiduApiController {
@Autowired
private AipSpeech aipSpeech;
@GetMapping("/hello")
public HttpResp hello(){
// Recognize a local pcm file sampled at 16000 Hz; the absolute path is specific to my machine
JSONObject pcm = aipSpeech.asr(
"D:\\Myprogram\\springboot-workspace\\spring-project\\baidu-api\\src\\main\\resources\\static\\helloAipSpeech.pcm",
"pcm", 16000, null);
log.debug("get response: {}", pcm);
return HttpResp.success(pcm.toString());
}
}
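The absolute path in the controller only works on my machine. Since the sample file sits under src/main/resources/static, one alternative is to read it from the classpath into a byte array. The sketch below uses only the JDK; the class name ClasspathBytes and its methods are hypothetical. As far as I know the SDK also offers an asr overload that accepts a byte[] instead of a path, but treat that as an assumption and check it against the SDK version you use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: loads a file from the classpath instead of an absolute path.
class ClasspathBytes {

    // Reads the whole stream into memory (works on Java 8, no readAllBytes needed).
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks up a resource such as "static/helloAipSpeech.pcm" on the classpath.
    static byte[] fromClasspath(String name) {
        try (InputStream in = ClasspathBytes.class.getClassLoader().getResourceAsStream(name)) {
            if (in == null) {
                throw new IllegalArgumentException("resource not found: " + name);
            }
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In the controller this would be used as ClasspathBytes.fromClasspath("static/helloAipSpeech.pcm"), which also keeps working after the project is packaged as a jar.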
4. Introduction to the use of ffmpeg
1. Basic knowledge
Official website: https://ffmpeg.org/
Introduction to FFmpeg
The name FFmpeg comes from the MPEG video encoding standard; the "FF" in front stands for "Fast Forward". FFmpeg is a set of open source programs for recording, converting, and streaming digital audio and video, and it makes converting between video formats easy.
FFmpeg users include Google, Facebook, YouTube, Youku, iQiyi, Tudou, and others.
Basic concepts
1. Container/File: Multimedia files in specific formats, such as mp4, flv, mkv, etc.
2. Media stream (Stream): a piece of continuous data on the timeline, such as a piece of audio data, video data, or subtitle data. It can be compressed or uncompressed; compressed data must be associated with a specific codec.
3. Data frame/packet (Frame/Packet): a media stream is usually made up of a large number of data frames. For compressed data, a frame corresponds to the smallest unit the codec processes. Data frames belonging to different media streams are stored interleaved in the container.
In general, a Frame corresponds to data before compression and a Packet to data after compression.
4. Codec (Codec): converts between compressed data and raw data, frame by frame.
5. Multiplexing (mux): putting different streams into a container according to the rules of that container.
6. Demultiplexing (demux): parsing the different streams out of a container.
7. Bit rate and frame rate are the most important basic characteristics of a video file, and their settings determine the video quality. If we know the bit rate and the duration, we can easily calculate the size of the output file.
8. Frame rate: also called frame frequency, the number of frames per second in a video file. Roughly 15 frames per second or more are needed for the eye to perceive continuous motion.
9. Bit rate: also called data rate, the number of bits processed per second, expressed in bps. It is a parameter that determines the overall video/audio quality; other things being equal, a higher bit rate generally means higher quality.
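The size calculation mentioned in item 7 is simple arithmetic: size in bytes = bit rate (bps) x duration (s) / 8. Here is a small worked sketch; the class name and the sample numbers (128 kbps audio, a 2 Mbps video) are my own illustrative assumptions, and container overhead is ignored.

```java
// Hypothetical worked example of: size = bit rate * duration / 8.
class BitrateMath {

    // Returns the approximate payload size in bytes (container overhead ignored).
    static long sizeBytes(long bitsPerSecond, double seconds) {
        return Math.round(bitsPerSecond * seconds / 8.0);
    }

    public static void main(String[] args) {
        // 128 kbps audio for 60 s: 128000 * 60 / 8 = 960000 bytes
        System.out.println(sizeBytes(128_000, 60) + " bytes");
        // 2 Mbps video for 10 s: 2000000 * 10 / 8 = 2500000 bytes
        System.out.println(sizeBytes(2_000_000, 10) + " bytes");
    }
}
```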
2. Installation and use of commands
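After downloading, unzip the archive to a location of your choice, then configure the environment variables (add the directory that contains ffmpeg.exe to Path).
Open cmd and run an ffmpeg command (for example, ffmpeg -version) to confirm that the installation works.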
3. Use commands
ffmpeg -i wjs.aac -acodec pcm_s16le -f s16le -ar 44100 output.pcm
ffmpeg -i wjs.aac -acodec pcm_s16le -f s16le -ac 2 -ar 16000 16k.pcm
ffmpeg -i wjs.aac -ss 00:00:10 -to 00:00:59 -f s16le -ar 16000 16.pcm
The last command uses the FFmpeg command line tool to extract the portion of the audio file wjs.aac from second 10 to second 59, convert it to 16-bit signed PCM with a sampling rate of 16000 Hz, and save it as 16.pcm.
The parameters are as follows:
- -i wjs.aac: specifies wjs.aac as the input file.
- -ss 00:00:10: starts extraction at the 10th second.
- -to 00:00:59: ends extraction at the 59th second.
- -f s16le: sets the output format to raw signed 16-bit little-endian PCM.
- -ar 16000: sets the sampling rate of the output audio to 16000 Hz.
- 16.pcm: specifies 16.pcm as the output file name.
Because the command does not pass -ac, the output keeps the channel count of the input; add -ac 1 if a mono file is required.
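In a Spring Boot project it can be convenient to run this conversion from Java rather than by hand. The sketch below builds the same argument list as the extraction command above and runs it with ProcessBuilder. The class and method names are hypothetical, and it assumes ffmpeg is installed and on the PATH; the actual call is commented out in main so the example runs even without ffmpeg.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the ffmpeg command line; assumes ffmpeg is on the PATH.
class FfmpegPcmExtract {

    // Builds the argument list: cut [start, end] and write 16 kHz s16le raw PCM.
    static List<String> buildCommand(String input, String start, String end, String output) {
        return Arrays.asList(
                "ffmpeg", "-i", input,
                "-ss", start, "-to", end,
                "-f", "s16le", "-ar", "16000",
                output);
    }

    // Runs the command and returns the exit code (0 means success).
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // show ffmpeg's own log output in this console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = buildCommand("wjs.aac", "00:00:10", "00:00:59", "16.pcm");
        System.out.println(String.join(" ", cmd));
        // Uncomment to actually invoke ffmpeg:
        // System.out.println("exit code: " + run(cmd));
    }
}
```

Passing the arguments as a list rather than one string avoids quoting problems when file names contain spaces.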
Extract 1 minute pcm audio file command
Summary
1. From the official demo to running it in IDEA;
2. From running it in IDEA to integrating it into a Spring Boot project;
3. Basic installation and use of ffmpeg.