Using Whisper for speech recognition in React Native

In this article, we'll create a speech-to-text application using Whisper. Whisper requires a Python backend, so we'll create the server for the application with Flask.

React Native serves as the framework for building the mobile client. I hope you enjoy the process of creating this application, because I certainly did. Let's dive right in.

What is speech recognition?

Speech recognition enables a program to process human speech into a written format. Grammar, syntax, structure, and audio are essential for understanding and processing human speech.

Speech recognition algorithms are among the most complex areas of computer science. Advances in AI, machine learning, unsupervised pre-training techniques, and frameworks such as Wav2Vec 2.0, which are effective at self-supervised learning and learning from raw audio, have improved their capabilities.

A speech recognizer is made up of the following components:

  • Speech input

  • A decoder, which relies on an acoustic model, a pronunciation dictionary, and a language model for its output

  • Word output

These components and technical advances make it possible to consume large datasets of unlabeled speech. Pre-trained audio encoders are capable of learning high-quality representations of speech; their only downside is their unsupervised nature.

What is a decoder?

A performant decoder maps speech representations to usable output. Decoders solve the supervision problem of audio encoders. However, decoders limit the effectiveness of frameworks like Wav2Vec for speech recognition. Decoders can be quite complex to use and require a skilled practitioner, especially because technologies like Wav2Vec 2.0 are difficult to work with.

The key is to combine as many high-quality speech recognition datasets as possible. Models trained this way are more effective than models trained on a single source.

What is Whisper?

Whisper, or WSPSR, stands for Web-scale Supervised Pretraining for Speech Recognition. Whisper models are trained to be able to predict the text of transcripts.

Whisper relies on sequence-to-sequence models to map between utterances and their transcribed forms, which makes the speech recognition pipeline more effective. Whisper also ships with an audio language detector, a fine-tuned model trained on VoxLingua107.

The Whisper dataset consists of audio paired with transcripts taken from the internet. The quality of the dataset is improved through the use of automated filtering methods.
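
To make this concrete, here is a minimal sketch of calling Whisper from Python once the openai-whisper package (installed later via requirements.txt) is available; the audio file name is only a placeholder:

import whisper

# load one of the pretrained checkpoints: tiny, base, small, medium, or large
model = whisper.load_model("base")

# transcribe a local audio file (the path here is just an illustration)
result = model.transcribe("sample_recording.wav")

# the result dictionary contains the transcribed text
print(result["text"])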

Setting up Whisper

To use Whisper, we need to rely on Python for our backend. Whisper also requires the command-line tool ffmpeg, which enables our application to record, convert, and stream audio and video.

Below are the necessary commands for installing ffmpeg on different machines:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

Creating the backend application with Flask

In this section, we'll create the backend service for the app. Flask is a web framework written in Python. I chose Flask for this application because it is easy to set up.

The Flask development team recommends using the latest version of Python, although Flask still supports Python ≥ 3.7.
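
As a quick illustration of how little setup Flask needs, here is a minimal sketch of a standalone Flask app; it is only for orientation, and our actual server is built step by step below:

import flask

app = flask.Flask(__name__)

@app.route('/')
def index():
    # a trivial route to confirm the server is reachable
    return 'Backend is up'

if __name__ == '__main__':
    app.run(port=8000)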

Once the prerequisite installations are complete, we can create the project folder that will hold both the client and backend applications.

mkdir translateWithWhisper && cd translateWithWhisper && mkdir backend && cd backend

Flask makes use of virtual environments to manage project dependencies; Python ships with the venv module out of the box for creating them.

In a terminal window, use the command below to create the venv folder. This folder will hold our dependencies.

python3 -m venv venv

Specifying the project dependencies

Specify the necessary dependencies in a requirements.txt file. The requirements.txt file lives in the root of the backend directory.

touch requirements.txt
code requirements.txt

Copy and paste the following code into the requirements.txt file:

numpy
tqdm
transformers>=4.19.0
ffmpeg-python==0.2.0
pyaudio
SpeechRecognition
pydub
git+https://github.com/openai/whisper.git
--extra-index-url https://download.pytorch.org/whl/cu113
torch
flask
flask_cors

Creating a Bash shell script to install the dependencies

In the root project directory, create a Bash shell script file. The Bash script handles the installation of the dependencies for the Flask application.

Open a terminal window in the root project directory and create the shell script with the following commands:

touch install_dependencies.sh
code install_dependencies.sh

Copy and paste the following code block into the install_dependencies.sh file:

# install and run backend
cd backend && python3 -m venv venv
source venv/Scripts/activate

pip install wheel
pip install -r requirements.txt

Now, open a terminal window in the root directory and run the following command:

sh .\install_dependencies.sh

Creating the transcribe endpoint

Now we'll create a transcribe endpoint in our application that will receive audio input from the client. The application will transcribe the input and return the transcribed text to the client.

This endpoint accepts POST requests and processes the input. When the response is a 200 HTTP response, the client receives the transcribed text.

Create an app.py file to hold the logic for processing the input. Open a new terminal window and create the app.py file in the backend directory:

touch backend/app.py
code backend/app.py

Copy and paste the code block below into the app.py file:

import os
import tempfile
import flask
from flask import request
from flask_cors import CORS
import whisper

app = flask.Flask(__name__)
CORS(app)

# endpoint for handling the transcription of audio inputs
@app.route('/transcribe', methods=['POST'])
def transcribe():
    if request.method == 'POST':
        language = request.form['language']
        model = request.form['model_size']

        # there is no English-only ('.en') variant of the large model
        if model != 'large' and language == 'english':
            model = model + '.en'
        audio_model = whisper.load_model(model)

        temp_dir = tempfile.mkdtemp()
        save_path = os.path.join(temp_dir, 'temp.wav')

        wav_file = request.files['audio_data']
        wav_file.save(save_path)

        if language == 'english':
            result = audio_model.transcribe(save_path, language='english')
        else:
            result = audio_model.transcribe(save_path)

        return result['text']
    else:
        return "This endpoint only processes POST wav blob"

Running the Flask application

In the terminal window with the activated venv environment, run the following commands to start the application:

$ cd backend
$ flask run --port 8000

The application should start without any errors; if so, the terminal window will show Flask serving the app on port 8000.

This concludes the process of creating the transcribe endpoint in the Flask application.
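
Before wiring up the mobile client, you can sanity-check the endpoint with a short Python script. The snippet below is only a sketch: it assumes the requests package is installed and that a sample file named test.wav sits next to the script:

import requests

# form fields mirror what the React Native client will send later
data = {"language": "english", "model_size": "base"}

# audio_data is sent as a file upload, matching request.files['audio_data'] in app.py
with open("test.wav", "rb") as wav_file:
    files = {"audio_data": wav_file}
    response = requests.post(
        "http://127.0.0.1:8000/transcribe", data=data, files=files
    )

print(response.status_code)
print(response.text)  # the transcribed text returned by the endpoint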

Hosting the server

To make network requests from iOS to the HTTP endpoint we created, we need to route to an HTTPS server. ngrok solves the problem of creating that reroute.

Download ngrok, then install the package and open it. A terminal window starts up; enter the following command to host the server with ngrok:

ngrok http 8000

ngrok will generate a hosted URL, which will be used in the client application for its requests.
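
If you prefer not to copy the URL from the terminal by hand, the ngrok agent also exposes a local inspection API on port 4040. The sketch below, which assumes the agent is running and the requests package is installed, reads the public HTTPS URL from it:

import requests

# the ngrok agent serves an inspection API at http://127.0.0.1:4040 by default
tunnels = requests.get("http://127.0.0.1:4040/api/tunnels").json()["tunnels"]

# pick the public HTTPS address; this becomes the base URL the client calls
public_url = next(t["public_url"] for t in tunnels if t["public_url"].startswith("https"))
print(public_url + "/transcribe")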

Creating the speech recognition mobile app with React Native

For this part of the tutorial, you'll need a few things installed:

  • Expo CLI: a command-line tool for interfacing with Expo tools

  • The Expo Go app for Android and iOS: used to open apps served through the Expo CLI

In a new terminal window, initialize the React Native project:

npx create-expo-app client
cd client

Now, start the development server:

npx expo start

To open the app on an iOS device, open the camera and scan the QR code shown in the terminal. On an Android device, press Scan QR Code on the Home tab of the Expo Go app.

Our Expo Go app

Handling audio recording

expo-av handles the recording of audio in our app. Our Flask server expects the file in .wav format, and the expo-av package allows us to specify the format before saving.

Install the necessary packages in the terminal:

yarn add axios expo-av react-native-picker-select

Creating the model selector

The app must be able to select a model size. There are five options to choose from:

  • tiny

  • base

  • small

  • medium

  • large

The selected size determines which model the server loads to transcribe the input.

Back in the terminal, use the commands below to create a folder called src with a /components subfolder:

mkdir src
mkdir src/components
touch src/components/Mode.tsx
code src/components/Mode.tsx

Paste the code block into the Mode.tsx file:

import React from "react";
import { View, Text, StyleSheet } from "react-native";
import RNPickerSelect from "react-native-picker-select";

const Mode = ({
  onModelChange,
  transcribeTimeout,
  onTranscribeTimeoutChanged,
}: any) => {
  function onModelChangeLocal(value: any) {
    onModelChange(value);
  }

  function onTranscribeTimeoutChangedLocal(event: any) {
    onTranscribeTimeoutChanged(event.target.value);
  }

  return (
    <View>
      <Text style={styles.title}>Model Size</Text>
      <View style={{ flexDirection: "row" }}>
        <RNPickerSelect
          onValueChange={(value) => onModelChangeLocal(value)}
          useNativeAndroidPickerStyle={false}
          placeholder={{ label: "Select model", value: null }}
          items={[
            { label: "tiny", value: "tiny" },
            { label: "base", value: "base" },
            { label: "small", value: "small" },
            { label: "medium", value: "medium" },
            { label: "large", value: "large" },
          ]}
          style={customPickerStyles}
        />
      </View>
      <View>
        <Text style={styles.title}>Timeout :{transcribeTimeout}</Text>
      </View>
    </View>
  );
};

export default Mode;
const styles = StyleSheet.create({
  title: {
    fontWeight: "200",
    fontSize: 25,
  },
});
const customPickerStyles = StyleSheet.create({
  inputIOS: {
    fontSize: 14,
    paddingVertical: 10,
    paddingHorizontal: 12,
    borderWidth: 1,
    borderColor: "green",
    borderRadius: 8,
    color: "black",
    paddingRight: 30, // to ensure the text is never behind the icon
  },
  inputAndroid: {
    fontSize: 14,
    paddingHorizontal: 10,
    paddingVertical: 8,
    borderWidth: 1,
    borderColor: "blue",
    borderRadius: 8,
    color: "black",
    paddingRight: 30, // to ensure the text is never behind the icon
  },
});

Creating the TranscribeOutput component

The server returns output containing the transcribed text. This component receives the output data and displays it.

touch src/components/TranscribeOutput.tsx
code src/components/TranscribeOutput.tsx

Paste the code block into the TranscribeOutput.tsx file:

import React from "react";
import { Text, View, StyleSheet } from "react-native";
const TranscribedOutput = ({
  transcribedText,
  interimTranscribedText,
}: any) => {
  if (transcribedText.length === 0 && interimTranscribedText.length === 0) {
    return <Text>...</Text>;
  }

  return (
    <View style={styles.box}>
      <Text style={styles.text}>{transcribedText}</Text>
      <Text>{interimTranscribedText}</Text>
    </View>
  );
};
const styles = StyleSheet.create({
  box: {
    borderColor: "black",
    borderRadius: 10,
    marginBottom: 0,
  },
  text: {
    fontWeight: "400",
    fontSize: 30,
  },
});

export default TranscribedOutput;

Creating the client functionality

The app relies on Axios to send data to and receive data from the Flask server; we installed it in an earlier section. The default language for testing the app is English.

In the App.tsx file, import the necessary dependencies:

import * as React from "react";
import {
  Text,
  StyleSheet,
  View,
  Button,
  ActivityIndicator,
} from "react-native";
import { Audio } from "expo-av";
import FormData from "form-data";
import axios from "axios";
import Mode from "./src/components/Mode";
import TranscribedOutput from "./src/components/TranscribeOutput";

Creating the state variables

The app needs to keep track of the current recording, the list of recordings, the transcribed data, and whether recording or transcription is in progress. The language, model, and timeout are set in state by default.

export default () => {
  const [recording, setRecording] = React.useState(false as any);
  const [recordings, setRecordings] = React.useState([]);
  const [message, setMessage] = React.useState("");
  const [transcribedData, setTranscribedData] = React.useState([] as any);
  const [interimTranscribedData] = React.useState("");
  const [isRecording, setIsRecording] = React.useState(false);
  const [isTranscribing, setIsTranscribing] = React.useState(false);
  const [selectedLanguage, setSelectedLanguage] = React.useState("english");
  const [selectedModel, setSelectedModel] = React.useState(1);
  const [transcribeTimeout, setTranscribeTimout] = React.useState(5);
  const [stopTranscriptionSession, setStopTranscriptionSession] =
    React.useState(false);
  const [isLoading, setLoading] = React.useState(false);
  return (
    <View style={styles.root}></View>
)
}

const styles = StyleSheet.create({
  root: {
    display: "flex",
    flex: 1,
    alignItems: "center",
    textAlign: "center",
    flexDirection: "column",
  },
});

Creating the refs, language, and model options variables

The useRef Hook enables us to track the currently initialized properties. We want to set up useRef values for the transcription session, the language, and the model.

Paste the code block below the setLoading useState Hook:

  const [isLoading, setLoading] = React.useState(false);
  const intervalRef: any = React.useRef(null);

  const stopTranscriptionSessionRef = React.useRef(stopTranscriptionSession);
  stopTranscriptionSessionRef.current = stopTranscriptionSession;

  const selectedLangRef = React.useRef(selectedLanguage);
  selectedLangRef.current = selectedLanguage;

  const selectedModelRef = React.useRef(selectedModel);
  selectedModelRef.current = selectedModel;

  const supportedLanguages = [
    "english",
    "chinese",
    "german",
    "spanish",
    "russian",
    "korean",
    "french",
    "japanese",
    "portuguese",
    "turkish",
    "polish",
    "catalan",
    "dutch",
    "arabic",
    "swedish",
    "italian",
    "indonesian",
    "hindi",
    "finnish",
    "vietnamese",
    "hebrew",
    "ukrainian",
    "greek",
    "malay",
    "czech",
    "romanian",
    "danish",
    "hungarian",
    "tamil",
    "norwegian",
    "thai",
    "urdu",
    "croatian",
    "bulgarian",
    "lithuanian",
    "latin",
    "maori",
    "malayalam",
    "welsh",
    "slovak",
    "telugu",
    "persian",
    "latvian",
    "bengali",
    "serbian",
    "azerbaijani",
    "slovenian",
    "kannada",
    "estonian",
    "macedonian",
    "breton",
    "basque",
    "icelandic",
    "armenian",
    "nepali",
    "mongolian",
    "bosnian",
    "kazakh",
    "albanian",
    "swahili",
    "galician",
    "marathi",
    "punjabi",
    "sinhala",
    "khmer",
    "shona",
    "yoruba",
    "somali",
    "afrikaans",
    "occitan",
    "georgian",
    "belarusian",
    "tajik",
    "sindhi",
    "gujarati",
    "amharic",
    "yiddish",
    "lao",
    "uzbek",
    "faroese",
    "haitian creole",
    "pashto",
    "turkmen",
    "nynorsk",
    "maltese",
    "sanskrit",
    "luxembourgish",
    "myanmar",
    "tibetan",
    "tagalog",
    "malagasy",
    "assamese",
    "tatar",
    "hawaiian",
    "lingala",
    "hausa",
    "bashkir",
    "javanese",
    "sundanese",
  ];

  const modelOptions = ["tiny", "base", "small", "medium", "large"];
  React.useEffect(() => {
    return () => clearInterval(intervalRef.current);
  }, []);

  function handleTranscribeTimeoutChange(newTimeout: any) {
    setTranscribeTimout(newTimeout);
  }

Creating the recording functions

In this section, we'll write five functions to handle the audio transcription.

The startRecording function

The first function is startRecording. It enables the application to request permission to use the microphone. The desired audio format is preset, and we have a ref for tracking the transcription timeout:

  async function startRecording() {
    try {
      console.log("Requesting permissions..");
      const permission = await Audio.requestPermissionsAsync();
      if (permission.status === "granted") {
        await Audio.setAudioModeAsync({
          allowsRecordingIOS: true,
          playsInSilentModeIOS: true,
        });
        alert("Starting recording..");
        const RECORDING_OPTIONS_PRESET_HIGH_QUALITY: any = {
          android: {
            extension: ".mp4",
            outputFormat: Audio.RECORDING_OPTION_ANDROID_OUTPUT_FORMAT_MPEG_4,
            audioEncoder: Audio.RECORDING_OPTION_ANDROID_AUDIO_ENCODER_AMR_NB,
            sampleRate: 44100,
            numberOfChannels: 2,
            bitRate: 128000,
          },
          ios: {
            extension: ".wav",
            audioQuality: Audio.RECORDING_OPTION_IOS_AUDIO_QUALITY_MIN,
            sampleRate: 44100,
            numberOfChannels: 2,
            bitRate: 128000,
            linearPCMBitDepth: 16,
            linearPCMIsBigEndian: false,
            linearPCMIsFloat: false,
          },
        };
        const { recording }: any = await Audio.Recording.createAsync(
          RECORDING_OPTIONS_PRESET_HIGH_QUALITY
        );
        setRecording(recording);
        console.log("Recording started");
        setStopTranscriptionSession(false);
        setIsRecording(true);
        intervalRef.current = setInterval(
          transcribeInterim,
          transcribeTimeout * 1000
        );
        console.log("erer", recording);
      } else {
        setMessage("Please grant permission to app to access microphone");
      }
    } catch (err) {
      console.error(" Failed to start recording", err);
    }
  }

The stopRecording function

The stopRecording function enables the user to stop the recording; the updated list of recordings is stored in the recordings state variable.

  async function stopRecording() {
    console.log("Stopping recording..");
    setRecording(undefined);
    await recording.stopAndUnloadAsync();
    const uri = recording.getURI();
    let updatedRecordings = [...recordings] as any;
    const { sound, status } = await recording.createNewLoadedSoundAsync();
    updatedRecordings.push({
      sound: sound,
      duration: getDurationFormatted(status.durationMillis),
      file: recording.getURI(),
    });
    setRecordings(updatedRecordings);
    console.log("Recording stopped and stored at", uri);
    // Fetch audio binary blob data

    clearInterval(intervalRef.current);
    setStopTranscriptionSession(true);
    setIsRecording(false);
    setIsTranscribing(false);
  }

The getDurationFormatted and getRecordingLines functions

To get the duration of each recording and render the list of recordings, create the getDurationFormatted and getRecordingLines functions:

  function getDurationFormatted(millis: any) {
    const minutes = millis / 1000 / 60;
    const minutesDisplay = Math.floor(minutes);
    const seconds = Math.round((minutes - minutesDisplay) * 60);
    const secondDisplay = seconds < 10 ? `0${seconds}` : seconds;
    return `${minutesDisplay}:${secondDisplay}`;
  }

  function getRecordingLines() {
    return recordings.map((recordingLine: any, index) => {
      return (
        <View key={index} style={styles.row}>
          <Text style={styles.fill}>
            {" "}
            Recording {index + 1} - {recordingLine.duration}
          </Text>
          <Button
            style={styles.button}
            onPress={() => recordingLine.sound.replayAsync()}
            title="Play"
          ></Button>
        </View>
      );
    });
  }

Creating the transcribeRecording function

This function lets us communicate with the Flask server. We access the audio we recorded using the getURI() function from the expo-av library. language, model_size, and audio_data are the key pieces of data we send to the server.

A 200 response indicates success. We store the response with the setTranscribedData useState Hook; this response contains our transcribed text.

function transcribeInterim() {
    clearInterval(intervalRef.current);
    setIsRecording(false);
  }

  async function transcribeRecording() {
    const uri = recording.getURI();
    const filetype = uri.split(".").pop();
    const filename = uri.split("/").pop();
    setLoading(true);
    const formData: any = new FormData();
    formData.append("language", selectedLangRef.current);
    formData.append("model_size", modelOptions[selectedModelRef.current]);
    formData.append(
      "audio_data",
      {
        uri,
        type: `audio/${filetype}`,
        name: filename,
      },
      "temp_recording"
    );
    axios({
      url: "https://2c75-197-210-53-169.eu.ngrok.io/transcribe",
      method: "POST",
      data: formData,
      headers: {
        Accept: "application/json",
        "Content-Type": "multipart/form-data",
      },
    })
      .then(function (response) {
        console.log("response :", response);
        setTranscribedData((oldData: any) => [...oldData, response.data]);
        setLoading(false);
        setIsTranscribing(false);
        intervalRef.current = setInterval(
          transcribeInterim,
          transcribeTimeout * 1000
        );
      })
      .catch(function (error) {
        console.log("error : error");
      });

    if (!stopTranscriptionSessionRef.current) {
      setIsRecording(true);
    }
  }

Assembling the app

Let's put together all the pieces we've created so far:

import * as React from "react";
import {
  Text,
  StyleSheet,
  View,
  Button,
  ActivityIndicator,
} from "react-native";
import { Audio } from "expo-av";
import FormData from "form-data";
import axios from "axios";
import Mode from "./src/components/Mode";
import TranscribedOutput from "./src/components/TranscribeOutput";
​
export default () => {
  const [recording, setRecording] = React.useState(false as any);
  const [recordings, setRecordings] = React.useState([]);
  const [message, setMessage] = React.useState("");
  const [transcribedData, setTranscribedData] = React.useState([] as any);
  const [interimTranscribedData] = React.useState("");
  const [isRecording, setIsRecording] = React.useState(false);
  const [isTranscribing, setIsTranscribing] = React.useState(false);
  const [selectedLanguage, setSelectedLanguage] = React.useState("english");
  const [selectedModel, setSelectedModel] = React.useState(1);
  const [transcribeTimeout, setTranscribeTimout] = React.useState(5);
  const [stopTranscriptionSession, setStopTranscriptionSession] =
    React.useState(false);
  const [isLoading, setLoading] = React.useState(false);
  const intervalRef: any = React.useRef(null);
​
  const stopTranscriptionSessionRef = React.useRef(stopTranscriptionSession);
  stopTranscriptionSessionRef.current = stopTranscriptionSession;
​
  const selectedLangRef = React.useRef(selectedLanguage);
  selectedLangRef.current = selectedLanguage;
​
  const selectedModelRef = React.useRef(selectedModel);
  selectedModelRef.current = selectedModel;
​
  const supportedLanguages = [
    "english",
    "chinese",
    "german",
    "spanish",
    "russian",
    "korean",
    "french",
    "japanese",
    "portuguese",
    "turkish",
    "polish",
    "catalan",
    "dutch",
    "arabic",
    "swedish",
    "italian",
    "indonesian",
    "hindi",
    "finnish",
    "vietnamese",
    "hebrew",
    "ukrainian",
    "greek",
    "malay",
    "czech",
    "romanian",
    "danish",
    "hungarian",
    "tamil",
    "norwegian",
    "thai",
    "urdu",
    "croatian",
    "bulgarian",
    "lithuanian",
    "latin",
    "maori",
    "malayalam",
    "welsh",
    "slovak",
    "telugu",
    "persian",
    "latvian",
    "bengali",
    "serbian",
    "azerbaijani",
    "slovenian",
    "kannada",
    "estonian",
    "macedonian",
    "breton",
    "basque",
    "icelandic",
    "armenian",
    "nepali",
    "mongolian",
    "bosnian",
    "kazakh",
    "albanian",
    "swahili",
    "galician",
    "marathi",
    "punjabi",
    "sinhala",
    "khmer",
    "shona",
    "yoruba",
    "somali",
    "afrikaans",
    "occitan",
    "georgian",
    "belarusian",
    "tajik",
    "sindhi",
    "gujarati",
    "amharic",
    "yiddish",
    "lao",
    "uzbek",
    "faroese",
    "haitian creole",
    "pashto",
    "turkmen",
    "nynorsk",
    "maltese",
    "sanskrit",
    "luxembourgish",
    "myanmar",
    "tibetan",
    "tagalog",
    "malagasy",
    "assamese",
    "tatar",
    "hawaiian",
    "lingala",
    "hausa",
    "bashkir",
    "javanese",
    "sundanese",
  ];
​
  const modelOptions = ["tiny", "base", "small", "medium", "large"];
​
  React.useEffect(() => {
    return () => clearInterval(intervalRef.current);
  }, []);
​
  function handleTranscribeTimeoutChange(newTimeout: any) {
    setTranscribeTimout(newTimeout);
  }
​
  async function startRecording() {
    try {
      console.log("Requesting permissions..");
      const permission = await Audio.requestPermissionsAsync();
      if (permission.status === "granted") {
        await Audio.setAudioModeAsync({
          allowsRecordingIOS: true,
          playsInSilentModeIOS: true,
        });
        alert("Starting recording..");
        const RECORDING_OPTIONS_PRESET_HIGH_QUALITY: any = {
          android: {
            extension: ".mp4",
            outputFormat: Audio.RECORDING_OPTION_ANDROID_OUTPUT_FORMAT_MPEG_4,
            audioEncoder: Audio.RECORDING_OPTION_ANDROID_AUDIO_ENCODER_AMR_NB,
            sampleRate: 44100,
            numberOfChannels: 2,
            bitRate: 128000,
          },
          ios: {
            extension: ".wav",
            audioQuality: Audio.RECORDING_OPTION_IOS_AUDIO_QUALITY_MIN,
            sampleRate: 44100,
            numberOfChannels: 2,
            bitRate: 128000,
            linearPCMBitDepth: 16,
            linearPCMIsBigEndian: false,
            linearPCMIsFloat: false,
          },
        };
        const { recording }: any = await Audio.Recording.createAsync(
          RECORDING_OPTIONS_PRESET_HIGH_QUALITY
        );
        setRecording(recording);
        console.log("Recording started");
        setStopTranscriptionSession(false);
        setIsRecording(true);
        intervalRef.current = setInterval(
          transcribeInterim,
          transcribeTimeout * 1000
        );
        console.log("erer", recording);
      } else {
        setMessage("Please grant permission to app to access microphone");
      }
    } catch (err) {
      console.error(" Failed to start recording", err);
    }
  }
  async function stopRecording() {
    console.log("Stopping recording..");
    setRecording(undefined);
    await recording.stopAndUnloadAsync();
    const uri = recording.getURI();
    let updatedRecordings = [...recordings] as any;
    const { sound, status } = await recording.createNewLoadedSoundAsync();
    updatedRecordings.push({
      sound: sound,
      duration: getDurationFormatted(status.durationMillis),
      file: recording.getURI(),
    });
    setRecordings(updatedRecordings);
    console.log("Recording stopped and stored at", uri);
    // Fetch audio binary blob data
​
    clearInterval(intervalRef.current);
    setStopTranscriptionSession(true);
    setIsRecording(false);
    setIsTranscribing(false);
  }
​
  function getDurationFormatted(millis: any) {
    const minutes = millis / 1000 / 60;
    const minutesDisplay = Math.floor(minutes);
    const seconds = Math.round((minutes - minutesDisplay) * 60);
    const secondDisplay = seconds < 10 ? `0${seconds}` : seconds;
    return `${minutesDisplay}:${secondDisplay}`;
  }
​
  function getRecordingLines() {
    return recordings.map((recordingLine: any, index) => {
      return (
        <View key={index} style={styles.row}>
          <Text style={styles.fill}>
            {" "}
            Recording {index + 1} - {recordingLine.duration}
          </Text>
          <Button
            style={styles.button}
            onPress={() => recordingLine.sound.replayAsync()}
            title="Play"
          ></Button>
        </View>
      );
    });
  }
​
  function transcribeInterim() {
    clearInterval(intervalRef.current);
    setIsRecording(false);
  }
​
  async function transcribeRecording() {
    const uri = recording.getURI();
    const filetype = uri.split(".").pop();
    const filename = uri.split("/").pop();
    setLoading(true);
    const formData: any = new FormData();
    formData.append("language", selectedLangRef.current);
    formData.append("model_size", modelOptions[selectedModelRef.current]);
    formData.append(
      "audio_data",
      {
        uri,
        type: `audio/${filetype}`,
        name: filename,
      },
      "temp_recording"
    );
    axios({
      url: "https://2c75-197-210-53-169.eu.ngrok.io/transcribe",
      method: "POST",
      data: formData,
      headers: {
        Accept: "application/json",
        "Content-Type": "multipart/form-data",
      },
    })
      .then(function (response) {
        console.log("response :", response);
        setTranscribedData((oldData: any) => [...oldData, response.data]);
        setLoading(false);
        setIsTranscribing(false);
        intervalRef.current = setInterval(
          transcribeInterim,
          transcribeTimeout * 1000
        );
      })
      .catch(function (error) {
        console.log("error : error");
      });
​
    if (!stopTranscriptionSessionRef.current) {
      setIsRecording(true);
    }
  }
  return (
    <View style={styles.root}>
      <View style={{ flex: 1 }}>
        <Text style={styles.title}>Speech to Text. </Text>
        <Text style={styles.title}>{message}</Text>
      </View>
      <View style={styles.settingsSection}>
        <Mode
          disabled={isTranscribing || isRecording}
          possibleLanguages={supportedLanguages}
          selectedLanguage={selectedLanguage}
          onLanguageChange={setSelectedLanguage}
          modelOptions={modelOptions}
          selectedModel={selectedModel}
          onModelChange={setSelectedModel}
          transcribeTimeout={transcribeTimeout}
          onTranscribeTimeoutChanged={handleTranscribeTimeoutChange}
        />
      </View>
      <View style={styles.buttonsSection}>
        {!isRecording && !isTranscribing && (
          <Button onPress={startRecording} title="Start recording" />
        )}
        {(isRecording || isTranscribing) && (
          <Button
            onPress={stopRecording}
            disabled={stopTranscriptionSessionRef.current}
            title="stop recording"
          />
        )}
        <Button title="Transcribe" onPress={() => transcribeRecording()} />
        {getRecordingLines()}
      </View>
​
      {isLoading !== false ? (
        <ActivityIndicator
          size="large"
          color="#00ff00"
          hidesWhenStopped={true}
          animating={true}
        />
      ) : (
        <Text></Text>
      )}
​
      <View style={styles.transcription}>
        <TranscribedOutput
          transcribedText={transcribedData}
          interimTranscribedText={interimTranscribedData}
        />
      </View>
    </View>
  );
};
​
const styles = StyleSheet.create({
  root: {
    display: "flex",
    flex: 1,
    alignItems: "center",
    textAlign: "center",
    flexDirection: "column",
  },
  title: {
    marginTop: 40,
    fontWeight: "400",
    fontSize: 30,
  },
  settingsSection: {
    flex: 1,
  },
  buttonsSection: {
    flex: 1,
    flexDirection: "row",
  },
  transcription: {
    flex: 1,
    flexDirection: "row",
  },
  recordIllustration: {
    width: 100,
  },
  row: {
    flexDirection: "row",
    alignItems: "center",
    justifyContent: "center",
  },
  fill: {
    flex: 1,
    margin: 16,
  },
  button: {
    margin: 16,
  },
});

Running the application

Run the React Native application with the following command:

yarn start

The project repository is publicly available.

Conclusion

In this article, we learned how to create a speech-to-text feature in a React Native application. I foresee Whisper changing how narration and dictation work in everyday life, and the techniques covered in this article enable the creation of dictation apps.

I'm excited to see the new and innovative ways developers extend Whisper, for example, using Whisper to perform actions on our mobile and web devices, or using Whisper to improve the accessibility of our websites and apps.
