Author: Xu Xiang
Foreword
When using ChatGPT, there are times when our hands are busy and we cannot take advantage of ChatGPT's text input and output. Fortunately, speech recognition technology has become accurate and convenient enough to make seamless, hands-free interaction with ChatGPT possible.
With speech recognition we can dictate questions instead of typing them. The speech recognition software converts a question into text and sends it to ChatGPT for processing; ChatGPT's answer is then read back as speech, giving us true voice interaction. Even when both hands are occupied, we can still communicate seamlessly with ChatGPT, improve interaction efficiency, and enjoy a more pleasant experience.
Meet Scriptable
Siri has its own speech recognition capability, and we can call it through Scriptable to build voice interaction. Scriptable is an iOS automation tool that lets us develop widgets and small apps in JavaScript. It provides a rich API that covers scheduled tasks, network requests, file operations, UI display, and more. We will use Scriptable to implement both the interaction with Siri and the requests to ChatGPT.
Download and open Scriptable, tap the plus sign in the upper right to create a new project, and name it SiriChatGPT.
This time we will build a small-sized widget:
function createWidget(img) { // define a function that creates the widget
  const w = new ListWidget();
  w.addSpacer();
  w.spacing = 5;
  const bgColor = new LinearGradient(); // define a gradient background
  bgColor.colors = [new Color("#333"), new Color("#333")]; // set the gradient colors
  bgColor.locations = [0.0, 1.0]; // set the gradient stops
  w.backgroundGradient = bgColor; // apply the gradient (covered when the image below is set)
  if (config.widgetFamily == "small" || config.widgetFamily === undefined) { // small widget, or running inside the app
    w.backgroundImage = img; // set the background image
    w.presentSmall(); // preview as a small widget
  }
  return w;
}
We define a createWidget function that builds the widget; its background image is set from the img parameter passed in.
config.widgetFamily reports the size the widget is currently rendered at: small, medium, or large (it is undefined when the script runs inside the app).
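As a sketch of how one might branch on the reported size, here is a hypothetical helper (the layoutFor name and the padding/font values are our own, not part of the original script):

```javascript
// Hypothetical helper: pick layout values for each widget family.
// widgetFamily is "small", "medium", "large", or undefined in-app.
function layoutFor(family) {
  switch (family) {
    case "medium":
      return { padding: 12, fontSize: 14 };
    case "large":
      return { padding: 16, fontSize: 16 };
    default: // "small" or undefined (running inside the app)
      return { padding: 8, fontSize: 12 };
  }
}
```

A widget that supports all three sizes could call layoutFor(config.widgetFamily) and apply the returned values before presenting.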
try {
  var img = await new Request("https://n.sinaimg.cn/sinakd20230202s/348/w184h164/20230202/ad51-f00728753f8f5e1c53c10ae5dd1cf3de.png").loadImage();
} catch (err) {
  throw new Error("The image URL is not supported");
}
let widget = createWidget(img);
We use the Request method to load an image that serves as the widget's logo, catch loading errors with try/catch, and pass the image into createWidget.
Read aloud
We can borrow Siri's ability to read fixed content aloud by calling the Speech API that Scriptable exposes.
await Speech.speak('Please ask your question');
We call Speech.speak directly; it returns a promise, so we need to put an await before it.
One thing to note about Speech is that it must run in the Siri environment; otherwise it will throw a "command not supported" error.
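Because of that restriction, it can be convenient to wrap the call in a guard. The sketch below is our own addition (the speakIfPossible name is hypothetical); it checks the Scriptable config global and falls back to logging so the script does not error outside Siri:

```javascript
// Guard helper (our own naming): speak only when running with Siri,
// otherwise log the text instead of crashing.
// `config` and `Speech` are Scriptable globals; outside Scriptable they
// are undefined, so we check with typeof before touching them.
async function speakIfPossible(text) {
  if (typeof config !== "undefined" && config.runsWithSiri) {
    await Speech.speak(text);
    return true; // spoken by Siri
  }
  console.log("(no Siri) " + text);
  return false; // fell back to logging
}
```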
After creating the widget and the voice prompt, the next step is the interaction with ChatGPT itself. Here we use the API provided by OpenAI: the voice is converted into text, the text is passed to ChatGPT for processing, and finally ChatGPT's answer is converted back into speech and played.
Listen
We need to hand our dictated question to Siri for voice-to-text conversion so that it can be sent to ChatGPT. Scriptable's Dictation API provides an asynchronous start method:
let result = await Dictation.start();
Calling the start method opens a new dialog; as we speak, the words are converted to text and displayed in the dialog.
As an example, ask for a "recipe for braised pork" and tap "Done".
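Before forwarding the dictated text, it can help to validate it so we never query the API with a blank prompt. A minimal sketch (the sanitizeDictation name is our own, not part of the original script):

```javascript
// Hypothetical sanitizer for the dictated text: trim whitespace and
// reject empty results before they are sent to the API.
function sanitizeDictation(result) {
  const text = (result || "").trim();
  if (text.length === 0) {
    throw new Error("No speech was recognized, please try again");
  }
  return text;
}
```

In the script this would wrap the Dictation result, e.g. `sanitizeDictation(await Dictation.start())`.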
Think
With the question transcribed, we can send the text to OpenAI and get ChatGPT's answer back.
async function queryOpenAI(text) {
  let req = new Request("https://api.openai.com/v1/engines/davinci-codex/completions");
  req.headers = {
    "Authorization": "Bearer " + apiKey,
    "Content-Type": "application/json",
  };
  req.method = "POST";
  req.body = JSON.stringify({
    prompt: text,
    max_tokens: 2048,
    temperature: 0.7,
    n: 1,
    stream: false,
    stop: "\n",
  });
  let result = await req.loadJSON();
  return result.choices[0].text;
}
We take the text recognized by Dictation, pass it to the query function, and send the request to OpenAI.
Please replace apiKey with your own OpenAI API key.
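As a small illustration, the request body used above can be factored into a helper so the parameters live in one place (the buildCompletionBody name and its options are our own addition, not part of the original script):

```javascript
// Hypothetical helper: build the JSON body for the completion request,
// defaulting to the same values used in queryOpenAI above.
function buildCompletionBody(prompt, options = {}) {
  return JSON.stringify({
    prompt,
    max_tokens: options.maxTokens ?? 2048,
    temperature: options.temperature ?? 0.7,
    n: 1,
    stream: false,
    stop: options.stop ?? "\n",
  });
}
```

With this, `req.body = buildCompletionBody(text)` keeps queryOpenAI short while letting callers tweak max_tokens or temperature per request.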
The complete flow looks like this:
// If running in Siri, prompt the user by voice to ask a question
if (config.runsWithSiri) {
  await Speech.speak('Please ask your question');
}
// If running in the app, call Dictation.start() to begin speech
// recognition and send the recognized text to the server
if (config.runsInApp) {
  let result = await Dictation.start(); // start recognition
  let r = await _post({ // send the recognized text
    url: 'chat.openapi.com/request',
    body: JSON.stringify({
      value: result,
    }),
    headers: {
      "Content-Type": "application/json",
    }
  })
  await Speech.speak(r.response); // read ChatGPT's reply aloud
}
Script.setWidget(widget);
Script.complete();
Install the widget on the Mac
Open the settings panel via the button at the bottom left of Scriptable.
In the settings panel, click the "Add to Siri" button and set the voice command to "Hey Siri, come out soon"; this completes the component and command setup.
Click "Edit Widget".
Add "Run Script" to the desktop widget
Click on the newly added component
For Script, select the SiriChatGPT script we just developed.
With that, all the development and setup is complete. Try it out: say "Hey Siri, come out soon" to the Mac, and you will hear Siri read out the answer.