Speech Recognition and Speech Synthesis with Baidu AI

A robot needs speech recognition for voice Q&A, so this post calls Baidu AI's speech recognition service.

The idea is simple: use Baidu's API, initialize a client, then call it with the input parameters.

Code


  
  
    import wave

    import pyaudio
    from aip import AipSpeech, AipNlp
    from playsound import playsound

    # Your APPID, API key (AK), and secret key (SK)
    APP_ID = '****'
    API_KEY = '****'
    SECRET_KEY = '****'


    # Read a file as bytes
    def get_file_content(filePath):
        with open(filePath, 'rb') as fp:
            return fp.read()


    # Record from the microphone
    def record_content():
        CHUNK = 1024
        FORMAT = pyaudio.paInt16  # 16-bit samples
        CHANNELS = 1              # mono
        RATE = 16000              # 16 kHz sample rate
        RECORD_SECONDS = 3
        WAVE_OUTPUT_FILENAME = "audio.wav"
        p = pyaudio.PyAudio()
        stream = p.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
        print("* recording")
        frames = []
        for j in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
            data = stream.read(CHUNK)
            frames.append(data)
        print("* done recording")
        stream.stop_stream()
        stream.close()
        p.terminate()
        # Save the recorded frames as a WAV file
        wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(p.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))
        wf.close()
        print("done ------------------------------ ")
        return WAVE_OUTPUT_FILENAME


    # Create the speech client
    client_audio = AipSpeech(APP_ID, API_KEY, SECRET_KEY)
    # Record speech
    filePath = record_content()
    # Speech recognition
    result_audio = client_audio.asr(get_file_content(filePath), 'wav', 16000, {
        'dev_pid': 1536,  # recognition model ID (1536: Mandarin)
    })
    content_audio = result_audio['result'][0]
    print(content_audio)
    # Natural language processing client
    client_nlp = AipNlp(APP_ID, API_KEY, SECRET_KEY)
    # text = "百度是一家高科技公司"
    text = content_audio
    # Call lexical analysis
    xx = client_nlp.lexer(text)
    content_answer = xx['items'][0]['item']
    # Speech synthesis
    result_answer = None  # stays None if the call below raises
    try:
        result_answer = client_audio.synthesis(content_answer, 'zh', 1, {
            'vol': 5,  # volume
        })
    except Exception as e:
        print(e)
    # Write the audio to disk (a dict return value means the request failed)
    if result_answer is not None and not isinstance(result_answer, dict):
        with open('audio.mp3', 'wb') as f:
            f.write(result_answer)
        # Play the audio
        playsound('audio.mp3')

Recording

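First, record the other party's speech and save it as a .wav audio file. The raw PCM recording parameters must be a 16 kHz sample rate, 16-bit depth, and mono. Supported formats are pcm (uncompressed), wav (uncompressed, PCM-encoded), and amr (compressed).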


  
  
    # Record from the microphone
    def record_content():
        CHUNK = 1024
        FORMAT = pyaudio.paInt16  # 16-bit samples
        CHANNELS = 1              # mono
        RATE = 16000              # 16 kHz sample rate
        RECORD_SECONDS = 3
        WAVE_OUTPUT_FILENAME = "audio.wav"
        p = pyaudio.PyAudio()
        stream = p.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
        print("* recording")
        frames = []
        for j in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
            data = stream.read(CHUNK)
            frames.append(data)
        print("* done recording")
        stream.stop_stream()
        stream.close()
        p.terminate()
        # Save the recorded frames as a WAV file
        wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(p.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))
        wf.close()
        print("done ------------------------------ ")
        return WAVE_OUTPUT_FILENAME
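Since the service is strict about the recording parameters, it can help to verify a file before uploading it. The following is a minimal sketch using only the standard-library `wave` module; the helper name `check_wav_format` is made up for illustration and is not part of the Baidu SDK.

```python
import wave


def check_wav_format(path, rate=16000, sample_width=2, channels=1):
    """Return a list of mismatches between a WAV file and the expected format.

    Defaults match the requirement stated above: 16 kHz, 16-bit (2 bytes), mono.
    An empty list means the file looks fine.
    """
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != rate:
            problems.append(
                f"sample rate is {wf.getframerate()} Hz, expected {rate} Hz")
        if wf.getsampwidth() != sample_width:
            problems.append(
                f"sample width is {wf.getsampwidth()} bytes, expected {sample_width}")
        if wf.getnchannels() != channels:
            problems.append(
                f"channel count is {wf.getnchannels()}, expected {channels}")
    return problems
```

Calling `check_wav_format("audio.wav")` right after recording should return an empty list if the recording settings above were used.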

 

Recognition

Next, run speech recognition on the recorded file. The code is as follows:


  
  
    # Speech recognition
    result_audio = client_audio.asr(get_file_content(filePath), 'wav', 16000, {
        'dev_pid': 1536,  # recognition model ID (1536: Mandarin)
    })
    content_audio = result_audio['result'][0]
    print(content_audio)
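Indexing `result_audio['result'][0]` directly raises a `KeyError` whenever recognition fails, because a failed response carries no `result` list. Here is a hedged sketch of a safer way to pull the text out. It assumes the response is a dict with `err_no`, `err_msg`, and `result` fields, which is how I understand the service to respond; `extract_transcript` is a hypothetical helper name.

```python
def extract_transcript(response):
    """Return the first recognized sentence, or None if recognition failed.

    Assumes a dict shaped like {"err_no": 0, "err_msg": "...", "result": [...]}.
    """
    if not isinstance(response, dict):
        return None
    # A non-zero err_no signals a failed request (assumed field name).
    if response.get("err_no", 0) != 0:
        print("recognition failed:", response.get("err_no"), response.get("err_msg"))
        return None
    candidates = response.get("result") or []
    return candidates[0] if candidates else None
```

With this in place, the line above could become `content_audio = extract_transcript(result_audio)`, followed by a check for `None` before going on to the next step.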

Processing


  
  
    # Natural language processing client
    client_nlp = AipNlp(APP_ID, API_KEY, SECRET_KEY)
    # text = "百度是一家高科技公司"
    text = content_audio
    # Call lexical analysis
    xx = client_nlp.lexer(text)
    # Use the first segmented word as the answer
    content_answer = xx['items'][0]['item']
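The code above keeps only the first word of the lexical analysis. If you want every segmented word, a small helper like the one below might do. The `items` and `item` field names come from the code above; the helper name `extract_words` is invented here, and an empty or failed response simply yields an empty list.

```python
def extract_words(lexer_response):
    """Collect the 'item' string of every entry in a lexer response.

    Returns an empty list when the response has no usable 'items' field.
    """
    if not isinstance(lexer_response, dict):
        return []
    items = lexer_response.get("items") or []
    return [entry["item"] for entry in items if entry.get("item")]
```

For example, `words = extract_words(xx)` gives the full word list, and `words[0]` is the same value the original code stores in `content_answer` as long as the list is not empty.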

 

Answer

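Once recognition is finished, the answer is synthesized into speech, written to a local file, and played back (there are several ways to play audio in Python).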


  
  
    # Speech synthesis
    result_answer = None  # stays None if the call below raises
    try:
        result_answer = client_audio.synthesis(content_answer, 'zh', 1, {
            'vol': 5,  # volume
        })
    except Exception as e:
        print(e)
    # Write the audio to disk (a dict return value means the request failed)
    if result_answer is not None and not isinstance(result_answer, dict):
        with open('audio.mp3', 'wb') as f:
            f.write(result_answer)
        # Play the audio
        playsound('audio.mp3')
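The write step can also be wrapped in a small function so the caller knows whether there is anything to play. This is only a sketch: `save_speech` is a hypothetical name, and it relies on the same convention as the code above, where a dict return value signals a failed synthesis request and anything else is the audio data.

```python
def save_speech(result, path="audio.mp3"):
    """Write synthesized audio bytes to `path`.

    Returns True if a file was written, False if `result` is missing
    or is an error dict.
    """
    if result is None or isinstance(result, dict):
        print("synthesis failed:", result)
        return False
    with open(path, "wb") as f:
        f.write(result)
    return True
```

A caller would then play the file only on success, for example `if save_speech(result_answer): playsound('audio.mp3')`.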

Open problems (if you have suggestions, please leave a comment, thanks!):

1. Handling variable-length audio (the clip length is not fixed and should depend on how long the person speaks)

2. Picking out one speaker in a group of people to take commands from
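For the first problem, one common approach is to stop recording once the input has been quiet for a while, instead of recording for a fixed number of seconds. Below is a rough sketch of that idea using a simple RMS energy threshold. This is not something the code above does: the function names, the threshold of 500, and the chunk counts are all assumptions that would need tuning for a real microphone.

```python
import array
import math


def rms(chunk):
    """Root-mean-square amplitude of a chunk of 16-bit signed PCM bytes."""
    samples = array.array("h", chunk)  # "h" = signed 16-bit, native byte order
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def collect_until_silence(chunks, threshold=500, max_silent_chunks=15,
                          max_chunks=600):
    """Gather audio chunks until speech is followed by sustained silence.

    `chunks` is any iterable of raw 16-bit PCM byte strings. Collection stops
    after `max_silent_chunks` consecutive quiet chunks once speech has been
    heard, or after `max_chunks` chunks as a safety limit.
    """
    frames = []
    silent_run = 0
    heard_speech = False
    for chunk in chunks:
        frames.append(chunk)
        if rms(chunk) >= threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= max_silent_chunks:
                break
        if len(frames) >= max_chunks:
            break
    return frames
```

With PyAudio, `chunks` could be a generator that keeps yielding `stream.read(CHUNK)`. At 16 kHz with `CHUNK = 1024`, each chunk is 64 ms, so 15 quiet chunks is roughly one second of silence.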

Reposted from blog.csdn.net/qq_38861587/article/details/105211223