
Google updates Cloud Speech-to-Text service, cutting phone call transcription error rate by 54%

2018-04-10 16:47:48 Source: CTI Forum


The Cloud Speech-to-Text service has been revised to offer four models: command and search, phone call, video, and default. The enhanced phone call model reduces the transcription error rate by 54%, and the video model, built on technology similar to YouTube's automatic captioning, reduces it by 64%.
Following the release of the Cloud Text-to-Speech service last month, Google updated the Cloud Speech-to-Text service on Monday (4/9), adding new video and phone call transcription models as well as automatic punctuation. Compared with the original phone call model, the new enhanced phone call model (enhanced phone_call model) reduces the speech recognition error rate by 54%.
Google's Cloud Speech-to-Text, first released in 2016, was formerly known as the Cloud Speech API. It recognizes more than 120 languages, including Chinese. This week's update is the largest revision since the service launched. It offers four selectable models: command and search (command_and_search), phone call (phone_call), video, and default. The phone call model is suited to telephone audio recorded at an 8 kHz sampling rate, while the default model is intended for higher-quality, longer audio sampled at 16 kHz or above. Users can choose the model that fits their usage scenario so that speech is converted to text more accurately.
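As a rough illustration of the model choice described above, the sketch below maps a usage scenario and a sampling rate to one of the four model names. The helper `choose_model`, its scenario labels, and the fallback rule are hypothetical and invented here for illustration; only the model names and the 8 kHz and 16 kHz guidance come from the article.

```python
# Hypothetical helper, not part of any Google library.
# Model names are the ones listed in the article; scenario labels are assumptions.

def choose_model(scenario: str, sample_rate_hz: int) -> str:
    """Return a model name for a usage scenario and audio sampling rate."""
    if scenario == "voice_command":
        return "command_and_search"
    if scenario == "phone_call":
        return "phone_call"
    if scenario == "video":
        return "video"
    # Unknown scenario: fall back on the sampling rate.
    # Assumption: 8 kHz or lower suggests telephone-quality audio.
    if sample_rate_hz <= 8000:
        return "phone_call"
    return "default"


print(choose_model("meeting_recording", 16000))  # default
```

The scenario takes priority over the sampling rate here; a real application might weigh the two differently.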
Dan Aharon, product manager for Google Cloud AI, said that many cloud providers use customer requests to improve their services. For data and privacy reasons, Google does not intend to take that approach. Instead, it has launched what it calls the industry's first "opt-in program", which lets customers voluntarily share their data with Google for logging and analysis. The first product of the program is the enhanced phone call transcription model, which reduced Cloud Speech-to-Text's phone call transcription errors by 54%.
The new video model can be used to convert speech in videos to text, or to transcribe audio in which several people are speaking. The machine learning technology behind this model is similar to that used for YouTube's automatic captions. Compared with the original default model, the video model reduces the transcription error rate by 64%.
Both the enhanced phone call model and the video model currently support English only and are expected to expand to other languages soon.
In addition to the new phone call and video models, the new version of Cloud Speech-to-Text can automatically add punctuation after speech is converted to text, making the transcript easier to read. This feature is still in testing and can suggest punctuation such as commas, periods, and question marks.
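For readers who want to see how a model choice and automatic punctuation might be expressed in a request, here is a minimal sketch that only builds a JSON body and does not call any service. The field names (`languageCode`, `model`, `enableAutomaticPunctuation`, `audio.uri`) follow the Cloud Speech-to-Text REST API as commonly documented and should be checked against the current reference; the function name and the bucket path are placeholders invented for this example.

```python
import json


def build_recognize_request(gcs_uri: str, model: str = "default",
                            punctuation: bool = True) -> dict:
    """Build a JSON-serializable request body; nothing is sent anywhere."""
    return {
        "config": {
            "languageCode": "en-US",  # the new models support English only
            "model": model,           # e.g. "phone_call" or "video"
            "enableAutomaticPunctuation": punctuation,
        },
        "audio": {"uri": gcs_uri},    # placeholder Cloud Storage path
    }


body = build_recognize_request("gs://example-bucket/call.wav", model="phone_call")
print(json.dumps(body, indent=2))
```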
The video model costs US$0.012 per 15 seconds, while the other models cost US$0.006 per 15 seconds. To promote the new video model, Google is offering it at a discounted price of US$0.006 per 15 seconds through May 31 this year.
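The pricing can be turned into a quick estimate. The sketch below uses the per-15-second prices quoted in the article; the function name is hypothetical, and rounding partial increments up to a full 15 seconds is an assumption made here, not something the article states.

```python
import math

# Prices from the article, in thousandths of a US dollar per 15-second increment.
RATE_MILLI_USD = {
    "video": 12,
    "default": 6,
    "phone_call": 6,
    "command_and_search": 6,
}
PROMO_VIDEO_RATE_MILLI_USD = 6  # promotional video price through May 31, 2018


def estimate_cost_usd(model: str, seconds: float, promo: bool = False) -> float:
    """Estimate the transcription cost in US dollars."""
    # Assumption: a partial increment is billed as a full 15 seconds.
    increments = math.ceil(seconds / 15)
    rate = RATE_MILLI_USD[model]
    if promo and model == "video":
        rate = PROMO_VIDEO_RATE_MILLI_USD
    return increments * rate / 1000


print(estimate_cost_usd("video", 3600))              # 2.88
print(estimate_cost_usd("video", 3600, promo=True))  # 1.44
```

Under these assumptions, one hour of video audio is 240 increments, so it comes to 240 × $0.012 = $2.88 at the list price and $1.44 at the promotional price.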
