分享科技與遊戲 by HKGoldenMr.A: Speech Recognition and Voice Text Input with the HTML5 Web Speech Recognition API

2018-12-31

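Recently I had to transcribe recorded interviews into text, which was a real headache and very time-consuming.
I found that the HTML5 Web Speech Recognition API can convert speech to text, so I can listen to a recording and repeat its content aloud to have it transcribed, which speeds up the work.
Speech recognition and voice text input are not new technologies. In the past, voice input required installing special software, which usually supported only Windows.
Today, Android and iOS offer voice input from Google and Apple respectively,
but it stops automatically after a while, which is very inconvenient when you need to dictate for a long time.
The HTML5 Web Speech Recognition API, on the other hand, is a web technology specification, so users can build the features they need themselves.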

demo

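To use the HTML5 Web Speech Recognition API, you need to create a SpeechRecognition object, but SpeechRecognition is still at the draft stage.
In practice, for now (as of this article's publication date) it can only be created in Google Chrome, but I still keep a cross-browser way of creating it.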

var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
var speechRecognition = new SpeechRecognition();

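This makes sure SpeechRecognition can be created whether the browser exposes it under the standard name or the webkit-prefixed name.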
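Since new SpeechRecognition() fails when neither name exists, it can help to check first. The following is only a minimal sketch; the helper name getSpeechRecognitionConstructor is my own and not part of the API.

```javascript
// Hypothetical helper (not part of the API): returns the recognition
// constructor found on the given global object, or null if there is none.
function getSpeechRecognitionConstructor(globalObject){
    if (!globalObject){
        return null;
    }
    return globalObject.SpeechRecognition || globalObject.webkitSpeechRecognition || null;
}

// In a browser, pass window; outside a browser there is no window, so pass null.
var RecognitionConstructor = getSpeechRecognitionConstructor(
    typeof window !== "undefined" ? window : null
);
if (RecognitionConstructor === null){
    console.log("Web Speech Recognition API is not available here.");
}
```

Taking the global object as a parameter keeps the check easy to try with a stand-in object.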

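Some SpeechRecognition properties affect how accurate the recognition is.

SpeechRecognition.continuous sets whether continuous results are returned; the default is false.
When set to true, multiple results are returned; accuracy is higher, but it takes more time.
When set to false, only one result is returned; accuracy is lower, but it takes less time.
SpeechRecognition.grammars sets the SRGS (Speech Recognition Grammar Specification) grammars to use.
Unless you have a specific need, keep the default.
SpeechRecognition.interimResults sets whether interim results are returned; the default is false.
When set to true, you can see the recognized content in real time while recognition is in progress.

SpeechRecognition.lang sets the recognition language; it defaults to the lang of the HTML document or the user's system language.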

Language	Language code
Afrikaans	af-ZA
Amharic	am-ET
Azerbaijani	az-AZ
Bengali (Bangladesh)	bn-BD
Bengali (India)	bn-IN
Indonesian	id-ID
Malay	ms-MY
Catalan	ca-ES
Czech	cs-CZ
Danish	da-DK
German	de-DE
English (Australia)	en-AU
English (Canada)	en-CA
English (India)	en-IN
English (Kenya)	en-KE
English (Tanzania)	en-TZ
English (Ghana)	en-GH
English (New Zealand)	en-NZ
English (Nigeria)	en-NG
English (South Africa)	en-ZA
English (Philippines)	en-PH
English (United Kingdom)	en-GB
English (United States)	en-US
Spanish (Argentina)	es-AR
Spanish (Bolivia)	es-BO
Spanish (Chile)	es-CL
Spanish (Colombia)	es-CO
Spanish (Costa Rica)	es-CR
Spanish (Ecuador)	es-EC
Spanish (El Salvador)	es-SV
Spanish (Spain)	es-ES
Spanish (United States)	es-US
Spanish (Guatemala)	es-GT
Spanish (Honduras)	es-HN
Spanish (Mexico)	es-MX
Spanish (Nicaragua)	es-NI
Spanish (Panama)	es-PA
Spanish (Paraguay)	es-PY
Spanish (Peru)	es-PE
Spanish (Puerto Rico)	es-PR
Spanish (Dominican Republic)	es-DO
Spanish (Uruguay)	es-UY
Spanish (Venezuela)	es-VE
Basque	eu-ES
Filipino	fil-PH
French	fr-FR
Javanese	jv-ID
Galician	gl-ES
Gujarati	gu-IN
Croatian	hr-HR
Zulu	zu-ZA
Icelandic	is-IS
Italian (Italy)	it-IT
Italian (Switzerland)	it-CH
Kannada	kn-IN
Khmer	km-KH
Latvian	lv-LV
Lithuanian	lt-LT
Malayalam	ml-IN
Marathi	mr-IN
Hungarian	hu-HU
Lao	lo-LA
Dutch	nl-NL
Nepali	ne-NP
Norwegian	nb-NO
Polish	pl-PL
Portuguese (Brazil)	pt-BR
Portuguese (Portugal)	pt-PT
Romanian	ro-RO
Sinhala	si-LK
Slovenian	sl-SI
Sundanese	su-ID
Slovak	sk-SK
Finnish	fi-FI
Swedish	sv-SE
Swahili (Tanzania)	sw-TZ
Swahili (Kenya)	sw-KE
Georgian	ka-GE
Armenian	hy-AM
Tamil (India)	ta-IN
Tamil (Singapore)	ta-SG
Tamil (Sri Lanka)	ta-LK
Tamil (Malaysia)	ta-MY
Telugu	te-IN
Vietnamese	vi-VN
Turkish	tr-TR
Urdu (Pakistan)	ur-PK
Urdu (India)	ur-IN
Greek	el-GR
Bulgarian	bg-BG
Russian	ru-RU
Serbian	sr-RS
Ukrainian	uk-UA
Korean	ko-KR
Mandarin (China)	cmn-Hans-CN
Mandarin (Hong Kong)	cmn-Hans-HK
Mandarin (Taiwan)	cmn-Hant-TW
Cantonese (Hong Kong)	yue-Hant-HK
Japanese	ja-JP
Hindi	hi-IN
Thai	th-TH

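SpeechRecognition.maxAlternatives sets the maximum number of alternatives returned for each result, since recognition can produce several candidate interpretations; the default is 1.

With these basics covered, SpeechRecognition can be configured: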

speechRecognition.continuous = true;
speechRecognition.interimResults = true;
speechRecognition.lang = "yue-Hant-HK";
speechRecognition.maxAlternatives = 1;
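If the same settings are needed in more than one place, they can be wrapped in a small function. This is just a sketch under my own assumptions: applyRecognitionSettings is a made-up name, and the fallback values are simply the ones used in this article, not defaults defined by the API.

```javascript
// Hypothetical helper: copies recognition settings onto an object and falls
// back to the values used in this article when an option is left out.
function applyRecognitionSettings(recognition, options){
    var settings = options || {};
    recognition.continuous = settings.continuous !== undefined ? settings.continuous : true;
    recognition.interimResults = settings.interimResults !== undefined ? settings.interimResults : true;
    recognition.lang = settings.lang || "yue-Hant-HK";
    recognition.maxAlternatives = settings.maxAlternatives || 1;
    return recognition;
}

// A plain object stands in for a real SpeechRecognition instance here.
var demoSettings = applyRecognitionSettings({}, { lang: "en-US" });
console.log(demoSettings.lang, demoSettings.continuous);
```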

Once SpeechRecognition is configured, you can use

SpeechRecognition.start()
Starts speech recognition
SpeechRecognition.stop()
Stops speech recognition
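As far as I read the draft specification, calling start() again on an object that has already started throws an InvalidStateError. A small wrapper can guard against that. The sketch below is an assumption-laden example: createRecognitionController and its methods are names I made up, and it only relies on the start(), stop() and addEventListener() calls described in this article.

```javascript
// Hypothetical wrapper: remembers whether recognition was started so that
// start() is never called twice in a row on the same object.
function createRecognitionController(recognition){
    var active = false;
    // The end event marks the point where start() may be called again.
    recognition.addEventListener("end", function(){
        active = false;
    });
    return {
        isActive: function(){
            return active;
        },
        start: function(){
            if (active){
                return false; // already started, nothing to do
            }
            active = true;
            try {
                recognition.start();
            } catch (ex){
                active = false;
                throw ex;
            }
            return true;
        },
        stop: function(){
            if (!active){
                return false; // not started, nothing to stop
            }
            recognition.stop();
            return true;
        }
    };
}

// In a browser: var controller = createRecognitionController(speechRecognition);
```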

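After starting speech recognition, you also need to set up event handlers to obtain the recognized content.

SpeechRecognition start event
What to do when speech recognition starts
SpeechRecognition result event
What to do while speech recognition is in progress; a SpeechRecognitionEvent is passed in
When SpeechRecognition receives no valid speech for 8 consecutive seconds, it stops automatically
SpeechRecognition end event
What to do when speech recognition stops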

When setting an event handler:
When it is set as a property, the event name needs the on prefix, that is

speechRecognition.onstart = function(){
// do something
};

When it is set with an event listener, the prefix is not needed

speechRecognition.addEventListener("start", function(){
// do something
});

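The key one is the result event handler.
Once SpeechRecognition has started, you keep receiving SpeechRecognitionEvent objects through result.
SpeechRecognitionEvent.results is a SpeechRecognitionResultList, which holds the recognized data.
SpeechRecognitionResultList is an array-like list of SpeechRecognitionResult objects.
SpeechRecognitionResult.isFinal tells whether the current content is the final result.
Each SpeechRecognitionResult holds SpeechRecognitionAlternative objects, the alternative interpretations produced during recognition.
SpeechRecognitionAlternative.transcript is the recognized text.
You can use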

speechRecognition.addEventListener("result", function(event){
    // Use indexed loops: for...in would also visit "length" and "item"
    for (var i = 0; i < event.results.length; i++){
        var result = event.results[i];
        for (var j = 0; j < result.length; j++){
            if (result.isFinal){
                // do something with result[j].transcript when it is final
            } else {
                // do something with result[j].transcript when it is not final
            }
        }
    }
});
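To show how the result structure fits together, here is a small sketch that joins the first alternative of every result into a final string and an interim string. collectTranscripts is a name I made up, and the sample data uses plain arrays with an isFinal property as stand-ins for the real SpeechRecognitionResult objects.

```javascript
// Hypothetical helper: walks an array-like list of results and joins the
// first alternative of each into final text and interim text.
function collectTranscripts(results){
    var finalText = "";
    var interimText = "";
    for (var i = 0; i < results.length; i++){
        var result = results[i];
        if (!result || result.length === 0){
            continue; // skip results without any alternative
        }
        var transcript = result[0].transcript;
        if (result.isFinal){
            finalText += transcript;
        } else {
            interimText += transcript;
        }
    }
    return { finalText: finalText, interimText: interimText };
}

// Stand-in data shaped like event.results.
var firstResult = [{ transcript: "hello " }];
firstResult.isFinal = true;
var secondResult = [{ transcript: "wor" }];
secondResult.isFinal = false;
console.log(collectTranscripts([firstResult, secondResult]));
```

Inside a result handler, the same function could be called as collectTranscripts(event.results).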

My test
You are welcome to use this test as well

JavaScript

function convertToPunctuation(string){
    var punctuations = {
        "斷行號": "\n",
        "逗號": "，",
        "句號": "。",
        "頓號": "、",
        "冒號": "：",
        "分號": "；",
        "問號": "？",
        "感嘆號": "！",
        "破折號": "——",
        "省略號": "……",
        "開括號": "（",
        "關括號": "）",
        "開引號": "「",
        "關引號": "」",
        "開雙引號": "『",
        "關雙引號": "』",
        "開書名號": "《",
        "關書名號": "》"
        // Add more spoken-word-to-symbol mappings here
    };
    for (var i in punctuations){
        string = string.split(i).join(punctuations[i]);
    }
    return string;
}
 
var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
var speechRecognition; // created on load when the API is available
 
window.addEventListener("load", function(){
    if (SpeechRecognition){
        speechRecognition = new SpeechRecognition();
        speechRecognition.continuous = true;
        speechRecognition.interimResults = true;
        speechRecognition.addEventListener("start", function(){
            console.log(new Date());
            document.getElementById("toggle").value = "Stop Speech Recognition";
        });
        speechRecognition.addEventListener("result", function(event){
            var bufferContainer = document.getElementById("bufferContainer");
            var resultContainer = document.getElementById("resultContainer");
            var resultList = event.results;
            for (var i = 0; i < resultList.length; i++){
                var result = resultList.item(i);
                try{
                    var alternative = result.item(0);
                    var text = convertToPunctuation(alternative.transcript);
                    bufferContainer.value = resultContainer.value + text;
                } catch (ex){
                    console.log(ex);
                }
                if (result.isFinal){
                    this.stop();
                    break;
                }
            }
        });
        speechRecognition.addEventListener("end", function(){
            var bufferContainer = document.getElementById("bufferContainer");
            var resultContainer = document.getElementById("resultContainer");
            resultContainer.value = bufferContainer.value;
            var toggle = document.getElementById("toggle");
            var autoResume = document.getElementById("autoResume");
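            if (toggle.value == "Stop Speech Recognition" && autoResume.checked){
                this.start();
            } else {
                // Not resuming: make the button show the stopped state
                toggle.value = "Start Speech Recognition";
            }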
        });
    }
});
 
function toggleSpeechRecognition(){
    if (SpeechRecognition){
        var toggle = document.getElementById("toggle");
        if (toggle.value == "Stop Speech Recognition"){
            toggle.value = "Start Speech Recognition";
            speechRecognition.stop();
        } else {
            speechRecognition.lang = document.getElementById("language").value;
            speechRecognition.start();
        }
    } else {
        window.alert("This browser does not support Web Speech Recognition API.");
    }
}
 
function selectAllText(element){
    element.select();
}
 
function clearAllText(element){
    element.value = "";
}
 
function clearContainer(message){
    if (window.confirm(message)){
        clearAllText(document.getElementById('bufferContainer'));
        clearAllText(document.getElementById('resultContainer'));
    }
}
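One thing to watch when adding more words to convertToPunctuation: if one spoken word is contained in another, the shorter one could be replaced first and break the longer one. A possible way around this is to replace the longest words first. The sketch below is only an illustration; replaceSpokenWords is a made-up name and the "引號" entry is a made-up mapping added purely to show the overlap.

```javascript
// Hypothetical variant of convertToPunctuation: replaces longer spoken words
// before shorter ones so that overlapping entries do not clash.
function replaceSpokenWords(text, mappings){
    var words = Object.keys(mappings).sort(function(a, b){
        return b.length - a.length; // longest first
    });
    for (var i = 0; i < words.length; i++){
        text = text.split(words[i]).join(mappings[words[i]]);
    }
    return text;
}

// "引號" is an invented entry that overlaps with "開引號".
var sampleMappings = { "引號": "\"", "開引號": "「", "逗號": "，" };
console.log(replaceSpokenWords("你好逗號開引號世界", sampleMappings));
```

With this ordering, "開引號" is handled before "引號", so the sample text becomes 你好，「世界.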

HTML

<select id="language">
    <option value="yue-Hant-HK">Cantonese (Hong Kong)</option>
    <option value="en-US">English (US)</option>
<!-- Add more language names and language codes here -->
</select>
<label>Continuously<input id="autoResume" type="checkbox" checked="checked"/></label>
<input id="toggle" type="button" value="Start Speech Recognition" onclick="toggleSpeechRecognition();"/>
<input id="clear" type="button" value="Clear Contents" onclick="clearContainer('Confirm to clear Contents?');"/>
<table width="100%">
    <colgroup>
        <col width="50%"/>
        <col width="50%"/>
    </colgroup>
    <thead>
        <tr>
            <th><input type="button" value="Select all Temporary Contents" onclick="selectAllText(document.getElementById('bufferContainer'));"/></th>
            <th><input type="button" value="Select all Output Contents" onclick="selectAllText(document.getElementById('resultContainer'));"/></th>
        </tr>
    </thead>
    <tbody>
        <tr valign="top">
            <td><textarea id="bufferContainer" rows="1" cols="1" readonly="readonly" style="border: 1px solid #000000; width: 100%; height: 500px; resize: vertical;"></textarea></td>
            <td><textarea id="resultContainer" rows="1" cols="1" style="border: 1px solid #000000; width: 100%; height: 500px; resize: vertical;"></textarea></td>
        </tr>
    </tbody>
</table>
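Instead of typing every option element by hand, the options for the language select could be generated from a list of language codes and names. This is a rough sketch; buildLanguageOptions and escapeHtml are names I made up, and the sample list only contains a few entries from the language table.

```javascript
// Hypothetical helper: escapes characters that have a special meaning in HTML.
function escapeHtml(text){
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Hypothetical helper: turns { languageCode: languageName } into option markup.
function buildLanguageOptions(languages){
    var html = "";
    for (var code in languages){
        if (Object.prototype.hasOwnProperty.call(languages, code)){
            html += '<option value="' + escapeHtml(code) + '">' +
                escapeHtml(languages[code]) + '</option>\n';
        }
    }
    return html;
}

var demoOptions = buildLanguageOptions({
    "yue-Hant-HK": "Cantonese (Hong Kong)",
    "en-US": "English (US)",
    "ja-JP": "Japanese"
});
console.log(demoOptions);
// In a browser: document.getElementById("language").innerHTML = demoOptions;
```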


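For now, the Web Speech Recognition API has to run in Google Chrome.
Starting from version 67, Google Chrome only lets sites served over HTTPS access hardware devices,
so a site without HTTPS cannot use it.
A workaround is to run a web server on the local machine (localhost) or to use the file:// protocol, which gets around that restriction.
However, file:// pages cannot be added to the list of allowed sites, which makes repeated testing more tedious.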
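A page can also check for itself whether it is likely to be allowed near the microphone before starting recognition. The sketch below is an assumption on my part rather than anything the API requires: canRequestMicrophone is a made-up name, it relies on the browser's isSecureContext property when present, and the fallback simply mirrors the HTTPS, localhost and file:// cases mentioned here.

```javascript
// Hypothetical check: guesses whether the page runs in a context where the
// browser is willing to ask for microphone access.
function canRequestMicrophone(win){
    if (!win){
        return false;
    }
    // Modern browsers report this directly.
    if (typeof win.isSecureContext === "boolean"){
        return win.isSecureContext;
    }
    // Fallback that mirrors the cases described in this article.
    var protocol = win.location ? win.location.protocol : "";
    var hostname = win.location ? win.location.hostname : "";
    return protocol === "https:" || protocol === "file:" ||
        hostname === "localhost" || hostname === "127.0.0.1";
}

// In a browser, pass window; outside a browser there is no window.
console.log(canRequestMicrophone(typeof window !== "undefined" ? window : null));
```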

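The first time a site uses the Web Speech Recognition API, a permission prompt appears, and you need to allow the site to use your computer's microphone.

Whether you block or allow it, an icon for hardware access is shown at the right of the address bar.
To change the site's permission, click the icon and adjust the setting.

To change the permission for other sites, enter chrome://settings/content/microphone in the address bar and adjust the settings there.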

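Besides writing your own program, the simplest and most convenient option is Google Docs, provided by Google, which makes it easy to use the Web Speech Recognition API.

In Google Docs, open the Tools menu and choose Voice typing.

Once voice typing is turned on, the voice typing icon is displayed.

Choose the appropriate language.

After you click the voice typing icon, you will likewise be asked to allow use of the microphone.

Google Docs can then take voice input through your computer's microphone.
While speech recognition is in progress, the tab shows a red circle icon.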

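To keep voice input running without interruption, my test restarts the Web Speech Recognition API automatically whenever it stops on its own.
On Linux, I found that Chromium could not connect to the microphone, while Google Chrome could.
However, Google Chrome for Linux has only been available as 64-bit since version 49.
If your computer is not a 64-bit machine, you cannot use the Web Speech Recognition API for now.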

During testing, I found that when the Web Speech Recognition API runs in Google Chrome on Android,
result.isFinal is sometimes undefined, or is always true.
As a result, interim results cannot be shown interactively in Google Chrome on Android, although the API can still be used.

Also, Chrome for iOS is not the real Google Chrome.
It is built on UIWebView (version 47 or earlier) or WKWebView (version 48 or later), so it cannot use the Web Speech Recognition API.

For details on the Web Speech Recognition API, see https://w3c.github.io/speech-api/
For details on Chrome for iOS, see https://blog.chromium.org/2016/01/a-faster-more-stable-chrome-on-ios.html
For details on UIWebView, see https://developer.apple.com/documentation/uikit/uiwebview
For details on WKWebView, see https://developer.apple.com/documentation/webkit/wkwebview