TWI470387B - Wireless voice control system

Info

Publication number: TWI470387B
Application number: TW101136364A
Authority: TW
Original assignee: Nat University Of Kaohsiung
Priority date: 2012-10-02
Filing date: 2012-10-02
Publication date: 2015-01-21
Also published as: TW201415182A

Description

Wireless voice control system

The present invention relates to a control system, and more particularly to a wireless voice control system.

As technology advances, users depend ever more heavily on electronic products and interact with them ever more frequently, which has driven the rapid development of human-machine interfaces.

Human-machine interaction technologies fall roughly into two types: contact and non-contact. Contact technologies include the widely known touch control; non-contact technologies include speech recognition and gesture recognition. Both types are human-centered and replace the traditional keyboard-and-mouse way of communicating with a machine.

Speech, however, has always been the primary means of communication between people, and from the standpoint of human-machine interaction it is also the most convenient and practical input method. Speech recognition has evolved over some 20 to 30 years; the field is broad and its research methods are diverse.

In order of development, the earliest speech recognition technique is dynamic time warping (DTW), which uses a dynamic programming algorithm to compare the input speech against sample speech and find the closest match. Its drawbacks are a heavy computational load and slow recognition. Later, artificial neural networks (ANN) were applied to speech recognition; because the network architecture cannot be changed arbitrarily after training, the user's speaking speed affects the recognition rate. A solution to these shortcomings of conventional speech recognition techniques is therefore needed.
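To make the cost of DTW concrete, the following is a minimal sketch of the classic dynamic-programming formulation on one-dimensional sequences. The function names, the absolute-difference local cost, and the template dictionary are illustrative assumptions, not taken from the patent. Every input is compared against every template over an n-by-m grid, which is the heavy computation mentioned above.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # step both
    return cost[n][m]


def closest_template(query, templates):
    """Return the name of the template with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(query, templates[name]))
```

A slowly spoken version of a template, such as [1, 1, 2, 2, 3, 3] against [1, 2, 3], aligns at zero cost, which is why DTW tolerates differences in speaking speed at the price of computation.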

Accordingly, an object of the present invention is to provide a wireless voice control system.

The wireless voice control system of the present invention thus comprises a voice control device and a passive control device.

The voice control device includes an input module, a voice recognition module, and a first communication module. The input module allows a user to perform voice input so as to generate a corresponding speech to be recognized. The voice recognition module is coupled to the input module and recognizes a voice keyword from the speech to be recognized so as to generate a corresponding recognition result. The first communication module is coupled to the voice recognition module and transmits the recognition result wirelessly.

The passive control device includes a housing, a second communication module, a processing module, a mobile module, and a sensing module. The second communication module is disposed in the housing and receives the recognition result transmitted by the first communication module. The processing module is disposed in the housing, is coupled to the second communication module, and converts the recognition result into a first control command. The mobile module is disposed in the housing, is coupled to the processing module, and drives the housing to move according to the first control command. The sensing module is disposed in the housing, is coupled to the processing module, and senses whether the housing collides with a surrounding obstacle. When such a collision is sensed, the sensing module triggers the processing module to generate a second control command, and the mobile module then drives the housing to turn or stop according to the second control command.

The foregoing and other technical contents, features, and effects of the present invention will be clearly presented in the following detailed description of a preferred embodiment with reference to the drawings.

Referring to FIG. 1, a preferred embodiment of the wireless voice control system of the present invention includes a voice control device 1 and a passive control device 2. In this preferred embodiment, the voice control device 1 controls the passive control device 2 by wireless remote control, and the passive control device 2 is a cleaning robot used in a smart home. It is not limited thereto, however, and may also be a remote-controlled model such as a remote-controlled car, aircraft, or boat.

The voice control device 1 includes an input module 11, a voice recognition module 12, a first communication module 13, and a display module 14.

The input module 11 allows a user to perform voice input so as to generate a corresponding speech to be recognized. In this preferred embodiment, the input module 11 may be, for example, a microphone that converts the analog speech signal input by the user into a digital speech signal, which then undergoes voice endpoint detection to produce the speech to be recognized. The endpoint detection uses an energy-based method, whose calculation is well known to those skilled in the art and is therefore not detailed here.
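The patent leaves the energy-based endpoint detection to the reader. One common form, shown here only as a sketch, computes the short-time energy of fixed-length frames and keeps the span between the first and last frame whose energy exceeds a threshold. The frame length of 256 samples and the threshold of 10% of the peak frame energy are assumptions chosen for illustration.

```python
def detect_endpoints(samples, frame_len=256, threshold_ratio=0.1):
    """Return (start, end) sample indices of the active speech span, or None.

    threshold_ratio is assumed to lie strictly between 0 and 1.
    """
    n_frames = len(samples) // frame_len
    # Short-time energy: sum of squared samples in each non-overlapping frame.
    energies = [
        sum(s * s for s in samples[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ]
    if not energies or max(energies) == 0:
        return None  # too short, or pure silence
    threshold = threshold_ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

A relative threshold adapts to the recording level; a deployed detector would more likely estimate the noise floor from leading silence, which this sketch does not attempt.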

The voice recognition module 12 is coupled to the input module 11 and recognizes a voice keyword from the speech to be recognized so as to generate a corresponding recognition result. In this preferred embodiment, the voice recognition module 12 has a noise cancellation unit 121, a speech pre-processing unit 122, and a feature calculation unit 123.

In this preferred embodiment, the voice keyword causes the passive control device 2 to perform a corresponding action, such as "forward", "backward", "stop", "turn left", or "turn right". The keywords are not limited to these; the vocabulary can grow as the actions that the passive control device 2 can perform increase in the future. The display module 14 displays the recognition result corresponding to the voice keyword.

The noise cancellation unit 121 uses empirical mode decomposition (EMD) to decompose the speech to be recognized, which contains a noise portion and a user speech portion, into a decomposed speech to be recognized that comprises multiple intrinsic mode functions (IMFs). The detailed implementation is well known to those skilled in the art and is therefore not described here.

It is worth noting that in this preferred embodiment a real-coded genetic algorithm (GA) computes the optimal combination parameters for the first five intrinsic mode functions (IMF1 to IMF5). IMF1 to IMF5 are weighted by these parameters and summed to produce a recombined speech signal, thereby separating the excess noise from the speech to be recognized.

The parameters of the real-coded genetic algorithm are set as follows: 1. each generation has 16 chromosomes; 2. each chromosome has 5 real-valued genes, corresponding to IMF1 to IMF5; 3. the chromosome survival rate is 0.5 and the mutation rate is 0.05; and 4. mating is by random pairing, and evolution runs for 500 generations.

Likewise, the detailed implementation is well known to those skilled in the art and is therefore not described here.
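The genetic algorithm settings listed above (16 chromosomes, 5 real-valued genes, survival rate 0.5, mutation rate 0.05, random pairing, 500 generations) can be sketched as follows. The patent does not state the fitness function, the crossover operator, or the gene range, so the ones used here are assumptions: fitness is the negative mean squared error against a clean reference signal, crossover is an arithmetic blend, and genes lie in [0, 1]. The IMFs are taken as given inputs; the EMD step itself is not shown.

```python
import random


def recombine(imfs, weights):
    """Weighted sum of the IMFs, sample by sample."""
    return [sum(w * imf[t] for w, imf in zip(weights, imfs))
            for t in range(len(imfs[0]))]


def fitness(weights, imfs, reference):
    """Assumed fitness: negative mean squared error against a clean reference."""
    out = recombine(imfs, weights)
    return -sum((o - r) ** 2 for o, r in zip(out, reference)) / len(reference)


def evolve_weights(imfs, reference, pop_size=16, survival_rate=0.5,
                   mutation_rate=0.05, generations=500, seed=0):
    """Real-coded GA returning one weight per IMF."""
    rng = random.Random(seed)
    n_genes = len(imfs)  # 5 genes for IMF1 to IMF5
    population = [[rng.random() for _ in range(n_genes)]
                  for _ in range(pop_size)]
    n_survivors = int(pop_size * survival_rate)
    for _ in range(generations):
        # Keep the fittest half unchanged (survival rate 0.5).
        population.sort(key=lambda c: fitness(c, imfs, reference), reverse=True)
        survivors = population[:n_survivors]
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)  # random pairing
            mix = rng.random()
            child = [mix * m + (1 - mix) * f for m, f in zip(mother, father)]
            # Each gene mutates to a fresh random value with the given rate.
            child = [rng.random() if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = survivors + children
    return max(population, key=lambda c: fitness(c, imfs, reference))
```

Because the survivors are carried over unchanged, the best fitness never decreases from one generation to the next. Calling `recombine(imfs, evolve_weights(imfs, reference))` then yields the recombined signal under these assumptions.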

The speech pre-processing unit 122 applies speech pre-processing to the recombined speech signal, for example pre-emphasis, Hamming windowing, and the fast Fourier transform (FFT). It then uses Mel-frequency cepstral coefficients (MFCCs) to extract features from the multiple frames of the recombined speech signal, producing a feature vector group for each frame for use in the subsequent speech recognition process.
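The pre-processing chain above can be sketched with the Python standard library alone. The sample rate, frame length, hop size, filter count, coefficient count, and pre-emphasis factor of 0.97 are all assumptions, since the patent gives none of them. A naive DFT stands in for the FFT to keep the code short; it produces the same spectrum values, only more slowly.

```python
import cmath
import math


def pre_emphasis(x, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1]."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming(n_points):
    """Hamming window coefficients."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def power_spectrum(frame):
    """Power spectrum of the non-negative frequencies via a naive DFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2 / n
            for k in range(n // 2 + 1)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    hz = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [f * n_fft / sample_rate for f in hz]  # fractional bin positions
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising edge
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_frame(frame, bank, n_coeffs):
    """MFCCs of one frame: window, spectrum, log mel energies, DCT-II."""
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    spec = power_spectrum(windowed)
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    n = len(log_e)
    return [sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / n)
                for j in range(n))
            for i in range(n_coeffs)]


def mfcc(signal, sample_rate=8000, frame_len=64, hop=32,
         n_filters=12, n_coeffs=8):
    """One feature vector per overlapping frame of the signal."""
    x = pre_emphasis(signal)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    return [mfcc_frame(x[s:s + frame_len], bank, n_coeffs)
            for s in range(0, len(x) - frame_len + 1, hop)]
```

Each inner list returned by `mfcc` plays the role of the feature vector group of one frame; practical systems usually use longer frames (on the order of 20 to 30 ms) than this small example.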

Incidentally, in other preferred embodiments of the present invention, for example those suited to a quiet, noise-free environment, the voice recognition module 12 may have only the speech pre-processing unit 122 and the feature calculation unit 123. In that case the speech pre-processing unit 122 applies the same pre-processing (for example pre-emphasis, Hamming windowing, and the fast Fourier transform) directly to the speech to be recognized, then uses Mel-frequency cepstral coefficients to extract features from the multiple frames of the speech to be recognized, producing a feature vector group for each frame. The subsequent speech recognition process is the same as that of the voice recognition module 12 having the noise cancellation unit 121. The feature calculation unit 123 uses discrete hidden Markov models (DHMM) as the statistical model to recognize the voice keyword from the feature vector groups.

In addition, the discrete hidden Markov model must be trained before the speech recognition process is performed. In this preferred embodiment, 8 men and 2 women with differing speaking speeds each recorded the voice keywords "forward", "backward", "stop", "turn left", and "turn right" 10 times, and the resulting 500 noise-free speech files serve as the training corpus.

The training steps resemble the speech pre-processing performed by the speech pre-processing unit 122. First, the training corpus, after endpoint detection, undergoes pre-emphasis, Hamming windowing, the fast Fourier transform, and so on. Next, Mel-frequency cepstral coefficients are used to obtain the feature vector group of each frame in the training corpus. Then fuzzy vector quantization (FVQ) combined with the Viterbi algorithm is used to find the optimal observation sequence from the feature vector groups, and that sequence is fed into the discrete hidden Markov model for training until the model parameters converge, which completes the speech training process.
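The patent does not spell out its fuzzy vector quantization. A common formulation, used here as an assumption, borrows the membership function of fuzzy c-means: each feature vector belongs to every codeword to a degree that falls off with distance, and the memberships sum to 1. The codebook, the fuzziness exponent of 2, and the function names are illustrative.

```python
import math


def fuzzy_memberships(vector, codebook, fuzziness=2.0):
    """Degree of membership of `vector` in each codeword (sums to 1)."""
    dists = [math.dist(vector, c) for c in codebook]
    if 0.0 in dists:
        # The vector coincides with one or more codewords: share it among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (fuzziness - 1.0)
    inv = [d ** -power for d in dists]  # closer codewords get larger weight
    total = sum(inv)
    return [v / total for v in inv]


def quantize_sequence(vectors, codebook):
    """Hard symbol per frame: index of the codeword with highest membership."""
    symbols = []
    for v in vectors:
        u = fuzzy_memberships(v, codebook)
        symbols.append(u.index(max(u)))
    return symbols
```

Taking the highest-membership index gives a plain discrete observation sequence; fuzzy variants of the DHMM instead weight the emission probabilities by the memberships, which this sketch leaves out.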

After the discrete hidden Markov model has been trained, in this preferred embodiment the feature calculation unit 123 performs fuzzy vector quantization on the feature vector groups and feeds the quantized feature vector groups into the trained discrete hidden Markov model to recognize the voice keyword.
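Keyword recognition with discrete HMMs is typically done by scoring the quantized observation sequence against one trained model per keyword and picking the best-scoring model. The sketch below assumes that arrangement and uses the Viterbi best-path log-probability as the score; the patent names the Viterbi algorithm for training but does not state the scoring rule used at recognition time. The `models` dictionary layout (start, transition, and emission probabilities per keyword) is hypothetical.

```python
import math


def _log(p):
    """Natural log that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_log_score(obs, start, trans, emit):
    """Log-probability of the best state path for a symbol sequence.

    start[s]     initial probability of state s
    trans[p][s]  probability of moving from state p to state s
    emit[s][k]   probability that state s emits symbol k
    """
    n_states = len(start)
    score = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n_states)]
    for symbol in obs[1:]:
        score = [
            max(score[p] + _log(trans[p][s]) for p in range(n_states))
            + _log(emit[s][symbol])
            for s in range(n_states)
        ]
    return max(score)


def recognize(obs, models):
    """Return the keyword whose model gives the highest Viterbi score."""
    return max(models, key=lambda word: viterbi_log_score(obs, *models[word]))
```

Working in the log domain avoids numerical underflow on long utterances, at the cost of handling zero probabilities explicitly.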

Accordingly, different background noises (a television, an air conditioner, a kitchen, human voices, and so on) were added, and the same 8 men and 2 women with differing speaking speeds each spoke every voice keyword 10 times in turn into the microphone of the input module 11. This gives 10 × 10, a total of 100 sets of test data, for each voice keyword. The averaged speech recognition rates and their comparison are summarized in Tables 1, 2, and 3. Table 1 gives the experimental results when the voice recognition module 12 does not have the noise cancellation unit 121; Table 2 gives the results when the voice recognition module 12 has the noise cancellation unit 121; and Table 3 gives the improvement in speech recognition rate of Table 2 over Table 1.

Table 3 shows clearly that the configuration with the noise cancellation unit 121 (Table 2) achieves a markedly higher speech recognition rate than the configuration without noise cancellation (Table 1). In quieter backgrounds, such as the air-conditioner environment, the improvement of Table 2 over Table 1 is less pronounced.

The first communication module 13 is coupled to the voice recognition module 12 and transmits the recognition result wirelessly.

In this preferred embodiment, the first communication module 13 is a TWS-BS wireless transmitter module with a frequency of 315 MHz, an operating voltage of 3 V to 12 V, and a transmission distance of 80 m to 120 m, and it transmits the recognition result using amplitude-shift keying (ASK). The TWS-BS wireless transmitter module may also be paired with an HT-12E encoder IC (not shown) so that the recognition result is encoded before it is transmitted.

Referring to FIG. 1 and FIG. 2, the passive control device 2 includes a housing 20, a second communication module 21, a processing module 22, a mobile module 23, a sensing module 24, and a charging module 25.

The second communication module 21 is disposed in the housing 20 and receives the recognition result transmitted by the first communication module 13.

In this preferred embodiment, the second communication module 21 is an RWS-371 wireless receiver module with a receiving sensitivity of -117 dBm and an operating voltage of 3.5 V to 5.5 V, and it can receive a recognition result transmitted by amplitude-shift keying. The RWS-371 wireless receiver module may also be paired with an HT-12D decoder IC (not shown) so that the encoded recognition result is decoded after it is received.
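The encode-then-decode pairing described above can be illustrated in software. The HT-12E and HT-12D pair works with a 12-bit word made of 8 address bits and 4 data bits, and the decoder accepts data only when the address matches its own. The sketch below models just that logic; the keyword-to-code table and the way the 12 bits are packed into an integer are hypothetical, and the actual serial waveform on the RF link is not modeled.

```python
# Hypothetical 4-bit data codes for the five keywords named in the description.
KEYWORD_CODES = {
    "forward": 0b0001,
    "backward": 0b0010,
    "stop": 0b0011,
    "turn left": 0b0100,
    "turn right": 0b0101,
}


def encode_word(address, keyword):
    """Pack an 8-bit address and the keyword's 4-bit code into 12 bits."""
    if not 0 <= address < 256:
        raise ValueError("address must fit in 8 bits")
    return (address << 4) | KEYWORD_CODES[keyword]


def decode_word(word, expected_address):
    """Return the keyword, or None if the address or data code is unknown."""
    address, data = word >> 4, word & 0xF
    if address != expected_address:
        return None  # word was addressed to a different receiver
    for keyword, code in KEYWORD_CODES.items():
        if code == data:
            return keyword
    return None
```

With only 4 data bits, at most 16 distinct codes are available per address, which bounds how far the keyword vocabulary could grow under this particular assumed mapping.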

The processing module 22 is disposed in the housing 20, is coupled to the second communication module 21, and converts the recognition result into a first control command.

In this preferred embodiment, the processing module 22 is a microcontroller of the AT89S5X series manufactured by ATMEL, for example the AT89S51, but it is not limited thereto.

The mobile module 23 is disposed in the housing 20, is coupled to the processing module 22, and drives the housing 20 to move according to the first control command.

In this preferred embodiment, the mobile module 23 has one motor 231, but it is not limited thereto. The number of motors 231 may be varied according to the actual operation of the passive control device 2 and may therefore be greater than one.

After the mobile module 23 receives the first control command, it can perform actions such as "forward", "backward", "stop", "turn left", and "turn right" through the motor 231.

The sensing module 24 is disposed in the housing 20, is coupled to the processing module 22, and senses whether the housing 20 collides with a surrounding obstacle. When such a collision is sensed, the sensing module 24 triggers the processing module 22 to generate a second control command, and the mobile module 23 then drives the housing 20 to turn or stop according to the second control command.
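The interplay of the two control commands can be sketched as a small decision routine: a recognition result yields a first control command, while a sensed collision yields a second control command that takes priority. The class name, the command strings, and the choice of "stop" as the default collision response are assumptions made for illustration; the description allows either turning or stopping.

```python
class ProcessingModule:
    """Toy model of how recognition results and collisions become commands."""

    # Recognition results accepted as first control commands (from the text).
    KNOWN_KEYWORDS = {"forward", "backward", "stop", "turn left", "turn right"}

    def __init__(self, on_collision="stop"):
        # The second control command may turn or stop the housing.
        if on_collision not in ("stop", "turn left", "turn right"):
            raise ValueError("collision response must be a turn or a stop")
        self.on_collision = on_collision

    def first_command(self, recognition_result):
        """Convert a recognition result into a first control command."""
        if recognition_result in self.KNOWN_KEYWORDS:
            return recognition_result
        return None  # unknown results are ignored

    def second_command(self, collided):
        """Generate a second control command when a collision is sensed."""
        return self.on_collision if collided else None

    def next_command(self, recognition_result, collided):
        """A sensed collision overrides whatever the user last requested."""
        override = self.second_command(collided)
        if override is not None:
            return override
        return self.first_command(recognition_result)
```

For example, `ProcessingModule(on_collision="turn left").next_command("forward", True)` yields "turn left", so the housing steers away from the obstacle even though the user asked it to go forward.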

In this preferred embodiment, the sensing module 24 may be infrared sensors disposed around the housing 20. When the sensing module 24 senses that the housing 20 has collided with a surrounding obstacle, it sends a collision message through the second communication module 21 to the first communication module 13 so that the collision message is displayed on the display module 14.

The charging module 25 has a detection unit 251 and a rechargeable battery unit 252. The detection unit 251 detects the battery status of the rechargeable battery unit 252 and sends it through the second communication module 21 to the first communication module 13 of the voice control device 1 so that the battery status is displayed on the display module 14.

In summary, the present invention allows the voice control device 1 to control the passive control device 2 by wireless voice remote control. Because the control effect depends on the speech recognition result, the present invention uses the computationally lighter discrete hidden Markov model as the speech recognition model. The discrete hidden Markov model performs speech recognition statistically, which addresses the problem of the user's speaking speed and also allows semantic recognition. In addition, the present invention applies the noise cancellation unit 121 before speech pre-processing and speech recognition, which raises the speech recognition rate under different background noises. The object of the present invention is therefore achieved.

The foregoing, however, is merely a preferred embodiment of the present invention and should not be used to limit the scope in which the invention may be practiced. All simple equivalent changes and modifications made according to the claims and the description of the invention remain within the scope covered by this patent.

1‧‧‧Voice control device

11‧‧‧Input module

12‧‧‧Voice recognition module

121‧‧‧Noise cancellation unit

122‧‧‧Speech pre-processing unit

123‧‧‧Feature calculation unit

13‧‧‧First communication module

14‧‧‧Display module

2‧‧‧Passive control device

20‧‧‧Housing

21‧‧‧Second communication module

22‧‧‧Processing module

23‧‧‧Mobile module

231‧‧‧Motor

24‧‧‧Sensing module

25‧‧‧Charging module

251‧‧‧Detection unit

252‧‧‧Rechargeable battery unit

FIG. 1 is a block diagram illustrating a preferred embodiment of the wireless voice control system of the present invention; and FIG. 2 is a schematic diagram illustrating a housing of a passive control device in the preferred embodiment.

Claims

A wireless voice control system, comprising: a voice control device including an input module, a voice recognition module, and a first communication module, wherein the input module is configured for a user to perform voice input so as to generate a corresponding speech to be recognized, the voice recognition module is coupled to the input module and configured to recognize a voice keyword from the speech to be recognized so as to generate a corresponding recognition result, and the first communication module is coupled to the voice recognition module and configured to transmit the recognition result wirelessly; and a passive control device including a housing, a second communication module, a processing module, a mobile module, and a sensing module, wherein the second communication module is disposed in the housing and configured to receive the recognition result transmitted by the first communication module, the processing module is disposed in the housing, coupled to the second communication module, and configured to convert the recognition result into a first control command, the mobile module is disposed in the housing, coupled to the processing module, and configured to drive the housing to move according to the first control command, and the sensing module is disposed in the housing, coupled to the processing module, and configured to sense whether the housing collides with a surrounding obstacle and, upon sensing such a collision, to trigger the processing module to generate a second control command, whereupon the mobile module drives the housing to turn or stop according to the second control command; wherein the voice recognition module has a speech pre-processing unit and a feature calculation unit, the speech pre-processing unit uses Mel-frequency cepstral coefficients to extract features from a plurality of frames of the speech to be recognized so as to generate a feature vector group corresponding to each frame, and the feature calculation unit uses a discrete hidden Markov model to recognize the voice keyword from the feature vector groups.

The wireless voice control system according to claim 1, wherein the feature calculation unit performs fuzzy vector quantization on the feature vector groups and feeds the quantized feature vector groups into the trained discrete hidden Markov model to recognize the voice keyword.

The wireless voice control system according to claim 2, wherein the discrete hidden Markov model is trained by using fuzzy vector quantization combined with the Viterbi algorithm.

The wireless voice control system according to claim 1, wherein the voice recognition module has a noise cancellation unit, a speech pre-processing unit, and a feature calculation unit, the noise cancellation unit decomposes the speech to be recognized according to empirical mode decomposition and recombines the decomposed speech to be recognized by means of a genetic algorithm so as to generate a recombined speech signal, the speech pre-processing unit uses Mel-frequency cepstral coefficients to extract features from a plurality of frames of the recombined speech signal so as to generate a feature vector group corresponding to each frame, and the feature calculation unit uses the discrete hidden Markov model to recognize the voice keyword from the feature vector groups.

The wireless voice control system according to claim 4, wherein the feature calculation unit performs fuzzy vector quantization on the feature vector groups and feeds the quantized feature vector groups into the trained discrete hidden Markov model to recognize the voice keyword.

The wireless voice control system according to claim 5, wherein the discrete hidden Markov model is trained by using fuzzy vector quantization combined with the Viterbi algorithm.

The wireless voice control system according to claim 1, wherein the voice control device further includes a display module, and the passive control device further includes a charging module having a detection unit and a rechargeable battery unit, the detection unit being configured to detect a battery status of the rechargeable battery unit and to transmit the battery status through the second communication module to the first communication module of the voice control device so that the battery status is displayed on the display module.

The wireless voice control system according to claim 1, wherein the voice control device further includes a display module configured to display the recognition result.

The wireless voice control system according to claim 1, wherein the mobile module of the passive control device has at least one motor.