TWM578858U

TWM578858U - Cross-channel artificial intelligence dialogue platform

Info

Publication number: TWM578858U
Application number: TW108201899U
Authority: TW
Inventors: 江哲宇
Original assignee: 華南商業銀行股份有限公司
Priority date: 2019-02-13
Filing date: 2019-02-13
Publication date: 2019-06-01

Abstract

The cross-channel artificial intelligence dialogue platform includes three internal servers. The voice input interface of the first internal server receives a first voice signal. The personal data hiding module of the second internal server deletes personal data from the first voice signal to generate a second voice signal. The voice-to-text interface and the semantic recognition interface of the first internal server respectively convert the second voice signal into a first text signal and obtain an intent signal from the first text signal. The dialog module and the application module of the third internal server respectively generate a reply signal and a control command according to the intent signal. The text-to-speech interface of the first internal server converts the reply signal into a third voice signal. The output interface outputs the third voice signal and the control command.

Description

Cross-channel artificial intelligence dialogue platform

The present creation relates to a dialogue platform and a method of operating it, and in particular to a cross-channel artificial intelligence dialogue platform and its operation method.

With the spread of digital marketing channels, people who run into any problem with a transaction usually expect an immediate reply.

For a financial institution that provides customer service, however, increasing the number of customer service staff inevitably raises labor costs substantially. In addition, training a capable customer service agent takes time. When a large number of customers arrive unexpectedly, the agents who can handle all kinds of customer questions are often overwhelmed, while recently hired agents may not be able to answer the full variety of customer questions. As a result, the public's opinion of a financial institution that cannot provide good service drops markedly, which in turn affects the public's trust in, and willingness to take part in, the institution's other offerings.

In view of this, the present creation proposes a cross-channel artificial intelligence (AI) dialogue platform. The channels include digital channels, customer service centers, and business units. By introducing a speech recognition system, an AI dialogue system, and an AI dialogue back end, and by combining them with a new generation of conversational AI technologies, including natural language processing (NLP), a dynamic learning mechanism, multi-turn contextual dialogue design, and a dynamic information collection mechanism, a back end for customer dialogue analysis is established, thereby improving the user experience and market influence of the digital channels.

A cross-channel artificial intelligence dialogue platform according to an embodiment of the present creation includes: a first internal server including a voice input interface, a voice-to-text interface, a semantic recognition interface, a text-to-speech interface, and an output interface, wherein the voice input interface is configured to receive a first voice signal; a second internal server communicatively connected to the first internal server and including: a customer audio database configured to store a plurality of audio files, the contents of which respectively correspond to a plurality of items of personal data; and a personal data hiding module electrically connected to the customer audio database, the personal data hiding module being configured to divide the first voice signal into a plurality of voice segments and, when it determines that any one of the voice segments matches any one of the audio files, to delete the audio information corresponding to that voice segment from the first voice signal and return the first voice signal from which the audio data of that voice segment has been deleted to the first internal server as a second voice signal; wherein the voice-to-text interface of the first internal server is configured to generate a first text signal according to the second voice signal, and the semantic recognition interface is configured to generate an intent signal according to the first text signal; and a third internal server communicatively connected to the first internal server and including: a dialog module configured to selectively generate a reply signal according to the intent signal; and an application module configured to generate a control command corresponding to the intent analysis signal; wherein the text-to-speech interface of the first internal server is configured to generate a third voice signal according to the reply signal, and the output interface of the first internal server is configured to output the third voice signal and the control command.

A method of operating a cross-channel artificial intelligence dialogue platform according to an embodiment of the present creation includes: receiving a first voice signal with a voice input interface of a first internal server; storing a plurality of audio files in a customer audio database of a second internal server, the contents of the audio files respectively corresponding to a plurality of items of personal data, wherein the second internal server is communicatively connected to the first internal server; dividing the first voice signal into a plurality of voice segments with a personal data hiding module of the second internal server, wherein the personal data hiding module is electrically connected to the customer audio database, and, when the personal data hiding module determines that any one of the voice segments matches any one of the audio files, deleting the audio information corresponding to that voice segment from the first voice signal with the personal data hiding module; returning a second voice signal from the personal data hiding module to a voice-to-text interface of the first internal server, wherein the second voice signal is the first voice signal from which the audio data of that voice segment has been deleted; generating a first text signal according to the second voice signal with the voice-to-text interface; generating an intent signal according to the first text signal with a semantic recognition interface of the first internal server; generating a reply signal according to the intent signal with a dialog module of a third internal server, wherein the third internal server is communicatively connected to the first internal server; generating a third voice signal according to the reply signal with a text-to-speech interface of the first internal server; generating a control command according to the intent signal with an application module of the third internal server; and outputting the third voice signal and the control command with an output interface of the first internal server.

The foregoing description of the disclosure and the following description of the embodiments serve to illustrate and explain the spirit and principles of the present creation and to provide further explanation of the scope of its patent claims.

The detailed features and advantages of the present creation are described below in the embodiments. The description is sufficient for anyone skilled in the relevant art to understand the technical content of the present creation and to implement it accordingly, and, from the content disclosed in this specification, the claims, and the drawings, anyone skilled in the relevant art can readily understand the related objects and advantages. The following embodiments further illustrate aspects of the present creation in detail but do not limit its scope in any respect.

Please refer to FIG. 1, which shows an architecture diagram of a cross-channel artificial intelligence dialogue platform 100 according to an embodiment of the present creation. The platform 100 includes a first internal server 2, a second internal server 4, and a third internal server 6. As shown in FIG. 1, the second internal server 4 and the third internal server 6 are each communicatively connected to the first internal server 2. In addition, the components of the first internal server 2 are respectively communicatively connected to a client device 91, a first external server 93, a second external server 95, and a third external server 97.

In practice, the first internal server 2, the second internal server 4, and the third internal server 6 are, for example, blade servers, rack servers, or pedestal servers installed in the financial institution's server room. The present creation does not limit the hardware type of the first, second, and third internal servers 2, 4, and 6.

The first internal server 2, the second internal server 4, and the third internal server 6 each have a memory for implementing the functions described below. The memory may be, for example, random access memory, read-only memory, or flash memory. In one embodiment, the first internal server 2, the second internal server 4, and the third internal server 6 further include communication devices that support wired networks, wireless networks, mobile networks, and/or wireless communication. In one embodiment, the first internal server 2, the second internal server 4, and the third internal server 6 each include a processing circuit capable of performing the functions described below. The processing circuit is, for example, a microcontroller, a microprocessor, a processor, a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a digital logic circuit, a field-programmable gate array (FPGA), and/or another hardware component with computing capability. The present creation does not limit the hardware type of the processing circuit.

Please continue to refer to FIG. 1. The first internal server 2 includes a voice input interface 21, a voice-to-text interface 23, a semantic recognition interface 25, a text-to-speech interface 27, and an output interface 29. The voice input interface 21 is communicatively connected to the client device 91. The client device 91 is, for example, a smartphone or tablet on which the user has installed a mobile banking app (application), or a smart speaker or smart robot at a smart branch counter. From the user's point of view, the conversation takes place with the client device 91. In practice, a sound pickup (for example, a microphone) of the client device 91 generates the first voice signal from the user's speech, and a communication component of the client device 91 then sends the first voice signal to the voice input interface 21 of the first internal server 2. In short, a user who needs to perform a finance-related operation can speak directly to the client device 91 to produce the first voice signal, which is then sent to the voice input interface 21 for processing.

Please first refer to the second internal server 4 in FIG. 1, which includes a customer audio database 41 and a personal data hiding module 43 that are electrically connected to each other. The customer audio database 41 stores a plurality of audio files whose contents respectively correspond to a plurality of items of personal data. In practice, the second internal server 4 may further include a dynamic information learning module. The dynamic information learning module, for example, uses the financial institution's recordings of human customer service calls as training data in advance, automatically identifies by machine learning the audio segments in those recordings that contain customers' personal data, and then stores these audio segments in the customer audio database 41. The dynamic information learning module may also update the records in the customer audio database 41 according to each first voice signal obtained by the voice input interface 21; the present creation does not limit this.

Please continue to refer to the second internal server 4 in FIG. 1. The personal data hiding module 43 is electrically connected to the customer audio database 41 and communicatively connected to the voice-to-text interface 23 of the first internal server 2. The personal data hiding module 43 divides the first voice signal into a plurality of voice segments. When it determines that any one of these voice segments matches any one of the audio files stored in the customer audio database 41, it deletes the audio information corresponding to that voice segment from the first voice signal and returns the first voice signal, with the audio data of that voice segment deleted, to the voice-to-text interface 23 of the first internal server 2 as the second voice signal. For the comparison, the personal data hiding module 43 may use, for example, a fuzzy matching algorithm. In addition, when the matched personal data of the user is spread across several voice segments, the personal data hiding module 43 reassembles those segments to extract the complete audio data of the user's personal data. With this processing mechanism, the user's private data is confined to the first internal server 2 and the second internal server 4 located in the financial institution's server room, so the user's personal data is not leaked onto the network during subsequent speech recognition.
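The segment, compare, delete, and reassemble behavior described for the personal data hiding module 43 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the specification: the function names, the fixed segment length, the use of cosine similarity with a threshold as a stand-in for the "fuzzy matching algorithm", and the choice to drop matched samples rather than replace them with silence. A real system would more likely compare acoustic features than raw samples.

```python
from math import sqrt


def split_segments(signal, seg_len):
    """Split a list of samples into consecutive segments of seg_len samples."""
    return [signal[i:i + seg_len] for i in range(0, len(signal), seg_len)]


def cosine_similarity(a, b):
    """Cosine similarity of two sample lists; 0.0 if lengths differ or one is silent."""
    if len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hide_personal_data(first_signal, stored_clips, seg_len, threshold=0.95):
    """Return (second_signal, recovered_runs).

    second_signal is first_signal with every matching segment removed.
    recovered_runs holds the removed audio, with adjacent matching segments
    joined so that personal data spread over several segments is reassembled.
    """
    second_signal = []
    recovered_runs = []
    current_run = []
    for segment in split_segments(first_signal, seg_len):
        # Hypothetical fuzzy match: similar enough to any stored clip.
        matched = any(
            cosine_similarity(segment, clip) >= threshold for clip in stored_clips
        )
        if matched:
            current_run.extend(segment)
        else:
            if current_run:
                recovered_runs.append(current_run)
                current_run = []
            second_signal.extend(segment)
    if current_run:
        recovered_runs.append(current_run)
    return second_signal, recovered_runs
```

For instance, with `seg_len=2` and stored clips `[0.9, 0.8]` and `[0.7, 0.6]`, the two middle segments of `[0.1, -0.2, 0.9, 0.8, 0.7, 0.6, 0.3, -0.4]` are removed and returned together as one reassembled run, while the outer segments form the second voice signal.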

Please refer to FIG. 1. The voice-to-text interface 23 of the first internal server 2 is communicatively connected to both the personal data hiding module 43 of the second internal server 4 and the first external server 93, and the semantic recognition interface 25 is communicatively connected to the second external server 95. The voice-to-text interface 23 generates the first text signal according to the second voice signal, and the semantic recognition interface 25 generates an intent signal according to the first text signal. In other words, the voice-to-text interface 23 converts the voice data, from which the user's personal data has been removed, into text, and the semantic recognition interface 25 then interprets the user's intent from that text. For example, when the first text signal is "I want to transfer one thousand dollars", the semantic recognition interface 25 can derive two intents: "the user wants to make a transfer" and "the transfer amount is one thousand dollars". In practice, the voice-to-text interface 23 and the semantic recognition interface 25 are, for example, application programming interfaces (APIs), and the first external server 93 is, for example, a Google Cloud speech-to-text (STT) external server. The second external server 95 is, for example, an IBM Watson external server, which provides various Watson cognitive computing services, including a natural language processing (NLP) service for determining customer intent, and can improve the accuracy of semantic understanding through a pattern-based machine learning mechanism.
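To make the transfer example concrete, the following Python sketch shows what an intent signal derived from a first text signal might look like. It is a toy, rule-based stand-in and does not call or imitate the Google Cloud or IBM Watson services named above; the `IntentSignal` type, its field names, and the keyword and digit rules are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntentSignal:
    """Hypothetical intent signal: what the user wants to do and for how much."""
    action: Optional[str]  # e.g. "transfer"; None when no action is recognized
    amount: Optional[int]  # transfer amount; None when no number is present


def recognize_intent(first_text: str) -> IntentSignal:
    """Derive an intent signal from text using simple keyword and digit rules."""
    text = first_text.lower()
    action = "transfer" if "transfer" in text else None
    # First run of digits, optionally with thousands separators such as 1,000.
    match = re.search(r"\d[\d,]*", text)
    amount = int(match.group().replace(",", "")) if match else None
    return IntentSignal(action=action, amount=amount)
```

Under these assumptions, `recognize_intent("I want to transfer 1,000 dollars")` yields an intent signal carrying both pieces of information from the example: the action `"transfer"` and the amount `1000`.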

Please refer to FIG. 1. The third internal server 6 includes a dialog module 61 and an application module 63, both communicatively connected to the semantic recognition interface 25. The dialog module 61 selectively generates a reply signal according to the intent signal. The application module 63 generates a control command corresponding to the intent signal. In practice, the dialog module 61 of the third internal server 6 can provide a dynamic learning mechanism through a machine learning model, which can greatly improve maintenance efficiency. The dialog module 61 also has a multi-turn contextual dialogue design. In practice, for example, a dialogue analysis model is trained in advance on the Watson platform from the human customer service records to be analyzed, and the trained model is stored in the database of the dialog module 61, so that the dialog module 61 can provide interactive, scenario-based dialogue that stays coherent with the preceding and following context. For example, when the user says "I want to transfer one thousand dollars", the dialog module 61 not only obtains from the semantic recognition interface 25 an intent signal containing the two intents "the user wants to make a transfer" and "the transfer amount is one thousand dollars", but can also issue reply signals to the user such as "ask the user for the account to transfer to" and "ask the user which account to transfer from", so that the mobile banking app running on the client device 91 can collect enough information to complete the subsequent transfer. In addition, the dialog module 61 has a dynamic information collection mechanism whose parameters can be set quickly for rapid deployment, improving the user experience. When the dialog module 61 cannot recognize the user's intent signal, it can hand the conversation over to a human customer service system, where an online customer service agent answers the user's question.
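The multi-turn behavior described here (keep context across turns, ask for whatever is still missing, and hand over to a human agent when the intent is not recognized) resembles what is commonly called slot filling. The Python sketch below shows one way it could work. The slot names, prompts, and dictionary-based signals are hypothetical, and for brevity the final "command" branch stands in for the control command that the specification assigns to the application module 63.

```python
# Hypothetical slots that must be known before a transfer can be executed.
REQUIRED_SLOTS = {
    "transfer": ["amount", "target_account", "source_account"],
}

# Hypothetical reply texts used to ask for each missing slot.
PROMPTS = {
    "amount": "How much would you like to transfer?",
    "target_account": "Which account should receive the transfer?",
    "source_account": "Which of your accounts should the transfer come from?",
}


def next_turn(context: dict, intent: dict) -> dict:
    """Merge the new intent into the running context and decide the next step."""
    # Keep what earlier turns established; ignore fields that are None.
    context.update({key: value for key, value in intent.items() if value is not None})
    action = context.get("action")
    if action not in REQUIRED_SLOTS:
        # Unrecognized intent: hand over to the human customer service system.
        return {"type": "handoff", "reply": "Connecting you to a customer service agent."}
    for slot in REQUIRED_SLOTS[action]:
        if slot not in context:
            # Ask for the first piece of information that is still missing.
            return {"type": "reply", "reply": PROMPTS[slot]}
    # Everything needed is known: emit the data for a control command.
    return {"type": "command", "command": dict(context)}
```

A conversation would call `next_turn` once per user utterance with the same `context` dictionary, so that the amount given in the first turn is still available when the account numbers arrive in later turns.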

Please continue to refer to FIG. 1. The text-to-speech interface 27 of the first internal server 2 is communicatively connected to both the dialog module 61 of the third internal server 6 and the third external server 97. The text-to-speech interface 27 generates the third voice signal according to the reply signal; in other words, it converts the reply generated by the dialog module 61 into computer speech that the user can understand. The output interface 29 of the first internal server 2 then outputs the third voice signal to the client device 91 so that a speaker of the client device 91 can play it for the user. In practice, the third external server 97 is, for example, the external server of the Industrial Technology Research Institute (ITRI) text-to-speech web service, which provides text-to-speech (TTS) as a web service over SOAP (Simple Object Access Protocol) and converts input text into speech for output. Note that although in the above embodiment the first, second, and third external servers 93, 95, and 97 are cloud services from different providers that the first internal server 2 reaches over the Internet, in another embodiment the financial institution may instead purchase its own servers with text/speech conversion and semantic understanding functions and install them in a local server room. The present creation does not require the first to third external servers 93, 95, and 97 to be connected to the cloud in order to achieve the above functions.

Please continue to refer to FIG. 1. The output interface 29 of the first internal server 2 is communicatively connected to the application module 63 of the third internal server 6 and to the client device 91. In addition to the third voice signal, the output interface 29 outputs the control command generated by the application module 63. The control command is, for example, an instruction that directs the mobile banking app to complete a transfer.

Based on the cross-channel artificial intelligence dialogue platform 100 described above, the platform 100 can in practice be communicatively connected, as needed, to a user's smartphone or to a smart speaker at a smart branch counter. The user can thereby complete the desired financial transaction by talking with the client device 91.

Please refer to FIG. 1 and FIG. 2 together. FIG. 2 shows a method of operating a cross-channel artificial intelligence dialogue platform according to an embodiment of the present creation, applicable to the platform 100 described above. In step S11, the voice input interface 21 receives the first voice signal. Specifically, the client device 91 transmits the user's speech by wired or wireless communication, and the voice input interface 21 of the first internal server 2 receives it. In step S21, the personal data hiding module 43 of the second internal server 4 divides the first voice signal into a plurality of voice segments. In step S23, the personal data hiding module 43 compares the voice segments with the audio files in the customer audio database 41. In step S25, the personal data hiding module 43 determines whether any voice segment matches any audio file. If so, the method proceeds to step S27; otherwise, it returns to step S23. In step S27, the personal data hiding module 43 returns the second voice signal, which is the first voice signal from which the audio data of the matching voice segment (for example, the waveform of the audio representing the user's personal data) has been deleted. In step S13, the voice-to-text interface 23 of the first internal server 2 generates the first text signal according to the second voice signal; specifically, it converts audio data that does not contain the user's personal data into text data. In step S15, the semantic recognition interface 25 of the first internal server 2 generates the intent signal according to the first text signal, for example by having a semantic understanding server that provides a cloud service analyze the text signal to obtain the user's intent. In step S31, the dialog module 61 of the third internal server 6 generates the reply signal according to the intent signal; in other words, the dialog module 61 identifies the user's intent from the intent signal in order to provide a personalized service or response. In practice, if the customer's question is outside the range that the dialog module 61 can answer, the dialog module 61 can hand over to the human customer service system for follow-up service. In step S17, the text-to-speech interface 27 of the first internal server 2 generates the third voice signal according to the reply signal, converting the system's reply into computer speech that the user can hear. In step S33, the application module 63 generates the control command according to the intent signal. The control command is used to perform, on the client device, the operation corresponding to the user's speech. In step S35, the output interface 29 outputs the third voice signal and the control command to the client device 91, for example to play the system's reply to the user, to carry on the dialogue with the user in order to obtain the other parameters required by the operation the user wants to perform, or to execute the control command and complete the financial transaction the user wants.
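The order of steps S11 through S35 can be summarized as a small orchestration function. The Python sketch below is only an outline of the data flow under stated assumptions: each stage is passed in as a callable so that no particular speech, NLP, or TTS service is implied, and the function and parameter names are hypothetical rather than taken from the specification.

```python
def run_pipeline(first_voice, hide_pii, speech_to_text, recognize_intent,
                 make_reply, make_command, text_to_speech):
    """Pass one utterance through the stages in the order given by FIG. 2.

    first_voice is the signal received in step S11; every other argument is a
    callable standing in for one module or interface of the platform.
    """
    second_voice = hide_pii(first_voice)       # S21 to S27: remove personal data
    first_text = speech_to_text(second_voice)  # S13: voice to text
    intent = recognize_intent(first_text)      # S15: text to intent signal
    reply = make_reply(intent)                 # S31: dialog module
    third_voice = text_to_speech(reply)        # S17: reply to speech
    command = make_command(intent)             # S33: application module
    return third_voice, command                # S35: output to the client device
```

The point of the ordering is that `speech_to_text`, which in the embodiment may call an external cloud service, only ever receives the output of `hide_pii`, never the original first voice signal.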

In summary, the cross-channel artificial intelligence dialogue platform disclosed in the present creation lets users complete financial transactions through conversation, giving customers the best possible experience and service, while preventing users' private personal information from leaking to the cloud and thus protecting their personal data. In addition, by connecting the platform to a mobile banking app, a smart branch counter, or a smart personal wealth management service, a financial institution can reduce the labor and time costs of hiring and training additional staff to provide these financial services.

Although the present creation has been disclosed above by way of the foregoing embodiments, they are not intended to limit it. Changes and refinements made without departing from the spirit and scope of the present creation fall within its scope of patent protection. For the scope of protection defined by the present creation, please refer to the appended claims.

100‧‧‧cross-channel artificial intelligence dialogue platform
2‧‧‧first internal server
21‧‧‧voice input interface
23‧‧‧voice-to-text interface
25‧‧‧semantic recognition interface
27‧‧‧text-to-speech interface
29‧‧‧output interface
4‧‧‧second internal server
41‧‧‧customer audio database
43‧‧‧personal data hiding module
6‧‧‧third internal server
61‧‧‧dialog module
63‧‧‧application module
91‧‧‧client device
93‧‧‧first external server
95‧‧‧second external server
97‧‧‧third external server
S11~S35‧‧‧steps

FIG. 1 is an architecture diagram of a cross-channel artificial intelligence dialogue platform according to an embodiment of the present creation. FIG. 2 is a flowchart of a method of operating a cross-channel artificial intelligence dialogue platform according to an embodiment of the present creation.

Claims

A cross-channel artificial intelligence dialogue platform, comprising: a first internal server, comprising a voice input interface, a voice-to-text interface, a semantic recognition interface, a text-to-speech interface, and an output interface, wherein the voice input interface is configured to receive a first voice signal; a second internal server, communicatively connected to the first internal server, comprising: a customer audio database configured to store a plurality of audio files, wherein the contents of the audio files respectively correspond to a plurality of items of personal data; and a personal data hiding module electrically connected to the customer audio database, the personal data hiding module being configured to divide the first voice signal into a plurality of voice segments, and, when the personal data hiding module determines that any one of the voice segments matches any one of the audio files, the personal data hiding module deletes the audio information corresponding to that voice segment from the first voice signal and returns the first voice signal from which the audio data of that voice segment has been deleted to the first internal server as a second voice signal; wherein the voice-to-text interface of the first internal server is configured to generate a first text signal according to the second voice signal, and the semantic recognition interface is configured to generate an intent signal according to the first text signal; and a third internal server, communicatively connected to the first internal server, comprising: a dialog module configured to selectively generate a reply signal according to the intent signal; and an application module configured to generate a control command corresponding to the intent analysis signal; wherein the text-to-speech interface of the first internal server is configured to generate a third voice signal according to the reply signal, and the output interface of the first internal server is configured to output the third voice signal and the control command.

The cross-channel artificial intelligence dialogue platform according to claim 1, wherein the voice-to-text interface is connected to a Google Cloud speech-to-text external server, the semantic recognition interface is connected to an IBM Watson external server, and the text-to-speech interface is connected to an external server of the ITRI text-to-speech web service.

The cross-channel artificial intelligence dialogue platform according to claim 1, wherein the dialog module is further connected to an online customer service system, and when the dialog module cannot recognize the intent signal, the dialog module forwards the intent signal to the online customer service system.

The cross-channel artificial intelligence dialogue platform according to claim 1, wherein the second internal server further comprises a dynamic information learning module configured to obtain, by machine learning, from customer service recordings the audio files associated with the personal data, and to store the audio files in the customer audio database.

The cross-channel artificial intelligence dialogue platform according to claim 1, wherein when the personal data hiding module determines that a plurality of voice segments in the first voice signal respectively match a plurality of audio files in the customer audio database, the personal data hiding module is further configured to reassemble the voice segments to retrieve a complete item of the personal data.