RU2763691C1

RU2763691C1 - System and method for automating the processing of voice calls of customers to the support services of a company

Info

Publication number: RU2763691C1
Application number: RU2020132261A
Authority: RU
Inventors: Юрий Юрьевич Козин
Original assignee: Общество С Ограниченной Ответственностью "Колл Инсайт"
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2021-12-30
Also published as: WO2022071826A1

Abstract

FIELD: computing technology.

SUBSTANCE: invention relates to a system and a method for automating the processing of voice calls of the customers to the support services of companies. The system comprises: an interaction server (CORE), an automated workstation (AWS) of the Operator, containing a web interface for processing the voice call of the client with preset response templates and providing playback of an audio segment (Voice Sample) to the Operator; an OSR AWS Configurator containing a web interface for setting the AWS of the operator and an OSR service module; a Semantic service configured to isolate keywords from the transcribed text based on a preset grammar; a Logger service configured to log the results of recognition of voice calls, clients, and isolated semantic tags; a statistics service (Statistics) configured to save the information on all stages of the dialogue; an AWS of the Monitoring Specialist, containing a web interface for viewing reports on the operation of the system and monitoring the correctness of recognition of voice calls of the customers.

EFFECT: dynamic selection of the method for processing the voice call of a client depending on the transmitted parameters of the voice segment.

18 cl, 8 dwg

Description

FIELD OF TECHNOLOGY

This technical solution relates to the field of computing, in particular to a system and method for automating the processing of customer voice requests to a company's service departments.

BACKGROUND OF THE INVENTION

Strong competition and an unstable global economic situation require companies to continuously optimize their costs, and customer service has traditionally been one of the key cost items. This creates a need to reduce the cost of servicing each customer contact while maintaining a given level of quality, which in turn creates a need to automate these processes.

Global trends in the development of voice services, together with improvements in voice channels (Voice over IP (VoIP) protocols and other technologies for transmitting voice information) and in technologies for analyzing and processing voice information, are increasing the number of systems that automate the processing of customer voice requests and, as a result, the amount companies spend on developing them.

A significant number of systems for automating customer voice requests are known from the prior art; some of these solutions are described in applications US2013246053A1, published 19 September 2013, and US2011010173A1, published 13 January 2011.

The solutions known from the prior art use the following system of objects:

- System message: a question, prompt, or other audio signal from the system to the customer, for example the question "How can I help you?";

- Customer utterance/reaction: an answer, command, question, clarification, or other response from the customer, for example "Tell me the balance on my card";

- Dialogue: a sequence of system messages and customer utterances/reactions. A dialogue may serve to ask the customer a question in order to obtain information, or to provide information at the customer's request;

- Communication: a continuous sequence of dialogues between one customer and the system. The dialogues may be logically related to one another or may cover different topics.

The key quality indicator for automatic and automated systems that service customer voice requests is the accuracy with which the customer's response is recognized in each dialogue and in the communication as a whole. Recognition accuracy in a communication, P(right), can be measured as the ratio of the number of correctly recognized customer responses in the entire communication (Nright) to the total number of customer responses in that communication (N).

P(right) = Nright / N * 100%.
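As a small worked illustration of the ratio above, the following sketch computes P(right) for one communication. The function name and the input validation are assumptions added for the example; only the formula itself comes from the text.

```python
def recognition_accuracy(n_right: int, n_total: int) -> float:
    """Return P(right) = Nright / N * 100 for one communication (in percent)."""
    if n_total <= 0:
        raise ValueError("a communication must contain at least one customer response")
    if not 0 <= n_right <= n_total:
        raise ValueError("n_right must be between 0 and n_total")
    # Multiply first so that whole-number percentages come out exact.
    return 100.0 * n_right / n_total


# Example: 9 of the 10 customer responses in a communication were recognized correctly.
print(recognition_accuracy(9, 10))  # 90.0
```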

Recognition accuracy depends on two components: the accuracy of converting speech to text and the accuracy of extracting meaning from the recognized text.

The accuracy of extracting meaning from the recognized text depends on the number of words in the customer's response and on the syntactic complexity of the customer's sentence. No universal method for evaluating this indicator currently exists.

These limitations make it impossible for automatic and automated systems to reach a 100% probability of recognizing customer responses or to compete fully with the quality of human speech recognition. Moreover, the more complex the customer's response, the more likely the system is to misrecognize it.

This leads to the following main disadvantages of currently available systems:

1. The service must always allow for the possibility of a recognition error, since speech recognition accuracy is below 100% and averages 80 to 93%. To address this, existing systems include a step in which the customer is asked to clarify or repeat information, which lengthens service time, increases the customer's negative attitude toward the system and, as a result, causes the customer to stop interacting with it (dropping the connection, hanging up, etc.).

2. Dialogues between the customer and the system are simplified. As a result, the number of dialogues within a communication grows, because the system asks simpler questions and obtains information from the customer step by step. This also lengthens service time and increases the likelihood that customer responses are misrecognized.

3. The list of topics and voice services that can be automated is restricted. This moves away from the concept of "human" communication between the customer and the system in order to reduce the probability of incorrect recognition or incorrect extraction of meaning.

The limitations listed above negatively affect the key quality indicators of automatic and automated systems and of the company as a whole:

- an overall decline in the quality of processing of customer voice requests, because the human is excluded from the system;

- a lower automation percentage, measured as the ratio of the number of customers served by the system to the total number of customers who contacted the system;

- higher costs of servicing customer voice requests owing to longer service times;

- fewer opportunities for the systems to develop and self-learn, because the list of automated topics and voice services is restricted.

SUMMARY OF THE INVENTION

The technical problem addressed by the claimed technical solution is the creation of a system, a method, and a computer-readable medium for automating the processing of customer voice requests to a company's service departments, as characterized in the independent claims. Additional embodiments of the present invention are set out in the dependent claims.

The technical result is the dynamic selection of the method for processing a customer's voice request depending on the transmitted parameters of the voice sample (VS), the quality of automatic speech recognition, and operator availability. This technical result is achieved by a function that routes calls between the ASR system and the OSR service module, taking into account parameters such as speech recognition quality, the cost/criticality of an error in the company's business process, operator availability, customer category, etc.
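The routing function described above might be sketched as follows. This is only an illustration under stated assumptions: the `Mode` names, the fields of `DialogueSettings`, the 0.8 criticality threshold, and the rule that premium customers go straight to an operator are all hypothetical; the text names the input parameters but does not specify how they are combined.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ASR_ONLY = "asr_only"          # automatic recognition, no operator involved
    OSR_ONLY = "osr_only"          # operator handles the voice sample directly
    ASR_THEN_OSR = "asr_then_osr"  # ASR first, operator as a fallback


@dataclass
class DialogueSettings:
    # Hypothetical fields and threshold, chosen only for this example.
    error_criticality: float       # 0.0 (harmless error) .. 1.0 (very costly error)
    operator_available: bool
    premium_customer: bool
    critical_threshold: float = 0.8


def select_mode(s: DialogueSettings) -> Mode:
    """Pick a processing mode from the transmitted dialogue settings."""
    if not s.operator_available:
        return Mode.ASR_ONLY       # nobody can listen, so rely on ASR alone
    if s.error_criticality >= s.critical_threshold or s.premium_customer:
        return Mode.OSR_ONLY       # an error would be too costly, skip ASR
    return Mode.ASR_THEN_OSR       # default: let the ASR confidence decide later


print(select_mode(DialogueSettings(0.9, operator_available=True, premium_customer=False)))
```

In this sketch the ASR quality parameter does not appear at selection time; it comes into play afterwards, when the confidence level of an actual ASR result is compared with the configured minimum.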

In a preferred embodiment, a system for automating customer voice requests to a company's service departments is claimed, comprising:

an interaction server (CORE), which provides:

• interaction with the Voice Applications server via a VoiceXML interpreter and an MRCP client;

• receipt from it of an audio stream/audio file for transcription;

• selection of the method for processing the audio stream/audio file, either by the automatic speech recognition (ASR) system or by the operator service module (OSR), in accordance with the transmitted settings;

• routing of the audio stream/audio file sequentially to the ASR system;

• processing of the response from the ASR system and checking of the confidence level of the transcribed text: if the confidence level is above the set minimum, the text is passed to the Semantic service to extract semantic tags; if it is below the set minimum, the request is routed to the OSR service module;

• receipt of an array of transcribed text and semantic tags from the OSR service module;

• transfer of the recognition results for the customer's voice request and the extracted semantic tags to the Voice Applications server;

an OSR server, which provides:

• routing of customer requests to the Operator AWS;

• transfer of the request processing results to the interaction server;

• registration and selection of an operator to process the request;

an Operator AWS, containing a web interface for processing the customer's voice request with preconfigured response templates and providing playback of the voice sample (VS) to the Operator;

an OSR Configurator AWS, containing a web interface for configuring the Operator AWS and the OSR service module;

a Semantic service, which extracts keywords from the transcribed text according to a given grammar passed by the interaction server (CORE), based on a configured statistical model;

a Logger service, which logs the recognition results for voice requests, the customers, and the extracted semantic tags;

a statistics service (Statistics), which stores information about all stages of the dialogue:

• the date and time the session started;

• the date and time the session ended;

• the URL of the audio stream/audio file;

• the statistics settings of the Voice Applications server, for later use in the Statistics AWS;

a Monitoring Specialist AWS, containing a web interface for viewing reports on the operation of the system and for monitoring the correctness of recognition of customer voice requests.
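The confidence check that the interaction server (CORE) performs in the system above can be sketched roughly as below. Everything concrete here is an assumption made for illustration: the `asr` and `osr` callables stand in for the real ASR system and OSR service module, `extract_tags` is a toy keyword matcher rather than the grammar and statistical model of the Semantic service, the 0.85 threshold is arbitrary, and a confidence exactly equal to the minimum is treated as passing (the text only covers "above" and "below").

```python
from typing import Callable, NamedTuple


class Recognition(NamedTuple):
    text: str
    tags: list[str]
    source: str  # "ASR" or "OSR"


def extract_tags(text: str, grammar: dict[str, set[str]]) -> list[str]:
    """Toy stand-in for the Semantic service: grammar maps tag -> keywords."""
    words = set(text.lower().split())
    return sorted(tag for tag, keywords in grammar.items() if words & keywords)


def route(
    audio_url: str,
    asr: Callable[[str], tuple[str, float]],      # returns (text, confidence)
    osr: Callable[[str], tuple[str, list[str]]],  # returns (text, tags) from an operator
    grammar: dict[str, set[str]],
    min_confidence: float = 0.85,                 # hypothetical configured minimum
) -> Recognition:
    text, confidence = asr(audio_url)
    if confidence >= min_confidence:
        # Confident ASR result: extract semantic tags automatically.
        return Recognition(text, extract_tags(text, grammar), "ASR")
    # Low confidence: hand the voice sample to an operator.
    text, tags = osr(audio_url)
    return Recognition(text, tags, "OSR")


grammar = {"balance": {"balance"}, "card": {"card", "cards"}}
result = route(
    "http://example.invalid/vs/1.wav",  # placeholder URL
    asr=lambda url: ("tell me the balance on my card", 0.95),
    osr=lambda url: ("operator transcript", ["operator_tag"]),
    grammar=grammar,
)
print(result.source, result.tags)  # ASR ['balance', 'card']
```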

In a particular embodiment, depending on the criticality level of the dialogue, the interaction server (CORE) routes recognition of the customer's request to the OSR service module without first contacting the ASR system. In the OSR service module, the transmitted voice sample (VS) is listened to and the correct text recognition variant is selected, after which the OSR service module returns an array of transcribed text and semantic tags to the interaction server (CORE).

In another particular embodiment, the interaction server (CORE) routes the audio stream/audio file only to the ASR system, processes the response from the ASR system and checks the confidence level of the transcribed text, routes the text to the Semantic service to extract semantic tags if the confidence level is above the set minimum, generates a negative response if the confidence level is below the set minimum, and transfers the recognition results for the customer's voice request and the extracted semantic tags to the Voice Applications server.

In another particular embodiment, the interaction server (CORE) first sends the voice sample (VS) to the ASR system and, after receiving the automatic recognition results, sends it to the OSR service module, where the voice sample (VS) is listened to and the results of automatic recognition of the customer's speech are checked and supplemented. Depending on the quality of the automatic recognition, the ASR system's data are either confirmed or corrected, after which the OSR service module returns an array of transcribed text and semantic tags to the interaction server (CORE).
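The confirm-or-correct step of this sequential embodiment could be modelled along the following lines. The `AsrResult` and `OperatorDecision` types and their fields are hypothetical; the text only states that the operator either confirms the ASR data or corrects it, and that text plus semantic tags are returned to CORE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AsrResult:
    text: str
    confidence: float


@dataclass
class OperatorDecision:
    confirmed: bool                       # True: the ASR text is accepted as is
    corrected_text: Optional[str] = None  # required when confirmed is False
    tags: tuple[str, ...] = ()            # semantic tags chosen by the operator


def apply_operator_review(asr: AsrResult, decision: OperatorDecision) -> tuple[str, tuple[str, ...]]:
    """Return the (text, tags) pair that the OSR module sends back to CORE."""
    if decision.confirmed:
        return asr.text, decision.tags
    if decision.corrected_text is None:
        raise ValueError("a rejected ASR result needs corrected text")
    return decision.corrected_text, decision.tags


asr = AsrResult("tell me the balance on my cart", confidence=0.62)
fix = OperatorDecision(False, "tell me the balance on my card", ("balance", "card"))
print(apply_operator_review(asr, fix))
```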

In another particular embodiment, the interaction server (CORE) sends the voice sample (VS) to the ASR system and to the OSR service module simultaneously. If the response from the OSR service module arrives first, the text recognition result and semantic tags from the OSR service module are passed to the Voice Applications server. If the response from the ASR system arrives first, the text recognition probability is additionally checked: if it is above the level set in the system, the interaction server (CORE) passes the ASR system's automatic recognition result to the Voice Applications server; if the confidence level is below the level set in the interaction server (CORE), the response from the OSR service module is awaited.

In another particular embodiment, after the customer's speech has been processed and the semantic tags extracted, the interaction server (CORE) queries the company's IT systems through the client terminal and receives text for speech synthesis, then submits the received text to the text-to-speech (TTS) system and returns to the client terminal an audio file with a synthesized message containing the information the customer requested.

The claimed solution is also implemented by a method for automating customer voice requests to a company's service departments, comprising the steps of:

establishing, by means of the client terminal, a connection with the Voice Applications server over the Media Resource Control Protocol (MRCP) and sending a request containing the dialogue identifier (ID) and the audio stream;

pre-processing the call by means of the Voice Applications server and detecting the start of speech using the Voice Activity Detection (VAD) function and timeouts;

transferring the dialogue ID and the uniform resource locator (URL) of the audio stream/audio file (VS) to the interaction server (CORE), and providing interaction with the company's systems;

receiving, by means of the interaction server (CORE), the dialogue ID and the uniform resource locator (URL) of the audio stream/audio file from the client terminal, and transmitting the audio stream/audio file and the text recognition settings to the automatic speech recognition (ASR) system;

transcribing the audio and estimating the probability of correct recognition by means of the ASR system;

returning, by means of the ASR system, an array of transcribed text and the recognition confidence level to the interaction server (CORE);

evaluating, by means of the interaction server (CORE), the confidence level of the recognition of the voice sample (VS);

if the confidence level is above the set minimum, passing the text and the required grammar to the Semantic service to extract semantic tags;

extracting, by the Semantic service, semantic tags from the passed text according to the specified grammar;

if the confidence level is below the set minimum, routing the request to the OSR service module;

listening to the voice sample (VS) by means of the OSR service module and recording the selection of the correct text recognition variant;

returning, by means of the OSR service module, an array of transcribed text and semantic tags to the interaction server (CORE);

transmitting, by means of the interaction server (CORE), the array of transcribed text and the semantic tags to the Voice Applications server;

logging the recognition results in the Logger service by means of the interaction server (CORE);

recording and storing, in the statistics service (Statistics), information about all stages of the dialogue for later use in the Statistics AWS:

• the date and time the session started;

• the date and time the session ended;

• the URL of the audio stream/audio file;

• the statistics settings of the Voice Applications server.
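The per-session information that the statistics service stores could be held in a record such as the one sketched below. The class name, field names, and the `duration_seconds` helper are assumptions for illustration; the four stored items (session start, session end, audio URL, statistics settings) are those listed in the text, and `dialogue_id` is added because the method passes a dialogue ID alongside the URL.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime


@dataclass
class SessionRecord:
    dialogue_id: str
    started_at: datetime               # date and time the session started
    ended_at: datetime                 # date and time the session ended
    audio_url: str                     # URL of the audio stream/audio file
    statistics_settings: dict = field(default_factory=dict)  # from the Voice Applications server

    def duration_seconds(self) -> float:
        """Derived value, convenient for service-time reports."""
        return (self.ended_at - self.started_at).total_seconds()


record = SessionRecord(
    dialogue_id="dlg-001",                          # hypothetical identifier
    started_at=datetime(2021, 1, 1, 12, 0, 0),
    ended_at=datetime(2021, 1, 1, 12, 0, 42),
    audio_url="http://example.invalid/vs/1.wav",    # placeholder URL
    statistics_settings={"report": "daily"},
)
print(record.duration_seconds())  # 42.0
print(sorted(asdict(record)))     # field names, ready for serialization
```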

In a particular embodiment, recognition of the customer's request is routed by the interaction server (CORE) to the OSR service module without first contacting the ASR system; the transmitted voice sample (VS) is listened to in the OSR service module and the selection of the correct text recognition variant is recorded, and the OSR service module returns an array of transcribed text and semantic tags to the interaction server (CORE).

In another particular embodiment, additionally:

routing recognition of the customer's request to the ASR system by means of the interaction server (CORE);

evaluating, by means of the interaction server (CORE), the confidence level of the recognition of the voice sample (VS);

if the confidence level is above the set minimum, passing the text and the required grammar to the Semantic service to extract semantic tags;

if the confidence level is below the set minimum, generating a negative response to the Voice Applications server.

sending the voice sample (VS), by means of the interaction server (CORE), first to the ASR system and then, after the automatic recognition results are received, to the OSR service module;

listening to the voice sample (VS) and checking the results of automatic recognition of the customer's speech by means of the OSR service module;

confirming or correcting, in the OSR service module, the results of the automatic speech-to-text transcription produced by the ASR system;

transmitting, by means of the OSR service module, an array of transcribed text and semantic tags to the interaction server (CORE).

the audio segment (VS) is sent simultaneously to both the ASR system and the OSR service module;

when a response is received from the ASR system or the OSR service module, the interaction server (CORE) evaluates the order of arrival as follows: if the response from the OSR service module arrives first, the recognition result and the semantic tags from the OSR service are passed to the Voice Applications server; if the response from the ASR system arrives first and the reported recognition probability is above the level configured in the interaction server (CORE), the result of automatic recognition by the ASR system is passed to the Voice Applications server; if the response from the ASR system arrives first and the recognition probability is below the level configured in the interaction server (CORE), the server waits for the response from the OSR service module.

after the client's speech has been processed and the semantic tags extracted, the interaction server (CORE) calls the Customer's IT system through the Voice Applications server and receives the text for speech synthesis;

the interaction server (CORE) passes the text received from the Customer's IT system to the text-to-speech (TTS) system;

the interaction server (CORE) returns to the client's terminal an audio file with the synthesized message containing the information the client requested.
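The arrival-order rule for the variant in which the VS is sent simultaneously to the ASR system and the OSR service module can be illustrated with a minimal Python sketch. The `Response` type, the `arbitrate` function and the 0..1 confidence scale are hypothetical names and assumptions introduced for illustration; the description does not specify data types, and it does not say what happens when the confidence equals the configured level (the sketch treats that case as "wait for OSR").

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Response:
    source: str               # "ASR" or "OSR"
    text: str
    confidence: float = 1.0   # meaningful for ASR only; 0..1 scale is an assumption


def arbitrate(responses: Iterable[Response], min_confidence: float) -> Optional[Response]:
    """Pick the response to forward to the Voice Applications server.

    `responses` must be given in arrival order.
    """
    for response in responses:
        if response.source == "OSR":
            # The operator's result is forwarded as soon as it arrives.
            return response
        if response.source == "ASR" and response.confidence > min_confidence:
            # A sufficiently confident ASR result is forwarded without waiting for OSR.
            return response
        # Low-confidence ASR result: keep waiting for the OSR response.
    return None  # nothing usable has arrived yet
```

For example, `arbitrate([Response("ASR", "yes", 0.4), Response("OSR", "yes")], 0.7)` returns the OSR response, because the ASR result arrived first but fell below the configured level.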

The claimed solution is also implemented by a computer-readable medium for automating client voice requests to the company's service departments, containing processor-executable instructions that cause the hardware to interact so as to perform the method for automating client voice requests to the company's service departments.

In a particular embodiment, the interaction server (CORE) routes recognition of the client's request to the OSR service module without first calling the ASR system;

in the OSR service module, an operator listens to the transmitted audio segment (VS) and the choice of the correct recognition variant is recorded;

the OSR service module returns an array of transcribed text and semantic tags to the interaction server (CORE).

if the confidence level is below the configured minimum, a negative response is generated for the Voice Applications server.

the interaction server (CORE) sends the audio segment (VS) sequentially: first to the ASR system and, once the automatic recognition results have been received, to the OSR service module;

in the OSR service module, an operator listens to the audio segment (VS) and checks the results of automatic recognition of the client's speech;

the results of the automatic speech-to-text transcription produced by the ASR system are confirmed or corrected in the OSR service module;

when a response is received from the ASR system or from the OSR service module, the interaction server (CORE) evaluates the order of arrival as follows:

if the response from the OSR service module arrives first, the recognition result and the semantic tags from the OSR service module are passed to the Voice Applications server; if the response from the ASR system arrives first and the reported recognition probability is above the level configured in the interaction server (CORE), the result of automatic recognition by the ASR system is passed to the Voice Applications server;

if the response from the ASR system arrives first and the recognition probability is below the level configured in the interaction server (CORE), the server waits for the response from the OSR service.

after the client's speech has been processed and the semantic tags extracted, the interaction server (CORE) calls the Customer's IT system through the Voice Applications server and receives the text for speech synthesis;

the interaction server (CORE) passes the text received from the Customer's IT system to the text-to-speech (TTS) system;

DESCRIPTION OF THE DRAWINGS

The implementation of the invention is described below with reference to the accompanying drawings, which are provided to explain the essence of the invention and in no way limit its scope. The following drawings are attached to the application:

Fig. 1 illustrates the hardware-software complex for automating client voice requests to service departments;

Figs. 2-6 illustrate examples of interfaces for interacting with the system, which provide the ability to view the lists of configured dialog scenarios and to create a new dialog scenario;

Fig. 7 illustrates an example of an interface for interacting with the system, showing a flow diagram of real-time voice analysis during a call and of the selection of the next branch of the dialog script depending on the voice analysis;

Fig. 8 illustrates an example of the monitoring specialist's interface with detailed information on a client's request.

DETAILED DESCRIPTION OF THE INVENTION

The following detailed description sets out numerous implementation details intended to give a clear understanding of the present invention. However, it will be apparent to a person skilled in the art how the present invention can be used both with and without these implementation details. In other instances, well-known methods, procedures and components are not described in detail so as not to obscure the features of the present invention.

Furthermore, it will be clear from this description that the invention is not limited to the implementation presented. Numerous possible modifications, changes, variations and substitutions that retain the essence and form of the present invention will be apparent to those skilled in the art.

The present invention is directed to providing a system, a method and a computer-readable medium for automating the processing of client voice requests to a company's service departments, which combine an automatic speech recognition (ASR) system and an operator service module (OSR).

The following terms and abbreviations are used in the claimed solution:

NoInput - an event in which the system receives no voice command from the client (the client is silent or speaks too quietly);

NoMatch - an event in which the client gives a value that the system cannot resolve using the configured grammar;

Aggregated dialog - the top level of dialog aggregation in the OSR service; it reflects the business logic for grouping the topics of client requests;

Topic - groups dialogs with the client that share a single reason for the request;

Dialog - a message played to the client during a conversation. Dialogs relating to one question are grouped into topics;

Operator skill - a characteristic of an operator reflecting their specialization and level of training;

Global response - a preconfigured action available to the operator in response to the client's voice command, which switches the client to a menu branch outside the preconfigured dialog logic;

Local response - a preconfigured action available to the operator in response to the client's voice command, which switches the client to the next step of the dialog;

AS (АС) - automated system;

AWS (АРМ) - automated workstation;

ATS (АТС) - automatic telephone exchange;

DB (БД) - database;

DBMS (СУБД) - database management system;

SW (ПО) - software;

ASR (Automatic Speech Recognition) - automatic speech recognition;

IVR (Interactive Voice Response) - a system of pre-recorded voice messages that routes calls within a contact center;

NFS (Network File System) - a protocol for network access to file systems;

TCP/IP - Transmission Control Protocol (TCP), a protocol that controls data transmission, and Internet Protocol (IP), an internetwork protocol that describes the format of a data packet transmitted over a network;

TTS (Text-To-Speech) - speech synthesis.

In the claimed solution, the system for processing a client's voice request replaces a company service employee (a contact center operator, a shop assistant, a waiter in a restaurant, etc.) with an automatic service (with full automation) or an automated service (when the OSR service is used). The solution makes maximum use of modern speech recognition technology, so that operator effort is not required by default and a human is brought in only where strictly necessary.

The claimed solution uses the following technological solutions:

1. The recording of the conversation is processed in online mode and the start of the client's speech is detected.

2. The client's voice is recorded and a Voice Sample is created (VS - a fragment of the Client - System dialog that contains only the client's answer to the system's question).

3. The VS is passed to an operator for processing through the Operator Speech Recognition (OSR) service module.

Depending on the system settings, the VS is handed over immediately in online mode, or it may be processed offline if the ASR system failed to recognize the client's speech.

One mode of operation provides for the simultaneous use of the ASR system and the OSR service module.

To form the audio segment (VS), the following is performed: the start of the client's speech is detected;

the client's answer is isolated and the silence at the beginning and at the end of the utterance is trimmed.
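The trimming step can be illustrated with a minimal Python sketch. It assumes the audio is available as a sequence of normalized amplitude samples and uses a plain amplitude threshold as a stand-in for speech detection; the function name `trim_silence` and the default threshold of 0.02 are hypothetical and are not taken from the description.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[float], threshold: float = 0.02) -> List[float]:
    """Drop leading and trailing samples whose absolute amplitude is below `threshold`.

    Quiet samples between the first and the last loud sample are kept,
    so pauses inside the client's answer are preserved.
    """
    loud = [i for i, sample in enumerate(samples) if abs(sample) >= threshold]
    if not loud:
        # Nothing exceeds the threshold: the whole recording is treated as silence.
        return []
    return list(samples[loud[0]:loud[-1] + 1])
```

An empty result would be a natural place to raise something like the NoInput event defined above, although the description does not say how that event is actually detected.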

The VS is sent to the ASR system. The probability that the VS was recognized correctly is evaluated, and if the response carries a probability above the level configured in the system settings, the recognized text is passed to the Semantic service.

If the recognition level is below the configured threshold, the VS is passed to the OSR service module and is recognized by an operator. A short beep sounds in the operator's headphones, after which the VS is played back (at the original speed or N times faster, where N defaults to 1.3). The playback position indicator moves in step with the VS. Having listened to the VS, the operator clicks the button/row that matches the client's answer, or types text/digits, and sends the result back to the interaction server (CORE), which routes the data to the Voice Applications server according to the configured settings.
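The threshold decision and the effect of accelerated playback can be sketched in a few lines of Python. The function names are hypothetical, and treating a confidence exactly equal to the threshold as "send to OSR" is an assumption, since the description only covers the "above" and "below" cases.

```python
def route_after_asr(confidence: float, min_confidence: float) -> str:
    """Return "Semantic" when the ASR confidence exceeds the threshold, otherwise "OSR"."""
    return "Semantic" if confidence > min_confidence else "OSR"


def playback_seconds(vs_seconds: float, speedup: float = 1.3) -> float:
    """Time an operator spends listening to a VS played `speedup` times faster."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return vs_seconds / speedup
```

With the default factor of 1.3, a 2.6-second VS takes about 2 seconds to play back.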

In an alternative mode of operation, only the OSR service module is used. The audio segment (VS) is formed: the start of the client's speech is detected, the client's answer is isolated, and the silence at the beginning and at the end of the utterance is trimmed. The VS is sent to the OSR service module and is recognized by an operator. A short beep sounds in the operator's headphones, after which the VS is played back (at the original speed or N times faster, where N defaults to 1.3). The playback position indicator moves in step with the VS. Having listened to the VS, the operator clicks the button/row that matches the client's answer, or types text/digits, and sends the result back to the interaction server (CORE), which routes the data to the Voice Applications server.

In the claimed solution, the system for processing client voice requests to the company's service departments allows an operator to be involved in the following cases:

1. at a dialog step where the automatic speech recognition system cannot recognize the client's words;

2. only at critical dialog steps, where 100% certainty that the client's answer was recognized correctly is required (order confirmation, password/code word verification, etc.).

Unlike an ordinary Operator - Client dialog, this approach lets one operator handle up to 10-14 calls nominally at the same time, which reduces the cost of running the corresponding company service by up to 80%. In addition, the transcription results and the extracted semantic tags are stored for 100% of the received voice messages (both those processed by the automatic service and those processed by an operator), so the collected data can be used to improve the Semantic service.

As shown in Fig. 1, the hardware-software complex for automating client voice requests to service departments consists of server modules and client modules.

Server modules:

102. The interaction server (CORE) is responsible for the interaction of all components of the modules and submodules with one another and for passing requests, including requests to the statistics service. It receives requests from the Voice Applications server over the Media Resource Control Protocol (MRCP), passes calls to the OSR service (including processing and pre-building JSON forms), and receives recognition results from the OSR service module and routes them to MRCP. It calls the Semantic server to extract meaning from the recognized text.

104. The OSR (Operator Speech Recognition) operator service module is responsible for passing requests to the Operator AWS and receiving the responses. It manages operator registration (setting and clearing the Busy status), tracks operator status (break, ready, busy) and routes calls according to the operators' skill groups, their current load and the history of handled calls.

107. The Semantic service is responsible for extracting meaning (keywords) from the recognized text on the basis of a statistical model.

103. The statistics service (Statistics) is responsible for saving information about all stages of the dialog for later use in the Statistics AWS.

Client modules:

111. The operator's automated workstation (Operator AWS) is the operator's workplace, which receives requests from the MRCP service. On receiving a request, the AWS automatically opens a request window and plays the audio fragment. The operator can select the option that matches the audio fragment. The operator's answer is returned to the interaction server (CORE).

112. The Configurator AWS is the workplace of the administrator of the hardware-software complex. It is used to configure the operator's interface (OSR service module), the speech recognition parameters (ASR system), the speech synthesis parameters (TTS service), and the web-service settings for calling the Customer's systems.

113. The Statistics AWS is the workplace of the monitoring specialist. It provides reports based on the collected statistics.

The hardware-software complex also requires a technical environment consisting of the following modules:

101. The Voice Applications server (Voice XML) supports the core logic of the service and is adapted to the specifics of the Customer's operations. It is responsible for interaction with the Customer's IT systems, preliminary call processing, and detection of the start of speech using the VAD function and timeouts. It passes audio to the interaction server (CORE).

108. The Voice XML interpreter and MRCP client passes requests between the interaction server (CORE) and the Voice Applications server.

105. The ASR (Automatic Speech Recognition) system is responsible for interaction with speech recognition servers from various vendors, including Nuance ASR, Yandex Speech Kit, etc.

106. The TTS (Text-To-Speech) service is responsible for interaction with speech synthesis servers from various vendors, including TTS Nuance, TTS Yandex Speech Kit, etc.

The logic of interaction between the system components is as follows:

151. The client's voice request is routed through the Customer's IT systems to the Voice Applications server.

152. The Voice Applications server records the start of the call in the Statistics service and passes a request to process the audio stream/audio file to the CORE server.

The processing request includes:

the call identifier;

the URL of the audio stream/audio file;

the processing type (ASR/OSR/ASR+OSR);

the grammar for assigning semantic tags.
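The four request fields listed above could be modelled as follows. This is only an illustrative Python sketch: the class and field names, the JSON encoding and the example URL are assumptions, since the description names the fields but not their wire format.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class ProcessingType(Enum):
    ASR = "ASR"
    OSR = "OSR"
    ASR_OSR = "ASR+OSR"


@dataclass
class ProcessingRequest:
    call_id: str                      # call identifier
    audio_url: str                    # URL of the audio stream/audio file
    processing_type: ProcessingType   # ASR, OSR or ASR+OSR
    grammar: str                      # grammar used to assign semantic tags

    def to_json(self) -> str:
        """Serialize the request; the enum is written as its string value."""
        payload = asdict(self)
        payload["processing_type"] = self.processing_type.value
        return json.dumps(payload, ensure_ascii=False)


# Hypothetical example values, not taken from the description.
request = ProcessingRequest(
    call_id="call-001",
    audio_url="http://example.invalid/vs/call-001.wav",
    processing_type=ProcessingType.ASR_OSR,
    grammar="yes_no",
)
```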

153. The Voice XML interpreter and MRCP client forward the request to the CORE server.

154. Depending on the settings passed, the CORE server routes the audio stream/audio file for recognition:

to the ASR system (154) (the audio stream/audio file and the text recognition settings);

to the OSR service module (154’) (the audio stream/audio file, the text recognized by the ASR service, if available, and the name of the dialogue);

to the OSR service module and the ASR system simultaneously.
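The three routing options can be sketched as a lookup from the processing type to the list of recognition targets. This is a minimal sketch under assumptions: the function name and the normalization of the input string are hypothetical, and the source does not say how CORE represents these settings internally.

```python
def route_targets(processing_type: str) -> list:
    """Return the recognition targets for a processing type.

    The mapping mirrors the three options in the description; the
    string handling is an assumption made for this sketch.
    """
    routes = {
        "ASR": ["ASR"],
        "OSR": ["OSR"],
        "ASR+OSR": ["ASR", "OSR"],  # both targets receive the audio
    }
    normalized = processing_type.replace(" ", "").upper()
    if normalized not in routes:
        raise ValueError(f"unknown processing type: {processing_type!r}")
    return routes[normalized]
```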

155. The ASR system (155) processes the audio stream/audio file and produces an array of recognized text with a confidence level.

155’. The OSR service module (155’), upon receiving a request to process an audio stream/audio file, searches for and prioritizes free operators and routes the audio stream/audio file to the selected operator.

The result of processing an OSR/ASR request is:

- the semantics extracted from the audio recording;

- the audio recording of the dialogue with the client, transcribed by the operator.

If no operators are free, the OSR service module returns a response with the BUSY status to CORE.
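Operator selection with the BUSY fallback might look like the sketch below. The source does not state how free operators are prioritized, so the longest-idle rule used here is purely an assumption, as are the `Operator` fields and the `ROUTED` status name; only the BUSY status comes from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Operator:
    name: str
    is_free: bool
    idle_seconds: float  # hypothetical priority key, not from the source


def select_operator(operators: List[Operator]) -> Optional[Operator]:
    """Pick the free operator who has been idle longest (assumed rule)."""
    free = [op for op in operators if op.is_free]
    if not free:
        return None
    return max(free, key=lambda op: op.idle_seconds)


def handle_osr_request(operators: List[Operator]) -> dict:
    """Route to an operator, or answer BUSY when nobody is free."""
    chosen = select_operator(operators)
    if chosen is None:
        return {"status": "BUSY"}
    return {"status": "ROUTED", "operator": chosen.name}
```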

156. After the text has been recognized by the ASR system or the recording has been transcribed by the operator, the CORE server passes this information to the Semantic server to extract semantic tags.

If the operator used preconfigured dialogue responses while processing the audio recording, the OSR service module returns the semantic tags it has already extracted, and the Semantic server is not called.

157. The Semantic server extracts semantic tags from the transmitted text using the specified grammar.

158. After receiving the array of semantic tags from the Semantic server, the CORE server passes this information to the Voice Applications server, which determines the further steps in handling the dialogue with the client.

If no semantic tags were extracted or errors occurred while processing the audio stream/audio file, the CORE server sends one of the following event types: No Match, No Input, Error.
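How CORE might choose between these events can be sketched as follows. The source names the three events but not the exact conditions for each, so the conditions below (error first, then missing audio input, then an empty tag list) and the `Match` result name are assumptions made for illustration.

```python
from typing import List


def recognition_event(tags: List[str], had_input: bool = True,
                      error: bool = False) -> str:
    """Map a recognition outcome to an event name (assumed conditions)."""
    if error:
        return "Error"      # processing of the audio failed
    if not had_input:
        return "No Input"   # assumed: no speech was received
    if not tags:
        return "No Match"   # assumed: text did not yield semantic tags
    return "Match"          # hypothetical name for the success case
```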

159. Once the recognition results have been passed to the Voice Applications server over the MRCP protocol, they are also logged in the Logger service.

160. Based on its analysis of the text recognized from the client, the Voice Applications server can query the Customer's IT systems for additional information needed to respond to the client (account balance, order status, information about branches/stores, etc.).

161. The Customer's IT systems produce the information requested by the client.

162. If the information is dynamic and must be voiced by speech synthesis, the Voice Applications server sends a request to the TTS service (depending on the TTS vendor chosen by the Customer, the request goes directly or through the MRCP client).

163. The TTS service synthesizes speech from the given text and passes the resulting audio file to the Customer's IT systems for playback.

164. When processing of the client's voice call is complete, the Voice Applications server records the end of the dialogue on the Statistics server.

When recognition results need to be analyzed and the quality of the service monitored, the monitoring specialist searches for and analyzes audio recordings and recognition results through the web interface of the Statistics service.

Figures 2-5 show examples of interfaces for interacting with the system, in which the user can view the lists of configured dialogue scenarios and create a new dialogue scenario.

Dialogue Administration

To view the dialogues configured for processing by the Operator, select the "Dialogues" section on the main screen.

This window allows you to:

1. View the list of configured dialogues.

2. Create a new dialogue.

3. Modify an existing dialogue.

To create a new dialogue in the system, click the "Create" button at the top of the dialog box. In the dialog box that opens, specify the name of the dialogue; in the Promt section, enter the full text of the dialogue that is read to the client in the IVR; select a topic and click the "Create" button.

To edit a previously created dialogue, select the required entry in the list and follow the hyperlink to the dialogue view window. In the window that opens, click the "Edit" button. A dialog box for editing the dialogue parameters opens, in which you can:

- Change the name of the dialogue

- Change the description of the dialogue

- Select the topic to which the dialogue belongs from the list of available topics.

To create and edit the list of dialogue responses, select the required entry in the list of dialogues and follow the hyperlink. A dialog box opens that allows you to:

1. View and adjust the list of responses within the dialogue (add, change or delete);

2. Change the parameters of the dialogue (name, description and topic).

To create a response, click the "Create response" button in the upper right part of the dialog box. In the dialog box that opens, specify:

- the name of the response in Latin characters;

- the response display type. Available options: BUTTON (a button), ADDRESS (an address input field), TEXT (a text input field), NUMBER (a number input field), DATE (a date selection field);

- the description of the response that will be displayed to the operator.
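A response definition with these three fields could be represented as shown below. The five display types are taken from the description; the class names and the exact rule for a "Latin" name (ASCII letters, digits and underscores, starting with a letter) are assumptions of this sketch.

```python
import re
from dataclasses import dataclass
from enum import Enum


class DisplayType(Enum):
    """Display types listed in the description."""
    BUTTON = "BUTTON"
    ADDRESS = "ADDRESS"
    TEXT = "TEXT"
    NUMBER = "NUMBER"
    DATE = "DATE"


@dataclass
class Response:
    name: str                  # response name in Latin characters
    display_type: DisplayType  # how the response is shown to the operator
    description: str           # text displayed to the operator

    def __post_init__(self):
        # Assumed validation rule; the source only says "in Latin".
        if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", self.name):
            raise ValueError(f"name must use Latin characters: {self.name!r}")
```

For example, `Response("confirm_address", DisplayType.ADDRESS, "Client address")` is accepted, while a name written in Cyrillic raises `ValueError`.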

To edit a response, select the required entry in the list and click the "Edit" button.

To delete a response, select the required entry in the list and click the "Delete" button.

Figure 6 shows a variant of the operator's interface when a call arrives.

When a call arrives, a dialog box opens for the operator to select the client's response option.

The dialog box includes:

• Dialogue name (601) – this field displays the name of the dialogue from which the client's response was transferred to the operator. It is shown at the top of the dialog box.

• Remaining time (602) – this field displays the time available for the operator to respond, in milliseconds. If the operator has not responded when the allotted time expires, the dialog box closes automatically and a BUSY event is recorded in the system.

• Text input field (603) – used to enter the client's response in the current dialogue.

It is filled in when the dialogue calls for an extended response (one with many possible variants: names, addresses, full names) or a unique response (a comment, passwords and code words) from the client to the question asked.

• Local buttons (604) – move the dialogue with the client to the next step within the standard, preconfigured route.

The button names and transition logic are configured by the system administrator when the dialogue is created (see Dialogue Administration).

• Global buttons (605) – allow the dialogue with the client to be steered along a non-standard scenario:

moving the call several steps forward or backward;

playing a pre-recorded phrase in reply to the client's response/comment and returning to the same dialogue step;

playing a pre-recorded phrase in reply to the client's response/comment and ending the dialogue with a recorded event.

Global buttons are configured per topic and are the same for all dialogues within that topic.

• "Send" button – when pressed, the system records the client's response entered by the operator in the text field. The dialogue moves to the next step according to the configured route.
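The timeout behaviour of the Remaining time field (602) can be sketched as a small state check. The millisecond unit and the BUSY event come from the description; the function names, the `ANSWERED` and `WAITING` state names, and the idea of polling with an explicit clock value are assumptions of this sketch.

```python
def remaining_ms(deadline_ms: int, now_ms: int) -> int:
    """Milliseconds left for the operator, never negative."""
    return max(0, deadline_ms - now_ms)


def dialog_state(deadline_ms: int, now_ms: int, answered: bool) -> str:
    """Return the state of the operator dialog box at time now_ms."""
    if answered:
        return "ANSWERED"   # hypothetical name: operator responded in time
    if remaining_ms(deadline_ms, now_ms) == 0:
        return "BUSY"       # time expired, box closes, BUSY is recorded
    return "WAITING"        # hypothetical name: still counting down
```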

Figure 7 shows a variant of the monitoring specialist's interface.

The dialog box includes:

• Search and selection area for client voice calls (701) – calls can be searched here by the following parameters: time from/to, phone number, termination reason, and termination action.

• Search statistics (702) – the total number of calls found for the specified parameters, including a breakdown by termination reason and termination action.

• List of found dialogues (703) – the list shows the number from which the call was made, the date and time of the call, its duration, the termination method, and the termination action.

• Link to detailed information on the selected call (704).

• Call playback (705) – plays the audio recording of the client's call.

To select calls, the user can specify the following parameters:

- Application – one of the applications configured in the system, chosen from a list.

- Call time – the period within which the client's call took place.

- Phone number – the number from which the client called (only for calls made over a telephone line).

- Termination reason – the reason the conversation with the client ended.

- Termination action – the action recorded in the system when the conversation with the client ended.

After filling in the search parameters, click the "Find" button. The system will display:

Statistics for the search results: the total number of calls matching the entered criteria, and a breakdown by termination status.

A list of calls matching the entered criteria, showing:

- the application name

- the number from which the call was made (for calls made by telephone)

- the date and time of the call

- the duration of the call

- the termination method (termination reason)

- the termination action.
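The search and the accompanying statistics could be sketched as a filter followed by a count per termination reason. Everything concrete here is an assumption made for illustration: the record fields, the function names, and the sample data, including the `HANGUP` and `TRANSFER` reason values.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class CallRecord:
    application: str
    phone: str
    started_at: datetime
    duration_s: int
    end_reason: str
    end_action: str


def find_calls(calls: List[CallRecord], start: datetime, end: datetime,
               phone: Optional[str] = None,
               end_reason: Optional[str] = None) -> List[CallRecord]:
    """Keep calls inside the period that match the optional filters."""
    found = []
    for call in calls:
        if not (start <= call.started_at <= end):
            continue
        if phone is not None and call.phone != phone:
            continue
        if end_reason is not None and call.end_reason != end_reason:
            continue
        found.append(call)
    return found


def summarize(found: List[CallRecord]) -> dict:
    """Total number of calls plus a breakdown by termination reason."""
    return {"total": len(found),
            "by_reason": dict(Counter(c.end_reason for c in found))}


# Made-up sample data for the sketch.
calls = [
    CallRecord("support", "100", datetime(2020, 9, 30, 10, 0), 35, "HANGUP", "none"),
    CallRecord("support", "101", datetime(2020, 9, 30, 11, 0), 80, "TRANSFER", "operator"),
    CallRecord("support", "100", datetime(2020, 10, 2, 9, 0), 20, "HANGUP", "none"),
]
day = find_calls(calls, datetime(2020, 9, 30), datetime(2020, 10, 1))
stats = summarize(day)
```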

The following operations are available to the user in this interface:

- Listen to the whole selected client call by pressing the Play button in the Audio field.

- View detailed information on the selected call by following the hyperlink in the "Number" field for that call.

Figure 8 shows a variant of the monitoring specialist's interface with detailed information on a client's call.

The dialog box includes:

1. The list of dialogues within the selected call.

2. The following information for each dialogue:

a. Date and time of the dialogue;

b. Confidence level of the recognized text – takes values from 0 to 1, where 1 means a high degree of recognition accuracy and 0 means the text was not recognized;

c. Transcription – the recognized text of the client's response within the dialogue;

d. Result – the rule describing the program's next action after recognition;

e. Status – the result of comparing the recognized phrase with the application grammar;

f. Who answered – the source of the text recognition: the ASR service or an operator;

g. Dialogue stage – the name of the dialogue.

For the selected call, the user can:

- View the list of dialogues within the selected call;

- Listen to the selected dialogue.

The following information is displayed for each dialogue:

- Date and time of the dialogue;

- Confidence level of the recognized text – takes values from 0 to 1, where 1 means a high degree of recognition accuracy and 0 means the text was not recognized;

- Transcription – the recognized text of the client's response within the dialogue;

- Result – the rule describing the program's next action after recognition;

- Status – the result of comparing the recognized phrase with the application grammar;

- Who answered – the source of the text recognition: the ASR service or an operator;

- Dialogue stage – the name of the dialogue.
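The per-dialogue fields listed above map naturally onto a record type. In the sketch below the field meanings follow the description, while the class name, the field names, the string timestamp, and the sample values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class DialogueRecord:
    timestamp: str      # date and time of the dialogue
    confidence: float   # 0 (not recognized) to 1 (high accuracy)
    transcription: str  # recognized text of the client's response
    result: str         # rule describing the next action
    status: str         # outcome of matching against the grammar
    answered_by: str    # "ASR" or "operator"
    stage: str          # name of the dialogue

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if self.answered_by not in ("ASR", "operator"):
            raise ValueError("answered_by must be 'ASR' or 'operator'")


# Sample values are invented for illustration.
record = DialogueRecord("2020-09-30 10:00:05", 0.87, "check my balance",
                        "goto_balance", "MATCH", "ASR", "main_menu")
```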

These application materials present a preferred embodiment of the claimed technical solution, which should not be construed as limiting other, particular embodiments that do not go beyond the claimed scope of legal protection and are obvious to those skilled in the relevant art.

Claims

1. A system for automating customer voice calls to the company's service departments, containing:

an interaction server (CORE) configured to:

• interaction with the Voice Applications server via Voice XML interpreter and MRCP client;

• receiving from it an audio stream/audio file for transcription;

• selection of the processing method - by the automatic speech recognition system (ASR) or the operator service module (OSR) of the audio stream/audio file in accordance with the transferred settings;

• audio stream/audio file routing sequentially to the ASR system;

• processing the response from the ASR system and checking the level of trust in the transcribed text: if the trust level is higher than the minimum set, it transfers the text to the Semantic service for semantic tag extraction; if the trust level is lower than the minimum set, it routes the call to the OSR service module;

• receiving an array of transcribed text and semantic tags from the OSR service module;

• transferring the results of recognition of the client's voice request and selected semantic tags to the Voice Applications server;

an OSR server configured to:

• routing of customer requests to the Operator's workstation;

• transferring the results of processing requests to the interaction server;

• registration and selection of an operator to process the request;

Operator's workstation containing a web-interface for processing a client's voice request with pre-configured response templates and providing playback of a sound fragment (Voice Sample) to the Operator;

Workstation OSR Configurator containing a web interface for configuring the operator's workstation and the OSR service module;

a Semantic service configured to extract keywords from the transcribed text according to a given grammar transmitted by an interaction server (CORE) based on a customized statistical model;

a Logger service configured to log the results of recognition of voice calls, clients, and selected semantic tags;

statistics service (Statistics), made with the ability to save information about all stages of the dialogue:

date and time of the beginning of the session;

date and time of the end of the session;

URL of the audio stream/audio file;

Workstation of the Monitoring Specialist, containing a web interface for viewing reports on the operation of the system and monitoring the correctness of recognition of voice requests from clients.

2. The system according to claim 1, characterized in that the interaction server (CORE), depending on the level of criticality of the dialogue, routes the client call recognition function to the OSR service module without first contacting the ASR system, in which module the transmitted audio segment (VS) is listened to and the selection of the correct text recognition option is recorded, after which the OSR service module returns to the interaction server (CORE) an array of transcribed text and semantic tags.

3. The system according to claim 1, characterized in that the interaction server (CORE) routes the audio stream/audio file only to the ASR system, processes the response from the ASR system and checks the level of trust in the transcribed text, routes the text to the Semantic service to extract semantic tags when the trust level is higher than the minimum set, generates a negative response when the trust level is lower than the minimum set, and transfers the results of recognition of the client's voice request and the selected semantic tags to the Voice Applications server.

4. The system according to claim 1, characterized in that the interaction server (CORE) first sends the audio segment (VS) to the ASR system, and after receiving the results of automatic recognition to the OSR service module, where they listen to the audio segment (VS) and check / supplement the results of automatic speech recognition of the client, depending on the quality of automatic recognition, confirm the data of the ASR system, or make appropriate adjustments, after which the OSR service module returns to the interaction server (CORE) an array of transcribed text and semantic tags.

5. The system according to claim 1, characterized in that the interaction server (CORE) simultaneously sends an audio segment (VS) to both the ASR system and the OSR service module, if the first response comes from the OSR service module, then the result of text recognition and semantic tags from the OSR service module are transmitted to the Voice Applications server, if the first response comes from the ASR system, then the probability of text recognition is additionally checked, if it is greater than the specified level in the system, then the interaction server (CORE) sends the result of automatic recognition by the ASR system to the Voice Applications server, if the trust level is less than the specified level in the interaction server (CORE), then a response from the OSR service module is expected.
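The decision rule for this simultaneous mode can be sketched as a function of which response arrives first. The three outcomes follow the wording of the claim; the function name, the returned labels, and the handling of a confidence exactly equal to the threshold (treated here as not sufficient) are assumptions.

```python
def resolve_parallel(first_source: str, asr_confidence: float,
                     threshold: float) -> str:
    """Decide what CORE does with the first response in parallel mode."""
    if first_source == "OSR":
        return "USE_OSR"        # operator result goes to Voice Applications
    if first_source == "ASR":
        if asr_confidence > threshold:
            return "USE_ASR"    # automatic result is trusted
        return "WAIT_FOR_OSR"   # low confidence: wait for the operator
    raise ValueError(f"unknown source: {first_source!r}")
```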

6. The system according to claim 1, characterized in that after processing the client's speech and extracting semantic tags, the interaction server (CORE) accesses the customer's IT systems through the client's terminal and receives the text for speech synthesis, then processes the received text in the speech synthesis system (TTS) and returns to the client terminal an audio file with a synthesized message according to the information requested by the client.

7. A method for automating customer voice calls to the company's service departments, comprising the steps at which:

establish a connection using the client terminal via the Media Resource Control Protocol (MRCP) with the Voice Applications server and send a request containing the identifier (ID) of the dialogue and the audio stream;

perform pre-processing of the call using the Voice Applications server, determine the beginning of speech using the Voice Activity Detection (VAD) function and timeouts;

transfer the ID-dialog and a unique resource pointer (URL) to the audio stream/audio file (VS) to the interaction server (CORE), and also provide interaction with the Customer's systems;

receiving, via the interaction server (CORE) from the client terminal, a dialog ID, a unique resource pointer (URL) to the audio stream/audio file, and transmitting the audio stream/audio file and text recognition settings to the automatic speech recognition (ASR) system;

transcribing and assessing the probability of correct sound recognition using the ASR system;

returning, by means of the ASR system, to the interaction server (CORE) an array of transcribed text and a sound recognition confidence level;

evaluate using the interaction server (CORE) the level of confidence in the recognition of the vocal segment (VS);

at a trust level above the minimum set, the text and the required grammar are transferred to the Semantic service to extract semantic tags;

allocate semantic tags from the transferred text according to the specified grammar by the Semantic service;

if the trust level is lower than the minimum set, the call is routed to the OSR service module,

using the OSR service module, listening to the audio segment (VS) and fixing the choice of the correct text recognition option;

using the OSR service module, returning to the interaction server (CORE) an array of transcribed text and semantic tags;

transmitting by means of the interaction server (CORE) to the Voice Applications server an array of transcribed text and semantic tags;

using the interaction server (CORE) logging the recognition results in the Logger service;

record and store information about all stages of the dialogue:

date and time of the beginning of the session;

date and time of the end of the session;

URL of the audio stream/audio file;

8. The method according to claim 7, characterized in that, using the interaction server (CORE), the client call recognition function is routed to the OSR service module without first contacting the ASR system, while the transmitted audio segment (VS) is listened to in the OSR service module and fixing the choice of the correct text recognition option, using the OSR service module, returning to the interaction server (CORE) an array of transcribed text and semantic tags.

9. The method according to claim 7, characterized in that additionally:

using the interaction server (CORE) routing the function of recognizing the client's request to the ASR system;

evaluate using the interaction server (CORE) the level of confidence in the recognition of the voice segment VS;

at a trust level below the minimum set, a negative response is generated to the Voice Applications server.

10. The method according to claim 7, characterized in that additionally:

using the interaction server (CORE) to sequentially send the audio segment (VS) to the ASR system, and after receiving the results of automatic recognition to the OSR service module;

using the OSR service module, listening to the audio segment (VS) and checking the results of the client's automatic speech recognition;

confirm or correct in the OSR service module the results of automatic audio-to-text transcription using the ASR system;

using the OSR service module, they send an array of transcribed text and semantic tags to the interaction server (CORE).

11. The method according to claim 7, characterized in that additionally:

simultaneously sending the audio segment (VS) to both the ASR system and the OSR service module;

when a response is received from the ASR system or the OSR service module in the interaction server (CORE), the responses are evaluated in order of arrival as follows: if the first response comes from the OSR service module, then the recognition result and semantic tags from the OSR service are transmitted to the Voice Applications server; if the first response comes from the ASR system and the transmitted probability of text recognition is greater than the level specified in the interaction server (CORE), then the result of automatic recognition by the ASR system is transmitted to the Voice Applications server; if the first response comes from the ASR system and the probability of text recognition is less than the level specified in the interaction server (CORE), then a response from the OSR service module is awaited.

12. The method according to any one of claims 7-11, characterized in that additionally:

after the client's speech has been processed and the semantic tags extracted, the interaction server (CORE) accesses the Customer's IT system through the Voice Applications server and receives text for speech synthesis;

using the interaction server (CORE), the text received from the Customer's IT system is transferred to the speech synthesis system (TTS);

using the interaction server (CORE), an audio file with the synthesized message, containing the information requested by the client, is returned to the client terminal.
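
The response path of claim 12 reduces to two calls made in order. In the sketch below, `query_it_system` and `synthesize` are hypothetical stand-ins for the request through the Voice Applications server and for the TTS system; neither interface is defined in the claims.

```python
from typing import Callable, Sequence


def answer_client(
    tags: Sequence[str],
    query_it_system: Callable[[Sequence[str]], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    # Step 1: use the extracted semantic tags to obtain reply text
    # from the Customer's IT system.
    reply_text = query_it_system(tags)
    # Step 2: hand the text to the speech synthesis system (TTS) and
    # return the resulting audio for playback on the client terminal.
    return synthesize(reply_text)
```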

13. A computer-readable medium for automating customer voice calls to the company's service departments, containing processor-executable instructions that cause hardware to interact to perform the method according to any one of claims 7-12.

14. A computer-readable medium according to claim 13, characterized in that additionally:

the interaction server (CORE) routes the recognition function of the client's call to the OSR service module without first contacting the ASR system;

using the OSR service module, the operator listens to the transmitted audio segment (VS) and records the choice of the correct text recognition option;

the OSR service module returns an array of transcribed text and semantic tags to the interaction server (CORE).
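
The operator-only route described here is one of several routing modes that appear across the claims. They can be summarized in a small dispatch table. The `Mode` enum and `routing_plan` function are hypothetical names; the claims do not say how the mode is encoded in the parameters of the voice segment.

```python
from enum import Enum
from typing import List, Tuple


class Mode(Enum):
    """Hypothetical labels for the routing modes found in the claims."""
    ASR_ONLY = "asr_only"      # automatic recognition with a confidence check
    OSR_ONLY = "osr_only"      # operator only, ASR is not contacted
    SEQUENTIAL = "sequential"  # ASR first, then operator verification
    PARALLEL = "parallel"      # ASR and operator at the same time


def routing_plan(mode: Mode) -> List[Tuple[str, ...]]:
    """Return the stages of recognition; services in one stage run together."""
    plans = {
        Mode.ASR_ONLY: [("ASR",)],
        Mode.OSR_ONLY: [("OSR",)],
        Mode.SEQUENTIAL: [("ASR",), ("OSR",)],
        Mode.PARALLEL: [("ASR", "OSR")],
    }
    return plans[mode]
```

For the operator-only mode, `routing_plan(Mode.OSR_ONLY)` yields a single stage that contains only the OSR service module.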

15. A computer-readable medium according to claim 13, characterized in that additionally:

if the recognition confidence level is below the set minimum, a negative response is generated for the Voice Applications server.

16. A computer-readable medium according to claim 13, characterized in that additionally:

using the OSR service module, the operator listens to the audio segment (VS) and checks the results of the automatic recognition of the client's speech;

the results of the automatic audio-to-text transcription by the ASR system are confirmed or corrected in the OSR service module.

17. The computer-readable medium according to claim 13, characterized in that additionally:

upon receipt of a response from the ASR system or from the OSR service module, the interaction server (CORE) evaluates the order of the received responses as follows:

if the first response comes from the OSR service module, the recognition result and semantic tags of the OSR service module are transmitted to the Voice Applications server;

if the first response comes from the ASR system and the transmitted text recognition probability is greater than the level specified in the interaction server (CORE), the result of automatic recognition by the ASR system is transmitted to the Voice Applications server;

if the first response comes from the ASR system and the text recognition probability is less than the level specified in the interaction server (CORE), a response from the OSR service module is awaited.

18. A computer-readable medium according to any one of claims 13-17, characterized in that additionally:

after the client's speech has been processed and the semantic tags extracted, the interaction server (CORE) accesses the Customer's IT system through the Voice Applications server and receives the text for speech synthesis;