GB2368441A

GB2368441A - Voice to voice data handling system

Info

Publication number: GB2368441A
Application number: GB0026158A
Authority: GB
Inventors: Jonathan Paul Richings
Original assignee: Individual
Current assignee: Individual
Priority date: 2000-10-26
Filing date: 2000-10-26
Publication date: 2002-05-01
Also published as: GB0026158D0

Abstract

A voice to voice data handling system comprises a multiplicity of mobile, e.g. automobile-borne, sub-systems linked to a remote Internet Server by way of individual GSM and GPRS facilities 21, 23. Each sub-system has a hands-free facility 11 with a microphone 13 and a speaker 15. Each sub-system also has a 'ThinSR' facility 19 capable of recognizing a limited range of simple pre-programmed voice commands and otherwise of transmitting the command to the Server. A 'ThickSR' facility of the server, with greater power of command interpretation, responds, if it succeeds in recognizing the command, by causing the required information to be transmitted, through the Internet, to the relevant mobile sub-system.

Description

Voice Responsive Data Handling Systems

This invention relates to voice responsive data handling systems.

According to the invention, a voice responsive data handling system is constituted as set out in any one or more of the claims hereof, and the substance of those claims and their interdependencies is, notionally, set out at this place also.

The system is intended, primarily, to provide in a mobile (typically automotive) environment the means for voice control of such functions as navigation, personal information management, and Internet/intercom access.

It is particularly, though by no means exclusively, concerned to provide voice to voice communication. By "voice to voice communication" is meant responding to spoken commands with spoken responses.

Typically, the system consists of a local mobile processing sub-system with GSM communications and GPS and a remote host server.

The system is capable of providing, amongst other things: voice to voice navigation instructions, including traffic information and location-based services; voice to voice personal information management data, such as diary, address and telephone data; voice to voice email and speech-activated telephone dialling; and voice over IP.

The system advantageously relies on GPRS technology.

Although voice responsive data processing systems are becoming available in various applications, they are costly and, in mobile environments, suffer from space limitations and power requirements. This has limited the functionality of such mobile systems to simple voice dialling.

Systems in accordance with the present invention enable mobile local data processing systems of limited functionality to access remote speech recognition servers. They also allow speech control of server functions such as navigation, email, web browsing and voice mail.

In a mobile environment, and in a car environment in particular, voice control is advantageous. Head-down visual displays, e.g. dashboard panel displays, are often impractical, even hazardous, while the vehicle is in motion. Yet there are many circumstances, some of which are mentioned above, that call for the person in control of a vehicle to be able to communicate even while the vehicle is in motion.

Such applications have two requirements in common: access to Internet-based systems, and a sophisticated user interface. The user interface requirement of mobile applications is best met by a wholly speech-oriented interface. For the foreseeable future, at least, the capabilities of practical mobile devices are far from adequate to support voice control of such functions on their own.

By distributing speech recognition between local mobile and remote server data processing systems, as now proposed, simple commands may be handled locally, more complex demands being passed to the more powerful server for processing and communication back to the calling location.
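The local-first, server-fallback split can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions, not the patent's implementation: a text string stands in for the pre-processed speech, the local vocabulary `THIN_VOCABULARY` is hypothetical (the source names only DIAL as a simple command), and `fat_sr` is a placeholder callable standing in for the remote server.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    handled_by: str          # "local" or "server"
    command: Optional[str]   # recognised command, or None if not recognised


# Hypothetical small local vocabulary; only DIAL is named in the source.
THIN_VOCABULARY = {"dial", "redial", "hang up"}


def thin_sr(utterance: str) -> Optional[str]:
    """Stand-in for local recognition: succeeds only for the small fixed vocabulary."""
    word = utterance.strip().lower()
    return word if word in THIN_VOCABULARY else None


def handle(utterance: str, fat_sr: Callable[[str], Optional[str]]) -> Result:
    """Try the local recogniser first; otherwise pass the demand to the server."""
    local = thin_sr(utterance)
    if local is not None:
        return Result("local", local)
    # Not within local competence: forward to the (placeholder) remote recogniser.
    return Result("server", fat_sr(utterance))
```

Passing the server in as a callable keeps the local decision logic testable without a network; in the described system that call would travel over GPRS.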

The distributed processing is mirrored in the applications to be controlled by the voiced demands: simple functions, such as dialling a mobile phone, are handled by the local mobile data processor; more complex functions, such as navigation and route planning, are executed remotely by the server, and the results are communicated to the caller in speech.

The accompanying figure is a block schematic diagram showing a sub-system of a system in accordance with the invention. The sub-system is one of a number, typically a large number, of essentially similar sub-systems linked for communication with a common server by way of the Internet, and is consequently subject to Internet protocols.

The terms, and their abbreviations, used in the ensuing description for the component parts of the sub-system at the next subordinate level are set out below, together with a short note on their respective purposes and functions in relation to one another.

"Hands-Free Unit"comprises electronic means, commonly found in incar and office telephone systems, enabling telephone conversations to be held without the need for handset or headset. The speaker of the Unit is to be capable of being heard when several inches away from the user, and, likewise, the microphone is to be sufficiently sensitive to pick up the sound of the user's voice from a similar distance. The Unit often incorporates electronics to filter echoes and background noise that may be present.

"Codec" (Coder/Decoder) comprises means adapted to translate data in either sense between analogue audio and the digital equivalent.

"Voice-Over-Internet-Protocol (VOIP) comprises a secondary level of voice encoding that provides compatability with communication via the Internet, with the advantage of free access to long distance telephone calls.

"Thin Speech Recognition" ("Thin SR") comprises a logic sub-section adapted to recognize pre-programmed voice prompts. The term'thin' implies that the range of voice prompts is confined to a small number of short sounds. This limited voice recognition facility is a feature shared with various present day mobile phones. The process of recognizing a voice sample may be governed by various algorithms with which the system is programmed, the specific algorithms employed being varying according to specific speech recognition concepts implemented in the system. All speech recognition techniques rely on some form of mechanism to break the sample down into discrete analysable parts and it is this pre-processed data that is to be sent elsewhere in the system, a'FatSR', in the event that the'Thin SR'is unable to respond positively to the voice input. It is of secondary importance as to which algorithm is employed, what is important is that the preprocessing reduces the to the minimum the amount of data that needs to be passed for processing at the'FatSR', hereinafter referred to, and, so, to reduce the air-time data traffic.

"Fat Speech Recognition" ('FatSR') comprises, as implied above, a sub-system performing processes similar to those employed in'Thin SR', though with very much greater processing power and memory, this in order to enable it to recognize and to respond to a very much greater range of commands addressed to it.'FatSR'has the potential, also, to access huge independent data resources, databases especially, in order to respond appropriately. It is, in practice, an Internet connected server and, so, is capable of accessing many other resources such, for examples, as routing servers and on-line phone books, "Global System for Mobile Communication" ('GSM') is a communication network the role of which is given in the name of this network.

"General Packet Radio Service" ('GPRS') section of the system is a packet switched overlay subsystem of GSM. It offers both additional functionality of GSM hardware and additional capability of GSM network.

Its importance resides in enabling reliable non-voice data communication over the GSM.

"Text-to-Speech sub-system"refers to a state of art technology subsystem capable of taking a text string (such as may be written into a word processor) and converting this into synthesized voice audio, the

benefit of the'Text-to-Speech'facility being that, as a text string, the data occupies much less data storage (and, therefore,'airtime') than its audio, digital or analogue, equivalent.
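The storage advantage of text over audio can be made concrete with a little arithmetic. The sketch assumes uncompressed 8 kHz, 16-bit audio and that the example instruction takes about two seconds to speak; both figures, and the instruction itself, are assumptions for illustration.

```python
def text_bytes(text: str) -> int:
    """Size of an instruction sent as a UTF-8 text string."""
    return len(text.encode("utf-8"))


def pcm_bytes(seconds: float, rate_hz: int = 8000, bits: int = 16) -> int:
    """Size of the same instruction sent as uncompressed linear PCM audio."""
    return int(seconds * rate_hz * bits) // 8


instruction = "Turn left into the High Street"   # hypothetical example instruction
as_text = text_bytes(instruction)                 # 30 bytes
as_audio = pcm_bytes(2.0)                         # 32000 bytes for an assumed 2 s utterance
print(as_text, as_audio, as_audio // as_text)     # prints: 30 32000 1066
```

Even allowing for speech compression, which would shrink the audio figure by roughly an order of magnitude, the text form remains far smaller.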

The diagram shows the sub-system in its several next-subordinate component parts. As indicated, these comprise: a hands-free unit 11 linked with a microphone 13 and a speaker 15; a Codec 17; a 'ThinSR' facility 19; a GSM facility 21, which includes, as a part thereof, a GPRS facility 23; a VOIP facility 25; and a Text-to-Speech facility 27. Each of these component parts is linked with one or more of the others by data transmission paths as shown, and the GSM facility 21 is in communication with the remote System Server (not shown), which, as noted previously, is provided with the 'ThickSR' facility.

All of the facilities described above may be of state-of-the-art design.

So, for example, the hands-free unit 11 may be of a sort that is commonplace, particularly in in-car telephone systems. It comprises the electronics necessary to permit conversations to be conducted without the need for any sort of handset or headset. Typically, a hands-free unit, as 11, incorporates electronics (not shown) adapted to filter out echoes and background noise, these being unwanted side effects of spacing a suitably sensitive microphone 13 and a suitably powerful speaker 15 away from the user.

The hands-free unit 11 communicates with the Codec 17, where the analogue audio speech from the hands-free unit is digitized. In the example, the Codec 17 serves only in analogue-to-digital conversion; the term 'Codec' is commonly employed even where, as here, only one of its functions is exercised.

Digitized speech data is passed from the Codec 17 to data discriminator means 19, the 'Thin SR'. The role of the 'Thin SR' facility 19 is to recognize pre-programmed voice prompts. The appellation 'Thin' implies, in this application, that the ability of the discriminator means 19 is purposely limited to a small number of simple, particularly short, sounds. This feature is characteristic of the voice recognition capabilities of certain presently available mobile phones. 'Voice recognition' embraces a range of different processes, each characteristic of one of the various recognition techniques known to the art.
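The source does not say how the 'Thin SR' matches a sample against its pre-programmed prompts. One classic technique for small, user-trained vocabularies is template matching with dynamic time warping (DTW), sketched below on one-dimensional feature sequences as an assumed, swapped-in example. The templates, the threshold and the single-number features are hypothetical; the point of interest is that a best match worse than the threshold yields `None`, the "not recognised" outcome that would send the sample on to the server.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognise(sample, templates, threshold):
    """Return the label of the closest stored template, or None if even the
    best match exceeds the threshold (i.e. the sample is not recognised)."""
    best_label, best_dist = None, float("inf")
    for label, template in templates.items():
        dist = dtw_distance(sample, template)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else None
```

DTW tolerates differences in speaking rate: a time-stretched utterance such as `[1, 1, 3, 5, 5, 3, 1]` still matches a stored template `[1, 3, 5, 3, 1]` with zero distance.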

Speech recognition methods rely on characteristic algorithms by which speech samples are broken down into discrete analysable components. Central to the present invention is the determination, at the 'Thin SR' facility 19, of whether a sample can be processed locally, as 'Thin SR', or is to be passed by way of the GSM facility 21 for processing at a remote server, as 'Thick SR', as will be made clear hereinafter.

For the purposes of the invention, it is immaterial which particular recognition technique is employed. What must be borne in mind at all times in implementing the system of the present invention are the current hardware costs of processing speech samples locally in the mobile environment, e.g. in an automobile, as compared with the 'air time' costs incurred in processing such samples remotely at a 'Thick SR' Server.

Simple commands, such as DIAL, are recognizable by the discriminator means, the 'Thin SR' facility 19. Locally processed outputs from the 'Thin SR' facility 19 are passed by way of the GSM facility 21 and/or the GPRS facility 23, and the Internet, to the hands-free unit 11 of another, remote, sub-system (not shown) of the system, the digitized voice command being converted to its analogue equivalent by text-to-speech means, as 27. More complex commands are passed to the VOIP facility 25 and thence, by way of the GPRS facility 23 and the GSM facility 21, to the 'Fat SR' facility of the remote system Server. Commands interpreted at the Server produce an appropriate response, which is transmitted from the aerial of the Server's transmitter means to the aerial of the receiver means of the addressed sub-system. From there the response passes, by way of the GSM and GPRS facilities 21 and 23 and the Text-to-Speech facility 27 of the calling sub-system, to its hands-free facility 11, and the speaker 15 is activated accordingly.

As previously indicated, the system may be constituted as a remote processing, voice-controlled navigation system combining local with remote speech recognition. Local processing alone involves high local hardware costs and may be otherwise unacceptable for space and weight reasons. Split processing allows low local hardware costs and low space and weight penalties. For example, performing echo and noise reduction locally permits increased compression and so a reduction in transmitted data. Reducing command speech to phoneme level, where possible, would reduce transmission traffic further still.
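The relative sizes of possible uplink payloads for one spoken command can be compared with simple arithmetic. The sketch assumes a 1.5 s command, 8 kHz 16-bit PCM for raw audio, the 13 kbit/s GSM full-rate speech codec for compressed audio, and one byte per phoneme drawn from a small, hypothetical subset of ARPAbet symbols; the phoneme transcription of "navigate to Leeds" is approximate.

```python
import math

# Illustrative subset of ARPAbet symbols (an assumption; a full inventory has about 40).
INVENTORY = ["AE", "D", "EY", "G", "IH", "IY", "L", "N", "T", "UW", "V", "Z"]


def encode_phonemes(phonemes):
    """Pack a phoneme sequence as one byte per phoneme (its index in INVENTORY)."""
    return bytes(INVENTORY.index(p) for p in phonemes)


def payload_sizes(seconds, phonemes):
    """Bytes needed to send one spoken command at three levels of reduction."""
    raw_pcm = int(seconds * 8000 * 16) // 8      # 8 kHz, 16-bit linear PCM
    gsm_fr = math.ceil(seconds * 13000 / 8)      # GSM full-rate codec, 13 kbit/s
    phoneme_level = len(encode_phonemes(phonemes))
    return raw_pcm, gsm_fr, phoneme_level


# "navigate to Leeds", approximately: N AE V IH G EY T / T UW / L IY D Z
command = ["N", "AE", "V", "IH", "G", "EY", "T", "T", "UW", "L", "IY", "D", "Z"]
print(payload_sizes(1.5, command))   # prints: (24000, 2438, 13)
```

On these assumptions, phoneme-level transmission is two orders of magnitude smaller than even codec-compressed speech, at the cost of doing more recognition work on the mobile side.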

From another viewpoint, the invention encompasses the combination of remote and local navigation processing. Following appropriate spoken commands, a route (being a list of topographical details) may be calculated remotely at the server and the route list passed to a local data processing means. The local processor compares GPS data with the route list and activates the loudspeaker of the hands-free unit of the sub-system, thereby providing the driver with instructions as to the route to be taken. Various events prompt the route list to be recalculated; these events may represent deviations, local or remote, from the route list previously presented.
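The local comparison of GPS fixes with a server-supplied route list might look like the sketch below. It is a minimal illustration under several assumptions not found in the source: the route list is modelled as `(latitude, longitude, instruction)` waypoints, distances use the haversine formula on a spherical Earth, and the announce radius, deviation threshold and "moving away from the next waypoint" rule for requesting recalculation are all hypothetical choices.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres, assuming a spherical Earth of radius 6371 km."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


class RouteFollower:
    """Compares successive GPS fixes with a route list received from the server."""

    def __init__(self, route, announce_m=50.0, deviation_m=300.0):
        self.route = route              # list of (lat, lon, instruction text)
        self.next = 0                   # index of the next waypoint to reach
        self.announce_m = announce_m    # speak the instruction within this radius
        self.deviation_m = deviation_m  # tolerated drift away from the next waypoint
        self.closest = math.inf         # closest approach to the next waypoint so far

    def update(self, lat, lon):
        """Return ("say", text), ("recalculate", None) or ("none", None)."""
        if self.next >= len(self.route):
            return ("none", None)
        wlat, wlon, text = self.route[self.next]
        dist = haversine_m(lat, lon, wlat, wlon)
        if dist <= self.announce_m:
            self.next += 1
            self.closest = math.inf
            return ("say", text)        # hand the text to the text-to-speech facility
        self.closest = min(self.closest, dist)
        if dist > self.closest + self.deviation_m:
            # Moving steadily away from the next waypoint: treat as a deviation.
            return ("recalculate", None)
        return ("none", None)
```

A "recalculate" result would correspond to one of the events that, in the description, prompts the server to recompute the route list.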

Yet again, the invention extends, in its range of application, to the combination of remote and local personal information management planning. An example might be an address book that seamlessly combines locally and remotely derived information, incorporating, for instance, a local address book and an office address book together with information derived from such sources as Directory Enquiries.
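A seamless address book of this kind can be sketched as a layered lookup. The order of precedence (local, then office, then a remote directory), the lower-case name keys and the `directory` callable standing in for a server-side Directory Enquiries query are all assumptions made for illustration.

```python
from typing import Callable, Mapping, Optional, Tuple


def make_lookup(
    local: Mapping[str, str],
    office: Mapping[str, str],
    directory: Callable[[str], Optional[str]],
) -> Callable[[str], Tuple[Optional[str], Optional[str]]]:
    """Build a lookup that searches the local book, then the office book, then a
    remote directory. Keys in both books are assumed to be lower-case names."""

    def lookup(name: str) -> Tuple[Optional[str], Optional[str]]:
        key = name.strip().lower()
        for source, book in (("local", local), ("office", office)):
            if key in book:
                return source, book[key]
        number = directory(key)      # placeholder for a query forwarded to the server
        if number is not None:
            return "directory", number
        return None, None

    return lookup
```

Returning the source alongside the number lets the caller tell the user where an entry came from, while the user simply asks for a name.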

Other functionality (not necessarily voice-in/voice-out in character) provided by the distributed system to any of its many sub-systems might include control of in-vehicle systems, remote vehicle diagnostics, and service scheduling.

The functioning of the several sub-systems described above in the operation of the system is further explained by the following examples.

A. Local Voice Recognition Operation (Mobile Originated)

A simple operation may be to command the system to dial the on-board phone in preparation for a speech call. In this case, the user's voice command passes from the microphone 13 of the hands-free unit 11 to the Codec 17, where it is converted to its digital equivalent. The digitized voice command passes to the 'ThinSR' facility 19, where an attempt is made to recognize the command. If successful, the 'ThinSR' facility 19 controls the GSM facility 21 to dial the number entered on the on-board phone.

B. Remote Voice Operation (Mobile Operated)

If the 'ThinSR' facility 19 fails to recognize the voice command, the voice data (or a partially analysed equivalent) is passed to the GPRS facility 23 of the GSM facility 21 and transmitted from there to the remote Server incorporating the 'FatSR' facility, which, like the 'ThinSR' facility 19, attempts to recognize the voice command. If this is successful, the appropriate response is made, depending on the command, and the on-board Text-to-Speech facility 27 activates the on-board speaker 15 accordingly.

C. Remote Voice Response Operation (Server Operated)

In this mode, if the 'FatSR' facility of the remote Server recognizes a command to provide, for example, relatively complex navigation data for the attention of the user, the recognized command is passed to a mapping server facility (not represented). The mapping server responds by computing appropriate vehicle routing information and passes it back to the 'FatSR' facility in the form of a multiplicity of text strings constituting routing instructions. From the 'Fat SR' facility, the routing instructions are transmitted by way of the GPRS facility 23 of the GSM facility 21 to the text-to-speech sub-system 27 for translation, as mentioned before, and relayed to the user by means of the on-board speaker arrangement 15.
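Carrying a multiplicity of text strings in a single data message needs some framing. The source specifies no message format, so the sketch below assumes a hypothetical one: a 2-byte count, then each instruction as a 2-byte big-endian length followed by its UTF-8 bytes. The device would unpack the strings and pass them one at a time to the text-to-speech facility.

```python
import struct


def pack_instructions(instructions):
    """Hypothetical wire format: 2-byte count, then for each string a 2-byte
    big-endian length followed by its UTF-8 bytes."""
    out = bytearray(struct.pack("!H", len(instructions)))
    for text in instructions:
        data = text.encode("utf-8")
        out += struct.pack("!H", len(data)) + data
    return bytes(out)


def unpack_instructions(payload):
    """Inverse of pack_instructions, as run on the mobile sub-system."""
    (count,) = struct.unpack_from("!H", payload, 0)
    offset = 2
    result = []
    for _ in range(count):
        (n,) = struct.unpack_from("!H", payload, offset)
        offset += 2
        result.append(payload[offset:offset + n].decode("utf-8"))
        offset += n
    return result
```

Length prefixes avoid any need to escape separator characters inside the instructions; with two short instructions such as "Turn left" and "Continue for 2 miles" the whole message is only 35 bytes.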

Alternatively, if the 'ThinSR' facility 19 is not able to interpret a request to call a certain telephone or like address because that address is not stored locally, the greater resources of the 'FatSR' facility at the remote Server can be deployed, using its own information or information retrieved from an external database. The data so retrieved are communicated, as before, by way of the GPRS facility 23 of the GSM facility 21, ultimately for translation at the text-to-speech facility 27 and voice-synthesized reproduction at the speaker 15. The user, having been provided with the requisite information, can then establish connection with the desired address by way of the GSM network.

D. VOIP Voice Communication

As an alternative to normal GSM voice communication, voice data might be routed to the VOIP facility 25, then transmitted via the GPRS facility 23 of the GSM facility 21 to the Internet and, finally, to a VOIP facility of another telecoms receiving system. Incoming VOIP-encoded voice data is, similarly, routed back through the GPRS and VOIP facilities and out to the speaker by way of the hands-free sub-system.
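The source does not name a VOIP protocol. As an assumed example, the sketch below frames voice data the way RTP (RFC 3550) does: a 12-byte fixed header carrying version, payload type, sequence number, timestamp and source identifier, here using the static payload type 3 that RFC 3551 assigns to GSM full rate, whose 20 ms frames occupy 33 bytes and 160 ticks of the 8 kHz clock. The helper names are hypothetical and the payloads in the usage are dummy bytes, not real GSM frames.

```python
import struct

RTP_VERSION = 2
PT_GSM = 3               # static RTP payload type for GSM full rate (RFC 3551)
SAMPLES_PER_FRAME = 160  # one 20 ms GSM frame at the 8 kHz RTP clock


def rtp_packet(payload, seq, timestamp, ssrc, pt=PT_GSM, marker=False):
    """Prefix a codec frame with the 12-byte fixed RTP header (no CSRCs or extensions)."""
    byte0 = RTP_VERSION << 6                     # padding=0, extension=0, CSRC count=0
    byte1 = (int(marker) << 7) | (pt & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def packetise(frames, ssrc, first_seq=0, first_ts=0):
    """One packet per frame, with consecutive sequence numbers and timestamps."""
    return [rtp_packet(f, first_seq + i, first_ts + i * SAMPLES_PER_FRAME, ssrc)
            for i, f in enumerate(frames)]


def parse(packet):
    """Split a packet back into its header fields and payload."""
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {"version": byte0 >> 6, "pt": byte1 & 0x7F, "seq": seq,
            "ts": ts, "ssrc": ssrc, "payload": packet[12:]}


def reorder(packets):
    """Receiver side: restore sending order by sequence number (ignores 16-bit wrap-around)."""
    return sorted(packets, key=lambda p: parse(p)["seq"])
```

Sequence numbers let the receiving VOIP facility restore the order of packets that arrive out of order over the packet-switched path, and timestamps let it play each frame at the right moment.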

Claims

1. A voice responsive data handling system which comprises: a multiplicity of sub-systems each comprising data processing means (hereinafter each referred to as "local data processor") being data processing means competent to respond to data derived from any voice input demand within a limited range of such demands; common to all of said sub-systems, second data processing means (hereinafter referred to as "the server"), being data processing means competent to respond to data derived from voice input demands being demands within a range of demands not within the competence of the local data processors, or any of them, for processing; a multiplicity of transmitter means respectively associated with said multiplicity of sub-systems; a multiplicity of receiver means respectively associated with said multiplicity of sub-systems; and a multiplicity of logic means respectively associated with the several said sub-systems; and in which each said logic means is operative to cause the associated local data processor to process voice input demands, or components thereof, whenever said demands, or components thereof, as the case may be, are within the competence of the local processor for processing, and otherwise to cause said associated transmitter means to transmit said voice-derived demands or components thereof to receiver means of said server for processing thereby, and the transmission of the processed result from the server to the receiver means of the sub-system from which a voice demand to said remote data processor emanated.
2. A voice responsive data processing system as claimed in claim 1 in which the logic means of each sub-system incorporates individual means to discriminate between voice demands, or components thereof, within the processing competence of the local data processor associated with the sub-system and those that are not, and to cause said demands or demand components to be routed to said local processor or to the server, as the case may be.
3. A voice responsive data processing system as claimed in claim 1 or 2 in which: said multiplicity of sub-systems are transportable; communication between the sub-systems or any of them and the server is by wireless transmission; and each of the several said sub-systems comprises: a hands-free facility; a CODEC facility; a GSM facility incorporating a GPRS facility; a VOIP facility; a 'Thin SR' facility; and a Text-to-Speech facility, all of the said facilities being as described and/or as defined hereinbefore, and being linked each with one or more of the other said facilities by way of communication paths in an arrangement substantially as hereinbefore described with reference to the accompanying drawing.
4. A voice responsive data processing system substantially as hereinbefore described with reference to the accompanying drawing.