WO2000062222A1 - Interactive voice unit for giving instruction to a worker - Google Patents
- Publication number
- WO2000062222A1 (PCT/US2000/010143)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- worker
- state
- utterance
- work state
- utterances
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B19/00—Programme-control systems
- G05B19/02—Programme-control systems electric
- G05B19/418—Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/35—Nc in input of data, input till input file format
- G05B2219/35453—Voice announcement, oral, speech input
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/35—Nc in input of data, input till input file format
- G05B2219/35495—Messages to operator in multimedia, voice and image and text
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Definitions
- This invention relates generally to a method and apparatus for using speech recognition in a directed resource application, for example, in a production, picking, or assembly line environment and, more specifically, to a method and apparatus for increasing the efficiency and operability of a speech recognition application in such a directed resource application environment.
- Due to advancements in computer technology and software programming techniques, in conjunction with a continuously growing understanding of the mechanics and characteristics of speech, speech recognition applications have made tremendous strides in acceptance and usage.
- In a conventional software program operating on a computer and using speech recognition, the sounds, words, or sentences uttered by a person are detected, and one or more electrical signals representative of the sounds, words, or sentences are created and used by the computer to control or guide the software program.
- speech recognition applications are now available for allowing people to dictate letters, memos, etc. directly into a computer, for performing speaker identification, and for assisting in inventory management and shipping.
- Some examples of currently available commercial speech recognition software include the Dragon Dictate™ software and the Dragon NaturallySpeaking™ software.
- Conceptually, speech recognition is fairly straightforward. That is, a speaker utters a word, phrase, or sentence into a microphone. A signal processing subsystem extracts acoustic information from the uttered word, phrase, or sentence that exhibits characteristics consistent with human language. A speech recognition subsystem then finds the best match between the extracted acoustic information and electronically stored representations of the acoustics of known words or phrases. The speech recognition subsystem then produces a text version of the verbal utterances. In practice, however, accurate and reliable speech recognition is considerably more complicated.
- The digital electrical signal produced from the utterance is then processed or decoded to determine the word, phrase, or sentence uttered by the user.
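- As a rough illustration of that best-match step, the sketch below compares a vector of extracted acoustic features against stored reference templates and picks the closest known word. The feature vectors, the template store, and the simple distance measure are illustrative assumptions only; they are not the decoding method this description itself relies on.

```python
# Minimal sketch of the "find the best match" step in speech recognition.
# Feature vectors and templates here are made-up placeholders; a real system
# would use spectral features and statistical acoustic models.
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Electronically stored representations of the acoustics of known words.
templates = {
    "gotit":  [0.9, 0.1, 0.4],
    "ready":  [0.2, 0.8, 0.5],
    "logoff": [0.5, 0.5, 0.9],
}

def recognize(features):
    """Return the stored word whose template is closest to the extracted features."""
    return min(templates, key=lambda word: distance(features, templates[word]))

print(recognize([0.85, 0.15, 0.45]))  # -> "gotit"
```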
- Speech recognition can be applied in a production, assembly line, picking, manufacturing, or other directed resource environment to increase the efficiency of workers.
- Such environments are typically very noisy, however, and this generally reduces the accuracy of speech recognition applications. That is, the noisier the work environment, the more likely it is that a word, phrase, sentence, or other utterance will be decoded inaccurately.
- the accuracy of speech recognition may also be limited by the different speech and enunciation patterns of workers and by the number of different workers. Even a specific user may pronounce the same word, phrase, or sentence two or more different ways at different times, particularly if the environment that the worker is in changes.
- For example, the worker may pronounce a sentence or utterance one way in a quiet setting. But if the work environment for the worker is noisy, the worker may speak loudly or even shout when pronouncing the sentence or utterance, thereby possibly changing the spectral characteristics of the sentence or utterance. Accommodating such acoustic variations between speech instances is a challenge for automatic speech recognition systems.
- Workers in a manufacturing, picking, assembly line, production or other directed resource environment are directed by a list of tasks they must perform.
- the workers are, therefore, resources that are directed by a centralized controller or other system that provides instructions to the workers about the task(s) to be completed by the workers.
- the workers must then somehow report on the completion of their tasks.
- the list of tasks to be performed by a worker takes the form of a paper check list, which the worker consults to get the next task and marks to show the task completion status.
- In a picking application, for example, where a worker fulfills product orders by picking selected products or items from an assortment of bins, a conveyer belt, etc., the worker reads a product order from a list and marks the list to show how much of the order was filled.
- this list may be computerized, so that a database for tracking the completion of tasks (for example, fulfillment of orders and resulting inventory changes) can be updated automatically. But, if the worker needs to use his or her hands or eyes to perform the tasks, using a list requiring the use of hands or eyes (computerized or not) is at best an interruption of work and at worst a significant distraction.
- speech recognition and/or speech synthesis may be used as a component of the interaction between the user and the task list. But unless the interface is truly speech-centered, using well-designed, robust, and natural speech interaction dialogs, the interface may continue to cause distractions or other interruptions for the worker. Directed resource applications typically, but not necessarily, involve locally mobile workers in a specific work environment.
- Locally mobile workers are those workers who need to move around to do their job, but generally stay within a local area, a single building for example.
- Another hypothetical example of a directed resource application is a system providing nurses with instructions for giving medication to patients in a hospital. A nurse making rounds and seeing patients could request and receive work in the form of medications required by the patient being visited.
- When this work is completed, the system would update a database of when patients received their medications and the types and amounts of medications received. Nurses often need to be locally mobile, with both hands and eyes free to dispense medications to patients. The system would also need a pause feature so that the nurse can interact with the patient verbally.
- Despite these advances in speech recognition technology, there remains a need for an accurate and reliable system for using speech recognition in a manufacturing, picking, assembly line, production, or other directed resource environment and for providing a speech-centered user interface for directed resource applications using locally mobile workers.
- Ideally, the system will also increase the efficiency and accuracy of a worker in the environment while allowing the worker to use phrases, terms, sentences, or other utterances that are natural for the worker and that make intuitive sense to the worker.
Disclosure of Invention
- Another object of the present invention is to provide a method and apparatus for speech- centered user interaction in a directed resource application.
- Yet another object of the present invention is to provide a method and apparatus for increasing the accuracy of speech recognition in a directed resource application or where a locally mobile worker receives one or more tasks to complete.
- a system in which a worker can operate as a directed resource in one or more states and transition from one state to another by making one or more audible statements includes a server and a client device that can communicate with the server, the server capable of providing information to the worker via the client device and allowing the worker to function in an obtain work state in which the worker receives one or more tasks from the server via the client device to be completed by the worker, a process work state in which the worker completes said one or more tasks, and a report work state in which the worker reports status information to the server via the client device.
- a directed resource model for a worker wherein instructions are provided to the worker from a server via a client device, includes an obtain work state in which the worker receives one or more tasks to be completed by the worker from the server via the client device, a process work state in which the worker completes said one or more tasks, and a report work state in which the worker reports status information to the server via the client device, further wherein said worker can initiate transition from the obtain work state to the process work state by providing a first audible input signal to the client device and the worker can initiate transition from the process work state to the report work state by providing a second audible input signal to the client device.
- a work model for a worker wherein instructions are provided to the worker from a server via a client device, includes a first state of allowed worker activity, a second state of allowed worker activity, a third state of allowed worker activity, a first utterance that transitions the worker from the first state of allowed worker activity to the second state of allowed worker activity when the worker provides the first utterance to the client device, and a second utterance that transitions the worker from the second state of allowed worker activity to the third state of allowed worker activity when the worker provides the second utterance to the client device.
- a method for directing a worker to complete one or more tasks includes establishing a plurality of allowed states of activity for the worker and establishing a group of at least one allowed utterance for use by the worker for each of the allowed states, wherein at least one utterance in at least one group of at least one allowed utterance is a transition utterance that allows the worker to transition from the one of the allowed states to another of the allowed states.
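- The method just summarized can be pictured as a per-state table of allowed utterances, some of which are transition utterances. The sketch below is an assumed data layout for illustration; the state names follow this description, but the particular utterances listed in each group are examples rather than a prescribed vocabulary.

```python
# Sketch: allowed states, their allowed utterance groups, and transition utterances.
# The mapping of example utterances to states is illustrative, not prescriptive.
ALLOWED_STATES = {
    "obtain_work": {
        "utterances": {"ready to go", "begin work", "log off"},
        # utterance -> state it transitions the worker to
        "transitions": {"ready to go": "process_work",
                        "begin work": "process_work",
                        "log off": "logoff"},
    },
    "process_work": {
        "utterances": {"gotit", "log off"},
        "transitions": {"gotit": "report_work", "log off": "logoff"},
    },
    "report_work": {
        "utterances": {"give me more work", "log off"},
        "transitions": {"give me more work": "obtain_work", "log off": "logoff"},
    },
}

def next_state(current_state, utterance):
    """Apply a decoded utterance: either transition or stay in the current state."""
    group = ALLOWED_STATES[current_state]
    if utterance not in group["utterances"]:
        raise ValueError(f"'{utterance}' is not allowed in state {current_state}")
    return group["transitions"].get(utterance, current_state)

print(next_state("process_work", "gotit"))  # -> "report_work"
```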
- Figure 1 is a block diagram of a system for using a speech recognition application to monitor, provide information and instructions to, and receive requests and statements from workers;
- Figure 2 is a general state diagram for a speech recognition application that can be used with the system of Figure 1;
- Figure 3 is a more specific implementation of the state diagram of Figure 2.
Best Mode for Carrying out the Invention
- As shown in Figure 1, a system 30 for implementing speech recognition in a directed resource application or environment includes a local application computer or server 32 on which an application using speech recognition or a speech-centered interface may operate, and remote units or client devices 34, 36, 38, 40, 42 that interact with the local application server 32 and which may be used or worn by users or workers so that instructions and/or information can be sent from the local application server 32 to the users or workers via the remote units 34, 36, 38, 40, 42 and so that the workers can send information and/or requests to the local application server 32 via the remote units 34, 36, 38, 40, 42.
- the remote units 34, 36, 38, 40, 42 and the local application server 32 form a client/server network that allows workers located at or wearing the remote units 34, 36, 38, 40, 42 to interact or communicate with the local application server 32.
- the system preferably uses speech recognition or a speech recognition interface to allow workers located at or wearing the remote units 34, 36, 38, 40, 42 to provide audible input signals to the remote units 34, 36, 38, 40, 42. That is, workers can "speak” or provide audible utterances to the remote units 34, 36, 38, 40, 42 to provide input signals to the remote units 34, 36, 38, 40, 42.
- The audible input signals from the workers can either be decoded by the remote units 34, 36, 38, 40, 42 themselves or be transmitted or passed by the remote units 34, 36, 38, 40, 42 to the local application server 32 for decoding by the local application server 32.
- If an utterance is decoded locally, the remote unit can pass or transmit the corresponding decoded signal, or some equivalent, interpretation, or translation of it, on to the local application server 32.
- the speech input signals from a worker are decoded at the local application server 32 so as to reduce processing and memory requirements in the remote units 34, 36, 38, 40, 42 being used or worn by the worker.
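- A minimal sketch of this client/server split, in which the remote unit only captures and forwards audio while the application server decodes it, is shown below. The class names, the simulated decoder, and the in-process call standing in for a wireless link are assumptions made purely for illustration.

```python
# Rough sketch of the client/server split: the remote unit only captures and
# forwards audio; the local application server performs the decoding.
class LocalApplicationServer:
    def __init__(self, vocabulary):
        self.vocabulary = vocabulary      # utterances the server will accept

    def decode(self, audio_samples):
        """Stand-in for server-side speech decoding (here: a trivial lookup)."""
        text = audio_samples.get("simulated_transcript", "")
        return text if text in self.vocabulary else None

class RemoteUnit:
    def __init__(self, server):
        self.server = server

    def capture_and_forward(self, audio_samples):
        """Forward raw audio to the server; keep no application knowledge locally."""
        return self.server.decode(audio_samples)

server = LocalApplicationServer(vocabulary={"gotit", "give me more work"})
unit = RemoteUnit(server)
print(unit.capture_and_forward({"simulated_transcript": "gotit"}))  # -> "gotit"
```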
- Remote units such as the remote units 36, 38, 40
- Other remote units such as the remote units 34, 42
- The remote units 34, 36, 38, 40, 42 work as general devices in the system 30 that do not require or need any specific knowledge or information regarding the applications operating on the local application server 32 or the enterprise server 44.
- the system 30 can be used in or with a variety of applications that can be changed by updating the local application server 32 and/or the enterprise server 44 without usually requiring changes or updates to the software operating on the remote units 34, 36, 38, 40, 42.
- the system 30 may also include an enterprise system or server 44 which monitors and/or controls operation of the environment in which the local application server 32 is operating and which may provide information or instructions to the local application server 32 to govern or influence operation of the application(s) running or otherwise operating on the local application server 32.
- the enterprise server 44 may be a computer or computer system that monitors and controls the operation of an assembly line.
- the local application server 32 may be used to monitor and provide information or instructions to workers working in a particular part of the assembly line.
- the local application server 32 provides localized control, monitoring, instructions, or information to the workers while the enterprise server 44 may monitor and control a larger number of workers via a plurality of local application servers and/or other local enterprise servers.
- the workers form a directed resource for the local application server 32 and/or the enterprise server 44 such that the workers can receive tasks from the local application server 32 and/or the enterprise server 44 to be completed by the workers.
- the enterprise server 44 may also be connected to other network resources 46 so that information and instructions can be passed to and from the enterprise server 44.
- the network resources 46 may include a database or database server, a file server, a log server, etc.
- the enterprise server 44 may be connected to the local application server 32 and/or the other network resources 46 via a direct connection, wide area or local area network, or any other kind of computer network or cellular, radio, or wireless network.
- the local application server 32 is not illustrated as connected to the network resources 46 other than through the enterprise server 44.
- the local application server 32 may be connected to the network resources 46 in other ways without departing from the scope or essence of the present invention.
- the local application server 32 may also be connected or networked to the other network resources 46.
- a significant advantage of the system 30 of the present invention is that the system 30 allows communication to and from workers from and to a central monitor and control server, such as a local application server or enterprise server, such that accuracy and efficiency of individual workers and groups of workers are increased and such that the workers can function as locally mobile workers or directed resources.
- Such efficiency improvement is a direct result of incorporating speech recognition and speech synthesis techniques into a work environment in a way that is natural for the workers and makes intuitive sense to the workers while providing sufficient decoding accuracy for different pronunciations of words, terms, and phrases to be used or otherwise uttered by the workers as input signals to a remote unit.
- a significant feature of the system 30 is that the worker is preferably allowed to conduct activity only in a finite number of operating states. Each individual operating state is directed to one or more subsets of functions or tasks to be performed by the worker while the worker is in the individual state.
- the interaction between the worker and the system 30 is preferably conducted through speech and audio prompts, i.e., a speech centered interface, such that the worker's eyes and hands are available or free for other tasks.
- the speech prompts provided to a worker may be provided by text-to-speech conversion by a remote unit of a signal or information sent or communicated to the remote unit from the local application server 32.
- the worker uses a limited or constrained set of allowed audible input signals or utterances to transition between operating states and to provide information and requests via a remote unit to the local application server 32 while the worker is operating in a particular state.
- Each of the input signals or utterances allowed for each state preferably has a high probability of being decoded accurately by the remote unit or the local application server 32.
- the audible input signals or utterances also preferably make intuitive sense to the workers and naturally tie in to the operating states.
- Each state may have its own set of allowed utterances that differ from other states' sets of allowed utterances and which will be accepted or recognized by the local application server 32 for the state.
- the worker is locally mobile and guided as a directed resource by the local application server 32 through the receiving, performance, completion, and reporting of tasks provided or communicated to the worker from the local application server 32 via a remote unit or other client device.
- A system using speech recognition, such as the system 30, can be used in a production, assembly line, picking, manufacturing, or other work or directed resource environment or operation to increase the efficiency of workers.
- voice or speech recognition may be used by the system 30 to receive or accept replies or statements from the worker or requests from the worker for new instructions.
- information or instructions may be given to the worker by a remote unit visually or audibly through speech synthesis or artificially created oral commands. Such instructions to the worker might include what items to pick and how many of each item to pick.
- the instructions would be generated by the local application server 32 and communicated to the worker via a remote client device, such as the remote client 34.
- the worker might wear a headset with ear phones and a microphone, which form part of the remote unit 34, so that the interaction between the worker and the system 30 is conducted primarily with speech and audio prompts, thereby leaving the worker's hands and eyes free.
- the remote unit preferably communicates directly or via wireless transmission to a central computer or server, such as the local application server 32, which provides information and instructions to the worker via the remote unit and receives requests for information or instructions from the worker provided as audible inputs to the remote unit.
- workers may be positioned at or even wearing the remote units or clients 34, 36, 38, 40, 42.
- the local application server 32 may provide information and instructions to the workers via the remote units or clients 34, 36, 38, 40, 42 and receive audible requests for information or instructions from the workers via the remote units or clients 34, 36, 38, 40, 42.
- the allowed operational states may include a first state where the worker is obtaining or receiving work or tasks via a remote unit from the local application server, a second state where the worker is processing or completing such work or tasks, and a third state where the worker is reporting via the remote unit to the local application server on the worker's progress.
- Each of the three states may have a different set of allowed audible input signals or utterances that the worker can use for communication to the local application server 32 via the remote unit. More specifically, in the first state, the local application server 32 might provide one or more instructions to the worker via a remote unit to tell the worker what items to select from a moving conveyer belt or bin(s). After receiving the instructions while in the first state, the worker may transition from the first state to the second state to complete the assigned instructions. Once the worker has completed the instructions while in the second state, the worker may transition from the second state to the third state by providing an audible input to the remote unit to signal such completion by the worker to the local application server 32 and to signal transition to the third state.
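- A compressed walk-through of that three-state cycle for a single picking order might look like the sketch below. The prompts, the simulated decoding of worker replies, and the specific transition utterances are assumptions chosen to mirror the examples given elsewhere in this description.

```python
# Sketch of one pass through the obtain-work / process-work / report-work cycle.
# Prompts and utterances are illustrative; decoding is simulated by reading text,
# and the worker_replies list is assumed to contain enough replies for one order.
def run_order_cycle(order, worker_replies):
    state = "obtain_work"
    replies = iter(worker_replies)
    while True:
        if state == "obtain_work":
            print(f"Server -> worker: pick {order['quantity']} of {order['item']}")
            if next(replies) in ("ready to go", "begin work"):
                state = "process_work"
        elif state == "process_work":
            # The worker physically picks the items, then confirms completion.
            if next(replies) == "gotit":
                state = "report_work"
        elif state == "report_work":
            print("Worker -> server: order reported complete")
            return

run_order_cycle({"item": "part 1234", "quantity": 3},
                ["ready to go", "gotit"])
```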
- the worker may also place the completed order of selected items on a conveyer belt that will deliver them to a loading dock for shipment.
- the worker may transition back to the first state and the local application server 32 may provide one or more additional or new instructions or tasks to the worker via the remote unit to allow the worker to pick new items and fill a new order. Since the worker is preferably wearing the remote unit, such that the worker is mobile and can easily use his or her arms and hands, all commands and statements from the local application server 32 are preferably provided audibly via ear phones to the worker and the input from the worker to the remote unit is preferably provided by the worker into a microphone that forms part of the remote unit.
- the utterances made by the worker can be decoded either by the remote unit or by the local application server 32.
- the local application server 32 may send and/or receive information from the enterprise system or server 44. Based on information received from the enterprise server 44, the local application server 32 may modify instructions or information previously sent to workers at the remote units 34, 36, 38, 40, 42, send new instructions or information to workers at the remote units 34, 36, 38, 40, 42, etc.
- As shown in Figure 2, a generalized state diagram or work model 80 is provided that can be used with the system 30 and that allows workers to function as directed resources.
- a significant feature of the present invention is that a worker is allowed to work or be active in a finite number of operational states, each state having a predefined or limited number of tasks, functions, or activities that can be performed by a worker and a limited number of allowed audible utterances or input signals that can be used by the worker to communicate with the local application server 32 which is directing the worker's activities.
- the allowed utterances for each state preferably make intuitive sense to the worker and have a high probability of being accurately decoded by either a remote unit or the local application server 32.
- the directed resource or work model 80 provides a general governing framework for a worker using speech recognition and audible prompts and/or feedback in a manufacturing, production, assembly line, picking, or other directed resource work environment or model, as will be described in more detail below.
- the remote unit 34 preferably includes a microphone and headphones and is worn by the worker to allow significant freedom of movement by the worker or such that the worker can be locally mobile.
- the remote unit preferably converts all sound or audible signals or utterances made by the worker into a digital or analog signal representative of the sound or audible signals or utterances and transmits them to the local application server for decoding by the local application server.
- By using speech recognition for providing input from the worker to the remote unit 34, the worker provides information into the remote unit 34 orally or audibly such that no manual entry via keyboard, mouse, etc. is required, thereby simplifying the information entry process for the worker.
- commands or instructions to the worker regarding the type and sequence of information to be provided by the worker to the remote unit 34 are also preferably provided orally or audibly to the worker via the speakers or headphones.
- the system 30 preferably contains a speech synthesis or text to speech capability to allow commands, information, and instructions to be audibly communicated to the worker.
- This text to speech capability can reside either on the remote unit 34 or on the local application server 32 but preferably is implemented in a client-server fashion on both the local application server 32 and the remote unit 34.
- speech synthesis capability is well known in the art and does not need to be described in further detail for purposes of elaboration or explanation of the present invention.
- Before a worker has initiated contact with the local application server 32 from the remote unit 34, the remote unit 34 will be in a wait state 82. That is, the remote unit 34 is generally inactive and little, if any, communication is occurring between the remote unit 34 and the local application server 32.
- Since no worker is positioned at or using the remote unit 34, the remote unit 34 is dormant and no communication session exists between the remote unit 34 and the local application server 32. In fact, the remote unit 34 may be powered down, off, recharging, or in a low power consumption "rest" mode.
- the remote unit 34 transitions from the wait state 82 to a logon or login state 84. Such transition may be automatic or may require the worker to provide an audible input signal or utterance to the local application server 32 via the remote unit 34 that the worker desires to transition to the login state 84. Automatic transition may occur by the worker simply turning or powering on the remote unit 34.
- the worker is then considered as being in the logon state 84 and the worker has established a communication session with the local application server 32 via the remote unit 34.
- the worker and/or the remote unit 34 may be considered as being in the same state.
- the following description will refer to the worker and the remote unit 34 as being in a state or transitioning from one state to another state.
- the remote unit 34 and the local application server 32 will communicate with each other to establish a communication session between them to allow the worker at the remote unit 34 to gain access to the system 30, receive instructions and/or information from the local application server 32, and send statements or requests for information and/or instructions to the local application server 32.
- the local application server 32 may require that the worker enter a password or other verification information via the remote unit 34 to identify the worker and to provide other information regarding the location of the worker and, as a result, the remote unit 34.
- the local application server 32 may store or otherwise keep information regarding the worker or worker verification itself or the local application server 32 may communicate with the enterprise server 44 or the network resources 46 to retrieve or otherwise access such information.
- the words, phrases, sentences, or other phonetic content uttered by the worker and accepted by the local application server 32 for the logon state 84 during the logon process or while the worker is in the logon state 84 are preferably chosen to create high recognition accuracy while limiting or minimizing the length of time required for the worker to complete the logon process while in the logon state 84.
- the words, phrases, sentences, or other phonetic content uttered by the worker during the logon process or while in the logon state 84 are preferably chosen such that they seem natural to the worker and make intuitive sense to the worker.
- the worker and the remote unit or client 34 transition from the logon state 84 to the obtain work state 86.
- Such transition from the logon state 84 to the obtain work state 86 may happen automatically or only after the worker provides a predefined or predetermined input signal or utterance to the local application server 32 via the remote unit 34, such input signal or utterance being one of the allowed input signals or utterances for the worker while the worker is in the logon state 84. If the worker does not successfully complete the logon process while in the logon state 84, the worker and the remote unit 34 may be automatically returned to the wait state 82 by the local application server 32.
- The worker may also be directly and automatically transitioned by the local application server 32 from the logon state 84 to the logoff state 88, which will terminate the worker's session or other communication with the local application server 32.
- the worker may signal or indicate a desire to transition from the logon state 84 to the wait state 82 or to the logoff state 88 by providing a predefined or predetermined input signal or utterance to the local application server 32 via the remote unit 34, such predefined or predetermined input signal or utterance being one of the input signals or utterances allowed for use by the worker or accepted by the local application server 32 while the worker is in the logon state 84.
- While in the obtain work state 86, the remote unit 34 will communicate or interface with the local application server 32 to obtain one or more instructions or tasks for the worker. That is, the worker will receive one or more tasks or other instructions to be completed by the worker from the local application server 32 via the remote unit 34.
- the worker may terminate the session or communication with the local application server 32 and move or transition to the logoff state 88 by providing a predefined or predetermined input signal or utterance to the local application server 32 via the remote unit 34, such predefined or predetermined input signal or utterance being one of the input signals or utterances allowed to be used by the worker or accepted by the local application server 32 while the worker is in the obtain work state 86, as will be described in more detail below.
- the worker transitions from the obtain work state 86 to a process work state 90, either automatically or by the worker providing a predefined or predetermined input signal or utterance to the local application server 32 via the remote unit 34, such input signal or utterance being one of the input signals or utterances allowed to be used by the worker or accepted or recognized by the local application server 32 while the worker is in the obtain work state 86.
- While in the process work state, the worker performs or completes the tasks or instructions provided to the worker by the local application server 32 via the remote unit 34 while in the obtain work state 86.
- the instructions are preferably provided aurally to the worker by the remote unit 34 using speech synthesis and headphones on the remote unit 34.
- the worker may also provide oral statements back to the remote unit 34 while in the obtain work state 86 to indicate to the remote unit 34 and, as a result, the local application server 32, that the worker has received the instructions, that the worker does or does not understand the instructions, that the worker is ready to transition to the process work state 90, or to make other requests, as will be described in more detail below.
- the words, phrases, sentences, or other phonetic content or utterances uttered by the worker and accepted by the local application server 32 for the obtain work state 86 while the worker is in the obtain work state 86 are preferably chosen to create high recognition accuracy or to have a high probability of accurate decoding by either the remote unit 34 or the local application server 32, while also limiting or minimizing the length of time required for the worker to complete the obtain work process while in the obtain work state 86.
- the words, phrases, sentences, or other phonetic content uttered by the worker while in the obtain work state 86 are preferably chosen such that they seem natural to the worker and make intuitive sense to the worker while allowing the worker to transition to the process work state 90 or the logoff state 88.
- the worker and the remote unit 34 transfer or transition to the process work state 90 either automatically or by the worker providing the appropriate input signal or utterance to the local application server 32 via the remote unit 34, wherein the worker attempts to complete the instructions or tasks assigned to it by the local application server 32 while the worker was in the obtain work state 86.
- the worker may be in contact or communication with the local application server 32 via the remote unit 34.
- the worker may terminate the process work state 90 and transition to the logoff state 88 by providing a predefined or predetermined input signal or utterance to the local application server 32 via the remote unit 34, as will be described in more detail below. If, after the completion of the tasks or instructions by the worker while the worker is in the process work state 90, the worker is required to report task-related details to the local application server 32, the worker will transition to the report work state 92.
- Such transition from the process work state 90 to the report work state 92 may happen automatically or may have to be initiated by the worker providing a predefined or predetermined input signal or utterance to the local application server 32 via the remote unit 34, such input signal or utterance being one of the input signals or utterances allowed to be used by the worker while the worker is in the process work state 90. If no such reporting is required, the worker may transition back to the obtain work state 86 to receive one or more new instructions or tasks from the local application server 32 via the remote unit 34.
- the words, phrases, sentences, or other phonetic content uttered by the worker and accepted by the local application server 32 for the process work state 90 while the worker is in the process work state 90 are preferably chosen to create high recognition accuracy while limiting or minimizing the length of time required for the worker to complete the process work interaction process while the worker is in the process work state 90.
- the words, phrases, sentences, or other phonetic content uttered by the worker while in the process work state 90 are preferably chosen such that they seem natural to the worker and make intuitive sense to the worker while allowing the worker to move or transition easily to the obtain work state 86, the logoff state 88, or the report work state 92.
- Such transition may happen automatically or by the worker providing a predefined or predetermined input signal or utterance to the local application server 32 via the remote unit 34, such input signal or utterance being one of the input signals or utterances allowed to be used by the worker or accepted by the local application server while the worker is in the process work state 90.
- the local application server 32 may require the worker to enter the report work state 92 after each single task has been completed by the worker, after a series or set of multiple tasks have been completed by the worker, or periodically as initiated by the worker, the local application server 32, the enterprise server 44, etc.
- the local application server 32 and/or the enterprise server 44 may initiate a request for information to the worker via the remote unit 34, thereby requiring the worker to transition from the process work state 90 to the report work state 92.
- Such a request may happen at regularly scheduled intervals or sporadically at times to be determined by either the enterprise server 44 or the local application server 32, or both.
- the worker may transition back to the process work state 90 to continue completing instructions or tasks previously provided to the worker while in the obtain work state 86 or to the obtain work state 86 so that the worker can receive new or updated instructions or tasks from the local application server 32.
- the worker, the remote unit 34, or the local application server 32 may also terminate the worker's session, thereby transitioning the worker to the logoff state 88.
- any communication by the worker to the local application server 32 is preferably provided orally or audibly by the worker to the remote unit 34 and processed or decoded by the remote unit or client 34 or passed to the local application server 32 for processing or decoding.
- the worker may also provide oral statements back to the remote unit 34 while in the report work state 92 to indicate to the remote unit 34 and, as a result, the local application server 32, that the worker is ready to transition from the report work state 92 to another state or to make other requests or provide other information, as will be described in more detail below.
- When the worker and the remote unit 34 transition to the logoff state 88 from the obtain work state 86, the process work state 90, or the report work state 92, the worker ends the session or communication with the local application server 32, thereby freeing up the remote unit 34 for use by another worker, diagnostic service or maintenance, etc.
- the remote unit 34 and/or the local application server 32 may ask the worker to verify that he or she wishes to terminate the session or request other information from the worker. If the worker does not wish to terminate the session or if the worker entered the logoff state 88 inadvertently, the worker may be returned or transitioned to the state at which the worker was in just prior to entering the logoff state 88.
- the remote unit 34 may transition or transfer immediately and/or automatically back to the logon state 84 for use by the same or a new worker or the remote unit 34 may transition to the wait state 82 to await activation or use by the same or another worker.
- the wait state 82 may be combined with or considered as part of either the logon state 84 or the logoff state 88 such that the wait state 82 does not exist separately.
- the logoff state 88 may also be combined with the logon state 84 into a more generalized connection/disconnection operational state if desired.
- a significant aspect of the present invention is that a worker uses audible words, phrases, sentences, or other utterances or phonetic content while communicating with the local application server 32 via a remote unit or client, both while in a state and to initiate or complete a transition from one operational state to another.
- Each operational state has its own set of allowed utterances that will be accepted by the local application server 32 or otherwise be considered as a valid input from a worker while the worker is in the state.
- Different states may have different sets of allowed or acceptable utterances and there may be some utterances that are allowable in some or all of the states. Therefore, the words, phrases, sentences, or other utterances chosen become a significant part of the interface between a worker and a remote unit.
- the words, phrases, sentences or other utterances or phonetic content selected for use for each state balance the considerations of recognition accuracy, efficiency, and ease of use.
- Semantic appropriateness of an utterance is directed to the naturalness of the utterance from a worker or user point of view and includes a determination of whether the utterance makes intuitive sense to the worker or user.
- Utterances are needed to allow a worker to interact with the system 30 so that the worker can communicate with a remote unit and, as a result, the local application server 32.
- Primary considerations for semantic appropriateness are that the utterances make sense as natural phrases to the worker and that the utterances are easy to remember and speak. Therefore, for example, the utterances preferably do not include tongue twisters, non-grammatically correct phrases or sentences, or a series or string of unrelated words.
- the utterances do not have to make sense to everyone, but should make sense to an experienced user of the application or an experienced worker using the system 30 in a particular environment.
- the length of an utterance will also affect the usefulness and efficiency of a speech recognition interface. Generally, shorter utterances are preferable over longer utterances because they improve the efficiency of the overall interaction, both of the worker or user who can take less time to speak the utterance and of the system that can take less time to decode or figure out what was said or uttered by the worker or user.
- There is a tradeoff between utterance length and recognition accuracy however, because very short utterances may be difficult for the local application server 32 to recognize or decode accurately and short utterances are sometimes difficult to distinguish from background noise. Shorter utterances may also be more ambiguous out of context and therefore may not be as appropriate semantically.
- phonemes are short speech elements akin to an alphabet for sounds. All languages, including American English, can be described in terms of a set of distinctive sounds, or phonemes. For example, in American English, there are approximately forty-two phonemes including vowels, diphthongs, semi-vowels, and consonants.
- candidate utterances are preferably tested out of context with any application to evaluate their base speech recognition or decoding potential. This provides an initial check of the decisions made prior to this step.
- Candidate utterances that meet the initial criteria are then tested in the context of the system.
- the candidate utterances are evaluated for accuracy of recognition or decoding by a speech recognition application operating in a remote unit or in a local application server and for ease of use by a worker. Sometimes an utterance that looked natural to a worker will no longer seem so in the actual context of the system. If this happens, a new utterance can be chosen given the new information about how the former utterance was inappropriate. Sometimes an utterance will not give the required accuracy. Then a new utterance may be chosen or some part of the speech recognizer or decoder operating in a remote unit or in a local application server may be modified to improve recognition of the initial utterance.
- words, phrases, sentences, or other utterances can be selected to allow a worker located at or otherwise using a remote unit, such as the remote unit 34, to provide audible input to the remote unit which is then decoded by the remote unit or passed to the local application server 32 for decoding.
- the system 30 minimizes the number of allowable utterances available to a worker for each state or for initiating or completing a transition from one state to another state.
- some words, phrases, sentences, or other utterances may be allowed for some states but not others, or for some transitions between specific states, but not others, as will be discussed in more detail below.
- Limiting of the number of allowed utterances for a state or a set of states in a directed resource application reduces the number of different utterances that must be decoded by either a remote unit or a local application server, reduces memory requirements for the storage of information related to the utterances, and can significantly increase the efficiency and recognition accuracy of the system 30 by increasing the accuracy of decoding of speech input signals.
- utterances that comprise an application's speech interface with a user or worker are used with different frequencies. That is, for some applications using a speech-centered interface, there exists a relatively small subset of allowed utterances, called "critical utterances," that are used by the user or worker with a significantly higher frequency than the rest of the allowed utterances.
- the frequency of using an utterance from a critical set of utterances is over ninety percent (90%) of all of the utterances used or allowed in the application speech interface while the size of the critical set of utterances is under five percent (5%) of the total number of utterances used or allowed in the application speech interface.
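- That relationship can be checked directly from usage logs: sort utterances by how often they are spoken and see how small a subset covers most of the traffic. The counts in the sketch below are invented solely to illustrate the calculation.

```python
# Sketch: find the smallest set of utterances covering at least 90% of observed
# usage, then check what fraction of the distinct utterances that set represents.
from collections import Counter

usage_log = Counter({"gotit": 5200, "ready to go": 1900, "give me more work": 1700})
usage_log.update({f"rare utterance {i}": 5 for i in range(80)})  # hypothetical long tail

total = sum(usage_log.values())
covered, critical = 0, []
for utterance, count in usage_log.most_common():
    critical.append(utterance)
    covered += count
    if covered / total >= 0.90:
        break

print("critical set:", critical)
print("usage covered: %.1f%%" % (100 * covered / total))
print("share of distinct utterances: %.1f%%" % (100 * len(critical) / len(usage_log)))
```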
- Each of the states 82, 84, 86, 88, 90, 92 may use a different set of critical utterances to allow a worker to indicate or initiate a transition from one state to another state or to provide information or requests to a local application server or an enterprise server via a remote client while in a state.
- The error indices of interest include the false rejection rate (FR), the false acceptance rate (FA), and the phrase error rate (ER).
- One significant objective of designing a critical set of utterances for different states is to satisfy the above constraints on error indices while selecting appropriate utterances, e.g., utterances in common use in the application domain, utterances intuitively fitting to the action or task performed by a user, worker, or other directed resource, and utterances consistent in structure and meaning with the rest of the application.
- The phrase "gotit", a compression of the phrase "got it", satisfies these objectives as the response indicating fulfillment of an order request (work completion) in a picking application domain. Therefore, the phrase "gotit" may be one of the allowed critical utterances for a worker while the worker is in the process work state 90.
- An empirical approach can be used to select critical utterances for each allowed state for a worker.
- utterances that are in common use or that make intuitive sense in the application domain to describe user or worker actions and their outcomes for the application domain are gathered.
- An application domain can be defined as some collection of specific application areas that can be considered essentially the same area for the purpose of designing a user interface. For example, picking as a directed resource application is essentially the same whether the products to pick are food items or motorcycle parts; thus picking is an example application domain.
- a critical set of utterances (a small subset of the gathered utterances with very high frequency of use) exists for each state.
- a dialog is structured between the worker and the application by constraining the sets of utterances to groups that can be accepted by the application at each given point in its operation or at each allowed state. In doing so, it is preferred to group together critical utterances and to separate non-critical utterances into different groups.
- the removed non-critical utterances are preferably replaced with a single "exception" utterance (further treated as critical) in the group of critical utterances that refers to a different group where those utterances are placed or that transitions a worker from a state where such non-critical utterances are not allowed to be used by the worker to a state where such non-critical utterances can be used by the worker.
- the exception utterance will be used by the worker to initiate a transition between the two states.
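- A sketch of that grouping idea, with the non-critical utterances moved into a separate group reached through a single exception utterance, is shown below. The group contents and the word chosen for the exception utterance are illustrative assumptions rather than a vocabulary prescribed by this description.

```python
# Sketch: per-state utterance groups in which rarely used (non-critical) utterances
# are moved to a separate group reached through a single "exception" utterance.
# Group contents and the exception word are illustrative assumptions.
PROCESS_WORK_CRITICAL = {"gotit", "ready to go", "exception"}   # small, high-frequency set
PROCESS_EXCEPTIONS_NON_CRITICAL = {                             # lives in the other group
    "item is damaged", "bin is empty", "wrong label on bin",
}

def handle_process_work_utterance(utterance):
    """Route an utterance spoken in the process work state."""
    if utterance == "exception":
        return "transition to process exceptions state"   # other group becomes active
    if utterance in PROCESS_WORK_CRITICAL:
        return "handled in process work state"
    return "rejected: not allowed in this state"

print(handle_process_work_utterance("gotit"))
print(handle_process_work_utterance("exception"))
print(handle_process_work_utterance("bin is empty"))
```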
- potential speech recognition or decoding errors in the obtained groups of utterances are estimated.
- Such errors can be created by, for example: the application rejecting a critical utterance either as noise or as being unreliably recognizable with respect to other utterances in the same group of utterances; confusing one critical utterance with another critical utterance in the same group; confusing a critical utterance with a non-critical utterance from the same group; confusing two non-critical utterances from the same group; or mistaking utterances outside of a group of utterances for those inside the group.
- Confuseability, i.e., a degree of phonetic similarity between utterances in the same group, is preferably minimized by changing the words that comprise the utterances and re-estimating potential recognition errors.
- Phonetic separation of words from the noise sounds is also preferably maximized by changing the words that comprise the utterances and re-estimating potential recognition errors.
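- One rough way to estimate confuseability is to compare the phonetic transcriptions of candidate utterances, for example with an edit distance over phoneme sequences, as sketched below. The hand-written transcriptions and the plain Levenshtein distance are simplifying assumptions; a deployed recognizer would use its own confusion measures.

```python
# Rough confuseability estimate: edit distance between phoneme sequences.
# Transcriptions are hand-written approximations for illustration only.
def edit_distance(a, b):
    """Standard Levenshtein distance between two sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

phonemes = {
    "gotit":   ["g", "aa", "t", "ih", "t"],
    "log off": ["l", "aa", "g", "ao", "f"],
    "got one": ["g", "aa", "t", "w", "ah", "n"],
}

base = phonemes["gotit"]
for word, seq in phonemes.items():
    if word != "gotit":
        print(word, "distance from 'gotit':", edit_distance(base, seq))
```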
- error indices for critical utterances can be defined as follows: (1) FR as a percent of spoken critical utterances from the constrained set that are incorrectly rejected by a speech recognizer or decoder;
- ER as a percent of spoken critical utterances from the constrained set that are recognized as different utterances from the constrained set (either critical or non-critical).
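- Given labeled test trials, these indices can be estimated by tallying recognizer outputs against what was actually spoken, as in the sketch below. The trial data is fabricated purely to show the bookkeeping, and the FA computation shown (out-of-set speech accepted as a critical utterance) is an assumption, since the source text does not spell out the FA formula.

```python
# Sketch: estimating FR (false rejection), ER (phrase error), and FA (false
# acceptance) rates for a constrained set of critical utterances.
CRITICAL = {"gotit", "ready to go", "give me more work"}

# (what was actually spoken, what the recognizer produced; None means rejected)
trials = [
    ("gotit", "gotit"), ("gotit", None), ("gotit", "log off"),
    ("ready to go", "ready to go"), ("give me more work", "give me more work"),
    ("background chatter", "gotit"),   # out-of-set speech accepted as critical
    ("background chatter", None),      # out-of-set speech correctly rejected
]

spoken_critical = [t for t in trials if t[0] in CRITICAL]
fr = sum(1 for said, out in spoken_critical if out is None) / len(spoken_critical)
er = sum(1 for said, out in spoken_critical
         if out is not None and out != said) / len(spoken_critical)

out_of_set = [t for t in trials if t[0] not in CRITICAL]
fa = sum(1 for said, out in out_of_set if out in CRITICAL) / len(out_of_set)

print(f"FR = {fr:.0%}, ER = {er:.0%}, FA = {fa:.0%}")
```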
- the phonetic composition of the critical utterances is preferably optimized, as well as that of still-confuseable non-critical utterances in the same sets.
- Two adjacent words in an utterance may be joined to more precisely represent the way the utterance is going to be uttered by a user or worker. For example, the critical utterance "got it" was replaced with the compound made-up word "gotit."
- the resulting sets of utterances (both critical and non-critical) for each state are ready for use in an application speech interface in a directed resource environment.
- exemplary utterances for the states 82, 84, 86, 88, 90, 92 and for initiating or completing transitions between states will now be described in more detail.
- the utterances are selected according to the criteria described above and generally work well in a directed resource environment. Therefore, the utterances contain phonetic content that improves speech recognition accuracy, i.e., the accuracy of the decoding of the speech input from a worker.
- the utterances also have minimal length to improve performance while also making intuitive and natural sense to a worker.
- a worker first activates a remote unit while in the logon state 84 or after transitioning to the logon state 84 from either the wait state 82 or the logoff state 88. While in the logon state 84, the worker provides audible or oral information to the remote unit that is decoded by either the remote unit or transmitted to a local apphcation server for decoding.
- a sample dialog between the worker and the remote unit 34 for logging in to the system 30 or while otherwise in the logon state 84 is as follows:
- After the logon process is complete, the system 30 might automatically transition the worker and the remote unit from the logon state 84 to the obtain work state 86.
- the remote unit may query the worker to see if the worker is ready to transition from the logon state 84 to the obtain work state 86 or the remote unit may simply wait until the worker provides an audible input to the remote unit signaling the worker's desire to transition from the logon state 84 to the obtain work state 86.
- a dialog between the remote unit and the worker in which the remote unit queries or asks the worker if the worker is ready to transition from the logon state 84 to the obtain work state 86 is provided as follows:
- Worker: Give me more work.
- a list of words, phrases, sentences, or other utterances that can be supplied or provided by a worker to a remote unit and, as a result, to the local application server 32, to signal the worker's desire to transition from the logon state 84 to the obtain work state 86 can include: "Give me more work" or "Gimme more work."
- the phrases "give me more work” or “gimme more work” may form part of the allowed set of input signals or utterances that a worker can use and which will be accepted by the local application server while the worker is in the logon state 84.
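- In code, such variant phrasings might simply be synonyms mapped to a single transition intent, so that either wording moves the worker out of the logon state, as in the sketch below. The dictionary contents and the extra "log off" entry are assumptions for illustration.

```python
# Sketch: several allowed phrasings map onto the same logon-state transition.
LOGON_STATE_INTENTS = {
    "give me more work": "transition_to_obtain_work",
    "gimme more work":   "transition_to_obtain_work",
    "log off":           "transition_to_logoff",   # illustrative addition
}

def interpret_logon_utterance(decoded_text):
    """Return the intent for an utterance allowed in the logon state, if any."""
    return LOGON_STATE_INTENTS.get(decoded_text.lower().strip())

print(interpret_logon_utterance("Gimme more work"))  # -> "transition_to_obtain_work"
```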
- a worker in the obtain work state 86 is receiving one or more instructions or tasks from the local application server 32 via a remote unit.
- the worker may signify reception or receipt of the instruction(s) or task(s) by providing "ready to go” or "begin work” as an input signal to a remote unit and, as a result, the local application server 32.
- the utterances "ready to go” or “begin work” may be critical utterances for the obtain work state 86.
- the worker will attempt to complete the instruction(s) or task(s) provided to the worker by the local application server 32 while the worker was in the obtain work state 86.
- the worker may signify such completion by uttering "got it" or “gotit” as an input to a remote unit and, as a result, the local apphcation server 32.
- These utterances meet the criteria described previously above and may be critical or allowed utterances for the worker while the worker is in the process work state 90 and will be accepted by the local application server 32 while the worker is in the process work state 90.
- As shown in Figure 3, a second state diagram or work model 100 is provided that can be used with the system 30.
- the directed resource or work model 100 is essentially a more sophisticated or complex implementation of the state diagram or model 80 provided in Figure 2 and includes the wait state 82, the logon state 84, the obtain work state 86, the process work state 90, the report work/status state 92, and the logoff state 88. Each of these states works in a manner similar to their description provided previously above in relation to the state diagram 80.
- In addition, the state diagram 100 includes optional states such as a process exceptions state 102 for handling problems, commands, or other exceptions to the normal or expected flow of operation; a break state 104 that allows a worker to temporarily suspend initiation, execution, or completion of one or more instructions or tasks while in the process work state 90; a get clarification state 106 that allows a worker to ask questions about the information or instructions previously provided to the worker or other topics as allowed; and a get productivity state 108 that allows a worker to obtain information or updates regarding the worker's performance, production, efficiency, goal achievement, etc., while in the process work state 90 or the report work/status state 92.
- the process exceptions state 102 is preferably reachable from most, if not all, allowed states in a system.
- a worker might be transitioned to the process exceptions state 102 in general situations such as when a worker has provided an audible input to a remote unit, in response to a query created by the local application server 32 via the remote unit, that cannot be decoded by the remote unit or the local application server 32 into a command or statement that is valid for the worker's current state. For example, the worker may speak an utterance that for some reason the system 30, particularly the local application server 32, fails to decode.
- the system 30 or local application server 32 may respond to this with a message such as "did not recognize last utterance" communicated to the worker via the worker's remote client. The worker may then repeat what was said, pronouncing it more carefully or completely, or say something else. In these general situations, the system 30, more particularly the local application server 32, might query the worker again via the remote unit and/or require the worker to provide a new audible response or input to the remote unit.
- the worker will preferably automatically transition from the process exceptions state 102 back to the worker's previous state. If the exception cannot be handled or dealt with while the worker is in the process exceptions state 102, the worker may automatically be transitioned from the process exceptions state 102 to the logoff state 88.
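A minimal sketch of this general exception flow, assuming a retry limit and stand-in interfaces for the remote unit and server (none of which are specified in this description), might look like the following:

```python
# Minimal sketch of general exception handling in the process exceptions state.
# prompt_worker() and decode_utterance() are hypothetical stand-ins for the
# remote unit / local application server interfaces; MAX_RETRIES is an assumption.

MAX_RETRIES = 3

def handle_general_exception(previous_state, prompt_worker, decode_utterance):
    """Ask the worker to repeat until an utterance is understood, then return
    the state the worker should transition to."""
    for _ in range(MAX_RETRIES):
        prompt_worker("did not recognize last utterance")
        decoded = decode_utterance()          # None when decoding fails again
        if decoded is not None:
            return previous_state             # exception handled, resume work
    return "logoff"                           # exception could not be handled

# Demo with stand-in callables: the second attempt succeeds.
attempts = iter([None, "ready to go"])
print(handle_general_exception("process work", print, lambda: next(attempts)))
```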
- state specific exceptions might also exist that relate specifically to the function or activity of a worker while the worker is in a given state.
- the state specific exceptions might vary or be different from state to state.
- while a worker is in the logon state 84, the worker might provide several types of information to a remote unit and, as a result, a local application server. If the worker's responses, inputs, or other utterances to the remote unit are decoded correctly, either by the remote unit itself or the local application server, the worker might not create any general exceptions. However, the worker might provide information or an utterance to the remote unit that creates a state specific exception that will automatically transition the worker from the logon state 84 to the process exceptions state 102.
- the worker might provide inconsistent information during the logon state 84 that, while not creating a general exception to be dealt with in the state 102, creates a logon state 84 specific exception that will be dealt with by the system 30 by transitioning the worker from the logon state 84 to the process exceptions state 102.
- the worker will then be requested by the local application server 32 via the worker's remote unit to provide clarifying information while in the process exceptions state 102 to eliminate the confusion and allow the worker to transition back to the logon state 84.
- State specific exceptions may exist for any of the states in the state diagram 100 and some exceptions may require more time or information to complete than others.
- the worker may be automatically transitioned from the process exceptions state 102 to the logoff state 88.
- the break state 104 allows a worker in the process work state 90 to temporarily halt the worker's activities and signal such a halt to the local application server 32 via a remote unit.
- the worker will be able to suspend work progress, at least temporarily, without further interaction with a remote unit or without receiving consent from a remote unit or a local application server.
- a worker wishes to signal the local application server 32 that the worker is suspending completion of a task or instruction. For example, a worker's performance on a task may be measured by how long the worker takes to complete the task. If the worker needs to suspend work or activity for any reason, such as a restroom break, the worker may want to indicate to the local application server 32 that the time measurement should be halted.
- the worker transitions from the process work state 90 to the break state 104. Once the worker transitions back to the process work state 90 from the break state 104, the time measurement for the worker's performance or completion of tasks can resume.
- a worker preferably can signal a transition from the process work state 90 to the break state 104 by an easy and intuitive utterance provided by the worker to a remote unit that has a high probability of being accurately decoded by a remote unit or by the local application server 32. More specifically, to signal a transition from the process work state 90 to the break state 104, the worker preferably utters "gimmee a break" or "give me a break" as an audible input signal to a remote unit, both phrases of which satisfy the criteria described previously above, and which may be critical or allowed utterances for the process work state 90 and that can be used by a worker while in the process work state 90.
- the worker can signal a transition from the break state 104 to the process work state 90 by uttering "ready to go" as an audible input signal to a remote unit, which is the same utterance that transitions the worker from the obtain work state 86 to the process work state 90 and which may be a critical or allowed utterance for the obtain work state 86 and/or the break state 104 that a worker can use while in these states.
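The effect of the break state on a task-time measurement can be pictured with a small sketch. The timer class, its methods, and the use of a monotonic clock are hypothetical illustrations; only the triggering utterances come from this description:

```python
import time

# Illustrative sketch of suspending a task-time measurement while a worker is
# in the break state 104; the TaskTimer class and its use here are hypothetical.

class TaskTimer:
    def __init__(self):
        self.elapsed = 0.0
        self._started_at = None

    def start(self):                      # e.g. worker utters "ready to go"
        if self._started_at is None:
            self._started_at = time.monotonic()

    def pause(self):                      # e.g. worker utters "gimmee a break"
        if self._started_at is not None:
            self.elapsed += time.monotonic() - self._started_at
            self._started_at = None

timer = TaskTimer()
timer.start()    # worker enters the process work state 90
timer.pause()    # worker transitions to the break state 104; time stops accruing
timer.start()    # "ready to go" returns the worker to the process work state 90
timer.pause()
print(round(timer.elapsed, 3))  # time spent in the process work state only
```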
- a worker in the get clarification state 106 is requesting further information that may be specific to the state that the worker is currently in or that may be of a more generalized nature. Therefore, in a manner similar to the process exceptions state 102 previously described above, the worker can preferably transition to the get clarification state 106 from all or nearly all of the other states. The worker can signal a transition from a state to the get clarification state 106 by using the utterance "gimmee ___," where the blank is filled in for the particular request.
- a worker might use the utterance "gimmee the product code" to transition from the process work state 90 to the get clarification state 106 based on the worker's desire to obtain more information about a product or item the worker is dealing with. While in the get clarification state 106, the system 30 might provide more information to the worker via a remote unit or query the worker to provide more specific information relating to the worker's request.
- Another general example utterance is "gimmee the time" or "gimmee the date" when the worker wishes to obtain such information from the system 30 or, more specifically, from the local application server 32 via a remote unit or client.
- a worker may utter "gimmee some help" as an input signal to a remote unit to request further information from the local application server 32 about the current task or instruction the worker is to be performing, the current state that the worker is in, or about what words, phrases, sentences, or other utterances are allowable or available to the worker in the worker's current state.
- Other transition utterances beginning with the phrase "gimmee" that may be used as audible input signals by a worker to a remote unit will be discussed in more detail below.
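A rough sketch of routing decoded "gimmee ..." utterances into clarification requests is shown below. The example utterances are taken from this description; the routing function, its return value, and the acceptance of "gimme" as a variant are illustrative assumptions:

```python
# Sketch of routing "gimmee ..." utterances into clarification requests.
# The function and its return format are hypothetical illustrations.

def route_gimmee(utterance: str):
    """Map a decoded "gimmee ..." utterance to a clarification topic, or None."""
    words = utterance.lower().split()
    if not words or words[0] not in ("gimmee", "gimme"):
        return None                      # not a clarification request
    topic = " ".join(words[1:])          # e.g. "the product code", "some help"
    return ("get clarification", topic)

print(route_gimmee("gimmee the product code"))  # ('get clarification', 'the product code')
print(route_gimmee("gimmee some help"))         # ('get clarification', 'some help')
```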
- the get productivity state 108 allows a worker to obtain performance information from the local application server 32 either while the worker is in the process work state 90 or while the worker is in the report status/work state 92.
- a worker might use the utterance "how am I doing" to transition from the process work state 90 or the report status/work state 92 to the get productivity state 108 by providing such utterance as an audible input signal to a remote unit or client.
- the utterance "how am I doing" satisfies the criteria described previously above.
- the system 30 may query the worker via a remote unit to provide more information or may provide the worker with a pre-determined statement of information.
- the worker can signify a desire to transition back to the previous state (either the process work state 90 or the report work/status state 92) by uttering "ready to go," which may be a critical utterance for the state 108.
- additional utterances may be allowed and used to enable a worker to transition from one state to another or to interact with a remote unit or local application server 32.
- the worker may utter "pick" and then two digits such as "one two" or "five nine" to indicate to the remote unit that the worker has picked twelve or fifty-nine of the specified items, respectively.
- the utterance "pick one seven" would indicate that the worker has picked seventeen of the specified items while the utterance "pick three two" would indicate that the worker has picked thirty-two of the specified items.
- the worker can utter "superpick" and then three digits.
- the utterance "superpick one one nine" would indicate that the worker has picked one hundred and nineteen items while the utterance "superpick one three six" would indicate that the worker has picked one hundred and thirty-six items.
- Utterances having a format beginning with "pick" or "superpick" are chosen with the criteria for utterances previously described above in mind and may be considered as allowed or critical utterances for the process work state 90 that can be used by a worker while the worker is in the process work state 90.
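The "pick"/"superpick" format lends itself to a simple digit-parsing sketch. The digit vocabulary and two-versus-three-digit convention come from this description; the parsing function itself is an illustrative assumption:

```python
# Sketch of converting "pick"/"superpick" utterances with spoken digits into
# item counts; a hypothetical illustration of the format described above.

DIGITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
          "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}

def parse_pick(utterance: str):
    """Return the picked quantity, or None if the utterance is not a pick report."""
    words = utterance.lower().split()
    if not words or words[0] not in ("pick", "superpick"):
        return None
    expected = 2 if words[0] == "pick" else 3      # two digits vs. three digits
    digits = words[1:]
    if len(digits) != expected or any(d not in DIGITS for d in digits):
        return None
    return int("".join(str(DIGITS[d]) for d in digits))

print(parse_pick("pick one seven"))           # 17
print(parse_pick("superpick one three six"))  # 136
```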
- a worker might indicate this to a remote unit and, as a result, the local application server 32, by uttering "cut product" or other statement which preferably allows the worker to move on to the next task or instruction without completing the current task or instruction. If desired, such a "cut product" utterance may also transition the worker from the process work state 90 to the process exceptions state 102 so that the worker can provide more information as to why the task cannot be completed.
- the utterance "cut product" is chosen with the criteria previously described above in mind and may be a critical or allowed utterance for the process work state 90 usable by a worker while the worker is in the process work state 90.
- a "backload product” or other statement that merely tells the local application server 32 that completion of an instruction or task wiU or should be temporarily delayed or deferred until a later time.
- a "backload product” utterance may also transition the worker from the process work state 90 to the process exceptions state 102 so that the worker can provide more information to the local information server 32 as to why the task should or must be delayed.
- the utterance "backload product” satisfies the criteria previously described above and may be a critical or allowed utterance for the process work state 90 that a worker can use while the worker is in the process work state 90.
- the remote units, local application server, and enterprise server may be any computer or computer system on which one or more computer software applications are operating.
- other states may be used along with or in place of any of the states described in the state or directed resource models 80 and 100.
Abstract
A system for implementing speech recognition in a directed resource application includes a local application server on which a speech recognition application may operate, remote units or clients that interact with the local application server and which may be used or worn by users or workers so that instructions and information can be sent from the local application server to the users or workers and progress reports or requests for information or instructions can be sent by the workers to the local application server. The workers may be locally mobile such that the workers may be moving while simultaneously communicating with the local application server. The workers and the system interact so that each worker conducts activities in one or more work states as directed by the local application server. The system may also include an enterprise system or server which monitors and/or controls operation of the environment in which the local application server is operating and which may provide information or instructions to the local application server to govern or influence operation of the speech recognition application running or otherwise operating on the local application server. Input from the workers to the remote units and, as a result, to the local application server is preferably done audibly to increase the efficiency of the workers and the words, phrases, sentences, or the utterances made by the workers are preferably chosen to increase the accuracy of the speech recognition or decoding and to make intuitive sense to the workers.
Description
INTERACTIVE VOICE UNIT FOR GIVING INSTRUCTION TO A WORKER
Technical Field
This invention relates generally to a method and apparatus for using speech recognition in a directed resource application, for example, in a production, picking, or assembly line environment and, more specifically, to a method and apparatus for increasing the efficiency and operability of a speech recognition application in such a directed resource application environment.
Background Art
Due to advancements in computer technology and software programming techniques, in conjunction with a continuously growing understanding of the mechanics and characteristics of speech, speech recognition applications have made tremendous strides in acceptance and usage. In a conventional software program operating on a computer and using speech recognition, the sounds, words, or sentences uttered by a person are detected and one or more electrical signals representative of the sounds, words, or sentences are created and used by the computer to control or guide the software program. For example, speech recognition applications are now available for allowing people to dictate letters, memos, etc. directly into a computer, for performing speaker identification, and for assisting in inventory management and shipping. Some examples of currently available commercial speech recognition software include the Dragon Dictate™ software and the Dragon Naturally Speaking
Continuous Voice Recognition™ software by Dragon Systems, Inc., ViaVoice™ software and ViaVoice Gold™ software by IBM Corporation, and VoicePlus™ software and VoiceXpress™ software by Lernout and Hauspie.
At a general level, speech recognition is fairly straightforward. That is, a speaker utters a word, phrase, or sentence into a microphone. A signal processing subsystem extracts acoustic information from the uttered word, phrase or sentence that exhibits characteristics consistent with human language. A speech recognition subsystem then finds the best match between the extracted acoustic information and electronically stored representations of the acoustics of known words or phrases. The speech recognition subsystem then produces a text version of the verbal utterances. In practice, however, accurate and reliable speech recognition is considerably more complicated.
An underlying assumption in many speech recognition systems is that the speech uttered by a person changes relatively slowly over time. Under this assumption, short segments, often called frames, of a speech signal are isolated and processed as if they were short segments from a sustained sound with fixed physical properties. More specifically, in a conventional speech recognition application, a person speaks one or more sounds, words, or sentences such that compression or sound waves are produced that travel through the air and are detected or picked up by a microphone. The microphone
converts the speech or sound signal into an analog electrical signal representative of the speech signal which is, in turn, converted by an analog-to-digital (A/D) converter into a digital electrical signal that is also representative of the speech signal created by the person. The digital electrical signal is then processed or decoded to determine the word, phrase, or sentence uttered by the user. Speech recognition can be applied in a production, assembly line, picking, manufacturing, or other directed resource environment to increase the efficiency of workers. Unfortunately, such environments are typically very noisy, and this generally reduces the accuracy of speech recognition applications. That is, the noisier the work environment, the more likely that inaccurate decoding of a word, phrase, sentence, or the utterance will occur. In addition, the accuracy of speech recognition may also be limited by the different speech and enunciation patterns of workers and by the number of different workers. Even a specific user may pronounce the same word, phrase, or sentence two or more different ways at different times, particularly if the environment that the worker is in changes. For example, if the work environment for a worker is relatively quiet, the worker may pronounce a sentence or utterance one way. But if the work environment for the worker is noisy, the worker may speak loudly or even shout when pronouncing the sentence or utterance, thereby possibly changing the spectral characteristics of the sentence or utterance. Accommodating such acoustic variations between speech instances is a challenge for automatic speech recognition systems.
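The short-time framing assumption described above can be illustrated with a small sketch. The sampling rate, frame length, and step size below are typical illustrative values, not values taken from this disclosure, and the function name is hypothetical:

```python
# Illustrative sketch of the short-time framing assumption: a digitized speech
# signal is cut into short, overlapping frames treated as quasi-stationary.
# The parameter values are assumptions, not figures from this disclosure.

def frame_signal(samples, sample_rate_hz=8000, frame_ms=25, step_ms=10):
    """Split a list of digitized samples into overlapping frames."""
    frame_len = int(sample_rate_hz * frame_ms / 1000)   # e.g. 200 samples
    step = int(sample_rate_hz * step_ms / 1000)         # e.g. 80 samples
    frames = []
    for start in range(0, max(len(samples) - frame_len + 1, 0), step):
        frames.append(samples[start:start + frame_len])
    return frames

# One second of (silent) digitized speech yields roughly 98 frames here.
print(len(frame_signal([0] * 8000)))
```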
Workers in a manufacturing, picking, assembly line, production or other directed resource environment are directed by a list of tasks they must perform. The workers are, therefore, resources that are directed by a centralized controller or other system that provides instructions to the workers about the task(s) to be completed by the workers. The workers must then somehow report on the completion of their tasks. In many cases the list of tasks to be performed by a worker takes the form of a paper check list, which the worker consults to get the next task and marks to show the task completion status. In a picking application, for example, where a worker fulfills product orders by picking selected products or items from an assortment of bins, a conveyer belt, etc., the worker reads a product order from a list and marks the list for how much of the order was filled. In some cases this list may be computerized, so that a database for tracking the completion of tasks (for example, fulfillment of orders and resulting inventory changes) can be updated automatically. But, if the worker needs to use his or her hands or eyes to perform the tasks, using a list requiring the use of hands or eyes (computerized or not) is at best an interruption of work and at worst a significant distraction. In some cases speech recognition and/or speech synthesis may be used as a component of the interaction between the user and the task list. But unless the interface is truly speech-centered, using well-designed, robust, and natural speech interaction dialogs, the interface may continue to cause distractions or other interruptions for the worker.
Directed resource applications typically, but not necessarily, involve locally mobile workers in a specific work environment. Locally mobile workers are those workers who need to move around to do their job, but generally stay within a local area, a single building for example. Many applications, other than picking, can use a directed resource approach. For example, a vehicle inspection application, where a user reports on the status of a vehicle in response to questions from an inspection checklist; and a construction, maintenance, or diagnostic instruction application, where a user gets a task for a project from a central controller.
Another hypothetical example of a directed resource application is a system providing nurses with instructions for giving medication to patients in a hospital. A nurse making rounds and seeing patients could request and receive work in the form of medications required by the patient being visited.
When this work is completed, the system would update a database of when patients received their medications and the types and amounts of medications received. Nurses often need to be locally mobile, with both hands and eyes free to dispense medications to patients. The system would also need a pause feature so that the nurse can interact with the patient verbally. Despite the state of the prior art in speech recognition technology, there remains a need for an accurate and reliable system for using speech recognition in a manufacturing, picking, assembly line, production, or other directed resource environment and for providing a speech-centered user interface for directed resource applications using locally mobile workers. Preferably, the system will also increase the efficiencies and accuracy of a worker in the environment while allowing the worker to use phrases, terms, sentences, or other utterances that are natural for the worker and that make intuitive sense to the worker.
Disclosure of Invention
Accordingly, it is an object of the present invention to provide a method and apparatus for using speech recognition in a manufacturing, picking, assembly line, production, or other directed resource environment.
Another object of the present invention is to provide a method and apparatus for speech-centered user interaction in a directed resource application.
Yet another object of the present invention is to provide a method and apparatus for increasing the accuracy of speech recognition in a directed resource application or where a locally mobile worker receives one or more tasks to complete.
A further object of the present invention is to provide a method and apparatus for increasing the efficiency of workers in a manufacturing, picking, assembly line, production, or other directed resource environment.
Still another object of the present invention is to provide a method and apparatus for allowing communication to/from locally mobile workers via a speech-centered interface from/to a centralized monitor or controller of the workers.
Additional objects, advantages, and novel features of the invention shall be set forth in part in the description that follows, and in part will become apparent to those skilled in the art upon examination of the following or may be learned by the practice of the invention. The objects and the advantages may be realized and attained by means of the instrumentalities and in combinations particularly pointed out in the appended claims.
To achieve the foregoing and other objects and in accordance with the purposes of the present invention, as embodied and broadly described herein, a system in which a worker can operate as a directed resource in one or more states and transition from one state to another by making one or more audible statements includes a server and a client device that can communicate with the server, the server capable of providing information to the worker via the client device and allowing the worker to function in an obtain work state in which the worker receives one or more tasks from the server via the client device to be completed by the worker, a process work state in which the worker completes said one or more tasks, and a report work state in which the worker reports status information to the server via the client device.
Also to achieve the foregoing and other objects and in accordance with the purposes of the present invention, as embodied and broadly described herein, a directed resource model for a worker, wherein instructions are provided to the worker from a server via a client device, includes an obtain work state in which the worker receives one or more tasks to be completed by the worker from the server via the client device, a process work state in which the worker completes said one or more tasks, and a report work state in which the worker reports status information to the server via the client device, further wherein said worker can initiate transition from the obtain work state to the process work state by providing a first audible input signal to the client device and the worker can initiate transition from the process work state to the report work state by providing a second audible input signal to the client device.
Also to achieve the foregoing and other objects and in accordance with the purposes of the present invention, as embodied and broadly described herein, a work model for a worker, wherein instructions are provided to the worker from a server via a client device, includes a first state of allowed worker activity, a second state of allowed worker activity, a third state of allowed worker activity, a first utterance that transitions the worker from the first state of allowed worker activity to the second state of allowed worker activity when the worker provides the first utterance to the client device, and a
second utterance that transitions the worker from the second state of allowed worker activity to the third state of allowed worker activity when the worker provides the second utterance to the client device. Also to achieve the foregoing and other objects and in accordance with the purposes of the present invention, as embodied and broadly described herein, a method for directing a worker to complete one or more tasks includes establishing a plurality of allowed states of activity for the worker and establishing a group of at least one allowed utterance for use by the worker for each of the allowed states, wherein at least one utterance in at least one group of at least one allowed utterance is a transition utterance that allows the worker to transition from the one of the allowed states to another of the allowed states.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and form a part of the specification, illustrate the preferred embodiments of the present invention, and together with the descriptions serve to explain the principles of the invention.
In the Drawings
Figure 1 is a block diagram of a system for using a speech recognition application to monitor, provide information and instructions to, and receive requests and statements from workers;
Figure 2 is a general state diagram for a speech recognition application that can be used with the system of Figure 1; and
Figure 3 is a more specific implementation of the state diagram of Figure 2.
Best Mode for Carrying out the Invention
A system 30 for implementing speech recognition in a directed resource application or environment is illustrated in Figure 1 and includes a local application computer or server 32 on which an application using a speech recognition or a speech-centered interface may operate, remote units or client devices 34, 36, 38, 40, 42 that interact with the local application server 32 and which may be used or worn by users or workers so that instructions and/or information can be sent from the local application server 32 to the users or workers via the remote units 34, 36, 38, 40, 42 and so that the workers can send information and/or requests to the local application server 32 via the remote units 34, 36, 38, 40, 42. Essentially, the remote units 34, 36, 38, 40, 42 and the local application server 32 form a client/server network that allows workers located at or wearing the remote units 34, 36, 38, 40, 42 to interact or communicate with the local application server 32.
As will be discussed in more detail below, the system preferably uses speech recognition or a speech recognition interface to allow workers located at or wearing the remote units 34, 36, 38, 40, 42 to provide audible input signals to the remote units 34, 36, 38, 40, 42. That is, workers can "speak" or provide audible utterances to the remote units 34, 36, 38, 40, 42 to provide input signals to the remote
units 34, 36, 38, 40, 42. The audible input signals from the workers can be decoded by the remote units 34, 36, 38, 40, 42 themselves or transmitted or passed by the remote units 34, 36, 38, 40, 42 to the local application server 32 for decoding by the local application server 32. Should the decoding of a worker's audible input signal be completed by a remote unit, the remote unit can pass or transmit the corresponding decoded signal or some equivalent, interpretation, or translation onto the local application server 32. Preferably, the speech input signals from a worker are decoded at the local application server 32 so as to reduce processing and memory requirements in the remote units 34, 36, 38, 40, 42 being used or worn by the worker.
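A minimal sketch of this client/server split, with the remote unit forwarding a captured utterance to the server for decoding, is shown below. All class, method, and field names are hypothetical, and the placeholder "decoder" stands in for whatever speech recognizer the server runs:

```python
# Minimal sketch of the client/server arrangement described above: the remote
# unit captures an utterance and forwards it to the server for decoding.
# All identifiers here are hypothetical illustrations.

from dataclasses import dataclass

@dataclass
class UtteranceMessage:
    worker_id: str
    audio: bytes            # digitized utterance from the remote unit
    decoded_text: str = ""  # filled in by whichever side performs decoding

class LocalApplicationServer:
    def decode(self, message: UtteranceMessage) -> UtteranceMessage:
        # Placeholder for the server-side speech recognizer.
        message.decoded_text = "<decoded utterance>"
        return message

class RemoteUnit:
    def __init__(self, worker_id: str, server: LocalApplicationServer):
        self.worker_id = worker_id
        self.server = server

    def send_utterance(self, audio: bytes) -> UtteranceMessage:
        # Preferred arrangement per the description: decode at the server to
        # keep processing and memory requirements on the remote unit low.
        return self.server.decode(UtteranceMessage(self.worker_id, audio))

unit = RemoteUnit("worker-34", LocalApplicationServer())
print(unit.send_utterance(b"\x00\x01").decoded_text)
```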
Remote units, such as the remote units 36, 38, 40, may be connected to the local application server 32 via a local or wide area network, an intranet, the Internet or World Wide Web, direct connection, or some other computer network or connection. Other remote units, such as the remote units 34, 42, may be connected to the local application server 32 via cellular, radio, microwave, or other wireless connection. In such a client-server implementation of the system 30, with the remote units 34, 36, 38, 40, 42 functioning as client devices, the remote units 34, 36, 38, 40, 42 work as general devices in the system 30 that do not require or need any specific knowledge or information regarding applications operating on the local application server 32 or the enterprise server 44. Therefore, the system 30 can be used in or with a variety of applications that can be changed by updating the local application server 32 and/or the enterprise server 44 without usually requiring changes or updates to the software operating on the remote units 34, 36, 38, 40, 42. The system 30 may also include an enterprise system or server 44 which monitors and/or controls operation of the environment in which the local application server 32 is operating and which may provide information or instructions to the local application server 32 to govern or influence operation of the application(s) running or otherwise operating on the local application server 32. For example, the enterprise server 44 may be a computer or computer system that monitors and controls the operation of an assembly line. The local application server 32 may be used to monitor and provide information or instructions to workers working in a particular part of the assembly line. Thus, the local application server 32 provides localized control, monitoring, instructions, or information to the workers while the enterprise server 44 may monitor and control a larger number of workers via a plurality of local application servers and/or other local enterprise servers. The workers form a directed resource for the local application server 32 and/or the enterprise server 44 such that the workers can receive tasks from the local application server 32 and/or the enterprise server 44 to be completed by the workers. The enterprise server 44 may also be connected to other network resources 46 so that information and instructions can be passed to and from the enterprise server 44. The network resources 46 may include a database or database server, a file server, a log server, etc.
The enterprise server 44 may be connected to the local application server 32 and/or the other network resources 46 via a direct connection, wide area or local area network, or any other kind of computer network or cellular, radio, or wireless network. For purposes of explanation of the system 30, the local application server 32 is not illustrated as connected to the network resources 46 other than through the enterprise server 44. However, the local application server 32 may be connected to the network resources 46 in other ways without departing from the scope or essence of the present invention. The local application server 32 may also be connected or networked to the other network resources 46. Each of the components of the system 30 will be discussed in more detail below.
A significant advantage of the system 30 of the present invention is that the system 30 allows communication to and from workers from and to a central monitor and control server, such as a local application server or enterprise server, such that accuracy and efficiency of individual workers and groups of workers are increased and such that the workers can function as locally mobile workers or directed resources. Such efficiency improvement is a direct result of incorporating speech recognition and speech synthesis techniques into a work environment in a way that is natural for the workers and makes intuitive sense to the workers while providing sufficient decoding accuracy for different pronunciations of words, terms, and phrases to be used or otherwise uttered by the workers as input signals to a remote unit.
A significant feature of the system 30 is that the worker is preferably allowed to conduct activity only in a finite number of operating states. Each individual operating state is directed to one or more subsets of functions or tasks to be performed by the worker while the worker is in the individual state. The interaction between the worker and the system 30 is preferably conducted through speech and audio prompts, i.e., a speech centered interface, such that the worker's eyes and hands are available or free for other tasks. The speech prompts provided to a worker may be provided by text-to-speech conversion by a remote unit of a signal or information sent or communicated to the remote unit from the local application server 32. More specifically, with a speech-centered interface between the worker and the system 30, the worker uses a limited or constrained set of allowed audible input signals or utterances to transition between operating states and to provide information and requests via a remote unit to the local application server 32 while the worker is operating in a particular state. Each of the input signals or utterances allowed for each state preferably has a high probability of being decoded accurately by the remote unit or the local application server 32. The audible input signals or utterances also preferably make intuitive sense to the workers and naturally tie in to the operating states. Each state may have its own set of allowed utterances that differ from other states' sets of allowed utterances and which will be accepted or recognized by the local application server 32 for the state. As will be described in more detail below, the worker is locally mobile and guided as a directed resource by the
local application server 32 through the receiving, performance, completion, and reporting of tasks provided or communicated to the worker from the local application server 32 via a remote unit or other client device.
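The notion of a constrained, per-state set of allowed utterances can be pictured with a small table. The utterances shown are examples discussed elsewhere in this description; the data structure, the particular groupings, and the function name are illustrative assumptions only:

```python
# Sketch of per-state allowed-utterance sets; the groupings and identifiers
# are illustrative assumptions, not the disclosed vocabulary.

ALLOWED_UTTERANCES = {
    "logon":        {"give me more work", "gimme more work"},
    "obtain work":  {"ready to go", "begin work"},
    "process work": {"got it", "gimmee a break", "give me a break",
                     "how am I doing", "cut product", "backload product"},
    "break":        {"ready to go"},
}

def is_accepted(state: str, utterance: str) -> bool:
    """True if the utterance is in the allowed set for the worker's current state."""
    return utterance in ALLOWED_UTTERANCES.get(state, set())

print(is_accepted("process work", "got it"))   # True
print(is_accepted("logon", "got it"))          # False
```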
A system using speech recognition, such as the system 30, can be used in a production, assembly line, picking, manufacturing or other work or directed resource environment or operation to increase the efficiency of workers. For example, in a picking application where a worker is picking and filling orders for food or other merchandise from a moving conveyer belt or from a collection of bins, the conveyer belt and each bin containing one or more items that can be selected by the worker, voice or speech recognition may be used by the system 30 to receive or accept replies or statements from the worker or requests from the worker for new instructions. In addition, information or instructions may be given to the worker by a remote unit visually or audibly through speech synthesis or artificially created oral commands. Such instructions to the worker might include what items to pick and how many of each item to pick. The instructions would be generated by the local application server 32 and communicated to the worker via a remote client device, such as the remote client 34. The worker might wear a headset with ear phones and a microphone, which form part of the remote unit 34, so that the interaction between the worker and the system 30 is conducted primarily with speech and audio prompts, thereby leaving the worker's hands and eyes free. The remote unit preferably communicates directly or via wireless transmission to a central computer or server, such as the local application server 32, which provides information and instructions to the worker via the remote unit and receives requests for information or instructions from the worker provided as audible inputs to the remote unit.
In an application of the system 30 in a picking environment, workers may be positioned at or even wearing the remote units or clients 34, 36, 38, 40, 42. The local application server 32 may provide information and instructions to the workers via the remote units or clients 34, 36, 38, 40, 42 and receive audible requests for information or instructions from the workers via the remote units or clients 34, 36, 38, 40, 42. The allowed operational states may include a first state where the worker is obtaining or receiving work or tasks via a remote unit from the local application server, a second state where the worker is processing or completing such work or tasks, and a third state where the worker is reporting via the remote unit to the local application server on the worker's progress. Each of the three states may have a different set of allowed audible input signals or utterances that the worker can use for communication to the local application server 32 via the remote unit. More specifically, in the first state, the local application server 32 might provide one or more instructions to the worker via a remote unit to tell the worker what items to select from a moving conveyer belt or bin(s). After receiving the instructions while in the first state, the worker may transition from the first state to the second state to complete the assigned instructions. Once the worker has completed the instructions while in the second
state, the worker may transition from the second state to the third state by providing an audible input to the remote unit to signal such completion by the worker to the local application server 32 and to signal transition to the third state. The worker may also place the completed order of selected items on a conveyer belt that will deliver them to a loading dock for shipment. After a worker has signaled completion of an order or instruction previously provided by the local application server 32 via a remote unit, the worker may transition back to the first state and the local application server 32 may provide one or more additional or new instructions or tasks to the worker via the remote unit to allow the worker to pick new items and fill a new order. Since the worker is preferably wearing the remote unit, such that the worker is mobile and can easily use his or her arms and hands, all commands and statements from the local application server 32 are preferably provided audibly via ear phones to the worker and the input from the worker to the remote units is preferably provided by the worker speaking into a microphone that forms part of the remote unit. The utterances made by the worker can be decoded either by the remote unit or by the local application server 32. The local application server 32 may send and/or receive information from the enterprise system or server 44. Based on information received from the enterprise server 44, the local application server 32 may modify instructions or information previously sent to workers at the remote units 34, 36, 38, 40, 42, send new instructions or information to workers at the remote units 34, 36, 38, 40, 42, etc.
Now referring to Figure 2, a generalized state diagram or work model 80 is provided that can be used with the system 30 and that allows workers to function as directed resources. As previously discussed above, a significant feature of the present invention is that a worker is allowed to work or be active in a finite number of operational states, each state having a predefined or limited number of tasks, functions, or activities that can be performed by a worker and a limited number of allowed audible utterances or input signals that can be used by the worker to communicate with the local application server 32 which is directing the worker's activities. The allowed utterances for each state preferably make intuitive sense to the worker and have a high probability of being accurately decoded by either a remote unit or the local application server 32. Thus, while a worker is completing tasks, instructions, or other actions, the worker is, in essence, conducting one or more allowed tasks, functions, or activities for a state or transitioning from one state to another state that allows the desired or needed task, function, or activity. Therefore, the directed resource or work model 80 provides a general governing framework for a worker using speech recognition and audible prompts and/or feedback in a manufacturing, production, assembly line, picking, or other directed resource work environment or model, as will be described in more detail below.
For purposes of further explanation of the state diagram 80, the worker will be presumed to be located at the remote unit 34 and to be in wireless communication with the local application server 32
for sending requests and statements to the local application server 32 and for receiving instructions and information from the local application server 32. The remote unit 34 preferably includes a microphone and headphones and is worn by the worker to allow significant freedom of movement by the worker or such that the worker can be locally mobile. The remote unit preferably converts all sound or audible signals or utterances made by the worker into a digital or analog signal representative of the sound or audible signals or utterances and transmits them to the local application server for decoding by the local application server. By using speech recognition for providing input from the worker to the remote unit 34, the worker provides information into the remote unit 34 orally or audibly such that no manual entry via keyboard, mouse, etc. is required, thereby simplifying the information entry process for the worker. In addition, commands or instructions to the worker regarding the type and sequence of information to be provided by the worker to the remote unit 34 are also preferably provided orally or audibly to the worker via the speakers or headphones. While the commands, information, or instructions from the remote unit 34 may be presented visually to the worker on a screen or other display on the remote unit 34, audible commands, information, or instructions are preferred since they allow for more freedom of movement and other activity by the worker since the worker does not have to look at a screen or other display on the remote unit 34 to receive the commands, information, or instructions. Thus, the system 30 preferably contains a speech synthesis or text to speech capability to allow commands, information, and instructions to be audibly communicated to the worker. This text to speech capability can reside either on the remote unit 34 or on the local application server 32 but preferably is implemented in a client-server fashion on both the local application server 32 and the remote unit 34. Such speech synthesis capability is well known in the art and does not need to be described in further detail for purposes of elaboration or explanation of the present invention.
Before a worker has initiated contact with the local application server 32 from the remote unit 34, the remote unit 34 will be in a wait state 82. That is, the remote unit 34 is generally inactive and little, if any, communication is occurring between the remote unit 34 and the local application server
32. In essence, since no worker is positioned at or using the remote unit 34, the remote unit 34 is dormant and no communication session exists between the remote unit 34 and the local application server 32. In fact, the remote unit 34 may be powered down, off, recharging, or in a low power consumption "rest" mode. When a worker at the remote unit 34 first initiates contact or a communication session with the local application server 32, the remote unit 34 transitions from the wait state 82 to a logon or login state 84. Such transition may be automatic or may require the worker to provide an audible input signal or utterance to the local application server 32 via the remote unit 34 that the worker desires to transition to the login state 84. Automatic transition may occur by the worker simply turning or
powering on the remote unit 34. The worker is then considered as being in the logon state 84 and the worker has established a communication session with the local application server 32 via the remote unit 34. For purposes of explanation, but not limitation, of the present invention, the worker and/or the remote unit 34 may be considered as being in the same state. Thus, the following description will refer to the worker and the remote unit 34 as being in a state or transitioning from one state to another state.
During the logon state 84, the remote unit 34 and the local application server 32 will communicate with each other to establish a communication session between them to allow the worker at the remote unit 34 to gain access to the system 30, receive instructions and/or information from the local application server 32, and send statements or requests for information and/or instructions to the local application server 32. As part of the logon process that occurs while the worker is in the logon or login state 84, the local application server 32 may require that the worker enter a password or other verification information via the remote unit 34 to identify the worker and other information regarding the location of the worker and, as a result, the remote unit 34. The local application server 32 may store or otherwise keep information regarding the worker or worker verification itself or the local application server 32 may communicate with the enterprise server 44 or the network resources 46 to retrieve or otherwise access such information.
As will be discussed in more detail below, the words, phrases, sentences, or other phonetic content uttered by the worker and accepted by the local application server 32 for the logon state 84 during the logon process or while the worker is in the logon state 84 are preferably chosen to create high recognition accuracy while limiting or minimizing the length of time required for the worker to complete the logon process while in the logon state 84. In addition, the words, phrases, sentences, or other phonetic content uttered by the worker during the logon process or while in the logon state 84 are preferably chosen such that they seem natural to the worker and make intuitive sense to the worker. After the worker at the remote unit 34 successfully completes the logon process while in the logon state 84, the worker and the remote unit or client 34 transition from the logon state 84 to the obtain work state 86. Such transition from the logon state 84 to the obtain work state 86 may happen automatically or only after the worker provides a predefined or predetermined input signal or utterance to the local application server 32 via the remote unit 34, such input signal or utterance being one of the allowed input signals or utterances for the worker while the worker is in the logon state 84. If the worker does not successfully complete the logon process while in the logon state 84, the worker and the remote unit 34 may be automatically returned to the wait state 82 by the local application server 32. The worker may also be directly and automatically transitioned by the local application server 32 from the logon state 84 to the logoff state 88, which will terminate the worker's session or other communication with the local application server 32. If desired, the worker may signal or indicate a
desire to transition from the logon state 84 to the wait state 82 or to the logoff state 88 by providing a predefined or predetermined input signal or utterance to the local application server 32 via the remote unit 34, such predefined or predetermined input signal or utterance being one of the input signals or utterances allowed for use by the worker or accepted by the local application server 32 while the worker is in the logon state 84.
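A small sketch of the logon outcome, assuming a stand-in verification check and the success/failure transitions just described, might look like the following; the function names and the verification logic are hypothetical:

```python
# Sketch of the logon exchange: verify the worker, then move to the obtain
# work state on success or return to the wait state on failure.
# verify_worker() is a hypothetical stand-in for the server-side check, which
# might consult local records or the enterprise server / network resources.

def verify_worker(worker_id: str, password: str) -> bool:
    # Placeholder check; real verification data is kept by the server.
    return bool(worker_id) and bool(password)

def complete_logon(worker_id: str, password: str) -> str:
    """Return the next state after a logon attempt."""
    if verify_worker(worker_id, password):
        return "obtain work"      # successful logon proceeds to obtain work state 86
    return "wait"                 # failed logon returns the unit to the wait state 82

print(complete_logon("worker-34", "1234"))  # obtain work
print(complete_logon("worker-34", ""))      # wait
```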
While in the obtain work state 86, the remote unit 34 will communicate or interface with the local application server 32 to obtain one or more instructions or tasks for the worker. That is, the worker will receive one or more tasks or other instructions to be completed by the worker from the local application server 32 via the remote unit 34. At any time while the worker is in the obtain work state 86, the worker may terminate the session or communication with the local application server 32 and move or transition to the logoff state 88 by providing a predefined or predetermined input signal or utterance to the local application server 32 via the remote unit 34, such predefined or predetermined input signal or utterance being one of the input signals or utterances allowed to be used by the worker or accepted by the local application server 32 while the worker is in the obtain work state 86, as will be described in more detail below. After the worker has obtained one or more instructions from the local application server 32 via the remote unit 34, the worker transitions from the obtain work state 86 to a process work state 90, either automatically or by the worker providing a predefined or predetermined input signal or utterance to the local application server 32 via the remote unit 34, such input signal or utterance being one of the input signals or utterances allowed to be used by the worker or accepted or recognized by the local application server 32 while the worker is in the obtain work state 86. While in the process work state, the worker performs or completes the tasks or instructions provided to the worker by the local application server 32 via the remote unit 34 while in the obtain work state 86. Once again, the instructions are preferably provided aurally to the worker by the remote unit 34 using speech synthesis and headphones on the remote unit 34. The worker may also provide oral statements back to the remote unit 34 while in the obtain work state 86 to indicate to the remote unit 34 and, as a result, the local application server 32, that the worker has received the instructions, that the worker does or does not understand the instructions, that the worker is ready to transition to the process work state 90, or to make other requests, as will be described in more detail below.
As previously discussed above and as will be discussed in more detail below, the words, phrases, sentences, or other phonetic content or utterances uttered by the worker and accepted by the local application server 32 for the obtain work state 86 while the worker is in the obtain work state 86 are preferably chosen to create high recognition accuracy or to have a high probability of accurate decoding by either the remote unit 34 or the local application server 32, while also limiting or minimizing the length of time required for the worker to complete the obtain work process while in the
obtain work state 86. In addition, the words, phrases, sentences, or other phonetic content uttered by the worker while in the obtain work state 86 are preferably chosen such that they seem natural to the worker and make intuitive sense to the worker while allowing the worker to transition to the process work state 90 or the logoff state 88. After the worker has successfully obtained one or more instructions or other work while in the obtain work state 86, the worker and the remote unit 34 transfer or transition to the process work state 90 either automatically or by the worker providing the appropriate input signal or utterance to the local application server 32 via the remote unit 34, wherein the worker attempts to complete the instructions or tasks assigned to it by the local application server 32 while the worker was in the obtain work state 86. During processing or completion of the instructions or tasks while the worker is in the process work state 90, the worker may be in contact or communication with the local application server 32 via the remote unit 34. In addition, the worker may terminate the process work state 90 and transition to the logoff state 88 by providing a predefined or predetermined input signal or utterance to the local application server 32 via the remote unit 34, as will be described in more detail below. If after the completion of the tasks or instructions by the worker while the worker is in the process work state 90, the worker is required to report task related details to the local application server 32, the worker will transition to the report work state 92. Such transition from the process work state 90 to the report work state 92 may happen automatically or may have to be initiated by the worker providing a predefined or predetermined input signal or utterance to the local application server 32 via the remote unit 34, such input signal or utterance being one of the input signals or utterances allowed to be used by the worker while the worker is in the process work state 90. If no such reporting is required, the worker may transition back to the obtain work state 86 to receive one or more new instructions or tasks from the local application server 32 via the remote unit 34.
As previously discussed above and as will be discussed in more detail below, the words, phrases, sentences, or other phonetic content uttered by the worker and accepted by the local application server 32 for the process work state 90 while the worker is in the process work state 90 are preferably chosen to create high recognition accuracy while limiting or minimizing the length of time required for the worker to complete the process work interaction process while the worker is in the process work state 90. In addition, the words, phrases, sentences, or other phonetic content uttered by the worker while in the process work state 90 are preferably chosen such that they seem natural to the worker and make intuitive sense to the worker while allowing the worker to move or transition easily to the obtain work state 86, the logoff state 88, or the report work state 92.
In general, little phonetic content will likely be created by the worker while in the process work state 90. That is because, during the process work state 90, the worker is primarily implementing the
one or more instructions or tasks received from the local application server 32 while in the obtain work state 86. Most, if not all, communication initiated by the worker to the remote unit 34 or the local application server 32 while the worker is in the process work state 90 will be directed to transitioning the worker to another state, such as the report work state 92, as will be described in more detail below. After the worker has completed one or more tasks provided to the worker from the local application server 32 during the obtain work state 86, the worker transitions from the process work state 90 to the report work state 92. Such transition may happen automatically or by the worker providing a predefined or predetermined input signal or utterance to the local application server 32 via the remote unit 34, such input signal or utterance being one of the input signals or utterances allowed to be used by the worker or accepted by the local application server while the worker is in the process work state 90. The local application server 32 may require the worker to enter the report work state 92 after each single task has been completed by the worker, after a series or set of multiple tasks have been completed by the worker, or periodically as initiated by the worker, the local application server 32, the enterprise server 44, etc. For example, the local apphcation server 32 and/or the enterprise server 44 may initiate a request for information to the worker via the remote unit 34, thereby requiring the worker to transition from the process work state 90 to the report work state 92. Such a request may happen at regularly scheduled intervals, sporadically at times to be determined by either the enterprise server 44 or the local apphcation server 32, or both.
After a worker reports his or her status or provides any other information requested or required during the report work state 92, the worker may transition back to the process work state 90 to continue completing instructions or tasks previously provided to the worker while in the obtain work state 86 or to the obtain work state 86 so that the worker can receive new or updated instructions or tasks from the local apphcation server 32. The worker, remote unit 34, or the local apphcation server 32 may also terminate the worker's session, thereby transitioning the worker to the logoff state 88. As with the states previously described above, any communication by the worker to the local apphcation server 32 is preferably provided orally or audibly by the worker to the remote unit 34 and processed or decoded by the remote unit or chent 34 or passed to the local application server 32 for processing or decoding. The worker may also provide oral statements back to the remote unit 34 while in the report work state 92 to indicate to the remote unit 34 and, as a result, the local apphcation server 32, that the worker is ready to transition from the report work state 92 to another state or to make other requests or provide other information, as will be described in more detail below.
After the worker and remote unit 34 transition to the logoff state 88 from the obtain work state 86, the process work state 90, or the report work state 92, the worker ends the session or communication with the local application server 32, thereby freeing up the remote unit 34 for use by
another worker, diagnostic service or maintenance, etc. While in the logoff state 88, the remote unit 34 and/or the local application server 32 may ask the worker to verify that he or she wishes to terminate the session or request other information from the worker. If the worker does not wish to terminate the session or if the worker entered the logoff state 88 inadvertently, the worker may be returned or transitioned to the state the worker was in just prior to entering the logoff state 88.
After the remote unit 34 has completed logoff state 88, the remote unit 34 may transition or transfer immediately and/or automatically back to the logon state 84 for use by the same or a new worker or the remote unit 34 may transition to the wait state 82 to await activation or use by the same or another worker. The wait state 82 may be combined with or considered as part of either the logon state 84 or the logoff state 88 such that the wait state 82 does not exist separately. The logoff state 88 may also be combined with the logon state 84 into a more generalized connection/disconnection operational state if desired.
As previously discussed above, a significant aspect of the present invention is that a worker uses audible words, phrases, sentences, or other utterances or other phonetic content while communicating via a remote unit or client with the local application server 32 and while in a state or to initiate or complete transition from one operational state to another. Each operational state has its own set of allowed utterances that will be accepted by the local application server 32 or otherwise be considered as a valid input from a worker while the worker is in the state. Different states may have different sets of allowed or acceptable utterances and there may be some utterances that are allowable in some or all of the states. Therefore, the words, phrases, sentences, or other utterances chosen become a significant part of the interface between a worker and a remote unit. Preferably, the words, phrases, sentences, or other utterances or phonetic content selected for use for each state balance the considerations of recognition accuracy, efficiency, and ease of use.
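For readers who find it easier to picture the per-state vocabularies as data, the following Python sketch shows one minimal way such a table-driven arrangement could look. It is only an illustration, not the claimed implementation: the state names follow the model 80 of Figure 2, the utterance strings are taken from examples discussed later in this description, and the exact table structure and fallback behavior are assumptions of the sketch.

```python
# Illustrative sketch only (not the claimed implementation): each operational
# state carries its own set of accepted utterances, and some utterances also
# trigger a transition to another state.

ALLOWED = {
    "logon":        {"yes", "give me more work", "gimme more work"},
    "obtain_work":  {"ready to go", "begin work"},
    "process_work": {"got it", "gotit"},
    "report_work":  {"ready to go", "give me more work"},
    "logoff":       {"yes", "no"},
}

TRANSITIONS = {
    ("logon", "give me more work"):       "obtain_work",
    ("logon", "gimme more work"):         "obtain_work",
    ("obtain_work", "ready to go"):       "process_work",
    ("obtain_work", "begin work"):        "process_work",
    ("process_work", "got it"):           "report_work",
    ("process_work", "gotit"):            "report_work",
    ("report_work", "ready to go"):       "process_work",
    ("report_work", "give me more work"): "obtain_work",
}

def handle_utterance(state: str, utterance: str) -> str:
    """Accept an utterance only if it is allowed in the current state,
    then apply any state transition it names."""
    if utterance not in ALLOWED.get(state, set()):
        # In the fuller model described below this would route to an
        # exception-handling state rather than simply staying put.
        return state
    return TRANSITIONS.get((state, utterance), state)

state = "obtain_work"
state = handle_utterance(state, "ready to go")   # -> "process_work"
state = handle_utterance(state, "gotit")         # -> "report_work"
```

A worker's session then reduces to repeatedly checking the spoken utterance against the current state's allowed set and applying whatever transition it names.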
Selection of allowed utterances for each state begins with theoretical considerations such as phoneme content, phrase length, and semantic appropriateness of utterances. Each of these criteria will be discussed in more detail below. Candidate utterances are then evaluated empirically to see if they work well in practice. Finally, utterances are again evaluated empirically in the full system context to see if they work well together as an integrated user interface.
Semantic appropriateness of an utterance is directed to the naturalness of the utterance from a worker or user point of view and includes a determination of whether the utterance makes intuitive sense to the worker or user. Utterances are needed to allow a worker to interact with the system 30 so that the worker can communicate with a remote unit and, as a result, the local application server 32. Primary considerations for semantic appropriateness are that the utterances make sense as natural phrases to the worker and that the utterances are easy to remember and speak. Therefore, for example,
the utterances preferably do not include tongue twisters, ungrammatical phrases or sentences, or a series or string of unrelated words. The utterances do not have to make sense to everyone, but should make sense to an experienced user of the application or an experienced worker using the system 30 in a particular environment. The length of an utterance will also affect the usefulness and efficiency of a speech recognition interface. Generally, shorter utterances are preferable over longer utterances because they improve the efficiency of the overall interaction, both for the worker or user, who takes less time to speak the utterance, and for the system, which takes less time to decode or figure out what was said or uttered by the worker or user. There is a tradeoff between utterance length and recognition accuracy, however, because very short utterances may be difficult for the local application server 32 to recognize or decode accurately and short utterances are sometimes difficult to distinguish from background noise. Shorter utterances may also be more ambiguous out of context and therefore may not be as appropriate semantically.
The phoneme content of an utterance plays a significant role in whether or not to include the utterance in the speech recognition interface. Phonemes, or phones, are short speech elements akin to an alphabet for sounds. All languages, including American English, can be described in terms of a set of distinctive sounds, or phonemes. For example, in American English, there are approximately forty-two phonemes including vowels, diphthongs, semi-vowels, and consonants.
The actual sounds present in the words of candidate utterances are considered based on experience with how well these individual sounds or phonemes are recognized. For example, better phrases are those that avoid high-frequency initial fricatives (/f/, /s/, /sh/, etc.) as in the words "file" or "show," that avoid the high front vowel (/iy/) as in the words "clean" or "read," that avoid initial aspirated stops (/p/, /t/, /k/) as in the words "tool" or "copy," that avoid initial liquids (/l/, /r/) as in the words "link" or "write," that avoid initial nasals (/n/, /m/) as in the words "new" or "more," or that are multi-syllabic such as the words "information" or "exception." Usually the initial phoneme of a word or a phrase gets the most attention, especially when the speech recognition system is to be triggered by these phrases. Some very common words have many acceptable pronunciation variations, for example, "ok." This often works against a candidate word, especially if the word is very short to begin with.
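A rough way to apply these phoneme guidelines during utterance selection is to screen each candidate word against the sound classes listed above. The sketch below is purely illustrative: the tiny word-to-initial-phoneme table stands in for a real pronunciation dictionary, which an actual system would use, and it only checks word-initial sounds, so it is a first-pass filter rather than a complete analysis.

```python
# Illustrative screening of candidate phrases against the initial-phoneme
# guidelines above. The pronunciation table below is an assumed stand-in for
# a real pronunciation dictionary.

AVOID_INITIAL = {
    "f", "s", "sh",   # high-frequency fricatives
    "iy",             # high front vowel
    "p", "t", "k",    # aspirated stops
    "l", "r",         # liquids
    "n", "m",         # nasals
}

INITIAL_PHONEME = {   # assumed first phonemes for a few example words
    "file": "f", "show": "sh", "clean": "k", "read": "r",
    "tool": "t", "copy": "k", "link": "l", "write": "r",
    "new": "n", "more": "m", "gotit": "g", "ready": "r",
    "give": "g", "begin": "b",
}

def screening_notes(phrase: str) -> list[str]:
    """Flag words whose (assumed) initial phoneme falls in a hard-to-recognize class."""
    notes = []
    for word in phrase.lower().split():
        first = INITIAL_PHONEME.get(word)
        if first in AVOID_INITIAL:
            notes.append(f"'{word}' starts with /{first}/, which is harder to trigger on")
    return notes

print(screening_notes("ready to go"))   # flags 'ready' (initial liquid /r/)
```

Flagged words are not automatically rejected; they simply prompt the kind of empirical testing described next.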
As previously discussed above, candidate utterances are preferably tested out of context with any application to evaluate their base speech recognition or decoding potential. This provides an initial check of the decisions made prior to this step. Candidate utterances that meet the initial criteria are then tested in the context of the system. The candidate utterances are evaluated for accuracy of recognition or decoding by a speech recognition application operating in a remote unit or in a local application server and for ease of use of a worker. Sometimes an utterance that looked natural to a
worker will no longer seem so in the actual context of the system. If this happens, a new utterance can be chosen given the new information about how the former utterance was inappropriate. Sometimes an utterance will not give the required accuracy. Then a new utterance may be chosen or some part of the speech recognizer or decoder operating in a remote unit or in a local application server may be modified to improve recognition of the initial utterance.
With these criteria in mind, words, phrases, sentences, or other utterances can be selected to allow a worker located at or otherwise using a remote unit, such as the remote unit 34, to provide audible input to the remote unit which is then decoded by the remote unit or passed to the local application server 32 for decoding. Preferably, but not necessarily, the system 30 minimizes the number of allowable utterances available to a worker for each state or for initiating or completing a transition from one state to another state. Furthermore, some words, phrases, sentences, or other utterances may be allowed for some states but not others, or for some transitions between specific states, but not others, as will be discussed in more detail below. Limiting the number of allowed utterances for a state or a set of states in a directed resource application reduces the number of different utterances that must be decoded by either a remote unit or a local application server, reduces memory requirements for the storage of information related to the utterances, and can significantly increase the efficiency and recognition accuracy of the system 30 by increasing the accuracy of decoding of speech input signals.
In a typical method of selecting utterances based on the criteria previously discussed above, utterances that comprise an application's speech interface with a user or worker are used with different frequencies. That is, for some applications using a speech-centered interface, there exists a relatively small subset of allowed utterances, called "critical utterances," that are used by the user or worker with a significantly higher frequency than the rest of the allowed utterances. Typically, the frequency of using an utterance from a critical set of utterances is over ninety percent (90%) of all of the utterances used or allowed in the application speech interface while the size of the critical set of utterances is under five percent (5%) of the total number of utterances used or allowed in the application speech interface. Each of the states 82, 84, 86, 88, 90, 92 may use a different set of critical utterances to allow a worker to indicate or initiate a transition from one state to another state or to provide information or requests to a local application server or an enterprise server via a remote client while in a state.
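The notion of a critical set can be checked empirically once usage counts are available. The sketch below illustrates the idea with made-up counts: it picks the smallest group of utterances covering ninety percent of observed usage. With a realistically large vocabulary, that group would also fall well under five percent of the total number of allowed utterances; the toy vocabulary here is too small to show that second property.

```python
# Illustrative check of the "critical set" idea: find the smallest group of
# utterances that covers at least 90% of observed usage. The usage counts
# below are made up for the example.

from collections import Counter

usage = Counter({
    "gotit": 9500, "ready to go": 310, "gimme more work": 95,
    "gimme a break": 40, "how am I doing": 25, "cut product": 15,
    "backload product": 10, "gimme some help": 5,
})

def critical_set(usage: Counter, coverage: float = 0.90) -> list[str]:
    total = sum(usage.values())
    covered, chosen = 0, []
    for utterance, count in usage.most_common():
        if covered / total >= coverage:
            break
        chosen.append(utterance)
        covered += count
    return chosen

critical = critical_set(usage)
print(critical, f"{len(critical) / len(usage):.0%} of the vocabulary")
```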
Due to the high frequency of use, the number of speech recognition or decoding errors of critical utterances is preferably very low. In order to reflect the different impact that different types of errors have on an application using a speech-centered interface, three error indices are preferably employed for the system 30: (1) false rejection rate (FR), (2) false acceptance rate (FA), and (3) phrase error rate
(ER), which will be defined in more detail below. In order to increase the efficiency and usability of a speech interface, the following constraints on utterances from a critical set of utterances are preferably imposed for each allowed operational state: FR is less than five percent (5%), FA is less than three percent (3%), and ER is less than two percent (2%). One significant objective of designing a critical set of utterances for different states is to satisfy the above constraints on error indices while selecting appropriate utterances, e.g., utterances in common use in the application domain, utterances intuitively fitting to the action or task performed by a user, worker, or other directed resource, and utterances consistent in structure and meaning with the rest of the application. For example, the phrase "gotit," a compression of the phrase "got it," satisfies these objectives as the response indicating fulfillment of an order request (work completion) in a picking application domain. Therefore, the phrase "gotit" may be one of the allowed critical utterances for a worker while the worker is in the process work state 90.

An empirical approach can be used to select critical utterances for each allowed state for a worker. First, utterances that are in common use or that make intuitive sense in the application domain to describe user or worker actions and their outcomes for the application domain are gathered. An application domain can be defined as some collection of specific application areas that can be considered essentially the same area for the purpose of designing a user interface. For example, picking as a directed resource application is essentially the same if the products to pick are food items or alternatively if the products to pick are motorcycle parts; thus picking is an example application domain. Based on the application usage scenarios, a critical set of utterances (a small subset of the gathered utterances with very high frequency of use) exists for each state. After such a set of critical utterances is determined for each state, a dialog is structured between the worker and the application by constraining the sets of utterances to groups that can be accepted by the application at each given point in its operation or at each allowed state. In doing so, it is preferred to group together critical utterances and to separate non-critical utterances into different groups. In some cases, the removed non-critical utterances are preferably replaced with a single "exception" utterance (further treated as critical) in the group of critical utterances that refers to a different group where those utterances are placed or that transitions a worker from a state where such non-critical utterances are not allowed to be used by the worker to a state where such non-critical utterances can be used by the worker. The exception utterance will be used by the worker to initiate a transition between the two states.

Next, potential speech recognition or decoding errors in the obtained groups of utterances are estimated. Such errors can be created by, for example, the application rejecting a critical utterance either as noise, or as being unreliably recognizable with respect to other utterances in the same group of utterances, confusing one critical utterance with another critical utterance in the same group of utterances, confusing a critical utterance with a non-critical utterance from the same group of
utterances, confusing two non-critical utterances from the same group of utterances, or mistaking utterances outside of a group of utterances for those inside the group of utterances. Confuseability (i.e., a degree of phonetic similarity) among the utterances in the same group is preferably minimized by changing the words that comprise the utterances and re-estimating potential recognition errors. Phonetic separation of words from the noise sounds (typical for an operating noise background for a given application) is also preferably maximized by changing the words that comprise the utterances and re-estimating potential recognition errors.
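Confuseability within a group can be estimated, at least crudely, before any recognition tests are run. In the sketch below, difflib similarity over assumed phoneme spellings stands in for the acoustic-model-based comparison a real recognizer would provide; both the phoneme strings and the scoring method are illustrative assumptions, not the system's actual models.

```python
# Illustrative confuseability estimate: pairwise similarity of (assumed)
# phoneme strings for utterances grouped in the same state. Higher scores
# suggest pairs worth re-wording and re-testing.

from difflib import SequenceMatcher
from itertools import combinations

# Assumed, simplified phoneme spellings for a few utterances in one group.
PHONEMES = {
    "gotit":         "g aa t ih t",
    "cut product":   "k ah t p r aa d ah k t",
    "gimme a break": "g ih m iy ah b r ey k",
}

def confuseability(group: dict[str, str]) -> list[tuple[str, str, float]]:
    """Return utterance pairs ranked by phonetic similarity (higher = worse)."""
    pairs = []
    for (a, pa), (b, pb) in combinations(group.items(), 2):
        pairs.append((a, b, SequenceMatcher(None, pa.split(), pb.split()).ratio()))
    return sorted(pairs, key=lambda p: p[2], reverse=True)

for a, b, score in confuseability(PHONEMES):
    print(f"{a!r} vs {b!r}: similarity {score:.2f}")
```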
After the above optimization stage is finished such that each allowed state has a defined set of allowed utterances, empirical tests can be performed to measure recognition or decoding error. A corpus or collection of spoken utterances with appropriate frequencies of use according to application usage scenarios can be used to perform recognition tests with the speech recognition engine and to evaluate error indices FR, FA and ER.
For a given constrained set of utterances, error indices for critical utterances can be defined as follows: (1) FR as a percent of spoken critical utterances from the constrained set that are incorrectly rejected by a speech recognizer or decoder;
(2) FA as a percent of spoken utterances outside the constrained set or non-critical utterances from the constrained set that are accepted as some critical utterances from the constrained set; and
(3) ER as a percent of spoken critical utterances from the constrained set recognized as different utterances from the constrained set (either critical, or non-critical).
Similar indices are defined for non-critical utterances, and they are also preferably evaluated.
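As a concrete illustration of these definitions, the sketch below computes FR, FA, and ER from a list of (spoken, recognized) pairs. The record format, the utterance sets, and the test data are all assumptions made for the example; only the definitions themselves come from the description above.

```python
# Illustrative computation of the three error indices from recognition test
# results. Each record pairs the utterance actually spoken with what the
# recognizer returned; None means the input was rejected.

CRITICAL = {"gotit", "ready to go"}
CONSTRAINED_SET = CRITICAL | {"cut product", "backload product"}

results = [  # hypothetical (spoken, recognized) pairs
    ("gotit", "gotit"), ("gotit", None), ("gotit", "cut product"),
    ("ready to go", "ready to go"), ("hello there", "gotit"),
    ("cut product", "cut product"),
]

def error_indices(results):
    spoken_critical = [r for r in results if r[0] in CRITICAL]
    # FR: spoken critical utterances incorrectly rejected
    fr = sum(1 for s, h in spoken_critical if h is None) / len(spoken_critical)
    # ER: spoken critical utterances recognized as different utterances
    #     from the constrained set (critical or non-critical)
    er = sum(1 for s, h in spoken_critical
             if h is not None and h != s and h in CONSTRAINED_SET) / len(spoken_critical)
    # FA: utterances outside the critical set accepted as some critical utterance
    others = [r for r in results if r[0] not in CRITICAL]
    fa = sum(1 for s, h in others if h in CRITICAL) / len(others)
    return fr, fa, er

fr, fa, er = error_indices(results)
print(f"FR={fr:.1%}  FA={fa:.1%}  ER={er:.1%}")  # compare against 5% / 3% / 2%
```

Running the sketch on the tiny made-up corpus prints the three rates, which can then be compared against the constraints stated above.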
Based on the results of recognition tests, the phonetic composition of the critical utterances is preferably optimized, as well as still-confuseable non-critical utterances in the same sets. Two adjacent words in an utterance may be joined to more precisely represent the way it is going to be uttered by a user or worker. For example, a critical utterance "got it" was replaced with a compound made-up word "gotit" in order to disallow any pause between the two words, and to develop more precise phonetic pronunciations for it. Taken as separate words, "got" and "it" would have to allow a speech recognition engine or decoder to insert a pause between them which would have increased a chance of confusing "it" with noise. It would have also made it more difficult to account for precise co-articulation between the two words in a way that they are commonly pronounced together. Some critical utterances (usually single-word ones) may be represented with more detailed phonetic models, to account for important pronunciation details for the utterance. After a number of changes are made and re-tested, the resulting sets of utterances (both critical and non-critical) for each state are ready for use in an application speech interface in a directed resource environment.
Referring back again to Figure 2, exemplary utterances for the states 82, 84, 86, 88, 90, 92 and for initiating or completing transitions between states will now be described in more detail. The utterances are selected according to the criteria described above and generally work well in a directed resource environment. Therefore, the utterances contain phonetic content that improves speech recognition accuracy, i.e., the accuracy of the decoding of the speech input from a worker. The utterances also have minimal length to improve performance while also making intuitive and natural sense to a worker.
As previously described above, a worker first activates a remote unit while in the logon state 84 or after transitioning to the logon state 84 from either the wait state 82 or the logoff state 88. While in the logon state 84, the worker provides audible or oral information to the remote unit that is either decoded by the remote unit or transmitted to a local application server for decoding. A sample dialog between the worker and the remote unit 34 for logging in to the system 30 or while otherwise in the logon state 84 is as follows:
System (via speech synthesis to a remote client): Please say employee ID, work class, and shift.
Worker: three five one, picker, shift alpha.
System (via speech synthesis to a remote client): Are you Al Fansome, three five one, picker, shift alpha, speaker model A?
Worker: Yes.
Thus, the phrases or words "yes," "picker," "shift alpha," etc., may form part of the allowed set of input signals or utterances that a worker can use and which will be accepted by the local application server 32 while the worker is in the logon state 84.
After a worker has completed the logon process while in the logon state 84, the worker or the remote unit might automatically transition the worker and the remote unit from the logon state 84 to the obtain work state 86. Alternatively, the remote unit may query the worker to see if the worker is ready to transition from the logon state 84 to the obtain work state 86 or the remote unit may simply wait until the worker provides an audible input to the remote unit signaling the worker's desire to transition from the logon state 84 to the obtain work state 86. A dialog between the remote unit and the worker in which the remote unit queries or asks the worker if the worker is ready to transition from the logon state 84 to the obtain work state 86 is provided as follows:
System (via speech synthesis to a remote client): Please request work.
Worker: Give me more work.
A list of words, phrases, sentences, or other utterances that can be supplied or provided by a worker to a remote unit and, as a result, to the local application server 32, to signal the worker's desire to transition from the logon state 84 to the obtain work state 86 can include:
Worker: Give me more work.
or
Worker: Gimme more work.
Therefore, the phrases "give me more work" or "gimme more work" may form part of the allowed set of input signals or utterances that a worker can use and which will be accepted by the local application server while the worker is in the logon state 84.

As previously discussed above, a worker in the obtain work state 86 is receiving one or more instructions or tasks from the local application server 32 via a remote unit. The worker may signify reception or receipt of the instruction(s) or task(s) by providing "ready to go" or "begin work" as an input signal to a remote unit and, as a result, the local application server 32. The utterances "ready to go" or "begin work" may be critical utterances for the obtain work state 86. These utterances were chosen with the criteria for utterances previously discussed above in mind. Receipt by the local application server 32 of the utterance "ready to go" from a worker will transition the worker from the obtain work state 86 to the process work state 90.
Once the worker is in the process work state 90, the worker will attempt to complete the instruction(s) or task(s) provided to the worker by the local application server 32 while the worker was in the obtain work state 86. After completion of one or more instructions or tasks, the worker may signify such completion by uttering "got it" or "gotit" as an input to a remote unit and, as a result, the local application server 32. These utterances meet the criteria described previously above and may be critical or allowed utterances for the worker while the worker is in the process work state 90 and will be accepted by the local application server 32 while the worker is in the process work state 90. Depending on the application that the worker is currently involved with, the worker may then automatically transition back to the obtain work state 86 to receive one or more instructions or tasks, automatically transition to the report work/status state 92 to provide further information to a remote unit or a local application server, or the worker may remain in the process state 90 to initiate or complete another instruction or task.

Now referring to Figure 3, a second state diagram or work model 100 is provided that can be used with the system 30. The directed resource or work model 100 is essentially a more sophisticated or complex implementation of the state diagram or model 80 provided in Figure 2 and includes the wait state 82, the logon state 84, the obtain work state 86, the process work state 90, the report
work/status state 92, and the logoff state 88. Each of these states works in a manner similar to their description provided previously above in relation to the state diagram 80.
In addition to the states 82, 84, 86, 88, 90, 92, the state diagram 100 includes individually optional states such as a process exceptions state 102 for handling problems, commands, or other exceptions to the normal or expected flow of operation, a break state 104 that allows a worker to temporarily suspend initiation, execution, or completion of one or more instructions or tasks while in the process work state 90, a get clarification state 106 that allows a worker to ask questions about the information or instructions previously provided to the worker or other topics as allowed, and a get productivity state 108 that allows a worker to obtain information or updates regarding the worker's performance, production, efficiency, goal achievement, etc., while in the process work state 90 or the report work/status state 92. Each of these states 102, 104, 106, 108 will be described in more detail below.
In general, the process exceptions state 102 is preferably reachable from most, if not all, allowed states in a system. A worker might be transitioned to the process exceptions state 102 in general situations such as when a worker has provided an audible input to a remote unit, in response to a query created by the local application server 32 via the remote unit, that cannot be decoded by the remote unit or the local application server 32 into a command or statement that is valid for the worker's current state. For example, the worker may speak an utterance that for some reason the system 30, particularly the local application server 32, fails to decode. This may be an utterance pronounced poorly or incompletely by the worker, or an utterance provided by the worker to the remote unit not appropriate or allowed for the given context or state that the worker is in. The system 30 or local application server 32 may respond to this with a message such as "did not recognize last utterance" communicated to the worker via the worker's remote client. The worker may then repeat what was said, pronouncing it more carefully or completely, or say something else. In these general situations, the system 30, more particularly the local application server 32, might query the worker again via the remote unit and/or require the worker to provide a new audible response or input to the remote unit. Once a valid response or input has been provided by the worker to the remote unit and, as a result, the local application server 32, the worker will preferably automatically transition from the process exceptions state 102 back to the worker's previous state. If the exception cannot be handled or dealt with while the worker is in the process exceptions state 102, the worker may automatically be transitioned from the process exceptions state 102 to the logoff state 88.
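The re-prompt-then-logoff behavior just described can be pictured as a small loop. The sketch below is illustrative only: the prompt text is the example message from the description, while the retry limit, the function names, and the passed-in listen/prompt callbacks are assumptions.

```python
# Illustrative exception-handling loop: re-prompt the worker a limited number
# of times, then fall back to the logoff state. The retry limit of 3 is an
# assumed value, not taken from the description.

def process_exceptions(previous_state: str, listen, prompt, allowed: set[str],
                       max_retries: int = 3) -> str:
    """Return the state to transition to once the exception is resolved (or not)."""
    for _ in range(max_retries):
        prompt("did not recognize last utterance")
        utterance = listen()          # next audible input from the worker
        if utterance in allowed:
            return previous_state     # exception resolved; resume the prior state
    return "logoff"                   # exception could not be handled
```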
In addition to general exceptions, state specific exceptions might also exist that relate specifically to the function or activity of a worker while the worker is in a given state. Thus, the state specific exceptions might vary or be different from state to state. For example, while a worker is in the
logon state 84, the worker might provide several types of information to a remote unit and, as a result, a local application server. If the worker's responses, inputs, or other utterances to the remote unit are decoded correctly, either by the remote unit itself or the local application server, the worker might not create any general exceptions. However, the worker might provide information or an utterance to the remote unit that creates a state specific exception that will automatically transition the worker from the logon state 84 to the process exceptions state 102. For example, the worker might provide inconsistent information during the logon state 84 that, while not creating a general exception to be dealt with in the state 102, creates a logon state 84 specific exception that will be dealt with by the system 30 by transitioning the worker from the logon state 84 to the process exceptions state 102. The worker will then be requested by the local application server 32 via the worker's remote unit to provide clarifying information while in the process exceptions state 102 to eliminate the confusion and allow the worker to transition back to the logon state 84.
State specific exceptions may exist for any of the states in the state diagram 100 and some exceptions may require more time or information to complete than others. In addition, if the worker cannot resolve an exception while in the process exceptions state 102, the worker may be automatically transitioned from the process exceptions state 102 to the logoff state 88.
The break state 104 allows a worker in the process work state 90 to temporarily halt the worker's activities and signal such a halt to the local application server 32 via a remote unit. In many applications, the worker will be able to suspend work progress, at least temporarily, without further interaction with a remote unit or without receiving consent from a remote unit or a local application server. However, there may be circumstances where a worker wishes to signal the local application server 32 that the worker is suspending completion of a task or instruction. For example, a worker's performance on a task may be measured by how long the worker takes to complete the task. If the worker needs to suspend work or activity for any reason, such as a restroom break, the worker may want to indicate to the local application server 32 that the time measurement should be halted.
Therefore, the worker transitions from the process work state 90 to the break state 104. Once the worker transitions back to the process work state 90 from the break state 104, the time measurement for the worker's performance or completion of tasks can resume.
As with other transitions between states described previously above, a worker preferably can signal a transition from the process work state 90 to the break state 104 by an easy and intuitive utterance provided by the worker to a remote unit that has a high probability of being accurately decoded by a remote unit or by the local application server 32. More specifically, to signal a transition from the process work state 90 to the break state 104, the worker preferably utters "gimmee a break" or "give me a break" as an audible input signal to a remote unit, both phrases of which satisfy the criteria
described previously above, and which may be critical or allowed utterances for the process work state 90 and that can be used by a worker while in the process work state 90. The worker can signal a transition from the break state 104 to the process work state 90 by uttering "ready to go" as an audible input signal to a remote unit, which is the same utterance that transitions the worker from the obtain work state 86 to the process work state 90 and which may be a critical or allowed utterance for the obtain work state 86 and/or the break state 104 that a worker can use while in these states.
In general, a worker in the get clarification state 106 is requesting further information that may be specific to the state that the worker is currently in or that may be of a more generalized nature. Therefore, in a manner similar to the process exceptions state 102 previously described above, the worker can preferably transition to the get clarification state 106 from all or nearly all of the other states. The worker can signal a transition from a state to the get clarification state 106 by using the utterance "gimmee ___," where the blank is filled in for the particular request. Starting the utterance with the phrase "gimmee" or "gimme" satisfies the criteria discussed previously above for the utterance and the phrases may be in a group of allowed utterances for each of the other states and might even be considered a critical utterance that a worker can use for one or more states. More specifically, a worker might use the utterance "gimmee the product code" to transition from the process work state 90 to the get clarification state 106 based on the worker's desire to obtain more information about a product or item the worker is dealing with. While in the get clarification state 106, the system 30 might provide more information to the worker via a remote unit or query the worker to provide more specific information relating to the worker's request. Another general example utterance is "gimmee the time" or "gimmee the date" when the worker wishes to obtain such information from the system 30 or, more specifically, from the local application server 32 via a remote unit or client.
As another example, a worker may utter "gimmee some help" as an input signal to a remote unit to request further information from the local application server 32 about the current task or instruction the worker is to be performing, the current state that the worker is in, or about what words, phrases, sentences, or other utterances are allowable or available to the worker in the worker's current state. Other transition utterances beginning with the phrase "gimmee" that may be used as audible input signals by a worker to a remote unit will be discussed in more detail below.
The get productivity state 108 allows a worker to obtain performance information from the local application server 32 either while the worker is in the process work state 90 or while the worker is in the report status/work state 92. A worker might use the utterance "how am I doing" to transition from the process work state 90 or the report status/work state 92 to the get productivity state 108 by providing such utterance as an audible input signal to a remote unit or client. The utterance "how am I doing" satisfies the criteria described previously above.
Once a worker is in the get productivity state 108, the system 30 may query the worker via a remote unit to provide more information or may provide the worker with a pre-determined statement of information. The worker can signify a desire to transition back to the previous state (either the process work state 90 or the report work/status state 92) by uttering "ready to go," which may be a critical utterance for the state 108.
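Extending the earlier table-driven sketch to the optional states of Figure 3 might look as follows. The utterance strings are those discussed in this description; the state names, the table structure, and the handling of the return-to-previous-state case are assumptions of the sketch.

```python
# Illustrative extension of the earlier transition table for the optional
# states of Figure 3 (break, get clarification, get productivity).

EXTENDED_TRANSITIONS = {
    ("process_work", "gimme a break"):          "break",
    ("break", "ready to go"):                   "process_work",
    ("process_work", "gimme some help"):        "get_clarification",
    ("process_work", "gimme the product code"): "get_clarification",
    ("process_work", "how am I doing"):         "get_productivity",
    ("report_work", "how am I doing"):          "get_productivity",
    ("get_productivity", "ready to go"):        None,  # return to previous state
}

def next_state(state: str, utterance: str, previous: str) -> str:
    """Look up the transition; a None target means 'go back to wherever the
    worker came from' (process work or report work/status)."""
    if (state, utterance) not in EXTENDED_TRANSITIONS:
        return state
    target = EXTENDED_TRANSITIONS[(state, utterance)]
    return previous if target is None else target
```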
In an application of the system 30 and the state diagram 100 in a picking environment, additional utterances may be allowed and used to enable a worker to transition from one state to another or to interact with a remote unit or local application server 32. For example, while in the process work state 90, the worker may utter "pick" and then two digits such as "one two" or "five nine" to indicate to the remote unit that the worker has picked twelve or fifty-nine of the specified items, respectively. Thus, the utterance "pick one seven" would indicate that the worker has picked seventeen of the specified items while the utterance "pick three two" would indicate that the worker has picked thirty-two of the specified items. If the worker needs to pick more than ninety-nine items, the worker can utter "superpick" and then three digits. Thus, the utterance "superpick one one nine" would indicate that the worker has picked one hundred and nineteen items while the utterance "superpick one three six" would indicate that the worker has picked one hundred and thirty-six items. Utterances having a format beginning with "pick" or "superpick" are chosen with the criteria for utterances previously described above in mind and may be considered as allowed or critical utterances for the process work state 90 that can be used by a worker while the worker is in the process work state 90.

If for some reason a worker cannot fulfill an order or otherwise complete an assigned picking task while in the process work state 90, the worker might indicate this to a remote unit and, as a result, the local application server 32, by uttering "cut product" or other statement which preferably allows the worker to move on to the next task or instruction without completing the current task or instruction. If desired, such a "cut product" utterance may also transition the worker from the process work state 90 to the process exceptions state 102 so that the worker can provide more information as to why the task cannot be completed. The utterance "cut product" is chosen with the criteria previously described above in mind and may be a critical or allowed utterance for the process work state 90 usable by a worker while the worker is in the process work state 90.
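The spoken-digit format described above lends itself to a small parser. The sketch below is illustrative only; the digit-word mapping is the obvious one, and the function simply converts "pick" plus two digits or "superpick" plus three digits into an item count.

```python
# Illustrative parser for the picking utterances described above: "pick"
# followed by two spoken digits, or "superpick" followed by three digits.

DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}

def parse_pick(utterance: str) -> int | None:
    """Return the reported item count, or None if the utterance is not a pick report."""
    words = utterance.lower().split()
    if not words:
        return None
    keyword, digits = words[0], words[1:]
    expected = {"pick": 2, "superpick": 3}.get(keyword)
    if expected is None or len(digits) != expected:
        return None
    if not all(d in DIGITS for d in digits):
        return None
    return int("".join(DIGITS[d] for d in digits))

assert parse_pick("pick one seven") == 17
assert parse_pick("superpick one three six") == 136
```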
An alternative to uttering a "cut product" or other statement, which tells the local application server 32 that a specific task or instruction cannot be completed, is a "backload product" or other statement that merely tells the local application server 32 that completion of an instruction or task will or should be temporarily delayed or deferred until a later time. If desired, such a "backload product" utterance may also transition the worker from the process work state 90 to the process exceptions state 102 so that the worker can provide more information to the local application server 32 as to why the
task should or must be delayed. The utterance "backload product" satisfies the criteria previously described above and may be a critical or allowed utterance for the process work state 90 that a worker can use while the worker is in the process work state 90.
The foregoing description is considered as illustrative only of the principles of the invention. Furthermore, since numerous modifications and changes will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and process shown and described above. Accordingly, all suitable modifications and equivalents may be resorted to falling within the scope of the invention as defined by the claims that follow. For example, the remote units, local application server, and enterprise server may be any computer or computer system on which one or more computer software applications are operating. In addition, other states may be used along with or in place of any of the states described in the state or directed resource models 80 and 100.
The words "comprise," "comprises," "comprising," "include," "including," and "includes" when used in this specification and in the following claims are intended to specify the presence of stated features, elements, integers, components, or steps, but they do not preclude the presence or addition of one or more other features, elements, integers, components, steps, or groups thereof.
Claims
1. A system in which a worker can operate as a directed resource in one or more states and transition from one state to another by making one or more audible statements, comprising a server and a client device that can communicate with said server, said server capable of providing information to the worker via said client device and allowing the worker to function in an obtain work state in which the worker receives one or more tasks from said server via said client device to be completed by the worker, a process work state in which the worker completes said one or more tasks, and a report work state in which the worker reports status information to said server via said client device.
2. The system of claim 1, wherein said worker can initiate transition from said obtain work state to said process work state by providing an audible utterance input to said client device.
3. The system of claim 2, wherein said utterance input includes the phrase "ready to go."
4. The system of claim 1, wherein said worker can signal completion of a task to said server by providing an audible utterance input to said client device.
5. The system of claim 4, wherein said utterance input includes the phrase "got it" or the phrase "gotit."
6. The system of claim 1, wherein said worker can also function in a get clarification state.
7. The system of claim 6, wherein said worker can initiate transition to said get clarification state from at least one of said obtain work state, said process work state, or said report work state by providing an audible utterance input to said client device.
8. The system of claim 7, wherein said utterance input includes the phrase "gimmee" or the phrase "gimme" or the phrase "give me."
9. The system of claim 1, including a critical set of utterances for at least one of said obtain work state, said process work state, or said report work state.
10. The system of claim 9, wherein said at least one critical set of utterances has a false rejection rate less than or equal to five percent (5%).
11. The system of claim 9, wherein said at least one critical set of utterances has a false acceptance rate less than or equal to three percent (3%).
12. The system of claim 9, wherein said at least one critical set of utterances has an error rate less than or equal to two percent (2%).
13. A directed resource model for a worker wherein instructions are provided to the worker from a server via a client device, comprising an obtain work state in which the worker receives one or more tasks to be completed by the worker from the server via the client device, a process work state in which the worker completes said one or more tasks, and a report work state in which the worker reports status information to said server via a client device, further wherein said worker can initiate transition from said obtain work state to said process work state by providing a first audible input signal to said client device and said worker can initiate transition from said process work state to said report work state by providing a second audible input signal to said client device.
14. The directed resource model of claim 13, wherein said first audible input signal includes the phrase "ready to go."
15. The directed resource model of claim 13, wherein the worker can signal completion of a task by providing a third audible input signal to said client device.
16. The directed resource model of claim 15, wherein said third audible input signal includes the phrase "got it" or the phrase "gotit."
17. The directed resource model of claim 13, wherein the worker can transition from at least one of said obtain work state, said process work state, or said report work state to a get clarification state by providing a third audible input signal to said client device.
18. The directed resource model of claim 17, wherein said third audible input signal includes the phrase "gimmee" or the phrase "gimme" or the phrase "give me."
19. The directed resource model of claim 13, wherein at least one of said obtain work state, said process work state, or said report work state includes a group of critical audible input signals.
20. A work model for a worker wherein instructions are provided to the worker from a server via a client device, comprising a first state of allowed worker activity, a second state of allowed worker activity, a third state of allowed worker activity, a first utterance that transitions the worker from said first state of allowed worker activity to said second state of allowed worker activity when the worker provides said first utterance to the client device, and a second utterance that transitions the worker from said second state of allowed worker activity to said third state of allowed worker activity when the worker provides said second utterance to the client device.
21. The work model of claim 20, wherein said first state allows the worker to obtain at least one instruction or task from the server.
22. The work model of claim 21, wherein said first utterance includes the phrase "ready to go."
23. The work model of claim 21, wherein said second state allows the worker to initiate completion of said at least one instruction or task.
24. The work model of claim 20, wherein said second state allows the worker to initiate completion of said at least one instruction or task.
25. The work model of claim 24, wherein the worker can indicate completion of said at least one instruction or task by providing a third utterance to said client device.
26. The work model of claim 25, wherein said third utterance includes the phrase "got it."
27. A method of directing a worker to complete one or more tasks, comprising: establishing a plurality of allowed states of activity for the worker; and establishing a group of at least one allowed utterance for use by the worker for each of said allowed states, wherein at least one utterance in at least one group of at least one allowed utterance is a transition utterance that allows the worker to transition from one of said allowed states to another of said allowed states.
28. The method of claim 27, wherein said establishing a plurality of allowed states includes establishing an obtain work state for the worker.
29. The method of claim 28, wherein said establishing a plurality of allowed states includes establishing a process work state for the worker.
30. The method of claim 29, wherein said establishing a plurality of allowed states includes establishing a report work state for the worker.
31. The method of claim 30, wherein said establishing a plurality of allowed states includes establishing a get clarification state for the worker.
32. The method of claim 27, including allowing the worker to conduct activity in each of said allowed states and to transition from at least one state to at least one other state.
33. The method of claim 27, including communicating tasks or information to the worker from a server to a client device worn or used by the worker.
34. The method of claim 28, wherein said establishing a group of allowed utterances includes establishing a transition utterance from said obtain work state to said process work state that includes the phrase "ready to go."
35. The method of claim 29, wherein said establishing a group of allowed utterances includes establishing an allowed utterance for said process work state that includes the phrase "got it" or the phrase "gotit."
36. The method of claim 31, wherein said establishing a group of allowed utterances includes a transition utterance to said get clarification state from at least one of said obtain work state,
said process work state, or said report work state that includes the phrase "gimmee" or the phrase "gimme" or the phrase "give me."
37. The method of claim 27, including providing a client device which receives utterances made by the worker.
38. The method of claim 37, including sending said utterances from said client device to a server.
39. The method of claim 38, including decoding said utterances made by the worker.
40. The method of claim 38, including sending information or at least one task to be completed by the worker from said server to said client device.
41. The method of claim 27, including decoding an utterance made by the worker and determining if said utterance is within said group of at least one allowed utterance for the worker's current state of activity.
42. The method of claim 39, including determining if said utterance is within said group of at least one allowed utterance for the worker's current state of activity.
43. The method of claim 27, including providing an instruction to said worker directing said worker to pick one or more of one or more items as part of fulfilling an order.
44. An apparatus for directing a worker, comprising: means for establishing a plurality of allowed states of activity for the worker; and means for establishing a group of at least one allowed utterance for use by the worker for each of said allowed states, wherein at least one utterance in at least one group of at least one allowed utterance is a transition utterance that allows the worker to transition from one of said allowed states to another of said allowed states.
45. The apparatus of claim 44, including means for communicating tasks or information to the worker from a server to a client device worn or used by the worker.
46. The apparatus of claim 44, including means for decoding said utterances made by the worker.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US29279199A | 1999-04-14 | 1999-04-14 | |
| US09/292,791 | 1999-04-14 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2000062222A1 true WO2000062222A1 (en) | 2000-10-19 |
Family
ID=23126215
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2000/010143 Ceased WO2000062222A1 (en) | 1999-04-14 | 2000-04-14 | Interactive voice unit for giving instruction to a worker |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2000062222A1 (en) |
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP0670537A1 (en) * | 1992-04-06 | 1995-09-06 | Edward George Newman | Hands-free user-supported portable computer |
| JPH08123482A (en) * | 1994-10-19 | 1996-05-17 | Nec Corp | Speech operation indicating device |
| US5801946A (en) * | 1995-10-19 | 1998-09-01 | Kawasaki Motors Mfg. Co. | Assembly prompting system |
| EP0872827A2 (en) * | 1997-04-14 | 1998-10-21 | AT&T Corp. | System and method for providing remote automatic speech recognition services via a packet network |
Non-Patent Citations (2)
| Title |
|---|
| CARLSON S ET AL: "APPLICATION OF SPEECH RECOGNITION TECHNOLOGY TO ITS ADVANCED TRAVELER INFORMATION SYSTEMS", PACIFIC RIM TRANSTECH CONFERENCE. VEHICLE NAVIGATION AND INFORMATION SYSTEMS CONFERENCE PROCEEDINGS,US,NEW YORK, IEEE, vol. CONF. 6, 30 July 1995 (1995-07-30), pages 118 - 125, XP000641143, ISBN: 0-7803-2588-5 * |
| CHAN C -F ET AL: "Design considerations in the selection of an automatic speech recognition system for the quality control inspection function", PROCEEDINGS OF THE GLOBAL TELECOMMUNICATIONS CONFERENCE AND EXHIBITION(GLOBECOM),US,NEW YORK, IEEE, vol. -, 26 November 1984 (1984-11-26), pages 273 - 276, XP002105136 * |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20020031675A (en) * | 2000-10-23 | 2002-05-03 | 국오선 | System for enterprise resource planning by using voice cognition and method thereof |
| WO2003036528A3 (en) * | 2001-10-26 | 2003-11-27 | Reeft Aps | A system and a method for distributing assignments and receiving report data |
| JP2013117961A (en) * | 2011-12-02 | 2013-06-13 | Boeing Co:The | Totaling of aircraft assembly time confirmed at point of time of use |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1; Designated state(s): CA JP |
| | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
| | 122 | Ep: pct application non-entry in european phase | |
| | NENP | Non-entry into the national phase | Ref country code: JP |