WO2016147342A1 - Information providing system - Google Patents
Information providing system
- Publication number: WO2016147342A1 (PCT/JP2015/058073)
- Authority: WO — WIPO (PCT)
- Prior art keywords: target word, recognition target, recognition, unit, character string
Classifications
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
- G06F16/3344—Query execution using natural language analysis
- G06F16/338—Presentation of query results
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
- G06F40/247—Thesauruses; Synonyms
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/088—Word spotting
- G10L2015/221—Announcement of recognition results
Description
- This invention relates to an information providing system that presents keywords related to the information to be provided and provides the information related to the keyword a user speaks.
- For example, Patent Document 1 describes an information providing apparatus that provides information selected by the user from among information obtained by distribution or the like. The apparatus linguistically analyzes the text of content distributed from the outside, extracts keywords, and presents them as options on screen or by voice output; when the user selects a keyword by voice input, the content linked to that keyword is provided.
- Patent Document 2 describes a dictionary data generation device that generates speech recognition dictionary data used in a speech recognition device that recognizes input commands from speech uttered by the user. The device determines the number of keyword characters that can be shown on a display, extracts a character string within that number of characters from the text data corresponding to an input command, sets it as a keyword, and creates dictionary data by associating speech feature data for the keyword with content data that specifies the processing corresponding to the input command.
- Patent Document 1: JP 2004-334280 A. Patent Document 2: International Publication No. 2006/093003.
- Patent Document 1 does not consider the limit on the number of characters that can be displayed when a keyword is presented as an on-screen option. When the displayable number of characters is limited, only part of a keyword may be shown, so the user cannot grasp the keyword accurately and cannot utter it correctly, and as a result the content the user meant to select cannot be provided. In particular, when content distributed from outside is used, the content changes from moment to moment and the apparatus cannot know in advance what content will be distributed, so it is difficult to secure a sufficiently large character display area beforehand.
- Patent Document 2 does consider the displayable number of characters, but it deletes parts of the character string part-of-speech by part-of-speech and uses the result as the speech recognition keyword, so information carried by the original keyword may be lost. The user then cannot tell from the keyword what content will be presented and may be unable to reach the desired content. For example, if the keyword "USA" is set for content about the "US President", the keyword no longer matches the content.
- It is therefore effective, to keep users from being confused, for the recognition target words to include not only the original keyword that best represents the content but also words that differ little from the original keyword in meaning or in character string. Considering that the keyword is shown on a limited screen, it is also effective to provide the content the user meant to select even when, influenced by the deleted characters, the user misreads the keyword and utters something other than the original.
- The present invention was made to solve the above problems, and its purpose is to improve operability and convenience so that the information a user desires can be provided even when the number of characters displayable on the screen is limited.
- An information providing system according to this invention includes: an acquisition unit that acquires information to be provided from an information source; a generation unit that generates a first recognition target word from the information acquired by the acquisition unit and, when the first recognition target word exceeds a specified number of characters, generates a second recognition target word using the whole of the character string obtained by shortening the first recognition target word to the specified number of characters; a storage unit that stores the information acquired by the acquisition unit in association with the first and second recognition target words generated by the generation unit; a speech recognition unit that recognizes the user's speech and outputs a recognition result character string; and a control unit that outputs to a display unit the first or second recognition target word consisting of a character string within the specified number of characters and, when the recognition result character string output from the speech recognition unit matches the first or second recognition target word, acquires the related information from the storage unit and outputs it to the display unit or a voice output unit.
- Because the second recognition target word is generated using the whole of the character string obtained by shortening the first recognition target word to the specified number of characters, even a user who is shown the shortened string, mistakes it for something else, and utters a word other than the first recognition target word can still be recognized via the second recognition target word. The information the user meant to select can therefore be provided, and operability and convenience improve.
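- To make the mechanism concrete, here is a minimal sketch in Python (the helper name is hypothetical; the patent does not prescribe an implementation): a keyword that fits the display limit is used as-is, while one that exceeds it is truncated, and the truncated form becomes a second recognition target word so that an utterance of either form retrieves the same content.

```python
def build_target_words(keyword: str, max_chars: int) -> list[str]:
    """Return the recognition target words for one keyword."""
    targets = [keyword]                      # first recognition target word
    if len(keyword) > max_chars:
        targets.append(keyword[:max_chars])  # second recognition target word
    return targets

# 「アメリカ大統領」 (7 characters) with a 5-character display limit yields
# ["アメリカ大統領", "アメリカ大"]; either utterance retrieves the same content.
print(build_target_words("アメリカ大統領", 5))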
- FIG. 1 is a diagram outlining the information providing system according to Embodiment 1 of this invention and its peripheral devices. FIG. 2 is a diagram explaining the information providing method of Embodiment 1 when the specified number of characters is seven, and FIG. 3 when it is five. FIG. 4 is a schematic diagram of the main hardware configuration of the information providing system of Embodiment 1 and its peripheral devices. FIG. 5 is a functional block diagram showing a configuration example of the information providing system of Embodiment 1.
- FIG. 6 shows an example of the data stored in the storage unit. FIG. 7 is a flowchart showing the operation of the information providing system of Embodiment 1 at the time of content acquisition, and FIG. 8 at the time of information provision.
- FIG. 9 is a functional block diagram showing a modification of the information providing system of Embodiment 1.
- In the following embodiment, the information providing system according to this invention is applied, as an example, to an in-vehicle device mounted on a moving body such as a vehicle, but it may also be applied to a PC (Personal Computer), a tablet PC, or a portable information terminal such as a smartphone.
- FIG. 1 outlines the information providing system 1 according to Embodiment 1 of this invention and its peripheral devices.
- The information providing system 1 acquires content from an information source such as the server 3 via the network 2, extracts a keyword related to the content, and presents the keyword to the user by displaying it on the display 5. When the user speaks a keyword, the uttered voice is input to the information providing system 1 from the microphone 6. The information providing system 1 recognizes the spoken keyword using recognition target words generated from the keywords related to the content, and provides the content related to the recognized keyword to the user by displaying it on the display 5 or outputting it as voice from the speaker 4.
- The display 5 is a display unit, and the speaker 4 is a voice output unit.
- When the information providing system 1 is an in-vehicle device, the number of characters displayable on the screen of the display 5 is limited because of guidelines and the like that regulate what may be displayed while driving. When the information providing system 1 is a portable information terminal, the displayable number of characters is likewise limited because the display 5 is small and of low resolution.
- In the following, the number of characters that can be displayed on the screen of the display 5 is called the "specified number of characters".
- FIG. 2 shows the case where the specified number of characters displayable in the character display areas A1 and A2 of the display 5 is seven; FIG. 3 shows the case where it is five.
- Assume the information providing system 1 provides news as content, as in FIGS. 2 and 3. The news headline is 「アメリカ大統領がXX日に来日」 ("US President visits Japan on day XX") and the news body is 「アメリカの○○大統領がXX日、YY交渉のため来日する。<以後略>」 ("US President XX visits Japan on day XX for YY negotiations. <rest omitted>"). For convenience of explanation, the remainder of the news body is abbreviated as <以後略> (rest omitted).
- For this news item, the keyword representing the content is, for example, 「アメリカ大統領」 ("US President", seven characters), and the recognition target word is, for example, 「アメリカ大統領（アメリカダイトーリョー）」. Here a recognition target word is written as "notation (reading)".
- When the specified number of characters is seven, as in FIG. 2, the information providing system 1 displays the keyword 「アメリカ大統領」 as-is in the character display area A1. The recognition target word for this keyword is 「アメリカ大統領（アメリカダイトーリョー）」. When user B speaks the keyword, the information providing system 1 recognizes it using the recognition target word and outputs the related news body 「アメリカの○○大統領がXX日、YY交渉のため来日する。<以後略>」 as voice. The information providing system 1 may also, in addition to or instead of the voice output, display the news headline or a part (for example, the beginning) of the news body on the display 5.
- When the specified number of characters is five, as in FIG. 3, the information providing system 1 displays in the character display area A1 the character string 「アメリカ大」, the keyword shortened to the specified number of characters. The recognition target words for this keyword are the first recognition target word 「アメリカ大統領（アメリカダイトーリョー）」 and the second recognition target word 「アメリカ大（アメリカダイ）」. Whether user B says 「アメリカ大統領（アメリカダイトーリョー）」 or 「アメリカ大（アメリカダイ）」, the information providing system 1 recognizes the spoken keyword using these recognition target words and, as in FIG. 2, outputs the related news body as voice or displays it on the screen.
- In FIGS. 2 and 3 the keyword display area consists of the two character display areas A1 and A2, but the number of character display areas is not limited to two.
- FIG. 4 is a schematic diagram showing the main hardware configuration of the information providing system 1 and its peripheral devices in Embodiment 1. The information providing system 1 comprises a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, an input device 104, a communication device 105, an HDD 106, and an output device 107, connected via a bus 100.
- The CPU 101 reads and executes programs stored in the ROM 102 or the HDD 106, and thereby realizes the various functions of the information providing system 1 in cooperation with the other hardware. These functions are described later with reference to FIG. 5.
- The RAM 103 is the memory used while programs execute.
- The input device 104 accepts user input and is, for example, a microphone, an operation device such as a remote controller, or a touch sensor. In FIG. 1, the microphone 6 is shown as an example of the input device 104.
- The communication device 105 communicates with information sources such as the server 3 via the network 2.
- The HDD 106 is one example of an external storage device. Besides an HDD, external storage devices include storage based on CDs or DVDs, and flash memory such as USB memories and SD cards.
- The output device 107 presents information to the user and is, for example, a speaker, a liquid crystal display, or an organic EL (Electroluminescence) display. In FIG. 1, the speaker 4 and the display 5 are shown as examples of the output device 107.
- FIG. 5 is a functional block diagram showing a configuration example of the information providing system 1 according to Embodiment 1.
- The information providing system 1 includes an acquisition unit 10, a generation unit 11, a speech recognition dictionary 16, an association determination unit 17, a storage unit 18, a control unit 19, and a speech recognition unit 20. The functions of the acquisition unit 10, the generation unit 11, the association determination unit 17, the control unit 19, and the speech recognition unit 20 are realized by the CPU 101 executing programs; the speech recognition dictionary 16 and the storage unit 18 are realized by the RAM 103 or the HDD 106.
- These elements may be concentrated in a single device as shown in FIG. 1, or distributed among a server on the network, a portable information terminal such as a smartphone, and an in-vehicle device.
- The acquisition unit 10 acquires content described in HTML (HyperText Markup Language) or XML (eXtensible Markup Language) format from the server 3 via the network 2. It then interprets the content using the standard tag information and the like attached to it, excludes the incidental information, extracts the text of the main part, and outputs it to the generation unit 11 and the association determination unit 17.
- The server 3 is an information source storing content such as news. In Embodiment 1, news text that the information providing system 1 can acquire from the server 3 via the network 2 is used as the example of "content", but the content is not limited to this: it may be text from a knowledge database service such as a word dictionary, or cooking recipes. It may also be content that need not be acquired via the network 2, such as content stored in the information providing system 1 in advance. Furthermore, the content is not limited to text and may be video, audio, and so on.
- The acquisition unit 10, for example, acquires news text distributed by the server 3 each time it is distributed, or acquires recipe text stored on the server 3 in response to a user request.
- The generation unit 11 includes a first recognition target word generation unit 12, a display character string determination unit 13, a second recognition target word generation unit 14, and a recognition dictionary generation unit 15.
- The first recognition target word generation unit 12 extracts a keyword related to the content from the content text acquired by the acquisition unit 10 and generates a first recognition target word from the keyword. Keyword extraction may use any method of extracting an important word representing the content by means of known natural language processing techniques such as morphological analysis: for example, proper nouns contained in the content text, nouns in headings or headlines, or frequently occurring nouns in the text.
- For example, the first recognition target word generation unit 12 extracts the leading noun 「アメリカ大統領」 ("US President") as the keyword from the news headline 「アメリカ大統領がXX日に来日」, and sets its notation and reading as the first recognition target word 「アメリカ大統領（アメリカダイトーリョー）」. It outputs the generated first recognition target word to the display character string determination unit 13 and the recognition dictionary generation unit 15. The notations of the keyword and of the first recognition target word are identical.
- The first recognition target word generation unit 12 may also append a preset character string to the first recognition target word: for example, the first recognition target word "US President's news", formed by adding the string "news" after "US President". The added string is not limited to this example, and a character string may be attached before or after the first recognition target word. The first recognition target word generation unit 12 may use both "US President" and "US President's news" as first recognition target words, or only one of them.
- The display character string determination unit 13 determines, from the information about the character display areas A1 and A2 of the display 5, the specified number of characters that can be shown in those areas. It then determines whether the first recognition target word generated by the first recognition target word generation unit 12 exceeds the specified number of characters and, if so, generates the character string obtained by shortening the first recognition target word to the specified number of characters and outputs it to the second recognition target word generation unit 14. In Embodiment 1, this shortened character string and the notation of the second recognition target word described later are identical.
- The information about the character display areas A1 and A2 may be anything that represents the size of the areas, such as a number of characters or a number of pixels. The character display areas A1 and A2 may have a predetermined size, or, when the size of the display area or the display screen changes dynamically, their sizes may change dynamically as well. When the sizes of the character display areas A1 and A2 change dynamically, for example, the control unit 19 notifies the display character string determination unit 13 of the information about those areas.
- For example, when the first recognition target word is 「アメリカ大統領（アメリカダイトーリョー）」 and the specified number of characters is five, the display character string determination unit 13 deletes the last two characters 「統領」 and shortens the word to its first five characters, 「アメリカ大」, which it outputs to the second recognition target word generation unit 14. Here the first recognition target word is shortened to its first five characters, but any method of shortening it to the specified number of characters may be used.
- On the other hand, when the first recognition target word is 「アメリカ大統領（アメリカダイトーリョー）」 and the specified number of characters is seven, the word fits within the limit, so the display character string determination unit 13 outputs 「アメリカ大統領」 to the second recognition target word generation unit 14 as-is.
- On receiving from the display character string determination unit 13 a character string obtained by shortening the first recognition target word to the specified number of characters, the second recognition target word generation unit 14 generates a second recognition target word. For example, if the string obtained by shortening 「アメリカ大統領」 is 「アメリカ大」, the second recognition target word generation unit 14 sets that notation and its reading as the second recognition target word 「アメリカ大（アメリカダイ）」. As the reading of the second recognition target word, it generates, for example, the portion of the reading of the first recognition target word that corresponds to the string shortened to the specified number of characters. It outputs the generated second recognition target word to the recognition dictionary generation unit 15. When it instead receives an unshortened first recognition target word from the display character string determination unit 13, the second recognition target word generation unit 14 does not generate a second recognition target word.
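- A sketch of this determination-and-generation step, assuming a per-character alignment between notation and reading (the ALIGNED table below is invented for the example; the patent leaves open how the reading is cut to match the shortened notation):

```python
ALIGNED = {  # notation character -> reading, for the example keyword
    "アメリカ大統領": [("ア", "ア"), ("メ", "メ"), ("リ", "リ"), ("カ", "カ"),
                 ("大", "ダイ"), ("統", "トー"), ("領", "リョー")],
}

def shorten(first_target: str, max_chars: int):
    """Return (display_string, second_target) or (first_target, None)."""
    if len(first_target) <= max_chars:
        return first_target, None            # fits: no second target word
    pairs = ALIGNED[first_target][:max_chars]
    notation = "".join(c for c, _ in pairs)  # e.g. "アメリカ大"
    reading = "".join(r for _, r in pairs)   # e.g. "アメリカダイ"
    return notation, (notation, reading)

print(shorten("アメリカ大統領", 5))  # ('アメリカ大', ('アメリカ大', 'アメリカダイ'))
print(shorten("アメリカ大統領", 7))  # ('アメリカ大統領', None)
```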
- The recognition dictionary generation unit 15 receives the first recognition target word from the first recognition target word generation unit 12 and the second recognition target word from the second recognition target word generation unit 14, and registers them in the speech recognition dictionary 16 as recognition target words.
- The speech recognition dictionary 16 may have any format, such as a network grammar that describes recognizable word sequences as a grammar, or a statistical language model that models word connections probabilistically.
- The speech recognition unit 20 recognizes the voice of user B by referring to the speech recognition dictionary 16 and outputs the recognition result character string to the control unit 19. Any known speech recognition technique may be used, so a description is omitted here.
- A button for instructing the start of speech recognition may be provided so that user B can explicitly tell the information providing system 1 that an utterance is about to begin; in that case the speech recognition unit 20 recognizes the voice uttered after user B presses the button. When no such button is provided, the speech recognition unit 20 instead continuously receives the sound collected by the microphone 6, detects the utterance section corresponding to what user B said, and recognizes the speech in that section.
- The association determination unit 17 receives the content text acquired by the acquisition unit 10, and the first and second recognition target words from the recognition dictionary generation unit 15. It determines the correspondence between the recognition target words and the content, associates the first and second recognition target words with the content text, and stores them in the storage unit 18. The storage unit 18 thus stores the currently providable content in association with its first and second recognition target words.
- FIG. 6 shows an example of the first recognition target words, second recognition target words, and content stored in the storage unit 18 when the specified number of characters is five. The first recognition target word 「アメリカ大統領（アメリカダイトーリョー）」, the second recognition target word 「アメリカ大（アメリカダイ）」, and the news body 「アメリカの○○大統領がXX日、YY交渉のため来日する。<以後略>」 are associated; likewise, the first recognition target word 「モーターショー（モーターショー）」, the second recognition target word 「モーターシ（モーターシ）」, and the news body 「2年に1度のモーターショーがXX日、開幕する。<以後略>」 are associated.
- When a first recognition target word is within the specified number of characters, no second recognition target word is generated, so only the first recognition target word and the content are associated and stored in the storage unit 18. The content stored in the storage unit 18 is also not limited to text and may be video, audio, and so on.
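- The layout below illustrates, under an assumed data structure (the patent does not prescribe one), what the storage unit 18 might hold for the two example news items after this association step:

```python
# Each entry maps every recognition target word (notation -> reading)
# to the content it should retrieve.
STORAGE = [
    {
        "targets": {"アメリカ大統領": "アメリカダイトーリョー",
                    "アメリカ大": "アメリカダイ"},
        "content": "アメリカの○○大統領がXX日、YY交渉のため来日する。<以後略>",
    },
    {
        "targets": {"モーターショー": "モーターショー",
                    "モーターシ": "モーターシ"},
        "content": "2年に1度のモーターショーがXX日、開幕する。<以後略>",
    },
]
```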
- The control unit 19 outputs the first or second recognition target word within the specified number of characters to the display 5 and, when the recognition result character string output from the speech recognition unit 20 matches a first or second recognition target word, acquires the related information from the storage unit 18 and outputs it to the display 5 or the speaker 4.
- Specifically, the control unit 19 acquires the content text stored in the storage unit 18 and notifies the speech recognition unit 20 of it as the text of the currently providable content. It also acquires from the storage unit 18 the second recognition target words associated with the currently providable content and displays them in the character display areas A1 and A2 of the display 5, as in FIG. 3; when a second recognition target word exists in the storage unit 18, the corresponding first recognition target word exceeds the specified number of characters. Conversely, when only a first recognition target word is stored for the currently providable content and there is no second recognition target word, the first recognition target word is within the specified number of characters; in that case the control unit 19 acquires the first recognition target word from the storage unit 18 and displays it in the character display areas A1 and A2, as in FIG. 2.
- On receiving the recognition result character string from the speech recognition unit 20, the control unit 19 collates it with the first and second recognition target words stored in the storage unit 18 and acquires the content text associated with the matching first or second recognition target word.
- The control unit 19 then synthesizes speech from the acquired content text and has the speaker 4 output it; any known speech synthesis technique may be used, so a description is omitted. The display mode of the information is not particularly limited as long as the user can appropriately perceive the information according to its type: for example, the control unit 19 may display the beginning of the text on the display 5, or display the whole text by scrolling it. When the content is video, the control unit 19 may display the video on the display 5; when the content is audio, it may output the audio from the speaker 4.
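- The collation step can then be sketched as a simple lookup over the assumed STORAGE layout above (names are illustrative, not the patent's):

```python
def lookup(recognition_result: str, storage) -> str | None:
    """Return the content text whose target words include the result."""
    for entry in storage:
        if recognition_result in entry["targets"]:
            return entry["content"]
    return None  # no matching first or second recognition target word

assert lookup("アメリカ大", STORAGE).startswith("アメリカの○○大統領")
```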
- Next, the operation is explained using content distributed by the news service server 3. To simplify the explanation, assume the information providing system 1 has acquired via the network 2 two news items distributed by the server 3, news α and news β. The headline of news α is 「アメリカ大統領がXX日に来日」 and its body is 「アメリカの○○大統領がXX日、YY交渉のため来日する。<以後略>」. The headline of news β is 「モーターショーが東京で開幕」 ("Motor show opens in Tokyo") and its body is 「2年に1度のモーターショーがXX日、開幕する。<以後略>」 ("The biennial motor show opens on day XX. <rest omitted>").
- First, the acquisition unit 10 acquires the content distributed from the server 3 via the network 2, analyzes the tags and the like to exclude the incidental information, and obtains the text of the main parts, such as the headlines and bodies of news α and β (step ST1). It outputs the content text to the first recognition target word generation unit 12 and the association determination unit 17.
- The first recognition target word generation unit 12 extracts a keyword from the content text received from the acquisition unit 10 and generates a first recognition target word (step ST2), which it outputs to the display character string determination unit 13 and the recognition dictionary generation unit 15. Here, the first recognition target word generation unit 12 uses a natural language processing technique such as morphological analysis to extract the noun (including compound nouns) at the beginning of each news headline as the keyword, generates its notation and reading, and sets them as the first recognition target word. Applied to news α and β, the first recognition target word of news α is 「アメリカ大統領（アメリカダイトーリョー）」 and that of news β is 「モーターショー（モーターショー）」.
- The display character string determination unit 13 determines the specified number of characters displayable in the character display areas A1 and A2 from the information about those areas on the display 5, and determines whether the received first recognition target word exceeds the specified number of characters, that is, whether all of its characters can be displayed in the character display areas A1 and A2 (step ST3). When not all characters can be displayed (step ST3 "NO"), the display character string determination unit 13 generates the character string obtained by shortening the first recognition target word to the specified number of characters (step ST4) and outputs it to the second recognition target word generation unit 14.
- With a specified number of characters of five, the display character string determination unit 13 shortens the first recognition target word of news α to the five characters 「アメリカ大」, and that of news β to a five-character string such as 「モーターシ」; in the following it is assumed to be shortened to 「モーターシ」.
- The second recognition target word generation unit 14 receives from the display character string determination unit 13 the character string obtained by shortening the first recognition target word to the specified number of characters, and generates a second recognition target word using all of the characters contained in that string (step ST5). As the reading of the second recognition target word, it generates, for example, the portion of the reading of the first recognition target word corresponding to the shortened string. The second recognition target word of news α is therefore 「アメリカ大（アメリカダイ）」 and that of news β is 「モーターシ（モーターシ）」. The second recognition target word generation unit 14 outputs the second recognition target words to the recognition dictionary generation unit 15.
- When all characters of the first recognition target word can be displayed within the specified number of characters (step ST3 "YES"), the display character string determination unit 13 skips steps ST4 and ST5, and the process proceeds to step ST6.
- The recognition dictionary generation unit 15 receives the first recognition target words from the first recognition target word generation unit 12 and the second recognition target words from the second recognition target word generation unit 14, and registers them in the speech recognition dictionary 16 as recognition target words (step ST6). In this example, the first recognition target words 「アメリカ大統領（アメリカダイトーリョー）」 and 「モーターショー（モーターショー）」 and the second recognition target words 「アメリカ大（アメリカダイ）」 and 「モーターシ（モーターシ）」 are registered in the speech recognition dictionary 16. The recognition dictionary generation unit 15 also notifies the association determination unit 17 of the recognition target words registered in the speech recognition dictionary 16.
- The association determination unit 17 receives the content text from the acquisition unit 10 and the notification of the recognition target words from the recognition dictionary generation unit 15, determines the correspondence between the content and the recognition target words, associates them, and stores them in the storage unit 18 (step ST7).
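- The acquisition-time flow ST1 to ST7 condensed into one hypothetical function; extract_keyword() is only a stand-in for the morphological analysis performed by the first recognition target word generation unit 12:

```python
def extract_keyword(headline: str) -> str:
    # Placeholder for morphological analysis: take the headline up to the
    # first particle 「が」 (this works for the two example headlines only).
    return headline.split("が")[0]

def on_content_received(headline: str, body: str, max_chars: int,
                        dictionary: set, storage: list):
    keyword = extract_keyword(headline)        # ST2: first recognition target word
    targets = {keyword}
    if len(keyword) > max_chars:               # ST3: exceeds specified characters?
        targets.add(keyword[:max_chars])       # ST4-ST5: second target word
    dictionary.update(targets)                 # ST6: register in the dictionary
    storage.append({"targets": targets, "content": body})  # ST7: associate
```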
- Next, the information provision operation is explained. First, the control unit 19 refers to the storage unit 18; if second recognition target words associated with the currently providable content are stored, it acquires them and displays them in the character display areas A1 and A2 of the display 5 as the keywords related to that content (step ST11). If no second recognition target word is stored for the currently providable content and only first recognition target words are stored, the control unit 19 acquires the first recognition target words and displays them in the character display areas A1 and A2 as the keywords related to the content (step ST11). In this way the first or second recognition target words suited to the size of the character display areas A1 and A2 are displayed as keywords and presented to user B.
- In this example, the first recognition target words of news α and β cannot be displayed in the character display areas A1 and A2, so the second recognition target words 「アメリカ大」 and 「モーターシ」 are displayed there instead.
- Before presenting the keywords in step ST11, or together with them, the control unit 19 may output the headlines of news α and β or the beginnings of their bodies by voice, so that user B is informed of an overview of the currently providable news.
- After step ST11, the microphone 6 collects the speech uttered by user B and outputs it to the speech recognition unit 20. The speech recognition unit 20 waits for user B's utterance input through the microphone 6 (step ST12) and, when an utterance is input (step ST12 "YES"), recognizes it using the speech recognition dictionary 16 (step ST13) and outputs the recognition result character string to the control unit 19. For example, when user B says 「アメリカ大（アメリカダイ）」, the speech recognition unit 20 recognizes this utterance using the speech recognition dictionary 16 and outputs 「アメリカ大」 to the control unit 19.
- The control unit 19 receives the recognition result character string from the speech recognition unit 20, searches the storage unit 18 using it as the search key, and acquires the content text corresponding to the recognition result character string (step ST14). In this example, the recognition result character string 「アメリカ大」 matches the second recognition target word 「アメリカ大（アメリカダイ）」 of news α, so the body of news α, 「アメリカの○○大統領がXX日、YY交渉のため来日する。<以後略>」, is acquired from the storage unit 18.
- The control unit 19 synthesizes speech from the content text acquired from the storage unit 18 and outputs it from the speaker 4, or displays the beginning of the text on the display 5 (step ST15). The content that user B meant to select is thereby provided.
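- Likewise, the provision-time flow ST11 to ST15 can be sketched as one sequence; recognize, speak, and show are stand-ins for the speech recognition unit 20, the speaker 4, and the display 5:

```python
def provide(storage, recognize, speak, show):
    for entry in storage:                      # ST11: present the keywords;
        show(min(entry["targets"], key=len))   # the shortest form always fits
    result = recognize()                       # ST12-ST13: wait and recognize
    for entry in storage:                      # ST14: collate with target words
        if result in entry["targets"]:
            speak(entry["content"])            # ST15: synthesized voice output
            return entry["content"]
    return None
```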
- As described above, the information providing system 1 according to Embodiment 1 includes: the acquisition unit 10 that acquires the content to be provided from the server 3; the generation unit 11 that generates a first recognition target word from the acquired content and generates a second recognition target word using the whole of the character string obtained by shortening a first recognition target word exceeding the specified number of characters to the specified number of characters; the storage unit 18 that stores the acquired content in association with the generated first and second recognition target words; the speech recognition unit 20 that recognizes user B's speech and outputs a recognition result character string; and the control unit 19 that outputs to the display 5 the first or second recognition target word consisting of a character string within the specified number of characters and, when the recognition result character string matches a first or second recognition target word, acquires the related content from the storage unit 18 and outputs it to the display 5 or the speaker 4. Therefore, even when a user shown a character string within the specified number of characters misreads it and utters a word other than the first recognition target word, the utterance can be recognized via the second recognition target word; the content the user meant to select can be provided, and operability and convenience improve.
- The second recognition target word generation unit 14 of Embodiment 1 uses the character string obtained by shortening the first recognition target word, i.e. the keyword, to the specified number of characters directly as the second recognition target word, but it may instead process that character string to generate the second recognition target word.
- Modifications of the method of generating the second recognition target word are described below.
- As the reading of the second recognition target word, the second recognition target word generation unit 14 may generate one or more readings for the character string obtained by shortening the first recognition target word to the specified number of characters. It may assign the readings by morphological analysis, or determine them using a word dictionary (not illustrated). Specifically, as readings of the second recognition target word 「アメリカ大」, in addition to 「アメリカダイ」, which matches the reading of the first recognition target word, it may assign readings such as 「アメリカオー」 and 「アメリカタイ」. This increases the chance that the content user B meant to select can be provided even when user B utters a reading different from that of the first recognition target word, further improving user B's operability and convenience.
- The second recognition target word generation unit 14 may also add the reading of another character string to the reading of the string obtained by shortening the first recognition target word to the specified number of characters. It may search for that other character string using, for example, a word dictionary (not illustrated); the added reading is that of another word containing the whole shortened string. Specifically, the second recognition target word generation unit 14 appends the character 「陸」 to 「アメリカ大」, the shortened form of 「アメリカ大統領」, to form the string 「アメリカ大陸」 ("the American continent"), and uses its reading 「アメリカタイリク」 as a reading of the second recognition target word 「アメリカ大」.
- The second recognition target word generation unit 14 may further generate another second recognition target word by replacing the string obtained by shortening the first recognition target word to the specified number of characters with a different character string that is within the specified number of characters and synonymous with the first recognition target word. It may search for such a string using a word dictionary (not illustrated). Specifically, for the first recognition target word 「アメリカ大統領（アメリカダイトーリョー）」 it generates the synonymous five-character string 「米国大統領（ベーコクダイトーリョー）」 ("US President") as a second recognition target word, so that 「米国大統領」 is set as a second recognition target word in addition to 「アメリカ大」. This too increases the chance that the content user B meant to select can be provided when user B utters a reading different from that of the first recognition target word, further improving operability and convenience. The control unit 19 may additionally change the character string presented to user B as the keyword from 「アメリカ大」, the shortened form of the first recognition target word, to the notation of the other second recognition target word, 「米国大統領」.
- The second recognition target word generation unit 14 may generate a plurality of second recognition target words by combining several of the above modifications.
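- A sketch combining the three modifications (all word tables below are invented for the 「アメリカ大」 example; a real system would draw them from a word dictionary):

```python
def second_target_variants(shortened: str) -> dict[str, list[str]]:
    # Modification 1: one or more readings for the shortened notation.
    readings = {"アメリカ大": ["アメリカダイ", "アメリカオー", "アメリカタイ"]}.get(
        shortened, [shortened])
    # Modification 2: a reading borrowed from a longer word containing
    # the whole shortened string (「アメリカ大陸」).
    borrowed = {"アメリカ大": "アメリカタイリク"}
    if shortened in borrowed:
        readings = readings + [borrowed[shortened]]
    result = {shortened: readings}
    # Modification 3: a synonym within the character limit, with its reading.
    synonym = {"アメリカ大": ("米国大統領", "ベーコクダイトーリョー")}
    if shortened in synonym:
        notation, reading = synonym[shortened]
        result[notation] = [reading]
    return result

print(second_target_variants("アメリカ大"))
```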
- The second recognition target word generation unit 14 may also generate the readings of second recognition target words based on user B's utterance history. FIG. 9 shows a configuration example of the information providing system 1 in this case: a history storage unit 21 is added, which stores the recognition result character strings of the speech recognition unit 20 as user B's utterance history. The second recognition target word generation unit 14 acquires the recognition result character strings stored in the history storage unit 21 and uses them as readings of second recognition target words. Specifically, when the two second recognition target words 「アメリカ大（アメリカダイ）」 and 「アメリカ大（アメリカオー）」 have been generated and user B says 「アメリカ大（アメリカダイ）」, the second recognition target word generation unit 14 thereafter generates the second recognition target word 「アメリカ大（アメリカダイ）」 carrying the reading user B used in the past.
- In doing so, the second recognition target word generation unit 14 need not merely check whether user B uttered a reading in the past; it may perform statistical processing such as computing a frequency distribution and assign to the second recognition target word only readings whose probability is at or above a preset threshold. Since user B's speech habits are thus reflected in the recognition process, the content user B meant to select is more likely to be provided even when user B utters a reading different from that of the first recognition target word, further improving operability and convenience.
- The second recognition target word generation unit 14 may also generate per-user readings of second recognition target words based on an utterance history kept for each user. In that case, a user identification unit 7 identifies the current user B and outputs the identification result to the second recognition target word generation unit 14 and the history storage unit 21; the history storage unit 21 stores each recognition result character string in association with the user B notified by the user identification unit 7, and the second recognition target word generation unit 14 acquires from the history storage unit 21 the recognition result character strings associated with the notified user B and uses them as readings of second recognition target words. Any identification method capable of identifying the user may be used by the user identification unit 7, such as login authentication requiring the user to enter a user name and password, or biometric authentication based on the user's face or fingerprint.
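- A sketch of history-based reading selection (the threshold and data layout are assumptions, not from the patent): only readings the identified user has uttered at or above a preset probability are kept for the second recognition target word.

```python
from collections import Counter

def readings_from_history(user: str, candidates: list[str],
                          history: dict[str, Counter],
                          min_prob: float = 0.3) -> list[str]:
    counts = history.get(user, Counter())      # per-user utterance history
    total = sum(counts[r] for r in candidates)
    if total == 0:
        return candidates                      # no history yet: keep all readings
    return [r for r in candidates if counts[r] / total >= min_prob]

history = {"userB": Counter({"アメリカダイ": 9, "アメリカオー": 1})}
print(readings_from_history("userB", ["アメリカダイ", "アメリカオー"], history))
```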
- The first and second recognition target words generated by the operation shown in the flowchart of FIG. 7 are registered in the speech recognition dictionary 16; at least the second recognition target words may be deleted from the speech recognition dictionary 16 when the acquisition unit 10 acquires new content or at a preset time. The preset time is, for example, the moment a predetermined period (for example, 24 hours) has elapsed since a second recognition target word was registered in the speech recognition dictionary 16, or a predetermined time of day (for example, 6 o'clock every morning). The user may also be allowed to set the timing at which second recognition target words are deleted from the speech recognition dictionary 16. This makes it possible to remove recognition target words that user B is unlikely to utter and to reduce the area used in the RAM 103 or HDD 106 that constitutes the speech recognition dictionary 16.
- When the recognition target words registered in the speech recognition dictionary 16 are not deleted, the recognizable vocabulary may instead be restricted in order to shorten recognition time: for example, the speech recognition unit 20 receives from the control unit 19 the text of the currently providable content and enables, among the first and second recognition target words registered in the speech recognition dictionary 16, only those corresponding to that content text.
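- A sketch of the deletion timing described in claim 8 (the dictionary layout is illustrative; the 24-hour lifetime comes from the example in the text):

```python
import time

def purge_second_targets(dictionary: dict, lifetime_s: float = 24 * 3600):
    """dictionary maps target word -> (is_second, registered_at)."""
    now = time.time()
    for word, (is_second, registered_at) in list(dictionary.items()):
        # delete only second recognition target words past their lifetime;
        # first recognition target words are kept
        if is_second and now - registered_at > lifetime_s:
            del dictionary[word]
```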
- The control unit 19 may also control the display 5 so that the displayed first recognition target word, or the character string obtained by shortening the first recognition target word to the specified number of characters, becomes a selectable software key. The software key may be anything user B can select and operate with the input device 104, for example a touch button selectable by a touch sensor or a button selectable by an operation device.
- The information providing system 1 according to Embodiment 1 is configured for Japanese recognition target words, but it may be configured for languages other than Japanese.
- Within the scope of the invention, any component of the embodiment may be modified or omitted.
- Since the information providing system according to this invention not only generates a first recognition target word from the information to be provided but also generates a second recognition target word using the whole of the character string obtained by shortening the first recognition target word to the specified number of characters, it is suitable for use in in-vehicle devices and portable information terminals in which the number of characters displayable on the screen is limited.
- Reference signs: 1 information providing system; 2 network; 3 server (information source); 4 speaker (voice output unit); 5 display (display unit); 6 microphone; 7 user identification unit; 10 acquisition unit; 11 generation unit; 12 first recognition target word generation unit; 13 display character string determination unit; 14 second recognition target word generation unit; 15 recognition dictionary generation unit; 16 speech recognition dictionary; 17 association determination unit; 18 storage unit; 19 control unit; 20 speech recognition unit; 21 history storage unit; 100 bus; 101 CPU; 102 ROM; 103 RAM; 104 input device; 105 communication device; 106 HDD; 107 output device.
Abstract
Description
例えば、特許文献1に係る情報提供装置は、外部から配信されたコンテンツのテキスト情報を言語解析してキーワードを抽出し、当該キーワードを選択肢として画面表示または音声出力し、ユーザが音声入力によりキーワードを選択するとそのキーワードにリンクされたコンテンツを提供するというものである。
例えば、特許文献2に係る辞書データ生成装置は、キーワードを表示するための表示装置において表示可能なキーワードの文字数を特定し、入力コマンドに対応したテキストデータから前記文字数の範囲内の文字列を抽出してキーワードとして設定し、当該キーワードに対応した音声の特徴量データと入力コマンドに対応した処理内容を特定するための内容データとを対応付けることにより辞書データを作成するというものである。
特に、外部から配信されたコンテンツを利用する場合には、コンテンツが時々刻々と変化する特徴があり、情報提供装置側ではどのような内容のコンテンツが配信されるか不明であるため、事前に十分な文字表示領域を確保しておくことは難しい。
なお、以下の実施の形態では、この発明に係る情報提供システムを車両等の移動体に搭載される車載器に適用した場合を例に挙げて説明するが、車載器の他、PC(Personal Computer)、タブレットPC、およびスマートフォン等の携帯情報端末に適用してもよい。
図1は、この発明の実施の形態1に係る情報提供システム1とその周辺機器の概略を説明する図である。
情報提供システム1は、ネットワーク2を介してサーバ3などの情報源からコンテンツを取得し、コンテンツに関連するキーワードを抽出し、ディスプレイ5に画面表示させることによってキーワードをユーザに提示する。キーワードがユーザによって発話されると、発話音声がマイク6から情報提供システム1に入力される。情報提供システム1は、コンテンツに関連するキーワードから生成した認識対象語を用いて、ユーザにより発話されたキーワードを認識し、認識したキーワードに関連するコンテンツをディスプレイ5に画面表示させたりスピーカ4から音声出力させたりすることによってユーザに提供する。
このディスプレイ5は表示部であり、スピーカ4は音声出力部である。
以下では、ディスプレイ5の画面上に表示可能な文字数を、「規定文字数」と呼ぶ。
図2および図3のようなニュースの情報をコンテンツとして提供する情報提供システム1を想定する。ニュースの見出しは「アメリカ大統領がXX日に来日」、ニュースの本文は「アメリカの○○大統領がXX日、YY交渉のため来日する。<以後略>」とする。なお、説明の便宜上、ニュース本文の続き部分を<以後略>としている。
このニュースの場合、ニュースの内容を表すキーワードは例えば「アメリカ大統領」になり、認識対象語は例えば「アメリカ大統領(アメリカダイトーリョー)」となる。ここでは、認識対象語の表記と読みを、「表記(読み)」のように記載する。
RAM103は、プログラム実行時に使用するメモリである。
入力装置104は、ユーザ入力を受け付けるものであり、マイク、リモートコントローラ等の操作デバイス、またはタッチセンサ等である。図1では、入力装置104の例として、マイク6を図示している。
通信装置105は、ネットワーク2を介して、サーバ3などの情報源と通信するものである。
HDD106は、外部記憶装置の一例である。外部記憶装置としては、HDDの他に、CDもしくはDVD、またはUSBメモリおよびSDカード等のフラッシュメモリを採用したストレージ等が含まれる。
出力装置107は、情報をユーザに提示するものであり、スピーカ、液晶ディスプレイ、または有機EL(Electroluminescence)等である。図1では、出力装置107の例として、スピーカ4およびディスプレイ5を図示している。
この情報提供システム1は、取得部10、生成部11、音声認識辞書16、関連判定部17、記憶部18、制御部19および音声認識部20を備えている。取得部10、生成部11、関連判定部17、制御部19および音声認識部20の機能は、CPU101がプログラムを実行することにより実現される。音声認識辞書16および記憶部18は、RAM103またはHDD106である。
サーバ3は、ニュース等のコンテンツを格納している情報源である。実施の形態1では、「コンテンツ」として、ネットワーク2を介して情報提供システム1がサーバ3から取得可能なニュースのテキスト情報を例示するが、これに限定されるものではなく、単語辞書等の知識データベースサービスまたは料理のレシピなどのテキスト情報であってもよい。また、情報提供システム1の内部に予め格納されているコンテンツなど、ネットワーク2を介して取得する必要がないコンテンツでもよい。
さらに、コンテンツはテキスト情報に限定されるものではなく、動画像情報、音声情報などであっても構わない。
取得部10は、例えば、サーバ3が配信するニュースのテキスト情報を、配信される都度取得したり、ユーザからの要求をきっかけにしてサーバ3に格納されている料理のレシピのテキスト情報を取得したりする。
一方、第一認識対象語が「アメリカ大統領(アメリカダイトーリョー)」であって規定文字数が7文字以内の場合、表示文字列判定部13は「アメリカ大統領」をそのまま第二認識対象語生成部14へ出力する。
一方、短縮されていない第一認識対象語を表示文字列判定部13から受け取った場合、第二認識対象語生成部14は第二認識対象語を生成しない。
音声認識開始を指示するボタンが設けられていない場合、例えば、音声認識部20は常にマイク6が集音する音声を受け付け、ユーザBが発話した内容に該当する発話区間を検出し、発話区間の音声を認識する。
ここで、図6に、記憶部18が記憶している第一認識対象語と第二認識対象語とコンテンツの一例を示す。図6は規定文字数が5文字の場合の例である。第一認識対象語「アメリカ大統領(アメリカダイトーリョー)」と、第二認識対象語「アメリカ大(アメリカダイ)」と、コンテンツであるニュース本文「アメリカの○○大統領がXX日、YY交渉のため来日する。<以後略>」が関連付けられている。また、第一認識対象語「モーターショー(モーターショー)」と、第二認識対象語「モーターシ(モーターシ)」と、ニュース本文「2年に1度のモーターショーがXX日、開幕する。<以後略>」が関連付けられている。
また、記憶部18が記憶するコンテンツはテキスト情報に限定されるものではなく、動画像情報、音声情報などであっても構わない。
一方、記憶部18に、現在提供可能なコンテンツのテキスト情報に関連付いた第一認識対象語のみが記憶されており、第二認識対象語がない場合、第一認識対象語は規定文字数以内である。この場合、図2に示すように、制御部19は第一認識対象語を記憶部18から取得してディスプレイ5の文字表示領域A1,A2に表示させる。
なお、情報の表示態様は、その情報の種類に応じてユーザが情報を適切に認識できるものであればよく、例えば、制御部19がテキスト情報の冒頭一部分をディスプレイ5に画面表示させたり、スクロールさせることによってテキスト情報の全文を画面表示させたりしてもよい。
また、コンテンツが動画像情報である場合は、制御部19がその動画像情報をディスプレイ5に画面表示させればよい。コンテンツが音声情報である場合は、制御部19がその音声情報をスピーカ4から音声出力させればよい。
ここでは、ニュース提供サービスのサーバ3から配信されたコンテンツを取得するものとして説明する。説明を簡略化するため、情報提供システム1は、サーバ3が配信したニュースα、ニュースβの2つのニュースコンテンツを、ネットワーク2を介して取得したものとする。ニュースαの見出しは「アメリカ大統領がXX日に来日」、本文は「アメリカの○○大統領がXX日、YY交渉のため来日する。<以後略>」である。ニュースβの見出しは「モーターショーが東京で開幕」、本文は「2年に1度のモーターショーがXX日、開幕する。<以後略>」である。
まず、取得部10は、ネットワーク2を介してサーバ3から配信されたコンテンツを取得し、タグ等を解析することによりコンテンツの付帯的な情報を除外し、ニュースα,βの見出しおよび本文等の主要部分のテキスト情報を得る(ステップST1)。取得部10は、コンテンツのテキスト情報を第一認識対象語生成部12と関連判定部17へ出力する。
さらに、認識辞書生成部15は、音声認識辞書16に登録した認識対象語を、関連判定部17へ通知する。
まず、制御部19は、記憶部18を参照し、現在提供可能なコンテンツに関連付けられた第二認識対象語が記憶されている場合はその第二認識対象語を取得して、当該コンテンツに関連するキーワードとしてディスプレイ5の文字表示領域A1,A2に表示させる(ステップST11)。また、制御部19は、現在提供可能なコンテンツに関連付けられた第二認識対象語が記憶されておらず、第一認識対象語のみ記憶されている場合はその第一認識対象語を取得して、当該コンテンツに関連するキーワードとしてディスプレイ5の文字表示領域A1,A2に表示させる(ステップST11)。このように、文字表示領域A1,A2のサイズに応じた第一認識対象語または第二認識対象語を、キーワードとして表示することにより、ユーザBに提示する。
音声認識部20は、マイク6を通じて入力されるユーザBの発話音声を待ち受け(ステップST12)、発話音声の入力があった場合に(ステップST12“YES”)、その発話音声を音声認識辞書16を用いて認識する(ステップST13)。音声認識部20は、認識結果文字列を制御部19へ出力する。
前述の具体例に当てはめると、認識結果文字列「アメリカ大」はニュースαの第二認識対象語「アメリカ大(アメリカダイ)」に一致するので、ニュースαの本文「アメリカの○○大統領がXX日、YY交渉のため来日する。<以後略>」が記憶部18から取得される。
以下、第二認識対象語の生成方法について、変形例を説明する。
具体的には、第二認識対象語生成部14は、「アメリカ大」という第二認識対象語の読みとして、第一認識対象語の読みと同じ「アメリカ大(アメリカダイ)」に加えて、またはその代わりに、「アメリカ大(アメリカオー)」「アメリカ大(アメリカタイ)」のような読みを付与する。
これにより、ユーザBが第一認識対象語の読みとは異なる読みを発話した場合でも、ユーザBが希望して選択しようとしたコンテンツを提供できる可能性が高まり、ユーザBの操作性および利便性がさらに向上する。
具体的には、第二認識対象語生成部14は、「アメリカ大統領」を短縮した文字列「アメリカ大」に対して別の文字列「陸」を追加して「アメリカ大陸」という文字列を生成し、生成した「アメリカ大陸」の読み(アメリカタイリク)を第二認識対象語「アメリカ大」の読みとする。
これにより、ユーザBが第一認識対象語の読みとは異なる読みを発話した場合でも、ユーザBが希望して選択しようとしたコンテンツを提供できる可能性が高まり、ユーザBの操作性および利便性がさらに向上する。
Specifically, for the first recognition target word 「アメリカ大統領」 (アメリカダイトーリョー), the second recognition target word generation unit 14 generates 「米国大統領」 (ベーコクダイトーリョー, "President of the United States"), a synonymous character string within the prescribed five characters, as a second recognition target word, and sets 「米国大統領」 as a second recognition target word in addition to 「アメリカ大」.
Thereby, even when user B utters a reading different from that of the first recognition target word, the likelihood of providing the content user B intended to select increases, further improving user B's operability and convenience.
Furthermore, the control unit 19 may change the character string presented to user B as a keyword from 「アメリカ大」, the string obtained by shortening the first recognition target word to the prescribed character count, to 「米国大統領」, the notation of the other second recognition target word produced by substitution.
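The synonym variant filters a thesaurus by the prescribed character count. The sketch below is illustrative; `SYNONYMS` is an assumed stand-in for a real synonym dictionary service:

```python
SYNONYMS = {"アメリカ大統領": ["米国大統領"]}  # assumed thesaurus entry
PRESCRIBED_LENGTH = 5

def synonym_targets(first_target: str, limit: int = PRESCRIBED_LENGTH) -> list:
    """Synonymous strings that fit within the prescribed character count,
    usable as additional second recognition target words."""
    return [s for s in SYNONYMS.get(first_target, []) if len(s) <= limit]

print(synonym_targets("アメリカ大統領"))  # ['米国大統領'] (exactly 5 characters)
```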
Specifically, when two second recognition target words 「アメリカ大」 (アメリカダイ) and 「アメリカ大」 (アメリカオー) have been generated and user B utters アメリカダイ, the second recognition target word generation unit 14 thereafter generates the second recognition target word 「アメリカ大」 with the reading アメリカダイ that user B uttered in the past.
In doing so, the second recognition target word generation unit 14 may be configured not merely to check whether user B has uttered a reading in the past, but to perform statistical processing such as computing a frequency distribution and to assign to the second recognition target word only readings whose probability is at or above a preset value.
Thereby, user B's speaking habits are reflected in the speech recognition processing, so even when user B utters a reading different from that of the first recognition target word, the likelihood of providing the content user B intended to select increases, further improving user B's operability and convenience.
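A minimal sketch of the frequency-threshold idea, assuming utterance history is kept as a simple list of readings; the 0.5 threshold is an arbitrary example of the "preset value":

```python
from collections import Counter

def preferred_readings(history: list, candidates: list,
                       min_prob: float = 0.5) -> list:
    """Keep only candidate readings the user has uttered with at least
    min_prob relative frequency; with no history, keep all candidates."""
    counts = Counter(r for r in history if r in candidates)
    total = sum(counts.values())
    if total == 0:
        return candidates
    return [r for r in candidates if counts[r] / total >= min_prob]

history = ["アメリカダイ", "アメリカダイ", "アメリカオー"]
print(preferred_readings(history, ["アメリカダイ", "アメリカオー"]))
# ['アメリカダイ']: 2/3 meets the 0.5 threshold, while 1/3 does not
```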
Any method that can identify the user may be used as the identification method of the user identification unit 7, such as login authentication that asks the user to enter a user name and password, or biometric authentication based on the user's face, fingerprint, or the like.
"When a preset time arrives" means, for example, the timing at which a predetermined period (e.g., 24 hours) has elapsed since a second recognition target word was registered in the speech recognition dictionary 16, or the timing at which a predetermined time of day (e.g., 6 a.m. every morning) arrives. The system may further be configured to let the user set the timing at which second recognition target words are erased from the speech recognition dictionary 16.
Thereby, recognition target words that user B is unlikely to utter can be erased, reducing the used area of the RAM 103 or HDD 106 that holds the speech recognition dictionary 16.
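The timed-deletion policy can be sketched as a time-to-live purge. This is illustrative only, assuming each dictionary entry records its kind and registration time; 24 hours is one of the example periods given above:

```python
import time

TTL_SECONDS = 24 * 60 * 60  # example period from the text

def purge_second_targets(dictionary: dict, now: float = None) -> None:
    """Remove, in place, second recognition target words older than the TTL."""
    now = time.time() if now is None else now
    expired = [w for w, e in dictionary.items()
               if e["kind"] == "second" and now - e["registered"] > TTL_SECONDS]
    for word in expired:
        del dictionary[word]

dic = {
    "アメリカ大統領": {"kind": "first", "registered": 0.0},
    "アメリカ大":     {"kind": "second", "registered": 0.0},
}
purge_second_targets(dic, now=TTL_SECONDS + 1)
print(list(dic))  # ['アメリカ大統領']: only the first target word remains
```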
On the other hand, when the recognition target words registered in the speech recognition dictionary 16 are not erased, the recognizable vocabulary may instead be restricted to shorten recognition processing time: for example, the speech recognition unit 20 receives the text information of the currently available content from the control unit 19 and, among the first and second recognition target words registered in the speech recognition dictionary 16, enables only those corresponding to the text information of that content.
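The enable/disable alternative can be sketched as flipping an active flag per entry instead of deleting. Illustrative only; the entry fields and content identifiers are assumptions:

```python
def set_active_vocabulary(dictionary: dict, current_contents: list) -> None:
    """Mark each dictionary entry active only when its associated content
    is among the currently available items."""
    for entry in dictionary.values():
        entry["active"] = entry["content"] in current_contents

dic = {
    "アメリカ大": {"content": "news-alpha", "active": False},
    "モーターシ": {"content": "news-beta", "active": False},
}
set_active_vocabulary(dic, current_contents=["news-alpha"])
print([w for w, e in dic.items() if e["active"]])  # ['アメリカ大']
```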
Claims (8)
- An information providing system comprising: an acquisition unit to acquire information to be provided from an information source;
a generation unit to generate first recognition target words from the information acquired by the acquisition unit, and to generate second recognition target words using all of the character strings obtained by shortening, to a prescribed character count, first recognition target words that exceed the prescribed character count;
a storage unit to store the information acquired by the acquisition unit in association with the first recognition target words and second recognition target words generated by the generation unit;
a speech recognition unit to recognize a user's uttered speech and output a recognition result character string; and
a control unit to output, to a display unit, a first recognition target word or second recognition target word generated by the generation unit and consisting of a character string within the prescribed character count, and, when the recognition result character string output from the speech recognition unit matches the first recognition target word or the second recognition target word, to acquire the related information from the storage unit and output it to the display unit or an audio output unit. - The information providing system according to claim 1, wherein the generation unit generates the second recognition target word by processing the character string obtained by shortening the first recognition target word to the prescribed character count.
- The information providing system according to claim 2, wherein the generation unit generates, as the reading of the second recognition target word, the portion of the reading of the first recognition target word that corresponds to the character string shortened to the prescribed character count.
- The information providing system according to claim 2, wherein the generation unit generates, as readings of the second recognition target word, one or more readings for the character string obtained by shortening the first recognition target word to the prescribed character count.
- The information providing system according to claim 2, wherein the generation unit adds, as the reading of the second recognition target word, the reading of another character string to the reading of the character string obtained by shortening the first recognition target word to the prescribed character count.
- The information providing system according to claim 1, wherein the generation unit generates another second recognition target word by replacing the character string obtained by shortening the first recognition target word to the prescribed character count with another character string that is within the prescribed character count and synonymous with the first recognition target word.
- The information providing system according to claim 2, wherein the generation unit generates the reading of the second recognition target word based on the user's utterance history.
- The information providing system according to claim 1, wherein the generation unit registers the first recognition target word and the second recognition target word in a speech recognition dictionary, and erases at least the second recognition target word from the speech recognition dictionary when the acquisition unit acquires new information or when a preset time arrives.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201580077897.0A CN107408118A (zh) | 2015-03-18 | 2015-03-18 | Information providing system |
PCT/JP2015/058073 WO2016147342A1 (ja) | 2015-03-18 | 2015-03-18 | Information providing system |
US15/548,154 US20170372695A1 (en) | 2015-03-18 | 2015-03-18 | Information providing system |
DE112015006325.0T DE112015006325T5 (de) | 2015-03-18 | 2015-03-18 | Information providing system |
JP2017505946A JP6125138B2 (ja) | 2015-03-18 | 2015-03-18 | Information providing system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2015/058073 WO2016147342A1 (ja) | 2015-03-18 | 2015-03-18 | Information providing system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016147342A1 true WO2016147342A1 (ja) | 2016-09-22 |
Family
ID=56918466
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2015/058073 WO2016147342A1 (ja) | 2015-03-18 | 2015-03-18 | 情報提供システム |
Country Status (5)
Country | Link |
---|---|
US (1) | US20170372695A1 (ja) |
JP (1) | JP6125138B2 (ja) |
CN (1) | CN107408118A (ja) |
DE (1) | DE112015006325T5 (ja) |
WO (1) | WO2016147342A1 (ja) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11238409B2 (en) | 2017-09-29 | 2022-02-01 | Oracle International Corporation | Techniques for extraction and valuation of proficiencies for gap detection and remediation |
JP7135399B2 (ja) * | 2018-04-12 | 2022-09-13 | Fujitsu Limited | Identification program, identification method, and information processing device |
CN109215679A (zh) * | 2018-08-06 | 2019-01-15 | Baidu Online Network Technology (Beijing) Co., Ltd. | Dialogue method and device based on user emotion |
US20200097879A1 (en) * | 2018-09-25 | 2020-03-26 | Oracle International Corporation | Techniques for automatic opportunity evaluation and action recommendation engine |
US11467803B2 (en) | 2019-09-13 | 2022-10-11 | Oracle International Corporation | Identifying regulator and driver signals in data systems |
US11367034B2 (en) | 2018-09-27 | 2022-06-21 | Oracle International Corporation | Techniques for data-driven correlation of metrics |
JP7268449B2 (ja) * | 2019-03-29 | 2023-05-08 | Kyocera Document Solutions Inc. | Display control device, display control method, and display control program |
JP7334510B2 (ja) * | 2019-07-05 | 2023-08-29 | Konica Minolta, Inc. | Image forming apparatus, method for controlling image forming apparatus, and control program for image forming apparatus |
US20220067807A1 (en) * | 2020-09-02 | 2022-03-03 | Fero Tech Global Holdings Inc | System and method for facilitating one or more freight transactions |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH1118127A (ja) * | 1997-06-27 | 1999-01-22 | Nec Corp | Display control device for communication apparatus and method therefor |
US7437296B2 (en) * | 2003-03-13 | 2008-10-14 | Matsushita Electric Industrial Co., Ltd. | Speech recognition dictionary creation apparatus and information search apparatus |
CN103869948B (zh) * | 2012-12-14 | 2019-01-15 | Lenovo (Beijing) Co., Ltd. | Voice command processing method and electronic device |
Application filing and status events (2015):
- 2015-03-18 US US15/548,154 patent/US20170372695A1/en not_active Abandoned
- 2015-03-18 JP JP2017505946 patent/JP6125138B2/ja not_active Expired - Fee Related
- 2015-03-18 WO PCT/JP2015/058073 patent/WO2016147342A1/ja active Application Filing
- 2015-03-18 DE DE112015006325.0T patent/DE112015006325T5/de not_active Withdrawn
- 2015-03-18 CN CN201580077897.0A patent/CN107408118A/zh active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001034286A * | 1999-07-22 | 2001-02-09 | Ishida Co Ltd | Product processing system |
JP2004334280A * | 2003-04-30 | 2004-11-25 | Matsushita Electric Ind Co Ltd | Information providing device and information providing method |
WO2006093003A1 * | 2005-02-28 | 2006-09-08 | Pioneer Corporation | Dictionary data generation device and electronic device |
JP2009169470A * | 2008-01-10 | 2009-07-30 | Nissan Motor Co Ltd | Information guidance system and recognition dictionary database update method therefor |
Also Published As
Publication number | Publication date |
---|---|
DE112015006325T5 (de) | 2017-11-30 |
JP6125138B2 (ja) | 2017-05-10 |
US20170372695A1 (en) | 2017-12-28 |
CN107408118A (zh) | 2017-11-28 |
JPWO2016147342A1 (ja) | 2017-04-27 |
Similar Documents
Publication | Title |
---|---|
JP6125138B2 (ja) | Information providing system |
JP6570651B2 (ja) | Voice dialogue device and voice dialogue method |
US11189277B2 (en) | Dynamic gazetteers for personalized entity recognition |
EP3193328B1 (en) | Method and device for performing voice recognition using grammar model |
KR101770358B1 (ko) | Integration of embedded and network speech recognizers |
US20190295531A1 (en) | Determining phonetic relationships |
US9442920B2 (en) | Speech translation system, dictionary server, and program |
US10170122B2 (en) | Speech recognition method, electronic device and speech recognition system |
WO2020238045A1 (zh) | Intelligent speech recognition method and apparatus, and computer-readable storage medium |
CN106710593B (zh) | Method, terminal and server for adding an account |
JP2006208696A (ja) | Device, method, program, and recording medium for remotely controlling a presentation application |
US20050010422A1 (en) | Speech processing apparatus and method |
US20120221335A1 (en) | Method and apparatus for creating voice tag |
JP2018045001A (ja) | Speech recognition system, information processing device, program, and speech recognition method |
CN109326284A (zh) | Speech search method, apparatus, and storage medium |
US20250182761A1 (en) | Electronic device and control method therefor |
CN115699170A (zh) | Text echo cancellation |
JP5396530B2 (ja) | Speech recognition device and speech recognition method |
JP5160594B2 (ja) | Speech recognition device and speech recognition method |
CN110580905A (zh) | Recognition device and method |
WO2006118683A1 (en) | Speech dialog method and system |
US11935539B1 (en) | Integrating voice controls into applications |
CN114586021B (zh) | Information output device, information output method, and recording medium |
JP2001306090A (ja) | Dialogue device and method, voice control device and method, and computer-readable recording media storing programs for causing a computer to function as the dialogue device and the voice control device |
JP7465124B2 (ja) | Speech processing system, speech processing method, and speech processing program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15885438 Country of ref document: EP Kind code of ref document: A1 |
ENP | Entry into the national phase |
Ref document number: 2017505946 Country of ref document: JP Kind code of ref document: A |
WWE | Wipo information: entry into national phase |
Ref document number: 15548154 Country of ref document: US |
WWE | Wipo information: entry into national phase |
Ref document number: 112015006325 Country of ref document: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 15885438 Country of ref document: EP Kind code of ref document: A1 |