CN103176591A

CN103176591A - Text location and selection method based on voice recognition

Info

Publication number: CN103176591A
Application number: CN 201110432826
Authority: CN
Inventors: 顾健
Original assignee: Shanghai Bolu Information Technology Co Ltd
Current assignee: Shanghai Bolu Information Technology Co Ltd
Priority date: 2011-12-21
Filing date: 2011-12-21
Publication date: 2013-06-26

Abstract

The invention discloses a text location and selection method based on voice recognition. The method involves a voice recognition module, a text selection module, a service logic module and the like. The user dictates part of the content; the terminal or system recognizes the speech and converts the dictated voice segment into text content. That text is used as a keyword to search the text displayed in the active window of the current terminal, and the recognized text is located based on the search result. Once location succeeds, the corresponding content is selected, which helps the user quickly select the content for a further operation. Because the dictated content is recognized and then searched for, located and selected within the text of the current active window, the user is provided with an additional method of text selection.

Description

A text location and selection method based on speech recognition

Technical field

The present invention relates to the fields of system software and speech recognition technology, and more particularly to a text location and selection method based on speech recognition.

Background technology

Traditional text selection requires the user to perform manual operations on each terminal, including dragging the mouse or using the keyboard on a computer, and touch selection on the touch screen of a smart terminal. This is difficult in certain scenarios: the sensitivity of the touch screen and the dexterity of the user's fingers both affect selection on the screen. Especially on smart terminals with small screens, the user often has trouble accurately locating and selecting text and has to position and select repeatedly.

Now that speech recognition has become a common capability of smart terminals and systems, the user's intent can be obtained through speech recognition, and the text the user needs to locate and select can be chosen accurately. This simplifies operation and gives the user an additional option when working in different environments.

Summary of the invention

The user dictates part of the content, and the terminal or system performs speech recognition, converting the dictated voice segment into text content. Using this text as a keyword, the text displayed in the active window of the current terminal is searched, and the recognized text is located based on the search result. After location succeeds, the corresponding content is selected, helping the user quickly select the content for further operations. The user is thereby provided with a simple and fast text location and selection method based on speech recognition.

Further, the text location and selection method based on speech recognition provided here gives strong support to the development of applications on various terminals, meets the requirements of the parties involved, and improves the user experience.

To achieve the above object, one aspect of the present invention provides a text location and selection method based on speech recognition, the method comprising:

The user dictates the content to be selected on the terminal; speech recognition is carried out in the terminal or the system; the speech is converted to text, and the recognition result is used as a keyword to search the text content of the current active window; the position of the recognized text is obtained, and the corresponding text content is selected based on that position.
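The search-and-select step described above can be illustrated with a minimal sketch. It assumes the text of the active window is available as a plain string and that an exact substring match is sufficient; the function name `locate_selection` and the sample text are illustrative and do not come from the original.

```python
from typing import Optional, Tuple


def locate_selection(window_text: str, recognized: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first occurrence of the recognized text, or None.

    Assumption: the recognized text appears verbatim in the window text.
    """
    keyword = recognized.strip()
    if not keyword:
        return None  # nothing was recognized, so there is nothing to search for
    start = window_text.find(keyword)
    if start == -1:
        return None  # location failed; no selection is made
    return start, start + len(keyword)


window_text = "Meeting moved to Friday. Please confirm by Thursday."
span = locate_selection(window_text, "Please confirm")
if span is not None:
    start, end = span
    # A real terminal would highlight this range; here it is only printed.
    print(window_text[start:end])
```

Returning a character range rather than the matched string keeps the result usable by whatever selection mechanism the terminal system provides.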

The terminal includes conventional computers, mobile phones, tablet computers and other terminal devices that support voice capture and have network functions.

In one embodiment of the text location and selection method based on speech recognition provided by the invention, the method further comprises:

For the content fragment dictated by the user, the terminal captures and records the user's speech data through the microphone and converts it to the audio format required for speech recognition. Depending on the recognition capability of the terminal's software and hardware, speech recognition is either carried out in the terminal, or the terminal requests recognition from the system through the speech recognition interface that the system side exposes as a service. The text corresponding to the speech is thereby obtained.

Depending on the terminal's hardware environment and capability, the terminal can load a speech recognition module, or it can send the captured audio content to the system's online speech recognition service for recognition. After recognition, a search and location of the text in the current active window is initiated.
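The choice between terminal-side and system-side recognition might be sketched as follows. The original only states that the choice depends on terminal capability; the two capability flags, the preference for local recognition when both paths exist, and all names below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TerminalCapabilities:
    has_local_recognizer: bool  # hypothetical flag: a recognition library is installed
    has_network: bool           # hypothetical flag: the system-side service is reachable


def choose_recognition_mode(caps: TerminalCapabilities) -> str:
    """Return "local" or "remote" depending on what the terminal can do."""
    if caps.has_local_recognizer:
        return "local"   # assumed preference: recognize in the terminal when possible
    if caps.has_network:
        return "remote"  # fall back to the online speech recognition service
    raise RuntimeError("no speech recognition path available")


print(choose_recognition_mode(TerminalCapabilities(False, True)))
```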

After obtaining the text content corresponding to the speech, the terminal searches for that text in the current active window, locates the position of the text once it is found, and applies operations such as highlighting to the text according to the selection mode of the current terminal system. The user can then call up the corresponding menu for further operations.

After the corresponding text content has been located, the user can call up the corresponding selection menu, for example by pressing a button. The menu contains various operation options, including common operations such as copy, cut and share.

Specifically, the invention has the following advantages:

Simple to use:

The user simply dictates part of a sentence to have the corresponding text located and selected, and can then carry out further operations. The system recognizes and parses the sentence automatically, so the process is simple and convenient.

Cloud-mode recognition:

Terminals of different capabilities are supported: speech recognition can be carried out in the terminal, or through the speech recognition service on the system side, which accommodates terminals with different levels of hardware capability.

Accurate positioning:

The system selects and locates the text automatically, with no manual selection by the user. This avoids problems caused by the hardware limitations of various terminals and by the user's dexterity, and improves the precision of location and selection.

Description of drawings

The accompanying drawings described herein provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of it. In the accompanying drawings:

Fig. 1 is a schematic diagram of the system module structure of the present invention.

Fig. 2 is a schematic diagram of the business flow of the present invention.

Fig. 3 is a schematic flow diagram of speech recognition in the present invention.

Embodiment

The present invention is described more fully below with reference to the accompanying drawings, in which exemplary embodiments of the invention are illustrated.

To achieve the above object, a text location and selection method based on speech recognition is proposed.

Embodiments of the present invention are described below with reference to the drawings.

The key points in implementing the text location and selection method based on speech recognition are as follows:

Voice acquisition:

The terminal records, through its microphone, the content fragment that the user dictates in order to select it, and encodes and compresses the recording into an audio format that speech recognition accepts.
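As a rough sketch of the format conversion step, the following packs recorded samples into an in-memory WAV file using Python's standard `wave` module. The original does not name the audio format or any compression scheme; uncompressed 16-bit mono WAV at 16 kHz, and the function name `pcm_to_wav_bytes`, are assumptions chosen only for illustration.

```python
import io
import struct
import wave


def pcm_to_wav_bytes(samples, sample_rate=16000):
    """Pack signed 16-bit mono samples into WAV bytes (assumed target format)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        # "=" packs in native byte order, which is what the wave module expects;
        # the module itself takes care of writing little-endian WAV data.
        wav.writeframes(struct.pack("=%dh" % len(samples), *samples))
    return buffer.getvalue()


audio = pcm_to_wav_bytes([0, 1000, -1000, 32767])
print(len(audio), "bytes ready to hand to the recognizer")
```

A real terminal would obtain the samples from the platform's microphone API and might use a compressed codec instead, depending on what the recognizer accepts.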

Speech recognition:

The user's dictation starts speech recognition. The recognition module resides in the terminal or on the system side. Depending on the terminal's capability, either a speech recognition library is installed in the terminal and recognition is carried out there, or the system side provides speech recognition capability exposed as a service. In the latter case the terminal requests the system-side speech recognition service and submits the recorded speech data, and the system carries out the recognition.

Content search and location:

After the terminal obtains the text corresponding to the speech, the terminal side searches the content and locates the position based on that text. The content found is selected automatically and shown with a conventional selection display such as inverse video. The user can then immediately call up an operation menu for the selected text, with operations including copy, cut and share.

The main functional modules are shown in Fig. 1:

Mobile terminal side:

The terminal refers to the various smart terminals that have mobile Internet functions and a camera, including smart phones and devices such as tablet computers with mobile data functions;

User terminal 100:

The user terminal refers to any device with an operating system, including computers, tablets, smart phones and other smart devices, that also has network functions.

Service logic 101:

The terminal service logic controls and invokes the logic functions and business flow of each service, and exchanges data and function calls with the surrounding functional modules.

Voice acquisition module 102:

Invokes the terminal's audio functions and microphone to record the user's voice, converts the recording to the audio format required by the recognition service module, and passes it to the recognition module for content recognition.

Content operation module 103:

After the text has been located, provides the operation options for that text. Using these options, the user can carry out further operations on the content, such as copy and cut.
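The copy and cut options named for this module can be sketched as operations on a selected character range. This is an illustrative model only: a real terminal would go through the platform clipboard and text widget, and the function names and the `(start, end)` range convention are assumptions not found in the original.

```python
from typing import Tuple


def copy_selection(text: str, span: Tuple[int, int]) -> str:
    """Return the selected text; the document itself is left unchanged."""
    start, end = span
    return text[start:end]


def cut_selection(text: str, span: Tuple[int, int]) -> Tuple[str, str]:
    """Return (selected text, document with the selection removed)."""
    start, end = span
    return text[start:end], text[:start] + text[end:]


document = "hello world"
clipboard, remaining = cut_selection(document, (0, 6))
print(repr(clipboard), repr(remaining))
```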

Configuration management module 104:

On the terminal side, the user configures various user parameters and service parameters, including user data configuration and service parameter configuration.

Content search locating module 105:

After the text resulting from speech recognition is obtained, the terminal application uses it as a keyword to search the content of the current window, and positions the cursor and selects the content based on the search result.
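Recognized speech often differs from the displayed text in case and punctuation, so an exact substring search may fail. The following sketch shows one possible tolerant search that ignores case, spaces and punctuation and maps the match back to positions in the original window text. This matching strategy is not described in the original; it and the names `_normalize` and `find_spoken_text` are assumptions.

```python
from typing import List, Optional, Tuple


def _normalize(text: str) -> Tuple[str, List[int]]:
    """Keep only letters and digits, case-folded, remembering original positions."""
    chars: List[str] = []
    index_map: List[int] = []
    for i, ch in enumerate(text):
        if ch.isalnum():
            # casefold() can expand one character into several, so map each one.
            for folded in ch.casefold():
                chars.append(folded)
                index_map.append(i)
    return "".join(chars), index_map


def find_spoken_text(window_text: str, recognized: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) range in window_text that matches the spoken text."""
    haystack, index_map = _normalize(window_text)
    needle, _ = _normalize(recognized)
    if not needle:
        return None
    pos = haystack.find(needle)
    if pos == -1:
        return None
    start = index_map[pos]
    end = index_map[pos + len(needle) - 1] + 1
    return start, end


window = "Hello, World! Welcome back."
span = find_spoken_text(window, "hello world")
if span is not None:
    print(window[span[0]:span[1]])
```

Because word boundaries are discarded during normalization, a match can begin or end inside a word; a stricter variant would compare whole tokens instead.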

Speech recognition module 106:

An optional module on the terminal side. If the terminal has speech recognition capability, this module recognizes the voice content dictated by the user, converts it to text, and provides the text to other functional modules such as search.

Services request module 107:

The functional module that requests remote services from the system, such as remote speech recognition. The terminal generates service requests through the service request module, asking the remote system to provide various service functions, including the recognition service.
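A service request for remote recognition could, for example, be serialized as JSON with the audio encoded in Base64. The original does not define a message format; the field names (`service`, `format`, `language`, `audio`), the default values and the function name below are all hypothetical.

```python
import base64
import json


def build_recognition_request(audio: bytes, audio_format: str = "wav",
                              language: str = "zh-CN") -> str:
    """Serialize a recognition request; every field name here is hypothetical."""
    payload = {
        "service": "speech_recognition",
        "format": audio_format,
        "language": language,
        # Binary audio is Base64-encoded so it can travel inside a JSON string.
        "audio": base64.b64encode(audio).decode("ascii"),
    }
    return json.dumps(payload)


request_body = build_recognition_request(b"\x00\x01\x02\x03")
print(request_body)
```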

Interface module 108:

The data interface between the terminal and the system. Through this interface the terminal sends and receives service response messages and other data to and from the system side.

Transmission channel 109:

Comprises the mobile network and the Internet, which carry the data transmission channel and the various services and transmit data between the terminal and the system.

System side: the system side provides service to terminals that do not have local speech recognition, and is an optional part.

Service interface module 110:

Defines the access method and parameters of the services that the system side provides, and is responsible for communicating with the terminal over the data network, receiving the requests submitted by the mobile terminal and the data exchanged in messages.

Business logic modules 111:

Executes the corresponding service logic according to the requests and request data submitted by the user, and is responsible for controlling and invoking the surrounding functional modules, communicating and exchanging data with them to complete the various service logic functions.

Security module 112:

Responsible for the system's security management of users and service requests: authenticates the user and the terminal and ensures the security of data transmission, including data encryption and decryption and other functions related to service security.

Speech recognition module 113:

On the system side, responsible for recognizing the raw data content sent by the terminal. Through the interface service, the terminal remotely invokes the recognition service of the system's recognition module, and the recognition result is passed to other functional modules to continue with the next step of the flow.

System management module 114:

Manages and configures the whole system, including user management, log recording and management, and service logic management.

Fig. 3 illustrates the speech recognition flow of the present invention. The steps are as follows.

1) The user opens the application;

2) The user dictates the text to be located and selected;

3) The terminal captures the user's speech data;

4) Depending on the recognition mode, recognition is carried out locally in the terminal or in the system;

5) After the recognition result is obtained, the application uses it to initiate the search and location of the text;

6) The application selects and highlights the located text;

7) The user can then immediately call up a menu for further operations.
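The flow in steps 3) to 6) can be tied together in a short sketch. The recognizer is passed in as a function so that either a terminal-side or a system-side recognizer can be used; the stand-in `fake_recognizer`, which ignores its input and returns a fixed phrase, and all other names are hypothetical and exist only to make the sketch runnable.

```python
from typing import Callable, Optional, Tuple


def locate_and_select(audio: bytes, window_text: str,
                      recognize: Callable[[bytes], str]) -> Optional[Tuple[int, int]]:
    """Run recognition, then search the window text and return the range to highlight."""
    recognized = recognize(audio).strip()    # steps 3-4: obtain the recognized text
    if not recognized:
        return None
    start = window_text.find(recognized)     # step 5: search and locate
    if start == -1:
        return None
    return start, start + len(recognized)    # step 6: range to select and highlight


def fake_recognizer(audio: bytes) -> str:
    # Hypothetical stand-in for a real local or remote recognizer.
    return "quick brown fox"


text = "The quick brown fox jumps."
span = locate_and_select(b"", text, fake_recognizer)
if span is not None:
    print(text[span[0]:span[1]])
```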

An example is given below to illustrate the flow by which a mobile terminal of the system triggers the service by voice. As shown in Fig. 2, in this embodiment the service comprises the following steps:

Step 1: The user opens the terminal application and dictates part of the content;

Step 2: The terminal records the user's voice through the microphone, converts it to audio format data, and submits it to the terminal or the system for speech recognition according to the recognition mode;

Step 3: The terminal or system carries out speech recognition and obtains the text content corresponding to the speech;

Step 4: The terminal application initiates search and location, using the obtained sentence as the search parameter;

Step 5: After the terminal application retrieves content containing the speech recognition result, it locates this content, selects the corresponding text and highlights it;

Step 6: The user can call up a menu for further operations, including copy and cut.

The description of the present invention is given for the purposes of illustration and explanation, and is not exhaustive or intended to limit the invention to the disclosed form. Many modifications and variations will be obvious to those of ordinary skill in the art. The embodiments were selected and described in order to better explain the principles and practical application of the invention, and to enable those of ordinary skill in the art to understand the invention and design various embodiments, with various modifications, suited to particular uses.

Claims

1. A text location and selection method based on speech recognition, characterized in that the user dictates the content to be selected on the terminal; speech recognition is carried out in the terminal or the system; the speech is converted to text, and the recognition result is used as a keyword to search the text content of the current active window; the position of the recognized text is obtained, and the corresponding text content is selected based on that position.

2. The method as claimed in claim 1, wherein the terminal includes various fixed or portable terminals, characterized in that the terminal includes conventional computers, mobile phones, tablet computers and other terminal devices that support voice capture and have network functions.

3. The method as claimed in claim 1, wherein the user dictates a partial content fragment and the terminal records and recognizes the speech, characterized in that, for the content fragment dictated by the user, the terminal captures and records the user's speech data through the microphone and converts it to the audio format required for speech recognition; depending on the recognition capability of the terminal's software and hardware, speech recognition is either carried out in the terminal, or the terminal requests recognition from the system through the speech recognition interface that the system side exposes as a service; the text corresponding to the speech is thereby obtained.

4. The method as claimed in claim 3, wherein, after obtaining the voice content, the terminal carries out recognition locally or in the system, characterized in that, depending on the terminal's hardware environment and capability, the terminal can load a speech recognition module, or it can send the captured audio content to the system's online speech recognition service for recognition, and after recognition a search and location of the text in the current active window is initiated.

5. The method as claimed in claim 4, wherein the terminal initiates the search and location of the content text after obtaining the text content corresponding to the speech, characterized in that, after obtaining the text content corresponding to the speech, the terminal searches for that text in the current active window, locates the position of the text once it is found, and applies operations such as highlighting to the text according to the selection mode of the current terminal system; the user can then call up the corresponding menu for further operations.

6. The method as claimed in claim 5, wherein the terminal applies operations such as highlighting to the text according to the selection mode of the current terminal system and the user can then call up the corresponding menu for further operations, characterized in that, after the corresponding text content has been located, the user can call up the corresponding selection menu, for example by pressing a button; the menu contains various operation options, including common operations such as copy, cut and share.