CN103035247B - Method and device for operating on audio/video files based on voiceprint information - Google Patents
Method and device for operating on audio/video files based on voiceprint information
- Publication number: CN103035247B
- Application number: CN201210518118.4A
- Authority
- CN
- China
- Prior art keywords
- audio
- voiceprint
- contact person
- video file
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/54—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
- G06F16/632—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7834—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
Abstract
The present invention discloses a method for operating on audio/video files based on voiceprint information, comprising the following steps: collecting the voiceprint information of a sounding target; and searching audio/video files according to the voiceprint information. The present invention also provides a terminal device. With the proposed technical scheme, audio/video files can be classified according to the voiceprint of a particular contact. When a user wants to find the audio/video files that contain a particular contact, the files need not be played back and checked one by one; instead they can be selected directly, making it convenient for the user to locate audio/video files containing a specific person's voice. Furthermore, the method provided by the present invention can jump directly to the time node at which a given contact speaks in the audio/video and start playback, thereby improving the user's search efficiency.
Description
Technical field
The present invention relates to the field of mobile device communication applications, and more particularly to a method and device for operating on audio and video files on a terminal device according to the voiceprint of a particular contact.
Background art
The recorder or camera on an existing terminal device makes it convenient for the user to record audio and video files. As terminal device performance improves, storage capacity grows, and the variety of multimedia applications increases, users can easily record or shoot a large number of audio/video files. However, faced with a large number of audio/video files, when a user needs to find all files in which a certain contact has been recorded, or to find and play a particular section of information spoken by a certain contact within a certain audio/video file, the user may have no way to search because the content cannot be located quickly. Only by playing back and checking the files one by one can the required file or fragment be obtained.
In view of this, it is desirable to provide a method and terminal device that can quickly search for and classify target audio/video files and locate the time points at which a particular contact appears in such files, so as to make it convenient for the user to find audio and video files in which a specific person's voice has been recorded.
Summary of the invention
In order to solve the above technical problem and enable a user to quickly find files in which a specific person's voice or image has been recorded, an object of the present invention is to provide a method for operating on audio/video files based on voiceprint information, comprising the following steps: collecting the voiceprint information of a sounding target; and searching audio/video files according to the voiceprint information; wherein all sounds recorded in the audio/video file are divided into multiple voice units, each voice unit contains only the voice of one sounding target, and the time point of the sounding target in the audio/video file is recorded.
Another object of the present invention is to provide a terminal device, comprising: a voiceprint extraction module for collecting the voiceprint information of a sounding target; and an execution module for searching audio/video files according to the voiceprint information; wherein all sounds recorded in the audio/video file are divided into multiple voice units, each voice unit contains only the voice of one sounding target, and the time point of the sounding target in the audio/video file is recorded.
With the method and apparatus provided by the present invention, files in which a specific person's voice or image has been recorded can be found quickly, improving the user's search efficiency.
Additional aspects and advantages of the present invention will be set forth in part in the following description, become apparent from that description, or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows a schematic flow chart according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of the interface of a terminal device before audio collection according to an embodiment of the present invention;
Fig. 3 shows a flow chart of audio collection according to an embodiment of the present invention;
Fig. 4 shows a schematic diagram of the interface of a terminal device during audio collection according to an embodiment of the present invention;
Fig. 5 shows a schematic diagram of the interface displayed by the terminal device after the recorded video and audio files have been found, marked with the time points at which the voiceprint of a sounding target appears and/or ends in each file;
Fig. 6 shows a flow chart of viewing a contact media library through a terminal device according to an embodiment of the present invention;
Fig. 7 shows a flow chart of recording a contact's voice according to an embodiment of the present invention;
Fig. 8 shows a schematic diagram of the overall structure according to an embodiment of the present invention;
Fig. 9 shows a structural schematic diagram according to an embodiment of the present invention.
Specific embodiment
Illustrative embodiments of the present invention will now be described in detail with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as limited to the specific embodiments set forth here; rather, these embodiments are provided so that the disclosure of the invention will be thorough and complete, and will fully convey the ideas, concepts, objects, designs, reference schemes and protection scope of the invention to those skilled in the art. The terms used in the detailed description of the specific illustrative embodiments shown in the accompanying drawings are not intended to limit the invention. In the drawings, identical labels refer to identical elements.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "the" and "said" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present invention refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. In addition, "connected" or "coupled" as used herein may include wireless connection or coupling. The wording "and/or" used herein includes any and all combinations of one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have meanings consistent with their meaning in the context of the prior art and, unless defined as here, will not be interpreted in an idealized or overly formal sense.
As shown in Fig. 1, the present invention provides a method for operating on audio/video files based on voiceprint information, comprising the following steps: S1, collecting the voiceprint information of a sounding target; and S2, searching audio/video files according to the voiceprint information.
For example, step S1 may be realized by the following method: when contact X1 calls user Y, the terminal device turns on the built-in recorder, records a section of speech in which contact X1 talks alone (for example, recorded spoken sounds 7-10 seconds in length), and extracts the voiceprint information from it. Then, after the call ends, the terminal device generates a speaker model M1 from the recorded voiceprint information and stores the sample in the media library. The terminal device then associates the speaker model with the record of contact X1 in the address book.
As another example, step S1 may also be realized by the following method: when user Y takes his son X2 to play in the park, the terminal device opens the "record voiceprint sample" option in the address book record of son X2 and records the voiceprint information of son X2. Then, after the recording stops, the terminal device generates a speaker model M2 from the recorded voiceprint information and stores the sample in the terminal memory. The terminal device then associates the speaker model with the file of contact X2 in the media library. It should of course be understood that "media library" is one way of describing a collection of stored multimedia documents; it may also be expressed as a folder, file manager, media manager, video manager, audio manager, and so on. As shown in Fig. 5, whenever voiceprint information matching speaker models M1 or M2 is later encountered, the terminal device classifies and marks these video and audio files according to the specific target (for example, "I" and "son"). After classified storage, information such as a subject field, folder, or media library of the corresponding category can be generated.
Step S1 can also be achieved by the following steps: step S11, when a sounding target (for example, Zhang San) is selected in the contacts application, a "record voiceprint sample" option is provided on the display screen; step S12, after the user clicks the "record voiceprint sample" option, the terminal device collects the voiceprint information and stores the speaker model generated from the voiceprint information in the contact media library; and step S13, after the contact media library page is entered, the display screen shows the audio/video files that have been found. Therefore, collecting the voiceprint information of the sounding target comprises: collecting the voiceprint information when a certain sounding target is selected; and storing the collected voiceprint information.
Fig. 2 shows a schematic diagram of the interface of a terminal device before audio collection according to an embodiment of the present invention. Fig. 3 shows a flow chart of audio collection according to an embodiment of the present invention. The audio collection flow comprises the following steps. Step 101: enter the address book and open a particular contact in the telephone directory. Then, step 102: through the "record voiceprint sample" option (as shown in Fig. 2), record the contact's voice (that is, collect the voiceprint information of the contact). Then, step 103: after the recording is completed, model the contact's voice to generate a speaker model, and save the speaker model into the contact information. Therefore, collecting and storing the voiceprint information comprises: generating a speaker model from the voiceprint information; and storing the speaker model in a local storage module.
The technology of identifying a speaker's identity using voiceprint information may be called Speaker Recognition (SR), and the corresponding model may be called a Speaker Model (SM); the modeling process is illustrated in Fig. 8. A speaker recognition system is generally modeled using the UBM-GMM method: a Universal Background Model (UBM) is trained from a large amount of training audio (more than one speaker), and a specific speaker is then modeled by adaptation on the basis of this UBM to obtain the speaker model (SM). Both the universal background model and the speaker model are generally constructed as a Gaussian Mixture Model (GMM).
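The adaptation step can be sketched with a toy example. This is a deliberately minimal, numpy-only illustration of relevance-MAP mean adaptation in the spirit of GMM-UBM systems, not the patent's implementation: the "UBM" is a fixed two-component, one-dimensional GMM, only the means are adapted, and the relevance factor of 16 is a conventional choice rather than a value from the source (the application itself later describes eigenvoice/CMLLR/SMAP adaptation methods).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "UBM": a fixed 2-component, 1-dimensional diagonal GMM standing in
# for a universal background model trained on many speakers' audio.
ubm_means = np.array([[0.0], [5.0]])
ubm_vars = np.array([[1.0], [1.0]])
ubm_weights = np.array([0.5, 0.5])

def posteriors(x):
    # Responsibility of each UBM component for each feature frame in x (N, 1).
    ll = -0.5 * ((x[:, None, :] - ubm_means) ** 2 / ubm_vars
                 + np.log(2 * np.pi * ubm_vars))
    ll = ll.sum(-1) + np.log(ubm_weights)
    ll -= ll.max(axis=1, keepdims=True)
    p = np.exp(ll)
    return p / p.sum(axis=1, keepdims=True)

def map_adapt_means(x, relevance=16.0):
    # Relevance-MAP adaptation of the UBM means toward the speaker's data:
    # components that "see" many speaker frames move far, others stay put.
    g = posteriors(x)                      # (N, C) soft assignments
    n = g.sum(axis=0)                      # soft frame count per component
    ex = (g[:, :, None] * x[:, None, :]).sum(axis=0)
    ex /= np.maximum(n, 1e-9)[:, None]     # per-component data mean
    alpha = (n / (n + relevance))[:, None]
    return alpha * ex + (1 - alpha) * ubm_means

# "Speaker" frames clustered near 4.0 pull the second component toward 4,
# while the first component barely moves; the result is the speaker model.
speaker_frames = rng.normal(4.0, 1.0, size=(200, 1))
sm_means = map_adapt_means(speaker_frames)
```

Real systems work on multi-dimensional cepstral features and also adapt (or fix) weights and variances; the one-dimensional setup here only shows the shape of the computation.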
Fig. 4 shows a schematic diagram of the interface of a terminal device during audio collection according to an embodiment of the present invention. For example, when the terminal device records a voiceprint sample under the address book contact interface (as shown in Fig. 4), clicking the button for adding and recording a voiceprint sample starts recording the contact's voice.
Further, as shown in Fig. 3, the voiceprint recognition flow comprises the following steps. Step 104: determine the audio/video file. Then, step 105: perform speaker segmentation on the speech in the audio/video file and generate n voice units, each voice unit containing only a single speaker's voice. Then, step 106: perform contact voiceprint recognition on each segmented voice unit (for example, on each of the n voice units) and judge whether it matches. Then, step 107: if the recognition result is a match, the terminal device establishes a database of the correspondence between the contact and this audio/video file. Further, the correspondence database can record the audio/video files in which the contact's voice appears. Further, the correspondence database can also record the time points at which the contact's voice appears in the audio/video file; that is, the time points of appearance are mapped to the corresponding positions in the audio/video file.
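Steps 104-107 culminate in a contact-to-file correspondence database. A minimal sketch of such a store, using an in-memory SQLite table (the schema and column names are illustrative assumptions, not taken from the patent):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE contact_media (
    contact TEXT, file TEXT, time_point_sec INTEGER)""")

def record_match(contact, file, time_point_sec):
    # Step 107: store one (contact, file, time point) correspondence.
    db.execute("INSERT INTO contact_media VALUES (?, ?, ?)",
               (contact, file, time_point_sec))

def files_for_contact(contact):
    # All audio/video files in which the contact's voice appears.
    rows = db.execute(
        "SELECT DISTINCT file FROM contact_media WHERE contact = ?",
        (contact,))
    return [r[0] for r in rows]

def time_points(contact, file):
    # Time points at which the contact speaks within one file.
    rows = db.execute(
        "SELECT time_point_sec FROM contact_media "
        "WHERE contact = ? AND file = ? ORDER BY time_point_sec",
        (contact, file))
    return [r[0] for r in rows]

# The Fig. 5 example: "son" speaks three times in one recording.
record_match("son", "International Children's Day", 225)
record_match("son", "International Children's Day", 1103)
record_match("son", "International Children's Day", 2734)
```

The two query functions correspond to the two things the description says the database supports: listing a contact's files, and listing the time points inside one file.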
Fig. 6 shows a flow chart of viewing the contact media library through a terminal device according to an embodiment of the present invention. The flow of viewing the contact media library through the terminal device may comprise the following steps. Step 201: open the media library and select the "contact media library" menu. Then, step 202: start reading the contact and audio/video relational database. Then, step 203: after reading is completed, display the contacts together with their corresponding media files and time points.
Fig. 5 shows a schematic diagram of the interface displayed by the terminal device after the recorded video and audio files have been found, marked with the time points at which the voiceprint of a sounding target appears and/or ends in each file. For example, the user opens the media library and selects the "contact media library" menu, at which point the interface for viewing the contact media library is presented to the user. The interface provides the items of information obtained by reading the contact and audio/video relational database. Therefore, searching audio/video files according to the voiceprint information comprises: displaying the audio/video files when the local storage module is opened.
Further, as can be seen from the interface shown in Fig. 5, the media library of this embodiment holds two classes of media files, "son" and "I", wherein the "International Children's Day" item in the "son" folder has three time points, namely 3'45", 18'23" and 45'34". These three time points are the time points at which the voice of "son" appears in the "International Children's Day" item. For example, the user can select "3'45"", and the terminal device will automatically jump to 3 minutes 45 seconds in the "International Children's Day" item and start playback. Therefore, storing the collected voiceprint information comprises: classified storage according to the speaker model. Further, searching audio/video files according to the voiceprint information comprises: displaying the audio/video files when the local storage module is opened. Further, the classification comprises: classifying and displaying the audio/video files according to the speaker model. Further, the display comprises: displaying the time points at which the sounding target appears in the audio/video files. Further, the classification comprises: searching the audio/video files by category according to the type of sounding target. Further, the time points are used as follows: when a time point in the classified display is selected, the audio/video of the sounding target contained in the audio/video file is played.
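Before the player can jump to a selected entry, the minute-and-second labels of Fig. 5 (3'45", 18'23", 45'34") must be mapped to playback offsets. A small sketch of that conversion; the minutes'seconds" label format is inferred from the figure description, so treat the parsing rule as an assumption:

```python
import re

def timepoint_to_seconds(label: str) -> int:
    # Parse a label such as 3'45" (minutes ' seconds ") into a
    # playback offset in seconds for the media player to seek to.
    m = re.fullmatch(r"(\d+)'([0-5]?\d)\"", label)
    if m is None:
        raise ValueError(f"unrecognized time point label: {label!r}")
    minutes, seconds = int(m.group(1)), int(m.group(2))
    return minutes * 60 + seconds
```

Selecting "3'45"" would thus seek the player to second 225 of the "International Children's Day" item before starting playback.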
As shown in Figs. 1 to 6, according to another embodiment of the present invention, when the terminal device classifies audio/video files according to particular contacts, the voiceprints of the user's key contacts must first be modeled and stored in the address book module. In the present invention, a "voiceprint sample" field is added to each contact record for storing the voiceprint sample of the contact in the terminal device's address book module. The concrete operation is as follows. The user creates or edits an important contact of concern (such as "child"). Then, a section of audio of that particular contact ("child") is recorded (for example, normal speech, 7-10 seconds in length). The terminal device models the voiceprint of the particular contact ("child") from the sound sample and saves the model in the voiceprint sample field of that contact's address book record ("child"). Then, the user records and saves audio/video files on the terminal device. The present invention can perform voiceprint analysis of important contacts, classify files by contact, and mark the objects and time points at which a contact's voice occurs. The sounds of all speakers recorded in an audio/video file are extracted and divided into multiple voice units using speaker segmentation techniques, each voice unit containing only the voice of one speaker. Voiceprint recognition is then performed on each voice unit using the speaker models. After voiceprint recognition, a database storing the relation between contacts and audio/video files is used to record the correspondence between contacts and audio/video files, and the time points at which a contact's voice appears in each audio/video file.
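The segmentation step above turns a recording into single-speaker voice units with time points. As a simplified illustration only (real speaker segmentation works on acoustic features, not ready-made labels), the following sketch merges per-second speaker labels, such as might come out of a diarization front end, into contiguous voice units:

```python
def to_voice_units(frame_labels):
    # frame_labels: one speaker label (or None for silence) per second.
    # Returns (speaker, start_sec, end_sec) units (end exclusive),
    # each containing only a single speaker's voice.
    units = []
    current, start = None, 0
    for t, label in enumerate(frame_labels):
        if label != current:
            if current is not None:
                units.append((current, start, t))
            current, start = label, t
    if current is not None:
        units.append((current, start, len(frame_labels)))
    return units

# X1 speaks, a silent second, X2 speaks, then X1 again.
labels = ["X1", "X1", None, "X2", "X2", "X2", "X1"]
units = to_voice_units(labels)
```

Each resulting unit carries exactly the two pieces of information the method needs downstream: whose voiceprint to match against, and which time span to store in the relational database.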
The voiceprint mentioned in the present invention refers to the sound wave spectrum of a user's voice, which is a biological characteristic of that voice. By comparing voiceprints, the mobile terminal can find the corresponding targets in the stored multimedia. Therefore, when the sounding target is a contact in the contacts application, the method of collecting the voiceprint information of the sounding target comprises: when talking with the contact on the phone, recording a section of the contact's voice, this section being 7-10 seconds long and containing only the voice of that contact, and using this section of sound to extract the voiceprint information and generate a voiceprint template. Further, when the sounding target is a contact in the contacts application, collecting the voiceprint information of the sounding target comprises: recording the voiceprint information of the contact while talking with the contact on the phone. Further, when the sounding target is a contact in the contacts application, collecting the voiceprint information of the sounding target comprises: the user manually recording the contact's voice, thereby recording the voiceprint information of the contact. Further, when the sounding target is a contact in the contacts application, searching audio/video files comprises: when the contact is selected, playing the audio/video mapped to that contact.
Fig. 7 shows a flow chart of recording a contact's voice according to an embodiment of the present invention. The flow of recording a contact's voice comprises the following steps. Step 301: open a certain contact in the address book. Then, step 302: judge whether this is the first recording. When the judgment result is that this is the first recording, proceed to step 303: start recording. Then, step 304: save this audio after the recording is completed. Then, step 305: perform voiceprint modeling on the audio. Then, step 306: save the voiceprint modeling information. Then, step 307: recognize the existing audio/video files with this voiceprint information. Then, step 308: save the recognized files and time points into the contact and audio/video relational database. Finally, step 309: the voiceprint recording work ends.
When the judgment result is that this is not the first recording, proceed to step 310: prompt the user to decide whether to record again. If recording again is required, proceed to step 311: delete the original recording file, then proceed to step 303, after which steps 303 to 309 are performed in sequence. If recording again is not required, no recording takes place and the process ends (step 309).
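The branching of steps 301-311 can be summarized in a few lines. This sketch is purely illustrative of the control flow; the function names, the dictionary-based contact record, and the callback style are assumptions, not part of the patent:

```python
def voiceprint_recording_flow(contact, record, confirm_rerecord):
    # Steps 302/310: branch on whether a voiceprint sample already exists.
    if contact.get("voiceprint") is not None:
        if not confirm_rerecord():          # step 310: ask the user
            return "kept existing sample"   # end without recording (309)
        contact["voiceprint"] = None        # step 311: delete the old file
    contact["voiceprint"] = record()        # steps 303-306: record + model
    return "recorded new sample"            # steps 307-309 would follow

contact = {"name": "child", "voiceprint": None}
```

In the full flow, a successful recording would continue with recognizing existing audio/video files (step 307) and updating the relational database (step 308).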
According to another embodiment of the present invention, a method for classifying and marking video and audio on a terminal device based on voiceprint recognition technology comprises the following steps. The contact's voice is recorded in advance to obtain the voiceprint information. Then, speaker segmentation is performed on the audio/video file, dividing it into multiple voice units, each voice unit containing only one speaker's voice, and voiceprint recognition is performed on these voice units one by one. Then, the recognition results are saved into the contact and audio/video relational database. When the user enters the contact media library, performs a "classify by contact" or "search by contact" operation in any media library or file manager of the terminal device, or directly views the audio and video related to a contact in the contacts application, the relational database of contacts and audio/video is read and their relation is displayed. The present invention not only allows the relation between contacts and audio/video to be displayed as a menu item in the media library, but can also display it in menu style in the contacts or file manager.
Further, according to another embodiment of the present invention, in applications on the terminal device such as the media library, contact manager, or file manager, "classify by contact" or "search by contact" can be selected to classify, display, and search audio and video. Further, according to another embodiment of the present invention, the audio/video related to a contact can be viewed directly in the contacts application.
Therefore, the method for operating on audio/video files based on voiceprint information provided by the present invention can classify audio/video files according to the voiceprint information of particular contacts. When a user wants to find the audio/video files that contain a particular contact, there is no need to play back and check the files one by one; instead the files can be selected directly through the information displayed by the media library, contact manager, or file manager, making it convenient for the user to find files containing a specific person's voice or image. Furthermore, the method provided by the present invention can jump directly to the time node at which a given contact speaks in the audio/video and start playback, thereby improving the user's search efficiency.
As shown in Fig. 8, in the overall scheme of the present invention, the technology of identifying a speaker's identity using voiceprint information may be called Speaker Recognition (SR), and the corresponding model may be called a Speaker Model (SM). A speaker recognition system is generally modeled using the UBM-GMM method: a Universal Background Model (UBM) is trained from a large amount of training audio (more than one speaker), and a specific speaker is then modeled by adaptation on the basis of this UBM to obtain the speaker model (SM). Both the universal background model and the speaker model are generally constructed as a Gaussian Mixture Model (GMM). As shown in Fig. 8, the method for operating on audio/video files based on voiceprint information provided by the present invention can include a modeling process and a recognition process.
The modeling process may comprise the following steps. Step 1: training audio; step 2: silence detection; step 3: voice segmentation; step 4: feature extraction; step 5: cross adaptation according to the universal background model; step 6: generation of the speaker model; step 7: Z-norm processing based on impostor audio; step 8: normalized speaker model. The recognition process may comprise the following steps. Step 1: detect the audio to be recognized; step 2: silence detection; step 3: voice segmentation; step 4: feature extraction; step 5: score calculation according to the normalized speaker model; step 6: T-norm processing based on impostor audio; step 7: judgment; step 8: output of the recognition result. Here, the normalized speaker model and the impostor models together make up the speaker model.
According to an embodiment of the present invention, the modeling process of the speaker model can be roughly described in the following stages. 1. Feature extraction stage: using silence detection (Voice Activity Detection, VAD), effective speech is detected from the input audio, and the input audio is segmented into several speech sections according to the lengths of the silences between them; the speech features required for speaker recognition are then extracted from each segmented speech section. 2. UBM modeling stage: the universal background model (UBM) is computed using a large number of speech features extracted from the training audio. 3. SM modeling stage: using the universal background model and a small amount of speech features from the specific speaker, the model of that speaker (SM) is computed by an adaptive method. 4. SM normalization stage: in order to strengthen the anti-interference capability of the speaker model, after speaker model modeling is completed, the speech features of some impostor speakers are often used to perform a normalization operation on the speaker model, finally yielding the normalized speaker model (Normalized SM).
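Stage 1 relies on silence detection to carve the input audio into speech sections. Below is a deliberately simplified, energy-only sketch of that idea in numpy; the application's actual VAD additionally uses fundamental frequency information and an SVM voice/non-voice classifier, which are omitted here, and the frame length and threshold are arbitrary illustrative values:

```python
import numpy as np

def energy_vad(signal, frame_len=160, threshold=0.01):
    # Mark each frame as speech if its mean energy exceeds a threshold.
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    return (frames ** 2).mean(axis=1) > threshold

def speech_sections(is_speech):
    # Merge consecutive speech frames into (start_frame, end_frame)
    # sections, splitting wherever a silent gap occurs.
    sections, start = [], None
    for i, s in enumerate(is_speech):
        if s and start is None:
            start = i
        elif not s and start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, len(is_speech)))
    return sections

# 0.5 s of tone, 0.5 s of near-silence, 0.5 s of tone (8 kHz sampling).
t = np.arange(4000) / 8000.0
sig = np.concatenate([np.sin(2 * np.pi * 440 * t),
                      0.001 * np.random.default_rng(0).standard_normal(4000),
                      np.sin(2 * np.pi * 440 * t)])
sections = speech_sections(energy_vad(sig))
```

The detected sections are what the feature extraction stage would then consume, one speech section at a time.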
According to an embodiment of the present invention, the recognition process of speaker recognition can be roughly described in the following stages. 1. Feature extraction stage: this stage is identical to the feature extraction stage of the modeling process. 2. Score calculation stage: using the speaker model, the score of the input speech features is calculated. 3. Score normalization stage: using the normalized speaker model, the score obtained in the previous step is normalized and the final judgment is made.
Furthermore, in the modeling and recognition processes described above, some steps can have different implementations. 1. The silence detection technique of the feature extraction stage: the method used in this application first distinguishes silence from non-silence using the energy information and fundamental frequency information of the input audio, and then uses a Support Vector Machine (SVM) model to distinguish the voice and non-voice parts of the non-silent portion. Once the voice parts have been determined, the input audio can be divided into several speech sections according to the lengths of the gaps between voice segments. 2. The adaptive method for computing the speaker model from the universal background model: this application uses a combination of the eigenvoice (Eigenvoice) method, the Constrained Maximum Likelihood Linear Regression (CMLLR) method, and the Structured Maximum A Posteriori (SMAP) method. 3. The speaker model normalization method: this application uses the Z-Norm method. 4. Score normalization: this application uses the T-Norm method. The combination of the Z-Norm and T-Norm methods is currently the most popular normalization approach in speaker recognition technology; the former is used in the modeling phase and the latter in the recognition phase.
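Both normalizations reduce a raw score to a mean-and-variance-standardized one; they differ only in where the impostor score distribution comes from. A minimal sketch of the arithmetic, as a toy illustration of the standard Z-Norm/T-Norm formulas rather than this application's code:

```python
import numpy as np

def z_norm(raw_score, impostor_scores):
    # Z-Norm (modeling phase): impostor utterances are scored against
    # THIS speaker model offline; the resulting mean/std then
    # standardize later trial scores for the model.
    mu, sigma = np.mean(impostor_scores), np.std(impostor_scores)
    return (raw_score - mu) / sigma

def t_norm(raw_score, cohort_scores):
    # T-Norm (recognition phase): the TEST utterance is scored against
    # a cohort of impostor models at recognition time, and those scores
    # standardize the trial score.
    mu, sigma = np.mean(cohort_scores), np.std(cohort_scores)
    return (raw_score - mu) / sigma
```

After normalization, a score near zero means "no better than an impostor," so a fixed decision threshold can be shared across speakers and sessions.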
As shown in Figure 9, another object of the present invention is to provide a terminal device, including: a voiceprint extraction module for collecting the voiceprint information of an audible target; and an execution module for searching audio/video files according to the voiceprint information.
Further, the voiceprint extraction module includes: a voiceprint collecting unit for collecting voiceprint information when a certain audible target is selected; and a voiceprint sample generation unit for generating a speaker model according to the voiceprint information.
Further, the device also includes: a storage module for storing the collected voiceprint information.
Further, the storage module is additionally used to store the voiceprint template samples.
Further, the voiceprint extraction module includes: a target classification unit for storing by category according to the speaker models.
Further, the device also includes: a display for showing the audio/video files when the local storage module is opened.
Further, the display is used to show the audio/video files classified according to the category of the audible target determined by the target classification unit.
Further, the display is used to show the time points at which the audible target appears in the audio/video files.
Further, the target classification unit is additionally used to search the audio/video files by category according to the category of the audible target.
Further, the execution module is additionally used to: when a time point in the classified display is selected, play the audio/video of the audible target contained in the audio/video file.
Further, when the audible target is a certain contact in a contacts application, the voiceprint extraction module is used to record the voiceprint information of the contact during a call with that contact.
Further, when the audible target is a certain contact in a contacts application, the voiceprint extraction module is used to record the voiceprint information of the contact from speech of the contact recorded manually by the user.
Further, when the audible target is a certain contact in a contacts application, the execution module is additionally used to play the audio/video mapped to the contact when that contact is selected.
The method and device provided by the present invention can quickly locate files in which the voice or video of a specific person is recorded, thereby improving the user's search efficiency.
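The per-speaker search described above can be sketched as a simple index of voice units, each carrying the file, the identified target, and the time points at which the target's speech starts and ends. The names `VoiceUnit` and `search_by_speaker` are hypothetical, introduced only to illustrate the idea; the patent does not prescribe a data structure.

```python
from dataclasses import dataclass

@dataclass
class VoiceUnit:
    file: str      # audio/video file the unit belongs to
    speaker: str   # identified audible target for this unit
    start: float   # time point (seconds) where the target appears
    end: float     # time point where the target's speech ends

def search_by_speaker(index, target):
    # Return the (start, end) positions where `target` speaks,
    # grouped per file, so a display can list each file and jump
    # playback to the selected time point.
    hits = {}
    for unit in index:
        if unit.speaker == target:
            hits.setdefault(unit.file, []).append((unit.start, unit.end))
    return hits
```

Because each voice unit contains the voice of only one audible target, a lookup by speaker reduces to a linear scan (or, at scale, a key lookup) over the stored units rather than re-analyzing the audio.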
Those skilled in the art will appreciate that the present invention may involve devices for performing one or more of the operations described herein. The devices may be specially designed and manufactured for the required purposes, or may include known devices in a general-purpose computer that is selectively activated or reconfigured by programs stored in it. Such computer programs may be stored in a device-readable (e.g., computer-readable) storage medium, or in any type of medium suitable for storing electronic instructions and coupled to a bus, the computer-readable medium including but not limited to any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs and magneto-optical disks), random access memory (RAM), read-only memory (ROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, magnetic cards or optical cards. A readable medium includes any mechanism for storing or transmitting information in a form readable by a device (e.g., a computer). For example, readable media include random access memory (RAM), read-only memory (ROM), magnetic disk storage media, optical storage media, flash memory devices, and signals propagated in electrical, optical, acoustic or other forms (such as carrier waves, infrared signals and digital signals).
Those skilled in the art will appreciate that the present invention has been described above with reference to structure diagrams and/or block diagrams and/or flow diagrams of methods, systems and computer program products according to embodiments of the invention. It should be understood that each block of these structure diagrams and/or block diagrams and/or flow diagrams, and combinations of such blocks, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus create means for implementing the functions specified in the block or blocks of the structure diagrams and/or block diagrams and/or flow diagrams.
Those skilled in the art will appreciate that the steps, measures and schemes in the various operations, methods and flows discussed in the present invention may be alternated, changed, combined or deleted. Furthermore, other steps, measures and schemes in the various operations, methods and flows that have been discussed in the present invention may also be alternated, changed, rearranged, decomposed, combined or deleted. Furthermore, steps, measures and schemes of the prior art in the various operations, methods and flows disclosed in the present invention may also be alternated, changed, rearranged, decomposed, combined or deleted.
Exemplary embodiments of the invention are disclosed in the drawings and the specification. Although specific terms are employed, they are used in a generic and descriptive sense only and not for purposes of limitation. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications may be made without departing from the principles of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the invention, which should be defined by the appended claims.
Claims (23)
1. A method for operating audio/video files based on voiceprint information, characterized by comprising the following steps:
collecting voiceprint information of an audible target; and
searching audio/video files according to the voiceprint information, a terminal device displaying the time points in a file at which the voiceprint information of the audible target is marked as appearing and/or ending;
wherein all sounds recorded in the audio/video file are divided into a plurality of voice units, each voice unit containing the voice of only one audible target, and the time points of the audible target in the audio/video file are recorded, the position at which the audio/video appears in the corresponding file being mapped through the time points.
2. The method according to claim 1, characterized in that collecting the voiceprint information of the audible target comprises:
collecting voiceprint information when a certain audible target is selected; and
storing the collected voiceprint information.
3. The method according to claim 2, characterized in that collecting and storing the voiceprint information comprise:
generating a speaker model according to the voiceprint information; and
storing the speaker model in a local storage module.
4. The method according to claim 3, characterized in that storing the collected voiceprint information comprises:
storing by category according to the speaker model.
5. The method according to claim 3, characterized in that searching audio/video files according to the voiceprint information comprises:
displaying the audio/video files when the local storage module is opened.
6. The method according to claim 4, characterized in that the classifying comprises:
displaying the audio/video files by category according to the speaker model.
7. The method according to claim 6, characterized in that the classifying comprises:
searching the audio/video files by category according to the category of the audible target.
8. The method according to claim 6, characterized in that selecting the time point comprises:
when a time point in the classified display is selected, playing, from that time point, the audio/video of the audible target contained in the audio/video file.
9. The method according to claim 1, characterized in that when the audible target is a certain contact in a contacts application, collecting the voiceprint information of the audible target comprises:
recording the voiceprint information of the contact while conversing with the contact.
10. The method according to claim 1, characterized in that when the audible target is a certain contact in a contacts application, collecting the voiceprint information of the audible target comprises:
recording the voiceprint information of the contact from speech of the contact recorded manually by the user.
11. The method according to claim 1, characterized in that when the audible target is a certain contact in a contacts application, searching the audio/video files comprises:
playing the audio/video mapped to the contact when the contact is selected.
12. A terminal device, characterized by comprising:
a voiceprint extraction module for collecting voiceprint information of an audible target;
an execution module for searching audio/video files according to the voiceprint information; and
a display for showing the time points in a file at which the voiceprint information of the audible target is marked as appearing and/or ending;
wherein all sounds recorded in the audio/video file are divided into a plurality of voice units, each voice unit containing the voice of only one audible target, and the time points of the audible target in the audio/video file are recorded, the position at which the audio/video appears in the corresponding file being mapped through the time points.
13. The terminal device according to claim 12, characterized in that the voiceprint extraction module comprises:
a voiceprint collecting unit for collecting voiceprint information when a certain audible target is selected; and
a voiceprint sample generation unit for generating a speaker model according to the voiceprint information.
14. The terminal device according to claim 13, characterized by further comprising:
a storage module for storing the collected voiceprint information.
15. The terminal device according to claim 14, characterized in that the storage module is further used to store the speaker models.
16. The terminal device according to claim 13 or 15, characterized in that the voiceprint extraction module comprises:
a target classification unit for storing by category according to the speaker models.
17. The terminal device according to claim 14, characterized in that the display shows the audio/video files when the local storage module is opened.
18. The terminal device according to claim 16, characterized in that the display is used to:
display the audio/video files by category according to the category of the audible target determined by the target classification unit.
19. The terminal device according to claim 18, characterized in that the target classification unit is further used to:
search the audio/video files by category according to the category of the audible target.
20. The terminal device according to claim 18, characterized in that the execution module is further used to:
when a time point in the classified display is selected, play, from that time point, the audio/video of the audible target contained in the audio/video file.
21. The terminal device according to claim 12, characterized in that when the audible target is a certain contact in a contacts application, the voiceprint extraction module is used to:
record the voiceprint information of the contact while conversing with the contact.
22. The terminal device according to claim 12, characterized in that when the audible target is a certain contact in a contacts application, the voiceprint extraction module is used to:
record the voiceprint information of the contact from speech of the contact recorded manually by the user.
23. The terminal device according to claim 12, characterized in that when the audible target is a certain contact in a contacts application, the execution module is further used to:
play the audio/video mapped to the contact when the contact is selected.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710439537.1A CN107274916B (en) | 2012-12-05 | 2012-12-05 | Method and device for operating audio/video file based on voiceprint information |
CN201210518118.4A CN103035247B (en) | 2012-12-05 | 2012-12-05 | Based on the method and device that voiceprint is operated to audio/video file |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710439537.1A Division CN107274916B (en) | 2012-12-05 | 2012-12-05 | Method and device for operating audio/video file based on voiceprint information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103035247A CN103035247A (en) | 2013-04-10 |
CN103035247B true CN103035247B (en) | 2017-07-07 |
Family
ID=48022078
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710439537.1A Active CN107274916B (en) | 2012-12-05 | 2012-12-05 | Method and device for operating audio/video file based on voiceprint information |
CN201210518118.4A Active CN103035247B (en) | 2012-12-05 | 2012-12-05 | Based on the method and device that voiceprint is operated to audio/video file |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN107274916B (en) |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109117665A (en) * | 2013-08-14 | 2019-01-01 | 华为终端(东莞)有限公司 | Realize method for secret protection and device |
CN104123115B (en) * | 2014-07-28 | 2017-05-24 | 联想(北京)有限公司 | Audio information processing method and electronic device |
CN104243934A (en) * | 2014-09-30 | 2014-12-24 | 智慧城市信息技术有限公司 | Method and device for acquiring surveillance video and method and device for retrieving surveillance video |
TWI571120B (en) * | 2014-10-06 | 2017-02-11 | 財團法人資訊工業策進會 | Video capture system and video capture method thereof |
CN104268279B (en) * | 2014-10-16 | 2018-04-20 | 魔方天空科技(北京)有限公司 | The querying method and device of corpus data |
CN105828179A (en) * | 2015-06-24 | 2016-08-03 | 维沃移动通信有限公司 | Video positioning method and device |
CN105022263B (en) * | 2015-07-28 | 2018-03-27 | 广东欧珀移动通信有限公司 | A kind of method and intelligent watch for controlling intelligent watch |
CN106548793A (en) * | 2015-09-16 | 2017-03-29 | 中兴通讯股份有限公司 | Storage and the method and apparatus for playing audio file |
CN105635452B (en) * | 2015-12-28 | 2019-05-10 | 努比亚技术有限公司 | Mobile terminal and its identification of contacts method |
CN105654942A (en) * | 2016-01-04 | 2016-06-08 | 北京时代瑞朗科技有限公司 | Speech synthesis method of interrogative sentence and exclamatory sentence based on statistical parameter |
CN106095764A (en) * | 2016-03-31 | 2016-11-09 | 乐视控股(北京)有限公司 | A kind of dynamic picture processing method and system |
CN106448683A (en) * | 2016-09-30 | 2017-02-22 | 珠海市魅族科技有限公司 | Method and device for viewing recording in multimedia files |
CN107452408B (en) * | 2017-07-27 | 2020-09-25 | 成都声玩文化传播有限公司 | Audio playing method and device |
CN108305636B (en) | 2017-11-06 | 2019-11-15 | 腾讯科技(深圳)有限公司 | A kind of audio file processing method and processing device |
CN108074574A (en) * | 2017-11-29 | 2018-05-25 | 维沃移动通信有限公司 | Audio-frequency processing method, device and mobile terminal |
CN108364663A (en) * | 2018-01-02 | 2018-08-03 | 山东浪潮商用系统有限公司 | A kind of method and module of automatic recording voice |
CN108364654B (en) * | 2018-01-30 | 2020-10-13 | 网易乐得科技有限公司 | Voice processing method, medium, device and computing equipment |
CN108319371A (en) * | 2018-02-11 | 2018-07-24 | 广东欧珀移动通信有限公司 | Play control method and related product |
CN108920619A (en) * | 2018-06-28 | 2018-11-30 | Oppo广东移动通信有限公司 | File display method and device, storage medium and electronic equipment |
CN109446356A (en) * | 2018-09-21 | 2019-03-08 | 深圳市九洲电器有限公司 | A kind of multimedia document retrieval method and device |
CN111091844A (en) * | 2018-10-23 | 2020-05-01 | 北京嘀嘀无限科技发展有限公司 | Video processing method and system |
CN111462761A (en) * | 2020-03-03 | 2020-07-28 | 深圳壹账通智能科技有限公司 | Voiceprint data generation method and device, computer device and storage medium |
CN111883139A (en) * | 2020-07-24 | 2020-11-03 | 北京字节跳动网络技术有限公司 | Method, apparatus, device and medium for screening target speech |
CN112153461B (en) * | 2020-09-25 | 2022-11-18 | 北京百度网讯科技有限公司 | Method and device for positioning sound production object, electronic equipment and readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1156871A (en) * | 1995-11-17 | 1997-08-13 | 雅马哈株式会社 | Personal information database system |
CN1307589C (en) * | 2001-04-17 | 2007-03-28 | 皇家菲利浦电子有限公司 | Method and apparatus of managing information about a person |
CN102238189A (en) * | 2011-08-01 | 2011-11-09 | 安徽科大讯飞信息科技股份有限公司 | Voiceprint password authentication method and system |
CN102404278A (en) * | 2010-09-08 | 2012-04-04 | 盛乐信息技术(上海)有限公司 | Song requesting system based on voiceprint recognition and application method thereof |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6345252B1 (en) * | 1999-04-09 | 2002-02-05 | International Business Machines Corporation | Methods and apparatus for retrieving audio information using content and speaker information |
US8606579B2 (en) * | 2010-05-24 | 2013-12-10 | Microsoft Corporation | Voice print identification for identifying speakers |
CN102347060A (en) * | 2010-08-04 | 2012-02-08 | 鸿富锦精密工业(深圳)有限公司 | Electronic recording device and method |
CN102655002B (en) * | 2011-03-01 | 2013-11-27 | 株式会社理光 | Audio processing method and audio processing equipment |
Also Published As
Publication number | Publication date |
---|---|
CN107274916B (en) | 2021-08-20 |
CN103035247A (en) | 2013-04-10 |
CN107274916A (en) | 2017-10-20 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |