
CN103035247B - Method and device for operating on audio/video files based on voiceprint information - Google Patents


Info

Publication number
CN103035247B
CN103035247B (application CN201210518118.4A)
Authority
CN
China
Prior art keywords
audio
voiceprint
contact person
video file
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210518118.4A
Other languages
Chinese (zh)
Other versions
CN103035247A (en)
Inventor
杨帆
苏腾荣
李世全
马永健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Original Assignee
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Samsung Telecommunications Technology Research Co Ltd, Samsung Electronics Co Ltd filed Critical Beijing Samsung Telecommunications Technology Research Co Ltd
Priority to CN201710439537.1A (published as CN107274916B)
Priority to CN201210518118.4A (published as CN103035247B)
Publication of CN103035247A
Application granted
Publication of CN103035247B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/54 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63 Querying
    • G06F16/632 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Acoustics & Sound (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present invention discloses a method for operating on audio/video files based on voiceprint information, comprising the following steps: collecting the voiceprint information of a sound-producing target; and searching audio/video files according to the voiceprint information. The present invention also provides a terminal device. With the technical scheme proposed by the present invention, audio/video files can be classified according to the voiceprint of a particular contact. When the user wants to find the audio/video files that contain a particular contact, the files need not be played back and checked one by one; instead, they can be selected directly, which makes it convenient for the user to find audio/video files containing a specific person's voice. Furthermore, the method provided by the present invention can jump directly to the time node at which a certain contact speaks in the audio/video and start playback, thereby improving the user's search efficiency.

Description

Method and device for operating on audio/video files based on voiceprint information
Technical field
The present invention relates to the field of mobile device communication applications, and more particularly to a method and device for operating on audio and video on a terminal device according to the voiceprint of a particular contact.
Background art
The sound recorder or camera on an existing terminal device makes it convenient for the user to record audio and video files. As the performance of terminal devices improves, storage capacity increases, and the variety of multimedia applications grows, users can easily record or shoot a large number of audio/video files. However, when facing a large number of audio/video files, such as when the user needs to find all the audio/video files in which a certain particular contact is recorded, or to find and play a certain piece of specific information about a particular contact in a certain audio/video file, the files cannot be located quickly and there is no way to search. Only by playing back and checking the files one by one can the required file or segment be obtained.
In view of this, it is desirable to provide a method and terminal device that can quickly search and classify target audio/video files and locate the time points at which a particular contact appears in a file, so as to make it convenient for the user to find the files in which a specific person's voice or image is recorded.
Summary of the invention
In order to solve the above technical problem, the present invention enables the user to quickly find the files in which a specific person's voice or image is recorded.
An object of the present invention is to provide a method for operating on audio/video files based on voiceprint information, comprising the following steps: collecting the voiceprint information of a sound-producing target; and searching audio/video files according to the voiceprint information; wherein all the sounds recorded in the audio/video file are divided into a plurality of voice units, each voice unit contains the voice of only one of the sound-producing targets, and the time points of the sound-producing targets in the audio/video file are recorded.
Another object of the present invention is to provide a terminal device, including: a voiceprint extraction module for collecting the voiceprint information of a sound-producing target; and an execution module for searching audio/video files according to the voiceprint information; wherein all the sounds recorded in the audio/video file are divided into a plurality of voice units, each voice unit contains the voice of only one sound-producing target, and the time points of the sound-producing targets in the audio/video file are recorded.
The method and device provided by the present invention can quickly find the files in which a specific person's voice or image is recorded, thereby improving the user's search efficiency.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and will in part become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows a schematic flow chart according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of the terminal device interface before audio collection, according to an embodiment of the present invention;
Fig. 3 shows a flow chart of audio collection according to an embodiment of the present invention;
Fig. 4 shows a schematic diagram of the terminal device interface during audio collection, according to an embodiment of the present invention;
Fig. 5 shows a schematic diagram of the interface displayed by the terminal device after recorded video and audio files have been found, marked with the time points at which the voiceprint of a sound-producing target appears and/or ends in a file;
Fig. 6 shows a flow chart of viewing the contact media library through the terminal device, according to an embodiment of the present invention;
Fig. 7 shows a flow chart of recording a contact's voice according to an embodiment of the present invention;
Fig. 8 shows an overall structural diagram according to an embodiment of the present invention;
Fig. 9 shows a structural diagram according to an embodiment of the present invention.
Detailed description
Illustrative embodiments of the present invention will now be described in detail with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as limited to the specific embodiments set forth here; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the ideas, concepts, objects, designs, reference schemes, and scope of protection of the invention to those skilled in the art. The terms used in the detailed description of the specific illustrative embodiments shown in the accompanying drawings are not intended to limit the invention. In the drawings, the same reference numbers refer to the same elements.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "the", and "said" used here may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present invention refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when we say an element is "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. In addition, "connected" or "coupled" as used here may include wireless connection or coupling. The wording "and/or" used here includes any unit of, and all combinations of, one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used here (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have meanings consistent with their meanings in the context of the prior art and, unless defined as here, will not be interpreted in an idealized or overly formal sense.
As shown in Fig. 1, the present invention provides a method for operating on audio/video files based on voiceprint information, comprising the following steps: S1, collecting the voiceprint information of a sound-producing target; and S2, searching audio/video files according to the voiceprint information.
For example, step S1 may be realized as follows. When contact X1 calls user Y, the terminal device opens its built-in recorder and records a segment of speech in which contact X1 talks alone (for example, recorded spoken speech with a length of 7-10 seconds), and extracts voiceprint information from it. Then, after the call ends, the terminal device generates a speaker model M1 from the recorded voiceprint information and stores the sample in the media library. Then, the terminal device associates the speaker model with the record of contact X1 in the address book.
For example, step S1 may also be realized as follows. When user Y takes his son X2 to the park, the terminal device opens the "record voiceprint sample" option in the address-book record of son X2 and records the voiceprint information of son X2. Then, after the recording stops, the terminal device generates a speaker model M2 from the recorded voiceprint information and stores the sample in the terminal memory. Then, the terminal device associates the speaker model with the file of contact X2 in the media library. Of course, it should be understood that "media library" is one expression for a stored set of multimedia files; it may also be expressed as a folder, file manager, media manager, video manager, audio manager, and so on. As shown in Fig. 5, when voiceprint information matching speaker models M1 and M2 is later encountered, the terminal device classifies and marks these video and audio files according to the specific objects (for example, "me" and "son"). After classified storage, information such as subject fields, folders, and media libraries of the corresponding categories can be generated.
Step S1 may also be realized through the following steps. Step S11: when a sound-producing target (for example, Zhang San) is selected in the address-book application, a "record voiceprint sample" option is provided on the display screen. Step S12: after the user clicks the option, the terminal device collects voiceprint information, and the speaker model generated from the voiceprint information is stored in the contact media library. Step S13: after the contact media library page is entered, the display screen shows the audio/video files that have been found. Therefore, collecting the voiceprint information of a sound-producing target includes: collecting voiceprint information when a certain sound-producing target is selected; and storing the collected voiceprint information.
Fig. 2 shows a schematic diagram of the terminal device interface before audio collection, according to an embodiment of the present invention. Fig. 3 shows a flow chart of audio collection according to an embodiment of the present invention. The audio collection flow comprises the following steps. Step 101: open the address book and select a particular contact in the phone directory. Step 102: through the "record voiceprint sample" option (as shown in Fig. 2), record the contact's voice (that is, collect the contact's voiceprint information). Step 103: after the recording is completed, model the contact's voice to generate a speaker model, and save the speaker model in the contact information. Therefore, collecting and storing voiceprint information includes: generating a speaker model according to the voiceprint information; and storing the speaker model in a local storage module.
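The enrollment flow above (steps 101-103) can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the "speaker model" here is just a two-number log-energy statistic standing in for a real voiceprint model, and the address book is a plain dictionary.

```python
import numpy as np

def extract_voiceprint(samples, frame=400):
    """Toy stand-in for voiceprint modeling: mean and spread of
    per-frame log energy. A real system would use e.g. MFCC + GMM."""
    n = len(samples) // frame * frame
    frames = np.asarray(samples[:n], dtype=float).reshape(-1, frame)
    energy = np.log(np.mean(frames ** 2, axis=1) + 1e-9)
    return np.array([energy.mean(), energy.std()])

def enroll_contact(address_book, name, samples):
    """Steps 101-103: select the contact, record a sample, model the
    voice, and save the model in the contact's record."""
    model = extract_voiceprint(samples)
    entry = address_book.setdefault(name, {})
    entry["voiceprint_sample"] = model   # saved in the contact information
    return model

address_book = {}
rng = np.random.default_rng(0)
voice = rng.normal(0.0, 0.5, 16000)      # 1 s of fake audio at 16 kHz
m = enroll_contact(address_book, "Zhang San", voice)
print(address_book["Zhang San"]["voiceprint_sample"].shape)  # (2,)
```

The point of the sketch is only the data flow: record, model, attach the model to the contact record.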
The modeling process according to an embodiment of the present invention is as follows. The technique of identifying a speaker's identity using voiceprint information may be called speaker recognition (Speaker Recognition, SR), and the corresponding model may be called a speaker model (Speaker Model, SM). A speaker recognition system is generally modeled with the UBM-GMM method: a universal background model (Universal Background Model, UBM) is first trained from a large amount of training audio (more than one speaker), and a specific speaker is then modeled on the basis of this UBM by an adaptation method, yielding the speaker model (SM). Both the universal background model and the speaker model are generally constructed with Gaussian mixture models (Gaussian Mixture Model, GMM).
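The UBM-GMM adaptation step can be illustrated with a small numpy sketch. This is a simplified, assumed implementation of classic MAP mean adaptation (only the component means are adapted, with an assumed relevance factor r = 16); the patent's actual scheme combines eigenvoice, CMLLR, and SMAP methods, which are not reproduced here.

```python
import numpy as np

def gmm_posteriors(x, weights, means, variances):
    """Responsibilities of each diagonal-covariance GMM component
    for every feature frame in x (shape: frames x dims)."""
    log_p = (np.log(weights)
             - 0.5 * np.sum(np.log(2 * np.pi * variances)
                            + (x[:, None, :] - means) ** 2 / variances,
                            axis=2))
    log_p -= log_p.max(axis=1, keepdims=True)      # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def map_adapt_means(x, weights, means, variances, r=16.0):
    """MAP-adapt only the UBM component means toward one speaker's
    features; components with more data move further from the UBM."""
    post = gmm_posteriors(x, weights, means, variances)    # (frames, K)
    n_k = post.sum(axis=0)                                 # soft counts
    ex_k = post.T @ x / np.maximum(n_k[:, None], 1e-9)     # per-comp. data means
    alpha = (n_k / (n_k + r))[:, None]                     # data/prior balance
    return alpha * ex_k + (1 - alpha) * means

# Toy 2-component, 2-dimensional UBM and 100 frames from one speaker.
ubm_w = np.ones(2) / 2
ubm_mu = np.array([[0.0, 0.0], [3.0, 3.0]])
ubm_var = np.ones((2, 2))
rng = np.random.default_rng(1)
frames = rng.normal([0.5, 0.5], 0.1, size=(100, 2))
adapted = map_adapt_means(frames, ubm_w, ubm_mu, ubm_var)
print(np.round(adapted, 2))  # component 0 moves toward the speaker; component 1 stays near the UBM
```

Components that receive little speaker data keep their background-model means, which is what makes the adapted model usable from only a 7-10 second sample.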
Fig. 4 shows a schematic diagram of the terminal device interface during audio collection, according to an embodiment of the present invention. For example, when the terminal device records a voiceprint sample from the address-book contact interface (as shown in Fig. 4), clicking the "add and record voiceprint sample" button records the contact's voice.
Further, as shown in Fig. 3, the voiceprint recognition flow comprises the following steps. Step 104: determine the audio/video file. Step 105: perform speaker segmentation on the speech in the audio/video file and generate n voice units, each containing the voice of only a single speaker. Step 106: perform contact voiceprint recognition on each segmented voice unit (that is, on the n voice units) and judge whether it matches. Step 107: if the recognition result is a match, the terminal device establishes a database of the correspondence between the contact and this audio/video file. Further, the correspondence database can record the audio/video files in which the contact's voice appears. Further, the correspondence database can also record the time points at which the contact's voice appears in the audio/video file; that is, the time points are mapped to the corresponding positions in the audio/video file.
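Steps 105-107 — taking the segmented voice units, matching each one against the contacts' speaker models, and recording the matches — can be sketched as follows. Everything here is illustrative: the features are 2-D vectors, the distance threshold is invented, and the file name and time points are made up.

```python
import numpy as np

def identify(unit_feature, speaker_models, threshold=1.0):
    """Match one voice unit against every contact's model; return the
    best-matching contact, or None when nothing is close enough."""
    best, best_d = None, threshold
    for name, model in speaker_models.items():
        d = float(np.linalg.norm(unit_feature - model))
        if d < best_d:
            best, best_d = name, d
    return best

def build_relation_db(filename, voice_units, speaker_models):
    """Steps 105-107: for each (start_second, feature) voice unit,
    record matching contacts together with file name and time point."""
    db = []   # rows of (contact, file, time point)
    for start, feat in voice_units:
        who = identify(feat, speaker_models)
        if who is not None:
            db.append((who, filename, start))
    return db

models = {"son": np.array([1.0, 0.0]), "me": np.array([0.0, 1.0])}
units = [(225, np.array([0.9, 0.1])),    # 3'45" — close to "son"
         (1103, np.array([0.1, 0.9])),   # 18'23" — close to "me"
         (2734, np.array([5.0, 5.0]))]   # unknown speaker, dropped
print(build_relation_db("childrens_day.mp4", units, models))
# → [('son', 'childrens_day.mp4', 225), ('me', 'childrens_day.mp4', 1103)]
```

Unmatched units are simply omitted, so the resulting rows are exactly the contact/file/time-point correspondences the description stores in the database.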
Fig. 6 shows a flow chart of viewing the contact media library through the terminal device, according to an embodiment of the present invention. The flow may comprise the following steps. Step 201: open the media library and select the "contact media library" menu. Step 202: start reading the contact and audio/video relationship database. Step 203: after reading is completed, display the contacts together with their corresponding media files and time points.
Fig. 5 shows a schematic diagram of the interface displayed by the terminal device after recorded video and audio files have been found, marked with the time points at which the voiceprint of a sound-producing target appears and/or ends in a file. For example, the media library is opened and the "contact media library" menu is selected, whereupon the interface for viewing the contact media library is presented to the user. The interface provides the items of information read from the contact and audio/video relationship database. Therefore, searching audio/video files according to voiceprint information includes: displaying the audio/video files when the local storage module is opened.
Further, as can be seen from the interface shown in Fig. 5, the media library of this embodiment contains two classes of media files, "son" and "me", wherein the "Children's Day" item in the "son" folder has three time points, namely 3'45", 18'23", and 45'34". These are exactly the time points at which the voice of "son" appears in the "Children's Day" item. For example, the user can select "3'45"", whereupon the terminal device automatically jumps to 3 minutes 45 seconds in the "Children's Day" item and starts playback. Therefore, storing the collected voiceprint information includes: storing by category according to the speaker models. Further, searching audio/video files according to the voiceprint information includes: displaying the audio/video files when the local storage module is opened. Further, the classification includes: displaying the audio/video files classified according to the speaker models. Further, the display includes: displaying the time points at which the sound-producing target appears in the audio/video file. Further, the classification includes: searching the audio/video files by category according to the kind of sound-producing target. Further, the time points include: when a time point in the classified display is selected, playing the audio/video of the sound-producing target contained in the audio/video file.
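The contact and audio/video relationship database behind this interface might look like the following sqlite3 sketch: one row per (contact, file, time point), queried when the "contact media library" page is opened. Table and column names are assumptions for illustration, not the patent's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE contact_media (
                    contact  TEXT,
                    file     TEXT,
                    time_sec INTEGER)""")
rows = [("son", "Children's Day", 3 * 60 + 45),   # 3'45"
        ("son", "Children's Day", 18 * 60 + 23),  # 18'23"
        ("son", "Children's Day", 45 * 60 + 34),  # 45'34"
        ("me", "Meeting notes", 62)]
conn.executemany("INSERT INTO contact_media VALUES (?, ?, ?)", rows)

def media_for(contact):
    """What the contact media library page reads: files and time
    points for one contact, ready for jump-to-time playback."""
    cur = conn.execute(
        "SELECT file, time_sec FROM contact_media "
        "WHERE contact = ? ORDER BY file, time_sec", (contact,))
    return cur.fetchall()

for f, t in media_for("son"):
    print(f"{f}: {t // 60}'{t % 60:02d}\"")
# prints the three "Children's Day" time points 3'45", 18'23", 45'34"
```

Selecting one row gives exactly the (file, time point) pair needed to jump playback to where the contact speaks.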
As shown in Figs. 1 to 6, according to another embodiment of the present invention, when the terminal device classifies audio/video files according to particular contacts, the voiceprints of the user's key contacts must first be modeled and stored in the address book module. The present invention adds a "voiceprint sample" field to each contact record, for storing the contact's voiceprint sample in the terminal device's address book module. The concrete operation is as follows. The user creates or edits an important contact of concern (for example, "child"). Then, a segment of audio of that particular contact ("child") is recorded (for example, normal speech with a length of 7-10 seconds). The terminal device models the voiceprint of the particular contact ("child") according to the sound sample, and saves it in the voiceprint sample field of that contact's address-book record. Then, the user records and saves audio/video files on the terminal device. The present invention can perform voiceprint analysis of important contacts, classify the files according to contact, and mark the time points at which each contact's voice occurs. Using speaker segmentation techniques, the voices of all speakers recorded in an audio/video file are extracted and divided into a plurality of voice units, each containing the voice of only one speaker. Voiceprint recognition is then performed on each voice unit using the speaker models. After voiceprint recognition, a database of contact and audio/video relationships is stored, which records the correspondence between contacts and audio/video files, and the time points at which each contact's voice occurs in each audio/video file.
The voiceprint mentioned in the present invention refers to the sound-wave spectrum of a user's voice, which is a biological characteristic of that voice. By comparing voiceprints, the mobile terminal can find the corresponding targets in the stored multimedia. Therefore, when the sound-producing target is a certain contact in the contact application, the method of collecting the voiceprint information of the sound-producing target includes: when talking with the contact on the phone, recording a segment of the contact's voice that is 7-10 seconds long and contains only that contact's voice, and using this segment to extract the voiceprint information and generate a voiceprint template. Further, when the sound-producing target is a certain contact in the contact application, collecting the voiceprint information of the sound-producing target includes: recording the contact's voiceprint information while talking with the contact. Further, when the sound-producing target is a certain contact in the contact application, collecting the voiceprint information of the sound-producing target includes: the user manually recording the contact's voice to capture the contact's voiceprint information. Further, when the sound-producing target is a certain contact in the contact application, searching the audio/video files includes: when the contact is selected, playing the audio/video mapped to that contact.
Fig. 7 shows a flow chart of recording a contact's voice according to an embodiment of the present invention. The flow includes the following steps. Step 301: open a certain contact in the address book. Step 302: judge whether this is the first recording.
If the judgment result is that it is the first recording, proceed to step 303: start recording. Step 304: save the audio after the recording is completed. Step 305: perform voiceprint modeling on the audio. Step 306: save the voiceprint modeling information. Step 307: recognize the existing audio/video files with this voiceprint information. Step 308: save the recognized files and time points into the contact and audio/video relationship database. Finally, step 309: the voiceprint recording job ends.
If the judgment result is that it is not the first recording, proceed to step 310: prompt the user and determine whether to record again. If re-recording is needed, proceed to step 311: delete the original recording file, then proceed to step 303, and perform steps 303 to 309 in sequence. If re-recording is not needed, no recording is performed and the process ends (step 309).
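The branching in Fig. 7 — keep the existing sample unless the user chooses to re-record, in which case the old recording is deleted first (step 311) — can be sketched as one small function. The file name and the capture callback are hypothetical, standing in for the device recorder.

```python
import os
import tempfile

def record_voiceprint(contact_dir, capture, re_record=False):
    """Steps 302-311: first recording writes a new sample; a later call
    keeps the old sample unless re_record is requested, in which case
    the original recording file is deleted before recording again."""
    path = os.path.join(contact_dir, "voiceprint_sample.wav")
    if os.path.exists(path):          # step 302: not the first recording
        if not re_record:
            return path               # keep the existing sample, end
        os.remove(path)               # step 311: delete original file
    with open(path, "wb") as f:       # steps 303-304: record and save
        f.write(capture())
    return path

d = tempfile.mkdtemp()
p1 = record_voiceprint(d, lambda: b"take-1")                  # first recording
p2 = record_voiceprint(d, lambda: b"take-2")                  # kept: no re-record
p3 = record_voiceprint(d, lambda: b"take-3", re_record=True)  # replaces take-1
print(open(p1, "rb").read())  # b'take-3'
```

In a real device the capture callback would invoke the recorder and steps 305-308 (modeling and database update) would follow each successful recording.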
According to another embodiment of the present invention, a method of classifying and marking video and audio on a terminal device based on voiceprint recognition technology comprises the following steps. The contact's voice is recorded in advance to obtain voiceprint information. Then, speaker segmentation is performed on the audio/video file, dividing it into a plurality of voice units, each containing the voice of only one speaker, and voiceprint recognition is performed on these voice units one by one. Then, the recognition results are saved into the contact and audio/video relationship database. When the contact media library is entered, or when the user performs a "classify by contact" or "search by contact" operation in any media library or file manager of the terminal device, or when the audio and video related to a contact are viewed directly in the contact application, the contact and audio/video relationship database is read and their relationships are displayed. The present invention can display the relationship between contacts and audio/video not only as a menu item in the media library, but also as menus in the contact or file manager.
Further, according to another embodiment of the present invention, in applications such as the terminal device's media library, contact manager, and file manager, "classify by contact" or "search by contact" can be selected to display and search audio and video by category. Further, according to another embodiment of the present invention, the audio/video related to a contact can be viewed directly in the contact application.
Therefore, the method for operating on audio/video files based on voiceprint information provided by the present invention can classify audio/video files according to the voiceprint information of a particular contact. When the user wants to find the audio/video files containing a particular contact, the files need not be played back and checked one by one; instead, the selection is made directly through the information displayed by the media library, contact manager, or file manager, which makes it convenient for the user to find the files containing a specific person's voice or image. Furthermore, the method can jump directly to the time node at which a certain contact speaks in the audio/video and start playback, thereby improving the user's search efficiency.
As shown in Figure 8, the overall scheme of the present invention uses voiceprint technology to identify a speaker's identity; this technology is called speaker recognition (Speaker Recognition, SR), and the corresponding model is called a speaker model (Speaker Model, SM). A speaker recognition system is generally built with the UBM-GMM method: a universal background model (Universal Background Model, UBM) is first trained from a large amount of training audio (from more than one speaker), and a specific speaker is then modeled on top of this UBM by an adaptation method, yielding the speaker model (SM). Both the universal background model and the speaker model are usually constructed as Gaussian mixture models (Gaussian Mixture Model, GMM). As shown in Figure 8, the method provided by the present invention for operating on audio/video files based on voiceprint information may include a modeling process and a recognition process. The modeling process may comprise the following steps. Step 1: obtain training audio; Step 2: silence detection; Step 3: voice segmentation; Step 4: feature extraction; Step 5: adaptation based on the universal background model; Step 6: generate the speaker model; Step 7: Z-norm processing based on impostor audio; Step 8: output the normalized speaker model. The recognition process may comprise the following steps. Step 1: detect the audio to be recognized; Step 2: silence detection; Step 3: voice segmentation; Step 4: feature extraction; Step 5: score calculation based on the normalized speaker model; Step 6: T-norm processing based on impostor audio; Step 7: decision; Step 8: output the recognition result. Here, the normalized speaker model and the impostor models together constitute the speaker model.
According to one embodiment of the present invention, the modeling process of the speaker model can generally be described in the following stages. 1. Feature extraction stage: using voice activity detection (Voice Activity Detection, VAD), effective speech is detected in the input audio, and the input audio is segmented into several speech segments according to the length of the silences between them; the speech features required for speaker recognition are then extracted from each segment. 2. UBM modeling stage: the universal background model (UBM) is computed from a large number of speech features extracted from the training audio. 3. SM modeling stage: using the universal background model and a small amount of speech features of the specific speaker, the model (SM) of that speaker is computed by an adaptation method. 4. SM normalization stage: to strengthen the anti-interference capability of the speaker model, after the speaker model has been built, the speech features of several impostor speakers are often used to perform a normalization operation on it, finally yielding the normalized speaker model (Normalized SM).
According to one embodiment of the present invention, the recognition process of speaker recognition can generally be described in the following stages. 1. Feature extraction stage: identical to the feature extraction stage of the modeling process. 2. Score calculation stage: the score of the input speech features is calculated using the speaker model. 3. Score normalization stage: the score obtained in the previous step is normalized using the normalized speaker model, and the final decision is made.
Furthermore, in the modeling and recognition processes described above, some steps can be implemented in different ways. 1. Voice activity detection in the feature extraction stage: the method used in this application first distinguishes silence from non-silence using the energy and fundamental-frequency information of the input audio, and then uses a support vector machine (Support Vector Machine, SVM) model to distinguish speech from non-speech within the non-silent portions. Once the speech portions have been determined, the input audio can be divided into several speech segments according to the length of the gaps between them. 2. Adaptation method for computing the speaker model from the universal background model: this application combines the eigenvoice (Eigenvoice) method, the constrained maximum likelihood linear regression (Constrained Maximum Likelihood Linear Regression, CMLLR) method, and the structured maximum a posteriori (Structured Maximum A Posterior, SMAP) method. 3. Speaker model normalization: this application uses the Z-Norm method. 4. Score normalization: this application uses the T-Norm method. The combination of Z-Norm and T-Norm is currently the most popular normalization approach in speaker recognition technology; the former is used in the modeling stage and the latter in the recognition stage.
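The Z-Norm and T-Norm steps described above can be sketched as follows. This is an illustrative sketch only, not the patent's implementation: the scoring functions are stand-ins for real GMM/UBM log-likelihood scorers, and the impostor utterances are synthetic placeholders.

```python
import numpy as np

def znorm_params(speaker_score_fn, impostor_utts):
    """Z-Norm (modeling stage): score a set of impostor utterances
    against the speaker model and keep the mean/std of those scores."""
    scores = np.array([speaker_score_fn(u) for u in impostor_utts])
    return scores.mean(), scores.std()

def znorm(raw_score, mu, sigma):
    # Normalize a test score with the speaker-dependent statistics.
    return (raw_score - mu) / sigma

def tnorm(raw_score, test_utt, impostor_score_fns):
    """T-Norm (recognition stage): score the *test* utterance against a
    cohort of impostor models and normalize with the cohort mean/std."""
    cohort = np.array([fn(test_utt) for fn in impostor_score_fns])
    return (raw_score - cohort.mean()) / cohort.std()

# Toy demonstration with stand-in scoring functions and synthetic audio.
rng = np.random.default_rng(0)
impostors = [rng.normal(size=20) for _ in range(5)]
speaker_score = lambda u: float(u.mean())          # placeholder scorer
mu, sigma = znorm_params(speaker_score, impostors)
test = rng.normal(loc=1.0, size=20)
z = znorm(speaker_score(test), mu, sigma)
imp_fns = [lambda u, c=c: float(u.mean() - c) for c in (0.5, 1.5, 2.5)]
t = tnorm(speaker_score(test), test, imp_fns)
```

In a real system, `speaker_score_fn` would be the log-likelihood of the utterance's features under the adapted speaker GMM, and the normalized Z-Norm/T-Norm score would be compared with a decision threshold.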
As shown in Figure 9, another object of the present invention is to provide a terminal device, including: a voiceprint extraction module for collecting the voiceprint information of an audible target; and an execution module for searching audio/video files according to the voiceprint information.
Further, the voiceprint extraction module includes: a voiceprint collection unit for collecting voiceprint information when a certain audible target is selected; and a voiceprint sample generation unit for generating a speaker model according to the voiceprint information.
Further, the device also includes: a storage module for storing the collected voiceprint information.
Further, the storage module is also used to: store the generated voiceprint sample (speaker model).
Further, the voiceprint extraction module includes: a target classification unit that performs classified storage according to the speaker model.
Further, the device also includes: a display that shows the audio/video files when the local storage module is opened.
Further, the display is used to: display the audio/video files by category according to the classification of the audible targets made by the target classification unit.
Further, the display is used to: display the time points at which the audible target appears in the audio/video files.
Further, the target classification unit is also used to: search the audio/video files by category according to the type of the audible target.
Further, the execution module is also used to: when a time point in the classified display is selected, play the audio/video of the audible target contained in the audio/video file.
Further, when the audible target is a contact in a contacts application, the voiceprint extraction module is used to: record the voiceprint information of the contact during a call with the contact.
Further, when the audible target is a contact in a contacts application, the voiceprint extraction module is used to: record the voiceprint information of the contact from a voice recording of the contact made manually by the user.
Further, when the audible target is a contact in a contacts application, the execution module is also used to: play the audio/video mapped to the contact when the contact is selected.
The method and device provided by the present invention can quickly locate files in which a specific person's voice or video has been recorded, thereby improving the user's search efficiency.
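As a rough illustration of the search operation described above — and not the patent's actual implementation — the following sketch scores pre-segmented voice units of each file against a target speaker model and a background model, and returns the time points of the units attributed to the target. The diagonal-Gaussian scorer, the file layout, and the threshold are all hypothetical stand-ins for the GMM-UBM machinery described earlier.

```python
import numpy as np

def gaussian_loglik(x, mean, var):
    """Diagonal-Gaussian log-likelihood, a stand-in for the GMM/UBM
    scoring used in real speaker-recognition systems."""
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

def search_files(files, speaker_mean, speaker_var, ubm_mean, ubm_var, threshold=0.0):
    """For each file, return the (start, end) time points of voice units
    whose likelihood ratio favors the target speaker over the background.
    `files` maps a file name to a list of (start_sec, end_sec, features)
    voice units, each assumed to contain a single speaker's voice."""
    hits = {}
    for name, units in files.items():
        times = [(s, e) for s, e, feats in units
                 if gaussian_loglik(feats, speaker_mean, speaker_var)
                  - gaussian_loglik(feats, ubm_mean, ubm_var) > threshold]
        if times:
            hits[name] = times
    return hits

# Toy demonstration: one file with one matching and one non-matching unit.
spk_mean, spk_var = np.zeros(3), np.ones(3)
ubm_mean, ubm_var = np.full(3, 5.0), np.ones(3)
files = {"memo.wav": [(0.0, 2.5, np.zeros(3)),      # near the speaker model
                      (2.5, 4.0, np.full(3, 5.0))]} # near the background model
print(search_files(files, spk_mean, spk_var, ubm_mean, ubm_var))
# → {'memo.wav': [(0.0, 2.5)]}
```

The returned time points correspond to the positions displayed by the terminal device, from which playback of the matched audio/video can begin.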
Those skilled in the art will appreciate that the present invention may involve devices for performing one or more of the operations described herein. The devices may be specially designed and manufactured for the required purposes, or may comprise known devices within a general-purpose computer that is selectively activated or reconfigured by a program stored in it. Such a computer program may be stored in a device-readable (e.g., computer-readable) storage medium, or in any type of medium suitable for storing electronic instructions and respectively coupled to a bus, the computer-readable medium including but not limited to any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), random access memory (RAM), read-only memory (ROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, magnetic cards, or optical cards. A readable medium includes any mechanism for storing or transmitting information in a form readable by a device (e.g., a computer). For example, readable media include random access memory (RAM), read-only memory (ROM), magnetic disk storage media, optical storage media, flash memory devices, and signals propagated in electrical, optical, acoustic, or other forms (such as carrier waves, infrared signals, and digital signals).
Those skilled in the art will appreciate that the present invention has been described with reference to structure diagrams and/or block diagrams and/or flow diagrams of methods, systems, and computer program products according to embodiments of the invention. It should be understood that each block of these structure diagrams and/or block diagrams and/or flow diagrams, and combinations of blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus create means for implementing the functions specified in one or more blocks of the structure diagrams and/or block diagrams and/or flow diagrams.
Those skilled in the art will appreciate that the steps, measures, and schemes in the various operations, methods, and flows discussed in the present invention can be replaced, changed, combined, or deleted. Furthermore, other steps, measures, and schemes in the various operations, methods, and flows discussed in the present invention can also be replaced, changed, rearranged, decomposed, combined, or deleted. Furthermore, prior-art steps, measures, and schemes in the various operations, methods, and flows disclosed in the present invention can also be replaced, changed, rearranged, decomposed, combined, or deleted.
Exemplary embodiments of the invention are disclosed in the drawings and the description. Although specific terms are used, they are used in a generic and descriptive sense only, and not for purposes of limitation. It should be pointed out that, for those of ordinary skill in the art, improvements and modifications can be made without departing from the principles of the invention, and such improvements and modifications should also be regarded as falling within the protection scope of the invention. The protection scope of the invention is defined by the claims.

Claims (23)

1. A method of operating on audio/video files based on voiceprint information, characterized by comprising the following steps:
collecting the voiceprint information of an audible target; and
searching audio/video files according to the voiceprint information, a terminal device displaying the time points at which the labeled voiceprint information of the audible target appears and/or ends in a file;
wherein all sounds recorded in the audio/video file are divided into multiple voice units, each voice unit containing the voice of only one audible target, and the time points of the audible target in the audio/video file are recorded, the positions at which the audio/video appears in the corresponding file being mapped through the time points.
2. The method according to claim 1, characterized in that collecting the voiceprint information of the audible target includes:
collecting voiceprint information when a certain audible target is selected; and
storing the collected voiceprint information.
3. The method according to claim 2, characterized in that collecting and storing the voiceprint information include:
generating a speaker model according to the voiceprint information; and
storing the speaker model in a local storage module.
4. The method according to claim 3, characterized in that storing the collected voiceprint information includes:
performing classified storage according to the speaker model.
5. The method according to claim 3, characterized in that searching audio/video files according to the voiceprint information includes:
displaying the audio/video files when the local storage module is opened.
6. The method according to claim 4, characterized in that the classification includes:
displaying the audio/video files by category according to the speaker model.
7. The method according to claim 6, characterized in that the classification includes:
searching the audio/video files by category according to the type of the audible target.
8. The method according to claim 6, characterized in that the time point includes:
when a time point in the classified display is selected, playing, from that time point, the audio/video of the audible target contained in the audio/video file.
9. The method according to claim 1, characterized in that, when the audible target is a contact in a contacts application, collecting the voiceprint information of the audible target includes:
recording the voiceprint information of the contact during a call with the contact.
10. The method according to claim 1, characterized in that, when the audible target is a contact in a contacts application, collecting the voiceprint information of the audible target includes:
recording the voiceprint information of the contact from a voice recording of the contact made manually by the user.
11. The method according to claim 1, characterized in that, when the audible target is a contact in a contacts application, searching the audio/video files includes:
playing the audio/video mapped to the contact when the contact is selected.
12. A terminal device, characterized by including:
a voiceprint extraction module for collecting the voiceprint information of an audible target;
an execution module for searching audio/video files according to the voiceprint information; and
a display for displaying the time points at which the labeled voiceprint information of the audible target appears and/or ends in a file;
wherein all sounds recorded in the audio/video file are divided into multiple voice units, each voice unit containing the voice of only one audible target, and the time points of the audible target in the audio/video file are recorded, the positions at which the audio/video appears in the corresponding file being mapped through the time points.
13. The terminal device according to claim 12, characterized in that the voiceprint extraction module includes:
a voiceprint collection unit for collecting voiceprint information when a certain audible target is selected; and
a voiceprint sample generation unit for generating a speaker model according to the voiceprint information.
14. The terminal device according to claim 13, characterized by also including:
a storage module for storing the collected voiceprint information.
15. The terminal device according to claim 14, characterized in that the storage module is also used to: store the speaker model.
16. The terminal device according to claim 13 or 15, characterized in that the voiceprint extraction module includes:
a target classification unit that performs classified storage according to the speaker model.
17. The terminal device according to claim 14, characterized in that the display shows the audio/video files when the local storage module is opened.
18. The terminal device according to claim 16, characterized in that the display is used to:
display the audio/video files by category according to the classification of the audible targets made by the target classification unit.
19. The terminal device according to claim 18, characterized in that the target classification unit is also used to:
search the audio/video files by category according to the type of the audible target.
20. The terminal device according to claim 18, characterized in that the execution module is also used to:
when a time point in the classified display is selected, play, from that time point, the audio/video of the audible target contained in the audio/video file.
21. The terminal device according to claim 12, characterized in that, when the audible target is a contact in a contacts application, the voiceprint extraction module is used to:
record the voiceprint information of the contact during a call with the contact.
22. The terminal device according to claim 12, characterized in that, when the audible target is a contact in a contacts application, the voiceprint extraction module is used to:
record the voiceprint information of the contact from a voice recording of the contact made manually by the user.
23. The terminal device according to claim 12, characterized in that, when the audible target is a contact in a contacts application, the execution module is also used to:
play the audio/video mapped to the contact when the contact is selected.
CN201210518118.4A 2012-12-05 2012-12-05 Method and device for operating audio/video file based on voiceprint information Active CN103035247B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710439537.1A CN107274916B (en) 2012-12-05 2012-12-05 Method and device for operating audio/video file based on voiceprint information
CN201210518118.4A CN103035247B (en) 2012-12-05 2012-12-05 Method and device for operating audio/video file based on voiceprint information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210518118.4A CN103035247B (en) 2012-12-05 2012-12-05 Method and device for operating audio/video file based on voiceprint information

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201710439537.1A Division CN107274916B (en) 2012-12-05 2012-12-05 Method and device for operating audio/video file based on voiceprint information

Publications (2)

Publication Number Publication Date
CN103035247A CN103035247A (en) 2013-04-10
CN103035247B true CN103035247B (en) 2017-07-07

Family

ID=48022078

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201710439537.1A Active CN107274916B (en) 2012-12-05 2012-12-05 Method and device for operating audio/video file based on voiceprint information
CN201210518118.4A Active CN103035247B (en) 2012-12-05 2012-12-05 Method and device for operating audio/video file based on voiceprint information

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201710439537.1A Active CN107274916B (en) 2012-12-05 2012-12-05 Method and device for operating audio/video file based on voiceprint information

Country Status (1)

Country Link
CN (2) CN107274916B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117665A (en) * 2013-08-14 2019-01-01 华为终端(东莞)有限公司 Realize method for secret protection and device
CN104123115B (en) * 2014-07-28 2017-05-24 联想(北京)有限公司 Audio information processing method and electronic device
CN104243934A (en) * 2014-09-30 2014-12-24 智慧城市信息技术有限公司 Method and device for acquiring surveillance video and method and device for retrieving surveillance video
TWI571120B (en) * 2014-10-06 2017-02-11 財團法人資訊工業策進會 Video capture system and video capture method thereof
CN104268279B (en) * 2014-10-16 2018-04-20 魔方天空科技(北京)有限公司 The querying method and device of corpus data
CN105828179A (en) * 2015-06-24 2016-08-03 维沃移动通信有限公司 Video positioning method and device
CN105022263B (en) * 2015-07-28 2018-03-27 广东欧珀移动通信有限公司 A kind of method and intelligent watch for controlling intelligent watch
CN106548793A (en) * 2015-09-16 2017-03-29 中兴通讯股份有限公司 Storage and the method and apparatus for playing audio file
CN105635452B (en) * 2015-12-28 2019-05-10 努比亚技术有限公司 Mobile terminal and its identification of contacts method
CN105654942A (en) * 2016-01-04 2016-06-08 北京时代瑞朗科技有限公司 Speech synthesis method of interrogative sentence and exclamatory sentence based on statistical parameter
CN106095764A (en) * 2016-03-31 2016-11-09 乐视控股(北京)有限公司 A kind of dynamic picture processing method and system
CN106448683A (en) * 2016-09-30 2017-02-22 珠海市魅族科技有限公司 Method and device for viewing recording in multimedia files
CN107452408B (en) * 2017-07-27 2020-09-25 成都声玩文化传播有限公司 Audio playing method and device
CN108305636B (en) 2017-11-06 2019-11-15 腾讯科技(深圳)有限公司 A kind of audio file processing method and processing device
CN108074574A (en) * 2017-11-29 2018-05-25 维沃移动通信有限公司 Audio-frequency processing method, device and mobile terminal
CN108364663A (en) * 2018-01-02 2018-08-03 山东浪潮商用系统有限公司 A kind of method and module of automatic recording voice
CN108364654B (en) * 2018-01-30 2020-10-13 网易乐得科技有限公司 Voice processing method, medium, device and computing equipment
CN108319371A (en) * 2018-02-11 2018-07-24 广东欧珀移动通信有限公司 Play control method and related product
CN108920619A (en) * 2018-06-28 2018-11-30 Oppo广东移动通信有限公司 File display method and device, storage medium and electronic equipment
CN109446356A (en) * 2018-09-21 2019-03-08 深圳市九洲电器有限公司 A kind of multimedia document retrieval method and device
CN111091844A (en) * 2018-10-23 2020-05-01 北京嘀嘀无限科技发展有限公司 Video processing method and system
CN111462761A (en) * 2020-03-03 2020-07-28 深圳壹账通智能科技有限公司 Voiceprint data generation method and device, computer device and storage medium
CN111883139A (en) * 2020-07-24 2020-11-03 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for screening target speech
CN112153461B (en) * 2020-09-25 2022-11-18 北京百度网讯科技有限公司 Method and device for positioning sound production object, electronic equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1156871A (en) * 1995-11-17 1997-08-13 雅马哈株式会社 Personal information database system
CN1307589C (en) * 2001-04-17 2007-03-28 皇家菲利浦电子有限公司 Method and apparatus of managing information about a person
CN102238189A (en) * 2011-08-01 2011-11-09 安徽科大讯飞信息科技股份有限公司 Voiceprint password authentication method and system
CN102404278A (en) * 2010-09-08 2012-04-04 盛乐信息技术(上海)有限公司 Song requesting system based on voiceprint recognition and application method thereof

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6345252B1 (en) * 1999-04-09 2002-02-05 International Business Machines Corporation Methods and apparatus for retrieving audio information using content and speaker information
US8606579B2 (en) * 2010-05-24 2013-12-10 Microsoft Corporation Voice print identification for identifying speakers
CN102347060A (en) * 2010-08-04 2012-02-08 鸿富锦精密工业(深圳)有限公司 Electronic recording device and method
CN102655002B (en) * 2011-03-01 2013-11-27 株式会社理光 Audio processing method and audio processing equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1156871A (en) * 1995-11-17 1997-08-13 雅马哈株式会社 Personal information database system
CN1307589C (en) * 2001-04-17 2007-03-28 皇家菲利浦电子有限公司 Method and apparatus of managing information about a person
CN102404278A (en) * 2010-09-08 2012-04-04 盛乐信息技术(上海)有限公司 Song requesting system based on voiceprint recognition and application method thereof
CN102238189A (en) * 2011-08-01 2011-11-09 安徽科大讯飞信息科技股份有限公司 Voiceprint password authentication method and system

Also Published As

Publication number Publication date
CN107274916B (en) 2021-08-20
CN103035247A (en) 2013-04-10
CN107274916A (en) 2017-10-20

Similar Documents

Publication Publication Date Title
CN103035247B (en) Method and device for operating audio/video file based on voiceprint information
US10977299B2 (en) Systems and methods for consolidating recorded content
WO2020211354A1 (en) Speaker identity recognition method and device based on speech content, and storage medium
US20160283185A1 (en) Semi-supervised speaker diarization
US9058384B2 (en) System and method for identification of highly-variable vocalizations
CN105845129A (en) Method and system for dividing sentences in audio and automatic caption generation method and system for video files
CN103530432A (en) Conference recorder with speech extracting function and speech extracting method
JP2007519987A (en) Integrated analysis system and method for internal and external audiovisual data
CN106531159B (en) A mobile phone source identification method based on the spectral characteristics of equipment noise floor
Khan et al. A novel audio forensic data-set for digital multimedia forensics
CN113903363B (en) Violation behavior detection method, device, equipment and medium based on artificial intelligence
CN108831456B (en) Method, device and system for marking video through voice recognition
CN110136726A (en) A kind of estimation method, device, system and the storage medium of voice gender
CN107507626A (en) A kind of mobile phone source title method based on voice spectrum fusion feature
CN106302987A (en) A kind of audio frequency recommends method and apparatus
CN109635151A (en) Establish the method, apparatus and computer equipment of audio retrieval index
Pandey et al. Cell-phone identification from audio recordings using PSD of speech-free regions
CN109817223A (en) Phoneme marking method and device based on audio fingerprints
Baskoro et al. Analysis of voice changes in anti forensic activities case study: voice changer with telephone effect
Hajihashemi et al. Novel time-frequency based scheme for detecting sound events from sound background in audio segments
US9430800B2 (en) Method and apparatus for trade interaction chain reconstruction
CN118284932A (en) Method and apparatus for performing speaker segmentation clustering on mixed bandwidth speech signals
Li et al. BlackFeather: A framework for background noise forensics
Cornaggia-Urrigshardt et al. SCALA-Speech: An interactive system for finding and analyzing speech content in audio data
Cano et al. Robust sound modelling for song identification in broadcast audio

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant