
CN113762110A - Law enforcement instant evidence fixing method and law enforcement instrument - Google Patents


Info

Publication number
CN113762110A
CN113762110A (application CN202110974278.9A; granted as CN113762110B)
Authority
CN
China
Prior art keywords
law enforcement
voiceprint
image
network model
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110974278.9A
Other languages
Chinese (zh)
Other versions
CN113762110B (en)
Inventor
肖炯恩
丁丽萍
黄昭颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Zhongke Shishu Technology Co ltd
Original Assignee
Guangzhou Institute of Software Application Technology Guangzhou GZIS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Institute of Software Application Technology Guangzhou GZIS filed Critical Guangzhou Institute of Software Application Technology Guangzhou GZIS
Priority to CN202110974278.9A priority Critical patent/CN113762110B/en
Publication of CN113762110A publication Critical patent/CN113762110A/en
Application granted granted Critical
Publication of CN113762110B publication Critical patent/CN113762110B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/18: Legal services
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/26: Government or public services
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/04: Training, enrolment or model building

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Economics (AREA)
  • Multimedia (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Technology Law (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract



The invention discloses a law enforcement instant evidence fixing method and a law enforcement instrument. The method includes: collecting images or audio/video files of a law enforcement scene; recognizing faces in the images or audio/video files with VGG-16 convolutional neural network face recognition technology to generate an image hash value; recognizing voiceprints in the images or audio/video files with FBN-Alexnet small-sample voiceprint recognition technology to generate a voiceprint hash value; and automatically uploading five elements, namely the file hash value automatically computed for the image or audio/video file, the trusted time of the time service center, geographical location information, law enforcement equipment information and law enforcement personnel information, over the network to an evidence storage cloud on which a blockchain is deployed, while saving a two-dimensional code linking to the evidence-fixing report generated by the evidence storage cloud. The file hash value includes the image hash value and the voiceprint hash value. By combining the analysis of pictures and voiceprints, the invention ensures the reliability and uniqueness of the evidence recorded by the law enforcement instrument.


Description

Law enforcement instant evidence fixing method and law enforcement instrument
Technical Field
The invention relates to the technical field of intelligent law enforcement, in particular to a law enforcement instant evidence fixing method and a law enforcement instrument.
Background
At present, the law enforcement instruments used by front-line law enforcement departments offer only basic recording functions such as photographing, video recording and audio recording, and the electronic files they record still fall short of electronic evidence that meets legal requirements. During litigation, a law enforcement unit may be required to prove that a submitted video is the original recording and has not been tampered with. A traditional law enforcement instrument cannot provide a reliable basis for establishing the real time and place of recording or a hash check value; such facts can sometimes be established only through complex technical means such as video-editing analysis and metadata analysis, and sometimes cannot be established at all, leaving law enforcement units in an awkward position during courtroom cross-examination. It is therefore necessary to improve the conventional law enforcement instrument so that, while recording, it can intelligently obtain information closely tied to the law enforcement scene, such as the hash value of the recorded electronic file, the trusted time of a time service center, geographical location information, law enforcement equipment information and law enforcement personnel information, and use this information to fix the evidence immediately after recording completes, thereby bridging the gap from an ordinary electronic file to electronic evidence that meets legal requirements.
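The role of the hash check value described above can be illustrated with a minimal sketch (an editorial example, not part of the patent; the text does not prescribe a hash algorithm, so SHA-256 is assumed here):

```python
import hashlib

def file_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of a recorded file's bytes."""
    return hashlib.sha256(data).hexdigest()

# Digest computed and fixed at recording time.
original = b"frame-0001 frame-0002 frame-0003"
fixed_hash = file_hash(original)

# Any later edit, however small, changes the digest, so the
# original recording can be shown to be untampered.
tampered = b"frame-0001 frame-9999 frame-0003"
assert file_hash(original) == fixed_hash
assert file_hash(tampered) != fixed_hash
```

Because the digest is fixed at recording time (and anchored to trusted time and location, as described below), it replaces after-the-fact forensic analysis with a simple recomputation.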
Disclosure of Invention
In view of the above, in order to solve the above problems in the prior art, the present invention provides a law enforcement instant evidence fixing method and a law enforcement instrument, which combine the analysis of pictures and voiceprints to ensure the evidence reliability and uniqueness of the law enforcement instrument.
The invention solves the problems through the following technical means:
in one aspect, the invention provides a law enforcement instant evidence fixing method, which comprises the following steps:
collecting images or audio/video files of a law enforcement scene in the law enforcement process;
adopting a face recognition technology of a VGG-16 convolutional neural network to recognize the face in the image or the audio/video file and generate an image hash value;
identifying the voiceprint in the image or audio/video file by adopting an FBN-Alexnet network small sample voiceprint identification technology to generate a voiceprint hash value;
automatically uploading five elements of a file hash value, trusted time of a time service center, geographical position information, law enforcement equipment information and law enforcement personnel information which are generated by automatically calculating the image or audio/video file to an evidence storing cloud with a block chain through a network, and simultaneously storing a two-dimensional code which is linked with an evidence storing cloud evidence fixing report; the file hash value comprises an image hash value and a voice print hash value.
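The five-element upload step can be sketched as follows; the field names and the idea of hashing the canonical JSON form of the record are illustrative assumptions, not the patent's actual schema or blockchain interface:

```python
import hashlib
import json

def build_evidence_record(file_bytes: bytes, trusted_time: str,
                          location: str, device_info: str,
                          officer_info: str) -> dict:
    """Bundle the five elements into one record (illustrative field names)."""
    record = {
        "file_hash": hashlib.sha256(file_bytes).hexdigest(),
        "trusted_time": trusted_time,
        "location": location,
        "device_info": device_info,
        "officer_info": officer_info,
    }
    # A digest of the canonical JSON form could serve as the value
    # anchored on the blockchain-backed evidence storage cloud.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["record_hash"] = hashlib.sha256(payload).hexdigest()
    return record

rec = build_evidence_record(b"audio/video bytes", "2021-08-24T10:00:00Z",
                            "23.13N,113.26E", "device-001", "officer-42")
```

The two-dimensional code saved on the device would then encode a link to the evidence-fixing report for this record.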
Preferably, the recognizing the face in the image or the audio/video file by using the face recognition technology of the VGG-16 convolutional neural network specifically comprises:
constructing a VGG-16 convolutional neural network model, and training the VGG-16 convolutional neural network model;
acquiring a face picture in an image or audio/video file;
preprocessing the face picture, including face detection and face alignment correction;
extracting human face features by adopting a trained VGG-16 convolutional neural network model, and screening the features through three full-connection layers after feature extraction, nonlinear mapping and feature dimension reduction of 5 units of the VGG-16 convolutional neural network model, so as to further reduce feature dimensions; and finally, recognizing the human face by adopting a classifier.
Preferably, the AdaBoost + Haar feature method is adopted for face detection, and affine transformation is adopted for face alignment correction.
Preferably, the building of the VGG-16 convolutional neural network model and the training of the VGG-16 convolutional neural network model comprises:
acquiring a certain number of human face pictures, and preprocessing the human face pictures to be used as a training data set of a VGG-16 convolutional neural network model; preprocessing comprises data enhancement, face detection alignment cutting, data format conversion and picture mean calculation;
and constructing a VGG-16 convolutional neural network model, and training the VGG-16 convolutional neural network model by adopting a training data set, wherein the training comprises network layer modification, network parameter modification and network model training.
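For orientation, the dimension flow through the 5 units and three fully connected layers of a standard VGG-16 can be traced with a few lines of arithmetic (standard VGG-16 figures; the patent's modified network layers and parameters may differ):

```python
# Standard VGG-16: 13 convolutional layers grouped into 5 units,
# each unit ending with a 2x2 max-pool that halves height and width.
input_size = 224                        # 224 x 224 x 3 input image
unit_channels = [64, 128, 256, 512, 512]

size = input_size
for _ in unit_channels:                 # one halving per pooling layer
    size //= 2                          # 224 -> 112 -> 56 -> 28 -> 14 -> 7

# The flattened 7 x 7 x 512 feature map feeds the three fully
# connected layers that screen features and reduce dimensionality.
flattened = size * size * unit_channels[-1]
fc_dims = [flattened, 4096, 4096]       # final FC maps to the identity classes
```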
Preferably, the identifying the voiceprint in the image or audio/video file by using the FBN-Alexnet network small sample voiceprint identification technology specifically includes:
inputting an original voice signal of a small sample to obtain a spectrogram;
an image augmentation algorithm based on the convex lens imaging principle is adopted, and more training data are obtained by changing the size of the spectrogram;
training an FBN-Alexnet network model by using voiceprint data, wherein the training comprises extracting voiceprint characteristics by a convolutional layer, accelerating network convergence by the FBN, reducing the calculation complexity by a pooling layer and carrying out voiceprint classification by a full connection layer;
and acquiring voiceprint data in the image or audio/video file, and identifying the voiceprint data by adopting a trained FBN-Alexnet network model.
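The spectrogram-resizing augmentation above can be sketched as plain rescaling; the specific convex-lens imaging formula is not given in this text, so a simple nearest-neighbour resize over several assumed scale factors stands in for it here:

```python
def resize_nearest(spec, new_h, new_w):
    """Nearest-neighbour resize of a 2-D spectrogram (list of rows)."""
    h, w = len(spec), len(spec[0])
    return [[spec[i * h // new_h][j * w // new_w] for j in range(new_w)]
            for i in range(new_h)]

def augment(spec, scales=(0.5, 0.75, 1.25, 1.5)):
    """Produce extra training samples by rescaling one spectrogram."""
    h, w = len(spec), len(spec[0])
    return [resize_nearest(spec, max(1, int(h * s)), max(1, int(w * s)))
            for s in scales]

spec = [[float(i + j) for j in range(8)] for i in range(8)]
extra = augment(spec)                   # four rescaled variants of one sample
```

Each rescaled variant counts as an additional training example, which is how a small voiceprint sample can be stretched into a usable training set.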
In another aspect, the present invention provides a law enforcement instrument comprising:
the image audio/video acquisition module is used for acquiring images or audio/video files of a law enforcement scene in the law enforcement process;
the image hash value generation module is used for identifying the face in the image or the audio/video file by adopting the face identification technology of the VGG-16 convolutional neural network to generate an image hash value;
the voiceprint hash value generation module is used for identifying the voiceprint in the image or the audio/video file by adopting an FBN-Alexnet network small sample voiceprint identification technology to generate a voiceprint hash value;
the two-dimensional code link generation module is used for automatically uploading five elements, namely a file hash value generated by automatic calculation aiming at the image or audio/video file, trusted time of a time service center, geographical position information, law enforcement equipment information and law enforcement personnel information, into an evidence deposit cloud with a block chain deployed through a network, and storing a two-dimensional code generating an evidence deposit cloud evidence fixation report link; the file hash value comprises an image hash value and a voice print hash value.
Preferably, the image hash value generation module includes:
the neural network model training unit is used for constructing the VGG-16 convolutional neural network model and training the VGG-16 convolutional neural network model;
the face picture acquisition unit is used for acquiring a face picture in an image or audio/video file;
the image preprocessing unit is used for preprocessing the face image, and comprises face detection and face alignment correction;
the face recognition unit is used for extracting face features by adopting the trained VGG-16 convolutional neural network model, and screening the features through three full-connection layers after feature extraction, nonlinear mapping and feature dimension reduction of 5 units of the VGG-16 convolutional neural network model, so that feature dimensions are further reduced; and finally, recognizing the human face by adopting a classifier.
Preferably, the AdaBoost + Haar feature method is adopted for face detection, and affine transformation is adopted for face alignment correction.
Preferably, the neural network model training unit includes:
the training data set acquisition subunit is used for acquiring a certain number of face pictures, preprocessing the face pictures and using the face pictures as a training data set of the VGG-16 convolutional neural network model; preprocessing comprises data enhancement, face detection alignment cutting, data format conversion and picture mean calculation;
and the neural network model training subunit is used for constructing the VGG-16 convolutional neural network model, and training the VGG-16 convolutional neural network model by adopting a training data set, wherein the training comprises network layer modification, network parameter modification and network model training.
Preferably, the voiceprint hash value generation module includes:
the small sample input unit is used for inputting an original voice signal of a small sample to obtain a spectrogram;
the training data acquisition module is used for obtaining more training data by changing the size of the spectrogram, using an image augmentation algorithm based on the convex lens imaging principle;
the FBN network model training module is used for training an FBN-Alexnet network model by adopting voiceprint data, and comprises a convolutional layer for extracting voiceprint characteristics, an FBN accelerating network convergence, a pooling layer for reducing the calculation complexity and a full connection layer for voiceprint classification;
and the voiceprint recognition module is used for acquiring voiceprint data in the image or audio/video file and recognizing the voiceprint data by adopting a trained FBN-Alexnet network model.
Compared with the prior art, the invention has the beneficial effects that at least:
1) when the law enforcement instrument records, various information closely related to law enforcement scenes, such as the hash value of the recorded electronic file, the trusted time of a time service center, geographical location information, law enforcement equipment information, law enforcement personnel information and the like, can be intelligently obtained, the instant evidence fixation after the recording of the law enforcement instrument is finished is realized by utilizing the information, and the spanning from the common electronic file to the electronic evidence meeting the legal requirements is finished.
2) And the reliability and uniqueness of the evidence in the law enforcement process of the law enforcement instrument are ensured by combining the analysis of the picture and the voiceprint.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of a law enforcement instant evidence immobilization method of the present invention;
FIG. 2 is a flow chart of the face picture analysis of the present invention;
FIG. 3 is a flow chart of the voiceprint analysis of the present invention;
FIG. 4 is a schematic diagram of the construction of a law enforcement instrument of the present invention;
FIG. 5 is a block diagram of an image hash value generation module according to the present invention;
FIG. 6 is a schematic structural diagram of a voiceprint hash value generation module according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. It should be noted that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and all other embodiments obtained by those skilled in the art without any inventive work based on the embodiments of the present invention belong to the protection scope of the present invention.
Example 1
As shown in fig. 1, the present invention provides a law enforcement instant evidence fixing method, comprising the following steps:
s1, collecting the image or audio/video file of the law enforcement scene in the law enforcement process; the invention adopts a Zynq-7000 platform-based high-definition video acquisition processing system, and realizes high-definition CMOS image acquisition, image preprocessing on FPGA and video image caching by deeply analyzing the Zynq-7000 platform basic characteristics and the whole frame of high-definition video acquisition processing.
S2, recognizing the face in the image or audio/video file by adopting a face recognition technology of a VGG-16 convolutional neural network to generate an image hash value;
S3, identifying the voiceprint in the image or audio/video file by adopting an FBN-Alexnet network small sample voiceprint identification technology to generate a voiceprint hash value;
s4, automatically uploading five elements of a file hash value, trusted time of a time service center, geographical location information, law enforcement equipment information and law enforcement personnel information generated by automatic calculation aiming at the image or audio/video file through a network into an evidence storing cloud with a block chain deployed, and simultaneously storing a two-dimensional code generating an evidence storing cloud evidence fixing report link; the file hash value comprises an image hash value and a voice print hash value.
As shown in fig. 2, in step S2, the recognizing the face in the image or the audio/video file by using the face recognition technology of the VGG-16 convolutional neural network specifically includes:
S21, constructing a VGG-16 convolutional neural network model, and training the VGG-16 convolutional neural network model;
S22, acquiring a face picture in the image or audio/video file;
S23, preprocessing the face picture, including face detection and face alignment correction; preferably, the AdaBoost + Haar feature method is adopted for face detection, and affine transformation is adopted for face alignment correction;
S24, extracting the face features by adopting the trained VGG-16 convolutional neural network model, and screening the features through three full-connection layers after feature extraction, nonlinear mapping and feature dimension reduction of the 5 units of the VGG-16 convolutional neural network model, so as to further reduce feature dimensions; and finally, identifying the human face by adopting a Softmax classifier.
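The final Softmax classification step can be sketched as follows (a generic numerically stable softmax; the number of identity classes and the score values are illustrative, not taken from the patent):

```python
import math

def softmax(logits):
    """Numerically stable softmax over the classifier's output scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]               # illustrative per-identity scores
probs = softmax(scores)                # probabilities summing to 1
predicted = probs.index(max(probs))    # index of the recognised identity
```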
In step S21, constructing the VGG-16 convolutional neural network model, and training the VGG-16 convolutional neural network model includes:
S211, acquiring a certain number of face pictures, preprocessing the face pictures, and using the preprocessed face pictures as a training data set of the VGG-16 convolutional neural network model; preprocessing comprises data enhancement, face detection alignment cutting, data format conversion and picture mean calculation;
S212, constructing a VGG-16 convolutional neural network model, and training the VGG-16 convolutional neural network model by adopting a training data set, wherein the training comprises network layer modification, network parameter modification and network model training.
Voiceprint recognition comprises two techniques, speaker identification and speaker verification; in either, the speaker's voiceprint is collected, digitized and modeled. Once a full-sample voiceprint library of the general public has been collected, a voiceprint exhibit from a case-related sound source can be automatically compared against it, instantly locking onto the true identity of a suspect. Because voiceprints can be sampled and identified remotely, the approach has unmatched natural advantages for investigating non-contact cases. Voiceprint recognition currently faces the following challenges. First, the time-varying nature of the voice: speech is less stable than biometric features such as faces and fingerprints, and a person's voice changes with voice-change periods, pathological changes, trauma, recording conditions and different speech environments, reducing its stability. Second, cross-channel collection: sound sources and channels are diverse, such as voice recorders, telephones, VoIP and microphones; different collection channels use different audio codecs, and the analog-to-digital conversion degrades the sound to some extent. Third, attacks such as replayed recordings and text-to-speech (TTS) synthesis. When a deep-learning voiceprint recognition model is trained on a large amount of speech data, it can automatically learn rich acoustic features (spectrum, pitch, formants and so on), overcoming these challenges to a certain extent.
In addition, because the network has many layers and a huge number of parameters, and training is time-consuming and prone to overfitting, a Fast Batch Normalization (FBN) method is used to accelerate network convergence when training the FBN-Alexnet network.
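Since the exact formulation of FBN is not given in this text, the ordinary batch normalization it builds on can be sketched as follows (per-feature normalization over a mini-batch; gamma and beta are the usual learnable scale and shift, fixed to their defaults in this illustration):

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch to zero mean and unit
    variance, then apply the learnable scale (gamma) and shift (beta)."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta
            for x in batch]

normalized = batch_norm([10.0, 12.0, 14.0, 16.0])
```

Keeping each layer's activations in a standardized range is what lets the network train with larger learning rates and converge faster, which is the effect FBN is intended to accelerate further.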
As shown in fig. 3, in step S3, the identifying the voiceprint in the image or audio/video file by using the FBN-Alexnet network small sample voiceprint identification technology specifically includes:
S31, inputting the original voice signal of the small sample to obtain a spectrogram; because a voice signal is stationary only over short intervals, the signal is divided into frames before it is used to train or test the model;
S32, obtaining more training data by changing the size of the spectrogram, using an image augmentation algorithm based on the convex lens imaging principle;
S33, training an FBN-Alexnet network model by using voiceprint data, wherein the training comprises extracting voiceprint features by the convolutional layer, accelerating network convergence by the FBN, reducing calculation complexity by the pooling layer and carrying out voiceprint classification by the full connection layer;
and S34, obtaining voiceprint data in the image or audio/video file, and identifying the voiceprint data by adopting the trained FBN-Alexnet network model.
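The framing in step S31 can be sketched as follows; the 25 ms frame length and 10 ms hop at 16 kHz are common choices assumed here, since the patent does not specify framing parameters:

```python
def frame_signal(signal, frame_len=400, hop=160):
    """Split a speech signal into overlapping short-time frames.

    At 16 kHz, 400 samples = 25 ms frames and 160 samples = 10 ms hop.
    """
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += hop
    return frames

samples = list(range(16000))           # one second of synthetic 16 kHz audio
frames = frame_signal(samples)
```

Each short frame is close enough to stationary for spectral analysis, so the spectrogram is computed frame by frame before training or testing.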
Example 2
As shown in fig. 4, the present invention provides a law enforcement instrument, which includes an image audio/video acquisition module, an image hash value generation module, a voiceprint hash value generation module, and a two-dimensional code link generation module;
the image audio/video acquisition module is used for acquiring images or audio/video files of a law enforcement scene in the law enforcement process; the invention adopts a high-definition video acquisition and processing system based on the Zynq-7000 platform, and realizes high-definition CMOS image acquisition, image preprocessing on the FPGA and video image caching by deeply analyzing the basic characteristics of the Zynq-7000 platform and the overall framework of high-definition video acquisition and processing;
the image hash value generation module is used for identifying the face in the image or the audio/video file by adopting a face identification technology of a VGG-16 convolutional neural network to generate an image hash value;
the voiceprint hash value generation module is used for identifying the voiceprint in the image or the audio/video file by adopting an FBN-Alexnet network small sample voiceprint identification technology to generate a voiceprint hash value;
the two-dimensional code link generation module is used for automatically uploading five elements of a file hash value, trusted time of a time service center, geographical location information, law enforcement equipment information and law enforcement personnel information which are generated by automatic calculation aiming at the image or audio/video file through a network into an evidence storing cloud with a block chain deployed, and storing a two-dimensional code generating an evidence storing cloud evidence fixing report link; the file hash value comprises an image hash value and a voice print hash value.
As shown in fig. 5, the image hash value generation module includes a neural network model training unit, a face image obtaining unit, an image preprocessing unit, and a face recognition unit;
the neural network model training unit is used for constructing a VGG-16 convolutional neural network model and training the VGG-16 convolutional neural network model;
the face picture acquisition unit is used for acquiring a face picture in an image or audio/video file;
the image preprocessing unit is used for preprocessing a face image, and comprises face detection and face alignment correction; preferably, the AdaBoost + Haar feature method is adopted for face detection, and affine transformation is adopted for face alignment correction.
The face recognition unit is used for extracting face features by adopting a trained VGG-16 convolutional neural network model, and screening the features through three full-connection layers after feature extraction, nonlinear mapping and feature dimension reduction of 5 units of the VGG-16 convolutional neural network model, so that feature dimensions are further reduced; and finally, recognizing the human face by adopting a classifier.
Specifically, the neural network model training unit comprises a training data set acquisition subunit and a neural network model training subunit;
the training data set acquisition subunit is used for acquiring a certain number of human face pictures, preprocessing the human face pictures and using the human face pictures as a training data set of the VGG-16 convolutional neural network model; preprocessing comprises data enhancement, face detection alignment cutting, data format conversion and picture mean calculation;
the neural network model training subunit is used for constructing the VGG-16 convolutional neural network model, and training the VGG-16 convolutional neural network model by adopting a training data set, wherein the training comprises network layer modification, network parameter modification and network model training.
The background on voiceprint recognition and its current challenges, and the motivation for the Fast Batch Normalization (FBN) method, are the same as described above in Example 1.
As shown in fig. 6, the voiceprint hash value generation module includes a small sample input unit, a training data acquisition module, an FBN network model training module, and a voiceprint recognition module;
the small sample input unit is used for inputting an original voice signal of a small sample to obtain a spectrogram; before using an audio training or testing model, a section of voice signal is framed in advance due to the characteristic of short-time invariance of the voice signal;
the training data acquisition module is used for obtaining more training data by changing the size of the spectrogram with an image augmentation algorithm based on the convex lens imaging principle;
the FBN network model training module is used for training the FBN-Alexnet network model with voiceprint data, and comprises convolutional layers for extracting voiceprint features, FBN for accelerating network convergence, pooling layers for reducing computational complexity, and fully connected layers for voiceprint classification;
the voiceprint recognition module is used for obtaining voiceprint data from the image or audio/video file and recognizing the voiceprint data with the trained FBN-Alexnet network model.
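The framing and spectrogram step performed by the small sample input unit can be sketched as follows; the 16 kHz sample rate, 400-sample frame, 160-sample hop and Hamming window are common speech-processing defaults assumed for illustration, not values taken from the patent, and a pure tone stands in for real speech.

```python
import numpy as np

def spectrogram(signal, frame_len=400, hop=160):
    """Split the signal into overlapping frames (short-time stationarity),
    window each frame, and take its magnitude spectrum."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    win = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))   # (n_frames, frame_len//2 + 1)

t = np.arange(16000) / 16000.0        # 1 second sampled at 16 kHz
sig = np.sin(2 * np.pi * 440.0 * t)   # toy 440 Hz tone instead of speech
spec = spectrogram(sig)               # 98 frames x 201 frequency bins
# the 440 Hz tone peaks in bin 11, since each bin spans 16000/400 = 40 Hz
```

The resulting matrix is exactly the image-like input the subsequent augmentation and FBN-Alexnet stages operate on.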
The above embodiments merely express several implementations of the present invention, and while their description is comparatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A law enforcement instant evidence fixing method is characterized by comprising the following steps:
collecting images or audio/video files of a law enforcement scene in the law enforcement process;
recognizing the face in the image or audio/video file by adopting a VGG-16 convolutional neural network face recognition technology to generate an image hash value;
identifying the voiceprint in the image or audio/video file by adopting an FBN-Alexnet network small sample voiceprint identification technology to generate a voiceprint hash value;
automatically uploading five elements, namely a file hash value automatically calculated for the image or audio/video file, trusted time from a time service center, geographical position information, law enforcement equipment information, and law enforcement personnel information, to an evidence-preservation cloud deployed with a blockchain via the network, and simultaneously storing a two-dimensional code that links to the evidence-preservation cloud's evidence fixation report; the file hash value comprises the image hash value and the voiceprint hash value.
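A minimal sketch of how the five evidentiary elements of claim 1 might be assembled before upload. All field names, the SHA-256 choice for the file hash, and the local clock standing in for the time service center are assumptions for illustration (the patent does not specify an encoding); the resulting JSON string is the kind of payload a stored two-dimensional code could link to.

```python
import hashlib
import json
import time

def build_evidence_record(file_bytes, lat, lon, device_id, officer_id):
    """Assemble the five evidentiary elements for upload. SHA-256 and the
    field names are illustrative; time.time() stands in for the trusted
    time that a time service center would supply."""
    record = {
        "file_hash": hashlib.sha256(file_bytes).hexdigest(),
        "trusted_time": int(time.time()),
        "location": {"lat": lat, "lon": lon},
        "device": device_id,
        "officer": officer_id,
    }
    return json.dumps(record, sort_keys=True)

rec = build_evidence_record(b"demo video bytes", 23.01, 113.11,
                            "BWC-0421", "officer-0007")
```

Hashing the raw file bytes before upload is what lets the blockchain record later prove the evidence was not altered: recomputing the SHA-256 of the stored file must reproduce the recorded digest.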
2. The law enforcement instant evidence fixing method according to claim 1, wherein the recognizing the face in the image or audio-video file by adopting the face recognition technology of the VGG-16 convolutional neural network specifically comprises:
constructing a VGG-16 convolutional neural network model, and training the VGG-16 convolutional neural network model;
acquiring a face picture in an image or audio/video file;
preprocessing the face picture, including face detection and face alignment correction;
extracting face features by adopting the trained VGG-16 convolutional neural network model, wherein after the feature extraction, nonlinear mapping and feature dimension reduction performed by the five units of the VGG-16 convolutional neural network model, the features are screened through three fully connected layers to further reduce the feature dimension; and finally recognizing the face by adopting a classifier.
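The screening stage of claim 2 (three fully connected layers after the five convolutional units, followed by a classifier) can be illustrated with random stand-in weights and toy dimensions; real VGG-16 flattens the convolutional output to 25088 features and uses 4096-unit FC layers, and the ten identities here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy dimensions; real VGG-16 flattens the conv output to 25088 features
# and uses 4096-unit FC layers. All weights are untrained random stand-ins.
features = rng.normal(size=512)                 # output of the 5 conv units
W1 = rng.normal(scale=0.05, size=(512, 128))    # FC layer 1: screen features
W2 = rng.normal(scale=0.05, size=(128, 64))     # FC layer 2: reduce dimension further
W3 = rng.normal(scale=0.05, size=(64, 10))      # FC layer 3: 10 hypothetical identities
h = relu(relu(features @ W1) @ W2)
probs = softmax(h @ W3)                         # classifier output distribution
pred = int(probs.argmax())                      # recognized identity index
```

Each fully connected layer narrows the representation, so the classifier at the end operates on a compact identity embedding rather than raw convolutional features.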
3. The law enforcement instant evidence fixing method according to claim 2, characterized in that an Adaboost + Haar feature method is adopted for face detection, and affine transformation is adopted for face alignment correction.
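The affine alignment of claim 3 is commonly driven by detected landmark positions. The sketch below solves the similarity transform (a restricted affine map: rotation, uniform scale, translation) that carries two hypothetical eye centres onto canonical positions; the coordinates are invented for the example, and the resulting 2x3 matrix has the same form OpenCV's `warpAffine` expects.

```python
import numpy as np

def similarity_from_eyes(src, dst):
    """Solve the 2x3 similarity matrix [[a, -b, tx], [b, a, ty]] that maps
    both source eye centres exactly onto the destination eye centres
    (4 linear equations in the 4 unknowns a, b, tx, ty)."""
    (x1, y1), (x2, y2) = src
    (u1, v1), (u2, v2) = dst
    A = np.array([[x1, -y1, 1.0, 0.0],
                  [y1,  x1, 0.0, 1.0],
                  [x2, -y2, 1.0, 0.0],
                  [y2,  x2, 0.0, 1.0]])
    a, b, tx, ty = np.linalg.solve(A, np.array([u1, v1, u2, v2], dtype=float))
    return np.array([[a, -b, tx], [b, a, ty]])

detected = ((30.0, 42.0), (72.0, 38.0))    # hypothetical detected eye centres
canonical = ((38.0, 52.0), (90.0, 52.0))   # canonical aligned eye positions
M = similarity_from_eyes(detected, canonical)
aligned = [M @ np.array([x, y, 1.0]) for (x, y) in detected]
```

Warping every detected face with such a matrix puts the eyes at fixed pixel positions, so the downstream VGG-16 features are not confounded by head tilt or scale.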
4. The law enforcement instant evidence fixing method according to claim 2, wherein constructing a VGG-16 convolutional neural network model and training the VGG-16 convolutional neural network model comprises:
acquiring a certain number of human face pictures, and preprocessing the human face pictures to be used as a training data set of a VGG-16 convolutional neural network model; preprocessing comprises data enhancement, face detection alignment cutting, data format conversion and picture mean calculation;
and constructing a VGG-16 convolutional neural network model, and training the VGG-16 convolutional neural network model by adopting a training data set, wherein the training comprises network layer modification, network parameter modification and network model training.
5. The law enforcement instant evidence fixing method according to claim 1, wherein the identification of the voiceprint in the image or audio/video file by using a FBN-Alexnet network small sample voiceprint identification technology specifically comprises:
inputting an original voice signal of a small sample to obtain a spectrogram;
obtaining more training data by changing the size of the spectrogram with an image augmentation algorithm based on the convex lens imaging principle;
training an FBN-Alexnet network model with voiceprint data, wherein the training comprises extracting voiceprint features with convolutional layers, accelerating network convergence with FBN, reducing computational complexity with pooling layers, and performing voiceprint classification with fully connected layers;
and acquiring voiceprint data in the image or audio/video file, and identifying the voiceprint data by adopting a trained FBN-Alexnet network model.
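The convex-lens image augmentation of claim 5 amounts to magnifying or shrinking the spectrogram image to multiply a small training set. A nearest-neighbour rescaling sketch follows, with arbitrary scale factors standing in for the lens magnifications; the interpolation scheme is an illustrative choice, not one specified by the patent.

```python
import numpy as np

def rescale(img, scale):
    """Nearest-neighbour resize of a 2-D spectrogram image by `scale`,
    mimicking convex-lens magnification (scale > 1) or reduction (< 1)."""
    h, w = img.shape
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    rows = np.clip((np.arange(nh) / scale).astype(int), 0, h - 1)
    cols = np.clip((np.arange(nw) / scale).astype(int), 0, w - 1)
    return img[np.ix_(rows, cols)]

base = np.arange(64, dtype=float).reshape(8, 8)        # stand-in spectrogram
augmented = [rescale(base, s) for s in (0.75, 1.25, 1.5)]
# one small-sample spectrogram yields several size-perturbed training images
```

Each rescaled copy is then resized or cropped back to the network's fixed input size, giving the model several views of one recording.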
6. A law enforcement instrument, comprising:
the image audio/video acquisition module is used for acquiring images or audio/video files of a law enforcement scene in the law enforcement process;
the image hash value generation module is used for identifying the face in the image or the audio/video file by adopting the face identification technology of the VGG-16 convolutional neural network to generate an image hash value;
the voiceprint hash value generation module is used for identifying the voiceprint in the image or the audio/video file by adopting an FBN-Alexnet network small sample voiceprint identification technology to generate a voiceprint hash value;
the two-dimensional code link generation module is used for automatically uploading five elements, namely a file hash value automatically calculated for the image or audio/video file, trusted time from a time service center, geographical position information, law enforcement equipment information, and law enforcement personnel information, to an evidence-preservation cloud deployed with a blockchain via the network, and for storing a two-dimensional code that links to the evidence-preservation cloud's evidence fixation report; the file hash value comprises the image hash value and the voiceprint hash value.
7. The law enforcement instrument of claim 6, wherein the image hash value generation module comprises:
the neural network model training unit is used for constructing the VGG-16 convolutional neural network model and training the VGG-16 convolutional neural network model;
the face picture acquisition unit is used for acquiring a face picture in an image or audio/video file;
the image preprocessing unit is used for preprocessing the face image, and comprises face detection and face alignment correction;
the face recognition unit is used for extracting face features by adopting the trained VGG-16 convolutional neural network model, wherein after the feature extraction, nonlinear mapping and feature dimension reduction performed by the five units of the VGG-16 convolutional neural network model, the features are screened through three fully connected layers to further reduce the feature dimension; and finally the face is recognized by a classifier.
8. The law enforcement instrument of claim 7, wherein an Adaboost + Haar feature method is used for face detection, and affine transformation is used for face alignment correction.
9. The law enforcement instrument of claim 7, wherein the neural network model training unit comprises:
the training data set acquisition subunit is used for acquiring a certain number of face pictures, preprocessing the face pictures and using the face pictures as a training data set of the VGG-16 convolutional neural network model; preprocessing comprises data enhancement, face detection alignment cutting, data format conversion and picture mean calculation;
and the neural network model training subunit is used for constructing the VGG-16 convolutional neural network model, and training the VGG-16 convolutional neural network model by adopting a training data set, wherein the training comprises network layer modification, network parameter modification and network model training.
10. The law enforcement instrument of claim 6, wherein the voiceprint hash value generation module comprises:
the small sample input unit is used for inputting an original voice signal of a small sample to obtain a spectrogram;
the training data acquisition module is used for obtaining more training data by changing the size of the spectrogram with an image augmentation algorithm based on the convex lens imaging principle;
the FBN network model training module is used for training the FBN-Alexnet network model with voiceprint data, and comprises convolutional layers for extracting voiceprint features, FBN for accelerating network convergence, pooling layers for reducing computational complexity, and fully connected layers for voiceprint classification;
and the voiceprint recognition module is used for acquiring voiceprint data in the image or audio/video file and recognizing the voiceprint data by adopting a trained FBN-Alexnet network model.
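The four layer types named in claim 10's FBN network model training module can be traced in a minimal NumPy forward pass: convolution extracts local voiceprint features, a normalization step stands in for FBN, max pooling shrinks the feature map, and a fully connected layer produces class scores. All shapes, weights and the four-speaker output are toy assumptions for illustration.

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive valid-mode 2-D convolution: extracts local voiceprint features."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool(x, s=2):
    """s x s max pooling: shrinks the feature map, reducing computation."""
    H, W = (x.shape[0] // s) * s, (x.shape[1] // s) * s
    return x[:H, :W].reshape(H // s, s, W // s, s).max(axis=(1, 3))

rng = np.random.default_rng(1)
spec = rng.normal(size=(32, 32))                    # toy spectrogram input
feat = conv2d_valid(spec, rng.normal(size=(3, 3)))  # convolutional layer -> 30x30
feat = (feat - feat.mean()) / np.sqrt(feat.var() + 1e-5)  # normalization (FBN stand-in)
feat = max_pool(feat)                               # pooling layer -> 15x15
logits = feat.ravel() @ rng.normal(size=(feat.size, 4))   # FC layer: 4 speaker scores
```

A real FBN-Alexnet stacks many such convolution/normalization/pooling blocks and ends in several fully connected layers, but the data flow per block is the one shown.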
CN202110974278.9A 2021-08-24 2021-08-24 Law enforcement instant evidence fixing method and law enforcement instrument Active CN113762110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110974278.9A CN113762110B (en) 2021-08-24 2021-08-24 Law enforcement instant evidence fixing method and law enforcement instrument

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110974278.9A CN113762110B (en) 2021-08-24 2021-08-24 Law enforcement instant evidence fixing method and law enforcement instrument

Publications (2)

Publication Number Publication Date
CN113762110A true CN113762110A (en) 2021-12-07
CN113762110B CN113762110B (en) 2024-07-26

Family

ID=78790992

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110974278.9A Active CN113762110B (en) 2021-08-24 2021-08-24 Law enforcement instant evidence fixing method and law enforcement instrument

Country Status (1)

Country Link
CN (1) CN113762110B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723679A (en) * 2020-05-27 2020-09-29 上海五零盛同信息科技有限公司 Face and voiceprint authentication system and method based on deep migration learning
WO2020237855A1 (en) * 2019-05-30 2020-12-03 平安科技(深圳)有限公司 Sound separation method and apparatus, and computer readable storage medium
CN112487452A (en) * 2020-12-01 2021-03-12 安徽三人信息科技有限公司 On-site certificate storing method and device and related certificate storing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ding Dongbing: "Research on a small-sample voiceprint recognition method under the TL-CNN-GAP model", Computer Knowledge and Technology, no. 24, 31 August 2018 (2018-08-31) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115309938A (en) * 2022-10-09 2022-11-08 浙江汇信科技有限公司 Method and system for analyzing and mining supervision and law enforcement big data
CN115309938B (en) * 2022-10-09 2024-04-12 浙江汇信科技有限公司 Method and system for monitoring and managing law enforcement big data analysis mining

Also Published As

Publication number Publication date
CN113762110B (en) 2024-07-26

Similar Documents

Publication Publication Date Title
CN106529414A (en) Method for realizing result authentication through image comparison
CN109190475B (en) Face recognition network and pedestrian re-recognition network collaborative training method
US11544835B2 (en) Systems and methods for detecting image recapture
CN111368649B (en) A Method of Emotion Sensing Running on Raspberry Pi
CN114387977B (en) Voice cutting trace positioning method based on double-domain depth feature and attention mechanism
CN110827832A (en) Video identity recognition equipment and method
Chandran et al. Missing child identification system using deep learning and multiclass SVM
CN107133590B (en) A kind of identification system based on facial image
CN112069891A (en) A deep forgery face identification method based on illumination features
CN111639580B (en) A Gait Recognition Method Combining Feature Separation Model and Perspective Transformation Model
CN110493640A (en) A kind of system and method that the Video Quality Metric based on video processing is PPT
CN114463828A (en) Unified witness-based invigilation method and system, electronic equipment and storage medium
CN109740492A (en) An identity authentication method and device
CN111709930A (en) Image provenance and tampering identification method based on pattern noise
CN113762110B (en) Law enforcement instant evidence fixing method and law enforcement instrument
CN117176998A (en) Channel attention-based dual-flow network cross-mode mouth shape synchronization method and system
CN116383791A (en) Customer identity authentication method, device, electronic equipment and medium
CN105138886A (en) Robot biometric identification system
CN113327619B (en) A method and system for meeting recording based on cloud-edge collaboration architecture
CN114022923A (en) Intelligent collecting and editing system
CN112560811A (en) End-to-end automatic detection research method for audio-video depression
CN118747927A (en) Access control method, system and access control terminal based on multi-source data
CN111325118A (en) Method for identity authentication based on video and video equipment
CN110580915A (en) Sound source target recognition system based on wearable devices
Anvekar et al. Detection of manipulated multimedia in digital forensics using machine learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Ding Liping

Inventor after: Du Mo

Inventor before: Xiao Jiongen

Inventor before: Ding Liping

Inventor before: Huang Zhaoying

TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20230529

Address after: Room 501-5, 501-6, 501-7, 501-8, 501-9, 501-10, 5th floor, No.128 Jiaoxi Road, Huangge Town, Nansha District, Guangzhou City, Guangdong Province, 510000

Applicant after: GUANGDONG ZHONGKE SHISHU TECHNOLOGY Co.,Ltd.

Address before: 510000 804, building a, No. 1121 Haibin Road, Nansha District, Guangzhou, Guangdong

Applicant before: Guangzhou Software Application Technology Research Institute

GR01 Patent grant
GR01 Patent grant