CN112269897B - Method and device for determining voice acquisition equipment - Google Patents

Method and device for determining voice acquisition equipment

Info

Publication number
CN112269897B
Authority
CN
China
Prior art keywords
track data
voice acquisition
voice
data set
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011128284.4A
Other languages
Chinese (zh)
Other versions
CN112269897A (en)
Inventor
李健
王玉好
梁志婷
沈忱
徐浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Minglue Artificial Intelligence Group Co Ltd
Original Assignee
Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority to CN202011128284.4A
Publication of CN112269897A
Application granted
Publication of CN112269897B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Economics (AREA)
  • Multimedia (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a method and a device for determining voice acquisition equipment. The method comprises: acquiring N groups of audio track data sets obtained through voice acquisition by N voice acquisition devices respectively, wherein each voice acquisition device corresponds to one group of audio track data sets and N is an integer greater than 1; comparing text repetition rates between the N groups of audio track data sets; and determining the voice acquisition device corresponding to an audio track data set whose text repetition rate is greater than or equal to a preset threshold as a target voice acquisition device. The invention addresses the problem of low working efficiency caused by the difficulty of managing store staff, so that staff can be supervised in a timely and accurate manner and their working efficiency improved.

Description

Method and device for determining voice acquisition equipment
Technical Field
The present invention relates to the field of communications, and in particular, to a method and apparatus for determining a voice acquisition device.
Background
The main difficulty in managing store service staff is that staff chat privately with one another, which gives customers entering the store a poor experience. Front-line staff in the service industry are hard to manage and easily distracted, and their time is not used effectively. Work conduct standards set by the company are not fully followed by staff and cannot be assessed quantitatively. Store clerk management relies almost entirely on the store manager, which is time-consuming and labor-intensive, wastes resources, depends heavily on subjective judgment, and can be unfair.
No effective solution currently exists for the problem, in the related art, of low working efficiency caused by the difficulty of managing store staff.
Disclosure of Invention
The embodiment of the invention provides a method and a device for determining voice acquisition equipment, which are used for at least solving the problem of low working efficiency caused by difficult management of store personnel in the related technology.
According to an embodiment of the present invention, there is provided a method for determining a voice acquisition device, including: acquiring N groups of audio track data sets obtained through voice acquisition by N voice acquisition devices respectively, wherein each voice acquisition device corresponds to one group of audio track data sets, and N is an integer greater than 1; comparing text repetition rates between the N groups of audio track data sets; and determining the voice acquisition device corresponding to an audio track data set whose text repetition rate is greater than or equal to a preset threshold as a target voice acquisition device.
Optionally, the N voice acquisition devices include a first voice acquisition device and a second voice acquisition device, where the first voice acquisition device obtains a first audio track data set through voice acquisition and the second voice acquisition device obtains a second audio track data set through voice acquisition. Determining the voice acquisition device corresponding to the audio track data set whose text repetition rate is greater than or equal to the preset threshold as the target voice acquisition device includes: when the text repetition rate between the first audio track data set and the second audio track data set is greater than or equal to the preset threshold, determining the first voice acquisition device and the second voice acquisition device as the target voice acquisition devices.
Optionally, the first audio track data set includes a first far audio track data set and a first near audio track data set, the second audio track data set includes a second far audio track data set and a second near audio track data set, where determining a voice acquisition device corresponding to the audio track data set with a text repetition rate greater than or equal to a preset threshold is a target voice acquisition device, including: determining that the first voice acquisition device and the second voice acquisition device are the target voice acquisition device when the text repetition rate between the first far-track data set and the second near-track data set is greater than or equal to a first preset threshold value and the text repetition rate between the first near-track data set and the second far-track data set is greater than or equal to a second preset threshold value; wherein the preset threshold includes the first preset threshold and the second preset threshold.
Optionally, comparing the text repetition rates between the N groups of audio track data sets includes: converting each of the N groups of audio track data sets into audio track recognition text, to obtain N groups of audio track recognition texts.
Optionally, after the determining that the voice acquisition device corresponding to the track data set with the text repetition rate greater than or equal to the preset threshold is the target voice acquisition device, the method further includes: determining the identity of the target voice acquisition equipment; recording the identity and sending the identity to a management terminal.
Optionally, after the determining that the voice acquisition device corresponding to the track data set with the text repetition rate greater than or equal to the preset threshold is the target voice acquisition device, the method further includes: and sending indication information to the target voice acquisition equipment so as to indicate the target voice acquisition equipment to send alarm information.
According to another embodiment of the present invention, there is provided an apparatus for determining a voice acquisition device, including: a voice acquisition module, configured to acquire N groups of audio track data sets obtained through voice acquisition by N voice acquisition devices respectively, wherein each voice acquisition device corresponds to one group of audio track data sets; a comparison module, configured to compare text repetition rates between the N groups of audio track data sets; and a determining module, configured to determine the voice acquisition device corresponding to an audio track data set whose text repetition rate is greater than or equal to a preset threshold as a target voice acquisition device.
Optionally, the N voice acquisition devices include a first voice acquisition device and a second voice acquisition device, where the first voice acquisition device obtains a first audio track data set and the second voice acquisition device obtains a second audio track data set, and the determining module includes: a first determining unit, configured to determine the first voice acquisition device and the second voice acquisition device as the target voice acquisition devices when the text repetition rate between the first audio track data set and the second audio track data set is greater than or equal to the preset threshold.
According to a further embodiment of the invention, there is also provided a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the invention, there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to the invention, N groups of audio track data sets obtained through voice acquisition by N voice acquisition devices respectively are acquired, wherein each voice acquisition device corresponds to one group of audio track data sets; text repetition rates between the N groups of audio track data sets are compared; and the voice acquisition device corresponding to an audio track data set whose text repetition rate is greater than or equal to the preset threshold is determined as a target voice acquisition device. Staff who are chatting can thus be identified from the target voice acquisition devices, which makes it possible to supervise staff. The problem of low working efficiency caused by the difficulty of managing store staff can therefore be solved: staff can be supervised in a timely and accurate manner, and their working efficiency improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a mobile terminal for a method for determining a voice acquisition device according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for determining a voice acquisition device according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart according to an alternative embodiment of the present invention;
FIG. 4 is a block diagram of the structure of an apparatus for determining a voice acquisition device according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
The method embodiment provided in the first embodiment of the present application may be executed in a mobile terminal, a computer terminal or a similar computing device. Taking a mobile terminal as an example, FIG. 1 is a block diagram of the hardware structure of a mobile terminal for a method for determining a voice acquisition device according to an embodiment of the present invention. As shown in FIG. 1, the mobile terminal 10 may include one or more (only one is shown in FIG. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a field-programmable gate array (FPGA)) and a memory 104 for storing data, and optionally a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in FIG. 1 is merely illustrative and does not limit the structure of the mobile terminal described above. For example, the mobile terminal 10 may include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to a method for determining a voice acquisition device in an embodiment of the present invention, and the processor 102 executes the computer program stored in the memory 104, thereby performing various functional applications and data processing, that is, implementing the method described above. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by the communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short) that can connect to other network devices through a base station in order to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
This embodiment provides a method for determining a voice acquisition device that runs on the mobile terminal described above. FIG. 2 is a flow chart of the method for determining a voice acquisition device according to an embodiment of the present invention. As shown in FIG. 2, the flow includes the following steps:
Step S202: acquire N groups of audio track data sets obtained through voice acquisition by N voice acquisition devices respectively, wherein each voice acquisition device corresponds to one group of audio track data sets, and N is an integer greater than 1;
Step S204: compare text repetition rates between the N groups of audio track data sets;
Step S206: determine the voice acquisition device corresponding to an audio track data set whose text repetition rate is greater than or equal to the preset threshold as a target voice acquisition device.
Through the above steps, N groups of audio track data sets obtained through voice acquisition by N voice acquisition devices respectively are acquired, wherein each voice acquisition device corresponds to one group of audio track data sets; text repetition rates between the N groups of audio track data sets are compared; and the voice acquisition device corresponding to an audio track data set whose text repetition rate is greater than or equal to the preset threshold is determined as a target voice acquisition device. Staff who are chatting can thus be identified from the target voice acquisition devices, which makes it possible to supervise staff. The problem of low working efficiency caused by the difficulty of managing store staff can therefore be solved: staff can be supervised in a timely and accurate manner, and their working efficiency improved.
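Steps S202 to S206 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each device's audio has already been transcribed to text, and it uses the similarity ratio of Python's standard `difflib.SequenceMatcher` as a stand-in for the text repetition rate, which the description does not define. The names `repetition_rate` and `find_target_devices`, the device labels and the 0.6 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def repetition_rate(text_a: str, text_b: str) -> float:
    """Stand-in similarity score in [0.0, 1.0]; not the patent's own formula."""
    if not text_a or not text_b:
        return 0.0
    return SequenceMatcher(None, text_a, text_b, autojunk=False).ratio()


def find_target_devices(transcripts: dict[str, str], threshold: float = 0.6) -> set[str]:
    """Return every device whose transcript repeats another device's transcript.

    `transcripts` maps a device label to the text recognized from that
    device's audio track data set (the result of step S202 plus transcription).
    """
    targets: set[str] = set()
    # Step S204: compare every pair of devices.
    for (dev_a, text_a), (dev_b, text_b) in combinations(transcripts.items(), 2):
        # Step S206: both devices of a pair at or above the threshold are targets.
        if repetition_rate(text_a, text_b) >= threshold:
            targets.update((dev_a, dev_b))
    return targets


# Hypothetical data: devices 1 and 2 recorded the same conversation.
transcripts = {
    "device-1": "did you watch the game last night",
    "device-2": "did you watch the game last night",
    "device-3": "welcome in",
}
print(sorted(find_target_devices(transcripts)))
```

Comparing every pair costs N(N-1)/2 comparisons, which is acceptable for the handful of devices worn in a single store.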
Alternatively, the execution subject of the above steps may be a terminal or the like, but is not limited thereto.
As an alternative embodiment, each attendant wears one or more voice acquisition devices, which may be microphone recording devices, to collect sound. A voice acquisition device can capture the voice of its wearer and of the people the wearer talks to. In this embodiment, N voice acquisition devices may be worn by N staff members, and each device can capture the speech of the staff member wearing it and of that person's interlocutors.
Suppose there are three staff members on site: staff A wears a first voice acquisition device, staff B wears a second voice acquisition device, and staff C wears a third voice acquisition device. Staff A, B and C are typically assigned to different work areas, and the devices they wear normally capture their conversations with customers. If a staff member talks with another staff member, for example if staff A and B talk to each other, the first and second voice acquisition devices will both capture the content of that conversation.
As an alternative embodiment, each voice acquisition device acquires one audio track data set. The first voice acquisition device acquires a first audio track data set, the second voice acquisition device acquires a second audio track data set, and the third voice acquisition device acquires a third audio track data set. By comparing the text repetition rates of the audio track data sets acquired by the three devices, the staff members who are talking to each other can be determined. For example, if the text repetition rate between the first audio track data set and the second audio track data set is greater than or equal to the preset threshold, it is determined that staff A, wearing the first voice acquisition device, and staff B, wearing the second voice acquisition device, are in conversation and may be chatting privately; the first and second voice acquisition devices are then determined to be target voice acquisition devices. In this embodiment, the preset threshold may be set according to the actual situation, for example 0.5, 0.6 or 0.7. In this embodiment, the text repetition rate between the audio track data sets collected by the devices that staff wear is used to determine which staff are talking to each other, so that staff who chat privately during working hours can be identified in a timely and accurate manner. This allows the working state of staff to be supervised and their working efficiency improved.
Optionally, the N voice acquisition devices include a first voice acquisition device and a second voice acquisition device, where the first voice acquisition device obtains a first audio track data set through voice acquisition and the second voice acquisition device obtains a second audio track data set through voice acquisition. Determining the voice acquisition device corresponding to the audio track data set whose text repetition rate is greater than or equal to the preset threshold as the target voice acquisition device includes: when the text repetition rate between the first audio track data set and the second audio track data set is greater than or equal to the preset threshold, determining the first voice acquisition device and the second voice acquisition device as the target voice acquisition devices.
As an alternative embodiment, the first voice acquisition device is worn by staff A and the second voice acquisition device is worn by staff B. The first voice acquisition device captures conversations between staff A and other people, and the second voice acquisition device captures conversations between staff B and other people. If staff A and staff B are talking to each other, the text repetition rate between the first audio track data set acquired by the first voice acquisition device and the second audio track data set acquired by the second voice acquisition device will be high. A preset threshold can therefore be used to decide whether staff A and staff B are in conversation: if the text repetition rate between the first audio track data set and the second audio track data set is greater than or equal to the preset threshold, a conversation exists between staff A and staff B, and they may be chatting privately and slacking off. In that case the first and second voice acquisition devices are determined to be target voice acquisition devices, and staff A and staff B, who wear them, are the target persons. They can then be warned not to chat privately during working hours, which can improve their working efficiency.
Optionally, the first audio track data set includes a first far audio track data set and a first near audio track data set, the second audio track data set includes a second far audio track data set and a second near audio track data set, where determining a voice acquisition device corresponding to the audio track data set with a text repetition rate greater than or equal to a preset threshold is a target voice acquisition device, including: determining that the first voice acquisition device and the second voice acquisition device are the target voice acquisition device when the text repetition rate between the first far-track data set and the second near-track data set is greater than or equal to a first preset threshold value and the text repetition rate between the first near-track data set and the second far-track data set is greater than or equal to a second preset threshold value; wherein the preset threshold includes the first preset threshold and the second preset threshold.
As an alternative embodiment, the wearer's voice captured by a voice acquisition device forms the near audio track, and the voices of other people form the far audio track. For example, the voice of staff A captured by the first voice acquisition device, which staff A wears, is the near audio track; that is, the first near audio track data set is the speech uttered by staff A, and all other speech captured by the first voice acquisition device is far audio track data. Similarly, the voice of staff B captured by the second voice acquisition device, which staff B wears, is the near audio track, and all other speech captured by the second voice acquisition device is far audio track data. In this embodiment, if staff A and staff B talk to each other, the text repetition rate between the first near audio track data set acquired by the first voice acquisition device and the second far audio track data set acquired by the second voice acquisition device will be high, and so will the text repetition rate between the first far audio track data set acquired by the first voice acquisition device and the second near audio track data set acquired by the second voice acquisition device.
In this embodiment, whether a conversation exists between the staff member a and the staff member B may be determined by setting a first preset threshold value and a second preset threshold value, and if the text repetition rate between the first near-track data set collected by the first voice collecting device and the second far-track data set collected by the second voice collecting device is greater than or equal to the first preset threshold value, and the text repetition rate between the first far-track data set collected by the first voice collecting device and the second near-track data set collected by the second voice collecting device is greater than or equal to the second preset threshold value, it is determined that a conversation exists between the staff member a and the staff member B, and a possibility of private conversation exists.
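The near/far cross check in the paragraph above can be illustrated with a short sketch. It assumes the four tracks have already been transcribed, and again uses the `difflib.SequenceMatcher` ratio as a stand-in for the undefined text repetition rate. `DeviceTranscripts`, `in_conversation`, the sample sentences and the threshold values are hypothetical; the assignment of the first and second thresholds to the two pairings follows that paragraph.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class DeviceTranscripts:
    near: str  # text recognized from the wearer's own speech
    far: str   # text recognized from other people's speech


def repetition_rate(text_a: str, text_b: str) -> float:
    """Stand-in similarity score in [0.0, 1.0]; not the patent's own formula."""
    if not text_a or not text_b:
        return 0.0
    return SequenceMatcher(None, text_a, text_b, autojunk=False).ratio()


def in_conversation(first: DeviceTranscripts, second: DeviceTranscripts,
                    first_threshold: float = 0.6,
                    second_threshold: float = 0.6) -> bool:
    """True only when both cross comparisons reach their thresholds."""
    # What the first wearer said should appear as "other speech" on the second device.
    near_vs_far = repetition_rate(first.near, second.far)
    # What the first device heard from others should match the second wearer's speech.
    far_vs_near = repetition_rate(first.far, second.near)
    return near_vs_far >= first_threshold and far_vs_near >= second_threshold


# Hypothetical transcripts: A and B talk to each other, C talks to a customer.
staff_a = DeviceTranscripts(near="are you free for lunch", far="sure let us go at noon")
staff_b = DeviceTranscripts(near="sure let us go at noon", far="are you free for lunch")
staff_c = DeviceTranscripts(near="welcome in", far="thanks")

print(in_conversation(staff_a, staff_b))
print(in_conversation(staff_a, staff_c))
```

Requiring both directions to match reduces false positives from two devices that merely overhear the same third voice, since that speech would land in both far tracks rather than in one near and one far track.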
Optionally, comparing the text repetition rates between the N groups of audio track data sets includes: converting each of the N groups of audio track data sets into audio track recognition text, to obtain N groups of audio track recognition texts.
As an alternative embodiment, the voice data collected by a voice acquisition device may be converted into text data. The text repetition rate between audio track data sets can then be determined by comparing the resulting texts.
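The description does not say how the text repetition rate is computed. One possible reading, shown here purely as an assumption, is the share of characters that two transcripts have in common, which Python's standard `difflib.SequenceMatcher` reports as 2·M/T (M matched characters, T total characters in both texts). The function name `text_repetition_rate` is hypothetical.

```python
from difflib import SequenceMatcher


def text_repetition_rate(text_a: str, text_b: str) -> float:
    """Return a score in [0.0, 1.0] for how much two transcripts overlap.

    Assumed definition: 2 * matched_characters / total_characters, as
    computed by difflib. An empty transcript never counts as a repeat.
    """
    if not text_a or not text_b:
        return 0.0
    # autojunk=False keeps frequent characters from being ignored in long texts.
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    return matcher.ratio()


print(text_repetition_rate("see you after work", "see you after work"))
print(text_repetition_rate("abc", "xyz"))
```

Speech recognition rarely produces identical text from two microphones, so a fuzzy score of this kind combined with a threshold below 1.0 is more practical than an exact equality test.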
Optionally, after the determining that the voice acquisition device corresponding to the track data set with the text repetition rate greater than or equal to the preset threshold is the target voice acquisition device, the method further includes: determining the identity of the target voice acquisition equipment; recording the identity and sending the identity to a management terminal.
As an alternative embodiment, each voice acquisition device has an identity, which may be the device's label and which uniquely identifies the device. In this embodiment, the identity of a voice acquisition device may be bound to the identity information of the staff member wearing it; for example, the device label is bound to the staff member's work number. If the text repetition rate between the first audio track data set acquired by the first voice acquisition device and the second audio track data set acquired by the second voice acquisition device is greater than or equal to a preset threshold, the first and second voice acquisition devices are determined to be target voice acquisition devices. Their identities are recorded, and the identity information of the staff bound to those identities is determined, so that a manager at the management terminal can see which staff members have been chatting privately and slacking off. This achieves the technical effect of supervising the working state of staff and improving their working efficiency.
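The binding between device labels and work numbers can be pictured as a simple lookup table. The sketch below only builds the record that would be sent to the management terminal; how it is transmitted is not described in the source and is left out. The function `build_report`, the field names and the sample labels and work numbers are hypothetical.

```python
def build_report(target_devices: set[str], bindings: dict[str, str]) -> list[dict]:
    """Resolve each target device label to the work number bound to it.

    `bindings` maps a device label to a staff work number. A device with no
    binding is still reported, with `employee_id` set to None.
    """
    return [
        {"device_id": device_id, "employee_id": bindings.get(device_id)}
        for device_id in sorted(target_devices)  # sorted for a stable record order
    ]


# Hypothetical bindings and detection result.
bindings = {"001": "E1001", "002": "E1002", "003": "E1003"}
report = build_report({"002", "001"}, bindings)
print(report)
```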
Optionally, after the determining that the voice acquisition device corresponding to the track data set with the text repetition rate greater than or equal to the preset threshold is the target voice acquisition device, the method further includes: and sending indication information to the target voice acquisition equipment so as to indicate the target voice acquisition equipment to send alarm information.
As an alternative embodiment, if the text repetition rate between the first audio track data set acquired by the first voice acquisition device and the second audio track data set acquired by the second voice acquisition device is greater than or equal to a preset threshold, the first and second voice acquisition devices are determined to be target voice acquisition devices. Indication information is then sent to both devices to instruct them to issue alarm information, so that employee A, wearing the first voice acquisition device, and employee B, wearing the second voice acquisition device, stop their conversation. This reminds the employees to concentrate on their work.
The present application is described below by way of a specific example.
The main problem of store attendant management difficulty is that staff talk with each other privately, passive idle work, customer business entrance experience is poor, and through wearing a multi-microphone recording device by every attendant in this embodiment, sound collection is carried out, including wearer himself and the speaker, carries out the record file to different speakers and separately stores, and the wearer stores as near audio track file, and the speaker stores as far audio track file. The recording equipment starts up to automatically record, the recording files are uploaded to the server in real time, the server algorithm carries out real-time transcription, comparison and identification on the recording files, and the near track of the equipment 1 and the far track of the equipment 2, the far track of the equipment 1 and the near track of the equipment 2 are compared and identified, if the comparison sounds of the two files are the same or close to the same, the wearers of the equipment 1 and the equipment 2 can be judged to chat in a short distance, so that the server gives early warning, and a manager is informed of taking corresponding measures in real time.
Fig. 3 is a schematic flow chart according to an alternative embodiment of the present invention. As shown in fig. 3, the working state monitoring method based on voice recognition provided in this embodiment includes the following steps:
During working time, each recording device is worn by a service person and records both the wearer's voice and the voices of surrounding conversations. The microphone array in each recording device captures two tracks, a far track and a near track, so each recording device acquires two kinds of voice data: far track data and near track data.
The recording device is connected to the server and uploads voice data carrying its device number to the server. The server performs voice transcription: for each device number, it runs voice recognition separately on the two tracks of data acquired by that device to obtain two recognition texts, namely a near track recognition text and a far track recognition text.
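The upload and transcription step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the `Upload` record, the `transcribe_by_device` function, and the injected `recognize` callable are hypothetical names, and the source does not specify the audio format or the speech recognizer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Upload:
    """One uploaded clip (hypothetical record layout)."""
    device_number: str  # e.g. "001"
    track: str          # "near" or "far"
    audio: bytes        # raw audio payload; format is not specified in the source


def transcribe_by_device(
    uploads: List[Upload],
    recognize: Callable[[bytes], str],
) -> Dict[str, Dict[str, str]]:
    """Return {device_number: {"near": text, "far": text}}.

    `recognize` stands in for whatever speech recognizer the server uses.
    """
    texts: Dict[str, Dict[str, str]] = {}
    for up in uploads:
        per_device = texts.setdefault(up.device_number, {"near": "", "far": ""})
        recognized = recognize(up.audio)
        # Concatenate successive clips of the same track in arrival order.
        per_device[up.track] = (per_device[up.track] + " " + recognized).strip()
    return texts
```

Passing the recognizer in as a parameter keeps the grouping logic testable without a real speech recognition engine.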
The server then performs comparison and identification by cross-comparing the recognition texts in pairs according to device number. For example, the server receives voice data with device number 001 and voice data with device number 002. It compares the near track recognition text of 001 with the far track recognition text of 002 and detects a first repetition degree of the two texts; it compares the far track recognition text of 001 with the near track recognition text of 002 and detects a second repetition degree of the two texts. When the first repetition degree and the second repetition degree are both greater than a preset threshold (the preset threshold of the repetition degree includes a preset text repetition rate threshold and a preset repeated text number threshold), the recording devices corresponding to 001 and 002 are determined to be in an abnormal state, that is, the wearers of 001 and 002 are chatting at close range. The result of the identification and comparison is then output to provide a data basis for the manager, who can learn the working state of the service personnel from the abnormal-state results; this in turn provides data support for performance evaluation of the service personnel and the like.
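The cross comparison for devices 001 and 002 can be sketched as follows. The source does not define how the repetition degree is computed, so this sketch assumes one possible measure: the number of whitespace-separated tokens that match in order (found with Python's `difflib.SequenceMatcher`), and that count divided by the length of the shorter text. The function names and the default threshold values are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Tuple


def repetition(text_a: str, text_b: str) -> Tuple[float, int]:
    """Return (repetition rate, repeated token count) for two texts.

    Assumed measure: tokens matched in order, divided by the shorter length.
    """
    a, b = text_a.split(), text_b.split()
    if not a or not b:
        return 0.0, 0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    repeated = sum(block.size for block in matcher.get_matching_blocks())
    return repeated / min(len(a), len(b)), repeated


def is_abnormal(
    near_1: str, far_1: str, near_2: str, far_2: str,
    rate_threshold: float = 0.6,   # hypothetical value
    count_threshold: int = 5,      # hypothetical value
) -> bool:
    """Cross-compare two devices' recognition texts."""
    first = repetition(near_1, far_2)    # first repetition degree
    second = repetition(far_1, near_2)   # second repetition degree
    # Strictly greater, following the wording of this example.
    return all(
        rate > rate_threshold and count > count_threshold
        for rate, count in (first, second)
    )
```

Whitespace tokenization is only suitable for languages written with spaces; transcripts in Chinese would need character-level or segmenter-based tokens instead.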
This embodiment can also remind service personnel in real time: when the comparison result is identified as an abnormal state, a prompt instruction is generated and sent to the two corresponding recording devices. The prompt may be a voice prompt, a light prompt, a vibration prompt, or the like. This embodiment can thereby improve the efficiency of chain store staff management, provide quantitative data for evaluating store staff performance, and improve staff work efficiency.
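Generating the prompt instruction for the two devices might look like the sketch below. The message layout (a JSON object with `device_number`, `action`, and `prompt_type` fields) and the function name are assumptions made for illustration; the source only states that a prompt instruction is generated and sent, and that it may be a voice, light, or vibration prompt.

```python
import json
from typing import List

# Prompt kinds named in the text; the string values are hypothetical.
PROMPT_TYPES = ("voice", "light", "vibration")


def build_prompt_instructions(
    device_numbers: List[str], prompt_type: str = "vibration"
) -> List[str]:
    """Build one JSON prompt instruction per abnormal device."""
    if prompt_type not in PROMPT_TYPES:
        raise ValueError(f"unsupported prompt type: {prompt_type!r}")
    return [
        json.dumps(
            {"device_number": number, "action": "prompt", "prompt_type": prompt_type},
            sort_keys=True,
        )
        for number in device_numbers
    ]
```

How the instruction reaches the device (for example over the same connection used for uploads) is left open, as the source does not describe the transport.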
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
This embodiment also provides a determining apparatus of a voice acquisition device, which is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated here. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a block diagram of a determining apparatus of a voice acquisition device according to an embodiment of the present invention. As shown in fig. 4, the apparatus includes: an acquisition module 42, configured to acquire N groups of audio track data sets obtained by N voice acquisition devices respectively performing voice acquisition, where one voice acquisition device corresponds to one group of audio track data sets; a comparison module 44, configured to compare text repetition rates between the N groups of audio track data sets; and a determining module 46, configured to determine the voice acquisition device corresponding to an audio track data set whose text repetition rate is greater than or equal to a preset threshold as a target voice acquisition device.
Optionally, the N voice acquisition devices include a first voice acquisition device and a second voice acquisition device, where the first voice acquisition device acquires a first audio track data set and the second voice acquisition device acquires a second audio track data set, and the determining module includes: a first determining unit, configured to determine that the first voice acquisition device and the second voice acquisition device are the target voice acquisition devices when the text repetition rate between the first audio track data set and the second audio track data set is greater than or equal to the preset threshold.
Optionally, the first audio track data set includes a first far track data set and a first near track data set, and the second audio track data set includes a second far track data set and a second near track data set. The above apparatus is further configured to determine the voice acquisition device corresponding to an audio track data set whose text repetition rate is greater than or equal to the preset threshold as the target voice acquisition device in the following manner: when the text repetition rate between the first far track data set and the second near track data set is greater than or equal to a first preset threshold, and the text repetition rate between the first near track data set and the second far track data set is greater than or equal to a second preset threshold, determining that the first voice acquisition device and the second voice acquisition device are the target voice acquisition devices; wherein the preset threshold includes the first preset threshold and the second preset threshold.
Optionally, the above apparatus is further configured to compare the text repetition rates between the N groups of audio track data sets in the following manner: converting each of the N groups of audio track data sets into audio track recognition text, to obtain N groups of audio track recognition texts.
Optionally, the apparatus is further configured to, after the voice acquisition device corresponding to the audio track data set whose text repetition rate is greater than or equal to the preset threshold is determined to be the target voice acquisition device, determine an identity of the target voice acquisition device, record the identity, and send the identity to a management terminal.
Optionally, the apparatus is further configured to, after the voice acquisition device corresponding to the audio track data set whose text repetition rate is greater than or equal to the preset threshold is determined to be the target voice acquisition device, send indication information to the target voice acquisition device to instruct the target voice acquisition device to issue alarm information.
It should be noted that each of the above modules may be implemented in software or hardware. For the latter, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.
An embodiment of the invention also provides a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
Optionally, in this embodiment, the above storage medium may be configured to store a computer program for performing the following steps:
S1, acquiring N groups of audio track data sets obtained by N voice acquisition devices respectively performing voice acquisition, wherein one voice acquisition device corresponds to one group of audio track data sets, and N is an integer greater than 1;
S2, comparing text repetition rates between the N groups of audio track data sets;
S3, determining the voice acquisition device corresponding to an audio track data set whose text repetition rate is greater than or equal to a preset threshold as a target voice acquisition device.
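For N devices, the three steps could be sketched as below, assuming the audio track data sets have already been converted to recognition texts. The repetition-rate measure (in-order matching tokens divided by the shorter text's length), the function names, and the default thresholds are assumptions made for illustration; the source leaves the measure and the threshold values open.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, Set


def text_repetition_rate(text_a: str, text_b: str) -> float:
    """Assumed measure: in-order matching tokens / length of shorter text."""
    a, b = text_a.split(), text_b.split()
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    repeated = sum(block.size for block in matcher.get_matching_blocks())
    return repeated / min(len(a), len(b))


def find_target_devices(
    track_texts: Dict[str, Dict[str, str]],
    first_threshold: float = 0.6,   # hypothetical value
    second_threshold: float = 0.6,  # hypothetical value
) -> Set[str]:
    """track_texts maps device id -> {"near": text, "far": text} (step S1).

    Every pair of devices is cross-compared (S2); a pair whose two
    rates both reach their thresholds is marked as target devices (S3).
    """
    targets: Set[str] = set()
    for dev_a, dev_b in combinations(sorted(track_texts), 2):
        a, b = track_texts[dev_a], track_texts[dev_b]
        far_near = text_repetition_rate(a["far"], b["near"])
        near_far = text_repetition_rate(a["near"], b["far"])
        if far_near >= first_threshold and near_far >= second_threshold:
            targets.update((dev_a, dev_b))
    return targets
```

Comparing every pair costs N(N-1)/2 comparisons; a deployment with many devices might restrict comparisons to devices in the same store or time window, which the source does not discuss.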
Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing a computer program.
An embodiment of the invention also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the above processor may be configured to execute the following steps by means of a computer program:
S1, acquiring N groups of audio track data sets obtained by N voice acquisition devices respectively performing voice acquisition, wherein one voice acquisition device corresponds to one group of audio track data sets, and N is an integer greater than 1;
S2, comparing text repetition rates between the N groups of audio track data sets;
S3, determining the voice acquisition device corresponding to an audio track data set whose text repetition rate is greater than or equal to a preset threshold as a target voice acquisition device.
Optionally, for specific examples of this embodiment, refer to the examples described in the foregoing embodiments and optional implementations; they are not repeated here.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented on a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from the one given here; alternatively, the modules or steps may each be fabricated as an individual integrated circuit module, or several of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description covers only the preferred embodiments of the present invention and is not intended to limit the present invention; those skilled in the art can make various modifications and variations to the present invention. Any modification, equivalent replacement, improvement, etc. made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for determining a voice acquisition device, comprising:
acquiring N groups of audio track data sets obtained by N voice acquisition devices respectively performing voice acquisition, wherein one voice acquisition device corresponds to one group of audio track data sets, and N is an integer greater than 1;
comparing text repetition rates between the N groups of audio track data sets;
determining the voice acquisition device corresponding to an audio track data set whose text repetition rate is greater than or equal to a preset threshold as a target voice acquisition device.
2. The method of claim 1, wherein the N voice acquisition devices include a first voice acquisition device and a second voice acquisition device, the first voice acquisition device acquiring a first audio track data set and the second voice acquisition device acquiring a second audio track data set, wherein determining the voice acquisition device corresponding to the audio track data set whose text repetition rate is greater than or equal to the preset threshold as the target voice acquisition device includes:
when the text repetition rate between the first audio track data set and the second audio track data set is greater than or equal to the preset threshold, determining that the first voice acquisition device and the second voice acquisition device are the target voice acquisition devices.
3. The method of claim 2, wherein the first audio track data set comprises a first far track data set and a first near track data set, and the second audio track data set comprises a second far track data set and a second near track data set, wherein determining the voice acquisition device corresponding to the audio track data set whose text repetition rate is greater than or equal to the preset threshold as the target voice acquisition device comprises:
determining that the first voice acquisition device and the second voice acquisition device are the target voice acquisition device when the text repetition rate between the first far-track data set and the second near-track data set is greater than or equal to a first preset threshold value and the text repetition rate between the first near-track data set and the second far-track data set is greater than or equal to a second preset threshold value;
wherein the preset threshold includes the first preset threshold and the second preset threshold.
4. The method of claim 1, wherein comparing text repetition rates between the N groups of audio track data sets comprises:
converting each of the N groups of audio track data sets into audio track recognition text, to obtain N groups of audio track recognition texts.
5. The method according to any one of claims 1 to 4, wherein after the determining that the voice capture device corresponding to the track data set having the text repetition rate greater than or equal to the preset threshold is the target voice capture device, the method further comprises:
determining the identity of the target voice acquisition equipment;
recording the identity and sending the identity to a management terminal.
6. The method according to any one of claims 1 to 4, wherein after determining the voice acquisition device corresponding to the audio track data set whose text repetition rate is greater than or equal to the preset threshold as the target voice acquisition device, the method further comprises:
sending indication information to the target voice acquisition device to instruct the target voice acquisition device to issue alarm information.
7. A determining apparatus of a voice acquisition device, characterized by comprising:
an acquisition module, configured to acquire N groups of audio track data sets obtained by N voice acquisition devices respectively performing voice acquisition, wherein one voice acquisition device corresponds to one group of audio track data sets;
a comparison module, configured to compare text repetition rates between the N groups of audio track data sets;
a determining module, configured to determine the voice acquisition device corresponding to an audio track data set whose text repetition rate is greater than or equal to a preset threshold as a target voice acquisition device.
8. The apparatus of claim 7, wherein the N voice acquisition devices comprise a first voice acquisition device that acquires a first audio track data set and a second voice acquisition device that acquires a second audio track data set, wherein the determining module comprises:
a first determining unit, configured to determine that the first voice acquisition device and the second voice acquisition device are the target voice acquisition devices when the text repetition rate between the first audio track data set and the second audio track data set is greater than or equal to the preset threshold.
9. A storage medium having stored therein a computer program, wherein the program is executable by a terminal device or a computer to perform the method of any of claims 1 to 6.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the method of any of the claims 1 to 6.
CN202011128284.4A 2020-10-20 2020-10-20 Method and device for determining voice acquisition equipment Active CN112269897B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011128284.4A CN112269897B (en) 2020-10-20 2020-10-20 Method and device for determining voice acquisition equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011128284.4A CN112269897B (en) 2020-10-20 2020-10-20 Method and device for determining voice acquisition equipment

Publications (2)

Publication Number Publication Date
CN112269897A CN112269897A (en) 2021-01-26
CN112269897B true CN112269897B (en) 2024-04-05

Family

ID=74341303

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011128284.4A Active CN112269897B (en) 2020-10-20 2020-10-20 Method and device for determining voice acquisition equipment

Country Status (1)

Country Link
CN (1) CN112269897B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110634485A (en) * 2019-10-16 2019-12-31 声耕智能科技(西安)研究院有限公司 Voice interaction service processor and processing method
CN110751948A (en) * 2019-10-18 2020-02-04 珠海格力电器股份有限公司 Voice recognition method, device, storage medium and voice equipment
CN111128250A (en) * 2019-12-18 2020-05-08 秒针信息技术有限公司 Information processing method and device
CN111145758A (en) * 2019-12-25 2020-05-12 厦门快商通科技股份有限公司 Voiceprint recognition method, system, mobile terminal and storage medium
CN111145774A (en) * 2019-12-09 2020-05-12 秒针信息技术有限公司 Voice separation method and device
CN111261149A (en) * 2018-11-30 2020-06-09 海马新能源汽车有限公司 Voice information recognition method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4364251B2 (en) * 2007-03-28 2009-11-11 株式会社東芝 Apparatus, method and program for detecting dialog

Also Published As

Publication number Publication date
CN112269897A (en) 2021-01-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant