CN104157286A - Idiomatic phrase acquisition method and device
Publication number: CN104157286A (application CN201410374537.4A); Authority: CN (China); Legal status: Granted
Abstract
The embodiment of the invention discloses an idiomatic phrase acquisition method, which comprises the following steps: if a voice signal from a user is detected, voice data corresponding to the voice signal is acquired; according to a preset voice byte threshold, a number of target voice bytes corresponding to the threshold is screened out from the voice data; and the target voice bytes are parsed to obtain a parsing result containing the user's idiomatic phrases. The embodiment of the invention also discloses an idiomatic phrase acquisition device. By adopting the method and device, the idiomatic phrases of a given user can be obtained in a targeted way.
Description
Technical Field
The present invention relates to the field of media technologies, and in particular, to a method and an apparatus for acquiring idiomatic phrases.
Background
In daily life, people inevitably need to communicate with others. Everyone has his or her own speaking habits, so certain idiomatic phrases are carried into conversation. Some of these phrases are uncivil expressions that can damage the communication atmosphere; in a relatively formal situation, for example, an uncivil phrase blurted out unconsciously disturbs the harmony of the conversation and may leave the speaker with a negative impression or even cause real losses. It is therefore important to grasp a user's speaking habits in time. In the prior art, however, a user's speaking habits are not analyzed, and the idiomatic phrases of a given user cannot be obtained through current communication tools.
Disclosure of Invention
The embodiments of the invention provide a method and a device for acquiring idiomatic phrases, which can obtain the idiomatic phrases of a given user in a targeted manner.
The embodiment of the invention provides a method for acquiring idiomatic phrases, which comprises the following steps:
if a voice signal sent by a user is detected, acquiring voice data corresponding to the voice signal;
screening out, from the voice data according to a preset voice byte threshold, target voice bytes whose number corresponds to the threshold;
and parsing the target voice bytes to obtain a parsing result containing the user's idiomatic phrases.
Correspondingly, the embodiment of the invention also provides a device for acquiring idiomatic phrases, which comprises:
a first acquiring unit, configured to acquire voice data corresponding to a voice signal if the voice signal sent by a user is detected;
a screening unit, configured to screen out, from the voice data acquired by the first acquiring unit according to a preset voice byte threshold, target voice bytes whose number corresponds to the threshold;
and a second acquiring unit, configured to parse the target voice bytes screened out by the screening unit and obtain a parsing result containing the user's idiomatic phrases.
According to the embodiments of the invention, when a voice signal sent by a user is detected, the corresponding voice data can be obtained, and the target voice bytes screened out from the voice data are parsed, so that the current user's idiomatic phrases are obtained. The idiomatic phrases of a given user can thus be acquired in a targeted manner, with high flexibility.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an idiomatic phrase acquisition method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another idiomatic phrase acquisition method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for obtaining target voice bytes according to an embodiment of the present invention;
FIG. 4 is an interaction diagram of an idiomatic phrase acquisition method according to an embodiment of the present invention;
FIG. 5 is a flowchart of an idiomatic phrase acquisition method according to another embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an idiomatic phrase acquisition apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of another idiomatic phrase acquisition apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of yet another idiomatic phrase acquisition apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a server according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of an idiomatic phrase acquisition system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments derived by those skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Please refer to fig. 1, which is a flowchart of an idiomatic phrase acquisition method according to an embodiment of the present invention. The method may be applied to a terminal device such as a mobile phone, a tablet computer, or a wearable device, or to a server; the embodiment of the present invention is not limited in this respect. Specifically, the method comprises the following steps:
S101: if a voice signal sent by a user is detected, acquire voice data corresponding to the voice signal.
In a specific embodiment, whether a voice signal sent by a user currently exists can be detected; when such a signal is detected, acquisition of the corresponding voice data is triggered, for example by recording.
Further, before the voice data is acquired, it may also be detected whether the user currently sending the voice signal is a legitimate user of the current terminal, for example by matching against a preset voice sample. The voice sample is a sound fragment of the legitimate user and may be pre-recorded by that user.
S102: screen out, from the voice data according to a preset voice byte threshold, target voice bytes whose number corresponds to the threshold.
In a specific embodiment, a voice byte threshold may be preset, and target voice bytes are extracted from the acquired voice data according to this threshold. Generally, each word spoken by the user corresponds to one voice byte; for example, if the user says "how are you", this corresponds to three voice bytes.
Alternatively, the acquired voice data may be a single sentence, and the number of voice bytes corresponding to the threshold may be extracted from a specific position of the sentence, such as its beginning and/or end, as the target voice bytes. That is, target voice bytes can be screened out each time a sentence is obtained, for example each time a sentence is recorded, so as to accumulate a certain number of target voice bytes. Individual sentences can be distinguished by a preset pause time interval.
Further optionally, the acquired voice data may be a passage of speech (i.e., composed of multiple sentences), which can be segmented according to a preset pause time interval into multiple voice segments (one voice segment corresponding to one sentence). Accordingly, if the voice byte threshold is set to 5, five voice bytes can be extracted from a specific position of each voice segment as target voice bytes, for example the first five and/or last five bytes of the segment, so as to obtain a plurality of target voice bytes, as the sketch below illustrates.
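A minimal sketch of this head/tail screening step (step S102), assuming each sentence has already been converted into a list of voice bytes, one byte per spoken word; the function name and data layout are illustrative assumptions, not taken from the patent:

    def extract_target_bytes(sentences, byte_threshold=5):
        """Take the first and last `byte_threshold` voice bytes of each sentence."""
        targets = []
        for sentence in sentences:
            if len(sentence) < byte_threshold:
                continue  # too short to yield a full-length fragment
            targets.append(tuple(sentence[:byte_threshold]))   # beginning of the sentence
            targets.append(tuple(sentence[-byte_threshold:]))  # end of the sentence
        return targets

Fragments are stored as tuples so that they can later be counted directly, e.g. with collections.Counter.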
S103: parse the target voice bytes and obtain a parsing result containing the user's idiomatic phrases.
Specifically, if parsing finds identical target voice bytes, that is, certain target voice bytes appear repeatedly, the number of occurrences of those voice bytes (the repetition count) is calculated; when the repetition count exceeds a preset count threshold, for example 5, the corresponding target voice bytes are stored as an idiomatic phrase of the user.
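The counting in S103 can be sketched as follows, under the same assumptions as above; whether the count must "reach" or "exceed" the threshold varies between passages of the text, so the sketch uses "reaches":

    from collections import Counter

    def find_idioms(target_bytes, repeat_threshold=5):
        """Keep every fragment whose repetition count reaches the threshold."""
        counts = Counter(target_bytes)
        return {frag: n for frag, n in counts.items() if n >= repeat_threshold}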
Furthermore, the parsed user idiomatic phrases and their repetition counts can be pushed to the current terminal.
Furthermore, when the user is subsequently detected sending a voice signal whose corresponding voice data matches an idiomatic phrase obtained by the parsing, a message prompt is issued to remind the user about the utterance.
By implementing this embodiment of the invention, when a voice signal sent by a user is detected, the corresponding voice data can be obtained, and the current user's idiomatic phrases can be obtained by parsing the target voice bytes screened out from the voice data; the idiomatic phrases of a given user can thus be acquired in a targeted manner, with high flexibility.
Referring to fig. 2, a flowchart of another idiomatic phrase acquisition method according to an embodiment of the present invention is shown. Specifically, the method comprises:
S201: if a voice signal sent by a user is detected, acquire the sound attributes corresponding to the voice signal.
S202: judge whether the sound attributes corresponding to the voice signal match the sound attributes corresponding to a preset voice sample.
In a specific embodiment, a voice sample may be preset; the voice sample is a sound fragment of a legitimate user and may be recorded in advance by the current legitimate user.
S203: if they match, acquire the voice data corresponding to the voice signal.
Specifically, when a voice signal sent by a user is detected, that is, when someone is detected speaking, the legitimacy of the current user's identity is determined by matching the sound attributes of the voice signal against those of the voice sample, for example by judging whether their timbre and frequency match. When the result is a match, i.e., the current user's identity is legitimate, acquisition of the voice data corresponding to the voice signal is triggered. The sound attributes may include speech rate, timbre, frequency, and the like. A sketch of such a match appears below.
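The patent does not specify how timbre or frequency are represented, so the following sketch assumes each sound attribute has been reduced to a single number and simply compares it with the enrolled sample within a tolerance; real speaker verification would use proper acoustic features:

    def attributes_match(signal_attrs, sample_attrs, tolerances):
        """Each argument is a dict such as {'rate': 3.2, 'freq': 205.0}."""
        return all(abs(signal_attrs[name] - sample_attrs[name]) <= tol
                   for name, tol in tolerances.items())

    # Hypothetical usage: accept the speaker if the fundamental frequency is
    # within 20 Hz of the sample and the speech rate within 1 word per second.
    is_legitimate = attributes_match({'freq': 205.0, 'rate': 3.2},
                                     {'freq': 198.0, 'rate': 3.0},
                                     {'freq': 20.0, 'rate': 1.0})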
S204: segment the voice data according to a preset pause time interval to obtain voice segments.
When the user sending the voice signal is determined to be a legitimate user, the corresponding voice data can be obtained, for example by recording. The voice data may be a whole passage of speech, i.e., it contains multiple voice segments, and it can be segmented in a preset manner, for example according to a preset pause time interval between voice bytes, such as 200 ms, to obtain voice segments (each of which may correspond to one sentence). Further, if the currently recorded voice data is only a single utterance, that utterance may be treated as one voice segment; that is, each recorded utterance yields one voice segment, until a preset threshold number of voice segments has been accumulated.
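A sketch of this pause-based segmentation, assuming the recognizer yields each voice byte together with its start and end time in milliseconds (an assumption, since the patent leaves the input representation open); a gap of at least pause_ms closes the current segment:

    def segment_by_pause(timed_bytes, pause_ms=200):
        """timed_bytes: list of (voice_byte, start_ms, end_ms), in time order."""
        segments, current, prev_end = [], [], None
        for byte, start, end in timed_bytes:
            if prev_end is not None and start - prev_end >= pause_ms:
                segments.append(current)  # pause detected: sentence boundary
                current = []
            current.append(byte)
            prev_end = end
        if current:
            segments.append(current)
        return segments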
S205: according to a preset voice byte threshold, extract from the beginning or end of each voice segment the number of voice bytes corresponding to the threshold as target voice bytes.
In a specific embodiment, a voice byte threshold may be preset, and target voice bytes are extracted from a specific position of each segmented voice segment, such as its beginning and/or end, according to this threshold. For example, if the voice byte threshold is set to 5, the first five bytes and the last five bytes of a voice segment can both be extracted as target voice bytes, so as to obtain a plurality of target voice bytes.
Further, the voice byte threshold may be decremented sequentially, for example from 5 down to 4, 3, 2, and 1, with target voice bytes corresponding to each threshold repeatedly extracted from the beginning and end of each voice segment until the threshold reaches 0; that is, 5, 4, 3, 2, and 1 voice bytes are successively extracted from the beginning and end of each voice segment as target voice bytes, so as to obtain target voice bytes of different lengths.
S206: calculate the repetition count of the target voice bytes and record it.
S207: if the repetition count reaches a preset second count threshold, take the target voice bytes as an idiomatic phrase of the user and store it.
Specifically, if identical target voice bytes are found among the extracted target voice bytes, the number of occurrences of those voice bytes, i.e., the repetition count, is calculated; when the repetition count exceeds a preset count threshold, for example 5, the corresponding target voice bytes are stored as an idiomatic phrase of the user, so that the user can query the parsing result, or the parsing result containing the user's idiomatic phrases can be pushed to the user directly.
Optionally, a reminding time may be preset, for example nine o'clock every evening; when the reminding time arrives, result information such as the user's idiomatic phrases and their repetition counts, obtained from the parsing result, is pushed to the current terminal.
In a specific embodiment, a prohibited speech library may also be preset. The library holds voice segments carrying prohibition indications, i.e., common uncivil utterances such as vulgar interjections and curse words. Optionally, if a parsed idiomatic phrase consists of voice bytes that need to be prohibited, such as an uncivil utterance, a prohibition instruction may be generated, and the idiomatic phrase carrying the prohibition instruction may be added to the prohibited speech library as a prohibited voice segment.
Further, if the voice data corresponding to a voice signal sent by the user is detected to match any voice segment in the prohibited speech library, a message prompt may be issued to remind the user about the utterance. Specifically, the message prompt may take the form of a short message, a ring tone, or a vibration, which the embodiment of the present invention does not limit. A sketch of this check follows.
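A hedged sketch of the prohibited-phrase check: the library is modelled as a set of voice-byte fragments and the message prompt as a callback, since the patent leaves both representations open:

    PROHIBITED_LIBRARY = set()  # fragments carrying a prohibition indication

    def add_prohibited(fragment):
        PROHIBITED_LIBRARY.add(tuple(fragment))

    def check_and_remind(voice_bytes, remind):
        """Call `remind` (e.g. show a message, ring, vibrate) on a match."""
        n = len(voice_bytes)
        for frag in PROHIBITED_LIBRARY:
            k = len(frag)
            # naive scan for the fragment anywhere in the utterance
            if any(tuple(voice_bytes[i:i + k]) == frag for i in range(n - k + 1)):
                remind(frag)
                return True
        return False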
By implementing this embodiment of the invention, acquisition of the corresponding voice data is triggered only after the identity of the user currently sending the voice signal is verified as legitimate; the voice data is segmented into voice segments, representative utterances are screened out from the beginning and/or end of each segment, and the current user's idiomatic phrases are thereby obtained by parsing and pushed to the relevant user in a targeted manner. The user can furthermore be reminded whenever he or she is subsequently detected uttering such a pet phrase.
Referring to fig. 3, a flowchart of a method for obtaining target voice bytes according to an embodiment of the present invention is shown. Specifically, the method comprises:
S301: screen out, from the voice segments, target voice segments whose number of voice bytes is greater than or equal to a preset voice byte threshold.
S302: if the number of screened target voice segments is not smaller than a preset first count threshold, extract from the beginning or end of each target voice segment the number of voice bytes corresponding to the voice byte threshold as target voice bytes.
For example, if the voice byte threshold is set to 5 and the count threshold for voice segments is set to 6, voice segments containing at least five voice bytes can be screened out; once six such segments have been collected, extraction of the first five and/or last five voice bytes of those six segments as target voice bytes is triggered, as sketched below.
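A sketch of S301/S302 under the same assumptions as the earlier snippets: keep only segments long enough for the current byte threshold, and extract heads and tails once enough such segments (the first count threshold, 6 in this example) have accumulated; the function name is an illustrative assumption:

    def screen_and_extract(segments, byte_threshold=5, segment_threshold=6):
        candidates = [s for s in segments if len(s) >= byte_threshold]
        if len(candidates) < segment_threshold:
            return None  # not enough material yet; keep accumulating
        targets = []
        for seg in candidates[:segment_threshold]:
            targets.append(tuple(seg[:byte_threshold]))   # first N voice bytes
            targets.append(tuple(seg[-byte_threshold:]))  # last N voice bytes
        return targets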
Optionally, a segmented voice segment whose number of voice bytes is smaller than the preset voice byte threshold may be treated as a hint that an uncivil utterance is about to occur. Such a segment is compared against each voice segment in the preset prohibited speech library; if a match is detected, the segment is recorded as an uncivil utterance together with its occurrence count, so that the user can query it later, or the uncivil utterance and its occurrence count can be pushed to the current user.
S303: decrement the voice byte threshold sequentially, and judge whether the decremented threshold is zero.
Further, the voice byte threshold may be decremented sequentially, for example from 5 down to 4, 3, 2, and 1, with step S302 repeated until the threshold reaches 0; that is, 5, 4, 3, 2, and 1 voice bytes are successively extracted from the beginning and/or end of the screened target voice segments as target voice bytes.
S304: obtain the target voice bytes.
When the voice byte threshold reaches 0, the extraction of target voice bytes ends, yielding target voice bytes of different lengths.
For example, suppose the following voice segments are obtained by screening:
1. This lesson is about to start.
2. Then students quickly review the content to be covered.
3. All right, don't look at it yet.
4. Then open your book and turn to page fifty-five.
5. Then look at the prompts there.
6. This lesson begins.
Here the count threshold for voice segments is 6 and the voice byte threshold is set to 5; that is, six consecutively accumulated sentences form one comparison unit, each sentence containing at least five voice bytes.
For the above six sentences, based on the voice byte threshold of 5, the first five and the last five voice bytes of each sentence are extracted as target voice bytes, and the extracted target voice bytes are then parsed.
In a specific embodiment, the target voice bytes can be parsed by comparing the fragments extracted from the beginnings of the sentences with one another, and likewise the fragments extracted from the ends. For the example above, the six beginning fragments are compared first and none of them are found to be identical; the six ending fragments are then compared and none of them are identical either, so the voice byte threshold is decremented from 5 to 4.
Based on the voice byte threshold of 4, the six beginning fragments are compared again and none are found to be identical; the six ending fragments are likewise compared and none are identical, so the threshold is decremented from 4 to 3, and so on.
When the voice byte threshold has been decremented to 2, the fragment corresponding to "then" is found to occur three times at the beginnings of the six sentences; the voice bytes corresponding to "then" are therefore saved, and the repetition count 3 is recorded, i.e., 3 occurrences.
Finally, the voice byte threshold is decremented from 2 to 1. "This" is found to occur twice at the beginnings of the sentences, so its voice byte is saved with a repetition count of 2. The first voice byte of "then" occurs three times, which is not higher than the repetition count already recorded for "then" itself; since "then" contains that byte, its record can be discarded directly (were its count higher, it would instead be recorded together with that count). A further single voice byte is found four times at the ends of the sentences, so it is saved with a repetition count of 4.
Based on the above parsing, the candidate idiomatic phrases of the user are "this", "then", and the single voice byte found at the sentence ends. Further, if the count threshold for repetitions is set to 3, "then" and that voice byte can be stored as idiomatic phrases of the user.
Further, the above parsing process may be repeated for each subsequent group of six sentences whose voice bytes number at least five, yielding a parsing result containing the user's pet phrases. If a newly detected pet phrase is consistent with an earlier one, its occurrence count is accumulated; if the count exceeds a certain number within a preset time range, for example 20 occurrences within 3 hours, a message is issued to give the current user a stern warning. The whole analysis can be sketched as follows.
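An end-to-end sketch of the decrementing analysis of FIG. 3, including the rule from the example above that a shorter fragment is dropped when a longer recorded fragment contains it and repeats at least as often; the sentences are token lists whose tokens stand in for voice bytes, and all names are illustrative assumptions:

    from collections import Counter

    def analyse(sentences, max_threshold=5, repeat_threshold=3):
        recorded = {}  # fragment -> repetition count
        for n in range(max_threshold, 0, -1):  # thresholds 5, 4, 3, 2, 1
            heads = Counter(tuple(s[:n]) for s in sentences if len(s) >= n)
            tails = Counter(tuple(s[-n:]) for s in sentences if len(s) >= n)
            for frag, count in (heads + tails).items():
                if count < 2:
                    continue  # never repeated: not a candidate
                # drop frag if a longer recorded fragment contains it and
                # repeats at least as often (the "then" rule above)
                contained = any(
                    len(r) > len(frag) and count <= c and
                    any(r[i:i + len(frag)] == frag
                        for i in range(len(r) - len(frag) + 1))
                    for r, c in recorded.items())
                if not contained:
                    recorded[frag] = count
        return {f: c for f, c in recorded.items() if c >= repeat_threshold}

    sentences = [["this", "lesson", "is", "about", "to", "start"],
                 ["then", "students", "review", "the", "content"],
                 ["all", "right", "don't", "look", "yet"],
                 ["then", "open", "your", "book", "to", "page", "fifty-five"],
                 ["then", "look", "at", "the", "prompts", "there"],
                 ["this", "lesson", "begins"]]
    print(analyse(sentences))  # -> {('then',): 3}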
By implementing this embodiment of the invention, voice segments containing at least a certain number of voice bytes can be screened out, target voice bytes of the corresponding lengths can be extracted from the beginning and end of each segment in decreasing order of the preset byte count, and the target voice bytes can be checked for repetitions, so that the current user's idiomatic phrases are obtained by parsing in a highly targeted manner.
Please refer to fig. 4, which is an interaction diagram of an idiomatic phrase acquisition method according to an embodiment of the present invention. The method comprises:
S401: if the terminal detects a voice signal sent by a user, the terminal acquires voice data corresponding to the voice signal.
In a specific embodiment, whether a voice signal sent by a user currently exists can be detected; when such a signal is detected, acquisition of the corresponding voice data is triggered, for example by recording.
Optionally, the step in which the terminal, upon detecting a voice signal sent by the user, acquires the corresponding voice data may specifically be: the terminal acquires the sound attributes corresponding to the voice signal; the terminal judges whether these sound attributes match the sound attributes corresponding to a preset voice sample, where the voice sample is a sound fragment of a legitimate user and the sound attributes include any one or more of speech rate, timbre, and frequency; and if the result is a match, i.e., the current user is detected to be a legitimate user, the terminal triggers acquisition of the voice data corresponding to the voice signal.
S402: the terminal sends the voice data to a server.
S403: the server receives the voice data sent by the terminal and screens out, from the voice data according to a preset voice byte threshold, target voice bytes whose number corresponds to the threshold.
In a specific embodiment, a voice byte threshold may be preset, and target voice bytes are extracted from the received voice data according to this threshold.
Optionally, the step in which the server screens out the target voice bytes may specifically be: the server segments the voice data according to a preset pause time interval to obtain voice segments, the voice data comprising at least one voice segment; and the server extracts, according to a preset voice byte threshold, the corresponding number of voice bytes from the beginning or end of each voice segment as target voice bytes.
It should be noted that the step of obtaining the target voice bytes in S403 may also be executed by the terminal; that is, after the terminal screens out the target voice bytes from the voice data according to the preset voice byte threshold, it sends them to the server so that the server can parse them.
S404: the server parses the target voice bytes and obtains a parsing result containing the user's idiomatic phrases.
Optionally, this may specifically be: the server calculates the repetition count of the target voice bytes and records it; if the server detects that the repetition count reaches a preset count threshold, the server takes the target voice bytes as an idiomatic phrase of the user and stores it.
Specifically, if parsing finds identical target voice bytes, that is, certain target voice bytes appear repeatedly, the number of occurrences of those voice bytes (the repetition count) is calculated; when the repetition count exceeds a preset count threshold, for example 5, the corresponding target voice bytes are stored as an idiomatic phrase of the user.
S405: the server pushes the parsing result to the terminal.
Furthermore, the server may also push the parsing results, such as the parsed user idiomatic phrases and their repetition counts, to the current terminal.
By implementing this embodiment of the invention, when a voice signal sent by a user is detected, the corresponding voice data can be obtained, and the current user's idiomatic phrases can be obtained by parsing the target voice bytes screened out from the voice data; the idiomatic phrases of a given user can thus be acquired in a targeted manner, with high flexibility.
Referring to fig. 5, a flowchart of an idiomatic phrase acquisition method according to another embodiment of the present invention is shown. The method is applicable to a server. Specifically, the method comprises:
S501: the server receives voice data sent by the terminal and screens out, from the voice data according to a preset voice byte threshold, target voice bytes whose number corresponds to the threshold.
The voice data is the data corresponding to a voice signal, acquired by the terminal upon detecting the voice signal sent by a user.
In a specific embodiment, the server may segment the voice data according to a preset pause time interval to obtain voice segments, and extract, according to a preset voice byte threshold, the corresponding number of voice bytes from the beginning or end of each voice segment as target voice bytes. The voice data comprises at least one voice segment.
specifically, the server may preset a speech byte threshold, and extract the target speech byte from a specific position, such as the beginning and/or end, of each divided speech segment according to the threshold. For example, if the speech byte threshold is set to 5, the first 5 bytes and the last 5 bytes of the speech segment can be extracted at the same time as the target speech byte, so as to obtain a plurality of target speech bytes.
Further, it may be configured to sequentially decrease the voice byte threshold, for example, sequentially decrease from 5 to 4, 3, 2, and 1, and repeatedly extract the target voice bytes corresponding to the corresponding voice byte threshold from the beginning and the end of each voice segment until the voice byte threshold becomes 0, that is, extract 5 voice bytes, 4 voice bytes, 3 voice bytes, 2 voice bytes, and 1 voice byte from the beginning and the end of each voice segment as the target voice bytes, so as to obtain the target voice bytes with different voice byte numbers.
S502: the server parses the target voice bytes and obtains a parsing result containing the user's idiomatic phrases.
In a specific embodiment, the server may calculate the repetition count of the target voice bytes and record it; if the server detects that the repetition count reaches a preset count threshold, the server takes the target voice bytes as an idiomatic phrase of the user and stores it.
Specifically, if identical target voice bytes are found among the extracted target voice bytes, the number of occurrences of those voice bytes, i.e., the repetition count, is calculated; when the repetition count exceeds a preset count threshold, for example 5, the corresponding target voice bytes are stored as an idiomatic phrase of the user, so that the user can query the parsing result, or the parsing result containing the user's idiomatic phrases can be pushed to the user directly.
By implementing this embodiment of the invention, the server can, upon receiving voice data sent by the terminal, parse the target voice bytes screened out from that data, thereby obtaining the current user's idiomatic phrases; the idiomatic phrases of a given user can thus be acquired in a targeted manner, with high flexibility.
Please refer to fig. 6, which is a schematic structural diagram of an idiomatic phrase acquisition apparatus according to an embodiment of the present invention. The apparatus may be arranged in a terminal device such as a mobile phone, a tablet computer, or a wearable device, or in a server; the embodiment of the present invention is not limited in this respect. Specifically, the apparatus includes a first acquiring unit 11, a screening unit 12, and a second acquiring unit 13. Wherein,
the first acquiring unit 11 is configured to acquire voice data corresponding to a voice signal if the voice signal sent by a user is detected.
In a specific embodiment, the first acquiring unit 11 may detect whether a voice signal sent by a user currently exists and, when such a signal is detected, trigger acquisition of the corresponding voice data, for example by recording.
The screening unit 12 is configured to screen out, from the voice data acquired by the first acquiring unit 11 according to a preset voice byte threshold, target voice bytes whose number corresponds to the threshold.
In an embodiment, a voice byte threshold may be preset, and the screening unit 12 may extract target voice bytes from the acquired voice data according to this threshold. Generally, each word spoken by the user corresponds to one voice byte; for example, if the user says "how are you", this corresponds to three voice bytes.
Alternatively, the voice data acquired by the first acquiring unit 11 may be a single sentence, and the screening unit 12 may extract the number of voice bytes corresponding to the preset threshold from a specific position of the sentence, such as its beginning and/or end, as the target voice bytes. That is, each time a sentence is obtained, for example each time a sentence is recorded, the screening unit 12 may perform the screening operation, so as to accumulate a certain number of target voice bytes.
Further optionally, the voice data acquired by the first acquiring unit 11 may be a passage of speech (i.e., composed of multiple sentences), and the screening unit 12 may segment the recorded voice data according to a preset pause time interval into multiple voice segments (one voice segment corresponding to one sentence). Accordingly, if the voice byte threshold is set to 5, the screening unit 12 may extract five voice bytes from a specific position of each voice segment as target voice bytes, for example the first five and/or last five bytes of the segment, thereby obtaining a plurality of target voice bytes.
The second acquiring unit 13 is configured to parse the target voice bytes screened out by the screening unit 12 and obtain a parsing result containing the user's idiomatic phrases.
Specifically, if the second acquiring unit 13 finds identical target voice bytes, that is, certain target voice bytes appear repeatedly, it may calculate the number of occurrences of those voice bytes (the repetition count) and, when the repetition count exceeds a preset count threshold, for example 5, store the corresponding target voice bytes as an idiomatic phrase of the user.
Further, the second acquiring unit 13 may also push the parsed user idiomatic phrases and their repetition counts to the current terminal.
By implementing this embodiment of the invention, when a voice signal sent by a user is detected, the corresponding voice data can be recorded, and the current user's idiomatic phrases can be obtained by parsing the target voice bytes screened out from the recorded voice data; the idiomatic phrases of a given user can thus be acquired in a targeted manner, with high flexibility.
Referring to fig. 7, a schematic structural diagram of another idiomatic phrase acquisition apparatus according to an embodiment of the present invention is shown. The apparatus includes the first acquiring unit 11, the screening unit 12, and the second acquiring unit 13 described above. Further, in this embodiment of the present invention, the first acquiring unit 11 may include:
an information acquiring unit 111, configured to acquire the sound attributes corresponding to a voice signal if the voice signal sent by a user is detected;
a judging unit 112, configured to judge whether the sound attributes corresponding to the voice signal acquired by the information acquiring unit 111 match the sound attributes corresponding to a preset voice sample.
The sound attributes include any one or more of speech rate, timbre, and frequency.
In a specific embodiment, a voice sample may be preset; the voice sample is a sound fragment of a legitimate user and may be recorded by the current legitimate user.
a data acquiring unit 113, configured to acquire the voice data corresponding to the voice signal when the judgment result of the judging unit 112 is a match.
Specifically, when the information acquiring unit 111 detects a voice signal sent by the user, that is, when someone is detected speaking, it can acquire the sound attributes corresponding to the voice signal, and the judging unit 112 compares the sound attributes of the voice signal with those of the voice sample, for example by judging whether their timbre and frequency match, so as to determine the legitimacy of the current user's identity. When the judgment result is a match, i.e., the current user's identity is legitimate, the data acquiring unit 113 acquires the voice data corresponding to the voice signal.
Further, in the embodiment of the present invention, the screening unit 12 may include:
a data segmenting unit 121, configured to segment the voice data according to a preset pause time interval so as to obtain voice segments.
The voice data comprises at least one voice segment.
When the judgment result of the judging unit 112 is a match, that is, the user currently sending the voice signal is a legitimate user, the data acquiring unit 113 may acquire the corresponding voice data, for example by recording. The voice data may be a whole passage of speech, i.e., it contains multiple voice segments, and the data segmenting unit 121 may segment it in a preset manner, for example according to the pause time interval between voice bytes in the voice data, such as 200 ms, to obtain voice segments (each of which may correspond to one sentence). Further, if the voice data recorded by the first acquiring unit 11 is only a single utterance, the data segmenting unit 121 may treat that utterance as one voice segment; that is, each utterance recorded by the first acquiring unit 11 yields one voice segment, until a preset threshold number of voice segments has been accumulated.
a data extracting unit 122, configured to extract, according to a preset voice byte threshold, the corresponding number of voice bytes from the beginning or end of each voice segment divided by the data segmenting unit 121 as target voice bytes.
In a specific embodiment, the data extracting unit 122 may extract target voice bytes from a specific position of each segmented voice segment, such as its beginning and/or end, according to the preset voice byte threshold. For example, if the voice byte threshold is set to 5, the data extracting unit 122 may extract both the first five bytes and the last five bytes of a voice segment as target voice bytes, thereby obtaining a plurality of target voice bytes.
Optionally, the data extracting unit 122 may be specifically configured to:
screen out, from the voice segments, target voice segments whose number of voice bytes is greater than or equal to a preset voice byte threshold; and, if the number of screened target voice segments is not smaller than a preset first count threshold, extract from the beginning or end of each target voice segment the number of voice bytes corresponding to the voice byte threshold as target voice bytes.
For example, if the voice byte threshold is set to 5 and the count threshold for voice segments is set to 6, the data extracting unit 122 may screen out, from the voice segments, those containing at least five voice bytes, and once six such segments have been screened out, extract the first five and/or last five voice bytes of those six segments as target voice bytes.
Further, in the embodiment of the present invention, the apparatus may further include:
a control unit 14, configured to control the sequential decrementing of the voice byte threshold and to notify the data extracting unit 122 to extract the corresponding number of voice bytes from the beginning or end of each target voice segment as target voice bytes, until the voice byte threshold is zero.
Specifically, the control unit 14 may decrement the voice byte threshold sequentially, for example from 5 down to 4, 3, 2, and 1, and notify the data extracting unit 122 to extract the target voice bytes corresponding to each threshold from the beginning and end of each voice segment until the threshold reaches 0; that is, it notifies the data extracting unit 122 to successively extract 5, 4, 3, 2, and 1 voice bytes from the beginning and end of each voice segment as target voice bytes, thereby acquiring target voice bytes of different lengths.
Further, in the embodiment of the present invention, the second acquiring unit 13 may include:
a calculating unit 131, configured to calculate the repetition count of the target voice bytes and record it;
and an information storage unit 132, configured to, if it is detected that the repetition count reaches a preset second count threshold, take the target voice bytes as an idiomatic phrase of the user and store it.
Specifically, if identical target voice bytes are found among the extracted target voice bytes, the calculating unit 131 may calculate the number of occurrences of those voice bytes, i.e., the repetition count, and when the repetition count exceeds a preset count threshold, for example 5, the information storage unit 132 stores the corresponding target voice bytes as an idiomatic phrase of the user, so that the user can query the parsing result, or the parsing result containing the user's idiomatic phrases can be pushed to the user directly.
By implementing this embodiment of the invention, acquisition of the corresponding voice data is triggered only after the identity of the user currently sending the voice signal is verified as legitimate; the voice data is segmented into voice segments, representative utterances are screened out from the beginning and end of each segment, and the current user's idiomatic phrases are thereby obtained by parsing and pushed to the relevant user in a targeted manner.
Please refer to fig. 8, which is a schematic structural diagram of yet another idiomatic phrase acquisition apparatus according to an embodiment of the present invention. The apparatus may be arranged in a server. Specifically, the apparatus includes a screening unit 21 and an acquiring unit 22. Wherein,
the screening unit 21 is configured to screen out, from the voice data sent by the terminal according to a preset voice byte threshold, target voice bytes whose number corresponds to the threshold.
The voice data is the data corresponding to a voice signal, acquired by the terminal upon detecting the voice signal sent by a user.
In a specific embodiment, a voice byte threshold may be preset, and the screening unit 21 may extract target voice bytes from the received voice data according to this threshold.
The acquiring unit 22 is configured to parse the target voice bytes screened out by the screening unit 21 and obtain a parsing result containing the user's idiomatic phrases.
Further, in an embodiment of the present invention, the screening unit 21 may include:
a data segmenting unit 211, configured to segment the voice data according to a preset pause time interval to obtain voice segments, the voice data comprising at least one voice segment;
a data extracting unit 212, configured to extract, according to a preset voice byte threshold, the corresponding number of voice bytes from the beginning or end of each voice segment divided by the data segmenting unit 211 as target voice bytes.
In a specific embodiment, the data extracting unit 212 may extract target voice bytes from a specific position of each voice segment divided by the data segmenting unit 211, such as its beginning and/or end, according to the preset voice byte threshold. For example, if the voice byte threshold is set to 5, the data extracting unit 212 may extract both the first five bytes and the last five bytes of a voice segment as target voice bytes, thereby obtaining a plurality of target voice bytes.
Optionally, the data extracting unit 212 may be specifically configured to:
screen out, from the voice segments, target voice segments whose number of voice bytes is greater than or equal to a preset voice byte threshold; and, if the number of screened target voice segments is not smaller than a preset first count threshold, extract from the beginning or end of each target voice segment the number of voice bytes corresponding to the voice byte threshold as target voice bytes.
Further, in the embodiment of the present invention, the acquiring unit 22 may include:
a calculating unit 221, configured to calculate the repetition count of the target voice bytes and record it;
and an information storage unit 222, configured to, if it is detected that the repetition count reaches a preset count threshold, take the target voice bytes as an idiomatic phrase of the user and store it.
Specifically, if identical target voice bytes are found among the extracted target voice bytes, the calculating unit 221 may calculate the number of occurrences of those voice bytes, i.e., the repetition count, and when the repetition count exceeds a preset count threshold, for example 5, the information storage unit 222 stores the corresponding target voice bytes as an idiomatic phrase of the user, so that the user can query the parsing result, or the parsing result containing the user's idiomatic phrases can be pushed to the user directly.
By implementing this embodiment of the invention, the server can, upon receiving voice data sent by the terminal, parse the target voice bytes screened out from that data, thereby obtaining the current user's idiomatic phrases; the idiomatic phrases of a given user can thus be acquired in a targeted manner, with high flexibility.
Further, please refer to fig. 9, which is a schematic structural diagram of a terminal according to an embodiment of the present invention. As shown in fig. 9, the terminal includes: at least one processor 100, e.g., a CPU; at least one user interface 300; a memory 400; and at least one communication bus 200. The communication bus 200 is used to implement connection and communication among these components. The user interface 300 may include a display and a keyboard, and optionally may also include a standard wired interface and a standard wireless interface. The memory 400 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory), and may optionally be at least one storage device located remotely from the processor 100. The processor 100 may be combined with the idiomatic phrase acquisition apparatus described in fig. 6 and fig. 7; the memory 400 stores a set of program codes, and the processor 100 calls the program codes stored in the memory 400 to perform the following operations:
if a voice signal sent by a user is detected, acquiring voice data corresponding to the voice signal;
screening out, from the voice data according to a preset voice byte threshold, target voice bytes whose number corresponds to the threshold;
and parsing the target voice bytes to obtain a parsing result containing the user's idiomatic phrases.
In an alternative embodiment, the processor 100 calls the program codes stored in the memory 400 so that the acquiring of the voice data corresponding to a detected voice signal may specifically be:
if a voice signal sent by a user is detected, acquiring the sound attributes corresponding to the voice signal;
judging whether the sound attributes corresponding to the voice signal match the sound attributes corresponding to a preset voice sample, where the voice sample is recorded by a legitimate user and the sound attributes include any one or more of speech rate, timbre, and frequency;
and if they match, acquiring the voice data corresponding to the voice signal.
Further optionally, the processor 100 calls the program codes stored in the memory 400 so that the screening out of the target voice bytes from the voice data according to a preset voice byte threshold may specifically be:
segmenting the voice data according to a preset pause time interval to obtain voice segments, the voice data comprising at least one voice segment;
and extracting, according to a preset voice byte threshold, the corresponding number of voice bytes from the beginning or end of each voice segment as target voice bytes.
In an optional embodiment, the processor 100 calls the program codes stored in the memory 400 so that the extracting of the target voice bytes from the beginning or end of the voice segments may specifically be:
screening out, from the voice segments, target voice segments whose number of voice bytes is greater than or equal to a preset voice byte threshold;
and, if the number of screened target voice segments is not smaller than a preset first count threshold, extracting from the beginning or end of each target voice segment the number of voice bytes corresponding to the voice byte threshold as target voice bytes.
In an alternative embodiment, the processor 100 may also perform the following steps:
sequentially decrementing the voice byte threshold;
and repeatedly executing the step of extracting the corresponding number of voice bytes from the beginning or end of each target voice segment as target voice bytes, until the voice byte threshold is zero.
In an alternative embodiment, the processor 100 calls the program codes stored in the memory 400 so that the parsing of the target voice bytes and the obtaining of a parsing result containing the user's idiomatic phrases may specifically be:
calculating the repetition count of the target voice bytes and recording it;
and, if the repetition count reaches a preset second count threshold, taking the target voice bytes as an idiomatic phrase of the user and storing it.
Specifically, the terminal described in this embodiment may be used to implement part or all of the flows of the idiomatic phrase acquisition methods described in conjunction with fig. 1 to fig. 4.
Further, please refer to fig. 10, which is a schematic structural diagram of a server according to an embodiment of the present invention. As shown in fig. 10, the server includes: at least one processor 500, e.g., a CPU; at least one user interface 700; a memory 800; and at least one communication bus 600. The communication bus 600 is used to implement connection and communication among these components. The user interface 700 may include a standard wired interface and a wireless interface. The memory 800 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory, and may optionally be at least one storage device located remotely from the processor 500. The processor 500 may be combined with the idiomatic phrase acquisition apparatus described in fig. 6 and fig. 7; the memory 800 stores a set of program codes, and the processor 500 calls the program codes stored in the memory 800 to perform the following operations:
if a voice signal sent by a user is detected, acquiring voice data corresponding to the voice signal;
screening out, from the voice data according to a preset voice byte threshold, target voice bytes whose number corresponds to the threshold;
and parsing the target voice bytes to obtain a parsing result containing the user's idiomatic phrases.
In an alternative embodiment, the processor 500 calls the program codes stored in the memory 800 so that the acquiring of the voice data corresponding to a detected voice signal may specifically be:
if a voice signal sent by a user is detected, acquiring the sound attributes corresponding to the voice signal;
judging whether the sound attributes corresponding to the voice signal match the sound attributes corresponding to a preset voice sample, where the voice sample is recorded by a legitimate user and the sound attributes include any one or more of speech rate, timbre, and frequency;
and if they match, acquiring the voice data corresponding to the voice signal.
Further optionally, the processor 500 calls the program codes stored in the memory 800 so that the screening out of the target voice bytes from the voice data according to a preset voice byte threshold may specifically be:
segmenting the voice data according to a preset pause time interval to obtain voice segments, the voice data comprising at least one voice segment;
and extracting, according to a preset voice byte threshold, the corresponding number of voice bytes from the beginning or end of each voice segment as target voice bytes.
In an optional embodiment, the processor 500 calls the program codes stored in the memory 800 so that the extracting of the target voice bytes from the beginning or end of the voice segments may specifically be:
screening out, from the voice segments, target voice segments whose number of voice bytes is greater than or equal to a preset voice byte threshold;
and, if the number of screened target voice segments is not smaller than a preset first count threshold, extracting from the beginning or end of each target voice segment the number of voice bytes corresponding to the voice byte threshold as target voice bytes.
In an alternative embodiment, the processor 500 may also perform the following steps:
sequentially decrementing the voice byte threshold;
and repeatedly executing the step of extracting the corresponding number of voice bytes from the beginning or end of each target voice segment as target voice bytes, until the voice byte threshold is zero.
In an alternative embodiment, the processor 500 calls the program codes stored in the memory 800 so that the parsing of the target voice bytes and the obtaining of a parsing result containing the user's idiomatic phrases may specifically be:
calculating the repetition count of the target voice bytes and recording it;
and, if the repetition count reaches a preset second count threshold, taking the target voice bytes as an idiomatic phrase of the user and storing it.
Specifically, the server described in this embodiment may be used to implement part or all of the flows in the idiomatic phrase acquisition method embodiments described in conjunction with fig. 1 to fig. 4.
Further, please refer to fig. 11, which is a schematic structural diagram of a idiom obtaining system according to an embodiment of the present invention, the system includes: a terminal 1 and a server 2; wherein,
the terminal 1 is configured to, if a voice signal sent by a user is detected, acquire voice data corresponding to the voice signal, and send the voice data to the server 2;
the server 2 is configured to receive the voice data sent by the terminal 1, screen out, from the voice data, target voice bytes with a number corresponding to a preset voice byte threshold, and analyze the target voice bytes to acquire an analysis result containing the idioms of the user.
In an optional embodiment, the terminal 1 may be further configured to, if a voice signal sent by a user is detected, obtain a sound attribute corresponding to the voice signal; judge whether the sound attribute corresponding to the voice signal matches the sound attribute corresponding to a preset voice sample, wherein the voice sample is a sound fragment of a legal user, and the sound attribute comprises any one or more of speech speed, tone, and frequency; and if so, acquire voice data corresponding to the voice signal.
In an optional embodiment, the server 2 may be further configured to segment the voice data according to a preset pause time interval to obtain voice segments, where the voice data includes at least one voice segment, and to extract, according to a preset voice byte threshold, voice bytes with the number corresponding to the threshold from the beginning or the end of each voice segment as target voice bytes.
Specifically, the server 2 may screen out, from the voice segments, target voice segments whose number of voice bytes is greater than or equal to a preset voice byte threshold and, when the number of screened target voice segments is not less than a preset first number threshold, extract voice bytes with the number corresponding to the voice byte threshold (for example, 6 voice bytes) from the beginning or the end of each target voice segment as target voice bytes.
Further, the server 2 may sequentially decrease the voice byte threshold and repeatedly perform the step of extracting voice bytes with the number corresponding to the voice byte threshold from the beginning or the end of the target voice segments as target voice bytes, until the voice byte threshold is zero, so as to obtain a plurality of target voice bytes with different numbers of voice bytes.
In an optional embodiment, the server 2 may be further configured to count the number of times each target voice byte repeats and record the repetition count; and if the repetition count reaches a preset second number threshold, the server 2 takes the target voice byte as an idiom of the user and stores the idiom.
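Continuing the sketches above, the division of labor between terminal 1 and server 2 in fig. 11 might be wired up as follows; the `send` and `store` callables stand in for whatever transport and storage an implementation uses, and are assumptions rather than anything the embodiment prescribes.

```python
# End-to-end sketch of the fig. 11 split: the terminal verifies the
# speaker and uploads the voice data; the server segments, screens,
# extracts, and analyzes it. Transport and storage are abstracted as
# injected callables.

def terminal_on_voice_signal(signal_attrs, stored_sample, voice_bytes, send):
    """Terminal 1: verify the speaker, then upload the voice data."""
    if matches_sample(signal_attrs, stored_sample):
        send(voice_bytes)


def server_on_voice_data(voice_bytes, store, initial_threshold=3,
                         first_number_threshold=5,
                         second_number_threshold=3):
    """Server 2: run the full idiom-acquisition pipeline."""
    segments = split_by_pauses(voice_bytes)
    candidates = collect_candidates(segments, initial_threshold,
                                    first_number_threshold)
    store(find_idioms(candidates, second_number_threshold))

# Example wiring (in-memory, for illustration only):
# server_on_voice_data(stream, store=print)
```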
By implementing the embodiments of the invention, when a voice signal sent by a user is detected, the corresponding voice data can be obtained, and the idioms of the current user can be obtained by analyzing the target voice bytes screened from the voice data. The idioms of the relevant user can thus be obtained in a targeted manner, with strong flexibility.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
It should be noted that, in the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to relevant descriptions of other embodiments for parts that are not described in detail in a certain embodiment. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and elements referred to are not necessarily required to practice the invention.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs.
The modules or units in the device of the embodiment of the invention can be combined, divided and deleted according to actual needs.
The modules or units in the embodiments of the present invention may be implemented by a general-purpose integrated circuit, such as a CPU (Central Processing Unit), or by an ASIC (Application-Specific Integrated Circuit).
The idiom acquisition method and device provided by the embodiments of the invention are described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is only intended to help in understanding the method and its core idea. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as a limitation of the present invention.
Claims (10)
1. A method for acquiring idioms, comprising:
if a voice signal sent by a user is detected, acquiring voice data corresponding to the voice signal;
screening, from the voice data, target voice bytes with a number corresponding to a preset voice byte threshold;
and analyzing the target voice byte and acquiring an analysis result containing the idioms of the user.
2. The method of claim 1, wherein the obtaining voice data corresponding to the voice signal if the voice signal sent by the user is detected comprises:
if a voice signal sent by a user is detected, acquiring a sound attribute corresponding to the voice signal;
judging whether the sound attribute corresponding to the voice signal matches the sound attribute corresponding to a preset voice sample, wherein the voice sample is a sound fragment of a legal user, and the sound attribute comprises any one or more of speech speed, tone, and frequency;
and if so, acquiring voice data corresponding to the voice signal.
3. The method of claim 1, wherein the screening out a number of target voice bytes corresponding to the voice byte threshold from the voice data according to a preset voice byte threshold comprises:
segmenting the voice data according to a preset pause time interval to obtain voice segments, wherein the voice data comprises at least one voice segment;
and, according to a preset voice byte threshold, respectively extracting voice bytes with the number corresponding to the threshold from the beginning or the end of each voice segment as target voice bytes.
4. The method of claim 3, further comprising:
sequentially decreasing the voice byte threshold;
and repeatedly executing the step of respectively extracting voice bytes with the number corresponding to the voice byte threshold from the beginning or the end of the voice segments as target voice bytes, until the voice byte threshold is zero.
5. The method of claim 1, wherein the parsing the target speech byte and obtaining a parsing result containing idioms of the user comprises:
counting the number of times the target voice byte repeats and recording the repetition count;
and if the repetition count reaches a preset second number threshold, taking the target voice byte as an idiom of the user and storing the idiom.
6. An idiom acquisition apparatus, comprising:
the first acquisition unit is used for acquiring voice data corresponding to a voice signal if the voice signal sent by a user is detected;
the screening unit is used for screening, from the voice data acquired by the first acquisition unit, target voice bytes with a number corresponding to a preset voice byte threshold;
and the second acquisition unit is used for analyzing the target voice byte screened by the screening unit and acquiring an analysis result containing the idioms of the user.
7. The apparatus of claim 6, wherein the first obtaining unit comprises:
the information acquisition unit is used for acquiring the sound attribute corresponding to the voice signal if the voice signal sent by the user is detected;
the judging unit is used for judging whether the sound attribute corresponding to the voice signal acquired by the information acquisition unit matches the sound attribute corresponding to a preset voice sample, wherein the voice sample is a sound fragment of a legal user, and the sound attribute comprises any one or more of speech speed, tone, and frequency;
and the data acquisition unit is used for acquiring the voice data corresponding to the voice signal when the judgment result of the judgment unit is matched.
8. The apparatus of claim 6, wherein the screening unit comprises:
the data segmentation unit is used for segmenting the voice data according to a preset pause time interval to obtain voice segments, and the voice data comprises at least one voice segment;
and the data extraction unit is used for extracting, according to a preset voice byte threshold, voice bytes with the number corresponding to the threshold from the beginning or the end of each voice segment obtained by the data segmentation unit as target voice bytes.
9. The apparatus of claim 8, further comprising:
and the control unit is used for controlling the voice byte threshold to decrease sequentially and notifying the data extraction unit to extract voice bytes with the number corresponding to the voice byte threshold from the beginning or the end of the voice segments as target voice bytes, until the voice byte threshold is zero.
10. The apparatus of claim 6, wherein the second obtaining unit comprises:
the calculating unit is used for counting the number of times the target voice bytes repeat and recording the repetition count;
and the information storage unit is used for taking the target voice byte as an idiom of the user and storing the idiom if the repetition count reaches a preset second number threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410374537.4A CN104157286B (en) | 2014-07-31 | 2014-07-31 | A kind of phrasal acquisition methods and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104157286A true CN104157286A (en) | 2014-11-19 |
CN104157286B CN104157286B (en) | 2017-12-29 |
Family
ID=51882769
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410374537.4A Active CN104157286B (en) | 2014-07-31 | 2014-07-31 | A kind of phrasal acquisition methods and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104157286B (en) |
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS61250769A (en) * | 1985-04-30 | 1986-11-07 | Casio Comput Co Ltd | System for registering dictionary information |
US4831529A (en) * | 1986-03-04 | 1989-05-16 | Kabushiki Kaisha Toshiba | Machine translation system |
CN1713644A (en) * | 2004-06-24 | 2005-12-28 | 乐金电子(中国)研究开发中心有限公司 | Dialog context filtering method and apparatus |
US20060106611A1 (en) * | 2004-11-12 | 2006-05-18 | Sophia Krasikov | Devices and methods providing automated assistance for verbal communication |
CN101552000A (en) * | 2009-02-25 | 2009-10-07 | 北京派瑞根科技开发有限公司 | Music similarity processing method |
CN101794576A (en) * | 2010-02-02 | 2010-08-04 | 重庆大学 | Dirty word detection aid and using method thereof |
CN102316200A (en) * | 2010-07-07 | 2012-01-11 | 英业达股份有限公司 | Handheld electronic device bell adjustment method and handheld electronic device using the same |
CN102592592A (en) * | 2011-12-30 | 2012-07-18 | 深圳市车音网科技有限公司 | Voice data extraction method and device |
CN103389979A (en) * | 2012-05-08 | 2013-11-13 | 腾讯科技(深圳)有限公司 | System, device and method for recommending classification lexicon in input method |
CN103516915A (en) * | 2012-06-27 | 2014-01-15 | 百度在线网络技术(北京)有限公司 | Method, system and device for replacing sensitive words in call process of mobile terminal |
CN102915730A (en) * | 2012-10-19 | 2013-02-06 | 东莞宇龙通信科技有限公司 | Voice processing method and system |
CN103778110A (en) * | 2012-10-25 | 2014-05-07 | 三星电子(中国)研发中心 | Method and system for converting simplified Chinese characters into traditional Chinese characters |
CN103065625A (en) * | 2012-12-25 | 2013-04-24 | 广东欧珀移动通信有限公司 | Method and device for adding digital voice tag |
CN103903621A (en) * | 2012-12-26 | 2014-07-02 | 联想(北京)有限公司 | Method for voice recognition and electronic equipment |
CN103678684A (en) * | 2013-12-25 | 2014-03-26 | 沈阳美行科技有限公司 | Chinese word segmentation method based on navigation information retrieval |
Non-Patent Citations (2)
Title |
---|
Louis-Philippe Morency et al.: "Utterance-Level Multimodal Sentiment Analysis", Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics * |
Li Jie: "Pet Phrases: Categories, Mechanisms and Functions" (口头禅：类别、机制与功能), China Doctoral Dissertations Full-text Database, Philosophy and Humanities * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105100711A (en) * | 2015-07-08 | 2015-11-25 | 小米科技有限责任公司 | Information-transmitting method and device |
CN105553828A (en) * | 2015-12-18 | 2016-05-04 | 合肥寰景信息技术有限公司 | Intelligent voice warning method of network community voice service |
CN105895088A (en) * | 2016-05-27 | 2016-08-24 | 京东方科技集团股份有限公司 | Intelligent wearable device and voice error correction system |
CN107451437A (en) * | 2016-05-31 | 2017-12-08 | 百度在线网络技术(北京)有限公司 | The locking means and device of a kind of mobile terminal |
CN107451437B (en) * | 2016-05-31 | 2021-04-16 | 百度在线网络技术(北京)有限公司 | Locking method and device of mobile terminal |
CN107481737A (en) * | 2017-08-28 | 2017-12-15 | 广东小天才科技有限公司 | Voice monitoring method and device and terminal equipment |
CN110310623A (en) * | 2017-09-20 | 2019-10-08 | Oppo广东移动通信有限公司 | Sample generating method, model training method, device, medium and electronic equipment |
CN110310623B (en) * | 2017-09-20 | 2021-12-28 | Oppo广东移动通信有限公司 | Sample generation method, model training method, device, medium, and electronic apparatus |
CN110473519A (en) * | 2018-05-11 | 2019-11-19 | 北京国双科技有限公司 | A kind of method of speech processing and device |
CN109119076A (en) * | 2018-08-02 | 2019-01-01 | 重庆柚瓣家科技有限公司 | A kind of old man user exchanges the collection system and method for habit |
CN109285544A (en) * | 2018-10-25 | 2019-01-29 | 江海洋 | Speech monitoring system |
CN110070858A (en) * | 2019-05-05 | 2019-07-30 | 广东小天才科技有限公司 | Civilization language reminding method and device and mobile device |
Also Published As
Publication number | Publication date |
---|---|
CN104157286B (en) | 2017-12-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104157286B (en) | A kind of phrasal acquisition methods and device | |
CN104134439B (en) | A kind of phrasal acquisition methods, apparatus and system | |
EP3611895B1 (en) | Method and device for user registration, and electronic device | |
CN107305541B (en) | Method and device for segmenting speech recognition text | |
TWI427620B (en) | A speech recognition result correction device and a speech recognition result correction method, and a speech recognition result correction system | |
CN108682420B (en) | Audio and video call dialect recognition method and terminal equipment | |
CN112216281B (en) | Display apparatus and method for registering user command | |
US10114809B2 (en) | Method and apparatus for phonetically annotating text | |
CN106658129B (en) | Terminal control method and device based on emotion and terminal | |
CN107562760B (en) | Voice data processing method and device | |
EP2887229A2 (en) | Communication support apparatus, communication support method and computer program product | |
CN105654943A (en) | Voice wakeup method, apparatus and system thereof | |
JP2006190006A5 (en) | ||
US9818450B2 (en) | System and method of subtitling by dividing script text into two languages | |
US9251808B2 (en) | Apparatus and method for clustering speakers, and a non-transitory computer readable medium thereof | |
CN107123418A (en) | Voice message processing method and mobile terminal | |
KR20170139650A (en) | Method for adding accounts, terminals, servers, and computer storage media | |
CN115150660B (en) | Video editing method based on subtitles and related equipment | |
TW201337911A (en) | Electrical device and voice identification method | |
CN108766431B (en) | A kind of automatic wake-up method and electronic device based on speech recognition | |
US20190213998A1 (en) | Method and device for processing data visualization information | |
JPWO2018016143A1 (en) | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM | |
CN103903615B (en) | A kind of information processing method and electronic equipment | |
CN104900226A (en) | Information processing method and device | |
CN109710735B (en) | Reading content recommendation method and electronic device based on multiple social channels |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20210218; Patentee after: Shenzhen Microphone Holdings Co.,Ltd., Rooms 1702-1703, 17th Floor (15th natural floor), Desai Science and Technology Building, 9789 Shennan Avenue, Yuehai Street, Nanshan District, Shenzhen, Guangdong 518057; Patentee before: DONGGUAN GOLDEX COMMUNICATION TECHNOLOGY Co.,Ltd., 21st Floor, East Block, Times Technology Building, 7028 Shennan Road, Futian District, Shenzhen, Guangdong 518040 |