CN110414591B

CN110414591B - Data processing method and equipment

Info

Publication number: CN110414591B
Application number: CN201910672042.2A
Authority: CN
Inventors: 闫桂霞; 王晓利; 林媛; 周明
Original assignee: Tencent Technology Wuhan Co Ltd
Current assignee: Tencent Technology Wuhan Co Ltd
Priority date: 2019-07-24
Filing date: 2019-07-24
Publication date: 2022-07-12
Anticipated expiration: 2039-07-24
Also published as: CN110414591A

Abstract

The embodiment of the application discloses a data processing method and equipment, wherein the method comprises the following steps: acquiring feedback data of at least two time periods; clustering the feedback data of the at least two time periods to generate a plurality of data classifications and classification keywords corresponding to each data classification; determining the proportion of the target quantity of the target feedback data in the data classification to the total quantity of the feedback data in the data classification as a feedback proportion; the target feedback data refers to feedback data belonging to a target time period in a data classification, and the at least two time periods comprise the target time period; determining data classification for alarming according to the feedback proportion of each data classification and the classification key words of each data classification, and taking the data classification as target data classification; and generating alarm information according to the target data classification. By adopting the method and the device, the missing report rate of the alarm system can be reduced, the sensitivity is improved, and the occurrence rate of operation accidents is reduced.

Description

Data processing method and equipment

Technical Field

The present application relates to the field of electronic technologies, and in particular, to a data processing method and device.

Background

With the advent of the mobile internet era, society as a whole has become increasingly mobile, and the internet industry now has many products with very large user bases. To improve the user experience and to update and upgrade products in time, most internet products provide a user feedback module. Feedback is mainly submitted as text, through which suggestions and complaints from users can be collected. The user feedback module is of great significance for monitoring product performance, especially new features and new modules, and it also plays a major role in discovering and monitoring program bugs. An alarm system generates alarm information from the feedback content and reports it to developers in time. The alarm system should therefore report a problem as early as possible, ideally perceiving a new problem before it reaches its peak. At present, however, when a sudden problem produces only a small amount of user feedback, the alarm system has a low degree of perception and responds slowly, so the problem cannot be handled in time at the early stage of an accident. The missed-report rate is high, the sensitivity is low, and operation accidents are easily caused.

Disclosure of Invention

The embodiment of the application provides a data processing method and equipment, which can reduce the rate of missing report of an alarm system, improve the sensitivity and reduce the occurrence rate of operation accidents.

An aspect of the present application provides a data processing method, which may include:

acquiring feedback data of at least two time periods;

clustering the feedback data of the at least two time periods to generate a plurality of data classifications and classification keywords corresponding to each data classification;

determining the proportion of the target quantity of the target feedback data in the data classification to the total quantity of the feedback data in the data classification as a feedback proportion; the target feedback data refers to feedback data belonging to a target time period in a data classification, and the at least two time periods comprise the target time period;

determining data classification for alarming according to the feedback proportion of each data classification and the classification key words of each data classification, and taking the data classification as target data classification;

and generating alarm information according to the target data classification.

Wherein the obtaining of the feedback data of at least two time periods comprises:

acquiring feedback data of a target time period according to a time period backtracking data mode;

and acquiring feedback data of the associated time period associated with the target time period in a time period backtracking data mode.

Wherein the method further comprises:

detecting a first data quantity of feedback data of a target time period, and when the first data quantity is smaller than a first quantity threshold value, acquiring the feedback data of the first quantity threshold value according to a quantity backtracking data mode;

and detecting a second data quantity of the feedback data of the associated time period, and when the second data quantity is smaller than a second quantity threshold, acquiring the feedback data of the second quantity threshold according to a quantity backtracking data mode.

Wherein the clustering of the feedback data of the at least two time periods to generate a plurality of data classifications and classification keywords corresponding to each data classification comprises:

performing word segmentation on the feedback data of the at least two time periods to generate word segmentation data, and performing clustering processing on the feedback data to generate a plurality of data classifications according to the use frequency of the word segmentation data in the feedback data;

and generating a classification keyword corresponding to each data classification according to the word segmentation data corresponding to each data classification.

Wherein the determining of the data classification for alarming, as the target data classification, according to the feedback proportion and the classification keywords of each data classification comprises:

determining the data classification with the feedback proportion larger than a proportion threshold value as a data classification to be alarmed, and storing the data classification to be alarmed into a data set to be alarmed;

and performing alarm duplication removal screening on the data to be alarmed in the data set to be alarmed, and determining the data to be alarmed after the alarm duplication removal screening as a target data classification.

Wherein the performing of alarm duplication removal screening on the data classifications to be alarmed in the data set to be alarmed comprises:

detecting the keyword repetition degree between the classification keywords of a data classification to be alarmed and the classification keywords of an alarmed data classification in the alarm list; an alarmed data classification in the alarm list is a data classification for which an alarm has already been issued;

determining the data classification to be alarmed with the keyword repetition degree larger than the repetition degree threshold value as a first data classification to be alarmed;

determining the data classification to be alarmed with the keyword repetition degree smaller than or equal to a repetition degree threshold value and the cosine distance larger than a distance threshold value as a second data classification to be alarmed; the cosine distance refers to the cosine distance between the classified keyword and the alarmed keyword;

and determining that the first to-be-alarmed data classification and the second to-be-alarmed data classification meet an alarm duplication elimination condition, and deleting the to-be-alarmed data classification meeting the alarm duplication elimination condition from a to-be-alarmed data set.

detecting the classified keywords of the data to be alarmed in the data set to be alarmed through a support vector machine to generate a detection result;

deleting the data to be alarmed with the detection result as the target detection result from the data set to be alarmed in a classified manner; and the multiple detection results corresponding to the support vector machine comprise the target detection result.
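The keyword-based part of the alarm duplication removal screening described above (the repetition-degree and cosine checks; the support vector machine check is not shown) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the threshold values, and the definition of the repetition degree as the share of overlapping keywords are all hypothetical. The source speaks of a "cosine distance" being larger than a threshold; the sketch assumes this quantity grows with similarity, that is, it is computed as cosine similarity over bag-of-words vectors.

```python
from collections import Counter
from math import sqrt


def repetition_degree(candidate, alarmed):
    """Assumed definition: share of the candidate's keywords also present in the alarmed set."""
    cand = set(candidate)
    return len(cand & set(alarmed)) / len(cand) if cand else 0.0


def cosine(a, b):
    """Cosine similarity between two keyword lists treated as bag-of-words vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def is_duplicate(keywords, alarm_list, repetition_threshold, cosine_threshold):
    """True if the keywords match any already-alarmed classification."""
    for alarmed in alarm_list:
        # First kind of duplicate: keyword repetition degree above the threshold.
        if repetition_degree(keywords, alarmed) > repetition_threshold:
            return True
        # Second kind: repetition degree at or below the threshold,
        # but the cosine value is above its own threshold (assumed: similarity).
        if cosine(keywords, alarmed) > cosine_threshold:
            return True
    return False


def deduplicate(pending, alarm_list, repetition_threshold=0.6, cosine_threshold=0.5):
    """Drop every pending classification that duplicates an earlier alarm."""
    return {
        name: keywords
        for name, keywords in pending.items()
        if not is_duplicate(keywords, alarm_list, repetition_threshold, cosine_threshold)
    }
```

For example, with an alarm list containing `["login", "fail"]`, a pending classification with keywords `["login", "fail", "crash"]` is removed by the repetition check, while one with keywords `["payment", "refund"]` is kept.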

Wherein, the generating of the alarm information according to the target data classification includes:

packaging feedback data meeting priority conditions in the alarm data classification and classification keywords of the alarm data classification to generate alarm information;

and outputting and displaying the alarm information.

In one aspect, an embodiment of the present application provides a data processing apparatus, which may include:

the feedback data acquisition unit is used for acquiring feedback data of at least two time periods;

the feedback data clustering unit is used for clustering the feedback data of the at least two time periods to generate a plurality of data classifications and classification keywords corresponding to each data classification;

a feedback proportion confirming unit, configured to determine, as a feedback proportion, a proportion of a target number of target feedback data in the data classification to a total amount of feedback data in the data classification; the target feedback data refers to feedback data belonging to a target time period in one data classification, and the at least two time periods comprise the target time period;

the alarm data confirmation unit is used for determining the data classification for alarming according to the feedback proportion of each data classification and the classification key word of each data classification, and the data classification is used as a target data classification;

and the alarm information generating unit is used for generating alarm information according to the target data classification.

Wherein the feedback data acquisition unit is specifically configured to:

Wherein, the feedback data obtaining unit is specifically further configured to:

Wherein the feedback data clustering unit is specifically configured to:

Wherein the alarm data confirmation unit includes:

the data storage subunit is used for determining the data classification with the feedback proportion larger than the proportion threshold value as the data classification to be alarmed and storing the data classification to the data set to be alarmed;

the data screening subunit is used for carrying out alarm duplication removal screening on the data to be alarmed in the data set to be alarmed;

and the data confirmation subunit is used for determining the data to be alarmed after the alarm duplication-elimination screening is carried out as the target data classification.

Wherein the data screening subunit is specifically configured to:

determining the data classification to be alarmed with the keyword repetition degree smaller than or equal to a repetition degree threshold value and the cosine distance larger than a distance threshold value as a second data classification to be alarmed; the cosine distance refers to the cosine distance between the classified keywords and the alarmed keywords;

Wherein, the data screening subunit is further specifically configured to:

Wherein the alarm information generating unit is specifically configured to:

and outputting and displaying the alarm information.

An aspect of the embodiments of the present application provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.

In one aspect, an embodiment of the present application provides a data processing apparatus, including a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.

In the embodiments of the application, feedback data of at least two time periods is obtained; the feedback data of the at least two time periods is clustered to generate a plurality of data classifications and classification keywords corresponding to each data classification; the proportion of the target quantity of the target feedback data in a data classification to the total quantity of the feedback data in that data classification is determined as a feedback proportion, where the target feedback data refers to feedback data belonging to a target time period in a data classification, and the at least two time periods comprise the target time period; a data classification for alarming is determined as the target data classification according to the feedback proportion and the classification keywords of each data classification; and alarm information is generated according to the target data classification. Compared with determining the data classification for alarming by clustering the feedback data of a single time period, clustering the feedback data of at least two time periods and determining the data classification for alarming according to the feedback proportion of the target time period within each data classification avoids the situation in which, for a small amount of user feedback caused by a sudden problem, the alarm system has a low degree of perception, responds slowly, and cannot handle the problem in time at the early stage of an accident. The missed-report rate of the alarm system is thereby reduced, its sensitivity is improved, and the occurrence rate of operation accidents is reduced.

Drawings

To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1a is a block diagram of a data processing architecture according to an embodiment of the present application;

Fig. 1b is a schematic view of a data processing scenario provided in an embodiment of the present application;

Fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;

Fig. 3 is a schematic flowchart of a data processing method according to an embodiment of the present application;

Fig. 4 is a schematic flowchart of a data processing method according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Fig. 1a is a block diagram of a data processing system according to an embodiment of the present application. The server 10f establishes a connection with a user terminal cluster through the switch 10e and the communication bus 10d, and the user terminal cluster may include user terminal 10a and user terminal 10b. The database 10g stores feedback data sent by a plurality of user terminals; each piece of feedback data carries a timestamp, which is the time at which the user terminal sent the feedback data. The server 10f extracts the feedback data of at least two time periods from the database 10g, classifies the feedback data in an unsupervised clustering manner, and generates a plurality of data classifications and classification keywords corresponding to each data classification. The server 10f determines the ratio of the target number of target feedback data in a data classification to the total amount of feedback data in that data classification as a feedback proportion. The target feedback data refers to feedback data belonging to a target time period in a data classification; the feedback data of the at least two time periods may be feedback data sent by user terminals in the same time period on different dates, and the at least two time periods include the target time period. The server 10f determines a data classification for alarming as a target data classification according to the feedback proportion and the classification keywords of each data classification, and generates alarm information according to the target data classification. The alarm information is used to display information in the target data classification and includes the classification keywords and feedback information of the target data classification. The feedback information in the alarm information may be displayed partially as required, for example in priority order, so that developers can obtain the key information in the alarm information more conveniently. The alarm information may also display other information, for example the feedback proportion and the number of pieces of feedback information in the target time period, the number of users who sent feedback information in the target time period, and the like.

The data processing device in the embodiments of the application may include a server, a terminal back end, and the like. The user terminal includes terminal devices such as tablet computers, smartphones, personal computers (PCs), notebook computers, and palmtop computers.

A specific implementation scenario provided in the embodiments of the present application is described below with reference to Fig. 1b. As shown in Fig. 1b, the server 10f extracts N pieces of feedback data of at least two time periods from the database 10g, namely feedback data 1, feedback data 2, ..., feedback data N, and feedback data 1 is determined as target feedback data. The server 10f performs word segmentation on the feedback data to generate M classification keywords with high use frequency, namely classification keyword 1, classification keyword 2, ..., classification keyword M. The N pieces of feedback data are classified in an unsupervised clustering manner to generate X data classifications and the classification keywords corresponding to each data classification, namely data classification 1, data classification 2, ..., data classification X. The server 10f determines the ratio of the target number of feedback data 1 in a data classification to the total amount of feedback data in that data classification as a feedback proportion, determines each data classification whose feedback proportion is greater than a proportion threshold as a data classification to be alarmed, and stores it in a data set to be alarmed. The server 10f performs alarm duplication removal screening on the data classifications to be alarmed in the data set to be alarmed; the screening includes screening out data classifications for which an alarm has already been issued and data classifications containing uncivilized content. The data classifications to be alarmed that remain after the screening are determined as target data classifications. Feedback data meeting a priority condition in a target data classification and the classification keywords of that data classification are packaged to generate alarm information, which is output and displayed. The alarm information is used to display information in the target data classification and includes an "alarm subject", an "alarm summary", and a "feedback proportion". The "alarm subject" is the classification keywords. The "alarm summary" is part of the feedback information in the data classification; specifically, a priority ranking is used to display the feedback information most relevant to the classification keywords, so that developers can obtain the key information in the alarm information more accurately. The "feedback proportion" covers the feedback proportion of the feedback information in the target time period and the number of users who sent feedback information in the target time period.

Referring to fig. 2, a flow chart of a data processing method according to an embodiment of the present application is schematically shown. As shown in fig. 2, the method of the embodiment of the present application may include the following steps S101 to S105.

S101, acquiring feedback data of at least two time periods;

Specifically, the data processing device obtains feedback data of at least two time periods from a feedback database. It can be understood that the data processing device may be the server in the system architecture diagram. The feedback database is used for storing feedback data: all feedback data sent by users about an application program is stored in the feedback database, and the data processing device may obtain feedback data from the feedback database at a preset statistical frequency. The feedback data is a suggestion or a problem raised by a user about the application program and is mainly text feedback or voice feedback. The feedback data carries timestamp information, which is the time at which the user submitted the feedback; the feedback data of a given time period is the feedback data whose timestamp falls within that time period. Specifically, the feedback data of the two time periods may be feedback data submitted by users in the same time period on different dates.

S102, clustering the feedback data of the at least two time periods to generate a plurality of data classifications and classification keywords corresponding to each data classification;

Specifically, the data processing device clusters the feedback data of the at least two time periods to generate a plurality of data classifications and the classification keywords corresponding to each data classification. It can be understood that the feedback data may be classified in an unsupervised learning manner. Clustering methods include partitioning methods, hierarchical methods, model-based methods, and the like, and specific clustering algorithms include the K-MEANS algorithm, the BIRCH algorithm, and the like. The data processing device clusters the feedback data of the at least two time periods in an unsupervised clustering manner to generate a plurality of data classifications. Each data classification includes a plurality of pieces of feedback data, which may come from any of the at least two time periods. The data processing device generates the corresponding classification keywords from the feedback data in each data classification, and each data classification may have a plurality of classification keywords.
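The clustering and keyword generation of step S102 can be illustrated with a small sketch. It is not the patented implementation: it uses a simplified K-MEANS-style loop with cosine similarity over word counts, seeds the centroids with the first k items (real implementations usually use random or k-means++ seeding), and replaces real word segmentation with whitespace splitting. All function names and parameters are hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return text.lower().split()


def similarity(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[w] * b[w] for w in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def cluster(texts, k, iterations=10):
    """Assign each text a cluster label in range(k)."""
    vectors = [Counter(tokenize(t)) for t in texts]
    centroids = [Counter(v) for v in vectors[:k]]  # deterministic seed: first k items
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by cosine similarity.
        labels = [max(range(k), key=lambda c: similarity(v, centroids[c])) for v in vectors]
        # Update step: the summed word counts serve as the centroid,
        # since cosine similarity ignores vector length.
        for c in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    total.update(v)
            if total:
                centroids[c] = total
    return labels


def classification_keywords(texts, labels, label, top_n=3):
    """Most frequent words among the texts assigned to one cluster."""
    counts = Counter()
    for text, assigned in zip(texts, labels):
        if assigned == label:
            counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(top_n)]
```

With the four texts `"login fails"`, `"payment refund missing"`, `"login fails again"`, and `"refund payment slow"` and `k=2`, the login texts and the payment texts end up in separate classifications, and the most frequent words of each classification serve as its classification keywords.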

S103, determining the proportion of the target quantity of the target feedback data in the data classification to the total quantity of the feedback data in the data classification as a feedback proportion; the target feedback data refers to feedback data belonging to a target time period in a data classification, and the at least two time periods comprise the target time period;

Specifically, the data processing device determines the ratio of the target number of target feedback data in a data classification to the total amount of feedback data in that data classification as the feedback proportion. It can be understood that the data classification is any data classification generated by the clustering, and the target feedback data is the feedback data in it that belongs to a preset target time period. The target time period is one of the at least two time periods; generally, the time period of primary interest is taken as the target time period. The feedback proportion reflects the influence of the target feedback data within the data classification.
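The feedback proportion of step S103 is a simple per-classification ratio. The sketch below shows the arithmetic, assuming each piece of feedback has already been tagged with its data classification and its time period; the function name and the `(classification, period)` input shape are hypothetical.

```python
from collections import Counter


def feedback_proportions(items, target_period):
    """Return {classification: target count / total count}.

    items is an iterable of (classification, period) pairs, one per piece
    of feedback data (assumed input shape).
    """
    totals, targets = Counter(), Counter()
    for classification, period in items:
        totals[classification] += 1
        if period == target_period:
            targets[classification] += 1
    return {c: targets[c] / totals[c] for c in totals}
```

For instance, a classification holding three pieces of feedback, two of which fall in the target time period, has a feedback proportion of 2/3.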

S104, determining data classification for alarming according to the feedback proportion of each data classification and the classification key words of each data classification, and taking the data classification as target data classification;

Specifically, the data processing device determines a data classification for alarming as the target data classification according to the feedback proportion and the classification keywords of each data classification. It can be understood that the target data classification is the data classification, among those generated by the clustering, that is used for alarming. The feedback proportion may be judged by setting a proportion threshold for the feedback proportion, or by setting a quantity threshold for the target feedback data in the data classification. The classification keywords may be judged by setting "no-alarm keywords" for screening the data classifications; for example, the "no-alarm keywords" include ambiguous words and the keywords of data classifications for which an alarm has already been issued, which prevents repeated alarms and false alarms for a data classification.
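The two judgments of step S104 can be combined as in the following sketch. It assumes each data classification is represented as a dictionary with `name`, `ratio`, and `keywords` fields and that a classification is dropped as soon as any of its keywords is a "no-alarm keyword"; both the data shape and that matching rule are assumptions made for illustration.

```python
def select_alarm_classifications(classifications, ratio_threshold, no_alarm_keywords):
    """Return the names of classifications that should raise an alarm."""
    blocked = set(no_alarm_keywords)
    selected = []
    for c in classifications:
        # Judgment 1: the feedback proportion must exceed the proportion threshold.
        if c["ratio"] <= ratio_threshold:
            continue
        # Judgment 2 (assumed rule): skip if any keyword is a no-alarm keyword.
        if blocked & set(c["keywords"]):
            continue
        selected.append(c["name"])
    return selected
```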

And S105, generating alarm information according to the target data classification.

Specifically, the data processing device generates the alarm information according to the target data classification. It can be understood that the target data classification is the data classification used for alarming, and the alarm information lays out the information in the target data classification for display. The alarm information includes the classification keywords and feedback information of the target data classification. The feedback information may be displayed partially as required; for example, the five pieces of feedback information most closely associated with the classification keywords may be displayed in priority order, so that developers can obtain the key information in the alarm information more conveniently and accurately. The alarm information may also display other information, such as the feedback proportion and the number of pieces of feedback information in the target time period, the number of users who sent feedback information in the target time period, and the like.
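Packaging the alarm information of step S105 might look like the sketch below. The source does not define how the degree of association between feedback and classification keywords is measured; here it is assumed to be the number of classification keywords that occur in the feedback text. The field names of the returned dictionary are also hypothetical.

```python
def build_alarm(keywords, feedback, ratio, top_n=5):
    """Package classification keywords and the most relevant feedback into one alarm."""

    def relevance(text):
        # Assumed association measure: how many classification keywords the text contains.
        words = set(text.lower().split())
        return sum(1 for keyword in keywords if keyword in words)

    # sorted() is stable, so equally relevant feedback keeps its original order.
    ranked = sorted(feedback, key=relevance, reverse=True)
    return {
        "subject": " ".join(keywords),
        "summary": ranked[:top_n],
        "feedback_proportion": ratio,
        "feedback_count": len(feedback),
    }
```

The default `top_n=5` mirrors the five pieces of feedback information mentioned above; a caller can pass a different value to show more or fewer entries.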

In the embodiments of the application, feedback data of at least two time periods is obtained; the feedback data of the at least two time periods is clustered to generate a plurality of data classifications and classification keywords corresponding to each data classification; the proportion of the target quantity of the target feedback data in a data classification to the total quantity of the feedback data in that data classification is determined as a feedback proportion, where the target feedback data refers to feedback data belonging to a target time period in a data classification, and the at least two time periods comprise the target time period; a data classification for alarming is determined as the target data classification according to the feedback proportion and the classification keywords of each data classification; and alarm information is generated according to the target data classification. Compared with determining the data classification for alarming by clustering the feedback data of a single time period, clustering the feedback data of at least two time periods and determining the data classification for alarming according to the feedback proportion of the target time period within each data classification avoids the situation in which, for a small amount of user feedback caused by a sudden problem, the alarm system has a low degree of perception, responds slowly, and cannot handle the problem in time at the early stage of an accident. The missed-report rate of the alarm system is thereby reduced, its sensitivity is improved, and the occurrence rate of operation accidents is reduced.

Referring to fig. 3, a flow chart of a data processing method according to an embodiment of the present application is schematically shown. As shown in fig. 3, the method of the embodiment of the present application may include the following steps S201 to S207.

S201, acquiring feedback data of a target time period according to a time period backtracking data mode; acquiring feedback data of an associated time period associated with the target time period in a time period backtracking data mode;

Specifically, the data processing device acquires feedback data of a target time period from the feedback database in a time period backtracking data manner, and acquires feedback data of an associated time period associated with the target time period from the feedback database in the same manner. It can be understood that time period backtracking means backtracking over a time period from a time point and obtaining the feedback data of that time period from the feedback database. The target time period is the time period from the time point at which backtracking starts to the time point at which backtracking ends, and its duration can be set as required. The associated time period is a time period with the same duration as the target time period and may include a plurality of time periods. For example, the feedback data may be obtained as follows: the backtracking duration is set to 1 hour, the target time period is obtained by backtracking 1 hour from the current time of the day, and the feedback data of the target time period is obtained from the feedback database; the associated time period is the same time period as the target time period on each of the previous n days, where the number of days can be preset, and the feedback data of the associated time period is obtained from the feedback database. If the current time is 9 o'clock, the target time period is 8 o'clock to 9 o'clock, and the feedback data from 8 o'clock to 9 o'clock is obtained from the feedback database. If the associated time period covers the previous 7 days, the feedback data of the associated time period is the feedback data from 8 o'clock to 9 o'clock on each of the previous 7 days, which is obtained from the feedback database; the associated time period then comprises 7 time periods, and the feedback data of the associated time period comprises the feedback data corresponding to those 7 time periods.
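The time period backtracking in the example above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout (a dict with a `timestamp` key), the half-open interval, and the names `backtrack_windows` and `select_by_window` are hypothetical and do not come from the embodiment.

```python
from datetime import datetime, timedelta


def backtrack_windows(now, duration=timedelta(hours=1), days_back=7):
    """Return the target window and the associated windows as (start, end) pairs."""
    target = (now - duration, now)
    # Associated windows: the same clock interval on each of the previous days.
    associated = [
        (target[0] - timedelta(days=d), target[1] - timedelta(days=d))
        for d in range(1, days_back + 1)
    ]
    return target, associated


def select_by_window(records, window):
    """Keep records whose timestamp falls inside [start, end) (assumed convention)."""
    start, end = window
    return [r for r in records if start <= r["timestamp"] < end]


# Current time 9 o'clock: the target window is 8 o'clock to 9 o'clock.
now = datetime(2019, 7, 24, 9, 0)
target, associated = backtrack_windows(now)
```

With the defaults, `associated` holds seven windows, one for 8 o'clock to 9 o'clock on each of the previous seven days.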

In order to keep the scale of the feedback data stable, the quantity of feedback data is checked after the feedback data has been acquired in the time period backtracking data manner. The specific detection process is as follows: a first data quantity of the feedback data of the target time period is detected, and when the first data quantity is smaller than a first quantity threshold, feedback data amounting to the first quantity threshold is obtained in a quantity backtracking data manner; a second data quantity of the feedback data of the associated time period is detected, and when the second data quantity is smaller than a second quantity threshold, feedback data amounting to the second quantity threshold is obtained in the quantity backtracking data manner. It can be understood that quantity backtracking means backtracking from a time point and obtaining a certain quantity of feedback data from the feedback database. The first data quantity is the quantity of feedback data of the target time period, and the second data quantity is the quantity of feedback data of the associated time period; the first quantity threshold and the second quantity threshold are preset and may be the same. If the associated time period includes a plurality of time periods, the same detection can be performed for each time period in the associated time period, and a quantity threshold can be set for each time period separately.
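The fallback from time period backtracking to quantity backtracking might look like the following sketch. The in-memory list of records and the function name `fetch_with_quantity_fallback` are assumptions made for illustration; a real feedback database would be queried instead.

```python
from datetime import datetime, timedelta


def fetch_with_quantity_fallback(records, end, duration, quantity_threshold):
    """Time period backtracking first; quantity backtracking if too few records."""
    start = end - duration
    in_window = [r for r in records if start <= r["timestamp"] < end]
    if len(in_window) >= quantity_threshold:
        return in_window
    # Quantity backtracking: walk back from `end` and take the most recent
    # `quantity_threshold` records, however far back they reach.
    earlier = sorted(
        (r for r in records if r["timestamp"] < end),
        key=lambda r: r["timestamp"],
        reverse=True,
    )
    return earlier[:quantity_threshold]
```

The same function can be called once for the target time period and once for each time period in the associated time period, each with its own threshold.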

S202, performing word segmentation on the feedback data of the at least two time periods to generate word segmentation data, and performing clustering processing on the feedback data to generate a plurality of data classifications according to the use frequency of the word segmentation data in the feedback data; and generating a classification keyword corresponding to each data classification according to the word segmentation data corresponding to each data classification.

Specifically, the data processing device performs word segmentation on the feedback data of the at least two time periods to generate word segmentation data, clusters the feedback data according to the use frequency of the word segmentation data in the feedback data to generate a plurality of data classifications, and generates a classification keyword corresponding to each data classification according to the word segmentation data corresponding to that data classification. The word segmentation processing segments the feedback data into words, removes stop words, garbled characters and useless characters, generates a plurality of word segmentation data, and obtains the frequency of each segmented word. Specifically, the FP-growth algorithm can be used to count the use frequency of the word segmentation data and extract frequent words with a high use frequency; the feedback data is then distributed to different frequent words to form subclasses according to whether the frequent words appear in the feedback data, and the subclasses undergo secondary optimization by hierarchical clustering to generate a plurality of data classifications. The data processing device generates the classification keywords of each data classification according to the word segmentation data corresponding to that data classification; specifically, the frequent words corresponding to each data classification are scored, a higher use frequency giving a higher score, and the words whose scores are greater than a score threshold are used as the classification keywords of the data classification.
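The following simplified sketch illustrates only the subclass-forming and keyword-scoring steps. It deliberately swaps in plain single-word frequency counting for FP-growth, uses whitespace splitting in place of a real word segmenter, and omits the hierarchical clustering refinement; the stop-word list, thresholds and function names are all hypothetical.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "is", "to", "i", "my"}  # illustrative stop-word list


def segment(text):
    """Whitespace tokenisation as a stand-in for real word segmentation."""
    return [w for w in text.lower().split() if w.isalnum() and w not in STOP_WORDS]


def frequent_word_subclasses(feedback, min_support=2):
    """Assign each feedback text to the subclass of every frequent word it contains."""
    tokenised = [set(segment(t)) for t in feedback]
    # Use frequency: number of feedback texts in which each word appears.
    support = Counter(w for words in tokenised for w in words)
    frequent = {w for w, c in support.items() if c >= min_support}
    subclasses = defaultdict(list)
    for text, words in zip(feedback, tokenised):
        for w in words & frequent:
            subclasses[w].append(text)
    return dict(subclasses), support


def classification_keywords(texts, score_threshold):
    """Score words by frequency within the class; keep those above the threshold."""
    scores = Counter(w for t in texts for w in set(segment(t)))
    return sorted(w for w, s in scores.items() if s > score_threshold)


feedback = [
    "login fails again",
    "cannot login today",
    "payment page is slow",
    "login button broken",
]
subclasses, support = frequent_word_subclasses(feedback)
keywords = classification_keywords(subclasses["login"], score_threshold=1)
```

Here only "login" reaches the minimum support, so the three login-related texts form one subclass and "login" becomes its keyword.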

S203, determining the proportion of the target quantity of the target feedback data in the data classification to the total quantity of the feedback data in the data classification as a feedback proportion; the target feedback data refers to feedback data belonging to a target time period in a data classification, and the at least two time periods comprise the target time period;

S204, determining the data classification with the feedback proportion larger than the proportion threshold value as the data classification to be alarmed, and storing the data classification to be alarmed into a data set to be alarmed;

Specifically, the data processing device determines each data classification whose feedback proportion is greater than the proportion threshold as a data classification to be alarmed and stores it into the data set to be alarmed. It can be understood that the proportion threshold is preset; for example, if the proportion threshold is 0.9, each data classification whose feedback proportion is greater than 0.9 is determined as a data classification to be alarmed and stored into the data set to be alarmed. The data set to be alarmed is used for storing the data classifications to be alarmed: the data processing device checks each data classification and stores those exceeding the proportion threshold into the data set to be alarmed, so the data set to be alarmed may include a plurality of data classifications to be alarmed.
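Steps S203 and S204 amount to a ratio and a comparison, as the sketch below shows. The data layout (a dict from classification name to a list of feedback items) and the `in_target_period` predicate are assumptions for illustration.

```python
def feedback_proportion(classification, in_target_period):
    """Share of a classification's feedback that belongs to the target time period."""
    total = len(classification)
    if total == 0:
        return 0.0
    target = sum(1 for item in classification if in_target_period(item))
    return target / total


def classifications_to_alarm(classifications, in_target_period, proportion_threshold=0.9):
    """Names of the classifications whose feedback proportion exceeds the threshold."""
    return [
        name
        for name, items in classifications.items()
        if feedback_proportion(items, in_target_period) > proportion_threshold
    ]
```

For instance, a classification with 19 of its 20 feedback items in the target time period has a feedback proportion of 0.95 and would be stored into the data set to be alarmed under a threshold of 0.9.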

S205, performing alarm deduplication screening on the data classifications to be alarmed in the data set to be alarmed;

Specifically, the data processing device performs alarm deduplication screening on the data classifications to be alarmed in the data set to be alarmed. It can be understood that the deduplication screening includes screening out data classifications that have already been alarmed and screening out uncivilized data classifications.

Screening out the data classifications that have already been alarmed specifically comprises the following steps S1 to S4:

S1, detecting the keyword repetition degree between the classification keywords of the data classification to be alarmed and the classification keywords of the alarm data classifications in the alarm list; an alarm data classification in the alarm list is a data classification that has already been alarmed;

Specifically, the data processing device detects the keyword repetition degree between the classification keywords of the data classification to be alarmed and the classification keywords of the alarm data classifications in the alarm list, an alarm data classification in the alarm list being a data classification that has already been alarmed. It can be understood that the alarm list stores a plurality of data classifications that have already been alarmed together with their classification keywords. The keyword repetition degree is the number of keywords that appear both in the classification keywords of the data classification to be alarmed and in the classification keywords of an alarmed data classification in the alarm list. The data processing device obtains the keyword repetition degree between the data classification to be alarmed and each of the alarmed data classifications in the alarm list.

S2, determining the data classification to be alarmed whose keyword repetition degree is greater than the repetition degree threshold as a first data classification to be alarmed;

Specifically, the data processing device determines a data classification to be alarmed whose keyword repetition degree is greater than the repetition degree threshold as a first data classification to be alarmed. It can be understood that the repetition degree threshold is preset. Because the alarm list stores a plurality of alarmed data classifications, a plurality of keyword repetition degrees may exist between a data classification to be alarmed and the alarmed data classifications; a data classification to be alarmed for which any of these keyword repetition degrees is greater than the repetition degree threshold is determined as a first data classification to be alarmed.

S3, determining the data classification to be alarmed whose keyword repetition degree is smaller than or equal to the repetition degree threshold and whose cosine distance is greater than the distance threshold as a second data classification to be alarmed; the cosine distance refers to the cosine distance between the classification keywords and the alarmed keywords;

Specifically, the data processing device determines a data classification to be alarmed whose keyword repetition degree is less than or equal to the repetition degree threshold and whose cosine distance is greater than the distance threshold as a second data classification to be alarmed. It can be understood that the cosine distance is the cosine distance between the vector obtained by vectorizing the classification keywords and the vector obtained by vectorizing the alarmed keywords, and that the distance threshold is preset.

S4, determining that the first data classification to be alarmed and the second data classification to be alarmed meet the alarm duplication elimination condition, and deleting the data classification to be alarmed meeting the alarm duplication elimination condition from the data set to be alarmed.

Specifically, the data processing device determines that the first to-be-alarmed data classification and the second to-be-alarmed data classification meet an alarm duplication elimination condition, and deletes the to-be-alarmed data classification meeting the alarm duplication elimination condition from a to-be-alarmed data set.
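Steps S1 to S4 can be sketched as follows. Two points are assumptions rather than statements of the embodiment: the text speaks of a "cosine distance" exceeding a threshold, which this sketch reads as cosine similarity (a higher value means the keyword vectors are closer), and the vectorization is a toy lookup table (`TOY_VECTORS`, `embed`) standing in for whatever vectorization is actually used. All names and threshold values are hypothetical.

```python
import math

# Toy 2-D word vectors; a real system would use a trained vectorizer.
TOY_VECTORS = {
    "login": (1.0, 0.0), "signin": (0.9, 0.1), "password": (0.95, 0.05),
    "fails": (0.8, 0.2), "error": (0.8, 0.3),
    "payment": (0.0, 1.0), "slow": (0.1, 0.9),
}


def embed(keywords):
    """Sum the toy vectors of the known keywords."""
    vectors = [TOY_VECTORS[k] for k in keywords if k in TOY_VECTORS]
    return tuple(sum(axis) for axis in zip(*vectors)) if vectors else (0.0, 0.0)


def keyword_repetition(candidate, alarmed):
    """S1: number of keywords shared by the two keyword lists."""
    return len(set(candidate) & set(alarmed))


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_duplicate(candidate, alarm_list, repetition_threshold, similarity_threshold):
    """True if the candidate matches any already-alarmed classification."""
    for alarmed in alarm_list:
        if keyword_repetition(candidate, alarmed) > repetition_threshold:
            return True  # S2: first data classification to be alarmed
        if cosine_similarity(embed(candidate), embed(alarmed)) > similarity_threshold:
            return True  # S3: second data classification to be alarmed
    return False


def deduplicate(to_alarm, alarm_list, repetition_threshold=1, similarity_threshold=0.8):
    """S4: remove classifications that meet the alarm deduplication condition."""
    return {
        name: kws
        for name, kws in to_alarm.items()
        if not is_duplicate(kws, alarm_list, repetition_threshold, similarity_threshold)
    }
```

The second check catches classifications that describe an already-alarmed problem in different words, which a pure keyword count would miss.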

Screening out uncivilized data classifications specifically includes:

detecting, through a support vector machine, the classification keywords of the data classification to be alarmed in the data set to be alarmed to generate a detection result; deleting from the data set to be alarmed the data classification to be alarmed whose detection result is the target detection result; the plurality of detection results corresponding to the support vector machine include the target detection result.

Specifically, the data processing device detects, through the support vector machine, the classification keywords of the data classification to be alarmed in the data set to be alarmed to generate a detection result. The support vector machine is trained in advance with training data carrying data labels, each data label being one of the plurality of detection results corresponding to the support vector machine. The vector generated by vectorizing the classification keywords of the data classification to be alarmed is passed through the support vector machine to generate a detection result, and when the detection result is the target detection result, the corresponding data classification to be alarmed is deleted from the data set to be alarmed. For example, when the classification keywords of a data classification to be alarmed are uncivilized words and the detection result is therefore the target detection result, that data classification to be alarmed is deleted from the data set to be alarmed.

S206, determining the data classification to be alarmed that remains after the alarm deduplication screening as a target data classification.

Specifically, the data processing device determines the classification of the data to be alarmed after the alarm duplication removal screening is performed as the target data classification. The target data classification is a data classification for performing alarms, and the target data classification may include a plurality of data classifications.

S207, packaging feedback data meeting priority conditions in the alarm data classification and classification keywords of the alarm data classification to generate alarm information; and outputting and displaying the alarm information.

Specifically, the data processing device packages the feedback data meeting the priority condition in the alarm data classification together with the classification keywords of the alarm data classification to generate alarm information, and outputs and displays the alarm information. It can be understood that the target data classification is the data classification for which an alarm is performed, and the alarm information is information in which the content of the target data classification is typeset for display. The alarm information includes the classification keywords and the feedback information in the target data classification. The feedback information in the alarm information can be displayed partially as needed; specifically, a priority sorting manner can be adopted in which the five pieces of feedback information with the highest degree of association with the classification keywords are displayed, which makes it easier for developers to accurately obtain the key information in the alarm information. The alarm information can also encapsulate other information, for example the feedback proportion, the quantity of feedback information in the target time period, and the number of users who sent feedback information in the target time period. The alarm information is sent to a target address for output and display; specifically, it can be output by means of short messages, WeChat and the like.

Referring to fig. 4, a flow chart of a data processing method according to an embodiment of the present application is schematically shown. As shown in fig. 4, the data processing device obtains feedback data of at least two time periods from a feedback database, where the feedback data includes feedback data of a target time period and feedback data of an associated time period. In order to keep the quantity of feedback data stable, the feedback data is obtained by combining time period backtracking and quantity backtracking. The feedback data is classified by an unsupervised clustering method to generate a plurality of data classifications and the classification keywords corresponding to each data classification. The ratio of the target quantity of target feedback data in a data classification to the total quantity of feedback data in that data classification is determined as the feedback proportion, where the target feedback data is generally the feedback data obtained by backtracking from the current time point. A data classification whose feedback proportion is greater than the proportion threshold is determined as a data classification to be alarmed and stored into the data set to be alarmed. Alarm deduplication screening is performed on the data classifications to be alarmed in the data set to be alarmed to generate the target data classification; the deduplication screening screens out data classifications that have already been alarmed and uncivilized data classifications. The feedback data meeting the priority condition in the target data classification and the classification keywords of the alarm data classification are packaged to generate alarm information, which is output and displayed. The alarm information is used for displaying the information in the target data classification and includes the classification keywords and the feedback information in the target data classification. The feedback information can be displayed partially as needed; specifically, a priority sorting manner can be adopted, which makes it easier for developers to accurately obtain the key information in the alarm information. The alarm information can also display other information, for example the feedback proportion, the quantity of feedback information in the target time period, and the number of users who sent feedback information in the target time period.
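Packaging the alarm information could be sketched as below. The text does not define how the degree of association with the classification keywords is measured, so this sketch assumes a simple count of keywords occurring in each piece of feedback; the function name and the dictionary fields are likewise hypothetical.

```python
def build_alarm_info(keywords, feedback_texts, feedback_proportion, top_n=5):
    """Package keywords, the top-ranked feedback and summary figures into one record."""

    def association(text):
        # Assumed measure: how many classification keywords occur in the text.
        return sum(1 for k in keywords if k in text)

    # sorted() is stable, so feedback with equal scores keeps its original order.
    ranked = sorted(feedback_texts, key=association, reverse=True)
    return {
        "keywords": list(keywords),
        "top_feedback": ranked[:top_n],
        "feedback_proportion": feedback_proportion,
        "feedback_count": len(feedback_texts),
    }
```

The resulting record could then be rendered and sent to the target address by whatever channel is configured.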

Referring to fig. 5, a schematic structural diagram of a data processing device is provided in an embodiment of the present application. As shown in fig. 5, the data processing apparatus 1 according to the embodiment of the present application may include: a feedback data acquisition unit 11, a feedback data clustering unit 12, a feedback proportion confirmation unit 13, an alarm data confirmation unit 14, and an alarm information generation unit 15.

A feedback data acquiring unit 11, configured to acquire feedback data of at least two time periods;

specifically, the feedback data acquiring unit 11 acquires feedback data of at least two time periods from a feedback database, it can be understood that the feedback database is used for storing feedback data, the feedback data sent by a user on an application program are all stored in the feedback database, the feedback data acquiring unit 11 can acquire the feedback data in the feedback database according to a preset statistical frequency, the feedback data is a suggestion or a problem of the user for the application program, the feedback data is mainly text feedback or voice feedback, the data processing device acquires the feedback data of at least two time periods from the feedback database, the feedback data carries timestamp information, the timestamp information is feedback time when the user feeds back the feedback data, the feedback data in a certain time period is feedback information when a timestamp falls on the time period, specifically, the feedback data in the two time periods may be feedback data fed back by users in the same time period on different dates.

The feedback data clustering unit 12 is configured to cluster the feedback data of the at least two time periods to generate a plurality of data classifications and a classification keyword corresponding to each data classification;

specifically, the feedback data clustering unit 12 clusters the feedback data of the at least two time periods to generate a plurality of data classifications and classification keywords corresponding to each data classification, it can be understood that the feedback data of the at least two time periods are clustered, the feedback data of the at least two time periods can be classified by adopting an unsupervised learning mode, the clustering method includes a partition method, a hierarchy method, a model algorithm, etc., the specific clustering algorithm includes a K-MEANS algorithm, a BIRCH algorithm, etc., the feedback data clustering unit 12 clusters the feedback data of the at least two time periods by adopting an unsupervised clustering method to generate a plurality of data classifications, each data classification includes a plurality of feedback data, the feedback data in each data classification can include feedback data in any one time period of the at least two time periods, the feedback data clustering unit 12 generates corresponding classification keywords according to the feedback data in each data classification, the classification keyword of each classification data may include a plurality of classification keywords.

A feedback proportion confirming unit 13, configured to determine, as a feedback proportion, a proportion of a target number of target feedback data in the data classification to a total amount of feedback data in the data classification; the target feedback data refers to feedback data belonging to a target time period in a data classification, and the at least two time periods comprise the target time period;

specifically, the feedback proportion confirming unit 13 determines a proportion of a target number of target feedback data in the data classification to a total amount of feedback data in the data classification as a feedback proportion; the target feedback data refers to feedback data belonging to a target time period in a data classification, the at least two time periods include the target time period, it can be understood that the data classification is any data classification generated through clustering, the target feedback data is feedback data belonging to a preset target time period, the target time period is one time period of the at least two time periods, a time period with important attention is generally taken as the target time period, a data processing device determines a ratio of a target number of the feedback data belonging to the target time period in the data classification to a total amount of the feedback data in the data classification as a feedback ratio, and the feedback ratio can reflect influence of the target feedback data in the data classification.

The alarm data confirmation unit 14 is configured to determine a data classification for performing an alarm as a target data classification according to the feedback proportion of each data classification and a classification keyword of each data classification;

Specifically, the alarm data confirmation unit 14 determines the data classification for which an alarm is performed as the target data classification according to the feedback proportion of each data classification and the classification keywords of each data classification. It can be understood that the target data classification is the data classification, among the data classifications generated by clustering, for which an alarm is performed. The alarm data confirmation unit 14 determines it by judging the feedback proportion and the classification keywords of each data classification. Specifically, the feedback proportion can be judged by setting a proportion threshold for the feedback proportion or a quantity threshold for the target feedback data in the data classification, and the classification keywords can be judged by setting "no-alarm keywords" to screen the data classifications. For example, the "no-alarm keywords" include uncivilized words and the keywords of data classifications that have already been alarmed, so that repeated alarms and false alarms for a data classification can be prevented.

Referring to fig. 5, the alarm data confirmation unit 14 according to the embodiment of the present application may include: a data storage subunit 141, a data screening subunit 142, and a data confirmation subunit 143.

The data storage subunit 141 is configured to determine the data classification with the feedback proportion larger than the proportion threshold as a data classification to be alarmed, and store the data classification to be alarmed in the data set to be alarmed;

a data screening subunit 142, configured to perform alarm duplication elimination screening on the data to be alarmed in the data set to be alarmed;

a data confirmation subunit 143, configured to determine the data to be alarmed after the alarm deduplication screening is performed as a target data classification;

The alarm information generating unit 15 is configured to generate alarm information according to the target data classification.

Specifically, the alarm information generating unit 15 generates alarm information based on the target data classification, it can be understood that the target data classification is a data classification for performing alarm, the alarm information is information for performing typesetting display on information in the target data classification, the alarm information includes a classification keyword and feedback information in the target data classification, and the feedback information in the alarm information may be partially displayed as needed, specifically, the five pieces of feedback information with the highest degree of association with the classified keywords can be displayed in a priority sorting mode, so that a developer can more conveniently and accurately obtain the key information in the alarm information, and the alarm information can also display other information, such as, the feedback proportion and the number of the feedback information in the target time period of the feedback information, the number of users sending the feedback information in the target time period, and the like can be included.

In the embodiment of the application, feedback data of at least two time periods are obtained; the feedback data of the at least two time periods are clustered to generate a plurality of data classifications and classification keywords corresponding to each data classification; the proportion of the target quantity of the target feedback data in a data classification to the total quantity of the feedback data in the data classification is determined as a feedback proportion, where the target feedback data refers to feedback data belonging to a target time period in one data classification, and the at least two time periods comprise the target time period; the data classification for which an alarm is performed is determined according to the feedback proportion of each data classification and the classification keywords of each data classification and is taken as the target data classification; and alarm information is generated according to the target data classification. Compared with clustering the feedback data of only one time period to determine the data classification for which an alarm is performed, clustering the feedback data of at least two time periods and determining that data classification according to the feedback proportion of the target time period addresses the problem that, for the small amount of user feedback caused by some sudden problems, the alarm system has a low degree of perception, responds slowly and cannot handle the accident in time at its early stage. The missing report rate of the alarm system is thereby reduced, the sensitivity of the alarm system is improved, and the occurrence rate of operation accidents is reduced.

Fig. 6 is a schematic structural diagram of a data processing device according to an embodiment of the present disclosure. As shown in fig. 6, the data processing apparatus 1000 may include: at least one processor 1001, such as a CPU, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002. The communication bus 1002 is used to implement connection and communication among these components. The user interface 1003 may include a display screen (Display), and may optionally also include a standard wired interface or a wireless interface. The network interface 1004 may optionally include a standard wired interface or a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 1005 may optionally be at least one memory device located remotely from the processor 1001. As shown in fig. 6, the memory 1005, which is a type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a data processing application program.

In the data processing apparatus 1000 shown in fig. 6, a network interface 1004 may provide a network communication function, and a user interface 1003 is mainly used as an interface for providing an input for a user; the processor 1001 may be configured to call a data processing application stored in the memory 1005, so as to implement the description of the data processing method in the embodiment corresponding to any one of fig. 2 to fig. 4, which is not described herein again.

It should be understood that the data processing apparatus 1000 described in this embodiment of the application may perform the description of the data processing method in the embodiment corresponding to any one of fig. 2 to fig. 4, and may also perform the description of the data processing apparatus in the embodiment corresponding to fig. 5, which is not described herein again. In addition, the beneficial effects of the same method are not described in detail.

Further, here, it is to be noted that: an embodiment of the present application further provides a computer-readable storage medium, where a computer program executed by the aforementioned data processing apparatus is stored in the computer-readable storage medium, and the computer program includes program instructions, and when the processor executes the program instructions, the description of the data processing method in any one of the embodiments corresponding to fig. 2 to fig. 4 can be performed, so that details are not repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in embodiments of the computer-readable storage medium referred to in the present application, reference is made to the description of embodiments of the method of the present application.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above disclosure is only for the purpose of illustrating the preferred embodiments of the present application and should not be taken as limiting the scope of the present application, so that the present application will be covered by the appended claims.

Claims

1. A data processing method, comprising:

acquiring feedback data of at least two time periods;

performing word segmentation processing on the feedback data of the at least two time periods to generate word segmentation data, acquiring the use frequency of the word segmentation data in the feedback data, acquiring frequent word segmentation from the word segmentation data according to the use frequency, distributing the feedback data to different frequent word segmentation to form subclasses according to whether the frequent word segmentation appears in the feedback data, performing secondary optimization on the subclasses by using hierarchical clustering to generate a plurality of data classifications, and generating classification keywords corresponding to each data classification according to the word segmentation data corresponding to each data classification;

determining the proportion of the target quantity of the target feedback data in the data classification to the total quantity of the feedback data in the data classification as a feedback proportion; the target feedback data refers to feedback data belonging to a target time period in one data classification, and the at least two time periods comprise the target time period;

and generating alarm information according to the target data classification.

2. The method of claim 1, wherein obtaining feedback data for at least two time periods comprises:

3. The method of claim 2, further comprising:

and detecting a second data quantity of the feedback data of the associated time period, and when the second data quantity is smaller than a second quantity threshold, acquiring the feedback data of the second quantity threshold in a quantity backtracking data mode.

4. The method according to claim 1, wherein the determining a data classification for performing an alarm according to the feedback proportion and the classification keyword of each data classification as a target data classification comprises:

determining that the first data classification to be alarmed and the second data classification to be alarmed meet an alarm duplication elimination condition, deleting the data classification to be alarmed meeting the alarm duplication elimination condition from a data set to be alarmed, and determining the data classification to be alarmed after the alarm duplication elimination screening as a target data classification.

5. The method according to claim 1, wherein the determining, as the target data classification, the data classification for performing the alarm according to the feedback proportion and the classification keyword of each data classification comprises:

deleting, from the data set to be alarmed, each data classification to be alarmed whose detection result is the target detection result; the multiple detection results corresponding to the support vector machine comprise the target detection result;

and determining the data classifications to be alarmed that remain after the alarm duplication removal screening as the target data classification.

6. The method of claim 1, wherein generating alarm information according to the target data classification comprises:

packaging the feedback data in the target data classification that meets a priority condition, together with the classification keywords of the alarm data classification, to generate alarm information;

and outputting and displaying the alarm information.
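The packaging step of claim 6 can be sketched in Python as follows. The claim does not define the priority condition, so this sketch takes it as a caller-supplied predicate; `AlarmInfo`, `build_alarm`, `render_alarm`, and the text layout are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AlarmInfo:
    keywords: list
    examples: list = field(default_factory=list)


def build_alarm(keywords, feedback_items, meets_priority):
    # Keep only the feedback that satisfies the caller-supplied priority condition.
    selected = [item for item in feedback_items if meets_priority(item)]
    return AlarmInfo(keywords=sorted(keywords), examples=selected)


def render_alarm(alarm):
    # Plain-text rendering for output and display.
    lines = ["ALARM: " + ", ".join(alarm.keywords)]
    lines += ["  - " + text for text in alarm.examples]
    return "\n".join(lines)


# Hypothetical target classification and priority condition.
alarm = build_alarm(
    {"login", "fails"},
    ["login fails", "ok", "cannot login"],
    meets_priority=lambda text: "login" in text,
)
print(render_alarm(alarm))
```

Passing the condition in as a function keeps the sketch neutral about how priority is decided.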

7. A data processing apparatus, characterized by comprising:

a feedback data clustering unit, configured to perform word segmentation on the feedback data of the at least two time periods to generate word segmentation data, acquire the usage frequency of the word segmentation data in the feedback data, select frequent segmented words from the word segmentation data according to the usage frequency, assign the feedback data to the different frequent segmented words to form subclasses according to whether each frequent segmented word appears in the feedback data, perform secondary optimization on the subclasses using hierarchical clustering to generate a plurality of data classifications, and generate classification keywords corresponding to each data classification according to the word segmentation data corresponding to that data classification;

a feedback proportion confirming unit, configured to determine, as a feedback proportion, a proportion of a target number of target feedback data in the data classification to a total amount of feedback data in the data classification; the target feedback data refers to feedback data belonging to a target time period in a data classification, and the at least two time periods comprise the target time period;

8. A computer storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions that, when executed by a processor, perform the method according to any one of claims 1-6.