CN112291424B

CN112291424B - Fraud number identification method and device, computer equipment and storage medium

Info

Publication number: CN112291424B
Application number: CN202011176102.0A
Authority: CN
Inventors: 钱沁莹; 葛胜利; 汲丽
Original assignee: Information and Data Security Solutions Co Ltd
Current assignee: Information and Data Security Solutions Co Ltd
Priority date: 2020-10-29
Filing date: 2020-10-29
Publication date: 2021-09-14
Anticipated expiration: 2040-10-29
Also published as: CN112291424A

Abstract

The invention is applicable to the technical field of computers, and provides a fraud number identification method, a fraud number identification device, computer equipment and a storage medium, wherein the method comprises the following steps: acquiring communication characteristic information of a number to be identified; processing the communication characteristic information according to a preset fraud number identification model to generate a fraud number identification result; the preset fraud number recognition model is generated by training a self-training classification algorithm based on semi-supervised learning in advance. The fraud number identification method provided by the invention trains and generates the fraud number identification model by using the self-training classification algorithm which does not need to rely on a large amount of sample data marked with labels in the training process, can train and obtain a better identification model, has good adaptability in the field of fraud telephone identification with insufficient sample data, and has high accuracy of fraud number identification results obtained by processing communication characteristic information by using the fraud number identification model.

Description

Fraud number identification method and device, computer equipment and storage medium

Technical Field

The invention belongs to the technical field of computers, and particularly relates to a fraud number identification method and device, computer equipment and a storage medium.

Background

In an operator's business scenarios, fraud telephone identification is an important component. Existing fraud telephone identification solutions common in the industry fall into two broad categories: rule engines and machine learning methods. Machine learning methods are widely applied in anti-fraud scenarios because they are automated and intelligent. From a technical perspective, the identification of fraudulent calls can be abstracted as a classification problem in supervised learning. In practical applications, however, the difficulty of obtaining positive and negative sample labels for supervised learning urgently needs to be solved.

Supervised learning techniques require operators to have accumulated sufficient historical labels, or to rely on expert experience to label some fraudulent telephone numbers. The comprehensiveness and reliability of the existing sample labels therefore strongly affect the accuracy of the model's identification results. In short, supervised learning depends too heavily on sample labeling; fraud telephone identification that uses supervised learning alone is limited in its application scenarios and is weak at identifying new types of fraud groups.

As can be seen, the existing fraud telephone identification technology relies on the labeling of the known fraud telephone label, which affects the identification accuracy, identification efficiency and application range of fraud telephone identification.

Disclosure of Invention

The embodiment of the invention aims to provide a fraud number identification method, and aims to solve the technical problem that the identification accuracy, identification efficiency and application range of fraud telephone identification are influenced by depending on marking of known fraud telephone labels in the existing fraud telephone identification technology.

The embodiment of the invention is realized in such a way that a fraud number identification method comprises the following steps:

acquiring communication characteristic information of a number to be identified; the communication characteristic information comprises at least one of base station data, call data, short message data and traffic data;

processing the communication characteristic information according to a preset fraud number identification model to generate a fraud number identification result; the preset fraud number recognition model is generated by training a self-training classification algorithm based on semi-supervised learning in advance.

Another object of an embodiment of the present invention is to provide a fraud number identification apparatus, including:

the communication characteristic information acquisition unit is used for acquiring the communication characteristic information of the number to be identified; the communication characteristic information comprises at least one of base station data, call data, short message data and traffic data;

the fraud number identification unit is used for processing the communication characteristic information according to a preset fraud number identification model to generate a fraud number identification result; the preset fraud number recognition model is generated by training a self-training classification algorithm based on semi-supervised learning in advance.

It is a further object of embodiments of the present invention to provide a computer device, comprising a memory and a processor, said memory having stored therein a computer program, which, when executed by said processor, causes said processor to perform the steps of said fraud number identification method as described above.

It is a further object of embodiments of the present invention to provide a computer-readable storage medium, having stored thereon a computer program, which, when executed by a processor, causes the processor to perform the steps of the fraud number identification method as described above.

According to the fraud number identification method provided by the embodiment of the invention, after the communication characteristic information of the number to be identified, such as base station data, call data, short message data and traffic data, is obtained, the communication characteristic information is directly processed according to the preset fraud number identification model to generate a fraud number identification result, wherein the preset fraud number identification model is generated by training a self-training classification algorithm based on semi-supervised learning in advance.

Drawings

FIG. 1 is a flow chart illustrating steps of a method for identifying a fraud number according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating steps of training a fraud number recognition model according to an embodiment of the present invention;

FIG. 3 is a flowchart of another step of training a fraud number identification model according to an embodiment of the present invention;

FIG. 4 is a flow chart of steps of another fraud number identification method provided by the embodiment of the present invention;

FIG. 5(a) is an undirected connected graph of normal user group numbers and calling devices;

FIG. 5(b) is an undirected connected graph of abnormal user group numbers and calling devices;

FIG. 6 is a flow chart of steps of still another fraud number identification method provided by the embodiment of the present invention;

FIG. 7 is a flow chart illustrating steps of a method for identifying a victim group according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a fraud number identification apparatus according to an embodiment of the present invention;

FIG. 9 is an internal structural diagram of a computer device for executing a fraud number identification method according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As shown in fig. 1, a flow chart of steps of a fraud number identification method provided in the embodiment of the present invention specifically includes the following steps:

Step S102, obtaining the communication characteristic information of the number to be identified.

In the embodiment of the present invention, the communication characteristic information generally includes one or more of base station data, call data, short message data, and traffic data. Specifically, the base station data includes information such as the number's home location; the call data includes information such as the calling number, the called number, the location of the calling number, the device number used by the calling number, and the call duration; the short message data includes information such as the calling number, the called number, the location of the calling number, and all device numbers of the calling number; and the traffic data includes information such as the traffic volume of each month and the app corresponding to the traffic.

In the embodiment of the present invention, in view of the timeliness required for fraudulent call identification, the communication characteristic information is preferably taken from all records of the month with the highest spending in the last half year and all records of the last month.

In the embodiment of the invention, the original communication characteristic information is usually stored as dictionary data. A feature vectorization technique is used to vectorize the dictionary data structure into the communication characteristic information, which saves a large amount of space for sparse matrices and categorical variables. For example, consider the feature "traffic volume (MB) used by each of a user's apps in the current month": if the traffic volume of each app is one feature, the table has tens of thousands of dimensions in total, and apps that a user rarely uses produce a large number of feature values equal to 0. Storing such a wide table as a dense matrix consumes a large amount of memory, whereas feature vectorization effectively saves memory.
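The record selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `months_to_sample`, the "YYYY-MM" month keys, and the reading of "the last month" as the most recent month present in the billing data are all assumptions.

```python
from typing import Dict, List


def months_to_sample(spend_by_month: Dict[str, float]) -> List[str]:
    """Pick the months whose records feed feature extraction.

    Assumption: keys are "YYYY-MM" strings, so lexicographic order is
    chronological order. Only the six most recent months are considered.
    """
    if not spend_by_month:
        raise ValueError("no billing data")
    recent = sorted(spend_by_month)[-6:]                 # last half year
    top = max(recent, key=lambda m: spend_by_month[m])   # highest-spending month
    return sorted({top, recent[-1]})                     # plus the most recent month


# Hypothetical monthly spending for one number.
spend = {"2020-04": 58.0, "2020-05": 131.5, "2020-06": 62.0,
         "2020-07": 49.9, "2020-08": 75.0, "2020-09": 66.3}
selected = months_to_sample(spend)
```

When the highest-spending month is also the most recent one, the set collapses to a single month, so no record is counted twice.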
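A minimal sketch of the dictionary-to-sparse-vector idea follows, using only the Python standard library. The helper names `fit_vocabulary` and `vectorize` and the `app:` feature names are hypothetical; the source does not name a specific vectorization library or storage layout.

```python
from typing import Dict, List, Tuple


def fit_vocabulary(records: List[Dict[str, float]]) -> Dict[str, int]:
    """Assign a column index to every feature name seen in the records."""
    vocab: Dict[str, int] = {}
    for rec in records:
        for name in sorted(rec):
            vocab.setdefault(name, len(vocab))
    return vocab


def vectorize(rec: Dict[str, float], vocab: Dict[str, int]) -> List[Tuple[int, float]]:
    """Return a sparse row: (column, value) pairs for non-zero known features only."""
    return sorted((vocab[k], v) for k, v in rec.items() if k in vocab and v != 0)


# Hypothetical monthly traffic (MB) per app for three numbers.
records = [
    {"app:chat": 120.0, "app:video": 900.5},
    {"app:chat": 15.0},
    {"app:bank": 3.2, "app:video": 0.0},
]
vocab = fit_vocabulary(records)
rows = [vectorize(r, vocab) for r in records]
stored = sum(len(r) for r in rows)      # values actually stored
dense = len(records) * len(vocab)       # cells a dense wide table would need
```

Only the non-zero entries are stored, so the memory cost grows with the number of apps a user actually uses rather than with the total number of apps.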

Step S104, processing the communication characteristic information according to a preset fraud number identification model to generate a fraud number identification result.

In the embodiment of the invention, the preset fraud number recognition model is generated by training a self-training classification algorithm based on semi-supervised learning in advance.

In the embodiment of the invention, compared with conventional machine learning algorithms, the self-training classification algorithm (self-training) of semi-supervised learning can train a good recognition model without depending on a large number of labeled samples, and therefore adapts well to the field of fraud telephone recognition, where labeled samples are scarce. The steps of training and generating the fraud number recognition model based on the self-training classification algorithm are described with reference to fig. 2.

As shown in fig. 2, a flowchart of the steps for training and generating a fraud number recognition model provided in the embodiment of the present invention specifically includes the following steps:

Step S202, a labeled data set and an unlabeled data set are obtained.

In the embodiment of the present invention, the labeled data set comprises a plurality of labeled sample numbers carrying both fraud number identification result information and communication characteristic information, and usually comprises some positive sample numbers confirmed as fraud numbers and negative sample numbers of a plurality of normal users. The unlabeled data set comprises a plurality of unlabeled sample numbers that carry only communication characteristic information and for which it is unknown whether they are fraud numbers. In the field of fraud number identification, the unlabeled data set is usually much larger than the labeled data set. Conventional machine learning algorithms, however, can only be trained on the labeled data set, and in most cases, because there are too few positive samples confirmed as fraud numbers, the trained recognition model is not very accurate in practical use.

Step S204, determining the labeled data set as a training set, and training a fraud number recognition teacher model with the optimal current recognition effect based on a preset training rule.

In the embodiment of the invention, the preset training rule is usually a neural network model algorithm. Given the labeled data set as the training set, the fraud number recognition teacher model with the best current recognition effect can be determined using a conventional neural network algorithm. Given the limited number of samples in the labeled data set, however, this teacher model is not guaranteed to classify equally well in practical use.

Step S206, the non-label data set is identified according to the fraud number identification teacher model, and the fraud result prediction probabilities of the plurality of non-label sample numbers are determined.

In the embodiment of the invention, the fraud number recognition teacher model is further used to recognize the unlabeled data set, and its final output layer adopts a softmax form, which ensures that the result of processing the unlabeled data set is the fraud result prediction probability P of each unlabeled sample number, where P ∈ [0, 1].
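As a brief illustration of why a softmax output layer yields a probability in [0, 1], the sketch below converts two raw output scores into P. The ordering of the scores as [normal, fraud] and the function name `fraud_probability` are assumptions made for the example.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fraud_probability(logits: List[float]) -> float:
    """Assumes the two output scores are ordered [normal, fraud]; returns P(fraud)."""
    return softmax(logits)[1]


p_uncertain = fraud_probability([0.0, 0.0])   # equal scores, no preference
p_confident = fraud_probability([-3.0, 3.0])  # strong evidence for fraud
```

Because the outputs are non-negative and sum to 1, P can be compared directly with a confidence threshold.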

In step S208, the non-label sample number with the fraud result prediction probability exceeding the preset confidence threshold is updated to the pseudo-label data set.

In the embodiment of the present invention, the fraud result prediction probability P of an unlabeled sample number describes the probability that the number is a fraud number: the closer P is to 1, the more likely the number is a fraud number, and the closer P is to 0, the less likely. By setting a confidence threshold a, the unlabeled sample numbers whose prediction probability satisfies the preset condition are updated to the pseudo-label data set. In general, the confidence interval determined according to the confidence threshold a comprises the two ends of [0, 1], namely a sub-interval adjacent to 0 and a sub-interval adjacent to 1. When P falls within the confidence interval, the predicted result for the unlabeled sample number can be considered credible (the number is very likely a normal number or very likely a fraud number), and such unlabeled sample numbers are updated to the pseudo-label data set.

In the embodiment of the present invention, the confidence threshold a is the key to making pseudo labels: a threshold that is too high may result in too many false negatives (FN) in the pseudo labels, whereas a threshold that is too low may introduce false positives (FP). The confidence threshold a therefore usually needs to be adjusted adaptively in order to screen out the model with the highest accuracy. Fraud telephone identification scenarios usually exhibit sample imbalance, that is, the number of normal samples is far larger than the number of fraud samples, so precision and recall alone can hardly measure identification accuracy comprehensively. This scheme takes the class imbalance into account, measures identification accuracy with the weighted F1-score, and performs model screening accordingly.
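The pseudo-labeling step can be sketched as below. The two bounds `low` and `high` are hypothetical parameters standing in for the two ends of the confidence interval; how they are derived from the confidence threshold a is not assumed here.

```python
from typing import Dict, List, Tuple


def select_pseudo_labels(probs: Dict[str, float], low: float, high: float
                         ) -> Tuple[Dict[str, int], List[str]]:
    """Split unlabeled numbers into confident pseudo-labels and the remainder.

    probs maps each unlabeled number to its predicted fraud probability P.
    P >= high -> pseudo-label 1 (fraud); P <= low -> pseudo-label 0 (normal).
    """
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("expected 0 <= low < high <= 1")
    pseudo: Dict[str, int] = {}
    remaining: List[str] = []
    for number, p in probs.items():
        if p >= high:
            pseudo[number] = 1
        elif p <= low:
            pseudo[number] = 0
        else:
            remaining.append(number)   # not confident enough; stays unlabeled
    return pseudo, remaining


# Hypothetical teacher predictions for three unlabeled numbers.
pseudo, remaining = select_pseudo_labels(
    {"n1": 0.97, "n2": 0.03, "n3": 0.55}, low=0.1, high=0.9)
```

Numbers in `remaining` stay in the unlabeled data set and can be re-scored by a later teacher model.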
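For reference, a weighted F1-score averages the per-class F1 values, weighting each class by its share of the true labels. The sketch below is a plain-Python illustration of that common definition; the source does not give its exact formula, so treat the details as an assumption.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Per-class F1 averaged with weights equal to each class's support."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn)   # denominator > 0 because n > 0
        score += (n / total) * f1
    return score


# Imbalanced toy data: 8 normal numbers (0), 2 fraud numbers (1), one fraud missed.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]
score = weighted_f1(y_true, y_pred)
```

A candidate confidence threshold can then be evaluated by training with it and comparing the resulting models on this score over a held-out labeled set.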

Step S210, determining the labeled data set and the pseudo label data set as a new training set, and training and generating a fraud number recognition student model according to a preset training rule.

In the embodiment of the invention, the pseudo-label data set is added to the labeled data set to form a new, expanded training set, and the fraud number recognition student model is generated from this larger sample according to the training rule; the recognition effect of the fraud number recognition student model is required to be better than that of the fraud number recognition teacher model.

In step S212, it is determined whether a preset training end condition is satisfied. When it is determined that the preset training end condition is not satisfied, performing step S214; when it is judged that the preset training end condition is satisfied, step S216 is performed.

In the embodiment of the present invention, the preset training end condition is usually a number of iterations; alternatively, whether a fraud number recognition student model with a better recognition effect than the fraud number recognition teacher model exists may be used as the condition. When the training end condition is not satisfied, further iteration is needed and step S214 is executed; when the training end condition is satisfied, the fraud number recognition student model is the fraud number recognition model obtained by training.

Step S214, determining the fraud number recognition student model as a new fraud number recognition teacher model and returning to step S206.

In the embodiment of the invention, the fraud number recognition student model is taken as the new fraud number recognition teacher model and the unlabeled data set is processed again; at this point, the unlabeled data already updated to the pseudo-label data set is removed from the unlabeled data set.

Step S216, the fraud number recognition student model is determined as the fraud number recognition model generated by training.
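Steps S202 to S216 can be sketched as the loop below. This is a toy illustration under stated assumptions: the preset training rule is replaced by a one-dimensional midpoint classifier (`train_midpoint`) rather than a neural network, the bounds `low` and `high` are hypothetical, and the end condition is a fixed iteration count.

```python
import math
from typing import Callable, List, Tuple

Features = List[float]
Sample = Tuple[Features, int]
Model = Callable[[Features], float]          # returns P(fraud)


def train_midpoint(samples: List[Sample]) -> Model:
    """Toy stand-in for the training rule: sigmoid around the class-mean midpoint."""
    pos = [x[0] for x, y in samples if y == 1]
    neg = [x[0] for x, y in samples if y == 0]
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1.0 / (1.0 + math.exp(-(x[0] - mid)))


def self_train(labeled: List[Sample], unlabeled: List[Features],
               train: Callable[[List[Sample]], Model],
               low: float = 0.1, high: float = 0.9, max_iters: int = 5) -> Model:
    pool, pseudo = list(unlabeled), []
    teacher = train(labeled)                 # S204: teacher from labeled data
    student = teacher
    for _ in range(max_iters):               # S212: iteration-count end condition
        confident, rest = [], []
        for x in pool:                       # S206: score unlabeled samples
            p = teacher(x)
            if p >= high:
                confident.append((x, 1))
            elif p <= low:
                confident.append((x, 0))
            else:
                rest.append(x)
        if not confident:                    # nothing new to learn from
            break
        pseudo.extend(confident)             # S208: grow the pseudo-label set
        pool = rest                          # pseudo-labeled samples leave the pool
        student = train(labeled + pseudo)    # S210: student on the expanded set
        teacher = student                    # S214: student becomes the teacher
    return student                           # S216


# Hypothetical single feature, e.g. a scaled outgoing-call rate.
labeled = [([0.0], 0), ([1.0], 0), ([9.0], 1), ([10.0], 1)]
unlabeled = [[0.5], [5.5], [9.5], [2.0], [8.0]]
model = self_train(labeled, unlabeled, train_midpoint)
```

In this toy run, four of the five unlabeled samples are pseudo-labeled in the first round, and the ambiguous sample `[5.5]` is never added because no teacher is confident about it.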

The embodiment of the present invention thus provides the detailed steps of training and generating a fraud number recognition model based on a self-training classification algorithm (self-training). Considering that the self-training classification algorithm is prone to model overfitting, the present invention further provides an improved self-training classification algorithm to solve this problem; see fig. 3 and its explanation.

FIG. 3 is a flowchart of another procedure for training a fraud number recognition model according to an embodiment of the present invention, which is described in detail below.

In the embodiment of the present invention, the difference from the flowchart of the steps of training the fraud number identification model shown in fig. 2 is that the step S210 specifically includes:

Step S302, determining the labeled data set and the pseudo-label data set as a new training set, and training and generating a fraud number recognition student model according to the preset training rule and a preset noise adding rule.

In the embodiment of the invention, the improved self-training classification algorithm introduces random noise into the process of training the student model. Specifically, after the new training set is determined, the fraud number recognition student model is generated according to the preset training rule and the preset noise adding rule. The preset noise adding rule comprises a data noise rule for adding noise to the training set and a model noise rule for adding noise to the fraud number recognition student model. Generally, the model noise rule comprises one or more of dropout, stochastic depth and random augmentation, and the data noise mainly involves corrupting the data, for example deleting or modifying a certain proportion of the sample data. The dropout, stochastic depth and random augmentation rules are not described in detail herein.

In the embodiment of the invention, the generalization capability of the model can be effectively improved by adding the data noise, and the robustness of the model can be further improved by adding the model noise, so that the problem of model overfitting easily existing in the conventional self-training classification algorithm is solved.
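The two kinds of noise can be illustrated with the sketch below. The function names, the rates, and the use of Gaussian jitter as the "modification" are assumptions; the source only says that a proportion of sample data is deleted or modified and that dropout is one possible model noise.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]


def add_data_noise(samples: List[Sample], drop_rate: float, jitter_rate: float,
                   scale: float, rng: random.Random) -> List[Sample]:
    """Data noise: delete some samples and perturb the features of some others."""
    noisy: List[Sample] = []
    for x, y in samples:
        if rng.random() < drop_rate:
            continue                                   # delete this sample
        if rng.random() < jitter_rate:
            x = [v + rng.gauss(0.0, scale) for v in x]  # modify its features
        noisy.append((list(x), y))
    return noisy


def dropout(activations: List[float], rate: float, rng: random.Random) -> List[float]:
    """Model noise (inverted dropout): zero units with probability `rate`,
    and rescale the survivors so the expected activation is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)                                 # seeded for repeatability
train_set = [([float(i), float(i) * 2], i % 2) for i in range(20)]
noisy_set = add_data_noise(train_set, drop_rate=0.2, jitter_rate=0.3,
                           scale=0.1, rng=rng)
hidden = dropout([1.0] * 10, rate=0.5, rng=rng)
```

Both kinds of noise are applied only while training the student model; the teacher that produces the pseudo labels is run without noise in this sketch.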

As shown in FIG. 4, a flow chart of steps of another fraud number identification method provided by the embodiment of the invention is described in detail as follows.

In the embodiment of the present invention, the difference from the flow chart of steps of a fraud number identification method shown in fig. 1 is that after step S104, the method further includes:

Step S402, judging whether the calling device number of the number to be identified meets the preset fraud characteristics.

In the embodiment of the invention, in addition to using the fraud number identification model to identify the number to be identified, the invention studies the undirected connected graphs formed by calling numbers and calling device numbers in the call records of normal users and abnormal users, and on that basis provides a scheme that further judges whether a number is a fraud number by using its calling device number. In this case, the communication characteristic information of the number to be identified also includes the calling device number.

Step S404, when the calling device number of the number to be identified is judged to meet the preset fraud characteristics, the number to be identified is confirmed to be a fraud number.

In the embodiment of the present invention, the calling number of a normal user and the calling device number are generally in a one-to-one relationship, and in a few cases in a one-to-few relationship. In the undirected connected graph formed by the calling numbers and calling device numbers of abnormal users, by contrast, the number of nodes reaches as many as several thousand, that is, there are a large number of one-to-many relationships. The undirected connected graphs of the numbers and calling devices of the normal user group and the abnormal user group are shown in fig. 5. Therefore, whether the number to be identified is a fraud number can be further judged by judging whether its calling device number meets the preset fraud characteristics, and this judgment can be combined with the result of the fraud number identification model, which further improves the accuracy of the final result.

As shown in fig. 5(a) and 5(b), the undirected connectivity graphs of the numbers of the normal user group and the abnormal user group and the calling device are respectively described as follows.

In the embodiment of the present invention, as shown in fig. 5(a), in the undirected connected graph formed by the calling numbers and calling device numbers in normal users' call records, the number of nodes in a connected component does not exceed 3; that is, almost every calling number corresponds to one or a few device numbers.

In the embodiment of the present invention, as shown in fig. 5(b), in the undirected connected graph formed by the calling numbers and calling device numbers in abnormal users' call records, the number of nodes reaches as many as several thousand, and different groups present different connection patterns.
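The graph-based check can be sketched with a union-find over (calling number, device number) pairs. The concrete fraud characteristic used here, a connected component with more than `max_nodes` nodes, is an assumption derived from the observation that normal components do not exceed 3 nodes; the numbers and device identifiers are made up.

```python
from typing import Dict, Iterable, List, Set, Tuple

Node = Tuple[str, str]   # ("num", calling number) or ("dev", device number)


def connected_components(edges: Iterable[Tuple[str, str]]) -> List[Set[Node]]:
    """Group calling numbers and device numbers into connected components."""
    parent: Dict[Node, Node] = {}

    def find(x: Node) -> Node:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for number, device in edges:
        a, b = find(("num", number)), find(("dev", device))
        if a != b:
            parent[a] = b                   # merge the two components
    groups: Dict[Node, Set[Node]] = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def suspicious_numbers(edges: Iterable[Tuple[str, str]], max_nodes: int = 3) -> Set[str]:
    """Flag every calling number that sits in a component larger than max_nodes."""
    flagged: Set[str] = set()
    for comp in connected_components(edges):
        if len(comp) > max_nodes:
            flagged.update(value for kind, value in comp if kind == "num")
    return flagged


# Hypothetical call records: one ordinary user and one device-sharing group.
edges = [
    ("1380000", "IMEI-A"),
    ("1700001", "IMEI-X"), ("1700001", "IMEI-Y"),
    ("1700002", "IMEI-Y"), ("1700003", "IMEI-Y"),
]
flagged = suspicious_numbers(edges)
```

The ordinary user forms a two-node component and is left alone, while the three numbers sharing devices form a five-node component and are all flagged.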

Fig. 6 is a flow chart showing the steps of still another fraud number identification method provided by the embodiment of the present invention, which is described in detail as follows.

Step S602, after the number to be identified is determined to be a fraud number, determining a victim according to the communication characteristic information and a preset victim identification rule.

In the embodiment of the invention, in order to make fuller use of the fraud telephone identification results and to serve network environment governance and detection, the victim group and the degree of victimization can be obtained from the communication characteristic information of the fraud number by using the victim group identification rule, and the suspected victim group closely related to the victims can likewise be obtained. The specific implementation is shown in fig. 7.

As shown in fig. 7, a flowchart of steps of a method for identifying a victim group according to an embodiment of the present invention specifically includes the following steps:

Step S702, determining a sensitive number having communication interaction behavior with the fraud number according to the communication characteristic information.

In the embodiment of the invention, the number which has communication interaction with the fraud number is determined as the sensitive number according to the communication characteristic information.

Step S704, judging whether the sensitive number belongs to a victim group according to the conversation time length and the conversation times of the sensitive number and the fraud number and the judgment result of whether the sensitive number has communication interaction with other fraud numbers.

In the embodiment of the present invention, whether a sensitive number belongs to the victim group can be further determined from the call duration and number of calls between the sensitive number and the fraud number, and from whether the sensitive number has communication interaction with other fraud numbers. Usually, thresholds are set for the call duration and the number of calls, and a sensitive number whose call duration or number of calls exceeds the corresponding threshold is determined to belong to the victim group. A sensitive number that also has communication interaction with other fraud numbers may likewise be determined to belong to the victim group. The degree of victimization can be evaluated by setting different thresholds, and the final result can be used for subsequent network environment governance, for example carrying out anti-fraud education for the victim group with a higher degree of victimization.
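A minimal sketch of such a victim identification rule follows. The thresholds (`min_duration_s`, `min_calls`) and the scoring of the degree of victimization as the number of criteria met are hypothetical choices for illustration; the source leaves the concrete thresholds and the degree measure open.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Call = Tuple[str, str, int]   # (sensitive number, fraud number, duration in seconds)


def victim_degrees(calls: Iterable[Call], min_duration_s: int = 300,
                   min_calls: int = 3) -> Dict[str, int]:
    """Return {sensitive number: degree} for numbers judged to be victims.

    Degree (assumed scoring) counts how many of three criteria hold:
    long total talk time with one fraud number, many calls with one fraud
    number, and contact with more than one fraud number.
    """
    duration: Dict[Tuple[str, str], int] = defaultdict(int)
    count: Dict[Tuple[str, str], int] = defaultdict(int)
    partners: Dict[str, Set[str]] = defaultdict(set)
    for number, fraud, seconds in calls:
        duration[(number, fraud)] += seconds
        count[(number, fraud)] += 1
        partners[number].add(fraud)

    degrees: Dict[str, int] = {}
    for number, frauds in partners.items():
        long_talk = any(duration[(number, f)] > min_duration_s for f in frauds)
        frequent = any(count[(number, f)] > min_calls for f in frauds)
        multiple = len(frauds) > 1
        degree = int(long_talk) + int(frequent) + int(multiple)
        if degree:
            degrees[number] = degree
    return degrees


# Hypothetical interactions between sensitive numbers A-D and fraud numbers F1, F2.
calls = [
    ("A", "F1", 400),
    ("B", "F1", 20), ("B", "F2", 30),
    ("C", "F1", 10),
    ("D", "F1", 200), ("D", "F1", 200), ("D", "F2", 5),
]
degrees = victim_degrees(calls)
```

Numbers with a higher degree could be prioritized for follow-up such as anti-fraud education.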

As shown in fig. 8, a schematic structural diagram of a fraud number identification apparatus provided in an embodiment of the present invention specifically includes the following structures:

A communication characteristic information obtaining unit 810, configured to obtain communication characteristic information of the number to be identified.

A fraud number recognition unit 820, configured to process the communication characteristic information according to a preset fraud number recognition model, and generate a fraud number recognition result.

According to the fraud number identification device provided by the embodiment of the invention, after the communication characteristic information of the number to be identified, such as base station data, call data, short message data and traffic data, is obtained, the communication characteristic information is directly processed according to the preset fraud number identification model to generate a fraud number identification result, wherein the preset fraud number identification model is generated by training a self-training classification algorithm based on semi-supervised learning in advance.

FIG. 9 is a diagram illustrating an internal structure of a computer device in one embodiment. As shown in fig. 9, the computer apparatus includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus. Wherein the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may further store a computer program that, when executed by the processor, causes the processor to implement the fraud number identification method. The internal memory may also have a computer program stored therein, which when executed by the processor, causes the processor to perform a fraud number identification method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.

Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, the fraud number identification apparatus provided herein may be implemented in the form of a computer program that is executable on a computer device as shown in fig. 9. The memory of the computer device may store therein various program modules constituting the fraud number identification apparatus, such as the communication characteristic information acquisition unit 810 and the fraud number identification unit 820 shown in FIG. 8. The respective program modules constitute computer programs that cause the processors to execute the steps in the fraud number identification methods of the respective embodiments of the present application described in the present specification.

For example, the computer device shown in fig. 9 may execute step S102 by the communication characteristic information acquiring unit 810 in the fraud number identification apparatus shown in fig. 8; the computer device may perform step S104 through the fraud number identification unit 820.

In one embodiment, a computer device is proposed, the computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:

In one embodiment, a computer readable storage medium is provided, having a computer program stored thereon, which, when executed by a processor, causes the processor to perform the steps of:

It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in sequence as indicated by the arrows, the steps are not necessarily performed in sequence as indicated by the arrows. The steps are not performed in the exact order shown and described, and may be performed in other orders, unless explicitly stated otherwise. Moreover, at least a portion of the steps in various embodiments may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performance of the sub-steps or stages is not necessarily sequential, but may be performed in turn or alternately with other steps or at least a portion of the sub-steps or stages of other steps.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can carry out the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the embodiments described above may be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, any combination of the technical features should be considered to be within the scope of the present specification as long as the combination contains no contradiction.

The above-mentioned embodiments express only several embodiments of the present invention, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A fraud number identification method, comprising:

processing the communication characteristic information according to a preset fraud number identification model to generate a fraud number identification result; the preset fraud number recognition model is generated by training a self-training classification algorithm based on semi-supervised learning in advance;

the step of training and generating the preset fraud number recognition model specifically comprises:

acquiring a labeled data set and an unlabeled data set; the labeled data set comprises a plurality of labeled sample numbers carrying fraud number identification result information and communication characteristic information; the unlabeled data set comprises a plurality of unlabeled sample numbers carrying communication characteristic information;

determining the labeled data set as a training set and training a fraud number recognition teacher model with the optimal current recognition effect based on a preset training rule;

identifying the unlabeled data set according to the fraud number identification teacher model, and determining fraud result prediction probabilities of the unlabeled sample numbers;

adding the unlabeled sample numbers whose fraud result prediction probability exceeds a preset confidence threshold to a pseudo-label data set;

determining the labeled data set and the pseudo-label data set as a new training set, and training and generating a fraud number recognition student model according to a preset training rule; the recognition effect of the fraud number recognition student model is superior to that of the fraud number recognition teacher model;

judging whether a preset training end condition is met;

when the preset training end condition is judged not to be met, determining the fraud number recognition student model as a new fraud number recognition teacher model, returning to the step of performing recognition processing on the unlabeled data set according to the fraud number recognition teacher model, and determining fraud result prediction probabilities of the plurality of unlabeled sample numbers;

and when the preset training end condition is judged to be met, determining the fraud number recognition student model generated by training as the preset fraud number recognition model.
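The teacher-student loop recited in claim 1 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the patented implementation: the nearest-centroid scorer `CentroidModel`, the symmetric use of the confidence threshold for both classes, and the end condition (the pseudo-label set stops changing, or `max_rounds` is reached) are all hypothetical stand-ins for the "preset training rule" and "preset training end condition", and the sketch does not check that the student outperforms the teacher.

```python
import math
from typing import Dict, List, Sequence, Tuple

Features = Tuple[float, ...]


def _centroid(rows: Sequence[Features]) -> Features:
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))


def _dist(a: Features, b: Features) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


class CentroidModel:
    """Toy classifier (assumption): scores by distance to class centroids."""

    def fit(self, X: Sequence[Features], y: Sequence[int]) -> "CentroidModel":
        # Requires at least one sample of each class (1 = fraud, 0 = normal).
        self.fraud = _centroid([x for x, t in zip(X, y) if t == 1])
        self.normal = _centroid([x for x, t in zip(X, y) if t == 0])
        return self

    def fraud_probability(self, x: Features) -> float:
        d_f, d_n = _dist(x, self.fraud), _dist(x, self.normal)
        return 0.5 if d_f + d_n == 0 else d_n / (d_f + d_n)


def self_train(labeled_X: List[Features], labeled_y: List[int],
               unlabeled_X: List[Features], threshold: float = 0.9,
               max_rounds: int = 5) -> CentroidModel:
    # Train the initial teacher on the labeled data set only.
    teacher = CentroidModel().fit(labeled_X, labeled_y)
    pseudo: Dict[int, int] = {}
    for _ in range(max_rounds):
        # The teacher scores every unlabeled sample; keep confident ones.
        new_pseudo: Dict[int, int] = {}
        for i, x in enumerate(unlabeled_X):
            p = teacher.fraud_probability(x)
            if p >= threshold:
                new_pseudo[i] = 1
            elif p <= 1.0 - threshold:
                new_pseudo[i] = 0
        # Train the student on labeled plus pseudo-labeled samples.
        train_X = labeled_X + [unlabeled_X[i] for i in new_pseudo]
        train_y = labeled_y + list(new_pseudo.values())
        student = CentroidModel().fit(train_X, train_y)
        if new_pseudo == pseudo:  # assumed end condition: no change
            return student
        pseudo, teacher = new_pseudo, student  # student becomes new teacher
    return teacher
```

Samples whose score stays near 0.5 are never pseudo-labeled, so they do not influence any round of training in this sketch.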

2. The fraud number identification method of claim 1, wherein the preset fraud number identification model is generated in advance by training with an improved self-training classification algorithm; the improved self-training classification algorithm introduces random noise information in the process of training and generating a student model;

the step of training and generating the fraud number recognition student model according to the preset training rule specifically comprises the following steps:

training and generating a fraud number recognition student model according to the preset training rule and a preset noise-adding rule.

3. The fraud number identification method of claim 2, wherein the preset noise-adding rule comprises a data noise-adding rule for adding noise to the training set and a model noise-adding rule for adding noise to the fraud number recognition student model; the model noise-adding rule comprises one or more of dropout, stochastic depth, and random augmentation.
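The two kinds of noise named in claim 3 can be illustrated as follows. The sketch assumes Gaussian jitter as the data noise-adding rule and inverted dropout as the model noise-adding rule; the function names, the noise scale `sigma`, and the choice of Gaussian noise are hypothetical, and stochastic depth and random augmentation are not shown.

```python
import random
from typing import List, Sequence


def add_feature_noise(rows: Sequence[Sequence[float]], sigma: float,
                      rng: random.Random) -> List[List[float]]:
    """Data noise (assumed form): add zero-mean Gaussian jitter per feature."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in rows]


def dropout(activations: Sequence[float], rate: float,
            rng: random.Random) -> List[float]:
    """Model noise: inverted dropout applied to one layer's activations.

    Each activation is zeroed with probability `rate`; survivors are scaled
    by 1 / (1 - rate) so the expected value is unchanged during training.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]
```

Both functions would be applied only while training the student; the teacher would score unlabeled samples without noise, so that its pseudo-labels are as clean as possible.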

4. The fraud number identification method of claim 1, wherein the communication characteristic information further includes a calling device number; after the step of processing the communication characteristic information according to the preset fraud number identification model to generate a fraud number identification result, the method further comprises:

judging whether the calling device number of the number to be identified meets preset fraud characteristics;

and when the calling device number of the number to be identified meets the preset fraud characteristics, determining that the number to be identified is a fraud number.
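The device-based check of claim 4 can be sketched as a rule applied after the model result. The claim does not say what the preset fraud characteristics are; the sketch assumes, purely for illustration, that a device is suspicious when more than `max_numbers_per_device` distinct numbers have been used on it. The function names and the default threshold are hypothetical.

```python
from typing import Dict, Set


def device_meets_fraud_characteristics(
        device_number: str,
        numbers_seen_on_device: Dict[str, Set[str]],
        max_numbers_per_device: int = 5) -> bool:
    """Assumed characteristic: too many distinct numbers used on one device."""
    seen = numbers_seen_on_device.get(device_number, set())
    return len(seen) > max_numbers_per_device


def final_result(model_says_fraud: bool, device_number: str,
                 numbers_seen_on_device: Dict[str, Set[str]],
                 max_numbers_per_device: int = 5) -> bool:
    """A number is treated as fraud if the model or the device rule flags it."""
    return model_says_fraud or device_meets_fraud_characteristics(
        device_number, numbers_seen_on_device, max_numbers_per_device)
```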

5. The fraud number identification method of claim 1, wherein after the step of processing the communication characteristic information according to the preset fraud number identification model to generate a fraud number identification result, the method further comprises:

and after the number to be identified is determined to be a fraud number, determining a victim group according to the communication characteristic information and a preset victim group identification rule.

6. The fraud number identification method of claim 5, wherein the step of determining the victim group according to the communication characteristic information and the preset victim group identification rules specifically comprises:

determining a sensitive number which has communication interaction with the fraud number according to the communication characteristic information;

and judging whether the sensitive number belongs to a victim group according to the call duration and the number of calls between the sensitive number and the fraud number, and according to whether the sensitive number has had communication interaction with other fraud numbers.
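The victim group rule of claim 6 can be sketched as follows. The claim names the inputs (call duration, number of calls, and contact with other fraud numbers) but not how they are combined; the sketch assumes a sensitive number is a victim when its contact with the fraud number is sustained, or when it has also interacted with another fraud number. The `Call` record, the function name, and both thresholds are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Call:
    caller: str
    callee: str
    duration_s: int


def victim_group(calls: Iterable[Call], fraud_number: str,
                 all_fraud_numbers: Set[str],
                 min_total_duration_s: int = 120,
                 min_calls: int = 2) -> Set[str]:
    """Return the sensitive numbers judged to be victims of `fraud_number`.

    `all_fraud_numbers` is expected to contain `fraud_number`.
    """
    # sensitive number -> [call count, total seconds] with fraud_number
    stats: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    # sensitive number -> set of fraud numbers it has interacted with
    fraud_contacts: Dict[str, Set[str]] = defaultdict(set)
    for c in calls:
        # Consider both call directions between a fraud and a normal number.
        for a, b in ((c.caller, c.callee), (c.callee, c.caller)):
            if a in all_fraud_numbers and b not in all_fraud_numbers:
                fraud_contacts[b].add(a)
                if a == fraud_number:
                    stats[b][0] += 1
                    stats[b][1] += c.duration_s
    victims: Set[str] = set()
    for number, (count, seconds) in stats.items():
        sustained = count >= min_calls and seconds >= min_total_duration_s
        contacted_others = bool(fraud_contacts[number] - {fraud_number})
        if sustained or contacted_others:
            victims.add(number)
    return victims
```

A number that only exchanged a single short call with the fraud number, and with no other fraud number, is left out of the victim group under these assumed thresholds.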

7. A computer device, characterized in that it comprises a memory and a processor, said memory having stored therein a computer program which, when executed by said processor, causes said processor to carry out the steps of the fraud number identification method of any one of claims 1 to 6.

8. A computer-readable storage medium, characterized in that a computer program is stored thereon, which, when being executed by a processor, causes the processor to carry out the steps of the fraud number identification method of any one of claims 1 to 6.