CN115022464A - Number processing method, system, computing device and storage medium - Google Patents
Number processing method, system, computing device and storage medium
- Publication number
- CN115022464A (application CN202210486524.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- target
- target number
- fraudulent
- fraud
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/22—Arrangements for supervision, monitoring or testing
- H04M3/2281—Call monitoring, e.g. for law enforcement purposes; Call tracing; Detection or prevention of malicious calls
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/12—Detection or prevention of fraud
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2203/00—Aspects of automatic or semi-automatic exchanges
- H04M2203/60—Aspects of automatic or semi-automatic exchanges related to security aspects in telephonic communication systems
- H04M2203/6027—Fraud preventions
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Computer Security & Cryptography (AREA)
- Signal Processing (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Technology Law (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present application provides a number processing method, system, computing device, and storage medium. The method includes: acquiring a target number to be identified; acquiring basic data of the target number, together with traffic detail data and call detail data of the target number within a first historical period; obtaining, from the basic data, the traffic detail data, and the call detail data, and by means of a number identification model, an identification probability of whether the target number is a fraudulent number; and determining, according to the identification probability, whether the target number is a fraudulent number. The present application can improve the efficiency and accuracy of fraudulent number identification.
Description
Technical Field
The present application relates to communication technologies, and in particular to a number processing method, system, computing device, and storage medium.
Background
At present, fraudulent numbers are mostly identified through crowdsourcing run by Internet companies: security software installed on a terminal lets the user of the called number mark whether the calling number is a fraudulent number. Although this approach can distinguish fraudulent numbers from ordinary numbers to some extent, a fraudulent number has often placed a large number of outgoing calls before it is marked, and some frauds may already have succeeded, so identification lags behind the fraud.
Summary of the Invention
The present application provides a number processing method, system, computing device, and storage medium to address the lag in identifying fraudulent numbers.
In a first aspect, the present application provides a number processing method, the method including:
acquiring a target number to be identified;
acquiring basic data of the target number, and traffic detail data and call detail data of the target number within a first historical period;
obtaining, according to the basic data, the traffic detail data, and the call detail data, and using a number identification model, an identification probability of whether the target number is a fraudulent number;
determining, according to the identification probability, whether the target number is a fraudulent number.
Optionally, acquiring the target number to be identified includes:
acquiring the number of outgoing calls made by each of N numbers in a target area within a second historical period, where N is an integer greater than or equal to 1;
if any of the N numbers has a number of outgoing calls greater than an outgoing-call threshold, taking that number as a candidate number, where the outgoing-call threshold is related to the outgoing-call volume of fraudulent numbers in the target area;
determining the target number from the candidate numbers.
Optionally, determining the target number from the candidate numbers includes:
taking candidate numbers that are on neither a whitelist nor a greylist as the target number, where the whitelist is a list of non-fraudulent numbers and the greylist is a list of candidate fraudulent numbers.
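The candidate-selection steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `select_target_numbers`, the sample phone numbers, and the threshold value of 80 are all assumptions made for the example, and the source does not say how the threshold is derived from regional fraud call volume.

```python
from typing import Dict, List, Set


def select_target_numbers(
    outgoing_counts: Dict[str, int],
    outgoing_threshold: int,
    whitelist: Set[str],
    greylist: Set[str],
) -> List[str]:
    """Return numbers whose outgoing-call count exceeds the threshold
    and that are on neither the whitelist nor the greylist."""
    # Step 1: numbers above the outgoing-call threshold become candidates.
    candidates = [
        number
        for number, count in outgoing_counts.items()
        if count > outgoing_threshold
    ]
    # Step 2: candidates already on the whitelist or greylist are skipped.
    return [n for n in candidates if n not in whitelist and n not in greylist]


# Hypothetical outgoing-call counts for four numbers in the second historical period.
counts = {
    "13800000001": 120,
    "13800000002": 3,
    "13800000003": 95,
    "13800000004": 200,
}
targets = select_target_numbers(
    counts,
    outgoing_threshold=80,  # assumed value
    whitelist={"13800000003"},
    greylist={"13800000004"},
)
print(targets)
```

Only numbers that pass both filters reach the model, which keeps the more expensive identification step off numbers that have already been classified.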
Optionally, after determining whether the target number is a fraudulent number according to the identification probability, the method further includes:
if it is determined according to the identification probability that the target number is not a fraudulent number, adding the target number to the whitelist;
if it is determined according to the identification probability that the target number may be a fraudulent number, adding the target number to the greylist;
if it is determined according to the identification probability that the target number is a fraudulent number, adding the target number to a blacklist, where the blacklist is a list of fraudulent numbers.
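One way to read the three outcomes above is as two cut-offs on the identification probability. The sketch below assumes that reading; the cut-off values 0.3 and 0.8 and the function name `assign_list` are illustrative assumptions, since this part of the text does not state how the probability is mapped to each list.

```python
def assign_list(probability: float, low: float = 0.3, high: float = 0.8) -> str:
    """Map an identification probability to 'white', 'grey', or 'black'.

    The cut-offs `low` and `high` are assumed values for illustration only.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability < low:
        return "white"  # treated as not fraudulent
    if probability < high:
        return "grey"   # possibly fraudulent, needs further evidence
    return "black"      # treated as fraudulent


for p in (0.1, 0.5, 0.9):
    print(p, assign_list(p))
```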
Optionally, if the target number is a number on the greylist, the method further includes:
sending, to the called number, prompt information indicating that the target number is abnormal;
if a feedback result for the target number is received from the called number, and the feedback result indicates whether the target number is a fraudulent number, recording the feedback result for the target number;
if the cumulative feedback results for the target number indicate that the target number is not a fraudulent number, moving the target number from the greylist to the whitelist;
if the cumulative feedback results for the target number indicate that the target number is a fraudulent number, moving the target number from the greylist to the blacklist.
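The feedback loop for greylisted numbers can be sketched as below. The text does not define when cumulative feedback "indicates" fraud, so this example assumes a simple rule: wait for a minimum number of reports, then decide by the share of fraud reports. The class `ListStore`, the parameters `min_votes` and `fraud_ratio`, and their defaults are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ListStore:
    white: Set[str] = field(default_factory=set)
    grey: Set[str] = field(default_factory=set)
    black: Set[str] = field(default_factory=set)
    # Per-number tallies stored as [fraud reports, not-fraud reports].
    votes: Dict[str, List[int]] = field(default_factory=dict)

    def record_feedback(
        self,
        number: str,
        is_fraud: bool,
        min_votes: int = 3,        # assumed minimum number of reports
        fraud_ratio: float = 0.5,  # assumed share of fraud reports needed
    ) -> str:
        """Record one report from a called party and return the number's list."""
        if number not in self.grey:
            raise KeyError(f"{number} is not on the greylist")
        tally = self.votes.setdefault(number, [0, 0])
        tally[0 if is_fraud else 1] += 1
        total = tally[0] + tally[1]
        if total < min_votes:
            return "grey"  # not enough feedback yet
        # Enough feedback: move the number off the greylist.
        self.grey.discard(number)
        del self.votes[number]
        if tally[0] / total > fraud_ratio:
            self.black.add(number)
            return "black"
        self.white.add(number)
        return "white"


store = ListStore(grey={"13800000001"})
results = [store.record_feedback("13800000001", vote) for vote in (True, True, False)]
print(results)
```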
Optionally, if the target number is a number on the blacklist, the method further includes:
performing a blocking operation on the target number.
Optionally, obtaining, according to the basic data, the traffic detail data, and the call detail data, and using the number identification model, the identification probability of whether the target number is a fraudulent number includes:
processing the traffic detail data to obtain processed traffic detail data;
cleaning the basic data and the processed traffic detail data, and constructing a first feature vector from the cleaned basic data and the cleaned traffic detail data;
cleaning the call detail data, and constructing a second feature vector from the cleaned call detail data;
obtaining, according to the first feature vector and the second feature vector, and using the number identification model, the identification probability of whether the target number is a fraudulent number;
where the number identification model includes a first sub-model built with a multilayer perceptron, a second sub-model built with a convolutional neural network, and a classification layer;
the first sub-model is used to obtain first feature information of the target number from the first feature vector;
the second sub-model is used to obtain second feature information from the second feature vector;
the classification layer is used to obtain, from the first feature information and the second feature information, the identification probability of whether the target number is a fraudulent number.
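The two-branch structure described above can be illustrated with a small forward pass in plain Python. Everything concrete here is an assumption for the sketch: the layer sizes, the random untrained weights, the single convolution over the time axis followed by global max pooling, and the sigmoid output. The actual architecture of the sub-models is given by the embodiments and figures, not by this example.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    # Fully connected layer with one row of weights per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def conv_over_time(seq: Matrix, kernels: List[Matrix]) -> List[Vector]:
    # Each kernel spans k consecutive time steps and all per-step features.
    maps = []
    for kernel in kernels:
        k = len(kernel)
        maps.append([
            sum(w * x
                for k_row, s_row in zip(kernel, seq[t:t + k])
                for w, x in zip(k_row, s_row))
            for t in range(len(seq) - k + 1)
        ])
    return maps


def fraud_probability(first_vec: Vector, second_seq: Matrix, p: Dict) -> float:
    # First sub-model (MLP) on the first feature vector.
    first_info = relu(dense(first_vec, p["mlp_w"], p["mlp_b"]))
    # Second sub-model (CNN) on the time-ordered call features, max-pooled per kernel.
    second_info = [max(relu(m)) for m in conv_over_time(second_seq, p["kernels"])]
    # Classification layer on the concatenated feature information.
    logit = dense(first_info + second_info, p["out_w"], p["out_b"])[0]
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)


def rand(rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Untrained, randomly initialised parameters (illustrative sizes only).
params = {
    "mlp_w": rand(4, 3), "mlp_b": [0.0] * 4,
    "kernels": [rand(3, 2) for _ in range(2)],
    "out_w": rand(1, 6), "out_b": [0.0],
}
first_vector = [0.2, 1.0, 0.5]
second_vector = [[0.1, 1.0], [0.4, 1.0], [0.0, 0.0], [0.9, 1.0], [0.8, 1.0], [0.7, 1.0]]
probability = fraud_probability(first_vector, second_vector, params)
print(round(probability, 3))
```

The convolution slides along the rows of the second feature vector, so its output depends on the order of calls, which is the property the aggregated first feature vector cannot capture.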
In a second aspect, the present application provides a number processing system, the system including:
a real-time monitoring module, used to acquire a target number to be identified;
a data acquisition module, used to acquire basic data of the target number, and traffic detail data and call detail data of the target number within a first historical period;
a fraud detection module, used to obtain, according to the basic data, the traffic detail data, and the call detail data, and using a number identification model, an identification probability of whether the target number is a fraudulent number, and to determine, according to the identification probability, whether the target number is a fraudulent number.
In a third aspect, the present application provides an electronic device, the electronic device including a processor and a memory communicatively connected to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the method of any one of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the number processing method of any one of the first aspect.
In a fifth aspect, the present application provides a computer program product including a computer program which, when executed by a processor, implements the method of any one of the first aspect.
The number processing method, system, computing device, and storage medium provided by the present application acquire the basic data of the target number together with its traffic detail data and call detail data within the first historical period. Because the call detail data within the first historical period contains information reflecting the temporal correlation of the target number's historical calling behavior, whether the target number is a fraudulent number can be identified accurately from the basic data, the traffic detail data, and the call detail data.
The above method does not depend on users marking fraudulent numbers, and therefore avoids the lag and cumbersome operation that come with relying on user marking. In addition, this embodiment uses information reflecting the temporal correlation of a number's historical calling behavior as a basis for judgment, which further improves the accuracy of identifying fraudulent numbers.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain its principles.
FIG. 1 is a schematic structural diagram of a fraudulent number identification system 100 in an embodiment of the present application;
FIG. 2 is a schematic diagram of an application scenario of a fraudulent number identification system 100 provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of another application scenario of the fraudulent number identification system 100 provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of the hardware structure of a computing device 200 on which the fraudulent number identification system 100 is deployed;
FIG. 5 is a schematic flowchart of a fraudulent number identification system identifying a fraudulent number according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of the fraudulent number identification system acquiring a target number to be identified according to an embodiment of the present application;
FIG. 7 is a schematic flowchart of the fraudulent number identification system using the number identification model to obtain the identification probability of whether a target number is a fraudulent number according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the overall structure of a number identification model provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a multilayer perceptron model provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a convolutional neural network model provided by an embodiment of the present application.
The above drawings show specific embodiments of the present application, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the concepts of the present application in any way, but to explain those concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the appended claims.
Multilayer perceptron (MLP): a feedforward artificial neural network model that maps a set of input data to a set of output data.
Convolutional neural network (CNN): a class of feedforward neural networks that include convolution operations and have a deep structure; it is one of the representative algorithms of deep learning. Its strength is that a convolution kernel creates a local receptive field that captures the spatial correlation of two-dimensional data, and the receptive field can be changed by adjusting the size and shape of the kernel, so that diverse high-dimensional features can be learned.
Existing ways of identifying fraudulent numbers mainly include the following:
Prior art 1:
Operators crowdsource the fraudulent number identification business to Internet companies, which use security software to provide users with fraudulent number marking and prompting services. Once a user has installed the security software on a terminal, the user can use it to label a number as fraudulent after receiving a call from a suspected fraudulent number. Any terminal with the security software installed will then display the fraudulent-number label when it later receives a call from the marked number, which serves as a warning.
However, this approach requires users to mark numbers manually, and a fraudulent number has often placed a large number of outgoing calls before it is marked; some frauds may already have succeeded. Moreover, marking fraudulent numbers requires broad user participation and upkeep, and depends entirely on users' initiative. Identifying fraudulent numbers in this way is therefore both delayed and cumbersome.
Prior art 2:
Communication operators have proposed anti-fraud methods that use data mining or machine learning to build an anti-fraud model. The anti-fraud model is built from traditional machine learning models (such as decision trees, XGBoost, or support vector machines) and can automatically identify whether a number is fraudulent from the number's basic information and historical behavior.
The input features of such an anti-fraud model are generally constructed by selecting basic information about the number through data mining combined with experience, and adding statistically processed calling behavior data or Internet traffic data.
Automatically identifying fraudulent numbers with such an anti-fraud model avoids the lag and cumbersome operation of prior art 1. However, the input features of the model are selected and analyzed manually from the number's historical behavior; that is, they are all statistical summaries over some time interval. These summaries discard the information in the calling behavior data that reflects the temporal correlation of the number's historical calling behavior. As a result, this approach judges fraudulent numbers with relatively low accuracy.
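A small worked example shows why interval statistics can hide temporal structure. The two hourly call-count series below are invented for illustration, and the summary features chosen here (total and mean) are only one possible set of aggregate features; the helper names are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical hourly outgoing-call counts over eight hours for two numbers.
steady = [5, 5, 5, 5, 5, 5, 5, 5]
burst = [0, 0, 0, 20, 20, 0, 0, 0]


def summary(counts: List[int]) -> Dict[str, float]:
    """Interval statistics of the kind used as hand-built input features."""
    return {"total": sum(counts), "mean": mean(counts)}


def max_hourly_change(counts: List[int]) -> int:
    """Largest change between consecutive hours; depends on the order of the data."""
    return max(abs(b - a) for a, b in zip(counts, counts[1:]))


print(summary(steady) == summary(burst))
print(max_hourly_change(steady), max_hourly_change(burst))
```

Both series produce the same total and mean, so a model that sees only those two features cannot tell them apart, while an order-aware quantity separates them immediately.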
In view of this, the present application provides a fraudulent number identification method that uses information reflecting the temporal correlation of a number's historical calling behavior to identify automatically and accurately whether the number is fraudulent. It should be understood that the method of the embodiments of the present application may be executed by a fraudulent number identification system, which is introduced and described below.
FIG. 1 is a schematic structural diagram of a fraudulent number identification system 100 provided by an embodiment of the present application. It should be understood that FIG. 1 shows only one example structure of the fraudulent number identification system 100, and the present application does not limit how the system is divided into modules. As shown in FIG. 1, the fraudulent number identification system 100 may include a real-time monitoring module 11, a data acquisition module 12, and a fraud detection module 13. Optionally, the system may further include at least one of the following: a list maintenance module 14, an intelligent anti-fraud module 15, and a notification and interaction module 16.
The functions of the modules in the fraudulent number identification system 100 are briefly described below.
The real-time monitoring module 11 is used to acquire a target number to be identified. In one possible implementation, the real-time monitoring module 11 is specifically used to acquire the number of outgoing calls made by each of N numbers in a target area within a second historical period, where N is an integer greater than or equal to 1. If any of the N numbers has a number of outgoing calls greater than an outgoing-call threshold, that number is taken as a candidate number. The outgoing-call threshold is related to the volume of outgoing fraudulent calls in the target area. For example, the real-time monitoring module 11 is specifically used to determine the target number from the candidate numbers by taking candidate numbers that are on neither the whitelist nor the greylist as target numbers, where the whitelist is a list of non-fraudulent numbers and the greylist is a list of candidate fraudulent numbers.
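The data acquisition module 12 is used to acquire basic data of the target number, and traffic detail data and call detail data of the target number within the first historical period.
The fraud detection module 13 is used to obtain, according to the basic data, the traffic detail data, and the call detail data, and using the number identification model, the identification probability of whether the target number is a fraudulent number, and to determine, according to the identification probability, whether the target number is a fraudulent number.
In one possible implementation, the fraud detection module 13 is specifically used to: process the traffic detail data to obtain processed traffic detail data; clean the basic data and the processed traffic detail data, and construct a first feature vector from the cleaned basic data and the cleaned traffic detail data; clean the call detail data, and construct a second feature vector from the cleaned call detail data; and obtain, according to the first feature vector and the second feature vector, and using the number identification model, the identification probability of whether the target number is a fraudulent number.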
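The feature construction just described can be sketched as follows. The field names (`account_age_days`, `is_prepaid`), the record layouts, the cleaning rule (drop missing or negative values), and the choice of aggregates are all assumptions for illustration; the text here does not specify which fields the basic data or the detail records contain.

```python
from typing import Dict, List, Optional, Tuple


def build_first_vector(basic: Dict[str, float], traffic: List[Optional[float]]) -> List[float]:
    """Combine basic data with aggregated traffic detail data.

    `traffic` holds per-session byte counts; missing or negative entries are
    dropped as an assumed cleaning rule.
    """
    clean = [b for b in traffic if b is not None and b >= 0]
    total = sum(clean)
    average = total / len(clean) if clean else 0.0
    return [
        basic.get("account_age_days", 0.0),
        basic.get("is_prepaid", 0.0),
        float(len(clean)),
        total,
        average,
    ]


def build_second_vector(calls: List[Tuple[int, float, int]]) -> List[List[float]]:
    """Turn call records into a time-ordered matrix, one row per call.

    Each record is (start_timestamp, duration_seconds, is_outgoing). Rows keep
    the call order so that temporal patterns stay visible to the model.
    """
    clean = [c for c in calls if c[1] >= 0]  # drop records with invalid duration
    clean.sort(key=lambda c: c[0])           # order by start time
    return [[float(duration), float(outgoing)] for _, duration, outgoing in clean]


first = build_first_vector(
    {"account_age_days": 12.0, "is_prepaid": 1.0},
    [100.0, None, -5.0, 300.0],
)
second = build_second_vector([(30, 12.0, 1), (10, 45.0, 1), (20, -1.0, 0)])
print(first)
print(second)
```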
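The number identification model includes a first sub-model built with a multilayer perceptron, a second sub-model built with a convolutional neural network, and a classification layer. The first sub-model is used to obtain first feature information of the target number from the first feature vector; the second sub-model is used to obtain second feature information from the second feature vector; and the classification layer is used to obtain, from the first feature information and the second feature information, the identification probability of whether the target number is a fraudulent number.
In one possible implementation, the list maintenance module 14 is used to add the target number to the whitelist if it is determined according to the identification probability that the target number is not a fraudulent number; to add the target number to the greylist if it is determined according to the identification probability that the target number may be a fraudulent number; and to add the target number to the blacklist if it is determined according to the identification probability that the target number is a fraudulent number. The blacklist is a list of fraudulent numbers.
In one possible implementation, the intelligent anti-fraud module 15 is used to perform a blocking operation on the target number if the target number is a number on the blacklist.
In one possible implementation, the notification and interaction module 16 is used, if the target number is a number on the greylist, to send prompt information indicating that the target number is abnormal to the number called by the target number. If a feedback result for the target number is received from the called number, and the feedback result indicates whether the target number is a fraudulent number, the feedback result for the target number is recorded. If the cumulative feedback results for the target number indicate that the target number is not a fraudulent number, the target number is moved from the greylist to the whitelist. If the cumulative feedback results indicate that the target number is a fraudulent number, the target number is moved from the greylist to the blacklist.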
It should be noted that the division of the fraudulent number identification system 100 into the modules shown in FIG. 1 is only illustrative; the present application does not limit how the modules are divided or named.
FIG. 2 is a schematic diagram of an application scenario of a fraudulent number identification system 100 provided by an embodiment of the present application. As shown in FIG. 2, in one embodiment the fraudulent number identification system 100 may be deployed entirely in a cloud environment. A cloud environment is an entity that uses basic resources to provide cloud services to users under the cloud computing model. It includes a cloud data center and a cloud service platform. The cloud data center includes a large quantity of basic resources (computing, storage, and network resources) owned by the cloud service provider, and its computing resources may be a large number of computing devices (for example, servers). Taking the case in which the computing resources of the cloud data center are servers running virtual machines, the fraudulent number identification system 100 may be deployed independently on a server or virtual machine in the cloud data center, or may be deployed in a distributed manner on multiple servers, on multiple virtual machines, or on a mix of servers and virtual machines in the cloud data center.
As shown in FIG. 2, the fraudulent number identification system 100 may, for example, be abstracted by the cloud service provider on the cloud service platform into a fraudulent number identification service offered to users. After a user purchases the cloud service on the cloud service platform (for example, by pre-paying and then settling according to final resource usage), the cloud environment provides the fraudulent number identification service to the user through the fraudulent number identification system 100 deployed in the cloud data center. When using the service, the user may specify the number to be identified through an application program interface (API) or a graphical user interface (GUI). The fraudulent number identification system 100 in the cloud environment then identifies the number and returns the identification result to the user through the API or GUI. Depending on the identification result, the system may handle the target number in different ways, or call other systems to handle the target number accordingly.
FIG. 3 is a schematic diagram of another application scenario of the fraudulent number identification system 100 provided by an embodiment of the present application. Deployment of the system is flexible. As shown in FIG. 3, in another embodiment the fraudulent number identification system 100 may also be deployed in a distributed manner across different environments. The system can be logically divided into multiple parts, each with a different function, and the parts may be deployed in any two or all three of the following: a terminal computing device (on the user side), an edge environment, and a cloud environment. The terminal computing device on the user side may include, for example, at least one of the following: a terminal server, a smartphone, a notebook computer, a tablet computer, a personal desktop computer, and the like. The edge environment is an environment that includes a set of edge computing devices close to the terminal computing device; edge computing devices include edge servers, edge small stations with computing power, and the like. The parts of the fraudulent number identification system 100 deployed in different environments or devices cooperate to provide the fraudulent number identification function to users. It should be understood that the embodiments of the present application do not restrict which parts of the system are deployed in which environment; in practice, deployment may be adapted to the computing capability of the terminal computing device, the resource occupancy of the edge and cloud environments, or specific application requirements. FIG. 3 shows an example application scenario in which the fraudulent number identification system 100 is deployed across the edge environment and the cloud environment.
The fraudulent number identification system 100 can also be deployed on its own on a single computing device in any environment (for example, on a single edge server in an edge environment). FIG. 4 is a schematic diagram of the hardware structure of a computing device 200 on which the fraudulent number identification system 100 is deployed. The computing device 200 shown in FIG. 4 includes a memory 201, a processor 202, and a communication interface 203, which are communicatively connected to one another. For example, the memory 201, the processor 202, and the communication interface 203 may be connected through a network. Alternatively, the computing device 200 may further include a bus 204, through which the memory 201, the processor 202, and the communication interface 203 are communicatively connected to one another. FIG. 4 shows a computing device 200 in which the memory 201, the processor 202, and the communication interface 203 are connected through the bus 204.
The memory 201 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 201 may store a program; when the program stored in the memory 201 is executed by the processor 202, the processor 202 and the communication interface 203 are used to perform the method by which the fraudulent number identification system 100 identifies fraudulent numbers. The memory may also store the data that the fraudulent number identification system 100 needs in order to identify fraudulent numbers, for example, the basic data, traffic detail data, and call detail data of the target number.
The processor 202 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits.
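The processor 202 may also be an integrated circuit chip with signal processing capability. In implementation, the functions of the fraudulent number identification system 100 of the present application may be completed by integrated logic circuits of hardware in the processor 202 or by instructions in the form of software. The processor 202 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments below. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments below may be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 201; the processor 202 reads the information in the memory 201 and, in combination with its hardware, completes the functions of the fraudulent number identification system 100 of the embodiments of the present application.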
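The communication interface 203 uses a transceiver module, such as but not limited to a transceiver, to implement communication between the computing device 200 and other devices or communication networks. For example, a data set may be obtained through the communication interface 203.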
When the computing device 200 includes the bus 204, the bus 204 may include a pathway for transferring information between the components of the computing device 200 (for example, the memory 201, the processor 202, and the communication interface 203).
The following takes the fraudulent number identification system as an example to describe how to identify whether a target number is a fraudulent number. The specific embodiments below may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present application are described below with reference to the accompanying drawings.
FIG. 5 is a schematic flowchart of a fraudulent number identification system identifying a fraudulent number, provided by an embodiment of the present application. As shown in FIG. 5, the method includes:
S101. Acquire a target number to be identified.
For example, the fraudulent number identification system may receive the target number directly, or the target number may be a number determined from a plurality of received numbers. The received numbers may be, for example, input by a user or sent by a communication operator system.
S102. Acquire basic data of the target number, as well as traffic detail data and call detail data of the target number within a first historical duration.
The fraudulent number identification system may, for example, obtain from the communication operator system the basic data of the target number, as well as the traffic detail data and call detail data of the target number within the first historical duration.
The basic data represents the basic information of the target number and may include, for example, one or more of: product code, product tariff, customer type, home city, network access channel, network access time, and account balance.
The traffic detail data represents the traffic usage of the target number within the first historical duration and may include, for example, the amount of traffic the target number used each day within the first historical duration, the times at which traffic was used, and the applications (APPs) that consumed the traffic.
The call detail data represents the calls of the target number within the first historical duration. The call detail data may include multiple call detail records of the target number within the first historical duration. A call detail record may include, for example, one or more of: call time, call duration, call type, local call location, other party's home location, conversation type, cellular number, base station number, cell number, terminal serial number, service type, and call charge data.
By analyzing the call detail records that the target number generated at different call times, the time correlation feature of the target number's calls can be obtained. The time correlation feature characterizes the temporal behavior of the target number's calls, which may be, for example, the call time characteristic, the call duration characteristic, or the call interval characteristic of the target number. Fraudulent numbers have distinctive patterns in call time and call duration. Therefore, the time correlation can be used to accurately identify whether a number is a fraudulent number.
The first historical duration may be one month or one day; the present application does not limit this. The first historical duration may be a historical time period ending at the current moment, or a preset historical time period whose length equals the first historical duration, and so on.
S103. Based on the basic data, the traffic detail data, and the call detail data, use a number identification model to obtain an identification probability indicating whether the target number is a fraudulent number.
After obtaining the basic data, traffic detail data, and call detail data of the target number, the fraudulent number identification system may input all three directly into the number identification model to obtain the identification probability indicating whether the target number is a fraudulent number.
Alternatively, after obtaining the basic data, traffic detail data, and call detail data of the target number, the fraudulent number identification system may preprocess at least one of the three according to the input data format required by the number identification model, so as to obtain data that satisfies that format. The data that satisfies the input data format is then input into the number identification model to obtain the identification probability indicating whether the target number is a fraudulent number. The preprocessing may include, for example, data processing and/or data cleaning.
It should be understood that whether the basic data, traffic detail data, and call detail data are processed, and what processing is performed, depends on the input data format required by the number identification model, that is, on the data format used when training the number identification model. This is not described further here.
The present application does not limit the number identification model; for example, it may be any model capable of binary classification.
The identification probability is used to determine whether the target number is a fraudulent number. The output may include the probability that the target number is a fraudulent number, or the probability that the target number is not a fraudulent number, or both. When both probabilities are included, their sum may equal a preset value, for example 1.
Exemplarily, if the identification probability represents the probability that the target number is a fraudulent number, the value of the identification probability may be positively or negatively correlated with the likelihood that the target number is a fraudulent number. Taking positive correlation as an example, the larger the value of the identification probability, the more likely the target number is a fraudulent number.
S104. Determine, according to the identification probability, whether the target number is a fraudulent number.
Taking as an example the case where the probability that the target number is a fraudulent number is used for the determination, the fraudulent number identification system may set one or more identification probability thresholds and compare the identification probability obtained from the number identification model with those thresholds to determine whether the target number is a fraudulent number. Exemplarily, the fraudulent number identification system is provided with a first identification probability threshold and a second identification probability threshold, where the first identification probability threshold is greater than the second. Exemplarily, the first identification probability threshold may be 80% and the second identification probability threshold may be 40%.
In this implementation, when the identification probability of the target number is greater than the first identification probability threshold, the target number is determined to be a fraudulent number; when the identification probability is greater than the second identification probability threshold and less than or equal to the first identification probability threshold, the target number is determined to be a possible fraudulent number; and when the identification probability is less than or equal to the second identification probability threshold, the target number is determined not to be a fraudulent number.
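The two-threshold decision in S104 can be sketched as follows. This is a minimal illustration, assuming the identification probability is the probability that the number is fraudulent and is expressed in the range 0 to 1; the function name `classify_number` and the label strings are hypothetical and do not come from the application.

```python
def classify_number(fraud_probability, first_threshold=0.8, second_threshold=0.4):
    """Map an identification probability to one of three outcomes.

    Defaults follow the exemplary thresholds in the text (80% and 40%).
    The label strings are illustrative only.
    """
    if not second_threshold < first_threshold:
        raise ValueError("first_threshold must be greater than second_threshold")
    if fraud_probability > first_threshold:
        return "fraud"            # strictly above the first threshold
    if fraud_probability > second_threshold:
        return "possible_fraud"   # in (second_threshold, first_threshold]
    return "not_fraud"            # at or below the second threshold
```

Note that the boundaries follow the text: a probability exactly equal to the first threshold falls into the "possible" band, and a probability exactly equal to the second threshold is treated as not fraudulent.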
In the number processing method provided by this embodiment, the fraudulent number identification system acquires the basic data of the target number, together with its traffic detail data and call detail data within the first historical duration. Because the call detail data within the first historical duration includes information that reflects the temporal correlation of the target number's historical calling behavior, whether the target number is a fraudulent number can be identified accurately on the basis of the acquired basic data, traffic detail data, and call detail data.
The above method does not depend on users marking fraudulent numbers, and therefore avoids the lag and cumbersome operation that come with relying on user marking. In addition, this embodiment uses information reflecting the temporal correlation of a number's historical calling behavior as a basis for judgment, which further improves the accuracy of identifying fraudulent numbers.
The following describes, by way of example, how the fraudulent number identification system acquires the target number to be identified in step S101. FIG. 6 is a schematic flowchart of the fraudulent number identification system acquiring the target number to be identified. As shown in FIG. 6, as one possible implementation, step S101 may include the following steps:
S201. Acquire the number of outgoing calls made by each of N numbers in a target area within a second historical duration.
The fraudulent number identification system may, for example, obtain from the communication operator system the number of outgoing calls of the N numbers within the second historical duration. N is an integer greater than or equal to 1, and the N numbers may be all of the numbers within the coverage of the target area or only some of them.
The target area may be any area that needs to be examined. The size of the area depends on how areas are divided; the area may be an administrative division or an area customized according to the user's actual needs. For example, the target area may be any one of a country, a province, a city, a county, a residential community, the service area of a base station, and the like.
The second historical duration may be one month or one day; the present application does not limit this. The second historical duration may be a historical time period ending at the current moment, or a preset historical time period whose length equals the second historical duration, and so on. The first historical duration and the second historical duration may be the same or different.
S202. If any of the N numbers has a number of outgoing calls greater than an outgoing threshold, take that number as a candidate number.
The outgoing threshold may be a preset fixed value. Alternatively, the outgoing threshold may be related to the volume of outgoing calls from fraudulent numbers in the target area; that is, the outgoing threshold of the target area changes as that volume changes. For example, the larger the outgoing volume of fraudulent numbers, the smaller the outgoing threshold. The outgoing volume of fraudulent numbers in the target area may be the volume within the second historical duration, the volume within another historical duration, or the cumulative volume up to the present, and so on.
Taking as an example the case where the outgoing threshold is related to the outgoing volume of fraudulent numbers in the target area, a preset outgoing reference threshold may be adjusted according to that volume to obtain the outgoing threshold of the target area, for example using the following formula:
outgoing threshold of the target area = preset outgoing reference threshold × e^(-share of fraudulent-number outgoing calls in the target area)
Here, the share of fraudulent-number outgoing calls in the target area equals the ratio of the outgoing volume of fraudulent numbers in the target area to the outgoing volume of fraudulent numbers in all areas.
It should be understood that the fraudulent number identification system may always adjust the outgoing threshold of the target area in the above manner, or may do so only when the target area satisfies a preset condition. The preset condition may be, for example, that the target area's outgoing volume of fraudulent numbers ranks ahead of a preset position among the outgoing volumes of fraudulent numbers of all areas.
Exemplarily, the fraudulent number identification system may first determine where the target area's outgoing volume of fraudulent numbers ranks among those of all areas. Taking a ranking from high to low as an example, if the target area ranks ahead of the preset position (for example, in the top 5), the preset outgoing reference threshold may be adjusted according to the target area's outgoing volume of fraudulent numbers to obtain the outgoing threshold of the target area. If the target area ranks behind the preset position, the preset outgoing reference threshold may be used as the outgoing threshold of the target area.
Obtaining the outgoing threshold in this way allows the threshold of the target area to be adjusted flexibly according to the area's outgoing volume of fraudulent numbers, so that the numbers in the target area are screened with a more accurate outgoing threshold.
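The threshold adjustment and the candidate screening of S202 can be sketched together as follows. This is only an illustration under stated assumptions: the per-area fraud volumes are passed in as a plain dictionary, the ranking cut-off `top_k` defaults to the "top 5" example from the text, and the function names `regional_outgoing_threshold` and `select_candidates` are hypothetical.

```python
import math


def regional_outgoing_threshold(base_threshold, fraud_calls_by_region, region, top_k=5):
    """Return the outgoing threshold for `region`.

    base_threshold: the preset outgoing reference threshold.
    fraud_calls_by_region: dict mapping area name -> outgoing volume of
        fraudulent numbers (assumed input format).
    The reference threshold is scaled by e^(-share) only when the area
    ranks within the top_k areas by fraud volume; otherwise it is
    returned unchanged.
    """
    total = sum(fraud_calls_by_region.values())
    if total == 0:
        return base_threshold
    ranked = sorted(fraud_calls_by_region, key=fraud_calls_by_region.get, reverse=True)
    if region not in ranked[:top_k]:
        return base_threshold
    share = fraud_calls_by_region[region] / total
    return base_threshold * math.exp(-share)


def select_candidates(outgoing_counts, threshold):
    """Keep the numbers whose outgoing-call count exceeds the threshold."""
    return [number for number, count in outgoing_counts.items() if count > threshold]
```

Because the share lies between 0 and 1, the adjusted threshold stays between roughly 37% and 100% of the reference threshold, so areas with more fraud activity are screened more strictly without the threshold collapsing to zero.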
S203. Determine the target number from the candidate numbers.
For example, the candidate number may be taken as the target number and then identified using the method embodiment corresponding to FIG. 5. As another example, if the fraudulent number identification system is provided with a list database, which may include, for example, a whitelist, a blacklist, and a greylist, the system may further use the list database to determine the target number from the candidate numbers.
The numbers on the whitelist are all non-fraudulent numbers, the numbers on the blacklist are all fraudulent numbers, and the numbers on the greylist are all numbers that may be fraudulent.
After the fraudulent number identification system determines a candidate number, it can check whether the candidate number is on any of the three lists. If the candidate number is not on any list, it is determined to be a target number; if the candidate number is on one of the lists, it is assigned the status that the list represents. Optionally, if the numbers on the blacklist have already been suspended, then in this implementation there is no need to check the blacklist, and the candidate number is checked against the whitelist and the greylist only. That is, candidate numbers that are on neither the whitelist nor the greylist are taken as target numbers.
Exemplarily, the fraudulent number identification system may scan the whitelist first. If the candidate number is on the whitelist, it is not a fraudulent number, so it does not need to be identified using the method embodiment corresponding to FIG. 5, and the process ends. If the candidate number is not on the whitelist, the system goes on to scan the greylist. If the candidate number is on the greylist, it may be a fraudulent number, and it likewise does not need to be identified again. If the candidate number is not on the greylist either, the fraudulent number identification system has no definite classification for it and needs to identify it, so the candidate number is determined to be a target number and is passed into the identification process shown in FIG. 5.
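The whitelist-then-greylist check described above can be sketched as follows. This is a minimal illustration that assumes the lists are held as in-memory sets and that blacklisted numbers have already been suspended, so the blacklist is not consulted; the function name `screen_candidate` and the returned labels are hypothetical.

```python
def screen_candidate(number, whitelist, greylist):
    """Decide what to do with a candidate number.

    Returns "not_fraud" if the number is whitelisted, "possible_fraud"
    if it is greylisted, and "target" if it is on neither list and must
    therefore go through the model-based identification of FIG. 5.
    """
    if number in whitelist:
        return "not_fraud"
    if number in greylist:
        return "possible_fraud"
    return "target"


def pick_targets(candidates, whitelist, greylist):
    """Keep only the candidates that still need model-based identification."""
    return [n for n in candidates if screen_candidate(n, whitelist, greylist) == "target"]
```

Using sets for the lists keeps each membership check at constant average cost, which matters when the candidate pool for an area is large.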
The numbers on each list in the list database may be preset, may have been determined by the fraudulent number identification system in the past using the method embodiment corresponding to FIG. 5, or may include both. The numbers preset on the whitelist may be numbers, obtained from the communication operator system, that are used by people in certain industries whose calling behavior could otherwise be mistaken for that of a fraudulent number, for example the express delivery industry and the food delivery industry.
With the list database, the candidate numbers can be screened a second time, so that only candidate numbers that have not yet been identified by the number identification model are selected as target numbers. This avoids judging the same number repeatedly, reduces unnecessary waste of resources, and improves the identification efficiency of the system.
In the method for determining the target number provided by this embodiment, because fraudulent numbers make outgoing calls frequently, the number of outgoing calls gives a direct indication of how likely a number is to be fraudulent. Based on this, this embodiment makes a preliminary judgment on each acquired number according to its number of outgoing calls and removes numbers that clearly do not exhibit the characteristics of a fraudulent number. On the one hand, this avoids the waste of computing resources that would result from the fraudulent number identification system identifying every number; on the other hand, it improves the efficiency with which the system identifies fraudulent numbers.
The following describes how the fraudulent number identification system obtains the identification probability indicating whether the target number is a fraudulent number.
FIG. 7 is a schematic flowchart, provided by an embodiment of the present application, of the fraudulent number identification system obtaining the identification probability indicating whether the target number is a fraudulent number. As shown in FIG. 7, step S103 may include, for example, the following steps:
S301. Perform data processing on the traffic detail data to obtain processed traffic detail data.
The data processing that the fraudulent number identification system performs on the traffic detail data may be, for example, one or more of data extraction, data conversion, data calculation, and similar operations.
Exemplarily, the fraudulent number identification system may process the raw traffic detail data, namely the amount of traffic the target number used each day within the first historical duration, the times at which traffic was used, and the applications (APPs) that consumed the traffic, to obtain the average daily traffic used within the first historical duration (which may be expressed, for example, as the average daily traffic this month), the one or more time periods with the highest traffic usage within the first historical duration (which may be expressed, for example, as the traffic usage period), and the one or more APPs that consumed the most traffic within the first historical duration (which may be expressed, for example, as the traffic usage APP). The subsequent fraudulent number identification operations are then performed on the data obtained from this processing.
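The aggregation in S301 can be sketched as follows. The record layout (dictionaries with `date`, `hour`, `app`, and `kb` keys), the hourly granularity of the usage period, and the function name `summarize_traffic` are all assumptions made for illustration; the application does not specify a schema.

```python
from collections import Counter


def summarize_traffic(records, days_in_period):
    """Aggregate raw traffic records into three summary features.

    records: list of dicts with keys "date", "hour", "app", "kb"
        (hypothetical schema).
    days_in_period: length of the first historical duration in days,
        used as the denominator of the daily average.
    """
    per_hour = Counter()
    per_app = Counter()
    total_kb = 0
    for record in records:
        per_hour[record["hour"]] += record["kb"]
        per_app[record["app"]] += record["kb"]
        total_kb += record["kb"]
    return {
        "daily_avg_kb": total_kb / days_in_period,
        # Hour of day with the highest total traffic, or None if no records.
        "top_hour": per_hour.most_common(1)[0][0] if per_hour else None,
        # Application with the highest total traffic, or None if no records.
        "top_app": per_app.most_common(1)[0][0] if per_app else None,
    }
```

Dividing by the length of the period rather than by the number of days that have records means that days without any traffic still pull the average down, which seems closer to what "average daily traffic" is meant to express.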
S302. Perform data cleaning on the basic data and the processed traffic detail data, and construct a first feature vector from the cleaned basic data and the cleaned traffic detail data; perform data cleaning on the call detail data, and construct a second feature vector from the cleaned call detail data.
The data cleaning converts the acquired basic data, traffic detail data, and call detail data into the data representation required by the fraudulent number identification system.
For example, the fraudulent number identification system may convert date-type data in the basic data, traffic detail data, and call detail records (for example, call time and network access time) into the date-array form required by the system. Exemplarily, the system may use regular-expression matching to convert a call time expressed as date-type data into a date array. It then obtains the day of the week for that date, represented by a digit from 0 to 6. The year and month, which have no effect on the identification of fraudulent numbers, are removed from the date array, and only the day, hour, minute, and second are kept. Finally, the day of the week is combined with the day, hour, minute, and second to form a date array that meets the needs of the present application. For example, date-type data with a call time of "2021-10-01 01:00:00" can finally be represented as the date array (1, 1, 0, 0, 5).
In addition, different kinds of date-type data need to be trimmed differently according to actual needs. For example, for network access time, the hour, minute, and second, which have no effect on the identification of fraudulent numbers, can be removed, and only the year, month, and day are kept.
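The date cleaning can be sketched as follows. Two points here are assumptions rather than statements from the application: the sketch parses with `datetime.strptime` instead of the regular-expression matching mentioned in the text, and it numbers the days of the week with Sunday as 0, which is the convention under which the example date (a Friday) yields 5. The function names are hypothetical.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def call_time_to_array(text):
    """Convert a call time into (day, hour, minute, second, weekday).

    Year and month are dropped. Weekday uses Sunday=0 ... Saturday=6
    (assumed convention).
    """
    dt = datetime.strptime(text, TIMESTAMP_FORMAT)
    weekday = dt.isoweekday() % 7  # isoweekday(): Monday=1 ... Sunday=7
    return (dt.day, dt.hour, dt.minute, dt.second, weekday)


def access_time_to_array(text):
    """Convert a network access time into (year, month, day); time of day is dropped."""
    dt = datetime.strptime(text, TIMESTAMP_FORMAT)
    return (dt.year, dt.month, dt.day)
```

For instance, `call_time_to_array("2021-10-01 01:00:00")` gives `(1, 1, 0, 0, 5)`, matching the example in the text.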
The fraudulent number identification system may convert character strings in the basic data, traffic detail data, and call detail records into numbers by re-encoding them. For example, strings that denote a region, such as home city, local call location, and other party's home location, can be converted into the area code of that region. Exemplarily, "Beijing" can be represented by "010", "Shenzhen" by "0755", and "Guangzhou" by "020". The values "calling" and "called" of the call type can be represented by 0 and 1, respectively. The traffic usage APPs can be numbered sequentially starting from 0; exemplarily, short video APP1 may be numbered "0", online shopping APP2 may be numbered "1", and so on. Data expressed as strings, such as customer type, network access channel, and service type, can all be cleaned by numbering the distinct values sequentially starting from 0. In the present application, the number assigned to a string after cleaning can be set according to actual needs and is not limited here.
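The string re-encoding can be sketched as follows. The lookup tables contain only the examples given in the text, the area codes are kept as strings so that their leading zeros survive, and the helper name `encode_sequentially` is hypothetical.

```python
# Fixed lookups built from the examples in the text.
AREA_CODES = {"Beijing": "010", "Shenzhen": "0755", "Guangzhou": "020"}
CALL_TYPES = {"calling": 0, "called": 1}


def encode_sequentially(values):
    """Assign 0, 1, 2, ... to distinct values in order of first appearance."""
    codes = {}
    for value in values:
        # setdefault only assigns a new code the first time a value is seen.
        codes.setdefault(value, len(codes))
    return codes
```

A call such as `encode_sequentially(["retail", "enterprise", "retail"])` would yield `{"retail": 0, "enterprise": 1}`; the same helper could serve for customer type, network access channel, service type, or the traffic usage APPs.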
Traffic usage varies greatly between users: some may use only a few KB while others may use hundreds of GB, so the range of values is very wide. Raw traffic figures of this kind are not well suited to the number identification model in the fraudulent number identification system; they make the model more prone to error, and overly long numeric representations consume more of the model's computing resources.
The fraudulent number identification system may, for example, clean the traffic detail data by converting traffic figures into scientific notation. Exemplarily, if the raw traffic figure is 1000 KB, it can be expressed after cleaning as (1.0, 3). The first number in the array is the base value of the traffic; the number of significant digits kept can be set according to actual needs, for example 2, and the present application does not limit this. The second number in the array is the power of 10. The exponent may be restricted to multiples of 3 to simplify the representation and calculation of traffic, or it may be set according to actual needs. Exemplarily, a raw traffic figure of 50,101,000 KB can be expressed after cleaning as (50, 6).
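The scientific-notation cleaning can be sketched as follows, assuming the traffic figure is a non-negative whole number of KB, the exponent is restricted to multiples of 3, and the base value is rounded to two significant digits by default. The function name `traffic_to_sci` is hypothetical.

```python
def traffic_to_sci(kb, significant_digits=2):
    """Express a traffic figure in KB as (base, exponent) with base * 10**exponent ~= kb.

    The exponent is the largest multiple of 3 not exceeding the order of
    magnitude of kb; the base is rounded to `significant_digits`.
    """
    kb = int(kb)
    if kb < 0:
        raise ValueError("traffic must be non-negative")
    if kb == 0:
        return (0.0, 0)
    # Digit count gives the order of magnitude without floating-point logs.
    exponent = 3 * ((len(str(kb)) - 1) // 3)
    base = kb / 10 ** exponent
    # Round to the requested number of significant digits via e-notation.
    base = float(f"{base:.{significant_digits - 1}e}")
    return (base, exponent)
```

With these assumptions, 1000 KB becomes `(1.0, 3)` and 50,101,000 KB becomes `(50.0, 6)`, in line with the two examples in the text.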
Some data, such as product codes and terminal serial numbers, often have many digits, for example 88888888, whereas the customer type only needs the values 0 and 1. This large difference in scale may affect the training speed and fit of the number recognition model, or even prevent the model from converging. Moreover, the literal values of these fields have no bearing on the identification of fraudulent numbers; the fraudulent number identification system only needs to be able to identify the product or terminal from the data in fields such as the product code and terminal serial number. Overly long data values therefore hinder rather than help the identification of fraudulent numbers.
Exemplarily, the fraudulent number identification system can renumber data such as product codes and terminal serial numbers by building a mapping. For example, the system can assign the number 1 to the terminal serial number of the first target number it obtains, the number 2 to the terminal serial number of the second target number, and so on, and save the resulting mapping. When the system later obtains the same terminal serial number, it can look up the assigned number directly in the saved mapping, which completes the data cleaning of the terminal serial number.
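The renumbering by mapping can be sketched as a small lookup class. The class name and the sample serial numbers are hypothetical; persistence of the mapping is left out of this sketch.

```python
class IdRenumberer:
    """Assign short sequential numbers (1, 2, 3, ...) to long identifiers."""

    def __init__(self):
        self._codes = {}

    def encode(self, raw_id):
        # A previously seen identifier returns the number it was first given.
        if raw_id not in self._codes:
            self._codes[raw_id] = len(self._codes) + 1
        return self._codes[raw_id]


renumberer = IdRenumberer()
first = renumberer.encode("88888888")     # first serial number seen
second = renumberer.encode("12345678")    # second serial number seen
repeat = renumberer.encode("88888888")    # same serial number seen again
```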
The basic data, traffic detail data, and call detail data described above can all be cleaned with these methods according to actual needs.
Exemplarily, this embodiment provides the basic data and traffic detail data of one number, and one call detail record of that number, before and after cleaning, as shown in Table 1 and Table 2 respectively, to facilitate understanding of this embodiment.
Table 1
Table 2
Taking the cleaned basic data and traffic detail data in Table 1 as an example, the constructed first feature vector can be, for example, (1, 18.0, 0, 020, 510, 2021, 10, 1, 10.0, 112, 3, 15, 16, 10, 0, 11, 14), a total of 17 feature values.
The second feature vector is composed of multiple cleaned call detail records. Exemplarily, the fraudulent number identification system can obtain 1024 call detail records of the target number from the last 30 days and then clean them. Taking Table 2 as an example, the feature vector constructed from each cleaned call detail record can be, for example, (1, 10, 30, 6, 5, 20, 0, 020, 0755, 1, 119, 119, 9472, 50, 1, 0), a total of 16 feature values. The 1024 call detail records together form a 1024×16 feature matrix, that is, the second feature vector is a 1024×16 feature matrix.
The first feature vector and the second feature vector are used as the inputs of the number recognition model.
S303. According to the first feature vector and the second feature vector, use the number recognition model to obtain the identification probability of whether the target number is a fraudulent number.
The number recognition model may include a first sub-model built with a multilayer perceptron, a second sub-model built with a convolutional neural network, and a classification layer.
The first sub-model is used to obtain first feature information of the target number from the first feature vector; the second sub-model is used to obtain second feature information from the second feature vector; and the classification layer is used to obtain, from the first feature information and the second feature information, the identification probability of whether the target number is a fraudulent number.
Besides a multilayer perceptron and a convolutional neural network, the first sub-model and the second sub-model may also be other machine learning algorithms or voting mechanisms that can obtain the corresponding first and second feature information from the first and second feature vectors. The function of the classification layer can also be implemented with a classification sub-model, for example a softmax model or a sigmoid model. In that implementation, the number recognition model may include the first sub-model, the second sub-model, and the classification sub-model.
Taking a softmax layer as the classification layer, FIG. 8 is a schematic diagram of the overall structure of a number recognition model provided by an embodiment of this application. As shown in FIG. 8, the fraudulent number identification system inputs the first feature vector into the multilayer perceptron and the second feature vector into the convolutional neural network. After processing by these models, first feature information and second feature information, each of dimension 1×4, are obtained. The dimensions of the feature information output by the multilayer perceptron and by the convolutional neural network may be the same or different and can be set according to actual needs.
Then, the 1×8 feature vector constructed from the first feature information and the second feature information is input to the softmax layer. After processing by the softmax layer, the identification probability of whether the target number is a fraudulent number is output. Because of the properties of softmax, the output takes the form of two values, the probability of "0" and the probability of "1", which sum to 1. Here "0" represents a normal number and "1" represents a fraudulent number.
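The classification step can be sketched as a linear layer followed by softmax over the concatenated 1×8 vector. The weights and biases below are placeholders rather than trained values, and the use of a single linear layer before softmax is an assumption; the text only states that the 1×8 vector is fed to a softmax layer.

```python
import math


def softmax(logits):
    """Numerically stable softmax; the outputs are positive and sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(first_info, second_info, weights, biases):
    """Concatenate two 1x4 feature vectors and return [P(normal), P(fraud)]."""
    features = list(first_info) + list(second_info)    # 1x8 vector
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Placeholder parameters: two output classes, eight inputs each.
weights = [[0.0] * 8, [0.0] * 8]
biases = [0.0, 0.0]
probabilities = classify([0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], weights, biases)
```

With all-zero placeholder parameters both classes receive equal probability; trained parameters would shift the split between "0" (normal) and "1" (fraudulent).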
FIG. 9 is a schematic structural diagram of a multilayer perceptron model provided by an embodiment of this application. As shown in FIG. 9, the multilayer perceptron model may include an input layer, a hidden layer, and an output layer. The input layer has 17 neurons, the hidden layer has 8 neurons, and the output layer has 4 neurons. An input layer of 17 neurons means that the first feature vector that the fraudulent number identification system inputs to the multilayer perceptron contains 17 feature values. The output of the output layer is the first feature information of the target number obtained from the first feature vector. The activation function can be the ReLU function. The number of neurons in each layer can be set according to actual application requirements, which is not limited in this application.
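A forward pass through a 17-8-4 perceptron can be sketched in plain Python as follows. The weights are random placeholders, and applying ReLU only after the hidden layer is an assumption, since the text does not say which layers use the activation.

```python
import random


def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: weights holds one row per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Placeholder initialisation: small random weights, zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_forward(vector, hidden, output):
    hidden_values = relu(dense(vector, *hidden))    # 17 -> 8, ReLU
    return dense(hidden_values, *output)            # 8 -> 4


rng = random.Random(0)
hidden = init_layer(17, 8, rng)
output = init_layer(8, 4, rng)
# Example first feature vector from the text; the area code 020 is written as 20.
first_vector = [1, 18.0, 0, 20, 510, 2021, 10, 1, 10.0, 112, 3, 15, 16, 10, 0, 11, 14]
first_info = mlp_forward(first_vector, hidden, output)
```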
FIG. 10 is a schematic structural diagram of a convolutional neural network model provided by an embodiment of this application. As shown in FIG. 10, the convolutional neural network model can be divided into 12 layers, and the convolution stride can be 1. The second feature vector that the fraudulent number identification system inputs to the convolutional neural network can contain 16 feature values per call detail record, and the system obtains only 1024 call detail records from the last 30 days. Correspondingly, the input feature dimension of the convolutional structure provided in this embodiment can be 16×1024. Where there are fewer call detail records than required, zero padding can be applied. Taking a model trained with a 16×1024 second feature vector as input as an example, if the system obtains fewer than 1024 call detail records for the target number from the last 30 days when the model is used, the missing feature values are filled with zeros when the second feature vector is constructed, and the zero-padded second feature vector is then input into the convolutional neural network for processing.
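The zero padding can be sketched as below. The function name is hypothetical, and truncating to the first 1024 records when more are available is an assumption; the text only says that 1024 records are obtained. The result is laid out as 1024 rows of 16 values and would be transposed for a 16×1024 input.

```python
def pad_call_records(records, max_records=1024, width=16):
    """Return exactly max_records rows of `width` values, zero-padded at the end."""
    rows = []
    for record in records[:max_records]:
        if len(record) != width:
            raise ValueError(f"expected {width} values, got {len(record)}")
        rows.append(list(record))
    # Fill the missing call detail records with all-zero rows.
    while len(rows) < max_records:
        rows.append([0] * width)
    return rows


# Example cleaned call detail record from the text; 020 and 0755 written as integers.
sample = (1, 10, 30, 6, 5, 20, 0, 20, 755, 1, 119, 119, 9472, 50, 1, 0)
matrix = pad_call_records([sample, sample])
```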
For example, the layers can perform the following operations:
① Convolve the 16×1024 call detail input with 3×3×2, 5×5×2, and 7×7×2 convolution kernels respectively; each convolution result is 16×1024×2.
② Concatenate the outputs of the convolution kernels in ① along the third dimension; the output dimension is 16×1024×6.
③ For the output of ②, use a 3×3×32 convolution kernel; the output dimension is 16×1024×32.
④ For the output of ③, use a 2×2 max pooling layer; the output dimension is 8×512×32.
⑤ For the output of ④, use a 3×3×64 convolution kernel; the output dimension is 8×512×64.
⑥ For the output of ⑤, use a 2×2 max pooling layer; the output dimension is 4×256×64.
⑦ For the output of ⑥, use a 3×3×128 convolution kernel; the output dimension is 4×256×128.
⑧ For the output of ⑦, use a 2×2 max pooling layer; the output dimension is 2×128×128.
⑨ For the output of ⑧, flatten it and use a fully connected layer; the output dimension is 1×4096.
⑩ For the output of ⑨, use a fully connected layer; the output dimension is 1×1000.
⑪ For the output of ⑩, use a fully connected layer; the output dimension is 1×128.
⑫ For the output of ⑪, use a fully connected layer; the output dimension is 1×4.
The output of the 12th layer is the second feature information of the target number obtained from the second feature vector.
The parameters of the convolutional neural network model can be set according to actual requirements, which is not limited in this application.
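The dimensions listed for the 12 layers can be checked with a short shape trace. The sketch assumes that each stride-1 convolution uses padding that preserves height and width, which is what the listed output dimensions imply; it tracks shapes only and performs no actual convolution.

```python
def conv_same(shape, out_channels):
    """Stride-1 convolution with size-preserving padding: only channels change."""
    height, width, _ = shape
    return (height, width, out_channels)


def max_pool(shape, size=2):
    """Non-overlapping max pooling halves height and width for size=2."""
    height, width, channels = shape
    return (height // size, width // size, channels)


def trace_shapes(height=16, width=1024):
    shapes = []
    x = (height, width, 1)
    # Layer 1: three parallel branches (3x3, 5x5, 7x7 kernels), 2 channels each.
    branches = [conv_same(x, 2) for _kernel in (3, 5, 7)]
    # Layer 2: concatenate the branches along the channel dimension.
    x = (height, width, sum(branch[2] for branch in branches))
    shapes.append(x)
    # Layers 3 to 8: alternating 3x3 convolution and 2x2 max pooling.
    for channels in (32, 64, 128):
        x = conv_same(x, channels)
        shapes.append(x)
        x = max_pool(x)
        shapes.append(x)
    flattened = x[0] * x[1] * x[2]
    # Layers 9 to 12: fully connected output sizes.
    dense_sizes = [4096, 1000, 128, 4]
    return shapes, flattened, dense_sizes


shapes, flattened, dense_sizes = trace_shapes()
```

The trace ends at 2×128×128, so the first fully connected layer receives 32,768 flattened values and reduces them to 4096.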
In the number processing method provided by this embodiment, the fraudulent number identification system cleans the basic data, traffic detail data, and call detail data in the course of using them to obtain the probability that the target number is a fraudulent number. Data cleaning converts the data into a representation that meets the requirements of the number recognition model, removes data that does not help identify fraudulent numbers, and replaces overly long digit strings with shorter ones. This saves computing resources for the number recognition model, improves recognition efficiency, and reduces the likelihood of model errors. In addition, this embodiment uses a multilayer perceptron to learn from the first feature vector, which is composed of the basic data and the processed traffic detail data, and a convolutional neural network to learn from the second feature vector, which is composed of call detail data that exhibits temporal correlation. Finally, a softmax layer performs binary classification on the results obtained from the multilayer perceptron and the convolutional neural network, yielding the identification probability of whether the target number is a fraudulent number. By using different models to learn from data with different characteristics, this embodiment makes full use of the strengths of different deep learning models in different scenarios and further improves the accuracy with which the fraudulent number identification system identifies fraudulent numbers.
The following embodiment assumes that the fraudulent number identification system is provided with a whitelist, a blacklist, and a greylist, and describes in detail how the system handles the target number after obtaining the identification probability of whether it is a fraudulent number:
Case 1: According to the identification probability, the target number is determined not to be a fraudulent number.
After the number is identified as a non-fraudulent number, the target number can be added to the whitelist. Subsequently, if the fraudulent number identification system captures the target number again, it can confirm directly by scanning the whitelist that the target number is not a fraudulent number. This avoids repeated detection of non-fraudulent numbers and reduces wasted computing resources.
Case 2: According to the identification probability, the target number is determined to be possibly a fraudulent number.
If the identification probability does not allow the system to determine whether the target number is a fraudulent number or a non-fraudulent number, the target number is determined to be possibly a fraudulent number and can be added to the greylist.
The fraudulent number identification system can directly send the called number a prompt message that the target number is abnormal and receive the feedback of the called number. Alternatively, the fraudulent number identification system can call another system to send the prompt message; after receiving the feedback of the called number, that system forwards the feedback to the fraudulent number identification system. The other system may be, for example, a communication operator system.
Take as an example the case in which the fraudulent number identification system sends the prompt message that the target number is abnormal to the numbers called by a target number in the greylist. The prompt message may take the form of a short message, a voice call, or a message sent through system software, for example system software provided by the operator.
Optionally, the prompt message may also include information prompting the called number to send feedback to the system. Exemplarily, the prompt message may prompt the called number to send 0 or 1 to the system, where 0 indicates that the target number is a non-fraudulent number and 1 indicates that the target number is a fraudulent number. The called number may send feedback to the system by short message, by voice, or through the system software provided by the operator.
If the fraudulent number identification system receives a feedback result from the called number, and the feedback result indicates whether the target number is a fraudulent number, so that the feedback helps the system identify whether the target number is a fraudulent number, the system records the feedback result for the target number.
If the accumulated feedback results indicate that the target number is not a fraudulent number, the target number is moved from the greylist to the whitelist.
If the accumulated feedback results indicate that the target number is a fraudulent number, the target number is moved from the greylist to the blacklist.
There are several ways to judge from the accumulated feedback results whether the target number is a fraudulent number. For example, the fraudulent number identification system can be configured so that if the number of feedback results indicating that the target number is (or is not) a fraudulent number exceeds a preset feedback threshold, the target number is determined to be (or not to be) a fraudulent number. The feedback threshold can be set according to actual needs, for example 10, which is not limited in this application.
Alternatively, the fraudulent number identification system can be configured so that when the ratio between the number of feedback results indicating that the target number is a fraudulent number and the number indicating that it is not reaches a preset ratio threshold, the target number is determined to be or not to be a fraudulent number. The feedback threshold and the ratio threshold can be set according to actual needs; for example, the ratio threshold can be 4:1, which is not limited in this application.
The fraudulent number identification system can also combine the feedback threshold and the ratio threshold to determine whether the target number is a fraudulent number. For example, the feedback threshold can be set to 10 with a corresponding ratio threshold of 4:1.
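One way to combine the two thresholds is sketched below. The interpretation is an assumption: a verdict is reached only once at least 10 feedback results have been received in total and one side outnumbers the other by at least 4:1. The function name and return labels are hypothetical.

```python
def judge_feedback(fraud_votes, normal_votes, min_feedback=10, ratio=4.0):
    """Return 'fraud', 'normal', or 'undecided' from accumulated feedback counts."""
    total = fraud_votes + normal_votes
    # Too little feedback so far: the number stays in the greylist.
    if total < min_feedback:
        return "undecided"
    if fraud_votes >= ratio * normal_votes:
        return "fraud"        # move from greylist to blacklist
    if normal_votes >= ratio * fraud_votes:
        return "normal"       # move from greylist to whitelist
    return "undecided"


verdict_fraud = judge_feedback(8, 2)
verdict_normal = judge_feedback(2, 8)
verdict_mixed = judge_feedback(5, 5)
verdict_sparse = judge_feedback(4, 1)
```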
In case 2, the fraudulent number identification system stores target numbers that may be fraudulent numbers in the greylist, and further determines whether a target number is a fraudulent number by sending the numbers it called a prompt message that the target number is abnormal and receiving their feedback. In this way the system reconfirms suspected fraudulent numbers instead of judging them solely by the identification probability given by the number recognition model. This improves the accuracy of fraudulent number identification, reduces the chance of blocking normal numbers by mistake, and avoids missing fraudulent numbers.
Case 3: According to the identification probability, the target number is determined to be a fraudulent number.
After the number is identified as a fraudulent number, the target number can be added to the blacklist. The blacklist is the list of fraudulent numbers.
Optionally, after the target number is added to the blacklist, a blocking operation can be performed on the target number in the blacklist.
As one possible implementation, the fraudulent number identification system can block the target numbers in the blacklist directly; that is, the system itself has the function of blocking target numbers in the blacklist. Alternatively, the fraudulent number identification system can call another system, for example a communication operator system, to block the target numbers in the blacklist. Exemplarily, the fraudulent number identification system can mark the blacklisted target number in the operator database, for example by setting its status in the operator database to "pending blocking", and then call the communication operator system to block the target number according to the mark.
As another possible implementation, a secondary audit of the target number may be performed before the fraudulent number identification system blocks the target number in the blacklist. The secondary audit may, for example, check whether the target number exhibits the typical characteristics of fraudulent numbers. The secondary audit can be performed manually or by a system, for example the fraudulent number identification system or another system such as a communication operator system.
Optionally, the fraudulent number identification system can also scan the target numbers in the blacklist periodically and block any numbers that were missed.
In case 3, the fraudulent number identification system stores fraudulent numbers in the blacklist and blocks them. A secondary audit of the number is also performed before blocking, which reduces the chance of blocking a number by mistake and avoids the resulting harm to the user experience. In addition, the system can scan the target numbers in the blacklist periodically and block any numbers that were missed. These operations further ensure that fraudulent numbers are handled accurately and reduce the number of fraudulent numbers that escape blocking.
The number processing method provided by the embodiments of this application is described in detail below through a specific example, which includes the following steps:
S401. Obtain the number of outgoing calls made by each of the N numbers in the target area within the second historical period.
S402. Determine whether any of the N numbers has a number of outgoing calls greater than the outgoing-call threshold. If so, execute step S403; if not, end the process.
S403. Take each number whose number of outgoing calls is greater than the outgoing-call threshold as a candidate number.
S404. Determine whether the candidate number is in the whitelist. If it is in the whitelist, end the process; if it is not, execute step S405.
S405. Determine whether the candidate number is in the greylist. If it is in the greylist, execute step S416; if it is not, execute step S406.
S406. Take the candidate number as the target number.
S407. Obtain the basic data of the target number, and the traffic detail data and call detail data of the target number within the first historical period.
S408. Process the traffic detail data to obtain the processed traffic detail data.
S409. Clean the basic data and the processed traffic detail data, and construct the first feature vector from the cleaned basic data and the cleaned traffic detail data.
S410. Clean the call detail data, and construct the second feature vector from the cleaned call detail data.
S411. According to the first feature vector and the second feature vector, use the number recognition model to obtain the identification probability of whether the target number is a fraudulent number.
S412. Determine from the identification probability whether the target number is a fraudulent number. If it is determined to be a fraudulent number, execute step S413. If it is determined to be a non-fraudulent number, execute step S415. If it cannot be determined whether it is a fraudulent number, execute step S416.
S413. Add the target number to the blacklist.
S414. Perform a blocking operation on the target number.
After step S414 is executed, the process ends.
S415. Add the target number to the whitelist.
After step S415 is executed, the process ends.
S416. Add the target number to the greylist.
S417. Send the called number a prompt message that the target number is abnormal.
S418. Determine whether a feedback result has been received from the called number.
S419. Record the feedback result for the target number.
S420. Determine whether the accumulated feedback results for the target number indicate that it is not a fraudulent number. If they indicate that it is not a fraudulent number, execute step S421; if they indicate that it is a fraudulent number, execute step S422.
S421. Move the target number from the greylist to the whitelist.
S422. Move the target number from the greylist to the blacklist.
After step S422 is executed, return to step S414.
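The screening part of the flow above, steps S401 to S406, can be sketched as follows. The function name, the sample numbers, and the threshold value of 50 are illustrative assumptions; numbers already in the greylist are returned separately because the flow sends them to step S416 rather than to the model.

```python
def screen_numbers(outgoing_counts, threshold, whitelist, greylist):
    """Split numbers into new targets (S406) and greylisted numbers (S416)."""
    targets, greylisted = [], []
    for number, count in outgoing_counts.items():
        if count <= threshold:       # S402: not above the outgoing-call threshold
            continue
        if number in whitelist:      # S404: known non-fraudulent number, stop here
            continue
        if number in greylist:       # S405: already awaiting feedback
            greylisted.append(number)
        else:                        # S406: becomes a target for the model
            targets.append(number)
    return targets, greylisted


# Hypothetical outgoing-call counts within the second historical period.
counts = {"A": 120, "B": 10, "C": 75, "D": 300}
targets, greylisted = screen_numbers(counts, 50, whitelist={"C"}, greylist={"D"})
```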
从运营商数据库中获取3个呼叫次数均高于50次/天的号码作为示例,利用本申请的诈骗号码识别系统对其进行识别。Three numbers with call times higher than 50 times/day are obtained from the operator database as an example, and the fraudulent number identification system of the present application is used to identify them.
其中,示例1的基础数据和流量详单显示其产品资费较低,客户类型为公众客户,本月日均是用流量极少,且流量使用最多的APP仅为通信APP 一种。其语音详单在一天内存在大量主叫记录,呼出号码均为外地或外省,不存在重复呼出的号码,每次呼出间隔只有几分钟,基站号、小区号和蜂窝号基本无变化。具有典型的诈骗号码特征。Among them, the basic data and traffic details of Example 1 show that its product tariff is low, the customer type is public customers, the daily traffic usage is very small this month, and the APP with the most traffic is only the communication APP. There are a large number of caller records in its voice detailed list in one day, the outgoing numbers are all out of place or out of the province, there are no repeated outgoing numbers, the interval between each outgoing call is only a few minutes, and the base station number, cell number and cell number are basically unchanged. Has typical scam number characteristics.
示例2的基础数据和流量详单显示其产品资费中等,客户类型为集团客户,本月日均使用流量较多且使用时段为工作时段,使用流量最多的APP为物流APP,其次为通信APP。语音详单一天内也存在大量主叫记录,极少重复呼出号码,呼出号码归属本市、外地或外省,基站号、小区号和蜂窝号时有变动,与诈骗号码有相似性但存在区分度。其特征和物流配送行业号码特征较符合。The basic data and flow details of Example 2 show that the product tariff is moderate, the customer type is group customer, the average daily traffic is high this month, and the usage period is during working hours. The APP with the most traffic is the logistics APP, followed by the communication APP. There are also a large number of caller records in a single day, and the outgoing numbers are rarely repeated. The outgoing numbers belong to the city, other places or provinces. The base station number, cell number and cell number change from time to time, which is similar to the fraudulent number but has a degree of distinction. . Its characteristics are more in line with the characteristics of the logistics and distribution industry numbers.
示例3的基础数据和流量详单显示其产品资费较高,客户类型为公众,本月日均使用流量较高且使用时间集中在下班时间,流量使用最多的APP为视频APP,其次为通信APP。其语音详单一天呼出号码较少,存在主叫和被叫记录,基本是本市通话或虚拟专用网络呼叫,有不少回拨电话,基站号、小区号和蜂窝号时有变动。其特征和一般安全号码特征相符。The basic data and traffic details of Example 3 show that the product tariff is relatively high, the customer type is public, the average daily traffic usage this month is high, and the usage time is concentrated during off-duty hours. The APP with the most traffic is the video APP, followed by the communication APP . The number of outgoing calls in a single day is less, and there are calling and called records. They are basically local calls or virtual private network calls. There are many callback calls, and the base station number, cell number, and cell number change from time to time. Its characteristics are consistent with those of general security numbers.
将上述3个示例号码均用诈骗号码识别系统进行识别,结果显示示例1 的诈骗号码识别概率表示其为诈骗号码的概率为97.26%,示例2为41.85%,示例1的诈骗号码识别概率表示其为诈骗号码的概率为6.18%。由此可以看出,这三个示例号码人工逻辑推断与系统实际输出大致相同,系统在这三个示例号码取得了较好的防诈效果。The above three example numbers are all identified by the fraud number recognition system, and the results show that the probability of fraud number recognition in Example 1 means that the probability of being a fraud number is 97.26%, and the probability of being a fraud number in Example 2 is 41.85%. The probability of being a scam number is 6.18%. From this, it can be seen that the artificial logic inference of the three example numbers is roughly the same as the actual output of the system, and the system has achieved a good anti-fraud effect in these three example numbers.
In the subsequent flow, Example 1 is placed on the blacklist and suspended. Example 2 is placed on the graylist, and the fraudulent number identification system reconfirms it by interacting with the numbers it calls. Example 3 is placed in the whitelist database, so that when it places calls later and is captured by the fraudulent number identification system again, it does not go through the subsequent flow a second time.
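The routing of the three examples can be sketched as a mapping from identification probability to a list. This is an illustration under stated assumptions: the two cut-off values below are hypothetical and do not come from this passage, which only shows that 97.26% leads to the blacklist, 41.85% to the graylist, and 6.18% to the whitelist.

```python
from enum import Enum


class NumberList(Enum):
    BLACK = "blacklist"   # suspended
    GRAY = "graylist"     # reconfirmed through called-party interaction
    WHITE = "whitelist"   # skipped on later captures


# Hypothetical cut-offs chosen only to be consistent with the three examples.
BLACK_THRESHOLD = 0.90
WHITE_THRESHOLD = 0.20


def route(probability: float) -> NumberList:
    """Map an identification probability in [0, 1] to a list."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability >= BLACK_THRESHOLD:
        return NumberList.BLACK
    if probability <= WHITE_THRESHOLD:
        return NumberList.WHITE
    return NumberList.GRAY
```

With these assumed cut-offs, `route(0.9726)`, `route(0.4185)`, and `route(0.0618)` return `NumberList.BLACK`, `NumberList.GRAY`, and `NumberList.WHITE` respectively, matching the handling of Examples 1, 2, and 3.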
Examples 1 to 3 above illustrate the complete number anti-fraud flow in each scenario and show how the system achieves an efficient, fully automatic anti-fraud mode. For a fraudulent number, the system promptly intercepts further outgoing calls and suspends the number automatically. For a normal user number, the system adds it to the whitelist, which keeps the service available to the user and avoids wasting system resources on repeated checks. For a suspected fraudulent number, the system adds it to the graylist and interacts with the called numbers in real time to gather more information, which avoids rashly suspending a normal user and disrupting their service, and thereby improves user experience. If a large number of called users confirm that the number is indeed fraudulent, the system moves it to the blacklist and suspends it. This interaction compensates for the lower recall of fraudulent numbers that results from protecting user experience.
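The graylist escalation described above can be sketched as a feedback counter with a promotion rule. This is a hedged illustration: the passage only says that "a large number" of called users must confirm the fraud, so `MIN_REPORTS` and `FRAUD_SHARE` are hypothetical values, and `GraylistEntry` is an invented name.

```python
from dataclasses import dataclass

# Hypothetical parameters; the text does not quantify "a large number".
MIN_REPORTS = 10     # minimum feedback volume before any decision
FRAUD_SHARE = 0.6    # share of fraud reports that triggers blacklisting


@dataclass
class GraylistEntry:
    """Feedback collected from called users for one graylisted number."""
    number: str
    fraud_reports: int = 0
    normal_reports: int = 0

    def record_feedback(self, reported_as_fraud: bool) -> None:
        """Count one called user's answer."""
        if reported_as_fraud:
            self.fraud_reports += 1
        else:
            self.normal_reports += 1


def should_blacklist(entry: GraylistEntry) -> bool:
    """Promote to the blacklist only with enough, mostly negative, feedback."""
    total = entry.fraud_reports + entry.normal_reports
    if total < MIN_REPORTS:
        return False  # too little feedback; keep the number on the graylist
    return entry.fraud_reports / total >= FRAUD_SHARE
```

Requiring a minimum volume before acting reflects the stated goal of not suspending a normal user on thin evidence; a production system would presumably also need a rule for releasing numbers from the graylist, which this passage does not describe.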
The present application also provides a fraudulent number identification system 100 as shown in FIG. 1. The modules and functions of the system are as described above and are not repeated here. In one embodiment, the real-time monitoring module 11 of the fraudulent number identification system 100 obtains the target number to be identified. The data acquisition module 12 obtains the basic data of the target number, together with the traffic detail data and call detail data of the target number within a first historical period. The fraud detection module 13 uses the number identification model, based on the basic data, traffic detail data, and call detail data, to obtain the identification probability that the target number is a fraudulent number, and determines from that probability whether the target number is fraudulent. The implementation principle is similar to that described above and is not repeated here.
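The division of work among modules 11, 12, and 13 can be sketched as a small function that takes the data source and the model as parameters. This is an assumption-laden outline rather than the system's implementation: `NumberData`, `identify`, the callable signatures, and the 0.5 decision threshold are all hypothetical, and the number identification model is represented only as an opaque function returning a probability.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NumberData:
    """Inputs gathered by the data acquisition step (hypothetical shape)."""
    basic: dict            # tariff plan, customer type, ...
    traffic_details: list  # traffic detail records in the first historical period
    call_details: list     # call detail records in the same period


def identify(
    target_number: str,
    fetch_data: Callable[[str], NumberData],
    model: Callable[[NumberData], float],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> tuple[float, bool]:
    """Return (identification probability, is_fraud) for a captured number."""
    data = fetch_data(target_number)   # role of data acquisition module 12
    probability = model(data)          # role of fraud detection module 13
    return probability, probability >= threshold
```

In this sketch the real-time monitoring module 11 corresponds to whatever code captures `target_number` and calls `identify`. Passing `fetch_data` and `model` in as callables keeps the flow testable with stubs and independent of any particular model.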
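The present application also provides a computing device 200 as shown in FIG. 4. The processor 202 of the computing device 200 reads the program and data set stored in the memory 201 to execute the number processing method performed by the fraudulent number identification system described above.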
The present application also provides a computer-readable storage medium. The computer-readable storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc. Specifically, the computer-readable storage medium stores program instructions, and the program instructions are used to carry out the methods in the foregoing embodiments.
The present application also provides a program product that includes execution instructions stored in a readable storage medium. At least one processor of a computing device can read the execution instructions from the readable storage medium, and execution of the instructions by the at least one processor causes the electronic device to implement the number processing method provided by the various embodiments described above.
Other embodiments of the present application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed in this application. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the application indicated by the following claims.
It should be understood that the present application is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the application is limited only by the appended claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210486524.0A CN115022464A (en) | 2022-05-06 | 2022-05-06 | Number processing method, system, computing device and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN115022464A (en) | 2022-09-06 |
Family
ID=83069085
Family Applications (1)
| Application Number | Status | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN202210486524.0A (CN115022464A) | Pending | 2022-05-06 | 2022-05-06 | Number processing method, system, computing device and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115022464A (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20220906 |







