CN113052577B - Class speculation method and system for block chain digital currency virtual address - Google Patents
Class speculation method and system for block chain digital currency virtual address
- Publication number
- CN113052577B (application CN202110272026.1A)
- Authority
- CN
- China
- Prior art keywords
- data set
- address
- class
- data
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/04—Payment circuits
- G06Q20/06—Private payment circuits, e.g. involving electronic currency used among participants of a common payment scheme
- G06Q20/065—Private payment circuits, e.g. involving electronic currency used among participants of a common payment scheme using e-cash
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Security & Cryptography (AREA)
- General Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- General Health & Medical Sciences (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Bioethics (AREA)
- Technology Law (AREA)
- Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a class speculation method and a class speculation system for a block chain digital currency virtual address, wherein the class speculation method comprises the following steps: acquiring a known category address and an unknown category address of digital currency; carrying out transaction retrieval, feature extraction, data normalization, feature contribution calculation and screening and sample class unbalanced treatment on the known class address to obtain a sample data set; dividing the sample data set into a training data set and a test data set, and selecting an optimal model as a classifier for digital currency address category estimation after multiple iterations; carrying out transaction retrieval, feature extraction, data normalization and feature contribution screening on the unknown class address to obtain a data set to be classified; and classifying and calculating the input data set to be classified based on the classifier to obtain the category to which the data set to be classified belongs. The invention learns the characteristics of the virtual addresses of the known classes so as to infer the class to which the virtual address of the unknown class belongs, and can solve the problem that most of the virtual addresses in the blockchain network are in an information unknown state.
Description
Technical Field
The invention relates to the technical field of blockchain digital currency, in particular to a class speculation method and a class speculation system for a blockchain digital currency virtual address.
Background
With the development of blockchain technology and digital currency, more and more people are attracted by the anonymity and decentralization of blockchain technology and by the high returns of digital currency, and are paying attention to blockchain digital currencies such as Bitcoin and Ethereum. The anonymity of blockchain digital currency gives user nodes sufficient security, but it also leaves the network with a mix of legitimate and illegitimate participants, so that illegal transaction activities occur frequently. Inferring entity types in a blockchain digital currency network while preserving the anonymity of user nodes, and thereby determining the category of the virtual addresses held by each user node, is therefore of significant value for supervising such networks.
Current research on entity type speculation in blockchain digital currency networks mainly revolves around entity user address clustering, entity legality detection, and entity user address classification:
1. For entity user address clustering, existing research mainly uses heuristic clustering methods to cluster transaction addresses: the input addresses of a transaction are clustered into one group and identified as a single entity.
2. For entity legality detection, existing research mainly analyzes labeled addresses and entities that participate in illegal activities; the illegal activities involved mainly include investment fraud, trade in contraband, money laundering and the like.
3. For entity user address classification, existing research mainly obtains address information published on forums, blogs and websites, formulates classification criteria, and trains a classification model.
Although the above schemes can infer entity types in a blockchain digital currency network to a certain extent, most of them rely on off-chain information about the virtual addresses. Such schemes can accurately infer the category or real identity of a virtual address, but not every virtual address can be associated with off-chain information, and most virtual addresses in a blockchain network remain in an unknown state. These schemes also require searching a large amount of network resources, which is time-consuming and labor-intensive. As a result, they suffer from few data feature dimensions for entity type speculation, high research cost, and poor generality of the overall solution.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a category estimation method and a category estimation system for a blockchain digital currency virtual address, which have low cost and strong universality.
The invention discloses a category speculation method of a block chain digital currency virtual address, which comprises the following steps:
Acquiring a known category address and an unknown category address of the blockchain digital currency;
carrying out transaction retrieval, feature extraction, data normalization, feature contribution calculation and screening and sample class imbalance treatment on the known class address to obtain a sample data set;
Dividing the sample data set into a training data set for model training and a test data set for model evaluation, and selecting an optimal model as a classifier for digital currency address category estimation after multiple iterations;
Carrying out transaction retrieval, feature extraction, data normalization and feature contribution screening on the unknown class address to obtain a data set to be classified;
And carrying out classification calculation on the input data set to be classified based on the classifier to obtain the category to which the data set to be classified belongs.
As a further improvement of the present invention, the transaction retrieval and feature extraction includes:
obtaining all transaction information participated in by the known type address or the unknown type address in the ledger data of the blockchain digital currency;
Extracting features of the transaction information to obtain a basic data set; the basic data set comprises the total number of transactions, the amount of each transaction, the number of times the address appears as an output address, the number of coin-minting (coinbase) transactions participated in, the time bitcoin was first received, the time bitcoin was first spent, the output address count of each transaction and the input address count of each transaction;
And combining the feature data in the basic data set based on a method in feature engineering to acquire new data features and generate a feature data set.
As a further improvement of the present invention, the data normalization includes:
and carrying out a data normalization operation on the feature data set generated by the feature extraction by adopting a min-max normalization method, so that the processed data is limited to between 0 and 1.
As a further improvement of the present invention, the feature contribution calculation and screening includes:
for the known category addresses, calculating information gain values of all characteristic attributes contained in the data set after data normalization by using an information gain calculation method, sorting and screening out the characteristics with characteristic contribution values lower than a threshold value, recording the names of the screened out characteristic attributes, and forming a new characteristic data set by the rest characteristic attributes;
And directly deleting the screened characteristic attribute recorded in the known class address from the data set after data normalization aiming at the unknown class address, and forming a new characteristic data set by the rest characteristic attributes to serve as the data set to be classified.
As a further improvement of the present invention, the sample class imbalance processing includes:
and processing the data set after feature contribution calculation and screening by using the borderline synthetic minority oversampling technique (Borderline-SMOTE) to obtain the sample data set.
The invention also discloses a class speculation system of the block chain digital currency virtual address, which comprises:
A data processing module for:
Acquiring a known category address and an unknown category address of the blockchain digital currency;
carrying out transaction retrieval, feature extraction, data normalization, feature contribution calculation and screening and sample class imbalance treatment on the known class address to obtain a sample data set;
Carrying out transaction retrieval, feature extraction, data normalization and feature contribution screening on the unknown class address to obtain a data set to be classified;
The classifier generation module is used for:
Dividing the sample data set into a training data set for model training and a test data set for model evaluation, and selecting an optimal model as a classifier for digital currency address category estimation after multiple iterations;
A classification module for:
And carrying out classification calculation on the input data set to be classified based on the classifier to obtain the category to which the data set to be classified belongs.
As a further improvement of the present invention, in the data processing module, the transaction retrieval and feature extraction includes:
obtaining all transaction information participated in by the known type address or the unknown type address in the ledger data of the blockchain digital currency;
Extracting features of the transaction information to obtain a basic data set; the basic data set comprises the total number of transactions, the amount of each transaction, the number of times the address appears as an output address, the number of coin-minting (coinbase) transactions participated in, the time bitcoin was first received, the time bitcoin was first spent, the output address count of each transaction and the input address count of each transaction;
And combining the feature data in the basic data set based on a method in feature engineering to acquire new data features and generate a feature data set.
As a further improvement of the present invention, in the data processing module, the data normalization includes:
and carrying out a data normalization operation on the feature data set generated by the feature extraction by adopting a min-max normalization method, so that the processed data is limited to between 0 and 1.
As a further improvement of the present invention, in the data processing module, the feature contribution calculation and screening includes:
for the known category addresses, calculating information gain values of all characteristic attributes contained in the data set after data normalization by using an information gain calculation method, sorting and screening out the characteristics with characteristic contribution values lower than a threshold value, recording the names of the screened out characteristic attributes, and forming a new characteristic data set by the rest characteristic attributes;
And directly deleting the screened characteristic attribute recorded in the known class address from the data set after data normalization aiming at the unknown class address, and forming a new characteristic data set by the rest characteristic attributes to serve as the data set to be classified.
As a further improvement of the present invention, in the data processing module, the sample class imbalance processing includes:
and processing the data set after feature contribution calculation and screening by using the borderline synthetic minority oversampling technique (Borderline-SMOTE) to obtain the sample data set.
Compared with the prior art, the invention has the beneficial effects that:
1. the invention learns the characteristics of the virtual addresses of the known classes so as to infer the class to which the virtual addresses of the unknown classes belong, and can solve the problem that most of the virtual addresses in the blockchain network are in an information unknown state;
2. According to the method, the characteristics with the contribution degree lower than the threshold value are removed by adopting the characteristic screening mode, so that the characteristic dimension is reduced, and the classification efficiency of the whole model is improved;
3. According to the invention, a cyclic iterative classification model training mode is adopted, and an optimal algorithm and the most appropriate parameters are selected in different classification scenes, so that the finally generated classifier is ensured to be more accurate in estimating the current virtual address category;
4. the invention adopts a layer-by-layer discrimination mode for the virtual address, and assigns the virtual address to an "other" category when its features match none of the feature patterns learned by the classifier.
Drawings
FIG. 1 is a flow chart of a method and system for class speculation for blockchain digital currency virtual addresses in accordance with one embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention is described in further detail below with reference to the attached drawing figures:
The invention provides a category speculation method and system for a blockchain digital currency virtual address. It first uses data mining methods to analyze and extract the transaction information in which a virtual address of the blockchain digital currency participates (such as the number of transactions participated in, the number of times the address appears as an output, the amount of each transaction and the time of each transaction) to obtain the feature data of the virtual address. It then uses feature engineering to screen and process the feature data and construct a data set. Finally, by analyzing the feature data set and comparing features, it classifies the virtual addresses in the blockchain digital currency.
Specifically:
Example 1:
As shown in FIG. 1, the present invention provides a class speculation method for a blockchain digital currency virtual address, comprising:
step 1, obtaining a known category address and an unknown category address of block chain digital currency;
Specifically:
The invention takes bitcoin as the research object: known-class bitcoin addresses of different categories, and the unknown-class bitcoin addresses to be identified, are obtained from a bitcoin address category label website, and the category labels of the known-class addresses are used as the target output for subsequent classifier training.
Step 2, carrying out transaction retrieval, feature extraction, data normalization, feature contribution calculation and screening and sample class imbalance treatment on the known class address to obtain a sample data set;
Specifically:
Transaction retrieval and feature extraction:
Obtaining all transaction information participated in by the known category address from the ledger data of the blockchain digital currency; extracting features from the transaction information to obtain a basic data set; and combining the feature data in the basic data set based on feature engineering methods to derive new data features and generate a feature data set. In detail:
Taking bitcoin as the research object, transaction retrieval obtains all transaction information in which a given bitcoin address participates from the official bitcoin ledger data. Feature extraction analyzes the transaction information corresponding to the given bitcoin address and extracts data such as the total number of transactions, the amount of each transaction, the number of times the address appears as an output address, the number of coin-minting (coinbase) transactions participated in, the time bitcoin was first received, the time bitcoin was first spent, and the input address count of each transaction as the basic data set of the address. The feature data in the basic data set are then combined based on feature engineering methods to derive new data features and generate the feature data set.
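The feature extraction described above can be illustrated with a minimal Python sketch. The `Tx` record layout, the `extract_features` helper and all feature names are hypothetical simplifications introduced for illustration; they are not the ledger format or the exact feature set used by the invention.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Tx:
    """Simplified transaction record (hypothetical layout, not the real ledger format)."""
    timestamp: int             # time of the block containing the transaction
    inputs: Dict[str, float]   # input address -> amount spent
    outputs: Dict[str, float]  # output address -> amount received
    is_coinbase: bool = False  # True for coin-minting transactions


def extract_features(address: str, txs: List[Tx]) -> Dict[str, Optional[float]]:
    """Build basic per-address features of the kind named in the description."""
    involved = [t for t in txs if address in t.inputs or address in t.outputs]
    received = [t for t in involved if address in t.outputs]
    spent = [t for t in involved if address in t.inputs]
    amounts = [t.inputs.get(address, 0.0) + t.outputs.get(address, 0.0) for t in involved]
    return {
        "total_tx_count": len(involved),
        "times_as_output": len(received),
        "coinbase_tx_count": sum(1 for t in involved if t.is_coinbase),
        "first_receive_time": min((t.timestamp for t in received), default=None),
        "first_spend_time": min((t.timestamp for t in spent), default=None),
        "mean_amount": mean(amounts) if amounts else 0.0,
        "mean_input_addr_count": mean(len(t.inputs) for t in involved) if involved else 0.0,
        "mean_output_addr_count": mean(len(t.outputs) for t in involved) if involved else 0.0,
        # Example of a combined (engineered) feature: share of transactions that receive coins.
        "receive_ratio": len(received) / len(involved) if involved else 0.0,
    }


txs = [
    Tx(100, {}, {"A": 50.0}, is_coinbase=True),
    Tx(200, {"A": 20.0}, {"B": 19.0, "A": 1.0}),
    Tx(300, {"C": 5.0}, {"B": 5.0}),
]
features = extract_features("A", txs)
```

Here `receive_ratio` stands in for the "new data features" obtained by combining basic features; which combinations are actually used is not specified in the text.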
Data normalization:
Performing a data normalization operation on the feature data set generated by feature extraction using min-max normalization, so that the processed data is limited to between 0 and 1, thereby eliminating the adverse effects caused by singular sample data. Min-max normalization subtracts the minimum value $\min(X)$ of attribute $X$ from an attribute value $x_i$ and divides the result by the difference between the maximum value $\max(X)$ and the minimum value $\min(X)$: $x_i' = \frac{x_i - \min(X)}{\max(X) - \min(X)}$.
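As a small illustration of this normalization, the following Python sketch scales one feature column. The handling of a constant column (where the maximum equals the minimum and the formula is undefined) is an assumption, since the text does not address that case.

```python
from typing import List


def min_max_normalize(column: List[float]) -> List[float]:
    """Scale one feature column into [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: the formula is undefined; mapping to 0.0 is an assumption.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


tx_counts = [2.0, 10.0, 6.0, 2.0]
scaled = min_max_normalize(tx_counts)  # [0.0, 1.0, 0.5, 0.0]
```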
Feature contribution calculation and screening:
for the known category address, calculating the information gain value of each characteristic attribute contained in the data set after data normalization by using an information gain calculation method, sorting and screening out the characteristics with the characteristic contribution value lower than the threshold value, recording the names of the screened out characteristic attributes, and forming a new characteristic data set by the rest characteristic attributes.
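The screening step can be sketched in Python as follows. Because the normalized features are continuous, this sketch first splits each feature into equal-width bins before computing information gain; the binning, the bin count, the threshold value and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a label sequence."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values: Sequence[float], labels: Sequence[str], bins: int = 4) -> float:
    """Gain of a [0, 1]-scaled feature after equal-width binning (binning is an assumption)."""
    groups: Dict[int, List[str]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(min(int(v * bins), bins - 1), []).append(y)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def screen_features(data: Dict[str, List[float]], labels: List[str],
                    threshold: float) -> Tuple[Dict[str, List[float]], List[str]]:
    """Drop features whose gain is below the threshold; return kept data and dropped names."""
    gains = {name: information_gain(col, labels) for name, col in data.items()}
    ranked = sorted(gains, key=gains.get, reverse=True)
    dropped = [name for name in ranked if gains[name] < threshold]
    kept = {name: data[name] for name in ranked if gains[name] >= threshold}
    return kept, dropped


labels = ["exchange", "exchange", "mining", "mining"]
data = {
    "tx_count": [0.9, 1.0, 0.0, 0.1],     # separates the two classes
    "mean_amount": [0.1, 0.9, 0.1, 0.9],  # carries no class information
}
kept, dropped = screen_features(data, labels, threshold=0.1)
```

The `dropped` list corresponds to the recorded names of the screened-out feature attributes, which are reused later when preparing the unknown class addresses.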
Sample class imbalance processing:
For the bitcoin addresses of known category, the sample imbalance problem is handled using the borderline synthetic minority oversampling technique (Borderline-SMOTE), and a sample data set is generated. Borderline-SMOTE divides the minority class samples into three groups: Safe, Danger and Noise. More than half of the neighbors of a Safe sample are minority class samples; more than half of the neighbors of a Danger sample are majority class samples; all of the neighbors of a Noise sample are majority class samples. Borderline-SMOTE oversamples only the Danger minority class samples. The oversampling involves 3 steps: a. For each sample $y$ in the Danger group, calculate its Euclidean distance to all samples in the minority class sample set and obtain its k nearest neighbors. b. Set a sampling proportion according to the sample imbalance ratio to determine the sampling rate N; for each such minority class sample $y$, randomly select several samples from its k nearest neighbors, and denote a selected neighbor by $y_n$. c. For each randomly selected neighbor $y_n$, construct a new sample $y_{new}$ as any point on the line segment connecting $y$ and $y_n$.
Step 3, dividing the sample data set into a training data set for model training and a test data set for model evaluation, and selecting an optimal model as a classifier for digital currency address category estimation after multiple iterations;
Specifically:
Model training: in this step, the training data set is used to train and tune the parameters of several machine learning classification algorithms, such as the K-nearest neighbor algorithm, Bayesian algorithms, the decision tree algorithm, the random forest algorithm and the gradient boosting tree algorithm.
Model evaluation: the test data set is used to compare the classification algorithms from model training on multiple evaluation metrics such as accuracy, precision, recall and F1 score. If no model reaches all of the preset evaluation thresholds, classifier generation is considered to have failed, and the process returns to the model training step to tune and train the models again; if at least one model reaches all of the preset evaluation thresholds, the model with the best performance is selected to generate the classifier.
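The train, evaluate and select loop can be sketched as below. To keep the example self-contained, the candidate models are only K-nearest neighbor classifiers with different k values, standing in for the wider set of algorithms named above; the macro averaging, the threshold values and the use of F1 to pick the best model are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def knn_predict(train: List[Sample], x: Sequence[float], k: int) -> str:
    """Majority vote among the k training samples nearest to x."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def evaluate(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1."""
    classes = sorted(set(y_true))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(classes)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def select_classifier(train: List[Sample], test: List[Sample],
                      candidates: Dict[str, int], targets: Dict[str, float]) -> Optional[str]:
    """Return the best candidate that meets every target, or None to signal retraining."""
    y_true = [label for _, label in test]
    passing = {}
    for name, k in candidates.items():
        scores = evaluate(y_true, [knn_predict(train, x, k) for x, _ in test])
        if all(scores[m] >= targets[m] for m in targets):
            passing[name] = scores["f1"]
    return max(passing, key=passing.get) if passing else None


train = [([0.0, 0.1], "mining"), ([0.1, 0.0], "mining"), ([0.2, 0.1], "mining"),
         ([0.9, 1.0], "exchange"), ([1.0, 0.9], "exchange"), ([0.8, 0.9], "exchange")]
test = [([0.1, 0.1], "mining"), ([0.9, 0.9], "exchange")]
candidates = {"knn_k1": 1, "knn_k3": 3}
targets = {"accuracy": 0.9, "precision": 0.9, "recall": 0.9, "f1": 0.9}
best = select_classifier(train, test, candidates, targets)
```

A return value of `None` corresponds to the case where classifier generation fails and the process goes back to the model training step with adjusted parameters.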
Step 4, carrying out transaction retrieval, feature extraction, data normalization and feature contribution screening on the unknown class address to obtain a data set to be classified;
Specifically:
transaction retrieval and feature extraction:
the same as in step 2;
data normalization:
the same as in step 2;
Feature contribution screening:
and aiming at the unknown class address, directly deleting the screened characteristic attribute recorded in the known class address from the data set after data normalization, and forming a new characteristic data set by the rest characteristic attributes as the data set to be classified.
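The point of this step is that no information gain is recomputed for the unknown class addresses; the feature names recorded during screening of the known class addresses are simply removed. A minimal Python sketch, with hypothetical feature names and a hypothetical `apply_screening` helper:

```python
from typing import Dict, List


def apply_screening(data: Dict[str, List[float]], dropped: List[str]) -> Dict[str, List[float]]:
    """Remove the feature columns that were screened out on the known-class data."""
    excluded = set(dropped)
    return {name: col for name, col in data.items() if name not in excluded}


# Names recorded while screening the known-class data set (illustrative values).
recorded_dropped = ["mean_amount"]
unknown = {"tx_count": [0.4, 0.7], "mean_amount": [0.2, 0.3], "receive_ratio": [1.0, 0.5]}
to_classify = apply_screening(unknown, recorded_dropped)
```

This keeps the columns of the data set to be classified identical to those the classifier was trained on.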
Step 5, classifying and calculating the input data set to be classified based on the classifier to obtain the category to which the data set to be classified belongs;
Specifically:
The processed data set to be classified is taken as the input of the classifier, which infers the category of each bitcoin address from that address's feature data; that is, the classifier performs its calculation on the input feature data set, assigns each bitcoin address of unknown category to the category whose feature data it most closely matches, and outputs that category as the category inference for the corresponding bitcoin address.
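One way to picture "the category closer to its feature data" is a nearest-centroid rule, sketched below in Python. This is only an illustration: the text does not state which classifier is finally selected, and the distance cutoff `max_dist` used to return an "other" category is an assumption about how a non-matching address might be detected.

```python
import math
from typing import Dict, List, Sequence


def centroids(samples: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Mean feature vector per category."""
    return {c: [sum(col) / len(rows) for col in zip(*rows)] for c, rows in samples.items()}


def classify(x: Sequence[float], cents: Dict[str, List[float]], max_dist: float) -> str:
    """Nearest-centroid label, or "other" when no centroid is within max_dist."""
    label, dist = min(((c, math.dist(x, v)) for c, v in cents.items()), key=lambda cv: cv[1])
    return label if dist <= max_dist else "other"


known = {
    "mining": [[0.0, 0.1], [0.2, 0.1]],
    "exchange": [[0.9, 1.0], [1.0, 0.9]],
}
cents = centroids(known)
results = [classify(x, cents, max_dist=0.3) for x in ([0.15, 0.1], [0.9, 0.9], [0.5, 0.5])]
```

With these toy values the first address falls near the "mining" centroid, the second near "exchange", and the third is far from both and is reported as "other".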
Example 2:
as shown in FIG. 1, the present invention provides a class speculation system for blockchain digital currency virtual addresses, comprising:
the data processing module is used for realizing the steps 1, 2 and 4;
The classifier generating module is used for realizing the step 3;
And the classification module is used for realizing the step 5.
Example 3:
The embodiment provides an electronic device, which comprises a processor and a memory; wherein,
A memory storing codes;
the processor executes the code to perform the class speculation method of Example 1.
Example 4:
The present embodiment provides a computer-readable storage medium storing a program including instructions that, when executed by a computer, cause the computer to perform the class speculation method of Example 1.
In an embodiment of the invention, the processor may be an integrated circuit chip having signal processing capabilities. The processor may be a general purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The disclosed methods, steps, and logic blocks in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The processor reads the information in the storage medium and, in combination with its hardware, performs the steps of the above method.
The storage medium may be memory, for example, may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory.
The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
The storage media described in embodiments of the present invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the present invention may be implemented in a combination of hardware and software. When implemented in software, the corresponding functions may be stored on a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. A method for category speculation of blockchain digital currency virtual addresses, comprising:
Acquiring a known category address and an unknown category address of the blockchain digital currency;
Carrying out transaction retrieval, feature extraction, data normalization, feature contribution calculation and screening and sample class imbalance treatment on the known class address to obtain a sample data set; wherein, the characteristic contribution calculation and screening includes: for the known category addresses, calculating information gain values of all characteristic attributes contained in the data set after data normalization by using an information gain calculation method, sorting and screening out the characteristics with characteristic contribution values lower than a threshold value, recording the names of the screened out characteristic attributes, and forming a new characteristic data set by the rest characteristic attributes; for the unknown class address, directly deleting the screened characteristic attribute recorded in the known class address from the data set after data normalization, and forming a new characteristic data set by the rest characteristic attributes to serve as a data set to be classified;
Dividing the sample data set into a training data set for model training and a test data set for model evaluation, and selecting an optimal model as a classifier for digital currency address category estimation after multiple iterations; the method specifically comprises the following steps: model training: the method comprises the steps of performing parameter adjustment training on a plurality of classification algorithms in machine learning by using a training data set; model evaluation: the method comprises the steps that a classification algorithm involved in model training is subjected to comparison evaluation on multiple evaluation indexes such as accuracy, precision, recall rate and F1 score by using a test data set, when the model does not reach preset evaluation parameters, the classifier is considered to be failed to generate, and the model training step is returned to carry out parameter adjustment training of the model again; if at least one model reaches each preset evaluation parameter, selecting the model with the best training effect to generate a classifier;
Performing transaction retrieval, feature extraction, data normalization and feature contribution screening on the unknown-class address to obtain a data set to be classified;
Performing classification calculation on the input data set to be classified using the classifier to obtain the class to which the data set to be classified belongs.
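The feature contribution calculation and screening recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the equal-width binning of normalized features, the number of bins, the feature names and the class labels are all assumptions made for illustration, since the claim only names information gain and a threshold.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels, bins=4):
    """Information gain of one feature normalized to [0, 1].

    Assumption: continuous values are discretized into equal-width bins.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[min(int(value * bins), bins - 1)].append(label)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def screen_features(rows, labels, threshold, bins=4):
    """Known-class data: drop low-gain features and record their names."""
    names = list(rows[0].keys())
    gains = {n: information_gain([r[n] for r in rows], labels, bins) for n in names}
    removed = [n for n in sorted(gains, key=gains.get) if gains[n] < threshold]
    kept = [{n: r[n] for n in names if n not in removed} for r in rows]
    return kept, removed

def apply_screening(rows, removed):
    """Unknown-class data: delete the feature names recorded for the known-class data."""
    return [{n: v for n, v in r.items() if n not in removed} for r in rows]
```

The list returned as `removed` plays the role of the recorded feature names, so the unknown-class data is reduced to the same columns without recomputing any information gain.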
2. The class speculation method of claim 1, wherein the transaction retrieval and feature extraction comprise:
obtaining, from the ledger data of the blockchain digital currency, all transaction information in which the known-class address or the unknown-class address participates;
Extracting features from the transaction information to obtain a basic data set; the basic data set comprises the total number of transactions, the amount of each transaction, the number of times the address serves as an output address, the number of times the address participates in coin-minting (coinbase) transactions, the time at which bitcoin is first received, the time at which bitcoin is first spent, the output address count of each transaction and the input address count of each transaction;
Combining the feature data in the basic data set using feature engineering methods to derive new data features and generate a feature data set.
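As a rough illustration of claim 2, the sketch below extracts the listed basic features for one address and then combines them into derived features. The transaction record layout (`time`, `coinbase`, `inputs`, `outputs` with integer amounts), the reading of "amount of each transaction" as the total output value, and the particular derived features are hypothetical; the claim does not specify them.

```python
def extract_basic_features(address, transactions):
    """Basic data set for one address from a list of transaction records.

    Assumed record layout: {"time": int, "coinbase": bool,
    "inputs": [(address, amount), ...], "outputs": [(address, amount), ...]}.
    """
    involved = [tx for tx in transactions
                if any(a == address for a, _ in tx["inputs"] + tx["outputs"])]
    received = [tx for tx in involved if any(a == address for a, _ in tx["outputs"])]
    spent = [tx for tx in involved if any(a == address for a, _ in tx["inputs"])]
    return {
        "total_tx_count": len(involved),
        # Assumption: the amount of a transaction is its total output value.
        "tx_amounts": [sum(v for _, v in tx["outputs"]) for tx in involved],
        "times_as_output": len(received),
        "coinbase_tx_count": sum(1 for tx in involved if tx["coinbase"]),
        "first_receive_time": min((tx["time"] for tx in received), default=None),
        "first_spend_time": min((tx["time"] for tx in spent), default=None),
        "output_counts": [len(tx["outputs"]) for tx in involved],
        "input_counts": [len(tx["inputs"]) for tx in involved],
    }

def combine_features(basic):
    """Derive per-address scalar features (illustrative choices, not from the claim)."""
    n = max(basic["total_tx_count"], 1)
    first_in, first_out = basic["first_receive_time"], basic["first_spend_time"]
    return {
        "total_tx_count": basic["total_tx_count"],
        "times_as_output": basic["times_as_output"],
        "coinbase_tx_count": basic["coinbase_tx_count"],
        "mean_tx_amount": sum(basic["tx_amounts"]) / n,
        "mean_output_count": sum(basic["output_counts"]) / n,
        "mean_input_count": sum(basic["input_counts"]) / n,
        # Delay between first receipt and first spend; 0 if either never happened.
        "receive_to_spend_delay":
            (first_out - first_in) if None not in (first_in, first_out) else 0,
    }
```

Turning per-transaction lists into averages gives every address a fixed-length feature vector, which is what the later normalization and screening steps operate on.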
3. The class speculation method of claim 1 or 2, wherein the data normalization comprises:
performing a data normalization operation on the feature data set generated by the feature extraction using a maximum value normalization method, so that the processed data is limited to the range from 0 to 1.
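The normalization of claim 3 can be sketched as follows, under the assumption that the "maximum value normalization method" refers to min-max scaling, x' = (x - min) / (max - min), which maps each feature into the range 0 to 1. Dividing by the column maximum alone would be another possible reading; the claim text does not settle this.

```python
def min_max_normalize(rows):
    """Scale every feature column of a list of feature dicts into [0, 1].

    Assumption: min-max scaling; a constant column is mapped to 0.0.
    """
    names = list(rows[0].keys())
    low = {n: min(r[n] for r in rows) for n in names}
    high = {n: max(r[n] for r in rows) for n in names}
    return [
        {n: (r[n] - low[n]) / (high[n] - low[n]) if high[n] > low[n] else 0.0
         for n in names}
        for r in rows
    ]
```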
4. The class speculation method of claim 1, wherein the sample class imbalance processing comprises:
processing the data set obtained after feature contribution calculation and screening using the borderline synthetic minority over-sampling technique (Borderline-SMOTE) to obtain the sample data set.
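The oversampling step of claim 4 can be sketched with a simplified variant of Borderline-SMOTE. The parameters `m`, `k`, `per_sample` and `seed` are illustrative assumptions, and this is not the patented implementation; in practice a maintained library implementation (for example `BorderlineSMOTE` in imbalanced-learn) would normally be preferred.

```python
import math
import random

def nearest(point, candidates, k):
    """The k samples closest to point (Euclidean distance), excluding point itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(point[0], c[0]))[:k]

def borderline_smote(samples, minority_label, m=5, k=5, per_sample=1, seed=0):
    """samples: list of (feature_vector, label) pairs.

    Returns the original samples followed by synthetic minority samples that are
    generated only around "danger" (borderline) minority points.
    """
    rng = random.Random(seed)
    minority = [s for s in samples if s[1] == minority_label]
    synthetic = []
    for p in minority:
        neighbours = nearest(p, samples, m)
        majority_count = sum(1 for n in neighbours if n[1] != minority_label)
        # Danger point: at least half, but not all, of its neighbours are majority class.
        # All-majority neighbourhoods are treated as noise; mostly-minority ones as safe.
        if not (len(neighbours) / 2 <= majority_count < len(neighbours)):
            continue
        mates = nearest(p, minority, k)
        if not mates:
            continue
        for q in rng.choices(mates, k=per_sample):
            gap = rng.random()
            # Interpolate between the danger point and a minority neighbour.
            vector = [a + gap * (b - a) for a, b in zip(p[0], q[0])]
            synthetic.append((vector, minority_label))
    return samples + synthetic
```

Restricting synthesis to borderline points concentrates the new minority samples near the class boundary, which is where a classifier is most likely to misclassify them.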
5. A class speculation system for a blockchain digital currency virtual address, comprising:
A data processing module for:
Acquiring a known-class address and an unknown-class address of the blockchain digital currency;
performing transaction retrieval, feature extraction, data normalization, feature contribution calculation and screening, and sample class imbalance processing on the known-class address to obtain a sample data set;
Performing transaction retrieval, feature extraction, data normalization and feature contribution screening on the unknown-class address to obtain a data set to be classified; wherein the feature contribution calculation and screening comprises: for the known-class address, calculating the information gain of every feature attribute contained in the normalized data set using an information gain calculation method, ranking the features, removing those whose feature contribution value is lower than a threshold, recording the names of the removed feature attributes, and forming a new feature data set from the remaining feature attributes; for the unknown-class address, directly deleting from the normalized data set the feature attributes recorded as removed for the known-class address, and forming a new feature data set from the remaining feature attributes to serve as the data set to be classified;
A classifier generation module for:
Dividing the sample data set into a training data set for model training and a test data set for model evaluation, and, after multiple iterations, selecting an optimal model as the classifier for digital currency address class speculation; specifically comprising: model training: performing parameter-tuning training on a plurality of machine learning classification algorithms using the training data set; model evaluation: using the test data set to compare the classification algorithms involved in model training on multiple evaluation indexes such as accuracy, precision, recall and F1 score; when no model reaches the preset evaluation parameters, classifier generation is deemed to have failed and the module returns to the model training step to tune the model parameters again; if at least one model reaches every preset evaluation parameter, selecting the model with the best training effect to generate the classifier;
A classification module for:
Performing classification calculation on the input data set to be classified using the classifier to obtain the class to which the data set to be classified belongs.
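The train, evaluate and retry loop of the classifier generation module can be sketched as below. The two toy candidate models, the positive label `"exchange"`, the `rounds` parameter schedule and the use of F1 score as the measure of "best training effect" are assumptions for illustration; the claim names neither specific algorithms nor a tie-breaking metric.

```python
import math

def evaluate(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1 score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def threshold_trainer(X, y, params):
    """Toy model: predicts "exchange" when feature 0 reaches params["cut"]."""
    return lambda x: "exchange" if x[0] >= params["cut"] else "other"

def centroid_trainer(X, y, params):
    """Toy model: nearest class centroid."""
    centroids = {}
    for label in set(y):
        points = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(points) for col in zip(*points)]
    return lambda x: min(centroids, key=lambda c: math.dist(x, centroids[c]))

def select_classifier(candidates, rounds, train, test, positive, targets):
    """Return (name, model, scores) of the best qualifying model, or None on failure.

    rounds: one dict of per-model parameters for each tuning iteration.
    targets: preset minimum value for each evaluation index.
    """
    (X_train, y_train), (X_test, y_test) = train, test
    for params_by_name in rounds:  # one parameter-tuning iteration
        qualified = {}
        for name, trainer in candidates.items():
            model = trainer(X_train, y_train, params_by_name.get(name, {}))
            scores = evaluate(y_test, [model(x) for x in X_test], positive)
            if all(scores[metric] >= bound for metric, bound in targets.items()):
                qualified[name] = (model, scores)
        if qualified:  # at least one model meets every preset parameter
            best = max(qualified, key=lambda n: qualified[n][1]["f1"])
            return best, qualified[best][0], qualified[best][1]
    return None  # classifier generation failed in every iteration
```

Returning `None` when no model meets the targets mirrors the failure branch in the claim, leaving it to the caller to supply further tuning rounds.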
6. The class speculation system of claim 5, wherein, in the data processing module, the transaction retrieval and feature extraction comprise:
obtaining, from the ledger data of the blockchain digital currency, all transaction information in which the known-class address or the unknown-class address participates;
Extracting features from the transaction information to obtain a basic data set; the basic data set comprises the total number of transactions, the amount of each transaction, the number of times the address serves as an output address, the number of times the address participates in coin-minting (coinbase) transactions, the time at which bitcoin is first received, the time at which bitcoin is first spent, the output address count of each transaction and the input address count of each transaction;
Combining the feature data in the basic data set using feature engineering methods to derive new data features and generate a feature data set.
7. The class speculation system of claim 5 or 6, wherein, in the data processing module, the data normalization comprises:
performing a data normalization operation on the feature data set generated by the feature extraction using a maximum value normalization method, so that the processed data is limited to the range from 0 to 1.
8. The class speculation system of claim 5, wherein, in the data processing module, the sample class imbalance processing comprises:
processing the data set obtained after feature contribution calculation and screening using the borderline synthetic minority over-sampling technique (Borderline-SMOTE) to obtain the sample data set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110272026.1A CN113052577B (en) | 2021-03-12 | 2021-03-12 | Class speculation method and system for block chain digital currency virtual address |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113052577A (en) | 2021-06-29 |
CN113052577B (en) | 2024-08-09 |
Family
ID=76512365
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110272026.1A Active CN113052577B (en) | 2021-03-12 | 2021-03-12 | Class speculation method and system for block chain digital currency virtual address |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113052577B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114358101B (en) * | 2021-09-06 | 2024-12-31 | 成都链安科技有限公司 | Method and device for identifying virtual currency exchange names based on counterparty matching |
CN114615009A (en) * | 2022-01-18 | 2022-06-10 | 北京邮电大学 | Gateway flow-based digital currency detection method |
CN114520739A (en) * | 2022-02-14 | 2022-05-20 | 东南大学 | Phishing address identification method based on cryptocurrency transaction network node classification |
CN115967525A (en) * | 2022-10-25 | 2023-04-14 | 淮阴工学院 | A virtual currency abnormal address detection method and device based on capsule network |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111259924A (en) * | 2020-01-07 | 2020-06-09 | 吉林大学 | Boundary synthesis, mixed sampling, anomaly detection algorithm and data classification method |
CN111754345A (en) * | 2020-06-18 | 2020-10-09 | 天津理工大学 | A Bitcoin Address Classification Method Based on Improved Random Forest |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109918584A (en) * | 2019-03-25 | 2019-06-21 | 中国科学院自动化研究所 | Bitcoin exchange address identification method, system and device |
CN111444232A (en) * | 2020-01-03 | 2020-07-24 | 上海宓猿信息技术有限公司 | Method for mining digital currency exchange address and storage medium |
CN112435032A (en) * | 2020-10-22 | 2021-03-02 | 江苏大学 | Bit currency address incremental clustering method based on multi-input address clustering |
Also Published As
Publication number | Publication date |
---|---|
CN113052577A (en) | 2021-06-29 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN113052577B (en) | Class speculation method and system for block chain digital currency virtual address | |
CN106778241B (en) | Malicious file identification method and device | |
CN105590055B (en) | Method and device for identifying user credible behaviors in network interaction system | |
CN111027069A (en) | Malware family detection method, storage medium and computing device | |
CN108712453A (en) | Detection method for injection attack, device and the server of logic-based regression algorithm | |
CN113591924A (en) | Phishing number detection method, system, storage medium and terminal equipment | |
CN110135193A (en) | A data desensitization method, device, equipment and computer-readable storage medium | |
CN113657896A (en) | A method and device for analyzing topological graph of blockchain transactions based on graph neural network | |
CN110069545B (en) | Behavior data evaluation method and device | |
CN110084609B (en) | Transaction fraud behavior deep detection method based on characterization learning | |
CN107092827A (en) | A kind of Android malware detection method based on improvement forest algorithm | |
CN112801784A (en) | Bit currency address mining method and device for digital currency exchange | |
Shi et al. | An improved agglomerative hierarchical clustering anomaly detection method for scientific data | |
CN115510981A (en) | Decision tree model feature importance calculation method and device and storage medium | |
CN112437053A (en) | Intrusion detection method and device | |
CN118400152A (en) | Network intrusion detection method | |
CN112488140B (en) | Data association method and device | |
Assis et al. | A genetic programming approach for fraud detection in electronic transactions | |
CN113283901A (en) | Byte code-based fraud contract detection method for block chain platform | |
CN112632219A (en) | Method and device for intercepting junk short messages | |
CN111291370B (en) | Network data intrusion detection method, system, terminal and storage medium | |
CN114792007A (en) | Code detection method, device, equipment, storage medium and computer program product | |
CN114298169A (en) | A Graph Classification-Based Recognition Method for Bitcoin Mixing Service Types | |
Borkar et al. | Comparative study of supervised learning algorithms for fake news classification | |
CN119312147B (en) | Privacy-enhanced detection method based on generative adversarial network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |