Disclosure of Invention
The purpose of the invention is to provide a track-based man-machine identification method for sliding verification codes. The method constructs an effective multi-dimensional feature system to identify whether a sliding verification code was triggered by a human or a machine, thereby ensuring the security of network environments protected by verification behavior.
The technical scheme adopted by the invention is as follows:
A track-based sliding verification code man-machine identification method comprises the following steps:
S1: collecting user track data;
S2: constructing a multi-dimensional feature system from the track data;
S3: distinguishing tracks, as represented in the multi-dimensional feature system, with a designed human-machine recognition model.
Further, the multi-dimensional feature system comprises an X feature, a Y feature and a T feature.
Further, the X feature extraction comprises the following steps:
S201: extracting the X feature class by normalizing the transverse coordinate x of the track;
S202: dividing the transverse coordinate sequence of the track into a front half and a rear half;
S203: extracting the X feature groups of the track, namely the front half x_front, the rear half x_rear, the adjacent differences of the front half x_front_diff, the adjacent differences of the rear half x_rear_diff, and the stop segment final_stop;
S204: extracting features from each X feature group, including the maximum value, peak value, median, variance, minimum value and range.
Further, the Y feature extraction comprises the following steps:
S211: extracting the Y feature class by normalizing the longitudinal coordinate y of the track;
S212: extracting the Y feature groups of the whole segment y, the half-offset sequence y_half, the adjacent differences of the whole segment y_diff, and the differences of adjacent differences of the whole segment y_diffdiff;
S213: extracting features from each Y feature group, including the variance, mean, range and sum.
Further, the T feature extraction comprises the following steps:
S221: extracting the T feature class by normalizing the time t;
S222: extracting the T-X feature group by subtracting the normalized time t from the normalized transverse coordinate x;
S223: extracting features from the T-X feature group, including the maximum value, peak value, median, variance, minimum value and range.
Further, in step S3 the human-machine recognition model is designed as follows:
S301: inputting the features of the multi-dimensional feature system into several models for training;
S302: linearly weighting the outputs of the trained models.
Further, the training models comprise a CatBoost model, an XGBoost model, a random forest (RandomForest) model and a logistic regression model.
In summary, by adopting the above technical scheme, the invention has the following beneficial effects:
1. The invention designs a multi-dimensional feature system based on two phenomena observed in human tracks and uses the features to describe users' sliding verification habits, thereby distinguishing user operations from machine simulations. This gives the system an advantage in the confrontation with attackers' black-market (fraud) tools and provides good protection.
2. The invention mainly uses the transverse feature x to describe human behavioral habits when sliding the verification code, uses the longitudinal feature y to describe the characteristics of the machine, and uses the time feature t as a supplement to describe the differences between human and machine. User operations and machine simulations can therefore be distinguished more accurately, improving the accuracy of track discrimination.
3. In actual verification on a test set of 2 million track records, the F value (the harmonic mean of precision and recall) of the invention reaches 88.56, higher than the 87.89 achieved by a scheme that mainly describes the "machine".
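The F value in item 3 is the harmonic mean of precision and recall (commonly written F1). The sketch below only illustrates that formula; the precision and recall behind the reported 88.56 and 87.89 are not given in the text, so the inputs used here are purely hypothetical, and the 0-100 scale of the reported figures is presumably a percentage.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1), both given as fractions in [0, 1]."""
    if precision + recall == 0:
        # Convention: F1 is 0 when there is neither precision nor recall.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs only; these are not measured values from the invention.
example = f1_score(0.90, 0.80)
print(f"F1 = {100 * example:.2f}")  # shown on a 0-100 scale like item 3
```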
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
A track-based sliding verification code man-machine identification method comprises the following steps:
S1: collecting user track data;
Specifically, user track data (x, y, t) are collected, comprising the transverse coordinate x and the longitudinal coordinate y at different time points t while the track is being generated. The track record of the user during the triggering of the sliding verification code is obtained, providing data support for constructing the multi-dimensional feature system of the sliding verification code.
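As an illustration of the (x, y, t) track data, the sketch below stores each sampled point and splits a recorded track into its x, y and t sequences. The TrackPoint type and the split_track helper are hypothetical names, and sorting by t assumes that points may arrive out of order; the text does not specify the capture format.

```python
from typing import List, NamedTuple, Tuple


class TrackPoint(NamedTuple):
    """One sampled point of a slider track (hypothetical representation)."""
    x: float  # transverse (horizontal) coordinate
    y: float  # longitudinal (vertical) coordinate
    t: float  # sampling time


def split_track(points: List[TrackPoint]) -> Tuple[List[float], List[float], List[float]]:
    """Order the points by time and return the x, y and t sequences."""
    ordered = sorted(points, key=lambda p: p.t)
    xs = [p.x for p in ordered]
    ys = [p.y for p in ordered]
    ts = [p.t for p in ordered]
    return xs, ys, ts


# Hypothetical raw track whose points were received out of order.
raw = [TrackPoint(0, 0, 0), TrackPoint(40, 1, 120), TrackPoint(25, 1, 60)]
xs, ys, ts = split_track(raw)
```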
S2: constructing a multi-dimensional feature system from the track data;
Multi-dimensional features are extracted based on two patterns observed in human tracks: first, the end retrace phenomenon, in which a human track turns back near its end; second, the "fast when far, slow when near" phenomenon. These features form the multi-dimensional feature system, which facilitates the subsequent training of the human-machine recognition model. The features are designed to describe the human, rather than only the machine, in order to find the differences between human and machine.
S3: distinguishing tracks, as represented in the multi-dimensional feature system, with the designed human-machine recognition model;
Specifically, the features of the multi-dimensional feature system are input into the designed human-machine recognition model, which learns from them and outputs probability values; these outputs are linearly weighted so that the final probability is closer to the true type of the track.
In the invention, human tracks are analyzed and two patterns are found: pattern one, the end retrace phenomenon of human tracks; pattern two, the "fast when far, slow when near" phenomenon of human tracks. Based on these patterns, the multi-dimensional feature system is constructed and the human-machine recognition model is designed. As shown in Fig. 1, the multi-dimensional feature system is constructed for the training set of user tracks and used to train the human-machine recognition model. The same multi-dimensional feature system is constructed for the prediction set of tracks to be predicted and input into the trained model, whose outputs are linearly weighted to obtain a probability value for the track type.
Example 2
On the basis of Example 1, the multi-dimensional feature system comprises X features, Y features and T features.
The invention mainly uses the transverse feature x to describe human behavioral habits during sliding verification, uses the longitudinal feature y to describe the characteristics of the machine, and uses the time feature t as a supplement to describe the differences between human and machine, as shown in Fig. 2.
Further, the X feature extraction comprises the following steps:
S201: extracting the X feature class by normalizing the transverse coordinate x of the track;
S202: dividing the track into a front half and a rear half, based on the "fast when far, slow when near" phenomenon of human track pattern two;
Specifically, "fast when far, slow when near" means that the slider moves faster while it is far from the target point and slower as it approaches the target point. The transverse X feature groups therefore divide the track into a front half and a rear half and extract features from each separately.
S203: extracting the X feature groups of the track, namely the front half x_front, the rear half x_rear, the adjacent differences of the front half x_front_diff, the adjacent differences of the rear half x_rear_diff, and the stop segment final_stop;
The final_stop feature group is constructed based on the "end retrace" phenomenon of human track pattern one.
Specifically, the transverse coordinates of the whole track, extracted from the track data (x, y, t), form the transverse sequence $\{x_1, x_2, \dots, x_t, \dots, x_n\}$. The first half of the sequence, $\{x_1, x_2, \dots, x_{n/2}\}$, forms x_front; the second half, $\{x_{n/2}, x_{n/2+1}, \dots, x_n\}$, forms x_rear; the adjacent differences of the first half, $\{x_2 - x_1, x_3 - x_2, \dots\}$, form x_front_diff; and the adjacent differences of the second half, $\{\dots, x_{n-1} - x_{n-2}, x_n - x_{n-1}\}$, form x_rear_diff. Based on human track pattern one, the last fifth of the sequence is taken as the stop segment final_stop.
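A minimal Python sketch of the X feature groups described above, assuming the input sequence is already normalized. It follows the index ranges in the text literally, so x_rear starts at $x_{n/2}$ and shares that point with x_front. Rounding n/2 down when n is odd, taking at least one point for final_stop, and the function names themselves are assumptions.

```python
from typing import Dict, List, Sequence


def adjacent_diff(seq: Sequence[float]) -> List[float]:
    """Adjacent differences {s2 - s1, s3 - s2, ...}."""
    return [b - a for a, b in zip(seq[:-1], seq[1:])]


def x_feature_groups(x_norm: Sequence[float]) -> Dict[str, List[float]]:
    """Build the X feature groups from a normalized transverse sequence."""
    n = len(x_norm)
    if n < 2:
        raise ValueError("need at least two points")
    half = n // 2                        # n/2, rounded down when n is odd (assumption)
    x_front = list(x_norm[:half])        # x_1 .. x_{n/2}
    x_rear = list(x_norm[half - 1:])     # x_{n/2} .. x_n
    stop_len = max(1, n // 5)            # last fifth, at least one point (assumption)
    final_stop = list(x_norm[n - stop_len:])
    return {
        "x_front": x_front,
        "x_rear": x_rear,
        "x_front_diff": adjacent_diff(x_front),
        "x_rear_diff": adjacent_diff(x_rear),
        "final_stop": final_stop,
    }


groups = x_feature_groups(list(range(10)))
```

Under these definitions, an end retrace shows up as negative values in x_rear_diff and as a non-monotonic final_stop.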
S204: extracting features from each X feature group, including the maximum value, peak value, median, variance, minimum value and range.
By designing the transverse features of the feature system along multiple dimensions, the invention provides better input for the model.
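The six statistics of step S204 could be computed per feature group as below. Two choices are assumptions rather than statements from the text: variance is taken as the population variance, and "peak value" is interpreted as kurtosis, because the translated term is ambiguous and the maximum is already listed separately. The excess_kurtosis helper is a plain textbook formula written here for illustration.

```python
import statistics
from typing import Dict, Sequence


def excess_kurtosis(values: Sequence[float]) -> float:
    """Fourth central moment over squared variance, minus 3 (0 for constant input)."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    if var == 0:
        return 0.0
    m4 = statistics.fmean([(v - mean) ** 4 for v in values])
    return m4 / var ** 2 - 3.0


def x_group_stats(values: Sequence[float]) -> Dict[str, float]:
    """Statistics of one X feature group (step S204)."""
    if not values:
        raise ValueError("empty feature group")
    return {
        "max": max(values),
        "peak": excess_kurtosis(values),      # ASSUMPTION: 'peak value' read as kurtosis
        "median": statistics.median(values),
        "var": statistics.pvariance(values),  # population variance (assumption)
        "min": min(values),
        "range": max(values) - min(values),
    }


stats = x_group_stats([1, 2, 3, 4])
```

If the intended "peak value" turns out to be a different statistic, only the "peak" entry needs to change.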
Further, the Y feature extraction comprises the following steps:
S211: extracting the Y feature class by normalizing the longitudinal coordinate y of the track;
Specifically, the longitudinal coordinates of the whole track, extracted from the track data (x, y, t), form the longitudinal sequence $\{y_1, y_2, \dots, y_t, \dots, y_n\}$, which is then normalized.
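The text does not say which normalization is applied to each coordinate sequence. A common choice, assumed here, is min-max scaling to [0, 1]; the helper name min_max_normalize is hypothetical. A constant sequence, such as a perfectly flat y track, is mapped to all zeros to avoid division by zero. Under this assumption, subtracting 0.5 later in y_half centers the sequence around zero.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale a coordinate or time sequence to [0, 1] (min-max, an assumed choice)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant sequence (e.g. a perfectly flat y track) carries no spread.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical raw y coordinates in pixels.
y_norm = min_max_normalize([310, 312, 311, 315])
```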
S212: extracting the Y feature groups of the whole segment y, the half-offset sequence y_half, the adjacent differences of the whole segment y_diff, and the differences of adjacent differences of the whole segment y_diffdiff;
Specifically, for the normalized longitudinal sequence $\{y_1, y_2, \dots, y_t, \dots, y_n\}$, the whole sequence forms y; the whole sequence minus 0.5 forms y_half; the adjacent differences of the whole sequence, $\{y_2 - y_1, y_3 - y_2, \dots\}$, form y_diff; and the differences of adjacent differences, $\{(y_3 - y_2) - (y_2 - y_1), \dots\}$, form y_diffdiff.
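The Y feature groups of step S212 can be sketched in the same way, assuming y_norm is already normalized; the function names are illustrative. A machine track that moves along a perfectly straight horizontal line would give y_diff and y_diffdiff consisting only of zeros, which suggests why these groups help describe the "machine".

```python
from typing import Dict, List, Sequence


def adjacent_diff(seq: Sequence[float]) -> List[float]:
    """Adjacent differences {s2 - s1, s3 - s2, ...}."""
    return [b - a for a, b in zip(seq[:-1], seq[1:])]


def y_feature_groups(y_norm: Sequence[float]) -> Dict[str, List[float]]:
    """Build the Y feature groups from a normalized longitudinal sequence."""
    y = list(y_norm)
    y_diff = adjacent_diff(y)
    return {
        "y": y,
        "y_half": [v - 0.5 for v in y],       # whole sequence minus 0.5
        "y_diff": y_diff,                     # {y2 - y1, y3 - y2, ...}
        "y_diffdiff": adjacent_diff(y_diff),  # {(y3 - y2) - (y2 - y1), ...}
    }


y_groups = y_feature_groups([0.0, 0.5, 1.0, 1.0])
```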
S213: extracting features from each Y feature group, including the variance, mean, range and sum.
When the transverse feature x is not very descriptive for a given track, the longitudinal feature y, which better describes the "machine", can assist the model in discrimination.
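The four statistics of step S213 could be computed per Y feature group as below; as in the X sketch, the population variance is an assumed choice and the function name is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def y_group_stats(values: Sequence[float]) -> Dict[str, float]:
    """Statistics of one Y feature group (step S213)."""
    if not values:
        raise ValueError("empty feature group")
    return {
        "var": statistics.pvariance(values),  # population variance (assumption)
        "mean": statistics.fmean(values),
        "range": max(values) - min(values),
        "sum": sum(values),
    }


y_stats = y_group_stats([1, 2, 3, 4])
```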
Further, the T feature extraction comprises the following steps:
S221: extracting the T feature class by normalizing the time t;
Specifically, the time series on its own is only a sampling marker and has little interpretability, but it becomes more meaningful when combined with the transverse x series.
Subtracting the normalized t from the normalized x characterizes, from another angle, the speed during track generation and provides better feature input for the model.
S222: extracting the T-X feature group by subtracting the normalized time t from the normalized transverse coordinate x;
Specifically, the transverse coordinates of the whole track, extracted from the track data (x, y, t), form the transverse sequence $\{x_1, x_2, \dots, x_t, \dots, x_n\}$, and the sampling times form the time series $\{t_1, t_2, \dots, t_n\}$. The two sequences are normalized separately, and the normalized time series is then subtracted element-wise from the normalized transverse sequence.
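A sketch of the T-X group, assuming min-max normalization for both sequences (the text does not name the normalization) and using hypothetical helper names. If the slider moved at constant speed, normalized x and normalized t would rise together and every value would be zero; positive values mean the slider is ahead of a constant-speed schedule, which is what the "fast when far, slow when near" pattern would produce early in a human track.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale to [0, 1]; min-max is an assumed choice of normalization."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def t_x_group(xs: Sequence[float], ts: Sequence[float]) -> List[float]:
    """Normalized x minus normalized t, element by element (step S222)."""
    if len(xs) != len(ts):
        raise ValueError("x and t sequences must have the same length")
    x_norm = min_max_normalize(xs)
    t_norm = min_max_normalize(ts)
    return [x - t for x, t in zip(x_norm, t_norm)]


# Hypothetical track covering 80% of the distance in the first half of the time.
tx = t_x_group([0, 80, 100], [0, 500, 1000])
```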
S223: extracting features from the T-X feature group, including the maximum value, peak value, median, variance, minimum value and range.
In summary, the feature groups and feature lists provided in this embodiment are shown in table 1:
Example 3
On the basis of Example 1, in step S3 the human-machine recognition model is designed as follows:
S301: inputting the features of the multi-dimensional feature system into several models respectively for training;
S302: linearly weighting the outputs of the trained models;
Further, the training models comprise a CatBoost model, an XGBoost model, a random forest (RandomForest) model and a logistic regression model.
As shown in Fig. 3, the probability values output by the trained CatBoost, XGBoost, RandomForest and logistic regression models are linearly weighted, yielding a human-machine recognition model formed by the linear weighting of the four base models.
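The text does not state how the weights are chosen. The sketch below assumes the four trained models have already produced probabilities for the same tracks and simply combines them with non-negative weights normalized to sum to 1. The model keys, weight values and probabilities are placeholders, not figures from the invention; in practice the weights might, for example, be tuned on a validation set.

```python
from typing import Dict, List, Sequence


def weighted_ensemble(
    probas: Dict[str, Sequence[float]], weights: Dict[str, float]
) -> List[float]:
    """Linearly combine per-model probabilities for the same list of tracks."""
    lengths = {len(p) for p in probas.values()}
    if len(lengths) != 1:
        raise ValueError("all models must score the same tracks")
    total = sum(weights[name] for name in probas)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = [0.0] * lengths.pop()
    for name, p in probas.items():
        w = weights[name] / total  # normalize so the result stays a probability
        for i, value in enumerate(p):
            combined[i] += w * value
    return combined


# Hypothetical outputs of the four base models for two tracks.
probas = {
    "catboost": [0.92, 0.10],
    "xgboost": [0.88, 0.20],
    "random_forest": [0.80, 0.30],
    "logistic_regression": [0.70, 0.40],
}
weights = {"catboost": 0.4, "xgboost": 0.3, "random_forest": 0.2, "logistic_regression": 0.1}
final = weighted_ensemble(probas, weights)
```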
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.