CN108287989B

CN108287989B - A Trajectory-based Sliding Verification Code Human-Machine Recognition Method

Info

Publication number: CN108287989B
Application number: CN201810050045.8A
Authority: CN
Inventors: 张敏; 陈媛; 阳小龙; 朱翔宇; 孙奇福
Original assignee: University of Science and Technology Beijing USTB
Current assignee: Zhongzi Highway Maintenance And Inspection Technology Co Ltd; CHECC Data Co Ltd
Priority date: 2018-01-18
Filing date: 2018-01-18
Publication date: 2021-05-28
Anticipated expiration: 2038-01-18
Also published as: CN108287989A

Abstract

The invention discloses a trajectory-based sliding verification code human-machine recognition method, comprising the following steps: collecting user trajectory data; constructing a multi-dimensional feature system according to the trajectory data; and distinguishing trajectories by applying a designed human-machine recognition model to the multi-dimensional feature system. The multi-dimensional feature system is designed around two phenomena observed in human trajectories, so that the features describe the user's habits when sliding the verification code and user operation can be distinguished from machine imitation. This gives the defender an advantage in the ongoing contest with attackers and provides better protection.

Description

Sliding verification code man-machine identification method based on track

Technical Field

The invention relates to the technical field of biometric authentication, in particular to a track-based sliding verification code man-machine identification method.

Background

As a biometric authentication technology, the sliding verification code meets the identity authentication security requirements of the current network environment and is widely used in human-machine verification products. It has also attracted the attention of attackers, who develop black-market tools that simulate human behavior in order to forge the mouse trajectories produced during sliding verification.

Attackers use these black-market tools to generate human-like trajectories in bulk to bypass detection, and they keep upgrading their forged data so that it continues to bypass detection techniques that are upgraded in turn. Existing detection technology mainly aims to identify the machine. Countering continuously updated machine behavior in this way always lags behind, because detection is updated only after a black-market tool has already caused some loss. In a contest where both sides keep upgrading, it is therefore important to gain the advantage over the attackers' black-market tools.

Disclosure of Invention

The object of the invention is to provide a trajectory-based sliding verification code human-machine recognition method that constructs an effective multi-dimensional feature system to identify who or what triggered a sliding verification code, thereby ensuring the security of the network environment protected by the verification.

The technical scheme adopted by the invention is as follows:

A trajectory-based sliding verification code human-machine recognition method comprises the following steps:

S1: collecting user trajectory data;

S2: constructing a multi-dimensional feature system according to the trajectory data;

S3: distinguishing trajectories from the multi-dimensional feature system according to the designed human-machine recognition model.

Further, the multi-dimensional feature system comprises an X feature, a Y feature and a T feature.

Further, the specific steps of the X feature extraction are as follows:

S201: extracting the X feature class and normalizing the trajectory transverse coordinate x;

S202: dividing the trajectory transverse coordinate into a first half and a second half;

S203: extracting a plurality of X feature groups respectively, namely the first half x_front, the second half x_rear, the first-half adjacent difference x_front_diff, the second-half adjacent difference x_rear_diff, and the stop segment final_stop of the trajectory;

S204: extracting features in each X feature group, including maximum value, peak value, median value, variance, minimum value and range.

Further, the Y feature extraction specifically comprises the following steps:

S211: extracting the Y feature class and normalizing the trajectory longitudinal coordinate y;

S212: extracting a plurality of Y feature groups respectively, namely the whole segment y, the half-folded y_half, the whole-segment adjacent difference y_diff, and the adjacent difference of the whole-segment adjacent differences y_diff_diff;

S213: extracting features in each Y feature group, including variance, mean, range and sum.

Further, the specific steps of the T feature extraction are as follows:

S221: extracting the T feature class and normalizing the time feature t;

S222: extracting the T-X feature group by subtracting the normalized time feature t from the normalized transverse coordinate x;

S223: extracting features in the T-X feature group, including maximum value, peak value, median value, variance, minimum value and range.

Further, the specific steps of designing the human-machine recognition model in step S3 are as follows:

s301: inputting the characteristics in the multi-dimensional characteristic system into a plurality of training models for algorithm training;

s302: and carrying out linear weighting on the training output of the feature algorithm.

Further, the training models comprise: a CatBoost model, an XGBoost model, a RandomForest model and a LogisticRegression model.

In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:

1. In the invention, a multi-dimensional feature system is designed around two phenomena observed in human trajectories. The features describe the user's sliding verification habits, so user operation can be distinguished from machine simulation. This gives the method an advantage in the contest with attackers' black-market tools and provides good protection.

2. In the invention, the behavior habit of the human during the sliding verification code is described by mainly adopting the transverse characteristic x, the characteristic of the machine is described by using the longitudinal characteristic y, and the difference between the human and the machine is described by using the time characteristic t as a supplement, so that the user operation and the machine simulation can be more accurately distinguished, and the accuracy of track distinguishing is improved.

3. In the invention, in actual verification on a test set of 2 million trajectory records, the F value (the harmonic mean of precision and recall) reaches 88.56, far higher than the 87.89 achieved by the scheme that mainly describes the "machine".
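For reference, the harmonic mean of precision $P$ and recall $R$ is conventionally written as follows. This assumes the equally weighted $F_1$ form; the patent does not state the weighting, and the reported figures (88.56 and 87.89) are presumably percentages.

```latex
F_1 = \frac{2PR}{P + R}, \qquad
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}
```

Here $TP$, $FP$ and $FN$ denote true positives, false positives and false negatives for whichever class (human or machine) is treated as positive; the patent does not say which.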

Drawings

FIG. 1 is a graph of a multi-dimensional feature system feature relationship of the present invention;

FIG. 2 is a conceptual diagram of a multi-dimensional feature hierarchy of the present invention;

FIG. 3 is a diagram of a human-machine recognition model relationship according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Example 1

S1: collecting user trajectory data;

The user trajectory data (x, y, t) are collected, comprising the transverse coordinate x and the longitudinal coordinate y at each time point t while the trajectory is being produced. Specifically, the trajectory records generated while the user triggers the sliding verification code are obtained, providing the data needed to construct the multi-dimensional feature system for the sliding verification code.
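A minimal sketch of how such (x, y, t) records might be held and normalized before feature extraction. The patent does not specify the normalization method; min-max scaling to [0, 1] is an assumption here, and the names `Point`, `split_trajectory` and `min_max_normalize` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # one sample: (x, y, t)


def split_trajectory(points: Sequence[Point]) -> Tuple[List[float], List[float], List[float]]:
    """Split (x, y, t) samples into separate x, y and t sequences."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    ts = [p[2] for p in points]
    return xs, ys, ts


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1] (assumed method); a constant sequence maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical trajectory: pixel coordinates and timestamps in milliseconds.
track = [(0, 10, 0), (20, 11, 50), (60, 10, 100), (80, 12, 200)]
xs, ys, ts = split_trajectory(track)
x_norm = min_max_normalize(xs)
t_norm = min_max_normalize(ts)
```

Scaling each sequence to a common range makes trajectories of different slider widths and durations comparable, which is presumably why every feature class in the method starts with a normalization step.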

S2: constructing a multi-dimensional feature system according to the trajectory data;

Multi-dimensional feature extraction is based on two observed patterns: first, the end retrace phenomenon of human trajectories; second, the "far fast, near slow" phenomenon of human trajectories. The extracted features form the multi-dimensional feature system, which supports the later algorithm training of the human-machine recognition model. The difference between human and machine is found with features that describe the human rather than with features that describe the machine.

S3: carrying out track distinguishing on a multi-dimensional characteristic system according to a designed man-machine recognition model;

The features of the multi-dimensional feature system are input into the designed human-machine recognition model for model optimization and feature learning, yielding different probability values. Linear weighting then brings the probability output by the human-machine recognition model closer to the true type of the trajectory.

The invention analyzes human trajectories and builds on two observed patterns. Pattern one is the end retrace phenomenon of human trajectories; pattern two is the "far fast, near slow" phenomenon of human trajectories. A multi-dimensional feature system is constructed from these patterns, and a human-machine recognition model is then designed. As shown in FIG. 1, the multi-dimensional feature system is constructed for the training set of user trajectories and used to train the human-machine recognition model. The same multi-dimensional feature system is constructed for the prediction set of trajectories to be predicted and input into the trained human-machine recognition model, and the model outputs are linearly weighted to obtain a probability value for distinguishing the trajectory type.

Example 2

On the basis of the embodiment 1, the multi-dimensional feature system comprises X features, Y features and T features.

The invention mainly adopts the transverse characteristic x to describe the behavior habit of the human when performing the sliding verification, uses the longitudinal characteristic y to describe the characteristic of the machine, and uses the time characteristic T as a supplement to describe the difference between the human and the machine, as shown in figure 2.

Further, the specific steps of the X feature extraction are as follows:

S202: dividing the trajectory into a first half and a second half, in line with the "far fast, near slow" phenomenon of human trajectory pattern two;

Specifically, "far fast, near slow" means that the slider moves faster while it is far from the target point and slower as it approaches the target point. The transverse x feature groups are therefore constructed by dividing the trajectory into a first half and a second half, which are extracted separately.
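The "far fast, near slow" pattern can be illustrated by comparing the mean horizontal speed of the two halves. This is only an illustrative check under assumed sample data, not one of the patent's listed features; `mean_speed` and `half_speeds` are hypothetical names.

```python
from typing import Sequence, Tuple


def mean_speed(xs: Sequence[float], ts: Sequence[float]) -> float:
    """Total absolute horizontal distance divided by elapsed time."""
    dist = sum(abs(b - a) for a, b in zip(xs, xs[1:]))
    elapsed = ts[-1] - ts[0]
    return dist / elapsed if elapsed > 0 else 0.0


def half_speeds(xs: Sequence[float], ts: Sequence[float]) -> Tuple[float, float]:
    """Mean speed of the first and second half of a trajectory."""
    mid = len(xs) // 2
    # The midpoint sample is shared so both halves cover contiguous time spans.
    front = mean_speed(xs[: mid + 1], ts[: mid + 1])
    rear = mean_speed(xs[mid:], ts[mid:])
    return front, rear


# Hypothetical human-like drag: large early steps, small late corrections.
xs = [0, 30, 60, 80, 90, 95, 98]
ts = [0, 50, 100, 150, 250, 350, 500]
front_speed, rear_speed = half_speeds(xs, ts)
```

For this sample the first half is markedly faster than the second, which is the asymmetry the split into two halves is meant to expose to the model.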

In line with the "end retrace" phenomenon of human trajectory pattern one, a stop-segment feature group, final_stop, is constructed and extracted.

Specifically, the transverse coordinate data of the whole trajectory extracted from the trajectory data (x, y, t) form the transverse sequence $\{x_1, x_2, \ldots, x_t, \ldots, x_n\}$. The first half of the sequence, $\{x_1, x_2, \ldots, x_{n/2}\}$, forms x_front; the second half, $\{x_{n/2}, x_{n/2+1}, \ldots, x_n\}$, forms x_rear; the adjacent differences of the first half, $\{x_2 - x_1, x_3 - x_2, \ldots\}$, form x_front_diff; and the adjacent differences of the second half, $\{\ldots, x_{n-1} - x_{n-2}, x_n - x_{n-1}\}$, form x_rear_diff. In line with human trajectory pattern one, the last fifth of the trajectory sequence forms the stop segment final_stop.

According to the invention, the transverse features of the feature system are designed from multiple dimensions, so that the input can be better provided for the model.
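The X feature groups described above can be sketched as follows. Several details are assumptions rather than the patent's specification: min-max normalization, a non-overlapping split at index n // 2, population variance, and the omission of the "peak value" statistic, whose exact definition the text does not give. All function names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1] (assumed normalization method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def adjacent_diff(seq: Sequence[float]) -> List[float]:
    """Differences between neighbouring elements: seq[i+1] - seq[i]."""
    return [b - a for a, b in zip(seq, seq[1:])]


def x_feature_groups(xs: Sequence[float]) -> Dict[str, List[float]]:
    """Build the five X feature groups from raw transverse coordinates."""
    x = min_max_normalize(xs)
    mid = len(x) // 2
    front, rear = x[:mid], x[mid:]
    tail = max(1, len(x) // 5)  # last fifth of the trajectory
    return {
        "x_front": front,
        "x_rear": rear,
        "x_front_diff": adjacent_diff(front),
        "x_rear_diff": adjacent_diff(rear),
        "final_stop": x[-tail:],
    }


def summarize_x(group: Sequence[float]) -> Dict[str, float]:
    """Per-group statistics; 'peak value' is left out as it is not defined."""
    return {
        "max": max(group),
        "min": min(group),
        "median": statistics.median(group),
        "variance": statistics.pvariance(group),
        "range": max(group) - min(group),
    }


# Hypothetical trajectory that overshoots the target and retraces at the end.
xs = [0, 20, 50, 80, 100, 98, 96, 96, 96, 96]
groups = x_feature_groups(xs)
features = {name: summarize_x(g) for name, g in groups.items() if g}
```

In this sample the negative entry in x_rear_diff is the retrace, and final_stop captures the flat stretch where the slider comes to rest.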

Further, the Y feature acquisition specifically includes the following steps:

Specifically, the longitudinal coordinate data of the whole trajectory extracted from the trajectory data (x, y, t) form the longitudinal sequence $\{y_1, y_2, \ldots, y_t, \ldots, y_n\}$, and the sequence is normalized.

Specifically, from the normalized longitudinal sequence $\{y_1, y_2, \ldots, y_t, \ldots, y_n\}$, the whole sequence forms y; the whole sequence with 0.5 subtracted from each element forms y_half; the adjacent differences of the whole sequence, $\{y_2 - y_1, y_3 - y_2, \ldots\}$, form y_diff; and the adjacent differences of those adjacent differences, $\{(y_3 - y_2) - (y_2 - y_1), \ldots\}$, form y_diff_diff.

When the transverse feature x describes a given trajectory poorly, the longitudinal feature y, which describes the "machine" better, can assist the model's discrimination.
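A comparable sketch for the Y feature groups, again assuming min-max normalization (which is consistent with subtracting 0.5 to centre the sequence) and population variance. The function names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1] (assumed normalization method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def adjacent_diff(seq: Sequence[float]) -> List[float]:
    """Differences between neighbouring elements: seq[i+1] - seq[i]."""
    return [b - a for a, b in zip(seq, seq[1:])]


def y_feature_groups(ys: Sequence[float]) -> Dict[str, List[float]]:
    """Build the four Y feature groups from raw longitudinal coordinates."""
    y = min_max_normalize(ys)
    y_diff = adjacent_diff(y)
    return {
        "y": y,
        "y_half": [v - 0.5 for v in y],
        "y_diff": y_diff,
        "y_diff_diff": adjacent_diff(y_diff),
    }


def summarize_y(group: Sequence[float]) -> Dict[str, float]:
    """Variance, mean, range and sum of one Y feature group."""
    return {
        "variance": statistics.pvariance(group),
        "mean": sum(group) / len(group),
        "range": max(group) - min(group),
        "sum": sum(group),
    }


# Hypothetical vertical coordinates of a drag.
groups = y_feature_groups([0, 1, 2, 4])
features = {name: summarize_y(g) for name, g in groups.items() if g}

# A scripted drag with no vertical movement yields zero variance everywhere.
flat = y_feature_groups([10, 10, 10, 10])
```

A perfectly flat y sequence, as in the second call, is the kind of machine signature these groups are meant to surface.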

Further, the specific steps of the T-feature acquisition are as follows:

Specifically, the time sequence on its own is only a sampling mark and is not readily interpretable, but it becomes more expressive when combined with the transverse x sequence.

The normalized x minus the normalized t represents, from another angle, the speed during trajectory generation and provides a better feature input for the model.

Specifically, the transverse sequence $\{x_1, x_2, \ldots, x_t, \ldots, x_n\}$, formed from the transverse coordinate data of the whole trajectory extracted from the trajectory data (x, y, t), and the time sequence $\{t_1, t_2, \ldots, t_n\}$ are normalized separately, and the normalized time sequence is then subtracted from the normalized transverse sequence, giving $\{x_1 - t_1, x_2 - t_2, \ldots, x_n - t_n\}$.
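The T-X group can be sketched as an element-wise difference of the two normalized sequences. Min-max normalization is assumed, and `tx_feature_group` is a hypothetical name.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1] (assumed normalization method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def tx_feature_group(xs: Sequence[float], ts: Sequence[float]) -> List[float]:
    """Normalized x minus normalized t, sample by sample."""
    x = min_max_normalize(xs)
    t = min_max_normalize(ts)
    return [xi - ti for xi, ti in zip(x, t)]


ts = [0, 100, 200, 300, 400]

# Hypothetical constant-speed (scripted) drag: x advances in step with t.
machine_tx = tx_feature_group([0, 25, 50, 75, 100], ts)

# Hypothetical human drag: fast at first, slow near the target.
human_tx = tx_feature_group([0, 50, 80, 95, 100], ts)
```

With these assumed inputs a constant-speed drag gives a difference of zero at every sample, while a drag that starts fast and ends slow gives positive values in the middle, so statistics over this group carry speed-profile information.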

In summary, the feature groups and feature lists provided in this embodiment are shown in table 1:

Example 3

On the basis of Example 1, the specific steps of designing the human-machine recognition model in step S3 are as follows:

S301: inputting the features in the multi-dimensional feature system into a plurality of training models respectively for algorithm training;

S302: linearly weighting the training outputs of the algorithms;

As shown in FIG. 3, the probability values output by the trained CatBoost, XGBoost, RandomForest and LogisticRegression models are linearly weighted, yielding a human-machine recognition model that is a linear weighting of the four base models.
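A minimal sketch of the linear weighting step, assuming each base model has already produced a probability for the same trajectory. The weights and probabilities below are hypothetical, the patent does not disclose how the weights are chosen, and dividing by the weight sum is an added assumption that keeps the result in [0, 1].

```python
from typing import Dict


def weighted_probability(probs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linearly combine per-model probabilities with per-model weights."""
    if set(probs) != set(weights):
        raise ValueError("probs and weights must cover the same models")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * probs[name] for name in weights) / total


# Hypothetical outputs of the four base models for one trajectory.
probs = {
    "catboost": 0.9,
    "xgboost": 0.8,
    "random_forest": 0.7,
    "logistic_regression": 0.6,
}
# Hypothetical weights; in practice they might be tuned on a validation set.
weights = {
    "catboost": 0.4,
    "xgboost": 0.3,
    "random_forest": 0.2,
    "logistic_regression": 0.1,
}
combined = weighted_probability(probs, weights)
```

The combined value can then be compared against a chosen threshold to label the trajectory as human or machine; the threshold is likewise not specified in the source.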

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A trajectory-based sliding verification code human-machine recognition method, characterized by comprising the following steps:

S1: Collect user trajectory data;

S2: Build a multi-dimensional feature system based on trajectory data;

S3: Distinguish trajectories from the multi-dimensional feature system according to the designed human-machine recognition model;

The multi-dimensional feature system includes X features, Y features, and T features;

The specific steps of the X feature extraction are as follows:

S201: Extract the X feature class, and normalize the lateral coordinate x of the trajectory;

S202: Divide the lateral coordinates of the trajectory into the first half and the second half;

S203: Extract a plurality of X feature groups respectively: the first half x_front, the second half x_rear, the first-half adjacent difference x_front_diff, the second-half adjacent difference x_rear_diff, and the stop segment final_stop;

S204: Extract the features in each X feature group, including the maximum value, the peak value, the median value, the variance, the minimum value, and the range;

The specific steps of the Y feature extraction are as follows:

S211: Extract the Y feature class, and normalize the longitudinal coordinate y of the trajectory;

S212: Extract a plurality of Y feature groups respectively: the whole segment y, the half-folded y_half, the adjacent difference y_diff of the whole segment, and the adjacent difference of the adjacent differences y_diff_diff of the whole segment;

S213: Extract features in each Y feature group, including variance, mean, range, and sum;

The specific steps of the T feature extraction are as follows:

S221: Extract the T feature class, and normalize the time feature t;

S222: Extract the T-X feature group, and subtract the normalized time feature t from the normalized horizontal coordinate x;

S223: Extract the features in the T-X feature group, including the maximum value, the peak value, the median value, the variance, the minimum value, and the range.

2. The trajectory-based sliding verification code human-machine recognition method according to claim 1, characterized in that the specific steps of designing the human-machine recognition model in step S3 are as follows:

S301: Input the features in the multi-dimensional feature system into multiple training models for algorithm training;

S302: Perform linear weighting on the training output of the feature algorithm.

3. A trajectory-based human-machine identification method for sliding verification codes according to claim 2, wherein the training model comprises: CatBoost model, XGBoost model, RandomForest model, and LogisticRegression model.