Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As shown in FIG. 1, an embodiment of the present invention proposes a method for automated registration and login on a service carrier and for acquiring its funds account, the method comprising the following steps:
Step S1, accessing a target website domain name through a headless browser and, after loading is completed, capturing the current visible area to generate a page image;
Step S2, inputting the page image into a pre-trained target detection model, identifying the category, position coordinates and confidence of each UI element in the page, and sorting the identification results based on a preset priority rule to generate an interactive task queue arranged in descending order of priority;
Step S3, for each element in the interactive task queue that is located inside a frame, converting its coordinates into coordinates relative to that frame, and controlling the browser to switch to the corresponding frame;
Step S4, traversing the interactive task queue and sequentially executing the following operations: performing a simulated click on button elements, injecting virtual identity information into input box elements, and calling a corresponding recognition model for verification code elements to complete verification;
Step S5, page scrolling and submission detection: after the operations in the interactive task queue are finished, detecting whether a visible submit button exists on the current page; if not, scrolling the page and re-executing steps S1 to S2 until a visible submit button is identified; if so, triggering the submit operation;
Step S6, if the browser jumps to a recharge page after submission, extracting payment account information through a regular expression, and storing the payment account information in a database in association with the virtual identity information.
In the embodiment of the invention, the beneficial effects are as follows. Full-flow automated execution: manual operation is simulated through a headless browser and page elements are identified with the target detection model, so that the whole flow, from accessing the website and filling in information to submitting the form, runs automatically; this reduces the cost of manual intervention and is particularly suitable for scenarios such as batch registration and multi-account management. Intelligent task priority sequencing: UI elements are prioritized based on preset rules (for example, required fields first), so that interactive tasks are executed in a logical order, misoperation caused by a disordered element loading sequence is avoided, and flow stability is improved. In-frame element coordinate conversion: for nested frame (such as iframe) scenarios, element coordinates are converted into coordinates relative to the frame and the context is switched, which solves the technical difficulty of cross-frame interaction and ensures that operations are accurately positioned on the target element. Verification code handling: a dedicated recognition model (such as an OCR or machine learning model) is called for verification codes, which overcomes the limitation of traditional automation tools in verification code scenarios and improves the ability to handle complex pages. Dynamic page scrolling detection: the visibility of the submit button is detected automatically, and the visible area is refreshed by scrolling the page and re-identifying elements, which adapts to long pages or dynamically loaded content and avoids flow interruption caused by hidden elements.
Multi-page jump and data association: if the browser jumps to a recharge page after submission, payment account information is extracted through a regular expression and stored bound to the virtual identity, realizing automatic collection and structured storage of cross-page data. Model-driven extensibility: the target detection model and the verification code recognition model can be continuously optimized through training to adapt to UI changes of different websites, which reduces the risk of tool failure caused by website redesign and improves cross-platform adaptability. Data security and compliance: virtual identity information is injected in place of real data, which reduces the risk of real data leakage while supporting service requirements such as automated testing and data collection in compliant scenarios.
In a preferred embodiment of the present invention, step S1, accessing a target website domain name through a headless browser and, after loading is completed, capturing the current visible area to generate a page image, includes:
Step S11, after page loading is completed, detecting whether a popup element blocking a key area exists in the current visible area; if so, automatically calculating the coordinate position of the popup's close button and triggering a simulated click to expose the underlying page elements;
Step S12, based on the page state after the popup has been cleared, identifying the position of the registration button in the page, calculating the coordinates of its center point, executing a coordinate-positioned click through the browser driver, and triggering a jump to the registration page;
Step S13, after the registration page is loaded, controlling the browser to scroll vertically, wherein the initial scroll position is the top of the page, the scroll step is set by dividing the page into equal parts of one screen height, and the browser scrolls down step by step to the bottom of the page;
Step S14, after the registration form has been submitted, monitoring the URL of the new page to which the browser jumps, judging the page to be a recharge page when its URL contains a characteristic keyword, and capturing the current visible area image directly after the page has loaded.
In the embodiment of the invention, the above steps may be implemented in a specific application through the following schemes, for example:
In the above step S11, the usual position of core interaction elements such as the registration button (for example, the lower-middle part of the page) is preset and defined as the coordinate range of the key area (for example, 30%-70% of the page height and 20%-80% of the page width). The coordinate range of the popup element in the screenshot is analyzed to judge whether it overlaps the key area; if so, the popup is judged to be blocking it. The upper-right corner area of the popup (usually where the close button is located) is then calculated from the popup element's coordinates (left, top, width, height): the popup's upper-left corner plus (width - 20 pixels, height - 20 pixels) is taken as the center point of the close button, and a click event is sent to that coordinate through the browser driver to close the popup.
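The overlap test and close-button estimate of step S11 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function names are hypothetical, the key-area percentages are the example values from the text, and the close-button point is assumed to sit 20 pixels in from the popup's top and right edges so that it falls in the upper-right corner the text describes.

```python
def overlaps_key_area(popup, page_w, page_h):
    """Return True if the popup box (left, top, width, height) overlaps the key area."""
    left, top, width, height = popup
    # Key area: 20%-80% of the page width and 30%-70% of the page height (example values).
    key_left, key_right = 0.2 * page_w, 0.8 * page_w
    key_top, key_bottom = 0.3 * page_h, 0.7 * page_h
    # Two axis-aligned rectangles overlap when they overlap on both axes.
    return (left < key_right and left + width > key_left
            and top < key_bottom and top + height > key_top)


def close_button_point(popup, inset=20):
    """Estimate the close-button center near the popup's upper-right corner.

    Assumption: the button sits `inset` pixels in from the right and top edges.
    """
    left, top, width, height = popup
    return left + width - inset, top + inset
```

For a 600×400 popup at (400, 300) on a 1280×800 page, the popup overlaps the key area and the estimated click point is (980, 320).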
In the above step S12, the target detection model outputs the bounding box coordinates (left, top, width, height) of the registration button; the center point of the button is calculated from the bounding box as (left + width/2, top + height/2); the center point is mapped to the browser viewport coordinate system; and the browser driver moves the mouse to that coordinate and triggers a click event, so that the registration entry is accurately located and the page jump is triggered.
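The center-point calculation of step S12 can be illustrated with a short sketch. The function names are hypothetical, and the viewport mapping assumes that the only difference between screenshot pixels and viewport pixels is the device pixel ratio; the text does not specify the mapping, so other factors (such as scroll offset) would need to be added in practice.

```python
def box_center(box):
    """Center of a bounding box given as (left, top, width, height)."""
    left, top, width, height = box
    return left + width / 2, top + height / 2


def to_viewport(point, device_pixel_ratio=1.0):
    """Map a screenshot pixel to viewport (CSS) pixels.

    Assumption: the screenshot is scaled by the device pixel ratio only.
    """
    x, y = point
    return x / device_pixel_ratio, y / device_pixel_ratio
```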
In the above step S13, the vertical height H (in pixels) of the browser viewport is obtained, and the total page height (available through document.body.scrollHeight) is divided into equal parts of one viewport height, so the step is set to H, that is, one screen height is scrolled each time. Starting from the top of the page (scroll position 0), the page is scrolled down by H pixels at a time until the scroll position reaches or exceeds the total page height, which ensures that all content is loaded. After scrolling to the bottom of the page, the screenshot interface is called to capture the complete page image, including all form input boxes and the submit button.
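The scroll schedule of step S13 can be sketched as a list of scroll offsets. The helper below is hypothetical and interprets the stop condition as "the bottom edge of the viewport has reached or passed the bottom of the page"; in a real run each offset would be applied through the browser driver and the page height re-read, since content may load while scrolling.

```python
def scroll_positions(total_height, viewport_height):
    """Scroll offsets from the top of the page in steps of one viewport height."""
    if viewport_height <= 0:
        raise ValueError("viewport_height must be positive")
    positions = [0]
    # Keep scrolling while the viewport bottom has not yet reached the page bottom.
    while positions[-1] + viewport_height < total_height:
        positions.append(positions[-1] + viewport_height)
    return positions
```

For a 2500-pixel page and a 1000-pixel viewport this yields the offsets 0, 1000 and 2000.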
In the above step S14, characteristic keywords of the recharge page (such as "recharge", "pay", "fund") are preset, and the path or parameter part of the new page's URL is parsed to judge whether it contains any of the keywords. If it does, the page is judged to be a recharge page and the screenshot operation is triggered; otherwise monitoring continues. The image in the current browser viewport is captured directly for subsequent account information extraction. In this way the jump stage of the registration flow is identified automatically, the recharge page is accurately located and the key information page is captured in time, which lays the foundation for the subsequent regular-expression extraction of payment account information and improves the efficiency of cross-page data collection.
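The URL check of step S14 can be sketched with the Python standard library. The function name is hypothetical and the keyword list is the example from the text; only the path and query string are inspected, so a keyword that appears in the host name alone does not count.

```python
from urllib.parse import urlsplit

# Example keywords from the text; a real deployment would tune this list.
RECHARGE_KEYWORDS = ("recharge", "pay", "fund")


def is_recharge_url(url, keywords=RECHARGE_KEYWORDS):
    """Return True if the URL path or query string contains any keyword."""
    parts = urlsplit(url)
    haystack = (parts.path + "?" + parts.query).lower()
    return any(keyword in haystack for keyword in keywords)
```

Plain substring matching is deliberately simple; it would also match unrelated words that contain a keyword (for example "payload"), so stricter token matching may be preferable in practice.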
In a preferred embodiment of the present invention, step S2, inputting the page image into a pre-trained target detection model, identifying the category, position coordinates and confidence of each UI element in the page, and sorting the identification results based on a preset priority rule to generate an interactive task queue arranged in descending order of priority, includes:
Step S21, inputting the page images generated in step S13 and step S14 into the pre-trained target detection model and outputting a set of recognition results for the UI elements, wherein each recognition result comprises a category label, position coordinates and a confidence;
Step S22, assigning a priority to each recognition result output in step S21 according to the Euclidean distance between the element center point and the page center, together with weights corresponding respectively to the category label, the position coordinates and the confidence;
Step S23, sorting the element set with priority labels output in step S22 in descending order of priority and, among elements of the same priority, in descending order of confidence, to generate a structured task queue;
Step S24, traversing the structured task queue of step S23, calculating the absolute pixel coordinates of each element in the browser window from its normalized coordinates, and converting the absolute pixel coordinates into an XPath path through a DOM position back-tracing algorithm.
In the embodiment of the invention, the above steps may be implemented in a specific application through the following schemes, for example:
In the above step S21, the full page image or visible area image generated in steps S13 and S14 is scaled to the model input size (for example, 640×480 pixels) while preserving the aspect ratio to avoid distortion. The pre-trained target detection model (for example, YOLO or Faster R-CNN) performs convolution on the image, identifies the bounding box coordinates (left, top, width, height) of all visible UI elements (such as input boxes, buttons and labels), and matches them to predefined category labels (such as "input", "button", "captcha"). For each recognition result the model generates a confidence score between 0 and 1 that represents the probability that the element is correctly identified; for example, a confidence of 0.8 or higher is regarded as a reliable identification.
In the above step S22, the page center is preset as (page width/2, page height/2), and the Euclidean distance between the center point (left + width/2, top + height/2) of each element's bounding box and the page center is calculated; the closer the element, the higher its base priority value (for example, elements whose distance is no more than 1/4 of the page height are regarded as lying in a "high priority area"). Category priority weights are preset (for example, input box +30, button +20, prompt text +10), with key operation elements (such as the registration button and the submit button) weighted higher. Elements with a confidence of 0.9 or higher receive an additional priority bonus (for example, +10), while elements with a confidence of 0.5 or lower are marked as "to be confirmed" and have points deducted (for example, -20). The distance, category and confidence scores are summed to obtain a total priority score, which is divided into five levels (level 1 being the highest and level 5 the lowest), for example:
Level 1 (score of 80 or higher): required input boxes and the submit button;
Level 3 (score 40-60): options and common prompt labels;
Level 5 (score below 20): advertising elements and non-critical decorative patterns.
This scheme combines element position, functional importance and recognition reliability, which ensures that core interaction elements (such as registration form input boxes) are handled first and prevents secondary elements from interfering with the flow. It also adapts to different page layouts (such as the differences between mobile and PC pages) by adjusting element priorities automatically, which improves the flexibility of the flow.
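The scoring of step S22 can be sketched as below. Several values are assumptions made only for illustration, because the text does not fix them: the base value of 40 points for elements in the high priority area, the mapping of "prompt text" to a "label" category, and the score bands for levels 2 and 4 (taken here as 60-80 and 20-40). The category weights and confidence adjustments are the example values from the text.

```python
import math

# Example category weights from the text; "label" stands for prompt text.
CATEGORY_WEIGHT = {"input": 30, "button": 20, "label": 10}
NEAR_CENTER_BASE = 40  # hypothetical base value for the high priority area


def priority_score(box, category, confidence, page_w, page_h):
    """Sum of distance, category and confidence scores for one element."""
    left, top, width, height = box
    cx, cy = left + width / 2, top + height / 2
    distance = math.hypot(cx - page_w / 2, cy - page_h / 2)
    score = NEAR_CENTER_BASE if distance <= page_h / 4 else 0
    score += CATEGORY_WEIGHT.get(category, 0)
    if confidence >= 0.9:
        score += 10   # reliable detection bonus
    elif confidence <= 0.5:
        score -= 20   # "to be confirmed" deduction
    return score


def priority_level(score):
    """Map a total score to levels 1 (highest) to 5 (lowest)."""
    if score >= 80:
        return 1
    if score >= 60:   # assumed band for level 2
        return 2
    if score >= 40:
        return 3
    if score >= 20:   # assumed band for level 4
        return 4
    return 5
```

Under these assumptions, a centered input box with confidence 0.95 scores 40 + 30 + 10 = 80 and lands in level 1.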
In the above step S23, primary sorting (descending priority) is performed first: elements are ordered by the priority level of step S22 from level 1 to level 5, so that higher-level elements (such as required fields) come first. Secondary sorting (descending confidence) is then performed: among elements of the same level, elements are ordered from high to low by confidence score (for example, of two level-2 input boxes, the one with confidence 0.9 is placed before the one with confidence 0.8). The sorted elements are converted into task items in JSON format containing the operation type (click/input/verification), element coordinates, priority label and confidence, generating an executable queue structure (for example, task 1, task 2, ..., task N).
Through the double sorting by priority and confidence, the automated flow is executed according to the principles of importance first and reliability first, which reduces logic errors caused by a disordered element sequence. The structured queue is also convenient for debugging and logging, and supports advanced functions such as pausing midway and resuming from a breakpoint.
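The two-level sort of step S23 can be sketched with a single composite sort key. The field names and the category-to-operation mapping below are hypothetical; the mapping follows the operations named for step S4 (click for buttons, input for input boxes, verification for captchas), and elements of other categories are simply left out of the queue.

```python
import json

# Hypothetical mapping from detected category to operation type.
ACTION = {"button": "click", "input": "input", "captcha": "verification"}


def build_task_queue(elements):
    """Sort by level ascending (1 is highest), then confidence descending."""
    actionable = [e for e in elements if e["category"] in ACTION]
    ordered = sorted(actionable, key=lambda e: (e["level"], -e["confidence"]))
    return [
        {
            "task": index,
            "action": ACTION[e["category"]],
            "box": e["box"],
            "level": e["level"],
            "confidence": e["confidence"],
        }
        for index, e in enumerate(ordered, start=1)
    ]


elements = [
    {"category": "input", "box": [10, 10, 200, 30], "level": 2, "confidence": 0.8},
    {"category": "button", "box": [10, 90, 80, 30], "level": 1, "confidence": 0.85},
    {"category": "input", "box": [10, 50, 200, 30], "level": 2, "confidence": 0.9},
]
queue = build_task_queue(elements)
queue_json = json.dumps(queue)  # serialized form for logging or resuming
```

In this example the level-1 button comes first, followed by the two level-2 input boxes with the 0.9-confidence box ahead of the 0.8 one.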
In the above step S24, normalized coordinates are first converted to absolute pixels:
If the element coordinates are normalized values (in the interval 0-1; for example, a center point coordinate x = 0.5 represents the horizontal midpoint of the page), they are multiplied by the actual size of the browser window (for example, with a window width of 1920 pixels, x = 0.5 × 1920 = 960 pixels) to obtain the absolute pixel coordinates (X, Y). The DOM tree of the page is then obtained through the browser driver, and the element is searched for, starting from the root node (html), according to the following logic:
The topmost visible element at the absolute pixel coordinates (X, Y) is located by the elementFromPoint(X, Y) method; its parent nodes are traced upwards recursively, and tag names and attributes are concatenated (such as //input[@id='username']) to generate a unique XPath path; redundant levels are removed (such as skipping div container layers), and the shortest valid path is retained (such as //form[@class='register-form']/input[1]).
Through bidirectional mapping between pixel coordinates and the DOM structure, this scheme solves the problem of matching visually visible elements with the underlying code elements and ensures that simulated operations act precisely on the target control (for example, it avoids clicking a hidden button that is covered by another element). The absolute pixel coordinate conversion mechanism adapts to different screen sizes (such as mobile phone, tablet and desktop), and XPath paths support dynamic changes in page structure, which reduces the risk of positioning failure caused by page redesign.
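The two parts of step S24 can be sketched offline as follows. The ancestor chain is a hypothetical stand-in for what a browser would return when walking up from the element found by elementFromPoint; the sketch stops climbing at the first node that carries an id or class attribute, on the assumption that such an attribute is unique enough to anchor the path, and it does not implement the removal of intermediate container layers.

```python
def to_absolute(nx, ny, window_w, window_h):
    """Convert normalized (0-1) coordinates to absolute window pixels."""
    return round(nx * window_w), round(ny * window_h)


def build_xpath(chain):
    """Build an XPath from an ancestor chain ordered from target to root.

    Each node is a dict with 'tag' and optional 'id', 'class' and 'index'.
    """
    steps = []
    for node in chain:
        tag = node["tag"]
        if node.get("id"):
            steps.append(f"{tag}[@id='{node['id']}']")
            break  # an id is assumed unique, so no further ancestors are needed
        if node.get("class"):
            steps.append(f"{tag}[@class='{node['class']}']")
            break  # assumption: the class value identifies this ancestor
        steps.append(f"{tag}[{node.get('index', 1)}]")
    return "//" + "/".join(reversed(steps))
```

For an input that is the first child of a form with class register-form, the sketch produces the path given as an example in the text.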
In a preferred embodiment of the present invention, the pre-trained target detection model is obtained through the following steps:
Extracting target website domain names from an abused domain name library, accessing each domain name through an automated script and performing page loading state verification;
Based on the complete page screenshots, grouping images with similar visual characteristics using a hierarchical clustering algorithm to obtain de-duplicated images;
Labeling the category labels and position information of 17 types of UI elements on the de-duplicated images to generate annotation files conforming to a target detection format;
Integrating the annotation files and dividing them into a training set, a verification set and a test set according to a preset ratio;
Based on the training set, the verification set and the test set, performing the following automatic optimization procedure:
Initializing a parameter population, wherein each group of parameters comprises a learning rate, anchor box sizes and a network structure configuration; performing model training for each parameter combination and calculating fitness from the verification set precision and loss value; iteratively generating a new population through selection, crossover and mutation operations until the fitness converges, to obtain optimized parameters; and performing whole-network fine-tuning with the optimized parameters, combined with incremental sample training, to obtain the pre-trained target detection model.
In the embodiment of the invention, the above steps may be implemented in a specific application through the following schemes, for example:
Active domain names are screened from the abused domain name library (for example, those with DNS resolution records in the last 30 days), excluding domain names already marked as inaccessible. An HTTP request is sent by an automated script (such as one using the Python requests library), the response status code is checked (for example, 200 indicates success), and the response content is parsed to verify that it contains a complete HTML structure (for example, a <!DOCTYPE html> declaration and an <html> root tag). This ensures that the page data fed into model training are valid samples that can be accessed normally, which avoids wasting computing resources; abnormal pages are filtered out automatically by the structural verification rules, which improves the reliability of data collection.
Common popup close button features are preset (for example, an X icon in the upper-right corner, located within the top 10%-20% and the right-hand 10%-20% of the page), and the button is located and clicked through an image matching technique (such as template matching). The registration button is located based on page text features (such as "registration" or "SignUp") or visual features (such as an orange button or a larger font), its center point is calculated, and a click is simulated to trigger the registration flow. The page is then scrolled step by step (half of the window height each time) until the content at the bottom of the page no longer updates (for example, scrollHeight remains unchanged after consecutive scrolls), and a complete image containing the submit button in the page footer is captured. By simulating manual operation, this scheme removes interfering elements (popups) and triggers the key interaction (registration), which ensures that the screenshot contains the complete registration form structure; the scroll-and-load mechanism captures form elements on long pages or loaded asynchronously, which avoids data loss.
Visual features (such as color histograms, histograms of oriented gradients (HOG) and CSS style features) are extracted from each screenshot to generate a feature vector of fixed dimension (for example, 512 dimensions). The cosine similarity between images is calculated (a similarity above 0.8 is regarded as similar) to construct a tree-shaped cluster structure. For each cluster, the center image is selected as the representative sample and the other similar images in the cluster are eliminated; about 30%-50% of the original images are retained, and each group of similar pages keeps only 1-2 representative samples. Clustering thus eliminates repeated or highly similar page images (such as registration pages with different domain names but identical templates), reducing the amount of training data while preserving feature diversity. Training efficiency is also improved: the reduced data volume after de-duplication shortens model training time and avoids repeated learning of redundant information.
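The de-duplication idea can be sketched in pure Python under simplifying assumptions. Instead of building the full cluster tree, the sketch merges any two images whose cosine similarity exceeds the threshold, which is equivalent to cutting a single-linkage hierarchical clustering at that threshold; the choice of single linkage, the function names and the selection of one representative per cluster (the member most similar to the rest of its cluster) are all illustrative choices, not details given in the text.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(vectors, threshold=0.8):
    """Return the indices of one representative image per similarity cluster."""
    parent = list(range(len(vectors)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of images that counts as "similar".
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) > threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(vectors)):
        clusters.setdefault(find(i), []).append(i)

    # The representative is the member with the highest total similarity
    # to the members of its own cluster (a medoid-style "center image").
    representatives = [
        max(members, key=lambda i: sum(cosine(vectors[i], vectors[j]) for j in members))
        for members in clusters.values()
    ]
    return sorted(representatives)
```

The pairwise comparison is quadratic in the number of screenshots, which is acceptable for a sketch but would likely need an approximate nearest-neighbor index for large collections.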
The de-duplicated images are manually annotated with 17 classes of UI elements (such as input boxes, radio buttons, drop-down menus and captcha boxes): bounding boxes are drawn and associated with class labels using an annotation tool (such as LabelMe or RectLabel), and the annotation data are converted into the format required by the target detection model (such as the .txt format of YOLO, in which each line contains a class index and normalized coordinates).
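The conversion to the YOLO text format can be sketched as below. A YOLO label line holds the class index followed by the box center and box size, each normalized by the image dimensions; the function name and the six-decimal formatting are illustrative choices.

```python
def to_yolo_line(class_index, box, image_w, image_h):
    """Convert a pixel box (left, top, width, height) to one YOLO label line."""
    left, top, width, height = box
    x_center = (left + width / 2) / image_w
    y_center = (top + height / 2) / image_h
    return (f"{class_index} {x_center:.6f} {y_center:.6f} "
            f"{width / image_w:.6f} {height / image_h:.6f}")
```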
Data set partitioning:
The data are randomly divided into a training set, a verification set and a test set in a ratio of 7:2:1, ensuring that the distribution of the various element classes is balanced in each set (for example, the proportion of input boxes in the training set is consistent with that in the whole data set). The annotation ensures element positioning accuracy (pixel-level bounding boxes) and class accuracy, providing reliable supervision signals for the model, and the staged data sets support performance verification during training (parameter tuning on the verification set) and the final test of generalization ability (on the test set).
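A seeded random 7:2:1 split can be sketched as follows. The function name is hypothetical, and the sketch performs a plain shuffle only; the class balancing mentioned in the text would additionally require a stratified split, which is not shown here.

```python
import random


def split_dataset(items, ratios=(7, 2, 1), seed=0):
    """Shuffle the items and split them into train, verification and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # the remainder goes to the test set
    return train, val, test
```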
Model parameter optimization flow based on a genetic algorithm:
Parameter population initialization:
100 groups of initial parameter combinations are generated; the learning rate ranges from 0.001 to 0.1, the preset anchor box sizes refer to the COCO data set (such as (10, 13), (16, 30) and so on), and the network structure is selected from YOLOv-spp or Faster R-CNN variants. The model is trained for 50 rounds with each group of parameters, and the mAP (mean average precision) and loss value on the verification set are recorded. The fitness formula is fitness = mAP - 0.5 × loss value; the higher the value, the better the performance.
Genetic operations:
Selection: the parameter combinations in the top 20% by fitness are retained as parents;
Crossover: parent parameters randomly exchange some dimensions (such as crossing the learning rate with the anchor box sizes);
Mutation: a random perturbation is added to the parameters after crossover (such as a ±10% fluctuation of the learning rate).
Termination: iteration stops when the fitness improvement stays below 1% for 5 consecutive generations, and the optimal parameter combination is selected.
Model fine-tuning: the model is trained with the optimized parameters, newly collected incremental samples are added gradually (for example, 1000 new screenshots every week), and online learning is performed.
This scheme replaces manual trial-and-error parameter tuning: the bionic algorithm quickly searches for the optimal parameter combination and improves model training efficiency (parameter tuning time is shortened by more than 70%), while the incremental sample training mechanism lets the model keep learning new page design patterns (such as new types of verification code) and delays the model failure caused by website redesign.
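The selection, crossover, mutation and termination rules above can be sketched as a small loop. Everything model-specific is replaced by a stand-in: toy_fitness is a hypothetical function used in place of "train the model, then compute mAP - 0.5 × loss", the parameter set is reduced to a learning rate and one anchor size, and improvement is measured against the best fitness seen so far.

```python
import random

LR_MIN, LR_MAX = 0.001, 0.1
ANCHORS = [(10, 13), (16, 30)]  # example anchor sizes from the text


def toy_fitness(params):
    """Hypothetical stand-in for training a model and scoring it."""
    bonus = 0.05 if params["anchor"] == (16, 30) else 0.0
    return 1.0 - 5.0 * abs(params["lr"] - 0.01) + bonus


def evolve(fitness_fn, pop_size=100, max_generations=50, seed=0):
    rng = random.Random(seed)
    population = [{"lr": rng.uniform(LR_MIN, LR_MAX), "anchor": rng.choice(ANCHORS)}
                  for _ in range(pop_size)]
    best_params, best_fit, stale = None, None, 0
    for _ in range(max_generations):
        # Evaluate each candidate once and rank by fitness, best first.
        scored = sorted(((fitness_fn(p), p) for p in population),
                        key=lambda pair: pair[0], reverse=True)
        top_fit, top_params = scored[0]
        # Termination rule: count generations with less than 1% improvement.
        if best_fit is None or top_fit - best_fit >= 0.01 * abs(best_fit):
            stale = 0
        else:
            stale += 1
        if best_fit is None or top_fit > best_fit:
            best_params, best_fit = top_params, top_fit
        if stale >= 5:
            break
        # Selection: keep the top 20% as parents.
        parents = [p for _, p in scored[:max(2, pop_size // 5)]]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: learning rate from one parent, anchor from the other.
            lr = a["lr"] * (1 + rng.uniform(-0.1, 0.1))  # mutation: +/-10%
            children.append({"lr": min(max(lr, LR_MIN), LR_MAX), "anchor": b["anchor"]})
        population = parents + children
    return best_params, best_fit
```

Because the parents are carried over unchanged, the best fitness never decreases from one generation to the next. In a real run the fitness function would be by far the most expensive step, so caching the scores of surviving parents would be worthwhile.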
In a preferred embodiment of the present invention, calculating fitness from the verification set precision and loss value comprises:
Loading the prediction results on the verification set that are output by the model trained with the current parameter combination;
Counting the matches between prediction boxes and ground-truth boxes, that is, calculating, for each image of the verification set, the degree of overlap between the prediction boxes of the target detection model and the annotation boxes;
For each of the 17 classes of UI elements, calculating the proportion of correctly detected elements to the total number of annotations of that class, and taking the arithmetic mean of the precision values of all classes to obtain the average precision index of the verification set;
Extracting the CIoU loss value of each sample from the verification log of the target detection model, and taking the arithmetic mean of the CIoU loss values of all samples to obtain the CIoU loss mean;
Defining a precision weight coefficient and a loss weight coefficient, weighting the average precision index with positive correlation and the CIoU loss mean with negative correlation, to generate a fitness score.
In the embodiment of the invention, the above steps may be implemented in a specific application through the following schemes, for example:
Forward propagation is performed on each image in the verification set using the target detection model trained with the current parameter combination, and the coordinates (x1, y1, x2, y2), category label and confidence score of each prediction box are output. The prediction results are stored by image ID in a structured format (such as JSON), and each prediction box contains [category, confidence, coordinates] information. The ground-truth annotation box information of the corresponding image is read from the verification set annotation file, and image IDs are used to ensure that prediction results correspond one to one with the ground-truth labels. Storing the predictions and the ground truth in a unified format provides the basis for the subsequent matching calculation and avoids evaluation errors caused by inconsistent data structures; saving the intermediate results also supports later detailed analysis (such as tracing back false detection and missed detection cases).
All prediction boxes and ground-truth annotation boxes of each image in the verification set are traversed.
IoU calculation: for each prediction box, its intersection over union (IoU) with every ground-truth box is calculated:
Intersection area: the number of pixels in the region where the prediction box and the ground-truth box overlap;
Union area: prediction box area + ground-truth box area - intersection area;
IoU = intersection area / union area.
Match judgment: if the IoU of a prediction box and a ground-truth box is greater than or equal to 0.5 (the threshold is adjustable) and their categories are the same, the pair is judged to be one valid match.
Using IoU as the standard metric directly reflects the degree of spatial overlap between the prediction box and the real target and avoids the limitation of evaluating coordinate errors in isolation. Traversing all possible box pairs ensures that every real target has the chance to be matched correctly, which improves evaluation accuracy in multi-target scenes.
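The IoU calculation and the match rule can be sketched directly from the definitions above, for boxes given in the (x1, y1, x2, y2) form used for the prediction output. The dictionary keys "label" and "box" are hypothetical field names.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def is_match(prediction, truth, threshold=0.5):
    """A valid match needs the same category and IoU at or above the threshold."""
    return (prediction["label"] == truth["label"]
            and iou(prediction["box"], truth["box"]) >= threshold)
```

For example, two 10×10 boxes offset by 5 pixels in each direction share a 5×5 intersection and a union of 175, giving an IoU of 25/175, which is below the 0.5 threshold.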
The per-class precision is calculated as follows:
Category-level statistics: the following are counted separately for each of the 17 classes of UI elements (such as input box, button and verification code):
Correct detection count: the number of prediction boxes whose IoU with a ground-truth box of the same category is greater than or equal to 0.5;
Annotation total: the total number of ground-truth annotation boxes of that category in the verification set.
Class precision calculation: precision of each class = correct detection count / annotation total × 100%.
Average precision calculation: the arithmetic mean of the 17 class precision values is taken to obtain the average precision of the verification set (mAP@0.5). This scheme distinguishes the detection performance on different UI elements and identifies the strengths and weaknesses of the model (such as persistent missed detection of a certain class of elements); it also avoids evaluation bias caused by uneven class distribution (such as a class with a large share dominating the result) and ensures that all classes contribute equally to the final index.
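The per-class ratio and its arithmetic mean can be sketched as below, following the definition given in the text (correct detection count divided by annotation total). The function name and the dictionary inputs are hypothetical, and values are returned as fractions rather than percentages.

```python
def mean_class_precision(correct, total):
    """Per-class correct/total ratios and their unweighted arithmetic mean."""
    per_class = {name: correct.get(name, 0) / count
                 for name, count in total.items() if count > 0}
    if not per_class:
        raise ValueError("no annotated classes to evaluate")
    return per_class, sum(per_class.values()) / len(per_class)
```

With 8 of 10 input boxes and 3 of 4 buttons detected, the per-class values are 0.8 and 0.75 and the mean is 0.775, regardless of how many annotations each class has.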
The CIoU (Complete IoU) loss value of each sample is extracted from the model verification log; the CIoU loss jointly considers the overlap rate, the center point distance and the aspect ratio of the prediction box and the ground-truth box. Extreme loss values (such as values exceeding 3 standard deviations) are removed to avoid the influence of outliers on the overall mean, and the CIoU loss values of all valid samples are summed and averaged to obtain the CIoU loss mean. Because the CIoU loss optimizes overlap rate, position accuracy and shape consistency together, it reflects the quality of the prediction box more comprehensively than the traditional IoU; outlier filtering reduces noise interference, so that the loss mean better represents the real performance of the model and misjudgment caused by fluctuation of individual samples is avoided.
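The outlier filtering and averaging can be sketched with the standard library. The sketch interprets "exceeding 3 standard deviations" as lying more than 3 population standard deviations away from the mean of all samples; the function name is hypothetical, and the loss values are assumed to have been parsed from the verification log already.

```python
import statistics


def filtered_mean_loss(losses, k=3.0):
    """Mean of the loss values after dropping samples beyond k standard deviations."""
    if not losses:
        raise ValueError("losses must not be empty")
    mean = statistics.fmean(losses)
    sigma = statistics.pstdev(losses)
    kept = [x for x in losses if abs(x - mean) <= k * sigma]
    # Fall back to all samples if rounding left nothing inside the band.
    return statistics.fmean(kept or losses)
```

For twenty samples with loss 0.2 and one sample with loss 10.0, the extreme sample is discarded and the filtered mean is 0.2.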
The weighted fitness score is generated as follows:
Coefficient setting:
The precision weight coefficient (such as 0.7) reflects the importance of the average precision;
The loss weight coefficient (such as 0.3) reflects the importance of the CIoU loss.
Normalization:
The average precision index is scaled to the [0, 1] interval (if the original precision is 85%, the normalized value is 0.85);
The CIoU loss mean is mapped to the (0, 1] interval by 1/(1 + loss mean) (the smaller the loss, the larger the mapped value).
Weighted summation: fitness score = precision weight coefficient × normalized precision + loss weight coefficient × mapped loss value.
This scheme considers both the accuracy (precision) and the positioning quality (loss) of the model, avoiding the one-sided optimization caused by a single index (such as sacrificing positioning precision in pursuit of high recall only), and it adapts flexibly to different service requirements by adjusting the weight coefficients (for example, raising the loss weight for scenes with very high positioning precision requirements).
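The weighted fitness score can be written as a one-line calculation. The function name is hypothetical; the default weights 0.7 and 0.3 are the example coefficients from the text, and the precision is passed as a percentage.

```python
def fitness_score(precision_pct, mean_loss, w_precision=0.7, w_loss=0.3):
    """Weighted sum of normalized precision and the mapped loss value 1/(1 + loss)."""
    normalized_precision = precision_pct / 100.0
    mapped_loss = 1.0 / (1.0 + mean_loss)  # smaller loss gives a larger value
    return w_precision * normalized_precision + w_loss * mapped_loss
```

As a worked example, a precision of 85% and a CIoU loss mean of 0.25 give 0.7 × 0.85 + 0.3 × 0.8 = 0.835.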
In a preferred embodiment of the present invention, step S3, for each element in the interactive task queue that is located inside a frame, converting its coordinates into coordinates relative to that frame and controlling the browser to switch to the corresponding frame, includes:
Step S31, analyzing the XPath path generated in step S24; if the path contains an iframe tag or a frame hierarchy, judging that the corresponding element is located inside a frame, and extracting the frame identifier (ID or Name attribute) of that frame in the parent page;
Step S32, relative coordinate conversion: reading, through the browser interface, the pixel coordinates (X_frame, Y_frame) of the upper-left corner of the frame in the parent page, and reading the width W_frame and the height H_frame of the frame;
Calculating the relative position of the element within the frame:
relative abscissa = (element absolute abscissa - X_frame) / W_frame;
relative ordinate = (element absolute ordinate - Y_frame) / H_frame;
Outputting the normalized relative coordinates;
Step S33, based on the frame identifier extracted in step S31, locating the DOM node of the corresponding frame in the browser, and switching the WebDriver operation context to the inside of the frame.
In an embodiment of the invention, the above steps may be implemented in practice as follows, for example:
In step S31, the XPath path of each element in the task queue (for example, //html/body/iframe[@id='register']/div/input) is traversed and parsed to check whether it contains an iframe or frame tag. If an iframe tag exists in the path, the id or name attribute value of the tag (for example, register) is extracted as the unique identifier of the frame; if the path contains nested frames (for example, parent frame -> child frame), the identifier of each level is extracted recursively to generate a frame hierarchy chain (for example, ["parent_frame", "child_frame"]).
With this method no manual annotation is needed: the frame hierarchy in the page is discovered automatically from XPath syntax features, which solves the cross-frame positioning problem that traditional automation tools handle poorly. Using the id/name as the positioning basis is more stable than using coordinates and reduces the risk of positioning failure caused by page layout changes.
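The extraction of a frame hierarchy chain from an XPath can be sketched as follows. This is a simplified illustration under stated assumptions: the function name and the regular expression are hypothetical, each frame step is assumed to carry an id or name predicate in the form shown in the example path, and attribute values are assumed not to contain a "/" character.

```python
import re

# Matches one XPath step such as iframe[@id='register'] or frame[@name="main"].
_FRAME_STEP = re.compile(r"""^(?:iframe|frame)\[@(?:id|name)=['"]([^'"]+)['"]\]$""")


def frame_chain(xpath):
    """Return the frame identifiers found along the path, outermost first."""
    chain = []
    for step in xpath.split("/"):
        match = _FRAME_STEP.match(step.strip())
        if match:
            chain.append(match.group(1))
    return chain
```

An empty result means that the element is treated as belonging to the main document.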
In step S32, the upper left corner coordinates (X_frame, Y_frame) of the target frame in the parent page and the width (W_frame) and height (H_frame) of the frame are obtained through the browser driver, the absolute pixel coordinates (x_abs, y_abs) of the element are read from the task queue, and the offset of the element relative to the upper left corner of the frame is calculated:
horizontal offset = x_abs - X_frame;
vertical offset = y_abs - Y_frame.
Normalization:
the offsets are divided by the frame size to obtain the normalized relative coordinates:
relative abscissa = horizontal offset / W_frame;
relative ordinate = vertical offset / H_frame.
The normalized coordinates are independent of the specific screen size, so the same set of coordinates can be reused at different resolutions, which improves the compatibility of the automation script.
The method also suits dynamic frames: even if the frame position changes because of responsive page design, elements can still be located accurately through the relative coordinates as long as the internal structure of the frame is unchanged, which reduces maintenance cost.
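The offset and normalization formulas of step S32 can be written out as a short sketch. The FrameRect type and the to_relative function are hypothetical names introduced only for illustration; the frame geometry is assumed to have been read from the browser already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameRect:
    x: float       # X_frame: left edge of the frame in the parent page
    y: float       # Y_frame: top edge of the frame in the parent page
    width: float   # W_frame
    height: float  # H_frame


def to_relative(x_abs, y_abs, frame):
    """Convert absolute pixel coordinates to frame-relative normalized ones."""
    if frame.width <= 0 or frame.height <= 0:
        raise ValueError("frame must have a positive size")
    return (x_abs - frame.x) / frame.width, (y_abs - frame.y) / frame.height
```

For a frame at (100, 200) with size 400 × 300, the element at (300, 350) maps to (0.5, 0.5), that is, the center of the frame.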
In step S33, the frame identifier extracted in S31 (for example, id="register") is used to locate the frame DOM node through the switch_to.frame() method of the browser driver (for example, Selenium), and a context switch is performed so that the scope of all subsequent operations (such as clicking and inputting) is limited to the target frame. If there are multiple layers of nested frames, the frames are switched in sequence according to the hierarchy chain (for example, first to the parent frame and then to the child frame). After switching, an element query through the driver (for example, find_element_by_xpath()) verifies whether the elements in the frame can be located correctly, which confirms that the switch succeeded.
Through context switching, the invention enables the automation tool to operate on elements in a frame as if operating on an ordinary page, overcomes the limitation that traditional tools can only operate on the top-level page, ensures that all operations on elements in the frame are executed in the correct context, and avoids operation failures or misoperations caused by context confusion.
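Sequential switching along a frame hierarchy chain can be sketched as below. The function name is hypothetical; the driver argument is assumed to be a Selenium WebDriver instance (or any object exposing the same switch_to.default_content() and switch_to.frame() calls), and resetting to the top-level document first is a design assumption of this sketch rather than something the text prescribes.

```python
def switch_to_frame_chain(driver, chain):
    """Enter nested frames one level at a time, outermost first.

    `chain` is a list of frame id/name strings,
    for example ["parent_frame", "child_frame"].
    """
    # Start from the top-level document so the chain is always resolved
    # from a known context (assumption of this sketch).
    driver.switch_to.default_content()
    for frame_id in chain:
        driver.switch_to.frame(frame_id)
```

Starting from the top-level document keeps the function idempotent: calling it twice with the same chain ends in the same context.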
In a preferred embodiment of the present invention, step S4, in which the interactive task queue is traversed and the following operations are executed in sequence: performing a simulated click on button elements, injecting virtual identity information into input box elements, and calling the corresponding recognition model for verification code elements to complete verification, includes:
Step S41: sequentially reading task items from the head of the structured task queue generated in S23, and determining the execution order according to the priority labels in the task items;
Step S42: for the current task item, if the element is located in a frame, adopting the relative coordinates calculated in step S32, and if the element is located in the main document, adopting the absolute coordinates converted in step S24, and dynamically calculating the operation coordinates in the browser viewport:
for the main document, the absolute pixel coordinates are used directly;
for a frame, the relative coordinates are multiplied by the actual size of the frame and the frame offset is added;
Step S43: executing the operation corresponding to the element category: for a button, a click operation followed by a 500 ms wait for the page to respond; for an input box, a form input operation; for a verification code, branch processing according to the verification code type (for example, a sliding verification code);
Step S44: after the operation of a task item is completed, capturing the visible area of the current page again and calling the target detection model to verify the effect of the operation; if the operation failed, recording the positioning information of the current element and reinserting the task at the tail of the queue.
In an embodiment of the invention, the above steps may be implemented in practice as follows, for example:
In step S41, the sorted structured task queue (in descending order of priority and, within the same priority, in descending order of confidence) is obtained from step S23, and the task items are extracted one by one in queue order. Each task item contains an element type (button / input box / verification code), coordinate information, and a priority label. If a high-priority task (such as a required input box) exists in the queue, it is processed first, and low-priority tasks (such as optional items) are executed after the high-priority tasks are completed. The invention thus ensures that core operations (such as clicking the registration button) are executed first, avoids flow interruptions caused by processing secondary elements, and can terminate the flow quickly when a high-priority task fails, which reduces the resources consumed by invalid operations.
In step S42, whether the XPath path of the task item contains an iframe tag is checked to determine whether the element is located in a frame: if so, the relative coordinates from S32 are used; if not (the element is in the main document), the absolute coordinates from S24 are used.
Coordinate mapping:
for a main document element, the absolute pixel coordinates (x_abs, y_abs) are used directly as the operation point;
for an element within a frame:
the real-time size (W_frame, H_frame) and the upper left corner offset (X_frame, Y_frame) of the current frame are acquired;
the operation coordinates are calculated as x = relative abscissa × W_frame + X_frame and y = relative ordinate × H_frame + Y_frame. If the calculated coordinates fall outside the current viewport, the page is scrolled to bring the element into the visible area. The method handles the positioning of main document elements and in-frame elements uniformly, with no special logic for different contexts, and because the coordinates are calculated in real time, elements can be operated accurately even if the page size changes due to interaction during operation.
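The mapping from relative coordinates back to an operation point, together with the viewport check, can be sketched as follows. Both function names are hypothetical, and the frame rectangle is assumed to be passed as a plain (X_frame, Y_frame, W_frame, H_frame) tuple measured at operation time.

```python
def operation_point(rel_x, rel_y, frame_rect):
    """Map normalized in-frame coordinates to viewport pixel coordinates."""
    x_frame, y_frame, w_frame, h_frame = frame_rect
    return rel_x * w_frame + x_frame, rel_y * h_frame + y_frame


def needs_scroll(point, viewport_size):
    """Return True when the point lies outside the visible viewport."""
    x, y = point
    width, height = viewport_size
    return not (0 <= x < width and 0 <= y < height)
```

Because the frame rectangle is an argument rather than a stored value, the same relative coordinates yield the correct point after the frame has moved or been resized.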
In step S43, the operation is selected by element category. For a button, a mouse click event is sent to the calculated coordinates through the browser driver (for example, Selenium), followed by a 500 ms wait that gives the page time to respond (for example, to load new content or display a prompt box). For an input box, the input box element is located, its original content is cleared, and virtual identity information (for example, a randomly generated name, mobile phone number, and mailbox) is injected according to preset rules. For a graphic verification code, an OCR model is called to recognize the characters in the picture, and the characters are entered into the corresponding input box. For a sliding verification code, the positions of the slider and the notch are identified, the sliding distance is calculated, and the slider is dragged along a simulated human trajectory (accelerating first and then decelerating). For a click-select verification code, the prompt text is recognized and the corresponding picture area is clicked.
The invention simplifies complex UI interaction into a unified operation interface through classification processing, reduces development cost, and customizes a solution for different types of verification codes so that an automation flow can process more than 80% of common verification mechanisms.
In step S44, after the current task item is completed, the visible area of the page is captured to generate a new image, which is input into the target detection model to identify the state change of the operated element (for example, whether the button has changed to a "clicked" style or whether the input box has been filled), and the attributes of the element before and after the operation are compared (for example, the disabled state of the button or the value attribute of the input box). If the verification fails (for example, the input box is still empty), positioning information such as the XPath and coordinates of the element is recorded, and the task is reinserted at the tail of the queue and marked as requiring a retry (at most 3 retries). The invention verifies the effect of each operation in real time through model recognition, avoids the hidden error of an operation that "succeeds but does not take effect", and retries failed tasks automatically, which improves the success rate of the flow (tests show that the overall pass rate can be improved by more than 25%).
In a preferred embodiment of the present invention, step S5, in which, after the operations of the interactive task queue are completed, whether a visible submit button exists on the current page is detected, the page is scrolled and steps S1 to S2 are re-executed until a visible submit button is identified if none exists, and the submit operation is triggered if one exists, includes:
Step S51: based on the latest page screenshot taken after the operation of step S44 is completed, calling the target detection model to identify whether a "submit button" class element exists in the current visible area;
Step S52: when the detection result of step S51 is that no submit button is visible:
acquiring the total height of the current page and the height of the browser window, setting the single scroll amount to 80% of the window height, and executing the scroll operation:
recording the current position of the scroll bar as the initial position, triggering the browser to scroll down by one scroll amount, and waiting 500 ms for the page to redraw;
state update and loop:
re-executing step S1 to capture a new visible-area image, re-executing step S2 to generate a new interactive task queue, returning to step S51 for submit button detection, and looping until either of the following conditions is met:
a visible submit button is detected, or the accumulated scroll amount exceeds the total height of the page;
Step S53: during the re-identification after each scroll, establishing an operated-element mark library and using it to filter the interactive task queue newly generated in step S2;
Step S54: when step S51 detects a visible submit button, locating the button coordinates.
In an embodiment of the invention, step S51 may be implemented, for example, as follows: the latest page screenshot generated in step S44 is input into the target detection model, the model classifies all elements in the screenshot, elements labeled as "submit button" (such as "register" or "submit now") are screened out, whether the button coordinates fall within the current window is checked, and occluded elements (for example, a button covered by a popup window or floating layer) are excluded.
Step S52 may be implemented, for example, as follows: the total height of the page (for example, document.body.scrollHeight) and the window height are obtained, and the single scroll amount is set to 80% of the window height (to avoid missing elements because of excessive scrolling).
Scrolling is performed:
The current position of the scroll bar is recorded as the initial position, and the browser is triggered to scroll down by one scroll amount (for example, to the current position + window height × 80%), followed by a 500 ms wait to ensure that the page has finished redrawing and dynamic loading. After each scroll, step S1 (screenshot) and step S2 (task queue generation) are re-executed, and step S51 is called again to detect the submit button, until a termination condition is met: a visible submit button is detected, or the accumulated scroll amount exceeds the total height of the page (indicating that the full content has been traversed).
Through this adaptive scrolling strategy, the submit button can be detected even if it is located at the bottom of a long page or in a dynamically loaded area; a reasonable scroll step and waiting time balance detection efficiency against page loading completeness, and the average detection time is shortened by 30%.
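The scroll schedule of step S52 can be sketched as a pure function that lists the successive scroll-bar targets. The function name is hypothetical, and the sketch assumes a static page height and a start at the top of the page; in practice the height would be re-read after each scroll because dynamic loading can extend the page.

```python
def scroll_positions(total_height, viewport_height, ratio=0.8):
    """List scroll-bar targets, stepping by ratio * viewport_height, until the
    accumulated scroll amount would exceed the total page height."""
    step = round(viewport_height * ratio)
    if step <= 0:
        raise ValueError("viewport_height * ratio must be positive")
    positions = []
    scrolled = step
    while scrolled <= total_height:
        positions.append(scrolled)
        scrolled += step
    return positions
```

For a 2000 px page and a 1000 px window the step is 800 px, so the targets are 800 and 1600; the third step (2400) exceeds the page height and ends the loop. The 20% overlap between consecutive views reduces the chance that an element straddling a view boundary is missed.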
In step S53, when step S4 is first executed, an empty operated-element mark library (for example, a set processed_elements) is established, and each time a task item is completed, the XPath or unique identifier of the element is added to the mark library. When a new task queue is generated after each scroll, every element in the queue is traversed, and any element whose identifier already exists in the mark library is removed from the queue. The invention thus prevents repeated operations on an input box that has already been filled or a button that has already been clicked, reduces invalid interactions, and avoids page state confusion caused by repeated operations (for example, triggering duplicate requests by clicking the submit button repeatedly).
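The mark library and the queue filtering can be sketched as follows. The task item is assumed to be a dictionary with an "xpath" key used as the unique identifier; both the key name and the function names are hypothetical.

```python
def mark_done(task, processed_elements):
    """Record a completed task item in the operated-element mark library."""
    processed_elements.add(task["xpath"])


def filter_new_tasks(queue, processed_elements):
    """Drop task items whose element has already been operated on."""
    return [task for task in queue if task["xpath"] not in processed_elements]
```

Using a set keeps each membership check constant-time, so filtering a regenerated queue stays linear in the queue length.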
In step S54, the bounding box coordinates (left, top, width, height) of the submit button are extracted from the output of the target detection model, and the center point of the button is calculated as (left + width/2, top + height/2). If the button is located in a frame, the center point is mapped to the browser window using the relative coordinate conversion method of S32; if it is in the main document, the absolute coordinates are used directly. Because the center point is calculated from the bounding box output by the model, positioning is more accurate than with traditional text-based or CSS selector positioning, and the success rate is increased to 98%. Even if the website modifies the class or ID of the button, the button can still be located accurately as long as its visual style is unchanged, and the maintenance cost is reduced by 50%.
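The center-point calculation of step S54 is small enough to state directly. The function name is hypothetical, and the bounding box is assumed to be given in pixels as (left, top, width, height).

```python
def bbox_center(left, top, width, height):
    """Return the center point of a bounding box given as left/top/width/height."""
    return left + width / 2, top + height / 2
```

A box at (100, 40) with size 80 × 20 has its center at (140, 50), which is the point a simulated click would target.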
In a preferred embodiment of the present invention, step S6, in which, if the page jumps to a recharging page after submission, payment account information is extracted through regular expressions and stored in a database in association with the virtual identity information, includes:
Step S61: based on the page jump result triggered in step S54, monitoring whether the URL of the new page contains a preset keyword, capturing the complete HTML source code of the current page, and applying a double verification mechanism to obtain a recharging page that passes verification;
Step S62: performing hierarchical parsing on the recharging page that passed the verification of step S61, standardizing the data and storing it in association; when the page does not pass the verification of step S61, packaging the page data and transmitting it to a data construction module.
In an embodiment of the present invention, the above steps may be implemented, for example, as follows:
In step S61, characteristic keywords of the recharging page (for example, "recharge", "payaccount", "fund-transfer") are preset, and the path or parameter portion of the URL of the new page is parsed to determine whether it contains any keyword. The target detection model is called to identify whether UI elements related to recharging exist in the page (for example, "payment account" or "account-opening bank" text labels, or an account input box), and the HTML source code is checked for a payment-related form structure (for example, <form action="/pay">) or a specific JS file reference (for example, a payment SDK script). If the double verification is passed, the complete HTML source code is captured and redundant content such as comments and blank lines is removed; if it is not passed (for example, the page jumped to an error page or an advertisement page), the erroneous URL is recorded and subsequent operations are terminated.
In step S62, the text content inside HTML tags (for example, <span id="account">123456789</span>) is matched through regular expressions to extract the fields related to account information; the target data is located in combination with contextual semantics (for example, keywords such as "bank account" and "payment account"), and interfering information (for example, fake accounts in advertisements) is excluded. Format cleaning is then performed on the extracted account information:
For the bank account, spaces and special symbols are removed so that it becomes a pure digit string; the account-opening bank name is mapped to a standard name (for example, "Industrial and Commercial Bank of China" is uniformly abbreviated as "ICBC"). The virtual identity information of the current operation (for example, the generated random name and mobile phone number) is obtained from the task queue, and a one-to-one association between the account information and the identity information is established through the task ID. The associated data is written into a structured database (for example, MySQL) with fields including the virtual name, mobile phone number, payment account, account-opening bank, and association timestamp. If the page does not pass the recharging page verification, the original URL, screenshot, and source code are packaged and transmitted to the data construction module for expanding the training set or optimizing the verification rules.
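The regular-expression matching and format cleaning of step S62 can be sketched as follows. This is an illustration only: the pattern is tied to the single example tag given in the text, the alias table contains just the one mapping mentioned there, and all names are hypothetical. A real page would need patterns and context checks adapted to its markup.

```python
import re

# Pattern for the example markup <span id="account">...</span> only.
ACCOUNT_SPAN = re.compile(r'<span\s+id="account"\s*>([^<]*)</span>', re.IGNORECASE)

# Hypothetical alias table holding the single mapping named in the text.
BANK_ALIASES = {"Industrial and Commercial Bank of China": "ICBC"}


def extract_account(html):
    """Return the account as a pure digit string, or None if not found."""
    match = ACCOUNT_SPAN.search(html)
    if match is None:
        return None
    digits = re.sub(r"\D", "", match.group(1))  # strip spaces and symbols
    return digits or None


def normalize_bank(name):
    """Collapse whitespace and map a known bank name to its standard form."""
    cleaned = " ".join(name.split())
    return BANK_ALIASES.get(cleaned, cleaned)
```

Unknown bank names are returned unchanged after whitespace cleanup, so unmapped values remain visible for later review instead of being dropped.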
Through hierarchical parsing and standardization, the invention converts unstructured page data into directly usable service data that meets the data format requirements of downstream systems. Data that fails verification can be fed back for model training, forming an "identification - verification - optimization" data closed loop that continuously improves the ability of the system to recognize new types of recharging pages. The virtual identity and the account information are strongly associated, which provides complete context data for subsequent services such as batch recharging and fund management and reduces the cost of manual data matching.
A system for service carrier automated registration and login and funds account acquisition, comprising:
an acquisition module, configured to access a target website domain name through a headless browser and capture the current visible area after loading is completed to generate a page image;
a generation module, configured to input the page image into a pre-trained target detection model and identify the category, position coordinates, and confidence of the UI elements in the page;
a conversion module, configured to convert the coordinates of elements located in a frame in the interactive task queue into coordinates relative to the frame and control the browser to switch to the corresponding frame;
a verification module, configured to traverse the interactive task queue and execute the following operations in sequence: performing a simulated click on button elements, injecting virtual identity information into input box elements, and calling the corresponding recognition model for verification code elements to complete verification;
a processing module, configured to perform page scrolling and submit detection: after the operations of the interactive task queue are completed, detecting whether a visible submit button exists on the current page, scrolling the page and re-detecting until a visible submit button is identified if none exists, and triggering the submit operation if one exists; and, if the page jumps to a recharging page after submission, extracting payment account information through regular expressions and storing it in a database in association with the virtual identity information.
Example 1: Data set preparation and model training
In this embodiment, a structured image dataset covering 17 types of UI elements is constructed through screenshotting websites in the target field, screening, clustering-based deduplication, and manual labeling, which provides high-quality training samples for the target detection model. A YOLOv series model is selected as the UI recognition framework; the training process combines a batch incremental training strategy, CIoU bounding box loss optimization, and a multi-scale feature fusion structure, and a genetic algorithm is introduced to tune the hyperparameters and the model structure automatically, so as to improve detection accuracy, training efficiency, and model generalization. The specific steps are as follows:
Step 1: Target website collection, screening, and access verification. About 20,000 suspicious business website domain names are randomly extracted from the domain name database. The system uses an automated script to access each domain name and performs preliminary screening according to page characteristics, removing the following types of sites: 1) sites whose pages cannot be loaded or return a 404 error; 2) sites whose content is empty or irrelevant to the business; 3) sites that use a non-standard structure and cannot be screenshotted.
Step 2: Automated screenshot process. The system automatically takes screenshots of the screened websites in the following order to ensure that the pages on the key registration path are captured: 1) the home page popup window is closed (if present); 2) the registration button is clicked after the home page has loaded; 3) after the registration page has fully loaded, the page is scrolled to the bottom to capture a page containing all form fields and the "submit" button; 4) after registration, the flow jumps to the business operation page, and the page containing the operation mode information is captured.
Step 3: Image deduplication and clustering-based redundancy removal. To improve training efficiency, the screenshot images are deduplicated using Ward-based hierarchical clustering: repeated samples are removed and the cluster center images are retained to ensure data diversity and representativeness.
Step 4: Manual labeling. The 17 types of UI elements in the screenshot images are labeled manually using the Yolo_mark tool, and the labeling format conforms to the txt format required by the YOLOv series. Each file contains the element class numbers and their normalized coordinates in the image.
Step 5: Data organization and division. After the dataset is constructed, it is divided into a training set, a validation set, and a test set in an 8:1:1 ratio, with a balanced sample distribution across the three types of pages (popup window, home page, and registration page).
Step 6: Before formal model training, the system introduces a genetic algorithm to perform a global search and automatic optimization of the key training parameters of the YOLOv series model, so as to improve the performance and generalization of the model in the web page UI element recognition task. (1) Genetic algorithm initialization: a fitness function is defined, with the performance index of the model on the validation set as the evaluation criterion; fitness measures the quality of an individual and is the core basis on which the genetic algorithm performs "selection - evolution". The optimization parameter space includes the learning rate, batch size, anchor box size, multi-scale feature fusion structure configuration (FPN levels), attention mechanism insertion position, and so on. (2) Population generation and evolution: the system is initialized to generate a number of individuals (each individual is one parameter combination); in each generation, every individual is trained for a small number of rounds (for example, 10 to 20), and its fitness value is calculated. Standard genetic operations (selection, crossover, mutation) are then performed. (3) Iterative optimization: the algorithm iterates generation by generation according to the change in fitness until a stopping condition is met (for example, the fitness improvement falls below a threshold or the maximum number of generations is reached), and the individual parameters with the highest fitness are retained for formal training.
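The selection, crossover, mutation, and stopping logic of Step 6 can be sketched with a small search loop. Everything here is a hypothetical illustration: the search space values, the function names, the elitism rule, and the default rates are assumptions, and the fitness argument stands in for the short training run and validation scoring that the embodiment performs for each individual.

```python
import random

# Hypothetical discrete search space; the real space is described only in prose.
SEARCH_SPACE = {
    "learning_rate": [0.001, 0.005, 0.01],
    "batch_size": [8, 16, 32],
    "fpn_levels": [3, 4, 5],
}


def random_individual(rng):
    return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each parameter is inherited from one of the parents.
    return {key: rng.choice((a[key], b[key])) for key in SEARCH_SPACE}


def mutate(individual, rng, rate=0.2):
    # Each parameter is resampled with probability `rate`.
    return {
        key: (rng.choice(SEARCH_SPACE[key]) if rng.random() < rate else value)
        for key, value in individual.items()
    }


def evolve(fitness, generations=10, pop_size=8, seed=0, tol=1e-6):
    """Return (best_individual, best_fitness) for a caller-supplied fitness."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        top_fit = fitness(ranked[0])
        improvement = top_fit - best_fit
        if top_fit > best_fit:
            best, best_fit = ranked[0], top_fit
        if improvement < tol:
            break  # stop: fitness gain fell below the threshold
        parents = ranked[: max(2, pop_size // 2)]  # selection
        children = []
        for _ in range(pop_size - 1):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = [best] + children  # elitism keeps the current best
    return best, best_fit
```

Keeping the best individual in every generation guarantees that the reported fitness never decreases, which makes the "improvement below a threshold" stopping test well defined.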
Step 7: Three-stage model training based on the optimized parameters. On the basis of the genetic algorithm optimization result, model training is performed in three stages:
Stage 1: the backbone network is frozen and only the detection head is trained, to stabilize the basic recognition capability;
Stage 2: part of the backbone layers are unfrozen and a multi-scale training mechanism is introduced, to improve the detection precision for UI components of different sizes;
Stage 3: the whole network is fine-tuned, and model performance is further optimized in combination with strategies such as incremental samples and hard example boosting.
Example 2: Automatic registration flow
In this embodiment, through deep linkage between model recognition and browser execution, the system automatically analyzes the web page structure and executes simulated user behavior according to a priority mechanism, overcoming the fixed-path limitation of traditional scripts. The specific steps are as follows:
Step 1: Initial access and page screenshot. The system calls Selenium WebDriver through the automation control module, starts a headless browser, and accesses the domain name of the target service website. After the page has loaded, a screenshot of the current visible area is taken automatically and passed to the target detection model trained in Example 1.
Step 2: UI element identification and priority ordering. The model identifies the 17 types of UI elements in the page and outputs category labels, normalized position coordinates, and confidence values. The system divides the elements into five levels according to a preset priority mechanism (IMMEDIATE, HIGH, MODERATE, LOW, LAST):
Immediate priority: "close popup" buttons and elements that cover critical areas;
High priority: jump-type components such as the registration button and the login button;
Moderate priority: form input components (username and password input boxes, etc.);
Low priority: verification code components (graphic verification code, sliding verification code);
Last priority: components that are not interacted with immediately, such as the submit button and contact customer service.
After the priorities are assigned, an interactive task queue is generated in descending order of priority, containing information such as element category, coordinates, confidence, and XPath.
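The five-level priority mechanism and the queue ordering can be sketched as follows. The category names in the mapping are hypothetical labels chosen for illustration (the 17 actual class names are not listed in the text), unknown categories are assumed to fall back to LAST, and ties within a level are assumed to be broken by descending confidence.

```python
from enum import IntEnum


class Priority(IntEnum):
    # Smaller value = handled earlier.
    IMMEDIATE = 0
    HIGH = 1
    MODERATE = 2
    LOW = 3
    LAST = 4


# Hypothetical category labels mapped to the five levels.
CATEGORY_PRIORITY = {
    "close_popup_button": Priority.IMMEDIATE,
    "register_button": Priority.HIGH,
    "login_button": Priority.HIGH,
    "username_input": Priority.MODERATE,
    "password_input": Priority.MODERATE,
    "graphic_captcha": Priority.LOW,
    "sliding_captcha": Priority.LOW,
    "submit_button": Priority.LAST,
    "contact_customer_service": Priority.LAST,
}


def build_task_queue(detections):
    """Order detections by priority level, then by descending confidence."""
    return sorted(
        detections,
        key=lambda d: (
            CATEGORY_PRIORITY.get(d["category"], Priority.LAST),
            -d["confidence"],
        ),
    )
```

Because sorted() is stable and the key is a tuple, detections with the same level and confidence keep the order in which the model reported them.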
Step 3: In-frame element coordinate conversion. If an element is located inside an iframe or frame, the system converts its coordinates into positions relative to the frame, switches WebDriver to the corresponding frame, and converts the coordinates into an XPath using a custom script to ensure operation precision.
Step 4: Simulated operation execution. The task queue is traversed, and the corresponding operation is executed according to category:
Click operation: simulating clicks on buttons such as register, close, and submit;
Input operation: entering the compliant identity information generated by the virtual information pool into the form fields;
Verification code processing:
graphic verification code: recognized and entered using an OCR model;
sliding verification code: the sliding distance is calculated through image processing and dragging is simulated.
After the operation is completed, a screenshot is taken automatically, recognition is repeated, and the flow proceeds to the next step.
Step 5: Page scrolling and submit detection. After the interactive operations are completed, the system analyzes the page state:
if a "submit button" is identified in the visible area, the submission is executed directly;
if the button is not identified or is located outside the window, the page is scrolled automatically and re-identified until the button is detected;
an interaction locking mechanism is established that marks the operated elements to avoid repeated triggering.
Step 6: Registration state determination. After submission, the system determines whether the page has jumped to the business operation page:
on failure: the state and an error screenshot are recorded, the task is marked as failed, and it is transferred to the data module;
on success: the information extraction process is entered.
Example 3: Extracting service information
After the business operation page is entered successfully, the system parses the content through regular expression and key field matching, extracts the information, and stores it in a standardized form. The specific steps are as follows:
Step 1: Page parsing and information extraction. The screenshot or HTML source code is parsed, and the operation mode and account information are identified through regular expression and keyword matching, including:
payment account and account holder information, third-party payment identifiers, electronic payment accounts, and other compliant service fields.
Step 2: Information classification and standardization. Format cleaning (such as space removal and standard name mapping) is performed on the extracted information, which is then classified by labels such as type, source, and time.
Step 3: Data association and storage. The virtual identity information used for registration is associated with the extracted business operation information and stored in the back-end database to support subsequent business analysis.
Example 4: Exception and feedback mechanism
Through task state monitoring and failure sample feedback, exception identification and data reflow are realized, supporting model optimization and system iteration. The specific steps are as follows:
Step 1: If any of the following exceptions is encountered in the registration flow, the operation is interrupted and the reason is recorded:
element identification failure or misjudgment, abnormal page jump, verification code processing failure, and logic interruption caused by repeated operations.
Step 2: Failure sample feedback. Failed tasks are fed back to the data module, and screenshots and failure classifications are stored automatically for incremental model training and system optimization.
Step 3: Locking mechanism and maximum number of attempts. To prevent infinite loops, a maximum number of operations is set for each type of UI element (no more than 2 by default, except for the close button, for which up to 5 are allowed).
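The attempt limit of Step 3 can be sketched with a small counter class. The class and constant names are hypothetical, and the sketch assumes that attempts are counted per element identifier while the limit is chosen by element category; the text itself only states the limits per element type.

```python
from collections import Counter

DEFAULT_MAX_ATTEMPTS = 2
# Per-category overrides; only the close button is named in the text.
MAX_ATTEMPTS = {"close_popup_button": 5}


class AttemptLimiter:
    """Track how many times each element has been operated on."""

    def __init__(self):
        self._attempts = Counter()

    def allow(self, element_id, category):
        """Return True and count the attempt if the limit is not yet reached."""
        limit = MAX_ATTEMPTS.get(category, DEFAULT_MAX_ATTEMPTS)
        if self._attempts[element_id] >= limit:
            return False
        self._attempts[element_id] += 1
        return True
```

A caller would check allow() before each operation and treat a False result as a terminal failure for that element, which bounds the total number of loop iterations.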
This embodiment takes the service automation flow as its scenario and presents the technical scheme through a compliant description. Its main features are as follows:
Model generalization is improved through clustering-based deduplication, hierarchical labeling, and genetic algorithm optimization; full-process automation is realized through priority-based task scheduling, cross-frame operation, and dynamic scroll detection; and the stability and adaptability of the system are continuously improved through the reflow of failure samples.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the present invention.