CN113721896B - A method and device for optimizing financial fraud modeling language - Google Patents
- Publication number
- CN113721896B (application number CN202110712728.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/31—Programming languages or programming paradigms
- G06F8/315—Object-oriented languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
- G06F16/2445—Data retrieval commands; View definitions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
- G06F16/24554—Unary operations; Data partitioning operations
- G06F16/24556—Aggregation; Duplicate elimination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G06F8/425—Lexical analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/43—Checking; Contextual analysis
- G06F8/436—Semantic checking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/447—Target code generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Mathematical Physics (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- General Business, Economics & Management (AREA)
- Computing Systems (AREA)
- Devices For Executing Special Programs (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an optimization processing method and device for a financial fraud modeling language. The method comprises: generating an FFML abstract syntax tree from a fraud detection rule written in the Financial Fraud Modelling Language (FFML); determining the node type of each node; if the node type of a node is SingleCondition, generating target conversion data from the left return value, the comparison return value and the right return value of the node; and generating SQL code corresponding to the fraud detection rule from the target conversion data. With the invention, a fraud detection rule written in FFML can be quickly converted into SQL code that a streaming platform can recognize.
Description
Technical Field
The invention relates to the technical field of computers, in particular to an optimization processing method and device for a financial fraud modeling language.
Background
With the advancement of modern technologies such as the internet and mobile computing, the number of types of financial fraud keeps increasing. To cope with new types of financial fraud, automated financial fraud detection methods based on computer technology have been developed. These methods are classified into passive fraud detection and active fraud detection. Active fraud detection introduces real-time stream processing into the field of financial fraud detection, so that transaction requests can be checked in real time.
Active fraud detection depends on detection rules established by domain experts. In general, a domain expert first proposes a new fraud detection rule and explains it to IT programmers, the IT programmers then write the actual platform code, and finally the code is deployed to a stream processing platform for real-time fraud monitoring.
However, because the professional backgrounds of domain experts and IT programmers differ greatly, communication is inefficient and misunderstandings are frequent, so a new fraud detection rule takes a long time to reach actual deployment, which can cause large economic losses. How to convert the financial fraud modeling language used by domain experts into a programming language that the platform can recognize is a problem yet to be solved.
Disclosure of Invention
The invention provides an optimization processing method and device for a financial fraud modeling language, which are used for overcoming at least one technical problem in the prior art.
According to a first aspect of an embodiment of the present invention, there is provided a method for optimizing a financial fraud modeling language, including:
generating an FFML abstract syntax tree corresponding to a fraud detection rule written in the financial fraud modeling language (Financial Fraud Modelling Language, FFML);
determining the node type of each node in the FFML abstract syntax tree by traversing the nodes;
if the node type of a node is SingleCondition, converting the Boolean expression in the data stream according to the left return value of the left expression child node of the node, the comparison return value of the comparison operator child node and the right return value of the right expression child node, to generate target conversion data;
and generating structured query language (Structured Query Language, SQL) code corresponding to the fraud detection rule according to the target conversion data.
According to a second aspect of an embodiment of the present invention, there is provided an optimization processing device for a financial fraud modeling language, including:
The device comprises a first generation module, a first judgment module, a third generation module and a fourth generation module;
the first generation module is configured to generate an FFML abstract syntax tree corresponding to a fraud detection rule written in the financial fraud modeling language FFML;
the first judgment module is configured to determine the node type of each node in the FFML abstract syntax tree by traversing the nodes;
the third generation module is configured to, if the node type of a node is SingleCondition, convert the Boolean expression in the data stream according to the left return value of the left expression child node of the node, the comparison return value of the comparison operator child node and the right return value of the right expression child node, to generate target conversion data;
and the fourth generation module is configured to generate SQL code corresponding to the fraud detection rule according to the target conversion data.
The innovation points of the embodiment of the invention include:
1. The invention generates an FFML abstract syntax tree corresponding to a fraud detection rule written in the financial fraud modeling language FFML, generates conversion data according to the node type of each node in the FFML abstract syntax tree, and finally generates SQL code corresponding to the fraud detection rule from the conversion data. A fraud detection rule written in FFML can thus be quickly converted into SQL code that a streaming platform can recognize. This is one of the innovation points of the embodiment of the invention.
2. The invention determines the processing flow for each node according to its node type in the FFML abstract syntax tree, so that the financial fraud modeling language FFML is converted accurately. This is one of the innovation points of the embodiment of the invention.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of one embodiment of the present invention;
FIG. 2 is a general frame diagram of a back end design module of the present invention;
FIG. 3 is a first FFML abstract syntax tree in accordance with the present invention;
FIG. 4 is a schematic view of yet another embodiment of the present invention;
FIG. 5 is a flow chart showing the sub-step process of step 511 of the present invention;
FIG. 6 is a second FFML abstract syntax tree according to the present invention;
FIG. 7 is a schematic diagram of the configuration of the optimizing processing device of the financial fraud modeling language of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.
To solve the problems described above, the invention provides an optimization processing method and device for a financial fraud modeling language, which can quickly convert a fraud detection rule written in the financial fraud modeling language FFML into SQL code that a streaming platform can recognize, with high processing efficiency and real-time performance.
Referring to fig. 1, fig. 1 is a schematic diagram of an embodiment of the present invention. As shown in fig. 1, the optimizing processing method of the financial fraud modeling language includes the following processing steps:
Step 101, generating an FFML abstract syntax tree corresponding to a fraud detection rule written in the financial fraud modeling language FFML.
In this step, a domain expert writes the fraud detection rule in the financial fraud modeling language FFML. A lexical analyzer first turns the rule into a token stream, and syntax analysis then produces a parse tree. The parse tree cannot be used directly as input to semantic analysis, so it is converted into an intermediate representation, namely the FFML abstract syntax tree. The subsequent steps perform semantic analysis on the FFML abstract syntax tree and generate code in a programming language that the streaming platform can recognize.
It should be noted that a bridge is needed between syntax analysis and semantic analysis. The parse tree obtained directly from syntax analysis (also called the concrete syntax tree) contains a great deal of redundant syntactic structure and cannot be used directly as input to semantic analysis. An abstract syntax tree is therefore constructed during syntax analysis as the intermediate representation that connects the front end and the back end.
Step 103, judging the node type of the node by traversing each node in the FFML abstract syntax tree, and executing step 107 if the node type of the node is SingleCondition.
It should be noted that, in a specific implementation, step 101 may be implemented by the front-end design module of the optimization processing method. The function of the front-end design module is to convert a fraud detection rule written in the financial fraud modeling language FFML into the corresponding FFML abstract syntax tree.
Steps 103 through 109 may be implemented by the back-end design module of the optimization processing method.
For the overall framework of the back-end design module, refer to FIG. 2, which is an overall framework diagram of the back-end design module according to the present invention. The visitor module is the main module of the back-end design module. It traverses the FFML abstract syntax tree generated by the front-end design module, builds the code conversion logic during the traversal, and then generates the concrete stream processing code by calling the template module. During code conversion, the visitor module works together with the symbol table and the built-in function module, and the generated stream processing code is optimized in a targeted way by the code optimization module. The portion outlined by the dotted line in FIG. 2 is the composition of the entire conversion back end.
The specific functions of the modules in the back-end design module are as follows:
Visitor module: gathers the semantic actions required for code conversion, traverses the FFML abstract syntax tree, and carries out the concrete semantic analysis together with the other modules during the traversal.
Symbol table: stores the symbols encountered during semantic analysis together with their attribute information, so that different parts of the visitor can conveniently access shared information.
Built-in function module: the FFML language allows the user to call built-in functions such as TOTALDEBIT and BADACCOUNT; the code conversion of these built-in functions is handled uniformly by the built-in function module.
Template module: to avoid errors and to unify the output form when generating target code, the template module provides built-in code templates, which the visitor fills in to generate the final code.
Code optimization module: the operator graphs that the stream processing system finally generates for different stream processing code differ in execution efficiency, so the code optimization module defines several code optimization methods that guide the visitor toward generating efficient stream processing code.
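As a non-limiting illustration of how the template module and the symbol table might cooperate, the following Python sketch fills a single view template and records the most recently generated table. The template text follows the CREATE TEMPORARY VIEW statements shown later in this description; the names VIEW_TEMPLATE, symbol_table, current_table and emit_view are hypothetical and are not taken from the invention.

```python
from string import Template

# Hypothetical built-in code template: every generated statement has the
# same shape, which keeps the output uniform and avoids hand-built SQL.
VIEW_TEMPLATE = Template("CREATE TEMPORARY VIEW `$name` AS ($select)")

# Hypothetical symbol table shared by the different parts of the visitor.
symbol_table = {"current_table": None, "hist_days": 1}


def emit_view(name: str, select: str) -> str:
    """Fill the view template and remember the newly created table."""
    sql = VIEW_TEMPLATE.substitute(name=name, select=select)
    symbol_table["current_table"] = name
    return sql


sql = emit_view("comparison_2",
                "SELECT * FROM transfer WHERE `value` >= 500.0")
print(sql)
```

Keeping the statement shape in one template means a change in output form only has to be made in one place.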
The present invention focuses on the visitor module.
Methods for translating between languages mainly comprise syntax-directed methods, rule-based methods and model-based methods. The model-based method is more flexible than the syntax-directed method, more efficient and more readable than the rule-based method, and more widely used in industry.
The core of the model-based method is to build an intermediate representation model of the grammar, around which all language-related actions are then organized. The invention adopts the abstract syntax tree as the intermediate representation model and uses a visitor to traverse the abstract syntax tree and carry out the concrete semantic conversion actions.
The visitor pattern defines a single visitor in which the semantic actions for the different abstract syntax tree nodes are gathered together; the visitor takes an abstract syntax tree node as a parameter and executes a different operation depending on the node type. Compared with embedding the semantic actions directly in the nodes of a heterogeneous abstract syntax tree, the visitor pattern is more flexible and easier to extend.
The structure of a fraud detection rule written in the financial fraud modeling language FFML is shown in Table 1. It consists of four parts: rule naming, event sequence, condition definition and action definition. "Rule naming" assigns an ID to the rule being defined. "Event sequence" specifies which events, detected at what time, trigger the subsequent operations. "Condition definition" specifies the conditions that the variables in the event must satisfy once a trigger event is detected; if they are satisfied, the actions specified by "action definition" are triggered. The main parts of a fraud detection rule written in FFML are the event sequence and the condition definition.
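The visitor pattern described above can be sketched in Python as follows. This is only an illustrative sketch: the Node class, the node type labels EventParam, CompareOp and Number, and the method names are assumptions made for the example, not the implementation of the invention.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Node:
    """A homogeneous AST node: a type tag, an optional value, and children."""
    node_type: str
    value: Any = None
    children: List["Node"] = field(default_factory=list)


class Visitor:
    """Gathers all semantic actions in one class, selected by node type."""

    def visit(self, node: Node) -> Any:
        # Dispatch on the node type; unknown types use the generic handler.
        handler = getattr(self, f"visit_{node.node_type}", self.generic_visit)
        return handler(node)

    def generic_visit(self, node: Node) -> Any:
        # Leaf nodes return their value; inner nodes return child results.
        if not node.children:
            return node.value
        return [self.visit(child) for child in node.children]

    def visit_SingleCondition(self, node: Node) -> Any:
        # Fixed child order: left expression, comparison operator, right.
        lhs, op, rhs = (self.visit(child) for child in node.children)
        return (lhs, op, rhs)


tree = Node("SingleCondition", children=[
    Node("EventParam", value=("transfer", "value")),
    Node("CompareOp", value=">="),
    Node("Number", value=500),
])
result = Visitor().visit(tree)
print(result)
```

Adding support for a new node type only requires adding one visit_ method to the visitor; the node classes themselves stay unchanged.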
In this step, the visitor module determines the node type of each node by traversing the FFML abstract syntax tree, and if the node type of a node is SingleCondition, step 107 is executed.
It should be noted that only the processing of nodes of these two node types is described in detail here; this does not mean that only nodes of these two types can be processed. The processing of nodes of other node types is described later.
In a specific implementation, the visitor module determines the node type of each node by traversing the FFML abstract syntax tree. If the node type of a node is SINGLEEVENT, events that meet the parameter requirements of the child nodes of the SINGLEEVENT-type node are selected from a preset data stream to generate first conversion data.
The parameter requirements may include a time parameter time, an event sequence parameter events, and operation information; the operation information includes a channel and the operation behaviors in that channel.
Note that a SINGLEEVENT-type node has two child nodes.
Specifically, the processing comprises the following steps:
First step: the two child nodes of the SINGLEEVENT-type node are visited; the return value of the first child node is saved as a first variable, and the return value of the second child node is saved as a second variable.
Specifically, the first variable may be denoted channel and the second variable may be denoted params.
Second step: the event type defined by the SINGLEEVENT-type node is determined from the second variable.
Specifically, a SINGLEEVENT-type node defines one of two types of events: simple independent events and complex sequence events. The event type is determined from the type of the return value of the second child node, i.e. the second variable params. If params is a character string, the event defined by the current SINGLEEVENT-type node is a simple independent event, and the processing flow for simple independent events is entered. If params is a list, the event defined by the current SINGLEEVENT-type node is a complex sequence event, and the processing flow for complex sequence events is entered.
Third step: the processing flow corresponding to the event type is executed to select, from the preset data stream, the target events that meet the parameter requirements, and the first conversion data is generated.
Specifically, according to the parameter requirements carried in the first variable channel or the second variable params, the target events that meet the conditions are selected from the list of all events, a new table corresponding to the target events is generated, and the new table is recorded as the first conversion data.
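The three steps above can be sketched as follows. The sketch operates on an in-memory list of events instead of a stream table, and the event fields channel and operation, the function name process_single_event, and the treatment of a complex sequence event as a simple membership test (without ordering or time constraints) are all simplifying assumptions made for illustration.

```python
from typing import Dict, List, Union

Event = Dict[str, object]


def process_single_event(channel: str,
                         params: Union[str, List[str]],
                         stream: List[Event]) -> List[Event]:
    """Select the events that a SINGLEEVENT-type node asks for."""
    if isinstance(params, str):
        # A string means a simple independent event.
        wanted = [params]
    elif isinstance(params, list):
        # A list means a complex sequence event.
        # Assumption: only membership is checked here, not event order.
        wanted = params
    else:
        raise TypeError(f"unsupported params type: {type(params).__name__}")
    # The selected events play the role of the "first conversion data".
    return [e for e in stream
            if e["channel"] == channel and e["operation"] in wanted]


stream = [
    {"channel": "ONL", "operation": "login"},
    {"channel": "ONL", "operation": "transfer"},
    {"channel": "ATM", "operation": "transfer"},
]
simple = process_single_event("ONL", "transfer", stream)
sequence = process_single_event("ONL", ["login", "transfer"], stream)
print(simple)
print(sequence)
```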
Step 107, converting the Boolean expression in the data stream according to the left return value of the left expression child node of the node, the comparison return value of the comparison operator child node and the right return value of the right expression child node, to generate target conversion data.
The Boolean expression includes comparison expressions, i.e. expressions with a comparison operator, for example a > 1 or b <= 2. The comparison operators include: >, <, =, <=, >=, !=.
Note that the child nodes of a SingleCondition-type node have a fixed form: a left expression, a comparison operator, and a right expression.
In this step, when the node type of the node is SingleCondition, the first child node, i.e. the left expression node, is visited first to obtain its return value lhs. Processing recurses downward from this node, whose child falls into one of three classes: simple event variable (EVENTPARAM), query (Query), or historical query (HISTSTATEMENT).
For a simple event variable (EVENTPARAM) node, the event and the variable are returned directly. For a query (Query) node, either the stream window aggregation conversion method or the stream processing system user-defined function (UDF) conversion method is used. For a historical query (HISTSTATEMENT) node, the processing described for HISTSTATEMENT-type nodes is used.
Next, the second child node, i.e. the comparison operator node, is visited to obtain its return value op.
Then, the third child node, i.e. the right expression node, is visited to obtain its return value rhs.
Finally, the comparison expression is converted to code from lhs, op and rhs. The conversion is implemented with join (JOIN) and conditional selection (WHERE) in SQL: lhs and rhs are first connected by a join, and the condition is then evaluated with the WHERE syntax.
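As a hedged sketch of this last conversion step, the Python function below builds two SQL statements from lhs, op and rhs: one that filters on the comparison and one that joins the result back to the event table. The statement shapes follow the worked example given later in this description; the function name convert_single_condition, its parameters, and the fixed column names accountnumber and rowtime are assumptions made for illustration.

```python
from typing import List, Tuple

# Comparison operators accepted by this sketch.
ALLOWED_OPS = {">", "<", "=", "<=", ">=", "!="}


def convert_single_condition(lhs: Tuple[str, str], op: str, rhs: float,
                             event_table: str, index: int) -> List[str]:
    """Return the SQL text for one comparison expression (not executed)."""
    table, column = lhs
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported comparison operator: {op!r}")
    comparison = f"comparison_{index}"
    # Conditional selection: keep only rows that satisfy lhs op rhs.
    where_view = (
        f"CREATE TEMPORARY VIEW `{comparison}` AS ("
        f"SELECT accountnumber, rowtime FROM {table} "
        f"WHERE `{column}` {op} {float(rhs)})"
    )
    # Join: recover the complete event rows for the selected accounts.
    join_view = (
        f"CREATE TEMPORARY VIEW `condition_{index}` AS ("
        f"SELECT * FROM {event_table}, {comparison} "
        f"WHERE {event_table}.accountnumber = {comparison}.accountnumber "
        f"AND {event_table}.rowtime >= {comparison}.rowtime)"
    )
    return [where_view, join_view]


statements = convert_single_condition(
    ("procedure_2", "totaldebit"), "<=", 500, "event_7", 1)
for statement in statements:
    print(statement)
```

Restricting op to a fixed set keeps arbitrary text from the rule out of the generated SQL.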
For example, consider the FFML abstract syntax tree in FIG. 3:
As shown in FIG. 3, the first SingleCondition node corresponds to the FFML code QUERY TOTALDEBIT(ATM, 2) <= 500.
First step: the first child node, i.e. the left expression node, is visited. This node is a query node; it can be optimized with a built-in function, or converted with the stream window aggregation conversion method. If the stream window aggregation conversion method is used, the procedure is as follows:
(a) The TOTALDEBIT function represents the total transaction amount in the last n days, here the total amount of transactions through the ATM channel in the last 2 days. The transactions are first aggregated using a two-day window, namely:
CREATE TEMPORARY VIEW `procedure_1` AS (SELECT accountnumber, SUM(`value`) AS totaldebit, TUMBLE_END(rowtime, INTERVAL '2' DAY) AS rowtime FROM event_8 GROUP BY accountnumber, TUMBLE(rowtime, INTERVAL '2' DAY))
A new table procedure_1 is obtained.
(b) Since TOTALDEBIT only requires the data of the last n days, the most recent entry in the table must be fetched using the Top-N syntax, namely:
CREATE TEMPORARY VIEW `procedure_2` AS (SELECT accountnumber, totaldebit, rowtime FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY accountnumber ORDER BY rowtime DESC) AS rownum FROM procedure_1) WHERE rownum <= 1)
(c) The left operand lhs is returned as (procedure_2, totaldebit).
Second step: the second child node, i.e. the comparison operator node, is visited, giving op as <=.
Third step: the third child node, i.e. the right expression node, is visited, giving rhs as 500.
Fourth step: the comparison expression is converted using the WHERE syntax, namely:
CREATE TEMPORARY VIEW `comparison_1` AS (SELECT accountnumber, rowtime FROM procedure_2 WHERE `totaldebit` <= 500.0)
Fifth step: the complete event information is selected from the full event table, namely:
CREATE TEMPORARY VIEW `condition_1` AS (SELECT * FROM event_7, comparison_1 WHERE event_7.accountnumber = comparison_1.accountnumber AND event_7.rowtime >= comparison_1.rowtime)
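The window aggregation and Top-N steps (a) to (c) of this example can be sketched as a small generator of SQL text. The sketch assumes a stream SQL dialect that provides the TUMBLE and TUMBLE_END group-window functions and writes interval literals as INTERVAL '2' DAY; the function name totaldebit_views and its parameters are hypothetical, and the statements are only built as strings, not executed.

```python
from typing import List, Tuple


def totaldebit_views(event_table: str, days: int, index: int,
                     keep: int = 1) -> Tuple[List[str], Tuple[str, str]]:
    """Build the aggregation and Top-N views for a TOTALDEBIT query."""
    agg = f"procedure_{index}"
    top = f"procedure_{index + 1}"
    # (a) Sum the transaction values per account in tumbling windows.
    aggregate_sql = (
        f"CREATE TEMPORARY VIEW `{agg}` AS ("
        f"SELECT accountnumber, SUM(`value`) AS totaldebit, "
        f"TUMBLE_END(rowtime, INTERVAL '{days}' DAY) AS rowtime "
        f"FROM {event_table} "
        f"GROUP BY accountnumber, TUMBLE(rowtime, INTERVAL '{days}' DAY))"
    )
    # (b) Keep only the most recent window(s) per account (Top-N).
    top_n_sql = (
        f"CREATE TEMPORARY VIEW `{top}` AS ("
        f"SELECT accountnumber, totaldebit, rowtime FROM ("
        f"SELECT *, ROW_NUMBER() OVER (PARTITION BY accountnumber "
        f"ORDER BY rowtime DESC) AS rownum FROM {agg}) "
        f"WHERE rownum <= {keep})"
    )
    # (c) The left operand is the new table plus the aggregated column.
    lhs = (top, "totaldebit")
    return [aggregate_sql, top_n_sql], lhs


views, lhs = totaldebit_views("event_8", days=2, index=1)
for view in views:
    print(view)
print(lhs)
```

The returned lhs pair is what the comparison step consumes as its left operand.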
As shown in FIG. 3, the second SingleCondition node is a comparison on the value variable of the transfer event.
First step: the left expression node is visited. This node is a simple variable node, so the event and the variable are returned directly, namely ("transfer", "value").
Second step: the comparison operator node is visited, giving op as >=.
Third step: the right expression node is visited, giving rhs as 500.
Fourth step: the events that satisfy the condition are selected directly with the SELECT syntax, namely:
CREATE TEMPORARY VIEW `comparison_2` AS (SELECT * FROM transfer WHERE `value` >= 500.0)
Fifth step: the current table is read from the symbol table; it is condition_1. The intersection of comparison_2 and condition_1 is required, namely:
CREATE TEMPORARY VIEW `condition_2` AS (SELECT * FROM comparison_2 WHERE id IN (SELECT id FROM condition_1))
As shown in FIG. 3, the third SingleCondition node corresponds to hit(4)[QUERY TOTALDEBIT(ONL) >= 100] >= 1 and is a historical data query node; for the specific procedure, refer to the description of HISTSTATEMENT-type nodes.
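To illustrate the effect of the filter and the intersection in the fourth and fifth steps, the following sketch runs equivalent statements on a few made-up rows using Python's built-in sqlite3 module. SQLite is only a stand-in for the stream processing platform, the sample rows and the columns of condition_1 are invented for the example, and the outer parentheses after AS are omitted to suit SQLite's CREATE VIEW syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Made-up sample data; condition_1 stands for the table produced by the
-- first SingleCondition node.
CREATE TABLE transfer (id INTEGER, accountnumber TEXT, value REAL);
CREATE TABLE condition_1 (id INTEGER, accountnumber TEXT);
INSERT INTO transfer VALUES (1, 'A', 800.0), (2, 'B', 900.0), (3, 'A', 100.0);
INSERT INTO condition_1 VALUES (1, 'A'), (3, 'A');

-- Fourth step: select the events that satisfy the comparison.
CREATE TEMPORARY VIEW comparison_2 AS
    SELECT * FROM transfer WHERE value >= 500.0;

-- Fifth step: intersect with the current table from the symbol table.
CREATE TEMPORARY VIEW condition_2 AS
    SELECT * FROM comparison_2 WHERE id IN (SELECT id FROM condition_1);
""")
rows = conn.execute(
    "SELECT id, accountnumber, value FROM condition_2").fetchall()
print(rows)
conn.close()
```

Only the event with id 1 is both a transfer of at least 500 and present in condition_1, so it is the only row that remains in condition_2.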
Step 109, generating SQL code corresponding to the fraud detection rule according to the target conversion data.
In this step, nodes of different types are handled by their corresponding processing flows, so that a fraud detection rule written in the financial fraud modeling language FFML is converted into SQL code that the streaming platform can recognize, with high processing efficiency and real-time performance.
In a specific implementation, the SQL code corresponding to the fraud detection rule may be generated from the first conversion data and the target conversion data.
Therefore, the optimization processing method provided by the invention generates an FFML abstract syntax tree corresponding to a fraud detection rule written in the financial fraud modeling language FFML, generates conversion data according to the node type of each node in the FFML abstract syntax tree, and finally generates SQL code corresponding to the fraud detection rule from the conversion data. A fraud detection rule written in FFML can thus be quickly converted into SQL code that a streaming platform can recognize, with high processing efficiency and real-time performance.
In one implementation, a HISTSTATEMENT type node is used to query the historical data for data that satisfies a condition. It has two child nodes: one gives the number of entries to query, and the other is the query condition.
The processing manner for a HISTSTATEMENT type node is as follows:
First, the first child node of the HISTSTATEMENT type node is accessed to obtain the number of entries to be queried, denoted d, which is written into hist_days in the symbol table for use when the conditional node is subsequently accessed.
Second, the second child node of the HISTSTATEMENT type node, namely the conditional node, is accessed, and its return values t and k are stored, where t is a newly generated table and k is the key corresponding to the query condition.
Third, hist_days in the symbol table is restored to 1.
Fourth, COUNT aggregation is performed over the entries in t that have the same k, the result is added as a new column, and a new table is created and returned.
For example, referring to fig. 3, which shows a first FFML abstract syntax tree in accordance with the present invention, the FFML abstract syntax tree in fig. 3 is taken as an example:
As shown in fig. 3, in the first step, the first child node of the HISTSTATEMENT type node is accessed; the number d of entries to be queried is 4, and hist_days in the symbol table is set to 4.
In the second step, the second child node of the HISTSTATEMENT type node, namely the conditional node, is accessed, creating the following three new tables, whose functions are aggregation, TOP-N selection and comparison-expression data filtering, respectively.
CREATE TEMPORARY VIEW `procedure_3` AS (SELECT accountnumber, SUM(`value`) AS totaldebit, TUMBLE_END(rowtime, INTERVAL '1' DAY) AS rowtime FROM event_9 GROUP BY accountnumber, TUMBLE(rowtime, INTERVAL '1' DAY))
CREATE TEMPORARY VIEW `procedure_4` AS (SELECT accountnumber, totaldebit, rowtime FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY accountnumber ORDER BY rowtime DESC) AS rownum FROM procedure_3) WHERE rownum <= 4)
CREATE TEMPORARY VIEW `comparison_3` AS (SELECT accountnumber, rowtime FROM procedure_4 WHERE `totaldebit` >= 100.0)
In the third step, hist_days in the symbol table is restored to 1.
In the fourth step, COUNT aggregation is performed on the data in the comparison_3 table, and a new table count_1 is generated as follows.
CREATE TEMPORARY VIEW `count_1` AS (SELECT accountnumber, COUNT(*) AS daycount, MAX(rowtime) AS rowtime FROM comparison_3 GROUP BY accountnumber)
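The four steps for a HISTSTATEMENT type node can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name visit_hist, the dictionary used as the symbol table, and the demo_condition callback are hypothetical; only hist_days, the returned pair (t, k) and the shape of the count view come from the text.

```python
def visit_hist(symbols, days, visit_condition, view_name="count_1"):
    """Translate a history node into a COUNT view (illustrative only)."""
    # Step 1: publish the number of entries so the condition node can read it.
    symbols["hist_days"] = days
    # Step 2: the condition node returns the new table t and the key k.
    table, key = visit_condition(symbols)
    # Step 3: restore the default value.
    symbols["hist_days"] = 1
    # Step 4: count the entries that share the same key.
    sql = (
        f"CREATE TEMPORARY VIEW `{view_name}` AS ("
        f"SELECT {key}, COUNT(*) AS daycount, MAX(rowtime) AS rowtime "
        f"FROM {table} GROUP BY {key})"
    )
    return view_name, sql


def demo_condition(symbols):
    # Stand-in for the real condition visitor, which would emit procedure_3,
    # procedure_4 and comparison_3 using symbols["hist_days"].
    return "comparison_3", "accountnumber"


symbols = {"hist_days": 1}
name, sql = visit_hist(symbols, 4, demo_condition)
print(sql)
```

Passing the condition visitor as a callback keeps the sketch independent of how the conditional node itself is translated.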
In a specific implementation, the invention further provides another optimization processing method of the financial fraud modeling language.
Referring to fig. 4, fig. 4 is a schematic diagram of a further embodiment of the present invention. As shown in fig. 4, the optimization processing method of the financial fraud modeling language includes the following processing steps:
Step 501, generating an FFML abstract syntax tree corresponding to the fraud detection rule according to the fraud detection rule written using the financial fraud modeling language FFML.
For a detailed description of this step, refer to step 101 in the optimization processing method of the financial fraud modeling language shown in fig. 1.
Step 503, traversing each node in the FFML abstract syntax tree and judging its node type: if the node type is SINGLEEVENT, executing step 505; if the node type is SingleCondition, executing step 513; if the node type is EVENTSTATEMENT, executing step 515; and if the node type is ConditionStatement, executing step 517.
It should be noted that only the processing of these four node types is described in detail here; this does not mean that only nodes of these four types can be processed.
Step 505, accessing the two child nodes of the SINGLEEVENT type node, storing the return value of the first child node as a first variable and the return value of the second child node as a second variable, and executing step 507.
Step 507, judging whether the second variable is a character string or a list: if the second variable is a character string, determining that the event type is a simple independent event and executing step 509; if the second variable is a list, determining that the event type is a complex sequence event and executing step 511.
Step 509, when the event type is a simple independent event, selecting a target event meeting the parameter requirement from a preset data stream by executing a first processing flow corresponding to the simple independent event, and generating first conversion data.
In this step, if the second variable params is a character string, the event defined by the current SINGLEEVENT type node is a simple independent event; the first processing flow corresponding to the simple independent event is entered, and the target event meeting the parameter requirement is screened from the preset data stream to generate the first conversion data.
The first processing flow directly returns the event and its variables. A simple independent event only defines a certain operation behavior a of an account on a certain channel c, so the SELECT syntax can be used directly to select all a operations that the account performs through channel c.
Step 511, when the event type is a complex sequence event, selecting a target event meeting the parameter requirement from a preset data stream by executing a second processing flow corresponding to the complex sequence event, generating first conversion data, and executing step 521.
In this step, if the second variable params is a list, the event defined by the current SINGLEEVENT type node is a complex sequence event, and the second processing flow corresponding to the complex sequence event is entered.
It should be noted that a complex sequence event consists of two parts, a sequence time and a sequence event group. The sequence time defines the maximum time span within which the sequence of events may occur, and the sequence event group defines the order in which the events occur.
The second processing flow is as follows. First, the time span parameter time and the event sequence parameter events are obtained from the params list. Then the tables corresponding to the events in events are merged using the UNION ALL syntax; only the common values needed to judge the events, namely the event ID, account ID, event type and event time, are merged, and the merged table is all_events. Next, the complex event processing (CEP) MATCH syntax is applied to the all_events table to select the events that conform to the sequence time and the sequence event group, generating a new table m. Finally, since the new table m stores only the basic information of the hit events, the complete information of the hit events is selected from the corresponding event tables using the SELECT syntax, and a target event table n is created and returned.
Complex sequence events include compound events; for example, ONL SEQ(10)(password_change, transfer) indicates that an account consecutively performs a password change and a transfer on the ONL channel within 10 seconds/min.
Optionally, referring to fig. 5, fig. 5 is a flow chart of the sub-steps of step 511 in the present invention. As shown in fig. 5, step 511 specifically includes the following sub-steps:
Sub-step 61, obtaining a time span parameter time and an event sequence parameter events from the second variable params.
Sub-step 62, merging the tables corresponding to the events in the event sequence parameter events to generate a merged table all_events, wherein the merged table all_events contains the basic information of the events.
Sub-step 63, selecting, from the events in the merged table all_events, the target events that meet the requirement of the time span parameter time, and generating a target event table.
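Sub-steps 61 and 62 can be sketched in Python as follows. This is only an illustration: the helper names split_params and merge_event_tables are hypothetical, params is assumed to have the shape [time, [event, ...]], and the common column names are taken from the example views in this document rather than from a defined schema.

```python
# Columns assumed common to every event table (taken from the example views).
COMMON_COLUMNS = ("id", "accountnumber", "rowtime", "eventtype")


def split_params(params):
    """Sub-step 61: params is assumed to be [time, [event, ...]]."""
    time, events = params
    return time, list(events)


def merge_event_tables(view_name, event_tables):
    """Sub-step 62: build the merged view from the per-event tables."""
    cols = ", ".join(COMMON_COLUMNS)
    selects = [f"(SELECT {cols} FROM `{t}`)" for t in event_tables]
    return f"CREATE TEMPORARY VIEW `{view_name}` AS ({' UNION ALL '.join(selects)})"


time, events = split_params([5, ["password_change", "transfer"]])
print(merge_event_tables("all_events", ["event_1", "event_2"]))
```

Restricting the merge to the common columns lets tables with different event-specific columns be combined without padding.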
For example, referring to fig. 3, which shows the first FFML abstract syntax tree of the present invention, the FFML abstract syntax tree in fig. 3 is taken as an example:
In the first step, as shown in fig. 3, the first SINGLEEVENT type node corresponds to ONL SEQ(5)[password_change, transfer] in the FFML rule. Its two child nodes are accessed first to obtain the variables "ONL" and the list [5, ["password_change", "transfer"]]. Because the parameter is a list, the event is a complex sequence event, and complex sequence processing is performed.
(a) The time span parameter time and the event sequence parameter events are obtained from params; they are 5 and ["password_change", "transfer"], respectively.
(b) The events are merged using the UNION ALL syntax, resulting in the following three new tables:
CREATE TEMPORARY VIEW `event_1` AS (SELECT * FROM `password_change` WHERE channel = 'ONL')
CREATE TEMPORARY VIEW `event_2` AS (SELECT * FROM `transfer` WHERE channel = 'ONL')
CREATE TEMPORARY VIEW `event_3` AS ((SELECT id, accountnumber, rowtime, eventtype FROM `event_1`) UNION ALL (SELECT id, accountnumber, rowtime, eventtype FROM `event_2`))
Here event_1 selects the password_change events of the ONL channel, event_2 selects the transfer events of the ONL channel, and event_3 merges the event-related meta information common to the two tables into one table.
(c) Complex event processing is performed with the stream-processing MATCH syntax, producing the table event_4.
(d) Since the table event_4 stores only the basic information of the hit events, all the information of the hit events is selected from the corresponding event tables using the SELECT syntax, and the target event table event_5 is created.
(e) The target event table event_5 is returned.
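The code generated in the MATCH step is not reproduced in this text. As a hedged sketch only, the Python function below emits one possible Flink-style MATCH_RECOGNIZE view for an ordered event group. Everything in it is an assumption for illustration: the function name build_match_view, the pattern variable names E1, E2, the measure names id_1, id_2, the partitioning by accountnumber, and the time unit (the text leaves the unit of the time span open).

```python
def build_match_view(view_name, source, events, span, unit="SECOND"):
    """Emit a MATCH_RECOGNIZE view that matches `events` in order (sketch)."""
    # One pattern variable per event of the sequence event group: E1 E2 ...
    variables = [f"E{i}" for i in range(1, len(events) + 1)]
    measures = ", ".join(f"{v}.id AS id_{i}" for i, v in enumerate(variables, 1))
    defines = ", ".join(
        f"{v} AS {v}.eventtype = '{e}'" for v, e in zip(variables, events)
    )
    lines = [
        f"CREATE TEMPORARY VIEW `{view_name}` AS (",
        f"  SELECT * FROM {source} MATCH_RECOGNIZE (",
        "    PARTITION BY accountnumber",
        "    ORDER BY rowtime",
        f"    MEASURES {measures}",
        "    ONE ROW PER MATCH",
        "    AFTER MATCH SKIP PAST LAST ROW",
        # The sequence time becomes the WITHIN bound of the pattern.
        f"    PATTERN ({' '.join(variables)}) WITHIN INTERVAL '{span}' {unit}",
        f"    DEFINE {defines}",
        "  ) AS m",
        ")",
    ]
    return "\n".join(lines)


match_sql = build_match_view("event_4", "event_3", ["password_change", "transfer"], 5)
print(match_sql)
```

The resulting view keeps only the IDs of the matched events, which is consistent with the later step that selects the complete event information into event_5.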
In the second step, as shown in fig. 3, the second SINGLEEVENT type node corresponds to ATM[transfer] in the FFML rule. Its two child nodes are accessed first to obtain the variable channel = "ATM"; params is a character string, so the event is a simple independent event, and simple independent event processing is performed as follows.
(a) A simple independent event directly uses the SELECT syntax to select the events of the channel, namely:
CREATE TEMPORARY VIEW `event_6` AS (SELECT * FROM transfer WHERE channel = 'ATM')
(b) The target event table event_6 is obtained and returned.
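The string-or-list decision of step 507 and the simple independent event branch can be sketched as follows. The function name visit_single_event and its return convention are hypothetical, and the column name channel is assumed from the example views.

```python
def visit_single_event(channel, params, view_name):
    """Dispatch on the type of params, as in step 507 (illustrative)."""
    if isinstance(params, str):
        # Simple independent event: select the event rows on the channel.
        sql = (
            f"CREATE TEMPORARY VIEW `{view_name}` AS "
            f"(SELECT * FROM `{params}` WHERE channel = '{channel}')"
        )
        return "simple", sql
    if isinstance(params, list):
        # Complex sequence event: handled by the second processing flow.
        return "sequence", None
    raise TypeError("params must be a string or a list")


kind, event_sql = visit_single_event("ATM", "transfer", "event_6")
print(kind, event_sql)
```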
Step 513, converting the Boolean expression in the data stream according to the left return value of the left expression child node of the node, the comparison return value of the comparison operator child node and the right return value of the right expression child node to generate target conversion data, and executing step 521.
The detailed description of this step may refer to step 107 in the optimization processing method of the financial fraud modeling language shown in fig. 1.
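As a minimal sketch of the conversion in step 513, the Python helpers below build a filtering view from the three return values and then intersect it with the current table. The helper names comparison_sql and intersect_sql and the list of allowed operators are assumptions; the emitted statements follow the comparison_2 and condition_2 examples in this document.

```python
ALLOWED_OPS = {"<", "<=", ">", ">=", "=", "<>"}


def comparison_sql(view_name, lhs, op, rhs):
    """Turn (table, column) op literal into a filtering view (illustrative)."""
    table, column = lhs
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported comparison operator: {op}")
    return (
        f"CREATE TEMPORARY VIEW `{view_name}` AS "
        f"(SELECT * FROM {table} WHERE `{column}` {op} {float(rhs)})"
    )


def intersect_sql(view_name, table, current_table):
    """Keep only the rows of `table` whose id also appears in the current table."""
    return (
        f"CREATE TEMPORARY VIEW `{view_name}` AS "
        f"(SELECT * FROM {table} WHERE id IN (SELECT id FROM {current_table}))"
    )


print(comparison_sql("comparison_2", ("transfer", "value"), ">=", 500))
print(intersect_sql("condition_2", "comparison_2", "condition_1"))
```

Checking the operator against a fixed set keeps arbitrary text out of the generated SQL.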
Step 515, traversing the child nodes of the node, executing the processing flow of each child node to obtain the SQL table name of each child node, storing the SQL table names in the events list, and executing step 516.
It should be noted that an EVENTSTATEMENT type node supports defining multiple events joined by OR; its child nodes are of type SINGLEEVENT, i.e., each is a single independent event or a sequence event.
In this step, when the node type of the node is EVENTSTATEMENT, the child nodes of the EVENTSTATEMENT type node are traversed, the SINGLEEVENT processing flow is executed for each SINGLEEVENT type child node, the SQL table of each SINGLEEVENT type child node is obtained, and the SQL tables are stored in the events list.
Step 516, merging the contents of all SQL tables in the events list to generate third conversion data, and executing step 521.
Specifically, the contents of all SQL tables in the events list may be merged by the UNION ALL operator.
In this step, since an EVENTSTATEMENT type node only supports OR between events, the contents of all SQL tables in the events list can be merged: all contents of each table are selected and then merged with the UNION ALL operator to generate a new stream processing table. The new stream processing table is written into event_table in the symbol table, because this value is needed later when the condition-definition nodes are processed, and it serves as the third conversion data.
For example, referring to fig. 6, which shows a second FFML abstract syntax tree in accordance with the present invention, the FFML abstract syntax tree in fig. 6 is taken as an example:
As shown in fig. 6, in the first step, the child nodes of the EVENTSTATEMENT type node, namely two SINGLEEVENT type nodes, are traversed. Each SINGLEEVENT type child node is accessed by calling the processing flow corresponding to the SINGLEEVENT type node, and the return values event_5 and event_6 are obtained.
In the second step, event_5 and event_6 are merged by UNION ALL, namely:
CREATE TEMPORARY VIEW `event_7` AS ((SELECT * FROM event_5) UNION ALL (SELECT * FROM event_6))
In the third step, event_table in the symbol table is set to event_7.
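Steps 515 and 516 can be sketched in Python as follows. The function name visit_event_statement, the dictionary used as the symbol table, and the lookup that stands in for the SINGLEEVENT processing flow are hypothetical; the emitted statement follows the event_7 example.

```python
def visit_event_statement(children, visit_single_event, symbols, view_name):
    """Collect the child tables and merge them with UNION ALL (illustrative)."""
    # Step 515: visit every SINGLEEVENT child and remember its table name.
    events = [visit_single_event(child) for child in children]
    # Step 516: merge all tables, since the events are joined by OR.
    union = " UNION ALL ".join(f"(SELECT * FROM {t})" for t in events)
    sql = f"CREATE TEMPORARY VIEW `{view_name}` AS ({union})"
    # The merged table is needed later by the condition nodes.
    symbols["event_table"] = view_name
    return sql


# Stand-in for the SINGLEEVENT flow: maps a child node to its table name.
child_tables = {"node_1": "event_5", "node_2": "event_6"}
event_symbols = {}
union_sql = visit_event_statement(
    ["node_1", "node_2"], child_tables.get, event_symbols, "event_7"
)
print(union_sql)
```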
Step 517, sequentially accessing each child node of the ConditionStatement type node and judging whether the logical operation following each child node is an AND operation or an OR operation; if the logical operation is an AND operation, updating the current table in the symbol table to the stack top element; if the logical operation is an OR operation, updating the current table in the symbol table to the value of event_table in the symbol table; and executing step 519.
A ConditionStatement type node comprises a plurality of SingleCondition type nodes connected by the logical symbols AND and OR.
In this step, when the node type of the node is ConditionStatement, the child nodes of the ConditionStatement type node are accessed in sequence until all child nodes have been accessed. The processing flow for each child node is as follows:
First, the return value of the child node is pushed onto the stack as the stack top element, and the logical operation following the child node is judged; if it is an AND operation, the second step is executed, and if it is an OR operation, the third step is executed.
Second, the current table in the symbol table is updated to the stack top element, and the stack top element is popped.
Third, the current table in the symbol table is updated to the value of event_table in the symbol table.
For example, consider the FFML abstract syntax tree in fig. 3:
As shown in fig. 3, in the first step, the first child node of the ConditionStatement type node is accessed, the return value condition_1 is obtained by calling the SingleCondition access function, condition_1 is pushed onto the stack, and the logical operation preceding the second child node is determined to be an AND operation.
In the second step, the current table in the symbol table is updated to condition_1, and the stack top element is popped.
In the third step, the second child node is accessed to obtain the return value condition_2, condition_2 is pushed onto the stack, and the logical operation preceding the third child node is determined to be an OR operation.
In the fourth step, the current table in the symbol table is updated to the value of event_table in the symbol table, namely event_7.
In the fifth step, the third child node is accessed to obtain the return value condition_3.
In the sixth step, the two tables remaining in the stack are merged, namely:
CREATE TEMPORARY VIEW `condition_4` AS ((SELECT * FROM condition_2) UNION ALL (SELECT * FROM condition_3))
Step 519, after all child nodes of the ConditionStatement type node have been accessed, merging all tables in the stack to generate fourth conversion data.
In this step, after all child nodes of the ConditionStatement type node have been accessed, all tables in the stack are merged by UNION ALL to obtain a new table, and the new table is written into condition_table in the symbol table.
Step 521, generating SQL code corresponding to the fraud detection rule according to the first conversion data, the target conversion data, the third conversion data and the fourth conversion data.
In this step, the fraud detection rule written in the financial fraud modeling language FFML is converted, by applying the processing mode that corresponds to each node type, into SQL code that the streaming platform can recognize, so that processing is efficient and runs in real time.
Therefore, in the optimization processing method of the financial fraud modeling language provided by the invention, an FFML abstract syntax tree corresponding to the fraud detection rule is generated from the fraud detection rule written in FFML; corresponding conversion data is then generated according to the node type of each node in the FFML abstract syntax tree; and finally the SQL code corresponding to the fraud detection rule is generated from the conversion data. A fraud detection rule written in FFML can thus be quickly converted into SQL code that the streaming platform can recognize, with high processing efficiency and real-time performance.
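The stack handling of steps 517 and 519 can be sketched in Python as follows. The function name, the dictionary keys current_table, event_table and condition_table, and the recording callback are assumptions for illustration; the sketch reproduces the condition_1 AND condition_2 OR condition_3 example.

```python
def visit_condition_statement(children, operators, visit_condition, symbols, view_name):
    """Apply the AND/OR stack rules and merge the remaining tables (illustrative)."""
    stack = []
    for i, child in enumerate(children):
        # The return value of the child becomes the stack top element.
        stack.append(visit_condition(child, symbols))
        op = operators[i] if i < len(operators) else None
        if op == "AND":
            # The next condition must be evaluated on this result only.
            symbols["current_table"] = stack.pop()
        elif op == "OR":
            # The next condition starts again from the full event table.
            symbols["current_table"] = symbols["event_table"]
    # Step 519: merge whatever is left in the stack.
    union = " UNION ALL ".join(f"(SELECT * FROM {t})" for t in stack)
    symbols["condition_table"] = view_name
    return f"CREATE TEMPORARY VIEW `{view_name}` AS ({union})"


seen_current = []


def record_and_return(child, symbols):
    # Stand-in for the SingleCondition flow: remember the current table in use.
    seen_current.append(symbols["current_table"])
    return child


cond_symbols = {"event_table": "event_7", "current_table": "event_7"}
cond_sql = visit_condition_statement(
    ["condition_1", "condition_2", "condition_3"], ["AND", "OR"],
    record_and_return, cond_symbols, "condition_4",
)
print(cond_sql)
```

An AND pops its left operand because that result is consumed by the next condition, whereas an OR leaves it on the stack so that it contributes to the final merge.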
In one implementation, fraud detection rules written using the financial fraud modeling language FFML can be quickly translated into SQL code that a Flink-based platform can recognize. The invention can optimize the performance of the generated SQL code according to the characteristics of the Flink stream processing system, specifically in the following four aspects:
First, UNION ALL optimization.
The UNION ALL operation of a stream processing system is essentially different from the merging of database tables and requires special handling. Inside the stream processing system, UNION ALL simply feeds two data streams together into the next operator. Because operators in a stream processing system are time driven (for example, a window operation is triggered only when a watermark exceeding the window end time reaches the operator), extra care is needed when data streams are merged or split. For an operator with multiple input streams, the Flink stream processing system takes the minimum of the input stream times as the operator time. Consequently, if one input stream receives no data, i.e. no new watermark arrives on it, the operator time does not advance no matter how far the other input streams progress, and no new watermark is sent downstream. The time of the stream processing system is then blocked at that operator, and the time-triggered operations of the subsequent operators are never executed.
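The blocking effect can be illustrated with a small Python model. It is a deliberate simplification and does not use any Flink API: the function names are hypothetical, watermarks are plain integers, and a window is assumed to fire as soon as the operator time reaches the window end.

```python
def operator_time(input_watermarks):
    """An operator with several inputs advances only to the slowest input."""
    return min(input_watermarks)


def window_fires(window_end, input_watermarks):
    # Simplification: the window fires once the operator time reaches its end.
    return operator_time(input_watermarks) >= window_end


# After UNION ALL, a busy stream (watermark 250) is held back by an idle one (40).
print(window_fires(100, [250, 40]))
# With its own downstream operator, the busy stream is not held back.
print(window_fires(100, [250]))
```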
In the invention, the generated code therefore does not merge data streams with UNION ALL; instead, the subsequent operators are configured separately for each data stream, that is, merging is avoided by duplicating the operators.
For example, take the example given for step 516 in the optimization processing method of the financial fraud modeling language shown in fig. 4. After "UNION ALL optimization" is applied, the "second step" and "third step" of that example change.
Specifically, with "UNION ALL optimization" the example becomes:
As shown in fig. 6, in the first step, the child nodes of the EVENTSTATEMENT type node, namely two SINGLEEVENT type nodes, are traversed. Each SINGLEEVENT type child node is accessed by calling the processing flow corresponding to the SINGLEEVENT type node, and the return values event_5 and event_6 are obtained.
In the second step, event_table in the symbol table is set to the list [event_5, event_6].
For another example, in the optimization processing method of the financial fraud modeling language shown in fig. 4, after "UNION ALL optimization" is applied the table event_3 need not be created. Tables with different names but the same actual content may also be combined into one table, which greatly reduces the number of tables and, in turn, the number of operators that are finally generated.
During conversion, the invention checks each newly created table against a global view information table; if an identical table already exists, the ID of the existing table is returned directly. The key of the global view information table is formed from a combination of the name of the template used to create the table and the values of its fill items, so that it accurately and uniquely expresses the specific meaning of the table.
Specifically, after "UNION ALL optimization" is applied, the "sixth step" of the ConditionStatement example is modified, and the improved flow is as follows:
The FFML abstract syntax tree in fig. 3 is taken as an example:
As shown in fig. 3, in the first step, the first child node of the ConditionStatement type node is accessed, the return value condition_1 is obtained by calling the SingleCondition access function, condition_1 is pushed onto the stack, and the logical operation preceding the second child node is determined to be an AND operation.
In the second step, the current table in the symbol table is updated to condition_1, and the stack top element is popped.
In the third step, the second child node is accessed to obtain the return value condition_2, condition_2 is pushed onto the stack, and the logical operation preceding the third child node is determined to be an OR operation.
In the fourth step, the current table in the symbol table is updated to the value of event_table in the symbol table, namely event_7.
In the fifth step, the third child node is accessed to obtain the return value condition_3.
In the sixth step, the first to fifth steps are performed once for every table in event_table in the symbol table, yielding 4 new tables, namely condition_1, condition_2, condition_3 and condition_4.
For another example, take the example given for sub-step 63 in the optimization processing method of the financial fraud modeling language shown in fig. 4. After "UNION ALL optimization" is applied, the UNION ALL operation in "(b)" of the "first step" is removed, that is, the table event_3 is not created; tables with different names and identical actual content are combined into one table, which greatly reduces the number of tables and, in turn, the number of operators that are finally generated.
Second, table deduplication optimization.
Table deduplication optimization applies during the access of every node and merges two tables that have the same definition.
For example, given
CREATE TEMPORARY VIEW `event_4` AS (SELECT * FROM transfer WHERE channel = 'ATM')
CREATE TEMPORARY VIEW `event_5` AS (SELECT * FROM transfer WHERE channel = 'ATM')
since event_4 and event_5 are identical, after table deduplication optimization is enabled the two tables are combined into one, i.e. only event_4 is kept.
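One way to realize such deduplication is a registry keyed by the template and its fill values, in the spirit of the global view information table. The Python sketch below is an assumption-laden illustration: the class ViewRegistry, its methods, the naming scheme and the SIMPLE_EVENT template are all hypothetical.

```python
class ViewRegistry:
    """Hypothetical global view information table keyed by template + fill values."""

    def __init__(self):
        self._names = {}      # key -> view name
        self.statements = []  # SQL statements actually emitted

    def create(self, template, prefix, **fills):
        key = (template, tuple(sorted(fills.items())))
        if key in self._names:
            return self._names[key]  # identical definition: reuse the view
        name = f"{prefix}_{len(self._names) + 1}"
        self._names[key] = name
        self.statements.append(template.format(name=name, **fills))
        return name


SIMPLE_EVENT = (
    "CREATE TEMPORARY VIEW `{name}` AS "
    "(SELECT * FROM `{event}` WHERE channel = '{channel}')"
)

registry = ViewRegistry()
first = registry.create(SIMPLE_EVENT, "event", event="transfer", channel="ATM")
second = registry.create(SIMPLE_EVENT, "event", event="transfer", channel="ATM")
print(first, second, len(registry.statements))
```

Because the key is built from the definition rather than the view name, two requests for the same content always resolve to the same view.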
Third, built-in function optimization.
Although a built-in function can be implemented directly through the window functions of the stream processing system, window operators are not necessarily efficient: their performance depends on many factors, such as the configuration of the stream processing system and the characteristics of the incoming data, and a window operator must maintain a large amount of state and consumes considerable resources. At the same time, most of the data queried by built-in functions is relatively simple, for example the total amount transferred by an account during the last day; such data is important in practical applications, and the original database system already records it. Therefore, when a built-in function is processed, an external database lookup can be used instead of stream processing: a corresponding procedure is created with the low-level API of the stream processing system, the external database is queried directly inside that procedure, and the result is returned.
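The idea of answering a built-in function from an external database can be sketched with Python's standard sqlite3 module. This is not the stream-processing API mentioned above: the function total_debit, its parameters, the table layout and the in-memory database are hypothetical stand-ins for the original business database.

```python
import sqlite3


def total_debit(conn, account, channel, since):
    """Sum an account's transfers on a channel since a timestamp (sketch)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(value), 0) FROM transfer "
        "WHERE accountnumber = ? AND channel = ? AND ts >= ?",
        (account, channel, since),
    ).fetchone()
    return row[0]


# In-memory database standing in for the original business database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transfer (accountnumber TEXT, channel TEXT, value INTEGER, ts INTEGER)"
)
conn.executemany(
    "INSERT INTO transfer VALUES (?, ?, ?, ?)",
    [("A1", "ATM", 100, 10), ("A1", "ATM", 250, 20),
     ("A1", "ONL", 999, 20), ("A2", "ATM", 7, 20)],
)
print(total_debit(conn, "A1", "ATM", since=0))
```

A lookup of this kind keeps no window state inside the stream job, which is the resource saving the optimization aims at.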
For example, based on the example given for sub-step 63 in the sub-step processing flow of step 511 shown in fig. 5, the processing flow after "built-in function optimization" is described below; compared with that example, the change lies in the "first step":
In the first step, the first child node, namely the left expression node, is accessed. This node is a query node, and the specific flow is as follows:
(a) A lateral JOIN is performed using the stream-processing built-in function syntax, namely:
CREATE TEMPORARY VIEW `procedure_2` AS (SELECT S.id, S.rowtime, T.v AS totaldebit FROM event_4 AS S, LATERAL TABLE(TOTALDEBIT(accountnumber, 'ATM', 2, 1)) AS T(v))
(b) The left operand lhs is returned as (procedure_2, totaldebit).
In the second step, the second child node, namely the comparison operator node, is accessed, giving op as <=.
In the third step, the third child node, namely the right expression node, is accessed, giving rhs as 500.
In the fourth step, the comparison expression is converted using the WHERE syntax, namely:
CREATE TEMPORARY VIEW `comparison_1` AS (SELECT accountnumber, rowtime FROM procedure_2 WHERE `totaldebit` <= 500.0)
In the fifth step, the complete information is selected from the complete event table, namely:
CREATE TEMPORARY VIEW `condition_1` AS (SELECT * FROM event_7, comparison_1 WHERE event_7.accountnumber = comparison_1.accountnumber AND event_7.rowtime >= comparison_1.rowtime)
As shown in fig. 3, the second SingleCondition node corresponds to the comparison on the value of the transfer event in the FFML rule.
In the first step, the left expression node is accessed. This node is a simple variable node, so the event variable is returned directly, namely ("transfer", "value").
In the second step, the comparison operator node is accessed, giving op as >=.
In the third step, the right expression node is accessed, giving rhs as 500.
In the fourth step, the events meeting the condition are selected directly using the SELECT syntax, namely:
CREATE TEMPORARY VIEW `comparison_2` AS (SELECT * FROM transfer WHERE `value` >= 500.0)
In the fifth step, the current table in the symbol table, here condition_1, is read, and the intersection of comparison_2 and condition_1 is taken, namely:
CREATE TEMPORARY VIEW `condition_2` AS (SELECT * FROM comparison_2 WHERE id IN (SELECT id FROM condition_1))
As shown in fig. 3, the third SingleCondition node corresponds to hit(4)[QUERY TOTALDEBIT(ONL) >= 100] >= 1 in the FFML rule and is a historical data query node; for the specific procedure, refer to the description of the HISTSTATEMENT type node.
Fourth, table update optimization. In a database system, updating a table only requires rewriting the data in the table. In a stream processing system, however, a table is actually a data stream and cannot be rewritten; when an entry is updated, a new stream element carrying an update flag must be sent, so the update operation is clearly inefficient. If a table is updated very frequently, a large number of such stream elements appear in the stream processing system and degrade its performance. Table update optimization therefore converts code that would produce table updates into code that requires no table updates.
Table update optimization is mainly embodied in the processing flow of HISTSTATEMENT type nodes.
For example, taking the example given above for the processing of HISTSTATEMENT type nodes, the flow after "table update optimization" differs mainly in the "fourth step"; the improved processing flow is as follows:
As shown in fig. 3, in the first step, the first child node of the HISTSTATEMENT type node is accessed; the number d of entries to be queried is 4, and hist_days in the symbol table is set to 4.
In the second step, the second child node of the HISTSTATEMENT type node, namely the conditional node, is accessed, creating the following three new tables, whose functions are aggregation, TOP-N selection and comparison-expression data filtering, respectively.
CREATE TEMPORARY VIEW `procedure_3` AS (SELECT accountnumber, SUM(`value`) AS totaldebit, TUMBLE_END(rowtime, INTERVAL '1' DAY) AS rowtime FROM event_9 GROUP BY accountnumber, TUMBLE(rowtime, INTERVAL '1' DAY))
CREATE TEMPORARY VIEW `procedure_4` AS (SELECT accountnumber, totaldebit, rowtime FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY accountnumber ORDER BY rowtime DESC) AS rownum FROM procedure_3) WHERE rownum <= 4)
CREATE TEMPORARY VIEW `comparison_3` AS (SELECT accountnumber, rowtime FROM procedure_4 WHERE `totaldebit` >= 100.0)
In the third step, hist_days in the symbol table is restored to 1.
In the fourth step, tumbling-window aggregation is performed on the data in the comparison_3 table with the window size set to 1 second, instead of directly using global COUNT aggregation, namely:
CREATE TEMPORARY VIEW `count_1` AS (SELECT id, MAX(rowtime) AS rowtime, COUNT(*) AS daycount FROM comparison_3 GROUP BY id, TUMBLE(rowtime, INTERVAL '1' SECOND))
The invention also provides an optimizing processing device of the financial fraud modeling language. Referring to fig. 7, fig. 7 is a schematic structural view of an optimizing processing apparatus of a financial fraud modeling language of the present invention.
As shown in fig. 7, the apparatus 80 includes a first generating module 801, a first judging module 802, a third generating module 804, and a fourth generating module 805;
The first generation module 801 is configured to generate FFML abstract syntax tree corresponding to a fraud detection rule according to the fraud detection rule written using the financial fraud modeling language FFML;
the first judging module 802 is configured to judge a node type of the node by traversing each node in the FFML abstract syntax tree;
The third generating module 804 is configured to, if the node type of the node is SingleCondition, convert a boolean expression in the data stream according to the left return value of the left expression sub-node of the node, the comparison return value of the comparison operator sub-node, and the right return value of the right expression sub-node, and generate target conversion data;
The fourth generating module 805 is configured to generate, according to the target conversion data, an SQL code corresponding to the fraud detection rule.
Optionally, the device further comprises an execution module and a fifth generation module;
The execution module is used for executing the processing flow of the sub-nodes by traversing the sub-nodes of the node if the node type of the node is EVENTSTATEMENT, obtaining the SQL table of each sub-node, and storing the SQL table in the events list;
and the fifth generation module is used for merging the contents of all SQL tables in the events list to generate third conversion data.
Optionally, the fifth generating module is specifically configured to combine contents of ALL SQL tables in the events list through a UNION ALL operator.
Optionally, the device further comprises a second judging module, a first updating module, a second updating module and a merging module;
The second judging module is configured to, if the node type of the node is ConditionStatement, sequentially access each child node of the node of the ConditionStatement type, and determine whether the logical operation following each child node is an AND operation or an OR operation;
The first updating module is configured to update the current table in the symbol table to the stack top element if the logical operation is an AND operation;
The second updating module is configured to update the current table in the symbol table to the value corresponding to event_table in the symbol table if the logical operation is an OR operation;
The merging module is configured to merge all tables in the stack to generate fourth conversion data after all child nodes of the node of the ConditionStatement type have been accessed, wherein the node of the ConditionStatement type comprises a plurality of nodes of the SingleCondition type connected by the logical operators AND and OR.
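One possible reading of the ConditionStatement handling can be sketched as follows. The embodiment does not state how a table is pushed onto the stack or which operator merges the stack, so the sketch makes several assumptions that are labeled in the comments: each condition produces a named view, a condition following an AND replaces the stack top, and the final merge uses UNION. The key `current_table`, the view names `cond_N`, and the function name are hypothetical.

```python
from typing import Dict, List, Tuple


def convert_condition_statement(
    children: List[Tuple[str, str]],   # (SQL predicate, following operator: "AND", "OR" or "")
    symbol_table: Dict[str, str],      # holds "event_table" and "current_table"
) -> Tuple[List[str], str]:
    views: List[str] = []
    stack: List[str] = []
    chained = False  # True when the previous logical operation was AND
    for index, (predicate, following_op) in enumerate(children):
        name = f"cond_{index}"
        views.append(
            f"CREATE VIEW {name} AS SELECT * FROM "
            f"{symbol_table['current_table']} WHERE {predicate}"
        )
        if chained:
            stack[-1] = name       # assumption: an AND-chained result replaces the stack top
        else:
            stack.append(name)     # assumption: a new OR branch is pushed
        if following_op == "AND":
            # AND: the next condition filters the table on top of the stack.
            symbol_table["current_table"] = stack[-1]
            chained = True
        elif following_op == "OR":
            # OR: the next condition starts again from the event table.
            symbol_table["current_table"] = symbol_table["event_table"]
            chained = False
    # Assumption: the stack is merged with UNION; the embodiment only says "merging".
    merged = "\nUNION\n".join(f"SELECT * FROM {table}" for table in stack)
    return views, merged


views, fourth_conversion_data = convert_condition_statement(
    [("amount > 500", "AND"), ("country = 'X'", "OR"), ("cnt > 3", "")],
    {"event_table": "transfer", "current_table": "transfer"},
)
```

Under these assumptions, `a AND b OR c` yields two stack entries: one view filtering the event table by `a` and then `b`, and one view filtering the event table by `c`.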
Optionally, the fourth generating module 805 is specifically configured to generate the SQL code corresponding to the fraud detection rule according to the target conversion data, the third conversion data, and the fourth conversion data.
Therefore, the optimization processing device for the financial fraud modeling language provided by the invention generates the FFML abstract syntax tree corresponding to a fraud detection rule written in the financial fraud modeling language FFML, generates the corresponding conversion data according to the node type of each node in the FFML abstract syntax tree, and finally generates the SQL code corresponding to the fraud detection rule from the conversion data. A fraud detection rule written in FFML can thus be converted quickly into SQL that the streaming platform can recognize, which provides high processing efficiency and real-time performance.
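The overall flow summarized above, in which each node is dispatched according to its node type, can be illustrated by the following minimal sketch. The `Node` class, the root node type `Rule`, the handler table, and the assumption that a handler consumes its whole subtree are all hypothetical and serve only to illustrate the traversal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical AST node carrying only its type and its child nodes."""
    node_type: str
    children: List["Node"] = field(default_factory=list)


def convert_tree(root: Node, handlers: Dict[str, Callable[[Node], str]]) -> List[str]:
    """Traverse the tree depth-first and dispatch each node on its node type."""
    results: List[str] = []
    pending = [root]
    while pending:
        node = pending.pop(0)
        handler = handlers.get(node.node_type)
        if handler is not None:
            # Assumption: a handler processes its own subtree, so we do not descend.
            results.append(handler(node))
        else:
            # Unhandled node types are only containers; visit their children in order.
            pending = node.children + pending
    return results


handlers = {
    "SingleCondition": lambda node: "target conversion data",
    "EVENTSTATEMENT": lambda node: "third conversion data",
    "ConditionStatement": lambda node: "fourth conversion data",
}
tree = Node("Rule", [
    Node("EVENTSTATEMENT"),
    Node("ConditionStatement", [Node("SingleCondition")]),
])
conversion_data = convert_tree(tree, handlers)
```

In a fuller implementation the handlers would return SQL fragments, which the final generation step would assemble into the SQL code for the rule.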
It should be noted that the terms "comprising" and "having" and any variations thereof in the embodiments of the present invention and the accompanying drawings are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Those of ordinary skill in the art will appreciate that each drawing is merely a schematic illustration of one embodiment, and that the modules or flows in the drawing are not necessarily required to practice the invention. Those of ordinary skill in the art will also appreciate that the modules of a device in an embodiment may be distributed in the device as described in the embodiment, or may, with corresponding changes, be located in one or more devices different from that of the embodiment. The modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.
It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiment of the present invention.
Claims (8)
1. A method for optimizing a financial fraud modeling language, comprising:
generating an FFML abstract syntax tree corresponding to a fraud detection rule according to the fraud detection rule written in the financial fraud modeling language FFML;
determining the node type of each node in the FFML abstract syntax tree by traversing the nodes;
if the node type of the node is SingleCondition, converting the Boolean expression in the data stream according to the left return value of the left expression sub-node of the node, the comparison return value of the comparison operator sub-node and the right return value of the right expression sub-node to generate target conversion data, wherein the sub-nodes of a node of the SingleCondition node type have the fixed form of a left expression, a comparison operator and a right expression;
generating a Structured Query Language (SQL) code corresponding to the fraud detection rule according to the target conversion data;
the method further comprising:
if the node type of the node is EVENTSTATEMENT, executing the processing flow of each sub-node by traversing the sub-nodes of the node to obtain an SQL table for each sub-node, and storing the SQL tables in an events list, wherein a node of the EVENTSTATEMENT node type supports defining a plurality of events, the sub-nodes of a node of the EVENTSTATEMENT node type are of the SINGLEEVENT type, and each SINGLEEVENT is a single independent event or a sequence event;
and merging the contents of all SQL tables in the events list to generate third conversion data.
2. The method of claim 1, wherein the step of merging the contents of all SQL tables in the events list comprises:
merging the contents of all SQL tables in the events list through a UNION ALL operator.
3. The method according to claim 1, wherein the method further comprises:
if the node type of the node is ConditionStatement, sequentially accessing all child nodes of the node of the ConditionStatement type, and determining whether the logical operation following each child node is an AND operation or an OR operation;
if the logical operation is an AND operation, updating the current table in the symbol table to the stack top element;
if the logical operation is an OR operation, updating the current table in the symbol table to the value corresponding to event_table in the symbol table;
after all child nodes of the node of the ConditionStatement type have been accessed, merging all tables in the stack to generate fourth conversion data;
wherein the node of the ConditionStatement type comprises a plurality of nodes of the SingleCondition type connected by the logical operators AND and OR.
4. A method according to claim 3, wherein the step of generating structured query language, SQL, code corresponding to the fraud detection rule based on the target conversion data comprises:
generating the SQL code corresponding to the fraud detection rule according to the target conversion data, the third conversion data and the fourth conversion data.
5. An optimization processing device for a financial fraud modeling language, characterized by comprising a first generating module, a first judging module, a third generating module and a fourth generating module;
the first generating module is configured to generate an FFML abstract syntax tree corresponding to a fraud detection rule according to the fraud detection rule written in the financial fraud modeling language FFML;
the first judging module is configured to determine the node type of each node by traversing the nodes in the FFML abstract syntax tree;
the third generating module is configured to, if the node type of the node is SingleCondition, convert the Boolean expression in the data stream according to the left return value of the left expression sub-node of the node, the comparison return value of the comparison operator sub-node, and the right return value of the right expression sub-node, and generate target conversion data, wherein the sub-nodes of a node of the SingleCondition node type have the fixed form of a left expression, a comparison operator and a right expression;
the fourth generating module is configured to generate the SQL code corresponding to the fraud detection rule according to the target conversion data;
the device further comprises an execution module and a fifth generating module;
the execution module is configured to, if the node type of the node is EVENTSTATEMENT, execute the processing flow of each sub-node by traversing the sub-nodes of the node, obtain an SQL table for each sub-node, and store the SQL tables in an events list, wherein a node of the EVENTSTATEMENT node type supports defining a plurality of events, the sub-nodes of a node of the EVENTSTATEMENT node type are of the SINGLEEVENT type, and each SINGLEEVENT is a single independent event or a sequence event;
and the fifth generating module is configured to merge the contents of all SQL tables in the events list to generate third conversion data.
6. The device according to claim 5, wherein
the fifth generating module is specifically configured to merge the contents of all SQL tables in the events list through a UNION ALL operator.
7. The device according to claim 5, wherein
the device further comprises a second judging module, a first updating module, a second updating module and a merging module;
the second judging module is configured to, if the node type of the node is ConditionStatement, sequentially access each child node of the node of the ConditionStatement type, and determine whether the logical operation following each child node is an AND operation or an OR operation;
the first updating module is configured to update the current table in the symbol table to the stack top element if the logical operation is an AND operation;
the second updating module is configured to update the current table in the symbol table to the value corresponding to event_table in the symbol table if the logical operation is an OR operation;
and the merging module is configured to merge all tables in the stack to generate fourth conversion data after all child nodes of the node of the ConditionStatement type have been accessed, wherein the node of the ConditionStatement type comprises a plurality of nodes of the SingleCondition type connected by the logical operators AND and OR.
8. The device according to claim 5, wherein
the fourth generating module is specifically configured to generate the SQL code corresponding to the fraud detection rule according to the target conversion data, the third conversion data, and the fourth conversion data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110712728.7A CN113721896B (en) | 2021-06-25 | 2021-06-25 | A method and device for optimizing financial fraud modeling language |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113721896A (en) | 2021-11-30 |
CN113721896B (en) | 2024-12-10 |
Family
ID=78673069
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110712728.7A Active CN113721896B (en) | 2021-06-25 | 2021-06-25 | A method and device for optimizing financial fraud modeling language |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113721896B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101561817A (en) * | 2009-06-02 | 2009-10-21 | 天津大学 | Conversion algorithm from XQuery to SQL query language and method for querying relational data |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2012201466B2 (en) * | 2005-06-27 | 2014-07-03 | Csc Technology Singapore Pte Ltd | Code Transformation |
JP5681041B2 (en) * | 2011-06-03 | 2015-03-04 | 富士通株式会社 | Name identification rule generation method, apparatus, and program |
CN103927473A (en) * | 2013-01-16 | 2014-07-16 | 广东电网公司信息中心 | Method, device and system for detecting source code safety of mobile intelligent terminal |
US9569779B2 (en) * | 2013-01-17 | 2017-02-14 | International Business Machines Corporation | Fraud detection employing personalized fraud detection rules |
US10061573B2 (en) * | 2013-01-29 | 2018-08-28 | Mobilize.Net Corporation | User interfaces of application porting software platform |
CN106293653B (en) * | 2015-05-19 | 2020-11-06 | 深圳市腾讯计算机系统有限公司 | Code processing method and device and computer readable medium |
CN107704382B (en) * | 2017-09-07 | 2020-09-25 | 北京信息科技大学 | Python-oriented function call path generation method and system |
CN107766107A (en) * | 2017-10-31 | 2018-03-06 | 四川长虹电器股份有限公司 | The analytic method of xml document universal parser based on Xpath language |
AU2019340705B2 (en) * | 2018-09-11 | 2024-02-08 | Mastercard Technologies Canada ULC | Optimized execution of fraud detection rules |
CN109697201B (en) * | 2018-12-27 | 2020-12-04 | 清华大学 | A query processing method, system, device and computer-readable storage medium |
US11487521B2 (en) * | 2019-03-04 | 2022-11-01 | Next Pathway Inc. | System and method for source code translation using stream expressions |
CN110597502B (en) * | 2019-08-20 | 2023-05-23 | 北京东方国信科技股份有限公司 | Single step debugging method for realizing PL/SQL language based on java |
CN111324344B (en) * | 2020-02-28 | 2025-03-28 | 深圳前海微众银行股份有限公司 | Method, device, equipment and readable storage medium for generating code statements |
CN111638883B (en) * | 2020-05-14 | 2023-05-16 | 四川新网银行股份有限公司 | Decision engine implementation method based on decision tree |
CN112861945B (en) * | 2021-01-28 | 2022-05-13 | 清华大学 | A Multimodal Fusion Lie Detection Method |
Also Published As
Publication number | Publication date |
---|---|
CN113721896A (en) | 2021-11-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||