CN114547109A - Database information screening method, system, storage medium and electronic equipment - Google Patents

Publication number
CN114547109A
CN114547109A CN202210181495.7A CN202210181495A CN114547109A CN 114547109 A CN114547109 A CN 114547109A CN 202210181495 A CN202210181495 A CN 202210181495A CN 114547109 A CN114547109 A CN 114547109A
Authority
CN
China
Prior art keywords
information
field
data table
database
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210181495.7A
Other languages
Chinese (zh)
Inventor
武裕斌
张野
孙毅
彭文
刘青
曹凯
李保磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Ums Co ltd
Original Assignee
China Ums Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Ums Co ltd
Priority to CN202210181495.7A
Publication of CN114547109A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a database information screening method, system, storage medium and electronic device. The method includes: acquiring a sampling data table; when the sampling data table has at least one piece of record information, searching the sampling data table for a field type that has at least one piece of field information, to obtain a target field type; performing information detection on each piece of field information under the target field type based on a target information screening template, to obtain the number of pieces of field information that match the target information screening template; when the ratio of that number to the total number of pieces of field information under the target field type is greater than a preset ratio, acquiring the position information, in the database, of the field file under the target field type, to obtain first position information, where the field file includes at least one piece of field information; and searching the database based on the first position information to obtain first target information. The invention can screen database information efficiently, meeting the need for large-scale batch processing of database information.

Description

Database information screening method, system, storage medium and electronic device

Technical field

The invention relates to the technical field of data processing, and in particular to a database information screening method, system, storage medium and electronic device.

Background

Information stored in a database inevitably includes some sensitive information, such as ID card numbers, bank card numbers, mobile phone numbers, bank card track information, bank card verification codes, bank card expiry dates and passwords. Leakage of such sensitive information has adverse effects, so plaintext sensitive information stored in the database needs to be screened out as soon as possible so that it can be encrypted.

At present, sensitive-information screening of a database can scan only a small amount of data at a time, which cannot meet the need for large-scale batch processing of database information. Efficient database information screening is therefore necessary for preventing the leakage of sensitive information.

Summary of the invention

The purpose of the embodiments of the present invention is to provide a database information screening method, system, storage medium and electronic device that can screen database information efficiently, so as to meet the need for large-scale batch processing of database information. The specific technical solutions are as follows:

The invention provides a database information screening method, comprising:

acquiring a sampling data table, where the sampling data table is generated based on sampling information of a database;

when the sampling data table has at least one piece of record information, searching the sampling data table for a field type that has at least one piece of field information, to obtain a target field type;

performing information detection on each piece of field information under the target field type based on a target information screening template, to obtain the number of pieces of field information that match the target information screening template;

when the ratio of the number to the total number of pieces of field information under the target field type is greater than a preset ratio, acquiring the position information, in the database, of the field file under the target field type, to obtain first position information, where the field file includes at least one piece of the field information;

searching the database based on the first position information to obtain first target information.

Optionally, performing information detection on each piece of field information under the target field type based on the target information screening template specifically includes:

when the target information screening template is keyword information, acquiring the field name of the field information under the target field type; determining whether the field name contains the keyword information; and if so, determining that the field information corresponding to the field name matches the target information screening template;

when the target information screening template is character combination information, determining whether the field information under the target field type contains the character combination information; and if so, determining that the field information under the target field type matches the target information screening template.

Optionally, the method further includes:

when the ratio of the number to the total number of pieces of field information under the target field type is greater than 0 and less than the preset ratio, acquiring the position information, in the database, of the field information under the target field type, to obtain second position information;

searching the database based on the second position information to obtain second target information.

Optionally, the method further includes:

generating a first information screening report table based on first target information parameters, where the first target information parameters are one or more of: the instance name of the database, the IP address of the database, the sampling batch to which the sampling data table belongs, the user name corresponding to the sampling data table, the table name of the sampling data table, the field names of the field information matching the target information screening template, the number of pieces of field information matching the target information screening template, the name of the first target information, the preset ratio, and the first position information;

generating a second information screening report table based on second target information parameters, where the second target information parameters are one or more of: the instance name of the database, the IP address of the database, the sampling batch to which the sampling data table belongs, the user name corresponding to the sampling data table, the table name of the sampling data table, the field names of the field information matching the target information screening template, the number of pieces of field information matching the target information screening template, the name of the second target information, the preset ratio, and the second position information.

Optionally, the method further includes:

when the sampling data table has at least one piece of record information, searching the sampling data table for a field type that has no field information, to obtain a reference field type;

generating a third information screening report table based on third target information parameters, where the third target information parameters are one or more of: the instance name of the database, the IP address of the database, the sampling batch to which the sampling data table belongs, the user name corresponding to the sampling data table, the table name of the sampling data table, the field name corresponding to the reference field type, and the number of pieces of non-empty record information;

when the sampling data table has no record information, generating a fourth information screening report table based on fourth target information parameters, where the fourth target information parameters are one or more of: the instance name of the database, the IP address of the database, the sampling batch to which the sampling data table belongs, the user name corresponding to the sampling data table, the table name of the sampling data table, and the number of pieces of non-empty record information.

Optionally, the method for generating the sampling data table specifically includes:

acquiring all data tables corresponding to a user name in an Oracle database, to obtain a first data table set;

selecting one data table from the first data table set, and calculating the ratio of the storage size of one piece of record information in the selected data table to the storage size of the selected data table, to obtain a storage ratio;

determining a multiple of the sampling quantity such that the sampling ratio calculated from the multiple falls within a preset sampling value range, where the preset sampling value range is the value range accepted by the sampling function of the Oracle database, and the sampling ratio is the product of the multiple and the sampling quantity, multiplied by the storage ratio;

collecting from the selected data table, based on the sampling function of the Oracle database, record information amounting to the multiple times the sampling quantity, to obtain first sampled record information;

generating the sampling data table based on the sampling quantity of pieces of record information selected from the first sampled record information.
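The choice of multiple can be illustrated with a short Python sketch. The formula for the sampling ratio is the one stated above; everything else is an assumption made for illustration: the function names, the strategy of trying the largest multiple first, and the default bounds, which follow the percentage range documented for Oracle's `SAMPLE` clause (0.000001 up to, but not including, 100). This section does not state whether the ratio is a fraction or a percentage, so the bounds are parameters.

```python
def sampling_ratio(multiple, sample_count, storage_ratio):
    """Sampling ratio as stated in the text: multiple * sampling quantity * storage ratio."""
    return multiple * sample_count * storage_ratio


def choose_multiple(sample_count, row_bytes, table_bytes,
                    low=0.000001, high=100.0, max_multiple=10):
    """Return (multiple, ratio) with low <= ratio < high, or None if no multiple fits.

    Hypothetical helper: tries the largest multiple first so that the
    oversampled set is as large as the range allows.
    """
    # Storage ratio: size of one record divided by size of the whole table.
    storage_ratio = row_bytes / table_bytes
    for multiple in range(max_multiple, 0, -1):
        ratio = sampling_ratio(multiple, sample_count, storage_ratio)
        if ratio < low:
            # Smaller multiples only make the ratio smaller, so stop here.
            return None
        if ratio < high:
            return multiple, ratio
    return None


if __name__ == "__main__":
    # 100 wanted records, 200-byte rows, 2 MB table.
    print(choose_multiple(100, 200, 2_000_000))
```

Oversampling by the multiple and then keeping only the sampling quantity of records compensates for the fact that a percentage-based sample returns an approximate, not exact, number of rows.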

Optionally, the method for generating the sampling data table specifically includes:

acquiring all data tables corresponding to a user name in a MySQL database, to obtain a second data table set;

selecting one data table from the second data table set, and exporting a preset number of pieces of record information from the selected data table based on the data export instruction of the MySQL database, to obtain second record information;

generating the sampling data table based on the second record information.

The present invention also provides a database information screening system, comprising:

a data acquisition module, configured to acquire a sampling data table, where the sampling data table is generated based on sampling information of a database;

a field determination module, configured to search the sampling data table, when it has at least one piece of record information, for a field type that has at least one piece of field information, to obtain a target field type;

an information detection module, configured to perform information detection on each piece of field information under the target field type based on a target information screening template, to obtain the number of pieces of field information that match the target information screening template;

a first position determination module, configured to acquire, when the ratio of the number to the total number of pieces of field information under the target field type is greater than a preset ratio, the position information, in the database, of the field file under the target field type, to obtain first position information, where the field file includes at least one piece of the field information;

a data extraction module, configured to search the database based on the first position information to obtain first target information.

The present invention also provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, the above database information screening method is implemented.

The present invention also provides an electronic device, comprising:

at least one processor, and at least one memory and a bus connected to the processor;

the processor and the memory communicate with each other through the bus; the processor is configured to call program instructions in the memory to execute the above database information screening method.

The database information screening method, system, storage medium and electronic device provided by the embodiments of the present invention search the sampling data table for field information that matches the target information screening template, acquire the position information of that field information in the database, and search the database based on the position information to obtain the target information. By screening target information field by field, the invention can screen database information efficiently, meeting the need for large-scale batch processing of database information.

Of course, a product or method implementing the present invention does not necessarily need to achieve all of the advantages described above at the same time.

Description of drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of the database information screening method provided by an embodiment of the present invention;

FIG. 2 is a block diagram of the database information screening system provided by an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

An embodiment of the present invention provides a database information screening method. As shown in FIG. 1, the method includes:

S101. Acquire a sampling data table. The sampling data table is generated based on sampling information of a database.

Optionally, in one optional embodiment of the present invention, the sampling data table may be acquired as follows: acquire the set of all data tables stored in the database, randomly select a non-empty data table from the set, extract data from it, and import the extracted data into a blank sampling data table, thereby obtaining a non-empty sampling data table.

Optionally, in another optional embodiment of the present invention, the database may be a relational database, such as a MySQL database or an Oracle database.

S102. When the sampling data table has at least one piece of record information, search the sampling data table for a field type that has at least one piece of field information, to obtain a target field type.

Here, the field type may be the column name of the column in which the field information is located in the acquired sampling data table.
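As a minimal sketch of step S102, the following Python function splits the columns of a sampled table into those with at least one value (target field types) and those with none (the reference field types mentioned in the summary). The in-memory representation as a list of dictionaries, the function name, and the rule that `None` and the empty string count as "no field information" are assumptions for illustration only.

```python
def split_field_types(rows):
    """Split column names into (target, reference) lists.

    rows: list of dicts, one per sampled record, all with the same keys.
    A column is a target field type if at least one record has a value in it.
    """
    columns = list(rows[0].keys()) if rows else []
    target, reference = [], []
    for col in columns:
        # Assumption: None and "" both mean the field holds no information.
        has_value = any(row.get(col) not in (None, "") for row in rows)
        (target if has_value else reference).append(col)
    return target, reference


if __name__ == "__main__":
    sample = [
        {"id": 1, "phone": "13812345678", "memo": None},
        {"id": 2, "phone": "", "memo": ""},
    ]
    print(split_field_types(sample))
```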

S103. Perform information detection on each piece of field information under the target field type based on the target information screening template, to obtain the number of pieces of field information that match the target information screening template.

Optionally, in one optional embodiment of the present invention, the target information screening template may be a screening rule preset according to the field information, for example a password rule, a bank card expiry date rule or a verification code rule. A screening rule may be a set containing multiple field information texts.

Exemplarily, the target information screening templates may be as follows:

1) The ID card number screening template may be: first, the digit 1, 5 or 6 followed by any one digit from 1 to 5, or the digit 2 or 4 followed by any one digit from 1 to 3, or the digit 3 followed by any one digit from 1 to 7; next, either 0 followed by any one digit from 1 to 9, or any one digit from 1 to 6 followed by any one digit from 0 to 9, or 70, or 90; next, either 0 followed by any one digit from 1 to 9, or one digit from 1 to 9 followed by any one digit from 0 to 9; then a middle part in the range 19000101-20991231; and finally any 3 digits, ending with any one digit or an uppercase X or lowercase x.

2) The bank card number screening template may be: the digits 62 followed by any 2 digits; then 2 groups, each an optional hyphen followed by any 4 digits; then an optional hyphen followed by any 3 or 4 digits; optionally ending with 1 group consisting of an optional hyphen followed by any 1 to 3 digits.

3) The mobile phone number screening template may be: an optional prefix consisting of a plus sign or 00 followed by 86; then a middle part that is 13 or 18 followed by any one digit from 0 to 9, or 14 followed by 5, 7 or 9, or 15 or 19 followed by a digit other than 4, or 166, or 17 followed by a digit other than 2, 4 and 9; ending with any 8 digits.

4) The track information screening template may be: the letter B followed by 16 digits and ending with ^ plus any single character other than a newline; or 16 digits directly followed by an equals sign = plus any single character other than a newline; or the digits 99 followed by 16 digits and ending with an equals sign = plus any single character other than a newline.

5) The password rule screening template may be: the column name, converted to uppercase, matches the keyword PWd, PaSSwD or kL, and the value consists of 6 to 18 uppercase or lowercase letters and digits.

6) The bank card screening template may be: the column name, converted to uppercase, matches the keyword EXP or fld14, and the value contains a combination whose first four digits are in 1800-3999 and whose last two digits are in 01-12, or a combination whose first two digits are in 01-12 and whose last four digits are in 1800-3999.

7) The bank card verification code screening template may be: the column name, converted to uppercase, matches the keyword CvN, and the value contains 3 digits.

The specific types of the target information screening templates above can be adjusted to the application scenario and are not limited here.
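To make the prose templates concrete, template 3) can be written as a regular expression. The pattern below is one reading of that description using Python's `re` module, not the expression used by the patent's authors; the names `MOBILE_PATTERN` and `is_mobile_number` are hypothetical, and requiring the whole value to match (`fullmatch`) rather than merely contain the pattern is a design assumption.

```python
import re

# One possible regular expression for the mobile phone number template (item 3).
MOBILE_PATTERN = re.compile(
    r"(?:(?:\+|00)86)?"   # optional prefix: plus sign or 00, followed by 86
    r"(?:"
    r"1[38]\d"            # 13 or 18 followed by any one digit
    r"|14[579]"           # 14 followed by 5, 7 or 9
    r"|1[59][0-35-9]"     # 15 or 19 followed by a digit other than 4
    r"|166"               # 166
    r"|17[0135-8]"        # 17 followed by a digit other than 2, 4 and 9
    r")"
    r"\d{8}"              # any 8 digits
)


def is_mobile_number(value: str) -> bool:
    """Return True if the whole (stripped) value matches the template."""
    return MOBILE_PATTERN.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    for candidate in ["13812345678", "+8613812345678", "15412345678"]:
        print(candidate, is_mobile_number(candidate))
```

Using `search` instead of `fullmatch` would also flag phone numbers embedded in longer text, at the cost of more false positives on long digit strings.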

Optionally, in another optional embodiment of the present invention, the data structure of the target information screening template may include: the sequence number of the screening rule, a regular expression, a sensitivity threshold coefficient, a name, and a column-name keyword rule. The sensitivity threshold coefficient can be used to calculate the preset ratio described below. The regular expression and the column-name keyword rule can be adjusted to the actual application scenario and are not limited here.
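The template structure just listed maps naturally onto a small record type. The sketch below is illustrative only: the class and attribute names, the way the keyword rule and the regular expression are combined (keyword first, then the value check), and the coefficient value 0.5 are assumptions, since the text does not specify them.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ScreeningTemplate:
    seq: int                               # sequence number of the screening rule
    name: str                              # human-readable rule name
    regex: str                             # regular expression applied to the value
    threshold_coefficient: float           # sensitivity threshold coefficient
    column_keywords: Tuple[str, ...] = ()  # column-name keyword rule (may be empty)

    def matches(self, column_name: str, value: str) -> bool:
        """Check the column-name keyword rule (if any), then the value regex."""
        if self.column_keywords:
            upper = column_name.upper()
            # Keywords are compared after uppercase conversion, as in items 5) to 7).
            if not any(k.upper() in upper for k in self.column_keywords):
                return False
        return re.fullmatch(self.regex, value) is not None


# Example instance modelled on item 7); the coefficient 0.5 is an arbitrary placeholder.
CVN_TEMPLATE = ScreeningTemplate(
    seq=7,
    name="bank card verification code",
    regex=r"\d{3}",
    threshold_coefficient=0.5,
    column_keywords=("CvN",),
)

if __name__ == "__main__":
    print(CVN_TEMPLATE.matches("card_cvn2", "123"))
```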

S104. When the ratio of the number to the total number of pieces of field information under the target field type is greater than the preset ratio, acquire the position information, in the database, of the field file under the target field type, to obtain first position information. The field file includes at least one piece of field information.

Here, the field file may be a non-empty data table in the database that contains field information under the target field type. The first position information may include the address of the server where the database is located, the database address, the database instance name, and the like.

Optionally, in one optional embodiment of the present invention, when the ratio of the number to the total number of pieces of field information under the target field type is greater than the preset ratio, the field information counted in that number may be determined to be sensitive data corresponding to the matching screening rule in the target information screening template.

S105. Search the database based on the first position information to obtain first target information.

The present invention searches the sampling data table for field information that matches the target information screening template, acquires the position information of that field information in the database, and searches the database based on the position information to obtain the target information. By screening target information field by field, the invention can screen database information efficiently, meeting the need for large-scale batch processing of database information.

Optionally, performing information detection on each piece of field information under the target field type based on the target information screening template specifically includes:

When the target information screening template is keyword information, acquire the field name of the field information under the target field type and determine whether the field name contains the keyword information; if so, the field information corresponding to the field name is determined to match the target information screening template.

When the target information screening template is character combination information, determine whether the field information under the target field type contains the character combination information; if so, the field information under the target field type is determined to match the target information screening template.

Optionally, the method further includes:

When the ratio of the number to the total number of pieces of field information under the target field type is greater than 0 and less than the preset ratio, acquire the position information, in the database, of the field information under the target field type, to obtain second position information;

Search the database based on the second position information to obtain second target information.
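The two ratio branches (S104 and the optional step above) can be summarised in a few lines of Python. This is a sketch under assumptions: the function name and the string labels are hypothetical, and because the text covers neither a ratio exactly equal to the preset ratio nor a ratio of 0, both cases are returned here as "none".

```python
def classify_column(match_count: int, total_count: int, preset_ratio: float) -> str:
    """Decide which branch applies to a target field type.

    Returns "first"  when ratio > preset_ratio      (locate the whole field file),
            "second" when 0 < ratio < preset_ratio  (locate individual field information),
            "none"   otherwise (no matches, or ratio exactly equal to the preset ratio).
    """
    if total_count <= 0:
        raise ValueError("a target field type has at least one piece of field information")
    ratio = match_count / total_count
    if ratio > preset_ratio:
        return "first"
    if 0 < ratio < preset_ratio:
        return "second"
    return "none"


if __name__ == "__main__":
    print(classify_column(8, 10, 0.5))
    print(classify_column(2, 10, 0.5))
```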

可选的,在本发明的一个可选实施例中,在获得上述第二目标信息后,根据第二目标信息、与该第二目标信息对应的字段信息所在的字段文件和字段信息生成临时结果文件,并对该临时结果文件进行存储。Optionally, in an optional embodiment of the present invention, after obtaining the second target information, a temporary result is generated according to the second target information, the field file where the field information corresponding to the second target information is located, and the field information. file and store the temporary result file.

Optionally, the above method further includes:

Generating a first information screening report table based on first target information parameters; the first target information parameters are one or more of: the instance name of the database, the IP address of the database, the sampling batch to which the sampling data table belongs, the user name corresponding to the sampling data table, the table name of the sampling data table, the field names of the field information matching the target information screening template, the number of pieces of field information matching the target information screening template, the name of the first target information, the preset ratio, and the first position information.

Generating a second information screening report table based on second target information parameters; the second target information parameters are one or more of: the instance name of the database, the IP address of the database, the sampling batch to which the sampling data table belongs, the user name corresponding to the sampling data table, the table name of the sampling data table, the field names of the field information matching the target information screening template, the number of pieces of field information matching the target information screening template, the name of the second target information, the preset ratio, and the second position information.

Optionally, the above method further includes:

When the sampling data table has at least one piece of record information, searching the sampling data table for field types that have no field information to obtain reference field types;

Generating a third information screening report table based on third target information parameters; the third target information parameters are one or more of: the instance name of the database, the IP address of the database, the sampling batch to which the sampling data table belongs, the user name corresponding to the sampling data table, the table name of the sampling data table, the field names corresponding to the reference field types, and the number of non-empty pieces of record information;

When the sampling data table has no record information, generating a fourth information screening report table based on fourth target information parameters; the fourth target information parameters are one or more of: the instance name of the database, the IP address of the database, the sampling batch to which the sampling data table belongs, the user name corresponding to the sampling data table, the table name of the sampling data table, and the number of non-empty pieces of record information.
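The split between target field types (at least one piece of field information), reference field types (no field information), and an empty sampling data table can be sketched as below. The function name `classify_field_types`, the dictionary layout, and the choice to treat both `None` and the empty string as "no field information" are assumptions for illustration only.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def classify_field_types(columns: List[str], rows: List[Row]) -> Dict[str, object]:
    """Split the columns of a sample table into target and reference field types."""
    if not rows:
        # No record information at all: only the fourth report applies.
        return {"empty_table": True, "target": [], "reference": []}
    target: List[str] = []
    reference: List[str] = []
    for column in columns:
        # Assumption: None and "" both count as "no field information".
        has_value = any(row.get(column) not in (None, "") for row in rows)
        (target if has_value else reference).append(column)
    return {"empty_table": False, "target": target, "reference": reference}
```

Under this reading, the `reference` list feeds the third report table, and `empty_table` being true triggers the fourth.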

Optionally, the method for generating the sampling data table specifically includes:

Obtaining all data tables corresponding to the user name in an Oracle database to obtain a first data table set;

Selecting a data table from the first data table set, and calculating the ratio of the storage size of one piece of record information in the selected data table to the storage size of the selected data table to obtain a storage ratio;

Determining a multiple of the sampling number such that the sampling ratio calculated from the multiple falls within a preset sampling value range; the preset sampling value range is the value range accepted by the sampling function of the Oracle database; the sampling ratio is the product of the multiple and the sampling number, multiplied by the storage ratio;

Using the sampling function of the Oracle database, collecting from the selected data table a number of pieces of record information equal to the multiple times the sampling number to obtain first sampling record information;

Generating the sampling data table based on the sampling number of pieces of record information selected from the first sampling record information.
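The sampling-ratio arithmetic above can be worked through in a short sketch. The text does not say how the multiple is chosen, so the strategy here (start from a preferred multiple and clamp the result into the permitted range) is an assumption, as are the function name `choose_multiple` and its parameters. The default range reflects Oracle's SAMPLE clause being documented as taking a percentage from 0.000001 up to, but not including, 100; for that reason the product is scaled by 100, which is also an assumption about units.

```python
from typing import Tuple


def choose_multiple(sample_count: int, row_bytes: int, table_bytes: int,
                    preferred_multiple: float = 2.0,
                    low: float = 0.000001,
                    high: float = 100.0) -> Tuple[float, float]:
    """Return (multiple, sampling percentage) with the percentage in [low, high)."""
    if sample_count <= 0 or row_bytes <= 0 or table_bytes < row_bytes:
        raise ValueError("sizes and sample count must be positive and consistent")
    # Storage ratio: size of one record divided by the size of the whole table.
    storage_ratio = row_bytes / table_bytes
    # Percentage of the table that `sample_count` rows represent (multiple == 1).
    unit_percent = sample_count * storage_ratio * 100.0
    # Largest six-decimal value below the exclusive upper bound.
    upper = high - low
    percent = min(max(preferred_multiple * unit_percent, low), upper)
    return percent / unit_percent, percent
```

With 5000 samples, 100-byte records, and a 100,000,000-byte table, the storage ratio is 0.000001, so a multiple of 2 gives a sampling percentage of about 1. For a table of only 1000 such records, the preferred multiple would overshoot, and the sketch shrinks it to roughly 0.2.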

Here, the user name is the identifying name of a user who is permitted to modify the data tables in the database.

Optionally, in an optional embodiment of the present invention, obtaining all data tables corresponding to the user name in the Oracle database to obtain the first data table set may specifically be implemented as follows:

Obtaining the range of the user schema list and the data tables that the user name can modify.

According to the range of the user schema list, querying the view in the database that relates users to data tables and obtaining, in a loop, the range of data tables under each user schema in the database. This range of data tables is the first data table set described above.

Optionally, in another optional embodiment of the present invention, collecting the multiple of the sampling number of pieces of record information from the selected data table with the sampling function of the Oracle database to obtain the first sampling record information may be implemented by sampling data with the sampling function at the above sampling ratio, and then taking the first pieces of record information in the obtained sample data, up to the sampling number, as the above first sampling record information. For example, when the sampling number is 5000, the first 5000 of 10000 pieces of data from the data table are taken as the above record information.

The sampling function may be the sampling function sample(ratio) built into the Oracle database.
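One way to express "sample at the computed ratio, then keep the first records up to the sampling number" as a single statement is sketched below. This is an illustration, not the statement used by the applicant: the helper name `build_oracle_sample_query`, the use of `ROWNUM` to keep the first rows, the six-decimal formatting, and the identifier check are all assumptions.

```python
import re

# Conservative pattern for unquoted identifiers (an assumption for safety).
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_$#]*")


def build_oracle_sample_query(schema: str, table: str,
                              percent: float, sample_count: int) -> str:
    """Build a SELECT that samples `percent` of a table and keeps the first rows."""
    for name in (schema, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # Validate the value as it will be printed, so rounding cannot yield 100.
    rounded = round(percent, 6)
    if not (0.000001 <= rounded < 100.0):
        raise ValueError("percent outside the assumed SAMPLE range")
    if sample_count <= 0:
        raise ValueError("sample_count must be positive")
    return (f"SELECT * FROM {schema}.{table} SAMPLE ({rounded:.6f}) "
            f"WHERE ROWNUM <= {sample_count}")
```

Because SAMPLE returns an approximate share of the rows, over-sampling by a multiple and then trimming to the sampling number makes it more likely that enough rows come back.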

Optionally, the method for generating the sampling data table specifically includes:

Obtaining all data tables corresponding to the user name in a MySQL database to obtain a second data table set;

Selecting a data table from the second data table set, and exporting a preset number of pieces of record information from the selected data table based on a data export instruction of the MySQL database to obtain second record information;

Generating the sampling data table based on the second record information.
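The text does not name the MySQL data export instruction. As a stand-in, the sketch below builds a plain `SELECT ... LIMIT` statement that returns a preset number of records; the helper name `build_mysql_sample_query` and the choice of `LIMIT` are assumptions, and an export tool could serve the same purpose.

```python
def build_mysql_sample_query(database: str, table: str, preset_count: int) -> str:
    """Build a SELECT that returns at most `preset_count` rows from one table."""
    if preset_count <= 0:
        raise ValueError("preset_count must be positive")

    def quote(name: str) -> str:
        # Backtick-quote an identifier, doubling any embedded backticks.
        if not name or "\x00" in name:
            raise ValueError(f"invalid identifier: {name!r}")
        return "`" + name.replace("`", "``") + "`"

    return f"SELECT * FROM {quote(database)}.{quote(table)} LIMIT {preset_count}"
```

Without an `ORDER BY`, which rows come back is up to the server, so this yields a convenience sample rather than a random one.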

Corresponding to the above embodiments of the database information screening method, the present invention further provides a database information screening system which, as shown in FIG. 2, includes:

A data acquisition module 201, configured to acquire a sampling data table. The sampling data table is generated based on sampling information of a database.

A field determination module 202, configured to, when the sampling data table has at least one piece of record information, search the sampling data table for field types that have at least one piece of field information to obtain the target field type.

An information detection module 203, configured to perform information detection on each piece of field information under the target field type based on the target information screening template, and to obtain the number of pieces of field information matching the target information screening template.

A first position determination module 204, configured to, when the ratio of that number to the total number of pieces of field information under the target field type is greater than a preset ratio, obtain the position information, in the database, of the field file under the target field type to obtain first position information. The field file includes at least one piece of field information.

A data extraction module 205, configured to search the database based on the first position information to obtain first target information.

Optionally, the information detection module 203 is specifically configured to:

When the target information screening template is keyword information, obtain the field name of the field information under the target field type; determine whether the field name contains the keyword information; and if so, determine that the field information corresponding to the field name matches the target information screening template.

When the target information screening template is character combination information, determine whether the field information under the target field type contains the character combination information; and if so, determine that the field information under the target field type matches the target information screening template.

Optionally, the above database information screening system further includes:

A second position determination module, configured to, when the ratio of the number of matching pieces of field information to the total number of pieces of field information under the target field type is greater than 0 and less than the preset ratio, obtain the position information, in the database, of the field information under the target field type to obtain second position information;

and to search the database based on the second position information to obtain second target information.

Optionally, the above database information screening system further includes:

A first report generation sub-module, configured to generate a first information screening report table based on first target information parameters. The first target information parameters are one or more of: the instance name of the database, the IP address of the database, the sampling batch to which the sampling data table belongs, the user name corresponding to the sampling data table, the table name of the sampling data table, the field names of the field information matching the target information screening template, the number of pieces of field information matching the target information screening template, the name of the first target information, the preset ratio, and the first position information.

A second report generation sub-module, configured to generate a second information screening report table based on second target information parameters. The second target information parameters are one or more of: the instance name of the database, the IP address of the database, the sampling batch to which the sampling data table belongs, the user name corresponding to the sampling data table, the table name of the sampling data table, the field names of the field information matching the target information screening template, the number of pieces of field information matching the target information screening template, the name of the second target information, the preset ratio, and the second position information.

Optionally, the field determination module 202 is further configured to:

When the sampling data table has at least one piece of record information, search the sampling data table for field types that have no field information to obtain reference field types.

Generate a third information screening report table based on third target information parameters. The third target information parameters are one or more of: the instance name of the database, the IP address of the database, the sampling batch to which the sampling data table belongs, the user name corresponding to the sampling data table, the table name of the sampling data table, the field names corresponding to the reference field types, and the number of non-empty pieces of record information.

When the sampling data table has no record information, generate a fourth information screening report table based on fourth target information parameters. The fourth target information parameters are one or more of: the instance name of the database, the IP address of the database, the sampling batch to which the sampling data table belongs, the user name corresponding to the sampling data table, the table name of the sampling data table, and the number of non-empty pieces of record information.

Optionally, the above database information screening system further includes:

A data table generation module, configured to obtain all data tables corresponding to the user name in an Oracle database to obtain a first data table set.

Select a data table from the first data table set, and calculate the ratio of the storage size of one piece of record information in the selected data table to the storage size of the selected data table to obtain a storage ratio.

Determine a multiple of the sampling number such that the sampling ratio calculated from the multiple falls within a preset sampling value range. The preset sampling value range is the value range accepted by the sampling function of the Oracle database. The sampling ratio is the product of the multiple and the sampling number, multiplied by the storage ratio.

Using the sampling function of the Oracle database, collect from the selected data table a number of pieces of record information equal to the multiple times the sampling number to obtain first sampling record information.

Generate the sampling data table based on the sampling number of pieces of record information selected from the first sampling record information.

Optionally, the data table generation module is further configured to:

Obtain all data tables corresponding to the user name in a MySQL database to obtain a second data table set.

Select a data table from the second data table set, and export a preset number of pieces of record information from the selected data table based on a data export instruction of the MySQL database to obtain second record information.

Generate the sampling data table based on the second record information.

An embodiment of the present invention provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, it implements the database information screening method according to any one of the above.

An embodiment of the present invention provides an electronic device, including:

At least one processor, and at least one memory and a bus connected to the processor;

The processor and the memory communicate with each other through the bus; the processor is configured to call program instructions in the memory to execute the database information screening method according to any one of the above.

As shown in FIG. 3, the electronic device 30 includes at least one processor 301, and at least one memory 302 and a bus 303 connected to the processor 301; the processor 301 and the memory 302 communicate with each other through the bus 303; the processor 301 is configured to call program instructions in the memory 302 to execute the above database information screening method. The electronic device herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.

The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the steps of the above database information screening method.

The present application is described with reference to flowcharts and/or block diagrams of methods, systems, and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

In a typical configuration, a device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.

The memory may include forms of computer-readable media such as non-persistent memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip. Memory is an example of a computer-readable medium.

Computer-readable media include persistent and non-persistent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

Those skilled in the art will appreciate that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. It should also be noted that the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The embodiments in this specification are described in a related manner; for parts that are the same or similar across embodiments, reference may be made from one to another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for related details, refer to the corresponding description of the method embodiment.

The above are merely embodiments of the present application and are not intended to limit it. Various modifications and variations of the present application are possible for those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Claims (10)

1. A database information screening method is characterized by comprising the following steps:
acquiring a sampling data table; the sampling data table is generated based on sampling information of a database;
when at least one piece of record information exists in the sampling data table, searching and obtaining a field type with at least one piece of field information from the sampling data table to obtain a target field type;
performing information detection on each field information under the target field type based on a target information screening template to obtain the number of the field information matched with the target information screening template;
when the ratio of the number to the total number of the field information under the target field type is greater than a preset ratio, acquiring the position information of the field file under the target field type in the database to obtain first position information; the field file comprises at least one field information;
and searching the database based on the first position information to obtain first target information.
2. The database information screening method according to claim 1, wherein the information detection of each field information under the target field type based on the target information screening template specifically includes:
when the target information screening template is the keyword information, acquiring the field name of the field information under the target field type; judging whether the field name contains the keyword information or not; if yes, detecting to obtain that the field information corresponding to the field name is matched with the target information screening template;
when the target information screening template is character combination information, judging whether the field information under the target field type contains the character combination information; and if so, detecting that the field information under the target field type is matched with the target information screening template.
3. The database information screening method according to claim 1, further comprising:
when the ratio of the number to the total number of the field information under the target field type is greater than 0 and smaller than a preset ratio, acquiring the position information of the field information under the target field type in the database to obtain second position information;
and searching the database to obtain second target information based on the second position information.
4. The database information screening method according to claim 3, further comprising:
generating a first information screening report table based on the first target information parameter; the first target information parameter is one or more of an instance name of the database, an IP address of the database, a sampling batch to which the sampling data table belongs, a user name corresponding to the sampling data table, a table name of the sampling data table, a field name of field information matched with the target information screening template, the number of the field information matched with the target information screening template, a name of first target information, a preset ratio and first position information;
generating a second information screening report table based on the second target information parameter; the second target information parameter is one or more of an instance name of the database, an IP address of the database, a sampling batch to which the sampling data table belongs, a user name corresponding to the sampling data table, a table name of the sampling data table, a field name of field information matched with the target information screening template, the number of field information matched with the target information screening template, a name of second target information, a preset ratio and second position information.
5. The database information screening method according to claim 1, further comprising:
when at least one piece of record information exists in the sampling data table, searching and obtaining a field type without field information from the sampling data table to obtain a reference field type;
generating a third information screening report table based on the third target information parameter; the third target information parameter is one or more of an instance name of the database, an IP address of the database, a sampling batch to which the sampling data table belongs, a user name corresponding to the sampling data table, a table name of the sampling data table, a field name corresponding to the reference field type, and the number of non-empty record information;
when the sampling data table does not have the record information, generating a fourth information screening report table based on the fourth target information parameter; the fourth target information parameter is one or more of an instance name of the database, an IP address of the database, a sampling batch to which the sampling data table belongs, a user name corresponding to the sampling data table, a table name of the sampling data table, and the number of non-empty record information.
6. The database information screening method according to claim 1, wherein the method for generating the sampling data table specifically includes:
acquiring all data tables corresponding to the user name in an oracle database to obtain a first data table set;
selecting a data table from the first data table set, and calculating the ratio of the storage size of a piece of record information in the selected data table to the storage size of the selected data table to obtain a storage ratio;
determining a multiple of the sampling number so that a sampling proportion calculated based on the multiple is within a preset sampling value range; the preset sampling value range is the sampling value range of the sampling function of the oracle database; the sampling proportion is the product of the multiple and the sampling number multiplied by the storage ratio;
collecting record information of the multiple sampling number from the selected data table based on the sampling function of the oracle database to obtain first sampling record information;
generating the sampling data table based on the record information of the sampling number selected from the first sampling record information.
7. The database information screening method according to claim 1, wherein the method for generating the sampling data table specifically includes:
acquiring all data tables corresponding to the user name in the mysql database to obtain a second data table set;
selecting a data table from the second data table set, and exporting a preset number of record information from the selected data table based on a data export instruction of the mysql database to obtain second record information;
and generating the sampling data table based on the second record information.
8. A database information screening system, comprising:
a data acquisition module, configured to acquire a sampling data table, the sampling data table being generated based on sampling information of a database;
a field determining module, configured to, when at least one piece of record information exists in the sampling data table, search the sampling data table for a field type having at least one piece of field information, to obtain a target field type;
an information detection module, configured to perform information detection on each piece of field information under the target field type based on a target information screening template, to obtain the number of pieces of field information matching the target information screening template;
a first position determining module, configured to, when the ratio of the number to the total number of pieces of field information under the target field type is greater than a preset ratio, acquire the position information in the database of the field file under the target field type, to obtain first position information, the field file comprising at least one piece of field information;
and a data extraction module, configured to search the database based on the first position information to obtain first target information.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a program which, when executed by a processor, implements the database information screening method according to any one of claims 1 to 7.
10. An electronic device, comprising:
at least one processor, and at least one memory and a bus connected to the processor;
the processor and the memory communicate with each other through the bus; the processor is configured to call the program instructions in the memory to execute the database information screening method of any one of claims 1 to 7.
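The flow recited in claims 7 and 8 (build a sampling table from a preset number of records, keep only field types that hold at least one value, count template matches, compare the ratio with a preset ratio, then use the field's location to read the target information) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the claimed implementation: an in-memory SQLite database stands in for the Oracle or MySQL database, the function names `build_sample`, `screen_fields` and `extract_targets` are hypothetical, the screening template is a hypothetical regular expression for 11-digit mobile numbers, and the "position information" is modelled simply as a (table, column) pair.

```python
import re
import sqlite3

# Hypothetical screening template: an 11-digit mobile number (assumption).
PHONE_TEMPLATE = re.compile(r"1[3-9]\d{9}")


def build_sample(conn, table, preset_number):
    """Take a preset number of records from one table (stand-in for a database export)."""
    cur = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (preset_number,))
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


def screen_fields(columns, rows, template, preset_ratio):
    """Return the columns whose share of template-matching values exceeds preset_ratio."""
    hits = []
    if not rows:  # proceed only when the sample holds at least one record
        return hits
    for idx, name in enumerate(columns):
        # Field information = non-empty values found under this field type.
        values = [str(r[idx]) for r in rows if r[idx] not in (None, "")]
        if not values:  # skip field types with no field information
            continue
        matched = sum(1 for v in values if template.fullmatch(v))
        if matched / len(values) > preset_ratio:
            hits.append(name)
    return hits


def extract_targets(conn, table, column):
    """Use the (table, column) location to read the target information from the full table."""
    return [r[0] for r in conn.execute(f'SELECT "{column}" FROM "{table}"')]


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, phone TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Li", "13800000001", None),
        ("Wang", "13900000002", None),
        ("Zhao", "n/a", None),
        ("Chen", "15000000004", None),
    ],
)

columns, rows = build_sample(conn, "customers", preset_number=3)
sensitive = screen_fields(columns, rows, PHONE_TEMPLATE, preset_ratio=0.5)
targets = {c: extract_targets(conn, "customers", c) for c in sensitive}
```

In this sketch the `note` column is skipped because it holds no values, and only the sample is scanned against the template; the full table is read once, and only for columns that passed the ratio test.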
CN202210181495.7A 2022-02-25 2022-02-25 Database information screening method, system, storage medium and electronic equipment Pending CN114547109A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210181495.7A CN114547109A (en) 2022-02-25 2022-02-25 Database information screening method, system, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210181495.7A CN114547109A (en) 2022-02-25 2022-02-25 Database information screening method, system, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114547109A true CN114547109A (en) 2022-05-27

Family

ID=81680433

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210181495.7A Pending CN114547109A (en) 2022-02-25 2022-02-25 Database information screening method, system, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114547109A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115310514A (en) * 2022-07-05 2022-11-08 上海淇毓信息科技有限公司 Method and device for identifying target type data in mass data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105825138A (en) * 2015-01-04 2016-08-03 北京神州泰岳软件股份有限公司 Sensitive data identification method and device
CN110502515A (en) * 2019-08-15 2019-11-26 中国平安财产保险股份有限公司 Collecting method, device, equipment and computer readable storage medium
CN111783126A (en) * 2020-07-21 2020-10-16 支付宝(杭州)信息技术有限公司 A privacy data identification method, apparatus, device and readable medium

Similar Documents

Publication Publication Date Title
CN109510737B (en) Protocol interface testing method and device, computer equipment and storage medium
JP6599906B2 (en) Login account prompt
US10216848B2 (en) Method and system for recommending cloud websites based on terminal access statistics
CN112579623B (en) Method, device, storage medium and equipment for storing data
CN108009223B (en) Method and device for detecting consistency of transaction data
KR102110642B1 (en) Password protection question setting method and device
CN110807487A (en) Method and device for identifying user based on domain name system flow record data
CN105205365B (en) Registration and authentication method and device for biological characteristic information
CN114297719A (en) Data desensitization method and device, storage medium and electronic equipment
CN114707180A (en) Log desensitization method and device
CN113220187B (en) Micro banking business interaction method and related equipment
CN114547109A (en) Database information screening method, system, storage medium and electronic equipment
CN113127767B (en) Mobile phone number extraction method and device, electronic equipment and storage medium
CN102523286A (en) Method and device for obtaining credit degree of service
CN112835903A (en) Sensitive data identification method and equipment
CN115098738B (en) Business data extraction method, device, storage medium and electronic device
CN110019357B (en) Database query script generation method and device
CN113641703B (en) Customer data query method and device, electronic device, and storage medium
CN111753020A (en) A method and device for establishing a relationship network model
CN110858852B (en) Method and device for acquiring registered domain name
CN112579534B (en) File screening method and device
CN113987049A (en) Sensitive data discovery processing method and system
CN113486194A (en) Weight-proof method and device for knowledge graph
CN110059272B (en) Page feature recognition method and device
CN110929049B (en) User account identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Country or region after: China
Address after: 1006 and 1008 zhangheng Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai, 201203
Applicant after: UnionPay Business Payment Co.,Ltd.
Address before: No. 1006 and 1008 Zhangheng Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai
Applicant before: CHINA UMS CO.,LTD.
Country or region before: China