CN117556100B

CN117556100B - Audit tag configuration strategy generation method, device, equipment and storage medium

Info

Publication number: CN117556100B
Application number: CN202311583480.4A
Authority: CN
Inventors: 周维聪; 邱添羽; 李航; 史清江
Original assignee: Chinese University of Hong Kong Shenzhen
Current assignee: Chinese University of Hong Kong Shenzhen
Priority date: 2023-11-23
Filing date: 2023-11-23
Publication date: 2025-01-17
Anticipated expiration: 2043-11-23
Also published as: CN117556100A

Abstract

The invention provides a method, a device, equipment and a storage medium for generating an audit tag configuration strategy. The method comprises: obtaining an interception standard document and an audit standard specification document; extracting a plurality of text information and a plurality of picture information based on the audit standard specification document and the interception standard document; generating inquiry instruction information based on each text information and each picture information; and inputting the inquiry instruction information into a preset question-answer model to obtain an audit tag configuration strategy. Because the text information and the picture information are extracted from the two documents, combined into inquiry instruction information, and input into an open-source preset question-answer model, audit tags no longer need to be configured manually according to the standards, which reduces labor and time cost. Because the open-source preset question-answer model has strong understanding and reasoning capabilities, the generated audit tag configuration strategy is also more comprehensive.

Description

Method, device, equipment and storage medium for generating audit tag configuration strategy

Technical Field

The present invention relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating an audit tag configuration policy.

Background

Currently, information on the Internet contains sensitive content such as violence and vulgarity. To purify the network environment and create a clean cyberspace, content risk-control auditing is important. Previously, content risk-control auditing relied purely on manual review, which is time-consuming and labor-intensive.

In recent years, with the development of machine learning and deep learning algorithms, machine auditing has become the mainstay of content risk-control auditing by virtue of its good cost-effectiveness. However, current machine auditing systems for image and text content still rely on a lot of manpower. This is especially true for business (B-end) customers: for different customers and different scenarios, experienced configurators must be assigned to manually configure different audit labels after communicating fully with the customer, so as to obtain a good interception effect. Manually configuring audit labels is costly in both manpower and time, and it requires the configurator to understand the capability of each audit label accurately and in depth; once that understanding is biased, the final audit effect is affected.

Disclosure of Invention

The invention provides a method, a device, equipment and a storage medium for generating an audit label configuration strategy, and aims to solve the technical problems that manually configuring audit labels is costly in labor and time, that configurators are required to understand the capability of each audit label accurately and in depth, and that the final audit effect is affected once that understanding deviates.

The invention provides a method for generating an audit tag configuration strategy, which comprises the following steps:

acquiring an interception standard document of a target client and an auditing standard specification document of a product;

Based on the auditing standard specification document and the interception standard document, extracting a plurality of text information and a plurality of picture information;

generating inquiry instruction information based on each text information and each picture information;

And inputting the inquiry instruction information into a preset question-answer model to obtain an audit tag configuration strategy output by the preset question-answer model.

According to the method for generating the audit tag configuration strategy provided by the invention, the method for extracting a plurality of text information and a plurality of picture information based on the audit standard specification document and the interception standard document comprises the following steps:

Sequentially carrying out text extraction and picture extraction on the auditing standard specification document and the interception standard document to obtain a plurality of text extraction information and picture information associated with the text extraction information;

and carrying out semantic sentence breaking processing on each text extraction information to obtain each text information.

According to the method for generating the audit tag configuration strategy provided by the invention, the query instruction information is generated based on the text information and the picture information, and the method comprises the following steps:

coding each piece of picture information to obtain picture coding information;

and generating inquiry instruction information in a preset inquiry format based on the picture coding information and the associated text information.

According to the audit tag configuration strategy generation method provided by the invention, the interception standard document and the audit standard specification document comprise one or more of multi-level tag information, definition description information and sample information.

According to the method for generating the audit tag configuration strategy provided by the invention, after the query instruction information is input into the preset question-answer model to obtain the audit tag configuration strategy output by the preset question-answer model, the method further comprises the steps of:

Obtaining training samples of different sensitive content types, wherein the training samples are configured with sample tags corresponding to the audit tag configuration strategies;

Iteratively training a preset content auditing model based on each training sample and each corresponding sample label;

and auditing the content to be audited by using the trained content auditing model to obtain a content auditing result.

According to the method for generating the audit tag configuration strategy provided by the invention, before the query instruction information is input into a preset question-answer model to obtain the audit tag configuration strategy output by the preset question-answer model, the method further comprises the steps of:

acquiring a plurality of different auditing topics;

for any one of the auditing topics, repeatedly inputting the auditing topic into the preset question-answering model to obtain a plurality of pieces of first feedback information output by the preset question-answering model;

Ranking the first feedback information to train a reward model according to ranking results;

inputting the auditing questions into the preset question-answering model to obtain second feedback information output by the preset question-answering model;

Scoring the second feedback information using the trained reward model;

and carrying out parameter adjustment on the preset question-answer model based on the scoring result and the auditing questions.

Acquiring a plurality of inquiry questions with different interception standards and audits, wherein each inquiry question is configured with answer information;

and carrying out parameter adjustment on the preset question-answer model based on each question and the corresponding answer information.

The invention also provides a device for generating the audit tag configuration strategy, which comprises:

The acquisition module is used for acquiring the interception standard document of the target client and the auditing standard specification document of the product;

the extraction module is used for extracting a plurality of text information and a plurality of picture information based on the auditing standard specification document and the interception standard document;

the generation module is used for generating inquiry instruction information based on the text information and the picture information;

And the label configuration module is used for inputting the inquiry instruction information into a preset question-answer model to obtain an audit label configuration strategy output by the preset question-answer model.

The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes any one of the audit tag configuration policy generation methods when executing the program.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an audit tag configuration policy generation method as described in any of the above.

The invention also provides a computer program product comprising a computer program which when executed by a processor implements the audit tag configuration policy generation method as described in any of the above.

The method, the device, the equipment and the storage medium for generating the audit tag configuration strategy provided by the invention comprise: obtaining an interception standard document of a target client and an audit standard specification document of a product; extracting a plurality of text information and a plurality of picture information based on the audit standard specification document and the interception standard document; generating inquiry instruction information based on the text information and the picture information; and inputting the inquiry instruction information into a preset question-answer model to obtain the audit tag configuration strategy output by the preset question-answer model. Text and pictures are extracted from the interception standard document and from the audit standard specification document of the product to obtain a plurality of text information and a plurality of picture information, which are then combined to generate the inquiry instruction information. The inquiry instruction information is input into the open-source preset question-answer model to obtain the audit tag configuration strategy, so the strategy no longer needs to be configured manually according to the standards, and labor and time cost is reduced. Because the understanding and reasoning capabilities of the open-source preset question-answer model are stable, the generated audit tag configuration strategy is comprehensive, and cases in which a deviation in manual configuration degrades the audit effect are reduced.

Drawings

In order to more clearly illustrate the invention or the technical solutions in the prior art, the drawings that are used in the description of the embodiments or the prior art will be briefly described one by one, it being obvious that the drawings in the description below are some embodiments of the invention, and that other drawings can be obtained from these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of an audit tag configuration policy generation method provided by the present invention;

FIG. 2 is a complete schematic diagram of an audit tag configuration policy generation method provided by the present invention;

FIG. 3 is a schematic diagram of the configuration of audit tag configuration policy generating device provided by the present invention;

Fig. 4 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The terminology used in the one or more embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the invention. As used in one or more embodiments of the invention, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present invention refers to and encompasses any or all possible combinations of one or more of the associated listed items.

It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the invention to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the invention. Depending on the context, the word "if" as used herein may be interpreted as "at the time of" or "when".

In recent years, with the development of machine learning and deep learning algorithms, machine auditing has become the mainstay of content risk-control auditing by virtue of its good cost-effectiveness. However, current machine auditing systems for image and text content still rely on a lot of manpower. This is especially true for business (B-end) customers: for different customers and different scenarios, experienced configurators must be assigned to manually configure different audit tag configuration strategies after communicating fully with the customer, so as to obtain a good interception effect. This is costly in both manpower and time, and it requires the configurator to understand each audit tag configuration strategy accurately and in depth; once the configuration deviates, the final audit effect is affected.

In view of the above problems, the present invention provides the following embodiments. Specifically, fig. 1 is a schematic flow chart of an audit tag configuration policy generation method provided by the present invention. As shown in fig. 1, the method for generating the audit tag configuration policy includes:

Step S11, acquiring an interception standard document of a target client and an audit standard specification document of a product;

It should be noted that, the intercepting standard document of the client mainly focuses on the needs, expectations and standards of the client. This may include customer specifications, requirements, and standardized procedures to ensure that the product or service meets their expectations. The customer's interception criteria documents are focused on customer and market needs and may relate to stakeholders associated with the customer, sales, and market, etc.

It should be noted that, the audit standard specification document of the product focuses on the specification, quality standard, manufacturing process of the product and meets the requirements of specific industries or regulations. The audit standard specification document of the product may be formulated by an internal department such as a product development, quality control or manufacturing department.

It should be noted that the client may fill in a document according to a specified format, and the policy configurator then checks it and organizes it into an interception standard document and an audit standard specification document of a certain form. The interception standard document and the audit standard specification document may optionally include one or more of multi-level tag information, definition description information and sample information. The multi-level tag information characterizes the type tag of the sensitive content, the definition description information gives the detailed description of that type of sensitive content, and the sample information gives a detailed example or sample picture. Optionally, the document form may be as shown in Table 1.

TABLE 1
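The record layout described above (multi-level tag information, definition description information and sample information) can be sketched as a small data structure. This is only an illustrative schema and does not reproduce Table 1; the class name `StandardEntry` and all of its field names are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StandardEntry:
    """One entry of a standard document (hypothetical schema, not from the source)."""
    level1_label: str                 # first-level tag, e.g. "advertisement"
    level2_label: str                 # second-level tag, e.g. "mobile phone logo"
    definition: str                   # definition description information
    sample_texts: List[str] = field(default_factory=list)        # textual samples
    sample_image_paths: List[str] = field(default_factory=list)  # sample pictures

    def tag_path(self) -> str:
        """Join the tag levels into a single multi-level tag name."""
        return f"{self.level1_label}-{self.level2_label}"


entry = StandardEntry(
    level1_label="advertisement",
    level2_label="mobile phone logo",
    definition="logo of a mobile phone brand",
    sample_image_paths=["samples/logo_01.png"],  # placeholder path
)
print(entry.tag_path())
```

Keeping the tag levels as separate fields makes it straightforward to read the entries in the order first-level label, second-level label, definition, samples.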

Step S12, extracting a plurality of text information and a plurality of picture information based on the auditing standard specification document and the interception standard document;

Specifically, since the interception standard document and the audit standard specification document of the product are organized according to a certain form, in this embodiment text recognition and picture recognition can be performed in sequence on the audit standard specification document and the interception standard document, so as to obtain a plurality of text extraction information and the picture information associated with the text extraction information. For example, text extraction information such as "advertisement", "competitor product", "trademark, logo and other content" and "product A", together with the corresponding trademark and logo sample picture information, is recognized in the order of first-level label, second-level label, definition description information and sample information. Semantic sentence-breaking processing is then performed on each text extraction information to obtain each text information.
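As a rough illustration of the sentence-breaking step, the sketch below splits extracted text at sentence-ending punctuation. It is a simple punctuation-based stand-in, not the semantic sentence breaking of the embodiment, which the source does not specify; the function name `split_sentences` is an assumption.

```python
import re
from typing import List

# Split right after CJK sentence punctuation, or at whitespace that follows
# ASCII sentence punctuation (so decimals such as "0.5" are left intact).
_BOUNDARY = re.compile(r"(?<=[。！？；])|(?<=[.!?;])\s+")


def split_sentences(text: str) -> List[str]:
    """Break extracted text into sentence-like units (punctuation-based stand-in)."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


units = split_sentences("Intercept logos. Intercept trademarks; also goods.")
print(units)
```

A production system would more likely use a trained segmenter, since punctuation alone cannot detect semantic boundaries inside long clauses.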

Step S13, generating inquiry instruction information based on the text information and the picture information;

Specifically, each piece of picture information is encoded to obtain picture coding information; for example, the picture information is encoded through a linear transformation layer. Inquiry instruction information in a preset inquiry format is then generated based on the picture coding information and the associated text information. The preset inquiry format can be set according to the actual situation. For example, the preset inquiry format may be:

"Currently available label definitions: picture sexy - revealing clothing is defined as exposure of the female chest; picture advertisement - mobile phone logo is defined as the logo of a mobile phone brand. The picture interception standards of the current customer are: 1. intercept revealing clothing that exposes the female chest, such as [0.123,0.862,0.411.]; 2. intercept all competitor content related to brand A mobile phones, including commodities, trademarks, logos and the like. Please output the suggested audit tag configuration policy; if there is no corresponding tag capability, state this explicitly."
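The encoding and query-assembly steps can be sketched as follows. This is a minimal sketch under stated assumptions: the picture is assumed to be already reduced to a numeric feature vector, the linear transformation is written as y = Wx + b with hand-picked weights, and the names `linear_encode` and `build_query` are hypothetical. The wording of the query follows the example format given above.

```python
from typing import Dict, List, Sequence


def linear_encode(features: Sequence[float],
                  weight: Sequence[Sequence[float]],
                  bias: Sequence[float]) -> List[float]:
    """Apply a linear transformation y = W x + b and round for embedding in text."""
    return [
        round(sum(w * x for w, x in zip(row, features)) + b, 3)
        for row, b in zip(weight, bias)
    ]


def build_query(tag_definitions: Dict[str, str], standards: List[str]) -> str:
    """Assemble inquiry instruction information in the example preset format."""
    defs = "; ".join(f"{tag} is defined as {d}" for tag, d in tag_definitions.items())
    rules = "; ".join(f"{i}. {s}" for i, s in enumerate(standards, start=1))
    return (
        f"Currently available label definitions: {defs}. "
        f"The picture interception standards of the current customer are: {rules}. "
        "Please output the suggested audit tag configuration policy; "
        "if there is no corresponding tag capability, state this explicitly."
    )


# Toy feature vector and weights, chosen only for illustration.
code = linear_encode([1.0, 2.0], weight=[[0.5, 0.0], [0.0, 0.25]], bias=[0.0, 0.1])
query = build_query(
    {"picture advertisement - mobile phone logo": "the logo of a mobile phone brand"},
    [f"intercept brand A logos, such as {code}",
     "intercept all competitor content related to brand A mobile phones"],
)
print(query)
```

Embedding the picture coding information directly in the text keeps the query a single string, which matches the example format; a multimodal model could instead receive the encoding through a separate input.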

Step S14, inputting the inquiry instruction information into a preset question-answer model to obtain an audit tag configuration strategy output by the preset question-answer model.

It should be noted that the preset question-answer model is an open-source large model, for example, a ChatGPT or BERT model. Existing open-source large models can understand human language well. The inquiry instruction information is input into the preset question-answer model, and the audit tag configuration strategy corresponding to the target client, output by the preset question-answer model, is obtained.

In this embodiment, an interception standard document of a target client and an audit standard specification document of a product are obtained; a plurality of text information and a plurality of picture information are extracted based on the audit standard specification document and the interception standard document; inquiry instruction information is generated based on the text information and the picture information; and the inquiry instruction information is input into a preset question-answer model to obtain the audit tag configuration strategy output by the preset question-answer model. Text and pictures are extracted from the interception standard document and from the audit standard specification document of the product to obtain a plurality of text information and a plurality of picture information, which are then combined to generate the inquiry instruction information. The inquiry instruction information is input into the open-source preset question-answer model to obtain the audit tag configuration strategy, so the strategy no longer needs to be configured manually according to the standards, and labor and time cost is reduced. Because the understanding and reasoning capabilities of the open-source preset question-answer model are stable, the generated audit tag configuration strategy is comprehensive, and cases in which a deviation in manually configured audit tags degrades the audit effect are reduced.

In one embodiment of the present invention, the generating the query instruction information based on each of the text information and each of the picture information includes:

Specifically, each piece of picture information is encoded to obtain picture coding information, the picture coding information is weighted, and inquiry instruction information in a preset inquiry format is generated from the weighted picture coding information and the text information associated with it.
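One simple reading of the weighting step is a scalar weight applied to each picture's encoding before it is placed in the query. The source does not say how the weights are chosen or applied, so the scalar form and the function name `weight_encoding` below are assumptions.

```python
from typing import List, Sequence


def weight_encoding(encoding: Sequence[float], weight: float) -> List[float]:
    """Scale a picture encoding by a scalar weight (assumed form of the weighting)."""
    return [round(weight * value, 3) for value in encoding]


weighted = weight_encoding([0.2, 0.4], weight=0.5)
print(weighted)
```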

In an embodiment of the present invention, before the inputting the query instruction information into a preset question-answer model to obtain the audit tag configuration policy output by the preset question-answer model, the method further includes:

The method comprises: obtaining a plurality of different audit questions; for any one of the audit questions, repeatedly inputting the audit question into the preset question-answer model to obtain a plurality of pieces of first feedback information output by the preset question-answer model; ranking the pieces of first feedback information and training a reward model according to the ranking result; inputting the audit question into the preset question-answer model to obtain second feedback information output by the preset question-answer model; scoring the second feedback information using the trained reward model; and adjusting the parameters of the preset question-answer model based on the scoring result and the audit questions.

In practical use, the interception standards of various clients and scenarios are complex, and because the open-source large model is not specifically designed for content auditing scenarios, it needs to be fine-tuned.

Specifically, a plurality of different audit questions are first obtained. For any one of the audit questions, the audit question is repeatedly input into the preset question-answer model to obtain a plurality of pieces of first feedback information output by the model, and a ranking result, produced by manually ranking the pieces of first feedback information corresponding to that audit question, is then obtained. The reward model is trained using the ranking results and the corresponding reward information. This generally involves supervised learning or reinforcement learning methods. In reinforcement learning, a policy-gradient-type algorithm may be used to update the model parameters by maximizing the expected reward, yielding a trained reward model. Next, the audit questions are input into the preset question-answer model to obtain second feedback information output by the model, and the second feedback information is scored using the trained reward model. The scoring result of the reward model is then used to adjust the parameters of the preset question-answer model. This may be done with an optimization algorithm such as gradient descent, updating the model parameters in the direction indicated by the reward.
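To make the reward-model step concrete, the sketch below trains a tiny linear reward model from one manual ranking using a pairwise logistic ranking loss. The pairwise loss is a common choice for learning from rankings and is an assumption here; the source only states that the reward model is trained on the ranking results. Representing each reply by a small numeric feature vector, and the names `ranking_to_pairs`, `reward` and `train_reward_model`, are likewise hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Features = Sequence[float]


def ranking_to_pairs(ranked_best_first: Sequence[Features]) -> List[Tuple[Features, Features]]:
    """Turn a best-first ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_best_first, 2))


def reward(weights: Sequence[float], features: Features) -> float:
    """Linear reward: dot product of weights and reply features."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(pairs: Sequence[Tuple[Features, Features]], dim: int,
                       lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Minimize -log(sigmoid(r(better) - r(worse))) by gradient descent."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            margin = reward(weights, better) - reward(weights, worse)
            # Gradient of the loss w.r.t. weights is -(1 - sigmoid(margin)) * (better - worse).
            step = lr * (1.0 - 1.0 / (1.0 + math.exp(-margin)))
            weights = [w + step * (b - c) for w, b, c in zip(weights, better, worse)]
    return weights


# Three replies to one audit question, manually ranked best first (toy features).
ranked = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
weights = train_reward_model(ranking_to_pairs(ranked), dim=2)
scores = [reward(weights, reply) for reply in ranked]
print(scores)
```

The trained model assigns higher scores to replies that were ranked higher, which is the property the later scoring of second feedback information relies on. The subsequent parameter adjustment of the question-answer model is not shown.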

According to the embodiment of the invention, the preset question-answer model is enabled to learn more information corresponding to the content auditing scene by carrying out parameter adjustment on the preset question-answer model, so that the accuracy of model reply is improved.

In an embodiment of the present invention, the method further includes: acquiring a plurality of inquiry questions with different interception standards and audits, wherein each inquiry question is configured with answer information; and carrying out parameter adjustment on the preset question-answer model based on each inquiry question and the corresponding answer information.

Specifically, a plurality of inquiry questions with different interception standards and audits are obtained, wherein each inquiry question is configured with answer information set manually, and further parameter adjustment is performed on the preset inquiry and answer model based on each inquiry question and the corresponding answer information.
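The question and answer pairs used for this parameter adjustment can be stored in a simple line-oriented format before training. The sketch below serializes them as JSON Lines; the field names `prompt` and `response` and the function name `to_sft_lines` are assumptions, and the fine-tuning procedure itself is not shown because the source does not describe it.

```python
import json
from typing import Iterable, List, Tuple


def to_sft_lines(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Serialize (inquiry question, manually set answer) pairs as JSON Lines."""
    return [
        json.dumps({"prompt": question, "response": answer}, ensure_ascii=False)
        for question, answer in pairs
    ]


lines = to_sft_lines([
    ("Customer standard: intercept brand A phone logos. Which tags apply?",
     "Enable: picture advertisement - mobile phone logo."),
])
print(lines[0])
```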

In one embodiment of the present invention, after the query instruction information is input to a preset question-answer model to obtain an audit tag configuration policy output by the preset question-answer model, the method further includes:

The method comprises the steps of obtaining training samples of different sensitive content types, wherein the training samples are configured with sample tags corresponding to an audit tag configuration strategy, iteratively training a preset content audit model based on each training sample and each corresponding sample tag, and auditing content to be audited by using the trained content audit model to obtain a content audit result.

Specifically, a large number of training samples containing different sensitive content types are first obtained, and these training samples need to be labeled, that is, a corresponding audit tag configuration policy (i.e., a sample tag in this embodiment) is assigned to each training sample to indicate the sensitive content type present in the training sample. And further, based on each training sample and each corresponding sample label, iteratively training a preset content auditing model. In the process of content auditing, the content to be audited is obtained, and the content to be audited is audited by using a trained content auditing model, so that a content auditing result is obtained.
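The iterative training and subsequent auditing can be illustrated with a very small text classifier. The sketch uses a multi-class perceptron over word counts purely as a stand-in; the source does not specify the architecture of the content auditing model, and the function names and sample labels below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Weights = Dict[str, Dict[str, float]]


def featurize(text: str) -> Dict[str, float]:
    """Bag-of-words counts over lower-cased whitespace tokens."""
    feats: Dict[str, float] = defaultdict(float)
    for token in text.lower().split():
        feats[token] += 1.0
    return feats


def _score(label_weights: Dict[str, float], feats: Dict[str, float]) -> float:
    return sum(label_weights.get(f, 0.0) * v for f, v in feats.items())


def train(samples: List[Tuple[str, str]], epochs: int = 5) -> Weights:
    """Iteratively train on (content, sample tag) pairs; needs at least two tags."""
    labels = sorted({tag for _, tag in samples})
    weights: Weights = {tag: defaultdict(float) for tag in labels}
    for _ in range(epochs):
        for text, gold in samples:
            feats = featurize(text)
            scores = {tag: _score(weights[tag], feats) for tag in labels}
            rival = max((t for t in labels if t != gold), key=scores.get)
            if scores[gold] <= scores[rival]:  # wrong or tied: perceptron update
                for f, v in feats.items():
                    weights[gold][f] += v
                    weights[rival][f] -= v
    return weights


def audit(weights: Weights, text: str) -> str:
    """Return the highest-scoring tag for the content to be audited."""
    feats = featurize(text)
    return max(sorted(weights), key=lambda tag: _score(weights[tag], feats))


model = train([
    ("brand a phone logo", "advertisement-mobile phone logo"),
    ("plain landscape photo", "pass"),
])
print(audit(model, "phone logo close-up"))
print(audit(model, "landscape photo"))
```

The sample tags here play the role of the audit tag configuration strategy assigned to each training sample; a real system would train on far more samples and on image as well as text content.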

Referring to fig. 2, fig. 2 is a complete schematic diagram of an audit tag configuration policy generation method provided by the present invention. Specifically, firstly, an interception standard document of a customer and an audit standard specification document of a product which are integrated according to a certain form are obtained, then a plurality of text information and a plurality of picture information are sequentially extracted from the audit standard specification document and the interception standard document, further, inquiry instruction information is generated based on the text information and the picture information, and accordingly the inquiry instruction information is input into a preset question-answering model to obtain an audit tag configuration strategy output by the preset question-answering model.
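The overall flow of steps S11 to S14 can be sketched as a small pipeline. The sketch assumes the two documents have already been parsed into dictionaries with `texts` and `pictures` keys, and it treats the preset question-answer model as an opaque callable; the type alias `QAModel` and the function names are hypothetical, and the stub model at the end only stands in for a real model call.

```python
from typing import Callable, Dict, List, Sequence, Tuple

QAModel = Callable[[str], str]  # hypothetical interface: query text in, policy text out


def extract(documents: Sequence[Dict[str, list]]) -> Tuple[List[str], List[List[float]]]:
    """S12 stand-in: collect text items and picture encodings from parsed documents."""
    texts: List[str] = []
    pictures: List[List[float]] = []
    for doc in documents:
        texts.extend(doc.get("texts", []))
        pictures.extend(doc.get("pictures", []))
    return texts, pictures


def build_query(texts: List[str], pictures: List[List[float]]) -> str:
    """S13 stand-in: combine text and picture information into one query."""
    lines = [f"{i}. {t}" for i, t in enumerate(texts, start=1)]
    if pictures:
        lines.append("Picture encodings: " + "; ".join(str(p) for p in pictures))
    lines.append("Please output the suggested audit tag configuration policy.")
    return "\n".join(lines)


def generate_policy(interception_doc: Dict[str, list],
                    spec_doc: Dict[str, list],
                    model: QAModel) -> str:
    """S11 to S14: documents in, audit tag configuration policy out."""
    texts, pictures = extract([spec_doc, interception_doc])
    return model(build_query(texts, pictures))


seen: List[str] = []


def stub_model(query: str) -> str:
    """Placeholder for the preset question-answer model; records the query."""
    seen.append(query)
    return "enable: advertisement-mobile phone logo"


policy = generate_policy(
    {"texts": ["intercept brand A phone logos"], "pictures": [[0.5, 0.6]]},
    {"texts": ["advertisement-mobile phone logo: logo of a mobile phone brand"]},
    stub_model,
)
print(policy)
```

Passing the model as a callable keeps the pipeline independent of any particular model or serving interface.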

The audit tag configuration policy generating device provided by the invention is described below, and the audit tag configuration policy generating device described below and the audit tag configuration policy generating method described above can be referred to correspondingly.

Fig. 3 is a schematic structural diagram of an audit tag configuration policy generating device provided by the present invention, and as shown in fig. 3, an audit tag configuration policy generating device according to an embodiment of the present invention includes:

an acquisition module 21, configured to acquire an interception standard document of a target client and an audit standard specification document of a product;

An extracting module 22, configured to extract a plurality of text information and a plurality of picture information based on the audit standard specification document and the interception standard document;

A generating module 23, configured to generate query instruction information based on each of the text information and each of the picture information;

The audit tag configuration policy generating device is further configured to encode each piece of picture information to obtain picture coding information.

In the audit tag configuration policy generating device, the interception standard document and the audit standard specification document comprise one or more of multi-level tag information, definition description information and sample information.

The audit tag configuration policy generating device is further configured to acquire a plurality of different auditing topics and to score the second feedback information using the trained reward model.

It should be noted that, the above device provided in the embodiment of the present invention can implement all the method steps implemented in the method embodiment and achieve the same technical effects, and detailed descriptions of the same parts and beneficial effects as those of the method embodiment in the embodiment are omitted.

Fig. 3 is a schematic structural diagram of an electronic device according to the present invention, as shown in fig. 3, the electronic device may include a processor (processor) 310, a memory (memory) 320, a communication interface (CommunicationsInterface) 330, and a communication bus 340, where the processor 310, the memory 320, and the communication interface 330 complete communication with each other through the communication bus 340. The processor 310 may call a logic instruction in the memory 320 to execute an audit tag configuration policy generation method, where the method includes obtaining an interception standard document of a target customer and an audit standard specification document of a product, extracting a plurality of text information and a plurality of picture information based on the audit standard specification document and the interception standard document, generating query instruction information based on each text information and each picture information, and inputting the query instruction information to a preset question-answer model to obtain an audit tag configuration policy output by the preset question-answer model.

Further, the logic instructions in the memory 320 described above may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a U disk, a removable hard disk, a Read-only memory (ROM), a random access memory (RAM, randomAccessMemory), a magnetic disk, an optical disk, or other various media capable of storing program codes.

In still another aspect, the present invention further provides a non-transitory computer readable storage medium on which a computer program is stored. When executed by a processor, the computer program performs the audit tag configuration policy generation method provided by the above methods, where the method includes: obtaining an interception standard document of a target customer and an audit standard specification document of a product; extracting a plurality of pieces of text information and a plurality of pieces of picture information based on the audit standard specification document and the interception standard document; generating query instruction information based on each piece of text information and each piece of picture information; and inputting the query instruction information into a preset question-answer model to obtain an audit tag configuration policy output by the preset question-answer model.
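The extraction and query-generation steps of the stored program can be sketched in slightly more detail. The example below is an assumption-laden illustration, not the implementation of the invention: a punctuation-based regular expression stands in for semantic sentence breaking, Base64 stands in for the picture encoding, and the JSON layout (`instruction`, `rules`, `pictures_base64`) is a hypothetical preset query format. The function names are likewise hypothetical.

```python
import base64
import json
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Break text after Chinese or Western sentence-ending punctuation.

    A simple stand-in for semantic sentence breaking; it will also split
    after the periods in abbreviations and decimals.
    """
    parts = re.split(r"(?<=[。！？.!?])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def encode_picture(data: bytes) -> str:
    """Encode raw picture bytes as Base64 text (assumed encoding scheme)."""
    return base64.b64encode(data).decode("ascii")


def build_query(sentences: List[str], pictures: List[bytes]) -> str:
    """Combine text and encoded pictures into one query string.

    The JSON field names are hypothetical, chosen only for illustration.
    """
    payload = {
        "instruction": "Generate an audit tag configuration policy from the rules below.",
        "rules": sentences,
        "pictures_base64": [encode_picture(p) for p in pictures],
    }
    return json.dumps(payload, ensure_ascii=False)


sentences = split_sentences("Violence is banned. Weapons too!")
query = build_query(sentences, [b"abc"])
```

Keeping the encoded pictures in the same record as their associated sentences preserves the link between each rule and its sample picture when the query reaches the model.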

In another aspect, the present invention also provides a computer program product. The computer program product includes a computer program that can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, the computer can execute the audit tag configuration policy generation method provided by the above methods.

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without undue burden.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solution, in essence, or the part of it that contributes over the prior art, may be embodied in the form of a software product. The software product may be stored in a computer readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.

It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present invention.

Claims

1. An audit tag configuration policy generation method, comprising:

inputting the query instruction information into a preset question-answer model to obtain an audit tag configuration policy output by the preset question-answer model;

the interception standard document and the audit standard specification document comprise multi-level tag information, definition description information and sample information, wherein the multi-level tag information represents type tags of sensitive content, the definition description information represents a detailed description of the corresponding type of sensitive content, and the sample information represents detailed examples or sample pictures;

the extracting of a plurality of pieces of text information and a plurality of pieces of picture information based on the audit standard specification document and the interception standard document comprises:

performing text extraction and picture extraction on the audit standard specification document and the interception standard document in the order of the multi-level tag information, the definition description information and the sample information, to obtain a plurality of pieces of text extraction information and the picture information associated with each piece of text extraction information;

performing semantic sentence breaking on each piece of text extraction information to obtain each piece of text information;

the generating of query instruction information based on each piece of text information and each piece of picture information comprises:

encoding each piece of picture information to obtain picture encoding information;

generating the query instruction information in a preset query format based on the picture encoding information and the associated text information;

before the query instruction information is input into the preset question-answer model to obtain the audit tag configuration policy output by the preset question-answer model, the method further comprises:

acquiring a plurality of different auditing topics;

scoring the second feedback information using the trained reward model;

2. The audit tag configuration policy generation method according to claim 1, wherein after inputting the query instruction information into the preset question-answer model to obtain the audit tag configuration policy output by the preset question-answer model, the method further comprises:

3. The audit tag configuration policy generation method according to claim 1, wherein before inputting the query instruction information into the preset question-answer model to obtain the audit tag configuration policy output by the preset question-answer model, the method further comprises:

4. An audit tag configuration policy generation apparatus, comprising:

a tag configuration module, configured to input the query instruction information into a preset question-answer model to obtain an audit tag configuration policy output by the preset question-answer model;

encoding each piece of picture information to obtain picture encoding information;

acquiring a plurality of different auditing topics;

scoring the second feedback information using the trained reward model;

5. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the audit tag configuration policy generation method of any one of claims 1 to 3.

6. A non-transitory computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the audit tag configuration policy generation method according to any one of claims 1 to 3.