CN111899042B - Malicious exposure advertisement behavior detection method and device, storage medium and terminal
- Publication number
- CN111899042B (application number CN201910372420.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0248—Avoiding fraud
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0277—Online advertisement
Abstract
An embodiment of the application discloses a method, a device, a storage medium and a terminal for detecting malicious advertisement exposure. The detection method comprises: performing page rendering based on page data of an article to be detected to generate a target page; acquiring the display occupied area of the text content in an effect diagram of the target page; calculating the actual occupied area of the text content in the article to be detected; and comparing the display occupied area with the actual occupied area to determine whether the article to be detected maliciously exposes advertisements. The scheme can effectively improve the accuracy of advertisement detection in articles.
Description
Technical Field
The application relates to the technical field of information processing, in particular to a method and a device for detecting malicious exposure advertisement behaviors, a storage medium and a terminal.
Background
With the development of the Internet and of mobile communication networks, advertisements are placed and presented in an increasing variety of ways.
In the related art, advertisements can be embedded in articles, news items, videos and the like that attract heavy traffic. A common exposure-based billing method is CPM (Cost Per Mille) advertising, which charges per thousand impressions. On some platforms, an article published by a registered account can apply for original-article status as long as it exceeds a certain word count and its text is not similar to any historical article on the platform. The platform gives advertisements in original articles a much higher revenue share than those in ordinary articles. As a result, some bad accounts move the advertisement at the bottom of an original article as close to the top as possible, so that the article enjoys the high original-article revenue share while maliciously exposing the advertisement. This makes it harder to maintain article quality on a public platform.
Disclosure of Invention
The embodiment of the application provides a detection method, a detection device, a storage medium and a detection terminal for malicious exposure advertisement behaviors, which can effectively improve the accuracy of advertisement detection in articles.
The embodiment of the application provides a detection method of malicious exposure advertisement behaviors, which is applied to a client and comprises the following steps:
Performing page rendering based on page data of articles to be detected, and generating a target page;
acquiring the display occupied area of text content in the effect diagram of the target page;
calculating the actual occupied area of the text content in the article to be detected;
comparing the display occupied area with the actual occupied area to determine whether the article to be detected maliciously exposes advertisements.
Correspondingly, an embodiment of the application also provides a device for detecting malicious exposure advertisement behaviors, which is applied to the client and comprises:
The rendering unit is used for performing page rendering based on the page data of the article to be detected and generating a target page;
The obtaining unit is used for obtaining the display occupied area of the text content in the effect diagram of the target page;
the calculating unit is used for calculating the actual occupied area of the text content in the article to be detected;
And the determining unit is used for comparing the display occupied area with the actual occupied area to determine whether the article to be detected maliciously exposes the advertisement.
Accordingly, an embodiment of the present application further provides a storage medium, where the storage medium stores a plurality of instructions, where the instructions are adapted to be loaded by a processor to perform the steps in the method as described above.
A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method as described above when executing the program.
In the embodiment of the application, the display area of the text content in the effect diagram obtained after the article is rendered into a display page is compared with the area the text content actually occupies, in order to judge whether a text region is hidden in the effect diagram. This determines whether the article maliciously exposes advertisements and improves the accuracy of advertisement detection. Accounts that maliciously expose advertisements can be detected directly, which restricts such behavior and improves article quality on the communication platform.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a method for detecting malicious exposure advertisement behaviors according to an embodiment of the present application.
Fig. 2 is another flow chart of a method for detecting malicious exposure advertisement behaviors according to an embodiment of the present application.
Fig. 3 is an application scenario schematic diagram of a method for detecting malicious exposure advertisement behaviors provided by an embodiment of the present application.
Fig. 4 is a diagram showing an effect of a local page of an article to be detected according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a detection device for malicious exposure advertisement behavior according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
The embodiment of the application provides a method and a device for detecting malicious exposure advertisement behaviors, a storage medium and a terminal.
The device for detecting malicious exposure advertisement behavior can be integrated in a terminal that has a storage unit and a microprocessor and therefore has computing capability, such as a tablet PC (Personal Computer) or a mobile phone.
According to the rules of some public platforms, an article can apply for original-article status (shown as a two-character "original" label under the article title) as long as it exceeds a certain word count and its text is not similar to any historical article on the platform. The platform gives advertisements in original articles a much higher revenue share than those in ordinary articles. This has bred a batch of bad accounts that, after publishing an original article, use front-end web techniques to hide the article text at the page display stage, controlling the layout in code to cause abnormal exposure. For example: setting the font size to 0 so that the content occupies no layout space; changing the formatting so that the text content and the advertisement overlap; or shrinking the text box to a very small size so that the advertisement moves forward. In this way the account obtains the original-article certification while avoiding the long page that the original text would otherwise produce (certification requires a minimum word count), so the advertisement at the bottom of the text is pushed as close to the top as possible. The article thus enjoys the high original-article revenue share while maliciously exposing the advertisement, which greatly increases the difficulty of maintaining article quality on public platforms.
In the related art, abnormal exposure is discovered by analyzing the web page code of the article. This is a strongly rule-based solution: it is easily bypassed in various ways, new rules must be added constantly, maintenance costs are very high, and there is no guarantee that new cases will be found immediately.
Based on the above, the embodiment of the application provides a method for detecting malicious exposure advertisement behaviors, which is applied to a client, and the method comprises the following steps: performing page rendering based on page data of articles to be detected, and generating a target page; acquiring the display occupied area of text content in an effect diagram of a target page; calculating the actual occupied area of the text content in the article to be detected; the display footprint is compared with the actual footprint to determine whether the article to be detected maliciously exposes the advertisement. The following will describe in detail. The numbers of the following examples are not intended to limit the preferred order of the examples.
It should be noted that the client is an information exchange platform on which messages (such as posts) can be published and which different users can use to view, comment on and repost them.
Please refer to fig. 1 to 4. Fig. 1 is a flow chart of a method for detecting malicious exposure advertisement behaviors according to an embodiment of the present application. The specific flow of the detection method of the malicious exposure advertisement behavior can be as follows:
101. Perform page rendering based on the page data of the article to be detected, and generate a target page.
In the embodiment of the application, a browser kernel needs to be called for page rendering. The page data may be an HTML file, which contains the information the browser needs to describe the page (such as configuration information) and the specific content that is finally displayed (such as text content, picture content, video content and related controls).
Specifically, the browser kernel API is called to parse the HTML structure: the received HTML binary data is read, the bytes are converted into HTML characters according to the specified encoding, and the document object model (DOM) is constructed. Style recalculation is then performed, attaching the cascading style sheets (CSS) to the document object model to create a render tree. The render tree is the set of objects to be rendered; each object contains the corresponding document object model node and its computed style, and the position of each element to be rendered is calculated to form a layout. The render tree is then displayed on the view control according to the layout, and external resources such as text, pictures and videos are loaded into their corresponding positions for display, which generates the target page.
It should be noted that, in the embodiment of the application, the article to be detected must contain at least text content; pictures and videos are not required.
102. And acquiring the display occupied area of the text content in the effect diagram of the target page.
The effect diagram of the target page is the complete picture that is finally presented and displayed in the client's display interface.
In the embodiment of the application, the text region presented in the effect diagram needs to be identified and its area calculated. The text region can be identified in several ways, for example based on image text recognition techniques. That is, in some embodiments, the step of "acquiring the display occupied area of the text content in the effect diagram of the target page" may include the following steps:
(21) Preprocessing the effect diagram to obtain a content contour map of the effect diagram;
(22) Alternately applying dilation (expansion) and erosion (corrosion) to the content contour map several times to obtain a processed content contour map;
(23) Identifying a text region from the processed content contour map;
(24) Determining the display occupied area of the text content based on the text region.
Specifically, the preprocessing may include, in order, grayscale conversion, edge detection and image binarization.
Grayscale conversion reduces the dimensionality of the image and makes the subsequent operations more efficient. Since the aim of this embodiment is to identify text regions, the specific colors are irrelevant and only the intensity (the gray value) needs to be kept.
Edge detection can be performed with the Sobel operator. The Sobel operator is one of the most important operators for edge detection in pixel images; it is a discrete first-order difference operator that computes an approximation of the first-order gradient of the image brightness function. Applying it at any point in the image yields the gradient vector at that point, or its normal vector.
During binarization, the gray value of each pixel in the edge-detected effect diagram is set to 0 or 255, so that the whole image becomes strictly black and white. This greatly reduces the amount of data in the image and highlights the contours of the target.
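The three preprocessing stages can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the embodiment: the function names, the luma weights, the `|Gx| + |Gy|` magnitude approximation and the threshold of 128 are all assumptions, and a real system would more likely use an image-processing library.

```python
def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to gray values 0..255.

    The BT.601 luma weights are an assumption; the source only says
    that the image is converted to gray values.
    """
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in rgb]


def sobel_magnitude(gray):
    """Approximate the gradient magnitude as |Gx| + |Gy| using the 3x3
    Sobel kernels. Border pixels are left at 0 for simplicity."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out


def binarize(img, threshold=128):
    """Set every pixel to 0 or 255. The threshold value is hypothetical."""
    return [[255 if v >= threshold else 0 for v in row] for row in img]


def content_contour_map(rgb, threshold=128):
    """Grayscale conversion, then edge detection, then binarization."""
    return binarize(sobel_magnitude(to_gray(rgb)), threshold)
```

For an image whose left part is black and whose right part is white, only the pixels on either side of the vertical boundary survive as contour pixels, which is the kind of outline the later steps operate on.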
In the embodiment of the application, dilation (expansion) and erosion (corrosion) are applied to the content contour map obtained by preprocessing so that clustered contours are connected into blocks. Dilation merges all background points touching an object into the object, expanding its boundary outward. For example, an advertisement usually has a background rather than being plain text, so dilation can fuse the text on an advertisement with its background and avoid misidentifying it as a text region of the article. Erosion eliminates boundary points, shrinking the boundary inward. For example, because text is made of strokes, the text is eroded so that the area inside the text outline and the strokes are joined into one block region, which allows the text region to be recognized directly later. In this way, scattered contours of the same nature in the content contour map are gathered and joined together, giving the processed content contour map.
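A minimal sketch of the alternating dilation and erosion on a binary contour map is given below. The 3x3 square neighborhood, the number of rounds and the function names are assumptions made for illustration; the source does not specify a structuring element or an iteration count.

```python
def _morph(mask, pick):
    """Apply `pick` (max or min) over each pixel's 3x3 neighborhood.
    The neighborhood is clipped at the image border, so pixels outside
    the image never influence the result."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = pick(window)
    return out


def dilate(mask):
    """Grow foreground (255) regions outward by one pixel."""
    return _morph(mask, max)


def erode(mask):
    """Shrink foreground (255) regions inward by one pixel."""
    return _morph(mask, min)


def connect_contours(mask, rounds=2):
    """Alternate dilation and erosion `rounds` times (hypothetical count)
    so that nearby contour fragments merge into solid blocks."""
    for _ in range(rounds):
        mask = erode(dilate(mask))
    return mask
```

In this sketch, dilation bridges the small gaps between neighboring strokes and the following erosion restores the outer boundary, so two fragments separated by one background pixel end up as a single block.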
In some embodiments, the step of identifying text regions from the processed content profile may include the following:
(231) Acquiring typesetting information and font size information of the text content from the page data;
(232) Determining, according to the typesetting information and the font size information, the presentation form of the display region when the text content is displayed;
(233) Identifying a text region from the processed content contour map based on the presentation form of the display region.
The typesetting information describes how the text content is laid out on the rendered page, for example horizontal or vertical arrangement, line spacing and character spacing. The font size information is the size of the font; characters of the same font size occupy the same number of pixels on the display screen. The presentation form (that is, the overall outline) of the text content when displayed on the page can therefore be determined from its typesetting information and font size information. The processed content contour map is then searched for shapes that match this presentation form, and the matching regions are selected as the displayed text region.
In some embodiments, the step of determining the display footprint of the text content based on the text region may include the following steps:
acquiring the number of first pixel points occupied by the text region in the client;
And determining the display occupied area of the text content according to the first pixel point number.
Specifically, the resolution of the display screen on which the client runs can be obtained, and the number of first pixel points occupied by the text region in the client is calculated from the proportion of the client's display interface that the text region occupies.
In practice, the number of first pixel points can be used directly as the display occupied area of the text content. Alternatively, the area of a single pixel can be obtained and the display occupied area calculated from it and the number of first pixel points.
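The calculation of the number of first pixel points can be sketched as follows. The representation of text regions as non-overlapping `(left, top, right, bottom)` boxes in effect-diagram coordinates, and all function and parameter names, are assumptions made for illustration.

```python
def first_pixel_count(text_boxes, image_size, screen_resolution):
    """Estimate how many screen pixels the identified text regions occupy.

    text_boxes: list of (left, top, right, bottom) boxes in effect-diagram
        pixels, right/bottom exclusive, assumed not to overlap.
    image_size: (width, height) of the effect diagram.
    screen_resolution: (width, height) of the client's display.
    """
    img_w, img_h = image_size
    scr_w, scr_h = screen_resolution
    region_pixels = sum((r - l) * (b - t) for (l, t, r, b) in text_boxes)
    # Proportion of the display interface covered by text regions.
    proportion = region_pixels / (img_w * img_h)
    return round(proportion * scr_w * scr_h)


def display_occupied_area(pixel_count, pixel_area=None):
    """Use the pixel count directly, or scale it by the area of one pixel
    when that area is known."""
    return pixel_count if pixel_area is None else pixel_count * pixel_area
```

Whichever unit is chosen here (pixel count or physical area), the actual occupied area has to be expressed in the same unit for the later comparison to be meaningful.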
103. And calculating the actual occupied area of the text content in the article to be detected.
Referring to fig. 3, in some embodiments, the step of "calculating the actual occupied area of the text content in the article to be detected" may include the following steps:
Extracting text attributes from the page data, wherein the text attributes at least comprise: font, font size, and number of words;
According to the fonts, the font sizes and the number of the characters, calculating the number of second pixel points occupied by the characters in the article to be detected in the client;
and determining the actual occupied area of the text content according to the number of the second pixels.
Specifically, the number of pixel points that a single character of the given font and font size occupies is obtained. The number of second pixel points occupied by the characters of the article in the client is then calculated from this per-character number and the number of characters. Note that characters of the same font size but different fonts occupy different numbers of pixel points, and characters of the same font but different font sizes also occupy different numbers of pixel points.
In practice, the number of second pixel points can be used directly as the actual occupied area of the characters in the article. Alternatively, the area of a single pixel can be obtained and the actual occupied area calculated from it and the number of second pixel points.
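A rough sketch of the second-pixel-count calculation is shown below. The source does not say how the per-character pixel count is obtained, so the lookup used here (a square of `font_size_px` by `font_size_px` scaled by a per-font factor) is purely hypothetical, as are the font names and factor values.

```python
# Hypothetical per-font factors: the fraction of a font_size x font_size
# box that one character of that font is assumed to occupy.
FONT_FACTORS = {
    "ExampleSerif": 1.0,
    "ExampleSans": 0.6,
}


def pixels_per_char(font, font_size_px):
    """Pixels occupied by one character of the given font and size
    (illustrative model only)."""
    factor = FONT_FACTORS.get(font, 1.0)
    return round(font_size_px * font_size_px * factor)


def second_pixel_count(font, font_size_px, char_count):
    """Pixels that all characters of the article would occupy if none
    of the text were hidden."""
    return pixels_per_char(font, font_size_px) * char_count
```

The model reproduces the two properties stated above: changing the font at a fixed size, or the size at a fixed font, changes the per-character pixel count.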
104. The display footprint is compared with the actual footprint to determine whether the article to be detected maliciously exposes the advertisement.
Specifically, whether the article to be detected maliciously exposes advertisements can be determined in various ways. As described above, an original article must at least reach a certain word count. If the area of the text region displayed on the page is much smaller than the area that the article's total number of words would actually occupy, the article is suspected of maliciously hiding text in order to move the advertisement forward.
Referring to fig. 3, the difference between the display occupied area and the actual occupied area indicates whether the target page is displayed normally (the text content is not hidden and the text display region is normal, as on the right side of fig. 4) or abnormally (the text content is hidden and the text display region is extremely small, as on the left side of fig. 4), which indirectly determines whether the article to be detected maliciously exposes advertisements.
In some embodiments, the step of comparing the display footprint with the actual footprint to determine whether the article to be detected maliciously exposes the advertisement may include the following steps:
calculating the ratio between the display occupied area and the actual occupied area;
Judging whether the ratio is smaller than a preset ratio;
If yes, judging that the article to be detected maliciously exposes the advertisement;
if not, judging that the article to be detected does not maliciously expose advertisements.
The preset ratio can be evaluated and set by a person skilled in the art or by the manufacturer of the product. Preferably, the preset ratio is set between 0.6 and 0.8. For example, with a preset ratio of 0.7, when the ratio of the display occupied area to the actual occupied area is smaller than 0.7, hidden text is suspected and the article to be detected is considered to maliciously expose advertisements; otherwise it is judged not to.
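The ratio test can be sketched in a few lines. The function name is an assumption; the default of 0.7 is the example value given above.

```python
def is_malicious_by_ratio(display_area, actual_area, preset_ratio=0.7):
    """Return True when the displayed text area is too small relative to
    the area the text should occupy."""
    if actual_area <= 0:
        raise ValueError("actual_area must be positive")
    return display_area / actual_area < preset_ratio
```

Both areas must be expressed in the same unit, for example both as pixel counts.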
In some embodiments, the step of comparing the display footprint with the actual footprint to determine whether the article to be detected maliciously exposes the advertisement may include the following steps:
calculating a difference between the display occupied area and the actual occupied area;
Judging whether the difference value is larger than a preset difference value or not;
If yes, judging that the article to be detected maliciously exposes the advertisement;
if not, judging that the article to be detected does not maliciously expose advertisements.
Similarly, the preset difference may be evaluated and set by a person skilled in the art or by the manufacturer of the product. Preferably, the preset difference is determined based on the text of the article, the size of the client's display page, the resolution of the terminal's display interface, and the like.
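The difference test can be sketched in the same way. The direction of the subtraction is an assumption: the sketch takes the difference as the actual occupied area minus the display occupied area, so that a larger value means more hidden text. The function name and the example threshold are likewise hypothetical.

```python
def is_malicious_by_difference(display_area, actual_area, preset_difference):
    """Return True when the text that should be visible exceeds the text
    that is visible by more than the preset difference.

    Assumption: difference = actual_area - display_area.
    """
    return (actual_area - display_area) > preset_difference
```

Unlike the ratio, this threshold depends on absolute sizes, which is consistent with the statement above that it should be chosen per article length, page size and display resolution.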
In some embodiments, in order to narrow the detection range and improve processing efficiency, original articles whose advertisement is already exposed on the first screen can be screened out first. That is, referring to fig. 2, after the target page is generated and before the display occupied area of the text content in the effect diagram of the target page is acquired, the following procedure may also be included:
105. Capturing the picture presented by the client while the target page is displayed;
106. Judging whether the displayed picture changes; if yes, step 102 is performed, otherwise step 107 is performed;
107. Judging that the article to be detected maliciously exposes advertisements.
In practical applications, advertisements are usually placed within articles to increase their exposure. Therefore, before the page rendering of the article to be detected, articles whose advertisement appears on the first screen can be marked as maliciously exposing advertisements and screened out. This filters out a large number of such articles and greatly reduces the subsequent load on the processor. In a specific implementation, whether an article constitutes malicious advertisement exposure can be judged from the recorded reading behavior of users, with the client reporting the result.
Specifically, when a user reads the article, the client reports how many screens the user has scrolled through. Because the advertisement normally appears on the last screen, if the user has not scrolled at all, the first screen very likely already contains the advertisement. Narrowing the detection range based on users' reading behavior in this way reduces the amount of data to be processed subsequently and improves processing efficiency.
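Steps 105 to 107 can be sketched as a small pre-filter. How captured pictures are represented and compared is not specified in the source; the sketch assumes each capture is a `bytes` snapshot and that "the picture changes" means any later capture differs from the first. The function name and return labels are hypothetical.

```python
def prefilter(captured_frames):
    """Decide whether the full area comparison (step 102 onward) is needed.

    captured_frames: non-empty list of screen captures (here: bytes)
        taken while the target page is displayed.
    Returns "run_area_check" if the picture changed (step 102), or
    "malicious" if it never changed (step 107).
    """
    if not captured_frames:
        raise ValueError("at least one captured frame is required")
    first = captured_frames[0]
    changed = any(frame != first for frame in captured_frames[1:])
    return "run_area_check" if changed else "malicious"
```

Only articles that return "run_area_check" go on to the more expensive contour extraction and area comparison.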
The detection method for malicious exposure advertisement behaviors provided by the embodiment of the application does not identify the behavior directly from the components of the article's web page code. Instead, based on analysis of user reading behavior data and on image text recognition, it compares the display area of the text content in the effect diagram obtained after the article is rendered into a display page with the area the text content actually occupies, and judges whether a text region is hidden in the effect diagram. This indirectly determines whether the article maliciously exposes advertisements and improves the accuracy of advertisement detection. In addition, accounts that maliciously expose advertisements can be detected directly, which restricts such behavior, improves article quality on the communication platform and reduces maintenance costs.
In order to facilitate better implementation of the method for detecting malicious exposure advertisement behaviors provided by the embodiment of the application, the embodiment of the application also provides a device (abbreviated as a processing device) based on the method for detecting malicious exposure advertisement behaviors, which is applied to a client. The meaning of the nouns is the same as that of the detection method of the malicious exposure advertisement behaviors, and specific implementation details can be referred to the description in the embodiment of the method.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a detection apparatus for malicious exposure advertisement behavior according to an embodiment of the present application. The processing apparatus 400 may include a rendering unit 401, an obtaining unit 402, a calculating unit 403, and a determining unit 404, as follows:
a rendering unit 401, configured to perform page rendering based on page data of an article to be detected, and generate a target page;
an obtaining unit 402, configured to obtain a display occupied area of text content in an effect diagram of the target page;
a calculating unit 403, configured to calculate an actual occupied area of the text content in the article to be detected;
and a determining unit 404, configured to compare the display occupied area with the actual occupied area, and determine whether the article to be detected maliciously exposes an advertisement.
In some embodiments, the obtaining unit 402 may include:
a first processing subunit, configured to preprocess the effect diagram to obtain a content contour map of the effect diagram;
a second processing subunit, configured to alternately perform dilation (expansion) processing and erosion (corrosion) processing on the content contour map multiple times to obtain a processed content contour map;
an identification subunit, configured to identify a text region from the processed content contour map;
and a determining subunit, configured to determine the display occupied area of the text content based on the text region.
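As a rough illustration of the alternating expansion (dilation) and corrosion (erosion) step, the sketch below works on a binary contour map stored as nested lists. The 3x3 neighborhood, the two rounds, and the names `dilate`, `erode`, `merge_text_blocks`, and `display_area` are assumptions made for this example; the embodiment does not specify a kernel size, a round count, or an image library.

```python
from typing import List

Mask = List[List[int]]  # 1 = contour (foreground) pixel, 0 = background


def _morph(mask: Mask, want_any: bool) -> Mask:
    """Apply a 3x3 dilation (want_any=True) or erosion (want_any=False)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Pixels outside the image are treated as background.
            window = [
                mask[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0
                for ny in range(y - 1, y + 2)
                for nx in range(x - 1, x + 2)
            ]
            out[y][x] = int(any(window)) if want_any else int(all(window))
    return out


def dilate(mask: Mask) -> Mask:
    return _morph(mask, want_any=True)


def erode(mask: Mask) -> Mask:
    return _morph(mask, want_any=False)


def merge_text_blocks(mask: Mask, rounds: int = 2) -> Mask:
    """Alternate dilation and erosion so nearby strokes fuse into blocks."""
    for _ in range(rounds):  # round count is an assumed tuning parameter
        mask = erode(dilate(mask))
    return mask


def display_area(mask: Mask) -> int:
    """Count foreground pixels, used here as the display occupied area."""
    return sum(sum(row) for row in mask)
```

Dilating before eroding closes small gaps between neighboring strokes, so the characters of one paragraph merge into a single block whose pixels can then be counted.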
In some embodiments, the identification subunit may be configured to:
acquire typesetting information and font size information of the text content from the page data;
determine, according to the typesetting information and the font size information, the presentation form of the presentation area in which the text content is presented;
and identify a text region from the processed content contour map based on the presentation form of the presentation area.
In some embodiments, the determining subunit may be configured to:
acquiring the number of first pixel points occupied by the text region in the client;
and determining the display occupied area of the text content according to the number of the first pixel points.
In some embodiments, the calculating unit 403 is configured to:
extract text attributes from the page data, wherein the text attributes at least comprise: font, font size, and number of characters;
calculate, according to the font, the font size, and the number of characters, the number of second pixel points occupied in the client by the characters of the article to be detected;
and determine the actual occupied area of the text content according to the number of second pixel points.
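The actual occupied area can be estimated from the text attributes alone, without rendering. The following is a minimal sketch under stated assumptions: the per-font width factors, the `line_height` multiplier, and the function name `actual_text_area` are hypothetical and are not taken from the embodiment, which only says that the pixel count is derived from font, font size, and number of characters.

```python
# Hypothetical width factors: average glyph width as a fraction of font size.
FONT_WIDTH_FACTOR = {"default": 1.0, "monospace": 0.6}


def actual_text_area(font: str, font_size_px: float, char_count: int,
                     line_height: float = 1.0) -> int:
    """Estimate the pixels the text would occupy if fully displayed."""
    factor = FONT_WIDTH_FACTOR.get(font, FONT_WIDTH_FACTOR["default"])
    glyph_width = font_size_px * factor          # assumed average width
    glyph_height = font_size_px * line_height    # assumed line box height
    return round(glyph_width * glyph_height * char_count)
```

For example, 100 full-width characters at 16 px with these assumed factors give 16 × 16 × 100 = 25600 pixels.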
In some embodiments, the apparatus 400 may further include:
a capturing unit, configured to capture the picture presented by the client while the target page is displayed, after the target page is generated and before the display occupied area of the text content in the effect diagram of the target page is acquired;
a judging unit, configured to judge whether the presented picture changes;
and the obtaining unit 402 is configured to acquire the display occupied area of the text content in the effect diagram of the target page when the judging unit determines that the presented picture has changed.
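One simple way to judge whether the presented picture changes is to compare two captured frames pixel by pixel. The sketch below assumes frames are equally sized nested lists of pixel values; the function name `picture_changed` and the `min_changed_ratio` threshold are illustrative assumptions, since the embodiment does not say how the change is measured.

```python
from typing import List, Sequence


def picture_changed(prev_frame: Sequence[Sequence[int]],
                    next_frame: Sequence[Sequence[int]],
                    min_changed_ratio: float = 0.01) -> bool:
    """Return True when enough pixels differ between two captured frames."""
    total = 0
    changed = 0
    for row_a, row_b in zip(prev_frame, next_frame):
        for a, b in zip(row_a, row_b):
            total += 1
            if a != b:
                changed += 1
    # An empty capture is treated as "no change".
    return total > 0 and changed / total >= min_changed_ratio
```

Only when this check reports a change would the display occupied area be acquired, which avoids analyzing pages whose content never collapses or shifts.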
In some embodiments, the determining unit 404 may specifically be configured to:
calculating the ratio between the display occupied area and the actual occupied area;
judging whether the ratio is smaller than a preset ratio;
if yes, judging that the article to be detected maliciously exposes an advertisement;
if not, judging that the article to be detected does not maliciously expose an advertisement.
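The ratio test above can be sketched as follows. The default `preset_ratio` of 0.5 and the function name are assumptions for illustration; the embodiment leaves the preset ratio unspecified.

```python
def is_malicious_by_ratio(display_area: float, actual_area: float,
                          preset_ratio: float = 0.5) -> bool:
    """Flag the article when displayed text is a small share of its real size."""
    if actual_area <= 0:
        # No text to hide, so there is nothing to compare (assumed handling).
        return False
    return display_area / actual_area < preset_ratio
```

A ratio well below 1 means most of the text is not visible in the effect diagram, which is the signal that the text region has been hidden.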
In some embodiments, the determining unit 404 may specifically be configured to:
calculating the difference between the display occupied area and the actual occupied area;
judging whether the difference is larger than a preset difference;
if yes, judging that the article to be detected maliciously exposes an advertisement;
if not, judging that the article to be detected does not maliciously expose an advertisement.
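The difference test can be sketched in the same way. This example assumes the difference is taken as actual occupied area minus display occupied area, so that hidden text yields a large positive value; the sign convention, the default `preset_difference`, and the function name are assumptions not stated in the embodiment.

```python
def is_malicious_by_difference(display_area: float, actual_area: float,
                               preset_difference: float = 5000) -> bool:
    """Flag the article when far fewer pixels are shown than the text needs."""
    # Assumed convention: how many pixels of text are missing from the display.
    missing = actual_area - display_area
    return missing > preset_difference
```

Compared with the ratio test, an absolute difference is less sensitive for short articles, so the two checks may suit different article lengths.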
The malicious exposure advertisement behavior detection device provided by the embodiment of the application, based on analysis of user reading behavior data and picture character recognition technology, compares the display occupied area of the text content in the effect diagram, obtained after the article is rendered into a display page, with the actual occupied area of the text content, and judges whether a text region is hidden in the effect diagram. Whether the article maliciously exposes an advertisement is thus determined indirectly, which improves the accuracy of advertisement detection. In addition, accounts that maliciously expose advertisements can be detected directly, so that the behavior of such accounts can be restricted, the quality of articles on the communication platform improved, and the maintenance cost reduced.
The embodiment of the application also provides a terminal, which may specifically be a terminal device such as a smart phone or a tablet computer on which the client of the above embodiments is installed. As shown in fig. 6, the terminal may include a radio frequency (RF) circuit 601, a memory 602 including one or more computer-readable storage media, an input unit 603, a display unit 604, a sensor 605, an audio circuit 606, a wireless fidelity (WiFi) module 607, a processor 608 including one or more processing cores, and a power supply 609. It will be appreciated by those skilled in the art that the terminal structure shown in fig. 6 does not limit the terminal, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Wherein:
The RF circuit 601 may be used to receive and transmit signals while sending and receiving messages or during a call. In particular, after receiving downlink information from a base station, it passes the downlink information to one or more processors 608 for processing; in addition, it transmits uplink data to the base station. Typically, the RF circuit 601 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 601 may also communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including, but not limited to, Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), and the like.
The memory 602 may be used to store software programs and modules; the processor 608 performs various functional applications and data processing by running the software programs and modules stored in the memory 602. The memory 602 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the terminal (such as audio data, a phonebook, etc.), and the like. In addition, the memory 602 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 602 may also include a memory controller to provide the processor 608 and the input unit 603 with access to the memory 602.
The input unit 603 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, in one particular embodiment, the input unit 603 may include a touch-sensitive surface, as well as other input devices. The touch-sensitive surface, also referred to as a touch display screen or a touch pad, may collect touch operations thereon or thereabout by a user (e.g., operations thereon or thereabout by a user using any suitable object or accessory such as a finger, stylus, etc.), and actuate the corresponding connection means according to a predetermined program. Alternatively, the touch-sensitive surface may comprise two parts, a touch detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device and converts it into touch point coordinates, which are then sent to the processor 608, and can receive commands from the processor 608 and execute them. In addition, touch sensitive surfaces may be implemented in a variety of types, such as resistive, capacitive, infrared, and surface acoustic waves. The input unit 603 may comprise other input devices in addition to a touch sensitive surface. In particular, other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, mouse, joystick, etc.
The display unit 604 may be used to display information input by a user or information provided to the user, as well as the various graphical user interfaces of the terminal, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 604 may include a display panel, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch-sensitive surface may overlay the display panel; when the touch-sensitive surface detects a touch operation on or near it, the operation is passed to the processor 608 to determine the type of touch event, and the processor 608 then provides a corresponding visual output on the display panel based on the type of touch event. Although in fig. 6 the touch-sensitive surface and the display panel are implemented as two separate components for input and output functions, in some embodiments the touch-sensitive surface may be integrated with the display panel to implement the input and output functions. For example, in an embodiment of the present application, the display unit 604 may be configured to display the article to be detected after page rendering (i.e., the target page). The user may touch related controls set on the target page through the input unit 603 to perform operations such as page sliding and page jumping.
The terminal may also include at least one sensor 605, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel according to the brightness of ambient light, and a proximity sensor that may turn off the display panel and/or backlight when the terminal moves to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the acceleration in all directions (generally three axes), and can detect the gravity and the direction when the mobile phone is stationary, and can be used for applications of recognizing the gesture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and knocking), and the like; other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc. that may also be configured in the terminal are not described in detail herein.
The audio circuit 606, a speaker, and a microphone may provide an audio interface between the user and the terminal. The audio circuit 606 may convert received audio data into an electrical signal and transmit it to the speaker, where it is converted into a sound signal for output; conversely, the microphone converts collected sound signals into electrical signals, which the audio circuit 606 receives and converts into audio data. The audio data is then output to the processor 608 for processing and sent via the RF circuit 601 to, for example, another terminal, or output to the memory 602 for further processing. The audio circuit 606 may also include an earphone jack to provide communication between a peripheral earphone and the terminal.
The WiFi belongs to a short-distance wireless transmission technology, and the terminal can help the user to send and receive e-mail, browse web pages, access streaming media and the like through the WiFi module 607, so that wireless broadband internet access is provided for the user. Although fig. 6 shows a WiFi module 607, it is understood that it does not belong to the essential constitution of the terminal, and can be omitted entirely as required within the scope of not changing the essence of the invention.
The processor 608 is a control center of the terminal, and connects various parts of the entire mobile phone using various interfaces and lines, and performs various functions of the terminal and processes data by running or executing software programs and/or modules stored in the memory 602, and calling data stored in the memory 602, thereby controlling the mobile phone as a whole. Optionally, the processor 608 may include one or more processing cores; preferably, the processor 608 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 608.
The terminal also includes a power supply 609 (e.g., a battery) for powering the various components, which may be logically connected to the processor 608 via a power management system so as to provide for managing charging, discharging, and power consumption by the power management system. The power supply 609 may also include one or more of any components, such as a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
Although not shown, the terminal may further include a camera, a bluetooth module, etc., which will not be described herein. Specifically, in this embodiment, the processor 608 in the terminal loads executable files corresponding to the processes of one or more application programs into the memory 602 according to the following instructions, and the processor 608 executes the application programs stored in the memory 602, so as to implement various functions:
Performing page rendering based on page data of articles to be detected, and generating a target page;
acquiring the display occupied area of text content in the effect diagram of the target page;
calculating the actual occupied area of the text content in the article to be detected;
comparing the display occupied area with the actual occupied area to determine whether the article to be detected maliciously exposes advertisements.
According to the embodiment of the application, based on analysis of user reading behavior data and picture character recognition technology, the display occupied area of the text content in the effect diagram, obtained after the article is rendered into a display page, is compared with the actual occupied area of the text content to judge whether a text region is hidden in the effect diagram. Whether the article maliciously exposes an advertisement is thus determined indirectly, which improves the accuracy of advertisement detection. In addition, accounts that maliciously expose advertisements can be detected directly, so that the behavior of such accounts can be restricted, the quality of articles on the communication platform improved, and the maintenance cost reduced.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a storage medium storing a plurality of instructions that can be loaded by a processor to perform steps in any one of the methods for detecting malicious exposure advertisement behaviors provided in the embodiments of the present application. For example, the instructions may perform the steps of:
Performing page rendering based on page data of articles to be detected, and generating a target page; acquiring the display occupied area of text content in the effect diagram of the target page; calculating the actual occupied area of the text content in the article to be detected; comparing the display occupied area with the actual occupied area to determine whether the article to be detected maliciously exposes advertisements.
For the specific implementation of each operation above, refer to the previous embodiments; details are not repeated here.
Wherein the storage medium may include: read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and the like.
Because the instructions stored in the storage medium can execute the steps of any detection method for malicious exposure advertisement behavior provided by the embodiments of the present application, they can achieve the beneficial effects of any such method. These effects are detailed in the previous embodiments and are not repeated here.
The method, the device, the storage medium, and the terminal for detecting malicious exposure advertisement behavior provided by the embodiments of the application have been described in detail above. Specific examples are used to explain the principle and implementation of the application, and the description of the above embodiments is only intended to help in understanding the method and its core idea. Meanwhile, those skilled in the art may, in light of the ideas of the present application, make changes to the specific embodiments and the scope of application; accordingly, the content of this description should not be construed as limiting the present application.
Claims (14)
1. A detection method of malicious exposure advertisement behavior is applied to a client, and is characterized by comprising the following steps:
Performing page rendering based on page data of articles to be detected, and generating a target page;
identifying a text region presented in an effect diagram of the target page, and acquiring the display occupied area of text content in the effect diagram according to the text region;
calculating the actual occupied area of the text content in the article to be detected;
comparing the display occupied area with the actual occupied area to determine whether the article to be detected maliciously exposes advertisements.
2. The method for detecting malicious exposure advertisement behaviors according to claim 1, wherein the identifying the text region presented in the effect diagram of the target page and obtaining the display occupation area of the text content in the effect diagram according to the text region comprises:
preprocessing the effect graph to obtain a content outline graph of the effect graph;
Performing expansion treatment and corrosion treatment on the content profile for multiple times alternately to obtain a treated content profile;
identifying a text region from the processed content profile;
And determining the display occupied area of the text content based on the text area.
3. The method for detecting malicious advertisement actions according to claim 2, wherein said identifying text regions from the processed content profile comprises:
Acquiring typesetting information and word size information of text content from the page data;
Determining the presentation form of a presentation area when the text content is presented according to the typesetting information and the word size information;
and identifying a text region from the processed content contour map based on the presentation form of the presentation region.
4. The method for detecting malicious exposure advertisement behavior according to claim 2, wherein the determining the display occupied area of the text content based on the text region comprises:
Acquiring the number of first pixel points occupied by the text region in the client;
and determining the display occupied area of the text content according to the number of the first pixel points.
5. The method for detecting malicious exposure advertisement behaviors according to claim 1, wherein the calculating the actual occupied area of the text content in the article to be detected includes:
Extracting text attributes from the page data, wherein the text attributes at least comprise: font, font size, and number of words;
calculating the number of second pixel points occupied by the characters in the article to be detected in the client according to the fonts, the font sizes and the number of the characters;
And determining the actual occupied area of the text content according to the number of the second pixel points.
6. The method for detecting malicious exposure advertisement behaviors according to claim 1, wherein after generating a target page, before acquiring a display occupied area of text content in an effect diagram of the target page, further comprises:
capturing a picture presented by the client in the process of displaying the target page;
judging whether the displayed picture changes or not;
And if yes, executing the step of acquiring the display occupied area of the text content in the effect diagram of the target page.
7. The method of any one of claims 1-6, wherein the comparing the display footprint with the actual footprint to determine whether the article to be detected is a malicious exposure advertisement comprises:
calculating the ratio between the display occupied area and the actual occupied area;
judging whether the ratio is smaller than a preset ratio or not;
If yes, judging that the article to be detected maliciously exposes the advertisement;
If not, judging that the article to be detected is a non-malicious exposure advertisement.
8. The method of any one of claims 1-6, wherein the comparing the display footprint with the actual footprint to determine whether the article to be detected is a malicious exposure advertisement comprises:
Calculating a difference between the display footprint and the actual footprint;
Judging whether the difference is larger than a preset difference or not;
If yes, judging that the article to be detected maliciously exposes the advertisement;
If not, judging that the article to be detected is a non-malicious exposure advertisement.
9. A detection apparatus for malicious exposure advertisement behavior, applied to a client, comprising:
The rendering unit is used for performing page rendering based on the page data of the article to be detected and generating a target page;
The obtaining unit is used for identifying a text region presented in the effect diagram of the target page and obtaining the display occupied area of text content in the effect diagram according to the text region;
the calculating unit is used for calculating the actual occupied area of the text content in the article to be detected;
And the determining unit is used for comparing the display occupied area with the actual occupied area to determine whether the article to be detected maliciously exposes the advertisement.
10. The apparatus for detecting malicious exposure advertisement behavior according to claim 9, wherein the acquisition unit comprises:
the first processing subunit is used for preprocessing the effect graph to obtain a content outline graph of the effect graph;
the second processing subunit is used for carrying out multiple alternation of expansion processing and corrosion processing on the content profile to obtain a processed content profile;
an identification subunit, configured to identify a text region from the processed content profile;
and the determining subunit is used for determining the display occupied area of the text content based on the text area.
11. The apparatus for detecting malicious exposure advertisement behavior according to claim 9, wherein the computing unit is configured to:
Extracting text attributes from the page data, wherein the text attributes at least comprise: font, font size, and number of words;
calculating the number of second pixel points occupied by the characters in the article to be detected in the client according to the fonts, the font sizes and the number of the characters;
And determining the actual occupied area of the text content according to the number of the second pixel points.
12. The apparatus for detecting malicious exposure ad activity of claim 9, further comprising:
the capturing unit is used for capturing the picture presented by the client in the process of displaying the target page before acquiring the display occupied area of the text content in the effect graph of the target page after generating the target page;
A judging unit for judging whether the presented picture changes;
and the obtaining unit is configured to acquire the display occupied area of the text content in the effect diagram of the target page when the judging unit determines that the presented picture has changed.
13. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the method of any one of claims 1 to 8.
14. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 8 when executing the program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910372420.5A CN111899042B (en) | 2019-05-06 | 2019-05-06 | Malicious exposure advertisement behavior detection method and device, storage medium and terminal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111899042A (en) | 2020-11-06 |
CN111899042B (en) | 2024-04-30 |
Family
ID=73169490
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910372420.5A Active CN111899042B (en) | 2019-05-06 | 2019-05-06 | Malicious exposure advertisement behavior detection method and device, storage medium and terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111899042B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115118492B (en) * | 2022-06-27 | 2023-03-24 | 珠海市鸿瑞信息技术股份有限公司 | Equipment state monitoring system and method based on TCP access |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002218340A (en) * | 2001-01-19 | 2002-08-02 | Sony Corp | Receiver, receiving method, broadcasting equipment, broadcasting method, recording medium and program |
CN103810425A (en) * | 2012-11-13 | 2014-05-21 | 腾讯科技(深圳)有限公司 | Method and device for detecting malicious website |
CN103902889A (en) * | 2012-12-26 | 2014-07-02 | 腾讯科技(深圳)有限公司 | Malicious message cloud detection method and server |
CN106407228A (en) * | 2015-08-03 | 2017-02-15 | 天脉聚源(北京)科技有限公司 | Webpage application function hide and display control method and system |
CN107247691A (en) * | 2017-05-24 | 2017-10-13 | 腾讯科技(深圳)有限公司 | A kind of display methods of text message, device, mobile terminal and storage medium |
CN107665076A (en) * | 2017-09-14 | 2018-02-06 | 广州神马移动信息科技有限公司 | Show method, equipment, browser and the electronic equipment of web page contents |
CN108694321A (en) * | 2017-04-07 | 2018-10-23 | 武汉安天信息技术有限责任公司 | A kind of recognition methods of fishing website and device |
CN109067944A (en) * | 2018-08-22 | 2018-12-21 | Oppo广东移动通信有限公司 | terminal control method, device, mobile terminal and storage medium |
CN109255356A (en) * | 2018-07-24 | 2019-01-22 | 阿里巴巴集团控股有限公司 | A kind of character recognition method, device and computer readable storage medium |
CN109446895A (en) * | 2018-09-18 | 2019-03-08 | 中国汽车技术研究中心有限公司 | A kind of pedestrian recognition method based on human body head feature |
Also Published As
Publication number | Publication date |
---|---|
CN111899042A (en) | 2020-11-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||