
CN104331475B - Information detection method and device - Google Patents

Information detection method and device

Info

Publication number
CN104331475B
Authority
CN
China
Prior art keywords
word
text message
keyword
attribute
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410611713.1A
Other languages
Chinese (zh)
Other versions
CN104331475A (en)
Inventor
张扬蕾
张丽辉
冯晓娜
刘建辉
文帅营
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZHENGZHOU XIZHI INFORMATION TECHNOLOGY Co Ltd
Original Assignee
ZHENGZHOU XIZHI INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZHENGZHOU XIZHI INFORMATION TECHNOLOGY Co Ltd
Priority to CN201410611713.1A
Publication of CN104331475A
Application granted
Publication of CN104331475B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

This application provides an information detection method and device. The information detection method includes: obtaining the text information of the information to be detected; comparing the text information with the first attribute words in a multi-attribute dictionary, where the first attribute words include a keyword and the variant words of the keyword; when the text information includes a first attribute word, comparing the five characters before the first attribute word and the five characters after the first attribute word in the text information with the second attribute words in the multi-attribute dictionary to obtain a comparison result, where a second attribute word is a determiner of the keyword; and determining, according to the comparison result, whether the text information is illegal information. Compared with the prior art, identifying illegal information by comparing the text against several kinds of words allows a more comprehensive detection of the text information and reduces the probability of the misjudgment that a single keyword causes, thereby improving the accuracy of information detection.

Description

Information detection method and device
Technical field
This application relates to the field of information detection technology, and in particular to an information detection method and device.
Background
Websites are favored by more and more people as a new means of communication. To prevent illegal information, such as information that the state prohibits from publication because it involves pornography, gambling, drugs, violence or terrorism, from being published on a website, the information must first undergo a legality check before it is published. Legality here means that the information meets national security requirements.
The current information detection method is as follows: word segmentation is performed on the information to be detected to obtain multiple independent words, and each independent word is then compared with the keywords in a keyword library. When a word is identical to a keyword in the keyword library, the information to be detected is judged to be illegal information, that is, information that is not allowed to be published. The keywords in the keyword library are words that indicate information involving pornography, gambling, drugs, violence, terrorism and the like.
As the above process shows, the existing method can judge whether the information to be detected is illegal only by whether the group of words obtained after segmentation contains a keyword. Such a judgment usually cannot assess the information comprehensively, so the accuracy with which the prior art identifies illegal information needs to be improved.
Summary of the invention
In view of this, this application provides an information detection method for improving the accuracy of information detection.
This application also provides an information detection device to ensure that the above method can be implemented and applied in practice.
The technical solutions of the information detection method and device provided by this application are as follows:
In one aspect, an embodiment of this application provides an information detection method, the method including:
obtaining the text information of the information to be detected;
comparing the text information with the first attribute words in a pre-established multi-attribute dictionary, where the first attribute words include a keyword and the variant words of the keyword, and a variant word is a word that has the same pronunciation as the keyword or contains the same morpheme;
when the text information includes a first attribute word, comparing the five characters before the first attribute word and the five characters after the first attribute word in the text information with the second attribute words in the multi-attribute dictionary to obtain a comparison result, where a second attribute word is a determiner of the keyword, and the determiner is used to qualify the keyword;
determining, according to the comparison result, whether the text information is illegal information.
Preferably, the determiner includes a positive-selection word, and the positive-selection word and the keyword form an illegal phrase;
determining, according to the comparison result, whether the text information is illegal information includes: when the comparison result shows that the text information includes a positive-selection word, determining that the text information is illegal information;
when the comparison result shows that the text information does not include a positive-selection word, determining that the text information is legal information.
Preferably, the determiner includes a negative-selection word, and the negative-selection word and the keyword form a legal phrase;
determining, according to the comparison result, whether the text information is illegal information includes: when the comparison result shows that the text information does not include a negative-selection word, determining that the text information is illegal information;
when the comparison result shows that the text information includes a negative-selection word, determining that the text information is legal information.
Preferably, obtaining the text information of the information to be detected includes:
determining the positions of the symbols in the information to be detected;
deleting the symbols at the determined positions to obtain the text information.
Preferably, the process of pre-establishing the multi-attribute dictionary includes:
obtaining the keyword of any object to be detected;
performing attribute analysis on the keyword to obtain the variant words and the second attribute words of the keyword;
determining, according to the obtained keyword, the positions of the obtained variant words and second attribute words in the multi-attribute dictionary;
writing the obtained variant words and second attribute words into the determined positions.
In another aspect, this application provides an information detection device, the device including:
an acquisition module, configured to obtain the text information of the information to be detected;
a first comparison module, configured to compare the text information with the first attribute words in a pre-established multi-attribute dictionary, where the first attribute words include a keyword and the variant words of the keyword, and a variant word is a word that has the same pronunciation as the keyword or contains the same morpheme;
a second comparison module, configured to, when the text information includes a first attribute word, compare the five characters before the first attribute word and the five characters after the first attribute word in the text information with the second attribute words in the multi-attribute dictionary to obtain a comparison result, where a second attribute word is a determiner of the keyword, and the determiner is used to qualify the keyword;
a determining module, configured to determine, according to the comparison result, whether the text information is illegal information.
Preferably, the determiner includes a positive-selection word, and the positive-selection word and the keyword form an illegal phrase;
the determining module is configured to determine that the text information is illegal information when the comparison result shows that the text information includes a positive-selection word, and to determine that the text information is legal information when the comparison result shows that the text information does not include a positive-selection word.
Preferably, the determiner includes a negative-selection word, and the negative-selection word and the keyword form a legal phrase;
the determining module is configured to determine that the text information is illegal information when the comparison result shows that the text information does not include a negative-selection word, and to determine that the text information is legal information when the comparison result shows that the text information includes a negative-selection word.
Preferably, the acquisition module includes:
a determining unit, configured to determine the positions of the symbols in the information to be detected;
a deletion unit, configured to delete the symbols at the determined positions to obtain the text information.
Preferably, the information detection device further includes:
a keyword acquisition module, configured to obtain the keyword of any object to be detected;
an analysis module, configured to perform attribute analysis on the keyword to obtain the variant words and the second attribute words of the keyword;
a position acquisition module, configured to determine, according to the obtained keyword, the positions of the obtained variant words and second attribute words in the multi-attribute dictionary;
a writing module, configured to write the obtained variant words and second attribute words into the determined positions.
Compared with the prior art, this application has the following advantages:
In this application, the text information of the information to be detected is obtained first. The text information is compared with the first attribute words in the pre-established multi-attribute dictionary. When the text information includes a first attribute word, the five characters before the first attribute word and the five characters after it are compared with the second attribute words to obtain a comparison result, and whether the text information is illegal information is then judged according to the comparison result. Compared with the prior art, this application does not judge illegality only by whether the text information includes a keyword. It also checks whether the text information includes a variant word of the keyword, and whether the five characters before and the five characters after the first attribute word include a determiner that qualifies the keyword, before finally judging whether the text information is illegal information. Relative to judging illegal information by a single keyword, identifying it by comparison against several kinds of words allows a more comprehensive detection of the text information and reduces the probability of the misjudgment that a single keyword causes, thereby improving the accuracy of information detection.
Brief description of the drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of an information detection method provided by an embodiment of this application;
Fig. 2 is a second flow chart of the information detection method provided by an embodiment of this application, for the case where the determiner is a positive-selection word;
Fig. 3 is a third flow chart of the information detection method provided by an embodiment of this application, for the case where the determiner is a negative-selection word;
Fig. 4 is a flow chart of the process of pre-establishing the multi-attribute dictionary in the information detection method provided by an embodiment of this application;
Fig. 5 is a schematic diagram of the staff input interface of the information detection method provided by an embodiment of this application;
Fig. 6 is a schematic diagram of an information detection device provided by an embodiment of this application;
Fig. 7 is a schematic diagram of the acquisition module of the information detection device provided by an embodiment of this application;
Fig. 8 is a schematic diagram of the modules used to establish the multi-attribute dictionary in the information detection device provided by an embodiment of this application.
Detailed description
To help those skilled in the art better understand this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
Referring to Fig. 1, which shows a flow chart of an information detection method provided by an embodiment of this application, the method may include the following steps:
101: Obtain the text information of the information to be detected.
The text information is the information formed by the character portion of the information to be detected; it does not include non-text information such as punctuation marks. One feasible way of obtaining the text information in this embodiment is to delete all the symbols in the information to be detected; the remaining part is the text information of the information to be detected.
For example, the information to be detected is: "At 6 o'clock on October 12, after careful investigation, the anti-drug brigade of Cangyuan County set up a checkpoint two kilometers along the road from Xiaoheijiang toward Shuangjiang to intercept a drug-trafficking vehicle. At 6:40, a minivan ignored the warning of the anti-drug police and forced its way through the checkpoint." The text information obtained after processing is the same passage with every punctuation mark removed, so that the two sentences run together into one unbroken string of characters. This example shows that the text information contains only characters.
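Step 101 can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of this application: the function name `extract_text` is hypothetical, and treating every character whose Unicode category is neither a letter nor a digit as a "symbol" (which also removes whitespace) is an assumption, since the text above does not define the symbol set precisely.

```python
import unicodedata


def extract_text(raw: str) -> str:
    """Return raw with every non-letter, non-digit character removed."""
    # Step 1: determine the positions of the symbols. A "symbol" is assumed
    # here to be any character whose Unicode category is not L* (letter,
    # which covers CJK characters) or N* (number).
    symbol_positions = {
        i for i, ch in enumerate(raw)
        if unicodedata.category(ch)[0] not in ("L", "N")
    }
    # Step 2: delete the characters at the determined positions.
    return "".join(ch for i, ch in enumerate(raw) if i not in symbol_positions)


if __name__ == "__main__":
    print(extract_text("冰毒，检测！ 6:40"))
```

The sample string is made up for illustration. Digits are kept because the example above keeps "6" and "40" in the processed text.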
102: Compare the text information with the first attribute words in the pre-established multi-attribute dictionary.
In this embodiment the first attribute words include a keyword and the variant words of the keyword. A keyword is a base word from which the text information can be determined to be illegal information, such as a word relating to pornography, gambling, drugs, violence, terrorism or other information that violates the relevant national regulations.
A variant word is a word that has the same pronunciation as the keyword or contains the same morpheme. It is as harmful as the keyword, and it covers the case where the information to be detected is illegal but the keyword has been deliberately miswritten. For example, when the keyword is "invoice", its variant words may be the words rendered in this translation as "unstable", "hair drift" and so on; likewise, when the keyword is "gun", its variant word may be the word rendered as "wooden storehouse".
When the text information is compared with the first attribute words in the multi-attribute dictionary, it is compared with the keyword and the variant words in turn to determine whether it includes a first attribute word. If it does not, the text information is legal information and the operation ends. If it does, the text information may be illegal information, and it must be compared with further words to decide finally whether it is illegal information.
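The comparison in step 102 can be sketched as a plain substring search. The function name `find_first_attribute_words`, its return shape and the sample words are assumptions made for illustration; in particular "兵毒" is a made-up homophone-style variant of "冰毒" (assumed here to be the original term for methamphetamine), not a word taken from this application.

```python
def find_first_attribute_words(text, keyword, variants):
    """Return (word, start_index) for each occurrence of the keyword or a variant."""
    hits = []
    # Compare the text with the keyword first, then with each variant word in turn.
    for word in [keyword, *variants]:
        if not word:
            continue  # skip empty entries
        start = text.find(word)
        while start != -1:
            hits.append((word, start))
            start = text.find(word, start + 1)
    # Sort by position so that later steps see matches in reading order.
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    hits = find_first_attribute_words("卖冰毒也卖兵毒", "冰毒", ["兵毒"])
    # An empty result means the text is treated as legal and processing stops.
    print(hits if hits else "legal information")
```

Returning the start index as well as the word is a design choice of this sketch: the next step needs the position of each match to take the characters around it.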
103: When the text information includes a first attribute word, compare the five characters before the first attribute word and the five characters after the first attribute word in the text information with the second attribute words in the multi-attribute dictionary to obtain a comparison result.
A second attribute word is a determiner of the keyword and is used to qualify the keyword. The qualification may restrict the keyword's scope of use, manner of use, route of use and so on. In phrase order the determiner may precede the keyword, as "taking" does in "taking methamphetamine", where it qualifies the manner of use of methamphetamine. The determiner may also follow the keyword, as "detection" does in "methamphetamine detection", where it qualifies the route of use.
In this embodiment the first attribute words include the keyword and the variant words. When the text information includes the keyword, the five characters before the keyword and the five characters after it are compared with the second attribute words. When the text information includes a variant word, the five characters before the variant word and the five characters after it are compared with the second attribute words. When the text information includes both the keyword and a variant word, the five characters before and after the keyword and the five characters before and after the variant word are all compared with the second attribute words.
In the text information, a determiner serving as a second attribute word lies close to the keyword. The ten characters formed by the five characters before and the five characters after the first attribute word are therefore compared with the determiners to determine whether those ten characters include a second attribute word, which improves the accuracy of detecting whether the text information includes a second attribute word. If five or more characters separate the second attribute word from the first attribute word in the text information, the second attribute word cannot qualify the first attribute word, and there is then no need to judge from the second attribute word whether the text information is illegal.
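The five-character window of step 103 can be sketched as two slices around the matched word. The names `context_window` and `find_determiners` and the sample words are hypothetical ("吸食" and "检测" are assumed back-translations of "taking" and "detection"). Requiring a determiner to lie wholly inside one of the two windows is also an assumption of this sketch.

```python
WINDOW = 5  # number of characters examined on each side of the first attribute word


def context_window(text, start, word_len, size=WINDOW):
    """Return the (before, after) character windows around a match."""
    before = text[max(0, start - size):start]
    after = text[start + word_len:start + word_len + size]
    return before, after


def find_determiners(text, start, word_len, determiners):
    """Return the determiners that appear in either window, in dictionary order."""
    before, after = context_window(text, start, word_len)
    return [d for d in determiners if d in before or d in after]


if __name__ == "__main__":
    text = "吸食冰毒检测"
    start = text.find("冰毒")
    print(context_window(text, start, len("冰毒")))
    print(find_determiners(text, start, len("冰毒"), ["吸食", "检测", "贩卖"]))
```

A determiner that sits more than five characters away from the match falls outside both slices and is ignored, which mirrors the rule that such a word no longer qualifies the first attribute word.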
104: Determine, according to the comparison result, whether the text information is illegal information.
In this embodiment, once the comparison result has been obtained, whether the text information is illegal information can be judged semantically from the comparison result.
With the above technical solution, the text information of the information to be detected is obtained first and compared with the first attribute words in the pre-established multi-attribute dictionary. When the text information includes a first attribute word, the five characters before the first attribute word and the five characters after it are compared with the second attribute words to obtain a comparison result, and whether the text information is illegal information is then judged according to the comparison result. Compared with the prior art, this application does not judge illegality only by whether the text information includes a keyword. It also checks whether the text information includes a variant word of the keyword, and whether the five characters before and the five characters after the first attribute word include a determiner that qualifies the keyword, before finally judging whether the text information is illegal information. Relative to judging illegal information by a single keyword, identifying it by comparison against several kinds of words allows a more comprehensive detection of the text information and reduces the probability of the misjudgment that a single keyword causes, thereby improving the accuracy of information detection.
The following example illustrates how identifying illegal information by comparison against several kinds of words improves the accuracy of information detection relative to judging by a single keyword:
Suppose the text information is "selling a product this product can detect whether food contains methamphetamine ingredients", the keyword is "methamphetamine", and its determiner is "detection". When the existing single-keyword judgment is used, the text information includes the keyword "methamphetamine", so the single-keyword method judges the text information to be illegal information. Semantic analysis shows, however, that the text information is actually legal information, so the single-keyword judgment is wrong. When the information detection method provided by this embodiment is used, the keyword first indicates that the text information may be illegal information. The text information is then compared with the determiner "detection", and the comparison result is that the text information includes this determiner. According to the comparison result the text information is judged semantically to be legal information, which is correct. This example shows that the information detection method provided by this embodiment can improve the accuracy of information detection.
The following describes, for the cases where the determiner includes a positive-selection word or a negative-selection word, how this embodiment determines from the comparison result whether the text information is illegal information. A positive-selection word forms an illegal phrase with the keyword; for example, the positive-selection words of "invoice" include "issuing on behalf of others", "selling" and so on. When the text information includes both a positive-selection word and the keyword, the text information is illegal information. Correspondingly, a negative-selection word forms a legal phrase with the keyword; for example, the negative-selection words of "methamphetamine" include "test paper", "detection" and so on. When the text information includes both a negative-selection word and the keyword, the text information is legal information. Positive-selection words and negative-selection words thus lead to different ways of judging the text information, as shown in Fig. 2 and Fig. 3.
Fig. 2 is the second flow chart of the information detection method provided by this embodiment, for the case where the determiner is a positive-selection word. The method may include the following steps:
101: Obtain the text information of the information to be detected. All the symbols in the information to be detected are deleted, and the remaining part is the text information of the information to be detected.
102: Compare the text information with the first attribute words in the pre-established multi-attribute dictionary, where the first attribute words include a keyword and the variant words of the keyword, and a variant word is a word that has the same pronunciation as the keyword or contains the same morpheme.
103: When the text information includes a first attribute word, compare the five characters before the first attribute word and the five characters after the first attribute word in the text information with the second attribute words in the multi-attribute dictionary to obtain a comparison result. A second attribute word is a determiner of the keyword and is used to qualify the keyword.
105: When the comparison result shows that the text information includes a positive-selection word, determine that the text information is illegal information.
106: When the comparison result shows that the text information does not include a positive-selection word, determine that the text information is legal information.
Fig. 3 is the third flow chart of the information detection method provided by this embodiment, for the case where the determiner is a negative-selection word. The method may include the following steps:
101: Obtain the text information of the information to be detected.
All the symbols in the information to be detected are deleted, and the remaining part is the text information of the information to be detected.
102: Compare the text information with the first attribute words in the pre-established multi-attribute dictionary, where the first attribute words include a keyword and the variant words of the keyword, and a variant word is a word that has the same pronunciation as the keyword or contains the same morpheme.
103: When the text information includes a first attribute word, compare the five characters before the first attribute word and the five characters after the first attribute word in the text information with the second attribute words in the multi-attribute dictionary to obtain a comparison result. A second attribute word is a determiner of the keyword and is used to qualify the keyword.
107: When the comparison result shows that the text information does not include a negative-selection word, determine that the text information is illegal information;
108: When the comparison result shows that the text information includes a negative-selection word, determine that the text information is legal information.
It should be noted that the information detection method provided by this embodiment can also check the text information for positive-selection words and negative-selection words at the same time. When either the positive-selection judgment or the negative-selection judgment finds the text information to be illegal information, the text information is determined to be illegal information.
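The decision rules of steps 105 to 108 and the combined rule in the note above can be sketched as follows. The function name `is_illegal`, its parameters and the sample words are hypothetical. The sketch assumes that a first attribute word has already been found, that `found` holds the determiners detected in the five-character windows, and that a rule whose word list is empty is simply not applied; the last point is an assumption, since the text does not cover that case.

```python
def is_illegal(found, positive_words=(), negative_words=()):
    """Judge a text that is already known to contain a first attribute word.

    found: determiners detected in the windows around the first attribute word.
    """
    found = set(found)
    # Positive-selection rule (steps 105/106): illegal if any positive word is present.
    by_positive = bool(positive_words) and any(w in found for w in positive_words)
    # Negative-selection rule (steps 107/108): illegal if no negative word is present.
    by_negative = bool(negative_words) and not any(w in found for w in negative_words)
    # Combined rule: illegal as soon as either judgment says illegal.
    return by_positive or by_negative


if __name__ == "__main__":
    print(is_illegal({"吸食"}, positive_words=["吸食"]))
    print(is_illegal({"检测"}, negative_words=["检测", "试纸"]))
```

Under the combined rule a text is judged legal only when it contains no positive-selection word and at least one negative-selection word, which is stricter than either rule used alone.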
All of the above embodiments also include a process of pre-establishing the multi-attribute dictionary. Referring to Fig. 4, which shows the process of establishing the multi-attribute dictionary in an embodiment of this application, the process may include the following steps:
401: Obtain the keyword of any object to be detected.
An object to be detected is something whose presence in the text information may cause the text information to be illegal information. The methamphetamine mentioned above is one such object, in which case the keyword obtained is "methamphetamine".
402: Perform attribute analysis on the keyword to obtain the variant words and the second attribute words of the keyword.
The attribute analysis of the keyword can be carried out by staff, who analyze its attributes and then enter the variant words and second attribute words they have identified. For example, staff can be given the interface shown in Fig. 5 and write the variant words and second attribute words they have identified into the corresponding positions of the interface, which yields the variant words and second attribute words of the keyword.
403: Determine, according to the obtained keyword, the positions of the obtained variant words and second attribute words in the multi-attribute dictionary.
After the keyword, variant words and second attribute words have been obtained, it is first necessary to determine the position of the keyword in the multi-attribute dictionary and whether each second attribute word (that is, each determiner) of the keyword is a positive-selection word or a negative-selection word. The positions in the same row as the keyword are then taken as the positions of the variant words and second attribute words in the multi-attribute dictionary.
404: Write the obtained variant words and second attribute words into the determined positions.
Take Table 1 as an example. Table 1 is one form of the multi-attribute dictionary in this embodiment and shows how the keyword, variant words and second attribute words are stored in the multi-attribute dictionary, where "×" indicates that no such word exists.
Table 1: One form of the multi-attribute dictionary
After the multi-attribute dictionary has been established, if keywords, variant words and second attribute words need to be added, steps 403 to 404 are repeated whenever a new keyword, variant word or second attribute word is found, so as to improve the multi-attribute dictionary.
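Steps 401 to 404 can be sketched as a small in-memory structure with one row per keyword. The names `Entry` and `MultiAttributeDictionary`, the field layout and the sample words are all assumptions made for illustration, since the contents of Table 1 are not reproduced here. An empty list plays the role of the "×" marker.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    """One row of the dictionary: a keyword and the words stored alongside it."""
    keyword: str
    variants: List[str] = field(default_factory=list)
    positive_words: List[str] = field(default_factory=list)  # form illegal phrases
    negative_words: List[str] = field(default_factory=list)  # form legal phrases


class MultiAttributeDictionary:
    def __init__(self) -> None:
        self._rows: Dict[str, Entry] = {}

    def add(self, keyword, variants=(), positive_words=(), negative_words=()):
        # Step 403: locate (or create) the row that belongs to the keyword.
        row = self._rows.setdefault(keyword, Entry(keyword))
        # Step 404: write the new words into that row, skipping duplicates.
        for target, new_words in (
            (row.variants, variants),
            (row.positive_words, positive_words),
            (row.negative_words, negative_words),
        ):
            for word in new_words:
                if word not in target:
                    target.append(word)
        return row

    def first_attribute_words(self, keyword):
        """Return the keyword followed by its variant words."""
        row = self._rows[keyword]
        return [row.keyword, *row.variants]


if __name__ == "__main__":
    d = MultiAttributeDictionary()
    d.add("冰毒", negative_words=["检测"])
    d.add("冰毒", variants=["兵毒"], negative_words=["检测", "试纸"])
    print(d.first_attribute_words("冰毒"))
```

Calling `add` again for an existing keyword extends its row, which corresponds to repeating the position and write steps when new words are found.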
For brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that this application is not limited by the described order of actions, because according to this application some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and that the actions and modules involved are not necessarily required by this application.
Corresponding to the above method embodiments, the embodiments of the present application also provide an information detection device. One structural schematic of the information detection device is shown in Fig. 6; the device includes an acquisition module 11, a first comparison module 12, a second comparison module 13 and a determining module 14, wherein:
The acquisition module 11 is configured to obtain the text information of the information to be detected.
The text information is the information formed by the textual part of the information to be detected; it does not include non-text information such as punctuation marks. In the embodiments of the present application, one feasible implementation of the acquisition module 11 is to delete all symbols from the information to be detected, so that the remaining part is the text information of the information to be detected.
For example, suppose the information to be detected is: "At 6 am on October 12, after careful investigation, the narcotics control brigade of Cangyuan County set up a checkpoint two kilometers from Xiaoheijiang toward Shuangjiang to intercept drug trafficking vehicles. At 40 minutes past 6, a minivan ignored warnings from the narcotics police and forced its way through the checkpoint." After processing, the text information obtained by the acquisition module 11 is: "At 6 am on October 12 after careful investigation the narcotics control brigade of Cangyuan County set up a checkpoint two kilometers from Xiaoheijiang toward Shuangjiang to intercept drug trafficking vehicles At 40 minutes past 6 a minivan ignored warnings from the narcotics police and forced its way through the checkpoint". This example shows that the text information contains only text.
Specifically, the acquisition module 11 may adopt the structure shown in Fig. 7, in which the acquisition module 11 includes a determining unit 111 and a deleting unit 112, wherein:
The determining unit 111 is configured to determine the positions of symbols in the information to be detected;
The deleting unit 112 is configured to delete the symbols at the determined positions to obtain the text information.
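The two units above can be sketched in Python as a pair of small functions. This is a minimal sketch under stated assumptions: the function names are hypothetical, "symbol" is interpreted here as any character whose Unicode general category is punctuation (`P*`) or symbol (`S*`), and whitespace is removed as well; the source only says that non-text information such as punctuation marks is deleted.

```python
import unicodedata


def find_symbol_positions(info):
    """Determining unit (sketch): indices of characters treated as symbols."""
    return [
        i for i, ch in enumerate(info)
        # Assumption: punctuation (P*), symbols (S*) and whitespace are non-text.
        if unicodedata.category(ch)[0] in ("P", "S") or ch.isspace()
    ]


def delete_symbols(info, positions):
    """Deleting unit (sketch): drop the characters at the given positions."""
    drop = set(positions)
    return "".join(ch for i, ch in enumerate(info) if i not in drop)


def get_text_information(info):
    """Acquisition module (sketch): locate the symbols, then delete them."""
    return delete_symbols(info, find_symbol_positions(info))
```

For instance, `get_text_information("At 6:40, a minivan!")` yields a string without spaces, the colon, the comma or the exclamation mark.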
The first comparison module 12 is configured to compare the text information with the first attribute words in the pre-established multi-attribute dictionary.
In the embodiments of the present application, the first attribute words include keywords and the alternative words of keywords. A keyword is a basic word that can identify text information as illegal information, for example a word relating to pornography, gambling, drugs, violence, terrorism or other information that violates relevant national regulations.
An alternative word is a word that has the same pronunciation as the keyword or contains the same morpheme. Its degree of harm is the same as that of the keyword, and it is used to cover the case in which the keyword has been deliberately miswritten in illegal information. For example, when the keyword is "invoice" (发票), its alternative words may be near-homophones such as 发飘 or 发漂; likewise, when the keyword is "gun" (枪), its alternative word may be 木仓, which splits the character into its two components.
When comparing the text information with the first attribute words in the multi-attribute dictionary, the first comparison module 12 compares the text information with the keywords and the alternative words in turn to determine whether the text information includes a first attribute word. If the text information does not include a first attribute word, the text information is legal information and the operation ends. If the text information includes a first attribute word, the text information may be illegal information; the second comparison module 13 is then triggered to carry out the next operation, so as to finally determine whether the text information is illegal information.
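The first comparison can be sketched as a plain substring search over each keyword and its alternative words. The sketch below is illustrative only: the dictionary layout, the function name `find_first_attribute_word` and the English sample entries are assumptions, and the source does not specify which matching algorithm is used.

```python
# Hypothetical dictionary: each keyword maps to its alternative words.
FIRST_ATTRIBUTE_WORDS = {
    "invoice": ["inv0ice"],
    "meth": ["m3th"],
}


def find_first_attribute_word(text, dictionary=FIRST_ATTRIBUTE_WORDS):
    """Return (keyword, matched_word, position) for the first hit, or None.

    The text is compared with each keyword and then with its alternative
    words in turn, as described for the first comparison module.
    """
    for keyword, alternatives in dictionary.items():
        for word in [keyword, *alternatives]:
            position = text.find(word)  # -1 means the word is absent
            if position != -1:
                return keyword, word, position
    return None  # no first attribute word: the text is treated as legal
```

A `None` result corresponds to ending the operation with a "legal" verdict; any other result would hand the match position on to the second comparison.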
The second comparison module 13 is configured to, when the text information includes a first attribute word, compare the five characters located before the first attribute word in the text information and the five characters located after the first attribute word with the second attribute words in the multi-attribute dictionary to obtain a comparison result.
A second attribute word is a qualifier of the keyword and is used to qualify the keyword. The qualification may restrict the scope of use, the manner of use, the purpose of use and so on of the keyword. In the word order of a phrase, the qualifier may precede the keyword, as with "smoking" in "smoking methamphetamine", where the qualifier precedes the keyword and qualifies the manner in which methamphetamine is used. The qualifier may also follow the keyword, as with "detection" in "methamphetamine detection", where the qualifier follows the keyword and qualifies the purpose of use.
When the text information includes a first attribute word, the five characters before and the five characters after the first attribute word in the text information, ten characters in total, are compared with the second attribute words to determine whether these ten characters include a second attribute word. This improves the accuracy of detecting whether the text information includes a relevant second attribute word. If a second attribute word in the text information is separated from the first attribute word by five or more characters, the second attribute word cannot qualify the first attribute word, and there is then no need to judge whether the text information is illegal according to that second attribute word.
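The five-character windows can be illustrated with a short Python sketch. The function names and the sample data are hypothetical. As a design assumption, the two windows are checked separately rather than concatenated, so that a qualifier cannot be "found" by accident across the gap where the first attribute word sits.

```python
WINDOW = 5  # characters on each side of the first attribute word, per the text


def context_windows(text, start, length, window=WINDOW):
    """Return the characters just before and just after the matched word."""
    before = text[max(0, start - window):start]
    after = text[start + length:start + length + window]
    return before, after


def find_second_attribute_words(text, start, length, second_words):
    """Return the second attribute words that occur inside either window."""
    before, after = context_windows(text, start, length)
    return [word for word in second_words if word in before or word in after]
```

Here `start` and `length` describe where the first attribute word was matched. A qualifier that lies further than five characters away falls outside both windows and is therefore ignored, matching the rule stated above.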
The determining module 14 is configured to determine, according to the comparison result, whether the text information is illegal information. In the embodiments of the present application, after the comparison result is obtained, the determining module 14 can judge semantically, according to the comparison result, whether the text information is illegal information.
The determining module 14 is described below for the cases in which the qualifier includes positive-select words or counter-select words. A positive-select word forms an illegal phrase with the keyword; for example, the positive-select words of "invoice" include "issuing on behalf of others" (代开) and "selling". When the text information includes both a positive-select word and the keyword, the text information is illegal information. Correspondingly, a counter-select word forms a legal phrase with the keyword; for example, the counter-select words of "ice" (methamphetamine) include "test paper" and "detection". When the text information includes both a counter-select word and the keyword, the text information is legal information.
When the qualifier includes positive-select words, the determining module 14 is configured to: determine that the text information is illegal information when the comparison result shows that the text information includes a positive-select word; and determine that the text information is legal information when the comparison result shows that the text information does not include a positive-select word.
When the qualifier includes counter-select words, the determining module 14 is configured to: determine that the text information is illegal information when the comparison result shows that the text information does not include a counter-select word; and determine that the text information is legal information when the comparison result shows that the text information includes a counter-select word.
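The two decision rules above reduce to a small truth table, sketched here in Python. The function name `is_illegal` and the string labels `"positive"` and `"counter"` are assumptions made for illustration. The sketch also assumes that a keyword's qualifiers are of one kind only, since the text does not say how a keyword with both kinds of qualifier would be handled.

```python
def is_illegal(qualifier_kind, qualifier_found):
    """Decide whether text that already contains a first attribute word is illegal.

    qualifier_kind:  "positive" if the keyword's qualifiers are positive-select
                     words, "counter" if they are counter-select words.
    qualifier_found: comparison result, True if such a qualifier occurs within
                     five characters of the first attribute word.
    """
    if qualifier_kind == "positive":
        # A positive-select word next to the keyword forms an illegal phrase.
        return qualifier_found
    if qualifier_kind == "counter":
        # A counter-select word next to the keyword forms a legal phrase.
        return not qualifier_found
    raise ValueError(f"unknown qualifier kind: {qualifier_kind!r}")
```

Under these assumptions, a positive-select hit makes the text illegal, while a counter-select hit makes it legal, and the absence of a hit gives the opposite verdict in each case.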
The devices of all the above embodiments store the multi-attribute dictionary. Referring to Fig. 8, which shows the modules for establishing the multi-attribute dictionary that an information detection device of the embodiments of the present application may include: a keyword acquisition module 15, an analysis module 16, a position acquisition module 17 and a writing module 18, wherein:
The keyword acquisition module 15 is configured to obtain the keyword of any object to be detected.
An object to be detected is something whose presence in text information may cause the text information to be illegal information. For example, the aforementioned methamphetamine is an object to be detected, and the keyword obtained for it is "ice".
The analysis module 16 is configured to perform attribute analysis on the keyword to obtain the alternative words of the keyword and the second attribute words.
The attribute analysis of the keyword can be performed by staff, who analyze the keyword's attributes and then enter the alternative words and second attribute words they have identified. For example, the interface shown in Fig. 5 can be provided to staff, who write the alternative words and second attribute words into the corresponding positions of that interface, so that the analysis module 16 obtains the alternative words and second attribute words of the keyword.
The position acquisition module 17 is configured to determine, according to the acquired keyword, the positions of the resulting alternative words and second attribute words in the multi-attribute dictionary.
After the keyword, alternative words and second attribute words have been obtained, the position acquisition module 17 first determines the position of the keyword in the multi-attribute dictionary and whether each second attribute word (i.e., qualifier) of the keyword is a positive-select word or a counter-select word. Then, based on the position of the keyword, the position acquisition module 17 takes the positions in the same row as the keyword as the positions of the alternative words and second attribute words in the multi-attribute dictionary.
The writing module 18 is configured to write the resulting alternative words and second attribute words into the determined positions.
Take Table 1 as an example. Table 1 shows one form of the multi-attribute dictionary in the embodiments of the present application; it illustrates how keywords, alternative words and second attribute words are stored in the multi-attribute dictionary, where "×" indicates that the word does not exist.
Table 1: One form of the multi-attribute dictionary
For convenience of description, the above device is described in terms of units divided by function. Of course, when the present application is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware.
The information detection method and device provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the present application, and the description of the above embodiments is only intended to help in understanding the method of the present application and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. An information detection method, characterized in that the method comprises:
obtaining text information of information to be detected;
comparing the text information with first attribute words in a pre-established multi-attribute dictionary, wherein the first attribute words include a keyword and alternative words of the keyword, an alternative word being a word that has the same pronunciation as the keyword or includes the same morpheme;
when the text information includes a first attribute word, comparing the five characters located before the first attribute word in the text information and the five characters located after the first attribute word with second attribute words in the multi-attribute dictionary to obtain a comparison result, wherein a second attribute word is a qualifier of the keyword, the qualifier being used to qualify the keyword;
determining, according to the comparison result, whether the text information is illegal information.
2. The method according to claim 1, characterized in that the qualifier includes a positive-select word, the positive-select word forming an illegal phrase with the keyword;
the determining, according to the comparison result, whether the text information is illegal information comprises: when the comparison result shows that the text information includes the positive-select word, determining that the text information is illegal information;
when the comparison result shows that the text information does not include the positive-select word, determining that the text information is legal information.
3. The method according to claim 1, characterized in that the qualifier includes a counter-select word, the counter-select word forming a legal phrase with the keyword;
the determining, according to the comparison result, whether the text information is illegal information comprises: when the comparison result shows that the text information does not include the counter-select word, determining that the text information is illegal information;
when the comparison result shows that the text information includes the counter-select word, determining that the text information is legal information.
4. The method according to claim 1, characterized in that the obtaining text information of information to be detected comprises:
determining the positions of symbols in the information to be detected;
deleting the symbols at the determined positions to obtain the text information.
5. The method according to any one of claims 1 to 4, characterized in that the process of pre-establishing the multi-attribute dictionary comprises:
obtaining the keyword of any object to be detected;
performing attribute analysis on the keyword to obtain the alternative words of the keyword and the second attribute words;
determining, according to the acquired keyword, the positions of the resulting alternative words and second attribute words in the multi-attribute dictionary;
writing the resulting alternative words and second attribute words into the determined positions.
6. An information detection device, characterized in that the device comprises:
an acquisition module, configured to obtain text information of information to be detected;
a first comparison module, configured to compare the text information with first attribute words in a pre-established multi-attribute dictionary, wherein the first attribute words include a keyword and alternative words of the keyword, an alternative word being a word that has the same pronunciation as the keyword or includes the same morpheme;
a second comparison module, configured to, when the text information includes a first attribute word, compare the five characters located before the first attribute word in the text information and the five characters located after the first attribute word with second attribute words in the multi-attribute dictionary to obtain a comparison result, wherein a second attribute word is a qualifier of the keyword, the qualifier being used to qualify the keyword;
a determining module, configured to determine, according to the comparison result, whether the text information is illegal information.
7. The device according to claim 6, characterized in that the qualifier includes a positive-select word, the positive-select word forming an illegal phrase with the keyword;
the determining module is configured to determine that the text information is illegal information when the comparison result shows that the text information includes the positive-select word, and to determine that the text information is legal information when the comparison result shows that the text information does not include the positive-select word.
8. The device according to claim 6, characterized in that the qualifier includes a counter-select word, the counter-select word forming a legal phrase with the keyword;
the determining module is configured to determine that the text information is illegal information when the comparison result shows that the text information does not include the counter-select word, and to determine that the text information is legal information when the comparison result shows that the text information includes the counter-select word.
9. The device according to claim 6, characterized in that the acquisition module comprises:
a determining unit, configured to determine the positions of symbols in the information to be detected;
a deleting unit, configured to delete the symbols at the determined positions to obtain the text information.
10. The device according to any one of claims 6 to 9, characterized in that the device further comprises:
a keyword acquisition module, configured to obtain the keyword of any object to be detected;
an analysis module, configured to perform attribute analysis on the keyword to obtain the alternative words of the keyword and the second attribute words;
a position acquisition module, configured to determine, according to the acquired keyword, the positions of the resulting alternative words and second attribute words in the multi-attribute dictionary;
a writing module, configured to write the resulting alternative words and second attribute words into the determined positions.
Application Number: CN201410611713.1A
Title: A kind of information detecting method and device
Priority Date: 2014-11-04
Filing Date: 2014-11-04
Status: Active
Granted Publication: CN104331475B (en)

Publications (2)

Publication Number Publication Date
CN104331475A (en) 2015-02-04
CN104331475B (en) 2018-03-23

Family ID: 52406202

Country Status (1)

Country Link
CN (1) CN104331475B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 450000 Zhengzhou science and technology zone, Henan high tech Road, building 169, building 1, No. 1

Applicant after: ZHENGZHOU XIZHI INFORMATION TECHNOLOGY CO., LTD.

Address before: 450000 Zhengzhou science and technology zone, Henan high tech Road, building 169, building 1, No. 1

Applicant before: ZHENGZHOU XIZHI INFORMATION TECHNOLOGY CO., LTD.

COR Change of bibliographic data
GR01 Patent grant