Disclosure of Invention
The invention aims to provide an artificial intelligence-based network rumor recognition system which solves at least one of the problems in the prior art.
In order to achieve the above purpose, the invention adopts the following technical scheme:
an artificial intelligence based network rumor identification system comprising:
the data acquisition module is used for acquiring news texts;
The text analysis module is used for analyzing the characteristics of the news text according to preset keywords and the number of wrongly written words in the news text;
The user analysis module is used for analyzing the behavior characteristics of the user according to the comment times, the praise times, the sharing times and the reply times of the news text publisher in the first monitoring period;
The propagation analysis module is used for analyzing the propagation speed of the news text according to the sharing total number of the news text in the second monitoring period, and for analyzing the propagation characteristics of the news text according to the propagation speed of the news text and the number of propagators of the news text in the second monitoring period;
The recognition module is used for recognizing the news text according to the analysis result of the characteristics of the news text, the analysis result of the behavior characteristics of the user and the analysis result of the propagation characteristics of the news text so as to obtain the recognition result of the news text, and is also used for adjusting the recognition process of the news text according to the number of the historical propagation rumor texts of the news text publisher and the release time of the news text;
and the storage module is used for storing the recognition result of the news text and outputting the recognition result to the user.
Further, the text analysis module comprises a text analysis unit, wherein the text analysis unit is used for matching preset keywords with the news text; if a preset keyword is successfully matched with the news text, the preset keyword is used as a rumor keyword of the news text, and the number of occurrences of the i-th rumor keyword in the news text is counted and recorded as fi;
The text analysis unit is also used for performing word segmentation on the news text, and setting the emotion intensity value of the j-th word of the news text as Dj;
the text analysis unit is also used for counting the number of wrongly written words of the news text and marking the number as B.
Further, the text analysis module further comprises a feature extraction unit, which calculates the text features of the news text according to the number of occurrences fi of each rumor keyword in the news text, the emotion intensity value Dj of each word of the news text and the number B of wrongly written words of the news text, so as to obtain the features WT of the news text.
Further, the user analysis module is configured to analyze the user behavior characteristics according to the comment times Ct, the praise times R, the sharing times L and the reply times S of the news text publisher in the first monitoring period, so as to obtain the user behavior characteristics HT.
Further, the propagation analysis module includes a speed analysis unit, which is configured to analyze a propagation speed of the news text according to a total sharing number fx of the news text in the second monitoring period, so as to obtain a propagation speed Ve of the news text.
Further, the propagation analysis module further comprises a construction unit, wherein the construction unit is used for constructing the propagator coefficients according to the number of the propagators of the news text in the second monitoring period, and the construction result of the propagator coefficients comprises P1, P2 and P3.
Further, the propagation analysis module further includes a propagation analysis unit, configured to analyze the propagation characteristics of the news text according to the propagation speed Ve of the news text and the number zc of the propagators and the construction result of the propagator coefficient of the news text in the second monitoring period, so as to obtain the propagation characteristics CT of the news text.
Further, the recognition module includes a recognition unit for recognizing the news text according to the analysis result of the characteristics of the news text, the analysis result of the behavior characteristics of the user, and the analysis result of the propagation characteristics of the news text, wherein:
if γ1×WT+γ2×HT+γ3×CT ≤ u0, the recognition unit determines that the news text is a normal news text;
If γ1×WT+γ2×HT+γ3×CT > u0, the recognition unit determines that the news text is a rumor risk news text;
Wherein γ1 is the text weight, γ2 is the behavior weight, γ3 is the propagation weight, and u0 is the preset anomaly coefficient.
Further, the recognition module further comprises a source analysis unit, the source analysis unit constructs source anomaly coefficients of the news text according to the number ys of the historical propagation rumor texts of the news text publisher, adjusts the recognition process of the news text according to the source anomaly coefficients, and sets the adjusted preset anomaly coefficients as u1.
Further, the recognition module further comprises a consistency analysis unit, which analyzes the consistency parameter YZ of the news text according to the release days t0 and the preset release days t1 of the news text, compares the consistency parameter YZ of the news text with a preset consistency threshold YZ0, and, if the consistency parameter of the news text does not conform to the threshold, sets the preset adjustment ratio as η1.
The invention has the following advantages. Multi-level feature analysis covering text features, user behavior, propagation features and source credibility allows network rumors to be identified accurately and improves the accuracy and efficiency of judgment. The user analysis module and the propagation analysis module help to better evaluate information propagation risks and support decision making. Finally, the storage and output functions of the system ensure that users obtain recognition results in time, improving user experience and credibility. In conclusion, the invention provides an efficient and accurate solution for rumor identification and helps to maintain a healthy and truthful network environment.
Detailed Description
In order to more clearly illustrate the present invention, the present invention will be further described with reference to preferred embodiments and the accompanying drawings. Like parts in the drawings are denoted by the same reference numerals. It is to be understood by persons skilled in the art that the following detailed description is illustrative and not restrictive, and that this invention is not limited to the details given herein.
It should be noted that although the terms first, second, third, etc. may be used in the embodiments of the present application, the description should not be limited to these terms. These terms are only used to distinguish one from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of embodiments of the application.
Referring to fig. 1, a schematic diagram of an artificial intelligence based network rumor recognition system according to the present embodiment is shown, which includes:
The data acquisition module is used for acquiring news texts.
It can be understood that the method for collecting news texts is not particularly limited in this embodiment and can be freely set by a person skilled in the art, provided that news texts can be collected; for example, the news texts can be collected from news websites and social media through crawler technology or a third-party API.
With continued reference to fig. 1, the system further includes:
The text analysis module is connected with the data acquisition module and is used for analyzing the characteristics of the news text according to preset keywords and the number of wrongly written words in the news text.
Specifically, the setting of the preset keywords is not specifically limited in this embodiment and can be freely set by a person skilled in the art, provided that the setting requirement of the preset keywords is met; for example, the preset keywords can be set as "burst, urgent, shock, explosive, real, secret, action, masking, inside story of a decision, danger, alarm, threat, fraud, health warning" and the like.
Referring to fig. 2, the text analysis module includes:
The text analysis unit is used for matching the preset keywords with the news text, if the preset keywords are successfully matched with the news text, the preset keywords are used as rumor keywords of the news text, if the preset keywords are not successfully matched with the news text, the preset keywords are not used as rumor keywords of the news text, the occurrence times of the ith rumor keywords in the news text are counted, and the occurrence times are recorded as fi;
The text analysis unit is also used for performing word segmentation on the news text, and setting the emotion intensity value of the j-th word of the news text as Dj;
The text analysis unit is also used for counting the number of wrongly written words of the news text and recording it as B. By analyzing the preset keywords and the number of wrongly written words, the module can quickly identify possible features of a rumor text and support preliminary screening.
Specifically, in this embodiment, the methods for counting the number of occurrences of each rumor keyword in the news text, for segmenting the news text into words, for assigning emotion intensity values, and for obtaining the number of wrongly written words are not specifically limited and can be freely set by a person skilled in the art, provided that the corresponding requirements are met; for example, word frequency can be counted with SnowNLP, the news text can be segmented with the Jieba word segmentation tool, emotion values can be assigned to each word of the news text with the hakuh emotion dictionary, and wrongly written words in the news text can be identified with Hunspell.
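As an illustration of the keyword-matching step only, the following minimal Python sketch counts the occurrences fi of each preset keyword by plain substring matching. The function name `count_rumor_keywords` and the sample text are hypothetical; the embodiment does not prescribe this implementation, and word segmentation, emotion assignment and detection of wrongly written words are not shown.

```python
def count_rumor_keywords(news_text, preset_keywords):
    """Return {keyword: fi} for every preset keyword found in the text."""
    rumor_keywords = {}
    for keyword in preset_keywords:
        # str.count returns the number of non-overlapping occurrences.
        occurrences = news_text.count(keyword)
        if occurrences > 0:
            # A successful match makes the preset keyword a rumor keyword.
            rumor_keywords[keyword] = occurrences
    return rumor_keywords


# Hypothetical sample text and keyword list, for illustration only.
sample_text = "urgent: secret report leaked, urgent alarm raised"
counts = count_rumor_keywords(sample_text, ["urgent", "secret", "fraud"])
print(counts)
```

Keywords that do not occur in the text are left out of the result, which mirrors the rule that an unmatched preset keyword is not used as a rumor keyword.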
With continued reference to fig. 2, the text analysis module further includes:
The feature extraction unit is connected with the text analysis unit, and calculates the text features WT of the news text according to the number of occurrences of each rumor keyword in the news text, the emotion intensity value of each word of the news text and the number of wrongly written words of the news text, and sets:
;
Wherein w1 is the keyword weight, w2 is the emotion weight, w3 is the wrongly written word weight, w1+w2+w3=1, I is the number of rumor keywords in the news text, N is the total number of words of the news text, and NZ is the total number of characters of the news text. By comprehensively calculating the text feature WT from multi-dimensional characteristics such as the number of occurrences of the keywords, the emotion intensity and the number of wrongly written words, the judgment capability of the system on the news text is improved, and the accuracy of network rumor identification is further improved.
It can be understood that, in this embodiment, the method for obtaining the total number of words N and the total number of characters NZ of the news text is not specifically limited and can be freely set by a person skilled in the art, provided that N and NZ can be obtained; for example, they can be obtained through string processing functions in a programming language.
With continued reference to fig. 1, the system further includes:
The user analysis module is connected with the text analysis module and is used for analyzing the user behavior characteristics according to the comment times Ct, the praise times R, the sharing times L and the reply times S of the news text publisher in the first monitoring period so as to effectively evaluate the user behavior characteristics, wherein:
The user analysis module sets the user behavior feature as HT, and sets HT = α×Ct/T1 + β×(R+L+S)/T1, where T1 is the duration of the first monitoring period, α is the comment weight, β is the interaction weight, and α+β=1. It can be understood that, in this embodiment, the setting of the comment weight and the interaction weight is not specifically limited and can be freely set by a person skilled in the art, provided that the setting requirements of the comment weight and the interaction weight are met, wherein the optimal value of α is 0.4 and the optimal value of β is 0.6. By analyzing the interaction behavior (comment, praise, share, etc.) of the news publisher in the first monitoring period, the module can effectively evaluate the user behavior feature HT and reflect the potential risk of rumor propagation; user behavior analysis thus integrates social interaction into the rumor recognition process and improves the practicability of the system.
It can be understood that, in this embodiment, the manner of obtaining the comment times Ct, the praise times R, the sharing times L and the reply times S of the news text publisher in the first monitoring period is not specifically limited and can be freely set by a person skilled in the art, provided that these values can be obtained; for example, they can be obtained through the API of a social media platform or news website. The setting of the first monitoring period is likewise not specifically limited and can be freely set by a person skilled in the art, provided that the setting requirement of the first monitoring period is met; for example, the 7 days before the news text is published can be used as the first monitoring period.
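The user behavior feature, read as HT = α×Ct/T1 + β×(R+L+S)/T1, can be sketched in Python as follows. The function and parameter names are hypothetical, the default weights are the optimal values stated in this embodiment (α = 0.4, β = 0.6), and the sample counts are invented for illustration.

```python
def user_behavior_feature(ct, r, l, s, t1_days, alpha=0.4, beta=0.6):
    """Compute HT from the publisher's activity in the first monitoring period.

    ct: comment times, r: praise times, l: sharing times, s: reply times,
    t1_days: duration T1 of the first monitoring period.
    """
    if t1_days <= 0:
        raise ValueError("the monitoring period T1 must be positive")
    # Comments and the other interactions are averaged over T1 and weighted.
    return alpha * ct / t1_days + beta * (r + l + s) / t1_days


# Hypothetical publisher activity over a 7-day first monitoring period.
ht = user_behavior_feature(ct=14, r=21, l=7, s=7, t1_days=7)
print(round(ht, 2))
```

With these sample values the two terms are 0.4×14/7 = 0.8 and 0.6×35/7 = 3.0, so HT is 3.8.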
With continued reference to fig. 1, the system further includes:
and the propagation analysis module is connected with the user analysis module and is used for analyzing the propagation characteristics of the news text according to the sharing total number and the number of propagators of the news text in the second monitoring period.
Referring to fig. 3, the propagation analysis module includes:
The speed analysis unit is used for analyzing the propagation speed Ve of the news text according to the sharing total number fx of the news text in the second monitoring period, and sets Ve = fx/T2, where T2 is the duration of the second monitoring period. The propagation speed Ve is thus calculated from the relationship between the sharing total number of the news text and time within the specified monitoring period. A high propagation speed generally means that the content attracts attention and is shared by a large number of users in a short time, possibly indicating that the content concerns an emergency or is information with a high degree of attention. Through this analysis, the system can identify the propagation trend of the information in time and provide a basis for rumor identification.
With continued reference to fig. 3, the propagation analysis module further includes:
The construction unit is connected with the speed analysis unit and is used for constructing a propagator coefficient for each propagator of the news text in the second monitoring period, so as to effectively distinguish the credibility of different propagators and reduce the propagation risk of false information, where cb(k) denotes the number of rumor texts historically propagated by the k-th propagator and q1 and q2 are preset numbers: if cb(k) ≤ q1, the construction unit sets the propagator coefficient of the k-th propagator as P1; if q1 < cb(k) ≤ q2, the construction unit sets the propagator coefficient of the k-th propagator as P2; if cb(k) > q2, the construction unit sets the propagator coefficient of the k-th propagator as P3. Because the propagator coefficient is evaluated according to the past propagation records of the different propagators, the robustness and effectiveness of the system are further enhanced;
It is understood that, in this embodiment, each preset number and preset coefficient are not specifically limited, and a person skilled in the art can freely set the number and preset coefficient, and only needs to meet the setting requirement of each preset number and preset coefficient, wherein, the optimal value of q1 is 3, the optimal value of q2 is 5, the optimal value of P1 is 1, the optimal value of P2 is 1.2, and the optimal value of P3 is 1.5.
Specifically, the method for acquiring the sharing total number of the news text in the second monitoring period, the number of propagators, and the number of rumor texts historically propagated by each propagator is not limited in this embodiment and can be freely set by a person skilled in the art, provided that these values can be acquired; for example, they can be acquired through the API of a social media platform or news website. The setting of the second monitoring period is likewise not limited in this embodiment and can be freely set by a person skilled in the art, provided that the setting requirement of the second monitoring period is met; for example, the second monitoring period can be set to the 5 days after the news text is released.
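The propagation speed Ve = fx/T2 and the piecewise propagator coefficient can be sketched in Python as below. The function names are hypothetical, the defaults are the optimal values stated in this embodiment (q1 = 3, q2 = 5, P1 = 1, P2 = 1.2, P3 = 1.5), and cb(k) is assumed here to be the number of rumor texts historically propagated by the k-th propagator.

```python
def propagation_speed(fx, t2_days):
    """Ve = fx / T2: shares per day over the second monitoring period."""
    if t2_days <= 0:
        raise ValueError("the monitoring period T2 must be positive")
    return fx / t2_days


def propagator_coefficient(cb_k, q1=3, q2=5, p1=1.0, p2=1.2, p3=1.5):
    """Map cb(k) to P1, P2 or P3 using the thresholds q1 and q2.

    cb_k is assumed to be the k-th propagator's historical rumor count.
    """
    if cb_k <= q1:
        return p1
    if cb_k <= q2:
        return p2
    return p3


# Hypothetical example: 1500 shares in a 5-day second monitoring period,
# and three propagators with 0, 4 and 9 historically propagated rumor texts.
ve = propagation_speed(fx=1500, t2_days=5)
coefficients = [propagator_coefficient(cb) for cb in (0, 4, 9)]
print(ve, coefficients)
```

Both boundary values fall into the lower band: cb(k) = 3 yields P1 and cb(k) = 5 yields P2.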
With continued reference to fig. 3, the propagation analysis module further includes:
the propagation analysis unit is connected with the construction unit and is used for analyzing the propagation characteristics CT of the news text according to the propagation speed Ve of the news text, the number zc of the propagators of the news text in the second monitoring period and the construction result of the propagator coefficients, and setting:
;
Wherein VT is the preset propagation speed, CB is the preset propagation number threshold, and the formula uses the propagator coefficient of the k-th propagator of the news text. By integrating the propagation speed analysis and the construction result of the propagator coefficients, the propagation characteristics of the news text are further analyzed, which helps to judge the influence and potential public opinion risk of the news content and provides a scientific basis for further rumor identification.
It can be understood that, in this embodiment, the setting of the preset propagation speed and the preset propagation number threshold is not specifically limited, and a person skilled in the art can freely set the setting requirement of the preset propagation speed and the preset propagation number threshold only needs to be met, wherein the optimal value of VT is 200 times/day, and the optimal value of CB is 1000.
With continued reference to fig. 1, the system further includes:
the recognition module is connected with the propagation analysis module and is used for recognizing the news text according to the analysis result of the characteristics of the news text, the analysis result of the behavior characteristics of the user and the analysis result of the propagation characteristics of the news text so as to obtain the recognition result of the news text, and for adjusting the recognition process of the news text according to the number of rumor texts historically propagated by the news text publisher and the release time of the news text.
Referring to fig. 4, the identification module includes:
the recognition unit is used for recognizing the news text according to the analysis result of the characteristics of the news text, the analysis result of the behavior characteristics of the user and the analysis result of the propagation characteristics of the news text, wherein:
if γ1×WT+γ2×HT+γ3×CT ≤ u0, the recognition unit determines that the news text is a normal news text;
If γ1×WT+γ2×HT+γ3×CT > u0, the recognition unit determines that the news text is a rumor risk news text;
By comprehensively analyzing the text characteristics, user behavior characteristics and propagation characteristics of the news text, the final judgment of the news text is realized and the accuracy of network rumor identification is improved.
It can be understood that the setting of each weight and the preset anomaly coefficient is not specifically limited in this embodiment, and a person skilled in the art can freely set the setting of each weight and the preset anomaly coefficient only by meeting the setting requirement, wherein the optimal value of γ1 is 0.4, the optimal value of γ2 is 0.3, the optimal value of γ3 is 0.3, and the optimal value of u0 is 0.46.
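The decision rule of the recognition unit can be sketched in Python as follows. The function name and the sample feature values are hypothetical; the default weights and threshold are the optimal values stated in this embodiment (γ1 = 0.4, γ2 = 0.3, γ3 = 0.3, u0 = 0.46).

```python
def recognize_news_text(wt, ht, ct, threshold=0.46,
                        gamma1=0.4, gamma2=0.3, gamma3=0.3):
    """Classify a news text from its WT, HT and CT features.

    threshold is the preset anomaly coefficient u0; an adjusted value
    can be passed in its place.
    """
    score = gamma1 * wt + gamma2 * ht + gamma3 * ct
    if score > threshold:
        return "rumor risk news text"
    return "normal news text"


# Hypothetical feature values, for illustration only.
print(recognize_news_text(wt=0.2, ht=0.3, ct=0.4))  # weighted score 0.29
print(recognize_news_text(wt=0.9, ht=0.5, ct=0.6))  # weighted score 0.69
```

Passing the threshold as a parameter keeps the comparison unchanged when the preset anomaly coefficient is later adjusted.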
With continued reference to fig. 4, the identification module further includes:
The source analysis unit is connected with the identification unit, constructs source anomaly coefficients of the news texts according to the number ys of the historical propagation rumor texts of the news text publishers, and adjusts the identification process of the news texts according to the source anomaly coefficients, wherein:
If ys=0, the source analysis unit sets the source anomaly coefficient of the news text to 0;
If ys>0, the source analysis unit sets the source anomaly coefficient of the news text as LY, and sets LY = η×lg(ys+1), wherein η is a preset adjustment ratio, and 0<η<0.4;
The source analysis unit sets the adjusted preset anomaly coefficient as u1, and sets u1 = u0−LY. By constructing the source anomaly coefficient from the number of rumor texts historically propagated by the publisher, the system can perform risk assessment on the source of the news text so as to improve the accuracy of network rumor identification.
Specifically, the method for obtaining the number of the history propagation rumor texts of the news text publisher is not specifically limited, and can be freely set by a person skilled in the art, and only the requirement for obtaining the number of the history propagation rumor texts of the news text publisher is met, wherein the method can be obtained through social media or APIs of a news website.
It can be understood that the setting of the preset adjustment ratio is not particularly limited in this embodiment, and a person skilled in the art can freely set the setting of the preset adjustment ratio only by meeting the setting requirement of the preset adjustment ratio, wherein the optimal value of η is 0.3.
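The source adjustment can be sketched in Python as below, reading lg as the base-10 logarithm, so that LY = η×lg(ys+1) and u1 = u0−LY. The function names and the sample publisher history are hypothetical; η defaults to the optimal value 0.3 stated in this embodiment.

```python
import math


def source_anomaly_coefficient(ys, eta=0.3):
    """Return 0 when ys = 0, otherwise LY = eta * lg(ys + 1)."""
    if ys < 0:
        raise ValueError("ys is a count and cannot be negative")
    if ys == 0:
        return 0.0
    return eta * math.log10(ys + 1)


def adjusted_anomaly_coefficient(u0, ly):
    """u1 = u0 - LY: a riskier source lowers the decision threshold."""
    return u0 - ly


# Hypothetical publisher with 9 historically propagated rumor texts.
ly = source_anomaly_coefficient(ys=9)
u1 = adjusted_anomaly_coefficient(u0=0.46, ly=ly)
print(round(ly, 2), round(u1, 2))
```

For ys = 9 the logarithm is lg(10) = 1, so LY equals η = 0.3 and the threshold drops from 0.46 to about 0.16.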
With continued reference to fig. 4, the identification module further includes:
The consistency analysis unit is connected with the source analysis unit, analyzes the consistency parameter YZ of the news text according to the release days t0 and the preset release days t1 of the news text, and sets YZ = 1−1/(1+e^(t1−t0)), where e is the base of the natural logarithm;
The consistency analysis unit compares consistency parameters YZ of the news text with a preset consistency threshold YZ0 and adjusts the construction process of the source anomaly coefficient according to the comparison result, wherein:
if YZ is less than or equal to YZ0, the consistency analysis unit judges that the consistency parameters of the news text accord with the threshold value and does not adjust;
If YZ > YZ0, the consistency analysis unit judges that the consistency parameter of the news text does not conform to the threshold, sets the preset adjustment ratio as η1, and sets η1 = η×{1+exp[3×(YZ−YZ0)/(YZ+YZ0)−3]}. Evaluating the consistency parameter YZ of the news text according to its release time helps to judge whether the information is a network rumor, thereby providing an important basis for rumor identification and further improving the accuracy of network rumor identification.
Specifically, the embodiment does not specifically limit the manner of obtaining the number of days of release of the news text, and a person skilled in the art can freely set the method, which only needs to meet the requirement of obtaining the number of days of release of the news text, wherein the method can be obtained through social media or an API of a news website.
It can be understood that, in this embodiment, the setting of the preset consistency threshold and the preset release days is not specifically limited and can be freely set by a person skilled in the art, provided that the setting requirements of the preset consistency threshold and the preset release days are met, wherein the optimal value of t1 is 5 days, and the optimal value of YZ0 is 0.5.
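The consistency analysis can be sketched in Python as follows, reading the exponent of the consistency parameter as t1−t0, so that YZ = 1−1/(1+e^(t1−t0)), and the adjusted ratio as η1 = η×{1+exp[3×(YZ−YZ0)/(YZ+YZ0)−3]}. The function names and the sample release days are hypothetical; the defaults are the optimal values stated in this embodiment (t1 = 5, YZ0 = 0.5, η = 0.3).

```python
import math


def consistency_parameter(t0, t1=5):
    """YZ = 1 - 1 / (1 + e^(t1 - t0)) for release days t0 and preset t1."""
    return 1.0 - 1.0 / (1.0 + math.exp(t1 - t0))


def adjustment_ratio(yz, yz0=0.5, eta=0.3):
    """Return eta unchanged when YZ <= YZ0, otherwise the adjusted eta1."""
    if yz <= yz0:
        return eta
    return eta * (1.0 + math.exp(3.0 * (yz - yz0) / (yz + yz0) - 3.0))


# Hypothetical news text with release days t0 = 2.
yz = consistency_parameter(t0=2)
eta1 = adjustment_ratio(yz)
print(round(yz, 4), round(eta1, 4))
```

Under this reading, YZ equals 0.5 exactly when t0 = t1, so with YZ0 = 0.5 the ratio is adjusted only for texts whose release days are below the preset value; the adjusted ratio always stays between η and 2η because the exponent is negative.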
With continued reference to fig. 1, the system further includes:
the storage module is connected with the identification module and used for storing the identification result of the news text and outputting the identification result to a user.
Specifically, the network rumor recognition system based on artificial intelligence is applied to rumor recognition on social media and network platforms; by comprehensively analyzing multidimensional information such as text characteristics, user behaviors, propagation characteristics and information sources, it improves the accuracy and efficiency of rumor recognition and helps to build a healthier and safer network environment.
It should be understood that the foregoing examples of the present invention are provided merely for clearly illustrating the present invention and are not intended to limit the embodiments of the present invention, and that various other changes and modifications may be made therein by one skilled in the art without departing from the spirit and scope of the present invention as defined by the appended claims.