CN112836505A - An Information Credibility Detection and Evaluation System Based on Multi-source Data - Google Patents

An Information Credibility Detection and Evaluation System Based on Multi-source Data

Info

Publication number
CN112836505A
Authority
CN
China
Prior art keywords
source data
data
credibility
research
reliability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110080173.9A
Other languages
Chinese (zh)
Inventor
丛杨
董家华
孙干
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Institute of Automation of CAS
Original Assignee
Shenyang Institute of Automation of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Institute of Automation of CAS
Priority to CN202110080173.9A
Publication of CN112836505A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an information credibility detection and evaluation system based on multi-source data. The system includes: a data acquisition module for collecting multi-source data related to a research task; a data processing module for preprocessing the collected multi-source data related to the research task and obtaining its semantic feature description vectors and the attribute label information of each research question; and a credibility detection and evaluation module for detecting the credibility of each research question and evaluating the overall credibility of the investigated object. Without supervised guidance information, the invention mines the latent structural characteristics and the diverse, distinctive data attributes of multi-source data, so as to effectively perform information credibility detection and evaluation on multi-source research data, analyze and detect the credibility of each piece of research data, and evaluate the overall credibility of the investigated object, providing strong supervision and guidance for data mining tasks related to credibility detection.

Description

Information credibility detection and evaluation system based on multi-source data
Technical Field
The invention belongs to the technical field of information credibility detection and evaluation, and particularly relates to an information credibility detection and evaluation system based on multi-source data.
Background
Existing data reliability analysis systems mainly detect whether a single data source is abnormal, under the guidance of partial supervision information. Owing to the complexity of data structures, the diversity of data attributes, noise interference, and similar factors in the real world, existing data reliability analysis systems lack robustness when detecting the reliability of multi-source data.
Disclosure of Invention
To solve these problems, the invention provides an information credibility detection and evaluation system based on multi-source data. Without supervised guidance information, it mines the latent structural characteristics and the diverse, distinctive data attributes of multi-source data, and thereby effectively performs information credibility detection and evaluation on multi-source research data, analyzes and detects the credibility of each piece of research data, evaluates the overall credibility of the investigated object, and provides strong supervision and guidance for data mining tasks related to credibility detection.
The purpose of the invention is to provide an information credibility detection and evaluation system based on multi-source data that performs information credibility detection and evaluation on multi-source research data, analyzes and detects the credibility of each piece of research data, and evaluates the overall credibility of the investigated object.
The technical scheme adopted by the invention to achieve this purpose is as follows:
An information credibility detection and evaluation system based on multi-source data comprises:
a data acquisition module, used for acquiring multi-source data related to the research task;
a data processing module, used for preprocessing the collected multi-source data related to the research task and acquiring the semantic feature description vectors of the multi-source data and the attribute label of each research task;
and a credibility detection and evaluation module, used for detecting the credibility of each research task according to the semantic feature description vectors and the attribute label information, and evaluating the overall credibility of the investigated object.
The data acquisition module comprises:
a keyword acquisition unit for acquiring keywords related to the research task;
and the data mining unit is used for acquiring multi-source data information containing keywords in the research task.
The data processing module comprises:
the data screening unit is used for pre-screening the multi-source data to eliminate repeated and redundant multi-source data and delete the multi-source data irrelevant to the semantics of the keywords;
the text word segmentation unit is used for carrying out sentence word segmentation processing on the multi-source data to obtain vocabulary data after the sentence word segmentation;
and the characteristic extraction unit is used for extracting the characteristic vector in the vocabulary data and further determining the semantic characteristic description vector of the statement.
The credibility detection and evaluation module comprises:
the multi-source data credibility detection unit is used for detecting the credibility of each piece of multi-source data related to the investigation task and giving a credibility detection value;
and the reliability evaluation unit of the investigated object is used for evaluating the overall reliability of the investigated object and providing a reliability evaluation value of the investigated object.
The multi-source data credibility detection unit comprises:
the dictionary learning unit is used for learning sparse dictionaries for different research tasks, extracting semantic information of multi-source data related to the research tasks from semantic feature description vectors of the multi-source data, and discarding redundant semantic information of the multi-source data unrelated to the research tasks;
the data reconstruction unit is used for reconstructing semantic feature description vectors of multi-source data used for the research task according to the learned sparse dictionary;
the data reliability detection unit is used for quantizing the reconstructed semantic feature description vector into a reliability detection numerical value;
The investigated object credibility evaluation unit comprises:
the attribute hierarchy analysis unit, used for determining the relative importance weights of all attribute labels of the multi-source data of the research task through the analytic hierarchy process;
and the object credibility evaluation unit, used for computing a weighted average of the credibility values of all the multi-source data according to the relative importance weight of each piece of multi-source data, to obtain the credibility evaluation value of the investigated object.
A method for detecting and evaluating information reliability based on multi-source data comprises the following steps:
1) the data acquisition module acquires multi-source data related to the research task;
2) the data processing module carries out preprocessing operation on the collected multi-source data related to the research tasks and obtains semantic feature description vectors in the multi-source data and attribute labels of each research task;
3) and the reliability detection and evaluation module detects the reliability of each investigation task according to the semantic feature description vector and the attribute label information and evaluates the overall reliability of the investigated object.
The step 1) comprises the following steps:
1.1) a keyword acquisition unit acquires keywords related to a research task;
1.2) the data mining unit acquires multi-source data information containing keywords in the research task.
The step 2) comprises the following steps:
2.1) the data screening unit pre-screens the multi-source data to eliminate repeated and redundant multi-source data and delete the multi-source data irrelevant to the semantics of the keywords;
2.2) the text word segmentation unit carries out sentence word segmentation on the multi-source research data to obtain vocabulary data after sentence word segmentation;
2.3) the feature extraction unit extracts the feature vector in the vocabulary data and further determines the semantic feature description vector of the sentence.
The step 3) comprises the following steps:
3.1) the multi-source data credibility detection unit detects the credibility of each piece of multi-source data related to the research task and provides a credibility detection value;
3.2) the investigated object credibility evaluation unit evaluates the overall credibility of the investigated object and gives a credibility evaluation value of the investigated object.
Steps 3.1) and 3.2) comprise the following sub-steps:
3.1.1) the dictionary learning unit learns sparse dictionaries aiming at different research tasks, extracts semantic information of multi-source data relevant to the research tasks from semantic feature description vectors of the multi-source data, and discards redundant semantic information of the multi-source data irrelevant to the research tasks;
3.1.2) the data reconstruction unit reconstructs semantic feature description vectors of the multi-source data used for the research task based on the learned sparse dictionary;
3.1.3) the data credibility detection unit quantizes the reconstructed semantic feature description vector into a credibility detection numerical value;
3.2.1) the attribute hierarchy analysis unit determines the relative importance weights of all the attribute labels of the multi-source data of the research task through the analytic hierarchy process;
3.2.2) the object credibility evaluation unit obtains the credibility evaluation numerical value of the investigated object by weighted average of the credibility numerical values of all multi-source data according to the relative importance weight of each multi-source data.
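The three top-level steps above can be sketched as a small pipeline. This is only an illustrative skeleton, not the patent's implementation: the `Record` type, the `run_pipeline` function, and the pluggable `acquire`, `featurize`, and `score` callables are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Record:
    text: str        # one piece of multi-source research data
    attribute: str   # attribute label of the research question it answers


def run_pipeline(
    acquire: Callable[[], Sequence[Record]],
    featurize: Callable[[str], Sequence[float]],
    score: Callable[[Sequence[float]], float],
    weights: Dict[str, float],
) -> Tuple[List[float], float]:
    """Wire steps 1) to 3) together; every stage is supplied by the caller."""
    records = acquire()                                  # step 1: data acquisition
    features = [featurize(r.text) for r in records]      # step 2: semantic feature vectors
    scores = [score(f) for f in features]                # step 3.1: per-record credibility
    w = [weights[r.attribute] for r in records]          # step 3.2: attribute weights
    overall = sum(wi * si for wi, si in zip(w, scores)) / sum(w)
    return scores, overall
```

Keeping each stage as a callable mirrors the module/unit decomposition in the text, so any one stage can be replaced without touching the others.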
The invention has the advantages and beneficial effects that:
1. The invention fills the gap in building a multi-source data credibility detection and evaluation system without supervised guidance information, and provides strong supervision and guidance for credibility-based multi-source data mining tasks in a big-data era flooded with false information.
2. The invention can mine the latent structural characteristics and the diverse, distinctive data attributes of multi-source data without supervised guidance information, thereby effectively improving the performance and robustness of information credibility detection and evaluation for multi-source research data, and offering a reference for other big-data-driven mining and evaluation tasks.
Drawings
In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and should therefore not be considered as limiting its scope; those skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic diagram of a system for detecting and evaluating information credibility based on multi-source data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a data acquisition module in an information credibility detection and evaluation system based on multi-source data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a data processing module in an information credibility detection and evaluation system based on multi-source data according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a reliability detection and evaluation module in an information reliability detection and evaluation system based on multi-source data according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a multi-source data reliability detection unit in an information reliability detection and evaluation system based on multi-source data according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a reliability evaluation unit of an investigated object in an information reliability detection and evaluation system based on multi-source data according to an embodiment of the present invention;
FIG. 7 is a flowchart of a method for constructing an information credibility detection and evaluation system based on multi-source data according to an embodiment of the present invention;
FIG. 8 is a data collection flow chart in a method for constructing a multi-source data-based information reliability detection and evaluation system according to an embodiment of the present invention;
FIG. 9 is a data processing flow chart in a method for constructing a multi-source data-based information reliability detection and evaluation system according to an embodiment of the present invention;
FIG. 10 is a flowchart of reliability detection and evaluation in a method for constructing a multi-source data-based information reliability detection and evaluation system according to an embodiment of the present invention;
FIG. 11 is a flowchart of multi-source data reliability detection in a method for constructing a multi-source data-based information reliability detection and evaluation system according to an embodiment of the present invention;
FIG. 12 is a flowchart of reliability evaluation of an object to be investigated in a method for constructing an information reliability detection and evaluation system based on multi-source data according to an embodiment of the present invention.
Detailed Description
In order to make the advantages, technical solutions, and purposes of the embodiments of the present invention clearer, the technical solutions are set forth below with reference to the drawings, using as the example embodiment the detection and evaluation of the information credibility of multi-source data that enterprises fill in during research surveys. This embodiment is only a part of the embodiments of the present invention, not all of them. The components of the embodiments illustrated in the drawings may be arranged and designed in a variety of different configurations. Accordingly, the detailed description of the embodiments provided below with the accompanying drawings is not intended to limit the scope of the invention as claimed, but merely represents selected embodiments. Other embodiments that a person skilled in the art can obtain from these embodiments without inventive work fall within the scope of protection of the invention.
The multi-source research data used for evaluating related tasks contains a large amount of abnormal information and false data, which seriously affects the accuracy of the evaluation system and in turn limits the supervision and guidance that data mining can provide for industrial restructuring and accelerated economic development. To solve these problems, the invention provides an information credibility detection and evaluation system based on multi-source data, which detects and evaluates the information credibility of the multi-source data used for research and evaluation, analyzes and detects the credibility of each piece of research data, and evaluates the overall credibility of the investigated object.
As shown in fig. 1, the present embodiment provides an information reliability detection and evaluation system based on multi-source data, where the system includes:
the data acquisition module 11: used for collecting multi-source text data related to the research task, where the multi-source text data refers to text data of different enterprises in different cities obtained from official government websites, research questionnaires, and the enterprise internet;
the data processing module 22: used for preprocessing the collected multi-source data related to the research task to eliminate noise interference, obtaining the vocabulary data of the text sentences using word segmentation, and obtaining the semantic feature description vectors and the attribute label information of each research question;
the credibility detection and evaluation module 33: used for detecting the credibility of each research question and evaluating the overall credibility of the investigated object; the credibility value is quantized to between 0 and 100, and the larger the value, the higher the credibility.
Here, "related" means that the data includes at least one keyword representing the investigated object or the recommendation.
For the collected multi-source text data related to enterprise research questions, the embodiment of the invention preprocesses the data to obtain the text feature description vector and the attribute label information of each research question. The preprocessing removes repeated and redundant data and deletes text unrelated to the semantics of the keywords. Word segmentation is then performed on each text sentence to obtain the corresponding vocabulary data; a word embedding technique is used to obtain a word embedding matrix from the context of the vocabulary data; and a max pooling operation over the features of the segmented words yields the feature representation of the text sentence, which is used for the credibility detection and evaluation of the research data. The attribute label information refers to the main category to which each research question belongs: logistics support, economic benefits, registered funds, partners, development scale, employee treatment, technical field, working time, civil investigation, government policies, innovation contributions, and talent introduction. Each main category includes 5-10 secondary subcategories.
To obtain multi-source text data of different enterprises in different cities from official government websites, research questionnaires, and the enterprise internet as described above, the data acquisition module 11 provided by the embodiment of the present invention, shown in fig. 2, includes:
the keyword acquisition unit 111: used for acquiring sentence keywords related to the research and evaluation task, so that information related to enterprise research and evaluation can be retrieved from official government websites, research questionnaires, the enterprise internet, and similar sources according to the acquired keywords;
the data mining unit 112: used for mining multi-source data containing the keywords of the research and evaluation task. Using the keywords as a reference, the information can be retrieved through a remote data access interface or a web crawler. On the one hand, public data from official government websites or open data from related websites can be accessed through data interfaces; on the other hand, web crawler technology can be used to crawl data related to enterprise research from websites.
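As a rough illustration of the keyword-driven retrieval performed by units 111 and 112, the sketch below filters texts that have already been fetched from several sources. It deliberately leaves out the data interface and the crawler; the function name `mine_by_keywords`, the source names, and the case-insensitive substring match are assumptions made for this example.

```python
from typing import Dict, Iterable, List


def mine_by_keywords(
    sources: Dict[str, Iterable[str]], keywords: Iterable[str]
) -> List[dict]:
    """Keep every text that contains at least one keyword, tagged with its source."""
    kws = [k.lower() for k in keywords]
    hits = []
    for source_name, texts in sources.items():
        for text in texts:
            lowered = text.lower()
            matched = [k for k in kws if k in lowered]
            if matched:
                hits.append(
                    {"source": source_name, "text": text, "keywords": matched}
                )
    return hits


# Hypothetical texts standing in for fetched pages and questionnaire answers.
sources = {
    "government": ["Registered capital of Acme Ltd is 5 million", "Weather report"],
    "questionnaire": ["Acme employee benefits survey"],
}
hits = mine_by_keywords(sources, ["Acme"])
```

Recording the source alongside each hit keeps the multi-source origin of the data available to later stages.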
To extract text feature vectors from the multi-source data of enterprise research, the embodiment of the invention learns the semantic feature representation of text sentences through a word embedding matrix that takes the context of the vocabulary into account. As shown in fig. 3, the data processing module 22 according to the embodiment of the present invention includes:
the data screening unit 221: used for pre-screening the collected multi-source data related to enterprise research questions, eliminating repeated and redundant filled-in data, and deleting text unrelated to the semantics of the keywords;
the text word segmentation unit 222: used for performing sentence word segmentation on the acquired multi-source enterprise research data to obtain the vocabulary data after segmentation;
the feature extraction unit 223: used for extracting feature vectors from the segmented vocabulary data using a word embedding technique, and from them determining the feature description vector of the text sentence, which is a robust representation of the corresponding text sentence in feature space.
The acquired multi-source enterprise research data is first given a simple screening that automatically removes irrelevant items such as texts with too little information (for example, fewer than 20 words), repeated texts (duplicated information), and redundant texts (for example, more than 5000 words). The remaining information-rich text sentences are then segmented into words. Based on a word embedding technique, a word embedding matrix is learned for the segmented vocabulary, and the feature vector of each segmented word is obtained from that matrix. Finally, a max pooling operation over the feature vectors of all segmented words of a text sentence yields the semantic feature representation of that sentence.
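The screening stage can be sketched as follows. The 20-word and 5000-word thresholds come from the example values in the text; treating 5000 as an upper bound, counting words by whitespace (Chinese text would need a real word segmenter), and the function name `prescreen` are assumptions of this sketch.

```python
from typing import Iterable, List


def prescreen(
    texts: Iterable[str],
    keywords: Iterable[str],
    min_words: int = 20,
    max_words: int = 5000,
) -> List[str]:
    """Drop texts that are too short, too long, duplicated, or keyword-free."""
    kws = [k.lower() for k in keywords]
    seen = set()
    kept = []
    for text in texts:
        # Normalise case and whitespace so trivially different copies match.
        normalized = " ".join(text.lower().split())
        n_words = len(normalized.split())
        if n_words < min_words or n_words > max_words:
            continue  # too little or too much text
        if normalized in seen:
            continue  # duplicate of an earlier text
        if not any(k in normalized for k in kws):
            continue  # unrelated to the keywords
        seen.add(normalized)
        kept.append(text)
    return kept
```

Substring matching is only a stand-in for the semantic relevance check the text describes; a real system would compare meaning rather than surface strings.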
For the semantic feature representation of the text sentence obtained above, the embodiment of the present invention provides a credibility detection and evaluation module 33, which is used for credibility detection of each research question of an enterprise and credibility analysis of the whole researched enterprise. As shown in fig. 4, the reliability detection and evaluation module 33 according to the embodiment of the present invention includes:
the multi-source data credibility detection unit 331: used for detecting the credibility of each research question and giving a credibility detection value, which is quantized to between 0 and 100; the larger the value, the higher the credibility;
the investigated object credibility evaluation unit 332: used for evaluating the overall credibility of the investigated enterprise and giving a credibility evaluation value of the investigated object; the credibility evaluation value is between 0 and 100, and the higher the value, the higher the credibility.
The multi-source data credibility detection unit 331 detects the information credibility of the data filled in by enterprises by constructing a dictionary optimization model, providing strong supervision and guidance for related credibility assessment tasks. As shown in fig. 5, the multi-source data credibility detection unit 331 includes:
the dictionary learning unit 3311: uses dictionary optimization to construct and learn a sparse dictionary for the credibility detection and evaluation task of each research question; the dictionary mines text information highly related to that task, discards redundant and irrelevant text data, and effectively explores the complex structure of the multi-source data;
the data reconstruction unit 3312: based on the learned sparse dictionary, reconstructs the text feature description vectors used for detection and evaluation and obtains the reconstruction error;
the data credibility detection unit 3313: used for quantizing the reconstruction error, through an activation function, into a credibility detection value between 0 and 100; the higher the value, the higher the credibility.
For the text sentence feature description vectors obtained from enterprise research, a sparse dictionary representation relevant to credibility detection and evaluation of the multi-source enterprise research data is learned by constructing a joint optimization objective over the text dictionary and the sparse features of the text sentences. The sparse dictionary mines text sentence information highly relevant to information credibility detection and discards redundant, irrelevant text data. Using the optimized sparse dictionary as a reference, the text sentence information of the enterprise research data is reconstructed, and the reconstruction error is quantized into a credibility detection value through an activation function. In this way, credibility detection and evaluation can be carried out accurately and quickly for each question filled in by the enterprise, with the credibility values quantized to between 0 and 100.
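The patent does not give the objective function or the activation function, so the following is only a generic sketch of the idea under stated assumptions: an l1-regularised dictionary learning objective solved by alternating ISTA sparse coding with a least-squares dictionary update, and an assumed activation `100 * exp(-error / scale)`. All function names and parameters (`lam`, `scale`, `n_atoms`) are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def sparse_codes(X, D, lam, n_iter=100):
    """ISTA for min_A 0.5*||X - D A||_F^2 + lam*||A||_1 (columns of X are samples)."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        A = soft_threshold(A - step * D.T @ (D @ A - X), step * lam)
    return A


def learn_dictionary(X, n_atoms, lam=0.1, n_outer=20, seed=0):
    """Alternate sparse coding and a least-squares dictionary update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_outer):
        A = sparse_codes(X, D, lam)
        D = X @ np.linalg.pinv(A)                       # least-squares update of D
        norms = np.linalg.norm(D, axis=0, keepdims=True)
        D /= np.maximum(norms, 1e-12)                   # keep atoms at unit norm
    return D


def credibility(x, D, lam=0.1, scale=1.0):
    """Map the reconstruction error of one feature vector to a score in (0, 100]."""
    a = sparse_codes(x[:, None], D, lam)
    err = np.linalg.norm(x - (D @ a)[:, 0])
    return 100.0 * np.exp(-err / scale)                 # assumed activation function
```

A vector that the learned dictionary reconstructs well receives a score near 100, while one that lies far from the structure of the training data receives a low score. The `scale` parameter controls how quickly the score decays and would need tuning on real data.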
The investigated object credibility evaluation unit 332 determines the importance weights of the data attributes through the analytic hierarchy process and evaluates the overall credibility of the investigated enterprise by weighting the data. As shown in fig. 6, the investigated object credibility evaluation unit 332 includes:
the attribute hierarchy analysis unit 3321: used for determining, through the analytic hierarchy process, the relative importance weights of all attributes of the research data, namely the 12 main category attributes and the 5-10 secondary category attributes under each;
the object credibility evaluation unit 3322: used for computing a weighted average of the credibility values according to the relative importance weight of each piece of research data, giving the overall credibility value of the investigated enterprise; the value is quantized to between 0 and 100, and the larger the value, the higher the credibility.
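A minimal sketch of the weighting step follows. The patent names the analytic hierarchy process but gives no comparison matrix, so the pairwise judgments below are invented for illustration. The weights are computed with the row geometric mean, a common approximation of the principal eigenvector that is exact for a perfectly consistent matrix. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def ahp_weights(pairwise: Sequence[Sequence[float]]) -> List[float]:
    """Row geometric means of a reciprocal comparison matrix, normalised to sum to 1."""
    n = len(pairwise)
    gm = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(gm)
    return [g / total for g in gm]


def overall_credibility(
    scores: Sequence[float], labels: Sequence[str], weights: Dict[str, float]
) -> float:
    """Weighted average of per-record credibility values (each in 0..100)."""
    w = [weights[label] for label in labels]
    return sum(wi * si for wi, si in zip(w, scores)) / sum(w)


# Three of the main categories named in the text; the judgments are hypothetical.
attributes = ["economic benefits", "registered funds", "employee treatment"]
pairwise = [
    [1.0, 2.0, 4.0],    # economic benefits vs. the others
    [0.5, 1.0, 2.0],    # registered funds vs. the others
    [0.25, 0.5, 1.0],   # employee treatment vs. the others
]
weights = dict(zip(attributes, ahp_weights(pairwise)))
overall = overall_credibility([90.0, 60.0, 30.0], attributes, weights)
```

Because the result is a weighted average of values in the 0 to 100 range, the overall score stays in that range as well. A fuller treatment would also check the consistency ratio of the comparison matrix before trusting the weights.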
Based on the same inventive concept, the embodiment of the invention also provides a method corresponding to the information credibility detection and evaluation system based on multi-source data. Because the principle of the method is similar to that of the system for detecting and evaluating the credibility of enterprise-filled information described above, the implementation of the method can refer to the implementation of the system, and repeated parts are not described again. As shown in fig. 7, the method for detecting and evaluating information credibility based on multi-source data according to the embodiment of the present application includes:
S11: collecting multi-source data that is filled in by enterprises and related to enterprise research questions, where the multi-source text data refers to text data of different enterprises in different cities obtained from official government websites, research questionnaires, and the enterprise internet;
S22: preprocessing the collected multi-source data related to the research task to obtain the semantic feature description vectors and the attribute label information of each research question, where the attribute label information comprises 12 main categories, each with 5-10 secondary subcategories;
S33: detecting the credibility of each research question and evaluating the overall credibility of the investigated object.
In the embodiment of the present invention, as shown in fig. 8, the step S11 specifically includes the following steps:
S111: acquiring keywords related to enterprise research data evaluation so as to perform information retrieval related to the enterprise research evaluation;
S112: mining multi-source data containing the keywords in the enterprise research evaluation through a data interface or a web crawler.
In the embodiment of the present invention, as shown in fig. 9, the step S22 specifically includes the following steps:
S221: pre-screening the collected multi-source data related to enterprise research questions, eliminating repeated and redundant enterprise-filled data, and deleting text unrelated to the semantics of the keywords;
S222: performing sentence word segmentation on the acquired multi-source enterprise research data to obtain the vocabulary data after segmentation;
S223: extracting feature vectors from the segmented vocabulary data using a word embedding technique, and from them determining the feature description vector of the text sentence.
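Steps S222 and S223 can be illustrated with a toy example. A small lookup table stands in for the learned word embedding matrix and whitespace splitting stands in for word segmentation; both are assumptions of this sketch, as is the function name `sentence_vector`.

```python
from typing import Dict, List, Sequence


def sentence_vector(
    tokens: Sequence[str], embedding: Dict[str, List[float]], dim: int
) -> List[float]:
    """Max-pool the embedding vectors of the known tokens, dimension by dimension."""
    rows = [embedding[t] for t in tokens if t in embedding]
    if not rows:
        return [0.0] * dim  # no known token: fall back to a zero vector
    return [max(column) for column in zip(*rows)]


# Hypothetical 2-dimensional embeddings for three words.
embedding = {
    "profit": [0.1, 0.9],
    "grew": [0.5, -0.2],
    "fast": [-0.3, 0.4],
}
vec = sentence_vector("profit grew fast".split(), embedding, dim=2)
# vec == [0.5, 0.9]: the largest value in each embedding dimension
```

Max pooling produces a fixed-length vector regardless of sentence length, which is what allows sentences of different lengths to share one dictionary in the later credibility detection step.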
In the embodiment of the present invention, as shown in fig. 10, the step S33 specifically includes the following steps:
S331: detecting the credibility of each research question and giving a credibility detection value;
S332: evaluating the overall credibility of the investigated enterprise and giving a credibility evaluation value of the investigated object.
In the embodiment of the present invention, as shown in fig. 11, step S331 specifically includes the following steps:
S3311: constructing and learning a sparse dictionary for the credibility detection and evaluation task of each research question, mining text information highly relevant to that task, and discarding redundant, irrelevant text data;
S3312: reconstructing, based on the learned sparse dictionary, the text feature description vectors of the collected multi-source data used for detection and evaluation;
S3313: quantizing the reconstruction error through an activation function into a credibility detection value.
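Steps S3312 and S3313 can be illustrated with a small sketch. It makes several assumptions that the text does not state: the dictionary is taken as already learned and its atoms have unit norm, sparse coding is done with greedy matching pursuit, and the activation function is 2 / (1 + exp(error)), chosen only because it maps zero error to 1 and large errors toward 0. The names matching_pursuit and credibility are hypothetical.

```python
import math
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(x: List[float], dictionary: List[List[float]],
                     n_nonzero: int = 2) -> Tuple[List[float], List[float]]:
    """Greedy sparse coding over unit-norm atoms; returns (codes, residual)."""
    residual = list(x)
    codes = [0.0] * len(dictionary)
    for _ in range(n_nonzero):
        scores = [dot(residual, atom) for atom in dictionary]
        # Pick the atom most correlated with what is still unexplained.
        best = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        codes[best] += scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, dictionary[best])]
    return codes, residual


def credibility(x: List[float], dictionary: List[List[float]],
                n_nonzero: int = 2) -> float:
    """Map the reconstruction error to (0, 1]; the activation is an assumption."""
    _, residual = matching_pursuit(x, dictionary, n_nonzero)
    error = math.sqrt(dot(residual, residual))
    return 2.0 / (1.0 + math.exp(error))


# Assumed pre-learned dictionary: two unit-norm atoms in a 3-dimensional feature space.
dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
consistent = [0.6, 0.8, 0.0]  # lies in the span of the atoms
outlier = [0.0, 0.0, 1.0]     # orthogonal to every atom

print(credibility(consistent, dictionary))
print(credibility(outlier, dictionary))
```

A feature vector that the dictionary reconstructs well receives a value near 1, while one the dictionary cannot explain receives a lower value, which matches the idea of using reconstruction error as the credibility signal.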
In the embodiment of the present invention, as shown in fig. 12, step S332 specifically includes the following steps:
S3321: determining the relative importance weight of each attribute of the research data through the analytic hierarchy process;
S3322: computing a weighted average of the credibility values according to the relative importance weight of each piece of research data, to obtain the overall credibility value of the investigated enterprise.
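Steps S3321 and S3322 can be sketched as follows. The pairwise comparison matrix and the credibility values are made-up illustrations, and the row geometric-mean method is used here as a common approximation of the analytic hierarchy process priority vector; the patent does not say which variant it uses.

```python
import math
from typing import List


def ahp_weights(pairwise: List[List[float]]) -> List[float]:
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def overall_credibility(values: List[float], weights: List[float]) -> float:
    """Weighted average of the per-item credibility values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical judgments: attribute 1 is twice as important as attribute 2
# and four times as important as attribute 3 (a perfectly consistent matrix).
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)
values = [0.9, 0.6, 0.3]  # made-up credibility detection values
print(weights)
print(overall_credibility(values, weights))
```

For a consistent matrix like this one the weights come out proportional to 4 : 2 : 1, so the most important attribute dominates the overall value.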

Claims (10)

1. An information credibility detection and evaluation system based on multi-source data, characterized by comprising:
a data acquisition module, configured to acquire multi-source data related to a research task;
a data processing module, configured to preprocess the collected multi-source data related to the research task and to obtain the semantic feature description vectors in the multi-source data and the attribute label of each research task;
and a credibility detection and evaluation module, configured to detect the credibility of each research task according to the semantic feature description vectors and the attribute label information, and to evaluate the overall credibility of the investigated object.
2. The system of claim 1, wherein the data acquisition module comprises:
a keyword acquisition unit, configured to acquire keywords related to the research task;
and a data mining unit, configured to acquire multi-source data containing the keywords in the research task.
3. The system of claim 1, wherein the data processing module comprises:
a data screening unit, configured to pre-screen the multi-source data so as to eliminate repeated and redundant multi-source data and delete multi-source data irrelevant to the semantics of the keywords;
a text word segmentation unit, configured to perform sentence word segmentation on the multi-source data to obtain segmented vocabulary data;
and a feature extraction unit, configured to extract feature vectors from the vocabulary data and thereby determine the semantic feature description vector of each sentence.
4. The system of claim 1, wherein the credibility detection and evaluation module comprises:
a multi-source data credibility detection unit, configured to detect the credibility of each piece of multi-source data related to the research task and give a credibility detection value;
and an investigated object credibility evaluation unit, configured to evaluate the overall credibility of the investigated object and give a credibility evaluation value for the investigated object.
5. The system of claim 4, wherein the multi-source data credibility detection unit comprises:
a dictionary learning unit, configured to learn a sparse dictionary for each research task, extract from the semantic feature description vectors of the multi-source data the semantic information relevant to the research task, and discard redundant semantic information irrelevant to the research task;
a data reconstruction unit, configured to reconstruct, according to the learned sparse dictionary, the semantic feature description vectors of the multi-source data used for the research task;
a data credibility detection unit, configured to quantize the reconstructed semantic feature description vectors into a credibility detection value;
and wherein the investigated object credibility evaluation unit comprises:
an attribute hierarchy analysis unit, configured to determine the relative importance weights of all attribute labels of the multi-source data of the research task through the analytic hierarchy process;
and an object credibility evaluation unit, configured to compute a weighted average of the credibility values of all the multi-source data according to the relative importance weight of each piece of multi-source data, to obtain the credibility evaluation value of the investigated object.
6. A method for detecting and evaluating information credibility based on multi-source data, characterized by comprising the following steps:
1) a data acquisition module acquires multi-source data related to a research task;
2) a data processing module preprocesses the collected multi-source data related to the research task and obtains the semantic feature description vectors in the multi-source data and the attribute label of each research task;
3) a credibility detection and evaluation module detects the credibility of each research task according to the semantic feature description vectors and the attribute label information, and evaluates the overall credibility of the investigated object.
7. The method for detecting and evaluating information credibility based on multi-source data according to claim 6, wherein step 1) comprises the following steps:
1.1) a keyword acquisition unit acquires keywords related to the research task;
1.2) a data mining unit acquires multi-source data containing the keywords in the research task.
8. The method for detecting and evaluating information credibility based on multi-source data according to claim 6, wherein step 2) comprises the following steps:
2.1) a data screening unit pre-screens the multi-source data to eliminate repeated and redundant multi-source data and delete multi-source data irrelevant to the semantics of the keywords;
2.2) a text word segmentation unit performs sentence word segmentation on the multi-source research data to obtain segmented vocabulary data;
2.3) a feature extraction unit extracts feature vectors from the vocabulary data and thereby determines the semantic feature description vector of each sentence.
9. The method for detecting and evaluating information credibility based on multi-source data according to claim 6, wherein step 3) comprises the following steps:
3.1) a multi-source data credibility detection unit detects the credibility of each piece of multi-source data related to the research task and gives a credibility detection value;
3.2) an investigated object credibility evaluation unit evaluates the overall credibility of the investigated object and gives a credibility evaluation value for the investigated object.
10. The method for detecting and evaluating information credibility based on multi-source data according to claim 9, wherein steps 3.1) and 3.2) comprise the following steps:
3.1.1) a dictionary learning unit learns a sparse dictionary for each research task, extracts from the semantic feature description vectors of the multi-source data the semantic information relevant to the research task, and discards redundant semantic information irrelevant to the research task;
3.1.2) a data reconstruction unit reconstructs, based on the learned sparse dictionary, the semantic feature description vectors of the multi-source data used for the research task;
3.1.3) a data credibility detection unit quantizes the reconstructed semantic feature description vectors into a credibility detection value;
3.2.1) an attribute hierarchy analysis unit determines the relative importance weights of all attribute labels of the multi-source data of the research task through the analytic hierarchy process;
3.2.2) an object credibility evaluation unit computes a weighted average of the credibility values of all the multi-source data according to the relative importance weight of each piece of multi-source data, to obtain the credibility evaluation value of the investigated object.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110080173.9A CN112836505A (en) 2021-01-21 2021-01-21 An Information Credibility Detection and Evaluation System Based on Multi-source Data

Publications (1)

Publication Number Publication Date
CN112836505A (en) 2021-05-25

Family

ID=75929659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110080173.9A Pending CN112836505A (en) 2021-01-21 2021-01-21 An Information Credibility Detection and Evaluation System Based on Multi-source Data

Country Status (1)

Country Link
CN (1) CN112836505A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014209283A (en) * 2013-04-16 2014-11-06 日本電信電話株式会社 Evaluation information extraction device, and certainty factor learning device, method, and program
CN107526820A (en) * 2017-08-29 2017-12-29 广东省技术经济研究发展中心 A kind of more storehouse enterprise innovation monitoring big data normal data base construction methods of multi-source
CN108663651A (en) * 2018-05-04 2018-10-16 国网上海市电力公司 A kind of intelligent electric energy meter evaluation of running status system based on multisource data fusion
CN110245874A (en) * 2019-03-27 2019-09-17 中国海洋大学 A kind of Decision fusion method based on machine learning and knowledge reasoning
US20190311301A1 (en) * 2018-04-10 2019-10-10 Ebay Inc. Dynamically generated machine learning models and visualization thereof

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
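周双全; 杨小文; 张建忠; 张志平; 韩舒: "Research on a fusion algorithm for floating car data and fixed-point detection data" (浮动车数据和定点检测数据的融合算法研究), 交通标准化, no. 16 *
李泓洋; 万烂军; 李长云; 陈意伟: "A rolling bearing fault prediction method based on neural networks and evidence theory" (基于神经网络和证据理论的滚动轴承故障预测方法), 湖南工业大学学报, no. 04 *
洪晓斌; 子文江; 余蓉; 罗宗强; 何振威: "Credibility fusion evaluation of nondestructive cloud testing for large steel structures" (大型钢结构无损云检测的可信度融合评估), 华南理工大学学报(自然科学版), no. 3, pages 70-77 *
王骏彪: "Research on a trusted dynamic access control model under the software-defined perimeter" (软件定义边界下的可信动态访问控制模型研究), 移动通信, no. 08 *
胡清华; 张道强; 张长水: "Preface to the special issue on machine learning research in complex environments" (复杂环境下的机器学习研究专刊前言), 软件学报, no. 11 *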

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115511683A (en) * 2022-09-23 2022-12-23 上海市疾病预防控制中心 Public health data acquisition and processing system
CN115511683B (en) * 2022-09-23 2025-09-16 上海市疾病预防控制中心 Public health data acquisition and processing system
CN118134511A (en) * 2024-05-06 2024-06-04 中汽信息科技(天津)有限公司 Carbon emission management method, device and medium for vehicle products

Similar Documents

Publication Publication Date Title
CN111783394B (en) Training method of event extraction model, event extraction method, system and equipment
CN117312577B (en) Traffic incident knowledge graph construction method based on multi-layer semantic graph convolutional neural network
CN112541180B (en) Software security vulnerability detection method based on grammatical features and semantic features
Ledger et al. Detecting llm hallucinations using monte carlo simulations on token probabilities
CN113191148B (en) A rail transit entity recognition method based on semi-supervised learning and clustering
CN113282955A (en) Method, system, terminal and medium for extracting privacy information in privacy policy
CN112579777B (en) A semi-supervised classification method for unlabeled text
Qiu et al. A question answering system based on mineral exploration ontology generation: A deep learning methodology
CN112394973B (en) Multi-language code plagiarism detection method based on pseudo-twin network
CN113076538A (en) Method for extracting embedded privacy policy of mobile application APK file
CN119577144A (en) A network public opinion analysis, classification and grading early warning method and system
CN113886524A (en) Network security threat event extraction method based on short text
CN112836505A (en) An Information Credibility Detection and Evaluation System Based on Multi-source Data
CN112270187A (en) Bert-LSTM-based rumor detection model
CN115221517A (en) Open source repository malicious packet detection method and system
CN119397548A (en) Application logic vulnerability detection method
CN119474810A (en) Adverse geological type identification method and system based on joint deep learning network
Joshi et al. Text data augmentation
CN113326371B (en) An event extraction method that combines pre-trained language models and anti-noise interference remote supervision information
CN120216325A (en) A method for detecting and analyzing function-level code vulnerabilities based on language models
CN119128642A (en) A multimodal large model for risk content identification and summary generation
CN116166789B (en) Method naming accurate recommendation and examination method
CN118132936A (en) Multi-modal fusion-based multi-classification method for petroleum reservoirs
CN118114663A (en) Automatic extraction method, system, equipment and medium for entities in power failure field
CN118365308A (en) Tunnel disease maintenance countermeasure method and system based on knowledge graph and expert system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210525)