CN101887460A - A Document Quality Evaluation Method and Its Application - Google Patents
- Publication number: CN101887460A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
- Classification: Information Retrieval, Db Structures And Fs Structures Therefor
Abstract
The invention provides a document quality evaluation algorithm for use on a document sharing platform. The algorithm comprises the following steps: constructing an academic network graph from the relationships between documents, between documents and journals or conferences, and between documents and authors; quantifying these relationships as transition relationships between vertices of the graph and modeling them as a transition probability matrix; building a model from users' bookmarking of documents and computing a user-based document quality value; and running an iterative random walk with restart on the graph to obtain document quality, journal and conference quality, and author academic reputation. The invention combines user behavior information with document quality evaluation for the first time and, alongside the document quality results, also yields results for author academic reputation and for the academic quality of journals and conferences. The ranking produced by this method is markedly better than that of other methods.
Description
Technical field
The invention relates to a method for evaluating document quality, in particular a method for evaluating document quality on a document sharing platform, and belongs to the technical field of knowledge mining.
Background
In recent years, with the rapid development of scientific research, the rate at which scientific and technical literature is published has increased year by year, and its volume is now enormous. For example, CiteSeerX, a digital library covering only computer and information science, holds more than 1.5 million documents. Researchers need to read and consult a large amount of literature in the course of their work, and high-quality and low-quality documents differ greatly in their value to researchers. Finding the more valuable documents in such a large collection of uneven quality has become very difficult. As a result, the question of how to evaluate the quality of scientific literature effectively and automatically has attracted a growing number of researchers.
On social literature-sharing websites for academic research, users can bookmark documents they consider valuable, tag them, comment on them, and share them with other users. Users' bookmarking behavior should be an important reference when analyzing the quality of scientific literature, yet very little existing research uses user behavior for this purpose. How to apply user behavior effectively to a literature quality evaluation system in the Web 2.0 environment therefore merits further study.
The existing methods in academia for evaluating the quality of academic papers are mainly peer review, citation analysis, and methods based on link analysis. Peer review is usually used for evaluation before publication, for example when a conference or journal reviews submitted papers; citation analysis is used for evaluation after publication, for example to assess the academic level of a researcher's published papers.
In peer review, experts and scholars in the same research field give a comprehensive evaluation covering the significance and novelty of the chosen topic, the research method, the quality of the completed research, and the standard of writing. The advantage of peer review is that expert evaluation of research quality is detailed and accurate: experts can judge the level of a piece of research by virtue of their deep knowledge of the field. The disadvantages are that the current evaluation system is still imperfect, that lax self-discipline among "peers" can easily lead to abuses, and that peer review of a large number of papers is time-consuming and laborious, and therefore impractical.
Citation analysis evaluates paper quality from the citing and cited relationships between academic papers, using some specific method and evaluation standard. Researchers in citation analysis have proposed a series of quantitative quality indicators, such as citation frequency and impact factor. Compared with peer review, citation analysis is simpler and easy to automate. At the same time, its results are coarser, and because it depends on citation relationships, it tends to underrate newly published documents, which have not yet been cited much. This is a significant limitation.
In 1998, Brin and Page proposed the PageRank algorithm, which ranks web pages by importance based on the links between them, and founded the Google search engine on it. Kleinberg proposed another link analysis algorithm, HITS. Since then, because citations give scientific literature a naturally occurring link structure, many researchers have drawn on the ideas behind these methods to address document quality evaluation.
Summary of the invention
The object of the invention is to model and analyze the relationships among documents, authors, and journals or conferences, and to use the relationship between user behavior and document quality in the Web 2.0 environment to assist in analyzing document quality. The invention unifies the two approaches of peer review and citation analysis within the framework of a random walk with restart and produces a final analysis result.
The scheme adopted by the invention to solve its technical problem is as follows (the flow is shown in Figure 1):
The invention proposes a method for evaluating document quality, applied to a scientific literature sharing platform on which users can bookmark documents, tag them, comment on them, and share them with other users, characterized in that the method comprises the following steps:
A. Using the citation relationships between documents, the relationships of documents to journals, conferences and authors, and the publication time of each document, construct a weighted directed graph, called the academic network graph;
B. Quantify the citation relationships between documents and the relationships of documents to journals, conferences and authors as transition relationships between vertices of the graph, and model them to obtain the transition probability matrix of the academic network graph;
C. Build a model from users' bookmarking of documents, taking bookmarking time into account, and use the HITS algorithm to compute a user-based document quality value;
D. Using the models built in steps B and C, iterate a random walk with restart until the result converges, obtaining a probability value for every vertex of the academic network graph; these probability values are the document quality, journal and conference quality, and author academic reputation.
The method provided by the invention can be used not only on scientific literature sharing platforms but also on paper sharing platforms or websites (where the documents are papers) and on picture sharing platforms or websites (where the documents are pictures).
Beneficial effects of the invention:
The graph-based quality evaluation method for scientific literature proposed by the invention combines user behavior information with document quality evaluation for the first time and, alongside the document quality results, also yields results for author academic reputation and for the academic quality of journals and conferences. If the invention is applied to a literature retrieval website and the results of a user's keyword search are ranked by quality value, it can help users find high-quality documents more quickly and learn sooner about high-quality journals and conferences and about authors of high academic reputation. Experiments show that the ranking produced by this method is markedly better than that of other methods.
Description of the drawings
Figure 1 is the overall flowchart of the graph-based method for evaluating the quality of scientific literature according to the invention;
Figure 2 is the academic network graph constructed according to the invention;
Figure 3 shows the transition relationships between vertices of the academic network graph constructed according to the invention;
Figure 4 is the user-document bookmarking graph constructed according to the invention.
Detailed description
The invention is described in further detail below with reference to the drawings and specific embodiments.
Step 1. Using the citation relationships between documents, the relationships of documents to journals, conferences and authors, and the publication time of each document, construct a weighted directed graph, called the academic network graph.
The academic network graph designed by the invention consists of three parts and models the relationships among three kinds of entities: documents, authors, and journals or conferences. The three parts are:
● Document citation subgraph $G_{dd} = (V_d, E_{dd})$
$G_{dd}$ is a directed graph representing the citation relationships between documents, where $V_d$ is the set of document vertices and $E_{dd}$ is the edge set; a directed edge $\langle d_i, d_j \rangle \in E_{dd}$ means that document $d_i$ cites document $d_j$.
● Author-document subgraph $G_{ad} = (V_a \cup V_d, E_{ad})$
$G_{ad}$ is a bipartite graph representing the authorship relationship between authors and documents, where $V_a$ is the set of author vertices and $E_{ad}$ is the edge set; an undirected edge $(a_i, d_j) \in E_{ad}$ means that author $a_i$ wrote document $d_j$.
● Journal/conference-document subgraph $G_{cd} = (V_c \cup V_d, E_{cd})$
$G_{cd}$ is a bipartite graph representing the publication relationship between journals or conferences and documents, where $V_c$ is the set of journal and conference vertices and $E_{cd}$ is the edge set; an undirected edge $(c_i, d_j) \in E_{cd}$ means that document $d_j$ was published in journal or conference $c_i$.
The combination of these three subgraphs is the academic network graph, shown in Figure 2.
The academic network graph is defined as the directed graph $G = (V, E)$, where $V = V_a \cup V_d \cup V_c$ is the vertex set and $E = E_{dd} \cup E_{ad} \cup E_{cd}$ is the edge set. Because the random walk must run on a directed graph, every undirected edge in the author-document and journal/conference-document subgraphs is represented as two directed edges joining the same two vertices, for example: $(c_i, d_j) \rightarrow \langle c_i, d_j \rangle \cup \langle d_j, c_i \rangle$.
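The construction in Step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `build_academic_graph`, the tagged-tuple vertex encoding and the sample data are all assumptions, and edge weights and publication times are left out for brevity.

```python
from collections import defaultdict

def build_academic_graph(citations, authorship, publication):
    """Return adjacency sets of a directed graph over tagged vertices.

    citations:   iterable of (citing_doc, cited_doc) pairs
    authorship:  iterable of (author, doc) pairs
    publication: iterable of (venue, doc) pairs
    Vertices are tagged tuples such as ("d", "p1") so that the three
    vertex sets V_d, V_a and V_c stay disjoint (hypothetical encoding).
    """
    edges = defaultdict(set)
    for citing, cited in citations:
        edges[("d", citing)].add(("d", cited))   # directed edge <d_i, d_j>
    for author, doc in authorship:
        edges[("a", author)].add(("d", doc))     # undirected (a_i, d_j) becomes
        edges[("d", doc)].add(("a", author))     # two directed edges
    for venue, doc in publication:
        edges[("c", venue)].add(("d", doc))      # undirected (c_i, d_j) becomes
        edges[("d", doc)].add(("c", venue))      # two directed edges
    return edges

# Toy data: document p2 cites p1; both are by author a1 and appear in venue v1.
graph = build_academic_graph(
    citations=[("p2", "p1")],
    authorship=[("a1", "p1"), ("a1", "p2")],
    publication=[("v1", "p1"), ("v1", "p2")],
)
```

Only the citation edge is one-way; every authorship and publication edge appears in both directions, which is what lets a random walk move from a document to its author or venue and back.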
Step 2. Quantify the citation relationships between documents and the relationships of documents to journals, conferences and authors as transition relationships between vertices of the graph, and model them to obtain the transition probability matrix of the academic network graph.
Each vertex of the academic network graph $G$ represents an author, a document, or a journal or conference, so $G$ is a heterogeneous graph containing three different types of entity. The invention defines a different transition probability $\alpha$ for transitions between different types of vertex (entity), as shown in Figure 3. These transition probability parameters satisfy:
$\alpha_{ad} = \alpha_{cd} = 1$
$\alpha_{da} + \alpha_{dc} + \alpha_{dd} = 1$
where $\alpha_{ad}$ is the transition probability from an author vertex to a document vertex, $\alpha_{cd}$ from a venue vertex to a document vertex, $\alpha_{da}$ from a document vertex to an author vertex, $\alpha_{dc}$ from a document vertex to a venue vertex, and $\alpha_{dd}$ from a document vertex to a document vertex.
Let $W(G)$ be the weighted adjacency matrix of $G$, whose entries are the weights of the relationships between vertices of the academic network graph. Following the definition of the academic network graph above, $W(G)$ can be decomposed into the series of sub-matrices defined below. First, the invention assigns initial values to each sub-matrix to obtain an initial weighted adjacency matrix; then it applies a weight scoring function to the initial values to obtain the final weighted adjacency matrix; finally, it computes the transition probability matrix from the weighted adjacency matrix.
The initial definitions of these sub-matrices are as follows:
● Weighted adjacency matrix from document vertices to document vertices: [equation image not available]
where $t(d)$ denotes the publication time of document $d$ and $\Gamma_{dd}(d_i)$ denotes the set of documents cited by document $d_i$.
● Weighted adjacency matrix from author vertices to document vertices: [equation image not available]
where $\Gamma_{ad}(a_i)$ denotes the set of documents published by author $a_i$, and $k$ is the position of author $a$ in the author list of document $d$ (that is, $a$ is the $k$-th author of $d$).
● Weighted adjacency matrix from document vertices to author vertices: $W_{da}(j, i) = |\Gamma_{da}(d_j)| - k + 1$
where $\Gamma_{da}(d_j)$ denotes the set of authors of document $d_j$, and $k$ indicates that author $a_i$ is the $k$-th author of document $d_j$.
● Weighted adjacency matrix from document vertices to venue vertices: [equation image not available]
● Weighted adjacency matrix from venue vertices to document vertices: [equation image not available]
where $c_{ik}$ denotes one edition of conference $c_i$ or one volume of journal $c_i$, $\Gamma_{cd}(c_{im})$ denotes the set of documents published in $c_{im}$, and $t(c_{im})$ denotes the time (year) of $c_{im}$.
$\Gamma_{cd}(c_{ik}) = \{ d \mid t(d) = t(c_{ik}) \wedge d \in \Gamma_{cd}(c_i) \}$
Clearly, [equation image not available]
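Of the sub-matrices above, the document-to-author weight $W_{da}(j, i) = |\Gamma_{da}(d_j)| - k + 1$ is fully specified, so it can be worked through directly. The sketch below computes one row of $W_{da}$ for a single document; the helper name `doc_to_author_weights` and the author labels are hypothetical.

```python
def doc_to_author_weights(authors_in_order):
    """Weight from one document to each of its authors.

    Implements W_da = |Gamma_da(d_j)| - k + 1, where k is the 1-based
    position of the author in the document's author list.
    """
    n = len(authors_in_order)
    return {author: n - k + 1
            for k, author in enumerate(authors_in_order, start=1)}

# A document with three authors: earlier positions receive larger weights.
weights = doc_to_author_weights(["a1", "a2", "a3"])
```

For a three-author document the weights are 3, 2 and 1, so a walk leaving the document favors the first author once the row is normalized.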
Next, a weight scoring function $\Phi$ is applied to the initial weights in the matrix:
$W(i, j) = \Phi(W(i, j))$
A suitable weight scoring function should be monotonically increasing, with an increase that shrinks as the argument grows, that is, $\Phi'(x) > 0$ and $\Phi''(x) < 0$. In this method, $\Phi$ is taken to be: [equation image not available]
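The two conditions on $\Phi$ can be checked numerically for any candidate function. The sketch below uses $\Phi(x) = \ln(1 + x)$ purely as an illustrative assumption; the function the patent actually uses is not recoverable from this text.

```python
import math

def phi(x):
    """Illustrative scoring function ln(1 + x); an assumption, not the patent's choice."""
    return math.log1p(x)

# Evaluate phi at x = 1..5 and look at the successive increases.
values = [phi(x) for x in range(1, 6)]
gains = [later - earlier for earlier, later in zip(values, values[1:])]

# Increasing function: every gain is positive.
is_increasing = all(g > 0 for g in gains)
# Diminishing growth: each gain is smaller than the one before it.
is_concave = all(g2 < g1 for g1, g2 in zip(gains, gains[1:]))
```

Any function with a positive first derivative and a negative second derivative passes both checks; the practical effect is to damp very large raw weights without changing their order.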
Next, the transition probability matrices of the three subgraphs are defined, and the transition probability matrix of the whole academic network graph is then computed from them.
● Document citation subgraph $G_{dd}$
The document-to-document transition probability matrix is: [equation image not available]
● Author-document subgraph $G_{ad}$
The author-to-document transition probability matrix is: [equation image not available]
The document-to-author transition probability matrix is: [equation image not available]
● Journal/conference-document subgraph $G_{cd}$
The document-to-journal/conference transition probability matrix is: [equation image not available]
The journal/conference-to-document transition probability matrix is: [equation image not available]
The transition probability matrix of the academic network graph is obtained from the transition probability matrices of the subgraphs: [equation image not available]
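One plausible way to assemble the overall matrix, sketched here under stated assumptions, is to row-normalize each weighted sub-matrix, scale the three blocks leaving a document by $\alpha_{dd}$, $\alpha_{da}$ and $\alpha_{dc}$, and place the blocks in vertex order (documents, authors, venues). The row normalization, the block layout, all function names and the toy weights are assumptions, since the defining equations are not available in this text; rows for documents that cite nothing are left unnormalized in this sketch.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1; all-zero rows are left unchanged."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([x / total if total else 0.0 for x in row])
    return result

def scale(matrix, factor):
    return [[factor * x for x in row] for row in matrix]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def hstack(*blocks):
    """Concatenate blocks with equal row counts side by side."""
    return [sum(rows, []) for rows in zip(*blocks)]

def assemble_transition_matrix(W_dd, W_da, W_dc, W_ad, W_cd, a_dd, a_da, a_dc):
    """Block matrix M over [documents, authors, venues] (assumed layout)."""
    n_a, n_c = len(W_ad), len(W_cd)
    doc_rows = hstack(scale(row_normalize(W_dd), a_dd),
                      scale(row_normalize(W_da), a_da),
                      scale(row_normalize(W_dc), a_dc))
    # alpha_ad = alpha_cd = 1: authors and venues always move to documents.
    author_rows = hstack(row_normalize(W_ad), zeros(n_a, n_a), zeros(n_a, n_c))
    venue_rows = hstack(row_normalize(W_cd), zeros(n_c, n_a), zeros(n_c, n_c))
    return doc_rows + author_rows + venue_rows

# Toy graph: two documents, one author, one venue (weights are made up).
M = assemble_transition_matrix(
    W_dd=[[0, 1], [1, 0]], W_da=[[1], [1]], W_dc=[[1], [1]],
    W_ad=[[2, 1]], W_cd=[[1, 1]],
    a_dd=0.5, a_da=0.25, a_dc=0.25,
)
```

With $\alpha_{da} + \alpha_{dc} + \alpha_{dd} = 1$ and no all-zero blocks, every row of `M` sums to 1, which is the property a random walk needs.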
Step 3. Build a model from users' bookmarking of documents, taking bookmarking time into account, and use the HITS algorithm to compute a user-based document quality value.
The invention connects documents and users through bookmarking to construct a user-document bookmarking graph, in which users and documents are the vertices and bookmarking actions are the edges, as shown in Figure 4. The user-document bookmarking system is defined as $B = (U, D, T, R)$, where $U$ is the set of users, $D$ is the set of documents, $T$ is a set of time points, and $R \subseteq U \times D \times T$ is the set of bookmarking relations. $(u, d, t) \in R$ means that user $u$ bookmarked document $d$ at time $t$.
The quality value vector of the document set is defined as $q = (q_1, q_2, \ldots, q_m)$, where $m = |D|$; the expertise vector of the user set is defined as $e = (e_1, e_2, \ldots, e_n)$, where $n = |U|$. The adjacency matrix $A$ of the user-document bookmarking graph is defined as: [equation image not available]
Computing the document quality values and user expertise values means repeating the following iteration until the result converges:
$q = e \times A$
$e = q \times A^T$
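The iteration $q = e \times A$, $e = q \times A^T$ can be sketched as follows. Several choices here are assumptions rather than details from the patent: `A` is a plain 0/1 users-by-documents matrix that ignores bookmarking time, both vectors are rescaled to sum to 1 after each update (as HITS implementations usually do to keep values bounded), and the tolerance and function names are illustrative.

```python
def normalize(vector):
    """Scale a vector so its entries sum to 1 (left unchanged if all zero)."""
    total = sum(vector)
    return [x / total for x in vector] if total else list(vector)

def hits_quality(A, tol=1e-9, max_iter=1000):
    """Alternate q = e x A and e = q x A^T until q stops changing.

    A[u][d] is 1 if user u bookmarked document d, else 0 (assumed encoding).
    Returns (q, e): document quality values and user expertise values.
    """
    n_users, n_docs = len(A), len(A[0])
    e = [1.0] * n_users
    q = [0.0] * n_docs
    for _ in range(max_iter):
        new_q = normalize([sum(e[u] * A[u][d] for u in range(n_users))
                           for d in range(n_docs)])
        new_e = normalize([sum(new_q[d] * A[u][d] for d in range(n_docs))
                           for u in range(n_users)])
        change = max(abs(x - y) for x, y in zip(new_q, q))
        q, e = new_q, new_e
        if change < tol:
            break
    return q, e

# Three users, three documents: document 0 is bookmarked by everyone.
A = [[1, 1, 0],
     [1, 0, 0],
     [1, 0, 1]]
q, e = hits_quality(A)
```

In this toy example the document bookmarked by all three users ends up with the highest quality value, and the users who also bookmarked a second document end up with higher expertise than the user who bookmarked only one.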
Step 4. Using the models built in Step 2 and Step 3, iterate a random walk with restart until the result converges, obtaining a probability value for every vertex of the academic network graph; these probability values are the document quality, journal and conference quality, and author academic reputation.
Let $d$ be the document quality vector, $a$ the author academic reputation vector, and $c$ the journal and conference quality vector. The vectors for the three kinds of entity are concatenated into a single vector: $\pi = [d^T, a^T, c^T]^T$. The random walk with restart can be expressed by the following formula, in which the scalar $c$ weights the walk step and is distinct from the vector $c$ above:
$\pi_{t+1} = c M^T \pi_t + (1 - c) Q, \quad 0 \le c \le 1$
$Q$ is constructed as follows: [equation image not available]
$Q(i)$ is normalized so that: [equation image not available]
To test for convergence, the $\pi$ vectors from two successive iterations are subtracted; if the difference is less than $10^{-6}$, the iteration is judged to have converged. If the final vector is $\pi_n$, its entries are the document quality values, author academic reputation values, and journal and conference quality values.
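The update $\pi_{t+1} = c M^T \pi_t + (1 - c) Q$ with the $10^{-6}$ stopping rule can be sketched as below. The value `c=0.85`, the use of the L1 norm for the difference, the toy matrix `M` and the restart vector `Q` are assumptions for illustration; in the method described above, `Q` would come from the user-based quality values of Step 3.

```python
def random_walk_with_restart(M, Q, c=0.85, tol=1e-6, max_iter=10000):
    """Iterate pi <- c * M^T pi + (1 - c) * Q until successive vectors agree.

    M is a row-stochastic transition matrix (list of rows) and Q a restart
    distribution; convergence is measured with the L1 norm (an assumption).
    """
    n = len(M)
    pi = list(Q)
    for _ in range(max_iter):
        # (M^T pi)[j] = sum over i of M[i][j] * pi[i]
        new_pi = [c * sum(M[i][j] * pi[i] for i in range(n)) + (1 - c) * Q[j]
                  for j in range(n)]
        diff = sum(abs(x - y) for x, y in zip(new_pi, pi))
        pi = new_pi
        if diff < tol:
            break
    return pi

# Vertex order: document 0, document 1, author, venue (toy example).
M = [[0.0, 0.5, 0.25, 0.25],
     [0.5, 0.0, 0.25, 0.25],
     [0.5, 0.5, 0.0, 0.0],
     [0.5, 0.5, 0.0, 0.0]]
Q = [0.5, 0.5, 0.0, 0.0]   # restart only at documents
pi = random_walk_with_restart(M, Q)
```

Because `M` is row-stochastic and `Q` sums to 1, the entries of `pi` keep summing to 1 at every step, so they can be read directly as the vertex probabilities the method uses as scores.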
Performance evaluation
The method assigns a quality score to every document, journal or conference, and author; the rankings derived from these scores are used for experimental evaluation.
First, the document quality results are evaluated on documents from three fields: "Opinion Mining", "Topic Model" and "Social Network". The evaluation combines manual scoring of the ranked results with the DCG (Discounted Cumulative Gain) measure. Evaluators assign each document a score according to its quality; the higher a document's score, the nearer the top of the ranking it should appear. DCG is then computed over the results: the higher the DCG value, the better the ranking output by the algorithm matches actual needs. DCG is computed as: [equation image not available]
where $score_i$ is the score the evaluator gave to the $i$-th item in the ranking.
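As a rough illustration of how DCG rewards placing high-scored items first, the sketch below uses one common form, $\sum_i score_i / \log_2(i + 1)$. Which DCG variant the evaluation actually used is not recoverable from this text, so the discount is an assumption, as are the example scores.

```python
import math

def dcg(scores):
    """Discounted cumulative gain of evaluator scores listed in ranked order.

    Uses the discount 1 / log2(i + 1) for the item at 1-based rank i
    (one common variant; assumed here).
    """
    return sum(s / math.log2(i + 1) for i, s in enumerate(scores, start=1))

# The same three evaluator scores in two different ranked orders.
best_first = dcg([3, 2, 1])
worst_first = dcg([1, 2, 3])
```

The ranking that puts the highest-scored document first gets the larger DCG, which is the sense in which a higher DCG means a ranking closer to the evaluators' judgment.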
For document quality, the methods used for comparison are:
● The document part of the PageRank results
● The document part of the PopRank results
● The document part of the results of the Random Walk algorithm (RW) on the academic network graph
● Citation Count: the number of times a document is cited within the paper collection used in the experiment
The evaluation results are as follows (for brevity, the method of the invention is denoted RW+U): [results table not available]
Second, the results for author academic reputation are evaluated in the same way as the document quality experiment. The methods used for comparison are:
● The author part of the PageRank results
● The author part of the PopRank results
● The author part of the results of the Random Walk algorithm (RW) on the academic network graph
● Publication Count: the total number of documents the author published in the field collection used in the experiment
● Citation Count: the total number of citations of the documents the author published in the field collection used in the experiment
The evaluation results are as follows: [results table not available]
Finally, the results for the academic quality of journals are evaluated. Because the impact factor is a journal quality measure widely used in academia, the reference standard for this evaluation is the result of a modified impact factor analysis. The modified impact factor is computed as: [equation image not available]
where $D$ is the total number of documents published in journal $X$ and $C$ is the sum of the citation counts of those documents.
Journal rankings are evaluated by the precision of the top $N$ results, computed as: [equation image not available]
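The journal evaluation can be illustrated with a short sketch. Both formulas here are assumptions inferred from the surrounding definitions: the modified impact factor is taken as $C / D$, and precision at $N$ as the share of a method's top $N$ journals that also appear in the top $N$ of the reference ranking. The journal names and citation counts are made up.

```python
def modified_impact_factor(citation_counts):
    """C / D for one journal: total citations over number of documents (assumed form)."""
    if not citation_counts:
        return 0.0
    return sum(citation_counts) / len(citation_counts)

def precision_at_n(ranked, reference, n):
    """Share of the top-n items of `ranked` also in the top n of `reference`."""
    return len(set(ranked[:n]) & set(reference[:n])) / n

# Citation counts per document for three hypothetical journals.
journals = {"J1": [10, 4], "J2": [1, 1, 1], "J3": [6]}

# Reference ranking: journals sorted by modified impact factor, best first.
reference = sorted(journals,
                   key=lambda name: modified_impact_factor(journals[name]),
                   reverse=True)

# Ranking produced by some method under evaluation (made up).
ranked = ["J1", "J2", "J3"]
p_at_2 = precision_at_n(ranked, reference, 2)
```

Here the reference order is J1, J3, J2, so the method's top two share one journal with the reference top two and the precision at 2 is 0.5.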
The evaluation results are as follows: [results table not available]
The table above shows the distribution by year of the mean document quality value in the results of several algorithms. The means listed cover 1971 to 2009; each year's mean is the sum of the quality values of the documents published that year divided by the number of documents published that year. As can be seen, the methods RW and RW+U of the invention generally assign higher quality values to new documents than the other two methods, which indicates that the method of the invention solves the problem that traditional methods generally underrate new documents.
Note that the embodiments are published to aid further understanding of the invention. Those skilled in the art will understand that various substitutions and modifications are possible without departing from the spirit and scope of the invention and the appended claims. For example, the invention can equally be applied to paper sharing platforms or websites (with papers in place of documents) and to picture sharing platforms or websites (with pictures in place of documents). The invention should therefore not be limited to what is disclosed in the embodiments, and the scope of protection claimed is that defined by the claims.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010102263535A CN101887460A (en) | 2010-07-14 | 2010-07-14 | A Document Quality Evaluation Method and Its Application |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101887460A true CN101887460A (en) | 2010-11-17 |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102984191A (en) * | 2011-09-07 | 2013-03-20 | 百度在线网络技术(北京)有限公司 | Method and device and equipment used for determining behavior related quality information |
CN103559407A (en) * | 2013-11-14 | 2014-02-05 | 北京航空航天大学深圳研究院 | Recommendation system and method for measuring node intimacy in weighted graph with direction |
CN104462215A (en) * | 2014-11-05 | 2015-03-25 | 大连理工大学 | Scientific and technical literature quoting number predicting method based on time sequence |
CN104537495A (en) * | 2014-12-31 | 2015-04-22 | 浙江大学 | Scholar ability calculation method and system |
CN104657488A (en) * | 2015-03-05 | 2015-05-27 | 中南大学 | Method for calculating author influence based on citation propagation network |
CN105404641A (en) * | 2015-10-23 | 2016-03-16 | 华建宇通科技(北京)有限责任公司 | Baseline based journal evaluation method and evaluation apparatus |
CN105589948A (en) * | 2015-12-18 | 2016-05-18 | 重庆邮电大学 | Document citation network visualization and document recommendation method and system |
CN105740386A (en) * | 2016-01-27 | 2016-07-06 | 北京航空航天大学 | Thesis search method and device based on sorting integration |
CN105843876A (en) * | 2016-03-18 | 2016-08-10 | 合网络技术(北京)有限公司 | Multimedia resource quality assessment method and apparatus |
CN107391659A (en) * | 2017-07-18 | 2017-11-24 | 北京工业大学 | A kind of citation network academic evaluation sort method based on credit worthiness |
CN107833142A (en) * | 2017-11-08 | 2018-03-23 | 广西师范大学 | Academic social networks scientific research cooperative person recommends method |
WO2018077181A1 (en) * | 2016-10-27 | 2018-05-03 | 腾讯科技(深圳)有限公司 | Method and device for graph centrality calculation, and storage medium |
CN109272228A (en) * | 2018-09-12 | 2019-01-25 | 石家庄铁道大学 | Scientific research influence power analysis method based on Research Team's cooperative network |
CN109801692A (en) * | 2018-12-14 | 2019-05-24 | 平安医疗健康管理股份有限公司 | A kind of Medical record database method for evaluating quality and device |
CN110457439A (en) * | 2019-08-06 | 2019-11-15 | 北京如优教育科技有限公司 | One-stop intelligent writes householder method, device and system |
CN110825942A (en) * | 2019-10-22 | 2020-02-21 | 清华大学 | Method and system for calculating quality of thesis |
CN110955749A (en) * | 2019-10-24 | 2020-04-03 | 浙江工业大学 | Paper attention prediction method |
CN112286988A (en) * | 2020-10-23 | 2021-01-29 | 平安科技(深圳)有限公司 | Medical document sorting method and device, electronic equipment and storage medium |
CN112508461A (en) * | 2021-01-27 | 2021-03-16 | 中国科学院自动化研究所 | Academic influence evaluation service platform system and device for multiple elements |
US11328328B2 (en) | 2019-03-28 | 2022-05-10 | Coupang Corp. | Computer-implemented method for arranging hyperlinks on a grapical user-interface |
2010-07-14: CN application CN2010102263535A, published as CN101887460A (status: Pending)
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102984191A (en) * | 2011-09-07 | 2013-03-20 | 百度在线网络技术(北京)有限公司 | Method and device and equipment used for determining behavior related quality information |
CN102984191B (en) * | 2011-09-07 | 2017-06-09 | 百度在线网络技术(北京)有限公司 | Method, device and equipment for determining behavior correlated quality information |
CN103559407B (en) * | 2013-11-14 | 2016-08-31 | 北京航空航天大学深圳研究院 | Recommendation system and method for measuring node intimacy in a directed weighted graph |
CN103559407A (en) * | 2013-11-14 | 2014-02-05 | 北京航空航天大学深圳研究院 | Recommendation system and method for measuring node intimacy in weighted graph with direction |
CN104462215A (en) * | 2014-11-05 | 2015-03-25 | 大连理工大学 | Scientific and technical literature quoting number predicting method based on time sequence |
CN104462215B (en) * | 2014-11-05 | 2017-07-11 | 大连理工大学 | Method for predicting the citation count of scientific and technical literature based on time series |
CN104537495A (en) * | 2014-12-31 | 2015-04-22 | 浙江大学 | Scholar ability calculation method and system |
CN104657488A (en) * | 2015-03-05 | 2015-05-27 | 中南大学 | Method for calculating author influence based on citation propagation network |
CN105404641A (en) * | 2015-10-23 | 2016-03-16 | 华建宇通科技(北京)有限责任公司 | Baseline based journal evaluation method and evaluation apparatus |
CN105404641B (en) * | 2015-10-23 | 2018-10-26 | 华建宇通科技(北京)有限责任公司 | Baseline-based journal evaluation method and evaluation apparatus |
CN105589948A (en) * | 2015-12-18 | 2016-05-18 | 重庆邮电大学 | Document citation network visualization and document recommendation method and system |
CN105589948B (en) * | 2015-12-18 | 2018-10-12 | 重庆邮电大学 | Document citation network visualization and document recommendation method and system |
CN105740386A (en) * | 2016-01-27 | 2016-07-06 | 北京航空航天大学 | Thesis search method and device based on sorting integration |
CN105843876A (en) * | 2016-03-18 | 2016-08-10 | 合网络技术(北京)有限公司 | Multimedia resource quality assessment method and apparatus |
CN105843876B (en) * | 2016-03-18 | 2020-07-14 | 阿里巴巴(中国)有限公司 | Quality evaluation method and device for multimedia resources |
WO2018077181A1 (en) * | 2016-10-27 | 2018-05-03 | 腾讯科技(深圳)有限公司 | Method and device for graph centrality calculation, and storage medium |
US10936765B2 (en) | 2016-10-27 | 2021-03-02 | Tencent Technology (Shenzhen) Company Limited | Graph centrality calculation method and apparatus, and storage medium |
CN107391659B (en) * | 2017-07-18 | 2020-05-22 | 北京工业大学 | A Reputation-Based Citation Network Academic Influence Evaluation Ranking Method |
CN107391659A (en) * | 2017-07-18 | 2017-11-24 | 北京工业大学 | Reputation-based citation network academic evaluation ranking method |
CN107833142A (en) * | 2017-11-08 | 2018-03-23 | 广西师范大学 | Method for recommending scientific research collaborators in academic social networks |
CN109272228B (en) * | 2018-09-12 | 2022-03-15 | 石家庄铁道大学 | Scientific research influence analysis method based on scientific research team cooperation network |
CN109272228A (en) * | 2018-09-12 | 2019-01-25 | 石家庄铁道大学 | Scientific research influence analysis method based on scientific research team cooperation network |
CN109801692A (en) * | 2018-12-14 | 2019-05-24 | 平安医疗健康管理股份有限公司 | Medical record database quality evaluation method and device |
US11328328B2 (en) | 2019-03-28 | 2022-05-10 | Coupang Corp. | Computer-implemented method for arranging hyperlinks on a grapical user-interface |
CN110457439A (en) * | 2019-08-06 | 2019-11-15 | 北京如优教育科技有限公司 | One-stop intelligent writing assistance method, device and system |
CN110825942A (en) * | 2019-10-22 | 2020-02-21 | 清华大学 | Method and system for calculating quality of thesis |
CN110955749A (en) * | 2019-10-24 | 2020-04-03 | 浙江工业大学 | Paper attention prediction method |
CN112286988A (en) * | 2020-10-23 | 2021-01-29 | 平安科技(深圳)有限公司 | Medical document sorting method and device, electronic equipment and storage medium |
WO2021179687A1 (en) * | 2020-10-23 | 2021-09-16 | 平安科技(深圳)有限公司 | Medical literature sorting method and apparatus, electronic device and storage medium |
CN112286988B (en) * | 2020-10-23 | 2023-07-25 | 平安科技(深圳)有限公司 | Medical document ordering method, device, electronic equipment and storage medium |
CN112508461A (en) * | 2021-01-27 | 2021-03-16 | 中国科学院自动化研究所 | Academic influence evaluation service platform system and device for multiple elements |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101887460A (en) | A Document Quality Evaluation Method and Its Application | |
CN107766324B (en) | Text consistency analysis method based on deep neural network | |
CN111159395B (en) | Graph neural network-based rumor stance detection method and device and electronic equipment | |
CN108399163B (en) | Text similarity measurement method combining word aggregation and word combination semantic features | |
CN104915448B (en) | Entity and paragraph linking method based on hierarchical convolutional network | |
CN103198228B (en) | Relational network link prediction method based on a generalized relational hidden topic model | |
CN113254637B (en) | Grammar-fused aspect-level text emotion classification method and system | |
CN108573411A (en) | A Hybrid Recommendation Method Based on Deep Sentiment Analysis of User Reviews and Fusion of Multi-source Recommendation Views | |
Zhang et al. | User community discovery from multi-relational networks | |
CN113312480B (en) | Multi-label classification method and equipment for scientific papers based on graph convolutional network | |
Zhu et al. | Global and local multi-view multi-label learning | |
CN110674318A (en) | Data recommendation method based on citation network community discovery | |
CN108776844A (en) | Social network user behavior prediction method based on context-aware tensor decomposition | |
CN105740381A (en) | User interest mining method based on complex network characteristics and neural network clustering | |
CN112527981B (en) | Open information extraction method, device, electronic device and storage medium | |
CN113449204A (en) | Social event classification method and device based on local aggregation graph attention network | |
CN107545033A (en) | Knowledge base entity classification method based on representation learning | |
CN113516553A (en) | Credit risk early warning method and device | |
CN104778205A (en) | Heterogeneous information network-based mobile application ordering and clustering method | |
Yi et al. | Graphical visual analysis of consumer electronics public comment information mining under knowledge graph | |
Qi et al. | Application of LDA and word2vec to detect English off-topic composition | |
CN112001165B (en) | A method for fine-grained text sentiment analysis based on user harshness | |
CN105162648B (en) | Community detection method based on backbone network extension | |
Bai et al. | Quantifying the impact of scientific collaboration and papers via motif-based heterogeneous networks | |
CN117933222A (en) | Power marketing reform policy evaluation method based on policy consistency index model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication | Open date: 20101117 |