CN103631859B - Intelligent review expert recommending method for science and technology projects - Google Patents
- Publication number
- CN103631859B (application CN201310509358.2A)
- Authority
- CN
- China
- Prior art keywords
- word
- words
- feature
- information
- node
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/328—Management therefor
Abstract
Description
Technical field
The invention belongs to the technical field of expert recommendation and in particular relates to a web-service-based method for intelligently recommending review experts for science and technology projects. It is an intelligent method that assists project-approval decisions for science and technology projects.
Background
As science and technology project management systems have spread rapidly through China's functional departments, project review has moved from centralized meetings to online review, removing the geographic constraints on experts. Review experts assess each project application against their domain knowledge and the funding agency's criteria, and the funding agency decides whether to fund the project based on the experts' assessments.
At present, experts for science and technology projects are mostly recommended on the subjective judgment of project managers. A pending project usually needs several reviewers, so manual recommendation is inefficient, labor-intensive, and lacks a scientific basis, and the experts selected are not necessarily the most suitable. Research on intelligent recommendation of review experts is therefore important: it can reduce mismatches between experts and the content of the projects they review, and substantially improve the social service capability of project review work.
Current intelligent recommendation techniques, such as collaborative filtering and content-based recommendation, are mostly applied to film and television or product recommendation websites; there is little research on, or application to, review expert information databases for science and technology projects. Because of domain-specific constraints, recommending experts for science and technology projects differs from general recommendation. First, recommendation in a project management system spans all industries, so the domain knowledge involved is highly complex. Second, recommending review experts affects project funding, so the requirements for objectivity, impartiality, and accuracy are very high. China currently lacks systematic methodological guidance and mature technical support in this area. Information text is, however, semi-structured, and the content of expert information can be matched against the content of pending project information. The invention makes full use of structural features and word-level semantic information to compute the similarity between project information and expert information. A high similarity indicates that the expert is familiar with the project, and a list of recommended experts is generated to review it. The invention also provides a decision support system (DSS) that recommends review experts for science and technology projects and assigns experts to projects that match their domain knowledge. This helps decision-making users reach scientific decisions, improves the level and quality of decision-making, and makes review more scientific and objective.
Summary of the invention
To address the deficiencies of the prior art, the invention provides an intelligent method for recommending review experts for science and technology projects.
The review expert recommendation process comprises the following steps:
Step 1. Take the common and habitual words found in science and technology project information and expert information as the professional stop-word lexicon; take punctuation marks and non-Chinese characters as the segmentation-mark library.
Step 2. Segment the project information and expert information into words. Using the segmentation marks in the project information, split the project name, main research content, technical indicators, and similar fields into substring sequences. Using the segmentation marks in the expert information, extract the expert's profile, awards, inventions, published papers, projects undertaken and their completion status, research directions, and similar fields, and split them into substring sequences. Each substring sequence corresponds to one field. Segment the substring sequences into words with ICTCLAS from the Chinese Academy of Sciences.
Step 3. Extract the feature words of the science and technology project. Filter stop words from the segmentation result using a general stop-word lexicon and the professional stop-word lexicon; the general lexicon is the Harbin Institute of Technology (HIT) stop-word list. The segmentation result with stop words removed forms a word set.
The professional stop-word lexicon is built through continuous self-learning: word frequencies are counted throughout segmentation, and any word whose probability of occurrence in the text exceeds a given threshold is added to the lexicon.
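The self-learning update of the professional stop-word lexicon can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `update_stop_lexicon`, the use of relative frequency as the "probability of occurrence", and the default threshold are all assumptions.

```python
from collections import Counter


def update_stop_lexicon(tokens, stop_lexicon, threshold=0.05):
    """Return the lexicon extended with every overly frequent token.

    `threshold` is a hypothetical cutoff on relative frequency; the source
    only says that the probability must exceed "a given threshold".
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return set(stop_lexicon)
    # Relative frequency is used here as the occurrence probability.
    frequent = {word for word, count in counts.items() if count / total > threshold}
    return set(stop_lexicon) | frequent
```

Calling the function again after each batch of segmented text lets the lexicon grow as more documents are processed.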
Project information is voluminous. Semantic similarity is computed between the words in the word set, a word network is built from the words' semantic relations and co-occurrence relations, and an aggregation feature value is computed for each word in the network. This value is then combined with the word's statistical feature value to compute a key degree, from which the project's feature words are extracted. Combining the statistical and semantic feature information of the text allows the feature words to be extracted more accurately.
Semantic similarity is computed as follows:
In the HowNet semantic dictionary, suppose word W1 has n concepts S11, S12, ..., S1n and word W2 has m concepts S21, S22, ..., S2m. The similarity SimSEM(W1,W2) of W1 and W2 is the maximum similarity over all pairs of their concepts:
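The displayed formula is missing from this copy of the text. Read literally, "the maximum similarity over all pairs of concepts" corresponds to the expression below; the notation Sim(S1i, S2j) for the similarity of two concepts is an assumption introduced for illustration.

```latex
\mathrm{Sim}_{SEM}(W_1, W_2) \;=\; \max_{1 \le i \le n,\; 1 \le j \le m} \mathrm{Sim}(S_{1i}, S_{2j})
```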
Content words and function words use different description languages, so the similarity between their corresponding syntactic sememes or relational sememes must be computed. A content-word concept is described by four parts: the first basic sememe, the other basic sememes, the relational sememe description, and the relational symbol description. Their similarities are denoted Sim1(p1,p2), Sim2(p1,p2), Sim3(p1,p2), and Sim4(p1,p2). The similarity of two feature structures ultimately reduces to the similarity of basic sememes or of specific words.
βi (1≤i≤4) are adjustable parameters with β1+β2+β3+β4=1 and β1≥β2≥β3≥β4.
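The formula that combines the four partial similarities is not reproduced in this copy. A form widely used in HowNet-based similarity work, which matches the stated constraints on βi, is shown below; whether the patent uses exactly this form is an assumption.

```latex
\mathrm{Sim}(p_1, p_2) \;=\; \sum_{i=1}^{4} \beta_i \prod_{j=1}^{i} \mathrm{Sim}_j(p_1, p_2)
```

In this form the product makes each later part depend on the earlier ones, so a low similarity of the first basic sememes limits the overall value, which is consistent with the ordering β1≥β2≥β3≥β4.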
Let CW={C1,C2,...,Cm} be the word set obtained after processing. Its semantic similarity adjacency matrix Sm is the m×m matrix whose entry (i,j) is Sim(Ci,Cj),
where Sim(C1,C2) is the semantic similarity of words C1 and C2, Sim(Ci,Ci)=1, and Sim(Ci,Cj)=Sim(Cj,Ci).
Computing semantic similarity over the word set CW={C1,C2,...,Cm} yields m×(1+m)/2 pairwise similarity values.
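Because the matrix is symmetric with a unit diagonal, only the upper triangle needs to be evaluated, which is where the count m×(1+m)/2 comes from. The sketch below illustrates this; `similarity_matrix` and the pluggable `sim` callback are hypothetical names, and a real system would pass a HowNet-based similarity function instead of a toy one.

```python
def similarity_matrix(words, sim):
    """Build the symmetric similarity matrix and count the evaluated entries.

    `sim` is any symmetric word-similarity function returning a value in [0, 1].
    """
    m = len(words)
    matrix = [[0.0] * m for _ in range(m)]
    evaluated = 0
    for i in range(m):
        for j in range(i, m):
            # The diagonal is fixed at 1; off-diagonal values are mirrored.
            value = 1.0 if i == j else sim(words[i], words[j])
            matrix[i][j] = matrix[j][i] = value
            evaluated += 1
    return matrix, evaluated
```

For a set of m words the second return value equals m×(m+1)/2, diagonal included.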
The co-occurrence relation between words is computed as follows:
The word co-occurrence model is one of the important models in statistical natural language processing. According to this model, if two words often co-occur within the same window unit of a document (for example a sentence or a paragraph), they are related in meaning and to some extent express the semantics of the text. A sliding window of length 3 is used to compute co-occurrence for the words in the word sequence; the window is shown in Figure 1.
First, words are extracted from the word sequence of length n: spaces and nulls are removed and identical words are merged, giving the word set CW={C1,C2,...,Cm}, where m≤n.
The word co-occurrence matrix Cm corresponding to the word set CW is the m×m matrix whose entry (i,j) is Coo(Ci,Cj).
Initially, every entry Coo(Ci,Cj) of Cm is 0 (1≤i,j≤m).
Co-occurrence counts are computed by moving the sliding window over the word sequence. The words in the window are T_{i-1} T_i T_{i+1} (1<i<n):
1) If i=n-1, go to 4). If T_{i-1} is a space or null, slide the window to the next word (i++). Otherwise, go to 2).
2) If T_i is Chinese, increment Coo(T_{i-1},T_i) and go to 3). If T_i is null, go to 3). Otherwise, go to 1).
3) If T_i is Chinese, increment Coo(T_{i-1},T_{i+1}), i++, and go to 1). Otherwise, go to 1).
4) If T_{n-2} is Chinese, go to 5). Otherwise, go to 7).
5) If T_{n-1} is Chinese, increment Coo(T_{n-2},T_{n-1}) and go to 6). If T_{n-1} is a space, go to 6). Otherwise, end.
6) If T_n is Chinese, increment Coo(T_{n-2},T_n). End.
7) If T_{n-1} and T_n are both Chinese, increment Coo(T_{n-1},T_n). End.
These steps produce the word co-occurrence matrix Cm. Each element of Cm is then normalized by dividing it by the largest element of the matrix, max{Coo(Ci,Cj) | 1≤i,j≤m}.
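The idea behind the sliding-window procedure can be sketched in a few lines. This is a deliberately simplified illustration rather than a transcription of steps 1) to 7): it pairs the first word of each length-3 window with the other words in that window, treats `None` as a space or null token, and stores counts symmetrically. The function name and the dictionary representation are assumptions.

```python
from collections import defaultdict


def cooccurrence(tokens, window=3):
    """Count co-occurrences inside a sliding window, then normalize by the maximum.

    `tokens` is the word sequence; `None` stands for a space or null entry.
    Returns a dict mapping an alphabetically sorted word pair to a value in (0, 1].
    """
    counts = defaultdict(int)
    for start, first in enumerate(tokens):
        if first is None:
            continue  # the window slides on when it starts at a separator
        for other in tokens[start + 1:start + window]:
            if other is None or other == first:
                continue
            counts[tuple(sorted((first, other)))] += 1
    peak = max(counts.values(), default=0)
    if peak == 0:
        return {}
    # Divide every count by the largest count, as in the normalization step.
    return {pair: count / peak for pair, count in counts.items()}
```

A separator in the middle of a window does not prevent the two outer words from being counted together, which mirrors the handling of a null T_i in step 2).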
The word network is built as follows:
To build the weighted word network, the weight matrix of the network is obtained first. The weight matrix Wm is defined as:
where α=0.3 and β=0.7, which strengthens the semantic relations between words and weakens the co-occurrence relations.
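The defining formula for Wm is missing from this copy. Since β=0.7 is said to strengthen the semantic relation and α=0.3 to weaken the co-occurrence relation, one plausible reading is a per-entry weighted sum of the two matrices. The sketch below implements that reading; the linear form, the zero diagonal (no self-loops), and the name `weight_matrix` are assumptions.

```python
def weight_matrix(S, C, alpha=0.3, beta=0.7):
    """Combine co-occurrence matrix C and semantic similarity matrix S entry by entry.

    Assumed form: W[i][j] = alpha * C[i][j] + beta * S[i][j] for i != j.
    """
    m = len(S)
    return [
        [0.0 if i == j else alpha * C[i][j] + beta * S[i][j] for j in range(m)]
        for i in range(m)
    ]
```

Both inputs are expected to be normalized to the range [0, 1], so the combined weights stay in the same range.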
With Wm as the adjacency matrix of the word network, the corresponding network graph is defined as G={V,E}, where G is an undirected weighted graph, V is the vertex set of G, E is the edge set of G, and vi is the i-th vertex (word) in V.
The aggregation feature value of a word is computed as follows:
Important features of a word network include the degree distribution, the average shortest path, the aggregation degree, and the clustering (aggregation) coefficient. The degree of a node reflects how it is connected to other nodes. Its aggregation degree and clustering coefficient reflect how densely the nodes in its local neighborhood are interconnected. Together, degree and clustering coefficient reflect the local importance of the node. The invention computes a node's aggregation feature value from its weighted degree, clustering coefficient, and betweenness, so that important words receive higher weights and words associated with many important words also score highly.
In the word semantic similarity network graph, the unordered pair (vi,vj) denotes the edge between nodes vi and vj. The weighted degree of node vi is defined as:
where wij is the weight on the edge between vi and vj, and n is the total number of nodes.
The unweighted degree Di of node vi is Di=|{(vi,vj):(vi,vj)∈E, vi,vj∈V}|. The aggregation degree Ti of node vi is the number of edges that actually exist between its neighbors: Ti=|{(vj,vk):(vi,vk)∈E, (vj,vk)∈E, vi,vj∈V}|. The clustering coefficient Ci of node vi is then defined as:
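The formulas for the weighted degree and the clustering coefficient are missing from this copy. The sketch below uses the standard graph-theoretic definitions as stand-ins: the weighted degree as the sum of incident edge weights, and the clustering coefficient as 2·Ti / (Di·(Di−1)). Both choices, and the function names, are assumptions; the source may, for example, average the weighted degree over n.

```python
def weighted_degree(W, i):
    """Sum of the weights on all edges incident to node i (assumed definition)."""
    return sum(weight for j, weight in enumerate(W[i]) if j != i)


def clustering_coefficient(W, i):
    """Standard local clustering coefficient of node i in an undirected graph.

    An edge exists wherever the weight is positive. D is the number of
    neighbours of i and T the number of edges among those neighbours.
    """
    n = len(W)
    neighbours = [j for j in range(n) if j != i and W[i][j] > 0]
    degree = len(neighbours)
    if degree < 2:
        return 0.0  # fewer than two neighbours: no neighbour pair can be linked
    links = sum(
        1
        for a in range(degree)
        for b in range(a + 1, degree)
        if W[neighbours[a]][neighbours[b]] > 0
    )
    return 2.0 * links / (degree * (degree - 1))
```

In a real word network a small edge-weight cutoff would probably be applied first, since a dense similarity matrix makes almost every pair of words adjacent.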
In the word semantic similarity network graph, the betweenness of node vi is the probability that a shortest path between nodes x and w passes through vi. The degree of connection between two non-adjacent nodes depends on the nodes lying on the shortest paths between them; such nodes potentially control the flow of information exchanged between other nodes. Bi reflects the interconnectedness of node vi in its local environment. The betweenness is defined as:
where d(w,x) is the number of shortest paths between any two nodes w and x in the weighted word semantic similarity network graph, and the second quantity in the formula is the number of those shortest paths that pass through vi (vi∈G).
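The betweenness formula itself is missing from this copy. The description matches the standard shortest-path betweenness, which can be written as below. The symbol d_{v_i}(w,x) for the number of shortest paths between w and x that pass through v_i is introduced here for illustration, because the original symbol did not survive.

```latex
B_i \;=\; \sum_{\substack{w,\,x \in V \\ w \neq x,\; w \neq v_i,\; x \neq v_i}} \frac{d_{v_i}(w, x)}{d(w, x)}
```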
The average weighted degree, clustering coefficient, and betweenness of node vi are combined by weighting into a single aggregation feature value. The aggregation feature value Zi of node vi is defined as:
where a+b+c=1.
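The formula for Zi is missing from this copy. Given that three quantities are "combined by weighting" with coefficients that sum to 1, a linear combination is the most natural reading. The expression below is a reconstruction under that assumption; WD_i is a label chosen here for the (average) weighted degree of v_i.

```latex
Z_i \;=\; a \cdot WD_i \;+\; b \cdot C_i \;+\; c \cdot B_i, \qquad a + b + c = 1
```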
The statistical feature value of a word is computed as follows:
Word frequency is normalized with a nonlinear function. The word-frequency weight TFi of word Wi in the text is defined as:
where TFi is the word-frequency weight of word Wi, pj is a word in the text, and f is the word-frequency counting function.
In Chinese text, the words that identify the characteristics of a text are generally content words such as nouns, verbs, and adjectives. Function words such as interjections, prepositions, and conjunctions contribute essentially nothing to determining the text category and interfere heavily with feature word extraction. The part-of-speech weight posi of word Wi in the text is defined as:
Longer words carry more specific information, whereas shorter words usually have more abstract meanings. In particular, the feature words of a document are often compound technical terms, which are longer, clearer in meaning, and better reflect the topic of the text. Increasing the weight of long words helps to separate such terms and reflects more accurately how important a word is in the document.
The word-length weight leni of word Wi in the text is defined as:
For each word in the word sequence, the statistical feature value is
stats_i = A*TF_i + B*pos_i + C*len_i
where A+B+C=1.
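Only the final combination stats_i = A*TF_i + B*pos_i + C*len_i survives in this copy; the three component formulas are missing. The sketch below fills them with placeholder choices so that the combination can be run end to end. The cosine-style frequency normalization, the `POS_WEIGHT` table, the length ratio, and the default values of A, B, and C are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical part-of-speech weights: nouns highest, then verbs, then adjectives.
POS_WEIGHT = {"n": 1.0, "v": 0.6, "a": 0.4}


def statistical_features(tagged_words, A=0.5, B=0.3, C=0.2):
    """Return stats_i = A*TF_i + B*pos_i + C*len_i for every distinct word.

    `tagged_words` is a list of (word, pos_tag) pairs in text order.
    """
    if not tagged_words:
        return {}
    words = [word for word, _ in tagged_words]
    freq = Counter(words)
    # Placeholder nonlinear normalization: frequency over the Euclidean norm.
    norm = math.sqrt(sum(count * count for count in freq.values()))
    longest = max(len(word) for word in words)
    pos_of = dict(tagged_words)
    return {
        word: A * freq[word] / norm
        + B * POS_WEIGHT.get(pos_of[word], 0.0)
        + C * len(word) / longest
        for word in freq
    }
```

Function words fall back to a part-of-speech weight of 0, so they can still score through frequency and length unless stop-word filtering has already removed them.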
The key degree of word Wi is computed as follows:
For each node in the weighted word network, its key degree Imp_i is defined as:
Imp_i = β*stats_i + (1-β)*Z_i
where 0<β<1.
The key degree values are sorted in descending order, a threshold γ (0<γ<1) is set, and the top q values are taken. The corresponding words become the feature words of the project; they reflect its topic well and are its more important words.
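The ranking step can be sketched as follows. The text does not say how the threshold γ determines q, so this sketch assumes that γ is the fraction of words to keep, that is, q = ceil(γ·m). That interpretation, the default parameter values, and the function name are assumptions.

```python
import math


def select_feature_words(stats, Z, beta=0.5, gamma=0.3):
    """Rank words by Imp_i = beta*stats_i + (1-beta)*Z_i and keep the top q.

    `stats` and `Z` map each word to its statistical and aggregation value.
    q is assumed to be ceil(gamma * number_of_words), with at least one word kept.
    """
    importance = {word: beta * stats[word] + (1 - beta) * Z[word] for word in stats}
    ranked = sorted(importance, key=importance.get, reverse=True)
    q = max(1, math.ceil(gamma * len(ranked)))
    return ranked[:q]
```

An alternative reading is to keep every word whose key degree exceeds γ; switching to it would only change how `q` is derived.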
Step 4. Extract the feature words of the review experts. Expert information is less voluminous than project information, so the network-based extraction that combines statistical and semantic features is not suitable for it. Instead, stop words are filtered directly using the general and professional stop-word lexicons to obtain each expert's feature word set. The general lexicon is again the HIT stop-word list; the professional lexicon must be maintained continuously by staff.
Step 5. Build field-level knowledge representation models for projects and review experts. By extending the vector space model and the matter-element knowledge set model, a text representation model PRO=(id,F,WF,T,V) is built from the different fields of a project. Here id is the identifier field in the project database; F is the set of field categories of the project; WF is the set of field weights; T is the set of feature words; and V is the set of words and their weights for each field, that is, Vi={vi1,f(vi1),vi2,f(vi2),...,vin,f(vin)}, where vij is the j-th feature word of the i-th field and f(vij) is the frequency of vij. The knowledge representation of project information is as follows:
Similarly, a knowledge representation model TM=(id,F,WF,T,V) is built from the different fields of an expert. Here id is the identifier field in the expert database; F is the set of field categories of the review expert; WF is the set of field weights; T is the set of feature words; and V is the set of feature words and their weights for each field, that is, Vi={vi1,f(vi1),vi2,f(vi2),...,vin,f(vin)}, where vij is the j-th feature word of the i-th field and f(vij) is the frequency with which vij occurs in that field. The knowledge representation of review expert information is:
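The five-tuple (id, F, WF, T, V) maps naturally onto a small record type. The sketch below is one possible in-memory layout, not the representation used in the patent: the class name `KnowledgeModel` and its attribute names are hypothetical, F and WF are folded into one mapping from field name to weight, and T is derived from V rather than stored.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class KnowledgeModel:
    """Field-level knowledge representation for one project or one expert."""

    id: str
    # F and WF: field category name -> field weight.
    field_weights: Dict[str, float]
    # V: field category name -> {feature word: frequency in that field}.
    field_terms: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def feature_words(self) -> Set[str]:
        """T: every feature word that occurs in any field."""
        return {term for terms in self.field_terms.values() for term in terms}
```

The same class can serve for both PRO and TM, since the two models share the same five components and differ only in their field categories.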
Building the review expert information index library: once the expert knowledge representation models are complete, the information is indexed and stored. First, the content items of one review expert are read from the expert database. A word semantic network is built from the segmentation result and the expert's feature words are extracted. An index is built with Apache Lucene according to the knowledge representation model. The finished index is added to the index library for its category. This is repeated until all review experts have been indexed.
Step 6. Depending on the number of projects, recommendation is either for a single pending project or for a group of (multiple) pending projects. For group recommendation, the pending-project knowledge representation models from step 5 undergo feature merging both between fields and between projects; for a single pending project, only merging between fields is performed. The knowledge representation models of the review experts from step 5 likewise undergo merging between fields. The merged feature information is indexed with Apache Lucene according to the knowledge representation model. The project index is built at recommendation time.
In a project application management system, pending projects usually need to be recommended in groups. The merging operations above are designed to preserve the differences in contribution to the similarity calculation that arise from the field weights set in the knowledge representation model of step 5.
The features of pending projects and review experts are merged through a logical XOR operation as follows:
(1) Merging between fields for one pending project or one review expert
Suppose the field feature word sets W'1 and W'2 are to be merged. The merging rule for W'1 and W'2 is defined as:
where word1i and word2j are feature words.
Adding field weights refines and extends this definition. The features of the fields of review experts and projects are merged according to the rule:
(2) Merging between projects for a group of pending projects
This merging applies only to the feature vectors of pending projects, not to those of review experts, which only need merging between fields. Let V(d1) and V(d2) be the vector models of two projects after merging between fields. For any t1j∈V(d1) and t2j∈V(d2), if t1j and t2j are the same, they are merged. The merged model is defined as:
where k=1,…,n, tk is a feature term, and wk(p) is the weight of tk.
The knowledge representation model of a project group is produced as follows:
a) Merge the features between the fields of each project to obtain the vector model V(d) of each project;
b) Apply the merging strategy described above to the set of all project vector models to build a vector-space-based knowledge representation model for the project group:
V(p)={<t1,w1(p)>,<t2,w2(p)>,...,<tn,wn(p)>}
where k=1,…,n, tk is a feature term of the project group, and wk(p) is the weight of tk.
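The merging rules themselves are missing from this copy, so the sketch below only illustrates the two-stage structure of steps a) and b). It assumes that a term's merged weight within one project is the sum over fields of field weight times frequency, and that identical terms from different projects are merged by adding their weights. Both rules and both function names are assumptions.

```python
from collections import defaultdict


def merge_fields(field_terms, field_weights):
    """Stage a): collapse the per-field term frequencies of one project into V(d).

    Assumed rule: merged weight = sum over fields of field_weight * frequency.
    """
    merged = defaultdict(float)
    for name, terms in field_terms.items():
        for term, frequency in terms.items():
            merged[term] += field_weights[name] * frequency
    return dict(merged)


def merge_projects(vectors):
    """Stage b): combine the V(d) vectors of several projects into V(p).

    Assumed rule: identical terms are merged by adding their weights.
    """
    group = defaultdict(float)
    for vector in vectors:
        for term, weight in vector.items():
            group[term] += weight
    return dict(group)
```

Weighting by field before merging keeps a term from a high-weight field, such as the project name, more influential than the same term from a low-weight field, which is the property the text says the merge must preserve.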
Step 7. After the merging between fields in step 6, let the review expert information vector be P={s1,f(s1),s2,f(s2),...,sn,f(sn)} and the project (group) information vector be Q={t1,f(t1),t2,f(t2),...,tn,f(tn)}. The semantic similarity between the pending project (group) vector and each review expert is computed with a maximum matching algorithm.
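The text does not spell out the matching procedure, so the sketch below shows one way to realize a maximum-weight bipartite matching between project terms and expert terms. It uses an exhaustive bitmask search that is practical only for small term sets, ignores the frequencies f(·), and normalizes by the size of the larger set. The function name, the normalization, and the pluggable `sim` callback are assumptions.

```python
from functools import lru_cache


def max_matching_similarity(project_terms, expert_terms, sim):
    """Similarity of two term lists via maximum-weight bipartite matching.

    Each term is matched to at most one term on the other side so that the
    summed pairwise similarity is maximal. Intended for small term sets only.
    """
    rows, cols = list(project_terms), list(expert_terms)
    if not rows or not cols:
        return 0.0

    @lru_cache(maxsize=None)
    def best(row, used):
        # `used` is a bitmask of expert terms that are already matched.
        if row == len(rows):
            return 0.0
        result = best(row + 1, used)  # option: leave this project term unmatched
        for col in range(len(cols)):
            if not used & (1 << col):
                candidate = sim(rows[row], cols[col]) + best(row + 1, used | (1 << col))
                result = max(result, candidate)
        return result

    return best(0, 0) / max(len(rows), len(cols))
```

For realistic vocabulary sizes a polynomial-time assignment algorithm such as the Hungarian method would replace the exhaustive search while producing the same optimum.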
Step 8. Set a similarity cutoff, generate a recommendation index from the similarity values, and produce the final list of recommended review experts.
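The final cutoff and ranking can be sketched as below. The cutoff value, the optional `top_n` limit, the tie-breaking by expert identifier, and the use of the raw similarity as the recommendation index are assumptions made for illustration.

```python
def recommend(similarities, cutoff=0.5, top_n=None):
    """Return (expert_id, similarity) pairs at or above `cutoff`, best first.

    `similarities` maps an expert identifier to its similarity with the
    pending project or project group. Ties are broken by identifier.
    """
    kept = [(expert, score) for expert, score in similarities.items() if score >= cutoff]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept if top_n is None else kept[:top_n]
```

For example, with scores of 0.82, 0.41, and 0.67 and a cutoff of 0.5, the second expert is dropped and the remaining two are returned in descending order of similarity.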
The beneficial effects of the invention are as follows:
The method recommends review experts for science and technology projects more conveniently, intelligently, and accurately. It greatly reduces the work of assigning review experts for staff of the project application management system and lowers management costs. It ensures a high degree of domain match between review experts and pending projects, so that reviews are objective, impartial, and scientific. It provides automatic, efficient, and fair decision support, and it helps avoid improper approval practices such as favoritism networks and the "Matthew effect".
Description of the Drawings
Fig. 1 shows the sliding window used for word co-occurrence calculation in the present invention.
Fig. 2 is a schematic diagram of the principle of the bipartite-graph-based maximum matching algorithm in the present invention.
Fig. 3 is a flow chart of the intelligent review expert recommendation method for science and technology projects in the present invention.
Fig. 4 is a flow chart of feature word extraction for science and technology project information and review expert information in the present invention.
Fig. 5 is a flow chart of the construction of the review expert knowledge index library in the present invention.
Detailed Description
The present invention is further described below with reference to the accompanying drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its application. Specific embodiments are described in further detail below. All other embodiments obtained from these embodiments by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
As shown in Fig. 3, the main idea of the recommendation method is:
(1) For the expert information and the pending project information in the science and technology project application management system, split the main text into substring sequences, segment them with the ICTCLAS word segmenter of the Chinese Academy of Sciences, and filter stop words from the segmentation result to obtain a word set.
(2) Science and technology project information includes the main research content, technical indicators, and similar material, so its volume is large. The invention builds a word network from the semantic relations and co-occurrence relations between words, computes the node aggregation feature value in that network, combines it with a statistical feature value in a weighted sum to obtain the keyness of each word, and extracts the feature words of each project.
(3) Expert information is more concise than project information and smaller in volume, so the filtered word set of each expert is used directly as that expert's feature words.
(4) Field weights are set according to the differing importance of the fields in project and expert information. Using the feature words from (2) and (3), knowledge representation models are built for projects and for experts, and an expert index library is constructed.
(5) For grouped recommendation, the knowledge representation models of the pending projects undergo both inter-field and inter-project feature merging. For single-project recommendation, only inter-field feature merging is performed. Inter-field feature merging is also applied to the expert knowledge representation model.
(6) Taking into account that words match fuzzily at the semantic level, the similarity between expert information and pending project information is computed, and the final list of recommended experts is produced by truncating at a set threshold.
Step 1. Use the common and idiomatic words found in science and technology project and expert information as a domain-specific stop word lexicon. Use punctuation marks and non-Chinese characters as a segmentation marker library.
Step 2. Segment the project information and expert information. Using the segmentation markers in the project information, split the project name, main research content, technical indicators, and similar information into substring sequences. Using the segmentation markers in the review expert information, extract the expert's basic information, awards, inventions, published papers, projects undertaken and their completion status, research directions, and similar information, and split them into substring sequences. Each substring sequence corresponds to one field. The substring sequences are then segmented into words with ICTCLAS.
Step 3. Feature word extraction for science and technology projects. Stop words are filtered from the segmentation result using a general stop word lexicon and the domain-specific stop word lexicon. The general lexicon is the Harbin Institute of Technology stop word list. The segmentation result with stop words removed is taken as a word set; see Fig. 4.
The domain-specific stop word lexicon is built through continuous self-learning. Word frequencies are counted throughout the segmentation process, and any word whose probability of occurrence in the text exceeds a given threshold is added to the stop word lexicon.
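The self-learning update described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `update_stopwords`, the default `threshold`, and the use of relative frequency as the "probability of occurrence" are all assumptions.

```python
from collections import Counter

def update_stopwords(token_lists, stopwords, threshold=0.05):
    """Return the stop word set extended with overly frequent words.

    `threshold` and the relative-frequency estimate are assumptions; the
    source only says "probability greater than a certain threshold".
    """
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    total = sum(counts.values())
    if total == 0:
        return set(stopwords)
    learned = {w for w, c in counts.items() if c / total > threshold}
    return set(stopwords) | learned
```

Calling the function again after each batch of segmented documents gives the incremental behavior the text describes.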
Because project information is voluminous, the semantic similarity between the words in the word set is computed, a word network is built from the semantic relations and co-occurrence relations between words, and the aggregation feature value of each word in the network is calculated. This is then combined with the statistical feature value of the word to obtain its keyness, from which the project's feature words are extracted. In other words, feature word extraction combines the statistical and the semantic feature information of the text, which makes the extracted feature words more accurate.
The semantic similarity is computed as follows:
In the HowNet semantic dictionary, suppose word W_1 has n concepts S_11, S_12, ..., S_1n and word W_2 has m concepts S_21, S_22, ..., S_2m. The similarity of W_1 and W_2 is the maximum similarity over all pairs of concepts: SimSEM(W_1, W_2) = max Sim(S_1i, S_2j), taken over i = 1, ..., n and j = 1, ..., m.
Content words and function words use different description languages, so for each the similarity between the corresponding syntactic sememes or relational sememes must be computed. A content word concept is described by four parts: the first basic sememe, the other basic sememes, the relational sememe description, and the relational symbol description. Their similarities are denoted Sim_1(p_1, p_2), Sim_2(p_1, p_2), Sim_3(p_1, p_2), and Sim_4(p_1, p_2). The similarity of two feature structures ultimately reduces to the similarity of basic sememes or of specific words.
The four partial similarities are combined using adjustable parameters β_i (1 ≤ i ≤ 4), with β_1 + β_2 + β_3 + β_4 = 1 and β_1 ≥ β_2 ≥ β_3 ≥ β_4.
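The combining formula itself is not reproduced in this text. A form widely used in HowNet-based similarity work, shown here only as an assumption about what the omitted formula may look like, weights each partial similarity by the product of the more important ones before it, so that a less important part cannot dominate:

```latex
\mathrm{Sim}(S_1, S_2) \;=\; \sum_{i=1}^{4} \beta_i \prod_{j=1}^{i} \mathrm{Sim}_j(p_1, p_2),
\qquad \sum_{i=1}^{4} \beta_i = 1,\quad \beta_1 \ge \beta_2 \ge \beta_3 \ge \beta_4 .
```

Here S_1 and S_2 are two concepts and p_1, p_2 are their corresponding descriptions, following the notation of the surrounding text.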
Let CW = {C_1, C_2, ..., C_m} be the word set obtained after processing. Its semantic similarity adjacency matrix S_m is the m × m matrix whose (i, j) entry is Sim(C_i, C_j),
where Sim(C_1, C_2) is the semantic similarity between word C_1 and word C_2, Sim(C_i, C_i) = 1, and Sim(C_i, C_j) = Sim(C_j, C_i).
Because the matrix is symmetric, computing the word semantic similarity over CW = {C_1, C_2, ..., C_m} yields m × (1 + m)/2 similarity values.
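A minimal sketch of building this symmetric matrix follows. The `sim` argument is a hypothetical callable standing in for a HowNet-based word similarity, and `toy_sim` is a placeholder used only so the example runs; neither comes from the source.

```python
def similarity_matrix(words, sim):
    """Build the symmetric m x m matrix S_m with a unit diagonal.

    `sim` is a hypothetical stand-in for a HowNet-based similarity.
    Only the upper triangle is evaluated; the rest is mirrored.
    """
    m = len(words)
    S = [[0.0] * m for _ in range(m)]
    for i in range(m):
        S[i][i] = 1.0
        for j in range(i + 1, m):
            S[i][j] = S[j][i] = sim(words[i], words[j])
    return S

def toy_sim(a, b):
    # Placeholder: character-overlap (Jaccard) score, NOT HowNet similarity.
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)
```

The count m(m + 1)/2 in the text includes the m diagonal entries; since those are fixed at 1, the sketch calls `sim` only m(m - 1)/2 times.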
The word co-occurrence relationship is computed as follows:
The word co-occurrence model is one of the important statistical models in natural language processing research. According to this model, if two words frequently co-occur in the same window unit of a document (for example a sentence or a paragraph), they are related in meaning and express part of the semantic information of the text. Word co-occurrence is computed over the word sequence with a sliding window of length 3, as shown in Fig. 1.
First, words are extracted from the word sequence of length n by removing spaces and nulls and merging identical words, giving the word set CW = {C_1, C_2, ..., C_m}, where m ≤ n.
The word co-occurrence matrix C_m corresponding to CW is the m × m matrix whose (i, j) entry is Coo(C_i, C_j).
Initially, Coo(C_i, C_j) = 0 for all 1 ≤ i, j ≤ m.
Word co-occurrence is then counted over the word sequence with the sliding window, whose contents are T_{i-1} T_i T_{i+1} (1 < i < n):
1) If i = n-1, go to 4). If T_{i-1} is a space or null, slide the window to the next word (i++). Otherwise, go to 2).
2) If T_i is a Chinese word, then Coo(T_{i-1}, T_i)++ and go to 3). If T_i is null, go to 3). Otherwise, go to 1).
3) If T_i is a Chinese word, then Coo(T_{i-1}, T_{i+1})++, i++, and go to 1). Otherwise, go to 1).
4) If T_{n-2} is a Chinese word, go to 5). Otherwise, go to 7).
5) If T_{n-1} is a Chinese word, then Coo(T_{n-2}, T_{n-1})++ and go to 6). If T_{n-1} is a space, go to 6). Otherwise, stop.
6) If T_n is a Chinese word, then Coo(T_{n-2}, T_n)++ and stop. Otherwise, stop.
7) If T_{n-1} and T_n are both Chinese words, then Coo(T_{n-1}, T_n)++ and stop. Otherwise, stop.
These steps yield the word co-occurrence matrix C_m. Each element of C_m is then normalized by dividing it by the largest element of the matrix, max{Coo(C_i, C_j) | 1 ≤ i, j ≤ m}.
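The counting and normalization can be sketched compactly as below. This is a simplification of the numbered procedure, not a transcription of it: blank or `None` tokens are simply skipped as partners, pairs are counted symmetrically, and the function name and return shape are assumptions.

```python
def cooccurrence_matrix(tokens, window=3):
    """Count co-occurrences inside a sliding window, then normalize by the max.

    Simplified relative to the numbered procedure: tokens that are None or
    blank are skipped instead of being handled case by case.
    """
    def valid(tok):
        return tok is not None and tok.strip() != ""

    vocab = []
    for tok in tokens:
        if valid(tok) and tok not in vocab:
            vocab.append(tok)
    index = {w: k for k, w in enumerate(vocab)}
    m = len(vocab)
    coo = [[0.0] * m for _ in range(m)]

    for i, left in enumerate(tokens):
        if not valid(left):
            continue
        # Partners are the next (window - 1) tokens in the sequence.
        for right in tokens[i + 1:i + window]:
            if valid(right) and right != left:
                a, b = index[left], index[right]
                coo[a][b] += 1
                coo[b][a] += 1  # keep the matrix symmetric

    peak = max((v for row in coo for v in row), default=0)
    if peak > 0:
        coo = [[v / peak for v in row] for row in coo]
    return vocab, coo
```

After normalization every entry lies in [0, 1], which puts the co-occurrence values on the same scale as the semantic similarities they are later combined with.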
The word network is constructed as follows:
To build the weighted word network, the weight matrix W_m of the network is obtained first. It combines the co-occurrence and semantic similarity relations using the coefficients α and β,
where α = 0.3 and β = 0.7, which strengthens the semantic relations between words and weakens the co-occurrence relations.
W_m serves as the adjacency matrix of the word network, and the corresponding graph is defined as G = {V, E}, where G is an undirected weighted graph, V is the vertex set of G, E is the edge set of G, and v_i is the i-th vertex (word) in V.
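The defining formula for W_m is not reproduced in this text. Given that β = 0.7 is said to strengthen the semantic relation and α = 0.3 to weaken co-occurrence, one natural reading is the element-wise combination W_m = α·C_m + β·S_m. The sketch below implements that assumed form; the function name is hypothetical.

```python
def weight_matrix(coo, sem, alpha=0.3, beta=0.7):
    """Assumed form W = alpha * C + beta * S, applied element by element.

    `coo` is the normalized co-occurrence matrix and `sem` the semantic
    similarity matrix; both are m x m nested lists.
    """
    m = len(coo)
    return [[alpha * coo[i][j] + beta * sem[i][j] for j in range(m)]
            for i in range(m)]
```

The result can be read directly as the adjacency matrix of the undirected weighted graph G.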
The word aggregation feature value is computed as follows:
Important properties of a word network include the degree distribution, the average shortest path, the aggregation degree, and the clustering coefficient. The degree of a node reflects how it is connected to other nodes. The aggregation degree and clustering coefficient of a node reflect how densely the nodes in its local neighborhood are interconnected. Together, degree and clustering coefficient reflect the local importance of the node. The invention computes the aggregation feature value of a node from its weighted degree, clustering coefficient, and betweenness. This gives important words a high weight and also ensures that words associated with many important words receive a high score.
In the word semantic similarity network graph, the unordered pair (v_i, v_j) denotes the edge between nodes v_i and v_j. The weighted degree of node v_i is defined from the weights of its incident edges,
where w_ij is the weight on the edge between v_i and v_j, and n is the total number of nodes.
The unweighted degree D_i of node v_i is D_i = |{(v_i, v_j) : (v_i, v_j) ∈ E, v_i, v_j ∈ V}|. The aggregation degree T_i of node v_i is the number of edges actually present between its neighbors: T_i = |{(v_j, v_k) : (v_i, v_j) ∈ E, (v_i, v_k) ∈ E, (v_j, v_k) ∈ E, v_j, v_k ∈ V}|. The clustering coefficient C_i of node v_i is then defined from T_i and D_i.
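The two defining formulas are not reproduced in this text, so the sketch below rests on assumptions: the weighted degree is taken as the plain sum of incident edge weights (the source may divide by n), and the clustering coefficient uses the standard form T_i / (D_i (D_i - 1) / 2). The `eps` cutoff for deciding whether an edge exists is also hypothetical.

```python
def weighted_degree(W, i):
    """Sum of edge weights incident to node i (self-loops ignored)."""
    return sum(w for j, w in enumerate(W[i]) if j != i)

def clustering_coefficient(W, i, eps=0.0):
    """T_i / (D_i * (D_i - 1) / 2), treating any weight > eps as an edge."""
    nbrs = [j for j, w in enumerate(W[i]) if j != i and w > eps]
    d = len(nbrs)
    if d < 2:
        return 0.0  # fewer than two neighbors: no neighbor pairs to connect
    links = sum(1 for a in range(d) for b in range(a + 1, d)
                if W[nbrs[a]][nbrs[b]] > eps)
    return links / (d * (d - 1) / 2)
```

In a network built from similarity scores nearly every pair has a nonzero weight, so a positive `eps` is likely needed in practice for the clustering coefficient to be informative.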
In the word semantic similarity network graph, the betweenness of node v_i is the probability that a shortest path between two nodes x and w passes through v_i. The degree of connection between two non-adjacent nodes depends on the nodes lying on the shortest path between them, and these nodes potentially control the flow of information between the two. B_i reflects the degree of interconnection of node v_i in its local environment and is defined from shortest path counts:
d(w, x) denotes the number of shortest paths between any two nodes w and x in the weighted word semantic similarity network graph, and the corresponding count restricted to v_i denotes the number of those shortest paths that pass through v_i (v_i ∈ G).
The aggregation feature value Z_i of node v_i is a weighted combination of its average weighted degree, clustering coefficient, and betweenness, with coefficients a, b, and c,
where a + b + c = 1.
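The formulas for B_i and Z_i are not reproduced in this text. Under the standard definition of betweenness and a linear combination consistent with the constraint a + b + c = 1, they would plausibly read as follows. The symbols d_{v_i}(w, x) and WD_i are names introduced here for illustration only.

```latex
B_i \;=\; \sum_{\substack{w, x \in V \\ w \ne x,\; w \ne v_i,\; x \ne v_i}} \frac{d_{v_i}(w, x)}{d(w, x)},
\qquad
Z_i \;=\; a \cdot WD_i + b \cdot C_i + c \cdot B_i,
\qquad a + b + c = 1 .
```

Here d_{v_i}(w, x) is the number of shortest paths between w and x that pass through v_i, and WD_i is the (average) weighted degree of v_i. The three terms would normally be rescaled to a common range before being combined.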
The statistical feature value of a word is computed as follows:
Word frequency is normalized with a nonlinear function, which defines the word frequency weight TF_i of word W_i in the text,
where TF_i is the word frequency weight of word W_i, p_j is a word in the text, and f is the word frequency counting function.
In Chinese text, the words that characterize a text are generally content words such as nouns, verbs, and adjectives. Function words such as interjections, prepositions, and conjunctions contribute essentially nothing to determining the text category and strongly interfere with feature word extraction. The part-of-speech weight pos_i of word W_i in the text is defined accordingly.
Longer words tend to carry more specific information, whereas shorter words usually have more abstract meanings. The feature words in these documents are mostly compound technical terms, which are longer, more precise in meaning, and better reflect the topic of the text. Raising the weight of long words helps distinguish such terms and reflects the importance of a word in the document more accurately.
The word length weight len_i of word W_i in the text is defined accordingly.
For each word in the word sequence, the statistical feature value is
stats_i = A*TF_i + B*pos_i + C*len_i
where A + B + C = 1.
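The definitions of TF_i, pos_i, and len_i are not reproduced in this text, so the sketch below fills them with hypothetical choices purely to show how the weighted sum works: a logarithmic frequency normalization, a small part-of-speech lookup table, and length relative to the longest word. The coefficient defaults and the `POS_WEIGHT` values are likewise assumptions.

```python
import math
from collections import Counter

# Hypothetical part-of-speech weights; tags not listed count as function words.
POS_WEIGHT = {"n": 1.0, "v": 0.6, "a": 0.4}

def stats_scores(tagged_words, A=0.4, B=0.3, C=0.3):
    """stats_i = A*TF_i + B*pos_i + C*len_i with A + B + C = 1.

    `tagged_words` is a list of (word, pos_tag) pairs in text order.
    """
    assert abs(A + B + C - 1.0) < 1e-9
    if not tagged_words:
        return {}
    freq = Counter(w for w, _ in tagged_words)
    max_f = max(freq.values())
    max_len = max(len(w) for w in freq)
    pos_of = dict(tagged_words)
    scores = {}
    for w, f in freq.items():
        tf = math.log(1 + f) / math.log(1 + max_f)  # assumed nonlinear form
        pos = POS_WEIGHT.get(pos_of[w], 0.0)        # function words score 0
        length = len(w) / max_len                   # assumed length weight
        scores[w] = A * tf + B * pos + C * length
    return scores
```

With these choices a frequent, long noun scores close to 1, while a short particle scores low even if it is frequent.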
The keyness of word W_i is computed as follows:
For each node in the weighted word network, its keyness value Imp_i is defined as:
Imp_i = β*stats_i + (1-β)*Z_i
where 0 < β < 1.
The keyness values are sorted in descending order, a threshold γ (0 < γ < 1) is set, and the top q values are taken. The corresponding words are used as the feature words of the science and technology project. These words are both important and representative of the topic.
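The ranking step can be sketched as below. The relation between γ and q is not spelled out in the text; this sketch assumes one reading, namely that q is the number of words whose keyness reaches γ. The function name and defaults are hypothetical.

```python
def select_feature_words(stats, Z, beta=0.5, gamma=0.6):
    """Rank words by Imp_i = beta*stats_i + (1-beta)*Z_i and cut at gamma.

    `stats` and `Z` map each word to its statistical feature value and its
    aggregation feature value. Keeping words with Imp_i >= gamma is an
    assumed interpretation of "take the top q values".
    """
    imp = {w: beta * stats[w] + (1 - beta) * Z[w] for w in stats}
    ranked = sorted(imp.items(), key=lambda kv: kv[1], reverse=True)
    return [(w, score) for w, score in ranked if score >= gamma]
```

If q is instead meant as a fixed count or a fraction γ of the vocabulary, only the final filter needs to change.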
Step 4. Feature word extraction for review experts. Review expert information is less voluminous than project information, so the network-based extraction that combines statistical and semantic features, as used for projects, is not suited to it. Instead, stop words are filtered directly using the general and domain-specific stop word lexicons, and the remaining words form each expert's feature word set. The general lexicon is again the Harbin Institute of Technology stop word list, and the domain-specific lexicon must be maintained continuously by staff.
Step 5. Build field-level knowledge representation models for projects and review experts. By extending the vector space model and the matter-element knowledge set model, a text representation model PRO = (id, F, WF, T, V) is built from the different fields of a science and technology project. Here id is the identifier field in the project library, F is the set of field categories in the project, WF is the set of field weights, T is the set of feature words, and V is the set of words and weights for each field, that is, V_i = {v_i1, f(v_i1), v_i2, f(v_i2), ..., v_in, f(v_in)}, where v_ij is the j-th feature word in the i-th field and f(v_ij) is the frequency of v_ij. The knowledge representation of project information follows this model.
Similarly, a knowledge representation model TM = (id, F, WF, T, V) is built from the different fields of an expert. Here id is the identifier field in the expert library, F is the set of field categories for the expert, WF is the set of field weights, T is the set of feature words, and V is the set of feature words and weights for each field, that is, V_i = {v_i1, f(v_i1), v_i2, f(v_i2), ..., v_in, f(v_in)}, where v_ij is the j-th feature word in the i-th field and f(v_ij) is the frequency with which v_ij occurs in that field. The knowledge representation of review expert information follows this model.
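As an illustration of the five-tuple (id, F, WF, T, V), the following sketch shows one way the model could be held in memory. The class and attribute names are hypothetical; only the roles of the five components come from the text.

```python
from dataclasses import dataclass, field

@dataclass
class FieldEntry:
    name: str                                   # one element of F
    weight: float                               # matching element of WF
    terms: dict = field(default_factory=dict)   # V_i: feature word -> frequency

@dataclass
class KnowledgeModel:
    id: str                                     # identifier in the library
    fields: list = field(default_factory=list)  # list of FieldEntry

    def feature_words(self):
        """T: every feature word that occurs in any field."""
        return {t for f in self.fields for t in f.terms}
```

The same structure serves for both PRO and TM, since the two models differ only in which fields they contain.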
Construction of the review expert information index library. Once the review expert knowledge representation models have been built, the information is indexed as follows. First, the content items of one review expert are read from the expert library. A word semantic network is built from the segmentation result, and the feature words of the expert are extracted. An index is then built with Apache Lucene according to the knowledge representation model, and the index is added to the index library for its category. This repeats until all review experts have been indexed; see Fig. 5.
Step 6. Depending on the number of projects, recommendation is either for a single pending project or for a group of pending projects. For grouped recommendation, the pending project knowledge representation models from Step 5 undergo both inter-field and inter-project feature merging. For a single pending project, only inter-field feature merging is performed. Inter-field feature merging is also applied to the review expert knowledge representation models from Step 5. The merged feature information is indexed with Apache Lucene according to the knowledge representation model. The project index is built at the time a recommendation is made.
Pending projects in a science and technology project application management system usually need to be recommended in groups. The feature merging described here is designed so that it does not erase the differences in contribution to the similarity calculation that result from the field weights set in the Step 5 knowledge representation model.
Feature merging for pending projects and review experts is carried out through a logical XOR operation as follows:
(1) Inter-field feature merging for a single pending project or a single review expert
Suppose the field feature word sets W'_1 and W'_2 are to be merged. A merging rule is defined for W'_1 and W'_2,
where word_1i and word_2j are feature words.
This definition is then extended with field weights, and the extended rule is used to merge the inter-field features of review experts and of science and technology projects.
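The merging rules themselves are not reproduced in this text. The sketch below shows one plausible weighted merge, offered only as an assumption: a word that appears in a single field keeps that field's weighted frequency, and a word shared by several fields accumulates the weighted frequencies from each. The function name and input shape are hypothetical.

```python
def merge_fields(fields):
    """Merge per-field feature words into one {word: weight} vector.

    `fields` is an iterable of (field_weight, {word: frequency}) pairs.
    Summing field_weight * frequency for shared words is an assumed rule.
    """
    merged = {}
    for field_weight, terms in fields:
        for word, freq in terms.items():
            merged[word] = merged.get(word, 0.0) + field_weight * freq
    return merged
```

Because each frequency is scaled by its field weight before merging, a word from a heavily weighted field still contributes more after the fields are flattened into one vector.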
(2) Inter-project feature merging for grouped pending projects
This merging operation applies only to the feature vectors of pending projects, not to those of review experts, which need only inter-field feature merging. Let V(d_1) and V(d_2) be the vector models of two projects after inter-field feature merging. For any t_1j ∈ V(d_1) and t_2j ∈ V(d_2), if t_1j and t_2j are the same term, they are merged, giving a merged vector
where k = 1, …, n, t_k is a feature term, and w_k(p) is the weight of t_k.
The knowledge model representation of a project group is produced as follows:
a) Merge the inter-field features of each project to obtain its vector model V(d).
b) Apply the merging strategy described above to the set of all project vector models to build a vector-space knowledge representation model for the project group:
V(p) = {<t_1, w_1(p)>, <t_2, w_2(p)>, ..., <t_n, w_n(p)>}
where k = 1, …, n, t_k is a feature term of the project group, and w_k(p) is the weight of t_k.
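Steps a) and b) can be sketched as below. How the weights of identical terms are combined is not reproduced in this text, so adding them is an assumption, as are the function name and the dictionary representation of V(d) and V(p).

```python
def merge_projects(vectors):
    """Combine per-project vectors V(d) into one group vector V(p).

    `vectors` is a list of {term: weight} dicts, one per project, each
    already merged across fields. Adding the weights of identical terms
    is an assumed combination rule.
    """
    group = {}
    for vec in vectors:
        for term, weight in vec.items():
            group[term] = group.get(term, 0.0) + weight
    return group
```

The resulting dictionary corresponds to the pairs <t_k, w_k(p)> in V(p), with terms shared across projects carrying proportionally more weight.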
Step 7. After the inter-field features of the review expert and science and technology project knowledge representation models have been merged in Step 6, let the review expert information vector be P = {s_1, f(s_1), s_2, f(s_2), ..., s_n, f(s_n)} and the science and technology project (group) information vector be Q = {t_1, f(t_1), t_2, f(t_2), ..., t_n, f(t_n)}. The semantic similarity between the pending project (group) vector and the review expert vector is then computed with a maximum matching algorithm.
The semantic similarity between the pending project (group) vector and the review expert vector is computed with the bipartite-graph-based maximum matching algorithm as follows:
Computing semantic similarity with the maximum matching algorithm means obtaining the similarity of two texts through a maximum matching on a bipartite graph. As shown in Fig. 2, each feature word of the project (group) vector is a vertex in part X and each feature word of the review expert vector is a vertex in part Y, so the problem is equivalent to finding the maximum weight matching of a complete bipartite graph. The thick lines in Fig. 2 connect each X-part feature word to the Y-part feature word with which it has the greatest semantic similarity.
The semantic similarity here is the HowNet-based similarity. The invention uses the HowNet semantic dictionary and the maximum matching algorithm to compute the semantic similarity between the pending project (group) and a review expert,
where s_i and t_j are the two word nodes joined by an edge of maximum semantic similarity SimSEM(s_i, t_j) (a thick line in Fig. 2), m and n are the numbers of feature words in the project vector and in the review expert vector respectively, and p is the number of maximum-similarity edges (thick lines in Fig. 2).
This semantic similarity between the pending project (group) and the review expert information reflects factors such as language, word semantics, and word structure, and expresses how well the two match. A high similarity indicates a good match, meaning that the expert is suitable for reviewing the project (group).
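The matching step can be illustrated with a brute-force search that is exact but only practical for a handful of feature words. The similarity formula is not reproduced in this text, so dividing the matched total by max(m, n) is an assumption, and `sim` is a hypothetical stand-in for SimSEM.

```python
from itertools import permutations

def max_matching_similarity(project_terms, expert_terms, sim):
    """Maximum-weight bipartite matching score, found by brute force.

    Every way of pairing the smaller word list with distinct words of the
    larger one is tried, and the best total similarity is kept. `sim` is
    assumed symmetric. Normalizing by max(m, n) is an assumed choice.
    """
    small, large = sorted((list(project_terms), list(expert_terms)), key=len)
    if not small:
        return 0.0
    best = max(
        sum(sim(s, t) for s, t in zip(small, perm))
        for perm in permutations(large, len(small))
    )
    return best / len(large)
```

The search grows factorially with the number of words, so realistic feature sets would call for a polynomial-time assignment method such as the Hungarian algorithm (for example SciPy's `linear_sum_assignment`).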
Step 8. Apply a similarity cutoff, derive a recommendation index from the similarity values, and produce the final list of recommended review experts.
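A minimal sketch of the cutoff and ranking follows. The cutoff value, the optional `top_k` limit, and the use of the raw similarity as the recommendation index are all assumptions made for illustration.

```python
def recommend_experts(similarities, cutoff=0.5, top_k=None):
    """Return (expert_id, similarity) pairs at or above `cutoff`, best first.

    `similarities` maps each expert id to its similarity with the pending
    project (group). The similarity doubles as the recommendation index.
    """
    kept = [(e, s) for e, s in similarities.items() if s >= cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k] if top_k is not None else kept
```

For example, with similarities of 0.82, 0.41, and 0.67 for three experts and a cutoff of 0.5, the first and third experts are returned in that order.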
The above is only a preferred embodiment of the present invention. It should be noted that, for intelligent machine recommendation of review experts for science and technology projects, various improvements and modifications can be made without departing from the technical principle of the invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the invention.
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310509358.2A CN103631859B (en) | 2013-10-24 | 2013-10-24 | Intelligent review expert recommending method for science and technology projects |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310509358.2A CN103631859B (en) | 2013-10-24 | 2013-10-24 | Intelligent review expert recommending method for science and technology projects |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103631859A CN103631859A (en) | 2014-03-12 |
CN103631859B true CN103631859B (en) | 2017-01-11 |
Family
ID=50212901
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310509358.2A Expired - Fee Related CN103631859B (en) | 2013-10-24 | 2013-10-24 | Intelligent review expert recommending method for science and technology projects |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103631859B (en) |
Families Citing this family (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103823896B (en) * | 2014-03-13 | 2017-02-15 | 蚌埠医学院 | Subject characteristic value algorithm and subject characteristic value algorithm-based project evaluation expert recommendation algorithm |
CN104361102B (en) * | 2014-11-24 | 2018-05-11 | 清华大学 | A kind of expert recommendation method and system based on group matches |
US20160203140A1 (en) * | 2015-01-14 | 2016-07-14 | General Electric Company | Method, system, and user interface for expert search based on case resolution logs |
CN105912581A (en) * | 2016-03-31 | 2016-08-31 | 比美特医护在线(北京)科技有限公司 | Information processing method and device |
CN107194672B (en) * | 2016-11-09 | 2021-07-13 | 北京理工大学 | A review assignment method that integrates academic expertise and social network |
CN108427667B (en) * | 2017-02-15 | 2021-08-10 | 北京国双科技有限公司 | Legal document segmentation method and device |
CN107229738B (en) * | 2017-06-18 | 2020-04-03 | 杭州电子科技大学 | A search and ranking method of academic papers based on document scoring model and relevance |
CN107609006B (en) * | 2017-07-24 | 2021-01-29 | 华中师范大学 | Search optimization method based on local log research |
CN107656920B (en) * | 2017-09-14 | 2020-12-18 | 杭州电子科技大学 | A patent-based recommendation method for scientific and technological talents |
CN107784087B (en) * | 2017-10-09 | 2020-11-06 | 东软集团股份有限公司 | Hot word determination method, device and equipment |
CN107807978B (en) * | 2017-10-26 | 2021-07-06 | 北京航空航天大学 | A Code Reviewer Recommendation Method Based on Collaborative Filtering |
CN108229684B (en) * | 2018-01-26 | 2022-04-15 | 中国科学技术信息研究所 | Method and device for constructing expert knowledge vector model and terminal equipment |
CN108399491B (en) * | 2018-02-02 | 2021-10-29 | 浙江工业大学 | A Network Graph Based Employee Diversity Ranking Method |
CN108804633B (en) * | 2018-06-01 | 2021-10-08 | 腾讯科技(深圳)有限公司 | Content recommendation method based on behavior semantic knowledge network |
CN108846056B (en) * | 2018-06-01 | 2021-04-23 | 云南电网有限责任公司电力科学研究院 | Method and device for expert recommendation of scientific and technological achievement evaluation |
CN108549730A (en) * | 2018-06-01 | 2018-09-18 | 云南电网有限责任公司电力科学研究院 | Expert information retrieval method and device |
CN108920556B (en) * | 2018-06-20 | 2021-11-19 | 华东师范大学 | Expert recommending method based on discipline knowledge graph |
CN108873706B (en) * | 2018-07-30 | 2022-04-15 | 中国石油化工股份有限公司 | Trap evaluation intelligent expert recommendation method based on deep neural network |
CN109308315B (en) * | 2018-10-19 | 2022-09-16 | 南京理工大学 | A Collaborative Recommendation Method Based on Expert Domain Similarity and Association |
CN109857872A (en) * | 2019-02-18 | 2019-06-07 | 浪潮软件集团有限公司 | Information recommendation method and device based on knowledge graph |
CN109992642B (en) * | 2019-03-29 | 2022-11-18 | 华南理工大学 | A single-task expert automatic selection method and system based on scientific and technological entries |
CN110046225B (en) * | 2019-04-16 | 2020-11-24 | 广东省科技基础条件平台中心 | Scientific and technological project material integrity assessment decision model training method |
CN112182327B (en) * | 2019-07-05 | 2024-06-14 | 北京猎户星空科技有限公司 | Data processing method, device, equipment and medium |
CN110442618B (en) * | 2019-07-25 | 2023-04-18 | 昆明理工大学 | Convolutional neural network review expert recommendation method fusing expert information association relation |
CN110443574B (en) * | 2019-07-25 | 2023-04-07 | 昆明理工大学 | Recommendation method for multi-project convolutional neural network review experts |
CN111143690A (en) * | 2019-12-31 | 2020-05-12 | 中国电子科技集团公司信息科学研究院 | Expert recommendation method and system based on associated expert database |
CN111598526B (en) * | 2020-04-21 | 2023-02-03 | 奇计(江苏)科技服务有限公司 | Intelligent comparison review method for describing scientific and technological innovation content |
CN111666420B (en) * | 2020-05-29 | 2021-02-26 | 华东师范大学 | Method for intensively extracting experts based on subject knowledge graph |
CN111951141A (en) * | 2020-07-09 | 2020-11-17 | 广东港鑫科技有限公司 | Double random supervision method, system and terminal equipment based on big data intelligent analysis |
CN111782797A (en) * | 2020-07-13 | 2020-10-16 | 贵州省科技信息中心 | Automatic matching method for scientific and technological project review experts and storage medium |
CN112100370B (en) * | 2020-08-10 | 2023-07-25 | 淮阴工学院 | A recommendation method based on image review expert combination based on text convolution and similarity algorithm |
CN112287679A (en) * | 2020-10-16 | 2021-01-29 | 国网江西省电力有限公司电力科学研究院 | Structured extraction method and system for text information in scientific and technological project review |
CN112381381B (en) * | 2020-11-12 | 2023-11-17 | 深圳供电局有限公司 | Intelligent expert recommendation device |
CN112487260A (en) * | 2020-12-07 | 2021-03-12 | 上海市研发公共服务平台管理中心 | Instrument project declaration and review expert matching method, device, equipment and medium |
CN112417870A (en) * | 2020-12-10 | 2021-02-26 | 北京中电普华信息技术有限公司 | Expert information screening method and system |
CN112948527B (en) * | 2021-02-23 | 2023-06-16 | 云南大学 | Improved TextRank keyword extraction method and device |
CN113554210A (en) * | 2021-05-17 | 2021-10-26 | 南京工程学院 | Comment scoring and declaration prediction system and method for fund project declaration |
CN113255364B (en) * | 2021-05-28 | 2025-02-11 | 华斌 | Machine integration method of multiple expert opinions in government information projects based on knowledge fusion |
CN113516094B (en) * | 2021-07-28 | 2024-03-08 | 中国科学院计算技术研究所 | System and method for matching and evaluating expert for document |
CN113569575B (en) * | 2021-08-10 | 2024-02-09 | 云南电网有限责任公司电力科学研究院 | Evaluation expert recommendation method based on pictographic-semantic dual-feature space mapping |
CN113643008A (en) * | 2021-10-15 | 2021-11-12 | 中国铁道科学研究院集团有限公司科学技术信息研究所 | Acceptance expert matching method, device, equipment and readable storage medium |
CN114186002A (en) * | 2021-12-14 | 2022-03-15 | 智博天宫(苏州)人工智能产业研究院有限公司 | Scientific and technological achievement data processing and analyzing method and system |
CN115033772B (en) * | 2022-06-20 | 2024-06-21 | 浙江大学 | Creative excitation method and device based on semantic network |
CN115577696B (en) * | 2022-11-15 | 2023-04-07 | 四川省公路规划勘察设计研究院有限公司 | A project similarity evaluation and analysis method based on WBS tree |
CN116303642A (en) * | 2023-02-07 | 2023-06-23 | 中国计量科学研究院 | A method and device for selecting and avoiding test experts in the test of scientific and technological achievements |
CN117093670A (en) * | 2023-07-18 | 2023-11-21 | 北京智信佳科技有限公司 | Method for realizing intelligent recommending expert in paper |
CN117034273A (en) * | 2023-08-28 | 2023-11-10 | 山东省计算中心(国家超级计算济南中心) | Android malicious software detection method and system based on graph convolutional network |
CN117131279A (en) * | 2023-09-13 | 2023-11-28 | 合肥工业大学 | Data processing method and device for expert recommendation |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101075942A (en) * | 2007-06-22 | 2007-11-21 | 清华大学 | Method and system for processing social network expert information based on expert value propagation algorithm |
CN102495860A (en) * | 2011-11-22 | 2012-06-13 | 北京大学 | Expert recommendation method based on language model |
CN102855241A (en) * | 2011-06-28 | 2013-01-02 | 上海迈辉信息技术有限公司 | Multi-index expert suggestion system and realization method thereof |
CN102880657A (en) * | 2012-08-31 | 2013-01-16 | 电子科技大学 | Searcher-Based Expert Recommendation Method |
Non-Patent Citations (1)
Title |
---|
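Research and Implementation of a Review Expert Recommendation System for Science and Technology Projects; Hu Bin; China Master's Theses Full-text Database (Information Science and Technology); 20130715 (No. 7); full text * |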
Also Published As
Publication number | Publication date |
---|---|
CN103631859A (en) | 2014-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103631859B (en) | Intelligent review expert recommending method for science and technology projects | |
CN113239181B (en) | Scientific and technological literature citation recommendation method based on deep learning | |
CN103605665B (en) | Keyword based evaluation expert intelligent search and recommendation method | |
CN111966917B (en) | Event detection and summarization method based on pre-training language model | |
Gautam et al. | Sentiment analysis of twitter data using machine learning approaches and semantic analysis | |
CN102591988B (en) | Short text classification method based on semantic graphs | |
CN106095759B (en) | Invoice goods classification method based on heuristic rules | |
CN110427623A (en) | Semi-structured document Knowledge Extraction Method, device, electronic equipment and storage medium | |
CN107491531A (en) | Chinese online review sentiment classification method based on ensemble learning framework | |
CN105468713A (en) | Multi-model fused short text classification method | |
CN107357837A (en) | E-commerce review sentiment classification method based on order-preserving submatrix and frequent sequence mining | |
CN108388660A (en) | Improved e-commerce product pain point analysis method | |
Romanov et al. | Application of natural language processing algorithms to the task of automatic classification of Russian scientific texts | |
CN108804651A (en) | Social behavior detection method based on enhanced Bayesian classification | |
Saranya et al. | Onto-based sentiment classification using machine learning techniques | |
CN105955975A (en) | Knowledge recommendation method for academic literature | |
CN108304509A (en) | Spam comment filtering method based on mutual learning of multiple text vector representations | |
Nejad et al. | A combination of frequent pattern mining and graph traversal approaches for aspect elicitation in customer reviews | |
Zhao et al. | Sentimental prediction model of personality based on CNN-LSTM in a social media environment | |
Pouromid et al. | ParsBERT post-training for sentiment analysis of tweets concerning stock market | |
Al-Hagree et al. | Arabic sentiment analysis on mobile applications using Levenshtein distance algorithm and naive Bayes | |
Zhao et al. | POS-ATAEPE-BiLSTM: an aspect-based sentiment analysis algorithm considering part-of-speech embedding | |
Lu et al. | [Retracted] A Deep Learning‐Based Text Classification of Adverse Nursing Events | |
Jin et al. | Representation and extraction of diesel engine maintenance knowledge graph with bidirectional relations based on BERT and the Bi-LSTM-CRF model | |
CN116933164A (en) | Medical and health service demand classification method based on similarity |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 20140312; Assignee: Hangzhou eddy current technology Co.,Ltd.; Assignor: HANGZHOU DIANZI University; Contract record no.: X2020330000008; Denomination of invention: Intelligent review expert recommending method for science and technology projects; Granted publication date: 20170111; License type: Common License; Record date: 20200117 |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170111 |