TW200933403A - Expert finding system and method - Google Patents

Expert finding system and method

Info

Publication number
TW200933403A
Authority
TW
Taiwan
Prior art keywords
document
citation
network
cited
records
Prior art date
Application number
TW97101630A
Other languages
Chinese (zh)
Other versions
TWI352913B (en)
Inventor
Hahn-Ming Lee
Jan-Ming Ho
Kai-Hsiang Yang
Chia-Ching Chou
Shui-Shi Chen
Original Assignee
Univ Nat Taiwan Science Tech
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Nat Taiwan Science Tech
Priority to TW97101630A
Publication of TW200933403A
Application granted
Publication of TWI352913B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An expert finding method including the following steps is provided. First, a citation database with a plurality of citation records is provided. Next, an importance degree of each citation record is estimated. A language model is then built according to the attribute data of each citation record, so that a language model database is formed. The language model database is then searched with a keyword input by a user, so that a citation list is generated. Finally, the authors of the citation list are sorted by the importance degree of each citation record, and an expert list is output to the user.

Description

200933403 06twf.doc/n

IX. Description of the Invention:

[Technical field to which the invention pertains]

The present invention relates to a data search system and method, and in particular to an expert finding system and method.

[Prior Art]

In practice, automatically finding experts in a specific field is a pressing problem.
Conventional systems or methods for solving this problem require a domain expert database to be built in advance, and building such a database takes a great deal of manual labor. In addition, conventional expert finding systems and methods decide whether a candidate has expertise in a given topic only by counting how many times the candidate's name appears in documents on that topic. As a result, the experts found may not be the most suitable ones.

Traditionally, methods for finding experts in a specific field fall into three types: document feature comparison methods, system retrieval methods, and knowledge management methods.

Document feature comparison methods use document mining to extract a vector that represents the features of a document and then compare the similarity of the feature vectors of two documents. For example, the document comparison system and document comparison method proposed in Republic of China Patent Publication No. 1268437 analyze document similarity by comparing feature values of the document data: document mining techniques extract feature values that represent a document, and those feature values are then used to compare document similarity. However, this approach requires a large number of document samples to be prepared in advance and labeled with their topics, and it cannot judge the quality of a document, so it cannot be used to evaluate a scholar's research field or importance.

System retrieval methods mainly design a complete document retrieval system and use its indexing method to establish associations between documents. For example, the data retrieval method and system proposed in a Republic of China patent publication [number illegible] performs retrieval over multiple data attributes and the network relationships among them. However, such a system is applicable only internally, whereas the papers and works of experts and scholars are mostly scattered across many databases, so building one system to manage scholars' works centrally is not feasible.
Knowledge management methods mainly design a complete knowledge management system in order to find the knowledge needed to accomplish a given task. For example, the enterprise knowledge search engine proposed in Republic of China Patent Publication No. [number illegible] mainly presents the knowledge of an enterprise by means of a knowledge framework and uses concept queries to find the required information. However, this kind of method is aimed at filtering out information unrelated to the need rather than at finding experts in a specific field.

[Summary of the Invention]

The present invention provides an expert finding system and method that judge the expertise of experts from the quality and authority of their works, so that a user can find the most suitable expert in a specific field.

The present invention provides an expert finding system that includes a citation database and a domain expert matching unit. The citation database stores a plurality of citation records. Each citation record contains a plurality of attribute data, and the attribute data include an author name. The domain expert matching unit is connected to the citation database and includes an importance evaluator, a language model builder, a language model database, a keyword matcher, and a matching result integrator. The importance evaluator evaluates an importance degree of each citation. The language model builder builds a language model of the citations according to the attribute data. The language model database stores the language model built by the language model builder. The keyword matcher searches the language model database according to a keyword input by the user, thereby generating a citation list related to the keyword. The matching result integrator ranks the author names corresponding to the citation list according to the importance degree of each citation record, and outputs an expert list to the user.

In an embodiment of the invention, the expert finding system further includes a citation parsing unit. The domain expert matching unit is connected to the citation database via the citation parsing unit.
The citation parsing unit extracts the attribute data from each citation record.

In an embodiment of the invention, the expert finding system further includes a citation network filtering unit connected to the citation database. The citation network filtering unit builds a plurality of links between the citation records according to a degree of correlation of the attribute data, thereby generating a relationship network of the citation records. The relationship network contains a plurality of groups.

In an embodiment of the invention, the citation network filtering unit includes an attribute dependency calculator, which calculates, for each citation, the degree of correlation between the attribute data.

In an embodiment of the invention, the citation network filtering unit includes a weight graph generator and a weak link filter. The weight graph generator builds a citation cluster graph of the relationship network and calculates a link strength for each link in the citation cluster graph. The weak link filter checks whether each link strength falls within a preset range and filters out the links whose strength is outside the preset range.

In an embodiment of the invention, the citation network filtering unit includes a social network center finder and a relation vector generator. The social network center finder marks the social network center of each group in the citation cluster graph. The relation vector generator generates relation vectors for the same group and for different groups according to the marking result of the social network center finder.

In an embodiment of the invention, the expert finding system further includes a citation relationship analysis unit. Based on the attribute data and a social network analysis method, the citation relationship analysis unit analyzes the relationships among the citation records in the relationship network.
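The weight graph and weak link filter described above can be illustrated with a short sketch. It is only an illustration under assumptions: the link strength is taken to be a number already computed for each pair of records, the preset range is modeled as an inclusive interval `[low, high]`, and groups are recovered as connected components of the surviving links. None of these choices is specified in the source.

```python
def filter_weak_links(links, low, high):
    """Keep only links whose strength lies inside the preset range [low, high]."""
    return {pair: s for pair, s in links.items() if low <= s <= high}


def groups_from_links(nodes, links):
    """Assumed grouping rule: connected components over the surviving links."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the representative, shortening the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in links:          # iterating a dict yields its (a, b) keys
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical record ids and strengths, for illustration only.
nodes = ["r1", "r2", "r3", "r4"]
links = {("r1", "r2"): 0.9, ("r2", "r3"): 0.2, ("r3", "r4"): 0.7}
kept = filter_weak_links(links, 0.5, 1.0)
clusters = groups_from_links(nodes, kept)
```

Dropping the weak `r2`/`r3` link splits the four records into two groups, which is the effect the filter is meant to have on the cluster graph.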
In an embodiment of the invention, the citation relationship analysis unit includes a group center vector generator, a citation relationship network database, a citation association builder, and a citation category marker. The group center vector generator receives a new citation record and generates a relation vector between the new citation record and the other groups in the relationship network. The citation relationship network database stores the relationship network. The citation association builder adds the new citation record to the citation relationship network. The citation category marker marks the group to which each citation record in the relationship network belongs.

In an embodiment of the invention, the expert finding system further includes a citation resolution unit for determining whether two citations are positively or negatively correlated in the relationship network.

In an embodiment of the invention, the citation resolution unit includes a binary classifier and a prediction result generator. The binary classifier builds a binary classification model. The prediction result generator uses the binary classification model to determine whether a link exists between two citation records in the relationship network.

In an embodiment of the invention, the domain expert matching unit further includes a keyword expander. The keyword expander has a keyword list. The keyword list contains a plurality of domain keywords corresponding to a plurality of domains, and the keyword expander expands the keyword into one of these domain keywords.

In an embodiment of the invention, the attribute data further include at least one of the title, the publication source, and the publication time of the citations.

The present invention also provides an expert finding method, which begins by providing a citation database.
The citation database stores a plurality of citation records, each of which contains a plurality of attribute data, and the attribute data include an author name. Next, an importance degree of each citation record is evaluated. Then, a language model of the citations is built according to the attribute data, thereby generating a language model database. The language model database is then searched according to a keyword input by the user, generating a citation list related to the keyword. Finally, the author names corresponding to the citation list are ranked according to the importance degree of each citation record, and an expert list is output to the user.

In an embodiment of the invention, after the step of providing the citation database, the method further includes extracting the attribute data from each citation record.

In an embodiment of the invention, after the attribute data are extracted, the method further includes calculating the similarity of the attribute data and then building a plurality of links between the citation records according to that similarity, thereby generating a relationship network of the citation records, wherein the relationship network contains a plurality of groups.

In an embodiment of the invention, after the step of generating the relationship network, the method further includes calculating a link strength for each link, checking whether the link strengths fall within a preset range, and filtering out the links whose strength is outside the preset range.

In an embodiment of the invention, after the step of generating the relationship network, the method further includes building a citation cluster graph of the relationship network.
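The core method steps described above (importance evaluation, language model construction, keyword search, and author ranking) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the record fields, the use of plain term counts over the title and source as the language model, and the rule of summing importance per author are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class CitationRecord:
    authors: list       # author names
    title: str
    source: str         # publication source, e.g. a journal name
    year: int           # publication time
    importance: float   # hypothetical precomputed weight for this record


def build_language_model(record):
    """Assumed stand-in for the language model: term counts over title and source."""
    return Counter((record.title + " " + record.source).lower().split())


def find_experts(records, keyword):
    """Search the per-record models for a single-term keyword, then rank authors."""
    models = [build_language_model(r) for r in records]          # language model database
    term = keyword.lower()
    hits = [r for r, m in zip(records, models) if m[term] > 0]   # citation list
    scores = defaultdict(float)
    for record in hits:
        for author in record.authors:
            scores[author] += record.importance   # assumption: sum importance per author
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda a: (-scores[a], a))


# Hypothetical records, for illustration only.
records = [
    CitationRecord(["A"], "Fuzzy control systems", "J1", 2001, 2.0),
    CitationRecord(["B", "A"], "Fuzzy sets", "J2", 2003, 1.0),
    CitationRecord(["C"], "Graph theory", "J3", 2000, 5.0),
]
experts = find_experts(records, "Fuzzy")
```

In this sketch the ranking depends on the importance of the matching records, not on how often an author's name appears, which is the distinction the method draws against the prior art.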
In an embodiment of the invention, after the step of building the citation cluster graph, the method further includes marking the social network center of each group in the citation cluster graph.

In an embodiment of the invention, after the step of generating the relationship network, the method further includes receiving a new citation record and then using a binary classification model to determine whether a link exists between the new citation record and the citation records in the relationship network.

In an embodiment of the invention, the step of searching the language model database according to the keyword includes providing a keyword list that contains a plurality of domain keywords corresponding to a plurality of domains, expanding the keyword into one of these domain keywords, and searching the language model database according to the expanded domain keyword.

The expert finding system and method of the present invention judge the expertise of experts from the importance of citations, so that a user can find the most suitable expert in a given field. In addition, a social network analysis method can be used to automatically analyze citations obtained from the Internet, so the problem of automatically finding experts in a specific field can be solved without relying on a domain expert database built in advance.

To make the above features and advantages of the present invention easier to understand, several embodiments are described in detail below with reference to the accompanying drawings.

[Embodiments]

The expert finding system and method of the following embodiments judge the expertise of experts from the quality and authority of their works, so that a user can find the most suitable expert in a specific field. A social network analysis method can also be used to analyze citations on the Internet as evidence of research results, so the problem of automatically finding experts in a specific field can be solved without relying on a domain expert database built in advance.

FIG. 1 is a schematic diagram of an expert finding system according to an embodiment of the present invention, FIG. 2A is a schematic diagram of citation records according to an embodiment of the present invention, and FIG. 2B is a schematic diagram of the attribute data of the citation records of FIG. 2A. Referring to FIG. 1, FIG. 2A, and FIG. 2B, the expert finding system 100 includes a citation database 110 and a domain expert matching unit 120. The citation database 110 stores a plurality of citation records R1 to R2. The citation records R1 to R2 each contain a plurality of attribute data A1 to A4. In this embodiment, the attribute data A1 to A4 include the author name A1, the title A2, the publication source A3 (such as a bibliography or journal name), and the publication time A4 (such as the year, month, and day of publication) of the citation.

The domain expert matching unit 120 is connected to the citation database 110 and includes an importance evaluator 122, a language model builder 124, a language model database 126, a keyword matcher 128, and a matching result integrator 130. The importance evaluator 122 evaluates an importance degree of the citation records R1 to R2. For example, the importance evaluator 122 can calculate a weight value for each of the citation records R1 to R2 from a recognized indicator of journal influence. The calculated weight values can also be stored in the citation database 110, so that the importance evaluator 122 can evaluate the importance of the citation records R1 to R2 from the weight values in the citation database.

The language model builder 124 builds a language model of the citations according to the attribute data A1 to A4. The language model database 126 stores the language model built by the language model builder 124. Specifically, the language model builder 124 can build the language model by collecting statistics on the keywords of the citations, for use in matching and searching.

FIG. 3 is a schematic diagram of the user interface of FIG. 1. Referring to FIG. 1 and FIG. 3, the keyword matcher 128 receives a keyword input by the user through the user interface 140 and searches the language model database 126 according to this keyword. When the search is complete, the keyword matcher 128 produces a citation list related to the keyword. The matching result integrator 130 then ranks the author names corresponding to the citation list according to the importance degree of each citation record R1 to R2, and outputs an expert list to the user.

For example, the user enters the topic keyword "Fuzzy" through the user interface 140. The keyword matcher 128 matches the keyword against the citation records in the language model database 126 to find a list of citations related to the topic, and identifies the authors from the citation records in that list. The matching result integrator 130 combines how well each document matches the topic with the importance of the document to compute the expert list, for example "陳錫明" and "李嘉晃". In addition, the topic expander 140 can use a keyword list to broaden the keyword input by the user and improve the matching ability of the system. The keyword list contains a plurality of domain keywords corresponding to a plurality of domains. For example, the user input "graph" is expanded to "graph theory".

From another point of view, an expert finding method can be derived from the operation of the expert finding system 100 of the above embodiment. FIG. 4 shows the expert finding method derived from this embodiment. Referring to FIG. 1 and FIG. 4, the method includes the following steps. First, the citation database 110 is provided (step S310). Next, the importance degree of each citation record R1 to R2 stored in the citation database 110 is evaluated (step S320). Then, a language model is built according to the attribute data A1 to A4 of each citation record R1 to R2, generating the language model database 126 (step S330). The language model database 126 is then searched according to the keyword input by the user, generating a citation list related to the keyword (step S340).
Finally, the author names corresponding to the citation list are ranked according to the importance degree of each citation record R1 to R2, and the expert list is output to the user (step S350).

FIG. 5 is a schematic diagram of an expert finding system according to another embodiment of the present invention. Referring to FIG. 5, the domain expert matcher 260 is similar to the domain expert matching unit 120 of the previous embodiment, and the expert finding system 200 interacts with the user in the same way as the expert finding system 100, so these details are not repeated here. This embodiment describes in detail how the citation weight database 270 is built. Broadly, the process can be divided into a training phase and a test phase. The training phase uses existing citations to build a basic relationship network. The test phase extends the relationship network by importing new citation records 50.

FIG. 7 is a flowchart of the expert finding system of FIG. 5 in the training phase. Referring to FIG. 5 and FIG. 7, the citation database 210 stores a plurality of citation records R1 to R2 (see FIG. 2A). The citation database 210 is connected to a citation parsing unit 220a, which parses the citation records R1 to R2 into an attribute data table as shown in FIG. 2B. The citation records R1 to R2 usually contain attribute data such as the author name A1, the title A2, the publication source A3, and the publication time A4 (see FIG. 2B). In more detail, the citation parsing unit 220a is connected to a background knowledge database 290. The background knowledge database 290 stores a plurality of grammar templates.
When the citation records R1 to R2 are imported from the citation database 210 into the citation parsing unit 220a, the citation parsing unit 220a compares them with the grammar templates in the background knowledge database 290 and parses out the attribute data of each citation record (step S410). The citation parsing unit 220a then restores part of the attribute data, for example by expanding an abbreviated publication venue into its full wording, to avoid comparison errors caused by abbreviations (step S420). In addition, before the citation records R1 to R2 are imported into the citation parsing unit 220a, it can be determined in advance whether the citation records R1 to R2 refer to the same work, and the result can be marked in the citation database 210. Furthermore, when the citation records R1 to R2 match none of the grammar templates in the background knowledge database 290, a new grammar template can be created and stored in the background knowledge database 290.

FIG. 6A is a schematic diagram of the connections between the citation network filtering unit of FIG. 5 and some of its components. Referring to FIG. 6A, the citation network filtering unit 230 can include an attribute dependency calculator 234. The attribute dependency calculator 234 uses a token-based method or an edit-distance method to calculate the similarity of the attribute data of two citation records (step S430). Spelling errors in the citation records themselves may distort this similarity calculation.
Therefore, the attribute dependency calculator 234 can further calculate an attribute dependency similarity from the dependencies among the attribute data (step S440), in order to adjust unreasonable similarity values and improve the quality of the training data. For example, suppose that when two citation records are compared, the similarities of attribute data A1 to A3 all lie between 0.8 and 0.9, while the similarity of attribute data A4 lies between 0.3 and 0.4, far below the others. It is then very likely that attribute data A4 contains a spelling error. In this case, the similarity of attribute data A4 can be adjusted upward according to the attribute dependency similarity (for example, the average of the similarities of attribute data A1 to A4).

The citation network filtering unit 230 can further include a weight graph generator 232 and a weak link filter 236. The weight graph generator 232 builds a citation cluster graph of the relationship network according to the similarity between the attribute data (step S450) and calculates a link strength for each link in the citation cluster graph. The weak link filter 236 then checks whether the link strength of each link falls within a preset range and filters out the links whose strength is outside the preset range (step S460).

The citation network filtering unit 230 can further include a social network center finder 238 and a relation vector generator 239. The social network center finder 238 marks, in the citation cluster graph, the social network center of each group of citations that refer to the same document, and outputs the result to the relation vector generator 239 (step S470). According to the marking result of the social network center finder 238, the relation vector generator 239 generates relation vectors for the same group and for different groups and provides them to the citation resolution unit 240 as training data (step S480).
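Steps S430 and S440 can be illustrated with a short sketch. It rests on several assumptions: the ratio from Python's `difflib.SequenceMatcher` stands in for an edit-distance-based similarity, an attribute is treated as an outlier when it falls more than a hypothetical `gap` below the mean of the other attributes, and the outlier is raised to the mean of all attribute similarities, following the averaging example in the text.

```python
from difflib import SequenceMatcher


def attribute_similarities(rec_a, rec_b):
    """Per-attribute similarity in [0, 1] for two records given as dicts."""
    return {
        key: SequenceMatcher(
            None, str(rec_a[key]).lower(), str(rec_b[key]).lower()
        ).ratio()
        for key in rec_a
    }


def adjust_outliers(sims, gap=0.3):
    """Raise a similarity to the overall mean when it is far below the others."""
    if len(sims) < 2:
        return dict(sims)           # nothing to compare against
    overall_mean = sum(sims.values()) / len(sims)
    adjusted = {}
    for key, value in sims.items():
        others = [v for k, v in sims.items() if k != key]
        others_mean = sum(others) / len(others)
        # 'gap' is an illustrative threshold, not a value from the source.
        adjusted[key] = overall_mean if others_mean - value > gap else value
    return adjusted


# Similarities chosen inside the ranges given in the text's example.
sims = {"A1": 0.9, "A2": 0.85, "A3": 0.8, "A4": 0.35}
adjusted = adjust_outliers(sims)
```

With these inputs only A4 is changed: it is raised to the average of the four similarities, while A1 to A3 keep their original values.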
FIG. 6B is a schematic diagram of the connections between the citation resolution unit of FIG. 5 and some other components. Referring to FIG. 6B, the citation resolution unit 240 may include a binary classifier 242 and a prediction result generator 244. Using the training data provided by the citation network filtering unit 230, the binary classifier 242 trains a binary classification model of positive and negative correlation and supplies it to the prediction result generator 244. The binary classification model is, for example, a support vector machine (SVM) model. When a new citation record 50 is then input to the prediction result generator 244, the prediction result generator 244 outputs a resolution result using the binary classification model. The resolution result is either a positive correlation or a negative correlation. A positive correlation means that two citation records may refer to the same document; a negative correlation means the opposite. In the present embodiment, the attribute data A1 to A4 of the new citation record 50 may first be parsed by the citation parsing unit 220b, and the record is then input to the prediction result generator 244 via the citation relationship analysis unit 250. In an embodiment not shown, the citation parsing unit 220a and the citation parsing unit 220b may be the same citation parsing unit.

FIG. 8 is a flow chart of the expert search system of FIG. 5 in a test phase, and FIG. 6C is a schematic diagram of the connections between the citation relationship analysis unit of FIG. 5 and some other components. Referring to FIGS. 5, 6C and 8, new citation records 50 are downloaded from the Internet 40 to the citation parsing unit 220b, for example, at regular intervals. The citation relationship analysis unit 250 may include a group center vector generator 252.
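The positive/negative classification performed by the citation resolution unit 240 can be illustrated with a small linear SVM. The text names an SVM model only as an example and gives no training details, so everything below is an assumption: a hand-written linear SVM trained by subgradient descent on the hinge loss, hypothetical hyperparameters, and made-up relationship vectors whose four entries stand for the similarities of A1 to A4.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Train a linear SVM by subgradient descent on the regularized hinge loss.

    samples: feature vectors (here: per-attribute similarities of a record pair)
    labels:  +1 for positive correlation, -1 for negative correlation
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularization shrinks the weights a little on every step.
            w = [wi * (1 - lr * reg) for wi in w]
            if margin < 1:
                # Inside the margin or misclassified: move toward label y.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(model, x):
    """Return +1 (same document likely) or -1 (different documents)."""
    w, b = model
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical training data: pairs in the same group vs. in different groups.
train_x = [[0.90, 0.85, 0.80, 0.90], [0.80, 0.90, 0.95, 0.70],
           [0.85, 0.80, 0.90, 0.80], [0.20, 0.10, 0.30, 0.20],
           [0.30, 0.20, 0.10, 0.40], [0.10, 0.30, 0.20, 0.10]]
train_y = [1, 1, 1, -1, -1, -1]
model = train_linear_svm(train_x, train_y)

print(predict(model, [0.95, 0.90, 0.90, 0.95]))
print(predict(model, [0.10, 0.10, 0.10, 0.10]))
```

In practice an established SVM library would normally replace the hand-written trainer; the sketch only shows how relationship vectors become a two-class decision.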
After the citation parsing unit 220b parses the new citation record 50 into a plurality of individual attribute data, the group center vector generator 252 analyzes the community center nodes of the citation relationship network and calculates the degree of association between the new citation record 50 and each center (step S502). The citation resolution unit 240 then determines whether the new citation record 50 is associated with any group of the citation relationship network (step S504).

The citation relationship analysis unit 250 may further include a citation association builder 256 and a citation relationship network database 254. The citation relationship network database 254 stores the aforementioned citation cluster map. If the new citation record 50 is not associated with any group in the citation cluster map, the citation association builder 256 creates a new group (step S506). If the new citation record 50 is associated with a group in the citation cluster map, the new citation record 50 is marked as belonging to that group (step S508). The result of creating a new group or marking the group is then merged into the citation relationship network database 254 (step S510).

Afterwards, the characteristics of the social network are re-analyzed and corrected accordingly (step S512). Next, it is determined whether the new citation record 50 belongs to a plurality of groups at the same time (step S514). If it does, the degree of association between the record and each of those groups is calculated to decide the group to which the record belongs (step S516). For example, the number of links formed between the new citation record 50 and each of its groups is counted, and the group with the most links is taken as the most likely group.
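The test-phase flow for a new citation record (steps S502 to S516) can be sketched as a single assignment routine. Several details are assumptions, since the text does not fix them: cosine similarity stands in for the degree of association with a group center, `threshold` is a hypothetical cut-off, link counts are passed in ready-made, and the group and record identifiers are invented.

```python
def association(record_vec, center_vec):
    """Assumed degree of association: cosine similarity of the two vectors."""
    dot = sum(a * b for a, b in zip(record_vec, center_vec))
    norm_r = sum(a * a for a in record_vec) ** 0.5
    norm_c = sum(b * b for b in center_vec) ** 0.5
    return 0.0 if norm_r == 0 or norm_c == 0 else dot / (norm_r * norm_c)


def assign_group(record_id, record_vec, centers, members, link_counts,
                 threshold=0.8):
    """Assign a new record to an existing group or open a new group.

    centers:     {group id: center vector}, updated in place
    members:     {group id: [record ids]}, updated in place
    link_counts: {group id: links between the new record and that group}
    """
    # S502/S504: collect every group whose center is associated with the record.
    candidates = [g for g, c in centers.items()
                  if association(record_vec, c) >= threshold]
    if not candidates:
        # S506: no associated group, so create one centered on the record.
        group = f"new-{record_id}"
        centers[group] = list(record_vec)
        members[group] = [record_id]
        return group
    # S514/S516: several groups may match; prefer the one with the most links.
    group = max(candidates, key=lambda g: link_counts.get(g, 0))
    # S508/S510: mark the record and merge the result into the stored network.
    members[group].append(record_id)
    return group


# Hypothetical group centers and memberships.
centers = {"G1": [0.90, 0.80, 0.90], "G2": [0.85, 0.90, 0.80],
           "G3": [0.10, 0.90, 0.10]}
members = {"G1": ["R1"], "G2": ["R2"], "G3": ["R3"]}

first = assign_group("R4", [0.90, 0.85, 0.85], centers, members,
                     {"G1": 2, "G2": 5})
second = assign_group("R5", [0.10, 0.10, 0.90], centers, members, {})
print(first, second)
```

Record R4 is close to both G1 and G2, so the link counts break the tie; record R5 is close to no center and starts a group of its own.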
If the new citation record 50 does not belong to a plurality of groups at the same time, the marking result is output [...]. The experts most suitable in a certain field can thus be searched for by users [...] a network analysis method analyzes the Internet automatically, so there is no need to [...] those with ordinary knowledge in the art [...].

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 [...] the expert search system of an embodiment [...].
FIG. 2A is a schematic diagram of a collection of citation records.
FIG. 2B is a schematic diagram of the attribute data of the citation records of FIG. 2A.
FIG. 3 is a schematic diagram of the user interface of FIG. 1.
[...] the expert search method [...].
FIG. 5 is a schematic diagram of an expert search system according to another embodiment of the present invention.
FIG. 6A is a schematic diagram of the connections between the citation network filtering unit of FIG. 5 and some other components.
FIG. 6B is a schematic diagram of the connections between the citation resolution unit of FIG. 5 and some other components.
FIG. 6C is a schematic diagram of the connections between the citation relationship analysis unit of FIG. 5 and some other components.
FIG. 7 is a flow chart of the expert search system of FIG. 5 in a training phase.
FIG. 8 is a flow chart of the expert search system of FIG. 5 in a test phase.
[Main component symbol description]
40: Internet
50: New citation record
100: Expert search system
110: Citation database
120: Domain expert matching unit
122: Importance evaluator
124: Language model builder
126: Language model database
128: Keyword matcher
130: Matching result integrator
140: User interface
200: Expert search system
210: Citation database
220a, 220b: Citation parsing unit
230: Citation network filtering unit
232: Weight map generator
234: Attribute dependency calculator
236: Weak link filter
238: Social network center searcher
239: Relationship vector generator
240: Citation resolution unit
242: Binary classifier
244: Prediction result generator
250: Citation relationship analysis unit
252: Group center vector generator
254: Citation relationship network database
256: Citation association builder
258: Citation category labeler
260: Domain experts
270: Citation weight database
280: User interface
290: Background knowledge database
A1 to A4: Attribute data
R1 to R2: Citation records
S310 to S350, S410 to S480, S502 to S518: Steps

Claims (1)

X. Claims:

1. An expert search system, comprising:
a citation database storing a plurality of citation records, wherein each of the citation records contains a plurality of attribute data, and the attribute data include an author name; and
a domain expert matching unit connected to the citation database, comprising:
an importance evaluator for evaluating an importance degree of each of the citation records;
a language model builder for building a language model of the citations according to the attribute data;
a language model database storing the language model built by the language model builder;
a keyword matcher for searching the language model database according to a keyword, thereby generating a citation list related to the keyword; and
a matching result integrator for sorting the author names corresponding to the citation list according to the importance degree of each of the citation records, thereby outputting an expert list.

2. The expert search system of claim 1, further comprising:
a citation parsing unit, wherein the domain expert matching unit is connected to the citation database via the citation parsing unit, and the citation parsing unit extracts the attribute data from each of the citation records.

3. The expert search system of claim 1, further comprising:
a citation network filtering unit connected to the citation database, wherein the citation network filtering unit establishes a plurality of links between the citation records according to a degree of correlation of the attribute data, thereby generating a relationship network of the citation records, and the relationship network contains a plurality of groups.

4. The expert search system of claim 3, wherein the citation network filtering unit comprises:
an attribute dependency calculator for calculating, for each of the citations, the degree of correlation between the attribute data.

5. The expert search system of claim 3, wherein the citation network filtering unit comprises:
a weight map generator for building a citation cluster map of the relationship network and calculating a link strength of each of the links in the citation cluster map; and
a weak link filter for checking whether each link strength lies within a preset range and filtering out the links whose link strength lies outside the preset range.

6. The expert search system of claim [...], wherein the citation network filtering unit further comprises:
a social network center searcher for marking the social network centers in the citation cluster map; and
a relationship vector generator for generating relationship vectors of the same group and of different groups according to the marking result of the social network center searcher.

7. The expert search system of claim 3, further comprising:
a citation relationship analysis unit for analyzing the relationships of the citation records in the relationship network according to the attribute data, together with a social network analysis method.

8. The expert search system of claim 7, wherein the citation relationship analysis unit comprises:
a group center vector generator for receiving a new citation record and generating a relationship vector between the new citation record and the other groups in the relationship network;
a citation relationship network database storing the relationship network;
a citation association builder for adding the new citation record to the citation relationship network; and
a citation category labeler for marking the group to which each of the citation records in the relationship network belongs.

9. The expert search system of claim 3, further comprising:
a citation resolution unit for determining whether two citations are positively or negatively correlated in the relationship network.

10. The expert search system of claim 9, wherein the citation resolution unit comprises:
a binary classifier for building a binary classification model; and
a prediction result generator for determining, using the binary classification model, whether a link exists between two citation records in the relationship network.

11. The expert search system of claim 1, wherein the domain expert matching unit further comprises:
a keyword expander having a keyword list, wherein the keyword list contains a plurality of domain keywords corresponding to a plurality of domains, and the keyword expander expands the keyword into one of the domain keywords.

12. The expert search system of claim 1, wherein the attribute data further include at least one of the title, the publication source, and the publication time of the citations.

13. An expert search method, comprising:
providing a citation database, wherein the citation database stores a plurality of citation records, each of the citation records contains a plurality of attribute data, and the attribute data include an author name;
evaluating an importance degree of each of the citation records;
building a language model of the citation records according to the attribute data, thereby generating a language model database;
searching the language model database according to a keyword input by a user, thereby generating a citation list related to the keyword; and
sorting the author names corresponding to the citation list according to the importance degree of each of the citation records, thereby outputting an expert list to the user.

14. The expert search method of claim 13, further comprising, after providing the citation database:
extracting the attribute data from each of the citation records.

15. The expert search method of claim 13, further comprising, after extracting the attribute data:
calculating the similarity of the attribute data; and
establishing a plurality of links between the citation records according to the similarity of the attribute data, thereby generating a relationship network of the citation records, wherein the relationship network contains a plurality of groups.

16. The expert search method of claim 15, further comprising, after generating the relationship network:
calculating a link strength of each of the links;
checking whether the link strengths lie within a preset range; and
filtering out the links whose link strength lies outside the preset range.

17. The expert search method of claim 15, further comprising, after generating the relationship network:
building a citation cluster map of the relationship network.

18. The expert search method of claim 17, further comprising, after building the citation cluster map:
marking the social network center of each of the groups in the citation cluster map.

19. The expert search method of claim 15, further comprising, after generating the relationship network:
receiving a new citation record; and
determining, using a binary classification model, whether a link exists in the relationship network between the new citation record and the citation records.

20. The expert search method of claim 13, wherein the step of searching the language model database according to the keyword comprises:
providing a keyword list, wherein the keyword list contains a plurality of domain keywords corresponding to a plurality of domains;
expanding the keyword into one of the domain keywords; and
searching the language model database according to the expanded domain keyword.
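The method of claim 13 can be sketched end to end as follows. This is an illustration under assumptions, not the claimed implementation: the language model is taken to be a unigram model per citation record with Jelinek-Mercer smoothing, importance degrees are supplied as given numbers, an author's rank is the sum of the importance of that author's matched records, and all records, author labels, and parameters (`lam`, `top_k`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_language_models(records):
    """Build one unigram term-count model per citation record."""
    models = {}
    for rec_id, rec in records.items():
        text = " ".join([rec["title"], rec["venue"]])
        models[rec_id] = Counter(text.lower().split())
    return models


def query_log_likelihood(query, model, collection, lam=0.5):
    """log P(query | record), smoothed with the whole-collection model."""
    doc_len = sum(model.values())
    coll_len = sum(collection.values())
    score = 0.0
    for term in query.lower().split():
        p_doc = model[term] / doc_len if doc_len else 0.0
        p_coll = collection[term] / coll_len if coll_len else 0.0
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # the term occurs in no record at all
        score += math.log(p)
    return score


def find_experts(query, records, importance, top_k=2):
    """Return author names ordered by the importance of their matched records."""
    models = build_language_models(records)
    collection = Counter()
    for model in models.values():
        collection.update(model)
    terms = query.lower().split()
    # Citation list: records containing a query term, best-scoring first.
    scored = sorted(
        ((query_log_likelihood(query, m, collection), rid)
         for rid, m in models.items() if any(t in m for t in terms)),
        reverse=True)
    matched = [rid for _, rid in scored[:top_k]]
    # Expert list: sum the importance of each author's matched records.
    totals = defaultdict(float)
    for rid in matched:
        for author in records[rid]["authors"]:
            totals[author] += importance[rid]
    return sorted(totals, key=lambda a: (-totals[a], a))


# Hypothetical citation records and importance degrees.
records = {
    "R1": {"title": "Support vector machines for text classification",
           "venue": "Machine Learning Journal", "authors": ["Author A"]},
    "R2": {"title": "Kernel methods and support vector learning",
           "venue": "Neural Computation", "authors": ["Author B", "Author A"]},
    "R3": {"title": "Query optimization in relational databases",
           "venue": "Database Systems", "authors": ["Author C"]},
}
importance = {"R1": 10.0, "R2": 4.0, "R3": 7.0}
print(find_experts("support vector", records, importance))
```

Separating retrieval (which records match the keyword) from ranking (how important those records are) mirrors the split between the keyword matcher and the matching result integrator in claim 1.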
TW97101630A 2008-01-16 2008-01-16 Expert finding system and method TWI352913B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW97101630A TWI352913B (en) 2008-01-16 2008-01-16 Expert finding system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW97101630A TWI352913B (en) 2008-01-16 2008-01-16 Expert finding system and method

Publications (2)

Publication Number Publication Date
TW200933403A true TW200933403A (en) 2009-08-01
TWI352913B TWI352913B (en) 2011-11-21

Family

ID=44865897

Family Applications (1)

Application Number Title Priority Date Filing Date
TW97101630A TWI352913B (en) 2008-01-16 2008-01-16 Expert finding system and method

Country Status (1)

Country Link
TW (1) TWI352913B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI419071B (en) * 2010-03-05 2013-12-11 Ceci Engineering Consultants Inc Active knowledge management system, method and computer program product for problem solving

Also Published As

Publication number Publication date
TWI352913B (en) 2011-11-21

Similar Documents

Publication Publication Date Title
CN108804521B (en) Knowledge graph-based question-answering method and agricultural encyclopedia question-answering system
CN106991092B (en) Method and equipment for mining similar referee documents based on big data
WO2018196561A1 (en) Label information generating method and device for application and storage medium
CN103049575B (en) A kind of academic conference search system of topic adaptation
CN103310003A (en) Method and system for predicting click rate of new advertisement based on click log
CN103838857B (en) Automatic service combination system and method based on semantics
CN102902821A (en) Methods for labeling and searching advanced semantics of imagse based on network hot topics and device
CN109101551B (en) Question-answer knowledge base construction method and device
CN110910175B (en) Image generation method for travel ticket product
JP6308708B1 (en) Patent requirement conformity prediction device and patent requirement conformity prediction program
CN115563313A (en) Semantic retrieval system for literature and books based on knowledge graph
CN113900954B (en) Test case recommendation method and device using knowledge graph
CN118820389B (en) Keyword-based data association storage method and device
CN115630843A (en) Contract clause automatic checking method and system
CN107133274B (en) Distributed information retrieval set selection method based on graph knowledge base
CN118467595A (en) Search method, device, equipment, and medium for target domain based on large language model
CN113722421A (en) Contract auditing method and system and computer readable storage medium
CN102799627A (en) Data association method based on first-order logic and nerve network
CN114911999B (en) Name matching method and device
CN111209375A (en) Universal clause and document matching method
CN114564579B (en) An entity classification method and system based on massive knowledge graph and graph embedding
CN103678327B (en) Method and device for information association
CN108182181B (en) A hybrid similarity-based method for duplication detection of public contribution merge requests
CN111008285B (en) Author disambiguation method based on thesis key attribute network
TW200933403A (en) Expert finding system and method

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees