KR20010095721A

KR20010095721A - method for related terms searching

Info

Publication number: KR20010095721A
Application number: KR1020000019027A
Authority: KR
Inventors: 이은미
Original assignee: 이은미
Priority date: 2000-04-11
Filing date: 2000-04-11
Publication date: 2001-11-07
Anticipated expiration: 2020-04-11
Also published as: KR100372078B1

Abstract

PURPOSE: A related-term search method is provided that builds a thesaurus quickly and cheaply. It does so by collecting and analyzing the queries users write when using an information retrieval system and automatically recognizing related terms from them. CONSTITUTION: The query log is searched to determine whether a new search session exists (S10, S12). If a new search session exists, all terms appearing in the queries written during that session are extracted (S16). It is then determined whether two or more terms were extracted (S18). If so, the related-term information of the extracted terms is updated: one extracted term whose related-term information has not yet been updated is selected (S20), and the extracted terms are registered as related terms of the selected term (S22). Once the related-term information of all extracted terms has been updated (S24), the process returns to the first step (S10, S12).

Description

Method for related terms searching

The present invention relates to a related-term search method. More particularly, it relates to a method that automatically builds a thesaurus from a log of queries entered by users and then uses that thesaurus to retrieve semantically related terms.

In general, a thesaurus is a lexical tool that provides information on how terms are used and how they relate to one another. Term relationships fall into two classes:

- semantic relationships, such as broader term (BT), narrower term (NT), synonym (UF, Use For), near-synonym, related term (RT) and substitute term;
- statistical relationships, defined from how terms occur in documents, such as independent occurrence and dependent occurrence.

A thesaurus mainly uses these relationships to expand the meaning of the terms in a query during retrieval.

Conventionally, such a thesaurus is built by experts who classify terms according to their semantic relatedness. Because this work is done manually, it requires considerable time and cost.

Automated methods have therefore been developed to build thesauri with less time and cost. The automatic methods developed so far are:

- co-occurrence classification, which defines the relatedness of two terms as the probability that they occur together;
- document classification, which first groups documents and then defines the terms that appear mainly within one group as related;
- grammatical classification, which identifies relationships between terms using linguistic knowledge together with co-occurrence in documents.

However, these methods build the thesaurus mainly from statistical relationships between terms and do not consider semantic relationships. As a result, users may find the automatically generated relationships between retrieved related terms unconvincing.

The present invention has been devised to solve the above problems. Its object is to provide a related-term search method that works as follows:

- it collects and analyzes the queries users write when using an information retrieval system;
- it automatically identifies semantically related terms from those queries and builds a thesaurus from them.

This allows a thesaurus that reflects semantic relatedness to be created at low time and cost, and allows semantically related terms to be identified.

Fig. 1 shows the structure of the query log applied to the present invention.

Fig. 2 shows the thesaurus applied to the present invention.

Figs. 3 and 4 are flowcharts illustrating the related-term search method according to the present invention.

*** Explanation of reference numerals for the main parts of the drawings ***

10: query log; 13: identifier section

15: time section; 17: query section

20: thesaurus; 25: related-term list file

25a: related-term section; 25b: weight section

To achieve the above object, the related-term search method of the present invention comprises two processes:

- a thesaurus generation process, which extracts the terms appearing in the queries written during a new search session, selects one of the extracted terms, and registers the extracted terms as related terms of the selected term;
- a related-term search process, which extracts the terms appearing in a query written for a related-term search, registers the related terms belonging to the extracted terms as related terms of the query, and sorts and outputs the related terms of the query.

The thesaurus generation process comprises the following steps:

- searching the query log and, if a new search session exists, extracting all terms appearing in the queries written during that search session;
- if at least two terms were extracted, selecting one extracted term whose related-term information has not yet been updated;
- storing the extracted terms in the related-term list file of the selected term, thereby registering them as related terms of the selected term.

The process further comprises:

- determining whether the related-term information of all extracted terms has been updated;
- if it has, searching the query log to determine whether a new search session exists;
- if it has not, updating the related-term information of the remaining extracted terms.

The related-term search process comprises the following steps:

- extracting all terms appearing in the queries written to search for related terms;
- selecting one extracted term whose related-term list file has not yet been fetched;
- fetching the related-term list file of the selected term and registering the related terms stored in it as related terms of the query;
- once the related terms of all extracted terms have been registered as related terms of the query, sorting the related terms of the query by degree of relatedness and outputting them.

If the related terms of some extracted terms have not yet been registered as related terms of the query, the process further comprises registering the related terms of the remaining extracted terms as related terms of the query.

Hereinafter, a related-term search method according to a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

Fig. 1 shows the structure of the query log applied to the present invention. The query log 10 comprises an identifier section 13, a time section 15 and a query section 17.

In this structure, each section stores one item:

- the identifier section 13 stores an identification code (for example, an ID) that identifies the person who wrote the query;
- the time section 15 stores the time at which the query was written;
- the query section 17 stores the query entered by the user.
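The three-field record of Fig. 1 can be modeled as a small record type. This is a minimal sketch under stated assumptions, not the patent's storage format. The class name `QueryLogEntry`, the function `parse_log_line`, and the tab-separated, ISO-8601 line layout are all hypothetical; the patent only names the identifier, time and query sections.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical record type mirroring Fig. 1.
@dataclass(frozen=True)
class QueryLogEntry:
    user_id: str          # identifier section 13: who wrote the query
    timestamp: datetime   # time section 15: when the query was written
    query: str            # query section 17: the query text as entered


def parse_log_line(line: str) -> QueryLogEntry:
    """Parse one log line of the form: user_id <TAB> ISO time <TAB> query.

    The tab-separated, ISO-8601 layout is an illustrative assumption;
    the patent does not specify how the query log is stored.
    """
    user_id, time_text, query = line.rstrip("\n").split("\t", 2)
    return QueryLogEntry(user_id, datetime.fromisoformat(time_text), query)
```

For example, `parse_log_line("u42\t2000-04-11T09:15:00\tjava thread")` yields an entry for user `u42` with the query `java thread`.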

Fig. 2 shows the thesaurus applied to the present invention. The thesaurus 20 contains a plurality of terms 23, and each term 23 has its own related-term list file 25.

The related-term list file 25 consists of ordered pairs, each made of a related-term section 25a and a weight section 25b. A related term is a term that has appeared at least once in the same search session as the entry term. Each related term carries a weight indicating how strongly it is related to the entry term.
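The structure of Fig. 2 can be held in memory as a nested mapping. This is a minimal sketch, assuming integer weights and assuming each related-term list file 25 is represented as a dictionary rather than an actual file. The alias `Thesaurus` and the function `related_pairs` are hypothetical names.

```python
# Hypothetical in-memory model of Fig. 2: the thesaurus 20 maps each term 23
# to its related-term list file 25. Each list file is represented as a dict
# from related term (section 25a) to weight (section 25b), not a real file.
Thesaurus = dict[str, dict[str, int]]


def related_pairs(thesaurus: Thesaurus, term: str) -> list[tuple[str, int]]:
    """Return the (related term, weight) ordered pairs stored for one entry term."""
    return list(thesaurus.get(term, {}).items())
```

A term with no entry simply has an empty list, which keeps lookups during search simple.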

Figs. 3 and 4 are flowcharts illustrating the related-term search method according to the present invention. Fig. 3 shows the thesaurus generation process, and Fig. 4 shows the related-term search process.

The related-term search method according to the present invention comprises two processes:

- a thesaurus generation process, which extracts the terms appearing in the queries written during a new search session recorded in the query log, selects one of the extracted terms, and registers the extracted terms as related terms of the selected term;
- a related-term search process, which extracts the terms appearing in a query written for a related-term search, registers the related terms belonging to those terms as related terms of the query, and sorts and outputs the related terms of the query.

First, as shown in Fig. 3, the thesaurus generation process searches the query log 10 to determine whether there is a new search session (S10, S12). A search session, as used in step S12, is the search activity that a user performs with an information retrieval system over a certain period of time. Each session contains at least one round of writing a query and checking the search results.

If the results of a query do not satisfy the user, the user writes another query and searches again. What a user looks for during one session can therefore be regarded as information on a single topic, and the one or more queries written during that session can be regarded as related to that topic.
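The patent says only that a session spans "a certain period of time". The sketch below assumes one common way to cut sessions: a user's queries stay in the same session while consecutive queries are no more than an idle timeout apart. The function name `split_sessions`, the `(user_id, timestamp, query)` tuple layout, and the 30-minute default are hypothetical.

```python
from datetime import datetime, timedelta


def split_sessions(
    entries: list[tuple[str, datetime, str]],
    timeout: timedelta = timedelta(minutes=30),
) -> list[list[str]]:
    """Group (user_id, timestamp, query) log entries into search sessions.

    Assumption: a new session starts when the user changes or when more than
    `timeout` has passed since that user's previous query. The idle-gap rule
    and the 30-minute default are illustrative, not taken from the patent.
    """
    sessions: list[list[str]] = []
    prev_user, prev_ts = None, None
    # Sort by user, then time, so each user's queries are processed in order.
    for user_id, ts, query in sorted(entries, key=lambda e: (e[0], e[1])):
        if user_id != prev_user or ts - prev_ts > timeout:
            sessions.append([])          # start a new search session
        sessions[-1].append(query)       # add the query to the current session
        prev_user, prev_ts = user_id, ts
    return sessions
```

An idle gap is a simple proxy for "one topic per session". A deployed system could tune the timeout or use other signals.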

If step S12 finds no new search session, thesaurus generation ends (S14). If there is a new search session, all terms appearing in the queries written during that session are extracted (S16).
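The patent does not say how terms are extracted from a query in step S16. This is a minimal sketch, assuming terms are lower-cased, whitespace-separated tokens, with duplicates removed so that each term is counted once per session. A real system, especially for Korean, would more likely use a morphological analyzer. The function name `extract_terms` is hypothetical.

```python
def extract_terms(session_queries: list[str]) -> list[str]:
    """S16: collect every distinct term in a session's queries, in first-seen order.

    Assumption: a term is a lower-cased whitespace-separated token; the patent
    does not specify tokenization or stop-word handling.
    """
    seen: dict[str, None] = {}  # dict keys keep insertion order and drop duplicates
    for query in session_queries:
        for token in query.lower().split():
            seen.setdefault(token, None)
    return list(seen)
```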

Next, it is determined whether two or more terms were extracted in step S16 (S18). If fewer than two terms were extracted, the process returns to step S10 and searches the query log 10 for another new search session. If two or more terms were extracted, the related-term information for the extracted terms is updated in the thesaurus 20. First, one extracted term whose related-term information has not yet been updated is selected (S20).

The terms extracted in step S16 are then registered as related terms of the term selected in step S20. This is done by updating, in the related-term list file 25 of the selected term, the information on the terms that appeared together in the search session (S22). That is, a new related term is registered or an existing weight is changed.
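Step S22 either registers a new related term or changes a weight, but the patent does not give the weighting rule. This is a minimal sketch, assuming the weight is a co-occurrence count incremented by 1 for each session in which the two terms appear together. It also assumes a term is not listed as related to itself. The function name `register_related` is hypothetical.

```python
def register_related(
    thesaurus: dict[str, dict[str, int]], selected: str, extracted: list[str]
) -> None:
    """S22: record the session's other terms in `selected`'s related-term list.

    Assumptions: weights count co-occurring sessions (+1 each time), and the
    selected term is not registered as its own related term.
    """
    related = thesaurus.setdefault(selected, {})  # create the list file if missing
    for term in extracted:
        if term != selected:
            # A new related term starts at weight 1; an existing one is incremented.
            related[term] = related.get(term, 0) + 1
```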

It is then determined whether the related-term information of every term extracted in step S16 has been updated (S24). If it has, the process returns to step S10 and searches the query log 10 for a new search session.

If step S24 finds that not all terms have been updated, the process returns to step S20. Another extracted term whose related-term information has not yet been updated is selected, and its related-term list file 25 is updated with the terms that appeared together in the search session. This is repeated until every extracted term has been updated.
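The whole Fig. 3 loop (S10 to S24) can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation. It assumes that:

- sessions have already been cut from the query log and are given as lists of query strings;
- terms are lower-cased whitespace tokens;
- weights count the sessions in which two terms co-occur.

The function name `build_thesaurus` is hypothetical.

```python
def build_thesaurus(sessions: list[list[str]]) -> dict[str, dict[str, int]]:
    """Sketch of the Fig. 3 thesaurus generation loop over pre-split sessions."""
    thesaurus: dict[str, dict[str, int]] = {}
    for queries in sessions:                          # S10/S12: next new session
        # S16: distinct terms of the session, in first-seen order (assumed tokenizer)
        terms = list(dict.fromkeys(tok for q in queries for tok in q.lower().split()))
        if len(terms) < 2:                            # S18: fewer than two terms
            continue                                  # back to S10
        for selected in terms:                        # S20: next not-yet-updated term
            related = thesaurus.setdefault(selected, {})
            for other in terms:                       # S22: register co-occurring terms
                if other != selected:
                    related[other] = related.get(other, 0) + 1
        # S24: the loop above ends only after every extracted term is updated
    return thesaurus                                  # S14: no new session remains
```

Single-term sessions add nothing, which matches the S18 branch back to S10.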

As shown in Fig. 4, the related-term search process first extracts all terms appearing in the queries written to search for related terms (S30). It then selects one extracted term whose related-term list file 25 has not yet been fetched from the thesaurus 20 (S32). The related-term list file 25 of the selected term is read from the thesaurus 20 (S34), and the related terms stored in it are registered as related terms of the query (S36).
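Steps S30 to S38 gather the related-term lists of every query term into one set of related terms for the query. The patent does not say how to combine a related term that is reached from several query terms. This is a minimal sketch that makes three assumptions:

- query terms are lower-cased whitespace tokens;
- weights reached from several query terms are summed;
- the query's own terms are left out of the result.

The function name `collect_related` is hypothetical.

```python
def collect_related(thesaurus: dict[str, dict[str, int]], query: str) -> dict[str, int]:
    """Sketch of S30-S38: merge the related-term lists of every query term."""
    terms = list(dict.fromkeys(query.lower().split()))           # S30: extract terms
    merged: dict[str, int] = {}
    for term in terms:                                            # S32/S38: each term once
        for related, weight in thesaurus.get(term, {}).items():  # S34: read its list file
            if related not in terms:                              # skip the query's own terms
                merged[related] = merged.get(related, 0) + weight  # S36: register, summing
    return merged
```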

Next, it is determined whether the related terms of every term extracted in step S30 have been registered as related terms of the query (S38). If not, the process returns to step S32 and repeats the following steps until every extracted term has been handled:

- select another extracted term whose related-term list file 25 has not yet been fetched;
- read that term's related-term list file 25 from the thesaurus 20;
- register the related terms stored in it as related terms of the query.

If step S38 determines that the related terms of all extracted terms have been registered as related terms of the query, the related terms of the query are sorted by degree of relatedness (S40) and output in the requested manner (S42).
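For steps S40 and S42, the patent says only that related terms are sorted "by degree of relatedness" and output "in the requested manner". This is a minimal sketch with three assumptions:

- the degree of relatedness is the merged weight, highest first;
- ties are broken alphabetically so the order is deterministic;
- an optional hypothetical `top_k` parameter stands in for the requested output manner.

The function name `rank_related` is hypothetical.

```python
from typing import Optional


def rank_related(merged: dict[str, int], top_k: Optional[int] = None) -> list[tuple[str, int]]:
    """S40: sort related terms by weight (descending), then alphabetically.

    S42 is approximated by optionally truncating the list to `top_k` entries.
    """
    ranked = sorted(merged.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top_k is None else ranked[:top_k]
```

Sorting on `(-weight, term)` keeps the output stable across runs, which helps when results are shown to users or compared in tests.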

The thesaurus generation and related-term search methods of the present invention are not limited to the embodiments described above. They may be modified in various ways within the scope of the technical idea of the invention.

As described above, the related-term search method of the present invention collects and analyzes the queries written when using an information retrieval system, automatically identifies semantically related terms, and builds a thesaurus from them. A thesaurus that reflects semantic relatedness can therefore be created with little time and cost.

In addition, by retrieving related terms with the thesaurus generated in this way, the terms that are semantically related to each term can be identified.

Claims

A related-term search method comprising: a thesaurus generation process of extracting the terms appearing in the queries written during a new search session, selecting one of the extracted terms, and registering the extracted terms as related terms of the selected term; and

a related-term search process of extracting the terms appearing in a query written for a related-term search, registering the related terms belonging to the extracted terms as related terms of the query, and sorting and outputting the related terms of the query.

The method of claim 1, wherein the thesaurus generation process comprises:

searching the query log and, when there is a new search session, extracting all terms appearing in the queries written during that search session;

when two or more terms are extracted, selecting one extracted term whose related-term information has not yet been updated; and

storing the extracted terms in the related-term list file of the selected term, thereby registering the extracted terms as related terms of the selected term.

The method of claim 2, further comprising: determining whether the related-term information of all the extracted terms has been updated;

when the determination shows that the related-term information of all the extracted terms has been updated, searching the query log to determine whether a new search session exists; and

when the related-term information of all the extracted terms has not been updated, updating the related-term information of the remaining extracted terms.

The method of claim 2, wherein the query log comprises:

an identifier section storing an identification code that identifies the person who wrote the query;

a time section storing the time at which the query was written; and

a query section storing the received query.

The method of claim 1, wherein the related-term search process comprises:

extracting all terms appearing in the queries written to search for related terms;

selecting one extracted term whose related-term list file has not yet been fetched;

fetching the related-term list file of the selected term and registering the related terms stored in it as related terms of the query; and

when the related terms of all the extracted terms have been registered as related terms of the query, sorting the related terms of the query by degree of relatedness and outputting them.

The method of claim 5, further comprising, when the related terms of all the extracted terms have not yet been registered as related terms of the query,

registering the related terms of the remaining extracted terms as related terms of the query.

The method according to claim 2 or 5, wherein the related-term list file

consists of ordered pairs, each consisting of a related term of the entry term and the weight of that related term.

The method of claim 1, wherein the search session

is search activity performed over a predetermined period of time, each search session including at least one query-writing procedure and at least one search-result-checking procedure.