KR102716693B1

KR102716693B1 - System and method for supporting text conversion services using customized regular expressions for each user

Info

Publication number: KR102716693B1
Application number: KR1020240058710A
Authority: KR
Inventors: 황지욱
Original assignee: 주식회사 퍼즐에이아이
Priority date: 2024-03-13
Filing date: 2024-05-02
Publication date: 2024-10-15
Anticipated expiration: 2044-05-02

Abstract

A system supporting a text conversion service using user-specific customized regular expressions according to one embodiment of the present invention includes: an input unit for receiving a user's spoken voice; a voice recognition model unit that converts the spoken voice into a primary text and converts the primary text into a secondary text by applying preset regular expressions to be used for each user; and a regular expression setting unit that sets and provides user-specific regular expressions.

Description

System and method for supporting text conversion services using customized regular expressions for each user

The present invention relates to a system and method for receiving and processing natural language spoken by a person, and more specifically, to a system and method for supporting a text conversion service using regular expressions customized for each user.

In general, a sentence spoken by a person can be converted into text through speech recognition. Automatic translation systems that use speech recognition, and dialogue systems based on natural language understanding, receive a sentence spoken by a user as input and perform preset language processing on it.

However, the word order of sentences input into language processing systems such as automatic translation systems and dialogue systems is often changed, for example with a noun or adverb placed at the end of the sentence. This happens because of the user's free-form speech or the use of inversion to emphasize meaning.

When a sentence whose constituents appear in an order different from the usual sentence structure is input, the quality of the language processing system may deteriorate.

This problem can arise in both rule-based and statistics-based language processing systems, each of which handles the usual sentence order (i.e., the order used by the majority of sentences) well.

Therefore, to process inverted sentences, vocatives with missing punctuation, and the like, conventional language processing systems checked whether a character string matched partially or completely and then corrected it into a pre-specified sentence form. Such correction had two problems. It required building a large database to cope with every possible input. It also made language processing impossible whenever the actual input sentence did not match the constructed database.

In this regard, Korean Registered Patent No. 1497411 (title: Style conversion device, style conversion method, storage medium, and automatic dialogue service system and method) aims to increase user familiarity by presenting sentences in various styles. It discloses a style conversion device including:
- a communication unit that receives a target sentence for conversion;
- a phrase conversion DB that stores specific phrases and the style-converted phrases corresponding to them;
- a phrase comparison unit that separates the last phrase from the target sentence and selects a style-converted phrase by comparing it with the contents of the phrase conversion DB; and
- a sentence generation unit that forms a result sentence by replacing the phrase in the target sentence with the phrase selected by the phrase comparison unit.

Korean Laid-Open Patent Publication No. 10-2021-0047709

An object of the present invention is to provide a system and method that support a text conversion service using user-specific customized regular expressions and that solve the conventional problems described above.

To solve the above problem, a system supporting a text conversion service using user-specific customized regular expressions according to one embodiment of the present invention comprises:
- an input unit for receiving a user's spoken voice;
- a voice recognition-conversion model unit that converts the spoken voice into a primary text and replaces the primary text with a secondary text, which is a normalized sentence, by applying preset regular expressions to be used for each user; and
- a regular expression setting unit that sets and provides predefined regular expressions.

The voice recognition-conversion model unit comprises:
- a voice-to-text conversion unit that converts the spoken voice into text through an STT program;
- a filtering unit that determines whether strings in the converted primary text include strings of the predefined regular expressions; and
- a regular expression substitution unit that replaces, based on a preset regular expression pattern, each string that the filtering unit has confirmed to include a string of a predefined regular expression.

delete

To solve the above problem, a method for converting text using user-specific customized regular expressions according to one embodiment of the present invention includes:
- receiving a user's spoken voice at an input unit; and
- in a voice recognition-conversion model unit, converting the spoken voice into a primary text and replacing the primary text with a secondary text, which is a normalized sentence, by applying preset regular expressions to be used for each user.

The voice recognition-conversion model unit includes:
- a voice-to-text conversion unit that converts the spoken voice into text through an STT program;
- a filtering unit that determines whether strings in the converted primary text include strings of the predefined regular expressions; and
- a regular expression substitution unit that replaces, based on a preset regular expression pattern, each string that the filtering unit has confirmed to include a string of a predefined regular expression.

delete

The text conversion system and method using user-specific customized regular expressions according to one embodiment of the present invention can correctly reconstruct sentences in a variety of forms, such as careless utterances, inverted sentences, typos, extraneous interjections, or meaningless input. This prevents incorrect analysis during natural language processing.

FIG. 1 is a device configuration diagram of a text conversion system using user-specific customized regular expressions according to one embodiment of the present invention.
FIG. 2 is a detailed configuration diagram of the voice recognition-conversion model unit shown in FIG. 1.
FIGS. 3 and 4 show examples of regular expressions.
FIG. 5 shows an example of the algorithm of the regular expression substitution unit.
FIG. 6 is a flowchart illustrating a method for supporting a text conversion service using user-specific customized regular expressions according to one embodiment of the present invention.
FIG. 7 is a detailed flowchart of step S720 of FIG. 6.

Hereinafter, embodiments of the present specification will be described with reference to the accompanying drawings. However, this does not limit the technology described in the present specification to specific embodiments, but should be understood to include various modifications, equivalents, and/or alternatives of the embodiments of the present specification. In connection with the description of the drawings, similar reference numerals may be used for similar components. In the present specification, expressions such as "has," "may have," "includes," or "may include" indicate the presence of a corresponding feature (e.g., a component such as a number, function, operation, or part), and do not exclude the presence of additional features.

As used herein, the expressions “A or B,” “at least one of A and/or B,” and “one or more of A and/or B” can include all possible combinations of the listed items. For example, “A or B,” “at least one of A and B,” or “at least one of A or B” can refer to any of the following cases: (1) including at least one A, (2) including at least one B, or (3) including both at least one A and at least one B.

As used herein, expressions such as “first,” “second,” “1st,” and “2nd” can modify various components regardless of order and/or importance. They are used only to distinguish one component from another and do not limit the components. For example, a first user device and a second user device can represent different user devices regardless of order or importance. Likewise, without departing from the scope of the rights described in this specification, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component.

When it is stated that a component (e.g., a first component) is "(operatively or communicatively) coupled with/to" or "connected to" another component (e.g., a second component), it should be understood that the component can be directly coupled to the other component, or can be connected via another component (e.g., a third component). On the other hand, when it is stated that a component (e.g., a first component) is "directly coupled to" or "directly connected to" another component (e.g., a second component), it should be understood that no other component (e.g., a third component) exists between the component and the other component.

The expression "configured to" as used herein can be used interchangeably with, for example, "suitable for," "having the capacity to," "designed to," "adapted to," "made to," or "capable of," depending on the context. The term "configured to" does not necessarily mean only "specifically designed to" in terms of hardware. Instead, in some contexts, the expression "a device configured to" can mean that the device is "capable of" doing something together with other devices or components.

For example, the phrase "a processor configured (or set) to perform A, B, and C" can mean a dedicated processor (e.g., an embedded processor) for performing those operations, or a generic-purpose processor (e.g., a CPU or application processor) that can perform those operations by executing one or more software programs stored in a memory device.

The terms used in this specification are only used to describe specific embodiments and may not be intended to limit the scope of other embodiments. The singular expression may include the plural expression unless the context clearly indicates otherwise. The terms used herein, including technical or scientific terms, may have the same meaning as commonly understood by a person of ordinary skill in the art described in this specification. Among the terms used in this specification, terms defined in general dictionaries may be interpreted as having the same or similar meaning as in the context of the related art. They shall not be interpreted in an idealized or excessively formal sense unless explicitly defined in this specification. In some cases, even a term defined in this specification cannot be interpreted to exclude the embodiments of this specification.

Before describing the present invention, the regular expressions referred to herein are briefly explained.

By its dictionary definition, a regular expression is a formal language used to express a set of strings that follow specific rules. It is mainly used to search for and replace strings in programming languages, text editors, and the like.
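The following is a minimal Python sketch of search-and-replace using the standard `re` module. The pattern `\d+` and the sample sentence are illustrative choices, not taken from the patent. The pattern denotes the set of all digit runs, and the same compiled object is used first to search and then to replace.

```python
import re

# A pattern describing a *set* of strings: any run of one or more digits.
digits = re.compile(r"\d+")

text = "Order 42 shipped on day 7"

# Searching: collect every substring that belongs to the set.
found = digits.findall(text)

# Replacing: substitute every match with a fixed token.
masked = digits.sub("#", text)

print(found)   # ['42', '7']
print(masked)  # Order # shipped on day #
```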

Regular expressions are thus a widely used way of expressing a set of strings that follow specific rules. Depending on their complexity, they can take forms such as:
- a single string;
- multiple strings connected by wildcard characters (meaning any number of characters);
- negation, which expresses that something is not included; and
- range checks.

In the past, an attack signature could be expressed with a single string, but signatures that combine multiple strings are now widely used. With regular expressions, a small number of expressions can find a wide variety of strings. One example is searching packets transmitted at high speed for strings that match a specific pattern.
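The forms listed above can be sketched with Python's `re` module as follows: a single string, strings joined by a wildcard, negation, and a range check. The sample strings and patterns are hypothetical and only illustrate each form. In this syntax, a wildcard for "any number of characters" is written `.*`.

```python
import re

samples = ["error: disk full", "error on disk 3", "disk error", "id=A7", "id=Z7"]

# 1) A single literal string.
single = re.compile(r"error")
# 2) Two strings joined by a wildcard: ".*" stands for any number of characters.
wildcard = re.compile(r"error.*disk")
# 3) Negation: "[^0-9]" matches any character that is NOT a digit.
negated = re.compile(r"disk [^0-9]+$")
# 4) Range check: one letter in A-F followed by one digit in 0-9.
ranged = re.compile(r"id=[A-F][0-9]")


def matching(pattern: re.Pattern, items: list[str]) -> list[str]:
    """Return the items in which the pattern occurs anywhere."""
    return [s for s in items if pattern.search(s)]


for name, pat in [("single", single), ("wildcard", wildcard),
                  ("negated", negated), ("ranged", ranged)]:
    print(name, matching(pat, samples))
```

The wildcard pattern rejects "disk error" because "disk" must come after "error". The negated pattern accepts "disk full" but rejects "disk 3".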

Hereinafter, a system and method for supporting a text conversion service using user-specific customized regular expressions according to one embodiment of the present invention will be described in more detail with reference to the attached drawings.

FIG. 1 is a device configuration diagram of a text conversion system using user-specific customized regular expressions according to one embodiment of the present invention. FIG. 2 is a detailed configuration diagram of the voice recognition-conversion model unit shown in FIG. 1. FIGS. 3 and 4 show examples of regular expressions. FIG. 5 shows an example of the algorithm of the regular expression substitution unit.

Referring to FIGS. 1 to 5, a text conversion system (100) using user-specific customized regular expressions according to one embodiment of the present invention includes an input unit (110) and a voice recognition-conversion model unit (120).

The input unit (110) may be configured to receive a user's spoken voice, convert it into a voice file, and provide the voice file.

The voice recognition-conversion model unit (120) may be configured to work in two stages. First, it converts the voice file provided by the input unit (110) into a primary text using an STT (Speech to Text) program and outputs that text. It then replaces the primary text with a secondary text, which is a normalized sentence, by applying preset regular expressions to be used for each user, and outputs the secondary text.
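The two-stage flow of the model unit (120) might be sketched as below: speech to primary text, then per-user regular expressions to secondary text. This sketch rests on stated assumptions. `fake_stt` is a placeholder, because the patent names no particular STT program. `RegexSettingUnit`, `set_rule`, `rules_for`, and `recognize_and_convert` are hypothetical names, loosely modeled on the regular expression setting unit described in the claims.

```python
from __future__ import annotations

import re
from collections import defaultdict
from typing import Callable


class RegexSettingUnit:
    """Hypothetical store that sets and provides regex rules per user."""

    def __init__(self) -> None:
        # user id -> ordered list of (compiled pattern, replacement)
        self._rules: dict[str, list[tuple[re.Pattern, str]]] = defaultdict(list)

    def set_rule(self, user_id: str, pattern: str, replacement: str) -> None:
        self._rules[user_id].append((re.compile(pattern), replacement))

    def rules_for(self, user_id: str) -> list[tuple[re.Pattern, str]]:
        # .get() avoids creating an empty entry for unknown users
        return list(self._rules.get(user_id, []))


def fake_stt(audio: bytes) -> str:
    """Placeholder for an STT program; always returns the same transcript."""
    return "I'm gonna call you"


def recognize_and_convert(audio: bytes, user_id: str, settings: RegexSettingUnit,
                          stt: Callable[[bytes], str] = fake_stt) -> tuple[str, str]:
    primary = stt(audio)                                   # stage 1: speech -> primary text
    secondary = primary
    for pattern, replacement in settings.rules_for(user_id):
        secondary = pattern.sub(replacement, secondary)    # stage 2: per-user normalization
    return primary, secondary


settings = RegexSettingUnit()
settings.set_rule("alice", r"\bgonna\b", "going to")
print(recognize_and_convert(b"", "alice", settings))  # ("I'm gonna call you", "I'm going to call you")
print(recognize_and_convert(b"", "bob", settings))    # ("I'm gonna call you", "I'm gonna call you")
```

The rules are keyed by user ID, so the same transcript can be normalized differently for different users, which is the per-user customization the description emphasizes. A user with no rules simply gets the primary text back.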

More specifically, the voice recognition-conversion model unit (120) includes a voice-to-text conversion unit (121), a filtering unit (122), and a regular expression substitution unit (123).

The voice-to-text conversion unit (121) may be configured to convert the voice file into text through an STT (Speech to Text) program.

The filtering unit (122) may be configured to determine (filter) whether the strings in the primary text produced by the voice-to-text conversion unit (121) include strings of the predefined regular expressions.
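Below is a minimal sketch of the filtering step, assuming the predefined regular expressions are held as a name-to-pattern mapping. Both patterns are hypothetical examples, since the patent does not reproduce its actual patterns in the text:
- a common misspelling, "됬" for "됐"; and
- a leading filler such as "음".

```python
import re

# Hypothetical predefined patterns (name -> compiled regex); illustrative only.
PREDEFINED_PATTERNS = {
    "typo_dwaet": re.compile("됬"),                    # frequent misspelling of "됐"
    "leading_filler": re.compile(r"^(?:음+|어+)\s+"),  # filler such as "음" at sentence start
}


def filter_primary_text(primary: str) -> list[str]:
    """Return the names of predefined patterns found in the primary text."""
    return [name for name, pattern in PREDEFINED_PATTERNS.items()
            if pattern.search(primary)]


print(filter_primary_text("음 숙제 다 됬어요"))  # ['typo_dwaet', 'leading_filler']
print(filter_primary_text("숙제 다 됐어요"))     # []
```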

The regular expression substitution unit (123) may be configured to replace, based on a preset regular expression pattern, each string that the filtering unit (122) has confirmed to include a string of a predefined regular expression.

In other words, the regular expression substitution unit (123) may be configured to replace an input sentence with a normalized sentence by applying preset regular expression patterns. Such input sentences may contain typical typos, words that are hard to disambiguate during morphological analysis, and the like.
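The substitution step can then apply only the rules the filter reported. In this hypothetical sketch, each rule name maps to a compiled pattern and its replacement text. The rule table and function names are illustrative, not the patent's.

```python
import re

# Hypothetical rule table: name -> (compiled pattern, replacement text).
RULES = {
    "typo_dwaet": (re.compile("됬"), "됐"),
    "leading_filler": (re.compile(r"^(?:음+|어+)\s+"), ""),
}


def substitute(primary: str, matched_rules: list[str]) -> str:
    """Apply, in order, only the rules that the filtering step reported."""
    secondary = primary
    for name in matched_rules:
        pattern, replacement = RULES[name]
        secondary = pattern.sub(replacement, secondary)
    return secondary


primary = "음 숙제 다 됬어요"
matched = [name for name, (pattern, _) in RULES.items() if pattern.search(primary)]
print(substitute(primary, matched))  # 숙제 다 됐어요
```

`Pattern.sub` leaves text unchanged when nothing matches, so running the filter first is not needed for correctness. In a sketch like this, the filter mainly serves to record which rules fired, or to skip work when none did.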

For example, the sentence “전 사과가 좋아요” (“I like apples”) contains the word “전”, a contraction of “저+는” (“I” + topic marker), which is difficult to disambiguate during morphological analysis. The word “전” can also be analyzed as the determiner “전(前)”, meaning “former/previous”, or as “전(全)”, meaning “whole/all”, and is therefore ambiguous. The regular expression substitution unit (123) detects such ambiguous sentence elements and replaces them based on regular expressions.
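A single regular expression can encode the "전" → "저는" expansion from this example. The sketch below is a hypothetical rule, not the patent's pattern. It rewrites "전" only at the start of a sentence and only when followed by whitespace. The lookahead `(?=\s)` ensures that words such as "전쟁" are left alone.

```python
import re

# Hypothetical rule: sentence-initial "전" followed by whitespace is treated as the
# contraction of "저+는" and expanded. The lookahead (?=\s) checks for the space
# without consuming it, so the space is kept in the output.
CONTRACTED_JEON = re.compile(r"^전(?=\s)")


def expand_jeon(sentence: str) -> str:
    """Expand a sentence-initial contracted "전" to "저는"."""
    return CONTRACTED_JEON.sub("저는", sentence)


print(expand_jeon("전 사과가 좋아요"))  # 저는 사과가 좋아요
print(expand_jeon("전쟁은 끝났다"))     # 전쟁은 끝났다 (unchanged)
```

The rule is deliberately narrow, but it still over-corrects. "전 세계가 주목했다" ("the whole world took notice") would also become "저는 세계가 주목했다". A rule like this arguably suits a particular user's speech habits better than a global setting, which fits the per-user approach described here.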

FIG. 6 is a flowchart illustrating a method for supporting a text conversion service using user-specific customized regular expressions according to one embodiment of the present invention, and FIG. 7 is a detailed flowchart of step S720 of FIG. 6.

Referring to FIGS. 6 and 7, a method (S700) for supporting a text conversion service using user-specific customized regular expressions according to one embodiment of the present invention includes the following steps:
- receiving a user's spoken voice at the input unit (110) (S710); and
- in the voice recognition-conversion model unit (120), converting the spoken voice into a primary text and replacing the primary text with a secondary text, which is a normalized sentence, by applying preset regular expressions to be used for each user (S720).

Here, step S720 may include determining whether the strings in the primary text produced by the voice recognition-conversion model unit include strings of the predefined regular expressions.

Step S720 may further include performing the replacement by matching strings in the primary text against strings of the predefined regular expressions.

Accordingly, the system and method according to one embodiment of the present invention support a text conversion service using user-specific customized regular expressions. With them, a sentence converted from voice to text can be replaced with a normalized sentence according to the regular expressions. Typical typos or ambiguous words contained in the input sentence are thereby corrected.

The system according to an embodiment disclosed in this document may be any of various types of devices. Such an electronic device may include, for example, a portable communication device (e.g., a smartphone), a computer device, a portable multimedia device, a portable medical device, a camera, a wearable device, or a home appliance. The electronic device according to the embodiments of this document is not limited to the devices described above.

The embodiments of this document and the terms used herein are not intended to limit the technical features described in this document to specific embodiments. They should be understood to include various modifications, equivalents, or substitutes of those embodiments. In connection with the description of the drawings, similar reference numerals may be used for similar or related components. The singular form of a noun corresponding to an item may include one or more of the items unless the context clearly indicates otherwise. In this document, each of the phrases "A or B", "at least one of A and B", "at least one of A or B", "A, B, or C", "at least one of A, B, and C", and "at least one of A, B, or C" can include any one of the items listed in that phrase, or all possible combinations thereof. Terms such as "first", "second", "1st", or "2nd" may be used merely to distinguish one component from another, and do not limit the components in other respects (e.g., importance or order). A component (e.g., a first component) may be referred to as "coupled" or "connected" to another component (e.g., a second component), with or without the term "functionally" or "communicatively". In that case, the component can be connected to the other component directly (e.g., by wire), wirelessly, or through a third component.

The term "module" as used in this document may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit. A module may be an integrally configured component, or a minimum unit or part of such a component, that performs one or more functions. For example, according to one embodiment, a module may be implemented in the form of an application-specific integrated circuit (ASIC).

An embodiment of this document may be implemented as software (e.g., a program) including one or more instructions stored in a storage medium (e.g., internal memory or external memory) readable by a machine (e.g., an electronic device). For example, a processor of the machine may call at least one of the one or more instructions stored in the storage medium and execute it. This enables the machine to be operated to perform at least one function according to the at least one called instruction. The one or more instructions may include code generated by a compiler or code executable by an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, 'non-transitory' only means that the storage medium is a tangible device and does not include a signal (e.g., an electromagnetic wave). The term does not distinguish between data being stored semi-permanently and data being stored temporarily in the storage medium.

According to one embodiment, the method according to the embodiments disclosed in this document may be provided as part of a computer program product. The computer program product may be traded between a seller and a buyer as a commodity. It may be distributed in the form of a machine-readable storage medium (e.g., a compact disc read-only memory (CD-ROM)). It may also be distributed online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least part of the computer program product may be at least temporarily stored in, or temporarily generated on, a machine-readable storage medium. Examples include the memory of a manufacturer's server, an application store's server, or a relay server.

According to one embodiment, each of the components (e.g., modules or programs) described above may include a single entity or multiple entities. According to one embodiment, one or more of the aforementioned components or operations may be omitted, or one or more other components or operations may be added. Alternatively or additionally, a plurality of components (e.g., modules or programs) may be integrated into a single component. In such a case, the integrated component may perform one or more functions of each of the plurality of components identically or similarly to how the corresponding component performed them before the integration. According to one embodiment, the operations performed by a module, program, or other component may be executed sequentially, in parallel, repeatedly, or heuristically. Alternatively, one or more of the operations may be executed in a different order or omitted, or one or more other operations may be added.

100: System supporting a text conversion service using user-specific customized regular expressions
110: Input unit
120: Voice recognition-conversion model unit
121: Voice-to-text conversion unit
122: Filtering unit
123: Regular expression substitution unit
130: Regular expression providing unit

Claims

A system supporting a text conversion service using user-specific customized regular expressions, the system comprising:
an input unit for receiving a user's spoken voice;
a voice recognition-conversion model unit that converts the spoken voice into a primary text and replaces the primary text with a secondary text, which is a normalized sentence, by applying preset regular expressions to be utilized for each user; and
a regular expression setting unit that sets and provides predefined regular expressions,
wherein the voice recognition-conversion model unit comprises:
a voice-to-text conversion unit that converts the spoken voice into text through an STT program;
a filtering unit that determines whether strings in the converted primary text contain strings of the predefined regular expressions; and
a regular expression substitution unit that substitutes, based on a preset regular expression pattern, a string confirmed by the filtering unit to contain a string of a predefined regular expression.

delete

A method for supporting a text conversion service using user-specific customized regular expressions, the method comprising:
receiving a user's spoken voice at an input unit; and
in a voice recognition-conversion model unit, converting the spoken voice into a primary text and replacing the primary text with a secondary text, which is a normalized sentence, by applying preset regular expressions to be utilized for each user,
wherein the voice recognition-conversion model unit comprises:
a voice-to-text conversion unit that converts the spoken voice into text through an STT program;
a filtering unit that determines whether strings in the converted primary text contain strings of predefined regular expressions; and
a regular expression substitution unit that substitutes, based on a preset regular expression pattern, a string confirmed by the filtering unit to contain a string of a predefined regular expression.

delete