KR20140127036A

KR20140127036A - Server and method for spam filtering

Info

Publication number: KR20140127036A
Application number: KR1020130045535A
Authority: KR
Inventors: 권태현 (Kwon Tae-hyun)
Original assignee: (주)네오위즈게임즈 (Neowiz Games Corp.)
Priority date: 2013-04-24
Filing date: 2013-04-24
Publication date: 2014-11-03

Abstract

Provided is a technique for filtering spam messages with high accuracy. A spam filtering server according to an embodiment of the present invention includes: a first filtering unit that stores preset words and classifies a message transmitted from a message server, which provides a service for exchanging messages between terminals, as a first message (a message that may be spam) depending on whether the message includes at least one of the preset words; a spam information transmission unit that, when a message classified as a first message is transmitted from the message server to an external terminal, controls the transmission so that first request information, which asks whether to classify the message as a second message (a spam message), is included in the first message; and a second filtering unit that receives response information from the external terminal to the first request information and, when the response information indicates that the message classified as a first message is to be classified as a second message, classifies the message as a second message and stores it in a spam message database.

Description

TECHNICAL FIELD [0001] The present invention relates to a spam filtering server and a spam filtering method.

The present invention relates to technology for filtering spam messages and, more specifically, to technology that improves the accuracy of spam message filtering by applying a multi-layered spam filtering method.

As data transmission technology has advanced, services that let users exchange messages via mobile terminals, computers, and similar devices have seen increasing use. Examples of such message services include real-time chat, note (private message) services, and mail services.

By using these services to exchange personal information or carry out work, users can exchange information faster than they could by offline messaging.

However, users of such message services have suffered considerable inconvenience from spam messages. Spam primarily means advertising messages that users do not want.

Research has therefore continued into filtering such spam messages so that they are either blocked before reaching users or stored in a spam storage area within the users' message reception module, sparing users the inconvenience.

Currently, spam is filtered by using a spam filtering system to determine whether a message is spam before the message server delivers it. Accordingly, research continues into spam filtering systems with high filtering accuracy.

However, existing spam filtering systems classify legitimate messages as spam, or fail to classify actual spam messages as spam. Their filtering accuracy is therefore low, which has inconvenienced users.

Accordingly, an object of the present invention is to provide a spam filtering technique that achieves higher filtering accuracy than existing spam filtering systems and becomes more reliable through feedback to the filtering system.

To achieve the above object, a spam filtering server according to an embodiment of the present invention includes: a first filtering unit that stores preset words and classifies a message transmitted from a message server, which provides a service for exchanging messages between terminals, as a first message that may be spam, depending on whether the message includes at least one of the preset words; a spam information transmission unit that, when a message classified as a first message is transmitted from the message server to an external terminal, controls the transmission so that first request information, which asks whether to classify the message as a second message (a spam message), is included in the first message; and a second filtering unit that receives response information from the external terminal to the first request information and, when the response information indicates that the message classified as a first message is to be classified as a second message, classifies the message as a second message and stores it in a spam message database.

A spam filtering method according to an embodiment of the present invention includes: classifying, by a spam filtering server, a message transmitted from a message server, which provides a service for exchanging messages between terminals, as a first message that may be spam, depending on whether the message includes at least one of pre-stored preset words; controlling, when the message classified as a first message is transmitted from the message server to an external terminal, the transmission so that first request information, which asks whether to classify the message as a second message (a spam message), is included in the first message; and, when the response information of the external terminal to the first request information indicates that the message classified as a first message is to be classified as a second message, classifying the message as a second message and storing it in a spam message database.

A spam filtering method according to another embodiment of the present invention includes: determining, by a spam filtering server, whether to classify a message transmitted from a message server, which provides a service for exchanging messages between terminals, as a second message (a spam message); if the transmitted message is not classified as a second message, classifying it as a first message that may be spam, depending on whether it includes at least one of pre-stored preset words; controlling, when the message classified as a first message is transmitted from the message server to an external terminal, the transmission so that first request information, which asks whether to classify the message as a second message, is included in the first message; and, when the response information of the external terminal to the first request information indicates that the message classified as a first message is to be classified as a second message, classifying the message as a second message and storing it in a spam message database.

According to the present invention, the first filtering unit first extracts messages expected to be spam, and the second filtering unit makes the final decision to classify them as spam based on the response information of the external terminal. This two-stage process provides more accurate spam filtering than existing systems.

In addition, the invention provides a re-learning module for spam filtering together with a diversified filtering system that draws on spam reports from users, monitoring from an administrator terminal, pattern analysis, and the like. It can therefore filter out only spam messages accurately and adapt easily even as spam patterns diversify, improving the reliability of spam filtering.

FIG. 1 is a configuration diagram of a spam filtering server according to an embodiment of the present invention.
FIGS. 2 to 8 are flowcharts of spam filtering methods according to embodiments of the present invention.
FIG. 9 shows an example of a message reception screen displayed on a user terminal according to an implementation of the embodiments of the present invention.

Hereinafter, a spam filtering server and method according to embodiments of the present invention will be described with reference to the accompanying drawings.

To keep the following description clear, descriptions of well-known technologies related to the features of the present invention are omitted. The following embodiments are detailed descriptions intended to aid understanding of the present invention and do not limit its scope. Accordingly, equivalent inventions that perform the same functions as the present invention also fall within its scope.

In the following description, identical reference numerals denote identical components, and unnecessary repetition and descriptions of known technologies are omitted.

In the embodiments of the present invention, “communication”, “communication network”, and “network” may be used interchangeably. These three terms refer to wired and wireless local-area and wide-area data networks capable of transmitting and receiving files between a user terminal, other users' terminals, and a download server.

In the following description, a “server” means a server computer that determines whether to classify a message as spam. For a service with small capacity or little data to process, several services may run on a single server. Conversely, for a service with very large capacity or a large volume of messages that must be checked for spam in real time, more than one server may operate a single service, divided according to function.

The server may also be connected to database middleware or to servers that perform payment processing, but their description is omitted in the present invention.

In the present invention, spam filtering means the entire process of analyzing each message transmitted from a message server, which provides a message service that lets users exchange messages, and determining whether that message is a spam message.

A spam message is an e-mail message sent in bulk to many recipients on the Internet, or a news article posted simultaneously to many newsgroups. The term has the same meaning as advertising or promotional mail (junk mail) sent in bulk by post to a large, unspecified number of recipients.

In most cases, spam consists of messages the recipient neither wants nor cares about, or articles unrelated to the discussion topic of the newsgroup. Sending such messages or posting such articles is called spamming. Spamming is regarded as abuse of the Internet to advertise products to many people at nominal cost, to proselytize for a particular religion, or even to defame a particular person, product, or company.

The spam filtering server and method according to each embodiment of the present invention may be included in the message server or configured as a separate server connected to the message server. Hereinafter, the spam filtering server may be referred to simply as the server.

Below, the configuration of FIG. 1 is described through the flowcharts of FIGS. 2 to 8. It will thus be understood that the technical features of the spam filtering methods of FIGS. 2 to 8 can be performed by the respective components of the spam filtering server of FIG. 1.

FIG. 1 is a configuration diagram of a spam filtering server according to an embodiment of the present invention.

Referring to FIG. 1, the spam filtering server 10 according to an embodiment of the present invention basically includes a first filtering unit 11, a spam information transmission unit 12, and a second filtering unit 13, and may additionally include a filter learning module 14.

Referring to FIG. 2 together with FIG. 1, the first filtering unit 11 first classifies a message transmitted from the message server 30 as a first message, depending on whether the message includes at least one of the preset words stored in the first filtering unit 11 (S10).

The first filtering unit 11 receives a message from the message server 30 before the message is delivered and determines whether it corresponds to a first message. A first message is a message that is likely to be spam; in existing spam filtering systems, a message classified this way is treated as spam outright.

Specifically, the first filtering unit 11 determines whether a transmitted message is a first message through the flow shown in FIG. 3.

Referring to FIG. 3, the first filtering unit 11 first extracts the content of the transmitted message and calculates a spam probability according to whether the content includes at least one of the preset words (S11). If the calculated spam probability exceeds a preset threshold probability (for example, 80%), it classifies the message as a first message (S12).
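A minimal sketch of steps S11 and S12 follows. It assumes that each preset word carries a per-word spam probability (the dictionary values are hypothetical) and that the probabilities of the words found are combined with the naive-Bayes product rule popularized by Paul Graham; the description does not specify the combination formula. Only the 80% threshold comes from the text.

```python
import re
from math import prod

# Hypothetical per-word spam probabilities for the preset words.
PRESET_WORD_PROBS = {"free": 0.9, "winner": 0.95, "loan": 0.85}
THRESHOLD = 0.8  # example threshold probability from the description


def spam_probability(text: str, word_probs: dict[str, float]) -> float:
    """S11: combine the probabilities of the preset words found in the text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    hits = [p for word, p in word_probs.items() if word in tokens]
    if not hits:
        return 0.0  # no preset word present
    spam = prod(hits)                  # product of P(spam | word)
    ham = prod(1 - p for p in hits)    # product of P(not spam | word)
    return spam / (spam + ham)


def is_first_message(text: str) -> bool:
    """S12: classify as a first message (possible spam) above the threshold."""
    return spam_probability(text, PRESET_WORD_PROBS) > THRESHOLD
```

A message is only flagged as a *first* message here; under the described scheme the final spam decision is left to the second filtering unit.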

A representative example of the function performed by the first filtering unit 11 in the embodiment of FIG. 3 is a Bayesian filter. The Bayesian filter, which applies Bayes' theorem, is a representative technique for filtering data based on probability.

A Bayesian filter collects samples of spam and non-spam messages and learns the spam frequency (probability) of each word. To classify a message, it extracts the words the message contains, compares them with the learned index, and calculates the message's spam probability.

Most existing spam filtering systems use Bayesian filtering and apply a learning system to improve the filter's effectiveness. The learning system updates each word's spam frequency by adding the words contained in messages that the Bayesian filter classified as spam to the learned index.
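The learned index described in the two paragraphs above can be sketched as document counts per word. In this sketch the same `learn` call serves both initial training on labelled samples and the later update step. Additive smoothing and equal spam/non-spam priors are assumptions made for illustration; the class and method names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Lower-cased set of word tokens in a message."""
    return set(re.findall(r"\w+", text.lower()))


class WordSpamIndex:
    """Learned index of per-word spam frequencies (document counts)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha             # additive smoothing constant (assumed)
        self.spam_docs = 0
        self.ham_docs = 0
        self.spam_counts = Counter()   # word -> spam messages containing it
        self.ham_counts = Counter()    # word -> non-spam messages containing it

    def learn(self, text: str, is_spam: bool) -> None:
        """Reflect one labelled message in the index (training or update)."""
        words = tokenize(text)
        if is_spam:
            self.spam_docs += 1
            self.spam_counts.update(words)
        else:
            self.ham_docs += 1
            self.ham_counts.update(words)

    def word_spam_prob(self, word: str) -> float:
        """Smoothed P(spam | word), assuming equal spam/non-spam priors."""
        a = self.alpha
        p_spam = (self.spam_counts[word] + a) / (self.spam_docs + 2 * a)
        p_ham = (self.ham_counts[word] + a) / (self.ham_docs + 2 * a)
        return p_spam / (p_spam + p_ham)
```

If the filter's own spam verdicts are fed back into `learn`, any misclassification gets reinforced. This is the weakness the following paragraphs address.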

However, spam filtering with a conventional Bayesian filter has been criticized for its vulnerability to Bayesian poisoning. Bayesian poisoning means sending a large volume of non-spam content so that the Bayesian filter comes to judge normal messages as spam.

To prevent this weakness of the Bayesian filter and errors caused by learning, the present invention uses the Bayesian filter only to determine whether a transmitted message is a first message, that is, one that may be spam, and raises filtering accuracy through the steps described below.

Although the first filtering unit 11 is described as using, for example, a Bayesian filter to classify messages as first messages, any other known automatic spam filtering method may of course be used instead.

Referring again to FIGS. 1 and 2, when a message classified as a first message is transmitted to the external terminal 20 through the message server 30, the spam information transmission unit 12 controls the message server 30 so that first request information, which asks whether to classify the first message as a second message (a spam message), is included in the first message (S20).

As described above, a first message is a message classified as likely to be spam. The present invention does not confirm a first message as spam. Instead, when the message server 30 delivers the message to the receiving external terminal 20, it adds first request information that states the message may be spam and asks whether to confirm it as spam.
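Step S20 can be pictured as attaching an extra payload to the outgoing message. The sketch below is hypothetical throughout: the message fields, the structure of the first request information, and the wording of the prompt are illustrative assumptions, not the patent's format.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class OutgoingMessage:
    """A message the message server is about to deliver (hypothetical shape)."""
    message_id: str
    sender: str
    recipient: str
    body: str
    first_request: Optional[dict] = None  # first request information, if any


def attach_first_request(msg: OutgoingMessage, is_first_message: bool) -> OutgoingMessage:
    """S20: add first request information to a message classified as a first message."""
    if not is_first_message:
        return msg  # not suspected: deliver unchanged
    request = {
        "notice": "This message may be spam.",               # likely-spam notice
        "question": "Do you want to classify it as spam?",  # confirmation query
        "choices": ("yes", "no"),                            # response options
        "message_id": msg.message_id,                        # lets the reply be matched later
    }
    return replace(msg, first_request=request)
```

Returning a modified copy leaves the original message object untouched, so the server can still log exactly what the sender sent.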

FIG. 9 shows an example of a reception screen for a message to which first request information has been added.

The message reception screen 100 may include content information 101 of the message and first request information 102. The first request information 102 may include information indicating that the message is likely to be spam and information asking whether to confirm the message as spam.

A user of the external terminal 20 can select the response information 103, and the selected response information is transmitted to the second filtering unit 13 through the message server 30.

Referring again to FIGS. 1 and 2, the second filtering unit 13 receives the response information of the external terminal 20 to the first request information. If the response information indicates that the message classified as a first message is to be classified as a second message, that is, a spam message, the second filtering unit 13 classifies the message as a second message and stores it in the spam message database (S30).
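A sketch of step S30 follows, with in-memory dictionaries standing in for the pending-message store and the spam message database. The class and field names are hypothetical. The description does not say what happens when the user answers "no"; here the message simply stops awaiting a response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseInfo:
    message_id: str
    classify_as_spam: bool  # True if the user chose "yes, this is spam"


class SecondFilteringUnit:
    """Sketch of step S30; dicts stand in for real storage."""

    def __init__(self) -> None:
        self.awaiting_response: dict[str, str] = {}  # first messages: id -> body
        self.spam_db: dict[str, str] = {}            # second messages: id -> body

    def track_first_message(self, message_id: str, body: str) -> None:
        """Remember a first message that was delivered with first request info."""
        self.awaiting_response[message_id] = body

    def on_response(self, resp: ResponseInfo) -> bool:
        """Classify as a second message only if the response says so."""
        body = self.awaiting_response.pop(resp.message_id, None)
        if body is None or not resp.classify_as_spam:
            return False
        self.spam_db[resp.message_id] = body  # store in the spam message database
        return True
```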

Although not shown in the drawings, the spam message database may be provided in the storage space of the second filtering unit 13 or of the message server 30. The spam message database may store the content information of each spam message together with its transmission information.

The transmission information of a spam message means any information that can identify the sender, such as the sending IP address and sender information.
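One possible layout for the spam message database, using SQLite from the Python standard library, is sketched below. The table and column names are hypothetical. Only the split into content information and sender-identifying transmission information comes from the description. The index on the sender IP is there because later steps look spam up by sender.

```python
import sqlite3


def open_spam_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a hypothetical spam message table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS spam_messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               body TEXT NOT NULL,      -- content information
               sender_ip TEXT,          -- transmission information: sending IP
               sender_id TEXT,          -- transmission information: sender account
               classified_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sender_ip ON spam_messages(sender_ip)")
    return conn


def store_second_message(conn: sqlite3.Connection, body: str,
                         sender_ip: str, sender_id: str) -> None:
    """Store a message classified as a second (spam) message."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO spam_messages (body, sender_ip, sender_id) VALUES (?, ?, ?)",
            (body, sender_ip, sender_id),
        )
```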

The second filtering unit 13 may also classify certain messages as second messages regardless of whether response information to the first request information has been received. This is illustrated in FIGS. 4 to 6.

For example, referring to FIGS. 1 and 4, when the second filtering unit 13 receives from the external terminal 20 second request information requesting that a message transmitted from the message server 30 be classified as a second message (S40), it classifies the message corresponding to the second request information as a second message (S50).

For example, a user of the message service provided through the message server 30 may notice a message that was not classified as spam and want to register it as spam. In this case, the user can report the message as spam through the external terminal 20, and the second filtering unit 13 receives this report as second request information and classifies the message as a second message.

The external terminal 20 may also include an administrator terminal of the message server 30. In this case, the external terminal 20 is a terminal that monitors messages transmitted from the message server 30.

Through the administrator terminal of the message server 30, an administrator can manually inspect messages to decide whether they are spam, or manually check the sending patterns of messages.

The administrator can then confirm some of the transmitted messages as spam regardless of how the message service is being used. In this case, the second request information is transmitted by the administrator terminal.
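Steps S40 and S50 treat a user report and an administrator decision the same way: the referenced message becomes a second message whatever the first filter concluded. A minimal sketch follows. The `origin` field and the dictionary stores are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecondRequest:
    message_id: str
    origin: str  # hypothetical: "user" (spam report) or "admin" (monitoring terminal)


def handle_second_request(req: SecondRequest, delivered: dict[str, str],
                          spam_db: dict[str, dict]) -> bool:
    """S40 -> S50: classify the referenced message as a second message."""
    body = delivered.get(req.message_id)
    if body is None:
        return False  # unknown message id: nothing to classify
    spam_db[req.message_id] = {"body": body, "reported_by": req.origin}
    return True
```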

As mentioned above, when a second message is stored in the spam message database, the message content and any information that can identify the sender may be stored.

In this case, when a message transmitted from the message server 30 matches sender identification information stored in the spam message database, the second filtering unit 13 can immediately classify the message as a second message and store it in the spam message database.
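Sender-based classification reduces to a set lookup on identification data collected from stored second messages. The sketch below assumes two identifiers, the sending IP and a sender account id. Both the choice of identifiers and the names are illustrative. The text only says "any information that can identify the sender".

```python
from dataclasses import dataclass, field


@dataclass
class SpamSenderIndex:
    """Sender identification info collected from stored second messages."""
    ips: set[str] = field(default_factory=set)
    sender_ids: set[str] = field(default_factory=set)

    def add(self, sender_ip: str, sender_id: str) -> None:
        self.ips.add(sender_ip)
        self.sender_ids.add(sender_id)

    def matches(self, sender_ip: str, sender_id: str) -> bool:
        return sender_ip in self.ips or sender_id in self.sender_ids


def classify_by_sender(index: SpamSenderIndex, msg: dict, spam_db: list) -> bool:
    """Classify immediately as a second message if the sender is already known."""
    if index.matches(msg["sender_ip"], msg["sender_id"]):
        spam_db.append(msg)  # store in the spam message database
        return True
    return False
```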

An example of this technique is DNS query-based spam filtering, which has recently been used as an IP-based spam filtering method.
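DNS query-based filtering is commonly done with a DNS blocklist (DNSBL, described in RFC 5782). The IPv4 octets are reversed, the blocklist zone is appended, and an A-record answer in 127.0.0.0/8 means the address is listed. The description does not name a blocklist, so the zone below is a placeholder, and this is only one assumed way to realize the step.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name for an IPv4 address (octets reversed)."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """True if the blocklist answers with an A record in 127.0.0.0/8."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:  # NXDOMAIN or resolver failure: treat as not listed
        return False
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")
```

For example, `is_listed("192.0.2.99", "dnsbl.example.org")` would resolve `99.2.0.192.dnsbl.example.org`. The zone name is hypothetical.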

Referring to FIGS. 1, 5, and 6, the second filtering unit 13 can also classify a particular message directly as a second message using information about how many times the message has been sent. This function, too, operates on messages transmitted from the message server 30 regardless of response information.

First, referring to FIG. 5, the second filtering unit 13 receives from the first filtering unit 11 information about messages classified as first messages and detects whether a message classified as a first message has been transmitted repeatedly more than a preset first threshold number of times (S41).

Spam messages are typically sent repeatedly to random recipients, so a first message that is sent repeatedly is very likely to be spam.

That is, when step S41 finds that a message classified as a first message has been transmitted more than the first threshold number of times (for example, 1000 times), the second filtering unit can immediately classify it as a second message (S51).
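Steps S41 and S51 can be sketched as a counter keyed by message content. Identifying "the same message" by a SHA-256 hash of the stripped body is an assumption; the default of 1000 is the example threshold from the text. A count strictly greater than the threshold triggers classification, matching "more than the first threshold number of times".

```python
import hashlib
from collections import Counter

FIRST_THRESHOLD = 1000  # example first threshold number from the description


class RepeatCounter:
    """Counts repeated transmissions of first messages (steps S41/S51)."""

    def __init__(self, threshold: int = FIRST_THRESHOLD) -> None:
        self.threshold = threshold
        self.counts = Counter()  # content fingerprint -> times transmitted

    @staticmethod
    def key(body: str) -> str:
        """Assumed notion of 'same message': identical text after stripping."""
        return hashlib.sha256(body.strip().encode("utf-8")).hexdigest()

    def observe_first_message(self, body: str) -> bool:
        """Record one transmission; True means classify as a second message (S51)."""
        k = self.key(body)
        self.counts[k] += 1
        return self.counts[k] > self.threshold
```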

Referring to FIG. 6, the message server 30 detects whether a response message from the external terminal 20 exists for each previously transmitted message and counts how many times messages with the same content as an unanswered message have been sent (S42). When the second filtering unit 13 receives information from the message server 30 indicating that this count exceeds a preset second threshold number (for example, 2000 times), it classifies the message as a second message (S52).

Recipients of a normal message often reply to it, whereas a message that draws no reply may be spam. Accordingly, for messages that are only sent and never answered, the message server 30 counts how many times the same message has been sent. The second filtering unit 13 obtains this count from the message server 30, and if the count exceeds the second threshold number, it confirms the message as spam.
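The message-server side of step S42 can be sketched as follows. It keeps, for each message content, the set of sent copies that have not yet received a reply, and reports once that set exceeds the threshold. Reading "the number of sends of a message with the same content as an unanswered message" as a count of *unanswered* copies is an interpretation, and all names are hypothetical.

```python
from collections import defaultdict

SECOND_THRESHOLD = 2000  # example second threshold number from the description


class NoReplyTracker:
    """Message-server side counter for step S42 (hypothetical design)."""

    def __init__(self, threshold: int = SECOND_THRESHOLD) -> None:
        self.threshold = threshold
        self.body_of: dict[str, str] = {}   # message id -> content
        self.unanswered = defaultdict(set)  # content -> ids still without a reply

    def on_sent(self, message_id: str, body: str) -> bool:
        """Record a delivery; True means notify the second filtering unit (S52)."""
        self.body_of[message_id] = body
        self.unanswered[body].add(message_id)
        return len(self.unanswered[body]) > self.threshold

    def on_reply(self, original_id: str) -> None:
        """A reply arrived, so the original no longer counts as unanswered."""
        body = self.body_of.get(original_id)
        if body is not None:
            self.unanswered[body].discard(original_id)
```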

The embodiments of FIGS. 4 to 6 effectively reduce how often the external terminal 20 must enter response information in the embodiment that classifies second messages from that response information, while still classifying only spam messages accurately. This is because, according to the embodiments of FIGS. 4 and 6, fewer messages remain merely classified as first messages.

To achieve the same effect, the first filtering unit 11 and the second filtering unit 13 may also perform the pre-filtering function of the embodiment of FIG. 7.

That is, referring to FIGS. 1 and 7, the second filtering unit 13 acts before the first filtering unit 11 determines whether a transmitted message is a first message. It performs step S1 of determining whether the transmitted message has the same content as any of the second messages stored in the spam message database. If it does, the second filtering unit 13 performs step S53 of immediately classifying the message as a second message and storing it in the spam message database.

As described above, the spam message database holds information on the messages that have been classified as second messages.

Therefore, a transmitted message that is identical to a message stored in the spam message database is clearly spam. The second filtering unit 13 classifies it directly as a second message, without the first filtering unit 11 having to perform its function.
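Step S1 amounts to an exact-content lookup. The sketch below models the spam message database as an in-memory set of SHA-256 fingerprints of the message text; this is an assumption. `SpamMessageDatabase`, `content_key`, and `contains_exact` are hypothetical names used for illustration only.

```python
import hashlib


def content_key(text: str) -> str:
    # Fingerprint of the exact message content (assumption: raw text, no normalization).
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class SpamMessageDatabase:
    """Hypothetical in-memory stand-in for the spam message database."""

    def __init__(self) -> None:
        self._keys: set[str] = set()
        self.messages: list[str] = []

    def store(self, text: str) -> None:
        # Store a second message once; duplicates are ignored.
        key = content_key(text)
        if key not in self._keys:
            self._keys.add(key)
            self.messages.append(text)

    def contains_exact(self, text: str) -> bool:
        # Step S1: is this content identical to a stored second message?
        return content_key(text) in self._keys
```

Hashing keeps the S1 check at constant time per message, regardless of how many second messages have accumulated.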

If step S1 finds that the transmitted message differs in content from every second message stored in the spam message database, step S2 is performed. In this step, the transmitted message is compared with each stored second message using Locality-Sensitive Hashing (LSH). The comparison result determines whether the transmitted message duplicates a second message.

LSH is a data-mining technique that compares two documents to estimate the probability that they are duplicates.

LSH represents a document by a small number of signature values. For example, a document of about 100 words can be viewed as a 100-dimensional vector, and LSH reduces this to a limited number n of dimensions. Keeping n small makes it possible to detect duplicate documents or to cluster them.
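The reduction of a message to a small number n of values can be illustrated with MinHash, a standard LSH family for set similarity. This is a generic example, not the patent's exact procedure:

- each of n seeded hash functions is applied to the message's words, and the minimum is kept;
- the fraction of matching positions between two signatures estimates the Jaccard similarity of their word sets.

Whitespace tokenization and the BLAKE2b-based seeding are assumptions.

```python
import hashlib


def _seeded_hash(seed: int, token: str) -> int:
    # Built-in hash() is salted per process, so use a stable 64-bit BLAKE2b digest instead.
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(text: str, n: int = 64) -> list[int]:
    """Reduce a message to n values: for each seed, the minimum hash over its distinct words."""
    tokens = set(text.split())
    if not tokens:
        raise ValueError("cannot sign an empty message")
    return [min(_seeded_hash(seed, t) for t in tokens) for seed in range(n)]


def estimated_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    # The fraction of positions where the minima agree estimates the Jaccard similarity.
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Messages whose estimated similarity exceeds a chosen cutoff would be treated as duplicates. The source does not specify that cutoff, so it is a tuning choice.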

For example, given two documents, an ID is assigned to every keyword in their contents; the same keyword receives the same ID in both documents. Then n one-dimensional functions are defined over these IDs. For instance, n term IDs are extracted from each document at a fixed interval of k keywords, where the interval can be set arbitrarily. The smallest of the extracted IDs is then selected. If the IDs selected for the two documents are equal, the two documents are judged likely to be similar.

Specifically, suppose there are documents A and B. The following steps are applied to document A:

1. The 3rd, 6th, 9th, 12th, ... keywords are extracted and sorted, and the top keyword K1 is selected.
2. The 5th, 10th, 15th, ... keywords are extracted and sorted, and the top keyword K2 is selected.
3. The 7th, 14th, 21st, ... keywords are extracted and sorted, and the top keyword K3 is selected.

The same operation is performed on document B at the same keyword positions, which likewise yields three keywords.

If the corresponding keywords are identical, documents A and B are judged highly likely to be the same document.
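The keyword-sampling procedure for documents A and B can be sketched in Python. The sketch rests on several assumptions:

- keywords are separated by whitespace;
- IDs are assigned in first-seen order across both documents;
- the strides are 3, 5 and 7, as in the example;
- very short messages fall back to an exact comparison (an added assumption).

The function names are hypothetical.

```python
from typing import Optional


def assign_term_ids(*documents: list[str]) -> dict[str, int]:
    """Give each distinct keyword one ID shared by all documents (first-seen order)."""
    ids: dict[str, int] = {}
    for doc in documents:
        for word in doc:
            ids.setdefault(word, len(ids))
    return ids


def stride_signature(doc: list[str], ids: dict[str, int],
                     strides: tuple[int, ...] = (3, 5, 7)) -> tuple[Optional[int], ...]:
    """For each stride k, sample the k-th, 2k-th, ... keywords and keep the smallest ID."""
    signature = []
    for k in strides:
        sampled = [ids[word] for word in doc[k - 1::k]]  # 1-indexed positions k, 2k, 3k, ...
        signature.append(min(sampled) if sampled else None)
    return tuple(signature)


def likely_same(doc_a: list[str], doc_b: list[str],
                strides: tuple[int, ...] = (3, 5, 7)) -> bool:
    """Judge two keyword lists likely identical if their stride signatures match."""
    ids = assign_term_ids(doc_a, doc_b)
    sig_a = stride_signature(doc_a, ids, strides)
    sig_b = stride_signature(doc_b, ids, strides)
    if None in sig_a or None in sig_b:
        # Too short to sample every stride: fall back to exact comparison (assumption).
        return doc_a == doc_b
    return sig_a == sig_b
```

With strides 3, 5 and 7, the first keyword is never sampled. Two messages that differ only in their first keyword therefore receive the same signature, which is the kind of near-duplicate the text says LSH is meant to catch.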

LSH thus lets the server judge quickly whether two messages are effectively the same, even when they are not literally identical. If step S2 determines that the transmitted message is the same as any of the second messages, step S53 of classifying the transmitted message as a second message is performed.

If instead step S2 does not classify the transmitted message as a second message, the second filtering unit 13 sends information to that effect to the first filtering unit 11. On receiving it, the first filtering unit 11 performs step S10 of determining whether to classify the transmitted message as a first message.

This pre-processing reduces system load. Instead of checking every transmitted message against the first-message criteria, the server first classifies messages that are certainly spam as second messages. This is described in detail below with reference to FIG. 9.
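The ordering of steps S1, S2 and S10 can be sketched as a small pipeline. Everything in the sketch is illustrative rather than the patent's implementation:

- the names `PreFilterPipeline` and `Verdict`, and the three-way verdict itself;
- the lower-cased keyword match;
- the injected `is_near_duplicate` callable, which would wrap an LSH comparison;
- the helper `word_set_near_duplicate`, a word-set Jaccard check with a 0.8 cutoff that stands in for LSH to keep the example short.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    SECOND = "second message (spam)"
    FIRST = "first message (possible spam)"
    NORMAL = "normal"


@dataclass
class PreFilterPipeline:
    spam_db: set[str]                              # stored second messages
    preset_words: set[str]                         # first filtering unit's words, lower-case
    is_near_duplicate: Callable[[str, str], bool]  # e.g. wraps an LSH comparison

    def classify(self, message: str) -> Verdict:
        # S1: exact match against the spam message database.
        if message in self.spam_db:
            return Verdict.SECOND
        # S2: near-duplicate of any stored second message.
        if any(self.is_near_duplicate(message, spam) for spam in self.spam_db):
            return Verdict.SECOND
        # S10: only messages that survive S1/S2 reach the keyword filter.
        words = set(message.lower().split())
        return Verdict.FIRST if words & self.preset_words else Verdict.NORMAL


def word_set_near_duplicate(a: str, b: str, cutoff: float = 0.8) -> bool:
    """Stand-in for LSH in this example: Jaccard similarity of word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return bool(union) and len(sa & sb) / len(union) >= cutoff
```

Ordering the cheap exact check first, then the near-duplicate check, means the keyword filter only sees messages not already known to be spam.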

In another embodiment of the present invention, the spam filtering server 10 may include a filter learning module 14.

As described above, the spam filtering system applies re-learning in order to filter spam more accurately.

In the present invention, the filter learning module 14 is the re-learning system for the first filtering unit 11, as illustrated in FIG. 8.

Referring to FIGS. 1 and 8, the filter learning module 14 performs step S60. It reads the spam message database stored in the message server 30 or the second filtering unit 13, extracts the content of the messages classified as second messages, and extracts at least one word contained in those messages.

It then performs step S70 of updating the preset words stored in the first filtering unit 11 using the extracted words.
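Steps S60 and S70 can be sketched as a document-frequency pass over the stored second messages. The patent only states that extracted words are used to update the preset words. The following are assumptions of the sketch:

- a word is learned only if it appears in at least `min_messages` spam messages;
- an optional stopword list is excluded;
- the function is named `update_preset_words`.

```python
from collections import Counter


def update_preset_words(preset_words: set[str], spam_messages: list[str],
                        min_messages: int = 2,
                        stopwords: frozenset[str] = frozenset()) -> set[str]:
    """S60: extract words from second messages; S70: merge frequent ones into the preset words."""
    doc_freq: Counter[str] = Counter()
    for text in spam_messages:
        # Count each word once per message (document frequency).
        doc_freq.update(set(text.lower().split()))
    learned = {w for w, n in doc_freq.items() if n >= min_messages and w not in stopwords}
    return preset_words | learned
```

Requiring a word to recur across several confirmed spam messages is one way to keep one-off words out of the first filtering unit's list. The threshold is a tuning choice.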

In other words, the filter learning module 14 uses messages that have been reliably classified as spam to retrain the first filtering unit 11, rather than the second filtering unit 13.

This improves the spam filtering performance of the first filtering unit 11. It is the first filtering unit 11 that is retrained, rather than spam filtering being retrained in general. This further enhances the function of the second filtering unit 13 and, in turn, the automatic spam filtering capability of the multi-stage spam filtering system.

The components of the spam filtering server in the embodiments of FIGS. 1 to 8, together with the spam filtering method those components perform, ultimately determine whether a transmitted message is classified as a second message.

The message server 30 then receives information on the messages classified as second messages. It may handle such a message in one of two ways:

- block it from being transmitted to the external terminal 20; or
- transmit it, but deliver it to a spam message inbox provided within the message inbox of the external terminal 20.
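The two delivery options available to the message server 30 can be sketched as a routing function. `SpamPolicy`, `route_message`, and the mailbox labels are hypothetical names for illustration.

```python
from enum import Enum
from typing import Optional


class SpamPolicy(Enum):
    BLOCK = "block"            # do not deliver second messages at all
    SPAM_INBOX = "spam_inbox"  # deliver, but into the spam inbox


def route_message(is_second_message: bool, policy: SpamPolicy) -> Optional[str]:
    """Return the mailbox a message lands in, or None if delivery is blocked."""
    if not is_second_message:
        return "inbox"
    if policy is SpamPolicy.BLOCK:
        return None
    return "spam_inbox"
```

Making the policy an explicit parameter reflects that the text presents blocking and spam-inbox delivery as alternatives rather than as a fixed rule.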

The message inbox is an interface displayed on the external terminal 20 that shows the messages a user has received through the message sending and receiving service.

The spam filtering method of the embodiments described above may be executed by an application installed on the terminal by default, which may include a program built into the terminal's platform or operating system. It may also be executed by an application (i.e., a program) that the user installs directly on the terminal from an application providing server, such as an application store server or a web server associated with the application or the service. In this sense, the spam filtering method may be implemented as an application (i.e., a program), either pre-installed on the terminal or installed by the user. That application may be recorded on a computer-readable recording medium of the terminal or a similar computer.

Such a program is recorded on a computer-readable recording medium and executed by a computer to carry out the functions described above.

To execute the spam filtering method of each embodiment, the program may include code written in a computer language that the computer's processor (CPU) can read, such as C, C++, Java, or machine language.

Such code may include functional code related to the functions that define the operations described above. It may also include execution-procedure control code needed for the computer's processor to perform those operations in a predetermined order.

The code may further include memory-reference code. This code indicates the location (address) in the computer's internal or external memory at which any additional information or media needed by the processor to perform these functions should be referenced.

The processor may need to communicate with a remote computer or server to perform these functions. In that case, the code may further include communication code specifying:

- how the processor should use the computer's communication module (e.g., a wired and/or wireless communication module) to communicate with the remote computer or server; and
- what information or media should be transmitted and received during that communication.

Programmers in the technical field of the present invention may readily infer or modify the functional programs for implementing it, together with the associated code and code segments. They would take into account the system environment of the computer that reads the recording medium and executes the program.

Examples of computer-readable recording media on which such a program may be recorded include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices.

The computer-readable recording medium may also be distributed over computer systems connected through a network, so that computer-readable code is stored and executed in a distributed fashion. In this case, at least one of the distributed computers may execute some of the functions described above and transmit the result to at least one other distributed computer. The receiving computer may in turn execute some of the functions and provide its result to other distributed computers.

In particular, consider the computer-readable recording medium on which the application, the program for executing the spam filtering method of each embodiment, is recorded. It may be a storage medium (e.g., a hard disk) included in an application provider server, such as an application store server or a web server associated with the application or service. It may also be the application provider server itself.

The computer that reads a recording medium holding this application may be an ordinary PC, such as a desktop or laptop. It may also be a mobile terminal, such as a smartphone, tablet PC, PDA (Personal Digital Assistant), or mobile communication terminal. More generally, it should be interpreted as any device capable of computing.

If that computer is a mobile terminal such as a smartphone, tablet PC, PDA, or mobile communication terminal, the application may be downloaded from the application provider server to an ordinary PC and installed on the mobile terminal through a synchronization program.

The components of the embodiments have been described above as combined into one or as operating in combination, but the present invention is not necessarily limited to such embodiments. Within the scope of the object of the invention, the components may be selectively combined into one or more units that operate together. Each component may be implemented as its own independent hardware. Alternatively, some or all of the components may be selectively combined and implemented as a computer program whose program modules perform some or all of the combined functions on one or more pieces of hardware. Those skilled in the art of the present invention can readily infer the code and code segments constituting such a computer program. The computer program may be stored in a computer-readable medium and read and executed by a computer to implement the embodiments of the present invention. Storage media for the computer program may include magnetic recording media, optical recording media, and the like.

Terms such as "comprise," "constitute," or "have," as used above, mean that the corresponding component may be present, unless specifically stated otherwise. They should therefore be construed as allowing other components to be included, not as excluding them. All terms, including technical and scientific terms, have the meanings commonly understood by those of ordinary skill in the art to which the present invention belongs, unless defined otherwise. Commonly used terms, such as those defined in dictionaries, should be interpreted consistently with their contextual meaning in the related art. They are not to be interpreted in an idealized or overly formal sense unless the present invention expressly defines them so.

The foregoing description merely illustrates the technical idea of the present invention. Those of ordinary skill in the art will be able to make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments disclosed herein are intended to explain, not limit, the technical idea of the present invention, and that idea is not limited by these embodiments. The scope of protection of the present invention should be interpreted according to the following claims, and all technical ideas within their equivalent scope should be construed as falling within the scope of the present invention.

Claims

1. A spam filtering server, comprising: a first filtering unit that stores preset words and classifies a message transmitted from a message server, which provides a service for sending and receiving messages between terminals, as a first message, that is, a message that may be spam, depending on whether the transmitted message includes at least one of the preset words;
a spam information transmission unit that, when the message classified as the first message is transmitted from the message server to an external terminal, controls the transmission so that first request information, asking whether the message classified as the first message should be classified as a second message, that is, a spam message, is included in the first message; and
a second filtering unit that receives response information of the external terminal to the first request information and, when the response information includes information indicating that the message classified as the first message is to be classified as the second message, classifies that message as the second message and stores it in a spam message database.

2. The spam filtering server of claim 1,
wherein the second filtering unit, upon receiving from the external terminal second request information requesting that a message not classified as the first message be classified as a second message, classifies the message corresponding to the second request information as the second message.

3. The spam filtering server of claim 1,
wherein the second filtering unit stores sender identification information of the message classified as the second message in the spam message database.

4. The spam filtering server of claim 3,
wherein, when a message transmitted from the message server corresponds to sender identification information stored in the spam message database, the second filtering unit classifies that message as the second message.

5. The spam filtering server of claim 1,
wherein, upon receiving from the message server information indicating that a message classified as the first message has been repeatedly transmitted more than a preset first threshold number of times, the second filtering unit classifies the repeatedly transmitted message as the second message.

6. The spam filtering server of claim 1,
wherein, upon receiving from the message server information indicating that the number of times a message identical to a previously transmitted message, for which no response message from the external terminal exists, has been sent exceeds a preset second threshold number, the second filtering unit classifies that message as the second message.

7. The spam filtering server of claim 1,
wherein the second filtering unit determines whether the transmitted message is identical to any one of the second messages stored in the spam message database and, when it is, classifies the transmitted message as the second message and stores it in the spam message database.

8. The spam filtering server of claim 7,
wherein, when the transmitted message differs from all of the second messages stored in the spam message database, the second filtering unit compares the transmitted message with the stored second messages using Locality-Sensitive Hashing (LSH) and determines, according to the comparison result, whether to classify the transmitted message as the second message.

9. The spam filtering server of claim 8,
wherein, upon receiving from the second filtering unit information indicating that, as a result of the LSH comparison between the transmitted message and the second messages stored in the spam message database, the transmitted message has not been classified as the second message, the first filtering unit classifies the transmitted message as the first message depending on whether it includes at least one of the preset words.

10. The spam filtering server of claim 1,
wherein the first filtering unit calculates a spam probability according to whether the transmitted message includes at least one of the preset words, and classifies the transmitted message as the first message when the calculated spam probability exceeds a preset threshold probability.

11. The spam filtering server of claim 1,
wherein the message server blocks a message classified as the second message from being transmitted to the external terminal.

12. The spam filtering server of claim 1,
wherein, when transmitting a message classified as the second message to the external terminal, the message server delivers it to a spam message inbox provided in the message inbox of the external terminal.

13. The spam filtering server of claim 1,
wherein the external terminal includes a user terminal that uses the message sending and receiving service provided by the message server, and an administrator terminal of the message server.

14. The spam filtering server of claim 1, further comprising
a filter learning module that extracts content information of the second messages stored in the spam message database, extracts at least one word included in the second messages, and updates the preset words stored in the first filtering unit using the extracted at least one word.

15. A spam filtering method, comprising, by a spam filtering server:
classifying a message transmitted from a message server, which provides a service for sending and receiving messages between terminals, as a first message, that is, a message that may be spam, depending on whether the transmitted message includes at least one of pre-stored preset words;
when the message classified as the first message is transmitted from the message server to an external terminal, causing first request information, asking whether the message classified as the first message should be classified as a second message, that is, a spam message, to be included in the first message and transmitted; and
when response information of the external terminal to the first request information includes information indicating that the message classified as the first message is to be classified as the second message, classifying that message as the second message and storing it in a spam message database.

16. The spam filtering method of claim 15, further comprising,
upon receiving from the external terminal second request information requesting that a message not classified as the first message be classified as a second message, classifying the message corresponding to the second request information as the second message.

17. The spam filtering method of claim 15,
wherein storing in the spam message database comprises storing sender identification information of the message classified as the second message in the spam message database.

18. The spam filtering method of claim 17, further comprising,
when a message transmitted from the message server corresponds to sender identification information stored in the spam message database, classifying that message as the second message.

19. The spam filtering method of claim 15,
wherein storing in the spam message database comprises, upon receiving from the message server information indicating that a message classified as the first message has been repeatedly transmitted to the external terminal more than a preset first threshold number of times, classifying the repeatedly transmitted message as the second message and storing it in the spam message database.

20. The spam filtering method of claim 15, further comprising,
upon receiving from the message server information indicating that the number of times a message identical to a previously transmitted message, for which no response message from the external terminal exists, has been sent exceeds a preset second threshold number, classifying that message as the second message.

21. A spam filtering method, comprising, by a spam filtering server:
determining whether to classify a message transmitted from a message server, which provides a service for sending and receiving messages between terminals, as a second message, that is, a spam message;
when the transmitted message is not classified as the second message, classifying the transmitted message as a first message, that is, a message that may be spam, depending on whether it includes at least one of pre-stored preset words;
when the message classified as the first message is transmitted from the message server to an external terminal, causing first request information, asking whether the message classified as the first message should be classified as the second message, to be included in the first message and transmitted; and
when response information of the external terminal to the first request information includes information indicating that the message classified as the first message is to be classified as the second message, classifying that message as the second message and storing it in a spam message database.

22. The spam filtering method of claim 21,
wherein determining whether to classify as the second message comprises, upon receiving from the external terminal second request information requesting that a message not classified as the first message be classified as a second message, classifying the message corresponding to the second request information as the second message.

23. The spam filtering method of claim 21,
wherein storing in the spam message database comprises storing sender identification information of the message classified as the second message in the spam message database.

24. The spam filtering method of claim 23,
wherein determining whether to classify as the second message comprises, when a message transmitted from the message server corresponds to sender identification information stored in the spam message database, classifying that message as the second message.

25. The spam filtering method of claim 21,
wherein determining whether to classify as the second message comprises, upon receiving from the message server information indicating that the number of times a message identical to a previously transmitted message, for which no response message from the external terminal exists, has been sent exceeds a preset second threshold number, classifying that message as the second message.

26. The spam filtering method of claim 21,
wherein determining whether to classify as the second message comprises, when the transmitted message is identical to any one of the second messages stored in the spam message database, classifying the transmitted message as the second message.

27. The spam filtering method of claim 26,
wherein determining whether to classify as the second message comprises, when the transmitted message differs from all of the second messages stored in the spam message database, comparing the transmitted message with the stored second messages using Locality-Sensitive Hashing (LSH) and determining, according to the comparison result, whether to classify the transmitted message as the second message.

28. The spam filtering method of claim 21,
wherein the message server blocks a message classified as the second message from being transmitted to the external terminal.

29. A computer-readable recording medium on which is recorded a program for implementing the method of claim 15 or 21.