KR20240021387A

KR20240021387A - Job search matching method and system

Info

Publication number: KR20240021387A
Application number: KR1020220099645A
Authority: KR
Inventors: 두일철; 오세종; 김태준; 심봉걸; 정동현
Original assignee: 한국외국어대학교 연구산학협력단 (Hankuk University of Foreign Studies Research & Business Foundation)
Priority date: 2022-08-10
Filing date: 2022-08-10
Publication date: 2024-02-19
Anticipated expiration: 2042-08-10
Also published as: KR102759478B1

Abstract

The present invention relates to a matching method and system that provide a high-quality company recommendation service to users based on job postings on job search sites, and more particularly to a method and system that match job postings to the user's requirements based on the user's selection. The disclosed configuration comprises: receiving at least one company condition desired by the user and at least one user specification; learning and calculating the correlation between companies satisfying the company conditions and company preference according to the at least one user specification; determining a ranking of the companies satisfying the company conditions according to the correlation; and presenting a list of the companies to the user according to that ranking.

Description

Job search matching method and system {JOB SEARCH MATCHING METHOD AND SYSTEM}

The present invention relates to a matching method and system that provide a high-quality company recommendation service to users based on job postings on job search sites, and more particularly to a method and system that match job postings to the user's requirements based on the user's selection.

In the past, job seekers obtained employment information from the job sections of newspapers or from other publications. Today, they can browse the openings of many companies through Internet or app services, and select and apply to companies that match both their qualifications and their desired conditions.

On existing job search sites, company recommendations are organized by category-based company lists, reviews, benefits, and salary. To choose a company to apply to, the user must therefore cross-check all of this information, and because there are so many companies, checking each one individually against the desired conditions takes considerable time.

If similar companies were recommended through machine learning for a job posting the user is interested in, the service would be easier to use than entering various conditions into the system by hand. What is needed is a system that is as simple and intuitive as possible while still giving users satisfactory recommendations.

The present invention is intended to solve the problems described above by providing a method and system that list and recommend companies suited to the user's preferences.

The present invention also seeks to provide a method and system that identify the attributes affecting whether the user achieves the goal, minimizing the amount of computation to reduce the load on the system while improving accuracy.

The present invention further seeks to provide a method and system that use various visualization methods so that users can grasp the characteristics of a company's job posting from the visualizations alone, without reading the text in full.

To solve the above problems, a job search matching method according to an embodiment of the present invention may include: receiving at least one company condition desired by the user and at least one user specification; learning and calculating the correlation between companies satisfying the company conditions and company preference according to the at least one user specification; determining a ranking of the companies satisfying the company conditions according to the correlation; and presenting a list of the companies to the user according to that ranking.

According to an embodiment of the present invention, the correlation calculation step may include: matching at least one user specification with company preference; defining the correlation between a change in the user specification and a change in company preference as a probability and learning it; and calculating, according to the probability, the correlation between the company conditions and company preference according to the at least one user specification.

According to an embodiment of the present invention, in the correlation calculation step, the correlation may be calculated using Equation 1 below.

[Equation 1]

(Here, C(S, P) is the correlation, E(S) is the analysis relevance value, S is the user specification, and P is the company preference.)

According to an embodiment of the present invention, the user specification may include at least one of school, major, gender, age, region, and MBTI.

According to an embodiment of the present invention, the ranking determination step may determine a ranking that includes company preference according to a combination of a plurality of user specifications, and the presentation step may provide results according to that ranking.

To solve the above problems, a job search matching system according to an embodiment of the present invention may include: a data processing unit that receives at least one company condition desired by the user and at least one user specification; a learning unit that learns and calculates the correlation between companies satisfying the company conditions and company preference according to the at least one user specification; a ranking unit that determines a ranking of the companies satisfying the company conditions according to the correlation; and a visualization unit that presents a list of the companies to the user according to that ranking.

According to an embodiment of the present invention, the learning unit may match at least one user specification with company preference, define the correlation between a change in the user specification and a change in company preference as a probability and learn it, and calculate, according to the probability, the correlation between the company conditions and company preference according to the at least one user specification.

According to an embodiment of the present invention, the learning unit may calculate the correlation using Equation 1 below.

[Equation 1]

According to an embodiment of the present invention, the learning unit may determine a ranking that includes company preference according to a combination of a plurality of user specifications, and the visualization unit may provide results according to that ranking.

According to the present invention, companies suited to the user's preferences can be listed by rank and recommended.

In addition, by identifying the attributes that affect whether the user achieves the goal, the amount of computation can be minimized, reducing the load on the system while improving prediction accuracy.

In addition, by using various visualization methods, users can grasp the characteristics of a company's job posting from the visualizations alone, without reading the text in full.

The effects of the present invention are not limited to those mentioned above; various other effects apparent to those skilled in the art from the following description may also be included.

Figure 1 is a block diagram of a job search matching system according to an embodiment of the present invention.
Figure 2 is a screen shown during analysis on a matching page to which the job search matching method according to an embodiment of the present invention is applied.
Figure 3 is a matching result screen obtained by applying the job search matching method according to an embodiment of the present invention.
Figure 4 is a visualization result screen obtained by applying the job search matching method according to an embodiment of the present invention.
Figure 5 is a result screen shown when the desired information is not available on a matching page to which the job search matching method according to an embodiment of the present invention is applied.
Figure 6 is a result screen shown when no companies match as a result of applying the job search matching method according to an embodiment of the present invention.
Figure 7 is a matching result screen obtained by applying the job search matching method according to an embodiment of the present invention.
Figure 8 is a flowchart of a job search matching method according to an embodiment of the present invention.

Hereinafter, the 'job search matching method and system' according to the present invention will be described in detail with reference to the attached drawings. The described embodiments are provided so that those skilled in the art can easily understand the technical idea of the present invention, and the present invention is not limited to them. In addition, the attached drawings are schematic illustrations intended to explain the embodiments easily and may differ from the form actually implemented.

Each component described below is only an example for implementing the present invention. Other components may therefore be used in other implementations without departing from the spirit and scope of the present invention.

Each component may be implemented purely in hardware or purely in software, or by a combination of hardware and software that performs the same function. Two or more components may also be implemented together by a single piece of hardware or software.

The expression 'including' certain components is open-ended: it merely indicates that those components are present and should not be understood as excluding additional components.

Figure 1 is a block diagram of a job search matching system according to an embodiment of the present invention.

Referring to FIG. 1, the job search matching system 1 according to an embodiment of the present invention may include a data processing unit 100, a learning unit 200, and a visualization unit 300.

The data processing unit 100 may receive at least one company condition desired by the user and at least one user specification. The data processing unit 100 may also collect companies' job posting information from job search sites and the like, for example by crawling those sites.

To obtain standardized job posting data, the data processing unit 100 may crawl a job search platform that, among the various recruiting sites, holds job postings in a standardized format (for example, JobPlanet). Although JobPlanet's information is relatively standardized compared with other job search platforms, some postings lack information, so the data processing unit 100 may handle these cases as exceptions and adapt the crawl accordingly. The data processing unit 100 may crawl the data with Selenium, a browser automation tool, and, to collect a large volume of data, may collect the postings for every job category over a period of two to three days.

Although the postings share a unified format, each one has its own writing style and contains unnecessary information, so the data processing unit 100 may clean these up (stopword processing) and may perform a tokenization step to extract meaningful data and improve training accuracy. According to an embodiment of the present invention, to find a tokenization method suited to the project, the data processing unit 100 may directly compare the popular Korean morphological analyzers in KoNLPy (Mecab, Kkma, Okt, etc.) with the more recent Kiwi tokenizer, and may adopt the Kiwi tokenizer, which uses a more diverse part-of-speech tagging scheme. In the end, a combined Kiwi + Mecab tokenization method was used: the Mecab tokenizer was applied to Korean-English combinations that the Kiwi tokenizer could not recognize. The data processing unit 100 may store the tokenized data separately as benefits data and qualifications data. The learning unit 200 may then set up the training models so that, for a similar job, the user can choose one of these two criteria and receive recommendations of similar companies.

Treating each job posting as a document, with the company name and job as its tag and the posting content as its corpus, the data processing unit 100 may vectorize the documents in order to learn the similarity between documents identified by company name and job title. To do this, the data processing unit 100 may use Doc2Vec, a document embedding technique that extends the word embedding technique Word2Vec. To apply Doc2Vec to the data, gensim, a Python library that provides most of the utilities needed to convert natural language into vectors, may be used.
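As a rough sketch of the document preparation described above, the following Python builds one tagged document per posting, with company name and job as the tag and the tokens as the words. It uses a local `namedtuple` with the same `words`/`tags` fields as gensim's `TaggedDocument` so that it runs without gensim; the record keys (`company`, `job`, `tokens`) are hypothetical.

```python
from collections import namedtuple

# Local stand-in with the same two fields as gensim's TaggedDocument
# (words, tags); with gensim installed you could use that class instead.
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])


def build_tagged_documents(postings):
    """Turn tokenized postings into (words, tags) documents for Doc2Vec.

    Each posting is assumed to be a dict with the hypothetical keys
    "company", "job" and "tokens" (the tokenized posting body).
    """
    docs = []
    for posting in postings:
        # The tag identifies the document: company name plus job title.
        tag = f"{posting['company']}|{posting['job']}"
        docs.append(TaggedDocument(words=list(posting["tokens"]), tags=[tag]))
    return docs


postings = [
    {"company": "CompanyA", "job": "Backend Developer", "tokens": ["Python", "server", "API"]},
    {"company": "CompanyB", "job": "Data Analyst", "tokens": ["SQL", "statistics", "Python"]},
]
docs = build_tagged_documents(postings)
```

With gensim 4.x installed, the same list could be passed to `gensim.models.doc2vec.Doc2Vec(documents, vector_size=..., window=..., min_count=..., epochs=...)`, and `model.dv.most_similar(tag)` would return the postings closest to a given tag; the settings the authors actually used are not stated.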

While refining the accuracy of data preprocessing, the data processing unit 100 may develop the system with the Flask framework so that users can conveniently try out the model trained with Doc2Vec (a vector model representing the similarity between companies' job postings). Through the web, the user can view the results produced by the visualization unit 300: well-organized job postings, visual materials, and posting similarity scores that indicate how reliable the data is.

According to an embodiment of the present invention, the data processing unit 100 may crawl all job postings registered on JobPlanet, using Selenium's WebDriver to crawl the dynamically rendered parts of the site, such as those implemented in JavaScript. Although JobPlanet provides more standardized information than other job search platforms, the number of information fields differs from company to company, so the data may be collected with exception handling.

When an exception occurs, the data processing unit 100 may crawl the page again. After crawling, the data may be split into company name, posting title, deadline, job, experience, employment type, salary, skills, company introduction, main tasks, qualifications, preferred qualifications, hiring process, and benefits, and saved as a CSV file.
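The re-crawl-on-exception behavior and the per-field split could be sketched as below. `fetch_page` is a hypothetical callable (for instance a wrapper around a Selenium WebDriver), and the English field names are illustrative stand-ins for the fields listed above; filling missing fields with empty strings is one assumed way of handling postings that expose fewer fields.

```python
import time

# Illustrative English names for the fields listed in the text.
FIELDS = [
    "company", "title", "deadline", "job", "experience", "employment_type",
    "salary", "skills", "introduction", "main_tasks", "qualifications",
    "preferred", "process", "benefits",
]


def crawl_with_retry(fetch_page, url, max_attempts=3, delay_seconds=0.0):
    """Call fetch_page(url), re-crawling the same page whenever it raises."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_page(url)
        except Exception as error:  # any failure triggers another attempt
            last_error = error
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(f"giving up on {url} after {max_attempts} attempts") from last_error


def normalize_record(raw):
    # Companies expose different subsets of fields; missing ones become "".
    return {field: raw.get(field, "") for field in FIELDS}
```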

The data processing unit 100 may apply batch preprocessing to the saved CSV files: stopword processing, plus merging columns of a similar nature into one column. For example, required skills, qualifications, and preferred qualifications are similar in nature and can be merged into the 'qualifications' column; likewise, the posting title, job, and main tasks can be merged into the 'task' column.
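A minimal sketch of the column merge, assuming each posting is a dict with the hypothetical English keys shown below; stopword removal is omitted here.

```python
# Target column -> source columns whose text is merged into it.
MERGE_RULES = {
    "qualifications": ["skills", "qualifications", "preferred"],
    "task": ["title", "job", "main_tasks"],
}


def merge_columns(record, rules=MERGE_RULES):
    """Return a copy of record with similar columns joined into one column."""
    merged = dict(record)
    for target, sources in rules.items():
        # Read from the original record so a target that is also a source
        # (here "qualifications") is not merged with its own merged value.
        parts = [record.get(column, "") for column in sources]
        merged[target] = " ".join(part for part in parts if part)
    return merged
```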

Because the same posting is recognized and stored as a different posting when its deadline differs, the data processing unit 100 may use drop_duplicates from the pandas module to remove duplicate data whose 'company name + posting title' is identical.
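The deduplication amounts to keeping the first posting for each (company name, posting title) key. The sketch below does this in plain Python so it runs without pandas; the key names are assumptions.

```python
def drop_duplicate_postings(records):
    """Keep the first posting for each (company, title) pair."""
    seen = set()
    unique = []
    for record in records:
        key = (record["company"], record["title"])
        if key not in seen:  # later copies (e.g. other deadlines) are skipped
            seen.add(key)
            unique.append(record)
    return unique
```

In pandas, the equivalent call would be `df.drop_duplicates(subset=["company", "title"])`, which keeps the first occurrence by default.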

Because the raw posting data will also be provided to the user, the data processing unit 100 may keep the original posting text in a separate column before stopword processing.

When saving to CSV, a posting containing the "\r" character causes an unintended line break, so the data processing unit 100 may replace \r with \n.
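A sketch of the carriage-return fix before writing CSV, using Python's standard `csv` module. Normalizing `\r\n` to `\n` before replacing a lone `\r` is an added assumption that avoids turning Windows line endings into blank lines; the field names are hypothetical.

```python
import csv
import io


def normalize_newlines(text):
    # Replace stray carriage returns with "\n"; "\r\n" is handled first so
    # it becomes a single "\n" rather than two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def postings_to_csv(records, fieldnames):
    """Serialize postings to CSV text with carriage returns normalized."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({k: normalize_newlines(str(record.get(k, ""))) for k in fieldnames})
    return buffer.getvalue()
```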

The data processing unit 100 may compare tokenization techniques using several tokenizers and then use the one that tokenizes best, that is, the one that best extracts the important words from a sentence.

A comparison of Kiwi, Mecab, Kkma, Okt, and others showed that the Kiwi and Mecab tokenizers divide part-of-speech tags most finely and are the most suitable for training.

From the stopword-processed data, the data processing unit 100 may use the Kiwi tokenizer to extract tokens tagged as common nouns, proper nouns, English words, and roots.

The Kiwi tokenizer raises an exception when it encounters a Korean-English combination whose meaning it cannot determine. For example, 리깅TA ("rigging TA", a term common in game developers' job postings) can trigger an exception.

For data that causes an exception in Kiwi, the data processing unit 100 may tokenize it with the Mecab tokenizer instead.
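The Kiwi-first, Mecab-fallback scheme with the part-of-speech filter might be sketched as follows. The tokenizers are passed in as callables returning `(form, tag)` pairs; in practice these could wrap kiwipiepy (`[(t.form, t.tag) for t in Kiwi().tokenize(text)]`) and KoNLPy's `Mecab().pos(text)`, but those adapters are assumptions. The kept tags NNG, NNP, SL, and XR are the Sejong-style tags for common noun, proper noun, alphabetic (English) token, and root used by both analyzers.

```python
# Common noun, proper noun, alphabetic (English) token, root.
KEEP_TAGS = {"NNG", "NNP", "SL", "XR"}


def tokenize_with_fallback(text, primary, fallback, keep_tags=KEEP_TAGS):
    """Tokenize with primary (e.g. Kiwi); use fallback (e.g. Mecab) if it raises.

    primary and fallback are hypothetical adapters returning (form, tag) pairs.
    """
    try:
        morphemes = primary(text)
    except Exception:
        morphemes = fallback(text)
    # Keep only the parts of speech the text says are useful for training.
    return [form for form, tag in morphemes if tag in keep_tags]
```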

Using the resulting tokenizer, the data processing unit 100 may create two CSV files: for example, one in which the benefits and task columns are tokenized, and one in which the qualifications and task columns are tokenized.

The data processing unit 100 also tokenizes the task column when creating each CSV file because two training models are built: for example, a model trained on benefits/qualifications and a model trained on the task.

The data processing unit 100 uses two training models because it recommends companies that resemble the user's chosen company in one criterion, either benefits or qualifications; if the benefits or qualifications are similar but the job being hired for is different, the recommendation is meaningless to the user.

Users look for jobs in their own field of expertise, and the types of jobs within any given field are generally limited.

The data processing unit 100 trains a Doc2Vec model on the tokenized data, adjusting the model's hyperparameters and adopting the values that give the best accuracy.
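The hyperparameter search could look like the exhaustive grid search below. `evaluate` is a hypothetical callable that would train a Doc2Vec model with the given keyword arguments and return an accuracy score; `vector_size`, `window`, `min_count`, and `epochs` are real Doc2Vec parameter names, but the candidate values and the accuracy measure are assumptions.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every hyperparameter combination and keep the best-scoring one."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)  # e.g. train Doc2Vec and measure accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A call might look like `grid_search(train_and_score, {"vector_size": [50, 100], "window": [3, 5], "epochs": [20, 40]})`, where `train_and_score` is the hypothetical training-plus-evaluation function.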

Doc2Vec was chosen as the machine learning method because, compared with a TensorFlow-based document embedding method, it was faster and could be trained while tuning several hyperparameters, and was therefore judged more advantageous for training.

Of the two trained models, the data processing unit 100 may first use the task model to extract the indices of companies that are at least 70% similar to the company entered by the user.

The lower limit of job similarity was set to 70% because a large amount of data was used to train the job model (the model trained on the task column, which merges the 'posting title, job, and main tasks' data); even at 70% similarity, the actual postings turned out to be similar when compared directly.

For the companies extracted by the job model, the data processing unit 100 then uses the 'benefits/qualifications' model to extract similar companies and outputs them in descending order of similarity. As a result, companies whose jobs are at least 70% similar and whose 'benefits/qualifications' are also similar are extracted.
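The two-stage filter-then-rank procedure can be sketched with plain vectors. This assumes each posting already has one vector from the task model and one from the benefits/qualifications model, keyed by a hypothetical posting id, and uses cosine similarity (the measure gensim's `most_similar` reports); 0.7 is the 70% threshold from the text.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(query, task_vecs, aux_vecs, task_threshold=0.7):
    """Stage 1: keep postings whose task similarity to query >= threshold.
    Stage 2: rank survivors by benefits/qualifications similarity, highest first.
    """
    candidates = [
        pid for pid, vec in task_vecs.items()
        if pid != query and cosine(task_vecs[query], vec) >= task_threshold
    ]
    scored = [(pid, cosine(aux_vecs[query], aux_vecs[pid])) for pid in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```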

The learning unit 200 may learn and calculate the correlation between companies satisfying the company conditions and company preference according to at least one user specification. The learning unit 200 may select and list the companies that satisfy the company conditions and, from that list, rank the companies the user is likely to prefer according to the correlation of company preference with the user's specifications.

The learning unit 200 may match at least one user specification with company preference, define the correlation between a change in the user specification and a change in company preference as a probability and learn it, and calculate, according to the probability, the correlation between the company conditions and company preference according to the at least one user specification.

The learning unit 200 may measure the correlation by defining the correlation between a change in a user specification attribute and a change in company preference as a probability and learning it.

The learning unit 200 may calculate the influence of the at least one user specification on company preference for the companies satisfying the company conditions.

The influence is a value that depends on how strongly a change in the user specification affects company preference, and may be calculated using Equation 1 below.

[Equation 1]

Here, p(x) is the probability that a change in preference x occurs. The closer this probability is to 1, the closer the influence is to 0, so fewer cases need to be learned and the amount of computation decreases; the closer the probability is to 0, the larger the influence grows without bound, so more cases need to be learned and the amount of computation may increase.
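The equation image is not reproduced here. The limits described above (influence 0 when p(x) = 1, unbounded as p(x) approaches 0) match the self-information I(x) = -log2 p(x), so the sketch below assumes that form with a base-2 logarithm; both the form and the base are assumptions.

```python
import math


def influence(p):
    """Assumed form of Equation 1: self-information -log2 p(x).

    p is the probability that a change in preference x occurs; the result
    is 0 when p == 1 and grows without bound as p approaches 0.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)
```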

In the learning unit 200, the computational load represents the expected value (mean) of the influence. A large computational load means a large average influence, and the greater the uncertainty, the harder classification becomes; the amount of computation can therefore be reduced by placing the one with the smallest computational load at the upper decision node. The computational load can be obtained using Equation 2 below.

[Equation 2]

Here, S denotes the collection of all events that have already occurred, and c denotes the number of events.
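Reading the computational load as the expected value of the influence over the c events in S gives the Shannon entropy E(S) = -Σ_{i=1..c} p_i log2 p_i. The sketch below assumes that form (the equation itself is not shown), estimating each p_i from observed frequencies.

```python
import math
from collections import Counter


def computational_load(outcomes):
    """Assumed form of Equation 2: expected influence (entropy) of S.

    outcomes is the collection S of observed events (e.g. which company each
    user preferred); each of the c distinct events gets p_i = count / |S|.
    """
    total = len(outcomes)
    if total == 0:
        return 0.0
    counts = Counter(outcomes)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```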

In the learning unit 200, the correlation can be obtained using Equation 3 below.

[Equation 3]

Here, C(S, P) is the correlation, E(S) is the analysis relevance value, S is the user specification, and P is the company preference. P refers to a particular specification among the user specifications, and the equation is used to determine which specification gives the smallest computational load (that is, the largest information gain) when used for classification.
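Since the text describes choosing the specification with the largest information gain, Equation 3 is assumed here to be the ID3-style information gain C(S, P) = E(S) - Σ_v (|S_v| / |S|) E(S_v), where S_v is the subset of cases whose specification P takes value v. The data layout (dicts with a `preferred_company` key) is hypothetical, and specifications are ordered by decreasing gain, following the "largest information gain" criterion stated above.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def correlation(users, spec, target="preferred_company"):
    """Assumed form of Equation 3: C(S, P) = E(S) - sum_v |S_v|/|S| * E(S_v)."""
    groups = defaultdict(list)
    for user in users:
        groups[user[spec]].append(user[target])
    remainder = sum(len(g) / len(users) * entropy(g) for g in groups.values())
    return entropy([user[target] for user in users]) - remainder


def rank_specs(users, specs, target="preferred_company"):
    # Highest gain (smallest remaining computational load) first.
    return sorted(specs, key=lambda s: correlation(users, s, target), reverse=True)
```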

The learning unit 200 may determine a ranking of the user specifications according to the correlation, in ascending order of correlation, and may give the user specifications with low correlation priority in recommendation.

The user specifications may include at least one of school, major, gender, age, region, and MBTI.

The learning unit 200 may determine a ranking that includes company preference according to combinations of a plurality of user specifications.

The visualization unit 300 may present the list of companies to the user according to the company ranking based on the correlation.

The visualization unit 300 may provide results ranked by the correlation between company preference and each of the plurality of user specifications. It may also provide results ranked by that correlation for multiple combinations of the user specifications.

When the user receives recommendations for similar companies and checks the information in a recommended company's job posting, the visualization unit 300 can visualize the data so that the user can easily see which parts were similar. The visualization unit 300 can use a word cloud (Wordcloud) to visualize the key words that appear frequently in each company's job posting. On an additional web page, the visualization unit 300 can also show similar companies on a two-dimensional coordinate plane using TSNE (a technique for reducing the dimensionality of high-dimensional vector data). This lets the user explore similar companies in an intuitive and engaging way.

The visualization unit 300 can serve the system developed up to the 'data embedding' stage from a web server built with Flask. Data is passed from the back end to the front end as parameters of the render_template function, which the back end calls to render an HTML file for the client. On the front end, JavaScript and jQuery display the data received from the back end in components placed in the HTML. Data can be passed from the front end to the web server in two ways. One is the GET method of a form tag. The other is to include the data in the URL requested by the front end, which the back end then parses and processes.

The web UI can be built intuitively and concisely using various Bootstrap elements (e.g., progress bar, radio button, table, card).

When the user must wait for the system to finish an operation, such as while the system is training on new data or generating visualizations, a progress bar can tell the user which operation is in progress.

The placement and size of HTML elements can be designed with CSS in the style tag of the <head>.

HTML files that show visualization information must present newly updated information rather than data cached in the web browser. Caching is therefore disabled with Cache-Control, and a random number is appended to the URL whenever a visualization image is requested. As a result, the updated image is delivered to the user even when new visualization information overwrites an image file with the same name.
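A minimal sketch of the two cache-busting measures, written with only the standard library. In a Flask app, the header dict would be applied to the response object and the busted URL would be used in the template. The exact header values and the `v` query-parameter name are assumptions, not the system's actual code.

```python
import random
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Headers asking browsers not to reuse a cached copy of the visualization page (assumed values).
NO_CACHE_HEADERS = {
    "Cache-Control": "no-cache, no-store, must-revalidate",
    "Pragma": "no-cache",
    "Expires": "0",
}


def cache_busted(url: str, rng: Optional[random.Random] = None) -> str:
    """Append a random query parameter so an overwritten image with the same name is re-fetched."""
    rng = rng or random.Random()
    parts = urlsplit(url)
    query = parse_qsl(parts.query)
    query.append(("v", str(rng.randint(0, 10**9))))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(cache_busted("/static/plot.png"))  # prints "/static/plot.png?v=" followed by a random number
```

Because the query string changes on every request, the browser treats each request as a new resource. The server still serves the same file path.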

The structure of an example web page is as follows.

Start page: hello.html
Job posting link input page: apply.html
Loading page: index.html
Analysis result page: result.html
Company job posting details page: linktoimg.html
Page notifying that there is no 'benefits' information: nobokri.html
Page notifying that there are no similar companies: nolist.html
Page showing visualization information: display_plot.html

In apply.html, the user enters a JobPlanet job posting link. The user then selects the criterion (benefits or qualifications) by which similar companies should be recommended.

On the web server, the visualization unit 300 extracts similar companies from the trained model according to the link and criterion entered by the user. It presents them to the user as a table in result.html.

When the user clicks [View Details] for a company in result.html, the visualization unit 300 provides a WordCloud of the keywords in that company's job posting. It also provides the raw, unpreprocessed job posting text.

The user may request a benefits-based recommendation when the posting at the entered link contains no 'benefits' information. In that case, the visualization unit 300 displays nobokri.html and asks the user to enter the link again.

No company may be similar to the company at the entered link (i.e., no company is similar in both job and criterion). In that case, the visualization unit 300 displays nolist.html to inform the user. It then recommends other companies that are similar in 'job' even though they are not similar in the selected criterion.

From result.html, the user can move to display_plot.html, where the visualization unit 300 shows the visualization information.

In display_plot, the visualization unit 300 draws scatter plots in the HTML with Plotly.js, using the TSNE coordinate data received from the server.

The visualization unit 300 shows two plots. The first is a scatter plot of companies similar to the user-input company in the user-selected criterion. The second is a scatter plot of companies similar to it in 'job'.

By clicking a marker on either plot, the user is taken to linktoimg.html, which shows the job posting of the company corresponding to that marker.

In linktoimg.html, the visualization unit 300 provides the WordCloud and the raw job posting text. It also provides the similarity for each criterion, so that users can check the reliability of the system's recommendation themselves.

In addition, linktoimg.html reflects which marker the user clicked on the 'Visualization Results' page. If the red marker was clicked, it indicates that the posting is the one the user entered. If a blue marker was clicked, it indicates the criterion by which the posting was judged similar to the user's posting.

The embedding vector of each job posting is 200-dimensional.

To present this information visually, the visualization unit 300 reduces the data to two dimensions with a TSNE module and plots it on a two-dimensional plane.

Rather than visualizing every company, the visualization unit 300 visualizes only the companies within a certain distance of the user-input company (i.e., the most similar companies).
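A sketch of restricting the plot to the nearest companies. The text does not say which space the distance is measured in or what the cutoff is. This version assumes Euclidean distance between vectors and a hypothetical `radius` parameter.

```python
import math
from typing import Dict, List, Sequence


def nearby_companies(
    query: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    radius: float,
) -> List[str]:
    """Return companies whose vector lies within `radius` of the query, nearest first."""
    dists = {name: math.dist(query, vec) for name, vec in embeddings.items()}
    return sorted((n for n, d in dists.items() if d <= radius), key=dists.get)


emb = {"A": (0.0, 1.0), "B": (3.0, 4.0), "C": (0.5, 0.0)}  # hypothetical 2D points
print(nearby_companies((0.0, 0.0), emb, radius=2.0))  # ['C', 'A']
```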

Two markers whose positions nearly coincide can look like a single marker. To prevent this, the visualization unit 300 slightly shifts one of the two whenever the distance between them is very small, so that they do not overlap.
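A sketch of the overlap fix. When a 2D marker is closer than a minimum distance to an earlier marker, it is pushed out along the line between them, or along the x-axis if they coincide. The threshold and the push rule are assumptions; the text only says that one position is changed slightly.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def separate_markers(points: List[Point], min_dist: float = 0.05) -> List[Point]:
    """Nudge each marker away from earlier markers that are closer than `min_dist`.

    Single pass: a marker pushed away from one neighbor could, in a dense cluster,
    still end up near another, so crowded plots may need a second pass.
    """
    placed: List[Point] = []
    for x, y in points:
        for px, py in placed:
            d = math.hypot(x - px, y - py)
            if d < min_dist:
                # Unit direction away from the earlier marker; fall back to +x if identical.
                ux, uy = ((x - px) / d, (y - py) / d) if d > 0 else (1.0, 0.0)
                x, y = px + ux * min_dist, py + uy * min_dist
        placed.append((x, y))
    return placed
```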

The visualization unit 300 builds a WordCloud from the tokenized benefits/qualifications data, using the tokens as keywords.

If the user selected 'benefits' as the similarity criterion, the WordCloud built from the tokenized 'benefits' data is shown. If the user selected 'qualifications', the WordCloud built from the tokenized 'qualifications' data is shown.
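A minimal sketch of the step before drawing, assuming each posting's tokens are stored per criterion as lists of strings. It counts the tokens for the criterion the user selected. The dictionary keys and the English tokens are hypothetical stand-ins for the Korean data.

```python
from collections import Counter
from typing import Dict, List


def keyword_frequencies(
    tokens_by_criterion: Dict[str, List[str]], criterion: str, top_n: int = 50
) -> Dict[str, int]:
    """Count tokens for the criterion the user chose ('benefits' or 'qualifications')."""
    if criterion not in tokens_by_criterion:
        raise KeyError(f"unknown criterion: {criterion}")
    return dict(Counter(tokens_by_criterion[criterion]).most_common(top_n))


posting = {
    "benefits": ["remote_work", "meal_allowance", "remote_work", "stock_options"],
    "qualifications": ["Python", "SQL", "Python"],
}
print(keyword_frequencies(posting, "qualifications"))  # {'Python': 2, 'SQL': 1}
```

With the `wordcloud` package installed, this dict could be passed to `WordCloud(...).generate_from_frequencies(...)`. Rendering Korean tokens also requires pointing `font_path` at a font that has Hangul glyphs.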

FIG. 2 is a screen displayed while a matching page that applies the job matching method according to an embodiment of the present invention is performing its analysis. FIG. 3 is a matching result screen. FIG. 4 is a visualization result screen. FIG. 5 is the result screen shown when the desired information is missing from the matching page. FIG. 6 is the result screen shown when no company is matched. FIG. 7 is a matching result screen. FIGS. 3 to 7 each likewise apply the job matching method according to an embodiment of the present invention.

Referring to FIGS. 2 to 7, according to an embodiment of the present invention, the user copies the URL of a desired company's job posting from a job search site (for example, JobPlanet) and enters it into the system. The user then selects the preferred item between the posting's benefits and qualifications. The data processing unit 100 collects the job posting at the entered URL from that site. The visualization unit 300 analyzes the company and the job and outputs similar companies as a table on a web page. While the model is computing, a loading screen can be displayed to the user, and the analysis results can be provided once the computation finishes.

The user can click [View Details] in the 'Details' column for a company in the table displayed by the visualization unit 300. The user is then shown that company's actual job posting and a WordCloud analysis of its keywords.

Clicking [View visualization data] below the tabular analysis results gives the user two 2D visualizations. One shows companies similar to the user-entered company in [job]. The other shows companies similar in [the user-entered criterion].

In the visualizations of the visualization unit 300, the red dot may represent the company at the URL the user entered, and blue dots may represent companies analyzed as similar to it.

When the user clicks a dot in these visualizations, the visualization unit 300 can also provide the [View Details] information for the corresponding company.

On the page with detailed job posting information, the visualization unit 300 can show the actual posting and the keyword WordCloud. It can also show, as a percentage, how similar the company is to the user-entered company.

The job posting at the entered URL may have no 'benefits' information while the user requests a benefits-based analysis. In that case, the visualization unit 300 can notify the user that the link contains no benefits information and ask the user to enter the URL again.

The entered URL may point to new data that the model has not yet learned. Additional training then takes more time, so the visualization unit 300 can adjust the loading time and indicate to the user which operation is in progress.

The analysis may find no similar company (i.e., no company similar in both job and the user-entered criterion). The visualization unit 300 can then notify the user and provide, at the bottom right, a link to companies that are similar in job only. When the user clicks this link, a table of similar companies is displayed. The table is labeled [Companies with similar jobs] so that the user knows the criterion used for the analysis.

FIG. 8 is a flowchart of a job matching method according to an embodiment of the present invention.

Referring to FIG. 8, the job matching method according to an embodiment of the present invention may include receiving at least one company condition desired by the user and at least one user specification (S110).

In step S110, the data processing unit 100 may receive at least one company condition desired by the user and at least one specification of the user. The data processing unit 100 may collect companies' job posting information through job search sites and similar sources, for example by crawling it from a job search site.

In step S110, the data processing unit 100 can obtain structured job posting data by crawling a job search platform (for example, JobPlanet) whose postings are structured more consistently than those of other recruiting sites. Even on JobPlanet, some postings lack information, so the data processing unit 100 handles these cases as exceptions and crawls each posting accordingly. The data processing unit 100 can crawl the data with Selenium, a browser automation tool. To collect a large volume of data, it can collect job postings for all job categories over two to three days.

In step S110, the format of the postings is unified, but each posting differs in tone and includes unnecessary information. The data processing unit 100 can therefore clean the text (stopword removal). It can also include a tokenization stage that extracts meaningful data and improves training accuracy. According to an embodiment of the present invention, the data processing unit 100 looked for a tokenization method suited to the project. It directly compared the popular Korean morphological analyzers in KoNLPy (Mecab, Kkma, Okt, etc.) with the more recent Kiwi tokenizer, and can adopt the Kiwi tokenizer, which uses a richer set of part-of-speech tags. In the end, a hybrid method combining the Kiwi and Mecab tokenizers was used, with the Mecab tokenizer applied to Korean-English combinations that the Kiwi tokenizer cannot handle. The data processing unit 100 can store the tokenized data separately as benefits data and qualifications data. The learning unit 200 can then configure the learning model so that the user can pick one of these two criteria and receive recommendations for similar companies with similar jobs.

In step S110, each job posting can be treated as a document whose tags are the company name and job and whose corpus is the posting text. The data processing unit 100 can then vectorize these documents to learn the similarity between documents labeled with company names and job titles. To do so, the data processing unit 100 can use Doc2Vec, a document embedding technique that extends the word embedding technique Word2Vec. To apply Doc2Vec to the data, gensim can be used; it is a Python library that provides most of the utilities needed to convert natural language into vectors.

In step S110, the data processing unit 100 refines the accuracy of data preprocessing. The system was developed with the Flask framework so that users can conveniently try the model trained with Doc2Vec (a vector model representing the similarity of each company's job posting). Through the results produced by the visualization unit 300, the user can view well-organized job postings and visual materials on the web. The user can also view job posting similarities, which show how reliable the recommendations are.

In step S110, according to an embodiment of the present invention, the data processing unit 100 crawls all job postings registered on JobPlanet. It can use the Selenium web driver to crawl dynamically rendered parts of the JobPlanet site, such as JavaScript content. JobPlanet provides more structured information than other job search platforms, but the number of information types varies by company, so the data can be collected using exception handling.

In step S110, when an exception occurs, the data processing unit 100 can crawl the page again. After crawling, the data can be split into company name, job posting title, deadline, job, experience, employment type, salary, skills, company introduction, main tasks, qualifications, preferred qualifications, hiring process, and benefits, and saved as a CSV file.

In step S110, the data processing unit 100 can batch-preprocess the saved CSV files: it removes stopwords and merges columns of a similar nature into one. For example, required skills, qualifications, and preferred qualifications are similar in nature and can be merged into the 'qualifications' column. Likewise, the job posting title, job, and main tasks can be merged into the 'task' column.
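A sketch of the column-merging step, treating each crawled row as a dict. The English column keys and the newline used to join merged values are assumptions standing in for the Korean column names in the CSV files.

```python
from typing import Dict, List

# Source columns folded into each merged column (hypothetical English stand-ins).
MERGE_MAP: Dict[str, List[str]] = {
    "qualifications": ["required_skills", "qualifications", "preferred"],
    "task": ["title", "job", "main_tasks"],
}


def merge_columns(row: Dict[str, str]) -> Dict[str, str]:
    """Return a new row in which similar columns are concatenated into one merged column."""
    merged_sources = {c for cols in MERGE_MAP.values() for c in cols}
    # Keep every column that is not folded into a merged column (company name, deadline, ...).
    out = {k: v for k, v in row.items() if k not in merged_sources}
    for target, sources in MERGE_MAP.items():
        # Skip empty source fields so the merged text has no blank lines.
        out[target] = "\n".join(row[c] for c in sources if row.get(c))
    return out
```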

In step S110, the same job posting is recognized and stored as a different posting when its deadline differs. The data processing unit 100 can therefore use drop_duplicates from the pandas module to remove duplicate rows that share the same 'company name + job posting title'.
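The text uses pandas' `drop_duplicates`. With pandas this would be along the lines of `df.drop_duplicates(subset=[company_col, title_col], keep="first")`. The sketch below shows the same rule with only the standard library, keeping the first row for each (company name, job posting title) pair. The key names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def drop_duplicate_postings(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first row per (company, title); rows differing only in deadline count as duplicates."""
    seen: Set[Tuple[str, str]] = set()
    unique: List[Dict[str, str]] = []
    for row in rows:
        key = (row["company"], row["title"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```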

In step S110, the raw job posting data will also be shown to the user. Before stopword removal, the data processing unit 100 can therefore keep the original text of each posting in a separate column.

In step S110, a job posting that contains the "\r" character causes rows to break unexpectedly when the CSV file is saved. The data processing unit 100 can therefore replace \r with \n.
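A sketch of this fix with the standard `csv` module: carriage returns inside a field are normalized to `\n` before writing. Converting `\r\n` pairs first, so they do not become blank lines, is an added assumption. The field names are hypothetical. The csv module quotes any field containing a newline, so each posting still reads back as a single row.

```python
import csv
import io
from typing import Dict, List


def normalize_newlines(text: str) -> str:
    r"""Turn Windows (\r\n) and bare \r line breaks into \n."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def write_postings_csv(rows: List[Dict[str, str]], fieldnames: List[str]) -> str:
    """Serialize rows to CSV text after normalizing line breaks in every field."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({k: normalize_newlines(v) for k, v in row.items()})
    return buf.getvalue()
```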

In step S110, the data processing unit 100 can compare tokenization with several tokenizers and then use the one that tokenizes best (i.e., best extracts the important words from a sentence).

A comparison of Kiwi, Mecab, Kkma, Okt, and others showed that the Kiwi and Mecab tokenizers distinguish the widest variety of part-of-speech tags and are best suited for training.

In step S110, the data processing unit 100 can use the Kiwi tokenizer to tokenize the stopword-processed data into 'common nouns, proper nouns, English, and roots'.

In step S110, the data processing unit 100 can use the Mecab tokenizer to tokenize data that raises an exception in Kiwi.
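A sketch of the fallback logic. The tokenizers are passed in as callables so that it runs without Kiwi or Mecab installed. It assumes each callable returns (surface form, POS tag) pairs. The tags kept here (NNG common noun, NNP proper noun, SL Latin-script word, XR root) are Sejong-style tags used by both Kiwi and Mecab-ko, but the exact tag set used by the system is not given in the text.

```python
from typing import Callable, FrozenSet, List, Tuple

Tagged = List[Tuple[str, str]]

# Common noun, proper noun, Latin-script (English) word, root.
KEEP_TAGS: FrozenSet[str] = frozenset({"NNG", "NNP", "SL", "XR"})


def tokenize(
    text: str,
    primary: Callable[[str], Tagged],
    fallback: Callable[[str], Tagged],
    keep: FrozenSet[str] = KEEP_TAGS,
) -> List[str]:
    """Tag with the primary tokenizer (Kiwi); if it raises, retry with the fallback (Mecab)."""
    try:
        tagged = primary(text)
    except Exception:
        tagged = fallback(text)
    # Keep only the parts of speech used for training.
    return [form for form, tag in tagged if tag in keep]
```

With the real libraries, `primary` would wrap Kiwi's `tokenize` (whose tokens expose `.form` and `.tag`). `fallback` would wrap KoNLPy's `Mecab().pos`, which already returns (morpheme, tag) tuples.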

In step S110, the data processing unit 100 can use the developed tokenizer to create two CSV files. For example, one contains the tokenized benefits and task columns, and the other contains the tokenized qualifications and task columns.

In step S110, the data processing unit 100 additionally tokenizes the task column in each CSV file because two learning models are built. For example, one model is trained on benefits/qualifications and the other on tasks.

In step S110, the data processing unit 100 uses two learning models because it recommends companies that resemble the user-entered company in one criterion, 'benefits/qualifications'. If the benefits/qualifications are similar but the actual job differs, the recommendation is meaningless to the user.

The data processing unit 100 trains a Doc2Vec model on the tokenized data. It tunes the Doc2Vec hyperparameters and adopts the values that give the best accuracy.

Doc2Vec was chosen as the machine learning method because it was judged better suited than a TensorFlow-based document embedding method: it is faster, and it allows training while tuning several hyperparameters.

In step S110, the data processing unit 100 first uses the task model, one of the two trained models. From it, the data processing unit 100 can extract the indices of companies that are at least 70% similar to the user-entered company.

In step S110, the lower bound of job similarity was set to 70% because the job model (the model trained on the task column) was trained on a large amount of data. The task column combines the 'job posting title, job, and main tasks'. As a result, postings with 70% similarity were in fact similar when the actual data were compared.

In step S110, the data processing unit 100 takes the companies extracted with the job model and extracts similar ones with the 'benefits/qualifications' model. It outputs them in descending order of similarity. The final result is the set of companies whose jobs are at least 70% similar and whose 'benefits/qualifications' are also similar.
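A sketch of the two-stage selection described above. It assumes each company already has two similarity scores against the user's posting: one from the task model and one from the benefits/qualifications model (for example, Doc2Vec cosine similarities). The 0.7 job threshold comes from the text. The second-stage threshold, the function name, and the job-only fallback flag (modeled on the nolist.html behavior) are hypothetical.

```python
from typing import Dict, List, Tuple


def recommend(
    task_sim: Dict[str, float],
    criterion_sim: Dict[str, float],
    task_threshold: float = 0.7,
    criterion_threshold: float = 0.5,  # hypothetical cutoff; not given in the text
) -> Tuple[List[str], bool]:
    """Return (companies, job_only).

    Stage 1 keeps companies whose job similarity is at least `task_threshold`.
    Stage 2 keeps those also similar in the chosen criterion, sorted by that similarity.
    If none pass stage 2, fall back to the job-only list (job_only=True).
    """
    job_matches = [c for c, s in task_sim.items() if s >= task_threshold]
    both = [c for c in job_matches if criterion_sim.get(c, 0.0) >= criterion_threshold]
    if both:
        return sorted(both, key=lambda c: criterion_sim[c], reverse=True), False
    return sorted(job_matches, key=lambda c: task_sim[c], reverse=True), True


task = {"A": 0.9, "B": 0.75, "C": 0.4}
crit = {"A": 0.6, "B": 0.8, "C": 0.99}
print(recommend(task, crit))  # (['B', 'A'], False)
```

Company C has the highest criterion similarity but is dropped in stage 1 because its job is not similar enough. This matches the rationale that matching benefits/qualifications alone is meaningless when the job differs.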

The job matching method according to an embodiment of the present invention may include learning and calculating the correlation between companies satisfying the company conditions and company preference according to at least one user specification (S120).

In step S120, the learning unit 200 may learn and calculate the correlation between companies satisfying the company conditions and company preference according to at least one user specification. The learning unit 200 may select and list the companies that satisfy the company conditions. From that list, it may then rank the companies the user is likely to prefer according to the correlation between company preference and the user's specifications.

In step S120, the learning unit 200 may match at least one user specification with company preference. The learning unit 200 may define the correlation between changes in the user specifications and changes in company preference as a probability and learn it. According to that probability, it may calculate the correlation between the company conditions and company preference according to at least one user specification.

In step S120, the learning unit 200 may measure the correlation by defining the relationship between changes in the user specification attributes and changes in company preference as a probability and learning it.

In step S120, the learning unit 200 may calculate the influence that at least one user specification has on the preference for the companies satisfying the company conditions.

In step S120, the learning unit 200 may calculate the influence, a value that depends on how strongly a change in the user specification affects the company preference, using Equation 1 below.

[Equation 1]

In step S120, the computational load in the learning unit 200 represents the expected value (average) of the influence. A large computational load means a large average influence. Because greater uncertainty makes classification harder, the attribute with the smallest computational load is placed at the upper decision node, which reduces the amount of computation. The computational load can be obtained by calculating Equation 2 below.

[Equation 2]
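Equation 2 is not reproduced in this text, so its exact form is unknown. The description (an expected value of the influence, harder classification under greater uncertainty, the smallest value placed at the upper decision node) resembles split selection in an ID3-style decision tree. The sketch below therefore substitutes Shannon conditional entropy for the computational load. This is an assumption, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of preferred companies."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def expected_entropy(records, spec):
    """Weighted average entropy of the preferred company after splitting on spec.

    Used here as a stand-in for the 'computational load' of Equation 2 (assumption).
    """
    groups = defaultdict(list)
    for user_specs, company in records:
        groups[user_specs[spec]].append(company)
    total = len(records)
    return sum(len(group) / total * entropy(group) for group in groups.values())


def root_spec(records, specs):
    """Spec with the smallest expected entropy, placed at the upper decision node."""
    return min(specs, key=lambda spec: expected_entropy(records, spec))


records = [
    ({"major": "CS", "region": "Seoul"}, "A"),
    ({"major": "CS", "region": "Busan"}, "A"),
    ({"major": "Business", "region": "Seoul"}, "B"),
    ({"major": "Business", "region": "Busan"}, "B"),
]
print(root_spec(records, ["region", "major"]))  # major
```

Under this reading, a specification that cleanly separates users' preferred companies has an expected entropy of zero and would sit at the root, while one that leaves preferences evenly mixed (here, region) is pushed lower.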

In step S120, the learning unit 200 may obtain the correlation by calculating Equation 3 below.

[Equation 3]

The job matching method according to an embodiment of the present invention may include a step (S130) of determining the ranking of the companies that satisfy the company conditions according to the correlation.

In step S130, the learning unit 200 may rank the user specifications according to the correlation. The learning unit 200 may determine the ranking in ascending order of correlation (lowest correlation first), and may give recommendation priority to the user specifications with low correlation.

In step S130, the learning unit 200 may determine a ranking that includes the company preferences according to combinations of a plurality of user specifications.
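The ranking over single specifications and their combinations can be sketched as follows. Because Equation 3 is not reproduced here, the correlation is passed in as a hypothetical callable; the sketch only shows enumerating combinations with `itertools.combinations` and ordering them from lowest to highest correlation, as the paragraphs above describe. The `max_size` limit is an assumption to keep the number of combinations manageable.

```python
from itertools import combinations


def rank_spec_combinations(specs, correlation, max_size=2):
    """Rank single specs and spec combinations by correlation, lowest first.

    correlation is a hypothetical callable mapping a tuple of spec names to a
    number; it stands in for Equation 3, which is not reproduced in the text.
    """
    combos = [
        combo
        for size in range(1, max_size + 1)
        for combo in combinations(specs, size)
    ]
    return sorted(combos, key=correlation)


# Hypothetical per-spec weights; a combination's score is their sum.
weights = {"school": 0.5, "major": 0.2, "region": 0.4}
ranked = rank_spec_combinations(list(weights), lambda combo: sum(weights[s] for s in combo))
print(ranked[:3])  # [('major',), ('region',), ('school',)]
```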

The job matching method according to an embodiment of the present invention may include a step (S140) of presenting the list of companies to the user in the order of the company ranking according to the correlation.

In step S140, the visualization unit 300 may present the list of companies to the user in the order of the company ranking according to the correlation.

In step S140, the visualization unit 300 may provide results ranked by the correlation between a plurality of user specifications and company preferences. The visualization unit 300 may also provide results ranked by the correlation between multiple combinations of the user specifications and company preferences.

In step S140, when the user is recommended similar companies and checks the information in a recommended company's job posting, the visualization unit 300 may visualize the data so that the user can easily see which parts were similar. The visualization unit 300 may use a WordCloud, which visualizes the key words that appear frequently in each company's job posting. On an additional web page, the visualization unit 300 may also plot similar companies on a two-dimensional coordinate plane using TSNE (t-SNE, a technique for reducing the dimensionality of high-dimensional vector data), giving the user an intuitive and engaging way to explore similar companies.

In step S140, the visualization unit 300 may deploy the system developed up to the earlier 'data embedding' stage on a web server built with Flask and output its results. Data is sent from the back end to the front end as parameters of the render_template call, which renders the HTML file shown to the client. On the front end, JavaScript and jQuery place the data received from the back end into components laid out in the HTML. Data is sent from the front end to the web server either with the GET method of a form tag or by including the data in the URL the front end requests, which the back end then parses and uses.
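The GET-based round trip between front end and back end can be illustrated with the standard library alone. In a Flask view, `request.args` would already hold the parsed query parameters, and the context dict would be passed as keyword arguments to `render_template`; here those steps are simulated with `urllib.parse` so the sketch runs without Flask. The parameter names `link` and `criterion` and the helper functions are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_request_url(endpoint, link, criterion):
    """Front end: encode the posting link and criterion into a GET URL."""
    return f"{endpoint}?{urlencode({'link': link, 'criterion': criterion})}"


def parse_request_url(url):
    """Back end: recover the parameters, as Flask's request.args would provide."""
    params = parse_qs(urlparse(url).query)
    return {key: values[0] for key, values in params.items()}


def result_context(params, similar_companies):
    """Keyword arguments one would pass to render_template("result.html", **context)."""
    return {
        "link": params["link"],
        "criterion": params["criterion"],
        "companies": similar_companies,
    }


url = build_request_url("/result", "https://example.com/posting/123", "benefits")
print(parse_request_url(url))  # {'link': 'https://example.com/posting/123', 'criterion': 'benefits'}
```

URL-encoding the posting link matters because the link itself contains `:`, `/`, and possibly `?` and `&`, which would otherwise be misread as part of the outer query string.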

For the HTML files that show visualization information, the browser must display freshly generated information rather than cached data. Caching is therefore disabled with the Cache-Control header, and a random number is appended to the URL whenever a visualization image is requested from the server. This way, even when new visualization output overwrites an image file with the same name, the user is shown the updated image.
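The two anti-caching measures can be sketched as a header value plus a helper that appends a random query parameter to an image URL. The exact Cache-Control directives and the parameter name `v` are assumptions; in Flask, such headers are commonly attached to every response in an `@app.after_request` hook.

```python
import random
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# Response headers telling the browser not to reuse a cached copy of the page.
NO_CACHE_HEADERS = {"Cache-Control": "no-cache, no-store, must-revalidate"}


def cache_busted(url, rng=random):
    """Append a random number as the (hypothetical) `v` query parameter.

    Because the URL changes on every request, the browser fetches the
    overwritten image file again instead of showing its cached copy.
    """
    parts = urlparse(url)
    query = parse_qsl(parts.query)
    query.append(("v", str(rng.randint(0, 10**9))))
    return urlunparse(parts._replace(query=urlencode(query)))


print(cache_busted("/static/plot.png"))  # prints /static/plot.png?v=<random number>
```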

Start page: hello.html

Loading page: index.html

Analysis result page: result.html

The user enters a JobPlanet job posting link in apply.html and then selects the criterion (benefits or qualifications) on which similar companies should be recommended.

In step S140, the web server extracts similar companies from the trained model according to the link and criterion the user entered, and the visualization unit 300 presents them to the user as a table in result.html.

In step S140, by clicking [View Details] for a company in result.html, the user receives from the visualization unit 300 a WordCloud analyzing the keywords of that company's job posting, together with the raw, unpreprocessed job posting text.

In step S140, if the user requests a recommendation based on benefits but the entered link contains no 'benefits' information, the system shows nobokri.html and asks the user to enter a link again.

In step S140, if no company is similar to the company in the entered link (that is, no company is similar in both job duties and the selected criterion), the system shows nolist.html to report that there are no similar companies, and instead recommends other companies whose 'job duties' are similar even though the 'entered criterion' is not.
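The page-selection branches described for result.html, nobokri.html, and nolist.html can be condensed into one function. This is a hypothetical sketch: the shape of the parsed posting (a dict of sections), the criterion strings, and the two precomputed candidate lists are assumptions, since the text does not say how they are represented.

```python
def choose_page(posting, criterion, similar_by_criterion, similar_by_job):
    """Return (template, companies) following the branches described above.

    posting: dict of parsed sections of the entered job posting (hypothetical shape).
    criterion: "benefits" or "qualifications" (hypothetical labels).
    """
    if criterion == "benefits" and not posting.get("benefits"):
        # No benefits section to compare: ask the user for another link.
        return "nobokri.html", []
    if not similar_by_criterion:
        # Nothing similar on both job and criterion: fall back to job-similar companies.
        return "nolist.html", similar_by_job
    return "result.html", similar_by_criterion


print(choose_page({"benefits": ""}, "benefits", ["A"], ["B"]))  # ('nobokri.html', [])
```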

In step S140, the user can move from result.html to display_plot.html, where the visualization unit 300 shows the visualization information.

In step S140, display_plot.html uses Plotly.js to draw a scatter plot on the HTML page from the TSNE coordinate data received from the server.

In step S140, the visualization unit 300 shows two plots: a scatter plot of the companies similar to the user-entered company on the 'user-selected criterion', and a scatter plot of the companies similar to it in 'job duties'.
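Plotly.js draws scatter plots from trace objects with `x`, `y`, `mode`, and `type` fields, so the server only needs to send the TSNE coordinates in that shape. The sketch below builds a JSON payload for the two plots described above; the payload keys (`criterion_plot`, `job_plot`) and the `(company, x, y)` input format are hypothetical.

```python
import json


def scatter_trace(name, points):
    """One Plotly.js scatter trace from (company, x, y) TSNE coordinates."""
    return {
        "type": "scatter",
        "mode": "markers",
        "name": name,
        "x": [x for _, x, _ in points],
        "y": [y for _, _, y in points],
        "text": [company for company, _, _ in points],  # company names for hover labels
    }


def plot_payload(by_criterion, by_job):
    """JSON for the two plots: criterion-similar and job-similar companies."""
    return json.dumps({
        "criterion_plot": [scatter_trace("similar by criterion", by_criterion)],
        "job_plot": [scatter_trace("similar by job", by_job)],
    })


payload = plot_payload([("A", 1.0, 2.0)], [("B", -1.0, 0.5), ("C", 0.0, 0.0)])
```

On the page, each list of traces could then be drawn with `Plotly.newPlot(divId, traces)`, one call per plot container.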

In step S140, linktoimg.html shows not only the WordCloud result and the raw job posting text but also the 'similarity' under each criterion, so that the user can directly check how reliable the system's recommendations are.
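The text does not say how the displayed 'similarity' is computed. Because companies are embedded as vectors before visualization, one common choice would be cosine similarity between embedding vectors, shown as a percentage. The sketch below assumes that; the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # treat a zero vector as dissimilar


def similarity_percent(u, v):
    """Similarity rounded to one decimal place as a percentage for display."""
    return round(100 * cosine_similarity(u, v), 1)


print(similarity_percent([1, 1], [1, 0]))  # 70.7
```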

In step S140, to present visualized information to the user, the visualization unit 300 reduces the data to two dimensions with a TSNE module and then plots it on a two-dimensional plane.

In step S140, the visualization unit 300 does not visualize every company; it visualizes only the companies within a certain distance of the user-entered company (that is, the most similar companies).
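Selecting only the companies near the user-entered company can be sketched as a radius filter on the 2-D coordinates. The coordinates might come from scikit-learn's `sklearn.manifold.TSNE(n_components=2).fit_transform(...)`, but the source does not say which TSNE module is used, whether distance is measured before or after the projection, or what radius applies; the function and its parameters are hypothetical.

```python
import math


def nearby_companies(coords, target, radius):
    """Companies whose 2-D point lies within `radius` of the target, nearest first.

    coords: {company_name: (x, y)}, e.g. the output of a TSNE projection.
    """
    center = coords[target]
    within = [
        name for name, point in coords.items()
        if name != target and math.dist(center, point) <= radius
    ]
    return sorted(within, key=lambda name: math.dist(center, coords[name]))


coords = {"me": (0.0, 0.0), "A": (1.0, 0.0), "B": (0.0, 3.0), "C": (0.5, 0.5)}
print(nearby_companies(coords, "me", 2.0))  # ['C', 'A']
```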

In step S140, two markers whose positions differ only very slightly can look like a single marker. To prevent this, when the distance between any two markers is very small, the visualization unit 300 shifts one of them slightly so that they do not overlap.
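Overlapping markers can be separated with a single pass that moves any point lying too close to an already placed point to a fixed offset in a random direction. This is a minimal sketch under assumed parameters (`min_dist` and a random offset direction); the patent does not specify how far or in which direction the marker is moved.

```python
import math
import random


def separate_overlaps(points, min_dist=0.05, rng=None):
    """Nudge points that sit closer than min_dist to an earlier placed point.

    points: list of (x, y). Returns a new list; for a close pair, the earlier
    point stays put and the later one is moved exactly min_dist away from it in
    a random direction. Single pass: a moved point is not re-checked against
    other placed points.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the layout is reproducible
    placed = []
    for x, y in points:
        for px, py in placed:
            if math.dist((x, y), (px, py)) < min_dist:
                angle = rng.uniform(0, 2 * math.pi)
                x, y = px + min_dist * math.cos(angle), py + min_dist * math.sin(angle)
                break
        placed.append((x, y))
    return placed


adjusted = separate_overlaps([(0.0, 0.0), (0.0, 0.0), (1.0, 1.0)], min_dist=0.1)
```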

In step S140, the visualization unit 300 generates WordClouds using the tokenized benefits/qualifications data as keywords.

If the user selected 'benefits' as the criterion for judging similarity, the WordCloud built from the tokenized 'benefits' data is shown; if the user selected 'qualifications', the WordCloud built from the tokenized 'qualifications' data is shown.
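Choosing the keyword set by criterion and counting it can be sketched with `collections.Counter`. The resulting frequency dict could be passed to the `wordcloud` package via `WordCloud(font_path=...).generate_from_frequencies(freqs)`; a Korean font path would be needed to render Hangul tokens. The input layout and function name are hypothetical.

```python
from collections import Counter


def wordcloud_frequencies(tokens_by_criterion, criterion, top_n=50):
    """Keyword frequencies for the WordCloud of the selected criterion.

    tokens_by_criterion: {"benefits": [...tokens...], "qualifications": [...]}
    (hypothetical layout of the tokenized posting data).
    """
    counts = Counter(tokens_by_criterion[criterion])
    return dict(counts.most_common(top_n))


tokens = {"benefits": ["meal", "bonus", "meal"], "qualifications": ["Python", "SQL"]}
print(wordcloud_frequencies(tokens, "benefits"))  # {'meal': 2, 'bonus': 1}
```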

So far, the present invention has been described with a focus on its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the invention may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the scope of equivalents should be construed as included in the present invention.

Claims

A job matching method comprising:
receiving at least one input of company conditions and user specifications desired by a user;
learning and calculating a correlation between companies satisfying the company conditions and company preferences according to at least one user specification;
determining a ranking of the companies satisfying the company conditions according to the correlation; and
presenting a list of the companies to the user according to the ranking of the companies according to the correlation.

The method according to claim 1, wherein the correlation calculation step comprises:
matching at least one user specification with company preferences;
learning the correlation between a change in the user specifications and a change in the company preference by defining it as a probability; and
calculating the correlation between the company conditions and the company preferences according to at least one user specification based on the probability.

The method according to claim 2, wherein, in the correlation calculation step, the correlation is calculated by Equation 1 below.
[Equation 1]

(Here, C(S, P) is the correlation, E(S) is the analysis relevance value, S is the user specification, and P is the company preference.)

The method according to claim 1, wherein the user specifications include at least one of school, major, gender, age, region, and MBTI.

The method according to claim 4, wherein the step of determining the ranking of the companies determines a ranking including company preferences according to a combination of a plurality of user specifications, and the presenting step provides results according to the ranking.

A job matching system comprising:
a data processing unit that receives at least one input of company conditions and user specifications desired by a user;
a learning unit that learns and calculates a correlation between companies satisfying the company conditions and company preferences according to at least one user specification;
a ranking unit that determines a ranking of the companies satisfying the company conditions according to the correlation; and
a visualization unit that presents a list of the companies to the user according to the ranking of the companies based on the correlation.

The system according to claim 6, wherein the learning unit matches at least one user specification with company preferences, learns the correlation between a change in the user specifications and a change in the company preference by defining it as a probability, and calculates the correlation between the company conditions and the company preferences according to at least one user specification based on the probability.

The system according to claim 7, wherein the learning unit calculates the correlation using Equation 1 below.
[Equation 1]

(Here, C(S, P) is the correlation, E(S) is the analysis relevance value, S is the user specification, and P is the company preference.)

The system according to claim 8, wherein the user specifications include at least one of school, major, gender, age, region, and MBTI.

The system according to claim 9, wherein the learning unit determines a ranking including company preferences according to a combination of a plurality of user specifications, and the visualization unit provides results according to the ranking.