KR102467081B1

KR102467081B1 - Video review system, method and program

Info

Publication number: KR102467081B1
Application number: KR1020210043908A
Authority: KR
Inventors: 이현빈; 강태근
Original assignee: 한밭대학교 산학협력단
Priority date: 2021-04-05
Filing date: 2021-04-05
Publication date: 2022-11-11
Anticipated expiration: 2041-04-05
Also published as: KR20220138145A

Abstract

A video review method according to an embodiment of the present invention includes: when a user inputs a video file to be checked and reviewed, analyzing, by a video review program running on the screen of a user terminal, the video file and extracting video data and audio data from it; extracting, by the video review program, at least one still frame per section from the extracted video data in units of a review time interval input by the user, and outputting the still frames to the screen of the user terminal; separately generating, by the video review program, subtitle information and volume information for each section corresponding to the per-section still frames from the extracted audio data; and outputting the per-section still frame, subtitle information, and volume information to the screen of the user terminal, grouped by still frame.

Description

Video review system, method and program

The present invention relates to a video review system and method, and more particularly to a video review system, method, and program that let a user efficiently check whether a video has problems by providing still frames, speech content, and volume information at a time interval input by the user.

As video content for education, culture, hobbies, and other purposes grows exponentially, the task of verifying that the videos uploaded to, maintained on, and managed by in-house systems and video content platforms such as YouTube and VIMEO are properly composed has become vast and correspondingly difficult.

To check whether videos have problems, a user or system administrator must play the videos one by one.

In a system that manages many videos, this task is very time-consuming and cumbersome, so a technique for quickly reviewing the overall content of a video is needed to efficiently check large numbers of videos and videos with long running times.

One way to quickly check whether a video has problems, and thus whether it is properly composed, is to show frames of the video at regular time intervals and output the audio information for the corresponding part.

Accordingly, research on a video review system and method has become necessary to solve the above problem.

Korean Registered Patent No. 10-1682076 (registered on November 28, 2016)

An object of the present invention is to provide a video review system and method that let a user efficiently check whether a video has problems by providing still frames, speech content, and volume information at a time interval input by the user.

The video review program includes: a video information unit that displays a video playback screen, playback-related menus, and information on the video file; a time display unit that, when the video shown in the video information unit is reviewed, accepts input of the video review time interval and displays a subtitle output time indicating where in the video's playback time the currently output subtitle falls; and a section information unit that, when the video shown in the video information unit is reviewed, displays section information including at least one still frame, subtitle information, and volume information for each section of the review time interval entered in the time display unit.

The method further includes: extracting, by the video review program, audio data by analyzing the video file to be reviewed; generating, by the video review program, PCM data from the audio data for each section according to the review time interval and subtitle output time input by the user, and transmitting the PCM data to a service providing server; inputting, by the service providing server, the received PCM data to an STT engine, generating an STT conversion result, and transmitting the result to the user terminal on which the video review program is running; and dividing, by the video review program, the STT conversion result by section and displaying subtitle information for each section corresponding to the video review time interval.

The video review method is performed by a video review program stored in a computer-readable storage medium.

A video review system using the video review method according to an embodiment of the present invention includes a service providing server that is connected through a communication network to a user terminal on which the video review program runs, and that has an STT engine for converting speech to subtitle text when the video review program running on the user terminal requests conversion of speech extracted from the video under review.

The service providing server includes: a communication unit that communicates with the user terminal through the communication network and has at least one communication protocol for exchanging the data needed to provide the video review service; a management unit that has the STT engine, performs speech-to-subtitle-text conversion for the video under review, and transmits the STT conversion result to the user terminal through the communication unit; and a display control unit that controls the user interface implemented in the video review program, plays the video from a still frame or plays the volume information in response to user input in the video review program, and, when the user performs an input action on subtitle information, provides an interface for directly retyping and correcting the subtitle text.

The service providing server further includes a statistical learning unit that collects video review history as the video review program is provided and manages it so that the information can be looked up at the user's request. The video review history includes video corrections and the corresponding video information (including subtitles and volume), and further includes user ratings of or satisfaction with the video review results. The statistical learning unit can perform machine-learning-based training on the collected review history and, as a result of the training, determine problems or errors in the video review program and report them to an administrator.

The present invention provides per-section still frames, subtitle information, and audio information separately so that video review can be carried out efficiently, and provides functions for finding and immediately correcting subtitle and audio errors in a given section. This not only improves user convenience but also enables review of diverse video content, which is advantageous for creating demand in the rapidly growing video content business.

In addition, the present invention applies a multi-angle, section-by-section review of still frames, audio (including volume state), and subtitles through a split screen, so that even vast amounts of video can be reviewed for errors efficiently on a per-section basis.

FIG. 1 is a block diagram showing the configuration of a video review system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the internal configuration of the video review system of FIG. 1 in detail.
FIG. 3 is a flowchart of a video review method according to an embodiment of the present invention.
FIG. 4 is a flowchart showing in detail the audio data conversion process in the video review method according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example screen of a video review program implementing the video review method according to an embodiment of the present invention.

Hereinafter, specific embodiments of the present invention are described in detail with reference to the drawings. The spirit of the present invention is not limited to the embodiments presented, however. A person skilled in the art who understands that spirit could readily propose other embodiments, whether retrogressive inventions or embodiments falling within the scope of the spirit of the present invention, by adding, changing, or deleting components within the same scope, and these too fall within the scope of the spirit of the present invention. In addition, components that have the same function within the scope of the same idea shown in the drawings of each embodiment are described using the same reference numerals.

FIG. 3 is a flowchart of a video review method according to an embodiment of the present invention, FIG. 4 is a flowchart showing in detail the audio data conversion process in that method, and FIG. 5 is a diagram showing an example screen of a video review program implementing that method.

The video review method according to an embodiment of the present invention includes the processes, carried out through the video review program running on the screen of the user terminal 200, of controlling operations such as separating the audio and video of the video to be presented on the user terminal 200 for review and converting speech to subtitles, and of processing the information on the sections displayed.

Referring to FIG. 5, the video review program may implement a user interface on the program screen that is divided into three main areas: a video information unit (V), a time display unit (T), and a section information unit (I).

The video information unit (V) displays information about the video, such as the video screen, playback-related menus (play, stop, pause, etc.), and video file information (title, file size, file format, running time, etc.). It may be positioned to the left of the section information unit (I) so that each video section can be checked intuitively.

The time display unit (T) lets the user check the section under review and the subtitle output time during video review. It may be provided as a bar above the video information unit (V) and the section information unit (I), so that the user can check the subtitle output time during review and easily enter the video review time interval.

The section information unit (I) provides section information, including a still frame, subtitle information, and volume information, for each section of the video review interval entered for the review.

The sections of the video review interval may be obtained by dividing the review interval at fixed time intervals or video frame intervals, so that section information is determined for at least one section.

The still frame may be a still image of a video frame in the corresponding section.

The subtitle information and volume information may include the subtitle and volume data of the corresponding section. The subtitle information is displayed as text, and the volume information is provided as a button; a user input action on the volume information button (for example, a mouse click or a touch) may cause the audio of the corresponding section to be output.

A user input action on a still frame may likewise cause playback to start from that frame of the corresponding section. Because video, subtitles, and volume can be reviewed together section by section, the user can easily match subtitles and volume to each section of the video, which makes faulty parts easier to find during review.

Specifically, the user of the video review program can enter the time interval at which to review the video, and the program then lets the user check, at the entered interval, the still frame of the corresponding part, the subtitle content over a certain period, and the volume information.

The user checks the subtitle and volume information section by section and, if a problem is suspected, can click the still frame to play the video from that part and examine it in detail.

The information that the video review program provides to the user or administrator is, broadly, the starting still frame of each video section, subtitle information including the subtitle content, and volume information.

FIG. 5 illustrates how the video-related information to be provided is processed.

The video file to be checked is split into a video part and an audio part and analyzed. From the video data, still frames are extracted at the time unit entered by the user and output to the screen. From the audio data, per-section subtitle information and volume information are generated and output together with the still frame of the corresponding part.

The video/audio analysis and conversion process for section-by-section video review is now described with reference to FIG. 3.

First, when the user inputs the video file to be checked and reviewed, the video review program analyzes the file and extracts the video part and the audio part separately (S100 to S104).

From the extracted video data, a still frame for each section is extracted at the time unit entered by the user and output to the screen of the user terminal 200 (S106).
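As an illustration of step S106, the following is a minimal sketch of how section start times could be derived from the review interval and turned into a frame-grab command. The specification does not name an extraction tool; the use of the ffmpeg command line here, and the names `section_starts` and `ffmpeg_frame_command`, are assumptions made only for this example.

```python
def section_starts(duration_s: float, interval_s: float) -> list:
    """Return the start time, in seconds, of each review section."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    starts = []
    index = 0
    # A section starts at every multiple of the interval that lies before the end.
    while index * interval_s < duration_s:
        starts.append(index * interval_s)
        index += 1
    return starts


def ffmpeg_frame_command(video_path: str, timestamp_s: float, out_path: str) -> list:
    """Build an ffmpeg argument list that grabs one frame at timestamp_s.

    Assumption: ffmpeg is the extraction tool; the patent does not specify one.
    """
    return [
        "ffmpeg",
        "-ss", f"{timestamp_s:.3f}",  # seek to the section start
        "-i", video_path,
        "-frames:v", "1",             # write a single video frame
        out_path,
    ]


# Example: a 95-second video reviewed every 30 seconds yields four sections.
starts = section_starts(95, 30)
commands = [ffmpeg_frame_command("input.mp4", t, f"frame_{i}.png")
            for i, t in enumerate(starts)]
```

The commands are only built, not executed, so the sketch runs without ffmpeg installed; passing each list to `subprocess.run` would perform the extraction.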

From the audio data, subtitle information and volume information are generated separately for each section corresponding to the video review time interval entered by the user and, as shown in FIG. 5, are output grouped by the still frame of the corresponding part (S108 to S112).
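The specification does not define how the volume information is computed. One plausible reading, shown here purely as an assumption, is an RMS level per section expressed in dB relative to full scale, computed from 16-bit little-endian mono PCM samples. The function name `rms_dbfs` is hypothetical.

```python
import math
import struct


def rms_dbfs(pcm: bytes) -> float:
    """RMS level of 16-bit little-endian mono PCM, in dB relative to full scale.

    Returns negative infinity for empty or completely silent input.
    """
    count = len(pcm) // 2
    if count == 0:
        return float("-inf")
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    mean_square = sum(s * s for s in samples) / count
    if mean_square == 0:
        return float("-inf")
    # 32768 is the magnitude of the most negative 16-bit sample.
    return 20 * math.log10(math.sqrt(mean_square) / 32768)


# Example: a half-scale square wave sits about 6 dB below full scale.
half_scale = struct.pack("<4h", 16384, 16384, -16384, -16384)
level = rms_dbfs(half_scale)
```

A reviewer could then spot a section that is silent or far quieter than its neighbours without playing it.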

The specific process for generating subtitles from the audio data is shown in FIG. 4. Speech content can be converted into subtitles using Speech-to-Text (STT) technology, which converts speech into text.

To convert speech into subtitles, a service providing server 100 equipped with an STT engine is provided on the communication network 300. The user terminal 200 runs the video review program, transmits the audio data extracted from the video to the service providing server 100, and receives the STT conversion result, that is, the subtitle information.

Specifically, referring to FIG. 4, the video review program first extracts audio data by analyzing the video file to be reviewed (S200, S202).

The video review program generates Pulse-Code Modulation (PCM) data from the audio data for each section, according to the review time interval and subtitle output time entered by the user, and transmits it to the service providing server 100 (S204, S206).
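Steps S204 and S206 can be sketched as slicing one decoded PCM buffer into per-section chunks. The sketch assumes that the subtitle output time is the length of audio taken from the start of each section; the specification does not spell this out, and the function name `slice_pcm` and its parameters are hypothetical.

```python
def slice_pcm(pcm: bytes, sample_rate: int, interval_s: float, window_s: float,
              bytes_per_sample: int = 2, channels: int = 1) -> list:
    """Cut a PCM buffer into one chunk per review section.

    Each chunk starts at a multiple of interval_s and covers window_s seconds
    (assumed here to be the subtitle output time), clipped at the end of the audio.
    """
    frame_bytes = bytes_per_sample * channels
    total_frames = len(pcm) // frame_bytes
    step = int(interval_s * sample_rate)
    window = int(window_s * sample_rate)
    if step <= 0 or window <= 0:
        raise ValueError("interval_s and window_s must cover at least one sample")
    chunks = []
    for start in range(0, total_frames, step):
        end = min(start + window, total_frames)
        chunks.append(pcm[start * frame_bytes:end * frame_bytes])
    return chunks


# Example: ten 2-byte samples at 1 Hz, a section every 4 s, 2 s of audio each.
chunks = slice_pcm(bytes(range(20)), sample_rate=1, interval_s=4, window_s=2)
```

Each chunk could then be sent to the server as its own request, which keeps every STT result tied to exactly one section.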

The service providing server 100 inputs the received PCM data to the STT engine, generates an STT conversion result, and transmits it to the user terminal 200 on which the video review program is running (S208).

The video review program divides the STT conversion result by section and displays subtitle information to the user for each section corresponding to the video review time interval (S210).
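Step S210 can be sketched as grouping recognized words into sections by their timestamps. The format of the STT conversion result is not given in the specification; the sketch assumes a list of (start time in seconds, text) pairs, and the function name `group_by_section` is hypothetical.

```python
def group_by_section(words: list, interval_s: float, section_count: int) -> list:
    """Group timestamped words into one subtitle string per review section.

    words is assumed to be a list of (start_s, text) pairs in playback order.
    """
    sections = [[] for _ in range(section_count)]
    for start_s, text in words:
        index = int(start_s // interval_s)
        if 0 <= index < section_count:  # ignore words outside the reviewed range
            sections[index].append(text)
    return [" ".join(parts) for parts in sections]


# Example: four recognized words spread over three 30-second sections.
words = [(0.5, "hello"), (1.2, "world"), (31.0, "next"), (65.0, "last")]
subtitles = group_by_section(words, interval_s=30, section_count=3)
```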

When requesting STT conversion of the speech extracted from the video under review, the user terminal 200 may let the user select a desired language and transmit that language information with the request. The STT engine then provides the STT conversion result in that language; if multiple languages are requested, multiple conversion results may be provided.

Furthermore, the per-section subtitle information shown on the user terminal 200 may contain errors depending on the conversion result. By letting the user check the audio of each section and directly edit the converted text, the user can easily correct the subtitles of a specific section.

In addition, the subtitle information may be matched against the per-section still frames to check whether it corresponds to the subtitle of that section, so that subtitle synchronization can be adjusted.

FIG. 1 is a block diagram showing the configuration of a video review system according to an embodiment of the present invention, and FIG. 2 is a block diagram showing the internal configuration of the video review system of FIG. 1 in detail.

The video review system of the present invention includes a user terminal 200 on which the video review program runs, and a service providing server 100 that is connected to it through a communication network 300 and has an STT engine for converting speech to subtitle text.

The user terminal 200 is a personal terminal capable of running the video review program; it may be, for example, a PC, or a portable terminal such as a laptop, tablet, smartphone, or phablet.

The service providing server 100 has an STT engine. When the video review program running on the user terminal 200 requests conversion of speech extracted from the video under review into subtitle text, the server performs STT conversion and provides the result to the user terminal 200.

In addition to the program being installed on the user terminal 200, the service providing server 100 may also provide the video review program as a web-based service.

As a detailed configuration for providing the program service, whether installed on the user terminal 200 or web-based, the service providing server 100 may further include a communication unit 110, a management unit 120, a display control unit 130, a statistical learning unit 140, and a database 150, as shown in FIG. 2.

The communication unit 110 communicates with the user terminal 200 through the communication network 300 and exchanges the data needed to provide the video review service; to this end it may include one or more communication protocols compatible with the communication network 300.

The management unit 120 has the STT engine, performs speech-to-subtitle-text conversion for the video under review, and transmits the conversion result (subtitle information including subtitle text) to the user terminal 200 through the communication unit 110.

When the video review program is provided on a web basis, the management unit 120 may further include functions for providing the program service; for example, it may receive member information and manage the program for each member.

Besides providing the program service through membership, a trial version and a paid full version may be offered. For the full version, payment may be made at each login, or a fixed-term payment scheme may be adopted.

When the program is operated on a web basis on the service providing server 100, a server login may be attempted.

Login authentication requires an initial member registration. A non-member may use the program temporarily after personal information verification, but benefits offered to members, such as discounts, may be reduced.

Login authentication may proceed by matching against the personal information provided at initial registration; for this purpose, the service providing server 100 may encrypt the personal information and store it in the database 150.

Once login authentication is complete, the program is made available to run on the user terminal 200 in web-based form through the communication network 300.

Furthermore, integrity verification may be performed at login authentication, and member information may be managed and stored on a blockchain basis to protect individual member information.

Specifically, a blockchain network may be built in conjunction with multiple blockchain servers; public and private keys are generated through the pre-built internal blockchain network, converted into hash values, and stored in a distributed manner; and user login authentication is performed based on the distributed public key and the user's personal information.

Furthermore, unique personal user information may be received together with a public key from each of multiple user terminals 200, and a user certificate containing a hash value of the user information may be generated for each; the user certificates may be stored in a Merkle tree structure.

For example, each user certificate (transaction) is stored, together with its hash value, in a lowest-level child node, and the Merkle root (parent node), the top level of the Merkle tree, is hashed and stored so that it shares hash values with the intermediate nodes on the path leading to each lowest-level child node.

When the authenticity of a stored user certificate is then checked, the user certificate copied to the individual's user terminal 200 is compared with the user certificate in the database 150, and the comparison involves only the hash values along the path of the Merkle tree.

Because the comparison is performed along a path of the Merkle tree, there is no need to compare the blocks of every node. Authenticity can therefore be determined with relatively little computation, forgery or tampering of a transaction can be found easily and quickly, and transactions can be verified easily even on a user terminal 200 in the form of a small-capacity portable device.
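The path-only comparison described above can be sketched with a small Merkle tree over user certificates. The hash function (SHA-256), the rule of duplicating the last hash on odd-sized levels, and all function names are assumptions for this example; the specification fixes none of them.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest (the hash function is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def merkle_levels(leaves: list) -> list:
    """Return every level of the tree, from leaf hashes up to the root."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # duplicate the last hash on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(levels: list, index: int) -> list:
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling is on the left)
        index //= 2
    return proof


def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    """Recompute the root from one leaf and its path, then compare."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Example: three certificates; each is verified with only two sibling hashes.
certs = [b"cert-a", b"cert-b", b"cert-c"]
levels = merkle_levels(certs)
root = levels[-1][0]
```

Verification touches one sibling hash per tree level, so the work grows with the logarithm of the number of certificates instead of their total count, which is the saving the paragraph above refers to.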

The display control unit 130 controls the user interface implemented in the video review program. Specifically, it controls the composition of the program screen shown in FIG. 5 and, as described above for the displayed video information unit (V), time display unit (T), and section information unit (I), plays the video from a still frame or plays the volume information in response to user input. When the user performs an input action on subtitle information, it may also provide an interface for directly retyping and correcting the subtitle text.

As the video review program is provided to each member, the statistical learning unit 140 collects each member's video review history and stores it in the database 150, managing it so that the member can look up the information on request. The per-member video review history includes the corrections made to the video and the corresponding video information (including subtitles, volume information, and the like), and may further include the user's evaluation of, or satisfaction with, the video review result.

Furthermore, the statistical learning unit 140 may perform machine learning-based learning using the collected video review history and, as a learning result, may identify problems or errors in the video review program and report them to a manager.

For example, during machine learning, a reference value for judging an error or problem is set for each review area (subtitles, audio, video playback section, and so on). When users raise a problem with an area in their post-review evaluation or satisfaction rating, the number of times the problem has been raised is compared with the reference value. If the count exceeds the reference value, an error or problem is judged to exist, and the result is provided to a preset manager. The machine learning-based learning here may be a neural network or pattern classification learning algorithm that uses at least one of known techniques such as CNN, RNN, and SVM.
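The reference-value comparison alone, without any learning model, might look like the following minimal sketch. The area names, the data layout, and the function name `flag_problem_areas` are hypothetical and are not taken from the specification.

```python
from collections import Counter


def flag_problem_areas(feedback: list[list[str]], thresholds: dict[str, int]) -> list[str]:
    """Return the review areas whose complaint count exceeds its reference value.

    feedback:   one list per completed review, naming the areas the user complained about
    thresholds: reference value per review area (hypothetical values)
    """
    # Count each area at most once per review
    counts = Counter(area for areas in feedback for area in set(areas))
    return sorted(area for area, limit in thresholds.items() if counts[area] > limit)


thresholds = {"subtitle": 2, "audio": 2, "playback": 1}
feedback = [["subtitle"], ["subtitle", "audio"], ["subtitle"], ["playback"]]
print(flag_problem_areas(feedback, thresholds))
```

Here only the subtitle area is reported, because it was raised three times against a reference value of two, while the other areas stay at or below their limits.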

The database 150 may collect, store, and manage member information, video information, video review information, statistics, and the like.

In this specification, a 'terminal' may be a wireless communication device with guaranteed portability and mobility, for example any kind of handheld wireless communication device such as a smartphone, tablet PC, or laptop. A 'terminal' may also be a wired communication device, such as a PC, that can connect to other terminals or servers through a communication network. A communication network is a connection structure that allows information to be exchanged between nodes such as terminals and servers, and includes a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, wired and wireless television networks, and the like.

Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), WiMAX (Worldwide Interoperability for Microwave Access), Wi-Fi, Bluetooth communication, infrared communication, ultrasonic communication, visible light communication (VLC), and LiFi.

100; service providing server
110; communication unit
120; management unit
130; display control unit
140; statistical learning unit
150; database
200; user terminal
300; communication network
V; video information unit
T; time display unit
I; section information unit

Claims

A video review method comprising:
analyzing, by a video review program executed on the screen of a user terminal, a video file that the user inputs for checking and review, and extracting video data and audio data from it;
extracting, by the video review program, at least one video still screen for each section from the extracted video data in units of a review time interval input by the user, and outputting it to the screen of the user terminal;
generating, by the video review program, subtitle information and volume information for each section, corresponding to the video still screen for each section, separately from the extracted audio data; and
outputting the video still screen, subtitle information, and volume information for each section to the screen of the user terminal, grouped by the corresponding video still screen,
wherein the method further comprises:
extracting, by the video review program, audio data through analysis of the video file to be reviewed;
generating, by the video review program, PCM data for each section according to the review time interval and subtitle output time input by the user, and transmitting the PCM data to a service providing server;
inputting, by the service providing server, the received PCM data to an STT engine, generating an STT conversion result, and transmitting the result to the user terminal on which the video review program is executed; and
displaying, by the video review program, subtitle information for each section corresponding to the video review time interval by dividing the STT conversion result by section and outputting it,
wherein, when the user terminal requests the STT engine to perform STT conversion of the audio extracted from the video to be reviewed, a desired language can be selected and the language information is transmitted with the request, and the STT engine provides an STT conversion result in that language, providing a plurality of conversion results when a plurality of languages is requested,
wherein errors in the subtitle information for each section checked on the user terminal are corrected through user modification and conversion so that the corresponding audio information for each section can be checked, and subtitle sync is adjusted by matching the subtitle information with the video still screen for each section and checking whether it corresponds to the subtitle of that section, and
wherein, when the video review program is executed in web-based form on the user terminal, login authentication is performed by the service providing server, which builds a blockchain network in conjunction with a plurality of blockchain servers, generates a public key and a private key through the built internal blockchain network, converts them into hash values and stores them in a distributed manner, performs user login authentication based on the distributed public key and the user's personal information, receives user information together with the public key from a plurality of user terminals, generates user certificates each including a hash value of the user information, and stores each user certificate using a Merkle tree structure.

The video review method of claim 1,
wherein the video review program includes:
a video information unit that displays a video playback screen, a menu related to video playback, and information on the video file, so as to present information about the video;
a time display unit that, when the video displayed in the video information unit is reviewed, displays the video review time interval so that it can be input, and displays a subtitle output time so that the user can confirm which playback time section of the video the currently output subtitle corresponds to; and
a section information unit that, when the video displayed in the video information unit is reviewed, displays section information including at least one video still screen, subtitle information, and volume information for each section of the video review time interval input through the time display unit.

(Deleted)

A video review program stored in a computer-readable storage medium that performs the video review method of claim 1 or 2.

A video review system using the video review method of claim 1 or 2, the system comprising:
a service providing server that is connected through a communication network to a user terminal on which the video review program is executed, and that has an STT engine which converts audio into subtitle text when the video review program executed on the user terminal requests subtitle text conversion of the audio extracted from the video to be reviewed.

The video review system of claim 5,
wherein the service providing server includes:
a communication unit that has at least one communication protocol for communicating with the user terminal through the communication network and performs data communication of the information needed to provide a video review service;
a management unit that has the STT engine, performs audio-to-subtitle text conversion for the video to be reviewed, and transmits the STT conversion result to the user terminal through the communication unit; and
a display control unit that controls the user interface implemented in the video review program, plays the video from a video still screen or plays volume information according to a user input operation made through the video review program, and, when a user input operation is performed on subtitle information, provides an interface through which the text of that subtitle information can be directly re-entered and corrected.

The video review system of claim 6,
wherein the service providing server further includes:
a statistical learning unit that collects video review history as the video review program is provided and manages it so that the information can be viewed upon user request, the video review history further including user evaluation or satisfaction, and that can perform machine learning-based learning using the collected video review history, determines problems or errors in the video review program as a learning result, and provides the result to a manager.