KR102899190B1

KR102899190B1 - Method and system for generating gesture-enhanced realistic digital human tutor

Info

Publication number: KR102899190B1
Application number: KR1020220079272A
Authority: KR
Inventors: 황민철; 김경빈; 목수빈; 윤대호; 조아영
Original assignee: 상명대학교산학협력단
Priority date: 2022-06-28
Filing date: 2022-06-28
Publication date: 2025-12-10
Anticipated expiration: 2042-06-28
Also published as: KR20240002080A

Abstract

The present disclosure relates to a method and system for generating a gesture-augmented digital human tutor (DHT). The method for generating a digital human tutor includes: acquiring a half-body or full-body image of an instructor and a lecture video; generating a digital human tutor corresponding to the instructor's facial image by using that facial image; extracting external features of the instructor's face and gestures during a lecture from the lecture video; and activating the digital human tutor by reflecting the external features in the digital human tutor.

Description

Method and system for generating gesture-enhanced realistic digital human tutor

The present disclosure relates to a method for generating a digital human tutor that exists in a virtual space, and more particularly, to a method and device for generating a digital human tutor whose gestures are augmented so that it responds empathetically to a learner's reactions.

As the recent spread of viral infections has increased demand for non-face-to-face services, indirect online communication has grown at the expense of face-to-face communication, and the resulting disinhibition effect, in which empathy is diminished, has emerged as a significant problem.

Therefore, to enhance empathic interaction between instructors and learners, it is necessary to recognize learners' attitudes remotely and to adjust the instructor's responses or feedback accordingly.

When an instructor mimics a learner's various reactions, the learner perceives the instructor as similar to himself or herself, which strengthens social bonding, and the learner consequently tends to empathize more with the instructor.

Therefore, technology that recognizes a learner's facial expressions or reactions in real time and causes a digital human tutor, acting as a virtual instructor, to respond to or imitate those expressions can induce empathy in the learner and improve learning effectiveness.

Joinson, A. N. (2007). Disinhibition and the Internet. In Psychology and the Internet (pp. 75-92). Academic Press.

Suler, J. (2004). The online disinhibition effect. Cyberpsychology and behavior, 7(3), 321-326.

Chartrand, T. L., and Van Baaren, R. (2009). Human mimicry. Advances in experimental social psychology, 41, 219-274.

Schutte, N. S., and Stilinović, E. J. (2017). Facilitating empathy through virtual reality. Motivation and emotion, 41(6), 708-712.

Choi, Wonkyung (2020). Comparison of satisfaction with face-to-face and virtual lectures. Journal of English Education, 19(4), 223-245.

Kim, Sang-mi (2020). Analysis of domestic media reports on online education related to COVID-19. Journal of the Korea Digital Contents Society, 21(6), 1091-1100.

Yoon, Bo-ram (2018). The effect of virtual avatar appearance on user's social presence in an augmented reality-based remote collaboration system.

Heidicker, P., Langbehn, E., and Steinicke, F. (2017, March). Influence of avatar appearance on presence in social VR. In 2017 IEEE Symposium on 3D User Interfaces (3DUI) (pp. 233-234). IEEE.

Zibrek, K., Kokkinara, E., and McDonnell, R. (2018). The effect of realistic appearance of virtual characters in immersive environments-does the character's personality play a role?. IEEE transactions on visualization and computer graphics, 24(4), 1681-1690.

Lee, Woo-ri, and Hwang, Min-cheol (2014). Emotion recognition accuracy for standard Korean facial expression images. Journal of the Korea Contents Association, 14(9), 476-483.

Jo, D., Kim, K. H., and Kim, G. J. (2017). Effects of avatar and background types on users' co-presence and trust for mixed reality-based teleconference systems. In Proceedings the 30th Conference on Computer Animation and Social Agents (pp. 27-36).

The present disclosure proposes a method and system for generating a gesture-augmented digital human tutor that can enhance the sense of presence of the digital human tutor in a virtual space and effectively induce empathy in a learner.

The present disclosure proposes a method and device for generating a gesture-augmented digital human tutor that renders the digital human tutor realistically by applying to it, as a virtual avatar, the external features of an online instructor, including the instructor's gestures, together with the unconscious micro-expressions of the instructor's face.

The present disclosure proposes a method and system for generating a gesture-augmented digital human tutor that augments the instructor's gestures so that learner reactions can be recognized and imitated in a non-face-to-face learning situation, thereby effectively inducing learner responses.

A method for generating a digital human tutor according to the present disclosure includes:

acquiring, by a camera, a facial image and a lecture video of an actual instructor;

extracting, by an image processor, joint (keypoint) coordinates and external facial features of the instructor from the captured footage;

determining, by a feature processor, changes in the external features from the external facial features, and an augmentation weight for the values of the joint coordinates;

generating, by a character generation unit, a digital human tutor corresponding to the instructor's facial image by using that facial image; and

activating, by a character control unit, the digital human tutor by reflecting the external features in the digital human tutor and changing the joint coordinates of the digital human tutor according to the augmented joint coordinates.

According to a specific embodiment of the present disclosure, the augmented joint coordinates may correspond to the upper body region of the actual instructor.

According to a specific embodiment of the present disclosure, the joint coordinates are coordinates associated with a specific part of the human body that includes the joint.

According to a specific embodiment of the present disclosure, the step of extracting the joint coordinates may include:

extracting, by the image analysis unit, joint coordinates on a two-dimensional plane; and

extracting three-dimensional joint coordinates (x, y, z) by using a 3D analyzer to infer a third direction (z) perpendicular to the two-dimensional plane.

According to another embodiment of the present disclosure, in the step of determining the augmentation weight, a weight may be determined for at least one of the three coordinates x, y, and z of the three-dimensional joint coordinates.
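The per-axis weighting described above can be illustrated with a minimal sketch. The text here does not give the weighting formula, so the sketch assumes one plausible reading: each joint's offset from a reference joint is scaled independently along x, y, and z. The function name `augment_joints`, the choice of the neck as the reference joint, and the weight values are all hypothetical.

```python
from typing import Dict, Tuple

Joint = Tuple[float, float, float]


def augment_joints(joints: Dict[str, Joint], anchor: str,
                   weights: Joint) -> Dict[str, Joint]:
    """Scale each joint's offset from the anchor joint by per-axis weights.

    Assumption: augmentation is a per-axis scaling about a reference joint.
    A weight of 1.0 leaves that axis unchanged.
    """
    ax, ay, az = joints[anchor]
    wx, wy, wz = weights
    augmented = {}
    for name, (x, y, z) in joints.items():
        augmented[name] = (
            ax + wx * (x - ax),
            ay + wy * (y - ay),
            az + wz * (z - az),
        )
    return augmented


# Hypothetical pose: the wrist is exaggerated along x and z only.
pose = {"neck": (0.0, 0.0, 0.0), "right_wrist": (0.2, -0.1, 0.05)}
augmented_pose = augment_joints(pose, anchor="neck", weights=(1.5, 1.0, 2.0))
```

Scaling about a reference joint keeps the anchor fixed, so the gesture grows outward from the body instead of shifting the whole figure.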

According to another embodiment of the present disclosure, the digital human is a digital human tutor (DHT) that leads learning through video, and a learning video may be shown on a display together with the DHT.

According to one or more embodiments, feature points may be extracted from the facial image, and the appearance of the digital human tutor may be set using the feature points.

According to one or more embodiments, the feature points may be selected from the landmarks defined in the Facial Action Coding System (FACS).

According to one or more embodiments, feature points of the instructor may be extracted from the lecture video, motion data of the extracted feature points may be extracted, and micro-expression data may be extracted from the motion data.

According to one or more embodiments, to extract the micro-expression data, a KLT (Kanade-Lucas-Tomasi) tracking algorithm or a TM (Transformation Matrix) based tracking algorithm may be applied to the feature point tracking to calculate the micro-expression data (MED).

According to one or more embodiments, to extract unconscious micro-expression data from the micro-expression data, the micro-expression data may be filtered at a predetermined frequency, the periodicity of the heartbeat may be determined by principal component analysis (PCA) of the filtered micro-expression data, and the periodicity may be used as an input value for the micro-expressions of the digital human tutor.

According to one or more embodiments, the external features of the instructor are extracted as landmarks defined in FACS, and the external features may be reflected in the digital human tutor in units of action units (AUs) based on the landmarks.

A system for generating a digital human tutor according to the present disclosure includes:

one or more cameras that acquire a facial image and a lecture video of an actual instructor;

an image processor that extracts joint (keypoint) coordinates and external facial features of the actual person from the captured footage;

a feature processor that extracts changes in the external facial features and changes in the joint coordinates from the facial video of the instructor during the lecture, and determines an augmentation weight for the values of the joint coordinates;

a character generation unit that generates a digital human tutor corresponding to the instructor's facial image by using that facial image;

a character control unit that activates the digital human tutor by reflecting the external features in the digital human tutor and changing the joint coordinates of the digital human tutor according to the augmented joint coordinates; and

a lecture video generation unit that generates a lecture video including the digital human tutor.

According to one or more embodiments, the character generation unit may extract feature points from the facial image and set the appearance of the digital human tutor using the feature points.

According to one or more embodiments, the character generation unit may select the feature points from the landmarks defined in FACS.

According to one or more embodiments, the feature processor may extract feature points of the instructor from the lecture video, extract motion data of the extracted feature points, and extract micro-expression data from the motion data.

According to one or more embodiments, the feature processor may apply a KLT (Kanade-Lucas-Tomasi) tracking algorithm or a TM (Transformation Matrix) based tracking algorithm to the feature point tracking to extract the micro-expression data.

According to one or more embodiments, to extract unconscious micro-expression data from the micro-expression data, the feature processor may filter the micro-expression data at a predetermined frequency, determine the periodicity of the heartbeat by principal component analysis (PCA) of the filtered micro-expression data, and use the periodicity as an input value for the micro-expressions of the digital human tutor.

According to one or more embodiments, the feature processor extracts the external features of the instructor as landmarks defined in FACS, and may reflect the external features in the digital human tutor in units of action units (AUs) based on the landmarks.

FIG. 1 is a flowchart showing a process for detecting facial and joint features of an instructor according to one or more embodiments.
FIG. 2 illustrates the arrangement of facial landmarks defined in the Facial Action Coding System (FACS).
FIG. 3 shows the results of a facial video, face detection, and facial landmark detection process according to one or more embodiments.
FIG. 4 is a flowchart showing a process for extracting unconscious facial micro-expressions caused by heartbeats from the face region of an instructor video captured by a camera, according to one or more embodiments.
FIG. 5 is a flowchart of a process for determining a heartbeat signal, including a sliding window technique applied to the aforementioned micro-expression data (MED).
FIG. 6 is a flowchart illustrating the concept of a gesture augmentation method according to one or more embodiments.
FIG. 7 shows several joints extracted from an actual person in a gesture augmentation method according to one or more embodiments.
FIG. 8 shows an original image of a digital human tutor (DHT) to which the gesture augmentation method according to one or more embodiments has not been applied.
FIGS. 9 to 12 are images showing the results of augmenting the hand portion of the DHT along each of the x, y, and z coordinates in a gesture augmentation method according to one or more embodiments.
FIG. 12 is an image showing the result of augmenting the hand coordinates in the x, y, and z directions in a gesture augmentation method according to one or more embodiments.
FIG. 13 compares a DHT before gesture augmentation with a DHT after gesture augmentation is completed in a gesture augmentation method according to one or more embodiments.
FIG. 14 is a diagram illustrating a process of producing a lecture video using a DHT according to one or more embodiments.
FIG. 15 is a block diagram of a system for producing a lecture video using a DHT according to one or more embodiments.
FIG. 16 illustrates a playback system for a lecture video produced according to one or more embodiments.

Hereinafter, preferred embodiments of the inventive concept are described in detail with reference to the accompanying drawings. However, the embodiments may be modified in various other forms, and the scope of the inventive concept should not be construed as limited to the embodiments described below. The embodiments are provided to explain the inventive concept more completely to a person of ordinary skill in the art. Like reference numerals denote like elements throughout. Furthermore, the various elements and regions in the drawings are drawn schematically. Accordingly, the inventive concept is not limited by the relative sizes or spacings shown in the accompanying drawings.

Terms such as "first" and "second" may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, a first component may be referred to as a second component, and conversely a second component may be referred to as a first component, without departing from the scope of the inventive concept.

The terminology used in this application is used only to describe specific embodiments and is not intended to limit the inventive concept. A singular expression includes the plural unless the context clearly indicates otherwise. In this application, expressions such as "comprises" or "has" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or addition of one or more other features, numbers, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the inventive concept pertains. Furthermore, commonly used terms, such as those defined in dictionaries, should be interpreted as having a meaning consistent with their meaning in the context of the relevant technology, and should not be interpreted in an overly formal sense unless explicitly so defined herein.

Where an embodiment can be implemented differently, a particular process sequence may be performed in an order different from the one described. For example, two processes described in succession may be performed substantially simultaneously, or in the reverse of the described order.

According to one or more embodiments described below, a method and system are presented for transferring an instructor's external features and unconscious facial micro-expressions onto a virtual avatar, a Digital Human Tutor (hereinafter, DHT).

Accordingly, the actual instructor's gestures and facial expressions are reflected in the movements and expression changes of the DHT. In particular, external features such as the instructor's eyes, eyebrows, nose, mouth, and face shape, extracted from video of the actual instructor, are expressed in the DHT, and the DHT's expression changes likewise follow the facial expression changes extracted from the instructor's face region. In addition, the instructor's gestures are recognized and reflected in the DHT, and the gestures are augmented when reflected in order to increase empathy. This is expected to give the impression that the instructor and learner are interacting with each other in a non-face-to-face educational environment, to raise trust in the DHT, to improve the quality of communication, and to help ease the constraints of the educational environment.

An embodiment according to the present disclosure includes the following three-step process for generating a DHT.

Step 1: Recognize the instructor's external features, which comprise facial features (facial shape and expression characteristics) and gesture features, and recognize unconscious micro-expression data from the facial expressions.

Step 2: Generate a DHT to which the recognized features of the instructor are applied.

Step 3: Customize the generated DHT.

The instructor's external features include feature values of the face and feature values of the body's joint points. The facial feature values are obtained from a facial image of the instructor, and the joint point feature values can be obtained from an image of the whole body or the upper body region that includes the body joints.

The detection of facial feature values and of joint point feature values is described separately below, beginning with the extraction of facial feature values according to expression changes from a facial image.

<Step 1> Recognition of the instructor's external features and micro-expression data

I. Detection of the instructor's external features

This process extracts the instructor's external features from the face and the half-body or full-body region of an instructor video captured by a camera, and proceeds in the steps below. The external facial features include the facial feature values of each element (eyebrows, eyes, nose, mouth, and chin), as shown in Table 1. Each facial feature value represents the rate of change from the neutral position or size of the element.
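As a minimal sketch of a "rate of change from the neutral position or size", the example below computes a relative change for a single measurement. The text does not state the exact formula, so the relative-difference definition, the function name `feature_change_rate`, and the mouth-width values are assumptions for illustration only.

```python
def feature_change_rate(current: float, neutral: float) -> float:
    """Relative change of a facial measurement from its neutral value.

    Assumption: the rate of change is (current - neutral) / neutral,
    so 0.0 means neutral and 0.2 means 20% larger than neutral.
    """
    if neutral == 0:
        raise ValueError("neutral measurement must be non-zero")
    return (current - neutral) / neutral


# Hypothetical mouth widths in pixels, measured between two mouth landmarks.
neutral_mouth_width = 50.0
current_mouth_width = 60.0
mouth_rate = feature_change_rate(current_mouth_width, neutral_mouth_width)
```

Expressing each element relative to its own neutral value makes the feature independent of face size and camera distance, which is one reason a ratio is a natural choice here.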

FIG. 1 is a flowchart showing the process, described below, for detecting the instructor's facial and joint features. FIG. 2 illustrates the arrangement of the landmarks defined in FACS, and FIG. 3 shows the results of the facial video, face detection, and facial landmark detection process described below.

i. Facial Video Acquisition

The instructor's upper body or full body, including the face, is recorded with a camera capable of capturing video at 30 fps or higher.

ii. Face Detection

The region in which the instructor's face is located is detected within the captured video frames. The Viola-Jones method may be applied at this stage.
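The Viola-Jones method evaluates rectangular Haar-like features quickly by means of an integral image. The sketch below shows only that building block (an integral image and one two-rectangle feature) in plain Python; it is not a full detector, which would also need a trained cascade of such features. The function names and the toy image are illustrative.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Return a (h+1) x (w+1) table where ii[y][x] sums img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the pixels in the rectangle, using four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Upper half minus lower half of the rectangle (h should be even)."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


# Toy 4x4 patch: bright upper half, dark lower half.
patch = [
    [9, 9, 9, 9],
    [9, 9, 9, 9],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
table = integral_image(patch)
contrast = two_rect_feature(table, 0, 0, 4, 4)
```

A large positive or negative value indicates a strong light-dark transition inside the window, which is the kind of cue a cascade uses to accept or reject a candidate face region.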

iii. Facial Landmark Detection

The instructor's external features are detected within the detected face region. The external features here are the eyebrows, eyes, nose, mouth, and chin, and their positions can be detected using 68 landmarks. The landmarks may be defined and detected based on, for example, Ekman's Facial Action Coding System (FACS). FACS defines action units (AUs) of the facial muscles, and movements of the external features are detected through these AUs.

Table 2 below describes the AUs that define the facial muscle movements used to judge changes in facial expression, and the landmarks belonging to each AU.

iv. Joint Extraction

This step is performed in parallel with the facial feature detection process described above.

The coordinates of the instructor's joints (keypoints) are extracted from the full-body or half-body video of the instructor from which the facial features were extracted.

The body is represented by approximately 18 joints, and extracting the coordinates of all 18 is desirable for more natural gesture expression. Various methods can be used to extract these joint coordinates, including deep learning models. Known deep learning models include cmu, mobilenet_thin, mobilenet_v2_large, mobilenet_v2_small, tf-pose-estimation, openpose, and vnect.

In this step, the joint coordinates, which shift as the person moves, are extracted from the video of the actual person at regular frame intervals.
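Extraction at regular frame intervals can be sketched as follows, assuming that a pose estimator has already produced one keypoint dictionary per video frame. The function name `sample_keypoints`, the data layout, and the interval of three frames are hypothetical; the text does not specify the interval.

```python
from typing import Dict, List, Tuple

Keypoints = Dict[str, Tuple[float, float]]


def sample_keypoints(frames: List[Keypoints],
                     step: int) -> List[Tuple[int, Keypoints]]:
    """Keep every `step`-th frame's keypoints, paired with the frame index."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return [(i, frames[i]) for i in range(0, len(frames), step)]


# Hypothetical per-frame output of a pose estimator for ten frames.
all_frames = [{"right_wrist": (float(i), 2.0 * i)} for i in range(10)]
sampled = sample_keypoints(all_frames, step=3)
```

Keeping the frame index alongside each sample lets a later stage interpolate the avatar's pose between sampled frames.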

As noted above, deep learning models such as cmu, mobilenet_thin, mobilenet_v2_large, mobilenet_v2_small, tf-pose-estimation, openpose, and vnect are applied to extract the joint coordinates in this step.

The joint coordinates obtained in this way are then augmented. The joint coordinate augmentation step is the coordinate augmentation that underlies gesture augmentation, in which the instructor's gestures are strongly emphasized. The augmentation may be performed on the two-dimensional coordinates as originally extracted.

II. Extraction of unconscious facial micro-expression data caused by heartbeats

An instructor's internal emotions or feelings can be recognized through changes in facial expression. Facial expressions arise from facial muscle movements driven by internal emotions. Internal emotions can therefore be assessed or judged by evaluating facial movements, particularly the movements of the various AUs. However, conscious movements of the instructor that are unrelated to emotion may also appear, and these act as noise when assessing internal emotions. If the conscious movements are removed from the micro-movements of the instructor's facial muscles, the instructor's true micro-expressions, that is, the internal emotions, can be assessed.

In this embodiment, the conscious movements are filtered out as noise components. The micro-movements that remain, which are based on internal emotion, manifest in step with the heartbeat, within the normal heart-rate range of 45 to 150 beats per minute (BPM).

도4는 카메라로 촬영된 교수자 영상에서 얼굴 영역으로부터 심장 박동에 의한 얼굴의 무의식적 미세 표현을 추출하기 위한 과정을 보여주는 플로우챠트이며, 이하에서 이 과정을 상세히 설명한다.Figure 4 is a flowchart showing the process for extracting unconscious facial micro-expressions caused by heartbeats from the facial region in an instructor video captured by a camera, and this process is described in detail below.

i. Facial Video

To detect the instructor's external features, the upper body or whole body of the instructor, including the face region, is filmed to obtain a facial video. Acquisition covers both filming the face with a camera and continuously capturing video content. In this step, for example, two facial videos are acquired at 30 fps.

ii. Face Detection

The face region or face points are extracted through face detection and tracking. The face region is extracted with methods such as the Viola-Jones algorithm, which exploits the light-dark contrast patterns characteristic of each part of the human face, or HOG (Histogram of Oriented Gradients).

iii. Area Selection

Within the detected face region, the forehead and nose areas are selected because they produce the least noise.

iv. Feature Extraction

From the selected forehead and nose areas, multiple feature points that stand out from the surrounding points and can therefore be tracked are extracted. The Good Features to Track (GFTT) algorithm or a facial landmark detection (FLD) algorithm can be used for this. In this embodiment, the GFTT algorithm is applied to extract multiple feature points (landmarks).

v. Feature Tracking

Movement data is obtained for each extracted feature point. Tracking can use the KLT (Kanade-Lucas-Tomasi) tracking algorithm, a transformation matrix (TM) based tracking algorithm, or similar. In this embodiment, the KLT algorithm is applied to consecutive frames to track, for each feature point, how far its y coordinate has moved in the current frame relative to the previous frame, which yields the unconscious micro-expression data (MED) caused by the heartbeat. A sliding window can be used when extracting the micro-expression data, with a window size of 30 s and an interval of 1 s.
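The frame-to-frame y displacement and the sliding window described above can be sketched in plain Python as follows. This is a minimal illustration that assumes the tracked y positions of one feature point are already available as a list (the tracker itself is not shown); the function names are hypothetical, and the defaults mirror the 30 fps, 30 s window, and 1 s interval mentioned in the text.

```python
def vertical_displacements(y_positions):
    """Per-frame movement of a feature point's y coordinate (current minus previous)."""
    return [curr - prev for prev, curr in zip(y_positions, y_positions[1:])]


def sliding_windows(samples, fps=30, window_s=30, step_s=1):
    """Yield overlapping windows of window_s seconds, advancing by step_s seconds."""
    size = fps * window_s   # 900 samples per window at 30 fps
    step = fps * step_s     # the window advances by 30 samples each time
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]


# 32 seconds of samples at 30 fps produce three full 30-second windows.
windows = list(sliding_windows(list(range(960))))
print(len(windows), len(windows[0]))
```

Each window can then be filtered and analyzed separately, so an estimate is produced once per second after the first 30 seconds.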

III. Heartbeat Signal Determination

FIG. 5 is a flowchart of the process for determining the heartbeat signal, which applies the sliding window technique to the micro-expression data (MED) described above.

This process extracts, from the unconscious micro-expression data (MED) obtained above, only the noise-free component of the micro-expressions that is caused by the cardiac response.

i. Bandpass Filter

A Butterworth bandpass filter (5th order, 0.75-5 Hz) is applied to the unconscious facial micro-expression signal so that only the heart-rate band, 0.75 Hz (45 bpm) to 2.5 Hz (150 bpm), is retained.
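The band edges quoted above follow from dividing beats per minute by 60. The short Python sketch below makes that conversion explicit and checks whether a frequency falls inside the heart-rate band; the helper names are hypothetical, and the sketch covers only the unit conversion, not the Butterworth filter itself.

```python
HEART_RATE_BPM = (45, 150)  # normal heart-rate range given in the text


def bpm_to_hz(bpm):
    """Convert beats per minute to cycles per second."""
    return bpm / 60.0


def heart_band_hz(bpm_range=HEART_RATE_BPM):
    """Return the (low, high) band edges in Hz for a bpm range."""
    low_bpm, high_bpm = bpm_range
    return bpm_to_hz(low_bpm), bpm_to_hz(high_bpm)


def in_heart_band(freq_hz, bpm_range=HEART_RATE_BPM):
    """True if freq_hz lies within the heart-rate band."""
    low, high = heart_band_hz(bpm_range)
    return low <= freq_hz <= high


print(heart_band_hz())
```

With these edges, a 1.2 Hz component (72 bpm) is kept, while a 5 Hz component (300 bpm) lies outside the heart-rate band.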

ii. Principal Component Analysis

Principal component analysis (PCA) is used to obtain, from the unconscious facial micro-expression data extracted at the individual feature points (landmarks), a single micro-expression signal that captures their common component. Five components are extracted by PCA. Because biosignals are periodic, the component with the highest periodicity is selected as the final facial micro-expression data. The periodicity is calculated as follows.

Here, s is the time-series signal, FFT is the Fourier transform that converts the time-series signal into the frequency domain, and PS is the power spectrum of s in the frequency domain.

Here, Max Power is the largest power value in the entire power spectrum.

Here, Total Power is the sum of the entire power spectrum.

Finally, the periodicity of the time-series signal s is calculated as follows.
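The formulas themselves are not reproduced in this text. One reading that is consistent with the definitions above, offered as an assumption rather than as the original notation, is the following, where the squared magnitude in the first line and the ratio in the last line are the assumed parts:

```latex
\begin{aligned}
PS &= \lvert \mathrm{FFT}(s) \rvert^{2} \\
\text{Max Power} &= \max_{f} \, PS(f) \\
\text{Total Power} &= \sum_{f} PS(f) \\
\text{Periodicity}(s) &= \frac{\text{Max Power}}{\text{Total Power}}
\end{aligned}
```

Under this reading the periodicity lies between 0 and 1 and approaches 1 when nearly all of the signal power sits at a single frequency.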

Finally, the periodicity of the cardiac response (the heart rate) is obtained from the micro-expressions of the instructor's face, and this value is used as the input that lets the DHT reproduce the instructor's facial micro-expressions.
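A minimal Python sketch of the component selection is given below. It assumes that periodicity is the ratio of the largest power value to the total power of the spectrum, which matches the definitions of Max Power and Total Power but is not stated explicitly in this text. The naive DFT, the removal of the mean, the one-sided spectrum, and the function names are choices made for this example; PCA itself is not shown, and the candidate components are passed in as plain lists.

```python
import cmath
import math


def power_spectrum(s):
    """One-sided power spectrum of a real series via a naive DFT (DC bin excluded)."""
    n = len(s)
    mean = sum(s) / n
    centered = [v - mean for v in s]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def periodicity(s):
    """Assumed measure: largest power value divided by total power."""
    ps = power_spectrum(s)
    total = sum(ps)
    return max(ps) / total if total > 0 else 0.0


def most_periodic(components):
    """Index of the component with the highest periodicity."""
    return max(range(len(components)), key=lambda i: periodicity(components[i]))


fs, n = 30, 300  # 10 seconds at 30 fps
pure = [math.sin(2 * math.pi * 1.2 * t / fs) for t in range(n)]
mixed = [math.sin(2 * math.pi * 1.2 * t / fs) + math.sin(2 * math.pi * 2.0 * t / fs)
         for t in range(n)]
print(most_periodic([mixed, pure]))
```

A single sinusoid concentrates its power in one bin and scores close to 1, while a mix of two equal sinusoids scores about 0.5, so the purer component is selected.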

Specifically, the facial micro-signal in the 0.75 Hz (45 bpm) to 2.5 Hz (150 bpm) band obtained through PCA is applied as an amplitude value to the y coordinates of the digital human tutor's facial landmark feature points (eyebrows, eyes, nose, mouth, and chin). The input to the DHT therefore dynamically changes both the outward facial expression and the inner expression that carries the internal emotion.

<Step 2> Generating a digital human tutor (DHT) with the recognized instructor features applied

The DHT uses the instructor feature values recognized in <Step 1> as its default values and is built through the process below. The instructor feature values relate to the outward facial expression and to the inner expression that carries the internal emotion.

I. Generation of the digital human tutor (DHT)

The DHT is initialized with the human model of automated software (e.g., REALLUSION Character Creator3) that forms the outer skeleton of a virtual avatar by projecting the frontal image used in <Step 1> to recognize the instructor's features. Because such a human model can be provided as an API or a DLL (dynamic link library), it can be ported to the application in which the DHT is used.

II. Digital human tutor (DHT) appearance correction

The appearance of the DHT is corrected by normalizing it against the instructor's recognized detailed feature values (eyebrows, eyes, nose, mouth, and chin). As noted above, a commercially available virtual avatar can serve as the DHT, and its basic external attributes are set from the data obtained through the process described above.

III. Application of instructor features

Frames are generated over time, and the amplitude of the y (vertical) coordinate of the DHT's facial landmark feature points is varied at the periodic oscillation frequency in the instructor's heart-rate band recognized in <Step 1>. Applying the instructor features in this way activates the DHT by replicating the instructor's facial expressions and emotions in it, so the instructor's gestures, facial expressions, and the complex emotional movements that show on the face can be conveyed to the learner through the activated DHT.
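One way to picture this step is a sinusoidal vertical offset applied to every landmark, frame by frame. The Python sketch below assumes a single sinusoid at the measured heart rate and an amplitude chosen by the caller; the sinusoidal form, the function names, and the parameter values are assumptions for illustration only.

```python
import math


def heartbeat_offset(bpm, amplitude, t):
    """Vertical offset at time t (seconds) for an oscillation at bpm beats per minute."""
    return amplitude * math.sin(2.0 * math.pi * (bpm / 60.0) * t)


def animate_landmarks(landmarks, bpm, amplitude, fps, n_frames):
    """Return one list of (x, y) landmarks per frame, with y shifted by the heartbeat offset."""
    frames = []
    for i in range(n_frames):
        dy = heartbeat_offset(bpm, amplitude, i / fps)
        frames.append([(x, y + dy) for x, y in landmarks])
    return frames


# Two hypothetical landmarks animated for one second at 30 fps and 72 bpm.
frames = animate_landmarks([(120.0, 80.0), (140.0, 80.0)],
                           bpm=72, amplitude=0.5, fps=30, n_frames=30)
print(len(frames), frames[0])
```

Only the y coordinate changes, which mirrors the text's statement that the amplitude is applied in the vertical direction.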

<Step 3> Customizing the generated digital human tutor (DHT)

The appearance of the DHT formed from the default values in the previous step can be customized freely by the user. The customizable feature values are listed in Table 1 and can be adjusted within a predetermined range around the neutral feature value, which in Character Creator is -100 to +100.
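The range limit can be enforced with a simple clamp when user overrides are merged into the default feature values. The Python sketch below is illustrative only: the feature names are invented for the example and are not taken from Table 1 or from Character Creator.

```python
FEATURE_MIN, FEATURE_MAX = -100, 100  # range stated in the text for Character Creator


def clamp_feature(value):
    """Limit a feature value to the allowed range."""
    return max(FEATURE_MIN, min(FEATURE_MAX, value))


def customize(defaults, overrides):
    """Return a copy of the default features with clamped user overrides applied."""
    result = dict(defaults)
    for name, value in overrides.items():
        if name not in result:
            raise KeyError(name)  # reject features the model does not define
        result[name] = clamp_feature(value)
    return result


# Hypothetical feature names; an out-of-range request is clamped to 100.
print(customize({"eye_size": 0, "jaw_width": 10}, {"eye_size": 150}))
```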

Through this process, the instructor's actual facial expressions and the emotions visible on the face can be reflected in the expression of the digital human tutor, which enables more effective delivery of information and emotion.

In addition, augmenting the joint coordinates that follow the instructor's gesture changes and reflecting them in the DHT conveys the instructor's emotions to the learner and further strengthens the learner's interest.

FIG. 6 is a flowchart of the DHT gesture augmentation method, which is described below with reference to the figure.

The DHT used here has an appearance that reflects the facial features obtained from the instructor by the method described above. The appearance of the DHT could also be created artificially, but in the present disclosure the DHT character generation unit films the face of an actual person, extracts facial features from the footage, and reflects them as the appearance characteristics of the DHT. A program called "Character Creator" can be used as the character generator for this purpose.

The whole body or half body of the actual person is filmed continuously.

Joint (keypoint) coordinates are extracted from the whole-body or half-body video of the actual person. The body is treated as having roughly 18 joints, and extracting the coordinates of all 18 is preferable for natural gesture expression. Various methods can be used for the extraction, including deep learning models. Known models include cmu, mobilenet_thin, mobilenet_v2_large, mobilenet_v2_small, tf-pose-estimation, openpose, and vnect. The purpose of the extraction is to put the DHT model in correspondence with the joints of the actual person.

In this step, the extracted joint coordinates are mapped 1:1 to the joints of the DHT generated earlier. That is, each joint coordinate of the DHT is linked one-to-one with the corresponding joint coordinate of the actual person.
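The 1:1 mapping can be sketched as a lookup from keypoint names to avatar joint names. In the Python example below, the keypoint order follows the COCO-style 18-keypoint layout used by OpenPose's COCO model, which is an assumption: the joint set in FIG. 7 may differ. The avatar joint names are hypothetical as well.

```python
# Assumed keypoint order (COCO-style, 18 points); FIG. 7 may define a different set.
KEYPOINT_NAMES = [
    "nose", "neck",
    "r_shoulder", "r_elbow", "r_wrist",
    "l_shoulder", "l_elbow", "l_wrist",
    "r_hip", "r_knee", "r_ankle",
    "l_hip", "l_knee", "l_ankle",
    "r_eye", "l_eye", "r_ear", "l_ear",
]


def bind_keypoints(coords, rig_joints):
    """Pair each extracted keypoint with the avatar joint mapped to the same name.

    coords:     list of 18 (x, y) tuples in KEYPOINT_NAMES order
    rig_joints: dict of keypoint name -> avatar joint name (hypothetical names)
    """
    if len(coords) != len(KEYPOINT_NAMES):
        raise ValueError("expected one coordinate per keypoint")
    missing = [name for name in KEYPOINT_NAMES if name not in rig_joints]
    if missing:
        raise ValueError(f"rig has no joint for: {missing}")
    return {rig_joints[name]: xy for name, xy in zip(KEYPOINT_NAMES, coords)}


rig = {name: "Bone_" + name for name in KEYPOINT_NAMES}
coords = [(float(i), float(2 * i)) for i in range(18)]
print(bind_keypoints(coords, rig)["Bone_neck"])
```

Raising an error on a missing joint keeps the mapping strictly one-to-one instead of silently dropping keypoints.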

Here, joint coordinates are extracted with one of the deep learning models mentioned above, such as cmu, mobilenet_thin, mobilenet_v2_large, mobilenet_v2_small, tf-pose-estimation, openpose, or vnect.

This step is the coordinate augmentation step for gesture augmentation, which enlarges and emphasizes the gestures of the actual person. The augmentation can be performed on the originally extracted two-dimensional coordinates. According to another embodiment of the present disclosure, the two-dimensional coordinates (x, y) are converted into three-dimensional coordinates (x, y, z) to produce more realistic and more active gestures. The two-dimensional coordinates (x, y) lie in the plane of the video image, and the added third coordinate z runs perpendicular to that plane. Adding the z coordinate to the originally extracted (x, y) coordinates yields three-dimensional coordinates expressed as (x, y, z). The coordinates can cover a specific region of the body, for example the hands, and the conversion allows the hand position to change up and down, left and right, and forward and backward.

3D pose estimation can be applied for this conversion; algorithms for it include Mutual PnP and Lifting from the Deep (Denis Tome, Chris Russell, Lourdes Agapito, 2017).

The number of three-dimensional joint coordinate values is larger than the 18 input two-dimensional joints and can reach, for example, a maximum of 54, which corresponds to 18 joints with three coordinates each. Gesture augmentation here can include augmenting the coordinates or augmenting the angles on the coordinates.

The coordinates augmented in this way, for example the augmented two-dimensional or three-dimensional joint coordinates, are applied to the DHT model so that the DHT performs the augmented gesture.

The DHT with the augmented gesture is rendered and activated in the target video. At the same time, the instructor continues to be filmed during the lecture so that the next gesture change of the actual person can be detected, and the process returns to <Step S65> described above and repeats the routine, producing a DHT with augmented gestures.

In summary, the main process of the present disclosure is to create a DHT object, initialize it by mapping its joints to the joints of an actual person, and thereafter continuously recognize the joint coordinates of the actual person, augment them, and reflect them in the DHT object to activate it.

The joints referred to in the present disclosure are classified into 18 joints, as shown in FIG. 7.

Referring to FIG. 7, at most 18 joints are extracted from an actual person. They include the limb and shoulder joints as well as the nose, both eyes, both ears, the mouth, and the neck.

All of these joints should be used to achieve a more natural posture or gesture.

The following describes DHTs with augmented gestures as actually implemented.

FIG. 8 shows an example of a tutor who guides learning through video, that is, a digital human tutor (DHT). In FIG. 8, the DHT passively holds both hands close in front of the upper body.

FIG. 9 shows the DHT with part of its gesture augmented, here by augmenting the angle in the x direction in three-dimensional coordinates.

Comparing FIG. 8 and FIG. 9 shows that the hand movements in FIG. 9 are more active and lively than those in FIG. 8.

FIG. 9 shows an example in which the angle in the x direction is augmented in the DHT's three-dimensional coordinates, FIG. 10 shows an example in which the angle in the y direction is augmented, and FIG. 11 shows augmentation in the z direction. FIG. 12 shows augmentation in all of the x, y, and z directions.

FIG. 13 compares the DHT before augmentation (left) with the DHT after coordinate angle augmentation in all of the x, y, and z directions (right).

As the comparison in FIG. 13 shows, the posture after augmentation comes across as more active and dynamic than the posture before it. This indicates that the nonverbal expression of the DHT is conveyed very strongly.

Various software video controllers can be used for this kind of video conversion, for example the software Unity.

In Unity, the movement of each joint can be augmented with a value in the range 0 to 10 set through the slider UI that Unity provides, and the augmentation raises or lowers the joint angle within a predetermined range, for example from -50 degrees to a maximum of 50 degrees. As shown in FIGS. 6 to 9, when the gesture of the arm is to be augmented, the video editor selects the joints of the arm, selects the x, y, and z angles of each joint, and augments each angle with a value in the range 0 to 10.
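The text does not say how a slider value of 0 to 10 translates into an angle change within the -50 to 50 degree range. The Python sketch below assumes a linear mapping in which the slider sets the size of the change and a separate sign sets its direction; this mapping, the function names, and the `direction` parameter are assumptions for illustration and do not use the Unity API.

```python
SLIDER_MAX = 10.0
ANGLE_LIMIT = 50.0  # degrees; the example range given in the text


def slider_to_angle(slider, direction=1):
    """Map a slider value in [0, 10] linearly to an angle change of up to 50 degrees."""
    s = max(0.0, min(SLIDER_MAX, slider))
    return direction * (s / SLIDER_MAX) * ANGLE_LIMIT


def augment_angle(base_angle, slider, direction=1):
    """Apply the slider-driven change and keep the result within +/-50 degrees."""
    angle = base_angle + slider_to_angle(slider, direction)
    return max(-ANGLE_LIMIT, min(ANGLE_LIMIT, angle))


# Augment the x, y, and z angles of one hypothetical arm joint independently.
elbow = {"x": 10.0, "y": -5.0, "z": 0.0}
sliders = {"x": 4, "y": 0, "z": 10}
print({axis: augment_angle(elbow[axis], sliders[axis]) for axis in elbow})
```

Treating each axis separately matches the description of selecting the x, y, and z angles of a joint one at a time.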

The DHT generated by the above method can be applied in various fields, including as the tutor in a video learning system. In such a system, not only the instructor's emotions and verbal expression but also the nonverbal expression carried by body gestures can be conveyed effectively to the learner, which makes higher learning efficiency possible. Conveying nonverbal expression in this way can also be useful in virtual worlds.

FIG. 14 is a schematic of the process of producing a lecture video with the DHT described above, and FIG. 15 schematically shows the structure of a system for doing so.

Referring to FIG. 14 and FIG. 15, producing the lecture video requires two cameras (31a, 31b).

One camera (31a) films the face of the actual instructor. The footage passes through the image processor (32), and a DHT resembling the instructor's appearance is then generated by the character generation unit (33) through the process described above.

The other camera (31b) films the instructor delivering the lecture. From this footage, through the process described above, the image processor (32) detects the face and joints, and the feature processor (34) detects changes in the instructor's facial expression, gaze, gestures, and so on, and measures or extracts the feature value variables.

The DHT model obtained above is activated by substituting or transplanting these feature value variables into it.

Activation is performed by the DHT model feature control unit (35), which transplants the instructor's facial expressions, gaze, and gestures into the DHT model alongside the instructor's appearance. The lecture video generated in this way is stored on a medium and distributed through that medium.

FIG. 16 schematically shows a lecture system (1) for taking an online lecture with the lecture video, according to one embodiment of the present disclosure.

The lecture system (1) downloads the lecture video or plays it by streaming and presents it to the learner (20) on a display (12). Because most lecture videos presented to the learner (20) on the display (12) include audio, an audio device for playing it can be added to the lecture system (1). The lecture system (1) is based on a general-purpose computer, so its hardware has the structure of a computer system (1) that includes standard input/output devices such as a keyboard (14), a mouse (15), and a monitor (12), together with the main unit (11) to which they are connected.

Although exemplary embodiments of the present invention have been described in detail above, a person of ordinary skill in the art to which the invention pertains can implement the invention with various modifications without departing from the spirit and scope of the invention as defined in the appended claims. Modifications of future embodiments of the invention therefore remain within the technology of the invention.

Claims

A method for generating a gesture-augmented digital human tutor, comprising:
acquiring, through a camera, a lecture video and a facial image of an actual instructor;
extracting, by an image processor, the instructor's joint (keypoint) coordinates and external facial features from the lecture video and the facial image;
determining, by a feature processor, changes in the external facial features and an augmentation weight for the values of the joint coordinates;
generating, by a character generation unit, a digital human tutor corresponding to the instructor's facial image using the instructor's facial feature image; and
reflecting, by a character control unit, the instructor's external features and the joint coordinates that follow gesture changes in the digital human tutor, and changing the joint coordinates of the digital human tutor to the joint coordinates augmented by the augmentation weight, thereby activating a digital human tutor whose gestures are augmented relative to the instructor's gestures,
wherein, in extracting the instructor's external features, feature points of the instructor are extracted from the lecture video, movement data of the extracted feature points is extracted, and micro-expression data is extracted from the movement data, and
wherein, to extract unconscious micro-expression data from the micro-expression data, filtering at a predetermined frequency is performed on the micro-expression data, the periodicity of the heartbeat is determined by principal component analysis (PCA) of the filtered micro-expression data, and the periodicity is used as an input value for the micro-expression of the digital human tutor.

The method of claim 1, wherein extracting the joint coordinates comprises:
extracting, by an image analysis unit, joint coordinates on a two-dimensional plane; and
extracting three-dimensional joint coordinates (x, y, z) by inferring, with a 3D analyzer, a third direction (z) perpendicular to the two-dimensional plane.

The method of claim 2, wherein, in determining the augmentation weight, a weight is determined for at least one of the three coordinates x, y, and z of the three-dimensional joint coordinates.

(Deleted)

The method of claim 1, wherein a KLT (Kanade-Lucas-Tomasi) tracking algorithm or a TM (Transformation Matrix) based tracking algorithm is applied to track the movement of the feature points in order to extract the micro-expression data.

(Deleted)

The method of any one of claims 1, 2, 3, and 5, wherein the instructor's external features are extracted as landmarks defined in FACS, and the external features are reflected in the digital human tutor in units of AUs based on the landmarks.

A system for generating a gesture-augmented digital human tutor, comprising:
one or more cameras that capture a lecture video and a facial image of an actual instructor;
an image processor that extracts the instructor's joint (keypoint) coordinates and external facial features from the lecture video and the facial image;
a feature processor that extracts changes in the external facial features from the facial image during the instructor's lecture and changes in the joint coordinates from the lecture video, determines an augmentation weight for the values of the joint coordinates, extracts feature points of the instructor from the lecture video, extracts movement data of the extracted feature points, extracts micro-expression data from the movement data, performs filtering at a predetermined frequency on the micro-expression data to extract unconscious micro-expression data, and determines, by principal component analysis (PCA) of the filtered micro-expression data, the periodicity of the heartbeat used as an input value for the micro-expression of a digital human tutor;
a character generation unit that generates the digital human tutor corresponding to the instructor's facial image using the facial image;
a character control unit that reflects the external features in the digital human tutor and activates the digital human tutor so that its joint coordinates change to the joint coordinates augmented by the augmentation weight; and
a lecture video generation unit that generates a lecture video including the digital human tutor.

The system of claim 9, wherein, in extracting the external features, the character generation unit extracts feature points from the facial image and sets the appearance of the digital human tutor using the feature points.

(Deleted)

The system of claim 10, wherein landmarks defined in FACS are selected as the feature points.

(Deleted)

The system of claim 10, wherein the feature processor applies a KLT (Kanade-Lucas-Tomasi) tracking algorithm or a TM (Transformation Matrix) based tracking algorithm to track the movement of the feature points in order to extract the micro-expression data.

(Deleted)

The system of claim 9, wherein the feature processor extracts landmarks defined in FACS as the instructor's external features and reflects the external features in the digital human tutor in units of facial muscle AUs (Action Units) based on the landmarks.

The system of claim 9, wherein the feature processor extracts joint coordinates on a two-dimensional plane by an image analysis unit, and extracts three-dimensional joint coordinates (x, y, z) by inferring, with a 3D analyzer, a third direction (z) perpendicular to the two-dimensional plane.

The system of claim 17, wherein the feature processor determines an augmentation weight for at least one of the three coordinates x, y, and z of the three-dimensional joint coordinates.