KR20220004156A

KR20220004156A - Car cabin interaction method, device and vehicle based on digital human

Info

Publication number: KR20220004156A
Application number: KR1020217039210A
Authority: KR
Inventors: 빈 쩡; 췬옌 저우; 커 리; 양핑 우; 량 쉬; 스징 정; 쥔 우; 페이 왕; 천 첸
Original assignee: 상하이 센스타임 린강 인텔리전트 테크놀로지 컴퍼니 리미티드
Priority date: 2020-03-30
Filing date: 2020-12-17
Publication date: 2022-01-11
Also published as: JP2023500099A; JP7469467B2; WO2021196751A1

Abstract

Embodiments of the present application provide a digital-human-based car cabin interaction method, apparatus, and vehicle. State information of a living body in the car cabin is obtained, action information matching the state information is determined, and an animation of a digital human performing the corresponding action is generated based on the action information and displayed on a display device in the car cabin.

Description

Car cabin interaction method, device and vehicle based on digital human

[Cross-Reference to Related Applications]

This application claims priority to Chinese patent application No. CN202010239259.7, filed with the China Intellectual Property Office on March 30, 2020 and titled "Child state detection method and device, electronic device, storage medium", and to Chinese patent application No. CN202010583637.3, filed with the Chinese Patent Office on June 23, 2020 and titled "Digital human-based car cabin interaction method, device and vehicle", the entire contents of which are incorporated herein by reference.

[Technical field]

The present application relates to the field of computer vision technology, and in particular to a digital-human-based car cabin interaction method, apparatus, and vehicle.

Currently, many vehicles are equipped with monitoring products for interacting with living bodies in the vehicle. However, the interaction modes of existing monitoring products are rigid and lack a human touch.

The present application provides a digital-human-based car cabin interaction method, apparatus, and vehicle.

According to a first aspect of the embodiments of the present application, a digital-human-based car cabin interaction method is provided, the method comprising: obtaining state information of a living body riding in a car cabin; determining action information matching the state information; and generating, based on the action information, an animation of a digital human performing a corresponding action, and displaying the animation on a display device in the car cabin.
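The three steps of the first aspect can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions: the `StateInfo` fields, the matching rules in `determine_action`, and the placeholder frame names are all hypothetical and are not taken from the application, which does not specify how matching or rendering is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateInfo:
    """Hypothetical state record for one living body in the cabin."""
    category: str            # e.g. "child", "driver", "pet"
    emotion: str             # e.g. "happy", "crying"
    seat_belt_fastened: bool


def determine_action(state: StateInfo) -> str:
    """Step 2: pick the action matching the state (illustrative rules only)."""
    if not state.seat_belt_fastened:
        return "demonstrate_seat_belt"
    if state.emotion == "crying":
        return "comfort"
    return "greet"


def generate_animation(action: str, frame_count: int = 3) -> list[str]:
    """Step 3a: stand-in for rendering; returns placeholder frame names."""
    return [f"{action}_frame_{i:02d}" for i in range(frame_count)]


def interact(state: StateInfo, show=print) -> list[str]:
    """Run steps 2 and 3 for a state that step 1 has already acquired."""
    action = determine_action(state)
    frames = generate_animation(action)
    for frame in frames:
        show(frame)  # Step 3b: hand each frame to the cabin display
    return frames
```

Here `show` stands in for the display device; passing a different callable lets the same flow drive a real screen or a test double.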

In some embodiments, generating, based on the action information, the animation of the digital human performing the corresponding action and displaying the animation on the display device in the car cabin comprises: determining voice information matching the state information; obtaining, based on the voice information, a corresponding voice that includes a timestamp; and, while playing the voice, generating and displaying, based on the action information, an animation of the digital human performing the action at the time corresponding to the timestamp.

In some embodiments, the action comprises a plurality of sub-actions, each sub-action matches one phoneme in the voice, and the timestamp comprises a timestamp of each phoneme; generating and displaying, based on the action information, the animation of the digital human performing the action at the time corresponding to the timestamp comprises: determining, based on the timestamp of each phoneme, an execution time of the sub-action matching that phoneme; and generating and displaying, based on the action information, an animation of the digital human performing, at the timestamp of each phoneme, the sub-action matching that phoneme.
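The phoneme-level synchronization described above amounts to building a schedule of sub-actions from phoneme timestamps. The sketch below is illustrative only: it assumes each phoneme comes with a start and end time in seconds and that a lookup table maps phonemes to sub-actions (for example mouth shapes); the phoneme symbols, table contents, and the `"neutral"` fallback are hypothetical.

```python
def schedule_sub_actions(phonemes, sub_action_for, default="neutral"):
    """Return (start_time, duration, sub_action) for each timed phoneme.

    phonemes: iterable of (phoneme, start_s, end_s) tuples.
    sub_action_for: dict mapping a phoneme to the name of its sub-action.
    """
    schedule = []
    for phoneme, start, end in phonemes:
        sub_action = sub_action_for.get(phoneme, default)
        # The sub-action starts at the phoneme's timestamp and lasts as long
        # as the phoneme is being spoken.
        schedule.append((start, end - start, sub_action))
    return schedule


# Hypothetical timed phonemes for a short utterance.
timed = [("HH", 0.0, 0.25), ("AY", 0.25, 0.5), ("!", 0.5, 0.75)]
mouth_shapes = {"HH": "mouth_open_small", "AY": "mouth_open_wide"}
plan = schedule_sub_actions(timed, mouth_shapes)
```

An animation loop could then play the voice and, at each scheduled start time, trigger the corresponding sub-action for the given duration.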

In some embodiments, generating, based on the action information, the animation of the digital human performing the corresponding action and displaying the animation on the display device in the car cabin comprises: calling, from an action model library, at least one digital-human action segment corresponding to the action information; and displaying, in sequence on the display device, each of the action image frames of the at least one digital-human action segment.
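Retrieving stored segments and showing their frames in order can be sketched as a dictionary lookup followed by flattening. This is a hedged illustration, not the application's data format: the library layout (action name mapped to a list of segments, each a list of frame identifiers) and all names below are assumptions.

```python
from itertools import chain

# Hypothetical action model library: action name -> list of segments,
# where each segment is an ordered list of frame identifiers.
ACTION_LIBRARY = {
    "wave": [["wave_00", "wave_01"], ["wave_02"]],
    "nod": [["nod_00", "nod_01", "nod_02"]],
}


def frames_for(action, library=ACTION_LIBRARY):
    """Return every frame of every segment for the action, in display order."""
    segments = library.get(action, [])  # unknown actions yield no frames
    return list(chain.from_iterable(segments))


def display_action(action, show, library=ACTION_LIBRARY):
    """Send each frame to the display callable in sequence."""
    for frame in frames_for(action, library):
        show(frame)
```

Precomputed segments trade storage for rendering cost, which may suit an in-vehicle device with limited compute; whether the application intends this trade-off is not stated.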

In some embodiments, the state information of the living body comprises first state information of the living body, and obtaining the state information of the living body riding in the car cabin comprises: collecting a monitoring video of the rear row in the car cabin; and performing living-body detection on the monitoring video and performing state analysis on the detected living body to obtain the first state information of the living body.

In some embodiments, the monitoring video is obtained by a video capture device that is mounted on the rearview mirror in the car cabin and whose lens faces the rear row of the car cabin.

In some embodiments, the first state information comprises at least one of: category information, identity information, attribute information, emotion information, facial expression information, gesture information, seating information, and seat belt wearing information of the living body; and/or the living body comprises at least one of: a driver, a front passenger, a child, an elderly person, a pet, and a rear-row passenger.

In some embodiments, the state information of the living body comprises first state information and second state information of the living body, the first state information being obtained based on a monitoring video in the car cabin; obtaining the state information of the living body in the car cabin further comprises obtaining the second state information sent by a smart device carried by the living body; and determining the action information matching the state information comprises determining action information that matches both the first state information and the second state information.

In some embodiments, the second state information comprises at least one of health state information and nervous system state information.

In some embodiments, obtaining the state information of the living body riding in the car cabin comprises: inputting a monitoring video in the car cabin into a pre-trained neural network; and determining the state information of the living body based on an output of the neural network.

In some embodiments, the method further comprises generating an image of the digital human before generating, based on the action information, the animation of the digital human performing the corresponding action and displaying the animation on the display device in the car cabin.

In some embodiments, generating the image of the digital human comprises: generating the image of the digital human based on the state information of the living body; or generating the image of the digital human based on a preset digital-human image template.

In some embodiments, the attribute information of the living body comprises at least one of age, gender, facial appearance, body shape, clothing and accessories, hairstyle, and skin color.

In some embodiments, the method further comprises controlling an operating state of an on-board device based on the state information.

In some embodiments, determining the action information matching the state information comprises: obtaining a driving state of the vehicle; and determining action information that matches the driving state of the vehicle and the state information, respectively.

In some embodiments, obtaining the state information of the living body riding in the car cabin further comprises: recognizing the living body riding in the car cabin based on a target image in the car cabin; and determining, based on position information of the living body, whether the living body is located on a seat in the car cabin.

In some embodiments, obtaining the state information of the living body riding in the car cabin comprises: determining object information of each object in a target image in the car cabin, wherein, for each object, the object information comprises position information of the center point of the object and object type information corresponding to the center point of the object; selecting the living body and the seats in the car cabin from the objects based on the object type information of each object; and determining whether the living body is located on a seat based on the center point position of the living body and the center point position of the seat.
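The selection and seat check can be illustrated with a short sketch. The passage above does not say how the two center points are compared, so the distance threshold used here is purely an assumption, as are the object dictionary layout, the type labels, and the function names.

```python
import math


def select_bodies_and_seats(objects, body_types=frozenset({"person", "pet"})):
    """Split detected objects into living-body centers and seat centers.

    objects: list of dicts with "center" (x, y) and "type" keys (assumed layout).
    """
    bodies = [o["center"] for o in objects if o["type"] in body_types]
    seats = [o["center"] for o in objects if o["type"] == "seat"]
    return bodies, seats


def is_on_seat(body_center, seat_centers, max_distance):
    """Assumed rule: a body is seated if its center lies within
    max_distance (in pixels) of at least one seat center."""
    return any(math.dist(body_center, s) <= max_distance for s in seat_centers)


detections = [
    {"center": (100, 200), "type": "seat"},
    {"center": (104, 203), "type": "person"},
    {"center": (400, 50), "type": "pet"},
]
bodies, seats = select_bodies_and_seats(detections)
seated_flags = [is_on_seat(b, seats, max_distance=20) for b in bodies]
```

A real system might instead compare the offset between centers per axis or use seat-specific regions; the threshold form is chosen here only because it is the simplest rule consistent with comparing two center points.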

In some embodiments, generating, based on the action information, the animation of the digital human performing the corresponding action and displaying the animation on the display device in the car cabin comprises: generating an animation in which a digital human corresponding to the living body that is not located on a seat demonstrates sitting on the seat and fastening the seat belt, and displaying the animation on the display device in the car cabin.

In some embodiments, the method further comprises sending a prompt message in response to determining that the living body is not located on the seat.

In some embodiments, determining the position information of the center point of the object in the target image comprises: performing feature extraction on the target image to obtain a first feature map corresponding to the target image; obtaining, from a first preset channel of the first feature map, a response value of each feature point in the first feature map as an object center point; dividing the first feature map into a plurality of sub-regions, and determining the maximum response value in each sub-region and the feature point corresponding to that maximum response value; and taking a feature point whose maximum response value is greater than a preset threshold as the center point of the object, and determining the position information of the center point of the object based on the position index of the center point in the first feature map.
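The sub-region maximum and threshold steps can be sketched in plain Python on a small response map. This is an illustration under assumptions: the map is a list of rows taken from the preset channel, the sub-regions are non-overlapping square blocks of side `block`, and the feature extraction network that would produce the map is out of scope.

```python
def find_center_indices(response_map, block, threshold):
    """Return (row, col) indices of candidate object centers.

    The map is split into block x block sub-regions; in each sub-region the
    feature point with the maximum response is kept if it exceeds threshold.
    """
    rows, cols = len(response_map), len(response_map[0])
    centers = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            # Maximum response and its index inside this sub-region.
            best_value, best_index = max(
                (response_map[r][c], (r, c))
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            if best_value > threshold:
                centers.append(best_index)
    return centers


# Hypothetical 4 x 4 response map with two strong peaks.
responses = [
    [0.1, 0.2, 0.0, 0.0],
    [0.3, 0.9, 0.0, 0.1],
    [0.0, 0.0, 0.8, 0.2],
    [0.1, 0.0, 0.3, 0.4],
]
center_indices = find_center_indices(responses, block=2, threshold=0.5)
```

Converting an index to a position in the target image would additionally depend on the downsampling ratio between the image and the feature map, which is not specified in this passage.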

In some embodiments, determining the object type information corresponding to the center point of the object in the target image comprises: performing feature extraction on the target image to obtain a second feature map corresponding to the target image; determining the position index of the center point of the object in the second feature map based on the position index of the center point of the object in the first feature map; and obtaining the object type information corresponding to the center point of the object at the position corresponding to that position index in the second feature map.
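Reading the object type at a center point can be sketched as an index lookup followed by an argmax over type channels. Two assumptions are made that the passage does not confirm: the second feature map has the same spatial resolution as the first (so the index carries over unchanged), and it stores one score channel per object type. The labels are hypothetical.

```python
def object_type_at(type_map, index, labels):
    """Return the label whose channel has the highest score at index.

    type_map: list of channels, each a list of rows of scores (assumed layout).
    index: (row, col) of the center point, reused from the first feature map
           under the assumption that both maps share the same resolution.
    labels: one label per channel, in channel order.
    """
    row, col = index
    scores = [channel[row][col] for channel in type_map]
    return labels[scores.index(max(scores))]


# Hypothetical 2-channel, 2 x 2 type map: channel 0 = seat, channel 1 = person.
type_scores = [
    [[0.9, 0.1], [0.2, 0.3]],
    [[0.1, 0.8], [0.7, 0.6]],
]
labels = ["seat", "person"]
```

If the two feature maps differed in resolution, the index would first need to be rescaled before the lookup.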

According to a second aspect of the embodiments of the present application, a digital-human-based car cabin interaction apparatus is provided, the apparatus comprising: an acquisition module configured to obtain state information of a living body riding in a car cabin; a determination module configured to determine action information matching the state information; and a display module configured to generate, based on the action information, an animation of a digital human performing a corresponding action and to display the animation on a display device in the car cabin.

In some embodiments, the display module comprises: a first determination unit configured to determine voice information matching the state information; a first acquisition unit configured to obtain, based on the voice information, a corresponding voice that includes a timestamp; and a first display unit configured to, while playing the voice, generate and display, based on the action information, an animation of the digital human performing the action at the time corresponding to the timestamp.

In some embodiments, the action comprises a plurality of sub-actions, each sub-action matches one phoneme in the voice, and the timestamp comprises a timestamp of each phoneme; the first display unit comprises: a determination sub-unit configured to determine, based on the timestamp of each phoneme, an execution time of the sub-action matching that phoneme; and a display sub-unit configured to generate and display, based on the action information, an animation of the digital human performing, at the timestamp of each phoneme, the sub-action matching that phoneme.

In some embodiments, the display module comprises: a calling unit configured to call, from an action model library, at least one digital-human action segment corresponding to the action information; and a second display unit configured to display, in sequence on the display device, each of the action image frames of the at least one digital-human action segment.

In some embodiments, the state information of the living body comprises first state information of the living body, and the acquisition module comprises: a collection unit configured to collect a monitoring video of the rear row in the car cabin; and a detection and analysis unit configured to perform living-body detection on the monitoring video and to perform state analysis on the detected living body to obtain the first state information of the living body.

In some embodiments, the first state information comprises at least one of: category information, identity information, attribute information, emotion information, facial expression information, gesture information, seating information, and seat belt wearing information of the living body; and/or the living body comprises at least one of: a driver, a front passenger, a child, an elderly person, a pet, and a rear-row passenger.

In some embodiments, the state information of the living body comprises first state information and second state information of the living body, the first state information being obtained based on a monitoring video in the car cabin; the acquisition module is further configured to obtain the second state information sent by a smart device carried by the living body; and the determination module is configured to determine action information that matches both the first state information and the second state information.

In some embodiments, the acquisition module comprises: an input unit configured to input a monitoring video in the car cabin into a pre-trained neural network; and a second determination unit configured to determine the state information of the living body based on an output of the neural network.

In some embodiments, the apparatus further comprises a generation module configured to generate an image of the digital human before the animation of the digital human performing the corresponding action is generated based on the action information and displayed on the display device in the car cabin.

In some embodiments, the generation module is configured to generate the image of the digital human based on the state information of the living body, or to generate the image of the digital human based on a preset digital-human image template.

In some embodiments, the apparatus further comprises a control module configured to control an operating state of an on-board device based on the state information.

In some embodiments, the determination module is configured to obtain a driving state of the vehicle, and to determine action information that matches the driving state of the vehicle and the state information, respectively.

In some embodiments, the acquisition module is further configured to recognize the living body riding in the car cabin based on a target image in the car cabin, and to determine, based on position information of the living body, whether the living body is located on a seat in the car cabin.

In an embodiment, the acquisition module is further configured to: determine object information of each object in a target image in the car cabin; select the living body and the seats in the car cabin from the objects based on the object type information of each object; and determine whether the living body is located on a seat based on the center point position of the living body and the center point position of the seat, wherein, for each object, the object information comprises position information of the center point of the object and object type information corresponding to the center point of the object.

In some embodiments, the display module is further configured to generate an animation in which a digital human corresponding to the living body that is not located on a seat demonstrates sitting on the seat and fastening the seat belt, and to display the animation on the display device in the car cabin.

In some embodiments, the apparatus further comprises a notification module configured to send a prompt message in response to determining that the living body is not located on the seat.

In some embodiments, the acquisition module is configured to determine the position information of the center point of the object in the target image by: performing feature extraction on the target image to obtain a first feature map corresponding to the target image; obtaining, from a first preset channel of the first feature map, a response value of each feature point in the first feature map as an object center point; dividing the first feature map into a plurality of sub-regions, and determining the maximum response value in each sub-region and the feature point corresponding to that maximum response value; and taking a feature point whose maximum response value is greater than a preset threshold as the center point of the object, and determining the position information of the center point of the object based on the position index of the center point in the first feature map.

In an embodiment, the acquisition module is configured to determine the object type information corresponding to the center point of the object in the target image by: performing feature extraction on the target image to obtain a second feature map corresponding to the target image; determining the position index of the center point of the object in the second feature map based on the position index of the center point of the object in the first feature map; and obtaining the object type information corresponding to the center point of the object at the position corresponding to that position index in the second feature map.

According to a third aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any of the embodiments.

According to a fourth aspect of the embodiments of the present application, a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method described in any of the embodiments.

According to a fifth aspect of the embodiments of the present application, a vehicle is provided, wherein a display device, a monitoring system, and either the digital-human-based car cabin interaction apparatus described in any embodiment of the present application or the computer device described in any embodiment of the present application are installed in the car cabin of the vehicle.

According to a sixth aspect of the embodiments of the present application, a computer program product is provided, comprising computer instructions that, when executed by a processor, implement the method described in any embodiment of the present application.

In the embodiments of the present application, state information of a living body in a car cabin is obtained, action information matching the state information is determined, and an animation of a digital human performing the corresponding action is generated based on the action information and displayed on a display device in the car cabin. A digital human performing different actions can therefore be displayed according to the state information of the living body in the car cabin, which achieves humanlike interaction and makes the interaction more natural. This increases the living body's acceptance of the feedback given during the interaction and thereby improves the safety of the living body while the vehicle is being driven.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and are not intended to limit the present application.

The accompanying drawings are incorporated in and form part of the specification; they illustrate embodiments consistent with the present application and, together with the specification, serve to explain the technical solutions of the present application.
FIG. 1 is a flowchart of a digital-human-based car cabin interaction method according to an embodiment of the present application.
FIG. 2A is a schematic diagram of a digital human according to an embodiment of the present application.
FIG. 2B is a schematic diagram of a digital human according to another embodiment of the present application.
FIG. 3 is a schematic diagram of a digital-human-based car cabin interaction scheme according to an embodiment of the present application.
FIG. 4 is a schematic diagram of a digital-human-based car cabin interaction apparatus according to an embodiment of the present application.
FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application.
FIG. 6A and FIG. 6B are each a schematic diagram of a vehicle according to an embodiment of the present application.

Embodiments are described in detail herein, and examples of them are shown in the drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure, as detailed in the appended claims.

The terms used in the present disclosure serve only to describe particular embodiments and are not intended to limit the present disclosure. The singular forms "a", "said" and "the" used in the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should further be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items. In addition, the term "at least one" as used herein means any one of a plurality of items, or any combination of at least two of a plurality of items.

Although the terms first, second, third and so on may be used in the present disclosure to describe various pieces of information, the information should not be limited by these terms. These terms are used only to distinguish pieces of information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information and, similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when" or "in response to determining".

To help a person skilled in the art better understand the technical solutions in the embodiments of the present disclosure, and to make the above objects, features and advantages of those embodiments clearer, the technical solutions in the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.

With the spread of private cars and the rapid development of smart car cabins, safety during driving has attracted widespread attention. One way to improve safety is to install a monitoring system in the car cabin that monitors the living bodies in the cabin and outputs interaction information based on the monitoring result, so that the living bodies in the cabin can be alerted promptly when necessary. Conventional interaction methods generally output a voice prompt through an audio playback device in the car cabin, or output a text prompt on a display device of the vehicle at the same time as the voice prompt. However, such interaction feels stiff, like interacting with a machine, and the living body's acceptance of the interaction information is sometimes relatively low, which reduces the safety of the living body while the vehicle is being driven.

In view of this, an embodiment of the present application provides a car cabin interaction method based on a digital human. As shown in FIG. 1, the method may include:

Step 101: obtaining state information of a living body riding in the car cabin;

Step 102: determining action information that matches the state information; and

Step 103: generating, based on the action information, an animation in which the digital human performs the corresponding action, and displaying the animation on a display device in the car cabin.
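By way of illustration only, steps 101 to 103 can be pictured as a small pipeline. The following is a minimal sketch under stated assumptions, not the implementation of the present application: the function names (`obtain_state`, `match_action`, `render_animation`, `interact`), the dictionary of sensor flags and the lookup table are all hypothetical placeholders.

```python
from typing import Callable, Dict

# Hypothetical lookup from a detected state label to an action label (step 102).
ACTION_FOR_STATE: Dict[str, str] = {
    "seat_belt_unfastened": "fasten_seat_belt",
    "left_seat": "sit_down",
}


def obtain_state(sensor_reading: Dict[str, bool]) -> str:
    """Step 101: reduce hypothetical sensor flags to a single state label."""
    if not sensor_reading.get("seat_belt_on", True):
        return "seat_belt_unfastened"
    if not sensor_reading.get("in_seat", True):
        return "left_seat"
    return "normal"


def match_action(state: str) -> str:
    """Step 102: pick the action information that matches the state."""
    return ACTION_FOR_STATE.get(state, "idle")


def render_animation(action: str) -> str:
    """Step 103: stand-in for animation generation; returns a description."""
    return f"digital human performs '{action}'"


def interact(sensor_reading: Dict[str, bool],
             display: Callable[[str], None]) -> str:
    """Run steps 101 to 103 once and pass the result to a display callback."""
    animation = render_animation(match_action(obtain_state(sensor_reading)))
    display(animation)
    return animation
```

In this sketch the display device is modeled as a callback, so the same pipeline could drive a control panel or a seat-back screen without change.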

In step 101, the vehicle may be any of various types of vehicle, such as a private car, a school bus or a coach. For ease of description, the solutions of the embodiments of the present application are described below using a private car as an example. The living body may include, but is not limited to, at least one of a driver, a front passenger, a child, an elderly person, a pet and a rear-row passenger. Interaction may be carried out with different living bodies according to actual needs, so as to meet the interaction requirements of various scenarios.

For example, during a ride, a living body seated in the rear row (for example, a rear-row passenger such as a child, or a pet) may face certain safety risks, such as a child unfastening the seat belt or a pet leaving its seat. However, because the driver has to concentrate on driving, the driver cannot also watch the rear row of the cabin. To improve the safety of rear-row living bodies during the ride, a digital human may be used to interact with the living body riding in the rear row of the car cabin. Accordingly, in some embodiments, the living body is a living body seated in the rear row of the car cabin. Interacting with the rear-row living body through a digital human can increase the living body's acceptance of the feedback information given during the interaction, which improves the safety of the living body and lets the driver concentrate on driving instead of devoting excessive attention to the rear row.

In some embodiments, the state information may include first state information of the living body. For example, the first state information includes at least one of the following for the living body: category information, identity information, attribute information, emotion information, expression information, body movement information, seating information and seat belt wearing information.

Here, the category information indicates the category of the living body, and the category may include human and/or animal. The identity information may include an identifier that uniquely identifies each living body. The attribute information indicates characteristic attributes of the living body and may include, but is not limited to, at least one of age, gender, appearance, body shape, clothing and accessories, hairstyle and skin color. The emotion information indicates the emotion category of the living body, which may include, but is not limited to, at least one of joy, sadness, anger, shyness, surprise, excitement, fear, rage and calm. The expression information indicates the facial expression of the living body, which may include, but is not limited to, at least one of smiling, pouting, crying, squinting and making a funny face. The body movement information indicates an action of the living body and may include, for example, at least one of clapping, stamping the feet, opening a car door, and putting the head or a hand out of a window. The seating information indicates whether the living body has left its seat (for example, a child seat). The seat belt wearing information indicates whether the living body is wearing a seat belt.

Further, the state information of the living body may also include second state information of the living body. For example, the second state information includes at least one of health state information and nervous system state information. The health state information indicates the health condition of the living body and may include, but is not limited to, at least one of heart rate, blood pressure, blood lipids and blood sugar. The nervous system state information indicates the degree of arousal of the living body's nervous system, for example whether the living body is tired or has fallen asleep. When the first state information and the second state information are both obtained, the action information is action information that matches both the first state information and the second state information.
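For illustration, the first and second state information could be held in a simple record type. The following is a sketch only; the class names, field names and units are assumptions chosen for readability, several fields listed above (such as attribute information) are omitted for brevity, and every field is optional because the text only requires "at least one" of each group.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FirstState:
    """Hypothetical container for a subset of the first state information."""
    category: Optional[str] = None       # "human" or "animal"
    identity: Optional[str] = None       # unique identifier of the living body
    emotion: Optional[str] = None        # e.g. "joy", "calm"
    expression: Optional[str] = None     # e.g. "smile", "pout"
    body_movement: Optional[str] = None  # e.g. "clapping"
    in_seat: Optional[bool] = None       # seating information
    seat_belt_on: Optional[bool] = None  # seat belt wearing information


@dataclass
class SecondState:
    """Hypothetical container for a subset of the second state information."""
    heart_rate_bpm: Optional[float] = None  # health state information
    tired: Optional[bool] = None            # nervous system state information
    asleep: Optional[bool] = None


@dataclass
class StateInfo:
    first: Optional[FirstState] = None
    second: Optional[SecondState] = None

    def must_match_both(self) -> bool:
        """True when both groups were obtained, so the chosen action
        has to match the first and the second state information."""
        return self.first is not None and self.second is not None
```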

By obtaining different kinds of state information, the state of the living body can be determined in several respects. The more categories of state information are obtained, the more comprehensive and accurate the determined state of the living body is, so that the action performed by the digital human better fits that state, which in turn makes the digital human more human-like.

The state information of the living body may be obtained in various ways. For example, it may be obtained from a monitoring video of the car cabin, from an audio monitoring result of the car cabin, or from a smart device carried by the living body. Several of these means may also be combined. Obtaining state information in multiple ways improves the comprehensiveness and flexibility of the obtained state information. Different acquisition methods may be selected for different application scenarios, so that suitable kinds of state information are obtained for each scenario. For example, in a dimly lit scene, the state information of the living body may be obtained jointly from the monitoring video and the audio monitoring result of the car cabin, which improves the accuracy of the obtained state information.

In some embodiments, a monitoring system may be installed in the car cabin. Depending on the needs of the application scenario, the monitoring system may monitor living bodies in any area of the car cabin. For example, the monitoring system may monitor the driver in the driver's seat; as another example, it may monitor the front passenger in the front passenger seat; as yet another example, it may monitor living bodies in the rear row of the car cabin. Alternatively, the monitoring system may monitor living bodies in several areas of the car cabin (for example, the front passenger seat and the rear row).

The monitoring system may include at least one of a visual monitoring system and an audio monitoring system. The visual monitoring system is used to obtain a visual monitoring result of the living body (for example, a monitoring video or a monitoring image). In some embodiments, the monitoring video may be captured by a video capture device that is mounted on the rear-view mirror in the car cabin with its lens facing the rear row. Mounting the video capture device on the rear-view mirror gives it a wide field of view that is not easily blocked by objects in the cabin, so that a relatively complete monitoring video of the rear row can be captured. The audio monitoring system is used to obtain an audio monitoring result of the living body, which may include the voice of the living body and/or sounds produced when the living body performs an action (for example, at least one of opening a car door, fastening a seat belt and knocking on a car window). The first state information of the living body may be determined based on the monitoring result.

When the state information of the living body is obtained from the monitoring video of the car cabin, the monitoring video may be input into a pre-trained neural network, and the state information of the living body may be determined based on the output of the neural network. In some embodiments, the state information may also be determined by combining a neural network with other algorithms (for example, a face recognition algorithm and/or a skeleton key point detection algorithm). In some embodiments, the state information may be determined in other ways, which are not described further here. Obtaining the state information through a neural network can improve the accuracy of the obtained state information.

The neural network may include an input layer, at least one intermediate layer and an output layer, each of which includes one or more neurons. Here, an intermediate layer generally refers to a layer located between the input layer and the output layer, for example a hidden layer. In some examples, the intermediate layers of the neural network may include, but are not limited to, at least one of a convolutional layer and a ReLU (Rectified Linear Unit) layer; the more intermediate layers the neural network contains, the deeper the network. The neural network may specifically be a deep neural network or a convolutional neural network.
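The layer structure described above can be made concrete with a toy forward pass. This is a sketch for illustration only, written in plain Python with hand-picked weights; it assumes a one-dimensional input of length four, whereas a real network for monitoring video would be trained and would operate on images. The function names and all numeric values are hypothetical.

```python
from typing import List


def conv1d(x: List[float], kernel: List[float]) -> List[float]:
    """Valid 1-D cross-correlation, the operation a convolutional layer applies."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def relu(x: List[float]) -> List[float]:
    """ReLU layer: negative activations are clamped to zero."""
    return [max(0.0, v) for v in x]


def dense(x: List[float], weights: List[List[float]],
          bias: List[float]) -> List[float]:
    """Fully connected output layer: one weighted sum per output neuron."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def forward(x: List[float]) -> List[float]:
    """Input layer -> convolution -> ReLU -> output layer (length-4 input)."""
    hidden = relu(conv1d(x, [1.0, -1.0]))           # intermediate layers
    return dense(hidden, [[1.0, 1.0, 1.0]], [0.0])  # single output neuron
```

Stacking further convolution and ReLU pairs between the input and the output layer is what makes such a network "deeper" in the sense used above.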

Alternatively, a monitoring video of the rear row of the car cabin may be captured; living body detection is performed on the monitoring video and state analysis is performed on each detected living body, to obtain the first state information of the living body. Using a monitoring video yields relatively comprehensive state information on the one hand; on the other hand, the multiple frames of target images contained in the monitoring video are temporally correlated, and this correlation can be used to improve the accuracy of the obtained state information. For example, a monitoring video of the living body may be obtained, face recognition may be performed on the monitoring video, and the identity information of the living body may be determined based on the face recognition result.

As another example, the emotion information of the living body may be recognized from the monitoring video. Specifically, at least one frame of target image containing the living body may be obtained from the monitoring video, and a face sub-image of the living body may be cropped from the target image. For each of at least two organs on the face represented by the face sub-image, the action of that organ is recognized, for example frowning, glaring or raising the corners of the mouth. The emotion information of the face represented by the face sub-image is then determined based on the recognized actions of the organs. Before the organ actions are recognized, image preprocessing may be performed on the face sub-image. Specifically, position information of key points in the face sub-image (for example, the corners of the eyes, the corners of the mouth, the point between the eyebrows, the ends of the eyebrows and the nose) is determined; based on the position information of the key points, an affine transformation is applied to the face sub-image so that a face sub-image in any other orientation is converted into a front-facing face sub-image; and the front-facing face sub-image is normalized to obtain the processed face sub-image.
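The preprocessing described above can be sketched as follows. This is an illustration under explicit assumptions rather than the method of the present application: it uses only two key points (the eyes) and a similarity transform (rotation, uniform scale and translation), which is a restricted special case of an affine transformation; it transforms key point coordinates rather than warping pixels; and the target eye positions and the zero-mean, unit-variance normalization are hypothetical choices.

```python
import statistics
from typing import List, Tuple

Point = Tuple[float, float]


def similarity_from_eyes(left_eye: Point, right_eye: Point,
                         target_left: Point = (30.0, 40.0),
                         target_right: Point = (70.0, 40.0)
                         ) -> Tuple[complex, complex]:
    """Return (a, b) such that z -> a*z + b maps the detected eyes onto the
    target positions. Points are treated as complex numbers x + iy."""
    p1, p2 = complex(*left_eye), complex(*right_eye)
    q1, q2 = complex(*target_left), complex(*target_right)
    if p1 == p2:
        raise ValueError("eye key points must be distinct")
    a = (q2 - q1) / (p2 - p1)  # rotation and uniform scale
    b = q1 - a * p1            # translation
    return a, b


def apply_transform(a: complex, b: complex, point: Point) -> Point:
    """Map one key point into the front-facing coordinate frame."""
    z = a * complex(*point) + b
    return (z.real, z.imag)


def normalize(pixels: List[float]) -> List[float]:
    """Scale pixel values to zero mean and unit variance."""
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels) or 1.0  # avoid dividing by zero
    return [(p - mean) / std for p in pixels]
```

For a face rotated by 90 degrees, with the eyes detected at (10, 10) and (10, 50), the computed transform places them on the horizontal line y = 40 at x = 30 and x = 70.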

As another example, a face sub-image of the living body may be cropped from the target image. Based on the face sub-image, left-eye open/closed state information and right-eye open/closed state information of the living body are determined. In a specific implementation, the face sub-image is input into a trained neural network, and the left-eye and right-eye open/closed state information is determined based on the output of that neural network. Whether the living body is tired or has fallen asleep is then determined based on the left-eye and right-eye open/closed state information. Specifically, the cumulative eye-closure duration of the living body is determined based on the left-eye and right-eye open/closed state information corresponding to multiple consecutive frames of target images of the living body; if the cumulative eye-closure duration is greater than a preset threshold, the living body is determined to have fallen asleep; if the cumulative eye-closure duration is less than or equal to the preset threshold, the living body is determined not to have fallen asleep.
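The threshold test on the cumulative eye-closure duration can be sketched as below. The sketch assumes, beyond what the text states, that frames arrive at a fixed interval and that a frame counts as "eyes closed" only when both eyes are closed; the function names and parameters are hypothetical.

```python
from typing import Iterable, Tuple


def cumulative_closed_seconds(frames: Iterable[Tuple[bool, bool]],
                              frame_interval_s: float) -> float:
    """Sum the time covered by frames in which both eyes are closed.

    Each frame is a pair (left_eye_open, right_eye_open), for example the
    per-frame output of an eye-state classifier.
    """
    closed_frames = sum(1 for left_open, right_open in frames
                        if not left_open and not right_open)
    return closed_frames * frame_interval_s


def is_asleep(frames: Iterable[Tuple[bool, bool]],
              frame_interval_s: float, threshold_s: float) -> bool:
    """Asleep only if the cumulative closure time exceeds the threshold;
    a duration equal to the threshold counts as not asleep."""
    return cumulative_closed_seconds(frames, frame_interval_s) > threshold_s
```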

When the state information of the living body is obtained from the audio monitoring result of the car cabin, the voice uttered by the living body may be obtained, speech recognition may be performed on the voice, and the identity information and/or emotion information of the living body may be determined based on the speech recognition result.

When the state information of the living body is obtained from a smart device carried by the living body, second state information collected by a pre-associated smart device may be received. The smart device may be a wearable device such as a smart bracelet or smart glasses, or a portable terminal such as a mobile phone or a tablet PC.

In step 102, action information that matches the state information may be determined. In some embodiments, the degree of matching between each kind of action information and the state information may be determined, and the action information with the highest degree of matching is taken as the action information that matches the state information. When the state information includes several pieces of information, the degree of matching between a given piece of action information and each piece of the state information may be determined separately, and the degree of matching between that action information and the state information as a whole is then determined from these individual degrees of matching, for example by taking their weighted average. In other embodiments, a mapping relationship between different state information and the matching action information may be established in advance, and the matching action information is determined based on that mapping relationship.
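The weighted-average variant of step 102 can be sketched as follows. The per-item matching degrees, the weights and the action names are hypothetical values chosen for illustration; the text does not specify how the individual matching degrees or the weights are obtained.

```python
from typing import Dict


def overall_match(per_info_match: Dict[str, float],
                  weights: Dict[str, float]) -> float:
    """Weighted average of the matching degrees between one action and
    each piece of state information."""
    total_weight = sum(weights[name] for name in per_info_match)
    weighted_sum = sum(per_info_match[name] * weights[name]
                       for name in per_info_match)
    return weighted_sum / total_weight


def best_action(candidates: Dict[str, Dict[str, float]],
                weights: Dict[str, float]) -> str:
    """Return the candidate action with the highest overall matching degree."""
    return max(candidates,
               key=lambda name: overall_match(candidates[name], weights))


# Hypothetical example: seat belt information is weighted three times as
# heavily as emotion information.
WEIGHTS = {"emotion": 1.0, "seat_belt": 3.0}
CANDIDATES = {
    "comfort": {"emotion": 0.9, "seat_belt": 0.1},
    "fasten_seat_belt": {"emotion": 0.2, "seat_belt": 1.0},
}
```

With these illustrative numbers the overall matching degrees are 0.3 for `comfort` and 0.8 for `fasten_seat_belt`, so `best_action(CANDIDATES, WEIGHTS)` selects `fasten_seat_belt`.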

In step 103, after the matching action information is determined, an animation in which the digital human performs the corresponding action may be generated based on the matching action information and displayed on a display device in the car cabin (for example, a control panel or a display device on the back of a seat). The digital human may be a virtual figure that is generated by software and can be displayed on a display device in the car cabin. Generating and displaying the digital human in software is relatively inexpensive, gives the digital human a relatively fast response, keeps later maintenance costs relatively low, and makes updating and upgrading convenient.

As shown in FIG. 2A and FIG. 2B, the appearance of the digital human may be a cartoon appearance, a 3D appearance generated based on the real appearance of the living body, or another type of appearance. In some embodiments, the appearance of the digital human may be fixed; that is, the digital human displayed on the display device looks the same every time, for example like a boy or like a cartoon character (for example, Doraemon). In other embodiments, the appearance of the digital human may be generated dynamically according to the actual situation; that is, the appearance of the digital human displayed on the display device may differ from case to case.

Different digital human appearances may be displayed based on the state information of the living body (for example, at least one of identity information, emotion information and attribute information). For example, a first digital human appearance corresponding to person A may be displayed to person A, and a second digital human appearance corresponding to person B may be displayed to person B. As another example, when the emotion of the living body is joy, a digital human with a smiling expression and/or bright clothing and accessories may be displayed. As another example, when the living body is a child, a child digital human may be displayed, and when the living body is an adult, an adult digital human may be displayed. As another example, when the living body has long hair, a long-haired digital human may be displayed, and when the living body has short hair, a short-haired digital human may be displayed.

In practice, the appearance of the digital human may be generated based on a preset digital human appearance template. The appearance template may be created in advance by a user or received from a server. For example, when a living body boards the vehicle, an image of the living body may be captured by a camera of the vehicle, or an image of the living body sent by a user terminal may be received, and an appearance template for the living body may be generated based on that image. Specifically, attribute detection may be performed on the image of the living body to obtain the attributes of the living body, and a digital human corresponding to the living body may be generated based on those attributes. An appearance template that has already been generated may be regenerated (for example, replaced by a new appearance template) or partially edited (for example, by changing its hairstyle). When an appearance template is generated, its degree of cartoonization may be customized.

When the animation of the digital human is displayed, the corresponding template may be invoked according to the actual situation to generate the appearance of the digital human. The template may be an adult appearance template, a child appearance template, a pet appearance template, or the like. When the living body is an adult, the adult appearance template may be invoked; when the living body is a child, the child appearance template may be invoked. Further, because the state of the living body during the ride may not match the template, after the corresponding template is invoked, the attribute information of the digital human template may be adjusted based on the state information of the living body so that the appearance of the digital human in the displayed animation is consistent with that state information. For example, the expression and the clothing and accessories of the digital human template are adjusted according to the emotion of the living body. Further, the display interface used when the digital human is displayed on the display device may also be adjusted based on the state information of the living body. For example, when the emotion of the living body is joy, the background of the display interface may be set to a bright color and/or an effect of scattering flowers may be shown on the display interface.
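Template selection followed by state-dependent adjustment can be sketched as below. The record type, its fields and the single rule for the emotion "joy" are hypothetical and serve only to illustrate the two-stage idea of invoking a template and then adjusting its attributes and the display interface; the stored templates themselves are left unchanged.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class AppearanceTemplate:
    """Hypothetical appearance template of a digital human."""
    kind: str                    # "adult", "child" or "pet"
    expression: str = "neutral"
    outfit: str = "plain"
    background: str = "default"  # display interface background


TEMPLATES: Dict[str, AppearanceTemplate] = {
    "adult": AppearanceTemplate("adult"),
    "child": AppearanceTemplate("child"),
    "pet": AppearanceTemplate("pet"),
}


def build_appearance(category: str, emotion: str) -> AppearanceTemplate:
    """Invoke the template for the category, then adjust it to the emotion."""
    template = TEMPLATES[category]
    if emotion == "joy":
        # Smiling expression, bright outfit and bright interface background.
        return replace(template, expression="smile",
                       outfit="bright", background="bright")
    return template
```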

Generating and displaying different digital human appearances in this way makes the appearance of the digital human fit the living body to some degree, so that the living body feels closeness and warmth during the interaction, which increases the living body's acceptance of the feedback information given during interaction with the digital human.

In addition to displaying, on the display device in the vehicle cabin, an animation of the digital human performing the corresponding action, voice information matching the state information may also be determined to further improve the interaction, and the corresponding voice may be played in synchronization with the animation based on that voice information. For example, if the matched action information corresponds to unfastening a seat belt, the display device may show the animation of the digital human performing the corresponding action while a voice clip such as "Little friend, unfastening your seat belt while the car is moving is very dangerous" is played. In practical applications, the voice information matching the state information may be determined with a neural network, or a mapping between different state information and voice information may be built in advance and the matching voice information determined from that mapping. Furthermore, subtitle information corresponding to the voice may be shown on the display interface while the voice is played.
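The pre-built mapping between state information and voice information can be illustrated with a lookup table. This is only a minimal sketch: the state keys, the second prompt text, and the function name `match_voice` are hypothetical and do not come from the application, which leaves the form of the mapping open.

```python
from typing import Optional

# Hypothetical mapping from a detected state to a voice prompt. Only the
# seat-belt sentence paraphrases the description; the rest is illustrative.
STATE_TO_VOICE = {
    "unfasten_seat_belt": (
        "Little friend, unfastening your seat belt while the car "
        "is moving is very dangerous"
    ),
    "joy": "You look happy today",
}


def match_voice(state: str) -> Optional[str]:
    """Return the voice text mapped to a state, or None if nothing is mapped."""
    return STATE_TO_VOICE.get(state)
```

A neural network could replace the table when the set of states is too large or too varied to enumerate by hand.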

Synchronization of the played voice and the displayed animation means that the content of the played voice matches the action performed by the digital human in the animation, where the action may include at least one of a body gesture, a mouth movement, an eye movement, and the like. For example, when the digital human in the animation waves a hand and its mouth movement corresponds to "hello", the voice content "hello" is played. Specifically, voice information matching the state information may be determined; a corresponding voice that includes a timestamp may be obtained based on the voice information; and, while the voice is played, an animation in which the digital human performs the action at the time corresponding to the timestamp may be generated and displayed based on the action information. Playing the voice in synchronization with the digital human's action in this way further improves the degree of personification of the digital human and makes the interaction between the digital human and the living body more natural.

The voice is pulled from a voice database, and the pulled voice carries its timestamp, which is used to synchronize the time at which the digital human performs the corresponding action in the animation with the voice. When the voice is pulled, the state information of the living body (for example, at least one of attribute information and emotion information) may also be obtained and sent to the voice database, so that a corresponding voice is pulled from the voice database. For example, when the living body is a child, a voice whose timbre suits a child is pulled.

A voice clip usually contains multiple phonemes. A phoneme is the smallest unit of speech divided according to the natural properties of speech; it is analyzed according to the articulatory actions within a syllable, with one articulatory action forming one phoneme. For example, "annyeong" ("hello" in Korean) contains the two phonemes "an" and "nyeong". When the voice contains multiple phonemes, the timestamp may include a timestamp for each phoneme. One action generally contains multiple sub-actions; for example, waving a hand may include a sub-action of swinging the arm to the left and a sub-action of swinging it to the right. To make the displayed digital human more vivid, each sub-action may be matched with one phoneme of the voice. Specifically, the execution time of the sub-action matching each phoneme may be determined based on the timestamp of that phoneme, and an animation in which the digital human performs, at the timestamp of each phoneme, the sub-action matching that phoneme may be generated and displayed based on the action information. For example, while the phoneme "an" is played, the mouth movement matching "an" is shown together with the digital human swinging its arm to the left; while the phoneme "nyeong" is played, the mouth movement matching "nyeong" is shown together with the digital human swinging its arm to the right. Synchronizing each phoneme with the digital human's action improves the accuracy of the synchronization, makes the digital human's actions and voice playback more vivid, and further improves the degree of personification of the digital human.
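The phoneme-level synchronization described above can be sketched as pairing each phoneme timestamp with one sub-action. The `Phoneme` class, the `schedule_sub_actions` function, the use of seconds as the time unit, and the one-to-one pairing are assumptions made for illustration; the application does not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass
class Phoneme:
    text: str
    start: float  # timestamp of the phoneme, in seconds (assumed unit)


def schedule_sub_actions(phonemes, sub_actions):
    """Pair each phoneme with one sub-action, reusing the phoneme timestamp
    as the execution time of that sub-action."""
    if len(phonemes) != len(sub_actions):
        raise ValueError("this sketch assumes one sub-action per phoneme")
    return [(p.start, p.text, a) for p, a in zip(phonemes, sub_actions)]


# "annyeong" split into two phonemes, each paired with one half of a wave.
timeline = schedule_sub_actions(
    [Phoneme("an", 0.0), Phoneme("nyeong", 0.3)],
    ["swing_arm_left", "swing_arm_right"],
)
```

An animation generator would then render each sub-action, together with the matching mouth movement, at the time stored in the first element of each tuple.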

The action corresponding to the action information may be called from an action model library. Specifically, at least one digital human motion segment corresponding to the action information may be called from the action model library, and each motion image frame of the at least one segment may be displayed in sequence on the display device. Different motion image frames differ in at least one of the digital human's body gesture, facial expression, mouth movement, eye movement, and the like, so calling the corresponding motion image frames and displaying them in sequence shows, on the display device, the animation of the digital human performing the corresponding action. In this way the animation of the digital human can be displayed simply by calling the corresponding frames, which gives high display efficiency at low cost.

Both the voice database and the action model library can be updated so that more voice material and action material can be added to them. The two are updated in similar ways, so only the voice database is described here as an example; the action model library may be updated in the same way, and the description is not repeated. When the voice database is updated, an update package may be downloaded from a server and the data in it parsed, after which the data in the update package is either added to the voice database or used to replace the original data in the voice database.
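The two update modes, adding package data or replacing original data, can be sketched as follows. Representing the database and the parsed update package as dictionaries keyed by clip name is an assumption, as is the function name `apply_update`; the application does not specify a storage format.

```python
def apply_update(database: dict, package: dict, replace: bool = False) -> dict:
    """Return an updated copy of the database.

    replace=False: only entries whose key is new are added.
    replace=True:  entries in the package also overwrite existing ones.
    """
    updated = dict(database)  # leave the caller's database untouched
    for key, clip in package.items():
        if replace or key not in updated:
            updated[key] = clip
    return updated


voice_db = {"greet": "greet_v1.wav"}
update_package = {"greet": "greet_v2.wav", "goodbye": "goodbye_v1.wav"}
added = apply_update(voice_db, update_package)
replaced = apply_update(voice_db, update_package, replace=True)
```

The same function would serve the action model library if its entries are keyed in a similar way.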

In one embodiment, the voice database may be updated at fixed time intervals. In another embodiment, an update prompt message pushed by the server may be received, and the voice database may be updated in response to the update prompt message. In yet another embodiment, an update instruction sent by the living body may be received, an update request may be sent to the server in response to the update instruction, and, after the update package returned by the server in response to the update request is received, the voice database may be updated based on the update package. The voice database may also be updated in other ways, which are not described here.

In some embodiments, the driving state of the vehicle may also be obtained, and action information matching both the driving state of the vehicle and the state information may be determined. For example, if the vehicle is moving and the living body's gesture is detected as unfastening the seat belt, the matched action information includes action information corresponding to an action that tells the living body not to unfasten the seat belt. If the vehicle has stopped and the engine is off, and the living body's gesture is detected as unfastening the seat belt, the matched action information is determined to include action information corresponding to waving goodbye. In this way the matching action information can be determined more accurately, which reduces the probability of misjudgment.
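Matching on both the driving state and the detected gesture can be sketched as a lookup keyed by the pair. The state names, action names, and the function `match_action` are hypothetical labels chosen for this illustration.

```python
from typing import Optional

# (driving state, detected gesture) -> action the digital human performs.
# Both rules restate the two examples in the description.
ACTION_RULES = {
    ("driving", "unfasten_seat_belt"): "warn_do_not_unfasten_seat_belt",
    ("stopped_engine_off", "unfasten_seat_belt"): "wave_goodbye",
}


def match_action(driving_state: str, gesture: str) -> Optional[str]:
    """Return the action matching both inputs, or None if no rule applies."""
    return ACTION_RULES.get((driving_state, gesture))
```

Keying on the pair rather than on the gesture alone is what lets the same gesture produce different responses, which is the source of the reduced misjudgment rate claimed above.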

In some embodiments, the running state of an in-vehicle device may also be controlled based on the state information. The in-vehicle device includes at least one of a lighting device, an air conditioner, a window, an audio playback device, and a seat in the vehicle cabin. For example, when the living body is tired or already asleep, the seat may be reclined so that the living body can lie as flat as possible on the seat. As another example, when the living body feels hot, a window may be opened or the air conditioner turned on to adjust the environment in the vehicle cabin, providing the living body with a more comfortable and safer ride.

In practical applications, the embodiments of the present application can detect behaviors such as a living body entering the vehicle cabin or leaving a seat in the cabin, opening or closing a door, and fastening or unfastening a seat belt, and can display, based on the state information of the living body, an animation of the digital human performing a corresponding action, thereby interacting with the living body for each of these behaviors.

Taking detection of a living body entering the cabin or leaving a seat as an example, in some embodiments, obtaining the state information of a living body riding in the vehicle cabin may include: obtaining a target image of the cabin interior, recognizing the living body in the target image, and determining, based on the position information of the living body, whether the living body is located on a seat in the cabin. The target image may be obtained from surveillance video of the cabin. Specifically, object information may be determined for each object in the target image (including the position information of the object's center point and the object type information corresponding to that center point); living bodies and seats may be selected from the objects based on the object type information; and whether a living body is located on a seat may then be determined from the position of the living body's center point and the position of the seat's center point. In response to determining that the living body is not located on the seat, a prompt message is sent. Objects in the target image may include a human face, a human body, a rear seat, a child car seat, and the like. For example, if it is detected while the vehicle is moving that the living body is not on a seat, it may be determined that the living body is not wearing a seat belt; the control panel may then show an animation in which the digital human corresponding to the living body sits on a seat and demonstrates fastening a seat belt, while the voice "Little friend, sit down quickly and fasten your seat belt with me" is played, with the digital human's mouth movements and gestures in the animation matching the played voice.
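The description says only that seat occupancy is decided from the two center-point positions and does not give the criterion. One possible criterion, shown here purely as an assumption, is to treat the living body as seated when its center point falls inside the seat's extent. The function name `is_on_seat` and the convention that width runs along x and length along y are also assumptions.

```python
def is_on_seat(body_center, seat_center, seat_length, seat_width):
    """Return True if the body's center lies within the seat's extent.

    Coordinates are (x, y) in feature-map or pixel units. Width is assumed
    to run along x and length along y.
    """
    bx, by = body_center
    sx, sy = seat_center
    return (abs(bx - sx) <= seat_width / 2
            and abs(by - sy) <= seat_length / 2)


seated = is_on_seat((40, 30), (42, 28), seat_length=20, seat_width=16)
away = is_on_seat((10, 30), (42, 28), seat_length=20, seat_width=16)
```

A distance threshold between the two center points would be an equally plausible reading of the text.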

In some embodiments, the position information of the center point of each object in the target image may be determined through the following steps 601 to 604.

In step 601, feature extraction is performed on the target image to obtain a first feature map corresponding to the target image. In some embodiments, the target image is first input into a first neural network for image feature extraction, yielding an initial feature map. The initial feature map is then input into a second neural network used for object information extraction, yielding the first feature map. The target image may be, for example, 640*480 pixels, in which case the first neural network produces an initial feature map of size 80*60*C, where C is the number of channels. After processing by the second neural network, a first feature map of size 80*60*3 is obtained.

In step 602, the response value of each feature point in the first feature map as an object center point is obtained from a first preset channel of the first feature map. The first preset channel may be channel 0 of the first feature map. This channel is the object center point channel, and a response value in it indicates how likely the corresponding feature point in the first feature map is to be the center point of an object. After the response value of each feature point in the first preset channel is obtained, a sigmoid function may be used to map these response values into the range between 0 and 1.

In step 603, the first feature map is divided into a plurality of sub-regions, and the maximum response value in each sub-region and the feature point corresponding to it are determined. In some embodiments, a 3×3 max pooling operation with a stride of 1 may be applied to the first feature map to obtain the maximum response value within each 3×3 window and the position index of that maximum in the first feature map. This yields 60×80 maximum response values, each with a corresponding position index. Identical position indices may then be merged to obtain N maximum response values, the position index of each, and the feature point corresponding to each.

In step 604, each feature point whose maximum response value is greater than a preset threshold is taken as the center point of an object, and the position information of the center point is determined from its position index in the first feature map. A threshold thrd may be set in advance; when a maximum response value is greater than thrd, the feature point corresponding to that maximum response value is judged to be the center point of an object.

In the above embodiment, applying max pooling to the response values in the first feature map finds, within each local range, the feature point most likely to be the center point of an object, which effectively improves the accuracy of the determined center points.
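Steps 602 to 604 can be sketched in plain Python on a small response map. This is an illustration under stated assumptions, not the application's implementation: the function name `find_centers`, the row-major list layout, the border handling (windows are clipped at the edge), and the default threshold of 0.5 are all chosen here for the example.

```python
import math


def find_centers(heatmap, thrd=0.5):
    """Return position indices (row, col) of object center points.

    heatmap: 2-D list of raw channel-0 responses of the first feature map.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    # Step 602: squash raw responses into (0, 1) with a sigmoid.
    prob = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in heatmap]

    peaks = {}  # position index of a window maximum -> its response value
    for r in range(rows):
        for c in range(cols):
            # Step 603: 3x3 window with stride 1, clipped at the border.
            window = [
                (prob[i][j], (i, j))
                for i in range(max(r - 1, 0), min(r + 2, rows))
                for j in range(max(c - 1, 0), min(c + 2, cols))
            ]
            value, index = max(window)
            peaks[index] = value  # identical position indices merge here

    # Step 604: keep maxima whose response exceeds the threshold.
    return sorted(idx for idx, value in peaks.items() if value > thrd)


responses = [[-5.0] * 5 for _ in range(5)]
responses[1][1] = 4.0  # strong center candidate
responses[3][3] = 2.0  # weaker center candidate
centers = find_centers(responses)
```

On a real 80*60 feature map, a tensor library's max pooling would normally replace the explicit loops; the dictionary plays the role of the index-merging step that reduces the per-window maxima to N distinct candidates.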

In some embodiments, the center point of an object and the position information of that center point are taken as the center point information of the object. In some embodiments, the center point information may further include length information and width information of the center point of the object. In that case, the length information and width information of the center point may be determined through the following steps:

The length information of the center point of the object is obtained from a second preset channel of the first feature map, at the position corresponding to the position index of the center point. The width information of the center point of the object is obtained from a third preset channel of the first feature map, at the position corresponding to the position index of the center point.

The second preset channel may be channel 1 of the first feature map, and the third preset channel may be channel 2 of the first feature map. The length information of the center point is obtained from channel 1 of the first feature map at the position corresponding to the center point, and the width information of the center point is obtained from channel 2 of the first feature map at the position corresponding to the center point.

Once the center point of an object has been determined, its position index can be used to obtain the length information and width information of the center point accurately from the other preset channels of the first feature map.
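Reading the length and width at a center point amounts to indexing channels 1 and 2 of the first feature map with the center's position index. The sketch below assumes a channels-first nested list, `feature_map[channel][row][col]`; the function name `read_size` and the sample values are hypothetical.

```python
def read_size(feature_map, center):
    """Return (length, width) stored at a center point.

    feature_map[channel][row][col]: channel 0 holds center responses,
    channel 1 holds length, channel 2 holds width.
    """
    row, col = center
    return feature_map[1][row][col], feature_map[2][row][col]


# A tiny 3-channel, 2x2 feature map with made-up values.
first_map = [
    [[0.1, 0.9], [0.2, 0.1]],      # channel 0: center responses
    [[0.0, 12.0], [0.0, 0.0]],     # channel 1: length
    [[0.0, 7.5], [0.0, 0.0]],      # channel 2: width
]
length, width = read_size(first_map, (0, 1))
```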

Because the objects may include a human face, a human body, a rear seat, a child car seat, and the like, a specific implementation needs to use different neural networks to determine the first feature maps corresponding to the different objects, and then determine, from the different first feature maps, the center points of the different objects together with the position information, length information, and width information of each center point.

In some embodiments, the object type information corresponding to the center point of an object in the target image is determined through the following steps 701 to 703:

In step 701, feature extraction is performed on the target image to obtain a second feature map corresponding to the target image. The target image may be input into a third neural network for image feature extraction, yielding an initial feature map; the initial feature map is then input into a fourth neural network used for object type recognition, yielding the second feature map, from which the object type information corresponding to the center point of the object is determined. The second feature map may be a feature map of size 80*60*2. The third neural network may be the same as the first neural network.

In an application scenario that recognizes children, each feature point in the second feature map corresponds to a two-dimensional feature vector. Classification may be performed on the center point of the object according to the two-dimensional feature vector of the corresponding feature point in the second feature map to obtain a classification result. If one classification result represents a child and the other represents something else, whether the object type information corresponding to the center point of the object is a child can be determined from the classification result. In this scenario, the object may be a human body or a human face.

In an application scenario that recognizes child car seats, each feature point in the second feature map likewise corresponds to a two-dimensional feature vector. Classification may be performed on the center point of the object according to the two-dimensional feature vector of the corresponding feature point in the second feature map to obtain a classification result. If one classification result represents a child car seat and the other represents something else, whether the object type information corresponding to the center point of the object is a child car seat can be determined from the classification result. It will be understood that rear seats and the like can be recognized in the same way.

Because the objects may include a human face, a human body, a rear seat, a child car seat, and the like, a specific implementation needs to use different neural networks to determine the second feature maps corresponding to the different objects, and then determine the object type information of the different objects from the different second feature maps.

In step 702, the position index of the center point of the object in the second feature map is determined based on the position index of the center point of the object in the first feature map.

In step 703, the object type information corresponding to the center point of the object is obtained at the position in the second feature map that corresponds to the position index of the center point. Once the center point of the object has been determined as described above, its position index can be used to obtain the corresponding object type information accurately.
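Steps 702 and 703 can be sketched as reusing the center's position index in the second feature map and classifying the two-dimensional feature vector found there. Two points are assumptions rather than statements of the application: that both feature maps have the same 80*60 resolution so the index carries over unchanged, and that classification is a simple arg-max over the two channel scores. The name `classify_center` and the labels are hypothetical.

```python
def classify_center(second_map, center, labels=("child", "other")):
    """Return the label whose channel score is highest at the center point.

    second_map[channel][row][col] with one channel per label.
    """
    row, col = center  # same index as in the first map (equal resolution)
    scores = [second_map[k][row][col] for k in range(len(labels))]
    return labels[scores.index(max(scores))]


second_map = [
    [[0.2, 0.8], [0.1, 0.4]],  # channel 0: score for "child"
    [[0.7, 0.1], [0.9, 0.6]],  # channel 1: score for "other"
]
kind = classify_center(second_map, (0, 1))
```

For the child car seat scenario the same function applies with `labels=("car_seat", "other")` and a second feature map produced by the corresponding network.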

After the object type information corresponding to the center point of each object has been determined, a specific group (for example, children or pets) in the target image may be recognized through the following steps 801 to 803, so that the digital human corresponding to that group can interact with it. For ease of description, children are used as the example below; other groups are recognized in a similar way, and the description is not repeated.

In step 801, the predicted position information of the center point of the human face matching each human body is determined based on the position offset information corresponding to the center point of that human body, where a human body and a human face belonging to the same person match each other. To determine the position offset information, the target image may first be input into a fifth neural network for image feature extraction, yielding an initial feature map. The initial feature map is then input into a sixth neural network used for determining the position offset information, yielding a feature map from which the position offset information corresponding to the center point of each human body can be determined. This feature map may be of size 80*60*2. The fifth neural network may be the same as the first neural network.

In step 802, the human face matching each human body is determined based on the determined predicted position information and the position information of the center point of each human face. In some embodiments, the human face whose center point is closest to the position given by the predicted position information may be taken as the human face matching the human body.

In step 803, for a successfully matched human body and human face, whether the person to whom they belong is a child is determined using the object type information corresponding to the center point of the human body and the object type information corresponding to the center point of the human face. If the object type information of the matched human body's center point indicates that the person to whom the body belongs is a child, and the object type information of the human face's center point indicates that the person to whom the face belongs is a child, the person to whom the matched human body and human face belong is determined to be a child. For a human body that was not successfully matched, whether the person to whom it belongs is a child is determined using the object type information corresponding to the center point of that human body. Specifically, if that object type information indicates a child, the person to whom the human body belongs is determined to be a child.
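Steps 801 to 803 can be sketched for a single human body as follows. The dictionary fields, the function name `is_child`, and in particular the `max_dist` cutoff that decides when matching has failed are assumptions: the description says the nearest face is taken as the match but does not state when a body counts as unmatched.

```python
import math


def is_child(body, faces, max_dist=5.0):
    """Decide whether the person owning `body` is a child.

    body:  {"center": (x, y), "offset": (dx, dy), "type": str}
    faces: list of {"center": (x, y), "type": str}
    """
    # Step 801: predicted face center = body center + position offset.
    predicted = (body["center"][0] + body["offset"][0],
                 body["center"][1] + body["offset"][1])

    # Step 802: the face whose center is nearest to the prediction.
    nearest = min(faces,
                  key=lambda f: math.dist(f["center"], predicted),
                  default=None)
    matched = (nearest is not None
               and math.dist(nearest["center"], predicted) <= max_dist)

    # Step 803: a matched pair must agree; otherwise use the body alone.
    if matched:
        return body["type"] == "child" and nearest["type"] == "child"
    return body["type"] == "child"


body = {"center": (30, 40), "offset": (0, -10), "type": "child"}
faces = [{"center": (30, 31), "type": "child"},
         {"center": (60, 30), "type": "adult"}]
result = is_child(body, faces)
```

This sketch treats each body independently, so two bodies could select the same face; a full implementation would probably enforce one-to-one matching.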

The solution according to the embodiments of the present application is described below with reference to a specific example.

When a living body boards the vehicle and sits down, a monitoring video of the living body is acquired through the monitoring system. It is detected that the classification information of the living body is human, that its identity information is A, and that its attribute information is child; at the same time, the emotion information of living body A is detected to be joy. An animation is then generated and displayed in which a smiling child digital human, in bright clothing and accessories, raises a hand in greeting, while the greeting voice "A, you seem to be in a good mood today" is played. As shown in FIG. 3, the mouth shape and gestures of the digital human in the animation match the voice being played. If the identity of the living body is not recognized, a generic form of address chosen according to attribute information such as gender and age, for example "little friend" or "sir", may be used in the greeting. After living body A is seated, if A's emotion is relatively calm, voice interaction may be omitted. After the vehicle starts moving, if it is detected that living body A is not wearing a seat belt, an animation of a child-shaped digital human demonstrating how to fasten a seat belt is displayed on the control panel, while the voice "A, let's fasten our seat belts together" is played; the mouth shape and gestures of the digital human in the animation match the voice being played.

After a certain period of time, if A is detected saying "I'm hot", the car window is opened and the air conditioner is turned on. After a further period of time, A interacts with the digital human in a preset manner (for example, by touching the digital human, gazing at it, or calling it by voice), and the digital human interacts with A. The interaction may take the form of conversation, a game, or music playback control; during the interaction, a digital human animation corresponding to the content of the interaction is displayed and the voice is played in synchronization. When A gets out of the vehicle, if the monitoring system detects that an item has been left on A's seat, an animation of the digital human waving is displayed on the control panel, while the voice "A, you left something in the car, come back and get it" is played in synchronization. After A gets out, the car window may also be closed and the music turned off.

The embodiments of the present application generate, based on the state information of the living body in the car cabin, an animation in which the digital human performs a corresponding action, and display the animation on a display device in the car cabin. This enables personified interaction, makes the interaction process more natural, conveys warmth in the human-machine interaction, and improves the enjoyment, comfort, and sense of companionship of the ride. As a result, the living body is more receptive to feedback information given during the interaction, which improves the safety of the living body while the vehicle is being driven. In addition, because the embodiments of the present application generate the digital human animation in software, the cost is relatively low, the digital human responds relatively quickly, and subsequent maintenance, updates, and upgrades are convenient.

Those skilled in the art will understand that, in the methods of the specific implementations described above, the order in which the steps are written does not imply a strict order of execution and does not limit the implementation in any way; the specific order in which the steps are executed should be determined by their functions and possible internal logic.

As shown in FIG. 4, the present application further provides a car cabin interaction apparatus based on a digital human, the apparatus comprising:

an acquisition module 401 configured to acquire state information of a living body riding in a car cabin;

a determination module 402 configured to determine action information matching the state information; and

a display module 403 configured to generate, based on the action information, an animation in which a digital human performs a corresponding action, and to display the animation on a display device in the car cabin.
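As a rough illustration of how the three modules could cooperate, the following sketch passes an observation through an acquisition function, a rule-based determination function, and a display function. The rule table `ACTION_RULES`, the state keys, the action names, and the callback `show` are all hypothetical and serve only to make the data flow concrete.

```python
from typing import Callable, Dict, List

# Hypothetical rule table: (state key, state value, action name).
# Earlier rules take priority over later ones.
ACTION_RULES = [
    ("seat_belt", "unfastened", "demonstrate_seat_belt"),
    ("emotion", "joy", "wave_hello"),
]


def acquire_state(observation: Dict[str, str]) -> Dict[str, str]:
    """Acquisition module (401): here the observation is simply copied."""
    return dict(observation)


def determine_action(state: Dict[str, str]) -> str:
    """Determination module (402): pick the first rule matching the state."""
    for key, value, action in ACTION_RULES:
        if state.get(key) == value:
            return action
    return "idle"


def display_animation(action: str,
                      show: Callable[[str], None],
                      n_frames: int = 3) -> List[str]:
    """Display module (403): build placeholder frames and show them in order."""
    frames = [f"{action}#{i}" for i in range(n_frames)]
    for frame in frames:
        show(frame)
    return frames
```

A caller could pass `print` as `show` to see the frame labels, or a function that draws on the in-cabin screen in a real system.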

In some embodiments, the display module 403 includes: a first determination unit configured to determine voice information matching the state information; a first acquisition unit configured to acquire, based on the voice information, a corresponding voice that includes a timestamp; and a first display unit configured to, while the voice is played, generate and display, based on the action information, an animation in which the digital human performs the action at the time corresponding to the timestamp.

In some embodiments, the action includes a plurality of sub-actions, each sub-action matches one phoneme in the voice, and the timestamp includes a timestamp of each phoneme. The first display unit includes: a determination subunit configured to determine, based on the timestamp of each phoneme, the execution time of the sub-action matching that phoneme; and a display subunit configured to generate and display, based on the action information, an animation in which the digital human performs, at the timestamp of each phoneme, the sub-action matching that phoneme.
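The phoneme-level synchronization described above can be pictured with a small sketch. It assumes that the voice comes with a list of (phoneme, start time) pairs and that a lookup table maps each phoneme to a mouth-shape sub-action; the table `PHONEME_TO_SUB_ACTION`, its entries, and the function names are invented for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping from phoneme to mouth-shape sub-action.
PHONEME_TO_SUB_ACTION = {
    "a": "mouth_open",
    "m": "mouth_closed",
    "o": "mouth_round",
}


def schedule_sub_actions(phonemes: List[Tuple[str, float]],
                         default: str = "mouth_neutral") -> List[Tuple[float, str]]:
    """Turn (phoneme, timestamp in seconds) pairs into a time-ordered list of
    (execution time, sub-action) pairs."""
    timeline = [(t, PHONEME_TO_SUB_ACTION.get(p, default)) for p, t in phonemes]
    return sorted(timeline)


def sub_action_at(timeline: List[Tuple[float, str]], t: float) -> Optional[str]:
    """Return the sub-action whose start time is the latest one not after t,
    or None if playback has not reached the first phoneme."""
    current = None
    for start, sub_action in timeline:
        if start > t:
            break
        current = sub_action
    return current
```

During playback, a renderer could call `sub_action_at` with the current audio time on every frame, so that the mouth shape follows the voice.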

In some embodiments, the display module 403 includes: a calling unit configured to call, from an action model base, at least one action segment of the digital human corresponding to the action information; and a second display unit configured to display, in order, each image frame of the at least one action segment of the digital human on the display device.
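One simple way to picture the calling unit and the second display unit is a dictionary that maps action information to stored segments, each segment being an ordered list of frames. The dictionary `ACTION_MODEL_BASE`, its keys, and the frame labels below are placeholders; the present application does not describe how the action model base is stored.

```python
from typing import Dict, Iterator, List

# Hypothetical action model base: action information -> list of segments,
# each segment being an ordered list of frame labels.
ACTION_MODEL_BASE: Dict[str, List[List[str]]] = {
    "wave_hello": [["raise_arm_1", "raise_arm_2"], ["wave_left", "wave_right"]],
    "demonstrate_seat_belt": [["reach_belt", "pull_belt", "click_buckle"]],
}


def frames_in_order(action_info: str) -> Iterator[str]:
    """Call the segments for the given action information and yield their
    frames one by one, segment after segment."""
    for segment in ACTION_MODEL_BASE.get(action_info, []):
        for frame in segment:
            yield frame
```

Unknown action information yields no frames in this sketch; a real system would need its own fallback.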

In some embodiments, the state information of the living body includes first state information of the living body, and the acquisition module 401 includes: a collection unit configured to collect a monitoring video of the rear row in the car cabin; and a detection and analysis unit configured to perform living-body detection on the monitoring video and perform state analysis on the detected living body to obtain the first state information of the living body.

In some embodiments, the state information of the living body includes first state information and second state information of the living body, the first state information being acquired based on a monitoring video of the car cabin; the acquisition module 401 is further configured to acquire the second state information sent by a smart device carried by the living body; and the determination module 402 is configured to determine action information matching both the first state information and the second state information.

In some embodiments, the acquisition module 401 includes: an input unit configured to input the monitoring video of the car cabin into a pre-trained neural network; and a second determination unit configured to determine the state information of the living body based on the output of the neural network.

In some embodiments, the determination module 402 is configured to: acquire the driving state of the vehicle; and determine action information matching the driving state of the vehicle and the state information, respectively.

In some embodiments, the acquisition module 401 is further configured to: recognize, based on a target image of the car cabin, the living body riding in the car cabin; and determine, based on the position information of the living body, whether the living body is located on a seat in the car cabin.

In some embodiments, the acquisition module 401 is further configured to: determine object information of each object in the target image of the car cabin; select the living body and the seats in the car cabin from the objects based on the object type information of each object; and determine whether the living body is located on a seat based on the center point position of the living body and the center point position of the seat. For each object, the object information of the object includes position information of the center point of the object and object type information corresponding to the center point of the object.
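The seat check described above might be sketched as follows, under the assumption that a living body counts as seated when its center point lies within some distance of a seat center point. The distance limit `max_offset`, the type labels, and the dictionary layout of each object are assumptions made for this sketch only.

```python
from math import hypot
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical object type labels.
LIVING_TYPES = {"child", "adult", "pet"}
SEAT_TYPE = "seat"


def is_on_seat(body_center: Point, seat_centers: List[Point],
               max_offset: float = 40.0) -> bool:
    """True if the body center lies within max_offset of any seat center
    (max_offset is an assumed criterion)."""
    return any(hypot(body_center[0] - sx, body_center[1] - sy) <= max_offset
               for sx, sy in seat_centers)


def unseated_bodies(objects: List[Dict], max_offset: float = 40.0) -> List[Point]:
    """Select living bodies and seats by object type, then return the center
    points of living bodies that are not located on any seat."""
    bodies = [o["center"] for o in objects if o["type"] in LIVING_TYPES]
    seats = [o["center"] for o in objects if o["type"] == SEAT_TYPE]
    return [b for b in bodies if not is_on_seat(b, seats, max_offset)]
```

The bodies returned by `unseated_bodies` are the ones for which a sit-down and seat-belt demonstration could be shown.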

In some embodiments, the display module 403 is further configured to generate an animation in which the digital human corresponding to a living body not located on a seat demonstrates sitting down on a seat and fastening a seat belt, and to display the animation on the display device in the car cabin.

In some embodiments, the acquisition module 401 is configured to determine the position information of the center point of an object in the target image by: performing feature extraction on the target image to obtain a first feature map corresponding to the target image; acquiring, from a first preset channel of the first feature map, a response value of each feature point in the first feature map as an object center point; dividing the first feature map into a plurality of sub-regions and determining the maximum response value in each sub-region and the feature point corresponding to that maximum response value; and taking a feature point whose maximum response value is greater than a preset threshold as the center point of the object, and determining the position information of the center point of the object based on the position index of that center point in the first feature map.
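The sub-region search described above can be illustrated on a small response map. The sketch below works on a plain list of lists standing in for the first preset channel; the sub-region size `region`, the threshold value, and the `stride` used to convert a position index to image coordinates are assumed values, not parameters given in the present application.

```python
from typing import List, Tuple

Index = Tuple[int, int]  # (row, column) position index in the feature map


def find_center_points(response: List[List[float]],
                       region: int = 2,
                       threshold: float = 0.5) -> List[Index]:
    """Split the response map into region x region sub-regions, take the
    maximum response in each, and keep it if it exceeds the threshold."""
    height = len(response)
    width = len(response[0]) if height else 0
    centers: List[Index] = []
    for r0 in range(0, height, region):
        for c0 in range(0, width, region):
            best_val, best_idx = None, None
            for r in range(r0, min(r0 + region, height)):
                for c in range(c0, min(c0 + region, width)):
                    if best_val is None or response[r][c] > best_val:
                        best_val, best_idx = response[r][c], (r, c)
            if best_val is not None and best_val > threshold:
                centers.append(best_idx)
    return centers


def to_image_position(index: Index, stride: int = 4) -> Tuple[int, int]:
    """Convert a (row, column) index to (x, y) image coordinates, assuming the
    feature map is the image downsampled by `stride`."""
    row, col = index
    return (col * stride, row * stride)
```

Taking one maximum per sub-region plays the role of a local peak search, so that each object contributes at most one center point per sub-region.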

In some embodiments, the acquisition module 401 is configured to determine the object type information corresponding to the center point of an object in the target image by: performing feature extraction on the target image to obtain a second feature map corresponding to the target image; determining the position index of the center point of the object in the second feature map based on the position index of the center point of the object in the first feature map; and acquiring, at the position corresponding to the position index of the center point of the object in the second feature map, the object type information corresponding to the center point of the object.
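As a minimal sketch of this lookup, suppose the second feature map has the same resolution as the first and stores one score per object type at every position, so that the same position index can be reused. Both suppositions, as well as the label list `OBJECT_TYPES` and the function name, are assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical object type labels, one per score in the second feature map.
OBJECT_TYPES = ["child", "adult", "seat"]


def object_type_at(type_map: List[List[List[float]]],
                   index: Tuple[int, int]) -> str:
    """Read the score vector stored at the center point's (row, column) index
    in the second feature map and return the label with the highest score."""
    row, col = index
    scores = type_map[row][col]
    best = max(range(len(scores)), key=scores.__getitem__)
    return OBJECT_TYPES[best]
```

If the two feature maps had different resolutions, the index would first have to be rescaled before the lookup.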

In some embodiments, the functions or modules of the apparatus provided in the embodiments of the present application may be used to execute the methods described in the method embodiments above; for brevity, the description is not repeated here.

The embodiments of this specification further provide a computer device that includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any of the embodiments described above when executing the program.

FIG. 5 is a schematic diagram of the hardware structure of a computer device provided in an embodiment of this specification. The device may include a processor 501, a memory 502, an input/output interface 503, a communication interface 504, and a bus 505. The processor 501, the memory 502, the input/output interface 503, and the communication interface 504 are communicatively connected to one another inside the device via the bus 505.

The processor 501 may be implemented as a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the relevant programs to implement the technical solutions provided in the embodiments of this specification.

The memory 502 may be implemented in the form of read-only memory (ROM), random access memory (RAM), a static storage device, a dynamic storage device, or the like. The memory 502 may store an operating system and other application programs. When the technical solutions provided in the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 502 and is called and executed by the processor 501.

The input/output interface 503 is used to connect an input/output module to implement the input and output of information. The input/output module may be built into the device as a component (not shown) or may be connected externally to the device to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, and the like; output devices may include a monitor, a speaker, a vibrator, an indicator light, and the like.

The communication interface 504 is used to connect a communication module (not shown) to implement communication between this device and other devices. The communication module may communicate in a wired manner (for example, USB or a network cable) or in a wireless manner (for example, a mobile network, Wi-Fi, or Bluetooth).

The bus 505 includes a path that transfers information between the components of the device (for example, the processor 501, the memory 502, the input/output interface 503, and the communication interface 504).

It should be noted that although only the processor 501, the memory 502, the input/output interface 503, the communication interface 504, and the bus 505 are shown for the device described above, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the device may include only the components necessary to implement the solutions of the embodiments of this specification and need not include all of the components shown.

As shown in FIGS. 6A and 6B, the embodiments of the present application further provide a vehicle, in the car cabin of which are installed a display device 601, a monitoring system 602, and either a car cabin interaction apparatus 603 based on a digital human or a computer device 604.

The display device 601 is used to display the animation in which the digital human performs the corresponding action. The display device 601 may include at least one of a control panel of the vehicle and a screen mounted on the back of a vehicle seat.

The monitoring system 602 may include at least one of a visual monitoring system and a voice monitoring system. The visual monitoring system may include at least one camera, which may be mounted above the area to be monitored and is used to collect video or images of that area. For example, the camera may be mounted on the windshield of the vehicle or at a position such as above a seat. As another example, the camera may be mounted on the rearview mirror in the car cabin with its lens facing the rear row of the car cabin. This mounting arrangement widens the camera's field of view and makes it convenient to acquire a monitoring video of the rear row of the car cabin. The voice monitoring system may include at least one microphone, used to collect audio signals from the area to be monitored.

The car cabin interaction apparatus 603 based on a digital human may be the car cabin interaction apparatus of any of the embodiments described above, and the computer device 604 may be the computer device of any of the embodiments described above. The car cabin interaction apparatus 603 or the computer device 604 may be integrated into the central control system of the vehicle. The monitoring system 602 may communicate with the car cabin interaction apparatus 603 or the computer device 604 via an in-vehicle communication bus; for example, the in-vehicle communication bus may be a Controller Area Network (CAN) bus.

In some embodiments, the car cabin interaction apparatus 603 based on a digital human or the computer device 604 may also control vehicle-mounted devices based on the state information, for example, at least one of a lighting device, an air conditioner, a car window, an audio playback device, and a seat in the car cabin.

In some embodiments, the car cabin interaction apparatus 603 based on a digital human or the computer device 604 may also be connected via a network to a user's smart device, a voice database, an action model database, or the like, to exchange data with the smart device, the voice database, or the action model database.

The embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any of the embodiments described above.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.

From the description of the implementations above, those skilled in the art can clearly understand that the embodiments of this specification may be implemented by software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence, or the part thereof that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of this specification.

The embodiments of the present application further provide a computer program product that includes computer instructions which, when executed by a processor, implement the method of any of the embodiments described above.

The systems, apparatuses, modules, or units described in the embodiments above may be implemented by computer chips or entities, or by products having specific functions. A typical implementing device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

The embodiments in this specification are described in a progressive manner; for parts that are the same or similar among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus embodiments are basically similar to the method embodiments, they are described relatively simply, and reference may be made to the corresponding parts of the method embodiments. The apparatus embodiments described above are merely schematic: the modules described as separate components may or may not be physically separate, and when the embodiments of this specification are implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of an embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

Claims

1. A car cabin interaction method based on a digital human, comprising:
acquiring state information of a living body riding in a car cabin;
determining action information matching the state information; and
generating, based on the action information, an animation in which a digital human performs a corresponding action, and displaying the animation on a display device in the car cabin.

2. The method of claim 1,
wherein generating, based on the action information, the animation in which the digital human performs the corresponding action and displaying the animation on the display device in the car cabin comprises:
determining voice information matching the state information;
acquiring, based on the voice information, a corresponding voice that includes a timestamp; and
while playing the voice, generating and displaying, based on the action information, an animation in which the digital human performs the action at the time corresponding to the timestamp.

3. The method of claim 2, wherein the motion includes a plurality of sub-motions, each sub-motion matches one phoneme in the voice, and the timestamp includes a timestamp of each phoneme; and
wherein generating and displaying, based on the motion information, the animation in which the digital human performs the motion at the time corresponding to the timestamp comprises:
determining, based on the timestamp of each phoneme, an execution time of the sub-motion matching that phoneme; and
generating and displaying, based on the motion information, an animation in which the digital human performs, at the timestamp of each phoneme, the sub-motion matching that phoneme.
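
As a non-limiting sketch of the phoneme-level timing recited in the claim above, the execution time of each sub-motion can be taken directly from the timestamp of its matching phoneme. The function name, the `(phoneme, start_seconds)` input format, and the phoneme-to-sub-motion table are assumptions made for illustration only.

```python
def schedule_sub_motions(phoneme_timestamps, phoneme_to_sub_motion):
    """Build a playback schedule of sub-motions from phoneme timestamps.

    phoneme_timestamps: list of (phoneme, start_seconds) pairs taken from
        the synthesized voice (hypothetical format).
    phoneme_to_sub_motion: dict mapping a phoneme to the sub-motion
        (for example a mouth shape) that matches it (hypothetical table).
    Returns a list of (start_seconds, sub_motion) sorted by time.
    """
    schedule = []
    for phoneme, start in phoneme_timestamps:
        sub_motion = phoneme_to_sub_motion.get(phoneme)
        if sub_motion is not None:  # skip phonemes with no matching sub-motion
            schedule.append((start, sub_motion))
    schedule.sort(key=lambda item: item[0])
    return schedule
```

A renderer could then walk the returned schedule while the voice plays and trigger each sub-motion when playback reaches its start time.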

4. The method of any one of claims 1 to 3, wherein generating, based on the motion information, the animation in which the digital human performs the corresponding motion and displaying the animation on the display device in the car cabin comprises:
retrieving, from a motion model base, at least one motion segment of the digital human corresponding to the motion information; and
displaying each frame of the at least one motion segment of the digital human on the display device in order.

5. The method of any one of claims 1 to 4, wherein the state information of the living body includes first state information of the living body, and acquiring the state information of the living body riding in the car cabin comprises:
collecting a monitoring video of a rear row in the car cabin; and
performing living-body detection on the monitoring video and performing state analysis on the detected living body to obtain the first state information of the living body.

6. The method of claim 5, wherein the monitoring video is acquired by a video capture device that is mounted on a rearview mirror in the car cabin and has a lens facing the rear row of the car cabin.

7. The method of claim 5 or 6, wherein the first state information includes at least one of classification information, identification information, attribute information, emotion information, facial expression information, gesture information, seating information, and seat belt wearing information of the living body; and/or
the living body includes at least one of a driver, a front-seat passenger, a child, an elderly person, a pet, and a rear-row passenger.

8. The method of any one of claims 1 to 7, wherein the state information of the living body includes first state information and second state information of the living body, the first state information being obtained based on a monitoring video in the car cabin;
acquiring the state information of the living body riding in the car cabin further comprises acquiring the second state information sent from a smart device carried by the living body; and
determining the motion information matching the state information comprises determining motion information that matches both the first state information and the second state information.

9. The method of any one of claims 1 to 8, wherein acquiring the state information of the living body riding in the car cabin comprises:
inputting a monitoring video in the car cabin into a pre-trained neural network; and
determining the state information of the living body based on an output of the neural network.

10. The method of any one of claims 1 to 9, further comprising generating a shape of the digital human before generating, based on the motion information, the animation in which the digital human performs the corresponding motion and displaying the animation on the display device in the car cabin.

11. The method of claim 10, wherein generating the shape of the digital human comprises:
generating the shape of the digital human based on the state information of the living body; or
generating the shape of the digital human based on a preset shape template of the digital human.

12. The method of any one of claims 1 to 11, further comprising controlling an operating state of an in-vehicle device based on the state information.

13. The method of any one of claims 1 to 12, wherein determining the motion information matching the state information comprises:
acquiring a driving state of the vehicle; and
determining motion information that matches both the driving state of the vehicle and the state information.

14. The method of claim 1, wherein acquiring the state information of the living body riding in the car cabin comprises:
recognizing the living body riding in the car cabin based on a target image of the car cabin; and
determining, based on location information of the living body, whether the living body is located on a seat in the car cabin.

15. The method of claim 1, wherein acquiring the state information of the living body riding in the car cabin comprises:
determining object information of each object in a target image of the car cabin, wherein, for each object, the object information includes location information of a center point of the object and object type information corresponding to the center point of the object;
selecting the living body and a seat in the car cabin from the objects based on the object type information of each object; and
determining whether the living body is located on the seat based on the center point location of the living body and the center point location of the seat.
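
The claim above does not state how the two center points are compared. Purely as an illustrative assumption, the sketch below treats a living body as seated when its center point lies within a distance threshold of some seat's center point; the object dictionary format, the type labels, and `max_distance` are hypothetical.

```python
import math


def find_unseated(objects, max_distance,
                  body_types=("child",), seat_types=("seat",)):
    """Return the living bodies whose center is not near any seat center.

    objects: list of dicts such as {"type": "child", "center": (x, y)}
        (hypothetical format for the per-object information).
    max_distance: assumed threshold, in pixels, on the Euclidean distance
        between a living body's center point and a seat's center point.
    """
    # Select living bodies and seats using the object type information.
    bodies = [o for o in objects if o["type"] in body_types]
    seats = [o for o in objects if o["type"] in seat_types]

    unseated = []
    for body in bodies:
        on_seat = any(
            math.dist(body["center"], seat["center"]) <= max_distance
            for seat in seats
        )
        if not on_seat:
            unseated.append(body)
    return unseated
```

Other comparison rules (for example, a separate tolerance per axis) would fit the same interface; the distance threshold is only one possible choice.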

16. The method of claim 15, wherein generating, based on the motion information, the animation in which the digital human performs the corresponding motion and displaying the animation on the display device in the car cabin comprises:
generating an animation in which a digital human corresponding to the living body not located on the seat sits on a seat and demonstrates fastening a seat belt, and displaying the animation on the display device in the car cabin.

17. The method of claim 16, further comprising sending a prompt message in response to determining that the living body is not located on the seat.

18. The method of claim 15, wherein determining the location information of the center point of the object in the target image comprises:
performing feature extraction on the target image to obtain a first feature map corresponding to the target image;
obtaining, from a first preset channel of the first feature map, a response value of each feature point in the first feature map as an object center point;
dividing the first feature map into a plurality of sub-regions and determining, in each sub-region, a maximum response value and the feature point corresponding to the maximum response value; and
taking a feature point whose maximum response value is greater than a preset threshold as the center point of the object, and determining the location information of the center point of the object based on a location index of the center point of the object in the first feature map.
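
As a non-limiting sketch of the center point selection recited in the claim above, the code below takes the response values of one preset channel as a 2D list, splits it into sub-regions, keeps the maximum of each sub-region, and retains only maxima above the threshold. Using non-overlapping square tiles as the sub-regions is an assumption; the claim does not specify how the feature map is divided, and feature extraction itself is outside the sketch.

```python
def find_center_points(heatmap, region_size, threshold):
    """Pick candidate object center points from a center-response heatmap.

    heatmap: 2D list of response values from one channel of the first
        feature map (each value scores that feature point as a center).
    region_size: side length of the square, non-overlapping sub-regions
        (the tiling scheme is an assumption).
    threshold: preset threshold a sub-region maximum must exceed.
    Returns a list of (row, col, response) location indices.
    """
    height, width = len(heatmap), len(heatmap[0])
    centers = []
    for top in range(0, height, region_size):
        for left in range(0, width, region_size):
            best = None
            # Find the maximum response inside this sub-region.
            for r in range(top, min(top + region_size, height)):
                for c in range(left, min(left + region_size, width)):
                    if best is None or heatmap[r][c] > best[2]:
                        best = (r, c, heatmap[r][c])
            # Keep the sub-region maximum only if it exceeds the threshold.
            if best[2] > threshold:
                centers.append(best)
    return centers
```

The returned row and column indices are positions in the feature map; converting them to image coordinates would depend on the downsampling factor of the feature extractor, which the claim leaves open.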

19. The method of claim 18, wherein determining the object type information corresponding to the center point of the object in the target image comprises:
performing feature extraction on the target image to obtain a second feature map corresponding to the target image;
determining a location index of the center point of the object in the second feature map based on the location index of the center point of the object in the first feature map; and
obtaining the object type information corresponding to the center point of the object at a location corresponding to the location index of the center point of the object in the second feature map.
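
The type lookup recited in the claim above might be sketched as follows. Two points are assumptions made for illustration and are not stated in the claim: the location index is carried over from the first feature map by proportional scaling, and the second feature map holds one score channel per object type, with the highest-scoring channel giving the type.

```python
def map_index(index, first_shape, second_shape):
    """Scale a (row, col) index from the first feature map to the second.

    Assumes both maps cover the same image, so indices scale with the
    ratio of the map sizes (an assumption).
    """
    row, col = index
    return (row * second_shape[0] // first_shape[0],
            col * second_shape[1] // first_shape[1])


def object_type_at(second_map, index, type_names):
    """Read the object type at a center point from the second feature map.

    second_map: list of 2D score maps, one per object type (assumed layout).
    index: (row, col) location index of the center point in second_map.
    type_names: type label for each channel, in channel order.
    """
    row, col = index
    scores = [channel[row][col] for channel in second_map]
    best = max(range(len(scores)), key=scores.__getitem__)
    return type_names[best]
```

When the two feature maps share the same resolution, `map_index` returns the index unchanged and the lookup reduces to reading the second map at the same position.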

20. A digital-human-based car cabin interaction apparatus, comprising:
an acquisition module configured to acquire state information of a living body riding in a car cabin;
a determining module configured to determine motion information matching the state information; and
a display module configured to generate, based on the motion information, an animation in which a digital human performs a corresponding motion and to display the animation on a display device in the car cabin.

21. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the method of any one of claims 1 to 19.

22. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable by the processor, wherein the processor, when executing the program, implements the method of any one of claims 1 to 19.

23. A vehicle, wherein a display device, a monitoring system, and the digital-human-based car cabin interaction apparatus of claim 20 or the computer device of claim 22 are installed in a car cabin of the vehicle.

24. A computer program product comprising computer instructions that, when executed by a processor, implement the method of any one of claims 1 to 19.