KR102090109B1

KR102090109B1 - Learning and inference apparatus and method

Info

Publication number: KR102090109B1
Application number: KR1020180049810A
Authority: KR
Inventors: 이종석; 이호중
Original assignee: 연세대학교 산학협력단
Priority date: 2018-04-30
Filing date: 2018-04-30
Publication date: 2020-03-17
Anticipated expiration: 2038-04-30
Also published as: KR20190125694A

Abstract

A learning and inference apparatus and method are disclosed. The learning and inference apparatus according to an embodiment of the present invention includes: a main network processing unit including an input layer, a hidden layer, and an output layer that outputs a final output value; and local network modules that calculate an error gradient value based on the output value of each of the input layer and the hidden layer and that are updated based on a training loss value calculated by the local network module of the next stage.

Description

LEARNING AND INFERENCE APPARATUS AND METHOD

The present invention relates to model-parallel deep learning and, more particularly, to a learning and inference apparatus and method that add local network modules so that feedforward and update are performed simultaneously.

Deep learning has advanced remarkably in recent years and has been applied successfully in many fields. Behind the success of deep neural networks lies an important mechanism: useful information can be extracted through a hierarchical structure. Increasingly complex deep neural network architectures are being developed to solve challenging real-world problems. However, a complex network structure requires a considerable amount of computation for training.

A common way to alleviate this difficulty is data parallelism. That is, multiple identical models are trained independently, each on a different training data set.

Another approach, called model parallelism, has also been studied. The backpropagation learning paradigm, which is dominant in neural network training, operates in an inherently sequential manner: each layer is updated one at a time, in reverse order, using update information from the layer above it. In a model-parallel training approach, by contrast, the network model can be divided into parts that are trained simultaneously and independently on separate computing units.

Existing methods for model-parallel training fall into two categories.

The first is layer-wise sequential training, in which multiple groups of computing operations in each layer are identified and run separately so that operations depend only on other operations in the same group. Such methods can be regarded not as distinct learning algorithms in themselves but as scheduling techniques for implementing backpropagation efficiently.

The second removes the dependency of a layer on its preceding layer so that different layers can be trained separately on different computing units. The backpropagation algorithm is not suitable for this approach, so the method of auxiliary coordinates (MAC) was proposed to make it possible. MAC replaces the original least-squares loss minimization problem for training with an equality-constrained optimization problem by introducing an auxiliary variable for each data point and each hidden unit.

Solving this problem is then formulated as iteratively solving sub-problems. A similar method using the alternating direction method of multipliers (ADMM) has also been proposed. It adopts equality-constrained optimization as well, but without additional auxiliary variables, so its sub-problems have closed-form solutions. However, these methods do not scale to deep learning architectures such as convolutional neural networks.

The decoupled neural interface (DNI) uses an additional small neural network to directly synthesize the gradient used to train a layer's weights. As long as the synthetic gradient is close to the actual backpropagated gradient, a layer need not wait for the error at the output layer to be backpropagated through the preceding layers, which allows each layer to be trained independently. However, this method degrades performance compared with backpropagation.

The idea of using additional models to support the layers of the main model is also used in Sobolev training, where the additional networks are trained to approximate the output of the main model rather than the error gradient. For this reason, the Sobolev training method is not suitable for parallel training.

Korean Patent Publication No. 10-2001-0047163, "Learning Method for a Multilayer Perceptron Neural Network Circuit" (2001.06.15)

An object of the present invention is to provide a learning and inference apparatus and method that perform updates in model-parallel deep learning without complete feedforward and backward propagation passes.

Another object of the present invention is to provide a learning and inference apparatus and method that unlock the layer-wise dependency of backpropagation learning.

Another object of the present invention is to provide a learning and inference apparatus and method that unlock the dependency of a local network module on upper layers.

Another object of the present invention is to provide a learning and inference apparatus and method that output a classification value without a complete feedforward pass.

To achieve the above objects, a learning and inference apparatus according to an embodiment of the present invention includes: a main network processing unit including an input layer, a hidden layer, and an output layer that outputs a final output value; and local network modules that calculate an error gradient value based on the output value of each of the input layer and the hidden layer and that are updated based on a training loss value calculated by the local network module of the next stage.

The local network modules may calculate an approximate value that approximates the final output value using the output value of each of the input layer and the hidden layer, calculate a training loss value from the approximate value by means of a loss function, and calculate the error gradient value from the training loss value.

The loss function may be a mean absolute error function, a mean squared error function, or a cross-entropy function.

The input layer and the hidden layer may each be updated by the error gradient value.

The input layer and the hidden layer may be updated by gradient descent.

The local network module of the input layer may calculate a local training loss value by means of a loss function using the training loss value calculated by the local network module of the hidden layer, calculate a local error gradient value from the local training loss value, and be updated by the local error gradient value.

The local network module of the input layer may be updated by gradient descent.

The apparatus may further include an inference unit. The input layer and the local network module of the input layer constitute a first sub-model unit; the input layer, the hidden layer, and the local network module of the hidden layer constitute a second sub-model unit; and the input layer, the hidden layer, and the output layer constitute a main model unit. The inference unit sequentially evaluates the confidence of the approximate values of the sub-models, outputs an approximate value as the inference value if its confidence is equal to or greater than a threshold, and outputs the final output value as the inference value if no approximate value has a confidence equal to or greater than the threshold.

A learning and inference method according to an embodiment of the present invention includes: calculating, by local network modules corresponding respectively to an input layer and a hidden layer, an error gradient value based on the output value of each layer; updating each of the input layer and the hidden layer by the error gradient value; and updating the local network module of the input layer based on a training loss value calculated by the local network module of the hidden layer.

Calculating the error gradient value may include: calculating, by the local network modules, an approximate value that approximates the final output value using the output value of each of the input layer and the hidden layer; calculating, by the local network modules, a training loss value from the approximate value by means of a loss function; and calculating, by the local network modules, the error gradient value from the training loss value.

Updating the local network module of the input layer may include: calculating, by the local network module of the input layer, a local training loss value by means of a loss function using the training loss value calculated by the local network module of the hidden layer; calculating, by the local network module of the input layer, a local error gradient value from the local training loss value; and updating the local network module of the input layer by the local error gradient value.

With the input layer and the local network module of the input layer constituting a first sub-model unit, the input layer, the hidden layer, and the local network module of the hidden layer constituting a second sub-model unit, and the input layer, the hidden layer, and the output layer constituting a main model unit, the method may further include: sequentially evaluating, by an inference unit, the confidence of the approximate values of the sub-models; and outputting, by the inference unit, an approximate value as the inference value if its confidence is equal to or greater than a threshold, and outputting the final output value as the inference value if no approximate value has a confidence equal to or greater than the threshold.
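The sequential confidence check performed by the inference unit can be illustrated with a minimal Python sketch. It is not taken from the specification: the function name `infer_with_early_exit`, the use of the largest class probability as the confidence, and the toy sub-models are all assumptions made for illustration.

```python
def infer_with_early_exit(x, sub_models, main_model, threshold=0.9):
    """Return (class_index, stage); stage 0 means the main model answered."""
    for stage, sub_model in enumerate(sub_models, start=1):
        probs = sub_model(x)         # approximate value from this sub-model
        confidence = max(probs)      # assumed confidence: largest class probability
        if confidence >= threshold:
            return probs.index(confidence), stage
    probs = main_model(x)            # no sub-model was confident: use the final output
    return probs.index(max(probs)), 0


# Hypothetical stand-ins for the first sub-model, second sub-model, and main model.
first_sub = lambda x: [0.6, 0.4]
second_sub = lambda x: [0.05, 0.95]
main = lambda x: [0.3, 0.7]

result = infer_with_early_exit(None, [first_sub, second_sub], main)
```

In this toy run the first sub-model is below the threshold and the second is above it, so the second sub-model's approximate value is returned without evaluating the main model.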

The learning and inference apparatus and method according to an embodiment of the present invention can perform updates without complete feedforward and backward propagation passes.

In addition, the learning and inference apparatus and method according to an embodiment of the present invention can unlock the layer-wise dependency of backpropagation learning.

In addition, the learning and inference apparatus and method according to an embodiment of the present invention can unlock the dependency of a local network module on upper layers.

In addition, the learning and inference apparatus and method according to an embodiment of the present invention can output a classification value without a complete feedforward pass.

FIG. 1 is a block diagram of a learning and inference apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the learning operation of a learning and inference apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining the inference operation of a learning and inference apparatus according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a learning and inference method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings and the content shown in them, but the present invention is not limited to or restricted by these embodiments.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form includes the plural form unless the text specifically states otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those mentioned.

As used herein, terms such as "embodiment", "example", "aspect", and "illustration" should not be construed to mean that any aspect or design described is better than or advantageous over other aspects or designs.

In addition, the term "or" means an inclusive or rather than an exclusive or. That is, unless stated otherwise or clear from the context, the expression "x uses a or b" means any of the natural inclusive permutations.

In addition, the singular expressions ("a" or "an") used in this specification and the claims should generally be construed to mean "one or more" unless stated otherwise or unless it is clear from the context that a singular form is intended.

In addition, terms such as "first" and "second" used in this specification and the claims may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another.

Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive sense unless they are expressly and specifically defined.

In describing the present invention, a detailed description of related known functions or configurations is omitted where it is judged that it could unnecessarily obscure the subject matter of the invention. The terminology used herein has been chosen to describe the embodiments appropriately and may vary according to the intention of a user or operator or the conventions of the field to which the invention pertains. The terms should therefore be defined on the basis of the content of this specification as a whole.

FIG. 1 is a block diagram of a learning and inference apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the learning and inference apparatus 100 includes an input layer 110, a hidden layer 120, an output layer 130, and local network modules 140 and 150.

Various structures and computation methods have been proposed to implement biological neural networks artificially, and the methodology for constructing such an artificial neural network is called a neural network model.

In a neural network model, artificial neurons are connected by directed links to form a network. Each neuron has its own output value and influences adjacent neurons by passing that value along the links.

Each link between neurons has its own attribute value, which controls the strength of the signal it carries. The attribute value of a link is a weight value indicating the connection strength between the neurons it joins.

The input layer 110 may be composed of input neurons that receive input values.

The output layer 130 is composed of output neurons whose values are delivered to the outside as the result of the neural network.

The hidden layer 120 lies between the input layer and the output layer and may be composed of a plurality of hidden neurons.

Links connect only neurons in adjacent layers, in the direction from the input layer to the output layer.

The input layer 110, the hidden layer 120, and the output layer 130 may constitute the main network processing unit.

The input layer 110 receives an input value and outputs a first output value h_i.

The input value may include a training value and a target value.

The training value is the data that the learning and inference apparatus 100 is to learn, and the target value is the correct answer for that training value.

For example, in a learning and inference apparatus 100 that classifies pictures of dogs and cats, the training value is the picture data to be learned, and the target value is the "dog" data value when the corresponding training value is a picture of a dog.

The input layer 110 receives the input value and outputs the first output value h_i computed by the neurons that constitute the input layer 110.

The input layer 110 may be composed of a plurality of connected neurons.

Each neuron may multiply the training value by a weight value, add a bias value, and apply an activation function to the result to calculate its output value. The output values calculated in this way become the inputs of adjacent neurons, and the first output value is output through the same process.
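The neuron computation described above (weighted input plus bias, passed through an activation function) can be sketched in a few lines of Python. The specification does not name an activation function; the sigmoid used here, and the helper names `neuron_output` and `layer_output`, are assumptions for illustration only.

```python
import math


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid activation (assumed)


def layer_output(inputs, weight_rows, biases):
    """Output of a layer: one neuron per row of weights."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Two neurons whose weighted sums are both zero, so each outputs sigmoid(0).
outputs = layer_output([1.0, 2.0], [[0.0, 0.0], [1.0, -0.5]], [0.0, 0.0])
```

The list returned by `layer_output` plays the role of a layer's output value and would be fed to the next layer in the same way.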

In the present invention, the weight values constituting the input layer 110, the hidden layer 120, and the output layer 130 may be updated so that the final output value they produce matches the target value. This is referred to as training of the learning and inference apparatus 100.

The hidden layer 120 receives the first output value h_i and outputs a second output value h_{i+1}.

The hidden layer 120 may be composed of a plurality of neurons.

The hidden layer 120 receives the first output value h_i and outputs the second output value h_{i+1} through the same neuron computation as in the input layer 110.

The output layer 130 receives the second output value h_{i+1} and outputs a final output value h_N.

The output layer 130 may be composed of a plurality of neurons.

The output layer 130 receives the second output value h_{i+1} and outputs the final output value h_N through the same neuron computation as in the hidden layer 120.

The local network modules 140 and 150 include a first local network module 140 corresponding to the input layer 110 and a second local network module 150 corresponding to the hidden layer 120.

The first local network module 140 receives the first output value h_i and, as in Equation 1 below, calculates a first approximate value that approximates the final output value h_N.

[Equation 1]

The first local network module 140 uses the first approximate value to calculate, by means of a loss function l, a first training loss value L_i for the input layer 110, as in Equation 2 below.

[Equation 2]

Here, y is the training target value. The approximately-equal sign (≈) is used because the first approximate value, which approximates the final output value h_N, is used in its place.

The loss function l may be a mean absolute error function, a cross-entropy function, or a mean squared error function.

The first local network module 140 differentiates the first training loss value L_i with respect to the first output value h_i, as in Equation 3 below, to calculate a first error gradient value for the input layer 110.

[Equation 3]

Here, the approximately-equal sign (≈) is used because the first training loss value L_i is an approximation of the training loss value for the input layer 110.
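Equations 1 to 3 can be written out from the definitions above. The following LaTeX is one reading consistent with the surrounding text rather than the published formulas: the symbols c_i (the function computed by the local network module), \hat{h}_N^{(i)} (its approximate value), L_N (the training loss of the main network), and \delta_i (the error gradient value) are names assumed here for illustration.

```latex
\begin{align}
\hat{h}_N^{(i)} &= c_i(h_i) \approx h_N \tag{1} \\
L_i &= l\bigl(\hat{h}_N^{(i)},\, y\bigr) \approx l(h_N,\, y) = L_N \tag{2} \\
\delta_i &= \frac{\partial L_i}{\partial h_i} \approx \frac{\partial L_N}{\partial h_i} \tag{3}
\end{align}
```

Under this reading, each approximation in (2) and (3) follows from the one before it: replacing h_N by the local approximate value makes the loss, and therefore its derivative with respect to h_i, available without running the layers above.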

The first local network module 140 outputs the calculated first error gradient value to the input layer 110.

The input layer 110 updates its weight values based on the first error gradient value, as in Equation 4 below.

[Equation 4]

The update in Equation 4 is scaled by a learning rate.

Equation 4 expresses the weight update by the gradient-descent rule, but the weight values may also be updated in other ways.
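A gradient-descent update of the kind Equation 4 describes can be written as follows. This is a sketch under assumed notation, not the published formula: \theta_i stands for the weight values of the input layer 110, \eta for the learning rate, and \delta_i for the first error gradient value supplied by the first local network module 140.

```latex
\begin{equation}
\theta_i \leftarrow \theta_i - \eta\, \delta_i\, \frac{\partial h_i}{\partial \theta_i} \tag{4}
\end{equation}
```

The factor \partial h_i / \partial \theta_i comes from the chain rule: the local module supplies the gradient with respect to the layer output, and the layer converts it into a gradient with respect to its own weights.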

Accordingly, the input layer 110 can be updated with the error gradient value propagated back to it, without waiting for the first output value h_i to be propagated to the output layer 130.
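A minimal Python sketch of such a local update is given below. It is a toy under stated assumptions, not the apparatus itself: the layer is a single scalar weight `w`, the local network module is a single scalar weight `v`, the loss is the squared error, and the names `local_update_step`, `w`, `v`, `x`, `y`, and `lr` are hypothetical.

```python
def local_update_step(w, v, x, y, lr=0.1):
    """One update of a scalar layer h = w * x using only its local module c(h) = v * h."""
    h = w * x                          # layer output (plays the role of h_i)
    approx = v * h                     # local approximation of the final output
    loss = (approx - y) ** 2           # squared-error training loss (L_i)
    grad_h = 2.0 * (approx - y) * v    # error gradient dL_i/dh_i from the local module
    grad_w = grad_h * x                # chain rule: dL_i/dw = dL_i/dh_i * dh_i/dw
    return w - lr * grad_w, loss       # gradient-descent update of the layer weight


w1, loss1 = local_update_step(0.5, 1.0, 1.0, 1.0)
w2, loss2 = local_update_step(w1, 1.0, 1.0, 1.0)
```

No quantity from any later layer appears in the function, which is the point of the scheme: the layer weight moves as soon as its own local module has produced a gradient.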

The weight values of the first local network module 140 are updated using a second training loss value calculated by the second local network module 150.

The second local network module 150 calculates a second approximate value from the second output value h_{i+1}, which is the output value of the hidden layer 120, by the same process as the first local network module 140.

The second local network module 150 calculates the second training loss value from the second approximate value by the same process as the first local network module 140.

The second local network module 150 outputs the second training loss value to the first local network module 140.

The first local network module 140 receives the second training loss value from the second local network module 150.

The first local network module 140 calculates a local training loss value L_mi using the second training loss value L_{i+1}, as in Equation 5 below.

[Equation 5]

Here, L_i is the first training loss value and l is a loss function.

The loss function l may be a mean absolute error function, a cross-entropy function, or a mean squared error function.
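From the quantities named above, Equation 5 plausibly compares the two training loss values through the loss function. The form below is an assumption consistent with that description; in particular, the order of the arguments is not confirmed by the text.

```latex
\begin{equation}
L_{mi} = l\bigl(L_i,\, L_{i+1}\bigr) \tag{5}
\end{equation}
```

Read this way, the local training loss is small when the first local network module's loss estimate agrees with the estimate made one layer further up.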

The first local network module 140 updates its own weight values using the local training loss value L_mi.
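The update of the local network module can be sketched in the same toy setting. The assumptions are: the module is a scalar weight `v` with approximate value `v * h`, both loss functions are squared errors, the local training loss is the squared difference between the module's own training loss and the one received from the next module, and the received loss `next_loss` is treated as a constant. The function name and parameters are hypothetical.

```python
def local_module_update(v, h, y, next_loss, lr=0.05):
    """One gradient-descent update of a scalar local module c(h) = v * h."""
    approx = v * h
    loss_i = (approx - y) ** 2                  # this module's training loss (L_i)
    local_loss = (loss_i - next_loss) ** 2      # assumed form of the local training loss
    grad_loss_i_v = 2.0 * (approx - y) * h      # dL_i/dv
    grad_v = 2.0 * (loss_i - next_loss) * grad_loss_i_v   # d(local_loss)/dv, next_loss fixed
    return v - lr * grad_v, local_loss


v1, m1 = local_module_update(1.0, 0.5, 1.0, next_loss=0.04)
v2, m2 = local_module_update(v1, 0.5, 1.0, next_loss=0.04)
```

Only a scalar loss value crosses from the next module to this one, so the module's update does not require a gradient to be propagated back from the layers above.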

Alternatively, the first local network module 140 may calculate the local training loss value L_mi as in Equation 6 below.

[Equation 6]

Although FIG. 1 shows a single hidden layer 120 between the input layer 110 and the output layer 130, the learning and inference apparatus 100 may include more than one hidden layer between the input layer 110 and the output layer 130.

In that case, the number of local network modules increases with the number of added hidden layers.

In this way, the learning and inference apparatus 100 according to an embodiment of the present invention can unlock the layer-wise dependency on upper layers that arises when the layers constituting the main network processing unit are trained by backpropagation.

In addition, the learning and inference apparatus 100 according to an embodiment of the present invention can unlock the dependency of a local network module on upper layers.

Accordingly, the present invention does not need to run a backward pass after the feedforward pass, which can significantly improve the training speed.

FIG. 2 is a diagram illustrating the learning operation of the learning and inference apparatus according to an embodiment of the present invention.

Referring to FIG. 2, to explain the learning operation of the learning and inference apparatus, a first layer 211 and its corresponding first local network module 212 are shown as a first node 210, and a second layer 221 and its corresponding second local network module 222 are shown as a second node 220.

The first layer 211 and the second layer 221 may be an input layer and a hidden layer, respectively.

Alternatively, when there are multiple hidden layers, the first layer 211 and the second layer 221 may both be hidden layers.

In operation, the first layer 211 outputs a first output value to the second layer 221 and to the first local network module 212 (step ①).

Next, the first local network module 212 outputs a first error gradient value to the first layer 211 while, at the same time, the second layer 221 outputs a second output value to the second local network module 222 and to the next layer (step ②). The weight values of the first layer 211 may then be updated by gradient descent using the first error gradient value.

Next, the second local network module 222 outputs a second error gradient value to the second layer 221 while, at the same time, the next layer outputs its output value to the next local network module (step ③). The weight values of the second layer 221 may then be updated by gradient descent using the second error gradient value.

Next, the second local network module 222 outputs the second training loss value that it calculated for the second layer 221 to the first local network module 212 (step ④). The first local network module 212 calculates a local training loss value using the second training loss value, differentiates the local training loss value to obtain a local error gradient value, and may update its weight values by gradient descent using the local error gradient value.

In this way, the learning and inference apparatus can improve learning speed by performing the feedforward pass and the updates simultaneously across the parallel network.
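Steps ① to ④ can be illustrated with a deliberately tiny sketch in which every layer and local network module is a single scalar weight. Several things here are assumptions made for illustration only: the squared-error loss, the choice of the squared difference between the two training loss values as the local training loss, the names `train_step`, `w1`, `v1`, `w2`, `v2`, and `lr`, and the fact that the steps run sequentially rather than in parallel. The update of the last local network module is left out because this section does not describe it.

```python
def train_step(x, y, w1, v1, w2, v2, lr=0.1):
    """One update with two scalar layers (w1, w2) and two scalar local modules (v1, v2)."""
    # Step 1: the first layer sends its output to the second layer and to local module 1.
    h1 = w1 * x

    # Step 2: local module 1 approximates the final output and returns an error
    # gradient to layer 1; layer 2 runs its forward computation.
    p1 = v1 * h1
    loss1 = (p1 - y) ** 2                 # first training loss value (squared error, assumed)
    grad_w1 = 2 * (p1 - y) * v1 * x       # d(loss1)/d(w1)
    h2 = w2 * h1

    # Step 3: local module 2 does the same for layer 2.
    p2 = v2 * h2
    loss2 = (p2 - y) ** 2                 # second training loss value
    grad_w2 = 2 * (p2 - y) * v2 * h1      # d(loss2)/d(w2)

    # Step 4: local module 1 is updated from the second training loss value.
    # Assumed form: local loss = (loss1 - loss2)^2, with loss2 treated as a constant.
    local_loss = (loss1 - loss2) ** 2
    grad_v1 = 2 * (loss1 - loss2) * 2 * (p1 - y) * h1   # d(local_loss)/d(v1)

    # Gradient descent updates; no gradient ever flows from layer 2 back into layer 1.
    new_w1 = w1 - lr * grad_w1
    new_w2 = w2 - lr * grad_w2
    new_v1 = v1 - lr * grad_v1
    return new_w1, new_v1, new_w2, (loss1, loss2, local_loss)

new_w1, new_v1, new_w2, losses = train_step(x=1.0, y=1.0, w1=1.0, v1=0.5, w2=1.0, v2=1.0)
```

The point of the sketch is the data flow: each layer is updated from its own local module's gradient, so its update does not wait for a backward pass from the layers above it.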

The detailed computations of the learning and inference apparatus shown in FIG. 2 are the same as those described with reference to FIG. 1, so a detailed description is omitted.

FIG. 3 is a diagram illustrating the inference operation of the learning and inference apparatus according to an embodiment of the present invention.

Referring to FIG. 3, the learning and inference apparatus receives an input value (X), classifies it, and outputs an inference value.

The learning and inference apparatus of FIG. 3 may be an apparatus trained by the learning operation described with reference to FIGS. 1 and 2.

Here, the input value is the data to be classified.

For example, when the learning and inference apparatus has been trained to classify dog pictures and cat pictures, the input value may be a picture to be classified (or inferred).

The learning and inference apparatus includes an input layer 311, a first hidden layer 321, a second hidden layer 331, an output layer 341, a first local network module 312, a second local network module 322, and a third local network module 332.

The first sub-model unit 310 receives the input value (X), performs its computation, and outputs a first approximation value.

The first sub-model unit 310 consists of the input layer 311 and the first local network module 312.

The input layer 311 receives the input value (X), performs its computation, and outputs a first output value to the first local network module 312.

The first local network module 312 receives the first output value, performs its computation, and outputs the first approximation value.

The second sub-model unit 320 receives the first output value, performs its computation, and outputs a second approximation value.

The second sub-model unit 320 consists of the input layer 311, the first local network module 312, the first hidden layer 321, and the second local network module 322.

The first hidden layer 321 receives the first output value from the input layer 311.

The first hidden layer 321 performs its computation on the first output value and outputs a second output value.

The second local network module 322 receives the second output value, performs its computation, and outputs the second approximation value.

The third sub-model unit 330 receives the second output value, performs its computation, and outputs a third approximation value.

The third sub-model unit 330 consists of the input layer 311, the first local network module 312, the first hidden layer 321, the second local network module 322, the second hidden layer 331, and the third local network module 332.

The second hidden layer 331 receives the second output value from the first hidden layer 321.

The second hidden layer 331 performs its computation on the second output value and outputs a third output value.

The third local network module 332 receives the third output value, performs its computation, and outputs the third approximation value.

The main model unit 340 receives the input value (X) and outputs a final output value.

The main model unit 340 consists of the input layer 311, the first hidden layer 321, the second hidden layer 331, and the output layer 341.

The main model unit 340 feeds forward through the input layer 311, the first hidden layer 321, the second hidden layer 331, and the output layer 341, in that order, and outputs the final output value.

The computations of the input layer 311, the first hidden layer 321, the second hidden layer 331, and the output layer 341 are the same as those of the layers described with reference to FIGS. 1 and 2, so a detailed description is omitted.

The computations of the first local network module 312, the second local network module 322, and the third local network module 332 are the same as those of the local network modules described with reference to FIGS. 1 and 2, so a detailed description is omitted.

The learning and inference apparatus may further include an inference unit (not shown).

The inference unit receives the approximation values that the sub-model units 310, 320, and 330 output one after another during the feedforward pass.

The inference unit determines whether the confidence of the first approximation value is greater than or equal to a threshold.

If the confidence of the first approximation value is greater than or equal to the threshold, the inference unit outputs the first approximation value as the inference value; if it is below the threshold, the inference unit evaluates the confidence of the second approximation value.

If the confidence of the second approximation value is greater than or equal to the threshold, the inference unit outputs the second approximation value as the inference value; if it is below the threshold, the inference unit evaluates the confidence of the third approximation value.

If the confidence of the third approximation value is greater than or equal to the threshold, the inference unit outputs the third approximation value as the inference value; if the first to third approximation values are all below the threshold, the inference unit outputs the final output value as the inference value.

The threshold may be preset by a user, and the first to third approximation values and the final output value may be softmax output values.

In other words, the learning and inference apparatus evaluates the confidence of each sub-model unit's approximation value in turn and, as soon as one is greater than or equal to the threshold, obtains a high-confidence inference value without waiting for the complete feedforward pass to finish and the output layer 341 to output the final output value. When the confidence values of the first to third sub-model units 310, 320, and 330 are all below the threshold, the final output value of the main model unit 340 may be output as the inference value.

For example, a learning and inference apparatus trained on dog pictures and cat pictures may output a first approximation value through the first sub-model. Suppose the first sub-model outputs a first approximation value giving a probability of 0.6 for dog and 0.4 for cat. Assuming a preset threshold of 0.9, the inference unit receives the first approximation value, determines that it is below the threshold, and then receives from the second sub-model a second approximation value giving 0.85 for dog and 0.15 for cat. The inference unit again determines that this is below the threshold and receives from the third sub-model a third approximation value giving 0.91 for dog and 0.09 for cat. The inference unit determines that the dog probability of 0.91 is greater than or equal to the threshold and outputs the third approximation value as the inference value. That is, the learning and inference apparatus classifies the picture as a dog and outputs the inference value.
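The dog and cat example can be traced with a short sketch in which the layers are shared and each local network module is consulted as soon as its layer has produced an output. The layers and local modules below are stubs that return the probabilities quoted in the example; the function name `infer_with_early_exit`, the stub callables, and the output layer's value of 0.95 are hypothetical and exist only to make the sketch runnable.

```python
def infer_with_early_exit(x, layers, local_modules, output_layer, threshold):
    """Feed forward layer by layer and stop at the first confident local module.

    Returns (inference_value, exit_depth); exit_depth is None when the
    output layer of the main model had to be used.
    """
    h = x
    for depth, (layer, local_module) in enumerate(zip(layers, local_modules), start=1):
        h = layer(h)                        # output value of this layer
        approximation = local_module(h)     # approximation value (class probabilities)
        if max(approximation) >= threshold:
            return approximation, depth     # deeper layers are never computed
    return output_layer(h), None            # no sub-model was confident enough

def identity(h):
    # Stub layer: a real layer would transform its input.
    return h

layers = [identity, identity, identity]     # input layer, hidden layer 1, hidden layer 2
local_modules = [
    lambda h: [0.6, 0.4],                   # first approximation value
    lambda h: [0.85, 0.15],                 # second approximation value
    lambda h: [0.91, 0.09],                 # third approximation value
]
output_layer = lambda h: [0.95, 0.05]       # hypothetical final output value

result, exit_depth = infer_with_early_exit("picture", layers, local_modules, output_layer, 0.9)
labels = ["dog", "cat"]
predicted = labels[result.index(max(result))]
```

With the threshold of 0.9 from the example, the first two approximation values are rejected and the third is accepted, so the output layer is never evaluated.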

Thus, the learning and inference apparatus and method according to an embodiment of the present invention can output a classification value (inference value) without a complete feedforward pass when the confidence of a sub-model unit is greater than or equal to the threshold. The computation speed can therefore be significantly improved.

Table 1 below shows an algorithm that may implement the inference unit of the learning and inference apparatus according to an embodiment of the present invention.

[Table 1]

Input: data x, threshold t
Model: sub-model m_i, main-model f
Initialize: classification = 0
for i = 1 to N - 1 do
    if max softmax(m_i(x)) > t then
        classification = argmax softmax(m_i(x))
        break
    end if
end for
if classification == 0 then
    # all sub-models are not confident
    classification = argmax softmax(f(x))
end if
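A runnable Python rendering of Table 1 might look like the following. It is a sketch under stated assumptions: each sub-model and the main model is treated as a callable returning raw scores (logits), the names `softmax` and `classify` are illustrative, and `None` replaces the table's sentinel value 0 so that class index 0 stays usable. The strict comparison `>` follows Table 1, whereas the description above speaks of a confidence "greater than or equal to" the threshold.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of raw scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(x, sub_models, main_model, threshold):
    """Return the class index from the first confident sub-model, else from the main model."""
    classification = None
    for sub_model in sub_models:              # i = 1 .. N-1 in Table 1
        probs = softmax(sub_model(x))
        if max(probs) > threshold:
            classification = probs.index(max(probs))   # argmax
            break
    if classification is None:
        # No sub-model was confident: fall back to the main model f.
        probs = softmax(main_model(x))
        classification = probs.index(max(probs))
    return classification
```

For instance, with a sub-model that returns `[0.0, 0.0]` (confidence 0.5) followed by one that returns `[5.0, 0.0]` (confidence about 0.993), a threshold of 0.9 makes the loop stop at the second sub-model and the main model is not called.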

Although the learning and inference apparatus shown in FIG. 3 has two hidden layers, it may instead have one hidden layer, or three or more hidden layers, together with the corresponding local network modules.

FIG. 4 is a flowchart illustrating a learning and inference method according to an embodiment of the present invention.

Referring to FIG. 4, in step S410, the local network modules corresponding to the input layer and the hidden layer each calculate an error gradient value based on the output value of the respective layer.

In step S420, the input layer and the hidden layer are each updated by the corresponding error gradient value.

In step S430, the local network module of the input layer is updated based on the training loss value calculated by the local network module of the hidden layer.

The learning and inference method of FIG. 4 is the same as the operation of the learning and inference apparatus described with reference to FIGS. 1 to 3, so further description is omitted.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may also be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to limited embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

1. A learning and inference apparatus comprising:
a main network processing unit including an input layer, a hidden layer, and an output layer that outputs a final output value; and
local network modules, each of which calculates an error gradient value based on the output value of a respective one of the input layer and the hidden layer, and is updated based on a training loss value calculated by the local network module of the next stage.

2. The learning and inference apparatus of claim 1,
wherein the local network modules calculate an approximation value approximating the final output value using the output values of the input layer and the hidden layer, calculate the training loss value by a loss function using the approximation value, and calculate the error gradient value from the training loss value.

3. The learning and inference apparatus of claim 2,
wherein the loss function is a mean-absolute error function, a mean-squared error function, or a cross-entropy function.

4. The learning and inference apparatus of claim 3,
wherein the input layer and the hidden layer are each updated by the error gradient value.

5. The learning and inference apparatus of claim 3,
wherein the input layer and the hidden layer are updated by gradient descent.

6. The learning and inference apparatus of claim 2,
wherein the local network module of the input layer calculates a local training loss value by a loss function using the training loss value calculated by the local network module of the hidden layer, calculates a local error gradient value from the local training loss value, and is updated by the local error gradient value.

7. The learning and inference apparatus of claim 6,
wherein the loss function is a mean-absolute error function, a mean-squared error function, or a cross-entropy function.

8. The learning and inference apparatus of claim 6,
wherein the local network module of the input layer is updated by gradient descent.

9. The learning and inference apparatus of claim 2,
wherein the input layer and the local network module of the input layer constitute a first sub-model unit; the input layer, the hidden layer, and the local network module of the hidden layer constitute a second sub-model unit; and the input layer, the hidden layer, and the output layer constitute a main model unit,
the apparatus further comprising an inference unit that sequentially evaluates the confidence of the approximation values of the sub-model units, outputs an approximation value as an inference value when its confidence is greater than or equal to a threshold, and outputs the final output value as the inference value when no approximation value has a confidence greater than or equal to the threshold.

10. A learning and inference method of a learning and inference apparatus including an input layer, a hidden layer, and an output layer that outputs a final output value, the method comprising:
calculating, by local network modules respectively corresponding to the input layer and the hidden layer, an error gradient value based on the output value of each layer;
updating the input layer and the hidden layer by the respective error gradient values; and
updating the local network module of the input layer based on a training loss value calculated by the local network module of the hidden layer.

11. The learning and inference method of claim 10,
wherein calculating the error gradient value comprises:
calculating, by the local network modules, an approximation value approximating the final output value of the output layer using the output values of the input layer and the hidden layer;
calculating, by the local network modules, a training loss value by a loss function using the approximation value; and
calculating, by the local network modules, the error gradient value based on the training loss value.

12. The learning and inference method of claim 11,
wherein the loss function is a mean-absolute error function, a mean-squared error function, or a cross-entropy function.

13. The learning and inference method of claim 10,
wherein the input layer and the hidden layer are updated by gradient descent.

14. The learning and inference method of claim 10,
wherein updating the local network module of the input layer comprises:
calculating, by the local network module of the input layer, a local training loss value by a loss function using the training loss value calculated by the local network module of the hidden layer;
calculating, by the local network module of the input layer, a local error gradient value based on the local training loss value; and
updating the local network module of the input layer by the local error gradient value.

15. The learning and inference method of claim 14,
wherein the loss function is a mean-absolute error function, a mean-squared error function, or a cross-entropy function.

16. The learning and inference method of claim 10,
wherein the local network module of the input layer is updated by gradient descent.

17. The learning and inference method of claim 11,
wherein the input layer and the local network module of the input layer constitute a first sub-model unit; the input layer, the hidden layer, and the local network module of the hidden layer constitute a second sub-model unit; and the input layer, the hidden layer, and the output layer constitute a main model unit,
the method further comprising:
sequentially evaluating, by an inference unit, the confidence of the approximation values of the sub-model units; and
outputting, by the inference unit, an approximation value as an inference value when its confidence is greater than or equal to a threshold, and outputting the final output value as the inference value when no approximation value has a confidence greater than or equal to the threshold.