
CN118574022B - An imaging control method based on retinal imaging principle - Google Patents

An imaging control method based on retinal imaging principle Download PDF

Info

Publication number
CN118574022B
CN118574022B (Application CN202411047663.9A)
Authority
CN
China
Prior art keywords
size
image
sampling
decision network
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411047663.9A
Other languages
Chinese (zh)
Other versions
CN118574022A (en)
Inventor
闫锋
刘泉
蒋骏杰
杨婷
王凯
吴天泽
王一鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202411047663.9A priority Critical patent/CN118574022B/en
Publication of CN118574022A publication Critical patent/CN118574022A/en
Application granted granted Critical
Publication of CN118574022B publication Critical patent/CN118574022B/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N25/00Circuitry of solid-state image sensors [SSIS]; Control thereof
    • H04N25/40Extracting pixel data from image sensors by controlling scanning circuits, e.g. by modifying the number of pixels sampled or to be sampled
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N25/00Circuitry of solid-state image sensors [SSIS]; Control thereof
    • H04N25/70SSIS architectures; Circuits associated therewith

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an imaging control method based on the retinal imaging principle, belonging to the technical field of imaging. The method adopts a hierarchical downsampling strategy to mimic the imaging characteristics of the retina: it preserves full resolution at the point of attention while reducing the resolution of the overall image, which greatly reduces the data storage space. A decision network evaluates the output of the imaging method and observes the scene dynamically; its loss jointly considers the estimated identity, position, and size of each object in the scene, the predicted observation at the next moment, and the difference between the decision network's generative probability and posterior probability, so that the network accurately understands the scene at the current moment and determines the hierarchical downsampling strategy for the next moment.

Description

Imaging control method based on retina imaging principle
Technical Field
The invention relates to an imaging control method based on a retina imaging principle, and belongs to the technical field of imaging.
Background
As a photoelectric conversion device, a camera collects optical signals from the environment and converts them into electrical signals to form an image; cameras are widely used in daily life, security, national defense, and other fields. In a conventional camera, the imaging chip reads out all pixels uniformly, treating every pixel equally. This yields high-quality images but requires a large amount of storage space and imposes heavy computing-power and power-consumption demands on subsequent tasks such as image processing.
Unlike a conventional camera, whose imaging chip samples the optical signal in the environment uniformly, biological vision systems such as the human retina sample the image non-uniformly: the center of the visual field is sampled at high resolution and the periphery at low resolution. If a camera could mimic the retinal imaging principle to achieve such non-uniform sampling, the computing-power and power-consumption requirements could be reduced as much as possible while still meeting resolution requirements.
One problem that must be considered when imaging in a modality that mimics the retinal imaging principle is that, because the resolution at the periphery is very low, the identity of objects at the edge of the visual field cannot be judged accurately. A targeted method is therefore needed that accurately estimates the low-resolution periphery, dynamically adjusts the center of the imaging field, and improves the confidence with which each object in the field of view is identified.
Disclosure of Invention
In order to solve the above problems, the invention provides an imaging control method based on the retinal imaging principle, which mimics retinal imaging characteristics through a hierarchical downsampling strategy, reducing the resolution of the whole image while preserving the resolution at the point of attention. The camera implementing the method comprises a row addressing circuit and a column addressing circuit; the inputs of both circuits are the center coordinates of the point of attention, the resolution of the region of interest, and the number of downsampling levels n. From these three inputs the circuits compute the pixel positions to be gated, which are read and quantized by a readout circuit. The camera is also provided with n storage areas, one per downsampling level, so that the readout result of each level is stored in the storage area of that level.
Specifically, the imaging control method based on the retinal imaging principle is applied to a retinal imaging camera provided with a pixel array, a row addressing circuit, a column addressing circuit, a readout circuit, and n storage areas. The method adopts a hierarchical downsampling strategy to obtain the sampled images of each level corresponding to the current frame, analyzes the object information in each level's image through a decision network, and determines the hierarchical downsampling strategy of the next frame according to that information. The hierarchical downsampling strategy comprises the center-point coordinates (x₀, y₀), the size s×s of each level's sampled image, and the number of downsampling levels n currently employed; the imaging chip of the retinal imaging camera gates the corresponding pixels of the pixel array for sampling according to this strategy.
When the method adopts the hierarchical downsampling strategy to obtain the sampled images of each level for the next frame, the first-level image is obtained by sampling every point within a sampling range of s×s, and the n-th-level (n ≥ 2) downsampled image is obtained by sampling every 2^(n−1) points within a sampling range of 2^(n−1)s × 2^(n−1)s, yielding an image of size s×s.
Furthermore, when the method carries out hierarchical downsampling on the image, if the sampling range exceeds the range of the pixel array, zero padding is carried out on the sampling points beyond the range.
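The hierarchical sampling and zero-padding described above can be sketched as a minimal NumPy simulation (the function name and array-based framing are illustrative assumptions, not the patent's addressing-circuit implementation):

```python
import numpy as np

def hierarchical_downsample(img, cx, cy, s, n_levels):
    """Simulate retina-like hierarchical downsampling.

    Level k samples every 2**(k-1) points in a window of side
    2**(k-1) * s centered on (cx, cy); out-of-range sampling points
    are zero-padded.  Each level yields an s x s image.
    """
    H, W = img.shape
    levels = []
    for k in range(1, n_levels + 1):
        stride = 2 ** (k - 1)
        half = stride * s // 2
        out = np.zeros((s, s), dtype=img.dtype)
        for i in range(s):
            for j in range(s):
                y = cy - half + i * stride
                x = cx - half + j * stride
                if 0 <= y < H and 0 <= x < W:
                    out[i, j] = img[y, x]
        levels.append(out)
    return levels

img = np.arange(64 * 64).reshape(64, 64)
levels = hierarchical_downsample(img, cx=32, cy=32, s=8, n_levels=3)
# 3 images of 8x8 each: 192 stored values instead of the 1024 pixels
# of the 32x32 window spanned by the stride-4 third level.
```

Moving the gaze center to an edge (e.g. cx=0, cy=0) exercises the zero-padding branch, mirroring the fig. 5/fig. 6 example.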
Further, the decision network is implemented on the basis of a Transformer model, and its loss function comprises an estimation term L₁ for the identity of each object in the whole scene, an estimation term L₂ for the position of each object in the whole scene, an estimation term L₃ for the size of each object in the whole scene, an estimation term L₄ for the observation at the next moment, and a term L₅ for the difference between the generative probability and the posterior probability of the decision network. The loss function L is:

L = λ₁L₁ + λ₂L₂ + λ₃L₃ + λ₄L₄ + λ₅L₅

where λ₁, λ₂, λ₃, λ₄, λ₅ are the weight values corresponding to the respective terms.

Further, the estimation term L₁ for the identity of each object in the whole scene is:

L₁ = Σ_{k=1}^{K} ‖ c_k − E_{q_φ(c_k | o_t, v_t)}[c_k] ‖²

where c_k denotes the identity of the k-th object in the whole scene, q_φ denotes the posterior probability distribution parameterized by φ in the decision network, v_t denotes the viewing angle at time t, c_{k,t}^m, p_{k,t}^m, s_{k,t}^m denote respectively the identity, position, and size of the k-th object in the whole scene at time t in level m, and ‖·‖ denotes the L2 norm.

Further, the estimation term L₂ for the position of each object in the whole scene is:

L₂ = Σ_{k=1}^{K} ‖ p_k − E_{q_φ(p_k | o_t, v_t)}[p_k] ‖²

where p_k denotes the position of the k-th object in the whole scene.

Further, the estimation term L₃ for the size of each object in the whole scene is:

L₃ = Σ_{k=1}^{K} ‖ s_k − E_{q_φ(s_k | o_t, v_t)}[s_k] ‖²

where s_k denotes the size of the k-th object in the whole scene.

Further, the estimation term L₄ for the observation at the next moment is:

L₄ = ‖ o_{t+1} − ô_{t+1} ‖²,  ô_{t+1} ~ p_θ(o_{t+1} | v̂_{t+1})

where ô_{t+1} denotes the observation at time t+1 predicted by the decision network, p_θ denotes the generative probability distribution parameterized by θ in the decision network, v̂_{t+1} denotes the observation viewing angle at time t+1 predicted by the decision network, and ĉ_{k,t+1}^m, p̂_{k,t+1}^m, ŝ_{k,t+1}^m denote respectively the predicted identity, position, and size of the k-th object at time t+1 in level m.

Further, the difference term L₅ between the generative probability and the posterior probability of the decision network is:

L₅ = D_KL( q_φ ‖ p_θ )

where D_KL denotes the KL divergence, used to measure the difference between two probability distributions.
Compared with a typical camera imaging method, the proposed imaging method samples the first-level downsampling region with a stride of 1, the second-level region with a stride of 2, the third-level region with a stride of 4, and so on, sampling the n-th-level region with a stride of 2^(n−1), thereby obtaining n pictures of different resolutions. Compared with a fully sampled picture, the data storage is reduced to n/4^(n−1) of the original, saving a large amount of storage space and relieving the computation required for image processing; compared with an ordinary camera, the computation is reduced to n/4^(n−1) of the original.
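The claimed saving follows from counting pixels: n levels of s×s pixels versus one full-resolution window of (2^(n−1)·s)² pixels, with the s² factor canceling. A quick check of the ratio (a sketch; the function name is illustrative):

```python
def storage_ratio(n):
    """Pixels stored by n-level sampling (n * s*s) divided by the
    pixels of the full-resolution window ((2**(n-1) * s)**2).
    The s*s factor cancels, leaving n / 4**(n-1)."""
    return n / 4 ** (n - 1)

print(storage_ratio(3))  # 3/16 = 0.1875
print(storage_ratio(5))  # 5/256, under 2% of the full frame
```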
According to the imaging control method based on the retinal imaging principle, for the multi-level images obtained with the imaging method, a decision network analyzes the identity, size, position, and other information of the objects in each level's image, estimates the distribution of each object of the whole scene in the original image, and, according to this estimate, dynamically adjusts the imaging strategy for the next moment, namely the gaze center, size, and number of levels of the next imaging. Using the decision network effectively solves the problem that the identity of objects at the edge of the visual field cannot be judged accurately because the edge resolution is too low, and effectively improves the confidence with which each object in the field of view is identified.
The invention has the beneficial effects that:
The method adopts a hierarchical downsampling strategy to mimic retinal imaging characteristics, reducing the resolution of the whole image while preserving the resolution at the point of attention, which greatly reduces the data storage space. A decision network evaluates the output of the imaging method and observes the scene dynamically, which greatly reduces the computation of network processing and effectively reduces data-transmission bandwidth and power consumption. The method is expected to become a standard deployment paradigm for edge imaging devices.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of hierarchical downsampling of an image with the gaze center at the image center.
Fig. 2 is a schematic diagram of first-level sampling of an image.
Fig. 3 is a schematic diagram of second-level sampling of an image.
Fig. 4 is a schematic diagram of third-level sampling of an image with the gaze center at the image center.
Fig. 5 is a schematic diagram of hierarchical downsampling of an image with the gaze center at an image edge.
Fig. 6 is a schematic diagram of third-level sampling of an image with the gaze center at an image edge.
Fig. 7 is a schematic circuit diagram of a retinal imaging camera.
Fig. 8 is a schematic diagram of the decision network processing the output of the retinal imaging camera.
Fig. 9 is a schematic diagram of the decision network processing the output of an ordinary camera.
Fig. 10 is a schematic diagram of the generative model of the retinal imaging camera decision network.
Fig. 11 is a schematic illustration of the inference process of a retinal imaging camera decision network.
Fig. 12 is a network model block diagram of a retinal imaging camera decision network.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
Embodiment one:
This embodiment provides an imaging control method based on the retinal imaging principle. The method is applied to a retinal imaging camera provided with a pixel array, a row addressing circuit, a column addressing circuit, a readout circuit, and n storage areas. The imaging chip of the retinal imaging camera acquires the current frame using a hierarchical downsampling strategy; a decision network analyzes the object information in each level's image and determines the hierarchical downsampling strategy of the next frame according to that information, the object information including the identity, size, and position of each object. The hierarchical downsampling strategy comprises the center-point coordinates (x₀, y₀), the size s×s of each level's sampled image, and the number of downsampling levels n currently employed; the imaging chip gates the corresponding pixels of the pixel array for sampling according to this strategy.
Fig. 1 shows an example of three-level downsampling with the gaze center at the center of the image, where each level's image has a size of 8×8. Let the center of sampling be (x₀, y₀) and the first-level picture size be s×s. The range of the first-level sampled picture is then [(x₀ − s/2, y₀ − s/2), (x₀ + s/2, y₀ + s/2)], where the first item is the coordinate of the upper-left corner and the second item the coordinate of the lower-right corner of the first-level picture, with the coordinate origin at the upper-left corner of the whole imaging array. Within this range the picture undergoes first-level sampling, i.e., every point is sampled; the resulting picture is shown in fig. 2. The range of the second-level picture is [(x₀ − s, y₀ − s), (x₀ + s, y₀ + s)]; within this range the picture undergoes second-level sampling, i.e., every second point is sampled so that the final picture size remains s×s; the resulting picture is shown in fig. 3. The range of the third-level picture is [(x₀ − 2s, y₀ − 2s), (x₀ + 2s, y₀ + 2s)]; within this range the picture undergoes third-level sampling, i.e., every fourth point is sampled so that the final picture size remains s×s; the resulting picture is shown in fig. 4. When n-level downsampling is needed, by analogy the range of the n-th-level picture is [(x₀ − 2^(n−2)s, y₀ − 2^(n−2)s), (x₀ + 2^(n−2)s, y₀ + 2^(n−2)s)]; within this range the picture undergoes n-th-level sampling, i.e., every 2^(n−1) points are sampled, ensuring a final picture of size s×s.
In the hierarchical downsampling process, if the sampling range exceeds the range of the pixel array, the out-of-range sampling points are zero-padded. Fig. 5 shows an example of three-level downsampling with the gaze center at an edge position of the image, where each level's image has a size of 8×8. In this example, the first-level and second-level sampling are the same as in the example of fig. 1. For the third-level sampling, because the gaze center lies at the edge of the image, the third-level sampling range exceeds the image boundary: in fig. 5, parts of the upper half and left half of the third-level sampling region fall outside the coordinate range of the pixel array. In this case the sampling points outside the coordinate range must be zero-padded. The resulting third-level sampled image is shown in fig. 6: the first row and first column of the third-level image exceed the pixel-array range, so the corresponding sampling points of the first row and first column are set directly to 0, while the remaining sampling points within the pixel-array range are sampled normally.
Fig. 7 shows the circuit structure of the retinal imaging camera, which comprises the pixel array, the row addressing circuit, the column addressing circuit, the readout circuit, and n storage areas. The inputs of the row and column addressing circuits are the gaze-center coordinates (x₀, y₀), the per-level picture size s, and the number of downsampling levels n currently employed. As described above, for the k-th-level downsampled picture, s rows and s columns must be sampled within the range [(x₀ − 2^(k−2)s, y₀ − 2^(k−2)s), (x₀ + 2^(k−2)s, y₀ + 2^(k−2)s)]; the rows to be sampled are spaced 2^(k−1) apart, as are the columns. After receiving the gaze-center coordinates, per-level picture size, and number of downsampling levels, the row addressing circuit gates in turn the rows of the pixel array to be read, and the column addressing circuit gates the columns to be read, so that the s rows and s columns of the k-th level are read out and quantized by the readout circuit and then stored in the corresponding k-th-level storage area; each storage area has size s×s.
Fig. 8 shows the decision network processing the output of the retinal imaging camera. At the initial moment the decision network acquires a non-uniformly sampled image using an initial attention center (for example the center of the image), an initial region-of-interest resolution (for example one sixteenth of the full-image resolution), and an initial number of downsampling levels (for example five), and feeds this image into the network. The network outputs the attention-center coordinates (x₀, y₀), region-of-interest resolution s, and number of downsampling levels n for the next frame, thereby dynamically adjusting the next frame and realizing a retina-like glancing process. The input of the decision network is n images of size s×s. In image processing the decision network will typically use a convolutional neural network; suppose its first layer has C channels, each channel with a convolution kernel of size f×f, the stride of the convolution kernel is d, and the input image is zero-padded with p rings. The number of multiplications required by the convolution is then n·C·f²·((s + 2p − f)/d + 1)², and the number of additions is n·C·(f² − 1)·((s + 2p − f)/d + 1)².
Fig. 9 shows the decision network processing the output of an ordinary camera, i.e., when all pixels of the pixel array are read out equally. The input of the decision network is then one image of size 2^(n−1)s × 2^(n−1)s. With the same first convolution layer (C channels, kernel f×f, stride d, padding p), the number of multiplications is C·f²·((2^(n−1)s + 2p − f)/d + 1)² and the number of additions is C·(f² − 1)·((2^(n−1)s + 2p − f)/d + 1)². Comparing the computation of the retinal imaging camera with that of the ordinary camera in the same decision-network structure, taking one convolution layer as an example, the computation is mainly determined by the image size: the multiplications are reduced to approximately n/4^(n−1) of the original, and the additions likewise to approximately n/4^(n−1). Since the usual convolution computation counts only multiplications, the computation of the retinal imaging camera in the decision network can be considered to be approximately n/4^(n−1) of that of an ordinary camera.
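The multiplication counts in the comparison above can be tabulated directly. A sketch, where C, f, d, p are the symbols used in the text and the example values assigned to them are assumptions:

```python
def conv_mults(num_images, side, channels, kernel, stride, pad):
    """Multiplications of one conv layer over num_images inputs of
    size side x side: channels * kernel^2 per output position."""
    out = (side + 2 * pad - kernel) // stride + 1
    return num_images * channels * kernel ** 2 * out ** 2

s, n = 8, 3                 # per-level size and number of levels
C, f, d, p = 16, 3, 1, 1    # example layer hyperparameters
retina = conv_mults(n, s, C, f, d, p)               # n images of s x s
full = conv_mults(1, 2 ** (n - 1) * s, C, f, d, p)  # one full window
print(retina / full)  # 0.1875 = n / 4**(n-1) for these values
```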
The decision network required by the retinal imaging camera can be summarized as the generative model shown in fig. 10. In the model, for a scene S containing K objects o₁, …, o_K, each object k has an identity c_k, a position p_k in the whole picture, and a size s_k. The action of the camera at time t is a_t, meaning that at time t the gaze center of the retinal imaging camera is adjusted to (x_t, y_t) and divided into n levels, each of size s×s, thereby forming a new viewing angle v_t on the scene. Observing the scene at this viewing angle yields, for the k-th object at time t in level m, an identity c_{k,t}^m, position p_{k,t}^m, and size s_{k,t}^m, which together form the observation o_t of the retinal imaging camera at time t. The joint probability distribution of this generative model can be described as equation (1):

p(o_{1:T}, h_{1:T}, a_{1:T}) = ∏_{t=1}^{T} p(o_t | h_t) p(h_t | h_{t−1}, a_{t−1}) p(a_{t−1} | h_{t−1})   (1)

where h_t is the hidden state of the system at time t, including the identities c_k, positions p_k, sizes s_k, and other variables describing the state of the system. The model introduces a posterior probability q_φ(h_t | o_{1:t}); the free energy of the model is shown in equation (2), and the goal of the model is to minimize the difference between the model's predicted joint distribution and the generative model, i.e., to minimize the free energy F:

F = E_{q_φ(h_t | o_{1:t})}[ log q_φ(h_t | o_{1:t}) − log p(o_t, h_t) ]   (2)
The inference process of the decision network is shown in fig. 11. For each observation o_t, i.e., each output of the retinal imaging camera, the decision network first determines, for the k-th object at time t in level m, the identity c_{k,t}^m, position p_{k,t}^m, and size s_{k,t}^m, and, according to the current viewing angle v_t, infers the identity c_k, position p_k, and size s_k of each object in the whole scene. According to the inferred result it selects the next observation angle v_{t+1}; at the same time, the network predicts, under the new observation angle, the identity ĉ_{k,t+1}^m, position p̂_{k,t+1}^m, and size ŝ_{k,t+1}^m of the k-th object at time t+1 in level m, thereby predicting the observation ô_{t+1} at time t+1. The decision network is trained with a loss function consisting of five parts:
(1) an estimation term for the identity of each object in the whole scene:

L₁ = Σ_{k=1}^{K} ‖ c_k − E_{q_φ(c_k | o_t, v_t)}[c_k] ‖²   (3)

(2) an estimation term for the position of each object in the whole scene:

L₂ = Σ_{k=1}^{K} ‖ p_k − E_{q_φ(p_k | o_t, v_t)}[p_k] ‖²   (4)

(3) an estimation term for the size of each object in the whole scene:

L₃ = Σ_{k=1}^{K} ‖ s_k − E_{q_φ(s_k | o_t, v_t)}[s_k] ‖²   (5)

(4) an estimation term for the observation at the next moment:

L₄ = ‖ o_{t+1} − ô_{t+1} ‖²,  ô_{t+1} ~ p_θ(o_{t+1} | v̂_{t+1})   (6)

(5) a difference term between the generative probability and the posterior probability of the decision network:

L₅ = D_KL( q_φ ‖ p_θ )   (7)

where q_φ denotes the posterior probability distribution parameterized by φ in the decision network, p_θ denotes the generative probability distribution parameterized by θ, ‖·‖ denotes the L2 norm, and D_KL denotes the KL divergence measuring the difference between two probability distributions.

Combining the above five terms gives the total loss function of the model:

L = λ₁L₁ + λ₂L₂ + λ₃L₃ + λ₄L₄ + λ₅L₅   (8)

where λ₁, λ₂, λ₃, λ₄, λ₅ are the weight values corresponding to the respective terms.
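The weighted combination of the five terms and the KL term can be sketched numerically; the probability vectors, term values, and weights below are placeholders, not the patent's trained quantities:

```python
import math

def kl_divergence(q, p):
    """Discrete KL divergence D_KL(q || p), as used for the L5 term."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def total_loss(terms, weights):
    """Weighted sum of L1..L5 (equation (8))."""
    return sum(w * t for w, t in zip(weights, terms))

q = [0.7, 0.2, 0.1]   # placeholder posterior over 3 candidate identities
p = [0.5, 0.3, 0.2]   # placeholder generative prior
L5 = kl_divergence(q, p)
terms = [0.4, 0.2, 0.1, 0.8, L5]   # placeholder values for L1..L4, plus L5
print(total_loss(terms, [1.0, 1.0, 1.0, 0.5, 0.1]))
```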
In this model, the estimate of the scene obtained from each observation is intended to guide the next observation, i.e., to obtain a better observation of the objects and a better estimate of the scene. The strategy for selecting the observation viewing angle is:

G(v) = λ₁L₁(v) + λ₂L₂(v) + λ₃L₃(v) + λ₄L₄(v) + λ₅L₅(v)   (9)

In equation (9), the first term on the right represents correctly determining the identity of the target objects, the second term represents correctly observing the positions of the target objects, the third term represents correctly observing their sizes, and the fourth and fifth terms represent the continued search for target objects. For a candidate viewing angle v, it is desirable that the value of G(v) be as small as possible; equation (9) can therefore be evaluated by Monte Carlo random sampling to learn the candidate viewing angle v̂_{t+1} for which the value of G is smallest.
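The Monte Carlo selection of the candidate viewing angle can be sketched as drawing random gaze centers and keeping the one with the smallest expected loss. The scoring function below is a toy placeholder for G(v), not the patent's trained network:

```python
import random

def select_view(score, num_samples=100, bounds=(0, 64), seed=0):
    """Monte Carlo search: draw candidate gaze centers uniformly and
    keep the one minimizing the (placeholder) expected-loss score G(v)."""
    rng = random.Random(seed)
    best_v, best_g = None, float("inf")
    for _ in range(num_samples):
        v = (rng.randint(*bounds), rng.randint(*bounds))
        g = score(v)
        if g < best_g:
            best_v, best_g = v, g
    return best_v

# toy G(v): loss is smallest when gazing near a "target" at (20, 40)
target = (20, 40)
score = lambda v: (v[0] - target[0]) ** 2 + (v[1] - target[1]) ** 2
print(select_view(score))  # a gaze center near (20, 40)
```

With a fixed seed the search is deterministic, which keeps the glancing behavior reproducible during debugging.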
The decision network can be implemented on the basis of a Transformer model, as shown in fig. 12. The model is trained on the ImageNet dataset; the non-uniform images produced by the retinal imaging camera are obtained by simulation as a data-preprocessing step. The pictures obtained by non-uniform sampling are serialized into patches and position-encoded, and the resulting vectors pass through the Transformer encoder to produce the estimate of the whole scene, i.e., the identity c_k, position p_k, and size s_k of each object. This estimate is input to the Transformer decoder to predict the manner of observing the image at the next moment and to estimate, under the new observation angle, the identity, position, and size of the k-th object at time t+1 in level m, thereby predicting the observation ô_{t+1} at time t+1.
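The patch serialization feeding the Transformer encoder can be sketched as follows (a minimal NumPy version; the patch size and the (level, row, col) position scheme are illustrative assumptions):

```python
import numpy as np

def patchify(levels, patch=4):
    """Serialize the n non-uniformly sampled s x s images into a
    sequence of flattened patch tokens with (level, row, col) positions."""
    tokens, positions = [], []
    for lvl, img in enumerate(levels):
        s = img.shape[0]
        for i in range(0, s, patch):
            for j in range(0, s, patch):
                tokens.append(img[i:i + patch, j:j + patch].ravel())
                positions.append((lvl, i // patch, j // patch))
    return np.stack(tokens), positions

levels = [np.random.rand(8, 8) for _ in range(3)]
tokens, positions = patchify(levels)
print(tokens.shape)  # (12, 16): 3 levels x 4 patches, 4x4 pixels each
```

The position tuples stand in for the positional encoding; a learned embedding would be added to each token before the encoder.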
The ImageNet dataset used in this scheme is a public image dataset containing more than 14 million images covering more than 20,000 categories, of which more than one million images carry explicit category labels and annotations of object positions. The images of the ImageNet dataset cover the categories of pictures seen in most of daily life, and the category of each picture is calibrated manually, ensuring the quality and diversity of the dataset.
Some steps in the embodiments of the present invention may be implemented by using software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The foregoing description of preferred embodiments is not intended to limit the invention to the precise form disclosed; any modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within its scope.

Claims (2)

1. An imaging control method based on the retinal imaging principle, characterized in that the method is applied to a retinal imaging camera, the retinal imaging camera being provided with a pixel array, a row addressing circuit, a column addressing circuit, a readout circuit, and n storage areas; the method adopts a hierarchical downsampling strategy to obtain the sampled images of each level corresponding to the current frame image, analyzes the object information in each level's image through a decision network, and determines the hierarchical downsampling strategy of the next frame image according to the object information, the hierarchical downsampling strategy comprising the center-point coordinates (x₀, y₀), the size s×s of each level's sampled image, and the number of downsampling levels n currently employed; the imaging chip of the retinal imaging camera gates the corresponding pixels in the pixel array for sampling according to the hierarchical downsampling strategy;

when the method adopts the hierarchical downsampling strategy to obtain the sampled images of each level corresponding to the next frame image, the first-level downsampled image is obtained by sampling every point within a sampling range of s×s, yielding an image of size s×s, and the n-th-level (n ≥ 2) downsampled image is obtained by sampling every 2^(n−1) points within a sampling range of 2^(n−1)s × 2^(n−1)s, yielding an image of size s×s;

the decision network is implemented on the basis of a Transformer model; in the model, for a scene S, if there exist K objects o₁, …, o_K, then each object k has an identity c_k, a position p_k in the whole picture, and a size s_k, while the action of the camera at time t is a_t, the meaning of the action being that at time t the gaze center of the retinal imaging camera is adjusted to (x_t, y_t) and divided into n levels, each level having size s×s, thereby forming a new viewing angle v_t on the scene; observing the scene at this viewing angle yields the identity c_{k,t}^m, position p_{k,t}^m, and size s_{k,t}^m of the k-th object at time t in level m, which together form the observation o_t of the retinal imaging camera at time t; the loss function of the decision network comprises an estimation term L₁ for the identity of each object in the whole scene, an estimation term L₂ for the position of each object in the whole scene, an estimation term L₃ for the size of each object in the whole scene, an estimation term L₄ for the observation at the next moment, and a term L₅ for the difference between the generative probability and the posterior probability of the decision network; the loss function L is:

L = λ₁L₁ + λ₂L₂ + λ₃L₃ + λ₄L₄ + λ₅L₅

where λ₁, λ₂, λ₃, λ₄, λ₅ are the weight values corresponding to the respective terms;

the estimation term L₁ for the identity of each object in the whole scene is:

L₁ = Σ_{k=1}^{K} ‖ c_k − E_{q_φ(c_k | o_t, v_t)}[c_k] ‖²

where c_k denotes the identity of the k-th object in the whole scene, q_φ denotes the posterior probability distribution parameterized by φ in the decision network, v_t denotes the viewing angle at time t, c_{k,t}^m, p_{k,t}^m, s_{k,t}^m denote respectively the identity, position, and size of the k-th object in the whole scene at time t in level m, and ‖·‖ denotes the L2 norm;

the estimation term L₂ for the position of each object in the whole scene is:

L₂ = Σ_{k=1}^{K} ‖ p_k − E_{q_φ(p_k | o_t, v_t)}[p_k] ‖²

where p_k denotes the position of the k-th object in the whole scene;

the estimation term L₃ for the size of each object in the whole scene is:

L₃ = Σ_{k=1}^{K} ‖ s_k − E_{q_φ(s_k | o_t, v_t)}[s_k] ‖²

where s_k denotes the size of the k-th object in the whole scene;

the estimation term L₄ for the observation at the next moment is:

L₄ = ‖ o_{t+1} − ô_{t+1} ‖²,  ô_{t+1} ~ p_θ(o_{t+1} | v̂_{t+1})

where ô_{t+1} denotes the observation at time t+1 predicted by the decision network, p_θ denotes the generative probability distribution parameterized by θ in the decision network, v̂_{t+1} denotes the observation viewing angle at time t+1 predicted by the decision network, and ĉ_{k,t+1}^m, p̂_{k,t+1}^m, ŝ_{k,t+1}^m denote respectively the predicted identity, position, and size of the k-th object at time t+1 in level m;

the difference term L₅ between the generative probability and the posterior probability of the decision network is:

L₅ = D_KL( q_φ ‖ p_θ )

where D_KL denotes the KL divergence, used to measure the difference between two probability distributions.

2. The method according to claim 1, characterized in that, when the method performs hierarchical downsampling on the image, if the sampling range exceeds the range of the pixel array, the sampling points beyond the range are zero-padded.
CN202411047663.9A 2024-08-01 2024-08-01 An imaging control method based on retinal imaging principle Active CN118574022B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411047663.9A CN118574022B (en) 2024-08-01 2024-08-01 An imaging control method based on retinal imaging principle

Publications (2)

Publication Number Publication Date
CN118574022A CN118574022A (en) 2024-08-30
CN118574022B true CN118574022B (en) 2024-11-12

Family

ID=92476707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411047663.9A Active CN118574022B (en) 2024-08-01 2024-08-01 An imaging control method based on retinal imaging principle

Country Status (1)

Country Link
CN (1) CN118574022B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114630024A (en) * 2022-01-21 2022-06-14 北京航空航天大学 A retina-like non-uniform imaging method based on an array camera system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006137829A2 (en) * 2004-08-10 2006-12-28 Sarnoff Corporation Method and system for performing adaptive image acquisition
WO2020173414A1 (en) * 2019-02-25 2020-09-03 昀光微电子(上海)有限公司 Human vision characteristic-based near-eye display method and device

Similar Documents

Publication Publication Date Title
Wang et al. Ultra-high-definition low-light image enhancement: A benchmark and transformer-based method
Zhang et al. Uncertainty-aware blind image quality assessment in the laboratory and wild
CN109493303B (en) An Image Dehazing Method Based on Generative Adversarial Networks
CN108510194B (en) Wind control model training method, risk identification method, device, equipment and medium
Kundu et al. No-reference quality assessment of tone-mapped HDR pictures
CN113688723A (en) A pedestrian target detection method based on improved YOLOv5 in infrared images
CN112307982B (en) Human Action Recognition Method Based on Interleaved Enhanced Attention Network
CN108364262A (en) A blurred-image restoration method, apparatus, device and storage medium
CN108596243B (en) Eye movement gaze prediction method based on hierarchical gaze view and conditional random field
Jia et al. Effective meta-attention dehazing networks for vision-based outdoor industrial systems
CN113643297B (en) A computer-aided tooth age analysis method based on neural network
CN112884721B (en) Abnormality detection method, abnormality detection system and computer-readable storage medium
CN112818849A (en) Crowd density detection algorithm based on context attention convolutional neural network of counterstudy
CN111369449A (en) Infrared Blind Element Compensation Method Based on Generative Adversarial Networks
WO2023206944A1 (en) Semantic segmentation method and apparatus, computer device, and storage medium
Zhang et al. Unsupervised depth estimation from monocular videos with hybrid geometric-refined loss and contextual attention
JP2022520511A (en) Video analysis methods and related model training methods, equipment, equipment
CN116957921A (en) Image rendering method, device, equipment and storage medium
CN114372941B (en) Low-light image enhancement method, device, equipment and medium
Gao et al. No-reference image quality assessment: Obtain mos from image quality score distribution
CN112561818B (en) Image enhancement method and device, electronic equipment and storage medium
CN118574022B (en) An imaging control method based on retinal imaging principle
CN116992231B (en) A short-term rainfall prediction method
CN114882558B (en) Learning scene real-time identity authentication method based on face recognition technology
CN115602294B (en) Medical image causal rationality detection method based on double-channel condition fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant