WO1997039417A2 - Method and apparatus for training a neural network to use a fidelity metric - Google Patents
Method and apparatus for training a neural network to use a fidelity metric
- Publication number
- WO1997039417A2 (PCT/US1997/005224)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- neural network
- fidelity
- metric
- input
- encoder
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/189—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
- H04N19/196—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
- H04N19/198—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters including smoothing of a sequence of encoding parameters, e.g. by averaging, by choice of the maximum, minimum or median value
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
- H04N19/126—Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/142—Detection of scene cut or scene change
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/147—Data rate or code amount at the encoder output according to rate distortion criteria
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/154—Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/172—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/179—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scene or a shot
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/189—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
- H04N19/192—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/189—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
- H04N19/196—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
- H04N19/87—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving scene cut or scene change detection in combination with video compression
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
Definitions
- the present invention relates generally to the field of neural information processing and, more particularly, to an apparatus and concomitant method for learning and using a fidelity metric as a control mechanism.
- the spatiotemporal visual discrimination measures (VDMs) disclosed in the above-referenced patent applications receive a pair of image sequences as input, and then produce an estimate of the discriminability between the sequences, for each local region in space and time.
- this set of discriminability estimates (fidelity metric, perceptual metric or quality metric) is generated in units of Just Noticeable Differences (JNDs), as a sequence of maps, wherein each pixel value in each frame of the JND map sequence is a discriminability estimate for corresponding spatiotemporal regions of the two input sequences.
- this large volume of output data is more useful if it can be condensed into a single number or a small set of numbers for each pair of input image sequences.
- Because the content of the image sequences may change rapidly in some applications, it would be imprecise to use a rigid rule for evaluating subjective image quality ratings. Therefore, a need exists in the art for training an apparatus to learn and use a fidelity metric as a control mechanism and to quickly and accurately condense the large quantities of fidelity metrics from a VDM into manageable subjective image quality ratings, e.g., a single-number subjective quality rating.
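As a rough illustration of such condensation (not part of the disclosure), the following Python sketch pools a sequence of JND maps into a single number using a fixed Minkowski rule; the function name and exponent are illustrative assumptions, whereas the invention described below instead trains a neural network to learn this mapping from data.

```python
import numpy as np

def pool_jnd_maps(jnd_maps, p=4.0):
    """Condense a sequence of per-pixel JND maps into one scalar rating
    using Minkowski (p-norm) pooling, a hand-designed baseline rather
    than the trained mapping of the invention."""
    values = np.concatenate([m.ravel() for m in jnd_maps])
    return float(np.mean(values ** p) ** (1.0 / p))

# Example: ten 64x64 JND frames condensed to a single number.
frames = [np.random.rand(64, 64) for _ in range(10)]
print(pool_jnd_maps(frames))
```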
- a signal processing apparatus and concomitant method for learning and using a fidelity metric as a control mechanism, and for processing large quantities of fidelity metrics from a visual discrimination measure (VDM) into manageable subjective image quality ratings, incorporates a VDM and a neural network.
- the VDM receives input image sequences and generates fidelity metrics, which are received by a neural network.
- the neural network is trained to learn and use the fidelity metrics as a control mechanism, e.g., to control an image (video) or audio encoder.
- the neural network is trained to classify the fidelity metrics from the visual discrimination measure into manageable subjective image quality ratings.
- FIG. 1 is a block diagram of a signal processing system that incorporates a neural network that embodies the teachings of the present invention
- FIG. 2 illustrates the structure of a neural network
- FIG. 3 illustrates a block diagram of a quality-metric-based encoding (QME) system
- FIG. 4 illustrates a block diagram of a QME system which incorporates a neural network
- FIG. 5 illustrates a block diagram of a second embodiment of a neural network QME (NN-QME) system
- FIG. 6 illustrates a block diagram of a third embodiment of an NN-QME system
- FIG. 7 illustrates a block diagram of a neural network undergoing training in a NN-QME system
- FIG. 8 illustrates a block diagram of a second embodiment where a neural network is undergoing training in a NN-QME system
- FIG. 9 illustrates a NN encoding-control system.
- FIG. 1 depicts a signal processing system 100 that utilizes the present invention.
- the signal processing system consists of a signal receiving section 130, a signal processing section 110 and input/output devices 120.
- Signal receiving section 130 serves to receive input data signals, such as images from, including but not limited to, satellite or cable transmission, computers, video cameras, aerial imagery or medical imaging devices.
- Signal receiving section 130 includes a data receiving section 132 and a data storage section 134.
- Data receiving section 132 may include a number of devices such as a modem and an analog-to-digital converter.
- a modem is a well-known device that comprises a modulator and a demodulator for sending and receiving binary data over a telephone line, while an analog-to-digital converter converts analog signals into a digital form.
- signal receiving section 130 may receive input signals "on-line” and, if necessary, convert them to a digital form from a number of devices such as a computer, a camera, a video player/decoder or various imaging devices, e.g., medical devices.
- the input signals are not limited to images and may comprise any data that has a "natural scale", e.g., drug discovery data (molecular data in general) and/or other time-varying signals such as audio signals from microphones or recorded media, or financial signals, e.g., daily stock prices.
- the data storage section 134 serves to store input signals received by data receiving section 132.
- Data storage section 134 may incorporate a number of devices such as a disk drive, semiconductor memory or other storage media. These storage devices provide a method for applying a delay to the input signals or to simply store the input signals for subsequent processing.
- the signal processing section 110 comprises a general purpose computer having at least one neural network 112, at least one central processing unit (CPU) 114 and a memory 116 for processing images.
- the neural network 112 can be a physical device constructed from various filters and/or processors which is coupled to the CPU through a communication channel.
- the neural network can be represented by a software implementation residing in the memory of the signal processing section.
- the general purpose computer may optionally incorporate a VDM 111 and a video encoder 115.
- the VDM serves to receive a pair of image sequences as input, and then produce an estimate of the discriminability between the sequences (fidelity metric), for each local region in space and time.
- the fidelity metric can be forwarded directly to the neural network 112 for further processing and/or to the video encoder 115 to control the encoding process.
- the VDM 111 and video encoder 115 can be physical devices constructed from various filters and/or processors which are coupled to the CPU through a communication channel.
- the VDM and video encoder can also be represented by software implementations residing in the memory of the signal processing section 110.
- FIG. 1 also illustrates the unique architecture where the VDM is positioned between incoming data (image) sequences and the neural network 112.
- This particular configuration illustrates the concept of employing a VDM to generate a fidelity metric (e.g., a spatio-temporal JND map, i.e., a two-dimensional representation of image fidelity) as input for the neural network 112, which in turn generates a control signal.
- the neural network is used to perform two functions: 1) process the large quantities of fidelity metrics from the VDM into manageable subjective fidelity ratings, e.g., a single-number subjective quality or fidelity rating, and 2) apply the fidelity metric as a control mechanism, e.g., to control the coding processes of image encoder 115.
- The JND map is one example of a quantitative perceptual difference measure for encoding video; other quantitative perceptual difference measures include those disclosed in S. Daly, "The Visible Differences Predictor: An Algorithm for the Assessment of Image Quality", in A.B. Watson (ed.), Digital Images and Human Vision, MIT Press, 1993, pp. 179-206, and S. Daly, "Quantitative Performance Assessment of an Algorithm for the Determination of Image Quality", Society for Information Display International Symposium Digest of Technical Papers, Vol. 24, (1993), pp.
- the signal processing section 110 is also coupled to a plurality of input and output devices 120 such as a keyboard, a mouse, a video monitor, a video decoder via a channel or storage devices, including but not limited to, a hard disk drive, a floppy drive or a compact disk drive.
- the input devices serve to provide inputs (e.g., data, commands and software applications) to the signal processing section for processing the input images, while the output devices serve to display or record the results.
- Neural networks have been developed to solve problems ranging from natural language understanding to visual processing.
- a neural network is composed of neurons (also known as nodes, units or perceptrons) and connections between the nodes.
- the strength of each connection is expressed by a numerical value called a weight, which can be modified.
- the activation of a given node is based on the activations of the nodes that have connections directed at that node and the weights on those connections.
- a feed-forward neural network can be implemented as a function y(f,w) of a vector f of inputs and a weight or parameter vector w.
- the weight vector is modified such that the neural network optimally estimates some quantity that depends on f.
- the process of adjusting w is commonly referred to as training, where the methods for training are referred to as training algorithms.
- Most neural network trainings involve the use of an error function.
- the weight vector is adjusted so as to minimize the sum or average of the error function on a set of training samples.
- a penalty term is generally applied to the error to restrict the weight vector in some manner that is thought desirable.
- various training methods are used to minimize it, most involving some form of gradient descent.
- FIG. 2 illustrates the structure of the neural network 112 in a preferred embodiment.
- Each neural network 112 includes at least an input layer 210, an output layer 230 and optional intermediate layers (also known as hidden layers) 220.
- An input signal 205, which is a fidelity metric, is applied as input to the neural network.
- the fidelity metric is a JND map.
- the structure of the neural network 112 as shown in FIG. 2 can be modified as the neural network undergoes training. It is certainly possible to train a neural network to extract relevant features from an input signal, but all the values within the JND map carry important contrast information pertaining to an estimate of the discriminability between the sequences. As such, extraction is not necessary and the entire JND map serves as input to the neural network 112. Although it is possible to feed the entire JND map as input to the neural network all at once, it is often more efficient for the neural network to take a rectangular block of JND values at a time, e.g., an 8 x 8 block.
- the input block size is not limited to 8 x 8 and other block sizes can also be employed.
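As a minimal sketch of this input arrangement (assuming Python with NumPy; the map size is illustrative), a JND map can be carved into 8 x 8 blocks, each flattened into a 64-element input vector for the network:

```python
import numpy as np

def jnd_blocks(jnd_map, block=8):
    """Carve a JND map into non-overlapping block x block tiles and
    flatten each tile into one input vector for the neural network."""
    h, w = jnd_map.shape
    for r in range(0, h - h % block, block):
        for c in range(0, w - w % block, block):
            yield jnd_map[r:r + block, c:c + block].ravel()

jnd_map = np.random.rand(48, 48)               # stand-in for one VDM output frame
inputs = np.stack(list(jnd_blocks(jnd_map)))   # shape (36, 64): 36 blocks of 64 JND values
```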
- the structure of the neural network is further refined during training.
- the output signal 240 is compared with a desired output 250 via subtractor 245.
- the training method 270 utilizes the error signal 260 to train the neural network.
- Training for the neural network in this architecture can be accomplished with standard back-propagation techniques, in which the error signal is based on the sum of the squared differences between predicted ratings and ratings observed from human subjects in standard subjective rating tasks, across a range of different input image sequences at different quality levels.
- error back-propagation is a supervised learning process through which a neural network learns optimal weights. Error back-propagation compares the responses of the output nodes to a desired response, and adjusts the weights in the network so that if the same input is presented to the network again, the network's response will be closer to the desired response.
- each output node n_j 232 produces a value which is the total sum net_j of the output values O_i of the nodes n_i coupled to the output node n_j by coupling weights w_ji, transformed by a predetermined function f, where f is a sigmoidal function, e.g., typically f(x) = 1/(1 + e^-x).
- This same concept applies to the intermediate node n_i 222 with respect to input node n_h 212.
- an output value O_pi and O_pj for each node n_i 222 and n_j 232, respectively, can be expressed by the following formulas:

  O_pi = f(net_pi) = f(Σ_h w_ih · O_ph)   (1)

  O_pj = f(net_pj) = f(Σ_i w_ji · O_pi)   (2)
- the network acquires the output value O_pj of the output node n_j 232 of the output layer 230 by sequentially computing the output values of the nodes, layer by layer, from the input layer 212 towards the output layer 230.
- the process of learning in accordance with error back-propagation consists of updating the coupling weights w_ji and w_ih, so that the total sum E_p of the square errors between the output value O_pj of each node of the output layer 230 on applying the input pattern p and the desired output t_pj is minimized.
- the total network error E_p for the input pattern p is defined by:

  E_p = (1/2) Σ_j (t_pj - O_pj)²
- the network error with respect to any weight, e.g., weight w_ji, is given by the gradient ∂E_p/∂w_ji of the total network error E_p with respect to a change in that weight.
- the error δ_pj for each output node n_j 232 can be determined as a function of the corresponding actual value O_pj and target value t_pj and the difference therebetween for that node, and the error δ_pi for each intermediate node n_i 222 follows by propagating these errors backwards, as follows:

  δ_pj = (t_pj - O_pj) · O_pj · (1 - O_pj)   (3)

  δ_pi = O_pi · (1 - O_pi) · Σ_j δ_pj · w_ji   (4)
- equations (3) and (4) hold as long as the sigmoidal function f discussed above is used. However, those skilled in the art will realize that if a different function f is employed, then equations (3) and (4) will change accordingly.
- the coupling weights of both the output layer 230 and the intermediate layer 220 are adjusted according to the following learning rule, shown here for the weights feeding output node n_j 232 (the rule for w_ih is analogous):

  Δw_ji(t+1) = η · δ_pj · O_pi + α · Δw_ji(t)   (5)
- η represents the rate of learning, which is a constant, and it determines how fast the network weights converge during network training.
- Coefficient α represents a stabilization factor for reducing the error oscillations and accelerating the convergence thereof. Both coefficients η and α can be empirically determined from the number of nodes, layers, input values or output values. This weight adjustment process is repeated until the patterns in the training set are exhausted or until the final error value falls below a predefined upper bound E_max.
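A minimal batch-mode sketch of this training procedure, assuming Python with NumPy, a single intermediate layer, and illustrative sizes: it implements the sigmoidal forward pass of equations (1)-(2), the error terms of equations (3)-(4), and the momentum update of equation (5), stopping when the error falls below E_max.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, T, hidden=16, eta=0.1, alpha=0.9, e_max=1e-3, epochs=10000):
    """Two-layer error back-propagation with momentum, per eqs. (1)-(5)."""
    rng = np.random.default_rng(0)
    w_ih = rng.normal(0.0, 0.1, (X.shape[1], hidden))  # input -> intermediate weights
    w_ji = rng.normal(0.0, 0.1, (hidden, T.shape[1]))  # intermediate -> output weights
    d_ih, d_ji = np.zeros_like(w_ih), np.zeros_like(w_ji)
    for _ in range(epochs):
        O_i = sigmoid(X @ w_ih)                # eq. (1): intermediate outputs O_pi
        O_j = sigmoid(O_i @ w_ji)              # eq. (2): output values O_pj
        E = 0.5 * np.sum((T - O_j) ** 2)       # total squared error E_p
        if E < e_max:                          # stop below the bound E_max
            break
        delta_j = (T - O_j) * O_j * (1 - O_j)            # eq. (3): output-node error
        delta_i = O_i * (1 - O_i) * (delta_j @ w_ji.T)   # eq. (4): back-propagated error
        d_ji = eta * (O_i.T @ delta_j) + alpha * d_ji    # eq. (5): momentum update
        d_ih = eta * (X.T @ delta_i) + alpha * d_ih
        w_ji += d_ji
        w_ih += d_ih
    return w_ih, w_ji
```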
- FIG. 3 illustrates a block diagram of an encoding system 300 which incorporates an image encoder 310, a controller 320 and a quality metric generator 330.
- Encoding system 300 employs video compression methods that generally include a mechanism for controlling bit allocation to produce, for the specified bit rate, the best possible quality decoded image sequence.
- During encoding of the image sequence (original video) 305, the controller considers the available information via path 314, and determines the bit allocation necessary to represent a particular part of the image.
- the available information may include motion vectors, transform coefficient statistics related to the input video, operational modes selected by the encoder, etc. Furthermore, such information may also include how close the actual encoded bitrate is to the desired bitrate, how full various encoder buffers are, and spatio-temporal characteristics of the original and reconstructed video sequences.
- the controller then adjusts the encoder parameters via path 312 in order to achieve that target bitrate/quality point.
- a quality metric generator 330 is incorporated in the encoding process.
- Such a system is known as a quality-metric-based encoding (QME) system, and is the subject of a patent application entitled "Apparatus And Method For Optimizing Encoding And Performing Automated Steerable Image Compression In An Image Coding System Using A Perceptual Metric" filed on March 28, 1997 with docket number "DSRC 12008" (which is hereby incorporated by reference).
- a fidelity metric on path 325 is applied to compare the reconstructed video on path 335 to the original video on path 305 and, based on the quality measured, the controller adjusts the encoder parameters to achieve the desired quality/bitrate goal to produce an encoded bitstream on path 340.
- the QME system can be used in an iterative fashion until a desired bitrate/quality goal is reached.
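The iteration can be sketched as a simple closed loop. The encoder and VDM interfaces below (initial_params, encode, decode, compare, tighten_quality) are hypothetical placeholders, not the API of any particular codec, and the pooled-JND threshold is an illustrative assumption:

```python
def qme_encode(original, encoder, vdm, target_jnd=1.0, max_iters=8):
    """Iterate encode -> reconstruct -> measure fidelity -> adjust
    parameters until the quality goal is met or the budget runs out."""
    params = encoder.initial_params()
    bitstream = encoder.encode(original, params)
    for _ in range(max_iters):
        reconstructed = encoder.decode(bitstream)
        jnd = vdm.compare(original, reconstructed)  # fidelity metric (path 325)
        if jnd <= target_jnd:                       # desired quality reached
            break
        params = encoder.tighten_quality(params)    # e.g., finer quantization
        bitstream = encoder.encode(original, params)
    return bitstream
```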
- the meaning of “best” in the context of a QME system can be different for different applications.
- the meaning of “best” is reflected by the fidelity metric used in the encoding process.
- the end “consumer” of the decompressed video is a human viewer. Typically, the viewer does not have any particular task at hand, but only wants the decompressed video to "look" good. In such cases, the compression method should deliver, for a particular bitrate, the decompressed video with the least perceptible distortion.
- the fidelity metric used in the QME system is a visual fidelity metric, and such encoder is referred to as a vision optimized encoder (VOE).
- FIG. 4 illustrates a block diagram of a QME system 400 which incorporates a neural network (NN) 112 to substitute for the controller of the QME system.
- A QME system that includes a neural network is referred to as an NN-QME system.
- FIGs. 4-6 collectively illustrate three different NN-QME configurations which incorporate one or more neural networks (NN) to substitute for either or both the controller and the fidelity metric generator of the QME system.
- a VDM 111 forwards a fidelity metric on path 425, which is applied to compare the reconstructed video on path 435 to the original video on path 405.
- the neural network 112 adjusts the encoder parameters to the encoder 115 via path 412 to achieve the desired quality/bitrate goal to produce an encoded bitstream on path 440.
- the QME system can be used in an iterative fashion until a desired bitrate/quality goal is reached.
- a neural network is a computational network with many parameters which can be adapted to approximate any (possibly non-linear) function. Neural networks are appropriately applied in situations where it is easier to collect data samples than to design a solution to address a problem.
- the encoding system may instead collect a database of video sequences and the fidelity metric ratings given to these sequences by a neural network.
- the neural network is employed to process the fidelity metric data and, in turn, produces simpler fidelity metric ratings in real time for the encoder.
- neural networks are a natural fit to the efficient video fidelity metric problem.
- the mapping from a video quality measure to the optimal parameter settings for the encoder is unknown and likely highly non-linear.
- FIG. 5 illustrates a block diagram of a QME system 500 which incorporates a neural network (NN) 112 to substitute for the VDM of the QME system.
- the NN 112 forwards the fidelity metrics, or simplified fidelity metric ratings, to the controller 510.
- An example of a simplified fidelity metric rating is a scale from 0 to 4, representing no noticeable difference (a value of "0") to very noticeable difference (a value of "4").
- the controller, using the simplified fidelity metric ratings, is able to quickly adjust the encoder parameters sent to the encoder 115 to effect a specific coding result.
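One plausible mapping from such a rating to an encoder parameter is sketched below; the quantizer-scale range, step size, and thresholds are illustrative assumptions rather than values taken from the disclosure:

```python
def adjust_quantizer(qscale, rating, step=2, q_min=1, q_max=31):
    """Map a simplified 0-4 fidelity rating to a quantizer-scale change:
    visible distortion (rating >= 3) buys quality with a finer quantizer;
    a transparent result (rating 0) recovers bits with a coarser one."""
    if rating >= 3:
        return max(q_min, qscale - step)
    if rating == 0:
        return min(q_max, qscale + step)
    return qscale

qscale = 16
for rating in [4, 3, 0, 2]:                  # ratings arriving from the NN
    qscale = adjust_quantizer(qscale, rating)
print(qscale)                                # 16 -> 14 -> 12 -> 14 -> 14
```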
- FIG. 6 illustrates a block diagram of a QME system 600 which incorporates two neural networks (NN) 112 to substitute for the VDM and controller of the QME system.
- the NN 112 (net 1) forwards the fidelity metrics, or simplified fidelity metric ratings, to the NN 112 (net 2).
- the neural network (net 2), using the simplified fidelity metric ratings, is able to quickly adjust the encoder parameters sent to the encoder 115 to effect a specific coding result.
- Figure 7 shows the configuration of an NN-QME system 700 used for training the NN 112 which will replace the controller 710.
- Figure 8 shows the configuration of an NN-QME system 800 used for training the NN 112 which will replace the quality metric generator (VDM) 111.
- For each of the video sequences in the database, the NN computes a metric (labeled "NN output" in FIGs. 7 and 8), given the "current state" of its parameters. An error is generated by subtracting the NN output from the target metric calculated by the known metric (labeled "desired output" in FIGs. 7 and 8), via subtractors 720 and 820. Next, the NN parameters are adapted such that the error would be reduced if the video sequence (database) were presented again to the NN. This iterative process (the training phase), outlined below, continues until it is no longer possible to reduce the NN output error by adaptation of its parameters. If the error is acceptably small, the NN can now serve as a computational device to measure video quality. Once the NN has been completely trained, the subtractors in FIGs. 7 and 8 are removed, and the NN is substituted for the appropriate component in the QME system, forming one of the NN-QME systems shown in FIGs. 4-6.
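In outline, the training phase amounts to the following loop; the net and vdm objects and their forward/backward/compare methods are hypothetical stand-ins for the NN 112 and the known metric:

```python
def train_to_metric(net, vdm, database, lr=0.01, tol=1e-4, max_epochs=1000):
    """Adapt the NN until its output tracks the known metric's output."""
    for _ in range(max_epochs):
        total_error = 0.0
        for original, reconstructed in database:
            target = vdm.compare(original, reconstructed)  # desired output
            output = net.forward(original, reconstructed)  # NN output
            error = target - output                        # subtractor 720/820
            net.backward(error, lr)                        # adapt parameters
            total_error += error ** 2
        if total_error < tol:          # error acceptably small: training done
            break
    return net
```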
- video quality and bit allocation have both global and local aspects. Distortions in the video can occur at any scale, and so all scales need to be examined.
- For example, the number of bits allowed in a frame is a global quantity, which the controller then needs to distribute in some way across the frame.
- the distribution of encoded bits spatio-temporally across the video sequence is usually non-uniform.
- a coarse-to-fine set of neural networks can be used.
- FIG. 9 illustrates a NN encoding-control system.
- the NN encoding- control system might start with a NN 112a allocating bits between a few very large regions 916 of the image at low resolution 914. Given these allocations, another network would be applied independently to each of these large regions. This network takes the bit allocation for a region and further divides that among sub-regions of the current region.
- a hierarchy of such networks 112a-112c can be constructed, starting at very low resolution 914 or at some intermediate resolution 912, and working up to a network that allocates bits at the finest spatio-temporal resolution 910 possible for the compression method. The last network in the hierarchy would be applied at the finest resolution 910.
- the coarsest resolution NN 112a might decide on a bit allocation for a frame
- the next layer NN 112b might decide on the bit allocation per slice
- the finest resolution layer NN 112c might decide on the bit allocation for each macroblock in a slice.
- Inputs to these networks at any scale could include the quality metric values for the current sub-regions of the image from a VDM 111, the number of bits, a measure of complexity of the current sub-regions taken from the transform coefficients, etc.
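The coarse-to-fine division of a bit budget might look as follows; the fixed weight lists stand in for the outputs of the trained networks 112a-112c and are purely illustrative:

```python
def allocate_bits(frame_bits, slice_weights, mb_weights_per_slice):
    """Coarse-to-fine allocation: frame -> slices -> macroblocks, each
    level dividing its budget in proportion to per-region weights."""
    def split(budget, weights):
        total = sum(weights)
        return [budget * w / total for w in weights]

    slice_bits = split(frame_bits, slice_weights)   # 112b level
    return [split(bits, mb_weights)                 # 112c level
            for bits, mb_weights in zip(slice_bits, mb_weights_per_slice)]

# Example: 100000 bits over two slices of three macroblocks each.
print(allocate_bits(100000, [0.6, 0.4], [[1, 2, 1], [3, 1, 1]]))
```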
- a fine-to-coarse system is useful for decisions about coarse-level, global encoder parameters which are a function of localized, fine-resolution information. In such cases, fine-resolution information needs to be integrated to make a global decision.
- An example of this in the context of MPEG2 is the selection of a quantization matrix for a particular frame in a video sequence.
- a two-pass NN implementation can be used.
- Fine-scale information can be passed up from local regions to successively larger regions in order to make globally oriented determinations, and a coarse-to-fine pass can then be performed to make locally oriented decisions and measurements.
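A toy version of such a two-pass scheme, assuming Python with NumPy; the mean-pooling rules below are illustrative stand-ins for trained networks:

```python
import numpy as np

def two_pass(jnd_map, block=8):
    """Fine-to-coarse: pool local JND statistics into one global figure
    (e.g., to pick a frame-level quantization matrix). Coarse-to-fine:
    scale each local decision by the global result."""
    h, w = jnd_map.shape
    blocks = jnd_map[:h - h % block, :w - w % block].reshape(
        h // block, block, w // block, block).mean(axis=(1, 3))
    global_stat = blocks.mean()                   # fine-to-coarse integration
    local_scale = blocks / (global_stat + 1e-9)   # coarse-to-fine modulation
    return global_stat, local_scale
```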
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer And Data Communications (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US1426696P | 1996-03-29 | 1996-03-29 | |
US60/014,266 | 1996-03-29 | ||
US1468796P | 1996-04-02 | 1996-04-02 | |
US60/014,687 | 1996-04-02 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO1997039417A2 true WO1997039417A2 (en) | 1997-10-23 |
WO1997039417A3 WO1997039417A3 (en) | 1998-05-14 |
Family
ID=26685872
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1997/005224 WO1997039417A2 (en) | 1996-03-29 | 1997-03-31 | Method and apparatus for training a neural network to use a fidelity metric |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO1997039417A2 (en) |
-
1997
- 1997-03-31 WO PCT/US1997/005224 patent/WO1997039417A2/en active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5107347A (en) * | 1989-07-24 | 1992-04-21 | Sony Corporation | Image data inputting system for image processing apparatus |
US5495542A (en) * | 1990-03-23 | 1996-02-27 | Canon Kabushiki Kaisha | Binary to multi-level image restoration using neural network |
US5309525A (en) * | 1990-06-14 | 1994-05-03 | Canon Kabushiki Kaisha | Image processing apparatus using neural network |
US5491776A (en) * | 1991-08-05 | 1996-02-13 | Kawasaki Steel Corporation | Signal processing apparatus and learning method therefor |
US5394483A (en) * | 1992-06-30 | 1995-02-28 | Eastman Kodak Co | Method and apparatus for determining visually perceptible differences between images |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1999043161A1 (en) * | 1998-02-21 | 1999-08-26 | Tandberg Television Limited | Determining visually noticeable differences between two images |
WO2002080563A2 (en) * | 2001-03-29 | 2002-10-10 | Koninklijke Philips Electronics N.V. | Scalable expandable system and method for optimizing a random system of algorithms for image quality |
WO2002080563A3 (en) * | 2001-03-29 | 2004-03-11 | Koninkl Philips Electronics Nv | Scalable expandable system and method for optimizing a random system of algorithms for image quality |
CN101695141B (en) * | 2009-10-20 | 2012-05-23 | 浙江大学 | A video quality assessment method and device |
US10623756B2 (en) | 2015-02-19 | 2020-04-14 | Magic Pony Technology Limited | Interpolating visual data |
US10904541B2 (en) | 2015-02-19 | 2021-01-26 | Magic Pony Technology Limited | Offline training of hierarchical algorithms |
US10499069B2 (en) | 2015-02-19 | 2019-12-03 | Magic Pony Technology Limited | Enhancing visual data using and augmenting model libraries |
US10516890B2 (en) | 2015-02-19 | 2019-12-24 | Magic Pony Technology Limited | Accelerating machine optimisation processes |
US10523955B2 (en) | 2015-02-19 | 2019-12-31 | Magic Pony Technology Limited | Enhancement of visual data |
US10547858B2 (en) | 2015-02-19 | 2020-01-28 | Magic Pony Technology Limited | Visual processing using temporal and spatial interpolation |
US10582205B2 (en) | 2015-02-19 | 2020-03-03 | Magic Pony Technology Limited | Enhancing visual data using strided convolutions |
US11528492B2 (en) | 2015-02-19 | 2022-12-13 | Twitter, Inc. | Machine learning for visual processing |
US10887613B2 (en) | 2015-02-19 | 2021-01-05 | Magic Pony Technology Limited | Visual processing using sub-pixel convolutions |
US10630996B2 (en) | 2015-02-19 | 2020-04-21 | Magic Pony Technology Limited | Visual processing using temporal and spatial interpolation |
US10666962B2 (en) | 2015-03-31 | 2020-05-26 | Magic Pony Technology Limited | Training end-to-end video processes |
US10681361B2 (en) | 2016-02-23 | 2020-06-09 | Magic Pony Technology Limited | Training end-to-end video processes |
EP3298776A1 (en) * | 2016-02-23 | 2018-03-28 | Magic Pony Technology Limited | Training end-to-end video processes |
US11234006B2 (en) | 2016-02-23 | 2022-01-25 | Magic Pony Technology Limited | Training end-to-end video processes |
US10692185B2 (en) | 2016-03-18 | 2020-06-23 | Magic Pony Technology Limited | Generative methods of super resolution |
US10827185B2 (en) | 2016-04-07 | 2020-11-03 | Netflix, Inc. | Techniques for robustly predicting perceptual video quality |
WO2017176656A1 (en) * | 2016-04-07 | 2017-10-12 | Netflix, Inc. | Techniques for robustly predicting perceptual video quality |
US10685264B2 (en) | 2016-04-12 | 2020-06-16 | Magic Pony Technology Limited | Visual data processing using energy networks |
US10602163B2 (en) | 2016-05-06 | 2020-03-24 | Magic Pony Technology Limited | Encoder pre-analyser |
US20210174028A1 (en) * | 2018-02-22 | 2021-06-10 | Salesforce.Com, Inc. | Dialogue state tracking using a global-local encoder |
US11836451B2 (en) * | 2018-02-22 | 2023-12-05 | Salesforce.Com, Inc. | Dialogue state tracking using a global-local encoder |
Also Published As
Publication number | Publication date |
---|---|
WO1997039417A3 (en) | 1998-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6075884A (en) | Method and apparatus for training a neural network to learn and use fidelity metric as a control mechanism | |
US20220239925A1 (en) | Method and apparatus for applying deep learning techniques in video coding, restoration and video quality analysis (vqa) | |
Hill et al. | Perceptual image fusion using wavelets | |
US6670963B2 (en) | Visual attention model | |
Hontsch et al. | Adaptive image coding with perceptual distortion control | |
JP5138086B2 (en) | Video decoding device | |
US7668397B2 (en) | Apparatus and method for objective assessment of DCT-coded video quality with or without an original video sequence | |
US7023914B2 (en) | Video encoding apparatus and method | |
US6834080B1 (en) | Video encoding method and video encoding apparatus | |
US6983079B2 (en) | Reducing blocking and ringing artifacts in low-bit-rate coding | |
US7346208B2 (en) | Image artifact reduction using a neural network | |
TW221085B (en) | ||
Liu et al. | Perceptual reduced-reference visual quality assessment for contrast alteration | |
EP1213684A2 (en) | Image data compression method and system | |
US20060056731A1 (en) | Determining sharpness predictors for a digital image | |
WO1997039417A2 (en) | Method and apparatus for training a neural network to use a fidelity metric | |
US6571142B1 (en) | Data processing apparatus, data processing method, and medium | |
US20050123038A1 (en) | Moving image encoding apparatus and moving image encoding method, program, and storage medium | |
EP1113395A2 (en) | Method and apparatus for hierarchical image coding | |
US6477278B1 (en) | Image coding apparatus, image coding method, image decoding method, image decoding apparatus, image data transmitting method and recording medium | |
JP3844445B2 (en) | Video content distribution method, apparatus, and video content distribution program | |
Huang et al. | JQF: Optimal JPEG quantization table fusion by simulated annealing on texture images and predicting textures | |
Sun et al. | Reduced-reference image quality assessment through SIFT intensity ratio | |
EP1501295A1 (en) | Data converting apparatus, data converting method, learning apparatus, leaning method, and recording medium | |
CN115190298B (en) | Image encoding method, encoding device and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): CA JP KR MX |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
AK | Designated states |
Kind code of ref document: A3 Designated state(s): CA JP KR MX |
|
AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
NENP | Non-entry into the national phase |
Ref country code: JP Ref document number: 97537111 Format of ref document f/p: F |
|
122 | Ep: pct application non-entry in european phase |