
GB2401739A - Data compression - Google Patents

Data compression

Info

Publication number
GB2401739A
Authority
GB
United Kingdom
Prior art keywords
data
quantisation
target
compression
stage
Prior art date
Legal status
Withdrawn
Application number
GB0308954A
Other versions
GB0308954D0 (en)
Inventor
Nicholas Ian Saunders
Robert Mark Stefan Porter
Current Assignee
Sony Europe BV United Kingdom Branch
Original Assignee
Sony United Kingdom Ltd
Priority date
Filing date
Publication date
Application filed by Sony United Kingdom Ltd
Priority to GB0308954A
Publication of GB0308954D0
Publication of GB2401739A


Classifications

All under H04N19/00, within H04N (pictorial communication, e.g. television), H04 (electric communication technique), H (electricity): methods or arrangements for coding, decoding, compressing or decompressing digital video signals.

    • H04N19/12: selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • H04N19/126: details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
    • H04N19/14: coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N19/149: data rate or code amount at the encoder output, estimated by means of a model, e.g. mathematical or statistical model
    • H04N19/176: adaptive coding characterised by the coding unit, the unit being an image region, e.g. a block or macroblock
    • H04N19/196: adaptive coding specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
    • H04N19/197: including determination of the initial value of an encoding parameter
    • H04N19/60: transform coding
    • H04N19/65: error resilience
    • H04N19/86: pre-/post-processing involving reduction of coding artefacts, e.g. of blockiness
    • H04N19/88: pre-/post-processing involving rearrangement of data among different coding units, e.g. shuffling, interleaving, scrambling or permutation of pixel data or of transform coefficient data
    • H04N19/89: pre-/post-processing involving detection of transmission errors at the decoder
    • H04N19/91: entropy coding, e.g. variable length coding [VLC] or arithmetic coding


Abstract

A data compression apparatus in which target data quantities are allocated to respective subsets of input data, the target data quantities together providing a desired output data quantity, comprises: a compression arrangement for compressing each data subset in accordance with a respective compression parameter and a respective target data quantity, the compression arrangement generating an actual data quantity indicating a quantity of compressed data in respect of that subset and compression parameter; and target control logic for modifying the target data quantities of data subsets yet to be compressed by the compression arrangement, in response to differences between the target data quantities of already-compressed data subsets and the actual data quantities generated by the compression arrangement in respect of those already-compressed data subsets.

Description

DATA COMPRESSION
The present invention relates to data compression.
Data compression techniques are used extensively in the data communications field in order to communicate data at bit rates that can be supported by communication channels having dynamically changing but limited bandwidths. Image data is typically compressed prior to either transmission or storage on an appropriate storage medium and it is decompressed prior to image reproduction.
In the case of still images data compression techniques take advantage of spatial redundancy, whilst for moving images both spatial and temporal redundancy is exploited.
Temporal redundancy arises in moving images where successive images in a temporal sequence, particularly images belonging to the same scene, can be very similar. The Moving Picture Experts Group (MPEG) has defined international standards for video compression encoding for entertainment and broadcast applications. The present invention is relevant to (though by no means restricted to) implementations of the MPEG4 "Studio Profile" standard, which is directed to high end video hardware operating at very high data rates (up to 1 Gbit/s) using low compression ratios.
Discrete Cosine Transform (DCT) quantisation is a widely used encoding technique for video data. It is used in image compression to reduce the length of the data words required to represent input image data prior to transmission or storage of that data. In the DCT quantisation process the image is segmented into regularly sized blocks of pixel values; typically each block comprises 8 horizontal pixels by 8 vertical pixels (8H x 8V). In conventional data formats video data typically has three components that correspond either to the red, green and blue (RGB) components of a colour image or to a luminance component Y along with two colour difference components Cb and Cr. A group of pixel blocks corresponding to all three RGB or YCbCr signal components is known as a macroblock (MB).
The DCT represents a transformation of an image from the spatial domain to the spatial frequency domain and effectively converts a block of pixel values into a block of transform coefficients of the same dimensions. The DCT coefficients represent spatial frequency components of the image block. Each coefficient can be thought of as a weight to be applied to an appropriate basis function, and a weighted sum of basis functions provides a complete representation of the input image. Each 8H x 8V block of DCT coefficients has a single "DC" coefficient representing zero spatial frequency and 63 "AC" coefficients. The DCT coefficients of largest magnitude are typically those corresponding to the low spatial frequencies. Performing a DCT on an image does not necessarily result in compression but simply transforms the image data from the spatial domain to the spatial frequency domain.
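The transform described above can be sketched directly from the definition of the 2-D DCT-II. This is a naive O(N^4) illustration of the mathematics, not the fast transform a real encoder would use; the orthonormal scaling factors are one common convention.

```python
import math

def dct_2d(block):
    """Naive 2-D DCT-II of an N x N block of pixel values.

    Returns a block of transform coefficients of the same dimensions;
    coefficient [0][0] is the DC term, the remainder are AC terms.
    """
    n = len(block)
    def c(k):  # orthonormal scaling factor
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = c(u) * c(v) * s
    return coeffs

# A flat block has all of its energy in the DC coefficient,
# consistent with DC representing zero spatial frequency.
flat = [[100] * 8 for _ in range(8)]
out = dct_2d(flat)
```

Running the sketch on a uniform block confirms that every AC coefficient vanishes and only the DC term is non-zero.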
In order to achieve compression each DCT coefficient is divided by a positive integer known as the quantisation divisor and the quotient is rounded up or down to the nearest integer.
Larger quantisation divisors result in higher compression of data at the expense of harsher quantisation. Harsher quantisation results in greater degradation in the quality of the reproduced image. Quantisation artefacts arise in the reproduced images as a consequence of the rounding up or down of the DCT coefficients. During compressed image reproduction each DCT coefficient is reconstructed by multiplying the quantised coefficient (rounded to the nearest integer), rather than the original quotient, by the quantisation step which means that the original precision of the DCT coefficient is not restored. Thus quantisation is a "lossy" encoding technique.
Image data compression systems typically use a series of trial compressions to determine the most appropriate quantisation divisor to achieve a predetermined output bit rate. This process is often carried out on an image-block by image-block basis, so that the trial compressions can reflect the different image properties of the blocks. Trial quantisations are carried out at, say, twenty possible quantisation divisors spread across the full available range of possible quantisation divisors. The two adjacent trial quantisation divisors that give projected output bit rates just above and just below the target bit rate are identified and a refined search is carried out between these two values. Typically the quantisation divisor selected for performing the image compression will be the one that gives the least harsh quantisation yet allows the target bit rate to be achieved.
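The coarse-then-refined divisor search can be sketched as follows. The bit-count model `bits_for` and the step sizes are hypothetical stand-ins for the trial compressions described above; a real system would measure projected bit counts from actual trial quantisations.

```python
def choose_divisor(bits_for, target_bits, coarse_step=8, q_max=160):
    """Coarse scan over trial divisors, then a refined search between
    the two adjacent trials that straddle the target bit count.

    bits_for(q) is an assumed model returning the projected output bit
    count for divisor q, decreasing as q increases.
    """
    trials = list(range(1, q_max + 1, coarse_step))
    lo = trials[0]
    for q in trials:
        if bits_for(q) <= target_bits:
            hi = q          # first trial that meets the target
            break
        lo = q              # last trial that overshoots it
    else:
        return trials[-1]   # even the harshest trial overshoots
    # Refined search: the least harsh divisor that still meets the target.
    for q in range(lo, hi + 1):
        if bits_for(q) <= target_bits:
            return q
    return hi
```

For example, with the toy model `bits_for = lambda q: 100000 // q` and a 4000-bit target, the coarse scan brackets the answer between 17 and 25 and the refined pass settles on 25, the gentlest divisor meeting the target.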
This invention provides a data compression apparatus in which target data quantities are allocated to respective subsets of input data, the target data quantities together providing a desired output data quantity; the apparatus comprising: a compression arrangement for compressing each data subset in accordance with a respective compression parameter and a respective target data quantity, the compression arrangement generating an actual data quantity indicating a quantity of compressed data in respect of that subset and compression parameter; and target control logic for modifying the target data quantities of data subsets yet to be compressed by the compression arrangement, in response to differences between the target data quantities of already-compressed data subsets and the actual data quantities generated by the compression arrangement in respect of those already-compressed data subsets.
The invention recognises that in a system where each data subset (e.g. image block) is quantised in accordance with a target data quantity relating to that subset, this can lead to some inefficient use of the available data capacity. In particular, some or all subsets could use fewer than their available data quantity, leading to a waste of that data capacity. The invention allows such differences to be passed on to subsequent data subsets, so potentially improving the efficiency of use of the available data capacity.
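The carry-forward mechanism just described can be sketched in a few lines. The `compress` callable is a placeholder for any per-subset compression stage that reports an actual bit count; here it is modelled as simply using whichever is smaller of the bits a subset needs and the bits it is granted.

```python
def compress_with_carry(subsets, nominal_targets, compress):
    """Per-subset compression in which the surplus (or deficit) between
    each subset's target and its actual compressed size is carried
    forward into the targets of subsets not yet compressed.

    compress(subset, target) is assumed to return the actual number of
    bits produced for that subset under that target.
    """
    carry = 0
    actuals = []
    for subset, target in zip(subsets, nominal_targets):
        adjusted = target + carry      # pass on earlier spare capacity
        actual = compress(subset, adjusted)
        carry = adjusted - actual      # unused bits go to later subsets
        actuals.append(actual)
    return actuals

# Subsets needing 800, 1200 and 1000 bits against 1000-bit targets:
# the 200 bits unused by the first subset let the second fit fully.
result = compress_with_carry([800, 1200, 1000], [1000] * 3,
                             lambda needed, target: min(needed, target))
```

Without the carry, the middle subset would be truncated to 1000 bits even though the overall budget of 3000 bits could have accommodated it.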
It will be appreciated that the compression arrangement could be a trial compression arrangement (such as a binary search arrangement), in which case the compression arrangement need not necessarily output actual compressed data - it could output just the data quantities which would be generated at a particular compression parameter. Or the compression arrangement could be a final compression arrangement.
Further respective aspects and features of the invention are defined in the appended claims.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
Figure 1 schematically illustrates a problem with known backsearch techniques;
Figure 2 is a schematic diagram of a compression encoder and a corresponding decoder for use with a data recording/reproducing device or a data transmission/reception system;
Figure 3 schematically illustrates the bit rate reducing encoder of Figure 2;
Figure 4 is a table of parameters used in the bit rate reduction process of the encoder of Figure 2;
Figure 5 schematically illustrates the decoder of Figure 2;
Figure 6 schematically illustrates a calculation performed by a DCT_PRECISION detection circuit of the shuffle unit of Figure 2;
Figure 7 schematically illustrates the bit allocation module of the encoder of Figure 2;
Figure 8 is a flow chart showing calculations performed by the bit allocation module of Figure 7;
Figure 9 schematically illustrates an advantage of the adjusted backsearch according to embodiments of the invention;
Figure 10A is an illustrative graph of MBU_DCT_BITS against Q_SCALE;
Figure 10B is a graph of MBU_DCT_BITS against Q_SCALE in a case where quantisation is too harsh for all 6 trial quantisation values;
Figure 10C is a graph of MBU_DCT_BITS against Q_SCALE in a case where quantisation is not harsh enough for any of 6 trial quantisation values;
Figure 11 schematically illustrates the internal structure of a binary search module of the encoder of Figure 3;
Figure 12 is an illustrative example of a 5 stage binary search performed by the apparatus of Figure 11;
Figure 13 schematically illustrates the Macro-Block delay at each stage of the binary search of Figure 12;
Figure 14 schematically illustrates the backsearch module of the encoder of Figure 3;
Figure 15 schematically illustrates a second embodiment of a binary search module;
Figures 16 and 17 schematically illustrate the reallocation of "spare bits" in a pipelined binary search system;
Figure 18 schematically illustrates a truncated binary search process;
Figure 19 schematically illustrates a search process using available bit lengths; and
Figures 20 and 21 schematically illustrate techniques for using "spare bits".
Figure 2 is a schematic diagram of a data compression system. This system comprises an encoder 10, a data processing module 20 and a decoder 30. An input high definition video signal 5 is received by the encoder 10. The encoder 10 models the video image data to remove redundancy and to exploit its statistical properties, and it produces output data symbols that represent the information in the input image data 5 in a compressed format. The encoder 10 outputs a compressed data signal 15A that is supplied to the data processing module 20 where it is either transmitted across a communication channel or stored on a recording medium. A compressed data signal 15B that was either read from the recording medium or received across a communication network is supplied to a decoder 30 that decodes the compressed data signal 15B to form a high definition output image signal 35.
Figure 3 schematically illustrates the bit rate reducing encoder of Figure 2. Data signals D1, D2 and D3 correspond to RGB input channels for high definition video frames which are supplied as inputs to a shuffle unit 100. It will be appreciated that in an alternative embodiment the data could be supplied in YCbCr format. Furthermore the images can be processed either in a progressive frame mode or in an interlaced field mode. The shuffle unit serves to distribute the input data into Macro-Block Units (MBUs). In this embodiment there are 40 MBUs per video frame, each of which comprises 204 MBs. Image samples of each input frame are temporarily written to an external SDRAM 200. During this shuffle write process the values for two quantisation divisor parameters Q_START and DCT_PRECISION which are required for the subsequent encoding process are calculated.
Blocks of pixels are read from the external SDRAM 200 according to a predetermined shuffle ordering that serves to interleave the image data so that blocks of pixels which are adjacent in the input image frame are not read out at adjacent positions in the shuffle ordering.
The shuffle process alleviates the effect of data losses on the image reconstructed by the decoder apparatus. Pixel blocks that are adjacent to each other in the input video frame are separated in the shuffled bit stream. A short duration data loss in which a contiguous portion of the bit stream is corrupted may affect a number of data blocks but these blocks will not be contiguous blocks in the reconstructed image. In this case data concealment can feasibly be used to reconstruct the missing blocks. Furthermore the shuffle process improves the picture quality during shuttle playback since it distributes input video data pseudo randomly in the MBUs which serves to reduce the variation in the quantisation parameters selected for each MBU in an image frame.
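One simple way to realise a read-out order in which spatially adjacent blocks land far apart is a stride permutation. This is an illustrative stand-in only: the patent specifies a predetermined pseudo-random shuffle ordering, and the stride construction below is an assumption chosen for compactness.

```python
import math

def shuffle_order(n_blocks, stride=7):
    """Read-out order in which blocks that are adjacent in the input
    frame are separated in the shuffled stream. The stride must be
    coprime with n_blocks so that every block is read exactly once."""
    assert math.gcd(stride, n_blocks) == 1
    return [(i * stride) % n_blocks for i in range(n_blocks)]

order = shuffle_order(12)
```

Every block appears exactly once in the order, and blocks 0 and 1, adjacent in the input, are read several positions apart, so a short burst of corruption in the stream hits non-contiguous blocks of the reconstructed image.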
A current image frame is written to the external SDRAM 200 while a previous frame is read, in shuffled format, from the external SDRAM 200. The shuffle unit 100 generates two output signal pairs: a first pair comprising signals S_OP_D1 and S_OP_D2 and a second pair comprising signals S_OP_DD1 and S_OP_DD2 containing the same MBU data but delayed by approximately one MBU with respect to the first signal pair. The delay serves to compensate for the processing delay of a bit allocation module 400 belonging to a Q allocation unit 300. The first signal pair is used by the Q allocation unit 300 to determine an appropriate coding mode and a quantisation divisor known as a Q_SCALE parameter for each MB of the MBU.
The output signals from the shuffle unit 100 are supplied to the Q allocation unit 300 that comprises the bit allocation module 400, a target insertion module 500, a DCT module 600 and a binary search module 700. The first output signal pair S_OP_D1 and S_OP_D2 from the shuffle unit 100 is input to the bit allocation module 400. These input signals comprise raster scanned 8H x 8V blocks of 12-bit video samples.
The bit allocation module 400 performs a comparison between lossless differential pulse code modulation (DPCM) encoding and DCT quantisation encoding.
DPCM is a simple image compression technique that takes advantage of the fact that spatially neighbouring pixels in an image tend to be highly correlated. In DPCM the pixel values themselves are not transmitted. Rather, a prediction of the probable pixel value is made by the encoder based on previously transmitted pixel values. A single DPCM encoding stage involves a DPCM reformat, a DPCM transform and entropy encoding calculations.
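A minimal DPCM sketch, using the simplest possible predictor (the previously decoded sample); the patent does not specify the prediction rule, so this left-neighbour scheme is an illustrative assumption.

```python
def dpcm_encode(samples):
    """Transmit the difference between each sample and the prediction
    (here, simply the previous sample) rather than the sample itself."""
    prev = 0
    residuals = []
    for s in samples:
        residuals.append(s - prev)
        prev = s
    return residuals

def dpcm_decode(residuals):
    """Rebuild each sample by adding the residual to the prediction."""
    prev = 0
    out = []
    for r in residuals:
        prev += r
        out.append(prev)
    return out

residuals = dpcm_encode([100, 101, 103, 102])
```

Because neighbouring pixels are highly correlated, the residuals (here 1, 2, -1 after the first sample) are small and entropy-code compactly, and the round trip is exact, which is why DPCM can be lossless.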
By way of contrast, the DCT quantisation encoding involves a single DCT transform plus several stages of quantisation using a series of quantisation divisors, each quantisation stage being followed by Huffman entropy encoding calculations. In this embodiment 6 trial quantisation divisors are tested by the bit allocation module 400. Huffman coding is a known lossless compression technique in which more frequently occurring values are represented by short codes and less frequent values with longer codes. The DCT trial encoding stages optionally involve quantisation that is dependent on the "activity" of an image area. Activity is a measure calculated from the appropriately normalised pixel variance of an image block. Since harsher quantisation is known to be less perceptible to a viewer in image blocks having high activity, the quantisation step for each block can be suitably adjusted according to its activity level. Taking account of activity allows for greater compression while maintaining the perceived quality of the reproduced image.
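The activity normalisation can be sketched as below. The patent states only that activity is computed from the appropriately normalised pixel variance and that NORM_ACT lies in [0.5, 2.0]; the particular (2v + a)/(v + 2a) mapping is borrowed from the MPEG-2 Test Model 5 rate-control scheme as one plausible normalisation, not the patent's own formula.

```python
def norm_act(block, avg_variance):
    """Normalised activity of a block from its pixel variance, mapped
    into the range [0.5, 2.0]: flat blocks give 0.5, very busy blocks
    approach 2.0 (and so tolerate harsher quantisation)."""
    mean = sum(block) / len(block)
    var = sum((p - mean) ** 2 for p in block) / len(block)
    return (2 * var + avg_variance) / (var + 2 * avg_variance)

flat_act = norm_act([10] * 8, 100.0)       # zero variance -> 0.5
busy_act = norm_act([0, 255] * 4, 100.0)   # huge variance -> close to 2.0
```

Scaling each block's quantisation divisor by its NORM_ACT then concentrates the harsher quantisation in the busy blocks where it is least perceptible.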
The DPCM and DCT quantisation trial encoding stages are used to calculate MB bit targets constrained by a predetermined frame target based on the required encoding bit rate.
For each MB the mode (DCT or DPCM) that gives the fewest encoded bits is selected for use. The bit allocation module outputs a signal 405 to the target insertion module 500. The signal 405 comprises information about the encoding mode selected for each Macro-Block, a Q_SCALE quantisation divisor value QBASE to be used by a binary search module 700 and a bit count target for each Macro-Block. The information in the signal 405 is added to the bit stream of the delayed image data to which it corresponds by the target insertion module 500.
The target insertion module 500 outputs two signals SOSA and SOSB which are supplied as inputs to the DCT module 600.
The DCT module 600 again calculates DCT coefficients, this time based on the delayed version of the image data. The DCT module 600 outputs the data to the binary search module 700. The binary search module 700 performs a second stage of Q allocation for each of the MBs that is to be DCT quantisation encoded and uses a binary search technique to determine an appropriate quantisation divisor for each Macro-Block. The binary search module 700 determines the quantisation divisor to a higher resolution (within a given range of available quantisation divisors) than the resolution used by the bit allocation module 400. In fact QBASE is used to define a starting point for a five stage binary search that results in the selection of a higher resolution quantisation step QALLOC for each DCT mode Macro-Block. The DPCM mode Macro-Blocks are routed through the binary search module 700 via a bypass function so that the data is unaltered on output.
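One possible realisation of the five stage refinement around QBASE is sketched below. The halving step schedule and the `bits_for` bit-count model are assumptions for illustration; the real module measures bit counts from actual trial quantisations of the Macro-Block.

```python
def binary_search_q(bits_for, target_bits, q_base, stages=5):
    """Refine a coarse divisor q_base over a fixed number of stages.
    Each stage halves the step: the divisor moves up (harsher) when
    the projected bit count overshoots the target, down otherwise."""
    q = q_base
    step = 2 ** (stages - 1)
    for _ in range(stages):
        if bits_for(q) > target_bits:
            q += step            # too many bits: quantise more harshly
        else:
            q -= step            # headroom: try gentler quantisation
        q = max(q, 1)
        step = max(step // 2, 1)
    if bits_for(q) > target_bits:  # final guard: must meet the target
        q += 1
    return q

# Toy model: starting from QBASE = 16 the search converges on 25,
# the gentlest divisor meeting a 4000-bit target.
q_alloc = binary_search_q(lambda q: 100000 // q, 4000, 16)
```

The fixed stage count matters for hardware: the search always finishes in the same number of cycles regardless of the image content.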
The output from the binary search module 700 that includes the value QALLOC for each DCT mode Macro-Block is supplied to a back search module 800. The back search module 800 checks that the QALLOC value chosen for each MB is the "best" quantisation scale for encoding. As explained in the introduction, for image data that has undergone at least one previous encode/decode cycle, the least harsh quantisation that is achievable for a given target bit count will not necessarily give the smallest possible quantisation error for the Macro-Block. Instead, the smallest quantisation error is likely to be achieved by using a quantisation divisor that is substantially equal to the quantisation divisor used in the previous encode/decode cycle. Accordingly, the back search module 800 estimates the quantisation error for a range of quantisation divisors, starting at QALLOC and working towards harsher quantisations. It determines the quantisation step QFINAL that actually produces the smallest possible quantisation error. The trial quantisations are performed on DCT mode Macro-Blocks only and a bypass function is provided for DPCM mode Macro-Blocks.
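The error-minimising scan of the back search can be sketched as follows; the error metric (sum of absolute reconstruction errors) and the search range of eight divisors are illustrative assumptions.

```python
def back_search(coeffs, q_alloc, search_range=8):
    """Estimate the quantise/dequantise error for divisors from q_alloc
    towards harsher values and return the one with the smallest total
    reconstruction error (the QFINAL of the text)."""
    def round_trip_error(q):
        err = 0
        for c in coeffs:
            quot = c / q
            qc = int(quot + 0.5) if quot >= 0 else -int(-quot + 0.5)
            err += abs(c - qc * q)
        return err
    return min(range(q_alloc, q_alloc + search_range), key=round_trip_error)

# Coefficients that survived a previous cycle at divisor 6 are exact
# multiples of 6, so the back search recovers 6 even though the binary
# search proposed the gentler divisor 4.
q_final = back_search([12, 36, -18, 60], 4)
```

This captures the key observation of the introduction: re-using the previous generation's divisor gives zero requantisation error, whereas the gentler divisor 4 does not.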
The output from the back search module 800, which includes DCT blocks generated by the DCT encoder 600 together with the selected quantisation step QFINAL, is supplied to a quantiser 900 where the final quantisation is performed. The quantisation procedure is as follows. In DCT mode encoding the single DC coefficient of each 8H x 8V block is quantised according to the equation:

Q(DC) = DC / (DC_QUANT * DCT_SCALER)

where DC is the unquantised coefficient and DC_QUANT is a quantisation factor that is set by the system and is used to quantise all of the MBs. DC_QUANT is determined from DC_PRECISION as shown in the table below:

| DC_PRECISION | 00 | 01 | 10 | 11 |
| DC_QUANT     |  8 |  4 |  2 |  1 |

DC_PRECISION is set to a fixed value, preferably 00, for each frame.
DCT_SCALER is a quantisation factor determined by the DCT_PRECISION index such that DCT_SCALER = 2^DCT_PRECISION. In this embodiment a convention is used where DCT_PRECISION has the four possible values 0, 1, 2, 3, with 3 corresponding to the most harsh quantisation. Note that a different convention is used in the MPEG4 Studio Profile standard, where DCT_PRECISION = 0 corresponds to the most harsh quantisation while DCT_PRECISION = 3 corresponds to the least harsh quantisation.
Similarly the 63 AC coefficients of the block are quantised according to the equation:

Q(AC) = (AC * 16) / (QMATRIX * AC_QUANTISE * DCT_SCALER)

where AC is the unquantised coefficient and QMATRIX is an array of 63 weights, one for each element of the DCT block. AC_QUANTISE is the product of Q_SCALE and NORM_ACT. Q_SCALE is a factor corresponding to either a linear quantiser scale or a non-linear quantiser scale, as specified by a Q_SCALE_TYPE. Each of the Q_SCALE_TYPEs comprises 31 possible values denoted Q_SCALE_CODE(1) to Q_SCALE_CODE(31). The table of Figure 4 shows the Q_SCALE values associated with each Q_SCALE_TYPE for all 31 Q_SCALE_CODEs. In the above equation NORM_ACT is a normalised activity factor that lies in the range 0.5 to 2.0 for "activity on" but is equal to unity for "activity off".
AC_QUANTISE = NORM_ACT * Q_SCALE and is rounded up to the nearest Q_SCALE (i.e. a Q_SCALE that corresponds to a Q_SCALE_CODE in the table of Figure 3) before it is included as part of the divisor.
The results of the quantisations Q(DC) and Q(AC) are rounded using the known technique of normal infinity rounding. This technique involves rounding positive numbers less than 0.5 down (towards zero) and positive numbers greater than or equal to 0.5 up (towards plus infinity), whereas negative numbers greater than -0.5 are rounded up (towards zero) and negative numbers less than or equal to -0.5 are rounded down (towards minus infinity).
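The DC and AC quantisation rules above, together with normal infinity rounding, can be sketched in Python. This is an illustrative sketch, not the hardware implementation: the function names are assumptions, and QMATRIX is treated as a single scalar weight for the coefficient being quantised.

```python
import math

def round_normal_infinity(x):
    # halves are rounded away from zero: 0.5 -> 1, -0.5 -> -1, 0.4 -> 0
    return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)

def quantise_dc(dc, dc_quant=8, dct_precision=0):
    # Q(DC) = DC / (DC_QUANT * DCT_SCALER), where DCT_SCALER = 2^DCT_PRECISION
    dct_scaler = 2 ** dct_precision
    return round_normal_infinity(dc / (dc_quant * dct_scaler))

def quantise_ac(ac, qmatrix_weight, ac_quantise, dct_precision=0):
    # Q(AC) = (AC * 16) / (QMATRIX * AC_QUANTISE * DCT_SCALER)
    dct_scaler = 2 ** dct_precision
    return round_normal_infinity((ac * 16) / (qmatrix_weight * ac_quantise * dct_scaler))
```

With DC_PRECISION = 00 (so DC_QUANT = 8) and DCT_PRECISION = 0, a DC value of 60 quantises to 7.5 before rounding, and normal infinity rounding takes it up to 8.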
The bit allocation module 400, the binary search module 700 and the back search module 800 each implement a quantisation process in accordance with that implemented by the quantise module 900. However, in the binary search module 700 and the back search module 800 the factor NORM_ACT is always set equal to 1. Only during the bit allocation process carried out by bit allocation module 400, does NORM_ACT take a value other than 1.
Since the MB targets generated during bit allocation take account of activity, it need not be taken into account at subsequent stages.
The quantised data are output from the quantise module 900 and subsequently supplied to an entropy encoder 1000 where lossless data compression is applied according to the standard principles of entropy encoding. In this embodiment Huffman encoding is used, according to which more frequently occurring values are represented using short codes while the less frequently occurring values are represented using longer codes.
The output from the entropy encoder 1000 is supplied to a packing module 150 within the shuffle unit 100. The packing module 150 together with the external SDRAM 200 is used to pack the variable length encoded data generated by the entropy encode module 1000 into fixed length sync-blocks. A sync-block is the smallest data block that is separately recoverable during reproduction of the image.
The packing function is implemented by manipulation of the SDRAM read and write addresses. Each MBU is allocated a fixed packing space in the SDRAM which is then subdivided into a nominal packing space for each MB. The total length of each MB must also be stored and this can either be calculated from the individual word lengths or passed directly from the entropy encode module 1000 to the packing module 150. The output from the encoder 10 comprises sync-block 1 data output SB1 and sync-block 2 data output SB2.
An indication of the quantisation divisors used in the encoding process is also transmitted to the decoder 30.
Figure 5 schematically illustrates the decoder 30 of Figure 2. The decoder is operable to reverse the encoding process and comprises an unshuffle unit 2010, an unpack unit 2020, an external SDRAM 2100, an entropy decoding module 2200, an inverse quantiser 2300 and an inverse DCT module 2400. The sync-block data signals SB1 and SB2 that are either read from the recording medium or received across a data transfer network are received by the unpack unit 2020 that implements an unpacking function by writing to and reading from the external SDRAM 2100. The unpacked data is supplied to the entropy decoder that reverses the Huffman coding to recover the quantised coefficients which are supplied to the inverse quantiser 2300. The inverse quantiser 2300 uses information supplied by the encoder 10 about the quantisation divisors and multiplies the quantised coefficients by the appropriate quantisation divisors to obtain an approximation to the original DCT coefficients. This inverse quantisation process does not restore the original precision of the coefficients so quantisation is a "lossy" compression technique. The output from the inverse quantiser 2300 is supplied to the inverse DCT module 2400 that processes each block of frequency domain DCT coefficients using an inverse discrete cosine transform to recover a representation of the image blocks in the spatial domain. The output of the inverse DCT module 2400 will not be identical to the pre-encoded pixel block due to the information lost as a result of the quantisation process. Finally the output of the inverse DCT module 2400 is supplied to the unshuffle unit 2010 where the data is unshuffled to recover the image block ordering of the pre-encoded image. The output of the unshuffle unit 2010 comprises the three colour component video signals RGB from which the image can be reconstructed.
Figure 6 schematically illustrates a calculation performed by a DCT_PRECISION detection module (not shown) that is implemented in the shuffle unit 100. The DCT_PRECISION detection module determines whether the input video data is source or non-source and, in the case of non-source data, it detects the DCT_PRECISION index that was used in a previous encode/decode cycle. The value of DCT_PRECISION affects the quantisation of both DC and AC coefficients. Given that the value of DC_QUANT is known (DC_PRECISION is set to the fixed value 00 for each frame so that DC_QUANT = 8 for this embodiment) it is possible to detect the value of DCT_PRECISION used in a previous generation by analysing the DC rounding effects. The DCT_PRECISION detection module is supplied with input video data and performs an analysis of the DC quantisation of this data.
For each DCT block of the image field or frame the six Least Significant Bit (LSB) values for each of the 64 pixels of the block are summed to generate a 6-bit value DC[5:0] for each block. A frequency of occurrence of particular DC[5:0] values is built up according to the following algorithm:

S0 = number of occurrences of DC[5:0] = 00 0000
S1 = number of occurrences of DC[5:0] = 10 0000
S2 = number of occurrences of DC[5:0] = x1 0000
S3 = number of occurrences of DC[5:0] = xx 1000
S4 = number of occurrences of DC[5:0] = xx x100

where "x" represents either 0 or 1. Effectively, the number of instances of the DC[5:0] being: divisible by 64 corresponds to the sum S0; divisible by 32 (but not 64) corresponds to the sum S1; divisible by 16 (but not 32) corresponds to the sum S2; divisible by 8 (but not 16) corresponds to the sum S3; and divisible by 4 (but not 8) corresponds to the sum S4. Figure 6 schematically illustrates how the frequency of occurrence of particular DC[5:0] values is calculated.
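The sum classification can be sketched in Python, assuming each block's DC[5:0] value is already available; the function name and list representation are illustrative.

```python
def classify_dc_lsbs(dc_values):
    """Count occurrences of DC[5:0] divisibility patterns; returns [S0, S1, S2, S3, S4]."""
    sums = [0, 0, 0, 0, 0]
    for v in dc_values:
        v6 = v & 0x3F              # keep the six least significant bits
        if v6 % 64 == 0:           # 00 0000: divisible by 64
            sums[0] += 1
        elif v6 % 32 == 0:         # 10 0000: divisible by 32 but not 64
            sums[1] += 1
        elif v6 % 16 == 0:         # x1 0000: divisible by 16 but not 32
            sums[2] += 1
        elif v6 % 8 == 0:          # xx 1000: divisible by 8 but not 16
            sums[3] += 1
        elif v6 % 4 == 0:          # xx x100: divisible by 4 but not 8
            sums[4] += 1
    return sums
```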
In this embodiment the five sums S0 to S4 include all the DCT blocks from all video components. However, in alternative embodiments the sums S0 to S4 may be calculated separately for each component (RGB or YCbCr) and the final DCT_PRECISION decisions can be combined using, for example, a majority decision.
Once the sums S0 to S4 have been calculated, the DCT_PRECISION used at the previous generation is detected using four predetermined threshold values, th1 to th4, to produce the estimated value DCT_PREC_DETECTED. The following pseudocode defines the algorithm used:

if (S0 > th1 * S1) DCT_PREC_DETECTED = 3
else if (S0 + S1 > th2 * S2) DCT_PREC_DETECTED = 2
else if (S0 + S1 + S2 > th3 * S3) DCT_PREC_DETECTED = 1
else if (S0 + S1 + S2 + S3 > th4 * S4) DCT_PREC_DETECTED = 0
else Source Data

The above algorithm assumes that DC_QUANT = 8 (DC_PRECISION = 00) in both the previous generation and in the current generation. Since Q(DC) = DC / (DC_QUANT * 2^DCT_PREC) and DC_QUANT = 8, if we detect a divisor of e.g. 8 on the DC data then we deduce that there was no further quantisation, so that DCT_PREC_DETECTED = 0 in the above algorithm. It will be appreciated that the algorithm should be adapted to take account of the value of DC_QUANT in both the previous and the current generation.
If the value of the sum S0 is greater than the product of a threshold value th1 and the sum S1 then the detected divisor of DC data is 64 = 8 * 2^3, so the algorithm sets DCT_PREC_DETECTED = 3, which corresponds to the most harsh DCT_PRECISION quantisation. If the value of (S0 + S1) is greater than the product of a threshold value th2 and the sum S2 then the detected divisor of DC data is 32 = 8 * 2^2, so the algorithm sets DCT_PREC_DETECTED = 2. If the value of (S0 + S1 + S2) is greater than the product of a threshold value th3 and the value of the sum S3 then the detected divisor of DC data is 16 = 8 * 2^1, so the algorithm sets DCT_PREC_DETECTED = 1. Finally, if the value of (S0 + S1 + S2 + S3) is greater than the product of a threshold value th4 and the value of the sum S4 then the detected divisor of DC data is 8, so the algorithm sets DCT_PREC_DETECTED = 0, which corresponds to the least harsh DCT_PRECISION quantisation. The threshold values for this particular embodiment are th1 = th2 = th3 = 16 and th4 = 2. The threshold values are determined empirically by performing calculations on test image sequences. This algorithm essentially quantifies the severity of the rounding effects on the pixel values in order to detect the previous value of the quantisation divisor DCT_PRECISION.
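The threshold cascade can be expressed directly in Python. This is a sketch: the function name is illustrative, and None stands in for the "source data" outcome.

```python
def detect_dct_precision(s, th=(16, 16, 16, 2)):
    """s = (S0, S1, S2, S3, S4); returns the detected previous DCT_PRECISION,
    or None if the data is judged to be source data."""
    s0, s1, s2, s3, s4 = s
    if s0 > th[0] * s1:
        return 3                       # DC divisor 64: most harsh quantisation
    if s0 + s1 > th[1] * s2:
        return 2                       # DC divisor 32
    if s0 + s1 + s2 > th[2] * s3:
        return 1                       # DC divisor 16
    if s0 + s1 + s2 + s3 > th[3] * s4:
        return 0                       # DC divisor 8: no further quantisation
    return None                        # source data
```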
The DCT_PRECISION detection module outputs a "source"/"not source" signal that indicates whether or not the image data has previously undergone at least one encode/decode cycle and this signal is supplied as input to the bit allocation module 400 of Figure 3 for use in the calculation of target bit counts.
Figure 7 schematically illustrates the bit allocation module 400 of Figure 2. This module has three main functions: firstly, it selects the encoding mode for each Macro-Block from the two available alternatives of lossless DPCM and lossy DCT encoding; secondly, it calculates a target number of bits MB_TARGET for each Macro-Block from a similar target for the associated Macro-Block Unit that is in turn calculated from the required target bit rate for the system; and thirdly, it calculates a Q_SCALE value Q_BASE defined to be the starting scale for a binary search that is performed by the binary search module 700. The binary search module determines a Q_SCALE value Q_ALLOC, which is obtained using a higher resolution Q_SCALE search than that used to obtain the value Q_BASE.
The shuffled output image data signals S_OP_D1 and S_OP_D2 from the shuffling unit 100 are supplied as input to the bit allocation module 400. These input signals comprise raster scanned 8H×8V DCT blocks of 12-bit video samples. The final DCT_PRECISION and Q_START values produced by the parameter estimation circuit are also supplied as input to the bit allocation module 400. The bit allocation module 400 comprises a DPCM reformat module 410; a DPCM module 420; a Golomb length module 430; a DCT module 440; a lossy encoding module 450 having a quantisation module 452 and a Huffman length module 454; an activity module 460; and a decision logic unit 470. The decision logic unit 470 supplies input to the target insertion module 500.
The bit allocation module 400 makes encoding decisions based on the trial encoding of a Macro-Block Unit, which for this embodiment consists of 204 MBs. The encoding mode decision between lossless DPCM and lossy DCT is made on the basis of a single DPCM encoding stage (performed in combination by the DPCM reformat module 410, the DPCM module 420 and the Golomb length module 430) and on the basis of 6 DCT trial encoding stages involving a single discrete cosine transform in the DCT module 440 followed by six cycles through the lossy encoding unit 450. The activity module 460 is operable to adjust the quantisation divisors that are applied to the data by the quantisation module 452. The calculations performed by the activity module 460 are described in detail below.
The decision logic unit 470 is supplied with the output of the Golomb length module 430, which is the entropy encoding stage of DPCM encoding, and is also supplied with the output of the Huffman length module 454, which is the entropy encoding stage of DCT quantisation encoding. The decision logic unit 470 compares the results for the DPCM trial encoding and the DCT trial encoding. Lossless DPCM is selected for a Macro-Block if, and only if, it results in a lower overall bit count. The decision logic unit 470 is also operable to calculate the target bit counts for both the Macro-Block Units and the Macro-Blocks.
The activity module 460, which is part of the DCT mode encoding circuitry, is used to calculate an activity measure based on the pixel variance of image blocks. It is known that harsher quantisation is less perceptible to the viewer in image blocks which have higher activity levels. The quantisation step for each block can be offset by suitably adjusting the Q_SCALE quantisation divisor used in the quantisation module 452 of the bit allocation module 400 so that higher activity blocks are quantised more harshly. As explained below, the Q_SCALE_CODEs that are used for the bit allocation process depend on whether we have "activity off" or "activity on". Furthermore the activity factor NORM_ACT appears in the denominator of the formula for the quantisation of the AC DCT coefficients Q(AC) above.
Activity is calculated only once for each Macro-Block. Values of a parameter INTRAMAD are calculated for each 8H×8V DCT block in the Macro-Block: for luminance (Y) DCT blocks only in YCbCr mode and for R, G and B DCT blocks in RGB mode. INTRAMAD is defined as follows:

INTRAMAD[j] = ( Σ_i | dct[i,j] - dct_dc[j] | ) / 64

where dct[i,j] is the pixel value of pixel i in DCT block number j. The parameter dct_dc[j] is the mean value of dct[i,j] for the 8H×8V DCT block (and for a given signal component) and it is given by the formula:

dct_dc[j] = ( Σ_i dct[i,j] ) / 64

The minimum value of INTRAMAD for all of the Y or RGB DCT blocks in the Macro-Block is calculated as follows:

MINMAD = min_j ( INTRAMAD[j] )

and the activity, ACT, is given by:

ACT = 1 + MINMAD

Since values for ACT range from 1 to several thousand, the activity is normalised so that it lies within a predetermined range. In this embodiment ACT is normalised using previous Macro-Block Unit data to give a normalised activity measure NORM_ACT that lies in the range 0.5 to 2:

NORM_ACT = (2.ACT + AVG_ACT) / (ACT + 2.AVG_ACT), where AVG_ACT = average of ACT from the previous MBU.
At the start of an image sequence, or if a scene change has been detected (using a standard method), a default value DEFAULT_AVG_ACT that depends on Q_START and DCT_PRECISION is used instead of AVG_ACT. For the first MBU in any frame (except for the first frame in the sequence), a value FRM_AVG_ACT given by the average of ACT for all MBUs in the previous frame is used in place of AVG_ACT. Although for this embodiment the activity is normalised to lie in the range 0.5 to 2.0, in alternative embodiments it may be normalised to any range p/q to q/p using the following general equation: NORM_ACT = (q.ACT + p.AVG_ACT) / (p.ACT + q.AVG_ACT) so that for the selected NORM_ACT range of 0.5 to 2.0, the parameters p and q have the values 1 and 2 respectively.
To increase repeatability over multiple generations of encoding, NORM_ACT is constrained to a fixed number of levels. For the given range of 0.5 to 2, NORM_ACT is constrained to the 8 levels {4/8, 5/8, 6/8, 7/8, 1, 4/3, 5/3, 2}.
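The normalisation and level constraint can be sketched in Python. The generalised p/q formula is taken from the description above; the nearest-value snapping rule in constrain_to_levels is an assumption, since the text only states that NORM_ACT is constrained to these 8 levels.

```python
def normalise_activity(act, avg_act, p=1, q=2):
    # NORM_ACT = (q.ACT + p.AVG_ACT) / (p.ACT + q.AVG_ACT); range p/q .. q/p
    return (q * act + p * avg_act) / (p * act + q * avg_act)

# the 8 permitted levels for the 0.5 .. 2 range
LEVELS = (4/8, 5/8, 6/8, 7/8, 1.0, 4/3, 5/3, 2.0)

def constrain_to_levels(norm_act, levels=LEVELS):
    # snap to the closest permitted level (nearest-value choice is an assumption)
    return min(levels, key=lambda lv: abs(lv - norm_act))
```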
Figure 8 is a flow chart representing the algorithm implemented on each MBU by the bit allocation module according to embodiments of the invention. Step 1 is performed in combination by the DPCM reformat module 410, the DPCM module 420 and the Golomb length module 430. Steps 2A to 2D are performed by the quantise module 452 and Steps 3, 4 and 5 are performed by the decision logic unit 470.
Step 1 involves performing one stage of lossless DPCM trial encoding on all MBs in the MBU to determine for each MB the number of bits, MB_DPCM_BITS, produced by DPCM encoding.
Step 2A involves performing 6 stages of lossy DCT encoding on all MBs within the MBU to determine, for each block and for each of 6 quantisation divisors Q_SCALE(1) to Q_SCALE(6), the number of bits MB_DCT_BITS(Q_SCALE) produced by lossy DCT encoding the MB. Each Q_SCALE value is specified by one of 31 Q_SCALE_CODEs in combination with a Q_SCALE_TYPE that is either linear or non-linear. Rather than performing a complete scan of the full range of Q_SCALE quantisation divisors associated with the 31 available Q_SCALE_CODEs, a subset (SS) of Q_SCALE_CODEs is selected. The SS is defined by taking into account both the value of DCT_PRECISION and whether activity is on or off. For example for DCT_PRECISION = 1, with activity off the subset is given by: SS = [2, 3, 5, 7, 10, 15, 22, 31] whereas with activity on (NORM_ACT is in the range 0.5 to 2 in this case), the subset is given by: SS = [2, 3, 5, 7, 10, 15]. Table 1 below shows the bit allocation subset used in this embodiment of the invention for each value of DCT_PRECISION for both "activity on" and "activity off".
DCT_PRECISION         0           1           2           3
Activity           Off   On   Off   On   Off   On   Off   On
ss(0)                4          2     2     1     1     1     1
ss(1)                6          3     3     2     2     2     2
ss(2)               10          5     5     3     3     3     3
ss(3)               15          7     7     5     5     5     5
ss(4)               22         10    10     7     7     7     7
ss(5)               31         15    15    11    11    11    11
ss(6)                          22          15    15    15    15
ss(7)                          31          22          22
ss(8)                                      31          31
Q_start_error          24          12           6           3

Table 1: Subset of Q_SCALE_CODEs for Bit Allocation

The 6 Q_SCALEs used for the trial quantisation correspond to the quantisation divisors associated with 6 of the Q_SCALE_CODEs from the appropriate SS. Note that each SS will typically comprise more than 6 Q_SCALE_CODEs. The value of Q_START is used to select the 6 particular Q_SCALE_CODEs from the subset to be used for the trial quantisation procedure.
The lowest allowable Q_SCALE_CODE in the subset is the code that represents the quantisation factor MINIMUM_Q, where: MINIMUM_Q = 2^(3 - DCT_PRECISION) or, if this code does not exist, then the lowest allowable Q_SCALE_CODE is the lowest Q_SCALE_CODE that represents a number greater than MINIMUM_Q.
Step 2B involves a calculation of the error in Q_START, which is related to the value of DCT_PRECISION and is given by: Q_START_ERROR = 24 / (2^DCT_PRECISION) so for values of DCT_PRECISION where quantisation is less harsh the error in Q_START is greater.
Step 2C involves calculating Q_SCALE_MIN, which is given by Q_START minus Q_START_ERROR, and obtaining the Q_SCALE_CODE, Q_SCALE_CODE_MIN, that is associated with Q_SCALE_MIN from the look-up table of Figure 4. The closest Q_SCALE_CODE in the subset that is less than or equal to Q_SCALE_CODE_MIN is identified and the quantisation divisor associated with that Q_SCALE_CODE is defined to be the first of the six trial quantisation parameters, i.e. Q_SCALE(1). If however there is no Q_SCALE_CODE in the subset that is less than or equal to Q_SCALE_CODE_MIN then Q_SCALE(1) is set to the quantisation factor associated with the lowest Q_SCALE_CODE in the subset.
Step 2D involves assigning the remaining five trial quantisation factors Q_SCALE(2) to Q_SCALE(6) to the quantisation scales represented by the five next contiguous Q_SCALE_CODEs in the subset that are higher than Q_SCALE(1). If there are insufficient Q_SCALE_CODEs in the subset to allow this, then Q_SCALE(6) is set to the uppermost Q_SCALE_CODE in the subset and the values Q_SCALE(1) to Q_SCALE(5) are set to the next 5 Q_SCALE_CODEs from the subset below Q_SCALE(6).
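Steps 2C and 2D amount to choosing a window of six contiguous codes from the subset; a minimal Python sketch, with illustrative names and list handling:

```python
def select_trial_codes(subset, q_scale_code_min):
    """Pick the six trial Q_SCALE_CODEs from an ascending subset SS.
    q_scale_code_min is the code associated with Q_START - Q_START_ERROR."""
    at_or_below = [c for c in subset if c <= q_scale_code_min]
    # Q_SCALE(1): closest subset code <= Q_SCALE_CODE_MIN, else the lowest code
    first = at_or_below[-1] if at_or_below else subset[0]
    i = subset.index(first)
    if i + 6 > len(subset):
        # not enough higher codes: pin Q_SCALE(6) to the top of the subset
        i = max(0, len(subset) - 6)
    return subset[i:i + 6]
```

For the DCT_PRECISION = 1, activity-off subset [2, 3, 5, 7, 10, 15, 22, 31], a Q_SCALE_CODE_MIN of 4 selects the window starting at code 3.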
Step 3 involves performing the six trial DCT quantisations and summing the values of MB_DCT_BITS(Q_SCALE) over all of the Macro-Blocks in a Macro-Block Unit to produce the value MBU_DCT_BITS(Q_SCALE) for each of the 6 Q_SCALEs. A bit count target for each MBU is calculated from a predetermined FRAME_TARGET defined for each image frame (or field) such that: MBU_TARGET = FRAME_TARGET / 40.
In known systems the FRAME_TARGET is based entirely on the required encoding bit rate. However embodiments of the invention assign the FRAME_TARGET in dependence upon both the required encoding bit rate and whether the video data is "source" or "not source". If the input data is determined to be "source" data then the required encoding bit rate is artificially reduced by adjusting it downwards by a small amount so that the FRAME_TARGET for "source" data is also effectively adjusted downwards. In this embodiment the downwards adjustment is 2.5%. As a consequence of preferentially reducing the FRAME_TARGET for "source" input data, the Macro-Block target bit count calculated by the bit allocation module for that source data will be reduced so that Q_ALLOC (calculated by the binary search module 700) for the 1st generation, Q_ALLOC_GEN1, will increase, giving slightly harsher final quantisation Q_FINAL_GEN1 for the 1st reproduced image than would be the case without FRAME_TARGET reduction. However the slight reduction in quality of the 1st generation image is offset by the advantage gained for subsequent compression/decompression of the reduced FRAME_TARGET image data. The 2nd and higher generation encoding of this data will involve unadjusted FRAME_TARGETs.
The 2nd generation adjusted value Q_ALLOC_GEN2_ADJUSTED is much more likely than the unadjusted value Q_ALLOC_GEN2 to be less than Q_FINAL_GEN1. This is illustrated by Figure 9, which shows that the prior art backsearch starting point misses Q_FINAL_GEN1, whereas the adjusted backsearch, i.e. the backsearch arising from adjustment of the 1st generation FRAME_TARGET, includes Q_FINAL_GEN1 in its trial Q_SCALE range.
Accordingly the adjusted backsearch process will be more likely to find Q_FINAL_GEN2 = Q_FINAL_GEN1, which gives the best possible image quality for 2nd (and higher) generations.
Note that during encoding of 2nd generation input data the required bit rate of the 1st generation encoding will not necessarily be known.
The set of six values of MBU_DCT_BITS(Q_SCALE) are compared to MBU_TARGET and the two values B_U_MBU and B_L_MBU of MBU_DCT_BITS(Q_SCALE) that lie just above and just below the MBU_TARGET respectively are defined as follows:

B_U_MBU ≡ MBU_DCT_BITS(Q_SCALE(n)) > MBU_TARGET
B_L_MBU ≡ MBU_DCT_BITS(Q_SCALE(n+1)) ≤ MBU_TARGET

Figure 10A is an illustrative graph of MBU_DCT_BITS against Q_SCALE and shows the two bit count values B_U_MBU and B_L_MBU that correspond to the Q_SCALE codes Q_SCALE(n) and Q_SCALE(n+1) respectively. Note that the smaller quantisation divisor Q_SCALE(n) gives a higher bit count than the larger quantisation divisor Q_SCALE(n+1).
Some properties of this "bits versus Q_SCALE" graph for the Macro-Block Unit will be used to derive a target bit count MB_TARGET for each MB belonging to the MBU. In particular, Figure 10A shows the quantities X, Y and Z, which are defined as follows:

X = MBU_TARGET - B_L_MBU;
Y = B_U_MBU - MBU_TARGET; and
Z = B_U_MBU - B_L_MBU = X + Y

so that MBU_TARGET = (B_U_MBU * X + B_L_MBU * Y) / Z. Accordingly in Step 4 of Figure 8 the MB_TARGET is calculated as follows: MB_TARGET = X * MB_DCT_BITS(Q_SCALE(n)) / Z + Y * MB_DCT_BITS(Q_SCALE(n+1)) / Z and if Z = 0 division by zero is avoided by simply setting MB_TARGET = MB_DCT_BITS(Q_SCALE(n)).
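Treating X and Y as the distances of MBU_TARGET from the lower and upper MBU bit counts, the Step 4 interpolation can be sketched in Python (names are illustrative; a useful check is that summing the per-MB targets over the whole MBU recovers MBU_TARGET):

```python
def mb_target(mbu_target, b_u_mbu, b_l_mbu, mb_bits_n, mb_bits_n1):
    """Linearly interpolate a per-MB target between the MB bit counts
    at Q_SCALE(n) and Q_SCALE(n+1)."""
    x = mbu_target - b_l_mbu     # distance of the target above the lower bit count
    y = b_u_mbu - mbu_target     # distance of the target below the upper bit count
    z = b_u_mbu - b_l_mbu        # z == x + y
    if z == 0:
        return mb_bits_n         # avoid division by zero
    return (x * mb_bits_n + y * mb_bits_n1) / z
```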
In Step 5 a decision is made between lossless DPCM encoding and lossy DCT encoding for each MB in the MBU by comparing MB_DPCM_BITS to MB_TARGET. If MB_DPCM_BITS is greater than MB_TARGET then lossy DCT encoding is selected for the Macro-Block. If however MB_DPCM_BITS is less than or equal to MB_TARGET then the lossless DPCM encoding mode is selected for the Macro-Block. Each time lossless DPCM encoding is selected for a MB, MBU_TARGET is recalculated such that MBU_TARGET = MBU_TARGET - MB_DPCM_BITS.
Also, MBU_DCT_BITS(Q_SCALE(n)) and MBU_DCT_BITS(Q_SCALE(n+1)) are recalculated as follows: MBU_DCT_BITS(Q_SCALE(n)) = MBU_DCT_BITS(Q_SCALE(n)) - MB_DCT_BITS(Q_SCALE(n)); MBU_DCT_BITS(Q_SCALE(n+1)) = MBU_DCT_BITS(Q_SCALE(n+1)) - MB_DCT_BITS(Q_SCALE(n+1)).
Since MBU_TARGET is recalculated in Step 5, the MB_TARGETS have to be recalculated taking account of the number of MBs currently assigned to lossy DCT mode encoding. Step 4 and Step 5 are repeated for all lossy DCT mode MBs until no more MBs switch to DPCM encoding mode in Step 5. Typically 5 or 6 iterations of Steps 4 and 5 are sufficient to achieve this.
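The Step 4/Step 5 iteration might be sketched as below, assuming per-MB targets are linearly interpolated between the MB bit counts at Q_SCALE(n) and Q_SCALE(n+1) and that the MBU sums shrink as MBs leave the lossy pool; the out-of-range handling discussed next is omitted for brevity, and all names are illustrative.

```python
def select_encoding_modes(dpcm_bits, dct_bits_n, dct_bits_n1, mbu_target):
    """Return the set of MB indices left in lossy DCT mode; all others take DPCM."""
    lossy = set(range(len(dpcm_bits)))
    while True:
        b_u = sum(dct_bits_n[i] for i in lossy)    # MBU bits at Q_SCALE(n)
        b_l = sum(dct_bits_n1[i] for i in lossy)   # MBU bits at Q_SCALE(n+1)
        x, y, z = mbu_target - b_l, b_u - mbu_target, b_u - b_l
        switched = False
        for i in sorted(lossy):
            tgt = dct_bits_n[i] if z == 0 else (x * dct_bits_n[i] + y * dct_bits_n1[i]) / z
            if dpcm_bits[i] <= tgt:                # DPCM fits its target: go lossless
                lossy.remove(i)
                mbu_target -= dpcm_bits[i]         # shrink the MBU budget
                switched = True
        if not switched:
            return lossy
```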
It is necessary to consider situations where an inaccurate estimate for Q_START means that the appropriate quantisation divisor Q_SCALE lies outside the bit allocation search range (comprising 6 Q_SCALEs). In such cases an alternative strategy to that of Steps 3 and 4 above is required in order to calculate MB_TARGET. There are two possible scenarios to consider.
Firstly, if in Step 3 MBU_DCT_BITS(Q_SCALE(n)) is less than or equal to MBU_TARGET for all 6 values of n, indicating that the quantisation is too harsh for all trial values, then MB_TARGET in Step 4 is calculated as follows:

MB_TARGET = MB_DCT_BITS(Q_SCALE(1)) * MBU_TARGET / MBU_DCT_BITS(Q_SCALE(1))

This first scenario is illustrated by Figure 10B.
Secondly, if in Step 3 the value MBU_DCT_BITS(Q_SCALE(n+1)) is greater than MBU_TARGET for all 6 values of n, indicating that the quantisation is not harsh enough for any of the 6 trial values, then Q_SCALE(5) and Q_SCALE(6) are selected as Q_SCALE(n) and Q_SCALE(n+1) in Step 3. In this case Step 4 remains unchanged, but now has the effect of extrapolating MB_TARGET. This second scenario is illustrated by Figure 10C.
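The first fallback is a simple proportional scaling of the Macro-Block's bit count at Q_SCALE(1); a one-line sketch with illustrative names:

```python
def mb_target_when_all_below(mbu_target, mbu_bits_q1, mb_bits_q1):
    """All six trial quantisers produced MBU bit counts at or below MBU_TARGET:
    scale this MB's Q_SCALE(1) bit count by the MBU-level ratio."""
    return mb_bits_q1 * mbu_target / mbu_bits_q1
```

So an MB that contributed 100 of the MBU's 1000 bits at Q_SCALE(1), against an MBU_TARGET of 2000, receives a target of 200 bits.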
In addition to calculating MB_TARGETs, the bit allocation module 400 sets the value of Q_BASE, which is the starting Q_SCALE for the binary search. The Q_SCALE value Q_BASE is determined from MB_TARGET using the appropriate MB_DCT_BITS versus Q_SCALE curve. If we have a 5 stage binary search, as in this embodiment, then Q_BASE is simply the Q_SCALE given by Q_SCALE_CODE = 16 (i.e. the midpoint of the quantiser table).
Thus the binary search can cover the whole range of Q_SCALE by starting at this Q_BASE.
However, in alternative embodiments which use fewer binary search stages, Q_BASE is set midway between Q_SCALE(n) and Q_SCALE(n+1) for each Macro-Block. For only those Macro-Blocks assigned to lossy DCT mode encoding, the values of Q_BASE and MB_TARGET are output by the decision logic unit 470 and added to the bitstream comprising the image data IP_DD1 and IP_DD2 that has been delayed by one MBU by the target insertion module 500. The values MB_TARGET and Q_BASE are supplied to the binary search module which uses them in determining the final Q_SCALE.
Figure 11 schematically illustrates the internal structure of the binary search module 700. The binary search module comprises 5 units, one unit for each stage of the binary search. Each unit comprises a MB delay module 710, a quantisation module 720, a Huffman length calculation module 730 and a Q_sCALE selection module 740.
The binary search module 700 receives two input signals BIN_IP_D1 and BIN_IP_D2 comprising discrete cosine transformed Macro-Block data together with the values of MB_TARGET and Q_BASE calculated by the bit allocation module 400. The signals BIN_IP_D1 and BIN_IP_D2 are supplied to the MB delay module 710 that delays the input data by one Macro-Block between each of the five stages of the binary search process. The signals BIN_IP_D1 and BIN_IP_D2 are also supplied to the quantisation module 720 that outputs data to the length calculation module 730, which performs Huffman entropy encoding. Each stage of the binary search algorithm involves a cycle through a quantisation module 720 and a Huffman length calculation module 730. The Huffman length module 730 calculates the bit length required to encode the DCTed and quantised Macro-Block. The quantisation module 720 and the Huffman length calculation module 730 in each of the 5 units are supplied with an input signal IP_PGM comprising quantiser weighting matrix data, which is used to vary the quantisation of each sample depending on its position in the 8H×8V DCT block, and Huffman length AC variable length coding tables, which give the bit length for each Table/Group combination. In this embodiment hardware requirements are reduced by using a single RAM to store the quantiser weighting matrices for all five binary search stages. However separate RAMs are required for the VLC data tables because they must be randomly accessible for each stage.
The delayed input data from the MB delay unit 710 and the output of the length calculation module are supplied as inputs to the Q_SCALE selection module that selects the next QSCALE value to test for the Macro-Block and passes on the selected QScALE value to the subsequent stage of the binary search. The final QSCALE value is selected by the QSCALE selection module of the fifth binary search unit and this is denoted Q_ALLOC.
Q_ALLOC is the smallest Q_SCALE value for which the encoded bit count does not exceed the MB_TARGET for the Macro-Block. The Q_ALLOC value is supplied to the back search module 800 for further processing along with the final output signals BIN_OP_D1 and BIN_OP_D2. Note that the output signals BIN_OP_D1 and BIN_OP_D2 are delayed by a total of five Macro-Blocks with respect to the input signals BIN_IP_D1 and BIN_IP_D2.
Figure 12 schematically illustrates the 5 stage binary search procedure.
Stage 1 of the binary search quantises the input data using the value Q_BASE, calculated by the bit allocation module 400, to determine a starting Q_SCALE_CODE for the binary search, denoted Q_BASE_CODE. In this embodiment the binary search has 5 stages which search the whole range of Q_SCALE. Therefore Q_BASE is always set at the centre of this range, corresponding to a Q_SCALE_CODE of 16. The quantised data is then passed to the length calculation module 730. If the stage 1 data length returned by the length calculation module 730 is greater than the MB_TARGET then the Q_SCALE value is increased by eight steps of Q_SCALE_CODE for the next binary search stage; otherwise the value is decreased by eight steps of Q_SCALE_CODE. In this case the Q_SCALE value is increased by 8 steps at stage 2. The input data is delayed by one Macro-Block to align it for the next stage.
At Stage 2 the data is quantised using the value Q_SCALE_CODE = (Q_BASE_CODE + 8) and a stage 2 length is calculated. This stage 2 data length is compared to the MB_TARGET and the Q_SCALE is either increased or decreased by four steps of Q_SCALE_CODE for the third stage. In this case the Q_SCALE value is decreased by 4 steps at stage 3. There is a second MB delay to align the input data for stage 3.
At Stage 3 the data is quantised using the value Q_SCALE_CODE = (Q_BASE_CODE + 8 - 4) and a stage 3 data length is calculated. This stage 3 data length is compared to the MB_TARGET and the Q_SCALE is either increased or decreased by two steps of Q_SCALE_CODE for the fourth stage. In this case the Q_SCALE value is increased by 2 steps at stage 4. There is a third MB delay to align the input data for stage 4.
At Stage 4 the data is quantised using the value Q_SCALE_CODE = (Q_BASE_CODE + 8 - 4 + 2) and a stage 4 length is calculated. This stage 4 data length is compared to the MB_TARGET and the Q_SCALE is either increased or decreased by one step of Q_SCALE_CODE for the fifth stage. In this case the Q_SCALE value is increased by 1 step at stage 5. There is a fourth MB delay to align the input data for stage 5.
At Stage 5 the data is quantised using the value Q_SCALE_CODE = (Q_BASE_CODE + 8 - 4 + 2 + 1) and a stage 5 length is calculated. This stage 5 data length is compared to the MB_TARGET. If the stage 5 data length is greater than the MB_TARGET, then the Q_SCALE is increased by one Q_SCALE_CODE; otherwise Q_SCALE is not changed. In this case the Q_SCALE is not changed. In this way, the binary search algorithm selects the smallest Q_SCALE value whose encoded data length does not exceed the MB_TARGET for the Macro-Block. This Q_SCALE value is known as Q_ALLOC.
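The five-stage search over Q_SCALE_CODE can be sketched in Python, assuming the encoded length is non-increasing as the code rises; length_of is an illustrative stand-in for the quantise-plus-Huffman-length pipeline, and the lo parameter stands in for the MINIMUM_Q clamp applied at each stage.

```python
def binary_search_qcode(length_of, mb_target, base_code=16, lo=1, hi=31):
    """Find the smallest Q_SCALE_CODE whose encoded length does not exceed
    mb_target, starting from the mid-table Q_BASE_CODE of 16."""
    code = base_code
    for step in (8, 4, 2, 1):              # stages 1-4 decide the stage 2-5 codes
        if length_of(code) > mb_target:
            code = min(hi, code + step)    # too many bits: quantise more harshly
        else:
            code = max(lo, code - step)    # within target: try a finer quantiser
    if length_of(code) > mb_target:        # stage 5: final one-code correction
        code = min(hi, code + 1)
    return code                            # this code gives Q_ALLOC
```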
At each stage of binary search, any decision to decrease the Q_SCALE is subject to the condition of MINIMUM_Q. The lowest allowable Q_SCALE_CODE is the Q_SCALE_CODE representing the MINIMUM_Q (or, if this does not exist, the first Q_SCALE_CODE representing a number greater than the MINIMUM_Q), where: MINIMUM_Q = 2^(3 - DCT_PRECISION). If the decrease in Q_SCALE would take it below the lowest allowable Q_SCALE_CODE, then it is instead set equal to the lowest allowable Q_SCALE_CODE prior to the next stage of binary search.
Figure 13 schematically illustrates the timing of the input and output Macro-Blocks for each binary search stage in this module. The final output is from "stage 5".
For Macro-Blocks encoded in Lossless mode (DPCM), the binary search module 700 operates in BYPASS mode. In BYPASS mode, the Q_SCALE change normally implemented at the end of each binary search stage is disabled. This produces output data identical to the input data to the module.
Figure 14 schematically illustrates the internal structure of the back search module 800. This module comprises: a Q_SCALE extraction circuit 810; a Q_SCALE incrementer 820; a pre-processor 830; six quantiser units 840, each of which comprises a quantise module 842 and an error calculation module 844; a Macro-Block delay unit 850; and a Q_SCALE selection module 860.
The inputs to the back search module 800 comprise the output signals BIN_OP_D1 and BIN_OP_D2 of the binary search module 700. This input data includes a value Q_ALLOC for each MB. The Q_SCALE extraction module 810 extracts the Q_SCALE value Q_ALLOC from the input bit-stream and this value is used as the first quantiser scale tested in the backsearch procedure. Each quantiser unit tests two quantiser scales. The Q_SCALE extraction module 810 outputs the Q_ALLOC values to the incrementer 820, where they are incremented by one Q_SCALE_CODE step to give a second quantiser scale to be tested by the quantiser unit, and this second value is supplied as input to the quantise module 842 of the quantiser unit 840. The MB data output by the Q_SCALE extraction module 810 is supplied to the MB delay unit 850, where it is delayed by one MB before being supplied as input to the Q_SCALE selection module 860. The MB data from the Q_SCALE extraction module is further supplied to the pre-processor 830, where the first 31 of the 63 AC DCT coefficients are repeated such that they occur twice for each DCT block. The output from the pre-processor 830 is supplied to each of the six quantisation units 840. Each of these quantisation units 840 is used to estimate the quantisation error at two distinct quantiser levels. This two-levels-per-unit calculation is possible because only 31 of the AC DCT coefficients are used in the error calculation. In this embodiment the first 31 AC DCT coefficients are used for the error calculation, although alternatively the second 31 coefficients could be used.
The quantise module 842 quantises the first 31 AC DCT coefficients using a first quantisation divisor and quantises the second 31 AC DCT coefficients using a second quantisation divisor.
A range of quantisation divisors is tested by the six quantisation units 840, starting with Q_SCALE = Q_ALLOC and incrementing the Q_SCALE by one Q_SCALE_CODE step at a time, up to a total of 12 quantiser scales. Each quantise module 842 receives the input signal IP_PGM comprising quantiser weighting matrix data that varies the quantisation of each sample depending on its position in the 8H x 8V block. The quantise module 842 outputs quantised data to the error calculation module 844, which calculates the residual error associated with the quantisation that has been performed.
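The pre-processor's coefficient doubling, which lets one quantiser unit test two scales in a single pass, can be illustrated with a short sketch. The function names are illustrative, and real divisor arithmetic, weighting matrices and rounding are omitted.

```python
def preprocess_block(ac_coeffs):
    """Pre-processor sketch: repeat the first 31 of the 63 AC
    coefficients so that they occur twice per DCT block."""
    first31 = ac_coeffs[:31]
    return first31 + first31

def unit_pass(doubled, q1, q2):
    """One quantiser unit testing two scales in one pass: divisor q1
    on the first copy of the coefficients, q2 (one step up) on the
    second copy."""
    return ([c / q1 for c in doubled[:31]],
            [c / q2 for c in doubled[31:]])
```

Six such units, each given consecutive divisor pairs, cover the 12 scales of the backsearch.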
As a result of the quantisation process each quantised DCT coefficient has an associated residual error value, which depends on: the remainder after the division; the total divisor used in the quantiser; and the rounding applied to the result. Any DCT coefficient, C, can be expressed as a combination of the total divisor, q, the residual, r, and an integer, n, as follows: C = n*q + r After quantisation the coefficient is either rounded down to become "n" or rounded up to become "n+1". Accordingly, the absolute error after inverse quantisation can be predicted as "r" if the coefficient was rounded down, or "q-r" if it was rounded up.
The error calculation module 844 takes the unrounded result C/q from the quantiser and, given that "q" is known and fixed for the whole Macro-Block, is able to estimate the residual error "r" or "q-r" for each DCT coefficient. The residual errors are accumulated for the first (or second) 31 AC coefficients of each DCT block, to give a total error for the whole Macro-Block.
The output of each of the six error calculation modules 844 is supplied to the Q_SCALE selection module 860, which selects, for each Macro-Block, a Q_SCALE value denoted Q_FINAL that produces the lowest residual error. The MB delay module 850 is required to compensate for the time taken to perform the residual error calculation. The Q_FINAL values are inserted into the bitstream of the delayed input data to produce the backsearch output signals BKS_OP_D1 and BKS_OP_D2. The output of the backsearch module is supplied as input to the final quantise module 900, where the data is quantised according to the value Q_FINAL and subsequently Huffman coded in the entropy encoding module 1000.
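A minimal numerical sketch of this error prediction and of the Q_FINAL selection follows. It treats the quantiser scales as plain integer divisors and assumes round-to-nearest quantisation, glossing over the real Q_SCALE_CODE steps, the weighting matrices and the hardware's unrounded C/q path.

```python
def residual_error(coeff, q):
    """Predicted absolute error for one coefficient after inverse
    quantisation.  With C = n*q + r, the error is r if the coefficient
    was rounded down to n, and q - r if it was rounded up to n + 1."""
    r = abs(coeff) % q
    return r if r <= q / 2 else q - r    # down -> r, up -> q - r

def backsearch(coeffs, q_alloc, n_scales=12):
    """Backsearch sketch: accumulate residual errors for each candidate
    divisor and return the one with the lowest total (Q_FINAL)."""
    def total_error(q):
        return sum(residual_error(c, q) for c in coeffs)
    return min(range(q_alloc, q_alloc + n_scales), key=total_error)
```

For coefficients that are all multiples of some divisor, the total error at that divisor is zero, which is exactly the behaviour the backsearch exploits to recover the quantiser used in a previous generation.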
Figure 15 schematically illustrates a second embodiment of a binary search module.
This embodiment is very similar to the embodiment of Figure 11, except that a quantity "spare_bits" is fed back from the Q scale selector of binary search stage 5 to the Q scale selector of binary search stage 1. This quantity is passed from Q scale selector to Q scale selector in the sequence of binary search stages.
When the first MB is passed through the binary search process, a quantisation parameter is assigned so that the target for that MB is not exceeded. Because the process is pipelined, the actual derivation of the quantisation parameter for the first MB takes five MB periods, as shown schematically in Figure 13 and described above. As the processing of the first macroblock is completed, the difference between the actual data quantity for that macroblock and the target for that macroblock - in other words, the quantity "spare_bits" - is derived by a simple subtraction in the final Q scale selector. The quantity spare_bits is passed back to the first binary search stage and is added to the target for the next macroblock to be processed.
So, because of the latency of the system, as already described with reference to Figure 13, the first few macroblocks do not benefit from any reallocation of "spare_bits", but thereafter any spare data capacity can be added to the capacity available for subsequent macroblocks. While it is, in practice, almost impossible to achieve exactly 100% utilisation of the available data capacity, this measure makes more efficient use of it than would otherwise be possible.
The quantity "spare_bits" is added to the target for the next available macroblock.
As shown schematically in Figure 16, this could be the fifth macroblock later, so that the spare_bits from macroblock MB0 are added into the target for the macroblock MB5.
However, in the embodiment shown in Figure 15 this would impose undesirably stringent timing constraints, so the arrangement of Figure 17 is preferred. In Figure 17, the spare_bits from MB0 are added to the target for MB6; the spare_bits from MB1 are added to the target for MB7, and so on.
So, the first six macroblocks MB0 to MB5 do not benefit from any additional data capacity by this arrangement. Also, no use can be made of the excess capacity from the last six macroblocks. But in between, that is to say for the vast majority of macroblocks in an image, a more efficient use can be made of this excess data capacity which would otherwise be wasted.
In other embodiments, a pipelined process need not be used. For example, the logic of the binary search arrangement of Figure 15 could be implemented in real-time or non-real-time software (provided by a storage medium such as a disk or memory, and/or a transmission medium such as a network connection, neither shown), so that the binary search process can be completed in respect of one macroblock before the processing of the next macroblock is started. Alternatively, a hardware embodiment could be arranged so that the binary search stages operate at substantially five times the macroblock rate, which again would mean that the processing in respect of one macroblock could be completed before starting the processing of the next macroblock. In such instances, the spare_bits from MB0 could be added to the target for MB1, and so on.
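In such a non-pipelined arrangement the spare_bits feedback reduces to a simple running loop, sketched below. The compress_mb callback is an illustrative stand-in for the whole binary search and quantise chain; it is assumed to return the bits actually used and never to exceed the target it is given.

```python
def compress_sequence(targets, compress_mb):
    """Non-pipelined spare_bits feedback: the unused capacity of each
    macroblock (adjusted target minus bits actually used) is added to
    the target of the very next macroblock."""
    spare = 0
    bits_used = []
    for target in targets:
        actual = compress_mb(target + spare)
        spare = (target + spare) - actual   # spare_bits for the next MB
        bits_used.append(actual)
    return bits_used
```

Only the final macroblock's spare capacity goes unused, which is the limiting case of the behaviour described above for the pipelined version.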
Empirical trials have been conducted to test the usefulness of the present techniques, and in particular the arrangement described with reference to Figure 19 (below) using a five-stage binary search throughout. These results are illustrated in the following tables. The tables give results for three video test sequences (Seq.) A, B and C. Sequence A is a test sequence representing a "mobile and calendar"; sequence B is a test sequence representing a "weather report"; sequence C is another 70 frame test sequence.
Results are given in two sections: the first two tables are with "activity" (see above) disabled or off. The latter two tables have activity enabled.
Within each table, compression properties are shown for five generations of compression / decompression. The technique described above, in which spare_bits are added back to a subsequent macroblock's target, can be either "on" (left-hand sub-column) or "off" (right-hand sub-column).
In the first table of each pair, the signal to noise ratios in decibels are given for encoding at a nominal bit rate of 440 Megabits per second. It can be seen that there is an improvement in signal to noise ratio when the spare_bits technique is "on", and that the variability in signal to noise ratio over multiple generations is similar whether the technique is "on" or "off".
In the second table of each pair, the number of bits per field is given over the test sequence. At a nominal bit rate of 440 Mbits per second, the number of bits available per field is 7333333. With the spare_bits technique "on", the bit utilisation is much higher. In some cases practically all of the bits are used for compressed data when the technique is "on", whereas up to about 5% of the available capacity may be wasted when the technique is "off".
Activity off: signal to noise ratio (dB)

Seq.  Gen 1           Gen 2           Gen 3           Gen 4           Gen 5
      on      off     on      off     on      off     on      off     on      off
A     48.07   47.78   48.06   47.78   48.06   47.78   48.06   47.78   48.06   47.78
B     56.60   56.17   56.59   56.17   56.59   56.16   56.58   56.16   56.58   56.16
C     55.00   54.71   54.92   54.70   54.97   54.70   54.97   54.70   -       -

Activity off: bits per field

Seq.  Gen 1             Gen 2             Gen 3             Gen 4             Gen 5
      on       off      on       off      on       off      on       off      on       off
A     7333033  7233236  7333003  7233241  7332994  7233243  7332992  7233244  7332991  7233244
B     7332906  7164407  7332875  7164412  -        7164421  7332848  7164426  7332841  7164423
C     7294841  7189613  7294519  7189581  7294381  7189578  -        7189577  7294235  7189577

Activity on: signal to noise ratio (dB)

Seq.  Gen 1           Gen 2           Gen 3           Gen 4           Gen 5
      on      off     on      off     on      off     on      off     on      off
A     47.62   47.00   47.59   46.99   47.58   46.99   47.58   46.99   47.58   46.99
B     55.96   55.32   55.85   55.32   55.79   55.32   55.75   55.32   55.72   55.32
C     54.67   54.11   54.58   54.09   54.54   54.09   54.51   54.09   54.49   54.09

Activity on: bits per field

Seq.  Gen 1             Gen 2             Gen 3             Gen 4             Gen 5
      on       off      on       off      on       off      on       off      on       off
A     7332375  7096196  7330345  7095671  7329925  7095668  7329698  7095668  7329575  7095669
B     7331939  7076342  7324437  7075850  7320468  7075850  7317874  7075850  7315893  7075850
C     7294152  7082195  7287069  7081016  7284206  7081010  7282463  7081009  7281257  7081009

Figure 18 schematically illustrates a truncated binary search process.
As described above, the variable DCT_PRECISION can take one of four values from 0 to 3. Assuming that the error in the estimation of Q_START is constant when expressed in terms of the final divisor used during quantisation, the number of available (legal) quantisation parameters within the range of Q_START plus or minus the error in Q_START varies with DCT_PRECISION. Therefore, assuming that the binary search starts at Q_BASE = Q_START, the number of stages of binary search which are needed to search exhaustively through this range of available values also varies with DCT_PRECISION.
For a possible error in Q_START of 24, at DCT_PRECISION = 0, five stages of binary search are needed for such an exhaustive search. At DCT_PRECISION = 3, just three stages are needed. These figures assume activity is "off". When activity is "on", a factor between 0.5 and 2 can be applied to the Q values by the activity mechanism. So with activity "on", the binary search must cover a range of: (Q_START - Q_START_ERROR) * 0.5 to (Q_START + Q_START_ERROR) * 2 In some circumstances this search can also be covered in fewer than five stages.
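The widening effect of activity on the range to be searched can be expressed directly. This helper is illustrative only; the name and signature are assumptions, and it works on plain quantiser values rather than Q_SCALE_CODE indices.

```python
def search_range(q_start, q_start_error, activity_on):
    """Range of quantiser values the binary search must cover.
    With activity "off" it is Q_START plus or minus the estimation
    error; with activity "on" the x0.5 to x2 activity factor widens
    the two ends further."""
    lo = q_start - q_start_error
    hi = q_start + q_start_error
    if activity_on:
        lo, hi = lo * 0.5, hi * 2
    return lo, hi
```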
So, at some values of DCT_PRECISION, there will be some "spare" or redundant stages of binary search.
Figure 18 schematically illustrates the situation where DCT_PRECISION = 3 and there are two redundant search stages from the apparatus of Figures 11 or 15. Stages 1 to 3 carry out binary searching as normal, and then stages 4 and 5 are arranged to "fill in" some gaps in the binary search process, to provide extra information which can be used (see Figure 19 below) in the final selection of a quantisation factor to be passed to backsearch. In particular, stage 4 is used to test Q_BASE + 1, and stage 5 is used to test Q_BASE - 1.
A variation on this embodiment will now be described.
The above description assumes that the binary search has to cover the entire range of the error on Q_START. This is not strictly necessary, because the range of Q_SCALE has already been narrowed during bit allocation.
Bit allocation tested a set of Q_SCALE values and found two Q_SCALE values, Q_SCALE(n) and Q_SCALE(n+1), that produced bit quantities above and below the macroblock target. Therefore a slight variation on the above method is to start the binary search midway between Q_SCALE(n) and Q_SCALE(n+1). Then only enough stages need be used to cover the range of legal values between Q_SCALE(n) and Q_SCALE(n+1) (for activity "off"). Any spare stages can be used as described above.
For activity "on", the possible x0.5 to x2 effect of activity should be allowed for in the range searched, so that the binary search should cover the range of: Q_SCALE(n) * 0.5 to Q_SCALE(n+1) * 2 A further variation is to combine the two methods and use a number of stages in the binary search which is the lower of: (a) the number of stages needed to cover the error in Q_START; and (b) the number of stages needed to cover values between Q_SCALE(n) and Q_SCALE(n+1)
(the remaining stages being used for further searches as described above).
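The combined variation can be sketched in a few lines. The function names are hypothetical, and the sketch assumes that resolving one of n legal values takes ceil(log2 n) binary search stages, which stands in for the exact per-DCT_PRECISION stage counts given above.

```python
import math

def stages_needed(n_values):
    """Binary search stages needed to resolve one of n legal values
    (assumed to be ceil(log2 n) for this sketch)."""
    if n_values <= 1:
        return 0
    return math.ceil(math.log2(n_values))

def stages_to_use(n_error_range, n_bit_alloc_range, total_stages=5):
    """Combined variation: take the lower of the two coverage
    requirements; whatever remains of the five stages is free for
    the extra fill-in tests described above."""
    needed = min(stages_needed(n_error_range), stages_needed(n_bit_alloc_range))
    spare = total_stages - min(needed, total_stages)
    return needed, spare
```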
Figure 19 schematically illustrates a search process using available bit lengths.
In Figure 19, a part 1000 of the final Q scale selector is schematically illustrated. In contrast to previous arrangements, where the final decision was made on the basis of only the last bit length result in the binary search process, in Figure 19 the decision on which Q value to use is made on the basis of all available Q values tested during the binary search process, potentially also including the values Q_SCALE(n) and Q_SCALE(n+1) tested at the later stages of the bit allocation process. Note that these two values can normally be included only when activity is "off". An advantage of this arrangement is that the spare bits can be allocated to the next macroblock, rather than to a macroblock five macroblocks later, because the spare bits are applied at the final Q scale selector. In other words, the target is adjusted at the final stage of binary search rather than at the first stage.
The arrangement of Figure 19 can be combined with that of Figure 18 so that the extra Q values tested by otherwise redundant binary search stages can be included in this test.
Figures 20 and 21 schematically illustrate detailed techniques for using "spare bits".
To improve the overall performance and make maximum use of the available bits, the following algorithm can be used (selectable via system control): after the final stage, compare the bits used with the target for that MB - it will generally be lower. Feed back the difference value (i.e. spare_bits) to the input of the first stage and add it to the target for the next MB to be processed. As mentioned above, in the embodiment of Figure 15 this will not be able to improve the first five MBs due to the latency of the binary search process, but the remaining MBs will be able to use the spare_bits.
Note that this requires knowledge of the length value for the final Q_SCALE chosen.
This is straightforward if the final decision after Stage 5 is not to change the Q_SCALE, but is more complex otherwise and can involve tracing back the decision tree. This is schematically illustrated in Figure 20.
If the final decision is to add one to the Q_SCALE, then the tree should be traced back until a left branch is found, i.e. where bits <= target. This may be done by passing the "bits" value from each stage forward to the next stage: if bits <= target for the current Q_SCALE, pass on the bits value for the current Q_SCALE; else pass on the bits value from the previous Q_SCALE.
In the example in Figure 21, this would be as follows:
Stage 1: pass BITS(Q_SCALE=16) on to the next stage (it has no choice).
Stage 2: pass BITS(Q_SCALE=24) on to the next stage (bits <= target).
Stage 3: pass BITS(Q_SCALE=24) on to the next stage (bits > target).
Stage 4: pass BITS(Q_SCALE=22) on to the next stage (bits <= target).
Stage 5: subtract BITS(Q_SCALE=22) from the target and feed back (bits > target).
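The pass-forward rule worked through above can be modelled in a few lines. The list of per-stage bit counts and the function name are illustrative; the sketch assumes, as in the description, that at least one stage met the target so the carried count is valid.

```python
def pass_forward(stage_bits, target):
    """Model of the pass-forward rule: each stage passes on the bit
    count for its current Q_SCALE if that count met the target,
    otherwise the count inherited from the previous stage.  Returns
    spare_bits for the finally chosen Q_SCALE."""
    carried = stage_bits[0]          # stage 1 has no choice
    for bits in stage_bits[1:]:
        if bits <= target:
            carried = bits           # left branch: keep this count
    return target - carried          # spare_bits fed back
```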

Claims (25)

1. A data compression apparatus in which target data quantities are allocated to respective subsets of input data, the target data quantities together providing a desired output data quantity; the apparatus comprising: a compression arrangement for compressing each data subset in accordance with a respective compression parameter and a respective target data quantity, the compression arrangement generating an actual data quantity indicating a quantity of compressed data in respect of that subset and compression parameter; and target control logic for modifying the target data quantities of data subsets yet to be compressed by the compression arrangement, in response to differences between the target data quantities of already-compressed data subsets and the actual data quantities generated by the compression arrangement in respect of those already-compressed data subsets.
2. Apparatus according to claim 1, in which the input data comprises image data.
3. Apparatus according to claim 2, in which the data subsets are image portions, and the desired output data quantity relates to a group of the image portions.
4. Apparatus according to claim 3, in which two or more groups of data portions comprise one image.
5. Apparatus according to any one of the preceding claims, in which: the compression arrangement is operable to compress each data subset using a variable degree of compression so that the target data quantity is not exceeded in respect of that data subset; and the target control logic is operable to add excess data quantities to target data quantities of data subsets yet to be compressed, the excess data quantities being differences between the target data quantities of already-compressed data subsets and the actual data quantities generated by the compression arrangement in respect of those already-compressed data subsets.
6. Apparatus according to any one of the preceding claims, in which the compression arrangement comprises a quantiser operable to quantise a representation of the data subsets in accordance with a quantisation which is variable in order to alter the degree of compression applied to each data subset.
7. Apparatus according to any one of the preceding claims, the apparatus being pipelined so that the compression arrangement is arranged to compress data subsets at an input rate of one data subset per compression period, the compression of each data subset taking place during two or more successive compression periods.
8. Data compression apparatus in which input data is compressed in accordance with a target compressed data quantity, the apparatus comprising: quantisation search logic having two or more trial quantisers for quantising a representation of the input data using respective trial degrees of quantisation, to assess respective data quantities which would be generated using those degrees of quantisation, the quantisation search logic being arranged as successive trial quantisation stages so that a trial degree of quantisation used by a next trial quantisation stage depends on the data quantity generated in respect of a preceding trial quantisation stage; a selector for selecting a degree of quantisation to be used in the compression of the input data, the selector being operable to compare the target data quantity with the data quantities produced at the last and at least one other trial quantisation stage of the quantisation search logic; and a quantiser for quantising the input data in accordance with the selected degree of quantisation.
9. Apparatus according to claim 8, in which the selector is arranged to select a degree of quantisation so that the target data quantity is not exceeded.
10. Apparatus according to claim 8 or claim 9, comprising a start estimator arranged to perform further trial quantisations to determine a first degree of quantisation for use by the quantisation search logic; in which the selector is also operable to compare the target data quantity with the data quantities produced by the further trial quantisations of the start estimator.
11. Apparatus according to any one of claims 8 to 10, in which the quantisation search logic comprises: a first trial quantisation stage in which the input data is quantised in accordance with a first trial degree of quantisation; and I one or more subsequent trial quantisation stages, each having a detector for detecting whether the data quantity indicated by the preceding stage would exceed the target data quantity; and if so, setting a trial degree of quantisation for use by the current stage to be harsher, by a stage increment, than that of the preceding stage; and if not, setting the trial degree of quantisation for use by the current stage to be less harsh, by the stage increment, than that of the preceding stage.
12. Apparatus according to claim 11, in which the stage increment varies from stage to stage.
13. Apparatus according to claim 12, in which the stage increment decreases with each successive stage.
14. Apparatus according to any one of claims 8 to 13, in which the degree of quantisation can have one of a set of legal values, and a desired search range of such values is defined in respect of the input data.
15. Apparatus according to claim 14, comprising a detector for detecting whether a subset of the stages of the quantisation search logic is sufficient to test all applicable legal values within the desired search range, and, if so, for setting the trial degrees of quantisation for remaining stages of the quantisation search logic to other values.
16. Apparatus according to any one of claims 8 to 15, in which the input data comprises image data.
17. Apparatus according to claim 16, in which the input data comprises data representing a portion of an image.
18. Data compression apparatus substantially as hereinbefore described with reference to the accompanying drawings.
19. A data compression method in which target data quantities are allocated to respective subsets of input data, the target data quantities together providing a desired output data quantity; the method comprising the steps of: compressing each data subset in accordance with its respective target data quantity; and modifying the target data quantities of data subsets yet to be compressed, in response to differences between the target data quantities of already-compressed data subsets and the actual data quantities generated by the compression arrangement in respect of those already compressed data subsets.
20. A data compression method in which subsets of input data are compressed in accordance with respective target compressed data quantities, the method comprising the steps of: compressing each data subset in accordance with a respective compression parameter and a respective target data quantity, to generate an actual data quantity indicating a quantity of compressed data in respect of that subset and compression parameter; and modifying the target data quantities of data subsets yet to be compressed by the compressing step, in response to differences between the target data quantities of already compressed data subsets and the actual data quantities generated by the compressing step in respect of those already-compressed data subsets.
21. A data compression method substantially as hereinbefore described with reference to the accompanying drawings.
22. Computer software having program code for carrying out a method according to any one of claims 19 to 21.
23. A data providing medium by which computer software according to claim 22 is provided.
24. A medium according to claim 23, the medium being a transmission medium.
25. A medium according to claim 23, the medium being a storage medium.
GB0308954A 2003-04-17 2003-04-17 Data compression Withdrawn GB2401739A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0308954A GB2401739A (en) 2003-04-17 2003-04-17 Data compression


Publications (2)

Publication Number Publication Date
GB0308954D0 GB0308954D0 (en) 2003-05-28
GB2401739A true GB2401739A (en) 2004-11-17

Family

ID=9956989

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0308954A Withdrawn GB2401739A (en) 2003-04-17 2003-04-17 Data compression

Country Status (1)

Country Link
GB (1) GB2401739A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08289299A (en) * 1995-04-19 1996-11-01 Matsushita Electric Ind Co Ltd Image data compressor
JPH10191342A (en) * 1996-12-24 1998-07-21 Sony Corp Device and method for compressing video data
US20020136296A1 (en) * 2000-07-14 2002-09-26 Stone Jonathan James Data encoding apparatus and method


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008121281A3 (en) * 2007-03-30 2009-03-12 Eastman Kodak Co Controlling the amount of compressed data
US7813564B2 (en) 2007-03-30 2010-10-12 Eastman Kodak Company Method for controlling the amount of compressed data

Also Published As

Publication number Publication date
GB0308954D0 (en) 2003-05-28

Similar Documents

Publication Publication Date Title
US5337087A (en) Video signal encoding apparatus
RU2637879C2 (en) Encoding and decoding of significant coefficients depending on parameter of indicated significant coefficients
US5959675A (en) Image compression coding apparatus having multiple kinds of coefficient weights
US8126282B2 (en) Image encoding apparatus and image decoding apparatus
JP3888597B2 (en) Motion compensation coding apparatus and motion compensation coding / decoding method
KR100213018B1 (en) Video encoding device
US6301392B1 (en) Efficient methodology to select the quantization threshold parameters in a DWT-based image compression scheme in order to score a predefined minimum number of images into a fixed size secondary storage
KR20060027795A (en) Hybrid video compression method
US8374451B2 (en) Image processing device and image processing method for reducing the circuit scale
KR20070110517A (en) Moving picture recording system having an encoding device and an encoding device
US20030016878A1 (en) Dynamic image compression coding apparatus
US6812865B2 (en) Data compression
US5907362A (en) Picture coding apparatus
US6298087B1 (en) System and method for decoding a variable length code digital signal
US20050249293A1 (en) Noise filter for video processing
EP0946062A2 (en) Data compression
EP1351518A2 (en) Data compression for multi-generation images
US20030202712A1 (en) Data compression
US5825970A (en) Quantization number selecting apparatus for DVCR and method therefor
JP2824222B2 (en) Video data compensation method and compensation device
KR20100013142A (en) Copression methode for frame memmory
GB2401739A (en) Data compression
US7490123B2 (en) Data compression
EP1711017A2 (en) Compression encoding
CN1643935A (en) Image coding using quantizer scale selection

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)