

Processing a three-dimensional representation of a scene

Info

Publication number
GB2640349A
GB2640349A · GB2413811.7A · GB202413811A
Authority
GB
United Kingdom
Prior art keywords
image
point
texture
images
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2413811.7A
Other versions
GB202413811D0 (en)
Inventor
Seibt Luis
Poularakis Stergios
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
V Nova International Ltd
Original Assignee
V Nova International Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by V Nova International Ltd filed Critical V Nova International Ltd
Publication of GB202413811D0
Publication of GB2640349A

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/04Texture mapping
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/20Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/337Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/172Processing image signals image signals comprising non-image signal components, e.g. headers or format information
    • H04N13/178Metadata, e.g. disparity information
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Architecture (AREA)
  • Library & Information Science (AREA)
  • Geometry (AREA)
  • Processing Or Creating Images (AREA)
  • Image Generation (AREA)

Abstract

Described is a method of processing a three-dimensional representation of a scene, the method comprising: identifying (81) a first image in a set of images, the first image being referenced by a first point of a three-dimensional representation; identifying (82) a second image in the set of images, the second image being referenced by a second point of a three-dimensional representation; determining (83) a similarity value associated with a similarity of the first image and the second image; and, based on the similarity value: modifying a reference of the first point; and/or processing (84) the first image so as to modify the first image that is referenced by the first point. This may be applied to detecting similarities between texture patches in a texture atlas. Other embodiments involve: (i) determining an attribute value for a point in the three-dimensional representation, identifying an image referenced by that point and identifying a plurality of associated attribute values; (ii) a bitstream of points and images also comprising difference deltas, where a delta is arranged to be combined with an image referenced by a point.

Description

Processing a three-dimensional representation of a scene
Field of the Disclosure
The present disclosure relates to methods, systems, and apparatuses for processing an image. In particular, the disclosure relates to methods, systems, and apparatuses for processing an image so as to remove redundant data. This can enable compression of a plurality of images by removing redundant data between similar images.
Background to the Disclosure
Three-dimensional representations of environments are used in many contexts, including for the generation of virtual reality videos, in which depth information for a plurality of points of the representation is used to generate different images for a left eye and a right eye of a user. Typically, substantial processing power is required to determine such a three-dimensional representation, and the file size of files associated with these representations is typically large so that substantial amounts of storage are needed to keep the files and substantial amounts of bandwidth are required to transfer the files.
Summary of the Disclosure
According to an aspect of the present disclosure, there is described a method of processing a three-dimensional representation of a scene, the method comprising: identifying a first image in a set of images, the first image being referenced by a first point of a three-dimensional representation; identifying a second image in the set of images, the second image being referenced by a second point of a three-dimensional representation; determining a similarity value associated with a similarity of the first image and the second image; based on the similarity value: modifying a reference of the first point; and/or processing the first image that is referenced by the first point.
Preferably, the first point and the second point are each points of a first three-dimensional representation.
Preferably, the first point is a point of a first three-dimensional representation and the second point is a point of a second three-dimensional representation. Preferably, the first and second three-dimensional representations are successive three-dimensional representations.
Preferably, processing the first image comprises removing the first image from the set of images.
Preferably, processing the first image comprises modifying the first image, preferably modifying a pixel value of the first image.
Preferably, processing the first image comprises modifying the first image so that it references the second image.
Preferably, modifying the reference comprises modifying the reference of the first point so as to reference the second image.
Preferably, processing the first image comprises determining a representative image that is representative of a subset of images that includes the first image and the second image.
Preferably, modifying the reference comprises modifying the reference of the first point so as to reference the representative image.
Preferably, processing the first image comprises modifying the first image to reference the representative image.
Preferably, the representative image comprises an image from the set of images. Preferably, the representative image comprises the second image.
Preferably, the method comprises determining a delta for the first image, the delta relating to a difference between the first image and the representative image.
Preferably, modifying the reference of the first point comprises modifying the reference so as to reference the delta.
Preferably, processing the first image comprises modifying the first image to comprise the delta.
Preferably, the delta relates to a distance in Euclidean space.
Preferably, the method comprises storing the delta in dependence on a value of the delta exceeding a threshold value. Preferably, the threshold value depends on a distance of the first point to a viewing zone of the three-dimensional representation.
Preferably, the value of the delta comprises one or more of: a maximum change value, an average change value, a cumulative number of change locations, or a maximum absolute value.
Preferably, the method comprises storing the delta in dependence on distance of the first point to a viewing zone of the three-dimensional representation, preferably in dependence on the distance being below a threshold distance.
Preferably, the delta comprises one or more of: a change location; a change value, and an absolute value.
Preferably, the method comprises: determining an initial delta value; quantising the initial value, preferably quantising the initial value based on a quantisation curve; and recording and/or storing the quantised delta value.
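By way of a non-limiting illustration, the following sketch (in Python, with the array shapes, threshold and quantisation step chosen here purely for illustration) shows one way in which a delta may be determined for an image, stored only if its value exceeds a threshold, and quantised before being recorded.

```python
# Illustrative sketch only: compute a delta between a texture image and a
# representative image, decide whether to store it, and quantise it.
# Shapes, the threshold and the quantisation step are assumptions.
import numpy as np

def compute_delta(image: np.ndarray, representative: np.ndarray) -> np.ndarray:
    """Per-pixel signed difference between an image and its representative."""
    return image.astype(np.int16) - representative.astype(np.int16)

def should_store_delta(delta: np.ndarray, max_change_threshold: int = 8) -> bool:
    """Store the delta only if its maximum absolute change exceeds a threshold."""
    return int(np.abs(delta).max()) > max_change_threshold

def quantise_delta(delta: np.ndarray, step: int = 4) -> np.ndarray:
    """Uniform quantisation; a non-linear quantisation curve could be used instead."""
    return (delta // step).astype(np.int8)

image = np.random.randint(0, 256, (16, 16, 3))
representative = np.random.randint(0, 256, (16, 16, 3))
delta = compute_delta(image, representative)
if should_store_delta(delta):
    stored_delta = quantise_delta(delta)
```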
Preferably, the subset of images includes images associated with different three-dimensional representations (e.g. with each three-dimensional representation being associated with a different time in the scene).
Preferably, determining a similarity value comprises determining a similarity value between corresponding pixels of the first image and the second image.
Preferably, determining a similarity value comprises determining a difference between corresponding pixels of the first image and the second image and/or a variance of the differences of corresponding pixels of the first image and the second image.
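By way of a non-limiting illustration, the following sketch (in Python; the mapping from pixel differences to a single similarity value is an assumption made for illustration) computes a similarity value from the differences between corresponding pixels and the variance of those differences.

```python
# Illustrative sketch: a similarity value based on the mean absolute difference
# between corresponding pixels and the variance of those differences.
import numpy as np

def similarity_value(first: np.ndarray, second: np.ndarray) -> float:
    diffs = first.astype(np.float32) - second.astype(np.float32)
    mean_abs_diff = float(np.abs(diffs).mean())
    diff_variance = float(diffs.var())
    # Smaller differences and lower variance give a higher similarity (range (0, 1]).
    return 1.0 / (1.0 + mean_abs_diff + diff_variance)
```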
Preferably, the method comprises selecting a sample set of images from the set of images. Preferably, the sample set of images is selected by random or stratified sampling, and the first and second images are part of the sample set of images.
Preferably, the method comprises selecting a sample set of pixels from each of the images. Preferably, the sample set of pixels is selected by random or stratified sampling, and determining the similarity value comprises determining a similarity between the sampled set of pixels.
Preferably, the method comprises: determining a plurality of image segments for each of the images; determining a similarity between segments of the images; and modifying the reference and/or processing the first image based on a similarity between the image segments.
Preferably, modifying the reference and/or processing the first image comprises indicating a plurality of image segments to be associated with the first point, preferably indicating an order for combining the image segments.
Preferably, the method comprises: determining, for each image in the set of images, a characteristic set of pixels; determining attribute values for each characteristic set of pixels; determining at least one subset of similar images based on the determined attribute values; and generating a representative image for each subset of similar images.
Preferably, the method comprises, for each subset of similar images, determining a delta between the representative image of said subset and each image in said subset.
Preferably, determining a representative image for a subset of similar images comprises determining an average, preferably a weighted average, of the attribute values of the images in the subset of similar images.
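By way of a non-limiting illustration, the following sketch (in Python, assuming all images share the same size, and with the sampling stride, grouping tolerance and data structures chosen for illustration) samples a characteristic set of pixels from each image, groups images whose sampled values are close, and forms a representative image as a (weighted) average of each group.

```python
# Illustrative sketch: sample a characteristic set of pixels from each image
# (stratified over a grid), group images whose sampled values are close, and
# form a representative image as a weighted average of each group.
import numpy as np

def characteristic_pixels(image: np.ndarray, stride: int = 8) -> np.ndarray:
    """Stratified sample of pixels: one pixel per stride x stride block."""
    return image[::stride, ::stride].reshape(-1)

def group_similar(images, tolerance: float = 10.0):
    groups = []
    for idx, img in enumerate(images):
        sig = characteristic_pixels(img).astype(np.float32)
        for group in groups:
            if np.abs(sig - group["signature"]).mean() < tolerance:
                group["members"].append(idx)
                break
        else:
            groups.append({"signature": sig, "members": [idx]})
    return groups

def representative_image(images, members, weights=None):
    """Representative image as a (weighted) average of the images in one group."""
    stack = np.stack([images[i].astype(np.float32) for i in members])
    return np.average(stack, axis=0, weights=weights).astype(np.uint8)
```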
Preferably, the method comprises adding each representative image to the set of images and/or replacing one or more images within the set of images with a representative image.
Preferably, the method comprises sorting the images into a plurality of clusters of similar images. Preferably, the method comprises sorting the images based on a k-means clustering algorithm.
Preferably, determining the similarity value comprises determining that the first image and the second image are located within the same cluster.
Preferably, the method comprises determining a centroid for each cluster. Preferably, the method comprises generating a representative image for each cluster based on the determined centroids.
Preferably, sorting the images into clusters comprises determining an N-dimensional point for each image based on attribute values of that image and sorting the images into clusters based on the locations of the N-dimensional points within an N-dimensional space.
Preferably, the method comprises: determining, for each image, a plurality of components of the image; and sorting each component of the images into a plurality of clusters of separate components.
Preferably, the components comprise one or more of: red, green, blue (RGB) components; and red, green, blue, alpha (RGBA) components. Preferably, determining the similarity value comprises determining that components of the first image and the second image are located within one or more of the same clusters, preferably determining that each pair of components of the first image and the second image are located in the same cluster.
Preferably, the method comprises determining a number of clusters. Preferably, the method comprises determining a number of clusters based on a user input.
Preferably, sorting the images into clusters comprises: converting each image to an N-dimensional point with a location in N-dimensional space, the converting being dependent on attribute values of the image; generating at least one N-dimensional centroid, wherein each N-dimensional centroid has a location in N-dimensional space; calculating the distance between each N-dimensional point and each N-dimensional centroid; assigning each N-dimensional point to a cluster, wherein each cluster comprises a set of N-dimensional points that have a nearest N-dimensional centroid in common; calculating a new N-dimensional centroid for each cluster, the location of the new N-dimensional centroid being the mean of the locations of each N-dimensional point within the cluster; calculating a current cumulative distance, the current cumulative distance being the sum of the distance between each N-dimensional point and the corresponding new N-dimensional centroid summed over all clusters; and comparing the current cumulative distance to a previously stored cumulative distance.
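By way of a non-limiting illustration, the following sketch (in Python; the number of clusters, the initialisation and the convergence threshold are assumptions) implements the clustering loop described above, with each image flattened to an N-dimensional point and the centroids iteratively recomputed until the cumulative distance stops improving.

```python
# Illustrative k-means-style clustering of images treated as N-dimensional points.
import numpy as np

def cluster_images(images, num_clusters: int = 4, max_iters: int = 50, tol: float = 1e-3):
    # Each image becomes an N-dimensional point (N = number of attribute values).
    points = np.stack([img.astype(np.float32).reshape(-1) for img in images])
    rng = np.random.default_rng(0)
    centroids = points[rng.choice(len(points), num_clusters, replace=False)]
    previous_cumulative = np.inf
    for _ in range(max_iters):
        # Distance of every point to every centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assignments = distances.argmin(axis=1)
        # New centroid = mean of the points assigned to each cluster.
        centroids = np.stack([
            points[assignments == k].mean(axis=0) if np.any(assignments == k)
            else centroids[k]
            for k in range(num_clusters)
        ])
        # Current cumulative distance, compared against the previously stored value.
        cumulative = distances[np.arange(len(points)), assignments].sum()
        if previous_cumulative - cumulative < tol:
            break
        previous_cumulative = cumulative
    return assignments, centroids
```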
Preferably, generating the at least one N-dimensional centroid comprises generating the centroid based on a previous centroid, the previous centroid having been determined for a previous set of images.
Preferably, the method comprises determining N based on a number of attribute values of each image and/or a size of each image.
Preferably, the method comprises iterating the calculation of new centroids until a change in the location of at least one, preferably each, centroid is below a threshold change.
Preferably, the method comprises linearising the sets of attribute values, wherein the linearising comprises: identifying dimensions of the sets of attribute values that are associated with a high deviation; and performing the clustering process in dependence on the dimensions that have a high deviation.
Preferably, each image is associated with the same number of attribute values and/or wherein each image has the same size.
Preferably, each image is associated with an index.
Preferably, the method comprises reordering the set of images.
Preferably, the method comprises updating one or more references of texture points, the texture points referencing images in the set of images. Preferably, updating the references comprises: reordering the images, and updating the references based on the reordering of the images.
Preferably, the method comprises forming a bitstream comprising the first point.
According to another aspect of the present disclosure, there is described a method of determining an attribute value for a point in a three-dimensional representation of a scene, the method comprising: identifying a point of the three-dimensional representation, the point comprising a location; identifying an image referenced by the point; and identifying a plurality of attribute values associated with the point based on the referenced image.
Preferably, the method comprises: identifying a delta associated with the identified point and/or the referenced image; and determining the plurality of attribute values based on the identified delta. Preferably, the method comprises identifying a plurality of attribute values in the image and modifying said plurality of attribute values based on the identified delta. Preferably, the method comprises combining said plurality of attribute values with the identified delta.
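By way of a non-limiting illustration, the following sketch (in Python, with the point and delta structures chosen for illustration) shows the decode-side step of reading attribute values from the image referenced by a point and, where a delta is present, combining those values with the delta.

```python
# Illustrative sketch of the decode-side step: attribute values for a point are
# read from the image it references and, where a delta is present, combined
# with that delta. The point and delta structures are assumptions.
import numpy as np

def attributes_for_point(point: dict, image_database: list, deltas: list) -> np.ndarray:
    image = image_database[point["image_index"]]
    values = image.astype(np.int16)
    delta_index = point.get("delta_index")
    if delta_index is not None:
        # Combine the referenced image with the per-point delta.
        values = values + deltas[delta_index]
    return np.clip(values, 0, 255).astype(np.uint8)
```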
Preferably, the image reference comprises a reference to a database of images, wherein: the database of images is associated with the three-dimensional representation; and/or the database of images is transmitted in a bitstream comprising the three-dimensional representation. Preferably, the bitstream further comprises a plurality of delta values associated with the three-dimensional representation.
Preferably, the method comprises identifying a delta flag, the delta flag identifying whether a delta should be combined with the plurality of attribute values. Preferably, the method comprises determining a delta based on the flag.
Preferably, the, and/or each, image is associated with a plurality of attribute values that lie on a shared plane. Preferably, the referenced image is a part of a set of images. Preferably, this set of images is associated with a plurality of three-dimensional representations.
Preferably, the method comprises: identifying a further point of a further three-dimensional representation, the further point comprising a location; identifying a further image referenced by the further point, where both the image and the further image are contained in a single set of images associated with each of the three-dimensional representation and the further three-dimensional representation; and identifying a plurality of attribute values associated with the further point based on the further image.
Preferably, the image and the further image are the same image.
Preferably, the method comprises determining that the point references an image. Preferably, the method comprises determining that the point references an image based on evaluating an attribute data field of the point to determine that the attribute data field comprises a reference to an image.
Preferably, the method comprises determining that the point references a delta. Preferably, the method comprises determining that the point references a delta based on evaluating an attribute data field of the point to determine that the attribute data field comprises a reference to a delta.
Preferably, the three-dimensional representation is associated with a viewing zone, the viewing zone comprising a subset of the scene and/or the viewing zone enabling a user to move through a subset of the scene.
Preferably, the user is able to move within the viewing zone with six degrees of freedom (6DoF).
Preferably, the viewing zone has a volume of less than 50% of the volume of the scene, less than 20% of the volume of the scene, and/or less than 10% of the volume of the scene.
Preferably, the viewing zone has, or is associated with, a volume, preferably a real-world volume, of less than five cubic metres (5m3), less than one cubic metre (1m3), less than one-tenth of a cubic metre (0.1 m3) and/or less than one-hundredth of a cubic metre (0.01 m3).
According to another aspect of the present disclosure, there is described an apparatus for processing a three-dimensional representation of a scene, the apparatus comprising: means for (e.g. a processor for) identifying a first image in a set of images, the first image being referenced by a first point of a three-dimensional representation; means for (e.g. a processor for) identifying a second image in the set of images, the second image being referenced by a second point of a three-dimensional representation; means for (e.g. a processor for) determining a similarity value associated with a similarity of the first image and the second image; means for (e.g. a processor for) processing the first image based on the similarity value; and means for (e.g. a processor for): modifying a reference of the first point; and/or processing the first image that is referenced by the first point.
According to another aspect of the present disclosure, there is described an apparatus for determining an attribute value for a location in a three-dimensional scene, the apparatus comprising: means for (e.g. a processor for) identifying a point of the three-dimensional representation, the point comprising a location; means for (e.g. a processor for) identifying an image referenced by the point; and means for (e.g. a processor for) identifying a plurality of attribute values associated with the point based on the referenced image.
According to another aspect of the present disclosure, there is described a bitstream comprising one or more points modified using the aforesaid method.
According to another aspect of the present disclosure, there is described a bitstream comprising one or more images modified using the aforesaid method.
According to another aspect of the present disclosure, there is described a bitstream comprising: one or more points of one or more three-dimensional representations; and a database of images; wherein the one or more points include one or more texture points that reference images in the database of images.
Preferably, the bitstream comprises one or more deltas, wherein each delta is arranged to be combined with an image referenced by a point to provide a plurality of attribute values for that point.
Preferably, the bitstream comprises one or more flags that indicate one or more of: whether to use deltas to determine attribute values for each point; a quantisation level associated with the delta values; and whether the database of images is associated with a plurality of three-dimensional representations.
Preferably, the bitstream comprises points of a plurality of three-dimensional representations. Preferably, each of the three-dimensional representations contains one or more texture points that reference images in the database of images. Preferably, each three-dimensional representation is associated with a separate database of images.
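By way of a non-limiting illustration, one possible in-memory layout for such a bitstream is sketched below (in Python; the field names and types are assumptions rather than a normative syntax).

```python
# Illustrative sketch of a bitstream carrying points, a shared image database,
# optional deltas and signalling flags. All names and types are assumptions.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple
import numpy as np

@dataclass
class TexturePoint:
    position: Tuple[float, float, float]      # location in the representation
    image_index: int                          # reference into the image database
    delta_index: Optional[int] = None         # optional reference to a delta

@dataclass
class Bitstream:
    points: List[TexturePoint]                # points of one or more representations
    image_database: List[np.ndarray]          # database of images (e.g. texture patches)
    deltas: List[np.ndarray] = field(default_factory=list)
    use_deltas: bool = False                  # flag: combine deltas with referenced images
    delta_quantisation_level: int = 0         # flag: quantisation level of the delta values
    shared_database: bool = True              # flag: database spans several representations
```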
According to another aspect of the present disclosure, there is described a system for carrying out the aforesaid method, the system comprising one or more of a processor; a communication interface; and a display.
According to another aspect of the present disclosure, there is described an apparatus for determining a point of a three-dimensional representation of a scene, the apparatus comprising: means for (e.g. a processor for) identifying a plurality of points of the representation; means for (e.g. a processor for) determining that the plurality of points lie on a shared plane; means for (e.g. a processor for) in dependence on the plurality of points lying on a shared plane, determining a texture patch based on attributes of the plurality of points; and means for (e.g. a processor for) determining a texture point, the texture point comprising a reference to the texture patch.
According to another aspect of the present disclosure, there is described an apparatus for determining an attribute of a point of a three-dimensional representation of a scene, the apparatus comprising: means for (e.g. a processor for) identifying, in the point, a reference to a texture patch, the texture patch being associated with a plurality of attributes; and means for (e.g. a processor for) determining the attribute of the point based on the attributes of the texture patch.
According to another aspect of the present disclosure, there is described a bitstream comprising one or more texture points and/or texture patches determined using the aforesaid method.
According to another aspect of the present disclosure, there is described a bitstream comprising a texture point, the texture point comprising a reference to a texture patch that comprises attribute values associated with the texture point.
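By way of a non-limiting illustration, the following sketch (in Python; the plane-fitting test and the data structures are assumptions) shows how a set of points found to lie on a shared plane might be collapsed into a single texture point that references a texture patch built from the points' attribute values.

```python
# Illustrative sketch: points lying on a shared plane are replaced by a texture
# point referencing a texture patch built from their attribute values.
import numpy as np

def lie_on_shared_plane(locations: np.ndarray, tolerance: float = 1e-3) -> bool:
    centred = locations - locations.mean(axis=0)
    # The smallest singular value measures deviation from the best-fit plane.
    return np.linalg.svd(centred, compute_uv=False)[-1] < tolerance

def make_texture_point(points: list, patch_store: list):
    locations = np.array([p["location"] for p in points])
    if not lie_on_shared_plane(locations):
        return None
    patch = np.array([p["colour"] for p in points])  # attribute values of the patch
    patch_store.append(patch)
    return {"location": locations.mean(axis=0).tolist(),
            "patch_index": len(patch_store) - 1}     # reference to the texture patch
```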
According to another aspect of the present disclosure, there is described an apparatus (e.g. an encoder) for forming and/or encoding the aforesaid bitstream.
According to another aspect of the present disclosure, there is described an apparatus (e.g. a decoder) for receiving and/or decoding the aforesaid bitstream.
Any feature in one aspect of the disclosure may be applied to other aspects of the disclosure, in any appropriate combination. In particular, method aspects may be applied to apparatus aspects, and vice versa.
Furthermore, features implemented in hardware may be implemented in software, and vice versa. Any reference to software and hardware features herein should be construed accordingly.
Any apparatus feature as described herein may also be provided as a method feature, and vice versa. As used herein, means plus function features may be expressed alternatively in terms of their corresponding structure, such as a suitably programmed processor and associated memory.
It should also be appreciated that particular combinations of the various features described and defined in any aspects of the disclosure can be implemented and/or supplied and/or used independently.
The disclosure also provides a computer program and a computer program product comprising software code adapted, when executed on a data processing apparatus, to perform any of the methods described herein, including any or all of their component steps.
The disclosure also provides a computer program and a computer program product comprising software code which, when executed on a data processing apparatus, comprises any of the apparatus features described herein.
The disclosure also provides a computer program and a computer program product having an operating system which supports a computer program for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein.
The disclosure also provides a computer readable medium having stored thereon the computer program as aforesaid.
The disclosure also provides a signal carrying the computer program as aforesaid, and a method of transmitting such a signal.
The disclosure extends to methods and/or apparatus substantially as herein described with reference to the accompanying drawings.
The disclosure will now be described, by way of example, with reference to the accompanying drawings.
Description of the Drawings
Figure 1 shows a system for generating a sequence of images.
Figure 2 shows a computer device on which components of the system of Figure 1 may be implemented.
Figure 3 shows a method of determining a three-dimensional representation of a scene.
Figures 4a and 4b show a method of determining a point based on a plurality of sub-points.
Figure 5 shows a scene comprising a viewing zone.
Figures 6a and 6b show arrangements of capture devices for determining points of the three-dimensional representation.
Figure 7 shows a point that can be captured by a plurality of capture devices.
Figures 8a and 8b show grids formed by the different capture devices.
Figure 9 describes a method of determining a location of a point of the three-dimensional representation.
Figure 10 shows a method of determining an angle of a point from a capture device used to capture the point.
Figure 11 shows a method of determining a texture patch associated with a point of the three-dimensional representation.
Figures 12a, 12b, 12c, and 12d illustrate the determination and use of a texture patch.
Figures 13a and 13b each show a texture atlas that comprises a plurality of texture patches.
Figures 14a and 14b show methods of compressing a plurality of images.
Figures 15a and 15b show methods of performing clustering.
Figure 16 shows clusters that may be obtained, e.g., using the method of Figure 15b.
Figure 17 shows schematic examples of a plurality of deltas generated as a result of a plurality of delta implementations.
Figure 18 shows a method for generating a modified texture patch.
Figure 19 shows a method for processing a texture point to obtain attribute values in dependence on a texture patch.
Figure 20 shows a method for processing a texture point to obtain attribute values in dependence on a texture patch and a delta.
Figure 21 shows a schematic of a bitstream.
Description of the Preferred Embodiments
Referring to Figure 1, there is shown a system for generating a sequence of images. This system can be used to generate, and then display, a representation of an environment, which may comprise a VR environment (or an XR environment).
The system comprises an image generator 11, an encoder 12, a transmitter 13, a network 14, a receiver 15, a decoder 16 and a display device 17.
These components may each be implemented on separate apparatuses. Equally, various combinations of these components may be implemented on a shared apparatus; for example, the image generator 11, the encoder 12, and the transmitter 13 may all be part of a single image data generation device. Similarly, the receiver 15, the decoder 16, and the display device 17 may all be a part of a single image rendering device.
Typically, the system comprises at least one encoding computer device (e.g. a server of a content provider) and at least one rendering computer device (e.g. a VR headset).
Referring to Figure 2, each of the components, and in particular the image generator 11, the encoder 12, the transmitter 13, the receiver 15, the decoder 16 and the display device 17, is typically implemented on a computer device 20, where, as described above, a plurality of these components may be implemented on a shared computer device.
Each computer device comprises one or more of: a processor 21 for executing instructions (e.g. so as to perform one or more of the steps of the various methods described below); a communication interface 22 for facilitating communication between computer devices (e.g. an Ethernet interface, a Bluetooth® interface, or a universal serial bus (USB) interface); a memory 23 and/or storage 24 for storing information and instructions (e.g. a random access memory (RAM), a read only memory (ROM), a hard disk drive (HDD), a solid state drive (SSD), and/or a flash memory); and a user interface 25 (e.g. a display, a mouse, and/or a keyboard) for enabling a user to interact with the computer device. These components may be coupled to one another by a bus 25 of the computer device.
The computer device 20 may comprise further (or fewer) components. In particular, the computer device (e.g. the display device 17) may comprise one or more sensors, such as an accelerometer, a GPS sensor, or a light sensor. These sensors typically enable the computer device to identify an environmental condition and/or an action of a wearer of the display device.
Turning back to Figure 1, the image generator 11 is configured to generate a sequence of image data (e.g. a sequence of image frames) to enable the display device 17 to use this image data to display a plurality of images. The image data may comprise one or more digital objects and the image data may be generated or encoded in any format. For example, the image data may comprise point cloud data, where each point has a 3D position and one or more attributes. These attributes may, for example, include a surface colour, a transparency value, an object size and a surface normal direction. Each attribute may have a value chosen from a continuous range or may have a value chosen from a discrete set.
The image data enables the later rendering of images. This image data may enable a direct rendering (e.g. the image data may directly represent an image). Equally, the image data may require further processing in order to enable rendering. For example, the image data may comprise three-dimensional point cloud data, where rendering a two-dimensional image using this data requires processing based on a viewpoint of this two-dimensional image.
The image data may comprise depth map data, where one or more pixels or objects in the image is associated with a depth that is specified by the depth map data. The depth map data may be provided as a depth map layer, separate from an image layer. In some contexts, such as MPEG Immersive Video (MIV), the image layer may instead be described as a texture layer. Similarly, in some contexts, the depth map layer may instead be described as a geometry layer.
The image data may include a predicted display window location. The predicted display window location may indicate a portion of an image that is likely to be displayed by the display device 17. The predicted display window location may be based on a viewing position (such as a virtual position and/or orientation of the user in a 3D environment) of the user, where this viewing position may be obtained from the display device. The predicted display window location may be defined using one or more coordinates. For example, the predicted display window location may be defined using the coordinates of a corner or centre of a predicted display window, and may be defined using a size of the predicted display window. The predicted display window location may be encoded as part of metadata included with the frame.
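By way of a non-limiting illustration, the following sketch (in Python; the coordinate convention and field names are assumptions) shows predicted display window metadata defined by a corner coordinate and a size, and its use to select the corresponding portion of a frame.

```python
# Illustrative sketch of predicted display window metadata carried with a frame.
from dataclasses import dataclass

@dataclass
class PredictedDisplayWindow:
    corner_x: int   # top-left corner of the predicted window, in pixels
    corner_y: int
    width: int
    height: int

def crop_to_window(frame, window: PredictedDisplayWindow):
    """Select the portion of the frame covered by the predicted display window."""
    return frame[window.corner_y:window.corner_y + window.height,
                 window.corner_x:window.corner_x + window.width]
```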
The image data for each image (e.g. each frame) may include further information, which may be provided as a part of an image, e.g. as part of the point cloud data, or as separate layers. In particular, the image data may include audio information or haptic feedback information indicating audio or haptics which can accompany displayed visual data. An audio layer or haptic layer may accompany each image, and may be omitted for images where no accompanying audio or haptics are required.
Similarly, the image data may comprise interactivity information, where the image data may contain or indicate elements with which a user can interact. The interactivity information may, for example, define a behaviour of an element, where a user is able to interact with the element based on this behaviour. The behaviour typically defines a change in an element that occurs as a result of a user interaction where this change may comprise a change in the attributes of the element or in the rendering of the element. As an example, where an image contains a target element, the target element may be arranged to disappear when a user interacts with this element, or to provide feedback indicating that the user has interacted with the target. This interactivity data may be provided as part of, or separately to, the image data.
The image data may indicate, or may be combinable with, a state of the virtual environment, a position of a user, or a viewing direction of the user. Here, the position and viewing direction may be physical properties of the user in the real world, or they may be purely virtual, for example being controlled using a handheld controller. The image generator 11 may, for example, obtain information from the display device 17 that indicates the position, viewing direction, or motion of the user. Equally, the image generator may generate image data such that it can later be combined with this position, viewing direction, or motion, where the image generator may generate a full scene which is only partially viewed by a user depending on the position of that user.
In some cases, the generated image may be independent of user position and viewing direction. This type of image generation typically requires significant computer resources such as a powerful GPU, and may be implemented in a cloud service, or on a local but powerful computer. For example, a cloud service (such as a content rendering network (CRN)) may reduce the per-user cost and thereby make image frame generation more accessible to a wider range of users. Here "rendering" refers at least to an initial stage of rendering to generate an image. Further rendering may occur at the display device 17 based on the generated image to produce a final image which is displayed.
The image generator 11 may, for example, comprise a rendering engine for initially rendering a virtual environment such as a game or a virtual meeting room.
The encoder 12 is configured to encode frames to be transmitted to the display device 17. The encoder may be implemented using executable software or may be implemented on specific hardware such as an ASIC. In some embodiments, the image generator 11 may transmit raw, unencoded, data through the network 14. However, such transmission typically leads to a high file size and requires a high bandwidth so that it is typically desirable to encode the data prior to the transmission.
The encoder 12 may encode the image data in a lossless manner or may encode the data in a lossy manner. The encoder may apply inter-frame or intra-frame compression based on a currently-encoded frame and optionally one or more previously encoded frames. The encoder may be a multi-layer encoder, such as a low complexity enhancement video codec (LCEVC) enabled encoder.
Where the generated frames comprise depth map data, the encoder 12 may perform layered encoding on each instance of image data (e.g. each frame) to generate an encoded frame comprising a base depth map layer and an enhancement depth map layer. Encoding a depth map in this way may improve compression. In some applications, such as HDR video, depth maps are desirably highly detailed with a bit depth of up to twelve or fourteen bits, which is a significant increase in the data to be transmitted. As a result, providing ways to improve compression of the depth map can make more realistic depth map-based displays viable when performing rendering or transmission of rendered data in real-time. Furthermore, this type of layered encoding makes it easy to drop (and then pick back up) one or more of the layers, which provides flexibility and tools for bandwidth management.
Layered encoding is also helpful as the final decoder/user device (such as a user display device) can choose whether to process these extra layers. For example, in a non-layered approach, the best the end device (i.e. the receiver, decoder or display device associated with a user that will view the images) can do is determine that it does not have enough resources for a given quality (be it resolution, frame rate, or inclusion of a depth map) and then signal to the controller/renderer/encoder that it does not have enough resources. The controller will then send future images at a lower quality. In that alternative scenario, the end device still unfortunately has to process the higher quality data until the lower quality data arrives, if it can process the received images at all.
In some of the described embodiments, this situation is improved upon because when/if the end device determines for example that it does not have the processing capabilities to handle the highest level of quality, then it can drop and/or choose not to process certain layers. The end device may also signal to the controller that it needs a lower level of quality, but in the meantime the end device can only process the number of layers that it can handle. Therefore, the end device can react to conditions much more quickly.
In some cases, depth map data may be embedded in image data. In this case, the base depth map layer may be a base image layer with embedded depth map data, and the enhancement depth map layer may be an enhancement image layer with embedded depth map data.
Alternatively, when the generated images comprise a depth map layer separate from an image layer and multi-layer encoding is applied, the encoded depth map layers may be separate from the encoded image layers. This has the advantage that the encoded depth map layers can be dropped under some conditions while still retaining image layers that can be displayed (albeit with a lower level of realism). For example, the encoded depth map layers can be dropped by a transmitter or encoder when available communication resources are reduced, or can be dropped by an end device which lacks the processing resources to handle the highest level of quality.
Similarly, if some images comprise an audio base layer, a haptic feedback base layer, an audio enhancement layer or a haptic feedback enhancement layer, these can be processed or dropped flexibly.
Again similarly, if some images comprise an interactivity data base layer or an interactivity enhancement layer these can be processed or dropped flexibly. For example, certain interactions may only be possible where a threshold bandwidth is available, where complex interactions (e.g. those enabling a conversation with a digital object) may be disabled before less complex interactions (e.g. changing a pixel colour) are disabled.
Additionally or alternatively, where the image data comprises point cloud data, the encoder may apply a point cloud data encoding technique such as described in European patent application EP21386059.6, which is incorporated herein by reference. Such a point cloud encoder may act as a base encoder for a layered encoding technique such as LCEVC or VC-6. Notably LCEVC and VC-6 techniques encode and decode a layered signal, but are agnostic about the content type of data encoded in the signal. For example, the signal can include textures, video frames, geometry or depth data, meshes, point clouds, rendering attributes or physics engine attributes.
The transmitter 13 may be any known type of transmitter for wired or wireless communications, including an Ethernet transmitter or a Bluetooth transmitter.
The transmitter 13 may be configured to make decisions about how to transmit the image data, and/or may provide feedback to the encoder 12 or the image generator 11. For example, the transmitter may determine available communication resources (e.g. bandwidth) for transmitting image data, and may drop one or more layers from an encoded frame, or indicate to the image generator and/or encoder that image data should be generated and encoded with fewer layers, when insufficient bandwidth is available for transmission of all generated data. As specific examples, the transmitter may be configured to drop a depth map layer, an LCEVC enhancement layer, or a VC-6 enhancement layer from a frame when insufficient communication resources are available.
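By way of a non-limiting illustration, the following sketch (in Python; the layer names, sizes and priorities are assumptions) shows a transmitter-side decision that keeps the highest-priority layers that fit within the available communication resources and drops the rest.

```python
# Illustrative sketch: drop enhancement or depth map layers when the estimated
# bandwidth is insufficient. Layer names, sizes and priorities are assumptions.
def select_layers(layers, available_bandwidth_bits: int):
    """layers: list of (name, size_in_bits, priority); lower priority is dropped first."""
    kept, used = [], 0
    for name, size, _priority in sorted(layers, key=lambda l: l[2], reverse=True):
        if used + size <= available_bandwidth_bits:
            kept.append(name)
            used += size
    return kept

frame_layers = [
    ("base_image", 400_000, 3),
    ("image_enhancement", 250_000, 2),
    ("base_depth_map", 150_000, 1),
    ("depth_map_enhancement", 100_000, 0),
]
transmitted = select_layers(frame_layers, available_bandwidth_bits=700_000)
# -> ['base_image', 'image_enhancement'] under this illustrative budget
```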
The network 14 provides a channel for communication between the transmitter 13 and the receiver 15, and may be any known type of network such as a WAN or LAN or a wireless Wi-Fi or Bluetooth network. The network may further be a composite of several networks of different types. Many users only have access to a network with a bandwidth of 30 Mbps, which can lead to latency jitter when streaming. The required bandwidth and the observed latency can be reduced by means of tactics such as forward-looking rendering and last-millisecond reprojection, which are enabled by improved compression.
The receiver 15 may be any known type of receiver for wired or wireless communications, including an Ethernet receiver or a Bluetooth receiver.
The decoder 16 is configured to receive and decode an encoded frame. The decoder may be implemented using executable software or may be implemented on specific hardware such as an ASIC.
The display device 17 may for example be a television screen or a VR headset. The timing of the display may be linked to a configured frame rate, such that the display device may wait before displaying the image. The display device may be configured to perform warping, that is, to obtain a final display window location, adjust a warpable image to obtain a final image corresponding to a final viewing direction of the user, and display the final image.
In this regard, the image data is typically arranged to provide a warpable image for which a portion of the image that is displayed at the display device 17 is dependent on a position or orientation of a viewer. The warpable image may then be rendered before a most up to date viewing direction of the user is known. The warpable image may be transmitted to the display device, or the warpable image may be transmitted to a rendering node which is near to the display device, and the display device or rendering node may perform time warping to generate a displayed image portion based on the warpable image and the most up to date viewing direction of the user.
As mentioned above, a single device may provide a plurality of the described components. For example, a first rendering node may comprise the image generator 11, encoder 12 and transmitter 13. Additional similar rendering nodes may be included in the system, and may work together to generate the sequence of frames.
In one case, multiple rendering nodes may each provide separate image data to an image data assembling node; for example, each rendering node may provide a part of a sequence of frames to a frame assembling node.
For example, the receiver 15, decoder 16 or display device 17 may be configured to assemble parts of image data from multiple sources to generate a sequence of images for display on the display device.
Alternatively, the image data assembling node may be separate from the receiver 15, decoder 16 and display device 17.
Additionally or alternatively, multiple rendering nodes may be chained. In other words, successive rendering nodes may add to a sequence of image data as it passes from rendering node to rendering node, and eventually a complete sequence of image data is then provided to the receiver 15. Furthermore, each rendering node may obtain components of a render from multiple upstream rendering nodes and/or distribute components of a render to multiple downstream rendering nodes.
A chain of rendering nodes may be useful for performing different rendering tasks that require different quantities of processing resources, or different frame rates. For example, a company may provide distributed processing in the form of a centralised hub which has abundant processing resources but is distant from users, and peripheral locations which have more scarce processing resources but are closer to users. Expensive but fairly static rendering features such as background lighting or environmental impact on sound may be generated at the central hub (for example using ray tracing), while features that require fewer resources but faster responses or higher frame rates may be generated closer to the user. In other words, the more responsive a rendering feature needs to be, the lower the latency needs to be between the rendering node which generates the feature and the user display and, in a chain of rendering nodes, the node which generates each rendering feature can be chosen based on a required maximum latency of that feature. On the other hand, if it is expensive to generate a rendering feature, then it may be preferable to generate the feature less frequently and with a higher maximum latency. For example, a static, high-quality background feature may be generated early in the chain of rendering nodes and a dynamic, but potentially lower-quality, foreground feature may be generated later in the chain of rendering nodes, closer to the user device. Here, environmental impact on sound means, for example, that a set of surfaces may be constructed where each surface has different sound reflection and absorption properties depending upon material and shape.
The frame rates may be matched by creating multiple frames with features generated at the lower frame rate, and combining them with the frames with features generated at the higher frame rate. In a non-limiting embodiment, a preliminary rendering generates volumetric object data including motion vectors at a first (lowest) frame rate, then produces 2D rendered frames plus depth information for a specific user at a second (higher) frame rate, then transmits video plus depth data to the user device, which produces final frames for display via space warping (depth-based reprojections) at a third (highest) frame rate. One or more of these steps may be performed in combination with the other described embodiments.
The viewing position of the user may change as additional rendering tasks are performed at different rendering nodes in the chain. Each or any rendering node may obtain an updated viewing position before performing its respective rendering task.
Additionally, the system may simultaneously generate multiple sequences of image data for different respective users or different respective display devices. For example, in the context of a VR or AR experience, each user or display device may view a different 3D environment, or may view different parts of a same 3D environment. When using a chain of rendering nodes, each node may serve multiple users or just one user.
For example, a starting rendering node (e.g. at a centralised hub) may serve a large group of users. For example, the group of users may be viewing nearby parts of a same 3D environment. In this case, the starting node may render a wide zone of view ("field of view") which is relevant for all users in the large group.
The starting node may send this wide field of view to a first middle rendering node which renders additional aspects of the 3D environment. These additional aspects may for example be aspects which require less processing power to render, or may be aspects which are specific to individual users of the group. Additionally, the middle rendering node may render features in a smaller field of view than the starting node - this smaller field of view may be relevant to each user rather than the group of users. The first middle rendering node may additionally only serve a smaller number of users (e.g. half of the large group of users), with the remaining users being served by a second middle rendering node which also receives the wide field of view from the starting node.
The middle rendering node(s) may then send sequences of second partially or fully rendered frames to an end device for each user. The end device may perform further processes such as warping or focal distance adjustments, optionally using depth map data.
Preferably, each rendering node encodes the partially or fully rendered frames before transmitting them on to a next rendering node or to the receiver 15. This means that the required communication resources can be reduced when the rendering nodes are separated by one or more networks, or more generally are implemented in a distributed system such as a cloud.
However, each rendering node in a chain is encoding a different partially or fully rendered frame, with different data. Therefore, it may be advantageous for different rendering nodes to use different rendering formats and/or encoding formats. For example, the output from a first rendering node may be point cloud data which logically describes a 3D scene. This point cloud data can be encoded using the techniques of EP21386059.6. A second rendering node may then operate on the point cloud data to generate image data that is more readily displayed by a generic display device, without requiring the display device to model the 3D environment. This image data may be encoded using video coding techniques.
The chaining of rendering nodes may be extended to arbitrary tree structures, where a rendering node obtains partially rendered frames from more than one preceding rendering node, and generates further partially or fully rendered frames based on the multiple obtained sequences of partially rendered frames.
For example, a content rendering network (CRN) comprising numerous rendering nodes may be used to serve a volumetric event to a large number of same-time users, such as users participating in a shared virtual environment. Rendering the same event for each user is far more expensive in terms of computation time and power consumption than rendering the volumetric effect once and performing the rendering equivalent of multicasting the volumetric effect for multiple users. For example, each user may have a second rendering node (such as a VR headset), and the network may comprise a central first rendering node. The first rendering node may render the volumetric event, and distribute partially rendered frames depicting the volumetric event to the different second rendering nodes. The second rendering node for each user may then integrate the partially rendered frames depicting the volumetric event into a view of the virtual environment which is currently being shown to each user, based on parameters such as the user's virtual position.
The receiver 15, decoder 16 and display device 17 may be consolidated into a single device, or may be separated into two or more devices. For example, some VR headset systems comprise a base unit and a headset unit which communicate with each other. The receiver 15 and decoder 16 may be incorporated into such a base unit.
In some embodiments, the network 14 may be omitted. For example, a home display system may comprise a base unit configured as an image source, and a portable display unit comprising the display device 17.
In the event that the decoder 16 or the display device 17 does not or cannot handle one or more layers, the receiver 15 or another transmitter associated with the decoder or display device may send a corresponding layer drop indication back through the network 14. The layer drop indication may be received by each rendering node. A rendering node which generates partially or fully rendered frames for that specific decoder or display device may cease generating the dropped layer. On the other hand, a rendering node which generates partially or fully rendered frames for multiple end devices may disregard a layer drop indication received from one end device (as the dropped layer is still needed for other devices). Alternatively, rendering nodes which serve multiple end devices may record received layer drop indications, and may cease generating the dropped layer only when all end devices served by the rendering node indicate that the layer is to be dropped.
In preferred examples, the encoders or decoders are part of a tier-based hierarchical coding scheme or format. Hierarchical coding enables frames to be communicated with higher resolution and/or higher frame rate than is possible in single-tier coding schemes. In hierarchical coding, one or more enhancement layers are communicated with base data, where the enhancement layers can be used to up-sample the base data at the decoder, for example providing up-sampling in a spatial or temporal dimension. When combined with equivalent down-sampling of the original frames and generation of the enhancement layer at an encoder, hierarchical coding can overall provide lossless compression of data, with higher resolution and/or higher frame rate for a given transmission bit rate. Examples of a tier-based hierarchical coding scheme include LCEVC: MPEG-5 Part 2 LCEVC ("Low Complexity Enhancement Video Coding") and VC-6: SMPTE VC-6 ST-2117, the former being described in PCT/GB2020/050695, published as WO 2020/188273, (and the associated standard document) and the latter being described in PCT/GB2018/053552, published as WO 2019/111010, (and the associated standard document), all of which are incorporated by reference herein. However, the concepts illustrated herein need not be limited to these specific hierarchical coding schemes.
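By way of a non-limiting illustration, the following sketch (in Python; it is not LCEVC or VC-6 themselves, and the simple 2x down-sampling scheme is an assumption) shows the general principle of a tier-based scheme in which a base layer is a down-sampled frame and an enhancement layer carries the residual needed to reconstruct the full-resolution frame after up-sampling.

```python
# Illustrative sketch of a two-tier scheme: base layer = down-sampled frame,
# enhancement layer = residual between the original and the up-sampled base.
import numpy as np

def encode_layers(frame: np.ndarray):
    base = frame[::2, ::2]                                             # 2x down-sample
    upsampled = np.kron(base, np.ones((2, 2), dtype=frame.dtype))       # nearest-neighbour up-sample
    enhancement = frame.astype(np.int16) - upsampled.astype(np.int16)   # residual
    return base, enhancement

def decode_layers(base: np.ndarray, enhancement: np.ndarray) -> np.ndarray:
    upsampled = np.kron(base, np.ones((2, 2), dtype=base.dtype))
    return (upsampled.astype(np.int16) + enhancement).astype(np.uint8)

frame = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
base, enhancement = encode_layers(frame)
assert np.array_equal(decode_layers(base, enhancement), frame)  # lossless reconstruction
```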
A further example is described in WO 2018/046940, which is incorporated by reference herein. In this example, a set of residuals are encoded relative to the residuals stored in a temporal buffer.
LCEVC (Low-Complexity Enhancement Video Coding) is a standardised coding method set out in standard specification documents including the Text of ISO/IEC 23094-2 Ed 1 Low Complexity Enhancement Video Coding published in November 2021, which is incorporated by reference herein.
The system described above is suitable for generating and presenting a representation of a scene, where this scene displays media content to a user. The scene typically comprises an environment, where the user is able to move (e.g. to move their head or to turn their head) to look around the environment and/or to move around the environment. For example, the scene may be a scene of a room in a building, where the user is able to move around the room (e.g. by moving in the real world and/or by providing an input to a user interface) in order to inspect various parts of the room. Typically, the scene is an XR (e.g. a VR) scene, where the user is able to move about the scene in three degrees of freedom (3DoF) or six degrees of freedom (6DoF) so as to experience the scene.
As has been described with reference to Figure 1, the image generator 11 may be arranged to determine point cloud data, where each point of the point cloud has a 3D position and one or more attributes. More generally, the image generator (or another component) is arranged to determine a three-dimensional representation of a scene, where this three-dimensional representation is thereafter used to generate two-dimensional images that are presented to a user at the display device 17.
While the points are typically points of a point cloud, more generally the disclosure extends to any point that is associated with a location and a value. Therefore, the points may, more generally, be considered to be data (or datapoints), which data is associated with a location and a value, and the 'points' may comprise polygons, planes (regular or irregular), Gaussian splats, etc. Referring to Figure 3, there is described a method of determining (an attribute for) a point of such a three-dimensional representation. The method comprises determining the attribute using a capture device, such as a camera or a scanner. The scene may comprise a real scene, in which attribute values are captured using a camera, or a virtual scene (e.g. a three-dimensional model of a scene), in which attribute values are captured using a virtual scanner.
Where this disclosure describes 'determining a point', it will be understood that this generally refers to determining a point that has a location and an attribute value, where determining the point comprises determining the attribute value and/or storing a point that comprises at least an attribute value and a location value (these values may be indirect values, e.g. where the location is identified relative to another point).
Once a plurality of points have been captured, these points can be stored as a three-dimensional representation (e.g. a point cloud) so as to enable the reconstruction of the three-dimensional scene based on this representation.
Typically, the scene comprises a simulated scene that exists only on a computer. Such a scene may, for example, be generated using software such as the Maya software produced by Autodesk. The attributes determined using the methods described herein may then depend on virtual objects located within the scene as well as a virtual lighting arrangement used in the scene.
In a first step 31, a computer device initiates a capture process for a capture device, the capture process being initiated with an initial azimuth angle (e.g. of 0°) and an initial elevation angle (e.g. of 0°).
In a second step 32, the computer device causes a point to be captured using the capture device at the current azimuth angle and current elevation angle. Capturing a point typically comprises assigning an attribute value to the point, which attribute value may, for example, be a color of the point and/or a transparency value of the point. Typically, the point has one or more color values associated with each of a left eye and a right eye of a viewer. Capturing the point may also comprise determining a normal value associated with the point, e.g. a normal of a surface on which the point lies. Typically, capturing the point further comprises determining a location of the point, e.g. by determining a distance of the point from the camera.
In practice, determining the point may comprise sending a 'ray' from the capture device and then stepping through a computer model to determine which surface of the computer model is impacted by the ray. The color, transparency, and normal of this surface are then recorded alongside the distance of the surface from the capture device.
In a third step 33, the computer device determines whether a point has been captured for the capture device at each azimuth of a range of azimuths and, in a fourth step 34, if points have not been captured at each azimuth, then the azimuth angle is incremented and the method returns to the second step 32 and another point is captured. The azimuth angle may, for example, be incremented by between 0.01° and 1° and/or by between 0.025° and 0.1°. Typically, the range of azimuth angles is selected to be 360° (i.e. so that the capture device captures points surrounding the entirety of the capture device), but it will be appreciated that other ranges are possible. Once a point has been captured for each azimuth, in a fifth step 35, the computer device determines whether a point has been captured for the capture device at each elevation of a range of elevations and, in a sixth step 36, if points have not been captured at each elevation, then the azimuth angle is reset to the initial value, the elevation angle is incremented and the method returns to the second step 32 and another point is captured. The elevation angles may, for example, be incremented by between 0.01° and 1° and/or by between 0.025° and 0.1°. Typically, the range of elevation angles is selected to be 360° (i.e. so that the capture device captures points surrounding the entirety of the capture device), but it will be appreciated that other ranges are possible.
In a seventh step 37, once points have been captured for each azimuth angle and each elevation angle, the scanning process ends.
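The following Python sketch illustrates one possible reading of the scan loop of Figure 3 (steps 31 to 37), assuming a hypothetical capture_point(azimuth, elevation) callback that returns whatever data (attribute, distance, normal) is recorded for a single ray; the function name, the default ranges and the coarse step used in the usage example are illustrative only.

```python
# A minimal sketch of the Figure 3 scan loop, not a prescribed implementation.
def scan(capture_point, azimuth_range=360.0, elevation_range=360.0, step=0.05):
    points = []
    elevation = 0.0
    while elevation < elevation_range:            # fifth/sixth steps 35, 36
        azimuth = 0.0
        while azimuth < azimuth_range:            # third/fourth steps 33, 34
            point = capture_point(azimuth, elevation)   # second step 32
            points.append((azimuth, elevation, point))
            azimuth += step
        elevation += step
    return points                                 # seventh step 37


# Usage with a coarse 90 degree step so the demo finishes quickly.
demo = scan(lambda az, el: {"colour": (az, el)}, step=90.0)
print(len(demo))   # 16 points at this coarse step
```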
This method enables a capture device to capture points at a range of elevation and azimuth angles. This point data is typically stored in a matrix. The point data may then be used to provide a representation of the scene to a user, e.g. the three-dimensional representation formed by the point data may be processed to produce two-dimensional images for each eye of a user, with these images then being shown to a user via the display device 17 to provide a virtual reality experience to the viewer. By using the captured data, a video can be provided to a viewer that enables the viewer to move their head to look around the scene (while remaining at the location of the capture device).
It will be appreciated that the capture pattern (or scanning pattern) described with reference to Figure 3 is purely exemplary and that numerous capture patterns are possible. In general, the capture process for each capture device comprises capturing one or more points at one or more azimuth angles and/or one or more elevation angles.
The 'points' captured by the capture device are typically associated with a size, such as a height, a width, or a depth. That is, the points typically relate to two-dimensional planes/pixels and/or three-dimensional voxels. In this regard, there is necessarily some space between the locations of adjacent points (since if the points had no width, then an infinite number of points would be required to capture points at each angle). The size provides points that depict a non-negligible area of the three-dimensional space so that a plurality of points can be fit together to provide a depiction of the scene to a viewer.
The width and height of each point is typically dependent on the distance of that point from the capture device, where more distant points have a larger width/height. The width and height of each point is typically determined so that when each point is displayed, there is no space between adjacent points (indeed, there may be some overlap between points to ensure that no gaps appear between points). This height/width of each point can be determined at the time of capturing the points, or can be determined or defined after the capture of the points.
Typically, the points comprise a size value, which is stored as a part of the point data. For example, the points may be stored with a width value and/or a height value. Typically, the minimum width and the minimum height of a point are set by the angle increment of the azimuth angle and the elevation angle respectively. The size may then be specified in terms of this angle increment and/or in terms of this minimum width/minimum height (e.g. as being a multiple of the angle increment). In some embodiments, the size value is stored as an index, which index relates to a known list of sizes (e.g. if the size may be any of 1x1, 2x1, 1x2, or 2x2 pixels, this may be specified by using 3 bits and a list that relates each combination of bits to a size). The size may be stored based on an underscan value. In this regard, where an object is very near to the viewing zone it may be captured using an unnecessarily dense arrangement of points. Therefore, certain surfaces or areas of the representation may be associated with an underscan value, which underscan value defines a reduction in the number of points captured as compared to a representation without underscan. The size of the points may be defined so as to indicate this underscan value. In an exemplary embodiment, the underscan value is an integer value between 0 and 3 and the size is stored as a combination of point dimensions (e.g. a width in the range [0,2] and a height in the range [0,2]) and an underscan factor (e.g. an underscan factor in the range [0,3]).
In some embodiments, the width and the height are dependent on the underscan factor. For example, when the underscan factor exceeds a threshold value, the possible height and width values may be limited. In a specific example, when the underscan factor is 3, the width and the height may be limited to the range [0,1]. The size may then be defined as size = underscan*9 + height*3 + width. Such a method provides efficient storage and indication of width, height, and underscan values.
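A minimal sketch of this size packing is given below, assuming a width and a height each in the range [0,2] and an underscan factor in the range [0,3]; the helper names are illustrative, and the restriction of width and height to [0,1] when the underscan factor is 3 is not enforced here.

```python
# Pack and unpack the size value described above: size = underscan*9 + height*3 + width.
def pack_size(width, height, underscan):
    assert 0 <= width <= 2 and 0 <= height <= 2 and 0 <= underscan <= 3
    return underscan * 9 + height * 3 + width

def unpack_size(size):
    underscan, remainder = divmod(size, 9)
    height, width = divmod(remainder, 3)
    return width, height, underscan

# Round-trip check for an example size value.
assert unpack_size(pack_size(2, 1, 2)) == (2, 1, 2)
print(pack_size(2, 1, 2))   # 23
```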
As shown in Figure 4a, typically, for each capture step (e.g. each azimuth angle and/or each elevation angle), a plurality of sub-points SP1, SP2, SP3, SP4, SP5 is determined. For example, where the azimuth angle increment is 0.1° then for an azimuth angle of 0°, sub-points may be determined at azimuth angles of -0.05°, -0.025°, 0, 0.025°, and 0.05° (and similar sub-points may be determined for a plurality of elevation angles). Attribute values of these sub-points may then be combined to obtain an attribute value for the point. For example, a maximum attribute value of the sub-points may be used as the value for the point, an average attribute value of the sub-points may be used as the value for the point, and/or a weighted average of the sub-points may be used as the value for the point. It will be appreciated that numerous other methods for combining the attribute values of the sub-points are possible.
By determining the attribute of a point based on the attributes of sub-points, the accuracy of the capture process can be increased. While it would be possible to simply reduce the increment of the angle steps to provide a higher resolution scene, by considering sub-points but only storing attributes for points, a balance can be struck between accuracy and file size (since storing every sub-point would lead to a substantial increase in the amount of data that needs storing).
With the example of Figure 4a, for each point of the three-dimensional representation that is captured by a capture device, this capture device may obtain attributes associated with each of the sub-points SP1, SP2, SP3, SP4, SP5, combine these attributes to obtain a point attribute, and then store a point with a distance that is an average (e.g. a weighted average) of the distances of the sub-points from the capture device, at the nominal angle of the point, with the point attribute.
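As an illustration of this combination of sub-point attributes, the following sketch weights nearer sub-points more heavily when forming the point attribute and the point distance; the (attribute, distance) sample structure and the inverse-distance weighting are assumptions made for this example.

```python
# Combine sub-point samples captured at one azimuth/elevation into a single
# point attribute and distance, giving nearer sub-points higher weight.
def combine_subpoints(samples):
    """samples: list of (attribute, distance) pairs for one capture angle."""
    weights = [1.0 / max(d, 1e-6) for _, d in samples]
    total = sum(weights)
    attribute = sum(a * w for (a, _), w in zip(samples, weights)) / total
    distance = sum(d * w for (_, d), w in zip(samples, weights)) / total
    return attribute, distance


print(combine_subpoints([(0.9, 1.0), (0.8, 1.2), (0.1, 4.0)]))
```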
As shown in Figure 4b, where a plurality of sub-points SP1, SP2, SP3, SP4, SP5 are considered, these points may have different distances from the location of the capture device. In some embodiments, the attributes of the sub-points may be combined in dependence on this distance, e.g. so that sub-points nearer to the capture device have higher weightings.
However, the possibility of sub-points with substantially different distances raises a potential problem. Typically, in order to determine a distance for a point, the distances for the sub-points are averaged. But where the sub-points have substantially different distances and/or are related to different surfaces in the scene, this may result in the point having a distance that does not correspond to any actual surface in the scene. Therefore, the point may seem to hang in space (e.g. to hang between the front and rear surfaces shown in Figure 4b).
Similarly, where the attribute values of the sub-points greatly differ, e.g. if the sub-points SP1 and SP2 are white in colour and the sub-points SP3 and SP4 are black in colour, then the attribute value of the point may be substantially different to the attribute value of other points in the scene. In an example, if the scene were composed of black and white objects, the point may appear as a grey point hanging in space between these objects.
In some embodiments, the computer device is arranged to aggregate sub-points so as not to create any floating points. For example, the computer device may determine whether the sub-points are spatially coherent by employing a clustering algorithm (e.g. a k-means clustering algorithm). Where the sub-points are spatially coherent (e.g. where a difference in the distance of the sub-points is below a threshold value), these distances may be averaged to obtain a distance for the point. Where the sub-points are not spatially coherent, the sub-points may be processed to ensure that the distance of any point places it upon a surface; for example, in the system of Figure 4b, sub-points SP1, SP2, and SP3 may be grouped into a first point and sub-points SP4 and SP5 may be grouped into a second point. Since each sub-point is associated with the same capture device and capture angle (all of these sub-points being associated with a capture step that has a particular azimuth angle and elevation angle), these points may be located at the same angle with respect to a capture device. Therefore, to ensure that each sub-point affects the representation considered, the first point (made up of sub-points SP1, SP2, and SP3) may have a smaller distance value than the second point (made up of sub-points SP4 and SP5) and the first point may be assigned a nonzero transparency value so that the second point can be seen through the first point.
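The following sketch illustrates one way of grouping sub-points into spatially coherent clusters so that no aggregated point floats between surfaces, in the spirit of the example of Figure 4b; the gap threshold, the simple distance-based grouping (rather than, say, k-means), and the fixed transparency assigned to nearer groups are all assumptions made for this illustration.

```python
# Group sub-points by distance so each aggregated point lies on one surface;
# nearer groups are given a non-zero transparency so rear groups remain visible.
def group_subpoints(samples, gap_threshold=0.5):
    """samples: list of (attribute, distance) pairs for one capture angle."""
    ordered = sorted(samples, key=lambda s: s[1])        # sort by distance
    groups, current = [], [ordered[0]]
    for sample in ordered[1:]:
        if sample[1] - current[-1][1] > gap_threshold:   # large gap: new surface
            groups.append(current)
            current = []
        current.append(sample)
    groups.append(current)

    points = []
    for i, group in enumerate(groups):
        attribute = sum(a for a, _ in group) / len(group)
        distance = sum(d for _, d in group) / len(group)
        transparency = 0.5 if i < len(groups) - 1 else 0.0  # see-through front points
        points.append({"attribute": attribute, "distance": distance,
                       "transparency": transparency})
    return points


print(group_subpoints([(1.0, 2.0), (1.0, 2.1), (0.9, 2.05), (0.2, 6.0), (0.3, 6.1)]))
```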
By capturing points at a plurality of azimuth angles and elevation angles, e.g. using the method described with reference to Figure 3, it is possible to provide a three-dimensional representation of the scene that can later be used to enable a viewer to view the scene from a plurality of angles. More specifically, given the three-dimensional points captured by the capture device, a computer device is able to render a two-dimensional representation (e.g. a two-dimensional image) of the scene for each eye of a viewer so as to provide a representation with an impression of depth. The computer device may render a series of two-dimensional representations to enable the viewer to look around the scene, where the two-dimensional representations are rendered based on an orientation of the viewer's head. In this way, the determined representation is useable to provide, for example, a virtual reality (VR), mixed reality (MR), augmented reality (AR), and/or extended reality (XR) experience to the viewer.
To enable such a display, the display device 17 is typically a virtual reality headset that comprises a plurality of sensors to track a head movement of the user. By tracking this head movement, the display device is able to update the images being displayed to the viewer as the viewer moves their head to look about the scene. Typically, this involves the display device sending the sensor data to an external computer device (e.g. a computer connected to the display device via a wire). The external computer device may comprise powerful graphics processing units (GPUs) and/or central processing units (CPUs) so that the external computer device is able to rapidly render appropriate two-dimensional images for the viewer based on the three-dimensional images and the sensor data.
In some embodiments, the external computer device may comprise a server device, where the display device 17 may be connected to this server device wirelessly. This enables the two-dimensional images to be streamed from the server to the display device so as to enable the display of high-quality images without the need for a viewer to purchase expensive computer equipment. In other words, operations that require large amounts of computing power, such as the rendering of two-dimensional images based on the three-dimensional representation, may be performed by the server, so that the display device is only required to perform relatively simple operations. This enables the experience to be provided to a wide range of viewers.
In some embodiments, a first two-dimensional image is provided to the display device 17 (and/or a connected device) and this first image is 'warped' in order to provide an image for viewing at the display device. The warping of the image comprises processing the image based on the sensor data in order to provide an image that matches a current viewpoint of the viewer. By performing the warping at the display device or another local device, the lag between a head movement of the user and an updating of the two-dimensional representation of the scene can be reduced.
One issue with the above-described method of capturing a three-dimensional representation is that it only enables a viewer to make rotational movements. That is, since the points are captured using a single capture device at a single capture location, there is no possibility of enabling translational movements of a viewer through a scene. This inability to move translationally can induce motion sickness within a viewer, can reduce a degree of immersion of the viewer, and can reduce the viewer's enjoyment of the scene.
Therefore, it is desirable to enable translational movements through the scene. To enable such movements, the three-dimensional representation of the scene may be captured using a plurality of capture devices placed at different locations (or the same capture device placed at different locations). A viewer is then able to move around the scene translationally (e.g. by moving between these locations).
More generally, by capturing points for every possible surface that might be viewed by a viewer, a three-dimensional representation of a scene may be captured that allows a suitable two-dimensional representation of this scene to be rendered regardless of a location of a viewer (e.g. regardless of where a user is standing within a virtual room).
This need to capture points for every possible surface (so as to enable movement about a scene) greatly increases the amount of data that needs to be stored to form the three-dimensional representation.
Therefore, as has been described in the application WO 2016/061640 Al, which is hereby incorporated by reference, the three-dimensional representation may be associated with a viewing zone, or a zone of viewpoints (ZVP), where the three-dimensional representation is arranged to enable a user to move about the viewing zone so as to view the scene.
Figure 5 illustrates such a viewing zone 1 and illustrates how the use of a viewing zone limits the amount of image data that needs to be stored to provide a three-dimensional representation of the scene. With the scene shown in this figure, and the viewing zone 1 shown in this figure, it is not necessary to determine attribute data for the occluded surface 2 since this occluded surface cannot be viewed from any point in the viewing zone. Therefore, by enabling the user to only move within the viewing zone (as opposed to around the whole scene) the amount of data needed to depict the scene is greatly reduced.
While Figure 5 shows a two-dimensional viewing zone, it will be appreciated that in practice the viewing zone 1 is typically a three-dimensional zone or volume.
The viewing zone 1 may, for example, comprise a rectangular volume, or a rectangular parallelepiped, and the viewing zone may have a height of at least 30 cm, a depth of at least 30 cm, and/or a width of at least 30 cm, where these dimensions enable a user to move their head while remaining in the viewing zone. This is merely an exemplary arrangement of the viewing zone; it will be appreciated that viewing zones of various shapes and sizes may be used (e.g. spherical viewing zones). That being said, it is preferable that the viewing zone is limited so as to cover only a part of the volume of the scene, e.g. no more than 50% of the scene, no more than 25% of the scene, and/or no more than 10% of the scene. In this regard, if the viewing zone is the same size as the scene, then the three-dimensional representation will simply be a standard representation for virtual reality (that enables a user to move freely about the scene), and so the use of the viewing zone will not provide any reduction in file size.
The viewing zone 1 enables movement of a viewer around (a portion of) the scene. For example, where the scene is a room, the base representation may enable a user to walk around the room so as to view the room from different angles. In particular, the viewing zone enables a user to move through the scene with six degrees-of-freedom (6DoF) movement through the scene, where this aids in the provision of an immersive experience.
In some embodiments, the viewing zone 1 may be four-dimensional, where a three-dimensional location of the viewing zone changes over time, and in such embodiments the size and location of the occluded surface 2 may also change over time. More generally, it will be appreciated that viewing zones may be formed in any size or shape, with different sizes and shapes being suitable for different scenes.
The volume of the viewing zone 1 is typically selected so that a user is able to move to a degree sufficient to avoid motion sickness and to provide an immersive sensation, while still only enabling a limited amount of movement (where this leads to a smaller file size as compared to an implementation where a user is able to fully move about the scene). Typically, the viewing zone is arranged to enable a user to move their head while they are sitting or standing, but not to freely roam around a room.
The viewing zone 1 may have a (e.g. real-world) volume of less than five cubic metres (5m3), less than one cubic metre (1 m3), less than one-tenth of a cubic metre (0.1 m3) and/or less than one-hundredth of a cubic metre (0.01m3).
The viewing zone 1 may also have a minimum size, e.g. the viewing zone may have a volume of at least 1% of the volume of the scene, at least 5% of the volume of the scene, and/or at least 10% of the volume of the scene. Similarly, the viewing zone may have a volume of at least one-thousandth of a cubic metre (0.001 m3); at least one-hundredth of a cubic metre (0.01 m3); and/or at least one cubic metre (1 m3).
The 'size' of the viewing zone 1 typically relates to a size in the real world, where if the viewing zone has a length of one metre this means that a user is able to move one metre in the real world while staying within the viewing zone. The size of the viewing zone in the scene may be greater than, equal to, or less than the size of the viewing zone in the real world. For example, the viewing zone may scale a real-world distance so that moving one metre in the real world moves the user less than (or more than) one metre in the scene. This enables the scene to provide different perceptions to the user (e.g. to make the user feel larger or smaller than they are in real life). Similarly, the viewing zone may scale a real-world angle so that rotating one degree in the real world rotates the user less than (or more than) one degree in the scene.
Therefore, a viewing zone with a volume of one cubic metre typically connotes a viewing zone in which the user is able to move about a one cubic metre volume in the real world while remaining in the viewing zone. And this may cause the user to move about a volume that is more than, or less than, one metre in the scene.
Referring to Figure 6a, in order to capture points for each surface and location that is visible from the viewing zone 1, a plurality of capture devices C1, C2, ..., C9 may be used (e.g. a plurality of virtual scanners and/or a plurality of cameras). Each capture device is typically arranged to perform a capture process, e.g. as described with reference to Figure 3, in which the capture device captures points at a plurality of azimuth angles and elevation angles. By locating the capture devices appropriately, e.g. by locating a capture device at each corner of the viewing zone, it can be ensured that most (or all) points of a scene are captured.
Typically, a first capture device C1 is located at a centrepoint of the viewing zone 1. In various embodiments, one or more capture devices C2, C3, C4, C5 may be located at the centre of faces of the viewing zone; and/or one or more capture devices C6, C7, C8, C9 may be located at edges of and/or corners of the viewing zone.
Figure 6a shows a two-dimensional view (e.g. a plan view) of a rectangular viewing zone. It will be appreciated that within this viewing zone each capture device may be located on a shared plane. Equally, the various capture devices may be located on different planes. Referring, for example, to Figure 6b, there is shown a three-dimensional view of a cuboid viewing zone, where there is a capture device located: at the centre of the viewing zone; at the centre of each face of the viewing zone; and at each corner of the viewing zone.
With this arrangement, many locations in the scene (e.g. specific surfaces) will be captured by a plurality of capture devices so that there will be overlapping points relating to different capture devices. This is shown in Figure 7, which shows a first point P1 being captured by each of a first capture device C1, a sixth capture device C6, and a seventh capture device C7. Each capture device captures this point at a different angle and distance and may be considered to capture a different 'version' of the point.
Typically, only a single version of the point is stored, where this version may be the highest quality version of the point and/or may be the version of the point associated with the nearest and/or least angled capture device.
In this regard, the highest 'quality' version of the point is captured by the capture device with the smallest distance and smallest angle to the point (e.g. the smallest solid angle). In this regard, as described with reference to Figures 4a and 4b, capturing a point for a given azimuth angle and elevation angle typically comprises capturing a plurality of sub-points at varying sub-point azimuth and elevation angles spread around the point azimuth and elevation angles. Due to the different spreads of sub-points, each capture device will capture a different version of the point (that has a different attribute) even when the points are at the same location. Capture devices that are close to the point and less angled with respect to the point typically have a smaller spread of sub-points and so typically obtain a version of a point that is sharper than a version of that point captured by more distant capture devices.
In some embodiments, a quality value of a version of the point is determined based on the spread of sub-points associated with this version (e.g. based on the perimeter formed by these sub-points and/or based on a surface area or volume bounded by these sub-points). The version of the point that is stored may depend on the respective quality values of possible versions of the points.
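The following sketch shows one possible heuristic for choosing which captured version of a point to store, preferring the version with the smallest sub-point spread and breaking ties by distance; the dictionary keys and the specific tie-breaking rule are assumptions made for this example rather than a prescribed quality metric.

```python
# Choose the 'version' of a point to keep: smallest sub-point spread wins,
# with the nearer capture device preferred when spreads are equal.
def select_version(versions):
    """versions: list of dicts with 'subpoint_spread' and 'distance' keys."""
    return min(versions, key=lambda v: (v["subpoint_spread"], v["distance"]))


versions = [
    {"device": "C1", "subpoint_spread": 0.02, "distance": 1.5},
    {"device": "C6", "subpoint_spread": 0.05, "distance": 2.4},
    {"device": "C7", "subpoint_spread": 0.05, "distance": 3.1},
]
print(select_version(versions)["device"])   # C1: smallest spread, then nearest
```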
Regarding the 'versions' of the points, it will be appreciated that two 'points' in approximately the same location captured by each capture device may not have exactly the same location in the three-dimensional representation. More specifically, since each capture device typically projects a 'ray' at a given angle, the rays of differing capture devices may contact the surface at different locations for each capture device. Two points may be considered to be two 'versions' of a single point when they are within a certain proximity, e.g. a threshold proximity. For example, where the first capture device C1 captures a first point and a second point at subsequent azimuth angles, and the sixth capture device C6 captures a further point that is in between the locations of the first point and the second point, this further point may be considered to be a 'version' of one of the first point and the second point.
This difference in the points captured by different capture devices is illustrated by Figures 8a and 8b, which show the separate captured grids that are formed by two different capture devices. As shown by these figures, each capture device will capture a slightly different 'version' of a point at a given location and these captured points will have different sizes. Each capture step is associated with a particular range of angles (e.g. a nominal capture angle of 1° might encompass angles from 0.9° to 1.1°), and therefore capture devices that are far from a point to be captured represent a wider region at the capture distance than capture devices closer to that point to be captured. As shown in Figure 8a, the capture device C1 would capture the points P1 and P2 in separate brackets, whereas for the capture device C2 these points are in the same bracket. Therefore, the capture device C2 might determine a single point that encompasses both points P1 and P2, whereas the capture device C1 would determine separate points for these two points.
Considering then a situation in which points P1 and P2 are captured separately, with capture device C1 used to capture point P1 and capture device C2 used to capture point P2, it should be apparent that the 'sizes' of these captured points, and the locations in space that are encompassed by the captured points, will be based on different grids. For example, the width of the captured point P2 captured by the capture device C2 will be larger than the width of the captured point P1 captured by the capture device C1. The capture process may be determined based on the existence of these different grids, and on the different bracket widths that occur at different distances from a capture device.
Figure 8a shows an exaggerated difference between grids for the sake of illustration. Figure 8b shows a more realistic embodiment in which the three-dimensional representation comprises a plurality of points associated with different capture devices, where these points lie on different grids associated with these different capture devices.
In order to store the points of the three-dimensional representation, the points may be stored as a string of bits, where a first portion of the string indicates a location of the point (e.g. using x, y, z coordinates) and a second portion of the string indicates an attribute of the point. In various embodiments, further portions of the string may be used to indicate, for example, a transparency of the point, a size of the point, and/or a shape of the point.
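Purely as an illustration of this bit-string layout, the following sketch packs one point into a fixed-width string of bits with a location portion followed by attribute and transparency portions; the field widths (16 bits per coordinate, 8 bits per channel) are assumptions made for the example and are not taken from the disclosure.

```python
# Pack one point into a bit string: location portion, then attribute portion,
# then a further (transparency) portion, using illustrative field widths.
def pack_point(x, y, z, r, g, b, transparency):
    bits = ""
    for value, width in ((x, 16), (y, 16), (z, 16),      # location portion
                         (r, 8), (g, 8), (b, 8),         # attribute portion
                         (transparency, 8)):             # further portion
        bits += format(value, f"0{width}b")
    return bits


packed = pack_point(1024, 2048, 512, 255, 128, 0, 16)
print(len(packed))   # 80 bits for this example layout
```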
A computer device that processes the three-dimensional representation after the generation of this representation is then able to determine the location and attribute of each point so as to recreate the scene. This location and attribute may then be used to render a two-dimensional representation of the scene that can be displayed to a viewer wearing the display device 17. Specifically, the locations and attributes of the points of the three-dimensional representation can be used to render a two-dimensional image for each of the left eye of the viewer and the right eye of the viewer so as to provide an immersive extended reality (XR) experience to the viewer.
The present disclosure considers an efficient method of storing the locations of the points (e.g. at an encoder) and of determining the locations of the points (e.g. at a decoder).
As has been described with reference to Figures 6a and 6b, the points of the three-dimensional representation are determined using a set of capture devices placed at locations about the viewing zone, where these capture devices are arranged to capture points at a series of azimuth angles and elevation angles. Typically, each of the capture devices is arranged to use the same capture process (e.g. the same series of azimuth angles and elevation angles), though it will be appreciated that different series of capture angles are possible. For example, there may be a plurality of possible series of capture angles, where different capture devices use different capture angles.
In general, the present disclosure considers a method in which points are stored based on a capture device identifier and an indication of a distance of the point from the capture device associated with this capture device identifier. Typically, the point is also associated with an angular indicator, which indicates an azimuth angle and/or an elevation angle of the point relative to the identified capture device.
It will be appreciated that the storage of the distance and the angle may take many forms. For example, the distance and the angle of each point may be converted into a universal coordinate system, where each capture device has a different location in this universal coordinate system. In particular, each point may be stored with reference to a centre of this universal coordinate system, which centre may be co-located with a central capture device. Where a point is determined based on a distance and an angle from a capture device of a known location in this universal coordinate system, the coordinates of the point in this universal coordinate system can be determined trivially, and the location of the point may then be stored either relative to the capture device or as a coordinate in the universal coordinate system.
The capture device identifier may comprise a location of a capture device (e.g. a location in a co-ordinate system of the three-dimensional representation). Equally, the capture device identifier may comprise an index of a capture device. Similarly, the indication of the azimuth angle and the elevation angle for a point may comprise an angle with reference to a zero-angle of a co-ordinate system of the three-dimensional representation. Equally, the azimuth angle and/or the elevation angle may be indicated using an angle index.
In some embodiments, the three-dimensional representation is associated with configuration information, which configuration information comprises one or more of a set of capture device indexes; locations associated with the capture devices and/or the capture device indexes; a spacing of capture devices (e.g. so that locations of the capture devices can be determined from a location of a first capture device and the spacing); angles associated with a capture process for the capture devices; an azimuth angle increment and/or an elevation angle increment associated with the capture process; and a set of angle indexes (e.g. to match an angle index to an angle).
With this configuration information, it is possible to determine a location of each capture device from an index of that capture device and/or to determine a capture angle from a known capture process. Therefore, given two numbers: a capture device index and an angle index (that is associated with a combination of a specific azimuth angle and a specific elevation angle), a location of a capture device and a direction of a point from this capture device can be determined. By also signalling a distance of the point from the signalled capture device, a precise location of the point in the three-dimensional space can be signalled efficiently.
Typically, the point is associated with each of: a camera index, a distance, a first angular index (e.g. an azimuth index), and a second angular index (e.g. an elevation index). This method of indicating a location of a point enables point locations to be identified using a much smaller number of bits than if each point location is identified using x, y, z coordinates.
Referring to Figure 9, there is shown a method of determining a location of a point. This method is carried out by a computer device, e.g. the image generator 11 and/or the decoder 16.
In a first step 41, the computer device identifies an indicator of a capture device used to capture the point.
Typically, this comprises identifying a portion of a string of bits associated with a capture device index.
In a second step 42, the computer device identifies an indicator of an angle of the point from the capture device. Typically, this comprises identifying an angle index, e.g. an azimuth index and/or an elevation index and/or a combined azimuth/elevation index, which index(es) identifies a step of the capture process during which the point was captured.
In a third step 43, based on the identifiers, the computer device determines the location of the capture device and the angle of the point from the capture device.
The capture device identifier is typically a capture device index, which is related to a capture device location based on configuration information that has been sent before, or along with, the point data. For example, the configuration information may specify:
- Location of first capture device is (0,0,0).
- Step between capture devices is (0,0,1) along the grid, then across the grid, then up the grid.
- The grid is (10,10,10).
With this information, a capture device with an index of 1 can be determined to be located at (0,0,0); a capture device with an index of 5 can be determined to be located at (0,0,4); a capture device with an index of 12 can be determined to be located at (0,1,0), and so on.
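The following sketch shows one reading of this index-to-location mapping, assuming 1-based capture device indexes, the (10,10,10) grid and a unit step along, then across, then up the grid; the function name and the exact ordering convention are assumptions made for the example.

```python
# Derive a capture device location from its index and the grid configuration.
def device_location(index, origin=(0, 0, 0), grid=(10, 10, 10)):
    i = index - 1                       # 1-based index to 0-based offset
    z = i % grid[2]                     # step along the grid first
    y = (i // grid[2]) % grid[1]        # then across the grid
    x = i // (grid[1] * grid[2])        # then up the grid
    return (origin[0] + x, origin[1] + y, origin[2] + z)


print(device_location(1))   # (0, 0, 0)
print(device_location(5))   # (0, 0, 4)
```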
Equally, the configuration information may specify a list of camera indexes and locations associated with these indexes, where this enables the use of a wide range of setups of capture devices.
Typically, the three-dimensional representation is associated with a frame of video. The configuration information may be constant over the frames of the video so that the configuration information needs to be signalled only once for an entire video. Therefore, the configuration information may be transmitted alongside a three-dimensional representation of a first frame of the video, with this same information being used for any subsequent frames (e.g. until updated configuration information is sent).
The angle identifier may similarly be related to an angle by a location and an increment that are signalled in a configuration file. For example, the configuration information may specify:
- An azimuth increment and an elevation increment are each 1°.
- There are 359 increments for each angle type.
With this information: a capture angle with an index of 1 can be determined to be at an azimuth angle of 1° and an elevation angle of 0°; a capture angle with an index of 10 can be determined to be at an azimuth angle of 10° and an elevation angle of 0°; a capture angle with an index of 360 can be determined to be at an azimuth angle of 0° and an elevation angle of 1°; and a capture angle with an index of 370 can be determined to be at an azimuth angle of 9° and an elevation angle of 1°; etc.

In a fourth step 44, based on the determined location of the capture device and the determined angle, a location of the point is determined. Typically, this comprises determining the location of the point based on the location of the capture device, the capture angle, and a distance of the point from the capture device (where this distance is specified in the point data for the point).
Determining the location of the point typically comprises determining the location of the point relative to a centrepoint of the three-dimensional representation. This location of the point may then be converted into a desired coordinate system and/or the point may be processed based on its location (e.g. to stitch together adjacent points).
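A hedged sketch of the decode-side reconstruction of Figure 9 (steps 41 to 44) is given below: the capture device index is resolved to a location, the angle index is resolved to an azimuth/elevation pair, and the signalled distance then places the point in the shared coordinate system. The zero-based angle indexing, the degree-based spherical convention and the device_location callback (for example, the illustrative mapping sketched earlier) are assumptions made for this example.

```python
import math

# Reconstruct a point location from (device index, angle index, distance).
def decode_point(device_index, angle_index, distance,
                 device_location, azimuth_steps=360, increment=1.0):
    azimuth = (angle_index % azimuth_steps) * increment      # second step 42
    elevation = (angle_index // azimuth_steps) * increment
    cx, cy, cz = device_location(device_index)               # third step 43
    az, el = math.radians(azimuth), math.radians(elevation)
    x = cx + distance * math.cos(el) * math.cos(az)          # fourth step 44
    y = cy + distance * math.cos(el) * math.sin(az)
    z = cz + distance * math.sin(el)
    return (x, y, z)


# Usage with a device assumed to sit at the origin.
print(decode_point(1, 370, 2.0, device_location=lambda i: (0.0, 0.0, 0.0)))
```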
The angular identifier typically comprises a first angular identifier and a second angular identifier, where the first identifier provides the azimuthal angle of the point and the second identifier provides the elevation angle of the point.
Referring to Figure 10, each angular identifier may be provided as an index of a segment of the three-dimensional representation, where, for example, an index of 0 may identify the point as being in a first angular bracket 101 and an index of 1 may identify the point as being in a second angular bracket 102.
In this regard, the capture devices are arranged to perform a capture process, e.g. as described with reference to Figure 3, with a non-infinite angular resolution. Given this non-infinite resolution, each point is not a one-dimensional point located at a precise angle. Instead, each point is a point for a particular area of space, with the size of this area being dependent on the angular resolution as well as the distance of the point from the capture device. In other words, each capture angle determines a point for an angular range (with the range being dependent on the angular resolution). That is, if the capture process leads to points being captured at angles of 10°, 11°, and 12° then this can equally be considered to relate to points being captured at a first range of 9.5°-10.5°, a second range of 10.5°-11.5°, and a third range of 11.5°-12.5°.
This is shown in Figure 10, which shows a series of angular brackets, with the size of these angular brackets at a given distance being dependent on the angular resolution. The angular identifier(s) typically comprise a reference to such an angular bracket. Consider, for example, a cube placed with the capture device C1 at the centre of this cube. By dividing this cube into x segments at regular azimuth angles and y segments at regular elevation angles, it is possible to identify any angular range of the representation by reference to an x segment and a y segment (and then the space bracketed by this angular range will depend on both the angular resolution (e.g. the angle between adjacent brackets) and the distance of the point from the capture device).
Typically, each capture device has the same capture pattern so that the angular bracketing of each device is the same (albeit centred differently at the location of the relevant capture device). For example, in an embodiment with 1000 equal angular brackets, the angle for each bracket may be 360°/1000 (i.e. 0.36°). In some embodiments, different capture devices are associated with different capture patterns, where this may be signalled in configuration information relating to the three-dimensional representation.
In some embodiments, each capture device is arranged to capture a point for a plurality of angular brackets, where each bracket is associated with a different angle. The angular spread of each bracket (that is, the angle between a first, e.g. left, angular boundary of the bracket and a second, e.g. right, angular boundary of the bracket) may be the same; equally, this angular spread may vary. In particular, the angular spread may vary so as to be smaller for points which are directly in front of (or behind, or to a side of) the capture device. For example, the embodiment shown in Figure 7 shows an angular bracketing system that is based on a cube. With this system, a cube is placed such that a capture device is located at the centre of the cube and the cube is then split into 1000 sections of equal size (it will be appreciated that the use of 1000 sections is exemplary and any number of sections may be used). Each of these sections is then associated with an angular index. With this arrangement, the angular spread of each section (or bracket) varies, as has been described above.
Figure 10 shows a two-dimensional square, where each angular bracket of the square is referenced by an index number (between 1 and 100). In a three-dimensional implementation, an angular bracket of a cube could be indicated with two separate numbers (with a first azimuthal indicator that identifies a 'column' of the cube and a second elevational indicator that identifies a 'row' of the cube). Equally, a singular indicator may be provided that indicates a specific bracket of the cube. Therefore, for a cube that is divided into 1000 elevational sections and 1000 azimuthal sections, the bracket may be indicated with two separate indicators that are each between 0 and 999 or with a single indicator that is between 0 and 999999.
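The following sketch illustrates the single-indicator scheme described above for a division into 1000 azimuthal 'columns' and 1000 elevational 'rows', where a single bracket index in the range 0 to 999999 can be split back into the two separate indicators; the function names are illustrative only.

```python
# Convert between a (column, row) pair and a single bracket indicator.
def combine_indicators(azimuth_index, elevation_index, columns=1000):
    return elevation_index * columns + azimuth_index

def split_indicator(bracket_index, columns=1000):
    elevation_index, azimuth_index = divmod(bracket_index, columns)
    return azimuth_index, elevation_index


# Round-trip check for an example bracket.
assert split_indicator(combine_indicators(417, 63)) == (417, 63)
print(combine_indicators(417, 63))   # 63417, a single indicator in [0, 999999]
```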
It will be appreciated that the use of a cube to define the brackets is exemplary and that other bracketing systems are possible. For example, a spherical bracketing system may be used (where this leads to curved angular brackets). Equally, a lookup table may be provided that relates angular indexes to angles, where this enables irregularly spaced brackets to be used.
Typically, determining the location of the point comprises determining the location of the point so as to be at the centre of the angular bracket identified by the angular identifier(s).
Texture patches

In order to reduce the file size of the three-dimensional representation (and the bandwidth required to transmit the three-dimensional representation) it is desirable to reduce the number of points within the three-dimensional representation. Therefore, referring to Figure 11, there is described a method of determining a texture patch that can replace a plurality of points in the representation.
In a first step 71, the computer device identifies a plurality of points of the representation; in a second step 72, the computer device determines that the points lie on a shared plane; in a third step 73, the computer device determines a texture patch based on the attributes of the points; and in a fourth step 74, the computer determines a new point that references the texture patch (this new point may be referred to as a 'texture point').
The texture patch typically comprises a patch with a plurality of attribute values, which attribute values may be the same as the attribute values of the identified points. Therefore, the texture patch enables the recreation of the plurality of points. A benefit of using the texture patch is that a single point, with a single location value and a (single) reference to the texture patch, can replace the plurality of identified points. The attribute values of each point are contained in the texture patch so that little (or no) information is lost from the original representation, but by representing all of these attribute values by reference to the texture patch, only a single location needs to be signalled (saving on the computational cost of signalling locations for a plurality of points). For example, an 8x8 square of identified points that each have separate locations and attribute values may be replaced by a single point with a single location and an attribute value that is a reference to a texture patch (which texture patch comprises the attribute values of the identified points arranged in the relative positions of the identified points); this would reduce the size of the representation by 63 points (where a single point replaces an 8x8 grid of points) at the cost of needing to signal an 8x8 texture patch (that has 64 attribute values and/or transparency values and/or normal values).
This is shown in Figures 12a, 12b, and 12c. Figure 12a shows a plurality of points of a three-dimensional representation that lie on a shared plane. Each of these points has a location and an attribute value. Figure 12b shows how these points may be replaced by a single point (e.g. a 'texture point') that contains a reference to the texture patch shown in Figure 12c. This texture patch may comprise the attribute values of the plurality of points without separately storing the locations of the attributes (instead, the attribute values are laid out in a predetermined pattern, which is a 5x5 grid in the example of Figure 12c).
It will be appreciated that various sizes of texture patch are possible and that the 5x5 grid of Figure 12c is only an example. Another (practical) example of a texture patch is shown in Figure 12d, which shows an 8x8 arrangement of values laid out in the form of a texture patch. As shown in Figure 12d, typically the texture patch provides a continuous grid of pixel values (e.g. that can be used to form a continuous image); in this regard, the points shown in Figures 12a-12c are shown as separated points. In practice, these 'points' are typically abutting points that form a joined arrangement of values.
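The following sketch illustrates the replacement described above, collapsing an 8x8 group of co-planar points into a single 'texture point' whose attribute is a reference into a list of texture patches; the dictionary-based point representation and the list-based atlas are assumptions made for this example.

```python
# Replace a row-major block of co-planar points with one texture point that
# references a texture patch holding the attribute values of the block.
def replace_with_texture_point(points, patch_size, texture_patches):
    """points: row-major list of patch_size*patch_size co-planar points."""
    patch = [p["attribute"] for p in points]          # attributes only, no locations
    texture_patches.append(patch)
    return {
        "location": points[0]["location"],            # a single signalled location
        "size": patch_size,
        "texture_index": len(texture_patches) - 1,    # reference into the atlas
    }


atlas = []
block = [{"location": (0, 0, 5), "attribute": i} for i in range(64)]
texture_point = replace_with_texture_point(block, 8, atlas)
print(texture_point["texture_index"], len(atlas[0]))   # 0 64
```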
In some embodiments, the method comprises determining the texture patch in dependence on a difference of the attributes of the identified points exceeding a threshold (e.g. in dependence on a variance, a range, or a maximum difference of these attributes exceeding a threshold). In this regard, points that are similar in both location and attribute may be aggregated into a single point with a location and attribute that is based on the initial points and a size that covers both of the initial points (e.g. two adjacent points of the same colour and a size of 1 may be aggregated into a single point of this colour with a size of 2). Such an aggregation does not require any determination of a texture patch. In contrast, a texture patch may be determined where there is a plurality of dissimilar points (e.g. points with dissimilar attributes) that lie on a shared plane, where the use of the texture patch enables the attributes of each of these points to be signalled in an efficient manner.
The second step 72 of determining that the points lie on a shared plane may comprise determining that the points lie on a shared surface (e.g. on the same object), where the method may comprise identifying a surface associated with the identified points.
Determining that the points lie on a shared plane may comprise comparing a distance of (each of) the points from this plane and/or surface to a threshold distance and determining that the points lie on the surface/plane if they are within this threshold distance from the plane.
This second step 72 may also, or alternatively, comprise identifying a normal for each of the points, which normal may be contained in point data of the points, and determining a similarity of the normals (e.g. determining that each of the normals is within a threshold value of an average normal and/or determining that a variance of the normals is below a threshold value).
The texture patch is typically determined based on this determination in the second step 72, where if (e.g. only if) the identified points lie on a shared surface or plane then they may be replaced by a single point that references a texture patch.
In some embodiments, the texture patch may be determined for points that lie on a curved plane, where the second step 72 may comprise determining that the points lie on a curved plane or a curved surface. Such a texture patch may be associated with a bend value to enable the reproduction of the identified points. Typically, the texture patch comprises a quadrilateral, where the texture patch may be able to bend about a line that is formed between opposite corners of this quadrilateral so as to map the texture patch to a curved surface.
The threshold distance (for the points to be considered co-planar) may depend on the distance of the points from the viewing zone; in particular, points that are located far from the viewing zone may have a higher threshold separation than points that are located nearer to the viewing zone. Typically, users are better able to identify separations between a plurality of surfaces when these surfaces are near to the viewing zone whereas users may not be able to identify separations between surfaces that are distant from the viewing zone. Therefore, the maximum (threshold) acceptable distance between the identified points and a plane passing through the identified points may be dependent on the distance of the identified points from the viewing zone (e.g. the threshold may increase from a first value when the identified points are within 1 km of the viewing zone to a second value when the identified points are more than 1 km from the viewing zone).
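The following sketch illustrates a co-planarity test of the kind described above, in which points are treated as lying on a shared plane when their deviation from that plane stays below a threshold, and the threshold is relaxed for points far from the viewing zone (following the 1 km example given above); the specific threshold values are placeholders rather than figures taken from the disclosure.

```python
# Co-planarity check with a distance-dependent threshold: points far from the
# viewing zone are allowed a larger deviation from the fitted plane.
def coplanar(points, plane_normal, plane_point, distance_from_zone):
    threshold = 0.01 if distance_from_zone < 1000.0 else 0.05   # metres, illustrative
    nx, ny, nz = plane_normal                                    # assumed unit length
    px, py, pz = plane_point
    for x, y, z in points:
        deviation = abs((x - px) * nx + (y - py) * ny + (z - pz) * nz)
        if deviation > threshold:
            return False
    return True


print(coplanar([(0, 0, 0.004), (1, 0, -0.003)], (0, 0, 1), (0, 0, 0), 20.0))   # True
```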
Typically, the texture patch is associated with a size, where there may also be provided a plurality of texture patches of different sizes. Identifying the plurality of points may then comprise identifying a plurality of points that could potentially be replaced with a single point that references a texture patch. This may involve the computer device iterating through a plurality of pluralities of identified points and then evaluating each of these pluralities of identified points in order to determine whether the points lie on a shared plane. If these identified points are found to lie on such a shared plane, then a texture patch may be determined based on the attributes of these points and this texture patch may be added to a database of texture patches.
The texture patch may be associated with one or more of one or more attribute values; one or more transparency values; one or more normals; etc. The texture patch may comprise a plurality of points that correspond to the points used to form the texture patch (e.g. where each point of the texture patch comprises an attribute, a normal, and/or a transparency of a corresponding point of the three-dimensional representation). The method may comprise determining a multi-layered texture patch and/or a plurality of texture patches. For example, the method may comprise determining a texture patch for each eye of a user (where these texture patches may be located at the same index of separate databases of texture patches so that they can be signalled by a single reference in a point).
In some embodiments, texture patches for each of a left eye and a right eye are stored in a shared database. The index for a texture patch for the left eye may then be set as being one greater than the index for a texture patch for the right eye, where this simplifies the signalling of the texture patches. In some situations, e.g. for diffuse non-reflective materials, each eye may be associated with the same texture patch. In these situations, only a single texture patch may be included in the database (with the index that would otherwise contain a second texture patch instead pointing to the single texture patch). Equally, the same texture patch may be stored twice. By storing the texture patches for each eye adjacent to each other, there is an increased ability to benefit from similarities between these texture patches when encoding the texture patch database.
Texture Atlas

The present disclosure considers an efficient method of encoding a database of texture patches, which is henceforth referred to as a 'texture atlas'. More generally, the present disclosure considers an efficient method of processing one or more images of a plurality of images so as to enable computer devices to more efficiently store and transmit this plurality of images. This method is particularly applicable to texture patches, but more generally may be applied to any type of image.
The 'image' typically comprises a plurality of attribute values (e.g. a plurality of pixel values and/or a plurality of colours). For example, the image may be a texture patch that comprises an, e.g., 8x8 arrangement of pixels. A point that references an image is therefore a point that is associated with a plurality of attribute values (whereas other points are typically associated with only a single attribute value).
As described above, replacing groups of points within the three-dimensional representation with a texture point that refers to a texture patch within a texture atlas provides a reduction in the file size of the three-dimensional representation since this reduces the amount of location data that is stored for the points represented by the texture patch.
In some embodiments, each texture patch is stored separately (e.g. each texture patch is stored without any consideration of other texture patches) and/or each texture atlas is stored separately (e.g. an entirely new texture atlas is determined for each three-dimensional representation). In these embodiments, each texture atlas may have a large file size, and this can slow down processing (e.g. encoding, transmitting and decoding) of the three-dimensional representation of a scene. Therefore, it is desirable to reduce or compress the file size of each texture atlas; methods for providing such a reduction in the size of a texture atlas are described below.
Figures 13a and 13b show schematic examples of different arrangements of texture atlases. Specifically: Figure 13a depicts a first texture atlas TA-1 that is formed from three square texture patches TP-X, TP-Y, TP-Z arranged in a line; Figure 13b depicts a different texture atlas TA-2 formed from five square texture patches TP-A, TP-B, TP-C, TP-D, TP-E arranged in an irregular arrangement.
In general, the texture atlas may be considered to be a database. Therefore, each texture patch may be stored with reference to an index within this database. In practice, the texture atlas typically comprises a two-dimensional image, where each texture patch occupies a different location in this image. Typically, each texture patch has a predetermined size and a predetermined attribute spacing; for example, each texture patch may have a size of 5x5, 6x6, 7x7, or 8x8 and the attributes of each texture patch may form a contiguous arrangement of attributes (e.g. an 8x8 texture patch may occupy an 8x8 arrangement of pixels within the texture atlas and this may relate to a contiguous 8x8 arrangement of adjacent points in the three-dimensional representation). In such embodiments, the references of the texture points may each comprise an index and these indexes may be related to texture patches within the texture atlas based on the known size of each texture patch; e.g. a first texture patch may start at the pixel (1,1) with a second texture patch starting at (9,1), and so on. It will be appreciated that texture patches with various, and/or irregular, sizes and spacings may equally be used where, for example, a size and a starting pixel of each texture patch may be defined in a header of the texture atlas.
Typically, the texture atlas is a collation of all the texture patches which have been determined from the three-dimensional representation. As described above, the texture patches can be of different sizes and shapes but typically the texture patch forms an M x N rectangle of points (and hence a corresponding M x N rectangle of attribute values, typically for each attribute of said rectangle of points). More typically the texture patch is an N x N square, also known as a texture quad, as depicted in Figures 13a and 13b. In such embodiments the texture atlas may be formed by simply appending all the square texture patches into one 2D image, wherein the texture patches are identified and located by a texture atlas index which locates the texture patch within the texture atlas (2D image). For example, in Figure 13b, TP-A (the upper leftmost texture patch) may be given the texture atlas index [0,0] and TP-E may be given the texture atlas index [1,2]. Thus, said texture atlas index may be associated with the corresponding texture point (in the three-dimensional representation) such that the correct texture patch may be retrieved from the texture atlas, during decoding, by way of locating the texture patch at said texture atlas index in the texture atlas. It will be appreciated that the number of texture patches determined within any three-dimensional representation will vary and hence the size of the texture atlas will similarly vary. Generally, the texture atlas may be a 2D image of any dimensions.
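As a hedged illustration of the regular-grid layout described above, the following Python sketch maps a texture atlas index to the pixel region that a fixed-size texture patch occupies within the 2D atlas image. The function names, the 0-indexed convention and the default 8x8 patch size are illustrative assumptions rather than details taken from the disclosure.

```python
def patch_pixel_origin(atlas_index, patch_size=8):
    """Top-left pixel (row, col) of the patch at a given atlas index,
    assuming all patches are patch_size x patch_size and packed on a
    regular grid with no gaps (0-indexed for simplicity)."""
    row, col = atlas_index
    return row * patch_size, col * patch_size


def extract_patch(atlas_image, atlas_index, patch_size=8):
    """Slice the referenced patch out of a 2D atlas image (e.g. a numpy array)."""
    r, c = patch_pixel_origin(atlas_index, patch_size)
    return atlas_image[r:r + patch_size, c:c + patch_size]
```

For irregular patch sizes or spacings, the starting pixel and size of each patch would instead be read from a header of the texture atlas, as noted above.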
In some embodiments, the texture atlas may instead be implemented as a lookup table or dictionary wherein the texture patches are not arranged in a spatially adjacent manner. In this embodiment, the texture atlas index may be implemented as a dictionary index within the dictionary such that a texture patch may be identified and selected from within the dictionary by the dictionary index.
It will be appreciated that the specific implementation of the texture atlas will vary depending on the implementation of the texture patches. For example, if the texture patches are of an irregular shape or size then it may be preferable to use a dictionary implementation as they may not be as easily formed into a 2D image. Alternatively, storage as a 2D image may allow quicker generation of the texture atlas and so may be preferable in other cases. In general, the methods disclosed herein are applicable to the processing of an image of a plurality of images, regardless of how this plurality of images is stored.
As described above, each point in the three-dimensional representation may have multiple attributes.
Typically, texture patches are generated for each attribute of a point (e.g. a right eye attribute, a left eye attribute, a transparency, etc.) and hence a texture atlas may be generated for each attribute (such that multiple texture atlases may be generated from one three-dimensional representation). Equally, each texture atlas may define a plurality of attributes that relate to a corresponding texture point (e.g. by containing a plurality of adjacent texture patches for each texture point or by containing texture patches that define a plurality of points).
For one or more of the texture points, a texture atlas may be formed that defines only a subset of the possible attributes. For example, for a texture point that is far from the viewing zone, there may be formed only a single texture patch that is used for both of a left eye and a right eye (in this regard, viewers typically are not able to notice stereoscopic effects for objects far from the viewing zone). In some embodiments a texture atlas may be formed from only a subset of possible texture patches or of generated texture patches. In such embodiments, the computing device may determine which texture patches to include in the texture atlas based on information about the texture patch; for example, the determination may be based on one or more of: an average of attribute values within a texture patch, a standard deviation of attribute values within a texture patch or a location of a texture point in the three-dimensional representation. In particular, the computer device may be arranged to form the texture atlas in dependence on the distances of texture points in the three-dimensional representation from the viewing zone, where more information (e.g. more texture patches or texture patches for more attributes) is stored for points closer to the viewing zone. In some embodiments, the computer device is arranged to determine a distance of a texture point from the viewing zone and to store: a first number of texture patches for this texture point if the distance does not exceed a threshold and a second number of texture patches for this texture point if the distance exceeds the threshold, with the first number typically being greater than the second number.
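The distance-dependent selection just described could be sketched as follows; the threshold value and the per-distance patch counts are purely illustrative assumptions, not values from the disclosure.

```python
def patch_count_for_texture_point(distance_to_viewing_zone,
                                  distance_threshold=10.0,
                                  near_count=2, far_count=1):
    """Return how many texture patches to store for a texture point:
    a larger first number (e.g. one patch per eye) when the point lies
    within the threshold distance of the viewing zone, and a smaller
    second number (e.g. a single shared patch) when it lies further away."""
    if distance_to_viewing_zone <= distance_threshold:
        return near_count
    return far_count
```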
As described above, in some embodiments, a texture patch may combine multiple attributes such that a resulting texture atlas similarly represents a combination of the attribute values. For example, a single texture atlas may include all three RGB values of a point. In some embodiments, related texture atlases may be combined such that related texture patches (such as those for a left and right eye of a user) are given adjacent texture atlas indexes.
In some embodiments, a plurality of texture atlases are generated based on a plurality of three-dimensional representations. In such embodiments this plurality of texture atlases may be compiled into a 'composite' texture atlas, where this composite texture atlas may be arranged as previously described (for example, as a composite two-dimensional image or a composite database of two-dimensional-images). The composite texture atlas may comprise some or all of the texture patches within each constituent texture atlas and/or the composite texture atlas may comprise texture patches that have been determined based on these texture patches within the constituent texture atlases.
For example, the texture atlas TA-1 of Figure 13a may be compiled with the texture atlas TA-2 of Figure 13b to create a composite texture atlas (not shown) comprising all texture patches TP-A, TP-B, TP-C, TP-D, TP-E, TP-X, TP-Y, and TP-Z. This composite texture atlas may be arranged differently to the texture atlas TA-1 and the texture atlas TA-2 (e.g. the composite texture atlas may be larger than either of these original texture atlases). This composite texture atlas allows texture patches derived from a plurality of three-dimensional representations to be combined into a singular (composite) texture atlas. Typically, a composite texture atlas comprises more texture patches than a texture atlas derived from only one three-dimensional representation. It will be appreciated that the methods described below, in regard to texture patches in a texture atlas, may be equally applicable to texture patches in a composite texture atlas. Indeed, typically a composite texture atlas may be functionally identical to a (regular) texture atlas.
The composite texture atlas may then be associated with a plurality of three-dimensional representations, e.g. with at least three, at least five, and/or at least ten three-dimensional representations. For example, the three-dimensional representations may be sent in a bitstream as a group of representations, where this group may be accompanied by the composite texture atlas (e.g. ten three-dimensional representations may be sent alongside a single composite texture atlas). The references in each representation of this group of representations may then be treated as being references to the composite texture atlas.
The method may include processing one or more of the representations in the group of representations so as to update the texture points in these representations. In this regard, the compilation of the composite texture atlas typically leads to changes in the index values of each texture patch (e.g. the texture patch TP-X may be located at a first index in the initial texture atlas TA-2 and at a second, different, index in the composite texture atlas). The method may then comprise processing a texture point that references the texture patch TP-X via the first index so as to update a reference of this texture point to relate to the second index. Typically, this comprises modifying an attribute datafield of the texture point so that this modified attribute datafield references the second index.
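A minimal sketch of this re-indexing step is given below. It assumes each texture point carries an attribute datafield holding an atlas identifier and a patch index; the dictionary-based representation and the field names are assumptions made for illustration, not structures defined by the disclosure.

```python
def remap_references(texture_points, index_map, composite_id="composite"):
    """Update each texture point so that it references the composite texture
    atlas. index_map maps (original_atlas_id, original_index) pairs to the
    new index of the same texture patch within the composite atlas."""
    for point in texture_points:
        key = (point["atlas_id"], point["patch_index"])
        if key in index_map:
            point["patch_index"] = index_map[key]
            point["atlas_id"] = composite_id
    return texture_points
```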
Intra-Atlas Compression
In general, the present disclosure considers methods of decreasing the size of a texture patch within a texture atlas (or, more generally, an image within a set of images) by processing the texture patch to locate repeated information within the texture patch (e.g. information that is present in another texture patch in the texture atlas) and, optionally, to remove this repeated information from the texture patch. Such processing of a texture patch enables the texture atlas to be encoded more efficiently.
Typically, the method involves replacing, in the texture atlas, many similar texture patches with a (singular) representative texture patch that approximates these similar texture patches. The differences between the representative texture patch and the similar texture patches can then be signalled in the form of 'deltas' (or a 'delta') so that the original texture patches can be reproduced. Equally, where some loss of quality is acceptable, the representative texture patch may be sent without the deltas. In other words, the present disclosure considers both lossless and lossy compression -as will be discussed in more detail below.
For many scenes, it can be expected that spatially adjacent points within the three-dimensional representation will have similar or related attribute values. Thus, it can be expected that texture patches formed from spatially adjacent groupings of points may also have similar or related texture patches. In particular, as discussed above texture patches are typically formed from points which are determined to be substantially co-planar; therefore, if a three-dimensional representation of a scene depicts a real-world scene that contains features such as repetitive patterns or large flat surfaces it can be expected that a plurality of texture patches in a texture atlas may represent similar or substantially similar groupings of points (and hence a number of the constituent texture patches may be 'similar' to one another). In a simple example, a plurality of texture patches may relate to the same, single-colour, wall and so these texture patches may each contain identical or near-identical attribute values.
At least due to this possibility, when a texture atlas is generated, this atlas might contain a plurality of texture patches which are redundant as they provide the same or similar information to other texture patches. Due to this redundancy, one method of processing the texture atlas is to identify a first texture patch and a second texture patch and to remove the second texture patch from the texture atlas in dependence on a similarity of these texture patches exceeding a threshold. Equally, a method may comprise identifying the similarity of the texture patches prior to the storing of the second texture patch and then storing the second texture patch in the texture atlas only if a threshold of dissimilarity between this second texture patch and the first texture patch (that is already stored) is exceeded.
Figure 14a shows such a method for processing a plurality of texture patches. In a first step 81, the computer device identifies a first texture patch.
In a second step 82, the computer device identifies a second texture patch.
In a third step 83, the computer device determines a similarity value associated with a similarity of the first texture patch and the second texture patch.
In a fourth step 84, the computer device processes the first and/or the second texture patch in dependence on the similarity value. For example, the computer device may remove the second texture patch from the texture atlas in dependence on a similarity of the first and second texture patches.
Typically, the similarity value is determined so as to be higher if a difference between the first texture patch and the second texture patch is smaller and hence the patches are more similar. The similarity value may, for example, relate to an average difference between corresponding values of the texture patches or to a maximum difference between corresponding values of the texture patches (e.g. where each texture patch comprises a plurality of component values set out in a grid of size X,Y, the similarity value may be determined as: similarity value = - sum over x = 1..X and y = 1..Y of |Value_TP1(x,y) - Value_TP2(x,y)|). It will be appreciated that numerous methods are possible for determining a similarity between two two-dimensional arrays and that any such method may be used to determine the similarity value.
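As a sketch of two such metrics (assuming each texture patch is a numpy array of the same shape; other metrics would serve equally well):

```python
import numpy as np

def similarity_value(tp1, tp2):
    """Negative mean absolute difference between corresponding values:
    identical patches score 0 and increasingly different patches score lower."""
    return -np.abs(tp1.astype(float) - tp2.astype(float)).mean()

def max_difference(tp1, tp2):
    """Alternative metric: the largest difference between corresponding values."""
    return np.abs(tp1.astype(float) - tp2.astype(float)).max()
```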
The method may comprise modifying the three-dimensional representation so as to provide a plurality of texture points that reference a single texture patch, e.g. that reference the same texture atlas index, so as to allow for the deletion of any texture patches that are not referenced by a texture point. If two patches are substantially similar then a viewer of the three-dimensional representation may not be able to detect a difference when viewing the reused patch in place of the original patch. However, it will be appreciated that this compression is dependent on locating suitable candidate texture patches for re-use and that a texture patch's ability to be reused depends on how 'similar' other patches are to it.
In other words, the method may comprise identifying a second texture point associated with the second texture patch and modifying (e.g. updating) a reference of the second texture point so as to reference the first texture patch (and then removing the second texture patch from the texture atlas).
More generally, in response to the similarity value exceeding a threshold value, the second texture patch may be removed from the texture atlas and any texture points referencing this second texture patch may be modified/updated to instead reference the first texture patch. Equally, the second texture patch may be modified based on the first texture patch. In particular, the second texture patch may be replaced with a delta texture patch, the delta texture patch comprising values that indicate a difference between the first texture patch and the second texture patch. The delta texture patch may also comprise a reference to the first texture patch. With this method, the second texture patch can be regenerated using the first texture patch and the delta texture patch, where the delta texture patch can typically be compressed to a greater degree than the second texture patch (since the values of the delta texture patch will be small). The threshold similarity may be selected (e.g. by a user or by the computer device) so that the combined encoding size of the delta texture patch and a reference to the first texture patch is smaller than the encoding size of the second texture patch (e.g. so that replacing the second texture patch with the delta texture patch reduces the compressed size of the texture atlas).
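A hedged sketch of the delta texture patch idea follows, assuming the patches are numpy arrays; the signed 16-bit working type is an illustrative choice made to avoid wrap-around with 8-bit attribute values.

```python
import numpy as np

def make_delta_patch(first, second):
    """Per-value difference such that second == first + delta; when the
    patches are similar the delta contains only small values, which
    typically compress better than the second patch itself."""
    return second.astype(np.int16) - first.astype(np.int16)

def reconstruct_second(first, delta):
    """Regenerate the second texture patch from the first patch and the delta."""
    return (first.astype(np.int16) + delta).astype(first.dtype)
```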
Referring to Figure 14b, there is described a method for processing an image. In particular, the method may comprise compressing a plurality of images (e.g. the texture atlas). This method is typically performed by a computer device as a post-processing step that is performed after a three-dimensional representation has been generated (e.g. after each point of the three-dimensional representation has been captured) and after a texture atlas has been formed based on the points of the three-dimensional representation. This method of Figure 14b may then be used to reduce the amount of data in the texture atlas.
In a first step 91, the computer device determines a characteristic set of pixels within each image of a plurality of images. For example, the computer device may determine a set of pixels within each image or the computer device may determine a set of pixels of a grid within each of a plurality of texture patches.
As used herein a 'pixel' refers to a datapoint within an image. Such a datapoint may be a pixel of a two-dimensional image with, e.g., a colour value, but more generally a 'pixel' may refer to any datapoint in an image; a pixel may then be associated with, for example, a normal value of a three-dimensional point (where the 'image' is formed of a plurality of normal values of different points).
Typically, the plurality of images are arranged as a texture atlas and each image of the plurality of images is a texture patch, such that determining a characteristic set of pixels comprises determining the pixels comprising the texture patch.
The characteristic pixels may comprise each pixel of an image/texture patch. Equally, the characteristic pixels may comprise a subset of these pixels or a function of these pixels, where the use of a subset of pixels enables a similarity of the texture patches to be determined more rapidly and while using fewer resources, albeit to a lower degree of confidence.
In a second step 92, the computer device determines at least one set of attribute values for each characteristic set of pixels. For example, this may comprise determining a colour value, a brightness value, or a transparency of each of the pixels within the characteristic set of pixels. In practice, this typically comprises identifying each attribute value for each of the plurality of texture patches so that these attribute values can be compared. The determined attribute values are typically associated with a position of these values within a texture patch so that, for example, a bottom right attribute value of a first texture patch can be compared to a bottom right value of a second texture patch.
In a third step 93, the computer device determines at least one subset of similar images within the plurality of images based on a similarity of the determined attribute values for these images. Typically, this comprises forming one or more clusters of similar images using a clustering algorithm. A centroid point of each cluster can then be determined to represent this cluster, as will be described below.
In a fourth step 94, the computer device generates a representative (or 'centroid') image of the subset, wherein the representative image is generated based on the determined attribute values of the images within the subset. Typically, this comprises selecting an image from within the subset of images which is to be the representative image (so that the representative image may be an existing image). Equally, this step may comprise generating a new image to be the representative image, wherein the attribute values of the new image are determined based on the attribute values of the similar images. For example, the new (representative) image may be generated by taking a (e.g. weighted) average of the values of each image within the subset of images so that, for example, the bottom right value of the representative image is a weighted average of the bottom right values of the images within the subset. The weighting for each of the images may depend on the distance of each image in a cluster from the centroid of that cluster (as is described further below with reference to Figure 16).
In a fifth step 95, the computer device processes the plurality of images based on the generated representative image. This typically comprises the computer device removing one or more images (e.g. texture patches) from within the plurality of images (e.g. the texture atlas) and/or replacing one or more texture patches with 'delta' patches (where a delta patch indicates a difference between a first, representative, texture patch and a second texture patch and a delta patch typically replaces the second texture patch). Alternatively, or additionally, processing the texture atlas may comprise adding a representative texture patch to the atlas and/or replacing an original texture patch in the atlas with a representative texture patch. This will typically result in a reduction of the total size of the texture atlas (or an increase in the achievable compression of the texture atlas).
In regard to the fifth step 95, while the method may involve 'replacing' a member of the subset with the representative image, it will be understood that the exact implementation of the replacement may take many forms. In embodiments wherein the plurality of images are texture patches -which are linked to texture points in a three-dimensional representation -the 'replacing' may involve modifying one or more texture points to reference a texture atlas index of the representative image and then (optionally) deleting a now unused texture patch from the texture atlas with the overall effect being that the total number of unique texture patches referenced by the texture points is reduced.
In some embodiments, the plurality of images are associated with a plurality of different three-dimensional representations. In particular, the method may comprise determining a characteristic set of pixels for texture patches from each of a plurality of texture atlases and then determining the representative images based on these determined characteristic sets of pixels.
In this way, a 'composite' texture atlas can be formed that comprises representative images for a plurality of (e.g. successive) three-dimensional representations. This enables redundancies between representations to be considered. In this regard, each three-dimensional representation is typically associated with a respective texture atlas that is determined for that representation. Where the scene comprises surfaces that are stationary or near-stationary, then similar texture patches may be determined for these surfaces for successive three-dimensional representations so that successive texture atlases associated with these representations may comprise similar texture patches.
By considering a plurality of texture atlases during the determination of the representative image(s), these sorts of similar texture patches can be identified and a single representative image can be determined for these similar texture patches from different texture atlases. The composite texture atlas can then comprise this single representative image and the relevant texture points in the plurality of three-dimensional representations can be updated to each reference this single representative image.
Typically the plurality of images is a plurality of texture patches. However, it will be understood that a plurality of images may refer to any grouping of representations wherein said representations are defined by a set of attributes (pixels) and wherein the representations have at least one member of the set of attributes in common such that any one representation may have a similar value to another representation in terms of said common attribute. Therefore, the method of Figure 14b describes how any such representations may be processed and/or compressed. In some embodiments, pixels may have a discrete value (as opposed to a continuous value) and the representative pixels may comprise a set of discrete attributes (e.g. binary values or keys on a keyboard). For example, each image in the plurality of images may comprise a line of code, where each line of code has the attributes of a first character position, a second character position, a third character position, etc. In such embodiments a similarity value may be defined as the number of identical characters at identical positions within the lines of code. In such embodiments a delta (or delta patch) may comprise an index to locate a non-identical character in the line of code and a replacement character. Thus, a plurality of lines of code may be processed and/or compressed by the method of Figure 14b.
The plurality of images may be combined into a larger image, wherein each image then represents a section of that larger image. For example, the larger image may be a photograph of 1920 by 1080 pixel-dimensions and the plurality of images may comprise four constituent 960 by 540 quadrants. It will be understood that the larger image may be of any dimensions and said components may be of any smaller dimensions. In this regard, as has been described above, the texture atlas may be stored in the form of a two-dimensional image where the texture patches are then extracted from this two-dimensional image.
Clustering
In some embodiments, the method comprises sorting the images (e.g. the texture patches) into groups, or 'clusters', of similar texture patches. Clustering typically involves grouping data in a data set by examining similarities in the characteristics or attributes of the data. The choice of how similarity is assessed, e.g. which attributes are given most importance, and/or how many clusters are formed, may be decided by a user that is processing a three-dimensional representation or may be an automatic decision that is made by a computer device based on, for example, an available processing time and/or available processing resources.
Figure 15a shows a method of clustering a plurality of images into a plurality of clusters where each cluster is defined by a representative (e.g. 'centroid') point.
In a first step 101, the computer device generates a plurality of representative points for a plurality of images (e.g. the computer device determines a plurality of centroid points). The representative points may, for example, be multi-dimensional points that relate to attribute values of the pixels of an image.
In a second step 102, the computer device calculates a distance from each image in the plurality of images to each representative point. Typically, this comprises determining a multi-dimensional (e.g. N-dimensional) point or vector for each of the plurality of images and comparing this N-dimensional point to each of the representative points.
The representative points may themselves be determined based on the N-dimensional points of the images. For example, the method may comprise: forming a plurality of N-dimensional points based on the images in the plurality of images; and generating one or more representative points based on these N-dimensional image points (e.g. where the representative points are located at the centres of clusters of N-dimensional image points).
In a third step 103, the computer device determines a nearest representative point for each image; typically, the nearest representative point is the representative point that is closest to the image's N-dimensional point in the N-dimensional space. In some embodiments, the different dimensions of the N-dimensional space are associated with different weightings and the determination of the nearest point is dependent on these weightings (this can enable, for example, a similarity of central pixels to be given more weight than a similarity of corner pixels).
In a fourth step 104, the computer device identifies a plurality of clusters, where each cluster comprises one or more images and a representative point. Each cluster then relates to a different subset of images.
Typically, the computer device performs the method of Figure 15a on all images in the plurality of images. However, in some embodiments, e.g. where the computer device may have a maximum memory and the number of images in the plurality of images may require more memory than the maximum memory to be analysed (or 'wrangled') simultaneously, the method of Figure 15a may further comprise a sampling step prior to the first step 101, wherein the computer device selects a sample set of images from within the plurality of images which is typically of a smaller size than the plurality of images. This sample set of images may be used instead of the plurality of images in the second, third and fourth steps of Figure 15a to thereby allow all members of the sample set of images to be analysed simultaneously. It will be appreciated that the plurality of representative points identified based on the sample set of images may be applied as representative points for, and hence used to determine a plurality of clusters of, the whole plurality of images.
The selection of a sample set of images may be done by a random sampling method, wherein every member of the plurality of images has an equal chance to be selected. However, typically the sample set of images will be selected with a stratified sampling method wherein a rule or scheme may be applied to preference the selection of certain members within the plurality of images. For example, where the plurality of images are a plurality of texture patches in a composite texture atlas, formed of multiple constituent texture atlases, a rule may be implemented to select at least one member of each constituent texture atlas within the sample set of texture patches. It will be appreciated that various other methods for sampling a set of images are possible.
Similarly, the computer device may consider a sample of values within each of the texture patches. For example, the computer device may be arranged to determine a sample set of values for each of the considered texture patches (where this sample set is typically associated with a location that is the same for each of the texture patches) and to determine the representative image(s) based on the determined sample sets of values. In a practical example, the computer device may consider a 2x2 arrangement of central pixels for each of the texture patches in order to detect a similarity between 8x8 texture patches. As mentioned above, this sampling may involve a stratified method of sampling, e.g. to prioritise a selection of central pixels of the texture patches over a selection of edge pixels of the texture patches.
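One way to realise such pixel sampling is sketched below, favouring the central pixels of each patch; the 2x2 default and the centring rule are assumptions made for illustration.

```python
def central_sample(patch, sample=2):
    """Return the central sample x sample block of a patch (e.g. 2x2 pixels
    from an 8x8 patch) so that similarity can be estimated from a few
    characteristic pixels rather than from every pixel."""
    rows, cols = patch.shape[:2]
    r0 = (rows - sample) // 2
    c0 = (cols - sample) // 2
    return patch[r0:r0 + sample, c0:c0 + sample]
```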
Typically, the clustering algorithm is implemented using a k-means algorithm. In such embodiments, each texture patch is assigned to a centroid point (or just 'centroid') in multidimensional space and the 'distance' between these texture patches and the assigned centroid point is calculated.
Figure 15b shows a detailed method of performing clustering on a plurality of sets of attribute values via use of a k-means algorithm on a computing device. As described above, the number of centroid points used, and the number of clusters formed, may be decided by a user or may be decided by a computer device.
In a first step 151, the computer device converts each set of attribute values to an N-dimensional point with a location in N-dimensional space, wherein N is determined by the size of the set of attribute values (e.g. a set of 64 attribute values, as may be derived from an 8x8 grid of pixels in a texture patch, would cause the generation of a 64-dimensional point).
In a second step 152, the computer device generates at least one N-dimensional centroid, wherein each N-dimensional centroid has a location in N-dimensional space. As described above, the number of centroids used may be chosen by a user or may be selected automatically by a computer device. This number of centroids may be selected, for example, based on an available processing time where the use of more centroids is typically associated with an increased processing time.
In a third step 153, the computer device calculates the distance between each N-dimensional point and each N-dimensional centroid. Typically, distance refers to the distance in Euclidean space and is defined as the squared distance such that negative differences do not cancel positive ones.
In a fourth step 154, the computer device assigns each N-dimensional point to a cluster, where a cluster is defined as a set of N-dimensional points which have a nearest N-dimensional centroid in common.
Typically, this will result in the generation of one cluster per N-dimensional centroid.
In a fifth step 155, the computer device calculates a new N-dimensional centroid for each cluster, wherein the location of the new N-dimensional centroid is defined as the mean of the locations of each N-dimensional point within the cluster.
In a sixth step 156, the computer device calculates a current cumulative distance, wherein the current cumulative distance is defined as the sum of the distance between each N-dimensional point and the corresponding new N-dimensional centroid summed over all clusters.
In a seventh step 157, the computer device compares the current cumulative distance to a previously stored cumulative distance (in memory).
If the previously stored cumulative distance is larger than the current cumulative distance (or no previously stored cumulative distance exists), the computing device overwrites the previously stored cumulative distance with the current cumulative distance and the method repeats the third step 153 to the seventh step 157.
If the previously stored cumulative distance is smaller than, or substantially the same as, the current cumulative distance, the computing device returns the current cumulative distance.
It will be appreciated that, in regard to the comparison in the seventh step 157, 'substantially the same as' indicates that the current value is within some tolerance level of difference from the previously stored value. Therefore, the method of Figure 15b typically comprises repeating the steps until the iterative method described has effectively converged to a value.
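Putting the steps of Figure 15b together, a minimal k-means sketch might look as follows. It assumes the patches are provided as a numpy array of shape (P, 8, 8); the random centroid initialisation, the tolerance and the iteration cap are illustrative choices rather than details from the disclosure.

```python
import numpy as np

def cluster_patches(patches, k, tol=1e-6, max_iter=100, seed=None):
    """Cluster texture patches with a plain k-means loop.
    Returns (assignments, centroids): assignments[i] is the cluster index
    of patch i and centroids has one row per cluster."""
    rng = np.random.default_rng(seed)
    points = patches.reshape(len(patches), -1).astype(float)    # step 151: P x N points
    centroids = points[rng.choice(len(points), size=k, replace=False)]  # step 152
    prev_cost = np.inf
    for _ in range(max_iter):
        # step 153: squared Euclidean distance from every point to every centroid
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assignments = d2.argmin(axis=1)                          # step 154
        for c in range(k):                                       # step 155: new centroids
            members = points[assignments == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
        cost = ((points - centroids[assignments]) ** 2).sum()    # step 156
        if prev_cost - cost <= tol:                              # step 157: converged
            break
        prev_cost = cost
    return assignments, centroids
```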
In some embodiments, the centroids are generated based on previously determined centroids for a previous set of images. In particular, centroids for a first three-dimensional representation may be determined using the above method and these initial centroids may then be used as a starting point for the centroids for a subsequent representation. Such a method of determining a set of initial centroids for a second three-dimensional representation based on a set of final centroids for a first three-dimensional representation enables an improvement in the speed and efficiency of the determining of the clustering process.
Figure 16 shows clusters that may be obtained using the method of Figure 15b. Specifically, Figure 16 shows a plurality of 2-dimensional points 161-a, ..., 161-e displayed on a set of axes 162, 163. Each of the 2-dimensional points 161-a, ..., 161-e represents an image from the plurality of images.
Also shown in Figure 16 is a plurality of 2-dimensional centroids 164-a, 164-b, 164-c that have been found for clusters CL-1 to CL-3. As can be seen from this figure, a first set of 2-dimensional points 161-a, 161-b, 161-c and 161-d form a first cluster CL-1 that is associated with a first 2-dimensional centroid 164-a. As discussed in regard to the methods of Figures 14a and 14b, a first 2-dimensional point 161-c nearest the 2-dimensional centroid 164-a, e.g. defined as being the shortest Euclidean distance 165 away, may then be used as the representative image of the subset and used to replace the images represented by one or more other 2-dimensional points 161-a, 161-b and 161-d. Alternatively, the 2-dimensional centroid 164-a may be used to generate a new representative image, with attribute values such that this new image is associated with the 2-dimensional centroid. Such a new representative image may be used to replace one or more of a first set of images represented by the first set of 2-dimensional points 161-a, ..., 161-d.
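Either choice of representative image could be sketched as follows, again assuming numpy arrays of flattened image points; both functions are illustrative rather than taken from the disclosure.

```python
import numpy as np

def nearest_member(points, centroid):
    """Index of the image whose point lies at the shortest Euclidean distance
    from the cluster centroid; that existing image can then serve as the
    representative image of the cluster."""
    return int(np.linalg.norm(points - centroid, axis=1).argmin())

def centroid_image(centroid, shape=(8, 8)):
    """Alternatively, generate a new representative image whose attribute
    values are taken directly from the centroid location."""
    return centroid.reshape(shape)
```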
The axes 162, 163 shown in Figure 16 each correspond to an attribute value within the original set of attribute values for each 2-dimensional point. For ease of display, the example result of Figure 16 shows the results of the clustering method of Figure 15b when performed on a plurality of images wherein the set of attribute values determined at each point only comprises 2 values (hence N=2, resulting in 2-dimensional points). However, it will be appreciated that the clustering method is typically applied to sets of data of higher dimensionality. As described above, if the plurality of images are texture patches from a texture atlas, each image typically comprises an 8x8 grid and the method may perform clustering in 64 dimensions. In this embodiment, the result may be a plurality of clusters associated with 64 axes.
Thus, the method of Figure 15b is able to cluster (group) together texture patches with similar values in similar locations on the grid. For example, returning to Figure 13b, the method would likely form a cluster that contains the texture patches TP-B and TP-C as these texture patches have similar attribute values at similar locations. In practice, such similar texture patches may correspond to continuous surfaces within the three-dimensional representation, e.g. the texture patches may be associated with adjacent pieces of a flat blank wall. However, it will be understood that this is simply an example, and that other forms of similarity may exist and may be determined.
In some embodiments, the method of Figure 15b further includes determining a set of attribute differences, such that in addition to the set of attribute values the relationship between different attribute values within a texture patch is also determined and included as additional dimensions on which clustering is performed. Typically a set of attribute differences comprises a set of values indicating the relative increase or decrease in an attribute value between neighbouring points in the grid of the texture patch. This may allow the method to determine a similarity between (and hence form a cluster of) texture patches which represent similar features but at different light levels, e.g. texture patches which correspond to a partially shaded wall. Referring to Figure 13b, such a method of clustering may lead to the clustering of texture patches TP-D and TP-E, where the texture patch TP-E is similar to the texture patch TP-D albeit of a uniformly lower brightness.
Additionally or alternatively, the method of Figure 15b may further comprise generating matrix variants, wherein additional copies of the original texture patch are modified by a matrix operation such as, but not limited to, translation, rotation and reflection matrix operations. The matrix variants are then also provided as dimensions to be considered during the clustering algorithm such that texture patches representing similar data which has been translated, rotated or reflected may also be formed into a cluster. This may be useful when two texture patches represent different pieces of the same tessellating pattern (for example polka-dot or chequerboard patterned surfaces).
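A sketch of such matrix variants for square patches, using simple rotations and reflections (the particular set of transforms is an illustrative assumption):

```python
import numpy as np

def matrix_variants(patch):
    """Rotated and reflected copies of a square patch, so that patches that
    differ only by such a transform can be clustered together."""
    rotations = [np.rot90(patch, k) for k in range(4)]   # 0, 90, 180, 270 degrees
    reflections = [np.fliplr(r) for r in rotations]      # mirrored counterparts
    return rotations + reflections
```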
In Figure 16, it can be seen that the axes 162 and 163 are not of equal value in determining the clusters. The plurality of 2-dimensional points 161-a, ..., 161-e vary far more greatly along the axis 162 than along the axis 163. Indeed, a clustering method which only analysed the attribute value corresponding to axis 162 would likely find the same or similar clusters and 2-dimensional centroids. In some embodiments, prior to the fourth step, the method further comprises an initial linearisation step, wherein axes which produce a relatively small amount of deviation about a mean are removed from consideration. For example, a 64-dimensional analysis may find that 10 such dimensions produce little to no variation and hence only 54 dimensions may be used to find a plurality of 54-dimensional centroid locations in the fourth step onwards.
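One possible reading of this linearisation step is sketched below: dimensions whose spread is small relative to the average spread are dropped before clustering. The specific threshold rule is an assumption made for illustration.

```python
import numpy as np

def drop_flat_dimensions(points, fraction=0.5):
    """Remove dimensions whose standard deviation falls below `fraction`
    times the mean per-dimension standard deviation; near-constant
    dimensions contribute little to distinguishing the clusters."""
    stds = points.std(axis=0)
    keep = stds >= fraction * stds.mean()
    return points[:, keep], keep
```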
A 'relatively' small amount is typically defined as within one standard deviation of the mean, but it will be understood that this restriction may be implemented in different ways depending on the nature of the images (e.g. the texture patches) being considered by the method.
As described above, with the method of Figure 15b, the quantity of centroids designated during clustering is typically variable and may, for example, be selected by the user. The quantity of centroids may be assigned the letter K and the quantity of images within the plurality of images may be assigned the letter P. If K is the same as P, then there is one centroid for every image in the plurality and the method will likely iterate until the locations of the centroids are exactly the same as the locations of the points. This happens because the seventh step of the method of Figure 15b attempts to minimise the cumulative distance between each point and a centroid and hence this will be minimised when there is one centroid at the exact location of every point. So, in this case the method provides no compression.
Conversely, if K equals 1 and P is greater than 1 then the method is effectively attempting to find the average image from all available images in the plurality of images. While very compressed in file size, the skilled person will appreciate that this will typically lead to a blurred and unreadable output if the plurality of images are significantly different from one another. Hence the skilled person will appreciate that the choice of K allows one method of control of the level of compression, wherein the image becomes less compressed as the quantity of centroids, K, approaches the quantity of images P in the plurality of images.
Typically, the computer device is configured to select a quantity of centroids, K, that is less than half the quantity of images P in the plurality of images (e.g. K < P/2). The exact value of K may be selected by an iterative process and/or by trial and error, where this iterative process may be controlled by a user or by the computer device.
In some embodiments, the method of determining clusters is dependent on a target quality, where the value for K is selected as the lowest value that achieves this target quality. For example, a user may select a target quality of 40 dB peak signal to noise ratio (PSNR) and then the computer device may perform an iterative clustering process (where K is increased and/or decreased at each step of the process) until this target quality is achieved. For example, the computer device may use a binary search algorithm in order to identify this lowest acceptable value for K, where each step of the search involves identifying the quality achieved for a different K. The threshold may be a hard threshold or may be a soft threshold. For example, the computer device may be arranged to consider, or to present to a user, quality values for a plurality of K values near the threshold. In a practical example, a value K = 100 achieves a PSNR of 39.99 dB at 10 MB and a value of K = 300 achieves a PSNR of 40.01 dB at 20 MB. In this scenario, the slight increase from using more clusters would generally not be worth the increased data requirements. It will be appreciated that various target metrics are possible and that, in general, the computer device is arranged to select a value of K in dependence on one or more of a quality, a peak signal to noise ratio, a file size, and a number of clusters.
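Assuming, for illustration, that the achieved PSNR does not decrease as K grows, the binary search described above could be sketched as follows; evaluate_psnr is a placeholder callback that runs the clustering with a given K and returns the resulting quality in dB.

```python
def lowest_k_for_target(evaluate_psnr, k_min, k_max, target_db=40.0):
    """Binary search for the smallest cluster count K whose reconstruction
    quality meets the target PSNR (in dB)."""
    best = k_max
    while k_min <= k_max:
        k = (k_min + k_max) // 2
        if evaluate_psnr(k) >= target_db:
            best, k_max = k, k - 1   # quality reached: try fewer clusters
        else:
            k_min = k + 1            # too lossy: more clusters needed
    return best
```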
Typically, the target metric is a combination of the quality and the number of clusters. In some embodiments, a user input is used to select a final number of clusters (where a user may be presented with a plurality of options with different target metrics).
In some embodiments, the method comprises determining clusters separately for a plurality of components of an attribute. In particular, the method may comprise determining clusters separately for each component R, G, B (red, green, blue) of a colour attribute or for each component Y, U, V (luma, chroma, chroma) of a colour attribute. The method may comprise determining clusters for a transparency (or alpha) component of an attribute. Therefore, the method may comprise clustering (separately) RGB, RGBA (red, green, blue alpha), and/or YUV components of each image.
In the description above, the clustering has been described primarily with reference to determining a point (in N-dimensional space) for each image in the set of images (e.g. for each 8x8 image). In some embodiments, the clustering instead involves determining a point for one or more segments of each image in the set of images. For example, where the images comprise an 8x8 arrangement of pixels, points may be determined for each 2x2 set of pixels within this 8x8 arrangement. This can provide further compression by identifying similarities between smaller sections of images (instead of identifying similarities between images as a whole) at the cost of an increased computing cost due to the need to determine a greater number of points.
The method may then comprise determining one or more representative image segments, where these image segments can later be combined (e.g. in different ways) to form the representative image(s). In such embodiments, a reference in a texture point may be updated to comprise a reference to a plurality of representative image segments as well as information that enables the segments to be combined in order to form a texture patch for that texture point. For example, where the image segments are 2x2 arrangements of pixels, the texture point may reference four image segments in order and the computer device may be preconfigured to combine these image segments in a clockwise arrangement based on the order. Equally, the texture point may reference a plurality of image segments and may define a location for each image segment. With such an arrangement, different texture points may use the same image segments in different ways, providing increased versatility.
Lossless Compression
The methods described above allow for lossy compression wherein, while the file size of the three-dimensional representation has been substantially reduced (by reducing the total number of unique texture patches used), there may also be a significant reduction in the quality of the three-dimensional representation. As described above, a second texture patch may be removed from the texture atlas based on a similarity of this texture patch to a first texture patch and any references to the second texture patch may be updated to reference the first texture patch. In such an embodiment, there will be some loss of quality of the three-dimensional representation unless the first and second texture patches are identical.
As mentioned above, and as described in more detail below, the disclosed methods may also be used to facilitate lossless (and/or less lossy) compression. In particular, this disclosure envisages the use of delta values to indicate a difference (e.g. deltas) between the second texture patch and a remaining texture patch. A texture point that originally referenced the second texture patch can then be updated to reference this remaining texture patch and the deltas.
In some embodiments, the second texture patch is replaced by a reference to the remaining texture patch and the deltas. Such a method avoids any need to modify the second texture point (since a computer device that is processing the three-dimensional representation is able to follow an initial reference to the second texture patch to identify the remaining texture patch and the deltas). Equally, the second texture point may be modified so as to reference the remaining texture patch and the deltas directly (instead of referencing the second texture patch).
Deltas
Returning to Figure 13b, as discussed above, texture patches TP-B and TP-C are substantially identical except for two pixels (or points). These two pixels can be found in the third row, first column and the third row, fourth column (starting in the upper leftmost corner), henceforth [3r,1c] and [3r,4c] respectively. Effectively, in principle, when starting with TP-B, it is possible to change the texture patch into TP-C by simply altering the value of the attribute at [3r,1c] and [3r,4c]. This alteration can be seen in Figure 16 where a dashed line 166 between 2-dimensional point 161-a and 2-dimensional point 161-d shows the Euclidean distances in each dimension. Specifically in this case, to change 161-d into 161-a requires a relatively large increase in the attribute value represented by axis 162 and a relatively small increase in the attribute value represented by axis 163. This difference, henceforth called a delta, provides a quantified means to alter one point into another (and hence one texture patch into another).
The dashed line 166 of Figure 16 representing the delta between 2-dimensional point 161-a and 2-dimensional point 161-d is a 2-dimensional delta. However, deltas typically comprise more dimensions. The maximum number of dimensions in a delta is the value N of the first step of Figure 15b, wherein N is determined by the size of the set of attribute values.
Using the example of Figure 13b above, the delta from TP-B to TP-C may be formed as:
[3r,1c]: -2.45778
[3r,4c]: +1.64562
wherein the attribute value represented by the pixel at [3r,1c] would decrease somewhat and the attribute value represented by the pixel at [3r,4c] would increase somewhat. Henceforth, these components of the delta are referred to as the change location, e.g. [3r,1c], and the change value, e.g. -2.45778 (which may be positive or negative). The change location and the change value provided above are purely exemplary and it will be appreciated that they will depend on the specific images being considered.
Forming a delta, as described above, has benefits when combined with the method of Figure 15b for clustering data by determining centroids. Deltas can be made between any two points, but specifically any subset of images (or cluster) may instead be represented as its respective representative image and a plurality of deltas, wherein each delta indicates how the representative image may be changed to recover any of the other images in the cluster via reference to the delta's constituent change locations and change values. As discussed, the representative image may, in some embodiments, be the image represented by the point at the shortest Euclidean distance to the centroid or, in other embodiments, an image generated with attribute values such that it is located at the centroid. In either embodiment, the method of Figure 14b may further comprise the step of generating a plurality of deltas, wherein each delta indicates the change locations and the change values required to transform the representative image to one of the images in the subset of images.
More generally, deltas may be any method of signalling a difference between two images, where this difference may relate to a difference in Euclidean space, e.g. the difference may relate to a colour difference for a given pixel, etc. The fifth step of the method of Figure 14b may comprise replacing, in the plurality of images, at least one member of the subset of images with the representative image and with a delta based at least in part on the difference between the representative image and the one member of the subset of images. Typically, in the context of texture patches, this comprises changing the texture atlas index of a texture point associated with a texture patch to the texture atlas index of the representative image and an additional delta.
In some embodiments, this comprises replacing a texture patch with a reference to a further texture patch and one or more deltas. In this regard, each texture patch may comprise an 8x8 image where each pixel of the image indicates an attribute value of that pixel. A replaced texture patch may be replaced with a series of pixels that instead identifies a further texture patch and one or more deltas.
Deltas may be implemented in a number of ways which affect how a delta may be encoded, as will be described below. Figure 17 shows schematic examples of a plurality of deltas generated as a result of a plurality of delta implementations. All deltas in Figure 17 represent the specific delta from texture patch TP-B to texture patch TP-C (as previously presented in Figure 13b) to facilitate comparison.
The delta TP-B→C [1] shows the delta which results when a delta is encoded with both a change location and a change value arranged in a grid. At change location [3r,1c] a change value of -2.45778 is encoded and at change location [3r,4c] a change value of +1.64562 is encoded. The shading of the dot in the schematic indicates the change value, wherein the darker the shading the more positive the change value is, and the lighter the shading the more negative the change value is. A decoding device may use this information, along with a representative image, to identify the location of pixels needing to be changed and how much to change the attribute values of said pixels by.
The delta TP-B→C [2] shows an embodiment where a delta is encoded with only a change value arranged in a grid. The striped pixels indicate a 'null' attribute value. In such embodiments every possible change location in the texture patch is encoded with a value wherein change locations where there is no change value are encoded with a 'null' value. In such embodiments, a decoding device may determine a change location from the ordering in which the decoding device reads the delta (e.g. when read left to right then top to bottom, the eleventh change value in the delta will be at the change location [3r,1c]). It will be appreciated that in such embodiments the encoding device may also encode information related to the size and arrangement of the grid in the texture patch such that a decoding device may determine a change location.
The delta TP-B→C [3] shows an embodiment where a delta is encoded with a change location and a change value arranged in a grid, but where the change value is implemented as an absolute value. The pixel at [3r,4c] in TP-B→C [3] is of a darker shading than the corresponding pixel in TP-B→C [1] because the pixel represents the absolute value of the pixel at the change location [3r,4c] in TP-C (as opposed to the difference between the corresponding pixels in TP-B and TP-C). In such embodiments, a decoding device, when reconstructing the texture patch with the delta, may overwrite the attribute value at [3r,4c] in TP-B with the change value at [3r,4c] in the delta (as opposed to summing the attribute value at [3r,4c] and the change value at [3r,4c]).
The specific implementation of a delta and the nature of the texture patches being processed may affect any one of: encoding time, decoding time and amount of compression. For example, a delta of the type described in reference to TP-B→C [2], where instead of directly locating a change value within the delta every pixel in the delta is parsed in an order, may be faster to process when the number of possible change locations within the delta is of a smaller size such that the number of attribute values to parse is similarly small.
Conversely, if the delta is of significant size (e.g. millions of pixels) and/or the fraction of null values in the delta is relatively high, then parsing every pixel within the delta may be less efficient than identifying a number of specific change locations. Equally, the ratio of null values to non-null values (and hence values requiring a change location) may cause the file size of a delta to increase or decrease depending on the nature of the texture patches involved. For example, a delta which would comprise mostly null values may be reduced in size when implemented with the change locations encoded, similar to TP-B→C [1].
In addition, the delta TP-B→C [2] effectively represents a texture patch wherein only the pixels at the change locations have an attribute value. This implementation may allow a delta to be read by a decoding device with the same method the decoding device applies to non-delta texture patches. This may increase decoding speed as the decoding device does not have to utilise different methods for reading a texture patch and a delta patch.
In a further example, a delta of the type described in reference to TP-B→C [3], where a decoder may overwrite the attribute value of a pixel with an absolute value (as opposed to performing a summing operation), may have increased decoding speed as, generally, a decoder device may perform a writing operation faster than a summing and writing operation. However, use of an absolute value may increase the file size (typically a number of bits) required to store each change value and hence the overall file size of each delta may be increased. For example, if a first texture patch comprises a pixel with a brightness attribute value of 1563 (e.g. lumens) and a second texture patch comprises a pixel with a brightness attribute value of 1567, then a delta may be formed from the first texture patch to the second texture patch comprising a change value of +4. However, if the change value is implemented as an absolute value then a delta may be formed from the first texture patch to the second texture patch comprising a corresponding change value of 1567. Thus, it will be appreciated that attributes which are typically associated with lower magnitude values (e.g. RGB values which range between 0 and 255) may be more suitable when deltas are implemented as absolute values.
Hence, it will be appreciated that the most suitable or efficient implementation of a delta may vary depending on the nature of the texture patches involved and the priorities of a user. Therefore, it is envisaged that the implementation of a delta may be controllable with settings that are predetermined (e.g. defined by a user) before running the method of Figure 15b. In some embodiments, a control input means may be provided wherein said settings may be chosen by a user. Additionally, or alternatively, said settings may be chosen by an operation within the computer device.
It will be appreciated that while the deltas described above are presented in a grid, a delta need not be formed in the same arrangement as the corresponding texture patches. In general, deltas may be arranged in any form or shape.
While generating a plurality of deltas may involve generating a delta for every image in the subset, it will be understood that the method may generate deltas for only a selection of images within the subset. The selection of images for which deltas are generated may be based on the size of these deltas, wherein the size of a delta relates to the number of different change locations and/or the magnitude of each change value.
In this regard, if two texture patches are very similar, particularly if these texture patches relate to texture points that are far from the viewing zone, then a viewer may not be able to distinguish between these texture patches. Therefore, the computer device may determine whether to determine delta values for a texture patch based on a similarity of said texture patch to a representative texture patch and/or based on a distance of an associated texture point to the viewing zone.
In practice, this may comprise the computer device determining that a second texture patch is similar to a first texture patch; removing the second texture patch from the texture atlas; determining a distance of a second texture point associated with the second texture patch from the viewing zone; and storing deltas that indicate a difference between the first texture patch and the second texture patch in dependence on the deltas and/or the viewing zone (e.g. where the deltas are only stored if the deltas are large, if the second texture point is near to the viewing zone, or if a function of the delta values and the distance of the second texture point from the viewing zone exceeds a threshold).
In this regard, the method of Figure 14b may then involve the use of a threshold delta size, wherein only deltas above the threshold delta size are generated and used in the replacing (fifth) step of Figure 14b. The threshold delta size may be implemented in a multitude of ways, such as, but not limited to: a minimum percentage difference in change value from the representative image, a minimum number of change locations, and a minimum cumulative sum of all change values. In some embodiments the threshold delta size is selectable by the user. Thus, deltas may only be provided for images where the difference from the representative image is noticeable to the human eye. It will be appreciated that setting no minimum threshold delta size will cause a delta to be created between each centroid and all images within the centroid's respective cluster. Furthermore, setting a threshold size equivalent to infinity (or a suitably high value) will cause no deltas to be generated (as no delta will exceed the minimum requirements).
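As a non-limiting illustration of such a threshold, the following Python sketch keeps a delta only where its cumulative change magnitude exceeds a threshold that is relaxed for texture points far from the viewing zone; the weighting function, names and values are assumptions made for the example rather than features of the embodiments.

def delta_size(delta):
    # Cumulative magnitude of all change values in a sparse (row, col, value) delta.
    return sum(abs(v) for _, _, v in delta)

def keep_delta(delta, distance_to_viewing_zone, base_threshold=1.0):
    # Keep the delta only where the difference is likely to be visible:
    # farther points tolerate larger differences, so scale the threshold up.
    effective_threshold = base_threshold * (1.0 + distance_to_viewing_zone)
    return delta_size(delta) > effective_threshold

delta = [(3, 1, -2.45778), (3, 4, +1.64562)]
print(keep_delta(delta, distance_to_viewing_zone=0.5))   # True: near the viewing zone
print(keep_delta(delta, distance_to_viewing_zone=50.0))  # False: far from the viewing zone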
Typically, prior to any transmission of the three-dimensional representation, the determined deltas are encoded by a computer device. In some embodiments, as part of this encoding, the computer device may quantise the deltas such that the value of each delta is truncated (or rounded) to one of a predetermined set of values (e.g. a set of values defined by a quantisation curve). By providing a large set of possible quantisation values, e.g. by providing a small quantisation step, a high degree of accuracy can be provided for the delta values. However, this requires the use of a large number of bits in order to account for the large set of possible values. Therefore, it may be desirable to use a limited number of bits, and a limited set of possible values, for the delta values.
In some embodiments, the set of possible values (the quantisation set) is determined so that only deltas above a threshold size are generated and used in the replacing (fifth) step of Figure 14b; this may involve providing a quantisation curve that has a large initial quantisation step (between 0 and a minimum threshold delta value) and then a smaller quantisation step between this minimum value and a second value.
The quantisation levels and/or step and/or the set of possible values may be determined by a user input and/or may be determined based on a feature of the images and/or of the three-dimensional representation. For example, the computer device may determine a range of deltas for a texture atlas and then determine the set of quantisation values based on this range. For example, the same number of quantisation levels may be used for each texture atlas, where these levels are used to quantise a smaller range (and thus to provide increased accuracy) for texture atlases with smaller ranges of delta values. The set of quantisation values that has been used for a set of deltas may be signalled in a bitstream associated with the deltas (e.g. this set may be selected from a plurality of possible sets of quantisation values and then signalled via an index).
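A minimal sketch of such quantisation is given below, assuming a uniform quantisation curve spread over the observed range of change values in a texture atlas so that each delta value can be stored as a small fixed-width index; the number of levels and the rounding rule are illustrative assumptions.

def build_quantisation_set(change_values, num_levels=16):
    # Spread a fixed number of levels uniformly over the observed range.
    lo, hi = min(change_values), max(change_values)
    step = (hi - lo) / (num_levels - 1) if num_levels > 1 else 0.0
    return [lo + i * step for i in range(num_levels)]

def quantise(value, levels):
    # Map a change value to the index of the nearest quantisation level.
    return min(range(len(levels)), key=lambda i: abs(levels[i] - value))

values = [-2.45778, 1.64562, 0.3, -0.9, 2.1]
levels = build_quantisation_set(values, num_levels=16)
indices = [quantise(v, levels) for v in values]   # 4-bit indices in the range 0..15
reconstructed = [levels[i] for i in indices]      # approximate change values
print(indices, reconstructed)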
One benefit of quantising the delta values is that this can be used to encode deltas using a predetermined number of bits within the computer memory, and hence this may allow a decoder to read a plurality of deltas more quickly than a series of deltas with an unknown number of bits. Furthermore, a plurality of deltas, each with a predetermined number of bits, may more easily be divided amongst a plurality of decoders and hence a decoder device may be able to read multiple delta values in parallel.
Therefore, a user may have a fine level of control over the level of compression by changing the number of centroids to be determined (the value of K) and/or adjusting the threshold delta size. It will be understood that the optimum combination will vary depending on the desired level of compression and the desired minimum level of image quality in the resulting compressed plurality of images.
Where a composite texture atlas is used that contains representative images for a plurality of three-dimensional representations, a set of deltas may be determined for each of the original texture patches in each three-dimensional representation. That is, deltas may be determined between the representative images in the composite texture atlas and the original texture patches associated with the various (separate) three-dimensional representations. The method of transmitting the three-dimensional representations may then involve transmitting a group of three-dimensional representations alongside the composite texture atlas (so that texture points from a plurality of different three-dimensional representations reference texture patches within the composite texture atlas) and then sending a set of deltas for each of the three-dimensional representations. Therefore, a device that is attempting to render a two-dimensional image based on a three-dimensional representation is able to: identify a texture point in a given three-dimensional representation; identify a representative texture patch associated with the texture point from the composite texture atlas; identify a delta associated with the texture point based on a set of deltas associated with the given three-dimensional representation; and determine an attribute value for a plurality of pixels of the two-dimensional image by combining the representative texture patch with the identified delta.
Generally, a delta may be used together with a representative image to recover the original image.
Typically, the representative image is a centroid image of a texture patch (or centroid texture patch) determined by the method of Figure 14b and the original image is an original texture patch.
Figure 18 shows a method for processing a texture patch based on a representative texture patch and a delta.
In a first step 121, the computer device determines a representative texture patch.
In a second step 122, the computer device determines a delta for an identified texture patch based on the representative texture patch. In particular, as described in Figures 14a and 14b, the representative texture patch may be a first texture patch and the identified texture patch may be a second texture patch that is similar to the first texture patch. The delta may then define a difference between the first texture patch and the second texture patch so that the second texture patch can be replaced with a reference to the first texture patch and the delta.
In a third step 123, the computer device processes the identified texture patch based on the determined delta. In particular, as described above, the computer device may replace the identified texture patch with the determined delta.
Determination of an attribute value of the scene

As described above, the texture patches may be used to render a two-dimensional image based on the three-dimensional representation. In particular, the texture patches may be used so that a single point within the three-dimensional representation can be associated with a plurality of attribute values that lie on a shared plane (where the single 'texture' point can then define a location of these attribute values so as to provide an increase in efficiency as compared to an implementation where each attribute value is associated with a separate point with its own location).
Typically, the texture patches are stored within a texture atlas, where each texture point in the three-dimensional representation references a texture patch (e.g. by indicating an index within the texture atlas).
As has been described above, the methods of processing a texture patch within the texture atlas may comprise updating the references of texture points associated with any processed texture patch.
Thereafter, a representation of the scene may be rendered based on the updated references (and, e.g. based on deltas that are associated with the texture atlas).
Figure 19 shows a method of determining an attribute value of the scene in dependence on a reference in a texture point. In particular, the attribute value may be used to determine a pixel value of a rendered two-dimensional image of the scene.
In a first step 131, the computer device identifies a texture point.
In a second step 132, the computer device identifies a reference in this texture point to a texture patch.
Typically, the texture points comprise a reference to a texture patch in place of an attribute value (where other points may contain an attribute value, such as a colour, in an attribute field, the texture points may contain a texture patch index in this field). The texture atlas index may be a modified texture atlas index, wherein the original texture atlas index has been replaced during the previously described processing. In some embodiments, the method comprises identifying that a reference has been modified, where the computer device may be arranged to modify the reference so as to identify that the modified reference refers to, for example, a representative texture patch or a delta.

In a third step 133, the computer device determines a plurality of attribute values for the texture point based on the referenced texture patch. Determining a plurality of attribute values may comprise identifying a texture patch located at a texture atlas index and identifying the attribute values of this texture patch.
Figure 20 shows a method of processing a texture point to obtain attribute values in dependence on a texture patch and a delta.
In a first step 141, the computer device identifies a texture point.
In a second step 142, the computer device identifies a reference in this texture point to a texture patch.
In a third step 143, the computer device identifies a reference to a delta. The reference to the delta may be stored within the texture point, where the computer device is able to identify each of a texture patch and a delta from the point data. Equally, the reference to the delta may be located in the texture atlas, where the computer device may determine a texture patch in the texture atlas based on the reference in the texture point and the computer device may then identify that this texture patch is intended to be combined with a delta.
In a fourth step 144, the computer device determines a plurality of attribute values for the texture point based on the referenced texture patch and the referenced delta. In some embodiments, determining a plurality of attribute values for the texture point may comprise performing an operation on the referenced texture patch in dependence on the referenced delta. Typically said operation may comprise summing one or more of a plurality of attribute values of the texture patch with one or more of a plurality of attribute values of the delta. Additionally, or alternatively, said operation may comprise overwriting one or more of a plurality of attribute values of the texture patch with one or more of a plurality of attribute values of the delta.
Essentially, the computer device reproduces an original texture patch for the texture point based on the referenced texture patch and the delta.
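As an illustration of this reproduction step, the following Python sketch combines a referenced texture patch with a referenced delta either by summing signed change values or by overwriting with absolute values; the flag name and data layout are assumptions made for the purpose of the example.

def apply_delta(patch, delta, absolute=False):
    # Reproduce an original texture patch from a representative patch and a delta.
    out = [row[:] for row in patch]              # copy so the atlas copy is preserved
    for r, c, value in delta:
        out[r][c] = value if absolute else out[r][c] + value
    return out

tp_b = [[1.0] * 8 for _ in range(8)]
signed = [(3, 1, -2.45778), (3, 4, +1.64562)]
print(apply_delta(tp_b, signed)[3][1])                              # 1.0 - 2.45778 (summed)
print(apply_delta(tp_b, [(3, 4, 2.64562)], absolute=True)[3][4])    # overwritten with 2.64562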
In some embodiments, the combination of the delta with the texture patch may depend on a user input and/or on a feature of the three-dimensional representation and/or a feature of the delta. For example, the combination may only occur if the texture point is within a threshold distance of the viewing zone, where the use of deltas beyond this threshold distance may not be perceptible to a viewer in the viewing zone and so the deltas may not be used.
Encoding (and formation of a bitstream)

Typically, the processing of a plurality of images as described above may be performed on a computer device, wherein the resulting processed plurality of images (e.g. the texture atlas) may then be stored and/or transmitted by the computer device. Typically, the processed plurality of images will be encoded into a bitstream (a series of bits) by an encoding device before being transmitted. Typically, the bitstream will be decoded by a decoding device after being transmitted.
Typically, the texture atlas is associated with a three-dimensional representation, and the computer device may be arranged to generate a bitstream that comprises the processed texture atlas and one or more points of the three-dimensional representation. In some embodiments the texture atlas is a composite texture atlas which is associated with multiple three-dimensional representations and the computer device may be arranged to generate a bitstream that comprises the processed composite texture atlas as well as one or more points of a plurality of three-dimensional representations.
Figure 21 shows a schematic of a bitstream comprising 3 sections wherein each section comprises bits encoded from a portion of the processed plurality of images.
Bit-a to Bit-d forms a first section of the bitstream wherein a plurality of representative images may be encoded. Typically said representative images may be texture patches arranged in a texture atlas. The first section of the bitstream may comprise one or more flags, wherein flags are bits which may indicate the format of the data related to the processed plurality of images. For example, flags indicating information related to the size and shape of said texture patches may also be encoded into the bitstream. In various embodiments, the bitstream may comprise one or more flags that indicate: whether texture patches are used in the three-dimensional representation(s); whether representative texture patches are used in the three-dimensional representation(s); whether deltas are used in the three-dimensional representation(s); and whether a (and which) quantisation curve should be used to interpret the texture patches and/or the deltas.
Bit-e to Bit-f forms a second section of the bitstream wherein a plurality of indexes linking to the plurality of representative images may be encoded. Typically, linking to the plurality of representative images comprises a texture atlas index, wherein the index identifies a location within a texture atlas such that at least one image from the plurality of representative images may be referenced by the bitstream for each image in the original (unprocessed) plurality of images.
Bit-g to Bit-l forms an optional third section of the bitstream wherein a plurality of deltas may be encoded. As described above, deltas may be optionally generated dependent on a user choice and so may or may not appear within the bitstream. Typically, said deltas comprise a series of change locations and change values; however, in some embodiments a delta may comprise only a series of change values or absolute values. Typically, a delta is encoded for each image in the original (unprocessed) plurality of images. In some embodiments, flags are used to indicate if deltas appear within the bitstream. Furthermore, flags may indicate if the deltas are quantised and to what degree of accuracy (e.g. flags may indicate deltas have been truncated to 4 bits in length).
In some embodiments, flags may indicate the length (in number of bits) of each section of the bitstream and/or whether any section of the bitstream can be subdivided into portions of equal length. Hence flags may indicate whether a decoder device may parallelise the decoding step by segmenting the bitstream into a plurality of decodable portions.
It will be appreciated that the number of bits in each section and the ratio of bits between sections are provided purely as an example and that in practice the number and ratio will vary based on the nature of the plurality of images provided.
While the bitstream is typically encoded in the order of the sections provided above, the bitstream may be encoded in any order. In some embodiments bits from the second section and the third section may be 'interlaced', wherein each index is immediately followed in the bitstream by the corresponding delta.

The bitstream described above may be decoded by a decoding device and this may allow the original (or similar to the original) plurality of images to be re-generated. A method of decoding said bitstream may comprise the steps of: identifying a first, a second and optionally a third section of bits in a bitstream; generating a plurality of representative images based on the bits in the first section of the bitstream; generating a plurality of final images based on the bits in the second section of the bitstream; and, optionally, modifying the plurality of final images based on bits in the third section of the bitstream.
Typically, generating a plurality of final images comprises: identifying a texture atlas index in the second section of the bitstream; identifying an image at a location in a texture atlas indicated by said texture atlas index; and adding the identified image to the plurality of final images. In some embodiments generating a plurality of final images and modifying the plurality of final images may happen simultaneously such that when an image is generated it is immediately modified before being added to the plurality of final images.
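The following Python sketch illustrates, under assumed in-memory structures rather than raw bits, how the three sections might be used to regenerate the plurality of final images, with each image modified by its delta as soon as it is generated; the field names and the summing rule are illustrative assumptions.

def decode(stream):
    reps = stream["representative_images"]     # first section
    indexes = stream["atlas_indexes"]          # second section: one index per original image
    deltas = stream.get("deltas")              # optional third section
    final_images = []
    for i, atlas_index in enumerate(indexes):
        image = [row[:] for row in reps[atlas_index]]   # copy the representative image
        if deltas is not None:
            for r, c, change in deltas[i]:              # modify immediately, as described above
                image[r][c] += change
        final_images.append(image)
    return final_images

stream = {
    "representative_images": [[[1.0, 1.0], [1.0, 1.0]]],
    "atlas_indexes": [0, 0],                   # two original images sharing one representative
    "deltas": [[], [(0, 1, 0.5)]],             # the second image differs slightly
}
print(decode(stream))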
Typically, the encoding of the texture atlas comprises entropy encoding. In particular, the representative (e.g. centroid) images may be encoded using an entropy coding process and the deltas may be encoded using an entropy coding process. The entropy coding process may comprise a Huffman coding process.
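As an illustration of such entropy coding, the following sketch builds a Huffman code for a stream of symbols (for example, quantised change values) using Python's standard heapq module; it illustrates the general technique only and is not a description of the specific coder used in the embodiments.

import heapq
from collections import Counter

def huffman_codes(symbols):
    # Return a {symbol: bitstring} mapping built from symbol frequencies.
    freq = Counter(symbols)
    if len(freq) == 1:                           # degenerate single-symbol case
        return {next(iter(freq)): "0"}
    # Each heap entry is [total_count, tie_breaker, {symbol: code_so_far}].
    heap = [[count, i, {sym: ""}] for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)                 # merge the two least frequent subtrees
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], next_id, merged])
        next_id += 1
    return heap[0][2]

codes = huffman_codes([0, 0, 0, 1, 1, 2])        # frequent symbols receive shorter codes
print(codes)                                      # e.g. {0: '0', 2: '10', 1: '11'}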
In some embodiments, the aforementioned sections of a bitstream are arranged to be decoded separately (e.g. by separate computer devices or processing units), where this enables a parallelised method of decoding the bitstream so as to speed up a process of decoding and rendering a scene.
Alternatives and modifications

It will be understood that the present invention has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.
The representation is typically arranged to provide an extended reality (XR) experience (e.g. a representation that is useable to render a XR video). The term extended reality (XR) covers each of virtual reality (VR), augmented reality (AR), and mixed reality (MR) and it will be appreciated that the disclosures herein are applicable to any of these technologies.
The representation may be encoded into, and/or transmitted using, a bitstream, which bitstream typically comprises point data for one or more points of the three-dimensional representation. The point data may be compressed or encoded to form the bitstream. The bitstream may then be transmitted between devices before being decoded at a receiving device so that this receiving device can determine the point data and reform the three-dimensional representation (or form one or more two-dimensional images based on this three-dimensional representation). In particular, the encoder 13 may be arranged to encode (e.g. one or more points of) the three-dimensional representation in order to form the bitstream and the decoder 14 may be arranged to decode the bitstream to generate the one or more two-dimensional images.
In some embodiments, the computer device is arranged to reorder the texture atlas. In particular, the computer device may reorder the texture atlas based on the determined representative texture patches. Specifically, the computer device may reorder the texture atlas so that the representative texture patches are grouped together (where this enables the representative texture patches and the delta patches to be readily distinguished) and/or the computer device may reorder the texture atlas to group together the texture patches associated with each centroid, e.g. so that the texture atlas is organised into groups that each contain a representative texture patch followed by one or more delta patches associated with said representative texture patch.
Reordering the texture atlas in this manner may involve updating the references of one or more texture points of an associated three-dimensional representation. In particular, the computer device may determine a correspondences list that defines original indexes for each texture patch (before the reordering) and updated indexes for each texture patch (after the ordering) where the correspondences list may then be used to update references associated with texture points within the three-dimensional representation.
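A minimal sketch of this reordering and of the correspondences list is given below; the cluster structure, dictionary keys and patch contents are assumptions made for illustration and do not reflect a specific data format of the embodiments.

def reorder_atlas(atlas, clusters):
    # clusters: list of (representative_index, [member_indexes]) per cluster.
    new_atlas, correspondences = [], {}
    for rep_index, members in clusters:
        for old_index in [rep_index] + members:
            correspondences[old_index] = len(new_atlas)   # old index -> new index
            new_atlas.append(atlas[old_index])
    return new_atlas, correspondences

def update_references(points, correspondences):
    # Update the texture atlas index stored in each texture point.
    for point in points:
        point["texture_atlas_index"] = correspondences[point["texture_atlas_index"]]

atlas = ["patch0", "patch1", "patch2", "patch3"]
clusters = [(2, [0, 3]), (1, [])]                 # two clusters, each led by its representative
new_atlas, mapping = reorder_atlas(atlas, clusters)
points = [{"texture_atlas_index": 3}]
update_references(points, mapping)
print(new_atlas, mapping, points)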
Reordering the texture atlas in this way may also be used to move similar texture patches close to each other, where this may enable more efficient compression of the texture atlas.
Typically, each texture atlas is associated with a separate three-dimensional representation and these texture atlases are processed separately. In some embodiments, the plurality of images includes images from a plurality of texture atlases, where this enables the efficient encoding of texture atlases for different three-dimensional representations (e.g. a shared set of representative images may be determined for a plurality of texture atlases associated with a plurality of three-dimensional representations where this set of representative images can then be signalled once at the start of a bitstream comprising this plurality of three-dimensional representations).
In some embodiments, the computer device is able to reorder the texture patches in a first texture atlas in dependence on an order of texture patches in a second texture atlas. Such reordering may be used to similarly order similar texture patches in different texture atlases in order to enable efficient encoding of these texture atlases, with a second texture atlas being encoded based on a first texture atlas and differences between the texture patches in the second and first texture atlases. Reordering one or both of these texture atlases can reduce the size of these differences.
Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.

Claims (25)

  1. 1. A method of processing a three-dimensional representation of a scene, the method comprising: identifying a first image in a set of images, the first image being referenced by a first point of a three-dimensional representation; identifying a second image in the set of images, the second image being referenced by a second point of a three-dimensional representation; determining a similarity value associated with a similarity of the first image and the second image; and based on the similarity value: modifying a reference of the first point; and/or processing the first image.
  2. 2. The method of any preceding claim, wherein the first point and the second point are each points of a first three-dimensional representation
  3. 3. The method of any preceding claim, wherein the first point is a point of a first three-dimensional representation and the second point is a point of a second three-dimensional representation, preferably wherein the first and second three-dimensional representations are successive three-dimensional representations.
  4. 4. The method of any preceding claim, wherein processing the first image comprises removing the first image from the set of images.
  5. 5. The method of any preceding claim, wherein processing the first image comprises modifying the first image, preferably comprising one or more of: modifying a pixel value of the first image; and modifying the first image so that it references the second image.
  6. 6. The method of any preceding claim, wherein modifying the reference comprises modifying the reference of the first point so as to reference the second image.
  7. 7. The method of any preceding claim, wherein processing the first image comprises determining a representative image that is representative of a subset of images that includes the first image and the second image; preferably, wherein: modifying the reference comprises modifying the reference of the first point so as to reference the representative image; and/or processing the first image comprises modifying the first image to reference the representative image.
  8. 8. The method of claim 7, wherein the representative image comprises an image from the set of images, preferably wherein the representative image comprises the second image.
  9. 9. The method of claim 7 or 8, comprising determining a delta for the first image, the delta relating to a difference between the first image and the representative image, preferably wherein modifying the reference of the first point comprises modifying the reference so as to reference the delta; and/or processing the first image comprises modifying the first image to comprise the delta.
  10. 10. The method of claim 9, comprising storing the delta in dependence on a value of the delta exceeding a threshold value, preferably wherein the threshold value depends on a distance of the first point to a viewing zone of the three-dimensional representation.
  11. 11. The method of any of claims 7 to 10, wherein the subset of images includes images associated with different three-dimensional representations
  12. 12. The method of any preceding claim, wherein determining a similarity value comprises determining a similarity value between corresponding pixels of the first image and the second image, preferably wherein determining a similarity value comprises determining a difference between corresponding pixels of the first image and the second image and/or a variance of the differences of corresponding pixels of the first image and the second image.
  13. 13. The method of any preceding claim, comprising: selecting a sample set of images from the set of images, preferably wherein the sample set of images is selected by random or stratified sampling, and the first and second image are part of the sample set of images; and/or selecting a sample set of pixels from each of the images, preferably wherein the sample set of pixels is selected by random or stratified sampling, and determining the similarity value comprises determining a similarity between the sampled set of pixels.
  14. 14. The method of any preceding claim, comprising determining: a plurality of image segments for each of the images; determining a similarity between segments of the images; and modifying the reference and/or processing the first image based on a similarity between the image segments; preferably, wherein modifying the reference and/or processing the first image comprises indicating a plurality of image segments to be associated with the first point, preferably indicating an order for combining the image segments.
  15. 15. The method of any preceding claim, comprising: determining, for each image in the set of images, a characteristic set of pixels; determining attribute values for each characteristic set of pixels; determining at least one subset of similar images based on the determined attribute values; and generating a representative image for each subset of similar images; preferably, comprising, for each subset of similar images, determining a delta between the representative image of said subset and each image in said subset.
  16. 16. The method of claim 15, comprising adding each representative image to the set of images and/or replacing one or more images within the set of images with a representative image.
  17. 17. The method of any preceding claim, comprising sorting the images into a plurality of clusters of similar images, preferably comprising: sorting the images based on a k-means clustering algorithm, preferably wherein determining the similarity value comprises determining that the first image and the second image are located within the same cluster; and/or determining a centroid for each cluster, preferably comprising generating a representative image for each cluster based on the determined centroids.
  18. 18. The method of claim 17, comprising: determining, for each image, a plurality of components of the image; and sorting each component of the images into a plurality of clusters of separate components; preferably, wherein the components comprise one or more of: red, green, blue (RGB) components; red, green, blue, alpha (RGBA) components; and red, green, blue (RGB) components.
  19. 19. The method of any preceding claim, comprising forming a bitstream comprising the first point.
  20. 20. A method of determining an attribute value for a point in a three-dimensional representation of a scene, the method comprising: identifying a point of the three-dimensional representation, the point comprising a location; identifying an image referenced by the point; and identifying a plurality of attribute values associated with the point based on the referenced image.
  21. 21. The method of claim 20, comprising: identifying a delta associated with the identified point and/or the referenced image; and determining the plurality of attribute values based on the identified delta, preferably identifying a plurality of attribute values in the image and modifying said plurality of attribute values based on the identified delta, more preferably combining said plurality of attribute values with the identified delta.
  22. 22. An apparatus for processing a three-dimensional representation of a scene, the apparatus comprising: means for identifying a first image in a set of images, the first image being referenced by a first point of a three-dimensional representation; means for identifying a second image in the set of images, the second image being referenced by a second point of a three-dimensional representation; means for determining a similarity value associated with a similarity of the first image and the second image; means for processing the first image based on the similarity value; and means for: modifying a reference of the first point; and/or processing the first image.
  23. 23. An apparatus for determining an attribute value for a location in a three-dimensional scene, the apparatus comprising: means for identifying a point of the three-dimensional representation, the point comprising a location; means for identifying an image referenced by the point; and means for identifying a plurality of attribute values associated with the point based on the referenced image.
  24. 24. A bitstream comprising one or more points and/or images modified using the method of any preceding claim.
  25. 25. A bitstream comprising: one or more points of one or more three-dimensional representations; and a database of images; wherein the one or more points include one or more texture points that reference images in the database of images; preferably wherein: the bitstream further comprises: one or more deltas, wherein each delta is arranged to be combined with an image referenced by a point to provide a plurality of attribute values for that point; and/or the bitstream comprises one or more flags that indicate one or more of: whether to use deltas to determine attribute values for each point; a quantisation level associated with the delta values; and whether the database of images is associated with a plurality of three-dimensional representations.
GB2413811.7A 2024-09-06 2024-09-19 Processing a three-dimensional representation of a scene Pending GB2640349A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP24386105 2024-09-06

Publications (2)

Publication Number Publication Date
GB202413811D0 GB202413811D0 (en) 2024-11-06
GB2640349A true GB2640349A (en) 2025-10-15

Family

ID=92883556

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2413811.7A Pending GB2640349A (en) 2024-09-06 2024-09-19 Processing a three-dimensional representation of a scene

Country Status (1)

Country Link
GB (1) GB2640349A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210295572A1 (en) * 2020-03-19 2021-09-23 Samsung Electronics Co., Ltd. Method, apparatus and computer program for generating or updating a texture atlas

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210295572A1 (en) * 2020-03-19 2021-09-23 Samsung Electronics Co., Ltd. Method, apparatus and computer program for generating or updating a texture atlas

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Banerjee, "arXiv.2008.03085v1 [cs.CV]", published 7 August 2020, "SimPatch: A Nearest Neighbor Similarity Match between Image Patches", XP081735812 *
Luo Yuzhe et al., "Proceedings of SA '23: SIGGRAPH Asia 2023 Conference", published 2023, pp 1-11, "Texture Atlas Compression Based on Repeated Content Removal", XP059446326, available online at https://roy-zzz08.github.io/publication/luo-texture-2023/luo-texture-2023.pdf *

Also Published As

Publication number Publication date
GB202413811D0 (en) 2024-11-06

Similar Documents

Publication Publication Date Title
US12184896B2 (en) Techniques for encoding and decoding immersive video
US12423871B2 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
CN114270832A (en) Point cloud data transmitting device, point cloud data transmitting method, point cloud data receiving device, and point cloud data receiving method
KR20210040253A (en) An apparatus for transmitting point cloud data, a mehtod for transmitting point cloud data, an apparatus for receiving point cloud data, and a method for receiving point cloud data
US20220383552A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
JP2023139163A (en) Support for multi-view video operation using dis-occlusion atlas
CN115210765A (en) Point cloud data transmitting device, transmitting method, processing device and processing method
CN114930812B (en) Method and apparatus for decoding 3D video
WO2019138163A1 (en) A method and technical equipment for encoding and decoding volumetric video
US12322031B1 (en) Systems and methods for reducing point cloud and texture data using adapted splatting techniques
US12142013B2 (en) Haptic atlas coding and decoding format
US20200396485A1 (en) Video encoding method and video decoding method
GB2640349A (en) Processing a three-dimensional representation of a scene
KR102658474B1 (en) Method and apparatus for encoding/decoding image for virtual view synthesis
WO2025233629A1 (en) Determining a point of a three-dimensional representation of a scene
GB2638298A (en) Determining a point of a three-dimensional representation of a scene
WO2026013386A1 (en) Processing a three-dimensional representation of a scene
WO2025233631A1 (en) Determining a point of a three-dimensional representation of a scene
WO2025233632A1 (en) Bitstream
WO2026013383A1 (en) Rendering a two-dimensional image
WO2026018021A1 (en) Processing a three-dimensional representation of a scene
WO2026018019A1 (en) Processing a three-dimensional representation of a scene
WO2025215364A1 (en) Processing a three-dimensional representation of a scene
CN113767423A (en) Apparatus and method for generating image signal
GB2640002A (en) Updating a depth buffer