WO2025024152A1 - Photo coding operations for different image displays
- Publication number: WO2025024152A1 (PCT/US2024/038021)
- Authority: WIPO (PCT)
- Prior art keywords
- image
- file
- metadata
- primary
- container
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/85406—Content authoring involving a specific file format, e.g. MP4 format
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/46—Embedding additional information in the video signal during the compression process
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234309—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/23439—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements for generating different versions
Definitions
- the present disclosure relates generally to images. More particularly, an embodiment of the present disclosure relates to photo coding operations for different image displays.
- Display techniques have been developed to support transmitting and rendering (e.g., photo, etc.) image content based on specific image formats.
- JPEG image encoders and decoders may support image content coded in a JPEG image format.
- Other image encoders and decoders may support image content coded in different image formats.
- a consumer or end user device such as a handheld device typically is installed or configured with a limited set of image codecs each of which may support a specific image format in a limited set of image formats.
- if a device receives image content coded in an image format not supported by its installed image codecs, the device will likely be incapable of finding a suitable image decoder to decode and help render the image content.
- the rendered image content may reflect an incorrect interpretation or representation of the received image content and exhibit visible artifacts in colors and luminance values.
- FIG. 1 depicts an example process of a delivery pipeline
- FIG. 2A and FIG. 2B illustrate example image codec architectures
- FIG. 3A through FIG. 3C illustrate example coding syntaxes or structures of an image (container) file, an APP11 marker segment or data box(es);
- FIG. 4A and FIG. 4B illustrate example process flows
- FIG. 5 illustrates a simplified block diagram of an example hardware platform on which a computer or a computing device as described herein may be implemented
- FIG. 6A and FIG. 6B depict example (image/photo) capture devices
- FIG. 6C depicts an example image processing device
- FIG. 7 depicts an example image/photo recipient device
- FIG. 8 depicts example image/photo packaging operations
- FIG. 9 illustrates example image metadata compression operations.
- Techniques as described herein can be used to package or encode photos or still images of primary and/or non-primary image formats in image (container) files with image metadata to enable downstream recipient devices to reconstruct photos or still images of different formats, different dynamic ranges, different color spaces, different bit depths, etc.
- the image (container) files may be designated to carry photos/images (e.g., JPEG images, etc.) of the primary image formats (e.g., JPEG, etc.).
- Attendant (data) segments of the image (container) files such as APP11 segments or the like may be used to carry photos/images (e.g., non-JPEG images, etc.) of the non-primary image formats (e.g., non-JPEG, etc.) and/or the image metadata.
- the image metadata may carry operational parameters or values thereof that have been optimized by upstream devices. These optimized operational parameters or values may be used in image reconstruction operations such as forward and/or backward reshaping operations.
- the reconstructed photos or still images generated from the photos/images in the image (container) files can be optimized for rendering on image displays of the downstream recipient devices using the image metadata carried in the same image (container) files.
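- For illustration only, the packaging idea above can be sketched as a simple in-memory structure; the class and field names below are hypothetical and not drawn from the specification.

```python
# Conceptual sketch of the packaging described above; all class and field
# names are illustrative, not taken from the specification.
from dataclasses import dataclass, field
from typing import List

@dataclass
class AttendantSegment:
    """An APP11-style attendant segment carrying non-primary data."""
    kind: str          # e.g., "HEVC", "AV01", or "rpu" (image metadata)
    payload: bytes     # compressed non-primary image data or metadata

@dataclass
class ImageContainerFile:
    """A primary image plus attendant segments in one container file."""
    primary_image: bytes                                 # e.g., a coded JPEG image
    attendants: List[AttendantSegment] = field(default_factory=list)

    def metadata_segments(self) -> List[AttendantSegment]:
        # Image metadata ("rpu") travels in the same file as the images,
        # so a recipient can reconstruct/optimize images for its display.
        return [s for s in self.attendants if s.kind == "rpu"]
```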
- Example embodiments described herein relate to packaging and encoding photos in image (container) files.
- a primary image of a first image format is encoded into an image file designated for the first image format.
- a non-primary image of a second image format is encoded into one or more attendant segments of the image file.
- the second image format is different from the first image format.
- a display image derived from a reconstructed image is caused to be rendered with a recipient device of the image file. The reconstructed image is generated from one of the primary image or the non-primary image.
- Example embodiments described herein relate to decoding and rendering photos in image (container) files for image reconstruction and rendering.
- An image file designated for a first image format is received.
- the image file is encoded with a primary image of the first image format.
- a non-primary image of a second image format is decoded from one or more attendant segments of the image file.
- the second image format is different from the first image format.
- a display image derived from a reconstructed image is rendered on an image display. The reconstructed image is generated from the non-primary image decoded from the image file.
- FIG. 1 depicts an example process of an image delivery pipeline (100) showing various stages from image capture/generation to image displays of different types or capabilities.
- Example image displays may include, but are not limited to, high dynamic range (HDR) image displays, standard dynamic range (SDR) image displays, image displays operating in conjunction with end-user or personal computers, mobile devices, home theaters, TVs, head-mounted display devices, wearable display devices, etc.
- An image frame (102) is captured or generated using image generation block (105).
- the image frame (102) may be digitally captured (e.g., by a digital camera or an image signal processor (ISP) therein operating in a particular mode or camera setting, etc.) or generated by a computer (e.g., using computer animation, etc.) to provide image data (107).
- the image data (107) may be (e.g., automatically with no human input, manually, automatically with human input, etc.) edited or transformed by post-ISP image processing operations into post-ISP processed images before being passed to the next processing stage/phase in the image delivery pipeline (100).
- the image data (107) is then provided to a processor for post-production image processing (115).
- the post-production image processing (115) may include (e.g., partly or fully automatically, partly or fully manually, with an image enhancement or processing application running on a computing device, image cropping, visual effects, global or local tone and/or color adjustment, etc.) adjusting or modifying colors or brightness in an image to enhance the image quality or achieve a particular appearance for the image in accordance with the image content creator’s creative intent.
- the post-production image processing (115) operates on the image data (107) to yield a release version of one or more images to be coded into an image signal such as an image (container) file.
- one or more primary and/or non-primary images (117) - e.g., a single JPEG image as the primary image, a primary image plus zero, one or more non-primary images, etc. - may be coded into an image signal or image (container) file.
- the primary and/or non-primary images (117) may have been forward or backward reshaped, for example in the post-production image processing (115), for the purpose of generating relatively efficient, relatively high-quality images - as compared with input or source images from ISP or post-ISP image processing operations - for coding into the image signal or image (container) file.
- the coding block (120) may receive the primary and/or non-primary images (117) as a reshaped image.
- a primary or non-primary image (117) encoded in the image (container) file may not represent a reshaped image.
- the coding block (120) and/or post-production image processing (115) may implement a codec framework such as illustrated in FIG. 2A.
- the primary and/or non-primary images (117) may be compressed/encoded by the coding block (120) into an image signal or image (container) file (122) (e.g., a coded bitstream, a binary file, a JPEG image file, etc.).
- the coding block (120) may include one or more image encoders or codecs, such as those related to industry standard or proprietary specification delivery formats, to generate the image (container) file (122).
- the image (117) preserves the content creator’s intent (also referred to as “artist intent”) with which the image is generated in the post-production image processing (115).
- the image container file (122) is an image signal in compliance with one or more (image) coding or coding syntax specifications.
- the image signal or image (container) file (122) may further include, or may be coded with, image metadata (117-1) including but not limited to composer metadata.
- the image metadata (117-1) may be generated by the coding block (120) and/or the post-production block (115).
- the composer metadata (e.g., forward and/or backward reshaping mappings, lookup tables, etc.) can be used by downstream decoders to perform forward/backward reshaping (e.g., tone mapping, inverse tone mapping, etc.) on the primary and/or non-primary images (117) in order to generate one or more other images, including but not limited to display images that may be relatively accurate for rendering on one or more other image displays in addition to the one or more reference image displays with which the primary and/or non-primary images (117) are optimized to be rendered.
- reshaping as described herein may refer to image processing operations that convert between different EOTFs, different color spaces, different dynamic ranges, and so forth.
- backward or inverse reshaping refers to image processing operations that convert re-quantized images back to the original EOTF domain (e.g., gamma or PQ, etc.) or to a different EOTF domain, for further downstream processing, such as the display management.
- the image container file (122) is further encoded with portion(s) of the image metadata (117-1) including but not limited to specific display management (DM) metadata portion(s) that can be used by the downstream decoders to perform specific display management operations on decoded or backward reshaped images for specific image displays to generate display images optimized for rendering on the specific image displays.
- Examples of display management operations and corresponding DM metadata portions in (image) metadata are described in U.S. Pat. App. Pub. No. 2022/0164931, “Display management for high dynamic range images,” by Robin Atkins, Jaclyn Anne Pytlarz and Elizabeth G. Pieri, the contents of which are incorporated by reference herein in their entirety.
- the image (container) file (122) is then delivered downstream to receivers or recipient devices such as decoding and playback devices, media source devices, media streaming client devices, television sets (e.g., smart TVs, etc.), set-top boxes, movie theaters, and the like.
- the image (container) file (122) is decoded by decoding block (130) to generate a decoded image 182, which may be the same as one of the primary and/or non- primary images (117), subject to quantization errors generated in compression performed by the coding block (120) and decompression performed by the decoding block (130).
- the image metadata (117-1) - including but not limited to the composer metadata - transmitted in the image (container) file with the primary and/or non-primary images (117) to a recipient device may be generated by the coding block (120) and/or the post-production image processing (115) automatically, in real time, in offline processing, etc.
- the image data (117-1) is provided to the coding block (120) and/or the post-production image processing (115) for composer metadata generation.
- the composer metadata generation may automatically generate composer metadata with no or little human interaction.
- the composer metadata can be used to provide or generate image content respectively optimized for a wide variety of display devices or image displays.
- the composer metadata can be used to generate the other images that are unavailable or unsent in the image (container) file (122).
- techniques as described herein can be used to generate or compose image content specifically and respectively optimized for non-reference image displays, so long as the primary and/or non-primary images (117) and the composer metadata are available in the image (container) file (122).
- the image content for these non-reference image displays can be optimized to explore the full or relatively large extent of display capabilities of these non- reference displays.
- the DM metadata in the image metadata can be used by the downstream decoders to perform display management operations on the backward reshaped images to generate device-specific display images for rendering on a wide variety of display devices or image displays.
- the receiver can render the decoded image on the image display (140).
- the receiver can extract the composer metadata (e.g., lookup table or LUT based composer metadata, polynomial based composer metadata, multiple-channel, multiple-regression (MMR) composer data, tensor product B-spline (TPB) composer metadata, non-TPB composer metadata, etc.) from the image (container) file (122) and use the composer metadata to compose an inversely/backward reshaped image (132) (also referred to as a constructed or reconstructed image) based at least in part on one of the decoded images (182) and/or the composer metadata.
- the receiver can extract the DM metadata from the image (container) file (122) and apply DM operations (135) on the reconstructed image (132) based on the DM metadata to generate a corresponding display image (137) for rendering on the (e.g., non-reference, etc.) display device (140-1).
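- As a rough illustration of the composing and DM steps above, the following minimal numpy sketch applies LUT-based composer metadata to a decoded luma plane; the toy power-law LUT and the clipping-based DM step are assumptions for demonstration, not the actual reshaping or display management mappings.

```python
import numpy as np

def backward_reshape_luma(decoded_luma, lut):
    """Apply LUT-based composer metadata to a decoded 8-bit luma plane.

    `lut` maps each 8-bit codeword to a reconstructed (e.g., HDR) value;
    real composer metadata may instead use polynomial, MMR or TPB
    coefficients as noted above.
    """
    return lut[decoded_luma]

# Illustrative 256-entry LUT: a toy power-law expansion to nits (assumed
# mapping, for demonstration only).
codewords = np.arange(256)
lut = 1000.0 * (codewords / 255.0) ** 2.4

decoded = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)
reconstructed = backward_reshape_luma(decoded, lut)

# A DM step might then adapt the reconstruction to the target display,
# e.g., clipping to a 600-nit panel (a stand-in for real DM operations).
display_image = np.clip(reconstructed, 0.0, 600.0)
```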
- Image displays for which optimized display images can be generated for rendering under techniques as described herein may include image displays of various dynamic ranges.
- dynamic range may relate to a capability of the human visual system (HVS) to perceive a range of intensity (e.g., luminance, luma) in an image, e.g., from darkest blacks (darks) to brightest whites (highlights).
- images where n ≤ 8 are considered images of standard dynamic range, while images where n > 8 may be considered images of enhanced dynamic range.
- PQ refers to perceptual luminance amplitude quantization. The human visual system responds to increasing light levels in a very nonlinear way.
- a human’s ability to see a stimulus is affected by the luminance of that stimulus, the size of the stimulus, the spatial frequencies making up the stimulus, and the luminance level that the eyes have adapted to at the particular moment one is viewing the stimulus.
- a perceptual quantizer function maps linear input gray levels to output gray levels that better match the contrast sensitivity thresholds in the human visual system.
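- For reference, the perceptual quantizer mapping standardized in SMPTE ST 2084 (cited below) can be written as:

```latex
% PQ EOTF as standardized in SMPTE ST 2084 (nonlinear signal E' in [0,1],
% absolute luminance Y in cd/m^2):
Y = 10000 \cdot
    \left(
      \frac{\max\!\left(E'^{1/m_2} - c_1,\, 0\right)}
           {c_2 - c_3 \, E'^{1/m_2}}
    \right)^{1/m_1},
\quad
\begin{aligned}
m_1 &= 2610/16384, & m_2 &= 2523/4096 \times 128,\\
c_1 &= 3424/4096,  & c_2 &= 2413/4096 \times 32,\\
c_3 &= 2392/4096 \times 32.
\end{aligned}
```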
- SMPTE ST 2084, “High Dynamic Range EOTF of Mastering Reference Displays”
- a reference electro-optical transfer function (EOTF) for a given display characterizes the relationship between color values (e.g., luminance) of an input image signal to output screen color values (e.g., screen luminance) produced by the display.
- ITU Rec. ITU-R BT. 1886 “Reference electro-optical transfer function for flat panel displays used in HDTV studio production,” (March 2011), which is incorporated herein by reference in its entirety, defines the reference EOTF for flat panel displays.
- Displays that support luminance of 200 to 1,000 cd/m² or nits typify a lower dynamic range (LDR), also referred to as a standard dynamic range (SDR), in relation to HDR.
- Further EOTF examples are defined or described in SMPTE ST 2084 and Rec. ITU-R BT.2100, “Image parameter values for high dynamic range television for use in production and international programme exchange” (06/2017), each of which is incorporated herein by reference in its entirety.
- FIG. 2A and FIG. 2B illustrate example image codec architectures. More specifically, FIG. 2A illustrates an example encoder-side codec architecture, which may be implemented with one or more computing processors in an upstream image encoder, etc.
- FIG. 2B illustrates an example decoder-side codec architecture, which may also be implemented with one or more computing processors in a downstream image decoder (e.g., a receiver, etc.), etc.
- an original image or photo such as one generated by an ISP of a camera or a post-ISP image processing tool is received as input.
- This original image or photo may be used to derive or generate primary and/or non-primary images (117) to be included or contained in an image (container) file (122 of FIG. 1).
- the primary and/or non-primary images (117) may be referred to herein as a primary image (in the image (container) file).
- an image generator 162 - which may represent or include one or more image conversion or mapping tools, etc. - is used to generate the primary and/or non-primary image(s) (117) that are derived from, or corresponding to, the source image.
- the image generator (162) may perform forward and/or inverse tone mapping operations.
- an image metadata generator 150 receives some or all of the primary and/or non-primary images (117) as input and generates image metadata (117-1) such as composer metadata, DM metadata, and so forth.
- a compression block 142 compresses/encodes the primary and/or non-primary images (117) into image data 144 carried or included in an image signal or image (container) file (122) of FIG. 1.
- the image metadata (117-1) (denoted as “rpu”), as generated by the image metadata generator (150), may also be included or encoded (e.g., by the coding block (120) of FIG. 1, etc.) into the image signal or image (container) file (122).
- the image metadata (117-1) may be separately carried in designated coding segments of the image signal or image (container) file (122). These designated coding segments may be separate from specific coding segment(s) in the image signal or image (container) file (122) that are used to carry or include the primary and/or non-primary images (117).
- the image metadata (117-1) may be encoded in (designated) image metadata segments or syntax elements in the image (container) file (122), while the primary and/or non-primary images (117) are encoded in (designated) image data segment(s) (or corresponding syntax element(s)) in the same image signal or image (container) file (122).
- the composer metadata in the image metadata (117-1) in the image signal or image (container) file (122) can be used to enable downstream receivers to (e.g., forward, backward, inverse, etc.) reshape or map the primary and/or non-primary images (117) into one or more reconstructed images (e.g., approximating or the same as the non-primary image(s) (148)) for one or more other image displays other than the reference image displays supported by the primary and/or non-primary images (117).
- Example image displays may include, but are not necessarily limited to only, any of: an image display with similar display capabilities to those of a reference display, an image display with different display capabilities from those of a reference display, an image display with additional DM operations to map reconstructed image content to display image content for the image display, etc.
- the image signal encoded with the primary and/or non-primary images (117) and the image metadata (117-1) are received as input.
- a decompression block 154 decompresses/decodes compressed image data in the image signal or image (container) file (122) into a decoded image (182).
- the decoded image (182) may be the same as one of the primary and/or non-primary images (117) subject to quantization errors in the compression block (142) and in the decompression block (154).
- the decoded image (182) may be outputted in an output image signal 156 (e.g., over an HDMI interface, over a video link, etc.) to and rendered on a (reference) image display.
- an image reshaping block 158 extracts the image metadata (117-1) such as the composer metadata (or backward reshaping metadata) from the input image signal or image (container) file (122), constructs (e.g., backward, inverse, forward, etc.) reshaping functions based on the extracted composer metadata in the image metadata, and performs reshaping operations on the decoded images (182) based on the reshaping functions to generate one or more reshaped images (132) (or reconstructed images) for one or more other image displays in addition to the one or more reference image displays.
- in some operational scenarios, DM operations may not be performed by a receiver, to simplify device operations.
- DM metadata may be transmitted with the composer metadata and the primary and/or non-primary images (117) in the image signal or image (container) file (122) to the receiver.
- Display management operations specific to an image display with different display capabilities from those of the reference displays may be performed on the reshaped or reconstructed image (132) based at least in part on the DM metadata in the image metadata (117-1), for example to generate a corresponding device-specific display image to be rendered on the actual image display.
- an SDR image including but not limited to a forward reshaped SDR image may be included or packaged as a primary or non-primary image in an image (container) file as described herein.
- Other images including but not limited to HDR images may be included or packaged with, or reconstructed (with composer metadata) from, the SDR image.
- an image of another dynamic range other than SDR may be included or packaged as a primary or non-primary image in an image (container) file as described herein.
- image metadata and/or other composer metadata may be included or packaged in the same image (container) file for constructing other images in various image formats and/or in various dynamic ranges and/or in various color spaces and/or in various precisions and/or bit depths.
- the image file encoded by the image encoder may be signaled, transmitted, or otherwise directly or indirectly delivered to a downstream recipient device such as an image decoder.
- the image decoder may decode, from the image file, the (encoded or compressed) photo/image (payload) data and image metadata using the same coding syntax.
- the specification may provide a standard-based or proprietary specification of some or all syntax elements (or coding segments) constituting the coding syntax relating to including or packaging photo/image (payload) data and image metadata in image files.
- Example image (container) file specifications as described herein may include, but are not necessarily limited to only, any of:
- ISO/IEC 10918-1:1994, Information Technology - Digital compression and coding of continuous-tone still images: Requirements and guidelines (similarly, and later, defined in ISO/IEC 18477-1:2020-05, Information Technology - Digital compression and coding of continuous-tone still images - Part 1: Core coding system specification);
- ISO/IEC 10918-4:1999, Information Technology - Digital compression and coding of continuous-tone still images: Registration of JPEG profiles, SPIFF profiles, SPIFF tags, SPIFF colour spaces, APPn markers, SPIFF compression types and Registration Authorities (REGAUT);
- ISO/IEC 10918-5:2013, Information Technology - Digital compression and coding of continuous-tone still images: JPEG File Interchange Format (JFIF);
- CIPA DC-008-2019 / JEITA CP-3451E, Standard of the Camera & Imaging Products Association, Exchangeable image file format for digital still cameras
- FIG. 3A illustrates an example relatively high level coding syntax/segment structure of an image (container) file.
- the coding syntax/segment structure may represent, but is not necessarily limited to only, a sequential and progressive coding syntax used to support sequential DCT-based, progressive DCT-based and lossless modes of image codec operations.
- the coding syntax/segment structure of the image file of FIG. 3A includes a number of syntax elements or coding segments to support including/packaging a JPEG image into the image (container) file along with image metadata.
- these syntax elements or segments may include APP11 marker syntax elements or segments in the image (container) file. Some or all of these APP11 marker syntax elements or segments can be used to carry photo or image (payload) data for one or more images such as one or more HEVC or AV1 images.
- a conditional marker syntax element or segment - such as a restart marker number m (RSTm) - may be omitted or excluded from the coding structure of the image (container) file of FIG. 3A; for example, in some operational scenarios, (scan) restart may not be enabled in decoding operations.
- a first subset of syntax elements or segments such as a start-of-image (SOI) syntax element or segment and an end-of-image (EOI) syntax element or segment in FIG. 3A may represent the first or top level (syntax elements or segments) of the coding syntax for the image (container) file.
- a second subset of syntax elements or segments such as APP1 or EXIF marker syntax element or segment, optional APP0 or JFIF marker syntax element or segment, APP11 syntax elements or segments, DQT or quantization table syntax element(s) or segment(s) and a start-of-frame (SOF) syntax element or segment in FIG. 3A may represent the next or second level (syntax elements or segments) of the coding syntax for the image (container) file.
- the second subset of syntax elements or segments at the second level of the coding syntax may be used to contain or include one or more scans (e.g., in decoding operations, etc.), which may be preceded by specific table(s) such as DQT table(s).
- a define-number-of-lines marker (DNL) syntax element or segment may be excluded or absent from the image (container) file (e.g., after syntax elements or segments defining or specifying the first scan, etc.).
- a third subset of syntax elements or segments such as Huffman table (DHT) syntax element(s) or segment(s) and SOS (start-of-scan, last-1) syntax element(s) or segment(s) in FIG. 3A may represent the third level (syntax elements or segments) of the coding syntax for the image (container) file.
- the SOS syntax element(s) or segment(s) may be preceded by specific tables such as the Huffman table (DHT) syntax element(s) or segment(s).
- a fourth subset of syntax elements or segments such as ECS (entropy coded segment(s), last-1) syntax element(s) or segment(s) in FIG. 3A may represent the fourth level (syntax elements or segments) of the coding syntax for the image (container) file.
- Other markers shown include DAC (define arithmetic coding conditioning), DRI (define restart interval) and COM (comment).
- application segments 0, 1 and 11 are included in the image (container) file.
- An example list of application-specific markers (APPn) is described or specified in ISO/IEC 10918-4:1999/Amd.1:2013 (E) as already mentioned.
- a specific ordering as illustrated in FIG. 3A may be used to package or include some or all of one or more specific tables (e.g., APP1, APP0, APP11, DQT, etc.) at (or before) the start of a frame.
- the specific order may be (e.g., explicitly, by default, precisely, etc.) implemented, enforced or followed by image encoders and/or image decoders as a writing and/or reading order of these coding syntaxes or segments into or from an image (container) file for the purpose of supporting accelerated discovery of image (container) files coded in a coding syntax as described herein or some or all specific contents packaged or included in these files.
- an image decoder or a de-packager used to discover or decode contents of an image (container) file is to support or tolerate other orderings of image (payload) data and/or image metadata carried in the image (container) file, including but not limited to other orderings of one or more specific tables or application-marker segments.
- an image (container) file as described herein may carry, include or package only a single APP11 marker segment.
- the coding structure of the image (container) file or syntax elements or segments therein may be signaled (from encoder to decoder), identified or indicated with specific extension markers in the image (container) file.
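- As a non-normative sketch of such discovery, the following walks top-level JPEG marker segments to locate APP11 segments; it assumes well-formed input and stops at the first SOS marker, since entropy-coded data is not length-prefixed (so APP11 segments at the start of a later scan would need additional handling).

```python
import struct

def find_app11_segments(data: bytes):
    """Walk top-level JPEG marker segments and yield APP11 payloads.

    Minimal sketch: assumes a well-formed file and stops at the first
    SOS marker, since entropy-coded data is not length-prefixed.
    """
    assert data[0:2] == b"\xff\xd8"          # SOI
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break                            # not at a marker; bail out
        marker = data[pos + 1]
        if marker == 0xDA:                   # SOS: scan data follows
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment = data[pos + 4:pos + 2 + length]
        if marker == 0xEB:                   # APP11 (0xFFEB)
            yield segment
        pos += 2 + length
```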
- An image (container) file as described herein may be used to support carrying and/or constructing images of multiple types depicting the same visual semantic content (e.g., characters, objects, motions, visual background, etc.) as the primary (or main) image designated to be carried in the image (container) file. Some or all of these images that are carried or that can be constructed from the image (container) file may be directly or indirectly derived or originated from the same source image such as an image or photo generated by an image signal processor of a camera with specific settings.
- the image (container) file may be a JPEG image file that includes a primary (or main) image file component such as ECS segment(s) of FIG. 3A to carry, store or contain a (coded) JPEG image.
- images of multiple types including but not limited to the JPEG image may be carried or constructed based on image data and/or image metadata included in the JPEG image file.
- the image (container) file, in addition to the primary (or main) image file component carrying the JPEG image, includes one or more attendant data components such as APP11 (e.g., as shown in FIG. 3A, etc.) syntax element(s) or segment(s) used to carry, store or contain one or more HEVC images, and/or one or more AV1/HDR still images, and/or image metadata (“rpu”), etc.
- the image metadata may be applied to each of some or all of the HEVC or AV1/HDR still images to generate corresponding (e.g., SDR, HDR, etc.) display images.
- These corresponding display images generated from the images included in the image (container) file may be specifically optimized - using specific image processing operations and/or specific operational parameters defined or included in the image metadata in the image (container) file - for specific image displays to relatively fully take advantage of these image displays’ respective capabilities.
- One or more syntax elements or segments such as APP1 may be included in the image (container) file to describe specific characteristics of the JPEG image.
- Specific bit depth(s), chroma format(s) and color space(s) for images already included in the image (container) file and/or additional images that may be composed or generated based at least in part on the included images along with the included image metadata can be defined or specified in the image metadata carried or stored in the image (container) file.
- the image (container) file includes one or more attendant data components such as APP11 (e.g., as shown in FIG. 3A, etc.) syntax element(s) or segment(s) used to carry, store or contain a first image metadata portion of (overall) image metadata (“rpu”), zero, one or more HEVC images, and/or zero, one or more AV1/HDR still images, and/or a second image metadata portion of the (overall) image metadata, etc.
- the first image metadata portion may be applied to the JPEG image to generate first corresponding (e.g., SDR, HDR, etc.) display images, while the second image metadata portion may be applied to each of some or all of the HEVC or AV1/HDR still images, if present in the image (container) file, to generate second (e.g., SDR, HDR, etc.) display images.
- These display images may be specifically optimized - using specific respective image processing operations and/or specific respective operational parameters defined or included in the respective image metadata portions in the image (container) file - for specific image displays to relatively fully take advantage of these image displays’ respective capabilities.
- one or more syntax elements or segments such as APP1 (EXIF) (e.g., as shown in FIG. 3A, etc.) may be included in the image (container) file to describe specific characteristics of the JPEG image.
- Specific bit depth(s), chroma format(s) and color space(s) for images already included in the image (container) file and/or additional images that may be composed or generated based at least in part on the included images along with the included image metadata can be defined or specified in the image metadata carried or stored in the image (container) file.
- the term “HEVC image” may refer to a HEVC Main10 Still Picture image as defined in Recommendation ITU-T H.265/ISO/IEC 23008-2 as referenced herein.
- the HEVC image includes or carries HEVC VUI parameters.
- the HEVC image may comply or may be consistent with its HEVC VUI (e.g., settings, values, etc.) including but not necessarily limited to only some or all of: video range, color primaries, transfer characteristic, color matrix, and chroma sampling locations.
- the term “AV1 image” or “AV1/HDR image” may refer to an AV1 Main or High Still Picture image as defined in the AV1 Bitstream & Decoding Process Specification as referenced herein.
- the AV1/HDR image includes or carries AV1 color description parameters.
- the AV1/HDR image may comply or may be consistent with its AV1 parameters (e.g., settings, values, etc.) including but not necessarily limited to only some or all of: video range, color primaries, transfer characteristic, color matrix, and chroma sampling position.
- an image (container) file such as a JPEG image file may carry HEVC and/or AV1/HDR image(s) along with a JPEG image.
- other types of images - in addition to or in place of HEVC and/or AV1/HDR image(s) - may be carried with a JPEG image.
- zero, one or more HEIC images may be carried with a JPEG image in addition to or in place of any HEVC and/or AV1/HDR image(s).
- the term “HEIC image” may refer to a High Efficiency Image File Format image that complies with a specific specification related to HEVC such as defined in ISO/IEC 23008-12:2017, the contents of which are incorporated herein by reference in their entirety.
- the HEIC image includes or carries HEVC VUI parameters.
- the HEIC image may comply or may be consistent with its HEVC VUI including but not necessarily limited to only some or all of: video range, color primaries, transfer characteristic, color matrix, and chroma sampling locations.
- An image (container) file as described herein may be specifically encoded by an image encoder to include image (payload) data and/or image metadata for the purpose of distributing visual semantic content of an original image or photo by way of images of multiple types that can be decoded or constructed - by a recipient image decoder of the image (container) file - from the image (payload) data and/or image metadata encoded in the image (container) file.
- the image encoder may encode the image (container) file with image (payload) data and/or image metadata pursuant to one or more standard-based or proprietary specifications including but not necessarily limited to only some or all of: ISO/IEC 10918-1:1994; ISO/IEC 18477-1:2020-05; ISO/IEC 10918-4:1999; ISO/IEC 10918-5:2013; ISO/IEC 14496-12:2015; etc.
- Not all parameters and fields (or their values) as defined in the (applicable) specifications may actually be used to encode the image (container) file.
- a subset of parameters/fields (or values) as defined in these specifications may be excluded or restricted out from the image (container) file.
- Some or all of such exclusions or restrictions of the subset of parameters/fields from the image (container) file may be explicitly signaled or indicated in the image (container) file by the image encoder to the recipient image decoder.
- the image (container) file includes, or is encoded with, at least one APP11 Marker with a specific identifier such as an identifier string having a specific string value such as ‘DI’ to indicate the presence of the image (payload) data and/or the image metadata in the image (container) file to be used to construct or derive the other images and/or the different display images.
- the image (container) file encoded by the image encoder may correspond to a specific (e.g., format, packaging, “0”, “1”, etc.) version of an applicable specification that specifies or defines what image (payload) data and/or what image metadata are encoded in the image (container) file.
- the applicable specification with which the image (container) file is encoded may specify or define that the image (container) file such as a JPEG image file carries, or is encoded with, zero, one or more HEVC Main10 still pictures and/or one or more AV1 Main or High encoded images, in addition to a JPEG image as the primary (or main) image for the image (container) file.
- the applicable specification may be specifically enhanced from one or more other standard-based or proprietary specifications to include, specify or define specific coding operations and syntaxes for image/photo distribution of multiple image types using a single image file or a single image container file.
- a JPEG image is to be encoded or carried in the image (container) file as the primary (or main) image.
- the image encoder may include or invoke a primary image codec such as a JPEG image codec to perform JPEG image encoding operations.
- the image encoder or the JPEG image codec can perform image compression operations to generate JPEG compressed image data from a received (e.g., input, source, original, uncompressed, relatively less compressed, etc.) image.
- the JPEG compressed image data represents the JPEG image and may be encoded or included as (payload) data in a primary (or main) image data component of the image (container) file.
- the JPEG image may include a plurality of pixel values for a plurality of pixels or pixel locations (e.g., in an image frame, in a spatial array such as a two-dimensional pixel array, etc.).
- the JPEG image may be represented in a YCbCr color space with three color components Y, Cb and Cr.
- Each pixel value in the plurality of pixel values may contain luma (Y) and chroma (Cb/Cr) component pixel values of a respective pixel or pixel location in the plurality of pixels or pixel locations.
- Each (Y, Cb or Cr) component pixel value in a pixel value may be of a specific bit depth such as an 8-bit bit depth.
- the JPEG image may be sampled or subsampled to a chroma sampling (or subsampling) format 4:2:0 in accordance with centered chroma location sampling. This chroma location is based on a TIFF default used in many personal computer applications.
- another chroma sampling (or subsampling) format other than 4:2:0 and/or another chroma location sampling other than the centered chroma location sampling may be used to sample or subsample chroma component values.
- another color space other than YCbCr with different color components may be used to represent a primary image in an image (container) file as described herein.
- a component pixel value may be of another bit depth different from the 8-bit bit depth.
- a full (codeword) value range corresponding to all possible values of the bit depth may be used to encode or represent component (Y, Cb or Cr) pixel values in the primary (or main) image as described herein.
- component (Y, Cb or Cr) pixel values may be represented within the full range [0, 255] from a reference black (e.g., 0, etc.) to a reference white (e.g., 255, etc.).
- Chroma (Cb or Cr) component values may be represented within [0, 255] with 128 as a reference minimum color (e.g., a gray value, etc.) and 0/255 as reference maximum color(s).
- the JPEG image encoded in the image (container) file is derived from an original (e.g., received, input, source, etc.) JPEG image that includes an original or input APP1 (EXIF) Application Marker Segment 1.
- the JPEG image encoded in the image (container) file includes, or is accompanied in the same image (container) file with, an (e.g., equivalent, consistent, appropriate, etc.) APP1 (EXIF) Application Marker Segment 1 corresponding to the original or input APP1 (EXIF) Application Marker Segment 1 of the original JPEG image.
- the JPEG image encoded in the image (container) file may also include, or may also be accompanied in the same image (container) file with, an APP0 (JFIF) Application Marker Segment 0 that is (to be) used as appropriate and consistent with the JPEG image encoded in the image (container) file.
- the image (container) file as described herein may include image (payload) data representing additional images other than the primary (or main) image as well as image metadata that may be used with some or all of the image (payload) data to construct or derive other images such as images of different types from that of the primary (or main) image and/or different display images respectively optimized for rendering on different types of image displays.
- the image encoder may include or invoke one or more non-primary image codecs such as HEVC and/or AV1/HDR (and/or HEIC) image codec(s) to perform HEVC and/or AV1/HDR image encoding operations.
- the image encoder or the non-primary image codecs can perform image compression operations to generate HEVC and/or AV1/HDR (and/or HEIC) compressed image data corresponding to the JPEG image in the primary data component of the image (container) file and/or corresponding to the received (e.g., input, source, original, uncompressed, relatively less compressed, etc.) image used to generate the JPEG file.
- the HEVC and/or AV1/HDR (and/or HEIC) compressed image data represents HEVC and/or AV1/HDR (and/or HEIC) image(s) and may be encoded or included as (payload) data in other image data components such as APP11 data segment(s) of the image (container) file in addition to or separate from the primary (or main) image data component of the image (container) file.
- Each of the HEVC and/or AV1/HDR (and/or HEIC) image(s) may include a plurality of pixel values for a plurality of pixels or pixel locations (e.g., in an image frame, in a spatial array such as a two-dimensional pixel array, etc.).
- Each (still image) of the HEVC and/or AV1/HDR (and/or HEIC) image(s) represented or included in the APP11 segment(s) of the image (container) file may use the same type of image codec for encoding or decoding operations.
- the applicable specification may specify or define that some or all still images included (as extension image data) in the APP11 segment(s) of the image (container) file are coded using the same type of image codecs.
- the non-primary (or extension) images whose image (payload) data is included in the APP11 segment(s) of the image (container) file are EITHER one or more HEVC images OR one or more AV1 images.
- these APP11 segment(s) may not carry a mixture of (extension image data of) HEVC image(s) and AV1 image(s).
- Image data (e.g., compressed image data, image payload, etc.) may be encoded as a YCbCr 4:2:0 image with 10 bits per color component or primary (Y, Cb or Cr).
- all non-primary images included in the image container file such as HEVC or AVI still images use one and only one type of transfer characteristic, either PQ or HLG.
- a first limited codeword range 64 - 940 may be used to represent Y codewords from the reference black to the reference white.
- a second limited codeword range 64 - 960 may be used to represent each of Cb or Cr codewords from the reference minimum color to the reference maximum color, with 512 as the gray value.
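- A small numeric sketch of this narrow-range encoding, assuming conventional BT.709/BT.2100-style scaling (876 luma steps, 896 chroma steps):

```python
def to_limited_range_10bit(y, cb, cr):
    """Map normalized Y in [0,1] and Cb/Cr in [-0.5,0.5] to 10-bit
    limited-range codewords (Y: 64-940, Cb/Cr: 64-960, gray at 512),
    following the conventional narrow-range scaling."""
    y_code = round(64 + 876 * y)            # 940 - 64 = 876 steps
    cb_code = round(512 + 896 * cb)         # 960 - 64 = 896 steps
    cr_code = round(512 + 896 * cr)
    return y_code, cb_code, cr_code

# Reference black/white and neutral/extreme chroma:
print(to_limited_range_10bit(0.0, 0.0, 0.0))   # (64, 512, 512)
print(to_limited_range_10bit(1.0, 0.5, -0.5))  # (940, 960, 64)
```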
- the encoded/compressed image data (e.g., HEVC compressed image data, etc.) in the image container file may be carried as network abstraction layer (NAL) unit data and written/stored in the image container file (or image signal) in the H.265 Annex B byte stream format.
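- As an informal sketch, NAL units can be recovered from an Annex B byte stream by scanning for 0x000001 / 0x00000001 start codes; real payloads also use emulation prevention bytes, which this sketch does not undo.

```python
def split_annex_b(stream: bytes):
    """Split an H.265 Annex B byte stream into NAL unit payloads by
    scanning for 0x000001 / 0x00000001 start codes (minimal sketch)."""
    units, i, start = [], 0, None
    while i + 3 <= len(stream):
        if stream[i:i + 3] == b"\x00\x00\x01":
            if start is not None:
                # Trim the trailing zero of a 4-byte start code, if any.
                end = i - 1 if i > 0 and stream[i - 1] == 0 else i
                units.append(stream[start:end])
            i += 3
            start = i
        else:
            i += 1
    if start is not None:
        units.append(stream[start:])
    return units
```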
- Image data for one or more non-primary (or extension) images in addition to a primary (or main) image in an image (container) file may be formed from concatenating image (payload) data carried within APP11 segments or coding syntax as illustrated in FIG. 3A.
- These APP11 segments may be designated or included with specific segment markers such as a string value of ‘DI’ in accordance with the applicable image coding (syntax) specifications.
- the applicable image coding (syntax) specifications may specify or define application-marker segments such as APP11 for ‘Tables/miscellaneous.’
- An APP11 segment or syntax element may be located either at the start of a frame or the start of a scan in the frame.
- FIG. 3B illustrates an example syntax element or coding structure for an APP11 marker segment.
- the APP11 syntax element includes a plurality of parameters to be respectively coded in a plurality of component syntax elements as a part of the overall APP11 syntax element.
- These parameters or component syntax elements - representing different data fields in the APP11 marker segment - may be sequentially and contiguously ordered (with no null byte(s) and/or with no fill byte(s) - unless explicitly noted - in between successive parameters or syntax elements unless otherwise indicated) as shown in FIG. 3B.
- the plurality of parameters in the APP11 marker segment or syntax element may include: an APP11 marker parameter (2-bytes); a Length parameter (2-bytes); an ID string parameter (2-bytes); followed by a Null byte (1-byte); followed by a photo/image (payload) data syntax element; etc.
- the photo/image (payload) data syntax element includes the very first byte - as the very first (e.g., lower level, etc.) syntax element in the overall photo/image (payload) data syntax element - carrying a format/packaging version # parameter (1-byte).
- if the format/packaging version # parameter in the photo/image (payload) data syntax element is set to a specific value such as one (1), then the format/packaging version # parameter is followed by a specific payload version (1 in the present example) syntax element. All (e.g., component, lower level, etc.) syntax elements constituting the overall photo/image (payload) data syntax element may be consistent with, or identified by, the indicated format/packaging version # parameter in accordance with the applicable specifications.
- the APP11 marker parameter in two bytes can be used to carry or specify a specific marker value such as 0xFFEB to identify that this segment represents or carries (e.g., Dolby Image, specification defined, etc.) image (payload) data marker segments or syntax elements.
- the APP11 Length parameter in two bytes (or 16 bits) (e.g., immediately, as defined by the applicable specifications, etc.) following the APP11 marker parameter can be used to carry a value to indicate a length of the (e.g., entire, minus two bytes, present, etc.) APP11 marker segment.
- the length of the APP11 marker segment may include the size of the plurality of parameters including any intervening Null byte(s) and the size of the photo/image (payload) data syntax element contained in the present APP11 marker segment alone and may exclude the two bytes of the APP11 marker (0xFFEB in the present example) itself.
- the ID String parameter in two bytes can be used to carry a special or specifically designated value of 0x4449 (corresponding to ASCII: ‘D’ ‘I’) to distinguish the present APP11 marker segment from any other APP11 marker segment(s) that are used for purposes other than carrying photo/image (payload) data as specified or defined by the applicable specifications.
- if the ID String parameter carries a different value, a recipient decoding device can ignore or avoid using the APP11 marker segment for retrieving photo or image (payload) data.
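- A minimal sketch of parsing these fixed header fields, assuming `seg` begins at the 0xFFEB marker itself; the function and variable names are illustrative.

```python
import struct

APP11_MARKER = 0xFFEB
ID_DI = 0x4449                     # ASCII 'D','I'

def parse_app11_header(seg: bytes):
    """Parse the fixed header fields of an APP11 'DI' marker segment as
    laid out above: marker (2B), length (2B), ID string (2B), null (1B),
    then the payload beginning with a 1-byte format/packaging version."""
    marker, length, id_string = struct.unpack(">HHH", seg[:6])
    if marker != APP11_MARKER or id_string != ID_DI:
        return None                # not a 'DI' segment; ignore it
    assert seg[6] == 0x00          # null byte separating ID and payload
    version = seg[7]               # format/packaging version # (e.g., 1)
    payload = seg[8:2 + length]    # length excludes the 2-byte marker
    return version, payload
```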
- the first byte of the photo/image (payload) data syntax element, namely the format/packaging version # parameter, can be used to define or specify a specific payload version (e.g., “1”, etc.) syntax element that follows the format/packaging version # parameter in the photo/image (payload) data syntax element.
- the specific payload version syntax element may be used to store or carry actual photo/image (payload) data of a non-primary or extension image such as an HEVC or AV1 image.
- all APP11 marker segments to be encoded with any non-primary or extension photo/image (payload) data in the same image (container) file carry the same value for the format/packaging version # parameter specified for each of these APP11 marker segments.
- the format/packaging version # parameter in the APP11 marker segment or syntax element has a specific value ‘1’. Accordingly, a specific payload version syntax element such as a payload version 1 syntax element is used to encode or store actual non-primary or extension photo/image (payload) data.
- the payload version 1 syntax element logically includes one or more (data) boxes coded in respective syntax elements.
- the boxes or the respective syntax elements describe size(s) and location(s) of (e.g., non-primary or extension, HEVC, AV1, etc.) image data and/or image metadata (“rpu”) for primary and/or non-primary image data.
- the image metadata may include specific image metadata items or portions associated with, or corresponding to, specific image items or portions in the non-primary or extension image data such as HEVC or AV1 still image data and in primary or main image data such as JPEG image data.
- an image metadata data item or portion associated with or corresponding to an image data item or portion refers to a part of the image metadata that is specifically designated to carry or include specific operational parameters for specific image processing operations that may be performed on the (associated or corresponding) image data item or portion.
- FIG. 3C illustrates an example coding syntax/structure for a (data) box or a corresponding syntax element in the one or more boxes included in the payload version 1 syntax element of the APP11 marker segment.
- the box includes a plurality of parameters to be respectively coded in a plurality of component syntax elements as a part of the overall box.
- These parameters or component syntax elements - representing different data fields in the box - may be sequentially and contiguously ordered (with no null byte(s) and/or with no fill byte(s) - unless explicitly noted - in between successive parameters or syntax elements unless otherwise indicated) as shown in FIG. 3C.
- the plurality of parameters in the box in the payload version 1 syntax element of the APP11 marker segment may include: a Box Instance Number parameter in two bytes (or 16 bits) denoted as “En”; a Packet Sequence Number parameter in four bytes (or 32 bits) denoted as “Z”; a Box Length parameter in four bytes (or 32 bits) denoted as “LBox”; a Box Type parameter in four bytes (or 32 bits) denoted as “TBox”; an optional Box Length Extension parameter in eight bytes (or 64 bits) denoted as “XLBox”; a version 1 payload data syntax element; etc.
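- A hedged sketch of reading one such box follows; the text does not spell out what LBox counts or when XLBox appears, so this sketch assumes ISO base-media-style conventions (LBox counts LBox + TBox + payload, and LBox == 1 signals a 64-bit XLBox).

```python
import struct

def parse_box(buf: bytes, offset: int = 0):
    """Parse one box from the version 1 payload data, using the field
    layout above: En (2B), Z (4B), LBox (4B), TBox (4B), optional
    XLBox (8B). Two assumptions, since the text does not spell them
    out: LBox counts LBox + TBox + payload (as in ISO base-media
    boxes), and LBox == 1 signals that the 64-bit XLBox is present."""
    en, z, lbox, tbox = struct.unpack_from(">HII4s", buf, offset)
    pos = offset + 14                        # past En, Z, LBox, TBox
    size = lbox
    if lbox == 1:                            # assumed XLBox convention
        (size,) = struct.unpack_from(">Q", buf, pos)
        pos += 8
    payload = buf[pos: offset + 6 + size]    # 6 = En + Z preceding LBox
    return en, z, tbox.decode("ascii"), payload
```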
- the Box Instance Number (En) parameter in two bytes (or 16 bits) can be used to allow or support for (e.g., APP11, etc.) marker segments to carry (data) boxes of the same or identical (box) type but differing data or content portions. This parameter can be used to distinguish these boxes of the same box type. Data or content portions belonging to or residing in logically distinct boxes with the same box type differ in respective values for the Box Instance Number (En) parameter.
- a recipient decoding device can concatenate the data or content portions - or payload data - in the boxes of the marker segments, where the boxes have the same value for the Box Type (BType) parameter but different (e.g., contiguous, sequential, etc.) values for the Box Instance Number (En) parameter, for example in an ascending order of the values for the Box Instance Number (En) parameter.
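- To make the field layout and the concatenation rule concrete, the following is a minimal parsing sketch in Python (illustrative names, not from the specification), assuming big-endian fields in the order shown in FIG. 3C and assuming - as in ISO base media file format boxes - that an LBox value of 1 signals the presence of XLBox:

    import struct
    from dataclasses import dataclass

    @dataclass
    class Box:
        instance: int     # En, Box Instance Number
        sequence: int     # Z, Packet Sequence Number
        length: int       # LBox (or XLBox when extended)
        box_type: bytes   # TBox, e.g. b"HEVC", b"RPHE", b"AV01", b"RPAV"
        payload: bytes    # version 1 payload data

    def parse_box(data: bytes) -> Box:
        # Fields are read in the order shown in FIG. 3C, big-endian.
        en, z, lbox = struct.unpack_from(">HII", data, 0)
        tbox = bytes(data[10:14])
        offset = 14
        if lbox == 1:  # assumption: LBox == 1 signals a 64-bit XLBox
            (lbox,) = struct.unpack_from(">Q", data, offset)
            offset += 8
        return Box(en, z, lbox, tbox, bytes(data[offset:]))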
- the Box Instance Number is set - by the image encoder - to be equal to 0x0001.
- This setting in the box in the image (container) file indicates or signals to a recipient image decoder the primary or main image data - such as the JPEG image data - with which an image metadata portion in the box (e.g., in only a single box for JPEG, etc.) is associated or to which it corresponds.
- a box may be used to carry or include a non-primary or extension image data item or portion such as an HEVC or AV1 image data item or portion.
- This box may carry a specific (e.g., 4-byte, string, etc.) value such as ‘HEVC’ or ‘AV01’ for the Box Type (BType) parameter in the box.
- Another box may be used to carry or include an image metadata item or portion for the non-primary or extension image data item or portion included or carried in the former box.
- the other box may carry a specific (e.g., 4-byte, string, etc.) counterpart value such as ‘RPHE’ or ‘RPAV’ - corresponding to the specific value of ‘HEVC’ or ‘AV01’ in the (former) box - for the Box Type (BType) parameter in the other box.
- Both of the boxes - the former of which carries the image data item or portion, whereas the latter of which carries the corresponding image metadata item or portion to be used to process the image data item or portion - are to carry the same or identical value (e.g., “01”, etc.) for the Box Instance Number (En) parameter in each of the boxes.
- the former box may be an ‘HEVC’ box that carries or includes an HEVC image data item/portion.
- the latter box may be a ‘RPHE’ box that carries or includes an RPHE image metadata item/portion for the HEVC image data item or portion carried or included in the former box.
- Both the ‘HEVC’ box and the ‘RPHE’ box carry the same or identical value (e.g., “01”, etc.) for the Box Instance Number (En) parameter in each of the boxes.
- the former box may be an ‘AV01’ box that carries or includes an AV1 image data item/portion.
- the latter box may be a ‘RPAV’ box that carries or includes an RPAV image metadata item/portion for the AV1 image data item or portion carried or included in the former box.
- Both the ‘AV01’ box and the ‘RPAV’ box carry the same or identical value (e.g., “01”, etc.) for the Box Instance Number (En) parameter in each of the boxes.
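- As a hypothetical illustration of this pairing convention, the sketch below matches image boxes to their metadata boxes by box-type counterpart and Box Instance Number; it reuses the Box records from the parsing sketch above:

    # Hypothetical type pairing: image box type -> metadata box type.
    METADATA_TYPE_FOR = {b"HEVC": b"RPHE", b"AV01": b"RPAV"}

    def pair_boxes(boxes):
        """Match each image box to its metadata box via the same Box Instance Number."""
        by_key = {(b.box_type, b.instance): b for b in boxes}
        pairs = []
        for (btype, en), image_box in by_key.items():
            md_type = METADATA_TYPE_FOR.get(btype)
            if md_type is not None:
                # The metadata box may be absent; the pair then carries None.
                pairs.append((image_box, by_key.get((md_type, en))))
        return pairs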
- the image (container) file contains multiple boxes of the same box type (BType).
- a recipient decoding device of the image (container) file uses the Box Instance Number parameter (or data field) as encoder-provided instructions or references to order and merge the image data or metadata items/portions carried in these multiple boxes of the same box type - with the payload version 1 data syntax element - into a single overall box or a single overall image data or metadata item/portion.
- the Packet Sequence Number (Z) parameter in four bytes (or 32 bits) can be set by the image encoder in each of (e.g., all, etc.) packets used to transfer or transmit a box.
- This parameter can be used to specify a specific order - e.g., an ascending order of increasing values for the Packet Sequence Number parameter in the packets, etc. - in which (e.g., all, etc.) payload data of the packets of the box is to be merged into an overall payload data for the box. Concatenation of the payload data can proceed in the specific order.
- the value for the Packet Sequence Number parameter of the very first packet among all packets of (e.g., a given instance of, etc.) a box of a particular Box Type may be set to 0x0001 or 0x00000001.
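- A minimal reassembly sketch under the same assumptions: payloads are grouped by (Box Type, Box Instance Number) and concatenated in ascending Packet Sequence Number order:

    from collections import defaultdict

    def assemble_payloads(boxes):
        """Merge packetized payloads into one blob per (Box Type, Box Instance
        Number), concatenating in ascending Packet Sequence Number (Z) order."""
        groups = defaultdict(list)
        for b in boxes:
            groups[(b.box_type, b.instance)].append(b)
        return {
            key: b"".join(p.payload for p in sorted(parts, key=lambda p: p.sequence))
            for key, parts in groups.items()
        }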
- the Box Length (LBox) parameter in four bytes (or 32 bits) can be used to specify the length of a box.
- the value for the Box Length (LBox) parameter of the box may be measured or set as the sum of (a) the combined size of all payload data carried or encoded with payload version 1 data syntax elements of all boxes of the same box type (which may be set as a specific value in an enumerator); (b) the size (4 bytes in the present example) of a single copy/instance of the Box Type (BType) parameter; (c) the size (4 bytes in the present example) of a single copy/instance of the Box Length (LBox) parameter; and (d) the length (8 bytes in the present example) of a single copy/instance of the Box Length Extension (XLBox; optional) parameter if present.
- the value for the Box Length (LBox) parameter of the box may exclude the sizes of the Packet Sequence Number (Z) parameter, the Box Instance Number (En) parameter, the Format/Packaging version # parameter, the Null byte, the ID String parameter, the (APP11 marker) Length parameter (as illustrated in FIG. 3B) or the APP11 Marker parameter.
- when the box is expanded beyond what the Box Length (LBox) parameter can indicate, the Box Length Extension (XLBox) parameter may be specified accordingly. Otherwise, the Box Length Extension (XLBox) parameter may be omitted from the Box syntax element; hence, the XLBox size is zero (0).
- An example box extension can be found in the previously mentioned ISO/IEC 18477-3:2015-12-15.
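- The Box Length rule above can be sketched as a one-line computation; total_payload_size is assumed to be the combined version 1 payload size across all boxes of the same type:

    def box_length(total_payload_size: int, extended: bool = False) -> int:
        """LBox per the rule above: combined same-type payload size plus one
        TBox (4 bytes), one LBox (4 bytes), and one XLBox (8 bytes) when
        present; En, Z and the APP11 marker/length fields are excluded."""
        return total_payload_size + 4 + 4 + (8 if extended else 0)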
- the Box Type (TBox) parameter in four bytes (or 32 bits) can be used to specify a specific type of payload data carried in a box and related context.
- Example box types, their respective values, ASCII encoding and constraints are illustrated in TABLE 1 below.
- Additional box types other than or in place of the box types illustrated in TABLE 1 may also be used, for example, to specify or define additional image metadata on an image carried in or to be constructed from the image (container) file.
- a recipient image decoding device may disregard - or perform no-op on - box types which the recipient decoding device does not understand.
- an image decoding device may abort decoding operations while providing information such as an error message to a user. Additionally, optionally or alternatively, the image decoding device may only decode the primary JPEG image and reject non-primary or extended images such as HEVC or AVI still images. Additionally, optionally or alternatively, the image decoding device may decide to attempt a full decoding even with checksum failure(s).
- the version 1 payload data syntax element in a box carried as a part of photo/image (payload) data of an APP11 marker segment can be used to carry specific content data of the box: either an image data item/portion such as HEVC or AV1 still image data, or an image metadata item/portion.
- An HEVC still image may be included or encoded in an image (container) file as described herein as an H.265 bitstream.
- the H.265 bitstream may be in conformance with one or more applicable coding specifications such as the previously mentioned Recommendation ITU-T H.265 / ISO/IEC 23008-2:2017.
- parameters carried or coded in the H.265 bitstream may be set as follows.
- the “nuh_layer_id” parameter may be set equal to “0”.
- the H.265 bitstream shall contain only one picture - for example, the H.265 bitstream includes only one HEVC image derived from a single source image from which a primary image in the image (container) file is derived.
- the “general_profile_idc” parameter may be set equal to “2” in the HEVC Main 10 Still Image Profile for coding operations used to code the H.265 bitstream.
- the “general_one_picture_only_constraint_flag” may be set equal to “1” in the HEVC Main 10 Still Image Profile.
- the “general_level_idc” parameter may be set less than or equal to “183”.
- restrictions may be signaled or set in data fields in the (e.g., HEVC, etc.) sequence parameter set and/or picture parameter set of the H.265 bitstream, for the purpose of informing and enabling the recipient device to perform relatively efficient decoding operations. Additionally, optionally, or alternatively, restrictions may be signaled (to recipient devices) or set in data fields in the AV1 OBU.
- one or more restrictions - or corresponding (e.g., video usability information or VUI, etc.) parameters - carried or coded in the H.265 bitstream may be set as follows.
- the “bit_depth_luma_minus8” parameter may be set equal to “2” for HEVC.
- the “bit_depth_chroma_minus8” parameter may be set equal to bit_depth_luma_minus8 for HEVC.
- the “chroma_format_idc” parameter may be set equal to “1” for HEVC.
- the “subsampling_x” parameter may be set equal to “1” for AV1.
- the “subsampling_y” parameter may be set equal to “1” for AV1.
- the “vui_parameters_present_flag” parameter may be set equal to “1” for HEVC.
- the “video_signal_type_present_flag” parameter may be set equal to “1” for HEVC.
- the “video_format” parameter may be set equal to “0”.
- the “color_description_present_flag” parameter may be set equal to “1” for HEVC.
- the “chroma_loc_info_present_flag” parameter may be set equal to “1” for HEVC.
- the “chroma_sample_loc_type_top_field” parameter may be set equal to “2” (or top-left sited) for HEVC.
- the “chroma_sample_loc_type_bottom_field” parameter may be set equal to “2” (or top-left sited) for HEVC.
- the “chroma_sampling_position” parameter may be set equal to “2” for AV1.
- the “seq_profile” parameter may be set equal to “0”, and the “high_bitdepth” parameter may be set equal to “1”, for AV1 (see the parameter-check sketch after this list).
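- A packager or decoder might sanity-check a parsed H.265 bitstream against the HEVC settings listed above. The following sketch is illustrative only; sps_vui is a hypothetical dictionary of parsed SPS/VUI syntax element values:

    def check_hevc_still_constraints(sps_vui: dict) -> list:
        """Return a list of deviations from the Main 10 Still Image settings."""
        expected = {
            "nuh_layer_id": 0,
            "general_profile_idc": 2,
            "general_one_picture_only_constraint_flag": 1,
            "bit_depth_luma_minus8": 2,
            "chroma_format_idc": 1,
            "vui_parameters_present_flag": 1,
            "video_signal_type_present_flag": 1,
            "chroma_sample_loc_type_top_field": 2,
        }
        problems = [
            f"{name}={sps_vui.get(name)} (expected {value})"
            for name, value in expected.items()
            if sps_vui.get(name) != value
        ]
        if sps_vui.get("general_level_idc", 0) > 183:
            problems.append("general_level_idc exceeds 183")
        return problems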
- An image decoding device may decode and/or process a received image (container) file as described herein in accordance with one or more applicable coding specifications.
- the image decoding device may support both co-sited and centered chroma sampling or subsampling for chroma image data carried or encoded in the image (container) file.
- Example types and corresponding capabilities of image decoding devices that may receive and process the received image (container) file are illustrated in TABLE 3 below.
- FIG. 6A depicts an example (image/photo) capture device (600) that may be used to implement techniques as described herein.
- Example capture devices as described herein may include, but are not limited to, cameras supporting HDR and/or SDR image capturing, mobile devices with one or more cameras, head-mounted user devices with cameras, wearable user devices with cameras, etc.
- One or more (image) sensors (602) are used to capture or generate a raw image, which may be of a bit depth of 12-16 bits.
- the raw image may be captured with specific camera settings for aperture, shutter speed, exposure, focal length, etc.
- the raw image may be processed (e.g., error correction, local and global image adjustment, etc.) by an ISP (604) to generate a post-ISP image or input image to a (e.g., encoder side, etc.) photo process core (606).
- the post-ISP image or input image may be of a bit depth of 10-12 bits and comprise high dynamic range (HDR) perceptually quantized (PQ) codewords represented in a BT.2100 color space.
- the post-ISP image or input image generated by the ISP (604) may be optionally used as a preview image (612).
- a coding block such as a (e.g., encoder side, etc.) photo process core (606) receives the post-ISP or input image from the ISP (604) and operates in conjunction with image/video codecs (e.g., 608 and 610 of FIG. 6A; external to the photo process core (606), etc.) to generate and encode images/photos of different image formats from the same input image and to package the generated/encoded images/photos into an image (container) file (614).
- the photo process core (606) may generate or provide a first (e.g., 10-bit, intermediate, reshaped, original, etc.) image - derived from the input image received by the photo process core (606) - to an HEVC codec such as an HEVC encode (608).
- the HEVC encode (608) receives, processes, converts, compresses, and/or encodes, the first image into a corresponding encoded HEVC image.
- the photo process core (606) may generate or provide a second (e.g., 8-bit, intermediate, reshaped, etc.) image - derived from the input image received by the photo process core (606) - to a JPG codec such as a JPG encode (610).
- the JPG encode (610) receives, processes, converts, compresses, and/or encodes, the second image into a corresponding encoded JPG image.
- the encoded HEVC image and the encoded JPG image derived from the same input image may be sent as outputs by the HEVC encode (608) and the JPG encode (610), and received as inputs and packaged (e.g., with image metadata, etc.) into the image (container) file (614) by the photo process core (606).
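- The FIG. 6A flow can be summarized in the following sketch; all object and method names (isp.process, photo_core.reshape, photo_core.package, etc.) are hypothetical stand-ins for the processing blocks described above:

    def encode_capture(raw_image, isp, photo_core, hevc_encode, jpg_encode):
        """Sketch of the FIG. 6A flow: one input image yields an HEVC image,
        a JPG image and image metadata, packaged into one container file."""
        post_isp = isp.process(raw_image)                    # 10-12 bit HDR PQ image
        first_image = photo_core.reshape(post_isp, bits=10)  # for HEVC encode (608)
        second_image = photo_core.map_color_volume(post_isp) # 8-bit, for JPG encode (610)
        hevc_image = hevc_encode(first_image)
        jpg_image = jpg_encode(second_image)
        metadata = photo_core.analyze(post_isp, first_image, second_image)
        return photo_core.package(jpg_image, hevc_image, metadata)  # file (614)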
- some or all of the subsystems or processing modules/blocks as illustrated in FIG. 6A may be implemented in the same capture device (600).
- FIG. 6B depicts an alternative example (image/photo) capture device (600-1) that may be used to implement techniques as described herein.
- the capture device (600-1) may be one or more cameras supporting HDR and/or SDR image capturing, a mobile device with one or more cameras, a head-mounted user device with one or more cameras, a wearable user device with one or more cameras, etc.
- One or more (image) sensors (602) are used to capture or generate a raw image, which may be of a bit depth of 12-16 bits.
- the raw image may be captured with specific camera settings for aperture, shutter speed, exposure, focal length, etc.
- the raw image may be processed (e.g., error correction, local and global image adjustment, etc.) by an ISP (604) to generate a post-ISP image that is provided to an HEVC codec or HEVC encode (606).
- the post-ISP image generated by the ISP (604) may be compressed or encoded by the HEVC encode (606) to generate an (e.g., HEVC, HEIC, etc.) encoded version of the post-ISP image.
- image processing operations such as image analysis (616) and/or metadata generation (e.g., including but not limited to information generated from the image analysis (616), etc.) may be performed on the post-ISP image and/or the encoded version of the post-ISP image before the (final) encoded version of the post-ISP image is packaged in a relatively efficient capture device image (container) file such as an HEIC image file.
- the image metadata generated from the image analysis (616) may be included in the capture device image (container) file and may be used along with the encoded version of the post-ISP image by a recipient downstream device receiving the capture device image (container) file to generate an optimized display image for rendering on an image display.
- FIG. 6C depicts an example image processing device (650) that may be used to operate with a capture device (e.g., 600-1 of FIG. 6B, etc.) to implement techniques as described herein.
- the image processing device (650) may be a mobile or non-mobile computing device, a cloud-based image processing system, an image/photo processing and/or storage service, a cloud-based photo repository, etc.
- a coding block such as a (e.g., encoder side, etc.) photo process core (606) receives the capture device image (container) file such as an HEIC image file containing an encoded version of the post-ISP image.
- the capture device image (container) file may also include image metadata.
- the photo process core (606) may extract the image metadata and the encoded (e.g., HEVC, HEIC, etc.) version of the post-ISP image from the capture device image (container) file.
- the photo process core (606) may operate in conjunction with image/video codecs (e.g., 618 and 610 of FIG. 6C, etc.).
- the photo process core (606) may operate with or invoke an HEVC codec such as an HEVC decode (618) to generate a decoded version of the post-ISP image.
- the HEVC decode (618) receives, processes, converts, decompresses, and/or decodes, the encoded version of the post-ISP image in the capture device image (container) file into the corresponding decoded version of the post-ISP image.
- the photo process core (606) may generate or provide a (e.g., 8-bit, intermediate, reshaped, etc.) image - derived from the decoded version of the post-ISP image - to a JPG codec such as a JPG encode (610).
- the JPG encode (610) receives, processes, converts, compresses, and/or encodes, the image into a corresponding encoded JPG image.
- the (e.g., HEVC, HEIC, etc.) encoded version of the post-ISP image as received in the capture device image (container) file and the encoded JPG image derived by the JPG encode (610) may be packaged, for example with image metadata, into the image (container) file (614) by the photo process core (606).
- the image metadata in the image (container) file (614) may include image metadata portions - e.g., relating to the HEVC or HEIC encoded version - generated or received from the image metadata extracted from the capture device image (container) file.
- the image metadata in the image (container) file (614) may also include other image metadata portions - e.g., relating to the JPG image - generated by the photo process core (606).
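- The FIG. 6C re-packaging flow can be sketched similarly, again with hypothetical method names standing in for the processing blocks described above:

    def transcode_capture_file(heic_file, photo_core, hevc_decode, jpg_encode):
        """Sketch of the FIG. 6C flow: unpack a capture-device HEIC file,
        derive a JPG image from the decoded HEVC image, and re-package both
        with image metadata into the image (container) file (614)."""
        hevc_image, capture_metadata = photo_core.extract(heic_file)
        decoded = hevc_decode(hevc_image)
        jpg_image = jpg_encode(photo_core.map_color_volume(decoded))
        metadata = photo_core.merge_metadata(capture_metadata,
                                             photo_core.analyze(decoded))
        return photo_core.package(jpg_image, hevc_image, metadata)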
- FIG. 7 depicts an example (image/photo) recipient decoding device (700) that may be used to implement techniques as described herein.
- the decoding device (700) may be an image display supporting HDR and/or SDR image rendering, a mobile or non-mobile computing device with one or more image displays, a TV set, etc.
- a (e.g., decoder side, etc.) photo process core (606) in the decoding device (700) may receive an image (container) file containing two or more images of different image formats such as an HEVC image and a JPG image, etc.
- the photo process core (606) operates in conjunction with an image/video codec such as a JPG codec, an HEVC codec, an AV1 codec, etc.
- the image/video codec represents an HEVC codec such as an HEVC decode (618).
- the photo process core (606) invokes the HEVC decode (618) to decode an HEVC image, which may be an HEVC encoded version of a post-ISP image captured with a camera device.
- the decoding device (700) may include or operate with an image display such as a display panel.
- Panel configuration (704) - which may specify display capabilities such as dynamic range, color gamut, image refresh rate, etc. - may be generated or provided to the photo process core (606).
- image metadata may be extracted by the photo process core (606) from the received image (container) file (614).
- Based at least in part on the panel configuration (704) and/or the image metadata, the photo process core (606) generates a display image from the HEVC image.
- the display image represents an RGB image.
- the display image may be transmitted or provided to the image display or display panel for rendering by way of an RGB image buffer (702).
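- The decoder-side FIG. 7 flow, with the same caveat that the method names are hypothetical:

    def render_display_image(container_file, photo_core, hevc_decode,
                             panel_config, rgb_buffer):
        """Sketch of the FIG. 7 flow: decode the HEVC image and adapt it to
        the display panel using the file's image metadata."""
        hevc_image, metadata = photo_core.extract(container_file)
        decoded = hevc_decode(hevc_image)
        display_image = photo_core.display_manage(decoded, metadata, panel_config)
        rgb_buffer.write(photo_core.to_rgb(display_image))  # RGB image buffer (702)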
- FIG. 6D depicts an example (e.g., encoder side, etc.) photo process core (or subsystem) 606 that may be included or implemented in an image processing device to implement techniques as described herein.
- the image processing device that includes or implements the photo process core (606) may be a mobile or non-mobile computing device, a cloud-based image processing system, an image/photo processing and/or storage service, a cloud-based photo repository, a capture device, a separate device operating with the capture device, a decoding device, a rendering device, etc. In various operational scenarios, more or fewer components and/or processing blocks may be included or implemented in the photo process core (606).
- the photo process core (606) receives an input image (620), which may, but is not necessarily limited to only, be from an ISP, from a photo image file generated by a camera, etc.
- the photo process core (606) may convert or reshape the input image (620) of a first bit depth (e.g., 10-12 bits, etc.) into an intermediate image of a second bit depth (e.g., 10 bits, etc.).
- the photo process core (606) operates with and invokes an HEVC encode (608) to generate an encoded HEVC image (or an HEVC encoded version of the input image or the converted/reshaped image, etc.).
- the photo process core (606) includes a color volume mapper (624) that may be used to generate a (e.g., color volume mapped, color gamut mapped, color space mapped, reshaped, etc.) mapped image from the input image (620).
- the photo process core (606) operates with and invokes a JPG encode (610) to generate an encoded JPG image (or a JPG encoded version of the mapped image, etc.).
- the photo process core (606) includes a metadata analysis processing block (622) that may be used to analyze some or all of the input images, intermediate images generated, reshaped, converted or mapped from the input images, the encoded images, etc., to generate image metadata.
- image metadata may include operational parameters that may be used by a recipient decoding device to generate or reconstruct images optimized for rendering in a wide variety of devices or image displays with varying system and/or display capabilities.
- Some or all of the (e.g., HEVC, JPG, etc.) encoded images and/or the image metadata may be combined or coded by a package processing block (626) into an image (container) file (614).
- the package processing block (626) may obtain EXIF (exchangeable image file format) metadata, color volume mapping metadata, an HEVC encoded image, a JPG encoded image, etc., as inputs.
- EXIF metadata and the JPG encoded image may be used by the package processing block (626) to populate or generate syntax elements corresponding to the JPG header and EXIF in a JPG image (container) file.
- some or all of the (input) EXIF metadata, color volume mapping metadata, the HEVC encoded image, etc. may be included as one or more photo payloads by the package processing block (626) to populate or generate syntax elements corresponding to one or more APP11 marker segments in the same JPG image (container) file.
- a photo process (or processing) core as described herein may be implemented with a GPU, SoC, DSP, ASIC, FPGA, CPU, or other computing resources for full quality images and/or reduced quality images.
- the reduced quality images such as gallery images or thumbnails may be generated in a “Reduced Compute” mode with relatively low computing resource usages.
- a primary (e.g., JPEG base layer, etc.) image can be rendered or displayed with no additional processing.
- a photo process core may operate with or rely on external codecs for inputs/outputs.
- Photo/image processing operations in connection with the photo process core may be split or partitioned into multiple processing blocks. These processing blocks may implement methods to generate the “most efficient” packages (e.g., an image container file with relatively few images of different image formats and/or no or relatively small amount of image metadata, etc.) and/or “most compatible” packages (e.g., an image container file with more images of different image formats and/or larger amount of image metadata, etc.) for a given input image.
- photo packaging operations may be reduced to or performed by a single “package” processing block.
- the image metadata - such as that relating to image reshaping or prediction mappings - carried or included in the image (container) file 122 can be compressed by an upstream encoding device to reduce the size of the image metadata transmitted by the encoding device to a downstream recipient (decoding) device.
- the image (container) file 122 or bitstream may carry configuration, profile and/or level parameter(s) whose values can indicate different schemes or types of metadata compression to be supported by the downstream device such as a playback device.
- the configuration, profile and/or level parameter(s) can be set to first specific value(s) to indicate limited metadata compression - or a first metadata compression scheme or type that uses relatively small-sized (e.g., metadata, etc.) buffering in the downstream device.
- the configuration, profile and/or level parameter(s) can be set to second specific value(s) to indicate extended metadata compression - or a second metadata compression scheme or type that uses relatively large-sized (e.g., metadata, etc.) buffering in the downstream device.
- the configuration, profile and/or level parameter(s) can be set to third specific value(s) to indicate no metadata compression to be performed or supported by the upstream device and/or the downstream device.
- the limited metadata compression scheme or type allows up to one (1) reference buffer for the complete composer (or image reshaping) coefficients payload and up to one (1) reference buffer for the DM coefficients.
- the extended metadata compression scheme or type allows more buffers to be used by the image reshaping operations and/or by the DM operations.
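- One way to model the three signaled schemes is shown below; the EXTENDED buffer counts are illustrative assumptions (e.g., one buffer per identifier in the 0 to 15 range), not values stated herein:

    from enum import Enum

    class MetadataCompression(Enum):
        NONE = "none"          # third value(s): no metadata compression
        LIMITED = "limited"    # small-sized buffering at the downstream device
        EXTENDED = "extended"  # large-sized buffering at the downstream device

    def reference_buffer_budget(mode: MetadataCompression):
        """(composer buffers, DM buffers) a downstream device provisions."""
        return {
            MetadataCompression.NONE: (0, 0),
            MetadataCompression.LIMITED: (1, 1),   # up to one of each, per the text
            MetadataCompression.EXTENDED: (16, 16),  # assumed, not specified
        }[mode]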
- image metadata compression operations may be performed with respect to image metadata portions containing prediction or reshaping operational parameters.
- a first flag (e.g., denoted as “use_prev_di_rpu_flag”, a specific syntax element, etc.) coded in the image (container) file 122 may be set by the upstream device to indicate that some or all image processing operations such as image reshaping or prediction operations to be performed on or for the current image by the downstream device can share or make use of the same previously sent image metadata portion, thereby reducing or omitting some or all of a separate (e.g., reshaping, prediction, composer, etc.) image metadata portion to be explicitly or specifically coded or transmitted by the upstream device to the downstream device for the current image in the image (container) file 122.
- otherwise (e.g., when the first flag is set to zero (0)), image processing operations such as image reshaping or prediction operations can be performed by the downstream device on or for the current image using the image metadata portion explicitly or specifically coded/transmitted by the upstream device for the current image in the image (container) file 122 and decoded or retrieved for the current image by the downstream device from the image (container) file 122.
- an image or picture may be indicated or carried in the image (container) file 122 as a key frame for which the value of the first flag (“use_prev_di_rpu_flag”) is set to zero (0).
- a second flag (e.g., denoted as “prev_di_rpu_id”, a second specific syntax element, etc.) coded in the image (container) file (122) may be set by the upstream device to a specific value (e.g., a specific value within a value range of 0 to 15, etc.) to signal or indicate to the downstream device which specific portion of the previously sent image metadata (e.g., for a previous image, etc.) is to be used for performing image processing, reshaping or predicting operations on or for the current image.
- the previously sent image metadata by the upstream device and received by the downstream device may include a plurality of (previously sent) image metadata portions maintained or stored in memory or cache at the downstream device.
- Each (previously sent) image metadata portion in the plurality of image metadata portions at the downstream device may be labeled or identified with a respective specific metadata portion identifier - among a plurality of metadata portion identifiers respectively for the plurality of (previously sent) image metadata portions.
- the respective specific metadata portion identifier may be within a specific value range such as 0 to 15.
- the second flag (“prev_di_rpu_id”) set with a specific value in the specific value range may be used by the upstream device to signal or indicate to the downstream device that a specific (previously sent) image metadata portion in the plurality of image metadata portions maintained or cached at the downstream device is to be used by the downstream device to perform image processing, reshaping or predicting on or for the current image.
- the specific image metadata portion may be previously sent for a previous image with a third flag (denoted as “di_rpu_id”, a specific syntax element, etc.) set for the previous image to the same metadata portion identifier value as that of the second flag (“prev_di_rpu_id” etc.) set for the current image.
- in response to determining that the second flag (“prev_di_rpu_id”) is not present for the current image in the image (container) file 122, the downstream device can proceed to set the second flag internally to a value outside the specific value range such as -1 (e.g., an invalid rpu_id, etc.). This helps avoid or prevent the downstream device from subsequently using an incorrect (previously sent) image metadata portion to perform image processing, reshaping or prediction on or for the current image.
- a specific image metadata portion - such as one relating to (e.g., inter-layer, etc.) image reshaping or prediction - for the current image is explicitly or specifically sent, transmitted, included or present in the image (container) file 122 from the upstream device to the downstream device.
- image processing operations - which may include but are not necessarily limited to only, image reshaping or prediction operations - can be performed on or for the current image using the (currently sent) specific image metadata portion.
- Some or all operational parameters in the (currently sent) specific image metadata portion may be specified with a (currently sent) data structure (e.g., denoted as “di_rpu_data_mapping()”, a specific set of syntax elements, etc.) explicitly or specifically coded for the current image in the image (container) file 122.
- the downstream (receiving or decoding) device can store, maintain or cache received image metadata portions, including optimized operational parameter values generated by the upstream device and included in the image (container) file 122, such as in the “di_rpu_data_mapping()” data structure. If so signaled or directed by the upstream device with the image (container) file 122, these cached portions can be used in, or referred to by, subsequent operations performed by the downstream device as reference data for the current and subsequent images.
- each image metadata portion - e.g., operational parameters such as prediction coefficients - in the plurality of image metadata portions stored or cached at the downstream device as the reference data can be distinctly tagged, indexed or identified with a respective value (which may be the same value of the third flag “di_rpu_id” as set by the upstream device and received by the downstream device) within the specific value range.
- the downstream device can update the reference data by overwriting the previously received image metadata portion with the currently received image metadata portion including any prediction coefficients explicitly transmitted for the current image.
- the upstream device may determine that the value of the first flag (“use_prev_di_rpu_flag”) should be set to one (1) for the current image.
- the upstream device does not transmit or include inter-layer prediction coefficients explicitly in the image (container) file 122 for the current image.
- the downstream device or an image metadata parser therein may infer that the stored data structure - containing inter-layer prediction coefficients - with a tag/index/identifier value equal to zero (0) - e.g., as a fallback - should be read and re-used for processing the current image.
- for example, such a fallback data structure may contain inter-layer prediction coefficients representing (e.g., default, trivial, supported, etc.) 1:1 linear mappings for processing the current image or picture.
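- The flag handling described above can be summarized in a downstream-side cache sketch; the field names mirror the flags described herein, while the class and method names are hypothetical:

    class ComposerMetadataCache:
        """Downstream-side sketch of the flag handling described above; keys
        are di_rpu_id values in the 0-15 range, with id 0 as the fallback."""
        def __init__(self):
            self.reference = {}   # di_rpu_id -> di_rpu_data_mapping() payload

        def resolve(self, rpu: dict):
            if rpu.get("use_prev_di_rpu_flag", 0) == 0:
                # Explicit metadata: use it and overwrite the tagged slot.
                mapping = rpu["di_rpu_data_mapping"]
                self.reference[rpu.get("di_rpu_id", 0)] = mapping
                return mapping
            # Re-use previously sent metadata; an absent id maps to -1
            # (outside the valid range) so a wrong buffer is never silently
            # used, and id 0 is read as the fallback slot.
            prev_id = rpu.get("prev_di_rpu_id", -1)
            return self.reference.get(prev_id, self.reference.get(0))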
- Different images - e.g., of different image formats, different image characteristics or qualities such as different dynamic range, different bit depths, different color precisions, etc. - that can be generated from image data and attendant image metadata in the image (container) file 122 may be logically represented as different image layers.
- the image (container) file 122 may or may not explicitly include all image data and/or all image metadata that are needed to generate a specific image or image layer in these different images or image layers. Rather, inter-layer image processing operations such as inter-layer prediction, reshaping, mapping, etc., may be used to generate some or all of the image data constituting the specific image.
- Image processing operations as described herein - which may include, but are not necessarily limited to, any of: (e.g., inter-layer, etc.) image prediction, image reshaping, image mapping, etc. - may be performed with polynomial pivot points, polynomial coefficients, and the like, signaled or transferred from the upstream device to the downstream device in image metadata.
- Example image prediction, reshaping and/or mapping methods as described herein may include, but are not necessarily limited to only, any of: linear interpolation, second-order polynomial interpolation, multiple color channel, multiple regression (MMR) prediction, etc., for example, as described in U.S. Patent No. 10,021,390, the entire contents of which are incorporated herein by reference.
- image prediction, reshaping and/or mapping methods as described herein may include tensor-product B-spline (TPB) prediction, for example as described in U.S. Patent Application Publication No. 2022/0408081, the entire contents of which are incorporated herein by reference.
- image metadata compression operations may be performed with respect to DM metadata portions containing display management (DM) operational parameters.
- a fourth flag (e.g., denoted as “use_prev_level_md_flag”, a specific syntax element, etc.) coded in the image (container) file 122 may be set by the upstream device to indicate that the DM operations to be performed for the current image by the downstream device can share or make use of the same previously sent DM metadata portion, thereby reducing or omitting some or all of a separate DM metadata portion to be explicitly or specifically coded or transmitted by the upstream device to the downstream device for the current image in the image (container) file 122.
- otherwise, the DM operations can be performed by the downstream device for the current image using the DM metadata portion explicitly or specifically coded/transmitted by the upstream device for the current image in the image (container) file 122 and decoded or retrieved for the current image by the downstream device from the image (container) file 122.
- a fifth flag (e.g., denoted as “prev_level_md_id”, another specific syntax element, etc.) coded in the image (container) file (122) may be set by the upstream device to a specific value (e.g., a specific value within a value range of 0 to 15, etc.) to signal or indicate to the downstream device which specific portion of the previously sent DM metadata (e.g., for a previous image, etc.) is to be used for performing DM operations for the current image.
- the previously sent DM metadata by the upstream device and received by the downstream device may include a plurality of (previously sent) DM metadata portions maintained or stored in memory or cache at the downstream device.
- Each (previously sent) DM metadata portion in the plurality of DM metadata portions at the downstream device may be labeled or identified with a respective specific DM metadata portion identifier - among a plurality of DM metadata portion identifiers respectively for the plurality of (previously sent) DM metadata portions.
- the respective specific DM metadata portion identifier may be within a specific value range such as 0 to 15.
- the fifth flag (“prev_level_md_id”) set with a specific value in the specific value range may be used by the upstream device to signal or indicate to the downstream device that a specific (previously sent) DM metadata portion in the plurality of DM metadata portions maintained or cached at the downstream device is to be used by the downstream device to perform DM operations for the current image.
- the specific DM metadata portion may be previously sent for a previous image with the third flag (“di_rpu_id”) set for the previous image to the same DM metadata portion identifier value as that of the fifth flag (“prev_level_md_id” etc.) set for the current image.
- in response to determining that the fifth flag (“prev_level_md_id”) is not present for the current image in the image (container) file 122, the downstream device can proceed to set the fifth flag internally to a value outside the specific value range such as -1 (e.g., an invalid rpu_id, etc.). This helps avoid or prevent the downstream device from subsequently using an incorrect (previously sent) DM metadata portion to perform DM operations for the current image.
- a specific DM metadata portion for the current image is explicitly or specifically sent, transmitted, included or present in the image (container) file 122 from the upstream device to the downstream device.
- the DM operations can be performed for the current image using the (currently sent) specific DM metadata portion.
- Some or all operational parameters in the (currently sent) specific DM metadata portion may be specified with a (currently sent) data structure (e.g., denoted as “di_dm_data_payload()”, a specific set of syntax elements, etc.) explicitly or specifically coded for the current image in the image (container) file 122.
- the downstream (receiving or decoding) device can store, maintain or cache received DM metadata portions, including optimized operational parameter values generated by the upstream device and included in the image (container) file 122, such as in the “di_dm_data_payload()” data structure. If so signaled or directed by the upstream device with the image (container) file 122, these cached portions can be used in, or referred to by, subsequent operations performed by the downstream device as reference data for the current and subsequent images.
- each DM metadata portion - e.g., operational parameters for DM operations - in the plurality of DM metadata portions stored or cached at the downstream device as the reference data can be distinctly tagged, indexed or identified with a respective value (which may be the same value of the third flag “di_rpu_id” as set by the upstream device and received by the downstream device) within the specific value range.
- the downstream device can update the reference data by overwriting the previously received DM metadata portion with the currently received DM metadata portion explicitly transmitted for the current image.
- the upstream device may determine that the value of the fourth flag (“use_prev_level_md_flag”) should be set to one (1) for the current image.
- the upstream device does not transmit or include DM operational parameters explicitly in the image (container) file 122 for the current image.
- the downstream device or an image metadata parser therein may infer that the stored data structure - containing DM operational parameters - with a tag/index/identifier value equal to zero (0) - e.g., as a fallback - should be read and re-used in DM operations for the current image.
- a metadata compression sub-system or module may be implemented by the upstream device to support or perform metadata compression operations on image metadata related to or associated with image prediction, image reshaping or display management. In some operational scenarios, some or all of the metadata compression operations such as the limited metadata compression may be performed in an (image or picture) encoding order.
- Example encoding orders may be, but are not necessarily limited to only, one of: encoding a key image frame first, followed by encoding non-key image frame(s) referencing the key image frame; encoding a to-be-referenced image frame first, followed by encoding image frame(s) referencing the to-be-referenced image; etc. Additionally, optionally or alternatively, in some operational scenarios, an image metadata stream or sub-stream with metadata compression may be generated by the upstream device in a display order.
- FIG. 9 illustrates example image metadata compression operations that may be implemented or performed by the upstream device.
- Block 902 comprises receiving a request to encode an image metadata portion (“RPU”) relating to (e.g., inter-layer, etc.) image prediction, image reshaping, etc., for an image frame.
- the upstream device proceeds to determine whether the image frame is a key frame.
- Block 904 comprises, in response to determining that the image frame is a key frame, setting the image metadata portion (RPUref) for this (reference) key frame to, or including it based on, an input image metadata portion (RPUin).
- Block 910 comprises setting the first flag (“use_prev_di_rpu_flag”) to zero (0).
- Block 906 comprises, in response to determining that the image frame is not a key frame, determining whether the input image metadata portion (RPUin) for this (non-reference) frame is equal to the image metadata portion (RPUref) already set or included for the key frame.
- Block 908 comprises, in response to determining that the input image metadata portion (RPUin) for this (non-reference) frame is equal to the image metadata portion (RPUref) already set or included for the key frame, setting the first flag (“use_prev_di_rpu_flag”) to one (1) and the second flag (“prev_di_rpu_id”) to the same value as that of the third flag (“di_rpu_id”) of the image metadata portion (RPUref) already set or included for the key frame.
- This process flow of FIG. 9 may be repeatedly or iteratively performed for all images or pictures (or image frames) in an encoding order, thereby generating compressed image metadata units for image processing operations such as (e.g., inter-layer) image prediction or reshaping operations (RPU with compressed composer metadata).
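- An encoder-side sketch of this flow (blocks 902-910), iterating over frames in encoding order; sending explicit metadata when a non-key frame's RPU differs from the reference is an assumption, since the figure only covers the matching case:

    def compress_composer_metadata(frames, emit):
        """Each frame is a tuple (is_key_frame, rpu_in); `emit` writes one
        image metadata unit into the file/bitstream."""
        rpu_ref, ref_id = None, 0
        for is_key_frame, rpu_in in frames:
            if is_key_frame or rpu_in != rpu_ref:
                rpu_ref = rpu_in                                # block 904
                emit(use_prev_di_rpu_flag=0, di_rpu_id=ref_id,  # block 910
                     di_rpu_data_mapping=rpu_in)
            else:                                               # blocks 906/908
                emit(use_prev_di_rpu_flag=1, prev_di_rpu_id=ref_id)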
- the downstream device or a metadata parser therein may extract or receive, from the image (container) file 122 or bitstream generated by the upstream device, image metadata (“RPU”) portions or payloads in the encoding order.
- the metadata parser may maintain one (e.g., single, etc.) first metadata buffer to store or cache a first (e.g., entire, etc.) data structure (“di_rpu_data_payload()”) of a current image metadata portion for which the first flag (“use_prev_di_rpu_flag”) is set to zero (0) and the third flag (“di_rpu_id”) is also set to zero (0).
- the metadata parser may maintain another (e.g., single, etc.) second metadata buffer to store or cache a second (e.g., entire, etc.) DM metadata structure (“di_dm_data_payload()” or “di_dm_data_payload2()”) for which one or more flags indicate or signal as explicitly being carried or included in the image (container) file 122 or bitstream.
- the metadata parser can restore or reuse operational parameters (e.g., parameters or coefficients relating to display management, etc.) from the second metadata buffer if the second data structure (“di_dm_data_payload()” or “di_dm_data_payload2()”) is not present, for example as indicated by one or more corresponding flags for DM metadata compression.
- FIG. 4A illustrates an example process flow according to an embodiment of the present invention.
- The process flow of FIG. 4A may be implemented or performed by one or more computing devices or components (e.g., an encoding device/module, a transcoding device/module, a decoding device/module, an inverse tone mapping device/module, a tone mapping device/module, a media device/module, a reverse mapping generation and application system, etc.).
- an image processing system encodes a primary image of a first image format into an image file designated for the first image format.
- the image processing system encodes a non-primary image of a second image format into one or more attendant segments of the image file.
- the second image format is different from the first image format.
- the image processing system causes a display image derived from a reconstructed image to be rendered with a recipient device of the image file.
- the reconstructed image is generated from one of the primary image or the non-primary image.
- the primary image represents a JPEG image; wherein the image file represents a JPEG image file; wherein the non-primary image represents a non-JPEG image.
- both the primary image and the non-primary image are derived from a same source image.
- the non-primary image represents one of one or more non-primary images of the second image format that are encoded in the image file with the primary image.
- the one or more segments are encoded as application 11 (APP11) marker segments in the image file.
- one or more image metadata portions are encoded in one or more second segments of the image file.
- the one or more image metadata portions include a specific image metadata portion that carries specific operational parameters for specific image processing operations to be performed by the recipient device on one of: the primary image or the non-primary image.
- the specific image processing operations include one or more of: image forward reshaping, image backward reshaping, image inverse mapping, image mapping, color space conversion, codeword linear mapping, codeword non-linear mapping, display management operations, perceptual quantization based mapping, mapping based on one or more transfer functions, other image processing operations performed by the recipient device, etc.
- the specific image metadata portion is generated by concatenating one or more boxes carried in one or more application 11 (APP11) marker segments included in the one or more second segments.
- the non-primary image of the second image format represents one of: an HEVC image, an AV1 image, or another non-JPEG image.
- FIG. 4B illustrates an example process flow according to an embodiment of the present invention.
- The process flow of FIG. 4B may be implemented or performed by one or more computing devices or components (e.g., an encoding device/module, a transcoding device/module, a decoding device/module, an inverse tone mapping device/module, a tone mapping device/module, a media device/module, a prediction model and feature selection system, a reverse mapping generation and application system, etc.).
- an image decoding system receives an image file designated for a first image format. The image file is encoded with a primary image of the first image format.
- the image decoding system decodes a non-primary image of a second image format from one or more attendant segments of the image file.
- the second image format is different from the first image format.
- the image decoding system causes a display image derived from a reconstructed image to be rendered on an image display.
- the reconstructed image is generated from one of the primary image or the non-primary image using image metadata carried in the image file.
- the image decoding system further performs: receiving a second image file designated for the first image format, wherein the second image file is encoded with a second primary image of the first image format and is not encoded with any image other than the second primary image; decoding the second primary image of the first image format from one or more second attendant segments of the second image file; and causing a second display image derived from the second primary image to be reconstructed and rendered on the image display.
- the display image is generated by one or more image processing operations; one or more operational parameters for the one or more image processing operations are decoded from an image metadata portion carried by the image file.
- In an embodiment, a computing device such as a display device, a mobile device, a set-top box, a multimedia device, etc. is configured to perform any of the foregoing methods.
- an apparatus comprises a processor and is configured to perform any of the foregoing methods.
- a non-transitory computer readable storage medium storing software instructions, which when executed by one or more processors cause performance of any of the foregoing methods.
- a computing device comprising one or more processors and one or more storage media storing a set of instructions which, when executed by the one or more processors, cause performance of any of the foregoing methods.
- Embodiments of the present invention may be implemented with a computer system, systems configured in electronic circuitry and components, an integrated circuit (IC) device such as a microcontroller, a field programmable gate array (FPGA), or another configurable or programmable logic device (PLD), a discrete time or digital signal processor (DSP), an application specific IC (ASIC), and/or apparatus that includes one or more of such systems, devices or components.
- the computer and/or IC may perform, control, or execute instructions relating to the adaptive perceptual quantization of images with enhanced dynamic range, such as those described herein.
- the computer and/or IC may compute any of a variety of parameters or values that relate to the adaptive perceptual quantization processes described herein.
- the image and video embodiments may be implemented in hardware, software, firmware and various combinations thereof.
- Certain implementations of the invention comprise computer processors which execute software instructions which cause the processors to perform a method of the disclosure.
- one or more processors in a display, an encoder, a set top box, a transcoder or the like may implement methods related to adaptive perceptual quantization of HDR images as described above by executing software instructions in a program memory accessible to the processors.
- Embodiments of the invention may also be provided in the form of a program product.
- the program product may comprise any non-transitory medium which carries a set of computer-readable signals comprising instructions which, when executed by a data processor, cause the data processor to execute a method of an embodiment of the invention.
- Program products according to embodiments of the invention may be in any of a wide variety of forms.
- the program product may comprise, for example, physical media such as magnetic data storage media including floppy diskettes and hard disk drives, optical data storage media including CD ROMs and DVDs, electronic data storage media including ROMs and flash RAM, or the like.
- the computer-readable signals on the program product may optionally be compressed or encrypted.
- Where a component (e.g., a software module, processor, assembly, device, circuit, etc.) is referred to above, unless otherwise indicated, reference to that component should be interpreted as including as equivalents of that component any component which performs the function of the described component (e.g., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure which performs the function in the illustrated example embodiments of the invention.
- the techniques described herein are implemented by one or more special-purpose computing devices.
- the special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination.
- Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques.
- FIG. 5 is a block diagram that illustrates a computer system 500 upon which an embodiment of the invention may be implemented.
- Computer system 500 includes a bus 502 or other communication mechanism for communicating information, and a hardware processor 504 coupled with bus 502 for processing information.
- Hardware processor 504 may be, for example, a general purpose microprocessor.
- Computer system 500 also includes a main memory 506, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 502 for storing information and instructions to be executed by processor 504.
- Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504.
- Such instructions when stored in non-transitory storage media accessible to processor 504, render computer system 500 into a special-purpose machine that is customized to perform the operations specified in the instructions.
- Computer system 500 further includes a read only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504.
- a storage device 510 such as a magnetic disk or optical disk, is provided and coupled to bus 502 for storing information and instructions.
- Computer system 500 may be coupled via bus 502 to a display 512, such as a liquid crystal display, for displaying information to a computer user.
- An input device 514 is coupled to bus 502 for communicating information and command selections to processor 504.
- Another type of user input device is cursor control 516, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 504 and for controlling cursor movement on display 512.
- This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
- Computer system 500 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 500 to be a special-purpose machine. According to one embodiment, the techniques as described herein are performed by computer system 500 in response to processor 504 executing one or more sequences of one or more instructions contained in main memory 506. Such instructions may be read into main memory 506 from another storage medium, such as storage device 510. Execution of the sequences of instructions contained in main memory 506 causes processor 504 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
- Non-volatile media includes, for example, optical or magnetic disks, such as storage device 510.
- Volatile media includes dynamic memory, such as main memory 506.
- Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, or any other memory chip or cartridge.
- Storage media is distinct from but may be used in conjunction with transmission media.
- Transmission media participates in transferring information between storage media.
- For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 502.
- Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
- Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 504 for execution.
- For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer.
- The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
- A modem local to computer system 500 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal.
- An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 502.
- Bus 502 carries the data to main memory 506, from which processor 504 retrieves and executes the instructions.
- The instructions received by main memory 506 may optionally be stored on storage device 510 either before or after execution by processor 504.
- Computer system 500 also includes a communication interface 518 coupled to bus 502.
- Communication interface 518 provides a two-way data communication coupling to a network link 520 that is connected to a local network 522.
- For example, communication interface 518 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line.
- As another example, communication interface 518 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN.
- Wireless links may also be implemented.
- In any such implementation, communication interface 518 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
- Network link 520 typically provides data communication through one or more networks to other data devices.
- For example, network link 520 may provide a connection through local network 522 to a host computer 524 or to data equipment operated by an Internet Service Provider (ISP) 526.
- ISP 526 in turn provides data communication services through the worldwide packet data communication network now commonly referred to as the “Internet” 528.
- Internet 528 uses electrical, electromagnetic or optical signals that carry digital data streams.
- The signals through the various networks and the signals on network link 520 and through communication interface 518, which carry the digital data to and from computer system 500, are example forms of transmission media.
- Computer system 500 can send messages and receive data, including program code, through the network(s), network link 520 and communication interface 518.
- For example, a server 530 might transmit a requested code for an application program through Internet 528, ISP 526, local network 522 and communication interface 518.
- The received code may be executed by processor 504 as it is received, and/or stored in storage device 510, or other non-volatile storage for later execution.
- The invention may be embodied in any of the forms described herein, including, but not limited to, the following Enumerated Example Embodiments (EEEs), which describe structure, features, and functionality of some portions of embodiments of the present invention.
- EEE1 A method comprising: encoding a primary image of a first image format into an image file designated for the first image format; encoding a non-primary image of a second image format into one or more attendant segments of the image file, wherein the second image format is different from the first image format; causing a display image derived from a reconstructed image to be rendered with a recipient device of the image file, wherein the reconstructed image is generated from one of the primary image or the non-primary image.
- EEE2 The method of EEE1, wherein the non-primary image is represented by non-primary image data divided into one or more payload data portions respectively included in the one or more attendant segments of the image file; wherein each attendant segment in the one or more attendant segments of the image file includes a respective box designated to contain a corresponding payload data portion in the one or more payload data portions.
- EEE3 The method of EEE2, wherein each attendant segment in the one or more attendant segments of the image file includes a first data length field and a second data length field; wherein a total number of bytes of all the one or more payload data portions is indicated in the first length field in response to determining that the first length field is not set to a specific reserved value among a plurality of reserved values; wherein the total number of bytes of all the one or more payload data portions is indicated in the second length field in response to determining that the first length field is set to the specific reserved value. (A non-normative parsing sketch of this length-field scheme appears after the EEE list.)
- EEE4 The method of EEE2, wherein each attendant segment in the one or more attendant segments of the image file includes a data type field of a specific data type value indicating the second image format.
- EEE5 The method of any of EEE1-EEE4, wherein the primary image represents a JPEG image; wherein the image file represents a JPEG image file; wherein the non-primary image represents a non-JPEG image.
- EEE6 The method of any of EEE1-EEE5, wherein both the primary image and the non-primary image are derived from a same source image.
- EEE7 The method of any of EEE1-EEE6, wherein the non-primary image represents one of one or more non-primary images of the second image format that are encoded in the image file with the primary image.
- EEE8 The method of any of EEE1-EEE7, wherein the one or more attendant segments are encoded as application 11 (APP11) marker segments in the image file.
- EEE9 The method of any of EEE1-EEE8, wherein one or more image metadata portions are encoded in one or more second segments of the image file.
- EEE10 The method of any of EEE1-EEE9, wherein the one or more image metadata portions includes an image metadata portion that includes explicitly specified first operational parameter values for generating a first image from the image file; wherein the image file is free of explicitly specified second operational parameter values for generating a second different image from the image file; wherein the image file includes one or more flags in place of the explicitly specified second operational parameter values to indicate re-using the explicitly specified first operational parameter values for generating the second different image from the image file.
- EEE11 The method of any of EEE1-EEE10, wherein the explicitly specified first operational parameter values relate to one or more of: image prediction, image mapping, image reshaping, display management, or other image processing operations.
- EEE12 The method of EEE9, wherein the one or more image metadata portions includes a specific image metadata portion that carries specific operational parameters for specific image processing operations to be performed by the recipient device on one of: the primary image or the non-primary image.
- EEE13 The method of EEE12, wherein the specific image processing operations include one or more of: image forward reshaping, image backward reshaping, image inverse mapping, image mapping, color space conversion, codeword linear mapping, codeword nonlinear mapping, display management operations, perceptual quantization based mapping, mapping based on one or more transfer functions, or other image processing operations performed by the recipient device.
- EEE14 The method of EEE12, wherein the specific image metadata portion is generated by concatenating one or more boxes carried in one or more application 11 (APP11) marker segments included in the one or more second segments.
- EEE15 The method of any of EEE1-EEE14, wherein the non-primary image of the second image format represents one of: an HEVC image, an AV1 image, or another non-JPEG image.
- EEE16 A method comprising: receiving an image file designated for a first image format, wherein the image file is encoded with a primary image of the first image format; decoding a non-primary image of a second image format from one or more attendant segments of the image file, wherein the second image format is different from the first image format, wherein both the primary image and the non-primary image are derived from a same source image; causing a display image derived from a reconstructed image to be rendered on an image display, wherein the reconstructed image is generated from one of the primary image or the non-primary image using image metadata carried in the image file.
- EEE17 The method of EEE16, further comprising: receiving a second image file designated for the first image format, wherein the second image file is encoded with a second primary image of the first image format, wherein the second image file is not encoded with another image other than the second primary image; decoding the second primary image of the first image format from one or more second attendant segments of the second image file; causing a second display image derived from the second primary image to be reconstructed and rendered on the image display.
- EEE18 The method of EEE16 or EEE17, wherein the display image is generated by one or more image processing operations; wherein one or more operational parameters for the one or more image processing operations are decoded from an image metadata portion carried by the image file.
- EEE19 The method of any of EEE16-18, further comprising: maintaining at least one image metadata buffer to store at least one image metadata portion, relating to a first image, received with the image file; using first operational parameter values specified in the at least one image metadata portion in the at least one image metadata buffer to apply one or more image processing operations relating to a second different image.
- EEE20 A method comprising: capturing a raw image with a capture device; processing the raw image with an image signal processor (ISP) of the capture device into a post-ISP image; converting the post-ISP image with two codecs of different image formats in the capture device into two images of different image formats; packaging the two images of different image formats with a photo processing subsystem of the capture device into a single image file; causing a display image to be generated by a recipient device of the single image file from one of the two images of the different image formats and rendered on an image display operating with the recipient device.
- ISP image signal processor
- EEE21 A method comprising: receiving, by a photo processing device, an input image file containing a first image of a first image format; invoking a first codec of the first image format in the photo processing device to decode the first image of the first image format into a decoded image; converting the decoded image with a second codec of a second different image format in the photo processing device into a second image of the second image format; packaging the first image of the first image format and the second image of the second image format with a photo processing subsystem of the photo processing device into a single image file; causing a display image to be generated by a recipient device of the single image file from one of the two images of the different image formats and rendered on an image display operating with the recipient device.
- EEE22 A method comprising: receiving, by a photo recipient device, an image file containing two or more images of different image formats; invoking a codec of one of the different image formats in the photo recipient device to decode one of the two or more images of the different image formats into a decoded image; generating, by the photo recipient device based at least in part on an image display configuration of the photo recipient device, a display image from the decoded image; causing, by the photo recipient device, the display image to be rendered on an image display of the photo recipient device with the image display configuration.
- EEE23 A non-transitory computer readable storage medium, storing software instructions, which when executed by one or more processors cause performance of the method recited in any of EEE1-EEE22.
- EEE24 An apparatus comprising a processor and configured to perform any one of the methods recited in EEE1-EEE22.
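Purely as a non-normative illustration of the length-field scheme recited in EEE3, the sketch below shows one way such fields might be parsed. It assumes a 32-bit big-endian first data length field, a 64-bit big-endian second data length field, and the value 1 as the specific reserved value; none of these widths or values are fixed by the EEEs.

```python
import struct

def read_total_payload_length(header: bytes) -> tuple[int, int]:
    """Parse the EEE3-style length fields at the start of an attendant
    segment's box header. Returns (total_payload_bytes, bytes_consumed).

    Assumptions (not mandated by the EEEs): the first data length field is a
    32-bit big-endian integer, the second is a 64-bit big-endian integer, and
    1 is the specific reserved value that redirects the reader to the second
    field."""
    (first,) = struct.unpack_from(">I", header, 0)
    if first == 1:  # reserved value: the total length lives in the second field
        (second,) = struct.unpack_from(">Q", header, 4)
        return second, 12
    return first, 4
```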
Abstract
A primary image of a first image format is encoded into an image file designated for the first image format. A non-primary image of a second image format is encoded into one or more attendant segments of the image file. The second image format is different from the first image format. A display image derived from a reconstructed image is caused to be rendered with a recipient device of the image file. The reconstructed image is generated from one of the primary image or the non-primary image.
Description
PHOTO CODING OPERATIONS FOR DIFFERENT IMAGE DISPLAYS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This patent application claims the benefit of priority from U.S. Provisional Application No. 63/528,608, filed on 24 July 2023, and U.S. Provisional Application No. 63/606,424, filed on 5 December 2023, each of which is incorporated by reference herein in its entirety.
TECHNOLOGY
[0002] The present disclosure relates generally to images. More particularly, an embodiment of the present disclosure relates to photo coding operations for different image displays.
BACKGROUND
[0003] Display techniques have been developed to support transmitting and rendering (e.g., photo, etc.) image content based on specific image formats. For example, JPEG image encoders and decoders may support image content coded in a JPEG image format. Other image encoders and decoders may support image content coded in different image formats.
[0004] A consumer or end user device such as a handheld device typically is installed or configured with a limited set of image codecs, each of which may support a specific image format in a limited set of image formats. Thus, if an image is not encoded and delivered in a supported image format, the device will likely be incapable of finding a suitable image decoder to decode and help render the image content. Even if rendered, the rendered image content may comprise an incorrect interpretation or representation of the received image content and produce visible artifacts in colors and luminance values.
[0005] As appreciated by the inventors here, improved techniques for composing image content data that can be used to support display capabilities of a wide variety of display devices are desired.
[0006] The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] An embodiment of the present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
[0008] FIG. 1 depicts an example process of a delivery pipeline;
[0009] FIG. 2A and FIG. 2B illustrate example image codec architectures;
[00010] FIG. 3A through FIG. 3C illustrate example coding syntaxes or structures of an image (container) file, an APP11 marker segment or data box(es);
[0010] FIG. 4A and FIG. 4B illustrate example process flows;
[0011] FIG. 5 illustrates a simplified block diagram of an example hardware platform on which a computer or a computing device as described herein may be implemented;
[0012] FIG. 6A and FIG. 6B depict example (image/photo) capture devices;
[0013] FIG. 6C depicts an example image processing device;
[0014] FIG. 7 depicts an example image/photo recipient device;
[0015] FIG. 8 depicts example image/photo packaging operations; and
[0016] FIG. 9 illustrates example image metadata compression operations.
DESCRIPTION OF EXAMPLE EMBODIMENTS
[0017] In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be apparent, however, that the present disclosure may be practiced without these specific details. In other instances, well-known structures and devices are not described in exhaustive detail, in order to avoid unnecessarily occluding, obscuring, or obfuscating the present disclosure.
Summary
[0018] Techniques as described herein can be used to package or encode photos or still images of primary and/or non-primary image formats in image (container) files with image metadata to enable downstream recipient devices to reconstruct photos or still images of different formats, different dynamic ranges, different color spaces, different bit depths, etc.
[0019] The image (container) files (e.g., JPEG image files, etc.) may be designated to carry photos/images (e.g., JPEG images, etc.) of the primary image formats (e.g., JPEG, etc.). Attendant (data) segments of the image (container) files such as APP11 segments or the like may be used to carry photos/images (e.g., non-JPEG images, etc.) of the non-primary image formats (e.g., non-JPEG, etc.) and/or the image metadata.
[0020] The image metadata may carry operational parameters or values thereof that have been optimized by upstream devices. These optimized operational parameters or values may be used in image reconstruction operations such as forward and/or backward reshaping operations.
[0021] The reconstructed photos or still images generated from the photos/images in the image (container) files can be optimized for rendering on image displays of the downstream recipient devices using the image metadata carried in the same image (container) files.
[0022] Example embodiments described herein relate to packaging and encoding photos in image (container) files. A primary image of a first image format is encoded into an image file designated for the first image format. A non-primary image of a second image format is encoded into one or more attendant segments of the image file. The second image format is different from the first image format. A display image derived from a reconstructed image is caused to be rendered with a recipient device of the image file. The reconstructed image is generated from one of the primary image or the non-primary image.
[0023] Example embodiments described herein relate to decoding and rendering photos in image (container) files for image reconstruction and rendering. An image file designated for a first image format is received. The image file is encoded with a primary image of the first image format. A non-primary image of a second image format is decoded from one or more attendant segments of the image file. The second image format is different from the first image format. A display image derived from a reconstructed image is rendered on an image display. The reconstructed image is generated from the non-primary image decoded from the image file.
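As a non-normative aid to reading the summary above, the following minimal sketch models the packaging of a primary image with optional non-primary images and metadata, and the decode-side choice between them. All names are illustrative, and the byte-level container syntax (markers, segments, boxes) is deliberately elided.

```python
from dataclasses import dataclass, field

@dataclass
class ImageContainerFile:
    """Toy in-memory model of the image (container) file: a primary image in
    the file's designated format, optional non-primary images of other
    formats, and the image metadata ("rpu") used for reconstruction."""
    primary_jpeg: bytes
    non_primary: dict[str, bytes] = field(default_factory=dict)  # e.g. {"hevc": ...}
    rpu_metadata: bytes = b""

    def select_for_display(self, supports_hdr: bool) -> bytes:
        # A recipient reconstructs from whichever coded image best suits its
        # display; a legacy decoder simply falls back to the primary image.
        if supports_hdr and "hevc" in self.non_primary:
            return self.non_primary["hevc"]
        return self.primary_jpeg
```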
Example Video Delivery Processing Pipeline
[0024] FIG. 1 depicts an example process of an image delivery pipeline (100) showing various stages from image capture/generation to image displays of different types or capabilities. Example image displays may include, but are not limited to, high dynamic range (HDR) image displays, standard dynamic range (SDR) image displays, image displays operating in conjunction with end-user or personal computers, mobile devices, home theaters, TVs, head-mounted display devices, wearable display devices, etc.
[0025] An image frame (102) is captured or generated using image generation block (105). The image frame (102) may be digitally captured (e.g., by a digital camera or an image signal processor (ISP) therein operating in a particular mode or camera setting, etc.) or generated by a computer (e.g., using computer animation, etc.) to provide image data (107). In some embodiments, the image data (107) may be (e.g., automatically with no human input, manually, automatically with human input, etc.) edited or transformed by post-ISP image processing operations into post-ISP or post-ISP processed images before being passed to the next processing stage/phase in the image delivery pipeline (100).
[0026] The image data (107) is then provided to a processor for post-production image processing (115). The post-production image processing (115) may include (e.g., partly or fully automatically, partly or fully manually, with an image enhancement or processing application running on a computing device, image cropping, visual effects, global or local tone and/or color adjustment, etc.) adjusting or modifying colors or brightness in an image to enhance the image quality or achieve a particular appearance for the image in accordance with the image content creator’s creative intent. Hence, the post-production image processing (115) operates on the image data (107) to yield a release version of one or more images to be coded into an image signal such as an image (container) file.
[0027] In some operational scenarios, one or more primary and/or non-primary images (117) - e.g., a single JPEG image as the primary image, a primary image plus zero, one or more non-primary images, etc. - may be coded into an image signal or image (container) file.
[0028] In some operational scenarios, the primary and/or non-primary images (117) may have been forward or backward reshaped, for example in the post-production image processing (115), for the purpose of generating relatively efficient, relatively high quality images - as compared with input or source images from ISP or post-ISP image processing operations - for coding into the image signal or image (container) file. Hence, the coding block (120) may receive the primary and/or non-primary images (117) as reshaped images. It should be noted that, in other operational scenarios, a primary or non-primary image (117) encoded in the image (container) file may not represent a reshaped image.
[0029] In some embodiments, the coding block (120) and/or post-production image processing (115) may implement a codec framework such as illustrated in FIG. 2A. The primary and/or non-primary images (117) may be compressed/encoded by the coding block (120) into an image signal or image (container) file (122) (e.g., a coded bitstream, a binary file, a JPEG image file, etc.). In some embodiments, the coding block (120) may include one or more image encoders or codecs, such as those related to industry standard or proprietary specification delivery formats, to generate the image (container) file (122).
[0030] In some embodiments, the image (117) preserves the content creator’s intent (also referred to as “artist intent”) with which the image is generated in the post-production image processing (115).
[0031] In some embodiments, the image container file (122) is an image signal in compliance with one or more (image) coding or coding syntax specifications.
[0032] The image signal or image (container) file (122) may further include, or may be coded with, image metadata (117-1) including but not limited to composer metadata. The image metadata (117-1) may be generated by the coding block (120) and/or the post-production block (115). The composer metadata - e.g., forward and/or backward reshaping mappings, lookup tables, etc. - can be used by downstream decoders to perform forward/backward reshaping (e.g., tone mapping, inverse tone mapping, etc.) on the primary and/or non-primary images (117) in order to generate one or more other images including but not limited to display images that may be relatively accurate for rendering on one or more other image displays in addition to one or more reference image displays with which the primary and/or non-primary images (117) are optimized to be rendered.
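By way of a non-normative illustration of LUT-based composer metadata, the sketch below applies a backward-reshaping lookup table to an 8-bit luma plane. The toy LUT here is fabricated for the example; in practice the LUT entries (or the polynomial/MMR/TPB coefficients from which they are built) would be decoded from the composer metadata carried in the image (container) file.

```python
import numpy as np

def backward_reshape_luma(sdr_luma: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Map each 8-bit SDR codeword through a 256-entry backward-reshaping LUT."""
    return lut[sdr_luma]

# Illustrative only: a made-up LUT expanding 8-bit codewords to 12 bits.
toy_lut = (np.linspace(0.0, 1.0, 256) ** 2.0 * 4095.0).astype(np.uint16)
sdr = np.array([[0, 128, 255]], dtype=np.uint8)
hdr = backward_reshape_luma(sdr, toy_lut)  # -> [[0, 1031, 4095]]
```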
[0033] As used herein, reshaping as described herein (e.g., forward reshaping, backward reshaping, tone mapping, inverse tone mapping, etc.) may refer to image processing operations that convert between different EOTFs, different color spaces, different dynamic ranges, and so forth. Additionally, optionally, or alternatively, backward or inverse reshaping refers to image processing operations that convert re-quantized images back to the original EOTF domain (e.g., gamma or PQ, etc.) or to a different EOTF domain, for further downstream processing, such as the display management.
[0034] The image container file (122) is further encoded with portion(s) of the image metadata (117-1) including but not limited to specific display management (DM) metadata portion(s) that can be used by the downstream decoders to perform specific display management operations on decoded or backward reshaped images for specific image displays to generate display images optimized for rendering on the specific image displays. Examples of display management operations and corresponding DM metadata portions in (image) metadata are described in U.S. Pat. App. Pub. No. 2022/0164931, “Display management for high dynamic range images,” by Robin Atkins, Jaclyn Anne Pytlarz and Elizabeth G. Pieri, the contents of which are incorporated by reference herein in their entirety.
[0035] The image (container) file (122) is then delivered downstream to receivers or recipient devices such as decoding and playback devices, media source devices, media streaming client devices, television sets (e.g., smart TVs, etc.), set-top boxes, movie theaters, and the like. In a receiver (or a downstream device), the image (container) file (122) is decoded by decoding block (130) to generate a decoded image 182, which may be the same as one of the primary and/or non-primary images (117), subject to quantization errors generated in compression performed by the coding block (120) and decompression performed by the decoding block (130).
[0036] Some or all of the image metadata (117-1) - including but not limited to the composer metadata - transmitted in the image (container) file with the primary and/or non-primary images (117) to a recipient device may be generated by the coding block (120) and/or the post-production image processing (115) automatically, in real time, in offline processing, etc. In some embodiments, the image data (117) is provided to the coding block (120) and/or the post-production image processing (115) for composer metadata generation. The composer metadata generation may automatically generate composer metadata with no or little human interaction.
[0037] The composer metadata can be used to provide or generate image content respectively optimized for a wide variety of display devices or image displays. The composer metadata can be used to generate the other images that are unavailable or unsent in the image (container) file (122). Thus, techniques as described herein can be used to generate or compose image content specifically and respectively optimized for non-reference image displays, so long as the primary and/or non-primary images (117) and the composer metadata are available in the image (container) file (122). The image content for these non-reference image displays can be optimized to explore the full or relatively large extent of display capabilities of these non-reference displays.
[0038] Additionally, optionally, or alternatively, the DM metadata in the image metadata can be used by the downstream decoders to perform display management operations on the backward reshaped images to generate device-specific display images for rendering on a wide variety of display devices or image displays.
[0039] In operational scenarios in which the receiver operates with (or is attached to) an image display 140 that is supported or targeted by the decoded image (182) - which is the same as or relatively closely approximates one of the primary and/or non-primary images (117) subject to coding errors - the receiver can render the decoded image on the image display (140).
[0040] In operational scenarios in which the receiver operates with (or is attached to) an image display 140-1 that is not supported or targeted by any one of the decoded images (182), the receiver can extract the composer metadata (e.g., lookup table or LUT based composer metadata, polynomial based composer metadata, multiple-channel, multiple-regression (MMR) composer data, tensor product B-spline (TPB) composer metadata, non-TPB composer metadata, etc.) from the image (container) file (122) and use the composer metadata to compose an inversely/backward reshaped image (132) (also referred to as a constructed or reconstructed image) based at least in part on one of the decoded images (182) and/or the composer metadata. In addition, the receiver can extract the DM metadata from the image (container) file (122) and apply DM operations (135) on the reconstructed image (132) based on the DM metadata to generate a corresponding display image (137) for rendering on the (e.g., non-reference, etc.) display device (140-1).
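The DM operations (135) themselves are governed by the DM metadata and the cited publication; purely as a stand-in to make the data flow concrete, the sketch below rolls reconstructed luminance off toward a target display's peak. A real metadata-driven DM implementation is considerably more elaborate.

```python
import numpy as np

def toy_display_map(luminance_nits: np.ndarray, target_peak_nits: float) -> np.ndarray:
    """Simplistic Reinhard-style roll-off toward the target display's peak
    luminance; a stand-in for metadata-driven display management (135)."""
    return luminance_nits * target_peak_nits / (target_peak_nits + luminance_nits)
```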
[0041] Image displays for which optimized display images can be generated for rendering under techniques as described herein may include image displays of various dynamic ranges.
[0042] As used herein, the term “dynamic range” (DR) may relate to a capability of the human visual system (HVS) to perceive a range of intensity (e.g., luminance, luma) in an image, e.g., from darkest blacks (darks) to brightest whites (highlights).
[0043] As used herein, the term high dynamic range (HDR) relates to a DR breadth that spans some 14-15 or more orders of magnitude of the human visual system (HVS). In practice, the DR over which a human may simultaneously perceive an extensive breadth in intensity range may be somewhat truncated, in relation to HDR.
[0044] In practice, images comprise one or more color components (e.g., luma Y and chroma Cb and Cr) of a color space, where each color component is represented by a precision of n-bits per pixel (e.g., n=8, etc.). Using non-linear luminance coding (e.g., gamma encoding), images where n ≤ 8 (e.g., color 24-bit JPEG images, etc.) are considered images of standard dynamic range, while images where n > 8 may be considered images of enhanced dynamic range.
[0045] The term “PQ” as used herein refers to perceptual luminance amplitude quantization. The human visual system responds to increasing light levels in a very nonlinear way. A human’s ability to see a stimulus is affected by the luminance of that stimulus, the size of the stimulus, the spatial frequencies making up the stimulus, and the luminance level that the eyes have adapted to at the particular moment one is viewing the stimulus. In some embodiments, a perceptual quantizer function maps linear input gray levels to output gray levels that better match the contrast sensitivity thresholds in the human visual system. An example PQ mapping function is described in SMPTE ST 2084:2014 “High Dynamic Range EOTF of Mastering Reference Displays” (hereinafter “SMPTE”), which is incorporated herein by reference in its entirety, where given a fixed stimulus size, for every luminance level (e.g., the stimulus level, etc.), a minimum visible contrast step at that luminance level is selected according to the most sensitive adaptation level and the most sensitive spatial frequency (according to HVS models).
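For concreteness, the PQ mapping referenced above can be written directly from the constants published in SMPTE ST 2084; the sketch below is a plain transcription of that public formula, not an implementation taken from this disclosure.

```python
# SMPTE ST 2084 (PQ) constants, as published in the standard.
M1 = 2610 / 16384        # 0.1593017578125
M2 = 2523 / 4096 * 128   # 78.84375
C1 = 3424 / 4096         # 0.8359375
C2 = 2413 / 4096 * 32    # 18.8515625
C3 = 2392 / 4096 * 32    # 18.6875

def pq_eotf(codeword: float) -> float:
    """Map a normalized PQ codeword in [0, 1] to absolute luminance (cd/m2)."""
    p = codeword ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)

def pq_inverse_eotf(luminance: float) -> float:
    """Map absolute luminance (cd/m2) to a normalized PQ codeword in [0, 1]."""
    y = (luminance / 10000.0) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2
```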
[0046] A reference electro-optical transfer function (EOTF) for a given display characterizes the relationship between color values (e.g., luminance) of an input image signal to output screen color values (e.g., screen luminance) produced by the display. For example, ITU Rec. ITU-R BT.1886, “Reference electro-optical transfer function for flat panel displays used in HDTV studio production,” (March 2011), which is incorporated herein by reference in its entirety, defines the reference EOTF for flat panel displays. Displays that support luminance of 200 to 1,000 cd/m2 or nits typify a lower dynamic range (LDR), also referred to as a standard dynamic range (SDR), in relation to HDR. Further EOTF examples are defined or described in SMPTE 2084 and Rec. ITU-R BT.2100, “Image parameter values for high dynamic range television for use in production and international programme exchange,” (06/2017), which are incorporated herein by reference in their entirety.
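For reference, the BT.1886 EOTF cited above has the following closed form, transcribed from the Recommendation itself (with $L_W$ and $L_B$ the display's white and black luminance levels):

$$L = a\,\big(\max(V + b,\ 0)\big)^{\gamma}, \qquad \gamma = 2.40,$$

$$a = \big(L_W^{1/\gamma} - L_B^{1/\gamma}\big)^{\gamma}, \qquad b = \frac{L_B^{1/\gamma}}{L_W^{1/\gamma} - L_B^{1/\gamma}},$$

where $V$ is the normalized video signal and $L$ the resulting screen luminance in cd/m2.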
Codec Frameworks
[0047] FIG. 2A and FIG. 2B illustrate example image codec architectures. More specifically, FIG. 2A illustrates an example encoder-side codec architecture, which may be implemented with one or more computing processors in an upstream image encoder, etc. FIG. 2B illustrates an example decoder-side codec architecture, which may also be implemented with one or more computing processors in a downstream image decoder (e.g., a receiver, etc.), etc.
[0048] In the encoder-side codec architecture, as illustrated in FIG. 2A, an original image or photo such as one generated by an ISP of a camera or a post-ISP image processing tool is received as input. This original image or photo may be used to derive or generate primary and/or non-primary images (117) to be included or contained in an image (container) file (122 of FIG. 1). For ease of reference, the primary and/or non-primary images (117) may be referred to herein as a primary image (in the image (container) file).
[0049] By way of illustration but not limitation, an image generator 162 - which may represent or include one or more image conversion or mapping tools, etc. - is used to generate the primary and/or non-primary image(s) (117) that are derived from, or corresponding to, the source image. In some embodiments, the image generator (162) may perform forward and/or inverse tone mapping operations.
[0050] In the encoder-side codec architecture, an image metadata generator 150 (e.g., a part of the coding block (120) and/or the post-production image processing (115) of FIG. 1, etc.) receives some or all of the primary and/or non-primary images (117) as input, generates image metadata (117-1) such as composer metadata, DM metadata, and so forth.
[0051] In the encoder-side architecture, a compression block 142 (e.g., a part of the coding block (120) of FIG. 1, etc.) compresses/encodes the primary and/or non-primary images (117) in image data 144 carried or included in an image signal or image (container) file (122) of FIG. 1. The image metadata (117-1) (denoted as “rpu”), as generated by the image metadata generator (150), may also be included or encoded (e.g., by the coding block (120) of FIG. 1, etc.) into the image signal or image (container) file (122).
[0052] In the encoder-side architecture, the image metadata (117-1) may be separately carried in designated coding segments of the image signal or image (container) file (122). These designated coding segments may be separate from specific coding segment(s) in the image signal or image (container) file (122) that are used to carry or include the primary and/or non-primary images (117). For example, the image metadata (117-1) may be encoded in (designated) image metadata segments or syntax elements in the image (container) file (122), while the primary and/or non-primary images (117) are encoded in (designated) image data segment(s) (or corresponding syntax element(s)) in the same image signal or image (container) file (122).
[0053] In the encoder-side architecture, the composer metadata in the image metadata (117- 1) in the image signal or image (container) file (122) can be used to enable downstream receivers to (e.g., forward, backward, inverse, etc.) reshape or map the primary and/or non-primary images (117) into one or more reconstructed images (e.g., approximating or the same as the non-primary image(s) (148)) for one or more other image displays other than the reference image displays supported by the primary and/or non-primary images (117). Example image displays may include, but are not necessarily limited to only, any of: an image display with similar display capabilities to those of a reference display, an image display with different display capabilities from those of a reference display, an image display with additional DM operations to map reconstructed image content to display image content for the image display, etc.
[0054] In the decoder-side architecture, as illustrated in FIG. 2B, the image signal encoded with the primary and/or non-primary images (117) and the image metadata (117-1) is received as input.
[0055] A decompression block 154 (e.g., a part of the decoding block (130) of FIG. 1, etc.) decompresses/decodes compressed image data in the image signal or image (container) file (122) into a decoded image (182). The decoded image (182) may be the same as one of the primary and/or non-primary images (117) subject to quantization errors in the compression block (142) and in the decompression block (154). The decoded image (182) may be outputted in an output image signal 156 (e.g., over an HDMI interface, over a video link, etc.) to and rendered on a (reference) image display.
[0056] In addition, an image reshaping block 158 extracts the image metadata (117-1) such as the composer metadata (or backward reshaping metadata) from the input image signal or image (container) file (122), constructs (e.g., backward, inverse, forward, etc.) reshaping functions based on the extracted composer metadata in the image metadata, and performs reshaping operations on the decoded images (182) based on the reshaping functions to generate one or more reshaped images (132) (or reconstructed images) for one or more other image displays in addition to the one or more reference image displays.
[0057] In some operational scenarios, DM operations may not be performed by a receiver to simplify device operations. In some operational scenarios, DM metadata may be transmitted with the composer metadata and the primary and/or non-primary images (117) in the image signal or image (container) file (122) to the receiver. Display management operations specific to an image display with different display capabilities from those of the reference displays may be performed on the reshaped or reconstructed image (132) based at least in part on the DM metadata in the image metadata (117-1), for example to generate a corresponding device-specific display image to be rendered on the actual image display.
[0058] In some operational scenarios, an SDR image including but not limited to a forward reshaped SDR image may be included or packaged as a primary or non-primary image in an image (container) file as described herein. Other images including but not limited to HDR images may be included or packaged with, or reconstructed (with composer metadata) from, the SDR image.
[0059] In other operational scenarios, an image of another dynamic range other than SDR may be included or packaged as a primary or non-primary image in an image (container) file as described herein. Additionally, optionally, or alternatively, other image metadata and/or other composer metadata may be included or packaged in the same image (container) file for constructing other images in various image formats and/or in various dynamic ranges and/or in various color spaces and/or in various precisions and/or bit depths.
Image Container File
[0060] An image encoder may generate photo/image data and attendant image metadata and encode or compress the photo/image data and the image metadata (e.g., specifically identified, with a specific parameter value such as rpu_type = 8, etc.) into an image (container) file using a coding syntax in compliance with an image container specification (e.g., of a specific version, etc.). The image file encoded by the image encoder may be signaled, transmitted, or otherwise directly or indirectly delivered to a downstream recipient device such as an image decoder. The image decoder may decode, from the image file, the (encoded or compressed) photo/image (payload) data and image metadata using the same coding syntax.
[0061] The specification may provide a standard-based or proprietary specification of some or all syntax elements (or coding segments) constituting the coding syntax relating to including or packaging photo/image (payload) data and image metadata in image files.
[0062] Example image (container) file specifications as described herein may include, but are not necessarily limited to only, any of: ISO/IEC 10918-1:1994, Information Technology - Digital compression and coding of continuous-tone still images: Requirements and guidelines (similarly, and later defined in ISO/IEC 18477-1:2020-05, Information Technology - Digital compression and coding of continuous-tone still images - Part 1: Core coding system specification); ISO/IEC 10918-4:1999, Information Technology - Digital compression and coding of continuous-tone still images: Registration of JPEG profiles, SPIFF profiles, SPIFF tags, SPIFF colour spaces, APPn markers, SPIFF compression Types and Registration Authorities (REGAUT); ISO/IEC 10918-5:2013, Information Technology - Digital compression and coding of continuous-tone still images: JPEG File Interchange Format (JFIF); CIPA DC-008-2019 / JEITA CP-3451E, Standard of the Camera & Imaging Products Association, Exchangeable image file format for digital still cameras: EXIF Version 2.32; ISO/IEC 14496-12:2015, Information Technology - Coding of Audio-Visual Objects, Part 12: ISO Base Media File Format, available from http://www.iso.org; ISO/IEC 14496-15:2017, Information Technology - Coding of Audio-Visual Objects, Part 15: Carriage of Network Abstraction Layer (NAL) Unit Structured Video in ISO Base Media File Format; ISO/IEC 14496-15:2017, Amendment 2, 2019-01, Information Technology - Coding of Audio-Visual Objects, Part 15: Carriage of Network Abstraction Layer (NAL) Unit Structured Video in ISO Base Media; Recommendation ITU-T H.265 / ISO/IEC 23008-2:2017, Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 2: High efficiency video coding; ISO/IEC 23008-2:2017/AMD2:2018, Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 2: High efficiency video coding - Amendment 2: Main 10 Still Picture Profile; ISO/IEC 23008-12:2017, Information Technology - High efficiency coding and media delivery in heterogeneous environments - Part 12: Image File Format; ISO/IEC 18477-1:2020-05, Information Technology - Scalable compression and coding of continuous-tone still images - Part 1: Core coding system specification (AKA JPEG-XT specification); ISO/IEC 18477-3:2015-12-15, Information Technology - Scalable compression and coding of continuous-tone still images - Part 3: Box file format; and AV1 Bitstream & Decoding Process Specification, v1.0.0 with Errata 1, 2019-01-08, from the Alliance for Open Media, available from https://github.com/AOMediaCodec/av1-spec, the contents of all of which are herein incorporated by reference in their entirety.
[0063] FIG. 3A illustrates an example relatively high level coding syntax/segment structure of an image (container) file. The coding syntax/segment structure may represent, but is not necessarily limited to only, a sequential and progressive coding syntax used to support sequential DCT-based, progressive DCT-based and lossless modes of image codec operations.
[0064] The coding syntax/segment structure of the image file of FIG. 3A includes a number of syntax elements or coding segments to support including/packaging a JPEG image into the image (container) file along with image metadata. For example, these syntax elements or segments may include APP11 marker syntax elements or segments in the image (container) file. Some or all of these APP11 marker syntax elements or segments can be used to carry photo or image (payload) data for one or more images such as one or more HEVC or AV1 images.
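Because a JPEG marker segment's 16-bit length field caps each segment at roughly 64 KB, a non-primary image payload generally spans several APP11 segments. The following non-normative sketch splits a payload accordingly; the 'DI' identifier prefix mirrors the identifier string discussed later in this document, while the exact box-header layout inside each segment is elided.

```python
import struct

APP11_MARKER = b"\xff\xeb"   # APPn markers run 0xFFE0..0xFFEF; APP11 = 0xFFEB
MAX_SEGMENT_PAYLOAD = 65533  # the 16-bit length field counts its own 2 bytes

def split_into_app11_segments(payload: bytes, ident: bytes = b"DI") -> list[bytes]:
    """Split an embedded-image or metadata payload across one or more APP11
    marker segments, each within JPEG's 64 KB marker-segment limit."""
    room = MAX_SEGMENT_PAYLOAD - len(ident)
    chunks = [payload[i:i + room] for i in range(0, len(payload), room)] or [b""]
    return [
        APP11_MARKER + struct.pack(">H", 2 + len(ident) + len(chunk)) + ident + chunk
        for chunk in chunks
    ]
```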
[0065] As shown in FIG. 3A, a conditional marker syntax element or segment - such as a restart marker number m (RSTm) - may be omitted or excluded from the coding structure of the image (container) file of FIG. 3A; for example, in some operational scenarios, (scan) restart may not be enabled in decoding operations.
[0066] A first subset of syntax elements or segments such as a start-of-image (SOI) syntax element or segment and an end-of-image (EOI) syntax element or segment in FIG. 3A may represent the first or top level (syntax elements or segments) of the coding syntax for the image (container) file.
[0067] A second subset of syntax elements or segments such as APP1 or EXIF marker syntax element or segment, optional APP0 or JFIF marker syntax element or segment, APP11 syntax elements or segments, DQT or quantization table syntax element(s) or segment(s) and a start-of-frame (SOF) syntax element or segment in FIG. 3A may represent the next or second level (syntax elements or segments) of the coding syntax for the image (container) file. The second subset of syntax elements or segments at the second level of the coding syntax may be used to contain or include one or more scans (e.g., in decoding operations, etc.), which may be preceded by specific table(s) such as DQT table(s). As shown in FIG. 3A, a define-number-of-lines marker (DNL) syntax element or segment may be excluded or absent from the image (container) file (e.g., after syntax elements or segments defining or specifying the first scan, etc.).
[0068] A third subset of syntax elements or segments such as Huffman table (DHT) syntax element(s) or segment(s) and SOS (start-of-scan, last-1) syntax element(s) or segment(s) in FIG. 3A may represent the third level (syntax elements or segments) of the coding syntax for the image (container) file. As shown in FIG. 3A, the SOS syntax element(s) or segment(s) may be preceded by specific tables such as the Huffman table (DHT) syntax element(s) or segment(s).
[0069] A fourth subset of syntax elements or segments such as ECS (entropy coded segment(s), last-1) syntax element(s) or segment(s) in FIG. 3A may represent the fourth level (syntax elements or segments) of the coding syntax for the image (container) file.
[0100] In some operational scenarios, some or all of DAC (Define arithmetic conditioning), DRI (Define restart interval) or COM (Comment) syntax elements or segments may be excluded or absent from the image (container) file.
[0101] As shown in FIG. 3A, application segments 0, 1 and 11 (APP0, APP1 and APP11) are included in the image (container) file. An example list of application-specific markers (APPn) is described or specified in ISO/IEC 10918-4:1999/Amd.1:2013(E) as already mentioned.
[0102] It should be noted that in some operational scenarios, a specific ordering as illustrated in FIG. 3A may be used to package or include some or all of one or more specific tables (e.g., APP1, APP0, APP11, DQT, etc.) at (or before) the start of a frame. The specific order may be (e.g., explicitly, by default, precisely, etc.) implemented, enforced or followed by image encoders and/or image decoders as a writing and/or reading order of these coding syntaxes or segments into or from an image (container) file for the purpose of supporting accelerated discovery of image (container) files coded in a coding syntax as described herein or some or all specific contents packaged or included in these files.
[0103] However, it should be noted that in some operational scenarios, this specific ordering of FIG. 3A may be optional, and hence may not be precisely implemented, enforced or followed by image codecs. Hence, an image decoder or a de-packager used to discover or decode contents of an image (container) file is to support or tolerate other orderings of image (payload) data and/or image metadata carried in the image (container) file, including but not limited to other orderings of one or more specific tables or application-marker segments. Additionally, optionally or alternatively, in some image applications, an image (container) file as described herein may carry, include or package only a single APP11 marker segment.
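A de-packager that tolerates arbitrary segment ordering, as required above, can simply walk marker segments until the start of scan and collect whichever APP11 segments it encounters. A minimal sketch follows (it ignores fill bytes and stand-alone markers for brevity):

```python
import struct

def iter_marker_segments(jpeg: bytes):
    """Yield (marker_byte, segment_body) pairs from SOI up to SOS, in whatever
    order the segments happen to appear in the file."""
    assert jpeg[:2] == b"\xff\xd8", "missing SOI marker"
    i = 2
    while i + 4 <= len(jpeg) and jpeg[i] == 0xFF:
        marker = jpeg[i + 1]
        if marker == 0xDA:  # SOS: entropy-coded image data follows
            break
        (length,) = struct.unpack_from(">H", jpeg, i + 2)
        yield marker, jpeg[i + 4 : i + 2 + length]
        i += 2 + length

# Usage: gather every APP11 (0xEB) segment body regardless of ordering.
# app11_bodies = [body for m, body in iter_marker_segments(data) if m == 0xEB]
```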
[0104] The coding structure of the image (container) file or syntax elements or segments therein may be signaled (from encoder to decoder), identified or indicated with specific extension markers in the image (container) file.
Support for Multiple Image Types
[0105] An image (container) file as described herein may be used to support carrying and/or constructing images of multiple types depicting the same visual semantic content (e.g., characters, objects, motions, visual background, etc.) as the primary (or main) image designated to be carried in the image (container) file. Some or all of these images that are carried or that can be constructed from the image (container) file may be directly or indirectly derived or originated from the same source image such as an image or photo generated by an image signal processor of a camera with specific settings.
[0106] For the purpose of illustration, the image (container) file may be a JPEG image file that includes a primary (or main) image file component such as ECS segment(s) of FIG. 3A to carry, store or contain a (coded) JPEG image. Under techniques as described herein, images of multiple types including but not limited to the JPEG image may be carried or constructed based on image data and/or image metadata included in the JPEG image file.
[0107] In a first example, in addition to the primary (or main) image file component carrying the JPEG image, the image (container) file includes one or more attendant data components such as APP11 (e.g., as shown in FIG. 3A, etc.) syntax element(s) or segment(s) used to carry, store or contain one or more HEVC images, and/or one or more AV1/HDR still images, and/or image metadata (“rpu”), etc. The image metadata may be applied to each of some or all of the HEVC or AV1/HDR still images to generate corresponding (e.g., SDR, HDR, etc.) display images. These corresponding display images generated from the images included in the image (container) file may be specifically optimized - using specific image processing operations and/or specific operational parameters defined or included in the image metadata in the image (container) file - for specific image displays to take relatively full advantage of these image displays’ respective capabilities.
[0108] One or more syntax elements or segments such as APP1/EXIF (e.g., as shown in FIG. 3A, etc.) may be included in the image (container) file to describe specific characteristics of the JPEG image. Specific bit depth(s), chroma format(s) and color space(s) for images already included in the image (container) file and/or additional images that may be composed or generated based at least in part on the included images along with the included image metadata can be defined or specified in the image metadata carried or stored in the image (container) file.
[0109] In a second example, in addition to the primary (or main) image file component carrying the JPEG image, the image (container) file includes one or more attendant data components such as APP11 (e.g., as shown in FIG. 3A, etc.) syntax element(s) or segment(s) used to carry, store or contain a first image metadata portion of (overall) image metadata (“rpu”), zero, one or more HEVC images, and/or zero, one or more AV1/HDR still images, and/or a second image metadata portion of the (overall) image metadata, etc. Whereas the first image metadata portion may be applied to the JPEG image to generate first corresponding (e.g., SDR, HDR, etc.) display images, the second image metadata portion may be applied to each of some or all of the HEVC or AV1/HDR still images if present in the image (container) file to generate second (e.g., SDR, HDR, etc.) display images. These display images may be specifically optimized - using specific respective image processing operations and/or specific respective operational parameters defined or included in the respective image metadata portions of the image metadata in the image (container) file - for specific image displays to take relatively full advantage of these image displays’ respective capabilities.
[0110] As in the first example, in the second example, one or more syntax elements or segments such as APP1/EXIF (e.g., as shown in FIG. 3A, etc.) may be included in the image (container) file to describe specific characteristics of the JPEG image. Specific bit depth(s), chroma format(s) and color space(s) for images already included in the image (container) file and/or additional images that may be composed or generated based at least in part on the included images along with the included image metadata can be defined or specified in the image metadata carried or stored in the image (container) file.
[0111] As used herein, the term “HEVC image” may refer to a HEVC Main10 Still Picture image as defined in Recommendation ITU-T H.265/ISO/IEC 23008-2 as referenced herein. In some operational scenarios, the HEVC image includes or carries HEVC VUI parameters. The HEVC image may comply or may be consistent with its HEVC VUI (e.g., settings, values, etc.) including but not necessarily limited to only some or all of: video range, color primaries, transfer characteristic, color matrix, and chroma sampling locations.
[0112] The term “AV1 image” or “AV1/HDR image” may refer to an AV1 Main or High Still Picture image as defined in the AV1 Bitstream & Decoding Process Specification as referenced herein. In some operational scenarios, the AV1/HDR image includes or carries AV1 color description parameters. The AV1/HDR image may comply or may be consistent with its AV1 parameters (e.g., settings, values, etc.) including but not necessarily limited to only some or all of: video range, color primaries, transfer characteristic, color matrix, and chroma sampling position.
[0113] For the purpose of illustration only, it has been described that an image (container) file such as a JPEG image file may carry HEVC and/or AV1/HDR image(s) along with a JPEG image. It should be noted that, in other operational scenarios, other types of images - in addition to or in place of HEVC and/or AV1/HDR image(s) - may be carried with a JPEG image. For example, in some operational scenarios, zero, one or more HEIC images may be carried with a JPEG image in addition to or in place of any HEVC and/or AV1/HDR image(s). The term “HEIC image” may refer to a High Efficiency Image File Format image that complies with a specific specification related to HEVC such as defined in ISO/IEC 23008-12:2017, the contents of which are incorporated herein by reference in their entirety. In some operational scenarios, the HEIC image includes or carries HEVC VUI parameters. The HEIC image may comply or may be consistent with its HEVC VUI including but not necessarily limited to only some or all of: video range, color primaries, transfer characteristic, color matrix, and chroma sampling locations.
Encoding Operations
[0114] An image (container) file as described herein may be specifically encoded by an image encoder to include image (payload) data and/or image metadata for the purpose of distributing visual semantic content of an original image or photo by way of images of multiple types that can be decoded or constructed - by a recipient image decoder of the image (container) file - from the image (payload) data and/or image metadata encoded in the image (container) file.
[0115] The image encoder may encode the image (container) file with image (payload) data and/or image metadata pursuant to one or more standard based or proprietary specifications including but not necessarily limited to only some or all of: ISO/IEC 10918-1:1994; ISO/IEC 18477-1:2020-05; ISO/IEC 10918-4:1999; ISO/IEC 10918-5:2013; ISO/IEC 14496-12:2015; JEITA CP-3451E/CIPA DC-008-2019 (EXIF); etc.
[0116] Not all parameters and fields (or their values) as defined in the (applicable) specifications may actually be used to encode the image (container) file. A subset of parameters/fields (or values) as defined in these specifications may be excluded or restricted from the image (container) file. Some or all of such exclusions or restrictions of the subset of parameters/fields from the image (container) file may be explicitly signaled or indicated in the image (container) file by the image encoder to the recipient image decoder.
[0117] In some operational scenarios, the image (container) file includes, or is encoded with, at least one APP11 Marker with a specific identifier such as an identifier string having a specific string value such as ‘DI’ to indicate the presence of the image (payload) data and/or the image metadata in the image (container) file to be used to construct or derive the other images and/or the different display images.
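By way of illustration only (and not as part of any cited specification), the following minimal Python sketch shows one way a reader of the image (container) file might locate such APP11 marker segments carrying the ‘DI’ identifier; the helper name is hypothetical, and the walk is simplified to length-bearing marker segments between SOI and SOS.

    import struct

    APP11_MARKER = 0xFFEB  # APP11 marker value (see paragraph [0143])
    SOS_MARKER = 0xFFDA    # start-of-scan; entropy-coded data follows

    def find_di_segments(data: bytes):
        """Yield the photo/image payload of each APP11 'DI' marker segment.
        Simplified walk over length-bearing segments from SOI up to SOS."""
        i = 2  # skip the SOI marker (0xFFD8)
        while i + 4 <= len(data):
            marker, length = struct.unpack_from(">HH", data, i)
            if marker == SOS_MARKER:
                break
            # The Length field counts itself but excludes the 2 marker bytes.
            segment = data[i + 4 : i + 2 + length]
            if marker == APP11_MARKER and segment[:2] == b"DI":
                yield segment[3:]  # skip the ID string and the null byte
            i += 2 + length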
[0118] The image (container) file encoded by the image encoder may correspond to a specific (e.g., format, packaging, “0”, “1”, etc.) version of an applicable specification that specifies or defines what image (payload) data and/or what image metadata are encoded in the image (container) file. For example, the applicable specification with which the image (container) file is encoded may specify or define that the image (container) file such as a JPEG image file carries, or is encoded with, zero, one or more HEVC Main10 still pictures and/or one or more AV1 Main or High encoded images, in addition to a JPEG image as the primary (or main) image for the image (container) file. The applicable specification may be specifically
enhanced from one or more other standards-based or proprietary specifications to include, specify or define specific coding operations and syntaxes for image/photo distribution of multiple image types using a single image file or a single image container file.
[0119] For each of some or all of the HEVC or AV1/HDR (compressed) still images carried or included in, or constructed from, the image (container) file, HDR characteristics (e.g., bit depth, chroma format, color space, etc.) can be defined or specified in the image metadata in the image (container) file in accordance with an applicable (coding or coding syntax) specification.
[0120] In the present example, a JPEG image is to be encoded or carried in the image (container) file as the primary (or main) image. The image encoder may include or invoke a primary image codec such as a JPEG image codec to perform JPEG image encoding operations.
[0121] The image encoder or the JPEG image codec can perform image compression operations to generate JPEG compressed image data from a received (e.g., input, source, original, uncompressed, relatively less compressed, etc.) image. The JPEG compressed image data represents the JPEG image and may be encoded or included as (payload) data in a primary (or main) image data component of the image (container) file. The JPEG image may include a plurality of pixel values for a plurality of pixels or pixel locations (e.g., in an image frame, in a spatial array such as a two-dimensional pixel array, etc.).
[0122] In some operational scenarios, the JPEG image may be represented in a YCbCr color space with three color components Y, Cb and Cr. Each pixel value in the plurality of pixel values may contain luma (Y) and chroma (Cb/Cr) component pixel values of a respective pixel or pixel location in the plurality of pixels or pixel locations. Each (Y, Cb or Cr) component pixel value in a pixel value may be of a specific bit depth such as an 8-bit bit depth. The JPEG image may be sampled or subsampled to a chroma sampling (or subsampling) format 4:2:0 in accordance with centered chroma location sampling. This chroma location is based on a TIFF default used in many personal computer applications.
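As a minimal sketch only (not a specification-defined resampling filter), the centered chroma siting for 4:2:0 can be approximated by averaging each 2x2 block of a full-resolution chroma plane; the function name is hypothetical.

    import numpy as np

    def subsample_420_centered(chroma: np.ndarray) -> np.ndarray:
        """Downsample a full-resolution chroma plane to 4:2:0 with centered
        siting by averaging each 2x2 block (a simple box filter)."""
        h, w = chroma.shape
        assert h % 2 == 0 and w % 2 == 0, "even dimensions assumed"
        blocks = chroma.astype(np.float32).reshape(h // 2, 2, w // 2, 2)
        return np.round(blocks.mean(axis=(1, 3))).astype(chroma.dtype)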
[0123] It should be noted that, in other operational scenarios, another chroma sampling (or subsampling) format other than 4:2:0 and/or another chroma location sampling other than the centered chroma location sampling may be used to sample or subsample chroma component values. Additionally, optionally or alternatively, another color space other than YCbCr with different color components may be used to represent a primary image in an image (container) file as described herein. Additionally, optionally or alternatively, a component pixel value may be of another bit depth different from the 8-bit bit depth.
[0124] In the present example, a full (codeword) value range corresponding to all possible values of the bit depth (e.g., all possible 8-bit values in the present example, etc.) may be used to
encode or represent component (Y, Cb or Cr) pixel values in the primary (or main) image as described herein. For instance, luma (Y) component values may be represented within the full range [0, 255] from a reference black (e.g., 0, etc.) to a reference white (e.g., 255, etc.). Chroma (Cb or Cr) component values may be represented within [0, 255] with 128 as a reference minimum color (e.g., a gray value, etc.) and 0/255 as reference maximum color(s).
[0125] In some operational scenarios, the JPEG image encoded in the image (container) file is derived from an original (e.g., received, input, source, etc.) JPEG image that includes an original or input APP1 (EXIF) Application Marker Segment 1. The JPEG image encoded in the image (container) file includes, or is accompanied in the same image (container) file with, an (e.g., equivalent, consistent, appropriate, etc.) APP1 (EXIF) Application Marker Segment 1 corresponding to the original or input APP1 (EXIF) Application Marker Segment 1 of the original JPEG image.
[0126] The JPEG image encoded in the image (container) file may also include, or may also be accompanied in the same image (container) file with, an APP0 (JFIF) Application Marker Segment that is (to be) used as appropriate and consistent with the JPEG image encoded in the image (container) file.
[0127] As noted, the image (container) file as described herein may include image (payload) data representing additional images other than the primary (or main) image as well as image metadata that may be used with some or all of the image (payload) data to construct or derive other images such as images of different types from that of the primary (or main) image and/or different display images respectively optimized for rendering on different types of image displays.
[0128] The image encoder may include or invoke one or more non-primary image codecs such as HEVC and/or AV1/HDR (and/or HEIC) image codec(s) to perform HEVC and/or AV1/HDR image encoding operations.
[0129] The image encoder or the non-primary image codecs can perform image compression operations to generate HEVC and/or AV1/HDR (and/or HEIC) compressed image data corresponding to the JPEG image in the primary data component of the image (container) file and/or corresponding to the received (e.g., input, source, original, uncompressed, relatively less compressed, etc.) image used to generate the JPEG image. The HEVC and/or AV1/HDR (and/or HEIC) compressed image data represents HEVC and/or AV1/HDR (and/or HEIC) image(s) and may be encoded or included as (payload) data in other image data components such as APP11 data segment(s) of the image (container) file in addition to or separate from the primary (or main) image data component of the image (container) file. Each of the HEVC and/or AV1/HDR (and/or
HEIC) image(s) may include a plurality of pixel values for a plurality of pixels or pixel locations (e.g., in an image frame, in a spatial array such as a two-dimensional pixel array, etc.).
[0130] Each (still image) of the HEVC and/or AV1/HDR (and/or HEIC) image(s) represented or included in the APP11 segment(s) of the image (container) file may use the same type of image codec for encoding or decoding operations.
[0131] In some operational scenarios, the applicable specification may specify or define that some or all still images included (as extension image data) in the APP11 segment(s) of the image (container) file are coded using the same type of image codec. In these operational scenarios, the non-primary (or extension) images whose image (payload) data is included in the APP11 segment(s) of the image (container) file are EITHER one or more HEVC images OR one or more AV1 images. In other words, these APP11 segment(s) may not carry a mixture of (extension image data of) HEVC image(s) and AV1 image(s).
[0132] Image data (e.g., compressed image data, image payload, etc.) may be encoded as a YCbCr 4:2:0 image with 10 bits per color component or primary (Y, Cb or Cr).
[0133] Color components or primaries of a color space (e.g., YCbCr, etc.) in which the image data is represented may be specified or defined in one or more applicable image data coding specifications such as BT.2100-2, and may be signaled in an image container file, for example, as follows: HEVC VUI or AV1 parameter = ‘9’.
[0134] Likewise, transfer characteristics of the image data may be specified or defined as either PQ or HLG in accordance with the applicable image data coding specifications, and may be signaled in the image container file, for example, as follows: either HEVC VUI or AV1 parameter = ‘16’ for PQ or HEVC VUI or AV1 parameter = ‘18’ for HLG. In some operational scenarios, all non-primary images included in the image container file such as HEVC or AV1 still images use one and only one type of transfer characteristic, either PQ or HLG.
[0135] Color matrices relating to the color space (e.g., YCbCr, etc.) may be defined or specified in the one or more applicable image data coding specifications such as BT.2020-2 (NCL) and BT.2100-1, and may be signaled in the image container file, for example, as follows: HEVC VUI or AV1 parameter = ‘9’.
[0136] The top-left chroma location sampling - which may be signaled in the image container file, for example, as follows: HEVC VUI or AV1 parameter = ‘2’ - may be used for the 4:2:0 chroma sampling format.
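For convenience, the signaled values recited in paragraphs [0133] through [0136] may be summarized as follows (an illustrative grouping only; the dictionary itself is not part of the file syntax):

    # HEVC VUI / AV1 color description values recited above.
    SIGNALED_COLOR_DESCRIPTION = {
        "colour_primaries": 9,           # BT.2100 / BT.2020 primaries
        "transfer_characteristics": 16,  # PQ; 18 would signal HLG instead
        "matrix_coeffs": 9,              # BT.2020 NCL / BT.2100 YCbCr matrix
        "chroma_sample_loc_type": 2,     # top-left siting for 4:2:0
    }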
[0137] Limited codeword ranges - which may be signaled in the image container file, for example, as follows: HEVC VUI or AV1 parameter = ‘0’ - may be used for coding codewords in the color components of the color space (e.g., YCbCr, etc.). For example, a first limited codeword range 64 - 940 may be used to represent Y codewords from the reference black to the reference white. A second limited codeword range 64 - 960 may be used to represent each of Cb or Cr codewords from the reference minimum color to the reference maximum color, with 512 as the gray value.
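A minimal numeric sketch of these 10-bit limited ranges, assuming narrow-range scaling consistent with BT.2100 (the function name is hypothetical):

    def to_limited_range_10bit(y_norm: float, c_norm: float):
        """Map normalized luma in [0, 1] and chroma in [-0.5, 0.5] to 10-bit
        limited-range codewords: Y in 64..940 and Cb/Cr in 64..960."""
        y_code = round(64 + 876 * y_norm)   # 940 - 64 = 876 luma steps
        c_code = round(512 + 896 * c_norm)  # 512 is the gray value; +/-448
        return y_code, c_code

    # to_limited_range_10bit(0.0, 0.0) -> (64, 512)   reference black / gray
    # to_limited_range_10bit(1.0, 0.5) -> (940, 960)  reference white / max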
[0138] The encoded/compressed image data (e.g., HEVC compressed image data, etc.) in the image container file may be carried as network abstraction layer (NAL) unit data and written/stored in the image container file (or image signal) in the H.265 Annex B byte stream format.
Encoding Extension Image Data
[0139] Image data for one or more non-primary (or extension) images in addition to a primary (or main) image in an image (container) file may be formed by concatenating image (payload) data carried within APP11 segments or coding syntax as illustrated in FIG. 3A. These APP11 segments may be designated or included with specific segment markers such as a string value of ‘DI’ in accordance with the applicable image coding (syntax) specifications. The applicable image coding (syntax) specifications may specify or define application-marker segments such as APP11 for ‘Tables/miscellaneous.’ An APP11 segment or syntax element may be located either at the start of a frame or at the start of a scan in the frame.
[0140] FIG. 3B illustrates an example syntax element or coding structure for an APP11 marker segment. As shown, the APP11 syntax element includes a plurality of parameters to be respectively coded in a plurality of component syntax elements as a part of the overall APP11 syntax element. These parameters or component syntax elements - representing different data fields in the APP11 marker segment - may be sequentially and contiguously ordered, with no null byte(s) and/or fill byte(s) in between successive parameters or syntax elements unless explicitly noted or otherwise indicated, as shown in FIG. 3B.
[0141] In some operational scenarios, as illustrated in FIG. 3B, the plurality of parameters in the APP11 marker segment or syntax element may include: an APP11 marker parameter (2-bytes); a Length parameter (2-bytes); an ID string parameter (2-bytes); followed by a Null byte (1-byte); followed by a photo/image (payload) data syntax element; etc. The photo/image (payload) data syntax element begins with a single byte - the very first (e.g., lower level, etc.) syntax element in the overall photo/image (payload) data syntax element - carrying a format/packaging version # parameter (1-byte).
[0142] If the format/packaging version # parameter in the photo/image (payload) data syntax element is set to a specific value such as one (1), then the format/packaging version # parameter
is followed by a specific payload version (1 in the present example) syntax element. All (e.g., component, lower level, etc.) syntax elements constituting the overall photo/image (payload) data syntax element may be consistent with, or identified by, the indicated format/packaging version # parameter in accordance with the applicable specifications.
[0143] By way of example but not limitation, as illustrated in FIG. 3B, the APP11 marker parameter in two bytes (or 16 bits) can be used to carry or specify a specific marker value such as 0xFFEB to identify that the present segment represents or carries a (e.g., Dolby Image, specification defined, etc.) image (payload) data marker segment or syntax element.
[0144] The APP11 Length parameter in two bytes (or 16 bits) (e.g., immediately, as defined by the applicable specifications, etc.) following the APP11 marker parameter can be used to carry a value to indicate a length of the (e.g., entire, minus two bytes, present, etc.) APP11 marker segment. The length of the APP11 marker segment may include the size of the plurality of parameters including any intervening Null byte(s) and the size of the photo/image (payload) data syntax element contained in the present APP11 marker segment alone and may exclude the two bytes of the APP11 marker (0xFFEB in the present example) itself.
[0145] The ID String parameter in two bytes (or 16 bits) can be used to carry a special or specifically designated value of 0x4449 (corresponding to ASCII: ‘D’ ‘I’) to distinguish the present APP11 marker segment from any other APP11 marker segment(s) that are used for purposes other than carrying photo/image (payload) data as specified or defined by the applicable specifications. In response to determining that the (decoded) value of the two bytes in an APP11 marker segment corresponding to this parameter differs from, or does not match, the special or specifically designated value, a recipient decoding device can ignore or avoid using the APP11 marker segment for retrieving photo or image (payload) data.
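The length accounting described above can be illustrated with a small, hypothetical serializer for a single-segment case (field sizes follow FIG. 3B; the photo payload is assumed to begin with the format/packaging version # byte):

    import struct

    def build_app11_di_segment(photo_payload: bytes) -> bytes:
        """Serialize one APP11 'DI' marker segment. The Length field counts
        everything after the two marker bytes, including itself."""
        body = b"DI" + b"\x00" + photo_payload  # ID string, null byte, payload
        length = 2 + len(body)                  # + 2 bytes for Length itself
        return struct.pack(">HH", 0xFFEB, length) + body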
[0146] The first byte of the photo/image (payload) data syntax element, namely the format/packaging version # parameter, can be used to define or specify a specific payload version (e.g., “1”, etc.) syntax element that follows the format/packaging version # parameter in the photo/image (payload) data syntax element. The specific payload version syntax element may be used to store or carry actual photo/image (payload) data of a non-primary or extension image such as an HEVC or AV1 image.
[0147] In some operational scenarios, all APP11 marker segments to be encoded with any non-primary or extension photo/image (payload) data in the same image (container) file carry the same value for the format/packaging version # parameter specified for each of these APP11 marker segments.
[0148] In the present example, the format/packaging version # parameter in the APP11 marker segment or syntax element has a specific value ‘1’. Accordingly, a specific payload version syntax element such as a payload version 1 syntax element is used to encode or store actual non-primary or extension photo/image (payload) data.
[0149] In some operational scenarios, the payload version 1 syntax element logically includes one or more (data) boxes coded in respective syntax elements. The boxes or the respective syntax elements describe size(s) and location(s) of (e.g., non-primary or extension, HEVC, AV1, etc.) image data and/or image metadata (“rpu”) for primary and/or non-primary image data. The image metadata may include specific image metadata items or portions associated with, or corresponding to, specific image items or portions in the non-primary or extension image data such as HEVC or AV1 still image data and in primary or main image data such as JPEG image data.
[0150] As used herein, an image metadata data item or portion associated with or corresponding to an image data item or portion refers to a part of the image metadata that is specifically designated to carry or include specific operational parameters for specific image processing operations that may be performed on the (associated or corresponding) image data item or portion.
[0151] FIG. 3C illustrates an example coding syntax/structure for a (data) box or a corresponding syntax element in the one or more boxes included in the payload version 1 syntax element of the APP11 marker segment. As shown, the box includes a plurality of parameters to be respectively coded in a plurality of component syntax elements as a part of the overall box. These parameters or component syntax elements - representing different data fields in the box - may be sequentially and contiguously ordered, with no null byte(s) and/or fill byte(s) in between successive parameters or syntax elements unless explicitly noted or otherwise indicated, as shown in FIG. 3C.
[0152] In some operational scenarios, as illustrated in FIG. 3C, the plurality of parameters in the box in the payload version 1 syntax element of the APP11 marker segment may include: a Box Instance Number parameter in two bytes (or 16 bits) denoted as “En”; a Packet Sequence Number parameter in four bytes (or 32 bits) denoted as “Z”; a Box Length parameter in four bytes (or 32 bits) denoted as “LBox”; a Box Type parameter in four bytes (or 32 bits) denoted as “TBox”; an optional Box Length Extension parameter in eight bytes (or 64 bits) denoted as “XLBox”; a version 1 payload data syntax element; etc.
[0153] The Box Instance Number (En) parameter in two bytes (or 16 bits) can be used to allow or support (e.g., APP11, etc.) marker segments to carry (data) boxes of the same or
identical (box) type but differing data or content portions. This parameter can be used to distinguish these boxes of the same box type. Data or content portions belonging to or residing in logically distinct boxes with the same box type differ in respective values for the Box Instance Number (En) parameter. A recipient decoding device can concatenate the data or content portions - or payload data - in the boxes of the marker segments, where the boxes have the same value for the Box Type (TBox) parameter but different (e.g., contiguous, sequential, etc.) values for the Box Instance Number (En) parameter, for example in an ascending order of the values for the Box Instance Number (En) parameter.
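A hedged sketch of this concatenation behavior follows; the in-memory box representation (a dict with ‘tbox’, ‘en’ and ‘payload’ keys) is hypothetical.

    def merge_boxes_by_instance(boxes):
        """Concatenate payloads of boxes of the same Box Type (TBox) in
        ascending Box Instance Number (En) order."""
        merged = {}
        for box in sorted(boxes, key=lambda b: b["en"]):
            merged.setdefault(box["tbox"], bytearray()).extend(box["payload"])
        return {tbox: bytes(data) for tbox, data in merged.items()}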
[0154] In some operational scenarios, in response to determining that the Box Type (TBox) parameter has a specific value such as ‘RPJP’ to indicate an association with the primary or main image data such as JPEG image data, the Box Instance Number is set - by the image encoder - to be equal to 0x0001. This setting in the box in the image (container) file indicates or signals to a recipient image decoder that the image metadata portion in the (e.g., in only a single box for JPEG, etc.) box is associated with, or corresponds to, the primary or main image data such as the JPEG image data.
[0155] It should be noted that, in some operational scenarios, there may not be any box in any APP11 marker segment in the image (container) file that carries or includes an image metadata portion associated with or corresponding to the primary or main image data.
[0156] A box may be used to carry or include a non-primary or extension image data item or portion such as an HEVC or AV1 image data item or portion. This box may carry a specific (e.g., 4-byte, string, etc.) value such as ‘HEVC’ or ‘AV01’ for the Box Type (TBox) parameter in the box.
[0157] Another box may be used to carry or include an image metadata item or portion for the non-primary or extension image data item or portion included or carried in the former box. The other box may carry a specific (e.g., 4-byte, string, etc.) counterpart value such as ‘RPHE’ or ‘RPAV’ - corresponding to the specific value of ‘HEVC’ or ‘AV01’ in the (former) box - for the Box Type (TBox) parameter in the other box.
[0158] Both of the boxes - the former of which carries the image data item or portion, whereas the latter of which carries the corresponding image metadata item or portion to be used to process the image data item or portion - are to carry the same or identical value (e.g., “01”, etc.) for the Box Instance Number (En) parameter in each of the boxes.
[0159] In a first example, the former box may be an ‘HEVC’ box that carries or includes an HEVC image data item/portion. The latter box may be an ‘RPHE’ box that carries or includes an
RPHE image metadata item/portion for the HEVC image data item or portion carried or included in the former box. Both the ‘HEVC’ box and the ‘RPHE’ box carry the same or identical value (e.g., “01”, etc.) for the Box Instance Number (En) parameter in each of the boxes.
[0160] In a second example, the former box may be an ‘AV01’ box that carries or includes an AV1 image data item/portion. The latter box may be an ‘RPAV’ box that carries or includes an RPAV image metadata item/portion for the AV1 image data item or portion carried or included in the former box. Both the ‘AV01’ box and the ‘RPAV’ box carry the same or identical value (e.g., “01”, etc.) for the Box Instance Number (En) parameter in each of the boxes.
[0161] In some operational scenarios, the image (container) file contains multiple boxes of the same box type (TBox). A recipient decoding device of the image (container) file uses the Box Instance Number parameter (or data field) as encoder-provided instructions or references for the recipient decoding device to order and merge image data or metadata items/portions carried or included in these multiple boxes of the same box type with the payload version 1 data syntax element into a single overall box or into a single overall image data or metadata item/portion.
[0162] The Packet Sequence Number (Z) parameter in four bytes (or 32 bits) can be set by the image encoder in each of (e.g., all, etc.) packets used to transfer or transmit a box. This parameter can be used to specify a specific order - e.g., an ascending order of increasing values for the Packet Sequence Number parameter in the packets, etc. - in which (e.g., all, etc.) payload data of the packets of the box is to be merged into an overall payload data for the box. Concatenation of the payload data can proceed in the specific order. The value for the Packet Sequence Number parameter of the very first packet among all packets of (e.g., a given instance of, etc.) a box of a particular Box Type may be set to 0x0001 or 0x00000001.
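A corresponding sketch for merging the packets of a single box in Packet Sequence Number order (again with a hypothetical in-memory representation):

    def assemble_box_from_packets(packets):
        """Merge per-packet payloads of one box in ascending Packet Sequence
        Number (Z) order; the very first packet carries Z = 0x0001."""
        ordered = sorted(packets, key=lambda p: p["z"])
        assert ordered and ordered[0]["z"] == 1, "sequence must start at 1"
        return b"".join(p["payload"] for p in ordered)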
[0163] The Box Length (LBox) parameter in four bytes (or 32 bits) can be used to specify the length of a box. The value for the Box Length (LBox) parameter of the box may be measured or set as the sum of (a) the combined size of all payload data carried or encoded with payload version 1 data syntax elements of all boxes of the same box type (which may be set as a specific value in an enumerator); (b) the size (4 bytes in the present example) of a single copy/instance of the Box Type (BType) parameter; (c) the size (4 bytes in the present example) of a single copy/instance of the Box Length (LBox) parameter; and (d) the length (8 bytes in the present example) of a single copy/instance of the Box Length Extension (XLBox; optional) parameter if present. The value for the Box Length (LBox) parameter of the box may exclude the sizes of the Packet Sequence Number (Z) parameter, the Box Instance Number (En) parameter, the Format/Packaging version # parameter, the Null byte, the ID String parameter, the (APP11 marker) Length parameter (as illustrated in FIG. 3B) or the APP11 Marker parameter.
[0164] In a first example, a box has payload version 1 data of 32 bytes without using the Box Length Extension parameter. This box has a value of 32 (payload version 1 data) + 4 (TBox) + 4 (LBox) = 40 bytes for the Box Length (LBox) parameter. If the box is split evenly over two APP11 marker segments, then each of the marker segments occupies 2 (APP11 marker) + 2 ((APP11 marker) Length) + 2 (ID string) + 1 (Null byte) + 1 (Format/Packaging version #) + 2 (En) + 4 (Z) + (4 (LBox) + 4 (TBox) + 16 (version 1 payload data; half of the 32-byte pre-split payload version 1 data in the box)) = 38 bytes in total, corresponding to an APP11 marker segment Length value of 36 (which, as noted above, excludes the two bytes of the APP11 marker itself).
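The arithmetic of this example can be restated as a worked check (all sizes in bytes, following the field sizes recited above):

    payload = 32                    # version 1 payload data in the box
    lbox = payload + 4 + 4          # + TBox + LBox themselves
    assert lbox == 40

    half = payload // 2             # box split evenly over two segments
    segment_total = 2 + 2 + 2 + 1 + 1 + 2 + 4 + 4 + 4 + half
    #              marker, Length, ID, null, version#, En, Z, LBox, TBox, data
    assert segment_total == 38      # Length field value: 38 - 2 = 36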
[0165] If the length of a box is larger than a box length threshold such as (2^32 - 1) - 7 bytes, then the Box Length Extension (XLBox) parameter may be specified to indicate the box is expanded accordingly. Otherwise, the Box Length Extension (XLBox) parameter may be omitted from the Box syntax element; hence, the XLBox size is zero (0). An example box extension can be found in the previously mentioned ISO/IEC 18477-3:2015-12-15.
[0166] The Box Type (TBox) parameter in four bytes (or 32 bits) can be used to specify a specific type of payload data carried in a box and related context. Example box types, their respective values, ASCII encoding and constraints are illustrated in TABLE 1 below.
TABLE 1
[0167] Additional box types other than or in place of the box types illustrated in TABLE 1 may also be used, for example, to specify or define additional image metadata on an image carried in or to be constructed from the image (container) file. A recipient image decoding device may disregard - or perform a no-op on - box types which the recipient decoding device does not understand.
[0168] An example description of the box type “LCHK” can be found in ISO/IEC 18477-3:2015-12-15. In response to detecting or determining that a checksum computed over the received data of a box subject to this checksum operation or constraint differs from the checksum recorded in the box, an image decoding device may abort decoding operations while providing information such as an error message to a user. Additionally, optionally or alternatively, the image decoding device may only decode the primary JPEG image and reject non-primary or extended images such as HEVC or AV1 still images. Additionally, optionally or alternatively, the image decoding device may decide to attempt a full decoding even with checksum failure(s).
[0169] The version 1 payload data syntax element in a box carried as a part of photo/image (payload) data of an APP11 marker segment can be used to carry specific content data of the box, EITHER an image data item/portion such as HEVC or AV1 still image data (item/portion) OR an image metadata item/portion.
[0170] Example sizes, allowed values and meaning of syntax elements or parameters as illustrated in FIG. 3B and FIG. 3C are provided in TABLE 2 below.
HEVC Still Image
[0171] An HEVC still image may be included or encoded in an image (container) file as described herein as an H.265 bitstream. The H.265 bitstream may be in conformance with one or more applicable coding specifications such as the previously mentioned Recommendation ITU-T H.265 / ISO/IEC 23008-2:2017.
[0172] In some operational scenarios, parameters carried or coded in the H.265 bitstream may be set as follows.
[0173] The “nuh_layer_id” parameter may be set equal to “0”. The H.265 bitstream shall contain only one picture - for example, the H.265 bitstream includes only one HEVC image derived from a single source image from which a primary image in the image (container) file is derived.
[0174] The “general_profile_idc” parameter may be set equal to “2” in the HEVC Main10 Still Image Profile for coding operations used to code the H.265 bitstream.
[0175] The “general_one_picture_only_constraint_flag” may be set equal to “1” in the HEVC Main10 Still Image Profile.
[0176] The “general_level_idc” parameter may be set less than or equal to “183”.
[0177] In addition to the above parameters set forth in Recommendation ITU-T H.265 / ISO/IEC 23008-2:2017, restrictions may be signaled or set in data fields in the (e.g., HEVC, etc.) sequence parameter set and/or picture parameter set of the H.265 bitstream, for the purpose of informing and enabling the recipient device to perform relatively efficient decoding operations. Additionally, optionally, or alternatively, restrictions may be signaled (to recipient devices) or set in data fields in AV1 OBUs.
[0178] In some operational scenarios, one or more restrictions - or corresponding (e.g., video usability information or VUI, etc.) parameters - carried or coded in the H.265 bitstream may be set as follows.
[0179] The “bit_depth_luma_minus8” parameter may be set equal to “2” for HEVC. The “bit_depth_chroma_minus8” parameter may be set equal to bit_depth_luma_minus8 for HEVC.
[0180] The “chroma_format_idc” parameter may be set equal to “1” for HEVC. For AV1, the “subsampling_x” parameter may be set equal to “1”, and the “subsampling_y” parameter may be set equal to “1”.
[0181] The “vui_parameters_present_flag” parameter may be set equal to “1” for HEVC.
[0182] The “video_signal_type_present_flag” parameter may be set equal to “1” for HEVC.
The “video_format” parameter may be set equal to “0”.
[0183] The “color_description_present_flag” parameter may be set equal to “1” for HEVC.
[0184] The “chroma_loc_info_present_flag” parameter may be set equal to “1” for HEVC.
The “chroma_sample_loc_type_top_field” parameter may be set equal to “2” (or top-left sited) for HEVC. The “chroma_sample_loc_type_bottom_field” parameter may be set equal to “2” (or top-left sited) for HEVC.
[0185] For AV1, the “chroma_sampling_position” parameter may be set equal to “2”.
[0186] For AV1, the “seq_profile” parameter may be set equal to “0”, and the
“high_bitdepth” parameter may be set equal to “1”.
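A hedged sketch of an encoder- or decoder-side conformance check over these settings follows; the dictionary of parsed parameter values is hypothetical, while the parameter names and values are the ones recited above.

    def check_hevc_still_image_params(p: dict) -> None:
        """Assert the HEVC Main10 Still Picture restrictions recited above."""
        assert p["nuh_layer_id"] == 0
        assert p["general_profile_idc"] == 2
        assert p["general_one_picture_only_constraint_flag"] == 1
        assert p["general_level_idc"] <= 183
        assert p["bit_depth_luma_minus8"] == 2             # 10-bit samples
        assert p["bit_depth_chroma_minus8"] == p["bit_depth_luma_minus8"]
        assert p["chroma_format_idc"] == 1                 # 4:2:0
        assert p["vui_parameters_present_flag"] == 1
        assert p["video_signal_type_present_flag"] == 1
        assert p["video_format"] == 0
        assert p["color_description_present_flag"] == 1
        assert p["chroma_loc_info_present_flag"] == 1
        assert p["chroma_sample_loc_type_top_field"] == 2   # top-left sited
        assert p["chroma_sample_loc_type_bottom_field"] == 2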
Image Decoding Operations
[0187] An image decoding device may decode and/or process a received image (container) file as described herein in accordance with one or more applicable coding specifications. The image decoding device may support both co-sited and centered chroma sampling or subsampling for chroma image data carried or encoded in the image (container) file.
[0188] Example types and corresponding capabilities of image decoding devices that may receive and process the received image (container) file are illustrated in TABLE 3 below.
Example Device Configurations
[0189] FIG. 6A depicts an example (image/photo) capture device (600) that may be used to implement techniques as described herein. Example capture devices as described herein may include, but are not limited to, cameras supporting HDR and/or SDR image capturing, mobile devices with one or more cameras, head-mounted user devices with cameras, wearable user devices with cameras, etc.
[0190] One or more (image) sensors (602) are used to capture or generate a raw image, which may be of a bit depth of 12-16 bits. The raw image may be captured with specific camera settings for aperture, shutter speed, exposure, focal length, etc. The raw image may be processed (e.g., error correction, local and global image adjustment, etc.) by an ISP (604) to generate a post-ISP image, or input image, that is provided to a (e.g., encoder side, etc.) photo process core (606). In some operational scenarios, the post-ISP image or input image may be of a bit depth of 10-12 bits and comprise high dynamic range (HDR) perceptually quantized (PQ) codewords represented in a BT.2100
color space. The post-ISP image or input image generated by the ISP (604) may optionally be used as a preview image (612).
[0191] A coding block such as a (e.g., encoder side, etc.) photo process core (606) receives the post-ISP or input image from the ISP (604) and operates in conjunction with image/video codecs (e.g., 608 and 610 of FIG. 6A; external to the photo process core (606), etc.) to generate and encode images/photos of different image formats from the same input image and to package the generated/encoded images/photos into an image (container) file (614).
[0192] For example, the photo process core (606) may generate or provide a first (e.g., 10-bit, intermediate, reshaped, original, etc.) image - derived from the input image received by the photo process core (606) - to an HEVC codec such as an HEVC encode (608). The HEVC encode (608) receives, processes, converts, compresses, and/or encodes, the first image into a corresponding encoded HEVC image.
[0193] Additionally, optionally or alternatively, the photo process core (606) may generate or provide a second (e.g., 8-bit, intermediate, reshaped, etc.) image - derived from the input image received by the photo process core (606) - to a JPG codec such as a JPG encode (610). The JPG encode (610) receives, processes, converts, compresses, and/or encodes, the second image into a corresponding encoded JPG image.
[0194] The encoded HEVC image and the encoded JPG image derived from the same input image may be sent as outputs by the HEVC encode (608) and the JPG encode (610), and received as inputs and packaged (e.g., with image metadata, etc.) into the image (container) file (614) by the photo process core (606).
[0195] In some operational scenarios, some or all of the subsystems or processing modules/blocks as illustrated in FIG. 6A may be implemented in the same capture device (600).
[0196] FIG. 6B depicts an alternative example (image/photo) capture device (600-1) that may be used to implement techniques as described herein. The capture device (600-1) may be one or more cameras supporting HDR and/or SDR image capturing, a mobile device with one or more cameras, a head-mounted user device with one or more cameras, a wearable user device with one or more cameras, etc.
[0197] One or more (image) sensors (602) are used to capture or generate a raw image, which may be of a bit depth of 12-16 bits. The raw image may be captured with specific camera settings for aperture, shutter speed, exposure, focal length, etc. The raw image may be processed (e.g., error correction, local and global image adjustment, etc.) by an ISP (604) to generate a post-ISP image that is provided to an HEVC codec or HEVC encode (606). The post-ISP image generated by the ISP
(604) may be compressed or encoded by the HEVC encode (606) to generate an (e.g., HEVC, HEIC, etc.) encoded version of the post-ISP image.
[0198] In some operational scenarios, additional image processing operations such as image analysis (616) and/or metadata generation (e.g., including but not limited to information generated from the image analysis (616), etc.) may be performed on the post-ISP image and/or the encoded version of the post-ISP image before the (final) encoded version of the post-ISP image is packaged in a relatively efficient capture device image (container) file such as an HEIC image file.
[0199] The image metadata generated from the image analysis (616) may be included in the capture device image (container) file and may be used along with the encoded version of the post-ISP image by a recipient downstream device receiving the capture device image (container) file to generate optimized display images for rendering on an image display.
[0200] FIG. 6C depicts an example image processing device (650) that may be used to operate with a capture device (e.g., 600-1 of FIG. 6B, etc.) to implement techniques as described herein. The image processing device (650) may be a mobile or non-mobile computing device, a cloud-based image processing system, an image/photo processing and/or storage service, a cloud-based photo repository, etc.
[0201] A coding block such as a (e.g., encoder side, etc.) photo process core (606) receives the capture device image (container) file such as an HEIC image file containing an encoded version of the post-ISP image. The capture device image (container) file may also include image metadata. The photo process core (606) may extract the image metadata and the encoded (e.g., HEVC, HEIC, etc.) version of the post-ISP image from the capture device image (container) file.
[0202] The photo process core (606) may operate in conjunction with image/video codecs (e.g., 618 and 610 of FIG. 6C; external to the photo process core (606), etc.) to generate and decode/encode images/photos of different image formats in connection with the same post-ISP image and to package the generated/encoded images/photos into an image (container) file (614).
[0203] For example, the photo process core (606) may operate with or invoke an HEVC codec such as an HEVC decode (618) to generate a decoded version of the post-ISP image. The HEVC decode (618) receives, processes, converts, decompresses, and/or decodes, the encoded version of the post-ISP image in the capture device image (container) file into the corresponding decoded version of the post-ISP image.
[0204] The photo process core (606) may generate or provide a (e.g., 8-bit, intermediate, reshaped, etc.) image - derived from the decoded version of the post-ISP image - to a JPG codec
such as a JPG encode (610). The JPG encode (610) receives, processes, converts, compresses, and/or encodes, the image into a corresponding encoded JPG image.
[0205] The (e.g., HEVC, HEIC, etc.) encoded version of the post-ISP image as received in the capture device image (container) file and the encoded JPG image derived by the JPG encode (610) may be packaged, for example with image metadata, into the image (container) file (614) by the photo process core (606). The image metadata in the image (container) file (614) may include image metadata portions - e.g., relating to the HEVC or HEIC encoded version - generated or received from the image metadata extracted from the capture device image (container) file. The image metadata in the image (container) file (614) may also include other image metadata portions - e.g., relating to the JPG image - generated by the photo process core (606).
[0206] FIG. 7 depicts an example (image/photo) recipient decoding device (700) that may be used to implement techniques as described herein. The decoding device (700) may be an image display supporting HDR and/or SDR image rendering, a mobile or non-mobile computing device with one or more image displays, a TV set, etc.
[0207] A (e.g., decoder side, etc.) photo process core (606) in the decoding device (700) may receive an image (container) file containing two or more images of different image formats such as an HEVC image and a JPG image, etc. By way of example but not limitation, the photo process core (606) operates in conjunction with an image/video codec such as a JPG codec, an HEVC codec, an AV1 codec, etc. By way of illustration but not limitation, the image/video codec represents an HEVC codec such as an HEVC decode (618). The photo process core (606) invokes the HEVC decode (618) to decode an HEVC image, which may be an HEVC encoded version of a post-ISP image captured with a camera device.
[0208] The decoding device (700) may include or operate with an image display such as a display panel. Panel configuration (704) - which may specify display capabilities such as dynamic range, color gamut, image refresh rate, etc. - may be generated or provided to the photo process core (606).
[0209] Additionally, optionally or alternatively, image metadata may be extracted by the photo process core (606) from the received image (container) file (614).
[0210] Based at least in part on the panel configuration (704) and/or the image metadata, the photo process core (606) generates a display image from the HEVC image. For the purpose of illustration only, the display image represents an RGB image. The display image may be transmitted or provided to the image display or display panel for rendering by way of an RGB image buffer (702).
[0211] FIG. 6D depicts an example (e.g., encoder side, etc.) photo process core (or subsystem) 606 that may be included or implemented in an image processing device to implement techniques as described herein. The image processing device that includes or implements the photo process core (606) may be a mobile or non-mobile computing device, a cloud-based image processing system, an image/photo processing and/or storage service, a cloud-based photo repository, a capture device, a separate device operating with the capture device, a decoding device, a rendering device, etc. In various operational scenarios, more or fewer components and/or processing blocks may be included or implemented in the photo process core (606).
[0212] By way of example but not limitation, the photo process core (606) receives an input image (620), which may, but is not necessarily limited to only, be from an ISP, from a photo image file generated by a camera, etc. In some operational scenarios, the photo process core (606) may convert or reshape the input image (620) of a first bit depth (e.g., 10-12 bits, etc.) into an intermediate image of a second bit depth (e.g., 10 bits, etc.).
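A minimal sketch of such a bit-depth conversion, assuming a plain rounding shift (an actual reshaping pipeline may apply a more elaborate mapping):

    import numpy as np

    def requantize(img: np.ndarray, in_bits: int = 12, out_bits: int = 10):
        """Requantize integer codewords from in_bits to out_bits, rounding
        to nearest. Assumes in_bits >= out_bits."""
        shift = in_bits - out_bits
        if shift == 0:
            return img.astype(np.uint16)
        rounded = (img.astype(np.uint32) + (1 << (shift - 1))) >> shift
        return rounded.astype(np.uint16)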
[0213] The photo process core (606) operates with and invokes an HEVC encode (608) to generate an encoded HEVC image (or an HEVC encoded version of the input image or the converted/reshaped image, etc.).
[0214] The photo process core (606) includes a color volume mapper (624) that may be used to generate a (e.g., color volume mapped, color gamut mapped, color space mapped, reshaped, etc.) mapped image from the input image (620).
[0215] The photo process core (606) operates with and invokes a JPG encode (610) to generate an encoded JPG image (or a JPG encoded version of the mapped image, etc.).
[0216] Additionally, optionally, or alternatively, the photo process core (606) includes a metadata analysis processing block (622) that may be used to analyze some or all of the input images, intermediate images generated, reshaped, converted or mapped from the input images, the encoded images, etc., to generate image metadata. The image metadata may include operational parameters that may be used by a recipient decoding device to generate or reconstruct images optimized for rendering on a wide variety of devices or image displays with varying system and/or display capabilities.
[0217] Some or all of the (e.g., HEVC, JPG, etc.) encoded images and/or the image metadata may be combined or coded by a package processing block (626) into an image (container) file (614).
[0218] As illustrated in FIG. 8, the package processing block (626) (also referred to as “photo packager”) may obtain EXIF (exchangeable image file format) metadata, color volume mapping
metadata, an HEVC encoded image, a JPG encoded image, etc., as inputs. The EXIF metadata and the JPG encoded image may be used by the package processing block (626) to populate or generate syntax elements corresponding to the JPG header and EXIF in a JPG image (container) file. Additionally, optionally or alternatively, some or all of the (input) EXIF metadata, color volume mapping metadata, the HEVC encoded image, etc., may be included as one or more photo payloads by the package processing block (626) to populate or generate syntax elements corresponding to one or more APP11 marker segments in the same JPG image (container) file.
[0219] A photo process (or processing) core as described herein may be implemented with a GPU, SoC, DSP, ASIC, FPGA, CPU, or other computing resources for full quality images and/or reduced quality images. The reduced quality images such as gallery images or thumbnails may be generated in a “Reduced Compute” mode with relatively low computing resource usage. For example, for gallery or thumbnail view, a primary (e.g., JPEG base layer, etc.) image can be rendered or displayed with no additional processing.
[0220] As illustrated in FIG. 6A through FIG. 6C and FIG. 7, a photo process core may operate with or rely on external codecs for inputs/outputs. Photo/image processing operations in connection with the photo process core may be split or partitioned into multiple processing blocks. These processing blocks may implement methods to generate the “most efficient” packages (e.g., an image container file with relatively few images of different image formats and/or no or relatively small amount of image metadata, etc.) and/or “most compatible” packages (e.g., an image container file with more images of different image formats and/or larger amount of image metadata, etc.) for a given input image.
[0221] As illustrated in FIG. 6D (and FIG. 8), photo packaging operations may be reduced to or performed by a single “package” processing block.
Metadata Compression
[0222] In some operational scenarios, the image metadata such as relating to image reshaping or prediction mappings as carried or included in the image (container) file 122 can be compressed by an upstream encoding device to reduce the size of the image metadata transmitted by the encoding device to a downstream recipient (decoding) device.
[0223] The image (container) file 122 or bitstream may carry configuration, profile and/or level parameter(s) whose values can indicate different schemes or types of metadata compression to be supported by the downstream device such as a playback device. For example, the configuration, profile and/or level parameter(s) can be set to first specific value(s) to indicate
limited metadata compression - or a first metadata compression scheme or type that uses relatively small-sized (e.g., metadata, etc.) buffering in the downstream device. Additionally, optionally or alternatively, the configuration, profile and/or level parameter(s) can be set to second specific value(s) to indicate extended metadata compression - or a second metadata compression scheme or type that uses relatively large-sized (e.g., metadata, etc.) buffering in the downstream device. Additionally, optionally or alternatively, the configuration, profile and/or level parameter(s) can be set to third specific value(s) to indicate no metadata compression to be performed or supported by the upstream device and/or the downstream device.
[0224] In some operational scenarios, the limited metadata compression scheme or type allows up to one (1) reference buffer for the complete composer (or image reshaping) coefficients payload and up to one (1) reference buffer for the DM coefficients. In comparison, the extended metadata compression scheme or type allows more buffers to be used by the image reshaping operations and/or by the DM operations.
[0225] By way of example but not limitation, image metadata compression operations may be performed with respect to image metadata portions containing prediction or reshaping operational parameters.
[0226] In some operational scenarios, a first flag (e.g., denoted as “use_prev_di_rpu_flag”, a specific syntax element, etc.) coded in the image (container) file (122) may be set by the upstream device to a specific value (e.g., use_prev_di_rpu_flag = 1, etc.) to signal or indicate to the downstream device that a previously sent image metadata portion (e.g., of a specific image metadata type, of a type identified by another syntax element such as di_rpu_type = 8, etc.) for a previously sent/decoded image (e.g., in the same image (container) file 122, etc.) may be re-used by the downstream device for a current image. As a result, some or all image processing operations such as image reshaping or prediction operations to be performed on or for the current image by the downstream device can share or make use of the same previously sent image metadata portion, thereby reducing or omitting some or all of a separate (e.g., reshaping, prediction, composer, etc.) image metadata portion to be explicitly or specifically coded or transmitted by the upstream device to the downstream device for the current image in the image (container) file 122.
[0227] On the other hand, the first flag (“use_prev_di_rpu_flag”) coded in the image (container) file (122) may be set by the upstream device to a second specific value (e.g., use_prev_di_rpu_flag = 0, etc.) to signal or indicate to the downstream device that an image metadata portion (e.g., of a specific image metadata type, of a type identified by another syntax element such as di_rpu_type = 8, etc.) for the current image is explicitly or specifically sent,
transmitted, included, or present in the image (container) file 122. As a result, some or all image processing operations such as image reshaping or prediction operations can be performed by the downstream device on or for the current image using the image metadata portion explicitly or specifically coded/transmitted by the upstream device for the current image in the image (container) file 122 and decoded or retrieved for the current image by the downstream device from the image (container) file 122. In some operational scenarios, an image or picture may be indicated or carried in the image (container) file 122 as a key frame for which the value of the first flag (“use_prev_di_rpu_flag”) is set to zero (0).
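The decoder-side branching implied by the first flag can be sketched as follows; the parsed metadata-unit layout, cache structure and helper name are hypothetical, while the field names follow the syntax elements described above.

    def resolve_mapping_metadata(rpu: dict, cache: dict):
        """Select composer/reshaping metadata for the current image.
        `cache` maps di_rpu_id values to previously received portions."""
        if rpu.get("use_prev_di_rpu_flag") == 1:
            # Reuse the previously sent portion identified by prev_di_rpu_id;
            # an absent id is treated as invalid (-1), as described below.
            return cache.get(rpu.get("prev_di_rpu_id", -1))
        # Explicitly transmitted portion: use it and store it as reference data.
        metadata = rpu["di_rpu_data_mapping"]
        cache[rpu["di_rpu_id"]] = metadata
        return metadata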
[0228] A second flag (e.g., denoted as “prev_di_rpu_id”, a second specific syntax element, etc.) coded in the image (container) file (122) may be set by the upstream device to a specific value (e.g., a specific value within a value range of 0 to 15, etc.) to signal or indicate to the downstream device which specific portion of the previously sent image metadata (e.g., for a previous image, etc.) is to be used for performing image processing, reshaping or predicting operations on or for the current image.
[0229] The previously sent image metadata by the upstream device and received by the downstream device may include a plurality of (previously sent) image metadata portions maintained or stored in memory or cache at the downstream device. Each (previously sent) image metadata portion in the plurality of image metadata portions at the downstream device may be labeled or identified with a respective specific metadata portion identifier - among a plurality of metadata portion identifiers respectively for the plurality of (previously sent) image metadata portions. The respective specific metadata portion identifier may be within a specific value range such as 0 to 15.
[0230] The second flag (“prev_di_rpu_id”) set with a specific value in the specific value range may be used by the upstream device to signal or indicate to the downstream device that a specific (previously sent) image metadata portion in the plurality of image metadata portions maintained or cached at the downstream device is to be used by the downstream device to perform image processing, reshaping or predicting on or for the current image. The specific image metadata portion may be previously sent for a previous image with a third flag (denoted as “di_rpu_id”, a specific syntax element, etc.) set for the previous image to the same metadata portion identifier value as that of the second flag (“prev_di_rpu_id” etc.) set for the current image.
[0231] Additionally, optionally or alternatively, in response to determining the second flag (“prev_di_rpu_id”) is not present for the current image in the image (container) file 122, the downstream device can proceed to set the second flag internally to a value outside the specific
value range such as -1 (e.g., invalid rpu_id, etc.). This helps avoid or prevent the downstream device from subsequently using an incorrect (previously sent) image metadata portion to perform image processing, reshaping or prediction on or for the current image.
[0232] The second flag (“prev_di_rpu_id”) coded in the image (container) file (122) may be set by the upstream device to a second specific value (e.g., prev_di_rpu_id = 0, etc.) to signal or indicate to the downstream device that a specific image metadata portion such as (e.g., inter-layer, etc.) image reshaping or prediction parameters for the current image is explicitly or specifically sent, transmitted, included or present in the image (container) file 122 from the upstream device to the downstream device. As a result, some or all image processing operations - which may include but are not necessarily limited to only, image reshaping or prediction operations - can be performed on or for the current image using the (currently sent) specific image metadata portion. Some or all operational parameters in the (currently sent) specific image metadata portion may be specified with a (currently sent) data structure (e.g., denoted as “di_rpu_data_mapping()”, a specific set of syntax elements, etc.) explicitly or specifically coded for the current image in the image (container) file 122.
[0233] Hence, under techniques as described herein, the downstream (receiving or decoding) device can store, maintain or cache received image metadata portions including optimized operational parameter values generated by the upstream device and included in the image (container) file 122 such as in the “di_rpu_data_mapping()” data structure and used in or referred to by subsequent operations performed by the downstream device as reference data for the current and subsequent images if so signaled or directed by the upstream device with the image (container) file 122.
[0234] As noted, each image metadata portion - e.g., operational parameters such as prediction coefficients - in the plurality of image metadata portions stored or cached at the downstream device as the reference data can be distinctly tagged, indexed or identified with a respective value (which may be the same value of the third flag “di_rpu_id” as set by the upstream device and received by the downstream device) within the specific value range.
[0235] In response to determining that the reference data maintained or cached by the receiving device already includes a previously received (or otherwise already maintained/cached) image metadata portion with the same tag/index/identifier value as that of the third flag (“di_rpu_id”) for the currently sent image metadata portion for the current image, the downstream device can update the reference data by overwriting the previously received image metadata portion with the currently received image metadata portion including any prediction coefficients explicitly transmitted for the current image.
[0236] By way of example but not limitation, the upstream device may determine that the value of the first flag (“use_prev_di_rpu_flag”) should be set to one (1) for the current image. In response, the upstream device does not transmit or include inter-layer prediction coefficients explicitly in the image (container) file 122 for the current image. Correspondingly, the downstream device is to read or reuse operational parameters in a previously received or stored data structure (“di_rpu_data_mapping()”) that is labeled with a specific tag/index/identifier value equal to the value of the second flag (“prev_di_rpu_id”) of a currently received image metadata unit (“rpu” with a metadata unit flag or header field “di_rpu_type” = 8) for the current image, and use the previously received/stored operational parameters including the inter-layer prediction coefficients for processing the current image or picture. The previously received or stored data structure may be identified by the value of the second flag (“prev_di_rpu_id”) carried or signaled in the currently received image metadata unit (“rpu” with “di_rpu_type” = 8).
[0237] However, if the value of the second flag (“prev_di_rpu_id”) in the current image metadata unit (with “di_rpu_type” = 8) does not match any of the tags, indexes or identifiers of the stored data structures (“di_rpu_data_mapping()”) or any stored inter-layer prediction coefficients in the reference data maintained at the downstream device, the downstream device or an image metadata parser therein may infer that the stored data structure - containing inter-layer prediction coefficients - with a tag/index/identifier value equal to zero (0) - e.g., as a fallback - should be read and re-used for processing the current image.
[0238] In addition, if the stored data structure or inter-layer prediction coefficients with a tag/index/identifier value equal to 0 do not exist in the reference data either, the downstream device or the image metadata parser can fall back to using inter-layer prediction coefficients representing (e.g., default, trivial, supported, etc.) 1:1 linear mappings for processing the current image or picture.
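The fallback chain described in the last two paragraphs can be sketched as follows (the stored-structure representation and the identity-mapping placeholder are hypothetical):

    IDENTITY_MAPPING = {"mapping": "1:1 linear"}  # trivial default mapping

    def lookup_with_fallback(prev_di_rpu_id: int, cache: dict):
        """Resolve stored di_rpu_data_mapping() reference data: try the
        requested id, then id 0, then fall back to 1:1 linear mappings."""
        if prev_di_rpu_id in cache:
            return cache[prev_di_rpu_id]
        if 0 in cache:
            return cache[0]
        return IDENTITY_MAPPING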
[0239] Different images - e.g., of different image formats, different image characteristics or qualities such as different dynamic range, different bit depths, different color precisions, etc. - that can be generated from image data and attendant image metadata in the image (container) file 122 may be logically represented as different image layers. In some operational scenarios, the image (container) file 122 may or may not explicitly include all image data and/or all image metadata that are needed to generate a specific image or image layer in these different images or image layers. Rather, inter-layer image processing operations such as inter-layer prediction, reshaping, mapping, etc., may be used to generate some or all of the image data constituting the specific image.
[0240] Image processing operations as described herein - which may include, but are not necessarily limited to, any of: (e.g., inter-layer, etc.) image prediction, image reshaping, image mapping, etc. - may be performed with polynomial pivot points, polynomial coefficients, and the like, signaled or transferred from the upstream device to the downstream device in image metadata. Example image prediction, reshaping and/or mapping methods as described herein may include, but are not necessarily limited to only, any of: linear interpolation, second-order polynomial interpolation, multiple color channel multiple regression prediction (MMR), etc., for example as described in U.S. Patent No. 10,021,390, the entire contents of which are incorporated by reference herein in their entirety. Additionally, optionally or alternatively, image prediction, reshaping and/or mapping methods as described herein may include tensor-product B-spline prediction (TPB), for example as described in U.S. Patent Application Publication No. 2022/0408081, the entire contents of which are incorporated by reference herein in their entirety.
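As a concrete, purely illustrative example of the simplest of these methods, the sketch below evaluates a piecewise second-order polynomial mapping from signaled pivot points and per-piece coefficients; setting the second-order coefficient of a piece to zero reduces it to linear interpolation. The pivot and coefficient values shown are invented for illustration, and MMR or TPB prediction would replace the per-piece evaluation with multi-channel or B-spline forms.

```python
import bisect

def apply_piecewise_poly(x, pivots, coeffs):
    """Map a normalized codeword x in [0, 1] through a piecewise polynomial.

    pivots -- ascending pivot points delimiting the pieces
    coeffs -- one (c0, c1, c2) triple per piece: y = c0 + c1*x + c2*x*x
    """
    # Locate the piece whose pivot interval contains x.
    i = max(0, min(bisect.bisect_right(pivots, x) - 1, len(coeffs) - 1))
    c0, c1, c2 = coeffs[i]
    return c0 + c1 * x + c2 * x * x

# Illustrative (invented) two-piece curve over [0, 1].
pivots = [0.0, 0.5, 1.0]
coeffs = [(0.0, 1.2, -0.2), (0.05, 1.0, -0.1)]
y = apply_piecewise_poly(0.3, pivots, coeffs)  # -> 0.342
```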
[0241] In some operational scenarios, image metadata compression operations may be performed with respect to DM metadata portions containing display management (DM) operational parameters.
[0242] For example, a fourth flag (e.g., denoted as “use_prev_level_md_flag”, a specific syntax element, etc.) coded in the image (container) file (122) may be set by the upstream device to a specific value (e.g., use_prev_level_md_flag = 1, etc.) to signal or indicate to the downstream device that a previously sent DM metadata portion (e.g., of a specific image metadata type, of a type identified by another syntax element such as di_rpu_type = 8, etc.) for a previously sent/decoded image (e.g., in the same image (container) file 122, etc.) may be re-used by the downstream device to perform DM operations for a current image. As a result, the DM operations to be performed for the current image by the downstream device can share or make use of the same previously sent DM metadata portion, thereby reducing or omitting some or all of a separate DM metadata portion that would otherwise be explicitly or specifically coded or transmitted by the upstream device to the downstream device for the current image in the image (container) file 122.
[0243] On the other hand, the fourth flag (“use_prev_level_md_flag”) coded in the image (container) file (122) may be set by the upstream device to a second specific value (e.g., use_prev_level_md_flag = 0, etc.) to signal or indicate to the downstream device that a DM metadata portion (e.g., of a specific image metadata type, of a type identified by another syntax element such as di_rpu_type = 8, etc.) relating to DM operations to be performed for the current image is explicitly or specifically sent, transmitted, included or present in the image (container) file 122. As a result, the DM operations can be performed by the downstream device for the
current image using the DM metadata portion explicitly or specifically coded/transmitted by the upstream device for the current image in the image (container) file 122 and decoded or retrieved for the current image by the downstream device from the image (container) file 122.
[0244] A fifth flag (e.g., denoted as “prev_level_md_id”, a second specific syntax element, etc.) coded in the image (container) file (122) may be set by the upstream device to a specific value (e.g., a specific value within a value range of 0 to 15, etc.) to signal or indicate to the downstream device which specific portion of the previously sent image metadata (e.g., for a previous image, etc.) is to be used for performing DM operations for the current image.
[0245] The previously sent DM metadata by the upstream device and received by the downstream device may include a plurality of (previously sent) DM metadata portions maintained or stored in memory or cache at the downstream device. Each (previously sent) DM metadata portion in the plurality of DM metadata portions at the downstream device may be labeled or identified with a respective specific DM metadata portion identifier - among a plurality of DM metadata portion identifiers respectively for the plurality of (previously sent) DM metadata portions. The respective specific DM metadata portion identifier may be within a specific value range such as 0 to 15.
[0246] The fifth flag (“prev_level_md_id”) set with a specific value in the specific value range may be used by the upstream device to signal or indicate to the downstream device that a specific (previously sent) DM metadata portion in the plurality of DM metadata portions maintained or cached at the downstream device is to be used by the downstream device to perform DM operations for the current image. The specific DM metadata portion may be previously sent for a previous image with the third flag (“di_rpu_id”) set for the previous image to the same DM metadata portion identifier value as that of the fifth flag (“prev_level_md_id” etc.) set for the current image.
[0247] Additionally, optionally or alternatively, in response to determining the fifth flag (“prev_level_md_id”) is not present for the current image in the image (container) file 122, the downstream device can proceed to set the fifth flag internally to a value outside the specific value range such as -1 (e.g., invalid rpu_id, etc.). This helps avoid or prevent the downstream device from subsequently using an incorrect (previously sent) DM metadata portion to perform DM operations for the current image.
[0248] The fifth flag (“prev_level_md_id”) coded in the image (container) file (122) may be set by the upstream device to a second specific value (e.g., prev_level_md_id = 0, etc.) to signal or indicate to the downstream device that a specific DM metadata portion for the current image is explicitly or specifically sent, transmitted, included or present in the image (container) file 122
from the upstream device to the downstream device. As a result, the DM operations can be performed for the current image using the (currently sent) specific DM metadata portion. Some or all operational parameters in the (currently sent) specific DM metadata portion may be specified with a (currently sent) data structure (e.g., denoted as “di_dm_data_payload()”, a specific set of syntax elements, etc.) explicitly or specifically coded for the current image in the image (container) file 122.
[0249] Hence, under techniques as described herein, the downstream (receiving or decoding) device can store, maintain or cache received DM metadata portions - including optimized operational parameter values generated by the upstream device and included in the image (container) file 122, such as in the “di_dm_data_payload()” data structure - as reference data for the current and subsequent images; the cached portions can then be used in or referred to by subsequent operations performed by the downstream device, if so signaled or directed by the upstream device with the image (container) file 122.
[0250] As noted, each DM metadata portion - e.g., operational parameters for DM operations - in the plurality of DM metadata portions stored or cached at the downstream device as the reference data can be distinctly tagged, indexed or identified with a respective value (which may be the same value of the third flag “di_rpu_id” as set by the upstream device and received by the downstream device) within the specific value range.
[0251] In response to determining that the reference data maintained or cached by the receiving device already include a previously received (or otherwise already maintained/cached) DM metadata portion with the same tag/index/identifier value as that of the third flag (“di_rpu_id”) for the currently sent DM metadata portion for the current image, the downstream device can update the reference data by overwriting the previously received DM metadata portion with the currently received DM metadata portion explicitly transmitted for the current image.
[0252] By way of example but not limitation, the upstream device may determine that the value of the fourth flag (“use_prev_level_md_flag”) should be set to one (1) for the current image. In response, the upstream device does not transmit or include DM operational parameters explicitly in the image (container) file 122 for the current image. Correspondingly, the downstream device is to read or reuse operational parameters in a previously received or stored data structure (“di_dm_data_payload()”) that is labeled with a specific tag/index/identifier value equal to the value of the fifth flag (“prev_level_md_id”) of a currently received image metadata unit (“rpu” with a metadata unit flag or header field “di_rpu_type” = 8) for the current image, and use the previously received/stored DM operational parameters for performing the DM operations
for the current image or picture. The previously received or stored data structure may be identified by the downstream device using the value of the fifth flag (“prev_level_md_id”) carried or signaled in the currently received image metadata unit (“rpu” with “di_rpu_type” = 8). The downstream device can read the previously stored data structure (“di_dm_data_payload()”) from the last image metadata unit (“rpu” with “di_rpu_type” = 8) - for a previous image - in which the fourth flag (“use_prev_level_md_flag”) is set to 0, and copy it into a DM metadata buffer for the current image or picture. This copy can overwrite any existing DM metadata portions of the same level(s) in the current DM metadata buffer if those level(s) are present or tagged in the stored DM metadata corresponding to a DM metadata portion identifier equal to the value of the fifth flag (“prev_level_md_id”) of the current metadata unit (“rpu” with “di_rpu_type” = 8) for the current image or picture.
[0253] However, if the value of the fifth flag (“prev_level_md_id”) in the current image metadata unit (with “di_rpu_type” = 8) does not match any of the tags, indexes or identifiers of the stored data structures (“di_dm_data_payload()”) or any stored DM operational parameters in the reference data maintained at the downstream device, the downstream device or an image metadata parser therein may infer that the stored data structure - containing DM operational parameters - with a tag/index/identifier value equal to zero (0) - e.g., as a fallback - should be read and re-used in DM operations for the current image.
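The DM-side reuse logic of paragraphs [0247]-[0253] can be sketched in the same illustrative style. The level-keyed buffer, the dictionary field names and the -1 sentinel mirror the prose above but are assumptions for illustration, not the normative syntax:

```python
def resolve_dm_metadata(rpu, dm_cache, dm_buffer):
    """Fill dm_buffer (a level -> DM payload dict) for the current image.

    rpu       -- parsed metadata unit with the DM flags described above
    dm_cache  -- dict keyed by DM metadata portion identifier (0..15), each
                 entry a {level: payload} structure from an explicit payload
    dm_buffer -- current image's DM metadata buffer, updated in place
    """
    if rpu["use_prev_level_md_flag"] == 0:
        # Explicit di_dm_data_payload(): cache it under the unit's identifier
        # and use it for the current image.
        dm_cache[rpu["di_rpu_id"]] = dict(rpu["di_dm_data_payload"])
        dm_buffer.update(rpu["di_dm_data_payload"])
        return dm_buffer
    # An absent fifth flag is treated as an out-of-range sentinel (-1) so no
    # incorrect previously sent portion is picked up (cf. paragraph [0247]).
    prev_id = rpu.get("prev_level_md_id", -1)
    stored = dm_cache.get(prev_id, dm_cache.get(0))  # unmatched id: fall back to 0
    if stored:
        # Copy the stored portion, overwriting buffer entries of the same level.
        dm_buffer.update(stored)
    return dm_buffer
```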
[0254] The aforementioned steps of DM metadata copying or reuse may be done in the described order. Subsequently, the DM metadata portion stored in the current DM metadata buffer for the current image is used to perform DM operations for the current image or picture.
[0255] A metadata compression sub-system or module may be implemented by the upstream device to support or perform metadata compression operations on image metadata related to or associated with image prediction, image reshaping or display management. In some operational scenarios, some or all of the metadata compression operations such as the limited metadata compression may be performed in an (image or picture) encoding order. Example encoding orders may be, but are not necessarily limited to only, one of: encoding a key image frame first, followed by encoding non-key image frame(s) referencing the key image frame; encoding a to-be-referenced image frame first, followed by encoding image frame(s) referencing the to-be-referenced image; etc. Additionally, optionally or alternatively, in some operational scenarios, an image metadata stream or sub-stream with metadata compression may be generated by the upstream device in a display order.
[0256] FIG. 9 illustrates example image metadata compression operations that may be implemented or performed by the upstream device.
[0257] Block 902 comprises receiving a request to encode an image metadata portion (“RPU”) relating to (e.g., inter-layer, etc.) image prediction, image reshaping, etc., for an image frame. The upstream device proceeds to determine whether the image frame is a key frame.
[0258] Block 904 comprises, in response to determining that the image frame is a key frame, setting the image metadata portion (RPUref) for this (reference) key frame to, or including it based on, an input image metadata portion (RPUin). Block 910 comprises setting the first flag (“use_prev_di_rpu_flag”) to zero (0).
[0259] Block 906 comprises, in response to determining that the image frame is not a key frame, determining whether the input image metadata portion (RPUin) for this non-key frame is equal to the image metadata portion (RPUref) already set or included for the key frame.
[0260] Block 908 comprises, in response to determining that the input image metadata portion (RPUin) for this non-key frame is equal to the image metadata portion (RPUref) already set or included for the key frame, setting the first flag (“use_prev_di_rpu_flag”) to one (1) and the second flag (“prev_di_rpu_id”) to the same value as that of the third flag (“di_rpu_id”) of the image metadata portion (RPUref) already set or included for the key frame.
[0261] This process flow of FIG. 9 may be repeatedly or iteratively performed for all images or pictures (or image frames) in an encoding order, thereby generating compressed image metadata units for image processing operations such as (e.g., inter-layer) image prediction or reshaping operations (RPU with compressed composer metadata).
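A compact sketch of this encoder-side flow follows. The dictionary-based RPU representation and the state handling are assumptions for illustration; in particular, the branch for a non-key frame whose metadata differs from RPUref is not spelled out in the blocks above and is assumed here to fall back to explicit signaling, as in block 910:

```python
def encode_rpu(rpu_in, is_key_frame, state):
    """One FIG. 9-style iteration in encoding order (illustrative only).

    rpu_in -- input image metadata portion (RPUin) for the current frame
    state  -- holds RPUref and its identifier from the last key frame
    """
    if is_key_frame:
        state["rpu_ref"] = rpu_in                    # block 904
        return {"use_prev_di_rpu_flag": 0,           # block 910: send in full
                "di_rpu_id": state["ref_id"],
                "di_rpu_data_mapping": rpu_in}
    if rpu_in == state["rpu_ref"]:                   # blocks 906/908
        return {"use_prev_di_rpu_flag": 1,           # omit parameters; reuse
                "prev_di_rpu_id": state["ref_id"]}
    return {"use_prev_di_rpu_flag": 0,               # changed: send in full
            "di_rpu_id": state["ref_id"],
            "di_rpu_data_mapping": rpu_in}
```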
[0262] The downstream device or a metadata parser therein may extract or receive, from the image (container) file 122 or bitstream generated by the upstream device, image metadata (“RPU”) portions or payloads in the encoding order.
[0263] The metadata parser may maintain one (e.g., single, etc.) first metadata buffer to store or cache a first (e.g., entire, etc.) data structure (“di_rpu_data_payload()”) of a current image metadata portion for which the first flag (“use_prev_di_rpu_flag”) is set to zero (0) and the third flag (“di_rpu_id”) is also set to zero (0).
[0264] Additionally, optionally or alternatively, the metadata parser may maintain another (e.g., single, etc.) second metadata buffer to store or cache a second (e.g., entire, etc.) DM metadata structure (“di_dm_data_payload()” or “di_dm_data_payload2()”) for which one or more flags indicate or signal as explicitly being carried or included in the image (container) file 122 or bitstream.
[0265] The metadata parser can restore or reuse operational parameters (e.g., composer coefficients relating to image prediction/mapping/reshaping, etc.) from the first metadata buffer if the first data structure (“di_rpu_data_payload()”) is not present, for example as indicated by the first flag (“use_prev_di_rpu_flag”) = 1.
[0266] The metadata parser can restore or reuse operational parameters (e.g., parameters or coefficients relating to display management, etc.) from the second metadata buffer if the second data structure (“di_dm_data_payload()” or “di_dm_data_payload2()”) is not present, for example as indicated by one or more corresponding flags for DM metadata compression.
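The buffer maintenance of paragraphs [0263]-[0266] might look as follows; the two-entry `buffers` dictionary and the presence test via dictionary keys are illustrative assumptions rather than the normative parser behavior:

```python
def parse_metadata_unit(rpu, buffers):
    """Maintain the two single-entry metadata buffers described above.

    buffers -- {"composer": ..., "dm": ...}; both entries start as None.
    Returns the (composer, dm) parameters to use for the current image.
    """
    if rpu.get("use_prev_di_rpu_flag") == 0 and rpu.get("di_rpu_id") == 0:
        buffers["composer"] = rpu["di_rpu_data_payload"]  # cache full payload
    if "di_dm_data_payload" in rpu:          # DM payload explicitly carried
        buffers["dm"] = rpu["di_dm_data_payload"]
    # When a payload is absent, the previously buffered one is restored/reused.
    return buffers["composer"], buffers["dm"]
```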
Example Process Flows
[0267] FIG. 4A illustrates an example process flow according to an embodiment of the present invention. In some embodiments, one or more computing devices or components (e.g., an encoding device/module, a transcoding device/module, a decoding device/module, an inverse tone mapping device/module, a tone mapping device/module, a media device/module, a reverse mapping generation and application system, etc.) may perform this process flow. In block 402, an image processing system encodes a primary image of a first image format into an image file designated for the first image format.
[0268] In block 404, the image processing system encodes a non-primary image of a second image format into one or more attendant segments of the image file. The second image format is different from the first image format.
[0269] In block 406, the image processing system causes a display image derived from a reconstructed image to be rendered with a recipient device of the image file. The reconstructed image is generated from one of the primary image or the non-primary image.
[0270] In an embodiment, the primary image represents a JPEG image; wherein the image file represents a JPEG image file; wherein the non-primary image represents a non-JPEG image.
[0271] In an embodiment, both the primary image and the non-primary image are derived from a same source image.
[0272] In an embodiment, the non-primary image represents one of one or more non-primary images of the second image format that are encoded in the image file with the primary image.
[0273] In an embodiment, the one or more segments are encoded as application 11 (APP11) marker segments in the image file.
[0274] In an embodiment, one or more image metadata portions are encoded in one or more second segments of the image file.
[0275] In an embodiment, the one or more image metadata portions includes a specific image metadata portion that carries specific operational parameters for specific image processing operations to be performed by the recipient device on one of: the primary image or the non-primary image.
[0276] In an embodiment, the specific image processing operations include one or more of: image forward reshaping, image backward reshaping, image inverse mapping, image mapping, color space conversion, codeword linear mapping, codeword non-linear mapping, display management operations, perceptual quantization based mapping, mapping based on one or more transfer functions, other image processing operations performed by the recipient device, etc.
[0277] In an embodiment, the specific image metadata portion is generated by concatenating one or more boxes carried in one or more application 11 (APP11) marker segments included in the one or more second segments.
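By way of illustration only, the following sketch shows how an opaque box/payload byte string can be split across APP11 marker segments and reassembled by concatenation. The marker value 0xFFEB and the 2-byte segment length (which counts itself, capping each payload at 65533 bytes) follow the general JPEG marker-segment convention; real segments would additionally carry a common identifier and box header in each chunk so a reader can associate them, which this sketch omits for brevity:

```python
APP11 = b"\xff\xeb"       # JPEG APP11 marker
MAX_PAYLOAD = 65533       # 65535 minus the 2-byte length field itself

def split_into_app11_segments(box_bytes):
    """Split an opaque box/payload byte string across APP11 marker segments."""
    segments = []
    for off in range(0, len(box_bytes), MAX_PAYLOAD):
        chunk = box_bytes[off:off + MAX_PAYLOAD]
        segments.append(APP11 + (len(chunk) + 2).to_bytes(2, "big") + chunk)
    return segments

def concatenate_app11_payloads(segments):
    """Recover the original byte string by concatenating segment payloads."""
    return b"".join(seg[4:] for seg in segments)  # skip marker + length bytes
```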
[0278] In an embodiment, the non-primary image of the second image format represents one of: an HEVC image, an AV1 image, or another non-JPEG image.
[0279] FIG. 4B illustrates an example process flow according to an embodiment of the present invention. In some embodiments, one or more computing devices or components (e.g., an encoding device/module, a transcoding device/module, a decoding device/module, an inverse tone mapping device/module, a tone mapping device/module, a media device/module, a prediction model and feature selection system, a reverse mapping generation and application system, etc.) may perform this process flow. In block 452, an image decoding system receives an image file designated for a first image format. The image file is encoded with a primary image of the first image format.
[0280] In block 454, the image decoding system decodes a non-primary image of a second image format from one or more attendant segments of the image file. The second image format is different from the first image format.
[0281] In block 456, the image decoding system causes a display image derived from a reconstructed image to be rendered on an image display. The reconstructed image is generated from one of the primary image or the non-primary image using image metadata carried in the image file.
[0282] In an embodiment, the image decoding system further performs: receiving a second image file designated for the first image format, wherein the second image file is encoded with a second primary image of the first image format, the second image file being not encoded with another image other than the second primary image; decoding the second primary image of the first image format from one or more second attendant segments of the second image file; causing
a second display image derived from the second primary image to be reconstructed and rendered on the image display.
[0283] In an embodiment, the display image is generated by one or more image processing operations; one or more operational parameters for the one or more image processing operations are decoded from an image metadata portion carried by the image file.
[0284] In an embodiment, a computing device such as a display device, a mobile device, a set-top box, a multimedia device, etc., is configured to perform any of the foregoing methods. In an embodiment, an apparatus comprises a processor and is configured to perform any of the foregoing methods. In an embodiment, a non-transitory computer readable storage medium stores software instructions, which when executed by one or more processors cause performance of any of the foregoing methods.
[0285] In an embodiment, a computing device comprising one or more processors and one or more storage media storing a set of instructions which, when executed by the one or more processors, cause performance of any of the foregoing methods.
[0286] Note that, although separate embodiments are discussed herein, any combination of embodiments and/or partial embodiments discussed herein may be combined to form further embodiments.
Example Computer System Implementation
[0287] Embodiments of the present invention may be implemented with a computer system, systems configured in electronic circuitry and components, an integrated circuit (IC) device such as a microcontroller, a field programmable gate array (FPGA), or another configurable or programmable logic device (PLD), a discrete time or digital signal processor (DSP), an application specific IC (ASIC), and/or apparatus that includes one or more of such systems, devices or components. The computer and/or IC may perform, control, or execute instructions relating to the adaptive perceptual quantization of images with enhanced dynamic range, such as those described herein. The computer and/or IC may compute any of a variety of parameters or values that relate to the adaptive perceptual quantization processes described herein. The image and video embodiments may be implemented in hardware, software, firmware and various combinations thereof.
[0288] Certain implementations of the invention comprise computer processors which execute software instructions which cause the processors to perform a method of the disclosure. For example, one or more processors in a display, an encoder, a set-top box, a transcoder or the like
may implement methods related to adaptive perceptual quantization of HDR images as described above by executing software instructions in a program memory accessible to the processors. Embodiments of the invention may also be provided in the form of a program product. The program product may comprise any non-transitory medium which carries a set of computer-readable signals comprising instructions which, when executed by a data processor, cause the data processor to execute a method of an embodiment of the invention. Program products according to embodiments of the invention may be in any of a wide variety of forms. The program product may comprise, for example, physical media such as magnetic data storage media including floppy diskettes, hard disk drives, optical data storage media including CD ROMs, DVDs, electronic data storage media including ROMs, flash RAM, or the like. The computer-readable signals on the program product may optionally be compressed or encrypted.
[0289] Where a component (e.g., a software module, processor, assembly, device, circuit, etc.) is referred to above, unless otherwise indicated, reference to that component (including a reference to a "means") should be interpreted as including as equivalents of that component any component which performs the function of the described component (e.g., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure which performs the function in the illustrated example embodiments of the invention.
[0290] According to one embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.
[0291] For example, FIG. 5 is a block diagram that illustrates a computer system 500 upon which an embodiment of the invention may be implemented. Computer system 500 includes a bus 502 or other communication mechanism for communicating information, and a hardware processor 504 coupled with bus 502 for processing information. Hardware processor 504 may be, for example, a general purpose microprocessor.
[0292] Computer system 500 also includes a main memory 506, such as a random access
memory (RAM) or other dynamic storage device, coupled to bus 502 for storing information and instructions to be executed by processor 504. Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504. Such instructions, when stored in non-transitory storage media accessible to processor 504, render computer system 500 into a special-purpose machine that is customized to perform the operations specified in the instructions.
[0293] Computer system 500 further includes a read only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504. A storage device 510, such as a magnetic disk or optical disk, is provided and coupled to bus 502 for storing information and instructions.
[0294] Computer system 500 may be coupled via bus 502 to a display 512, such as a liquid crystal display, for displaying information to a computer user. An input device 514, including alphanumeric and other keys, is coupled to bus 502 for communicating information and command selections to processor 504. Another type of user input device is cursor control 516, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 504 and for controlling cursor movement on display 512. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
[0295] Computer system 500 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 500 to be a special-purpose machine. According to one embodiment, the techniques as described herein are performed by computer system 500 in response to processor 504 executing one or more sequences of one or more instructions contained in main memory 506. Such instructions may be read into main memory 506 from another storage medium, such as storage device 510. Execution of the sequences of instructions contained in main memory 506 causes processor 504 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
[0296] The term “storage media” as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 510. Volatile media includes dynamic memory, such as main memory 506. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data
storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.
[0297] Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 502. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
[0298] Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 504 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 500 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 502. Bus 502 carries the data to main memory 506, from which processor 504 retrieves and executes the instructions. The instructions received by main memory 506 may optionally be stored on storage device 510 either before or after execution by processor 504.
[0299] Computer system 500 also includes a communication interface 518 coupled to bus 502. Communication interface 518 provides a two-way data communication coupling to a network link 520 that is connected to a local network 522. For example, communication interface 518 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 518 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 518 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
[0300] Network link 520 typically provides data communication through one or more networks to other data devices. For example, network link 520 may provide a connection through local network 522 to a host computer 524 or to data equipment operated by an Internet Service Provider (ISP) 526. ISP 526 in turn provides data communication services through the worldwide packet data communication network now commonly referred to as the “Internet” 528. Local network 522 and Internet 528 both use electrical, electromagnetic or optical signals that carry
digital data streams. The signals through the various networks and the signals on network link 520 and through communication interface 518, which carry the digital data to and from computer system 500, are example forms of transmission media.
[0301] Computer system 500 can send messages and receive data, including program code, through the network(s), network link 520 and communication interface 518. In the Internet example, a server 530 might transmit a requested code for an application program through Internet 528, ISP 526, local network 522 and communication interface 518.
[0302] The received code may be executed by processor 504 as it is received, and/or stored in storage device 510, or other non-volatile storage for later execution.
Equivalents, Extensions, Alternatives and Miscellaneous
[0303] In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is claimed as embodiments of the invention, and what is intended by the applicants to be claimed as embodiments of the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Enumerated Exemplary Embodiments
[0304] The invention may be embodied in any of the forms described herein, including, but not limited to the following Enumerated Example Embodiments (EEEs) which describe structure, features, and functionality of some portions of embodiments of the present invention.
[0305] EEE1. A method comprising: encoding a primary image of a first image format into an image file designated for the first image format; encoding a non-primary image of a second image format into one or more attendant segments of the image file, wherein the second image format is different from the first image format;
causing a display image derived from a reconstructed image to be rendered with a recipient device of the image file, wherein the reconstructed image is generated from one of the primary image or the non-primary image.
[0306] EEE2. The method of EEE1, wherein the non-primary image is represented by non-primary image data divided into one or more payload data portions respectively included in the one or more attendant segments of the image file; wherein each attendant segment in the one or more attendant segments of the image file includes a respective box designated to contain a corresponding payload data portion in the one or more payload data portions.
[0307] EEE3. The method of EEE2, wherein each attendant segment in the one or more attendant segments of the image file includes a first data length field and a second data length field; wherein a total number of bytes of all the one or more payload data portions is indicated in the first length field in response to determining that the first length field is not set to a specific reserved value among a plurality of reserved values; wherein the total number of bytes of all the one or more payload data portions is indicated in the second length field in response to determining that the first length field is set to the specific reserved value.
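Purely as an illustration of the two-length-field rule in EEE3 (which mirrors the familiar box-format pattern of escaping to an extended size field), a reader might resolve the total payload size as sketched below; the reserved value 1 is a hypothetical placeholder, since EEE3 does not fix the reserved values:

```python
RESERVED_VALUES = {1}  # hypothetical; EEE3 leaves the reserved values open

def total_payload_bytes(first_len, second_len):
    """Return the total payload size per the two-length-field rule of EEE3."""
    return second_len if first_len in RESERVED_VALUES else first_len
```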
[0308] EEE4. The method of EEE2, wherein each attendant segment in the one or more attendant segments of the image file includes a data type field of a specific data type value indicating the second image format.
[0309] EEE5. The method of any of EEE1-EEE4, wherein the primary image represents a JPEG image; wherein the image file represents a JPEG image file; wherein the non-primary image represents a non-JPEG image.
[0310] EEE6. The method of any of EEE1-EEE5, wherein both the primary image and the non-primary image are derived from a same source image.
[0311] EEE7. The method of any of EEE1-EEE6, wherein the non-primary image represents one of one or more non-primary images of the second image format that are encoded in the image file with the primary image.
[0312] EEE8. The method of any of EEE1-EEE7, wherein the one or more segments are encoded as application 11 (APP11) marker segments in the image file.
[0313] EEE9. The method of any of EEE1-EEE8, wherein one or more image metadata portions are encoded in one or more second segments of the image file.
[0314] EEE10. The method of any of EEE1-EEE9, wherein the one or more image metadata portions includes an image metadata portion that includes explicitly specified first operational parameter values for generating a first image from the image file; wherein the image file is free of explicitly specified second operational parameter values for generating a second
different image from the image file; wherein the image file includes one or more flags in place of the explicitly specified second operational parameter values to indicate re-using the explicitly specified first operational parameter values for generating the second different image from the image file.
[0315] EEE11. The method of any of EEE1-EEE10, wherein the explicitly specified first operational parameter values relate to one or more of: image prediction, image mapping, image reshaping, display management, or other image processing operations.
[0316] EEE12. The method of EEE9, wherein the one or more image metadata portions includes a specific image metadata portion that carries specific operational parameters for specific image processing operations to be performed by the recipient device on one of: the primary image or the non-primary image.
[0317] EEE13. The method of EEE12, wherein the specific image processing operations include one or more of: image forward reshaping, image backward reshaping, image inverse mapping, image mapping, color space conversion, codeword linear mapping, codeword non-linear mapping, display management operations, perceptual quantization based mapping, mapping based on one or more transfer functions, or other image processing operations performed by the recipient device.
[0318] EEE14. The method of EEE12, wherein the specific image metadata portion is generated by concatenating one or more boxes carried in one or more application 11 (APP11) marker segments included in the one or more second segments.
[0319] EEE15. The method of any of EEE1-EEE14, wherein the non-primary image of the second image format represents one of: an HEVC image, an AV1 image, or another non-JPEG image.
[0320] EEE16. A method comprising: receiving an image file designated for a first image format, wherein the image file is encoded with a primary image of the first image format; decoding a non-primary image of a second image format from one or more attendant segments of the image file, wherein the second image format is different from the first image format, wherein both the primary image and the non-primary image are derived from a same source image; causing a display image derived from a reconstructed image to be rendered on an image display, wherein the reconstructed image is generated from one of the primary image or the non- primary image using image metadata carried in the image file.
[0321] EEE17. The method of EEE16, further comprising:
receiving a second image file designated for the first image format, wherein the second image file is encoded with a second primary image of the first image format, wherein the second image file is not encoded with another image other than the second primary image; decoding the second primary image of the first image format from one or more second attendant segments of the second image file; causing a second display image derived from the second primary image to be reconstructed and rendered on the image display.
[0322] EEE18. The method of EEE16 or EEE17, wherein the display image is generated by one or more image processing operations; wherein one or more operational parameters for the one or more image processing operations are decoded from an image metadata portion carried by the image file.
[0323] EEE19. The method of any of EEE16-18, further comprising: maintaining at least one image metadata buffer to store at least one image metadata portion, relating to a first image, received with the image file; using first operational parameter values specified in the at least one image metadata portion in the at least one image metadata buffer to apply one or more image processing operations relating to a second different image.
[0324] EEE20. A method comprising: capturing a raw image with a capture device; processing the raw image with an image signal processor (ISP) of the capture device into a post-ISP image; converting the post-ISP image with two codecs of different image formats in the capture device into two images of different image formats; packaging the two images of different image formats with a photo processing subsystem of the capture device into a single image file; causing a display image to be generated by a recipient device of the single image file from one of the two images of the different image formats and rendered on an image display operating with the recipient device.
[0325] EEE21. A method comprising: receiving, by a photo processing device, an input image file containing a first image of a first image format; invoking a first codec of the first image format in the photo processing device to decode the first image of the first image format into a decoded image; converting the decoded image with a second codec of a second different image format in the photo processing device into a second image of the second image format;
packaging the first image of the first image format and the second image of the second image format with a photo processing subsystem of the photo processing device into a single image file; causing a display image to be generated by a recipient device of the single image file from one of the two images of the different image formats and rendered on an image display operating with the recipient device.
[0326] EEE22. A method comprising: receiving, by a photo recipient device, an image file containing two or more images of different image formats; invoking a codec of one of the different image formats in the photo recipient device to decode one of the two or more images of the different image formats into a decoded image; generating, by the photo recipient device based at least in part on an image display configuration of the photo recipient device, a display image from the decoded image; causing, by the photo recipient device, the display image to be rendered on an image display of the photo recipient device with the image display configuration.
[0327] EEE23. A non-transitory computer readable storage medium, storing software instructions, which when executed by one or more processors cause performance of the method recited in any of EEE1-EEE22.
[0328] EEE24. An apparatus comprising a processor and configured to perform any one of the methods recited in EEE1-EEE22.
Claims
1. A method comprising: encoding a primary image of a first image format into an image file designated for the first image format; encoding a non-primary image of a second image format into one or more attendant segments of the image file, wherein the second image format is different from the first image format; causing a display image derived from a reconstructed image to be rendered with a recipient device of the image file, wherein the reconstructed image is generated from one of the primary image or the non-primary image.
2. The method of claim 1, wherein the non-primary image is represented by non-primary image data divided into one or more payload data portions respectively included in the one or more attendant segments of the image file; wherein each attendant segment in the one or more attendant segments of the image file includes a respective box designated to contain a corresponding payload data portion in the one or more payload data portions.
3. The method of claim 2, wherein each attendant segment in the one or more attendant segments of the image file includes a first data length field and a second data length field; wherein a total number of bytes of all the one or more payload data portions is indicated in the first length field in response to determining that the first length field is not set to a specific reserved value among a plurality of reserved values; wherein the total number of bytes of all the one or more payload data portions is indicated in the second length field in response to determining that the first length field is set to the specific reserved value.
4. The method of any one of claims 1-3, wherein the primary image represents a JPEG image; wherein the image file represents a JPEG image file; wherein the non-primary image represents a non-JPEG image.
5. The method of any one of claims 1-4, wherein both the primary image and the non- primary image are derived from a same source image.
6. The method of any one of claims 1-5, wherein the one or more segments are encoded as application 11 (APP11) marker segments in the image file.
7. The method of any one of claims 1-6, wherein one or more image metadata portions are encoded in one or more second segments of the image file; wherein the one or more image metadata portions includes an image metadata portion that includes explicitly specified first operational parameter values for generating a first image from the image file; wherein the image file is free of explicitly specified second operational parameter values for generating a second different image from the image file; wherein the image file includes one or more flags in place of the explicitly specified second operational parameter values to indicate re-using the explicitly specified first operational parameter values for generating the second different image from the image file.
8. The method of any one of claims 1-7, wherein the one or more image metadata portions includes a specific image metadata portion that carries specific operational parameters for specific image processing operations to be performed by the recipient device on one of: the primary image or the non-primary image.
9. The method of any one of claims 1-8, wherein the non-primary image of the second image format represents one of: an HEVC image, an AV1 image, or another non-JPEG image.
10. A method comprising: receiving an image file designated for a first image format, wherein the image file is encoded with a primary image of the first image format; decoding a non-primary image of a second image format from one or more attendant segments of the image file, wherein the second image format is different from the first image format, wherein both the primary image and the non-primary image are derived from a same source image; causing a display image derived from a reconstructed image to be rendered on an image display, wherein the reconstructed image is generated from one of the primary image or the non-primary image using image metadata carried in the image file.
11. The method of claim 10, further comprising:
receiving a second image file designated for the first image format, wherein the second image file is encoded with a second primary image of the first image format, wherein the second image file is not encoded with another image other than the second primary image; decoding the second primary image of the first image format from one or more second attendant segments of the second image file; causing a second display image derived from the second primary image to be reconstructed and rendered on the image display.
12. The method of claim 10 or 11, wherein the display image is generated by one or more image processing operations; wherein one or more operational parameters for the one or more image processing operations are decoded from an image metadata portion carried by the image file.
13. The method of any one of claims 10-12, further comprising: maintaining at least one image metadata buffer to store at least one image metadata portion, relating to a first image, received with the image file; using first operational parameter values specified in the at least one image metadata portion in the at least one image metadata buffer to apply one or more image processing operations relating to a second different image.
14. A non-transitory computer readable storage medium, storing software instructions, which when executed by one or more processors cause performance of the method recited in any one of claims 1-13.
15. An apparatus comprising a processor and configured to perform any one of the methods recited in claims 1-13.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202363528608P | 2023-07-24 | 2023-07-24 | |
US63/528,608 | 2023-07-24 | ||
US202363606424P | 2023-12-05 | 2023-12-05 | |
US63/606,424 | 2023-12-05 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2025024152A1 (en) | 2025-01-30 |
Family
ID=92106660
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2024/038021 WO2025024152A1 (en) | Photo coding operations for different image displays | 2023-07-24 | 2024-07-15 |
Country Status (2)
Country | Link |
---|---|
TW (1) | TW202507654A (en) |
WO (1) | WO2025024152A1 (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150016735A1 (en) * | 2013-07-11 | 2015-01-15 | Canon Kabushiki Kaisha | Image encoding apparatus, image decoding apparatus, image processing apparatus, and control method thereof |
US20170164008A1 (en) * | 2012-01-03 | 2017-06-08 | Dolby Laboratories Licensing Corporation | Specifying Visual Dynamic Range Coding Operations and Parameters |
US9781417B2 (en) * | 2011-06-13 | 2017-10-03 | Dolby Laboratories Licensing Corporation | High dynamic range, backwards-compatible, digital cinema |
US10021390B2 (en) | 2011-04-14 | 2018-07-10 | Dolby Laboratories Licensing Corporation | Multiple color channel multiple regression predictor |
US20220164931A1 (en) | 2019-04-23 | 2022-05-26 | Dolby Laboratories Licensing Corporation | Display management for high dynamic range images |
US20220408081A1 (en) | 2019-10-01 | 2022-12-22 | Dolby Laboratories Licensing Corporation | Tensor-product b-spline predictor |
WO2023140952A1 (en) * | 2022-01-20 | 2023-07-27 | Dolby Laboratories Licensing Corporation | Data structure for multimedia applications |
EP4344201A1 (en) * | 2022-09-23 | 2024-03-27 | Dolby Laboratories Licensing Corporation | Cross-asset guided chroma reformatting for multi-asset imaging format |