Add basic support for PDF/UA with Cpdf by lscharmer · Pull Request #3712 · dompdf/dompdf · GitHub


Add basic support for PDF/UA with Cpdf #3712

Open
lscharmer wants to merge 1 commit into dompdf:master from EDUTIEK:pdfua

lscharmer commented Jan 29, 2026

Hi,
this PR adds basic support for generating PDF/UA-compliant PDFs.

Changes made in Cpdf

Add new $pdfua toggle (similar to $pdfa)

Many requirements of PDF/A also apply to PDF/UA (e.g. embedded fonts), but I still need to check whether all of them do (e.g. the color profile).
For now, I simply require the same as for PDF/A to be on the safe side.

Add Lang & MarkInfo

Document language can be set.

If a struct tree is used, the document is also marked as tagged (required for PDF/UA).
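For readers less familiar with the catalog keys involved, here is a minimal sketch of the entries this produces. It assumes a hypothetical catalogEntries() helper and made-up object numbers; it is not the Cpdf API added by this PR. /Lang, /MarkInfo and /StructTreeRoot are the standard PDF catalog keys.

```php
<?php
// Hypothetical helper, NOT the Cpdf API from this PR: builds the catalog
// entries that declare the document language and mark the file as tagged.

/** Escape a PDF literal string: backslash and parentheses must be escaped. */
function pdfLiteral(string $text): string
{
    return '(' . strtr($text, ['\\' => '\\\\', '(' => '\\(', ')' => '\\)']) . ')';
}

/**
 * @param string   $lang          language tag such as "en-US"; "" means not set
 * @param int|null $structTreeObj object number of the StructTreeRoot, or null
 */
function catalogEntries(string $lang, ?int $structTreeObj): string
{
    $entries = [];
    if ($lang !== '') {
        $entries[] = '/Lang ' . pdfLiteral($lang);
    }
    if ($structTreeObj !== null) {
        // A struct tree exists, so the document is flagged as tagged.
        $entries[] = '/MarkInfo << /Marked true >>';
        $entries[] = "/StructTreeRoot $structTreeObj 0 R";
    }
    return implode(' ', $entries);
}

echo catalogEntries('en-US', 12), "\n";
// /Lang (en-US) /MarkInfo << /Marked true >> /StructTreeRoot 12 0 R
```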

Struct tree

Adds new structTreeRoot and structElement types and corresponding methods to build a tree describing the document's structure.
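To illustrate what such a tree looks like once serialized, here is a minimal sketch using a hypothetical StructNode class (not the structTreeRoot/structElement types from this PR). Kids are either child elements or integer MCIDs. Object and page numbers are invented, and a complete tagged PDF also needs a /ParentTree mapping MCIDs back to their elements, which is omitted here.

```php
<?php
// Hypothetical sketch, NOT the Cpdf types from this PR. Kids are either
// child StructNode objects or MCIDs (ints) referring to marked content.

final class StructNode
{
    public $type;        // "StructTreeRoot" for the root, else e.g. "Document", "P"
    public $parent;      // StructNode|null
    public $kids = [];   // array of StructNode|int
    public $objNum = 0;

    public function __construct(string $type, ?StructNode $parent = null)
    {
        $this->type = $type;
        $this->parent = $parent;
        if ($parent !== null) {
            $parent->kids[] = $this;
        }
    }
}

/** Assign object numbers depth-first; returns the next free number. */
function numberNodes(StructNode $node, int $next): int
{
    $node->objNum = $next++;
    foreach ($node->kids as $kid) {
        if ($kid instanceof StructNode) {
            $next = numberNodes($kid, $next);
        }
    }
    return $next;
}

/** Serialize as indirect objects; $pageObj is the page holding the MCIDs. */
function serializeTree(StructNode $node, int $pageObj, array &$out): void
{
    $kids = [];
    foreach ($node->kids as $kid) {
        $kids[] = $kid instanceof StructNode ? "{$kid->objNum} 0 R" : (string) $kid;
    }
    if ($node->parent === null) {
        $dict = '<< /Type /StructTreeRoot /K [' . implode(' ', $kids) . '] >>';
    } else {
        $dict = "<< /Type /StructElem /S /{$node->type} /P {$node->parent->objNum} 0 R"
            . " /Pg $pageObj 0 R /K [" . implode(' ', $kids) . '] >>';
    }
    $out[] = "{$node->objNum} 0 obj $dict endobj";
    foreach ($node->kids as $kid) {
        if ($kid instanceof StructNode) {
            serializeTree($kid, $pageObj, $out);
        }
    }
}

$root = new StructNode('StructTreeRoot');
$doc = new StructNode('Document', $root);
$h1 = new StructNode('H1', $doc);
$h1->kids[] = 0;   // MCID 0 on page object 3
$p = new StructNode('P', $doc);
$p->kids[] = 1;    // MCID 1 on page object 3
numberNodes($root, 10);
$objects = [];
serializeTree($root, 3, $objects);
echo implode("\n", $objects), "\n";
```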

Outline

The existing outline code did not work correctly in my tests,
so I modified it and added methods to create an outline tree.

Each outline item links to the page currently being rendered; maybe the target should be made explicit.
I may add other link targets in the future too.
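The sibling and parent links that make up an outline tree are easy to get wrong, so here is a minimal sketch of them. It assumes a hypothetical OutlineItem class and emitOutline() function (not the outline methods added to Cpdf), treats every item as expanded, and uses invented object numbers. Each item targets a whole page with /Fit, mirroring the current-page link described above.

```php
<?php
// Hypothetical sketch, NOT the Cpdf outline methods from this PR: wiring
// /Parent, /Prev, /Next, /First, /Last and /Count for an outline tree.
// Titles are assumed to contain no parentheses, so no escaping is done.

final class OutlineItem
{
    public $title;
    public $pageObj;        // object number of the target page
    public $objNum;         // object number of this outline item
    public $children = [];  // OutlineItem[]

    public function __construct(string $title, int $pageObj, int $objNum)
    {
        $this->title = $title;
        $this->pageObj = $pageObj;
        $this->objNum = $objNum;
    }
}

/** Number of descendants; every item is treated as open (expanded). */
function countDescendants(OutlineItem $item): int
{
    $n = 0;
    foreach ($item->children as $child) {
        $n += 1 + countDescendants($child);
    }
    return $n;
}

/** /First, /Last and /Count entries for the root and for items with children. */
function childEntries(OutlineItem $item): string
{
    if (!$item->children) {
        return '';
    }
    $last = $item->children[count($item->children) - 1];
    return " /First {$item->children[0]->objNum} 0 R /Last {$last->objNum} 0 R"
        . ' /Count ' . countDescendants($item);
}

/** Emit dictionaries for all descendants of $parent, keyed by object number. */
function emitOutline(OutlineItem $parent, array &$out): void
{
    $kids = $parent->children;
    foreach ($kids as $i => $item) {
        $d = "<< /Title ({$item->title}) /Parent {$parent->objNum} 0 R";
        if ($i > 0) {
            $d .= ' /Prev ' . $kids[$i - 1]->objNum . ' 0 R';
        }
        if ($i < count($kids) - 1) {
            $d .= ' /Next ' . $kids[$i + 1]->objNum . ' 0 R';
        }
        $d .= childEntries($item) . " /Dest [{$item->pageObj} 0 R /Fit] >>";
        $out[$item->objNum] = $d;
        emitOutline($item, $out);
    }
}

$root = new OutlineItem('', 0, 20);   // becomes the /Outlines dictionary
$ch1 = new OutlineItem('Chapter 1', 3, 21);
$ch1->children[] = new OutlineItem('Section 1.1', 3, 22);
$root->children = [$ch1, new OutlineItem('Chapter 2', 4, 23)];

$out = [20 => '<< /Type /Outlines' . childEntries($root) . ' >>'];
emitOutline($root, $out);
ksort($out);
echo implode("\n", $out), "\n";
```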

Marked Content

To simplify marking content (and since all content must be marked anyway for PDF/UA), consumer code only opens marked-content sections; closing them is done automatically.

Marking structure content requires the struct tree, so that the generated MCID is always added to the correct places in the struct tree.
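The "open only, close automatically" approach can be sketched as a small writer that emits EMC for the previous section whenever a new one is opened, and numbers MCIDs per page. The MarkedContentWriter class and its method names are hypothetical, not the Cpdf methods from this PR. BDC, BMC and EMC are the standard PDF marked-content operators, and the drawing operators are placeholders (font setup is omitted).

```php
<?php
// Hypothetical sketch, NOT the Cpdf methods from this PR. Opening a new
// section first closes the previous one; MCIDs restart at 0 on each page.

final class MarkedContentWriter
{
    private $ops = [];       // content-stream operators for the current page
    private $open = false;   // whether a marked-content section is open
    private $nextMcid = 0;

    /** Open a structure section; returns the MCID to record in the struct tree. */
    public function openStructure(string $tag): int
    {
        $this->closeOpen();
        $mcid = $this->nextMcid++;
        $this->ops[] = sprintf('/%s <</MCID %d>> BDC', $tag, $mcid);
        $this->open = true;
        return $mcid;
    }

    /** Open an artifact section (backgrounds, borders, ...); no MCID is used. */
    public function openArtifact(): void
    {
        $this->closeOpen();
        $this->ops[] = '/Artifact BMC';
        $this->open = true;
    }

    public function draw(string $op): void
    {
        $this->ops[] = $op;
    }

    /** Finish the page: close any open section and reset the MCID counter. */
    public function endPage(): string
    {
        $this->closeOpen();
        $stream = implode("\n", $this->ops);
        $this->ops = [];
        $this->nextMcid = 0;
        return $stream;
    }

    private function closeOpen(): void
    {
        if ($this->open) {
            $this->ops[] = 'EMC';
            $this->open = false;
        }
    }
}

$w = new MarkedContentWriter();
$w->openArtifact();
$w->draw('0 0 100 100 re f');     // background, marked as artifact
$mcid = $w->openStructure('P');   // 0; would be added to the P element
$w->draw('BT (Hello) Tj ET');     // placeholder text operators
echo $w->endPage(), "\n";
```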

Changes made in Dompdf

Add pdfua option

New property in the Options class.

Add language

The document language is set from the lang attribute of the html element.
It is left empty for canvas implementations other than CPDF.
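As an illustration, reading that attribute with PHP's DOM extension (which dompdf already requires) could look like the following sketch. htmlLang() is a hypothetical helper, not code from this PR; it returns an empty string when no lang attribute is present.

```php
<?php
// Hypothetical helper, NOT code from this PR: read <html lang="..."> with
// DOMDocument. Returns "" when the attribute is missing.

function htmlLang(string $html): string
{
    $dom = new DOMDocument();
    // libxml warns about HTML5 elements it does not know; collect and drop them.
    $prev = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $root = $dom->documentElement;   // the <html> element
    return $root !== null ? trim($root->getAttribute('lang')) : '';
}

echo htmlLang('<!DOCTYPE html><html lang="de-DE"><body><p>Hallo</p></body></html>'), "\n";
```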

StructTree Interface

Adds a new interface, an implementation of which is returned by the Canvas.
If PDF/UA is enabled, Dompdf\Adapter\CPDF returns a Dompdf\StructTree\CPDFStructTree instance.

Building the struct tree and outline, and marking all structure content, is done in the CPDFStructTree class.

Creating the outline is currently very basic. Headings with further children are not handled properly yet.

All other Canvas implementations return a dummy class.
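The "real implementation or dummy" arrangement is the null-object pattern: renderers call the struct tree unconditionally and never check whether PDF/UA is enabled. The sketch below shows that idea with hypothetical names (StructTreeSketch, NullStructTree, LoggingStructTree, structTreeFor); it is not the interface or method set introduced by this PR, and the logging class only stands in for a Cpdf-backed tree.

```php
<?php
// Hypothetical sketch of the null-object pattern, NOT this PR's interface.

interface StructTreeSketch
{
    /** Called before drawing content for a node; $path lists tags from the root. */
    public function beginContent(array $path): void;

    /** Called before drawing decorative content (backgrounds, borders, ...). */
    public function beginArtifact(): void;
}

/** Dummy used for canvases without PDF/UA support: every call is a no-op. */
final class NullStructTree implements StructTreeSketch
{
    public function beginContent(array $path): void {}
    public function beginArtifact(): void {}
}

/** Stand-in for a Cpdf-backed tree; it only records what it would tag. */
final class LoggingStructTree implements StructTreeSketch
{
    public $log = [];

    public function beginContent(array $path): void { $this->log[] = implode('/', $path); }
    public function beginArtifact(): void { $this->log[] = 'Artifact'; }
}

function structTreeFor(bool $isCpdf, bool $pdfua): StructTreeSketch
{
    return ($isCpdf && $pdfua) ? new LoggingStructTree() : new NullStructTree();
}

// Renderer code is identical whichever implementation it receives.
$tree = structTreeFor(true, true);
$tree->beginArtifact();
$tree->beginContent(['html', 'body', 'p']);
print_r($tree->log);
```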

Renderers

Backgrounds, borders and outlines are marked as artifacts.

Text

When rendering text (or images), the corresponding node's path is used to build the struct tree and outline.
As this happens after the document reflow, the HTML tree, and therefore the struct tree, reflects the reflowed version instead of the original.
However, this made the implementation much simpler: for example, MCIDs on different pages within the same direct parent struct element cannot occur (and therefore don't need to be handled).

The mappings from HTML tags and attributes to the corresponding PDF structure types and attributes are not complete yet.
These will be extended in the future (including attributes to mark whole nodes as artifacts).
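A rough sketch of how a rendered node's ancestor path can be turned into struct elements: an element is created only the first time a node is seen, and later content for the same node reuses it. PathTagger, TAG_MAP and the [nodeId, tagName] path format are hypothetical illustrations, not the PR's CPDFStructTree, and the tag map is a small, incomplete subset. Unmapped tags (html, span, ...) are skipped so their content attaches to the nearest mapped ancestor.

```php
<?php
// Hypothetical sketch, NOT the PR's CPDFStructTree. TAG_MAP is an
// illustrative subset; unmapped tags are skipped.

const TAG_MAP = [
    'body' => 'Document', 'h1' => 'H1', 'h2' => 'H2', 'h3' => 'H3', 'p' => 'P',
    'ul' => 'L', 'ol' => 'L', 'li' => 'LI', 'a' => 'Link',
    'table' => 'Table', 'tr' => 'TR', 'th' => 'TH', 'td' => 'TD',
];

final class PathTagger
{
    /** Elements keyed by node id: ['type' => string, 'parent' => ?string, 'mcids' => int[]] */
    public $elements = [];

    /**
     * @param array $path list of [nodeId, tagName] pairs, root first
     * @return string|null node id of the element that received the MCID
     */
    public function addContent(array $path, int $mcid): ?string
    {
        $parent = null;
        foreach ($path as [$nodeId, $tag]) {
            if (!array_key_exists($tag, TAG_MAP)) {
                continue;   // not mapped: content falls through to the parent
            }
            if (!isset($this->elements[$nodeId])) {
                $this->elements[$nodeId] = ['type' => TAG_MAP[$tag], 'parent' => $parent, 'mcids' => []];
            }
            $parent = $nodeId;
        }
        if ($parent !== null) {
            $this->elements[$parent]['mcids'][] = $mcid;
        }
        return $parent;
    }
}

$t = new PathTagger();
$t->addContent([['n1', 'html'], ['n2', 'body'], ['n3', 'h1']], 0);
$t->addContent([['n1', 'html'], ['n2', 'body'], ['n4', 'p'], ['n5', 'span']], 1);
$t->addContent([['n1', 'html'], ['n2', 'body'], ['n4', 'p']], 2);
foreach ($t->elements as $id => $e) {
    printf("%s: %s parent=%s mcids=[%s]\n", $id, $e['type'], $e['parent'] ?? '-', implode(',', $e['mcids']));
}
```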

Images

I set the PDF/UA identification in the XMP metadata to PDF/UA-2 instead of PDF/UA-1 because images are tagged as Span instead of Figure.

According to https://pdfa.org/tagging-images-in-pdfua-1-and-pdfua-2/, PDF/UA-1 contains some ambiguities about how images should be handled, which PDF/UA-2 seems to resolve.

The PAC checker doesn't care whether the document is declared as PDF/UA-1 or PDF/UA-2.
When images are tagged as Figure with an alt attribute, the PAC checker also reports some errors that don't make sense from an accessibility standpoint.

So either the way images are tagged, or declaring the document as PDF/UA-2 instead of PDF/UA-1, might be incorrect.
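For reference, the PDF/UA part is declared in the XMP metadata through the pdfuaid namespace. The sketch below builds that fragment with a hypothetical pdfuaXmp() helper (not the PR's metadata code). It only emits pdfuaid:part; PDF/UA-2 additionally defines a revision entry (pdfuaid:rev), so check ISO 14289-2 before relying on this for part 2.

```php
<?php
// Hypothetical helper, NOT this PR's metadata code: the XMP description that
// declares PDF/UA conformance. $part is 1 (PDF/UA-1) or 2 (PDF/UA-2).
// Any additional entries PDF/UA-2 requires (e.g. pdfuaid:rev) are omitted.

function pdfuaXmp(int $part): string
{
    if ($part !== 1 && $part !== 2) {
        throw new InvalidArgumentException("Unknown PDF/UA part: $part");
    }
    return '<rdf:Description rdf:about=""'
        . ' xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/">'
        . "<pdfuaid:part>$part</pdfuaid:part>"
        . '</rdf:Description>';
}

echo pdfuaXmp(2), "\n";
```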

Other

I'm currently using several PHP 7.4 features.
Should this be made PHP 7.1 compatible, or is PHP 7.4 also OK?

This should now be PHP 7.1 compatible.

Currently this is one big commit;
if you want, I can split the Cpdf and Dompdf parts into multiple commits or PRs.

PDF/A-3a

I originally aimed for PDF/A-3a support, but the PAC checker only validates PDF/UA documents and I couldn't find a better tool to validate the generated PDFs.
I suspect that most PDF/UA requirements also apply to PDF/A-3a, so maybe it's enough to just change the metadata to PDF/A-3a ^^

bsweeney added this to the 4.0.0 milestone Jan 29, 2026
lscharmer (Author)

> I'm currently using several PHP 7.4 features.
> Should this be made PHP 7.1 compatible, or is PHP 7.4 also OK?

OK, I saw the build jobs; I will make this PHP 7.1 compatible.

bsweeney (Member)

Apologies for missing the PHP version question. I'm considering bumping the supported PHP version, though I haven't planned that out yet. It looks like there are some property declarations that need to be modified. When I have a moment, I can tidy up the compatibility issues.

As for the PR structure, it's fine. Personally, I try to keep commits targeted so it's easier to isolate individual changes, but I'm happy to accept changes in whatever form works best for you.

When I review, I'll compare the implementation to the spec to see if there are any improvements to be made. I'll also render various documents to see how well they validate.

lscharmer (Author)

I missed two places; it should now be PHP 7.1 compatible.

lscharmer (Author)

Fixed another PHP 7.2 issue by changing

EOT);

to

EOT
);
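Background on why this heredoc change matters: PHP 7.3 introduced the flexible heredoc syntax, which allows code such as ");" after the closing marker on the same line. On PHP 7.1 and 7.2 the closing marker must stand alone on its line (optionally followed by ";"), so the rest of the expression continues on the next line. A small self-contained example of the portable form (the sprintf call is only an illustration):

```php
<?php
// Portable heredoc usage: the closing marker stands alone on its line, and
// the remaining arguments and ")" follow on the next line.
// Parses on PHP 7.1/7.2 as well as on 7.3+.

$xmp = sprintf(<<<EOT
<pdfuaid:part>%d</pdfuaid:part>
EOT
, 2);

echo $xmp, "\n"; // <pdfuaid:part>2</pdfuaid:part>
```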
