Add basic support for PDF/UA with Cpdf #3712
^^ OK, I saw the build jobs, I will make this PHP 7.1 compatible.
Apologies for missing the PHP version question. I'm thinking of maybe bumping the supported PHP version, though I have not yet planned that out. It looks like there are some property declarations that need to be modified. When I have a moment, I can tidy up the compatibility issues. As for the PR structure, it's fine. Personally, I try to target individual commits so it's easier to isolate individual changes, but I'm happy to accept changes in whatever form works best for you. When I review, I'll compare the implementation to the spec to see if there are any improvements to be made. I'll also render various documents to see how well they validate.
I missed two places; it should now be PHP 7.1 compatible.
Fixed another PHP 7.2 issue: changed `EOT);` to `EOT` with `);` on the following line.
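For context, a minimal sketch of the heredoc layout in question. Before PHP 7.3, the closing identifier of a heredoc has to start its own line and may only be followed by a semicolon, so `EOT);` is a parse error on PHP 7.1/7.2. The function name and string content below are made up for illustration and are not taken from the PR.

```php
<?php
// Hypothetical helper, used only to show a heredoc layout that also parses on PHP 7.1/7.2.
function describeStandard(string $name): string
{
    // The closing EOT sits alone at the start of its line;
    // the call's closing parenthesis moves to the next line.
    return trim(<<<EOT
Document conforms to $name
EOT
    );
}

echo describeStandard('PDF/UA'), "\n";
```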
Hi,
this PR adds basic functionality to generate PDF/UA-compatible PDFs.
Changes made in Cpdf
Add new `$pdfua` toggle (similar to `$pdfa`)
Many things that are required for PDF/A are also required for PDF/UA (e.g. embedded fonts), but I still need to check whether all of them are required (e.g. the color profile).
For now I have simply required the same as for PDF/A to be on the safe side.
Add Lang & MarkInfo
Document language can be set.
If a struct tree is used, the document is also marked as `tagged` (required for PDF/UA).
Struct tree
Adds new `structTreeRoot` and `structElement` types and corresponding methods to be able to build a tree describing the document's structure.
Outline
The existing outline code did not work correctly in my tests,
so I modified it and added methods to be able to create an outline tree.
The outline item will link to the currently rendered page; maybe the target should be made explicit.
I may add other link targets in the future too.
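As a rough illustration of the catalog-level pieces mentioned in the sections above (Lang, MarkInfo, struct tree root, outlines), here is a small sketch that assembles the corresponding PDF dictionary entries. The function name and the object ids are hypothetical and this is not how Cpdf builds its catalog; only the PDF keys (`/Lang`, `/MarkInfo`, `/StructTreeRoot`, `/Outlines`) are standard. It assumes the language string contains no parentheses or backslashes, and it leaves out other required entries such as `/Pages`.

```php
<?php
// Hypothetical sketch: builds a fragment of a PDF catalog dictionary.
// Not Cpdf's actual code; object ids are passed in by the caller.
function catalogEntriesSketch(string $lang, ?int $structTreeRootId, ?int $outlinesId): string
{
    $entries = ['/Type /Catalog'];

    if ($lang !== '') {
        // Document language, e.g. "en-US"
        $entries[] = "/Lang ($lang)";
    }

    if ($structTreeRootId !== null) {
        // A document with a structure tree is flagged as tagged
        $entries[] = '/MarkInfo << /Marked true >>';
        $entries[] = "/StructTreeRoot $structTreeRootId 0 R";
    }

    if ($outlinesId !== null) {
        $entries[] = "/Outlines $outlinesId 0 R";
    }

    return '<< ' . implode(' ', $entries) . ' >>';
}

echo catalogEntriesSketch('en-US', 12, 14), "\n";
```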
Marked Content
To simplify marking content (and as all content must be marked anyway for PDF/UA), consumer code only opens marked contents and closing is done automatically.
Marking structure content requires the struct tree, so that the generated MCID is always added to the correct places in the struct tree.
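To make the "open only, close automatically" idea more concrete, here is a minimal sketch. The class and method names are hypothetical and do not match the Cpdf methods added in this PR; it handles a single page only and records the MCID-to-tag relation in a plain array instead of a real struct tree. The `BDC`/`BMC`/`EMC` operators are the standard PDF marked-content operators.

```php
<?php
// Hypothetical sketch of auto-closing marked content; not the Cpdf API.
final class MarkedContentSketch
{
    private $buffer = '';
    private $open = false;
    private $nextMcid = 0;
    /** @var array<int, string> MCID => structure tag, stand-in for the struct tree */
    private $mcids = [];

    /** Opens structure content; any sequence still open is closed first. */
    public function openStructure(string $tag): int
    {
        $this->closeIfOpen();
        $mcid = $this->nextMcid++;
        $this->mcids[$mcid] = $tag;
        $this->buffer .= "/$tag <</MCID $mcid>> BDC\n";
        $this->open = true;
        return $mcid;
    }

    /** Opens an artifact (decoration that is not part of the structure). */
    public function openArtifact(): void
    {
        $this->closeIfOpen();
        $this->buffer .= "/Artifact BMC\n";
        $this->open = true;
    }

    /** Appends raw content stream operators. */
    public function write(string $ops): void
    {
        $this->buffer .= $ops . "\n";
    }

    /** Closes whatever is still open and returns the page content. */
    public function finishPage(): string
    {
        $this->closeIfOpen();
        return $this->buffer;
    }

    /** @return array<int, string> */
    public function mcids(): array
    {
        return $this->mcids;
    }

    private function closeIfOpen(): void
    {
        if ($this->open) {
            $this->buffer .= "EMC\n";
            $this->open = false;
        }
    }
}

$page = new MarkedContentSketch();
$page->openArtifact();
$page->write('0 0 10 10 re f');
$page->openStructure('P');
$page->write('(Hi) Tj');
echo $page->finishPage();
```

The caller never emits `EMC` itself, so no content can end up outside a marked-content sequence by accident.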
Changes made in Dompdf
Add pdfua option
New property in the `Option` class.
Add language
Document language is set based on the `html` `lang` attribute.
This is left empty for canvas implementations other than CPDF.
StructTreeInterface
Adds a new interface which the `Canvas` returns.
If PDF/UA is enabled, `Dompdf\Adapter\CPDF` returns a `Dompdf\StructTree\CPDFStructTree` instance.
Building the struct tree, outline & marking all structure content is done in the `CPDFStructTree` class.
Creating the outline is currently very basic. Headlines with further children are not properly handled.
All other `Canvas` implementations return a dummy class.
Renderers
Backgrounds, borders and outlines are marked as artifacts.
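A rough sketch of how a renderer could talk to the struct tree through an interface, with a no-op implementation standing in for canvases without PDF/UA support. All names here are hypothetical and only mirror the roles described above; the real `StructTreeInterface` in this PR may look quite different.

```php
<?php
// Hypothetical interface; the method names are assumptions for illustration.
interface StructTreeSketch
{
    public function beginArtifact(): void;
    public function beginStructure(string $tag): void;
}

// Dummy implementation: lets renderers call the same methods on every canvas.
final class NullStructTree implements StructTreeSketch
{
    public function beginArtifact(): void
    {
    }

    public function beginStructure(string $tag): void
    {
    }
}

// Stand-in for a real implementation; it only records what was marked.
final class RecordingStructTree implements StructTreeSketch
{
    public $log = [];

    public function beginArtifact(): void
    {
        $this->log[] = 'Artifact';
    }

    public function beginStructure(string $tag): void
    {
        $this->log[] = $tag;
    }
}

function renderBoxSketch(StructTreeSketch $tree, string $tag): void
{
    $tree->beginArtifact();       // background, borders and outline come first
    // ... draw decoration here ...
    $tree->beginStructure($tag);  // then the actual content (text or image)
    // ... draw content here ...
}

$tree = new RecordingStructTree();
renderBoxSketch($tree, 'P');
print_r($tree->log);
```

With the dummy class the renderers need no `if` around every call, which is presumably why the other `Canvas` implementations return one.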
Text
When rendering text (or images) the corresponding node's path is used to build the struct tree and outline.
As this happens after the document reflow, the HTML tree, and therefore the struct tree, contains the modified version instead of the original.
But this has made the implementation much simpler: for example, MCIDs on different pages within the same direct parent struct element are not possible (and therefore don't need to be implemented).
The mappings from HTML tags and attributes to suitable PDF tags and attributes are not yet complete.
These will be extended in the future (including attributes to mark whole nodes as artifacts).
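To illustrate the idea of turning a node's path into struct tree tags, here is a small sketch. The mapping table is an assumption for illustration and not the one used in this PR (images are left out on purpose, see the next section); the PDF structure types on the right-hand side are standard ones. Falling back to `Div` for unmapped tags is likewise just a choice made for this example.

```php
<?php
// Hypothetical HTML-to-PDF tag mapping; incomplete on purpose.
const HTML_TO_PDF_TAG_SKETCH = [
    'html'  => 'Document',
    'h1'    => 'H1',
    'h2'    => 'H2',
    'h3'    => 'H3',
    'p'     => 'P',
    'ul'    => 'L',
    'ol'    => 'L',
    'li'    => 'LI',
    'table' => 'Table',
    'tr'    => 'TR',
    'th'    => 'TH',
    'td'    => 'TD',
    'a'     => 'Link',
    'span'  => 'Span',
];

/**
 * Maps a node path (root first, as lowercase HTML tag names) to PDF structure tags.
 * Unmapped tags fall back to "Div" (an assumption of this sketch).
 *
 * @param string[] $htmlPath
 * @return string[]
 */
function pdfTagsForPathSketch(array $htmlPath): array
{
    $tags = [];
    foreach ($htmlPath as $htmlTag) {
        $tags[] = HTML_TO_PDF_TAG_SKETCH[$htmlTag] ?? 'Div';
    }
    return $tags;
}

print_r(pdfTagsForPathSketch(['html', 'body', 'ul', 'li']));
```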
Images
I set the PDF/UA XML tag to PDF/UA-2 instead of PDF/UA-1 because images are tagged as `Span` instead of `Figure`.
According to https://pdfa.org/tagging-images-in-pdfua-1-and-pdfua-2/, PDF/UA-1 contains some ambiguities about how images should be handled; PDF/UA-2 seems to resolve this issue.
The PAC checker doesn't care if the document is set to PDF/UA-1 or PDF/UA-2...
When set to `Figure` and with an `alt` attribute, the PAC checker also returns some errors which don't make sense for accessibility, so...
So the way images are tagged might be incorrect, or it might be incorrect that the document is tagged as PDF/UA-2 instead of PDF/UA-1.
Other
I'm currently using several PHP 7.4 features. Should this be made PHP 7.1 compatible or is PHP 7.4 also OK?
Update: this should now be PHP 7.1 compatible.
Currently this is one big commit;
if you want, I can separate the Cpdf and Dompdf parts into multiple commits or PRs.
PDF/A-3a
I originally aimed for PDF/A-3a support, but the PAC checker only accepts PDF/UA documents and I couldn't find any better source to validate generated PDFs.
I suspect that most of PDF/UA will be required for PDF/A-3a, so maybe it's enough to just change the metadata info to PDF/A-3a ^^