Accessible Images in PDF

kyledgreen

We need to make our PDFs Section 508 compliant. We are currently subscribed to jsreport and are using the chrome-pdf recipe with handlebars.

The first thing I tried was giving images descriptive text for a screen reader to pick up, but it does not appear to make it from the HTML into the PDF.

Here is a simple example:
https://playground.jsreport.net/w/anon/kUoMvfkO

Using a Mac, I enable VoiceOver in the system settings to test whether the PDF is accessible, and it reads out the img as "unlabeled image, image. You are currently on an image."

Am I doing something wrong? Can images be made accessible in the PDF? Is there a workaround?

jan_blaha

I see that this should already be supported in Chromium:
https://blog.chromium.org/2020/07/using-chrome-to-generate-more.html

However, the Chromium that jsreport currently ships with doesn't reflect this for some reason and doesn't produce tagged PDFs. I've tried it with the latest Chromium 101 and it works well there. So we will just need to wait some weeks until the lib we use to manage the Chrome instance and communicate with it updates to the latest Chromium.

kyledgreen

Thank you for getting back to me so quickly Jan Blaha,

Reading that article from 2020, I would hope it's fully supported almost 2 years later. I noticed it states:
"This feature also works with Chrome Headless when you use both the --print-to-pdf and --export-tagged-pdf flags."
and I've tried adding them to my config:

 "extensions": {
  "chrome-pdf": {
    "launchOptions": {
      "args": ["--print-to-pdf", "--export-tagged-pdf"]
    }
  }
}

but this causes an error in chromium (I tried with --no-sandbox as well).

Report "directoryTemplate" render failed.

Failed to launch the browser process!
[0307/181932.010233:ERROR:headless_shell.cc(199)] Print to PDF is disabled when remote debugging is enabled.


TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md

Error: Failed to launch the browser process!
[0307/181932.010233:ERROR:headless_shell.cc(199)] Print to PDF is disabled when remote debugging is enabled.


TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md

    at onClose (/app/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:229:20)
    at Interface.<anonymous> (/app/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:219:68)
    at Interface.emit (node:events:532:35)
    at Interface.emit (node:domain:537:15)
    at Interface.close (node:readline:586:8)
    at Socket.onend (node:readline:277:10)
    at Socket.emit (node:events:532:35)
    at Socket.emit (node:domain:537:15)
    at endReadableNT (node:internal/streams/readable:1346:12)
    at processTicksAndRejections (node:internal/process/task_queues:83:21)

The history of that flag is a bit confusing. Per this PR, it was merged into puppeteer a while ago but subsequently removed, as it was to be supported by default, and this bug says it's always enabled by default as of 3 days ago. This Chromium bug, fixed last November, also says headless will always include tags.

Do you know of any workarounds to get the accessibility tags into the PDF?

Otherwise, do I just need to watch this and hope it reaches 101 soon? Once it does, will jsreport be using it right away, or do I need to request an update?

Thank you for your help with this issue!

jan_blaha

You can try explicitly pointing jsreport to a Chromium 101 executable:

 "extensions": {
  "chrome-pdf": {
    "launchOptions": {
      "executablePath": "/path/to/101/chromium",
      "args": ["--export-tagged-pdf"]
    }
  }
}
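
For anyone who generates this configuration in code rather than by hand, here is a minimal sketch in JavaScript. The helper name `buildChromePdfConfig` and the placeholder path are hypothetical; the only facts taken from this thread are the `executablePath` and `args` launch options and that passing `--print-to-pdf` made the puppeteer-launched browser fail with "Print to PDF is disabled when remote debugging is enabled", so the helper drops that flag.

```javascript
// Hypothetical helper: builds the "extensions" fragment of jsreport.config.json.
// Assumption: executablePath points at a Chromium build that supports tagged PDFs.
function buildChromePdfConfig(executablePath, extraArgs = []) {
  // --print-to-pdf made the puppeteer-launched browser exit at startup in this
  // thread, so it is filtered out here.
  const args = ['--export-tagged-pdf', ...extraArgs].filter(
    (arg) => arg !== '--print-to-pdf'
  );

  return {
    extensions: {
      'chrome-pdf': {
        launchOptions: {
          executablePath,
          // De-duplicate while keeping the original order.
          args: [...new Set(args)],
        },
      },
    },
  };
}

// Print the fragment so it can be pasted into jsreport.config.json.
console.log(
  JSON.stringify(
    buildChromePdfConfig('/path/to/101/chromium', ['--no-sandbox', '--print-to-pdf']),
    null,
    2
  )
);
```

The printed object has the same shape as the fragment above, with `--no-sandbox` kept and `--print-to-pdf` removed.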

kyledgreen

Good morning Jan Blaha,

I could use some more directions on how to go about this. We are using jsreport with Docker.

I'm having some trouble figuring out how to find a download of the correct version of Chromium using npm.
I was able to do an npm i chromium, which gives me '97', but I can't seem to find a '101' version available.
I tried this, using the latest Linux revision I found here:

#Dockerfile
FROM jsreport/jsreport:3.4.1-full
...
ENV NODE_CHROMIUM_REVISION=978519
RUN npm i chromium
...

I'm not sure what to set the executable path to, though. I assumed something like /app/node_modules/chromium/.

I also attempted to override the chromium version being used by puppeteer:

#Dockerfile
FROM jsreport/jsreport:3.4.1-full
...
ENV PUPPETEER_CHROMIUM_REVISION=978519
RUN npm i puppeteer
...

#jsreport.config.json
...
    "chrome-pdf": {
      "launchOptions": {
        "args": ["--no-sandbox", "--export-tagged-pdf", "--force-renderer-accessibility"]
      },
      "strategy": "dedicated-process",
      "timeout": 7200000
    }
...

No errors, but I still don't have accessible images in the PDF, so it may not have done anything.

jan_blaha

This is how you can install chromium 99

FROM jsreport/jsreport:3.4.1-full

RUN apt-get update && \
    # chrome
    apt-get install -y libgconf-2-4 libappindicator3-1 && \
    wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - && \
    sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' && \
    apt-get update && \
    # install latest chrome just to get package deps installed
    apt-get install -y lsb-release google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst --no-install-recommends && \
    # then replace that chrome with specific chrome version, see https://github.com/webnicer/chrome-downloads for available versions
    wget https://github.com/webnicer/chrome-downloads/raw/master/x64.deb/google-chrome-stable_99.0.4844.51-1_amd64.deb && \
    dpkg -i ./google-chrome*.deb && \
    rm google-chrome*.deb

ENV chrome_launchOptions_executablePath google-chrome-stable

Your playground example doesn't include any text, and for some reason Chrome doesn't include tags in that case.
However, if I add the "Hello world" text at the beginning, it adds tags to the output PDF. So it seems Chromium 99 should work with tagging without further configuration:

Hello world
<div aria-label="Cover Page Image Aria Label" 
    alt="Cover Page Image Alternate Text" 
    title="Cover Page Image Title Text"
    role="img" style="width: 100%;
        height: 100%;">
    {{!-- Try to label img --}}
    <img style="width: 100%; height: 100%;" src="https://via.placeholder.com/1024" aria-label="Cover Page Image Aria Label" alt="Cover Page Image Alternate Text" title="Cover Page Image Title Text"/>
</div>

<div style='page-break-after: always;'></div>

kyledgreen

Oh, this is fantastic, if a bit weird...
Having text before the image does appear to make the alt and aria-label attributes work!
It also appears that I need to actually download the PDF instead of trying to use a screen reader on the preview in jsreport studio.
It doesn't detect the text, though, which is strange.

I will look into this further, thank you so much Jan!
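
A quick way to check a downloaded file without a screen reader is to look for the structure-tree markers that a tagged PDF carries in its catalog. The sketch below is only a heuristic, not a 508 validator: `looksTagged` is a hypothetical helper, and it assumes the catalog is stored uncompressed so that `/StructTreeRoot` and `/Marked true` are visible in the raw bytes.

```javascript
// Hypothetical helper: heuristic check for tagged-PDF markers in raw PDF bytes.
// Assumption: the document catalog is not inside a compressed object stream.
function looksTagged(pdfBytes) {
  // latin1 maps every byte to exactly one character, so binary stream data
  // cannot break the decoding.
  const text = Buffer.from(pdfBytes).toString('latin1');
  const hasStructTree = text.includes('/StructTreeRoot');
  const isMarked = /\/Marked\s+true/.test(text);
  return { hasStructTree, isMarked, tagged: hasStructTree && isMarked };
}

// Example usage (report.pdf is a placeholder for the downloaded file):
// console.log(looksTagged(require('fs').readFileSync('report.pdf')));
```

A `tagged: false` result for a file that VoiceOver reads correctly would suggest the assumption about uncompressed objects does not hold for that file, not that the tags are missing.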

kyledgreen

Do you know if there is a limitation in pdfUtils around this functionality? I'm using pdfUtils to prepend a cover page that is an image. If I run the template for just the cover page it gets the accessibility tags, but when I run my main template and prepend the cover page template it doesn't seem to have tags.

Something like this: https://playground.jsreport.net/w/anon/lhK7cf6Z

Note: I can't seem to get the online playground to generate the tags in any scenario. It only works for me locally. Maybe that's due to the --export-tagged-pdf arg I'm still passing, or to my version being 3.4.1 while the playground is version: 3.2.1 with jsreport version: 3.2.0.

jan_blaha

Unfortunately, pdf utils don't keep the PDF tags. I've put it in the backlog:
https://github.com/jsreport/jsreport/issues/894

I haven't read the spec for PDF tags yet, so I'm not sure how difficult it will be to implement.

kyledgreen

Does that mean any usage of pdfUtils will strip the tags? It seems that even if I embed my cover page template logic into my main template, I still don't get the tags. Is this because I'm using pdfUtils to merge a header/footer template and/or a table of contents?

Do you have an ETA for when the team will get around to this functionality? Is it high on the roadmap?

jan_blaha

> Does that mean any usage of pdfUtils will strip the tags? It seems that even if I embed my cover page template logic into my main template, I still don't get the tags. Is this because I'm using pdfUtils to merge a header/footer template and/or a table of contents?

Yes.

> Do you have an ETA for when the team will get around to this functionality? Is it high on the roadmap?

Not yet, unfortunately. We first need to analyze how complex it will be to merge the tags from multiple PDFs.

kyledgreen

I've been looking into working around this problem using CSS, but it appears that what I'm using is not recognized by jsreport, or maybe by Chromium, yet.
Can you confirm that I cannot use CSS like the example below to create a header/footer and page numbers?

        @page {
            size: A4;
            margin: 2cm;
            
            @bottom-right-corner { 
                content: "Page " counter(page); 
                border: solid green; 
            }

            @top-center {
                font-size: 16px;
                content: "Page title that could be dynamically created";
            }
        }

reference: https://www.w3.org/TR/css-page-3/#populating-margin-boxes

kyledgreen

I dug a bit more and can confirm chromium does not support these yet.

jan_blaha

Yes, this isn't supported in Chromium as far as I know.

I'm not sure how complicated your header is. Wouldn't the chrome-pdf native header work for you?
https://jsreport.net/learn/chrome-pdf#native-headers-and-footers
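
As a rough illustration of the native option, the sketch below builds the `chrome` section of a template with a static header and a footer that uses Chromium's built-in `pageNumber` and `totalPages` classes. The helper name `nativeHeaderFooter`, the margins, and the inline styles are assumptions for illustration only. It covers static markup plus the values Chromium injects; anything computed per page from the body content is outside what this sketch does.

```javascript
// Hypothetical helper: returns the "chrome" section of a jsreport template.
// Assumption: the title is static text; margins and styles are illustrative.
function nativeHeaderFooter(title) {
  // Escape the title so it cannot break the header markup.
  const safeTitle = String(title)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');

  return {
    displayHeaderFooter: true,
    // Leave room for the header and footer, otherwise they may be clipped.
    marginTop: '2cm',
    marginBottom: '2cm',
    headerTemplate:
      `<div style="font-size:16px; width:100%; text-align:center;">${safeTitle}</div>`,
    // Chromium fills elements with these class names at print time.
    footerTemplate:
      '<div style="font-size:10px; width:100%; text-align:right; padding-right:1cm;">' +
      'Page <span class="pageNumber"></span> of <span class="totalPages"></span>' +
      '</div>',
  };
}

console.log(nativeHeaderFooter('Section title'));
```

Whether text rendered through these native templates ends up tagged in the output PDF is something I have not verified, so it is worth checking with a screen reader.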

kyledgreen

We are creating a directory, and the section title is placed in the header along with the first and last subsection headers on the page, something like:
"First subsection on page" <------> "Section title" <-----> "Last subsection on page"
There is also disclaimer content and page number in the footer.

We need some of the newer experimental CSS features like "position: running(key)", "content: element(key)", "content: counter(page)", and "@page" margin boxes. I'm looking into Paged.js, which appears to support these.