Dynamic TOC - without initial/previous declaration

fhrtms

I have a weird approach to building the table of contents with a little less effort. Well, almost...

https://playground.jsreport.net/w/fhrtms/Oxr_HbAT

In the example, I have displayed the two TOCs side by side for debugging purposes.

So, I don't want to declare the whole table of contents up front. Instead, I want to pass each entry directly in the HTML file, always right before the corresponding heading, to a function like this:

{{{pdfAddPageItem (pdfAddTocItem @root id="header-1-1" parent="header-1" title="1.1. Heading")}}}
<h2 id="header-1-1">1.1. Heading</h2>
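The `pdfAddTocItem` helper is the author's own and its source isn't shown in this thread. A minimal sketch of what such a helper might look like, assuming it only records the entry in a `toc` array on the root data object (the same `toc` array that `data.data.root.toc` reads further down) and returns the item so `pdfAddPageItem` can store it:

```javascript
// Hypothetical helper: records a TOC entry on the root data object and
// returns it, so it can be passed straight to pdfAddPageItem, e.g.
// {{{pdfAddPageItem (pdfAddTocItem @root id="header-1-1" parent="header-1" title="1.1. Heading")}}}
function pdfAddTocItem (root, options) {
  // named arguments (id=..., parent=..., title=...) arrive in options.hash
  const { id, parent, title } = options.hash
  const item = { id, title, parent: parent || 'root' }
  // create the array lazily on the first call
  root.toc = root.toc || []
  root.toc.push(item)
  return item
}
```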

However, this approach is limited to a maximum of two nesting levels.

Does anyone have an idea how I could map deeper nesting with this?
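Whatever limits the current approach, a flat list of entries that each carry an `id` and a `parent` can describe any depth. As a plain JavaScript sketch (independent of jsreport), the list can be turned into a tree by indexing entries by `id` and attaching each one to its parent; entries under `'root'` or under an unknown id become top-level nodes. `buildTocTree` is a hypothetical name:

```javascript
// Builds a nested TOC tree from flat { id, parent, title } entries.
// Entries whose parent is 'root', missing, or unknown become top-level nodes.
function buildTocTree (items) {
  const nodes = new Map()
  // first pass: create a node for every entry
  for (const item of items) {
    nodes.set(item.id, { ...item, children: [] })
  }
  const roots = []
  // second pass: attach each node to its parent, keeping the input order
  for (const item of items) {
    const node = nodes.get(item.id)
    const parent = nodes.get(item.parent)
    if (parent && parent !== node) {
      parent.children.push(node)
    } else {
      roots.push(node)
    }
  }
  return roots
}
```

Rendering the tree recursively (one nested list per `children` array) then works for any depth.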

Placing the TOC at the end of the PDF works quite well. Is it possible to move rendered pages?

fhrtms

Well, I made it work by parsing the finished HTML DOM with an inline script placed at the end of the rendered HTML document. From this, I build a linked TOC.

So I have a complete linked TOC and a page-numbered TOC, which I can simply lay on top of each other, just like in the "official" examples, only without the additional JSON declaration up front.

Now my only problem is that this only works at the end of the PDF. I would have to somehow move the last page to any position in the final PDF document. Does anyone have an approach to get this done with jsreport tools?

And I still have a problem with the PDF TOC itself. It's not assembled quite right in my example either.

fhrtms

The problem with the PDF TOC is likely that I am adding the <a> tags to the DOM dynamically.

The data-pdf-... attributes, which are needed to build the PDF TOC, are read from the static template, not from the manipulated DOM.

Is it possible to influence this, or could the parameters also be passed to the PDF renderer via the window object?

// create the outline link for one TOC item and add it to #toc
const a = document.createElement('a');
a.title = item.title;
a.href = '#' + item.id;
a.className = 'leading-6 entry cursor-pointer';
a.setAttribute('data-pdf-link-target-id', item.id);
a.setAttribute('data-pdf-outline', '');
a.setAttribute('data-pdf-outline-title', item.title);
a.setAttribute('data-pdf-outline-parent', item.parent || 'root');
a.innerHTML = item.title;
document.getElementById('toc').appendChild(a);

jan_blaha

Is it possible to influence this, or could the parameters also be passed to the pdf renderer via the WINDOW object?

Not currently. The data-pdf-outline... attributes are parsed right after the templating engines are evaluated and before the JavaScript in the page is evaluated. So you need to assemble the data-pdf-outline... attributes using Handlebars.
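Since the attributes must already be present in the HTML that the templating step produces, one option is a helper that returns the finished markup. A sketch, assuming entries shaped `{ id, parent, title }` as used in this thread; the helper name `outlineLinks` is hypothetical. It escapes attribute values so titles containing quotes or `&` don't break the markup (whether pdf utils decodes such entities in `data-pdf-outline-title` is worth checking):

```javascript
// Escapes characters that would break an HTML attribute or text node.
function escapeHtml (value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
}

// Hypothetical helper: renders one outline anchor per TOC entry,
// using the data-pdf-outline attributes discussed in this thread.
function outlineLinks (items) {
  return items
    .filter((item) => item && item.id && item.title)
    .map((item) => {
      const id = escapeHtml(item.id)
      const title = escapeHtml(item.title)
      const parent = escapeHtml(item.parent || 'root')
      return `<a href="#${id}" data-pdf-link-target-id="${id}" data-pdf-outline ` +
        `data-pdf-outline-title="${title}" data-pdf-outline-parent="${parent}">${title}</a>`
    })
    .join('\n')
}
```

If registered as a Handlebars helper, it would be called with triple braces (`{{{outlineLinks toc}}}`) so Handlebars doesn't escape the returned markup a second time.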

Now I only have the problem that this only works at the PDF end. I would have to somehow move the last page freely in the final PDF document. Would anyone have an approach to get this done with jsreport tools?

Not sure if this helps, but you can use a trick to delay rendering part of the Handlebars template until after you have modified the data:
https://playground.jsreport.net/w/anon/FAMoOZ_y

I will keep an eye on some possible improvements for TOC rendering. jsreport now supports more things in v3 and perhaps we could do it better now. I will give it some testing.

fhrtms

When I build the PDF TOC from a Handlebars loop, it doesn't work. The static one does.

<!-- THIS WORKS -->
<a title="Root" href="#root" class="hidden" 
   data-pdf-link-target-id="root" 
   data-pdf-outline 
   data-pdf-outline-title="My little document"
>My little document</a>

<!-- THIS DOESN'T WORK-->
{{#each @root.$pdf.pages as | page |}}
    {{#each this.items as | item |}}
        {{#if this.id}}
            <a title="{{this.title}}" href="#{{this.id}}" 
               data-pdf-link-target-id="{{this.id}}" 
               data-pdf-outline 
               data-pdf-outline-title="{{this.title}}" 
               data-pdf-outline-parent="{{this.parent}}"
            >{{this.title}} ... {{increment @../index}}</a>
        {{/if}}
    {{/each}}
{{/each}}

When I log the result in @jsreport/jsreport-pdf-utils/lib/worker.js at the end of the reporter.afterTemplatingEnginesExecutedListeners.add callback, I see that the HTML content is correct.


The output of req.context.pdfUtilsOutlines also seems correct.


jan_blaha

I see you are going quite deep in your debugging. Well done.

The problem is likely that the data-pdf-outline... needs to be in the main template, not in the merged one.

I'm checking how this can be improved...

fhrtms

YES, your tip about delaying the Handlebars rendering, combined with putting it in the main template, was the right one: https://playground.jsreport.net/w/fhrtms/Oxr_HbAT

I have now hacked it together so that it works.

Many many thanks to @jan_blaha!


So here's what I have:

- multiple nested templates with many headings, registered via {{pdfAddPageItem...
- a TOC template that I merge (multiple times if you like) at the end and/or start of the final PDF
- a root TOC element <a href="#root" data-pdf-link-target-id="root" data-pdf-outline data-pdf-outline-title="My little document"></a>{{pdfAddPageItem "root"}}
- the delayed function with {{laterToc...}}

I just have to play with the delay and increase it a bit depending on the workload.

I am still working on a solution to place the TOC at any position.

jan_blaha

I just have to play with the delay and increase it a bit depending on the workload.

This should be a more robust solution
https://jsreport.net/learn/templating-engines#async

const jsreport = require('jsreport-proxy')
async function laterToc(opts) {
   await jsreport.templatingEngines.waitForAsyncHelpers()
   return opts.fn()
}

fhrtms

I haven't really figured out the asynchronous functions yet 🤷

// THIS IS NOT WORKING
async function laterToc(data) {
    //console.log('data', new Date, data.data.root.toc)
    
    await jsreport.templatingEngines.waitForAsyncHelpers()
    
    return new Promise((resolve) => {   
        //console.log('resolve', new Date, data, resolve)
        
        let res = ()=> {
            //console.log('res', new Date, data)
            let result = []

            data.data.root.toc.forEach((item, iIndex, arr)=> {
                if(item && item.title && item.id) {
                    result.push(`<a class="block" href="#${item.id}" data-pdf-link-target-id="${item.id}" data-pdf-outline data-pdf-outline-title="${item.title}" data-pdf-outline-parent="${item.parent}">${item.title}</a>`)
                }
            })

            //console.log('resolve-result', result.join("\n"))
    
            return result.join("\n");
        }

        resolve(res())
    })
}

jan_blaha

async function laterToc(data) {
    await jsreport.templatingEngines.waitForAsyncHelpers()
    let result = []
    data.data.root.toc.forEach((item, iIndex, arr) => {
        if (item && item.title && item.id) {
            result.push(`<a class="block" href="#${item.id}" data-pdf-link-target-id="${item.id}" data-pdf-outline data-pdf-outline-title="${item.title}" data-pdf-outline-parent="${item.parent}">${item.title}</a>`)
        }
    })
    return result.join('\n')
}

fhrtms

Hmm, I don't think that's enough. I don't get all elements when I run it at the top of the page. At the end of the page it works fine.


fhrtms

It doesn't work very well. For testing, I mixed things up in my test environment in a non-linear order.


I really hate to bug you guys, but I find this topic kind of exciting. I would like to make the TOC dynamic somehow. The most suitable way would be to read the headings directly from the DOM, preferably without additional function calls in the HTML source. Can I call the pdfAddPageItem function directly from a script in my template? Or can I work with the rendered DOM in the script section (below the HTML source)?

So, I want to parse the finished DOM outside the template and pass the found elements to pdfAddPageItem purely from script (not via Handlebars). But somehow I can't get to the HTML source of the finished DOM.

The HTML DOM is correct. The remixed order in the DOM looks like this:

1. Heading (main)
1.1. Heading (main)
        3. Heading (child)
        3.1. Heading (child)
        3.2. Heading (child)
                4. Heading (child-child)
                4.1. Heading (child-child)
                5. Heading (child-child)
                5.1. Heading (child-child)
                5.1.1. Heading (child-child)
                5.2. Heading (child-child)
1.1.1. Heading (main)
1.2. Heading (main)
1.2.1. Heading (main)
2. Heading (main)
2.1. Heading (main)
2.2. Heading (main)
2.3. Heading (main)
        3. Heading (child)
        3.1. Heading (child)
        3.2. Heading (child)
                4. Heading (child-child)
                4.1. Heading (child-child)
                5. Heading (child-child)
                5.1. Heading (child-child)
                5.1.1. Heading (child-child)
                5.2. Heading (child-child)

With a little hassle I get the pdf TOC mapped with the function pdfAddPageItem and my little helper function pdfAddTocItem.


Yes, OK, that looks a bit weird:

"laterToc": [
  { "title": "1. Heading", "parent": "root", "id": "h-1" },
  { "title": "1.1. Heading", "parent": "h-1", "id": "h-1-1" },
  { "title": "1.1.1. Heading", "parent": "h-1-1", "id": "h-1-1-1" },
  { "title": "1.2. Heading", "parent": "h-1", "id": "h-1-2" },
  { "title": "1.2.1. Heading", "parent": "h-1-2", "id": "h-1-2-1" },
  { "title": "2. Heading", "parent": "root", "id": "h-2" },
  { "title": "2.1. Heading", "parent": "h-2", "id": "h-2-1" },
  { "title": "2.2. Heading", "parent": "h-2", "id": "h-2-2" },
  { "title": "2.3. Heading", "parent": "h-2", "id": "h-2-3" },
  { "title": "3. Heading", "parent": "h-1-1", "id": "noh-3" },
  { "title": "3.1. Heading", "parent": "noh-3", "id": "nonoh-3" },
  { "title": "3.2. Heading", "parent": "noh-3", "id": "noh-3-2" },
  { "title": "3. Heading", "parent": "root", "id": "yesh-3" },
  { "title": "3.1. Heading", "parent": "yesh-3", "id": "yesyesh-3" },
  { "title": "3.2. Heading", "parent": "yesh-3", "id": "yesh-3-2" },
  { "title": "4. Heading", "parent": "h-1-1", "id": "abcnoh-4" },
  { "title": "4.1. Heading", "parent": "abcnoh-4", "id": "abcnoh-4-1" },
  { "title": "5. Heading", "parent": "h-1-1", "id": "abcnoh-5" },
  { "title": "5.1. Heading", "parent": "abcnoh-5", "id": "abcnoh-5-1" },
  { "title": "5.1.1. Heading", "parent": "abcnoh-5-1", "id": "abcnoh-5-1-1" },
  { "title": "5.2. Heading", "parent": "abcnoh-5", "id": "abcnoh-5-2" },
  { "title": "4. Heading", "parent": "root", "id": "abcyesh-4" },
  { "title": "4.1. Heading", "parent": "abcyesh-4", "id": "abcyesh-4-1" },
  { "title": "5. Heading", "parent": "root", "id": "abcyesh-5" },
  { "title": "5.1. Heading", "parent": "abcyesh-5", "id": "abcyesh-5-1" },
  { "title": "5.1.1. Heading","parent": "abcyesh-5-1","id": "abcyesh-5-1-1"},
  { "title": "5.2. Heading", "parent": "abcyesh-5", "id": "abcyesh-5-2" }
]
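To indent the HTML TOC regardless of the order in which the entries arrive, the nesting level of each entry can be derived from its `parent` chain. A sketch (the function name `tocLevels` is hypothetical) that treats `'root'` or an unknown parent as level 1 and stops on accidental cycles:

```javascript
// Computes the nesting level of every TOC entry from its parent chain.
// Entries under 'root' (or under an unknown id) get level 1.
function tocLevels (items) {
  const byId = new Map(items.map((item) => [item.id, item]))
  const levels = new Map()

  function levelOf (item, seen) {
    if (levels.has(item.id)) return levels.get(item.id)
    const parent = byId.get(item.parent)
    let level = 1
    seen.add(item.id)
    // stop on unknown parents and on cycles such as a -> b -> a
    if (parent && !seen.has(parent.id)) {
      level = levelOf(parent, seen) + 1
    }
    levels.set(item.id, level)
    return level
  }

  for (const item of items) levelOf(item, new Set())
  return levels
}
```

For the list above, `noh-3` (parent `h-1-1`) ends up at level 3, while `yesh-3` (parent `root`) stays at level 1.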

Or should I put in the work and try to write an extension at the Node.js level?

jan_blaha

The flow is like this:

1. Your template runs in the Node.js sandbox and evaluates Handlebars. There is no DOM there, and no inline script is evaluated.
2. Helpers like pdfAddPageItem are evaluated and put specific text-based marks into the HTML output.
3. Chrome gets the HTML, evaluates it together with the inline scripts in the DOM, and outputs a PDF. We have only a little influence here.
4. If there is a pdf utils operation like merge or append, pdf utils parses the Chrome-produced PDF, finds the hidden marks, and reconstructs the information added with pdfAddPageItem. This data is then passed as $pdf to the template that is merged or appended.

You can't run helpers from inside an HTML inline script because it is a completely different context and process: the Handlebars helpers are executed in Node, while the inline script is executed in Chrome.

Is your main problem being able to put the TOC anywhere in the template?

As you probably noticed, the main problem is the page numbers, which we can't obtain before sending the HTML to Chrome. That's why we render the TOC twice.

To be able to simply put TOC for example in the middle of the document, you can render the whole template twice. The first time you collect information about page numbers and then in the second render you have everything you need.

A demo
https://playground.jsreport.net/w/anon/6hbllD25
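In the two-pass approach, the first render only needs to yield a lookup from heading id to page number. A sketch of that step, assuming the parsed `$pdf.pages[i].items` arrays hold the objects registered with `pdfAddPageItem` (as the `{{#each @root.$pdf.pages}}` loop earlier in this thread iterates them); `buildPageIndex` is a hypothetical name:

```javascript
// Maps each TOC item id to the 1-based page number where it first appears.
// Expects pages shaped like [{ items: [{ id, ... }, ...] }, ...].
function buildPageIndex (pages) {
  const index = {}
  pages.forEach((page, pageIndex) => {
    for (const item of (page && page.items) || []) {
      // keep the first occurrence if an id shows up on several pages
      if (item && item.id && !Object.prototype.hasOwnProperty.call(index, item.id)) {
        index[item.id] = pageIndex + 1
      }
    }
  })
  return index
}
```

Passing the resulting object as data to the second render would let a page-number helper there be a plain lookup by id.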

I am still evaluating some other approaches...

fhrtms

This looks pretty interesting. And I might have an interface to exchange data between the two renders.

On the first render, I run an inline script at the end of the DOM that appends a pseudo-Handlebars block to the end of the body:

// collect all headings from the finished DOM
const toc = [{ items: [] }]
const regex = /<h([1-6])[^>]*>(.*?)<\/h\1>/g;

const str = document.getElementsByTagName('body')[0].innerHTML;
let m;
let tocItems = []

while((m = regex.exec(str)) !== null) {
    
    if(m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    let div = document.createElement('div');
    div.innerHTML = m[0].trim();

    let header = {
        title: div.firstChild.innerHTML,
        id: div.firstChild.id,
        parent: div.firstChild.getAttribute('data-parent') || null,
    }

    toc[0].items.push(header)
    tocItems.push(header)

    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

let secondRenderJsonData = []

// PARSE STUCTURE
toc.forEach((page, pIndex, arr)=> {

    page.items.forEach((item, iIndex, arr)=> {
        
        if(item.title) {
            secondRenderJsonData.push({id: item.id, title: item.title, parent: item.parent || null, page: pIndex + 1})
        }
    })
})

var code = document.createElement('code')
code.setAttribute('ref', 'toc')
// split the braces so Handlebars doesn't evaluate this marker in the template source
code.innerHTML = '{' + '{#secondRenderJsonData}}' + JSON.stringify(secondRenderJsonData) + '{' + '{/secondRenderJsonData}}'
console.log('code', code.outerHTML)
document.getElementsByTagName('body')[0].appendChild(code);

Then I use your second-render approach: I filter the pseudo-Handlebars block out of the parsed PDF text and pass its content as parsed JSON data to the second render:

const jsreport = require('jsreport-proxy')

async function afterRender (req, res) {
    // the second render runs this script again; don't loop forever
    if (req.data.secondRender) {
        return
    }

    const $pdf = await jsreport.pdfUtils.parse(res.content, true)
    let jsonData = []

    // look for the pseudo-Handlebars marker in the text of every page
    for (let i = 0; i < $pdf.pages.length; i++) {
        const page = $pdf.pages[i]

        if (page && page.text) {
            const regex = /\{\{#secondRenderJsonData\}\}(.*)\{\{\/secondRenderJsonData\}\}/g;
            let m;

            while ((m = regex.exec(page.text)) !== null) {
                // avoid an infinite loop on zero-length matches
                if (m.index === regex.lastIndex) {
                    regex.lastIndex++;
                }

                if (m[1]) {
                    jsonData = m[1]
                }
            }
        }
    }

    // render the same template again, this time with the collected TOC data
    const finalR = await jsreport.render({
        template: {
            name: 'template',
        },
        data: {
            ...req.data,
            $pdf: $pdf,
            firstRenderPdf: $pdf,
            secondRender: true,
            secondRenderTOC: (typeof jsonData === 'string') ? JSON.parse(jsonData) : jsonData,
        }
    })
    res.content = finalR.content
}
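Matching each page's text separately means a marker split by a page break is never found. A sketch of a more tolerant extraction step, assuming the page texts can simply be concatenated (no header or footer text lands inside the marker); `extractSecondRenderData` is a hypothetical name, and it would be fed something like `$pdf.pages.map((p) => p.text || '')`:

```javascript
// Finds the {{#secondRenderJsonData}}...{{/secondRenderJsonData}} marker
// across all pages at once, so it still matches when a page break splits it.
function extractSecondRenderData (pageTexts) {
  const text = pageTexts.join('')
  const match = /\{\{#secondRenderJsonData\}\}([\s\S]*?)\{\{\/secondRenderJsonData\}\}/.exec(text)
  if (!match) {
    // no marker found: fall back to an empty TOC
    return []
  }
  // JSON.stringify never emits raw line breaks, so any found here came from
  // the PDF text extraction and can be dropped before parsing
  return JSON.parse(match[1].replace(/[\r\n]/g, ''))
}
```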

And then I write the links dynamically with a Handlebars #each loop:

{{#each secondRenderTOC}}
    <a class="block" 
        title="{{this.title}}" 
        href="#{{this.id}}" 
        data-pdf-link-target-id="{{this.id}}" 
        data-pdf-outline 
        data-pdf-outline-title="{{this.title}}" 
        data-pdf-outline-parent="{{this.parent}}"
    >
        <div class="relative w-full">
            <span class="chapter w-full block">{{this.title}}</span>
            <strong class="page absolute top-0 right-0 pl-2 bg-white">{{getPageNumber this.id}}</strong>
        </div>
    </a>
{{/each}}

Here is my fork of your solution: https://playground.jsreport.net/w/fhrtms/z9GUrD79

I'll test it again in a more complex structure.

fhrtms

OK, I tried it and found it to be excellent.

https://playground.jsreport.net/w/fhrtms/WDA4w4ao


But I found a little bug with spaces:

It always affects the last space of the innerHTML property.

jan_blaha

You really don't give up easily, I like that :)

The pdf.js library, which is used to parse the PDF text, has issues with spaces. We solve it here by trimming:
https://github.com/jsreport/jsreport/blob/master/packages/jsreport-pdf-utils/lib/utils/parsePdf.js#L3

You could do the same: base64-encode the values to avoid spaces, and use a regexp similar to ours to find what you need.

However, maybe you can try the following approach using console.log. It could make things simpler. Just note there is a limit of 1000 characters for a single console.log.
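A sketch of that idea on the Node side (where `Buffer` is available, as in the pdfAddPageItem helper): the value is base64-encoded before it goes into the page, and on the way back all whitespace is stripped before decoding, since base64 itself never contains spaces. The `toc@@@...@@@` delimiter is a hypothetical choice, deliberately different from the `item@@@` prefix pdf utils emits, and the regexp is an illustration, not the one from parsePdf.js:

```javascript
// Hypothetical marker format: toc@@@<base64 of JSON>@@@
function encodeMarker (value) {
  const base64 = Buffer.from(JSON.stringify(value), 'utf8').toString('base64')
  return `toc@@@${base64}@@@`
}

// Finds all markers in extracted PDF text and decodes them.
// Whitespace is removed first because text extraction may insert spaces.
function decodeMarkers (text) {
  const compact = text.replace(/\s+/g, '')
  const results = []
  const regex = /toc@@@([A-Za-z0-9+/=]+)@@@/g
  let m
  while ((m = regex.exec(compact)) !== null) {
    results.push(JSON.parse(Buffer.from(m[1], 'base64').toString('utf8')))
  }
  return results
}
```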
https://playground.jsreport.net/w/anon/jciOAKws
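With the console.log route, a payload longer than the 1000-character limit has to be split. A sketch with a hypothetical `toc-chunk:` prefix: `toChunks` would run in the page (e.g. `toChunks(JSON.stringify(data)).forEach((c) => console.log(c))`), and `fromChunks` on the server side, assuming the captured log messages are available there as an array of strings:

```javascript
// Splits a string into console.log-sized pieces, each tagged with its index
// and the total count, e.g. "toc-chunk:0/3:...". The prefix is hypothetical.
const PREFIX = 'toc-chunk:'

function toChunks (text, maxLength = 1000) {
  // leave room for the prefix and the "<index>/<total>:" header
  const size = maxLength - 50
  const parts = []
  for (let i = 0; i < text.length; i += size) {
    parts.push(text.slice(i, i + size))
  }
  return parts.map((part, i) => `${PREFIX}${i}/${parts.length}:${part}`)
}

// Reassembles the chunks from log messages (any order, other messages ignored).
function fromChunks (messages) {
  const parts = []
  let total = -1
  for (const message of messages) {
    if (!message.startsWith(PREFIX)) continue
    // header format: "<index>/<total>:"
    const m = /^(\d+)\/(\d+):([\s\S]*)$/.exec(message.slice(PREFIX.length))
    if (!m) continue
    parts[Number(m[1])] = m[3]
    total = Number(m[2])
  }
  const received = parts.filter((part) => part !== undefined).length
  if (total < 0 || received !== total) {
    throw new Error('incomplete TOC chunks')
  }
  return parts.join('')
}
```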

fhrtms

Base64 is not the best solution, at least not for UTF-8. Some special characters are not encoded correctly.


So I just replace the spaces with the HTML equivalent: title.replace(/ /g, '&nbsp;').

And with a very long TOC, the attached code tag, which contains the headings as JSON, gets split by a page break, and the regex readout in afterRender no longer works. So I tried to minimize the font size:

code.setAttribute('style', 'font-size: 0.05px;')

What is the smallest font size that is rendered?

jan_blaha

You can find inspiration in pdfAddPageItem
https://github.com/jsreport/jsreport/blob/master/packages/jsreport-pdf-utils/static/helpers.js#L103

Base64 is not the best solution, at least not for utf8. Some special characters are not encoded correctly.

I don't see a reason why this shouldn't work. Base64 can encode and decode everything, even images.

What is the smallest font size that is rendered?

See here: we use a font size of 1.1px and opacity 0.01.

const jsonStrOriginalValue = JSON.stringify(item)
const value = Buffer.from(jsonStrOriginalValue).toString('base64')
// we use position: absolute to make the element to not participate in flexbox layout
// (making it not a flexbox child)
const result = `<span class='jsreport-pdf-utils-page-item jsreport-pdf-utils-hidden-element' style='font-family: Helvetica;position:absolute;text-transform: none;opacity: 0.01;font-size:1.1px'>item@@@${value}@@@</span>`

fhrtms

But in the inline script that parses the DOM, I can't use Buffer.from(title, 'utf8').toString('base64'), since Buffer is a Node.js API. I can only use atob/btoa there.
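`btoa` only accepts characters in the Latin-1 range, which is why non-ASCII titles cause trouble with it. A common workaround, shown here as a sketch, is to turn the string into UTF-8 bytes with `TextEncoder` first and base64-encode those bytes; the result then decodes correctly on the Node side with `Buffer.from(value, 'base64').toString('utf8')`. Both `TextEncoder` and `btoa` exist in Chrome (and as globals in Node.js 16+). The function name `utf8ToBase64` is hypothetical:

```javascript
// Base64-encodes a string as UTF-8, using only browser APIs (TextEncoder, btoa).
function utf8ToBase64 (text) {
  const bytes = new TextEncoder().encode(text)
  let binary = ''
  // btoa expects a "binary string": one character per byte (0-255)
  for (const byte of bytes) {
    binary += String.fromCharCode(byte)
  }
  return btoa(binary)
}
```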

fhrtms

With your solution, I get the error Unexpected token � in JSON at position 0...

const jsonStrOriginalValue = JSON.stringify(secondRenderJsonData)
const value = btoa(jsonStrOriginalValue) //unescape(encodeURIComponent()) //Buffer.from(jsonStrOriginalValue).toString('base64') 

document.getElementsByTagName('body')[0].innerHTML += `<span class='jsreport-pdf-utils-page-item jsreport-pdf-utils-hidden-element' style='font-family: Helvetica;position:absolute;text-transform: none;opacity: 0.01;font-size:1.1px'>item@@@${value}@@@</span>`

Error when evaluating custom script /main.js
Unexpected token � in JSON at position 0

(sandbox.js line 33:19)

  31 |     }
  32 |
> 33 |     const $pdf  = await jsreport.pdfUtils.parse(res.content, true)
     |                   ^
  34 |     let page = $pdf.pages[$pdf.pages.length]
  35 |     let jsonData = []
  36 |     

SyntaxError: Unexpected token � in JSON at position 0

What is the meaning of the notation item@@@...@@@?
Is this somehow processed differently?

If I use itemm@@...@@@ or anything else, it works. But the base64 string (the content between the @@@ delimiters) gets clipped when I try to extract it via regex, probably at the page margin.