Populating the Table of Contents with subheaders extracted from <h2> tags in the template



  • Hi, I'm struggling to populate a Table of Contents that also includes the subheaders from the template.

    To start with, here's how it currently looks on my setup:

    [Screenshot: the current Table of Contents]

    This works fine and is achieved with this code in the helper section:

    function tableOfContents (pdfPages) {
        function onlyUnique(value, index, self) {
            return self.indexOf(value) === index;
        }
    
        const sections = pdfPages.map((d) => d.group?.title ?? "Blank").filter((d) => d !== "Blank").filter(onlyUnique)
    
        const contents = [];
     
        for (const section of sections) {
            let firstPageOfSection = 0;
            let firstPageSeen = false;
            for (let i = 0; i < pdfPages.length; i++){
                if (pdfPages[i].group?.title === section) {
                    if(!firstPageSeen) {
                        firstPageOfSection = i + 1;
                        firstPageSeen = true;
                    }
                }            
            }
            contents.push({
                sectionName: section,
                firstPageOfSection: firstPageOfSection
            })
        }
        return contents;
    }
    

    And here in the template:

    <ul class="toc__list">
        {{#each (tableOfContents $pdf.pages)}}
             <li class="toc__item">
                 <span class="toc__title">{{this.sectionName}}</span>
                 <span class="toc__spacer"></span>
                 <span class="toc__page">{{this.firstPageOfSection}}</span>         
             </li>
        {{/each}}
    </ul>
    

    So... what I want is something like this in the Table of Contents:

    • Executive Summary ................................. 1
    • Pricing ...................................................... 2
      • Subheader 1 ................................... 2
      • Subheader 2 ................................... 3
      • Subheader 3 ................................... 3
      • and so on...

    The subheaders are where I'm struggling: I want to get them from the template by targeting all the <h2> tags with a given class name, e.g. <h2 class="toc__listed">.

    I was trying to do something like this:

    function tableOfContents(pdfPages) {
        function onlyUnique(value, index, self) {
            return self.indexOf(value) === index;
        }
    
        const sections = pdfPages.map((d) => d.group?.title ?? "Blank").filter((d) => d !== "Blank").filter(onlyUnique)
    
        const contents = [];
    
        for (const section of sections) {
            let firstPageOfSection = 0;
            let firstPageSeen = false;
            const subsections = [];
    
            for (let i = 0; i < pdfPages.length; i++) {
                if (pdfPages[i].group?.title === section) {
                    if (!firstPageSeen) {
                        firstPageOfSection = i + 1;
                        firstPageSeen = true;
                    }
    
                    // Extract h2 headers with class .toc__listed from page content
                    const h2Regex = /<h2 class="toc__listed".*?>(.*?)<\/h2>/g;
                    const matches = pdfPages[i].content.match(h2Regex);
    
                    if (matches) {
                        const subSections = matches.map(match => match.replace(/<\/?h2.*?>/g, '').trim());
                        subsections.push(...subSections);
                    }
                }
            }
    
            contents.push({
                sectionName: section,
                firstPageOfSection: firstPageOfSection,
                subsections: subsections
            });
        }
        return contents;
    }
    

    And then update the template to match, with something like this:

    {{#each (tableOfContents $pdf.pages)}}
        <li class="toc__item">
            <span class="toc__title">{{this.sectionName}}</span>
            <span class="toc__spacer"></span>
            <span class="toc__page">{{increment this.firstPageOfSection 1}}</span>
            <ul>
                {{#each this.subsections}}
                <li>{{this}}</li>
                {{/each}}
            </ul>
        </li>
    {{/each}}
    

    But I don't seem to get anywhere without running into a number of errors.

    So, my questions are:

    • Am I close to achieving what I want, and if so, could you point me toward what I might be missing? OR
    • Is there a better approach I should be considering when working with a Table of Contents?

    Many thanks in advance for any help you guys can provide!



  • You are on the right track using pdf utils to find the page numbers. However, you are using pdf utils groups, which embed at most one piece of information per page; you want pdf utils page items instead.

    Please refer to our pdf utils documentation, which includes two examples that both have nested headers.

    One using the "twice rendering" approach:
    https://playground.jsreport.net/w/admin/tV6sVKbV

    One merging an extra template for the ToC:
    https://playground.jsreport.net/w/admin/akYBA4rS
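
To illustrate the difference between groups and page items, here is a minimal sketch of how the helper from the question might be adapted. It is not taken from the linked playground examples. It assumes each page object exposes an `items` array (as pdf utils page items are described in the documentation) and that every item carries the header text under a hypothetical `subheader` key. It also assumes that a page without its own group title belongs to the most recent section.

```javascript
// Sketch only: builds a nested table of contents from pdf utils page data.
// Assumed page shape (the `subheader` key is hypothetical, chosen for this example):
//   { group: { title: "Pricing" }, items: [{ subheader: "Subheader 1" }] }
function tableOfContents(pdfPages) {
    const contents = [];
    const bySection = new Map();
    let current = null; // section that the page being visited belongs to

    pdfPages.forEach((page, index) => {
        const title = page.group?.title;

        if (title) {
            current = bySection.get(title);
            if (!current) {
                // First time this section is seen: record its first page (1-based).
                current = { sectionName: title, firstPageOfSection: index + 1, subsections: [] };
                bySection.set(title, current);
                contents.push(current);
            }
        }

        // Pages before the first titled section (e.g. a cover page) are skipped.
        if (!current) return;

        // Assumption: pages without a group title inherit the previous section.
        for (const item of page.items ?? []) {
            if (item?.subheader) {
                current.subsections.push({ title: item.subheader, page: index + 1 });
            }
        }
    });

    return contents;
}
```

With this shape, the inner `{{#each this.subsections}}` loop in the template could print `{{this.title}}` and `{{this.page}}` for each subheader. Each `<h2 class="toc__listed">` would need to register its text as a page item next to the header (the pdf utils documentation describes a `pdfAddPageItem` helper for this); the exact call and item shape should be checked against the linked examples.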



  • Hi...

    Ah brilliant! Thanks for pointing those out. I also checked out this one from your pdf utils link, so I now have a couple of solid use cases to work on: https://playground.jsreport.net/w/anon/0~cRmrQ~

    Thanks again 🙏

