Load balancing vs. JS Report out-of-the-box Clustering

zhwatts

Hello, I am configuring JS Report for an app that has high-capacity, high-availability requirements. I'm currently investigating running multiple JS Report instances in a clustered configuration. I have designed a partially working PoC using HA Proxy, but am concerned it may be over-engineered, which led me to investigate JS Report's ability to cluster natively.

I am following this documentation, but have a few questions.

Questions

Is it fair to say that "JS Report Clustering" is really just Node's ability to cluster (e.g. this)? Or is there some additional JS Report magic that happens?
Does clustering support a single endpoint that my app can register jobs with, and query for job status updates, irrespective of which JS Report instance is handling the render job?
Based on the jsreport docs, I trust that data consistency isn't a concern. However, how do I handle the scenario where one JS Report instance receives a job status request for a job that is being serviced by another JS Report instance?
I am planning to move to an MSSQL-based template store. Will this also persist all running job stats in MSSQL, or will those remain in the /tmp_deploy/reports file?

My Scenario

I currently run a single instance of JS Report as a Windows service, using the file-system store configuration.
My app communicates with JS Report via the .NET SDK (over the web API), and has JS Report save the rendered files to a network drive. A polling service then picks them up and sends the PDF files to a printer.
My app registers a job with JS Report, then checks back with JS Report at regular intervals for the current status of the job. When the job is done, the app releases another job to JS Report.
My app currently uses JS Report to render PDF documents, using the Chrome-PDF recipe.

My Problem

I'm required to render 5K+ documents a day, and my average render time per document is 8~10 seconds, which is not acceptable.
I have identified several areas in my template that can be optimized (there are a few costly network calls that the template has to make, e.g. remote loading of dynamic images).
In addition to optimizing the templates as much as possible, I'd also like to load balance requests so that JS Report can process several of them in parallel.

My Load Balancing Strategy

Initially, I've designed and developed a PoC that uses a dockerized instance of HA Proxy with 3+ dockerized instances of JS Report. Each dockerized instance is mapped to a common/shared directory for the templates and the jsreport.config file. In this case, my app sends all traffic to HA Proxy, which then round-robins the jobs across a pool of JS Report servers. Two drawbacks have arisen:

Due to the round-robin strategy, I'm not guaranteed that my app will be connected to the servicing JS Report instance for its job status check (see My Scenario/#3).
If JS Report is designed to support instance clustering, then the use of HA Proxy may be unneeded overhead.
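
To illustrate the first drawback, here is a minimal sketch of strict round-robin selection. It is not HA Proxy's actual implementation, and the backend names are hypothetical. It only shows why two consecutive requests from the same app land on different instances:

```javascript
// Hypothetical backend names; a real pool would be the HA Proxy server list.
const backends = ['jsreport-1', 'jsreport-2', 'jsreport-3']

// Returns a picker that hands out backends in strict rotation.
function createRoundRobin (pool) {
  let next = 0
  return () => {
    const chosen = pool[next]
    next = (next + 1) % pool.length
    return chosen
  }
}

const pick = createRoundRobin(backends)
const renderBackend = pick() // the request that registers the job
const statusBackend = pick() // the follow-up status check
console.log(renderBackend, statusBackend)
```

Unless every instance can see the same job state, the status check reaches an instance that knows nothing about the job.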

[Image: my PoC overview]

admin

jsreport is a web app just like any other. If you want to run it on multiple servers and send requests to it, you need some proxy/load balancing technology in front.

Scaling of a single jsreport instance is based on threads. You can use the workers.numberOfWorkers=10 config to increase the number of allocated threads, and with it the number of requests that are processed in parallel.
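
As a rough sanity check, the effect of the worker count on the numbers from the original post (5K documents a day, about 10 seconds each) can be estimated as below. This is a back-of-the-envelope sketch that assumes renders parallelize perfectly across workers, which real hardware will not fully deliver. The `estimateHours` helper is made up for illustration, and the `config` object only mirrors the shape of the workers.numberOfWorkers setting:

```javascript
// Shape of the setting named above, as it might appear in the jsreport config.
const config = { workers: { numberOfWorkers: 10 } }

// Hours needed to render `documents` when each takes `secondsPerDoc`
// and `workers` renders run fully in parallel (an idealized assumption).
function estimateHours (documents, secondsPerDoc, workers) {
  return (documents * secondsPerDoc) / workers / 3600
}

const oneWorker = estimateHours(5000, 10, 1)
const tenWorkers = estimateHours(5000, 10, config.workers.numberOfWorkers)
console.log(oneWorker.toFixed(1), tenWorkers.toFixed(1)) // hours per day
```

Under these assumptions, one worker needs roughly 14 hours of rendering per day, while ten workers need roughly an hour and a half, provided the server has the CPU and memory to keep them busy.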

You mention you want to use the reports extension and ping for the report status. I'm not sure whether this is a must for you, or whether you could simply wait synchronously for the report output. In any case, this extension needs a single source of truth for its data, as you mention. You can start with the file system store and a volume shared by all jsreport instances, but synchronizing file system changes between jsreport instances can be time-consuming. If this turns out to be a problem, you can switch to a jsreport database driver of your choice. Every driver can also store report blobs in its database, so using the reports extension won't be a problem.
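
With a shared store, the status check can go through the load balancer to whichever instance answers. A minimal polling sketch follows. The status codes are my reading of the reports extension docs (200 while the report is still rendering, 201 with a Location header once it is stored) and should be verified against the jsreport version in use. The `waitForReport` helper and its options are hypothetical:

```javascript
// Polls a report status URL until the report is ready.
// `fetchFn` is injected so the sketch works with Node 18+ global fetch or a stub.
async function waitForReport (statusUrl, fetchFn, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchFn(statusUrl)
    // Assumption: 201 means finished, and Location points at the stored report.
    if (res.status === 201) return res.headers.get('location')
    // Assumption: 200 means still rendering; anything else is treated as failure.
    if (res.status !== 200) throw new Error(`Unexpected status ${res.status}`)
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
  throw new Error(`Report not ready after ${maxAttempts} attempts`)
}
```

On Node 18 or later it could be called as `waitForReport(statusUrl, fetch)`, where `statusUrl` is the Location header returned by the async render request.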

To answer the general question: if a single server isn't enough for your traffic, you have no other option than to scale to multiple servers and use a load balancing technology.

zhwatts

Hi Jan, your comments validate what I suspected. Yes, it’s a requirement to use the reports extension, as the originally requesting IIS app is not permitted to handle the output PDFs, due to content sensitivity.

Is it recommended to use Node's clustering option over using something like HA Proxy, as I described in my original post?

Is there a good guide/ outline on how to configure JS Report for clustering I can use as a reference?

Thanks for your help!

admin

> Hi Jan, your comments validate what I suspected. Yes, it’s a requirement to use the report extension, as the originally requesting IIS app is not permitted to handle the output PDFs, due to content sensitivity.

You may also consider using a jsreport script with an afterRender hook to, for example, upload the response buffer somewhere and clear it before returning to the caller.
https://jsreport.net/learn/scripts
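
A minimal sketch of such a script follows, assuming res.content is a Buffer holding the rendered PDF, as in recent jsreport versions. The `jobId` field and the `OUTPUT_DIR` value are hypothetical placeholders. Depending on the jsreport version and sandbox settings, requiring modules such as fs inside a script may need to be explicitly allowed in the configuration:

```javascript
const fs = require('fs').promises
const os = require('os')
const path = require('path')

// Placeholder target; in practice this might be the network share the
// polling service watches.
const OUTPUT_DIR = os.tmpdir()

// jsreport calls afterRender once the recipe has produced the output.
async function afterRender (req, res) {
  // `jobId` is a hypothetical field the calling app would send in the request data.
  const rawId = (req.data && req.data.jobId) || 'unknown-job'
  // basename() keeps a malformed id from escaping OUTPUT_DIR.
  const target = path.join(OUTPUT_DIR, `${path.basename(String(rawId))}.pdf`)
  await fs.writeFile(target, res.content)
  // Replace the PDF with an empty body so the caller never receives it.
  res.content = Buffer.from('')
}
```

With this approach the requesting app gets an empty response and never handles the PDF itself, which may remove the need to poll for status at all.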

> Is it recommended to use Node's clustering option over using something like HA Proxy, as I described in my original post?

As far as I know, Node.js clustering is only useful for distributing traffic on a single server: you fork one process into several. This won't help you with multi-server scaling. It also makes no sense to use it with jsreport, because jsreport already uses worker threads to distribute the load. The main entry point is designed to do very little work and never block, so there is nothing to gain by forking it. In other words, it makes no sense to cluster jsreport on a single server.

> Is there a good guide/outline on how to configure JS Report for clustering I can use as a reference?

If you mean a guide for deploying jsreport to multiple servers, then no. It is just like any other web app with a database, so any general approach should work.

zhwatts

Ok. In that case, what is the purpose of this clustering config as described here? https://jsreport.net/learn/fs-store?version=2.11.0#running-in-cluster

admin

It's not a clustering config, but rather a recommendation to increase some automatic internal intervals so that things run more reliably when multiple servers are writing to the same disk. It will also work without those changes.