Deterministic PDF Pagination in a Serverless Environment

I needed to paginate PDFs deterministically, which meant measuring content height before rendering. I couldn’t find a PDF layout library that exposed this — the ones I evaluated all measured during render. That constraint shaped every downstream decision: the tool I picked, the pipeline I built, and the infrastructure I deployed on.

The Constraint

I was working with tables where rows had variable-length content — wrapped text, nested data, mixed languages. PDF pages fill by height, not row count. A table with 50 rows doesn’t produce 5 pages of 10 rows. It produces however many pages the rendered content requires, and that depends on each row’s height after text wrapping, font metrics, and padding are applied.

I couldn’t predict how many rows would fit on a page without measuring each row’s rendered height first.

┌───────────────────────────┐
│ Header + Margins    120pt │
├───────────────────────────┤
│ Row 1 — 1 line       24pt │
│ Row 2 — 3 lines      56pt │
│ Row 3 — 1 line       24pt │
│                           │
│ Row 4 — 8 lines     152pt │
│                           │
│ Row 5 — 2 lines      40pt │
│ Row 6 — 5 lines      96pt │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│                           │
│ Remaining:          280pt │
│                           │
└───────────────────────────┘
Page (792pt)

Row 4 alone consumes 152pt — more than rows 1, 3, and 5 combined. A fixed rowsPerPage approach couldn’t account for this. The page either overflowed or ended up sparse.

The problem I kept hitting: the layout engines I tried measured during render. I needed height data before pagination, but they only produced it after layout was complete, so a measure-then-chunk pipeline wasn’t possible with them.

What I Tried First

Before finding something that worked, I tried four approaches. Each one failed for my use case because it tried to work around the constraint instead of satisfying it.

Fixed rowsPerPage — Assumed uniform row height. Variable content caused overflow or content loss.

Library pagination config — Truncated content at page boundaries. Layout became unstable when rows spanned pages.

Height estimation from character count — Ignored font metrics, padding, styled text, and nested content. Too inaccurate.

Two-pass render (render → measure → re-render) — Doubled CPU and memory usage per document, which was too expensive and risky to run at scale in a serverless environment.

What Eventually Worked: Measure First, Render Second

I realized that browsers expose getBoundingClientRect() — post-layout height measurement — and that if I controlled the rendering environment (same viewport, same fonts, same CSS), the measurements would be consistent enough to paginate against. Playwright provides headless access to it.

This made a measure-then-chunk pipeline possible:

flowchart LR
    A["Build HTML"] --> B["Load in Playwright"]
    B --> C["Measure row heights"]
    C --> D["Chunk by height budget"]
    D --> E["Render PDF"]

Step 1: Build a single HTML document with the full table structure — all rows, all columns, real content. I didn’t paginate yet.
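
A minimal sketch of this step in TypeScript. The Row shape, the CSS, and buildMeasurementHtml are illustrative assumptions rather than the exact code I ran; the one load-bearing detail is the data-row-index attribute, which step 2 queries against.

interface Row { cells: string[]; }

// Build the full, unpaginated document. Real content would need HTML-escaping;
// a fixed table width and explicit fonts keep later measurements reproducible.
function buildMeasurementHtml(rows: Row[], columns: string[]): string {
  const header = `<tr>${columns.map(c => `<th>${c}</th>`).join("")}</tr>`;
  const body = rows
    .map((row, i) =>
      `<tr data-row-index="${i}">${row.cells.map(c => `<td>${c}</td>`).join("")}</tr>`)
    .join("\n");
  return `<!DOCTYPE html>
<html><head><style>
  table { width: 612pt; border-collapse: collapse; }
  td, th { padding: 4pt 6pt; font: 10pt/1.4 sans-serif; }
</style></head>
<body><table><thead>${header}</thead><tbody>${body}</tbody></table></body></html>`;
}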

Step 2: Measure each row’s rendered height using the browser’s layout engine. Each row in the HTML has a data attribute, and I queried getBoundingClientRect().height for all of them in a single page.evaluate() call. The browser has already computed text wrapping, font metrics, and padding, so the measurement reflects what would actually render.
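
In code, this step looks roughly like the sketch below. chromium.launch, page.setContent, and page.evaluate are real Playwright APIs; the data-row-index selector is the assumption carried over from step 1. Note that getBoundingClientRect() reports CSS pixels (1pt = 4/3 px at the default 96dpi), so the heights need converting if the page budget is in points.

import { chromium } from "playwright";

async function measureRowHeights(html: string): Promise<number[]> {
  const browser = await chromium.launch();
  try {
    // Fixed viewport: the environment must match between measure and render.
    const page = await browser.newPage({ viewport: { width: 816, height: 1056 } });
    await page.setContent(html, { waitUntil: "networkidle" }); // wait for fonts/assets
    // One evaluate call for all rows: text wrapping, font metrics, and padding
    // are already applied, so these heights are what would actually render.
    return await page.evaluate(() =>
      Array.from(document.querySelectorAll("[data-row-index]"))
        .map(el => el.getBoundingClientRect().height)
    );
  } finally {
    await browser.close();
  }
}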

Step 3: Chunk rows by cumulative height budget. Each page has a fixed content area: page height minus header, footer, and margins. I walked through the measured heights, accumulating until the budget was exhausted, then started a new page.
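
The chunking itself is a short greedy walk. A sketch, assuming heights and budget share a unit; with the diagram’s numbers, a 792pt page minus 120pt of header and margins gives a 672pt budget.

// Returns pages as arrays of row indices. A row taller than the budget still
// gets placed (step 4 then splits it); the greedy walk never drops a row.
function chunkByHeight(heights: number[], pageBudget: number): number[][] {
  const pages: number[][] = [];
  let current: number[] = [];
  let used = 0;
  heights.forEach((h, i) => {
    if (current.length > 0 && used + h > pageBudget) {
      pages.push(current); // budget exhausted: close this page
      current = [];
      used = 0;
    }
    current.push(i);
    used += h;
  });
  if (current.length > 0) pages.push(current);
  return pages;
}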

Step 4: Handle rows that exceed the page budget. A single row might be taller than the content area — a cell with a large block of text, for example. Truncation would lose content. Instead, I let the row continue onto the next page with a visual indicator, and repeated the header. This was the trickiest part of the implementation — detecting oversized rows, splitting their content across pages in HTML, and maintaining consistent styling — and probably deserves its own write-up.
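
The split decision itself can be sketched, with the caveat that this hides all the hard parts (the HTML surgery, repeated headers, continuation styling). It also assumes a uniform line height, which real styled content does not have.

// Split an oversized row's height into per-page fragments, keeping whole
// lines together so text is never clipped mid-line. lineHeight is assumed uniform.
function splitOversizedRow(rowHeight: number, pageBudget: number, lineHeight: number): number[] {
  const fragments: number[] = [];
  let remaining = rowHeight;
  while (remaining > pageBudget) {
    const fit = Math.floor(pageBudget / lineHeight) * lineHeight;
    if (fit <= 0) break; // a single line taller than the page: give up and overflow
    fragments.push(fit);
    remaining -= fit;
  }
  fragments.push(remaining);
  return fragments;
}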

Step 5: Render the paginated HTML to PDF using Playwright’s page.pdf(). Each page is now a self-contained HTML section with the correct rows for its height budget.
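
A sketch of the final render, again using Playwright’s real page.pdf() API. The Letter format (612 × 792pt) matches the page size in the diagram above; the margins, and the assumption that each chunk ends with a CSS page break, are mine.

import { chromium } from "playwright";

async function renderPdf(paginatedHtml: string): Promise<Buffer> {
  const browser = await chromium.launch(); // page.pdf() requires headless Chromium
  try {
    const page = await browser.newPage();
    await page.setContent(paginatedHtml, { waitUntil: "networkidle" });
    return await page.pdf({
      format: "Letter",       // 612 x 792pt, the budget used during chunking
      printBackground: true,
      margin: { top: "36pt", bottom: "36pt", left: "36pt", right: "36pt" },
    });
  } finally {
    await browser.close();
  }
}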

Why Browser-First Over Library-First

This comparison reflects the libraries I evaluated — other PDF tooling may handle these differently.

Pre-render height measurement — The libraries I tried computed height during render, so it wasn’t available beforehand. Browsers expose getBoundingClientRect() after layout, which let me measure before committing to pagination.

Deterministic pagination — The libraries I evaluated made pagination decisions at render time that I couldn’t control. The browser approach let me measure first, then chunk — pagination became my logic, not the library’s.

Text wrapping accuracy — I found text wrapping varied across the libraries I tried. Browser wrapping is native and matched what would display in a web page.

Font metric handling — The libraries I looked at required bundled fonts and their own metric support. Browsers handle this natively through CSS @font-face.

Production Reality: Cold Starts and Async Design

Playwright solved the measurement problem, but introduced a new one: cold start time in serverless environments.

Chromium needs to start before any measurement can happen. In a warm Lambda invocation, this is fast. On a cold start, launching the browser alone can take several seconds, and API Gateway enforces a 29-second request timeout. Add measurement and rendering on top of the cold start, and a synchronous request can exceed that limit.

I ran into this directly: when a cold invocation blew past the request timeout, synchronous PDF generation failed on the first attempt. Retries can amplify the problem if the system isn’t idempotent — each retry spawns another cold start.

I decided to move rendering off the request path.

sequenceDiagram
    participant Client
    participant API
    participant Queue
    participant Worker
    participant S3

    Client->>API: POST /jobs {documentId, params}
    API->>Queue: Enqueue job
    API-->>Client: 202 Accepted {jobId}

    Queue->>Worker: Deliver job
    Worker->>Worker: Launch Playwright, measure, render
    Worker->>S3: Upload PDF
    Worker->>API: Mark job complete

    Client->>API: GET /jobs/{jobId}
    API-->>Client: {status: "complete", downloadUrl}
    Client->>S3: Download PDF

The client submits a job, receives a job ID immediately, and polls for completion. Cold start time is decoupled from the request path — the user sees a brief wait during polling rather than a timeout error.
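
From the client’s side, the whole flow reduces to submit-then-poll. The paths and response shapes below mirror the sequence diagram; everything else (polling interval, error handling) is illustrative.

async function generatePdf(documentId: string, params: object): Promise<string> {
  const submitted = await fetch("/jobs", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ documentId, params }),
  });
  const { jobId } = await submitted.json(); // 202 Accepted: rendering happens off the request path

  // Poll until the worker finishes; a brief wait here replaces a gateway timeout.
  while (true) {
    const job = await (await fetch(`/jobs/${jobId}`)).json();
    if (job.status === "complete") return job.downloadUrl;
    if (job.status === "failed") throw new Error("PDF generation failed");
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
}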

Hardening Layers

Each layer assumes the layer below it can fail.

Idempotency. A fingerprint derived from the job parameters is stored alongside the job record. If a duplicate request arrives, the system returns the existing job instead of creating a new one. I used DynamoDB for the job store, so this is enforced with a conditional write.
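
A sketch of that conditional write with the AWS SDK v3 DynamoDB client. The table name, key schema, and fingerprinting are assumptions (and JSON.stringify only yields a stable fingerprint if key order is canonicalized first); attribute_not_exists is the real DynamoDB mechanism.

import {
  DynamoDBClient,
  PutItemCommand,
  ConditionalCheckFailedException,
} from "@aws-sdk/client-dynamodb";
import { createHash } from "node:crypto";

const db = new DynamoDBClient({});

async function createJob(params: object): Promise<string> {
  // Same parameters produce the same fingerprint, hence the same job id.
  const fingerprint = createHash("sha256").update(JSON.stringify(params)).digest("hex");
  try {
    await db.send(new PutItemCommand({
      TableName: "jobs",
      Item: { jobId: { S: fingerprint }, status: { S: "pending" } },
      ConditionExpression: "attribute_not_exists(jobId)", // only the first writer wins
    }));
  } catch (err) {
    if (!(err instanceof ConditionalCheckFailedException)) throw err;
    // Duplicate request: the job already exists, so return it rather than create another.
  }
  return fingerprint;
}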

Atomic state transitions. Job status changes use conditional expressions — no read-modify-write cycles. A claim operation only succeeds if the job is still in pending state. If two workers try to claim the same job, the conditional write ensures only one succeeds.
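
A sketch of the claim, assuming the same DynamoDB job table as above. The conditional update succeeds only while the job is still pending, so if two workers race, exactly one wins; the loser gets a ConditionalCheckFailedException and moves on.

import {
  DynamoDBClient,
  UpdateItemCommand,
  ConditionalCheckFailedException,
} from "@aws-sdk/client-dynamodb";

const db = new DynamoDBClient({});

async function claimJob(jobId: string, workerId: string, leaseSeconds = 300): Promise<boolean> {
  try {
    await db.send(new UpdateItemCommand({
      TableName: "jobs",
      Key: { jobId: { S: jobId } },
      // "#s" aliases "status", a DynamoDB reserved word.
      UpdateExpression: "SET #s = :claimed, workerId = :w, leaseExpiry = :exp",
      ConditionExpression: "#s = :pending", // the check and the write are one operation
      ExpressionAttributeNames: { "#s": "status" },
      ExpressionAttributeValues: {
        ":claimed": { S: "claimed" },
        ":pending": { S: "pending" },
        ":w": { S: workerId },
        ":exp": { N: String(Math.floor(Date.now() / 1000) + leaseSeconds) },
      },
    }));
    return true; // this worker owns the job
  } catch (err) {
    if (err instanceof ConditionalCheckFailedException) return false; // lost the race
    throw err;
  }
}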

Terminal state immutability. Once a job reaches complete or failed, no further state transitions are allowed. This prevents stale workers from overwriting valid results.

stateDiagram-v2
    [*] --> pending
    pending --> claimed : Worker claims (conditional write)
    claimed --> processing : Worker starts rendering
    claimed --> pending : Lease expires (no heartbeat)
    processing --> complete : PDF uploaded
    processing --> failed : Unrecoverable error
    complete --> [*]
    failed --> [*]

Lease-based concurrency control. A claimed job includes a lease expiry timestamp. If the worker crashes or stalls, the lease expires and the job returns to pending for another worker to pick up.
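
Two small extensions to the claim sketch above make the lease work, again as assumptions on the same table. Widening the claim condition makes an expired lease behave as if the job had returned to pending, and a live worker periodically extends its lease.

import { DynamoDBClient, UpdateItemCommand } from "@aws-sdk/client-dynamodb";

const db = new DynamoDBClient({});

// In claimJob, widen the condition so an expired lease is reclaimable:
//   ConditionExpression: "#s = :pending OR (#s = :claimed AND leaseExpiry < :now)"
// with ":now" bound to the current epoch seconds.

// A healthy worker heartbeats to extend its lease; a crashed one simply stops,
// the lease expires, and the job becomes claimable by another worker.
async function heartbeat(jobId: string, workerId: string, leaseSeconds = 300): Promise<void> {
  await db.send(new UpdateItemCommand({
    TableName: "jobs",
    Key: { jobId: { S: jobId } },
    UpdateExpression: "SET leaseExpiry = :exp",
    ConditionExpression: "workerId = :w", // only the current owner may extend
    ExpressionAttributeValues: {
      ":exp": { N: String(Math.floor(Date.now() / 1000) + leaseSeconds) },
      ":w": { S: workerId },
    },
  }));
}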

Checkpoints. For multi-step rendering (measure → chunk → render), the worker writes intermediate state to S3. If it fails mid-render, the next attempt can resume from the last checkpoint rather than restarting.
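
A sketch of a checkpoint write, assuming S3 via the AWS SDK v3 client; the bucket name and key layout are illustrative.

import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Persist intermediate state after each step; on retry, the worker looks for
// the latest checkpoint and resumes from there instead of re-measuring everything.
async function saveCheckpoint(
  jobId: string,
  step: "measured" | "chunked",
  data: object,
): Promise<void> {
  await s3.send(new PutObjectCommand({
    Bucket: "pdf-job-artifacts",
    Key: `checkpoints/${jobId}/${step}.json`,
    Body: JSON.stringify(data),
    ContentType: "application/json",
  }));
}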

Bounded retries. Infrastructure failures (timeouts, throttling) are retried with backoff. Business failures (invalid input, unsupported format) are terminal immediately. The distinction prevents infinite retry loops.
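
The distinction is mechanical enough to sketch. The error class and limits are assumptions; the shape is what matters: infrastructure failures retry with exponential backoff up to a bound, business failures are rethrown immediately.

class BusinessError extends Error {} // invalid input, unsupported format: retrying cannot help

async function withBoundedRetries<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (err instanceof BusinessError) throw err; // terminal immediately
      if (attempt >= maxAttempts) throw err;       // bounded: no infinite retry loop
      const backoffMs = 1000 * 2 ** (attempt - 1); // 1s, 2s, 4s, ...
      await new Promise(resolve => setTimeout(resolve, backoffMs));
    }
  }
}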

Trade-offs I’m Still Thinking About

Operational complexity. Playwright plus an async job system is substantially heavier than a library call. I reached for this approach because my documents had highly variable content and content loss was unacceptable. Others might have different constraints.

Cold start cost. The async design hides cold start latency from users but doesn’t eliminate it. First-job latency is still high. Pre-warming strategies help but add their own complexity.

Unbounded input. Input is effectively unbounded — a cell could contain thousands of words, a table could have thousands of rows. The measurement and chunking steps handle this correctly, but not necessarily efficiently. At some document size, the worker timeout becomes the binding constraint instead of the page budget.

When I wouldn’t use this. If I had content that was fixed-height — every row a single line, same font, same padding — a simple rowsPerPage approach would work fine. If the page count was known in advance, no measurement would be needed. The simpler tool wins when the constraint doesn’t apply.

The constraint I started with — needing height data before pagination — led me through this entire stack. When the constraint changes, the stack can change with it.

2026-02-06