Converting HTML and CSS to PDF in PHP: Core Concepts, Limits, and Practical Approaches

php html css pdf pdf-generation

Question

I have an HTML document, not XHTML, that renders correctly in Firefox 3 and IE 7. It uses fairly basic CSS, and the page displays as expected in a browser.

I now need to convert that HTML into a PDF.

I have tried several tools:

DOMPDF: It had major problems with tables. After simplifying some large nested tables, it improved slightly, but before that it was consuming up to 128 MB of memory and then failing, which matches the memory limit in php.ini. Even after simplification, it still produces broken table layouts and does not reliably include images. The tables are relatively basic and mostly use border styles for lines.
HTML2PDF / HTML2PS: These worked somewhat better. Some images rendered correctly, including images from Google Chart URLs, and table formatting was better. However, the conversion still failed with unknown node_type() errors that I could not fully diagnose.
htmldoc: This handled simple HTML, but it has very limited CSS support. It appears to require most formatting to be done directly in HTML, which makes it unsuitable for my needs.

I also tried a Windows application called Html2Pdf Pilot, which did a fairly good job. However, I need a solution that runs on Linux and ideally can generate PDFs on demand through PHP on the web server.

What is the main issue I am running into when converting HTML and CSS to PDF, and what practical approaches are usually used to solve it?

Short Answer

The main issue is that PDF generators do not render like browsers: most support only part of HTML and CSS, so complex tables and remote images often fail. This page explains why converting browser-rendered HTML and CSS to PDF is difficult, especially in PHP, and what practical strategies developers use to generate stable PDFs on Linux servers.

Concept

HTML in a browser and PDF output are different rendering environments.

A web browser like Firefox or Chrome has a full layout engine that understands a large part of HTML, CSS, fonts, images, table layout rules, and modern rendering behavior. A PDF library usually does not use the same engine. Instead, it often has its own simplified HTML/CSS parser and layout system.

That is the core problem.

Why this matters

When developers say, “My HTML looks fine in the browser, so why does the PDF look broken?”, the answer is usually:

the PDF tool supports only a subset of HTML
the CSS engine is incomplete
table layout is harder than it looks
nested tables and large documents use a lot of memory
remote assets like images, fonts, or charts may fail to load
malformed HTML can break strict parsers

Common limitations of HTML-to-PDF tools

Most server-side HTML-to-PDF tools struggle with one or more of these:

complex or deeply nested tables
floats and positioning
unsupported CSS properties
page breaks inside tables
remote image loading
font embedding
invalid HTML structure
memory usage on large documents

Why PDF generation is different from browser rendering

Browsers render for screens with flexible sizing and dynamic reflow. PDFs render for fixed pages.

That means the converter must answer hard layout questions such as:

Where should a table row split across pages?
What happens if an image is taller than the remaining page space?
How are margins, headers, and footers applied?
Which CSS rules should win when pagination happens?

These are not trivial problems, and many libraries solve them only partially.
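
One workaround, sketched here under stated assumptions, is to make the row-splitting decision yourself instead of leaving it to the converter: cut the rows into fixed-size chunks and emit one table per page. The function name `paginateTableRows` and the rows-per-page figure are hypothetical, and the sketch assumes the chosen library honors the CSS `page-break-after` property, which should be checked against its documentation.

```php
<?php
/**
 * Hypothetical helper: renders rows as one table per page so that no row
 * is split across a page boundary.
 *
 * @param string[]   $headers     Column headings repeated on every page.
 * @param string[][] $rows        Table body rows.
 * @param int        $rowsPerPage Assumed number of rows that fit on one page.
 */
function paginateTableRows(array $headers, array $rows, int $rowsPerPage): string
{
    if ($rowsPerPage < 1) {
        throw new InvalidArgumentException('rowsPerPage must be at least 1.');
    }

    // Renders one <tr> whose cells all use the given tag (th or td).
    $renderRow = function (array $cells, string $tag): string {
        $html = '';
        foreach ($cells as $cell) {
            $html .= "<$tag>" . htmlspecialchars((string) $cell) . "</$tag>";
        }
        return "<tr>$html</tr>";
    };

    $pages = array_chunk($rows, $rowsPerPage);
    $lastIndex = count($pages) - 1;
    $html = '';

    foreach ($pages as $index => $pageRows) {
        // Force a page break after every table except the last one.
        $style = $index < $lastIndex ? ' style="page-break-after: always;"' : '';
        $html .= "<table$style>" . $renderRow($headers, 'th');
        foreach ($pageRows as $row) {
            $html .= $renderRow($row, 'td');
        }
        $html .= '</table>';
    }

    return $html;
}

$html = paginateTableRows(['Item'], [['A'], ['B'], ['C']], 2);
```

The rows-per-page value has to be found by testing against real output, since it depends on font size, padding, and page margins.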

Mental Model

Think of a browser as a full kitchen with every appliance, while many PDF libraries are more like a small camping stove.

Both can cook food, but one can handle almost anything and the other works best with simple, controlled recipes.

Your HTML page may look perfect in the browser because the browser has:

a powerful CSS engine
advanced table layout rules
flexible image handling
mature error recovery for imperfect HTML

A PDF converter often has:

limited CSS support
strict parsing rules
weaker table handling
a tight memory budget, such as PHP's memory_limit

So the fix is often not “make the PDF tool act like a browser,” but rather “prepare simpler input that the PDF tool can handle reliably.”

Syntax and Examples

Basic idea in PHP

A common pattern is:

build an HTML string
pass it to a PDF library
render and stream or save the PDF

Example using a generic PHP-style flow:

<?php
$html = '
  <h1>Invoice</h1>
  <table border="1" cellpadding="6" cellspacing="0">
    <tr>
      <th>Item</th>
      <th>Price</th>
    </tr>
    <tr>
      <td>Book</td>
      <td>$10</td>
    </tr>
  </table>
';

// Pseudocode: exact API depends on the library
$pdf = new PdfRenderer();
$pdf->loadHtml($html);
$pdf->render();
$pdf->output('invoice.pdf');

Beginner-friendly example: keep HTML simple

<?php
$html = '
<!DOCTYPE html>
<html>
<head>
  <style>
    body { font-family: Arial, sans-serif; font-size: 14px; }
    h1 { color: #333; }
    table { width: 100%; border-collapse: collapse; }
    th, td { border: 1px solid #999; padding: 8px; text-align: left; }
  </style>
</head>
<body>
  <h1>Order Summary</h1>
  <table>
    <tr>
      <th>Product</th>
      <th>Quantity</th>
    </tr>
    <tr>
      <td>Pen</td>
      <td>3</td>
    </tr>
  </table>
</body>
</html>';

This kind of HTML is more likely to work than a page with deeply nested tables, floats and positioning, or remote images and fonts.

Step by Step Execution

Trace example

Consider this HTML:

<?php
$html = '
<html>
<head>
  <style>
    table { border-collapse: collapse; width: 100%; }
    td { border: 1px solid black; padding: 4px; }
  </style>
</head>
<body>
  <table>
    <tr><td>Name</td><td>Alice</td></tr>
    <tr><td>Role</td><td>Admin</td></tr>
  </table>
</body>
</html>';

Now imagine a PDF library processes it.

Step 1: Parse HTML

The library reads the tags:

<html>
<head>
<style>
<body>
<table>
<tr> and <td>

If the HTML is malformed, some libraries fail here or silently ignore broken parts.
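
Because a strict parser may fail at this step, it can help to catch obviously unbalanced markup before handing it to the PDF library. The following is a minimal sketch, not a real HTML parser: `findUnbalancedTags` is a hypothetical helper that only tracks opening and closing tags with a stack, and it will also flag HTML that legitimately omits optional closing tags such as `</p>` or `</li>`.

```php
<?php
/**
 * Hypothetical helper: a rough tag-balance check, not a full HTML parser.
 *
 * @return string[] Human-readable descriptions of the problems found.
 */
function findUnbalancedTags(string $html): array
{
    // Elements that never take a closing tag in HTML.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
             'input', 'link', 'meta', 'source', 'track', 'wbr'];
    $stack = [];
    $problems = [];

    // The pattern requires a letter after "<" or "</", so comments and the
    // doctype declaration are skipped.
    preg_match_all('#<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>#', $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $match) {
        $name = strtolower($match[2]);
        if (in_array($name, $void, true)) {
            continue;
        }
        if ($match[1] === '') {
            $stack[] = $name;           // opening tag
        } elseif (end($stack) === $name) {
            array_pop($stack);          // closing tag matches the innermost open tag
        } else {
            $problems[] = "unexpected </$name>";
        }
    }

    foreach ($stack as $name) {
        $problems[] = "unclosed <$name>";
    }

    return $problems;
}

$clean = findUnbalancedTags('<table><tr><td>Name</td><td>Alice</td></tr></table>');
$broken = findUnbalancedTags('<table><tr><td>Alice</tr></table>');
```

An empty result does not prove the markup is valid; it only rules out the simplest class of mistakes.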

Step 2: Parse CSS

The library reads:

table { border-collapse: collapse; width: 100%; }
td { border: 1px solid black; padding: 4px; }

Real World Use Cases

HTML-to-PDF conversion is common in real applications.

Invoices and receipts

Apps often generate invoice PDFs from order data. These usually work well because the layout is controlled and simple.

Reports

Internal systems generate downloadable reports with tables, totals, and charts. This is where complexity increases and layout bugs often appear.

Certificates and tickets

These often use fixed layouts, logos, and a few styled blocks. They are usually easier than table-heavy reports.

Export features in admin dashboards

A dashboard may show rich HTML in the browser, but the PDF export often uses a separate simplified template for reliability.

Contracts and printable forms

These require stable pagination, consistent fonts, and predictable spacing. Developers often avoid advanced CSS and use print-specific templates.

Real Codebase Usage

In real projects, developers rarely send a full interactive web page directly into a PDF engine and hope it matches the browser exactly.

Common patterns

Separate PDF template: Create a dedicated view for PDF output.
Guard clauses: Check that required data exists before rendering.
Validation: Ensure image paths, fonts, and variables are valid.
Preprocessing: Convert dynamic charts to image files first.
Early simplification: Flatten nested tables or replace them with simpler structures.
Error handling: Log failed asset loads and rendering errors.
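
The validation and preprocessing patterns above can be sketched as a scan for image sources that point at remote URLs, so they can be downloaded once and replaced with local files before rendering. `findRemoteImageSources` is a hypothetical helper, the regular expression is a rough scan rather than a full HTML parser, and the example.com URLs are placeholders.

```php
<?php
/**
 * Hypothetical helper: lists <img> sources that point at remote URLs.
 *
 * @return string[]
 */
function findRemoteImageSources(string $html): array
{
    // Capture the quoted value of each src attribute inside an <img> tag.
    preg_match_all('/<img\b[^>]*\bsrc\s*=\s*(["\'])(.*?)\1/i', $html, $matches);

    $remote = [];
    foreach ($matches[2] as $src) {
        // Treat http://, https://, and protocol-relative //host URLs as remote.
        if (preg_match('#^(https?:)?//#i', $src)) {
            $remote[] = $src;
        }
    }

    return $remote;
}

$html = '<img src="https://chart.example.com/c.png">'
    . '<img src="images/logo.png">'
    . '<img alt="x" src="//cdn.example.com/a.png">';

$remote = findRemoteImageSources($html);
```

Logging the returned list before rendering makes a missing image much easier to trace than a silent gap in the finished PDF.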

Example pattern in PHP

<?php
function generateInvoicePdf(array $order)
{
    if (empty($order['items'])) {
        throw new InvalidArgumentException('Order must contain at least one item.');
    }

    $html = buildInvoiceHtml($order); // dedicated PDF template

    $pdf = new PdfRenderer();
    $pdf->loadHtml($html);

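    try {
        $pdf->render();
    } catch (Exception $e) {
        // Log the failure, then rethrow so the caller can decide how to respond.
        error_log('PDF rendering failed: ' . $e->getMessage());
        throw $e;
    }

    return $pdf->output();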
}

Common Mistakes

1. Expecting browser-perfect rendering

A browser and a PDF library do not have the same rendering engine.

Avoid this by: designing simpler PDF-specific HTML.

2. Using complex nested tables for layout

Deeply nested tables are hard to paginate and can consume a lot of memory.

Broken approach:

<table>
  <tr>
    <td>
      <table>
        <tr>
          <td>
            <table>
              <tr><td>Too deeply nested</td></tr>
            </table>
          </td>
        </tr>
      </table>
    </td>
  </tr>
</table>

Better: use a flatter structure.
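
One way to flatten grouped data, sketched under assumptions, is to render each group as a heading row followed by its item rows inside a single table. `renderFlatTable` and the sample data are hypothetical, and the sketch assumes the PDF library handles a basic `colspan` attribute.

```php
<?php
/**
 * Hypothetical helper: renders grouped data as a single flat table.
 * Each group becomes a heading row followed by its item rows, so no
 * table is ever nested inside another table's cell.
 *
 * @param array<string, array<int, string[]>> $groups Group name => [label, value] pairs.
 */
function renderFlatTable(array $groups): string
{
    $html = '<table>';

    foreach ($groups as $groupName => $items) {
        $html .= '<tr><th colspan="2">' . htmlspecialchars((string) $groupName) . '</th></tr>';
        foreach ($items as [$label, $value]) {
            $html .= '<tr><td>' . htmlspecialchars($label) . '</td><td>'
                . htmlspecialchars($value) . '</td></tr>';
        }
    }

    return $html . '</table>';
}

$html = renderFlatTable([
    'Customer' => [['Name', 'Alice'], ['Role', 'Admin']],
    'Order'    => [['Item', 'Book'], ['Price', '$10']],
]);
```

The result contains one table with six rows, which gives the converter a single, shallow structure to paginate.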

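3. Relying on unsupported CSS

Many PDF libraries ignore or mishandle CSS they do not support, such as floats and positioning, so rules that work in the browser may silently do nothing in the PDF.

Avoid this by: starting with basic CSS and adding rules gradually, checking the PDF output after each change.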
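
As a rough pre-flight check, a stylesheet can be scanned for properties outside a list the chosen library is known to handle. This is a sketch only: the allowlist below is an assumption made for illustration and does not describe any particular library, and `findUnsupportedCssProperties` is a hypothetical helper that ignores at-rules and comments.

```php
<?php
// Assumed allowlist for illustration only; build the real one from the
// documentation of the PDF library in use.
const SUPPORTED_CSS_PROPERTIES = [
    'font-family', 'font-size', 'color', 'width',
    'border', 'border-collapse', 'padding', 'text-align',
];

/**
 * Hypothetical helper: returns the property names used in $css that are
 * not on the allowlist.
 *
 * @return string[]
 */
function findUnsupportedCssProperties(string $css, array $supported = SUPPORTED_CSS_PROPERTIES): array
{
    $unsupported = [];

    // Capture the declarations between each pair of braces.
    preg_match_all('/\{([^}]*)\}/', $css, $blocks);

    foreach ($blocks[1] as $block) {
        foreach (explode(';', $block) as $declaration) {
            $parts = explode(':', $declaration, 2);
            if (count($parts) !== 2) {
                continue; // empty fragment after the last semicolon
            }
            $property = strtolower(trim($parts[0]));
            if ($property !== '' && !in_array($property, $supported, true)) {
                $unsupported[$property] = true; // array key de-duplicates
            }
        }
    }

    return array_keys($unsupported);
}

$css = 'table { width: 100%; border-collapse: collapse; } '
    . '.box { float: left; position: absolute; padding: 4px; }';

$flagged = findUnsupportedCssProperties($css);
```

A flagged property is a prompt to test that rule in the PDF output, not proof that it will fail.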

Comparisons

Common approaches to PDF generation

Approach	How it works	Strengths	Weaknesses	Best for
HTML-to-PDF library in PHP	Parses HTML/CSS and renders PDF	Easy to integrate with PHP apps	Partial CSS support, table issues, memory problems	Simple invoices, reports, receipts
Browser-based rendering	Uses a real browser engine to print to PDF	Better HTML/CSS accuracy	Heavier infrastructure	Complex modern layouts
Direct PDF drawing	Build PDF with coordinates and drawing commands	Very reliable and precise	More manual work	Highly structured documents
Separate print template	Custom HTML built specifically for PDF	More predictable results	Requires extra template maintenance

Cheat Sheet

Quick reference

Browser rendering and PDF rendering are not the same.
Most HTML-to-PDF tools support only part of HTML and CSS.
Keep PDF HTML simple and well-formed.
Prefer a dedicated PDF template.
Avoid deep nested tables.
Test remote images and fonts carefully.
Watch PHP memory limits on large documents.

Good practices

Use simple HTML
Use embedded or inline CSS
Use valid markup
Use local assets when possible
Keep tables flat
Test page breaks early

Warning signs

layout looks perfect in browser but broken in PDF
large tables cause crashes
images randomly disappear
CSS rules are ignored
render works only for small inputs

Safer workflow

Start with minimal HTML
Verify text and tables render
Add CSS gradually
Test images and fonts
Create a separate PDF template if needed
Add error logging and memory monitoring
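
The memory monitoring step might look like the sketch below, which compares peak usage with the configured `memory_limit` after a render. The helper names and the 80% threshold are assumptions chosen for illustration.

```php
<?php
/**
 * Hypothetical helper: converts a php.ini size such as "128M" to bytes.
 * Returns -1 for "-1", which PHP uses to mean "no limit".
 */
function iniSizeToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return -1;
    }

    $number = (int) $value; // the cast reads the leading digits only
    switch (strtoupper(substr($value, -1))) {
        case 'G':
            return $number * 1024 ** 3;
        case 'M':
            return $number * 1024 ** 2;
        case 'K':
            return $number * 1024;
        default:
            return $number;
    }
}

/**
 * Returns true when peak usage has reached the given share of the limit.
 */
function isNearMemoryLimit(int $peakBytes, int $limitBytes, float $threshold = 0.8): bool
{
    if ($limitBytes <= 0) {
        return false; // no limit configured
    }
    return $peakBytes >= $limitBytes * $threshold;
}

// Call this after the render step of whichever PDF library is in use.
$limit = iniSizeToBytes((string) ini_get('memory_limit'));
$peak = memory_get_peak_usage(true);

if (isNearMemoryLimit($peak, $limit)) {
    error_log(sprintf('PDF render peaked at %d of %d bytes', $peak, $limit));
}
```

Logging a warning before the limit is reached gives earlier notice than waiting for a fatal out-of-memory error on a larger document.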

Rule of thumb

If the document is business-critical, do not depend on full browser-like behavior from a lightweight HTML-to-PDF parser.

FAQ

Why does HTML look correct in the browser but break in PDF?

Because browsers use full rendering engines, while many PDF libraries support only a subset of HTML and CSS.

Are tables especially difficult in HTML-to-PDF conversion?

Yes. Table sizing, nested structures, and page splitting are common failure points.

Should I use the same HTML for the website and the PDF?

Usually not. A dedicated PDF template is often simpler and more reliable.

Why do remote images sometimes not appear in generated PDFs?

The server may not be able to fetch them because of network settings, SSL issues, disabled remote access, or library limitations.

Does valid HTML matter more for PDF generation?

Yes. Browsers can recover from invalid markup more easily than many PDF parsers can.

Can increasing PHP memory solve PDF rendering problems?

Sometimes it helps, but it does not fix unsupported CSS or broken table layout logic.

What kind of documents work best with HTML-to-PDF tools?

Simple invoices, receipts, certificates, and controlled report layouts usually work best.

What is the most practical long-term solution?

Use a PDF-specific template with simplified HTML, limited CSS, reliable assets, and good error handling.

Related Concepts

HTML parsing — PDF tools must parse your markup before they can render it.
CSS support — Limited CSS support is one of the main causes of unexpected PDF output.
Table layout — Tables are a frequent source of rendering and pagination problems.
Pagination — PDFs must split content across fixed-size pages.
PHP memory limits — Large HTML documents and images can exceed available memory.
Template rendering — Many applications generate PDF HTML from dedicated templates.
Print stylesheets — Print-focused styling helps produce cleaner PDF output.
Server-side asset loading — Images, fonts, and remote resources must be accessible from the server.
Error handling in PHP — Logging and exception handling help diagnose render failures.
Document generation — PDF creation is part of a larger category of generated outputs like invoices and reports.

Mini Project

Description

Build a simple PHP PDF export feature for an order summary. The purpose is to practice creating a PDF-friendly HTML template instead of reusing a full website page. This demonstrates how keeping markup simple improves reliability when converting HTML and CSS to PDF.

Goal

Create a PHP script that builds a clean HTML order summary and sends it to a PDF renderer using a simple, table-based layout.

Requirements

Create a dedicated HTML template for the PDF output
Include a title and a table of at least three order items
Use simple embedded CSS for fonts, borders, and spacing
Avoid nested tables and remote images
Render the HTML through a PDF generator class or placeholder API
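
One possible starting point for these requirements is sketched below. Everything named here is hypothetical: `PdfRendererInterface` stands in for whichever PDF library is chosen, and `PassthroughRenderer` is a placeholder that returns the HTML unchanged so the script runs without a PDF library installed.

```php
<?php
// Hypothetical contract standing in for a real PDF library.
interface PdfRendererInterface
{
    public function renderToString(string $html): string;
}

// Placeholder: returns the HTML unchanged instead of real PDF bytes.
class PassthroughRenderer implements PdfRendererInterface
{
    public function renderToString(string $html): string
    {
        return $html;
    }
}

/**
 * Builds the dedicated PDF template: one flat table, embedded CSS,
 * no nested tables, and no remote images.
 *
 * @param array<int, array{name: string, quantity: int}> $items
 */
function buildOrderSummaryHtml(string $title, array $items): string
{
    $rows = '';
    foreach ($items as $item) {
        $rows .= '<tr><td>' . htmlspecialchars($item['name']) . '</td><td>'
            . (int) $item['quantity'] . '</td></tr>';
    }

    return '<!DOCTYPE html><html><head><style>'
        . 'body { font-family: Arial, sans-serif; font-size: 14px; }'
        . 'table { width: 100%; border-collapse: collapse; }'
        . 'th, td { border: 1px solid #999; padding: 8px; text-align: left; }'
        . '</style></head><body><h1>' . htmlspecialchars($title) . '</h1>'
        . '<table><tr><th>Product</th><th>Quantity</th></tr>' . $rows . '</table>'
        . '</body></html>';
}

function exportOrderSummary(PdfRendererInterface $renderer, string $title, array $items): string
{
    // Guard clause: refuse to render an empty order.
    if (count($items) === 0) {
        throw new InvalidArgumentException('Order must contain at least one item.');
    }

    return $renderer->renderToString(buildOrderSummaryHtml($title, $items));
}

$output = exportOrderSummary(new PassthroughRenderer(), 'Order Summary', [
    ['name' => 'Pen', 'quantity' => 3],
    ['name' => 'Notebook', 'quantity' => 2],
    ['name' => 'Stapler', 'quantity' => 1],
]);
```

Swapping in a real library then means writing one small class that implements the interface, while the template and guard clause stay unchanged.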

Separate web view vs separate PDF view

Option	Pros	Cons
Reuse the same HTML as the website	Less duplicated markup	Often unreliable for PDF
Build a dedicated PDF template	Stable and easier to debug	Extra development effort

HTML tables vs direct PDF layout

Technique	Good for	Risk
HTML tables	Tabular business data	Pagination and nesting issues
Direct PDF positioning	Fixed forms and exact layouts	Harder to maintain
