Converting HTML and CSS to PDF in PHP: Core Concepts, Limits, and Practical Approaches
Question
I have an HTML document, not XHTML, that renders correctly in Firefox 3 and IE 7. It uses fairly basic CSS, and the page displays as expected in a browser.
I now need to convert that HTML into a PDF.
I have tried several tools:
- DOMPDF: It had major problems with tables. After I simplified some large nested tables, it improved slightly, but before that it was consuming up to 128 MB of memory and then failing, which matches the memory limit in php.ini. Even after simplification, it still produces broken table layouts and does not reliably include images. The tables are relatively basic and mostly use border styles for lines.
- HTML2PDF / HTML2PS: These worked somewhat better. Some images rendered correctly, including images from Google Chart URLs, and table formatting was better. However, the conversion still failed with unknown node_type() errors that I could not fully diagnose.
- htmldoc: This handled simple HTML, but it has very limited CSS support. It appears to require most formatting to be done directly in HTML, which makes it unsuitable for my needs.
I also tried a Windows application called Html2Pdf Pilot, which did a fairly good job. However, I need a solution that runs on Linux and ideally can generate PDFs on demand through PHP on the web server.
What is the main issue I am running into when converting HTML and CSS to PDF, and what practical approaches are usually used to solve it?
Short Answer
By the end of this page, you will understand why converting browser-rendered HTML and CSS to PDF is difficult, especially in PHP. You will learn that PDF generators usually support only part of HTML and CSS, why complex tables and remote images often fail, and what practical strategies developers use to generate stable PDFs on Linux servers.
Concept
HTML in a browser and PDF output are different rendering environments.
A web browser like Firefox or Chrome has a full layout engine that understands a large part of HTML, CSS, fonts, images, table layout rules, and modern rendering behavior. A PDF library usually does not use the same engine. Instead, it often has its own simplified HTML/CSS parser and layout system.
That is the core problem.
Why this matters
When developers say, “My HTML looks fine in the browser, so why does the PDF look broken?”, the answer is usually:
- the PDF tool supports only a subset of HTML
- the CSS engine is incomplete
- table layout is harder than it looks
- nested tables and large documents use a lot of memory
- remote assets like images, fonts, or charts may fail to load
- malformed HTML can break strict parsers
Common limitations of HTML-to-PDF tools
Most server-side HTML-to-PDF tools struggle with one or more of these:
- complex or deeply nested tables
- floats and positioning
- unsupported CSS properties
- page breaks inside tables
- remote image loading
- font embedding
- invalid HTML structure
- memory usage on large documents
Why PDF generation is different from browser rendering
Browsers render for screens with flexible sizing and dynamic reflow. PDFs render for fixed pages.
That means the converter must answer hard layout questions such as:
- Where should a table row split across pages?
- What happens if an image is taller than the remaining page space?
- How are margins, headers, and footers applied?
- Which CSS rules should win when pagination happens?
These are not trivial problems, and many libraries solve them only partially.
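One way to sidestep the row-splitting question is to decide the page boundaries yourself. The sketch below is only an illustration: `buildPagedTables` and the rows-per-page figure are hypothetical, and it assumes the chosen converter honors the CSS `page-break-after` property, which not every library does.

```php
<?php
// Hypothetical helper: emit one small table per page instead of one long table,
// so the converter never has to split a table across a page boundary.
function buildPagedTables(array $rows, int $rowsPerPage): string
{
    if ($rowsPerPage < 1) {
        throw new InvalidArgumentException('rowsPerPage must be at least 1.');
    }

    $chunks = array_chunk($rows, $rowsPerPage);
    $lastIndex = count($chunks) - 1;
    $html = '';

    foreach ($chunks as $index => $chunk) {
        // Request a page break after every table except the last one.
        // Assumption: the PDF library supports page-break-after.
        $style = $index < $lastIndex ? ' style="page-break-after: always;"' : '';
        $html .= '<table' . $style . '>';
        foreach ($chunk as $row) {
            $html .= '<tr>';
            foreach ($row as $cell) {
                $html .= '<td>' . htmlspecialchars((string) $cell, ENT_QUOTES, 'UTF-8') . '</td>';
            }
            $html .= '</tr>';
        }
        $html .= '</table>';
    }

    return $html;
}
```

The right number of rows per page depends on font size, margins, and row height, so it has to be found by testing against the actual converter.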
Mental Model
Think of a browser as a full kitchen with every appliance, while many PDF libraries are more like a small camping stove.
Both can cook food, but one can handle almost anything and the other works best with simple, controlled recipes.
Your HTML page may look perfect in the browser because the browser has:
- a powerful CSS engine
- advanced table layout rules
- flexible image handling
- mature error recovery for imperfect HTML
A PDF converter often has:
- limited CSS support
- strict parsing rules
- weaker table handling
- a tighter memory budget, such as PHP's memory_limit
So the fix is often not “make the PDF tool act like a browser,” but rather “prepare simpler input that the PDF tool can handle reliably.”
Syntax and Examples
Basic idea in PHP
A common pattern is:
- build an HTML string
- pass it to a PDF library
- render and stream or save the PDF
Example using a generic PHP-style flow:
<?php
$html = '
<h1>Invoice</h1>
<table border="1" cellpadding="6" cellspacing="0">
<tr>
<th>Item</th>
<th>Price</th>
</tr>
<tr>
<td>Book</td>
<td>$10</td>
</tr>
</table>
';
// Pseudocode: exact API depends on the library
$pdf = new PdfRenderer();
$pdf->loadHtml($html);
$pdf->render();
$pdf->output('invoice.pdf');
Beginner-friendly example: keep HTML simple
<?php
$html = '
<!DOCTYPE html>
<html>
<head>
<style>
body { font-family: Arial, sans-serif; font-size: 14px; }
h1 { color: #333; }
table { width: 100%; border-collapse: collapse; }
th, td { border: 1px solid #999; padding: 8px; text-align: left; }
</style>
</head>
<body>
<h1>Order Summary</h1>
<table>
<tr>
<th>Product</th>
<th>Quantity</th>
</tr>
<tr>
<td>Pen</td>
<td>3</td>
</tr>
</table>
</body>
</html>';
This kind of HTML is more likely to work than a page with deeply nested tables, floats and positioning, or remote images.
Step by Step Execution
Trace example
Consider this HTML:
<?php
$html = '
<html>
<head>
<style>
table { border-collapse: collapse; width: 100%; }
td { border: 1px solid black; padding: 4px; }
</style>
</head>
<body>
<table>
<tr><td>Name</td><td>Alice</td></tr>
<tr><td>Role</td><td>Admin</td></tr>
</table>
</body>
</html>';
Now imagine a PDF library processes it.
Step 1: Parse HTML
The library reads the tags:
<html>, <head>, <style>, <body>, <table>, <tr>, and <td>
If the HTML is malformed, some libraries fail here or silently ignore broken parts.
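A rough way to catch malformed markup before it reaches the PDF library is to run it through PHP's built-in DOM parser and collect what it reports. This is a sketch under assumptions: `collectHtmlParseErrors` is a hypothetical helper, and the libxml parser behind `DOMDocument` is not the parser your PDF library uses, so a clean result here does not guarantee a clean PDF. libxml's HTML parser also predates HTML5 and may flag newer elements.

```php
<?php
// Hypothetical helper: return the parser's complaints about an HTML string.
function collectHtmlParseErrors(string $html): array
{
    if (trim($html) === '') {
        return ['empty document'];
    }

    // Buffer libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}
```

Logging these messages next to each failed render can make errors such as the unknown node_type() failure from the question easier to trace back to a specific piece of markup.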
Step 2: Parse CSS
The library reads:
table { border-collapse: collapse; width: 100%; }
td { border: 1px solid black; padding: 4px; }
Real World Use Cases
HTML-to-PDF conversion is common in real applications.
Invoices and receipts
Apps often generate invoice PDFs from order data. These usually work well because the layout is controlled and simple.
Reports
Internal systems generate downloadable reports with tables, totals, and charts. This is where complexity increases and layout bugs often appear.
Certificates and tickets
These often use fixed layouts, logos, and a few styled blocks. They are usually easier than table-heavy reports.
Export features in admin dashboards
A dashboard may show rich HTML in the browser, but the PDF export often uses a separate simplified template for reliability.
Contracts and printable forms
These require stable pagination, consistent fonts, and predictable spacing. Developers often avoid advanced CSS and use print-specific templates.
Real Codebase Usage
In real projects, developers rarely send a full interactive web page directly into a PDF engine and hope it matches the browser exactly.
Common patterns
- Separate PDF template: Create a dedicated view for PDF output.
- Guard clauses: Check that required data exists before rendering.
- Validation: Ensure image paths, fonts, and variables are valid.
- Preprocessing: Convert dynamic charts to image files first.
- Early simplification: Flatten nested tables or replace them with simpler structures.
- Error handling: Log failed asset loads and rendering errors.
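The validation and preprocessing patterns can be partly automated. As a minimal sketch, the hypothetical helper `findRemoteImages` below lists every image the converter would have to fetch over the network, so those files can be downloaded or replaced with local copies first. It assumes the template is parseable by PHP's `DOMDocument`.

```php
<?php
// Hypothetical helper: list <img> sources that would require a network fetch.
function findRemoteImages(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $remote = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Treat http://, https://, and protocol-relative // URLs as remote.
        if (preg_match('#^(https?:)?//#i', $src) === 1) {
            $remote[] = $src;
        }
    }

    return $remote;
}
```

A non-empty result is a prompt to cache those images locally before rendering, which removes one common reason for images going missing in the PDF.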
Example pattern in PHP
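<?php
// PdfRenderer and buildInvoiceHtml() are placeholders; the exact API depends on the library.
function generateInvoicePdf(array $order)
{
    if (empty($order['items'])) {
        throw new InvalidArgumentException('Order must contain at least one item.');
    }

    $html = buildInvoiceHtml($order); // dedicated PDF template
    $pdf = new PdfRenderer();
    $pdf->loadHtml($html);

    try {
        $pdf->render();
    } catch (Exception $e) {
        // Log the failure, then let the caller decide how to respond.
        error_log('PDF rendering failed: ' . $e->getMessage());
        throw $e;
    }

    $pdf->output('invoice.pdf');
}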
Common Mistakes
1. Expecting browser-perfect rendering
A browser and a PDF library do not have the same rendering engine.
Avoid this by: designing simpler PDF-specific HTML.
2. Using complex nested tables for layout
Deeply nested tables are hard to paginate and can consume a lot of memory.
Broken approach:
<table>
<tr>
<td>
<table>
<tr>
<td>
<table>
<tr><td>Too deeply nested</td></tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
Better: use a flatter structure.
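To find out how deep the nesting in an existing template really is, the depth can be measured. The function below is a sketch: `maxTableDepth` is a hypothetical name, it relies on PHP's `DOMDocument`, and any threshold you compare the result against is a project decision, not a rule of any library.

```php
<?php
// Hypothetical helper: return the deepest level of <table> nesting in the HTML.
// 0 means no tables, 1 means only flat tables, 2 or more means nested tables.
function maxTableDepth(string $html): int
{
    if (trim($html) === '') {
        return 0;
    }

    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $max = 0;
    foreach ($doc->getElementsByTagName('table') as $table) {
        // Count this table plus every <table> ancestor above it.
        $depth = 1;
        for ($node = $table->parentNode; $node !== null; $node = $node->parentNode) {
            if ($node instanceof DOMElement && strtolower($node->nodeName) === 'table') {
                $depth++;
            }
        }
        $max = max($max, $depth);
    }

    return $max;
}
```

A check like this fits in a test suite or a pre-render guard, so a template change that reintroduces deep nesting is caught before it reaches production.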
3. Relying on unsupported CSS
Comparisons
Common approaches to PDF generation
| Approach | How it works | Strengths | Weaknesses | Best for |
|---|---|---|---|---|
| HTML-to-PDF library in PHP | Parses HTML/CSS and renders PDF | Easy to integrate with PHP apps | Partial CSS support, table issues, memory problems | Simple invoices, reports, receipts |
| Browser-based rendering | Uses a real browser engine to print to PDF | Better HTML/CSS accuracy | Heavier infrastructure | Complex modern layouts |
| Direct PDF drawing | Build PDF with coordinates and drawing commands | Very reliable and precise | More manual work | Highly structured documents |
| Separate print template | Custom HTML built specifically for PDF | More predictable results | Requires extra template maintenance | Dashboard exports, contracts, printable forms |
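For the browser-based row, one common setup is to call a headless Chromium or Chrome binary from PHP. The sketch below only builds the command line. `buildChromiumPdfCommand` is a hypothetical helper, the binary name `chromium` is an assumption that varies by distribution, and the HTML path is assumed to be absolute.

```php
<?php
// Hypothetical helper: build a shell command that asks headless Chromium
// to print a local HTML file to PDF.
// Assumption: $htmlPath is an absolute path and $binary is on the PATH.
function buildChromiumPdfCommand(string $htmlPath, string $pdfPath, string $binary = 'chromium'): string
{
    return sprintf(
        '%s --headless --disable-gpu --print-to-pdf=%s %s',
        escapeshellcmd($binary),
        escapeshellarg($pdfPath),
        escapeshellarg('file://' . $htmlPath)
    );
}
```

The resulting string can be passed to `exec()` together with an exit-code variable. A non-zero exit code or a missing output file should be treated as a failed render.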
Cheat Sheet
Quick reference
- Browser rendering and PDF rendering are not the same.
- Most HTML-to-PDF tools support only part of HTML and CSS.
- Keep PDF HTML simple and well-formed.
- Prefer a dedicated PDF template.
- Avoid deep nested tables.
- Test remote images and fonts carefully.
- Watch PHP memory limits on large documents.
Good practices
- Use simple HTML
- Use embedded or inline CSS
- Use valid markup
- Use local assets when possible
- Keep tables flat
- Test page breaks early
Warning signs
- layout looks perfect in browser but broken in PDF
- large tables cause crashes
- images randomly disappear
- CSS rules are ignored
- render works only for small inputs
Safer workflow
- Start with minimal HTML
- Verify text and tables render
- Add CSS gradually
- Test images and fonts
- Create a separate PDF template if needed
- Add error logging and memory monitoring
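The last step, memory monitoring, can be as small as a wrapper around the render call. This is a sketch under assumptions: `renderWithMemoryLog` is a hypothetical helper, and the callable stands in for whatever render call your PDF library provides.

```php
<?php
// Hypothetical helper: run a render callable and log peak memory afterwards.
function renderWithMemoryLog(callable $render, string $label = 'pdf')
{
    $before = memory_get_peak_usage(true);

    try {
        return $render();
    } finally {
        // Runs after a normal return and after a thrown exception.
        $after = memory_get_peak_usage(true);
        error_log(sprintf(
            '[%s] peak memory %.1f MB (grew by %.1f MB, limit %s)',
            $label,
            $after / 1048576,
            ($after - $before) / 1048576,
            ini_get('memory_limit')
        ));
    }
}
```

A fatal "allowed memory size exhausted" error ends the script before the `finally` block runs, so this log shows documents that are approaching the limit, not those that have already crashed.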
Rule of thumb
If the document is business-critical, do not depend on full browser-like behavior from a lightweight HTML-to-PDF parser.
FAQ
Why does HTML look correct in the browser but break in PDF?
Because browsers use full rendering engines, while many PDF libraries support only a subset of HTML and CSS.
Are tables especially difficult in HTML-to-PDF conversion?
Yes. Table sizing, nested structures, and page splitting are common failure points.
Should I use the same HTML for the website and the PDF?
Usually not. A dedicated PDF template is often simpler and more reliable.
Why do remote images sometimes not appear in generated PDFs?
The server may not be able to fetch them because of network settings, SSL issues, disabled remote access, or library limitations.
Does valid HTML matter more for PDF generation?
Yes. Browsers can recover from invalid markup more easily than many PDF parsers can.
Can increasing PHP memory solve PDF rendering problems?
Sometimes it helps, but it does not fix unsupported CSS or broken table layout logic.
What kind of documents work best with HTML-to-PDF tools?
Simple invoices, receipts, certificates, and controlled report layouts usually work best.
What is the most practical long-term solution?
Use a PDF-specific template with simplified HTML, limited CSS, reliable assets, and good error handling.
Mini Project
Description
Build a simple PHP PDF export feature for an order summary. The purpose is to practice creating a PDF-friendly HTML template instead of reusing a full website page. This demonstrates how keeping markup simple improves reliability when converting HTML and CSS to PDF.
Goal
Create a PHP script that builds a clean HTML order summary and sends it to a PDF renderer using a simple, table-based layout.
Requirements
- Create a dedicated HTML template for the PDF output
- Include a title and a table of at least three order items
- Use simple embedded CSS for fonts, borders, and spacing
- Avoid nested tables and remote images
- Render the HTML through a PDF generator class or placeholder API
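A possible starting point for the template part of the project is sketched below. `buildOrderSummaryHtml` and the `product` and `quantity` keys are hypothetical names chosen for illustration. The returned string is what you would hand to the PDF generator class or placeholder API.

```php
<?php
// Hypothetical helper: build a flat, PDF-friendly order summary.
// Assumption: each item is an array with 'product' and 'quantity' keys.
function buildOrderSummaryHtml(array $items): string
{
    $rows = '';
    foreach ($items as $item) {
        // Escape user-supplied text so it cannot break the markup.
        $rows .= sprintf(
            '<tr><td>%s</td><td>%d</td></tr>',
            htmlspecialchars((string) $item['product'], ENT_QUOTES, 'UTF-8'),
            (int) $item['quantity']
        );
    }

    return '<!DOCTYPE html><html><head><style>'
        . 'body { font-family: Arial, sans-serif; font-size: 14px; }'
        . 'table { width: 100%; border-collapse: collapse; }'
        . 'th, td { border: 1px solid #999; padding: 8px; text-align: left; }'
        . '</style></head><body><h1>Order Summary</h1>'
        . '<table><tr><th>Product</th><th>Quantity</th></tr>' . $rows . '</table>'
        . '</body></html>';
}
```

Escaping every value matters more here than on a web page, because a stray `<` or `&` in a product name can produce the kind of malformed markup that strict PDF parsers reject.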