Webpage to PDF
Fetch a webpage by URL in the browser, or upload saved HTML, then export the preview as a PDF.
Some websites block browser fetch because of CORS or anti-framing policies. If fetch fails, save the page HTML and upload it instead.
About Webpage to PDF
This webpage to PDF tool fetches publicly accessible webpage HTML or accepts uploaded HTML files, renders a preview, and exports the content as a PDF document using the html2canvas and jsPDF libraries.
It is useful for archiving web content, creating offline copies of documentation, saving receipts or confirmations, and generating PDFs from HTML prototypes. When direct URL fetch fails due to CORS policies, uploading a saved HTML file provides a reliable alternative.
How It Works
The tool uses a two-step conversion process:
- HTML Rendering: The webpage HTML is parsed and rendered in a preview div, with scripts and potentially harmful elements removed for security.
- Capture & PDF Generation: html2canvas captures the rendered content as a canvas image, then jsPDF creates a multi-page PDF document from the canvas.
// Conversion flow:
HTML → Sanitize → Render in DOM → html2canvas → Canvas → jsPDF → PDF
// Libraries used:
- html2canvas v1.4.1: DOM to canvas screenshot
- jsPDF v2.5.1: Canvas to PDF document
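The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual source: the function name `exportPreviewToPdf`, the injected `libs` parameter, and the default file name are hypothetical, and the libraries are passed in so the sketch can be exercised with stand-ins. In a page that loads both libraries from script tags, `html2canvas` is a global and jsPDF 2.x is typically exposed as `window.jspdf.jsPDF`.

```javascript
// Sketch only: capture an element with html2canvas, then page it into an A4 PDF.
// `libs` is a hypothetical parameter holding { html2canvas, jsPDF }.
async function exportPreviewToPdf(element, libs, fileName = "webpage.pdf") {
  const { html2canvas, jsPDF } = libs;

  // Step 1: rasterize the rendered preview at 2x on a white background.
  const canvas = await html2canvas(element, {
    scale: 2,
    useCORS: true,
    backgroundColor: "#ffffff",
  });

  // Step 2: place the image on A4 portrait pages (all units in points).
  const pdf = new jsPDF({ orientation: "portrait", unit: "pt", format: "a4" });
  const pageWidth = 595;
  const pageHeight = 842;
  const margin = 20;
  const imgWidth = pageWidth - 2 * margin;
  const imgHeight = (canvas.height * imgWidth) / canvas.width;
  const usableHeight = pageHeight - 2 * margin;
  const pageCount = Math.max(1, Math.ceil(imgHeight / usableHeight));
  const imgData = canvas.toDataURL("image/png");

  // Draw the same full-size image on every page, shifted up by one page of
  // content each time. Simplification: slices are not clipped at the margins.
  for (let i = 0; i < pageCount; i++) {
    if (i > 0) pdf.addPage();
    pdf.addImage(imgData, "PNG", margin, margin - i * usableHeight, imgWidth, imgHeight);
  }

  pdf.save(fileName);
  return pageCount;
}
```

In a browser this might be called as `exportPreviewToPdf(previewDiv, { html2canvas, jsPDF: window.jspdf.jsPDF })`, where `previewDiv` stands for whatever element holds the sanitized preview.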
CORS and Fetch Limitations
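Browser-based fetch requests are subject to Cross-Origin Resource Sharing (CORS) policies. The browser lets a script read a cross-origin response only if the server opts in with an Access-Control-Allow-Origin header, and many websites do not send one, whether by default or to prevent scraping and embedding.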
| Scenario | Fetch Result | Workaround |
|---|---|---|
| Public URL with CORS headers | ✓ May succeed | Direct fetch |
| URL without CORS headers | ✗ Blocked | Upload saved HTML |
| Site with anti-framing | ✗ Blocked | Upload saved HTML |
| Local file (file://) | ✗ Blocked | Upload HTML file |
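The table above suggests a simple pattern: attempt the fetch, and treat any failure as the cue to fall back to upload. Below is a minimal sketch, assuming a hypothetical helper named `tryFetchHtml` with an injectable `fetchImpl` (defaulting to the global `fetch`) so it can be tried without a network. The reason strings are made up for illustration. A CORS rejection surfaces in `fetch` as a generic `TypeError`, so a script cannot tell it apart from other network failures.

```javascript
// Sketch only: try to fetch page HTML and report why it failed if it did.
async function tryFetchHtml(url, fetchImpl = globalThis.fetch) {
  // Only http(s) URLs are attempted; file:// and other schemes are rejected up front.
  let parsed;
  try {
    parsed = new URL(url);
  } catch {
    return { ok: false, reason: "invalid-url" };
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { ok: false, reason: "unsupported-protocol" };
  }

  try {
    const response = await fetchImpl(parsed.href, { mode: "cors" });
    if (!response.ok) {
      return { ok: false, reason: `http-${response.status}` };
    }
    return { ok: true, html: await response.text() };
  } catch {
    // CORS rejections and plain network errors both land here.
    return { ok: false, reason: "blocked-or-network-error" };
  }
}
```

A caller could show the "upload saved HTML" prompt whenever `ok` is `false`, regardless of the reason.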
HTML Sanitization
For security, the following elements and attributes are removed from input HTML:
- Script tags: All <script> elements removed
- IFrames: All <iframe> elements removed
- Object/Embed: All <object> and <embed> elements removed
- Event handlers: Attributes like onclick, onload, etc. stripped
- JavaScript URLs: javascript: protocols removed from href/src
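The rules listed above could be implemented along these lines. This is a sketch rather than the tool's own sanitizer: the names `BLOCKED_TAGS`, `isEventHandlerAttr`, `isJavascriptUrl`, and `sanitizeHtml` are hypothetical, and treating every attribute that starts with "on" as an event handler is an assumption that errs toward removing too much. `sanitizeHtml` relies on the browser's `DOMParser`, so it only runs in a browser. The two predicate helpers are plain functions.

```javascript
// Tags removed outright, per the list above.
const BLOCKED_TAGS = ["script", "iframe", "object", "embed"];

// True for inline event-handler attributes such as onclick or onload.
function isEventHandlerAttr(name) {
  return /^on/i.test(name);
}

// True when an attribute value uses the javascript: scheme.
// URL parsers drop leading whitespace and embedded tabs/newlines, so strip
// whitespace and control characters before checking the prefix.
function isJavascriptUrl(value) {
  const normalized = String(value).replace(/[\u0000-\u0020]/g, "").toLowerCase();
  return normalized.startsWith("javascript:");
}

// Browser-only: parse the HTML inertly, apply the rules, return the body markup.
function sanitizeHtml(html) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  doc.querySelectorAll(BLOCKED_TAGS.join(",")).forEach((el) => el.remove());
  doc.querySelectorAll("*").forEach((el) => {
    for (const attr of Array.from(el.attributes)) {
      const isUrlAttr = attr.name === "href" || attr.name === "src";
      if (isEventHandlerAttr(attr.name) || (isUrlAttr && isJavascriptUrl(attr.value))) {
        el.removeAttribute(attr.name);
      }
    }
  });
  return doc.body.innerHTML;
}
```

Parsing with `DOMParser` keeps the input inert while it is cleaned: scripts in a parsed document do not execute until the markup is inserted into the live page.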
PDF Output Settings
| Setting | Value |
|---|---|
| Page size | A4 (595 × 842 points) |
| Orientation | Portrait |
| Canvas scale | 2x (for better quality) |
| Background color | White (#ffffff) |
| Margins | 20 points (approx. 7mm) |
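To make the numbers in the table concrete, the sketch below turns them into a small layout calculation. It assumes the margin applies on all four sides and that the captured image is scaled to the content width, neither of which the table states outright. `PDF_SETTINGS`, `pointsToMm`, and `layoutForCanvas` are hypothetical names.

```javascript
// Values taken from the settings table above.
const PDF_SETTINGS = {
  pageWidthPt: 595,
  pageHeightPt: 842,
  marginPt: 20,
  canvasScale: 2,
  backgroundColor: "#ffffff",
};

// Converts PDF points to millimetres (1 pt = 1/72 in, 1 in = 25.4 mm).
function pointsToMm(pt) {
  return (pt * 25.4) / 72;
}

// Given a captured canvas size in pixels, work out how it fits on A4 pages.
function layoutForCanvas(canvasWidthPx, canvasHeightPx, settings = PDF_SETTINGS) {
  const contentWidthPt = settings.pageWidthPt - 2 * settings.marginPt;
  const contentHeightPt = settings.pageHeightPt - 2 * settings.marginPt;
  // Scale the image to the content width, preserving aspect ratio.
  const imageHeightPt = (canvasHeightPx * contentWidthPt) / canvasWidthPx;
  return {
    contentWidthPt,
    contentHeightPt,
    imageHeightPt,
    pageCount: Math.max(1, Math.ceil(imageHeightPt / contentHeightPt)),
  };
}
```

For example, an 800 × 2000 CSS-pixel preview captured at 2x yields a 1600 × 4000 pixel canvas. Scaled to the 555-point content width, that image is 1387.5 points tall, which spans two pages of 802 usable points each. `pointsToMm(20)` gives roughly 7.06 mm, matching the margin figure in the table.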
Common Use Cases
- Web archiving: Save important web pages for offline reference
- Documentation: Export API docs, tutorials, or guides as PDFs
- Receipts & confirmations: Preserve order confirmations and booking receipts
- Content review: Create PDFs from HTML prototypes for client review
- Research: Archive web sources for academic or professional research
Limitations
- External images may not render if the server blocks cross-origin access
- CSS print-specific styles (@media print) are not applied
- Dynamic content loaded after initial render may not appear
- PDF contains rasterized images, not selectable/searchable text
- Very long pages may produce large PDF files
Alternative Approaches
For production-grade PDF generation, consider server-side tools:
- Puppeteer/Playwright: Headless Chrome for accurate HTML→PDF
- wkhtmltopdf: Uses WebKit for HTML rendering
- Prince XML: Commercial tool with CSS Paged Media support
- WeasyPrint: Python-based PDF generator with CSS support
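As an illustration of the first option, here is a minimal Node.js sketch using Puppeteer. The function name `renderUrlToPdf` and its parameters are hypothetical, and the `puppeteer` module is passed in as an argument so the caller decides how it is loaded. The 7mm margin mirrors this tool's setting and is otherwise arbitrary.

```javascript
// Sketch only: render a URL to an A4 PDF with headless Chrome via Puppeteer.
async function renderUrlToPdf(puppeteer, url, outputPath) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so late-loading content is included.
    await page.goto(url, { waitUntil: "networkidle0" });
    await page.pdf({
      path: outputPath,
      format: "A4",
      printBackground: true,
      margin: { top: "7mm", right: "7mm", bottom: "7mm", left: "7mm" },
    });
  } finally {
    // Always release the browser process, even if navigation or printing fails.
    await browser.close();
  }
}
```

With Puppeteer installed, this might be invoked as `renderUrlToPdf(require("puppeteer"), "https://example.com", "example.pdf")`. Unlike the canvas approach, `page.pdf()` goes through the browser's print pipeline, so the output keeps selectable text and print styles apply.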
Frequently Asked Questions
- How does webpage to PDF conversion work in the browser?
- This tool uses html2canvas to capture a screenshot of the rendered HTML content, then jsPDF to create a PDF document from the captured canvas. The HTML is sanitized to remove scripts and potentially harmful elements before rendering.
- Why does fetching some webpages fail?
- Many websites do not send the CORS (Cross-Origin Resource Sharing) headers that would allow a browser script to read their pages, and some add anti-framing protections. When fetch fails, save the page from your browser (Ctrl+S) and upload the .html file instead.
- What are the limitations of browser-based PDF generation?
- Browser-based PDF generation captures visual appearance as images, not selectable text. Complex layouts, external fonts, and dynamic content may not render perfectly. For production use, server-side tools like Puppeteer or wkhtmltopdf provide more reliable results.
- How does the tool handle multi-page PDFs?
- If the captured content exceeds one A4 page, the tool automatically adds additional pages. It calculates the remaining height and creates new pages as needed, maintaining the same image dimensions across all pages.
- What security measures are in place?
- The tool sanitizes HTML by removing script, iframe, object, and embed tags, along with event handler attributes (onclick, etc.). It also strips javascript: URLs from href/src attributes. This reduces the risk of XSS attacks when rendering external HTML content.
- Why might external images not appear in the PDF?
- External images may be blocked by CORS policies when using html2canvas. The useCORS option is enabled, but the image server must explicitly allow cross-origin access. For best results, ensure images are from CORS-enabled servers or use data URLs.