Convert HTML to PDF in Python Using xhtml2pdf
Are you looking to convert HTML content into polished PDF reports using Python? You've come to the right place! This comprehensive guide will walk you through the entire process of creating professional PDFs from HTML using the powerful Python library — xhtml2pdf. By leveraging the flexibility of HTML/CSS for design and Python's robust libraries for processing, you can create beautiful, data-driven documents that are ready for distribution or printing.
Meet the Python Libraries: xhtml2pdf and Jinja2
xhtml2pdf (formerly known as pisa) is a Python library that uses the ReportLab Toolkit to transform HTML and CSS into PDF documents. Written entirely in Python, it supports HTML5 and CSS 2.1 (plus some CSS3 features), which makes it a good option for converting relatively simple HTML content to PDF. However, it may struggle with highly complex layouts or advanced styling; modern layout features such as flexbox and CSS grid, for example, are generally not supported.
- Role in the process: Interprets HTML/CSS layouts and translates them into the corresponding PDF format, enabling web-like styling in printed documents.
- Learn more: xhtml2pdf Documentation
Jinja2 is a lightweight, flexible templating engine for Python that generates HTML content dynamically.
- Role in the process: Enables customizable HTML templates that are populated with dynamic data and then converted into PDFs via xhtml2pdf.
- Learn more: Jinja2 Documentation
From HTML to PDF: A Step-by-Step Approach with xhtml2pdf
Prerequisites
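- Python 3.6 or higher
  - Ensure you have Python installed on your system.
  - You can download the latest version from Python.org.
  - Recent releases of xhtml2pdf, Jinja2, and matplotlib require newer Python versions, so a current Python 3 release is the safest choice.
- Code Editor
  - Choose your preferred code editor.
  - Popular options include Visual Studio Code or PyCharm.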
Setting Up Your Environment
Let's start by setting up our project environment and installing the necessary libraries.
Project Folder Structure
Before we dive into coding, let's establish a clear project structure to organize our files. This helps maintain separation of concerns and makes the project more maintainable.
html-to-pdf-project/ # Root directory
├── data/ # Data files
│ └── annual_data.json # Example data file for report
├── image/ # Image assets
│ └── logo.png
├── templates/ # Jinja2 HTML templates
│ ├── annual_report.html # Report template
│ └── styles.css # CSS styles for reports
├── utils/
│ └── chart_generator.py # Module for creating charts
└── generate_annual_report.py # Main script
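If you'd rather not create these folders by hand, here is a small sketch that scaffolds the same tree with pathlib. The scaffold_project function and PROJECT_LAYOUT dictionary are hypothetical names; the logo image is deliberately left out because you supply your own file.

```python
from pathlib import Path

# Directories and placeholder files from the project layout above
PROJECT_LAYOUT = {
    "dirs": ["data", "image", "templates", "utils"],
    "files": [
        "data/annual_data.json",
        "templates/annual_report.html",
        "templates/styles.css",
        "utils/chart_generator.py",
        "generate_annual_report.py",
    ],
}


def scaffold_project(root: str) -> Path:
    """Create the folder skeleton under root and return its Path."""
    root_path = Path(root)
    for directory in PROJECT_LAYOUT["dirs"]:
        # parents=True also creates root itself; exist_ok=True makes reruns safe
        (root_path / directory).mkdir(parents=True, exist_ok=True)
    for file_name in PROJECT_LAYOUT["files"]:
        # touch() creates missing files; existing files keep their content
        (root_path / file_name).touch(exist_ok=True)
    return root_path
```

Calling scaffold_project("html-to-pdf-project") from the parent folder creates the empty structure, which you can then fill in with the files below.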
Installing Required Libraries
Now, let's install the necessary Python libraries we'll need for our PDF generation system:
pip install xhtml2pdf jinja2 matplotlib
- xhtml2pdf: Converts HTML/CSS to PDF.
- jinja2: Creates dynamic HTML templates.
- matplotlib: Produces charts and visualizations.
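To confirm the installation worked before running anything heavier, a quick check with importlib.util.find_spec can report which modules Python cannot locate. This is a minimal sketch; missing_modules is a hypothetical helper name, and it only checks that each module can be found, not which version is installed.

```python
import importlib.util

# Import names of the libraries installed above (they match the pip names here)
REQUIRED_MODULES = ["xhtml2pdf", "jinja2", "matplotlib"]


def missing_modules(names):
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules; try: pip install " + " ".join(missing))
    else:
        print("All required modules are available.")
```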
Creating Jinja2 HTML Templates
Our HTML template is the foundation of our report's structure. Using Jinja2 templating, we can dynamically inject data into a predefined layout.
File: templates/annual_report.html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>{{ company_name }} - {{ title }} - {{ year }}</title>
<style>
{% include 'styles.css' %}
</style>
</head>
<body>
<div class="header">
<div class="logo-container">
<img src="{{ logo_path }}" class="company-logo" alt="Company Logo">
</div>
<h1>{{ title }} - {{ year }}</h1>
</div>
<div>
<h3>Executive Summary</h3>
<p class="summary-section">
This report presents the sales performance for the fiscal year {{ year }}.
Total sales reached ${{ '{:,.2f}'.format(total_sales) }}, representing a
{{ '{:.1f}'.format(growth_vs_prev_year) }}% growth compared to the previous year.
</p>
</div>
<div>
<h3>Annual Sales Performance</h3>
<div class="chart">
<img src="{{ sales_chart_path }}" alt="Monthly Sales Chart">
<p class="caption">Fig 1: Monthly Sales for {{ year }}</p>
</div>
<div class="chart">
<img src="{{ quarterly_chart_path }}" alt="Quarterly Sales Chart" style="width: 500px;">
<p class="caption">Fig 2: Quarterly Sales Breakdown</p>
</div>
<h3>Product Category Breakdown</h3>
<div class="chart">
<img src="{{ product_chart_path }}" alt="Product Sales Chart" style="width: 500px;">
<p class="caption">Fig 3: Sales by Product Category</p>
</div>
</div>
<div class="data-section">
<h3>Monthly Sales Data</h3>
<table>
<thead>
<tr>
<th>Month</th>
<th>Sales ($)</th>
<th>Orders</th>
<th>Avg. Order Value</th>
<th>Month-over-Month Growth (%)</th>
</tr>
</thead>
<tbody>
{% for month in months %}
<tr>
<td>{{ month }}</td>
<td>${{ '{:,.2f}'.format(monthly_sales[month]) }}</td>
<td>{{ '{:,}'.format(monthly_orders[month]) }}</td>
<td>${{ '{:.2f}'.format(monthly_sales[month] / monthly_orders[month]) }}</td>
<td class="{{ growth_classes[month] }}">{{ growth_values[month]|safe }}</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
<div>
<h3>Conclusion</h3>
<p class="summary-section">{{ final_summary }}</p>
</div>
</body>
</html>
Notice how we've used Jinja2's templating syntax ({{ variable }} expressions and {% for %} blocks) to insert data dynamically, and {% include %} to inline the stylesheet. This allows us to create a single template that can generate different reports based on the data we provide.
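To see how the {% for %} loop and the '{:,.2f}'.format(...) calls behave, you can render a cut-down version of the table rows directly with jinja2.Template. This is a small sketch, assuming Jinja2 is installed; the ROW_TEMPLATE name and the two-month data are illustrative only.

```python
from jinja2 import Template

# A cut-down version of the monthly table rows in annual_report.html
ROW_TEMPLATE = Template(
    "{% for month in months %}"
    "{{ month }}: ${{ '{:,.2f}'.format(monthly_sales[month]) }}\n"
    "{% endfor %}"
)

data = {
    "months": ["January", "February"],
    "monthly_sales": {"January": 265120.75, "February": 278350.45},
}

# Keyword arguments become template variables
print(ROW_TEMPLATE.render(**data))
```

Because Python string methods can be called inside expressions, the same format spec used in Python code (thousands separator, two decimals) works unchanged in the template.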
Styling with CSS
A well-designed report needs proper styling. Our CSS file defines the visual appearance of the report.
File: templates/styles.css
* {
box-sizing: border-box;
}
h1, h2, h3, div, section {
border: none !important;
}
body {
font-family: Helvetica, Arial, sans-serif;
margin: 0;
padding: 20px;
color: #333;
}
.header {
text-align: center;
margin-bottom: 30px;
padding-bottom: 10px;
}
.logo-container {
text-align: center; /* xhtml2pdf does not support flexbox, so center the logo as inline content */
margin-bottom: 15px;
}
.company-logo {
height: 150px;
}
h1 {
font-size: 32px;
color: #663399;
}
h3 {
margin-bottom: 15px;
font-size: 18px;
padding-bottom: 5px;
background-color: #f7f7f7;
border-radius: 10px;
}
.summary-section {
font-size: 14px;
}
.chart {
margin-bottom: 30px;
background-color: white;
padding: 10px;
border: 1px solid #ddd;
border-radius: 5px;
text-align: center;
}
.caption {
text-align: center;
font-size: 12px;
color: #555;
margin-top: 5px;
}
table {
width: 100%;
border-collapse: collapse;
margin: 15px 0;
}
th {
background-color: #582888;
color: white;
padding: 10px 5px 5px;
text-align: center;
font-size: 14px;
}
tr:nth-child(even) {
background-color: #f2f2f2;
}
td {
padding: 10px 5px 5px;
font-size: 14px;
border-bottom: 1px solid #ddd;
}
.positive-growth {
color: #28a745;
font-weight: bold;
}
.negative-growth {
color: #dc3545;
font-weight: bold;
}
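The template pulls this stylesheet in with {% include 'styles.css' %}, so Jinja2 copies the CSS into the <style> block at render time. The sketch below, assuming Jinja2 is installed, reproduces that with an in-memory DictLoader in place of the FileSystemLoader used by the main script; the template names and content are stand-ins. One consequence worth knowing: the included CSS is itself processed as a Jinja2 template, so any {{, {%, or {# sequence inside the stylesheet would be interpreted by Jinja2.

```python
from jinja2 import DictLoader, Environment

# In-memory stand-ins for templates/annual_report.html and templates/styles.css
templates = {
    "page.html": "<style>\n{% include 'styles.css' %}\n</style><h1>{{ title }}</h1>",
    "styles.css": "h1 { color: #663399; }",
}

env = Environment(loader=DictLoader(templates))
html = env.get_template("page.html").render(title="Annual Sales Report")
print(html)  # the CSS text now sits inside the <style> element
```

Inlining the CSS keeps the rendered HTML self-contained, so xhtml2pdf never has to resolve a separate stylesheet path.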
Creating the Data Source
Our PDF report is data-driven, so we need a structured data source. We'll use a JSON file to store our sales data. This approach separates the data from the presentation logic, making it easy to generate different reports by simply changing the input data.
File: data/annual_data.json
{
"title": "Annual Sales Report",
"company_name": "Lorem Ipsum Company",
"year": 2024,
"total_sales": 3847250.50,
"growth_vs_prev_year": 8.7,
"generation_date": "auto",
"monthly_sales": {
"January": 265120.75,
"February": 278350.45,
"March": 303779.30,
"April": 312540.25,
"May": 325680.50,
"June": 342750.80,
"July": 328950.40,
"August": 318760.30,
"September": 337820.65,
"October": 346950.75,
"November": 348621.35,
"December": 365875.00
},
"monthly_orders": {
"January": 1854,
"February": 1932,
"March": 2102,
"April": 2185,
"May": 2253,
"June": 2376,
"July": 2215,
"August": 2104,
"September": 2267,
"October": 2398,
"November": 2425,
"December": 2576
},
"category_sales": {
"Electronics": 1254300.25,
"Clothing": 886700.50,
"Home & Kitchen": 724800.75,
"Sports": 568200.00,
"Books": 356250.00,
"Toys & Games": 285000.00
},
"quarterly_breakdown": {
"Q1": 847250.50,
"Q2": 981971.55,
"Q3": 985531.35,
"Q4": 1061447.10
},
"final_summary": "The Annual Sales Report for 2024 demonstrates robust growth across multiple metrics. Monthly trends indicate strong peaks and sustained improvements, while the quarterly and category breakdowns highlight key areas of strength. Overall, this report provides a comprehensive view of our successful year and lays a strong foundation for future strategies. Moving forward, the company can leverage these insights to refine sales strategies, optimize product offerings, and enhance customer engagement, thereby laying a solid foundation for sustained growth in the upcoming fiscal years."
}
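Since several figures in this file are derived from others, a quick consistency check can catch typos before they reach a PDF. The sketch below is one possible approach, assuming calendar-year quarters; find_quarter_mismatches, find_total_mismatch, and QUARTER_MONTHS are hypothetical names. Run against the sample annual_data.json, a check like this would flag Q2 (April to June sum to 980,971.55, not 981,971.55) and the yearly total (the twelve months sum to 3,875,200.50, not 3,847,250.50), so treat those sample figures as illustrative.

```python
# Months covered by each quarter (calendar-year quarters assumed)
QUARTER_MONTHS = {
    "Q1": ["January", "February", "March"],
    "Q2": ["April", "May", "June"],
    "Q3": ["July", "August", "September"],
    "Q4": ["October", "November", "December"],
}


def find_quarter_mismatches(data, tolerance=0.01):
    """Return {quarter: (sum_of_months, reported)} for quarters that don't add up."""
    monthly = data["monthly_sales"]
    mismatches = {}
    for quarter, months in QUARTER_MONTHS.items():
        reported = data["quarterly_breakdown"].get(quarter)
        if reported is None:
            continue  # quarter not present in the file
        computed = round(sum(monthly.get(m, 0) for m in months), 2)
        if abs(computed - reported) > tolerance:
            mismatches[quarter] = (computed, reported)
    return mismatches


def find_total_mismatch(data, tolerance=0.01):
    """Return (sum_of_months, reported_total) if total_sales disagrees, else None."""
    computed = round(sum(data["monthly_sales"].values()), 2)
    reported = data.get("total_sales")
    if reported is not None and abs(computed - reported) > tolerance:
        return computed, reported
    return None
```

A small tolerance avoids false alarms from floating-point rounding while still catching real data-entry errors.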
Creating Data Visualizations
A picture is worth a thousand words, especially in financial reports. Our chart generation module creates three visualizations:
- A monthly sales bar chart.
- A quarterly sales breakdown chart.
- A product category pie chart.
The module handles everything from data extraction to chart styling and converts the matplotlib figures to base64-encoded PNG images that can be embedded directly in our HTML.
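Both the charts and, later, the logo end up as data URIs of the form data:<mime-type>;base64,<payload>. Here is a standalone sketch of that encoding step; to_data_uri and file_to_data_uri are hypothetical names, and mimetypes.guess_type is swapped in for the hand-written extension map that the main script uses.

```python
import base64
import mimetypes


def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data URI that can go straight into <img src="...">."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_uri(path: str) -> str:
    """Guess the MIME type from the file extension, then embed the file's bytes."""
    mime_type, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        # Fall back to a generic type when the extension is unknown
        return to_data_uri(f.read(), mime_type or "application/octet-stream")
```

Embedding images this way means the rendered HTML carries everything it needs, so xhtml2pdf never has to resolve relative file paths.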
File: utils/chart_generator.py
import base64
import io

import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def remove_spines(ax):
    # Remove axis borders for a clean look
    for spine in ax.spines.values():
        spine.set_visible(False)


def set_thousands_formatter(ax):
    # Format y-axis ticks to display values in thousands
    def thousands_formatter(x, pos):
        return f'${x / 1000:.0f}k'
    ax.yaxis.set_major_formatter(FuncFormatter(thousands_formatter))


def fig_to_base64(fig):
    # Convert the Matplotlib figure to a Base64-encoded PNG string
    buffer = io.BytesIO()
    fig.savefig(buffer, format='png', dpi=300, bbox_inches='tight', transparent=True)
    plt.close(fig)
    buffer.seek(0)
    encoded = base64.b64encode(buffer.getvalue()).decode('utf-8')
    buffer.close()
    return f"data:image/png;base64,{encoded}"


def create_sales_chart(sales_data, year):
    # Create a bar chart for monthly sales data
    months = list(sales_data['monthly_sales'].keys())
    values = list(sales_data['monthly_sales'].values())
    fig, ax = plt.subplots(figsize=(10, 5))
    bars = ax.bar(months, values, color='#AB8DC1')
    set_thousands_formatter(ax)
    # Annotate each bar with its sales value
    for i, bar in enumerate(bars):
        height = bar.get_height()
        ax.text(
            bar.get_x() + bar.get_width() / 2., height + 5000,
            f'${values[i] / 1000:.0f}k', ha='center', va='bottom', fontsize=11
        )
    remove_spines(ax)  # Clean up the chart appearance
    plt.xticks(rotation=45)  # Rotate x-axis labels for readability
    plt.ylabel('Sales')
    plt.title(f'Monthly Sales for {year}', fontsize=14)
    plt.tight_layout()
    return fig_to_base64(fig)  # Return the chart image as a Base64 string


def create_quarterly_chart(sales_data):
    # Create a bar chart for quarterly sales data
    quarters = list(sales_data['quarterly_breakdown'].keys())
    values = list(sales_data['quarterly_breakdown'].values())
    fig, ax = plt.subplots(figsize=(8, 5))
    bars = ax.bar(quarters, values, color='#795473')
    set_thousands_formatter(ax)
    plt.ylabel('Sales')
    # Annotate each bar with its quarterly sales value
    for i, bar in enumerate(bars):
        height = bar.get_height()
        ax.text(
            bar.get_x() + bar.get_width() / 2., height + 20000,
            f'${values[i] / 1000:.0f}k', ha='center', va='bottom', fontsize=11
        )
    remove_spines(ax)
    plt.title('Quarterly Sales Breakdown', fontsize=14)
    plt.tight_layout()
    return fig_to_base64(fig)


def create_product_breakdown_chart(sales_data):
    # Create a pie chart for product category sales breakdown
    categories = list(sales_data['category_sales'].keys())
    values = list(sales_data['category_sales'].values())
    fig, ax = plt.subplots(figsize=(9, 8))
    # Generate the pie chart with percentage labels
    wedges, texts, autotexts = ax.pie(
        values, labels=None, autopct='%1.1f%%',
        colors=['#F0C7E5', '#E296D6', '#E4B4E4', '#D7A5EC', '#AB8DC1', '#9d70d5']
    )
    for autotext in autotexts:
        autotext.set_fontsize(12)
        autotext.set_weight('bold')
    # Create a legend with sales values formatted in thousands
    legend_labels = [f'{cat}: ${val / 1000:.0f}k' for cat, val in zip(categories, values)]
    ax.legend(legend_labels, loc='center left', bbox_to_anchor=(1.05, 0.5), frameon=False, fontsize=13)
    ax.axis('equal')  # Ensure the pie chart is circular
    plt.title('Product Category Breakdown', fontsize=14)
    plt.tight_layout()
    return fig_to_base64(fig)
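If you generate reports on a server or in a scheduled job with no display, you may want matplotlib's non-interactive Agg backend, which renders straight to files and buffers. The sketch below, assuming matplotlib is installed, selects it before pyplot is imported and shows how the dpi argument (300 in fig_to_base64) drives the size of the PNG that gets embedded. tiny_chart_png and PNG_SIGNATURE are hypothetical names; setting the MPLBACKEND=Agg environment variable is an alternative to calling matplotlib.use.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; select it before importing pyplot
import matplotlib.pyplot as plt  # noqa: E402

# The first eight bytes of every PNG file
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def tiny_chart_png(dpi: int = 100) -> bytes:
    """Draw a small bar chart and return its PNG bytes (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(4, 2))
    ax.bar(["Q1", "Q2"], [847250.50, 981971.55])
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi)
    plt.close(fig)  # release the figure so long runs don't accumulate memory
    return buffer.getvalue()
```

Tripling the dpi multiplies the pixel count by nine, so lowering it is often the simplest way to shrink a report full of charts.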
The Main Script: Putting It All Together
Now let's create the main script that ties everything together.
This script handles:
- Loading data from the JSON file.
- Generating charts using our chart generator.
- Preparing the template data with calculated values.
- Rendering the HTML template with Jinja2.
- Converting the rendered HTML to PDF.
- Saving the final PDF report with a timestamp.
File: generate_annual_report.py
import os
import sys
import json
import datetime
import base64

from xhtml2pdf import pisa
from utils.chart_generator import create_sales_chart, create_product_breakdown_chart, create_quarterly_chart
from jinja2 import Environment, FileSystemLoader


def convert_html_to_pdf(html_content, output_filename):
    # Convert an HTML string to a PDF file with xhtml2pdf
    with open(output_filename, "wb") as output_file:
        conversion_status = pisa.CreatePDF(
            html_content,  # HTML content string
            dest=output_file  # Output file handle
        )
    # pisa reports the number of errors in .err; 0 means success
    return conversion_status.err == 0


def load_data(json_file_path):
    # Check if file exists before attempting to read it
    if not os.path.exists(json_file_path):
        raise FileNotFoundError(f"Data file not found: {json_file_path}")
    # Load JSON data from file (explicit UTF-8 so non-ASCII text loads on every OS)
    with open(json_file_path, 'r', encoding='utf-8') as file:
        data = json.load(file)
    # Auto-set generation date if marked as 'auto'
    if data.get('generation_date') == 'auto':
        data['generation_date'] = datetime.datetime.now().strftime('%Y-%m-%d')
    return data


def prepare_logo_path(logo_file_path):
    # Check if logo file exists
    if not os.path.exists(logo_file_path):
        print(f"Warning: Logo file not found: {logo_file_path}")
        return None
    # Get file extension to determine MIME type
    _, ext = os.path.splitext(logo_file_path)
    ext = ext.lower().replace('.', '')
    # Map file extensions to MIME types
    mime_types = {
        'png': 'image/png',
        'jpg': 'image/jpeg',
        'jpeg': 'image/jpeg',
        'gif': 'image/gif',
        'svg': 'image/svg+xml'
    }
    mime_type = mime_types.get(ext, 'image/png')
    # Convert logo to base64 format for embedding in HTML
    with open(logo_file_path, 'rb') as f:
        logo_data = f.read()
    encoded = base64.b64encode(logo_data).decode('utf-8')
    return f"data:{mime_type};base64,{encoded}"


def prepare_template_data(sales_data, year):
    # Get list of months from sales data
    months = list(sales_data.get('monthly_sales', {}).keys())
    # Initialize dictionaries for growth tracking
    growth_classes = {}
    growth_values = {}
    # Calculate month-over-month growth for each month
    previous_sales = None
    for month in months:
        sales = sales_data['monthly_sales'][month]
        # Set default values, then calculate growth if a previous month exists
        growth_values[month] = "N/A"
        growth_classes[month] = ""
        if previous_sales is not None and previous_sales > 0:
            growth_val = ((sales - previous_sales) / previous_sales) * 100
            # Arrow and color only for real changes; a flat month shows a plain 0.0%
            growth_icon = ""
            if growth_val > 0:
                growth_icon, growth_classes[month] = "▲ ", "positive-growth"
            elif growth_val < 0:
                growth_icon, growth_classes[month] = "▼ ", "negative-growth"
            growth_values[month] = f"{growth_icon}{abs(growth_val):.1f}%"
        previous_sales = sales
    # Return prepared data with defaults for missing keys
    return {
        'title': sales_data.get('title', f'Annual Sales Report - {year}'),
        'company_name': sales_data.get('company_name', 'Company Name'),
        'year': year,
        'total_sales': sales_data.get('total_sales', 0),
        'growth_vs_prev_year': sales_data.get('growth_vs_prev_year', 0),
        'generation_date': sales_data.get('generation_date', datetime.datetime.now().strftime('%Y-%m-%d')),
        'months': months,
        'monthly_sales': sales_data.get('monthly_sales', {}),
        'monthly_orders': sales_data.get('monthly_orders', {}),
        'quarterly_breakdown': sales_data.get('quarterly_breakdown', {}),
        'growth_classes': growth_classes,
        'growth_values': growth_values,
        'final_summary': sales_data.get('final_summary', ''),
    }


def generate_html_from_template(template_data, sales_chart_path, quarterly_chart_path,
                                product_chart_path, logo_path=None, template_name='annual_report.html'):
    # Set up Jinja2 environment for template rendering
    template_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'templates')
    env = Environment(loader=FileSystemLoader(template_dir))
    # Try loading the specified template or fall back to the default
    try:
        template = env.get_template(template_name)
    except Exception as e:
        print(f"Error loading template '{template_name}': {str(e)}")
        template = env.get_template('annual_report.html')
    # Add chart paths to the template data
    template_data['sales_chart_path'] = sales_chart_path
    template_data['quarterly_chart_path'] = quarterly_chart_path
    template_data['product_chart_path'] = product_chart_path
    # Add logo path if provided, otherwise set an empty placeholder
    if logo_path:
        template_data['logo_path'] = logo_path
    else:
        template_data['logo_path'] = ''
    # Render the template with the provided data
    return template.render(**template_data)


def verify_data(sales_data):
    # Define the keys that must be present for the report to work correctly
    required_keys = ['monthly_sales', 'monthly_orders', 'quarterly_breakdown', 'category_sales']
    missing_keys = []
    # Check for missing required keys
    for key in required_keys:
        if key not in sales_data:
            missing_keys.append(key)
    # Warn if required keys are missing
    if missing_keys:
        print(f"Warning: Required data sections missing: {', '.join(missing_keys)}")
        print("Report may be incomplete or charts may not render correctly.")
        return False
    return True


def generate_annual_report(sales_data, year, output_filename, logo_file=None, template_name='annual_report.html'):
    # Check data integrity before proceeding
    verify_data(sales_data)
    try:
        # Generate all chart images needed for the report
        sales_chart_path = create_sales_chart(sales_data, year)
        quarterly_chart_path = create_quarterly_chart(sales_data)
        product_chart_path = create_product_breakdown_chart(sales_data)
        # Prepare logo data if a logo file was provided
        logo_path = None
        if logo_file:
            logo_path = prepare_logo_path(logo_file)
        # Prepare all template data with calculations for growth, etc.
        template_data = prepare_template_data(sales_data, year)
        # Generate complete HTML from template and all data
        html_content = generate_html_from_template(
            template_data,
            sales_chart_path,
            quarterly_chart_path,
            product_chart_path,
            logo_path,
            template_name
        )
        # Convert HTML to PDF and save to file
        success = convert_html_to_pdf(html_content, output_filename)
        # Report success or failure
        if success:
            print(f"Annual report successfully generated: {output_filename}")
        else:
            print("Error generating PDF annual report")
    except Exception as e:
        # Catch and report any errors that occur during report generation
        print(f"Error generating report: {str(e)}")
        raise


if __name__ == "__main__":
    # Parse command line arguments with defaults
    year = int(sys.argv[1]) if len(sys.argv) > 1 else 2024
    output_file = sys.argv[2] if len(sys.argv) > 2 else 'annual_sales_report.pdf'
    data_file = sys.argv[3] if len(sys.argv) > 3 else 'data/annual_data.json'
    logo_file = sys.argv[4] if len(sys.argv) > 4 else 'image/logo.png'
    template_name = sys.argv[5] if len(sys.argv) > 5 else 'annual_report.html'
    # Load data from JSON file or exit if it fails
    try:
        sales_data = load_data(data_file)
    except Exception as e:
        print(f"Error loading data: {str(e)}")
        sys.exit(1)
    # Add timestamp to the output filename
    timestamp = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
    filename, file_extension = os.path.splitext(output_file)
    output_file = f"{filename}_{timestamp}{file_extension}"
    # Create output directory if it doesn't exist
    os.makedirs('output', exist_ok=True)
    output_path = os.path.join('output', output_file)
    # Generate the report and handle any errors
    try:
        generate_annual_report(sales_data, year, output_path, logo_file, template_name)
    except Exception as e:
        print(f"Failed to generate report: {str(e)}")
        sys.exit(1)
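The month-over-month logic inside prepare_template_data is the part most worth testing on its own. Below is a standalone sketch of the same calculation; month_over_month is a hypothetical name, and unlike the script it returns plain numbers (or None when there is no usable previous month), leaving arrows and CSS classes to the caller. It relies on dicts preserving insertion order (Python 3.7+), which json.load also respects, so months stay in file order.

```python
def month_over_month(sales_by_month):
    """Return {month: growth % rounded to 1 decimal, or None} for an ordered mapping."""
    growth = {}
    previous = None
    for month, sales in sales_by_month.items():
        if previous is None or previous <= 0:
            growth[month] = None  # no meaningful baseline to compare against
        else:
            growth[month] = round((sales - previous) / previous * 100, 1)
        previous = sales
    return growth


if __name__ == "__main__":
    print(month_over_month({"Jan": 100.0, "Feb": 110.0, "Mar": 99.0}))
```

Keeping the arithmetic separate from the presentation makes it easy to unit-test the numbers without rendering any HTML.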
Running the Script
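Now that we have set up all the necessary files, run the script from the project root (the data, logo, and output paths are relative to the current directory) to generate a PDF report:
python generate_annual_report.py
The script also accepts up to five optional positional arguments, in this order: year, output filename, data file, logo file, and template name. For example:
python generate_annual_report.py 2024 annual_sales_report.pdf data/annual_data.json image/logo.png annual_report.html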
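Reading sys.argv by position works, but it gives users no --help output and no type checking. One possible alternative is the standard library's argparse, sketched below with the same argument order and defaults; build_parser is a hypothetical name, and swapping it into the script is optional.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Optional positional arguments in the same order and with the same defaults as the script."""
    parser = argparse.ArgumentParser(description="Generate the annual sales PDF report.")
    # nargs="?" makes each positional optional, falling back to its default
    parser.add_argument("year", nargs="?", type=int, default=2024)
    parser.add_argument("output_file", nargs="?", default="annual_sales_report.pdf")
    parser.add_argument("data_file", nargs="?", default="data/annual_data.json")
    parser.add_argument("logo_file", nargs="?", default="image/logo.png")
    parser.add_argument("template_name", nargs="?", default="annual_report.html")
    return parser
```

In the main block, args = build_parser().parse_args() would then replace the five sys.argv lookups, with values available as args.year, args.output_file, and so on.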
Exploring the Generated Report
After running the script:
- Check the output/ directory.
- Open the generated PDF file; its name includes a timestamp, for example annual_sales_report_YYYYMMDD_HHMMSS.pdf.
- Explore the dynamically created sales report!
Troubleshooting and Best Practices
Common Issues
When working with HTML to PDF conversion, you might encounter some issues:
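- Missing images: Check that all paths are correct and that embedded images are valid base64 data URIs.
- CSS styling problems: Not all CSS properties are supported by xhtml2pdf, which targets CSS 2.1. Stick to basic properties; layout features such as flexbox are generally ignored.
- Font issues: Use web-safe fonts or embed custom fonts properly with @font-face. The built-in Helvetica font may not contain symbols such as the ▲ and ▼ growth arrows, which can then render as blank boxes unless you register a font that includes them.
- Page breaks: Control page breaks with CSS properties such as page-break-before.
- Unicode characters: Ensure your HTML has a proper charset declaration, such as <meta charset="UTF-8">.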
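As an illustration of the page-break tip, the sketch below wraps each report section in a div and forces every section after the first onto a new page with page-break-before: always. The paginate function is a hypothetical helper, and it escapes headings with html.escape so characters like & stay valid HTML; xhtml2pdf also documents its own <pdf:nextpage /> tag for the same purpose.

```python
from html import escape


def paginate(sections):
    """Join (heading, body_html) pairs, starting each section after the first on a new page."""
    parts = []
    for index, (heading, body_html) in enumerate(sections):
        # The first section stays on page one; later ones request a page break
        style = ' style="page-break-before: always;"' if index > 0 else ""
        parts.append(f"<div{style}><h3>{escape(heading)}</h3>{body_html}</div>")
    return "\n".join(parts)
```

The resulting string can be dropped into a template (with |safe if autoescaping is on) before the HTML is passed to pisa.CreatePDF.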
Performance Optimization
For large reports, consider these optimizations:
- Compress images before embedding them; for charts, lowering the dpi=300 setting in fig_to_base64 shrinks every embedded PNG.
- Split very large reports into multiple PDFs.
- Use web-safe fonts to avoid embedding large font files.
- Cache generated charts if they're used in multiple reports.
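For the caching tip, note that the chart functions take a dict of sales data, and dicts can't be used with functools.lru_cache because they aren't hashable. One workaround, sketched below, keys an in-process cache on a stable JSON dump of the positional arguments. cache_by_json is a hypothetical decorator; it only handles positional, JSON-serializable arguments and lasts for a single run, so caching across runs would need a disk-based store instead.

```python
import functools
import json


def cache_by_json(func):
    """Memoize a function whose positional arguments are JSON-serializable."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # sort_keys=True gives equal dicts the same key regardless of key order
        key = json.dumps(args, sort_keys=True)
        if key not in cache:
            cache[key] = func(*args)
        return cache[key]

    return wrapper
```

Wrapping a chart function, for example create_sales_chart = cache_by_json(create_sales_chart), means repeated calls with identical data and year reuse the already encoded image.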
Conclusion
In this comprehensive guide, we've explored how to generate professional PDF reports from HTML using xhtml2pdf in Python. By combining the power of this library with Jinja2 templates and matplotlib visualizations, we've created a flexible, data-driven report generation system.
The approach we've taken has several advantages:
- Separation of concerns: Data, presentation, and logic are nicely separated.
- Maintainability: Easy to update templates without changing the core code.
- Flexibility: Generate different reports by just changing the data and templates.
- Professional output: Beautiful, well-formatted PDFs with charts and styling.
This pattern can be adapted for various types of reports beyond sales data, such as:
- Financial statements.
- Inventory reports.
- Performance dashboards.
- Certificates and official documents.
- Marketing analytics reports.
By mastering this HTML to PDF workflow, you've gained a powerful tool for your Python toolkit that can streamline reporting processes and enhance the professional presentation of your data.
Happy report-making, and may your PDF reports be as engaging as a Netflix series! 📺