How to create PDF files with Python

In a previous article we talked about several ways to read PDF files with Python. This post covers two packages for creating PDF files with Python: pdfkit and ReportLab.

Create PDF files with Python and pdfkit

pdfkit was the first library I learned for creating PDF files. A nice feature of pdfkit is that you can use it to create PDF files from URLs. pdfkit is a wrapper around a command-line utility called wkhtmltopdf, so to get started you’ll need to install both. wkhtmltopdf is installed separately from pdfkit. Use pip to install pdfkit from PyPI:


pip install pdfkit

Once you’re set up, you can start using pdfkit. In the example below, we save Wikipedia’s main page as a PDF file. To get pdfkit working, you’ll need to either add wkhtmltopdf to your PATH, or configure pdfkit to point to where the executable is stored (the latter option is used below).
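
If you are not sure which of the two setups applies on a given machine, one option is to check PATH first and fall back to an explicit location. The sketch below uses only the standard library. The function name find_wkhtmltopdf and its fallback parameter are illustrative and not part of pdfkit.

```python
import shutil
from pathlib import Path


def find_wkhtmltopdf(fallback=None):
    """Return a path to the wkhtmltopdf executable, or None if it is not found."""
    # shutil.which searches the directories listed in PATH
    on_path = shutil.which("wkhtmltopdf")
    if on_path:
        return on_path
    # otherwise try the explicit location supplied by the caller
    if fallback and Path(fallback).is_file():
        return str(fallback)
    return None


# the fallback below is the install location used in this post; adjust it for your machine
exe = find_wkhtmltopdf(fallback=r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe")
print(exe)
```

If the result is not None, it can be passed along as pdfkit.configuration(wkhtmltopdf = exe).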

Download a webpage as a PDF


# import package
import pdfkit

# configure pdfkit to point to our installation of wkhtmltopdf
config = pdfkit.configuration(wkhtmltopdf = r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe")

# download Wikipedia main page as a PDF file
pdfkit.from_url("https://en.wikipedia.org/wiki/Main_Page", "sample_url_pdf.pdf", configuration = config)

You can also set the output path to False, which returns the PDF to Python as bytes, rather than saving the webpage to an external file.


pdf_bytes = pdfkit.from_url("https://en.wikipedia.org/wiki/Main_Page", output_path = False, configuration = config)
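
When you keep the PDF in memory like this, it can be handy to sanity-check the result before storing or sending it. PDF files begin with the marker %PDF-, so a minimal check might look like the sketch below. The helper name looks_like_pdf is made up for illustration, and the sample bytes are a stand-in for real pdfkit output.

```python
def looks_like_pdf(data):
    """Return True if data is bytes that start with the PDF header marker."""
    return isinstance(data, (bytes, bytearray)) and bytes(data[:5]) == b"%PDF-"


# stand-in values for illustration; in practice, pass the bytes returned by pdfkit
sample_pdf = b"%PDF-1.4\n...rest of the file..."
sample_html = b"<html><body>Not a PDF</body></html>"

print(looks_like_pdf(sample_pdf))
print(looks_like_pdf(sample_html))
```

This only inspects the first few bytes, so it catches obvious mistakes (such as an HTML error page) rather than validating the whole file.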

How to create a PDF from HTML

One of the nicest features of pdfkit is that you can use it to create PDF files from HTML, including from HTML strings that you pass it directly in Python.


s = """<h1><strong>Sample PDF file from HTML</strong></h1>
       <br></br>
       <p>First line...</p>
       <p>Second line...</p>
       <p>Third line...</p>"""

pdfkit.from_string(s, output_path = "new_file.pdf", configuration = config)

Additionally, pdfkit can create PDF files by reading HTML files.


pdfkit.from_file("sample_html_file.html", output_path = "new_file2.pdf", configuration = config)

You can also create PDF files with more complex HTML / CSS, as well. You simply need to pass the HTML as a string or store it in a file that can be passed to pdfkit. Let’s do another example, but this time, we’ll create a table using HTML and CSS.

Creating tables in a PDF file


table_html = """<!DOCTYPE html>
<html>
<head>
<style>
table, th, td {
  border: 1px solid black;
}

table {
  width: 100%;
}
</style>
</head>
<body>

<h2>Sample Table</h2>

<table>
  <tr>
    <th>Field 1</th>
    <th>Field 2</th>
  </tr>
  <tr>
    <td>x1</td>
    <td>x2</td>
  </tr>
  <tr>
    <td>x3</td>
    <td>x4</td>
  </tr>
</table>

</body>
</html>
 """

pdfkit.from_string(table_html, output_path = "sample_table.pdf", configuration = config)
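
If the table contents come from Python data rather than hand-written HTML, you could build the table markup programmatically and then place it inside a page like the one above. Here is a small sketch using only the standard library. The function name rows_to_html_table is hypothetical, and the values are escaped so that characters such as < do not break the markup.

```python
from html import escape


def rows_to_html_table(header, rows):
    """Build an HTML table string from a header list and a list of rows."""
    head = "".join(f"<th>{escape(str(cell))}</th>" for cell in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


table = rows_to_html_table(["Field 1", "Field 2"], [["x1", "x2"], ["x3", "x4"]])
print(table)
```

The resulting string can be embedded in a larger HTML document (with the CSS from the example above) and passed to pdfkit.from_string.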

Creating PDF files with Python and ReportLab

The next package we’ll discuss is ReportLab. ReportLab is one of the most popular libraries for creating PDF files.

You can install ReportLab using pip:


pip install reportlab

Here’s an initial example to create a simple PDF with one line of text. The first piece of code imports the canvas module from ReportLab. Then, we create an instance of the Canvas (note the capital “C” this time) class with the name of the file we want to create. Third, we use drawString to write out a line of text. The values (50, 800) are the x and y coordinates of where to place the text, measured in points (1/72 of an inch) from the bottom-left corner of the page, so getting the placement right might take some experimentation. Lastly, we save the file.


from reportlab.pdfgen import canvas

report = canvas.Canvas("first_test.pdf")

report.drawString(50, 800, "**First PDF with ReportLab**")
report.save()
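
To cut down on the experimentation with coordinates, it can help to compute the y value from a distance measured down from the top of the page. The sketch below assumes the canvas uses an A4 page (210 mm x 297 mm), which I believe is ReportLab's default. The helper name y_from_top is hypothetical, and the constants are written out by hand here, although ReportLab also provides them in reportlab.lib.pagesizes and reportlab.lib.units.

```python
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4

# height of an A4 page (297 mm) converted to points, roughly 841.89
A4_HEIGHT_PT = 297 / MM_PER_INCH * POINTS_PER_INCH


def y_from_top(distance_from_top_inches, page_height=A4_HEIGHT_PT):
    """Convert a distance from the top of the page (in inches) to a y coordinate in points."""
    return page_height - distance_from_top_inches * POINTS_PER_INCH


y = y_from_top(0.5)
print(round(y, 2))  # 805.89
```

A value such as this could then be used in place of the hard-coded 800 in drawString.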

Adding images to a PDF file

Next, let’s create a sample PDF file containing an image. Here, we’re going to use ReportLab’s Image class from the platypus module (not the Image module from the pillow library, which shares the name), together with the inch unit to set the image size. In this example, we need to create a list of elements that we will use to construct the PDF file (we refer to this list as info below). In this instance, the list will contain just one element, the Image object representing the image that we will put into the PDF file, but as we’ll see in the next example, we can also use this list to store other elements for placing into the PDF file.

Also, note that here we are using the SimpleDocTemplate class, which does what it sounds like: it creates a simple document template that we can fill in with content. This provides more structure than drawing on a canvas directly, as we did above.


# import SimpleDocTemplate and the Image class from ReportLab
from reportlab.platypus import SimpleDocTemplate, Image
# inch is a unit constant used below to size the image
from reportlab.lib.units import inch

# create document object
doc = SimpleDocTemplate("sample_image.pdf")
info = []

# path to the image file we want to use
image_file = "sample_plot.png"

# create Image object with size specifications
im = Image(image_file, 3*inch, 3*inch)

# append Image object to our info list
info.append(im)

# build / save PDF document
doc.build(info)

Creating paragraphs of text

Building on our code above, we can add a few paragraphs of text, followed by a sample image.


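# import everything this example needs from ReportLab
from reportlab.platypus import SimpleDocTemplate, Paragraph, Image
from reportlab.lib.units import inch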

doc = SimpleDocTemplate("more_text.pdf")

p1 = "<font size = '12'><strong>This is the first paragraph...</strong></font>"
p2 = "<font size = '12'><strong>This is the second paragraph...</strong></font>"
p3 = "<font size = '12'><strong>This is the third paragraph...</strong></font>"
p4 = "<br></br><br></br><br></br>"

image_file = "sample_plot.png"

im = Image(image_file, 3*inch, 3*inch)

info = []

info.append(Paragraph(p1))
info.append(Paragraph(p2))
info.append(Paragraph(p3))
info.append(Paragraph(p4))
info.append(im)

doc.build(info)
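
The strings passed to Paragraph above contain markup tags, so, as far as I understand ReportLab's behavior, plain text that happens to include characters like & or < should be escaped before it is wrapped in tags. The sketch below shows one way to do that with the standard library. The helper name bold_markup is hypothetical, and it uses the b tag as a shorter equivalent of strong.

```python
from xml.sax.saxutils import escape


def bold_markup(text, size=12):
    """Wrap plain text in font and bold tags, escaping &, < and > first."""
    return f'<font size="{size}"><b>{escape(text)}</b></font>'


markup = bold_markup("Profit & loss for Q1 < Q2")
print(markup)
# <font size="12"><b>Profit &amp; loss for Q1 &lt; Q2</b></font>
```

The returned string can then be passed to Paragraph in the same way as p1, p2 and p3.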

How to adjust fonts

To adjust font types, we can tweak our first ReportLab example above to use the setFont method.


from reportlab.pdfgen import canvas

report = canvas.Canvas("test_with_font.pdf")

report.setFont("Courier", 12)

report.drawString(50, 800, "**Test PDF with Different Font**")
report.save()

Creating a PDF with multiple pages

Next, let’s show how to create a PDF with multiple pages, which is a common task. We’ll modify the above example to create a PDF with three separate pages. One way to tell ReportLab that the content on a single page is finished is to call the showPage method, like below. Any content you draw afterward is added to the next page. Calling showPage again then ends the second page and starts the third.


from reportlab.pdfgen import canvas

report = canvas.Canvas("multiple_pages.pdf")
report.setFont("Courier", 12)

report.drawString(50, 800, "**This is the first page...**")
report.showPage()

report.drawString(50, 800, "**This is the second page...**")
report.showPage()

report.drawString(50, 800, "**This is the third page...**")
report.showPage()

report.save()
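
When the number of lines is not known in advance, the same drawString / showPage pattern can be driven by a loop. The sketch below splits a list of lines into pages and draws each one. The names paginate and draw_pages, the line height of 14 points, and the default font arguments are all assumptions for illustration. The font is set again at the start of every page because, as I understand it, ReportLab resets settings such as the font after showPage.

```python
def paginate(lines, lines_per_page):
    """Split a list of lines into consecutive pages of at most lines_per_page lines."""
    if lines_per_page < 1:
        raise ValueError("lines_per_page must be at least 1")
    return [lines[i:i + lines_per_page] for i in range(0, len(lines), lines_per_page)]


def draw_pages(report, pages, x=50, top=800, line_height=14, font=("Courier", 12)):
    """Draw each page of lines on a canvas-like object, ending every page with showPage."""
    for page in pages:
        # set the font on every page, since it may not carry over after showPage
        report.setFont(*font)
        y = top
        for line in page:
            report.drawString(x, y, line)
            y -= line_height  # move down for the next line
        report.showPage()


pages = paginate([f"Line {n}" for n in range(1, 8)], lines_per_page=3)
print(len(pages))  # 3
```

With ReportLab installed, you would pass a Canvas object as report and call report.save() after draw_pages returns.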

Another way to create page breaks is to use the SimpleDocTemplate class from earlier in the post and add PageBreak objects to the list of elements, like this:


# import PageBreak, along with SimpleDocTemplate, Paragraph, and Image
from reportlab.platypus import SimpleDocTemplate, PageBreak, Paragraph, Image
from reportlab.lib.units import inch

# create new file with image and multiple pages
doc = SimpleDocTemplate("sample_image_multiple_pages.pdf")
info = []

image_file = "sample_plot.png"

im = Image(image_file, 3*inch, 3*inch)
info.append(im)

# add page break 
info.append(PageBreak())
info.append(Paragraph("Second page..."))

# add third page
info.append(PageBreak())
info.append(Paragraph("Third page..."))

# build PDF
doc.build(info)
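
If the content is already grouped by page, the page breaks can be inserted by a small helper instead of by hand. This is only a sketch: the name join_with_breaks is hypothetical, and the strings below are placeholders standing in for ReportLab objects so that the example runs without ReportLab installed.

```python
def join_with_breaks(pages, make_break):
    """Flatten a list of per-page element lists, inserting a break between pages."""
    story = []
    for index, elements in enumerate(pages):
        if index > 0:
            # create a fresh break object before every page except the first
            story.append(make_break())
        story.extend(elements)
    return story


# placeholder strings stand in for Image / Paragraph / PageBreak objects
story = join_with_breaks(
    [["image"], ["Second page..."], ["Third page..."]],
    make_break=lambda: "PAGE BREAK",
)
print(story)
```

With ReportLab, you would pass PageBreak as make_break and real Image and Paragraph objects as the page contents, then hand the result to doc.build.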

Conclusion

That’s it for this post! If you enjoyed reading, please share this article with your friends. To learn more, see the official documentation for ReportLab and for pdfkit.

Andrew Treadway
