In a previous article we talked about several ways to read PDF files with Python. This post covers two packages for creating PDF files with Python: pdfkit and ReportLab.
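pdfkit was the first library I learned for creating PDF files. A nice feature of pdfkit is that you can use it to create PDF files from URLs. To get started, you’ll need to install it along with a utility called wkhtmltopdf. Note that wkhtmltopdf is a separate command-line program, not a Python package, so it has to be installed on its own; pip only installs the pdfkit wrapper. Use pip to install pdfkit from PyPI: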
pip install pdfkit
Once you’re set up, you can start using pdfkit. In the example below, we download Wikipedia’s main page as a PDF file. To get pdfkit working, you’ll need to either add wkhtmltopdf to your PATH, or configure pdfkit to point to where the executable is stored (the latter option is used below).
# import package
import pdfkit

# configure pdfkit to point to our installation of wkhtmltopdf
config = pdfkit.configuration(wkhtmltopdf = r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe")

# download Wikipedia main page as a PDF file
pdfkit.from_url("https://en.wikipedia.org/wiki/Main_Page", "sample_url_pdf.pdf", configuration = config)
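If the same script has to run on machines where wkhtmltopdf is installed in different places, one option is to look the executable up at runtime instead of hard-coding the Windows path. The sketch below uses only the standard library; `find_wkhtmltopdf` and the locations in `DEFAULT_FALLBACKS` are hypothetical names and paths chosen for illustration, so adjust them for your own setup.

```python
import os
import shutil
from typing import Iterable, Optional

# Hypothetical install locations to try when wkhtmltopdf is not on PATH.
DEFAULT_FALLBACKS = (
    r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe",
    "/usr/local/bin/wkhtmltopdf",
    "/usr/bin/wkhtmltopdf",
)


def find_wkhtmltopdf(name: str = "wkhtmltopdf",
                     fallbacks: Iterable[str] = DEFAULT_FALLBACKS) -> Optional[str]:
    """Return a path to the wkhtmltopdf executable, or None if it can't be found."""
    # First, ask the operating system whether the executable is on PATH.
    found = shutil.which(name)
    if found:
        return found
    # Otherwise, check a few explicit install locations in order.
    for candidate in fallbacks:
        if os.path.isfile(candidate):
            return candidate
    return None
```

You would then check that the result is not None and pass it to pdfkit as before, e.g. `config = pdfkit.configuration(wkhtmltopdf = find_wkhtmltopdf())`.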
You can also set output_path to False, in which case pdfkit returns the PDF to Python as a bytes object instead of writing it to a file.
pdfkit.from_url("https://en.wikipedia.org/wiki/Main_Page", output_path = False, configuration = config)
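Once pdfkit hands the PDF back as bytes, what happens next is up to you: you might return it from a web view, attach it to an email, or write it to disk yourself. Here is a minimal sketch of the last option. `save_pdf_bytes` is a hypothetical helper, and its `%PDF-` check only confirms the standard PDF header, not that the whole file is valid.

```python
def save_pdf_bytes(pdf: bytes, path: str) -> int:
    """Write PDF bytes to disk after a basic sanity check; return the byte count."""
    # PDF files begin with a "%PDF-" header such as "%PDF-1.4".
    if not pdf.startswith(b"%PDF-"):
        raise ValueError("data does not look like a PDF")
    with open(path, "wb") as fh:
        # Binary file objects return the number of bytes written.
        return fh.write(pdf)
```

For example: `pdf = pdfkit.from_url("https://en.wikipedia.org/wiki/Main_Page", output_path = False, configuration = config)` followed by `save_pdf_bytes(pdf, "sample_url_pdf.pdf")`.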
One of the nicest features of pdfkit is that you can use it to create PDF files from HTML, including from HTML strings that you pass it directly in Python.
s = """<h1><strong>Sample PDF file from HTML</strong></h1>
<br></br>
<p>First line...</p>
<p>Second line...</p>
<p>Third line...</p>"""

pdfkit.from_string(s, output_path = "new_file.pdf", configuration = config)
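In practice the HTML string is often built from Python data rather than typed by hand. A minimal sketch, assuming the input is a title plus a list of plain-text lines: `lines_to_html` is a hypothetical helper that escapes each line with `html.escape` (so characters like `<` and `&` show up literally) and wraps the result in a complete document. The `<meta charset="utf-8">` tag is included as a common precaution for non-ASCII text, not something pdfkit requires.

```python
import html
from typing import Iterable


def lines_to_html(title: str, lines: Iterable[str]) -> str:
    """Build a small HTML document from a title and plain-text lines."""
    # Escape each line so <, > and & are displayed rather than parsed as markup.
    body = "\n".join(f"<p>{html.escape(line)}</p>" for line in lines)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"></head><body>\n'
        f"<h1>{html.escape(title)}</h1>\n"
        f"{body}\n"
        "</body></html>"
    )
```

The result can be passed straight to pdfkit, e.g. `pdfkit.from_string(lines_to_html("Sample PDF file from HTML", ["First line...", "Second line..."]), output_path = "new_file.pdf", configuration = config)`.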
Additionally, pdfkit can create PDF files by reading HTML files.
pdfkit.from_file("sample_html_file.html", output_path = "new_file2.pdf", configuration = config)
You can also create PDF files from more complex HTML and CSS. You simply need to pass the HTML as a string or store it in a file that can be passed to pdfkit. Let’s do another example, but this time we’ll create a table using HTML and CSS.
table_html = """<!DOCTYPE html>
<html>
<head>
<style>
table, th, td {
  border: 1px solid black;
}
table {
  width: 100%;
}
</style>
</head>
<body>
<h2>Sample Table</h2>
<table>
  <tr>
    <th>Field 1</th>
    <th>Field 2</th>
  </tr>
  <tr>
    <td>x1</td>
    <td>x2</td>
  </tr>
  <tr>
    <td>x3</td>
    <td>x4</td>
  </tr>
</table>
</body>
</html>
"""

pdfkit.from_string(table_html, output_path = "sample_table.pdf", configuration = config)
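Writing table markup by hand gets tedious as the table grows. One option is to generate the same kind of page from a header row and a list of data rows. This is a sketch under that assumption; `rows_to_table_html` and `TABLE_CSS` are hypothetical names, and the CSS is the same styling used in the example above.

```python
import html
from typing import Sequence

# Same borders and full-width layout as the hand-written example.
TABLE_CSS = "table, th, td { border: 1px solid black; } table { width: 100%; }"


def rows_to_table_html(header: Sequence[str], rows: Sequence[Sequence[object]],
                       title: str = "Sample Table") -> str:
    """Render a header row and data rows as an HTML page containing one table."""
    def cells(tag: str, values: Sequence[object]) -> str:
        # Convert each value to text and escape it before wrapping it in a cell.
        return "".join(f"<{tag}>{html.escape(str(v))}</{tag}>" for v in values)

    body_rows = "".join(f"<tr>{cells('td', row)}</tr>" for row in rows)
    return (
        "<!DOCTYPE html><html><head>"
        f"<style>{TABLE_CSS}</style></head><body>"
        f"<h2>{html.escape(title)}</h2>"
        f"<table><tr>{cells('th', header)}</tr>{body_rows}</table>"
        "</body></html>"
    )
```

Calling `rows_to_table_html(["Field 1", "Field 2"], [["x1", "x2"], ["x3", "x4"]])` produces the same table as `table_html`, ready for `pdfkit.from_string`.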
The next package we’ll discuss is ReportLab. ReportLab is one of the most popular libraries for creating PDF files.
You can install ReportLab using pip:
pip install reportlab
Here’s an initial example that creates a simple PDF with one line of text. First, we import the canvas module from ReportLab. Then, we create an instance of the Canvas class (note the capital “C” this time), passing the name of the file we want to create. Third, we use drawString to write out a line of text. The (50, 800) are the x and y coordinates of the text, measured in points (1/72 of an inch) from the bottom-left corner of the page, so finding the right position might take some experimentation. Lastly, we save the file.
from reportlab.pdfgen import canvas

report = canvas.Canvas("first_test.pdf")
report.drawString(50, 800, "**First PDF with ReportLab**")
report.save()
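Because ReportLab’s y axis starts at the bottom of the page, it is often easier to think in terms of “distance from the top” and convert. ReportLab’s default page size is A4, roughly 595 x 842 points, so y = 800 sits about 42 points (0.6 inch) below the top edge. The helpers below are hypothetical, stdlib-only sketches; the constants are meant to mirror `reportlab.lib.units.inch` and `reportlab.lib.pagesizes.A4`.

```python
# ReportLab measures in points (1 inch = 72 points) from the bottom-left corner.
POINTS_PER_INCH = 72.0
# A4 in points (210 mm x 297 mm), ReportLab's default page size.
A4_WIDTH, A4_HEIGHT = 595.2755905511812, 841.8897637795277


def inches(value: float) -> float:
    """Convert inches to points."""
    return value * POINTS_PER_INCH


def y_from_top(distance_from_top: float, page_height: float = A4_HEIGHT) -> float:
    """Convert a distance measured down from the top edge into a ReportLab y value."""
    return page_height - distance_from_top
```

With these, the call above could be written as `report.drawString(50, y_from_top(42), ...)`, or one inch from the top as `y_from_top(inches(1))`.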
Next, let’s create a sample PDF file containing an image. Here, we use ReportLab’s own Image class (imported from reportlab.platypus, not Pillow’s Image module) to create an Image object. In this example, we create a list of elements that ReportLab will use to construct the PDF file (we refer to this list as info below). For this instance, the list contains just one element, the Image object representing the image that we will put into the PDF file, but as we’ll see in the next example, we can also use this list to store other elements for placing into the PDF file.
Also, note that here we are using the SimpleDocTemplate class, which does what it sounds like: it creates a simple document template that lays out the elements we give it. This provides more structure than drawing directly on a canvas, as we did above.
# import SimpleDocTemplate and ReportLab's Image element
from reportlab.platypus import SimpleDocTemplate, Image
from reportlab.lib.units import inch

# create document object
doc = SimpleDocTemplate("sample_image.pdf")
info = []

# path to the image file we want to use
image_file = "sample_plot.png"

# create Image object with size specifications (3 x 3 inches)
im = Image(image_file, 3*inch, 3*inch)

# append Image object to our info list
info.append(im)

# build / save PDF document
doc.build(info)
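Passing 3*inch for both width and height stretches any image that isn’t square. A small helper can scale the image to fit inside a 3 x 3 inch box while keeping its aspect ratio. This is a sketch; `fit_within` is a hypothetical name, and it assumes you already know the image’s pixel dimensions.

```python
def fit_within(width: float, height: float,
               max_width: float, max_height: float) -> tuple:
    """Scale (width, height) to fit inside a box while keeping the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Use the tighter of the two constraints so neither side overflows the box.
    scale = min(max_width / width, max_height / height)
    return width * scale, height * scale
```

One way to get the pixel size is with Pillow, e.g. `from PIL import Image as PILImage` and `w, h = PILImage.open(image_file).size`, then `im = Image(image_file, *fit_within(w, h, 3*inch, 3*inch))` with ReportLab’s Image as above.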
Building on the code above, we can add a few paragraphs of text, followed by a sample image.
from reportlab.platypus import SimpleDocTemplate, Paragraph, Image
from reportlab.lib.units import inch

doc = SimpleDocTemplate("more_text.pdf")

p1 = "<font size = '12'><strong>This is the first paragraph...</strong></font>"
p2 = "<font size = '12'><strong>This is the second paragraph...</strong></font>"
p3 = "<font size = '12'><strong>This is the third paragraph...</strong></font>"
p4 = "<br></br><br></br><br></br>"

image_file = "sample_plot.png"
im = Image(image_file, 3*inch, 3*inch)

# add the paragraphs first, then the image
info = []
info.append(Paragraph(p1))
info.append(Paragraph(p2))
info.append(Paragraph(p3))
info.append(Paragraph(p4))
info.append(im)

doc.build(info)
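Paragraph text is parsed as XML-like markup, which is what lets tags such as `<font>` work. The flip side is that a raw `&` or `<` in your own text can break the parse. A minimal sketch of building the markup safely, assuming plain-text input: `bold_paragraph_markup` is a hypothetical helper that escapes the text with the standard library’s `xml.sax.saxutils.escape` and uses the `<b>` tag for bold.

```python
from xml.sax.saxutils import escape


def bold_paragraph_markup(text: str, size: int = 12) -> str:
    """Wrap plain text in <font>/<b> paragraph markup, escaping special characters."""
    # escape() turns &, < and > into &amp;, &lt; and &gt;.
    return f"<font size='{size}'><b>{escape(text)}</b></font>"
```

The result goes into a Paragraph just like p1 to p3 above, e.g. `info.append(Paragraph(bold_paragraph_markup("R&D summary")))`.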
To change the font, we can tweak our first ReportLab example above to call the setFont method, which takes a font name and a size in points.
from reportlab.pdfgen import canvas

report = canvas.Canvas("test_with_font.pdf")
report.setFont("Courier", 12)
report.drawString(50, 800, "**Test PDF with Different Font**")
report.save()
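"Courier" is one of the standard PDF fonts that work without embedding a font file. Their bold and italic variants are separate font names rather than flags, and the naming differs by family (Times uses "Italic" and "Roman", while Courier and Helvetica use "Oblique"). The lookup below is a hypothetical helper covering the three text families (Symbol and ZapfDingbats are left out). ReportLab’s canvas also has a getAvailableFonts() method if you want to list what it knows about.

```python
# Regular, bold, italic/oblique, bold italic/oblique names for each family.
_FACES = {
    "Courier": ("Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique"),
    "Helvetica": ("Helvetica", "Helvetica-Bold", "Helvetica-Oblique",
                  "Helvetica-BoldOblique"),
    "Times": ("Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic"),
}


def standard_font(family: str, bold: bool = False, italic: bool = False) -> str:
    """Return the standard font name for a family plus bold/italic flags."""
    try:
        faces = _FACES[family]
    except KeyError:
        raise ValueError(f"unknown standard font family: {family!r}") from None
    # Index layout: 0 regular, 1 bold, 2 italic/oblique, 3 bold italic/oblique.
    return faces[int(bold) + 2 * int(italic)]
```

For example, `report.setFont(standard_font("Courier", bold=True), 12)` selects "Courier-Bold".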
Next, let’s show how to create a PDF with multiple pages, a common and useful task. We’ll modify the above example to create a PDF with three separate pages. One way to tell ReportLab that the content of a page is finished is to call the showPage method, like below. Any content you draw afterward is added to the next page, and calling showPage again starts a third page. Note that showPage also resets the graphics state, including the font, so setFont has to be called again on each new page to keep using Courier.
from reportlab.pdfgen import canvas

report = canvas.Canvas("multiple_pages.pdf")

report.setFont("Courier", 12)
report.drawString(50, 800, "**This is the first page...**")
report.showPage()

# showPage resets the font, so set it again on each new page
report.setFont("Courier", 12)
report.drawString(50, 800, "**This is the second page...**")
report.showPage()

report.setFont("Courier", 12)
report.drawString(50, 800, "**This is the third page...**")
report.showPage()

report.save()
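When the text is longer, you usually don’t know in advance where the page breaks fall. One approach is to work out how many lines fit between a top baseline and a bottom margin, split the lines into pages, and call showPage after each page. This is a sketch with hypothetical helpers `paginate` and `draw_pages`; the default top of 800, bottom margin of 50 and 14-point line spacing are assumptions, and `draw_pages` only relies on the canvas methods used above (setFont, drawString, showPage).

```python
from typing import List, Sequence


def paginate(lines: Sequence[str], top: float = 800, bottom: float = 50,
             leading: float = 14) -> List[List[str]]:
    """Split lines into pages, given the first baseline, bottom margin and line spacing."""
    if leading <= 0 or top < bottom:
        raise ValueError("need leading > 0 and top >= bottom")
    # Baselines at top, top - leading, ... down to bottom all fit on one page.
    per_page = int((top - bottom) // leading) + 1
    return [list(lines[i:i + per_page]) for i in range(0, len(lines), per_page)]


def draw_pages(report, pages, font=("Courier", 12), x=50, top=800, leading=14):
    """Draw pre-paginated lines on a canvas, finishing each page with showPage()."""
    for page in pages:
        # The font resets after showPage(), so set it at the start of every page.
        report.setFont(*font)
        for i, line in enumerate(page):
            report.drawString(x, top - i * leading, line)
        report.showPage()
```

With a real canvas this would look like `draw_pages(report, paginate(lines))` followed by `report.save()`.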
Another way to create page breaks is to use the SimpleDocTemplate approach from earlier in the post and add PageBreak elements to the list, like this:
# import PageBreak, along with SimpleDocTemplate, Paragraph and Image
from reportlab.platypus import SimpleDocTemplate, PageBreak, Paragraph, Image
from reportlab.lib.units import inch

# create new file with image and multiple pages
doc = SimpleDocTemplate("sample_image_multiple_pages.pdf")
info = []

image_file = "sample_plot.png"
im = Image(image_file, 3*inch, 3*inch)
info.append(im)

# add page break
info.append(PageBreak())
info.append(Paragraph("Second page..."))

# add third page
info.append(PageBreak())
info.append(Paragraph("Third page..."))

# build PDF
doc.build(info)
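If each page’s content is already grouped into its own list, the breaks can be inserted automatically instead of appending PageBreak() by hand. A minimal sketch: `join_with_breaks` is a hypothetical helper that flattens the sections into one list and puts a freshly created break element between consecutive sections (none after the last one). It takes the break factory as a parameter, so with ReportLab you would pass the PageBreak class itself.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def join_with_breaks(sections: Sequence[Sequence[T]],
                     make_break: Callable[[], T]) -> List[T]:
    """Flatten sections into one list with a new break element between sections."""
    story: List[T] = []
    for index, section in enumerate(sections):
        if index > 0:
            # One new break object per boundary; no trailing break at the end.
            story.append(make_break())
        story.extend(section)
    return story
```

The example above would then become `doc.build(join_with_breaks([[im], [Paragraph("Second page...")], [Paragraph("Third page...")]], PageBreak))`.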
That’s it for this post! If you enjoyed reading, please share this article with your friends. For more details, see the ReportLab documentation and the pdfkit documentation.