Merging Multiple PDFs Into One With Python
Recently, I needed to merge multiple PDFs into one file. I don’t have an Adobe Acrobat Pro subscription, and I didn’t want to use online tools due to privacy concerns, so I built my own merge script in Python.
Hi, I’m Da Data Guy! Before I begin, if you are interested in Power BI, Python, Data visualization and SQL, please follow me on Medium and on my LinkedIn. I focus on writing quality articles breaking down each step in the process so you can follow along and learn too. You can also visit my GitHub page to view my published resources and follow along.
Getting Your Files Prepped
To follow along, create a folder on your local computer to hold all the PDF files you want to merge, along with a Jupyter Notebook (.ipynb) file. For this example, I replaced the original PDF files I used with Power BI PDFs.
For those that don’t have the PyPDF2 library, you can learn how to install it or review their documentation here: PyPDF2 · PyPI. Also, I’m using the latest version, which I believe is 3.0.0 at the time of this writing (3/15/2023).
Let’s Get Started
# Importing all libraries, including the PyPDF2 3.x class names.
# If you need to install, type: pip install PyPDF2
import os
import PyPDF2
from PyPDF2 import PdfReader, PdfWriter, PdfMerger

pdfFiles = []  # List that will hold the full path of every PDF found.
for root, dirs, filenames in os.walk(os.getcwd()):  # Walk the current folder and all subfolders.
    for filename in filenames:
        if filename.lower().endswith('.pdf'):  # Keep only files whose name ends with .pdf.
            # Join the folder path and the file name into a full path.
            pdfFiles.append(os.path.join(root, filename))
# Sorting the paths case-insensitively.
pdfFiles.sort(key=str.lower)
# Creating a PdfWriter object that will collect the pages.
pdfWriter = PyPDF2.PdfWriter()
Example of the file paths the script found on my local machine.
# Displaying the file paths it found on the local machine.
pdfFiles
['C:\\Users\\NAME\\Desktop\\Data Science\\Jupyter Notebook\\Sandbox\\Merge PDFs\\Test_PDFs\\Harry Potter - Final.pdf',
'C:\\Users\\NAME\\Desktop\\Data Science\\Jupyter Notebook\\Sandbox\\Merge PDFs\\Test_PDFs\\Mexico Restaurant Ratings.pdf',
'C:\\Users\\NAME\\Desktop\\Data Science\\Jupyter Notebook\\Sandbox\\Merge PDFs\\Test_PDFs\\Super Bowl-Final.pdf']
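One thing worth noting about the sort: the key is the whole path in lower case, so if your PDFs sit in different subfolders, they are grouped by folder before file name. If you would rather order strictly by file name, a minimal sketch could look like this. The helper name sort_by_file_name and the sample paths are my own illustration, not part of the original script.

```python
import os

def sort_by_file_name(paths):
    """Return paths ordered by file name only, ignoring case and folders."""
    return sorted(paths, key=lambda p: os.path.basename(p).lower())

# Hypothetical paths, just to show the ordering.
sample = [
    os.path.join("reports", "b_summary.pdf"),
    os.path.join("archive", "C_notes.pdf"),
    os.path.join("reports", "A_intro.pdf"),
]
ordered = sort_by_file_name(sample)
```

Because the merged file follows the order of the list, whichever ordering you choose here decides the page order of the final PDF.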
If you have several PDF files you are trying to merge, then you might want to count the number of files. Be careful: if you have already run this script, the count will include the output file you generated, and a new merge will include it as well.
# Counting the number of PDF files found.
print(len(pdfFiles))
# Output: 3
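To keep a previously generated output file out of the count (and out of the merge), one option is to skip it while scanning. This is only a sketch under my own assumptions: the function name find_pdfs and its exclude_name parameter are hypothetical and do not appear in the script above.

```python
import os

def find_pdfs(folder, exclude_name=None):
    """Collect PDF paths under folder, skipping any file named exclude_name."""
    found = []
    for root, dirs, filenames in os.walk(folder):
        for filename in filenames:
            if not filename.lower().endswith('.pdf'):
                continue  # Not a PDF, ignore it.
            if exclude_name and filename.lower() == exclude_name.lower():
                continue  # Skip the merged file from an earlier run.
            found.append(os.path.join(root, filename))
    found.sort(key=str.lower)  # Same case-insensitive sort as the main script.
    return found
```

With this in place, something like pdfFiles = find_pdfs(os.getcwd(), exclude_name='Power_BI_Test_Files.pdf') would give the same list as before, minus the earlier output.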
The next step is to append the pages of each of the 3 files, using the file paths that we’ve stored in the pdfFiles variable.
for filename in pdfFiles:  # Loop over every PDF path.
    pdfFileObj = open(filename, 'rb')  # Opens the file in binary read mode.
    pdfReader = PyPDF2.PdfReader(pdfFileObj)  # Parses the PDF that was just opened.
    for pageNum in range(len(pdfReader.pages)):  # Loop over every page in this PDF.
        pageObj = pdfReader.pages[pageNum]  # Gets one page from the reader.
        pdfWriter.add_page(pageObj)  # Adds that page to the writer.
Lastly, we need to take the pages that have been collected in the pdfWriter object and write them to a new PDF output file. The new file is created in the current working directory, which is the same folder as the other PDF files if you run the notebook from there.
# Name of the PDF file can be written here.
pdfOutput = open('Power_BI_Test_Files.pdf', 'wb')
# Writing the output file using the pdfWriter function.
pdfWriter.write(pdfOutput)
pdfOutput.close()
URL to view Output File.
https://github.com/DaDataGuy/PDF_Merged_Script/blob/main/Final_PDF_Output/Power_BI_Test_Files.pdf
Thank you!
If you have enjoyed this article, please give me a follow and if you’re interested in Using Python to Transform Multiple CSV files, click here: Using Python To Transform Data From Multiple CSV Files | by Da Data Guy | Medium