Combining PDFs

July 4, 2016

I’ve had to merge PDF files a few times over the years, including when I put all of the parts and chapters of my thesis together. Every time, I’d spend a while googling until I ended up on a semi-sketchy website (the kind of place I could see hosting malware) with a graphical interface for this menial task, which it is surprisingly hard to find a compatible tool for. Today I wanted to combine all of the Nature Methods “Points of View” PDFs, and since I’m much more comfortable with scripts and the command line now, I wondered if I could merge the individual PDFs that way. A quick Google search got me to this website. I’ll summarize what I did so I can quickly recreate it in the future instead of googling all over again.

Download PyPDF2 from its website.
Install PyPDF2 (unzip it, move into the unzipped directory, then on the command line: “python3 setup.py install”).

Create a Python script file with the following code, based on the code from the website above:

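import os
from PyPDF2 import PdfFileMerger

files_dir = "PATH TO PDF FILE DIRECTORY HERE"
output_name = "merged_full.pdf"

# os.listdir() returns names in arbitrary order, so sort them, and skip the
# output file in case the script has already been run in this directory.
pdf_files = sorted(f for f in os.listdir(files_dir)
                   if f.lower().endswith(".pdf") and f != output_name)

merger = PdfFileMerger()
for filename in pdf_files:
    # append() accepts a file path directly, so no separate open()/reader is needed.
    merger.append(os.path.join(files_dir, filename))

merger.write(os.path.join(files_dir, output_name))
merger.close()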
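Sorting the file names as plain text puts “pov10.pdf” before “pov2.pdf”. If the names contain numbers, a “natural sort” key that compares runs of digits as numbers gives a more sensible order. This is a minimal sketch: the helper name natural_key and the example file names are my own, not part of PyPDF2 or the original script.

```python
import re


def natural_key(filename):
    """Sort key that compares runs of digits as numbers, so "pov2" < "pov10"."""
    # A capturing group makes re.split keep the digit runs, so the list
    # alternates text, digits, text, ... and always starts with a (maybe empty) string.
    parts = re.split(r"(\d+)", filename.lower())
    return [int(p) if p.isdecimal() else p for p in parts]


names = ["pov10.pdf", "pov2.pdf", "pov1.pdf"]
print(sorted(names))                   # ['pov1.pdf', 'pov10.pdf', 'pov2.pdf']
print(sorted(names, key=natural_key))  # ['pov1.pdf', 'pov2.pdf', 'pov10.pdf']
```

In the merge script this would mean sorting pdf_files with key=natural_key before appending. It can only recover an order that the file names actually encode, so oddly named downloads may still need renaming.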

Go to the folder with the script file, then on the command line: “python3 PDF_merge_script.py” (PDF_merge_script.py being whatever you named the script).

I got a warning, “PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will not be corrected. [pdf.py:1503]”, though apparently it’s not very important. I still ended up with a merged PDF. The order was weird, since I was too lazy to rename the default file names given by Nature Methods, but having the whole thing in a funky order is still way better than printing each PDF individually.
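If that warning is just noise and the merged file looks right, Python’s standard warnings module can hide it without touching PyPDF2. This is a minimal sketch, assuming the warning text stays the same as the one quoted above. The function name quiet_xref_warning is my own, and it has to run before the merge starts.

```python
import warnings


def quiet_xref_warning():
    """Ignore only the "Xref table not zero-indexed" read warning."""
    # `message` is a regular expression matched (case-insensitively) against
    # the start of the warning text, so other warnings still get through.
    warnings.filterwarnings("ignore", message="Xref table not zero-indexed")


quiet_xref_warning()
# ...then create the merger and append the files as in the script above.
```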

Visualized Life
