Combining PDFs
I’ve had to merge PDF files a few times over the years, including when I had to put all of the parts and chapters of my thesis together. Every time, I’d spend a while googling until I ended up at a semi-sketchy website (the kind of place I could see hosting malware) with a graphical interface for this menial task, for which a compatible tool is surprisingly hard to find. Well, today I found myself wanting to combine all of the Nature Methods “Points of View” PDFs, and since I’m much more comfortable with scripts and the command line now, I wondered if I could merge the individual pages that way. A quick Google search got me to this website. I’ll summarize what I did so I can quickly recreate it in the future (instead of having to google all over again).
- Download PyPDF2 from the website.
- Install PyPDF2 (unzip, move into directory, then command line: “python3 setup.py install”).
- Create a python script file with the following code from the above website:
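    import os
    from PyPDF2 import PdfFileReader, PdfFileMerger

    # Folder that holds the PDFs to merge
    files_dir = "PATH TO PDF FILE DIRECTORY HERE"
    pdf_files = [f for f in os.listdir(files_dir) if f.endswith("pdf")]

    merger = PdfFileMerger()
    for filename in pdf_files:
        # PdfFileReader opens the path itself; its second argument is "strict",
        # not a file mode, so the original "rb" is dropped here
        merger.append(PdfFileReader(os.path.join(files_dir, filename)))

    # Write the combined file into the same folder
    merger.write(os.path.join(files_dir, "merged_full.pdf"))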
- Go to the folder with the script file, then command line: “python3 PDF_merge_script.py”
I got a warning message, “PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will not be corrected. [pdf.py:1503]”, though apparently it’s not very important. I still ended up with a merged PDF, and though the order was weird (since I was too lazy to rename the default file names given by Nature Methods), having the whole thing in a funky order is still way better than printing each PDF individually.
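If the funky order ever bothers me, here is a minimal sketch of one way to tame it without renaming anything. It is not from the website; the helper names (natural_key, ordered_pdfs, merge_pdfs) are my own hypothetical ones, and it assumes a PyPDF2 version that still provides PdfFileMerger. Part of the weirdness may also come from os.listdir, which returns names in arbitrary order, so the sketch sorts the names so that numbers inside them count as numbers ("p2" before "p10"). It also skips the merged output file, so that rerunning the script doesn't fold the old merged PDF back in.

```python
import os
import re


def natural_key(name):
    """Split a filename into text and integer chunks so 'p10' sorts after 'p2'."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


def ordered_pdfs(filenames, output_name="merged_full.pdf"):
    """Return the PDF names in natural order, leaving out the merged output."""
    pdfs = [f for f in filenames
            if f.lower().endswith(".pdf") and f != output_name]
    return sorted(pdfs, key=natural_key)


def merge_pdfs(files_dir, output_name="merged_full.pdf"):
    """Merge every PDF in files_dir into output_name, in natural order."""
    # Imported here so the ordering helpers work without PyPDF2 installed.
    # Assumption: the installed PyPDF2 still exposes PdfFileMerger.
    from PyPDF2 import PdfFileMerger

    merger = PdfFileMerger()
    for filename in ordered_pdfs(os.listdir(files_dir), output_name):
        # append() accepts a path string as well as a reader object
        merger.append(os.path.join(files_dir, filename))
    merger.write(os.path.join(files_dir, output_name))
    merger.close()
```

To use it, I’d call merge_pdfs with the same folder path as in the script above. Natural sorting only helps if the default file names carry some kind of sequence number; if they don’t, renaming is still the way to go.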