Combining PDFs

I’ve had to merge PDF files a few times over the years, including when I put all of the parts and chapters of my thesis together. Every time, I’d spend a while googling until I ended up at a semi-sketchy website (the kind of place I could imagine hosting malware) with a graphical interface for this menial task, which is surprisingly hard to find a compatible tool for. Today I wanted to combine all of the Nature Methods “Points of View” PDFs, and since I’m much more comfortable with scripts and the command line now, I wondered if I could merge the individual PDFs that way. A quick Google search got me to this website. I’ll summarize what I did so I can recreate it quickly in the future instead of having to google all over again.

  1. Download PyPDF2 from the website.
  2. Install PyPDF2 (unzip it, change into the unzipped directory, then run on the command line: “python3 setup.py install”).
  3. Create a Python script file (e.g., PDF_merge_script.py) with the following code, based on the one from the website above:
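    import os
    from PyPDF2 import PdfFileMerger  # PyPDF2 1.x name; PyPDF2 3.x calls it PdfMerger

    files_dir = "PATH TO PDF FILE DIRECTORY HERE"
    output_name = "merged_full.pdf"

    # Keep only PDFs (any case), skip a previous merged output, and sort for a stable order
    pdf_files = sorted(
        f for f in os.listdir(files_dir)
        if f.lower().endswith(".pdf") and f != output_name
    )
    merger = PdfFileMerger()

    for filename in pdf_files:
        # append() accepts a file path and opens the file in binary mode itself
        merger.append(os.path.join(files_dir, filename))

    merger.write(os.path.join(files_dir, output_name))
    merger.close()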
  4. Go to the folder containing the script file, then run on the command line: “python3 PDF_merge_script.py”
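os.listdir() returns file names in arbitrary order, so the merge order depends on the filesystem. Plain sorted() makes it predictable, but it puts “page10.pdf” before “page2.pdf”. Below is a minimal sketch of a natural sort, assuming the file names contain page or chapter numbers; natural_key and sorted_pdfs are hypothetical helper names, not part of PyPDF2.

```python
import os
import re


def natural_key(filename):
    """Sort key that compares digit runs as numbers, so 'p2' comes before 'p10'."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd positions are always digit runs and even positions are text.
    parts = re.split(r"(\d+)", filename)
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


def sorted_pdfs(files_dir, output_name="merged_full.pdf"):
    """PDF names in files_dir (any case of '.pdf'), naturally sorted, minus output_name."""
    names = (
        f for f in os.listdir(files_dir)
        if f.lower().endswith(".pdf") and f != output_name
    )
    return sorted(names, key=natural_key)
```

In the script, pdf_files = sorted_pdfs(files_dir) could replace the line that builds pdf_files. This only helps if the numbers in the file names match the order you want; otherwise renaming the files is still the way to go.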

I got a warning message, “PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will not be corrected. [pdf.py:1503]”, but it’s only a warning and apparently not very important: I still ended up with a merged PDF. The order was weird, since I was too lazy to rename the default file names given by Nature Methods, but having the whole thing in one file in a funky order is still way better than printing each PDF individually.
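If the “Xref table not zero-indexed” warning gets noisy, it can be silenced without hiding other warnings. This is a minimal sketch, assuming PyPDF2 emits it through Python’s standard warnings module (the “PdfReadWarning:” prefix suggests it does; newer releases may log it instead, in which case this filter has no effect). ignore_xref_warning is a hypothetical helper name.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def ignore_xref_warning():
    """Silence only the 'Xref table not zero-indexed' warning inside the with-block."""
    # catch_warnings() restores the previous warning filters on exit.
    with warnings.catch_warnings():
        # `message` is a case-insensitive regex matched against the start of the text.
        warnings.filterwarnings("ignore", message="Xref table not zero-indexed")
        yield
```

To use it, put the for loop that calls merger.append() inside a `with ignore_xref_warning():` block; any other warnings still get printed as usual.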
