@michabbb
Created November 26, 2024 17:53
You are tasked with analyzing a merged PDF containing multiple documents in German and creating a bash script to split and rename these documents. Follow these instructions carefully:
First, carefully examine the content of the merged PDF.
Now, follow these steps:
1. Analyze the PDF:
- Identify all distinct documents within the merged PDF.
- Ignore any blank pages (pages that contain no letters).
- Determine the page range for each document.
- Identify the creation or issue date for each document (format: YYYY-MM-DD).
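The blank-page rule in step 1 (a page with no letters counts as blank) could be checked mechanically along these lines. This is a minimal sketch, not part of the original workflow: `page_is_blank` is a hypothetical helper name, and it assumes the page text arrives on standard input, for example from poppler's `pdftotext`.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit status 0) when the text on stdin
# contains no letters, matching the "no letters" definition of a blank page.
# Digits and punctuation alone (e.g. a bare page number) still count as blank.
page_is_blank() {
  ! grep -q '[[:alpha:]]'
}
```

Assuming `pdftotext` is installed, page 4 could be tested with `pdftotext -f 4 -l 4 merged_documents.pdf - | page_is_blank && echo "page 4 is blank"`. A scanned page without a text layer would also look blank to this check, so it is only a first pass.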
2. Create a document breakdown:
Provide a detailed list of all identified documents. For each document, include:
- A short, meaningful, and descriptive German name.
- The exact page range(s) that belong to that document.
- The date when the document was created or issued (format: YYYY-MM-DD).
Present this information in the following format:
1. [German name]: Pages [X-Y], Date: [YYYY-MM-DD]
2. [German name]: Pages [X-Y], Date: [YYYY-MM-DD]
...
3. Generate appropriate filenames:
For each document, create a filename following these rules:
- Use lowercase letters only (no uppercase); digits, underscores, and the hyphens in the date are allowed.
- Replace spaces with underscores (_).
- Use short, descriptive German names.
- Append the document's creation or issue date at the end, before the .pdf extension (format: YYYY-MM-DD).
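The naming rules in step 3 can be sketched as a small bash helper. `make_filename` is a hypothetical name, and transliterating umlauts (ä to ae, ß to ss) is an assumption the rules do not state; it is only one way to keep the result within plain lowercase letters.

```shell
#!/bin/bash
# Hypothetical helper: build "<name>_<YYYY-MM-DD>.pdf" from a German
# document name and a date. Umlaut transliteration is an assumption.
make_filename() {
  local name="$1" date="$2" slug
  slug=$(printf '%s' "$name" \
    | LC_ALL=C sed -e 's/ä/ae/g; s/ö/oe/g; s/ü/ue/g; s/ß/ss/g' \
                   -e 's/Ä/ae/g; s/Ö/oe/g; s/Ü/ue/g' \
    | LC_ALL=C tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C sed -e 's/[^a-z0-9]\{1,\}/_/g' -e 's/^_//' -e 's/_$//')
  # Runs of spaces and other characters were collapsed to single underscores above.
  printf '%s_%s.pdf\n' "$slug" "$date"
}
```

For example, `make_filename "Rechnung Stadtwerke München" "2024-03-15"` yields `rechnung_stadtwerke_muenchen_2024-03-15.pdf`.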
4. Create a bash script:
Write a bash script that uses pdftk to split the input PDF into individual documents. The script should:
- Define the input PDF filename as "merged_documents.pdf".
- Include variables for each output filename.
- Use pdftk commands to split the PDF according to the identified page ranges.
- Be ready to run without modifications.
Present the script as markdown in the following format:
#!/bin/bash
# Input PDF file
input_pdf="merged_documents.pdf"
# Output files with dates
[filename_variable]="[generated_filename].pdf"
...
# Splitting PDF pages with pdftk
pdftk "$input_pdf" cat [page_range] output "$[filename_variable]"
...
echo "PDFs erfolgreich aufgeteilt."
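A filled-in instance of the template above might look like the following. The document names, dates, and page ranges are invented for illustration. The `split_documents` wrapper and the `PDFTK` override are also assumptions, added only so the commands can be dry-run without pdftk installed.

```shell
#!/bin/bash
# Illustrative only: names, dates, and page ranges below are invented.
# Input PDF file
input_pdf="merged_documents.pdf"
# Assumption: PDFTK may be overridden (e.g. PDFTK=echo) to print the commands.
PDFTK="${PDFTK:-pdftk}"

# Output files with dates
rechnung_stadtwerke="rechnung_stadtwerke_2024-03-15.pdf"
mietvertrag="mietvertrag_2023-11-01.pdf"

# Splitting PDF pages with pdftk
split_documents() {
  "$PDFTK" "$input_pdf" cat 1-3 output "$rechnung_stadtwerke" || return 1
  # Page 4 is assumed to be blank and is skipped.
  "$PDFTK" "$input_pdf" cat 5-9 output "$mietvertrag" || return 1
  echo "PDFs erfolgreich aufgeteilt."
}
```

Calling `split_documents` as the last line of the script performs the split; running it with `PDFTK=echo` prints the pdftk arguments instead of executing them.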
5. Provide final verification:
After completing the analysis and script creation, provide the following information:
- Total number of pages in merged PDF: [specify]
- Total number of pages accounted for (document pages plus blank pages): [specify]
- Number of blank pages identified: [specify]
- Confirmation that all pages are accounted for: [Yes/No]
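The page accounting in step 5 is simple arithmetic over the page ranges, sketched here under stated assumptions. `count_pages` and `total_pages` are hypothetical helper names; `total_pages` relies on the `NumberOfPages:` line that `pdftk ... dump_data` prints.

```shell
#!/bin/bash
# Hypothetical helper: sum the pages covered by ranges such as "1-3" or "4".
count_pages() {
  local total=0 range start end
  for range in "$@"; do
    start=${range%-*}   # text before the hyphen (the whole value for a single page)
    end=${range#*-}     # text after the hyphen (the whole value for a single page)
    total=$(( total + end - start + 1 ))
  done
  echo "$total"
}

# Hypothetical helper: read the page count of a PDF from pdftk's metadata dump.
total_pages() {
  pdftk "$1" dump_data | awk '/^NumberOfPages:/ {print $2}'
}
```

With the ranges `1-3` and `5-9` plus one blank page, `count_pages 1-3 5-9` gives 8 document pages, and 8 + 1 should equal the value reported by `total_pages merged_documents.pdf` if every page is accounted for.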
Ensure that you:
- Analyze the PDF thoroughly, identifying all documents and ignoring blank pages.
- Double-check all page ranges for accuracy.
- Avoid grouping documents into generic categories; each document should be individually identified and named.
- Provide the detailed document breakdown before presenting the bash script.
Present your final output in this order:
1. Document breakdown
2. Bash script
3. Final verification