Created
November 26, 2024 17:53
You are tasked with analyzing a merged PDF containing multiple documents in German and creating a bash script to split and rename these documents. Follow these instructions carefully:

First, carefully examine the content of the merged PDF.

Now, follow these steps:

1. Analyze the PDF:
- Identify all distinct documents within the merged PDF.
- Ignore any blank pages (pages that contain no letters).
- Determine the page range for each document.
- Identify the creation or issue date for each document (format: YYYY-MM-DD).

2. Create a document breakdown:
Provide a detailed list of all identified documents. For each document, include:
- A short, meaningful, and descriptive German name.
- The exact page range(s) that belong to that document.
- The date when the document was created or issued (format: YYYY-MM-DD).
Present this information in the following format:
1. [German name]: Pages [X-Y], Date: [YYYY-MM-DD]
2. [German name]: Pages [X-Y], Date: [YYYY-MM-DD]
...

3. Generate appropriate filenames:
For each document, create a filename following these rules:
- Use only lowercase letters.
- Replace spaces with underscores (_).
- Use short, descriptive German names.
- Include the document's creation date at the end (format: YYYY-MM-DD).
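
The filename rules above can be sketched as a small bash helper. This is only an illustration: the function name `make_filename` is hypothetical, and transliterating umlauts (ä to ae, ß to ss) and dropping any other characters are assumptions that the rules above do not state.

```bash
#!/bin/bash
# Hypothetical helper (not part of the prompt): turn a German document name
# and an ISO date into a filename stem that follows the rules above.
make_filename() {
    local title="$1" date="$2" slug
    slug=$(printf '%s' "$title" \
        | sed -e 's/ä/ae/g' -e 's/Ä/ae/g' \
              -e 's/ö/oe/g' -e 's/Ö/oe/g' \
              -e 's/ü/ue/g' -e 's/Ü/ue/g' \
              -e 's/ß/ss/g' \
        | tr '[:upper:]' '[:lower:]' \
        | tr -s ' ' '_' \
        | tr -cd 'a-z0-9_-')   # assumption: drop everything else
    printf '%s_%s\n' "$slug" "$date"
}

make_filename "Rechnung Stadtwerke" "2024-03-15"    # rechnung_stadtwerke_2024-03-15
make_filename "Kündigung Mietvertrag" "2024-01-31"  # kuendigung_mietvertrag_2024-01-31
```

The `.pdf` extension is left off here because the script template in step 4 appends it.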

4. Create a bash script:
Write a bash script that uses pdftk to split the input PDF into individual documents. The script should:
- Define the input PDF filename as "merged_documents.pdf".
- Include variables for each output filename.
- Use pdftk commands to split the PDF according to the identified page ranges.
- Be ready to run without modifications.
Present the script as markdown in the following format:
#!/bin/bash
# Input PDF file
input_pdf="merged_documents.pdf"
# Output files with dates
[filename_variable]="[generated_filename].pdf"
...
# Splitting PDF pages with pdftk
pdftk "$input_pdf" cat [page_range] output "$[filename_variable]"
...
echo "PDFs erfolgreich aufgeteilt."
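
When a document skips a blank page, its pages are no longer one contiguous range. pdftk's `cat` operation accepts several ranges in a row, so one document can still be written with a single command. The sketch below illustrates this; the function name `split_document`, the `DRY_RUN` switch, and the example filename and page numbers are all hypothetical.

```bash
#!/bin/bash
# Hypothetical helper (not part of the prompt): extract one document,
# given as one or more page ranges, from the merged PDF.
# With DRY_RUN=1 the command is printed instead of executed.
split_document() {
    local input_pdf="$1" output_pdf="$2"
    shift 2
    # Remaining arguments are page ranges such as 1-2 or 4.
    local cmd=(pdftk "$input_pdf" cat "$@" output "$output_pdf")
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, `DRY_RUN=1 split_document merged_documents.pdf rechnung_2024-03-15.pdf 1-2 4` prints the pdftk command for a document that covers pages 1, 2 and 4, leaving out a blank page 3.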

5. Provide final verification:
After completing the analysis and script creation, provide the following information:
- Total number of pages in merged PDF: [specify]
- Total number of pages accounted for: [specify]
- Number of blank pages identified: [specify]
- Confirmation that all pages are accounted for: [Yes/No]
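
The page accounting above can also be checked mechanically. The following is a minimal sketch, assuming page ranges are written as `X-Y` or as a single page number; the function name `count_pages` and the example ranges are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper (not part of the prompt): count the pages covered
# by a list of ranges such as 1-3 or single pages such as 4.
count_pages() {
    local total=0 range first last
    for range in "$@"; do
        first=${range%-*}   # part before the hyphen (whole value if none)
        last=${range#*-}    # part after the hyphen (whole value if none)
        total=$(( total + last - first + 1 ))
    done
    echo "$total"
}

count_pages 1-3 4 6-9   # 8

# The total page count of the merged PDF can be read with pdftk, e.g.:
#   pdftk merged_documents.pdf dump_data | grep NumberOfPages
```

If the counted pages plus the number of blank pages equals the total reported by pdftk, all pages are accounted for.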

Ensure that you:
- Analyze the PDF thoroughly, identifying all documents and ignoring blank pages.
- Double-check all page ranges for accuracy.
- Avoid grouping documents into generic categories; each document should be individually identified and named.
- Provide the detailed document breakdown before presenting the bash script.

Present your final output in this order:
1. Document breakdown
2. Bash script
3. Final verification