Delete selected pages from a PDF based on matching content?

Guidance would be appreciated here. I'll do my homework once I have some bread crumbs to lead me down the right path.

I download 12 monthly bank statements and then append them into one sequential PDF file. The total is about 50-60 pages per year.

Each month includes several repetitive and unnecessary pages. For example, there is always a template page for balancing the checkbook (which I never use), and depending on the number of pages in the statement there will often be a blank page containing the single sentence, "This page intentionally left blank."

I am able to use content-matching tokens successfully. For example, a match for "intentionally" easily identifies the blank pages.

Once I have a match, is there any way (via Hazel, CLI, AppleScript, Automator) to delete the 8-12 matched pages interspersed throughout the longer PDF file?

* Scan contents of page 1.
* IF page contains "intentionally", THEN delete page.
* Next page ...
* Continue UNTIL last page
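
The page-by-page loop above can be sketched in Python without bursting the file first. This is only a sketch under assumptions that are not in the thread: it relies on the third-party pypdf library, the names `pages_to_keep` and `strip_pages` are made up for illustration, and it assumes the statement pages contain extractable text (scanned images would need OCR first).

```python
from typing import Iterable, List


def pages_to_keep(page_texts: Iterable[str], needles: Iterable[str]) -> List[int]:
    """Return zero-based indexes of pages whose text contains none of the needles."""
    lowered = [n.lower() for n in needles]  # case-insensitive matching
    return [
        i for i, text in enumerate(page_texts)
        if not any(n in (text or "").lower() for n in lowered)
    ]


def strip_pages(src: str, dst: str, needles: Iterable[str]) -> int:
    """Copy src to dst without the matching pages; return how many were dropped."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported here
    # so that pages_to_keep can be used and tested without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    # extract_text() may yield nothing for image-only pages; treat that as "".
    texts = [page.extract_text() or "" for page in reader.pages]
    keep = pages_to_keep(texts, needles)

    writer = PdfWriter()
    for i in keep:
        writer.add_page(reader.pages[i])
    with open(dst, "wb") as fh:
        writer.write(fh)
    return len(texts) - len(keep)
```

With hypothetical file names, a call such as `strip_pages("statements.pdf", "statements-clean.pdf", ["intentionally"])` would write a copy without the blank pages. A phrase that appears only on the checkbook-balancing template could be added to the list in the same way.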

This is a relatively trivial example because it doesn't take too long to do this manually in Preview. OTOH, we're always looking for ways to identify and eliminate repetitive (and error-prone) tasks, and this process would be invaluable for more complex situations, or much larger files.

TIA

Ed
ejgallaher
 
Posts: 6
Joined: Thu Dec 04, 2014 8:52 pm

After my original question, an inelegant, but probably workable solution occurred to me.

1. (Automator workflow) Burst the monthly PDF (4-5 pages per month x 12 months =~ 50-60 pages) into individual pages, thus creating 50-60 one-page PDF documents.

2A. (Hazel) IF THE FOLLOWING...: Use content matching to identify those files that include the word "intentionally".

2B. (Hazel) ACTION: Move matching files to 'discard folder'.

3. (Automator workflow) Append the remaining PDF files into one sequential file.
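
For step 3, one detail that is easy to trip over is file order: a plain alphabetical sort puts a page numbered 10 before a page numbered 2. A rough sketch of the append step in Python follows, assuming the burst pages carry a number somewhere in their file names and that the third-party pypdf library is available. The names `natural_key`, `ordered_pdfs` and `merge_folder` are hypothetical.

```python
import re
from pathlib import Path
from typing import List


def natural_key(name: str) -> list:
    """Split a name into text and integer chunks so 'page-10' sorts after 'page-2'."""
    # re.split with a capturing group always alternates text, digits, text, ...
    # so lists built from different names compare like with like.
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


def ordered_pdfs(folder: str) -> List[Path]:
    """Return the PDF files in folder, sorted in natural (numeric-aware) order."""
    return sorted(
        (p for p in Path(folder).iterdir() if p.suffix.lower() == ".pdf"),
        key=lambda p: natural_key(p.name),
    )


def merge_folder(folder: str, dst: str) -> int:
    """Append every PDF in folder to dst in natural order; return the file count."""
    # Assumption: pypdf is installed (pip install pypdf).
    from pypdf import PdfWriter

    writer = PdfWriter()
    files = ordered_pdfs(folder)
    for path in files:
        writer.append(str(path))  # appends all pages of that file
    with open(dst, "wb") as fh:
        writer.write(fh)
    return len(files)
```

Writing `dst` to a different folder than the pages avoids the merged file being picked up as an input on a later run.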

Hmmm -- that doesn't seem so hard... I'll try it tomorrow and report back.

Ed
ejgallaher
 

UNCLE! Re: Delete selected page from pdf
Fri Dec 12, 2014 12:02 am • by ejgallaher

OK - I've given it the old college try, and more, over the past two days. I can make all the pieces work separately, but for the life of me cannot make them play nicely together.

Here's what works:

1. I place twelve (12) monthly bank statements into a folder. Each file is 2 to 4 pages. :)
2. A Hazel rule runs an Automator workflow to split each statement into separate pages. :)
3. The next Hazel rule matches contents which are found only on 'junk' pages. Once identified, these pages are moved into a 'discard' folder. :)

PROGRESS TO DATE:

The above workflow retains all the 'good' pages in the previous folder, labeled numerically. ALL (?) that needs to be done is to merge them into a single PDF file, thus collecting all the statements from one year into a single document. :)

Easy, right? I wish!

This works: Create an Automator.app (Append-PDF.app) that combines the files. Open (manually) the folder, select all, drag to the Append-PDF.app icon on the desktop, and voila! I'm done! :)

HOWEVER... AFAIK, Hazel cannot process an Automator APP; it requires an Automator WORKFLOW. Simple enough... :)

I have converted the app to a workflow, which, again, works fine when I run it manually. But when I incorporate the workflow into a Hazel rule, it executes 12 times, once for each file in the folder. I have tried restructuring the Automator workflow as 'get folder', 'get folder contents', 'merge pdf', which, again, works fine (albeit 12 times). I cannot seem to create a Hazel rule that sends ALL of the files to Automator, ONCE, and then stops.
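
One possible way around the once-per-file behaviour described above is to let the rule fire for every file but make the merge itself a no-op when nothing has changed. The sketch below is not the solution that was eventually found in this thread. It assumes the rule can run a script for each match and hand it the matched file's path, that the runs happen one after another, and that the merged output lives in the same folder under the hypothetical name `merged.pdf`. The function name `needs_merge` is also made up.

```python
from pathlib import Path


def needs_merge(folder: str, merged_name: str = "merged.pdf") -> bool:
    """True when a page PDF in folder is newer than the merged output, or no output exists."""
    root = Path(folder)
    merged = root / merged_name
    # Every PDF except the merged output itself counts as an input page.
    pages = [p for p in root.glob("*.pdf") if p.name != merged_name]
    if not pages:
        return False  # nothing to merge
    if not merged.exists():
        return True   # first run: output has not been built yet
    merged_mtime = merged.stat().st_mtime
    # Rebuild only if some input changed after the last merge.
    return any(p.stat().st_mtime > merged_mtime for p in pages)
```

A wrapper script would take the parent folder of the file it was given, call `needs_merge` on it, and run the merge only when the result is true. The first of the 12 invocations would then build the output and the other 11 would return immediately.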

It would take me another two hours to document all the things I've tried -- rather than do that, it's time to wave the white flag and see if anyone has the answer. (It's probably embarrassingly simple, but I can't figure it out.)

Suggestions? Comments? Help... !

Ed
ejgallaher
 

OK... !!! :D

I got it working -- combination of Hazel and Automator. Whew!

I notice that several people have asked the same, or similar questions in the past. Whether or not they got it working, I have never seen the workflow posted anywhere that would solve this problem.

I found several gotchas that were (much) less than obvious. One appears to be a bug in Automator. If it is not a bug, then the 'obvious' workflow for identifying the path to save items in a given folder is very inconsistent and misleading.

I'm tired -- it's late, so I won't write up the details tonight, but I promise I'll do this over the weekend. I hope others find it useful.

ejg
ejgallaher
 

ejgallaher wrote: I won't write up the details tonight, but I promise I'll do this over the weekend.


That was written in 2014 :(

Any chance anyone figured out how to make this work smoothly?
cai
 
Posts: 3
Joined: Thu Jul 13, 2017 3:07 pm

