I was tempted to latch on to the forum topic below, but my query is a bit more basic (I think!):

viewtopic.php?f=4&t=2021&p=8434&hilit=using+mdls#p8434
I have trawled the Support forum looking for information about getting at the raw text content of PDF files. There are a few topics on this, but they are unfortunately aimed at people who presumably have far more Terminal coding knowledge than I do.
I'm hoping someone can dumb this down for me.
I have many PDF files collected from a variety of places over the past few years.
Obviously, some of these are OCR'ed already, but many are not.
And as far as I can tell, the only way to know offhand whether a file has been OCR'ed is to open it and perform a search.
UNLESS you tell Hazel to do that for you, by differentiating between files that have already been OCR'ed and those that still need to be.
I have managed the following so far, using various useful threads listed throughout the Support forums:
a.) Sort new and old PDFs as I need them, including [sort into subfolders];
b.) ...
c.) Have Hazel send a file (that needs to be OCR'ed) to PDFpen, and have the magic happen.
d.) Have Hazel sort and move the recently OCR'ed files back to where they came from.
As is hopefully apparent from the above - I'm stuck at b.)
I need to get Hazel to do the following, based on this observation:
I've managed to work out that if the raw text of a PDF file contains the terms "encoding" and/or "decodeparms", then there is a very high possibility that the PDF in question has not been OCR'ed.
What I mean by the above is: if I select a PDF, open it with [TextEdit], and then simply search for either "encoding" or "decodeparms", and the terms are present in the text, then the file has (probably) not been OCR'ed.
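From what I have pieced together, the same manual check could probably be run over a whole folder from Terminal instead of opening each file in TextEdit. This is only a rough sketch under my own assumptions: the function name report_markers is something I made up, the folder is passed in as the first argument, and I am assuming grep's -a option (treat a binary file as text) is what lets it read a PDF this way.

```bash
#!/bin/bash
# Sketch: report which PDFs in a folder contain either marker term.
# "report_markers" is a made-up name, not part of Hazel or any tool.

report_markers() {
    local dir="$1" f
    for f in "$dir"/*.pdf; do
        # Skip the literal pattern if the folder contains no PDFs.
        [ -e "$f" ] || continue
        # -a: read the binary PDF as text, -i: ignore case,
        # -q: print nothing, -e: one pattern each (either one matches).
        if grep -a -i -q -e "encoding" -e "decodeparms" "$f"; then
            echo "markers found: $f"
        else
            echo "no markers: $f"
        fi
    done
}

# Only run when a folder path was actually given.
if [ -n "${1:-}" ]; then
    report_markers "$1"
fi
```

Saved as, say, check_pdfs.sh (a hypothetical file name), I believe it would be run as: bash check_pdfs.sh ~/Documents/PDFs. The output only says whether the terms occur in each file; what that means for OCR is still my own guess from the files I have checked by hand.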
[As an aside, I had a look at the metadata by running 'mdls' in Terminal, as many have suggested - but I could not discern any pattern distinguishing files that have been OCR'ed from those that have not.]
This is then step b.) - I want to get Hazel to open certain files moved into a watched folder in TextEdit, run the search to confirm whether either "encoding" or "decodeparms" is present - AND IF SO, initiate c.) - d.).
I had a look at the following topic: viewtopic.php?p=3593#p3593
Unfortunately, it went straight over my head.
Specifically the following little bit of code:
#! /bin/bash
if ! grep Font "$1"
then
echo "This file needs to be OCRed"
else
echo "This file does not need to be OCRed"
fi
In that thread, "Applesuperlatives" walked the same path I am trying to walk, and concluded that in his case "Font" was the keyword. For whatever reason (and I checked), that doesn't work on my side - as mentioned, my keywords are "encoding" and/or "decodeparms".
So presumably, I could simply replace "Font" in the above with my two search terms...
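My untested guess at what that swap would look like is below. Note the logic flips compared with the "Font" version: there, a missing keyword meant the file needed OCR, whereas I am assuming that the presence of "encoding" or "decodeparms" means it still needs OCR. The function name needs_ocr is my own invention, and the -a, -i and -q options are grep options I picked up from reading around.

```bash
#!/bin/bash
# Sketch: decide whether one PDF still needs OCR.
# Assumption (mine): finding "encoding" or "decodeparms" in the raw file
# means it has NOT been OCR'ed yet.

needs_ocr() {
    # -a: treat the binary PDF as text, -i: ignore case,
    # -q: stay quiet and only set the exit status,
    # -e: one pattern each, so either term counts as a match.
    grep -a -i -q -e "encoding" -e "decodeparms" "$1"
}

# Only run the check when a file path was actually passed in as $1.
if [ -n "${1:-}" ]; then
    if needs_ocr "$1"; then
        echo "This file needs to be OCRed"
    else
        echo "This file does not need to be OCRed"
    fi
fi
```

As far as I understand it, Hazel's "Passes shell script" condition hands the file to the script as $1 and treats an exit status of 0 as a match. If that is right, the echo lines would not matter to Hazel, and the embedded script could probably be reduced to the single grep line so that its exit status decides whether steps c.) and d.) run.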
But I don't even vaguely know where to begin.
Do I simply enter the script as above into Hazel?
Do I need to expand on some of the steps - since I presume shorthand has been used to simplify things?
I've googled quite a bit so far - and realised I am either going to spend hours sharpening my (virtually non-existent) Terminal skills in order to understand the above - or I can try my luck here...
Here's hoping someone can provide me with some tips...
