Use OCR'd date as filename


Moderator: Mr_Noodle

Use OCR'd date as filename Wed Jan 02, 2013 1:31 pm • by 73inches
I've searched a lot but couldn't find anything that helped me with this. Maybe there's a clever head around who can handle it:

I'm looking for a way to take the first date in an already OCR'd PDF that matches the pattern DD.MM.YYYY and rename the file to YYYY_MM_DD-name.pdf. Oh, and I'm not really a shell expert!
73inches
 
Posts: 1
Joined: Wed Jan 02, 2013 1:19 pm

Re: Use OCR'd date as filename Wed Jan 02, 2013 2:56 pm • by Mr_Noodle
If you are referring to a date in the file's contents, then you will probably have to do some level of scripting. I'm guessing someone else can chime in here with a specific script but I imagine it would involve using "grep". You can try searching the forums for that to see scripts that people already have posted.
Mr_Noodle
Site Admin
 
Posts: 11865
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: Use OCR'd date as filename Wed Jan 02, 2013 6:44 pm • by a_freyer
Unfortunately, the PDF text layer is not readable when the PDF file is read as plain text, so command-line text tools (e.g. grep, awk, sed) will not work on the file directly.

You will have to use a (very clever) method like this to extract the contents first.

Here's the basic idea for your case:

Code:
/usr/bin/mdimport -d2 "$1" 2>&1 | grep -Eow '[0-9]{2}\.[0-9]{2}\.[0-9]{4}' | head -n 1
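
The one-liner above prints the date as DD.MM.YYYY, while the requested filename needs YYYY_MM_DD. A minimal sketch of that reordering step follows; the function name `first_date_prefix` is made up for illustration, and it only assumes the extracted text arrives on standard input.

```shell
#!/bin/bash
# Hypothetical helper (not part of Hazel or mdimport): reads text on stdin and
# prints the first DD.MM.YYYY date rewritten as YYYY_MM_DD.
# Prints nothing if no matching date is found.
first_date_prefix() {
  grep -Eow '[0-9]{2}\.[0-9]{2}\.[0-9]{4}' \
    | head -n 1 \
    | sed -E 's/^([0-9]{2})\.([0-9]{2})\.([0-9]{4})$/\3_\2_\1/'
}

# Possible usage on macOS, assuming mdimport -d2 writes the extracted text
# to stderr as the command in this post relies on:
#   /usr/bin/mdimport -d2 "$1" 2>&1 | first_date_prefix
```

The pattern only checks the shape of the date, not whether the day and month are valid, so something like 99.99.2013 would also match.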


But since you need this first date as a custom token, you'll have to run this through AppleScript and use hazelExportTokens.
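
If the custom token route turns out to be more than you need, a different approach is to do the rename in the shell script itself. This is only a sketch of that alternative, not the AppleScript method described here: `rename_with_prefix` is a hypothetical helper, and it assumes you already have the date as a YYYY_MM_DD string. It is meant as a standalone script rather than a replacement for Hazel's own rename action.

```shell
#!/bin/bash
# Hypothetical helper: rename FILE to PREFIX-<original name> in the same folder.
# PREFIX must look like YYYY_MM_DD; anything else is rejected.
rename_with_prefix() {
  local prefix="$1" file="$2"

  # Refuse to rename when no valid date prefix was supplied.
  if ! printf '%s\n' "$prefix" | grep -Eqx '[0-9]{4}_[0-9]{2}_[0-9]{2}'; then
    echo "no valid date prefix, leaving '$file' untouched" >&2
    return 1
  fi

  # Split the path into folder and file name.
  local base="${file##*/}"
  local dir="."
  case "$file" in
    */*) dir="${file%/*}" ;;
  esac

  # -n: do not overwrite an existing file that already has the target name.
  mv -n -- "$file" "$dir/$prefix-$base"
}

# Possible usage: rename_with_prefix "2012_12_24" "/path/to/scan.pdf"
```

Rejecting an empty or malformed prefix matters here because a PDF without any matching date would otherwise be renamed to "-name.pdf".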

If you have trouble with this, post back.
a_freyer
 
Posts: 631
Joined: Tue Sep 30, 2008 9:21 am
Location: Colorado

