
Use OCR'd date as filename

Posted: Wed Jan 02, 2013 1:31 pm
by 73inches
I've searched a lot but couldn't find anything that helped me with this. Maybe there's a clever head around who can handle it:

I'm looking for a way to find the first date matching the pattern DD.MM.YYYY in an already OCR'd PDF and rename the file to YYYY_MM_DD-name.pdf. Oh, and I'm not really a shell expert!

Re: Use OCR'd date as filename

Posted: Wed Jan 02, 2013 2:56 pm
by Mr_Noodle
If you are referring to a date in the file's contents, then you will probably have to do some level of scripting. I'm guessing someone else can chime in here with a specific script, but I imagine it would involve using "grep". You can try searching the forums for that to see scripts that people have already posted.

Re: Use OCR'd date as filename

Posted: Wed Jan 02, 2013 6:44 pm
by a_freyer
Unfortunately, the text layer of a PDF isn't readable as plain text in the raw file (the content is usually stored compressed), so command-line text tools (e.g. grep, awk, sed) won't work if you point them at the PDF directly.

You will have to use a (very clever) method like this to extract the contents first.

Here's the basic idea for your case:

Code:
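# Dump the Spotlight importer's debug output (it includes the PDF text) to a temp file
/usr/bin/mdimport -d2 "$1" > /tmp/_extractedText.tmp 2>&1
# Print the first DD.MM.YYYY match
grep -Eow '[0-9]{2}\.[0-9]{2}\.[0-9]{4}' /tmp/_extractedText.tmp | head -1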


But since you need this first date as a custom token, you'll have to run this through AppleScript and use hazelExportTokens.
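
If the custom token turns out to be more trouble than it's worth, a shell script could in principle do the whole rename itself. What follows is only a rough sketch: the function names are made up for illustration, it assumes "$1" is the file the script is handed, and it assumes the mdimport -d2 debug output still contains the PDF text on your macOS version (that output format isn't guaranteed).

```shell
#!/bin/bash
# Sketch only: these function names are hypothetical, not part of Hazel or macOS.

# Print the first DD.MM.YYYY date found on stdin (prints nothing if there is none).
first_date() {
    grep -Eow '[0-9]{2}\.[0-9]{2}\.[0-9]{4}' | head -1
}

# Turn DD.MM.YYYY into YYYY_MM_DD.
reorder_date() {
    printf '%s\n' "$1" | sed -E 's/^([0-9]{2})\.([0-9]{2})\.([0-9]{4})$/\3_\2_\1/'
}

# Build the new path: same folder, name prefixed with the reordered date.
dated_path() {
    local file="$1" date="$2"
    printf '%s/%s-%s\n' "$(dirname "$file")" "$(reorder_date "$date")" "$(basename "$file")"
}

# Assumption: mdimport -d2 writes the extracted text to stdout/stderr.
# Returns non-zero and leaves the file alone if no date is found.
rename_by_first_date() {
    local file="$1" date
    date=$(/usr/bin/mdimport -d2 "$file" 2>&1 | first_date)
    [ -n "$date" ] || return 1
    # -n: never overwrite an existing file with the same name
    mv -n "$file" "$(dated_path "$file" "$date")"
}

# Example call, e.g. as the last line of the script:
# rename_by_first_date "$1"
```

Keeping the date handling in small functions means you can try them on plain text first (echo some text into first_date) before letting anything touch your PDFs.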

If you have trouble with this, post back.