Noodlesoft Forums

Posted: **Wed Jan 02, 2013 1:31 pm**

I've searched a lot but couldn't find anything that helped me with this. Maybe there's a clever head around who can handle it:

I'm looking for a way to get the first date matching the pattern DD.MM.YYYY from an already OCR'd PDF and rename the file as YYYY_MM_DD-name.pdf. Oh, and I'm not really a shell expert!

Posted: **Wed Jan 02, 2013 2:56 pm**

If you are referring to a date in the file's contents, then you will probably have to do some level of scripting. I'm guessing someone else can chime in here with a specific script but I imagine it would involve using "grep". You can try searching the forums for that to see scripts that people already have posted.

Posted: **Wed Jan 02, 2013 6:44 pm**

Unfortunately, a PDF's text layer is not accessible when the file is read as plain text, so command-line text tools (e.g. grep, awk, sed) won't work on the PDF directly.

You will have to use a (very clever) method like this to extract the contents first.

Here's the basic idea for your case:

Code:

    /usr/bin/mdimport -d2 "$1" > /tmp/_extractedText.tmp 2>&1
    grep -Eow '[0-9]{2}\.[0-9]{2}\.[0-9]{4}' /tmp/_extractedText.tmp | head -1
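Note that the grep above prints the date as DD.MM.YYYY, while the requested filename wants YYYY_MM_DD. Here is a minimal sketch of that reordering step, assuming the extracted text arrives on stdin. The function name `first_date_token` is made up for illustration and is not part of Hazel or mdimport.

```shell
#!/bin/sh
# Hypothetical helper: read text on stdin, take the first DD.MM.YYYY date,
# and print it reordered as YYYY_MM_DD. Prints nothing if no date is found.
first_date_token() {
    grep -Eow '[0-9]{2}\.[0-9]{2}\.[0-9]{4}' \
        | head -1 \
        | sed -E 's/^([0-9]{2})\.([0-9]{2})\.([0-9]{4})$/\3_\2_\1/'
}

# Possible use with the temp file from the command above (untested assumption):
#   first_date_token < /tmp/_extractedText.tmp
```

This only checks the shape of the date (two digits, two digits, four digits), so something like 99.99.2013 would pass through unchanged in meaning. Add validation if your scans contain such strings.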

But since you need this first date as a custom token, you'll have to run this through AppleScript and use hazelExportTokens.
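If you would rather skip the custom token and let the shell script do the rename itself, something like the following might do. This is a different route from the AppleScript/hazelExportTokens one, and only a sketch: `rename_with_date` is a hypothetical helper, and it assumes you already have the date as a YYYY_MM_DD string.

```shell
#!/bin/sh
# Hypothetical helper: prefix a file's name with a date token.
#   $1 = path to the PDF
#   $2 = date token, e.g. 2013_01_02
rename_with_date() {
    file=$1
    token=$2
    # Refuse to rename when no date was found.
    if [ -z "$token" ]; then
        echo "no date token for $file" >&2
        return 1
    fi
    dir=$(dirname "$file")
    base=$(basename "$file")
    target="$dir/${token}-${base}"
    # Do not overwrite an existing file with the same name.
    if [ -e "$target" ]; then
        echo "target already exists: $target" >&2
        return 1
    fi
    mv "$file" "$target"
}
```

Called as `rename_with_date "$1" 2013_01_02`, this would turn name.pdf into 2013_01_02-name.pdf in the same folder. Keep in mind that if Hazel is also set to rename the file in the same rule, the two may get in each other's way.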

If you have trouble with this, post back.

Use OCR'd date as filename
