Extracting PDF Keywords Metadata as Tokens

Extracting PDF Keywords Metadata as Tokens • Sun Oct 13, 2013 10:30 am • by mdevane
I have been testing whether I can accomplish the following workflow with Hazel alone or with Hazel + AppleScript:

1) Emailed receipts are converted to PDF files programmatically from Mac Mail.
2) The PDFs are then added to a document-management/OCR app (Neat), which automatically identifies the Vendor, Amount, and Receipt Date (among other fields).
3) I manually add the name of the account used to pay for the item: AMEX, MC, VISA, CHECKING, etc.
4) The receipts are then batch-exported to a folder watched by Hazel. I discovered that Neat writes all metadata, whether identified automatically or entered manually, into the PDF Keywords field as a comma-delimited string (e.g. Keywords = "vendor=ABC Company", "date=10/08/2013", "amount=$9.99", "account=AMEX", etc.).
5) I would like Hazel to rename each receipt using that metadata, extracted as tokens.
6) The renamed receipts can then be matched against the bank-generated list of transactions in Quicken.
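
The Keywords format described in step 4 can be split into named fields with a few lines of Python, for example in a script step of the workflow. This is a minimal sketch that assumes the string looks exactly like the example above (comma-separated `field=value` items, optionally double-quoted); the function name `parse_keywords` is hypothetical, and reading the string out of the PDF is not shown.

```python
import csv


def parse_keywords(keywords: str) -> dict:
    """Split a Neat-style Keywords string into a {field: value} dict.

    Assumes (hypothetically) comma-separated "field=value" items,
    optionally wrapped in double quotes, as in the example in the post.
    """
    # csv handles the optional quotes and keeps commas inside a quoted item;
    # skipinitialspace lets a quote right after ", " start a quoted field.
    rows = list(csv.reader([keywords], skipinitialspace=True))
    items = rows[0] if rows else []
    fields = {}
    for item in items:
        key, sep, value = item.partition("=")
        if sep:  # skip keywords that are not field=value pairs
            fields[key.strip().lower()] = value.strip()
    return fields


sample = '"vendor=ABC Company", "date=10/08/2013", "amount=$9.99", "account=AMEX"'
print(parse_keywords(sample))
# {'vendor': 'ABC Company', 'date': '10/08/2013', 'amount': '$9.99', 'account': 'AMEX'}
```

Using the `csv` module rather than a plain `split(",")` keeps a quoted vendor name that itself contains a comma in one piece.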

I have been able to parse the Keywords metadata and extract what I need. The challenge is the date string, which must be converted from "mm/dd/yyyy" to "yyyy-mm-dd" before it can be used in the renaming process. This date is not a file-related date but the "Receipt Date" that Neat extracted automatically.
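
For the scripting route, the conversion itself is short. A minimal Python sketch, assuming the date arrives as the `mm/dd/yyyy` text shown in the example; `receipt_date_iso` is a hypothetical name, and how it would be hooked into a Hazel rule (for instance through a shell-script action) is left open here.

```python
from datetime import datetime


def receipt_date_iso(us_date: str) -> str:
    """Convert a "mm/dd/yyyy" receipt date to "yyyy-mm-dd"."""
    # strptime raises ValueError for impossible dates such as 13/45/2013,
    # so a bad receipt date fails loudly instead of producing a bad name.
    parsed = datetime.strptime(us_date.strip(), "%m/%d/%Y")
    return parsed.strftime("%Y-%m-%d")


print(receipt_date_iso("10/08/2013"))  # 2013-10-08
```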

Can Hazel convert the date string itself, or would I need a scripting language like AppleScript to turn the date into a new token for renaming? If scripting is the answer, could someone point me in the right direction? I am not a programmer, per se, but I am technical enough to mimic and expand on examples.

For the accountants and bookkeepers using Hazel: is there an easier solution you have found or developed?

Thanks in advance for your help.

Regards, Mark

When matching the date in Hazel, match it as a date and not as a string. Hazel then creates an actual date, so when you use the matched token in the rename action, you can reformat it into whatever date format you want. In short: when matching, use a date token, not a custom token.
Mr_Noodle
Site Admin
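
The benefit of matching a real date rather than text can be illustrated outside Hazel. In this Python sketch (an analogy, not Hazel's internals), the receipt date is parsed once and then rendered in several layouts just by changing the format pattern; `parse_receipt_date` is a hypothetical helper.

```python
from datetime import date, datetime


def parse_receipt_date(text: str) -> date:
    """Parse the matched "mm/dd/yyyy" text once into a real date object."""
    return datetime.strptime(text.strip(), "%m/%d/%Y").date()


receipt = parse_receipt_date("10/08/2013")

# Once it is a date, any output layout is just a different format pattern.
# %b is the locale's month abbreviation ("Oct" in the default C locale).
for pattern in ("%Y-%m-%d", "%Y%m%d", "%d %b %Y", "%Y-%m"):
    print(receipt.strftime(pattern))
# 2013-10-08
# 20131008
# 08 Oct 2013
# 2013-10
```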

Re: Extracting PDF Keywords Metadata as Tokens • Thu Oct 17, 2013 10:21 am • by mdevane
Thanks, Mr Noodle. I was actually able to get the match working with Custom Tokens, but only because the individual elements (dd, mm, yyyy) were all present to begin with; I just needed to reorder them and use different date separators.

I will go back and convert the rule to use Date Tokens, per your comments above, as this will give me more flexibility in the future on other data matching routines.

Thanks for your help on this.

- Mark
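
The text-only approach described above (capturing the month, day, and year separately, then reordering them with new separators) can be sketched with a regular expression. This is an illustration in Python, not Hazel's token syntax; `reorder_date` and the `US_DATE` pattern are hypothetical. The last line shows the trade-off that matching as a date avoids: reordering text does not check that the result is a real date.

```python
import re

# Capture the three parts of an mm/dd/yyyy date as separate groups,
# roughly like three custom tokens in a match pattern.
US_DATE = re.compile(r"(\d{1,2})/(\d{1,2})/(\d{4})")


def reorder_date(text: str) -> str:
    """Rearrange mm/dd/yyyy into yyyy-mm-dd purely as text."""
    match = US_DATE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an mm/dd/yyyy date: {text!r}")
    month, day, year = match.groups()
    # zfill pads single-digit parts so "1/8/2013" still sorts correctly
    return f"{year}-{month.zfill(2)}-{day.zfill(2)}"


print(reorder_date("10/08/2013"))  # 2013-10-08
print(reorder_date("13/45/2013"))  # 2013-13-45: not a real date, but accepted
```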

