Extract text from OCR-ed PDF, use it for filename



Hi all,
I scanned and OCR-ed a utilities bill and want to use part of the text as the filename of this document.

The text I want to use is the bill's period, for example: "2015 March".
Here's an example of a piece of the OCR-ed PDF with the needed text string:
...
Resibu Aqualectra
11414527 2015 March
673.66 673.66
...

The string that I'm looking for comes directly after "11414527", which always appears on my bill because it is my account number.

How can I extract "2015 March" from the text?
I need to look for "11414527" and then select the text immediately after that.
All help is greatly appreciated!
SamPieter
 
Posts: 3
Joined: Mon Jun 08, 2015 4:02 pm

If there is only one date in the file, you can use a date attribute and just match that date. If there are too many dates and their positions are variable, then using any surrounding context, like the account number, will help.
Mr_Noodle
Site Admin
 
Posts: 11868
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

...then using any surrounding context...

Yes, that's exactly what I'd like to do, but how?
SamPieter
 

Enter the text into the pattern, like "11414527 (• your date attribute)". With a pattern like that, the date would have to appear after that account number.
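
For readers who want to see the same "anchor on the account number" idea outside Hazel, here is a minimal Python sketch. It is not how Hazel's match patterns work internally; it only illustrates the logic with a regular expression. The names `ocr_text`, `ACCOUNT_NUMBER`, and `extract_period` are made up for this example, and it assumes the period is always a four-digit year followed by a month name, as in the sample bill.

```python
import re

# Sample OCR text copied from the bill excerpt earlier in the thread.
ocr_text = """Resibu Aqualectra
11414527 2015 March
673.66 673.66
"""

# The account number that always precedes the billing period.
ACCOUNT_NUMBER = "11414527"

# Account number, whitespace, then a four-digit year and a month name.
# Assumption: the period always has this "YYYY Month" shape.
pattern = re.compile(re.escape(ACCOUNT_NUMBER) + r"\s+(\d{4})\s+([A-Za-z]+)")


def extract_period(text):
    """Return the billing period such as '2015 March', or None if absent."""
    match = pattern.search(text)
    if match is None:
        return None
    year, month = match.groups()
    return f"{year} {month}"


print(extract_period(ocr_text))  # prints "2015 March"
```

Because the pattern requires the account number first, other dates elsewhere in the document are ignored.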
Mr_Noodle

Enter the text into the pattern, like "11414527 (• your date attribute)".


Where in Hazel should I do this?
SamPieter
 

Since you are looking for it in the contents, you probably want to have a condition like "Contents contain match". Search the help for "match patterns" for more info on using patterns.
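
Once the period has been matched, the remaining step is using it as the filename. In Hazel that would be done with a rename action in the rule; the short Python sketch below only shows the equivalent logic for anyone scripting it by hand. The function name `renamed_path` and the path `Scans/scan_0001.pdf` are hypothetical, and the sketch only builds the new path without touching any file.

```python
from pathlib import Path


def renamed_path(original, period):
    """Build a new path in the same folder, named after the billing period."""
    original = Path(original)
    # Keep the original extension (e.g. ".pdf") and replace only the name.
    return original.with_name(f"{period}{original.suffix}")


# Hypothetical scan location, used purely for illustration.
new_path = renamed_path("Scans/scan_0001.pdf", "2015 March")
print(new_path.name)
```

To actually rename a file, the result could be passed to `Path.rename`, after checking that no file with that name already exists.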
Mr_Noodle

