Match Patterns with variable number of words

Posted:
Mon Mar 10, 2014 2:33 pm
by temjeito
I have set up a workflow where scanned and OCRed files are searched for dates and other patterns, which are then used to rename the file. I want to create a rule that renames certain letters as follows: [DATE] from [SENDER] RE: [WHATEVER IS IN THE RE: LINE]. Where I get stuck is the "RE" line, which has a variable number of words in it. For example, one letter may say:
RE: Inventory Audit
while another may say:
RE: Team Meeting Monday Morning
How can I create a token that captures whatever is in the RE: line, regardless of how long or short it is, and nothing more?
What I have now is:
If all of the following conditions are met
Contents contain Bob Smith
Contents contain match (Date Match)
Contents contain (Re Line)
(Re Line) is a custom token that looks like this:
RE: (abc)(...)
But that only gets me the first word of the RE line. If I add more (...) elements, it will go on to the next line if it gets to the end of the RE line and there are more tokens. Since I can't predict how many words there will be, I need a way to tell it to capture everything after the RE:, until it gets to the end of the line. Is there a way to do that?
Re: Match Patterns with variable number of words

Posted:
Mon Mar 10, 2014 3:40 pm
by Mr_Noodle
Try (...) without any other tokens or spaces around it. That should grab anything until the end of the line.
Re: Match Patterns with variable number of words

Posted:
Mon Mar 10, 2014 4:45 pm
by temjeito
Mr_Noodle wrote: Try (...) without any other tokens or spaces around it. That should grab anything until the end of the line.
Thanks.
RE: (abc)(...) outputs "RE:" plus the first word of the RE line.
RE:(...) outputs just "RE:"
Re: Match Patterns with variable number of words

Posted:
Wed Mar 12, 2014 11:20 am
by arjunm
I have the exact same issue. I've gotten very far with content matching and would like to add a PDF e-mail's subject line to the file name. I'm using:
Subject: *
Where the * is a custom token containing
(...)
This just throws back the first word after the colon and space in "Subject: "
Re: Match Patterns with variable number of words

Posted:
Thu Mar 13, 2014 2:15 pm
by Mr_Noodle
There might be some odd invisible characters in there. I suggest emailing support so I can take a look at the files and see what's going on.
Re: Match Patterns with variable number of words

Posted:
Fri Mar 14, 2014 2:34 pm
by Mr_Noodle
Actually, thinking about it more, this probably won't work. When matching something like a filename, the "anything" token matches everything to the end of the line (if it's the last thing in the pattern), but matching on file contents doesn't work that way. I need to think about how to make the "anything" token smarter in this case. I'll add an entry in the bug database to look into this for a future release.
Re: Match Patterns with variable number of words

Posted:
Sat Mar 15, 2014 2:38 pm
by arjunm
Awesome. Thanks for looking into this.
I found a workaround that works for me now, but it's pretty unstable / dependent on the PDF template in use. What I did was use a token as follows to match:
Subject: (*) Date:
The (*) here stands for a (...) "anything" token. This matches the subject because the next line begins with "Date:". The match seems to work as long as I specify what comes *after* the "anything" token. Fortunately that is consistent for me, since the PDFs I am matching come from pdfconvert.me and use a consistent template. It wouldn't work for emails in general, though, because mail clients don't always put the date after the subject, may label the field differently, may use another language, etc.
But it works so far.
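For anyone who wants to try the same idea outside Hazel, here is a rough regular-expression analogue of the workaround in Python. It is only a sketch: it is not how Hazel implements its tokens, and the sample text and the function name `subject_before_date` are made up for illustration. It captures lazily after "Subject:" and stops at the first "Date:" label, so it has the same weakness described above when the template changes.

```python
import re

# Hypothetical extracted PDF text; the field order is an assumption
# that mirrors the template described in the post.
SAMPLE = (
    "From: Bob Smith\n"
    "Subject: Team Meeting Monday Morning\n"
    "Date: Mon, 10 Mar 2014\n"
)

# Lazy capture after "Subject:" that ends where whitespace + "Date:" begins.
# re.DOTALL lets the capture span a wrapped subject line if needed.
SUBJECT_BEFORE_DATE = re.compile(r"Subject:\s*(.+?)\s+Date:", re.DOTALL)


def subject_before_date(contents):
    """Return the text between 'Subject:' and the next 'Date:', or None."""
    match = SUBJECT_BEFORE_DATE.search(contents)
    return match.group(1) if match else None


print(subject_before_date(SAMPLE))
```

If "Date:" never appears after the subject, the function returns None rather than guessing, which is the regex equivalent of the rule simply not matching.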
Re: Match Patterns with variable number of words

Posted:
Thu Mar 20, 2014 12:41 pm
by Mr_Noodle
Glad you found a workaround. I will continue to look into a real solution since I know others may need this and it makes sense for it to operate that way.
Re: Match Patterns with variable number of words

Posted:
Tue Apr 08, 2014 12:12 pm
by Mr_Noodle
FYI, I've added this now in version 3.2.6. If the custom token starts with (anything) and it's the first token, it will capture everything from the beginning of the line. Likewise, if (anything) is the last token, it will grab everything to the end of the line.
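As a point of comparison only, the "grab everything to the end of the line" behavior described here resembles an end-of-line-anchored capture in a regular expression. The Python sketch below is an analogy and not Hazel's actual implementation; the function name `re_line` and the sample letter text are hypothetical.

```python
import re

# "RE:" at the start of a line, optional spaces or tabs, then everything up to
# the end of that same line. "." does not match a newline by default, and
# re.MULTILINE makes ^ and $ apply to each line rather than the whole text.
RE_LINE = re.compile(r"^RE:[ \t]*(.*)$", re.MULTILINE)


def re_line(contents):
    """Return the rest of the first 'RE:' line, or None if there is none."""
    match = RE_LINE.search(contents)
    # strip() also drops a stray carriage return from Windows-style line endings.
    return match.group(1).strip() if match else None


# Hypothetical OCR output for two letters.
short_letter = "Dear Bob,\nRE: Inventory Audit\nPlease find attached...\n"
long_letter = "RE: Team Meeting Monday Morning\nHello all,\n"

print(re_line(short_letter))
print(re_line(long_letter))
```

Both calls return the full RE line regardless of how many words it contains, and nothing from the following line.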