Match Patterns with variable number of words

I have set up a workflow where scanned and OCRed files are searched for dates and other patterns, which are then used to rename the file. I want to create a rule that renames certain letters as follows: [DATE] from [SENDER] RE: [WHATEVER IS IN THE RE: LINE]. Where I get stuck is the "RE" line, which has a variable number of words in it. For example, one letter may say:

RE: Inventory Audit

while another may say:

RE: Team Meeting Monday Morning

How can I create a token that captures whatever is in the RE: line, regardless of how long or short it is, and nothing more?

What I have now is:

If all of the following conditions are met
Contents contain Bob Smith
Contents contain match (Date Match)
Contents contain (Re Line)

(Re Line) is a custom token that looks like this:

RE: (abc)(...)

But that only gets me the first word of the RE line. If I add more (...) elements, it will go on to the next line if it gets to the end of the RE line and there are more tokens. Since I can't predict how many words there will be, I need a way to tell it to capture everything after the RE:, until it gets to the end of the line. Is there a way to do that?
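For anyone who wants to see the intended behaviour spelled out, here is a minimal sketch using a Python regular expression. This is not Hazel's token syntax, only an analogy: the helper name extract_re_line and the sample text are made up for illustration, and it assumes the OCR output keeps "RE:" at the start of its own line.

```python
import re

# Illustrative analogy only: Hazel tokens are not regular expressions.
# "." does not match a newline, so "(.*)" stops at the end of the RE: line.
RE_LINE = re.compile(r"^RE:[ \t]*(.*)$", re.MULTILINE)

def extract_re_line(contents):
    """Return the text after "RE:" on the same line, or None if there is no RE: line."""
    match = RE_LINE.search(contents)
    if match is None:
        return None
    # strip() drops trailing spaces or a stray carriage return from the OCR text
    return match.group(1).strip()

sample = "Dear Bob Smith\nRE: Team Meeting Monday Morning\nPlease find attached the agenda."
subject = extract_re_line(sample)
```

The point is that the capture is bounded by the line end rather than by a word count, so a two-word and a four-word RE: line are handled the same way.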

Try (...) without any other tokens or spaces around it. That should grab anything until the end of the line.

Mr_Noodle wrote: Try (...) without any other tokens or spaces around it. That should grab anything until the end of the line.

Thanks.

RE: (abc)(...) outputs "RE:" plus the first word of the RE line.

RE:(...) outputs just "RE:"

I have the exact same issue. I've gotten very far with content matching and would like to add a PDF e-mail's subject line to the file name. I'm using:

Subject: *

Where the * is a custom token containing

(...)

This just throws back the first word after the colon and space in "Subject: "

There might be some odd invisible characters in there. I suggest emailing support so I can take a look at the files and see what's going on.

Actually, thinking about it more, this probably won't work. When matching something like a filename, the "anything" token matches everything to the end of the line (if it's the last thing in the pattern), but matching on file contents doesn't work that way. I need to think about how to make the "anything" token smarter in this case. I'll add an entry in the bug database to look into this for a future release.

Awesome. Thanks for looking into this.

I found a workaround that works for me now, but it's pretty unstable / dependent on the PDF template in use. What I did was use a token as follows to match:

Subject: (*) Date:

The (*) is a (...) anything token. This matches the subject because the next line begins with "Date:". The match seems to work if I specify what comes *after* the "anything" token. Fortunately this is consistent for me, as the PDFs I am matching come from pdfconvert.me and share a consistent template. But it wouldn't work for e-mails in general, because mail clients don't always put the date after the subject, may label it differently, may use another language, etc.

But it works so far.
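The same idea, anchoring the capture on whatever follows it, can be sketched with a Python regular expression. Again this is only an analogy and not how Hazel works internally: the helper name extract_subject and the sample header are invented, and it assumes the template always puts "Date:" right after the subject.

```python
import re

# Illustrative analogy only. The lazy "(.*?)" stops at the first "Date:" that follows,
# and DOTALL lets the match cross the line break between the two headers.
SUBJECT_BEFORE_DATE = re.compile(r"Subject:\s*(.*?)\s*Date:", re.DOTALL)

def extract_subject(contents):
    """Return the text between "Subject:" and the following "Date:", or None."""
    match = SUBJECT_BEFORE_DATE.search(contents)
    return match.group(1) if match else None

header = "From: someone@example.com\nSubject: Inventory Audit for Q3\nDate: Mon, 1 Jan"
subject = extract_subject(header)
```

As noted above, this breaks as soon as the line after the subject is something other than "Date:", which is why it only suits a fixed template.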

Glad you found a workaround. I will continue to look into a real solution since I know others may need this and it makes sense for it to operate that way.

FYI, I've added this now in version 3.2.6. If the custom token starts with (anything) and it's the first token, it will capture everything from the beginning of the line. Likewise, if (anything) is the last token, it will grab everything to the end of the line.