Pattern Matching and Parsing

Posted: Mon May 02, 2016 10:14 am
by jmvenable
I've hit a wall on three little problems; hoping I can get some help.

1. I'm searching for an account number in my credit card statement file, using Contents/Contain Match with a pattern of: numbers<space>numbers<space>numbers<space>numbers. Basically, I want to find the entire account string, but use just the last four digits when renaming the file. Any suggestions on how to pull out the last four digits for the file name?

2. I want to find the begin and end dates for the statement. They appear on a line that looks something like this:

Opening/Closing Date 05/18/14-06/17/14

After many permutations, my only successful match occurred when I set up a custom date (MM/DD/YY) and matched the second and third occurrences of the dates (the first occurrence being the payment due date). Well, that's okay as far as it goes, but I'd rather key off of the "opening/closing date" label to better identify the dates. The problem is that I can't come up with the syntax for a possible match. I've used various combinations of the label, spaces, and "anythings" without success. Any ideas?

3. Similar to problem 1, I want to locate the account balance for renaming the file. The only successful match was:

Anything $Numbers,Numbers, which successfully matches $2,456.78. How can I strip out the comma (I know, picky) so I can simply use 2456 in the file rename?

Sorry for what might be obvious questions. Newbie here. JV

Re: Pattern Matching and Parsing

Posted: Mon May 02, 2016 10:49 am
by jmvenable
Never mind; figured them out. Sometimes just writing out the problem suggests the answer.

Re: Pattern Matching and Parsing

Posted: Fri May 06, 2016 10:23 am
by bver0911
Hi! Can you please share your solutions to these problems? I'm especially interested in how you solved 1 and 2.

Thanks,

B

Re: Pattern Matching and Parsing

Posted: Thu May 12, 2016 4:03 pm
by jmvenable
Sorry, I didn't see your response:

1. For the 16-digit credit card number, I created a pattern where the first twelve digits were literals. The last item was a text token ("LastPart") defined as (any) four 1-digit numbers. Note that this pattern will match only 12 of the 16 numbers. But, for my application, it is safe and practical. When you rename the file, use LastPart to display your card number, e.g. "VISA (...4020)".

2. Once I had Preview to use, it became easy to see what the problem was. The phrase "Opening/Closing Date" was nowhere near the dates I was searching for, even though in print they appeared to be right next to each other. Not on the same line; nowhere close. So it would be like searching for a letter's addressee by knowing where the word "Sincerely" was located. Can't do it.

So, I just had to count the positions of the two dates. I wound up using the third and fourth occurrences. (By the way, looking at the PDF file as it opens, you would swear the dates I need were the first and second occurrences.) Nope. So, use the Preview button.

3. To get the comma out of the balance, I used two custom texts, FirstPart and SecondPart. My pattern was:

$<FirstPart>,<SecondPart>anything ... Where "$" and "," are literal strings.

This will find all dollar amounts that contain a comma. I had to use the nth occurrence feature to figure out which dollar amount was the one I was looking for. Then, when it came time to rename the file, I just put FirstPart and SecondPart in the name with no delimiter between them. This works easily for 4-digit values; as written, it will miss amounts under $1,000, since they contain no comma.
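
For readers who think in regular expressions, the three workarounds above can be roughly sketched in Python. This is not Hazel syntax and not how Hazel works internally; it is only an analogy, assuming the statement text has already been extracted from the PDF. The sample text, the function names, and the exact regular expressions are all hypothetical. Only the token names LastPart, FirstPart, and SecondPart come from the post.

```python
import re

# Hypothetical extracted statement text; the layout and values are made up.
SAMPLE = """\
Payment Due Date 07/12/14
Account Number 1234 5678 9012 4020
Previous Balance $1,950.10
New Balance $2,456.78
05/18/14-06/17/14
"""


def last_four(text):
    """Return the last group of a card number written as four 4-digit groups."""
    # The named group plays the role of the "LastPart" custom text token.
    m = re.search(r"\b\d{4} \d{4} \d{4} (?P<LastPart>\d{4})\b", text)
    return m.group("LastPart") if m else None


def nth_date(text, n):
    """Return the n-th (1-based) MM/DD/YY date found in the text, or None."""
    # Mirrors counting occurrences instead of anchoring on a nearby label.
    dates = re.findall(r"\b\d{2}/\d{2}/\d{2}\b", text)
    return dates[n - 1] if 0 < n <= len(dates) else None


def nth_whole_dollars(text, n):
    """Return the n-th comma-containing dollar amount as digits, cents dropped."""
    # Two groups stand in for FirstPart and SecondPart; "$" and "," are literals.
    matches = re.findall(r"\$(\d{1,3}),(\d{3})", text)
    if not 0 < n <= len(matches):
        return None
    first_part, second_part = matches[n - 1]
    # Joining the parts with no delimiter is what removes the comma.
    return first_part + second_part


print(last_four(SAMPLE))
print(nth_date(SAMPLE, 2), nth_date(SAMPLE, 3))
print(nth_whole_dollars(SAMPLE, 2))
```

As in the post, the dollar pattern skips amounts without a comma, and which occurrence to pick depends entirely on the order in which the text comes out of the PDF, so the occurrence numbers used here apply only to the made-up sample.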

Hope this helps. JV

Re: Pattern Matching and Parsing

Posted: Sun May 29, 2016 10:51 am
by bver0911
Sorry, I only just found time to work on this issue again, visited this site, and saw your response.

First, thanks for responding. Your solution sounds logical. My problem is more basic: I don't understand how to use the "Content Matching" function of Hazel. I've figured out how to extract dates from some examples, but I want to extract non-date information and I'm not sure exactly how to do this. I've searched the web and Noodlesoft's Help and found NO EXAMPLES of how to extract non-date information; i.e., how to CONSTRUCT the Content Match.

Can you help with that?

Thanks,

Bryan


Re: Pattern Matching and Parsing

Posted: Mon May 30, 2016 11:25 am
by Mr_Noodle
Use a "custom text" attribute instead of date. The in-app help should have an example of it.