Handling date variations

Hi,

I'm testing Hazel for processing Doxie-scanned and OCR'ed files. I started with my salary statements, which I'd like to rename to a form like "Gehaltsabrechnung Month Year.pdf". So I tried date matching to match "Month Year", which contains the month and year in text form. It works like a charm if the exact form is found in the document.

Unfortunately, the OCR sometimes (more often than not) produces things like this:
F e b r u a r 2016

So I tried it with another date, which is in the form:
27.02.2016

This works, but unfortunately there are documents where the OCR produces output similar to what I described for the first form, e.g. with spaces between the digits.

Now I'm wondering if there is any possibility to specify alternations of a pattern, so that I could match for
(Month as word|Month as two-digits)(anything)(Year)
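To make clearer what I mean by alternation, here is a rough sketch as a regular expression in Python. This is not Hazel's match pattern syntax, just an illustration of the idea outside Hazel. The German month list, the helper names (spaced, find_month_year) and the tolerance for stray spaces between characters are my own assumptions.

```python
import re

# Assumption: the statements use German month names.
MONTHS = ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
          "August", "September", "Oktober", "November", "Dezember"]


def spaced(word):
    """Allow optional whitespace between the characters of a word,
    so that OCR output like 'F e b r u a r' still matches."""
    return r"\s*".join(re.escape(ch) for ch in word)


MONTH_WORD = "|".join(spaced(m) for m in MONTHS)
MONTH_NUM = r"0\s*[1-9]|1\s*[0-2]"   # 01..12, digits may be spaced out
YEAR = r"(?:\d\s*){3}\d"             # four digits, optionally spaced out

# Either "<month word> <year>" or "<dd>.<mm>.<year>".
PATTERN = re.compile(
    rf"(?<!\w)(?:(?P<word>{MONTH_WORD})\s*"
    rf"|\d\s*\d\s*\.\s*(?P<num>{MONTH_NUM})\s*\.\s*)"
    rf"(?P<year>{YEAR})",
    re.IGNORECASE,
)


def find_month_year(text):
    """Return (month, year) as integers, or None if nothing matches."""
    m = PATTERN.search(text)
    if m is None:
        return None
    year = int(re.sub(r"\s+", "", m.group("year")))
    if m.group("word"):
        word = re.sub(r"\s+", "", m.group("word")).lower()
        month = [name.lower() for name in MONTHS].index(word) + 1
    else:
        month = int(re.sub(r"\s+", "", m.group("num")))
    return month, year
```

Both branches of the alternation end up in the same result, which is exactly what I would like to achieve with a single custom date attribute.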

I'm aware that I can create multiple rules, but that seems error-prone, since I need to keep the actions in sync.

Apart from that I can use nested conditions to match the different formats, but I cannot re-use the same pattern name and define precedence, so I cannot really use that to do the naming.

Is there any option I'm missing?

Best Regards,
Patrick

You can use multiple conditions. Check out this article on how to create a nested condition: https://www.noodlesoft.com/kb/how-to-cr ... onditions/

I might be stupid, but I don't really get how that solves my problem, as I already wrote in my post.

Look:
> Apart from that I can use nested conditions to match the different formats, but I cannot re-use the same pattern name and
> define precedence, so I cannot really use that to do the naming.

Let me elaborate on that:

Let's say I define a rule where all rules need to match, with one rule for the document type ("contents" match "Gehaltsabrechnung") and a nested rule where any needs to match, with the two options for custom dates. In my attempts, I was unable to reuse the name for the custom date field, as Hazel automatically corrected it to another name.

So how exactly do I get the desired result with nested conditions?

First off, "rule" is the overarching combination of conditions and actions. I believe you mean "condition" here.

You can re-use the same attribute. Just drag it in. It won't work if you drag in a new one and try to name it the same, as Hazel thinks it's a mistake when you do that. That's why the previous attributes are available to drag in, so that it's clear you want to re-use the existing one. Give that a shot.
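As a rough mental model only (this is not how Hazel is implemented, and the names below are hypothetical), you can picture a nested "any" condition as a list of different patterns that all fill the same named attributes. The assumption in this Python sketch is that the alternatives are tried in order and the first one that matches supplies the values:

```python
import re

# Hypothetical alternatives: different patterns, same attribute names.
ALTERNATIVES = [
    # "Februar 2016"
    re.compile(
        r"(?P<month>Januar|Februar|März|April|Mai|Juni|Juli|August"
        r"|September|Oktober|November|Dezember)\s+(?P<year>\d{4})"
    ),
    # "27.02.2016"
    re.compile(r"\d{2}\.(?P<month>\d{2})\.(?P<year>\d{4})"),
]


def match_any(text):
    """Return the attributes from the first alternative that matches."""
    for pattern in ALTERNATIVES:
        m = pattern.search(text)
        if m:
            return m.groupdict()
    return None


def new_name(text):
    """Build the file name from whichever alternative matched."""
    attrs = match_any(text)
    if attrs is None:
        return None
    return f"Gehaltsabrechnung {attrs['month']} {attrs['year']}.pdf"
```

The point is that the rename step only refers to the attribute names, so it does not need to know which of the patterns produced them.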

You are absolutely right that I meant conditions. Sorry, it isn't always easy to get the wording right as a non-native speaker.

And thanks for the clarification. In fact, I managed it by trial and error in the meantime; I just hadn't gotten around to replying to my own post yet.

I have to say that it feels somewhat like a hidden feature (and I have to confess that I failed to find the hint in the documentation on the first and second try). That is because, while it's totally obvious that I can re-use an attribute as-is by dropping it in, it's not so obvious that I can actually change the underlying pattern.

If I were to choose how to change this, I'd probably do the following things:

Add a hint to the help in "Using match patterns": where it says "You can also re-use (...)", it should probably read "You can also re-use and create alternations of custom attributes (...)"
Probably add a hint about that feature to "The Ins and Outs of Match Patterns", too
Add some kind of visual indication to the rule view, because currently it's not possible to tell whether a custom token is used multiple times or has a different pattern. One option would be to write the pattern after the name in parentheses, or to ask for an additional text identifier when re-using an attribute and display that.

By the way, this would probably be a good candidate for an FAQ entry, although the link in the help points to a search page instead of a browsable FAQ.

Thanks for your help!

Best Regards,
Patrick

Thanks for the suggestions. There is new help in the wings. As for the FAQ, it's actually not that frequent a question, as it's a very advanced feature.

I've considered visual differences, but the flip side is that people may not realize it's the same attribute if it looks at all different. Also, it would be hard/noisy to embed the whole pattern within the name.