Taking Hazel to the next level (feature requests)

Talk, speculate, discuss, pontificate. As long as it pertains to Hazel.

Moderators: Mr_Noodle, Moderators

Taking Hazel to the next level (feature requests) Wed Jun 06, 2018 3:25 am • by ayk
Hello,

It wasn't that long ago that I discovered Hazel, and I can't figure out how I had lived without it all this time.

Often though, while working with Hazel, I swing back and forth between two diametrically opposed emotional states :

On the one hand, as I continue observing its rock solid stability/quality and discovering new features and tips time and again, I keep admiring what a useful and elegant piece of software it is.

On the other hand, as I often stumble upon some quite unexpected shortcomings, I find myself getting quite frustrated and wondering why I insist on Hazel instead of just rolling my own rule-sets based on scripting alone.

Anyhow, here's a list of features that come to mind which would take Hazel to the next level while actually simplifying its usage.

Ability to ...

  • 1a. Pass literal values to scripts (in addition to attributes)
  • 1b. Define and set custom attributes outside of a pattern matching context, without having to resort to an external script just for that.
  • 1c. Use [custom] attributes on the left hand side of a match rule
  • 1d. Have a few more comparison operators for match rules
  • 2a. Pass an input stream (stdin) to and capture the output stream (stdout) of shell scripts.
  • 2b. Pass input/output parameters to/from shell scripts (I can propose a backward-compatible convention, if needed)
  • 3a. Get/set arbitrary extended attributes ("xattr") of a file/folder, in addition to the Spotlight comments and tags.
  • 3b. Ideally, have the option to set those "xattrs" conditionally (e.g. only if currently undefined / empty / set to a particular value, ...).
  • 4a. Use complex (struct/record) data-types as input/output parameters of scripts. (Though this could be desirable for all types of scripts, it is probably much easier to implement with JavaScript/AppleScript.)
  • 4b. Use such complex custom attributes within Hazel (via something similar to dot notation in the attribute names)
  • 5. Reuse the same rules in multiple folders without having to maintain multiple copies.

In the above list, I have tried to group related items under the same number heading. The letter subscripts (a, b, ...) denote closely related items, or sometimes downright alternatives to each other.

I have tried to draft the above list in a rough priority order while paying attention to the possible ease/difficulty of implementation. Both accounts merely reflect my own views for the time being, and are naturally open for discussion.

BTW, I am aware that a lot of these could be considered "currently possible", albeit most of them in quite convoluted ways, sometimes involving ugly hacks (or perhaps in some more straightforward ways currently unknown to me ?)

For some of my immediate pain points, I may have to turn to the support forum to seek others' wisdom.

The motivation behind this particular post is to surface feature requests. Hence I haven't delved into my own use case(s), but I can indeed give some example scenarios if needed. I am sure others could chime in, too.

Cheers all,
Ayk
ayk
 
Posts: 3
Joined: Tue Jun 05, 2018 12:11 am

Thanks for the suggestions. What's helpful is if you provide concrete examples, as many times the solution is something totally different.

To address your specific points:

1a. Could you not just hardcode those values into the scripts? If not, then where are these values coming from?
1b. See 1a.
1c. This is possible now unless I'm missing something.
1d. Need more specifics.
2a. Where is this input stream coming from? Note that stdout will go to the log if debug mode is turned on.
2b. This is in the feature list, at least for input parameters. Output is more tricky since shell scripts are very limited in what they can return.
3a. Which attributes did you want to access?
3b. See 3a.
4a. Again, where are these coming from? Need examples.
4b. Need examples here.
5. You can sync rules between folders. Right-click on a folder and select "Sync options".

It does appear like you have a very specific workflow in mind with these suggestions so maybe it's more productive you start there.
Mr_Noodle
Site Admin
 
Posts: 8323
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Hello,

Thank you for your response and interest.

As requested, I have jotted down below my main use case(s). I will add clarifications and concrete examples for my suggestions on another post.

In fact, I had actually refrained from tying these suggestions directly to my particular use cases in an effort to produce a set that could potentially be useful for many different scenarios and Hazel users.

But, yes, concrete examples always help for better conveying things. And, yes, I am certainly interested in discovering and considering any possible solutions within the reach of the current feature-set, since I am well aware that that's all we've got in the short-term, anyway.

-----------------------------------

Anyhow, where I am facing difficulty is in my quest to go paperless.

For a number of reasons, I wish to not only extract and derive, but also persist a bunch of properties (metadata in key-value pairs) about each document, before actually filing it away (i.e. renaming & moving).

Here, we are talking about arbitrary custom properties whose names as well as their syntax & semantics are defined by myself.

Again, for a number of reasons, I prefer to persist these properties as individual key-value pairs, stored in extended attributes (i.e. xattrs) provided natively by most decent file systems (APFS, HFS+, NTFS, ext4, ...)

Here are some examples of such custom properties and some typical values :
- doc.nature : letter, bank-statement, bill, receipt, ...
- doc.date : (the actual document date extracted from textual contents whenever possible)
- doc.flow : received, sent, internal, n/a, ...
- doc.party : MELLON
- doc.party-desc: bank, insurance, supplier, ....
- doc.related.account :
- doc.related.service :
- doc.realm : personal, work-related, home-related, ...
- ...

Anyhow, you get the picture...

In general, depending on the "nature" of the document, I may have up to 15-20 such properties for a given file that would need to be persisted in xattrs.
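As a minimal sketch of what persisting such a property set could look like today via a script: the helper below maps a metadata object onto `xattr -w` invocations. The `doc` namespace prefix, the function name, and the sample path are my own assumptions, not Hazel or macOS conventions; on macOS the generated argument lists would then be handed to the real `xattr` command.

```javascript
// Hypothetical helper: turn a key-value metadata object into
// `xattr -w <key> <value> <file>` argument lists, one per property.
function xattrCommands(metadata, filePath, prefix = "doc") {
  return Object.entries(metadata).map(([key, value]) =>
    ["xattr", "-w", `${prefix}.${key}`, String(value), filePath]
  );
}

const meta = { nature: "bank-statement", flow: "received", party: "MELLON" };
const cmds = xattrCommands(meta, "/tmp/statement.pdf");
// cmds[0] is ["xattr", "-w", "doc.nature", "bank-statement", "/tmp/statement.pdf"]
```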

The typical workflow involves the following preliminary steps, which may be partially automated quite trivially using Hazel :

    1. Obtain a PDF document in one way or another (scan, download, convert from other format, ...)
    2. If needed, run OCR on the PDF document and save that as searchable text ;
    3. Place it in a general processing queue (i.e. INBOX)

Here is a high-level description of the rules on INBOX:

"INBOX" rules
  • RULE-1: If document has already got a valid set of my custom METADATA persisted along with it
    ==> place it into the "FILE_AWAY" sub-folder

  • RULE-2: Otherwise, if it meets some basic criteria (e.g. if it is a PDF document),
    ==> place it into the "CLASSIFY" sub-folder

  • RULE-3: Catch-all : As a last resort,
    ==> Move it to the "STUCK" sub-folder (meaning it needs human attention)

But then, when we look at the rules for the "CLASSIFY" and "FILE_AWAY" sub-folders, that's where things get hairier :

-----------------------------------------------------
"CLASSIFY" Rule(s)

Within the watched "CLASSIFY" folder, I would have several rules whose mere purpose would be to "extract/transform/enrich" the desired metadata set, and then persist those metadata in x-attrs.

Here's what a sample "CLASSIFY" rule could look like,

match:
  • m1. Determine and match against the type of document (bank statement, electric bill, pay-slip, ...), based on some match criteria on the file's name, contents, attributes, or other properties;

action:
  • a1. Extract some preliminary raw metadata (e.g. doc-date, amount, account#, ...) typically from the textual contents through pattern matching or similar means;

  • a2. Transform / enrich the metadata set based on some of the metadata obtained above (e.g. "realm", "flow-direction", etc)

  • a3. Persist the desired subset of metadata associated with this file.

    As mentioned earlier, I prefer to persist the metadata as individual key-value pairs in custom extended attributes (i.e. xattrs) provided natively by most decent file systems (APFS, HFS+, NTFS, ext4, ...)

  • a4. Move the document back to "INBOX" (whose rules would normally place it into "FILE_AWAY")
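The a1-a3 steps above can be sketched for one hypothetical document type; every pattern, property name, and value here is illustrative only, standing in for whatever a real CLASSIFY rule would extract.

```javascript
// Sketch of the extract (a1) / enrich (a2) stages for a bank statement.
// All names and patterns are hypothetical examples.
function classifyBankStatement(ocrText) {
  // a1: extract raw metadata from the OCR'd textual contents
  const dateMatch = ocrText.match(/Statement date:\s*(\d{4}-\d{2}-\d{2})/);
  const meta = {
    "doc.nature": "bank-statement",
    "doc.date": dateMatch ? dateMatch[1] : null,
  };
  // a2: enrich/transform based on the extracted values
  meta["doc.flow"] = "received";
  meta["doc.realm"] = "personal";
  return meta; // a3 would then persist this object into xattrs
}

const classified = classifyBankStatement("MELLON\nStatement date: 2018-05-31\n");
```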

-----------------------------------------------------
FILE_AWAY Rule(s)

Here, the purpose is to rename eligible files and move them to their final archival locations.
Both the new name and the new location would depend on a bunch of METADATA that must have already been persisted in filesystem x-attrs.

If possible, I would ideally have one simple rule that could generically do this on any previously CLASSIFIED file, using a capable renamer like FileBot (which is able to use arbitrary x-attributes in its renaming formats/templates, although it is quite slow...)

Otherwise, I may settle for having one Hazel rule per naming template. In any case, the rule would look like this :

match:
  • m1. Match any document that has got the necessary METADATA persisted in given x-attr keys, regardless of the actual x-attr values themselves (basically any file for which a CLASSIFY rule was previously successfully applied)

action:
  • a1. Rename the file, based on the values of some x-attribs (metadata);
    The naming template itself may also vary, depending on "document-type" (bank statement, electricity bill, ...)

  • a2. Move the file to its archival location (again, the path would be computed based on x-attrs )
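The a1/a2 computation can be sketched as a pure function over the persisted metadata; the naming template and folder layout below are hypothetical examples, not the author's actual templates.

```javascript
// Sketch: derive the new file name and archive directory from the
// metadata persisted earlier. Template and layout are made up.
function fileAwayTarget(meta) {
  const name = `${meta["doc.date"]} ${meta["doc.party"]} ${meta["doc.nature"]}.pdf`;
  const dir = `Archive/${meta["doc.realm"]}/${meta["doc.date"].slice(0, 4)}`;
  return { name, dir };
}

const target = fileAwayTarget({
  "doc.date": "2018-05-31",
  "doc.party": "MELLON",
  "doc.nature": "bank-statement",
  "doc.realm": "personal",
});
// target.name is "2018-05-31 MELLON bank-statement.pdf"
```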

----------------------

The above describes the general scenario.

And at first sight, it seems to fit the bill for Hazel's current feature set, as long as a few things (like reading/writing the xattrs) are delegated to some scripts as needed...
ayk
 
Posts: 3
Joined: Tue Jun 05, 2018 12:11 am

Yes, you can achieve this through scripts. The xattr command should help here.

As far as providing xattr support with a built-in action, I'll consider it, but it is a rather low-level feature. xattrs require a namespace, which is in reverse-domain format, and not everyone has their own domain. While they could make something up, it's still a bit geeky and users could inadvertently conflict with other domains.

Also, it would be odd to provide something which has no visible equivalent in Finder. Users may not understand what they are and where they are stored.
Mr_Noodle
Site Admin
 
Posts: 8323
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

As long as the subject is feature requests, I know in my situation some ability to auto-sort rules would be helpful. I've got some folders which have a couple dozen rules applied to them, so something like Finder-style headers for sorting (e.g., Name or Date Added or Last Used) would be beneficial. Whether that's easy enough to implement or worthwhile enough for the overall user base, I'm certainly not qualified to say.
NaOH
 
Posts: 9
Joined: Thu Mar 22, 2018 4:47 pm

You should probably do your own thread for your request but as far as sorting rules goes, the problem is that the rule ordering affects how they are executed.

If you find yourself with a lot of rules to manage, you may want to look into consolidating them. Look into match patterns as that's one way you can replace a lot of hardcoded rules with a single rule that covers the same pattern.
Mr_Noodle
Site Admin
 
Posts: 8323
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Yes, I hear your reasonable concerns about providing direct access to xattrs.

And yes, it is indeed possible to delegate the xattr access to scripts; and as I have already hinted, that's OK for me, as long as I have some practical means to do such things in a reusable and easily maintainable manner.

I will try and explain my pain points while attempting to implement my scenario outlined earlier, and the approaches I have gone through along the way... It's a bit long, sorry about that; but it's kind of funny in a way.

At first sight, I had the impression that the whole thing would be a piece of cake... So I set out implementing my project along those lines, only to end up with an unmaintainable mix of rules and scripts.

Below is an account of the several approaches I have tried for implementing a typical CLASSIFY rule, along with the issues that have arisen in each attempt.

APPROACH 1 :

brief description :
  • Make use of a Hazel custom attribute for each property mentioned above (so as to end up with 15-20 such custom attributes)
  • For step a1 : Use a mixture of Hazel patterns and JavaScript code for extracting the raw metadata;
  • For step a2 : Use JavaScript for the metadata transformation and enrichment stage; (can't use Hazel here since that would require the features 1b, 1c, 1d)
  • For step a3 : Use JavaScript for persisting the properties in xattrs

pros :
  • p1.1 : This kind of staged approach, where each distinct step (a1, a2, a3, ...) is called upon individually from Hazel, would normally allow for good code reuse and enable mixing Hazel's capabilities with those of scripting, since both worlds have access to the individual custom properties.

cons :
  • c1.1 : The scripts need to be written in JavaScript / AppleScript, as parameters can't be passed to or received from shell scripts (feature request 2b).
    Not preferable in my case, but I could eventually live with that (writing some glue code in JavaScript as needed)

  • c1.2 : More importantly, this approach results in a maintenance nightmare (as explained below).

In order to truly appreciate what this maintenance nightmare is all about, please remember a few aspects of the scenario described above :

  • A considerable number (15-20 and up) of metadata properties
  • A considerable number (30-40 and up) of CLASSIFY rules, each corresponding to a distinct "document type" (bill, statement, ...)
  • 3-4 distinct steps that would be delegated to scripts in each CLASSIFY rule

Soon enough, I realized that I would need to maintain (through the Hazel GUI) the I/O parameter lists of 100-150 script calls scattered around, with all the fragility that goes with positional parameters in general.

The initial hassle could be dealt with by duplicating a rule template, but then each time a new property needs to be added/modified, I would need to go back and do that in 100-150 different places... very error-prone indeed...

I knew I had to find a better way... So I looked at a radically different approach :

APPROACH 2 :

brief description :
  • Use Hazel only for initially matching a rule's applicability conditions
  • Do everything else (a1, a2, a3, ...) in one big combined script

pros :
  • p2.1 : Better maintainability, since there is no need to worry about maintaining a huge number of I/O parameter lists scattered around.
  • p2.2 : Shell scripts are back on the table (as long as you are willing to maintain a specific script per rule)

cons :
  • c2.1 : Can't really mix and match Hazel's capabilities with those of scripting, since everything happens in one opaque step from the point of view of Hazel. Also, Hazel can no longer access the produced metadata.
  • c2.2 : Considering the previous point, but more generally, what's the point of using Hazel in this case, since its role is merely reduced to a rule-matcher?

    BTW, in the end, I can still see some merit in continuing to use Hazel even here, since it also takes care of watching folders as well as providing a GUI for activating/de-activating rules, BUT it's really a pity not to be able to use a lot of its capabilities...

So, back to the drawing board for a new approach, in the hope of combining the benefits of the two previous approaches and eliminating the shortcomings...

APPROACH 3 :

brief description :
This is essentially the same as the very first approach (1), but this time, I tried to come up with a somewhat creative workaround in an attempt to address the maintainability issue, as outlined below :

  • Put the whole set of metadata into a JavaScript class;
  • Pack it into a string (in JSON representation) ;
  • Pass that one opaque string around back and forth to Hazel;
  • As needed, write small scripts for extracting some metadata from the opaque JSON token into some custom attributes in Hazel.

pros :
  • p3.1 : Better maintainability (compared to APPROACH 1), since there is no need to worry about maintaining a huge number of I/O parameter lists scattered around.
  • p3.2 : At least in theory, it's now possible to leverage Hazel's own capabilities.

cons :
  • c3.1 : In practice, it's still somewhat convoluted when it comes to leveraging Hazel's own capabilities
  • c3.2 : Shell scripts still require specialized glue-code.
  • c3.3: In general, it's still hard to write generic scripts that could be parametrized (more on that later)

All in all, this last approach (3) appears to be more promising, but still somewhat awkward, as described above.

------------------------------

This has been a long post, again... But if you have been bearing with me up to this point, you have probably observed where each requested feature could have played a role in overcoming some of the difficulties.

At this stage, if I had to pick a feature from the feature requests, it would be :
  • feature request 1a (ability to pass literal values to scripts),
  • a small change in the UI that would allow me to readily see (without opening up a dialog) which script is being called along with the parameters being passed to it (the latter was not in the original list of my feature requests)

Especially when employed together, these two could go a long way by :
  • better enabling and encouraging reusable code (removing the need for a lot of the glue snippets)
  • allowing me to see what's going on in a rule just by glancing at it

As an example, consider the case where I need to get at a property value from within the packed JSON as described above :

Code: Select all
getprop ("doc.nature", props)  => docNature


where :
  • props is my custom Hazel attribute that holds the opaque JSON string mentioned above
  • docNature is the custom Hazel attribute that receives the value of the desired individual property, upon decoding the JSON.
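The hypothetical getprop call could be implemented in a few lines of JavaScript; the sketch below is an assumption about how it might work, decoding the opaque JSON string and walking a dot-notation path (here supporting nested objects, though flat "doc.xyz" keys would work similarly).

```javascript
// Sketch of the hypothetical getprop(): parse the packed JSON held in
// a custom attribute and pull out one property by dot-notation path.
function getprop(path, propsJson) {
  const data = JSON.parse(propsJson);
  return path.split(".").reduce(
    (node, key) => (node == null ? undefined : node[key]),
    data
  );
}

// props plays the role of the opaque custom attribute from the post.
const props = JSON.stringify({
  doc: { nature: "bill", related: { account: "12345" } },
});
const docNature = getprop("doc.nature", props); // "bill"
```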

In fact, I can still live with having to do the above call within an embedded script (as you have suggested in your initial reply), but then I really would like to directly see what that embedded script is doing when I glance at the rule.

Instead, all we see at this time are lines like :

Code: Select all
  Passes JavaScript : embedded script


So, at this point, if nothing else is possible, how about just adding a Hazel preference (either global or individual) that allows displaying the first "n" lines of embedded scripts (in the screen where we see the list of matchers and actions)... Here, "n" is a number that can be set by the user (the default can still be 0).
ayk
 
Posts: 3
Joined: Tue Jun 05, 2018 12:11 am

I'm not quite following why you need to send in constants and why those can't be in the script. Can you run through a very specific example of how you would use this?
Mr_Noodle
Site Admin
 
Posts: 8323
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Mr_Noodle wrote:I'm not quite following why you need to send in constants and why those can't be in the script. Can you run through a very specific example of how you would use this?


As a more recent Hazel user, I can easily see the need for passing in literals without having to resort to an embedded script.

One specific example is that I want to be able to get the "Last Opened" or "Modified" date of a file into a custom token (so I can use them in a rename action). I cannot do this directly in a Hazel rule as far as I can tell, so a script is in order. The script would also have to accept a value indicating which one of the two is needed. That requires passing a literal to the script. Since we cannot, I have to maintain two almost identical versions of the script so I can use the correct one as needed. That duplicates code and results in a maintenance nightmare. Even worse, unless the script is less than, say, 5 lines or so, having these copies embedded in rules makes it hard to find them again for later updates, should that be necessary.
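The duplication described above would collapse into one script if a literal selector could be passed in. As a sketch under that assumption (the selector strings and the simulated date table are made up; inside Hazel the dates would come from the file's actual attributes, and the return shape follows the inputAttributes/hazelOutputAttributes convention used elsewhere in this thread):

```javascript
// One parameterized script instead of two near-identical copies.
// "which" would be the literal passed from the rule, if that existed.
function pickDate(which, fileDates) {
  const selected = fileDates[which];
  if (selected === undefined) {
    return { hazelPassesScript: false };
  }
  return { hazelPassesScript: true, hazelOutputAttributes: [selected] };
}

// Simulated stand-in for the file's real dates.
const fileDates = { modified: "2018-12-10", lastOpened: "2018-12-12" };
const picked = pickDate("modified", fileDates);
```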

Related is that I would like scripts to be able to modify custom tokens that may have been used in prior rules. Today one cannot create a custom attribute for export with the same name as an already existing one. If one could, values should only be allowed to be set into it if (a) the rule passes, and (b) the value is not already set. That part of the logic would already be there.

I tried this script at first (embedded):
Code: Select all
set inputItem to item 1 of inputAttributes
return {hazelPassesScript:true, hazelOutputAttributes:{inputItem}}


but then ran into the realization I cannot output to already defined tokens.
This particular one has a workaround, but it makes it convoluted: (1) Create a brand new token for output, and (2) add a rule that says "If <new token> <matches> <existing token:...>"

Perhaps that helps to better understand the need.
dolfs
 
Posts: 13
Joined: Mon Dec 10, 2018 9:31 pm

Is there a reason why you need to capture the date into a custom attribute? The dates themselves should be usable in a Rename action.

As for re-defining attributes in a script, I'll have to think about it though if you could provide a more real-world example, that would help.
Mr_Noodle
Site Admin
 
Posts: 8323
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Mr_Noodle wrote:Is there a reason why you need to capture the date into a custom attribute? The dates themselves should be usable in a Rename action.

If you have rules "fishing" for a date to put in a rename, it might come from OCR content and, if not there, from the file, etc. In the rename you can only put one token or date in a specific location. There are no options for if/else type logic there, so you have to do it in rules beforehand.

As for re-defining attributes in a script, I'll have to think about it though if you could provide a more real-world example, that would help.


I think my example here is real-world. It sure is to me, and to several others who have asked and had a need for it.

I have seen/heard many arguments from you that certain features would make the product too hard, or harder to use for “users with simpler needs”. You also often seem to use the argument that what is asked for can be achieved in other ways. I get the impression that in your quest for keeping things simple you sometimes do not fully understand or read what the requester is saying.

Another real-world example: there are many rules that I have created that could have had less than half the conditions in them if there were support, for example, for (as an option) "matches regular expression" and its "does not match" counterpart. Lots of OCR documents may have subtle transcription differences in them, causing variations to be needed to capture what you want in a token. Variations often mean additional rules and very careful ordering.

As an example, last night I had an unexpected result. I have some rules archiving any names that start with yyyy or yyyy-mm-dd into sub-folders for each year. Due to the content I frequently encounter, I had to have a rule to catch <1><1><1><1> before the one that catches the full date. That already gave rise to two rules instead of one, where a regular expression could have handled both. Due to the inability to say in a "match" that I want to check for yyyy only at the start ("starts with" does not accept a match pattern), a name with a tax year in the middle of it, whose file name actually started with yyyy-mm-dd, matched the first rule, missing out on the complete date and thus losing it in the rename. I had to build several more rules with any/all around it to prevent that from happening. All of it could have been just one rule with either regular expressions or more options in the regular match operation.

That is another real world example.

Here is another: the lack of ability to reuse a set of conditions or a rule. I have seen you refer users to the sync option to share rules between folders, but that is an all-or-nothing proposition. You cannot share just specific rules, forcing you to copy them, again resulting in a maintenance nightmare. You can get around some of that by using markers, such as file colors and tags, and serializing what you do through a pipeline of rules, but unless you are careful to remove such temporary tags afterward you are left with an unwanted mess. Colors have the problem that if files were already colored for some other purpose you might lose that.

Perhaps there is a class of your users that is much more sophisticated than the "simple" user you often think of. While I believe it is possible to design a UI with what is called progressive disclosure to address issues of mixed sophistication/need in your user base (I've only been in the software business for 30+ years), it is ultimately your call whether to address these more sophisticated uses.
dolfs
 
Posts: 13
Joined: Mon Dec 10, 2018 9:31 pm

The problem here is that people tend to focus on what they think is the solution instead of what the problem is. When you provide a concrete example, and not just an abstract description of it, that better serves everyone.

For example, your problem with a name starting with a year is based on some incorrect assumptions. If you use "Name matches", you can most definitely specify that the pattern starts with a four-digit year. That's because "matches" (as opposed to "contains match") requires you to specify a pattern which matches all characters. So, if you do "<1><1><1><1><anything>", then the pattern must match something starting with 4 numerical digits followed by anything afterwards.
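The distinction drawn here can be illustrated in plain regex terms. This is only an analogy of my own; Hazel's <1>/<anything> tokens are not regular expressions, but the whole-name vs. anywhere behavior is similar.

```javascript
// "matches"        ~ the whole name must fit the pattern: ^\d{4}.*$
// "contains match" ~ the pattern may sit anywhere:        \d{4}
const wholeName = /^\d{4}.*$/;
const anywhere = /\d{4}/;

const startsWithYear = wholeName.test("2018-12-10 Tax Return.pdf"); // true
const midYearWhole = wholeName.test("Tax Return 2018.pdf");         // false
const midYearContains = anywhere.test("Tax Return 2018.pdf");       // true
```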

Again, for your other issues, please provide specific examples. As you can see, focusing on solutions to abstract problems is the wrong way to go about things. This happens countless times and if I implemented solutions as requested by users without knowing the real problem, the software would be a mess plus in many cases, the user's problems would not be solved.
Mr_Noodle
Site Admin
 
Posts: 8323
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Mr_Noodle wrote:The problem here is that people tend to focus on what they think is the solution instead of what the problem is. When you provide a concrete example, and not just an abstract description of it, that better serves everyone.

For example, your problem with a name starting with a year is based on some incorrect assumptions. If you use "Name matches", you can most definitely specify that the pattern starts with a four-digit year. That's because "matches" (as opposed to "contains match") requires you to specify a pattern which matches all characters. So, if you do "<1><1><1><1><anything>", then the pattern must match something starting with 4 numerical digits followed by anything afterwards.

Again, for your other issues, please provide specific examples. As you can see, focusing on solutions to abstract problems is the wrong way to go about things. This happens countless times and if I implemented solutions as requested by users without knowing the real problem, the software would be a mess plus in many cases, the user's problems would not be solved.


I think the real problem in our communication is that I translated the needs from a real-world example into a generic feature, because it makes no sense to ask you to implement a single solution-oriented feature. That seems to have tripped you up into thinking I don't know why I am asking. So, here is a real-world example that I encountered today:

Problem: I need to match the contents of the OCR text of a file against a series of strings, and capture which one was matched. The current, and obvious, approach in Hazel is to create an "any" rule with as many nested "Contents match <pattern>" conditions as needed, where the pattern is consistently the desired token, let's say •Match, with the attribute of the token set to one of the desired possible matches.

In my particular case, there were some 10 desired matches, so 10 cases under the "any". I also had a bunch of other stuff, causing the edit window to take up all the screen height. When I then tried to modify the "rename" rule I had at the bottom, the little popup for it was too tall and most of it fell off the bottom of the screen, making it impossible to change.

Realizing that you might eventually fix that bug, I needed to proceed, and the only solution seemed to be to make the overall rule set shorter. It was not easily possible to split my single rule into independent rules without duplicating much of it in each copy (undesirable for maintenance). So, I decided that if I could replace the "match one of many" described above with something simpler, I could shorten the rule. This led to the desire for a script that could be passed a list of allowed matches and that, when run, would return the match that worked along with true, or false otherwise. The "shortening" of the rule only set me off on the trail, but I had encountered, many times before, cases where my rulesets could be much, much simpler if such a solution existed.

The script was easy to write, in fact just about 10 lines of JavaScript. But... how to get the list of allowable choices into the script? As mentioned above, the lack of options to send literals makes this impossible, causing me to have to modify the script to:
Code: Select all
let target = inputAttributes[0]
let choices = [
   "choice1",
   "anotherchoice",
   "something else"
];
let pattern = "(" + choices.join('|') + ")"
let matches = target.match(new RegExp(pattern, "i"))
if (matches != null) {
   return {hazelPassesScript:true, hazelOutputAttributes: [matches[1]]}
} else {
   return {hazelPassesScript:false}
}


OK, so this works, and it is actually nice because I can use real regular expressions in each alternative, which makes it easy to handle small OCR errors as well (e.g. use a choice like "ch[0o]ice" to capture cases where the letter "o" was OCR-ed as a zero). It would be much nicer if this script could be an external script that could be used in many rules, but... each rule would want its own set of alternatives, and that cannot work with a single external script. So the only re-use option would be to use an embedded script and copy it in place each time --> maintenance nightmare.
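If literals could be passed (feature request 1a), the hardcoded choices array above could move out of the script, turning it into one reusable external script. A sketch under that assumption, with the choice list simulated as trailing entries of inputAttributes (how Hazel would actually deliver them is itself an open question):

```javascript
// Reusable version of the matcher above, assuming the choice list
// could be passed in after the target text instead of hardcoded.
function matchChoices(inputAttributes) {
  const [target, ...choices] = inputAttributes;
  const pattern = "(" + choices.join("|") + ")";
  const matches = target.match(new RegExp(pattern, "i"));
  if (matches != null) {
    return { hazelPassesScript: true, hazelOutputAttributes: [matches[1]] };
  }
  return { hazelPassesScript: false };
}

// Simulated call; in Hazel, inputAttributes would come from the rule.
const hit = matchChoices(["Invoice from ACME", "acme", "globex"]);
```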

Aaah, well, it does not really work, because I tested this by matching a file name. Once I had the script working, I tried to change the input to "Contents", but to my surprise, that choice was not available. Here too, there is a workaround, by creating a rule before the script that goes like this: [Contents] [matches] [•Contents_Text:...] (in other words, match the whole contents into a token), then use "•Contents_Text" as the input. It seems unnecessarily complicated. I can see why "Contents" might not be made available, as it could be bigger than the scripting interface can handle. I'll see what happens with my alternative approach.

Now, ultimately I need the matched value in a rule variable/token (named •Other) that may, or may not, already have been set by prior lines in the ruleset. Consequently, Hazel does not allow me to use this name in the "exports" section of the script to capture the exports, so to achieve what I want, I need to add another rule below it that says: [•Match] [matches] [•Other], where the attribute of •Other is set to "<a1%><...>" (this looks complicated, but I only want it set if something non-empty was matched).

The above, I think, describes one case of the need for, or usefulness of, literal inputs to scripts. It also makes the case as to why "exporting" to existing tokens should be allowed (removing the need for the extra rule shown above), although it would effectively not store anything if the token was already given a value earlier.

Now, you might still think that I am presenting a solution but have the wrong approach. I think the illustrated case is clear, so if you think that, I would be interested in hearing what the solution/approach should be according to your "correct" thinking about the problem, as long as that approach does not result in my having to replace the single feature/solution I ask for with tons of rules and multiple scripts that duplicate lots of stuff.
dolfs
 
Posts: 13
Joined: Mon Dec 10, 2018 9:31 pm

Thanks for the example. I do have a feature on the list for Hazel 5 which would be lists and/or tables. This would allow you to define (or maybe read from a file) lists or tables which you can then reference in a rule. I think this would handle your case.

As I've been thinking about this feature recently, some questions:

- Would you need the lists only for a particular rule, a ruleset, or for all rulesets across all folders?
- If you stored something like this in a file, what format would it be in?
- Besides "Contents contain match", are there other conditions or contexts you would want to use this?
Mr_Noodle
Site Admin
 
Posts: 8323
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Mr_Noodle wrote:Thanks for the example. I do have a feature on the list for Hazel 5 which would be lists and/or tables. This would allow you to define (or maybe read from a file) lists or tables which you can then reference in a rule. I think this would handle your case.

As I've been thinking about this feature recently, some questions:

- Would you need the lists only for a particular rule, a ruleset, or for all rulesets across all folders?
- If you stored something like this in a file, what format would it be in?
- Besides "Contents contain match", are there other conditions or contexts you would want to use this?


Your question opens up a slightly more elaborate discussion, which I'll get into below, after first trying to answer your question in the context of the original post.

[list=]
[*]If the list/table is to be used as input to a script (one example I mentioned), it would essentially replace the need for literal inputs. In that case I would likely reference such a list in only one place. Being able to read a list from a file would make reuse easy, but would be overly complicated for single-use cases, where some kind of in-line entry would be easier. I'd really advocate for both.
[*]Storing in a file: probably JSON format, with XML as a second choice. You'd want a structured format that can be easily and unambiguously parsed. JSON and XML both allow that, but if (advanced) users ever want to interact with those files, JSON is a lot easier to deal with. An additional benefit is that embedded or external JavaScript scripts would have little trouble using the same files.
[*]Contents contain match: the answer depends on how the list/table is being used. One option is as an input list to a script; in that case, the values would be literals, and the question does not apply. In the other scenario, the list (or a reference to it) would essentially replace whatever goes in the expression box you have now. Considering that all these "rule" entries currently answer a single boolean question, though possibly binding tokens to values, there are two answers:
[list=]
[*]For operations that cannot bind a token, the list would be the equivalent of a series of "rule entries", each with the same operation but a different expression. This could be "contains", "starts with", etc. To make this clear, you could change the name of the operation(s) to, for example, "starts with any of", etc. The name change would happen when the user activates whatever UI you provide to enter a list (or a reference to one), so that afterwards the rules read more easily. It could, conceivably, be even more functional if you allowed a choice between any/all here, but that makes the UI more difficult, and in some cases (e.g. "starts with") "all of" does not make sense. I would think "any of" is by far the more common case.
[*]Where token binding is possible, the list approach by itself makes little sense, because you could not easily indicate how tokens would be bound, unless the list format somehow allowed for that. See other remarks below.
[/list]
[/list]
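As an illustration of what such a JSON list file could look like for a "starts with any of" operation (the file shape and key names here are purely my own assumptions, not an actual Hazel format):

```javascript
// Hypothetical JSON list file for a "starts with any of" rule entry.
const vendorsJson = `{
  "name": "vendors",
  "operation": "starts with any of",
  "values": ["Acme", "Globex", "Initech"]
}`;

const vendors = JSON.parse(vendorsJson);

// "any of" semantics: true if the text starts with any listed value.
const startsWithAny = (text) =>
  vendors.values.some((v) => text.startsWith(v));
```

This is trivially parseable by embedded or external JavaScript scripts, which is the benefit mentioned above.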

I will describe here something I have been tinkering with, even though it is an unfinished thought, as it might give you ideas.

I have found that, in many cases, lists of literals are sometimes genuine alternatives for matching (mostly in a "contains"), but sometimes variations of how a single text might come out due to poor OCR. The problem with the latter case is that it is hard to predict what "poor" variations might be produced, so exhaustively listing them all is problematic. Oftentimes, though, they can all be caught by a single (regular) expression.

I frequently found myself creating fairly complex rulesets trying to handle variations. In addition, I almost always need to capture elements of what was matched into tokens. So I came up with an external script that can be given JSON-formatted input and that produces the requisite Hazel-style response, possibly with exported tokens. The structure of the JSON input essentially captures an option to execute "any", "all", or "match" (single case), where "any" and "all" contain arrays that can recursively contain the same. The script "executes" this and delivers a final result. Single "match" entries can be regular expressions with capture groups. Such capture groups, if anything was captured, are delivered back to Hazel in the final response.
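The core of that script can be sketched like this (a minimal sketch, assuming a JSON shape with "any"/"all"/"match" keys and named capture groups; the key names and function are my own, not the actual script):

```javascript
// Recursively evaluate an "any" / "all" / "match" spec against a text.
// Named capture groups from matching "match" entries are accumulated
// and delivered back in the final result.
function evaluate(node, text, captures = {}) {
  if (node.match !== undefined) {
    const m = text.match(new RegExp(node.match));
    if (m && m.groups) Object.assign(captures, m.groups);
    return { matched: !!m, captures };
  }
  if (node.any) {
    const matched = node.any.some((n) => evaluate(n, text, captures).matched);
    return { matched, captures };
  }
  if (node.all) {
    const matched = node.all.every((n) => evaluate(n, text, captures).matched);
    return { matched, captures };
  }
  return { matched: false, captures };
}
```

For example, a spec like `{ any: [ { match: "inv[0o]ice (?<num>\\d+)" }, { all: [...] } ] }` would both tolerate OCR variations and hand the captured `num` back for token binding.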

The main "problem" with this at the moment is the inability to pass the reference to the JSON file or literal to the script, so I have to build a little wrapper around each case that makes it specific. If/When literal inputs to scripts become possible this restriction would go away.

Secondly, while the rest of the approach greatly simplifies many rulesets, having to create the JSON files manually is somewhat cumbersome compared to having some kind of UI.

Your current functionality allows a somewhat regex-like approach with expressions such as <a>, <a1>, <...>, etc., but they cannot handle certain needed constructs very well. One example is that a space in the expression matches any whitespace, including newlines. Sometimes you want a real single space, but you cannot express that. In many cases I've been able to come up with somewhat unsafe work-arounds using <...> instead of the space, but in some cases that too can result in incorrect matches. So the question arises whether it might be useful to allow a true regular expression.
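The whitespace distinction is easy to show with plain regular expressions (a sketch, not Hazel's actual matching engine):

```javascript
// A literal space only matches a real space; \s also matches newlines
// and tabs -- the distinction that cannot currently be expressed.
const literalSpace = /Invoice \d+/;    // exactly one real space
const anyWhitespace = /Invoice\s+\d+/; // any run of whitespace, incl. newlines

literalSpace.test("Invoice 42");   // true
literalSpace.test("Invoice\n42");  // false: newline is not a literal space
anyWhitespace.test("Invoice\n42"); // true
```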

I can see that that might be too complex for many users, so you could keep the current options but make the regex variation available through some subtle UI (for example, and not given much thought: holding down Option while activating the function popup exposes not only "contains match" but also "contains regex match", etc.). What remains to be figured out with regexes is how to allow token binding.

The simplest thought here is to require that the user use "named capture groups" and have the names turn into tokens. A fairly simple scan of the regex would expose the names for use in the UI. You would otherwise treat the captures as you currently do token bindings (meaning: do not bind if already bound earlier). A more extensive approach would scan the regex for any capture groups and provide some kind of (popup) UI to allow the user to name them (internally you would have to remember the name-to-capture-index mapping, taking nested groups into account).
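A quick sketch of the named-capture-group idea (function names and token names are hypothetical; the scan is deliberately naive):

```javascript
// Scan a regex source for named capture groups; the names become tokens.
function tokenNames(pattern) {
  // Naive scan: a real implementation would need to skip escaped
  // parentheses and character classes.
  return [...pattern.matchAll(/\(\?<([A-Za-z_]\w*)>/g)].map((m) => m[1]);
}

// Bind captured values to tokens, but -- as with current token
// bindings -- do not overwrite a token that was bound earlier.
function bindTokens(pattern, text, existing = {}) {
  const m = text.match(new RegExp(pattern));
  if (!m || !m.groups) return existing;
  for (const [name, value] of Object.entries(m.groups)) {
    if (!(name in existing) && value !== undefined) existing[name] = value;
  }
  return existing;
}
```

The `tokenNames` scan is what would let the UI list the available tokens before the rule ever runs.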
dolfs
 
Posts: 13
Joined: Mon Dec 10, 2018 9:31 pm
