Tip, working with big file databases

Posted: Sat Jul 13, 2019 4:01 am
by Robert
I recently had a few problems tagging a database of many, many PDFs while having all 30 of my tag rules continuously checking the whole database. (See e.g. here; for the tagging workflow see here.)

My solution now is to use two smart folders.
First smart folder: this holds all new PDFs, all recently added ones, all recently changed ones, and the PDFs I worked on exactly one week, exactly one month, and exactly one year ago. On a normal day there are about 20 PDFs in this smart folder.
Second smart folder: this holds all PDFs that carry the tag "Tagging".
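
For anyone who wants to see the "exactly one week / month / year ago" idea spelled out: below is a rough Python sketch of that date check. This is not how the smart folder itself is configured (that is done in the Finder search criteria); the function names `shift_months` and `is_revisit_day` are made up for illustration, and clamping to the last day of a shorter month is my own assumption.

```python
import calendar
from datetime import date, timedelta


def shift_months(d: date, months: int) -> date:
    """Move d back by `months` calendar months.

    Assumption: if the target month is shorter, clamp to its last day
    (e.g. one month before March 31 becomes February 28).
    """
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))


def is_revisit_day(last_worked: date, today: date) -> bool:
    """True if the file was worked on exactly one week, month, or year ago."""
    revisit_dates = {
        today - timedelta(days=7),   # exactly one week ago
        shift_months(today, 1),      # exactly one month ago
        shift_months(today, 12),     # exactly one year ago
    }
    return last_worked in revisit_dates
```

The point of the exact dates is that each old PDF only resurfaces on a handful of days per year, which keeps the first folder small.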

Both folders have certain Hazel rules applied to them:
The first smart folder has rules that add the tag "Tagging" to its PDFs at certain times of day (three times on a normal working day). I use the "after" and "before" a certain time conditions, as that seems to be more reliable. Within a 10-minute window these files get tagged and so appear in the second smart folder, where all the heavy content checking, tagging, and OCR work happens. After the 10 minutes the tag "Tagging" is removed again (I have a rule that matches at all times other than the 10-minute windows in which the tag is added).
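
To make the time-window logic a bit more concrete, here is a minimal Python sketch of what the two rules on the first folder do together. It only models the logic; the real rules are set up in Hazel's rule editor. The window start times (09:00, 13:00, 17:00) and the names `in_tagging_window` and `update_tags` are placeholders I picked for the example.

```python
from datetime import datetime, time, timedelta

WINDOW_LENGTH = timedelta(minutes=10)
# Hypothetical start times; pick whatever three times suit your working day.
WINDOW_STARTS = [time(9, 0), time(13, 0), time(17, 0)]


def in_tagging_window(now: datetime) -> bool:
    """True if `now` is after a window start and before start + 10 minutes."""
    for start in WINDOW_STARTS:
        begin = datetime.combine(now.date(), start)
        if begin <= now < begin + WINDOW_LENGTH:
            return True
    return False


def update_tags(tags: set, now: datetime) -> set:
    """Model both rules: add "Tagging" inside a window, remove it otherwise."""
    if in_tagging_window(now):
        return tags | {"Tagging"}
    return tags - {"Tagging"}
```

For example, `update_tags({"invoice"}, datetime(2019, 7, 13, 9, 5))` gives back a set that includes "Tagging", while the same call at 9:10 gives back just `{"invoice"}`.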

With this setup Hazel is not constantly checking the PDFs and the CPU is not at its limit. My tagging still runs on all new files three times a day, and it sporadically re-checks old files to see whether a certain tag should be applied.
If I need certain files to be checked, I simply add the tag "Tagging" to them, wherever they are on my MacBook, and they get checked.

That has worked for me for the last two days :) Hope that helps another tagging fanatic.