Flaw in MacSparky paperless workflow - Hazel feature request

Greetings from Sydney, Australia!

After hearing about the MacSparky paperless workflow using Hazel, I am keen to introduce this for myself, but I feel there is a flaw in the system which is discouraging me from getting started.

What’s wrong with the MacSparky workflow?

Assume you are scanning some old papers which you wish to store as pdfs. When Hazel files such a document in a date-based folder structure, the pdf will have a ‘creation date’ set to the current date instead of the date on which the original paper document was produced, which could have been some months or years earlier.

Apart from the difficulties with ‘old’ papers, the MacSparky workflow as it stands requires scanning papers in the same month in which you want them filed so that Hazel will put them in the correct place. That is, if you want the pdf to be filed in a 'Jan' folder you have to scan the paper in January. If you don’t receive the paper, say a bank statement, until some time in February and scan it then, it will end up in the 'Feb' folder. Similar problems occur at year end, when pdfs will end up in the next year's folder instead.

My suggested solution

Ideally we need a Hazel action which adds another ‘custom date’ to the file which could then be used for sorting instead of using the ‘creation date’. We also need some way of easily entering this date into the file either at the time it is created, or later when manually renaming the file prior to filing via Hazel. The file name is the most obvious place for date entry since it is the easiest to access through the Finder.

The workflow would then look like this:

1. Scan the paper and create the pdf

2. As each pdf is saved, name (or later rename) each pdf with a filename format
‘CN Statement_20090630_Qtr2 gas bill.pdf’ (or similar)

- ‘CN Statement’ is your code used for Hazel filing
- 30th June 2009 is the date on the paper 
- ‘Qtr2 gas bill’ is any further information you require

3. The pdf is then saved to a Hazel watch folder somewhere on the Desktop

4. Hazel parses the date from the filename, sets the ‘custom date’ and then moves and sorts the file based on this date.
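To make the idea concrete, here is a minimal sketch of steps 2 to 4 in Python. It is not Hazel functionality; it only illustrates the behaviour being requested. The function names, the archive layout (code > year > month) and the strict 'code_YYYYMMDD_info.pdf' pattern are all assumptions made for the example.

```python
import re
import shutil
from datetime import datetime
from pathlib import Path

# Assumed filename layout: "<code>_<YYYYMMDD>_<further info>.pdf"
NAME_PATTERN = re.compile(
    r"^(?P<code>[^_]+)_(?P<date>\d{8})_(?P<info>.+)\.pdf$", re.IGNORECASE
)

# Fixed month names so the folder names do not depend on the system locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def destination_for(filename, archive_root):
    """Return the dated folder for a file, or None if the name does not match."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    try:
        # This is the 'custom date': the date on the paper, not the scan date.
        paper_date = datetime.strptime(match.group("date"), "%Y%m%d")
    except ValueError:
        return None  # eight digits that are not a real date, e.g. 20091340
    month_name = MONTHS[paper_date.month - 1]
    return Path(archive_root) / match.group("code") / str(paper_date.year) / month_name


def file_scans(watch_folder, archive_root):
    """Move every matching pdf in the watch folder into its dated folder."""
    for pdf in Path(watch_folder).glob("*.pdf"):
        target = destination_for(pdf.name, archive_root)
        if target is None:
            continue  # leave unrecognised files where they are
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(pdf), str(target / pdf.name))
```

With these assumptions, 'CN Statement_20090630_Qtr2 gas bill.pdf' would be filed under 'CN Statement/2009/Jun' no matter when it was scanned.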

Ideas and Issues

• Using the filename this way has the advantage of total accessibility in the Finder.  We can easily embed the ‘custom date’ any time we save or rename a file.

• Text expansion tools could also help with file naming. The expansion could include today's date in the correct format, which could then easily be changed to the required 'custom date' before saving.

• The filename parsing should be able to use a wide range of delimiters to parse the date from any position in the filename, including at the end,
i.e. ‘CN Statement Qtr2 Gas Bill 20090630.pdf’ should also work.

• Maybe the ‘custom date’ could be stored in the Spotlight Comment field or as an OpenMeta tag so it could also be used in other programs and searches.

• Maybe we could have a number of ‘custom dates’ which could be set in Hazel and used elsewhere as tickler dates etc.

• Maybe the parsing should be more general so that we can embed any words in the filename, have them parsed and then set comments or tags as desired.

• I found some parsing AppleScripts here http://www.macosxhints.com/article.php? ... 3160750583 (way down the post) which reset the creation date. While this worked after a fashion, interactions between ‘modification dates’ and ‘creation dates’ in OS X cause confusion, and if we change a ‘creation date’ we can no longer use it to find, for example, all the files I scanned last Tuesday. Therefore I think custom dates are the way to go.
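On the point above about finding the date at any position in the filename, a rough Python sketch of one way it could work is shown here. It assumes the date is always written as eight digits (YYYYMMDD) and is separated from any other digits by some non-digit character such as a space, underscore or dot. The name find_custom_date is made up for the example and is not part of Hazel.

```python
import re
from datetime import date

# An 8-digit run that is not part of a longer number. The delimiter on
# either side can be anything that is not a digit (space, underscore, dot...).
DATE_TOKEN = re.compile(r"(?<!\d)(\d{4})(\d{2})(\d{2})(?!\d)")


def find_custom_date(filename):
    """Return the first valid YYYYMMDD date found anywhere in the name, else None."""
    for year, month, day in DATE_TOKEN.findall(filename):
        try:
            return date(int(year), int(month), int(day))
        except ValueError:
            continue  # eight digits that are not a real calendar date
    return None
```

Both 'CN Statement_20090630_Qtr2 gas bill.pdf' and 'CN Statement Qtr2 Gas Bill 20090630.pdf' give 30 June 2009 with this approach, and the creation date is left untouched, so it stays usable for searches such as "files I scanned last Tuesday".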

Finally

Hazel is a great solution and I feel these changes would add greatly to the versatility of Hazel and overcome the current problem with the MacSparky workflow.

What does everyone else think, or is there an alternative solution?

PJA

I'm still unclear on where the actual date is coming from. Are you OCR'ing it from the paper doc? Hazel can parse the date into its parts (look up "match patterns" and "custom tokens" in the help).

Sorry for the confusion,

Yes I want to get the date from the filename. I tried the pattern matching with custom tokens with little success. While the custom tokens are very powerful they are a bit difficult for the average person such as myself. I need a simpler UI for date patterns.

Given that dates are fairly common, could we not have a [Match Date] token as one of the standard token types in the pattern matches? This token could then be edited using the existing date pattern editor as used for 'date modified' etc.

The rule would then look like:

Name matches [...][Match Date][...] (with Match Date in whatever format is selected in the date pattern editor)
Sort into Subfolders with pattern [Sort Date] (Sort Date is the same date as Match date - see *Note below)

*Note - while the [Match Date] and [Sort Date] would be the same date, the pattern may need to be different
so that the filename could contain a date in, say, YYYYMMDD format and still end up with subfolders in 2010 > January format.
Again this could be set using the date pattern editor.
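To illustrate what I mean by one date with two patterns, here is a small Python sketch. The format strings stand in for whatever the date pattern editor would produce, and reformat_date is a name invented for the example, not anything in Hazel.

```python
from datetime import datetime


def reformat_date(text, match_format="%Y%m%d", sort_format="%Y/%B"):
    """Read a date using one pattern and write it back out using another.

    match_format plays the role of [Match Date] (how the date appears in
    the filename); sort_format plays the role of [Sort Date] (how the
    subfolders should be named). Month names follow the current locale.
    """
    return datetime.strptime(text, match_format).strftime(sort_format)
```

Under an English locale, reformat_date("20100115") would give something like '2010/January', while reformat_date("20100115", sort_format="%Y-%m") gives '2010-01' from the same matched date.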

PJA

It's an interesting idea and similar to something I had in mind to deal with a case where someone wanted to parse in a date in numerical form but spit it out in text form (12 -> Dec). Since the match token would have its own date format separate from where it is used in the rename/sort patterns, what you are asking for would work.

I'll amend the current issue in the feature database. At this point, probably a 3.0 thing so it may be a while.

Thank you Mr Noodlesoft I think this would be a great enhancement.

Can you give any indication of when we could expect to see v3.0?
Are we talking weeks, months, years?

While I realise it is difficult to make commitments, I am sure many users would like some indication of not only the timing but also some preview of new features which might be included.

Perhaps a new post of v3.0 progress would keep us entertained and help avoid posts requesting features that you have already considered.

Thanks again

PJA

I can't say right now. Definitely not weeks. Maybe a beta in many months? I'm trying to make progress on a different product, and dealing with the possibility of another 2.x maintenance release is sucking away time. I suggest checking the beta forum for any updates on this front.

I also can't commit to any features as of yet. It will involve a rewrite of a chunk of the main engine though. There's a good chance it will require Snow Leopard.

And don't worry about duplicate requests (though you may want to do a forum search). I do use the requests as an indication of demand for certain features. The requests are particularly valuable if you can provide details on specific workflows and use cases so please provide those details if you can.

Mr_Noodle wrote: There's a good chance it will require Snow Leopard.

Bummer. I'm still satisfied running Leopard on my mini (Early 2009), though might reluctantly "upgrade" it to SL after 10.6.3 is released.

Came across this post in a Google search...

Has this been addressed in 3.0 as described, i.e. a better UI for pulling out dates from the OCR'd text in the PDF?

That seems like a different issue. Search the forums as there are some scripts for doing this type of thing.