Lastly, I want to thank a_freyer for all of his assistance, without which I never would have gotten this workflow together.
First of all, a bit of setup. For some time I had been thinking of using Hazel to file my scans into the appropriate client's folder, but having Hazel run through every client name in every possible permutation (Firstname Lastname, Firstname MI Lastname, ...) seemed both very difficult and very resource intensive. Then I came across a useful tidbit: the ScanSnap 1500 software has a function that lets you highlight text and add it to the PDF's metadata as keywords. See http://www.fujitsu.com/emea/products/scanners/faqs/how-to-use-the-scansnap-highlighter-feature-to-create-searchable-keywords.html for details. By highlighting the client's name, I can now just compare the keywords to the names of the client folders and determine the correct one.
For the folder setup, I have a folder called "Scanner" where the initial scans are sent. I also have a "Clients" folder where my client subfolders are kept by client name, type of matter, and date of incident. The exact form is [Lastname], [Firstname] [MI]., [Filetype] [Date of Incident]. This is what the rules will expect to parse. You may have a different naming convention, so modify the rules as necessary. Also, as a subfolder in Clients, I have a folder called "Needs To Be Filed", which is where Hazel places the PDFs that fail to be filed. The last bit of information needed to understand some of the scripts is that these folders all reside on a disk called Snow Leprd. Because of the space in the name, Snow Leprd has to be escaped and written as "Snow\\ Leprd" wherever it appears in a path inside a script.
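If you want to check your own folder names against this convention before pointing the rules at them, here is a rough Python sketch. It is not part of the Hazel workflow. The function name, the regular expression, and the sample folder names are all made up for illustration, and because the post does not pin down the date format, everything after the name is captured as one free-form string.

```python
import re

# Hypothetical pattern for "[Lastname], [Firstname] [MI]., [Filetype] [Date of Incident]".
# Assumptions: single-word first name, optional one-letter middle initial,
# and the matter type plus date kept together as free-form text.
FOLDER_PATTERN = re.compile(
    r"^(?P<last>[^,]+),\s+"        # Lastname followed by a comma
    r"(?P<first>[^\s,]+)"          # Firstname
    r"(?:\s+(?P<mi>[A-Za-z])\.)?"  # optional middle initial with a period
    r",\s+(?P<matter>.+)$"         # matter type and date of incident
)


def parse_client_folder(name):
    """Return the parts of a client folder name, or None if it does not fit."""
    match = FOLDER_PATTERN.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_client_folder("Doe, Jane A., Auto 2013-01-15"))
    print(parse_client_folder("Needs To Be Filed"))
```

A folder that returns None here (such as "Needs To Be Filed") simply does not follow the client naming convention.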
To start the workflow, the mail is scanned with the client's name highlighted. Then Hazel does her magic.
Here are the rules for my Scanner folder:
https://app.box.com/s/fmwso1l26f50p1x833j8
The first rule processed is the OCR File. This is the rule:
https://app.box.com/s/rpkjbgip53tgwcmtbyzr
And here is the AppleScript embedded in the rule:
https://app.box.com/s/43y9nokei0t5pfdjr48x
This is the code itself:
try
tell application "PDFpenPro 6"
open theFile as alias
tell document 1
ocr
repeat while performing ocr
delay 1
end repeat
delay 1
close with saving
end tell
end tell
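on error
-- This captures the error so that a document isn't OCR'd ad infinitum.
-- The cleanup has to be addressed to PDFpenPro, and is wrapped in its own try
-- so that a failure here cannot make the rule fail.
tell application "PDFpenPro 6"
try
tell document 1 to close
end try
quit
end tell
end try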
As you can see, the rule checks that the file is a PDF and that it has no "OCR" comment, then runs the AppleScript on it, which uses PDFpenPro 6 to OCR the document. It then adds "OCR" to the comments to keep the document from being OCR'd again.
The next rule sets the Label color to red:
https://app.box.com/s/25snjuyjnc9bkzphu20j
This makes these documents obvious as new files so that they can be read by someone and acted upon. The next rule uses the red label as part of its criteria for moving the file to the Clients folder:
https://app.box.com/s/98n43fwojc209hsq3bm8
The other rules are not part of the current workflow. They were used in the past to manage documents in the Scanner folder and have been left in place as legacy rules.
Now over to the client folder. Here are the Client rules:
https://app.box.com/s/lvum66cy4at61wq33r1q
The first rule fires when a document cannot be filed automatically, either because there are no keywords to file with or because the filing attempt failed for some reason. In other words, if there were no keywords (no highlighting, or the ScanSnap didn't pick it up properly), or the PDF has already been run through the rules once and the filing didn't work, this rule sweeps it into the "Needs To Be Filed" folder. It comes first so that Hazel doesn't try to run a PDF without keywords through the filing scripts.
https://app.box.com/s/3qyh4ar1d76ng8a88fis
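To make the ordering concrete, here is a tiny Python sketch of the decision this rule makes. It only illustrates the logic described above. The real conditions live in the Hazel rule, and the function and parameter names are invented for the example.

```python
def route_client_document(keywords, filing_already_failed):
    """Return the name of the rule that would claim the document.

    keywords: list of keyword strings read from the PDF metadata (assumed input).
    filing_already_failed: True if an earlier filing attempt did not work.
    """
    # Blank or whitespace-only keywords count as "no keywords".
    has_keywords = any(word.strip() for word in keywords)
    if not has_keywords or filing_already_failed:
        return "Needs To Be Filed"
    return "Sort Filing Into Subfolder"


if __name__ == "__main__":
    print(route_client_document([], False))
    print(route_client_document(["Jane Doe"], False))
```

Putting the fallback check first means the filing scripts only ever see documents that have something to match on.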
Next comes the Sort Filing Into Subfolder rule, which is this:
https://app.box.com/s/dozy863ucomo5sofeuv1
This rule is taken straight from the a_freyer playbook. See his post Custom Matching 101: Pass Hazel Variables to Applescript http://www.noodlesoft.com/forums/viewtopic.php?f=3&t=1770&p=7294 for a thorough explanation. The first AppleScript is a long one:
tell application "Finder"
set posixPath to POSIX path of ((item theFile) as text)
end tell
try
set returnKeywords to (do shell script "mdls -name kMDItemKeywords " & quoted form of posixPath) as text
on error
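tell application "System Events"
activate
display dialog "Shell script didn't work"
end tell
-- File comments belong to the Finder, not System Events.
tell application "Finder"
set comment of theFile to (comment of theFile & space & "Filing Failed")
end tell
-- Give returnKeywords a value so the later steps fail cleanly instead of hitting an undefined variable.
set returnKeywords to ""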
end try
-- Now dig out the file keywords
try
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "\""
if ((number of text items of returnKeywords) > 1) then
set returnKeywords to text item 2 of returnKeywords as text
end if
-- Some initial seeding of variables
set AppleScript's text item delimiters to space
set empty to ""
set findFailed to ""
set Name1 to " "
set Name2 to " "
set Name3 to " "
-- Remove Extra Spaces. Sometimes the ScanSnap reads a long space as two spaces. This corrects for that.
set numberKeywords to number of text items in returnKeywords
set tempKeywords to ""
repeat with counter from 1 to numberKeywords
set compKeyword to text item counter of returnKeywords
if ((number of text of compKeyword) > 0) then
set tempKeywords to (tempKeywords & space & compKeyword)
end if
end repeat
set textCount to number of text of tempKeywords
set tempKeywords to text 2 thru textCount of tempKeywords
set returnKeywords to tempKeywords
on error
tell application "System Events"
activate
display dialog "Remove spaces didn't work"
end tell
set findFailed to "Find Failed"
end try
try
-- Remove unwanted punctuation. Tries to correct for some typos and over highlighting.
set legalCharacters to "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz,.- '"
set tempKeywords to ""
set numberKeywordsText to number of text of returnKeywords
repeat with counter from 1 to numberKeywordsText
if legalCharacters contains (text counter of returnKeywords) then
set compKeyword to (text counter of returnKeywords)
set tempKeywords to (tempKeywords & compKeyword)
end if
end repeat
set returnKeywords to tempKeywords
on error
tell application "System Events"
activate
display dialog "Remove unwanted punctuation didn't work"
end tell
set findFailed to "Find Failed"
end try
try
-- A final clean up. I would expect a person's name to be listed in one of four ways: 1. FirstName LastName 2. FirstName MiddleName/MI LastName 3. LastName, FirstName and 4. LastName, FirstName MiddleName/MI
--Therefore, I would not expect any punctuation at the end of a whole name except for a ",". If we are dealing with an initial, then I would expect a ".", so if there is a "." at the end of a whole name (longer than two characters with one of those characters being a ".") or any other punctuation, it should be removed. For an MI, if there is any punctuation after it except a ".", then that punctuation should be removed.
set illegalAtEndWholeName to ".-'"
set illegalAtEndMI to ",-'"
set tempKeywords to ""
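set AppleScript's text item delimiters to " "
-- Recount the words, since the clean-up steps above may have removed empty items.
set numberKeywords to number of text items in returnKeywords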
repeat with counter from 1 to numberKeywords
set compKeyword to (text item counter of returnKeywords)
set textNumber to (number of text of compKeyword)
set tempText to ""
if textNumber ≤ 2 then
repeat with textCounter from 1 to textNumber
set textTest to (text textCounter of compKeyword)
if illegalAtEndMI does not contain textTest then
set tempText to (tempText & textTest)
end if
end repeat
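else if textNumber = 3 then
-- An initial followed by stray punctuation (e.g. "A.,") is trimmed to "A."; any other three-character word is kept as is.
if text 2 of compKeyword is "." then
set tempText to text 1 thru 2 of compKeyword
else
set tempText to compKeyword
end if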
else
set textTest to text textNumber of compKeyword
if illegalAtEndWholeName does not contain textTest then
set tempText to compKeyword
else
set tempText to text 1 thru (textNumber - 1) of compKeyword
end if
end if
if counter < numberKeywords then set tempText to (tempText & space)
set tempKeywords to tempKeywords & tempText
end repeat
set returnKeywords to tempKeywords
on error
tell application "System Events"
activate
display dialog "Final cleanup didn't work"
end tell
set findFailed to "Find Failed"
end try
try
if (numberKeywords = 1) then
set Name1 to text item 1 of returnKeywords
set Name2 to " "
set Name3 to " "
else if (numberKeywords = 2) then
set Name1 to text item 1 of returnKeywords
set Name2 to text item 2 of returnKeywords
set Name3 to " "
else if (numberKeywords ≥ 3) then
set Name1 to text item 1 of returnKeywords
set Name2 to text item 2 of returnKeywords
set Name3 to text item 3 of returnKeywords
end if
on error
tell application "System Events"
activate
display dialog "Name Assignment didn't work. numberKeywords is " & numberKeywords
end tell
set findFailed to "Find Failed"
end try
set AppleScript's text item delimiters to tid
return {hazelExportTokens:{Name1:Name1, Name2:Name2, Name3:Name3, findFailed:findFailed}}
Remember you HAVE TO set export tokens for this script. Mine are Name1, Name2, Name3 & findFailed. I tried to comment the script sufficiently so that the next person could understand what I was doing.
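For readers who find the AppleScript hard to follow, here is a rough Python sketch of the same clean-up idea: pull the first quoted keyword out of the mdls output, collapse extra spaces, drop characters that should not appear in a name, trim stray punctuation, and pad the result out to three name tokens. It is an approximation and not a line-for-line port. The function names and the sample mdls output are made up for illustration.

```python
# Characters allowed in a name, mirroring legalCharacters in the AppleScript.
LEGAL = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz,.- '")


def extract_first_keyword(mdls_output):
    """Take the text between the first pair of double quotes, if there is one."""
    parts = mdls_output.split('"')
    return parts[1] if len(parts) > 1 else mdls_output


def clean_keywords(raw):
    """Return a list of cleaned words from the highlighted text."""
    cleaned = []
    for word in raw.split():  # split() also collapses runs of spaces
        word = "".join(ch for ch in word if ch in LEGAL)
        if len(word) <= 2:
            # Short words are treated as initials: keep "." but drop , - '
            word = "".join(ch for ch in word if ch not in ",-'")
        elif len(word) == 3 and word[1] == ".":
            word = word[:2]  # "A.," becomes "A."
        elif word[-1] in ".-'":
            word = word[:-1]  # whole names keep a trailing "," only
        if word:
            cleaned.append(word)
    return cleaned


def name_tokens(words):
    """Pad to three tokens, using a single space for 'missing' as the script does."""
    padded = (list(words) + [" ", " ", " "])[:3]
    return {"Name1": padded[0], "Name2": padded[1], "Name3": padded[2]}


if __name__ == "__main__":
    sample = 'kMDItemKeywords = (\n    "Jane  A., Doe!"\n)'
    print(name_tokens(clean_keywords(extract_first_keyword(sample))))
```

Running each step separately on a keyword that misbehaves is a quick way to see where a name gets mangled.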
The next portion of the rule is also from the a_freyer playbook. See his post Custom Matching 101: Sort Files into Alphabetical Folders http://www.noodlesoft.com/forums/viewtopic.php?f=3&t=1714&p=7083&hilit=101#p7083.
The other applescript is as follows:
tell application "Finder"
set tid to AppleScript's text item delimiters
set myHazelTokenDelimiters to "|"
set theListOfCustomTokens to name of theFile
set AppleScript's text item delimiters to {myHazelTokenDelimiters}
set Name1 to (text item 1 of theListOfCustomTokens)
set Name2 to (text item 2 of theListOfCustomTokens)
set Name3 to (text item 3 of theListOfCustomTokens)
end tell
-- In case of error, we want to pass the name of the failed to file folder
set sortFolder to "// Needs To Be Filed"
try
set showFolders to ((do shell script "find /Volumes/Snow\\ Leprd/Data/Clients/ -maxdepth 1 -mindepth 1 -type d -name \"*\" -print") as text)
on error
tell application "System Events"
activate
display dialog "Find Failed" & (theFile as text)
end tell
end try
try
set AppleScript's text item delimiters to space
if (Name2 is not " ") then
if (Name3 is not " ") then
set compNames to (Name1 & space & Name2 & space & Name3)
else
set compNames to (Name2 & ", " & Name1)
end if
repeat with counter from 1 to the number of paragraphs in showFolders
if (paragraph counter of showFolders contains compNames) then
-- The next if then statement tests whether there are two folders with the same client name. If there are, the test fails and goes to the else statement.
if (paragraph (counter + 1) of showFolders does not start with (text items 1 thru 3 of paragraph counter of showFolders)) then
if (paragraph (counter - 1) of showFolders does not start with (text items 1 thru 3 of paragraph counter of showFolders)) then
set sortFolder to paragraph counter of showFolders
-- If the file fails this test, it is deemed unsortable because more than one folder contains the names in the keywords.
-- This covers the situation where there are two clients, "Doe, Jane A." and "Doe, Jane L.", and the situation where there is only one client with that name but two separate matters.
-- In the future, there may be logic to search the OCR'd document to see if there is enough differentiation to pick one folder over the other. For now, human intervention comes into play. This is the 80/20 rule in action.
-- The // is used because find returns a path that has // before the final folder. I don't know if this happens from computer to computer, but it has been consistent on mine, so I have taken it into account.
-- When the test fails, sortFolder keeps its initial value, which points at the Needs To Be Filed folder.
end if
end if
end if
end repeat
end if
-- Because the // appears just prior to the subfolder I would like to file into, it makes a convenient way to get just the name of that subfolder.
set AppleScript's text item delimiters to "//"
set sortFolder to (text item 2 of sortFolder) as text
on error
tell application "System Events"
activate
display dialog "Sort Failed"
end tell
end try
set AppleScript's text item delimiters to tid
return {hazelExportTokens:{sortFolder:sortFolder}}
Again, I have tried to comment the script to be self-explanatory.
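Here is a simplified Python sketch of the matching idea, for anyone who wants to experiment with it outside Hazel. It is not the script above. Instead of comparing each folder with its neighbours, it counts how many folders contain the name and files only when there is exactly one. The function names and folder names are invented for the example.

```python
def build_search_name(name1, name2, name3):
    """Build the text to look for in the folder names.

    A single space means 'token missing', as in the AppleScript.
    """
    if name2 == " ":
        return None  # one word is not enough to identify a client
    if name3 != " ":
        # Three tokens are assumed to be in folder order already,
        # e.g. "Doe," "Jane" "A."
        return f"{name1} {name2} {name3}"
    # Two tokens are assumed to be "Firstname Lastname".
    return f"{name2}, {name1}"


def choose_folder(folder_names, search_name, fallback="Needs To Be Filed"):
    """Return the single matching folder, or the fallback if none or several match."""
    if search_name is None:
        return fallback
    needle = search_name.lower()  # AppleScript's 'contains' ignores case by default
    matches = [name for name in folder_names if needle in name.lower()]
    return matches[0] if len(matches) == 1 else fallback


if __name__ == "__main__":
    folders = [
        "Doe, Jane A., Auto 2013-01-15",
        "Doe, Jane L., Slip 2012-07-04",
        "Smith, John Q., Auto 2011-03-02",
    ]
    print(choose_folder(folders, build_search_name("John", "Smith", " ")))
    print(choose_folder(folders, build_search_name("Jane", "Doe", " ")))
```

The second call falls back to "Needs To Be Filed" because two folders match "Doe, Jane", which is the ambiguous case left for a human to resolve.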
As some may have noticed, I have the OCR rule again in the Clients folder. This is not part of the paperless workflow; rather, it catches documents that arrive by email as PDFs and are then placed into the correct client folder.
I will keep an eye on this and answer as many questions as I can. Also, if something needs a better explanation, please let me know and I will edit the post.