Using Hazel to File PDFs Scanned with a ScanSnap 1500

I believe that this workflow is finally working properly, so I am ready to share it with the world. I am an attorney with several hundred clients with document-intensive cases. I have pretty well reached my goal of a Paperless Office, but my biggest problem has been turning snail mail into online documents that I can deal with. Due to the volume of mail, just getting each piece filed in the appropriate client file was a chore that took a huge portion of my assistant's day. I used the 80/20 rule in putting this workflow together; that is, I expect it to handle about 80% of what is scanned, and we deal with the other 20% manually. If anyone comes up with improvements on this, please add to this post.

Lastly, I want to thank a_freyer for all of his assistance, without which I never would have gotten this workflow together.

First of all, a bit of setup. I had been thinking for some time of using Hazel to file my scans into the appropriate client's folder, but having Hazel run through every possible permutation of each client name (Firstname Lastname, Firstname MI Lastname, ...) seemed very difficult and very resource intensive. Then I came across a tidbit of information: the 1500 software has a function that lets you highlight text on the page and add it to the PDF's metadata as keywords (http://www.fujitsu.com/emea/products/scanners/faqs/how-to-use-the-scansnap-highlighter-feature-to-create-searchable-keywords.html). By highlighting the client's name, I can now just compare the keywords to the names of the client folders and determine the correct one.

For the folder setup, I have a folder called "Scanner" where the initial scans are sent. I also have a "Clients" folder where my client subfolders are kept by client name, type of matter, and date of incident. The exact form is [Lastname], [Firstname] [MI]., [Filetype] [Date of Incident]. This is what the rules will expect to parse. You may have a different naming convention, so modify the rules as necessary. Also, as a subfolder in Clients, I have a folder called "Needs To Be Filed", which is where the PDFs that fail to be filed are placed by Hazel. The last bit of information needed to understand some of the scripts is that these folders all reside on a disk called Snow Leprd. Because the name contains a space, it has to be escaped when it appears in a shell path, and because that path sits inside an AppleScript string, the backslash itself is doubled: "Snow\\ Leprd".
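To illustrate the escaping point: the doubled backslash is only needed because the path is written inside an AppleScript string. At the shell itself, plain double quotes protect the space just as well. Here is a minimal sketch, assuming a helper name of my own choosing (list_client_folders is hypothetical, not part of the workflow) and the Clients layout described above:

```shell
#!/bin/sh
# Hypothetical helper: print the name of each immediate subfolder of a
# clients directory, one per line. Quoting "$1" protects the space in
# "Snow Leprd" without any backslashes.
list_client_folders() {
    find "$1" -mindepth 1 -maxdepth 1 -type d | while IFS= read -r path; do
        # Strip everything up to and including the last slash.
        printf '%s\n' "${path##*/}"
    done
}

# Example call (path taken from the post; adjust to your own disk):
# list_client_folders "/Volumes/Snow Leprd/Data/Clients"
```

Stripping the path down to the last component also sidesteps any question of how many slashes find prints before the folder name.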

To start the workflow, the mail is scanned with the client's name highlighted. Then Hazel does her magic.

Here are the rules for my Scanner folder:
https://app.box.com/s/fmwso1l26f50p1x833j8
The first rule processed is the OCR File. This is the rule:
https://app.box.com/s/rpkjbgip53tgwcmtbyzr
And here is the AppleScript embedded in the rule:
https://app.box.com/s/43y9nokei0t5pfdjr48x
This is the code itself:
try
   tell application "PDFpenPro 6"
      open theFile as alias
      tell document 1
         ocr
         repeat while performing ocr
            delay 1
         end repeat
         delay 1
         close with saving
      end tell
      
   end tell
   
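on error
   --This captures the error so that a document isn't OCR'd ad infinitum.
   --The close has to be sent to PDFpenPro; outside its tell block "document 1" means nothing.
   tell application "PDFpenPro 6"
      try
         tell document 1 to close
      end try
      quit
   end tell
end try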


As you can see, the rule checks that the file is a PDF and that its comments do not already contain "OCR", then runs the AppleScript on it, which uses PDFpenPro 6 to OCR the document. It then adds "OCR" to the comments to keep the document from being OCR'd again.

The next rule sets the Label color to red:
https://app.box.com/s/25snjuyjnc9bkzphu20j
This makes these documents obvious as new files so that they can be read by someone and acted upon. The next rule uses the red label as part of its criteria for moving the file to the Clients folder:
https://app.box.com/s/98n43fwojc209hsq3bm8
The other rules are not part of the current workflow. They were used in the past to manage documents in the Scanner folder and were left in place as legacy rules.

Now over to the client folder. Here are the Client rules:
https://app.box.com/s/lvum66cy4at61wq33r1q
The first rule fires if the document cannot be filed automatically, either because there are no keywords to file with or because the filing attempt failed for some reason. If either of these two conditions is met, the file is put into the "Needs To Be Filed" folder. This means that if there were no keywords (either no highlighting, or the ScanSnap didn't pick it up properly), or the PDF has already been run through the rules once and the filing didn't work, this rule will sweep it into the Needs To Be Filed folder. It comes first so that Hazel doesn't try to run a PDF without keywords through the filing scripts.
https://app.box.com/s/3qyh4ar1d76ng8a88fis
Next comes the Sort Filing Into Subfolder rule which is this:
https://app.box.com/s/dozy863ucomo5sofeuv1
This rule is taken straight from the a_freyer playbook. See his post Custom Matching 101: Pass Hazel Variables to Applescript http://www.noodlesoft.com/forums/viewtopic.php?f=3&t=1770&p=7294 for a thorough explanation. The first AppleScript is a long one:
tell application "Finder"
   
   set posixPath to POSIX path of ((item theFile) as text)
   
end tell


--   Some initial seeding of variables. This is done before any try block so that every export token exists even if an early step fails.
set tid to AppleScript's text item delimiters
set empty to ""
set findFailed to ""
set Name1 to " "
set Name2 to " "
set Name3 to " "
set returnKeywords to ""
set numberKeywords to 0

try
   set returnKeywords to (do shell script "mdls -name kMDItemKeywords " & quoted form of posixPath) as text
   
on error
   
   tell application "System Events"
      activate
      display dialog "Shell script didn't work"
   end tell
   
   -- The comment is a Finder property, so the Finder has to set it.
   try
      tell application "Finder"
         set comment of (item theFile) to (comment of (item theFile) & space & "Filing Failed")
      end tell
   end try
   
   set findFailed to "Find Failed"
   
end try


-- Now dig out the file keywords
try
   set AppleScript's text item delimiters to "\""
   if ((number of text items of returnKeywords) > 1) then
      set returnKeywords to text item 2 of returnKeywords as text
   end if
   set AppleScript's text item delimiters to space
   
   
   --   Remove Extra Spaces. Sometimes the ScanSnap reads a long space as two spaces. This corrects for that.
   set numberKeywords to number of text items in returnKeywords
   set tempKeywords to ""
   
   repeat with counter from 1 to numberKeywords
      
      set compKeyword to text item counter of returnKeywords
      
      if ((number of text of compKeyword) > 0) then
         
         set tempKeywords to (tempKeywords & space & compKeyword)
         
      end if
      
   end repeat
   
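   set textCount to number of text of tempKeywords
   set tempKeywords to text 2 thru textCount of tempKeywords
   
   set returnKeywords to tempKeywords
   --   Recount now that the empty items are gone, so the later loops don't run past the last real keyword.
   set numberKeywords to number of text items in returnKeywords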
   
on error
   
   tell application "System Events"
      activate
      display dialog "Remove spaces didn't work"
   end tell
   
   set findFailed to "Find Failed"
   
end try


try
   --   Remove unwanted punctuation. Tries to correct for some typos and over highlighting.
   set legalCharacters to "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ,.- '"
   set tempKeywords to ""
   set numberKeywordsText to number of text of returnKeywords
   
   repeat with counter from 1 to numberKeywordsText
      
      if legalCharacters contains (text counter of returnKeywords) then
         
         set compKeyword to (text counter of returnKeywords)
         set tempKeywords to (tempKeywords & compKeyword)
         
      end if
      
   end repeat
   
   set returnKeywords to tempKeywords
   
on error
   
   tell application "System Events"
      activate
      display dialog "Remove unwanted punctuation didn't work"
   end tell
   
   set findFailed to "Find Failed"
   
end try


try
   -- A final clean up. I would expect a person's name to be listed in one of four ways: 1. FirstName LastName 2. FirstName MiddleName/MI LastName 3. LastName, FirstName and 4. LastName, FirstName MiddleName/MI
   --Therefore, I would not expect any punctuation at the end of a whole name except for a ",". If we are dealing with an initial, then I would expect a ".", so if there is a "." at the end of a whole name (longer than two characters with one of those characters being a ".") or any other punctuation, it should be removed. For an MI, if there is any punctuation after it except a ".", then that punctuation should be removed.
   set illegalAtEndWholeName to ".-'"
   set illegalAtEndMI to ",-'"
   set tempKeywords to ""
   set AppleScript's text item delimiters to " "
   
   repeat with counter from 1 to numberKeywords
      
      set compKeyword to (text item counter of returnKeywords)
      set textNumber to (number of text of compKeyword)
      set tempText to ""
      
      if textNumber ≤ 2 then
         
         repeat with textCounter from 1 to textNumber
            
            set textTest to (text textCounter of compKeyword)
            
            if illegalAtEndMI does not contain textTest then
               set tempText to (tempText & textTest)
            end if
            
         end repeat
         
      else if (textNumber = 3) and (character 2 of compKeyword is ".") then
         
         -- An initial with trailing punctuation, e.g. "J.,": keep only the letter and the period.
         set tempText to text 1 thru 2 of compKeyword
         
      else
         
         set textTest to text textNumber of compKeyword
         
         if illegalAtEndWholeName does not contain textTest then
            set tempText to compKeyword
            
         else
            
            set tempText to text 1 thru (textNumber - 1) of compKeyword
            
         end if
         
      end if
      
      if counter < numberKeywords then set tempText to (tempText & space)
      set tempKeywords to tempKeywords & tempText
      
   end repeat
   
   set returnKeywords to tempKeywords
   
on error
   
   tell application "System Events"
      activate
      display dialog "Final cleanup didn't work"
   end tell
   
   set findFailed to "Find Failed"
   
end try

try
   if (numberKeywords = 1) then
      set Name1 to text item 1 of returnKeywords
      set Name2 to " "
      set Name3 to " "
   else if (numberKeywords = 2) then
      set Name1 to text item 1 of returnKeywords
      set Name2 to text item 2 of returnKeywords
      set Name3 to " "
   else if (numberKeywords ≥ 3) then
      set Name1 to text item 1 of returnKeywords
      set Name2 to text item 2 of returnKeywords
      set Name3 to text item 3 of returnKeywords
   end if
   
on error
   
   tell application "System Events"
      activate
      display dialog "Name Assignment didn't work. numberKeywords is " & numberKeywords
   end tell
   
   set findFailed to "Find Failed"
   
   
end try

set AppleScript's text item delimiters to tid
return {hazelExportTokens:{Name1:Name1, Name2:Name2, Name3:Name3, findFailed:findFailed}}


Remember you HAVE TO set export tokens for this script. Mine are Name1, Name2, Name3 & findFailed. I tried to comment the script sufficiently so that the next person could understand what I was doing.
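For anyone who wants to see what the first step of the script is working with, here is a minimal shell sketch of the keyword extraction. It assumes that mdls prints the keywords as a parenthesized list of double-quoted strings, which is what the quote-splitting in the script relies on. The function name first_keyword is my own and is not part of the Hazel rule. It keeps only the first quoted string and squeezes repeated spaces, the same two things the top of the AppleScript does.

```shell
#!/bin/sh
# Hypothetical helper: read mdls-style output on stdin and print the first
# double-quoted string, with runs of spaces collapsed to a single space.
first_keyword() {
    # sed prints only lines that contain a quoted string, reduced to the
    # text between the first pair of quotes; head keeps the first such
    # line; tr -s squeezes repeated spaces.
    sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p' | head -n 1 | tr -s ' '
}

# Example call on a Mac (scan.pdf is a placeholder file name):
# mdls -name kMDItemKeywords scan.pdf | first_keyword
```

If the file has no keywords there is no quoted string to find, so the function prints nothing, which is the case the first Clients rule is there to catch.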

The next portion of the rule is also from the a_freyer playbook. See his post Custom Matching 101: Sort Files into Alphabetical Folders http://www.noodlesoft.com/forums/viewtopic.php?f=3&t=1714&p=7083&hilit=101#p7083.

The other AppleScript is as follows:
tell application "Finder"
   set tid to AppleScript's text item delimiters
   set myHazelTokenDelimiters to "|"
   set theListOfCustomTokens to name of theFile
   set AppleScript's text item delimiters to {myHazelTokenDelimiters}
   
   set Name1 to (text item 1 of theListOfCustomTokens)
   set Name2 to (text item 2 of theListOfCustomTokens)
   set Name3 to (text item 3 of theListOfCustomTokens)
end tell

--   In case of error, we want to pass the name of the failed to file folder
set sortFolder to "// Needs To Be Filed"

try
   
   set showFolders to ((do shell script "find /Volumes/Snow\\ Leprd/Data/Clients/ -maxdepth 1 -mindepth 1 -type d -name \"*\" -print") as text)
   
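on error
   
   tell application "System Events"
      activate
      display dialog "Find Failed: " & (theFile as text)
   end tell
   
   -- With no folder list there is nothing to match, so the file falls through to the default folder.
   set showFolders to ""
   
end try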

try
   set AppleScript's text item delimiters to space
   if (Name2 is not " ") then
      
      if (Name3 is not " ") then
         
         set compNames to (Name1 & space & Name2 & space & Name3)
         
      else
         
         set compNames to (Name2 & ", " & Name1)
         
      end if
      
      set folderCount to the number of paragraphs in showFolders
      
      repeat with counter from 1 to folderCount
         
         set thisFolder to paragraph counter of showFolders
         
         if (thisFolder contains compNames) then
            
            -- The next tests check whether a neighboring folder has the same client name. The bounds checks keep the script from asking for a paragraph before the first folder or after the last one.
            set namePrefix to (text items 1 thru 3 of thisFolder) as text
            set isUnique to true
            
            if (counter < folderCount) and (paragraph (counter + 1) of showFolders starts with namePrefix) then set isUnique to false
            if (counter > 1) and (paragraph (counter - 1) of showFolders starts with namePrefix) then set isUnique to false
            
            if isUnique then set sortFolder to thisFolder
            
            --   If the file fails this test then the file is deemed to be unsortable, since there is more than one folder that has the names that are in the keyword. This prevents the situation where there are two clients "Doe, Jane A." and "Doe, Jane L." or the situation where there is only one client with that name, but two separate matters. In the future, there may be logic to search the OCR'd document to see if there is enough differentiation to pick one folder over the other for filing. For now, human intervention will come into play. This is the 80/20 rule in action. The // is in the default value because find returns a path that has // before the final folder; on my machine this comes from the trailing slash on the search path above, and it has been consistent, so I have taken it into account. In the unsortable case sortFolder keeps its default value, the failed-to-file folder.
            
         end if
         
      end repeat
      
   end if
   
   -- Because the // appears just prior to the subfolder I would like to file into, it makes a convenient way to get just the name of that subfolder.
   set AppleScript's text item delimiters to "//"
   set sortFolder to (text item 2 of sortFolder) as text
   
on error
   tell application "System Events"
      activate
      display dialog "Sort Failed"
   end tell
end try

set AppleScript's text item delimiters to tid

return {hazelExportTokens:{sortFolder:sortFolder}}

Again, I have tried to comment the script to be self-explanatory.

As some may have noticed, I have the OCR rule again in the Clients folder. This is not part of the paperless workflow, rather it catches those documents that are emailed as PDFs and are then placed into the correct client folder.

I will keep an eye on this and answer as many questions as I can. Also, if something needs a better explanation, please let me know and I will edit the post.
Last edited by Bryan on Tue Jan 31, 2017 9:47 pm, edited 2 times in total.
Bryan
 
Posts: 25
Joined: Wed Jan 11, 2012 4:34 pm
Location: Maryland

For further discussion of the either/or condition in the first Clients folder rule, please see my post here: http://www.noodlesoft.com/forums/viewtopic.php?f=4&t=2293&p=9706#p9706
Bryan

The link to the Fujitsu topic is broken.
Should be http://uk.scansnapcommunity.com/tips-tr ... eywords-3/
Jcupak
 
Posts: 1
Joined: Wed May 04, 2016 11:41 am

