Autofile from Hazel -> OCR (PDFPenPro) -> DevonThink

I've looked around and couldn't find a comprehensive way of taking a PDF, OCRing it, renaming it, and then auto-filing it into DevonTHINKPro. There are other postings that tell you how to rename a PDF to make it easier to import into DevonTHINK. But I could not find one that automatically took a file and put the PDF in the correct Group in DevonTHINK.
So, I built the following solution. To use, you need to have a copy of Hazel, PDFPenPro and DevonTHINKPro... plus the ability to edit a shell script.
Step 1
Create a directory called HazelAutomation in your home folder, and add these four subfolders under it:
HazelAutomation
    1.HazelOCR
    2.HazelSort
    3.HazelProcessed
    4.HazelBackup
To test, open a terminal session and type "cd ~/HazelAutomation" and it should go to the top-level directory that is needed. You should see only the four directories listed above in this folder.
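If you would rather create the folders from the terminal, here is a minimal sketch. The function name create_hazel_dirs is my own invention for illustration. The fourth folder is spelled 4.HazelBackup because that is the name the Step 4 script moves files into.

```shell
#!/bin/bash
# create_hazel_dirs [BASE]: create the HazelAutomation folder tree under BASE.
# BASE defaults to your home folder; the function name is hypothetical.
create_hazel_dirs() {
    local base="${1:-$HOME}"
    # -p creates the parent folder too and does not fail if a folder already exists
    mkdir -p "$base/HazelAutomation/1.HazelOCR" \
             "$base/HazelAutomation/2.HazelSort" \
             "$base/HazelAutomation/3.HazelProcessed" \
             "$base/HazelAutomation/4.HazelBackup"
}
```

Paste the function into a terminal, then run "create_hazel_dirs" followed by "ls ~/HazelAutomation" to confirm the four folders are there.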
Step 2
Set up Hazel to monitor the directory "1.HazelOCR". For any PDF file, it OCRs it (if needed) and then sends the file to the next step ("2.HazelSort").
We have two rules: one OCRs the PDFs that need it, and the other just moves PDFs to the next step.
Rule 1: OCR Files, if needed - move to 2.HazelSort
Here is the code in each embedded script:
The first checks to see if the file needs to be OCRd:
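Code:
# Exits 0 when no "Font" string is found (the PDF has no text layer and needs OCR), 1 otherwise.
# -q keeps grep from printing the matching binary data.
if ! grep -q Font "$1"
then
    exit 0
else
    exit 1
fi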
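To try the same check outside Hazel before you trust it with real scans, something like the following sketch can report which PDFs in a folder would be sent to OCR. The function names needs_ocr and report_ocr_status are made up for this example; the test itself is the same "does the file contain the word Font" heuristic as above.

```shell
#!/bin/bash
# needs_ocr FILE: succeeds (status 0) when FILE contains no "Font" string.
# Hypothetical helper; same heuristic as the Hazel condition script.
needs_ocr() {
    ! grep -q Font "$1"
}

# report_ocr_status DIR: print one line per PDF found directly in DIR.
report_ocr_status() {
    local f
    for f in "$1"/*.pdf; do
        [ -e "$f" ] || continue   # the glob matched nothing
        if needs_ocr "$f"; then
            echo "needs OCR: $(basename "$f")"
        else
            echo "has text:  $(basename "$f")"
        fi
    done
}
```

For example, "report_ocr_status ~/HazelAutomation/1.HazelOCR" lists every PDF waiting in the first folder along with the verdict.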
The second script OCRs the file. I use PDFPenPro to do this:
Code:
tell application "PDFpenPro"
    open theFile as alias
    tell document 1
        ocr
        -- wait until PDFpenPro has finished the OCR
        repeat while performing ocr
            delay 1
        end repeat
        delay 1
        close with saving
    end tell
end tell
The second rule moves all remaining PDFs to "2.HazelSort":
Now all your files are in the "2.HazelSort" folder. Here is where the hard work occurs!
Step 3 - auto-rename and auto-file the PDF
Set up Hazel to watch the "2.HazelSort" folder. This is the hardest step, because you have to analyze each type of PDF to work out where to find the date and the name of the bill. The key for each rule is that it moves a matched file to "3.HazelProcessed". I always rename the files to "YYYY-MM-DD-billname.pdf".
Here is a sample rule:
You can find many instructions in this forum on how to do the above. Again, you need to set up one rule per type of PDF.
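Since Step 4 depends on the file names produced here, it can help to check that a renamed file really follows the "YYYY-MM-DD-billname.pdf" pattern. This is only a sketch; the function name follows_convention is hypothetical and it checks the shape of the name, not whether the date is a real calendar date.

```shell
#!/bin/bash
# follows_convention NAME: succeeds when the file name looks like
# YYYY-MM-DD-billname.pdf. Hypothetical helper for illustration.
follows_convention() {
    # Keeping the pattern in a variable avoids quoting surprises inside [[ ]]
    local re='^[0-9]{4}-[0-9]{2}-[0-9]{2}-.+\.pdf$'
    [[ "$(basename "$1")" =~ $re ]]
}
```

For example, follows_convention "2013-04-01-Phone.pdf" succeeds, while follows_convention "Phone-2013-04-01.pdf" fails.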
Step 4 - autofile to DevonThink
This is the magic step!
Set up Hazel to watch "3.HazelProcessed".
This script will look at the name of the file (created in Step 3) and auto-file it into the correct group in DevonTHINK. Any file that it cannot auto-file is copied to the DevonTHINK global Inbox. Each time the script runs, it writes a log to ~/HazelAutomation/log.txt that you can check to see what it did.
Here is the content of the script.
Code:
#!/bin/bash
# This script moves files to DevonThink folders automatically
# based on the $Matches variable.
#
# It looks for a match in the PDF File Name, and moves the file to
# the corresponding DevonTHINK Folder Name.
#
# If it does not find a match, it copies the file to the Inbox.
#
# In either case, it moves the file to the Backup folder
#
# use this pattern to match files: PDFFileName|DevonTHINKFolderName
Matches="Cleaners|/Utilities/Cleaners
Phone|/Utilities/Phone
Bank|/Finance/Bank
Cable|/Utilities/Cable
Community Giving|/Charity/Community Giving
Paystub|/Employment/My Employer"
Log=~/HazelAutomation/log.txt
# Split $Matches into one array element per line
IFS=$'\n' read -rd '' -a MatchNames <<<"$Matches"
# Absolute path of the file that Hazel passed in as $1
FullPath="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
date > "$Log"   # start a fresh log on every run
echo "Looking in [$1] for:" >> "$Log"
j=0
for Entry in "${MatchNames[@]}"
do
    MatchString=${Entry%%|*}   # the part before the "|"
    FolderName=${Entry#*|}     # the part after the "|"
    echo "$j [$MatchString]" >> "$Log"
    if echo "$1" | grep -q -- "$MatchString" ; then
        echo "Found:[$MatchString] in [$1] moving to [$FolderName]" >> "$Log"
        # The group and the file path are passed as arguments (argv), so
        # spaces or quotes in them cannot break the AppleScript
        osascript \
            -e 'on run argv' \
            -e 'tell application id "com.devon-technologies.thinkpro2"' \
            -e 'launch' \
            -e 'set theDatabase to open database "PATH_TO_DEVONTHINK_DB/DevonTHINK/MyScans.dtBase2"' \
            -e 'set theGroup to create location (item 1 of argv) in theDatabase' \
            -e 'import (item 2 of argv) to theGroup' \
            -e 'end tell' \
            -e 'end run' \
            "$FolderName" "$FullPath" >> "$Log" 2>&1
        mv "$FullPath" ~/HazelAutomation/4.HazelBackup/.
        exit 0
    fi
    j=$((j+1))
done
echo "No match found for $1" >> "$Log"
cp "$FullPath" "/Users/USERNAME/Library/Application Support/DEVONthink Pro 2/Inbox/."
mv "$FullPath" ~/HazelAutomation/4.HazelBackup/.
exit 0
Note that you need to edit the file a bit to match your setup
Find the variable Matches.
Code:
Matches="Cleaners|/Utilities/Cleaners
Phone|/Utilities/Phone
Bank|/Finance/Bank
Cable|/Utilities/Cable
Community Giving|/Charity/Community Giving
Paystub|/Employment/My Employer"
You need to replace the content of this variable to match how you set up your DevonTHINK database. The variable has one line per kind of file you want to auto-file.
The first part (before the "|") is the text that should match the name of the PDF file. It should be the same name that you used in Step 3 above.
The second part (after the "|") is the name of the DevonTHINK group you want the file to be sorted into. Note that you can have sub-groups, and spaces in the name are OK.
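To check your table before letting it loose on DevonTHINK, a dry run along these lines prints the group a given file name would land in. It is a sketch only: group_for is a hypothetical name, the two entries are sample data in the same format, and it uses a plain substring test where the real script uses grep.

```shell
#!/bin/bash
# Sample table in the same "match string|group" format as the Step 4 script.
Matches="Phone|/Utilities/Phone
Community Giving|/Charity/Community Giving"

# group_for NAME: print the group of the first entry whose match string
# occurs in NAME, or "Inbox" when nothing matches. Hypothetical helper.
group_for() {
    local entry
    while IFS= read -r entry; do
        # ${entry%%|*} is the text before the "|", ${entry#*|} the text after it
        if [[ "$1" == *"${entry%%|*}"* ]]; then
            echo "${entry#*|}"
            return 0
        fi
    done <<<"$Matches"
    echo "Inbox"
}
```

With the sample table, group_for "2013-04-01-Phone.pdf" prints /Utilities/Phone, and a name that matches no entry prints Inbox.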
Next, find the line that contains:
Code:
set theDatabase to open database "PATH_TO_DEVONTHINK_DB/DevonTHINK/MyScans.dtBase2"
Change "PATH_TO_DEVONTHINK_DB/DevonTHINK/MyScans.dtBase2" to the location and real name of your DevonThink database. Don't use the "~" shorthand (e.g. "~/Document"); use the full path instead (e.g. "/Users/username/Document").
Last, find this line:
Code:
cp "$FullPath" "/Users/USERNAME/Library/Application Support/DEVONthink Pro 2/Inbox/."
Change "/Users/USERNAME/Library/Application Support/DEVONthink Pro 2/Inbox" to be the path to your DevonThink Inbox.
That's it!