Autofile from Hazel -> OCR (PDFPenPro) -> DevonThink

Hi,

I’ve gotten this working just fine (the tip to change from /bin/bash to /bin/sh was key for me), but my concern is that the hard-coded paths to DEVONthink Groups (folders) make these Hazel rules/scripts rather fragile in terms of moving/renaming DEVONthink Groups.

Since DEVONthink now supports persistent URLs of the form x-devonthink-item:// (moving/renaming Groups doesn’t change their URL), it would be great to update this shell script to use those instead. Sadly, my shell scripting chops aren’t up to this task.

(It also seems like this change would make specifying the DEVONthink database unnecessary, wouldn’t it?)
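
For anyone who wants to experiment with the item-link idea, here is a minimal sketch in plain /bin/sh. It is not a tested replacement for the script in this thread. It assumes that whatever follows x-devonthink-item:// in a link is the record's UUID, that DEVONthink answers to the application id "DNtp", and that its AppleScript dictionary offers "get record with uuid" and "import ... to" (check the dictionary in Script Editor for your version). The function names devonthink_uuid and import_to_group are made up for this example.

```shell
#!/bin/sh
# Sketch only: file a document into a DEVONthink group identified by its
# item link instead of a hard-coded database name and group path.

# Print the UUID part of an x-devonthink-item:// link.
# Assumption: everything after the scheme, up to an optional "?query", is the UUID.
devonthink_uuid() {
    link=$1
    case $link in
        x-devonthink-item://?*) ;;
        *) echo "not a DEVONthink item link: $link" >&2; return 1 ;;
    esac
    uuid=${link#x-devonthink-item://}   # drop the scheme
    uuid=${uuid%%\?*}                   # drop any trailing ?query
    printf '%s\n' "$uuid"
}

# Import a file into the group behind an item link (macOS only).
# Assumptions: application id "DNtp" and the commands "get record with uuid"
# and "import ... to" match your DEVONthink version's scripting dictionary.
import_to_group() {
    uuid=$(devonthink_uuid "$1") || return 1
    osascript - "$uuid" "$2" <<'EOF'
on run argv
    set theUUID to item 1 of argv
    set thePath to item 2 of argv
    tell application id "DNtp"
        set theGroup to get record with uuid theUUID
        if theGroup is missing value then error "No record with UUID " & theUUID
        import thePath to theGroup
    end tell
end run
EOF
}
```

In a Hazel "Run shell script" action, where the matched file arrives as $1, the call might look like import_to_group 'x-devonthink-item://YOUR-GROUP-UUID' "$1", with the link copied from the Group in DEVONthink. Because the Group is looked up by UUID rather than by path, moving or renaming it should not break the rule.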
tantramar
 
Posts: 16
Joined: Wed Dec 07, 2016 1:01 am
Location: New Brunswick, Canada

Actually, I am having issues with this shell script.

It seems to file items in random databases, despite having explicitly specified one.

My database names are all distinct (i.e. they’re not similarly named, so it shouldn’t be a case of the first match). Sometimes an item goes into the correct one, but this is worse than having nothing, as I am having to hunt down and manually correct mis-filed items.

Anyone else experiencing this? Is it a case of the “open” database grabbing the incoming file despite the script telling it to open and use a different one? Is it related to DEVONthink Pro Office’s Import Destination settings (mine is set to “Global inbox”) overriding the script?
tantramar
 
Posts: 16
Joined: Wed Dec 07, 2016 1:01 am
Location: New Brunswick, Canada

I just stumbled on this while researching a bug, and it is very near to what I do, but without shell scripting. I use Mac Automator workflows instead. As DEVONthink provide Automator actions, this might be easier for those less enamoured with Unix scripting.

There are two options for the DEVONthink import. The first: DT provide an import script (Import and Delete.scpt or something like that) which sucks the file into the default inbox. This is set up with a bog standard OS X folder action (right-click folder -> Services -> Folder Actions Setup), and Hazel dumps unknown stuff into that folder after OCR.

The second, for stuff that Hazel is set up to recognise: use an OS X Automator workflow for each DT folder you want to import into. You can then run these directly from Hazel. The Automator workflows all have three steps, and the file ends up in the right folder. I also have a bunch of these running on my downloads folder for statements and the like.

Image
MikeP
 
Posts: 6
Joined: Sat Feb 28, 2015 10:00 am

MikeP wrote: I just stumbled on this while researching a bug, and it is very near to what I do, but without shell scripting. I use Mac Automator workflows instead. As DEVONthink provide Automator actions, this might be easier for those less enamoured with Unix scripting.

Hi MikeP,

An interesting possibility, since I also always have problems with the script as soon as I want to add new filenames and paths.

However, I always get an error message in the last step:

"Add items to Current Group" failed
     The "Add items to Current Group" action did not have the required data
         No items to input

Do Automator or DEVONthink still need any permissions or access approvals?

Did you have that too?

Maybe you have a solution.

Best regards
Markus
AbwehrchefVC
 
Posts: 16
Joined: Sun Jan 08, 2017 9:46 am

I've just bought DEVONthink, so I am about to set this up in a very similar manner. However, there is (as always) another way of doing this, and some may find it neater as it both checks whether OCR is needed and closes PDFpenPro once it's completed.

The code is courtesy of Greg Scown, one of the PDFpen developers.

Code:
tell application "PDFpenPro"
    open theFile as alias
    -- does the document need to be OCR'd?
    get the needs ocr of document 1
    if result is true then
        tell document 1
            ocr
            repeat while performing ocr
                delay 1
            end repeat
            delay 1
            close with saving
        end tell
        --In PDFpen, when no documents are open, window 1 is "Preferences"
        --If other documents are open, do not close the App.
        if name of window 1 is "Preferences" then
            tell application "PDFpenPro"
                quit
            end tell
        end if
    else
        -- Scan Doc was previously OCR'd or is already a text type PDF.
        tell document 1
            close without saving
        end tell
        --In PDFpen, when no documents are open, window 1 is "Preferences"
        --If other documents are open, do not close the App.
        if name of window 1 is "Preferences" then
            tell application "PDFpenPro"
                quit
            end tell
        end if
    end if
end tell
evansgo
 
Posts: 2
Joined: Sun May 15, 2016 7:14 am

I do something like this, inspired by MacSparky. OCR happens with the ScanSnap X1500, and the files then end up in a folder structure that is filled by the Hazel scripting.

I have seen suggestions for using DEVONthink or Evernote, but I don't understand why. What is the motivation for using one of these apps vs. just using the file system and the Finder?
bmhardy
 
Posts: 7
Joined: Fri Feb 17, 2017 7:30 pm

bmhardy wrote: I have seen suggestions for using DEVONthink or Evernote, but I don't understand why. What is the motivation for using one of these apps vs. just using the file system and the Finder?

I realise this is way too late for the original poster, but just in case someone else has the same question, here is a sort of answer.

DEVONthink is a heavyweight "storage and retrieval solution" that is really aimed at professional researchers who need to be able to put a lot of disparate files together, categorise them, and -- crucially -- search for and find related material. You can conduct searches that, for example, will look for a specific, single word occurring NEXT to another specific word, NEAR another word, or even within a certain number of words of another one (you can specify the max space between the words -- 10, 32, 4, whatever you like). Search is extremely fast. Using one of the special search terms, I can search a database containing over 4 million words and get a list of hits in less than half a second. The hits are ranked according to relevance, and you are given a kind of "See Also" list that shows related material. The program can also auto-group files with similar content into folders of related material if you don't want to do it by hand. However, it can only do that effectively if you have already done some work on sorting things by hand, so that it can tell what the common elements are. (This is rather a crude summary of a complex program.)

It is not something that most people need, but if, like me, you have notes and research material collected over 25 years, on psychology, history, and miscellaneous, it can be a great help in bringing order to the chaos.
mbbcam
 
Posts: 2
Joined: Wed Jan 16, 2013 3:04 pm

Hi,

I'm just getting started with Hazel and DT3. Right now, I'm playing with a trial of PDFpen and have put in the 'script' below...
The rule runs, removes the old tag, and adds the new tag to the file(s), but it entirely ignores the PDFpen script?!

Does anyone have an idea? Could this be the trial version of PDFpen not letting these scripts run because of the splash screen shown when the trial first launches?

Shouldn't Hazel stop before adding the "readable" tag if the script fails?

https://www.dropbox.com/s/tgkrrhgwnm6me ... l.pdf?dl=0

tell application "PDFpen"
   open theFile as alias
   tell document 1
      ocr
      repeat while performing ocr
         delay 1
      end repeat
      delay 1
      close with saving
   end tell
end tell




Cassady wrote: Wow!

Very useful - many thanks!

Bought Hazel 4 years ago to convert all my un-OCR'ed files into readable ones, courtesy of PDFpen, which (2,400 PDFs later) paid for itself ten times over with the time it saved me.

Only stumbled across DTPO several months after that - and I've never looked back. But anyone using DTPO currently, and needing to do this in batch - the above will be an absolute lifesaver/life-changer! :D
supermuls
 
Posts: 1
Joined: Sat Jul 25, 2020 3:21 pm
