Shell script for ocrmypdf stopped working

I have a Hazel rule that recently stopped working. This may coincide with a recent Hazel update, or it may be related to an update to one of the command-line tools the rule depends on. The rule watches a folder, and when a new file is added, runs an embedded shell script that:
1. Runs the CLI program "ocrmypdf" on the file
2. Outputs the resulting OCR'd file to a different directory
3. Tags the resulting file with another CLI program, "tag"

Here is the embedded script (run from ZSH):

Code:
    PATH=$PATH:/opt/homebrew/bin
    export PATH

    filename=$(basename "$1")
    filename=${filename%.*}
    converting_directory=~/Documents/processed/
    converting_filename="$filename.pdf"

    if ocrmypdf --clean --rotate-pages "$1" "$converting_directory""$converting_filename"
    then
        tag -a "pdf/a,scan" "$converting_directory""$converting_filename"
        rm $1
        echo "OCR succeeded"
    else
        echo "OCR failed"
    fi

When I run this script on a PDF outside of Hazel, it works. When it is triggered via Hazel, however, it now appears to fail on the "ocrmypdf" command. The logs show that matching happens as expected and that the script definitely gets called, but the OCR step fails.

This is the relevant bit from the logs

Code:
    2024-01-07 17:01:24.037 hazelworker[49612] DEBUG: About to process directory /Users/sam/Documents/inbox/hazel-scan-ocr
    2024-01-07 17:01:24.042 hazelworker[49612] 20240107_151957.pdf: Rule Process Scans matched.
    2024-01-07 17:01:24.042 hazelworker[49612] DEBUG: New rule signature. Executing actions. Old signatures: ( )
    New Signature:{dateAdded >[cd] dateMatched}:{(shellscript:/opt/homebrew/bin/zsh:0012fb12ac317454ed0977ff22fc908e,{ })}
    2024-01-07 17:01:24.426 hazelworker[49612] DEBUG: == script output ==
    1 [tesseract] Leptonica Error in fopenReadStream: file not found: 000001_rasterize_preview.jpg
    1 [tesseract] Leptonica Error in findFileFormat: image file not found: /tmp/ocrmypdf.io.mvzd3qav/000001_rasterize_preview.jpg
    1 [tesseract] Leptonica Error in fopenReadStream: file not found:
    1 [tesseract] Leptonica Error in pixRead: image file not found:
    1 [tesseract] Image file cannot be read!
    1 [tesseract] Error during processing.
    SubprocessOutputError
    OCR failed
    == End script output ==

It looks like the ocrmypdf program isn't finding the source file, but I'm not sure what the problem is or why it was working previously but stopped. Hoping someone with more scripting knowledge can point me in the right direction.

OS X Version 13.6.3 (Build 22G436)
Hazel Version 5.3.1 (Build 2371)

You should try printing out different things (like the arguments passed in) to make sure everything is as you expect. It should show up in the logs.

When I print the arguments I don't see anything unexpected. I've tried reducing the script to a simpler form

Code:
    PATH=$PATH:/opt/homebrew/bin
    export PATH

    filename=$(basename "$1")
    converting_directory=~/Documents/processed/

    echo $converting_directory # /Users/sam/Documents/processed/
    echo $filename # test.pdf
    echo "$1" # /Users/sam/Documents/inbox/hazel-scan-ocr/test.pdf

    ocrmypdf --no-progress-bar "$1" "$converting_directory""$filename"

I also tried recreating the script in an external file and calling it from Hazel, but both attempts result in the same error.

Code:
    New Signature:{dateAdded >[cd] dateMatched}:{(shellscript:/opt/homebrew/bin/zsh:cba2c281810ecaa9d147a8c2dd7d34ad,{ })}
    2024-01-08 12:52:32.230 hazelworker[60993] DEBUG: == script output ==
    /Users/sam/Documents/processed/
    test.pdf
    /Users/sam/Documents/inbox/hazel-scan-ocr/test.pdf
    1 [tesseract] Leptonica Error in fopenReadStream: file not found: 000001_ocr.png
    1 [tesseract] Leptonica Error in findFileFormat: image file not found: /tmp/ocrmypdf.io.u1oblfdh/000001_ocr.png
    1 [tesseract] Leptonica Error in fopenReadStream: file not found: PNG
    1 [tesseract] Leptonica Error in pixRead: image file not found: PNG
    1 [tesseract] Image file PNG cannot be read!
    1 [tesseract] Error during processing.
    SubprocessOutputError
    == End script output ==

The fact that I can call the script directly from the terminal and it runs without error makes me think there's a pathing issue, but I'm not sure what else to try. In Hazel I have the shell set to /opt/homebrew/bin/zsh which is the output of $ which zsh.

What is 000001_ocr.png? That is the file the program is complaining about.

Note that running things in Terminal is a different environment from running them in Hazel. You cannot assume any env vars are set and need to set those in your script.
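One way to see the difference between the two environments is to print the relevant variables from inside the script and compare the result with what Terminal shows. The following is only a rough sketch: the function name report_env is made up for illustration, and the list of tools (ocrmypdf, tesseract, gs, tag) is an assumption based on the programs mentioned in this thread.

```shell
# Hypothetical helper: print the environment the script actually runs in.
# The output should end up in Hazel's log between the
# "== script output ==" markers, like the echo lines earlier in the thread.
report_env() {
  echo "PATH: ${PATH:-<unset>}"
  echo "TMPDIR: ${TMPDIR:-<unset>}"
  echo "HOME: ${HOME:-<unset>}"
  echo "PWD: $(pwd)"

  # Check whether each external tool resolves on the current PATH.
  for tool in ocrmypdf tesseract gs tag; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: $(command -v "$tool")"
    else
      echo "$tool: NOT FOUND"
    fi
  done
}

report_env
```

Running the same snippet once in Terminal and once from the Hazel rule should make any missing variable stand out when the two outputs are compared.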

000001_ocr.png is a temporary file created by Ghostscript as part of the OCR process. When I ran this issue by the OCRmyPDF creator, their response was:

My guess would be that macOS or Hazel is virtualizing /tmp in some way.
The file that is missing is supposed to be created by Ghostscript. (The gs -dQUIET command above tesseract.) That command completes without error, but the expected file is not created, so it's putting it somewhere else.

You could try setting an environment variable TMPDIR to redirect it to somewhere like /Users/you/tmp since I think that is less likely to be defeated.

After explicitly setting the TMPDIR environment variable as suggested, the script is working again. Here is the working version

Code:
    PATH=$PATH:/opt/homebrew/bin
    export PATH
    export TMPDIR="/Users/sam/tmp/"

    filename=$(basename "$1")
    converting_directory=~/Documents/processed/

    if ocrmypdf --clean --rotate-pages --no-progress-bar -v 1 "$1" "$converting_directory""$filename"
    then
        tag -a "pdf/a,scan" "$converting_directory""$filename"
        rm $1
        echo "OCR succeeded"
    else
        echo "OCR failed"
    fi

So at this point I guess my remaining questions are
1. Is this an appropriate solution for when Hazel needs to know about TMPDIR, and
2. What changed to make this solution necessary? How was Hazel able to access TMPDIR in the past?

That would be the way to work around it. Is TMPDIR set when in Terminal?

Yes, when I echo $TMPDIR in Terminal it returns "/var/folders/blah/blah", but when I echo $TMPDIR in the Hazel embedded script it returns nothing.

So that would be a case of something you have to explicitly define in Hazel's environment.
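A slightly more defensive variant of that idea, as a sketch only: keep whatever TMPDIR the environment already provides, and fall back to a directory under the home folder when it is empty or unset. The fallback location $HOME/tmp is an assumption modeled on the path used earlier in the thread; any writable directory should do.

```shell
# Keep an inherited TMPDIR if there is one; otherwise fall back to
# $HOME/tmp (assumed location, adjust to taste).
: "${TMPDIR:=$HOME/tmp}"
export TMPDIR

# Make sure the directory exists before any tool tries to write
# intermediate files into it.
mkdir -p "$TMPDIR"

echo "Using TMPDIR=$TMPDIR"
```

Placed near the top of the embedded script, before the ocrmypdf call, this leaves the Terminal behaviour unchanged (TMPDIR is already set there) and only fills in a value when the script runs without one.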

Makes sense! Not sure how it was working before, but happy to have a solution. Thank you for the guidance.