Python script failing

Moderator: Mr_Noodle

Python script failing Mon Nov 04, 2024 5:38 pm • by RiseUp
Hey guys,

I have built a Python script that calls OpenAI to help me rename PDF files after OCR-ing them. The script works like a charm in Terminal, but does absolutely nothing when I run it from Hazel with /opt/homebrew/bin/python3 /path/to/my/script.py "$1". I tested other Python scripts to check that Hazel can use my Python installation in general, and those worked. The debugger mentioned some issue with tesseract, but I couldn't find out what exactly. So something must be wrong with my script or with how Hazel handles it. Any ideas?

EDIT: I had similar issues with Automator, but solved them by adding the "export PATH=[…]" command right before the script execution command. However, this does not seem to resolve it in Hazel. So the script now works in Terminal and in Automator, but NOT in Hazel. What's the issue?
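
The same PATH adjustment can also be made from inside Python, before any library shells out to an external tool. The following is only a sketch under stated assumptions: prepend_to_path is a made-up helper name, and /opt/homebrew/bin is assumed to be the directory that holds the Homebrew-installed tools on this machine.

```python
import os


def prepend_to_path(directories, env=None):
    """Put the given directories at the front of PATH, skipping ones already present."""
    # env defaults to the real process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    current = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    missing = [d for d in directories if d not in current]
    env["PATH"] = os.pathsep.join(missing + current)
    return env["PATH"]
```

Calling prepend_to_path(["/opt/homebrew/bin"]) near the top of the script would have roughly the effect of the export line, but only for this process and its child processes. It does nothing for a failure that occurs while PATH is already correct.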

The script:

import os
import sys
from openai import OpenAI
from PyPDF2 import PdfReader
from pdf2image import convert_from_path
import pytesseract
import datetime

# Create the OpenAI client instance
client = OpenAI(api_key='MYKEY')

# Check that a file path was passed as an argument
if len(sys.argv) < 2:
    print("Fehler: Kein Dateipfad übergeben.")  # "Error: no file path given."
    sys.exit(1)

# File path received from Hazel
file_path = sys.argv[1]

# Extract the OCR text from a PDF file
def extract_text_from_pdf(file_path):
    pages = convert_from_path(file_path, 300)  # render each page at 300 dpi
    full_text = ""
    for page in pages:
        text = pytesseract.image_to_string(page, lang='deu')
        full_text += text + "\n"
    return full_text

# Analyse the text with OpenAI and derive the file name
def analyze_and_generate_filename(text):
    # The German prompt asks for the sender (max. 3 words), the subject (max. 5 words)
    # and the document type ('Rechnung' = invoice, 'Dokument' = document)
    completion = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=f"Analysiere den folgenden Text und gib den Absender (max. 3 Wörter), den Betreff (max. 5 Wörter) und den Dokumenttyp ('Rechnung' oder 'Dokument') an:\n{text}",
        max_tokens=150,
        temperature=0.5
    )

    # Extract the response content and split it into lines
    content = completion.choices[0].text.strip().split("\n")

    # Fallbacks in case the response is not shaped as expected
    sender = (content[0] if len(content) > 0 else "Unbekannt").split()[:3]  # max. 3 words
    subject = (content[1] if len(content) > 1 else "Allgemein").split()[:5]  # max. 5 words
    doc_type = content[2] if len(content) > 2 else "Dokument"

    # Join the word lists once, before the loop: re-joining an already joined
    # string on every pass would insert a space between each character
    sender = " ".join(sender)
    subject = " ".join(subject)

    # Replace invalid characters such as slashes
    for char in ['/', '\\', ':', '*', '?', '"', '<', '>', '|']:
        sender = sender.replace(char, "-")
        subject = subject.replace(char, "-")
        doc_type = doc_type.replace(char, "-")

    # Capitalise every word in the file name (German-style capitalisation)
    sender = sender.title()
    subject = subject.title()
    doc_type = doc_type.title()

    # Current date as prefix
    date_prefix = datetime.datetime.now().strftime("%y%m%d")

    # Decide whether this is a company ("BEV") or a private document
    document_type = "BEV" if "BEV" in text else "Privat"

    # Build the file name
    file_name = f"{date_prefix}_{document_type}_{doc_type}_{sender}_{subject}.pdf"
    return file_name

# Extract the text from the PDF
text = extract_text_from_pdf(file_path)
new_filename = analyze_and_generate_filename(text)

# Rename the file
directory = os.path.dirname(file_path)
new_file_path = os.path.join(directory, new_filename)
os.rename(file_path, new_file_path)

print(f"Datei umbenannt: {file_path} -> {new_filename}")  # "File renamed: old -> new"
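
The clean-up of the file name parts can be pulled into a small helper, so that this step can be checked in Terminal without involving Hazel, OCR, or the OpenAI call. This is a sketch rather than the poster's code: clean_part and INVALID_CHARS are illustrative names, and the character list is the one used in the script. The words are joined once before the replacement loop, because calling " ".join on an already joined string would put a space between every character.

```python
INVALID_CHARS = '/\\:*?"<>|'  # same characters the script replaces


def clean_part(words, max_words):
    """Join at most max_words words and replace filename-hostile characters with '-'."""
    joined = " ".join(words[:max_words])
    for char in INVALID_CHARS:
        joined = joined.replace(char, "-")
    return joined.strip()
```

For example, clean_part("ACME GmbH/Co KG Berlin".split(), 3) keeps the first three words and turns the slash into a hyphen.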
RiseUp
 
Posts: 6
Joined: Mon Nov 04, 2019 2:23 pm

Re: Python script failing Tue Nov 05, 2024 9:05 am • by RiseUp
Latest log:

2024-11-05 14:00:38.364 hazelworker[19426] DEBUG: == script output ==
/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/homebrew/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
Traceback (most recent call last):
  File "/Users/patricedeckert/Library/CloudStorage/GoogleDrive-myname@gmail.com/My Drive/1 MY FILES/1 Private/Scripts/pdf_rename_script.py", line 94, in <module>
    text = extract_text_from_pdf(file_path)
  File "/Users/patricedeckert/Library/CloudStorage/GoogleDrive-myname@gmail.com/My Drive/1 MY FILES/1 Private/Scripts/pdf_rename_script.py", line 37, in extract_text_from_pdf
    text = pytesseract.image_to_string(page, lang='deu+eng')
  File "/opt/homebrew/lib/python3.13/site-packages/pytesseract/pytesseract.py", line 486, in image_to_string
    return {
           ~
    ...<2 lines>...
        Output.STRING: lambda: run_and_get_output(*args),
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    }[output_type]()
    ~~~~~~~~~~~~~~^^
  File "/opt/homebrew/lib/python3.13/site-packages/pytesseract/pytesseract.py", line 489, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
                           ~~~~~~~~~~~~~~~~~~^^^^^^^
  File "/opt/homebrew/lib/python3.13/site-packages/pytesseract/pytesseract.py", line 352, in run_and_get_output
    run_tesseract(**kwargs)
    ~~~~~~~~~~~~~^^^^^^^^^^
  File "/opt/homebrew/lib/python3.13/site-packages/pytesseract/pytesseract.py", line 284, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error in fopenReadStream: failed to open locally with tail tess_6jkqkpg3_input.PPM for filename /tmp/tess_6jkqkpg3_input.PPM Leptonica Error in findFileFormat: image file not found: /tmp/tess_6jkqkpg3_input.PPM Error in fopenReadStream: failed to open locally with tail P6 for filename P6 Leptonica Error in pixRead: image file not found: P6 Image file P6 cannot be read! Error during processing.')

== End script output ==

Re: Python script failing Tue Nov 05, 2024 10:19 am • by Mr_Noodle
There is probably some other environment variable that needs to be set. This is a bit outside of the support I can give, but you'll need to dig deeper into Tesseract and analyze the error you are getting.
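
One way to look for such a difference is to print the relevant settings once from Terminal and once from Hazel, using the same interpreter, and compare the two outputs. The sketch below is an assumption about what is worth comparing, not a confirmed diagnosis, and environment_report is a made-up name. It includes the temp directory because the traceback shows pytesseract's temporary image under /tmp, and Python's tempfile module only falls back to directories such as /tmp when TMPDIR, TEMP and TMP are all unset.

```python
import os
import shutil
import tempfile


def environment_report(tool="tesseract"):
    """Collect settings that may differ between Terminal and Hazel."""
    return {
        "cwd": os.getcwd(),
        "PATH": os.environ.get("PATH"),
        "TMPDIR": os.environ.get("TMPDIR"),  # None means the variable is unset
        "tempdir": tempfile.gettempdir(),    # where Python will create temp files
        "tool_path": shutil.which(tool),     # None means the tool is not on PATH
        "TESSDATA_PREFIX": os.environ.get("TESSDATA_PREFIX"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If TMPDIR or tempdir differs between the two runs, setting TMPDIR in the Hazel shell script before calling Python is one thing to try. Whether that resolves this particular error is untested here.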
Mr_Noodle
Site Admin
 
Posts: 11643
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: Python script failing Mon Nov 11, 2024 5:58 am • by RiseUp
For everyone trying this or something similar in the future: I couldn't, for the life of me, get it to run in Hazel directly. What did work was saving it as an Automator application and then launching that application from a shell script with open -W -a