ShellScript: problem with German special characters

Hello

I have the following problem: Hazel does not seem to handle filenames that contain special characters (German: öüä, French: éèàç, etc.) properly when they are passed to shell scripts.

I have the following shell script on my Desktop: ~/Desktop/test.sh
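Code:
sourceIn=$1
fileIn=$(basename "$sourceIn")   # file name without the directory part
fileNameIn="${fileIn%.*}"        # file name without the extension
fileExtIn="${fileIn##*.}"        # extension only

# Debug output (the name is passed as an argument, never as the printf format string)
printf '%s\n' "$sourceIn" > ~/Desktop/_debug.txt
printf '%s\n' "$fileIn" >> ~/Desktop/_debug.txt
printf '%s\n' "$fileNameIn" >> ~/Desktop/_debug.txt
printf '%s\n' "$fileExtIn" >> ~/Desktop/_debug.txt

# Only for MKV files: write the file name as the title
if [ "$fileExtIn" = "mkv" ]; then
  mkvpropedit "$sourceIn" --edit info --set title="$fileNameIn"
fi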

The script extracts the file name (without the extension) and writes it as the title of the MKV video file. For this I use the tool MKVpropedit.
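For readers unfamiliar with the two parameter expansions the script relies on, here is a minimal sketch of what `${fileIn%.*}` and `${fileIn##*.}` return for a few sample names. The helper `split_name` and the sample names are hypothetical, chosen only for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print a name, its base name and its extension.
# ${name%.*}  removes the shortest suffix matching ".*" (the last extension).
# ${name##*.} removes the longest prefix matching "*." (everything up to the last dot).
split_name() {
  name=$1
  printf '%s | %s | %s\n' "$name" "${name%.*}" "${name##*.}"
}

for f in "Tüst.mkv" "archive.tar.gz" "README"; do
  split_name "$f"
done
# Tüst.mkv | Tüst | mkv
# archive.tar.gz | archive.tar | gz
# README | README | README
```

For a name without a dot, both expansions return the whole name, which is why the script still has to compare the extension with "mkv" before calling MKVpropedit.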

The script works perfectly when executed directly from Terminal, even if the file name contains special characters. For example:
  • sh test.sh "/Users/testuser/Desktop/Test.mkv"
  • sh test.sh "/Users/testuser/Desktop/Tüst.mkv"
  • sh test.sh "/Users/testuser/Desktop/Tést.mkv"
MKVpropedit writes Test, Tüst or Tést to the title field of the MKV video file.

However, when I call the very same script from Hazel, the script will only work if the file name does not contain a special character.

For example, I have a folder ~/Desktop/Movies and a Hazel rule that watches this Movies folder.

The rule contains only the following embedded shell script:
Code:
#!/bin/zsh
sh ~/Desktop/test.sh "$1"


If I add an MKV movie file named Test.mkv to the Movies folder, the rule/script executes without problems. However, file names like Tüst.mkv or Tést.mkv fail. In the Hazel log I see the following:
Code:
Shellscript exited with non-successful status code: 2

I guess this corresponds to exit code 2 from MKVpropedit: "2 -- This exit code is used after an error occurred." See also the MKVpropedit documentation.

Side note 1: Of course, I also tried putting the code from the (external) shell script directly into the Hazel rule, but I had the same problem with special characters as described above. I describe the external-script case here because it proves that MKVpropedit can actually write special characters to an MKV movie file. So it's not MKVpropedit; there seems to be a conversion problem when special characters are handed from Hazel to zsh.

Side note 2: The shell script also writes a "debug" text file to the Desktop. Strangely, that debug file displays the file name correctly (e.g. in the macOS built-in Quick Look) in all combinations: started from Terminal or from Hazel, with or without special characters.

Side note 3: I use the latest versions of Big Sur and Hazel.

Any help would be appreciated.

Thanks
test2000
 
Posts: 9
Joined: Mon Jan 31, 2022 7:37 am

If the filenames appear OK in the debug file, then they are being passed to the script correctly. Keep in mind that scripts run by Hazel do not run in the same environment as scripts run in Terminal. Check whether mkvpropedit needs certain environment variables set to work properly.
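One way to act on this is to have the script record the environment it actually runs in, once from Terminal and once from Hazel, and compare the two. A minimal sketch; `dump_env` is a hypothetical helper name, and the locale variables and PATH are listed first only because they are likely candidates for a difference, not because this thread confirms them:

```shell
#!/bin/sh
# Hypothetical helper: print the locale-related variables and PATH first,
# then the complete environment sorted by name.
dump_env() {
  printf 'LANG=%s\n' "${LANG-<unset>}"
  printf 'LC_ALL=%s\n' "${LC_ALL-<unset>}"
  printf 'LC_CTYPE=%s\n' "${LC_CTYPE-<unset>}"
  printf 'PATH=%s\n' "$PATH"
  printf '%s\n' '--- full environment ---'
  env | sort
}
```

For example, add `dump_env > ~/Desktop/_env_hazel.txt` to the script for a Hazel run and `dump_env > ~/Desktop/_env_terminal.txt` for a Terminal run, then compare them with `diff ~/Desktop/_env_terminal.txt ~/Desktop/_env_hazel.txt`.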
Mr_Noodle
Site Admin
 
Posts: 11865
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Thank you for your feedback.

I have read the section of the Hazel help that mentions environment variables, so I revised my script: all variables are now in double quotation marks, and mkvpropedit is called with its full path. In addition, mkvpropedit now writes its output directly to a debug text file on the Desktop (via --redirect-output). So if that debug file is created, mkvpropedit was at least started.

Code:
sourceIn="$1"
fileIn=$(basename "$sourceIn")
fileNameIn="${fileIn%.*}"
fileExtIn="${fileIn##*.}"

if [[ $fileExtIn == "mkv" ]]; then
  "/usr/local/bin/mkvpropedit" "$sourceIn" --redirect-output ~/Desktop/_debug.txt --edit info --set title="$fileNameIn"
fi

I can start this script from the terminal with the full path/filename as parameter or I can use the very same script by calling it from an embedded shell script in a Rule Action. In both cases $sourceIn will contain the full path and the filename of the MKV file that shall be changed.

Shell code of the embedded script in Hazel:
Code:
#!/bin/zsh
sh ~/Desktop/test.sh "$1"

If I start the script from Terminal, there are no errors in the debug file and the MKV title is set correctly, regardless of whether the file name contains special characters, e.g. with these two MKV files:
  • sh test.sh "/Users/testuser/Desktop/Test.mkv"
  • sh test.sh "/Users/testuser/Desktop/Tüst.mkv"
If I copy the file "Test.mkv" to the watched Hazel folder, the debug file doesn't show any errors. But when I copy "Tüst.mkv" to the Hazel folder, mkvpropedit writes the following to the debug file:
Error: The file '/Users/testuser/Desktop/Tu' is not a Matroska file or it could not be found.

mkvpropedit only receives "/Users/testuser/Desktop/Tu" instead of "/Users/testuser/Desktop/watched/Tüst.mkv". Why, and where, is the rest lost?

Any hints would be appreciated

Thanks a lot
Last edited by test2000 on Mon Feb 21, 2022 4:47 am, edited 1 time in total.
test2000

Can you echo what $sourceIn is after the fileExtIn line?
Mr_Noodle

Hello

I added the following (code lines 5 & 6):
Code:
sourceIn="$1"
fileIn=$(basename "$sourceIn")
fileNameIn="${fileIn%.*}"
fileExtIn="${fileIn##*.}"

echo "$sourceIn"
printf '%s\n' "$sourceIn" > ~/Desktop/_debug_sourceIn.txt

if [[ $fileExtIn == "mkv" ]]; then
  "/usr/local/bin/mkvpropedit" "$sourceIn" --redirect-output ~/Desktop/_debug.txt --edit info --set title="$fileNameIn"
fi

I don't know where I could see the output of "echo"; at least it doesn't appear in the Hazel log. Where else would it show up?

If I open the new file "_debug_sourceIn.txt" with the Atom editor, I see the following:
/Users/testuser/Desktop/Tüst.mkv

If I switch to UTF-8, then it is correctly displayed
/Users/testuser/Desktop/Tüst.mkv

Does Hazel pass along something that is encoded differently than expected?

Thank you

Andy
test2000

Make sure debug is turned on as described here: https://www.noodlesoft.com/kb/hazel-debug-mode/

It should appear in the log after that.

Encoding will depend on the shell environment. You may need to set that in your script.
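Following that hint, here is a minimal sketch of the script with an explicit UTF-8 locale set at the top. It assumes that the `en_US.UTF-8` locale exists (it does on a standard macOS install), that mkvpropedit takes its character set from the locale, and that mkvpropedit lives at /usr/local/bin as in the script above; whether setting the locale alone is enough is not confirmed in this thread:

```shell
#!/bin/sh
# Sketch: force a UTF-8 locale before calling mkvpropedit, because a script
# started by Hazel may not inherit the locale settings that Terminal provides.
# Assumption: the en_US.UTF-8 locale is installed on this machine.
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8

sourceIn="$1"
fileIn=$(basename "$sourceIn")
fileNameIn="${fileIn%.*}"
fileExtIn="${fileIn##*.}"

# Assumption: mkvpropedit path as in the script above; adjust if installed elsewhere.
if [ "$fileExtIn" = "mkv" ]; then
  /usr/local/bin/mkvpropedit "$sourceIn" --edit info --set title="$fileNameIn"
fi
```

Setting `LC_ALL` overrides every other locale variable, so the result does not depend on whatever Hazel happens to pass in.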
Mr_Noodle

Hazel decomposes filename strings: special characters such as ü are changed internally to u¨ (the letter u followed by the combining diaeresis U+0308)... WHY???

Test string: Test_with_ü
Code:
Composed (NFC) code points:     [84, 101, 115, 116, 95, 119, 105, 116, 104, 95, 252]
Hazel's decomposed code points: [84, 101, 115, 116, 95, 119, 105, 116, 104, 95, 117, 776]
Re-composed with NFC:           [84, 101, 115, 116, 95, 119, 105, 116, 104, 95, 252]

Note: a decomposed string may be displayed in a text editor exactly like the composed string, but the stored code points are not identical.
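The same difference can be made visible without Python by dumping the bytes. This minimal sketch builds "Tüst" in both forms from raw UTF-8 bytes (octal escapes, since POSIX printf does not guarantee `\x`) and prints them with `od`. It also shows that in the decomposed name the character right after "Tu" is the combining diaeresis (bytes cc 88), which is exactly where the path in the mkvpropedit error message earlier in this thread stops; that shows where the cut happens, not why mkvpropedit cuts there:

```shell
#!/bin/sh
# Build "Tüst" from raw UTF-8 bytes in both Unicode forms.
nfc=$(printf 'T\303\274st')    # composed: U+00FC is c3 bc in UTF-8
nfd=$(printf 'Tu\314\210st')   # decomposed: "u" (75) + U+0308 (cc 88)

# Same text on screen, different bytes.
printf '%s' "$nfc" | od -An -tx1
printf '%s' "$nfd" | od -An -tx1

if [ "$nfc" = "$nfd" ]; then
  echo "identical"
else
  echo "different byte sequences"
fi
```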

Possible shell/Python code to display the code points of a string:
Code:
# Pass the string as an argument instead of pasting it into the Python source
var1=$(python3 -c 'import sys; print([ord(c) for c in sys.argv[1]])' "$var_original")


Possible shell/Python code to re-compose a string:
Code:
# Pass the string as an argument so quotes in a file name cannot break the Python code
fileNameIn=$(python3 -c 'import sys, unicodedata; print(unicodedata.normalize("NFC", sys.argv[1]))' "$fileNameIn")
test2000

Unicode has several different ways to compose/decompose things. If you are referring to the filename that is passed in, that is the standard dictated by the filesystem. You should be ready to handle unicode strings in different forms.
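One way to be ready for both forms is to normalize any name to NFC before using it, for example as the MKV title. A minimal sketch, assuming python3 is installed and reachable through the PATH the script runs with (Hazel's PATH may differ from Terminal's); `to_nfc` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print its argument normalized to Unicode NFC.
# The string travels via argv, so quotes or backslashes in a file name
# cannot break the Python code.
to_nfc() {
  python3 -c 'import sys, unicodedata; sys.stdout.write(unicodedata.normalize("NFC", sys.argv[1]))' "$1"
}

nfd=$(printf 'Tu\314\210st')   # "Tüst" in decomposed form: u + U+0308
nfc=$(printf 'T\303\274st')    # "Tüst" in composed form: U+00FC

if [ "$(to_nfc "$nfd")" = "$nfc" ]; then
  echo "normalized to NFC"
else
  echo "normalization failed"
fi
```

In test.sh this would be used as `fileNameIn=$(to_nfc "$fileNameIn")` before calling mkvpropedit. A name that is already composed passes through unchanged, so the call is safe whichever form the file system delivers.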
Mr_Noodle

Mr_Noodle wrote: Unicode has several different ways to compose/decompose things. If you are referring to the filename that is passed in, that is the standard dictated by the filesystem. You should be ready to handle unicode strings in different forms.


I'm just wondering why you did not mention that in response to my initial post two years ago... instead, you pointed me in the misleading direction of environment variables...
Also see this post: viewtopic.php?f=4&t=14484
test2000

Because setting environment variables is a way to normalize the encodings so that it isn't an issue?
Mr_Noodle