OCR and ocrmypdf embedded script

Posted: Mon Oct 26, 2020 7:00 am
by bongobong
Hi,
New to scripting and Python, but I figured with a bit of help I could get this to work.

I'm trying to use ocrmypdf, which I've installed via Homebrew.

Thus far I've had no success; here is where I am currently.

Shell: /bin/bash

Code:
ocrmypdf --rotate-pages --deskew input.pdf output.pdf


However, /bin/bash says to use $1 in place of the file name, which is confusing to me since ocrmypdf demands that I have input.pdf output.pdf in the code.

Any help would be greatly appreciated.

Re: OCR and ocrmypdf embedded script

Posted: Mon Oct 26, 2020 10:27 am
by Mr_Noodle
I don't think you need to have input.pdf and output.pdf. You should be able to specify whatever paths you want there; otherwise, it would just hardcode those and not ask you to type them in. I don't know whether that program allows you to specify the same file as input and output, though, so you should look into that. Also, you should specify the full path for the ocrmypdf program.

Re: OCR and ocrmypdf embedded script

Posted: Mon Oct 26, 2020 10:50 am
by bongobong
Mr_Noodle wrote: I don't think you need to have input.pdf and output.pdf. You should be able to specify whatever paths you want there; otherwise, it would just hardcode those and not ask you to type them in. I don't know whether that program allows you to specify the same file as input and output, though, so you should look into that. Also, you should specify the full path for the ocrmypdf program.


Hmm, yeah. It's a terminal package, though, so I don't actually know whether it has a path. I thought I could just invoke it.

Re: OCR and ocrmypdf embedded script

Posted: Mon Oct 26, 2020 4:34 pm
by Mr_Noodle
Every program has to have a path of some sort. Try doing "which ocrmypdf" in Terminal.
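
For anyone following along, here is a minimal sketch of doing that lookup from inside a script rather than by hand. It uses `command -v`, the shell builtin counterpart to `which`. The helper name `resolve_tool` is made up for illustration, and note that for shell builtins or functions `command -v` prints just the name rather than a file path.

```shell
#!/bin/bash
# Print where a command lives, or fail with a message on stderr.
# resolve_tool is a hypothetical helper name, not part of any tool.
resolve_tool() {
    local path
    if path="$(command -v "$1")"; then
        printf '%s\n' "$path"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Same question as typing "which ocrmypdf" in Terminal.
resolve_tool ocrmypdf || true
```

If this prints nothing but the "not found" message, the program is either not installed or sits in a directory the script's shell does not search.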

Re: OCR and ocrmypdf embedded script

Posted: Wed Oct 28, 2020 7:45 am
by bongobong
It's /usr/local/bin/ocrmypdf

Re: OCR and ocrmypdf embedded script

Posted: Wed Oct 28, 2020 10:11 am
by Mr_Noodle
You should use that path in the script. Use "$1" for the input file. Again, I don't know if the program allows the input and output files to be the same so you'll have to do some research on that part.
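
Putting those two pieces together, a cautious sketch might look like the following. It uses the full path reported above and "$1" for the input, and it sidesteps the same-file question by writing to a scratch file first and only replacing the original if the OCR step succeeds. The `OCRMYPDF` variable, the `ocr_in_place` function, and the scratch-file approach are assumptions for illustration, not something ocrmypdf requires.

```shell
#!/bin/bash
# Full path as reported by "which ocrmypdf" in this thread; adjust for your machine.
# OCRMYPDF and ocr_in_place are hypothetical names used only in this sketch.
OCRMYPDF="${OCRMYPDF:-/usr/local/bin/ocrmypdf}"

ocr_in_place() {
    local input="$1" workdir status=0
    # Write the result to a scratch directory so a failed run never touches the original.
    workdir="$(mktemp -d)" || return 1
    if "$OCRMYPDF" --rotate-pages --deskew "$input" "$workdir/out.pdf"; then
        mv "$workdir/out.pdf" "$input" || status=1
    else
        status=1
    fi
    rm -rf "$workdir"
    return "$status"
}

# "$1" is the file handed to the script; quoting it keeps paths with spaces intact.
if [ "$#" -gt 0 ]; then
    ocr_in_place "$1"
fi
```

The scratch file costs an extra move, but it means an interrupted or failed OCR run leaves the original scan as it was.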

Re: OCR and ocrmypdf embedded script

Posted: Sat Oct 31, 2020 5:56 am
by bongobong
Mr_Noodle wrote:You should use that path in the script. Use "$1" for the input file. Again, I don't know if the program allows the input and output files to be the same so you'll have to do some research on that part.


Thanks. I managed to get it working with the following:

Code:
ocrmypdf "$1" "$1" --rotate-pages --deskew --clean --skip-text


There's obviously no progress bar or anything to let you know it's working, so I had to wait a while for the first scans to begin changing names (my second instruction) to know it was working.
It's a great piece of code. I don't know how it pulls it off without knowing the file path. Perhaps someone can chime in here.
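
On how it runs without a file path: when a command name contains no slash, the shell looks through each directory listed in the PATH environment variable, in order, and runs the first executable match. Since "which ocrmypdf" reported /usr/local/bin/ocrmypdf earlier in the thread, that directory was presumably on the PATH the script ran with. Below is a rough sketch of that search. `find_on_path` is a hypothetical helper, and the real lookup also checks functions, builtins, and remembered locations before walking PATH.

```shell
#!/bin/bash
# Walk the PATH directories in order and print the first executable match.
# find_on_path is a made-up name for illustration only.
find_on_path() {
    local name="$1" dir
    local IFS=':'
    for dir in $PATH; do
        # An empty PATH entry stands for the current directory.
        [ -z "$dir" ] && dir="."
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

find_on_path ocrmypdf || echo "ocrmypdf is not on this shell's PATH" >&2
```

Using the full path in the script skips this search entirely, which is why it is the safer choice when a script may run with a different PATH than Terminal.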