PDF contents in Javascript actions


Moderator: Mr_Noodle

PDF contents in Javascript actions Thu Aug 22, 2019 4:05 pm • by jswright61
I use Contents Contain as a matching criterion for several rules. I'd like to be able to access the text contents of a PDF in a Run JavaScript action. Am I missing something? I can't figure out how to set one of the inputAttributes to the contents of a PDF. If I could do that, I could use some JS regexes to identify the document and generate an output variable with a new file_name.

Re: PDF contents in Javascript actions Fri Aug 23, 2019 10:37 am • by Mr_Noodle
It's a bit weird but there's no metadata for the text content. If you are writing a script, the script will have to get that information itself. You'll probably need some special library or program to parse the actual PDF.

Re: PDF contents in Javascript actions Sat Aug 24, 2019 4:39 pm • by jswright61
It does seem weird that the text contents are available for matching but not as script input. Since I have to go to an external script anyway, I will probably just end up doing it all there.

Thanks.

BTW, I found that pdftotext (https://www.xpdfreader.com/pdftotext-man.html) does a pretty good and quick job of converting PDFs to text files. I use it on my Mac. A couple of notes on this utility:
1) I use the switch -l 1, which sets the last page to convert to page 1, because all the identifying info I need is on page 1.
2) The docs mention that if you set the output file name to -, the output text is sent to STDOUT, which is what I want.
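
For anyone landing here later, the two notes above could be wired into an external script roughly like this. It is only a sketch in Python: it assumes pdftotext is on the PATH, and the invoice-number and date patterns, the output name format, and the function names first_page_text and build_file_name are all made up for illustration. The -enc UTF-8 switch is an extra I added so the output decodes predictably.

```python
import re
import subprocess
from typing import Optional

# Hypothetical patterns: adjust to whatever identifies your own documents.
INVOICE_RE = re.compile(r"Invoice\s*(?:No\.?|#)\s*(\w+)", re.IGNORECASE)
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def first_page_text(pdf_path: str) -> str:
    """Return the text of page 1 by running pdftotext and reading its STDOUT."""
    result = subprocess.run(
        # -l 1 stops after page 1; "-" as the output file means STDOUT.
        ["pdftotext", "-l", "1", "-enc", "UTF-8", pdf_path, "-"],
        check=True,           # raise if pdftotext exits with an error
        capture_output=True,  # collect STDOUT instead of printing it
    )
    return result.stdout.decode("utf-8", errors="replace")


def build_file_name(text: str) -> Optional[str]:
    """Build a new file name from the page text, or None if nothing matches."""
    invoice = INVOICE_RE.search(text)
    date = DATE_RE.search(text)
    if not (invoice and date):
        return None
    return f"{date.group(0)}_invoice_{invoice.group(1)}.pdf"


# Example use (needs pdftotext installed):
#   print(build_file_name(first_page_text("/path/to/scan.pdf")))
```

Keeping the regex step in its own function means it can be tried on pasted text without touching any PDFs.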

