PDF contents in Javascript actions

Posted: Thu Aug 22, 2019 4:05 pm
by jswright61
I use Contents Contain as a matching criterion for several rules. I'd like to be able to access the text contents of a PDF in a Run JavaScript action. Am I missing something? I can't figure out how to set one of the inputAttributes to the contents of a PDF. If I could do that, I could use some JS regexes to identify the file and generate an output variable with a new file_name.

Re: PDF contents in Javascript actions

Posted: Fri Aug 23, 2019 10:37 am
by Mr_Noodle
It's a bit weird, but there's no metadata for the text content. If you are writing a script, the script will have to get that information itself. You'll probably need some special library or program to parse the actual PDF.

Re: PDF contents in Javascript actions

Posted: Sat Aug 24, 2019 4:39 pm
by jswright61
It does seem weird that the text contents are available for matching but not as script input. Since I have to go to an external script anyway, I will probably just end up doing it all there.

Thanks.

BTW, I found that pdftotext (https://www.xpdfreader.com/pdftotext-man.html) does a pretty good and quick job of converting PDFs to text files. I use it on my Mac. A couple of notes on this utility:
1) I use the switch -l 1, which sets the last page to convert to page 1, because all the identifying info I need is on page 1.
2) The docs have a little blurb saying that if you set the output file name to -, the output text is sent to STDOUT, which is what I want.