pattern matching only for first page


Moderator: Mr_Noodle

pattern matching only for first page Wed Feb 06, 2019 8:49 pm • by speedy_99
Hi,

Is it possible to search for a pattern only on the first page of a multi-page PDF?
speedy_99
 
Posts: 56
Joined: Thu Feb 12, 2015 11:00 pm

Re: pattern matching only for first page Thu Feb 07, 2019 12:18 pm • by Mr_Noodle
I can't think of a way off the top of my head to do it with the built-in pattern matching. You might be able to do it with a script though it would probably be tricky. You'd need some sort of program that can extract the text while staying aware of which page it came from.

Is there some text that indicates the end of a page or the start of the next one?
Mr_Noodle
Site Admin
 
Posts: 11195
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: pattern matching only for first page Thu Feb 07, 2019 1:36 pm • by speedy_99
Mr_Noodle wrote: Is there some text that indicates the end of a page or the start of the next one?


No, the external documents are too different.

The scan software has an option to OCR only the first page, but that makes no sense for the other steps.


But I could prepare my own documents with an endpoint.
What kind of endpoint would be necessary, and which rule would work?

Isn't the page format (Letter, DIN A4) of a PDF part of its properties?
Could the page number be read?
speedy_99
 
Posts: 56
Joined: Thu Feb 12, 2015 11:00 pm

Re: pattern matching only for first page Fri Feb 08, 2019 4:21 pm • by Mr_Noodle
It looks like there is a pdftotext program that has options to return the text of only certain pages. It's part of the Poppler package: https://poppler.freedesktop.org

If you are familiar with MacPorts or Homebrew, both offer Poppler, which you can install and use.
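
As a rough sketch of how that could look (not a tested recipe): the Python script below calls pdftotext with -f 1 -l 1 so that only the first page is converted, then checks that text against a regular expression. The function names, the argument order, and the convention that exit status 0 means "matched" are assumptions made for illustration; it also assumes pdftotext is on the PATH and that the PDF already has a text layer (for scans, OCR has to have run first).

```python
import re
import subprocess
import sys


def build_command(pdf_path):
    # -f / -l are pdftotext's first-page / last-page options;
    # "-" as the output file sends the extracted text to stdout.
    return ["pdftotext", "-f", "1", "-l", "1", pdf_path, "-"]


def first_page_text(pdf_path):
    # Raises CalledProcessError if pdftotext fails (e.g. unreadable file).
    result = subprocess.run(
        build_command(pdf_path), capture_output=True, text=True, check=True
    )
    return result.stdout


def matches(text, pattern):
    # True if the regular expression occurs anywhere in the text.
    return re.search(pattern, text) is not None


def main(argv):
    # Hypothetical calling convention: script.py <file.pdf> <regex>
    pdf_path, pattern = argv[1], argv[2]
    # Exit status 0 = pattern found on page 1, 1 = not found (assumed convention).
    return 0 if matches(first_page_text(pdf_path), pattern) else 1


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

If the rule can run a script as a condition and treats exit status 0 as a match, the script could be called with the file path and the pattern as its two arguments.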
Mr_Noodle
Site Admin
 
Posts: 11195
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

