Dealing with blankspaces within words


Moderator: Mr_Noodle

Dealing with blankspaces within words Mon Jun 11, 2018 3:28 pm • by Sandro
Following up from the other thread, here is another issue that I am having.

On some of the receipts that I am scanning, there is an all-caps font that is pretty widely spaced.
Every once in a while, that leads the OCR to recognize a blank space where there isn't one.

The actual word is PARKSCHEIN
Sometimes it might be recognized as PARKSC HEIN or PA RK SCHEIN or P A RKSCHEI N
You get the drift

Given that there are ten letters and thus nine gaps between letters, each of which may or may not contain a space, I am looking at 2^9 = 512 combinations of spaces/no spaces in that word. How can I deal with that effectively?
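
The count above can be checked with a short Python sketch that generates every spacing variant of the word. The function name spaced_variants is made up for illustration and is not part of any tool mentioned in this thread.

```python
from itertools import product

WORD = "PARKSCHEIN"  # the word from the receipt


def spaced_variants(word):
    """Yield every way of putting a space, or nothing, into each gap between letters."""
    gaps = len(word) - 1  # ten letters -> nine gaps
    for choice in product(("", " "), repeat=gaps):
        # Pair each letter with the filler that follows it; the last letter gets none.
        yield "".join(ch + gap for ch, gap in zip(word, choice + ("",)))


variants = list(spaced_variants(WORD))
print(len(variants))  # 512
```

The three misreadings quoted above (PARKSC HEIN, PA RK SCHEIN, P A RKSCHEI N) all appear in that list, so the list is mainly useful as test data rather than as something to match against one by one.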
Sandro
 
Posts: 20
Joined: Mon Jul 31, 2017 7:08 am

Re: Dealing with blankspaces within words Tue Jun 12, 2018 10:31 am • by Mr_Noodle
If you can still isolate it into a custom attribute somehow (maybe surrounding text), you can use the replace text feature to remove the spaces. Otherwise, you'll need a script to handle this type of case.
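
The idea of removing the spaces once the text is isolated can be sketched in Python as follows. This is only an illustration of the normalization step, not the replace text feature itself; it assumes the isolated text is already available as a string, and the function names are hypothetical.

```python
def strip_inner_whitespace(text):
    """Remove every whitespace character (spaces, tabs, newlines) from text."""
    return "".join(text.split())


def is_parkschein(isolated_text):
    """Compare the isolated text to the expected word, ignoring any whitespace."""
    return strip_inner_whitespace(isolated_text) == "PARKSCHEIN"


print(is_parkschein("P A RKSCHEI N"))  # True
print(is_parkschein("PARKHAUS"))       # False
```

Because all spacing variants collapse to the same string, a single comparison is enough. This only works when the text has been isolated first; applied to a whole receipt it would also glue unrelated words together.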
Mr_Noodle
Site Admin
 
Posts: 11193
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: Dealing with blankspaces within words Wed Jul 04, 2018 2:45 am • by Sandro
Mr_Noodle wrote: If you can still isolate it into a custom attribute somehow (maybe surrounding text), you can use the replace text feature to remove the spaces. Otherwise, you'll need a script to handle this type of case.


Took me a minute to figure out what you meant... You mean the "replace text..." option when applying the custom attribute in an action, right? That doesn't help, because I want to use it in a condition to check whether the receipt falls into the category. So I'll probably use a script action to check for the word, since I can use regexes there.
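
A regex check of that kind might look like the Python sketch below. It assumes the full OCR text is already available as a string; how the text reaches the script, and how the result is reported back, depends on the tool and is not shown. The names spaced_word_pattern and mentions_parkschein and the sample receipt text are made up for illustration.

```python
import re


def spaced_word_pattern(word):
    """Compile a regex matching `word` with optional spaces or tabs between its letters."""
    # Restricting the filler to spaces and tabs keeps a match on a single line.
    body = r"[ \t]*".join(re.escape(ch) for ch in word)
    # Word boundaries stop the match from starting or ending inside a longer word.
    return re.compile(r"\b" + body + r"\b")


PARKSCHEIN = spaced_word_pattern("PARKSCHEIN")


def mentions_parkschein(ocr_text):
    """Return True if the OCR text contains the word in any spacing variant."""
    return PARKSCHEIN.search(ocr_text) is not None


sample = "STADTWERKE\nPA RK SCHEIN   2,50 EUR\n"  # invented receipt text
print(mentions_parkschein(sample))  # True
```

One pattern covers all spacing variants, so none of them has to be listed explicitly. The trade-off is that two genuinely separate words such as PARK SCHEIN would match as well.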

Thx
Sandro
 

