OCR and Grey Box

toddc · January 2, 2020, 5:39pm

I have PDF Architect + OCR. When I open an existing image-based PDF, I click on the OCR tab and try to OCR the document with the image in the foreground and the OCR text in the background. After the process runs, there are dark grey boxes around the text it has identified. It also removes some characters (presumably ones it could not recognize) and replaces those areas with white space.

An example of a document I'm trying to OCR is an invoice/packing slip that was scanned and emailed from a Xerox multi-function machine. I usually save these out of my email and OCR the text while preserving the original look. That way, content searches across one or more folders of documents are easier.
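As a rough way to see which saved scans in a folder still lack a text layer (and so will not show up in a content search), a sketch like the following could help. It is only a heuristic and not anything PDF Architect provides: it assumes that a PDF with recognized text references at least one font, so it looks for the `/Font` key in the raw file bytes. The function name `pdfs_probably_missing_text` is made up for illustration.

```python
from pathlib import Path


def pdfs_probably_missing_text(folder):
    """Return the PDFs under `folder` whose raw bytes contain no /Font key.

    Assumption: a page with a text layer references at least one font,
    while an image-only scan usually does not. Files that keep their
    resource dictionaries inside compressed object streams can be
    misreported, so treat the result as a hint, not a verdict.
    """
    missing = []
    # rglob walks subfolders too; the "*.pdf" match is case-sensitive on Linux.
    for path in sorted(Path(folder).rglob("*.pdf")):
        if b"/Font" not in path.read_bytes():
            missing.append(path)
    return missing
```

Calling `pdfs_probably_missing_text("invoices")` on a hypothetical `invoices` folder would return the paths that are probably still image-only and may need another OCR pass.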

With Adobe Acrobat (an old version), I was able to OCR without changing the original look. Since then, I've switched to PDF Architect, but I cannot figure out the configuration.

My Settings are:
-->Options
---->OCR
------>Language: English
------>Recognition Quality: High
------>Output Quality: Max
------>Orientation and Script Detection: Enabled
------>Selected: Deskew, Rotate Page, Detect Text Orientation and
------>PDF Type: Image-Text

Any guidance?

Robin.W · January 3, 2020, 9:37am

Hi,

Does it get better if you switch the PDF Type to "text only"?
This might make the OCR produce only text output, avoiding the additional boxes around the text.
If this doesn't help, please contact PDF Architect support, as this forum is managed exclusively by the PDFCreator support team.

Best regards

Robin

toddc · January 4, 2020, 7:04pm

Thanks for the suggestion. The "text only" setting did not help. It did not add the grey boxes, but it removed the images and kept only the text it could recognize. I'll reach out to support.