PDF from scanner + OCR

Vernatronics · July 5, 2014, 3:15am

I create PDFs from my scanner and use OCR to make the PDF files searchable.
It seems that the OCR does not simply add the recognised text as metadata. Instead, it changes the scanned document itself, replacing the recognised text with characters in a similar-looking typeface.
As a result, in the middle of a line of text in the scanned document, some words look too perfect for a scan, while other words that were not recognised are still rendered as an image. The scanned document looks as if it has been artificially modified, “counterfeit”. This is unacceptable.
Is it possible to have the PDF file accurately reproduce the scanned document and store the recognised text as metadata, as Adobe Acrobat does?
Thanks, Claudio.

Vernatronics · July 23, 2014, 2:09pm

Hi,
No comments on the issue I reported above?
Is there anyone else facing this issue?

Kind regards,
Claudio.

Robin.W · July 24, 2014, 6:54am

Hi,

Sorry, I haven’t found time to look at this yet. Have you already tried contacting our phone support (http://web.pdfarchitect.org/support.aspx)? They might be able to help you faster, though I don’t think there is a setting that affects this.

best regards

Vernatronics · July 30, 2014, 3:14pm

Hi,

Thank you for your reply, Robin.

I haven’t contacted support because I don’t think this is a bug.
I think it’s more of a missing feature (a critically important one, in my opinion), unless there is a hidden setting somewhere.

I also wrote here to find out what other users think about this. I guess I am not the only one using PDF Architect to scan and archive legal (or similar) documents who dislikes the fact that PDF Architect “alters” the scanned document.
As things stand, I cannot use the OCR, which means giving up the ability to search my documents by their content.

I also hope that this can be taken as a suggestion for future releases.
By the way, is there a feature-request page where users can vote, or is it appropriate to post such requests in this forum?
Thanks,
Claudio.

Robin.W · July 31, 2014, 5:06am

Hi,

Thanks for the suggestion; it is fine to post it here. I will create an internal ticket with this suggestion so everybody in the team can have a look at it. It sounds like a good idea to me, and I agree it is most likely a missing feature: I checked all the settings, and there is no hidden setting for it in the registry.

best regards,