I create PDFs from a scanner and use OCR to make the PDF files searchable.
It seems that the OCR does not simply add the recognised text as metadata, but changes the scanned document itself, replacing the recognised text with characters in a similar-looking typeface.
So, in the middle of a line of text in the scanned document, I see some words that look too perfect for a scanned document, and other words that were not recognised and are therefore still rendered as an image. The scanned document looks as if it has been artificially modified, “counterfeit”. This is unacceptable.
Is it possible to have the PDF file accurately reproduce the scanned document and carry the recognised text as metadata, as Adobe Acrobat does?
Thanks, Claudio.
Hi,
No comments on the issue I reported above?
Is there anyone else facing this issue?
Kind regards,
Claudio.
Hi,
Thank you for your reply, Robin.
I haven’t contacted support because I don’t think this is a bug.
I think it’s more of a missing feature (a critically important one, in my opinion), unless there is a hidden setting somewhere.
I also posted here to find out what other users think about this. I guess I am not the only one who uses PDF Architect to scan and archive legal (or similar) documents and dislikes the fact that PDF Architect “alters” the scanned document.
As it is now, I cannot use the OCR, which means giving up the ability to search my documents by their content.
I also hope that this can be taken as a suggestion for future releases.
By the way, is there a place where users can vote on feature requests, or is it appropriate to post such requests in this forum?
Thanks,
Claudio.
Hi,