Hello,
Is it possible to set up PDF Architect to replace image-based PDFs with OCRed PDFs?
As an example:
D:\--FILES--\0001\AR\1 …. 1000.pdf
D:\--FILES--\0001\ER\1 …. 1000.pdf
D:\--FILES--\0002\AR\1 …. 10.pdf
D:\--FILES--\0002\LS\A …. Zzz.pdf
I want to create a scheduled task that starts Architect with something similar to:
`architect.exe --pdf-tools --recognize --source-dir "D:\--FILES--\" --use-ocr --destination-dir "D:\--FILES"`
and then rinse and repeat, so maybe there is also something like "--dont-ocr-textpdf" and "--do-sub-folders"?
And if not with Architect, what could I use?
Hi,
I'm afraid the command line options for PDF Architect are currently undocumented and unsupported.
There has never been a switch to skip PDFs that have already been OCRed, but Architect might skip them automatically.
If this doesn't help, you could look at Tesseract OCR and write a small application or script that processes each PDF only once.
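A rough sketch of such a script in Python, walking all subfolders and remembering which files were already handled. It assumes the OCRmyPDF command-line tool (which builds on Tesseract) is installed and on PATH. The manifest file and the names `process_tree`, `ocr_with_ocrmypdf` and `TMP_SUFFIX` are made up for illustration and have nothing to do with PDF Architect.

```python
import json
import os
import subprocess
from pathlib import Path

# Hypothetical suffix for in-progress output files, so they are never picked up as input.
TMP_SUFFIX = ".ocr-tmp.pdf"


def ocr_with_ocrmypdf(src: Path, dst: Path) -> None:
    # Assumption: OCRmyPDF is installed. --skip-text leaves pages that
    # already contain a text layer untouched instead of failing on them.
    subprocess.run(["ocrmypdf", "--skip-text", str(src), str(dst)], check=True)


def load_manifest(path: Path) -> dict:
    # The manifest maps absolute file paths to [size, mtime_ns] at the time of OCR.
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def process_tree(root: Path, manifest_path: Path, ocr=ocr_with_ocrmypdf) -> list:
    """OCR every PDF below root in place, skipping files recorded as done."""
    done = load_manifest(manifest_path)
    processed = []
    for pdf in sorted(root.rglob("*.pdf")):  # rglob descends into all subfolders
        if pdf.name.endswith(TMP_SUFFIX):
            continue  # leftover from an interrupted run, not a real input
        key = str(pdf.resolve())
        stat = pdf.stat()
        if done.get(key) == [stat.st_size, stat.st_mtime_ns]:
            continue  # unchanged since it was last OCRed
        tmp = pdf.with_name(pdf.stem + TMP_SUFFIX)
        try:
            ocr(pdf, tmp)
            os.replace(tmp, pdf)  # swap the OCRed file in for the original
        finally:
            tmp.unlink(missing_ok=True)  # clean up if the OCR step failed
        stat = pdf.stat()
        done[key] = [stat.st_size, stat.st_mtime_ns]
        processed.append(pdf)
        # Save after every file so an interrupted run does not lose progress.
        manifest_path.write_text(json.dumps(done, indent=2), encoding="utf-8")
    return processed
```

A scheduled task could then run something like `process_tree(Path(r"D:\--FILES--"), Path(r"D:\ocr-manifest.json"))`, where the manifest location is only an example. Because the script overwrites the originals, it is worth trying it on a copy of the folder first.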
Best regards
Robin
Thanks @Robin.W, but what would the parameters be for recursive directory harvesting?
There doesn't seem to be one. The switches are unfortunately undocumented and unsupported, and I don't have any additional information or resources that could help.
Hi Alexander,
You might want to try our PDF Architect support team:
https://support.pdfarchitect.org/hc/en-us/requests/new
Best regards
Sascha