Architect command-line options for batch processing recursive folder OCR

Hello,
Is it possible to set up Architect to replace image-based PDFs with OCRed PDFs?
As an example:

      D:\--FILES--\0001\AR\1 …. 1000.pdf
      D:\--FILES--\0001\ER\1 …. 1000.pdf
      D:\--FILES--\0002\AR\1 …. 10.pdf
      D:\--FILES--\0002\LS\A …. Zzz.pdf

I want to create a scheduled task that starts Architect with something similar to:

          architect.exe --pdf-tools --recognize --source-dir "D:\--FILES--\" --use-ocr --destination-dir "D:\--FILES"

and then rinse and repeat, so maybe there are also switches like "--dont-ocr-textpdf" and "--do-sub-folders"?

And if not with Architect, what could I use?

Hi,

I am afraid the command line options for PDF Architect are currently undocumented / unsupported.
There has never been a switch to skip PDFs that have already been OCRed, but it might skip them automatically.
If this doesn't help, you could look at Tesseract OCR and write a small application or script that processes each PDF only once.
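
As a rough illustration of such a script, here is a minimal Python sketch. It is not a PDF Architect feature and makes several assumptions: the names `find_pdfs`, `process_tree` and the JSON ledger file are made up for this example, and the default OCR step calls OCRmyPDF (a separate open-source front end to Tesseract) with its `--skip-text` option, which would have to be installed and on the PATH. Any other OCR command could be swapped in through the `ocr` parameter.

```python
import json
import os
import subprocess
import tempfile
from pathlib import Path


def find_pdfs(root):
    """Yield every .pdf file below root, descending into all subfolders."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() == ".pdf":
            yield path


def ocr_with_ocrmypdf(src, dst):
    """Assumed OCR step: requires OCRmyPDF (which uses Tesseract) on PATH."""
    # --skip-text leaves pages that already contain text untouched.
    subprocess.run(["ocrmypdf", "--skip-text", str(src), str(dst)], check=True)


def process_tree(root, ledger_path, ocr=ocr_with_ocrmypdf):
    """OCR each PDF under root once, replacing it in place.

    The ledger (a hypothetical JSON file) remembers size and modification
    time of every finished file, so a later run skips unchanged files.
    Returns the list of files processed in this run.
    """
    ledger_path = Path(ledger_path)
    done = json.loads(ledger_path.read_text()) if ledger_path.exists() else {}
    processed = []
    for pdf in find_pdfs(root):
        key = str(pdf.resolve())
        stat = pdf.stat()
        if done.get(key) == [stat.st_size, stat.st_mtime_ns]:
            continue  # unchanged since the last run
        # Write the OCR result next to the original, then swap it in.
        fd, tmp_name = tempfile.mkstemp(suffix=".pdf", dir=pdf.parent)
        os.close(fd)
        tmp = Path(tmp_name)
        try:
            ocr(pdf, tmp)
            os.replace(tmp, pdf)
        finally:
            tmp.unlink(missing_ok=True)  # clean up if the OCR step failed
        stat = pdf.stat()
        done[key] = [stat.st_size, stat.st_mtime_ns]
        ledger_path.write_text(json.dumps(done, indent=2))
        processed.append(pdf)
    return processed
```

A scheduled task could then run a one-line wrapper that calls `process_tree` with the root folder and a ledger path of your choosing. Keeping the ledger outside the PDFs themselves means the script does not need to inspect each file for an existing text layer, at the cost of reprocessing any file that is modified later.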

Best regards

Robin

Thanks @Robin.W, but what would be the parameters for recursive directory harvesting?

There doesn't seem to be one. The switches are unfortunately undocumented / unsupported, and I don't have any additional information or resources that could help.

Hi Alexander,

You might want to try our PDF Architect Support team:
https://support.pdfarchitect.org/hc/en-us/requests/new

Best regards
Sascha