ComPDFKit Conversion SDK: OCR table recognition

We are very pleased to announce that ComPDFKit Conversion SDK 1.8.0 for Windows, iOS, Android, and Server is now available. In this version, OCR adds table recognition and improves its text recognition rate. PDF to HTML conversion now produces a leaner HTML structure, which greatly reduces the size of the converted HTML files.

OCR table recognition:

Windows:

CPDFConvertWordOptions wordOptions = new CPDFConvertWordOptions();

wordOptions.IsAllowOCR = true; // enable OCR for this conversion

Mac:

CPDFConvertWordOptions *options = [[[CPDFConvertWordOptions alloc] init] autorelease];

[options setIsAllowOCR:YES];

To learn more about using OCR on other platforms, see our detailed OCR guide.

PDF to HTML:

Windows:
string resPath = "";
string inputFilePath = "";
string outputFolderPath = "";
string outputFileName = "";

CPDFConverter.Init(resPath);
CPDFConverterHTML converter = CPDFConvertFactroy.CreateConverter(CPDFConvertType.CPDFConvertTypeHtml, inputFilePath) as CPDFConverterHTML;

CPDFConvertHTMLOptions htmlOptions = new CPDFConvertHTMLOptions();
htmlOptions.PageAndNavigationPaneOpts = PageAndNavigationPaneOptions.SinglePageNavigationByBookmarks;
htmlOptions.IsAllowOCR = false;
htmlOptions.IsContainAnnotations = true;
htmlOptions.IsContainImages = true;

int pageCount = converter.GetPagesCount();
int[] pageArray = new int[pageCount];
// Convert every page: the array holds page numbers 1..pageCount.
for (int i = 0; i < pageArray.Length; i++)
{
    pageArray[i] = i + 1;
}

ConvertError error = ConvertError.ERR_UNKNOWN;
converter.Convert(outputFolderPath, ref outputFileName, htmlOptions, pageArray, ref error, getPorgress);

To learn more about PDF to HTML conversion on other platforms, see our detailed PDF to HTML guide.
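The Windows sample above fills the page array with the numbers 1 through the page count, which suggests that the converter expects 1-based page numbers. As a minimal sketch under that assumption, the loop could be wrapped in a small helper that also covers a sub-range of pages. `PageArrays`, `Range`, and `All` are hypothetical names for illustration and are not part of the ComPDFKit API.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of ComPDFKit: builds 1-based page arrays.
public static class PageArrays
{
    // Returns the page numbers firstPage..lastPage, both inclusive.
    public static int[] Range(int firstPage, int lastPage)
    {
        if (firstPage < 1)
            throw new ArgumentOutOfRangeException(nameof(firstPage), "Page numbers start at 1.");
        if (lastPage < firstPage)
            throw new ArgumentOutOfRangeException(nameof(lastPage), "lastPage must not precede firstPage.");
        return Enumerable.Range(firstPage, lastPage - firstPage + 1).ToArray();
    }

    // Returns every page number of a document with pageCount pages,
    // or an empty array when the document has no pages.
    public static int[] All(int pageCount)
    {
        return pageCount > 0 ? Range(1, pageCount) : Array.Empty<int>();
    }
}
```

With such a helper, the loop in the sample would reduce to `int[] pageArray = PageArrays.All(converter.GetPagesCount());`, and converting only pages 2 to 4 would be `PageArrays.Range(2, 4)`.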

Bug fixes:

  • Fixed a crash that could occur when converting PDF to Word with OCR on Traditional Chinese documents.
  • Fixed an extra blank page appearing when converting PDF to RTF.
  • Fixed the **OnProgress()** callback of PDF to RTF reporting progress too slowly.
  • Fixed PDF to Excel conversion failing when the document has no tables and OnlyTable is true; a blank Excel file is now generated instead.
  • Fixed some file links in PDF to HTML output failing to redirect.
  • Fixed some comments being lost in PDF to HTML output.
  • Fixed a crash when converting PDF to JPG or PNG with a negative DPI parameter.
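For applications that still run an earlier SDK version, the negative-DPI crash in the last item can be avoided by checking the value before it reaches the converter. The following is only a caller-side sketch: `DpiGuard` and `Validate` are hypothetical names, not ComPDFKit API, and treating zero as invalid in addition to negative values is an assumption made here.

```csharp
using System;

// Hypothetical caller-side check, not part of ComPDFKit.
public static class DpiGuard
{
    // Returns dpi unchanged when it is positive; otherwise throws,
    // so an invalid value never reaches the image conversion call.
    public static int Validate(int dpi)
    {
        if (dpi <= 0)
            throw new ArgumentOutOfRangeException(nameof(dpi), dpi, "DPI must be a positive number.");
        return dpi;
    }
}
```

Failing early with a clear exception keeps the error at the call site that supplied the bad value, instead of surfacing later inside the conversion.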

We believe this update will noticeably improve your experience with ComPDFKit, and we will keep optimizing our features for every user. You are welcome to contact us, try out ComPDFKit, and send us your feedback.
