Aspose.PDF is a PDF processing API for generating, modifying, converting, rendering, securing, and printing documents in cross-platform applications, without requiring Adobe Acrobat. The API also provides compression options, table creation and manipulation, graphics and image functions, extensive hyperlink support, stamp and watermark tasks, extended security controls, and custom font handling.
Aspose APIs process popular file formats and can export or convert many document types to fixed-layout formats and to the most commonly used image and multimedia formats.
PDF files are popular because they support text, images, animations, video, and many kinds of annotations.
However, text is the most important part of most PDF documents. In this article, we use C# to convert a PDF to a TXT file and a TXT file to PDF. The article covers:
- Convert PDF to TXT file without formatting using C# or VB.NET
- Convert PDF to TXT file with formatting routines using C# or VB.NET
- Convert TXT file to PDF programmatically using C# or VB.NET
At the time of writing, Aspose.PDF for .NET has been updated to v20.9, which improves TIFF-to-PDF conversion performance and fixes a number of bugs, such as an LZW decoder failure.
Convert PDF to TXT file without formatting using C# or VB.NET
First, we convert the PDF to text without any formatting routines. The text content is extracted as-is, so the output does not preserve the layout of the input PDF. Follow these steps to convert a PDF to TXT:
- Load the input PDF document
- Initialize an instance of the StringBuilder class
- Iterate through each page of the PDF document
- Read text using TextDevice and Raw mode
- Save the output text as a TXT file
The code snippet below shows how to convert PDF to TXT file using C# or VB in .NET Framework:
// Namespaces used by this snippet
using System.IO;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Devices;
using Aspose.Pdf.Text;

// dataDir is assumed to hold the path of the folder that contains the input PDF
// Open document
Document pdfDocument = new Document(dataDir + "MultiColumnPdf.pdf");
StringBuilder builder = new StringBuilder();
// String to hold extracted text
string extractedText = "";
foreach (Page pdfPage in pdfDocument.Pages)
{
    using (MemoryStream textStream = new MemoryStream())
    {
        // Create text device
        TextDevice textDevice = new TextDevice();
        // Extract text in Raw mode (no formatting routines)
        TextExtractionOptions options = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw);
        textDevice.ExtractionOptions = options;
        // Convert the page and save text to the stream
        textDevice.Process(pdfPage, textStream);
        // Close memory stream
        textStream.Close();
        // Get text from memory stream (ToArray still works after Close)
        extractedText = Encoding.Unicode.GetString(textStream.ToArray());
    }
    builder.Append(extractedText);
}
string outputPath = dataDir + "PDF_to_TXT_Raw.txt";
// Save the text file
File.WriteAllText(outputPath, builder.ToString());
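The snippet above closes the MemoryStream and then decodes its bytes with Encoding.Unicode (UTF-16 little-endian). The sketch below isolates that step using only standard .NET classes, with no Aspose calls. The class name StreamDecodingSketch and the helper DecodeUtf16 are hypothetical, and writing the bytes of "Hi" into the stream is only a stand-in for the extraction step.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamDecodingSketch
{
    // Hypothetical helper: reads every byte written to the stream and decodes it as UTF-16LE.
    public static string DecodeUtf16(MemoryStream stream)
    {
        // MemoryStream.ToArray is documented to work even after the stream is closed.
        return Encoding.Unicode.GetString(stream.ToArray());
    }

    public static void Main()
    {
        var stream = new MemoryStream();
        // Stand-in for the extraction step: write UTF-16LE bytes into the stream.
        byte[] bytes = Encoding.Unicode.GetBytes("Hi");
        stream.Write(bytes, 0, bytes.Length);
        stream.Close();

        // Decoding with the matching encoding recovers the original text.
        Console.WriteLine(DecodeUtf16(stream));
        // Decoding with a mismatched encoding interleaves NUL characters into the text.
        Console.WriteLine(Encoding.UTF8.GetString(stream.ToArray()).Length);
    }
}
```

The decoder has to match the encoding used when the bytes were written. Here, reading the UTF-16 bytes as UTF-8 yields four characters instead of two, because each letter is followed by a NUL character.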
Convert PDF to TXT file with formatting routines using C# or VB.NET
You can easily render the text content of a PDF document as a TXT file using C# by following these steps:
- Load source PDF file
- Initialize a string variable to hold the extracted text
- Read through each page using TextFormattingMode.Pure
- Save the converted TXT file
The following code snippet shows how to convert PDF format to TXT file using C# or VB.NET language:
// Namespaces used by this snippet
using System.IO;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Devices;
using Aspose.Pdf.Text;

// dataDir is assumed to hold the path of the folder that contains the input PDF
// Open document
Document pdfDocument = new Document(dataDir + "MultiColumnPdf.pdf");
StringBuilder builder = new StringBuilder();
// String to hold extracted text
string extractedText = "";
foreach (Page pdfPage in pdfDocument.Pages)
{
    using (MemoryStream textStream = new MemoryStream())
    {
        // Create text device
        TextDevice textDevice = new TextDevice();
        // Extract text in Pure mode (formatting routines applied)
        TextExtractionOptions options = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);
        textDevice.ExtractionOptions = options;
        // Convert the page and save text to the stream
        textDevice.Process(pdfPage, textStream);
        // Close memory stream
        textStream.Close();
        // Get text from memory stream (ToArray still works after Close)
        extractedText = Encoding.Unicode.GetString(textStream.ToArray());
    }
    builder.Append(extractedText);
}
string outputPath = dataDir + "PDF_to_TXT_Pure.txt";
// Save the text file
File.WriteAllText(outputPath, builder.ToString());
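The Raw and Pure snippets differ only in the TextFormattingMode value passed to TextExtractionOptions. One possible way to avoid the duplication is a small helper that takes the mode as a parameter. This is a sketch rather than an official Aspose sample: the class name PdfTextExtractor and the method ExtractText are hypothetical, the namespaces are assumed to be those of Aspose.PDF for .NET, and every Aspose call is the same one used in the snippets above.

```csharp
using System.IO;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Devices;
using Aspose.Pdf.Text;

// Hypothetical helper class; not part of the Aspose.PDF API.
public static class PdfTextExtractor
{
    // Extracts the text of every page using the given formatting mode (Raw or Pure).
    public static string ExtractText(string pdfPath, TextExtractionOptions.TextFormattingMode mode)
    {
        Document pdfDocument = new Document(pdfPath);
        StringBuilder builder = new StringBuilder();

        foreach (Page pdfPage in pdfDocument.Pages)
        {
            using (MemoryStream textStream = new MemoryStream())
            {
                // Configure the text device with the requested mode.
                TextDevice textDevice = new TextDevice();
                textDevice.ExtractionOptions = new TextExtractionOptions(mode);

                // Write the page text into the stream, then decode it as UTF-16.
                textDevice.Process(pdfPage, textStream);
                builder.Append(Encoding.Unicode.GetString(textStream.ToArray()));
            }
        }

        return builder.ToString();
    }
}
```

A call such as File.WriteAllText(dataDir + "PDF_to_TXT_Pure.txt", PdfTextExtractor.ExtractText(dataDir + "MultiColumnPdf.pdf", TextExtractionOptions.TextFormattingMode.Pure)) would then produce the Pure output, and passing TextFormattingMode.Raw would produce the Raw output.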
Visual comparison of PURE and RAW text conversions
The screenshot below is a visual comparison of the two approaches discussed above. Note that Pure mode (rightmost window) displays the text in the same layout as the PDF file (leftmost window).
Convert TXT file to PDF programmatically using C# or VB.NET
TXT files usually contain a lot of text content. You can easily convert TXT files to PDF files using Aspose.PDF for .NET API. Just follow the steps below to perform text to PDF conversion:
- Open the input TXT file with a TextReader (a StreamReader instance)
- Initialize the PDF document and add blank pages
- Instantiate the TextBuilder object
- Read each line of text from the input TXT file
- Save the output PDF file
The code snippet below shows how to convert a TXT file to a PDF document programmatically using C# or VB.NET:
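// Namespaces used by this snippet
using System.IO;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// dataDir is assumed to hold the path of the folder that contains the input TXT file
// Read input TXT file
using (TextReader tr = new StreamReader(dataDir + "Test.txt", Encoding.UTF8, true))
{
    // Initialize new Document
    Document doc = new Document();
    // Add blank page
    Page page = doc.Pages.Add();
    // Initiate TextBuilder object
    TextBuilder builder = new TextBuilder(page);
    // PDF coordinates start at the bottom-left corner, so begin near the top and move down
    const double margin = 72;
    const double lineSpacing = 15;
    double x = 100;
    double y = page.PageInfo.Height - margin;
    string strLine;
    while ((strLine = tr.ReadLine()) != null)
    {
        // Start a new page before placing a line that would fall below the bottom margin
        if (y < margin)
        {
            page = doc.Pages.Add();
            builder = new TextBuilder(page);
            y = page.PageInfo.Height - margin;
        }
        TextFragment text = new TextFragment(strLine);
        text.Position = new Position(x, y);
        builder.AppendText(text);
        y -= lineSpacing;
    }
    // Save output PDF file
    doc.Save(dataDir + "TexttoPDF.pdf");
}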
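To reason about where each line lands and when a page break is needed, the positioning arithmetic can be separated from the PDF calls. The sketch below assumes lines are laid out from the top of the page downward, with the PDF origin at the bottom-left corner. It also assumes a page height of 842 points (A4), which the article does not state; the 72-point margin and 15-point line spacing are the values used in the snippet above. The names LineLayoutSketch and Layout are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class LineLayoutSketch
{
    // Returns the zero-based page index and the y coordinate for each line.
    // y is measured from the bottom of the page, so it decreases as lines move down.
    public static List<(int Page, double Y)> Layout(int lineCount, double pageHeight, double margin, double lineSpacing)
    {
        var result = new List<(int Page, double Y)>();
        int page = 0;
        double y = pageHeight - margin;

        for (int i = 0; i < lineCount; i++)
        {
            // Move to a new page before placing a line below the bottom margin.
            if (y < margin)
            {
                page++;
                y = pageHeight - margin;
            }
            result.Add((page, y));
            y -= lineSpacing;
        }
        return result;
    }

    public static void Main()
    {
        // Assumed values: A4 height of 842 points, 72-point margins, 15-point spacing.
        var positions = Layout(48, 842, 72, 15);
        Console.WriteLine(positions[46]); // last line on the first page
        Console.WriteLine(positions[47]); // first line on the second page
    }
}
```

Under these assumptions, 47 lines fit on the first page (the last one at y = 80), and the 48th line starts the second page at y = 770.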