How to extract PDF text and images in C#

PDF documents are not easy to manipulate or edit, so working with them often takes more time than it should. Developers need a convenient way to operate on PDF files from code. This article shows how to extract text and images from a PDF in C# by using Free Spire.PDF, a free PDF control. The control must be downloaded and installed before use.

Note: After you download and install the component, the DLL file is in the Bin folder of the extracted package. Remember to add it as a reference in your project.

[Figure: the original PDF document]

1. Extract PDF text

C#

// Namespaces this snippet relies on
using System.IO;
using System.Text;
using Spire.Pdf;

// Create a PdfDocument object and load the sample PDF
PdfDocument doc = new PdfDocument();
doc.LoadFromFile("sample.pdf");

// Create a StringBuilder to collect the extracted text
StringBuilder buffer = new StringBuilder();
// Go through each page of the document and extract its text
foreach (PdfPageBase page in doc.Pages)
{
    buffer.Append(page.ExtractText());
}
doc.Close();

// Write the extracted text to a .txt file
string fileName = "TextInPdf.txt";
File.WriteAllText(fileName, buffer.ToString());

Running the program generates the text file TextInPdf.txt.

2. Extract PDF images

C#

// Namespaces this snippet relies on
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using Spire.Pdf;

// Create a PdfDocument object and load the sample PDF
PdfDocument doc = new PdfDocument();
doc.LoadFromFile("sample.pdf");

// Declare a list to hold the extracted images
IList<Image> images = new List<Image>();
// Go through each page, check whether it contains images, and extract them
foreach (PdfPageBase page in doc.Pages)
{
    // Call ExtractImages() once per page and reuse the result
    var pageImages = page.ExtractImages();
    if (pageImages != null)
    {
        foreach (Image image in pageImages)
        {
            images.Add(image);
        }
    }
}
doc.Close();

// Save each extracted image as a numbered PNG file
int index = 0;
foreach (Image image in images)
{
    string imageFileName = string.Format("Image-{0}.png", index++);
    image.Save(imageFileName, ImageFormat.Png);
}

After the program runs, the extracted images are saved as Image-0.png, Image-1.png, and so on.

(This article is reproduced from http://www.cnblogs.com/Yesi/p/4203686.html)
