PowerPoint (PPT) is a widely used presentation tool across many industries. When working with office documents, you sometimes need to extract the text or pictures from a presentation. If the document contains only a little text and a few pictures, you can still copy and paste the content by hand. But what do you do when it contains a lot? This article shows a way to extract text and pictures from a PPT document programmatically. (This article is reproduced from: http://www.cnblogs.com/Yesi/p/7770802.html , tested and working)
Note: The code in this article requires the Spire.Presentation component. After installing it, add a reference to its DLL file in your project and import the corresponding namespaces. The code listings show the namespaces that are needed.
Original PPT document:
1. Extract the text
The full code is as follows:
    using System;
    using System.Text;
    using Spire.Presentation;
    using System.IO;
    using System.Diagnostics;

    namespace ExtractText_PPT
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Create a Presentation instance and load the document
                Presentation presentation = new Presentation(@"C:\Users\Administrator\Desktop\sample.pptx", FileFormat.Pptx2010);

                // Create a StringBuilder to collect the extracted text
                StringBuilder sb = new StringBuilder();

                // Traverse every shape on every slide and extract the text content
                foreach (ISlide slide in presentation.Slides)
                {
                    foreach (IShape shape in slide.Shapes)
                    {
                        // Only auto shapes (text boxes, placeholders, etc.) are handled here
                        IAutoShape autoShape = shape as IAutoShape;
                        if (autoShape != null)
                        {
                            foreach (TextParagraph tp in autoShape.TextFrame.Paragraphs)
                            {
                                sb.AppendLine(tp.Text);
                            }
                        }
                    }
                }

                // Write the extracted text to a file and open it
                File.WriteAllText("target.txt", sb.ToString());
                Process.Start("target.txt");
            }
        }
    }
The extracted text looks like this:
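As a variation on the text extraction code in this section, the following sketch writes a heading before the text of each slide so that the output file shows which slide each paragraph came from. It uses only the Spire.Presentation calls that already appear in this article. The heading format, the output file name `target_by_slide.txt`, and the null check on `TextFrame` are assumptions added for illustration and are not part of the original article.

```csharp
using System.IO;
using System.Text;
using Spire.Presentation;

namespace ExtractTextBySlide_PPT
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the document (same sample path as the other listings)
            Presentation presentation = new Presentation();
            presentation.LoadFromFile(@"C:\Users\Administrator\Desktop\sample.pptx");

            StringBuilder sb = new StringBuilder();
            int slideNumber = 1;

            foreach (ISlide slide in presentation.Slides)
            {
                // Hypothetical heading format, chosen only for this illustration
                sb.AppendLine("--- Slide " + slideNumber + " ---");

                foreach (IShape shape in slide.Shapes)
                {
                    IAutoShape autoShape = shape as IAutoShape;

                    // Skip shapes that are not auto shapes; the TextFrame null
                    // check is a defensive assumption, not taken from the article
                    if (autoShape == null || autoShape.TextFrame == null)
                    {
                        continue;
                    }

                    foreach (TextParagraph tp in autoShape.TextFrame.Paragraphs)
                    {
                        sb.AppendLine(tp.Text);
                    }
                }

                slideNumber++;
            }

            // Hypothetical output file name
            File.WriteAllText("target_by_slide.txt", sb.ToString());
        }
    }
}
```

Like the original listing, this only visits shapes that are `IAutoShape`, so text held in other kinds of shapes is not collected.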
2. Extract the pictures
2.1 Extract all pictures
    using Spire.Presentation;
    using System.Drawing;

    namespace ExtractImage_PPT
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Initialize an instance of the Presentation class and load the document
                Presentation ppt = new Presentation();
                ppt.LoadFromFile(@"C:\Users\Administrator\Desktop\sample.pptx");

                // Loop through all images in the document
                for (int i = 0; i < ppt.Images.Count; i++)
                {
                    Image image = ppt.Images[i].Image;

                    // Extract the image and save it as a PNG file
                    image.Save(string.Format(@"..\..\Images{0}.png", i));
                }
            }
        }
    }
Result:
2.2 Extract pictures of specific slides
    using System.Drawing;
    using Spire.Presentation;

    namespace ExtractImageFromSpecialSlides_PPT
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Create an instance of the Presentation class and load the document
                Presentation PPT = new Presentation();
                PPT.LoadFromFile(@"C:\Users\Administrator\Desktop\sample.pptx");

                // Traverse the shapes on the fourth slide (index 3) and extract the images
                int i = 0;
                foreach (IShape s in PPT.Slides[3].Shapes)
                {
                    if (s is SlidePicture)
                    {
                        SlidePicture ps = s as SlidePicture;
                        ps.PictureFill.Picture.EmbedImage.Image.Save(string.Format("{0}.png", i));
                        i++;
                    }
                    if (s is PictureShape)
                    {
                        PictureShape ps = s as PictureShape;
                        ps.EmbedImage.Image.Save(string.Format("{0}.png", i));
                        i++;
                    }
                }
            }
        }
    }
Result: