How to extract PPT text and pictures in C#

PPT is a widely used presentation format that plays an important role in many fields. When processing office documents, we sometimes need to extract the text or pictures from a document. If the document contains only a little text and a few pictures, we can copy and paste the content by hand, but what if it contains a large amount? This article shows how to extract text and pictures from a PPT document in C#. (This article is reproduced from: http://www.cnblogs.com/Yesi/p/7770802.html ; the code has been tested and works.)

Note: The examples below use the Spire.Presentation component. After installing it, add a reference to its DLL file in your project and import the corresponding namespace (Spire.Presentation), as shown in the code below.

Original PPT document:



1. Extract the text

The full code is as follows:

using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using Spire.Presentation;

namespace ExtractText_PPT
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Presentation instance and load the document
            Presentation presentation = new Presentation(@"C:\Users\Administrator\Desktop\sample.pptx", FileFormat.Pptx2010);

            // Collect the extracted text in a StringBuilder
            StringBuilder sb = new StringBuilder();

            // Traverse every shape on every slide. Only IAutoShape (text boxes, placeholders
            // and similar shapes) is handled; text in tables or grouped shapes is not read.
            foreach (ISlide slide in presentation.Slides)
            {
                foreach (IShape shape in slide.Shapes)
                {
                    if (shape is IAutoShape autoShape && autoShape.TextFrame != null)
                    {
                        foreach (TextParagraph tp in autoShape.TextFrame.Paragraphs)
                        {
                            // One line per paragraph
                            sb.AppendLine(tp.Text);
                        }
                    }
                }
            }

            // Save the text to a file
            File.WriteAllText("target.txt", sb.ToString());
            // Open it with the default program; UseShellExecute = true is required on .NET Core / .NET 5+
            Process.Start(new ProcessStartInfo("target.txt") { UseShellExecute = true });
        }
    }
}

 

The extracted text looks like this:



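If a third-party component is not an option, similar text can also be read with the .NET base class library alone, because a .pptx file is an Office Open XML package: a ZIP archive in which each slide is stored as an XML part. The sketch below is a different technique from the Spire.Presentation code above, not part of it. It assumes slides are stored as ppt/slides/slideN.xml and that the file numbering matches the slide order (the authoritative order is the slide list in ppt/presentation.xml); the class name PptxTextReader is hypothetical. It joins the text runs (a:t) of every DrawingML paragraph (a:p), so unlike the loop above it also picks up text inside tables and grouped shapes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper (not part of Spire.Presentation): reads paragraph text directly from
// the slide XML parts of a .pptx package using only the .NET base class library.
// Assumption: slides are stored as ppt/slides/slideN.xml and N matches the slide order.
public static class PptxTextReader
{
    // DrawingML namespace used for paragraphs (a:p) and text runs (a:t).
    static readonly XNamespace DrawingMl = "http://schemas.openxmlformats.org/drawingml/2006/main";

    static readonly Regex SlidePart = new Regex(@"^ppt/slides/slide(\d+)\.xml$");

    public static List<string> ReadParagraphs(Stream pptxStream)
    {
        var paragraphs = new List<string>();
        using (var zip = new ZipArchive(pptxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Pick the slide parts and sort them numerically (slide2 before slide10).
            var slideEntries = zip.Entries
                .Select(e => new { Entry = e, Match = SlidePart.Match(e.FullName) })
                .Where(x => x.Match.Success)
                .OrderBy(x => int.Parse(x.Match.Groups[1].Value))
                .Select(x => x.Entry);

            foreach (ZipArchiveEntry entry in slideEntries)
            {
                XDocument slide;
                using (Stream s = entry.Open())
                    slide = XDocument.Load(s);

                // One output line per paragraph: concatenate all of its text runs.
                foreach (XElement p in slide.Descendants(DrawingMl + "p"))
                    paragraphs.Add(string.Concat(p.Descendants(DrawingMl + "t").Select(t => t.Value)));
            }
        }
        return paragraphs;
    }
}
```

To use it, open the .pptx with File.OpenRead, pass the stream to ReadParagraphs, and write the returned list out with File.WriteAllLines to get a text file comparable to target.txt.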
2. Extract the pictures

2.1 Extract all pictures

using System.Drawing;
using System.Drawing.Imaging;
using Spire.Presentation;

namespace ExtractImage_PPT
{
    class Program
    {
        static void Main(string[] args)
        {
            // Initialize an instance of the Presentation class and load the document
            Presentation ppt = new Presentation();
            ppt.LoadFromFile(@"C:\Users\Administrator\Desktop\sample.pptx");

            // Loop through all images embedded in the document
            for (int i = 0; i < ppt.Images.Count; i++)
            {
                Image image = ppt.Images[i].Image;
                // Save each image (path is relative to the working directory); passing
                // ImageFormat.Png re-encodes it so the data matches the .png extension
                image.Save(string.Format(@"..\..\Images{0}.png", i), ImageFormat.Png);
            }
        }
    }
}

Sample result:

 



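As with the text, a dependency-free alternative is possible for the images: the pictures embedded in a .pptx package are normally stored as separate files under ppt/media/. The hypothetical helper below (PptxMediaExtractor is not a library class) copies those files out byte for byte, so each keeps its original format and file name instead of being re-encoded to PNG. This is a different technique from ppt.Images, and it assumes that ppt/media/ is where the pictures live; that folder can also hold audio and video, which this sketch copies as well.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper (not part of Spire.Presentation): copies every file stored under
// ppt/media/ in a .pptx package to outputDir, byte for byte, keeping the original names.
public static class PptxMediaExtractor
{
    public static List<string> ExtractAll(Stream pptxStream, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        var written = new List<string>();
        using (var zip = new ZipArchive(pptxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                // Keep only files inside ppt/media/ (directory entries have an empty Name).
                if (!entry.FullName.StartsWith("ppt/media/", StringComparison.Ordinal) || entry.Name.Length == 0)
                    continue;

                string target = Path.Combine(outputDir, entry.Name);
                using (Stream src = entry.Open())
                using (FileStream dst = File.Create(target))
                {
                    src.CopyTo(dst);
                }
                written.Add(target);
            }
        }
        return written;
    }
}
```

ExtractAll returns the paths it wrote, so the result can be filtered by extension afterwards if only picture files are wanted.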
2.2 Extract pictures from a specific slide

using System.Drawing.Imaging;
using Spire.Presentation;

namespace ExtractImageFromSpecialSlides_PPT
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create an instance of the Presentation class and load the document
            Presentation ppt = new Presentation();
            ppt.LoadFromFile(@"C:\Users\Administrator\Desktop\sample.pptx");

            // Slides are zero-indexed, so Slides[3] is the fourth slide
            ISlide slide = ppt.Slides[3];

            // Traverse the shapes on that slide and save every picture as 0.png, 1.png, ...
            int i = 0;
            foreach (IShape s in slide.Shapes)
            {
                if (s is SlidePicture sp)
                {
                    sp.PictureFill.Picture.EmbedImage.Image.Save(string.Format("{0}.png", i), ImageFormat.Png);
                    i++;
                }
                else if (s is PictureShape ps)
                {
                    ps.EmbedImage.Image.Save(string.Format("{0}.png", i), ImageFormat.Png);
                    i++;
                }
            }
        }
    }
}

Sample result:

 




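The per-slide case can also be handled without Spire.Presentation, by reading the slide's relationships part (ppt/slides/_rels/slideN.xml.rels) and following every relationship whose type is the Office Open XML image relationship. This is a sketch under assumptions: the slide number follows the slideN.xml file name rather than the slide list order; images inherited from the slide's layout or master are not included; and besides ordinary pictures, the relationships also cover images used as shape fills, so the result may differ slightly from the SlidePicture/PictureShape loop above. The class PptxSlideImageExtractor and its methods are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not part of Spire.Presentation): copies the images referenced by one
// slide of a .pptx package. Assumes slide N is stored as ppt/slides/slideN.xml.
public static class PptxSlideImageExtractor
{
    static readonly XNamespace Rel = "http://schemas.openxmlformats.org/package/2006/relationships";
    const string ImageRelType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image";

    public static List<string> ExtractSlideImages(Stream pptxStream, int slideNumber, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        var written = new List<string>();
        using (var zip = new ZipArchive(pptxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry relsEntry = zip.GetEntry($"ppt/slides/_rels/slide{slideNumber}.xml.rels");
            if (relsEntry == null)
                return written; // no relationships part: slide missing or without references

            XDocument rels;
            using (Stream s = relsEntry.Open())
                rels = XDocument.Load(s);

            // Internal image relationships, resolved to part names; Distinct because the
            // same picture can be referenced more than once on a slide.
            var partNames = rels.Root.Elements(Rel + "Relationship")
                .Where(r => (string)r.Attribute("Type") == ImageRelType
                         && (string)r.Attribute("TargetMode") != "External")
                .Select(r => ResolvePartName("ppt/slides", (string)r.Attribute("Target")))
                .Distinct();

            foreach (string partName in partNames)
            {
                ZipArchiveEntry entry = zip.GetEntry(partName);
                if (entry == null) continue;
                string target = Path.Combine(outputDir, entry.Name);
                using (Stream src = entry.Open())
                using (FileStream dst = File.Create(target))
                    src.CopyTo(dst);
                written.Add(target);
            }
        }
        return written;
    }

    // Resolves a Target such as "../media/image1.png" against the folder of the source part.
    static string ResolvePartName(string baseFolder, string target)
    {
        var parts = new List<string>(target.StartsWith("/") ? new string[0] : baseFolder.Split('/'));
        foreach (string segment in target.Split('/'))
        {
            if (segment == "..") { if (parts.Count > 0) parts.RemoveAt(parts.Count - 1); }
            else if (segment.Length > 0 && segment != ".") parts.Add(segment);
        }
        return string.Join("/", parts);
    }
}
```

Calling ExtractSlideImages(stream, 4, outputDir) roughly corresponds to the Slides[3] example above, except that the files keep their original names and formats.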
 
