CSDN blog post downloader (Java)

Reprinted from: http://blog.csdn.net/noaboutfengyue/article/details/45192897

I wrote a CSDN blog downloader in Java. Both the jar package and the source code are provided.

The source code is open as well. It could be recovered by decompiling the jar anyway, and as a beginner I don't know how to encrypt a jar.


Download: http://download.csdn.net/detail/owuguanfengyue123/8619649

The resource is still under review on CSDN... so slow.


It is inconvenient to read blogs online during class, so I wanted to download the good ones and read them on my phone.

After searching on Baidu, I found a few tools.

1. http://blog.csdn.net/gzshun/article/details/7555525

The author explains the approach and provides a tutorial, but I found problems with the tool:

(1) The download is incomplete. In my test it only downloaded the roughly 21 articles on the first page.

(2) The generated PDF looks fine overall, but some code runs past the edge of the page, so part of it cannot be read.

2. http://www.cr173.com/soft/48129.html

This is a general-purpose blog export tool. The CSDN interface seems to have changed since it was written, so it no longer works and nothing can be exported.


So I decided to write a program myself.

I built on the ideas in the earlier series at http://blog.csdn.net/gzshun/article/category/932960.

I extended them with some ideas of my own:

(1) Use the mobile version instead of the desktop version

I find the mobile version of CSDN simpler and easier to process. Its blog link is http://m.blog.csdn.net/blog followed by the user name. The desktop blog link normally uses a custom domain name instead. For example, my custom domain name on the desktop site is noaboutfengyue and my user name is oWuGuanFengYue123. On the desktop site, appending either of the two to http://blog.csdn.net/ opens the blog, but on the mobile site only http://m.blog.csdn.net/blog/oWuGuanFengYue123 works.

For convenience, the program only needs the custom domain name as input, so I added a method that obtains the user name from the domain name. The implementation is simple: fetch http://blog.csdn.net/noaboutfengyue, whose page source contains the user name oWuGuanFengYue123, and extract it with a regular expression.
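The user name lookup could be sketched roughly as follows. This is only an illustration and not the code from the jar: the post does not show which part of the page source contains the user name, so the pattern below assumes a profile link of the form my.csdn.net/<username>, and the class and method names are made up for the example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UsernameExtractor {
    // ASSUMPTION: the desktop page source contains a profile link such as
    // "http://my.csdn.net/<username>". The real markup is not shown in the post.
    private static final Pattern PROFILE_LINK =
            Pattern.compile("my\\.csdn\\.net/([A-Za-z0-9_]+)");

    // Returns the first user name found in the given page source, if any.
    static Optional<String> extractUsername(String html) {
        Matcher m = PROFILE_LINK.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hand-written sample standing in for the downloaded page source.
        String sample = "<a href=\"http://my.csdn.net/oWuGuanFengYue123\">profile</a>";
        System.out.println(extractUsername(sample).orElse("(not found)"));
    }
}
```

Returning an Optional makes the "pattern not found" case explicit, which matters if the page layout changes.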

(2) Use iText to generate PDF from HTML

This is explained in http://blog.csdn.net/noaboutfengyue/article/details/45174787

(3) Get the list of all articles

Testing showed that in http://m.blog.csdn.net/blog/oWuGuanFengYue123?page= the page parameter is the page number of the article list. When the number is larger than the actual number of pages, all articles are displayed. So I use a very large value, 999999: http://m.blog.csdn.net/blog/oWuGuanFengYue123?page=999999 lists every article.
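As a small sketch of this step, the list URL can be built from the user name like this. The URL layout and the value 999999 come from the text above; the class and method names are hypothetical.

```java
class ArticleListUrl {
    // A page number far beyond any real page count, as described in the post.
    static final int LARGE_PAGE = 999999;

    // Builds the mobile list URL that is reported to show every article.
    static String forUser(String username) {
        return "http://m.blog.csdn.net/blog/" + username + "?page=" + LARGE_PAGE;
    }

    public static void main(String[] args) {
        System.out.println(forUser("oWuGuanFengYue123"));
    }
}
```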

(4) Fetch the article list and parse the article content

From the page obtained in (3), all article titles and URLs are extracted with regular expressions, and the HTML source of each article is then fetched from its URL. iText is very strict about the HTML it accepts, so the source must be preprocessed first (this part is still imperfect). For example, <br> causes an error and must be changed to <br/>, along with a few similar fixes. The result is then converted directly to PDF.
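A rough sketch of the two regex steps might look like the following. It is not the code from the jar. The list markup is an assumption (entries of the form <a href="/blog/<user>/<id>">Title</a>), and the clean-up only covers <br>, <hr> and <img>, which is my guess at the kind of tags that need closing; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArticlePages {
    // ASSUMPTION: list entries look like <a href="/blog/<user>/<id>">Title</a>.
    private static final Pattern ARTICLE_LINK =
            Pattern.compile("<a\\s+href=\"(/blog/[^\"/]+/\\d+)\"[^>]*>([^<]+)</a>");

    // Matches <br>, <hr> and <img ...>, with or without a trailing slash.
    private static final Pattern VOID_TAG =
            Pattern.compile("<(br|hr|img)\\b([^>]*?)\\s*/?>", Pattern.CASE_INSENSITIVE);

    // Returns {title, absolute URL} pairs in the order they appear on the page.
    static List<String[]> extractArticles(String listHtml) {
        List<String[]> result = new ArrayList<>();
        Matcher m = ARTICLE_LINK.matcher(listHtml);
        while (m.find()) {
            result.add(new String[] {
                m.group(2).trim(), "http://m.blog.csdn.net" + m.group(1)
            });
        }
        return result;
    }

    // Rewrites the matched tags in self-closing form, e.g. <br> becomes <br/>.
    static String closeVoidTags(String html) {
        return VOID_TAG.matcher(html).replaceAll("<$1$2/>");
    }

    public static void main(String[] args) {
        System.out.println(closeVoidTags("line one<br>line two"));
    }
}
```

Regular expressions are fragile against markup changes, so an HTML parser would be a sturdier choice; regex is used here only because the post describes that approach.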

(5) Add a serial number

To keep the generated PDFs in order, they are numbered starting from the author's first blog post. The generated file names have the form 1.title.pdf.
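The numbering could be done along these lines. This sketch assumes the list page returns the newest article first, so the list is walked backwards, and it replaces characters that are not allowed in Windows file names with an underscore. Both points are my assumptions, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PdfFileNames {
    // ASSUMPTION: titles arrive newest first, so the last entry is post number 1.
    static List<String> numbered(List<String> titlesNewestFirst) {
        List<String> names = new ArrayList<>();
        int total = titlesNewestFirst.size();
        for (int i = total - 1; i >= 0; i--) {
            int number = total - i; // oldest post gets 1
            names.add(number + "." + sanitize(titlesNewestFirst.get(i)) + ".pdf");
        }
        return names;
    }

    // Replaces \ / : * ? " < > | with an underscore.
    static String sanitize(String title) {
        return title.trim().replaceAll("[\\\\/:*?\"<>|]", "_");
    }

    public static void main(String[] args) {
        List<String> titles = new ArrayList<>();
        titles.add("Second post");
        titles.add("First post");
        System.out.println(numbered(titles));
    }
}
```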


Those are the main ideas; some details still need handling.

Instructions for use:

Files are downloaded to /csdn/username under the program's current directory.
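For illustration, the download directory could be resolved like this. Treating the "current program directory" as the JVM working directory (the user.dir system property) is an assumption, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class DownloadDir {
    // Resolves <programDir>/csdn/<username> without touching the file system.
    static Path forUser(Path programDir, String username) {
        return programDir.resolve("csdn").resolve(username);
    }

    // Creates the directory, including missing parents, if it does not exist yet.
    static Path ensureExists(Path dir) throws IOException {
        return Files.createDirectories(dir);
    }

    public static void main(String[] args) {
        // ASSUMPTION: the working directory stands in for the program directory.
        Path dir = forUser(Paths.get(System.getProperty("user.dir")), "lmj623565791");
        System.out.println(dir);
    }
}
```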


A demonstration.

I use this teacher's blog, http://m.blog.csdn.net/blog/lmj623565791, for the demonstration; it is well written.



Processing shows:


The processing is still imperfect in places, but overall it works well and most of the PDFs are generated.

Result:


That's it, haha. Peace at last.

Enough said. I'm going to read the downloaded blogs now and will look again after class.



