My experience teaching myself web crawling: when you really want to do something, the whole world will help you

I. Introduction

  Hello everyone, my name is Xiao Sun. I'm so glad that at the age of 20 I came across something I can love for the rest of my life: code.
  The binary combination of 0s and 1s drives the progress of our times, and code holds the power to change the world.

  But...
  My path of study and exploration was not smooth; in fact, it was extremely tortuous.
  What I am proud of is that, however long and hard the road, I never once thought of giving up.
  Although I have only been teaching myself for ten months, along the way I have met many teachers who taught me a lot, made many like-minded friends, and felt the charm of programming and the open, sharing atmosphere of the programmer community.
  I have also had some small achievements. One after another, I opened my own CSDN, Zhihu, and WeChat official accounts, and I started building my own personal website: https://www.sunguoqi.com

  I also registered my own GitHub account: https://www.github.com/sun0225SUN

  Technically, I can now independently develop some simple web projects:
  https://www.bilibili.com/video/BV15L4y1E7xY
  (This was my first recording, so the sound quality is poor. Sorry about that!)

  I can also write some basic crawler programs.

  During my studies, I experienced firsthand what it means to stand on the shoulders of giants, and finally understood why the Turing publishing house prints this phrase on every book it publishes.

  I am truly grateful to everyone who has helped me, directly or indirectly, during my studies.
  So I plan to write a series of articles to record every step of my learning journey and to express my gratitude to every teacher, classmate, and friend who has helped me.

II. Main Text

  This is the first article. It is about my experience of teaching myself web crawling, my connection with the book "Python3 Web Crawler Development", and some small stories about Teacher Cui Qingcai.

1. About my experience of learning web crawling

  When I learn a technology, I usually buy related technical books, work through high-quality courses on Bilibili, read articles in technical communities, read the official documentation, and freely study the code of open source projects on GitHub.
  But my road to learning web crawling was not so smooth.

2. About me and the book "Python3 Web Crawler Development"

  I first went to Bilibili to get a basic idea of what a crawler is, and then looked for high-quality books, both e-books and paper books.
  When choosing books about crawlers, I found that there were not many on the market. Compared with the flood of "From Beginner to Expert" books on other topics, crawler books were really scarce.
  This made it hard for me to find learning material, and the few crawler books available were generally not very detailed. (I bought them after reading the table of contents on the product page; at the time I didn't realize the school library was such a treasure. I spent a lot of money, then found that the authors' approach didn't suit the way I learn, so I flipped through each book two or three times and left it on the shelf to gather dust.)
  Of course, I'm not saying these books are bad; I'm saying they don't suit me, or at least not at my current stage.
  For a beginner, highly condensed summaries skip the transitional steps, which makes it easy to give up halfway.
  After repeated trial and error, I finally came across this book by Mr. Cui Qingcai: "Python3 Web Crawler Development".

  At that time the second edition wasn't on the market yet, so I bought the first edition. I'm ashamed to say I bought a pirated copy, and the print quality was poor.
  I won't say much about piracy here. The brisk sales of pirated books in China do spread knowledge to some extent, but they are undeniably a disregard for, and a trampling of, intellectual property rights.
  As for the content of "Python3 Web Crawler Development", I dare say it is more detailed than any other book I've seen on the market, as its thickness alone suggests.

  If you say it looks bloated because it caters too much to beginners, I think that is true from a certain point of view.
  For someone who already has crawler experience, there's really no need to cover that basic knowledge (for example, the whole chapter in the first edition on installing third-party Python libraries).
  But from a beginner's point of view, this so-called bloat is a ladder that leads step by step to the door of success.
  I believe Cui Da (a friendly nickname for Cui Qingcai) also considered this and made a deliberate trade-off.

A small regret
  But I didn't finish the first edition; I only learned enough to write small crawlers.
  Because I had bought a pirated copy, the print quality was really poor, which hindered my study to some extent.
  Later, I found Cui Da's personal blog:
  https://www.cuiqingcai.com
  It turned out the blog has a series of posts on "Python3 Web Crawler Development" (the blog posts probably came before the book).
  The content of the blog posts is broadly the same as the book, but there are small flaws in the Markdown typesetting, which also bothered me a bit.
  The pirated copy and the typesetting problems indirectly led to my not finishing the first edition, and I still regret it. (Of course there were other reasons too, such as my own study plan and schedule.)

3. About me and the second edition of the book "Python3 Web Crawler Development"

  Because of the groundwork laid by the first edition, I followed Mr. Cui Qingcai's two WeChat official accounts, [Coder of Attack] and [Cui Qingcai | Jing Mi]. I read almost every article Cui Da publishes; they are genuinely useful, and I can feel how much care Teacher Cui puts into them.

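  Because I read basically every article, I followed the complete timeline: [Cui Da's new book receives a recommendation from Guido van Rossum, the father of Python] --> [New book cover confirmed] --> [New book content introduced] --> [First signing of 1,000 books] --> [Official launch] --> [Knowledge Planet promotion] --> [Second signing of 1,000 books] --> [Filling in shipping information in the Planet] --> [Daily reminders while waiting for shipment] --> [Package received, unboxing posted to WeChat Moments].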

  I remember Cui Da said at the time that the book would be out in time for this year's Double Eleven shopping festival, but in the end it was delayed and officially launched on November 26.
  On launch day, Cui Da also ran a promotion. I saw the option of joining his Knowledge Planet (a paid community platform) for 99 yuan and receiving a signed copy, so I signed up without hesitation. (A rational decision, since the Planet's price later rose to 149 yuan.)
  However, the signed copies for students who joined the Knowledge Planet could only ship after Cui Da signed another 1,000 books, because the 1,000 signed the first time had sold out all at once.
  After signing, the books still had to be shrink-wrapped and packed. I waited more than half a month for my package, and the waiting was really nerve-racking.

4. About the content of the second edition of "Python3 Web Crawler Development"

  When I read a book, I usually look at its table of contents first, because the table of contents is the basic structure of the whole book, much like a project's directory structure.


  Yes, you read that right: more than 900 pages covering basically every aspect of crawler development.
  However, since I'm still studying and haven't finished the book, I can only share Cui Da's own introduction to the contents of the second edition here. When I finish, I'll write another article.
  https://mp.weixin.qq.com/s/66r5s2I-yX6OzGLRJBI0lg

5. My view on Teacher Cui Qingcai's crawler case platform, Scrape

  If you read Cui Da's introduction to the book carefully, you will notice a dazzling highlight:
  the Scrape case platform
  https://scrape.center

  What exactly is it? Here is my understanding.
  We all know that hands-on practice is indispensable when learning to crawl. But if we crawl existing live websites directly, we run into many problems.

  • First, the website may be redesigned, and the code stops working.
      Once the target website is redesigned, code that followed the book's tutorials will no longer run correctly. This is a serious problem and can badly discourage learners.

  • Second, crawling puts load on the target website, which is unhealthy.
      If a crawler generates too much traffic, it can interfere with the site's normal operation. Our goal is to learn a technology, not to damage sites or waste their resources; doing so would be ill-intentioned.

  • Third, it may disregard the copyright of the target website's content, which can easily lead to disputes.
      Websites whose developers, administrators, and owners are willing to share their data usually do so through official channels, in a selfless open source spirit; for example, Douban, Baidu, and the AutoNavi (Amap) open platform all provide corresponding APIs.

  But when we crawl a website directly, we knowingly or unknowingly enter a gray area and may infringe on the site's rights, even with the gentleman's agreement of robots.txt in place.
  Cui Da's own case platform, Scrape, completely solves these problems.

  • First, the book and the case platform are designed to work together, so there's no need to worry about the target site being redesigned.
  • Second, the case platform is Cui Da's own, and he allows anyone to crawl it, bearing the cost himself.
  • Third, the case content involves no commercial activity, so it won't cause any legal problems.

  When I saw a case platform like Scrape, I knew I would largely not need any other crawler books or materials.
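  For practice on live sites rather than on Scrape, the robots.txt "gentleman's agreement" mentioned above can at least be checked in code before crawling. The following is a minimal sketch, assuming Python's standard-library urllib.robotparser; the robots.txt text, the user-agent string my-learning-crawler, the example.com URLs, and the helper names build_parser and allowed_urls are all hypothetical and only for illustration. It parses the rules from a string instead of downloading them, so it runs offline.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# On a real site you would call parser.set_url(".../robots.txt") and parser.read() instead.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 2
"""

USER_AGENT = "my-learning-crawler"  # hypothetical user-agent string


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs that the parsed rules allow this user agent to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    rp = build_parser(SAMPLE_ROBOTS_TXT)
    candidates = [
        "https://example.com/page/1",
        "https://example.com/admin/login",
    ]
    print(allowed_urls(rp, USER_AGENT, candidates))  # ['https://example.com/page/1']
    # crawl_delay() returns the Crawl-delay value (seconds) for this agent, or None.
    print(rp.crawl_delay(USER_AGENT))  # 2
```

  In a real crawler you would also pause between requests, for example with time.sleep(rp.crawl_delay(USER_AGENT) or 1), where the fallback of 1 second is an arbitrary assumption; this addresses the load concern from the second point above. Keep in mind that robots.txt is only a convention, not permission to use a site's content.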

6. About my feeling of reading the book "Python3 Web Crawler Development"

  As the saying goes, reading someone's words is like seeing their face: reading a book is like talking with its author. I love this feeling, and I had it while reading "Python3 Web Crawler Development" too.
  Teacher Cui Qingcai's book reads very smoothly for me. Because I also blog and take notes often, I can clearly follow Mr. Cui's approach to writing the book, which makes my next learning steps clearer; from there I type more code and practice more to improve my technical skills.

7. About the approachability of Mr. Cui Qingcai

  While teaching myself, I joined many study groups. Some of them were truly full of hidden talent: everyone was capable and spoke well. I really like the people in these groups, and chatting with them has been very rewarding.
  Cui Da also runs such a study group.

  What I admire most is that Cui Da often answers group members' questions himself. Does he really have that much time? I'm amazed! (Of course, I know Cui Da also has an alt account, and sometimes I can't tell which one is really him.)
  So think about it: what does such frequent communication bring?
  It makes Cui Da approachable.
  While reading the book, I also have the author's WeChat. When I message him, he doesn't put on airs and replies to me. Isn't this what it feels like when your idol notices you?

8. I want to be like Teacher Cui Qingcai

  In my eyes, Teacher Cui Qingcai and Evan You (You Yuxi), the creator of Vue.js, have both created amazing things.

  Both of them are stars, and I'm a huge fan of these two big names. Chinese developers need leaders like them.
  To be honest, I want to become such a person and contribute to the industry and society as far as I can. It is difficult and a long road, but I will work hard.
  In short, it is foreseeable that the second edition of Cui Da's "Python3 Web Crawler Development" will lead another wave of people to learn web crawling.
  As the father of Python says in his recommendation:

I am happy to see that Python is so widely used in the Chinese IT community. I hope this book will help more people understand Python and web crawling/scraping.

—— Guido van Rossum, creator of Python, Distinguished Engineer, Microsoft

III. Postscript

  Finally, I would like to end this article with a quote from Xiaojiayu ("Little Turtle") of FishC. (Little Turtle, Li Jiayu, was also the teacher who introduced me to programming.)
  We have been working hard to cultivate such simple, honest soil. Although it is not yet perfect, it has begun to take shape.
  Ten years ago we looked up at the starry sky; ten years from now we will look down at the earth. The sky of the future will surely show us a bright dawn!

  May you and I both live up to our ideals, find what we love, and do something with sincerity. As the title of this article says, when you really want to do something, the whole world will help you.


Origin blog.csdn.net/weixin_50915462/article/details/122253788