JMeter (45) - From Beginner to Advanced: JMeter Web Crawler, Part 1 (Detailed Tutorial)

1. Introduction

When I was in college, I heard classmates talk about web crawlers for the first time. Being fairly naive back then, I pictured little electronic bugs crawling across web pages and grabbing things. Later I learned that crawlers are implemented in code, which sounded very sophisticated, and at work I heard about companies getting into trouble over crawling. For a long time, it seemed that building a crawler meant writing code. Today, on a whim, I wanted to see whether a web crawler could be built without writing any code at all. So the topic of this article is how to implement a web crawler in JMeter, with a practical example: crawling the articles on the Cnblogs homepage.

2. Crawler Principle

The principle behind a JMeter crawler is actually very simple: submit a request to a web page, extract all the hrefs from the response, and use a ForEach controller to iterate over the extracted URLs. Pretty clear, right? Below I'll briefly walk through how to set it up.
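The three steps above can be sketched in a few lines of Python. This is not JMeter itself, just a minimal illustration of the same flow; the sample HTML and the regex are assumptions for demonstration:

```python
import re

def extract_links(html):
    """Pull every absolute href out of a response body.

    This mirrors what JMeter's Regular Expression Extractor does when
    the match number is set to -1 ("all matches").
    """
    return re.findall(r'href="(https?://[^"]+)"', html)

# A tiny stand-in for the Cnblogs homepage response:
sample = ('<a href="https://www.cnblogs.com/post1">A</a> '
          '<a href="https://www.cnblogs.com/post2">B</a>')

links = extract_links(sample)
for url in links:          # the ForEach controller's job in JMeter
    print(url)
```

The `for` loop at the end plays the role of the ForEach controller: each extracted URL becomes the target of a follow-up request.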

3. Trying It Out

1. First, following the crawler principle, we need to submit a request to a web page. Let's use Cnblogs as our example. We send a request to the Cnblogs homepage, as shown in the figure below:


2. Look at the View Results Tree and observe the response. You can see that it contains many href attributes carrying URLs and title text, as shown in the figure below:

3. Now we need to extract these URLs, and for that we turn to the power of regular expressions! As shown below:

4. As the figure above shows, what we need can be extracted. Now add a Regular Expression Extractor, and remember to set the match number to -1, which means "extract all matching URLs", as shown in the figure below:

5. Then add a Debug Sampler and run JMeter to confirm that the URLs we want were really extracted, as shown in the figure below:

6. Alternatively, we can test the regular expression directly against the response in the results tree; you can see that many page links have been matched, as shown in the figure below:

7. Next we use the ForEach controller to traverse the extracted URLs and trigger a request for each one. Remember to fill in the input variable name in the controller, which is the reference name from the Regular Expression Extractor we just added, as shown in the figure below:
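Under the hood, the extractor with match number -1 stores its results as numbered JMeter variables (`url_1`, `url_2`, ..., plus `url_matchNr` for the count), and the ForEach controller walks them, exposing each value under an output variable name. A rough Python analogue, where the names `url` and `returnUrl` are hypothetical examples rather than required values:

```python
# Simulated JMeter variable table after the extractor runs:
variables = {
    "url_1": "https://www.cnblogs.com/a",
    "url_2": "https://www.cnblogs.com/b",
    "url_matchNr": "2",
}

def foreach(vars_, prefix, output):
    """Mimic the ForEach controller: walk <prefix>_1 .. <prefix>_N
    and expose each value under the output variable name."""
    count = int(vars_[f"{prefix}_matchNr"])
    for i in range(1, count + 1):
        vars_[output] = vars_[f"{prefix}_{i}"]  # what ${returnUrl} would hold
        yield vars_[output]

for u in foreach(variables, "url", "returnUrl"):
    print("would request:", u)
```

Inside the loop, the HTTP Request sampler references the output variable (e.g. `${returnUrl}`) as its target URL.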

8. Add an HTTP Request sampler under the ForEach controller; it will send the request for each URL, as shown in the figure below:

9. Re-run JMeter and observe the results; it's time to witness the miracle. All of the matched URLs were requested, as shown in the figure below:

10. Change the response display format to HTML to view the crawled articles more comfortably, as shown in the figure below:


That wraps up the first part of JMeter's web crawler. Isn't it simple? Go give it a try!

4. Summary

Pay attention to the regular expression: at first I left out the question mark (the non-greedy qualifier), so the captured URL dragged in a trailing `target` attribute string and the request failed. Also note that https has an "s"; get that wrong and the request will fail too. Here we only crawled the articles on the Cnblogs homepage; if you are interested, try it yourself and crawl the articles on pages 1, 2, and 3.
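The greedy-versus-non-greedy pitfall mentioned above is easy to reproduce. With a sample anchor tag assumed for illustration, `.*` runs to the last quote on the line and swallows extra attributes, while `.*?` stops at the first closing quote:

```python
import re

tag = '<a href="https://x" target="_blank">'

# Greedy: .* captures up to the LAST double quote, dragging the
# target attribute into the "URL" and breaking the request.
greedy = re.search(r'href="(.*)"', tag)

# Non-greedy: .*? stops at the FIRST closing quote.
lazy = re.search(r'href="(.*?)"', tag)

print(greedy.group(1))  # broken: https://x" target="_blank
print(lazy.group(1))    # clean:  https://x
```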


Origin blog.csdn.net/Faith_Lzt/article/details/132982783