Teach you how to use Python to create a smart search for Taobao products!

 

/1 Introduction/

With the rise of online shopping, many traditional shops have transformed to online businesses, and the emergence of e-commerce has greatly facilitated our lives.

/2 Project goal/

Search and go directly to the destination with one-click through the Python program, crawl Taobao product links, product names, and product image links, and record each operation in the log file.

/ 3 Project preparation/

Use the sublime text 3 editor to write the program, first look at the main interface after the program runs:

/4 Project realization/

1. Analyze the page structure and put the product information in their respective lists. Take the following shop as an example.

2. The old way, F12, because we are looking for product links in the store, so we find as many products as possible. From the perspective of the layout of the store, it seems that the baby recommends more products in this section, so we will Climb all the content in this section.

3. Steps 1, 2 and 3 in the figure are the various information of the products we want to crawl. It can be seen that the products are all in the dt tag with the class photo, so we need to extract them.

try: 


              urllib3.disable_warnings() #Remove warnings from urllib3 


              #Web request 


              rep=requests.get(self.e2.get(),verify=False,timeout=4) #Certificate verification is set to FALSE, set access delay 


              rep .encoding='gbk' 


              soup=BeautifulSoup(rep.content,'html.parser') 


              result=soup.find_all('dt',class_='photo') #Get all dt elements whose class is photo 


for x in result : 


                    tt=x.find_all('a') #Get all the child elements under dt a 


for y in tt: 


for x in y: 


                               ab=x.find_next_siblings('img') #Get all the next sibling elements img 


for z in ab: 


                                     \#Add the product name and product picture link to the lists aa and bb 


                                     aa.append (z ['old'])


                                     bb.append('https:'+z['data-ks-lazyload']) 


                          cc.append('https:'+y['href'])#Add the product link to the list cc 


except: 


return

In this way, we can easily obtain the product link, product name, and product image link, and then save them in the aa, bb, and cc lists.

/ 5 Writing GUI interface, interactive and friendly/

In order to make the running result more beautiful, we need to make a GUI interface, which has to mention the Python built-in GUI artifact tkinter.

Okay, let's get back to the subject, we can write an interactive interface to encapsulate it into a class, which is more beautiful.

class page: 


def __init__(self): 


self.ti=dt.now().strftime("%Y/%m/%d %H:%S:%M") 


self.root = tk.Tk() # Initialize the window 


self.root.title('Taobao Get Merchant Baby V1.0') 


#Window name self.root.geometry("700x700") 


#Set window size self.root.iconbitmap('q.ico') 


self.root .resizable(width=True,height=True) #Set whether the window is variable, width is not variable, height is variable, the default is True 


         \ #Create label, text, background color, font (color, size), label height and Wide 


self.label1 =tk.Label(self.root,text='Shop Homepage:',font=('宋体',10),width=12,height=2) 


         \ #Create input box, label height, font size Color, content display method 


self.e2 = tk.Entry(self.root,width=30,show=None, font=('Arial', 12)) # Display in plain text 


self.label2 =tk.Label(self. root,text='Taobao Direct:',font=('宋体',10),width=12,height=2) 


self.e1 = tk.Entry(self.root,width=30,show=None, font=('Arial', 12))


         \ 


#Create button content width and height button binding event self.b1 = tk.Button(self.root, text='parse page', width=8,height=1,command=self.parse) 


self.b2 =tk. Button(self.root, text='Generate excel', width=8,height=1,command=self.sc) 


self.b3 =tk.Button(self.root, text='Taobao search', width=8, height=1,command=self.search) 


self.b4 =tk.Button(self.root, text='Close the program', width=8,height=1,command=self.close) 


self.b5 =tk.Button (self.root, text='Save log', width=8,height=1,command=self.log) 


         \ #Create a text box 


self.te=tk.Text(self.root,height=40) 


self.label1 .place(x=140,y=30,anchor='nw') 


self.label2.place(x=138,y=70,anchor='nw') 


         \ 


#Add all parts to the interface self.e1. place(x=210,y=74,anchor='nw') 


self.e2.place(x=210,y = 34, anchor = 'nw')


self.b1.place(x=160,y=110,anchor='nw') 


self.b2.place(x=240,y=110,anchor='nw') 


self.b3.place(x=320, y=110,anchor='nw') 


self.b4.place(x=400,y=110,anchor='nw') 


self.b5.place(x=480,y=110,anchor='nw') 


self.te.place(x=40,y=170,anchor='nw') 


         \ 


#Set the start text of the input box self.e1.delete(0, "end") 


self.e1.insert(0, "Please enter The product to be searched") 


self.root.mainloop()

Even if the GUI interface is created, the effect diagram is as follows:

/ 6 Enter the homepage address of the target shop, generate data and export to Excel and log records/

1. Get the data by entering the homepage address of the Taobao shop, so we need to make a judgment on the program, because we encapsulate it in a class, so we need to add a self in each function bracket, the code is as follows:

# Parse web content 




def parse(self): 


self.res() 


if self.e2.get()=='': #Determine whether the value of the input box is empty 


             \ 


#Insert the value into the text box self.te.insert ('insert','.... Please enter the URL.... \n') 


       elif str(self.e2.get()).find('taobao.com')==-1 or aa==' ': 


self.te.insert('insert',' ... The address is incorrect...\n') 


else: 


self.te.insert("insert","Parsing the target webpage: %s\n\ n"%self.e2.get()) 


self.te.insert("insert",".... The start of parsing:.... \n") #INSERT index indicates the current position of the insert cursor 


self. te.insert("insert","\n\n") 


for x,y,z in zip(aa,bb,cc): #Combine the list where the data is located 


                  result=x+'\n'+y+'\n' +z+'\n\n' 


self.te.insert("insert",result,"\n\n") 


#Insert into the text box self.te.insert("insert","\n\n") #Insert a space


self.te.insert("end","The parsing is complete...\n")

2. Generate an Excel file, the code is as follows:

# Save the result to excel 


def sc(self): 


self.te.insert("insert"," ... Start generating: ...\n") 


       av={'Time':self.ti, 'Product name':aa,'Product link':cc,'Product picture link':bb} 


       \#Generate dataframe multidimensional array 


       df=p.DataFrame(av,columns=['time','product name','product Link','Product image link'],index=range(len(aa))) 


       df.to_excel('22.xlsx', sheet_name='taobao') 


#Generate excel self.te.insert("end"," ... The generation is complete.... \N")

After the code is run, the following effect is obtained:

3. Generate a log file, the code is as follows:

# Save log 
def log(self): 


       ss=str(self.te.get(0.0,'end')).split('\n') #Separate text box content 


       with open('1.txt','w ',encoding='utf8') as f: 
                  #Save log 


for y in range(len(ss)): 

rea=str(self.ti)+ss[y]+'\n' 


                  f.write(rea)

After the code is run, the following effect is obtained:

/8 Quick search Taobao commodity webpage direct program is closed/

To search Taobao products with one click, we first find the search address of Taobao, and then make a get request and pass different values ​​to him. General search will involve a keyword search.

Here we first find the search entry of Taobao, the address is:

https://s.taobao.com/search?q=

Then pass the value later, because we want to browse in the browser, so we need to use the webbrowser module, which is specifically responsible for accessing the page, and its usage is webbrowser.open(url).

So the code is as follows:

# Search products 
def search(self): 


self.te.insert("insert",".... Open the browser:.... \N") 


  wb.open('https://s.taobao .com/search?q='+self.e1.get()) #Open the browser

The final step is to close the program. code show as below:

# Close the program 
def close(self): 


self.te.insert("insert","..... Close the program:.... \N") 


self.root.destroy() #Destroy the window

/9 Summary/

1. It is not recommended to grab too much data, which may cause load on the server, just try it briefly.

2. This article is based on the Python web crawler, using the crawler library, to create a simple and intelligent Taobao search system, and can be operated to generate logs.

3. This system seems very simple, but in fact it is a big challenge for the novice Xiaobai. Even some big guys are easy to fall into the hole. The main page analysis is a bit complicated and changeable, and there will be many abnormalities that will make you trapped. . Generally speaking, it's a pretty good practice project, and it's a test for yourself, I hope you like it.

4. If you need the source code of this article, please click the "  source code " keyword to get it. You think it's good, remember to give a star.

 

Guess you like

Origin blog.csdn.net/weixin_43881394/article/details/109027001