from bs4 import BeautifulSoup

text = """
<ul id="navList" class="w1">
<li><a id="blog_nav_sitehome" class="menu" href="https://www.cnblogs.com/">博客园</a></li>
<li><a id="blog_nav_myhome" class="menu" href="https://www.cnblogs.com/jswf/">Home</a></li>
<li><a id="blog_nav_newpost" class="menu" href="https://i.cnblogs.com/EditPosts.aspx?opt=1">新随笔</a></li>
<li><a id="blog_nav_contact" class="menu" href="https://msg.cnblogs.com/send/jswf">联系</a></li>
<li><a id="blog_nav_rss" class="menu" href="https://www.cnblogs.com/jswf/rss/">订阅</a>
<!--<partial name="./Shared/_XmlLink.cshtml" model="Model" /></li>--></li>
<li><a id="blog_nav_admin" class="menu" href="https://i.cnblogs.com/">管理</a></li>
</ul>
<ul>
<li>1213123</li>
</ul>
"""

soup = BeautifulSoup(text, "lxml")

# Find all ul tags with the given class and id, return at most two matches,
# and take element 0 of the resulting list.
ul = soup.find_all("ul", class_="w1", id="navList", limit=2)[0]
# Equivalent: pass the class and id as an attrs dict, then take element 0.
# ul = soup.find_all("ul", attrs={"class": "w1", "id": "navList"})[0]

print(ul)
print(list(ul.strings))           # all text under the ul tag, including newlines
print(list(ul.stripped_strings))  # all non-empty text under the ul tag

a_tags = ul.find_all("a")
for a in a_tags:
    href = a["href"]          # get the href attribute of the a tag
    # href = a.attrs["href"]  # equivalent: attrs is a dict of the tag's attributes
    print(href)
In addition, ul.get_text() serves the same purpose as ul.strings: both return all the text under the ul tag, including spaces and newlines.
The difference is the return type: get_text() returns a single string, while strings returns a generator.
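The difference between the two can be checked with a minimal sketch like the one below. The two-item list is a made-up snippet rather than the navigation markup above, and it uses the built-in "html.parser" instead of "lxml" only so that it runs without an extra dependency; both choices are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, chosen so the whitespace is easy to see
html = "<ul><li> first </li>\n<li>second</li></ul>"
soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul")

text = ul.get_text()        # a single str containing all the text
pieces = ul.strings         # a lazy generator, one item per text node
pieces_list = list(pieces)  # consume the generator to see the items

print(repr(text))
print(pieces_list)

# Joining the generator's items reproduces what get_text() returns
print("".join(pieces_list) == text)

# stripped_strings skips whitespace-only items and trims the rest
print(list(ul.stripped_strings))
```

If only the cleaned text is needed as one string, get_text() also accepts separator and strip arguments, for example ul.get_text(separator="|", strip=True).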