Get all the values with the same class name in Selenium

Usman khan :

I want to get the names and URLs of all the articles that share the same class name. The problem is that my code prints the same item again and again instead of all the articles.

from selenium import webdriver
driver = webdriver.Chrome(r'C:\Users\muhammad.usman\Downloads\chromedriver_win32\chromedriver.exe')
driver.get('https://www.aljazeera.com/news/')
# to get the current location ...
driver.current_url
button = driver.find_element_by_id('btn_showmore_b1_418')
driver.execute_script("arguments[0].click();", button)
content = driver.find_element_by_class_name('topics-sec-block')
print(content)
container = content.find_elements_by_xpath('//div[@class="col-sm-7 topics-sec-item-cont"]')
print(container)
i=0
for i in range(0, 12):
    title = []
    url = []
    heading=container[i].find_element_by_xpath('//div[@class="col-sm-7 topics-sec-item-cont"]/a/h2').text
    link = container[i].find_element_by_xpath('//div[@class="col-sm-7 topics-sec-item-cont"]/a')
    title.append(heading)
    url.append(link.get_attribute('href'))
    print(title)
    print(url)
    i += 1
names = driver.find_elements_by_css_selector('div.topics-sec-item-cont')
for name in names:

    heading=name.find_element_by_xpath('//div[@class="col-sm-7 topics-sec-item-cont"]/a/h2').text
    link = name.find_element_by_xpath('//div[@class="col-sm-7 topics-sec-item-cont"]/a')
    print(heading)
    print(link.get_attribute('href'))
chitown88 :

I'm not sure what the issue is (I know a little about Selenium but am more versed in BeautifulSoup), so I fed the page HTML into a BeautifulSoup object and then iterated through the matching elements:

from selenium import webdriver
from bs4 import BeautifulSoup

# Written against the Selenium 3 API; recent Selenium 4 releases drop the
# find_element_by_* helpers in favor of driver.find_element(By.ID, ...).
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
driver.get('https://www.aljazeera.com/news/')

# Click the "show more" button via JavaScript.
button = driver.find_element_by_id('btn_showmore_b1_418')
driver.execute_script("arguments[0].click();", button)

# Hand the rendered HTML to BeautifulSoup and select every article container.
soup = BeautifulSoup(driver.page_source, 'html.parser')
container = soup.select('div.topics-sec-item-cont')

titleList = []
urlList = []
for item in container:
    heading = item.find('h2').text
    link = item.find('a')['href']  # first <a> in the container; site-relative href
    titleList.append(heading)
    urlList.append(link)
    print('HEADLINE: %s\nUrl: https://www.aljazeera.com%s\n' % (heading, link) + '-' * 70 + '\n')

# End the browser session once scraping is done.
driver.quit()
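
The print call above builds each full URL by prefixing the site root onto the href by hand. A minimal sketch of the same step with the standard library's urllib.parse.urljoin, which also leaves an already-absolute href untouched. BASE_URL and absolutize are illustrative names, not part of the original answer, and the sketch assumes the scraped hrefs are relative to the site root:

```python
from urllib.parse import urljoin

# Assumption: hrefs scraped from the page are relative to the site root.
BASE_URL = 'https://www.aljazeera.com'


def absolutize(href, base=BASE_URL):
    """Return an absolute URL for href, resolving relative paths against base."""
    return urljoin(base, href)


if __name__ == '__main__':
    # A site-relative href gets the base prepended.
    print(absolutize('/topics/regions/africa.html'))
    # An href that is already absolute is returned unchanged.
    print(absolutize('https://www.aljazeera.com/news/'))
```

In the loop, this would mean storing absolutize(link) in urlList instead of the raw href.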

Output:

HEADLINE: Trump's Remain in Mexico policy endangers migrants headed to US
Url: https://www.aljazeera.com/news/2020/03/trumps-remain-mexico-policy-endangers-migrants-headed-200306102155930.html
----------------------------------------------------------------------

HEADLINE: India, South Korea report new coronavirus cases: Live updates
Url: https://www.aljazeera.com/topics/events/coronavirus-outbreak.html
----------------------------------------------------------------------

HEADLINE: Clashes between Greek police, migrants reported on Turkish border
Url: https://www.aljazeera.com/topics/subjects/refugees.html
----------------------------------------------------------------------

HEADLINE: Congo protests against unpaid pensions as gov't debt balloons
Url: https://www.aljazeera.com/topics/regions/africa.html
----------------------------------------------------------------------

HEADLINE: Is India prepared for coronavirus outbreak?
Url: https://www.aljazeera.com/topics/events/coronavirus-outbreak.html
----------------------------------------------------------------------

HEADLINE: India protest violence leaves thousands displaced
Url: https://www.aljazeera.com/topics/regions/asia.html
----------------------------------------------------------------------

HEADLINE: Guinea protests: One dead in anti-government demonstration
Url: https://www.aljazeera.com/topics/regions/africa.html
----------------------------------------------------------------------

HEADLINE: Brazil recalls diplomats, officials from Venezuela
Url: https://www.aljazeera.com/topics/country/brazil.html
----------------------------------------------------------------------

HEADLINE: US coronavirus: rise in cases in New York state
Url: https://www.aljazeera.com/topics/events/coronavirus-outbreak.html
----------------------------------------------------------------------

HEADLINE: Australia urged to take action amid rising violence against women
Url: https://www.aljazeera.com/topics/country/australia.html
----------------------------------------------------------------------

HEADLINE: Turkey, Russia announce ceasefire in Syria's Idlib
Url: https://www.aljazeera.com/topics/regions/middleeast.html
----------------------------------------------------------------------

HEADLINE: 'Good morning, Codogno!': A coronavirus radio station in Italy
Url: https://www.aljazeera.com/topics/country/italy.html
----------------------------------------------------------------------
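
On the original symptom: an XPath that begins with `//` is evaluated from the document root even when it is called on a single element, so every `find_element_by_xpath('//div[...]/a/h2')` call in the question's loops returns the first match on the page. A path that starts with `.` is resolved relative to the element instead. The sketch below shows that per-container, relative lookup with the standard library's xml.etree.ElementTree on a small made-up fragment; SAMPLE and extract_articles are hypothetical, and the markup is a stand-in rather than the real page:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the article list; not the real page.
SAMPLE = """
<div class="topics-sec-block">
  <div class="topics-sec-item-cont"><a href="/news/first.html"><h2>First headline</h2></a></div>
  <div class="topics-sec-item-cont"><a href="/news/second.html"><h2>Second headline</h2></a></div>
</div>
"""


def extract_articles(markup):
    """Return (title, href) pairs, one per article container."""
    root = ET.fromstring(markup)
    articles = []
    for item in root.findall(".//div[@class='topics-sec-item-cont']"):
        # './a' is resolved relative to this container, not the whole document.
        link = item.find('./a')
        heading = link.find('./h2')
        articles.append((heading.text, link.get('href')))
    return articles


if __name__ == '__main__':
    for title, href in extract_articles(SAMPLE):
        print(title, href)
```

In the Selenium loop from the question, the corresponding change would be a leading dot, for example `name.find_element_by_xpath('./a/h2')`, assuming the `a` is a direct child of each container as the question's own XPath implies.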
