Trying to scrape a page with login credentials.
payload = {
'email': '*******@gmail.com',
'password': '***'
}
urls = []
login_url = 'https://www.spotrac.com/signin/'
url = 'https://www.spotrac.com/nba/contracts/breakdown/2010/'
webpage = requests.get(login_url, payload)
content = webpage.content
soup = BeautifulSoup(content)
a = soup.find('table',{'class':'datatable'})
urls.append(a)
This is my first time scraping a page with credentials, and can't seem to figure out how to properly enter them.
looked at: http://3.python-requests.org/user/advanced/#session-objects looked at: https://mechanicalsoup.readthedocs.io/en/stable/tutorial.html looked at several stackoverflow answers as well.
I searched for a csrf token on the source page and nothing comes up. I know that scraping page with a login is specific to each website; can anybody inspect this particular login site and see where I can improve this code?
The tricky part is that it uses cookie with Google Analytics, and requests doesn;t receive those to use in the headers. However you can get those cookies by logging in using Selenium. Once you get the cookies that way, you can then use that with the requests
module to go through the pages.
I haven't exactly figured out how to work around the popup ads (so sometimes this will work, sometimes you'll need to try to run it again), but it seems like once you get past the initial log in, it'll work. Since it does have to go to every players link, it does take about 2-3 minutes or some to run through the total list of 375 players from 2010-2020:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
import re
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
driver = webdriver.Chrome("C:/chromedriver_win32/chromedriver.exe")
# Use Selenium to login and get all cookies
loginURL = 'https://www.spotrac.com/signin/'
username = 'xxxxxx'
password = 'xxxxxx'
driver.get(loginURL)
try:
# Wait for cookie message
accept_cookie = WebDriverWait(driver, 5, 0.25).until(EC.visibility_of_element_located([By.CSS_SELECTOR, '.cookie-alert-accept']))
accept_cookie.click()
print("Cookies accepted")
except TimeoutException:
print("no alert")
try:
# Wait for cookie message
popup = WebDriverWait(driver, 5, 0.25).until(EC.visibility_of_element_located([By.CSS_SELECTOR, '.cls-btn']))
popup.click()
except TimeoutException:
print("timed out")
time.sleep(5)
driver.find_element_by_name("email").send_keys(username)
driver.find_element_by_name("password").send_keys(password)
submit = WebDriverWait(driver, 100).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="contactForm"]/div[2]/input')))
submit.click()
print ('Logged in!')
# Now that the cookies are there, can use requests to iterate through the links
for seas in range(2020, 2009, -1):
print(seas)
url = 'https://www.spotrac.com/nba/contracts/breakdown/%s/' %seas
driver.get(url)
playerDict = {}
soup = BeautifulSoup(driver.page_source, 'html.parser')
players = soup.find_all('td',{'class':'player'})
for player in players:
name = player.find('a').text
link = player.find('a')['href']
if name not in playerDict.keys():
playerDict[name] = link
results = pd.DataFrame()
count = 1
for player, link in playerDict.items():
driver.get(link)
dfs = pd.read_html(driver.page_source)
df = pd.DataFrame()
for i, table in enumerate(dfs):
if len(table.columns) == 2 and len(table) == 5:
idx = i
temp_df = table.T
temp_df.columns = temp_df.iloc[0]
temp_df = temp_df.rename(columns={'Average Salary:':'Avg. Salary:','Avg Salary:':'Avg. Salary:'})
try:
seasonContract = dfs[idx-2].iloc[0,0]
year = re.findall(r"\d\d\d\d-\d\d\d\d",seasonContract)[0]
seasonContract = year + ' ' + re.split(year, seasonContract)[-1]
except:
seasonContract = 'Current Contract'
temp_df['Player'] = player
temp_df['Contract Years'] = seasonContract
df = df.append(temp_df.iloc[1:], sort=False).reset_index(drop=True)
results = results.append(df,sort=False).reset_index(drop=True)
print ('%03d of %d - %s data aquired...' %(count, len(playerDict), player))
count += 1
driver.close()
Output:
print (results.head(25).to_string())
0 Contract: Signing Bonus: Avg. Salary: Signed Using: Free Agent: Player Contract Years
0 2 yr(s) / $48,500,000 - $24,250,000 Bird 2016 / UFA Kobe Bryant 2014-2015
1 3 yr(s) / $83,547,447 - $27,849,149 Bird 0 / Kobe Bryant 2011-2013
2 7 yr(s) / $136,434,375 - $19,490,625 NaN 2011 / UFA Kobe Bryant 2004-2010
3 5 yr(s) / $56,255,000 - $11,251,000 NaN 2004 / UFA Kobe Bryant 1999-2003
4 3 yr(s) / $3,501,240 - $1,167,080 NaN 0 / Kobe Bryant 1996-1998 Entry Level
5 2 yr(s) / $2,751,688 - $1,375,844 Minimum 2014 / UFA Rashard Lewis 2012-2013
6 1 yr(s) / $13,765,000 - $13,765,000 NaN 0 / Rashard Lewis 2012-2012
7 6 yr(s) / $118,200,000 - $19,700,000 NaN 2013 / UFA Rashard Lewis 2007-2012
8 4 yr(s) / $32,727,273 - $8,181,818 NaN 2007 / UFA Rashard Lewis 2003-2006
9 3 yr(s) / $14,567,141 - $4,855,714 NaN 2003 / UFA Rashard Lewis 2000-2002
10 2 yr(s) / $672,500 - $336,250 NaN 2000 / RFA Rashard Lewis 1998-1999 Entry Level
11 2 yr(s) / $10,850,000 - $5,425,000 Bird 2017 / UFA Tim Duncan 2015-2016
12 3 yr(s) / $30,361,446 - $10,120,482 Bird 2015 / UFA Tim Duncan 2012-2014
13 4 yr(s) / $40,000,000 - $10,000,000 NaN 2012 / UFA Tim Duncan 2010-2011
14 7 yr(s) / $122,007,706 - $17,429,672 NaN 2010 / UFA Tim Duncan 2003-2009
15 3 yr(s) / $31,902,500 - $10,634,167 NaN 2003 / UFA Tim Duncan 2000-2002
16 3 yr(s) / $10,239,080 - $3,413,027 NaN 2000 / UFA Tim Duncan 1997-1999 Entry Level
17 2 yr(s) / $16,500,000 - $8,250,000 Bird 2017 / UFA Kevin Garnett 2015-2016
18 3 yr(s) / $36,000,000 - $12,000,000 Bird 2015 / UFA Kevin Garnett 2012-2014
19 3 yr(s) / $51,300,000 - $17,100,000 NaN 2012 / UFA Kevin Garnett 2009-2011
20 5 yr(s) / $100,000,000 - $20,000,000 NaN 2009 / UFA Kevin Garnett 2004-2008
21 6 yr(s) / $126,016,300 - $21,002,717 NaN 0 / Kevin Garnett 1998-2003
22 3 yr(s) / $5,397,120 - $1,799,040 Rookie 0 / Kevin Garnett 1995-1997 Entry Level
23 1 yr(s) / $1,308,506 - $1,308,506 NaN 2012 / UFA Michael Redd Current Contract
24 6 yr(s) / $90,100,000 - $15,016,667 NaN 2011 / UFA Michael Redd 2005-2010
....