Recently, I was asked to show someone how to programmatically log in to Amazon using the Python requests package (as opposed to using a browser-automation tool like Selenium or Mechanize). I thought I’d share how to do this as a blog post.
Step 1)
First, we’ll load the packages we’ll need. In our example, we’ll just be using requests and BeautifulSoup. For more information about either of these packages, see the requests documentation or the BeautifulSoup documentation.
# load packages
import requests
from bs4 import BeautifulSoup
Step 2)
Next, we create a session object. A session lets you reuse a connection to a website while also persisting cookies across requests. Once you’ve logged in to Amazon, this will allow you to remain logged in for anything else you might want to do.
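To see what "persisting cookies" means in practice, here’s a small offline sketch: any cookie stored on a session gets merged into subsequent requests automatically. The cookie name and value below are invented stand-ins, not real Amazon cookies.

```python
import requests

# a stand-in cookie, as if a site had set it on a previous response
session = requests.Session()
session.cookies.set('session-id', 'abc123', domain='www.amazon.com', path='/')

# prepare (but don't send) a request through the session to inspect its headers
prepped = session.prepare_request(requests.Request('GET', 'https://www.amazon.com/'))
print(prepped.headers.get('Cookie'))  # session-id=abc123
```

This is exactly what happens behind the scenes after login: the cookies Amazon sets in its response are replayed on every later request made through the same session.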
In this step, we’ll also define the headers for the session object.
The URL where you would actually login using a browser is https://www.amazon.com/gp/sign-in.html .
# define URL where the login form is located
site = 'https://www.amazon.com/gp/sign-in.html'

# initiate session
session = requests.Session()

# define session headers
session.headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.61 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': site
}
Step 3)
Now, we’ll use requests to get the HTML source of the login page. Then, we convert the HTML to a BeautifulSoup object.
# get login page
resp = session.get(site)
html = resp.text

# get BeautifulSoup object of the HTML of the login page
soup = BeautifulSoup(html, 'lxml')
Step 4)
If you were manually logging into Amazon using this webpage, you would type in your username and password. When you click “submit”, your browser takes your login information, along with the values of several hidden HTML input tags, and submits a post request to the login form. These other required tag values can be found by looking at the HTML source of the form.
You can typically view the source of a webpage by right clicking on the page in your browser, and clicking “view page source” (or something similar).
Examining the HTML, we can see that the form’s name attribute is signIn.
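To make the idea of hidden inputs concrete, here’s a made-up, miniature stand-in for a login form (the field names and values are invented for illustration; Amazon’s real form has different, and more, fields). BeautifulSoup can pull the hidden values out directly:

```python
from bs4 import BeautifulSoup

# a toy login form with invented field names, for illustration only
html = '''
<form name="signIn" action="/ap/signin">
  <input type="hidden" name="appActionToken" value="abc123" />
  <input type="hidden" name="openid.return_to" value="ape:aHR0cHM=" />
  <input type="email" name="email" />
</form>
'''

soup = BeautifulSoup(html, 'html.parser')

# collect the name/value pairs of every hidden input in the form
hidden = {tag['name']: tag['value']
          for tag in soup.find_all('input', type='hidden')}
print(hidden)  # {'appActionToken': 'abc123', 'openid.return_to': 'ape:aHR0cHM='}
```

The real form also contains visible inputs (like the email field above) that have no value attribute yet, which is why the scraping loop below needs to tolerate missing attributes.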
To easily get the other values required by the form, we’ll use the BeautifulSoup object we’ve just created to loop through the form’s inputs. Since not every input tag has a value attribute, we’ll wrap each loop iteration in a try / except block, so that we can collect every input value we need without the loop exiting with an error.
# scrape login page to get all the inputs required for login
data = {}
form = soup.find('form', {'name': 'signIn'})
for field in form.find_all('input'):
    try:
        data[field['name']] = field['value']
    except KeyError:
        pass
Next, input your username and password, like below, replacing USERNAME with your username and PASSWORD with your password.
# add username and password to the data for the post request
data['email'] = USERNAME
data['password'] = PASSWORD
Step 5)
After we have all the inputs we need, we’ll use them to submit a post request to the signIn form’s action URL (you can find this in the form’s action attribute in the website’s source).
# submit post request with username / password and other needed info
post_resp = session.post('https://www.amazon.com/ap/signin', data=data)
Step 6)
Now, we need to check whether the login was successful. We can do this easily enough by examining the HTML of the post response. First, we get the raw HTML via the .content attribute of post_resp. Then, we check the title tag of the page. If it says ‘Your Account’, you’ve logged in successfully. Otherwise, something may have gone wrong.
post_soup = BeautifulSoup(post_resp.content, 'lxml')

if post_soup.find_all('title')[0].text == 'Your Account':
    print('Login Successful')
else:
    print('Login Failed')
One last thing to do is to close your session. This just takes one line, like below. If, however, you want to scrape other information in your account, you can keep your session open. I may have another blog post in the future discussing this possibility, but that’s it for now.
session.close()
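As an aside, requests sessions also work as context managers, so instead of calling close() yourself you can let a with block do it. A small sketch (the commented-out order-history URL is illustrative, not a documented Amazon endpoint):

```python
import requests

# a Session used as a context manager is closed automatically on exit
with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0'
    # ... log in and scrape here, e.g.:
    # resp = s.get('https://www.amazon.com/gp/css/order-history')  # illustrative URL

# outside the block the underlying connections have been released,
# though the object's attributes are still readable
print(s.headers['User-Agent'])  # Mozilla/5.0
```

This pattern guarantees the session is closed even if one of the requests inside the block raises an exception.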