Sorry it’s been a while; holidays, tests and homework have been clogging up a lot of my time recently. I decided to do a post today about an API I’ve been working with, the Amazon Product Advertising API. As part of my Maths GCSE coursework, I need to collect data about book costs, and whilst it would have been fine to collect 100 books from two genres and compare whether one is more expensive, I decided to go with a larger sample: 16,000 in this case. I thought it went quite well, and this is how I did it.
First of all, why did I choose the Amazon API? Well, despite there being a fairly wide range of alternatives (Google and eBay being large contenders), Amazon stores a surprisingly vast amount of data on each book, including factors like weight and height. You can also be very precise about what you search for, and not just for books but for anything.
Step 1: Collect an API Key (and a secret code + affiliates account)
To get into the API, I had to sign up in various places to get the API key. Despite being asked for my credit card details on numerous occasions, I somehow dodged giving anything away to Amazon. If you’re a developer doing something along these lines, feel free to contact me about how to get a key.
In this case, I chose the official Python API, which is hosted on GitHub so you can read the source code in your browser. This API is far simpler to use, and offers good functionality, compared to working with an XML parser yourself. It was a simple install via pip, which is mentioned on the page.
To make my script, I had to learn the raw basics of how to run a search request via the API. First of all, you authenticate with Amazon to prove who is on the other side. Afterwards, you tell Amazon what you want it to give you. It will then hand back a mass of data, which it is your task to process.
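The three steps above (authenticate, ask, process) can be sketched without touching Amazon at all. This is only an illustration: `FakeClient`, its canned results and the key strings are hypothetical stand-ins, not part of the real module.

```python
class FakeClient(object):
    """Hypothetical stand-in for the real API client, so this runs offline."""

    def __init__(self, access_key, secret_key):
        # Step 1: "authenticate" by handing over the keys.
        self.credentials = (access_key, secret_key)

    def item_search(self, index, **params):
        # Step 2: ask for something. The real call would go over the network;
        # here we just return canned data with prices in pence.
        return [
            {"Title": "Book A", "ListPriceAmount": 699},
            {"Title": "Book B", "ListPriceAmount": 1000},
        ]


client = FakeClient("HYPOTHETICAL_KEY", "HYPOTHETICAL_SECRET")
results = client.item_search("Books", Keywords="Fiction")

# Step 3: process the mass of data into something useful (pounds, not pence).
prices = [item["ListPriceAmount"] / 100.0 for item in results]
print(prices)
```

The real client returns much richer objects than these dictionaries, but the shape of the work is the same.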
Example Script (Close to what I used) broken down:
In this case, I linked to a Google Spreadsheet so I could see data being added live. Also, for this example I will only be collecting the prices of one genre of books; using these concepts, however, you can expand it to do whatever you like.
import gspread
from amazonproduct.api import API
This imports gspread, a Google API module which lets me write the data somewhere. I also imported the Amazon API, which I will need to run this.
api = API(locale='uk', cfg="config.txt")
This lets Amazon know I’m using Amazon.co.uk, not .com (US), and then lets me “login”.
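For anyone wondering what goes in config.txt: the sketch below assumes an INI-style file with a `[Credentials]` section holding `access_key`, `secret_key` and `associate_tag`. That layout is my assumption about what the module expects, so check its documentation before relying on it. The values are placeholders, and the code (Python 3) only parses the text to show its structure.

```python
import configparser

# Assumed layout; the values are placeholders, not real credentials.
EXAMPLE_CONFIG = """\
[Credentials]
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY
associate_tag = YOUR_ASSOCIATE_TAG
"""


def read_credentials(text):
    """Parse INI-style text and return the [Credentials] section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["Credentials"])


creds = read_credentials(EXAMPLE_CONFIG)
print(sorted(creds))
```

Keeping the keys in a separate file like this means the script itself can be shared without giving the keys away.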
gc = gspread.login("email@example.com", "hmmmm")
sheet = gc.open("Book Data").sheet1
sheet.update_cell(1, 1, "Hi, Python Script Running")
This opens the spreadsheet which I’ll be modifying. You can either create one with the module or just make one yourself. None of this is necessary if you are not saving your data this way.
publishers = []
publisher_file = open("/Users/MyUserNameIsNotThis/Desktop/Publisher.txt")
publishers = publisher_file.readlines()
Before this, I had created a file containing a list of a few thousand publishers, as this lets me search a wide range of books. It’s a simple extraction into a Python list, one publisher per line.
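One thing to watch: readlines() keeps the newline at the end of every line, so each publisher name arrives with a trailing "\n". Here is a minimal sketch of a loader that strips it and skips blank lines. The helper name `load_publishers` and the sample publisher names are made up for illustration, and `io.StringIO` stands in for the real file so it runs anywhere.

```python
import io


def load_publishers(file_obj):
    """Return one publisher name per non-empty line, without the newline."""
    return [line.strip() for line in file_obj if line.strip()]


# io.StringIO stands in for open(".../Publisher.txt") in this sketch.
sample = io.StringIO(u"Penguin\nHarperCollins\n\nBloomsbury\n")
publishers = load_publishers(sample)
print(publishers)
```

With a real file you would pass the result of open(...) instead of the StringIO object.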
loc = 0
When writing into the spreadsheet, I need to know the row (the y location) I’m writing to, which needs to be stored as a variable. If I were storing multiple genres, it would be a list (locs = [0, 0, 0, 0…]).
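For the multi-genre case, here is a rough sketch of keeping one row counter per genre. I’ve swapped the list for a dictionary keyed by genre name, which is a different choice from the locs list mentioned above; the genre names, column numbers and the helper `next_cell` are all hypothetical.

```python
# Hypothetical mapping of genre -> spreadsheet column.
columns = {"Fiction": 3, "History": 4}

# One row counter per genre, all starting at 0 like loc does.
next_row = {genre: 0 for genre in columns}


def next_cell(genre):
    """Advance that genre's row counter and return the (row, column) to write."""
    next_row[genre] += 1
    return next_row[genre], columns[genre]


cells = [next_cell("Fiction"), next_cell("Fiction"), next_cell("History")]
print(cells)
```

A dictionary saves having to remember which position in the list belongs to which genre.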
def search(genre, publisher, x):
This is the start of a Python function. It’s kind of like a way of letting Python do a task again and again, just with different ingredients. The only reason I use a function here is that if I wanted to expand to multiple genres (which I did), I could do so with ease just by changing one of the arguments.
    global loc  # loc is reassigned below, so it must be declared global
    try:
        results = api.item_search("Books", Keywords=genre, Publisher=publisher, ResponseGroup="Medium")
        for book in results:
            data = float(book.ItemAttributes.ListPrice.Amount) / 100  # pence to pounds
            loc += 1
            sheet.update_cell(loc, x, data)  # row loc, column x
            print("I read a book")
    except Exception:
        pass  # skip anything that can't be analysed
To break this down: it runs a try/except statement (if there can be an error, there will be an error) so that books it can’t analyse properly are ignored. It then runs a search (api.item_search) of Books with my chosen genre and publisher, and loops through each book, collecting the price (ListPrice) as it goes. If it collects a price, it moves down to the next row on the spreadsheet and places the value there.
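One way to skip unusable books without abandoning the rest of a result set is to put the try/except inside the loop, around each book. The sketch below shows the idea with plain dictionaries standing in for the objects Amazon returns; `extract_prices` and the sample data are hypothetical.

```python
def extract_prices(books):
    """Return prices in pounds, skipping books without a usable list price."""
    prices = []
    for book in books:
        try:
            pence = int(book["ListPrice"]["Amount"])
        except (KeyError, TypeError, ValueError):
            continue  # no usable price for this book, move on to the next
        prices.append(pence / 100.0)
    return prices


sample_books = [
    {"ListPrice": {"Amount": "799"}},
    {},  # no ListPrice at all
    {"ListPrice": {"Amount": "n/a"}},  # price present but not a number
    {"ListPrice": {"Amount": "1250"}},
]
prices = extract_prices(sample_books)
print(prices)
```

Catching only the specific errors you expect also means a genuine bug elsewhere will still show up instead of being silently swallowed.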
for i in publishers:
    search("Fiction", i.strip(), 3)  # strip() removes the newline left by readlines()
For each publisher in the list, this searches for any Fiction books that publisher has released. If there are any, it records the prices (i being the publisher passed to the function) and places them in the 3rd column.
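Expanding this loop to several genres could look something like the sketch below. It is not the exact script I ran: `run_searches` is a hypothetical helper, `fake_search` just records what it was asked to do so the example runs offline, and the genres, columns and publisher names are invented.

```python
def run_searches(genre_columns, publishers, search):
    """Call search(genre, publisher, column) for every genre/publisher pair."""
    for genre, column in sorted(genre_columns.items()):
        for publisher in publishers:
            search(genre, publisher.strip(), column)


calls = []


def fake_search(genre, publisher, column):
    # Stand-in for the real search function; it only logs its arguments.
    calls.append((genre, publisher, column))


run_searches({"Fiction": 3, "History": 4}, ["Penguin\n", "Bloomsbury\n"], fake_search)
print(calls)
```

Passing the search function in as an argument makes it easy to test the loop without spending any API requests.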
With a little tweaking, this is what I got as a result. 😀
Thanks for reading, I hope you’re enjoying 2014 so far! Feel free to comment.