Sep. 26, 2013

Scraping Wunderground

Overview

Working with APIs is both fun and educational.

Many companies, such as Google, Reddit and Twitter, release their APIs to the
public so that developers can build products powered by their services.

Working with APIs teaches you the nuts and bolts under the hood.

In this post, we will work with the Weather Underground API.

Weather Underground (Wunderground)

We will build an app that connects to Wunderground and retrieves weather
forecasts and related data.
Wunderground provides local and long-range weather forecasts, weather reports,
maps and tropical weather conditions for locations worldwide.

API

An API (application programming interface) is a protocol intended to be used as
an interface by software components to communicate with each other. It is a set
of programming instructions and standards for accessing web-based software
applications (such as the one above).

With APIs, applications talk to each other without any user knowledge or
intervention.

Getting Started

The first thing to do when we want to use an API is to check whether the
company provides any API documentation. Since we want to write an application for
Wunderground, we will go to Wunderground's website.

At the bottom of the page, you should see the "Weather API for Developers". 

The API Documentation

Most of the API features require an API key, so let's go ahead and sign up for
a key before we start to use the Weather API.

In the documentation we can also read that the API requests are made over HTTP
and that Data features return JSON or XML.

To read the full API documentation, see this link.

Before we get the key, we need to first create a free account. 

The API Key

The next step is to sign up for the API key. Just fill in your name, email
address, project name and website, and you should be ready to go.

Many services on the Internet (such as Twitter and Facebook) require that you
have an "API key".

An application programming interface key (API key) is a code passed in by
computer programs calling an API to identify the calling program, its developer,
or its user to the Web site. 

API keys are used to track and control how the API is being used, for example
to prevent malicious use or abuse of the API. 

The API key often acts as both a unique identifier and a secret token for
authentication, and will generally have a set of access rights on the API
associated with it.
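
Because the key doubles as a secret token, one option is to keep it out of the source file. The following is a minimal sketch, assuming the key is stored in an environment variable. The variable name WUNDERGROUND_API_KEY and the helper load_api_key are hypothetical and do not come from the Wunderground documentation.

```python
import os


def load_api_key(env=None, name="WUNDERGROUND_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    # Fall back to the real environment when no mapping is passed in.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise ValueError("Set the %s environment variable first" % name)
    return key


# Demonstration with a stand-in mapping instead of the real environment.
print(load_api_key({"WUNDERGROUND_API_KEY": "your_api_key"}))
```

In a real program you would call load_api_key() without arguments, after setting the variable in your shell.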

Current Conditions in US City

Wunderground provides an example for us in their API documentation.
Current Conditions in US City
http://api.wunderground.com/api/0def10027afaebb7/conditions/q/CA/San_Francisco.json
If you click on the "Show response" button or copy and paste that URL into your
browser, you should see something similar to this:
{
	"response": {
		"version": "0.1"
		,"termsofService": "http://www.wunderground.com/weather/api/d/terms.html"
		,"features": {
		"conditions": 1
		}
	}
		,	"current_observation": {
		"image": {
		"url":"http://icons-ak.wxug.com/graphics/wu2/logo_130x80.png",
		"title":"Weather Underground",
		"link":"http://www.wunderground.com"
		},
		"display_location": {
		"full":"San Francisco, CA",
		"city":"San Francisco",
		"state":"CA",
		"state_name":"California",
		"country":"US",
		"country_iso3166":"US",
		"zip":"94101",
		"magic":"1",
		"wmo":"99999",
		"latitude":"37.77500916",
		"longitude":"-122.41825867",
		"elevation":"47.00000000"
		},
		.....
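
To get a feel for how such a response maps onto Python objects, here is a small offline sketch. It assumes a shortened, hand-copied version of the response above (only a few fields are kept) and decodes it with the standard json module, so no API key or network access is needed.

```python
import json

# A shortened copy of the response shown above (only a few fields kept).
sample = """
{
  "response": {"version": "0.1", "features": {"conditions": 1}},
  "current_observation": {
    "display_location": {
      "full": "San Francisco, CA",
      "city": "San Francisco",
      "state": "CA",
      "latitude": "37.77500916",
      "longitude": "-122.41825867"
    }
  }
}
"""

# JSON objects become dictionaries, so nested fields use chained keys.
parsed = json.loads(sample)
location = parsed["current_observation"]["display_location"]

# The coordinates arrive as strings, so convert them before doing arithmetic.
latitude = float(location["latitude"])
longitude = float(location["longitude"])

print("%s is at %.2f, %.2f" % (location["full"], latitude, longitude))
```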

Current Conditions in Cedar Rapids

On the "Code Samples" page we can see the complete Python code to retrieve the
current temperature in Cedar Rapids.
Copy and paste this into your favorite editor and save it under any name you like,
for example get_current_temp.py.

Note that you have to replace "0def10027afaebb7" with your own API key.
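import urllib2
import json

# Request the geolookup and conditions features for Cedar Rapids, IA.
f = urllib2.urlopen('http://api.wunderground.com/api/0def10027afaebb7/geolookup/conditions/q/IA/Cedar_Rapids.json')
json_string = f.read()

# Decode the JSON text into Python dictionaries and lists.
parsed_json = json.loads(json_string)

# Pick out the city name and the temperature in Fahrenheit.
location = parsed_json['location']['city']
temp_f = parsed_json['current_observation']['temp_f']

print "Current temperature in %s is: %s" % (location, temp_f)

f.close()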
To run the program in your terminal:
python get_current_temp.py
Your program will return the current temperature in Cedar Rapids:

Current temperature in Cedar Rapids is: 68.9

What is next?

Now that we have looked at and tested the examples provided by Wunderground,
let's create a program of our own.

The Weather Underground provides us with a whole bunch of "Data Features" that
we can use. 

It is important that you read through the information there, to understand how
the different features can be accessed. 

Standard Request URL Format

"Most API features can be accessed using the following format. 

Note that several features can be combined into a single request."
http://api.wunderground.com/api/0def10027afaebb7/features/settings/q/query.format
where:

0def10027afaebb7:	Your API key

features:		One or more of the following data features

settings (optional):	Example: lang:FR/pws:0

query:			The location for which you want weather information

format:		json, or xml
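
The format above can be sketched as a small helper that assembles the URL from its parts. This is only an illustration: the function name build_url and its parameters are hypothetical, and it assumes that multiple features are joined with slashes, as in the geolookup/conditions example earlier.

```python
def build_url(api_key, features, query, fmt="json", settings=None):
    """Assemble a request URL in the format described above.

    features: list of feature names, e.g. ["geolookup", "conditions"]
    settings: optional string such as "lang:FR/pws:0"
    """
    parts = ["http://api.wunderground.com/api", api_key]
    parts.extend(features)
    if settings:
        parts.append(settings)
    # The literal "q" separates the features and settings from the query.
    parts.append("q")
    parts.append("%s.%s" % (query, fmt))
    return "/".join(parts)


print(build_url("your_api_key", ["forecast"], "France/Paris"))
print(build_url("your_api_key", ["geolookup", "conditions"], "IA/Cedar_Rapids"))
```

Building the URL in one place makes it easier to switch features or locations without retyping the whole string.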
What I want to do is to retrieve the forecast for Paris.

The forecast feature returns a summary of the weather for the next 3 days. 

This includes high and low temperatures, a string text forecast and the conditions.

Forecast for Paris

To retrieve the forecast for Paris, I will first have to find out the country
code for France, which I can find here:

Weather by country

Next step is to look for the "Feature: forecast" in the API documentation.

The string that we need can be found here: 

http://www.wunderground.com/weather/api/d/docs?d=data/forecast

By reading the documentation, we should be able to construct a URL.

In our case, the URL looks like this:
http://api.wunderground.com/api/your_api_key/forecast/q/France/Paris.json

Making the API call

We now have the URL that we need and we can start with our program.

Now it's time to make the API call to Weather Underground.

Note: Instead of using the urllib2 module as we did in the example above,
we will use the "requests" module in this program.
Making the API call is very easy with the "requests" module.
r = requests.get("http://api.wunderground.com/api/your_api_key/forecast/q/France/Paris.json")
Now, we have a Response object called "r". We can get all the information we need
from this object.

Creating our Application

Open your editor of choice and, on the first line, import the requests module.

Note that the requests module comes with a built-in JSON decoder, which we can use
for the JSON data. That also means that we don't have to import the json
module (as we did in the previous example, where we used the urllib2 module).
import requests
To begin extracting the information that we need, we first have to see
which keys the JSON data in the "r" object contains.
The code below prints the keys, which should be [u'response', u'forecast']:
import requests

r = requests.get("http://api.wunderground.com/api/your_api_key/forecast/q/France/Paris.json")

data = r.json()

print data.keys()

Getting the data that we want

Copy and paste the URL (from above) into a JSON editor.

I use http://jsoneditoronline.org/ but any JSON editor should do the work. 

This will show an easier overview of all the data.

http://api.wunderground.com/api/your_api_key/forecast/q/France/Paris.json

Note that the same information can be obtained from the terminal, in the Python
interpreter, by typing:
r = requests.get("http://api.wunderground.com/api/your_api_key/forecast/q/France/Paris.json")
print r.text
After inspecting the output, we can see that the data we are interested in is
under the "forecast" key. Back in our program, let's print out the data from
that key.
import requests

r = requests.get("http://api.wunderground.com/api/your_api_key/forecast/q/France/Paris.json")

data = r.json()

print data['forecast']
The result is stored in the variable "data".  

To access our JSON data, we simply use the bracket notation, like this:
data['key'].
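
Bracket notation raises a KeyError when a key is missing, which can happen if a request fails or a feature was not requested. The sketch below works on a hand-written stand-in dictionary (the values are made up) and shows bracket access next to dict.get(), plus a hypothetical helper named dig that follows a chain of keys safely.

```python
# A hand-written stand-in for the decoded response; the values are made up.
data = {
    "response": {"version": "0.1"},
    "forecast": {"simpleforecast": {"forecastday": []}},
}

# Bracket notation raises KeyError when a key is missing...
forecast = data["forecast"]

# ...while dict.get() returns a default instead.
missing = data.get("no_such_key", {})


def dig(mapping, *keys):
    """Follow a chain of keys, returning None as soon as one is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


print(dig(data, "forecast", "simpleforecast", "forecastday"))
print(dig(data, "forecast", "no_such_key"))
```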

Let's navigate a bit further through the data by adding 'simpleforecast'.
import requests

r = requests.get("http://api.wunderground.com/api/your_api_key/forecast/q/France/Paris.json")

data = r.json()

print data['forecast']['simpleforecast']
We are still getting a bit too much output, but hold on, we are almost there.
The last step in our program is to add ['forecastday'] and, instead of printing
out each and every entry, use a for loop to iterate through the list.
We can access anything we want like this. Just look up which data you are
interested in.
In this program I wanted to get the forecast for Paris.

Let's see what the code looks like.
import requests

r = requests.get("http://api.wunderground.com/api/0def10027afaebb7/forecast/q/France/Paris.json")
data = r.json()

for day in data['forecast']['simpleforecast']['forecastday']:
    print day['date']['weekday'] + ":"
    print "Conditions: ", day['conditions']
    print "High: ", day['high']['celsius'] + "C", "Low: ", day['low']['celsius'] + "C", '\n'
Run the program.

$ python get_temp_paris.py
Monday:
Conditions:  Partly Cloudy
High:  23C Low:  10C

Tuesday:
Conditions:  Partly Cloudy
High:  23C Low:  10C

Wednesday:
Conditions:  Partly Cloudy
High:  24C Low:  14C

Thursday:
Conditions:  Mostly Cloudy
High:  26C Low:  15C
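
To experiment with the loop without spending API calls, you can run it against a stand-in dictionary. The sketch below assumes the same nesting as the response used above and copies two days from the sample output. The function name summarize is hypothetical, and the data is typed in by hand rather than fetched live.

```python
# Stand-in data with the same nesting as the forecast response;
# the values are copied from the sample output, not fetched live.
sample = {
    "forecast": {
        "simpleforecast": {
            "forecastday": [
                {
                    "date": {"weekday": "Monday"},
                    "conditions": "Partly Cloudy",
                    "high": {"celsius": "23"},
                    "low": {"celsius": "10"},
                },
                {
                    "date": {"weekday": "Thursday"},
                    "conditions": "Mostly Cloudy",
                    "high": {"celsius": "26"},
                    "low": {"celsius": "15"},
                },
            ]
        }
    }
}


def summarize(data):
    """Return one summary line per forecast day."""
    lines = []
    # forecastday is a list, so each iteration yields one day's dictionary.
    for day in data["forecast"]["simpleforecast"]["forecastday"]:
        lines.append("%s: %s, high %sC, low %sC" % (
            day["date"]["weekday"],
            day["conditions"],
            day["high"]["celsius"],
            day["low"]["celsius"],
        ))
    return lines


for line in summarize(sample):
    print(line)
```

Returning the lines instead of printing them inside the loop keeps the formatting logic easy to reuse and check.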
The forecast feature is just one of many. I will leave it up to you to explore
the rest. 

Once you understand one API and its JSON output, you understand how most of
them work.

More Reading

A comprehensive list of Python APIs
Weather Underground

