Getting Started With Geocoding

In this notebook, you will learn how to geocode different sorts of location data by making requests to several online APIs (Application Programming Interface) for latitude and longitude co-ordinates associated with those locations.

The aim of the notebook is not to teach you formal approaches for working with APIs or the data that is returned from them. Instead, it's something to whet your curiosity. Something to show you how, with a few lines of Python code, you can start to work with live, third-party datasources and online services to perform real-world programming tasks.

If something doesn't work: DON'T PANIC. You won't break your computer and you won't break the internet. And you won't fail the module if you just move on!

The location data we will consider includes:

In [1]:
#The requests library makes it easy to call URLs using Python
import requests

Postcodes

Postcodes are a widely used form of location data, typically capable of identifying a location to a resolution of a few hundred square metres.

There are several online services that will return geolocation information given a postcode.

To call the service, we construct a URL as defined for a particular API and make a request to that URL using the python requests package.

Data is often returned from webservices using the JSON (Javascript Object Notation) data format, although some APIs allow you to specify other formats such as XML.

(One advantage of the JSON response is that it can be immediately consumed by a Javascript script called from inside a webpage.)

JSON and XML both allow data to be represented in a structured, tree based hierarchical format. The first API we will use, published via the postcodes.io website, structures its response data in the following way:

Hierarchical structure of postocdes api data, showire results tree with latitude, longitude and codes children, and codes showing admin_district and parish district

The result node is at the top of the tree with children postcode, latitude, longitude and so on. The codes child has further children, such as: admin and parish.

In python, data structures of this form can be represented using the dict ("dictionary") structure, which you will meet elsewhere in the course.

The python requests library has a method that parses a correctly formed JSON response as a python dict, or more generally, as a set of nested dicts. In this case, one dict structure may be nested inside another to support child, grandchild, great grandchild, and so on, levels of structure.

Hierarchical structure of postocdes api data, showire results tree with latitude, longitude and codes children, and codes showing admin_district and parish district

The contents of different levels of the nested dict data structure can be accessed by using a form of associative, relative addressing. For example, if the variable mypostcode is set to the dict shown above, we could access the contents of the main result part of the data structure by writing: mypostcode["result"].

To obtain the value of items in deeper nested parts of the data structure, we simply add further levels of relative addressing. To fetch the value of the postcode, we need to specify the path to it via the result node: mypostcode["result"]["postcode"]. To obtain the value of the parish in the code part of the data structure, we specify the path to it as mypostcode["result"]["code"]["parish"].

Run the following cell to call the postcodes.io API with a particular postcode.

See if you can make sense of the result that is returned.

In [2]:
postcode = 'MK7 6AA'
r=requests.get('https://api.postcodes.io/postcodes/{PC}'.format(PC=postcode))
r.json()
Out[2]:
{'result': {'admin_county': None,
  'admin_district': 'Milton Keynes',
  'admin_ward': 'Monkston',
  'ccg': 'NHS Milton Keynes',
  'codes': {'admin_county': 'E99999999',
   'admin_district': 'E06000042',
   'admin_ward': 'E05009415',
   'ccg': 'E38000107',
   'nuts': 'UKJ12',
   'parish': 'E04001275'},
  'country': 'England',
  'eastings': 488625,
  'european_electoral_region': 'South East',
  'incode': '6AA',
  'latitude': 52.0249147315159,
  'longitude': -0.709747474196332,
  'lsoa': 'Milton Keynes 017C',
  'msoa': 'Milton Keynes 017',
  'nhs_ha': 'South Central',
  'northings': 237063,
  'nuts': 'Milton Keynes',
  'outcode': 'MK7',
  'parish': 'Walton',
  'parliamentary_constituency': 'Milton Keynes South',
  'postcode': 'MK7 6AA',
  'primary_care_trust': 'Milton Keynes',
  'quality': 1,
  'region': 'South East'},
 'status': 200}

Try rerunning the previous cell using different postcodes - can the service locate your home postcode?

Parsing the postcodes.io JSON data

Once we have retrieved the data from the API, and cast it as a python data object, we can look inside it programmatically.

For example, we can find the latitude and longitude values.

In [3]:
#Obtain the lat/long of a postcode
lat=r.json()['result']['latitude']
lon=r.json()['result']['longitude']

#Display the result
print(lat,lon)
52.0249147315159 -0.709747474196332

Having access to the latitude and longitude means we can start to make use of that information, for example by plotting it on a map.

You may recall how we previously used the folium package to generate interactive maps from python code within a notebook.

We can do a similar thing again here.

In [6]:
!pip3 install folium
Collecting folium
  Downloading folium-0.3.0-py3-none-any.whl (71kB)
    100% |████████████████████████████████| 71kB 2.4MB/s ta 0:00:011
Requirement already satisfied: Jinja2 in /usr/local/lib/python3.6/site-packages (from folium)
Requirement already satisfied: six in /usr/local/lib/python3.6/site-packages (from folium)
Collecting branca (from folium)
  Downloading branca-0.2.0-py3-none-any.whl
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.6/site-packages (from Jinja2->folium)
Installing collected packages: branca, folium
Successfully installed branca-0.2.0 folium-0.3.0
In [7]:
#Plot the lat long of a postcode on a map

#We need to import the following packages to access the maps
import folium

#Create a map centered on the postcode location at a particular zoom level
mymap = folium.Map(location=[lat, lon], zoom_start=15)

#Create a popup message using Python string formatting to create the label based on variable values
popupstr = 'Location of {PC}: ({lat},{lon})'.format(PC=postcode, lat=lat,lon=lon)

#Display a marker for the location
folium.Marker([52.0239, -0.7072], popup=popupstr).add_to(mymap)
mymap
Out[7]:

Addresses

As well as geolocating postcodes, we can also goecode complete (or partial) addresses. One API that supports address based geocoding is the Google Maps geocoding API.

Once again, we need to construct a URL according to a pattern defined by the API documentation. Then we can make a request to that URL and hopefully get the geocoded data back as a response.

In [49]:
address='Open University, Walton Hall, Milton Keynes, MK7 6AA, UK'
r= requests.get("https://maps.googleapis.com/maps/api/geocode/json", params={'address': address, 'sensor': "false"})
r.json()
Out[49]:
{'results': [{'address_components': [{'long_name': 'Walton Hall',
     'short_name': 'Walton Hall',
     'types': ['establishment', 'point_of_interest']},
    {'long_name': 'Kents Hill',
     'short_name': 'Kents Hill',
     'types': ['locality', 'political']},
    {'long_name': 'Milton Keynes',
     'short_name': 'Milton Keynes',
     'types': ['postal_town']},
    {'long_name': 'Milton Keynes',
     'short_name': 'Milton Keynes',
     'types': ['administrative_area_level_2', 'political']},
    {'long_name': 'England',
     'short_name': 'England',
     'types': ['administrative_area_level_1', 'political']},
    {'long_name': 'United Kingdom',
     'short_name': 'GB',
     'types': ['country', 'political']},
    {'long_name': 'MK7 6BH',
     'short_name': 'MK7 6BH',
     'types': ['postal_code']}],
   'formatted_address': 'Walton Hall, Kents Hill, Milton Keynes MK7 6BH, UK',
   'geometry': {'location': {'lat': 52.02462269999999, 'lng': -0.7107079},
    'location_type': 'APPROXIMATE',
    'viewport': {'northeast': {'lat': 52.02597168029149,
      'lng': -0.709358919708498},
     'southwest': {'lat': 52.02327371970849, 'lng': -0.712056880291502}}},
   'place_id': 'ChIJW3FMFVuhd0gRVUSpS2HG-ps',
   'types': ['establishment', 'point_of_interest']}],
 'status': 'OK'}

Try rerunning the previous cell with an address that is familiar to you. Does the API find it?

Optional Activities

  • see if you can write a loop that will look up the geolocations of several postcodes, one at a time. To be nice to the API import the python time library and add the statement time.sleep(1) inside the loop to pause its execution for one second during each iteration.
  • create a new folium map object to display several markers, one for each of your (looped) postcodes. Inside the postcode loop add a corresponding marker to the map. Don't forget to render the map from the last line of code in the cell.

IP Addresses

As well as looking up geolocation data for a postal address, we can also try to look up a location based on the IP address of a computer. There are seveal websites that allow you to lookup the IP address of the device you are using to connect to the internet, and several webservices too.

I'm going to use a simple service from Amazon web services that returns an IP address terminated by an end of line (\n) character. By using the requests library, I can call the URL, access the data response (text) and then strip (.strip())) the end-of-line whitespace character from it.

In [50]:
myIPaddress=requests.get('http://checkip.amazonaws.com/').text.strip()
myIPaddress
Out[50]:
'109.157.179.177'
In [51]:
#We can construct a URL based around the IP address of the machine making the request as follows:
url='https://freegeoip.net/json/{IP}'.format(IP=myIPaddress)
url
Out[51]:
'https://freegeoip.net/json/109.157.179.177'
In [52]:
r=requests.get(url)
r.json()
Out[52]:
{'city': 'Sandown',
 'country_code': 'GB',
 'country_name': 'United Kingdom',
 'ip': '109.157.179.177',
 'latitude': 50.6674,
 'longitude': -1.186,
 'metro_code': 0,
 'region_code': 'ENG',
 'region_name': 'England',
 'time_zone': 'Europe/London',
 'zip_code': 'PO36'}

The result may surprise you, for example if the notebook and the python process associated with it is running on a server hosted in the cloud. In this case, try looking up the IP address associated with computer you are using to access the internet. You can find this IP by visiting the link: http://checkip.amazonaws.com/.

Cell Tower Lookup

The Google geolocation API can be used to look-up the geographical locations (latitue and longitude co-ordinates) of cell towers and wifi hotsposts based on their unique IDs.

To call the Google webservice to look up the geographical locations of cell towers or wifi hotspots from their IDs, you will need to get a Google Geolocation API token: visit https://developers.google.com/maps/documentation/geocoding/get-api-key and follow the instructions on how to get a key for the geolocation API.

When you have obtained your key, use it to set the googleMapsAPIkey variable below.

In [62]:
googleMapsAPIkey="AIzaSyAnpCrSlBn72gHzcxrX5EHKxeeKOiOuBVg"

Once you have set your Google API key, run the following cell to look up the details of a particular cell tower:

In [63]:
#Add your cell tower details here.
#You can find them using an app such as the OpenSignal app

postjson = {
  "cellTowers": [
    {
        "mobileCountryCode": 234,
        "mobileNetworkCode": 15,
        "locationAreaCode": 714,#979,
        "cellId": 1671#42333969
    }
  ]
}
In [64]:
url='https://www.googleapis.com/geolocation/v1/geolocate?key={}'.format(googleMapsAPIkey)

print(postjson)
r = requests.post(url, json=postjson)
r.json()
{'cellTowers': [{'cellId': 1671, 'mobileNetworkCode': 15, 'mobileCountryCode': 234, 'locationAreaCode': 714}]}
Out[64]:
{'accuracy': 2588.0, 'location': {'lat': 50.6544242, 'lng': -1.200891}}

WiFi Hotspot MAC Addresses

As well as services that provide access to directories that try to associate IP addresses with physical locations, there are also databases that also try to associate MAC addresses of wifi routers with physical locations.

If your computer has a wifi enabled, you will use access a low level command on your computer that identifies in-range wifi routers and provides adminstrative information about them.

STILL NEEDS TESTING & REFINING - TO DO

Note that to call the Google webservice to look up the geographical locations of cell towers or wifi hotspots from their IDs, you will need to get a Google Geolocation API token: visit https://developers.google.com/maps/documentation/geocoding/get-api-key and follow the instructions on how to get a key for the geolocation API.

When you have obtained your key, use it to set the googleMapsAPIkey variable below.

Also note that the code may look a little bit involved. But DON'T PANIC, you don't need to be able to write, or even read, this sort of code for the purposes of this course.

In [59]:
googleMapsAPIkey="YOUR_KEY_HERE"
In [60]:
import sys
import requests

#http://stackoverflow.com/a/9859202/454773
def isInt_str(v):
    v = str(v).strip()
    return v=='0' or (v if v.find('..') > -1 else v.lstrip('-+').rstrip('0').rstrip('.')).isdigit()


#/System/Library/PrivateFrameworks/Apple80211.framework/Resources/airport
import subprocess
def getWifiMacAddresses():
    #autodetect platform and then report based on this? 
    print(sys.platform)
    
    macAddr={}
    
    #For Mac:
    if sys.platform=='darwin':
            results = subprocess.check_output(["/System/Library/PrivateFrameworks/Apple80211.framework/Resources/airport", "-s"])
            results = results.decode("utf-8").split("\n")
            for l in [x.strip() for x in results[1:] if x.strip()!='']:
                ll=l.split(' ')
                #We could use a regular expression - or we can construct our parser a step at a time...
                macAddress=l.strip().split(' ')[1]
                strength=l.strip().split(' ')[2]
                if isInt_str(strength):
                    macAddr[l.strip().split(' ')[0]]={'macAddress':macAddress,
                                                      'signalStrength':int(strength)}
                
    elif win in sys.platform:
        results = subprocess.check_output(["netsh", "wlan", "show", "network", "mode=bssid"])
        results = results.replace("\r","").split("\n")
        macAddress='UNKNOWN'
        for l in results[4:]:
            if l.startswith('SSID'):
                macAddress=':'.join(l.split(':')[1:]).strip()
            if 'BSSID' in l:
                macAddr[macAddress]=l.split(':')[1].strip()
                macAddress='UNKNOWN'

    elif 'linux' in sys.platform:
        #linux?
        #! apt-get -y install wireless-tools
        #results = subprocess.check_output(["iwlist","scanning"])    
        #via PP - linux text - TO DO
        # apt-get -y install wireless-tools then run iwlist scanning to display the details of wireless access points your computer can see.
        #apt-get -y install wireless-tools gave me "Could not open the lock file ..."
        #However when I checked in the Ubuntu Software Centre wireless-tools was already installed. I think non-expert users may use the Software Centre to install additional applications.
        #iwlist just give you a not very helpful usage list. What works directly is:
        #iwlist wlan0 scan
        pass

    return macAddr
In [ ]:
postjson={'wifiAccessPoints':[]}
hotspots=getWifiMacAddresses()

for h in hotspots:
    postjson['wifiAccessPoints'].append(hotspots[h])
    print(h,hotspots[h])
    
print('JSON posted to Google service: ',postjson)
url='https://www.googleapis.com/geolocation/v1/geolocate?key={}'.format(googleMapsAPIkey)


r = requests.post(url, json=postjson)
r.json()

Summary

In this notebook, you have learned how to geocode several different sorts of location identifer - postcodes, postal addresses, IP addresses and maybe even the MAC address of any WiFI routers in view of your computer.

You have also seen how we can take the JSON data returned from the geolocation services and parse it as python dict that we can then start to work as data, for example, by plotting markers associated with identified locations on an interactive map.