This is the second part of our series How to Add Postcode-Based Proximity Search With Open Data. In part 1, we explained how to geocode a dataset using open data.
Previously in this series, I showed how to set up databases for a location-based application.
In this article, I explain three ways to geocode your data using Open Postcode Geo. This involves finding coordinates for each record pinpointing that record’s position. Once we know a record’s position, we can work out the distance between that record and another position, such as our current location. In this article, I show you how to find the position of each pub in a dataset of pubs; our application uses this dataset to find the nearest pub to a position supplied by the user.
The three geocoding methods are:
- Using a SQL join
- Using dedicated fields
- Using the Open Postcode API
Each has pros and cons.
Geocoding with a SQL join
This is a great method if:
- You have the Open Postcode Geo dataset available as a table within in your database.
- You expect your own data to get regularly updates and you don’t want to geocode it each time that happens.
There are better methods if:
- You need the very best performance.
- You cannot install the Open Postcode Geo dataset locally.
- You need up-to-date data, but do not want the overhead of updating your local Open Postcode Geo dataset from time to time.
This method works by joining your dataset table on the open_postcode_geo table using postcode as the common key. This join creates a dataset containing both your existing data and the coordinates for each record from the open_postcode_geo table. Here’s an example using the sample pubs dataset we loaded earlier:
mysql> select open_pubs.name, open_postcode_geo.easting, open_postcode_geo.northing from open_pubs join open_postcode_geo on open_pubs.postcode = open_postcode_geo.postcode limit 10;
+----------------------+---------+----------+
| name | easting | northing |
+----------------------+---------+----------+
| Anchor Inn | 604749 | 234404 |
| Angel Inn | 582889 | 247367 |
| Black Boy Hotel | 587356 | 241327 |
| Black Horse | 604271 | 233919 |
| Black Lion | 582751 | 248296 |
| Bristol Arms | 624667 | 233743 |
| Butt and Oyster Inn | 620644 | 237973 |
| Carriers Arms | 607464 | 235395 |
| Chenery Catering Ltd | 612239 | 234480 |
| Clubhouse | 609842 | 235185 |
+----------------------+---------+----------+
10 rows in set (0.00 sec)
Open Postcode Geo allows you to join on postcodes in a variety of different formats. For fast joining, ensure that the common postcode key is indexed in both tables.
Geocoding dedicated fields with a SQL join
Use this method to add location information to your database when you want to store the easting and northing within your main dataset. For example, it is a good idea if you are creating a derived static dataset containing permanently geocoded addresses. It’s not the best path to take if postcodes in your data change from time to time, as you are storing the coordinates within your dataset and you will therefore need some process to update them should the postcode change.
First, create easting and northing fields in your main dataset table. Let’s do this for our open_pubs table:
mysql> alter table open_pubs add easting mediumint;
Query OK, 54368 rows affected (2.63 sec)
mysql> alter table open_pubs add northing mediumint;
Query OK, 54368 rows affected (2.79 sec)
Then set open_pubs.easting=open_postcode_geo.easting and open_pubs.northing=open_postcode_geo.northing using a SQL join with postcode as the common key:
mysql> update open_pubs join open_postcode_geo on open_pubs.postcode = open_postcode_geo.postcode set open_pubs.easting = open_postcode_geo.easting, open_pubs.northing = open_postcode_geo.northing;
Query OK, 51838 rows affected (20.12 sec)
You may have noticed that not all rows were affected: 51,838 of 54,368. This is because not every pub in the sample dataset has a valid postcodes.
You should now have a single table containing the geocoded data:
mysql> select name, easting, northing from open_pubs limit 10;
+----------------------+---------+----------+
| name | easting | northing |
+----------------------+---------+----------+
| Anchor Inn | 604749 | 234404 |
| Angel Inn | 582889 | 247367 |
| Black Boy Hotel | 587356 | 241327 |
| Black Horse | 604271 | 233919 |
| Black Lion | 582751 | 248296 |
| Bristol Arms | 624667 | 233743 |
| Butt and Oyster Inn | 620644 | 237973 |
| Carriers Arms | 607464 | 235395 |
| Chenery Catering Ltd | 612239 | 234480 |
| Clubhouse | 609842 | 235185 |
+----------------------+---------+----------+
10 rows in set (0.00 sec)
For fast joining, ensure that the common postcode key is indexed in both tables.
Geocoding using the Open Postcode Geo API
The API is a great option if you don’t want to keep a local copy of the Open Postcode Geo dataset up to date within your application. For example, imagine a situation wherein an application’s addresses are user generated, and each time an address is submitted it needs to be geocoded with current data.
Another strength of using the API is it takes postcodes in any standard format. As a result, you do not need to worry about the formats of the postcode data with which you’re supplied.
It’s less recommended as an option if you need the best possible realtime performance without the overhead of an API lookup, or if your application needs to work on a device with unpredictable connectivity.
To use the API, you make a request to the API over HTTP with the postcode in the URI:
http://api.getthedata.com/postcode/[postcode]
For example:
http://api.getthedata.com/postcode/SW1A+1AA
If you click the above example you see the API response, which is returned in JSON:
{
"status": "match",
"match_type": "unit_postcode",
"input": "SW1A 1AA",
"data": {
"postcode": "SW1A 1AA",
"status": "live",
"usertype": "large",
"easting": 529090,
"northing": 179645,
"positional_quality_indicator": 1,
"country": "England",
"latitude": "51.501009",
"longitude": "-0.141588",
"postcode_no_space": "SW1A1AA",
"postcode_fixed_width_seven": "SW1A1AA",
"postcode_fixed_width_eight": "SW1A 1AA",
"postcode_area": "SW",
"postcode_district": "SW1A",
"postcode_sector": "SW1A 1",
"outcode": "SW1A",
"incode": "1AA" },
"copyright": [
"Contains OS data (c) Crown copyright and database right 2016",
"Contains Royal Mail data (c) Royal Mail copyright and database right 2016",
"Contains National Statistics data (c) Crown copyright and database right 2016" ]
}
The important fields for our purposes are:
- tatus: This is either “match” or “no_match”. Where status=no_match your input failed to match a postcode, postcode sector, postcode district, or postcode area, and error is provided.
- match_type: This should be “unit_postcode” unless you are trying to match a part-postcode, for example an outcode.
- data > easting: The easting of the postcode.
- data > northing: The northing of the postcode.
Here is a simple PHP example demonstrating how you might call the API:
// Set $postcode to your postcode.
$postcode = 'SW1A 1AA';
// This is the Open Postcode Geo API endpoint.
$endpoint = 'http://api.getthedata.com/postcode/';
// $postcode should be URL encoded.
$url = $endpoint . urlencode($postcode);
// This fetches the API response and loads it into $response as a string of JSON.
$response = file_get_contents($url) or die('Cannot fetch ' . $url);
// Decode the JSON into an associative array.
$arr = json_decode($response, true) or die('Cannot decode ' . $response);
// Check status and match to ensure we have a suitable response for geocoding.
if($arr['status'] == 'match' and $arr['match_type'] == 'unit_postcode'){
print 'input:' . $postcode . "\n";
print 'postcode:' . $arr['data']['postcode'] . "\n";
print 'easting:' . $arr['data']['easting'] . "\n";
print 'northing:' . $arr['data']['northing'] . "\n";
}
else {
die('Input "' . $postcode . '" does not match valid postcode');
}
If you run this code using PHP’s command line interface, you should see the output:
input:SW1A1AA
postcode:SW1A 1AA
easting:529090
northing:179645
To run the code on a webserver, wrap it in HTML <pre></pre> tags to preserve the formatting.
Stepping through the code:
$postcode = 'SW1A1AA';
$endpoint = 'http://api.getthedata.com/postcode/';
$url = $endpoint . urlencode($postcode);
Set $postcode to your postcode. Any common postcode format is acceptable.
The postcode must be URL encoded before being included in the URL. In practice, this means replacing any [space] characters with + signs.
$response = file_get_contents($url) or die('Cannot fetch ' . $url);
This line fetches the response from the API. If you need more control or error handling, you could use PHP’s cURL functions instead of file_get_contents().
$arr = json_decode($response, true) or die('Cannot decode ' . $response);
This line takes the text of the JSON response and loads it into an associative array.
Continued from page 1.
The second parameter of json_decode() should be set to true in order to return an associative array rather than an object. If you prefer to work with an object, you can omit the second parameter or set it to false.
if($arr['status'] == 'match' and $arr['match_type'] == 'unit_postcode'){
This line does basic validation of the API response. If status=match and match_type=unit_postcode then you have a valid postcode. In most cases, you should get an easting and northing, unless the postcode is non-geographic. If you are concerned about non-geographic postcodes, look at the positional_quality_indicator field in the documentation.
The remainder of the code just prints the output.
You can use the API to iterate through your dataset and populate fields for easting and northing. Let’s do that with our sample data.
First, if you have not already done so, import the sample dataset and add empty fields for easting and northing. These steps are covered earlier in this how-to.
The PHP and MySQL code that follows iterates through all the postcodes in the sample dataset, gets the easting and northing from the API for each postcode, and updates all records with that postcode with the easting and northing.
// Connect to the pub_finder database
$link = mysqli_connect('[hostname]', '[user]', '[password]', 'pub_finder');
// As we are geocoding postcodes, we only need distinct postcodes.
$sql = 'select distinct postcode from open_pubs';
$stmt = mysqli_prepare($link, $sql);
mysqli_stmt_bind_result($stmt, $postcode);
mysqli_stmt_execute($stmt) or die(mysqli_error($link));
mysqli_stmt_store_result($stmt);
while(mysqli_stmt_fetch($stmt)){
// We could validate $postcode here, but the API does that for us.
// This is the Open Postcode Geo API endpoint.
$endpoint = 'http://api.getthedata.com/postcode/';
// $postcode should be URL encoded.
$url = $endpoint . urlencode($postcode);
// This fetches the API response and loads it into $response
// as a string of JSON.
$response = file_get_contents($url) or die('Cannot fetch ' . $url);
// Decode the JSON into an associative array.
$arr = json_decode($response, true) or die('Cannot decode ' . $response);
// Check status, match, easting and northing to ensure we have
// a suitable response for geocoding.
if($arr['status'] == 'match' and $arr['match_type'] == 'unit_postcode' and isset($arr['data']['easting']) and !empty($arr['data']['easting']) and isset($arr['data']['northing']) and !empty($arr['data']['northing'])){
// Update easting and northing for all address records
// which match the postcode.
$sql2 = 'update open_pubs set easting = ?, northing = ? where postcode = ?';
$stmt2 = mysqli_prepare($link, $sql2);
mysqli_stmt_bind_param($stmt2, 'sss', $arr['data']['easting'], $arr['data']['northing'], $postcode);
mysqli_stmt_execute($stmt2) or die(mysqli_error($link));
mysqli_stmt_close($stmt2);
}
}
mysqli_stmt_close($stmt);
Once run, you should see the newly-geocoded data in the open_pubs table:
mysql> select name, easting, northing from open_pubs limit 10;
+----------------------+---------+----------+
| name | easting | northing |
+----------------------+---------+----------+
| Anchor Inn | 604749 | 234404 |
| Angel Inn | 582889 | 247367 |
| Black Boy Hotel | 587356 | 241327 |
| Black Horse | 604271 | 233919 |
| Black Lion | 582751 | 248296 |
| Bristol Arms | 624667 | 233743 |
| Butt and Oyster Inn | 620644 | 237973 |
| Carriers Arms | 607464 | 235395 |
| Chenery Catering Ltd | 612239 | 234480 |
| Clubhouse | 609842 | 235185 |
+----------------------+---------+----------+
10 rows in set (0.00 sec)
Stepping through the additional code:
$sql = 'select distinct postcode from open_pubs';
$stmt = mysqli_prepare($link, $sql);
mysqli_stmt_bind_result($stmt, $postcode);
mysqli_stmt_execute($stmt) or die(mysqli_error($link));
mysqli_stmt_store_result($stmt);
while(mysqli_stmt_fetch($stmt)){
This block selects the distinct postcodes from each address. Open Postcode Geo geocodes postcodes, not full addresses, so the most efficient way to geocode multiple addresses is to geocode distinct postcodes.
Note the use of mysqli_stmt_store_result() which allows a second MySQLi routine to run within the loop.
if($arr['status'] == 'match' and $arr['match_type'] == 'unit_postcode' and isset($arr['data']['easting']) and !empty($arr['data']['easting']) and isset($arr['data']['northing']) and !empty($arr['data']['northing'])){
Here we test that we have values for easting and northing, and that those values are neither empty nor zero, before updating our dataset with those values.
$sql2 = 'update open_pubs set easting = ?, northing = ? where postcode = ?';
$stmt2 = mysqli_prepare($link, $sql2);
mysqli_stmt_bind_param($stmt2, 'sss', $arr['data']['easting'], $arr['data']['northing'], $postcode);
mysqli_stmt_execute($stmt2) or die(mysqli_error($link));
mysqli_stmt_close($stmt2);
Here we update every record in the open_pubs table which shares the postcode we just fetched from the API.
Note that this update is faster if you have an index on the postcode field.
So far in this series we have examined the different ways to access the location data. But how do we incorporate user input into that process? That’s the topic of the next article in the series, How to Geocode User Input.
This is part 2 of our series How to Add Postcode-Based Proximity Search With Open Data. In part 3, we show how to implement the common application of nearest locations to a postcode provided by a user.