This tutorial by Nick Williams, a MapInfo trainer at Acuity Training, aims to provide an example of how data analysis and cleansing using Excel can be combined with presentation and mapping using MapInfo.
The example will be to calculate the percentage increase in the average property price per district/borough in Surrey and then to produce a thematic map where the boroughs are coloured according to the average price increase.
The project can be broken down into the following steps:-
- Data cleansing using Excel
- Calculate the average prices using Excel
- Calculate the percentage increase per district using Excel
- Import the data into MapInfo
- Join the tables in MapInfo
- Produce a thematic map using MapInfo
The price pad data can be downloaded from https://www.gov.uk/guidance/about-the-price-paid-data#page-navigation
The price paid information is released in the form of price paid data under the terms of Open Government Licence
The district and borough boundaries can be downloaded from Ordnance Survey’s open data page as part of the Boundary Line file. This data is used under the terms and conditions of Ordnance Survey Open Data.
The purpose of this exercise is to provide a tutorial using freely available data and is not intended to express any opinion on the subject.
1. Data Cleansing
The price paid file is published as a csv (Comma Separated Value) file. Csv files are a means of exchanging data between different programs. The data is arranged into columns and each column is separated by a comma. They can be opened in Excel or other spreadsheet programs e.g. Open Office Calc as well as GIS software such as MapInfo or ArcMap.
The data is published in columns in the order set out below:
|Explanation (where appropriate)
|Transaction unique identifier
|A reference number which is generated automatically recording each published sale. The number is unique and will change each time a sale is recorded.
|Sale price stated on the transfer deed.
|Date of Transfer
|Date when the sale was completed, as stated on the transfer deed.
|D = Detached, S = Semi-Detached, T = Terraced, F = Flats/Maisonettes
|Y = a newly built property, N = an established residential building
|Relates to the tenure: F = Freehold, L-Leasehold etc.
|Primary Addressable Object Name. If there is a sub-building for example the building is divided into flats, see Secondary Addressable Object Name (SAON).
|Secondary Addressable Object Name. If there is a sub-building, for example the building is divided into flats, there will be a SAON.
|Record Status – monthly file only
|Indicates additions, changes and deletions to the records.(please see guide below).
Open the file from within Excel. If the file isn’t visible then use the pull down list with the file type to either show all files or Text or cvs files.
The file covers the entire country which makes it very large and this exercise is only dealing with the data for Surrey.
To extract the data for Surrey
- Add a new row for row 1 and update it with the column headings shown above.
- Use the Filter function to select transactions for Surrey only
- Press Sort and Filter
- Arrows will appear at the top of each column
- Pull down the arrow for the County column
- Untick the box next to Select All and tick the box next to Surrey
Now only the data for Surrey is displayed. Copy and paste these rows into a new spreadsheet. I created a new Excel file called Surrey House Prices with separate tabs for 2004 and 2014.
2. Calculate the average house prices for each borough
Pivot tables are a quick and easy way to summarise data.
It may be easier to select the columns first due to the size of the spreadsheet.
- Select the Insert tab on the ribbon
- Press the Pivot table button
- Select the data (if it isn’t already selected)
- Select the location for the pivot table (it may be easier to add it to a new sheet)
- In the Pivot Table Fields box
- Select Borough and Price fields to display.
- Drag the Borough field to the Rows box so the data is grouped on Borough
- Drag Price to the Values field, now pull the arrow down to select Average. The average price for each borough is now displayed
3. Calculate the percentage increase per borough
- Copy and paste the Borough Name and average price to a new sheet. I’ve called the sheet Difference.
- Name the average price column 2004 for clarity purposes
- Copy the 2014 average prices into a new column titled 2014.
- Create a new column titled Difference %.
- Use the following formulae to calculate the percentage increase
Where B2 is the 2004 price and C2 is the 2014 price
- Save this sheet as Difference.csv file so it can be imported into MapInfo
4. Import the data into MapInfo
MapInfo will be used to create a thematic map that colours each district by the percentage increase in property prices.
Both tables will be added to MapInfo then joined on the district name. This must be spelt identically in both the district polygon file and the csv file. E.g. Guildford District must be entered as either Guildford District or Guildford in both. It can’t be spelt as Guildford in one file but Guildford district in the other if the join is to work.
- Open the table district_borough_unitary_region.shp by selecting File and open. Change the file type pull down to ESRI shapefile (shp). MapInfo will create a tab file, it is a good idea to place this in the same location. MapInfo will ask to confirm the projection. Leave this as British National Grid, which is the most common projection for UK GIS projects.
The file will open in a map window as it contains geometry, in this case the district outline polygons.
- Open the file created during the above steps (Difference.csv). Change the file type to Comma Separated Value (csv).
- MapInfo will now prompt for some information on opening the csv file.
- The data contains column headings, so tick the box Use First Line For Column Titles
- This file opens in a browser window as it doesn’t contain any geometry.
Copy the percentage increase between the tables
The next step is to add the Percentage Difference in house price information to the Boundary Line data so the thematic map can be created.
It is good practice to make a copy of any data set that is going to be modified. In this case, only the district polygons that are within Surrey are needed so select those polygons and save them as a new file.
- To view the data for the district_borough_unitary_region.shp file, select Window>New Browser Window.
- Select the file district_borough_unitary_region.shp from the list of available files to browse.
The column File Name contains details on which county the district is in:-
Therefore, it will be easy to create a query to select the districts that are within Surrey.
- Select Query>Select to open the Select dialog box
- Press Assist to open the Expression Builder
- Pull down Column and select File Name
- Pull down Operator and select =
- Click in the expression box and type “Surrey_County” so the expression reads File_name = “SURREY_COUNTY”
- Press OK on the Expression box, the expression is now copied to the Select box
- Press OK on the Select box to run the expression
The districts within Surrey are now selected.
Save this as a new table to eliminate all the superflous data.
- Select File>Save Copy As
- Select Query 1 as the table to save and Press Save As.
- Name the table something Sensible e.g. Surrey_Differences
5. Join the Tables
An additional column needs to be added to contain the Percentage Difference in house prices that will be copied from the Difference.csv file.
- Open the table created in the above steps (Surrey_Differences.tab)
- Select Table>Maintenance>Table Structure and select Surrey Differences as the table to modify.
- The Modify Table Structure Window Opens. This lists the table’s fields and their type.
- Press Add Field, a new field appears at the bottom of the list. Name this Increase. Change the type so it is the same type of field as the Increase field in the Difference.csv file. In this case it should be a Small Integer
- Press Ok to close the dialog and update the table.
Updating the column
The newly created column can now be updated with the data from the Difference.csv file.
The Update column function is opened by selecting Table>Update Column. Enter the following values:-
- Table to Update: Surrey_Differences
- Column to Update: Percentage_increase
- Get Value From “Differences”
- Calculate Value of Expression “Increase”
Press the Join button to instruct MapInfo how to join the tables:-
- Join Where Name from Surrey_Differences matches District from table Difference
The Increase column is now updated with the percentage increase from the Difference file.
6. Create the Thematic Map
The table Surrey_Differences should now contain just the districts that are within Surrey with a column titled Increase that contains the percentage increase in house prices that was copied from the Difference.csv file.
- Select Window>New Map Window
MapInfo will prompt for which layers to include in the new map window if there is more than one mappable table open. Select the table Surrey_Differences and press the > button to add it to the list of tables.
The polygons are currently styled with the default white fill and black outline.
- Select Map>Create Thematic Map
MapInfo will now prompt through the three steps to creating a thematic map:-
In the first dialog, choose the template. This must be a region in this example as the source table contains polygons.
Select the table and field that the thematic will use. In this case the table is Surrey Differences and the Increase field will be used to create the thematic.
MapInfo will select default ranges. In this case they aren’t suitable. A look at the data will show that having 3 ranges for 60-65%, 65-69% and 70-74% increase is more suitable.,
- Press the Ranges button to open the Customise Ranges dialog
- Change the number of ranges to 3 and the method (of calculation) to Equal Ranges
- Press OK to close the Customise Ranges dialog.
The proposed ranges are now copied to the Create Thematic Map box. Check the colours are suitable (e.g. the highest increase should be red, not green) and press OK to create the thematic map.
- Press OK and the thematic map will be applied