Areas of UK for plotting
Areas of UK for plotting#
The problem this analysis is trying to solve is how to separate addresses into larger regions:
To identify differences in regions
To be able to plot the data onto a map
This is split into 3 pages:#
The input data contains UK postcodes. Each postcode can be converted either to a latitude-longitude pair or to a larger region that contains many postcodes, such as a county or city region.
I’ll use the latter method so I can produce choropleth maps. There are a few different tools that can plot these in Python. I’ll use folium, mainly because I have used it before.
Producing the choropleth maps from the postcode data requires two things:
A way to transform postcodes to larger regions
Details of the polygons for those regions
Note: both need to use the same set of regions, with matching region names.
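A quick way to check that the two agree is to compare the region names in the polygon file with the region names in the data. The sketch below assumes the polygons are stored as GeoJSON with the region name in a `Name` property. The helper `unmatched_regions` and the tiny example file are hypothetical and only there for illustration.

```python
import json


def unmatched_regions(geojson, region_names, key="Name"):
    """Return (names only in the data, names only in the polygon file)."""
    polygon_names = {f["properties"][key] for f in geojson["features"]}
    data_names = set(region_names)
    return sorted(data_names - polygon_names), sorted(polygon_names - data_names)


# A tiny hand-made GeoJSON standing in for a real polygon file (geometry omitted)
example_geojson = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"Name": "Devon"}, "geometry": null},
  {"type": "Feature", "properties": {"Name": "Cornwall"}, "geometry": null}
]}
""")

only_in_data, only_in_polygons = unmatched_regions(example_geojson, ["Devon", "Dorset"])
print(only_in_data)      # ['Dorset']
print(only_in_polygons)  # ['Cornwall']
```

Regions that appear only in the data have no polygon to colour, and regions that appear only in the polygon file have no value to plot.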
For smaller regions, the UK census regions work well, or perhaps a nearest neighbour algorithm is the best way to go?
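As a rough idea of what a nearest neighbour approach could look like, the sketch below assigns a latitude-longitude point to the closest of a set of region centres, using the haversine distance. This is not something used later on this page. The function names and the centre coordinates (approximate city locations) are assumptions made up for the example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_region(lat, lon, centres):
    """Return the name of the centre closest to (lat, lon)."""
    return min(centres, key=lambda name: haversine_km(lat, lon, *centres[name]))


# approximate, illustrative centres only
centres = {
    "Manchester": (53.48, -2.24),
    "Plymouth": (50.37, -4.14),
    "Newcastle": (54.98, -1.61),
}

# a point near Liverpool is closest to the Manchester centre
print(nearest_region(53.41, -2.99, centres))  # Manchester
```

A brute-force search like this is fine for a handful of centres. With many points and many centres a spatial index would be the more usual choice.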
Code to create maps#
# geometry data
import fiona
import geopandas as gpd
import json
# general - working with dataframes and numbers
import pandas as pd
import numpy as np
import os
# plot map parts
import folium
import branca.colormap as cm
def split_islands(df):
    """
    Takes a gpd dataframe of Scotland which needs reducing in size as it is too big to plot.
    The Highlands and Islands multi-polygon is split (exploded) into a set of polygons,
    the first few are selected and merged (dissolved), then combined with the rest of df.
    Args: df (geopandas dataframe) of Scottish data
    Returns:
        df (geopandas dataframe) reduced in size
    """
    from shapely.geometry.polygon import Polygon
    from shapely.geometry.multipolygon import MultiPolygon
    # the Highlands and Islands region is the row at index 7
    gdf = df.iloc[7:8].copy()
    # wrap any plain Polygon in a MultiPolygon so every geometry has the same type
    gdf["geometry"] = [MultiPolygon([feature]) if isinstance(feature, Polygon)
                       else feature for feature in gdf["geometry"]]
    # explode the islands from one multi-polygon into a number of polygons
    gdf_parts = gdf.explode(column='geometry', ignore_index=True, index_parts=False)
    # take the first 3 polygons only and dissolve them back into one multi-polygon
    df7new = gdf_parts.iloc[0:3].dissolve()
    # build the new geodataframe from the rows labelled 0 to 6 plus the reduced islands row
    dfnew = pd.concat([df.loc[:6], df7new]).reset_index(drop=True)
    return dfnew
def joinWalesRegions(df):
    """
    Merges the Welsh council areas into five larger regions
    Args: df (geopandas dataframe) of Welsh data with a Name column
    Returns:
        df (geopandas dataframe) with one row per larger region
    """
    wales_region_dict = {
        'Wrexham': 'North Wales',
        'Conwy': 'North Wales',
        'Gwynedd': 'North Wales',
        'Isle of Anglesey': 'North Wales',
        'Flintshire': 'North Wales',
        'Denbighshire': 'North Wales',
        'Powys': 'Mid Wales',
        'Ceredigion': 'Mid Wales',
        'Carmarthenshire': 'South West Wales',
        'Swansea': 'South West Wales',
        'Neath Port Talbot': 'South West Wales',
        'Pembrokeshire': 'South West Wales',
        'Bridgend': 'South Wales',
        'Vale of Glamorgan': 'South Wales',
        'Cardiff': 'South Wales',
        'Rhondda Cynon Taf': 'South Wales',
        'Merthyr Tydfil': 'South Wales',
        'Caerphilly': 'South East Wales',
        'Newport': 'South East Wales',
        'Torfaen': 'South East Wales',
        'Monmouthshire': 'South East Wales',
        'Blaenau Gwent': 'South East Wales',
    }
    # strip the ' Council' suffix so the names match the dictionary keys
    df.Name = df.Name.str.replace(' Council', '')
    # swap each council name for the larger region it belongs to
    df.Name = df.Name.replace(wales_region_dict)
    # merge the polygons that now share a region name
    df = df.dissolve(by='Name', as_index=False)
    return df
def kml_create_json(url_data, loc_save='./_data', fname='region.json', doScotWales=False):
    """
    loads a KML location file (from local file or url)
    and then saves this as a json file locally
    Args: url_data (file path) the path to the kml file,
          loc_save (directory path) the location to save the data
          fname (file name string) the file name of the json file
          doScotWales (string or False) 'Scotland' to call split_islands on Scottish data,
              'Wales' to call joinWalesRegions on Wales data
    Returns:
        Names of the regions in the json file
        file path to the json file
    """
    # so can use kml files
    fiona.drvsupport.supported_drivers['KML'] = 'rw'
    # load the kml file to a gpd df (the driver is the format name, not the 'rw' mode)
    df = gpd.read_file(url_data, driver='KML')
    if doScotWales == 'Scotland':
        df = split_islands(df)
    elif doScotWales == 'Wales':
        df = joinWalesRegions(df)
    # Save as a json file to load in plot_map
    os.makedirs(loc_save, exist_ok=True)
    file_path = os.path.join(loc_save, fname)
    df.to_file(file_path, driver="GeoJSON")
    return df.Name, file_path
def plot_map(df,
             what_to_plot='amount_pc', region_to_plot='Name',
             json_path='./_data/region.json',
             longitude=-3.1, latitude=54.1):
    """
    Plots a folium choropleth map of the regions in a json file,
    coloured by the values in a dataframe
    Args: df (pandas dataframe) dataframe with location (region_to_plot) related to json file
              in one column and data to plot (what_to_plot) in another
          what_to_plot/region_to_plot (string) columns in df
          json_path (file path) the path to the json file
          longitude/latitude (float) for the centre of the map
    Returns:
        The folium map
    """
    # create a basic map
    # This sets the basic look of the map
    # things like the zoom, color scheme, where the map is centred etc
    # note: newer folium releases may no longer bundle the Stamen tiles,
    # in which case another tile set is needed here
    m = folium.Map(location=[latitude, longitude],
                   zoom_start=5,
                   control_scale=True,
                   tiles="Stamen Toner")
    folium.TileLayer('CartoDB positron', name="Light Map", control=False).add_to(m)
    # Create the Choropleth part:
    # - This creates the Choropleth map (i.e. regions on a map of different color based on their value)
    # - json_path is the path to a json file with the location data
    # - key_on="feature.properties.Name" refers to the json file
    #   and needs to match region_to_plot in columns=[region_to_plot, what_to_plot]
    # - data is the data, normally a pandas dataframe
    # - the columns are region_to_plot and what_to_plot
    #   - what_to_plot are the values to plot
    #   - region_to_plot should match the json file
    choropleth = folium.Choropleth(
        geo_data=json_path,
        name='choropleth',
        legend_name=what_to_plot,
        data=df,
        columns=[region_to_plot, what_to_plot],
        key_on="feature.properties.Name",
        fill_color='YlGn',
    ).add_to(m)
    # adds ability to see names of regions on mouseover
    choropleth.geojson.add_child(
        folium.features.GeoJsonTooltip(['Name'], labels=False)
    )
    return m
def url_KML_map(url_data, doScotWales=False):
    """
    The main function used to take a KML file and plot it
    Calls kml_create_json to create json file
    and plot_map to plot the map
    Args: url_data (file path) the path to the kml file
          doScotWales (string or False) 'Scotland' to call split_islands on Scottish data,
              'Wales' to call joinWalesRegions on Wales data
    Returns:
        The folium map
    """
    # name the json file after the kml file
    # (backslashes are swapped first so Windows paths such as .\_data\x.KML also work)
    base_name = os.path.basename(url_data.replace('\\', '/'))
    fname = os.path.splitext(base_name)[0] + '.json'
    # create a json file for plotting and get back the names of the regions
    map_names, json_path = kml_create_json(url_data, fname=fname, doScotWales=doScotWales)
    df = pd.DataFrame(columns=['County', 'Data'])
    # add the names of the regions
    df['County'] = map_names
    # create some random data to plot
    df['Data'] = np.random.randint(0, 100, len(df))
    m = plot_map(df, what_to_plot='Data', region_to_plot='County',
                 json_path=json_path)
    return m
Create maps#
The first option uses UK postcode areas, i.e. the first one or two letters of the postcode, like NE, M, PL, WA. This is the easiest way, as the postcode input data can be easily transformed.
However, the regions are less intuitive, e.g. what is WA or PL? Postcode areas also vary a lot in size, from the London areas (small) to N. Ireland (big).
url_data = "https://www.doogal.co.uk/kml/UkPostcodes.kml"
url_KML_map(url_data)
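Getting the postcode area from a full postcode only needs the leading letters. A minimal sketch, assuming postcodes start with one or two letters followed by a digit (the helper name `postcode_area` is made up for this example):

```python
import re

# one or two leading letters, which must be followed by a digit
_AREA = re.compile(r"^\s*([A-Za-z]{1,2})\d")


def postcode_area(postcode):
    """Return the postcode area (e.g. 'NE'), or None if the input does not look like a postcode."""
    match = _AREA.match(postcode)
    return match.group(1).upper() if match else None


print(postcode_area("NE1 4ST"))  # NE
print(postcode_area("m1 1ae"))   # M
print(postcode_area("PL4 8AA"))  # PL
print(postcode_area("12345"))    # None
```

On a dataframe this could be applied to the postcode column, for example with `df['postcode'].map(postcode_area)`, where the column name is an assumption.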
For England#
I chose a region size approximately the same as the postcode areas above. I did this using Doogal for counties in England, which provides both the polygon file for the counties and a postcode to county conversion.
url_data = 'https://www.doogal.co.uk/kml/counties/Counties.kml'
url_KML_map(url_data)
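With a postcode to county conversion available, the values to plot can be summed per county before they are passed to plot_map. The sketch below shows the idea with plain Python. The function name, the lookup entries and the amounts are placeholders made up for illustration, not real Doogal data.

```python
from collections import defaultdict


def total_by_region(records, postcode_to_region):
    """Sum amounts per region and collect postcodes missing from the lookup."""
    totals = defaultdict(float)
    unmatched = []
    for postcode, amount in records:
        # normalise so 'ex1 1aa' and 'EX1 1AA' hit the same lookup key
        key = postcode.replace(" ", "").upper()
        region = postcode_to_region.get(key)
        if region is None:
            unmatched.append(postcode)
        else:
            totals[region] += amount
    return dict(totals), unmatched


# placeholder lookup and data, for illustration only
lookup = {"EX11AA": "Devon", "PL11AA": "Devon", "TR11AA": "Cornwall"}
records = [("EX1 1AA", 10.0), ("pl1 1aa", 5.0), ("TR1 1AA", 2.5), ("ZZ9 9ZZ", 1.0)]

totals, unmatched = total_by_region(records, lookup)
print(totals)     # {'Devon': 15.0, 'Cornwall': 2.5}
print(unmatched)  # ['ZZ9 9ZZ']
```

The totals can then be turned into a two-column dataframe (region name and value), which is the shape plot_map expects.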
For Scotland#
Postcode data can be found in numerous places (including Doogal), but I decided to use the National Records of Scotland, as the region size is similar to the postcode areas and the English counties.
For the polygon data the UK Data Service is used.
The Scottish Parliamentary regions are used, with area_to_plot='ScottishParliamentaryRegion2021Code', a column in the postcode data from National Records of Scotland.
Note there is a slight difference in the codes for the regions. The two sources use regions from different years, but after checking, the regions are the same with slightly different code numbers.
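One way to reconcile the two sets of codes is to join them on the region name, which is the same in both. A minimal sketch, where the function name is hypothetical and the codes are placeholders rather than the real region codes:

```python
def code_crosswalk(old_regions, new_regions):
    """Map each new code to the old code of the region with the same name."""
    old_by_name = {name.strip().lower(): code for code, name in old_regions.items()}
    crosswalk, missing = {}, []
    for code, name in new_regions.items():
        old_code = old_by_name.get(name.strip().lower())
        if old_code is None:
            missing.append(name)
        else:
            crosswalk[code] = old_code
    return crosswalk, missing


# placeholder codes, NOT the real ones; only the region names are real
polygon_regions = {"OLD-01": "Glasgow", "OLD-02": "Lothian"}
postcode_regions = {"NEW-11": "Glasgow", "NEW-12": "Lothian"}

crosswalk, missing = code_crosswalk(polygon_regions, postcode_regions)
print(crosswalk)  # {'NEW-11': 'OLD-01', 'NEW-12': 'OLD-02'}
print(missing)    # []
```

An empty `missing` list is a quick confirmation that every region in the postcode data has a counterpart in the polygon data.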
The .kml file is very large, mainly because the Highlands and Islands multi-polygon is so detailed in order to incorporate all the islands. To get around this, the function split_islands
is used, which explodes the multi-polygon for the Highlands and Islands, takes a few of the polygons, then merges them back together.
# url_data = "https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/kml/Scotland_preg_2011.zip"
url_data = ".\\_data\\scotland_preg_2011.KML"
url_KML_map(url_data,'Scotland')
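split_islands keeps the first few polygons of the exploded multi-polygon. A variation, sketched below and not used on this page, is to keep the largest polygons instead, so the result does not depend on the order of the parts. The sketch works directly on a GeoJSON-style geometry dictionary using the shoelace formula. The function names and the toy squares are assumptions for illustration, and the area is in coordinate units (degrees squared), which is only good enough for ranking parts by size.

```python
def ring_area(ring):
    """Unsigned shoelace area of a closed ring of [x, y] pairs, in coordinate units."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def keep_largest_parts(geometry, n=3):
    """Keep only the n largest polygons of a GeoJSON MultiPolygon geometry."""
    if geometry["type"] != "MultiPolygon":
        return geometry
    # each polygon is [exterior_ring, hole_1, ...]; rank by the exterior ring
    parts = sorted(geometry["coordinates"],
                   key=lambda poly: ring_area(poly[0]), reverse=True)
    return {"type": "MultiPolygon", "coordinates": parts[:n]}


def square(x, y, side):
    """A toy square polygon (one closed exterior ring, no holes)."""
    return [[[x, y], [x + side, y], [x + side, y + side], [x, y + side], [x, y]]]


# toy 'islands' of different sizes
islands = {"type": "MultiPolygon",
           "coordinates": [square(0, 0, 1), square(5, 5, 4),
                           square(10, 0, 2), square(20, 0, 0.5)]}

reduced = keep_largest_parts(islands, n=2)
print([ring_area(poly[0]) for poly in reduced["coordinates"]])  # [16.0, 4.0]
```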