Areas of UK for plotting#

The problem this analysis is trying to solve is to find a way to separate addresses into larger regions:

  • To identify differences in regions

  • To be able to plot the data onto a map

This is split into 3 pages:#

The input data contains UK postcodes. Each postcode can be converted either to a latitude-longitude pair or to a larger region that contains many postcodes, such as a county or city region.

I’ll use the latter method so I can produce choropleth maps. There are a few different tools that can be used to plot these in Python. I’ll use folium, mainly because I have used it before.

Producing the choropleth maps from the postcode data requires two things:

  1. A way to transform postcodes to larger regions

  2. Details of the polygons for those regions

Note: the regions in the two need to be the same.
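One way to check this before plotting is to compare the region names in the postcode lookup with the Name property of each feature in the polygon file. The sketch below is only an illustration: unmatched_regions is a hypothetical helper that is not part of the code on this page, and the tiny GeoJSON and region names are made up.

```python
import json


def unmatched_regions(data_regions, geojson, name_key="Name"):
    """Return (regions only in the data, regions only in the polygon file)."""
    polygon_regions = {f["properties"][name_key] for f in geojson["features"]}
    data_regions = set(data_regions)
    return (sorted(data_regions - polygon_regions),
            sorted(polygon_regions - data_regions))


# Tiny illustrative GeoJSON; geometry is null because only the names matter here
geojson = json.loads("""
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature", "properties": {"Name": "Devon"}, "geometry": null},
   {"type": "Feature", "properties": {"Name": "Cornwall"}, "geometry": null}
 ]}
""")

only_in_data, only_in_polygons = unmatched_regions(["Devon", "Dorset"], geojson)
print(only_in_data)      # ['Dorset']
print(only_in_polygons)  # ['Cornwall']
```

If either list is non-empty, some regions would be left without a value or without a polygon on the map.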

For smaller regions, the UK census regions work well. Alternatively, a nearest neighbour algorithm may be a better way to go.
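I haven't used the nearest neighbour approach on this page, but a minimal sketch of the idea is to assign each latitude-longitude point to the closest region centre. Everything here is an assumption for illustration: the helper names are hypothetical, the three centres are approximate city coordinates, and a real version would probably use a spatial index rather than a linear search.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude-longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km is the mean Earth radius


def nearest_region(lat, lon, centres):
    """Return the name of the centre closest to (lat, lon)."""
    return min(centres, key=lambda name: haversine_km(lat, lon, *centres[name]))


# Approximate (latitude, longitude) of three illustrative region centres
centres = {
    "London": (51.5074, -0.1278),
    "Manchester": (53.4808, -2.2426),
    "Edinburgh": (55.9533, -3.1883),
}

# A point in Leeds is assigned to the Manchester centre
print(nearest_region(53.8008, -1.5491, centres))  # Manchester
```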

Code to create maps#

# geometry data
import fiona
import geopandas as gpd
import json

# general - working with dataframes and numbers
import pandas as pd
import numpy as np

import os

# plot map parts
import folium
import branca.colormap as cm

def split_islands(df):
    """
    Takes a gpd dataframe of Scotland which needs reducing in size as too big to plot
        Highlands and Islands multi-polygon is split (or exploded) from a  multi-polygon to set of polygons, the first few are selected and then merged (dissolve) and combined with rest of df
    
    Args: df (geopandas dataframe) of Scottish data
    Returns: 
        df (geopandas dataframe) reduced in size
    
    """
    from shapely.geometry.polygon import Polygon
    from shapely.geometry.multipolygon import MultiPolygon
    
    # the islands is the 7th component
    gdf = df.iloc[7:8].copy()

    # slight mods to help change things
    gdf["geometry"] = [MultiPolygon([feature]) if isinstance(feature, Polygon)
                       else feature for feature in gdf["geometry"]]

    # explode the islands into a number of polygons from one multipolygon
    gdf_parts = gdf.explode(column='geometry', ignore_index=True, index_parts=False)

    # take the first 3 elements only and dissolve back to one multi polygon
    df7new = gdf_parts.iloc[0:3].dissolve()

    # create the new geopanda with the new islands and the previous rest
    dfnew = pd.concat([df.loc[:6], df7new]).reset_index(drop=True)
    
    return dfnew

def joinWalesRegions(df):
    wales_region_dict={ 
         'Wrexham':'North Wales', 
         'Conwy':'North Wales', 
         'Gwynedd':'North Wales', 
         'Isle of Anglesey':'North Wales',
         'Flintshire':'North Wales', 
         'Denbighshire':'North Wales', 
         'Powys':'Mid Wales',
         'Ceredigion':'Mid Wales',
         'Carmarthenshire':'South West Wales',
         'Swansea':'South West Wales',
         'Neath Port Talbot':'South West Wales',
         'Pembrokeshire':'South West Wales',
         'Bridgend':'South Wales',
         'Vale of Glamorgan':'South Wales',
         'Cardiff':'South Wales',
         'Rhondda Cynon Taf':'South Wales',
         'Merthyr Tydfil':'South Wales',
         'Caerphilly':'South East Wales',
         'Newport':'South East Wales',  
         'Torfaen':'South East Wales', 
         'Monmouthshire':'South East Wales',    
         'Blaenau Gwent':'South East Wales',
        }
    
    df.Name = df.Name.str.replace(' Council','')
    df.Name = df.Name.replace(wales_region_dict)
    
    df = df.dissolve(by='Name',as_index=False)
    
    return df

def kml_create_json(url_data, loc_save='./_data', fname='region.json', doScotWales=False):
    """
    loads a KML location file (from local file or url) 
    and then saves this as a json file locally
    
    Args: url_data (file path) the path to the kml file,
          loc_save (directory path) the location to save the data
          fname (file name string) the file name of the json file
          doScotWales (string) whether need to call split_islands on Scottish data
            or join on Wales data
    Returns: 
        Names the regions in the json file
        file path to the json file
    
    """
        
    # so can use kml files
    fiona.drvsupport.supported_drivers['KML'] = 'rw'

    # load the kml file into a geopandas dataframe
    df = gpd.read_file(url_data, driver='KML')

    if doScotWales == 'Scotland':
        df = split_islands(df)
    elif doScotWales == 'Wales':
        df = joinWalesRegions(df)
    # Save as a json file to load in plot_map
    file_path = os.path.join(loc_save, fname)
    df.to_file( file_path, driver="GeoJSON")

    
    return df.Name, file_path

def plot_map(df, 
             what_to_plot='amount_pc',region_to_plot='Name',
             json_path='./_data/region.json',
             longitude=-3.1, latitude=54.1):
    """
    plots a folium choropleth map of the data in df
    using the regions defined in a json file
    
    Args: df (pandas dataframe) dataframe with location (region_to_plot) related to json file
                in one column and data to plot (what_to_plot) in another
          what_to_plot/region_to_plot (string) columns in df
          json_path (file path) the path to the json file
          longitude/latitude (float) for the centre of the map
    Returns: 
        The folium map
    
    """
    # create a basic map
    # This sets the basic look of the map
    # things like the zoom, color scheme, where the map is centred etc

    m = folium.Map(location=[latitude,longitude], 
                   zoom_start=5,
                   control_scale=True,
                   tiles="Stamen Toner")
    
    folium.TileLayer('CartoDB positron',name="Light Map",control=False).add_to(m)
    
    """
    Create the Choropleth part:

    - This creates the Choropleth map (i.e. regions on a map of different color based on their value)
    - json_path this is the path to a json file with the location data
        - key_on= "feature.properties.Name" refers to the json file
        - and needs to match region_to_plot in columns=[region_to_plot,what_to_plot]
    - data the data normally a pandas dataframe
        - the columns are region_to_plot and what_to_plot
        - what_to_plot are the values to plot
        - region_to_plot should match the json file
    """

    choropleth = folium.Choropleth(
        geo_data=json_path,
        name='choropleth',
        legend_name= what_to_plot,
        data= df,
        columns=[region_to_plot,what_to_plot],
        key_on= "feature.properties.Name",
        fill_color='YlGn',
    ).add_to(m)
    
    # adds ability to see names of regions on mouseover
    choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['Name'],labels=False)
    )
    

    return m


def url_KML_map(url_data, doScotWales=False):
    """
    The main function used to take a KML file and plot it 
        Calls kml_create_json to create json file
        and plot_map to plot the map
    
    Args: url_data (file path) the path to the kml file
          doScotWales (string) whether need to call split_islands on Scottish data
            or join on Wales data
    Returns: 
        The folium map
    
    """
        
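    # create a json file for plotting and gives back names of regions
    # (backslashes are normalised so local Windows paths also give a sensible file name)
    fname = os.path.splitext(url_data.replace('\\', '/').split('/')[-1])[0] + '.json'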
    map_names, json_path = kml_create_json(url_data, fname=fname, doScotWales = doScotWales)

    df = pd.DataFrame(columns=['County','Data'])
      
    # add the names of the regions
    df['County'] = map_names
    # create some random data to plot
    df['Data']= np.random.randint(0 ,100,len(df) )

    m = plot_map(df,what_to_plot='Data',region_to_plot='County',
                json_path = json_path)
    
    return m

Create maps#

The first map uses UK postcode areas, i.e. the first one or two letters of the postcode, such as NE, M, PL or WA. This is the easiest approach because the postcode input data can be transformed easily.

However, the regions are less intuitive (what is WA or PL?), and postcode areas vary a lot in size: the London areas are small, whereas Northern Ireland is one large area.
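As a rough sketch of that transformation, the postcode area can be pulled out with a regular expression that takes the leading one or two letters before the first digit. The postcode_area helper and the sample postcodes are assumptions for illustration and are not used elsewhere on this page.

```python
import re

# Leading one or two letters, which must be followed by a digit
AREA_PATTERN = re.compile(r"^\s*([A-Za-z]{1,2})\d")


def postcode_area(postcode):
    """Return the postcode area (e.g. 'NE'), or None if it can't be found."""
    match = AREA_PATTERN.match(postcode)
    return match.group(1).upper() if match else None


for pc in ["NE1 4ST", "M1 1AE", "pl4 8aa", "WA1 1AA", "not a postcode"]:
    print(pc, "->", postcode_area(pc))
```

Anything that doesn't look like a postcode gives None, so it can be filtered out before plotting.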


url_data = "https://www.doogal.co.uk/kml/UkPostcodes.kml"

url_KML_map(url_data)

For England#

I chose a region size approximately the same as the postcode areas above, using the Doogal data for counties in England. This gave both the polygon file for the counties and a postcode-to-county conversion.

url_data = 'https://www.doogal.co.uk/kml/counties/Counties.kml'
url_KML_map(url_data)

For Scotland#

Postcode data can be found in numerous places (including Doogal). I decided to use the National Records of Scotland, as the region size is similar to the postcode areas and the England counties.

For the polygon data the UK Data Service is used.

The Scottish Parliamentary regions are used, with area_to_plot='ScottishParliamentaryRegion2021Code', a column in the postcode data from National Records of Scotland.

Note that the region codes differ slightly between the two sources because the regions were defined in different years. After checking, the regions themselves are the same; only the code numbers differ slightly.

The .kml file is very large, mainly because the Highlands and Islands multi-polygon is so detailed in order to incorporate all the islands. To get around this, the function split_islands explodes the Highlands and Islands multi-polygon into separate polygons, keeps the first few, and then merges them back together.

# url_data = "https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/kml/Scotland_preg_2011.zip"
url_data = ".\\_data\\scotland_preg_2011.KML"

url_KML_map(url_data,'Scotland')

For Wales#

For Wales, I couldn’t find a split of postcodes into regions that weren’t very small and for which I also had geometry data, so I decided to create my own.

I used the district codes from the Wales postcode data. These are of the form W06000015, W06000016, etc., and there are 22 for Wales.

I copied all these codes into the Doogal multiple postcode generator, which created a kml file that I downloaded. I then combined these areas into 5 regions using wales_region_dict (shown in a cell below):

  • North Wales

  • Mid Wales

  • South-West Wales

  • South Wales

  • South-East Wales

The map regions are then found by renaming each district to its region using wales_region_dict and then using .dissolve(by='Name') to combine the polygons (multi-polygons) of each region.

url_data='.\\_data\\WalesDistrict.kml'
url_KML_map(url_data)
wales_region_dict={ 
 'Wrexham':'North Wales', 
 'Conwy':'North Wales', 
 'Gwynedd':'North Wales', 
 'Isle of Anglesey':'North Wales',
 'Flintshire':'North Wales', 
 'Denbighshire':'North Wales', 
 'Powys':'Mid Wales',
 'Ceredigion':'Mid Wales',
 'Carmarthenshire':'South West Wales',
 'Swansea':'South West Wales',
 'Neath Port Talbot':'South West Wales',
 'Pembrokeshire':'South West Wales',
 'Bridgend':'South Wales',
 'Vale of Glamorgan':'South Wales',
 'Cardiff':'South Wales',
 'Rhondda Cynon Taf':'South Wales',
 'Merthyr Tydfil':'South Wales',
 'Caerphilly':'South East Wales',
 'Newport':'South East Wales',  
 'Torfaen':'South East Wales', 
 'Monmouthshire':'South East Wales',    
 'Blaenau Gwent':'South East Wales',
}
url_data='.\\_data\\WalesDistrict.kml'
url_KML_map(url_data,'Wales')

For Northern Ireland#

I decided to group the area together using BT postcodes.

A bit on Folium#

Part 1

m = folium.Map(location=[latitude,longitude], 
                   zoom_start=5,
                   control_scale=True,
                   tiles="Stamen Toner")
    
folium.TileLayer('CartoDB positron',name="Light Map",control=False).add_to(m)

Part 2

choropleth = folium.Choropleth(
    geo_data=json_path,
    name='choropleth',
    legend_name= what_to_plot,
    data= df,
    columns=[region_to_plot,what_to_plot],
    key_on= "feature.properties.Name",
    fill_color='YlGn',
).add_to(m)

Part 3

choropleth.geojson.add_child(
folium.features.GeoJsonTooltip(['Name'],labels=False)
)

Above is the code from plot_map, which is used to plot the maps on this page using Folium. It is split into three parts:

  1. Create a basic map:

    • This sets the basic look of the map

    • things like the zoom, color scheme, where the map is centred etc

  2. Create the Choropleth part:

    • This creates the Choropleth map (i.e. regions on a map of different color based on their value)

    • json_path this is the path to a json file with the location data

      • key_on= "feature.properties.Name" refers to the json file

      • and needs to match region_to_plot in columns=[region_to_plot,what_to_plot]

    • data the data normally a pandas dataframe

      • the columns are region_to_plot and what_to_plot

      • what_to_plot are the values to plot

      • region_to_plot should match the json file

  3. This just adds ability to see names of regions on mouseover
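To make the link between key_on and the data more concrete, the sketch below resolves a dotted key_on string against a single GeoJSON feature and then looks the result up in the values to plot. It only illustrates the idea in plain Python and is not how folium is implemented internally; resolve_key and the sample values are hypothetical.

```python
def resolve_key(feature, key_on):
    """Follow a dotted path such as 'feature.properties.Name' into a feature dict."""
    parts = key_on.split(".")
    if parts[0] != "feature":
        raise ValueError("key_on is expected to start with 'feature'")
    value = feature
    for part in parts[1:]:
        value = value[part]
    return value


# One feature as it would appear in the json file (geometry left out for brevity)
feature = {"type": "Feature", "properties": {"Name": "Devon"}, "geometry": None}

# Stand-in for the region_to_plot -> what_to_plot pairs taken from the dataframe
values = {"Devon": 42, "Cornwall": 17}

region = resolve_key(feature, "feature.properties.Name")
print(region, values.get(region))  # Devon 42
```

A feature whose Name has no entry in the data gives None here, which is why the names in the two sources need to match.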

A bit on geopandas#

GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by shapely. Geopandas further depends on fiona for file access and matplotlib for plotting.

Here I will mainly be loading .KML files, which require a bit of additional code to load. The main columns of the resulting geopandas dataframe, and the methods used on it, are:

  • geometry- this contains the geometry data to define the regions

  • Name- default name of the region, this needs to be referenced in file that calls this

  • kml files need specific loading terminology

    • gpd.read_file(f.path, driver='KML', layer=layer) can work

    • alternatively:

      • fiona.drvsupport.supported_drivers['KML'] = 'rw'

      • df = gpd.read_file(url_data, driver='KML')

  • explode: go from a multi-polygon to a series of polygons

  • dissolve: go back from polygons to a multi-polygon

Going from postcode to area#

To create a function that takes in a postcode and outputs a region the following was done:

  • Load the postcode and region data to a dataframe

    • i.e. pcode = pd.read_csv(os.path.join(dirname, filename), usecols=['Postcode', area_to_plot])

  • Convert it to a dict pcode_dict = dict(pcode.values)

  • Use map to convert the postcodes to area

    • df[area_to_plot]=df.loc[:,'Postcode'].map(pcode_dict)
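The steps above can be sketched end to end as follows. This is a minimal example under stated assumptions: the lookup CSV is replaced by a small in-memory string with illustrative rows, 'County' is just an example value for area_to_plot, and the final check for unmatched postcodes is an addition that is not part of the original steps.

```python
import io

import pandas as pd

area_to_plot = "County"  # example column name; use whichever region column the lookup has

# Stand-in for the postcode lookup file (illustrative rows only)
lookup_csv = io.StringIO("Postcode,County\nPL4 8AA,Devon\nTR1 1AA,Cornwall\n")

# 1. load the postcode and region columns
pcode = pd.read_csv(lookup_csv, usecols=["Postcode", area_to_plot])

# 2. convert to a dict of postcode -> region
pcode_dict = dict(pcode.values)

# 3. map the postcodes in the data to regions
df = pd.DataFrame({"Postcode": ["PL4 8AA", "TR1 1AA", "ZZ9 9ZZ"]})
df[area_to_plot] = df.loc[:, "Postcode"].map(pcode_dict)

# Postcodes missing from the lookup become NaN, so they are easy to list
unmatched = df.loc[df[area_to_plot].isna(), "Postcode"].tolist()
print(df)
print(unmatched)  # ['ZZ9 9ZZ']
```

The mapping relies on exact string matches, so the postcodes in both sources need the same spacing and capitalisation.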

Next page

More details on the combined maps.