Areas of UK for plotting#

The problem this analysis is trying to solve is how to separate addresses into larger regions:

  • To identify differences in regions

  • To be able to plot the data onto a map

This is split into 3 pages:#

The input data contains UK postcodes. Each postcode can be converted either to a latitude-longitude pair or to a larger region that contains many postcodes, such as a county or city region.

I’ll use the latter method so I can produce choropleth maps. There are a few different tools that can be used to plot these in Python. I’ll use folium, mainly because I have used it before.

Producing the choropleth maps from the postcode data requires two things:

  1. A way to transform postcodes to larger regions

  2. Details of the polygons for those regions

Note: the two need to use the same set of regions, with the same names.
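
A quick way to check this before plotting is to compare the region names in the data with those in the polygon file. The following is a minimal sketch; the helper `unmatched_regions` and the example names are hypothetical and only for illustration.

```python
def unmatched_regions(data_regions, polygon_regions):
    """Return the region names that appear in only one of the two sources."""
    data_set = set(data_regions)
    polygon_set = set(polygon_regions)
    only_in_data = sorted(data_set - polygon_set)
    only_in_polygons = sorted(polygon_set - data_set)
    return only_in_data, only_in_polygons

# illustrative names only, not taken from a real file
data_regions = ["Cumbria", "Devon", "County Durham"]
polygon_regions = ["Cumbria", "Devon", "Durham"]

only_in_data, only_in_polygons = unmatched_regions(data_regions, polygon_regions)
print(only_in_data, only_in_polygons)
```

Any names listed need reconciling on one side or the other before the data and the polygons can be joined.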

For smaller regions the UK census regions work well, or perhaps a nearest-neighbour algorithm would be the best way to go?
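
One possible reading of the nearest neighbour idea is to assign each latitude-longitude pair to the closest of a set of region centres. The sketch below assumes that reading. The function names and the approximate centre coordinates are illustrative assumptions, not part of this analysis.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude-longitude points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearest_region(lat, lon, centres):
    """Return the name of the region whose centre is closest to (lat, lon)."""
    return min(centres, key=lambda name: haversine_km(lat, lon, *centres[name]))

# hypothetical region centres as (latitude, longitude), approximate values
centres = {
    "Manchester": (53.48, -2.24),
    "Newcastle": (54.98, -1.61),
}

region = nearest_region(53.40, -2.30, centres)
```

This brute-force search is fine for a handful of centres; for many points a spatial index would be the more usual choice.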

Code to create maps#

# geometry data
import fiona
import geopandas as gpd
import json

# general - working with dataframes and numbers
import pandas as pd
import numpy as np

import os

# plot map parts
import folium
import branca.colormap as cm

def split_islands(df):
    """
    Takes a gpd dataframe of Scotland which needs reducing in size as too big to plot
        Highlands and Islands multi-polygon is split (or exploded) from a  multi-polygon to set of polygons, the first few are selected and then merged (dissolve) and combined with rest of df
    
    Args: df (geopandas dataframe) of Scottish data
    Returns: 
        df (geopandas dataframe) reduced in size
    
    """
    from shapely.geometry.polygon import Polygon
    from shapely.geometry.multipolygon import MultiPolygon
    
    # the islands row is at index 7 (i.e. the 8th component)
    gdf = df.iloc[7:8].copy()

    # wrap any plain Polygon in a MultiPolygon so every geometry has the same type
    gdf["geometry"] = [MultiPolygon([feature]) if isinstance(feature, Polygon)
                       else feature for feature in gdf["geometry"]]

    # explode the islands into a number of polygons from one multipolygon
    gdf_parts = gdf.explode(column='geometry', ignore_index=True, index_parts=False)

    # take the first 3 elements only and dissolve back to one multi polygon
    df7new = gdf_parts.iloc[0:3].dissolve()

    # create the new geopandas dataframe with the reduced islands and the rest of the rows
    dfnew = pd.concat([df.loc[:6], df7new]).reset_index(drop=True)
    
    return dfnew

def joinWalesRegions(df):
    wales_region_dict={ 
         'Wrexham':'North Wales', 
         'Conwy':'North Wales', 
         'Gwynedd':'North Wales', 
         'Isle of Anglesey':'North Wales',
         'Flintshire':'North Wales', 
         'Denbighshire':'North Wales', 
         'Powys':'Mid Wales',
         'Ceredigion':'Mid Wales',
         'Carmarthenshire':'South West Wales',
         'Swansea':'South West Wales',
         'Neath Port Talbot':'South West Wales',
         'Pembrokeshire':'South West Wales',
         'Bridgend':'South Wales',
         'Vale of Glamorgan':'South Wales',
         'Cardiff':'South Wales',
         'Rhondda Cynon Taf':'South Wales',
         'Merthyr Tydfil':'South Wales',
         'Caerphilly':'South East Wales',
         'Newport':'South East Wales',  
         'Torfaen':'South East Wales', 
         'Monmouthshire':'South East Wales',    
         'Blaenau Gwent':'South East Wales',
        }
    
    df.Name = df.Name.str.replace(' Council','')
    df.Name = df.Name.replace(wales_region_dict)
    
    df = df.dissolve(by='Name',as_index=False)
    
    return df

def kml_create_json(url_data, loc_save='./_data', fname='region.json', doScotWales=False):
    """
    loads a KML location file (from local file or url) 
    and then saves this as a json file locally
    
    Args: url_data (file path) the path to the kml file,
          loc_save (directory path) the location to save the data
          fname (file name string) the file name of the json file
          doScotWales (string) whether need to call split_islands on Scottish data
            or join on Wales data
    Returns: 
        Names of the regions in the json file
        file path to the json file
    
    """
        
    # so can use kml files
    fiona.drvsupport.supported_drivers['KML'] = 'rw'

    # load the kml file to a gpd df (the driver is given by name)
    df = gpd.read_file(url_data, driver='KML')

    if doScotWales == 'Scotland':
        df = split_islands(df)
    elif doScotWales == 'Wales':
        df = joinWalesRegions(df)
    # Save as a json file to load in plot_map
    file_path = os.path.join(loc_save, fname)
    df.to_file( file_path, driver="GeoJSON")

    
    return df.Name, file_path

def plot_map(df, 
             what_to_plot='amount_pc',region_to_plot='Name',
             json_path='./_data/region.json',
             longitude=-3.1, latitude=54.1):
    """
    plots a choropleth map of the data in a dataframe 
    using the regions stored in a json file
    
    Args: df (pandas dataframe) dataframe with location (region_to_plot) related to json file
                in one column and data to plot (what_to_plot) in another
          what_to_plot/region_to_plot (string) columns in df
          json_path (file path) the path to the json file
          longitude/latitude (float) for the centre of the map
    Returns: 
        The folium map
    
    """
    # create a basic map
    # This sets the basic look of the map
    # things like the zoom, color scheme, where the map is centred etc

    m = folium.Map(location=[latitude,longitude], 
                   zoom_start=5,
                   control_scale=True,
                   tiles="Stamen Toner")
    
    folium.TileLayer('CartoDB positron',name="Light Map",control=False).add_to(m)
    
    """
    Create the Choropleth part:

    - This creates the Choropleth map (i.e. regions on a map of different color based on their value)
    - json_path this is the path to a json file with the location data
        - key_on= "feature.properties.Name" refers to the json file
        - and needs to match region_to_plot in columns=[region_to_plot,what_to_plot]
    - data the data normally a pandas dataframe
        - the columns are region_to_plot and what_to_plot
        - what_to_plot are the values to plot
        - region_to_plot should match the json file
    """

    choropleth = folium.Choropleth(
        geo_data=json_path,
        name='choropleth',
        legend_name= what_to_plot,
        data= df,
        columns=[region_to_plot,what_to_plot],
        key_on= "feature.properties.Name",
        fill_color='YlGn',
    ).add_to(m)
    
    # adds ability to see names of regions on mouseover
    choropleth.geojson.add_child(
        folium.features.GeoJsonTooltip(['Name'], labels=False)
    )
    

    return m


def url_KML_map(url_data, doScotWales=False):
    """
    The main function used to take a KML file and plot it 
        Calls kml_create_json to create json file
        and plot_map to plot the map
    
    Args: url_data (file path) the path to the kml file
          doScotWales (string) whether need to call split_islands on Scottish data
            or join on Wales data
    Returns: 
        The folium map
    
    """
        
    # create a json file for plotting and gives back names of regions
    fname=url_data.split('/')[-1].split('.')[0]+'.json'
    map_names, json_path = kml_create_json(url_data, fname=fname, doScotWales = doScotWales)

    df = pd.DataFrame(columns=['County','Data'])
      
    # add the names of the regions
    df['County'] = map_names
    # create some random data to plot
    df['Data']= np.random.randint(0 ,100,len(df) )

    m = plot_map(df,what_to_plot='Data',region_to_plot='County',
                json_path = json_path)
    
    return m

Create maps#

The first map uses UK postcode areas, i.e. the first one or two letters of the postcode, such as NE, M, PL or WA. This is the easiest way as the postcode input data can be easily transformed.

However, the regions are less intuitive, e.g. what is WA or PL? Postcode areas can also be very small or very big, such as the London areas (small) or N. Ireland (big).


url_data = "https://www.doogal.co.uk/kml/UkPostcodes.kml"

url_KML_map(url_data)
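
The transformation from a full postcode to its postcode area can be sketched with a regular expression that keeps the leading one or two letters. The helper name `postcode_area` and the example postcodes are assumptions for illustration only.

```python
import re

# one or two leading letters, followed by the first digit of the district
AREA_PATTERN = re.compile(r"^([A-Z]{1,2})\d")

def postcode_area(postcode):
    """Return the postcode area (e.g. 'NE'), or None if the format is unexpected."""
    cleaned = postcode.strip().upper().replace(" ", "")
    match = AREA_PATTERN.match(cleaned)
    return match.group(1) if match else None

# illustrative postcodes only
areas = [postcode_area(p) for p in ["NE1 4ST", "m1 1ae", "PL4 8AA", "not a postcode"]]
```

Returning None for anything that does not fit the pattern makes it easy to filter out bad rows before counting values per area.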

For England#

I chose a region size approximately the same as the postcode areas above, using the counties in England from doogal. This gave both the polygon file for the counties and a postcode to county conversion.

url_data = 'https://www.doogal.co.uk/kml/counties/Counties.kml'
url_KML_map(url_data)
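
To plot real values rather than random ones, the postcode data would first be joined to the postcode to county conversion and then aggregated per county. The following is a minimal sketch of that step. The column names, the tiny lookup table, and the helper `aggregate_by_region` are all hypothetical, and a real lookup would come from a file.

```python
import pandas as pd

# hypothetical postcode to county lookup
lookup = pd.DataFrame({
    "postcode": ["NE1 4ST", "NE2 1AB", "M1 1AE"],
    "county": ["Tyne and Wear", "Tyne and Wear", "Greater Manchester"],
})

# hypothetical input data keyed by postcode
data = pd.DataFrame({
    "postcode": ["NE1 4ST", "NE2 1AB", "M1 1AE"],
    "amount": [10.0, 30.0, 5.0],
})

def aggregate_by_region(data, lookup, value_col="amount"):
    """Attach a county to each row, then average value_col per county."""
    merged = data.merge(lookup, on="postcode", how="left")
    per_county = merged.groupby("county", as_index=False)[value_col].mean()
    # 'Name' is the property that plot_map keys on in the json file
    return per_county.rename(columns={"county": "Name"})

region_df = aggregate_by_region(data, lookup)
```

The resulting dataframe could then be passed to plot_map with what_to_plot='amount', region_to_plot='Name' and the json file created from the counties KML, provided the county names match those in the file.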