The Spatial Mismatch of Empty Homes and Housing Pressure

Lenka

May 7, 2026 21 min read housing, Python, inequalities

I recently read Andrew Brooks's article in Geographical, which makes a convincing argument in today’s debate on empty homes in the UK. The argument resonated with me, as the pressure to build new houses, from both the public and the government, has never quite sat well with me. I suppose this may be influenced by my coming from the Czech Republic, where 50–60% of the population lives in flats, in stark contrast to the UK, where only around 20% do. Moreover, many people in the Czech Republic would rather own a flat than a house, perhaps due to better transport options and lower maintenance requirements. Looking at national statistics, one might not initially assume housing supply to be a problem. With a total population of 69.3 million and roughly 30.4 million dwellings, the UK has around 2.3 people per dwelling. Theoretically, this appears entirely reasonable.
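
As a quick check of the arithmetic, here is a minimal sketch using only the two national totals quoted above; the variable names are mine.

```python
# Figures quoted in the paragraph above (UK totals).
population = 69.3e6
dwellings = 30.4e6

# Average number of people per dwelling if homes were shared out evenly.
people_per_dwelling = population / dwellings
print(f"{people_per_dwelling:.2f} people per dwelling")
```

This is only a national average, so it says nothing about where the dwellings are or how large they are.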

Setting that aside, however, the reality is that housing demand is generally much higher than supply, especially in some locations. This is not simply due to a lack of homes; rather, it is likely driven by a number of underlying factors such as the increasing number of empty homes, the increasing number of lone households, the increasing number of retirees living in otherwise empty family housing, and at the same time the increasing number of empty retirement properties, among others. Additionally, each of these factors has its own underlying causes, which often circle back to the broader demand–supply dynamic. (I'm not a housing expert, however, so don't quote me on this, and let me know what your theory is!) The complete picture is much more nuanced (as Brooks suggests), and the complexity of the situation is difficult to fully understand, let alone quantify. However, we can still examine some aspects of Brooks’s claim:

“What is called a ‘housing crisis’ is really an ‘empty room crisis’, caused by vacant properties and uninhabited bedrooms. There is not a problem of housing stock, but of distribution”

So let’s take a closer look at what this looks like on a map (a phrase geographers tend to use at every opportunity, a slightly toxic trait, if you will).

The available data

I use the latest (2025) data on the number of empty homes in England, derived from council tax records (Council Taxbase 2025). This dataset is only available at the local authority level, so the analysis cannot be more granular. However, it also includes information on second homes and a breakdown by tax band.

The second dataset is occupancy data from the latest Census (2021), obtained via Nomis. This shows how many households have either unoccupied bedrooms or are overcrowded in each local authority.

I complement this with data on population size for each local authority, as well as urban–rural classification and the latest IMD data (2025), to provide additional context. Finally, I include geographical boundaries for local authorities to enable mapping.

import pandas as pd 
import numpy as np 
import geopandas as gpd
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
import matplotlib.ticker as mtick
import matplotlib.patches as mpatches
import seaborn as sns
from libpysal.weights import Queen
from esda.moran import Moran_Local_BV
import folium
import warnings
warnings.filterwarnings("ignore")

# Load data
# technical note: you may need to make some manual adjustments to certain sheets,
# such as removing the first or last few rows and converting object columns to numeric values
# You may also need to use LAD lookup tables to correct issues with Sheffield and Barnsley LAD codes
empty_homes = pd.read_csv('./empty_homes_2025.csv')
occupancy = pd.read_csv('./occupancy2021.csv').rename(columns={'mnemonic': 'LADCD'})
ruc = pd.read_csv('./RUC.csv').iloc[:, [0,2,4]]
imd = pd.read_csv('./IMD.csv').loc[:, ['LAD24CD','IMD_decile(10 least deprived)']].groupby('LAD24CD').mean().reset_index()
pop = pd.read_csv('./pop.csv').rename(columns={'2021': 'population', 'mnemonic': 'LADCD'})
LA = gpd.read_file('./Local_Authority_Districts_May_2024_Boundaries_UK_BFE_7458506961569058424.geojson')

# Merge datasets together
empty_homes.rename(columns={'ONS Code': 'LADCD'}, inplace=True)
data = empty_homes.merge(occupancy, on='LADCD', how='left'
                         ).merge(ruc, left_on='LADCD', right_on='LAD24CD', how='left'
                                 ).merge(pop, on='LADCD', how='left').merge(imd, left_on='LADCD', right_on='LAD24CD', how='left')
data = LA.merge(data, right_on='LADCD', left_on='LAD24CD', how='right')
data.columns = data.columns.str.strip()

# let's just keep the columns we need
data = data.loc[~data['LAD24NM'].isna(), ['LADCD','LAD24NM','Region','geometry', 
       'Total empty homes', 'all_empty_small', 'all_empty_medium',
       'all_empty_large', 'Total homes empty more than 6 months',
       'long_empty_small', 'long_empty_medium', 'long_empty_large', 'Total_dwellings',
       'Total_small', 'Total_medium', 'Total_large', 'Second_homes',
       'second_small', 'second_medium', 'second_large',
       'Total: All households', 'Occupancy rating of bedrooms: +2 or more',
       'Occupancy rating of bedrooms: +1', 'Occupancy rating of bedrooms: 0',
       'Occupancy rating of bedrooms: -1',
       'Occupancy rating of bedrooms: -2 or less', 'RUC21CD',
       'Urban_rural_flag','IMD_decile(10 least deprived)',
       'population']].rename(columns={'Total_dwellings': 'Total homes'})
data.head(5)
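
Because the tables are joined on local authority codes, it can be worth checking which codes fail to match (the Sheffield and Barnsley issue mentioned in the technical note is one such case). The sketch below is not part of the original workflow: it uses made-up toy tables and hypothetical codes to show how the `indicator=True` option of `merge` flags unmatched rows.

```python
import pandas as pd

# Toy stand-ins for two of the tables; codes and values are made up.
empty_homes_demo = pd.DataFrame({"LADCD": ["E001", "E002", "E003"],
                                 "Total empty homes": [120, 340, 95]})
occupancy_demo = pd.DataFrame({"LADCD": ["E001", "E002", "E999"],
                               "Total: All households": [5000, 12000, 800]})

# indicator=True adds a "_merge" column recording where each row came from.
check = empty_homes_demo.merge(occupancy_demo, on="LADCD", how="left", indicator=True)

# Rows flagged "left_only" have no partner in the occupancy table.
unmatched = check.loc[check["_merge"] == "left_only", "LADCD"].tolist()
print(unmatched)
```

Any code that shows up in such a list would need fixing with a lookup table before the rates are calculated.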

We now convert the core variables into more digestible measures, expressed as rates.

data["long_empty_rate"] = (data["Total homes empty more than 6 months"] / data["Total homes"])
data["second_homes_rate"] = (data['Second_homes'] / data["Total homes"])
data["all_empty_rate"] = (data["Total empty homes"] / data["Total homes"])
data["all_unoccupied_rate"] = ((data["Total empty homes"]+ data['Second_homes']) / data["Total homes"])

data["overcrowded_rate"] = ((data["Occupancy rating of bedrooms: -2 or less"]+ data['Occupancy rating of bedrooms: -1']) /data["Total: All households"])
data["underoccupancy_rate"] = ((data['Occupancy rating of bedrooms: +2 or more'] + data['Occupancy rating of bedrooms: +1']) /data["Total: All households"])

# unoccupied rates by property size (small, medium, large)
data["all_unoccupied_small_rate"] = ((data['all_empty_small']  + data['second_small'])/ data['Total_small'])
data["all_unoccupied_medium_rate"] = ((data['all_empty_medium']  + data['second_medium'])/ data['Total_medium'])
data["all_unoccupied_large_rate"] = ((data['all_empty_large']  + data['second_large'])/ data['Total_large'])

# helper functions for the figures
def map_with_london(data, column, title, ax, percentage=False):

    data.plot(column=column,ax=ax, legend=True)
    
    if percentage:
        cbar = ax.get_figure().axes[-1]
        cbar.yaxis.set_major_formatter(mtick.PercentFormatter(xmax=1))

    ax.set_title(title)
    ax.axis("off")

    # london
    london = data[data["Region"] == "L"]
    axins = inset_axes(ax, width="40%", height="40%", loc="center left")
    london.plot(column=column, ax=axins, legend=False)
    axins.set_title("London", fontsize=8)
    axins.axis("off")


def top_bottom_plot(data, column, ax, name_col="LAD24NM", title = None):
    
    top5 = data.sort_values(column, ascending=False).head(5)
    bottom5 = data.sort_values(column, ascending=True).head(5)

    cmap = plt.get_cmap("viridis")
    top5["color"] = [cmap(0.95)] * len(top5)      
    bottom5["color"] = [cmap(0.05)] * len(bottom5)  

    combined = pd.concat([bottom5, top5])
    combined = combined.sort_values(column, ascending=True)

    ax.barh(combined[name_col],combined[column],color=combined["color"])

    ax.set_title(f"Top 5 and Bottom 5 areas: {column}" if title is None else title)
    ax.set_xlabel("Rate")
    ax.xaxis.set_major_formatter(mtick.PercentFormatter(1))

fig, axes = plt.subplots(1, 2, figsize=(22, 6))
map_with_london(data, "all_empty_rate", 
                "All empty homes \n(percentage of all homes that are empty)", ax = axes[0], percentage=True)
top_bottom_plot(data, "all_empty_rate", ax = axes[1], title="Top 5 and Bottom 5 areas")
plt.show()

fig, axes = plt.subplots(1, 2, figsize=(22, 6))
map_with_london(data, "long_empty_rate", 
                "Long term empty homes \n(percentage of all homes left empty for more than 6 months)", ax = axes[0], percentage=True)
top_bottom_plot(data, "long_empty_rate", ax = axes[1], title="Top 5 and Bottom 5 areas")
plt.show()

fig, axes = plt.subplots(1, 2, figsize=(22, 6))
map_with_london(data, "second_homes_rate", 
                "Second homes \n(percentage of all homes that are second homes)", ax = axes[0], percentage=True)
top_bottom_plot(data, "second_homes_rate", ax = axes[1], title="Top 5 and Bottom 5 areas")
plt.show()

fig, axes = plt.subplots(1, 2, figsize=(22, 6))
map_with_london(data, "all_unoccupied_rate", 
                "All unoccupied homes \n(percentage of all homes that are unoccupied \nfor at least part of the year, including second homes)", ax = axes[0], percentage=True)
top_bottom_plot(data, "all_unoccupied_rate", ax = axes[1], title="Top 5 and Bottom 5 areas")
plt.show()

The proportion of empty homes ranges between 0% and 4.4%, with an average of around 2%. The highest value is observed in the Isles of Scilly, while the lowest is in Havant. Within these, long-term empty properties range between 1.1% and 3.3%, with similar areas appearing in both the top and bottom five.

However, these are not the only types of empty properties people tend to notice in their neighbourhoods. In fact, when discussing empty homes, people often refer to second homes, which are technically not vacant properties but are still visibly underused. I therefore extracted this information from the same dataset. The rate of second homes ranges from 0% to 24%, with an average of around 1%. This represents a much wider range than empty homes, but it is also highly skewed: most areas have very low rates, while only a few have very high concentrations. On a map, the City of London immediately stands out as having the highest rate of second homes.

As a result, a combined measure of empty homes and second homes is more strongly influenced by the distribution of second homes than by empty properties alone.
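
To illustrate what such a skew does to summary statistics, here is a small sketch with made-up rates (not the real data): nine areas with very low values and a single outlier at 24%.

```python
from statistics import mean, median

# Made-up second-home rates for ten areas: nine low values and one extreme outlier.
rates = [0.002, 0.003, 0.004, 0.004, 0.005, 0.006, 0.007, 0.008, 0.011, 0.24]

# The mean is pulled upwards by the outlier, while the median barely notices it.
print(f"mean:   {mean(rates):.1%}")
print(f"median: {median(rates):.1%}")
```

With a distribution like this, the median is arguably the more honest description of a "typical" area.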

Let’s now take a look at the size distribution.

fig, axes = plt.subplots(1, 2, figsize=(22, 6))
map_with_london(data, "all_unoccupied_small_rate", 
                "Small size unoccupied homes \n(percentage of small homes that are unoccupied)", ax = axes[0], percentage=True)
top_bottom_plot(data, "all_unoccupied_small_rate", ax = axes[1], title="Top 5 and Bottom 5 areas")
plt.show()

fig, axes = plt.subplots(1, 2, figsize=(22, 6))
map_with_london(data, "all_unoccupied_medium_rate", 
                "Medium size unoccupied homes \n(percentage of medium homes that are unoccupied)", ax = axes[0], percentage=True)
top_bottom_plot(data, "all_unoccupied_medium_rate", ax = axes[1], title="Top 5 and Bottom 5 areas")
plt.show()

fig, axes = plt.subplots(1, 2, figsize=(22, 6))
map_with_london(data, "all_unoccupied_large_rate", 
                "Large size unoccupied homes \n(percentage of large homes that are unoccupied)", ax = axes[0], percentage=True)
top_bottom_plot(data, "all_unoccupied_large_rate", ax = axes[1], title="Top 5 and Bottom 5 areas")
plt.show()

Before looking into the empty bedrooms data, let us first examine how many households live in appropriately sized properties.

data['balance_rate'] = data['Occupancy rating of bedrooms: 0'] / data['Total: All households']

fig, axes = plt.subplots(1, 2, figsize=(15, 6))

map_with_london(data, 'balance_rate', 
                "Balanced households \n(percentage of households living in 'appropriate' accommodation)", ax = axes[0], percentage=True)
top_bottom_plot(data, 'balance_rate', ax = axes[1], title="Top 5 and Bottom 5 areas")

plt.show()

The average rate of balanced households across England is 25%, with a maximum of 63% and a minimum of 12%. Interestingly, the top five areas with the highest rates are all located in London.

fig, axes = plt.subplots(1, 2, figsize=(18, 6))
map_with_london(data, "overcrowded_rate", 
                "Overcrowded households \n(percentage of households with at least 1 bedroom fewer than needed)", ax = axes[0], percentage=True)
top_bottom_plot(data, "overcrowded_rate", ax = axes[1], title="Top 5 and Bottom 5 areas")
plt.show()

fig, axes = plt.subplots(1, 2, figsize=(16, 6))
map_with_london(data, "underoccupancy_rate", 
                "Underoccupied households \n(percentage of households with 1+ 'empty' bedrooms)", ax = axes[0], percentage=True)
top_bottom_plot(data, "underoccupancy_rate", ax = axes[1], title="Top 5 and Bottom 5 areas")
plt.show()

# reshape
df_long = data.loc[:, ["all_empty_rate", "long_empty_rate", "second_homes_rate", "all_unoccupied_rate",
                'balance_rate', "overcrowded_rate", 'underoccupancy_rate']].melt(
    var_name="variable",
    value_name="rate")

# labels
label_map = {"all_empty_rate": "All empty homes",
    "long_empty_rate": "Long-term empty",
    "second_homes_rate": "Second homes",
    "all_unoccupied_rate": "All empty and second homes",
    "balance_rate": "Balanced",
    "overcrowded_rate": "Overcrowded",
    "underoccupancy_rate": "Underoccupancy"}

df_long["variable"] = df_long["variable"].map(label_map)

# plot
fig, ax = plt.subplots(figsize=(10, 6))
sns.boxplot(data=df_long, x="variable", y="rate", ax=ax, color="orange")
ax.set_ylabel("Rate")
ax.set_xlabel("")
ax.yaxis.set_major_formatter(mtick.PercentFormatter(1))
ax.set_title("Distribution of empty housing rates and occupation levels across local authorities")
plt.xticks(rotation=20)
plt.tight_layout()
plt.show()

Interestingly, the percentage of households with spare bedrooms is much higher than the percentage of overcrowded households. In some local authorities, up to 85% of households have at least one spare bedroom, with an average of around 70%, while overcrowding peaks at approximately 21%, with an average of just 3%. This is an important finding.

It is far more common in England to have the luxury of a spare bedroom (in some local authorities, more than 80% of households have at least one) than to experience overcrowding (around 20% at most). Across local authorities, we observe empty property rates ranging between 0% and 3%, with an average of around 1%. This comparison, however, depends on how we define “extra space”: so far, any spare bedroom has been counted as surplus.

We therefore adopt a more flexible definition. One spare bedroom can often serve practical purposes (such as a home office, storage, or a child’s room), and similarly, a shortfall of one bedroom may reflect temporary life circumstances, such as a growing family. For this reason, we focus only on households with two or more spare bedrooms when identifying underoccupation, and only on households short of two or more bedrooms when identifying overcrowding.
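
To make the difference between the two definitions concrete, here is a sketch with made-up household counts for two hypothetical areas. Under either definition the three categories should add up to 100% of households; only the boundaries between them move.

```python
import pandas as pd

# Made-up household counts for two areas, using the Census occupancy categories.
demo = pd.DataFrame({
    "+2 or more": [400, 100],
    "+1": [300, 200],
    "0": [250, 500],
    "-1": [40, 150],
    "-2 or less": [10, 50],
}, index=["Area A", "Area B"])
total = demo.sum(axis=1)

# Strict definition: any spare or missing bedroom counts.
strict = pd.DataFrame({
    "underoccupied": (demo["+2 or more"] + demo["+1"]) / total,
    "balanced": demo["0"] / total,
    "overcrowded": (demo["-1"] + demo["-2 or less"]) / total,
})

# Flexible definition: a difference of one bedroom either way still counts as balanced.
flexible = pd.DataFrame({
    "underoccupied": demo["+2 or more"] / total,
    "balanced": (demo["+1"] + demo["0"] + demo["-1"]) / total,
    "overcrowded": demo["-2 or less"] / total,
})

print(strict.round(2))
print(flexible.round(2))
```

In this toy example, Area A moves from 25% balanced under the strict definition to 59% under the flexible one, without a single household changing.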

data['balance_rate'] = (data['Occupancy rating of bedrooms: 0']+ data['Occupancy rating of bedrooms: +1']+ data['Occupancy rating of bedrooms: -1']) / data['Total: All households']
data["overcrowded_rate"] = (data["Occupancy rating of bedrooms: -2 or less"] /data["Total: All households"])
data["underoccupancy_rate"] = (data['Occupancy rating of bedrooms: +2 or more'] /data["Total: All households"])

fig, axes = plt.subplots(1, 2, figsize=(16, 6))
map_with_london(data, "balance_rate", 
                "Balanced households \n(percentage of households living in 'appropriate' accommodation)", ax = axes[0], percentage=True)
top_bottom_plot(data, "balance_rate", ax = axes[1], title="Top 5 and Bottom 5 areas")
plt.show()

fig, axes = plt.subplots(1, 2, figsize=(18, 6))
map_with_london(data, "overcrowded_rate", 
                "Overcrowded households \n(percentage of households with at least 2 bedrooms fewer than needed)", ax = axes[0], percentage=True)
top_bottom_plot(data, "overcrowded_rate", ax = axes[1], title="Top 5 and Bottom 5 areas")
plt.show()

fig, axes = plt.subplots(1, 2, figsize=(16, 6))
map_with_london(data, "underoccupancy_rate", 
                "Underoccupied households \n(percentage of households with 2 or more spare bedrooms)", ax = axes[0], percentage=True)
top_bottom_plot(data, "underoccupancy_rate", ax = axes[1], title="Top 5 and Bottom 5 areas")
plt.show()

# reshape
df_long = data.loc[:, ["all_empty_rate", "long_empty_rate", "second_homes_rate", "all_unoccupied_rate",
                'balance_rate', "overcrowded_rate", 'underoccupancy_rate']].melt(
    var_name="variable",
    value_name="rate")

# labels
label_map = {"all_empty_rate": "All empty homes",
    "long_empty_rate": "Long-term empty",
    "second_homes_rate": "Second homes",
    "all_unoccupied_rate": "All empty and second homes",
    "balance_rate": "Balanced",
    "overcrowded_rate": "Overcrowded",
    "underoccupancy_rate": "Underoccupancy"}

df_long["variable"] = df_long["variable"].map(label_map)

# plot
fig, ax = plt.subplots(figsize=(10, 6))
sns.boxplot(data=df_long, x="variable", y="rate", ax=ax, color="orange")
ax.set_ylabel("Rate")
ax.set_xlabel("")
ax.yaxis.set_major_formatter(mtick.PercentFormatter(1))
ax.set_title("Distribution of empty housing rates and occupation levels across local authorities")
plt.xticks(rotation=20)
plt.tight_layout()
plt.show()

Adjusting our definition of what counts as “spare” results in a less extreme view of the situation. On average, 62% of households in each area are considered balanced, although this ranges from 45% to 91%. Among the areas with the lowest levels of balanced accommodation are Rutland, Rushcliffe, and the Cotswolds; at the same time, these areas also have some of the highest proportions of households with underoccupied bedrooms.

In contrast, the areas with the highest levels of balanced housing, such as the City of London, Tower Hamlets, and Islington, also have the lowest rates of underoccupation.

Interestingly, none of these areas appear among those with the highest or lowest levels of overcrowding. The proportion of households experiencing overcrowding averages around 0.5%, ranging from 0% to 5%. This indicates a strong skew: most areas experience very low levels of overcrowding, while only a few exhibit relatively higher rates. All of the top five areas are London boroughs, with Newham having the highest rate.

To provide some context, Newham, despite having the highest overcrowding rate at 5%, still performs relatively well on other measures: around 80% of households are balanced (above average), approximately 12% experience underoccupation (well below average), and only about 1% of homes are empty (also below average). This suggests that while overcrowding is an issue, the overall housing situation is not uniformly poor.

Let’s now take a closer look at how these variables vary across the urban–rural classification, as the overcrowding map already hints at some differences between these contexts.

fig, axes = plt.subplots(1, 4, figsize=(16, 5), sharey=False)

for i, var in enumerate(["all_unoccupied_rate", "balance_rate", "overcrowded_rate", "underoccupancy_rate"]):
    sns.violinplot(data=data, x="Urban_rural_flag", y=var,
        hue="Urban_rural_flag",
        palette={"Urban": "grey", "Rural": "darkgreen"},
        inner="quartile",
        legend=False,
        ax=axes[i]
    )
    titles = ["All Empty and Second Homes","Balanced Housing","Overcrowding","Under-occupation"]
    axes[i].set_title(titles[i])
    axes[i].set_xlabel("")
    axes[i].set_ylabel("Rate")
    axes[i].yaxis.set_major_formatter(mtick.PercentFormatter(1))

plt.tight_layout()
plt.show()

fig, axes = plt.subplots(1, 4, figsize=(16, 5), sharey=False)

for i, var in enumerate(["all_unoccupied_rate", "balance_rate", "overcrowded_rate", "underoccupancy_rate"]):
    sns.regplot(data=data, x="IMD_decile(10 least deprived)", y=var,
        ax=axes[i],
        scatter_kws={"alpha": 0.5}
    )
    titles = ["All Empty and Second Homes","Balanced Housing","Overcrowding","Under-occupation"]
    axes[i].set_title(titles[i])
    axes[i].set_xlabel("IMD Decile (10 = least deprived)")
    axes[i].set_ylabel("Rate")
    axes[i].yaxis.set_major_formatter(mtick.PercentFormatter(1))

plt.tight_layout()
plt.show()

So where is the mismatch?

Looking across the maps and variables, several patterns become clear:

Empty properties are distributed relatively evenly across both urban and rural areas, suggesting there is no strong spatial (e.g. polycentric) pattern.
At both national and local authority scales, overcrowding appears less common than might be expected. It is important to note that this measure does not capture the full extent of overcrowding (under the flexible definition, it only counts households that are short of two or more bedrooms). However, where overcrowding is present, it is almost entirely concentrated in urban areas.
Underoccupation is widespread across England. Over 40% of local authorities have at least 40% of households with two or more spare bedrooms. It is present in both urban and rural areas, although it is slightly more common in rural areas.

It therefore appears that the relationship between these three variables within a single area is not straightforward. It is not immediately clear where empty homes or spare bedrooms could be used to alleviate overcrowding, beyond the general observation that spare bedrooms are widely available. To address this, we can apply some simple feature engineering to better capture potential mismatches.

Finding areas of mismatch

Let us consider the “empty bedroom problem” as a single-variable issue. By combining the relevant variables, we can construct an index of “occupancy pressure” by subtracting the number of households with two or more spare bedrooms from the number lacking two or more bedrooms, and dividing by the total number of households:

$$ \text{Pressure Index} = \frac{(\text{-2 or less}) - (\text{+2 or more})}{\text{Total households}} $$

This produces a variable that ranges from negative values (indicating excess capacity) to positive values (indicating pressure from overcrowding), with values around zero suggesting a balance between spare bedrooms and overcrowding.

This is, of course, a simplification. Areas that appear “balanced” may not be evenly distributed in reality, but the index provides a useful proxy for internal housing pressure and helps highlight potential mismatches.
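
As a worked example of the formula, here is a sketch with made-up counts for two hypothetical areas; the function name and the numbers are mine and do not come from the dataset.

```python
def pressure_index(short_two_or_more, spare_two_or_more, total_households):
    """(households 2+ bedrooms short - households with 2+ spare) / all households."""
    return (short_two_or_more - spare_two_or_more) / total_households

# Made-up counts for two hypothetical areas of 10,000 households each.
inner_city = pressure_index(short_two_or_more=500, spare_two_or_more=1200, total_households=10_000)
rural_district = pressure_index(short_two_or_more=20, spare_two_or_more=4500, total_households=10_000)

print(f"inner city:     {inner_city:+.3f}")
print(f"rural district: {rural_district:+.3f}")
```

Note that even the more crowded toy area ends up below zero here, because households with spare bedrooms outnumber overcrowded ones; what matters is how far below zero each area sits.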

data["pressure_index"] = (data["Occupancy rating of bedrooms: -2 or less"] - data["Occupancy rating of bedrooms: +2 or more"]) / data["Total: All households"]
data["pressure_index"].hist();

fig, axes = plt.subplots(1, 2, figsize=(10, 5), sharey=False)

for i, var in enumerate(["all_unoccupied_rate", "pressure_index"]):
    sns.violinplot(data=data, x="Urban_rural_flag", y=var,
        hue="Urban_rural_flag",
        palette={"Urban": "grey", "Rural": "darkgreen"},
        inner="quartile",
        legend=False,
        ax=axes[i]
    )
    titles = ["All Empty and Second Homes","Occupancy Pressure Index"]
    axes[i].set_title(titles[i])
    axes[i].set_xlabel("")
    axes[i].set_ylabel("Rate")
    axes[i].yaxis.set_major_formatter(mtick.PercentFormatter(1))

plt.tight_layout()
plt.show()

fig, axes = plt.subplots(1, 2, figsize=(10, 5), sharey=False)

for i, var in enumerate(["all_unoccupied_rate", "pressure_index"]):
    sns.regplot(data=data, x="IMD_decile(10 least deprived)", y=var,
        ax=axes[i],
        scatter_kws={"alpha": 0.5}
    )
    titles = ["All Empty and Second Homes","Occupancy Pressure Index"]
    axes[i].set_title(titles[i])
    axes[i].set_xlabel("IMD Decile (10 = least deprived)")
    axes[i].set_ylabel("Rate")
    axes[i].yaxis.set_major_formatter(mtick.PercentFormatter(1))

plt.tight_layout()
plt.show()

The new pressure index is overwhelmingly negative, reflecting what we observed in the maps above: the issue of underoccupation is far more widespread and pronounced than overcrowding. This creates many areas with potential “pressure release” in terms of available space.

Let's now examine the relationship between these variables using a scatter plot, highlighting observations that fall into the extremes. To do this, we define the top and bottom 20% of observations within each variable.

low_empty = data["all_unoccupied_rate"].quantile(0.2)
high_empty = data["all_unoccupied_rate"].quantile(0.8)

low_pressure = data["pressure_index"].quantile(0.2)
high_pressure = data["pressure_index"].quantile(0.8)



fig, ax = plt.subplots(figsize=(8, 8))

sns.scatterplot(data=data, x="all_unoccupied_rate", y="pressure_index",
    hue="Urban_rural_flag", palette={"Urban": "grey","Rural": "darkgreen"}, alpha=0.8, ax=ax)
xlim = ax.get_xlim()
ylim = ax.get_ylim()

# colored boxes
ax.fill_between([xlim[0], low_empty], high_pressure, ylim[1], color="orange", alpha=0.15)
ax.fill_between([high_empty, xlim[1]], high_pressure, ylim[1], color="red", alpha=0.15)
ax.fill_between([high_empty, xlim[1]], ylim[0], low_pressure, color="blue", alpha=0.15)
ax.fill_between([xlim[0], low_empty], ylim[0], low_pressure, color="green", alpha=0.15)

# lines
ax.axvline(low_empty, color="black", linestyle="--", linewidth=1)
ax.axvline(high_empty, color="black", linestyle="--", linewidth=1)
ax.axhline(low_pressure, color="black", linestyle="--", linewidth=1)
ax.axhline(high_pressure, color="black", linestyle="--", linewidth=1)

# axes ticks
ax.set_xticks([low_empty, data["all_unoccupied_rate"].median(), high_empty])
ax.set_xticklabels(["Low", "Average", "High"], rotation=60)
ax.set_yticks([low_pressure, data["pressure_index"].median(), high_pressure])
ax.set_yticklabels(["Low", "Average", "High"])

# labels
ax.set_xlabel("Empty and Second Homes")
ax.set_ylabel("Occupancy Pressure Index")
ax.text(xlim[0], ylim[1], "Property Shortage", color="orange", ha="left", va="top")
ax.text(xlim[1], ylim[1], "High pressure", color="red", ha="right", va="top")
ax.text(xlim[1], ylim[0], "Underused spaces", color="blue", ha="right", va="bottom")
ax.text(xlim[0], ylim[0], "Balanced living", color="green", ha="left", va="bottom")

ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
plt.show()

This allows us to clearly distinguish between different types of areas:

Red are areas that experience both a high number of empty properties and high occupancy pressure. In these places, local populations may face higher levels of crowding while being surrounded by unused dwellings.
Orange are areas with low levels of empty properties but high occupancy pressure. Here, people may experience crowding, but there are few empty properties available locally.
Blue are areas with a high number of empty homes and substantial spare bedroom capacity. These could be described as approaching “ghost town” conditions, where housing space is significantly underused.
Green are areas with low levels of empty homes and, at the same time, a high availability of spare bedrooms, perhaps the most balanced housing environments.
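
The four categories above can be sketched as a small classification step. This version uses `np.select` on made-up rates with hypothetical cut-offs, as a vectorised alternative to applying a function row by row; it is an illustration under those assumptions rather than the exact code behind the map.

```python
import numpy as np
import pandas as pd

# Made-up rates for five areas; the column names mirror those used in this post.
demo = pd.DataFrame({
    "all_unoccupied_rate": [0.01, 0.08, 0.09, 0.01, 0.04],
    "pressure_index": [-0.10, -0.12, -0.55, -0.50, -0.30],
}, index=["A", "B", "C", "D", "E"])

# Hypothetical cut-offs standing in for the 20th and 80th percentiles.
low_e, high_e = 0.02, 0.07
low_p, high_p = -0.45, -0.15

few_empties = demo["all_unoccupied_rate"] <= low_e
many_empties = demo["all_unoccupied_rate"] >= high_e
low_press = demo["pressure_index"] <= low_p
high_press = demo["pressure_index"] >= high_p

# np.select picks the first matching condition for each row; the rest fall to the default.
demo["housing_type"] = np.select(
    [few_empties & high_press, many_empties & high_press,
     many_empties & low_press, few_empties & low_press],
    ["Property Shortage", "High pressure", "Underused spaces", "Balanced living"],
    default="Middle",
)
print(demo["housing_type"].to_dict())
```

Areas that are extreme on only one of the two variables, or on neither, fall into the "Middle" group.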

Let’s now look at how these patterns appear on a map.

def classify(row):
    if row["all_unoccupied_rate"] <= low_empty and row["pressure_index"] >= high_pressure:
        return "Property Shortage"
    elif row["all_unoccupied_rate"] >= high_empty and row["pressure_index"] >= high_pressure:
        return "High pressure"
    elif row["all_unoccupied_rate"] >= high_empty and row["pressure_index"] <= low_pressure:
        return "Underused spaces"
    elif row["all_unoccupied_rate"] <= low_empty and row["pressure_index"] <= low_pressure:
        return "Balanced living"
    else:
        return "Middle"  # optional: non-extreme areas

data["housing_type"] = data.apply(classify, axis=1)

color_map = {
    "Property Shortage": "gold",    
    "High pressure": "#d73027",    
    "Underused spaces": "#4575b4",   
    "Balanced living": "#1a9850",     
    "Middle": "#d9d9d9"       
}

data["color"] = data["housing_type"].map(color_map)
pop_by_type = data.groupby("housing_type")["population"].sum()

def fmt_pop(x):
    if x >= 1_000_000:
        return f"{x/1_000_000:.1f}M"
    elif x >= 1_000:
        return f"{x/1_000:.0f}K"
    else:
        return str(int(x))
    

fig, ax = plt.subplots(figsize=(10, 10))

data.plot(color=data["color"], linewidth=0.2, edgecolor="white", ax=ax)

ax.set_axis_off()

legend_handles = [
    mpatches.Patch(
        color="gold",
        label=f"Property Shortage ({fmt_pop(pop_by_type.get('Property Shortage', 0))})"
    ),
    mpatches.Patch(
        color="#d73027",
        label=f"High pressure ({fmt_pop(pop_by_type.get('High pressure', 0))})"
    ),
    mpatches.Patch(
        color="#4575b4",
        label=f"Underused spaces ({fmt_pop(pop_by_type.get('Underused spaces', 0))})"
    ),
    mpatches.Patch(
        color="#1a9850",
        label=f"Balanced living ({fmt_pop(pop_by_type.get('Balanced living', 0))})"
    ),
]

ax.legend(
    handles=legend_handles,
    loc="lower right",
    frameon=False,
    title="Extremes (total population of the areas)"
)
ax.set_title("Housing Pressure (Extreme Conditions)", fontsize=14)

plt.tight_layout()
plt.show()

The areas at the extremes are distributed across England. Red and blue areas are those with relatively high rates of empty homes; however, in blue areas there is also a high level of spare bedroom capacity, while red areas are those experiencing overcrowding. In these red areas, policies aimed at increasing property occupancy and reducing bedroom underoccupation could be particularly beneficial.

Orange areas, on the other hand, experience higher levels of household crowding but have very low levels of empty homes.

So far, this interpretation has been fairly straightforward: each area is considered in isolation. In reality, neighbouring areas are often closely connected. If one area experiences high overcrowding, surrounding areas are likely to face similar housing pressures. This suggests that housing pressure is not simply a local issue, but a spatial one. Let’s now explore whether we can account for this by incorporating spatial statistics.

Accounting for spatial relations



# spatial weights 
w = Queen.from_dataframe(data)
w.transform = "r"

# variables
x = data["pressure_index"].values
y = data["all_unoccupied_rate"].values

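# identify islands (areas with no neighbours)
# note: this assumes the weights ids match the DataFrame index labels
data["is_island"] = False
data.loc[w.islands, "is_island"] = True

# bivariate local Moran: x is the focal variable, y is spatially lagged
lisa = Moran_Local_BV(x, y, w)

# cluster type (1 = HH, 2 = LH, 3 = LL, 4 = HL) and pseudo p-value
data["lisa_cluster"] = lisa.q.astype(float)
data["p_value"] = lisa.p_sim

# set islands to NaN only after the cluster column exists,
# otherwise the assignment above would overwrite the NaNs
data.loc[data["is_island"], "lisa_cluster"] = np.nan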

data["lisa_5"] = np.where(data["p_value"] < 0.05, data["lisa_cluster"], 0)

label_map = {
    1: "High pressure",        # HH
    2: "Underused spaces",     # LH
    3: "Balanced living",      # LL
    4: "Property Shortage",    # HL
    0: "Middle"}

data["lisa_5_label"] = data["lisa_5"].map(label_map)
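
For intuition about what the bivariate local Moran statistic measures, here is a NumPy-only sketch on four hypothetical areas arranged in a row. It computes the unscaled statistic (each area's standardised pressure multiplied by the average standardised empty rate of its neighbours) and the quadrant codes; it leaves out the permutation-based significance test that `Moran_Local_BV` performs, and all values are made up.

```python
import numpy as np

# Four hypothetical areas in a row: 0-1-2-3, where neighbours share a border.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
W = adjacency / adjacency.sum(axis=1, keepdims=True)  # row-standardise, like w.transform = "r"

x = np.array([-0.05, -0.10, -0.40, -0.50])  # made-up pressure index
y = np.array([0.08, 0.06, 0.02, 0.01])      # made-up empty and second home rate

zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
lag_zy = W @ zy            # average standardised empties among each area's neighbours
local_i = zx * lag_zy      # unscaled bivariate local Moran statistic

# Quadrants follow the HH=1, LH=2, LL=3, HL=4 coding used in the label map.
quadrant = np.select(
    [(zx > 0) & (lag_zy > 0), (zx <= 0) & (lag_zy > 0), (zx <= 0) & (lag_zy <= 0)],
    [1, 2, 3], default=4)
print(quadrant)
```

In this toy setup the two areas on the left come out as HH (relatively high pressure, neighbours with many empties) and the two on the right as LL.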

Here are the final variables and the number of areas in each combination.


data.loc[:,['housing_type','lisa_5_label']].value_counts()

housing_type       lisa_5_label     
Middle             Middle               201
                   Balanced living       24
Balanced living    Middle                11
High pressure      Middle                10
Middle             Property Shortage      9
Property Shortage  Middle                 7
Underused spaces   Middle                 7
Middle             High pressure          7
                   Underused spaces       5
High pressure      High pressure          5
Balanced living    Balanced living        4
High pressure      Property Shortage      3
Property Shortage  High pressure          1
                   Property Shortage      1
Underused spaces   Underused spaces       1
Name: count, dtype: int64
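
The same comparison can be easier to read as a two-way table. A minimal sketch with `pd.crosstab`, using made-up labels for six areas rather than the real results:

```python
import pandas as pd

# Made-up labels for six areas; the column names mirror those used in this post.
demo = pd.DataFrame({
    "housing_type": ["Middle", "Middle", "High pressure", "High pressure", "Balanced living", "Middle"],
    "lisa_5_label": ["Middle", "Balanced living", "High pressure", "Middle", "Balanced living", "Middle"],
})

# Rows: threshold-based type; columns: LISA-based type; margins add row and column totals.
table = pd.crosstab(demo["housing_type"], demo["lisa_5_label"], margins=True)
print(table)
```

Cells on the diagonal are areas where the two approaches agree.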


# different colour scheme
color_map2 = {
    "High pressure + high nearby empties": "#850000",     # muted red
    "High pressure + low nearby empties": "#B37100",  # muted yellow
    "Low pressure + high nearby empties": "#002487",      # muted blue
    "Balanced living + low empties": "#006622",        # muted green
    "Not significant": "lightgray"
}

# new labels
lisa_labels = {
    "High pressure": "High pressure + high nearby empties",
    "Property Shortage": "High pressure + low nearby empties",
    "Underused spaces": "Low pressure + high nearby empties",
    "Balanced living": "Balanced living + low empties",
    "Middle": "Not significant"
}

data["lisa_readable"] = data["lisa_5_label"].map(lisa_labels)

# plot
fig, ax = plt.subplots(figsize=(10, 15), constrained_layout=True)
data.plot(color=data["lisa_readable"].map(color_map2), ax=ax)

ax.set_title("Spatial clusters of housing pressure and empty homes (LISA 5%)", fontsize = 14)
ax.axis("off")

legend_items = [
    ("High pressure + high nearby empties", "High occupancy, surrounded by empty homes"),
    ("High pressure + low nearby empties", "High occupancy, little nearby empty stock"),
    ("Low pressure + high nearby empties", "Spare capacity, surrounded by empty homes"),
    ("Balanced living + low empties", "Spare capacity, little nearby empty stock"),
    ("Not significant", "No clear spatial pattern")
]

handles = [
    mpatches.Patch(color=color_map2[k], label=v)
    for k, v in legend_items
]

ax.legend(handles=handles, frameon=False, loc="center left", title="Cluster type")

plt.show()

The map shows clusters of the two variables. Specifically:

Red areas are local authorities with relatively high occupancy pressure (in lived-in households) that are surrounded by areas with high rates of empty dwellings.
Yellow areas are local authorities with relatively high occupancy pressure that are surrounded by areas with relatively low rates of empty dwellings.
Blue areas are local authorities with low occupancy pressure (i.e. plenty of spare bedrooms), surrounded by areas with high rates of empty dwellings, whether these are second homes or genuinely empty properties.
Green areas are perhaps the most balanced, characterised by low occupancy pressure, ample spare bedrooms, and relatively low rates of empty homes in surrounding areas.

You can explore both layers in the interactive map below:

# columns shared by both layers
base_cols = ['LADCD', 'LAD24NM', 'geometry', 'Urban_rural_flag',
             'IMD_decile(10 least deprived)', 'population']

first_layer = (
    data.loc[data['housing_type'] != 'Middle', base_cols + ['housing_type']]
    .rename(columns={'housing_type': 'Housing Type'})
)
second_layer = (
    data.loc[data['lisa_readable'] != 'Not significant', base_cols + ['lisa_readable']]
    .rename(columns={'lisa_readable': 'LISA Cluster'})
)

# base map
m = folium.Map(location=[54.5, -2.5], zoom_start=6, tiles="cartodbpositron")

# style functions
def style_function_factory(column, cmap, opacity=0.5):
    def style_function(feature):
        val = feature["properties"][column]
        color = cmap.get(val, "#d9d9d9")

        return {
            "fillColor": color,
            "color": "black",
            "weight": 0.3,
            "fillOpacity": opacity
        }
    return style_function

# layers
folium.GeoJson(
    first_layer,
    name="Housing Pressure Extreme",
    style_function=style_function_factory('Housing Type', color_map),
    tooltip=folium.GeoJsonTooltip(fields=["LAD24NM", "Housing Type",'Urban_rural_flag','population','IMD_decile(10 least deprived)'])
).add_to(m)


folium.GeoJson(
    second_layer,
    name="Spatial cluster (LISA 5%)",
    style_function=style_function_factory("LISA Cluster", color_map2),
    tooltip=folium.GeoJsonTooltip(fields=["LAD24NM", "LISA Cluster",'Urban_rural_flag','population','IMD_decile(10 least deprived)'])
).add_to(m)

folium.LayerControl(collapsed=False).add_to(m)
m.save("map.html")

Summary

Is this a problem of distribution rather than a shortage of housing? The maps above suggest YES. There appears to be a substantial amount of underoccupied housing within UK local authorities. In fact, in around 40% of local authorities, at least 40% of households have two or more spare bedrooms, and such high rates are more common in rural local authorities.

However, the patterns suggest that underoccupation is more likely to occur in rural and less deprived areas, while overcrowding is concentrated almost entirely in urban, and often more deprived, areas. Moreover, although the high-pressure areas appear relatively small on the map, they contain roughly twice the population of the other extreme categories because they are predominantly urban and have much higher population densities.
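
A comparison like this comes from summing population within each category rather than counting areas. Here is a minimal sketch with made-up numbers; housing_type and population are column names from the data above, while the values are purely illustrative.

```python
import pandas as pd

# Toy stand-in for `data` (made-up populations, not real figures)
toy = pd.DataFrame({
    "housing_type": ["High pressure", "High pressure", "Underused spaces",
                     "Balanced living", "Balanced living"],
    "population": [300_000, 250_000, 90_000, 120_000, 110_000],
})

# Count areas and sum residents per category
summary = toy.groupby("housing_type")["population"].agg(areas="count", residents="sum")

# Share of all residents living in each category
summary["resident_share"] = summary["residents"] / summary["residents"].sum()

print(summary.sort_values("residents", ascending=False))
```

Putting the number of areas next to the number of residents shows how a category that looks small on the map can still cover a large share of people.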

Perhaps this is one of the reasons why the “not enough homes” debate is far more visible and widely discussed than the issue of “ghost towns” and underused housing space: it is simply experienced by more people.

There appear to be areas where both housing pressure and opportunity coexist, and where targeted policy interventions could help rebalance local housing markets and mitigate future pressures.

That said, there are several important limitations and unknowns:

There is no clear definition of what constitutes an acceptable level of underoccupation or overcrowding.
Similarly, the “acceptable” rate of empty homes is unclear—one could argue that any empty home represents inefficient use of resources.
The levels of overcrowding observed in the UK may be relatively low compared to those in other countries.
The methodology may overlook areas where underoccupation and overcrowding offset one another.
Housing mobility is constrained by financial, social, and temporal factors, meaning that reallocating space is a complex and long-term process for individuals and families. Some degree of empty or underused housing is likely unavoidable.

In future work, it would be useful to examine the size and type of empty homes, as affordability may be a key factor. As some sources suggest, “it is not the lack of housing, but the lack of affordable housing.” Additionally, incorporating a temporal dimension could provide further insight, particularly given indications that the situation may be worsening over time.