In the previous blog post, we used a data set
downloaded from the UK based HM Land Registry website to demonstrate how Access
can be used in the context of Data Analysis.
As you may remember, the data set contained details of 2023 house sales
in the Salford area of Greater Manchester, and we created a series of Access
Queries to count the number of sales which occurred in seven separate price
bands. This gave us an overview of what had occurred in the Salford area
housing market that year.
In today’s post we are going to process the
same data set with an Access Crosstab Query to give us a breakdown of average
house sale price by property type and postcode area. This will show us the differences in average
house sale price according to where a property is located and the type of
building it is (ie whether the property is detached, semi-detached, terraced,
or a flat). As we shall see, our
crosstab query will restructure the dataset in such a way that property type
and postcode values from the original table become row and column
headings in the resulting new table.
Furthermore, the values produced in the new table are derived from an
aggregated average of individual house sale amounts from the original.
Let’s consider what this means in
practice. The original table is
comprised of rows which represent individual house sales with each column containing
a specific piece of information about the sale.
Here is a sample of the 2,942 record dataset we downloaded and cleansed in
the last blog post (see Using Access for Data Analysis).
![]() |
Figure 1: Sample from original dataset of house sales in Salford during 2023. |
As we can see each house sale record has its
own unique id, sale amount (price_paid), date (deed_date), postcode and
property type. As stated above, we are going to create a crosstab query which
will restructure the table so that each row represents a property type
and each column a postcode area. Each value in the table is the average
sale amount (ie price_paid) for all properties for the property type and
postcode area of its particular row and column respectively.
The table below shows how the crosstab query results will look:
Postcode Postcode Postcode Postcode Postcode Property type Average
£ Average
£ Average
£ Average
£ Average
£ Property type Average
£ Average
£ Average
£ Average
£ Average
£ Property type Average
£ Average
£ Average
£ Average
£ Average
£ Property type Average
£ Average
£ Average
£ Average
£ Average
£
Before we look at how to create the crosstab
query itself, let us first clarify what the property type and postcode
data actually means. Property type data can be one of four different letters – D,
S, T or F. These
are the first letters of the following property types – Detached, Semi-detached,
Terraced or Flat. For those of you outside of the UK and
unfamiliar with these property types: Detached means a stand-alone
building containing a single home; Semi-detached means a single building
comprised of two separate homes; Terraced means a row of multiple
adjoining homes; and Flat means a large building divided into multiple
apartments. Detached homes tend to be the most expensive property type,
followed by Semi-detached, Terraced and Flats respectively. However, location
has a direct bearing on the property’s sale price, so a semi-detached property
in a desirable area may be more expensive than a detached property located
somewhere less desirable. This is where
the properties postcode value comes in.
Postcodes are comprised of two separate parts eg M28 1FU. The full postcode tells us which part of a
street the property is located, so each full postcode may be shared by a small
number of other properties on the same street.
The first part of the postcode eg M28, on the other hand, relates
to the general area where the property is located, so there can be hundreds of
other properties sharing this part of the code.
As such this can give us a good idea about the desirability of the
properties location, and it is this part that we shall use for the column
headings in our crosstab query.
Creating the
Access Crosstab Query
Let us now
look at how the crosstab query is created.
In the
previous blog post on Using Access for Data Analysis we downloaded a
dataset from the UK based HM Land Registry website and imported it into an
Access table. We called the table tblSalfordHouseSales
(after data cleansing). We shall be
using this table to create the crosstab query below.
Here are the step by step instructions:
- Open the Access database in which tblSalfordHouseSales is located and select the CREATE tribbon.
- Click the QUERY DESIGN icon from the QUERIES group to open the QUERY DESIGN window.
- Click the ADD TABLES icon from the QUERY SETUP group of the QUERY DESIGN ribbon. This opens the ADD TABLES sidebar (if it is not open already).
- Select the TABLES tab from the ADD TABLES sidebar, and drag the tblSalfordHouseSales table onto the top section of the QUERY DESIGN window.
- Next, we need to click the CROSSTAB icon from the QUERY TYPE group of the QUERY DESIGN ribbon. This adds CROSSTAB row to the QUERY DESIGN grid in the lower half of the QUERY DESIGN window.
Figure 2: : The CROSSTAB icon in the QUERY TYPE group of the QUERY DESIGN ribbon. Figure 3: The QUERY DESIGN grid after selecting property_type in first column. Notice the extra CROSSTAB row heading appear after clicking the CROSSTAB icon in the QUERY TYPE ribbon group. - The first field we are going to add to the QUERY DESIGN grid is property_type. Do this by selecting the property_type field from the drop down list in the first column of the FIELD row of the QUERY DESIGN grid, or by dragging property_type down from the tblSalfordHouseSales table box in the top half of the window.
- As explained above, each data value in the property_type field will act as a row heading in our crosstab query. To set this up on the QUERY DESIGN grid, we go down to the CROSSTAB row on the property type column, and select ROW HEADING from the drop down list as in figure 3 above.
- The next field we are going to add to the QUERY DESIGN grid is postcode. However, as explained above, we are not interested in the full postcode, just the general postcode area. You may remember that we discussed how UK postcodes are comprised of two parts and it is the first part which tells us the general area. As such, we need to use a string function to extract the first part of the code from the full field value. We are going to achieve this end using the InStr function to count the number of characters in the first part of the code, and then using the Left function to extract these characters from the full field value. We use the InStr function to search for the space character which separates the two parts of the code. The value the function returns is the numerical character position of the space. The syntax for this is as follows:
InStr([postcode], “ “) -1
By
subtracting 1 from the return value, we get the character count of the first
part of the code. For example, if the postcode was “SK14 5RP”, our function
returns 5 for the space position, and then 4 for the character
count after we subtract 1.
We
then use the Left function to extract the number of characters in the character
count from the left side of the string. The syntax for this is as follows:
Left([postcode], InStr([postcode], “
“)-1)
… and
this returns the first part of the postcode as a new string value ie “SK14” for
the example used above.
As
such, we can now add this function to the field row in the second column of the
QUERY DESIGN grid. The string data this function will return will form the
column headings in our crosstab query result, so select COLUMN HEADING in the
crosstab row drop down list as in the screenshot below:
10. Now that we have sorted our row and column headings, we just need to take care of the actual values where the rows and columns intersect. As you may remember, these are going to be the average price paid for a property of a specific type (ie the property_type row) in a specific area (ie the postcode_area column). Do this by selecting the price_paid field from the drop down list in the third column of the FIELD row of the QUERY DESIGN grid, or by dragging price_paid down from the tblSalfordHouseSales table box in the top half of the window.
11. Next, we need to change the value in the TOTAL row of the price_paid column in the QUERY DESIGN grid. Change the value from GROUP BY to AVG. This will give us the average price_paid aggregate for properties of a specific type in a specific area.
12. This is where we set the CROSSTAB row of the price_paid column in the QUERY DESIGN grid. Select VALUE from the dropdown list as in figure 4 above. This simply tells Access that the price_ paid field is constituting the values in the crosstab query.
13. Finally, we just need to set field properties for price_paid so that we get a whole number without any decimal places. Do this by clicking anywhere in the price_paid QUERY DESIGN GRID column, and then clicking the PROPERTY SHEET icon in the SHOW/HIDE group of the QUERY DESIGN ribbon, and then set the FORMAT property to CURRENCY and the DECIMAL PLACES property to 0. This will make the query results much easier to read.
All we need to do now is click RUN in the RESULTS group of
the QUERY DESIGN ribbon to see the results of our crosstab query!
![]() |
Figure 5: Average price paid for property types in the different Salford poscode areas (data taken from sales during 2023). |
As we can see, the different property type
codes from data in our original table form the row headings in the crosstab
query, and the postcode areas form the column headings. The actual table
values, as we have learnt above, are an aggregated price_paid average of all
properties of a specific type in a specific area. So, for example, the average price paid of
all detached properties in the M27 postcode area is £335,592. If
we compare this to the average price paid of the same property type in the M28
postcode area we see the value of £440,601 is more expensive, thereby
suggesting that the latter area is more desirable. As such, the crosstab query
has given us a meaningful summary of the 2,942 rows of data from our original
table. In so doing it has helped us
understand what the original data was telling us.