Monday, 26 May 2025

Using Access for Data Analysis

Normally, when we think of  MS Access, we do so in the context of database development; that is to say, as software to create database applications for long term use to keep track of data such as customer orders, income and expenditure, and stock control. Such database applications are used by one or more people over a period of time to store, process and retrieve information, and often support the day to day operation of a business.  

In this tutorial, however, we are  going to look at how MS Access can be used in the context of data analysis.  This use is very much different from the database development model described above. Rather than setting up a database for long term use in the day to running of a business operation for example,  we are going to use Access in a short term context, as a tool to import data from an external source, and then analyse that data through Access Queries to make it meaningful from a particular perspective. This summarised data can be presented to, say, a management team to make informed decisions based on what the information reveals.

To do this we are going to be working with 2023 house sale data for the Salford area of Greater Manchester.  The process will involve extracting a dataset from the UK based HM Land Registry website, importing the dataset into Access, creating query's to categorise and aggregate the data, and then create a report containing a clustered column chart to present the data visually.

Downloading the Dataset

The first step in the process is to download the dataset from the HM Land Registry website. This will involve filling in an online webform to specify the data we require. In our case this is all data between 1st January and 31st December 2023 in the district of SALFORD.

 1. Click this link the HM Land Registry link to open the web form in your browser.


2. Enter "SALFORD" in the DISTRICT textbox, 01/01/2023 and 31/12/2023 for the DATE range (note the UK date format) , and select "ALL" from the HOW MANY RESULTS option group at the bottom. 
3. Click the SHOW RESULTS button.

This opens a new web page showing 2947 property sale transactions for 2915 properties. We now need to download the results into a CSV file.

4. Click the  DOWLOAD DATA button at the top of the form.  This opens a web page giving us options for downloading:




5. Click GET ALL RESULTS AS CSV WITH HEADERS button to start the download. 

A dataset called ppd_data.csv will be saved to your downloads folder.  You can now open the file to inspect the data.  It will open in MS Excel by default and if you scroll to the bottom of the worksheet there should be 2947 rows.


Importing the Dataset into Access

We can now go ahead and import the data into our Access Database.

1. Click NEW DATA SOURCE on the Access EXTERNAL DATA ribbon. Choose FROM FILE when the context menu opens, and then click TEXT FILE.


This will start the GET EXTERNAL DATA - TEXT FILE wizard.

2. Click the BROWSE button on the first page of the wizard and select the text file which has been saved in your downloads folder.  Leave the top radio button selected - it reads "import the source data into a new table in the current database".  Click OK at the bottom of the wizard window.


3. The next page (see screenshot below) shows a sample of the data we are importing.  At the top it reads "Your data seems to be in a delimited format.  If it isn't, choose the format that more correctly describes your data".  If you look at the sample of data displayed, you see each field is separated by a comma.  We can therefore leave the "Delimited" radio button checked, and click NEXT.



4. The next page of the wizard (see screenshot below) asks us to choose the delimiter that separates our fields. As we noted above, our data is separated by commas, so we should now select the "Comma" radio button.  We also need to click the "First Row Contains Field Names" tickbox.  You may get pop up message explaining some of the field names have been modified to make them compatible with Access.  This is fine. Just click OK. We can now click NEXT to move on to the next page of the wizard.



5. This page (see screenshot below) gives us the option to specify information about each of the fields we are importing.  This includes specifying the data type for the field. We are going to change the data type for the price_paid field from SHORT TEXT to CURRENCY.  To do this, highlight the price_paid column and select CURRENCY from the DATA TYPE combo box. We will also change the data type on the deed_date field from SHORT TEXT to DATE WITH TIME. Another thing we are going to change is the index type (ie the Indexed field option) on the unique_id field from "Yes (Duplicates OK)" to "Yes (No Duplicates). This will ensure the data set does not contain any duplicate records.

At this stage we can clean the data set further by excluding any fields that we do not need.  To do this, highlight the field for exclusion and tick the "Do not import field" tick-box in the Field Options section.  We are going exclude the estate_type field, and all the fields from transaction_category and after. 

We can then click NEXT to continue.




6. The next page of the wizard lets us choose a primary key.  There are three options: these are "Let Access add primary key", "Choose my own primary key" and "No primary key".  We are going to select our own primary key, so tick the middle option and select the "unique_id" field from the combo box list. Then click NEXT.



7. In the last page of the wizard we are asked to choose a name for the table into which our data is to be imported.  Entry the table name "tblSalfordHouseSales" and click FINISH.


We can now take an initial look at the table of data we have imported.  Go to the Access navigation pane and open tblSalfordHouseSales (see screenshot below).


As we can see, the data set contains 2946 records.  Each record provides information about the sale of a given property and consists of  13 fields.  Some of  most useful fields we shall be using for data analysis in this tutorial are price_paid, deed_date and postcode. We also have the option of analysing house sales by property_type and whether the property is a new_build.


Checking the Data

Now we have imported the data set into Access, we can start checking it's accuracy.  However, before we start, lets make a copy of the imported table so we can undo any changes that we make without having to repeat the download and import process.  Do this by right clicking tblSalfordHouseSales in the navigation pane and select COPY from the context menu.  Then simply right click a blank space in the navigation pane and select PASTE.  A window opens where we can enter a new name to call the table we are pasting, and select which paste option we would like.  We are going to call the new table "tblSalfordHouseSalesOriginalData" and select the Structure and Data paste option.  This will paste a new table based on the same structure, and containing the same data, as the one we copied. Then click OK.




The first thing we are going to check are the postcodes (zip codes) in our dataset.  Since  the data we have downloaded is for the Salford area of Greater Manchester, each record in the dataset should start with one of the following values in the postcode field: - M3, M5,  M6, M7, M27, M28, M30, M38, M44, and M50 (after which there is a space followed by the second part of the postcode) . As such, we are going to create a query which groups the data into groups comprised of the first three values of the postcode, and provide some basic aggregate information for each grouping. From this information, we should be able to pick out any obvious anomalies contained in the data.

Create Query to Group & Aggregate Postcode Data




1. Create a new query by clicking the QUERY DESIGN icon on the CREATE ribbon.
2. Click the ADD TABLES icon (on the QUERY DESIGN ribbon) to display the ADD TABLES sidebar if it is not visible already.
3. Drag the tblSalfordHouseSales table from the tables list in the sidebar over to the QUERY DESIGN window.
4. Click the TOTALS icon in the SHOW/HIDE group of the QUERY DESIGN ribbon.  This adds a TOTAL row to the QUERY DESIGN grid.  This enables us to specify whether the field is to group the data contained therein, or perform an aggregation on it.
5. The first column in the query grid (see screenshot above) is going give us a list of postcode groups.  All the post code groups in the Salford area (see above) are 2 or 3 characters long.  So to extract the postcode group value from the value stored in the postcode field , we are going to create a calculated field in the first row of the first column of the query grid. The calculation is based on the postcode field in tblSalfordHouseSales and utilises the Left string function to take the first 3 characters of the field value.  This is enclosed within a RTrim function to remove the right blank space if the postcode group is two characters rather than three. It is written as follows: 

RTrim(Left([postcode], 3))

We will use the alias PCode as the field name. This is written ...

PCode:

... directly before the RTrim  function syntax. So altogether the first row of the first column in the query grid (see screenshot above) will read ...

PCode: RTrim(Left([postcode], 3))

We then set this field to GROUP BY in the TOTALS row.

6. In the second column of the query grid,  we are going to count the number of records within each postcode group.  So, in the ADD TABLES side bar, drag the unique_id field from tblSalfordHouseSales down to the first row of the second column in the query grid, and select COUNT from the drop down list in the TOTALs row.
7. In the third column of the query grid we are going to SUM the total of price_paid to give us the total price paid within each postcode group.  So drag price_paid from tblSalfordHouseSales down to the first row of the third column in the grid, and select SUM in the TOTALs row.
8. Click the RUN or DATASHEET icon  in the RESULTS group of the QUERY DESIGN ribbon to see the query results.



Looking at the query results above, we can see that most of the properties have a post code in the Salford area - these are M3, M5,  M6, M7, M27, M28, M30, M38, M44, and M50.  However there are three properties which do not have any postcode, and one property which has a postcode starting with WA1.  Taken together these come to a total value of £10,249,999.  In order to ensure our data set is accurate, we need to investigate why these properties do not have a Salford post code and remove them if necessary.  To do this we shall create another query to get the full information for the dubious records.



1. Repeat steps 1 to 3 above to create a new query based on the tblSalfordHouseSales table.
2. Drag the asterix * from the  tblSalfordHouseSales table down to the first row of the first column on the query design grid.  The asterix is a quick way to include all the fields in the table without having to drag each one individually.  However, if we need to enter a criteria for a particular field (which we do) we can still drag that field down separately .. 
3. As such, drag the postcode field down to the first row of the second column in the grid.
4 Click the CRITERIA row in the postcode column and enter the following expression: 

Is Null  or Like "WA1*"

The criteria Is Null Or Like "wa1*" will pull out any record containing a
postcode with a null value or a value starting with "wa1".

5. Click the RUN or DATASHEET view icon in the RESULTS group of the QUERY DESIGN ribon the see the query results



The first record returned by our query is for a property with the postcode WA14 5DY which (if you scroll across) is actually located in the town of Altrincham.  If you are familiar with the geography of Greater Manchester you will know that this town is in the borough of Trafford rather than Salford. As such, this record does not belong in our dataset and should be removed.  When we examine the data for the remaining three records, it is apparent that they pertain to land in a commerce park, a parking space, and a brown field site. As such they fall outside the scope of our interest in house sales, and can also be removed from our dataset. Its worth reflecting that had we not done this check, our dataset would have inaccurate to the value of £10,249,999.

A good way to remove these records from the dataset is to go back into QUERY DESIGN view and change the query type from SELECT to DELETE.




After changing the query type from SELECT to DELETE, you will see an addition row appear in the query grid entitled "Delete".  You will also see that Access has filled in the values for this row.  In the first column it has added "From" which means the records are being deleted from the tblSalfordHouseSales table, and in the second column it has added "Where" which means all records in the table matching our criterion Is Null Or Like "WA1*" will be deleted.  

Click the RUN icon in the RESULTS group of the QUERY DESIGN ribbon to run the delete query. 

Access will display a warning dialog saying that the 4 rows containing dubious records are going to be deleted from our table.



Click YES to continue the delete operation.

If we now go back and reopen tblSalfordHouseSales the record count will now show 2942 records, 4 less than the 2946 records we had previously.



There are many other checks that we can do to ensure our dataset is clean, but the steps mentioned above give us an idea of how we can use Access's queries and other features in this process.  Let us now move on and see how we can use Access to analyse our dataset.

Analysing Data with Access Queries

To illustrate how Access can be used to make sense of our housing sale dataset, we are going to create a number of price bands and count the number of sales in each band. This will give us a basic understanding of the housing market in Salford during 2023.  

This task will involve the creation of a subquery for each individual price band, and a parent query to pull together the sale counts within each band, showing them all in one row of aggregate data.  Lets begin with the subqueries.

We are going to create a total of seven price bands as follows:

PRICE BAND                                     SUBQUERY
< £100,000                                            qrySubBelow100K            
>= £100,000 and < £200,000                qrySub100Kto200K                
>= £200,000 and < £300,000                qrySub200Kto300K
>= £300,000 and < £400,000                qrySub300Kto400K
>= £400,000 and < £500,000                qrySub400Kto500K
>= £500,000 and < £1,000,000             qrySub500Kto1000K
>= £1,000,000                                       qrySubAbove1000K


Each of the above queries is based on the unique_id and price_paid fields of tblSalfordHouseSales.  Once again, the query is going to use the TOTALS row in the QUERY DESIGN grid because we need to aggregate the data returned.  We are going to set up our query to count the number of rows returned in the dataset when we add a criterion to the price_paid field. To do this we shall set the value in the TOTALS row of the unique_id column to COUNT,  and the value of the TOTALS row of the price_paid column to WHERE.  All we need to do then is enter the criterion in the CRITERIA row of the price_paid column.  For example, when we create qrySubBelow100K to count the number of sales less than £100,000  the criterion we enter is  <100000 as in the screenshot below:

You may have noticed how the tickbox in the SHOW row
of the QUERY DESIGN grid is unticked when we select WHERE 
in the TOTAL row for price_paid. This is because the only purpose 
of the price_paid column is to filter the rows to be included in the
count of unique_id's.

When we run the query a single row and column is returned with the count of unique_id's matching the query criterion entered for price_paid. For records with a price paid value less than £100,000 this should value should be 140.



Here are the instructions for creating the subqueries for the seven price bands.  We will need to repeat this process seven times for each of the subquerie:

1. Create a new query by clicking the QUERY DESIGN icon on the CREATE ribbon.
2. Click the ADD TABLES icon (on the QUERY DESIGN ribbon) to display the ADD TABLES sidebar if it is not visible already.
3. Drag the tblSalfordHouseSales table from the tables list in the sidebar over to the QUERY DESIGN window.
4. Click the TOTALS icon in the SHOW/HIDE group of the QUERY DESIGN ribbon
5.In the first column of the QUERY DESIGN grid, select the unique_id field from the drop down list in the top row.
6. We now need to go down to the TOTAL row in the first column and select COUNT from the drop down list.
7. In the second column of the QUERY DESIGN grid, select the price_paid field from the drop down list.
8. Going down to the TOTAL row of the second column, select WHERE from the drop down list.
9. We now need to enter the criterion in the CRITERIA row of the second column.  The criterion we enter depends which of the seven subqueries we are working on.  The criterion for qrySubBelow100K, for example, is <100000. The criteria for all seven queries are listed in the table below.
10. Click the SAVE icon at the top left of the screen and the name of subquery that we were creating.

WHERE CRITERIA                      SUBQUERY
< 100000                                           qrySubBelow100K            
>= 100000 AND < 200000               qrySub100Kto200K                
>= 200000 AND < 300000               qrySub200Kto300K
>= 300000 AND  < 400000              qrySub300Kto400K
>= 400000 AND < 500000               qrySub400Kto500K
>= 500000 AND < 100000               qrySub500Kto1000K
>= 1000000                                       qrySubAbove1000K

In addition to the seven subqueries above, we shall also create another subquery (qrySubCountAll) to give us a grand total for 2023 Salford house sales across all price bands.  This subquery is the same as those we created above, except we do NOT need the price_paid column or any WHERE clause therein. We just need the COUNT in the unique_id column as in all the previous subqueries:


When this query is run we get the same output format as the previous seven subqueries (see screenshot below).  The result is a count of all house sales in tblSalfordHouseSales.  This should be 2942.



Now we have created all our subqueries we can begin work on the parent query which pulls all the subquery results together in one row of aggregate data.

The Parent Query

We are going to call our parent query qryHousePriceGroups.  Each column in the query will display the aggregate total count of house sales for each of the price bands.  The last column will display the grand total.  To do this we will need to add all 8 subqueries we created to the QUERY DESIGN grid window and use the appropriate subquery as the data source for each of the parent query columns.



As we can see in the screenshot above, we have given each column name an alias such as "Below 100K" and "100K to 200K".  This will make the query output easier for users to read and understand.

Here are the step by step instructions for creating the parent query.

1. Create a new query by clicking the QUERY DESIGN icon on the CREATE ribbon.
2. Click the ADD TABLES icon (on the QUERY DESIGN ribbon) to display the ADD TABLES sidebar if it is not visible already.  Then select the QUERIES tab at the top of the sidebar.  You should see all the subqueries that we have created previously.
3. Drag all 8 subqueries over to the QUERY DESIGN window.  We will now start to dragging the CountOfUnique_id fields from each subquery onto the QUERY DESIGN grid below.
4. The first CountOfUnique_id field we are going to drag is from the qrySubBelow100K subquery.  Drag this down to the cell in the top row of the first column in the grid.  
5. We shall now enter the alias name for the column.  We do this by typing the name Below 100K in the same cell just before CountOfUnique_id separated from it by a colon as follows:

Below 100K: CountOfUnique_id

6. In the top row of the next column, we are going to drag CountOfUnique_id from the qrySub100Kto200K subquery.  We shall give this the alias 100K to 200K.
7. Repeat step 6 for each of the subqueries using the following alias's:

200K to 300K:
300K to 400K:
400K to 500K:
500K to 1000K:
Above 1000K:
All:

When we run the query we should get this result:



As we can see, out of a total of 2,942 house sales, the vast majority fall within the 100K to 200K, and 200k to 300K price groups.  The 100K to 200K group contains 1,258 sales, and the 200K to 300K group contains 992 sales, giving a total of  2,250 sales out of 2,942, or 76% of the total 2023 Salford housing sale count. The figures also reflect the fact that Salford has some poorer districts as well as those which are very affluent.  There were 140 house sales (5%) with  property values of less than £140,000, but 100 sales (5%) with values between £500,000 and £1000,000, and 26 sales (1%) above £1000,000.

Lets end by showing this data visually in a clustered column chart embedded  in an Access report.  Here are the step by step instructions:

1. Click the REPORT DESIGN icon in the REPORTS group of the CREATE ribbon.
2. Resize the REPORT DESIGN grid so we get a larger section of screen in which to create the chart.
3. Click the INSERT MODERN CHART drop down menu in the CONTROLS group of the REPORT DESIGN ribbon.  We are going to select the CLUSTERED COLUMN CHART from the COLUMN submenu.

4. The cursor will now change to a small chart symbol.  Position the cursor in the top left hand corner of the DETAIL section of the design grid.  A small chart will embed itself on the grid.  At this stage we just have sample data displayed in the chart.
5. Resize the chart by clicking it's border on the bottom left corner and dragging the corner down and across so we get a bigger chart.

New chart showing sample data.

6. We will now connect the chart to our parent query. We can do this from the CHART SETTINGS sidebar. Click anywhere inside the chart so it is selected. Then click the CHART SETTINGS icon in the TOOLS group of the REPORT DESIGN ribbon (if the sidebar is not displayed already).


7.  Click the QUERIES radio button under DATA SOURCE, then select the parent query that we created earlier from the drop down list.
8. Go down to the VALUES (Y axis) section below.  You will see a list of checkboxes with the price bands we created in our query. Tick each of the boxes except ALL. That is from Below 100K down to Above 1000K. You will notice there is a COUNT OF aggregate function contained within brackets after each of the check box labels. We need to change the aggregate function to NONE by clicking the drop down list which appears when you hover the mouse over the text. You should now see the data from our parent query appear on the chart. We now need to customise the appearance of the chart to make it more user friendly.
Chart showing parent query data ready
to be customised.

9. The first customisation we shall make is the addition of a data label above each chart column.  This shows the value for the count of sales for each of the price bands. To do this select the FORMAT tab at the top of the CHART SETTINGS side bar. Select the first price band Below 100K from the drop down list under the DATA SERIES title. Then tick the Display Data Label checkbox. 



    You should now see the value 140 appear above the Below 100K column on the chart.

10. Select the next price band from the Data Series drop down list and tick the Display Data Label checkbox. Do this for the rest of the price band columns on the chart.
11. For the remaining customisations, we are going to use the chart property sheet. Click anywhere inside the chart so it is selected, then click the PROPERTY SHEET icon from the TOOLS group of the REPORT DESIGN ribbon.
12. The next customisation we are going to make is the addition of a chart title. Select the FORMAT tab on the property sheet and locate the HAS TITLE and CHART TITLE properties further down the sheet.  Ensure the HAS TITLE property is set to YES, and enter "2023 Salford House Sales" for the CHART TITLE property value.



13. We shall next move the legend from the top of the chart to the bottom.  To do this locate Legend Position also on the FORMAT tab of the property sheet.  Change the property value to "bottom".
14. The final customisations we are going to make involve adding a title to the vertical axis and remove the text saying "140" below the horizonal axis.  To do this locate the primary values axis title on the property sheet and change the value to "Count of Sales";  and then locate the category axis font color property, click the ellipse button (...) in the value cell, and select white from the colour chart.  This will hide the unwanted text from view.



15. We can now open the report by clicking the REPORT VIEW button on the VIEWS group of the  REPORT DESIGN ribbon. The result should look like the screenshot below:




We can now see the data from our parent query in visual form, thereby gaining a clearer view of the concentration of house sales in the 100K to 200K, and 200K to 300K price bands.

There are, of course, many other dimensions we can use to analyse our data.  For example, we can create queries to group house sales according to property type, and find out the average value in each group.  We could also have downloaded a larger dataset which includes sales over a number of years, and create a chart showing how prices have changes over particular timespans. I intend to cover areas such as these in future posts.


Sunday, 24 March 2024

Static Variables Part 2: the Advantage of Static Variables

In part one of this tutorial - Using Static Variables - we examined the difference in behaviour between an ordinary local variable and a static variable.  We learnt that whilst both types can only be referenced within the same sub procedure or function in which they were declared, the value of a static variable is retained after code in the sub procedure or function has completed execution.  The terms we used to understand this behaviour were scope and lifetime respectively.

In part two, this current post, we are going to consider how we can use the unique scope and lifetime of a static variable to our advantage.  In order to illustrate this we shall examine the workings of an order counter custom control comprised of a textbox and two control buttons.


An Order Counter Control

This sort of control may be familiar to you if you order your groceries online. It usually appears under an image of a product and allows the customer to set the order amount by clicking the plus and minus buttons without having to type in the actual number in the textbox.  It is of interest to us because it lends itself to the use of a static variable to keep track of the order amount.


The advantage of Static Variables

So what is it about static variables that make them advantageous to use in contexts such as that of our order counter control?  To understand this lets take look at how the VBA code used to power this it is organised.


VBA Code Powering the Order Counter
VBA Code Powering the Order Counter

As we can see, the coding for the form containing the order counter is comprised of five sub-procedures and a general declarations section where three constants, INITIALIZE, INCREMENT, and DECREMENT are declared.  Three of these sub-procedures, Form_Load(), cmdMinus_Click(), and cmdPlus_Click(),  contain event handling code which execute when the form opens, or the minus and plus buttons are clicked, respectively.  Each consist of one statement calling changeOrderAmount(...), the sub-procedure used to calculate the order amount displayed in the txtOrderValue textbox.  This is the location where our static variable, intOrderAmount, is declared.

It is in the changeOrderAmount(...) sub-procedure where the value of intOrderAmount is initialized, incremented or decremented within a SELECT...CASE statement, which is, in turn, based on the value of the intApplyChange argument it received when called. For example, if the  INCREMENT constant was received, the value of intOrderAmount is incremented by 1.  Likewise, DECREMENT decreases the value by 1, and INITIALIZE sets the value to 0. Note the fact that the actual order amount is not set directly when incremented or decremented, but is calculated from its previous value.  For this reason intOrderAmount has to persist across multiple calls to the sub procedure, and is why it is declared as Static.

In order to achieve the above without using a static variable, we could instead declare intOrderAmount in the general declarations section of the form's code module using a Dim  statement or the Private keyword. This gives the variable a module level scope and lifetime: it can now be referenced from any sub-procedure or function within the same module, and any value it contains will be retained while the form connected to the module remains open.  As such, the module level version of intOrderAmount has the same lifetime as our original static version declared within the changeOrderAmount(...) sub-procedure. However, since the variable only needs to be referenced from within the sub-procedure, to give it module level scope in the above mentioned way will leave it unnecessarily exposed to accidental editing as well as making our code less easy to read.

These issues arising from module level scope become more pronounced as more code is added to the module.  Other functions and sub procedures are likely to use their own variables, some of which may also have module level scope. These are generally less easy to keep track of because they can be modified from any location in the module, often under a variety of different conditions. In this sort of context it becomes increasingly difficult to read and understand our code, and  there is a greater chance of accidentally referencing a different variable to the one intended, especially if it has a similar name or performs a similar role. As such, using a static variable with a scope local to the sub-procedure or function in which it is declared, whilst also retaining it value over multiple calls, is the much better option.


Conclusion

The advantage of using static variables lies in the fact that they are, on the one hand,  declared within a sub-procedure or function, and thereby have a local scope; and on the other hand, retain their value once code in the parent sub-procedure or function has completed execution.  As a result, the variable is protected against accidental editing from code elsewhere in the project, whilst retaining its value in order to keep a running total over multiple calls to its parent sub procedure or function.  Furthermore, since static variables are declared within sub-procedures or functions, as opposed to an external location in the project, our code is more manageable, and hence easier to read and understand.  Generally speaking, our code is more streamlined and robust.


Saturday, 9 March 2024

Using VBA Static Variables

In this post we are going to look at what static variables are, how they work, and how they are different from ordinary local variables.  In a future post we will build upon this and consider the advantage of  using static variables over locals, and look at an example of how they may be used in practice.


The Difference between Local and Static Variables

So what are static variables exactly? To understand this, let's begin by examining how ordinary local variables work. All variable types have there own particular scope and lifetime.  The first of these determines the locations within the program from which the variable can be accessed, and the latter determines how long the variable is active.  


Local Variables

Local variables have a scope which is limited to the sub procedure or function from which it is declared. For example:


Private Sub testThis()
    Dim intExample As Integer
    intExample = 10
    MsgBox ("The value of our test variable is " & intExample)
End Sub

Here we have declared an integer variable called intExample within a sub procedure called testThis().  Since the variable was declared within the sub procedure, it is local in scope - the value of 10 it  subsequently contains can only be accessed within the testThis() sub after the declaration statement. So if the MsgBox Statement  referencing the intExample variable is used outside the sub, the variable is not recognised. 


Furthermore, once code in the sub procedure completes execution, the intExample variable ceases to exist, and it's value of 10 is lost. The next time the testThis() sub executes, intExample's value will need to be set anew. As such ordinary local variables are limited to the length of time the sub procedure or function is executing. This is an example what is meant by a variables lifetime.


Static Variables

Now let's take a look at how the scope and lifetime of a static variable works in contrast to an ordinary local variable.  Consider the following code snippet:


Private Sub testThis()
    Static intExampleStatic As Integer
    Dim intExampleLocal As Integer
    intExampleStatic = intExampleStatic + 1
    intExampleLocal = intExampleLocal + 1
    response = MsgBox("The value of intExampleStatic is " & intExampleStatic & _
        ", whereas the value of intExampleLocal is " & intExampleLocal & ".",                     vbInformation, "Static & Local Variable Output")
End Sub

Here we have created another sub procedure called testThis(), and declared a static variable and an ordinary local variable, intExampleStatic and intExampleLocal respectively.  Notice how the declaration statement for the static variable uses the Static keyword rather than Dim ie: 


Static intExampleStatic As Integer


in contrast to ... 


Dim intExampleLocal As Integer   


Since both variables are declared as Integer, they are automatically initialized with a value of 0.  Then in the following lines of code we increment the value of  each of the variables by 1:


intExampleStatic = intExampleStatic + 1
intExampleLocal = intExampleLocal + 1


... and display both values using a message box statement: 


response = MsgBox("The value of intExampleStatic is " & intExampleStatic & _
        ", whereas the value of intExampleLocal is " & intExampleLocal & ".", vbInformation, "Static & Local Variable Output")

Now the scope of both kinds of variable, local and static, work in exactly the same way.  That is to say, static variables, along with ordinary local variables, can only be accessed from within the same sub procedure or function from which they were declared.  Where they differ is in their respective lifetimes.  Whereas a local variable ceases to exist after the code within the same sub or function completes execution, the value contained in a static variable will be retained - at least until the codes'  underlying form or report is closed, or in the case of a global module, until the database itself is closed. To demonstrate this in action, lets see what happens when the testThis() sub procedure starts being called.


The first time testThis() is called the message box output shows that the intExampleStatic and intExampleLocal variables have the same value ie 1:


Message Box Output


It is not until the second and subsequent calls that we see the difference in lifetime between static and local variables taking effect.  This is because the value stored in the local variable is lost each time the sub procedure completes, and then re-initialized the next time it is called.  The static variable, on the other hand, retains its value from the previous call.  Hence, the output after the second call shows the static variable to have a value of 2, but the local variable again has a value of 1:




This pattern continues each time the sub procedure is called: the value of the static variable is retained and any increments are added to this retained value, whereas the local variable is created anew and any increments are applied to its initialization value of 0.  Hence the third time testThis() is called, the value of the static variable is increased to 3, and the value of the local variable remains at 1:




... and so on until the sub procedures's underlying form or report is closed.  It is not until the next time the form or report is reopened that the static variable's value is re-initialized to 0.


So this has shown us how static variables work and how they differ from ordinary local variables.  In the next blog post we are going to consider the advantages of using static variables over local variables, and will demonstrate this by creating a a simple order counter control where a user clicks two buttons to increment or decrement the quantity of an item to be ordered without having to type the actual number.


Wednesday, 21 February 2024

Display a Random Customer Record using VBA and Form Filter

Suppose you are responsible for managing an Access Database containing customer records.  You have been asked to introduce a contact management feature whereby the person operating the database can extract a random customer record.  This blog post walks you through the VBA code needed to generate a random number and use it to filter out a customer record, all at the click of a form button.

In order to demonstrate this we will be using the following data:


Table of Fictitious Customer Records


We will display the random customer record on the following form:



The VBA code starts when the user clicks the "Get Random" button to display a random customer record on the form. Lets begin by creating the control button.


Creating the Control Button

1. Open the form in DESIGN VIEW
2. Click the BUTTON icon in the CONTROLS group on the FORM DESIGN ribbon.
3. Click the location on the form grid where we will be positioning the button.  Resize the control button by clicking on one of the buttons' corner or edges and dragging to enlarge or contract as appropriate.
4. Click the PROPERTY SHEET icon from the TOOLS group of the FORM DESIGN ribbon if it is not already open.
5. Ensure the control button is selected. Click the button to select if it is not. Now the property    sheet is displaying the properties specific to our button.
6. Change the button CAPTION property on the FORMAT tab to "Get Random".
7. Change the button NAME property on the OTHER tab to "btnRnd"

Check how the button looks by changing  to FORM VIEW from the VIEWS group of the FORM DESIGN ribbon. If its OK, we are now ready to add the VBA code which runs when the command button is clicked.


Adding the VBA Code

1. Open the form in DESIGN VIEW, and reopen the PROPERTY SHEET if it isn't already.
2. Select the control button by clicking on it.
3. Select the EVENT TAB on the PROPERTY SHEET, and click inside the grid cell adjacent to where it says ON CLICK.
4. Click the ellipse button ("...") at the end of the cell to open the CHOOSE BUILDER dialog, Select CODE BUILDER from the drop down list and click OK to open the VBA editor.
 

You should now see an empty sub procedure called btnRnd_Click(). This is the control button's event handler.  This means the code we enter here will run when the user clicks the command button at runtime.

 

The VBA to Generate a Random Number

In order to retrieve a random customer record we need to generate a random number which corresponds to one of the customers in the database table. We can do this using an inbuilt VBA function called Rnd.  This generates a random decimal number between 0 and 1. For example, it  may return the value 0.5924582.  For this to be of any use to us, we need change the decimal number to an integer and multiply it by the number of customer records in the table.  So, for example, our table contains 10 customer records so we need to multiply the value returned by rnd by 10 and use another function called int to convert it from decimal to integer. We do this as follows:

    Int(rnd * 10)

This returns a value between 0 and 9.   We now need to add 1 to the value to get a number between 1 and 10:

    Int(rnd * 10)+1

Just one more thing to note: computers are unable to return a true random number on their own.  They return a seed number based on a pre-existing sequence of numbers. However, when we initialize the random number generator with the Randomize statement, the seed number returned is based on the system timer, which effectively makes the returned value a true random number.

So to put all this together we are going to create our own custom function to return a random integer number based on the number of records in our database table. The number returned will correspond to a particular record in the table. We shall call our function getRandom and will pass the number of customer records as the parameter. The code for the function is as follows:


Private Function getRandom(intRecordCount As Integer) As Integer  
    Randomize
    getRandom = Int((intRecordCount * Rnd) + 1)  
End Function

For more information on how custom functions work, please check out my post on Writing Custom Functions for Access VBA.

The next step is to write the VBA code to connect the random number returned by our getRandom function with it's corresponding customer record in the database table.  We shall do this by looping through a DAO Recordset containing records from the customer table.


Looping through the DAO Recordset


Before we can create the code to filter a customer record based on our random number, we need to find a way of mapping any potential number returned by the getRandom function  (in our case this is between 1 and 10) against the records in the database table. Whilst it is tempting to use the returned random number directly against the value stored in the customer_id field in a form filter WHERE clause - ie "customer_id = " & getRandom(intRecordCount) - there is a problem with this.  That is to say, although the customer id's in our table go up in an unbroken sequence from 1 to 10, it is possible that one of these records will be deleted in the future.  If and when this happens, no customer record would be found using the above WHERE clause if the random number corresponded to the deleted record. Moreover, the record count would not include the deleted record so the random number returned would be within a range which did NOT include the last record in the table.

What we must do, therefore, is create a DAO Recordset containing all the records in our customer table and use the value returned by getRandom to step through each customer record in the set until this value has been reached - eg if getRandom returns the value 8, we simply loop though the recordset to the 8th record contained therein.  Once we get to this record we can look up the actual value in the customer_id field, and then use this as the basis of our form filter WHERE clause.

If you are unfamiliar with how DAO recordsets work, you may want to check out my post on Accessing Data with VBA Code - Introducing the DAO Recordset; otherwise, lets jump straight in, and run through the code we will be using.

The first step is to create a new instance of a DAO recordset object and set its value to that of our  tblCustomer table.

Dim rstCustomers As DAO.Recordset
Set rstCustomers = Application.CurrentDb.OpenRecordset("tblCustomer", dbOpenSnapshot)

You may have noticed the second parameter we have used when opening the recordset is dbOpenSnapshot.  This means our recordset will be a static snapshot of the customer table. Unlike the versatile dbOpenDynaset type, records cannot be updated using dbOpenSnapshot, but in cases where editing is not required, the snapshot type is the more efficient option. 

In the next step we are going to count the number of records in the rstCustomers recordset and store the value in an integer variable called intCount.  To count the number of records in rstCustomers we will read the RecordCount property using the following syntax:  rstCustomer.RecordCount.  However, before doing so, it is first necessary to visit all the records in the recordset so they can all be counted. The best way to do this is to use the recordset MoveLast method which moves the recordset cursor though each of the records before pointing to the last record in the set. The syntax for this is rstCustomers.MoveLast. All this is done in three lines of code as follows:

Dim intCount As Integer
rstCustomers.MoveLast
intCount = rstCustomers.RecordCount

Now that we know how many records are in the customer table, we can use our getRandom function, passing intCount as the parameter, to obtain a random number.  We then step through the records in the recordset until we reach this number.  Once this record has been reached, we can read the value contained in the customer_id field.  

To do all this we shall begin by defining an integer variable called intCustomerId.  This will be used to store the customer id once we have arrived at the random customer record.  Before we set up the loop, we need to move the recordset cursor back to the first record using the MoveFirst  method.  This is because the cursor is still at the last record from when did the record count in the section of code above.  We will use a For...Next loop to step through the recordset, starting at the first record and ending at the record which corresponds to the value returned by our getRandom function. This is done with the following line of code: For i = 1 To getRandom(intCount).  The first statement within the loop sets the value of the intCustomerId variable to the value contained in the customer_id field of the record where the recordset cursor is located. We reference the field value using the recordset name ("rstCustomers") followed by an exclamation mark ("!") and then the field name ("Customer_Id") as follows: intCustomerId = rstCustomers!Customer_Id. Once this value is stored in our intCustomer variable we can move the recordset cursor to the next record using the MoveNext method: rstCustomers.MoveNext. Finally the loop counter is incremented by 1 with Next i.  If the end value of the loop counter had been reached before the Next statement had executed, the loop counter will still increment by 1, but the program flow will exit the loop and move to the statement following Next i.  

Dim intCustomerId As Integer
rstCustomers.MoveFirst
For i = 1 To getRandom(intCount)
           intCustomerId = rstCustomers!Customer_Id
           rstCustomers.MoveNext
Next i    

 

So we have stepped through the recordset and arrived at our random record storing the value of it's customer_id field in the intCustomerId variable.  Now the program flow has exited the loop we are going to use the intCustomerId value to display the record on our form.  This will be done by means of a form filter containing a WHERE clause based on the stored customer_id. The syntax for the WHERE clause is "Customer_id = " & intCustomerId and the syntax to create form filter is DoCmd.ApplyFilter filtername, whereclause, controlname.  Although the ApplyFilter method has three parameters, we are just going to use the second - the WHERE clause.  As such we write the full statement as follows:

DoCmd.ApplyFilter , "Customer_id = " & intCustomerId

(NB Although the first parameter is left blank we still need the separating coma just before the WHERE clause string.  The last parameter is also left blank, but a separating coma after the WHERE clause is not needed.)

Lastly we need to close the rstCustomer recordset and set its value to NOTHING.  This clears the memory that it had been taking up and allows us to reopen the recordset again the next time the user clicks the getRandom button. We place this code within a IF conditional statement which checks the recordset is still open before it executes thereby avoiding any error from occurring if it is not.

If Not rstCustomers Is Nothing Then
    rstCustomers.Close: Set rstCustomers = Nothing
End If



All we need to do now is look at how all the code we have run through above is organised in the code module for the frmCustomer form.



As you can see from the screenshot above, our code is divided into two Sub Procedures and one Function, btnRnd_Click(), RandomCustomerRecord() and getRandom() respectively.  When the user clicks the Get Random button on the Customer Form, the code in the btnRnd_Click() sub executes first. This is the event handling sub that Access created automatically when we set up the control button at the start of the tutorial.  There is just one statement here calling the second sub procedure which is RandomCustomerRecord().  This contains the main body of the code for retrieving the random customer record from the database and displaying it on the form.  The reason for giving this it's own sub rather than placing it in the code handler, is that we can potentially reuse this code by calling it from different locations in the program as required.  The only task this code does not do is generate the random number representing a specific customer record for selection.  This code is located in the getRandom() Function which is called from the FOR statement of the FOR...NEXT loop. The number returned from the function determines how many records are stepped through in the rstCustomers recordset before getting to the record which is to be displayed right at the end of the process.

Once RandomCustomerRecord() has the random number and retrieved the customer record from the recordset, the form filter is applied using the value of cutomer id.  The form will then display the record with the same customer_id as that stored in the sub's intCustomerId variable.



And there we have it: a random customer record at the click of a button!