Data linkage is a technique for connecting pieces of information that are thought to relate to the same person, family, place or event. Information is created when a person comes into contact with certain services, for example, when they visit an emergency department, stay in a hospital or register the birth of their child.
The Data Linkage Branch (DLB) links data in Western Australia. Data linkage techniques in WA have been developed to ensure the best possible matching while at the same time protecting personal privacy. There are two parts to records linked by the DLB:
- Demographic data – this is identifiable information such as a person’s name or address.
- Content data – this is the information about what happened to the person, such as diagnosis and treatment in hospital.
Privacy is protected by separating these two parts before the data are provided for linkage. This practice is known as the ‘separation principle’.
Since millions of records are often linked this way, highly specialised computer programs do most of the matching. For some of the more difficult matches, a Linkage Officer looks at the records and decides whether they are a true match.
The DLB Linkage Team matches only the demographic information, and then makes a special unique ID, called a “linkage key”, for each group of records that belongs to one person. These keys can then be used for approved requests, to join up the content data of the records, without releasing the person’s name or other identifying information.
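The separation of demographic and content data described above can be sketched in a few lines of Python. This is only an illustration, not the DLB's implementation: the field names (`record_id`, `name`, `dob`, `diagnosis`), the sample values and the `K1`-style key format are all hypothetical, and exact agreement on every demographic field stands in for the real matching process.

```python
from itertools import count

# Hypothetical demographic field names; real datasets differ.
DEMOGRAPHIC_FIELDS = ("name", "dob")


def separate(records):
    """Split each record into a demographic part and a content part."""
    demographic = [
        {"record_id": r["record_id"], **{f: r[f] for f in DEMOGRAPHIC_FIELDS}}
        for r in records
    ]
    content = [
        {k: v for k, v in r.items() if k not in DEMOGRAPHIC_FIELDS}
        for r in records
    ]
    return demographic, content


def assign_linkage_keys(demographic):
    """Give one key to each group of records that agree on all demographic fields.

    Stand-in for real linkage, which is far more tolerant of variation.
    """
    keys, by_person, next_key = {}, {}, count(1)
    for row in demographic:
        person = tuple(row[f] for f in DEMOGRAPHIC_FIELDS)
        if person not in by_person:
            by_person[person] = f"K{next(next_key)}"
        keys[row["record_id"]] = by_person[person]
    return keys


def attach_keys(content, keys):
    """Replace the record ID on each content row with its linkage key."""
    return [
        {"linkage_key": keys[row["record_id"]],
         **{k: v for k, v in row.items() if k != "record_id"}}
        for row in content
    ]


# Illustrative records only.
records = [
    {"record_id": 1, "name": "MCDONALD", "dob": "19820812", "diagnosis": "fracture"},
    {"record_id": 2, "name": "MCDONALD", "dob": "19820812", "diagnosis": "follow-up"},
    {"record_id": 3, "name": "SMITH", "dob": "19750301", "diagnosis": "asthma"},
]
demographic, content = separate(records)
keys = assign_linkage_keys(demographic)   # uses demographic data only
released = attach_keys(content, keys)     # content data plus linkage key, no names
```

In this sketch the matching step never sees a diagnosis, and the released rows never carry a name or date of birth.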
WA Data Linkage System (WADLS)
The WA Data Linkage System (WADLS) stores the linkage keys created by the DLB. To create and maintain the WADLS, the DLB’s Linkage and System Teams have developed a bespoke linkage system in house, termed ‘DLS3’. The system is highly versatile and completely integrates and streamlines all aspects of the “end to end” linkage process.
The linkage process can be split into the following steps:
Obtain demographic data, clean and standardise
Raw data is provided for linkage. All or some of the following demographic fields are included:
- Name (first name, second name, family name, aliases)
- Date of Birth
- Address (house number, street name, suburb, postcode)
- Other unique identifiers (e.g. Hospital Unit Medical Record Number)
Customised identifiers are assigned, and the data fields are cleaned and put into a standard format that can be used for linkage. For example:
- MC DONALD > MCDONALD
- 12th August 1982 > 19820812
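The two examples above can be reproduced with a short Python sketch. The function names are hypothetical and the rules are assumptions chosen to match these two cases (upper-casing and stripping non-letters from names, parsing a "day month year" date); the cleaning rules the DLB actually applies are not described here.

```python
import re
from datetime import datetime


def standardise_name(raw: str) -> str:
    """Upper-case a name and remove spaces and punctuation."""
    return re.sub(r"[^A-Z]", "", raw.upper())


def standardise_dob(raw: str) -> str:
    """Convert a date such as '12th August 1982' to YYYYMMDD."""
    # Drop ordinal suffixes (1st, 2nd, 3rd, 12th) that follow a digit.
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", raw.strip(), flags=re.IGNORECASE)
    return datetime.strptime(cleaned, "%d %B %Y").strftime("%Y%m%d")


print(standardise_name("MC DONALD"))        # MCDONALD
print(standardise_dob("12th August 1982"))  # 19820812
```

Note that `%B` matches English month names only under an English or default locale.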
Load Demographic Tables
The demographic details are loaded into tables in a relational database. There are different tables for different datasets because not all datasets have the same variables.
Run Linkage Engine and Load Links
The linkage program runs comparisons between two datasets. Linkage strategies are customised according to the individual characteristics of each dataset.
Some links pass as automatic matches, some are automatic rejections, and some fall into a “grey area” in between, where links are manually checked by Linkage Officers for validity.
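One common way to picture this three-way outcome is a pair of cut-offs applied to a comparison score. The sketch below assumes that each candidate link carries a numeric score between 0 and 1; the score scale and both threshold values are hypothetical, since the text does not say how the grey area is bounded.

```python
from enum import Enum


class Decision(Enum):
    MATCH = "automatic match"
    REVIEW = "manual review by a Linkage Officer"
    REJECT = "automatic rejection"


# Hypothetical cut-offs, for illustration only.
UPPER_THRESHOLD = 0.90
LOWER_THRESHOLD = 0.60


def classify(score: float) -> Decision:
    """Sort a candidate link into match, rejection or the grey area."""
    if score >= UPPER_THRESHOLD:
        return Decision.MATCH
    if score < LOWER_THRESHOLD:
        return Decision.REJECT
    return Decision.REVIEW
```

Widening the gap between the two cut-offs sends more links to manual review; narrowing it relies more heavily on the automatic decisions.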
With more than 1.2 million records, on average, being linked every week by the DLB, and a dynamic and constantly changing system, it is important to ensure that the links we make between records and chains are of the highest quality. There are many ways to assess the quality of both existing and proposed links, and the DLB employs a variety of strategies and tools to do so. These are detailed in DLB’s Linkage Quality Paper.
Linkage strategies are also regularly revisited to ensure that the system of links is continually refined and improved.
Extract Linkage Keys
Customised, project-specific linkage keys are extracted by encrypting the “linkage key” for each chain of records. These are the keys that have service data attached by the various data collections.
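The idea of deriving a different key for each project from the same underlying linkage key can be illustrated as follows. The text does not name the encryption method, so this sketch swaps in a keyed hash (HMAC-SHA256) from the Python standard library purely for illustration; the function name, the secrets and the key value are hypothetical.

```python
import hashlib
import hmac


def project_key(linkage_key: str, project_secret: bytes) -> str:
    """Derive a project-specific key from a linkage key.

    Assumption: a keyed hash stands in for the unspecified encryption step.
    """
    digest = hmac.new(project_secret, linkage_key.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical secrets, one per approved project.
key_for_project_a = project_key("1001", b"secret-for-project-a")
key_for_project_b = project_key("1001", b"secret-for-project-b")
```

Within one project the same person always receives the same key, so their records can still be joined, while keys from two different projects cannot be compared with each other directly.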
Changes to Linkage Keys
Please be aware that the WA Data Linkage System (WADLS) is a dynamic system where data and links are created, modified and deleted on a regular basis. Although linkage keys are considered to be “person identifiers”, they are not unchanging. When linkage keys are extracted, each one is designated by selecting the “master” record ID (also known as the “ROOT LPNO”) from the given chain of records. The algorithm that performs this task is designed to maximise the stability and consistency of the keys; however, some degree of variation is unavoidable.
Changes may occur when:
- New data is loaded and linked
- Previously unlinked records are belatedly linked into a chain
- Two or more chains are found to belong to the same person and are merged together
- A single chain is found to belong to multiple people and is split apart
- A record is deleted from the WADLS (e.g. at the direction of the Data Provider)
- A dataset is reloaded in a new format (e.g. after a database migration at the source)
Please note that this limitation may affect requests for data updates, as well as any project that requires multiple iterative data extractions. See the Amendments and Data Updates page for more information.
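A small sketch shows why a key taken from a chain's master record can change between extractions. It assumes, for illustration only, that the master record is the one with the smallest record ID; the actual selection rule is not described here, and the record IDs are hypothetical.

```python
from typing import Iterable


def root_id(chain: Iterable[int]) -> int:
    """Pick the master record ID for a chain.

    Assumption: the smallest record ID is used as the master.
    """
    return min(chain)


# Two chains at the first extraction (hypothetical record IDs).
chain_a = {1001, 1005}
chain_b = {1003, 1009}
keys_before = {rec: root_id(chain) for chain in (chain_a, chain_b) for rec in chain}

# Later the two chains are found to belong to the same person and are merged.
merged = chain_a | chain_b
keys_after_merge = {rec: root_id(merged) for rec in merged}

# Records whose key differs between the two extractions.
changed = sorted(rec for rec in merged if keys_before[rec] != keys_after_merge[rec])

# If the master record is then deleted, every remaining record gets a new key.
remaining = merged - {1001}
keys_after_delete = {rec: root_id(remaining) for rec in remaining}
```

Here the records from the second chain change key after the merge, and all remaining records change key after the deletion, which is why repeated extractions for the same project may not line up exactly.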