1. What is data linkage?
Data linkage is a technique for creating links within and between data sources so that information that is thought to relate to the same person, family, place or event can be connected for analysis.
2. What is a Linkage Key?
The linkage process generates a set of indices sometimes called “linkage keys” that are stored by the Data Linkage Branch in a Links Table. These “linkage keys” point to records thought to belong to the same person, and are held separately from any personal demographic information. They enable related records to be joined together for approved research projects.
3. What datasets are linked?
The data collections routinely linked by the Data Linkage Branch are:
- Midwives Notifications
- Cancer Registrations
- Mental Health contacts
- Hospital Separations
- Emergency Presentations
- Electoral Roll
- Birth Registrations
- Death Registrations
- Marriage Registrations (for linkage only, not available for research)
Electoral, birth and death records are linked routinely under a special arrangement with the WA Electoral Commission and the WA Registry of Births, Deaths and Marriages.
Updates to the links are ongoing and new demographic information is received on a regular basis. This ensures that the links in the WA Data Linkage System (WADLS) remain as up to date as possible. For more information on the datasets currently linked, please see our Available Datasets page.
4. What is "core" data?
We refer to the population health data collections managed within the Department of Health WA as the “core” data. Electoral records, birth and death registrations are also considered to be “core” data for the WADLS.
5. What is the separation principle?
A separation principle was developed to address privacy concerns and enable data custodians to retain control over access to information in their care. This protocol is now referred to as the “best practice protocol” and is used widely by a number of linkage centres across the country.
The principle consists of four distinct steps, listed below. Access to identifying information is restricted to a specialised linkage team, who perform the first and second steps. Data custodians are involved in the third step. Researchers are involved only in the last step and therefore do not need access to any personal identifying information.
- Linkage staff create, store and manage links in a dynamic Linkage System using confidential personal demographic information.
- Linkage staff extract subsets of links from the linkage system, then encrypt these “linkage keys” differently for each particular project.
- Encrypted “linkage keys” are provided to the custodians (of the separate datasets) so they can add them to their clinical or service details for that particular project.
- Lastly, researchers receive clinical or service details from each data custodian and use the encrypted linkage keys to connect the details needed for their analyses.
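The four steps above can be sketched in a few lines of Python. This is only an illustration, not the Branch's actual implementation: the record IDs, link IDs, secrets and the `project_key` helper are all hypothetical, and a keyed hash (HMAC) stands in for whatever encryption method is really used.

```python
import hashlib
import hmac


def project_key(link_id: str, project_secret: bytes) -> str:
    """Derive a project-specific key from an internal link ID.

    Assumption: a keyed hash stands in for the real encryption step.
    """
    digest = hmac.new(project_secret, link_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


# Steps 1-2 (linkage staff): an internal links table maps each record to a
# link ID, and each project gets its own differently encrypted keys.
links = {"hosp-001": "P1", "hosp-002": "P2", "death-900": "P1"}
keys_a = {rec: project_key(pid, b"project-A-secret") for rec, pid in links.items()}
keys_b = {rec: project_key(pid, b"project-B-secret") for rec, pid in links.items()}

# Step 3 (custodians): attach the project key to clinical or service details,
# leaving out record identifiers and demographics.
hospital = [
    {"record": "hosp-001", "diagnosis": "I21"},
    {"record": "hosp-002", "diagnosis": "J45"},
]
deaths = [{"record": "death-900", "cause": "I25"}]
hospital_extract = [
    {"key": keys_a[r["record"]], "diagnosis": r["diagnosis"]} for r in hospital
]
deaths_extract = [{"key": keys_a[r["record"]], "cause": r["cause"]} for r in deaths]

# Step 4 (researchers): connect the extracts using only the project key.
deaths_by_key = {d["key"]: d for d in deaths_extract}
joined = [
    {**h, "cause": deaths_by_key[h["key"]]["cause"]}
    for h in hospital_extract
    if h["key"] in deaths_by_key
]
```

Because the keys are derived differently for each project, `keys_a` and `keys_b` cannot be matched against each other, so extracts released for separate projects cannot be combined.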
6. Why is data linkage useful?
Data linkage adds value to routinely collected data, because the information required to study complex diseases is rarely found in one place. Epidemiologists and population health and health services researchers need to study many factors to make sure their research is meaningful.
It is fair to say that the WADLS holds the “keys” to health and medical research in WA. The “chain of links” design of the WADLS has enabled it to be easily updated and expanded. The system now plays an important part in helping many researchers to discover what makes people healthy. It is an extremely valuable research tool for academics, policy planners and analysts.
7. What is linked data used for?
There are many applications for linked data:
- Population based health research and policy development
- To investigate potential projects, i.e. hypothesis testing and pilot studies
- As a capture-recapture tool, to improve the quality of datasets
- For follow-up and comparison of different treatment regimes
- To study the aetiology, co-morbidities and outcomes of disease
8. Who can use linked data?
Access to linked data is granted to Data Applicants who have obtained approval from:
- the relevant Data Custodians, to ensure the data requested is appropriate for the purpose of the project.
For research projects, approval is also required from:
- the Department of Health Human Research Ethics Committee(s), and
- the Department of Health Research Governance Office.
Other approvals may be required depending on the nature of the request. Strict protocols must be followed to ensure the confidentiality and security of linked data, and wherever possible, research should be performed using unidentifiable data.
For more information please see the Data Linkage Branch Access and Charging Policy.
9. How do I acknowledge the Data Linkage Branch and Department of Health WA in publications?
Acknowledging the Data Linkage Branch, Department of Health WA and other data collections in publications is part of the Data Linkage Branch Access and Charging Policy and of the undertakings signed by Principal Investigators. The acknowledgement will vary according to the individual project, but here are some examples:
- Acknowledgement 1 (standard project): The authors wish to thank the staff at the Western Australian Data Linkage Branch and [insert names of Data Collections involved].
- Acknowledgement 2 (more complex project): The authors wish to thank the Linkage and Client Services Teams at the Western Australian Data Linkage Branch, in particular [insert names of staff who provided extra help], as well as [insert names of Data Collections/Custodians involved].
- Acknowledgement 3 (required where Cause of Death Unit Record File (COD URF) data has been used for analysis): The authors wish to thank the State and Territory Registries of Births, Deaths and Marriages, the State and Territory Coroners, and the National Coronial Information System for enabling COD URF data to be used for this publication.
- Acknowledgement 4 (national study using data collections from multiple states): The authors wish to thank the staff of the data linkage units of the State and Territory health departments (WA, Victoria, SA-NT, NSW, QLD) for the linkage of the data. Further, we thank the data custodians for the provision of the following data:
- Inpatient hospital data (5 States and Territories)
- Emergency Department data (5 States and Territories)
- State and Territory Registries of Births, Deaths and Marriages, the State and Territory Coroners, and the National Coronial Information System for enabling COD URF data to be used for this publication
The Data Linkage Branch also encourages Data Applicants to acknowledge the people of Western Australia, whose data is being used for these projects.
10. How do I add/remove personnel to my project?
The applicant should submit an online Amendment Form to the DoH HREC Executive Officer via the WA Health Research Governance System (RGS). See the Amendments page for more information.
11. There is something wrong with my data, who do I contact?
Please contact the Data Linkage Branch Client Services team at firstname.lastname@example.org
12. Who do I contact to discuss data variables?
Queries related to data variables should be directed to the relevant individual Data Custodians. For further information please see the contacts listed in the Datasets table.
13. Why can’t I access certain variables?
There are some variables contained in the data collections which are deemed to be identifiable or potentially identifiable (e.g. name, full date of birth, address). The National Health and Medical Research Council (NHMRC) National Statement states that the public benefit of using personal health information must outweigh the risk to privacy; therefore wherever possible only non-identifiable data will be released for medical and health research.
14. What am I allowed to release in publications?
No information that will directly or indirectly identify individuals should be released in publications. When a study group contains only a small number of people, be careful about describing details (e.g. cause of death) in the text. The same applies to tables, graphs and maps. Cell suppression for small cell counts is generally applied to all project outputs, unless you seek specific approval from the ethics committee and Data Custodians to publish such information. If you have any queries about what you can include in your publication, please contact Janine Alan.
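As a rough illustration of small cell suppression, the sketch below masks low counts in a frequency table before it is published. The threshold of 5, the `"<5"` marker and the function name are assumptions made for the example; the threshold that applies to your project is set by your approvals, not by this code.

```python
def suppress_small_cells(counts, threshold=5):
    """Mask non-zero counts below `threshold` (hypothetical default of 5).

    Zero counts are left as-is here; whether zeros may be shown is itself
    a policy decision and is only assumed for this sketch.
    """
    marker = f"<{threshold}"
    return {
        cell: (marker if 0 < n < threshold else n)
        for cell, n in counts.items()
    }


# Hypothetical cause-of-death counts for a small study group.
table = {"Circulatory": 12, "Respiratory": 3, "External": 0}
safe_table = suppress_small_cells(table)
```

Note that this does not handle complementary suppression: if row or column totals are also published, a single masked cell can be recovered by subtraction, so further cells or the totals may need masking as well.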
15. Which datasets can I obtain SEIFA/RA codes for?
SEIFA/RA codes can be added to the datasets which are routinely geocoded, using the ABS areas for 1996, 2001, 2006 and 2016 census data:
- Emergency Department Data Collection
- Death Registrations
- Hospital Morbidity Data Collection
- Midwives Notification System
16. I have discovered a breach in protocol, what do I do?
Please contact a member of the Data Linkage Branch Client Services Team for advice at email@example.com.
17. How do I know the progress of my project?
To ask about the status of your project please contact the Data Linkage Branch Client Services Team at firstname.lastname@example.org.
18. How will my data be delivered?
All complete data files are delivered to the relevant analyst via a secure online file transfer system. Files are encrypted and the password is sent separately via SMS.
19. How does the Data Linkage Branch ensure the validity of its links?
The Data Linkage Branch employs a variety of approaches and tools to ensure that the links we make between records and chains are of the highest quality. For more information, download our Linkage Quality paper.
20. How do I request feedback on my draft outputs?
All draft outputs, regardless of format, resulting from the review of linked data must be sent to the Data Linkage Branch and Custodians for review. Please allow two weeks (10 business days) for draft output review, or possibly longer if the output is very long (e.g. a student dissertation or large report).
Please email your draft outputs to DataServices@health.wa.gov.au. A DLB Client Services staff member will coordinate the review of the material by Data Custodians on your behalf. The review will focus on:
- Impact on privacy and risk of reidentification of the data, including checking the suppression of low cell counts
- Issues relating to the data provided which may impact the interpretation of results, such as data quality, scope and coverage, and appropriateness of data items used
- Terminology and descriptions of data collections and linkage
- Appropriate acknowledgements to data collections and DLB (see related FAQ #9)
DLB Client Services will provide any feedback via email. Any issues regarding privacy or inappropriate use of the data will be reported directly to DoH HREC for consideration and liaison with the Principal Investigator.
21. I want to apply for linked data - what are the next steps?
Start by reading the Application Process page. Then, when you’re ready to apply, download the relevant forms from the Application Forms page. Complete the forms and submit them to email@example.com, and we’ll be in touch within a day or so.
22. Do you provide any training on how to apply for linked data?
We certainly do – information on our training workshops can be found here.
The researcher training workshop is a whole-day session with hands-on activities and a deeper look at what the Branch does, as well as how to apply for linked data.
23. What should I include in a data availability statement in publications?
The following wording can be used for data availability statements in publications:
The datasets generated and/or analysed during the current study are not publicly available due to the terms of the ethics approval granted by the Western Australian Department of Health Human Research Ethics Committee (WADOH HREC) and data disclosure policies of the Data Providers. The datasets may be available from the corresponding author upon request and subject to approval from the WADOH HREC and relevant custodians.
24. What/how many data items are required for ad-hoc data linkage?
Data linkage is most often performed using probabilistic data linkage techniques, but can also be done deterministically or pseudo-deterministically.
To achieve the maximum possible linkage rate, ideal data fields include:
- Given name(s)
- Date of birth
- Date of event
- Fields related to Core Linked datasets (i.e. UMRN)
If only some of these fields can be provided, linkage can still be done; however, the linkage rate is often much lower. Additionally, where limited demographic information is provided, a record might link equally well to two or more individuals. In such cases it will not be linked at all, to avoid the overprovision of data. See https://www.datalinkage-wa.org.au/data/information-data-providers/ for more information.
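The effect of sparse demographic information can be illustrated with a much simplified scoring sketch. It is not the Branch's linkage method: the field weights, the `min_score` cut-off, the sample records and both function names are hypothetical, and real probabilistic linkage uses far richer comparisons than exact equality.

```python
def agreement_score(record, candidate, weights):
    """Sum the weights of supplied fields that agree exactly with the candidate."""
    return sum(
        weight
        for field, weight in weights.items()
        if record.get(field) is not None and record.get(field) == candidate.get(field)
    )


def link(record, candidates, weights, min_score):
    """Return the best candidate's id, or None if no candidate is good enough
    or if the top score is tied between two or more candidates."""
    scored = sorted(
        ((agreement_score(record, c, weights), c["id"]) for c in candidates),
        reverse=True,
    )
    if not scored or scored[0][0] < min_score:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # links equally well to several people: leave unlinked
    return scored[0][1]


# Hypothetical weights and people.
weights = {"given_name": 2.0, "date_of_birth": 4.0, "umrn": 8.0}
people = [
    {"id": "P1", "given_name": "ALEX", "date_of_birth": "1980-02-01", "umrn": "A100"},
    {"id": "P2", "given_name": "ALEX", "date_of_birth": "1975-07-15", "umrn": "B200"},
]

full_record = {"given_name": "ALEX", "date_of_birth": "1980-02-01", "umrn": "A100"}
sparse_record = {"given_name": "ALEX"}

full_result = link(full_record, people, weights, min_score=2.0)
sparse_result = link(sparse_record, people, weights, min_score=2.0)
```

With all fields supplied, the record resolves to a single person. With only a given name, both candidates score the same, so the record is left unlinked rather than risk returning data for the wrong individual.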