4. What is “core” data?
We refer to the population health data collections managed within the Department of Health as the “core” data. Electoral records, birth and death registrations are also considered to be “core” data for the WADLS.
5. What is the separation principle?
A separation principle was developed to address privacy concerns and enable data custodians to retain control over access to information in their care. This protocol is now referred to as the “best practice protocol” and is used by a number of linkage centres across the country.
The principle consists of four distinct steps, listed below. Access to identifying information is restricted to a specialised linkage team, who perform the first and second steps. Data custodians are involved in the third step. Researchers are involved only in the last step and therefore do not need access to any personal identifying information.
- Linkage staff create, store and manage links in a dynamic Linkage System using confidential personal demographic information.
- Linkage staff extract subsets of links from the linkage system, then encrypt these “linkage keys” differently for each particular project.
- Encrypted “linkage keys” are provided to the custodians (of the separate datasets) so they can add them to their clinical or service details for that particular project.
- Lastly, researchers receive clinical or service details from each data custodian and use the encrypted linkage keys to connect the details needed for their analyses.
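The four steps above can be sketched in a few lines of Python. This is only an illustration, not the actual Linkage System: the function names, record IDs, field names and secrets are all hypothetical, and a keyed hash (HMAC-SHA256) stands in for whatever encryption is really applied to the linkage keys.

```python
import hashlib
import hmac


def project_linkage_key(link_id, project_secret):
    """Derive a project-specific key from an internal link ID (HMAC is an assumed stand-in)."""
    digest = hmac.new(project_secret, link_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def keys_for_project(links, project_secret):
    """Steps 1-2 (linkage staff): extract links and encode them for one project."""
    return {
        dataset: {rec_id: project_linkage_key(link_id, project_secret)
                  for rec_id, link_id in records.items()}
        for dataset, records in links.items()
    }


def custodian_extract(content, keys):
    """Step 3 (custodian): attach project keys to clinical or service details."""
    return [{"linkage_key": keys[rec_id], **fields}
            for rec_id, fields in content.items() if rec_id in keys]


def join_on_linkage_key(*extracts):
    """Step 4 (researcher): combine extracts using only the project keys."""
    merged = {}
    for extract in extracts:
        for row in extract:
            details = {k: v for k, v in row.items() if k != "linkage_key"}
            merged.setdefault(row["linkage_key"], {}).update(details)
    return merged


# Hypothetical internal links: custodian record ID -> person-level link ID.
internal_links = {
    "hospital": {"H001": "P1", "H002": "P2"},
    "deaths": {"D900": "P1"},
}
keys_a = keys_for_project(internal_links, b"secret-for-project-A")
keys_b = keys_for_project(internal_links, b"secret-for-project-B")

hospital = custodian_extract(
    {"H001": {"diagnosis": "I21"}, "H002": {"diagnosis": "J45"}}, keys_a["hospital"])
deaths = custodian_extract({"D900": {"cause_of_death": "I21"}}, keys_a["deaths"])
linked = join_on_linkage_key(hospital, deaths)
```

Because each project uses its own secret, the same person receives a different key in project A and project B, so extracts from separate projects cannot be joined to each other. The researcher-side join here keeps one row per person; a real analysis with several records per person would keep them separate.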
6. Why is data linkage useful?
Data linkage adds value to routinely collected data, because the information required to study complex diseases is rarely found in one place. Epidemiologists and population health and health services researchers need to study many factors to make sure their research is meaningful.
It is fair to say that the WADLS holds the “keys” to health and medical research in WA. The “chain of links” design of the WADLS has enabled it to be easily updated and expanded. The system now plays an important part in helping many researchers to discover what makes people healthy. It is an extremely valuable research tool for academics, policy planners and analysts.
7. What is linked data used for?
There are many applications for linked data:
- Population based health research and policy development
- To investigate potential projects, e.g., testing hypotheses and pilot studies
- As a capture-recapture tool, to improve the quality of datasets
- For follow-up and comparison of different treatment regimes
- To study the aetiology, co-morbidities and outcomes of disease
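As an illustration of the capture-recapture use listed above, the sketch below applies the two-source Lincoln-Petersen estimator and its Chapman correction to two linked datasets. The source does not say which estimator is used in practice; the dataset names and key values are made up, and the estimate relies on the usual assumption that the two sources capture cases independently.

```python
def lincoln_petersen(n1, n2, m):
    """Estimate total cases from two source sizes (n1, n2) and their linked overlap (m)."""
    if m == 0:
        raise ValueError("no overlap between sources; the estimate is undefined")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's correction, which behaves better when the overlap is small."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical project linkage keys found in each source.
registry_keys = {"k1", "k2", "k3", "k4"}
hospital_keys = {"k3", "k4", "k5"}

overlap = len(registry_keys & hospital_keys)
estimate = lincoln_petersen(len(registry_keys), len(hospital_keys), overlap)
corrected = chapman(len(registry_keys), len(hospital_keys), overlap)
```

Comparing the estimate with the number of cases actually recorded in a dataset gives a rough indication of how complete that dataset is.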
8. Who can use linked data?
Access to linked data is granted to Data Applicants who have obtained approval from the relevant Data Custodians. Custodian approval ensures the data requested is appropriate for the purpose of the project.
For research projects, approval is also required from the Department of Health Human Research Ethics Committee(s), and the Department of Health Research Governance Office.
Other approvals may be required depending on the nature of the request. Strict protocols must be followed to ensure the confidentiality and security of linked data, and wherever possible, research should be performed using unidentifiable data.
For more information please see the Access and Charging Policy.
9. How do I acknowledge Data Linkage Services and the Department of Health WA in publications?
Acknowledging Data Linkage Services, Department of Health WA and other data collections in publications is part of the Department of Health’s Access and Charging Policy and of the undertakings signed by Principal Investigators. The acknowledgement will vary according to the individual project, but here are some examples:
- Acknowledgement 1 – Standard Project: The authors wish to thank the staff from the Department of Health WA’s Data Linkage Services and [insert names of Data Collections involved].
- Acknowledgement 2 – More complex project: The authors wish to thank the Linkage, Data Outputs and Client Services Teams from the Department of Health WA’s Data Linkage Services, in particular [insert names of staff who provided extra help], as well as [insert names of Data Collections/Custodians involved].
- Acknowledgement 3 – Required where Cause of Death Unit Record File (COD URF) data has been used for analysis: The authors wish to thank the Australian Co-ordinating Registry, the Registries of Births, Deaths and Marriages, the Coroners, the National Coronial Information System and the Victorian Department of Justice and Community Safety for enabling COD URF data to be used for this publication.
- Acknowledgement 4 – National study using data collections from multiple states: The authors wish to thank the staff of the data linkage units of the State and Territory health departments (WA, Victoria, SA-NT, NSW, QLD) for the linkage of the data. Further, we thank the data custodians for the provision of the following data:
  - Inpatient hospital data (5 States and Territories)
  - Emergency Department data (5 States and Territories)
  - COD URF: Australian Co-ordinating Registry, the State Registries of Births, Deaths and Marriages, the Coroners, the National Coronial Information System and the Victorian Department of Justice and Community Safety
Data Linkage Services also encourages Data Applicants to acknowledge the people of Western Australia, whose data is being used for these projects.
10. How do I add personnel to, or remove personnel from, my project?
The applicant should submit an online Amendment Form to the Department of Health HREC Executive Officer via the WA Health Research Governance System (RGS). See the Amendments page for more information.
11. There is something wrong with my data. Who do I contact?
Please contact the ISPD Client Services team at DataServ@health.wa.gov.au.
12. Who do I contact to discuss data variables?
Queries related to data variables should be directed to the relevant individual Data Custodians. For further information please see the contacts listed in the Datasets table.
13. Why can’t I access certain variables?
There are some variables contained in the data collections which are deemed to be identifiable or potentially identifiable (e.g., name, full date of birth, address). The National Health and Medical Research Council (NHMRC) National Statement states that the public benefit of using personal health information must outweigh the risk to privacy. Wherever possible, only non-identifiable data will be released for medical and health research.
14. What am I allowed to release in publications?
No information that will directly or indirectly identify individuals should be released in publications. When there is a small number of people in a study group, be careful about describing details (e.g., cause of death) in the text. The same applies to tables, graphs and maps. Cell suppression for small cell counts is generally applied to all project outputs, unless the ethics committee has given specific approval to publish such information. If you have any queries about what you can include in your publication, please contact DataServ@health.wa.gov.au.
15. Which datasets can I obtain SEIFA/RA codes for?
SEIFA/RA codes can be added to the datasets which are routinely geocoded, using the ABS areas for 1996, 2001, 2006 and 2016 census data:
- Emergency Department Data Collection
- Death Registrations
- Hospital Morbidity Data Collection
- Midwives Notification System
16. I have discovered a breach in protocol. What do I do?
Please contact a member of the ISPD Client Services Team for advice at DataServ@health.wa.gov.au.
17. How do I know the progress of my project?
To ask about the status of your project please contact the ISPD Client Services Team at DataServ@health.wa.gov.au.
18. How will my data be delivered?
All complete data files are delivered to the relevant analyst via a secure online file transfer system. Files are encrypted and the password is sent separately via SMS.
19. How does the Data Linkage Branch ensure the validity of its links?
The Department of Health employs a variety of approaches and tools to ensure that the links we make between records and chains are of the highest quality. For more information, download our Linkage Quality paper.
20. What things should I check in my draft output prior to publication?
Please review your draft outputs prior to publication to ensure:
- Small cell counts have been suppressed (at a minimum, expressed as <=5), unless you have been granted approval from DOH HREC to publish smaller counts
- Data Collections are appropriately named (see Available Datasets)
- Data Collections and the Department of Health are included in the acknowledgements section (not required for abstracts). See FAQ #9 for suggested wording
- The output is consistent with the aims and methodology detailed in your approved Application for Data and ethics application
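The small cell check in the list above can be automated before a table leaves the analysis environment. The following is a minimal sketch, assuming the threshold of 5 mentioned above and assuming every count at or below the threshold (including zero) is replaced by the label "<=5". The function name and the example table are hypothetical.

```python
def suppress_small_cells(counts, threshold=5):
    """Replace any count at or below the threshold with a '<=threshold' label."""
    label = f"<={threshold}"
    return {cell: (label if n <= threshold else n) for cell, n in counts.items()}


# Hypothetical table of counts by age group.
table = {"0-17": 3, "18-64": 42, "65+": 5}
published = suppress_small_cells(table)
```

This only handles primary suppression. If row or column totals are also published, a suppressed cell can sometimes be recovered by subtraction, so totals need to be checked separately.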
21. I want to apply for linked data – what are the next steps?
Read the Application Process first. When you’re ready to apply, download the relevant forms from the Applications forms page, complete them and submit them to DataServ@health.wa.gov.au. We’ll be in touch within a day or so.
22. Do you provide any training on how to apply for linked data?
Data Linkage Services hosts a whole-day researcher training workshop to provide practical support to those wishing to apply for linked data.
Currently there are no scheduled researcher training workshops. We recommend bookmarking our News and Events page to get information on future workshop dates.
23. What should I include in a data availability statement in publications?
The following wording can be used for data availability statements in publications:
The datasets generated and/or analysed during the current study are not publicly available due to the terms of the ethics approval granted by the Western Australian Department of Health Human Research Ethics Committee and data disclosure policies of the Data Providers. The datasets may be available from the corresponding author upon request and subject to approval from the Human Research Ethics Committee and relevant custodians.
24. What/how many data items are required for ad-hoc data linkage?
Data linkage is most often performed using probabilistic data linkage techniques, but can also be done deterministically or pseudo-deterministically.
To achieve the maximum possible linkage rate, ideal data fields include:
- Given name(s)
- Date of birth
- Date of event
- Fields related to Core Linked datasets (e.g., UMRN)
If only some of these fields can be provided, linkage can still be done; however, the linkage rate is often much lower. Additionally, where limited demographic information is provided, a record might link equally well to two or more individuals. In such cases it will not be linked at all, to avoid the overprovision of data. See our Data Providers page for more information.
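To show why fewer fields lower the linkage rate, here is a rough sketch of probabilistic scoring in the Fellegi-Sunter style, where each field adds log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees. It is not the Department's linkage software: the m and u probabilities, the threshold, the field names and the person IDs are all invented for illustration. The sketch also applies the rule described above, that a record which links equally well to two or more individuals is not linked.

```python
import math

# Hypothetical (m, u) pairs: P(field agrees | same person), P(field agrees | different people).
FIELD_PROBS = {
    "given_name": (0.95, 0.01),
    "date_of_birth": (0.98, 0.003),
    "umrn": (0.99, 0.0001),
}


def match_weight(record, candidate, field_probs=FIELD_PROBS):
    """Sum agreement and disagreement weights over the fields present in both records."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        a, b = record.get(field), candidate.get(field)
        if a is None or b is None:
            continue  # a missing field contributes no evidence either way
        total += math.log2(m / u) if a == b else math.log2((1 - m) / (1 - u))
    return total


def best_link(record, candidates, threshold=8.0):
    """Return the best candidate's person_id, or None if the match is weak or tied."""
    scored = sorted(((match_weight(record, c), c["person_id"]) for c in candidates),
                    reverse=True)
    if not scored or scored[0][0] < threshold:
        return None
    if len(scored) > 1 and math.isclose(scored[0][0], scored[1][0]):
        return None  # links equally well to two individuals, so do not link
    return scored[0][1]


people = [
    {"person_id": "P1", "given_name": "ALEX", "date_of_birth": "1980-02-01", "umrn": "A123"},
    {"person_id": "P2", "given_name": "ALEX", "date_of_birth": "1980-02-01", "umrn": "B456"},
]
sparse = {"given_name": "ALEX", "date_of_birth": "1980-02-01"}
full = {"given_name": "ALEX", "date_of_birth": "1980-02-01", "umrn": "A123"}

sparse_result = best_link(sparse, people)
full_result = best_link(full, people)
```

With only a name and date of birth, both candidates score identically and the record is left unlinked. Adding the UMRN separates them and the record links to P1.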