Data Extraction

Extracting data for a project involves multiple teams at the Department of Health, with assistance from Data Collections held by the Department of Health and by external agencies. The process follows these stages:


 

1. Approval Granted

Approval for the project is received from relevant Ethics Committees and stakeholders.


 

2. Identify Study Population

First, the study population is selected. This can be done in several ways: through a new linkage, where the Applicant has already chosen the study population, or by selection from one or more Data Collections. For example:

  • All participants of a research project (a new linkage)
  • All people who went to hospital for a colonoscopy (from the Hospital Morbidity Data Collection)
  • All people with colorectal cancer (from WA Cancer Registry)
  • People in all of these groups (by combining all of the records from the three sources above)

Other associated study groups may also be selected, such as control or comparison groups (e.g., a random sample of people from the WA Electoral Roll who are the same age and gender as the cases) and Family Connections groups (e.g., children of the cases).
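The last example in the list above, combining records from several sources, can be read as a set union of person identifiers. The following is a minimal sketch under that reading. The identifiers, variable names, and the `combine_study_groups` function are hypothetical and only illustrate the idea; they do not represent how the Linkage Team actually performs the selection.

```python
# Hypothetical person identifiers for each source (illustration only).
research_participants = {"P001", "P002", "P003"}    # a new linkage
colonoscopy_patients = {"P002", "P004"}             # from a hospital collection
colorectal_cancer_cases = {"P003", "P005"}          # from a cancer registry


def combine_study_groups(*groups):
    """Return every identifier that appears in at least one group, sorted."""
    combined = set()
    for group in groups:
        combined |= set(group)
    return sorted(combined)


study_population = combine_study_groups(
    research_participants, colonoscopy_patients, colorectal_cancer_cases
)
print(study_population)
```

A person who appears in more than one source is counted once in the combined study population.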


 

3. Extract Linkage Keys

Once the study population is defined, the Linkage Team extracts the encrypted linkage keys for each requested dataset. The Project Manager then distributes these lists of keys to the relevant Data Collections so that the service data can be attached.


 

4. Attach Content Data

The Data Custodians arrange for the requested content data from their collection to be attached to the linkage keys. For some Data Collections, the Data Outputs team can perform this process using the Custodian Administered Research Extract Server (CARES). More information is available on the CARES page.

For many Data Collections, the files are sent to the ISPD Client Services Team, which coordinates quality checking. For some datasets, the content data may be released directly to the Applicant, at the discretion of the Data Custodian.
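Conceptually, attaching content data to a list of linkage keys means keeping only those records in a collection whose key appears in the list. The sketch below illustrates that idea only. The field names, key values, and the `attach_content` function are hypothetical, and it is not a description of how CARES or any Data Custodian implements this step.

```python
def attach_content(linkage_keys, collection_records, key_field="linkage_key"):
    """Keep only the collection records whose key is in the requested list."""
    wanted = set(linkage_keys)
    return [record for record in collection_records if record[key_field] in wanted]


# Hypothetical list of keys distributed to a collection.
requested_keys = ["K001", "K003"]

# Hypothetical records held by that collection.
hospital_records = [
    {"linkage_key": "K001", "admission_date": "2015-03-02"},
    {"linkage_key": "K002", "admission_date": "2015-07-14"},
    {"linkage_key": "K003", "admission_date": "2016-11-19"},
]

extract = attach_content(requested_keys, hospital_records)
for record in extract:
    print(record["linkage_key"], record["admission_date"])
```

Records whose key was not requested (here `K002`) are left out of the extract.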


 

5. Checking

The ISPD Client Services Team arranges for the Data Outputs Team to check that the data matches the request and to convert all the data to a common format. Supporting documentation describing the requested data is also written.


 

6. Data Release

The ISPD Client Services Team prepares the data for release by encrypting it and applying password protection. The data is then released to the Applicant via secure online transfer.

 


Linked Data Preview

Linked data files are usually provided as tab-delimited text files, encrypted and password-protected using WinZip or 7-Zip. On receipt of the data, Applicants are also given supporting reference documentation.
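Once the archive has been decrypted, a tab-delimited text file can be read with standard tools. The following is a minimal sketch using Python's `csv` module. It assumes the file has a header row, and the column names and values shown are hypothetical; the data dictionaries describe the actual fields in each dataset.

```python
import csv
import io


def read_linked_file(handle):
    """Parse a tab-delimited file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical file content, held in memory so the example is self-contained.
sample = io.StringIO(
    "linkage_key\tadmission_date\n"
    "K001\t2015-03-02\n"
    "K002\t2016-11-19\n"
)

rows = read_linked_file(sample)
for row in rows:
    print(row["linkage_key"], row["admission_date"])
```

For a file on disk, the same function can be given a handle from `open(path, newline="")` in place of the in-memory sample.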

The following is an example of a file a researcher might receive. Please see the data dictionaries on the Downloads page for detailed information on each dataset.