Glossary

 


 

%-Linked

For each data set, this refers to the percentage of records that have been linked to any other record in the WA Data Linkage System. Also referred to as the ‘completeness’ of the given linkage, noting that this number may be impacted by records that are impossible to link (due to missing or invalid demographic information).
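
As a rough illustration of the calculation, the sketch below counts a record as linked when its chain contains at least one other record. The function name, the input layout and the sample chain numbers are all hypothetical; this is not the WA Data Linkage System's own method.

```python
from collections import Counter


def percent_linked(dataset_chains, all_chains):
    """Percentage of a dataset's records that share a chain with another record.

    dataset_chains: chain number for each record in the dataset of interest
                    (None where the record could not be linked at all).
    all_chains:     chain number for every record in the whole system,
                    including the records of the dataset of interest.
    Both layouts are assumptions made for this example.
    """
    if not dataset_chains:
        return 0.0
    # Size of each chain across the whole system.
    chain_sizes = Counter(c for c in all_chains if c is not None)
    # A record is linked if its chain holds more than one record.
    linked = sum(
        1 for c in dataset_chains if c is not None and chain_sizes[c] > 1
    )
    return 100.0 * linked / len(dataset_chains)


dataset = [101, 102, 103, None]        # hypothetical chain numbers
system = [101, 101, 102, 103, 103, 103]
print(percent_linked(dataset, system))
```

Records with missing or invalid demographic information (the None entry here) stay in the denominator, which is why they lower the reported completeness.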


Ad-hoc Linkage

A single-purpose, one-off linkage, usually relating to a specific research project. These are typically, but not always, linked using a manually created dataset, without being loaded into the WA Data Linkage System.

Application for Data

The method used to request linked data from the Department of Health. The Application for Data uses a set of modular forms that are specific to each type of service and dataset requested.


Blocking

A part of the linkage process whereby pairs of records are subsetted for comparison based on a list of fields on which they must match exactly.
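
A minimal sketch of blocking, assuming records are held as dictionaries and that surname and year of birth are the blocking fields. The field names and the `block_pairs` helper are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def block_pairs(records, block_fields):
    """Return candidate pairs of record ids that agree exactly on every
    blocking field. Only these pairs go forward for comparison."""
    blocks = defaultdict(list)
    for rec in records:
        key = tuple(rec.get(field) for field in block_fields)
        # A record missing a blocking field cannot match exactly on it.
        if any(value in (None, "") for value in key):
            continue
        blocks[key].append(rec["id"])
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


records = [
    {"id": 1, "surname": "SMITH", "birth_year": 1980},
    {"id": 2, "surname": "SMITH", "birth_year": 1980},
    {"id": 3, "surname": "SMYTH", "birth_year": 1980},
    {"id": 4, "surname": "SMITH", "birth_year": 1980},
    {"id": 5, "surname": "", "birth_year": 1980},
]
candidate_pairs = block_pairs(records, ["surname", "birth_year"])
print(candidate_pairs)
```

Blocking trades completeness for speed: record 3 (SMYTH) is never compared with the SMITH records in this pass, which is one reason linkage is often run in several passes with different blocking fields.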


CARES

Custodian Administered Research Extract Server: a Department of Health initiative that streamlines linked data extraction, quality control and delivery services.


Chain

All of the linked records thought to belong to a single person. See also ‘ROOT’.


Chain Number

The common identifier assigned by the WADLS to records that have been linked together.


Clerical Review

The process whereby potential matches that do not match strongly enough to automatically match, nor weakly enough to automatically discard, are manually evaluated by a Linkage Officer.
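
The three-way decision described above can be sketched as follows. The two thresholds and the weight scale are hypothetical; the glossary does not state the cut-offs actually used.

```python
def classify_pair(weight, lower, upper):
    """Sort a compared pair into one of three outcomes.

    weight: match strength for the pair (higher means a stronger match).
    lower:  at or below this value the pair is discarded automatically.
    upper:  at or above this value the pair is matched automatically.
    All three values are assumptions made for this example.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    # Everything in between is referred to a Linkage Officer.
    return "clerical review"


for w in (18.2, 9.5, 1.0):
    print(w, classify_pair(w, lower=5.0, upper=15.0))
```

Widening the gap between the two thresholds sends more pairs to manual review; narrowing it relies more heavily on the automatic decisions.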


Cohort Selection

The first phase of data coordination for a linked data request, which defines the study group of interest. The study group might be, for example, people born in a specific time range, or people selected from inpatient records by searching for a particular diagnosis code. In adherence to the Separation Principle, this phase is run separately from the Data Linkage Team if it involves clinical information.


Core Linkages

The long-running, routine linkages that represent the ‘spine’ of the WA Data Linkage System to which other datasets can be linked. Comprises: (1) Hospital Morbidity; (2) Emergency Department; (3) Midwives Notifications; (4) Mental Health; (5) Cancer Registry; (6) Births, Deaths, Marriages; and (7) Electoral Roll. See also ‘Non-core Linkages’.


Data

Can refer to:

(1) the demographic data used in the Data Linkage process; or
(2) information pertaining to services provided to people, including content or clinical information (available only from Data Custodians, including via the Custodian Administered Research Extract Server (CARES)).

Data Applicant

A person, group or entity that formally requests access to linked data.


Data Cleaning

The process of standardising the demographic fields of data to be linked, to ensure maximum compatibility with existing linked data.
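
A minimal sketch of what standardising two demographic fields might look like. The specific rules shown (upper-casing names, stripping accents, ISO-formatted dates) are illustrative assumptions, not the cleaning rules actually applied before linkage.

```python
import re
import unicodedata
from datetime import datetime


def clean_name(name):
    """Standardise a name: strip accents, drop stray characters,
    collapse whitespace and convert to upper case."""
    text = unicodedata.normalize("NFKD", name or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Keep letters, apostrophes, spaces and hyphens only.
    text = re.sub(r"[^A-Za-z' -]", "", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text.upper()


def clean_date(value, formats=("%Y-%m-%d", "%d/%m/%Y")):
    """Return the date as YYYY-MM-DD, or None if it cannot be parsed."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(clean_name("  José   O'Brien-Smith "))
print(clean_date("03/11/1980"))
```

Returning None for an unparseable date, rather than guessing, keeps invalid values visible so that they can be treated as missing during linkage.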


Data Custodian

The person within an organisation/agency formally assigned to collect, manage, secure and disclose a dataset on a day-to-day basis at the direction of the Data Steward.


Data Flow Diagram

A pictorial representation of the flow of data from party to party. Most often used for specific linked data requests, and a requirement for explaining complex data requests.


Data Linkage

A complex technique for connecting data records within and between datasets using demographic data (e.g. name, date of birth, address, sex, medical record number). Also known as ‘Record Linkage’ or ‘Linkage’. See also ‘Probabilistic Linkage’ and ‘Deterministic Linkage’.


ISPD Client Services

Can refer to:
(1) The ISPD Client Services Team. This team was formerly known as the Data Linkage Branch, and the Research Data Services Team. The Team can be contacted through the general Data Services email address of DataServ@health.wa.gov.au; or
(2) The unique services offered by Data Linkage Services, including Linkage, Extraction, Family Connections, Geocoding and Sample Selection. These services relate directly to the Data Services Forms, which are completed by Data Applicants.


Data Steward

The person within an organisation/agency formally assigned to set the strategic purpose, operation and disclosure model of a data collection.


Dataset

A collection of similar items of information. For example, a Western Australian (WA) Births dataset might contain many thousands of records, each of which contains the name, place and date of birth of a person born in WA.


Deidentified

Related to the identifiability of a dataset/data item; where the identity of a person/organisation has been removed and is therefore neither immediately obvious nor reasonably ascertainable using other sources of information. The National Health and Medical Research Council discourages the use of this term as its meaning can be unclear; however, the term is still used by a number of other authorities. See related term ‘Unidentifiable’.


Department of Health WA HREC

Department of Health Human Research Ethics Committee. A group that provides governance over Department of Health personal health information and its use in research.

See also: http://ww2.health.wa.gov.au/Articles/A_E/Department-of-Health-Human-Research-Ethics-Committee


Derived Aboriginal and Torres Strait Islander Status Flag

A DLB-created value that represents a ‘best guess’ of the Aboriginal and Torres Strait Islander status of a person, via an algorithm that collects Aboriginal and Torres Strait Islander status information from various records in a linkage chain to calculate a single result. Formerly known as the ‘Getting Our Story Right (GOSR) flag’.


Deterministic Linkage

A method of linking records using unique identifiers (e.g. Elector number), where any two records with the same unique identifier are deemed to be a definite match. No clerical review is done when undertaking deterministic linkage. It is assumed that the unique identifiers being used are truly unique across the population being linked.
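
A minimal sketch of deterministic linkage, assuming each record is a dictionary and that the `elector_number` field (a hypothetical field name) holds the unique identifier.

```python
from collections import defaultdict


def deterministic_links(records, id_field):
    """Group record ids by a unique identifier. Every group with more
    than one record is treated as a definite match, with no review."""
    groups = defaultdict(list)
    for rec in records:
        uid = rec.get(id_field)
        if uid:  # records without the identifier cannot be linked this way
            groups[uid].append(rec["id"])
    return {uid: ids for uid, ids in groups.items() if len(ids) > 1}


records = [
    {"id": "A1", "elector_number": "E100"},
    {"id": "B7", "elector_number": "E100"},
    {"id": "C3", "elector_number": "E200"},
    {"id": "D9", "elector_number": None},
]
links = deterministic_links(records, "elector_number")
print(links)
```

The result is only as reliable as the identifier: if the same identifier were ever issued to two people, this approach would link them without any check.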


DLB

Data Linkage Branch: a term that was previously used to denote the specialist team at the Department of Health who are responsible for developing and maintaining the WA Data Linkage System, performing data linkage, and the facilitation of access to linked data.


Encrypted LPNO

Encrypted versions of the LPNO are provided to clients to identify individual records.


Encryption

A process where information is transformed so that it is unrecognisable, and where this transformation can only be reversed (decrypted) by a person with the same secret key used to encrypt the original data.


EOI

Expression of Interest: The colloquial term for a draft Application for Data.


Extraction

Can refer to:

(1) the extraction of linkage keys (by Linkage Officers); or
(2) the extraction of content data to which these keys will be appended (by the Custodian Administered Research Extract Server (CARES) or the relevant Data Custodians).

False Negative

A correct link that has not been discovered (i.e. a ‘missed’ link). The Department of Health’s interactive, multi-pass, multi-dataset approach to linkage is designed to minimise false negatives.


False Positive

A link that has been made in error between the data for two or more distinct people. The Department of Health has numerous quality assurance measures in place to ensure that as many false positives as possible are filtered out before links are loaded into the WADLS.
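
The two error types defined above can be counted when a trusted set of true links is available for comparison. In practice the true links are rarely known in full, so the sketch below assumes a hypothetical gold-standard sample; the function and variable names are illustrative only.

```python
def link_errors(found_links, true_links):
    """Count false positives and false negatives.

    Each link is a pair of record ids; the order within a pair is ignored.
    """
    found = {frozenset(pair) for pair in found_links}
    truth = {frozenset(pair) for pair in true_links}
    return {
        "false_positives": len(found - truth),  # links made in error
        "false_negatives": len(truth - found),  # correct links that were missed
    }


found_links = [(1, 2), (3, 4)]   # links produced by the linkage
true_links = [(2, 1), (5, 6)]    # hypothetical gold standard
errors = link_errors(found_links, true_links)
print(errors)
```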


Family Connections

A link that connects people who belong to the same family (e.g. mother, father, sibling, cousin), provided as a list of pairs of related ROOTS and their relation type.


Feasibility Letter

A letter from the Research Data Services Project Officer indicating in-principle support from Data Custodians and Research Data Services for a linked data request. The letter communicates to a Data Applicant that their request is technically feasible and that the application can proceed to the next phase of ethical review by the Department of Health Human Research Ethics Committee (HREC). The letter does not communicate formal approval.


Geocode

A point on the earth’s surface described spatially by a geographical coordinate, usually a latitude/longitude, representing the position of a known feature such as a street address, a named place or an item of infrastructure. Alternatively, this term is often used to describe the derived boundaries.


Geocoded Address

A geographical coordinate that approximates the centre of either the property or the main building on the property described by the property street address.


Geocoding

The process of assigning a geographical coordinate to a named feature. In the case of the feature being a street address, the address must first be matched to a known address in a spatially referenced dataset such as those maintained by Landgate (in WA) and Australia Post. Once a geographical coordinate has been assigned to the feature, census statistical areas (SA1, SA2, LGA) can be derived.


Identifiable

Related to the identifiability of a dataset/data item; where the identity of a person/organisation is immediately obvious.


Infrastructure Linkage

A dataset approved for linkage to the WADLS that can be potentially accessed for multiple projects and is not tied to one specific project approval, timeline or group.


Link

A connection between records indicating that two records are deemed to belong to the same person.


Linkage

See ‘Data Linkage’.


Linkage Key

Can refer to:
(1) a Chain Number; or
(2) a ROOT.


LPNO

DLB’s in-house record identifier, assigned to every record loaded into the WA Data Linkage System.


Matching

A part of the linkage process whereby blocked pairs of records are compared, according to user-set parameters, to determine the strength of the match.


MoU

Memorandum of Understanding: a non-binding agreement between parties. In this context, MoUs are generally a mechanism to allow data sharing between a data provider and the Department of Health. For example, to establish an infrastructure linkage, the Department of Health would have an MoU with the Data Provider and would also request approval for the new linkage from the Department of Health WA Human Research Ethics Committee (HREC).


MyFT

The secure online file transfer system operated by WA Health. This system is used by Research Data Services to send and receive data.


Non-core Linkages

All linkages of datasets (other than the Core Linkages) that have been loaded into the WADLS.


Potentially Reidentifiable

Related to the identifiability of a dataset/data item; where the identity of a person/organisation is not immediately obvious but could be ascertained through a unique combination of fields or using information that the recipient already holds.


Probabilistic Linkage

A method of linking records using non-unique identifiers (e.g. name, date of birth) to establish weights which represent the likelihood that two records belong to the same person. These weights are used to inform matches and non-matches, and can include clerical review for a selected ‘grey area’ in between.
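
The glossary does not say how the weights are calculated. One widely used approach, shown here only as an illustrative assumption, is the Fellegi-Sunter model, in which each field adds a positive weight when the two records agree and a negative weight when they disagree. The m and u probabilities below are invented for the example.

```python
import math


def match_weight(rec_a, rec_b, params):
    """Sum per-field agreement and disagreement weights for a pair.

    params maps each field to (m, u), where
      m = probability the field agrees when the records are the same person
      u = probability the field agrees when they are different people.
    """
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


params = {"surname": (0.95, 0.01), "dob": (0.98, 0.001)}  # hypothetical values
rec_a = {"surname": "SMITH", "dob": "1980-11-03"}
rec_b = {"surname": "SMITH", "dob": "1980-11-03"}
rec_c = {"surname": "JONES", "dob": "1975-02-14"}
print(match_weight(rec_a, rec_b, params))  # both fields agree: positive weight
print(match_weight(rec_a, rec_c, params))  # both fields disagree: negative weight
```

A rare agreement (low u), such as a matching date of birth, earns a larger weight than a common one, which is how non-unique identifiers can still provide strong evidence in combination.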


Quality Control

A process that examines an extract of data for completeness and correctness. This includes checking file counts, consistency of roots, presence of requested fields, overlap between groups, field formatting and the correct application of inclusion/exclusion criteria.


Record

A single data item sourced from a data collection, which typically refers to one event, instance or registration (e.g. hospital discharge, birth registration, car crash), although in rare cases it can refer to more than one. The specifics of what constitutes a record vary between data collections, depending on how the data is recorded and stored. Each record contains: (1) demographic information (names, addresses, etc.) that DLB uses to link the data; and (2) service information (diagnoses, procedures, etc.) that is used by Data Applicants to perform analysis.


Record Linkage

See ‘Data Linkage’.


Role Separation

The practice of separating access to identifiable information from access to clinical or service information. For example, within a Project Team, one person manages the identifiable information of consenting participants, while another member of the team analyses and manages the deidentified health service information. Neither member has access to both sets of information.


ROOT

The ‘master’ LPNO used to identify a Chain. It is generally the LPNO belonging to the earliest record in a person’s Chain that has been sourced from a Core health dataset. The ROOT and Chain Number will largely correspond 1-to-1.

The ROOT is used for:
(1) ease of processing (they can be handled the same as LPNOs); and
(2) security (i.e., Research Data Services provides the encrypted ROOT to Data Applicants to identify Chains, rather than the Chain Number).
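
The general rule for choosing a ROOT, as described above, can be sketched as follows. The record layout, the dataset labels and the tie-break on LPNO are assumptions made for illustration; the glossary does not describe exceptions to the rule or what happens when a Chain has no Core record.

```python
def choose_root(chain_records, core_datasets):
    """Return the LPNO of the earliest record in a chain that comes from
    a core dataset, or None if the chain has no such record.

    chain_records: dicts with 'lpno', 'dataset' and 'date' (YYYY-MM-DD).
    core_datasets: set of dataset labels treated as core (hypothetical).
    """
    core = [r for r in chain_records if r["dataset"] in core_datasets]
    if not core:
        return None  # handling of this case is not described in the glossary
    # ISO dates sort chronologically as strings; LPNO breaks ties (assumption).
    earliest = min(core, key=lambda r: (r["date"], r["lpno"]))
    return earliest["lpno"]


chain = [
    {"lpno": 5001, "dataset": "survey", "date": "1995-01-10"},
    {"lpno": 5002, "dataset": "hospital", "date": "1999-06-30"},
    {"lpno": 5003, "dataset": "emergency", "date": "2004-03-12"},
]
root = choose_root(chain, core_datasets={"hospital", "emergency"})
print(root)
```

Here the survey record is the earliest overall, but it is skipped because it does not come from a core dataset.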


Separation Principle

The separation of roles to ensure privacy is maintained, whereby identifiable information is kept separate from clinical/service information. The Department of Health’s data engineering teams and Research Data Services strictly adhere to this principle, and it must be maintained in all projects requesting linked data. See also ‘Role Separation’.


SUFEX

The secure online file transfer system managed by Curtin University. SUFEX was decommissioned in Q4 2020. The Department of Health uses MyFT for secure file transfers.


Unidentifiable

Related to the identifiability of a dataset/data item; where the identity of a person/organisation is not immediately obvious and it is not reasonably possible to reidentify a person/organisation using other sources of information.


WAAHEC

Western Australian Aboriginal Health Ethics Committee: the ethics committee run through the Aboriginal Health Council of WA. The Committee’s objectives are to effectively monitor ethically sound and culturally appropriate research where Aboriginality is of interest, and to ensure benefits to Aboriginal people.


WADLS

Western Australian Data Linkage System: the Western Australian system used to connect available health and other related information for the WA population. This incorporates database tables holding demographic data and linkage keys, and the bespoke tools used by Linkage Staff to process, create, store and retrieve them.