As the linkage system has developed and expanded, particularly over the last decade, so too have the number and complexity of projects requesting to use WA linked data. Although a particular research question may seem straightforward, the data flows may be quite difficult to organise. Project complexity can significantly affect data delivery timeframes and cost.

What makes a complex project?

New linkages
New linkages may take significant amounts of time to complete. The speed of a linkage depends very much on the size, timeframe, quality and completeness of the dataset being linked. A new linkage (e.g. of a research dataset) can take anywhere from a few hours to several months to complete. Small datasets with well-recorded and standardised information can be linked quickly. Conversely, large datasets with millions of records and missing fields take longer to link.

Cohort selection specifications
Projects become more complex as the number of cohort sources increases. Cohort selection is the first step in preparing data for a project, so sourcing cohorts (or subsets of cases of interest) from multiple datasets takes more time. The type and complexity of the selection criteria also affect the time required, e.g. using a list of hundreds of ICD codes to select hospital records.

Control selection specifications
The selection of control groups significantly increases a project’s complexity and extends delivery timelines. Control groups are most often selected from the WA Electoral Roll and Birth Registrations. Control selection requires a Linkage Officer and/or Data Analyst to write or amend scripts to select a suitable set of comparison records, which may need to be matched or randomly selected and may span multiple data sources. Very specific criteria (e.g. matching on gender, full date of birth, residential postcode and an event date) can mean the selection takes several weeks to complete.
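As a rough illustration of why matched control selection takes effort, the following minimal sketch picks controls for each case by exact matching on a set of fields and then sampling at random without replacement. It is not the DLB's actual script: the function name `select_controls`, the record layout and the field names (`id`, `sex`, `date_of_birth`, `postcode`) are hypothetical and chosen only for this example.

```python
import random
from collections import defaultdict


def select_controls(cases, pool, keys, per_case=1, seed=0):
    """Pick up to `per_case` controls for each case, matched exactly on `keys`.

    `cases` and `pool` are lists of dicts with a hypothetical "id" field.
    Controls are sampled at random, without replacement, and cases
    themselves are never used as controls.
    """
    rng = random.Random(seed)
    case_ids = {case["id"] for case in cases}

    # Group eligible controls by their values on the matching fields.
    strata = defaultdict(list)
    for person in pool:
        if person["id"] not in case_ids:
            strata[tuple(person[k] for k in keys)].append(person)

    # Shuffle once so that popping from a stratum is a random draw.
    for group in strata.values():
        rng.shuffle(group)

    matches = {}
    for case in cases:
        group = strata[tuple(case[k] for k in keys)]
        chosen = [group.pop() for _ in range(min(per_case, len(group)))]
        matches[case["id"]] = [control["id"] for control in chosen]
    return matches


cases = [{"id": "c1", "sex": "F", "date_of_birth": "1980-02-14", "postcode": "6000"}]
pool = [
    {"id": "p1", "sex": "F", "date_of_birth": "1980-02-14", "postcode": "6000"},
    {"id": "p2", "sex": "F", "date_of_birth": "1980-02-14", "postcode": "6000"},
    {"id": "p3", "sex": "M", "date_of_birth": "1980-02-14", "postcode": "6000"},
]
matches = select_controls(cases, pool, keys=("sex", "date_of_birth", "postcode"), per_case=2)
```

The more fields that must match exactly, the smaller each group of eligible controls becomes, so some cases may end up with fewer controls than requested. That is one reason very specific criteria take longer to satisfy.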

Number and type of datasets
As with the criteria above, projects become more complex and time-consuming with each additional data source requested. Service data must be extracted from individual data collections for each project, so the more datasets involved, the more work required. Some datasets (e.g. those external to the Department of Health Western Australia, DOHWA) may have particular governance and security requirements in place, which also affect complexity and timelines. The Data Linkage Branch (DLB) has processed some projects that requested information from up to 25 data sources on multiple cohorts.

Family Connections System
The Family Connections System provides genealogical links for the WA population. Accessing this system requires an even more complex set of processes, over and above those of the Data Linkage System. In combination with the above factors (e.g. extracting data for both children and parents), this can delay data delivery.

Complexity of DLB projects 1995 - 2013

The DLB maintains detailed records about all projects received. For a project presented at the International Data Linkage Conference in Perth in 2012, the DLB Client Services Team investigated how project complexity had changed over time. To determine project complexity, points were assigned for each aspect of a request:

• 2 points were assigned for each new linkage required
• 1 point was assigned for each cohort source
• 1 point was assigned for each dataset requested
• 1 additional point was assigned for external or non-core datasets
• 2 points were assigned for each control group required
• 5 additional points were assigned if the project requested genealogical links via the Family Connections System

The total of these scores was the project’s complexity score. The total scores were then split into three categories:

• A complexity score of 5 or less was defined as a standard project
• A complexity score of 6 – 10 was defined as a complex project
• A complexity score of 11 or more was defined as a highly complex project
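The scoring scheme above can be sketched in a few lines of Python. This is an illustrative reading of the rules rather than the DLB's own tool: the class and function names are hypothetical, the extra point for external or non-core datasets is assumed to apply once per such dataset, and scores of exactly 5 and exactly 11 are assumed to fall into the standard and highly complex categories respectively.

```python
from dataclasses import dataclass


@dataclass
class ProjectRequest:
    """Hypothetical summary of a linked data request."""
    new_linkages: int = 0
    cohort_sources: int = 0
    datasets: int = 0
    external_datasets: int = 0  # assumed: how many of `datasets` are external or non-core
    control_groups: int = 0
    family_connections: bool = False


def complexity_score(request: ProjectRequest) -> int:
    """Add up the points for each aspect of the request."""
    return (
        2 * request.new_linkages
        + 1 * request.cohort_sources
        + 1 * request.datasets
        + 1 * request.external_datasets  # assumption: one extra point per such dataset
        + 2 * request.control_groups
        + (5 if request.family_connections else 0)
    )


def complexity_category(score: int) -> str:
    """Map a score to a category (boundary handling is an assumption)."""
    if score <= 5:
        return "standard"
    if score <= 10:
        return "complex"
    return "highly complex"


example = ProjectRequest(
    new_linkages=1, cohort_sources=2, datasets=4,
    external_datasets=1, control_groups=1,
)
score = complexity_score(example)      # 2 + 2 + 4 + 1 + 2 = 11
category = complexity_category(score)  # "highly complex"
```

Under these assumptions, a single new linkage combined with a control group and a handful of datasets is already enough to move a request out of the standard category.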

The graph below indicates the number of projects processed by DLB since its inception in 1995, broken down by complexity category.


[Graph: number of projects processed by DLB, by complexity category, 1995 - 2013]