Generalized Record Linkage Software (GRLS), Statistics Canada
- internal linkage: if a single file contains more than one record per individual (e.g. hospital admissions), GRLS can group the records that probably belong to the same individual
- internal linkage can also be used to deduplicate a file that may contain multiple records per individual (e.g. a customer database)
- two-table linkage: if two files contain information about the same entities (e.g. records from a cancer registry in table A and records from the mortality registry in table B), these files can be linked, for example to calculate survival statistics.
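The two linkage types above can be illustrated with a minimal Python sketch. It is not GRLS code: the matching rule `same_person`, the field names `id`, `surname` and `birth`, and the deterministic comparison are all assumptions chosen for illustration, whereas a real linkage would use tuned comparison rules and thresholds.

```python
from itertools import combinations


def normalize(name):
    """Lower-case and drop non-letters so "O'Brien" and "OBRIEN" compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def same_person(a, b):
    """Hypothetical rule: same birth date and same normalized surname."""
    return a["birth"] == b["birth"] and normalize(a["surname"]) == normalize(b["surname"])


def internal_linkage(records):
    """Group the records of ONE file into sets that probably belong to one individual."""
    parent = list(range(len(records)))  # union-find forest over record positions

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair within the file and merge the groups of matching records.
    for i, j in combinations(range(len(records)), 2):
        if same_person(records[i], records[j]):
            parent[find(j)] = find(i)

    groups = {}
    for i, record in enumerate(records):
        groups.setdefault(find(i), []).append(record["id"])
    return sorted(groups.values())


def two_table_linkage(table_a, table_b):
    """Return (id in A, id in B) for every cross-file pair that satisfies the rule."""
    return [(a["id"], b["id"]) for a in table_a for b in table_b if same_person(a, b)]
```

For internal linkage, each returned group with more than one id marks a probable duplicate; keeping one record per group deduplicates the file. Comparing all pairs is quadratic in the number of records, so this sketch only suits small files.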
GRLS allows the same methodology to be used for different kinds of projects. The linkage process is divided into distinct phases. Each phase involves choosing values (e.g. rules, or how attributes will be compared), examining their effect, and adjusting them as necessary before moving on to the next phase. Results from later phases often suggest adjustments to earlier phases. Because the phases are distinct, you can easily retrace your steps: run an earlier phase again with new adjustments, run the intermediate phases unchanged, and quickly catch up to where you were. GRLS is therefore an iterative record linkage system.
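The idea of distinct, re-runnable phases can be sketched in a few lines of Python. This is only an illustration of the workflow, not the GRLS interface: the class `PhasedLinkage`, the two phases `compare` and `classify`, and their parameters (`weight_surname`, `weight_birth`, `threshold`) are hypothetical names introduced here.

```python
class PhasedLinkage:
    """Keeps the parameters and output of each phase so any phase can be re-run."""

    def __init__(self, source, phases):
        self.source = source                      # input of the first phase
        self.phases = phases                      # ordered list of (name, function)
        self.params = {name: {} for name, _ in phases}
        self.outputs = {}                         # last output of each phase

    def run_from(self, start, **new_params):
        """Adjust the parameters of `start`, then run it and all later phases.

        Earlier phases are not recomputed; their stored output is reused.
        A phase can only be started once its predecessor has run at least once.
        """
        names = [name for name, _ in self.phases]
        first = names.index(start)
        self.params[start].update(new_params)
        current = self.source if first == 0 else self.outputs[names[first - 1]]
        for name, func in self.phases[first:]:
            current = func(current, **self.params[name])
            self.outputs[name] = current
        return current


def compare(pairs, weight_surname=2.0, weight_birth=3.0):
    """Phase 1 (assumed): score each candidate pair by the attributes that agree."""
    scored = []
    for a, b in pairs:
        score = 0.0
        if a["surname"].lower() == b["surname"].lower():
            score += weight_surname
        if a["birth"] == b["birth"]:
            score += weight_birth
        scored.append((a["id"], b["id"], score))
    return scored


def classify(scored, threshold=4.0):
    """Phase 2 (assumed): accept pairs whose score reaches the threshold."""
    return [(a, b) for a, b, score in scored if score >= threshold]
```

A typical session under these assumptions would call `job = PhasedLinkage(pairs, [("compare", compare), ("classify", classify)])`, run `job.run_from("compare")` once, inspect the links, and then call `job.run_from("classify", threshold=3.0)` to try a looser threshold without repeating the comparison phase. Adjusted parameters are remembered, so a later `job.run_from("compare", weight_birth=1.0)` re-runs both phases with the new weight and the threshold chosen earlier.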
Advantages of GRLS
- helps to develop the best linkage strategy for a specific project
- supports refinements and re-running a phase with altered parameters
- helps to record individual decisions so that linkage results can be reproduced
- makes linkage possible even for very large datasets
GRLS and the SNC project
Within the SNC project, GRLS is used to link mortality data to the Census data. Other datasets can also be linked to the SNC data; such linkages should be discussed within the framework of new SNC projects.
For a short introduction to GRLS and the concepts used in record linkage processes, see here.
G-Link – a new development of GRLS
- Probabilistic record linkage software based on GRLS
- uses SAS as its database