The classical theory of record linkage was developed by Fellegi and Sunter in 1969 (1), although the basic ideas had already been presented by Newcombe, Kennedy, and colleagues in 1959 (2). The term ‘record linkage’ was first used in 1946 by Dr. Halbert L. Dunn, chief of the U.S. National Office of Vital Statistics, who wrote that “each person in the world creates a book of life. The book starts with birth and ends with death. Its pages are made up of principal events in life. Record linkage is the name given to the process of assembling the pages of the book into a volume” (3).
The basic idea is to link records from different datasets by comparing the attributes they have in common, such as person identifiers (names, dates of birth, etc.) and demographic information. A pair of records is classified as a link if its common attributes predominantly agree, and as a non-link if they predominantly disagree. Two datasets A and B to be linked yield A × B possible pairs, which are divided into M, the set of true matches, and U, the set of true non-matches. For each attribute of a candidate pair, a weight is calculated from two probabilities: the probability that the attribute agrees given that the pair is truly linked (m-probability), and the probability that it agrees by mere chance given that the pair is truly non-linked (u-probability). The attribute weights are summed to give a total weight for each candidate pair. Thresholds on this total weight then separate the definite links from the rejected pairs. As a good and easily understandable introduction to record linkage, we recommend the paper by Clark, 2004 (4).
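The weighting scheme described above can be illustrated with a minimal Python sketch. It assumes the commonly used log-ratio form of the weights, log2(m/u) on agreement and log2((1-m)/(1-u)) on disagreement, which the text does not spell out. The attribute names, the m- and u-probabilities, and both threshold values are hypothetical and chosen only for illustration. The middle band between the two thresholds, labelled "possible link" here, is likewise an assumption of this sketch.

```python
import math

# Hypothetical m- and u-probabilities per attribute (illustrative values only).
M_PROB = {"surname": 0.95, "first_name": 0.90, "birth_date": 0.97}
U_PROB = {"surname": 0.01, "first_name": 0.05, "birth_date": 0.003}


def attribute_weight(field, agrees):
    """Log2 likelihood-ratio weight for a single attribute comparison."""
    m, u = M_PROB[field], U_PROB[field]
    if agrees:
        return math.log2(m / u)            # positive: agreement favours a link
    return math.log2((1 - m) / (1 - u))    # negative: disagreement favours a non-link


def total_weight(rec_a, rec_b):
    """Sum the attribute weights over all compared attributes."""
    return sum(attribute_weight(f, rec_a[f] == rec_b[f]) for f in M_PROB)


def classify(weight, upper=10.0, lower=0.0):
    """Apply two assumed thresholds to the total weight."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


# Two made-up records that agree on the names but not on the date of birth.
a = {"surname": "Miller", "first_name": "Anna", "birth_date": "1970-03-01"}
b = {"surname": "Miller", "first_name": "Anna", "birth_date": "1971-03-01"}

w = total_weight(a, b)
print(round(w, 2), classify(w))
```

Rare agreements carry more weight than common ones: an attribute with a small u-probability, such as a full date of birth, contributes a large positive weight when it agrees, while a disagreement on a reliably recorded attribute (high m-probability) contributes a large negative weight.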
- Fellegi IP, Sunter AB. A theory of record linkage. Journal of the American Statistical Association 1969;64:1183-210.
- Newcombe HB, Kennedy JM, Axford SJ, James AP. Automatic linkage of vital records. Science 1959;130:954-9.
- Dunn HL. Record linkage. Am J Public Health 1946;36:1412-6.
- Clark DE. Practical introduction to record linkage for injury research. Inj Prev 2004;10:186-91.