Probabilistic Data Linkage

rev. 04/25/2012
[Figure: An abstract concept of probabilistic linkage]

When first funded in 1992, the Utah CODES project used four population-based databases:

Because these databases were collected independently from different sources, they share no key that identifies person X in the crash file as person Y in the hospital file.

For this reason, techniques such as probabilistic record linkage are needed to combine these databases.

Record linkage is accomplished by comparing data fields that two files have in common, such as date of birth or gender. Comparing several such fields leads to a judgment that two records either refer to the same person and event (and should be linked) or do not (and should not be linked). This judgment is based on the cumulative agreement and disagreement of the field values.
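The field-by-field comparison described above can be sketched in a few lines of Python. This is only an illustration, not the CODES software: the `Record` class, the `COMMON_FIELDS` tuple, the `compare` function, the `event_date` field, and the sample values are all hypothetical names and data chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A simplified record holding only the fields both files share (hypothetical)."""
    date_of_birth: str
    gender: str
    event_date: str


# Fields assumed to be present in both files.
COMMON_FIELDS = ("date_of_birth", "gender", "event_date")


def compare(a: Record, b: Record) -> dict:
    """Return, for each common field, whether the two records agree exactly."""
    return {field: getattr(a, field) == getattr(b, field) for field in COMMON_FIELDS}


# Made-up records: one from a crash file, one from a hospital file.
crash = Record(date_of_birth="1970-03-14", gender="F", event_date="1996-07-04")
hospital = Record(date_of_birth="1970-03-14", gender="F", event_date="1996-07-05")

pattern = compare(crash, hospital)
agreements = sum(pattern.values())  # number of fields that agree
```

Here `pattern` records agreement on date of birth and gender and disagreement on the event date. A simple count of agreements treats every field as equally informative, which is the limitation that weighting schemes in probabilistic linkage are meant to address.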

Probabilistic linkage software accomplishes this task mathematically, rather than relying on the subjective impression of a human clerical reviewer. More thorough treatments of the subject have been previously published (Cook et al., 2001; Jaro, 1995; Newcombe, 1988).
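As a rough sketch of what "mathematically" can mean here, the example below uses one common formulation, the agreement and disagreement weights of the Fellegi-Sunter model. The text does not say which model the CODES software uses, so treat this as an assumption. The probabilities in `FIELD_PROBS` and the names `field_weight` and `match_score` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical probabilities per field, NOT estimated from any real data:
#   m = P(field agrees | records refer to the same person and event)
#   u = P(field agrees | records refer to different people or events)
FIELD_PROBS = {
    "date_of_birth": (0.95, 0.01),
    "gender": (0.98, 0.50),
}


def field_weight(m: float, u: float, agrees: bool) -> float:
    """Log2 likelihood ratio contributed by one field comparison."""
    if agrees:
        return math.log2(m / u)          # positive when m > u
    return math.log2((1 - m) / (1 - u))  # negative when m > u


def match_score(pattern: dict) -> float:
    """Sum the field weights for an agreement pattern {field: bool}."""
    return sum(
        field_weight(*FIELD_PROBS[field], agrees)
        for field, agrees in pattern.items()
    )


both_agree = match_score({"date_of_birth": True, "gender": True})
both_disagree = match_score({"date_of_birth": False, "gender": False})
```

Under these assumed probabilities, agreement on date of birth contributes far more weight than agreement on gender, because two unrelated records agree on gender about half the time by chance. A record pair would then be linked when its total score exceeds a threshold chosen by the analyst.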