A Theory for Record Linkage

Ivan P. Fellegi and Alan B. Sunter
Journal of the American Statistical Association
Vol. 64, No. 328 (Dec., 1969), pp. 1183-1210
DOI: 10.2307/2286061
Stable URL: http://www.jstor.org/stable/2286061
Page Count: 28

Abstract

A mathematical model is developed to provide a theoretical framework for a computer-oriented solution to the problem of recognizing those records in two files which represent identical persons, objects or events (said to be matched). A comparison is to be made between the recorded characteristics and values in two records (one from each file) and a decision made as to whether or not the members of the comparison pair represent the same person or event, or whether there is insufficient evidence to justify either of these decisions at stipulated levels of error. These three decisions are referred to as a link (A₁), a non-link (A₃), and a possible link (A₂). The first two decisions are called positive dispositions. The two types of error are defined as the error of the decision A₁ when the members of the comparison pair are in fact unmatched, and the error of the decision A₃ when the members of the comparison pair are in fact matched. The probabilities of these errors are defined as μ = ∑_{γ∈Γ} u(γ)P(A₁∣γ) and λ = ∑_{γ∈Γ} m(γ)P(A₃∣γ) respectively, where u(γ) and m(γ) are the probabilities of realizing γ (a comparison vector whose components are the coded agreements and disagreements on each characteristic) for unmatched and matched record pairs respectively. The summation is over the whole comparison space Γ of possible realizations. A linkage rule assigns probabilities P(A₁∣γ), P(A₂∣γ), and P(A₃∣γ) to each possible realization γ ∈ Γ. An optimal linkage rule L(μ, λ, Γ) is defined for each value of (μ, λ) as the rule that minimizes P(A₂) at those error levels. In other words, for fixed levels of error, the rule minimizes the probability of failing to make positive dispositions. A theorem describing the construction and properties of the optimal linkage rule and two corollaries to the theorem which make it a practical working tool are given.
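
The two error probabilities defined in the abstract can be computed directly once m(γ), u(γ), and a linkage rule are given. The Python sketch below is only an illustration: the two-field comparison space, the per-field agreement probabilities, the conditional-independence assumption used to build m and u, and the names `threshold_rule` and `error_rates` are all hypothetical and do not come from the abstract. The abstract also does not say how the optimal rule is constructed; thresholding the ratio m(γ)/u(γ) is used here simply as one deterministic rule to evaluate.

```python
from itertools import product

# Hypothetical per-field agreement probabilities (illustrative values only).
M_AGREE = [0.9, 0.8]  # P(field agrees | pair is matched)
U_AGREE = [0.1, 0.2]  # P(field agrees | pair is unmatched)


def prob(gamma, agree):
    """Probability of comparison vector gamma, assuming independent fields."""
    p = 1.0
    for g, a in zip(gamma, agree):
        p *= a if g else 1.0 - a
    return p


# Comparison space: every agree (1) / disagree (0) pattern over two fields.
GAMMA = list(product([0, 1], repeat=2))
m = {g: prob(g, M_AGREE) for g in GAMMA}
u = {g: prob(g, U_AGREE) for g in GAMMA}


def threshold_rule(upper, lower):
    """Map each gamma to (P(A1|gamma), P(A2|gamma), P(A3|gamma)).

    Assumed rule: link if m/u >= upper, non-link if m/u <= lower,
    possible link otherwise.
    """
    rule = {}
    for g in GAMMA:
        ratio = m[g] / u[g]
        if ratio >= upper:
            rule[g] = (1.0, 0.0, 0.0)
        elif ratio <= lower:
            rule[g] = (0.0, 0.0, 1.0)
        else:
            rule[g] = (0.0, 1.0, 0.0)
    return rule


def error_rates(rule):
    """Return (mu, lam) as defined in the abstract."""
    mu = sum(u[g] * rule[g][0] for g in GAMMA)   # links among unmatched pairs
    lam = sum(m[g] * rule[g][2] for g in GAMMA)  # non-links among matched pairs
    return mu, lam


if __name__ == "__main__":
    rule = threshold_rule(upper=10.0, lower=0.1)
    mu, lam = error_rates(rule)
    print(f"mu = {mu:.4f}, lambda = {lam:.4f}")
```

Moving the two thresholds closer together shrinks the set of possible links but raises μ and λ, which is the trade-off the optimal linkage rule is defined to resolve.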
