Outline
● What is record linkage
● Record linkage applications 
● A short history of record linkage 
● The record linkage process 
● Record linkage techniques and challenges
What is record linkage?
● The process of linking records that represent the same entity in one or more databases (patients, customers, businesses, products, publications, etc.)
● Also known as data linkage, data matching, entity resolution, duplicate detection, object identification, etc.
● The major challenge is that unique entity identifiers are not available in the databases to be linked (or, if available, they are not consistent or not stable)
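
To make the idea concrete, here is a minimal Python sketch of comparing records that share no unique identifier. It is an illustration under stated assumptions, not a method prescribed by these notes: the records, the field names, and the helpers `similarity` and `record_similarity` are hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever string comparison a real linkage system would use.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(rec_a: dict, rec_b: dict, fields: list) -> float:
    """Average the per-field similarities over the given fields (assumed equal weights)."""
    scores = [similarity(rec_a.get(f, ""), rec_b.get(f, "")) for f in fields]
    return sum(scores) / len(scores)


# Hypothetical records from two databases with no shared identifier.
rec_a = {"name": "Jonathan Smith", "city": "Canberra"}
rec_b = {"name": "Jonathon Smith", "city": "canberra "}  # spelling and case variation
rec_c = {"name": "Maria Garcia", "city": "Sydney"}

fields = ["name", "city"]
score_ab = record_similarity(rec_a, rec_b, fields)  # high: likely the same person
score_ac = record_similarity(rec_a, rec_c, fields)  # low: likely different people
print(f"a vs b: {score_ab:.2f}, a vs c: {score_ac:.2f}")
```

Because no identifier can be tested for equality, the decision rests on how similar the descriptive fields are; where to place the threshold between "match" and "non-match" is a design choice this sketch leaves open.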
Record linkage challenges
● No unique entity identifiers available
● Real-world data are dirty (typographical errors and variations, missing and out-of-date values, different coding schemes, etc.)
● Scalability 
– Naïve comparison of all record pairs has a quadratic complexity 
– Remove likely non-matches as efficiently as possible 
● No training data in many linkage applications
 – No record pairs with known true match status
● Privacy and confidentiality (because personal information, such as names and addresses, is commonly required for linking)
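
The scalability point above can be sketched in Python. Comparing all pairs among n records costs n(n-1)/2 comparisons; one common way to remove likely non-matches cheaply is blocking, where only records sharing a key value are compared. The records, the choice of postcode as blocking key, and the function names `all_pairs` and `blocked_pairs` are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs(records):
    """Naive approach: pair every record with every other record (quadratic)."""
    return list(combinations(range(len(records)), 2))


def blocked_pairs(records, key):
    """Pair only records that share the same blocking key value."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[key(rec)].append(idx)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


# Hypothetical records; postcode is an assumed blocking key.
records = [
    {"name": "Jonathan Smith", "postcode": "2600"},
    {"name": "Jonathon Smith", "postcode": "2600"},
    {"name": "Maria Garcia", "postcode": "2000"},
    {"name": "Maria Garcia", "postcode": "2000"},
    {"name": "Wei Chen", "postcode": "3000"},
    {"name": "Peter Jones", "postcode": "2600"},
]

naive = all_pairs(records)
blocked = blocked_pairs(records, key=lambda r: r["postcode"])
print(len(naive), len(blocked))  # 15 candidate pairs versus 4
```

The trade-off is that blocking can miss true matches when the key itself is dirty, for example a mistyped postcode, so the choice of key interacts with the data quality challenge listed above.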
 
                         
                                

