Valérie Garès (INSA Rennes)
Record linkage and analysis of linked data, with application to the French national health data system.
In this work, we extend the Fellegi–Sunter probabilistic record linkage model to mixed-type data. Probabilistic record linkage is the process of combining data from different sources when these data refer to common entities and identifying information is not available. Fellegi and Sunter proposed a probabilistic record linkage framework that takes into account multiple pieces of non-identifying information, but it is limited to simple binary comparisons between matching variables. We propose an extension of this model to mixed-type comparison vectors. We develop a mixture model for handling comparison values of low-prevalence categorical matching variables, and a mixture of hurdle gamma distributions for handling comparison values of continuous matching variables. The proposed model is applied to link a registry of patients suffering from venous thromboembolism in Brest with the French national health data system.

In a second work, we propose a model for Cox regression with linked data. Linked data can bring analysts novel and valuable knowledge that cannot be obtained from a single database. However, linkage errors are usually unavoidable regardless of the record linkage method, and ignoring these errors may lead to biased estimates. In this work, we propose an adjusted estimating equation for secondary Cox regression analysis, where the linked data have been prepared by someone else and no information on the matching variables is available to the analyst. An asymptotically unbiased variance estimator is also proposed. The proposed model is applied to a linked database from the Brest stroke registry.
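
As background for the Fellegi–Sunter framework mentioned above, the following is a minimal sketch of the classical binary-comparison case only, not of the mixed-type extension proposed in the talk. It assumes conditional independence between matching variables; the function name `match_weight` and the m- and u-probabilities are hypothetical values chosen for illustration, not estimates from the registries discussed here.

```python
import math


def match_weight(gamma, m, u):
    """Log2 likelihood ratio of a binary comparison vector.

    gamma[k] is 1 if the two records agree on matching variable k, else 0.
    m[k] = P(agree on k | records are a true match)      (assumed known here)
    u[k] = P(agree on k | records are not a true match)  (assumed known here)
    Assumes the comparisons are conditionally independent given match status.
    """
    total = 0.0
    for g, m_k, u_k in zip(gamma, m, u):
        if g:
            total += math.log2(m_k / u_k)
        else:
            total += math.log2((1 - m_k) / (1 - u_k))
    return total


# Hypothetical probabilities for three matching variables
# (for example birth year, sex, postcode); illustrative only.
m = [0.95, 0.90, 0.85]
u = [0.01, 0.10, 0.05]

w_agree = match_weight([1, 1, 1], m, u)     # full agreement: large positive weight
w_disagree = match_weight([0, 0, 0], m, u)  # full disagreement: negative weight
```

Record pairs with a high weight are declared links and pairs with a low weight non-links. The binary agree/disagree coding of `gamma` is the limitation that the mixed-type comparison vectors in the abstract are meant to relax.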