rltk
----
The Record Linkage ToolKit (RLTK) is a general-purpose, open-source record linkage platform that lets users build powerful Python programs that link records referring to the same underlying entity.
Record linkage is an important problem that arises in domains ranging from social networks to bibliographic data and biomedicine.
Current open platforms for record linkage either scale poorly, even to moderately sized datasets, or are hard to use, even for experts.
RLTK attempts to address both scalability and ease of use.
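
To make the record linkage problem concrete, here is a minimal sketch of the idea. It does not use RLTK's API; it relies only on `difflib` from the Python standard library. The toy records, the field names, the helpers `name_similarity` and `link_records`, and the `0.85` threshold are all hypothetical and chosen purely for illustration.

```python
from difflib import SequenceMatcher

# Two hypothetical toy datasets that may describe the same people.
left = [
    {"id": "a1", "name": "Jon Smith"},
    {"id": "a2", "name": "Maria Garcia"},
]
right = [
    {"id": "b1", "name": "John Smith"},
    {"id": "b2", "name": "Wei Chen"},
]


def name_similarity(a, b):
    """Return a case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(left, right, threshold=0.85):
    """Compare every left record with every right record and keep
    the pairs whose name similarity reaches the threshold."""
    pairs = []
    for l in left:
        for r in right:
            score = name_similarity(l["name"], r["name"])
            if score >= threshold:
                pairs.append((l["id"], r["id"], round(score, 2)))
    return pairs


matches = link_records(left, right)
for left_id, right_id, score in matches:
    print(left_id, right_id, score)
```

In this sketch only the pair `a1`/`b1` ("Jon Smith" and "John Smith") is linked. Note that the nested loop compares every record on one side with every record on the other, so the work grows with the product of the two dataset sizes; this is one reason naive approaches struggle to scale.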