De-duping the VDW: a protocol for identifying common enrollees across sites without sending PHI [poster]
Conference Poster
  • Background/Aims: Early in VDW history, the wide geographic distribution of HMORN member sites made it exceedingly unlikely that the same person would be found in more than one site’s data. Our multi-site research has therefore proceeded without much fear of double-counting anyone. But as the HMORN expands its membership to new organizations, this assumption becomes harder to justify. For instance, we now have 3 sites operating in Minnesota, and 2 each in Massachusetts, Northern California, Idaho, and Washington.
    Methods: This presentation proposes a protocol for creating a dataset at each HMORN site listing the local MRNs of people who also appear in another site’s data. Most notably, the protocol does not call for transmitting any unencrypted PHI. The core of the method is the repeated use of a commutative cipher to encrypt identifiers, with each site using its own key, followed by a comparison of the results for overlaps. At the end of the protocol, each site knows which of its MRNs represent people who also appear in other sites’ data, and which site(s) those are. The presentation will describe the data flows, algorithms, and processes necessary to create this data.
    Results: The protocol can be run once to assess the actual overlap and, if the overlap is significant enough, periodically to create this data at the sites. Once created, the data could be made part of the VDW, allowing HMORN projects to easily ensure that each person in each distributed cohort appears in only one site’s file.
    Conclusions: Being able to say that we have actually investigated population overlap, and either found it to be negligible or have a method for de-duplicating people, is far preferable to hand-waving and bare assertions that any overlap must be insignificant. As the HMORN grows and VDW collaborators become more numerous, we should expect this question to take on more and more significance in the eyes of both our funders and the consumers of our research.
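
The abstract does not say which commutative cipher the protocol uses. As a rough illustration only, the sketch below assumes modular exponentiation over a prime (an SRA/Pohlig-Hellman style scheme), which has the needed property that encrypting with key A and then key B gives the same result as B and then A. The modulus `P`, the helper names (`make_key`, `to_group`, `encrypt`, `find_overlap`), and the `name|dob` identifier strings are all hypothetical, and the 127-bit modulus is far too small for real PHI. Both sites are simulated in one process here; in practice each key would stay at its own site and only encrypted values would be exchanged.

```python
import hashlib
import secrets
from math import gcd

# Assumption: toy modulus (the Mersenne prime 2**127 - 1). Not secure for real use;
# a production protocol would need a vetted, much larger group.
P = 2**127 - 1


def make_key() -> int:
    """Pick a secret exponent coprime to P - 1, so encryption is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if gcd(k, P - 1) == 1:
            return k


def to_group(identifier: str) -> int:
    """Hash an identifier string to an integer in [2, P - 1]."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 2) + 2


def encrypt(value: int, key: int) -> int:
    """Commutative step: (x**a)**b == (x**b)**a (mod P)."""
    return pow(value, key, P)


def find_overlap(site_a: dict, site_b: dict):
    """Return the local MRNs at each site whose identifiers appear at both sites.

    site_a and site_b map local MRN -> identifier string (hypothetical format).
    """
    key_a, key_b = make_key(), make_key()

    # Each site encrypts its own identifiers with its own key.
    a_once = {mrn: encrypt(to_group(ident), key_a) for mrn, ident in site_a.items()}
    b_once = {mrn: encrypt(to_group(ident), key_b) for mrn, ident in site_b.items()}

    # The sites swap single-encrypted values; each applies its key to the other's.
    a_twice = {mrn: encrypt(c, key_b) for mrn, c in a_once.items()}
    b_twice = {mrn: encrypt(c, key_a) for mrn, c in b_once.items()}

    # Double-encrypted values match exactly when the underlying identifiers match.
    shared = set(a_twice.values()) & set(b_twice.values())
    a_overlap = sorted(m for m, c in a_twice.items() if c in shared)
    b_overlap = sorted(m for m, c in b_twice.items() if c in shared)
    return a_overlap, b_overlap


if __name__ == "__main__":
    site_a = {"A001": "doe|jane|1970-01-01", "A002": "roe|richard|1980-05-05"}
    site_b = {"B100": "doe|jane|1970-01-01", "B200": "poe|edgar|1990-09-09"}
    print(find_overlap(site_a, site_b))
```

With the two made-up sites above, each side learns only which of its own MRNs overlap (`A001` and `B100`), matching the stated end result of the protocol. Extending this to more than two sites, and deciding which identifiers to hash, are left open by the abstract.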

  • publication date: 2012
  • Research
  • Collaboration
  • Data Systems
  • Privacy of Patient Data