Background: The HMORN Virtual Data Warehouse (VDW) Utilization files are used in almost every VDW research project for a range of purposes, including selecting study populations, building disease registries, measuring health status, and evaluating resource use and appropriateness of care. Utilization data, including encounter, diagnosis, and procedure data, come from multiple sources, including legacy data, electronic health records, and claims. Because the data come from many sources and require complicated standardization processes, the VDW tables can be very complex to build, potentially leading to inconsistencies across sites.

Objective: Our objective was to assess, document, and improve the overall quality, availability, and completeness of the VDW Utilization data. To understand whether our quality assurance (QA) approach was effective, we compared current data quality to QA data collected in 2009.

Methods: The HMORN Utilization Work Group, together with KP CESR staff, developed QA programs to build summary tables for participating sites. The Utilization Work Group then combined the summary tables from each site and graphically compared utilization rates, diagnosis capture, and other statistics across HMORN sites to provide a more complete picture of the variability across sites and to identify potential outliers that may indicate data quality concerns.

Results: Overall, we found that VDW Utilization data quality has improved considerably since 2009, as demonstrated by the reduction in variability across sites. In particular, rates of hospitalization, inpatient days, and doctor’s office visits are considerably more consistent across time and sites. Residual differences likely reflect real-world variation in membership composition and standards of care. In addition, we identified areas with persistent variability that indicate a need for further exploration, such as rates of dialysis and out-of-office encounters. Finally, we found between-site differences in the interpretation of the Utilization specification, such as the designation of principal and primary diagnoses.

Conclusions: Identification and resolution of data quality problems through frequent use of the data, cross-site quality checks, QA programs that produce traffic light (pass/warning/fail) reports, and sharing of ETL (Extract, Transform, Load) code among sites have considerably improved the data quality of the VDW Utilization files.
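
Illustrative sketch: The cross-site comparison and traffic light reporting described above could be approximated as follows. This is a minimal Python sketch under stated assumptions, not the Work Group's actual QA program: the function name `traffic_light`, the use of the cross-site median as the reference point, the 25% and 50% deviation thresholds, and the example rates are all hypothetical and chosen only for illustration.

```python
from statistics import median

# Hypothetical thresholds (assumptions, not taken from the VDW QA programs):
# relative deviation of a site's rate from the cross-site median.
WARN_THRESHOLD = 0.25
FAIL_THRESHOLD = 0.50


def traffic_light(site_rates, warn=WARN_THRESHOLD, fail=FAIL_THRESHOLD):
    """Label each site's rate pass/warning/fail relative to the cross-site median."""
    if not site_rates:
        return {}
    center = median(site_rates.values())
    report = {}
    for site, rate in site_rates.items():
        if center == 0:
            # Avoid division by zero: any nonzero rate is maximally deviant.
            deviation = 0.0 if rate == 0 else float("inf")
        else:
            deviation = abs(rate - center) / center
        if deviation >= fail:
            report[site] = "fail"
        elif deviation >= warn:
            report[site] = "warning"
        else:
            report[site] = "pass"
    return report


# Made-up hospitalization rates per 1,000 member-years, for illustration only.
example_rates = {
    "site_a": 60.0,
    "site_b": 64.0,
    "site_c": 58.0,
    "site_d": 80.0,
    "site_e": 20.0,
}
example_report = traffic_light(example_rates)
for site, status in example_report.items():
    print(f"{site}: {status}")
```

A flagged site is a prompt for review rather than proof of a data problem, since some between-site differences may reflect real variation in membership composition and standards of care.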