BACKGROUND: Improving health equity in depression care and suicide screening requires that measures like the Patient Health Questionnaire-9 (PHQ-9) function similarly across diverse racial and ethnic groups. We evaluated PHQ-9 differential item functioning (DIF) between racial/ethnic groups in a retrospective cohort study of secondary electronic health record (EHR) data from eight healthcare systems. METHODS: The population (n = 755,156) included patients aged 18-64 with mental health and/or substance use disorder (SUD) diagnoses who had a PHQ-9 with no missing item data in the EHR for primary care or mental health visits between 1/1/2009 and 9/30/2017. We drew two random samples of 1000 patients from each of the following racial/ethnic groups, as originally recorded in EHRs (total n = 14,000): Hispanic, and non-Hispanic White, Black, Asian, American Indian/Alaska Native, Native Hawaiian/Other Pacific Islander, and multiracial. We assessed DIF using iterative hybrid ordinal logistic regression and item response theory with p < 0.01 and 1000 Monte Carlo simulations, where a change in model R² > 0.01 represented non-negligible (i.e., clinically meaningful) DIF. RESULTS: All PHQ-9 items displayed statistically significant but negligible (i.e., not clinically meaningful) DIF between compared groups. The negligible DIF varied between random samples, although six items showed negligible DIF between the same comparison groups in both random samples. LIMITATIONS: Our findings may not generalize to disaggregated racial/ethnic groups or to persons without mental health and/or SUD diagnoses. CONCLUSIONS: We found that the PHQ-9 showed no clinically meaningful cross-cultural DIF for adult patients with mental health and/or SUD diagnoses. Future research could disaggregate race/ethnicity to discern whether within-group identification affects PHQ-9 DIF.
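
The two thresholds named in the methods (p < 0.01 for statistical significance and a change in model R² > 0.01 for non-negligible DIF) imply a three-way labeling of each item and group comparison. The following is a minimal sketch of that labeling step only, under the assumption that a p-value and an R² change have already been produced by the model comparison; it does not reproduce the iterative ordinal logistic regression, the item response theory scoring, or the Monte Carlo simulations. The function name `classify_dif` and the example inputs are hypothetical and are not taken from the study.

```python
# Illustrative sketch only; this is not the study's analysis code.
ALPHA = 0.01          # significance level stated in the abstract
R2_THRESHOLD = 0.01   # change in model R^2 above this counts as non-negligible


def classify_dif(p_value: float, delta_r2: float) -> str:
    """Label one item/group comparison (hypothetical helper)."""
    if p_value >= ALPHA:
        # Not statistically significant at p < 0.01.
        return "no DIF"
    if delta_r2 > R2_THRESHOLD:
        # Significant and above the effect-size threshold.
        return "non-negligible DIF"
    # Significant, but the effect size is at or below the threshold.
    return "negligible DIF"


# Made-up (p_value, delta_r2) pairs for demonstration, not study results.
examples = [(0.20, 0.001), (0.001, 0.004), (0.001, 0.030)]
labels = [classify_dif(p, d) for p, d in examples]
print(labels)
```

Under this rule, the pattern reported in the results corresponds to every item falling in the middle category: significant at p < 0.01, with an R² change that does not exceed 0.01.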