Dylan Bouchard, PhD

Lead Applied Research Scientist, Thomson Reuters Labs

I'm an applied research scientist at Thomson Reuters Labs. My recent research focuses on AI safety: uncertainty quantification, hallucination detection, and bias and fairness in large language models. Previously, I led the AI Research program at CVS Health, where I authored the UQLM and LangFair open-source toolkits. I hold a PhD in Economics (Econometrics) from North Carolina State University.

Open-Source Projects

UQLM — Uncertainty Quantification for Language Models

A Python toolkit for LLM hallucination detection via uncertainty quantification, implementing black-box, white-box, LLM-judge, and ensemble scorers. 1.2K+ GitHub stars, 60K+ PyPI downloads; recognized by LangChain.

Repository · Documentation

LangFair — Bias & Fairness Assessment for LLMs

A Python package for use-case-level LLM bias and fairness assessment using a bring-your-own-prompts approach. 250+ GitHub stars, 40K+ downloads; integrated into the LangChain ecosystem and incorporated into the Coalition for Health AI's responsible AI best practices.

Repository · Documentation

Selected Publications

D. Bouchard, M. S. Chauhan, D. Skarbrevik, H.-K. Ra, V. Bajaj, and Z. Ahmad. UQLM: A Python Package for Uncertainty Quantification in Large Language Models. Journal of Machine Learning Research, 27(13):1–10, 2026. [JMLR]
D. Bouchard and M. S. Chauhan. Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers. Transactions on Machine Learning Research, 2025. [OpenReview]
D. Bouchard, M. S. Chauhan, V. Bajaj, and D. Skarbrevik. Fine-Grained Uncertainty Quantification for Long-Form Language Model Outputs: A Comparative Study. Transactions on Machine Learning Research, 2026. [OpenReview]
D. Bouchard. Bring Your Own Prompts: Use-Case-Specific Bias and Fairness Evaluation for LLMs. Proceedings of the LT-EDI Workshop at ACL, 2026 (to appear). [ACL Anthology]
D. Bouchard, M. S. Chauhan, D. Skarbrevik, V. Bajaj, and Z. Ahmad. LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases. Journal of Open Source Software, 10(105):7570, 2025. [DOI]
D. Bouchard, M. S. Chauhan, Z. Ahmad, and H.-K. Ra. Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification. Under review, 2026. [arXiv]
D. Bouchard. Is Escalation Worth It? A Decision-Theoretic Characterization of LLM Cascades. Under review, 2026. [arXiv]

Selected Talks

Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification. SURGeLLM Workshop at ACL 2026 (oral presentation).
Bring Your Own Prompts: Use-Case-Specific Bias and Fairness Evaluation for LLMs. LT-EDI Workshop at ACL 2026 (oral presentation).
Uncertainty Quantification for Language Models: Standardizing and Evaluating Black-Box, White-Box, LLM Judge, and Ensemble Scorers. NeurIPS 2025 LLM Evaluation Workshop.
UQLM: Detecting LLM Hallucinations with Uncertainty Quantification in Python. PyData Global 2025 (oral presentation).
UQLM: A Toolkit for LLM Hallucination Detection Using Uncertainty Quantification. AI Alliance Trust and Safety Working Group, July 2025 (invited talk).

Service

Reviewing: NeurIPS, ICML, ICLR, COLM, ACL, ACM TIST, and ACM CHI.

Program committee: Multiplicity and Homogenization in AI (NeurIPS 2026).

Awards: Best Reviewer, LT-EDI Workshop at ACL 2026.