Dylan Bouchard, PhD
Lead Applied Research Scientist, Thomson Reuters Labs
I'm an applied research scientist at Thomson Reuters Labs. My recent research has focused on AI safety — uncertainty quantification, hallucination detection, and bias and fairness in large language models. Previously, I led the AI Research program at CVS Health, where I authored the UQLM and LangFair open-source toolkits. I hold a PhD in Economics (Econometrics) from North Carolina State University.
Open-Source Projects
- UQLM: A Python toolkit for LLM hallucination detection via uncertainty quantification, implementing black-box, white-box, LLM-judge, and ensemble scorers. 1K+ GitHub stars, 40K+ PyPI downloads; recognized by LangChain.
- LangFair: A Python package for use-case-level LLM bias and fairness assessment using a bring-your-own-prompts approach. 250+ GitHub stars, 40K+ downloads; integrated into the LangChain ecosystem and incorporated into the Coalition for Health AI's responsible AI best practices.
Selected Publications
- Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers. Transactions on Machine Learning Research, 2025. [OpenReview]
- UQLM: A Python Package for Uncertainty Quantification in Large Language Models. Journal of Machine Learning Research, 27(13):1–10, 2026. [JMLR]
- Bring Your Own Prompts: Use-Case-Specific Bias and Fairness Evaluation for LLMs. Proceedings of the LT-EDI Workshop at ACL, 2026 (to appear). [arXiv]
- Fine-Grained Uncertainty Quantification for Long-Form Language Model Outputs: A Comparative Study. Under review, 2026. [arXiv]
- LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases. Journal of Open Source Software, 10(105):7570, 2025. [DOI]
- Is Escalation Worth It? A Decision-Theoretic Characterization of LLM Cascades. Under review, 2026. [arXiv]
Selected Talks
- Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification. SURGeLLM Workshop at ACL 2026 (upcoming).
- Bring Your Own Prompts: Use-Case-Specific Bias and Fairness Evaluation for LLMs. LT-EDI Workshop at ACL 2026 (upcoming).
- Uncertainty Quantification for Language Models: Standardizing and Evaluating Black-Box, White-Box, LLM Judge, and Ensemble Scorers. NeurIPS 2025 LLM Evaluation Workshop.
- UQLM: Detecting LLM Hallucinations with Uncertainty Quantification in Python. PyData Global 2025 (oral presentation).
- UQLM: A Toolkit for LLM Hallucination Detection Using Uncertainty Quantification. AI Alliance Trust and Safety Working Group, July 2025 (invited).
Service
Reviewer for NeurIPS, ICLR, ACL, ACM TIST, and ACM CHI.