Open-source LLM evaluation with 14+ metrics and pytest integration
DeepEval is an open-source LLM evaluation framework with 14+ research-backed metrics, including hallucination, bias, and toxicity detection as well as task-specific evaluations. It integrates with pytest for CI/CD workflows, supports A/B testing of prompts, and offers a cloud dashboard for tracking evaluation results over time. It is designed to make LLM testing feel as natural as unit testing.
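As a rough illustration of the pytest integration, here is a minimal sketch in the style of DeepEval's documented quickstart. The specific input/output strings and the 0.7 threshold are placeholder assumptions, and exact class names may vary across DeepEval versions:

```python
import pytest
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    # Wrap a single prompt/response pair as a test case
    # (placeholder strings for illustration).
    test_case = LLMTestCase(
        input="What is your return policy?",
        actual_output="You can return items within 30 days of purchase.",
    )
    # Score the response; threshold=0.7 is an assumed passing bar.
    metric = AnswerRelevancyMetric(threshold=0.7)
    # Fails the pytest run if the metric score falls below the threshold.
    assert_test(test_case, [metric])
```

Because each evaluation is an ordinary pytest test, the same file runs locally or in a CI pipeline with a plain `pytest` invocation.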