RAG Evaluation with ragas: Reference-Free Metrics & Monitoring

Learn how to evaluate Retrieval-Augmented Generation (RAG) pipelines effectively and efficiently with ragas, the open-source framework that’s transforming AI quality assurance. In this episode, we explore how to implement reference-free evaluation, integrate continuous monitoring into your AI workflows, and optimize for production scale, all through the lens of Chapter 9 of Keith Bourne’s "Unlocking Data with Generative AI and RAG".

In this episode:

- Overview of ragas and its reference-free metrics, which reportedly reach 95% agreement with human judgments on faithfulness scoring

- Implementation patterns and code walkthroughs for integrating ragas with LangChain, LlamaIndex, and CI/CD pipelines

- Production monitoring architecture: sampling, async evaluation, aggregation, and alerting

- Comparison of ragas with other evaluation frameworks like DeepEval and TruLens

- Strategies for cost optimization and asynchronous evaluation at scale

- Advanced features: custom domain-specific metrics with AspectCritic and multi-turn evaluation support
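The monitoring flow named in the list above (sampling, async evaluation, aggregation, alerting) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the episode's or ragas's actual code: `SAMPLE_RATE`, `ALERT_THRESHOLD`, the `stub_score` field, and every function name here are hypothetical, and `score_faithfulness` is a placeholder where a real pipeline would call an LLM-based metric.

```python
import asyncio
import random
from statistics import mean

SAMPLE_RATE = 0.1       # hypothetical: evaluate roughly 10% of traffic
ALERT_THRESHOLD = 0.8   # hypothetical: alert when the mean score drops below this


def should_sample(rng: random.Random, rate: float = SAMPLE_RATE) -> bool:
    """Sampling: keep only a fraction of requests to bound evaluation cost."""
    return rng.random() < rate


async def score_faithfulness(record: dict) -> float:
    """Placeholder scorer; a real pipeline would call an LLM-based metric here."""
    await asyncio.sleep(0)  # stands in for a network call
    return record["stub_score"]


async def evaluate_batch(records: list[dict]) -> list[float]:
    """Async evaluation: score the sampled records concurrently."""
    return list(await asyncio.gather(*(score_faithfulness(r) for r in records)))


def aggregate_and_alert(
    scores: list[float], threshold: float = ALERT_THRESHOLD
) -> tuple[float, bool]:
    """Aggregation and alerting: mean score plus a flag when it falls below threshold."""
    avg = mean(scores)
    return avg, avg < threshold


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic traffic with made-up scores, purely for demonstration.
    traffic = [{"id": i, "stub_score": 0.9 if i % 2 else 0.6} for i in range(1000)]
    sampled = [r for r in traffic if should_sample(rng)]
    scores = asyncio.run(evaluate_batch(sampled))
    avg, alert = aggregate_and_alert(scores)
    print(f"sampled={len(sampled)} mean={avg:.3f} alert={alert}")
```

Keeping the scorer behind a single async function means the sampling rate and alert threshold can be tuned for cost without touching the evaluation logic.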

Key tools and technologies mentioned:

- ragas (Retrieval Augmented Generation Assessment)

- LangChain, LlamaIndex

- LangSmith, LangFuse (observability and evaluation tools)

- OpenAI GPT-4o, GPT-3.5-turbo, Anthropic Claude, Google Gemini, Ollama

- Python datasets library

Timestamps:

00:00 - Introduction and overview with Keith Bourne

03:00 - Why reference-free evaluation matters and ragas’s approach

06:30 - Core metrics: faithfulness, answer relevancy, context precision & recall

09:00 - Code walkthrough: installation, dataset structure, evaluation calls

12:00 - Integrations with LangChain, LlamaIndex, and CI/CD workflows

14:30 - Production monitoring architecture and cost considerations

17:00 - Advanced metrics and custom domain-specific evaluations

19:00 - Common pitfalls and testing strategies

20:30 - Closing thoughts and next steps

Resources:

- "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition

- Memriq AI: https://Memriq.ai

- ragas website: https://www.ragas.io/

- ragas GitHub repository: https://github.com/vibrantlabsai/ragas (for direct access to code and docs)

Tune in to build more reliable, scalable, and maintainable RAG systems with confidence using open-source evaluation best practices.
