llm-evaluation 2

Building a RAG Evaluation Pipeline with Python
Jun 19, 2026

DeepSeek V4 API Thinking Mode Arena
Apr 30, 2026