Content
Hugging Face (Twitter) RT @wangzhilin123: We built ProfBench to raise the bar for LLMs - literally. At @NVIDIA, we worked with domain experts to create a benchmark that goes far beyond trivia and short answers. ProfBench tests LLMs on complex, multi-step tasks that demand the kind of reasoning, synthesis, and clarity you'd expect from a PhD physicist or MBA consultant. 🌎 This isn't just a dataset drop. It's a global collaboration: 38 professionals across 8 countries contributed over 7,000 expert-written rubrics across finance MBA 💵, consulting MBA 📊, chemistry PhD 🧪 and physics PhD 🚀. 🧗 Every prompt and grading rubric was handcrafted, requiring tens of hours of dedicated and focused work. Now fully supported in the NeMo Evaluator SDK, ProfBench enables reproducible, rubric-based evaluations and side-by-side model comparisons.
🔗 ProfBench on @HuggingFace: https://huggingface.co/datasets/nvidia/ProfBench
🔗 NeMo Evaluator SDK...