Parallel Experiments

@LinghaoCh

Stay informed. Stay authentic. Welcome to the public part of my brain. Here I share curations and thoughts. Created with ❤️ by @linghao.

Subscribers: 1,750

Tracked posts: 937

Recent reach: 6,356 views (sum of recent post views)

Recent posts

Tag: #ai · 5 posts

Posted Apr 19

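https://arxiv.org/abs/2305.18290 #llm #ai
Took a deep dive into DPO today, and once again came away struck by how much a solid math foundation matters for AI/ML research.
The original RLHF pipeline uses pairwise human preference data (which of A and B is better) to train a reward model by minimizing a negative log-likelihood loss. It then uses RL to train the main policy model to maximize that reward, with regularization (in the PPO-based setup, a KL divergence penalty that keeps the trained policy close to the reference policy). The downside is that RL is notoriously finicky, and PPO also needs a critic model to estimate expected reward, so the overall system gets quite complex.
DPO's idea: observe that the RLHF objective is essentially minimizing a loss over a (latent) reward function. Through reparameterization and some further derivation, it arrives at a new objective that minimizes a loss over the policy instead, bypassing the intermediate reward model. The gradient update directly raises the policy model's probability of generating the winner response and lowers that of the loser response, which simplifies the pipeline considerably.
Further reading:
- KTO: goes a step further. No pairwise comparison is needed; upvotes/downvotes on individual examples are enough to learn preferences.
- IPO: addresses DPO's tendency to overfit.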
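To make the DPO objective above concrete, here is a minimal sketch of the per-pair loss from the linked paper, written in plain Python. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are illustrative assumptions, not taken from the post; the inputs are assumed to be summed sequence log-probabilities of the winner and loser responses under the policy and under a frozen reference model.

```python
import math

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (winner log-ratio - loser log-ratio)).

    All names and the beta default are illustrative assumptions.
    """
    # Implicit reward of each response: how much more likely the policy
    # makes it compared with the reference model (a log-ratio).
    winner_log_ratio = policy_logp_w - ref_logp_w
    loser_log_ratio = policy_logp_l - ref_logp_l
    margin = beta * (winner_log_ratio - loser_log_ratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch on the sign
    # of the margin so exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Policy identical to the reference: the loss sits at log(2).
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# Policy has shifted probability toward the winner: the loss drops.
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)
print(baseline, improved)
```

Minimizing this loss pushes the winner's log-ratio up and the loser's down, with no explicit reward model or critic involved. In a real training loop the same expression would be computed on batched tensors so gradients can flow into the policy.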

2,650 views

Hashtags

#llm #ai

Posted Apr 17

Truly a thought-provoking piece, from the author of τ-bench. https://ysymyth.github.io/The-Second-Half/ #ai
So what’s suddenly different now? In three words: RL finally works. More precisely: RL finally generalizes. After several major detours and a culmination of milestones, we’ve landed on a working recipe to solve a wide range of RL tasks using language and reasoning.
The second half of AI — starting now — will shift focus from solving problems to defining problems. In this new era, evaluation becomes more important than training. Instead of just asking, “Can we train a model to solve X?”, we’re asking, “What should we be training AI to do, and how do we measure real progress?” To thrive in this second half, we’ll need a timely shift in mindset and skill set, ones perhaps closer to a product manager.
It turned out the most important part of RL might not even be the RL algorithm or environment, but the priors, which can be obtained in a way totally unrelated from RL (LLMs).

795 views

Hashtags

#ai

Posted Apr 9

https://ai-2027.com “We predict that the impact of superhuman AI over the next decade will be enormous, exceeding that of the Industrial Revolution.” Whatever you make of it, the interaction design on this page is great. #ai

860 views

Hashtags

#ai

Posted Feb 12

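Spent two days' worth of driving time listening through this Latent Space episode, a ninety-minute interview with the legendary Bret Taylor. Got a lot out of it! #podcast #ai https://www.latent.space/p/bret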

1,750 views

Hashtags

#podcast #ai

Posted Oct 12

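Imagining the possible forms of machine intelligence through the lens of human intelligence is a very narrow view. It brings to mind this article: https://zhuanlan.zhihu.com/p/26253133 #ai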

301 views

Hashtags

#ai