Parallel Experiments

@LinghaoCh

Blogs

Stay informed. Stay authentic. Welcome to the public part of my brain. Here I share curations and thoughts. Created with ❤️ by @linghao.

Subscribers: 1,750
Tracked posts: 937
Recent reach: 6,356 (sum of recent post views)
Recent posts

Tag: #ai · 5 posts

Posted Apr 19

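https://arxiv.org/abs/2305.18290 #llm #ai

Today I took a deep dive into DPO, and once again I am struck by how much a solid mathematical foundation matters for AI/ML research.

The original RLHF uses pairwise human preference data (which of A and B is better) to train a reward model by minimizing a negative log-likelihood loss, then uses RL to train the main policy model to maximize that reward under regularization (in PPO-based RLHF, for instance, a KL divergence term keeps the new policy from drifting far from the original one). The drawback is that RL is notoriously hard to get right, and PPO also needs a critic model to predict the expected reward, which makes the whole system quite complex.

DPO's idea is to observe that the RLHF objective is essentially minimizing a loss over a (latent) reward function. Through reparameterization and some further derivation, it redesigns the objective as a loss over the policy itself. This bypasses the intermediate reward model: the gradient update directly raises the probability that the policy model generates the winner response and lowers the probability of the loser response, which greatly simplifies the pipeline.

Further reading:
- KTO: goes one step further and drops pairwise comparison; an upvote/downvote on individual examples is enough to learn preferences.
- IPO: addresses DPO's tendency to overfit.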
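To make the DPO objective from the post above concrete, here is a minimal sketch of the per-pair loss in plain Python. It assumes the summed log-probabilities of the winner and loser responses under the policy and under a frozen reference model have already been computed; the function and parameter names (`dpo_loss`, `policy_logp_w`, `beta`, and so on) are illustrative and not taken from the paper's code.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_logp_w: float,  # log pi(y_w | x) under the policy being trained
    policy_logp_l: float,  # log pi(y_l | x) under the policy being trained
    ref_logp_w: float,     # log pi_ref(y_w | x) under the frozen reference
    ref_logp_l: float,     # log pi_ref(y_l | x) under the frozen reference
    beta: float = 0.1,     # assumed strength of the implicit KL regularization
) -> float:
    """DPO loss for one (winner, loser) preference pair."""
    # Implicit reward of each response: how much more likely the policy
    # makes it compared with the reference model (up to the factor beta).
    winner_gain = policy_logp_w - ref_logp_w
    loser_gain = policy_logp_l - ref_logp_l
    # The loss falls as the winner's gain exceeds the loser's gain.
    return -log_sigmoid(beta * (winner_gain - loser_gain))


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
    # Policy now prefers the winner: the loss drops below log(2).
    print(dpo_loss(-4.0, -6.0, -5.0, -5.0))
```

No reward model or critic appears anywhere in this computation, which is the simplification the post describes: minimizing this loss pushes up the winner's log-probability and pushes down the loser's, relative to the reference.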

2,650 views

Posted Apr 17

Truly a thought-provoking piece, from the author of τ-bench. https://ysymyth.github.io/The-Second-Half/ #ai

So what’s suddenly different now? In three words: RL finally works. More precisely: RL finally generalizes. After several major detours and a culmination of milestones, we’ve landed on a working recipe to solve a wide range of RL tasks using language and reasoning.

The second half of AI — starting now — will shift focus from solving problems to defining problems. In this new era, evaluation becomes more important than training. Instead of just asking, “Can we train a model to solve X?”, we’re asking, “What should we be training AI to do, and how do we measure real progress?” To thrive in this second half, we’ll need a timely shift in mindset and skill set, ones perhaps closer to a product manager.

It turned out the most important part of RL might not even be the RL algorithm or environment, but the priors, which can be obtained in a way totally unrelated from RL (LLMs).

795 views

Posted Apr 9

https://ai-2027.com “We predict that the impact of superhuman AI over the next decade will be enormous, exceeding that of the Industrial Revolution.” Either way, the interaction design on this page is excellent. #ai

860 views

Posted Feb 12

Over two days of driving, I finished listening to the Latent Space episode with the legendary Bret Taylor, a ninety-minute interview. I got a lot out of it! #podcast #ai https://www.latent.space/p/bret

1,750 views

Posted Oct 12

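Imagining the possible forms of machine intelligence through the lens of human intelligence is a very narrow view. It reminds me of this article: https://zhuanlan.zhihu.com/p/26253133 #ai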

301 views
