DPS Build@dps_build · Post #52 · 03/12/2023, 11:07 AM
A team of ex-OpenAI fellows at Together have released a 20B chat-GPT model, fine-tuned for chat using EleutherAI's GPT-NeoX-20B, with over 43 million instructions under the Apache-2.0 license.
https://github.com/togethercomputer/OpenChatKit
https://www.together.xyz/blog/openchatkit
#nlp
DPS Build@dps_build · Post #51 · 03/12/2023, 03:50 AM
Haystack
• Ask questions in natural language and find granular answers in your documents.
• Perform semantic search and retrieve documents according to meaning, not keywords.
• Use off-the-shelf models or fine-tune them to your domain.
• Use user feedback to evaluate, benchmark, and continuously improve your live models.
• Leverage existing knowledge bases and better handle the long tail of queries that chatbots receive.
• Automate processes by automatically applying a list of questions to new documents and using the extracted answers.
https://github.com/deepset-ai/haystack
#nlp
DPS Build@dps_build · Post #49 · 03/11/2023, 11:33 PM
为什么 ChatGPT API 是革命性的?
这几天读了读 ChatGPT API 的文档,太惊喜了:
1. 最新版的 API 是基于 gpt-turbo-3.5 的,这一版的 API 的交互是革命性的。得益于模型的强大,用户不需要提交各种参数,只要写 prompt 就行。也就是说 API 的 UX 被大大简化。用户不需要在请求里写参数,只要在 prompt 里写人话,模型自行能够明白用户的表达。
2. 更厉害的是,gpt 这类模型可以接受 chain of thoughts (COT) 的 prompt,如果用户觉得结果不满意,可以继续提交请求让模型生成更好的答案。在李宏毅的讲座里,他给出了一个例子就是,如果让模型直接解答一个复杂的数学题,效果可能不是很好,但是加上 let’s do it step by step 的 prompt 之后,模型给出了一步步的推导过程,结果大为改善。
3. 除了直接调用 ChatGPT API 的基础模型以外,OpenAI 还提供了让用户提交自己的 embedding 和 fine-tuning 等定制模型的方式,这两种都可以通过 API 来实现,不需要额外的步骤。不过,最新的 API 暂时不支持 fine-tuning
4. 以前随便开发一个 NLP 的模型,基本上开发周期是以月计算的,有了 ChatGPT API 之后,抛去准备数据的时间,开发周期可以以小时计算。我从零开始开始读文档,到写出一个 Q&A 生成的项目,只花了半天时间。放在以前,至少要花一两个月的时间吧。
#nlp
http://www.aparat.com/v/0scM5
Irene Chen A Beginner's Guide to Deep Learning.
What is #Deep_Learning ? It has recently exploded in popularity as a complex and incredibly powerful tool. This talk will present the basic concepts underlying deep learning in understandable pieces for complete beginners to #machine_learning.
http://www.aparat.com/v/Corus
Advanced users #Deep_Learning, anyone who has followed #machine_learning over the past years has heard it. In this talk I will go past the hype and show what deep learning actually means and how one goes about solving complex machine learning task with a minimum amount of code, with the help of theano, an amazing python library for deep learning.
http://codeinpython.com/tutorials/deep-learning-tensorflow-keras-pytorch/?nonamp=1
Deep Learning #Tensorflow vs #Keras vs #PyTorch
#Deep_learning is the application of artificial #neural_networks (ANNs) to learn tasks. These tasks contain more than one hidden layer. Deep learning is part of a broader family of #machine_learning. Machine learning itself is a part of #Artificial_Intelligence(#AI).
https://www.analyticsvidhya.com/blog/2016/08/deep-learning-path/?utm_content=bufferd56c5&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer
#Deep_Learning, a prominent topic in #Artificial_Intelligence domain, has been in the spotlight for quite some time now. It is especially known for its breakthroughs in fields like Computer Vision and Game playing (Alpha GO), surpassing human ability. Since the last survey, there has been a drastic increase in the trends. (click here to check out the survey)
Here is what Google trends shows us:
https://github.com/BVLC/caffe
#Caffe is a #deep_learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR)/The Berkeley Vision and Learning Center (BVLC) and community contributors.
http://www.kdnuggets.com/2017/09/essential-data-science-machine-learning-deep-learning-cheat-sheets.html#.WdGzWthHcEo.linkedin
30 Essential #Data_Science , #machine_learning & #Deep_Learning Cheat Sheets
UCSD and Together AI Research Introduces Parcae: A Stable Architecture for Looped Language Models That Achieves the Quality of a Transformer Twice the Size
The core idea is to recast the looped forward pass as a nonlinear time-variant dynamical system over the residual stream. By analyzing the linearized form of this system, the research team shows that prior injection methods — addition and concatenation-with-projection — produce marginally stable or unconstrained parameterizations of the state transition matrix Ā. Parcae fixes this by constraining Ā via discretization of a negative diagonal parameterization, guaranteeing ρ(Ā) < 1 at all times.
Two additional training fixes accompany the architectural change: a normalization layer on the prelude output to prevent late-stage loss spikes, and a per-sequence depth sampling algorithm that corrects a distributional mismatch bug in prior recurrence sampling methods.
On results:
→ Parcae reduces validation perplexity by up to 6.3% over parameter- and data-matched RDMs at 350M scale
→ A 770M Parcae model matches the Core benchmark quality of a 1.3B standard Transformer
→ At 1.3B parameters, Parcae outperforms the parameter-matched Transformer by 2.99 points on Core and 1.18 points on Core-Extended
On scaling laws:
→ Compute-optimal training scales mean recurrence µ_rec and tokens D in tandem following power laws (µ_rec ∝ C^0.40, D ∝ C^0.78)
→ Test-time looping follows a saturating exponential decay — gains plateau near the training recurrence depth µ_rec, setting a hard ceiling on inference-time scaling
→ A unified law predicts held-out model loss within 0.85–1.31% average error
Pretrained models from 140M to 1.3B are available on Hugging Face.
Full analysis: https://www.marktechpost.com/2026/04/16/ucsd-and-together-ai-research-introduces-parcae-a-stable-architecture-for-looped-language-models-that-achieves-the-quality-of-a-transformer-twice-the-size/
Paper: https://arxiv.org/pdf/2604.12946
Technical details: https://www.together.ai/blog/parcae
Models: https://huggingface.co/collections/SandyResearch/parcae
#MachineLearning#NLP#LLM#DeepLearning#AIResearch