Hugging Face

@huggingface

Technology

Subscribers: 195 (current subscribers)

Tracked posts: 1,011 (indexed posts)

Recent reach: 215 (views on recent posts)

Recent posts

Page 54 of 85 · 1,011 posts

Published Oct 31

Hugging Face (Twitter) RT @novasarc01: many people have asked me about how to keep up with frontier research and new models. this is one of the best gold resource to start with. covers pre-training, post-training, infra, architecture nuances and recent advances. huge respect to the hf team for putting it together. https://twitter.com/eliebakouch/status/1983930328751153159#m

21 views

Published Oct 31

Hugging Face (Twitter) RT @Yampeleg: hf are doing god’s work fr https://twitter.com/_lewtun/status/1983929588909797414#m

18 views

Published Oct 31

Hugging Face (Twitter) RT @Thom_Wolf: We’ve cooked another one of these 200+ pages practical books on model training that we love to write. This time it’s on all pretraining and post-training recipes and how to do a training project hyper parameter exploration.
Closing the trilogy of:
1. Building a pretraining dataset with the « FineWeb blog post »
2. Scaling infra GPU cluster with the « Ultrascale Playbook »
3. And now all the training recipes and HP exploration for pre- and post-training with this « Smol Training Playbook »
The HF science team on fire https://twitter.com/eliebakouch/status/1983930328751153159#m

19 views

Published Oct 31

Hugging Face (Twitter) RT @RisingSayak: With simple changes, I was able to cut down @krea_ai's new real-time video gen's timing from 25.54s to 18.14s 🔥🚀
1. FA3 through `kernels`
2. Regional compilation
3. Selective (FP8) quantization
Notes are in 🧵 below

17 views

Published Oct 30

Hugging Face (Twitter) RT @Hesamation: holy shit... Hugging Face cooked again! 🔥 they just dropped a free blog (BOOK) that covers the no-bs reality of building SOTA models. i haven't seen any lab/researcher go into the real decisions behind the LLM research and its nuances. this is literally a gem.
Syllabus:
→ Training compass: why → what → how
→ Every big model starts with a small ablation
→ Designing the model architecture
→ The art of data curation
→ The training marathon
→ Beyond base models - post-training in 2025
→ Infrastructure - the unsung hero
skimming through the blog, this is incredibly detailed just like their ultrascale playbook. i'm gonna read this and share more about it in the coming days.
Read here: https://huggingface.co/spaces/HuggingFaceTB/smol-training-playbook

21 views

Published Oct 30

Hugging Face (Twitter) RT @alexinexxx: thank god i’m unemployed so i can take a break from learning cuda & just read this banger hehe https://twitter.com/eliebakouch/status/1983930328751153159#m

20 views

Published Oct 30

Hugging Face (Twitter) RT @srush_nlp: The work Hugging Face does continues to be incredible. Putting in serious effort to make these topics accessible and detailed. https://huggingface.co/spaces/HuggingFaceTB/smol-training-playbook#introduction

21 views

Published Oct 30

Hugging Face (Twitter) RT @calebfahlgren: You know Hugging Face put out a banger blog post (book) when you see this https://twitter.com/eliebakouch/status/1983930328751153159#m

19 views

Published Oct 30

Hugging Face (Twitter) RT @ClementDelangue: Happy Halloween from Reachy Mini! You'll be able to 3D print these skins at home thanks to open-source

18 views

Published Oct 30

Hugging Face (Twitter) RT @Kimi_Moonshot: Kimi Linear Tech Report is dropped! 🚀 https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
Kimi Linear: A novel architecture that outperforms full attention with faster speeds and better performance, ready to serve as a drop-in replacement for full attention, featuring our open-sourced KDA kernels! Kimi Linear offers up to a 75% reduction in KV cache usage and up to 6x decoding throughput at a 1M context length.
Key highlights:
🔹 Kimi Delta Attention: A hardware-efficient linear attention mechanism that refines the gated delta rule.
🔹 Kimi Linear Architecture: The first hybrid linear architecture to surpass pure full attention quality across the board.
🔹 Empirical Validation: Scaled, fair comparisons + open-sourced KDA kernels, vLLM integration, and checkpoints.
The future of agentic-oriented attention is here! 💡

13 views
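
The Kimi Linear post above cites up to a 75% reduction in KV cache usage at a 1M context length. A rough back-of-the-envelope sketch of how a hybrid layout could produce a figure like that is shown here. It assumes that only one in four layers is a full-attention layer holding a per-token KV cache, and that the remaining linear-attention layers keep a fixed-size state small enough to ignore. That ratio, the layer count, the head count, and the head dimension are all hypothetical illustration values, not numbers taken from the post.

```python
# Back-of-the-envelope KV cache sizing. All model dimensions below are
# hypothetical illustration values, not Kimi Linear's actual configuration.

def kv_cache_bytes(seq_len, num_layers, kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_value=2 corresponds to a 16-bit dtype such as bf16.
    """
    return 2 * num_layers * kv_heads * head_dim * seq_len * bytes_per_value


def hybrid_kv_cache_bytes(seq_len, num_layers, kv_heads, head_dim,
                          full_attention_every=4, bytes_per_value=2):
    """KV cache when only every Nth layer is a full-attention layer.

    Assumption: linear-attention layers keep a constant-size state that is
    negligible next to a per-token cache at long context, so it is ignored.
    """
    caching_layers = num_layers // full_attention_every
    return kv_cache_bytes(seq_len, caching_layers, kv_heads, head_dim,
                          bytes_per_value)


SEQ_LEN = 1_000_000   # the 1M context length mentioned in the post
NUM_LAYERS = 32       # hypothetical
KV_HEADS = 8          # hypothetical
HEAD_DIM = 128        # hypothetical

full = kv_cache_bytes(SEQ_LEN, NUM_LAYERS, KV_HEADS, HEAD_DIM)
hybrid = hybrid_kv_cache_bytes(SEQ_LEN, NUM_LAYERS, KV_HEADS, HEAD_DIM)
reduction = 1 - hybrid / full

if __name__ == "__main__":
    print(f"full attention: {full / 1e9:.1f} GB")
    print(f"hybrid layout:  {hybrid / 1e9:.1f} GB")
    print(f"reduction:      {reduction:.0%}")
```

Under these assumptions the reduction depends only on the fraction of layers that keep a per-token cache, so the other dimensions cancel out of the ratio.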

Published Oct 30

Hugging Face (Twitter) RT @eliebakouch: Training LLMs end to end is hard. Very excited to share our new blog (book?) that cover the full pipeline: pre-training, post-training and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably https://huggingface.co/spaces/HuggingFaceTB/smol-training-playbook

15 views

Published Oct 30

Hugging Face (Twitter) RT @Nouamanetazi: We're releasing The Smol Training Playbook 📖 Training SmolLM3 on 384 H100s for nearly a month taught us: infrastructure is the unsung hero of LLM training. Most care about architecture and data, yet few understand the hardware layer. This playbook changes that 🧵

13 views
