Post #1316

@huggingface

Hugging Face

Views: 11
Published: 7 Sep 2025, 21:35
Content

Hugging Face (Twitter)

RT @maximelabonne: Pheww, another banger dataset from @huggingface!
> 3T tokens, 475M PDFs, 1733 languages
> Close to Nemotron-CC v2 and FineWeb-Edu+DCLM on its own (‼️)
> Greatly boosts perf when combined, likely because it provides high diversity that complements the other datasets well