
TGINSIGHT POST

Post #1313

@huggingface

Hugging Face

Views: 18
Published: 7 Sep 2025, 21:34
Post content

Hugging Face (Twitter): RT @Thom_Wolf: This is huge. Continuing our foundational work to enable anyone to train state-of-the-art AI models, we’re thrilled to release « FinePDFs »: 3T tokens of textual data that until now was locked away in PDFs, arguably some of the highest quality publicly available data out there. We gathered FinePDFs to create the largest permissively licensed corpus sourced exclusively from PDFs. Amazingly challenging infra and processing work, h/t to the fineweb team. https://twitter.com/HKydlicek/status/1964584936524124645#m