TGTGInsightintelligence telegramLIVE / telegram public index
Contenuto del post
Contenuto
Hugging Face (Twitter) RT @HKydlicek: We’re releasing the full FinePdfs source code — plus new datasets and models! 🚀 📚 Datasets: • OCR-Annotations — 1.6k PDFs labeled for OCR need • Gemma-LID-Annotation — 20k samples per language (annotated with Gemma3-27B) 🤖 Models: • XGB-OCR — OCR classifier for PDFs