Hugging Face (Twitter) RT @eliebakouch: Most web data in (very) low-resource languages is Bible and Wikipedia. The rest? @huggingface data team ran Gemma3 27B for 3 months to translate it into English, to improve translation models and to bring cultural context from 500+ language communities into English training data. Here is the full pipeline: https://huggingface.co/datasets/HuggingFaceFW/finetranslations
https://twitter.com/gui_penedo/status/2009677127671492616#m