GitHub Trends
@githubtrending
Technologies
See what the GitHub community is most excited about today. A bot automatically fetches new repositories from https://github.com/trending and sends them to the channel. Author and maintainer: https://github.com/katursis
Recent posts
Tag: #asr · 3 posts
Posted Jun 11
#python #asr #captions #cli #subtitle #subtitles #transcript #transcripts #translating_transcripts #youtube #youtube_api #youtube_asr #youtube_captions #youtube_subtitles #youtube_transcript #youtube_transcripts #youtube_video
The YouTube Transcript API is a Python library and command-line tool that retrieves the transcript or subtitles of a YouTube video, including automatically generated captions. Reading the text is quicker than watching the whole video, and the output can be used to build subtitles, translate transcripts, or analyze what is said. That helps content creators who want to make their videos more accessible and researchers who need to study video content quickly. Transcripts can be requested in multiple languages, which makes the tool useful to a wide range of users.
https://github.com/jdepoix/youtube-transcript-api
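The "make subtitles" use case mentioned above can be sketched in a few lines. This is a minimal illustration, not the library's own code: it does not call youtube-transcript-api at all, and it assumes each transcript snippet is a dict with `text`, `start`, and `duration` keys (times in seconds). The helper names `format_timestamp` and `to_srt` and the sample data are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(snippets: list[dict]) -> str:
    """Turn transcript snippets into SRT subtitle text.

    Assumption: every snippet has 'text', 'start', and 'duration' keys.
    """
    blocks = []
    for index, snippet in enumerate(snippets, start=1):
        start = snippet["start"]
        end = start + snippet["duration"]
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{snippet['text']}"
        )
    # SRT cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


# Hypothetical sample data, not fetched from YouTube.
sample = [
    {"text": "Hello and welcome.", "start": 0.0, "duration": 1.5},
    {"text": "Let's get started.", "start": 1.5, "duration": 2.25},
]

print(to_srt(sample))
```

Writing the returned string to a `.srt` file gives a subtitle track that most video players can load next to the video.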
Posted Jun 7
#jupyter_notebook #android #asr #deep_learning #deep_neural_networks #deepspeech #google_speech_to_text #ios #kaldi #offline #privacy #python #raspberry_pi #speaker_identification #speaker_verification #speech_recognition #speech_to_text #speech_to_text_android #stt #voice_recognition #vosk
Vosk is an offline speech recognition toolkit: it transcribes speech without an internet connection. It supports over 20 languages and dialects. Vosk is small and efficient, so it runs on small devices such as smartphones and the Raspberry Pi. Typical uses include chatbots, smart home devices, and subtitles for videos. Because audio is processed on the device, recognition stays private and fast, which is especially helpful when internet access is limited.
https://github.com/alphacep/vosk-api
Posted May 8
#python #asr #deeplearning #generative_ai #large_language_models #machine_translation #multimodal #neural_networks #speaker_diariazation #speaker_recognition #speech_synthesis #speech_translation #tts
NVIDIA NeMo is a platform for building, customizing, and deploying generative AI models such as large language models (LLMs), vision language models, and speech AI. It lets you train and fine-tune models quickly using pre-built code and checkpoints, supports the latest model architectures, and runs in cloud, data center, or edge environments. NeMo 2.0 is more flexible and scalable, with Python-based configuration and a modular design that make it simpler to experiment and scale up. The main benefit is that you can create advanced AI applications faster, with less effort and at lower cost, while getting high performance and easy deployment options[1][2][3].
https://github.com/NVIDIA/NeMo