#python#audio#deeplearning#minicpm#pytorch#speech#speech_synthesis#text_to_speech#tts#tts_model#voice_cloning
VoxCPM is a free, open-source, tokenizer-free TTS model: it generates speech from text without first converting audio into discrete tokens, producing expressive, context-aware audio and cloning a voice from a 3-10 second reference sample. Download VoxCPM1.5 (800M parameters) from Hugging Face, install it via pip, and use the Python API or the CLI for synthesis (RTF 0.15 on an RTX 4090) or for fine-tuning on your own voices. It suits audiobooks, podcasts, voice clones and apps that need natural-sounding speech, and it cuts the time and cost of voice work.
https://github.com/OpenBMB/VoxCPM
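The RTF (real-time factor) figure is the ratio of synthesis time to the length of the audio produced, so anything below 1 is faster than real time. A minimal sketch of that arithmetic follows. The helper names are made up for illustration and the durations are examples, not measurements:

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def estimated_synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Rough compute time needed to generate audio of the given length."""
    return audio_seconds * rtf


# Illustrative only: at an RTF of 0.15, one hour of audio would take
# roughly nine minutes of compute on comparable hardware.
one_hour = 60 * 60
print(round(estimated_synthesis_seconds(one_hour, 0.15) / 60, 1), "minutes")
```

Real throughput depends on hardware, batch size and text length, so treat the quoted RTF as a ballpark.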
Speech Note
#Linux desktop and #Sailfish OS app for note taking, reading and translating with offline #Speech to Text #stt, Text to Speech #tts and Machine #Translation
https://github.com/mkiol/dsnote
MPL-2.0 license
https://github.com/mkiol/dsnote#how-to-install
Speech Note lets you take, read and translate notes in multiple languages. It uses Speech to Text, Text to Speech and Machine Translation to do so. Text and voice processing take place entirely offline, locally on your computer, without using a network connection. Your privacy is always respected. No data is sent to the Internet.
Speech Note uses many different processing engines to do its job. Currently these are used:
Speech to Text (STT)
Coqui STT (a fork of Mozilla DeepSpeech)
Vosk
whisper.cpp
Faster Whisper
april-asr
Text to Speech (TTS)
espeak-ng
MBROLA
Piper
RHVoice
Coqui TTS
Mimic 3
WhisperSpeech
Kokoro
Parler-TTS
F5-TTS
S.A.M.
Machine Translation (MT)
Bergamot Translator
https://writeout.ai
#Transcribe and #translate any #audio file. 100% free to use.
This website, whose source code is available so it can also be hosted locally, lets you upload any audio file and receive a transcription and/or a text translation. It uses OpenAI's Whisper API on the back end.
Source on GitHub:
https://github.com/beyondcode/writeout.ai
#writeout#ai#speech#recognition
Vosk Speech Recognition Toolkit
Vosk is an offline open source #speech#recognition toolkit. It enables speech recognition for 20+ languages and dialects: English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, Ukrainian, Kazakh, Swedish, Japanese, Esperanto, Hindi, Czech. More to come.
Vosk models are small (50 MB) but provide continuous large-vocabulary transcription, zero-latency response with the streaming API, reconfigurable vocabulary and speaker identification.
Speech recognition bindings are implemented for various programming languages such as Python, Java, Node.js, C#, C++ and others.
Vosk supplies speech recognition for chatbots, smart home appliances and virtual assistants. It can also create subtitles for movies and transcriptions of lectures and interviews.
Vosk scales from small devices like a Raspberry Pi or an Android smartphone to big clusters.
https://t.me/speech_recognition
https://alphacephei.com/vosk
https://github.com/alphacep/vosk-api
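A rough sketch of what the streaming API looks like from Python, using the `Model` and `KaldiRecognizer` classes from the `vosk` package. It assumes a 16-bit mono PCM WAV file and an already downloaded, unpacked model. The function names and the `model_dir` argument are placeholders, not part of Vosk:

```python
import json
import wave


def iter_pcm_chunks(wav_file, frames_per_chunk=4000):
    """Yield raw PCM chunks from an open wave.Wave_read object."""
    while True:
        data = wav_file.readframes(frames_per_chunk)
        if not data:
            break
        yield data


def transcribe_wav(wav_path, model_dir):
    """Stream a 16-bit mono WAV file through Vosk and return the text."""
    # Imported lazily so the chunking helper works without Vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(model_dir)  # model_dir: path to an unpacked Vosk model
    pieces = []
    with wave.open(wav_path, "rb") as wav_file:
        recognizer = KaldiRecognizer(model, wav_file.getframerate())
        for chunk in iter_pcm_chunks(wav_file):
            # AcceptWaveform returns True once an utterance is complete.
            if recognizer.AcceptWaveform(chunk):
                pieces.append(json.loads(recognizer.Result()).get("text", ""))
        # Flush whatever is still buffered at the end of the file.
        pieces.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)
```

Feeding audio in small chunks is what makes live use possible: the same loop works with microphone buffers in place of file reads.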
https://github.com/tyiannak/pyAudioAnalysis
#pyAudioAnalysis is a Python library covering a wide range of audio analysis tasks. Through pyAudioAnalysis you can:
Extract #audio features and representations (e.g. MFCCs, spectrogram, chromagram)
Classify unknown #sounds
Train, tune and evaluate classifiers of audio segments
Detect audio events and exclude silence periods from long recordings
Perform supervised segmentation (joint segmentation - classification)
Perform unsupervised segmentation (e.g. speaker diarization)
Extract audio thumbnails
Train and use audio regression models (example application: emotion recognition)
Apply dimensionality reduction to visualize audio data and content similarities
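To give a feel for the "exclude silence periods" item, here is a deliberately simple energy-threshold sketch in plain Python. It is not pyAudioAnalysis's own method and does not use its API. The function names, frame size and threshold are illustrative assumptions:

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def active_segments(samples, sample_rate, frame_ms=50, threshold=0.01):
    """Return (start_s, end_s) spans whose frame energy exceeds threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    energies = frame_energies(samples, frame_len)
    segments = []
    seg_start = None
    for i, energy in enumerate(energies):
        t = i * frame_len / sample_rate
        if energy > threshold and seg_start is None:
            seg_start = t  # a loud span begins
        elif energy <= threshold and seg_start is not None:
            segments.append((seg_start, t))  # the loud span ends
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, len(energies) * frame_len / sample_rate))
    return segments


# Synthetic signal: 0.5 s silence, 0.5 s of a 440 Hz tone, 0.5 s silence.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 2)]
signal = [0.0] * (rate // 2) + tone + [0.0] * (rate // 2)
print(active_segments(signal, rate))  # → [(0.5, 1.0)]
```

A fixed threshold like this breaks down on noisy recordings, which is one reason to use a library with trained models.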
anx-reader
Anx Reader is an advanced e-reader designed for book lovers, providing intelligent and focused reading.
It supports various #ebook formats, including EPUB, MOBI, AZW3, FB2, TXT, and PDF, and offers powerful AI features such as shelf organization by progress and tone, mind map generation for deep understanding, a built-in dictionary and translator, perspective analysis, and summary generation. #TTS
The program offers cross-platform syncing across Android, iOS, macOS, and Windows devices, allowing you to sync books, notes, and reading progress via WebDAV.
Additional features include customizable reading settings (font size and style, line spacing, themes), a workspace for notes with export options, and reading stats tracking with habit visualization.
Lang: Dart
https://github.com/Anxcye/anx-reader
Via @open_source_friend
Version 3.10 of the legendary programming language is now here: https://www.python.org/downloads/release/python-3100
No rush to update, though. #Python
#Python is the main language of data science, per this analysis on 10M Jupyter Notebooks: https://blog.jetbrains.com/datalore/2020/12/17/we-downloaded-10-000-000-jupyter-notebooks-from-github-this-is-what-we-learned/
https://github.com/pytorch/pytorch
#PyTorch not only ports #Torch to Python but also adds many other conveniences, such as #GPU acceleration and a library that allows multiprocessing to be done with shared memory (for partitioning jobs across multiple cores). Best of all, it can provide GPU-powered replacements for some of the unaccelerated functions in #NumPy.
#machine_learning
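A small sketch of the "GPU-powered replacement for NumPy" idea: the same broadcasting expression you would write with NumPy arrays, run on PyTorch tensors. The function name is made up for illustration. The code assumes PyTorch is installed and falls back to the CPU when no CUDA device is visible:

```python
def pairwise_sq_distances(points):
    """Squared Euclidean distances between all rows, computed with PyTorch."""
    import torch  # imported lazily; requires PyTorch to be installed

    # Use the GPU when one is visible, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.as_tensor(points, dtype=torch.float32, device=device)
    # Same broadcasting rules as NumPy: (n, 1, d) - (1, n, d) -> (n, n, d).
    diff = x[:, None, :] - x[None, :, :]
    # Move the result back to the CPU so callers can use it anywhere.
    return (diff ** 2).sum(dim=-1).cpu()
```

Calling `pairwise_sq_distances([[0.0, 0.0], [3.0, 4.0]]).tolist()` gives a 2x2 matrix with 25.0 off the diagonal. For small inputs like this, copying data to the GPU usually costs more than the computation saves.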
A list of software #alternatives and resources for professional #audio, #video and live events production on #Linux
https://gitlab.com/nodiscc/awesome-linuxaudio
https://github.com/nodiscc/awesome-linuxaudio/releases/tag/1.0.0