#python#agentic_ai#agents#ai#ai_agents#realtime#stt#tts#video_agents#video_ai#vision_ai#voice_ai
Vision Agents is an open-source Python framework by Stream to build real-time AI agents that watch video, listen to audio, and respond instantly with low latency under 30ms. It integrates YOLO, Roboflow, OpenAI, Gemini, and 25+ tools for apps like golf coaching, security cameras detecting theft, or phone assistants. Install easily with `uv add vision-agents`, use free Stream credits, and deploy on any video network. You benefit by quickly creating smart video AI for gaming, safety, or coaching without vendor lock-in, saving time and costs on custom builds.
https://github.com/GetStream/Vision-Agents
Speech Note
#Linux desktop and #Sailfish OS app for note taking, reading and translating with offline #Speech to Text #stt, Text to Speech #tts and Machine #Translation
https://github.com/mkiol/dsnote
MPL-2.0 license
https://github.com/mkiol/dsnote#how-to-install
Speech Note let you take, read and translate notes in multiple languages. It uses Speech to Text, Text to Speech and Machine Translation to do so. Text and voice processing take place entirely offline, locally on your computer, without using a network connection. Your privacy is always respected. No data is sent to the Internet.
Speech Note uses many different processing engines to do its job. Currently these are used:
Speech to Text (STT)
Coqui STT (a fork of Mozilla DeepSpeech)
Vosk
whisper.cpp
Faster Whisper
april-asr
Text to Speech (TTS)
espeak-ng
MBROLA
Piper
RHVoice
Coqui TTS
Mimic 3
WhisperSpeech
Kokoro
Parler-TTS
F5-TTS
S.A.M.
Machine Translation (MT)
Bergamot Translator
anx-reader
Anx Reader is an advanced e-reader designed for book lovers, providing intelligent and focused reading.
It supports various #ebook formats, including EPUB, MOBI, AZW3, FB2, TXT, and PDF, and offers powerful AI features such as shelf organization by progress and tone, mind map generation for deep understanding, a built-in dictionary and translator, perspective analysis, and summary generation. #TTS
The program offers cross-platform syncing across Android, iOS, macOS, and Windows devices, allowing you to sync books, notes, and reading progress via WebDAV.
Additional features include customizable reading settings (font size and style, line spacing, themes), a workspace for notes with export options, and reading stats tracking with habit visualization.
Lang: Dart
https://github.com/Anxcye/anx-reader
Via @open_source_friend
Maid - Mobile Artificial Intelligence Distribution
Maid is a cross-platform free and an open-source application for interfacing with llama.cpp models locally, and remotely with Ollama, Mistral, Google Gemini and OpenAI models remotely.
-Choose from A wide range of models that runs LOCALLY and access remote models via api key!
-Text based output
-Image Generation (Selected Models only)
-No video or short clips generation yet
-Voice generation on selected models (Not tested)
-Setting model parameters
-Setting system prompt (Making the model behave/generate output in a certain way).
-And more.
Get it on
Github - https://github.com/Mobile-Artificial-Intelligence/maid/releases/latest
Fdroid - https://f-droid.org/packages/com.danemadsen.maid/
Spystore - https://play.google.com/store/apps/details?id=com.danemadsen.maid
*Don't clear CACHE OF THE APP AND EXCLUDE IT FROM SYSTEM'S AUTO CACHE CLEANING as app stores everything in device cache*
Follow @nogoolag and @libreware for more
#ai
Cherry Studio
Cherry Studio is a desktop client for Windows, Mac and Linux, which supports many LLM providers, including large cloud services and local models.
Among its main functions is the ability to work with more than 300 pre -designed #AI assistants, the creation of custom assistants, as well as support for various formats of documents, including text, images and office files.
The application offers tools for global search, top management and translating, which significantly improves interaction with the user thanks to the cross -platform and many settings options.
https://github.com/cherryhq/cherry-studio
LibreChat AI
Open-source platform that allows users to chat and interact with various #AI models through a unified interface. You can use OpenAI, Gemini, Anthropic and other AI models using their API. You may also use Ollama as an endpoint and use LibreChat to interact with local LLMs. It can be installed locally or deployed on a server.
LibreChat is designed to be highly customizable and supports a wide range of AI providers and services. Let me summarize its main features:
Free and Open Source: Accessible to everyone without any costs.
Customization: Offers extensive options to tailor the platform to individual preferences.
Multi-AI Support: Integrates with numerous AI models and services.
Unified Interface: Provides a consistent experience for interacting with different AI models.
https://www.librechat.ai
https://itsfoss.com/librechat-linux/
Jan.ai
https://jan.ai
A platform that enables you to run self-hosted local #AI. Jan provides an OpenAI-equivalent API server at localhost:1337 that can be used as a drop-in replacement with compatible apps.
With Jan, you can:
-Run open-source LLMs locally or connect to cloud AIs like ChatGPT or Google.
-Search the web and databases.
Integrate AI with everyday tools to work on your behalf (with permission).
-Customize and add features with Extensions.
Jan is opinionated software about what AI should be.
Version 3.10 of the legendary programming language is now here: https://www.python.org/downloads/release/python-3100
No rush to update, though. #Python
#Python is the main language of data science, per this analysis on 10M Jupyter Notebooks: https://blog.jetbrains.com/datalore/2020/12/17/we-downloaded-10-000-000-jupyter-notebooks-from-github-this-is-what-we-learned/
Dicio assistant
Dicio is a free and open source#voice#assistant running on #Android. It supports many different skills and input/output methods, and it provides both speech and graphical feedback to a question. It interprets user input and (when possible) generates user output entirely on-device, providing privacy by design. It has multilanguage support, and is currently available in these languages: Czech, English, French, German, Greek, Italian, Polish, Russian, Slovenian, Spanish, Swedish and Ukrainian. Open to contributions :-D
https://github.com/Stypox/dicio-android
Download
https://f-droid.org/packages/org.stypox.dicio
https://github.com/Stypox/dicio-android/releases
https://play.google.com/store/apps/details?id=org.stypox.dicio
Skills
Currently Dicio answers questions about:
search: looks up information on DuckDuckGo (and in the future more engines) - Search for Dicio
weather: collects weather information from OpenWeatherMap - What's the weather like?
lyrics: shows Genius lyrics for songs - What's the song that goes we will we will rock you?
open: opens an app on your device - Open NewPipe
calculator: evaluates basic calculations - What is four thousand and two times three minus a million divided by three hundred?
telephone: view and call contacts - Call Tom
timer: set, query and cancel timers - Set a timer for five minutes
current time: query current time - What time is it?
navigation: opens the navigation app at the requested position - Take me to New York, fifteenth avenue
media: play, pause, previous, next song
Speech to text
Dicio uses Vosk as its speech to text (#STT) engine. In order to be able to run on every phone small models are employed, weighing ~50MB. The download from here starts automatically whenever needed, so the app language can be changed seamlessly.