
TGINSIGHT SIMILAR POSTS

Find similar content

Source channel @githubtrending · Post #14985 · Jul 22

#c_lang #cuda #cuda_driver_api #cuda_kernels #cuda_opengl You can use the CUDA Samples from NVIDIA to learn and test CUDA Toolkit 12.9 features by downloading them from GitHub or as a ZIP file. These samples show how to use CUDA for GPU programming, including utilities, concepts, libraries, and performance optimization. You build them with CMake on Linux, Windows, or Tegra devices, and can run tests automatically with a provided Python script. This helps you understand CUDA programming, debug GPU code, and optimize your applications for better performance on NVIDIA GPUs. It’s a practical way to develop and improve GPU-accelerated software efficiently. https://github.com/NVIDIA/cuda-samples

Results

40 similar posts found

General global search

GitHub Trends

@githubtrending · Post #15007 · 07/29/2025, 12:00 PM

#c_lang You can find detailed guides for Linux kernel developers and users in the Documentation/ folder, which includes files in formats like HTML and PDF. To build these documents yourself, use commands like `make htmldocs` or `make pdfdocs`. The documentation covers important topics such as kernel building, running requirements, and upgrade issues. You can also view the latest formatted docs online. Additionally, the kernel source uses a special comment style called kernel-doc to embed documentation directly in the code, making it easier to understand functions and structures. This helps you learn, build, and maintain the Linux kernel more effectively. https://github.com/raspberrypi/linux

GitHub Trends

@githubtrending · Post #14899 · 07/02/2025, 02:00 PM

#c_lang FreeRTOS is a powerful tool for building embedded systems. It helps developers create complex systems that can do many tasks at once while using limited resources. This makes it great for small devices like those in IoT. FreeRTOS is also open-source, which means developers can modify it and share improvements. It supports many libraries and tools, such as networking and file systems, making it easy to connect devices to the internet and manage data. This helps developers quickly build and maintain their projects, saving time and effort. https://github.com/FreeRTOS/FreeRTOS

GitHub Trends

@githubtrending · Post #14870 · 06/27/2025, 11:30 AM

#c_lang Microui is a very small and simple user interface library written in plain C, with about 1100 lines of code. It works within a fixed memory size without extra allocation and includes basic controls like windows, buttons, sliders, textboxes, and labels. It can be used with any system that can draw rectangles and text, and you can easily add your own custom controls. Microui processes user input and generates drawing commands but does not draw itself, so you handle rendering separately. This makes it lightweight, portable, and easy to integrate into various projects, especially where minimal memory use and simplicity are important. https://github.com/rxi/microui
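The split described in this post, where the UI layer only records drawing commands and the host application renders them, can be sketched in a few lines. This is a minimal Python illustration of the general pattern, not microui's actual C API: `Context`, `RectCmd`, `TextCmd`, `button`, and `render` are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RectCmd:
    """Hypothetical 'draw a filled rectangle' command."""
    x: int
    y: int
    w: int
    h: int
    color: str


@dataclass
class TextCmd:
    """Hypothetical 'draw a text string' command."""
    x: int
    y: int
    text: str


@dataclass
class Context:
    """Holds this frame's input state and the recorded command list."""
    mouse_x: int = 0
    mouse_y: int = 0
    mouse_pressed: bool = False
    commands: list = field(default_factory=list)

    def begin(self) -> None:
        # Start a new frame: discard the previous frame's commands.
        self.commands.clear()

    def button(self, label: str, x: int, y: int, w: int = 80, h: int = 20) -> bool:
        # Hit-test against the input state, then record (not draw) the visuals.
        hovered = x <= self.mouse_x < x + w and y <= self.mouse_y < y + h
        self.commands.append(RectCmd(x, y, w, h, "hover" if hovered else "base"))
        self.commands.append(TextCmd(x + 4, y + 4, label))
        return hovered and self.mouse_pressed


def render(commands: list) -> list:
    """Stand-in renderer: the host decides how each command becomes pixels."""
    out = []
    for cmd in commands:
        if isinstance(cmd, RectCmd):
            out.append(f"rect {cmd.x},{cmd.y} {cmd.w}x{cmd.h} {cmd.color}")
        elif isinstance(cmd, TextCmd):
            out.append(f"text {cmd.x},{cmd.y} {cmd.text!r}")
    return out


if __name__ == "__main__":
    ctx = Context(mouse_x=10, mouse_y=10, mouse_pressed=True)
    ctx.begin()
    if ctx.button("OK", 0, 0):
        print("button clicked")
    for line in render(ctx.commands):
        print(line)
```

Because the UI code never touches a graphics API, swapping `render` for a different backend only requires the ability to draw rectangles and text, which is the portability property the post highlights.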

GitHub Trends

@githubtrending · Post #14856 · 06/23/2025, 12:30 PM

#c_lang SpaghettiKart is an unofficial PC port of Mario Kart 64 that runs smoothly on Windows, Linux, and even Nintendo Switch, offering better performance and online multiplayer, which the original N64 version lacked. You need a legal US ROM in .z64 format to use it, as the game itself doesn’t include any copyrighted assets. It supports custom mods and different graphics backends for better visuals and stability. This means you can enjoy Mario Kart 64 with improved graphics, online play, and modding options on modern devices, making the classic game more accessible and fun today. https://github.com/HarbourMasters/SpaghettiKart

GitHub Trends

@githubtrending · Post #14843 · 06/19/2025, 01:00 PM

#c_lang ESP-IDF is Espressif's official software framework for developing applications on ESP32 and related chips, supporting Windows, Linux, and macOS. It offers a complete set of tools, libraries, and drivers for Wi-Fi, Bluetooth, and IoT features, enabling you to build connected devices efficiently using C or C++. ESP-IDF supports multiple chip versions with stable releases and ongoing updates, ensuring reliability and production readiness. It includes easy commands for building, flashing, and monitoring your projects, plus example templates to start quickly. Using ESP-IDF helps you create robust, feature-rich IoT applications with strong community and official support. This saves time and effort in development and deployment. https://github.com/espressif/esp-idf

GitHub Trends

@githubtrending · Post #14785 · 06/04/2025, 11:30 AM

#c_lang jemalloc is a memory allocator that helps computers manage memory more efficiently. It was first used in FreeBSD in 2005 and is now used in many applications because it reduces memory fragmentation and supports many threads running at the same time. This means it can handle lots of small memory requests without slowing down, making it very useful for demanding applications. The benefit to users is faster and more reliable performance, especially in programs that need to handle a lot of data or run many tasks simultaneously. https://github.com/jemalloc/jemalloc

GitHub Trends

@githubtrending · Post #14734 · 05/21/2025, 01:30 PM

#c_lang Windows Subsystem for Linux 2 (WSL2) lets you run Linux on Windows using a lightweight virtual machine. This means you can use Linux tools and apps directly from Windows, which is great for developers. WSL2 is faster and more efficient than its predecessor, WSL1, because it uses a complete Linux kernel. This setup allows for better performance and compatibility with Linux applications. Users can also customize their WSL2 kernel by building it from source, which can be useful for adding specific features or fixing issues. https://github.com/microsoft/WSL2-Linux-Kernel

GitHub Trends

@githubtrending · Post #14730 · 05/21/2025, 11:30 AM

#c_lang Kilo is a small text editor that uses less than 1,000 lines of code. It is simple to use and doesn't need any extra libraries. You can save files with **CTRL-S**, quit with **CTRL-Q**, and search for words with **CTRL-F**. Kilo is a good starting point for making more advanced text editors or command-line interfaces. It's free to use and modify under the BSD 2 clause license. This makes it easy for users to learn from and build upon, helping them create their own tools. https://github.com/antirez/kilo

GitHub Trends

@githubtrending · Post #14719 · 05/19/2025, 12:00 AM

#c_lang Using the Flipper Zero can be very helpful for people interested in cybersecurity and technology. It's a tool that helps with physical penetration testing and software-defined radio. You can find useful resources like infrared codes, tutorials, and guides on GitHub and other platforms. There are also communities like Discord and forums where you can get help and learn more about the device. This helps users learn and improve their skills in a fun and interactive way. https://github.com/UberGuidoZ/Flipper

GitHub Trends

@githubtrending · Post #14670 · 05/04/2025, 11:30 AM

#c_lang Klipper is firmware for 3D printers that pairs a general-purpose computer with the printer's microcontrollers. The computer handles motion planning, so the printer's movements can be controlled very accurately, which makes prints faster and more precise. This means you get better quality prints with less vibration and fewer mistakes. Klipper also helps reduce issues like nozzle oozing, which can ruin prints. It's free and easy to set up, making it a great choice for anyone looking to improve their 3D printing experience. https://github.com/Klipper3d/klipper

Machinelearning

@ai_machinelearning_big_data · Post #9190 · 12/05/2025, 01:40 PM

🌟 CUDA-L2: AI has learned to write CUDA kernels more efficiently than NVIDIA engineers. The DeepReinforce research group has developed CUDA-L2, a system that writes GPU code for matrix multiplication fully automatically. This code runs 10–30% faster than cuBLAS and cuBLASLt, which, mind you, are already optimized libraries from NVIDIA itself. Such libraries are usually built by hand by people who use ready-made kernel templates, and autotuners only tweak parameters such as tile size. DeepReinforce, however, argues that even critical, heavily optimized tasks such as HGEMM can be improved by an LLM working together with RL. In CUDA-L2, the language model literally writes the CUDA source code from scratch for each matrix size. It does not just change parameters: it can change the code structure, loops, tiling strategy, padding, and even swizzle patterns. It also chooses the programming style itself, whether raw CUDA, CuTe, CUTLASS, or inline PTX. The process works like this: the RL loop runs the generated kernels on real hardware, measures their speed and correctness, and then updates the LLM. Over time the model derives its own performance rules instead of relying on knowledge supplied by humans. The generator was the DeepSeek 671B model, additionally fine-tuned on a mix of a CUDA kernel corpus and high-quality code from PyTorch, ATen, CUTLASS, and NVIDIA's examples. 🟡 What this means in practice: In LLM pretraining and fine-tuning, most GPU time is spent on HGEMM matrix multiplication. If these kernels run the 10–30% faster that CUDA-L2 promises, the whole training process becomes noticeably cheaper and faster. Because CUDA-L2 handles about 1,000 real matrix sizes rather than a couple of hand-tuned ones, the speedup applies to a wide range of architectures. That means the same GPU budget can fit more training tokens, more SFT or RLHF runs, and so on.
🟡 Tests: HGEMM kernels produced by CUDA-L2 are consistently faster than the standard libraries. In the so-called "offline scenario", CUDA-L2 runs about 17–22% faster than torch.matmul, cuBLAS, and cuBLASLt. It even beats cuBLASLt AutoTuning, which itself already uses kernel search, by 11%. In the "server" scenario, which imitates real inference with pauses between calls, the gap is larger still: a 24–29% boost over torch.matmul and cuBLAS. The project is not limited to research: in the GitHub repository the authors have published optimized A100 HGEMM kernels for 1,000 configurations. Plans include extending to the Ada Lovelace, Hopper, and Blackwell architectures, support for denser configurations, and 32-bit HGEMM. 🟡 Arxiv 🖥 GitHub @ai_machinelearning_big_data #AI #ML #CUDA #DeepReinforce
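How much a 10–30% kernel speedup shortens a whole training run depends on what share of the run is spent in those kernels. The sketch below applies Amdahl's law to make that arithmetic concrete. Two things are assumptions, not figures from the post: the 60% HGEMM share is a hypothetical placeholder (the post only says "most" GPU time), and "X% faster" is read as kernel throughput multiplied by 1 + X.

```python
def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-run speedup when only part of the run gets faster.

    kernel_fraction: share of total run time spent in the accelerated kernels.
    kernel_speedup:  throughput ratio of new kernels to old (1.2 = 20% faster).
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    untouched = 1.0 - kernel_fraction               # time that does not change
    accelerated = kernel_fraction / kernel_speedup  # time after the speedup
    return 1.0 / (untouched + accelerated)


if __name__ == "__main__":
    # Hypothetical share of GPU time spent in HGEMM; not a number from the post.
    ASSUMED_HGEMM_FRACTION = 0.6
    for gain in (0.10, 0.30):  # the 10-30% range quoted in the post
        speedup = overall_speedup(ASSUMED_HGEMM_FRACTION, 1.0 + gain)
        print(f"kernels {gain:.0%} faster -> whole run {speedup - 1.0:.1%} faster")
```

The whole-run gain is always smaller than the kernel gain unless the kernels account for all of the run time, so the budget savings mentioned above scale with how matmul-bound a given workload really is.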

每日 AWESOME 观察

@awesomeopensource · Post #46 · 03/02/2018, 05:57 PM

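waifu2x: a deep-learning-based super-resolution upscaler for anime-style illustrations. Have you ever found a nice illustration you wanted to use as wallpaper, only to give up because the resolution was too low? Now waifu2x can save you: it upscales anime-style images by multiples and denoises them, so your waifu has never looked this sharp. P.S.: waifu2x depends on CUDA (NVIDIA GPUs only). If, like me, you don't have an NVIDIA card, I recommend cl-waifu2x as the converter; it supports all platforms, runs on either CPU or GPU, and performs reasonably well. Environment: #cuda Language: #lua Category: #深度学习 #工具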