
TGINSIGHT SIMILAR POSTS


Source: @procode404 · Post #3885 · Dec 18

Why a path into AI should start with mathematics and algorithms

The head of Yandex's School of Data Analysis explains on the Machine Learning Podcast why the fundamentals (calculus, linear algebra, probability theory, algorithms) are not boring theory but the foundation for working with AI in 2026. You will learn how a deep understanding of mathematics helps you write efficient code, debug models, and find your way around different areas of ML. The episode also covers why even experienced developers benefit from returning to the fundamental disciplines.

Listen to the episode

#podcast #ML

Results

Found 2,033 similar posts


Am Neumarkt 😱

@amneumarkt · Post #279 · 26.10.2021, 19:30

#ML (I am experimenting with a new platform. This post is also available at: https://community.kausalflow.com/c/ml-journal-club/how-do-neural-network-generalize )

Some things in deep neural networks are quite hard to understand. One of them is how the network generalizes.

[Zhang2016] shows experiments demonstrating the amazing ability of neural networks to fit even completely random datasets. But such networks cannot generalize, because the data is random. How should we understand generalization? The authors mention theories such as VC dimension, Rademacher complexity, and uniform stability, but none of them is good enough.

Recently, I found the work of Simon et al. [Simon2021]. The authors also wrote a blog post about the paper [Simon2021Blog]. The idea is to simplify the problem of generalization by looking at how a neural network approximates a function f. This amounts to approximating vectors in a Hilbert space, so we are looking at the similarity between the vector f and its neural network approximation f'. The similarity of these two vectors is related to the eigenvalues of the so-called “neural tangent kernel” (NTK). Using the NTK, they derived an amazingly simple quantity, learnability, which measures how well the Hilbert space vectors align with each other, that is, how good the neural network approximation is.

[Zhang2016]: Zhang C, Bengio S, Hardt M, Recht B, Vinyals O. Understanding deep learning requires rethinking generalization. arXiv [cs.LG]. 2016. Available: http://arxiv.org/abs/1611.03530
[Simon2021Blog]: Simon J. A First-Principles Theory of Neural Network Generalization. In: The Berkeley Artificial Intelligence Research Blog [Internet]. [cited 26 Oct 2021]. Available: https://bair.berkeley.edu/blog/2021/10/25/eigenlearning/
[Simon2021]: Simon JB, Dickens M, DeWeese MR. Neural Tangent Kernel Eigenvalues Accurately Predict Generalization. arXiv [cs.LG]. 2021. Available: http://arxiv.org/abs/2110.03922
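To make the "alignment of f and f'" idea concrete, here is a minimal sketch using plain kernel ridge regression. It rests on several assumptions that are not from the paper: an RBF kernel stands in for the NTK, the ridge value and length scale are arbitrary, and the `alignment` ratio ⟨f, f'⟩ / ⟨f, f⟩ is only an illustrative proxy for the paper's learnability, not its exact definition.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    # Stand-in kernel (assumption): a real experiment would use the NTK.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def kernel_regression(x_train, y_train, x_test, ridge=0.1):
    # Kernel ridge regression: f'(x) = k(x, X) (K + ridge * I)^-1 y
    k_tt = rbf_kernel(x_train, x_train)
    alpha = np.linalg.solve(k_tt + ridge * np.eye(len(x_train)), y_train)
    return rbf_kernel(x_test, x_train) @ alpha

def alignment(f_true, f_pred):
    # Illustrative proxy: projection of f' onto f, normalized by <f, f>.
    return float(f_true @ f_pred / (f_true @ f_true))

x_grid = np.linspace(-np.pi, np.pi, 400)   # where f and f' are compared
x_train = np.linspace(-np.pi, np.pi, 20)   # 20 evenly spaced training points

# A slowly varying target sits in a large-eigenvalue direction of the kernel.
low_freq = alignment(
    np.sin(x_grid), kernel_regression(x_train, np.sin(x_train), x_grid)
)
# A rapidly varying target sits in a small-eigenvalue direction.
high_freq = alignment(
    np.sin(8 * x_grid), kernel_regression(x_train, np.sin(8 * x_train), x_grid)
)

print(f"alignment for sin(x):  {low_freq:.3f}")
print(f"alignment for sin(8x): {high_freq:.3f}")
```

With the same kernel and the same number of samples, the smooth target is recovered far better than the oscillating one, which is the qualitative link between kernel eigenvalues and generalization that the post describes.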


Am Neumarkt 😱

@amneumarkt · Post #275 · 12.10.2021, 20:08

#ML Duan T, Avati A, Ding DY, Thai KK, Basu S, Ng AY, et al. NGBoost: Natural Gradient Boosting for probabilistic prediction. arXiv [cs.LG]. 2019. Available: http://arxiv.org/abs/1910.03225

(I had it on my reading list for a long time. However, I didn't read it until today because the title and abstract are not attractive at all.)

But this is a good paper. It digs deep into the fundamental reasons why some methods work and others don't.

When inferring probability distributions, it is straightforward to come up with methods based on parametrized distributions (statistical manifolds). Then, by tuning the parameters, we adjust the distribution to fit our dataset as well as possible. The problem is the choice of the objective function and the optimization method. The paper presents a generic objective function and a framework that optimizes the model along the natural gradient instead of just the gradient w.r.t. the parameters.

Different parametrizations of the objective are like coordinate transformations, and the chain rule only works if the transformations are in a "flat" space, but such a "flat" space is not necessarily a good choice for a high-dimensional problem. For a space that is approximately flat in a small region, we can define distance the way we do in differential geometry[^1]. Meanwhile, just like "covariant derivatives" in differential geometry, some kind of covariant derivative can be found on statistical manifolds, and they are called "natural derivatives". Descending in the direction of natural derivatives navigates the landscape more efficiently.

[^1]: This is a Riemannian space.
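As a small illustration of the natural gradient idea, the sketch below fits a single Gaussian to data by descending the negative log-likelihood in the (mu, log sigma) parametrization. It is not NGBoost itself, which fits base learners to natural gradients inside a boosting loop. The function names, learning rate, and step count are assumptions chosen for the example. The natural gradient is the ordinary gradient premultiplied by the inverse Fisher information, which for this parametrization is diag(1/sigma^2, 2).

```python
import numpy as np

def nll_grad(y, mu, log_sigma):
    # Gradient of the mean Gaussian negative log-likelihood
    # with respect to (mu, log_sigma).
    sigma2 = np.exp(2.0 * log_sigma)
    r = y - mu
    return np.array([np.mean(-r / sigma2), np.mean(1.0 - r ** 2 / sigma2)])

def fisher(log_sigma):
    # Fisher information of N(mu, sigma^2) in (mu, log_sigma) coordinates.
    return np.diag([np.exp(-2.0 * log_sigma), 2.0])

def fit_gaussian(y, natural=True, lr=0.1, steps=300):
    theta = np.array([0.0, 0.0])  # start at mu = 0, sigma = 1
    for _ in range(steps):
        g = nll_grad(y, theta[0], theta[1])
        if natural:
            # Natural gradient: solve F * g_nat = g instead of using g directly.
            g = np.linalg.solve(fisher(theta[1]), g)
        theta = theta - lr * g
    return float(theta[0]), float(np.exp(theta[1]))

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=2.0, size=1000)

mu_hat, sigma_hat = fit_gaussian(y, natural=True)
print(mu_hat, sigma_hat)   # compare with y.mean() and y.std()
```

In these coordinates the natural gradient step for mu is simply proportional to (mean(y) - mu), independent of the current sigma, whereas the plain gradient step is scaled by 1/sigma^2. Calling `fit_gaussian(y, natural=False)` shows how the plain gradient's progress depends on the current scale estimate.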


Am Neumarkt 😱

@amneumarkt · Post #268 · 25.09.2021, 18:24

#ML scikit-learn reached 1.0. There is nothing exciting about the new features, but the major release probably means something. Release Highlights for scikit-learn 1.0 (scikit-learn 1.0 documentation): http://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_1_0_0.html


Am Neumarkt 😱

@amneumarkt · Post #266 · 19.09.2021, 18:51

#ML Phys. Rev. X 11, 031059 (2021) - Statistical Mechanics of Deep Linear Neural Networks: The Backpropagating Kernel Renormalization https://journals.aps.org/prx/abstract/10.1103/PhysRevX.11.031059


Am Neumarkt 😱

@amneumarkt · Post #262 · 17.09.2021, 06:09

#ML The authors investigate the geometry formed by the responses of neurons to certain stimuli (tuning curves). Using the stimulus as the hidden variable, we can construct a geometry of neuron responses. The authors clarify the relations between this geometry and other measures such as mutual information. The story in this paper may not be interesting to machine learning practitioners, but the method of using the geometry of neuron responses to probe the brain is intriguing. We may borrow this method to help us understand the internal mechanisms of neural networks.

Kriegeskorte, Nikolaus, and Xue-Xin Wei. 2021. “Neural Tuning and Representational Geometry.” Nature Reviews. Neuroscience, September. https://doi.org/10.1038/s41583-021-00502-3.


Am Neumarkt 😱

@amneumarkt · Post #252 · 24.08.2021, 09:48

#ML https://www.microsoft.com/en-us/research/blog/make-every-feature-binary-a-135b-parameter-sparse-neural-network-for-massively-improved-search-relevance/

Though it is not the core of the model, I noticed that this model (MEB) uses users' search behavior on Bing to build the language model. If a search result on Bing is clicked by the user, it is considered a positive sample for the query; otherwise, it is a negative sample. In self-supervised learning, negative sampling has been shown to be extremely important. This Bing search dataset naturally labels the positive and negative samples. Kuhl idea.
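The click-based labeling described above can be sketched in a few lines. This is only an illustration of the rule "clicked means positive, shown but not clicked means negative". The `Impression` record, the `build_training_pairs` helper, and the sample log are hypothetical and say nothing about how MEB actually stores or processes its data.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Impression:
    # Hypothetical log record: one search result shown for one query.
    query: str
    url: str
    clicked: bool

def build_training_pairs(impressions):
    # Group shown results per query into positives (clicked)
    # and negatives (shown but not clicked).
    pairs = defaultdict(lambda: {"positive": [], "negative": []})
    for imp in impressions:
        label = "positive" if imp.clicked else "negative"
        pairs[imp.query][label].append(imp.url)
    return dict(pairs)

# Made-up example log, for illustration only.
log = [
    Impression("julia language", "https://julialang.org", True),
    Impression("julia language", "https://example.com/julia-roberts", False),
    Impression("decision forests", "https://example.com/forests", False),
]

pairs = build_training_pairs(log)
print(pairs["julia language"])
```

The appeal of this setup is that the negatives come for free from real user behavior, so no separate negative sampling scheme has to be designed.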


Am Neumarkt 😱

@amneumarkt · Post #246 · 22.07.2021, 11:00

#ML Julia Computing got a lot of investment recently. I need to dive deeper into the Julia Language. https://juliacomputing.com/blog/2021/07/series-a/


Am Neumarkt 😱

@amneumarkt · Post #243 · 10.07.2021, 10:33

#ML Implicit Regularization in Tensor Factorization: Can Tensor Rank Shed Light on Generalization in Deep Learning? – Off the convex path http://www.offconvex.org/2021/07/08/imp-reg-tf/


Am Neumarkt 😱

@amneumarkt · Post #240 · 02.07.2021, 14:02

#ML Great. TensorFlow implemented built-in decision forest models. https://blog.tensorflow.org/2021/05/introducing-tensorflow-decision-forests.html?m=1

