@amneumarkt · Post #335 · 29.03.2022, 05:57

#ml Interesting... There are some discussions on the lottery ticket hypothesis. https://www.reddit.com/r/MachineLearning/comments/tqjd3w/d_neural_networks_are_not_the_only_universal/

@amneumarkt · Post #334 · 22.03.2022, 20:41

#ml A beautiful and systematic derivation showing how and why negative sampling works. Negative sampling is a great technique for approximating the softmax, especially when computing the partition function is intractable. It's used in word2vec and in many other models such as node2vec.

Goldberg Y, Levy O. word2vec Explained: deriving Mikolov et al.’s negative-sampling word-embedding method. arXiv [cs.CL]. 2014. Available: http://arxiv.org/abs/1402.3722
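To make the idea concrete, here is a minimal sketch of the negative-sampling loss for a single (center, context) pair, following the objective discussed by Goldberg and Levy: the log-sigmoid of the score of the observed pair plus the log-sigmoid of the negated scores of k sampled noise words. The function name, the vector shapes, and the random toy vectors are assumptions made for illustration; this is not the word2vec implementation itself.

```python
import numpy as np


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def negative_sampling_loss(center, context, negatives):
    """Negative-sampling loss for one (center, context) pair.

    center:    (d,) embedding of the center word
    context:   (d,) embedding of the observed context word
    negatives: (k, d) embeddings of k sampled noise words (hypothetical sampler)
    """
    # Reward a high score for the observed pair ...
    positive_term = np.log(sigmoid(context @ center))
    # ... and a low score for each sampled noise pair.
    negative_term = np.log(sigmoid(-(negatives @ center))).sum()
    return -(positive_term + negative_term)


# Toy vectors, only to show the call.
rng = np.random.default_rng(0)
d, k = 8, 5
center = rng.normal(size=d)
context = rng.normal(size=d)
negatives = rng.normal(size=(k, d))
print(negative_sampling_loss(center, context, negatives))
```

The point of the trick is visible in the shapes: the loss touches only k + 1 embeddings instead of summing over the whole vocabulary, which is what the partition function of the full softmax would require.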

@amneumarkt · Post #327 · 18.03.2022, 19:11

#ml (WARNING: Promoting my own notes. This is a test.) I learned something very interesting today: CRPS (continuous ranked probability score). Suppose we would like to approximate the quantile function of some data points. If we assume a parametric model of the quantile function, e.g., Q(x|theta), how do we find the parameters using the given dataset? Naturally, we need a loss function to compare our quantile function to the data points. CRPS is a robust choice. I have seen it used in several time series forecasting papers. You can find more details here: https://datumorphism.leima.is/cards/time-series/crps/
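As a rough illustration of what CRPS measures, the sketch below uses the sample-based form CRPS = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the forecast distribution and y is the observation. This is an assumption-laden simplification: it scores a forecast given as a set of samples rather than the parametric quantile function Q(x|theta) from the post, and the function name is made up for this example.

```python
import numpy as np


def crps_ensemble(samples, y):
    """Sample-based CRPS of a forecast (given as samples) against observation y."""
    samples = np.asarray(samples, dtype=float)
    # Average distance between forecast samples and the observation.
    term_obs = np.mean(np.abs(samples - y))
    # Half the average distance between pairs of forecast samples (spread).
    term_spread = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term_obs - term_spread


# A point forecast reduces to the absolute error.
print(crps_ensemble([2.0, 2.0, 2.0], 5.0))
# A two-sample forecast that brackets the observation.
print(crps_ensemble([0.0, 2.0], 1.0))
```

Lower is better. For a forecast that puts all its mass on one value, the score collapses to the absolute error, which is one way to see CRPS as a generalization of MAE to probabilistic forecasts.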

@amneumarkt · Post #326 · 16.03.2022, 07:57

#ml It’s a lengthy article but also a well-written one. A few comments:

- The author wrote a paper on “The Next Decade in AI”: https://arxiv.org/abs/2002.06177
- Make things work in their own domain. If we are gonna come up with a “theory of everything” for computing or intelligence, we will hit the “mesoscopic” wall, where the bottom-up theories and the top-down approaches meet but we can’t really make a connection. In the case of intelligence, the wall is determined by the complexities (maybe MDL?). You can make symbols work for high complexities but not always. A similar thing happens to neural networks.
- The neural-symbolic approach sounds good, but it’s almost like patching bike wheels onto a train.

https://nautil.us/deep-learning-is-hitting-a-wall-14467/

@amneumarkt · Post #324 · 07.03.2022, 08:35

#ml I share similar thoughts with the top comment by theXYZT. If I may add to her comment, I would say: Embrace the new approach even if it shatters our philosophy. But it's not only about what happened in the history of physics. It's about what we believe in science. In some sense, the purpose of interpretability and parsimony is to help humans come up with better ideas and to make us happy. If a universal model already works well enough and can be improved gradually, interpretability is not as important as predictability. This is more or less the first principle of science, if I may say so. https://www.reddit.com/r/MachineLearning/comments/t8fn7m/d_are_we_at_the_end_of_an_era_where_ml_could_be/

@amneumarkt · Post #318 · 09.02.2022, 07:20

#ML I made some slides to bootstrap a community in my company for sharing papers on graph-related methods (spectral methods, graph neural networks, etc.). These slides are mostly based on the first two chapters of the book by William Hamilton. I added some intuitive interpretations of some key ideas. Some of these are frequently used in graph neural networks and even in transformers. Building intuition helps us unbox these neural networks. But the slides are only skeleton notes, so I probably have to expand them at some point. I am thinking about drawing more from the book and on this topic. Maybe even making some short videos using these slides. Let's see how far I can go. I am way too busy now. (<-no excuse)

@amneumarkt · Post #317 · 03.02.2022, 06:47

#ml Lol, DeepMind and OpenAI: https://deepmind.com/blog/article/Competitive-programming-with-AlphaCode vs https://openai.com/blog/formal-math/

@amneumarkt · Post #315 · 25.01.2022, 07:39

#ml https://ruder.io/ml-highlights-2021/

@amneumarkt · Post #294 · 19.11.2021, 11:59

#ML SHAP (SHapley Additive exPlanations) is a family of methods for interpreting machine learning models. The author of SHAP built an easy-to-use package that helps us understand how features contribute to a model's predictions. The package comes with a comprehensive tutorial covering different machine learning frameworks.

- Python package: [slundberg/shap](https://shap.readthedocs.io/)
- A tutorial on how to use it: https://www.aidancooper.co.uk/a-non-technical-guide-to-interpreting-shap-analyses/

---

The package is so popular that you might be using it already. So what is SHAP exactly? It is a series of methods based on Shapley values.

> SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explain the output of any machine learning model.
>
> -- [slundberg/shap](https://github.com/slundberg/shap)

Regarding the Shapley value: there are two key ideas in calculating it.

- A method to measure the contribution of a given combination of features to the final prediction.
- A method to combine these "contributions" into a score.

SHAP provides methods to estimate Shapley values for different kinds of models. The following two pages explain the Shapley value and SHAP thoroughly.

- https://christophm.github.io/interpretable-ml-book/shap.html
- https://christophm.github.io/interpretable-ml-book/shapley.html

References:

- Lundberg SM, Lee SI. A unified approach to interpreting model predictions. of the 31st international conference on neural …. 2017. Available: http://papers.nips.cc/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf
- Lundberg SM, Nair B, Vavilala MS, Horibe M, Eisses MJ, Adams T, et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nature Biomedical Engineering. 2018;2: 749–760. doi:10.1038/s41551-018-0304-0

---

I posted [a similar article years ago in our Chinese data weekly newsletter](https://github.com/data-com/weekly/discussions/27) but for a different story.
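The two key ideas in the SHAP post (measuring the contribution of a feature combination, then combining those contributions into a score) can be sketched with a brute-force Shapley computation. This is not how the shap package works internally; it enumerates every coalition, which is only feasible for a handful of features. The toy model, the all-zeros baseline used to "switch off" features, and all function names are assumptions made for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions (small n only)."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (value_fn(coalition | {i}) - value_fn(coalition))
    return phi


def toy_model(x):
    """Hypothetical model with one interaction term."""
    return 2 * x[0] + 3 * x[1] + x[0] * x[2]


instance = (1.0, 1.0, 1.0)
baseline = (0.0, 0.0, 0.0)


def value_fn(coalition):
    """Model output when only the features in `coalition` take the instance values."""
    x = [instance[j] if j in coalition else baseline[j] for j in range(3)]
    return toy_model(x)


phi = shapley_values(value_fn, 3)
print(phi)
```

For this toy model the values come out at roughly 2.5, 3.0, and 0.5: the interaction term is split evenly between features 0 and 2, and the three scores add up to the difference between the prediction for the instance and the prediction for the baseline.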

@amneumarkt · Post #285 · 08.11.2021, 16:05

#ML (See also https://bit.ly/3F1Kv2F ) Centered Kernel Alignment (CKA) is a similarity metric designed to measure the similarity between representations of features in neural networks[^Kornblith2019]. CKA is based on the Hilbert-Schmidt Independence Criterion (HSIC). HSIC is defined using the centered kernels of the features to compare[^Gretton2005]. But HSIC is not invariant to isotropic scaling, which is required for a similarity metric of representations[^Kornblith2019]. CKA is a normalization of HSIC. The attached figure shows why CKA makes sense.

CKA has problems too. Seita et al argue that CKA is a metric based on intuitive tests, i.e., we calculate cases that we believe should be similar and check whether the CKA values are consistent with this intuition. Seita et al built a quantitative benchmark[^Seita].

[^Kornblith2019]: http://arxiv.org/abs/1905.00414
[^Gretton2005]: https://link.springer.com/chapter/10.1007%2F11564089_7
[^Seita]: https://bair.berkeley.edu/blog/2021/11/05/similarity/
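A minimal sketch of CKA with a linear kernel may help show what "a normalization of HSIC" buys us. It uses the linear-kernel form from Kornblith et al., ||Y^T X||_F^2 / (||X^T X||_F ||Y^T Y||_F) on column-centered activations. The function name and the random toy activations are assumptions for illustration, and other kernels (e.g. RBF) are not covered.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between two representations.

    X: (n_examples, d1) activations of one layer
    Y: (n_examples, d2) activations of another layer, same examples
    """
    # Center each feature so the Gram matrices are centered.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Unnormalized HSIC with a linear kernel (up to a constant factor).
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    # Normalization that removes the dependence on overall scale.
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))   # toy activations
Y = rng.normal(size=(50, 4))   # unrelated toy activations

print(linear_cka(X, X))        # same representation
print(linear_cka(X, 3.7 * X))  # isotropic scaling
print(linear_cka(X, Y))        # unrelated representations
```

Rescaling one representation multiplies the numerator and the denominator by the same factor, so the score does not change. That is the invariance to isotropic scaling that plain HSIC lacks.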

@amneumarkt · Post #281 · 02.11.2021, 17:04

#ML ( I am experimenting with a new platform. This post is also available at: https://community.kausalflow.com/c/ml-journal-club/probably-approximately-correct-pac-learning-and-bayesian-view )

The first time I read about PAC was in the book The Nature of Statistical Learning Theory by Vapnik [^Vapnik2000]. PAC is a systematic theory of why learning from data is even feasible [^Valiant1984]. The idea is to quantify the errors when learning from data, and we find that it is possible to have arbitrarily small error under certain conditions, e.g., large datasets. Quote from Guedj [^Guedj2019]:

> A PAC inequality states that with an arbitrarily high probability (hence "probably"), the performance (as provided by a loss function) of a learning algorithm is upper-bounded by a term decaying to an optimal value as more data is collected (hence "approximately correct").

Bayesian learning is a very important topic in machine learning. We apply Bayes' rule in the components of learning, e.g., the posterior in the loss function. There also exists a PAC theory for Bayesian learning that explains why Bayesian algorithms work. Guedj wrote a primer on this topic[^Guedj2019].

[^Vapnik2000]: Vladimir N. Vapnik. The Nature of Statistical Learning Theory. 2000. doi:10.1007/978-1-4757-3264-1
[^Valiant1984]: Valiant LG. A theory of the learnable. Commun ACM. 1984;27: 1134–1142. doi:10.1145/1968.1972
[^Guedj2019]: Guedj B. A Primer on PAC-Bayesian Learning. arXiv [stat.ML]. 2019. Available: http://arxiv.org/abs/1901.05353
[^Bernstein2021]: Bernstein J. Machine learning is just statistics + quantifier reversal. In: jeremybernste [Internet]. [cited 1 Nov 2021]. Available: https://jeremybernste.in/writing/ml-is-just-statistics
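To put numbers on "probably" and "approximately correct", here is a small sketch of the textbook bound for a finite hypothesis class: with probability at least 1 - delta, the true risk of every hypothesis exceeds its empirical risk by at most sqrt((ln|H| + ln(1/delta)) / (2n)). It assumes i.i.d. samples and a loss bounded in [0, 1], and follows from Hoeffding's inequality plus a union bound. This is the classical PAC-style bound, not the PAC-Bayesian bound from Guedj's primer, and the function name is made up for this example.

```python
import math


def finite_class_gap(n_samples, n_hypotheses, delta):
    """Upper bound on (true risk - empirical risk), holding for all hypotheses
    in a finite class with probability at least 1 - delta.

    Assumes i.i.d. samples and a loss bounded in [0, 1].
    """
    return math.sqrt(
        (math.log(n_hypotheses) + math.log(1.0 / delta)) / (2.0 * n_samples)
    )


# The gap shrinks as more data is collected, for a fixed confidence level.
for n in (100, 1_000, 10_000, 100_000):
    print(n, finite_class_gap(n, n_hypotheses=1_000, delta=0.05))
```

The confidence level delta is the "probably" part, and the gap that decays like 1/sqrt(n) is the "approximately correct" part. Note that the class size only enters through its logarithm.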

@amneumarkt · Post #280 · 02.11.2021, 13:47

#ML https://www.microsoft.com/en-us/research/blog/turing-bletchley-a-universal-image-language-representation-model-by-microsoft/
