Am Neumarkt 😱
@amneumarkt
Technologies · Machine learning and other gibberish. See also: https://sharing.leima.is · Notebooks: https://datumorphism.leima.is
Latest posts
Tag: #data · 25 posts
Published 21 Nov
#data Last year a colleague introduced me to Marimo. It was surprisingly good for the first few days, but then the automatic rerun of expensive computations started to annoy me, and I bailed out. This morning I read a blog post on this topic and realized there's a switch for it: lazy execution, which disables autorun on cell change. Time to jump in again. https://docs.marimo.io/guides/configuration/runtime_configuration/#disable-autorun-on-cell-change-lazy-execution
Published 25 Sep
#data https://blog.cloudflare.com/cloudflare-data-platform/
Published 4 Apr
#data Reciprocal Tariff Calculations | United States Trade Representative https://ustr.gov/issue-areas/reciprocal-tariff-calculations
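For reference, a minimal sketch of the formula on the linked USTR page as I understand it: the tariff change is Δτ = (x − m) / (ε·φ·m), where x and m are US exports to and imports from a given country, ε = 4 is the assumed price elasticity of import demand, and φ = 0.25 is the assumed pass-through of tariffs to import prices. Because ε·φ = 1, the expression reduces to the bilateral trade balance divided by imports. The function name and the trade figures below are hypothetical placeholders, not real data; the code only evaluates the formula as written.

```python
def reciprocal_tariff_change(exports: float, imports: float,
                             elasticity: float = 4.0,
                             passthrough: float = 0.25) -> float:
    """Evaluate delta_tau = (x - m) / (eps * phi * m) as a fraction (0.5 == 50%).

    exports (x): US exports to the partner country
    imports (m): US imports from the partner country
    elasticity (eps) and passthrough (phi) default to the values USTR states;
    this helper is a hypothetical illustration, not an official tool.
    """
    if imports <= 0:
        raise ValueError("imports must be positive")
    return (exports - imports) / (elasticity * passthrough * imports)


# Hypothetical trade figures in the same currency unit, not real data.
delta = reciprocal_tariff_change(exports=100.0, imports=200.0)
print(f"{delta:.0%}")  # prints "-50%"
```

With a bilateral deficit (x < m) the expression is negative; since ε·φ = 1, changing either default only matters if you want to test other assumptions.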
Published 21 Feb
#data Do Not Use Azure. To be honest, I can't name a single good thing from Microsoft. (Written by someone frustrated with Microsoft products, including the sh*tty Windows OS.) https://www.reddit.com/r/dataengineering/s/G9uTQmVxWC
Published 12 Feb
#data This is a cool concept. https://pola.rs/posts/polars-cloud-what-we-are-building/
Published 7 Dec
#data Nice. Observable's new data app generator. https://observablehq.com/framework/
Published 24 Oct
#data How to run data science projects | Science & technology experiments https://dzidas.com/ml/2024/10/22/implementing-data-science-projects/
Published 27 Mar
#data https://duckdb.org/2024/03/26/42-parquet-a-zip-bomb-for-the-big-data-age.html
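The linked DuckDB post is about a Parquet file that is tiny on disk but expands into an enormous amount of data when read. The same failure mode applies to any compressed input. Below is a stdlib-only sketch using zlib (not Parquet, and not how DuckDB handles it): highly repetitive data compresses extremely well, and a reader can protect itself by capping how much output decompression may produce. The function names and limits are hypothetical.

```python
import zlib


def make_bomb(n_bytes: int) -> bytes:
    """Compress n_bytes of zeros; such repetitive data shrinks dramatically."""
    return zlib.compress(b"\x00" * n_bytes, 9)


def safe_decompress(blob: bytes, limit: int) -> bytes:
    """Decompress blob, raising ValueError if the output would exceed `limit` bytes."""
    d = zlib.decompressobj()
    # max_length caps the output of this call; unread input stays in d.unconsumed_tail.
    out = d.decompress(blob, limit + 1)
    if len(out) > limit:
        raise ValueError(f"decompressed size exceeds {limit} bytes")
    if not d.eof:
        # The stream ended early, so the input is truncated or malformed.
        raise ValueError("incomplete compressed stream")
    return out


bomb = make_bomb(10_000_000)  # 10 MB of zeros
print(len(bomb))  # on the order of 10 kB on disk
try:
    safe_decompress(bomb, limit=1_000_000)
except ValueError as err:
    print("rejected:", err)
```

Checking a cheap upper bound before materializing the data is the general idea; for Parquet specifically, the footer metadata (row counts, column chunk sizes) plays that role.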
Published 23 Mar
#data Interesting read: > we propose DATALORE, a framework that explains data changes between an initial dataset and its augmented version to improve traceability https://www.amazon.science/publications/datalore-can-a-large-language-model-find-all-lost-scrolls-in-a-data-repository
Published 7 Mar
#data I've never tried dbt, but it is definitely popular, judging by the number of people talking about it. I read these discussions on Reddit and think they are worth sharing: https://www.reddit.com/r/dataengineering/s/tphxT7p0kI and https://www.reddit.com/r/dataengineering/s/YpOXP3y6av
Published 7 Apr
#data Quite useful. I use pyarrow a lot and polars a bit, mostly because pandas is slow. With the pandas 2.0 release, all three libraries are seamlessly connected through Apache Arrow. https://datapythonista.me/blog/pandas-20-and-the-arrow-revolution-part-i
Published 10 Feb
#data In physics, people say that more is different. In the data world, more is very different. I'm no expert in big data, but I only ran into the scaling problem when I started working for corporations. I like this line from the author: > data sizes increase much faster than compute sizes. In deep learning, many models follow a scaling law relating performance to dataset size. Indeed, more data brings better performance, but the gains shrink quickly. Businesses don't need a perfect model, and computation costs money. At some point we simply have to cut the dataset, even if we have all the data in the world. So data hoarding is probably fine, but our models might not need that much. https://motherduck.com/blog/big-data-is-dead/
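To make "more data helps, but the gains shrink" concrete, here is a toy power-law scaling curve, loss(D) = a · D^(−α). The constants a and α are made-up assumptions, not fitted to any real model; only the shape matters. Each 10× increase in data multiplies the loss by the same factor 10^(−α), so the absolute improvement per 10× keeps shrinking while the data (and compute) bill grows tenfold.

```python
def power_law_loss(n_samples: float, a: float = 10.0, alpha: float = 0.1) -> float:
    """Toy scaling law: loss = a * n_samples ** (-alpha). Constants are hypothetical."""
    return a * n_samples ** (-alpha)


sizes = [10**k for k in range(3, 9)]  # 1e3 ... 1e8 samples
losses = [power_law_loss(n) for n in sizes]
for n, prev, cur in zip(sizes[1:], losses, losses[1:]):
    # Absolute loss reduction bought by the latest 10x increase in data.
    print(f"{n:>12,d} samples: loss {cur:.3f} (gain {prev - cur:.3f})")
```

Running it shows every row's gain smaller than the one before, which is the diminishing-returns argument in numbers.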