@dps_build · Post #2 · 02/28/2023, 06:00 PM
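Pandas 2.0 will gradually adopt Arrow in place of the current NumPy backend for storing data. Read/write performance and processing speed are expected to improve substantially. https://datapythonista.me/blog/pandas-20-and-the-arrow-revolution-part-i #python #data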
TGINSIGHT SIMILAR POSTS
Source channel @githubtrending · Post #15350 · Dec 21
#rust #ai #change_data_capture #context_engineering #data #data_engineering #data_indexing #data_infrastructure #data_processing #etl #hacktoberfest #help_wanted #indexing #knowledge_graph #llm #pipeline #python #rag #real_time #semantic_search **CocoIndex** is a fast, open-source Python tool (Rust core) for transforming data into AI formats like vector indexes or knowledge graphs. Define simple data flows in ~100 lines of code using plug-and-play blocks for sources, embeddings, and targets. Install via `pip install cocoindex`, add Postgres, and run. It auto-syncs fresh data with minimal recompute on changes, tracking lineage. **You save time building scalable RAG/semantic search pipelines, avoiding complex ETL and stale data issues for production-ready AI apps.** https://github.com/cocoindex-io/cocoindex
@djangoproject · Post #280 · 03/23/2017, 03:02 PM
http://pybit.es/codechallenge11.html Inspired by David Beazley's #Generator Tricks for Systems Programmers, we ask you to turn the following Unix #pipeline into Python code using generators. To get a bunch of .py files you can use our challenges repo you cloned, or use a project of your own. Note that in our experience one subprocess is not necessarily one generator: for example, 'sort|uniq|sort' can easily be combined into one, as can 'grep|sed'. See our template if you need guidance.
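A minimal sketch of the generator approach, not the challenge's own solution or template. The post does not reproduce the pipeline, so this assumes a hypothetical one that counts imported modules, roughly `cat *.py | grep import | sed ... | sort | uniq -c | sort -nr`. All function names here are illustrative. It follows the hint above by folding 'grep|sed' into one generator and 'sort|uniq|sort' into another.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_lines(texts: Iterable[str]) -> Iterator[str]:
    """Yield every line of every source text (stands in for `cat *.py`)."""
    for text in texts:
        yield from text.splitlines()


def grep_sed(lines: Iterable[str]) -> Iterator[str]:
    """Combine `grep` and `sed`: keep import lines, yield only the module name."""
    pattern = re.compile(r"^\s*(?:import|from)\s+([\w.]+)")
    for line in lines:
        match = pattern.match(line)
        if match:
            yield match.group(1)


def sort_uniq_sort(items: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Combine `sort | uniq -c | sort -nr`: count items, most frequent first."""
    counts = Counter(items)
    # Ties are broken alphabetically so the output order is deterministic.
    yield from sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def top_imports(texts: Iterable[str]) -> List[Tuple[str, int]]:
    """Chain the stages; each stage pulls lazily from the previous one."""
    return list(sort_uniq_sort(grep_sed(read_lines(texts))))
```

To run it over real files, pass something like `(p.read_text() for p in pathlib.Path(".").rglob("*.py"))` as `texts`. Only the counting stage has to hold data in memory, just as `sort` does in the shell version.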
@fosspost · Post #527 · 10/03/2020, 06:09 AM
DigitalOcean accepts criticism and switches its #Hacktoberfest to be opt-in: https://github.com/digitalocean/hacktoberfest/pull/596
@djangoproject · Post #379 · 07/12/2017, 09:12 PM
http://zetcode.com/python/csv/ This Python #CSV tutorial shows how to read and write CSV #data with Python's csv module. #learn
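As a small illustration of the reading and writing the tutorial covers (not taken from the tutorial itself), the sketch below round-trips a few records through the standard-library `csv` module. The sample rows and the helper names `write_csv` and `read_csv` are made up for the example, and an in-memory buffer stands in for a file on disk.

```python
import csv
import io

# Hypothetical sample records, used only for illustration.
rows = [
    {"name": "Ada", "language": "Python"},
    {"name": "Linus", "language": "C"},
]


def write_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text, header row first."""
    buffer = io.StringIO(newline="")  # newline="" lets csv control line endings
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def read_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text, newline="")))
```

With a real file, the same calls apply: swap the buffer for `open(path, "w", newline="")` when writing and `open(path, newline="")` when reading.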
@djangoproject · Post #464 · 10/16/2017, 08:07 AM
http://www.csestack.org/python-libraries-for-data-science/ As per the DIKW Pyramid Model, a #Data_Science job revolves around extracting information and knowledge from raw data. The work can be bundled into a stack of 4 entities: (1) the source of #data, (2) managing and storing the data, (3) analyzing the data, and (4) displaying the analyzed output (#visualization, statistics).
@djangoproject · Post #539 · 12/28/2017, 12:20 PM
Dash, announced this year, is an open source library for building web applications, especially those that make good use of #data visualization, in pure Python. It is built on top of #Flask, #Plotly.js and #React, and provides abstractions that free you from having to learn those frameworks and let you become productive quickly. #Dash is a #Python framework for building analytical web applications. No JavaScript required. https://plot.ly/products/dash/
@djangoproject · Post #275 · 03/18/2017, 01:51 AM
https://github.com/spotify/luigi Writing batch jobs is generally only one part of processing heaps of data; you also have to string all the jobs together into something resembling a #workflow or a #pipeline. #Luigi, created by Spotify and named for the other plucky plumber made famous by Nintendo, was built to "address all the plumbing typically associated with long-running batch processes." With Luigi, a developer can take several different unrelated data processing tasks ("a Hive query, a Hadoop job in Java, a Spark job in Scala, dumping a table from a database") and create a workflow that runs them, end to end. The entire description of a job and its dependencies is written as Python modules, not as XML config files or another data format, so it can be integrated into other Python-centric projects. #Machine_learning
@djangoproject · Post #240 · 01/25/2017, 10:03 AM
http://www.aparat.com/v/4nbc9 This talk gives a quick overview of Python's capabilities as a #data_processing and #machine_learning tool through practical examples: gathering data from the web or a local file, validating/modifying it and finally analyzing it to build models for #classification and #prediction #tasks.
@djangoproject · Post #228 · 01/16/2017, 01:11 PM
http://www.aparat.com/v/miNUS PyCon 2016 - Andrew Godwin - Reinventing Django for the #Real_Time Web. Django has long been tied to the #request_response pattern, but the upcoming "#channels" project changes this and allows #Django to natively support #WebSockets, run tasks after responses, easily handle #long_polling, and more. Come and learn about the design, how we're trying to keep things as Django-like as possible, and how you can use it in your projects.
@djangoproject · Post #519 · 12/10/2017, 06:14 PM
https://blog.wallaroolabs.com/2017/12/stateful-multi-stream-processing-in-python-with-wallaroo/ #Wallaroo is a high-performance, open-source framework for building distributed stateful applications. In an earlier post, we looked at how Wallaroo scales #distributed_state. In this post, we’re going to see how you can use Wallaroo to implement multiple data processing #tasks performed over the same shared #state. We’ll be implementing an application we’ll call “Market Spread” that keeps track of the latest pricing information by stock while simultaneously using that state to determine whether stock order #requests should be rejected. #pipeline
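The idea of two streams sharing one piece of state can be sketched in plain Python. This is not Wallaroo's API and not the post's implementation. The class names, the spread-based rejection rule, and the 5% threshold are all assumptions made for illustration. One stream of quotes updates the latest price per stock, and a second stream of orders reads that same state to decide on rejection.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Quote:
    symbol: str
    bid: float
    ask: float


@dataclass
class Order:
    symbol: str
    quantity: int


class MarketState:
    """Shared state: the latest quote seen for each stock symbol."""

    def __init__(self, max_spread_ratio: float = 0.05) -> None:
        self.latest: Dict[str, Quote] = {}
        # Hypothetical threshold: reject when the spread is >= 5% of the mid price.
        self.max_spread_ratio = max_spread_ratio

    def on_quote(self, quote: Quote) -> None:
        """Stream 1: record the newest pricing information for a symbol."""
        self.latest[quote.symbol] = quote

    def should_reject(self, order: Order) -> bool:
        """Stream 2: consult the shared state to decide on an order."""
        quote = self.latest.get(order.symbol)
        if quote is None:
            return True  # assumption: no pricing data yet means reject
        mid = (quote.bid + quote.ask) / 2
        if mid <= 0:
            return True  # guard against division by zero on bad data
        return (quote.ask - quote.bid) / mid >= self.max_spread_ratio
```

In a distributed framework the state would be partitioned by symbol so that both streams route a given stock to the same partition. Here a single dictionary plays that role.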