TGINSIGHT SIMILAR POSTS

Find similar content

Source channel @githubtrending · Post #15573 · Mar 19

#java #a11y #accessibility #ai #bounding_box #document_parsing #eaa #html #json #markdown #ocr #ocr_recognition #pdf #pdf_accessibility #pdf_converter #pdf_extraction #pdf_parser #pdf_ua #rag #tables #tagged_pdf
OpenDataLoader PDF is a free, open-source tool (Apache 2.0) that tops benchmarks with 0.90 accuracy for extracting structured data like Markdown, JSON (with bounding boxes), and HTML from any PDF, whether digital, scanned, or complex with tables, formulas, charts, and OCR in 80+ languages. It runs locally on CPU (0.05 s/page in fast mode), filters AI prompt injections for safety, integrates with LangChain/RAG, and automates accessibility tagging to Tagged PDF. You save time and costs on parsing for AI pipelines or compliance (vs. $50–200/manual doc), getting precise, private results for better LLM apps and legal standards.
https://github.com/opendataloader-project/opendataloader-pdf

Results

10 similar posts found

djangoproject

@djangoproject · Post #329 · 05/04/2017, 04:34 AM

# The standard string repr for dicts is hard to read:
>>> my_mapping = {'a': 23, 'b': 42, 'c': 0xc0ffee}
>>> my_mapping
{'b': 42, 'c': 12648430, 'a': 23}  # 😞

# The "#json" module can do a much better job:
>>> import json
>>> print(json.dumps(my_mapping, indent=4, sort_keys=True))
{
    "a": 23,
    "b": 42,
    "c": 12648430
}

# Note this only works with dicts containing
# primitive types (check out the "pprint" module):
>>> json.dumps({all: 'yup'})
TypeError: keys must be a string
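The tip above can also be run as a script rather than typed into the REPL. The following is a minimal sketch, not part of the original post: it reuses the post's my_mapping, and the names odd_mapping, json_failed, pretty_json, and pretty_odd are made up for illustration. It shows json.dumps for a JSON-compatible dict and pprint.pformat as the fallback the post hints at for keys that JSON cannot represent.

```python
import json
from pprint import pformat

# Same mapping as in the post; 0xC0FFEE is just an int literal (12648430).
my_mapping = {"a": 23, "b": 42, "c": 0xC0FFEE}

# json.dumps gives indented output with keys in sorted order.
pretty_json = json.dumps(my_mapping, indent=4, sort_keys=True)
print(pretty_json)

# A key that JSON cannot represent (here the built-in function `all`)
# makes json.dumps raise TypeError.
odd_mapping = {all: "yup"}  # hypothetical example dict, mirrors the post
try:
    json.dumps(odd_mapping)
    json_failed = False
except TypeError:
    json_failed = True

# pprint works from repr(), so it accepts any hashable key.
pretty_odd = pformat(odd_mapping)
print(pretty_odd)
```

The exact wording of the TypeError message differs between Python versions, so the sketch only checks the exception type.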

Libreware

@libreware · Post #1171 · 09/01/2023, 01:00 PM

Image to Text OCR is a utility website made by Alejandro Akbal for extracting text from any image using #OCR. This tool was made for those moments where you take a photo of some text and wish you could have it digitally. https://github.com/AlejandroAkbal/Image-to-Text-OCR Online: https://image-to-text-ocr.netlify.app/

The Devs

@thedevs · Post #2147 · 06/01/2025, 12:58 PM

JSON is dangerous (and slow). #article #json @thedevs https://thedevs.link/mnVS3t

JJ.ai (NFA)🪽

@jsmjsmxyz · Post #765 · 07/24/2019, 08:22 AM

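What math does a good programmer need to know? #pdf Someone asked me to find the book mentioned in the post, "A Programmer's Introduction to Mathematics" (by Jeremy Kun). I found the book, but now I can't find the person who asked...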

djangoproject

@djangoproject · Post #245 · 01/28/2017, 01:04 PM

https://github.com/tesseract-ocr/tesseract This package contains an #OCR (optical character recognition) engine, libtesseract, and a command-line program, tesseract. The lead developer is Ray Smith. The maintainer is Zdenko Podobny. For a list of contributors, see AUTHORS and GitHub's log of contributors. #Tesseract has Unicode (UTF-8) support and can recognize more than 100 languages "out of the box". It can be trained to recognize other languages. See Tesseract Training for more information. Tesseract supports various output formats: plain text, hOCR (HTML), and PDF. This project does not include a GUI application. If you need one, please see the 3rdParty wiki page. Note that in many cases, to get better OCR results, you'll need to improve the quality of the image you are giving Tesseract.
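As a rough illustration of the output formats mentioned above, here is a minimal sketch that drives the tesseract command-line program from Python. It assumes the usual invocation form (image, output base name, -l for language, then an optional output config such as hocr or pdf) and that tesseract is installed and on PATH. The helper names build_tesseract_command and run_ocr are hypothetical and are not part of Tesseract.

```python
import shutil
import subprocess
from pathlib import Path

# Output formats named in the post; plain text is tesseract's default.
SUPPORTED_FORMATS = {"txt", "hocr", "pdf"}


def build_tesseract_command(image, output_base, lang="eng", output_format="txt"):
    """Build the argument list for: tesseract IMAGE OUTPUTBASE -l LANG [CONFIG].

    Hypothetical helper for illustration only.
    """
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    command = ["tesseract", str(image), str(output_base), "-l", lang]
    # Plain text needs no extra config; hocr and pdf are requested by name.
    if output_format != "txt":
        command.append(output_format)
    return command


def run_ocr(image, output_base, lang="eng", output_format="txt"):
    """Run tesseract and return the path of the file it is expected to write."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image, output_base, lang, output_format)
    subprocess.run(command, check=True)
    return Path(f"{output_base}.{output_format}")
```

For example, run_ocr("scan.png", "scan", output_format="pdf") would be expected to produce scan.pdf, assuming scan.png exists and the English language data is installed.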

Libreware

@libreware · Post #1213 · 12/20/2023, 03:12 AM

OSS Document Scanner: an open-source Android app to #scan all your #documents. You can scan either with your camera or by importing an image. The app automatically detects your document within the photo and crops the image. Once the document is created, you can detect text within it using #OCR. You can also share your document as a #PDF. If you want, you can synchronize the app data with a WebDAV server (like Nextcloud) so you never lose anything! https://github.com/Akylas/com.akylas.documentscanner https://apt.izzysoft.de/fdroid/index/apk/com.akylas.documentscanner

The Devs

@thedevs · Post #1999 · 09/20/2022, 07:02 AM

JSON Hero, a beautiful JSON viewer. #tools #json @thedevs https://kutt.it/lL2sWb

The Devs

@thedevs · Post #1329 · 12/11/2018, 06:56 PM

Why is 2 * (i * i) faster than 2 * i * i in Java? #coding #java @thedevs https://kutt.it/r5Vurp

JJ.ai (NFA)🪽

@jsmjsmxyz · Post #1108 · 12/22/2020, 10:01 AM

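#OCR #Tools Newlearner's OCR usage notes (offline edition)
🔌 Offline OCR 🔌
Offline OCR tools rely mainly on local libraries. Their accuracy may not match that of online APIs, but they can handle large batches of OCR work and process them quickly.
🔍 OwlOCR
- Runs OCR on PDF, PNG, JPEG, and GIF files
- Lets you take a photo on an iOS device and run OCR on it immediately in OwlOCR
- Offline multilingual OCR support, including Simplified and Traditional Chinese, but
- The free version keeps most features; the paid version speeds up OCR processing
🔍 TextSniper
- Small, lightweight, and easy to use
- Can append OCR results to the clipboard
- Offline multilingual support
- One-time purchase app, also included in the Setapp subscription
👀 The OCR tools mentioned above all run on Windows/Mac. On mobile, the one I recommend is 白描 (Baimiao).
I don't need high OCR accuracy, so I use Bob's free API; OCRmyPDF is what I use when scanning large PDF documents.
🎗 天若 OCR (Tianruo OCR) and 白描 (Baimiao) will soon be on sale, so consider picking them up if you need them.
📘 Related reading: 1⃣️ OCRmyPDF: add a text layer to your PDF documents 2⃣️ alfred-ocr: a multi-API Alfred OCR / translation plugin for macOS
Channel: @NewlearnerChannel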
