TGInsight · Telegram intelligence · LIVE / Telegram public index

TGINSIGHT SIMILAR POSTS

Find similar content

Source channel @githubtrending · Post #15573 · Mar 19

#java #a11y #accessibility #ai #bounding_box #document_parsing #eaa #html #json #markdown #ocr #ocr_recognition #pdf #pdf_accessibility #pdf_converter #pdf_extraction #pdf_parser #pdf_ua #rag #tables #tagged_pdf

OpenDataLoader PDF is a free, open-source tool (Apache 2.0) that tops benchmarks with 0.90 accuracy for extracting structured data such as Markdown, JSON (with bounding boxes), and HTML from any PDF: digital, scanned, or complex with tables, formulas, and charts, with OCR in 80+ languages. It runs locally on CPU (0.05 s/page in fast mode), filters AI prompt injections for safety, integrates with LangChain/RAG, and automates accessibility tagging to Tagged PDF. You save time and cost on parsing for AI pipelines or compliance (vs. $50–200 per manual document), and get precise, private results for better LLM apps and legal standards. https://github.com/opendataloader-project/opendataloader-pdf

Results

10 similar posts found

djangoproject

@djangoproject · Post #329 · 05/04/2017, 04:34 AM

# The standard string repr for dicts is hard to read:
>>> my_mapping = {'a': 23, 'b': 42, 'c': 0xc0ffee}
>>> my_mapping
{'b': 42, 'c': 12648430, 'a': 23}  # 😞

# The "#json" module can do a much better job:
>>> import json
>>> print(json.dumps(my_mapping, indent=4, sort_keys=True))
{
    "a": 23,
    "b": 42,
    "c": 12648430
}

# Note this only works with dicts containing
# primitive types (check out the "pprint" module):
>>> json.dumps({all: 'yup'})
TypeError: keys must be a string
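The tip above can be sketched as a small script. This is a minimal illustration, not part of the original post: the variable names `pretty`, `json_failed`, and `fallback` are made up here, and because the exact `TypeError` message differs between Python versions, the sketch only checks the exception type. `pprint.pformat` stands in for the "pprint" module the post points to.

```python
import json
from pprint import pformat

my_mapping = {'a': 23, 'b': 42, 'c': 0xc0ffee}

# json.dumps gives indented, key-sorted output for dicts
# whose keys and values are primitive types.
pretty = json.dumps(my_mapping, indent=4, sort_keys=True)
print(pretty)

# A key that is not a str/int/float/bool/None (here the built-in
# function `all`) cannot be serialized and raises TypeError.
try:
    json.dumps({all: 'yup'})
    json_failed = False
except TypeError:
    json_failed = True

# pprint works from repr(), so it accepts arbitrary Python objects.
fallback = pformat({all: 'yup'})
print(fallback)
```

The design choice is the one the post hints at: use `json.dumps` when the data is JSON-compatible, and fall back to `pprint` for anything else.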

Hashtags

#json

Libreware

@libreware · Post #1171 · 09/01/2023, 01:00 PM

Image to Text OCR is a utility website made by Alejandro Akbal for extracting text from any image using #OCR. This tool was made for those moments where you take a photo of some text and wish you could have it digitally. https://github.com/AlejandroAkbal/Image-to-Text-OCR Online: https://image-to-text-ocr.netlify.app/

Hashtags

#ocr

djangoproject

@djangoproject · Post #356 · 07/02/2017, 04:42 PM

https://github.com/django-json-api/django-rest-framework-json-api #JSON #API support for #Django_REST_Framework

Hashtags

#json #api #django_rest_framework

The Devs

@thedevs · Post #2147 · 06/01/2025, 12:58 PM

JSON is dangerous (and slow). #article #json @thedevs https://thedevs.link/mnVS3t

Hashtags

#article #json

JJ.ai (NFA)🪽

@jsmjsmxyz · Post #765 · 07/24/2019, 08:22 AM

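What math does a good programmer need to know? #pdf Someone asked me to find the book mentioned in the post, "A Programmer's Introduction to Mathematics" (by Jeremy Kun). I found it, but now I can't find the person who asked...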

Hashtags

#pdf

djangoproject

@djangoproject · Post #245 · 01/28/2017, 01:04 PM

https://github.com/tesseract-ocr/tesseract This package contains an #OCR (Optical character recognition) engine - libtesseract and a command line program - tesseract. The lead developer is Ray Smith. The maintainer is Zdenko Podobny. For a list of contributors see AUTHORS and github's log of contributors. #Tesseract has unicode (UTF-8) support, and can recognize more than 100 languages "out of the box". It can be trained to recognize other languages. See Tesseract Training for more information. Tesseract supports various output formats: plain-text, hocr(html), pdf. This project does not include a GUI application. If you need one, please see the 3rdParty wiki page. You should note that in many cases, in order to get better OCR results, you'll need to improve the quality of the image you are giving Tesseract.
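As a rough sketch of how the output formats listed above map onto the `tesseract` command line, the helper below only builds the argument list and does not run anything. The function name `build_tesseract_cmd`, the `FORMAT_CONFIGS` table, and the file names are hypothetical, made up for illustration; the sketch assumes the usual `tesseract IMAGE OUTPUTBASE -l LANG [CONFIG]` invocation, where the `hocr` and `pdf` config names select those outputs and plain text is the default.

```python
import shlex

# Output formats named in the post, mapped to the Tesseract config
# name appended to the command. Plain text is the default, so it
# needs no config. (Hypothetical helper table, for illustration.)
FORMAT_CONFIGS = {"text": None, "hocr": "hocr", "pdf": "pdf"}


def build_tesseract_cmd(image, output_base, lang="eng", fmt="text"):
    """Return the argv list for a tesseract run; does not execute it."""
    if fmt not in FORMAT_CONFIGS:
        raise ValueError(f"unsupported format: {fmt!r}")
    cmd = ["tesseract", str(image), str(output_base), "-l", lang]
    config = FORMAT_CONFIGS[fmt]
    if config is not None:
        cmd.append(config)
    return cmd


# Example file names are placeholders.
cmd = build_tesseract_cmd("scan.png", "scan_out", lang="eng", fmt="pdf")
print(shlex.join(cmd))
```

With Tesseract installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`. As the post notes, input image quality often matters more for results than the command-line options.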

Hashtags

#ocr #tesseract

Libreware

@libreware · Post #1213 · 12/20/2023, 03:12 AM

OSS Document Scanner: an open-source Android app to #scan all your #documents. You can scan with your camera or import an image. The app automatically detects your document within the photo and crops the image. Once the document is created, you can detect text within it using #OCR. You can also share your document as a #PDF. If you want, you can synchronize the app data with a WebDAV server (like Nextcloud) so you never lose anything! https://github.com/Akylas/com.akylas.documentscanner https://apt.izzysoft.de/fdroid/index/apk/com.akylas.documentscanner

Hashtags

#scan #documents #ocr #pdf

The Devs

@thedevs · Post #1999 · 09/20/2022, 07:02 AM

JSON Hero, a beautiful JSON viewer. #tools #json @thedevs https://kutt.it/lL2sWb

Hashtags

#tools #json

The Devs

@thedevs · Post #1329 · 12/11/2018, 06:56 PM

Why is 2 * (i * i) faster than 2 * i * i in Java? #coding #java @thedevs https://kutt.it/r5Vurp

Hashtags

#coding #java

JJ.ai (NFA)🪽

@jsmjsmxyz · Post #1108 · 12/22/2020, 10:01 AM

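#OCR #Tools Newlearner's OCR usage notes (offline edition)

🔌Offline OCR🔌
Offline OCR tools rely mainly on local libraries. Their accuracy may fall short of online APIs, but they can handle large batches of OCR work and run fairly fast.

🔍OwlOCR
- Runs OCR on PDF, PNG, JPEG, and GIF files
- Supports taking a photo on an iOS device and running OCR on it immediately in OwlOCR
- Offline multilingual OCR support, including Simplified and Traditional Chinese, but
- The free version keeps most features; the paid version speeds up OCR processing

🔍TextSniper
- Small, lightweight, and easy to use
- Supports appending OCR results to the clipboard
- Offline multilingual support
- One-time-purchase app, also included in the Setapp subscription

👀 The OCR tools mentioned above all run on Win/Mac. On mobile, the one I recommend is 「白描」. I don't need high OCR accuracy, so I use Bob's free API; OCRmyPDF is what I use when scanning large PDF documents.

🎗 「天若 OCR」 and 「白描」 are about to go on sale, so consider picking them up if you need them.

📘 Related reading:
1⃣️ OCRmyPDF: add a text layer to your PDF documents
2⃣️ alfred-ocr: a multi-API Alfred OCR / translation plugin for macOS

Channel: @NewlearnerChannel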

Hashtags

#ocr #tools