TGINSIGHT SIMILAR POSTS

Find similar content

Source channel @githubtrending · Post #15163 · Sep 24

#python #document_analysis #layout_analysis #ocr #parser #pdf #pdf_converter #pdf_parser #vlm_ocr Dolphin is an AI tool that analyzes complex document images, such as pages containing text, tables, formulas, and pictures. It works in two stages: first it determines the layout and reading order of the page, then it parses each element using element-specific prompts. This design is meant to make it fast and accurate at turning document images into structured data such as JSON or Markdown. Pre-trained models and sample code are available for processing single pages, PDFs, or individual elements, which saves time and effort when extracting information from complicated documents. https://github.com/bytedance/Dolphin
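
As a rough illustration of the analyze-then-parse idea described in this post, the sketch below takes page elements that have already been detected, sorts them by reading order, and serializes them to JSON and Markdown. The `Element` class, its fields, and the sample data are hypothetical stand-ins chosen for illustration; they are not Dolphin's actual API or output schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Element:
    """Hypothetical parsed page element (not Dolphin's real schema)."""
    order: int    # position in the reading order from the layout stage
    kind: str     # e.g. "heading", "text", "formula"
    content: str  # text produced by the element-parsing stage


def to_json(elements):
    """Serialize elements in reading order as a JSON array."""
    ordered = sorted(elements, key=lambda e: e.order)
    return json.dumps([asdict(e) for e in ordered], ensure_ascii=False)


def to_markdown(elements):
    """Render elements in reading order as a simple Markdown document."""
    parts = []
    for e in sorted(elements, key=lambda e: e.order):
        if e.kind == "heading":
            parts.append("# " + e.content)
        elif e.kind == "formula":
            parts.append("$$" + e.content + "$$")
        else:
            parts.append(e.content)
    return "\n\n".join(parts)


# Made-up sample: elements arrive out of order, as a detector might emit them.
page = [
    Element(order=2, kind="formula", content="E = mc^2"),
    Element(order=0, kind="heading", content="Results"),
    Element(order=1, kind="text", content="Energy and mass are related."),
]
```

Calling `to_markdown(page)` or `to_json(page)` yields the elements in reading order regardless of the order in which they were detected.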

Results

10 similar posts found

Libreware

@libreware · Post #1171 · 09/01/2023, 01:00 PM

Image to Text OCR is a utility website made by Alejandro Akbal for extracting text from any image using #OCR. This tool was made for those moments where you take a photo of some text and wish you could have it digitally. https://github.com/AlejandroAkbal/Image-to-Text-OCR Online: https://image-to-text-ocr.netlify.app/

JJ.ai (NFA)🪽

@jsmjsmxyz · Post #765 · 07/24/2019, 08:22 AM

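What math does a good programmer need to know? #pdf Someone asked me to track down the book mentioned in the post, A Programmer's Introduction to Mathematics (by Jeremy Kun). I found the book, but now I can't find the person who asked...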

Libreware

@libreware · Post #1213 · 12/20/2023, 03:12 AM

OSS Document Scanner (Android): an open-source app to #scan all your #documents. You can scan with your camera or import an image. The app automatically detects your document within the photo and crops the image. Once the document is created, you can detect text in it using #OCR. You can also share your document as a #PDF. If you want, you can synchronize the app data with a WebDAV server (like Nextcloud) so you never lose anything! https://github.com/Akylas/com.akylas.documentscanner https://apt.izzysoft.de/fdroid/index/apk/com.akylas.documentscanner

djangoproject

@djangoproject · Post #245 · 01/28/2017, 01:04 PM

https://github.com/tesseract-ocr/tesseract This package contains an #OCR (optical character recognition) engine, libtesseract, and a command-line program, tesseract. The lead developer is Ray Smith. The maintainer is Zdenko Podobny. For a list of contributors, see AUTHORS and GitHub's log of contributors. #Tesseract has Unicode (UTF-8) support and can recognize more than 100 languages "out of the box". It can be trained to recognize other languages; see Tesseract Training for more information. Tesseract supports various output formats: plain text, hOCR (HTML), and PDF. This project does not include a GUI application. If you need one, please see the 3rdParty wiki page. Note that in many cases you will need to improve the quality of the input image to get better OCR results.
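
To make the output formats mentioned in this post concrete, here is a minimal sketch that builds a `tesseract` command line from Python. It assumes a recent Tesseract release where the output format is selected by a trailing config name (`txt`, `hocr`, or `pdf`); the helper names `tesseract_command` and `run_ocr` and the file names in the usage note are made up for illustration.

```python
import shutil
import subprocess


def tesseract_command(image_path, output_base, lang="eng", fmt="txt"):
    """Build a CLI call of the form: tesseract IMAGE OUTPUTBASE -l LANG CONFIG."""
    if fmt not in ("txt", "hocr", "pdf"):
        raise ValueError("unsupported output format: " + fmt)
    return ["tesseract", image_path, output_base, "-l", lang, fmt]


def run_ocr(image_path, output_base, lang="eng", fmt="txt"):
    """Run tesseract if it is installed; return the command that was run."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    cmd = tesseract_command(image_path, output_base, lang, fmt)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd
```

For example, `run_ocr("scan.png", "scan", lang="deu", fmt="pdf")` would ask Tesseract to write its result next to the given output base, provided the German language data is installed.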

JJ.ai (NFA)🪽

@jsmjsmxyz · Post #1108 · 12/22/2020, 10:01 AM

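#OCR #Tools Newlearner's OCR roundup (offline edition)
🔌 Offline OCR 🔌 Offline OCR tools rely mainly on local libraries. Their accuracy may fall short of online APIs, but they can handle large batches of OCR work and process them quickly.
🔍 OwlOCR
- Runs OCR on PDF, PNG, JPEG, and GIF files
- Take a photo on an iOS device and OwlOCR runs OCR on it immediately
- Offline multilingual OCR, including Simplified and Traditional Chinese, but
- The free version keeps most features; the paid version speeds up OCR processing
🔍 TextSniper
- Small, lightweight, and easy to use
- Can append OCR results to the clipboard
- Offline multilingual support
- One-time-purchase app, also included in the Setapp subscription
👀 The OCR tools mentioned above all run on Windows/Mac. On mobile, the one I recommend is 白描 (Baimiao).
I don't need high OCR accuracy, so I use Bob's free API; OCRmyPDF is what I use when scanning large PDF documents.
🎗 天若 OCR (Tianruo OCR) and 白描 (Baimiao) will soon be on sale, so consider picking them up if you need them.
📘 Related reading: 1⃣️ OCRmyPDF: add a text layer to your PDF documents 2⃣️ alfred-ocr: a multi-API Alfred OCR / translation plugin for macOS
Channel: @NewlearnerChannel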

JJ.ai (NFA)🪽

@jsmjsmxyz · Post #1107 · 12/22/2020, 07:13 AM

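#OCR #Tools Newlearner's OCR roundup (online edition)
To extract text from images and PDF documents, we usually turn to OCR (Optical Character Recognition). Today I'd like to share a few of the better OCR tools.
☁️ Online OCR ☁️ Online OCR tools mostly call a cloud OCR engine and refine the result before output, so accuracy and fidelity are higher. Because most OCR APIs are paid, there is some cost to using them.
🔍 iText
- Uses the Google, Baidu, and Tencent OCR APIs, with high recognition accuracy
- Its own algorithm for refining recognition results
- Supports translation after recognition
- 20 free uses per month; the Pro version is a monthly or yearly subscription
🔍 天若OCR (Tianruo OCR)
- An OCR tool for Windows
- Supports table recognition, vertical text recognition, LaTeX formula recognition, and translation
- Supports custom text APIs
- Free and paid versions; the paid version is a one-time purchase
🔍 Bob
- Primarily a translation tool, but its bundled OCR is enough for everyday use
- Supports custom text APIs; uses the Baidu AI Cloud OCR API by default
- Semi-open-source and free
- Bob's author thoughtfully documents how to apply for each major OCR API (Baidu, Tencent, Sogou, Youdao) in the user guide: tutorial link
Channel: @NewlearnerChannel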

FOSS Post

@fosspost · Post #772 · 10/05/2021, 05:48 AM

Version 3.10 of the legendary programming language is now here: https://www.python.org/downloads/release/python-3100 No rush to update, though. #Python

FOSS Post

@fosspost · Post #593 · 12/23/2020, 05:38 PM

#Python is the main language of data science, per this analysis on 10M Jupyter Notebooks: https://blog.jetbrains.com/datalore/2020/12/17/we-downloaded-10-000-000-jupyter-notebooks-from-github-this-is-what-we-learned/

JJ.ai (NFA)🪽

@jsmjsmxyz · Post #995 · 04/09/2020, 08:07 AM

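#Github情报 #OCR OCRmyPDF: add a text layer to your PDF documents
GitHub | Wiki
OCRmyPDF adds an OCR text layer to scanned PDF files so that they can be searched or copy-pasted.
✨ Features
- Uses the powerful open-source Tesseract OCR engine, with support for more than 100 languages
- Uses all available CPU resources for OCR (battery drain warning ⚠️)
- Generates a searchable PDF file from a regular PDF
- Optimizes PDF size, often producing a file smaller than the input
- Deskews and/or cleans images before running OCR
🔍 Deployment
- Runs on multiple operating systems: Linux, Windows, macOS …
- Supports brew install ocrmypdf, but you have to install the language packs yourself
- One-click install script for macOS (work in progress)
- Can be combined with Alfred / LaunchBar to build a Workflow
👀 PDF papers and documents without a text layer are a real pain. OCRmyPDF's recognition accuracy is not especially high, but once there is a text layer we can easily annotate the document.
Channel: @NewlearnerChannel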
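
The features listed in this post map onto a handful of `ocrmypdf` command-line options. The sketch below builds such a command from Python, assuming `ocrmypdf` is installed and on the PATH; the helper name `ocrmypdf_command` and the file names are made up for illustration, and `--clean` additionally assumes the optional `unpaper` program is available.

```python
import os


def ocrmypdf_command(src, dst, languages=("eng",), jobs=None):
    """Build an ocrmypdf call that deskews and cleans pages before OCR."""
    jobs = jobs or os.cpu_count() or 1  # default: use every available core
    return [
        "ocrmypdf",
        "--deskew",                  # straighten skewed scans
        "--clean",                   # clean pages before OCR (needs unpaper)
        "-l", "+".join(languages),   # Tesseract language codes, e.g. eng+chi_sim
        "--jobs", str(jobs),         # number of parallel workers
        src,
        dst,
    ]


# Hypothetical file names, for illustration only.
cmd = ocrmypdf_command("scan.pdf", "scan_ocr.pdf", languages=("eng", "chi_sim"), jobs=2)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Each language given to `-l` must have its Tesseract language data installed separately, which matches the caveat about `brew install ocrmypdf` above.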

djangoproject

@djangoproject · Post #375 · 07/07/2017, 07:57 PM

https://simpleisbetterthancomplex.com/2015/11/23/small-open-source-django-projects-to-get-started.html Small Open-Source Django Projects to Get Started Learning #Django and #Python can be very fun. I personally love programming with Python and for the most part, work with the Django framework. But in the beginning some stuff can be confusing, especially if you are coming from a Java or C♯ background, like me.