TGTGInsighttelegram intelligenceLIVE / telegram public index
← GitHub Trends

TGINSIGHT SIMILAR POSTS

Find similar content

Source channel @githubtrending · Post #15163 · Sep 24

#python#document_analysis#layout_analysis#ocr#parser#pdf#pdf_converter#pdf_parser#python#vlm_ocr Dolphin is a smart AI tool that can analyze and understand complex document images, like pages with text, tables, formulas, and pictures. It works in two steps: first, it figures out the layout and reading order of the page; then, it quickly parses each element using special prompts. This makes it fast and accurate for turning document images into structured data like JSON or Markdown. You can use pre-trained models and easy code to process single pages, PDFs, or specific elements. This helps you save time and effort when extracting information from complicated documents efficiently. https://github.com/bytedance/Dolphin

Results

1 similar post found

Search: #lxml

当前筛选 #lxml清除筛选
djangoproject

@djangoproject · Post #551 · 01/23/2018, 04:28 PM

http://lxml.de/ #lxml is the most feature-rich and easy-to-use library for processing #XML and #HTML in the Python language. The lxml XML toolkit is a Pythonic binding for the #C libraries #libxml2 and #libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python #API, mostly compatible but superior to the well-known ElementTree API. The latest release works with all #CPython versions from 2.6 to 3.6. See the introduction for more information about background and goals of the lxml project. Some common questions are answered in the FAQ.