
Text extraction from invoices
$1.0k
- Posted:
- Proposals: 5
- Remote
- #3191733
- OPPORTUNITY
- Awarded
Description
Experience Level: Expert
Work to do.
1. Description
We have bounding boxes (bboxes) that have been added to the invoices.
The bboxes below the table in the invoice move in the Y-direction, because the table with prices expands and contracts.
We have built a model that detects the table and its values.
This model can probably be improved.
The attached image shows an example of the problem.
The same supplier can send invoices with a different number of lines in the table. This means the text below the table (the subtotal, VAT and total) can end up on different pages.
As you can also see on the invoice, the lines themselves can grow and shrink in height.
We can define bounding boxes on a very large (multi-page) invoice, but the next invoice can be even bigger or very small (one page, for example).
The areas we need to extract therefore move up and down and can be on different pages, but we still need to detect the bbox, because we must be able to extract the data from those areas. So we must detect the text itself and use the bounding box.
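One simple way to "detect the text itself" is to anchor on the label words (Subtotal, VAT, Total) rather than on fixed coordinates, and read the amount on the same line to the right of the label, whichever page it lands on. The sketch below is only an illustration under assumptions: the word dictionaries (`text`, `page`, `left`, `top`), the `find_amount` helper, the amount regex and the `y_tol` tolerance are all hypothetical, and real OCR output would need to be mapped into this shape first.

```python
import re

# Assumed amount format: digits with '.' or ',' separators and two decimals.
AMOUNT = re.compile(r"^\d[\d.,]*[.,]\d{2}$")

def find_amount(words, label, y_tol=8):
    """Return (page, amount_text) for the nearest amount to the right of `label`.

    `words` is a hypothetical list of OCR word dicts with keys
    'text', 'page', 'left' and 'top' (pixel coordinates).
    """
    for w in words:
        # Exact label match, ignoring case and a trailing colon.
        if w["text"].lower().rstrip(":") != label.lower():
            continue
        # Candidates: same page, roughly the same line, right of the label.
        same_line = [
            c for c in words
            if c["page"] == w["page"]
            and abs(c["top"] - w["top"]) <= y_tol
            and c["left"] > w["left"]
            and AMOUNT.match(c["text"])
        ]
        if same_line:
            nearest = min(same_line, key=lambda c: c["left"])
            return w["page"], nearest["text"]
    return None

# Made-up sample: the totals block has slipped to page 3 of a long invoice.
words = [
    {"text": "Subtotal:", "page": 3, "left": 400, "top": 120},
    {"text": "1,000.00", "page": 3, "left": 520, "top": 121},
    {"text": "VAT", "page": 3, "left": 400, "top": 150},
    {"text": "200.00", "page": 3, "left": 520, "top": 149},
    {"text": "Total", "page": 3, "left": 400, "top": 180},
    {"text": "1,200.00", "page": 3, "left": 520, "top": 181},
]

total = find_amount(words, "Total")
```

Because the lookup is relative to the label, the same call works whether the totals sit on page 1 or page 30. Multi-word labels and suppliers that use different wording would need extra handling.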
Another problem is strings that exceed their bboxes. See the next two images.
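A possible way to handle strings that stick out of a bbox is to select whole OCR words by how much of each word overlaps the box, instead of cropping at the box edge, and then grow the box to fit the selected words. This is a minimal sketch under assumptions: boxes are `(x0, y0, x1, y1)` tuples in pixels, the field is a single line, and the names `overlap_ratio`, `words_for_box` and the `min_ratio` threshold are hypothetical.

```python
def overlap_ratio(word_box, box):
    """Fraction of the word's area that lies inside `box`."""
    wx0, wy0, wx1, wy1 = word_box
    bx0, by0, bx1, by1 = box
    iw = max(0, min(wx1, bx1) - max(wx0, bx0))  # intersection width
    ih = max(0, min(wy1, by1) - max(wy0, by0))  # intersection height
    area = (wx1 - wx0) * (wy1 - wy0)
    return (iw * ih) / area if area else 0.0

def words_for_box(words, box, min_ratio=0.3):
    """Return (text, grown_box) for words overlapping `box` by at least `min_ratio`.

    Words are kept whole even if they extend past the box edge, and the
    returned box is enlarged to cover them. Assumes a single-line field,
    so words are ordered left to right.
    """
    hits = [w for w in words if overlap_ratio(w["box"], box) >= min_ratio]
    if not hits:
        return "", box
    hits.sort(key=lambda w: w["box"][0])
    grown = (
        min([box[0]] + [w["box"][0] for w in hits]),
        min([box[1]] + [w["box"][1] for w in hits]),
        max([box[2]] + [w["box"][2] for w in hits]),
        max([box[3]] + [w["box"][3] for w in hits]),
    )
    return " ".join(w["text"] for w in hits), grown

# Made-up sample: "Logistics" runs past the right edge of the template box.
template_box = (100, 50, 200, 70)
ocr_words = [
    {"text": "Acme", "box": (105, 52, 150, 68)},
    {"text": "Logistics", "box": (155, 52, 230, 68)},
    {"text": "Page", "box": (400, 52, 440, 68)},
]

text, grown_box = words_for_box(ocr_words, template_box)
```

Note the limitation: a word that lies entirely outside the original box is not picked up, so a long string would need the selection repeated against the grown box, or a line-level grouping step.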
These are the areas we must solve ASAP -> Weekend work
2. Skills
Python
MySQL
5+ years
vision
text extraction from bounding boxes
OpenCV, NLP, spaCy, regex, Tesseract, OCR, PDFXML, TableNet, DeepDeSRT, graph neural networks, GANs and genetic algorithms
A simpler approach is also acceptable.
You must have done something similar before, know what regex and Tesseract are, and have used them several times.
You have worked with vision, ML, DL or NN.
The price is a placeholder.
Robert W.
100% (3)
Projects Completed: 3
Freelancers worked with: 3
Projects awarded: 17%
Last project: 10 Jul 2021
United Kingdom
Clarification Board
Question: The attached images are missing.
Robert W. (08 Mar 2021): You get those when you apply for the project.