Create a research document detailing how to OCR one line of text using pure Java only.

  • Remote
  • #971
  • Archived

Description

Experience Level: Intermediate
The project is, first of all, to do some research online and produce a description of how the code I want could be written. If this is done well, there may be an opportunity to go on to do the actual coding as a separate project.

So, to start with, I want someone to do some research online and come up with useful notes on how you could go about writing an open source, pure Java program (i.e. with no reliance on Windows- or Linux-specific native libraries) that can perform OCR (optical character recognition) on the attached file called one_line_of_text.bmp. If it's easier, the file can be converted to TIFF or some other format first.
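As a possible starting point for the notes, here is a minimal sketch of the first stage. It assumes only the standard `javax.imageio` package, which ships with the JDK and so avoids any Windows- or Linux-specific libraries. `ImageIO` reads BMP files directly (and TIFF as well from Java 9 onward), so a format conversion may not be needed at all. The class name `LoadLine` and the luminance weighting are illustrative choices and are not taken from ZXing or Tesseract.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class LoadLine {

    /** Reads an image with the JDK's built-in ImageIO readers (BMP is supported out of the box). */
    static BufferedImage load(File file) throws IOException {
        BufferedImage img = ImageIO.read(file);
        if (img == null) {
            throw new IOException("No ImageIO reader could decode " + file);
        }
        return img;
    }

    /** Converts any BufferedImage to a [height][width] array of 0..255 luminance values. */
    static int[][] toLuminance(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        int[][] lum = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y); // packed ARGB in the default sRGB color model
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                // Integer approximation of the ITU-R BT.601 weights (0.299, 0.587, 0.114).
                lum[y][x] = (299 * r + 587 * g + 114 * b) / 1000;
            }
        }
        return lum;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage img;
        if (args.length > 0) {
            img = load(new File(args[0])); // e.g. one_line_of_text.bmp
        } else {
            // No file given: build a tiny synthetic image (white background, one black pixel).
            img = new BufferedImage(4, 2, BufferedImage.TYPE_INT_RGB);
            for (int y = 0; y < 2; y++)
                for (int x = 0; x < 4; x++)
                    img.setRGB(x, y, 0xFFFFFF);
            img.setRGB(1, 0, 0x000000);
        }
        int[][] lum = toLuminance(img);
        System.out.println(lum[0].length + "x" + lum.length
                + " pixel(0,0)=" + lum[0][0] + " pixel(1,0)=" + lum[0][1]);
        // with no argument this prints: 4x2 pixel(0,0)=255 pixel(1,0)=0
    }
}
```

Run it as `java LoadLine one_line_of_text.bmp` once the .txt ending has been removed from the attachment. Converting to a plain luminance array early keeps the later stages independent of the file format.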

The following open source code on Google Code (ZXing, a barcode-scanning library) is pure Java; it reads in an image (on a phone) and can analyze it:

http://zxing.googlecode.com/svn/trunk/core/src/com/google/zxing/

Some of this may be useful in this implementation.
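ZXing's `common` package includes binarizers (for example `GlobalHistogramBinarizer` and `HybridBinarizer`) that turn a grayscale image into black and white pixels before analysis, and OCR would need a similar step. The sketch below uses Otsu's global thresholding method rather than ZXing's code. It assumes the image is a `[height][width]` array of 0..255 luminance values and that the text is darker than the background. A single global threshold is usually adequate for a clean scan of one line, while unevenly lit images would call for a block-based local method such as the one `HybridBinarizer` uses.

```java
public class OtsuBinarizer {

    /**
     * Otsu's method: returns the threshold t (0..255) that maximizes the between-class
     * variance of "dark" pixels (value <= t) and "light" pixels (value > t).
     * Assumes every value is in 0..255; if all pixels share one value, 0 is returned.
     */
    static int otsuThreshold(int[][] lum) {
        int[] hist = new int[256];
        long total = 0, sumAll = 0;
        for (int[] row : lum) {
            for (int v : row) {
                hist[v]++;
                total++;
                sumAll += v;
            }
        }
        long wDark = 0, sumDark = 0;
        double bestVar = -1;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            wDark += hist[t];
            sumDark += (long) t * hist[t];
            long wLight = total - wDark;
            if (wDark == 0) continue; // no dark pixels yet
            if (wLight == 0) break;   // every pixel is dark: no split left to test
            double meanDark = (double) sumDark / wDark;
            double meanLight = (double) (sumAll - sumDark) / wLight;
            double diff = meanDark - meanLight;
            double between = (double) wDark * wLight * diff * diff;
            if (between > bestVar) {
                bestVar = between;
                best = t;
            }
        }
        return best;
    }

    /** Marks pixels at or below the threshold as ink (assumes dark text on a light background). */
    static boolean[][] binarize(int[][] lum, int threshold) {
        boolean[][] ink = new boolean[lum.length][];
        for (int y = 0; y < lum.length; y++) {
            ink[y] = new boolean[lum[y].length];
            for (int x = 0; x < lum[y].length; x++) ink[y][x] = lum[y][x] <= threshold;
        }
        return ink;
    }

    public static void main(String[] args) {
        // Tiny hypothetical luminance image: 10 = dark ink, 200 = light paper.
        int[][] lum = {
            {10, 10, 200, 200},
            {10, 200, 200, 200},
        };
        int t = otsuThreshold(lum);
        System.out.println("threshold = " + t); // prints "threshold = 10"
        for (boolean[] row : binarize(lum, t)) {
            StringBuilder sb = new StringBuilder();
            for (boolean b : row) sb.append(b ? 'X' : '.');
            System.out.println(sb); // prints "XX.." then "X..."
        }
    }
}
```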

The following open source C/C++ code (Tesseract) does a good job of OCR on this BMP file:

http://tesseract-ocr.googlecode.com/svn/trunk/

Here is the output showing it successfully processing the text in the BMP file:

D:\tesseract_ocr>tesseract.exe one_line_of_text.bmp one_line_of_text.bmp.output
D:\tesseract_ocr>more one_line_of_text.bmp.output.txt
P.O. Box 5592 Northampton NN4 1ZY


D:\Projects\delme\tesseract_ocr>

Could some of this code be converted to Java for use in this project?
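Before classifying anything, Tesseract splits the line into candidate characters, working from connected components ("blobs"). Porting that analysis would be a substantial job. A much simpler stand-in, sketched below, is a vertical projection profile: split a cleanly printed line wherever a column contains no ink at all. This is an assumption-laden sketch and not Tesseract's algorithm. It cannot separate touching characters, and the `wordGap` parameter (the minimum number of blank columns treated as a space) is a hypothetical value that would need tuning against the attached image.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnSegmenter {

    /** Builds an ink bitmap from ASCII rows, where 'X' marks an ink pixel (for illustration only). */
    static boolean[][] fromAscii(String... rows) {
        boolean[][] ink = new boolean[rows.length][rows[0].length()];
        for (int y = 0; y < rows.length; y++)
            for (int x = 0; x < rows[y].length(); x++)
                ink[y][x] = rows[y].charAt(x) == 'X';
        return ink;
    }

    /** Returns half-open column ranges {start, end}, one per run of columns that contain ink. */
    static List<int[]> segment(boolean[][] ink) {
        int h = ink.length, w = ink[0].length;
        List<int[]> spans = new ArrayList<>();
        int start = -1; // first column of the glyph being tracked, or -1 if none
        for (int x = 0; x < w; x++) {
            boolean hasInk = false;
            for (int y = 0; y < h && !hasInk; y++) hasInk = ink[y][x];
            if (hasInk && start < 0) {
                start = x;                       // a glyph begins
            } else if (!hasInk && start >= 0) {
                spans.add(new int[] {start, x}); // the glyph ended at column x - 1
                start = -1;
            }
        }
        if (start >= 0) spans.add(new int[] {start, w}); // glyph touching the right edge
        return spans;
    }

    /** Renders '#' per glyph, with ' ' where the blank gap before it is at least wordGap columns. */
    static String layout(List<int[]> spans, int wordGap) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < spans.size(); i++) {
            if (i > 0 && spans.get(i)[0] - spans.get(i - 1)[1] >= wordGap) sb.append(' ');
            sb.append('#');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        boolean[][] ink = fromAscii(
            "X.X...XX.",
            "X.X...XX.");
        List<int[]> spans = segment(ink);
        for (int[] s : spans) System.out.println("glyph columns " + s[0] + ".." + (s[1] - 1));
        System.out.println(layout(spans, 3)); // prints "## #"
    }
}
```

In the real program, the input would be the binarized bitmap of one_line_of_text.bmp rather than an ASCII picture.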

The end result would be open sourced if the coding went ahead.

I estimate that this initial stage (doing the research and writing the notes) would take half a day to a full day. The notes should be in plain text, RTF or Word format, with URL references to everything found on the internet and headings between sections. If I were doing it myself, I would expect to produce roughly 2 to 5 pages of notes, including excerpts. The person doing the work should have a good understanding of Java and considerable experience using it, and should be used to working with, and ideally modifying, open source code. The notes could include some ideas on how the program would be created, based on the research done online and on the code, tools or pure Java libraries found.
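One idea the notes could explore for the recognition step is nearest-neighbor template matching: resample each segmented glyph to a fixed grid and pick the reference character whose grid differs in the fewest cells. This deliberately simple technique is offered as an alternative and is not what Tesseract does. The 8x8 grid size and the ASCII reference glyphs below are hypothetical. In a real program, the templates could be produced by drawing each character with `java.awt.Graphics2D` into a `BufferedImage`, which keeps everything in pure Java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TemplateMatcher {

    static final int GRID = 8; // hypothetical normalized glyph size: GRID x GRID cells

    /** Builds an ink bitmap from ASCII rows, where 'X' marks an ink pixel (for illustration only). */
    static boolean[][] fromAscii(String... rows) {
        boolean[][] ink = new boolean[rows.length][rows[0].length()];
        for (int y = 0; y < rows.length; y++)
            for (int x = 0; x < rows[y].length(); x++)
                ink[y][x] = rows[y].charAt(x) == 'X';
        return ink;
    }

    /** Crops columns [x0, x1) to their inked rows and resamples to GRID x GRID (nearest neighbor). */
    static boolean[][] toGrid(boolean[][] ink, int x0, int x1) {
        int top = -1, bottom = -1;
        for (int y = 0; y < ink.length; y++)
            for (int x = x0; x < x1; x++)
                if (ink[y][x]) {
                    if (top < 0) top = y;
                    bottom = y;
                }
        boolean[][] grid = new boolean[GRID][GRID];
        if (top < 0) return grid; // no ink: empty grid
        int h = bottom - top + 1, w = x1 - x0;
        for (int gy = 0; gy < GRID; gy++)
            for (int gx = 0; gx < GRID; gx++)
                grid[gy][gx] = ink[top + gy * h / GRID][x0 + gx * w / GRID];
        return grid;
    }

    static boolean[][] toGrid(boolean[][] ink) {
        return toGrid(ink, 0, ink[0].length);
    }

    /** Number of cells where two grids disagree (Hamming distance). */
    static int distance(boolean[][] a, boolean[][] b) {
        int d = 0;
        for (int y = 0; y < GRID; y++)
            for (int x = 0; x < GRID; x++)
                if (a[y][x] != b[y][x]) d++;
        return d;
    }

    /** Returns the template character closest to the glyph, or '?' if there are no templates. */
    static char classify(boolean[][] glyph, Map<Character, boolean[][]> templates) {
        char best = '?';
        int bestDist = Integer.MAX_VALUE;
        for (Map.Entry<Character, boolean[][]> e : templates.entrySet()) {
            int d = distance(glyph, e.getValue());
            if (d < bestDist) {
                bestDist = d;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<Character, boolean[][]> templates = new LinkedHashMap<>();
        templates.put('I', toGrid(fromAscii("XXX", ".X.", ".X.", "XXX")));
        templates.put('L', toGrid(fromAscii("X..", "X..", "X..", "XXX")));

        // A larger 'L' than the template, as might come out of the segmentation step.
        boolean[][] glyph = toGrid(fromAscii("X...", "X...", "X...", "X...", "X...", "XXXX"));
        System.out.println("recognized: " + classify(glyph, templates)); // prints "recognized: L"
    }
}
```

Because each glyph is stretched to fill the grid, its original size and vertical position are lost, so characters such as '.' and 'l' in a line like the sample address could be confused. A fuller design would keep the bounding box as an extra feature and use templates in a font close to the one in the image.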

(NOTE: the attached file has a .txt extension because uploading BMP files was not supported on this site. Just download the file and remove the .txt extension.)
