
OCR manual proofreading - accuracy paramount
£200 (approx. $268)
- Proposals: 15
- Remote
- #4482519
- Awarded
Description
Experience Level: Expert
Overview
Two weeks ago I hired someone for this task and the work was unusable.
Do not apply unless you:
can follow detailed instructions exactly
are willing to manually check your work
Data
Raw folders (top link) + PDFs (bottom link, same material, lower res):
https://www.dropbox.com/scl/fo/ixwakdbgdhodiq9sqlbk6/AAuJLgrYLb2WB1a--E_b76A?rlkey=y0f9poa3t1rpalmemh2qljs2n&st=yp5f8q0t&dl=0
https://www.dropbox.com/scl/fo/0hffxzzltdbx7z0koew3a/AFCl98iUlBpUN180vatAaXI?rlkey=0m1uzuig0hg1xj2ugg5ns2rtp&st=xj6g51u1&dl=0
Poor OCR that needs correcting (please note that not every folder has OCR output, because the previous freelancer did such a bad job):
https://www.dropbox.com/scl/fo/w8bg7k6y9xkq30thxipda/AA2hhtWQVkfdVHBvbnxxpYM?rlkey=cxbgqy0a5r5gqyl0ibh5r5uzb&st=ydh6jeqb&dl=0
Payment Structure
£30 initial deposit
Deliver 10 complete .txt files (one per folder) as a sample
Work will be reviewed for accuracy before continuing
OCR Transcription Instructions (Mandatory)
1. File Structure
One folder = one .txt file
Include all documents in that folder
2. Document Format (Exact)
==============================
ARCHIVE_FOLDER: TBUK1
DOCUMENT_ID: TBUK1_01
DATE: (exactly as written or Unknown)
PLACE: (exactly as written or Not stated)
Then the transcription.
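
For anyone preparing the files, the header block above could be generated rather than typed by hand, which avoids typos in the field names. The following is only a sketch: the function name `format_header` and the two-digit zero padding (inferred from the TBUK1_01 example) are assumptions, and the DATE and PLACE values must still be copied by hand from the document.

```python
# Separator line copied verbatim from the format shown above.
SEPARATOR = "=============================="


def format_header(folder: str, number: int,
                  date: str = "Unknown", place: str = "Not stated") -> str:
    """Build the header block for one document.

    `date` and `place` must be passed exactly as written in the source
    document; the defaults are the fallback values the brief specifies.
    """
    # Assumption: numbers are zero-padded to two digits, as in TBUK1_01.
    doc_id = f"{folder}_{number:02d}"
    return "\n".join([
        SEPARATOR,
        f"ARCHIVE_FOLDER: {folder}",
        f"DOCUMENT_ID: {doc_id}",
        f"DATE: {date}",
        f"PLACE: {place}",
    ])


print(format_header("TBUK1", 1))
```

The transcription text then follows the header directly, as described above.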
3. Numbering
Sequential: TBUK1_01, TBUK1_02, etc.
No skipping
No restarting within folder
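
The numbering rules above lend themselves to a quick self-check before delivery. The sketch below is a hypothetical helper (`check_numbering` is not part of the brief) that scans a finished .txt file for DOCUMENT_ID lines and reports skipped or restarted numbers. It only checks the sequence and does not replace manually checking the transcription itself.

```python
import re


def check_numbering(text: str, folder: str) -> list[str]:
    """Return a list of problems found in the DOCUMENT_ID sequence."""
    pattern = re.compile(r"^DOCUMENT_ID: (\S+)_(\d+)[ \t]*$", re.MULTILINE)
    problems = []
    expected = 1
    for match in pattern.finditer(text):
        prefix, number = match.group(1), int(match.group(2))
        if prefix != folder:
            problems.append(f"wrong folder prefix: {match.group(0).strip()}")
        if number != expected:
            problems.append(
                f"expected {folder}_{expected:02d}, found {prefix}_{number:02d}"
            )
        # Continue from whatever was found so one gap is reported only once.
        expected = number + 1
    return problems


sample = "DOCUMENT_ID: TBUK1_01\n...\nDOCUMENT_ID: TBUK1_03\n"
print(check_numbering(sample, "TBUK1"))
```

An empty list means the sequence starts at 01 and runs without gaps or restarts. A file with no DOCUMENT_ID lines at all also returns an empty list, so that case needs a separate look.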
4. Multi-page Documents
A document may span multiple pages or PDFs
It must remain ONE DOCUMENT_ID
5. Transcription Rules
You must:
copy text exactly
preserve paragraphs and headings
You must not:
correct spelling
rewrite
summarise
clean or improve the text
6. Illegible Text
Use: [illegible] or [illegible text]
Do not guess
7. Remove Only
page numbers
obvious scan artefacts
8. Duplicates
duplicate pages → transcribe once
overlapping scans → remove repetition
9. Diagrams
Do not transcribe drawings.
Use:
HAND-DRAWN ENGINEERING DRAWING
10. Metadata
DATE: exact or Unknown
PLACE: exact or Not stated
do not infer
11. Output
.txt only
UTF-8 encoding
no formatting
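
The output requirements above (.txt, UTF-8) can be verified mechanically. This is a minimal sketch, assuming the files are written and checked with Python; the helper names `write_transcription` and `is_valid_utf8` and the file name `TBUK1.txt` are illustrative only.

```python
from pathlib import Path


def write_transcription(path: Path, text: str) -> None:
    """Write the transcription as UTF-8 bytes, with no newline translation."""
    path.write_bytes(text.encode("utf-8"))


def is_valid_utf8(path: Path) -> bool:
    """Return True if the whole file decodes cleanly as UTF-8."""
    try:
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    out = Path("TBUK1.txt")  # hypothetical output file, one per folder
    write_transcription(out, "ARCHIVE_FOLDER: TBUK1\n")
    print(is_valid_utf8(out))
```

A file saved in a legacy encoding by a word processor or OCR tool would typically fail this check as soon as it contains non-ASCII characters such as £ or accented letters.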
Sasha W.
100% (16)
Projects Completed: 20
Freelancers worked with: 18
Projects awarded: 44%
Last project: 13 Mar 2026
United Kingdom
Clarification Board

Hi Sasha, could you confirm the total volume after the 10-file sample - either folder count plus approximate page/image count, or how much of the archive the £200 is meant to cover? Also, after the sample is approved, will the remaining work be released as funded milestones before continuation?