
I need a database expert to extract text from a PDF to plaintext
$440
- Posted:
- Proposals: 42
- Remote
- #3858122
- OPPORTUNITY
- Expired
Description
Experience Level: Expert
I have a PDF file of a book in Italian (748 pages) with 50 chapters (average of 14 pages per chapter), from which I need some (not all) text on the page to be extracted (copied & pasted) into 50 plaintext files, one sentence per line, one txt file per chapter.
I will provide you with 1 PDF file in Italian, and you will need to provide me with 50 plaintext files (.txt) containing the text from the 50 chapters, one sentence per line.
Important: you will need to pay attention to the accents as they are important in Italian - "é" is not the same as "e"; all accents from the original PDF need to be preserved in the plaintext files.
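One way to guard against the accent problem, offered only as a minimal sketch: some PDF extraction tools return an accented letter as a plain letter followed by a separate combining mark, which can look correct on screen but is not the same string as the single accented character. The example below assumes the text has already been extracted into a Python string; the function names `normalize_accents` and `has_combining_marks` are hypothetical helpers, not part of any tool named in this posting.

```python
import unicodedata


def normalize_accents(text: str) -> str:
    """Compose base letter + combining mark pairs into single characters (NFC)."""
    return unicodedata.normalize("NFC", text)


def has_combining_marks(text: str) -> bool:
    """Report whether any standalone combining marks remain in the text."""
    return any(unicodedata.combining(ch) for ch in text)


raw = "perche\u0301"            # "e" followed by a combining acute accent
clean = normalize_accents(raw)  # the same word with a single precomposed "é"
```

Saving each file with an explicit `encoding="utf-8"` keeps the accented characters intact on disk.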
A sample page is attached - the parts highlighted in yellow need to be extracted, while the rest of the text (the phonetic spelling underneath every line of Italian) is to be ignored. Not all pages have the phonetic spelling, in which case all of the text needs to be extracted.
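The "one sentence per line" requirement could be approached along these lines. This is a rough sketch under stated assumptions: the Italian text has already been extracted and the phonetic lines already removed, and a sentence is taken to end at `.`, `!`, `?` or `…` followed by whitespace and an uppercase letter or opening quote. The function name `one_sentence_per_line` is hypothetical.

```python
import re

# Sentence-final punctuation, optional closing quotes or brackets, then whitespace,
# followed by an uppercase letter (accented ones included) or an opening quote.
_BOUNDARY = re.compile(r'([.!?…]+["»”)]*)\s+(?=[A-ZÀ-ÖØ-Þ«"“(])')


def one_sentence_per_line(text: str) -> str:
    """Return the text with each sentence on its own line."""
    # Collapse line breaks and runs of spaces left over from the page layout.
    flat = " ".join(text.split())
    # Replace the whitespace after each sentence boundary with a newline.
    return _BOUNDARY.sub(r"\1\n", flat)


sample = "Che ore sono? È\ntardi. Andiamo a casa."
lines = one_sentence_per_line(sample).split("\n")
```

A rule this simple will split wrongly after abbreviations such as "Sig." when a capitalised name follows, so the output would still need a manual check against the PDF.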
Candice N.
0% (0) Projects Completed
-
Freelancers worked with
-
Projects awarded
0%
Last project
10 May 2026
United Kingdom
Clarification Board
-

Please send me the PDF file. I will check it first and then let you know.
-

Can you please attach a sample so that we can see the work involved?
-

Hello Candice,
The attachment was missing. Are the pages in the PDF file saved as images or text?
Regards,