
PDF Scraping and Data Review Interface (MVP for AI application)
£155 (approx. $209)
- Posted:
- Proposals: 16
- Remote
- #4489858
- OPPORTUNITY
- Awarded

Description
Experience Level: Entry
We require a simple MVP application for Quark to test the first stage of a larger document-processing and AI project.
The application must scrape PDF files from a given web link, save them into a dedicated folder on our VPS, and catalogue the file details in a database. It must also run a local service on the VPS to extract text from the downloaded PDFs, clean that text, add basic metadata, and store the processed output in a database.
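As an illustration of the first step (finding the PDFs behind a given web link), here is a minimal sketch using only the Python standard library. It assumes the PDFs are exposed as ordinary `<a href>` links whose path ends in `.pdf`; the names `PdfLinkParser` and `find_pdf_links` are hypothetical, and fetching the page and downloading the files are deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> targets whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Check the path only, so query strings such as ?v=2 are ignored.
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in self.pdf_urls:
            self.pdf_urls.append(absolute)


def find_pdf_links(html, base_url):
    """Return de-duplicated PDF URLs found in the page, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls
```

Sites that serve PDFs from URLs without a `.pdf` suffix, or that build their links with JavaScript, would need a different approach, such as checking the `Content-Type` header of each response.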
A simple internal UI is required so we can inspect the extracted text, metadata, and file records manually in order to assess and improve the quality of the extraction process.
The core requirements are:
- scrape PDF files from a specified web link;
- save downloaded PDFs into a dedicated folder structure;
- record file details in a database, including source URL, file name, local file path, download date, file size, processing status, and any errors;
- extract text from each PDF;
- clean and normalise the extracted text;
- store extracted text and metadata in the database;
- include basic metadata such as source URL, access date, document title if available, document date if available, file size, page count if available, and extraction status;
- provide a simple internal interface to view downloaded files, extracted text, metadata, and failed or incomplete processing.
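The catalogue fields listed above could be held in a single table. The sketch below uses SQLite purely as an assumption, since the brief does not name a database; the table name, column names, status values and helper functions are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# One row per downloaded PDF; columns follow the fields named in the brief.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pdf_files (
    id              INTEGER PRIMARY KEY,
    source_url      TEXT NOT NULL UNIQUE,
    file_name       TEXT NOT NULL,
    local_path      TEXT NOT NULL,
    downloaded_at   TEXT NOT NULL,      -- ISO 8601, doubles as access date
    file_size_bytes INTEGER,
    status          TEXT NOT NULL DEFAULT 'downloaded',
    error           TEXT,
    title           TEXT,               -- NULL when not available
    document_date   TEXT,               -- NULL when not available
    page_count      INTEGER,            -- NULL when not available
    extracted_text  TEXT
)
"""


def open_catalogue(path=":memory:"):
    """Open (or create) the catalogue database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    return conn


def record_download(conn, source_url, file_name, local_path, file_size_bytes):
    """Insert a row for a newly downloaded file and return its id."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:
        cur = conn.execute(
            "INSERT INTO pdf_files (source_url, file_name, local_path,"
            " downloaded_at, file_size_bytes) VALUES (?, ?, ?, ?, ?)",
            (source_url, file_name, local_path, now, file_size_bytes),
        )
    return cur.lastrowid


def record_extraction(conn, file_id, text=None, page_count=None, title=None, error=None):
    """Store the extraction result, marking the row 'extracted' or 'failed'."""
    status = "failed" if error else "extracted"
    with conn:
        conn.execute(
            "UPDATE pdf_files SET status = ?, error = ?, extracted_text = ?,"
            " page_count = ?, title = ? WHERE id = ?",
            (status, error, text, page_count, title, file_id),
        )


def failed_files(conn):
    """Rows the review interface would list as failed."""
    return conn.execute("SELECT * FROM pdf_files WHERE status = 'failed'").fetchall()
```

A review interface could then be a thin read-only layer over queries such as `failed_files`, which keeps the extraction service and the UI independent of each other.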
This is a test / proof-of-concept build, not a full production system. The immediate objective is to prove a working pipeline for scraping, storing, extracting, cataloguing, and reviewing PDF content.
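For the "clean and normalise" stage of that pipeline, a minimal standard-library sketch might look like the following. The specific rules (NFKC normalisation, re-joining words hyphenated across line breaks, collapsing whitespace) are assumptions about what "clean" means here rather than anything the brief specifies, and `clean_text` is a hypothetical name.

```python
import re
import unicodedata


def clean_text(raw):
    """Normalise text extracted from a PDF into a tidy plain-text form."""
    # Fold ligatures, non-breaking spaces and similar compatibility forms.
    text = unicodedata.normalize("NFKC", raw)
    # Use "\n" as the only line ending.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Re-join words split by a hyphen at the end of a line.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop control characters (form feeds etc.) but keep newlines and tabs.
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Collapse runs of spaces and tabs, then trim each line.
    text = re.sub(r"[ \t]+", " ", text)
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Keep at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

The de-hyphenation rule is a heuristic: it also joins genuine compounds that happen to break at a line end, which is the kind of error the manual review step is meant to surface.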
This MVP is intended as a precursor to a larger AI-based platform, so experience with document processing, structured data extraction, and AI-related workflows will be an advantage and may lead to ongoing development work.
Robert B.
100% (13)
Projects Completed: 7
Freelancers worked with: 6
Projects awarded: 65%
Last project: 13 Apr 2026
United Kingdom
Clarification Board

Hi Robert,
What are some example websites?
And which PDFs or data need to be scraped from them?
Thanks