PDF Scraping and Data Review Interface (MVP for AI application)

Fixed Price

£155 (approx. $206)

Posted: 3 months ago
Proposals: 14
Remote
#4489858
OPPORTUNITY
Awarded

Description

Experience Level: Entry

We require a simple MVP application for Quark to test the first stage of a larger document-processing and AI project.

The application must scrape PDF files from a given web link, save them into a dedicated folder on our VPS, and catalogue the file details in a database. It must also run a local service on the VPS to extract text from the downloaded PDFs, clean that text, add basic metadata, and store the processed output in a database.
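
As a rough illustration of the scraping step, the sketch below collects PDF links from a page's HTML using only the Python standard library. It assumes the PDFs are exposed as ordinary `<a href>` links whose path ends in `.pdf`, which the brief does not confirm. The names `PdfLinkParser` and `find_pdf_links` are hypothetical, and fetching the page and downloading the files are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> links whose path ends in .pdf."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.pdf_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Check only the path, so query strings such as ?v=2 are ignored.
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in self.pdf_urls:
            self.pdf_urls.append(absolute)


def find_pdf_links(html: str, base_url: str) -> list[str]:
    """Return de-duplicated PDF URLs in the order they appear in the page."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls
```

Pages that build their links with JavaScript, or that serve PDFs from URLs without a `.pdf` suffix, would need a different approach, such as checking the `Content-Type` header of each response.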

A simple internal UI is required so we can inspect the extracted text, metadata, and file records manually in order to assess and improve the quality of the extraction process.

The core requirements are:

scrape PDF files from a specified web link;
save downloaded PDFs into a dedicated folder structure;
record file details in a database, including source URL, file name, local file path, download date, file size, processing status, and any errors;
extract text from each PDF;
clean and normalise the extracted text;
store extracted text and metadata in the database;
include basic metadata such as source URL, access date, document title if available, document date if available, file size, page count if available, and extraction status;
provide a simple internal interface to view downloaded files, extracted text, metadata, and failed or incomplete processing.
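
One possible shape for the file catalogue is sketched below using SQLite from the Python standard library. The brief does not name a database engine, so SQLite, the table name `pdf_files`, the column names, and the helper functions are all assumptions made for illustration. The columns mirror the fields listed above.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per downloaded PDF.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pdf_files (
    id              INTEGER PRIMARY KEY,
    source_url      TEXT NOT NULL UNIQUE,
    file_name       TEXT NOT NULL,
    local_path      TEXT NOT NULL,
    downloaded_at   TEXT NOT NULL,
    file_size_bytes INTEGER,
    status          TEXT NOT NULL DEFAULT 'downloaded',
    error           TEXT,
    title           TEXT,
    document_date   TEXT,
    page_count      INTEGER,
    extracted_text  TEXT
)
"""


def open_catalogue(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the catalogue database and ensure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    conn.execute(SCHEMA)
    return conn


def record_download(conn, source_url, file_name, local_path, file_size_bytes):
    """Insert a row for a freshly downloaded file and return its id."""
    downloaded_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO pdf_files "
            "(source_url, file_name, local_path, downloaded_at, file_size_bytes) "
            "VALUES (?, ?, ?, ?, ?)",
            (source_url, file_name, local_path, downloaded_at, file_size_bytes),
        )
    return cur.lastrowid


def mark_failed(conn, file_id, error):
    """Flag a file as failed and keep the error message for later review."""
    with conn:
        conn.execute(
            "UPDATE pdf_files SET status = 'failed', error = ? WHERE id = ?",
            (error, file_id),
        )


def failed_files(conn):
    """Rows the review interface could list as failed processing."""
    return conn.execute(
        "SELECT * FROM pdf_files WHERE status = 'failed' ORDER BY id"
    ).fetchall()
```

Keeping the status and error on the same row as the file record means the review interface can list failed or incomplete items with a single query. A larger build might split the extracted text into its own table.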

This is a test / proof-of-concept build, not a full production system. The immediate objective is to prove a working pipeline for scraping, storing, extracting, cataloguing, and reviewing PDF content.
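
For the "clean and normalise" stage of that pipeline, a minimal sketch might look like the following. The brief does not define which cleaning rules are wanted, so every rule here is an assumption, and `clean_text` is a hypothetical name. The input is taken to be raw text already produced by whichever PDF extraction library is chosen.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Apply a few conservative normalisation rules to extracted PDF text."""
    # Fold compatibility characters: ligatures become plain letters and
    # non-breaking spaces become ordinary spaces.
    text = unicodedata.normalize("NFKC", raw)
    # Use \n for every line ending, and treat form feeds as line breaks.
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\f", "\n")
    # Rejoin words hyphenated across a line break. This can also merge
    # genuine compounds that happen to break at the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs, then trim spaces around newlines.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r" ?\n ?", "\n", text)
    # Keep at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Storing the raw extraction alongside the cleaned version would make it easier to judge in the review interface whether a rule is helping or harming quality.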

This MVP is intended as a precursor to a larger AI-based platform, so experience with document processing, structured data extraction, and AI-related workflows will be an advantage and may lead to ongoing development work.

Clarification Board

20 Apr 2026

Hi Robert,
Could you share some example websites?
And which PDFs or data need to be scraped from them?
Thanks

Robert B.