Document Extraction & AI Query Platform (second stage)

Fixed Price

£210 (approx. $284)

Posted: 31 minutes ago
Proposals: 15
Remote
#4492501
OPPORTUNITY
Open for Proposals

Description

Experience Level: Intermediate

Overview

We are building a system that collects and analyses documents from UK council websites.

Stage 1 has already been completed and is working. It successfully:

Scrapes a council website
Identifies and downloads document files (primarily PDFs)
Stores those files in a structured format
Extracts basic text for inspection

Stage 2 is to build on this foundation and develop a scalable backend system that can operate across multiple councils, organise documents, extract useful content, and enable AI-based querying of that data.

Scope

2A(i) – Scraping & Document System

Develop the existing scraper into a system that can:

Explore council websites and locate documents across multiple sections
Download and store documents in an organised and structured way
Track documents over time (new, existing, changed, duplicate)
Categorise documents (e.g. minutes, agendas, policies)
Extract basic information (titles, dates, sections where possible)
Provide clear visibility of what has been found, stored, and processed
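
For illustration only, the document tracking described above (new, existing, changed, duplicate) could be approached with content hashes. This is a minimal sketch, not a description of how Stage 1 works: the `DocumentIndex` class, its in-memory dictionaries, and the `example.gov.uk` URLs in the usage note are all hypothetical, and a real system would persist this state in a database.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DocumentIndex:
    """Hypothetical in-memory index; a real system would persist this."""
    hash_by_url: dict[str, str] = field(default_factory=dict)
    urls_by_hash: dict[str, set[str]] = field(default_factory=dict)

    def classify(self, url: str, content: bytes) -> str:
        """Return 'new', 'existing', 'changed' or 'duplicate', then record the file."""
        digest = hashlib.sha256(content).hexdigest()
        previous = self.hash_by_url.get(url)
        if previous == digest:
            status = "existing"      # same URL, same bytes as last time
        elif previous is not None:
            status = "changed"       # same URL, different bytes
            self.urls_by_hash[previous].discard(url)
        elif digest in self.urls_by_hash:
            status = "duplicate"     # unseen URL, bytes already stored elsewhere
        else:
            status = "new"           # unseen URL, unseen bytes
        self.hash_by_url[url] = digest
        self.urls_by_hash.setdefault(digest, set()).add(url)
        return status
```

Calling `classify("https://example.gov.uk/a.pdf", data)` on each crawl would then give a per-document status that can be logged or exposed for monitoring.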

2A(ii) – Multi-Council Validation

Extend the system from a single working example to at least 3 different council websites
Demonstrate that it adapts to different website structures

2B – Document Processing & Structuring

Extract readable text from documents
Clean and structure the content
Break documents into smaller usable sections
Link all extracted content back to its source
Prepare the data for both keyword and semantic search
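
As a rough illustration of breaking documents into smaller sections while keeping the link back to the source, the sketch below groups paragraphs into chunks. The `Chunk` type, the `chunk_text` function, and the 800-character default are assumptions made for the example, not part of the existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source_url: str  # where the document was downloaded from
    index: int       # position of the chunk within the document
    text: str


def chunk_text(source_url: str, text: str, max_chars: int = 800) -> list[Chunk]:
    """Group paragraphs into chunks of roughly max_chars, keeping the source link.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    # Paragraphs are assumed to be separated by blank lines; inner whitespace is collapsed.
    paragraphs = [" ".join(p.split()) for p in text.split("\n\n")]
    chunks: list[Chunk] = []
    current = ""
    for para in paragraphs:
        if not para:
            continue
        if current and len(current) + 1 + len(para) > max_chars:
            chunks.append(Chunk(source_url, len(chunks), current))
            current = para
        else:
            current = f"{current}\n{para}" if current else para
    if current:
        chunks.append(Chunk(source_url, len(chunks), current))
    return chunks
```

Each chunk could then be indexed for keyword search and embedded for semantic search, with `source_url` and `index` carried along as the reference.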

2C – AI Query Capability

Accept natural language questions about council documents
Use AI to identify and retrieve relevant content
Return clear answers grounded in the documents
Include references to source material
Indicate when no reliable answer is available
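
To make the last two points concrete, here is a minimal sketch of returning source references and signalling when no reliable answer exists. It uses plain keyword overlap as a stand-in for AI retrieval; the names `Passage`, `retrieve`, and `answer`, and the 0.5 score threshold, are assumptions for illustration. A real implementation would likely use embeddings and a language model.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source_url: str
    text: str


def _tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: list[Passage],
             min_score: float = 0.5, top_k: int = 3) -> list[tuple[Passage, float]]:
    """Return (passage, score) pairs scoring at least min_score, best first."""
    q = _tokens(question)
    if not q:
        return []
    # Score = fraction of question words that appear in the passage.
    scored = [(p, len(q & _tokens(p.text)) / len(q)) for p in passages]
    hits = [(p, s) for p, s in scored if s >= min_score]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:top_k]


def answer(question: str, passages: list[Passage]) -> dict:
    """Return the best passage with its sources, or an explicit 'no answer' result."""
    hits = retrieve(question, passages)
    if not hits:
        return {"answer": None, "sources": [],
                "note": "No reliable answer found in the stored documents."}
    return {"answer": hits[0][0].text,
            "sources": [p.source_url for p, _ in hits]}
```

The important design point is that the empty case is an explicit result rather than a best guess, so a frontend can show "no reliable answer" instead of an unsupported one.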

Core Requirements

System must build directly on the existing Stage 1 functionality
Must be usable across multiple councils
Must be accessible via a backend interface (API)
Must run reliably and allow monitoring of processes
Must allow inspection of stored documents and extracted data
Must be structured so a multi-user frontend can be built on top

Deliverable

A working backend system that:

Extends the existing Stage 1 scraper into a multi-council system
Collects, tracks, and organises council documents
Extracts and structures document content
Supports AI-based querying with referenced answers
Has been demonstrated across multiple council websites

Please only provide FIXED bids. Placeholder bids will be immediately rejected. Any bid will be deemed your full and final price for the job. Please add the text 'This is my full and final bid based up your job description' to your message to confirm understanding of this.
The budget is only an auto suggestion by PPH and is not reflective of my assessment of the job value. Please take the time to calculate what you believe to be the cost and tailor your bid accordingly.
AI responses will be rejected.

Clarification Board

57 minutes ago

- Can you describe how Stage 1 is currently built (tech stack, architecture, storage method), and what limitations you’ve already encountered with it?

- How accurate and granular do you want the document categorisation and structuring to be (basic tagging vs deeply structured sections and metadata)?

- What are your expectations for the AI querying experience: short answers, detailed summaries, or something closer to a research assistant with context?

Robert B.
