
Document Extraction & AI Query Platform (second stage)
£210 (approx. $284)
- Proposals: 15
- Remote
- #4492501
- OPPORTUNITY
- Open for Proposals
Description
Experience Level: Intermediate
Overview
We are building a system that collects and analyses documents from UK council websites.
Stage 1 has already been completed and is working. It successfully:
Scrapes a council website
Identifies and downloads document files (primarily PDFs)
Stores those files in a structured format
Extracts basic text for inspection
Stage 2 is to build on this foundation and develop a scalable backend system that can operate across multiple councils, organise documents, extract useful content, and enable AI-based querying of that data.
Scope
2A(i) – Scraping & Document System
Develop the existing scraper into a system that can:
Explore council websites and locate documents across multiple sections
Download and store documents in an organised and structured way
Track documents over time (new, existing, changed, duplicate)
Categorise documents (e.g. minutes, agendas, policies)
Extract basic information (titles, dates, sections where possible)
Provide clear visibility of what has been found, stored, and processed
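As an illustration of the tracking requirement above (new, existing, changed, duplicate), here is a minimal Python sketch that classifies a download by hashing its content. It is an assumption-laden example, not a description of how Stage 1 works: the `DocumentIndex` class, its in-memory dictionaries, and the status labels are hypothetical, and a real system would persist this state in whatever store Stage 1 already uses.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DocumentIndex:
    """Hypothetical in-memory index; a real system would persist this state."""
    hash_by_url: dict = field(default_factory=dict)  # url -> last seen content hash
    url_by_hash: dict = field(default_factory=dict)  # content hash -> first url seen

    def classify(self, url: str, content: bytes) -> str:
        """Return 'new', 'existing', 'changed' or 'duplicate' for a download."""
        digest = hashlib.sha256(content).hexdigest()
        previous = self.hash_by_url.get(url)
        if previous == digest:
            status = "existing"    # same URL, same bytes as last time
        elif previous is not None:
            status = "changed"     # same URL, different bytes
        elif digest in self.url_by_hash:
            status = "duplicate"   # unseen URL, bytes already stored elsewhere
        else:
            status = "new"
        # Record what was seen so the next crawl can compare against it.
        self.hash_by_url[url] = digest
        self.url_by_hash.setdefault(digest, url)
        return status
```

Hashing raw bytes only catches exact duplicates; two PDFs with identical text but different metadata would be reported as distinct, so a text-level comparison may also be needed.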
2A(ii) – Multi-Council Validation
Extend the system from a single working example to at least 3 different council websites
Demonstrate that it adapts to different website structures
2B – Document Processing & Structuring
Extract readable text from documents
Clean and structure the content
Break documents into smaller usable sections
Link all extracted content back to its source
Prepare the data for both keyword and semantic search
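One possible reading of "break documents into smaller usable sections" and "link all extracted content back to its source" is sketched below in Python. The `Chunk` fields, the `chunk_text` function, the blank-line paragraph rule, and the 800-character default are all assumptions made for illustration; the brief does not specify a chunking strategy.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source_url: str   # where the parent document was downloaded from
    chunk_index: int  # position of this chunk within the document
    start: int        # character offset into the extracted text
    end: int
    text: str


def chunk_text(source_url: str, text: str, max_chars: int = 800) -> list[Chunk]:
    """Pack blank-line-separated paragraphs into chunks of roughly max_chars."""
    chunks: list[Chunk] = []
    start = end = None
    # Each match is a run of consecutive non-empty lines, i.e. one paragraph.
    for para in re.finditer(r"[^\n]+(?:\n[^\n]+)*", text):
        # Close the current chunk if adding this paragraph would overflow it.
        if start is not None and para.end() - start > max_chars:
            chunks.append(Chunk(source_url, len(chunks), start, end, text[start:end]))
            start = None
        if start is None:
            start = para.start()
        end = para.end()
    if start is not None:
        chunks.append(Chunk(source_url, len(chunks), start, end, text[start:end]))
    return chunks
```

Because each chunk keeps its URL and character offsets, any search hit can be traced back to the exact span of the original document. A single paragraph longer than `max_chars` is kept whole in this sketch rather than split.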
2C – AI Query Capability
Accept natural language questions about council documents
Use AI to identify and retrieve relevant content
Return clear answers grounded in the documents
Include references to source material
Indicate when no reliable answer is available
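The retrieval and "no reliable answer" behaviour described above could look something like the following Python sketch. It uses a plain bag-of-words cosine similarity as a stand-in for a real embedding model, and the `retrieve` function, the `min_score` threshold of 0.2, and the response shape are hypothetical choices for illustration only. Answer generation by a language model is left out; the sketch covers only passage selection and source references.

```python
import math
import re
from collections import Counter


def _vector(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: list, min_score: float = 0.2, top_k: int = 3) -> dict:
    """passages is a list of (source_url, text) pairs.

    Returns the best-matching passages with their sources, or flags the
    question as unanswerable when nothing clears min_score.
    """
    q = _vector(question)
    scored = sorted(
        ((_cosine(q, _vector(text)), url, text) for url, text in passages),
        key=lambda item: item[0],
        reverse=True,
    )
    hits = [
        {"score": round(score, 3), "source": url, "text": text}
        for score, url, text in scored[:top_k]
        if score >= min_score
    ]
    return {"answerable": bool(hits), "sources": hits}
```

The threshold would need tuning against real council documents, since common words shared between question and passage can inflate word-count scores.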
Core Requirements
System must build directly on the existing Stage 1 functionality
Must be usable across multiple councils
Must be accessible via a backend interface (API)
Must run reliably and allow monitoring of processes
Must allow inspection of stored documents and extracted data
Must be structured so a multi-user frontend can be built on top
Deliverable
A working backend system that:
Extends the existing Stage 1 scraper into a multi-council system
Collects, tracks, and organises council documents
Extracts and structures document content
Supports AI-based querying with referenced answers
Has been demonstrated across multiple council websites
Please only provide FIXED bids. Placeholder bids will be immediately rejected. Any bid will be deemed your full and final price for the job. Please add the text 'This is my full and final bid based up your job description' to your message to confirm understanding of this.
The budget is only an auto suggestion by PPH and is not reflective of my assessment of the job value. Please take the time to calculate what you believe to be the cost and tailor your bid accordingly.
AI responses will be rejected.
Robert B.
100% (13)
Projects Completed: 8
Freelancers worked with: 7
Projects awarded: 61%
Last project: 1 May 2026
United Kingdom
Clarification Board
- Can you describe how Stage 1 is currently built (tech stack, architecture, storage method), and what limitations you’ve already encountered with it?
- How accurate and granular do you want the document categorisation and structuring to be (basic tagging vs deeply structured sections and metadata)?
- What are your expectations for the AI querying experience: short answers, detailed summaries, or something closer to a research assistant with context?