
Experienced Python Developers for large web scraping project
- £23/hr (approx. $31/hr)
- Proposals: 18
- Remote
- #3369517
- Expired
Description
Experience Level: Expert
Essential Skills
Python 3
Web Scraping
Git
pytest
Mechanize Python Package
Selenium webdriver / Headless Chrome (see the setup sketch after the skills lists)
Azure DevOps
Desirable Skills
Azure
Service Bus
Docker
Kubernetes (AKS)
REST
Microservice Architecture
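The essential skills above name Selenium with headless Chrome. As a point of reference, a minimal setup sketch; the target URL and CSS selector are illustrative, not from the real codebase:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless=new")  # run Chrome without a visible window
driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.gov.uk/planning/search")  # illustrative URL
    # Real scrapers should use WebDriverWait for pages that render results late.
    rows = driver.find_elements(By.CSS_SELECTOR, "table.results tr")
    print(f"found {len(rows)} result rows")
finally:
    driver.quit()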
Start Date
ASAP
Full time
Duration of Project
Depends on the speed of the developer, but we expect somewhere between 2 and 6 weeks.
Desired Experience
Previous experience of scraping websites
Understanding of the DOM, HTML elements, and Browser Developer Tools
Building Python web APIs and services
Working with Microsoft Azure and Azure DevOps
Main Task
Fix Broken Web Scrapers.
We have approximately 430 web scrapers crawling British local council websites for planning applications. Currently, about 260 scrapers are working, and we need the remainder fixed within the next 4 to 6 weeks. We also expect the usual unit tests around the code, plus end-to-end tests for some happy-path scenarios.
Much of the code is based on the old repo: https://github.com/aspeakman/UKPlanning . Please review it to get a taste of the work involved. For some scrapers it's just a case of fixing a URL; others will need to be rewritten.
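Since unit tests and happy-path coverage are expected, here is a minimal pytest sketch of what a scraper test might look like. The module, function, and fixture file names are hypothetical, not from the actual codebase:

import pathlib

import pytest

from scrapers import sample_council  # hypothetical scraper module


@pytest.fixture
def results_html() -> str:
    # A saved council search-results page, so the unit test does not
    # depend on the live site being up.
    return pathlib.Path("tests/fixtures/sample_council.html").read_text()


def test_parse_applications_happy_path(results_html):
    applications = sample_council.parse_applications(results_html)
    assert applications, "expected at least one planning application"
    first = applications[0]
    # Each application should carry at least a reference, an address,
    # and a received date.
    assert first["reference"]
    assert first["address"]
    assert first["date_received"]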
Secondary Task
Refactor the code to meet best practices.
This isn't the first priority and shouldn't get in the way of completing the main task. However, new code should be written to best practices, and if there's time and budget we can refactor the existing code.
Software Architecture
The overall architecture can be broken down into two systems: the Crawlers and the Search Engine.
The Crawlers
A .NET Orchestrator service/API runs every day at 4 am and tells the crawlers to start crawling via a POST request. The Python Crawler crawls planning opportunities in the construction industry and publishes them to a topic. A .NET service then transforms the data into a normalised shape and publishes it to another topic. A third .NET service receives the transformed opportunity and writes it to the database. The Python developer is only expected to work on the Python Crawler, and no knowledge of .NET is necessary.
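For orientation, a minimal sketch of the publish step using the azure-servicebus package; the connection string, topic name, and message shape here are assumptions, not the project's actual configuration:

import json

from azure.servicebus import ServiceBusClient, ServiceBusMessage

CONNECTION_STR = "<service-bus-connection-string>"  # supplied by the project
TOPIC_NAME = "raw-opportunities"  # hypothetical topic name


def publish_opportunity(opportunity: dict) -> None:
    # One message per scraped opportunity; the .NET services downstream
    # handle normalisation and persistence.
    with ServiceBusClient.from_connection_string(CONNECTION_STR) as client:
        with client.get_topic_sender(topic_name=TOPIC_NAME) as sender:
            sender.send_messages(ServiceBusMessage(json.dumps(opportunity)))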
The Search Engine
The Search Engine indexes the Opportunities in Elasticsearch, which is made available through a .NET Web API and used in the front end.
The Python Crawler
The Python Crawler consists of three Docker containers: the API, the Redis queue, and the Worker. We can scale up the number of Workers so that we can crawl many websites in parallel.
The Orchestrator runs every day at 4 am (cron job) and sends a list of councils to the Python Crawler API to crawl.
The Python Crawler API adds each council as a Job to the Redis queue.
The Python Worker dequeues a Job from the Redis queue and crawls the specified council.
The Worker scrapes the planning opportunities and publishes them to a Service Bus topic.
The Worker also sends a LastCrawledTime to the Orchestrator API to track when it last crawled (it does this in checkpoints to help long-running jobs).
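A minimal sketch of that API-to-Worker handoff, assuming the Redis queue is driven by the rq library and the API is a small Flask app; the queue name, endpoint paths, and helper functions are all assumptions:

# api.py -- the Crawler API enqueues one Job per council.
from flask import Flask, jsonify, request
from redis import Redis
from rq import Queue

app = Flask(__name__)
queue = Queue("councils", connection=Redis(host="redis"))  # hypothetical names


@app.post("/crawl")
def crawl():
    councils = request.get_json()["councils"]
    for council in councils:
        # Each council becomes one Job; an rq worker picks it up.
        queue.enqueue("worker.crawl_council", council, job_timeout="2h")
    return jsonify(queued=len(councils))


# worker.py -- the Worker crawls a council and checkpoints progress.
from datetime import datetime, timezone

import requests

ORCHESTRATOR_URL = "http://orchestrator/api/last-crawled"  # hypothetical endpoint


def scrape_batches(council: str):
    """Placeholder: yield batches of scraped planning applications."""
    yield [{"council": council, "reference": "24/00001/FUL"}]


def publish_batch(batch: list) -> None:
    """Placeholder: publish a batch to the Service Bus topic (see sketch above)."""


def crawl_council(council: str) -> None:
    for batch in scrape_batches(council):
        publish_batch(batch)
        # Checkpoint after each batch so the Orchestrator can track
        # LastCrawledTime and long-running jobs can resume.
        requests.post(
            ORCHESTRATOR_URL,
            json={
                "council": council,
                "lastCrawledTime": datetime.now(timezone.utc).isoformat(),
            },
            timeout=10,
        )

Under these assumptions, a Worker container would simply run "rq worker councils" against the same Redis instance to start consuming Jobs.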
Marcus P.
Projects completed: 0% (0)
Freelancers worked with: -
Projects awarded: 50%
Last project: 21 Jan 2026
United Kingdom