
Data Extraction Projects
Looking for freelance Data Extraction jobs and project work? PeoplePerHour has you covered.
Building database of owners by web scraping
I have been working with DeepSeek to extract data from the website tuscasasrurales.com. The data I need is shown in the attached file Granada.png. Some of this data requires a link to be clicked, as shown in the uploaded file Tuscasasrurales.png. Email addresses have to be obtained by visiting the owner's website (if there is one). Where no data is available, leave the field blank. I have been trying to extract the data province by Spanish province. To get a list by province, enter the site and type the name of the province in the search box, e.g. Granada, which returns 304 entries. Although DeepSeek was unable to get all the info I wanted, it has given me a Python script which will do the job; I have uploaded this. It does not include the fields Bedrooms and Bathrooms, which I would also like included. Can you do this work, and how much would you charge? There are initially 10 provinces with an average of +/- 200 entries in each. Thanks - Allan
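For bidders estimating the work: the core pattern is to fetch each province's result pages, pull the listing links, visit each listing (and its clicked-through detail view) for the extra fields, then visit the owner's website for the email. A minimal standard-library sketch of the link-extraction step follows; the `listing-title` class name and the sample markup are assumptions for illustration, not taken from tuscasasrurales.com, and a real scraper should check the site's actual markup and terms of use.

```python
from html.parser import HTMLParser

class ListingLinkParser(HTMLParser):
    """Collect (title, href) pairs from anchors carrying an assumed listing class."""

    def __init__(self, listing_class="listing-title"):
        super().__init__()
        self.listing_class = listing_class
        self.results = []
        self._href = None   # set while inside a matching <a> tag
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and self.listing_class in attrs.get("class", "").split():
            self._href = attrs.get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append(("".join(self._text).strip(), self._href))
            self._href = None

# Hypothetical markup for one search-result card:
sample = '<div class="card"><a class="listing-title" href="/casa/1">Casa Rural El Molino</a></div>'
parser = ListingLinkParser()
parser.feed(sample)
# parser.results → [('Casa Rural El Molino', '/casa/1')]
```

Each collected href would then be fetched in a second pass to fill the per-property fields, including the Bedrooms and Bathrooms the existing script is missing.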
4 hours ago · 37 proposals · Remote
Set up a connection between Hoowla and Power BI
Integrate Hoowla with Power BI to enable full data extraction and reporting. Use the provided Hoowla API documentation to authenticate, retrieve all available endpoints and datasets, and implement efficient data ingestion into Power BI. Deliver a reusable, documented solution using Power Query (M), REST connectors, or a custom data connector as appropriate. Ensure incremental refresh capability, error handling, data transformation, and clear mapping of Hoowla fields to report-ready tables. Provide deployment and brief usage instructions.
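Whichever connector style is chosen, the ingestion side usually reduces to a pagination loop that keeps requesting until the API signals the last page. The sketch below shows the shape; the `{"items": ..., "has_more": ...}` envelope is an assumed response format for illustration — the real field names must come from the Hoowla API documentation, and in Power Query (M) the same loop is typically written with `List.Generate`.

```python
from typing import Callable, Iterator

def fetch_all_pages(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every item from a page-numbered REST API.

    `fetch_page(page)` must return a dict like {"items": [...], "has_more": bool};
    that envelope is an assumption, not Hoowla's documented format.
    """
    page = 1
    while True:
        payload = fetch_page(page)
        yield from payload.get("items", [])
        if not payload.get("has_more"):
            break
        page += 1

# Exercising the loop with a stand-in for the real HTTP call:
def fake_fetch(page):
    pages = {
        1: {"items": [{"id": 1}, {"id": 2}], "has_more": True},
        2: {"items": [{"id": 3}], "has_more": False},
    }
    return pages[page]

rows = list(fetch_all_pages(fake_fetch))  # three items across two pages
```

Incremental refresh would then filter each request by a modified-since parameter (if the API offers one) rather than reloading the full history on every refresh.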
7 days ago · 17 proposals · Remote
Integrate 10 APIs into 3 different categories on WordPress
Integrate 10 APIs across 3 different categories into my WordPress website, without the data slowing down the website or going into the wrong category. Please confirm how much you will charge for the entire integration and whether you will be using a plugin or fully coding. Thank you for your interest in this project. I need someone who can help integrate APIs.
1. The APIs, when integrated, should pull the data and store it on my web platform so that it is easy for visitors to access without having to request it every time.
2. The extracted data would be displayed via the WordPress Directorist plugin. Whether you are using code or a plugin, please let me know. Importantly, the data pulled should display:
I. The title
II. The overview (the initial part of the job information)
III. More information (continuation of the extracted information)
IV. A link to the job post to apply on
V. The logo, if available; if not, a default from file should be loaded
VI. Only remote jobs in some twenty-something categories should be pulled
3. When visitors follow a pulled listing to the originating website, it should not require a login to view or apply, nor lead to a collection of other jobs, but to the job post alone.
What is your best rate for the 10 APIs? Here are some I would like you to start with:
https://rapidapi.com/Pat92/api/jobs-api14
https://rapidapi.com/fantastic-jobs-fantastic-jobs-default/api/active-jobs-db
Online learning portals, Greenhouse, and other remote job boards: please let me know which you have successfully integrated and can do with the Directorist plugin on a WordPress website. Can you also work with XML feeds? If yes, please let me know your cost. If you can create a bot scraper, how much? How many days will you need to get this done? Thanks.
5 days ago · 17 proposals · Remote
Extract blood test data from PDF documents that have been OCR'd
The objective is to build a structured blood test database that allows pathology results to be viewed, edited, filtered, and exported to Excel via a web-based HTML interface. The system stores results in a clean, standardised format so trends can be analysed accurately over time.

Using AI-assisted OCR, I have built a local Python extraction pipeline that converts PDF pathology reports into machine-readable text and inserts structured data into a SQLite database. The majority of blood tests extract correctly, including canonical test name, result value, unit, and reference range. However, I have hit a specific technical issue with three markers:
• CRP (C-reactive protein)
• ESR
• GLU (Glucose)
The OCR output clearly contains the correct lines, and debug logs confirm they are processed. Yet no rows are inserted for these markers. The failure appears to occur somewhere between canonical matching, numeric extraction, and validation logic.

Current System Architecture
The system runs locally and consists of:
• extraction_core_2.py (main engine)
• Supporting modules for OCR preprocessing, lab dictionary building, regex matching, and validation
• SQLite backend
• Schema-driven canonical lab dictionary
• Controlled fuzzy fallback logic
• HTML viewer for results display and Excel export

Pipeline flow:
1. Convert PDF to image (pdf2image)
2. Preprocess
3. Run Tesseract OCR
4. Clean and normalise text
5. Match against canonical lab dictionary
6. Extract: canonical test name, numeric result, unit, reference range
7. Validate
8. Insert into SQLite
The engine is deterministic and rule-based.

The Specific Problem
Example OCR line: CRP H 5.2 mg/L 0-5
OCR text is correct. NUMBER_PATTERN matches. The canonical dictionary contains the test. Yet:
Inserted 0 rows from 0126251OrderReport_23B00006604_CRP.pdf
Likely failure points include:
• Canonical containment match failing due to normalisation
• Flag tokens ("H", "L") interfering with numeric capture
• Numeric extraction anchored incorrectly
• Validation rejecting due to strict range formatting
• Unit pattern mismatch (e.g. mmol/L)
• Dictionary indexing issue
• Match overridden by another lab name
• Guard conditions too strict
If validation fails, the row is rejected silently. All other panels extract correctly; the issue appears isolated.

What Is Required
This is not a rebuild. We do not want:
• Re-architecture
• Experimental AI guessing logic
• Large-scale changes
• Expanded fuzzy matching
We need:
1. Precise diagnosis: identify exactly where CRP, ESR, and GLU are failing insertion and which rule is causing the rejection.
2. Minimal safe fix: a targeted correction that adjusts canonical matching if required, anchors numeric extraction correctly, allows flag tokens without blocking capture, relaxes only the necessary validation checks, and preserves deterministic behaviour.
3. Zero regression: no impact on currently working panels, no performance degradation, no uncontrolled fuzzy expansion.
4. Modular implementation: if appropriate, implement as a small isolated module or cleanly adjust the matching block. The existing architecture should remain intact.

Constraints
The system is designed to be deterministic, schema-driven, reproducible, and forensic-grade. We cannot introduce probabilistic or unpredictable behaviour.

Longer-Term Goal
After stabilising extraction: migrate to web deployment, enable structured uploads, add trend analysis, and later incorporate AI-assisted interpretation. The immediate priority is to stabilise deterministic extraction for CRP, ESR, and GLU without breaking the existing engine.

Materials Provided
Uploaded:
• Full extraction_core_2.py (text format)
• Screenshot of HTML viewer
• Sample PDF files
• Export showing required output
Additional materials available on request:
• Sample OCR blocks
• Canonical dictionary entries
• Regex patterns
• Validation logic
• Database schema
• Debug logs
This is a focused debugging and refinement request. I have spent many hours attempting to isolate the issue and now require an experienced developer to identify the blocking condition and implement a practical fix. I have been advised this should take 1–2 hours for a senior developer. Looking for a swift turnaround.
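No remote diagnosis is possible without extraction_core_2.py, but the "flag tokens interfering with numeric capture" bullet is a common culprit: a pattern that expects the number immediately after the test name fails on "CRP H 5.2 mg/L 0-5". Purely as an illustration of the minimal, deterministic kind of fix being requested — the pattern below is a hypothetical sketch, not the project's real NUMBER_PATTERN or field layout:

```python
import re

# An optional (?:[HL]\s+)? group lets an H/L flag sit between the test name
# and the result without loosening anything else in the pattern.
LINE_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z][A-Za-z \-]*?)\s+"             # canonical test name
    r"(?:(?P<flag>[HL])\s+)?"                          # optional H/L flag token
    r"(?P<value>\d+(?:\.\d+)?)\s+"                     # numeric result
    r"(?P<unit>[A-Za-z0-9%/^]+)\s+"                    # unit, e.g. mg/L, mmol/L
    r"(?P<range>\d+(?:\.\d+)?\s*-\s*\d+(?:\.\d+)?)$"   # reference range
)

def parse_line(line):
    """Return (name, value, unit, range) for one OCR result line, or None."""
    m = LINE_PATTERN.match(line.strip())
    if not m:
        return None
    return (m.group("name").strip(), float(m.group("value")),
            m.group("unit"), m.group("range").replace(" ", ""))

parse_line("CRP H 5.2 mg/L 0-5")       # → ('CRP', 5.2, 'mg/L', '0-5')
parse_line("GLU 5.4 mmol/L 3.9-5.8")   # → ('GLU', 5.4, 'mmol/L', '3.9-5.8')
```

Because the flag group is optional and captured separately, lines without a flag still match, which is the "zero regression" property the brief asks for.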
22 days ago · 21 proposals · Remote
Data engineer
We are seeking an experienced Data Engineer to help organize, clean, and structure complex real estate and regulatory compliance data across multiple sources. This role focuses on transforming inconsistent datasets related to leases, occupancy, tenants, and rent information into a reliable and scalable data foundation. The ideal candidate will review existing data, identify quality issues such as duplication and missing fields, and design standardized schemas and relationships. You will build transformation workflows to clean and normalize data from spreadsheets, databases, and system exports. In this role, you will create master datasets for properties, units, households, leases, and compliance tracking while implementing validation rules and exception reporting. You will also document data definitions, mapping logic, and business rules to support transparency and long-term maintainability, while collaborating with stakeholders to translate operational requirements into structured data models. Strong proficiency in SQL and Python is required, along with hands-on experience in ETL/ELT workflows and relational data modeling. Experience working with messy, Excel-heavy datasets and building data quality checks is essential, and familiarity with tools like dbt, Airflow, or cloud platforms such as Snowflake or BigQuery is highly preferred. Success in this role means delivering a clear, consistent source of truth for lease and occupancy data, reducing inconsistencies, and preparing the data environment for reporting, automation, and future product development.
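The duplicate-and-missing-field review described above reduces to a small, testable routine regardless of which stack (dbt, Airflow, plain Python) is chosen. A standard-library sketch — the column names are assumed for illustration, not taken from the client's actual schema:

```python
from collections import Counter

REQUIRED = ("property_id", "unit", "tenant", "lease_start")  # assumed columns

def quality_report(rows):
    """Flag duplicate (property, unit) keys and blank required fields."""
    key_counts = Counter((r.get("property_id"), r.get("unit")) for r in rows)
    duplicates = sorted(k for k, n in key_counts.items() if n > 1)
    missing = [(i, field) for i, r in enumerate(rows)
               for field in REQUIRED if not (r.get(field) or "").strip()]
    return {"duplicates": duplicates, "missing": missing}

rows = [
    {"property_id": "P1", "unit": "A", "tenant": "Smith", "lease_start": "2024-01-01"},
    {"property_id": "P1", "unit": "A", "tenant": "", "lease_start": "2024-02-01"},
]
report = quality_report(rows)
# report["duplicates"] → [('P1', 'A')]; report["missing"] → [(1, 'tenant')]
```

In a dbt setup the same checks would typically be expressed as `unique` and `not_null` tests on the master datasets, with the exception rows routed to a report.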
5 days ago · 14 proposals · Remote
Research and data population
I'm looking for someone who can build me a targeted list of supporter groups, Facebook pages, TikTok accounts, etc. for a football team. This would involve collating name, bio, email, phone number, total followers, and social page links into a spreadsheet I'd provide (organisations and individual influencers linked to my particular team). I'm not sure exactly how many entries there will be; it's more of a research and population task, but I can provide full details on application. It needs to be thorough and well-researched. Please quote your hourly rate, and only apply if you can deliver this by Friday 27 March (AM). Maybe around 4 hours' work, but difficult to say; potentially more or less depending on how quick you are. Thanks, Harvey
4 hours ago · 17 proposals · Remote
Data Scraping
I am seeking an adept data scraper to extract comprehensive information from a specified Shopify blog and compile it into a structured spreadsheet for seamless import into a WordPress platform. The required data includes the original blog URL, publication date, URL of the article's image, article title, and the full content of each article. The task involves approximately 174 articles, necessitating meticulous attention to detail and accuracy. Your expertise in data extraction and formatting will be invaluable for this project. The URL is: https://kingsnqueens.com/blogs/news. Thank you for your interest!
24 days ago · 65 proposals · Remote
Need a plugin for WordPress
I need a plugin that can export orders from WooCommerce so that we can change quantities in Excel and import the file back, decreasing/increasing quantities and splitting orders if needed.
5 days ago · 59 proposals · Remote
PDF Scraping
Hi, I need to scrape Item# & HT# from the first table, and "Item and Piece# together" and HT# from the second table. I also need the Spool# given in the bottom right corner. A sample PDF is attached. There are around 600 pages. This is not a manual data entry job; I need coders to extract the data programmatically within a few hours.
a month ago · 22 proposals · Remote
AI / Machine Learning Engineer (LLM & Applied AI) – Remote (EU)
Responsibilities

AI / Machine Learning
• Build and deploy AI-powered applications using existing Large Language Models (LLMs).
• Design systems that ingest data, extract structured insights, and generate accurate outputs.
• Develop RAG pipelines, chunking strategies, and LLM orchestration workflows.
• Build tools for model training, evaluation, inference serving, monitoring, and alerting.
• Experiment with modern ML frameworks and open-source AI tools.

Software Engineering
• Develop scalable microservices that integrate AI models with production systems.
• Build APIs and backend services to process and manage AI-generated data.
• Work with modern programming languages such as Python, JavaScript, Go, or Rust.

Data Engineering
• Design pipelines to extract, transform, and load data from multiple sources.
• Clean, normalize, and validate datasets for model usage.
• Optimize data pipelines for reliability and performance.

Database & Infrastructure
• Design database schemas and optimize queries.
• Manage performance and scalability of data storage systems.
• Ensure AI infrastructure is production-ready and scalable.

Collaboration
• Work closely with product managers, engineers, and subject matter experts.
• Communicate technical challenges and solutions clearly.
• Help define best practices for AI system architecture and development.

Requirements
• Based in the European Union
• 8+ years of software engineering experience
• Strong experience with Python or JavaScript
• Hands-on experience with LLM APIs (OpenAI, Anthropic, or similar)

AI / LLM Experience
• Experience building RAG systems
• Knowledge of chunking strategies for LLM optimization
• Experience with LangChain, LangGraph, or similar orchestration tools
• Familiarity with AI monitoring, observability, and evaluation frameworks
• Experience building agent-based workflows or AI automation

Engineering Experience
• Experience building microservices and scalable systems
• Strong knowledge of data pipelines and ETL processes
• Experience designing and optimizing databases and data models

Additional Skills
• Strong understanding of ML concepts and NLP techniques
• Ability to work with ambiguous problems and rapidly evolving AI tools
• Experience with modern software development practices (Git, testing, CI/CD, code reviews)

Engagement Details
• Location: Remote (EU-based freelancers only)
• Contract: Freelance
• Availability: Part-time or Full-time
• Duration: Long-term collaboration possible

Nice to Have
• Experience building AI agents or multi-agent workflows
• Experience with evaluation frameworks for LLMs
• Experience deploying AI infrastructure in production environments
9 days ago · 17 proposals · Remote
Freelance Data Entry Clerk
Project Description: I am seeking a detail-oriented Freelance Data Entry Clerk. The ideal candidate will accurately input, update, and manage data across spreadsheets, databases, and CRM systems to ensure records are complete, organised, and error-free.
Key Responsibilities:
- Transferring information from physical records, PDFs, or audio files into spreadsheets, CRM systems, or databases.
- Updating existing records to ensure data is current and accurate.
- Meticulously reviewing data to identify and correct inaccuracies, missing information, or inconsistencies.
- Compiling and entering data from internal documents, reports, and provided source materials.
- Sorting and organising digital or physical documents for easy retrieval.
This is a freelance hiring opportunity, not an offer for permanent employment or outsourcing projects. If you are organised, detail-oriented, and experienced in data entry, I look forward to hearing from you!
5 days ago · 46 proposals · Remote
Marketplace Data Collection
We are looking for a freelancer who can quickly collect listings from several online marketplaces.
Task:
- Find 3,000 listings on the marketplaces listed below that meet our criteria.
- Add all listings to an Excel spreadsheet.
- You must be registered on the platforms to open contact details and include them in the Excel file.
Marketplaces:
- Osta.ee
- Skelbiu.lt
- eMAG
- Bazar
- Allegro
- OLX
Requirements:
- Attention to detail
- Ability to work quickly and accurately
- Registered accounts on the platforms to access contact details
Deadline: The work must be completed within 3 days.
Payment: Negotiable.
Important: We need someone who can start working immediately.
13 days ago · 25 proposals · Remote
Reformatting and cleaning data from an old CRM
I have several Excel spreadsheets with excess data which need cleaning and updating, then putting into a workable format.
9 days ago · 84 proposals · Remote
Aged Payables Summary page in Power BI
We provide financial reports to our clients at period end and are currently looking for help with a new report page to display an Aged Payables Summary in Power BI, similar to the format available in Xero. The report needs to be dynamic and respond to the period selected within the Power BI report. The data source is Xero: we use an ETL service to extract the data from Xero and load it into a SQL database, which then serves as the data source for the Power BI report. The report currently utilises the following Xero tables: Journal, Invoices, Accounts, Organisation, and Tracking Categories. We are looking for someone who has already worked with data from Xero or has created a dynamic Aged Payables Summary in Power BI. We would be happy to arrange a meeting to discuss the project requirements in more detail.
a month ago · 24 proposals · Remote
UK Crypto Tax reconciliation & data analysis
Description: We are a UK-based accountancy firm specialising in Crypto tax, and we are looking for an experienced Crypto Tax Data Analyst to support ongoing client work. This role is focused heavily on data analysis rather than traditional accounting.
Scope of work includes:
- Reviewing wallet and exchange data (CSV/API exports)
- Line-by-line transaction analysis
- Identifying and categorising taxable events under UK (HMRC) rules
- Reconciling discrepancies across wallets, exchanges, and DeFi activity
- Cleaning and structuring datasets for tax reporting
- Supporting preparation of outputs for final tax review
Typical clients include:
- High-volume traders
- DeFi users (staking, liquidity pools, bridging, etc.)
- NFT traders
- Individuals with complex multi-wallet activity
Requirements:
- Strong understanding of UK Crypto tax treatment (HMRC guidance essential)
- Proven experience using tools such as Koinly, Recap, CoinTracking or similar
- Ability to handle large datasets accurately and efficiently
- Strong analytical mindset and attention to detail
- Experience identifying errors, duplicates, missing cost basis, and incorrect classifications
Nice to have:
- Experience working with UK accountancy firms
- Familiarity with DeFi protocols and on-chain activity
- Basic Excel / data manipulation skills
Engagement:
- Ongoing work available for the right candidate
- Initially project-based, with potential for long-term collaboration
Trial Task (Important): Shortlisted candidates will be asked to complete a paid trial task. This will involve reviewing a sample dataset and:
- Identifying key issues (e.g. missing cost basis, incorrect classifications, duplicates)
- Providing a brief explanation of how you would resolve them
- Demonstrating your approach to structuring clean, usable data
This is a critical part of our selection process to ensure candidates can handle real-world Crypto data complexity.
To apply, please include:
- Examples of similar Crypto tax work you've completed
- Which software/tools you've used
- Your approach to handling messy or incomplete datasets
5 days ago · 5 proposals · Remote
UK Business Data Supplier
I am looking for a business data supplier. The data will be independent businesses: owner's name, business name, address, email, WhatsApp, and postcode. Please quote your price per 1,000, 10,000 and 100,000 records, plus turnaround time. If you can scrape any other information for direct marketing, please let us know, including LinkedIn and plastic card companies. Regards, Proactiv
15 days ago · 25 proposals · Remote
Design & Organization of Waste Collection Data Using Star Schema
Design and implementation of a Data Warehouse using a Star Schema to analyze waste collection operations. The project focuses on transforming raw operational data into structured dimension and fact tables to support data-driven insights and decision-making.
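As a concrete sketch of what "dimension and fact tables" means here, the following builds a tiny in-memory star schema and runs a typical aggregate across it. All table and column names (dim_date, dim_route, fact_collection, tonnes) are illustrative assumptions, not taken from the project brief:

```python
import sqlite3

# Star schema: two dimension tables describing when and where, and one fact
# table holding the measures, keyed by the dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date  (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
    CREATE TABLE dim_route (route_id INTEGER PRIMARY KEY, route_name TEXT, zone TEXT);
    CREATE TABLE fact_collection (
        date_id  INTEGER REFERENCES dim_date(date_id),
        route_id INTEGER REFERENCES dim_route(route_id),
        tonnes   REAL,
        stops    INTEGER
    );
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, "2024-03-01", "2024-03"), (2, "2024-03-02", "2024-03")])
conn.executemany("INSERT INTO dim_route VALUES (?, ?, ?)",
                 [(10, "North loop", "N"), (11, "South loop", "S")])
conn.executemany("INSERT INTO fact_collection VALUES (?, ?, ?, ?)",
                 [(1, 10, 4.2, 120), (1, 11, 3.1, 95), (2, 10, 4.0, 118)])

# Typical star-schema query: aggregate the fact table by a dimension attribute.
tonnes_by_zone = conn.execute("""
    SELECT r.zone, ROUND(SUM(f.tonnes), 1)
    FROM fact_collection f JOIN dim_route r USING (route_id)
    GROUP BY r.zone ORDER BY r.zone
""").fetchall()
# tonnes_by_zone → [('N', 8.2), ('S', 3.1)]
```

The ETL step that precedes this (loading raw collection logs into the dimensions and fact table) is where most of the project's transformation work would sit.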
9 days ago · 8 proposals · Remote
Property Research Assistant (UK Planning Data – Land & Barns)
Project Description: This role involves research only. No sales, negotiation, or brokerage activity is required. Research UK property listings to identify land, barns, or smallholdings with granted and active planning permission for holiday, glamping, or agritourism use.
Tasks:
- Search Rightmove, Zoopla, Plotfinder
- Verify planning via UK council portals
- Record accurate data in a spreadsheet
Output (per entry):
- Listing link
- Price
- Planning reference + confirmed active status
- Planning use
- Agent contact
- Brief notes
Requirements:
- UK planning portal familiarity preferred
- Only include properties with confirmed planning permission
Volume: 10–30 entries initially (ongoing possible)
Payment: Per valid entry or fixed batch (agreed)
Start: Immediate
5 days ago · 11 proposals · Remote
Form production, which can be via a Microsoft app or similar
We are a construction company based in the UK. I would like to design the following form types under one system/software:
- An induction form for when first entering a construction site
- Signing-in forms for daily contractors to use when they turn up at and leave each site; these to record per site and also feed a main database of the records
- Site audit forms
- Snagging forms
If possible, all the above should work off the same application, with a management hierarchy so that key personnel can update, middle management can request data/records, and site staff can set up and manage at a local level. I would also like the facility to add additional forms.
I would also like to produce dashboards that show financial KPIs in the office. We currently record most of these in Excel, but I want a visual update that is easy to integrate and update, if possible with live information or by requesting data and updating.
3 days ago · 47 proposals · Remote