
Data Cleansing Projects
Looking for freelance data cleansing jobs and project work? Browse active opportunities on PeoplePerHour, or hire data miners through Toptal’s rigorously vetted talent network.
Headteachers: Primary School Email Addresses
Please provide email contact data for the following roles within London and Surrey primary and secondary schools:
• Headteachers
• Deputy Headteachers
• Business Managers
• PE Teachers
• School Offices

Before proceeding, please confirm the total data count and the date of the last data cleanse. It is essential that the data is restricted to London and Surrey only, with no records from other regions. Please also confirm the data turnaround time, or whether the data is readily available.
9 hours ago · 17 proposals · Remote
Data Analyst
Position Overview: We are seeking a Data Analyst to transform raw data into actionable insights. In this role, you will analyze complex datasets, create reports, and develop visualizations to support business decisions and optimize processes.

Key Responsibilities:
• Collect, clean, and organize data from multiple sources.
• Analyze datasets to identify trends and insights.
• Create reports and dashboards using BI tools (Tableau, Power BI, etc.).
• Design visualizations to communicate findings clearly.
• Collaborate with teams to provide data-driven recommendations.
• Ensure data integrity through audits and validation checks.

Required Skills and Experience:
• Education: Bachelor’s degree in Data Science, Statistics, or a related field.
• Experience: 1–3 years in data analysis or a related field.
• Technical Skills: Proficiency in SQL and data analysis tools (Excel, Python, R, etc.).
• BI Tools: Experience with data visualization tools (Tableau, Power BI).
• Analytical Skills: Strong ability to interpret data and provide actionable insights.
• Communication: Ability to present complex data simply and effectively.

Nice to Have:
• Experience with machine learning or predictive analytics.
• Familiarity with data warehousing and ETL processes.
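The collect-clean-analyze loop this role describes can be sketched in a few lines of pandas. The dataset and column names below are illustrative, not from any real client.

```python
# Minimal sketch: clean a small dataset, then summarise a trend,
# the kind of aggregate a Tableau/Power BI dashboard would chart.
import pandas as pd

raw = pd.DataFrame({
    "month":   ["Jan", "Jan", "Feb", "Feb", "Mar", "Mar"],
    "region":  ["North", "South", "North", "South", "North", "South"],
    "revenue": [120, 95, 130, None, 150, 110],
})

# Clean: drop rows with missing revenue (one validation check of many).
clean = raw.dropna(subset=["revenue"])

# Analyze: monthly totals surface the trend to report on.
trend = clean.groupby("month", sort=False)["revenue"].sum()
print(trend.to_dict())
```

In practice the same pattern scales to multiple sources via `pd.concat` or `DataFrame.merge` before the groupby.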
4 days ago · 24 proposals · Remote

Data classification
I will share a purely categorical dataset and need it turned into a clear, well-documented, end-to-end classification workflow that I can study for academic purposes. Using Python with Pandas, NumPy, scikit-learn, and visualisations in Matplotlib or Seaborn, start with an exploratory review, handle all cleaning and preprocessing (encoding, missing values, feature selection), then build and compare suitable classification models. Sound evaluation (accuracy, precision, recall, F1, or any metric you judge relevant) must accompany the models, followed by a concise discussion of the results and why a particular approach performs best. Please highlight your experience with similar projects when you respond; I value demonstrated know-how over long proposals.

Deliverables I expect:
• A well-commented Jupyter notebook covering EDA, preprocessing, model training, and evaluation
• The cleaned dataset (or the code that generates it)
• A brief markdown or slide deck that walks through the methodology, findings, and recommended next steps

Clarity of explanation is just as important as model accuracy, as the primary goal is learning from your workflow.
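The encode-split-fit-evaluate skeleton requested above looks roughly like this in scikit-learn. The toy dataset, feature names, and baseline model choice are all assumptions for illustration; the real notebook would compare several models, not just one.

```python
# Sketch of a categorical classification workflow: one-hot encoding,
# train/test split, a baseline model, and standard evaluation metrics.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Placeholder categorical data; the client's dataset replaces this.
df = pd.DataFrame({
    "colour": ["red", "blue", "red", "green", "blue", "red", "green", "blue"] * 5,
    "size":   ["S", "M", "L", "S", "L", "M", "M", "S"] * 5,
    "label":  [1, 0, 1, 0, 0, 1, 0, 0] * 5,
})

X_train, X_test, y_train, y_test = train_test_split(
    df[["colour", "size"]], df["label"],
    test_size=0.25, random_state=0, stratify=df["label"])

# Encoding and model live in one Pipeline so preprocessing is
# fitted only on the training fold (no leakage).
model = Pipeline([
    ("encode", OneHotEncoder(handle_unknown="ignore")),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("f1:", f1_score(y_test, pred))
```

Swapping the `clf` step for a tree ensemble or SVM gives the model comparison the brief asks for, with the same evaluation code reused.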
4 days ago · 11 proposals · Remote

Data Scraping
We are looking for a freelancer with experience in data extraction and web automation to collect a list of registered businesses from a Laravel-based platform that requires login. I have valid login credentials (my own account). The task includes:
• Logging in using the provided credentials
• Accessing the authenticated business listing
• Handling pagination to retrieve all entries
• Exporting the data to CSV or Excel
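The paginate-until-empty loop and CSV export can be sketched as below. `fetch_page` is a stub standing in for an authenticated `requests.Session` GET; the endpoint URL and field names are assumptions, since the real platform isn't named.

```python
# Pagination + CSV export sketch; the fetch is stubbed with canned data.
import csv
import io

FAKE_PAGES = {
    1: [{"name": "Acme Ltd", "city": "Leeds"},
        {"name": "Bramble & Co", "city": "York"}],
    2: [{"name": "Zenith PLC", "city": "Bath"}],
    3: [],  # an empty page signals the end of the listing
}

def fetch_page(page):
    # Real version (assumed endpoint), after logging in on the session:
    #   session.get(f"https://example.com/businesses?page={page}").json()["data"]
    return FAKE_PAGES.get(page, [])

rows, page = [], 1
while True:
    batch = fetch_page(page)
    if not batch:  # stop when a page comes back empty
        break
    rows.extend(batch)
    page += 1

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "city"])
writer.writeheader()
writer.writerows(rows)
print(f"{len(rows)} businesses exported")
```

For the real site, the login step would POST the credentials (plus Laravel's CSRF token) on a `requests.Session` so the pagination requests stay authenticated.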
8 days ago · 31 proposals · Remote

Data input to a website
I'm a real estate agent. I have about 100 properties that I need loaded onto my website. This will include hundreds of photos, property descriptions, property details, etc. I need a reliable freelancer able to do this professionally and accurately. The work must be completed quickly.
7 days ago · 66 proposals · Remote

I need someone to clean my list of data
I have an email list of approximately 35,000 email addresses and I need someone to clean the data. This can be done either manually or using an automated process; I’m flexible on the method. The requirements are:
1. Remove all duplicate email addresses
2. Remove any invalid email addresses, or addresses that are likely to bounce

I’m not concerned about how this is carried out, as long as the final list of email addresses is accurate and fully cleaned, with no invalid addresses or emails that bounce.
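The two stated requirements (dedupe, drop invalid addresses) reduce to a short pass like the sketch below. Note that a regex only catches syntactic invalidity; a real cleanse would add MX lookups or a verification service to catch well-formed addresses that still bounce.

```python
# Deduplicate (case-insensitively) and drop syntactically invalid emails.
import re

# Simplified pattern; full RFC 5322 validity is far looser than this.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def clean_list(emails):
    seen, cleaned = set(), []
    for raw in emails:
        addr = raw.strip().lower()
        if addr in seen or not EMAIL_RE.match(addr):
            continue  # duplicate or malformed: discard
        seen.add(addr)
        cleaned.append(addr)
    return cleaned

sample = ["Jo@Example.com", "jo@example.com", "not-an-email", "amy@mail.co.uk"]
print(clean_list(sample))
```

On a 35,000-row list this runs in well under a second; the slow part of the job is the deliverability checking the regex cannot do.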
3 days ago · 66 proposals · Remote

Marine Event Prediction System (Environmental Data + ML)
I am building a system to log marine-related events at specific locations and times, automatically enrich those events with historical environmental data, and use the growing dataset to predict the likelihood of similar events occurring under comparable future conditions. This project is focused on environmental pattern detection, particularly phase transitions and converging conditions, not static averages. The system will be developed in clear milestones and must be modular and expandable.
9 hours ago · 13 proposals · Remote

Multilingual audio data collection project
We are conducting a multilingual audio data collection project and are seeking native speakers from specific regions to participate. The project involves recording natural, high-quality voice samples in your native language and regional accent. We are currently looking for native speakers of English (Ireland, New Zealand, Scotland, South Africa, Wales, Singapore), German (Switzerland), Chinese (Hong Kong), and Cantonese (China, Hong Kong). Participants must be born and raised in the respective region to ensure authentic pronunciation and accent accuracy. This is a remote, freelance opportunity suitable for individuals with clear speech and access to a quiet recording environment.
5 days ago · 5 proposals · Remote

Senior Data Engineer
We are seeking a Senior Data Engineer to design, implement, and optimize data pipelines utilizing Scala, Spark, and Java. The ideal candidate will develop and maintain real-time data processing systems essential for business operations. Collaboration with data scientists and analysts is crucial to understand data requirements and deliver high-quality solutions. Responsibilities include ensuring data quality through robust testing, monitoring workflows, and troubleshooting pipelines. Candidates should possess a degree in Computer Science or Engineering, with proven experience in data engineering, real-time processing, and SQL proficiency. Familiarity with cloud platforms and data governance is preferred. We offer a competitive salary, benefits, and opportunities for professional growth in a collaborative environment.

Key Responsibilities:
- Design, implement, and optimize data pipelines using Scala, Spark, and Java.
- Develop and maintain real-time data processing systems to support business-critical operations.
- Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver high-quality solutions.
- Ensure data quality and reliability through robust testing and validation procedures.
- Monitor and troubleshoot data pipelines and workflows to ensure high availability and performance.
- Stay current with emerging technologies and industry best practices to continuously improve our data infrastructure.

Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- Proven experience with Scala, Spark, and Java in a data engineering or similar role.
- Strong understanding of real-time data processing and streaming technologies.
- Experience with big data platforms and tools such as Hadoop, Kafka, and Flink is a plus.
- Proficiency in SQL and experience with relational databases.
- Excellent problem-solving skills and attention to detail.
- Strong communication and collaboration skills to work effectively with cross-functional teams.

Preferred Skills:
- Experience with cloud platforms (AWS, Azure, Google Cloud) and their data services.
- Knowledge of data warehousing solutions and ETL processes.
- Familiarity with data governance and security best practices.
17 days ago · 18 proposals · Remote

Web Scraping Required
I require a structured data extraction project from the following directory: https://www.buildington.co.uk/companies

The objective is to extract and structure company data into a clean Excel spreadsheet.

Required fields:
• Company Name
• Contact Name (if available)
• Telephone Number
• Email Address
• Website URL

Important:
• Some data is available directly on the directory page.
• In certain cases, the freelancer may need to visit the company’s website to retrieve missing contact details.
• Data must be structured, cleaned and deduplicated.
• Output format: Excel (.xlsx) with clearly labelled columns.
• Please confirm your approach and tools before starting.

This is not a one-off copy-and-paste task. I am looking for someone who can create a reliable and efficient extraction method.
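The "structured, cleaned and deduplicated" requirement is the step most often done badly: records scraped from different pages rarely match byte-for-byte. One approach, sketched below with made-up records, is to deduplicate on normalised name and URL keys rather than raw strings.

```python
# Deduplicate scraped company records on normalised keys.
# Field names mirror the brief; the records themselves are made up.
scraped = [
    {"Company Name": "Alpha Build Ltd ", "Website URL": "https://alphabuild.co.uk/"},
    {"Company Name": "alpha build ltd",  "Website URL": "http://www.alphabuild.co.uk"},
    {"Company Name": "Beta Homes",       "Website URL": "https://betahomes.co.uk"},
]

def norm_key(rec):
    # Collapse whitespace and case in the name.
    name = " ".join(rec["Company Name"].lower().split())
    # Strip scheme / www / trailing slash so URL variants compare equal.
    url = rec["Website URL"].lower()
    for prefix in ("https://", "http://", "www."):
        url = url.removeprefix(prefix)
    return name, url.rstrip("/")

# A dict keyed on the normalised tuple keeps one record per company.
deduped = list({norm_key(r): r for r in scraped}.values())
print(len(deduped))
```

The final deduplicated rows would then be written to the requested .xlsx with pandas' `DataFrame.to_excel`.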
35 minutes ago · 19 proposals · Remote

Data scraping needed
I have several directories I need to obtain contact information from. Can anyone help me? I will provide all the sites to the person who helps!
19 days ago · 67 proposals · Remote

Developer – UK/US Financial News Screener (Python / APIs / Data)
PROJECT OVERVIEW
I’m building a financial stock news screener focused initially on UK (AIM & Main Market) and later US small-cap stocks. The system is designed to filter genuinely market-moving news from daily noise. This is an ongoing build, not a one-off task. I’m looking for one capable individual developer (not an agency) who can own the technical implementation and iterate with me as the logic evolves. Where necessary, I want the option to work side-by-side in person for short periods, so UK or Europe location matters.

WHAT THE SYSTEM DOES (HIGH LEVEL)
- Ingests real-time and scheduled stock news from multiple APIs (UK & US)
- Parses and classifies news using rules first, then AI
- Scores news for market impact (positive / negative / neutral)
- Flags dilution, governance and risk signals
- Outputs structured alerts and internal dashboards

This is a logic-heavy, data-driven project, not a UI-first build.

REQUIRED SKILLS (MUST-HAVE)
Please do not apply unless you are comfortable with most of the following:
- Python (primary language)
- Working with REST APIs (news, market data, etc.)
- Data parsing, filtering and scoring logic
- SQL or NoSQL databases (Postgres, MongoDB or similar)
- Clean, readable, maintainable code
- Git / version control
- Comfortable discussing system design and trade-offs

Experience with financial data, trading tools, or market news is a major plus.

NICE TO HAVE (NOT ESSENTIAL)
- Experience with financial markets, trading, RNS or SEC filings
- AI / LLM-based text classification
- Elasticsearch or similar search tools
- AWS or cloud deployment
- Previous SaaS or data-platform builds

LOCATION REQUIREMENT (IMPORTANT)
You must be based in or near one of the following:
- UK
- Spain
- Western or Southern Europe (easy travel to UK or Spain)

Due to the likely evolving nature of the project, I want the option to meet in person and potentially work together for short, focused periods if required.
Please clearly state your location when applying.

ENGAGEMENT TYPE
- Individual freelancer only (no agencies)
- Long-term potential if the fit is right
- Paid hourly or milestone-based (open to discussion)

HOW TO APPLY (VERY IMPORTANT)
To avoid generic applications, please include:
1. A short description of a similar data-driven or API-heavy project you’ve worked on
2. Your primary tech stack
3. Your current location
4. Your hourly rate
5. Confirmation that you are open to occasional in-person collaboration

Applications that ignore this will not be considered.

WORKING STYLE
I’m an equities/indices trader with 30 years’ experience, reasonably technical but not a developer. I value clear thinking, no-nonsense honest communication, and someone who challenges bad ideas rather than blindly coding them.
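The "rules first, then AI" layer described above could look something like the sketch below: keyword rules assign an impact score and risk flags before anything is escalated to an LLM. The keyword lists and weights are purely illustrative, not a trading model.

```python
# Rule-based first pass: score headlines and flag dilution/governance risk.
# Keywords and weights are illustrative assumptions only.
POSITIVE = {"contract win": 2, "upgrade": 1, "record revenue": 2}
NEGATIVE = {"placing": -2, "discounted": -1, "profit warning": -3}
RISK_FLAGS = {"placing": "dilution", "director resigns": "governance"}

def score_headline(headline):
    text = headline.lower()
    # Sum the weights of every keyword present in the headline.
    score = sum(w for kw, w in {**POSITIVE, **NEGATIVE}.items() if kw in text)
    flags = sorted({flag for kw, flag in RISK_FLAGS.items() if kw in text})
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"score": score, "label": label, "flags": flags}

print(score_headline("Acme announces discounted placing to fund expansion"))
```

Headlines the rules leave neutral, or that trip a risk flag, would be the ones worth sending to the AI classification stage.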
33 minutes ago · 6 proposals · Remote

Administrative Support & Data Management – Microsoft Word & Excel
I am looking for professional support in customer service and administrative tasks using Microsoft Office tools. The project includes handling customer inquiries, organizing data, and preparing well-formatted documents to support daily business operations.

The required tasks include:
• Responding to customer messages professionally
• Data entry and information organization
• Creating and formatting documents using Microsoft Word
• Managing simple Excel sheets for tracking data or tasks
• Providing general administrative support

Accuracy, clear communication, and timely delivery are essential for this project. The goal is to ensure organized data, professional customer interaction, and smooth workflow using reliable office tools.
4 days ago · 17 proposals · Remote
Data Validation & Research Admin (Virtual Assistant)
We are running a 3- to 6-month data quality control project for a new database, which can be extended based on performance. We need support from a qualified research and data quality control virtual assistant.

Responsibilities:
• Check and validate structured data before it’s published
• Spot and fix discrepancies across different sources
• Track team deliverables and make sure deadlines are hit
• Flag any quality issues or gaps early
• Keep simple documentation/audit trails of changes
• Put together a weekly data quality & research progress report
• Suggest small process improvements to make the workflow smoother

Required Experience:
• Experience with data quality, research support, or admin work
• Super detail-oriented and organised
• Able to manage deadlines across multiple people/projects
• Strong Excel skills, including formulas and basic data analysis
• High numeracy and attention to detail
• Basic understanding of finance concepts
• Methodical, process-driven, likes keeping things neat
• Bonus: experience handling structured datasets

Success Metrics:
• Team research deadlines are consistently met
• Clear improvement in data consistency and accuracy
• Weekly reports are trusted and relied on
• Fewer ad hoc data issues or escalations
• Team works with clearer expectations and accountability

Payment & Commitment:
• Initial rate: £120 per 10 hours (DOE), which can be increased after an initial successful performance review
• Minimum commitment: 10 hours per month
• Performance bonuses available
10 days ago · 39 proposals · Remote
.step model from scan data – racing yacht
I have detailed scan data from an RTC360 of a racing yacht and require a high-precision .step model.
2 days ago · 10 proposals · Remote

Expert Data Researcher Needed – UK SPV Landlord Database
I’m looking for a highly competent, expert-level data researcher to carry out an ad-hoc project collating UK landlords operating through SPV (Special Purpose Vehicle) limited companies. The task is to build a clean, well-structured spreadsheet (any format that can be imported into Excel/Google Sheets) containing verified data sourced primarily from Companies House and/or other reliable UK property/ownership databases. Is there a way to include contact details (email / address / telephone)?

Key Requirements:
• Proven experience using Companies House and corporate ownership databases
• Strong understanding of UK property SPV structures
• Ability to identify and filter landlord companies accurately
• Excellent data hygiene and validation skills
• Able to work independently and deliver quickly (ASAP)

Deliverable:
• A spreadsheet containing SPV landlord company data
• Clearly labelled columns (e.g. company name, number, directors, registered address, SIC codes, property links if available, etc.)
• Scalable structure so the dataset can be expanded later

Project Type:
• Fixed price (please quote based on your expected hours)
• Ad-hoc with potential for repeat work

Ideal Candidate:
• Advanced researcher or analyst
• Prior experience in property, financial due diligence, or corporate intelligence
• Comfortable working with large datasets and complex filtering logic

Please include:
• Relevant experience with Companies House or similar databases
• An example of similar work (if available)
• Your proposed fixed price and estimated turnaround time
6 days ago · 20 proposals · Remote

YouTube data extract
Hi, I’m looking to create a list of videos on YouTube that have the keyword Hitler in the title. I’m looking for a CSV file output. I’d want all the usual video stats, including:
• Video title
• Video description
• Video tags
• Category
• Thumbnail
• Hashtags
• Video URL link
• View count
• Upload date
• Video length
• Likes
• Dislikes
• Comments count
• Channel name
• Channel join date
• Channel subscribers count
• Total channel views
• Number of uploaded videos
• Channel description
• Channel URL link

No API will be provided, and I’d like up to 7,000 videos please. Looking forward to hearing from you! Many thanks
a month ago · 29 proposals · Remote

Annual Form 10-K Data
Overview
Accountants must be familiar with the amount of data required in the Annual Form 10-K filing and the Annual Proxy filing for publicly traded companies, because they will use the data from these filings to research a company’s competitors. Accountants can even use the data from these filings in their personal life to research investment opportunities.

Scenario
In this milestone, you will prepare a valuation for a 1% minority shareholder on the assumption that your company is a “going concern” company, meaning that the company will be able to pay its financial obligations as needed for the foreseeable future.

Directions
In this milestone, you will provide a brief history and overview of the company you selected. Use your company’s most recent Form 10-K filing and SEC Annual Proxy filing from the Securities and Exchange Commission’s (SEC) website to gather the information described in the rubric criteria. You will also provide a brief summary of your findings for your valuation team members and include a visualization in the summary. Note: Refer to this module’s Discussion for the list of companies to choose from.

Specifically, you must address the following rubric criteria:

Links
• Provide the most recent SEC Form 10-K filing link for the company.
• Provide the most recent SEC Proxy filing link for the company.

History and Overview
• Provide a brief company history overview based on external research of the company. Consider the following questions to guide your response: How long has the company been in business? Who was the original founder of the company? What significant changes to company leadership have occurred? How has the company changed since its beginning? Consider expansion of locations or products/services, etc.
• Identify all of the company’s major locations for their facilities and/or other properties.
• Identify all of the customers recognized by the company.
• List all of the names of the executive management team of the company.
• Identify all of the competition recognized by the company.
• Identify all of the major shareholders of the company.
• Describe business risks recognized by the company.
• Explain how the company is committed to environmental, social and governance (ESG) efforts and sustainability.
• Describe the company’s Leadership in Energy and Environmental Design (LEED) status. Consider the following questions to guide your response: Is the company currently LEED certified? If not, is it working toward becoming LEED certified?

Summary
• Summarize your findings for the valuation team. Include the following details in your response: Explain what you learned as you researched the company. Identify the key points the valuation team needs to be aware of.
• Create at least one effective visualization that supports key points, with appropriate labels for the visualization(s).

If you need writing support, access the Academic Support module of your course.

What to Submit
Submit the Business Valuation Template with the Milestone One: Introduction section completed. The Introduction section should be an additional 4- to 6-page Microsoft Word document with double spacing, 12-point Times New Roman font, and one-inch margins, in addition to the current page count. Sources should be cited according to APA style. Note: You will be using this same file throughout all the milestones and your project.

Supporting Materials
The following resource supports your work on this assignment: the U.S. Securities and Exchange Commission website, which allows users to search for a publicly traded company and find its published financial statements. Use this website to find your company’s 10-K filing.
23 days ago · 7 proposals · Remote

Backend Developer – Data Model & Logic for MVP
Looking for an experienced backend developer to build the core data model and process logic for a web/mobile MVP.

The work includes:
• Scalable JSON data structures for users, items, and workflows
• Backend logic for process handling
• API-ready setup for future integration
• Clear documentation of data structures and logic

Skills Required:
• Backend development (Python, Node.js, or similar)
• JSON, API, and database schema design
• Clean, documented code
• Understanding of process logic is a plus
13 days ago39 proposalsRemoteNeed help with Looker Dashboard in Looker Studio
I have a file where I will need help with Looker dashboards. It relates to phishing data.
13 days ago · 13 proposals · Remote