Extraction Projects
Looking for freelance Extraction jobs and project work? PeoplePerHour has you covered.
Invoice extraction spreadsheet
I would like a way to set things up so that information is extracted from invoices and then feeds into a spreadsheet of all the information I require. I'm not wanting anything complicated to use; it just needs to take the three or four things I need from every invoice I have and put that information in the correct column. It needs to work without constant adjustment to rules. I will be able to provide the headings from the invoices that always need to be extracted. Just so you know, I need to be able to run this programme myself each day with different invoices as they come in, so this needs to be a programme or system for me to use myself, not for me to send you the invoices. Thanks
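Requests like this are usually met with a small script the client can run themselves each day. A minimal Python sketch, assuming the invoice text is already available as plain-text files and using hypothetical field labels ("Invoice No", "Date", "Total") in place of the headings the client would actually supply:

```python
import csv
import re
from pathlib import Path

# Hypothetical labels; the real ones would come from the client's invoices.
# Values are assumed to follow the label on the same line, e.g. "Total: £99.50".
FIELDS = ["Invoice No", "Date", "Total"]

def extract_fields(text: str) -> dict:
    """Pull each labelled value out of one invoice's text."""
    row = {}
    for field in FIELDS:
        m = re.search(rf"{re.escape(field)}\s*[:\-]?\s*(.+)", text)
        row[field] = m.group(1).strip() if m else ""
    return row

def run(invoice_dir: str, out_csv: str) -> None:
    """Process every .txt invoice in a folder into one spreadsheet."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for path in sorted(Path(invoice_dir).glob("*.txt")):
            writer.writerow(extract_fields(path.read_text()))
```

Label-based extraction like this is what makes the "no constant adjustment to rules" requirement plausible, as long as the labels are consistent across invoices; invoices arriving as scanned PDFs would need an OCR step first.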
4 days ago · 25 proposals · Remote
Real Estate firm - using Microsoft Lists
We are a real estate company currently targeting acquisitions on behalf of a company. We receive a lot of marketing brochures, emails and websites which we need to put into a readable format - we have decided on Microsoft Lists. We are currently exploring using parseur.com to extract the information from the brochures, though we are having trouble with the extraction. The job involves: - Using Microsoft Power Automate to extract the data from parseur.com / a potential other AI PDF reader tool - Creating a map connected to Microsoft Lists to view the asset locations
9 days ago · 18 proposals · Remote
Gybe new logo
I'm looking for a logo to use on letterhead and/or emails for “Gybe Procure” (based in the construction / mineral extraction sector).
a month ago · 58 proposals · Remote
opportunity
PDF Automation
I need OCR software that can read handwritten and typed PDFs that come in via email and automate them, via Zapier, into Excel or Google Sheets & Dropbox. We have application forms submitted to us by email; some of them are filled out by hand (handwritten) and some of them are filled in on a computer. I would like the data extracted and put into a spreadsheet. Is this possible? Thanks, Will
17 days ago · 25 proposals · Remote
Digital Formatting
We are looking for a person proficient with large-volume scanning and conversion projects to help us put together a 21-week course that's in English but will also have Yoruba and Spanish versions. We have most of the content, with a blueprint/outline. We are using material from books and PDFs. Most of the content has been scanned and some of it is in digital format, so you will need to use good OCR software to extract the images and text. Of course, we are looking for consistent formatting and a professional look. We will also need you to create worksheets, quizzes, and tests for the material.
14 days ago · 17 proposals · Remote
I need a 3D solid from a 3D mesh model
I need someone proficient in SolidWorks and/or Rhino who can convert a surface model of a boat into a solid, enabling me to 3D print sections of the hull. I have various source files (.obj, .ma, .fbx and more I am not familiar with). I have been working only with the .obj, opening it in SolidWorks, converting to mesh, and attempting to work with surfaces and profile slices to reverse engineer the original model. Essentially I would like: - the model scaled to 1.2m in length from its original scale (3m+) - the hull only extracted from the model for future use - everything above and including the fenders removed - the model origin shifted to the centre-line at the stern/back deck level of the hull - a solid created of the hull model - ideally, it would be great to know a workflow for doing this in SolidWorks and/or Rhino. I have spent a lot of time on it but, as a new user, have really struggled.
14 hours ago · 19 proposals · Remote
Python ETL Expert Needed for Data Transformation Project
Hello, we're looking for a skilled Python ETL expert to assist us with a data transformation project. Our goal is to extract, transform, and load data from various sources into a structured format for analysis. We need someone with experience in automating ETL pipelines, handling data quality issues, and ensuring seamless integration. Key Responsibilities: Retrieve data from all sorts of places - databases, APIs, and flat files. Work on raw data, making it clean, consistent, and accurate. Handle missing values, normalise data, and do some feature engineering. Load the transformed data where it belongs, ensuring it gets along with our storage solution. Test and validate, ensuring our data stays true to its transformed self. Requirements: Proven experience in Python development with a focus on ETL processes. A deep understanding of data extraction, transformation, and loading best practices. A track record of successfully automating ETL pipelines for efficient data processing. Excellent problem-solving skills and meticulous attention to detail. Deliverables: Structured and organized CSV files, each containing accurately transformed, structured data. An automated and efficient ETL pipeline that can be easily maintained. Regular updates on project progress, ensuring we're always on the same page. If you're a Python ETL expert looking for an engaging project, we'd love to hear from you! Please send a proposal detailing your relevant experience, a brief overview of your approach to ETL projects, and any examples of past work.
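The extract/transform/load split described above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical column names ("name", "amount") standing in for the project's actual schema:

```python
import csv
import io

def extract(source: io.TextIOBase) -> list[dict]:
    """Extract: read raw rows from a flat-file source."""
    return list(csv.DictReader(source))

def transform(rows: list[dict]) -> list[dict]:
    """Transform: fill missing values, normalise text, fix types."""
    out = []
    for row in rows:
        amount = row.get("amount") or "0"      # handle missing values
        out.append({
            "name": (row.get("name") or "").strip().title(),
            "amount": round(float(amount), 2),
        })
    return out

def load(rows: list[dict], dest: io.TextIOBase) -> None:
    """Load: write the structured result as CSV."""
    writer = csv.DictWriter(dest, fieldnames=["name", "amount"])
    writer.writeheader()
    writer.writerows(rows)

def run_pipeline(source, dest) -> None:
    load(transform(extract(source)), dest)
```

Keeping the three stages as separate functions is what makes a pipeline like this testable and maintainable: each stage can be validated against source data independently, which the posting calls out as a deliverable.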
18 days ago · 16 proposals · Remote
Need help with featured collection in Dawn theme
I need assistance designing and adding a new featured product card to an existing Dawn theme collection block on my website. The current block displays products as a responsive carousel, and I would like to feature one highlighted product as a larger blank card above the others. The new design should blend seamlessly with the layout and styling of the existing collection content. Proficiency with Shopify and theme customization is required to extract the relevant code, make appropriate template and CSS edits, and add the new featured item markup. The ability to style the blank card and integrate it into the current carousel without affecting core functionality is important. Suggestions for highlighting the promoted product will also be considered.
9 days ago · 14 proposals · Remote
Need Google business scraper with Python (urgent)
I need a list of US businesses with a business.site or negocio.site website. The list needs to come from Google Maps, and you need to filter the business profiles by 'website' = business.site. You will show the first milestone of the project within 2 hours. I want a list of all the business profiles that have a website that says business.site. Here's an example: https://your-lawn-care-connection.business.site/ I need you to make a list of all businesses with a business.site and extract their name, field, phone number & location. Categories: Groceries, Restaurants, Accountants, Dentist, Therapist, Electrician, Tree Removal, Pest Control, HVAC, Roofing, Landscaping, Insulation, Drywall, Fencing, Plumber, Towing, Cornwall, Junk Removal, Dentists, Auto Detailing, Dentures, Chiropractor, Bathroom Remodel, Kitchen Remodel, Contractor, Siding
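Setting aside how the profiles are actually scraped from Google Maps (which needs its own tooling), the filtering and extraction step this posting describes might look like the following sketch. The record shape for each scraped profile is a hypothetical assumption:

```python
from urllib.parse import urlparse

# Domains the posting asks to filter on.
TARGET_DOMAINS = ("business.site", "negocio.site")

def matches_target(profile: dict) -> bool:
    """Keep only profiles whose website host ends in a target domain."""
    host = urlparse(profile.get("website", "")).netloc
    return host.endswith(TARGET_DOMAINS)

def to_row(profile: dict) -> dict:
    """Extract just the requested columns: name, field, phone, location."""
    return {k: profile.get(k, "") for k in ("name", "field", "phone", "location")}

def build_list(profiles: list[dict]) -> list[dict]:
    return [to_row(p) for p in profiles if matches_target(p)]
```

Matching on the parsed hostname rather than a substring of the URL avoids false positives from pages that merely mention business.site somewhere in the address.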
20 days ago · 14 proposals · Remote
Need Google business scraper (with Python)
I need a list of US businesses with a business.site or negocio.site website. The list needs to come from Google Maps, and you need to filter the business profiles by 'website' = business.site. You will show the first milestone of the project within 2 hours. I want a list of all the business profiles that have a website that says business.site. Here's an example: https://your-lawn-care-connection.business.site/ I need you to make a list of all businesses with a business.site and extract their name, field, phone number & location. Categories: Groceries, Restaurants, Accountants, Dentist, Therapist, Electrician, Tree Removal, Pest Control, HVAC, Roofing, Landscaping, Insulation, Drywall, Fencing, Plumber, Towing, Cornwall, Junk Removal, Dentists, Auto Detailing, Dentures, Chiropractor, Bathroom Remodel, Kitchen Remodel, Contractor, Siding
20 days ago · 9 proposals · Remote
Application Development Architect for 2-page technical writeup
We're looking for a skilled Technical or Solution Architect to create a 2-page solution design document. Yes, exactly 2 pages! Your task will involve designing a solution to integrate EPC open data with our business website, focusing on extracting and displaying Energy Performance Certificate (EPC) ratings for UK properties. You'll draft a comprehensive technical document detailing the process flow, API integration, data fetching, and presentation. Experience with RESTful APIs, data parsing, and security measures is essential. If you're adept at creating scalable, efficient solutions and can transform complex requirements into actionable plans, we'd love to hear from you. The EPC data is available as open data: https://epc.opendatacommunities.org/docs/api/domestic . I need it done urgently, so please apply if you have good English written and oral skills (for meetings with me). No ChatGPT replies.
8 days ago · 9 proposals · Remote
Need to create a report using SQL
I require assistance in generating a complex report from multiple data sources within a custom application. The report necessitates retrieving and aggregating information from various tables using SQL queries. It must pull and consolidate pertinent records from a range of interconnected database assets while preserving data integrity. Specifically, the report will need to extract salient fields like identifiers, values, and timestamps from several intricately related operational stores. It should then compile and organize the extracted segments to provide a holistic view of core business operations over a specified timeframe. The constituent pieces must be intelligently linked and cross-referenced to offer meaningful insights. The underlying database schema may be intricate, with numerous joins and complex relationships between entities. As such, the SQL development will require strong expertise in querying disparate sources and nesting clauses to obtain all needed attributes. Optimization will also be critical, as improper queries could impact system performance. Overall, the finished report must deliver a cohesive narrative and present compiled information in a clear format that facilitates easy analysis. Requirements gathering, iterative testing, and validation against source data will be pivotal to ensure accuracy and reliability. Flexibility with formatting for both onscreen and printable distribution is preferable. Key skills needed are advanced SQL, relational data modeling, report generation, and analytical problem-solving abilities. Experience extracting consolidated views from sophisticated production databases through SQL is a must.
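The shape of such a report query - joins across related tables, a timeframe filter, and aggregation - can be sketched with Python's built-in sqlite3 module. The tables and columns here are hypothetical stand-ins for the real schema:

```python
import sqlite3

# Hypothetical schema: orders joined to customers, aggregated over a
# timeframe. The real report would involve many more joins, but the
# structure - join, filter on timestamps, group, order - carries over.
REPORT_SQL = """
SELECT o.customer_id,
       c.name,
       COUNT(*)     AS order_count,
       SUM(o.value) AS total_value
FROM   orders o
JOIN   customers c ON c.id = o.customer_id
WHERE  o.created_at BETWEEN ? AND ?
GROUP  BY o.customer_id, c.name
ORDER  BY total_value DESC
"""

def run_report(conn: sqlite3.Connection, start: str, end: str) -> list[tuple]:
    """Run the report for a given timeframe, using bound parameters."""
    return conn.execute(REPORT_SQL, (start, end)).fetchall()
```

Bound parameters (the `?` placeholders) matter for both correctness and the performance concern the posting raises: the database can cache the query plan instead of re-parsing a freshly concatenated string each run.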
a month ago · 18 proposals · Remote
RSS Feed Aggregation and Categorization Web Application
Objective: To develop a web application that captures RSS feeds from multiple sources, extracts comprehensive data including title, article body, images, source, author, etc., categorizes the content, and generates its own RSS feed based on these categories. Functional Requirements: RSS Feed Capture: The application shall integrate with various sources to capture RSS feeds. Upon capturing, the application shall retrieve complete data from the feed, including title, article body, images, source, author, and any additional relevant information. Data Extraction: The captured RSS feeds shall undergo parsing to extract relevant information from each feed entry. Information extracted shall include, but is not limited to: Title of the article. Body of the article. Images associated with the article. Source of the article (URL or name). Author(s) of the article. Publication date/time. Any other metadata provided by the feed. Categorization: The application shall categorize the extracted content into various predefined or dynamically generated categories based on content analysis. Categories may include but are not limited to: News, Technology, Sports, Finance, Entertainment, etc. The categorization process should utilize techniques such as keyword analysis, machine learning algorithms, or user-defined rules to assign content to appropriate categories. RSS Feed Generation: Once the content is categorized, the application shall create its own RSS feed(s) for each category. The generated RSS feeds shall include the categorized content along with relevant metadata. Each RSS feed should conform to standard RSS specifications and be accessible via a unique URL. Non-Functional Requirements: Performance: The application shall handle a large volume of RSS feeds efficiently, ensuring minimal latency in capturing, processing, and categorizing the content. Response time for user interactions shall be optimized to provide a seamless browsing experience.
Scalability: The architecture of the application should be designed to scale horizontally to accommodate increasing numbers of RSS feeds and users. Load balancing mechanisms should be implemented to distribute incoming traffic across multiple servers. Reliability: The application shall be robust and resilient to failures, ensuring continuous operation even in the event of hardware or software failures. Data integrity measures shall be in place to prevent data loss or corruption. Security: The application shall implement authentication and authorization mechanisms to control access to sensitive functionalities and data. Data transmission and storage shall be encrypted to protect against unauthorized access or tampering. Assumptions: The RSS feeds from various sources are accessible via standard HTTP protocols. The application will not alter the original content of the RSS feeds, but rather create its own feeds based on categorized content.
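The capture → extract → categorise steps above can be sketched with Python's standard library alone. This is a minimal illustration of the keyword-analysis option the requirements mention; the category keywords are hypothetical, and a real system might use ML-based categorisation instead:

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword rules mapping categories to trigger words.
CATEGORIES = {
    "Technology": ["ai", "software", "chip"],
    "Sports": ["match", "league", "goal"],
}

def parse_feed(xml_text: str) -> list[dict]:
    """Extract title/link/description from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", ""),
            "link": item.findtext("link", ""),
            "description": item.findtext("description", ""),
        })
    return entries

def categorise(entry: dict) -> str:
    """Assign a category by simple keyword analysis of title + body."""
    text = (entry["title"] + " " + entry["description"]).lower()
    for category, keywords in CATEGORIES.items():
        if any(k in text for k in keywords):
            return category
    return "General"
```

Generating the per-category output feeds is then the reverse of `parse_feed`: grouping entries by the assigned category and serialising each group back into RSS 2.0 `<channel>`/`<item>` markup.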
25 days ago · 7 proposals · Remote
opportunity
Local Council Chatbot utilising Llama2 and dataset of PDF docs.
Full stack developer with relevant experience in AWS services and LLM deployment. Description: Develop an MVP that provides a chat interface allowing users to query a dataset of local council documents, which will variously include minutes and policy documents - a dataset containing all information relating to the purpose, policies, news, information, and decision making by that council. The dataset would contain approximately 100 PDF documents, and the chatbot would return meaningful and coherent answers to user prompts, while providing reference links to the documents that information in the response is taken from. The client acknowledges the current limitations of LLMs in returning responses from queries across multiple documents, especially given current token limits and processing cost restrictions. A developer is sought who can leverage techniques to embed metadata in the text, allowing techniques such as RAG to extract snippets of data from multiple documents relating to the query and collate them into a response to the user, while adhering to token limits. Objective: Develop an automated semantic text analysis pipeline that processes and analyses textual data extracted from documents using Llama2. This pipeline enriches text with metadata for deeper insights and enables semantic search capabilities through a user-friendly interface. This stage of the project is for an MVP system, leveraging AWS services such as Textract for text extraction, a text categorising stage with a simple-to-use GUI, all-mpnet-base-v2 for embedding, and Postgres with a vector extension. This job posting is for the MVP stage only, but we must be mindful of the stage two development and facilitate rapid and straightforward scalability in any stage one MVP processes.
System Overview: The solution encompasses AWS services for storage and processing, a custom interface for metadata enrichment, all-mpnet-base-v2 for generating text embeddings, Postgres and a vector extension for efficient storage and retrieval of vectors, and a custom-built web interface for user interaction. RAG will be implemented with as broad a context as possible to the model across a large document set. Phase 1: MVP Stage 1. Document Storage and Processing Trigger Tool: Amazon S3. Process: Upload documents (PDFs initially) to designated S3 buckets; documents will be renamed in accordance with a set naming convention and details of the document entered into the database. This triggers the subsequent text extraction process. For test purposes the uploads will be made manually, and at later stages a web scraper will be added that automatically places PDF documents into relevant S3 buckets. 2. Text Extraction - Tool: AWS Textract. - Process: Text is extracted from uploaded PDF documents and temporarily stored in S3 buckets to facilitate further processing. 3. Text Enrichment Developer to advise on the best method of adding labels / categories to the text, via an easy-to-use interface. Labels to be added at a granular level to allow the return of text snippets from within the chunks of data, but with relevant metadata. The purpose of this is to provide context to the LLM in formulating responses from a broad range of documents without exceeding the token limit. 4. Text Vectorization • Embedding tool: all-mpnet-base-v2 • LLM: Amazon SageMaker (using LLaMA 2). Process: The text is processed with LLaMA 2 to generate vector embeddings, capturing semantic information for advanced analysis and search functionalities. 5. Vector Storage Tool: Postgres with a vector extension Process: Text vectors are stored in the database, allowing for efficient management and retrieval of vectorized data for semantic searches. 6.
Front-end Web Application and Search Functionality Front-end Technology: • React.js. Key Features: • Semantic search input and results display. • Email input field for collecting contact information for marketing purposes, forwarding to the client's email address. • Homepage containing descriptive marketing text. 3 pages total: home page, interaction page, contact page, plus a pop-up with GDPR info. Graphics provided as template guidance. Back-end Technology: • Python with FastAPI. 7. Fine Tuning Allow for fine tuning based on a series of questions and responses to be provided by the client, until such point that coherent responses to queries are achieved. Phase 2: Full Automation and Scaling Beyond the scope of this job. Notes: The developer is to provide guidance and feedback on the capabilities of the technologies and is free to provide their own guidance and suggestions. However, the functionality of the system in providing coherent responses based on text snippets drawn from a large dataset is both the challenge and the absolute requirement. Please only bid with your full and final price. Placeholders will not be accepted. Completion within approximately two weeks. Please respond by explaining how you would handle the text enrichment.
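The core difficulty this posting names - collating snippets from many documents without exceeding the token limit - comes down to ranking stored chunks by vector similarity and packing the best ones into the prompt. A minimal pure-Python sketch (standing in for pgvector and all-mpnet-base-v2, with embeddings assumed to be pre-computed and a hypothetical chunk record shape):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec: list[float], chunks: list[dict], token_budget: int) -> list[dict]:
    """Rank chunks by similarity, then pack the best under the token limit.

    Each chunk is {"text", "tokens", "vec", "source"}; metadata such as
    "source" travels with the snippet so the final answer can link back
    to the documents it drew from.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    picked, used = [], 0
    for chunk in ranked:
        if used + chunk["tokens"] <= token_budget:
            picked.append(chunk)
            used += chunk["tokens"]
    return picked
```

In the real system the similarity ranking would be done inside Postgres by the vector extension rather than in application code, but the budget-packing step and the snippet-level metadata are exactly what the text-enrichment stage is meant to enable.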
21 days ago · 17 proposals · Remote
opportunity
Webinar Outlines and Marketing Content plus Speaker Bios
Could you write a clear, concise and inspiring bio for a keynote speaker? Could you summarise the content of a one-hour webinar into a clear, concise and inspiring description - one that would persuade anyone who reads it to sign up and pay to watch it? Do you have the software to extract and transcribe a one-hour webinar and turn the words into an editable format? If so, I have over 200 webinars, each an hour long, given by a number of different speakers, which each need three things: - A speaker bio - A marketing / sales outline of why the reader should listen to the webinar and the main four or five benefits they would gain - A usable transcript of the webinar audio. (Current webinars here as an example: https://www.bemoreeffective.com/webinars/) Could you do this? Accurately? Cheaply? Quickly?
25 days ago · 27 proposals · Remote
opportunity
Automated Semantic Text Analysis Pipeline
Comprehensive Use Case Specification: Automated Semantic Text Analysis Pipeline Objective Develop an automated semantic text analysis pipeline that processes and analyses textual data extracted from documents. This pipeline enriches text with metadata for deeper insights and enables semantic search capabilities through a user-friendly interface. This stage of the project is for an MVP system and should leverage AWS services such as Textract for text extraction, a text categorising stage with a simple-to-use GUI, all-mpnet-base-v2 for embedding, and Postgres with a vector extension. This job posting is for the MVP stage only, but we must be mindful of the stage two development and facilitate rapid and straightforward scalability in any stage one MVP processes. System Overview The solution encompasses AWS services for storage and processing, a custom interface for metadata enrichment, all-mpnet-base-v2 for generating text embeddings, Postgres and a vector extension for efficient storage and retrieval of vectors, and a custom-built web interface for user interaction. RAG will be implemented with as broad a context as possible to the model across a large document set. Phase 1: MVP Stage 1. Document Storage and Processing Trigger - Tool: Amazon S3. - Process: Upload documents (PDFs initially) to designated S3 buckets; documents will be renamed in accordance with a set naming convention and key metadata relating to the document entered into the database for future reference. This triggers the subsequent text extraction process. For test purposes the uploads will be made manually, and at later stages a web scraper will be added that automatically places PDF documents into relevant S3 buckets. 2. Text Extraction - Tool: AWS Textract. - Process: Text is extracted from uploaded PDF documents and temporarily stored in S3 buckets to facilitate further processing. 3.
Text Enrichment Developer to advise on the best method of adding labels / categories to the text, via an easy-to-use interface. Labels to be added at a granular level to allow the return of text snippets, providing context to the LLM in formulating its responses from a broad range of documents without exceeding the token limit. 4. Text Vectorization - Embedding tool: all-mpnet-base-v2 - LLM: Amazon SageMaker (using LLaMA 2). - Process: The text is processed with LLaMA 2 to generate vector embeddings, capturing semantic information for advanced analysis and search functionalities. 5. Vector Storage - Tool: Postgres with a vector extension - Process: Text vectors are stored in the database, allowing for efficient management and retrieval of vectorized data for semantic searches. 6. Front-end Web Application and Search Functionality - Front-end Technology: React.js. - Key Features: - Semantic search input and results display. - Email input field for collecting contact information for marketing purposes, forwarding to the client's email address. - Homepage containing descriptive marketing text. - 3 pages total: home page, interaction page, contact page, plus a pop-up with GDPR info. Graphics provided as template guidance. - Back-end Technology: Python with FastAPI. Phase 2: Full Automation and Scaling 1. Automated Document Ingestion - Process: A web scraping tool is implemented to automatically identify and upload new documents to the S3 bucket, facilitating a continuous flow of data into the pipeline without manual intervention. 2. Scalable Architecture - Deployment: The application components are containerized using Docker and managed with Kubernetes (Amazon EKS), ensuring the system can scale efficiently to accommodate increased data volumes and user queries. 3. Enhanced Processing Capabilities - Improvements: Integrate additional NLP and ML models for broader and more nuanced text analysis. Consider fine-tuning custom models for specific domain applications. 4.
User registration and user management system integration. Please note the attached contract agreement that will be deemed agreed to upon acceptance of the project. Your price given on PPH will be deemed to be your full and final price, and you will be deemed to have fully understood the scope, brief, and specification. To provide context, the project business plan has been uploaded. This is for context only and does not form part of the brief.
22 days ago · 11 proposals · Remote
MVHR design for whole house
I seek an experienced mechanical ventilation and heat recovery (MVHR) design specialist to comprehensively model airflow requirements and duct routing for a two-level residential property of approximately 205 square meters. The goal is to implement a balanced whole-house ventilation strategy utilizing an Airflow DV130 mechanical ventilation with heat recovery unit. This multi-zone project necessitates zone-by-zone airflow calculations followed by optimized duct routing throughout floor levels to thoroughly and quietly distribute ventilation. Design requirements include fresh air intake, extract outlets, and heat recovery media sized to Code standards. Attention must be paid to achieve balanced pressure relationships and minimize duct length/fittings. Your experience designing similar mid-sized homes is needed to swiftly but accurately complete schematics, and a bill of materials. Knowledge of UK building regulations and best practice duct installation methods is important. Kindly send me your proposals.
15 days ago · 7 proposals · Remote
opportunity
Keyword Research Web Tool
I am building a web application that helps businesses and website developers conduct keyword research to find all the keyword phrases typed into search engines, the search volumes of those phrases, and categorize them in terms of high, medium or low competition. The basic features are: 1) Enter a phrase or query in a search field. 2) Select your country from a dropdown list of countries. 3) Select your language from a dropdown list of languages. 4) Select the search engine from four options: Google, Bing, YouTube and Amazon. 5) Upon clicking the Enter button, the app connects to the search engine database and extracts the results. 6) The results are presented in 3 ways and will be exportable via CSV, Google Sheets or PDF: i) Visual Wheels ii) Lists iii) Alphabetical tables. 7) The user can create projects and store several searches within the projects. Frontend Development Technologies: ReactJS or Vue.js D3.js or Highcharts Bootstrap or Tailwind CSS Backend Development Technologies: Node.js with Express.js Python: For more complex data processing or interaction with search engines' APIs. Database Technologies: MongoDB API Integration Custom API Development: For extracting search volumes and competition categories. Authentication and Authorization Technologies: OAuth and JWT (JSON Web Tokens): For managing user logins and securing sessions. Payment Gateway Integration Stripe or PayPal: To manage the subscription payments. Cloud Services and Deployment AWS or Google Cloud Platform: For hosting. Export Functionality Libraries or APIs for generating CSV, Google Sheets, and PDF formats will be necessary.
a month ago · 30 proposals · Remote
opportunity
Electrical systems designer
Hello, I am looking to work with someone who can design the electrical system for a web take-up and vacuum conveyor system. Currently, each product line and the waste web come off the printer and fall into a tray for the operator to sort into bags. The end goal of this project is to automate the whole process. What does this look like? Automating the process involves an operator loading the material roll onto the machine, feeding the roll onto the bed and the take-up, and letting the machine do the rest. The web will travel along the cutting belt and onto the vacuum conveyor; the waste material is then extracted onto a take-up and the finished stickers travel down the vacuum conveyor to a bagging station. The initial take-up design consists of the web feeding through a load cell and onto the take-up mandrel fixed to a motor. I require someone who can produce an electrical load list, cable sizing, breaker sizing and, finally, a wiring diagram for all the components. I will attach a few images below to give you an idea of the work involved; however, I would be happy to talk you through a 3D model and all the documents we have. Thank you
23 days ago · 12 proposals · Remote
Background remover
I require the development of a web-based background removal tool to remove backgrounds from images automatically. The tool should utilize state-of-the-art deep learning techniques like semantic segmentation and instance segmentation to identify and isolate foreground objects from their backgrounds with high accuracy. It is important that the extracted foregrounds are as clean as possible with no visible artifacts or remnants from the original background. The web app should have an easy-to-use interface where users can upload an image, select the foreground objects they want to retain, and remove the background with a single click. Advanced customization options like adjusting eraser size, threshold, etc. would be appreciated but not necessary. Once processed, the background-removed images should be available for instant download in PNG format. Integration with cloud storage for handling large files and version control would be beneficial. Experience with TensorFlow, PyTorch or other deep learning frameworks for computer vision tasks and building full-stack web applications is required to take up this challenging yet interesting project.
21 days ago · 30 proposals · Remote