
AI Vision System - YOLO Dataset Preparation
- Proposals: 15
- Remote
- #4487291
- Expired

Description:
I am building an AI-based visual inspection system using real-world highway images and need support preparing a high-quality dataset for YOLO training.
This is a structured, multi-phase project. Accuracy, consistency, and attention to detail are critical.
⚠️ Important: This job will begin with Phase 1 only (image scrubbing and classification).
Further phases (annotation and YOLO dataset preparation) will follow based on performance.
---
Scope of Work:
Phase 1 – Image Scrubbing & Pre-Classification (Current Phase)
This is the most critical step of the project.
* Review large volumes of real-world highway images (hundreds of thousands available)
* Identify and filter out images that are not useful (no defects, irrelevant content, low-quality data)
* Sort and group images into the correct defect classifications based on provided examples
* Ensure consistency when assigning images to classes (similar defects must always be grouped the same way)
Note:
These are real-world inspection images. Multiple defect types may appear across different stretches of highway, and some images may contain no relevant defects at all. Strong judgment is required. Please message me if you have any questions; I need accuracy and strong attention to detail.
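One way to keep Phase 1 output consistent is to record each review decision (image to class, or image to reject) and let a small script do the file handling. The sketch below is a hypothetical helper, not part of the project SOPs: `sort_images`, the `decisions` mapping, and the `_reject` folder name are all assumptions. It copies rather than moves, so the source batch stays untouched, and it returns per-class counts.

```python
from __future__ import annotations

import shutil
from collections import Counter
from pathlib import Path

# Hypothetical folder name for no-defect or unusable images.
REJECT = "_reject"


def sort_images(decisions: dict[str, str], src_dir: Path, out_dir: Path) -> Counter:
    """Copy each reviewed image into out_dir/<class name>/.

    `decisions` maps an image file name to the class chosen by the reviewer,
    or to REJECT when the image shows no defect or is unusable.
    """
    counts: Counter = Counter()
    for image_name, class_name in sorted(decisions.items()):
        target = out_dir / class_name
        target.mkdir(parents=True, exist_ok=True)
        # copy2 leaves the source batch untouched and preserves file timestamps.
        shutil.copy2(src_dir / image_name, target / image_name)
        counts[class_name] += 1
    return counts
```

The returned counts make it easy to see which classes are still under-represented after each batch.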
---
Phase 2 – Image Annotation (Future Phase)
* Use LabelMe to annotate images
* Draw bounding boxes or polygons depending on object type
* Label objects according to a predefined class list (34–36 classes)
* Follow strict naming and labeling conventions
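LabelMe saves one JSON file per image, with each annotation in a `shapes` list that carries a `label`, a list of `points`, and a `shape_type` such as `rectangle` or `polygon`. A minimal sketch of a naming-convention check, assuming the predefined class list is available as a set of exact label strings; `check_labelme` and `load_labelme` are hypothetical helpers, not LabelMe features:

```python
from __future__ import annotations

import json
from pathlib import Path


def load_labelme(path: Path) -> dict:
    """Read one LabelMe annotation file."""
    return json.loads(path.read_text(encoding="utf-8"))


def check_labelme(data: dict, allowed_labels: set[str]) -> list[str]:
    """Return human-readable problems found in one LabelMe annotation."""
    problems = []
    for i, shape in enumerate(data.get("shapes", [])):
        label = shape.get("label", "")
        kind = shape.get("shape_type")
        n_points = len(shape.get("points", []))
        # Exact string match, so "Pothole" or "pothole " are both rejected.
        if label not in allowed_labels:
            problems.append(f"shape {i}: unknown label {label!r}")
        # A rectangle is stored as two opposite corners.
        if kind == "rectangle" and n_points != 2:
            problems.append(f"shape {i}: rectangle needs 2 points, got {n_points}")
        elif kind == "polygon" and n_points < 3:
            problems.append(f"shape {i}: polygon needs at least 3 points, got {n_points}")
        elif kind not in ("rectangle", "polygon"):
            problems.append(f"shape {i}: unsupported shape_type {kind!r}")
    return problems
```

Running this over every JSON file in a batch before handing it off catches label typos early, when they are cheapest to fix.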
---
Phase 3 – Dataset Preparation for YOLO (Future Phase)
* Convert LabelMe annotations into YOLO format
* Ensure correct class IDs and structure
* Organize dataset into train/ and val/ folders
* Verify all images have matching label files
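YOLO detection labels are plain-text files, one per image with the same base name, where each line is `class_id x_center y_center width height` and the four coordinates are normalized by the image width and height. The sketch below assumes a detection (bounding-box) target, so polygons are reduced to their enclosing box; a segmentation model would need the polygon points kept instead. `class_ids` is a hypothetical name-to-ID mapping, and the hash-based `split_for` is one assumed way to make the train/val assignment repeatable across batches.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def shape_to_yolo_box(points, img_w: float, img_h: float) -> tuple[float, float, float, float]:
    """Normalized (x_center, y_center, width, height) of the box enclosing `points`."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Clamp to the image, since a box can be dragged slightly past the edge.
    x_min, x_max = max(min(xs), 0.0), min(max(xs), img_w)
    y_min, y_max = max(min(ys), 0.0), min(max(ys), img_h)
    return (
        (x_min + x_max) / 2 / img_w,
        (y_min + y_max) / 2 / img_h,
        (x_max - x_min) / img_w,
        (y_max - y_min) / img_h,
    )


def labelme_to_yolo_lines(data: dict, class_ids: dict[str, int]) -> list[str]:
    """One YOLO detection line per LabelMe shape (rectangles and polygons alike)."""
    img_w, img_h = data["imageWidth"], data["imageHeight"]
    lines = []
    for shape in data["shapes"]:
        class_id = class_ids[shape["label"]]  # KeyError means a label outside the class list
        xc, yc, w, h = shape_to_yolo_box(shape["points"], img_w, img_h)
        lines.append(f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


def convert_file(json_path: Path, labels_dir: Path, class_ids: dict[str, int]) -> Path:
    """Write <labels_dir>/<stem>.txt, assuming the JSON is named after its image."""
    data = json.loads(json_path.read_text(encoding="utf-8"))
    lines = labelme_to_yolo_lines(data, class_ids)
    out_path = labels_dir / f"{json_path.stem}.txt"
    out_path.write_text("\n".join(lines) + ("\n" if lines else ""), encoding="utf-8")
    return out_path


def split_for(stem: str, val_fraction: float = 0.2) -> str:
    """Deterministic train/val assignment: the same file name always lands in the same split."""
    digest = hashlib.md5(stem.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "val" if bucket < val_fraction else "train"
```

The exact folder layout under `train/` and `val/` (for example, separate image and label subfolders) depends on the training tool and should follow the project's conventions.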
---
Phase 4 – Quality Control (Ongoing)
* Ensure labels are accurate and consistent
* Avoid missing or incorrect annotations
* Perform validation before delivery
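Before delivery, a validation pass can catch the most common dataset errors mechanically: images without labels (and vice versa), malformed lines, out-of-range class IDs, and coordinates outside [0, 1]. A minimal sketch, assuming one images folder and one labels folder per split and the five-field detection format; `validate_split` and the accepted image extensions are assumptions:

```python
from __future__ import annotations

from pathlib import Path

# Assumed image extensions; adjust to whatever the vehicle cameras produce.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def validate_split(images_dir: Path, labels_dir: Path, num_classes: int) -> list[str]:
    """Return a list of problems; an empty list means the split passed."""
    errors = []
    image_stems = {p.stem for p in images_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES}
    label_stems = {p.stem for p in labels_dir.glob("*.txt")}
    for stem in sorted(image_stems - label_stems):
        errors.append(f"{stem}: image has no label file")
    for stem in sorted(label_stems - image_stems):
        errors.append(f"{stem}: label file has no image")
    for label_path in sorted(labels_dir.glob("*.txt")):
        text = label_path.read_text(encoding="utf-8")
        for n, line in enumerate(text.splitlines(), 1):
            parts = line.split()
            if not parts:
                continue  # blank lines are ignored
            if len(parts) != 5:
                errors.append(f"{label_path.name}:{n}: expected 5 fields, got {len(parts)}")
                continue
            try:
                class_id = int(parts[0])
                coords = [float(v) for v in parts[1:]]
            except ValueError:
                errors.append(f"{label_path.name}:{n}: non-numeric field")
                continue
            if not 0 <= class_id < num_classes:
                errors.append(f"{label_path.name}:{n}: class id {class_id} out of range")
            if any(not 0.0 <= v <= 1.0 for v in coords):
                errors.append(f"{label_path.name}:{n}: coordinate outside [0, 1]")
    return errors
```

An empty `.txt` file passes this check, which is one way to represent an image with no objects; whether such images belong in the dataset is a project decision.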
---
Class System:
* 34–36 defect classes
* Each class will be provided with example images
* All classes are important — the goal is to reflect real-world conditions, not prioritize a subset
* Consistency across similar defect types is critical
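Because YOLO label files store only numeric class IDs, the mapping from class name to ID has to be fixed once and reused for every batch; reordering the list later would silently relabel the data. A minimal sketch, assuming the class list is kept as one ordered list of names. The names in the usage line are illustrative placeholders, and the generated text follows the `path`/`train`/`val`/`names` layout of Ultralytics dataset YAML files, which is an assumption about the training tool:

```python
from __future__ import annotations


def build_class_ids(class_names: list[str]) -> dict[str, int]:
    """Assign IDs by list position, refusing near-duplicate names."""
    seen: dict[str, str] = {}
    for name in class_names:
        key = name.strip().lower()
        if key in seen:
            raise ValueError(f"duplicate class name: {name!r} clashes with {seen[key]!r}")
        seen[key] = name
    return {name: i for i, name in enumerate(class_names)}


def data_yaml(class_ids: dict[str, int], root: str = "dataset") -> str:
    """Render a dataset config with names listed in ID order."""
    lines = [f"path: {root}", "train: images/train", "val: images/val", "names:"]
    for name, i in sorted(class_ids.items(), key=lambda kv: kv[1]):
        lines.append(f"  {i}: {name}")
    return "\n".join(lines) + "\n"
```

For example, `build_class_ids(["pothole", "longitudinal_crack", "fatigue_cracking"])` gives IDs 0, 1, and 2, and that list should then be treated as append-only.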
---
Requirements:
* Experience reviewing or organizing large image datasets
* Strong attention to detail and consistency
* Ability to follow structured instructions and class definitions
* Familiarity with LabelMe or similar tools is a plus
* Basic understanding of YOLO format is a plus (required for later phases)
---
Deliverables (Phase 1):
* Scrubbed and filtered image sets
* Images grouped into correct classifications
* Clean and organized folder structure
---
Volume:
* Very large dataset (hundreds of thousands of images available)
* Initial batches will be provided for Phase 1
* Only a subset of images will move forward to annotation
* Potential for ongoing work across multiple phases
---
To Apply:
Please include:
* Confirmation that you understand Phase 1 is focused on image scrubbing and classification
* Your approach to reviewing and filtering large image datasets
* Your expected turnaround time for an initial batch
* Any relevant experience with image datasets or annotation work
---
Test Task:
A small test batch will be provided.
You will be asked to scrub and classify images based on provided examples.
---
Notes:
* Accuracy and consistency are more important than speed
* This is part of a larger AI system — data quality is critical
* Strong performance in Phase 1 may lead to continued work in annotation and dataset preparation phases
Marco T.
Clarification Board
-

Hi,
Before I apply, I just want to clarify a few points to make sure expectations are aligned for Phase 1:
How many images are included in the initial batch for the test and first phase, and how will batches be delivered?
Will you be providing a clearly defined class reference guide with example images for each defect type, or is part of the role interpreting and refining classifications?
How should edge cases be handled where defects are unclear, overlapping, or do not fully match a defined class?
Do you expect images with multiple defect types to be assigned to a single primary class in Phase 1, or grouped separately for multi-label handling later?
What level of filtering is expected for borderline images, for example low visibility defects or partially obscured data?
Is there a preferred folder structure and naming convention for the classified output, or should this be defined as part of the process?
Will there be a validation step or feedback loop after the first batch to ensure classification consistency before scaling?
What is your expected turnaround time per batch once the workflow is established?
Once I understand these points, I can outline a clear and consistent approach for handling large volumes while maintaining accuracy.
Thanks,
Uthman
-

Hi Marco,
A few quick clarifications:
1. Should no-defect images be removed or placed in a separate class?
2. Will a detailed class guide be provided for all categories?
3. How should multi-defect images be handled (primary class vs multiple vs flag)?
4. What’s the approach for ambiguous cases — skip or flag for review?
5. Any required folder structure or naming conventions?
6. What is the initial batch size and expected turnaround?
7. Will there be a review/feedback step after the first batch?
Looking forward to your reply.
Best regards
-

For the $300 Phase 1 pass, what approximate image count do you want in the initial batch, and do you want the output grouped by your 34–36 classes with a separate reject folder for no-deterioration / unusable images?
-

- How clearly defined are your defect classes at this stage? Do you already have strict classification guidelines, or would you like help refining edge cases where defects may overlap or appear ambiguous?
- What criteria should be used to reject images? Beyond obvious issues like low quality or irrelevance, are there specific thresholds (e.g., defect visibility, size, or clarity) that determine whether an image is usable?
Marco T. (10 Apr 2026): I have strict guidelines per the project SOPs, and I have samples of each classification.
-

Could you please specify the approximate number of images included in the Phase 1 batch?
Marco T. (10 Apr 2026): I have hundreds of thousands of images. We are using real-world images from the cameras installed in our vehicles. They are all shot at the same size, and there are no blurry images. The easiest part is removing the ones with no deterioration; that alone removes 70 to 80% of the images. From there, it's a matter of finding enough images of each classification.
Vijeet D. (10 Apr 2026): OK, what I meant to ask is how many images your $300 budget is meant to cover, because it's unrealistic for hundreds of thousands of images.
-

Hi Marco,
Can you please share the test batch so we can perform the scrubbing and classification?
Looking forward to your reply.
Best Regards,
VConn Pvt Ltd
-

Where are the attached files listed below?
batch data freelancer.zip
sample 1.mp4
asf_agrietamiento_fatiga.pdf (fatigue cracking)
corriemento o ondulaciones.png (shoving or corrugations)
bache.png (pothole)
agrietamiento por fatiga.png (fatigue cracking)
grieta longitudinal.png (longitudinal crack)