ML Ops / Data Operations Engineer
Elevate Robotics, Inc., is a subsidiary of Apptronik. Elevate was formed to democratize superhuman mobile manipulation, improving worker safety and helping to address the ever-increasing labor shortage. Our team has a heritage of building some of the most advanced robots on the planet, dating back to the DARPA Robotics Challenge. We apply our expertise across the full robotics stack to some of the most important and impactful problems our society faces, and we expect our products and technology to change the world for the better. We value creativity, humility, integrity, passion, and curiosity, which help us overcome existing technological barriers in the industry and create truly innovative products.
JOB SUMMARY
We are seeking an experienced MLOps / Data Operations Engineer to own and scale the data foundation that powers our Vision-Language-Action (VLA) models for mobile manipulation. This role is data-centric: focused on how data is collected, curated, versioned, validated, and delivered to machine learning teams for training.
You will work closely with machine learning engineers, robot operators, and data labeling teams to ensure that high-quality, well-structured datasets are continuously produced and ready for use in model training. You will design processes and build tooling that make data collection efficient, reliable, and scalable as our robotic systems operate in the real world.
This role is critical to enabling rapid iteration and sustained model performance by ensuring both data quality and data volume keep pace with our training needs.
KEY RESPONSIBILITIES
Data Collection & Curation (Core Responsibility)
Design, implement, and maintain processes and tools for collecting training data from robotic systems in real-world environments.
Define and enforce data quality standards, validation checks, and acceptance criteria for training readiness.
Curate datasets by filtering, cleaning, structuring, and annotating raw sensor data (e.g., images, video, metadata).
Work closely with robot operators and data labeling teams to improve collection protocols, labeling accuracy, and throughput.
Develop feedback loops to identify data gaps, failure modes, and distribution issues in existing datasets.
Dataset Management & Versioning
Own dataset versioning, lineage, and reproducibility across experiments and model iterations.
Build and maintain tooling for tracking dataset provenance, splits, and compatibility with different training runs.
Ensure datasets are discoverable, auditable, and easy for ML engineers to consume.
Data Scale & Efficiency
Drive strategies to increase effective data volume, including:
More efficient data collection workflows
Smart sampling and filtering
Data augmentation and transformation pipelines
Partner with operations and robotics teams to maximize useful data yield from deployed robots.
Collaboration with ML Engineers
Work closely with machine learning engineers to understand training requirements, data formats, and failure cases.
Translate model performance issues into actionable data collection or curation improvements.
Ensure data pipelines align with evolving model architectures and training workflows.
Infrastructure & Tooling
Build internal tools, scripts, and services to automate data ingestion, processing, validation, and export.
Maintain scalable storage solutions for large datasets (e.g., images, video, logs).
Implement monitoring and metrics for data quality, freshness, and coverage.
Training & Experiment Support (Nice to Have)
Support training workflows by enabling reliable access to curated datasets.
Assist with scheduling or monitoring large training runs in collaboration with ML engineers.
Skills & Requirements
Strong software engineering fundamentals, including testing, documentation, and version control (Python required).
Professional experience in data engineering, MLOps, or ML infrastructure roles.
Hands-on experience building data pipelines for large-scale or production ML systems.
Experience managing large datasets, including storage, versioning, and validation.
Ability to work cross-functionally with ML engineers, robotics teams, and non-software operators.
Comfort operating close to real-world systems where data is messy, incomplete, or noisy.
Preferred / Bonus Qualifications
Familiarity with ML training workflows and frameworks (e.g., PyTorch, TensorFlow).
Experience with distributed systems or large-scale data processing.
Experience with robotics data (vision, perception, logs, sensor streams).
Experience supporting labeling workflows or annotation tooling.
Exposure to job scheduling and distributed compute frameworks for training (e.g., Slurm, Ray).
Education & Experience
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
3+ years of experience in software engineering, data engineering, or ML infrastructure roles.
Experience building tools or platforms used by ML engineers or researchers.
Physical Requirements
Prolonged periods of sitting at a desk and working on a computer.
Ability to lift up to 15 pounds occasionally.
Vision to read printed materials and a computer screen.
Hearing and speech sufficient for communication.
