Our mission is to build multimodal AI to expand human imagination and capabilities
We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable and useful systems, the next step function change will come from vision. So, we are working on training and scaling up multimodal foundation models for systems that can see and understand, show and explain, and eventually interact with our world to effect change.
We will deploy these systems to create a new kind of intelligent creative partner that can imagine with us, free from the pressure of being creative. It's for all of us whose imaginations have been constrained, who've had to channel vivid dreams through broken words, hoping others will see what we see in our mind's eye. A partner that can help us show, not just tell.
Dream Machine is an early step toward building that.
Why you should join us:
- Luma is bringing together the best team in the world to achieve our goal, from researchers and engineers to designers and growth operators
- Luma is not just a lab - we are deeply product-focused, and our vision of merging AI models with delightful products is unique in the industry
- We build. We ship. Our early products have been wildly successful
What do we value?
- Expertise in your field
- Urgency, velocity and execution
- Problem-solving mindset
- Clear communication
- Product focus
As an MLE on Luma's Data team, you will be responsible for raising the bar for our data quality. Data is the critical foundation of our products, and we are looking for individuals who can identify creative approaches to data and captioning and then implement solutions for processing at petabyte scale. Strong candidates have exceptional general Python engineering skills alongside a combination of industry ML experience, data experience, and a passion for building AI products.
Responsibilities
- Design data pipelines, including finding appropriate data sources, scraping, filtering, post-processing, de-duplicating, and versioning (see the sketch after this list). These pipelines should be robust and scalable enough for production use.
- Design and implement frameworks to evaluate the effectiveness of our models and data. For example, define the standards for an automated evaluation pipeline that runs before any new model is deployed to the API.
- Work closely with teams that contribute data, consume it, or both, and incorporate their data needs across a variety of tasks and domains.
- Work with human-labeling vendors to refine the procedures and guidelines for collecting high-quality human annotations.
- Conduct open-ended research to improve the quality of collected data, including, but not limited to, semi-supervised learning, human-in-the-loop machine learning, and fine-tuning with human feedback.
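To give a concrete flavor of the first responsibility above, here is a minimal PySpark sketch of a filter-and-de-duplicate step. It is an illustration only, not Luma's actual pipeline; the dataset, paths, and column names (caption, phash, height) are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("caption-dedup-sketch").getOrCreate()

# Hypothetical input: one row per clip, with a text caption, a perceptual
# hash of its frames, and the clip's resolution.
clips = spark.read.parquet("s3://example-bucket/clip_metadata/")

deduped = (
    clips
    # Filter: drop rows whose captions are missing or trivially short.
    .filter(F.length(F.trim(F.col("caption"))) >= 10)
    # De-duplicate: keep one row per perceptual hash, preferring the
    # highest-resolution copy.
    .withColumn(
        "rank",
        F.row_number().over(
            Window.partitionBy("phash").orderBy(F.col("height").desc())
        ),
    )
    .filter(F.col("rank") == 1)
    .drop("rank")
)

deduped.write.mode("overwrite").parquet("s3://example-bucket/clip_metadata_dedup/")
```

In practice, a step like this would be one stage of a versioned, scheduled pipeline running over petabyte-scale inputs.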
Experience
- 5+ years of relevant experience, or demonstrated high-impact projects, as a Data Engineer, Machine Learning Engineer, or Data Scientist working with large amounts of data on a daily basis.
- A strong belief in the criticality of high-quality data and high motivation to take on the associated challenges.
- Experience working in large distributed systems.
- Strong generalist Python and PyTorch skills.
- Experience using SQL, Spark, or other tools for processing large amounts of data.
Please note that this role is not intended for recent graduates.