Job Description
Who we are
At Twelve Labs, we are pioneering the development of cutting-edge multimodal foundation models that have the ability to comprehend videos just like humans do. Our models have redefined the standards in video-language modeling, empowering us with more intuitive and far-reaching capabilities, and fundamentally transforming the way we interact with and analyze various forms of media.

With a remarkable $107 million in Seed and Series A funding, our company is backed by top-tier venture capital firms such as NVIDIA’s NVentures, NEA, Radical Ventures, and Index Ventures, and prominent AI visionaries and founders such as Fei-Fei Li, Silvio Savarese, Alexandr Wang and more. Headquartered in San Francisco, with an influential APAC presence in Seoul, our global footprint underscores our commitment to driving worldwide innovation.

We are a global company that values the uniqueness of each person’s journey. It is the differences in our cultural, educational, and life experiences that allow us to constantly challenge the status quo. We are looking for individuals who are motivated by our mission and eager to make an impact as we push the bounds of technology to transform the world. Join us as we revolutionize video understanding and multimodal AI.

About the role
As Machine Learning Engineer, Distributed Training Infrastructure, you will be responsible for ensuring that compute performance and ease-of-use never delay our research timeline. You will own strategy and implementation for all compute & training infrastructure optimization, observability, scaling, and orchestration. You will collaborate closely with other engineers and scientists to define and implement your chosen roadmap. This role is a perfect fit for research-minded compute specialists who want to build SOTA video, vision, and video-language modeling systems!
Twelve Labs provides a platform that enables businesses and developers to access multimodal video understanding capabilities. The platform allows users to analyze and interpret video content through various modalities, including visual and auditory data. It supports the development of applications that can process and understand complex video information. The company serves industries such as technology, media, entertainment, and security.