We are Genmo, a research lab dedicated to building open, state-of-the-art models for video generation towards unlocking the right brain of AGI. Join us in shaping the future of AI and pushing the boundaries of what's possible in video generation.
The Role
You'll own our model serving layer, implementing high-performance inference systems that can handle millions of requests daily. You'll work at the intersection of ML frameworks and cloud infrastructure, building automated pipelines for model optimization and deployment. Your work will directly impact the performance and scalability of our video generation models, ensuring sub-second latency at global scale.
Key Responsibilities
Design and implement high-performance model serving infrastructure supporting streaming, batching, and multi-modal inputs
Build automated model compilation and optimization pipelines using TensorRT, torch.compile, and other compilers
Optimize serving systems for throughput, latency, and GPU utilization across our H100 fleet
Develop monitoring and observability for model-specific metrics (quality, latency, throughput)
Collaborate with researchers to transition models from development to production
Implement A/B testing, canary deployments, and gradual rollout strategies for models
Integrate serving layer with platform infrastructure (load balancers, API gateways, queue systems)
Qualifications
Bachelor's or Master's degree in Computer Science or related field
4+ years of ML engineering experience, including 2+ years focused on model serving
Production experience with high-performance model serving frameworks (vLLM, SGLang, TensorRT-LLM, or similar)
Strong Python proficiency and PyTorch experience
Experience with model compilation and optimization (TensorRT, ONNX, quantization)
Track record of building inference systems at scale (10K+ QPS)
Understanding of attention mechanisms and transformer architectures
Experience with containerized deployment and orchestration
We Value
Contributions to open-source serving frameworks
Experience with continuous batching and advanced serving optimizations
Knowledge of GPU architecture and memory management
Background at companies with large-scale ML serving
Experience with streaming/iterative generation patterns
Genmo is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law. Genmo, Inc. is an E-Verify company, and you may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.