Applied AI Software Engineer

Canvas Medical is the electronic medical records (EMR) and payments development platform for healthcare. We build modern, elegant front- and back-end tooling to enable new ways for developers and clinicians to collaborate to solve healthcare’s toughest challenges. Canvas is institutionally backed by some of the greatest technology investors in the world, who have funded notable health tech companies such as GoodRx, Oscar Health, and Hims & Hers Health.

The Role

We’re hiring an Applied AI Software Engineer to lead evaluations for agents in development and for the post-deployment fleet of agents operating in Canvas to automate work for our customers. You will help develop agents in Canvas using state-of-the-art foundation model inference and fine-tuning APIs along with our server-side SDK. The server-side SDK provides extensive tools and virtually all the context necessary for excellent agent performance. You’ll be responsible for designing and running rigorous evaluation experiments that measure performance, safety, and reliability across a wide variety of clinical, operational, and financial use cases.

This role is ideal for someone with deep experience evaluating LLM-based agents at scale. You’ll create high-fidelity unit evals and end-to-end evaluations, define expert-determined ground truth outcomes, and manage iterations across model variants, prompts, tool use, and context window configurations. Your work will directly inform model selection, fine-tuning, and go/no-go decisions for AI features used in production settings.

You’ll collaborate with product, ML engineering, and clinical informatics teams to ensure that Canvas's AI agents are not only capable, but trustworthy and robust under real-world healthcare constraints. You will also work with technical product marketers and developer advocates to help our broader developer community and the broader market understand the uniquely differentiated value of agents in Canvas.

Who You Are

You have extensive hands-on experience evaluating LLM-based systems, including multi-agent architectures and prompt-based pipelines.
You are deeply familiar with foundation model APIs (OpenAI, Claude, Gemini, etc.) and how to systematically benchmark agent performance using those models in applied settings.
You care about correctness and reproducibility and have built or contributed to frameworks for automated evals, annotation pipelines, and experiment tracking.
You bring structure to ambiguity and know how to define “correctness” in complex, nuanced domains.
You are comfortable collaborating across engineering, product, and clinical subject matter experts.
You are not afraid of complexity and are energized by the rigor required in healthcare deployments.

What You’ll Do

Design and execute large-scale evaluation plans for LLM-based agents performing clinical documentation, scheduling, billing, communications, and general workflow automation tasks.
Build end-to-end test harnesses that validate model behavior under different configurations (prompt templates, context sources, tool availability, etc.).
Partner with clinicians to define accurate expected outcomes (gold standard) for performance comparisons in domains of clinical consequence, and partner with subject matter experts in non-clinical domains.
Run and replicate experiments across multiple models, parameters, and interaction types to determine optimal configurations.
Deploy and maintain ongoing sampling for post-deployment governance of agent fleets.
Analyze results and summarize tradeoffs clearly for product and engineering stakeholders, as well as for technical stakeholders among our customers and the broader market.
Take ownership over internal eval tooling and infrastructure, ensuring speed, rigor, and reproducibility.
Identify and recommend candidates for reinforcement fine-tuning or retrieval augmentation based on gaps identified in evals.

What Success Looks Like at 90 Days

An expanded set of robust evaluation suites exists for all major AI features currently in development and in production.
We have well-defined correctness criteria for each workflow and a reliable source of expert-determined outcome objects.
Product and engineering teams have integrated your evaluation tools into their daily workflows.
Evaluation results are clearly documented and reproducible, enabling trust in the performance trajectory.
You have effectively engaged your marketing counterparts to translate your work into key messages to the market and to Canvas customers.

Qualifications

5+ years of experience in applied machine learning or AI engineering, with a focus on evaluation and benchmarking.
Proficiency with foundation model APIs and experience orchestrating complex agent behaviors via prompts or tools.
Experience designing and running high-throughput evaluation pipelines, ideally including human-in-the-loop or expert-labeled benchmarks.
Superlative Python engineering skills and familiarity with experiment management tools and data engineering toolsets in general including, yes, SQL and database management.
Familiarity with clinical or healthcare data is a strong plus.
Experience with reinforcement fine-tuning, model monitoring, or RLHF is a plus.
Research shows that women and other minority groups might avoid applying if they don’t meet 100% of the qualifications. We encourage you to apply even if you don’t meet everything listed in the job posting.

$300,000 - $400,000 a year

We are a mostly remote, distributed team. We encourage people to do their work when and where they perform at their best. Because of this structure, strong written communication skills, time management skills, and personal accountability are very important to us.

Employee Benefits:

Competitive Salary & Equity Package

Health Insurance

Home Office Stipend

401k

Paid Maternity/Paternity Leave (12 weeks)

Flexible/unlimited PTO

Canvas Medical provides equal employment opportunities to all employees and applicants for employment without regard to race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
