Job Title: Resilience, Testability & Scalability Lead
Location: Fort Mill, SC / New York / New Jersey (Hybrid)
Data Platforms – Engineering Quality & Resilience Track
Role Overview:
We are looking for a technically strong Resilience, Testability & Scalability Lead to drive engineering excellence across our data platforms and cloud-based applications. This role is critical to ensuring system uptime, test automation maturity, performance at scale, and architectural resilience that meet stringent regulatory and service-level demands.
The ideal candidate will have a deep background in designing highly available systems, implementing robust disaster recovery, managing scalable cloud infrastructure, and building automated, testable, and observable platforms—especially within AWS and Kubernetes environments.
Key Responsibilities:
•Design and implement high availability and failover strategies across multi-zone AWS deployments
•Lead the development and execution of disaster recovery and business continuity plans, including RTO/RPO validation and cross-region strategies
•Define testability strategies, test data management frameworks, and performance testing protocols
•Enable infrastructure and application resilience by introducing circuit breakers, retry patterns, service meshes, and graceful degradation mechanisms
•Establish real-time monitoring, alerting, and log aggregation frameworks using tools like CloudWatch and Prometheus
•Drive test automation and quality engineering best practices, integrating with CI/CD pipelines
•Optimize application and data layer performance through query tuning, caching, and indexing strategies
•Scale data processing using distributed frameworks like Apache Spark, and implement event-driven stream processing with Kafka
•Collaborate with platform, DevOps, and SRE teams to ensure resource efficiency, cost control, and performance SLAs
•Contribute to regulatory readiness by enforcing security, encryption, and audit logging standards
Required Skills & Experience:
Infrastructure Resilience & DR:
•Multi-AZ deployments, auto-scaling, load balancing, circuit breakers
•Disaster recovery design: backup/restore, cross-region replication, RTO/RPO
Monitoring & Observability:
•Experience with CloudWatch, Prometheus, log aggregators
•Alerting for incident response, covering latency, throughput, and error rates
Application Resilience & Security:
•Error handling, service degradation, exponential backoff
•Security best practices: IAM policies, encryption at rest and in transit
•Familiarity with FINRA/SIPC compliance standards (preferred)
Test Automation & Quality:
•Unit testing (e.g., pytest), integration testing, E2E automation
•Test data generation, synthetic data, environment provisioning
•Performance testing with JMeter or Gatling, including stress and capacity testing
•Code reviews, static analysis, data validation, anomaly detection
Scalability & Optimization:
•Horizontal scaling using Kubernetes, Docker, service discovery
•API Gateway, caching layers (Redis, Memcached), DB partitioning
•Connection pooling, capacity planning, cost-aware architecture
Data & Stream Processing:
•Spark cluster management, parallel processing, big data optimization
•Kafka-based messaging, windowing, and aggregation for real-time data
Preferred Qualifications:
•Experience in financial services or regulated environments
•Familiarity with LPL’s enterprise data and platform modernization initiatives
•AWS or Kubernetes certifications
•Strong communication skills and cross-functional collaboration experience