Principal Engineer (OpenShift Operations)
Expanse
Our Mission
At Palo Alto Networks®, we’re united by a shared mission—to protect our digital way of life. We thrive at the intersection of innovation and impact, solving real-world problems with cutting-edge technology and bold thinking. Here, everyone has a voice, and every idea counts. If you’re ready to do the most meaningful work of your career alongside people who are just as passionate as you are, you’re in the right place.
Who We Are
In order to be the cybersecurity partner of choice, we must trailblaze the path and shape the future of our industry. This is something our employees work at each day and is defined by our values: Disruption, Collaboration, Execution, Integrity, and Inclusion. We weave AI into the fabric of everything we do and use it to augment the impact every individual can have. If you are passionate about solving real-world problems and ideating beside the best and the brightest, we invite you to join us!
We believe collaboration thrives in person. That’s why most of our teams work from the office full time, with flexibility when it’s needed. This model supports real-time problem-solving, stronger relationships, and the kind of precision that drives great outcomes.
Job Summary
At Palo Alto Networks® everything starts and ends with our mission:
Being the cybersecurity partner of choice, protecting our digital way of life.
We have the vision of a world where each day is safer and more secure than the one before. These aren’t easy goals to accomplish – but we’re not here for easy. We’re here for better. We are a company built on the foundation of challenging and disrupting the way things are done, and we’re looking for innovators who are as committed to shaping the future of cybersecurity as we are.
Disruption is at the core of our technology and of the way we work. Through FLEXWORK, our approach to how we work, we meet the needs of our employees now and in the future. We’re changing the nature of work: from benefits to learning, location to leadership, we’ve rethought and recreated every aspect of the employee experience at Palo Alto Networks. And because FLEXWORK FLEXes around each employee’s individual choices, employees are empowered to push boundaries and help us all evolve, together.
Your Career
You will be responsible for the design and development of a scalable distributed management plane infrastructure to manage Palo Alto Networks’ next-generation network security solutions.
Your Impact
This role is responsible for the foundation of our high-availability infrastructure, bridging the gap between physical hardware and the Red Hat OpenShift Container Platform (OCP). Your mission is to ensure 99.99% availability by architecting resilient physical layouts and automating the deployment, scaling, and self-healing capabilities of our production clusters.
Key Responsibilities
High-Availability (HA) Infrastructure: Monitor and maintain data center systems with a focus on "Zero Single Point of Failure" (ZSPoF) architecture for OpenShift control planes and worker nodes.
Cluster Reliability Engineering: Implement and manage OpenShift 4.x clusters across multiple power and cooling zones to ensure 99.99% uptime.
Disaster Recovery & Business Continuity: Design, test, and execute automated failover strategies and backup/restore procedures using tools like OADP (Velero) and Red Hat ACM.
Automated Maintenance: Perform routine maintenance and upgrades using GitOps (ArgoCD) and the Machine Config Operator to ensure zero-downtime node evacuations and patching.
Complex Troubleshooting: Resolve deep-stack hardware and software issues, from faulty GPU firmware to latency in the OpenShift cluster network (OVN-Kubernetes).
Vendor & Lifecycle Management: Coordinate with vendors for specialized hardware (e.g., NVIDIA, Dell, Cisco) while maintaining strict security and firmware compliance.
Efficiency & Capacity Architecture: Optimize rack density for high-performance GPU clusters while managing thermal loads and power distribution (PDU) to prevent circuit-trip outages.
Observability Implementation: Maintain accurate documentation and integrate hardware health metrics (IPMI/SNMP) into Prometheus/Grafana for proactive alerting.
Physical Deployment: Rack and stack high-density GPU servers, ensuring redundant power-pathing and high-speed (100G/200G) InfiniBand or Ethernet cabling.
Hardware Lifecycle: Perform precision physical installation and replacement of critical components (CPUs, GPUs, NVMe storage) in a live production environment without impacting cluster quorum.
Qualifications
Education: Bachelor's degree in Computer Science, IT, or equivalent experience.
Platform Expertise: 5+ years of experience specifically operating Red Hat OpenShift (OCP) in a production environment.
Hardware Fluency: Deep experience racking/stacking and cabling high-density GPU systems (e.g., NVIDIA DGX or similar) and specialized AI/ML hardware.
Infrastructure as Code (IaC): Advanced proficiency in Ansible or Pulumi for automating bare-metal provisioning and cluster configuration.
Scripting: Strong Python and Bash skills for developing custom health-check scripts and API integrations.
Linux Mastery: Expert-level CoreOS and RHEL administration, including kernel tuning and systemd management.
Networking: Solid understanding of BGP, VLAN tagging, and LACP, as well as the load balancing (F5/NGINX) essential for cluster ingress.
Virtualization & Storage: Experience with vSphere or KVM, and persistent storage solutions like OpenShift Data Foundation (ODF) or Ceph.
Tooling: Familiarity with DCIM tools (e.g., NetBox) and monitoring stacks (e.g., ELK, Loki).
Physical Requirements
Lifting: Ability to lift and move equipment up to 50 pounds (e.g., high-density 2U/4U servers).
Environment: Comfortable working in high-decibel, climate-controlled data center aisles.
Dexterity: Capable of standing, walking, and performing precision cabling in tight rack spaces for extended periods.
Travel: May require occasional travel to remote data center sites or edge locations.
Our Commitment
We’re trailblazers that dream big, take risks, and challenge cybersecurity’s status quo. It’s simple: we can’t accomplish our mission without diverse teams innovating, together.
Palo Alto Networks is evolving and changing the nature of work to meet the needs of our employees now and in the future through FLEXWORK, our approach to how we work. From benefits to learning, location to leadership, we’ve rethought and recreated every aspect of the employee experience at Palo Alto Networks. And because it FLEXes around each individual employee based on their individual choices, employees are empowered to push boundaries and help us all evolve, together.
We are committed to providing reasonable accommodations for all qualified individuals with a disability. If you require assistance or accommodation due to a disability or special need, please contact us at accommodations@paloaltonetworks.com.
Palo Alto Networks is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics.
Compensation Disclosure
The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be within the annual range listed below. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here.
$147,000.00 - $237,500.00/yr
All your information will be kept confidential according to EEO guidelines.
Is this role eligible for immigration sponsorship? Yes