Site Reliability Engineering Lead

  • Location
    Milpitas, California, US
  • Area of Interest
    Engineer - Network
  • Job Type
    Professional
  • Technology Interest
    Cloud and Data Center
  • Job Id
    1376410

Cisco IoT CloudOps is responsible for the development and operation of the cloud environment powering all Cisco hosted IoT applications.

Our team of cloud engineers designs, develops, secures, automates, and operates application infrastructure across a variety of public and private clouds, giving the engineering group a consistent platform for deploying all IoT application components.

What You'll Do

We are looking for someone to manage the team that operates the cloud infrastructure running multiple applications. This person will work with internal technical teams on best practices and on creating alerting, runbooks, and escalation paths, and will be responsible for the tools and processes for 24x7 monitoring, alert management, and triage.

Who You Are

Are you an engineering leader with experience managing a NOC or cloud operations, and with a solid understanding of cloud infrastructure and applications? The role requires a process-oriented, energetic, detail-oriented individual who can effectively lead the team toward its primary goal of maintaining high availability and meeting MTTR targets.

Minimum Qualifications

  • BSEE/CS or equivalent degree with 12 years of related experience
  • 3-5 years of experience working with cloud environments (AWS), Kubernetes and Python (automation)
  • Experience managing a 24/7 NOC operation and leading troubleshooting efforts during outages
  • Experience leading teams across multiple shifts
  • Experience generating KPIs and reports
  • Experience using Grafana, Prometheus, and Kibana
  • Exposure to Linux systems and familiarity with virtualization, containers, and microservices

Technical Experience:

  • Experience managing production cloud environments (AWS or others) and the associated processes and automation workflows
  • Broad system-level knowledge and an intuitive understanding of system-level interactions and performance trade-offs
  • Experience with continuous deployment of cloud-based software using canary releases, rolling updates, and blue-green deployments
  • Practical experience balancing zero-downtime requirements with database changes, schema upgrades, code updates, etc.
  • Good understanding of databases
  • Experience with SOC 2 audits, including handling audit requirements and documentation

Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis.
Cisco will consider for employment, on a case-by-case basis, qualified applicants with arrest and conviction records.

We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.

Cisco COVID-19 Vaccination Requirements
The health and safety of Cisco's employees, customers, and partners is a top priority. Our goal is to protect and mitigate the spread of COVID-19 infection for strong business resiliency during the pandemic. Therefore, Cisco may require new hires to be fully vaccinated against COVID-19 if the role requires business-related travel, meeting with customers/partners (including visiting third-party sites on behalf of Cisco), attending trade events, and Cisco office entry, unless otherwise prohibited by applicable law, and in countries where COVID-19 vaccination is legally required. The company will consider legally required accommodations/exceptions for medical, religious, and other reasons as per the requirements of the role and in accordance with applicable law. Additional information will be provided to candidates about the requirements and accommodation process at the offer time based on region.
