Site Reliability Engineer
Location: San Jose, California, US
Area of Interest: Engineer - Software
Technology Interest: Big Data, Analytics, Cloud and Data Center
What You'll Do
You are a deeply motivated Site Reliability Engineer with a background in DevOps/SRE software development and operations. The ideal candidate must have experience building, shipping, and operating a software-as-a-service (SaaS) product, and would have managed such products using cloud-native principles and worked with cloud technologies. This position will enable continuous monitoring and management of infrastructure while providing timely response, within designated SLA times, to service-affecting faults and performance issues. As an SRE, you will work closely with our Managed Services Team to diagnose and characterize issues, provide continuous improvement, and develop infrastructure best practices. You will be driven to build highly scalable, fault-tolerant, and easy-to-administer infrastructure. You must be proactive and organized, diligent about documentation, and passionate about monitoring and automating everything.
This can only be accomplished by a candidate with substantial real-world experience actually building, deploying and operating distributed systems using cloud technologies.
Who You'll Work With
Cisco is transforming the networking industry. To make this happen, we are investing heavily in the team responsible for The Network. Intuitive. We are disrupting the industry by building a new networking platform that can learn, adapt, and secure itself at the speed of today’s businesses. This Digital Network Architecture platform automates network management and provides our customers with state-of-the-art analytics and insights. This team's innovations span artificial intelligence, machine learning, analytics, IoT, security, automation, and more.
Who You Are
This role is primarily to apply your SRE skills to create a complete self-serve Software Delivery Machine. The targeted platform will support a vast number of cloud and hybrid customers. The candidate is expected to have strong hands-on skills and will guide and contribute technically to the infrastructure engineering.
- Develop full-fledged software tooling to deliver programmable infrastructure (infrastructure as code)
- Develop tooling to drive end-to-end micro-services monitoring and management
- Implement Kubernetes compliance and best practices for security, audits, network policies, and reporting
- Develop a self-service console to provide infrastructure visibility
· Manage the availability, scalability and performance of the Infrastructure platforms.
· Create the tools and infrastructure leveraged by the rest of the engineering teams
· Diagnose and repair network, application, and hardware bottlenecks
· Test and tune network, hardware, and software configurations to maximize performance
· Deploy and manage monitoring and diagnostic tools
· Monitor systems, databases, and networks for proper operation and performance.
· Provide 7×24 on-call support for the operations infrastructure.
· Create and maintain continuous integration (CI) and continuous deployment (CD) environments to facilitate an agile development process.
· Work is generally expected to take place during normal working hours; however, the Platform Operations Team provides Tier 2 and Tier 3 7x24x365 on-call escalation, and candidates should be flexible with schedules to meet the needs and demands of the business.
· Strong knowledge of core enterprise Linux (Red Hat/CentOS), with a focus on building, maintaining, securing, and performance-tuning systems.
· Proven experience with capacity planning, performance tuning, and infrastructure architecture. Experience scaling web, application, and data systems horizontally and vertically.
· Experience with Kubernetes (K8s) and other virtual infrastructure platforms.
· High-level shell fluency plus one or more scripting languages (Python, Go, Perl, or similar).
· Experience with system automation using Ansible.
· Experience with monitoring, alerting, and pipeline analysis tools
· Experience with queuing/data-pipelining.
· Experience with SQL/NoSQL systems such as PostgreSQL, MySQL, Cassandra, or Redis.
· Experience in the development of operational procedures, processes, and scripts
The candidate is expected to have strong hands-on skills and will guide and contribute technically to the product.
- BS/MS in Computer Science or related area
- Four or more years of relevant work experience
- Hands-on experience working with Kubernetes infrastructure
- Kubernetes Certification is highly preferred
- Expert understanding of Kubernetes internals (clustering, scheduling, controllers, API server, etc.)
- Very good understanding of container networking
- Very good software programming skills using Go/Python/YM
- Excellent understanding of microservices architecture
- Experience with Kubernetes monitoring tools (Prometheus)
At Cisco, each person brings their unique talents to work as a team and make a difference. Yes, our technology changes the way the world works, lives, plays and learns, but our edge comes from our people.
We connect everything – people, process, data and things – and we use those connections to change our world for the better.
We innovate everywhere - from launching a new era of networking that adapts, learns and protects, to building Cisco Services that accelerate businesses and business results. Our technology powers entertainment, retail, healthcare, education and more – from Smart Cities to your everyday devices.
We benefit everyone - We do all of this while striving for a culture that empowers every person to be the difference, at work and in our communities.