SRE DevOps Engineer
Location: Feltham, England, United Kingdom
Area of Interest: Engineer - Software
Technology Interest: Cloud and Data Center
The Business Entity
It’s an exciting time to be a part of Cisco IT’s Infrastructure and Container Services. Our team is responsible for building and operating on-premise clouds on the Cisco ACI software-defined network in a DevSecOps model. We are enabling cloud-native infrastructure within Cisco.
Who You Are
You are a success-driven Site Reliability Engineer with proven leadership skills and a passion for enterprise cloud infrastructure automation and DevOps frameworks. You have a proven track record of designing, developing and managing cloud infrastructure Ops code using open-source technologies.
What You’ll Do
Site Reliability Engineers take ownership of reliability, scalability, automation, and other issues related to the uptime and availability of our on-premise cloud. You will need strong skills in the following areas:
• Design, write and build tools to improve the reliability, availability and scalability of our OpenStack/VMware/OpenShift clouds.
• Augment existing instrumentation to build a cohesive picture of the characteristics of our systems, with special attention to points of failure.
• Design and develop resilience-focused improvements to our production systems to achieve and surpass SLOs.
• Help improve our operational practices to minimize service disruptions.
• Work with our Service Assurance team to modernize and improve our monitoring and alerting stack.
• Design new monitoring tools and smart alerts that help discover failures or issues before our customers do.
• Work with engineers to identify root causes and fix issues.
• Influence, design and create new architectures, standards and methods for large-scale enterprise systems.
• Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Who You Will Work With
We are a DevOps team inside Cisco IT building the next-generation cloud platform that all of Cisco will use as we move to cloud-native applications. This is a small team of highly motivated individuals using Agile Scrum. We move at a fast pace and are passionate about cloud and automation. Giving back and contributing to the open-source projects we use is encouraged. Where no tool or project delivers what we need, we develop it. We have a history of building clouds at a large scale and are looking for someone as passionate about cloud as we are.
• Experience with tools like Elasticsearch, Logstash, Nagios, Grafana, Graphite, InfluxDB, StatsD, and CollectD
• Experience with building and maintaining Red Hat or CentOS Linux
• Experience with configuration automation using Ansible
• Experience with public cloud like AWS, GCP, or Azure
• Experience with on-premise cloud technologies using VMware or OpenStack
• Experience with container technologies like OpenShift, Kubernetes, and Docker
• Software development lifecycle including design, development, testing, packaging, deployment, upgrade and support.
• Experience with software development tools like Git, Gerrit, Spinnaker, and Jenkins
• Python, Go, or similar programming experience.
• Experience with QA and testing of your own code and the entire platform.
• Understanding of security including OS hardening, firewalls, iptables, and working with Infosec
• Understanding of network basics like routers and switches
• Leadership in building and maintaining SRE technologies
• Agile software development practices
• Working with geographically distributed teams
• Understanding of IT processes, including architecture, design, implementation, and operations
• Open-source development experience
• Self-motivated, able and willing to help where help is needed
• Able to build relationships, with cultural sensitivity, goal alignment, and learning agility
Typically requires BS/BA and 10+ years of relevant experience.