Site Reliability Engineer

  • Location
    London, England, United Kingdom
  • Area of Interest
    Engineer - Software
  • Job Type
    Professional
  • Technology Interest
    Cloud and Data Center, Networking
  • Job Id
    1255464

As a System Administrator on the Meraki Backend Infrastructure Team, you will assist with server maintenance and dive into critical issues as they arise. This is a transitional role that will allow you to be mentored by Site Reliability Engineers (SREs) on the team and grow into that role yourself.

Meraki's Backend Infrastructure Team is responsible for building and scaling the cloud that supports millions of Meraki devices across the world. Meraki’s customer base has grown by a factor of 2-3 every year, and the Meraki cloud serves more than 1.5 billion HTTP requests per day across six data centers. Our customers depend on the Meraki cloud to monitor and manage their critical infrastructure of network switches, security appliances, wireless APs, security cameras, and phones.

In this role, you will be part of a small engineering team that is based out of our headquarters in San Francisco, CA. You will be responsible for the operational aspects of our fleet of servers: both responding to incidents (managing our suppliers to ensure failed hardware is replaced, applying security patches, and managing OS upgrade projects) and defining processes and procedures to ensure work is carried out consistently and reliably as our team grows. With a senior mentor, you’ll build automated systems in Ruby to replace manual operational tasks. Ticket-driven work makes up 50% of this role. In the remainder of your time, you will have the opportunity to learn, with the expectation of becoming a full SRE within 12 to 18 months. To be successful in this role, you must complete these tasks to a high standard.

Example projects of a Meraki System Administrator:

  • Performing hardware RMAs and handling related issues as they occur.

  • Server maintenance, including applying security patches to address vulnerabilities in our systems, running scripts to automate moving machines into and out of production, and OS upgrades.

  • Troubleshooting, performing root cause analysis, and resolving production issues from the network and application layers all the way down to the system level. This might include anything from digging into source code (our own or from open source projects) and hunting memory leaks to tracing bottlenecks in upstream networks or optimizing database queries. You’ll work with an SRE mentor.

You are an ideal candidate if you:

  • Know your way around *nix systems. We run Debian.

  • Believe in the Unix way. You build large systems out of small components that each do one job and do it well.

  • Have previous experience with a ticketing system (e.g., Jira) for tracking work.

  • Are interested in scripting or coding and digging into other people’s source code in search of the root cause of a problem.

  • Automate all the things.

  • Care about the customer experience. You have experience supporting an externally-facing production environment.

  • Bonus Points: Daily hands-on scripting or coding in one or two languages such as Ruby, Scala, Python, or Bash.

Keywords: Site Reliability Engineering, DevOps, System Administration, Software Engineering, Production Engineering

Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis.
