AppD Site Reliability Engineer
Location: Offsite, San Jose, California, US
Area of Interest: Engineer - Software
Technology Interest: Cloud and Data Center
Cisco AppDynamics is an application performance monitoring solution that uses machine learning and artificial intelligence (AI) to provide real-time visibility and insight into IT environments. With our unique AIOps solution, you can take the right action at exactly the right time with automated anomaly detection, rapid root-cause analysis, and a unified view of your entire application ecosystem, including private and public clouds. Using Cisco AppDynamics, you’ll finally align IT, DevOps, and the business around the information that helps you protect your bottom line and deliver flawless customer experiences at scale.
You are an ambitious self-starter who enjoys new challenges, with a CS/EE degree and/or relevant experience, including:
* 3+ years of experience in SRE or software development
* Experience scaling large-scale transaction production systems for operational resiliency
* Solid knowledge of networking and internet technology
* Strong experience in troubleshooting and identifying the root cause of complex issues across micro-service architectures
* Experience with Jenkins, Python, Java or Go
* Experience with Docker, Kubernetes, Helm and Terraform
* Experience with cloud platforms such as AWS, Azure or GCP
* Experience with one or more monitoring and visualization tools such as Prometheus, Nagios, Datadog, Grafana, AppDynamics or Pingdom
* Deep analytical and problem-solving skills
About the Role
End User Monitoring (EUM), also known as Real User Monitoring (RUM), measures the experience of real users by capturing performance data from end-user clients such as browsers, mobile applications and IoT devices. EUM also supports Synthetic User Monitoring (SUM), which enables customers to test scripted flows against their applications from browsers deployed around the world.
The team manages browser agents, mobile agents, IoT agents, synthetic user agents, the EUM Cloud (a scalable data processor; think MapReduce plus data analytics), our engine (built on AWS) and the synthetic services that schedule and manage synthetic sessions.
As a key member of the team responsible for the growth, scale and reliability of EUM services, your primary responsibilities include:
* Driving SRE delivery principles, including automating and improving CI/CD pipelines
* Working with service owners in planning, designing and deploying new services
* Improving and maintaining the existing frameworks to scale up the product line
* Providing expertise in measuring, monitoring and improving availability, resilience and latency across micro-services
* Collaborating with the infrastructure team to define and improve delivery methodologies in accordance with DevOps and SRE models
* Automating and streamlining global deployment efforts
* Participating in the on-call rotation
We Are Cisco
#WeAreCisco, where each person is unique, but we bring our talents to work as a team and make a difference. Here’s how we do it.
We embrace digital, and help our customers implement change in their digital businesses. Some may think we’re “old” (30 years strong!) and only about hardware, but we’re also a software company. And a security company. An AI/Machine Learning company. We even invented an intuitive network that adapts, predicts, learns and protects. No other company can do what we do – you can’t put us in a box!
But “Digital Transformation” is an empty buzz phrase without a culture that allows for innovation, creativity, and yes, even failure (if you learn from it).
Day to day, we focus on the give and take. We give our best, we give our egos a break and we give of ourselves (because giving back is built into our DNA). We take accountability, we take bold steps, and we take difference to heart. Because without diversity of thought and a commitment to equality for all, there is no moving forward.
So, you have colorful hair? Don’t care. Tattoos? Show off your ink. Like polka dots? That’s cool.