Medallia offers a unique opportunity for an energetic, ambitious Site Reliability Engineer to come in and build next-generation service infrastructure. Business is booming, and the tools that have made Medallia successful to date need to evolve to keep up with explosive growth. This is not a pager-driven role: we need you here to create the systems that will keep the pagers silent and the customers delighted.
Site Reliability Engineering at Medallia creates the automation that delivers high quality, agility, and efficiency to the business. We build the monitoring, testing, and deployment systems that bring the DevOps philosophy to Medallia. Our aspiration is to automate away the tedious parts of running a SaaS platform, freeing our time to focus on what's awesome about our jobs: building new features, capturing customer insights, and growing at a breakneck pace. We work with product engineering, product development, and operations to make the world's best customer experiences even better. Senior SREs own key components of the manageability and scaling infrastructure at Medallia, and ensure that they continue to improve overall engineering productivity and end user experience.
As a Sr. Site Reliability Engineer, you may:
- Build and own a continuous integration and deployment pipeline.
- Create event-based monitoring solutions used by engineers and business customers.
- Design and use a log-based telemetry platform to extract operational insights.
- Create automated testing frameworks for use by Product Engineering.
- Design A/B experimentation frameworks to pilot cool new features.
Qualifications:
- BS or equivalent experience in Computer Science or other technical specialty.
- 5+ years of experience managing Internet-scale services.
- Ability to code or script in at least one language (Java, Groovy, PHP, Perl, Go, Ruby, etc.).
- Solid understanding of infrastructure and application performance metrics.
- Experience with, and solid understanding of, infrastructure components and how they impact services: servers, storage, and networking.
- Experience with enterprise-scale telemetry platforms and foundational components such as Kafka, Logstash, Netflix Atlas, Scribe, or Flume.
- Experience with web application build, deployment, and release management tools such as Jenkins or Bamboo.
- Experience configuring and maintaining operational monitoring and reporting tools such as Nagios, Zabbix, Sumo Logic, Kibana, and Graphite/StatsD.
- Experience using configuration management tools such as CFEngine, Puppet, or Chef.
- Familiarity with relational databases; PostgreSQL experience is a plus.