Today, with our current team members, we have the strength to help you grow in any career direction you want, whether in our stores, distribution centers, manufacturing facilities or our Midwest corporate offices. It's your future, maybe it starts with Meijer.
Currently, Meijer is seeking a System Engineer / Sr. Site Reliability Engineer (SRE) for its SRE Systems Team. As part of Application Managed Services (AMS) Run 2.0, Meijer is transitioning to a new proactive operating model in application management. Meijer is hiring a Sr. Site Reliability Engineer to engage early in the software development life cycle and build capabilities that enhance reliability and scalability. The primary goals of this model are to reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR), increase system availability, and reduce overall incidents.
The Sr. Site Reliability Engineer (SRE) will work with other SREs, business product owners, developers, support team members and quality analysts to help them drive value delivery.
What you’ll be doing (Responsibilities):
- As part of the SRE Systems Team, standardizes SRE activities at Meijer by documenting best practices.
- Develops standards for capabilities like logging, metrics, distributed tracing and chaos engineering so that Meijer SRE practices are readily consumable without any ambiguity.
- Develops reusable components using C# for other teams to implement reliability standards.
- Drives standardization initiatives to create a “single-pane-of-glass” view for Meijer’s business and IT operations using Azure Monitor and Power BI.
- Drives key initiatives utilizing SRE/DevOps concepts and monitoring tools like Azure Monitor and Dynatrace.
- Works with various development, SRE, support and testing teams to establish reliability best practices.
- Acts as a consultant and subject matter expert on SRE capabilities and tooling.
- Helps SRE and development teams mature their reliability activities, such as estimating SRE activities for a new initiative, assessing the current and future state of reliability, and helping the team identify key system and business metrics and devise ways to track them.
- Takes part in SRE governance and leads Community of Practice activities.
- Conducts Lunch n Learn sessions on SRE practices.
- Works as an SRE in multiple lines of business on an as-needed basis.
- Develops dashboards, alerts, and monitoring for various systems.
- Drives engagements with development and business teams to define key business and system metrics.
- Develops SRE capabilities to meet SLI/SLO/SLA requirements.
- Creates an error budget for each component, builds availability dashboards, and sets up fast-burn and slow-burn alerts.
- Performs chaos engineering by artificially injecting faults into systems to simulate SLO failures.
- Designs, codes, tests, and implements automation using C#.NET, PowerShell and Azure CLI, following the Git process.
- Coordinates structured walkthroughs and technical reviews to ensure reliability, resiliency and scalability.
- Ensures overall quality through continuous monitoring throughout the development cycle.
- Partners with the solution architects to design for new systems development, new package system evaluations and enhancement of existing systems.
- Mentors and coaches other members of the team (SREs and Support).
- Coordinates feasibility studies/proofs of concept to evaluate solutions.
- Works with ITS Security and Infrastructure teams to ensure cloud-based systems and programs are secure.
- Works within the ITIL framework and SAFe.
- Actively participates in all team Agile ceremonies.
This job profile is not meant to be all inclusive of the responsibilities of this position; may perform other duties as assigned or required.
- Bachelor’s degree in Computer Science, Computer Information Systems, Business Information Systems, Engineering or related discipline or equivalent work experience and technical training is required.
- Any SAFe certification or training preferred.
- Minimum of 3 years of experience as an SRE, preferably in the retail industry.
- Minimum of 3 years of experience building serverless applications using Azure (Function Apps, Logic Apps, Cosmos DB, SQL on Azure, deployment slots, API Management, etc.).
- Minimum of 3 years of experience implementing observability using multiple tools like Azure Monitor, Dynatrace, Datadog, AppDynamics, Elastic, Prometheus/Grafana or Splunk.
- 5+ years of experience as a software engineer developing, building, testing, and deploying software using C# or Java
- Ability to independently drive discussions with cross-functional teams to set standards for SRE initiatives (SLIs, SLOs, business metrics, chaos engineering, self-healing, full-stack observability, dashboarding, alerting and continuous monitoring).
- Strong knowledge of design patterns and the Azure Well-Architected Framework.
- Experience in a cross-functional agile team is preferred.
- Ability to adapt to rapidly changing technology and apply it to business needs.
- Strong knowledge of database management and file access methods is desirable.
- Strong analytical and problem-solving skills.
- Strong written and oral communication skills, with the ability to collaborate with business, development, QA, DevOps, SRE and Support teams.