Advanced Site Reliability Engineer
Are you looking for a hybrid or remote work opportunity? Are you interested in a workplace that allows for flexibility in your day? Are you ready for a workplace that provides benefits that suit your needs?
The Advanced Site Reliability Engineer (SRE) combines operational and engineering knowledge with the creativity needed to develop new solutions to emerging challenges and to optimize existing systems and processes.
The Advanced Site Reliability Engineer is responsible for the availability and reliability of key platform services and applications, and ensures that these meet the requirements of internal and external stakeholders.
As an Advanced SRE, you are motivated to collaborate with various teams within the company, enabling the realization and operation of sustainable production systems that can adapt to global challenges and changes.
The Advanced Site Reliability Engineer role calls for a practical disposition, a deep technical understanding, and the ability to plan and develop systems in a targeted manner. You will support daily operations with a sharp focus on performance at scale, real-time monitoring, logging, analysis, alerting, and root cause analysis informed by an understanding of the business impact.
- Implementing and maintaining DevOps practices within the team, including continuous integration and deployment.
- Demonstrating expertise in engineering methodologies, concepts, and skills, and applying them within the relevant engineering specialty.
- Meeting identified goals with respect to SRE team metrics.
- Continuously improving processes and infrastructure to enhance platform reliability and performance.
- Proactively collaborating with Software Engineers, Problem Management, and Observability teams to optimize platform performance and scalability.
- Advising and supporting Customer Support teams on complex technical questions and troubleshooting related to the system infrastructure.
- Proactively monitoring and troubleshooting platform performance and reliability issues using all available tools and dashboards.
- Administering and operating cloud platforms (MS Azure).
- Advising and supporting customers on technical questions and on setting up and configuring servers and infrastructure.
- Taking over incident handling and request fulfilment for the Relativity application (supervision of the application in day-to-day business).
- Responding to system alerts using PagerDuty and similar tools to ensure swift issue resolution.
- Resolving escalated issues.
- Carrying out checks after incidents and implementing solutions to prevent future issues (by applying a blameless postmortem culture; responsible for any post-incident actions that involve the development or optimization of any part of the software development lifecycle or the incident lifecycle).
- Identifying areas for improvement, conducting post-incident reviews, and driving initiatives to enhance system reliability, performance, and operational efficiency.
- Documenting team knowledge.
- Eliminating inefficiencies and identifying what can easily be automated (toil reduction). Experience in automating manual processes is required.
- Accompanying rollouts and migrations.
- Active participation in the recruitment and onboarding process of new team members.
- Providing technical leadership throughout the design process and guidance with regard to practices, procedures, and techniques. Serving as a guide and mentor for junior-level Site Reliability Engineers.
- Demonstrating consistent commitment to company core values.
- Performing additional duties as assigned.
- Bachelor's degree in Computer Science or related field.
- 3+ years of experience in Site Reliability Engineering or DevOps.
- Experience with tools such as SQL and NoSQL databases and orchestration services.
- Experience with MS Azure (AZ-900, AZ-104, PowerShell).
- Good knowledge of software architecture and design patterns (Kubernetes).
- Experience in system monitoring and alerting.
- Experience in using JIRA, New Relic, Jenkins, Tableau.
- Good knowledge of agile methodologies and a rapid development cycle.
- Experience with DevOps practices, including CI/CD.
- Very good knowledge of English (spoken & written).
- Excellent problem-solving and communication skills.
- Excellent analytical skills.
- Meticulous attention to detail.
- An eagerness to learn, explore, and introduce new technologies.
- Ability to work independently and efficiently under pressure, drive projects to completion and meet deadlines.
- Ability to work in a fast-paced and dynamic environment.
- Ability to work as a team player in a cross-functional team to develop practical solutions and ensure positive user experiences.
- Strong problem-solving and decision-making skills.
- Willingness to take ownership and drive topics end-to-end.
- Personal initiative, commitment, perseverance, and resilience.
- Well-developed communication and teamwork skills.
- An innovative approach and a well-founded way of working.
- Aspiration for DevOps principles and SRE excellence.
- Drive to empower your colleagues.
- Compensation: Relativity is committed to competitive, fair, and equitable compensation practices.
- This position is eligible for total compensation which includes a competitive base salary, annual performance bonus target of 10%, and long-term incentives. The expected salary range for this role is between 163,738 and 200,125 PLN gross/year (Employment Contract). The final offered salary will be based on several factors, including but not limited to the candidate’s depth of experience, skill set, qualifications, and internal pay equity. Hiring at the top end of the range would not be typical, to allow for future meaningful salary growth in this position.