
Senior Site Reliability Engineer

brightwheel - Remote
http://mybrightwheel.com
Full Time
Junior (3-5 years)
Annually

Pay Range

Annually:

$100,000 - $150,000

No equity

Industry

Crypto

Description

What You'll Do

- Develop infrastructure and tooling solutions for complex product engineering projects in service of business goals; ship software that matters to our customers and our company
- Scale application infrastructure to handle over 1M requests per minute using pragmatic and distributed architectures
- Develop a fully automated observability stack based on the existing SaaS system, and extend it to predict capacity needs from usage patterns
- Improve engineering velocity by implementing best practices and frameworks; improve coding efficiency and quality
- Be a steward of quality, scalability, security, and performance. You'll work with other engineers to ensure that we have a solid foundation that serves our customers and enables the team to continue building a great product
- Drive sound, data-driven decision-making; analyze data insights to uncover opportunities to improve architecture for a great customer experience
- Implement excellence in engineering processes and culture across teams
- Collaborate with product and engineering teams effectively and with empathy; promote technical learning across teams
- Mentor and advocate for junior engineers; push the boundaries of their comfort zones. Build trust and respect in the team
- Interview and evaluate engineering candidates' technical capabilities to help grow our engineering team

Qualifications: Technical Skills

- You’re experienced with scaling and shipping complex, distributed services in AWS or equivalent using infrastructure as code
- You’re an expert at building infrastructure and tools that make it simple to develop and run code; your stakeholders love to use the things you build
- You have a broad understanding of system and application architectures, and a strong observability background
- You have strong programming fundamentals, ideally in a variety of languages like Ruby, Python, or Node
- You’re an expert at measuring query latencies and at resource allocation and management
- You have experience deploying to and orchestrating containers (Docker, Kubernetes, etc.)
- You prioritize automation and continuous testing in order to optimize speed while simultaneously enhancing quality and security
- You have an understanding of operational toil, observability, performance, and scalability
- You are familiar with incident response and management tools like PagerDuty