SightPlan is growing our Site Reliability Engineering team to help deploy, manage, troubleshoot, and enhance our multifamily platform-as-a-service (PaaS) offering.
As a Site Reliability Engineer focused on Development Operations, you will design, implement, and maintain our integration and deployment pipelines. Our technology stack is heavily based on Amazon Web Services (AWS). This is an active development position: you will implement improvements and tooling that make our systems efficient to operate and quick to troubleshoot.
All members of the Site Reliability Engineering team are expected to keep a watchful eye on, and improve, the availability, capacity, security, operability, latency, and performance of SightPlan’s offerings. The entire team participates in an on-call rotation.
A successful candidate does not need expertise in every qualification or skill listed below. We are open to well-rounded candidates who are strong in several areas and interested or capable in others.
Responsibilities
- Keep our continuous integration and deployment pipelines up and running.
- Automate work, including infrastructure needs, testing, failover solutions, and failure mitigation, within the integration and deployment pipelines.
- Work closely with internal partners, including feature-development teams, to ensure we ship software that meets service-level and security objectives (includes participation in architecture and security reviews)
- Maintain the monitoring systems and track service-level objectives (SLOs).
- Improve tooling and automation to minimize internal ticket escalations.
- Re-platform or refactor solutions to improve performance and cost.
- Participate in an on-call rotation with your teammates.
- Debug complex issues/outages across our stack and create solid solutions.
- Analyze and document the root cause of issues/outages (RCA).
Qualifications & Skills
- Amazon Web Services (AWS) – especially solid experience with services commonly used to implement web services, such as CloudFront, Route 53, ALB/ELB/NLB, API Gateway, Lambda, S3, SQS, EC2, and Kinesis
- Linux administration including storage, network, and security (Ubuntu preferred)
- Experience operating production systems at scale
- Ability to troubleshoot complex system-level SaaS/PaaS problems
- Understanding of networking and messaging, especially between services (HTTP, AMQP)
- Familiarity with one or more relational and/or NoSQL databases (e.g., Mongo, Redis, Postgres)
- Advanced scripting (e.g., Bash)
- Excellent communication skills, both verbal and written
- Experience with DevOps engineering or Site Reliability Engineering (SRE)
- Experience with containers and orchestration (e.g., Kubernetes, Docker)
- Hands-on experience developing applications using Ruby, Java, Python, Go, C# or another high-level language.
- Experience with log management or operational intelligence tools (e.g., Splunk, ELK)
- Experience automating infrastructure, testing, and deployments using tools like Ansible, Chef, or Terraform, and the ability to explain Infrastructure as Code
- Security-conscious mindset
Traits We Value
- Can-Do Attitude – You know problems can be solved.
- Positivity – You take pride in energizing the people around you.
- Flexibility – There isn’t just one way to get something done.
- Openness – You love to share your hard-won tricks of the trade.
To apply, send your resume or LinkedIn profile to firstname.lastname@example.org.