Filevine is a Legal AI company delivering Legal Operating Intelligence for the future of legal work. Grounded in a singular system of truth, Filevine brings together data, documents, workflows, and teams into one unified platform—where modern legal work happens with clarity and consistency.
Powered by LOIS, the Legal Operating Intelligence System, Filevine connects context across every matter to transform legal operations from reactive to proactive. LOIS reads, understands, and reasons across your data to surface insight, automate complexity, and give professionals the clarity and confidence to see more, know more, and do more. Fueled by a team of exceptional collaborators and innovators, Filevine’s rapid growth has earned AI awards and recognition from Deloitte and Inc. as one of the most innovative and fastest-growing technology companies in the country.
Role Summary
As a Site Reliability Engineer, you will ensure the reliability, scalability, and performance of our AWS-based cloud infrastructure and applications. You'll bridge development and operations, automating processes, monitoring systems, and responding to incidents to maintain exceptional uptime for our mission-critical legal platform. This role is ideal for someone who loves solving complex infrastructure challenges in a production environment. This is a full-time, in-office role based in Salt Lake City, Utah.
Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure on AWS.
- Automate infrastructure provisioning, deployment, and monitoring using IaC tools (e.g., Terraform, CloudFormation).
- Monitor system health, performance, and capacity; proactively identify and resolve issues.
- Participate in the on-call rotation to respond to and resolve production incidents.
- Collaborate with development teams to improve observability, logging, and alerting.
- Drive continuous improvement in reliability through chaos engineering, load testing, and post-incident reviews.
- Ensure security best practices and compliance requirements are embedded in our infrastructure.
- Optimize costs while maintaining performance and reliability standards.
Qualifications
- 3-5 years of experience in Site Reliability Engineering, DevOps, or similar roles.
- Deep expertise with AWS services (e.g., EC2, ECS/EKS, RDS, Lambda, S3, VPC, CloudWatch).
- Proficiency in infrastructure as code (Terraform preferred) and CI/CD pipelines.
- Strong scripting/programming skills (e.g., Python, Bash, Go).
- Experience with monitoring and observability tools (e.g., Datadog, Prometheus, Grafana, ELK stack).
- Solid understanding of networking, Linux systems, and containers and container orchestration (e.g., Docker).
- Proven ability to troubleshoot complex, distributed systems issues.
- Bachelor's degree in Computer Science, Engineering, or equivalent experience.
Nice-to-have
- AWS certifications (e.g., Solutions Architect, DevOps Engineer).
- Experience in SaaS environments or regulated industries (legal tech a plus).
- Familiarity with microservices, serverless architectures, and database reliability.
- Passion for building resilient systems that support high-stakes workflows.