Sr. Site Reliability Engineer

AKASA, Remote (United States)

$145,000 - $200,000

About AKASA

AKASA is the preeminent provider of generative AI solutions for the healthcare revenue cycle. The company has raised more than $205M in funding from investors such as Andreessen Horowitz, BOND, and Costanoa Ventures.

Named one of the fastest-growing GenAI startups to watch by AIM Research, we’re solving the biggest challenges in the financial infrastructure of healthcare. Transaction volume through the AKASA Platform has grown consistently, with a ~2.5x increase over the last year. The AKASA customer base represents more than $90B in net patient revenue and includes the most innovative health systems in the country, like Stanford and Johns Hopkins.

Our founding team includes Silicon Valley leaders who have founded or been founding team members of multiple companies with successful exits. Our CEO was ranked among the “Top 50 Healthcare Technology CEOs” by the Healthcare Technology Report. We have been recognized as one of “America’s Best Startup Employers” by Forbes, “Most Innovative Digital Health Startups” by CB Insights, “Best Companies for Remote Workers” by Quartz, and “Best Places to Work” by Fortune, Modern Healthcare, and Built In, and we have been certified as a “Great Place to Work” four years in a row.

Learn more at www.AKASA.com.

We are building the future of healthcare with AI. Everyone is welcome. As an inclusive workplace, we are committed to building an environment where our employees are comfortable bringing their authentic selves to work.

Join us!

About the Role
In this role, you will work closely with both Infrastructure and Platform team members to integrate best-practice monitoring into our applications. You will develop high-quality runbooks for incident management so that our response procedures are efficient and effective. You will also build high-quality visualizations and meaningful alerting systems that provide clear, actionable insights into system performance and health.
As an SRE, you will manage and optimize our infrastructure using tools like Terraform, GitHub CI/CD, and Kubernetes. You will respond to incidents, troubleshoot production issues across the entire stack, and implement automation to streamline operational processes. Your role will involve designing and maintaining core infrastructure to support our users, ensuring our SaaS products run smoothly and efficiently.
Additionally, you will be proactive in identifying potential issues before they become outages, leveraging your expertise in telemetry data collection, querying, and monitoring using tools such as Grafana, Prometheus/Mimir, OpenSearch, and Sentry. You will collaborate with development teams to embed reliability and best practices into the software development lifecycle, ensuring robust and resilient applications.
Your contributions will be vital in scaling our monitoring infrastructure, enhancing system reliability, and ensuring seamless user experiences. By continuously improving our infrastructure and processes, you will help AKASA deliver high-quality, dependable services to our customers.
AKASA is based in South San Francisco. As a company, we have embraced remote work, and we consider ourselves experts in working collaboratively wherever our team members reside.
We’re committed to doing the best work of our lives, together. Come see if we’re the right team for you.

AKASA is a proud equal opportunity employer and we believe that a diverse and inclusive workforce is an imperative. We welcome people of different backgrounds, genders, races, ethnicities, abilities, sexual orientations, and perspectives, just to name a few. We do not discriminate based upon any protected class and we encourage candidates of all identities and backgrounds to apply. AKASA considers qualified applicants regardless of criminal histories in accordance with the San Francisco Fair Chance Ordinance.

AKASA is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at recruiting@akasa.com.

Responsibilities
    • Incident Response: Lead an on-call rotation (PagerDuty) to respond to incidents impacting system availability.
    • Application Architecture: Dive deep into our application architectures and work with engineering teams on best practices for monitoring, reliability, and scalability.
    • Infrastructure Management: Manage our infrastructure using Terraform, GitHub CI/CD, and Kubernetes.
    • Proactive Monitoring: Develop monitoring solutions that alert based on symptoms rather than outages.
    • Documentation: Document every action to turn findings into repeatable processes and automation.
    • Process Improvement: Enhance operational processes (such as deployments and upgrades) to ensure reliability and efficiency.
    • Infrastructure Development: Design, build, and maintain core infrastructure to support our applications effectively.
    • Troubleshooting: Troubleshoot and resolve production issues across various services and levels of the stack.
    • Growth Planning: Strategically plan and scale AKASA’s monitoring.

Qualifications
    • Monitoring: Proficient in visualizing, monitoring, and alerting on telemetry data (logs, metrics, and traces) using tools such as Grafana, Prometheus/Mimir, OpenSearch, Sentry, and similar technologies.
    • Containerization & Infrastructure: Experience with Docker, Kubernetes, Terraform, or similar technologies.
    • Programming Skills: 5+ years of professional experience using Python, Go, Java, or similar languages.
    • Linux/Unix Proficiency: Proficient with Linux and the Unix shell.
    • Collaboration: Excellent collaboration and asynchronous communication skills.
    • Documentation: Committed to thorough documentation to streamline learning and processes.
    • Proactive Attitude: Proactive and enthusiastic attitude towards identifying and fixing issues.
    • Agility: Ability to deliver quickly, iterate fast, and adapt to changing requirements.
    • Version Control: Proficient in using Git/GitHub for version control.
    • Cloud Platforms: Experience with AWS (preferred), Google Cloud, or Azure.
    • Networking: Understanding of networking principles and protocols.
    • Security: Knowledge of security best practices in infrastructure management.
    • Performance Tuning: Experience in performance tuning and optimization.

Benefits
    • Unlimited paid time off (PTO)
    • Expansive coverage for health, dental, and vision
    • Employer contribution to Health Savings Accounts (HSA)
    • Generous parental leave policy
    • Full employee coverage for life insurance
    • Company-paid holidays
    • 401(k) plan

Compensation
    • Based on market data and other factors, the salary range for this position is $145,000-$200,000 + Equity. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.
    • The above represents the expected salary range for this job requisition. Ultimately, in determining your pay, we’ll consider your location, experience, and other job-related factors.