Site Reliability Engineer (f/m/d)

Jina AI in Berlin

🙌 Who are we?

- A commercial open-source company that empowers businesses and developers to create cutting-edge neural search, generative AI, and multimodal services using state-of-the-art LMOps, MLOps, and cloud-native technologies
- Founded in Feb. 2020, raised $37.5M in 20 months. Now a global team of 65 with four offices: Berlin (HQ), San Jose, Shenzhen, and Beijing.
- One of the world's highly valued, high-potential AI startups, featured in Forbes DACH AI30 2020 and CB Insights AI 100 2021 & 2022.


✨ Who do we want?

- You are passionate about multimodal intelligence and making it accessible to everyone.
- You want to work with the latest technologies and are fascinated by AI/ML.
- You are a fast learner and a team player who enjoys working in an async, distributed environment.
- You are proactive and take ownership of your projects.
- You have excellent communication skills in English.

💁 About this position


😊 Benefits & Perks

💰 Competitive salary & stock options
🌎 Multi-cultural & diverse team
🎓 Numerous opportunities to present at or attend top AI/OSS/industry conferences
🦄 Rapid career development opportunities alongside the company
🏢 Central offices in downtown Berlin, San Jose, Shenzhen, and Beijing
⛱️ Free snacks & drinks, monthly team events, flexible working hours, home office options
💻 MacBooks & top-notch equipment


💼 Hiring Process

Candidates can expect the hiring process to follow the order below. Please keep in mind that candidates can be declined at any stage of the process.

- The first round is CV screening, which takes a maximum of one week. Successful candidates will receive an email that contains a link for booking the next round.

- Qualified candidates will be invited to schedule a 30-minute screening call on Zoom with one of our global recruiters. After this call, engineering candidates will receive an email asking them to complete an offline coding challenge, which takes about 30 minutes on average.

- Next, candidates will be invited to schedule Peer Interviews with members of the relevant team. There are two rounds: the first is the Technical Peer Interview and the second is the Team Peer Interview. For engineering candidates, the team will examine the quality of the offline challenge as well as your fundamental knowledge and coding skills during the Technical Peer Interview; you should also expect a live-coding challenge lasting 10 to 15 minutes. Candidates who pass the Technical Peer Interview will be invited to talk with the relevant Team Lead in the Team Peer Interview, which focuses more on practical problem solving.

- Finally, candidates will be invited to schedule a 30-minute interview with a CXO.

We will collect feedback from all interviewers and make a decision within two weeks at most (on average, it takes 5 working days). The candidate will then be invited to a 15-minute call with our recruiters to discuss the terms of the offer.

In this position, you will:

- Work closely with engineering teams to enhance deployment strategies for higher reliability of Jina's Cloud services.
- Build and improve the observability stack, and streamline and automate Ops processes (incident and problem management) for different Cloud services.
- Provide reliable technical support and mentorship on complex issues in a high-velocity, dynamic environment.
- Be part of the on-call team for production issues during your shift or as required.

What you bring:

- 2+ years of experience building and managing infrastructure on AWS / Azure / GCP.
- You have owned and operated production-scale Kubernetes clusters, with exposure to vendor-specific Kubernetes solutions such as EKS, AKS, and GKE.
- Solid knowledge of logging, monitoring, and observability platforms (Prometheus/Grafana/Jaeger) in large-scale distributed systems.
- 1+ years of experience with cloud automation and infrastructure as code (Terraform/CloudFormation/Helm).
- Familiarity with at least one programming language, preferably Golang or Python.
- Experience managing critical production infrastructure, maintaining reliability and uptime, and taking a customer-first view of operational safety.