
Remote Site Reliability Engineer with a Focus on Monitoring (Senior) at Giant Swarm

Seniority Level
Senior
Employee Type
Fully Remote
Salary per Year
$60k - $130k
Timezone
Worldwide

Why us

You’ll never feel like a replaceable cog in the machine at this company. We value your opinion and experience, so if you have something to add, let's talk about it! Our strong culture of embracing failure helps us stay open to new ideas and identify areas where things can be improved. From day one, you'll find passionate people at work who care deeply about their craft (and each other).

We're excited to meet in person twice a year at our onsites, as well as at conferences and events! We love learning from each other, so we continue that through bi-yearly personal development talks. Plus, if you want feedback on what's going well or what could be improved, just let us know: we talk about how your performance is going every month 🙂

As a member of our engineering team, you will be responsible for keeping the Kubernetes clusters running and healthy. You will also play an important role in developing the product itself, working closely with the Platform Engineers who are building out services across all key areas of the platform.

When you join the tribe, your voice is heard and you count in everything we do. During our quarterly hackathons, we all work together on out-of-the-box projects with room for innovation, and these events can even serve as a hive sprint if needed!

We're excited about the opportunities our platform brings. We empower developers to ship great products, and we are a diverse, experienced team, fully remote since 2014, with headquarters in Cologne!

Requirements for remote Site Reliability Engineers

  • You have experience with Prometheus, Grafana, and Alertmanager, so you can take on a variety of projects. You understand SLOs and how they relate to Cloud Native tools running on top of Kubernetes (such as Prometheus and Grafana), so you're ready for whatever project comes up!
  • You're an operations rockstar with experience in coding and managing infrastructure. You know how to use automation tools like Terraform or Ansible, and also Chef (or Puppet) when necessary!
  • You’re not afraid to dig deep and find the answer. You have good coding skills (preferably Go, but Python or similar is fine as well).
  • You're the kind of person who isn't satisfied until you know every last detail of a system. You combine operational and end-user knowledge to take a holistic approach to debugging systems at all levels, from kernel fundamentals right up to workloads running on Kubernetes.
  • Don't be sad, use code!

Your Job

  • You are an expert in resolving incidents on both our own clusters and our customers' clusters. You take part in the on-call rotation, where you are available to help with critical events, such as power outages, that could impact our customers' businesses.
  • You know how servers and systems work, and you tune their behavior so they work better for you!
  • You create and maintain our infrastructure. You are responsible for designing, configuring, building, and upgrading Kubernetes clusters, as well as the cloud provider templates that support them!
  • Automation drives your progress. You use tools like Kubernetes controllers and operators to do all of this safely and without much hassle, until eventually bots can take over most of the work!

What are you waiting for?