Remote job: FleetOps Engineer
On being a FleetOps Engineer at balena
At balena, we help our customers deploy and manage tens of thousands of IoT devices across the globe. The balena ‘fleet’ is extremely heterogeneous, with devices of many different types and architectures, and is constantly growing and evolving.
In keeping with balena's philosophy of support-driven development, our FleetOps engineers are the “special operations forces of support”, often tackling the high-impact, high-complexity cases that affect the entire balena fleet.
A major focus of the FleetOps team is device reliability engineering: making device management safer for our users by building tools and automating wherever possible. Past FleetOps projects include resinhup, our solution for managing host OS updates, and configizer, a tool we developed to adjust on-device configuration remotely and more safely.
As a key member of the FleetOps team, you will constantly alternate between reactive management (temporarily relieving customer friction) and preventative maintenance across the fleet. Rather than maintaining a single component, you will both provide workarounds that can eventually be productized and make existing tools more robust and scalable. You will continuously explore what customers need in the short to medium term, and collaborate with product engineers to effectively handle the ‘delta’ between what the product is now and where it is heading.
You will actively contribute to product decisions with data from the field. Areas such as on-device metrics, monitoring, data visualization, and debugging are all common territory for the team. Things you work on today may become new capabilities in the balena platform tomorrow!
- Turn customer interactions and issues into scripts, tools, and products that enable our users to effectively manage the health of their own fleets
- Convert reactive support into preventative maintenance: dive in to solve the immediate problem by whatever means necessary, then build and automate tools/products for the entire fleet
- Contribute to the roadmap, development, and maintenance of key OS features such as remote host OS updates and brownfield migrations
- Help define best practices for going to production on balena and educate users on them; you will be a go-to resource on these practices, and will learn, and teach, the lessons of scaling
- Be a key resource for other engineers on support; you’ll often be asked to lend your expertise and contribute to internal docs/cookbooks that capture your knowledge and educate others
- Create tools to help monitor and understand the overall health of the balena device fleet
- Customer-facing skills; ability to understand the actual problem users are trying to solve and work together to find a solution
- Dynamic and flexible demeanor, as user requirements and/or the product change frequently
- Ability to both hold the big picture in mind and dive into the weeds, switching between the two in real time
- Patience to research and observe patterns, and a methodical, thorough approach
- Continuous improvement mindset; you’re constantly thinking about how to automate your manual work and be more efficient
- Ability to make tradeoff decisions independently and to know where your marginal time is most productively spent
- Curiosity and a willingness to keep building on your product knowledge (through projects, tutorials, support shifts, etc.)
- Excellent communication skills and fluency in English
- Experience with deploying, administering or monitoring Linux systems and applications
- Proven scripting skills (shell, Python, Node.js/JS, Rust, etc.) and familiarity with tool building and automation
- Background in SysAdmin, SysEng, DevOps, or SRE roles, or experience working on similar challenges
- Interest in or familiarity with IoT, embedded software, the balena platform, etc.
Make sure to let us know if any of these items apply to you. If possible, please also share a sample of your work (URL or attachment).
We’d be delighted to hear from you! Along with your CV/Resume, please answer the questions in our application form to help us make an informed initial assessment.