Job Description - Cloud Engineering Manager (2200012E)
Our engineering team has built the largest private Medicare marketplace in the country. We passionately focus on the continuous improvement of the systems we build and the culture we promote.
We build a platform that provides the best possible support to our customers who are shopping for insurance, and where our insurance carriers can be confident that their products are accurately and impartially represented.
We are looking for an Engineering Manager for our Cloud Operations team. This position is responsible for leading the design, deployment, scaling, and maintenance of a complex, multi-tenant hybrid cloud.
You are passionate about security, reliability, and automation in line with DevOps and Site Reliability Engineering (SRE) principles.
You are experienced but always willing to learn new things. You value expertise and are passionate about coordinating efforts with Software Engineering, Systems Engineering, and InfoSec teams.
While you stay up to date with current techniques and tools, you are prudent in judging what is and what isn’t a good fit.
People Management, Coaching, and Community Building
Manage a team of engineers as direct reports.
Mentor engineers and others in the organization on infrastructure reliability, reducing toil, operating software at growing scale, reducing technical complexity and sprawl, and writing software and tooling to improve resilience and automate operations.
Help interview, hire, and onboard high-quality job applicants.
Conduct regular 1-on-1 meetings with engineering reports.
Keep leadership well informed of your team’s direction and focus.
Ensure that Site Reliability Engineers across various teams are well informed of changes or status.
Explore new ways of improving communication between Site Reliability Engineers and with other teams.
Promote inclusion and collaboration between various functional disciplines.
Write and maintain architectural, stakeholder, and policy documentation.
Encourage and inspire others to innovate.
Look for new ways to improve our processes.
Look for new ways to improve the quality of our infrastructure.
Look for new ways to increase the velocity with which teams deliver, using expertise from various functional disciplines.
Look for new ways to remediate production incidents more quickly and safely.
Participate in department communities of practice.
Define success and accountability for the Site Reliability Engineering discipline.
Adhere to and advocate for best practices including Infrastructure as Code, monitoring, high availability, disaster recovery, security, and DevOps methodologies.
Know what needs to be worked on and keep Site Reliability Engineers focused on the goal.
Provide prompt assistance and remediation solutions during critical situations and production incidents.
Work with teams to implement and refine SRE standards as they are decided upon by the technology organization.
8-10 years of hands-on technical experience with many of the following technologies
Windows and Linux Servers
Cloud platforms, preferably Azure
Secrets management with Azure Key Vault, HashiCorp Vault or similar systems
Configuration management and infrastructure-as-code tools like Ansible and Terraform
Load balancers such as F5 BIG-IP
Web servers such as IIS (Internet Information Services)
Application Performance Monitoring with tools like Application Insights / Azure Monitor
Monitoring tools such as Azure Monitor, Zabbix, SolarWinds
Continuous Integration and Continuous Delivery with tools like TeamCity, Octopus Deploy, Concourse or GitHub Actions
Log aggregation tools like Sumo Logic or Splunk
Networking technologies such as DNS (Domain Name System), DHCP (Dynamic Host Configuration Protocol), proxy servers, and software-defined networking in cloud environments such as Azure
3+ years of experience with people and team management