Hex Trust
We are looking for a Senior/Lead DevOps Engineer to manage and oversee our containerised, microservice-based cloud infrastructure hosted on AWS, GCP, and on-premises. The role also involves leading the team alongside other senior engineers, driving projects, supporting junior developers, and managing advanced platform configuration tasks and assignments.
Duties & Responsibilities
PLATFORM & INFRASTRUCTURE
Manage advanced on-premise and cloud infrastructure:
Cloud-based Kubernetes Cluster environments and their requisite application systems and support components
On-premise Kubernetes Clusters
Additional container orchestration with Nomad
Codify all infrastructure components using IaC such as Terraform, Terragrunt, and Pulumi
Manage and monitor network latency and security across all web application platform infrastructure
Advanced monitoring and alerting configurations using FOSS tools such as Prometheus, Grafana, Loki, and Alertmanager
Manage, coordinate, and implement software upgrades, patches, and hotfixes on servers
DEVOPS: Developer Productivity and Operations
Support Dev + DevOps team with management of web application infrastructure for our various product platforms
Support GitOps model of application management and deployment: GitLab, ArgoCD, etc.
Programmatic and automated Load and Pen Testing
Creating and curating various support systems including: DB backups/restores, developer diagnostic tooling, system logging and error tracing, performance and system monitoring
TECH-SEC-OPS: Delivery Management + Security
Integration of Security / Compliance checks into Application and Infrastructure delivery pipelines, ensuring production deployments are resilient, stable, and secure
Leverage FOSS and SaaS Pen Test platforms for constant system testing
Coordinate with Security team to ensure highest standards of quality around privileged infrastructure/application access
SRE: Monitoring and Observability
Performance-tune and scale web application infrastructure to handle tens of thousands of page views and transactions per day.
Explore new technologies to evolve application functionality and contribute to the design of infrastructure, deployment, and maintenance processes.
RESILIENCY: Resilient, Scalable Infrastructure Management
Able to configure advanced scalability models leveraging infrastructure provisioned by the platform team and monitoring components provided by SRE
Able to scale platforms based on application/container load, node load, network load, etc.
Delivering cross-regional, HA infrastructure models with global mesh
Chaos Engineering principles for testing
Requirements
Deep understanding of Kubernetes:
Helm and Kustomize
Network Policy + Service Mesh
Proficiency in the following tools/frameworks/languages is highly recommended:
Golang/Python/JavaScript/TypeScript
Terraform, Terragrunt, Pulumi
Kubernetes CLI (kubectl/kubeadm)
AWS CLI
HashiCorp Configuration Language (HCL)
Experience in BOTH AWS and GCP; some on-premise infrastructure management experience a plus (VMware ESXi, Proxmox, etc.)
Knowledge of Docker and multi-stage/multi-architecture builds, registries, and best image practices
Experience with GitOps development workflow and infrastructure as code (IaC) approach
Diploma/Degree in Information Technology or related disciplines
You love the console and are not afraid to read man pages
6-10 years of experience and deep knowledge of:
Linux/Unix Server System Administration
Network Administration
Network infrastructure management and maintenance experience
Hands-on experience with relational database structures: MySQL, MariaDB, PostgreSQL
Solid knowledge of standard network communication protocols such as DNS, HTTP(S), LDAP, SMTP, SNMP
Ability to manage, monitor, and optimise network traffic (TCP and UDP); comfortable diagnosing problems on the wire.