This role is for you if you are comfortable with Big Data and cloud infrastructure (we use Google Cloud), are familiar with high-performance databases, and are keen to ensure high reliability and efficiency in large-scale systems.
What you’ll do / Responsibilities:
- Design and build our Machine Learning Platform to help data scientists productionize their models and features faster.
- Automate all parts of the data science lifecycle: feature engineering, model training, testing, and deployment.
- Deploy, operate, and grow some of the largest ML systems in the region.
- Collaborate with product teams to understand operational requirements, and translate these requirements into observable architecture and SRE processes.
What you’ll need / Requirements:
- At least 3 years of experience as an infrastructure or software engineer.
- Experience with Go, Python, and shell scripting; Java is a plus.
- Experience with cloud environments; Google Cloud preferred.
- Experience with modern cloud deployment technologies such as Terraform, Kubernetes, and Helm.
- Experience operating and debugging Big Data frameworks such as Spark, Flink, Kafka, and Airflow.
- Experience with relational and non-relational databases, ideally including clustering and high-availability configurations.
- Experience with large-scale production systems and microservice architectures.
- Familiarity with DevOps and Site Reliability Engineering (SRE) principles.