This role is for you if you are comfortable with Big Data and Cloud infrastructure (we use Google Cloud), are familiar with high-performance databases, and are keen on ensuring high reliability and efficiency in large-scale systems.
What You'll Do
- Design and build our Machine Learning Platform to help data scientists productionize their models and features faster
- Automate all parts of the data science lifecycle: feature engineering, model training, testing, and deployment
- Deploy, operate, and grow some of the largest ML systems in the region
- Collaborate with product teams to understand operational requirements. Translate these requirements into observable architecture and SRE processes
What You'll Need
- At least 3 years as an infrastructure or software engineer
- Experience with Go, Python, and shell script. Java optional
- Experience with cloud environments. Google Cloud preferred
- Experience with modern cloud deployment technology such as Terraform, Kubernetes, Helm
- Experience with operating and debugging Big Data frameworks such as Spark, Flink, Kafka, and Airflow
- Experience with relational and non-relational databases, preferably including clustering and high-availability configurations
- Experience with large-scale production systems and microservice architectures
- Familiarity with DevOps and Site Reliability Engineering (SRE) principles