Machine Learning Engineer
Scale.jobs
About the Role
The role drives the development and scaling of core machine learning services, bridging the gap between experimental prototyping and highly available production systems. The engineer will collaborate closely with product and data platform teams to embed predictive intelligence and generative capabilities into customer-facing software. The focus spans traditional ML pipelines, deep learning, and modern LLM application patterns. Operating in a remote-first, high-autonomy culture, the engineer will make critical architecture decisions about model training pipelines, real-time inference optimization, and MLOps tooling.

Key Responsibilities
Build, optimize, and maintain scalable machine learning pipelines for model training, validation, and batch or streaming inference.
Develop and deploy retrieval-augmented generation (RAG) applications, agentic workflows, and fine-tuning scripts using state-of-the-art LLMs.
Implement robust evaluation frameworks and unit test suites for ML models to detect regressions, hallucinations, and performance degradation.
Collaborate with backend engineers to expose model inference endpoints via high-throughput, low-latency APIs built with FastAPI or gRPC.
Establish automated MLOps infrastructure for model monitoring, data drift detection, and continuous integration/continuous deployment (CI/CD) of ML assets.
Optimize inference latency and GPU utilization through quantization, pruning, and model compilation or inference-serving libraries such as TensorRT or vLLM.

What We Are Looking For
3-6 years of experience as a Machine Learning Engineer or Software Engineer working on production-grade AI systems.
Deep proficiency in Python and solid experience with ML frameworks such as PyTorch, scikit-learn, and Hugging Face Transformers.
Hands-on experience with vector search databases (e.g., Pinecone, Qdrant, Milvus, or pgvector) and modern orchestration tools like LangChain or LlamaIndex.
Solid understanding of relational and non-relational databases, including experience building feature pipelines in SQL, pandas, or PySpark.
Familiarity with containerization (Docker, Kubernetes) and cloud-based ML orchestration platforms like AWS SageMaker, GCP Vertex AI, or Run:ai.
Bonus: Experience with Triton Inference Server, Kubernetes-native ML tools (Kubeflow, KServe), or contributions to open-source ML/LLM repositories.