LLM / GenAI Engineer

Scale.jobs

Full-time · Los Angeles, CA · Posted 3 days ago · Salary estimated

Tech Stack Required

Python, LangChain, LLMs, RAG, Pinecone, Qdrant, Kubernetes, Docker, AWS, GCP

About the Role

The role is designed for a software engineer who has moved beyond basic prompt engineering and understands how to architect, deploy, and scale production-grade generative AI systems. The team focuses on building robust Retrieval-Augmented Generation (RAG) pipelines, multi-agent orchestrations, and fine-tuning workflows that deliver deterministic, reliable outputs in high-throughput environments. Working alongside backend engineers and data scientists, this role will own the integration of state-of-the-art LLMs into core product workflows, directly tackling the challenges of model latency, cost optimization, evaluation metrics, and hallucination mitigation.

Key Responsibilities

- Design, implement, and optimize production-grade Retrieval-Augmented Generation (RAG) pipelines using LangChain, LlamaIndex, or native integrations
- Build and maintain scalable vector search infrastructures using systems like Pinecone, Milvus, Qdrant, or pgvector, focusing on high-recall indexing and low-latency retrieval
- Implement systematic LLM evaluation and observability frameworks utilizing tools like Phoenix, LangSmith, or custom LLM-as-a-judge pipelines to track drift and accuracy
- Deploy, serve, and optimize open-source LLMs (Llama, Mistral) using frameworks like vLLM, TGI, or TensorRT-LLM on AWS or GCP
- Execute parameter-efficient fine-tuning (PEFT, LoRA, QLoRA) on domain-specific datasets to adapt models for specialized enterprise tasks
- Collaborate with backend engineering teams to expose robust, asynchronous APIs serving real-time model outputs with built-in rate-limiting and fallback mechanisms

What We Are Looking For

- 3-6 years of professional software engineering experience, with at least 1.5 years of hands-on experience building and deploying generative AI systems in production
- Advanced proficiency in Python, including experience with asynchronous programming (asyncio), FastAPI, and writing highly-optimized, testable code
- Deep technical understanding of transformer architectures, attention mechanisms, embedding spaces, and context-window management
- Practical experience setting up and tuning vector databases and managing semantic search pipelines at scale
- BS or MS in Computer Science, Data Science, Mathematics, or a closely related technical field
- Bonus: Experience with advanced prompting methodologies, agentic frameworks (CrewAI, AutoGen), or containerization and orchestration via Docker and Kubernetes

