LLM / GenAI Engineer

Scale.jobs

Full-time · Los Angeles, CA · Posted Today · Salary estimated

Tech Stack Required

Python, LangChain, LLMs, RAG, Pinecone, Kubernetes, Docker, AWS, GCP, Triton

About the Role

The role is focused on building and optimizing production-grade Generative AI systems, moving beyond simple prompting into complex retrieval-augmented generation (RAG) pipelines, multi-agent orchestrations, and fine-tuning workflows. The engineer will bridge the gap between cutting-edge foundational models and robust, scalable software architectures that serve real-world enterprise needs.

This position requires deep technical expertise in natural language processing, vector databases, and model evaluation. The engineer will collaborate closely with product teams, backend engineers, and data scientists to deploy, monitor, and continuously improve LLM-based applications.

Key Responsibilities

Design and deploy advanced RAG systems using frameworks like LangChain, LlamaIndex, or custom Python-based orchestrations.

Optimize vector database configurations and indexing strategies using Pinecone, Milvus, or pgvector to achieve sub-100ms retrieval latencies.

Develop and execute automated evaluation pipelines (using Ragas, TruLens, or LLM-as-a-judge frameworks) to measure retrieval quality, hallucination rates, and answer correctness.

Perform parameter-efficient fine-tuning (PEFT, LoRA, QLoRA) on open-source models like Llama, Mistral, or Mixtral for domain-specific tasks.

Implement model-serving optimization techniques including quantization (AWQ, GPTQ), caching, and batching via vLLM, Triton Inference Server, or TGI.

Collaborate with backend infrastructure teams to build secure, rate-limited API gateways, asynchronous message queues, and robust logging for LLM calls.

What We Are Looking For

3-6 years of professional software engineering experience, with at least 1.5 years dedicated to building and deploying LLM applications in production.

Expert-level Python programming skills, including experience with asynchronous programming (asyncio, FastAPI) and test-driven development.

Hands-on experience with vector search engines, embedding models, and metadata filtering strategies at scale.

Solid understanding of generative AI risks and mitigation strategies, including prompt injection defense, guardrailing frameworks (NeMo Guardrails, Llama Guard), and PII filtering.

Strong foundation in cloud infrastructure (AWS or GCP) and containerization technologies (Docker, Kubernetes).

Bonus: Experience with LLM agent frameworks (CrewAI, AutoGen), graph databases (Neo4j) for GraphRAG, or custom transformer training.
