
LLM / GenAI Engineer

Scale.jobs

Full-time · Los Angeles, CA · Posted 9 days ago · Salary estimated

Tech Stack Required

Python, LangChain, LangGraph, LLMs, RAG, Pinecone, Qdrant, Kubernetes, Docker, AWS, GCP, Triton

About the Role

The role focuses on building, optimizing, and deploying production-grade Generative AI systems, moving beyond basic API wrappers to design complex RAG pipelines, multi-agent orchestrations, and fine-tuning workflows. The engineer will bridge the gap between cutting-edge AI research and scalable, low-latency software engineering. This position collaborates closely with data engineers and product teams to integrate large language models into core product offerings. The work directly impacts system reliability, response accuracy, and inference costs at scale.

Key Responsibilities

- Design, implement, and scale production-grade Retrieval-Augmented Generation (RAG) pipelines using advanced chunking, re-ranking, and hybrid search techniques
- Develop and evaluate agentic workflows and prompt-chaining architectures using frameworks such as LangChain, LangGraph, or custom orchestration engines
- Optimize LLM inference latency and throughput using techniques like quantization, speculative decoding, and model hosting frameworks like vLLM or TGI
- Implement systematic LLM evaluation and observability pipelines utilizing tools like Arize Phoenix, LangSmith, or custom LLM-as-a-judge frameworks
- Fine-tune open-source models (e.g., Llama, Mistral) using PEFT techniques like LoRA and QLoRA on domain-specific datasets
- Collaborate with infrastructure teams to deploy and monitor models in cloud environments like AWS or GCP using Docker and Kubernetes

What We Are Looking For

- 3-6 years of professional software engineering experience, with at least 1.5 years dedicated to building and deploying LLM applications in production
- Expert-level Python skills, including experience with asynchronous programming and robust backend API design (FastAPI, Flask)
- Hands-on experience with vector databases such as Pinecone, Qdrant, Milvus, or pgvector for high-scale semantic search
- Deep understanding of LLM architectures, context window limitations, embedding spaces, and attention mechanisms
- Solid software engineering fundamentals: CI/CD, unit testing, containerization, and structured logging
- Bonus: Contributions to open-source GenAI projects, experience with custom model evaluation datasets, or familiarity with Triton Inference Server

