LLM / GenAI Engineer

Scale.jobs

Full-time · Los Angeles, CA · Posted 4 days ago · Salary estimated

Tech Stack Required

Python, LangChain, OpenAI API, Anthropic, Hugging Face, LLMs, RAG, Pinecone, Weaviate, Qdrant, Kubernetes, Docker, AWS, GCP, Triton

About the Role

The role focuses on building, optimizing, and scaling production-grade Generative AI applications and LLM orchestration systems. This position bridges the gap between raw foundation models and robust enterprise software, ensuring high reliability, low latency, and deterministic behavior in complex AI workflows. The engineer will collaborate closely with backend developers and product teams to design robust retrieval-augmented generation (RAG) architectures, fine-tune models on proprietary datasets, and implement systematic evaluation pipelines to prevent regressions.

Key Responsibilities

- Design, implement, and optimize production RAG pipelines using LangChain, LlamaIndex, or custom orchestration frameworks to query multimodal datasets
- Build and maintain scalable vector search infrastructure using technologies like Pinecone, Weaviate, Qdrant, or pgvector, ensuring optimal indexing and query performance
- Develop systematic evaluation frameworks for LLM outputs, using quantitative metrics, LLM-as-a-judge patterns, and human-in-the-loop validation tools
- Fine-tune open-source models (such as LLaMA, Mistral) using parameter-efficient methods like LoRA and QLoRA, and optimization frameworks such as DeepSpeed
- Optimize LLM inference latency and cost through techniques such as prompt caching, structured output generation, quantization, and custom routing logic
- Collaborate with backend and MLOps teams to integrate LLMs into secure, highly available microservices using FastAPI, Docker, and Kubernetes

What We Are Looking For

- 3-6 years of software engineering experience, with at least 1.5 years dedicated to building and deploying LLM-based applications in production
- Expert-level Python programming skills, including experience with async programming, custom packaging, and strict testing practices
- Hands-on experience with LLM APIs (OpenAI, Anthropic, Cohere) as well as self-hosting and deploying open-source models via vLLM, Hugging Face TGI, or Triton
- Deep understanding of vector embeddings, chunking strategies, semantic search, and hybrid retrieval techniques
- Strong background in cloud architecture (AWS or GCP) and modern CI/CD pipelines for machine learning applications
- Bonus: Contributions to open-source LLM tooling, experience with agentic workflows (e.g., AutoGen, CrewAI), or a BS/MS in Computer Science with an AI focus
