We are looking for an experienced Machine Learning Engineer specializing in model inference and optimization to join our team. This role focuses on improving the efficiency and scalability of large language models (LLMs) in production, with work spanning model deployment, quantization, and inference acceleration. The ideal candidate will have 2-3 years of experience working with ML frameworks such as PyTorch or TensorFlow, a deep understanding of neural network architectures, and a strong interest in LLM inference optimization.

Machine Learning Engineer - Inference (2-3 years)