NVIDIA
Remote (Santa Clara, CA, USA)
NVIDIA is looking for a dedicated Machine Learning Engineer specializing in dataset engineering for LLM training. This is a highly technical role requiring deep expertise in machine learning, data science, and data engineering to develop innovative solutions that address the unique challenges of training foundation models. The role involves tackling machine learning challenges by building and improving our data ecosystem.
What you'll be doing:
Develop datasets for LLM pre-training and post-training (fine-tuning and reinforcement learning), optimize models, and evaluate performance.
Design and implement data strategies for model training and evaluation that include data collection, cleaning, labeling, augmentation, and RL verifier datasets to improve model performance. Actively identify and manage data issues such as outliers, noise, and biases.
Generate high-quality synthetic data to augment existing datasets, especially for...
