
Data Engineer

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.


As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.




The Role


As a Data Engineer specializing in Natural Language Processing (NLP) and large-scale data processing, you will quickly and effectively gather, curate, and prepare high-quality datasets to support cutting-edge NLP research. Your role will be instrumental in enabling researchers by delivering essential data through efficient and scalable engineering practices, including web crawling, LLM-generated content refinement, and robust data pipelines, primarily leveraging Python and related technologies.


Key Responsibilities
  • Rapidly collect, curate, and preprocess datasets based on detailed specifications provided by NLP researchers, delivering data within tight timelines (typically within 1-2 days).
  • Develop and maintain efficient web crawling solutions, APIs, and automated workflows to continuously improve data collection processes.
  • Refine and evaluate outputs from Large Language Models (LLMs) to generate structured datasets suitable for model training and benchmarking.
  • Implement scalable data pipelines, ensuring efficient data processing, storage, retrieval, and distribution to research teams.
  • Collaborate closely with researchers and engineers to ensure collected data meets specified quality and relevance criteria.
  • Document data collection methodologies, dataset characteristics, and pipeline architecture clearly and effectively.
  • Engage with peer teams and participate in technical reviews to uphold best practices and data quality standards.
  • Represent MBZUAI at industry and research forums, showcasing technical capabilities in large-scale data processing and AI data infrastructure.
  • Perform all other duties as reasonably directed by the line manager commensurate with these functional objectives.


Academic Qualifications
  • Bachelor’s degree in Computer Science, Data Science, Engineering, or a related technical field required.
  • Master’s degree or equivalent experience in Computer Science, Data Engineering, or related technical fields preferred.


Professional Experience - Required
  • Extensive experience in data engineering, data processing, and automation using Python.
  • Demonstrated proficiency in designing and deploying web crawling solutions, automated data extraction, and processing pipelines.
  • Strong understanding of data structures, algorithms, databases, SQL, and performance optimization.
  • Experience working with cloud infrastructure and distributed data processing frameworks (e.g., AWS, Spark, Kafka, Kubernetes).
  • Excellent problem-solving abilities, attention to detail, and the capability to rapidly address technical challenges.
  • Strong communication and collaboration skills with cross-functional teams.


Professional Experience - Preferred
  • Proven track record of supporting NLP or AI research teams with rapid and reliable data delivery.
  • Experience with refining outputs from large-scale AI models, such as LLM-generated data.
  • Contributions to open-source projects, coding competitions, or high visibility in coding communities (e.g., GitHub, Stack Overflow).
  • Familiarity with the latest advancements in NLP data processing and large language model technologies.


$100,000 - $500,000 a year

Salary Range & Description

The starting base pay for this position is as shown above. The actual base pay is dependent upon a variety of job-related factors such as professional background, training, work experience, location, business needs and market demand. Therefore, in some circumstances, the actual salary could fall outside of this expected range. This pay range is subject to change and may be modified in the future.


Visa Sponsorship

This position is eligible for visa sponsorship.


Benefits Include

  • Comprehensive medical, dental, and vision benefits
  • Bonus
  • 401K Plan
  • Generous paid time off, sick leave and holidays
  • Paid Parental Leave
  • Employee Assistance Program
  • Life insurance and disability





Employment Type: Full-time, hybrid
Date Posted: June 7, 2025
