
The fastest, most comprehensive way to become an AI Engineer in 2024
Welcome to the AI Engineer Roadmap! This guide offers a project-based approach to mastering AI engineering, whether you're a beginner or looking to expand your skills. Each section includes practical projects to apply your knowledge, build real-world AI applications, and develop crucial problem-solving skills ᕙ( •̀ ᗜ •́ )ᕗ
- Web/App Development
- Beginner Text Generation
- Advanced Text Generation
- Image Generation
- Speech
- Computer Vision

It helps to have the ability to code your own interfaces, but it's also 100% possible to build AI products without knowing how to program. It's up to you if you wanna go down the coding (full-stack) route or no-code (Webflow, Zapier, etc) route.
- Front-end: Learn React for building interactive user interfaces
- Back-end: Master NodeJS/NextJS for server-side development
- Database: Understand and implement Postgres for data storage
There are tons of roadmaps out there for learning web development. One of my favorites is Scrimba. I also have a bootcamp on Youtube that covers full-stack web dev + building AI apps
- Website Builder: Explore Webflow for creating professional websites without coding
- Workflow Builder: Use Zapier to automate processes and integrate applications
- Database: Leverage Firebase or Airtable for easy-to-use, scalable data storage solutions

-
Understanding Large Language Models (LLMs)
- Watch 3Blue1Brown's Youtube series on LLMs/Transformers as an entry point
- (Bonus) Watch Karpathy's video on building GPT from scratch
-
Proprietary LLMs
- OpenAI's GPT models
- Anthropic's Claude 3 family
- Google's Gemini
-
Open-source LLMs
- Meta's LLaMA 3
- Cohere's Command-R
-
Prompt Engineering
-
Basic Chatbots
- Explore Vercel's AI Library documentation
- Project: Create a poem generator
-
Handling Structured Output
- Learn techniques for generating and parsing structured data from LLMs
- Check out Instructor or use string parsing

-
Function Calling and Tool Usage
- Implement LLM-powered tools and integrate external functions
- Project: Build a personal assistant that can interact with your calendar, email, and task list
-
Web-browsing Capabilities
- Learn about techniques for scraping and summarizing web content
- Project: Build an open-source version of Perplexity (like morph.so)
-
Fine-tuning LLMs
- Techniques for adapting pre-trained models to specific tasks
- Project: Fine-tune a model on a specific domain (e.g., medical terminology, legal jargon)
-
Embeddings and Vector Databases
-
Retrieval Augmented Generation (RAG)
- Learn about different RAG architectures and when to use them
- Project: Develop a "Chat with PDF" application
-
AI Agents
- Study projects like OpenDevin to understand autonomous AI systems
- Project: Autonomous research agent

-
Text-to-Speech (TTS)
- Implement TTS using services like ElevenLabs and OpenAI
- Project: Create an audiobook generator from text input
-
Speech-to-Text (STT)
- Utilize models like OpenAI's Whisper for transcription
- Project: Create a job interview coach application
-
Speech Analysis
- Explore emotion and intent analysis using tools like Hume AI or Google Gemini 1.5 Pro
- Project: Create an AI Therapist with emotion detection
- Learn about prosody analysis and its applications in understanding speaker intent

-
Prompt Engineering for Image Generation
- Read up on art history and photography terminology to craft effective prompts
- Join the Midjourney Discord to study how experts prompt image models
- Project: Create a series of images that tell a story, using consistent style and characters
-
Proprietary Image Generation Models
- Explore capabilities of models like GPT-4o, Claude, and Gemini
- Project: Children's coloring/story book generator
- Learn about image-to-image transformations (style transfer, inpainting, outpainting)
-
Open-source Image Generation Models
- Experiment with Stable Diffusion and other accessible models
- Project: Build a custom image generation UI with fine-grained controls

-
Image Analysis
- Leverage models like Claude or GPT-4o for comprehensive image understanding
- Project: Develop an app that can analyze and describe the contents of photos
- Learn about object detection, segmentation, and classification techniques
-
Video Analysis
- Explore advanced capabilities with models like Google Gemini 1.5 Pro
- Project: Video narration
- Study techniques for tracking objects and analyzing motion in videos
- Project: Create a sports analysis tool that can break down player movements and tactics
Happy learning and building!
- Zack