AI Engineer Roadmap

The fastest, most comprehensive way to become an AI Engineer in 2024

Welcome to the AI Engineer Roadmap! This guide offers a project-based approach to mastering AI engineering, whether you're a beginner or looking to expand your skills. Each section includes practical projects to apply your knowledge, build real-world AI applications, and develop crucial problem-solving skills ᕙ( •̀ ᗜ •́ )ᕗ

Web/App Development

It helps to have the ability to code your own interfaces, but it's also 100% possible to build AI products without knowing how to program. It's up to you if you wanna go down the coding (full-stack) route or no-code (Webflow, Zapier, etc) route.

Full-stack Route (recommended)

Front-end: Learn React for building interactive user interfaces
Back-end: Master NodeJS/NextJS for server-side development
Database: Understand and implement Postgres for data storage

There are tons of roadmaps out there for learning web development. One of my favorites is Scrimba. I also have a bootcamp on Youtube that covers full-stack web dev + building AI apps

No-code Route

Website Builder: Explore Webflow for creating professional websites without coding
Workflow Builder: Use Zapier to automate processes and integrate applications
Database: Leverage Firebase or Airtable for easy-to-use, scalable data storage solutions

Beginner Text Generation

Understanding Large Language Models (LLMs)
- Watch 3Blue1Brown's Youtube series on LLMs/Transformers as an entry point
- (Bonus) Watch Karpathy's video on building GPT from scratch
Proprietary LLMs
- OpenAI's GPT models
- Anthropic's Claude 3 family
- Google's Gemini
Open-source LLMs
- Meta's LLaMA 3
- Cohere's Command-R
Prompt Engineering
- Study Anthropic's Prompting Guide
Basic Chatbots
- Explore Vercel's AI Library documentation
- Project: Create a poem generator
Handling Structured Output
- Learn techniques for generating and parsing structured data from LLMs
- Check out Instructor or use string parsing

Advanced Text Generation

Function Calling and Tool Usage
- Implement LLM-powered tools and integrate external functions
- Project: Build a personal assistant that can interact with your calendar, email, and task list
Web-browsing Capabilities
- Learn about techniques for scraping and summarizing web content
- Project: Build an open-source version of Perplexity (like morph.so)
Fine-tuning LLMs
- Techniques for adapting pre-trained models to specific tasks
- Project: Fine-tune a model on a specific domain (e.g., medical terminology, legal jargon)
Embeddings and Vector Databases
- Understand and implement vector representations of text
- Explore vector database solutions for efficient similarity search (e.g. Chroma, Supabase, Weaviate)
- Project: Build a semantic search engine for a large corpus of documents
Retrieval Augmented Generation (RAG)
- Learn about different RAG architectures and when to use them
- Project: Develop a "Chat with PDF" application
AI Agents
- Study projects like OpenDevin to understand autonomous AI systems
- Project: Autonomous research agent

Speech

Text-to-Speech (TTS)
- Implement TTS using services like ElevenLabs and OpenAI
- Project: Create an audiobook generator from text input
Speech-to-Text (STT)
- Utilize models like OpenAI's Whisper for transcription
- Project: Create a job interview coach application
Speech Analysis
- Explore emotion and intent analysis using tools like Hume AI or Google Gemini 1.5 Pro
- Project: Create an AI Therapist with emotion detection
- Learn about prosody analysis and its applications in understanding speaker intent

Image Generation

Prompt Engineering for Image Generation
- Read up on art history and photography terminology to craft effective prompts
- Join the Midjourney Discord to study how experts prompt image models
- Project: Create a series of images that tell a story, using consistent style and characters
Proprietary Image Generation Models
- Explore capabilities of models like GPT-4o, Claude, and Gemini
- Project: Children's coloring/story book generator
- Learn about image-to-image transformations (style transfer, inpainting, outpainting)
Open-source Image Generation Models
- Experiment with Stable Diffusion and other accessible models
- Project: Build a custom image generation UI with fine-grained controls

Computer Vision

Image Analysis
- Leverage models like Claude or GPT-4o for comprehensive image understanding
- Project: Develop an app that can analyze and describe the contents of photos
- Learn about object detection, segmentation, and classification techniques
Video Analysis
- Explore advanced capabilities with models like Google Gemini 1.5 Pro
- Project: Video narration
- Study techniques for tracking objects and analyzing motion in videos
- Project: Create a sports analysis tool that can break down player movements and tactics

Happy learning and building!

Zack

my twitter

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

AI Engineer Roadmap

Table of Contents

Web/App Development

Full-stack Route (recommended)

No-code Route

Beginner Text Generation

Advanced Text Generation

Speech

Image Generation

Computer Vision

About

Uh oh!

Releases

Packages

License

deepvineet-prog/ai-engineer-roadmap

Folders and files

Latest commit

History

Repository files navigation

AI Engineer Roadmap

Table of Contents

Web/App Development

Full-stack Route (recommended)

No-code Route

Beginner Text Generation

Advanced Text Generation

Speech

Image Generation

Computer Vision

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages