AI Resume Agent Development Guide
Project Overview
This document outlines the complete step-by-step process to build an AI Resume Agent
that can screen, analyze, and rank candidate resumes and GitHub profiles based on job
descriptions using semantic similarity scoring.
Table of Contents
1. Initial Setup and Architecture
2. Core Components Development
3. AI Integration and Scoring
4. Issue Resolution and Improvements
5. Final System Architecture
6. Running the System
Initial Setup and Architecture
Step 1: Project Structure
Create the following directory structure:
AIResumeAgent/
├── main.py
├── config.py
├── web_app.py
├── agents/
│   ├── resume_finder.py
│   └── resume_analyzer.py
├── tools/
│   └── resume_parser.py
├── resumes/
├── data/
└── logs/
Step 2: Environment Setup
1. Create a virtual environment:
python -m venv ai_agent_env
2. Activate the virtual environment:
# Windows
ai_agent_env\Scripts\Activate.ps1
# Linux/Mac
source ai_agent_env/bin/activate
3. Install required dependencies:
pip install crewai
pip install langchain-community
pip install scikit-learn
pip install sentence-transformers
pip install pypdf
pip install python-docx
pip install spacy
pip install requests
pip install fastapi
pip install uvicorn
python -m spacy download en_core_web_sm
Core Components Development
Step 3: Configuration Setup (config.py)
Create configuration settings for the AI agent:
import os

class Settings:
    BASE_DIR = os.path.dirname(os.path.abspath(__file__))
    RESUMES_DIR = os.path.join(BASE_DIR, "resumes")
    DATA_DIR = os.path.join(BASE_DIR, "data")
    LOGS_DIR = os.path.join(BASE_DIR, "logs")

    LLM_MODEL = "llama2"  # or your preferred Ollama model
    OLLAMA_BASE_URL = "http://localhost:11434"

    SUPPORTED_FORMATS = [".pdf", ".docx", ".txt"]

settings = Settings()
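A quick way to verify the configuration and create the expected directories before wiring in the agents. This snippet is illustrative and not part of the project files:

    import os
    from config import settings

    # Ensure the working directories exist before the agents run.
    for directory in (settings.RESUMES_DIR, settings.DATA_DIR, settings.LOGS_DIR):
        os.makedirs(directory, exist_ok=True)

    print("Resumes dir:", settings.RESUMES_DIR)
    print("Ollama URL:", settings.OLLAMA_BASE_URL)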
Step 4: Resume Parser Development (tools/resume_parser.py)
Create a parser to extract text from various resume formats:
Initial Version Issues:
The parser returned a raw_text key instead of the expected full_text
Error handling for PDF text extraction was missing
Final Working Version:
import os
import spacy
from pypdf import PdfReader
from docx import Document

class ResumeParser:
    def __init__(self):
        self.nlp = spacy.load("en_core_web_sm")

    def extract_text_pdf(self, path):
        with open(path, "rb") as f:
            reader = PdfReader(f)
            text = ""
            for page in reader.pages:
                text += page.extract_text() or ""
            return text

    def extract_text_docx(self, path):
        doc = Document(path)
        return "\n".join([p.text for p in doc.paragraphs])

    def parse_resume(self, path):
        ext = os.path.splitext(path)[1].lower()
        if ext == ".pdf":
            text = self.extract_text_pdf(path)
        elif ext == ".docx":
            text = self.extract_text_docx(path)
        elif ext == ".txt":
            with open(path, "r", encoding="utf-8") as f:
                text = f.read()
        else:
            return None
        doc = self.nlp(text)
        # Rough heuristic: spaCy ORG entities often capture company and technology names.
        skills = [ent.text for ent in doc.ents if ent.label_ == "ORG"]
        return {
            "full_text": text,  # Changed from raw_text to full_text
            "skills": skills,
            "file_path": path
        }
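A minimal way to exercise the parser on a single file, assuming a sample resume exists in the resumes/ directory (the filename here is hypothetical):

    from tools.resume_parser import ResumeParser

    parser = ResumeParser()
    # Hypothetical sample file; replace with a real resume in resumes/.
    result = parser.parse_resume("resumes/sample_resume.pdf")
    if result:
        print(result["file_path"])
        print("Extracted characters:", len(result["full_text"]))
        print("Candidate skill/org mentions:", result["skills"][:10])
    else:
        print("Unsupported file format")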
Step 5: Resume Finder Agent (agents/resume_finder.py)
Challenge: The initial version relied on an unsupported Ollama embedding method
Solution: Switched to SentenceTransformer embeddings and added authenticated GitHub API access
from crewai import Agent
from langchain_community.llms import Ollama
import requests
import time
import logging
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
from sentence_transformers import SentenceTransformer

# GitHub Personal Access Token for API authentication
GITHUB_TOKEN = "your_github_personal_access_token"
HEADERS = {"Authorization": f"token {GITHUB_TOKEN}"} if GITHUB_TOKEN else {}

class ResumeFinderAgent:
    def __init__(self, llm):
        self.llm = llm
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        self.agent = Agent(
            role="Resume Scout",
            goal="Find resumes from public profiles matching job requirements",
            backstory="You are an expert at searching the internet for technical talent and public resumes.",
            tools=[],
            llm=self.llm
        )

    def get_text_embedding(self, text):
        embedding = self.embedder.encode([text])
        return np.array(embedding).reshape(1, -1)

    def search_github(self, keywords, location="India", max_results=10, logger=None):
        found = []
        for kw in keywords:
            query = f"{kw} location:{location} type:user"
            url = f"https://api.github.com/search/users?q={query}"
            if logger:
                logger.debug(f"GitHub Search URL: {url}")
            try:
                res = requests.get(url, headers=HEADERS, timeout=5)
                if res.status_code == 200:
                    users = res.json().get('items', [])[:max_results]
                    if logger:
                        logger.info(f"Found {len(users)} users for keyword '{kw}'")
                    for u in users:
                        profile = {"username": u.get("login"), "url": u.get("html_url")}
                        found.append(profile)
                        if logger:
                            logger.debug(f"GitHub profile found: {profile}")
                else:
                    if logger:
                        logger.warning(f"GitHub API status {res.status_code} for keyword '{kw}'")
                time.sleep(1)
            except Exception as e:
                if logger:
                    logger.error(f"GitHub search error for '{kw}': {e}")
        if logger:
            logger.info(f"Total GitHub profiles found: {len(found)}")
        return found

    def analyze_github_profile_llm(self, profile, job_description, logger=None):
        username = profile['username']
        user_url = f"https://api.github.com/users/{username}"
        try:
            res = requests.get(user_url, headers=HEADERS, timeout=5)
            bio = ""
            repos_text = ""
            if res.status_code == 200:
                data = res.json()
                bio = data.get('bio', '') or ''
                repos_url = f"https://api.github.com/users/{username}/repos"
                repos_res = requests.get(repos_url, headers=HEADERS, timeout=5)
                if repos_res.status_code == 200:
                    repos = repos_res.json()
                    repos_text = " ".join([repo["name"] + " " + (repo.get("description") or "") for repo in repos])
            else:
                if logger:
                    logger.warning(f"Failed to fetch GitHub bio for user {username}, status {res.status_code}")
            candidate_text = (bio + " " + repos_text).strip()
            if not candidate_text:
                candidate_text = username
            job_emb = self.get_text_embedding(job_description)
            cand_emb = self.get_text_embedding(candidate_text)
            score = float(cosine_similarity(job_emb, cand_emb)[0][0])
            if logger:
                logger.info(f"GitHub '{username}' scored {score:.4f}")
                logger.debug(f"Candidate text (bio + repos excerpt): '{candidate_text[:200]}'")
            return {
                "username": username,
                "url": profile["url"],
                "score": score,
                "bio": bio,
                "repos_text": repos_text
            }
        except Exception as e:
            if logger:
                logger.error(f"Error analyzing GitHub profile {username} with LLM: {e}")
            return {"username": username, "url": profile["url"], "score": 0.0, "bio": "", "repos_text": ""}

    def find_resumes(self, job_desc, keywords, logger=None):
        profiles = self.search_github(keywords, logger=logger)
        return profiles
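To sanity-check the embedding-based scoring in isolation, the similarity math can be run without any GitHub calls. This standalone sketch mirrors get_text_embedding and the cosine_similarity usage above; the job and candidate texts are made up:

    from sentence_transformers import SentenceTransformer
    from sklearn.metrics.pairwise import cosine_similarity
    import numpy as np

    embedder = SentenceTransformer('all-MiniLM-L6-v2')

    job = "Senior Python Developer with TensorFlow and Docker experience"
    candidate = "Built ML pipelines in Python; deployed TensorFlow models with Docker"

    job_emb = np.array(embedder.encode([job])).reshape(1, -1)
    cand_emb = np.array(embedder.encode([candidate])).reshape(1, -1)

    # Cosine similarity falls in [-1, 1]; higher means a closer semantic match.
    score = float(cosine_similarity(job_emb, cand_emb)[0][0])
    print(f"Similarity score: {score:.4f}")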
AI Integration and Scoring
Step 6: Resume Analyzer Agent (agents/resume_analyzer.py)
Evolution: Started with simple scoring and evolved to combine LLM analysis with embedding-based similarity
Final Version with Comprehensive Logging:
from crewai import Agent
from langchain_community.llms import Ollama
import json
import re
import logging

class ResumeAnalyzerAgent:
    def __init__(self, llm):
        self.llm = llm
        self.agent = Agent(
            role="Resume Analyst",
            goal="Analyze resumes for job fit and score them",
            backstory="You are an HR professional with deep experience matching resumes to technical job descriptions.",
            tools=[],
            llm=self.llm
        )

    def analyze_resume(self, resume_data, job_description, logger=None):
        resume_text = resume_data.get('full_text') or resume_data.get('content') or ''
        candidate_name = resume_data.get('name', 'Unknown')
        if logger:
            logger.info(f"Starting analysis for resume '{candidate_name}'")
            logger.debug(f"Job Description excerpt: '{job_description[:200].strip()}'")
            logger.debug(f"Resume content excerpt: '{resume_text[:200].strip()}'")
        prompt = (
            f"Job Description:\n{job_description}\n\n"
            f"Resume Content:\n{resume_text}\n\n"
            "Please analyze the resume's fit for the job and provide a JSON response with: "
            "score (1-10), recommendation (yes/maybe/no), reasoning."
        )
        try:
            response = self.llm.invoke(prompt)
            if logger:
                logger.debug(f"LLM raw response:\n{response}")
            match = re.search(r"\{.*\}", response, re.DOTALL)
            if match:
                analysis = json.loads(match.group())
                if logger:
                    logger.info(f"LLM analysis parsed for '{candidate_name}': {analysis}")
                # Validate keys
                if "score" not in analysis:
                    analysis["score"] = 5
                    if logger:
                        logger.warning(f"No 'score' field in LLM response for '{candidate_name}', defaulting to 5")
                if "recommendation" not in analysis:
                    analysis["recommendation"] = "maybe"
                    if logger:
                        logger.warning(f"No 'recommendation' field in LLM response for '{candidate_name}', defaulting to 'maybe'")
                if "reasoning" not in analysis:
                    analysis["reasoning"] = "No reasoning provided."
                    if logger:
                        logger.warning(f"No 'reasoning' field in LLM response for '{candidate_name}'")
                return analysis
            else:
                if logger:
                    logger.error(f"Failed to find JSON in LLM response for '{candidate_name}'")
        except Exception as e:
            if logger:
                logger.error(f"Error invoking LLM or parsing response for '{candidate_name}': {e}")
        if logger:
            logger.info(f"Using fallback analysis for resume '{candidate_name}' with score=5, recommendation='maybe'")
        return {
            "score": 5,
            "recommendation": "maybe",
            "reasoning": "LLM analysis failed or response unparseable; using default."
        }

    def analyze_resumes(self, resumes, job_description, logger=None):
        results = []
        if logger:
            logger.info(f"Analyzing total {len(resumes)} resumes")
        for i, resume in enumerate(resumes, 1):
            if logger:
                logger.info(f"Analyzing resume {i}/{len(resumes)}")
            analysis = self.analyze_resume(resume, job_description, logger)
            results.append({"resume": resume, "analysis": analysis})
        if logger:
            logger.info("Sorting resumes by score in descending order")
        return sorted(results, key=lambda x: x["analysis"]["score"], reverse=True)
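A minimal sketch of calling the analyzer outside the full pipeline, assuming Ollama is running with the configured model; the resume dictionary below is hypothetical and stands in for the output of ResumeParser.parse_resume():

    from langchain_community.llms import Ollama
    from agents.resume_analyzer import ResumeAnalyzerAgent
    from config import settings

    llm = Ollama(model=settings.LLM_MODEL, base_url=settings.OLLAMA_BASE_URL)
    analyzer = ResumeAnalyzerAgent(llm)

    # Hypothetical parsed resume; normally this comes from ResumeParser.parse_resume().
    resume = {
        "name": "Sample Candidate",
        "full_text": "Python developer with 5 years of TensorFlow, Docker, and SQL experience.",
    }
    job_description = "Senior Python Developer - Machine Learning (TensorFlow, Docker, SQL)"

    ranked = analyzer.analyze_resumes([resume], job_description)
    print(ranked[0]["analysis"])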
Step 7: Main Orchestrator with Dynamic Threshold (main.py)
Key Innovation: Added dynamic threshold adjustment to prevent zero results
import os
import json
import argparse
import logging
from datetime import datetime
from hashlib import md5
from crewai import Agent
from langchain_community.llms import Ollama
from config import settings
from agents.resume_finder import ResumeFinderAgent
from agents.resume_analyzer import ResumeAnalyzerAgent
from tools.resume_parser import ResumeParser

class AIResumeAgent:
    def __init__(self):
        self.llm = Ollama(model=settings.LLM_MODEL, base_url=settings.OLLAMA_BASE_URL)
        self.resume_finder = ResumeFinderAgent(self.llm)
        self.resume_analyzer = ResumeAnalyzerAgent(self.llm)
        self.parser = ResumeParser()

    def setup_logger(self, job_description):
        desc_hash = md5(job_description.encode()).hexdigest()[:8]
        timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        log_dir = os.path.join(settings.BASE_DIR, "logs")
        os.makedirs(log_dir, exist_ok=True)
        log_filename = os.path.join(log_dir, f"search_{desc_hash}_{timestamp}.log")
        logger = logging.getLogger(log_filename)
        logger.setLevel(logging.DEBUG)
        if logger.hasHandlers():
            logger.handlers.clear()
        fh = logging.FileHandler(log_filename)
        formatter = logging.Formatter('%(asctime)s | %(levelname)s | %(message)s')
        fh.setFormatter(formatter)
        logger.addHandler(fh)
        return logger

    def filter_candidates_with_dynamic_threshold(
        self, candidates, initial_threshold=7.0, min_threshold=4.0, step=0.5, logger=None
    ):
        threshold = initial_threshold
        qualified = []
        while threshold >= min_threshold:
            qualified = [c for c in candidates if c["analysis"]["score"] >= threshold]
            if logger:
                logger.info(f"Filtering candidates with threshold {threshold}: found {len(qualified)}")
            if qualified:
                break
            threshold -= step
        if not qualified and logger:
            logger.warning(f"No candidates matched above minimum threshold {min_threshold}")
        return qualified

    def run_recruitment_process(self, job_description: str, search_keywords: list, top_n: int = 5):
        logger = self.setup_logger(job_description)
        logger.info("Starting AI Resume Screening...")
        results = {
            "job_description": job_description,
            "search_keywords": search_keywords,
            "timestamp": datetime.now().isoformat(),
            "found_profiles": [],
            "parsed_resumes": [],
            "analyzed_resumes": [],
            "top_candidates": []
        }

        # Step 1: Parse local resumes
        resume_files = []
        for ext in settings.SUPPORTED_FORMATS:
            resume_files.extend([f for f in os.listdir(settings.RESUMES_DIR) if f.lower().endswith(ext)])
        logger.info(f"Local resume files found: {len(resume_files)}")
        for rf in resume_files:
            path = os.path.join(settings.RESUMES_DIR, rf)
            logger.info(f"Parsing resume: {path}")
            try:
                parsed = self.parser.parse_resume(path)
                if parsed:
                    results["parsed_resumes"].append(parsed)
                    logger.info(f"Resume parsed successfully: {rf}")
                else:
                    logger.warning(f"Failed to parse resume: {rf}")
            except Exception as e:
                logger.error(f"Exception parsing resume {rf}: {e}")

        # Step 2: Analyze parsed resumes with dynamic threshold
        qualified_candidates = []
        if results["parsed_resumes"]:
            logger.info(f"Analyzing {len(results['parsed_resumes'])} parsed resumes")
            analyzed = self.resume_analyzer.analyze_resumes(results["parsed_resumes"], job_description, logger=logger)
            results["analyzed_resumes"] = analyzed
            qualified_candidates = self.filter_candidates_with_dynamic_threshold(analyzed, logger=logger)
            results["top_candidates"] = qualified_candidates[:top_n]
            logger.info(f"Qualified {len(qualified_candidates)} resumes after dynamic thresholding, top {top_n} selected")
        else:
            logger.info("No parsed resumes found")

        # Step 3: Fallback to GitHub profiles if no qualified resumes
        if not qualified_candidates:
            logger.info("No qualified resumes found; searching GitHub profiles")
            found_profiles = self.resume_finder.find_resumes(job_description, search_keywords, logger=logger)
            scored_profiles = [
                self.resume_finder.analyze_github_profile_llm(p, job_description, logger=logger)
                for p in found_profiles
            ]
            scored_profiles = sorted(scored_profiles, key=lambda x: x["score"], reverse=True)
            results["top_candidates"] = [{"github_profile": p} for p in scored_profiles[:top_n]]
            results["found_profiles"] = found_profiles
            logger.info(f"GitHub fallback: selected top {len(results['top_candidates'])} profiles")
        else:
            results["found_profiles"] = []

        # Step 4: Save results
        os.makedirs(settings.DATA_DIR, exist_ok=True)
        output_file = os.path.join(settings.DATA_DIR, "results.json")
        with open(output_file, "w") as f:
            json.dump(results, f, indent=2)
        logger.info(f"Process complete. Results saved at {output_file}")
        return results

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--request-file",
        type=str,
        help="Path to JSON file containing job_description, keywords, and optional top_n"
    )
    args = parser.parse_args()

    job_desc = """
Senior Python Developer - Machine Learning
Skills: Python, TensorFlow, PyTorch, Docker, SQL, Cloud Platforms
"""
    keywords = ["python", "machine-learning", "tensorflow", "docker", "sql"]
    top_n = 5

    if args.request_file and os.path.exists(args.request_file):
        with open(args.request_file, "r") as f:
            req = json.load(f)
        job_desc = req.get('job_description', job_desc)
        keywords = req.get('keywords', keywords)
        top_n = req.get('top_n', top_n)

    agent = AIResumeAgent()
    agent.run_recruitment_process(job_desc, keywords, top_n)

if __name__ == "__main__":
    main()
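The --request-file flag expects a JSON file whose fields match what main() reads (job_description, keywords, and optional top_n). A small illustrative helper that writes such a file; the values are examples, not part of the project:

    import json

    # Illustrative request matching the fields main() reads from the request file.
    request = {
        "job_description": "Senior Python Developer - Machine Learning. "
                           "Skills: Python, TensorFlow, PyTorch, Docker, SQL, Cloud Platforms",
        "keywords": ["python", "machine-learning", "tensorflow", "docker", "sql"],
        "top_n": 5,
    }

    with open("job_request.json", "w") as f:
        json.dump(request, f, indent=2)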
Issue Resolution and Improvements
Step 8: Ollama Setup
Prerequisite: Install and run Ollama locally
1. Download Ollama from https://ollama.ai/
2. Install and start the service
3. Pull required model: ollama pull llama2
4. Ensure the service is running on http://localhost:11434 (a quick connectivity check is sketched below)
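A quick way to confirm the Ollama server is reachable at the configured URL before running the agent. This sketch assumes Ollama's standard /api/tags endpoint, which lists locally pulled models:

    import requests
    from config import settings

    # List locally available models; a 200 response means the server is up.
    resp = requests.get(f"{settings.OLLAMA_BASE_URL}/api/tags", timeout=5)
    resp.raise_for_status()
    models = [m["name"] for m in resp.json().get("models", [])]
    print("Ollama is running. Models:", models)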
Step 9: GitHub API Token Setup
Problem: GitHub API rate limiting (403 errors)
Solution: Create a Personal Access Token
1. Go to https://github.com/settings/tokens
2. Generate a new token with repo permissions
3. Replace "your_github_personal_access_token" in resume_finder.py (or load the token from an environment variable, as sketched below)
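Hard-coding the token works for local experiments, but reading it from an environment variable keeps it out of source control. A minimal sketch, assuming a variable named GITHUB_TOKEN, that could replace the constant at the top of resume_finder.py:

    import os

    # Read the token from the environment; fall back to unauthenticated requests.
    GITHUB_TOKEN = os.environ.get("GITHUB_TOKEN", "")
    HEADERS = {"Authorization": f"token {GITHUB_TOKEN}"} if GITHUB_TOKEN else {}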
Step 10: Dependency Issues Resolution
Problems Encountered:
1. Missing scikit-learn: pip install scikit-learn
2. Missing sentence-transformers: pip install sentence-transformers
3. Ollama embedding method not available: Switched to SentenceTransformer
4. Resume parsing returning empty content: Fixed key naming from raw_text to full_text
Step 11: Logging and Debugging Improvements
Added comprehensive logging for:
Every API call and response
Score calculations with detailed explanations
Threshold adjustments and candidate filtering decisions
LLM prompt and response analysis
Error handling and fallback mechanisms
Final System Architecture
Component Integration:
1. main.py: Orchestrates the entire workflow
2. resume_finder.py: Searches GitHub, scores profiles with embeddings
3. resume_analyzer.py: Analyzes local resumes with LLM scoring
4. resume_parser.py: Extracts text from various resume formats
5. config.py: Centralized configuration management
AI Technologies Used:
CrewAI: Agent framework for role-based task execution
Ollama LLM: Natural language processing and analysis
SentenceTransformer: Semantic embeddings for similarity scoring
spaCy: Natural language processing for resume parsing
Cosine Similarity: Mathematical scoring for semantic matching
Why This Qualifies as an AI Agent:
1. Autonomous Decision Making: Dynamically adjusts thresholds, chooses fallback
strategies
2. Semantic Understanding: Uses embeddings to understand meaning beyond
keywords
3. Multi-step Reasoning: Orchestrates complex workflows with decision trees
4. Learning Capabilities: Can be extended with feedback loops
5. Explainability: Provides detailed reasoning for all decisions
Running the System
Step 12: Basic Execution
# Activate environment
ai_agent_env\Scripts\Activate.ps1
# Run the main agent
python main.py
# Or with custom parameters
python main.py --request-file job_request.json
Step 13: Web Application (Optional)
# If using FastAPI web interface
uvicorn web_app:app --reload
# If using Streamlit
streamlit run web_app.py
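web_app.py is not shown in this guide. A minimal FastAPI sketch, assuming it simply wraps AIResumeAgent.run_recruitment_process (the endpoint name and request model are illustrative):

    # web_app.py (illustrative sketch)
    from typing import List
    from fastapi import FastAPI
    from pydantic import BaseModel
    from main import AIResumeAgent

    app = FastAPI(title="AI Resume Agent")
    agent = AIResumeAgent()

    class ScreeningRequest(BaseModel):
        job_description: str
        keywords: List[str]
        top_n: int = 5

    @app.post("/screen")
    def screen(req: ScreeningRequest):
        # Runs the full pipeline and returns the ranked results dictionary.
        return agent.run_recruitment_process(req.job_description, req.keywords, req.top_n)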
Step 14: System Workflow
1. Input: Job description and search keywords
2. Local Processing: Parse uploaded resumes, analyze with LLM
3. Scoring: Generate semantic similarity scores
4. Filtering: Apply dynamic threshold adjustment
5. Fallback: Search GitHub profiles if no local matches
6. Output: Ranked list of top candidates with detailed analysis
7. Logging: Complete audit trail in timestamped log files
Key Features Achieved
✅ Multi-format resume parsing (PDF, DOCX, TXT)
✅ LLM-powered resume analysis with natural language reasoning
✅ GitHub profile search and scoring
✅ Semantic similarity matching using embeddings
✅ Dynamic threshold adjustment to prevent zero results
✅ Comprehensive logging and audit trails
✅ Configurable top-N candidate selection
✅ Fallback mechanisms for robust operation
✅ Authentication for external APIs
✅ Modular, extensible architecture
Potential Enhancements
Integration with LinkedIn API
Support for more resume formats
Advanced skill extraction and matching
Web-based user interface improvements
Batch processing capabilities
Integration with ATS systems
Performance optimization for large datasets
This AI Resume Agent successfully demonstrates advanced AI capabilities in natural
language processing, semantic understanding, and autonomous decision-making for
practical HR automation tasks.