
AI Resume Agent Development Guide

Project Overview

This document outlines the complete step-by-step process to build an AI Resume Agent
that can screen, analyze, and rank candidate resumes and GitHub profiles based on job
descriptions using semantic similarity scoring.

Table of Contents

1. Initial Setup and Architecture

2. Core Components Development

3. AI Integration and Scoring

4. Issue Resolution and Improvements

5. Final System Architecture

6. Running the System

Initial Setup and Architecture

Step 1: Project Structure

Create the following directory structure:

AIResumeAgent/
├── main.py
├── config.py
├── web_app.py
├── agents/
│   ├── resume_finder.py
│   └── resume_analyzer.py
├── tools/
│   └── resume_parser.py
├── resumes/
├── data/
└── logs/

Step 2: Environment Setup

1. Create a virtual environment:

python -m venv ai_agent_env

2. Activate the virtual environment:

# Windows
ai_agent_env\Scripts\Activate.ps1
# Linux/Mac
source ai_agent_env/bin/activate

3. Install required dependencies:

pip install crewai
pip install langchain-community
pip install scikit-learn
pip install sentence-transformers
pip install pypdf
pip install python-docx
pip install spacy
pip install requests
pip install fastapi
pip install uvicorn
python -m spacy download en_core_web_sm
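
For reproducible installs, the same packages can be listed in a requirements.txt (a convenience sketch; no version pins are given here, add them as needed for your environment):

# requirements.txt (illustrative; pin versions as needed)
crewai
langchain-community
scikit-learn
sentence-transformers
pypdf
python-docx
spacy
requests
fastapi
uvicorn

Then install everything with pip install -r requirements.txt, followed by python -m spacy download en_core_web_sm.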

Core Components Development

Step 3: Configuration Setup (config.py)

Create configuration settings for the AI agent:

import os

class Settings:
    BASE_DIR = os.path.dirname(os.path.abspath(__file__))
    RESUMES_DIR = os.path.join(BASE_DIR, "resumes")
    DATA_DIR = os.path.join(BASE_DIR, "data")
    LOGS_DIR = os.path.join(BASE_DIR, "logs")

    LLM_MODEL = "llama2"  # or your preferred Ollama model
    OLLAMA_BASE_URL = "http://localhost:11434"

    SUPPORTED_FORMATS = [".pdf", ".docx", ".txt"]

settings = Settings()

Step 4: Resume Parser Development (tools/resume_parser.py)

Create a parser to extract text from various resume formats:


Initial Version Issues:

- The parser returned a raw_text key instead of the expected full_text key
- Error handling for PDF extraction was missing

Final Working Version:

import os
import spacy
from pypdf import PdfReader
from docx import Document

class ResumeParser:
    def __init__(self):
        self.nlp = spacy.load("en_core_web_sm")

    def extract_text_pdf(self, path):
        with open(path, "rb") as f:
            reader = PdfReader(f)
            text = ""
            for page in reader.pages:
                text += page.extract_text() or ""
        return text

    def extract_text_docx(self, path):
        doc = Document(path)
        return "\n".join([p.text for p in doc.paragraphs])

    def parse_resume(self, path):
        ext = os.path.splitext(path)[1].lower()
        if ext == ".pdf":
            text = self.extract_text_pdf(path)
        elif ext == ".docx":
            text = self.extract_text_docx(path)
        elif ext == ".txt":
            with open(path, "r", encoding="utf-8") as f:
                text = f.read()
        else:
            return None

        doc = self.nlp(text)
        skills = [ent.text for ent in doc.ents if ent.label_ == "ORG"]

        return {
            "full_text": text,  # changed from raw_text to full_text
            "skills": skills,
            "file_path": path
        }
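
A quick way to sanity-check the parser from the project root (a minimal sketch; resumes/sample.pdf is a placeholder filename):

from tools.resume_parser import ResumeParser

parser = ResumeParser()
parsed = parser.parse_resume("resumes/sample.pdf")  # placeholder path
if parsed:
    print(parsed["full_text"][:200])  # first 200 characters of extracted text
    print(parsed["skills"])           # entities spaCy tagged as ORG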

Step 5: Resume Finder Agent (agents/resume_finder.py)


Challenge: The initial version relied on an unsupported Ollama embedding method.
Solution: Integrated SentenceTransformer for embeddings and authenticated GitHub API access.

from crewai import Agent
from langchain_community.llms import Ollama
import requests
import time
import logging
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
from sentence_transformers import SentenceTransformer

# GitHub Personal Access Token for API authentication
GITHUB_TOKEN = "your_github_personal_access_token"
HEADERS = {"Authorization": f"token {GITHUB_TOKEN}"} if GITHUB_TOKEN else {}

class ResumeFinderAgent:
    def __init__(self, llm):
        self.llm = llm
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        self.agent = Agent(
            role="Resume Scout",
            goal="Find resumes from public profiles matching job requirements",
            backstory="You are an expert at searching the internet for technical talent and public resumes.",
            tools=[],
            llm=self.llm
        )

    def get_text_embedding(self, text):
        embedding = self.embedder.encode([text])
        return np.array(embedding).reshape(1, -1)

    def search_github(self, keywords, location="India", max_results=10, logger=None):
        found = []
        for kw in keywords:
            query = f"{kw} location:{location} type:user"
            url = f"https://api.github.com/search/users?q={query}"
            if logger:
                logger.debug(f"GitHub Search URL: {url}")
            try:
                res = requests.get(url, headers=HEADERS, timeout=5)
                if res.status_code == 200:
                    users = res.json().get('items', [])[:max_results]
                    if logger:
                        logger.info(f"Found {len(users)} users for keyword '{kw}'")
                    for u in users:
                        profile = {"username": u.get("login"), "url": u.get("html_url")}
                        found.append(profile)
                        if logger:
                            logger.debug(f"GitHub profile found: {profile}")
                else:
                    if logger:
                        logger.warning(f"GitHub API status {res.status_code} for keyword '{kw}'")
                time.sleep(1)
            except Exception as e:
                if logger:
                    logger.error(f"GitHub search error for '{kw}': {e}")
        if logger:
            logger.info(f"Total GitHub profiles found: {len(found)}")
        return found

    def analyze_github_profile_llm(self, profile, job_description, logger=None):
        username = profile['username']
        user_url = f"https://api.github.com/users/{username}"
        try:
            res = requests.get(user_url, headers=HEADERS, timeout=5)
            bio = ""
            repos_text = ""
            if res.status_code == 200:
                data = res.json()
                bio = data.get('bio', '') or ''
                repos_url = f"https://api.github.com/users/{username}/repos"
                repos_res = requests.get(repos_url, headers=HEADERS, timeout=5)
                if repos_res.status_code == 200:
                    repos = repos_res.json()
                    repos_text = " ".join([repo["name"] + " " + (repo.get("description") or "") for repo in repos])
            else:
                if logger:
                    logger.warning(f"Failed to fetch GitHub bio for user {username}, status {res.status_code}")

            candidate_text = (bio + " " + repos_text).strip()
            if not candidate_text:
                candidate_text = username

            job_emb = self.get_text_embedding(job_description)
            cand_emb = self.get_text_embedding(candidate_text)
            score = float(cosine_similarity(job_emb, cand_emb)[0][0])

            if logger:
                logger.info(f"GitHub '{username}' scored {score:.4f}")
                logger.debug(f"Candidate text (bio + repos excerpt): '{candidate_text[:200]}'")
            return {
                "username": username,
                "url": profile["url"],
                "score": score,
                "bio": bio,
                "repos_text": repos_text
            }
        except Exception as e:
            if logger:
                logger.error(f"Error analyzing GitHub profile {username} with LLM: {e}")
            return {"username": username, "url": profile["url"], "score": 0.0, "bio": "", "repos_text": ""}

    def find_resumes(self, job_desc, keywords, logger=None):
        profiles = self.search_github(keywords, logger=logger)
        return profiles

AI Integration and Scoring

Step 6: Resume Analyzer Agent (agents/resume_analyzer.py)

Evolution: The analyzer started with simple scoring and evolved to combine LLM analysis with embedding similarity.

Final Version with Comprehensive Logging:

from crewai import Agent
from langchain_community.llms import Ollama
import json
import re
import logging

class ResumeAnalyzerAgent:
    def __init__(self, llm):
        self.llm = llm
        self.agent = Agent(
            role="Resume Analyst",
            goal="Analyze resumes for job fit and score them",
            backstory="You are an HR professional with deep experience matching resumes to technical job descriptions.",
            tools=[],
            llm=self.llm
        )

    def analyze_resume(self, resume_data, job_description, logger=None):
        resume_text = resume_data.get('full_text') or resume_data.get('content') or ''
        candidate_name = resume_data.get('name', 'Unknown')

        if logger:
            logger.info(f"Starting analysis for resume '{candidate_name}'")
            logger.debug(f"Job Description excerpt: '{job_description[:200].strip()}'")
            logger.debug(f"Resume content excerpt: '{resume_text[:200].strip()}'")

        prompt = (
            f"Job Description:\n{job_description}\n\n"
            f"Resume Content:\n{resume_text}\n\n"
            "Please analyze the resume's fit for the job and provide a JSON response with: "
            "score (1-10), recommendation (yes/maybe/no), reasoning."
        )

        try:
            response = self.llm.invoke(prompt)
            if logger:
                logger.debug(f"LLM raw response:\n{response}")
            match = re.search(r"\{.*\}", response, re.DOTALL)
            if match:
                analysis = json.loads(match.group())
                if logger:
                    logger.info(f"LLM analysis parsed for '{candidate_name}': {analysis}")
                # Validate keys
                if "score" not in analysis:
                    analysis["score"] = 5
                    if logger:
                        logger.warning(f"No 'score' field in LLM response for '{candidate_name}', defaulting to 5")
                if "recommendation" not in analysis:
                    analysis["recommendation"] = "maybe"
                    if logger:
                        logger.warning(f"No 'recommendation' field in LLM response for '{candidate_name}', defaulting to 'maybe'")
                if "reasoning" not in analysis:
                    analysis["reasoning"] = "No reasoning provided."
                    if logger:
                        logger.warning(f"No 'reasoning' field in LLM response for '{candidate_name}'")
                return analysis
            else:
                if logger:
                    logger.error(f"Failed to find JSON in LLM response for '{candidate_name}'")
        except Exception as e:
            if logger:
                logger.error(f"Error invoking LLM or parsing response for '{candidate_name}': {e}")

        if logger:
            logger.info(f"Using fallback analysis for resume '{candidate_name}' with score=5, recommendation='maybe'")
        return {
            "score": 5,
            "recommendation": "maybe",
            "reasoning": "LLM analysis failed or response unparseable; using default."
        }

    def analyze_resumes(self, resumes, job_description, logger=None):
        results = []
        if logger:
            logger.info(f"Analyzing total {len(resumes)} resumes")
        for i, resume in enumerate(resumes, 1):
            if logger:
                logger.info(f"Analyzing resume {i}/{len(resumes)}")
            analysis = self.analyze_resume(resume, job_description, logger)
            results.append({"resume": resume, "analysis": analysis})
        if logger:
            logger.info("Sorting resumes by score in descending order")
        return sorted(results, key=lambda x: x["analysis"]["score"], reverse=True)

Step 7: Main Orchestrator with Dynamic Threshold (main.py)

Key Innovation: A dynamic threshold adjustment progressively lowers the score cutoff (from 7.0 down to 4.0 in steps of 0.5) so the screening does not return zero results when candidates were analyzed.


import os
import json
import argparse
import logging
from datetime import datetime
from hashlib import md5

from crewai import Agent
from langchain_community.llms import Ollama
from config import settings
from agents.resume_finder import ResumeFinderAgent
from agents.resume_analyzer import ResumeAnalyzerAgent
from tools.resume_parser import ResumeParser

class AIResumeAgent:
    def __init__(self):
        self.llm = Ollama(model=settings.LLM_MODEL, base_url=settings.OLLAMA_BASE_URL)
        self.resume_finder = ResumeFinderAgent(self.llm)
        self.resume_analyzer = ResumeAnalyzerAgent(self.llm)
        self.parser = ResumeParser()

    def setup_logger(self, job_description):
        desc_hash = md5(job_description.encode()).hexdigest()[:8]
        timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        log_dir = os.path.join(settings.BASE_DIR, "logs")
        os.makedirs(log_dir, exist_ok=True)
        log_filename = os.path.join(log_dir, f"search_{desc_hash}_{timestamp}.log")
        logger = logging.getLogger(log_filename)
        logger.setLevel(logging.DEBUG)
        if logger.hasHandlers():
            logger.handlers.clear()
        fh = logging.FileHandler(log_filename)
        formatter = logging.Formatter('%(asctime)s | %(levelname)s | %(message)s')
        fh.setFormatter(formatter)
        logger.addHandler(fh)
        return logger

    def filter_candidates_with_dynamic_threshold(
        self, candidates, initial_threshold=7.0, min_threshold=4.0, step=0.5, logger=None
    ):
        threshold = initial_threshold
        qualified = []

        while threshold >= min_threshold:
            qualified = [c for c in candidates if c["analysis"]["score"] >= threshold]
            if logger:
                logger.info(f"Filtering candidates with threshold {threshold}: found {len(qualified)}")
            if qualified:
                break
            threshold -= step

        if not qualified and logger:
            logger.warning(f"No candidates matched above minimum threshold {min_threshold}")
        return qualified

    def run_recruitment_process(self, job_description: str, search_keywords: list, top_n: int = 5):
        logger = self.setup_logger(job_description)
        logger.info("Starting AI Resume Screening...")

        results = {
            "job_description": job_description,
            "search_keywords": search_keywords,
            "timestamp": datetime.now().isoformat(),
            "found_profiles": [],
            "parsed_resumes": [],
            "analyzed_resumes": [],
            "top_candidates": []
        }

        # Step 1: Parse local resumes
        resume_files = []
        for ext in settings.SUPPORTED_FORMATS:
            resume_files.extend([f for f in os.listdir(settings.RESUMES_DIR) if f.lower().endswith(ext)])
        logger.info(f"Local resume files found: {len(resume_files)}")

        for rf in resume_files:
            path = os.path.join(settings.RESUMES_DIR, rf)
            logger.info(f"Parsing resume: {path}")
            try:
                parsed = self.parser.parse_resume(path)
                if parsed:
                    results["parsed_resumes"].append(parsed)
                    logger.info(f"Resume parsed successfully: {rf}")
                else:
                    logger.warning(f"Failed to parse resume: {rf}")
            except Exception as e:
                logger.error(f"Exception parsing resume {rf}: {e}")

        # Step 2: Analyze parsed resumes with dynamic threshold
        qualified_candidates = []
        if results["parsed_resumes"]:
            logger.info(f"Analyzing {len(results['parsed_resumes'])} parsed resumes")
            analyzed = self.resume_analyzer.analyze_resumes(results["parsed_resumes"], job_description, logger=logger)
            results["analyzed_resumes"] = analyzed

            qualified_candidates = self.filter_candidates_with_dynamic_threshold(analyzed, logger=logger)
            results["top_candidates"] = qualified_candidates[:top_n]
            logger.info(f"Qualified {len(qualified_candidates)} resumes after dynamic thresholding, top {top_n} selected")
        else:
            logger.info("No parsed resumes found")

        # Step 3: Fallback to GitHub profiles if no qualified resumes
        if not qualified_candidates:
            logger.info("No qualified resumes found; searching GitHub profiles")
            found_profiles = self.resume_finder.find_resumes(job_description, search_keywords, logger=logger)
            scored_profiles = [
                self.resume_finder.analyze_github_profile_llm(p, job_description, logger=logger)
                for p in found_profiles
            ]
            scored_profiles = sorted(scored_profiles, key=lambda x: x["score"], reverse=True)
            results["top_candidates"] = [{"github_profile": p} for p in scored_profiles[:top_n]]
            results["found_profiles"] = found_profiles
            logger.info(f"GitHub fallback: selected top {len(results['top_candidates'])} profiles")
        else:
            results["found_profiles"] = []

        # Step 4: Save results
        os.makedirs(settings.DATA_DIR, exist_ok=True)
        output_file = os.path.join(settings.DATA_DIR, "results.json")
        with open(output_file, "w") as f:
            json.dump(results, f, indent=2)
        logger.info(f"Process complete. Results saved at {output_file}")

        return results

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--request-file",
        type=str,
        help="Path to JSON file containing job_description, keywords, and optional top_n"
    )
    args = parser.parse_args()

    job_desc = """
    Senior Python Developer - Machine Learning
    Skills: Python, TensorFlow, PyTorch, Docker, SQL, Cloud Platforms
    """
    keywords = ["python", "machine-learning", "tensorflow", "docker", "sql"]
    top_n = 5

    if args.request_file and os.path.exists(args.request_file):
        with open(args.request_file, "r") as f:
            req = json.load(f)
        job_desc = req.get('job_description', job_desc)
        keywords = req.get('keywords', keywords)
        top_n = req.get('top_n', top_n)

    agent = AIResumeAgent()
    agent.run_recruitment_process(job_desc, keywords, top_n)

if __name__ == "__main__":
    main()

Issue Resolution and Improvements

Step 8: Ollama Setup


Prerequisite: Install and run Ollama locally

1. Download Ollama from https://ollama.ai/

2. Install and start the service

3. Pull required model: ollama pull llama2

4. Ensure service runs on http://localhost:11434
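
To confirm the model is pulled and the service is reachable, the following quick checks can be run (output will vary by installation):

# List models available locally
ollama list

# Verify the HTTP endpoint responds
curl http://localhost:11434/api/tags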

Step 9: GitHub API Token Setup

Problem: GitHub API rate limiting (403 errors)


Solution: Create Personal Access Token

1. Go to https://github.com/settings/tokens

2. Generate new token with repo permissions

3. Replace "your_github_personal_access_token" in resume_finder.py
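
Rather than hardcoding the token in resume_finder.py, one option is to read it from an environment variable (a small variation on the snippet above; the GITHUB_TOKEN variable name is an assumption):

import os

# Falls back to unauthenticated requests if the variable is not set
GITHUB_TOKEN = os.environ.get("GITHUB_TOKEN", "")
HEADERS = {"Authorization": f"token {GITHUB_TOKEN}"} if GITHUB_TOKEN else {}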

Step 10: Dependency Issues Resolution

Problems Encountered:

1. Missing scikit-learn: pip install scikit-learn

2. Missing sentence-transformers: pip install sentence-transformers

3. Ollama embedding method not available: Switched to SentenceTransformer

4. Resume parsing returning empty content: Fixed key naming from raw_text to full_text

Step 11: Logging and Debugging Improvements

Added comprehensive logging for:

- Every API call and response

- Score calculations with detailed explanations

- Threshold adjustments and candidate filtering decisions

- LLM prompt and response analysis

- Error handling and fallback mechanisms

Final System Architecture


Component Integration:

1. main.py: Orchestrates the entire workflow

2. resume_finder.py: Searches GitHub, scores profiles with embeddings

3. resume_analyzer.py: Analyzes local resumes with LLM scoring

4. resume_parser.py: Extracts text from various resume formats

5. config.py: Centralized configuration management

AI Technologies Used:

- CrewAI: Agent framework for role-based task execution

- Ollama LLM: Natural language processing and analysis

- SentenceTransformer: Semantic embeddings for similarity scoring

- spaCy: Natural language processing for resume parsing

- Cosine Similarity: Mathematical scoring for semantic matching (see the short sketch below)
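
The embedding-plus-cosine-similarity scoring can be tried in isolation with the same libraries used in resume_finder.py (a standalone sketch; the example strings are made up):

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

embedder = SentenceTransformer("all-MiniLM-L6-v2")

job = "Senior Python Developer with TensorFlow and Docker experience"
candidate = "Built ML pipelines in Python, deployed models with Docker"

# encode() returns one vector per input string
job_emb = embedder.encode([job])
cand_emb = embedder.encode([candidate])

# Cosine similarity is in [-1, 1]; closer to 1 means a closer semantic match
score = float(cosine_similarity(job_emb, cand_emb)[0][0])
print(f"Similarity score: {score:.4f}")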

Why This Qualifies as an AI Agent:

1. Autonomous Decision Making: Dynamically adjusts thresholds and chooses fallback strategies

2. Semantic Understanding: Uses embeddings to understand meaning beyond keywords

3. Multi-step Reasoning: Orchestrates complex workflows with decision trees

4. Learning Capabilities: Can be extended with feedback loops

5. Explainability: Provides detailed reasoning for all decisions

Running the System

Step 12: Basic Execution

# Activate environment
ai_agent_env\Scripts\Activate.ps1

# Run the main agent
python main.py

# Or with custom parameters
python main.py --request-file job_request.json
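
A sample job_request.json matching the fields main.py reads (the values here are illustrative):

{
  "job_description": "Senior Python Developer - Machine Learning. Skills: Python, TensorFlow, PyTorch, Docker, SQL, Cloud Platforms",
  "keywords": ["python", "machine-learning", "tensorflow", "docker", "sql"],
  "top_n": 5
}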

Step 13: Web Application (Optional)

# If using FastAPI web interface
uvicorn web_app:app --reload

# If using Streamlit
streamlit run web_app.py
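
The contents of web_app.py are not shown in this guide; a minimal FastAPI wrapper around the agent might look like the following (a sketch only; the /screen endpoint path and JobRequest model are assumptions, not part of the original project):

from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

from main import AIResumeAgent

app = FastAPI()
agent = AIResumeAgent()

class JobRequest(BaseModel):
    job_description: str
    keywords: List[str]
    top_n: int = 5

@app.post("/screen")  # endpoint path is an assumption
def screen(req: JobRequest):
    # Runs the same pipeline as the CLI and returns the results dictionary
    return agent.run_recruitment_process(req.job_description, req.keywords, req.top_n)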

Step 14: System Workflow

1. Input: Job description and search keywords

2. Local Processing: Parse uploaded resumes, analyze with LLM

3. Scoring: Generate semantic similarity scores

4. Filtering: Apply dynamic threshold adjustment

5. Fallback: Search GitHub profiles if no local matches

6. Output: Ranked list of top candidates with detailed analysis

7. Logging: Complete audit trail in timestamped log files
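
The saved data/results.json mirrors the results dictionary built in main.py; its top-level structure looks like this (values elided):

{
  "job_description": "...",
  "search_keywords": ["..."],
  "timestamp": "...",
  "found_profiles": [],
  "parsed_resumes": [],
  "analyzed_resumes": [],
  "top_candidates": []
}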

Key Features Achieved

✅ Multi-format resume parsing (PDF, DOCX, TXT)

✅ LLM-powered resume analysis with natural language reasoning

✅ GitHub profile search and scoring

✅ Semantic similarity matching using embeddings

✅ Dynamic threshold adjustment to prevent zero results

✅ Comprehensive logging and audit trails

✅ Configurable top-N candidate selection

✅ Fallback mechanisms for robust operation

✅ Authentication for external APIs

✅ Modular, extensible architecture

Potential Enhancements

- Integration with LinkedIn API

- Support for more resume formats

- Advanced skill extraction and matching

- Web-based user interface improvements

- Batch processing capabilities

- Integration with ATS systems

- Performance optimization for large datasets

This AI Resume Agent successfully demonstrates advanced AI capabilities in natural language processing, semantic understanding, and autonomous decision-making for practical HR automation tasks.
