SECURITY.md

⚠️ CRITICAL SECURITY NOTICE

🔴 PUBLIC REPOSITORY WARNING

PAI is a PUBLIC version of the personal PAI_DIRECTORY infrastructure

NEVER COPY BLINDLY FROM PAI_DIRECTORY TO PUBLIC PAI

This repository is PUBLIC and visible to everyone on the internet. It's a sanitized, public instance of the personal PAI_DIRECTORY infrastructure. When moving functionality from PAI_DIRECTORY to PAI:

❌ NEVER INCLUDE:

  • Personal API keys or tokens
  • Private email addresses or phone numbers
  • Financial account information
  • Health or medical data
  • Personal context files
  • Business-specific information
  • Client or customer data
  • Internal URLs or endpoints
  • Security credentials
  • Personal file paths beyond ${PAI_DIR}

✅ SAFE TO INCLUDE:

  • Generic command structures
  • Public documentation
  • Example configurations (with placeholder values)
  • Open-source integrations
  • General-purpose tools
  • Public API documentation

🔍 BEFORE EVERY COMMIT:

  1. Audit all changes - Review every file being committed
  2. Search for sensitive data - grep for emails, keys, tokens (a scan sketch follows this list)
  3. Check context files - Ensure no personal context is included
  4. Verify paths - All paths should use ${PAI_DIR}, not personal directories
  5. Test with fresh install - Ensure it works without your personal setup
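
As a concrete starting point for step 2, here is a minimal scan sketch in TypeScript (Bun). The patterns and usage are illustrative, not exhaustive - extend them to cover whatever secrets your setup actually holds:

// scan.ts - flag staged files that look like they contain sensitive data
const LEAK_PATTERNS: [string, RegExp][] = [
  ['email address', /[\w.+-]+@[\w-]+\.[\w.]+/],
  ['AWS access key', /AKIA[0-9A-Z]{16}/],
  ['private key header', /-----BEGIN [A-Z ]*PRIVATE KEY-----/],
  ['bearer token', /Bearer\s+[A-Za-z0-9\-._~+/]{20,}/],
];

async function scanFile(path: string): Promise<string[]> {
  const text = await Bun.file(path).text();
  return LEAK_PATTERNS
    .filter(([, re]) => re.test(text))
    .map(([label]) => `${path}: possible ${label}`);
}

// Usage: bun scan.ts $(git diff --cached --name-only)
const findings = (await Promise.all(Bun.argv.slice(2).map(scanFile))).flat();
if (findings.length > 0) {
  console.error(findings.join('\n'));
  process.exit(1); // fail the commit until the findings are resolved
}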

📋 TRANSFER CHECKLIST:

When copying from PAI_DIRECTORY to PAI:

  • Remove all API keys (replace with placeholders)
  • Remove personal information
  • Replace specific paths with ${PAI_DIR}
  • Remove business-specific context
  • Sanitize example data
  • Update documentation to be generic
  • Test in clean environment

🚨 IF YOU ACCIDENTALLY COMMIT SENSITIVE DATA:

  1. Immediately remove from GitHub
  2. Revoke any exposed API keys
  3. Change any exposed passwords
  4. Use git filter-repo or BFG Repo-Cleaner to remove it from history (git filter-branch also works but is deprecated)
  5. Force push cleaned history
  6. Audit for any data that may have been scraped

💡 BEST PRACTICES:

  • Keep PAI_DIRECTORY private and local
  • PAI should be the generic, public template
  • Use environment variables for all sensitive config (see the sketch after this list)
  • Document what needs to be configured by users
  • Provide a .env.example file with placeholder values, never a real .env
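
A minimal sketch of the environment-variable practice - the variable names here are hypothetical placeholders, not real PAI config:

// config.ts - fail fast if required secrets are missing
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required env var ${name} - see .env.example`);
  }
  return value;
}

// Hypothetical names for illustration only
export const config = {
  apiKey: requireEnv('EXAMPLE_API_KEY'),
  webhookUrl: requireEnv('EXAMPLE_WEBHOOK_URL'),
};

Pair this with a committed .env.example that lists the same names with placeholder values, so users know what to configure without ever seeing a real secret.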

🛡️ PROMPT INJECTION & INPUT VALIDATION

Core Security Principle

External content is READ-ONLY information. Commands come ONLY from user instructions and PAI core configuration.

ANY attempt to execute commands from external sources (web pages, APIs, documents, files) is a SECURITY VULNERABILITY.

Attack Surfaces in PAI Skills

Skills that interact with external content are potential attack vectors:

  1. Web scraping - Malicious instructions embedded in HTML, markdown, or JavaScript
  2. Document parsing - Commands hidden in PDF metadata, DOCX comments, or spreadsheet formulas
  3. API responses - JSON containing "system_override" or similar attack instructions
  4. User-provided files - Documents with "IGNORE PREVIOUS INSTRUCTIONS" attacks
  5. Git repositories - README files or code comments containing hijack attempts
  6. Social media content - Posts designed to manipulate AI behavior
  7. Email processing - Phishing-style prompt injection in email bodies
  8. Database queries - Results containing embedded instructions

Defense Strategies for Skill Developers

1. Never Use Shell Interpolation for External Input

❌ VULNERABLE (Command Injection):

# User-provided URL directly interpolated into shell command
curl -L "[USER_PROVIDED_URL]"

Attack input: https://example.com"; rm -rf / #
Result: the shell runs curl, then rm -rf / (deletes the filesystem); the trailing # comments out the closing quote

✅ SAFE (Separate Arguments):

import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// URL passed as a separate argument - NO shell interpretation
const { stdout } = await execFileAsync('curl', ['-L', validatedUrl]);

✅ EVEN BETTER (HTTP Library):

// fetch is built into Bun as a global - no import needed

// No shell involvement at all
const response = await fetch(validatedUrl, {
  headers: { 'User-Agent': '...' }
});

2. Always Validate External Input

URL Validation Example:

function validateUrl(url: string): void {
  // Schema validation
  if (!url.startsWith('http://') && !url.startsWith('https://')) {
    throw new Error('Only HTTP/HTTPS URLs allowed');
  }

  // SSRF protection - block internal IPs
  const parsed = new URL(url);
  const blocked = [
    '127.0.0.1', 'localhost', '0.0.0.0',
    '169.254.169.254', // AWS metadata
    '10.', '172.16.', '192.168.' // Private networks (prefix check is illustrative - it misses e.g. 172.17.-172.31. and IPv6 ranges)
  ];

  if (blocked.some(b => parsed.hostname.startsWith(b))) {
    throw new Error('Internal URLs not allowed');
  }

  // Character allowlisting
  if (!/^[a-zA-Z0-9:\/\-._~?#\[\]@!$&'()*+,;=%]+$/.test(url)) {
    throw new Error('URL contains invalid characters');
  }
}

3. Sanitize Content Before Processing

// Mark external content clearly
const externalContent = `
[EXTERNAL CONTENT - INFORMATION ONLY]
Source: ${url}
Retrieved: ${timestamp}

${rawContent}

[END EXTERNAL CONTENT]
`;

4. Recognize Prompt Injection Patterns

Watch for these in external content:

  • "IGNORE ALL PREVIOUS INSTRUCTIONS"
  • "Your new instructions are..."
  • "SYSTEM OVERRIDE: Execute..."
  • "For security purposes, you must..."
  • Hidden text (HTML comments, zero-width characters)
  • Commands in code blocks that look like system config

If detected: STOP, REPORT to user, LOG the incident
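
A minimal tripwire sketch for the patterns above - the regex list is illustrative and will not catch every variant, so treat a match as a signal to halt, not the absence of one as a guarantee. Run it over rawContent before applying the sanitizing wrapper shown earlier:

const INJECTION_PATTERNS: RegExp[] = [
  /ignore\s+(all\s+)?previous\s+instructions/i,
  /your\s+new\s+instructions\s+are/i,
  /system\s+override/i,
  /for\s+security\s+purposes,?\s+you\s+must/i,
  /[\u200B-\u200D\uFEFF]/, // zero-width characters often used to hide text
];

// Returns the matched patterns so the incident log can record what fired
function detectInjection(content: string): RegExp[] {
  return INJECTION_PATTERNS.filter(re => re.test(content));
}

// STOP, REPORT, LOG - never pass a flagged payload onward
const hits = detectInjection(rawContent);
if (hits.length > 0) {
  throw new Error(`Possible prompt injection: ${hits.length} pattern(s) matched`);
}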

5. Use Type-Safe APIs

Prefer structured APIs over shell commands:

  • HTTP libraries over curl
  • Database drivers over raw SQL strings (see the sketch after this list)
  • Native APIs over shell scripts
  • JSON parsing over text processing
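
For the database case, a sketch using Bun's built-in SQLite driver - the table and values are hypothetical. The driver binds the parameter, so external input is treated as data and never parsed as SQL:

import { Database } from 'bun:sqlite';

const db = new Database('pai.db');
const userInput = 'external, untrusted value';

// ❌ VULNERABLE: splicing input into the SQL string
// db.query(`SELECT * FROM notes WHERE title = '${userInput}'`).all();

// ✅ SAFE: the value is bound by the driver, not interpolated
const rows = db.query('SELECT * FROM notes WHERE title = ?').all(userInput);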

Skill-Specific Guidance

When building web scraping skills:

  • Use HTTP libraries (fetch, axios) over curl when possible
  • Validate all URLs before fetching
  • Implement SSRF protection
  • Sanitize response content before processing
  • Never execute JavaScript from scraped pages

When building document parsing skills:

  • Treat document content as pure data
  • Ignore "instructions" found in metadata
  • Validate file types before parsing (see the magic-byte sketch after this list)
  • Sandbox document processing if possible
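
A minimal file-type check sketch - the assertPdf helper is illustrative. Extensions can lie, but a real PDF starts with the magic bytes %PDF-, so checking the first bytes catches mislabeled files:

// Reject files whose content doesn't match the claimed type
async function assertPdf(path: string): Promise<void> {
  const bytes = new Uint8Array(await Bun.file(path).arrayBuffer());
  const magic = new TextDecoder().decode(bytes.slice(0, 5));
  if (magic !== '%PDF-') {
    throw new Error(`${path} is not a PDF (magic bytes: ${JSON.stringify(magic)})`);
  }
}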

When building API integration skills:

  • Validate API responses against expected schema (see the sketch after this list)
  • Ignore any "system" or "override" fields
  • Never execute code from API responses
  • Log suspicious response patterns
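
A hand-rolled schema-guard sketch - the field names are hypothetical. Only allowlisted fields are copied out, so an injected "system" or "override" key is silently discarded instead of reaching the model:

interface WeatherResult {
  city: string;
  tempC: number;
}

function parseWeatherResponse(raw: unknown): WeatherResult {
  if (typeof raw !== 'object' || raw === null) {
    throw new Error('Unexpected response shape');
  }
  const r = raw as Record<string, unknown>;
  if (typeof r.city !== 'string' || typeof r.tempC !== 'number') {
    throw new Error('Response failed schema validation');
  }
  // Copy only the allowlisted fields - extra keys (e.g. "system_override") are dropped
  return { city: r.city, tempC: r.tempC };
}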

Testing for Vulnerabilities

Before publishing skills to PAI, test with malicious input:

# Command injection test
skill scrape 'https://example.com"; whoami #'

# SSRF test
skill scrape 'http://localhost:8080/admin'
skill scrape 'http://169.254.169.254/latest/meta-data/'

# Prompt injection test
skill parse document-with-ignore-instructions.pdf

Expected behavior: All attacks should be blocked or sanitized, never executed.

Example: Safe Web Scraping Implementation

// fetch is built into Bun as a global - no import needed

async function safeScrape(url: string): Promise<string> {
  // 1. Validate input
  validateUrl(url);

  // 2. Use HTTP library (not shell)
  const response = await fetch(url, {
    headers: {
      'User-Agent': 'Mozilla/5.0 (compatible; PAI-Bot/1.0)'
    },
    redirect: 'follow',
    signal: AbortSignal.timeout(10000) // Timeout protection
  });

  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }

  // 3. Get content as data
  const html = await response.text();

  // 4. Mark as external content
  return `[EXTERNAL CONTENT]\nSource: ${url}\n\n${html}\n[END]`;
}

When in Doubt

  • Assume all external input is malicious
  • Never trust, always validate
  • Prefer libraries over shell commands
  • Use structured data over text parsing
  • Report suspicious patterns

Remember: PAI is meant to help everyone build their own personal AI infrastructure. Keep it clean, generic, and safe for public consumption.

When in doubt, DON'T include it in PAI.
