SkillKit

Security Scanner

Detect malicious patterns, secrets, and vulnerabilities in AI agent skills

SkillKit includes a built-in security scanner that analyzes skills for prompt injection, command injection, data exfiltration, hardcoded secrets, and other threats before installation.

Quick Start

skillkit scan ./my-skill

The scanner runs automatically during skillkit install. Pass --no-scan to skip the scan, or --force to install despite findings.

CLI Usage

skillkit scan <path>                       # Scan skill directory
skillkit scan <path> --format json         # JSON output
skillkit scan <path> --format table        # Tabular output
skillkit scan <path> --format sarif        # SARIF v2.1 for GitHub Code Scanning
skillkit scan <path> --fail-on high        # Exit code 1 if HIGH+ findings
skillkit scan <path> --skip-rules UC001    # Skip specific rules
skillkit scan <path> --skip-rules PI001,PI002  # Skip multiple rules

Threat Categories

The scanner detects 46+ patterns across 6 threat categories, plus a set of manifest validation checks:

Prompt Injection (PI001–PI015)

Detects attempts to override, bypass, or manipulate agent instructions.

| Rule | Severity | Description |
| --- | --- | --- |
| PI001 | Critical | Instruction override ("ignore previous instructions") |
| PI002 | Critical | Context clearing ("forget everything") |
| PI003 | Critical | Training bypass ("disregard your training") |
| PI004 | Critical | System prompt injection ("new instructions:") |
| PI005 | High | Role manipulation ("you are now a...") |
| PI006 | High | Roleplay manipulation ("pretend to be...") |
| PI007 | High | Persistent behavior change ("from now on...") |
| PI008 | High | System prompt extraction ("show me your instructions") |
| PI009 | Critical | Policy bypass (jailbreak, unrestricted mode, DAN) |
| PI010 | High | Concealment ("don't tell the user") |
| PI011 | High | Delimiter injection (conversation role markers) |
| PI012 | High | Model-specific delimiters (Llama [INST] tags) |
| PI013 | Medium | YAML front-matter injection |
| PI014 | Medium | Hidden HTML comments with instructions |
| PI015 | Medium | Hidden Markdown comments with instructions |
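
As a rough illustration of how a regex-based rule such as PI001 could be applied, the sketch below tests each line of a file against a rule list. The `Rule` and `Finding` shapes, the `scanText` helper, and the regular expression are assumptions made for this example, not SkillKit's actual rule definitions.

```typescript
// Hypothetical shapes; SkillKit's internal types may differ.
interface Rule {
  id: string;
  severity: "critical" | "high" | "medium" | "low" | "info";
  pattern: RegExp;
}

interface Finding {
  ruleId: string;
  severity: Rule["severity"];
  line: number;
  snippet: string;
}

// Illustrative pattern for an instruction-override phrase (PI001-style).
const instructionOverride: Rule = {
  id: "PI001",
  severity: "critical",
  pattern: /ignore\s+(all\s+)?(previous|prior|above)\s+instructions/i,
};

// Test every line against every rule and record 1-based line numbers.
function scanText(text: string, rules: Rule[]): Finding[] {
  const findings: Finding[] = [];
  text.split(/\r?\n/).forEach((content, index) => {
    for (const rule of rules) {
      if (rule.pattern.test(content)) {
        findings.push({
          ruleId: rule.id,
          severity: rule.severity,
          line: index + 1,
          snippet: content.trim(),
        });
      }
    }
  });
  return findings;
}
```

The pattern deliberately omits the `g` flag so that repeated `test` calls do not carry state between lines.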

Command Injection (CI001–CI010)

Detects code execution, shell injection, and path traversal patterns.

| Rule | Severity | Description |
| --- | --- | --- |
| CI001 | Critical | eval() calls |
| CI002 | Critical | Function() constructor |
| CI003 | Critical | child_process exec/execSync |
| CI004 | Critical | Python subprocess with shell=True |
| CI005 | Medium | Template literals with dynamic content in command context |
| CI006 | Medium | Path traversal (../) |
| CI007 | High | Shell command chaining with dangerous commands |
| CI008 | Medium | Process execution functions |
| CI009 | High | Python dynamic import/compile |
| CI010 | Medium | Node.js vm module usage |

Data Exfiltration (DE001–DE008)

Detects network requests, webhook URLs, and credential access patterns.

| Rule | Severity | Description |
| --- | --- | --- |
| DE001 | Critical | Discord webhook URLs |
| DE002 | Critical | Telegram bot API URLs |
| DE003 | High | Slack webhook URLs |
| DE004 | High | HTTP POST with env variables |
| DE005 | High | Fetch/axios sending to external URLs |
| DE006 | Medium | Reading .env or credentials files |
| DE007 | Medium | Environment variable access patterns |
| DE008 | Medium | DNS/IP-based exfiltration patterns |
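
The webhook rules (DE001 to DE003) lend themselves to simple URL matching. The sketch below is one way this might look; the expressions are based on the publicly known URL shapes of each service and are assumptions, as is the `matchWebhooks` helper.

```typescript
// Illustrative URL patterns; the scanner's real expressions are not shown in this document.
const webhookPatterns: Record<string, RegExp> = {
  DE001: /https:\/\/discord(?:app)?\.com\/api\/webhooks\/\d+\/[\w-]+/i,
  DE002: /https:\/\/api\.telegram\.org\/bot[\w:-]+/i,
  DE003: /https:\/\/hooks\.slack\.com\/services\/[\w\/-]+/i,
};

// Return the rule IDs of every webhook pattern that appears in the text.
function matchWebhooks(text: string): string[] {
  return Object.entries(webhookPatterns)
    .filter(([, pattern]) => pattern.test(text))
    .map(([ruleId]) => ruleId);
}
```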

Tool Abuse (TA001–TA006)

Detects tool shadowing, autonomy escalation, and capability manipulation.

| Rule | Severity | Description |
| --- | --- | --- |
| TA001 | High | Tool shadowing (redefining built-in tools) |
| TA002 | High | Autonomy abuse ("keep retrying", "run without confirmation") |
| TA003 | High | Capability inflation (discovering/enabling extra tools) |
| TA004 | Medium | Tool chaining (read sensitive data then send externally) |
| TA005 | Medium | Permission escalation patterns |
| TA006 | Medium | Recursive self-invocation |

Hardcoded Secrets (SK001–SK013)

Detects API keys, tokens, and credentials embedded in skill files.

| Rule | Severity | Description |
| --- | --- | --- |
| SK001 | Critical | OpenAI API key (sk-...) |
| SK002 | Critical | Stripe live publishable key |
| SK003 | Critical | Stripe secret key |
| SK004 | Critical | GitHub personal access token |
| SK005 | Critical | GitHub OAuth token |
| SK006 | Critical | AWS access key (AKIA...) |
| SK007 | Critical | Slack bot token (xoxb-...) |
| SK008 | Critical | Slack user token (xoxp-...) |
| SK009 | Critical | Private key block |
| SK010 | High | Google API key |
| SK011 | Critical | Anthropic API key (sk-ant-...) |
| SK012 | High | npm token |
| SK013 | Low | UUID patterns (with credential context) |
| SK-ENV | High | Environment file included in skill |
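
Prefix-based secret rules can be sketched with a small pattern table. The example below covers three of the rules above; the token lengths and character sets are assumptions, and `findSecrets` is a hypothetical helper rather than part of the SkillKit API.

```typescript
// Illustrative patterns keyed by rule ID; exact lengths and alphabets are assumed.
const secretPatterns: Array<{ ruleId: string; pattern: RegExp }> = [
  { ruleId: "SK006", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  { ruleId: "SK007", pattern: /\bxoxb-[0-9A-Za-z-]{10,}/ },
  { ruleId: "SK009", pattern: /-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----/ },
];

// Report the first match of each pattern along with its character offset.
function findSecrets(text: string): Array<{ ruleId: string; index: number }> {
  const hits: Array<{ ruleId: string; index: number }> = [];
  for (const { ruleId, pattern } of secretPatterns) {
    const match = pattern.exec(text);
    if (match) hits.push({ ruleId, index: match.index });
  }
  return hits;
}
```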

Unicode Steganography (UC001–UC007)

Detects invisible characters and obfuscation techniques.

| Rule | Severity | Description |
| --- | --- | --- |
| UC001 | Medium | Zero-width characters (U+200B, U+200C, U+200D, U+FEFF) |
| UC002 | High | Bidirectional text override (U+202A–U+202E) |
| UC003 | High | Bidirectional isolate characters (U+2066–U+2069) |
| UC004 | High | Unicode tag characters (U+E0001–U+E007F) |
| UC005 | Medium | Control characters |
| UC006 | Medium | Base64-encoded payloads |
| UC007 | Medium | Hex/Unicode escape sequences |
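
The code point ranges for UC001 to UC004 can be checked directly. The following is a minimal sketch under the assumption that detection works on code points; the `findInvisible` helper and its return shape are hypothetical.

```typescript
// Code point ranges taken from the rule table above.
const invisibleRanges: Array<{ ruleId: string; from: number; to: number }> = [
  { ruleId: "UC001", from: 0x200b, to: 0x200d }, // zero-width space, non-joiner, joiner
  { ruleId: "UC001", from: 0xfeff, to: 0xfeff }, // zero-width no-break space (BOM)
  { ruleId: "UC002", from: 0x202a, to: 0x202e }, // bidirectional embeddings and overrides
  { ruleId: "UC003", from: 0x2066, to: 0x2069 }, // bidirectional isolates
  { ruleId: "UC004", from: 0xe0001, to: 0xe007f }, // Unicode tag characters
];

// Walk the string by code point so characters outside the BMP are read whole.
function findInvisible(text: string): Array<{ ruleId: string; codePoint: string; offset: number }> {
  const hits: Array<{ ruleId: string; codePoint: string; offset: number }> = [];
  let i = 0;
  while (i < text.length) {
    const cp = text.codePointAt(i) ?? 0;
    const range = invisibleRanges.find((r) => cp >= r.from && cp <= r.to);
    if (range) {
      hits.push({ ruleId: range.ruleId, codePoint: "U+" + cp.toString(16).toUpperCase(), offset: i });
    }
    i += cp > 0xffff ? 2 : 1; // astral code points occupy two UTF-16 units
  }
  return hits;
}
```

Iterating by code point matters for UC004, because tag characters sit above U+FFFF and are stored as surrogate pairs in JavaScript strings.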

Manifest Validation (MF001–MF008)

Validates SKILL.md frontmatter and skill structure.

| Rule | Severity | Description |
| --- | --- | --- |
| MF001 | Low | Missing SKILL.md frontmatter |
| MF002 | Low | Missing skill name |
| MF003 | Low | Invalid skill name format |
| MF004 | Info | Missing skill description |
| MF005 | Info | Short skill description |
| MF006 | High | Dangerous tool in allowed-tools (Bash, shell) |
| MF007 | High | Impersonation (claiming official affiliation) |
| MF008 | Medium | Binary files in skill directory |
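
To make the frontmatter checks concrete, here is a simplified sketch of MF001 to MF006. It reads only flat `key: value` lines rather than full YAML, and both the name format (lowercase letters, digits, hyphens) and the 20-character threshold for a short description are assumptions chosen for illustration.

```typescript
// Minimal frontmatter reader: handles flat "key: value" lines only, not full YAML.
function readFrontmatter(markdown: string): Record<string, string> | null {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!match) return null;
  const fields: Record<string, string> = {};
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep > 0) fields[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
  }
  return fields;
}

// Return the IDs of the manifest rules that the given SKILL.md text violates.
function validateManifest(markdown: string): string[] {
  const fields = readFrontmatter(markdown);
  if (!fields) return ["MF001"];
  const issues: string[] = [];
  const name = fields["name"];
  if (!name) {
    issues.push("MF002");
  } else if (!/^[a-z0-9]+(?:-[a-z0-9]+)*$/.test(name)) {
    issues.push("MF003"); // assumed format: lowercase kebab-case
  }
  const description = fields["description"];
  if (!description) {
    issues.push("MF004");
  } else if (description.length < 20) {
    issues.push("MF005"); // assumed threshold for "short"
  }
  if (/\b(bash|shell)\b/i.test(fields["allowed-tools"] ?? "")) issues.push("MF006");
  return issues;
}
```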

Analyzers

The scanner uses three analyzers that run in parallel:

| Analyzer | Purpose | Rules |
| --- | --- | --- |
| StaticAnalyzer | Regex pattern matching against security rules | PI, CI, DE, TA, UC |
| ManifestAnalyzer | SKILL.md frontmatter validation | MF |
| SecretsAnalyzer | Credential and secret detection | SK |
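
Running analyzers in parallel and merging their results could look roughly like the sketch below. The `Analyzer` interface and `runAnalyzers` function are hypothetical; only the idea of concurrent analyzers comes from this page.

```typescript
// Hypothetical shapes for illustration only.
interface Finding {
  ruleId: string;
  severity: string;
}

interface Analyzer {
  name: string;
  analyze(path: string): Promise<Finding[]>;
}

// Start every analyzer at once, wait for all of them, then merge the findings.
async function runAnalyzers(path: string, analyzers: Analyzer[]): Promise<Finding[]> {
  const results = await Promise.all(analyzers.map((a) => a.analyze(path)));
  return ([] as Finding[]).concat(...results);
}
```

With `Promise.all`, a single rejected analyzer rejects the whole scan; a scanner that should tolerate partial failure might use `Promise.allSettled` instead.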

Output Formats

Summary (default)

Colored terminal output with severity indicators, file locations, snippets, and remediation advice.

JSON

Machine-readable JSON with all findings, stats, and verdict.

skillkit scan ./my-skill --format json | jq '.findings[] | {ruleId, severity, title}'
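
The same report can be consumed from code. The sketch below counts findings per severity; it types only the fields that appear in the jq filter above, and the exact casing of `severity` values in the real output is an assumption.

```typescript
// Assumed subset of the JSON report: only fields used in the jq filter are typed.
interface ReportFinding {
  ruleId: string;
  severity: string;
  title: string;
}

interface Report {
  findings: ReportFinding[];
}

// Count findings per severity from the raw JSON text.
function countBySeverity(json: string): Record<string, number> {
  const report = JSON.parse(json) as Report;
  const counts: Record<string, number> = {};
  for (const finding of report.findings) {
    counts[finding.severity] = (counts[finding.severity] ?? 0) + 1;
  }
  return counts;
}
```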

Table

Tabular format for quick review:

skillkit scan ./my-skill --format table

SARIF

SARIF v2.1 for GitHub Code Scanning and IDE integration:

skillkit scan ./my-skill --format sarif > results.sarif
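
For readers unfamiliar with SARIF, the sketch below shows how a finding might map onto a minimal SARIF 2.1.0 log. The `Finding` fields `file` and `line`, the severity-to-level mapping, and the `toSarif` helper are assumptions; the scanner's actual output may carry more detail.

```typescript
// Hypothetical finding shape; `file` and `line` are assumed field names.
interface Finding {
  ruleId: string;
  severity: "critical" | "high" | "medium" | "low" | "info";
  title: string;
  file: string;
  line: number;
}

// Assumed mapping from scanner severities to SARIF result levels.
const sarifLevel: Record<Finding["severity"], "error" | "warning" | "note"> = {
  critical: "error",
  high: "error",
  medium: "warning",
  low: "note",
  info: "note",
};

// Build a minimal SARIF log with one run and one result per finding.
function toSarif(findings: Finding[]) {
  return {
    $schema: "https://json.schemastore.org/sarif-2.1.0.json",
    version: "2.1.0",
    runs: [
      {
        tool: { driver: { name: "skillkit" } },
        results: findings.map((f) => ({
          ruleId: f.ruleId,
          level: sarifLevel[f.severity],
          message: { text: f.title },
          locations: [
            {
              physicalLocation: {
                artifactLocation: { uri: f.file },
                region: { startLine: f.line },
              },
            },
          ],
        })),
      },
    ],
  };
}
```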

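Upload to GitHub (the endpoint also requires the commit SHA and ref, and the payload must be gzip-compressed, Base64-encoded data without line breaks):

gh api repos/{owner}/{repo}/code-scanning/sarifs \
  -f "commit_sha=$(git rev-parse HEAD)" \
  -f "ref=$(git symbolic-ref HEAD)" \
  -f "sarif=$(gzip -c results.sarif | base64 | tr -d '\n')"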

CI/CD Integration

GitHub Actions

- name: Scan skills
  run: npx skillkit scan ./skills --format sarif --fail-on high > results.sarif

# Run even when the scan step fails, so findings still reach Code Scanning
- name: Upload SARIF
  if: always()
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif

Pre-commit Hook

skillkit scan . --fail-on medium

Programmatic Usage

import { SkillScanner, formatResult } from '@skillkit/core';

const scanner = new SkillScanner({
  failOnSeverity: 'high',
  skipRules: ['UC001', 'MF004'],
});

const result = await scanner.scan('./my-skill');

console.log(formatResult(result, 'summary'));
console.log(`Verdict: ${result.verdict}`);
console.log(`Findings: ${result.findings.length}`);

Severity Levels

| Level | Exit Code | Description |
| --- | --- | --- |
| Critical | 1 (with --fail-on critical) | Immediate security threat |
| High | 1 (with --fail-on high, default) | Significant security risk |
| Medium | 1 (with --fail-on medium) | Potential security concern |
| Low | 1 (with --fail-on low) | Minor issue or best practice |
| Info | 0 | Informational finding |
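
The threshold behavior in the table can be expressed as a small function. This is a sketch of the logic as described here, assuming that a finding at or above the --fail-on level produces exit code 1; the `exitCode` helper is hypothetical.

```typescript
type Severity = "critical" | "high" | "medium" | "low" | "info";

// Ordered from most to least severe.
const severityOrder: Severity[] = ["critical", "high", "medium", "low", "info"];

// Exit code 1 when any finding is at or above the threshold, otherwise 0.
// "info" is excluded as a threshold because the table lists it as always 0.
function exitCode(findings: Severity[], failOn: Exclude<Severity, "info"> = "high"): number {
  const threshold = severityOrder.indexOf(failOn);
  return findings.some((s) => severityOrder.indexOf(s) <= threshold) ? 1 : 0;
}
```

For example, `exitCode(["medium", "info"])` yields 0 under the default threshold, while `exitCode(["medium"], "medium")` yields 1.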

False Positive Handling

The scanner automatically filters:

  • Placeholder patterns: your-api-key, example, sample, dummy, xxx
  • Test files: *.test.ts, *.spec.ts, __tests__/
  • Import statements: import ... from '../' (not path traversal)
  • Comments: Code comments mentioning patterns
  • Security discussions: Text discussing jailbreaks or vulnerabilities (uses negative lookbehind)
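
Three of the filters above (placeholders, test files, and import statements) can be approximated as follows. The expressions and the `isLikelyFalsePositive` helper are assumptions for illustration; a real implementation would probably apply the import filter only to the path traversal rule.

```typescript
// Illustrative versions of three of the filters listed above.
const placeholder = /your-api-key|example|sample|dummy|xxx/i;
const testFile = /(\.test\.ts|\.spec\.ts)$|(^|\/)__tests__\//;
const importLine = /^\s*import\b.*\bfrom\s+['"]\.\.?\//;

// Decide whether a raw match should be dropped before it is reported.
function isLikelyFalsePositive(file: string, line: string, matched: string): boolean {
  if (testFile.test(file)) return true; // matches inside test files are ignored
  if (placeholder.test(matched)) return true; // placeholder values are not real secrets
  if (importLine.test(line)) return true; // relative imports are not path traversal
  return false;
}
```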

To skip specific rules:

skillkit scan . --skip-rules UC001,UC002,MF004
