
8 AI Security Projects

Three phases, eight projects: LLM defense, adversarial ML, AI red teaming, and secure AI system design — built progressively and documented openly.

Phase 1

Core AI Security Foundations

Months 1–2
P1

LLM Security

LLM Prompt Injection Defense Toolkit

In Progress

Summary

A production-quality Python library providing defensive patterns, semantic filters, and evaluation scripts that detect and mitigate prompt injection attacks across LLM API providers. Covers direct injection, indirect injection via tool outputs, and jailbreak-as-injection hybrids.

Architecture

Input ──► [Pre-LLM Sanitizer]
             │
             ├─ Keyword block list
             ├─ Semantic similarity filter (embeddings)
             └─ Regex-based pattern detector
             ▼
          [LLM API Gateway]
             │
             ▼
          [Post-Response Auditor]
             │
             ├─ Instruction-following verifier
             └─ Data-leakage classifier
             ▼
Output ◄── [Safe Response]
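A minimal sketch of the sanitizer's first two layers, the keyword block list and the regex pattern detector. The phrases, patterns, and function name here are illustrative placeholders, not the toolkit's actual rule set; the semantic similarity filter (embedding distance to known attack phrasings) would slot in as a third check.

```python
import re

# Hypothetical rule sets -- a real deployment tunes these per threat model.
BLOCKLIST = {"ignore previous instructions", "disregard your system prompt"}
INJECTION_PATTERNS = [
    re.compile(r"(?i)\byou are now\b.*\b(dan|developer mode)\b"),
    re.compile(r"(?i)\bsystem prompt\b.*\b(reveal|print|repeat)\b"),
]

def pre_llm_sanitize(user_input: str) -> tuple[bool, str]:
    """Return (allowed, reason) after running the keyword block list
    and the regex-based pattern detector over the raw user input."""
    lowered = user_input.lower()
    for phrase in BLOCKLIST:
        if phrase in lowered:
            return False, f"blocked keyword: {phrase!r}"
    for pattern in INJECTION_PATTERNS:
        if pattern.search(user_input):
            return False, f"matched pattern: {pattern.pattern}"
    return True, "clean"
```

Layering cheap lexical checks before the embedding filter keeps the common attacks out of the expensive path.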

Threat Model

| Threat | Severity | Mitigation |
| --- | --- | --- |
| Direct prompt injection via user input | HIGH | Semantic filter + system prompt hardening |
| Indirect injection via tool/RAG outputs | HIGH | Output sanitization before re-injecting into context |
| Jailbreak via role-play framing | MEDIUM | Instruction-following verification post-response |
| System prompt leakage | MEDIUM | Post-response classifier for sensitive content |

Skills Demonstrated

LLM security engineering
Prompt injection mitigation
Secure AI app design
Evaluation framework design
Python library packaging

Stack

Python · LangChain · OpenAI · Sentence-Transformers · FastAPI · pytest
P2

AI Cyber Defense

AI-Powered Log Analyzer

In Progress

A tool that ingests raw log data (syslog, Windows Event Log, cloud audit logs), builds a searchable vector index, and uses an LLM to summarize anomalies, flag suspicious behavior patterns, and generate analyst-ready incident summaries. Zero rules required — pure semantic detection.

Architecture

Log Sources ──► [Ingest Pipeline]
 (Syslog/CEF)      │
                   ├─ Parse & normalize
                   └─ Chunk & embed
                   ▼
                [Vector Store (Chroma)]
                   │
                   ▼
                [RAG Query Engine]
                   │
                   ├─ Semantic anomaly search
                   └─ Temporal correlation
                   ▼
                [LLM Summarizer]
                   │
                   ▼
                [Report Output]
            (Markdown / JSON / HTML)
Stack

Python · ChromaDB · LangChain · OpenAI · Pandas · Streamlit

Skills Demonstrated

AI-powered cyber defense
RAG pipeline engineering
Vector search at scale
Threat detection patterns
Security data engineering
View on GitHub →
Phase 2

Adversarial ML & Secure AI Engineering

Months 3–5
P3

Adversarial ML

Adversarial Attack Playground

Planned

An interactive research environment implementing FGSM, PGD, Carlini-Wagner, and DeepFool attacks in PyTorch, with rich visualizations of decision boundaries, perturbation norms, and model robustness under attack. Designed for both learning and rigorous experimentation.

PyTorch · ART · Foolbox · Matplotlib · Jupyter · W&B

Research Note

Includes reproductions of Goodfellow et al. 2015 (FGSM), Madry et al. 2018 (PGD), and Moosavi-Dezfooli et al. 2017 (Universal Adversarial Perturbations), with annotated notebooks and comparisons against the originally reported results.
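The core of FGSM fits in a few lines. As a sketch under stated assumptions: the model here is a toy logistic regression rather than the PyTorch networks the playground targets, and the weights and epsilon are made up for illustration; the attack itself, x' = x + ε · sign(∇ₓL), is exactly the Goodfellow et al. 2015 formulation.

```python
import numpy as np

def fgsm(x, y, w, b, eps):
    """Fast Gradient Sign Method against a logistic-regression model
    p = sigmoid(w.x + b): perturb every feature by eps in the direction
    that increases the cross-entropy loss."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    grad_x = (p - y) * w          # d(cross-entropy)/dx for this model
    return x + eps * np.sign(grad_x)
```

With w = [2, -1], b = 0, and a correctly classified x = [1, 1], a single step with eps = 1 is enough to flip the model's prediction, which is the effect the playground's visualizations make visible on real networks.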

Skills Demonstrated

Adversarial ML implementation
Model robustness evaluation
Research communication
PyTorch deep learning
Experimental methodology
View on GitHub →
P4

Secure AI Engineering

Secure MLOps Template

Planned

A production-ready GitHub template repository for ML projects that bakes in security from day one: automated model scanning on every PR, dependency auditing, secrets management via SOPS, signed model artifacts with provenance tracking, and container security.

What's Included

GitHub Actions CI with Trivy, Bandit, and Safety scans
SOPS-encrypted secrets for training/inference credentials
Model weight hashing + signed manifest generation
Docker image hardening (non-root, read-only filesystem)
Dependency pinning + SBOM generation
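The model-hashing and manifest step can be sketched with the standard library alone. The manifest fields and function name are illustrative assumptions; in the template, the emitted manifest would then be signed with Sigstore, a step omitted here.

```python
import hashlib
import json
import time
from pathlib import Path

def build_manifest(artifact_paths, model_name, version):
    """Hash each model artifact with SHA-256 and emit a provenance
    manifest (to be signed out-of-band, e.g. with Sigstore cosign)."""
    entries = []
    for path in map(Path, artifact_paths):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        entries.append({"file": path.name, "sha256": digest})
    return {
        "model": model_name,
        "version": version,
        "created": int(time.time()),
        "artifacts": entries,
    }
```

CI can recompute the hashes on every PR and fail the build if a weight file changed without a matching manifest update, which is what makes tampering visible.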
GitHub Actions · Docker · SOPS · Trivy · Sigstore · Python

Skills Demonstrated

Secure AI deployment
MLOps pipeline design
Supply-chain security
DevSecOps practices
Container hardening
View on GitHub →
P5

AI Threat Modeling

AI Threat Modeling Framework

Planned

A structured threat modeling framework for AI/ML systems, inspired by MITRE ATLAS and STRIDE but tailored to the unique attack surface of modern AI: training data, model weights, inference APIs, and the human-AI interaction layer. Includes worked examples and a reusable questionnaire.

MITRE ATLAS · STRIDE · NIST AI RMF · Draw.io · Markdown

Framework Scope

Covers 6 AI attack surfaces: training data poisoning, model supply chain, inference evasion, prompt injection, model inversion, and membership inference. Each surface includes threat enumeration, severity scoring, and countermeasure mapping.
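One way the threat enumeration and severity scoring could be represented, sketched here as a simple likelihood × impact risk-matrix score; the field names and scoring scheme are assumptions, not the framework's final schema.

```python
from dataclasses import dataclass, field

@dataclass
class Threat:
    surface: str                 # one of the six AI attack surfaces
    description: str
    likelihood: int              # 1 (rare) .. 5 (expected)
    impact: int                  # 1 (negligible) .. 5 (critical)
    countermeasures: list = field(default_factory=list)

    @property
    def severity(self) -> int:
        # Classic risk-matrix scoring: likelihood x impact.
        return self.likelihood * self.impact
```

Keeping threats as plain records makes it easy to sort an enumeration by severity and check that every entry above a threshold has at least one mapped countermeasure.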

Skills Demonstrated

AI threat modeling
Systems security thinking
Architecture design
MITRE ATLAS fluency
Security documentation
View on GitHub →
Phase 3

High-Impact, Employer-Ready Work

Months 6–8
P6

AI Red Teaming

LLM Red Teaming Evaluation Suite

Planned

A comprehensive automated testing framework that probes LLMs across four attack categories, produces structured evaluation reports, and tracks safety regressions across model versions. Designed to mirror the red-teaming workflows frontier AI labs run before public releases.

Attack Coverage

Jailbreaks

DAN, AIM, SWITCH, role-play, token manipulation

Safety Bypasses

Refusal suppression, context hijacking

Toxicity

Hate speech, NSFW, harmful instruction elicitation

Data Extraction

PII leakage, training data extraction, prompt leakage
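The evaluation loop behind this coverage can be sketched as below. The names and the keyword-based judge are placeholders: a real suite would substitute an LLM-as-Judge for `is_unsafe` and pull probes from libraries like PyRIT or Garak.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Probe:
    category: str                       # e.g. "jailbreak", "data_extraction"
    prompt: str
    is_unsafe: Callable[[str], bool]    # judge: did the model comply?

def run_suite(model: Callable[[str], str], probes: list) -> dict:
    """Run every probe against the target model and return the failure
    rate per attack category (1.0 = model complied with every probe)."""
    results: dict[str, list[bool]] = {}
    for probe in probes:
        response = model(probe.prompt)
        results.setdefault(probe.category, []).append(probe.is_unsafe(response))
    return {cat: sum(fails) / len(fails) for cat, fails in results.items()}
```

Recording a per-category failure rate is what makes regression tracking possible: run the same suite against each model version and diff the rates.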

Python · PyRIT · Garak · OpenAI Evals · LLM-as-Judge · FastAPI

Skills Demonstrated

AI red teaming methodology
Evaluation framework design
AI safety engineering
Automated testing pipelines
Structured reporting
View on GitHub →
P7

Multi-Agent Systems

AI-Powered SOC Assistant

Planned

A multi-agent system built on LangGraph that automates first-line SOC analyst tasks: ingesting alerts from a SIEM, correlating with threat intelligence feeds, generating plain-language incident summaries, suggesting investigation playbook steps, and producing formatted analyst reports.

Agent Graph

SIEM Alert ──► [Triage Agent]
                  │ Classify severity
                  │ Enrich with threat intel
                  ▼
               [Investigation Agent]
                  │ Correlate events
                  │ Query asset inventory
                  │ Suggest MITRE ATT&CK TTPs
                  ▼
               [Report Agent]
                  │ Generate narrative summary
                  │ Suggest next actions
                  │ Format for analyst review
                  ▼
               Analyst-Ready Incident Report
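The structure of that graph can be sketched in plain Python: each agent reads and updates a shared state object, mirroring how a LangGraph state dict flows between nodes. The agent bodies here are hard-coded stand-ins (the real agents call an LLM with SIEM and threat-intel context), and all names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class IncidentState:
    """Shared state passed between agents, like a LangGraph state dict."""
    alert: dict
    severity: str = "unknown"
    findings: list = field(default_factory=list)
    report: str = ""

def triage_agent(state: IncidentState) -> IncidentState:
    # Placeholder classification; the real agent enriches with threat intel.
    state.severity = "high" if state.alert.get("failed_logins", 0) > 10 else "low"
    return state

def investigation_agent(state: IncidentState) -> IncidentState:
    # Placeholder correlation; the real agent queries the asset inventory.
    state.findings.append("Possible brute force (MITRE ATT&CK T1110)")
    return state

def report_agent(state: IncidentState) -> IncidentState:
    state.report = (f"[{state.severity.upper()}] {state.alert['host']}: "
                    + "; ".join(state.findings))
    return state

def run_pipeline(alert: dict) -> str:
    """Run the three-agent chain and return the analyst-ready summary."""
    state = IncidentState(alert=alert)
    for agent in (triage_agent, investigation_agent, report_agent):
        state = agent(state)
    return state.report
```

LangGraph adds conditional edges and retries on top of this shape, but the core design choice, a typed state threaded through small single-purpose agents, is the same.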
Stack

LangGraph · LangChain · OpenAI · Elastic · Python · FastAPI

Skills Demonstrated

Multi-agent system design
AI-assisted cybersecurity
Real-world SOC workflows
LangGraph orchestration
SIEM integration
View on GitHub →
P8

Research

Research Paper Reproductions

Planned

Faithful reproductions of 3–5 landmark papers in adversarial ML and AI security, each with annotated code, original result comparisons, and commentary on what holds up in 2026. Demonstrates research literacy, technical depth, and the ability to translate papers into working systems.

Paper #1

Goodfellow et al., 2015

"Explaining and Harnessing Adversarial Examples" — FGSM

Paper #2

Madry et al., 2018

"Towards Deep Learning Models Resistant to Adversarial Attacks" — PGD

Paper #3

Moosavi-Dezfooli et al., 2017

"Universal Adversarial Perturbations"

PyTorch · Jupyter · CIFAR-10 · ImageNet
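The PGD reproduction (Paper #2) reduces to iterated FGSM steps with a projection back into the ε-ball, which can be sketched on the same toy logistic-regression model assumed above rather than the PyTorch networks the reproductions use; weights and hyperparameters are made up for illustration.

```python
import numpy as np

def pgd(x, y, w, b, eps, alpha, steps):
    """Projected Gradient Descent (Madry et al. 2018): repeated
    sign-gradient steps of size alpha, with the perturbation projected
    back into the L-infinity ball of radius eps around the original x."""
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))
        grad = (p - y) * w                           # grad of cross-entropy wrt x
        x_adv = x_adv + alpha * np.sign(grad)        # FGSM-style step
        x_adv = np.clip(x_adv, x - eps, x + eps)     # project into the eps-ball
    return x_adv
```

The projection is what distinguishes PGD from naive iterated FGSM: the attack explores inside the ε-ball instead of drifting arbitrarily far from the input, which is why Madry et al. use it as the canonical first-order adversary.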