Tools & Projects
Three phases, eight projects: LLM defense, adversarial ML, AI red teaming, and secure AI system design — built progressively and documented openly.
LLM Security
A production-quality Python library providing defensive patterns, semantic filters, and evaluation scripts that detect and mitigate prompt injection attacks across LLM API providers. Covers direct injection, indirect injection via tool outputs, and jailbreak-as-injection hybrids.
| Threat | Severity | Mitigation |
|---|---|---|
| Direct prompt injection via user input | HIGH | Semantic filter + system prompt hardening |
| Indirect injection via tool/RAG outputs | HIGH | Output sanitization before re-injecting into context |
| Jailbreak via role-play framing | MEDIUM | Instruction-following verification post-response |
| System prompt leakage | MEDIUM | Post-response classifier for sensitive content |
AI Cyber Defense
A tool that ingests raw log data (syslog, Windows Event Log, cloud audit logs), builds a searchable vector index, and uses an LLM to summarize anomalies, flag suspicious behavior patterns, and generate analyst-ready incident summaries. Zero rules required — pure semantic detection.
Adversarial ML
An interactive research environment implementing FGSM, PGD, Carlini-Wagner, and DeepFool attacks in PyTorch, with rich visualizations of decision boundaries, perturbation norms, and model robustness under attack. Designed for both learning and rigorous experimentation.
Research Note
Includes reproductions of Goodfellow et al. 2015 (FGSM), Madry et al. 2018 (PGD), and Moosavi-Dezfooli 2017 (Universal Perturbations) with annotated notebooks and original result comparisons.
Secure AI Engineering
A production-ready GitHub template repository for ML projects that bakes in security from day one: automated model scanning on every PR, dependency auditing, secrets management via SOPS, signed model artifacts with provenance tracking, and container security.
AI Threat Modeling
A structured threat modeling framework for AI/ML systems, inspired by MITRE ATLAS and STRIDE but tailored to the unique attack surface of modern AI: training data, model weights, inference APIs, and the human-AI interaction layer. Includes worked examples and a reusable questionnaire.
Framework Scope
Covers 6 AI attack surfaces: training data poisoning, model supply chain, inference evasion, prompt injection, model inversion, and membership inference. Each surface includes threat enumeration, severity scoring, and countermeasure mapping.
AI Red Teaming
A comprehensive automated testing framework that probes LLMs across five attack categories, produces structured evaluation reports, and tracks safety regression over model versions. Designed to mirror the red-teaming workflows used at frontier AI labs before public releases.
Jailbreaks
DAN, AIM, SWITCH, role-play, token manipulation
Safety Bypasses
Refusal suppression, context hijacking
Toxicity
Hate speech, NSFW, harmful instruction elicitation
Data Extraction
PII leakage, training data extraction, prompt leakage
Multi-Agent Systems
A multi-agent system built on LangGraph that automates first-line SOC analyst tasks: ingesting alerts from a SIEM, correlating with threat intelligence feeds, generating plain-language incident summaries, suggesting investigation playbook steps, and producing formatted analyst reports.
Research
Faithful reproductions of 3–5 landmark papers in adversarial ML and AI security, each with annotated code, original result comparisons, and commentary on what holds up in 2026. Demonstrates research literacy, technical depth, and the ability to translate papers into working systems.
Paper #1
Goodfellow et al., 2015
"Explaining and Harnessing Adversarial Examples" — FGSM
Paper #2
Madry et al., 2018
"Towards Deep Learning Models Resistant to Adversarial Attacks" — PGD
Paper #3
Moosavi-Dezfooli, 2017
"Universal Adversarial Perturbations"