Projects — Paul Turner AI Security

Phase 1

Core AI Security Foundations

Months 1–2

LLM Security

LLM Prompt Injection Defense Toolkit

In Progress

Summary

A production-quality Python library providing defensive patterns, semantic filters, and evaluation scripts that detect and mitigate prompt injection attacks across LLM API providers. Covers direct injection, indirect injection via tool outputs, and jailbreak-as-injection hybrids.

Architecture

Input ──► [Pre-LLM Sanitizer] │ ├─ Keyword block list │ ├─ Semantic similarity filter (embeddings) │ └─ Regex-based pattern detector ▼ [LLM API Gateway] │ ▼ [Post-Response Auditor] │ ├─ Instruction-following verifier │ └─ Data-leakage classifier ▼ Output ◄── [Safe Response]

Threat Model

Threat	Severity	Mitigation
Direct prompt injection via user input	HIGH	Semantic filter + system prompt hardening
Indirect injection via tool/RAG outputs	HIGH	Output sanitization before re-injecting into context
Jailbreak via role-play framing	MEDIUM	Instruction-following verification post-response
System prompt leakage	MEDIUM	Post-response classifier for sensitive content

Skills Demonstrated

LLM security engineering

Prompt injection mitigation

Secure AI app design

Evaluation framework design

Python library packaging

Stack

Python LangChain OpenAI Sentence-Transformers FastAPI pytest

View on GitHub → Research Notes →

AI Cyber Defense

AI-Powered Log Analyzer

In Progress

A tool that ingests raw log data (syslog, Windows Event Log, cloud audit logs), builds a searchable vector index, and uses an LLM to summarize anomalies, flag suspicious behavior patterns, and generate analyst-ready incident summaries. Zero rules required — pure semantic detection.

Architecture

Log Sources ──► [Ingest Pipeline] (Syslog/CEF) │ ├─ Parse & normalize │ └─ Chunk & embed ▼ [Vector Store (Chroma)] │ ▼ [RAG Query Engine] │ ├─ Semantic anomaly search │ └─ Temporal correlation ▼ [LLM Summarizer] │ ▼ [Report Output] (Markdown / JSON / HTML)

PythonChromaDBLangChain OpenAIPandasStreamlit

Skills Demonstrated

AI-powered cyber defense

RAG pipeline engineering

Vector search at scale

Threat detection patterns

Security data engineering

View on GitHub →

Phase 2

Adversarial ML & Secure AI Engineering

Months 3–5

Adversarial ML

Adversarial Attack Playground

Planned

An interactive research environment implementing FGSM, PGD, Carlini-Wagner, and DeepFool attacks in PyTorch, with rich visualizations of decision boundaries, perturbation norms, and model robustness under attack. Designed for both learning and rigorous experimentation.

PyTorchARTFoolbox MatplotlibJupyterW&B

Research Note

Includes reproductions of Goodfellow et al. 2015 (FGSM), Madry et al. 2018 (PGD), and Moosavi-Dezfooli 2017 (Universal Perturbations) with annotated notebooks and original result comparisons.

Skills Demonstrated

Adversarial ML implementation

Model robustness evaluation

Research communication

PyTorch deep learning

Experimental methodology

View on GitHub →

Secure AI Engineering

Secure MLOps Template

Planned

A production-ready GitHub template repository for ML projects that bakes in security from day one: automated model scanning on every PR, dependency auditing, secrets management via SOPS, signed model artifacts with provenance tracking, and container security.

What's Included

GitHub Actions CI with Trivy, Bandit, and Safety scans

SOPS-encrypted secrets for training/inference credentials

Model weight hashing + signed manifest generation

Docker image hardening (non-root, read-only filesystem)

Dependency pinning + SBOM generation

GitHub ActionsDockerSOPS TrivySigstorePython

Skills Demonstrated

Secure AI deployment

MLOps pipeline design

Supply-chain security

DevSecOps practices

Container hardening

View on GitHub →

AI Threat Modeling

AI Threat Modeling Framework

Planned

A structured threat modeling framework for AI/ML systems, inspired by MITRE ATLAS and STRIDE but tailored to the unique attack surface of modern AI: training data, model weights, inference APIs, and the human-AI interaction layer. Includes worked examples and a reusable questionnaire.

MITRE ATLASSTRIDENIST AI RMF Draw.ioMarkdown

Framework Scope

Covers 6 AI attack surfaces: training data poisoning, model supply chain, inference evasion, prompt injection, model inversion, and membership inference. Each surface includes threat enumeration, severity scoring, and countermeasure mapping.

Skills Demonstrated

AI threat modeling

Systems security thinking

Architecture design

MITRE ATLAS fluency

Security documentation

View on GitHub →

Phase 3

High-Impact, Employer-Ready Work

Months 6–8

AI Red Teaming

LLM Red Teaming Evaluation Suite

Planned

A comprehensive automated testing framework that probes LLMs across five attack categories, produces structured evaluation reports, and tracks safety regression over model versions. Designed to mirror the red-teaming workflows used at frontier AI labs before public releases.

Attack Coverage

Jailbreaks

DAN, AIM, SWITCH, role-play, token manipulation

Safety Bypasses

Refusal suppression, context hijacking

Toxicity

Hate speech, NSFW, harmful instruction elicitation

Data Extraction

PII leakage, training data extraction, prompt leakage

PythonPyRITGarak OpenAI EvalsLLM-as-JudgeFastAPI

Skills Demonstrated

AI red teaming methodology

Evaluation framework design

AI safety engineering

Automated testing pipelines

Structured reporting

View on GitHub →

Multi-Agent Systems

AI-Powered SOC Assistant

Planned

A multi-agent system built on LangGraph that automates first-line SOC analyst tasks: ingesting alerts from a SIEM, correlating with threat intelligence feeds, generating plain-language incident summaries, suggesting investigation playbook steps, and producing formatted analyst reports.

Agent Graph

SIEM Alert ──► [Triage Agent] │ Classify severity │ Enrich with threat intel ▼ [Investigation Agent] │ Correlate events │ Query asset inventory │ Suggest MITRE ATT&CK TTPs ▼ [Report Agent] │ Generate narrative summary │ Suggest next actions │ Format for analyst review ▼ Analyst-Ready Incident Report

LangGraphLangChainOpenAI ElasticPythonFastAPI

Skills Demonstrated

Multi-agent system design

AI-assisted cybersecurity

Real-world SOC workflows

LangGraph orchestration

SIEM integration

View on GitHub →

Research

Research Paper Reproductions

Planned

Faithful reproductions of 3–5 landmark papers in adversarial ML and AI security, each with annotated code, original result comparisons, and commentary on what holds up in 2026. Demonstrates research literacy, technical depth, and the ability to translate papers into working systems.

Paper #1

Goodfellow et al., 2015

"Explaining and Harnessing Adversarial Examples" — FGSM

Paper #2

Madry et al., 2018

"Towards Deep Learning Models Resistant to Adversarial Attacks" — PGD

Paper #3

Moosavi-Dezfooli, 2017

"Universal Adversarial Perturbations"

PyTorchJupyterCIFAR-10ImageNet

8 AI Security Projects

Core AI Security Foundations

LLM Prompt Injection Defense Toolkit

Summary

Architecture

Threat Model

Skills Demonstrated

Stack

AI-Powered Log Analyzer

Architecture

Skills Demonstrated

Adversarial ML & Secure AI Engineering

Adversarial Attack Playground

Skills Demonstrated

Secure MLOps Template

What's Included

Skills Demonstrated

AI Threat Modeling Framework

Skills Demonstrated

High-Impact, Employer-Ready Work

LLM Red Teaming Evaluation Suite

Attack Coverage

Skills Demonstrated

AI-Powered SOC Assistant

Agent Graph

Skills Demonstrated

Research Paper Reproductions