Update: the open-source 62K multimodal prompt injection dataset now has GCG suffixes, multi-turn orchestration, indirect injection, tool abuse, and more (v2 + v3 added overnight)

Posted here yesterday about the v1 cross-modal dataset. One of you suggested adding GCG adversarial suffixes and multi-turn attack coverage. That feedback turned into v2 and v3 being built and shipped within 24 hours. The dataset has gone from 47K to 62K samples.

HuggingFace: https://huggingface.co/datasets/Bordair/bordair-multimodal GitHub: https://github.com/Josh-blythe/bordair-multimodal-v1/ MIT licensed.

The repo's also picked up early interest from engineers at NVIDIA, PayPal, NetApp, and AUGMXNT (based on GitHub stars), which is a good signal that this is hitting the right audience.

What's new since yesterday:

v2: 14,358 samples (the stuff you asked for) - 162 PyRIT jailbreak templates x 50 seeds. Covers DAN variants, Pliny model-specific jailbreaks (Claude, GPT, Gemini, Llama, DeepSeek), roleplay, authority impersonation - 2,400 GCG adversarial suffix samples. Includes a nanoGCG generator you can point at your own local model:

bash python generate_v2_pyrit.py --gcg-model lmsys/vicuna-7b-v1.5 --gcg-steps 250

Swap in whatever you're running locally, get suffixes tuned to its specific vulnerabilities.

1,656 AutoDAN fluent wrappers. These are the human-readable jailbreaks that perplexity filters miss entirely
13 encoding converters (base64, ROT13, leetspeak, morse, NATO phonetic, etc.) x 138 seeds
Multi-turn: Crescendo 6-turn escalation, PAIR iterative refinement, TAP tree-search, Skeleton Key, many-shot (10/25/50/100-shot)
152 ensemble samples combining multi-turn final turns + GCG suffixes (near-100% ASR on frontier models per Andriushchenko et al. 2024)

v3: 187 samples covering gaps in v1 and v2 Indirect injection (RAG poisoning, email/calendar/API response manipulation), system prompt extraction, tool/function-call injection, agent CoT manipulation, structured data attacks (JSON/XML/CSV/YAML), code-switching between languages mid-sentence, homoglyph/Unicode tricks, QR/barcode injection, ASCII art bypass.

The v3 categories are specifically the real-world attack surfaces that existing datasets underrepresent. If you're running a RAG pipeline or an agent with tool access, the indirect injection and tool-call samples are worth looking at.

v1 is unchanged from yesterday: 47,518 cross-modal samples 23,759 attacks across text+image, text+document, text+audio, triple, and quad modality combos. 23,759 benign matched 1:1 by modality with edge cases like .gitignore config and heart bypass surgery to stress-test false positives.

Quick start hasn't changed:

```python import json from pathlib import Path

all_attacks = [] for version_dir in ["payloads", "payloads_v2", "payloads_v3"]: for cat_dir in Path(version_dir).iterdir(): if cat_dir.is_dir(): for f in sorted(cat_dir.glob("*.json")): all_attacks.extend(json.loads(f.read_text("utf-8")))

benign = [] for f in Path("benign").glob("multimodal_*.json"): benign.extend(json.loads(f.read_text("utf-8")))

expected_detection = true (attack) / false (benign)

```

Appreciate the feedback from yesterday. This is exactly how open-source is supposed to work. If there are other attack families or vectors you think are missing, let me know and I'll add them.

submitted by /u/BordairAPI
[link] [comments]

expected_detection = true (attack) / false (benign)

Leave a Comment