Visual graph classification for blockchain security: Experiences fine-tuning Qwen2-VL on AMD MI300X [D]

Hi everyone,

I’ve been working on a computer vision approach to a specific security problem in the "Agentic Economy": identifying malicious transaction patterns that are mathematically obfuscated but topologically distinct.

The Problem

Traditional rule-based security engines, and even standard GNNs, often struggle with "splitting attacks," where a high-value transaction is fragmented into thousands of micro-transactions to slip under statistical thresholds. However, when these flows are projected as a 2D graph topology, they exhibit distinctive adversarial signatures (star patterns, centralized hubs, mixing chains).
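To make the "topologically distinct" claim concrete, here is a minimal sketch of what a rendered DRAIN_STAR pattern looks like, using the same NetworkX + Matplotlib stack mentioned below. The class name comes from the post; the function name, layout, and rendering parameters are my assumptions, not the actual Dogon-10K generator:

```python
import networkx as nx
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

def render_drain_star(n_leaves=50, path="drain_star.png"):
    """Render a hypothetical DRAIN_STAR pattern: one hub wallet receiving
    micro-transactions from many leaf wallets. Parameters are illustrative,
    not the actual Dogon-10K generator settings."""
    g = nx.star_graph(n_leaves)          # node 0 is the central hub
    pos = nx.spring_layout(g, seed=42)   # fixed seed for reproducible layout
    nx.draw(g, pos, node_size=30, width=0.5)
    plt.savefig(path, dpi=100)
    plt.close()
    return g

g = render_drain_star()
```

Even with thousands of micro-transactions, this shape is visually unmistakable: the hub's degree dwarfs every leaf's, which is exactly the signal the thresholds-based rules miss.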

The Approach: VLM for Graph Classification

Instead of relying on graph embeddings, I’ve experimented with a Vision-Language approach using Qwen2-VL-2B-Instruct. The intuition is that VLMs are increasingly capable of recognizing structural relationships in 2D layouts.

Technical Specs:

  • Base Model: Qwen2-VL-2B-Instruct.
  • Fine-tuning: LoRA (r=16, alpha=32) targeting the attention projections (q_proj, k_proj, v_proj, o_proj).
  • Dataset (Dogon-10K): I generated 10,000 synthetic transaction graph images using NetworkX and Matplotlib. The dataset covers four classes: NORMAL, DRAIN_STAR, MIXING_CHAIN, and COORDINATED_CLUSTER.
  • Hardware / Stack: Trained on an AMD MI300X using the ROCm stack. This was a great opportunity to stress-test PEFT/TRL on AMD hardware for vision-centric tasks.

Why VLM over GNN?

While GNNs are the standard for graph data, the "image-based" approach allowed for faster prototyping of adversarial pattern recognition without the complexity of building a custom graph auto-encoder for every new chain's schema. The VLM’s ability to interpret "visual intent" proved highly effective at distinguishing a decentralized organic ecosystem from a coordinated sybil attack.

Model & Code

The LoRA weights are available on Hugging Face for anyone interested in testing visual graph classification: 🔗 Hugging Face: https://huggingface.co/Ibonon/imina_na_lora
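If you want to try the adapter, the inference prompt is just the standard Qwen2-VL chat-message format: an image plus an instruction constraining the answer to the four labels. The label set is from the post; the exact prompt wording below is an assumption on my part:

```python
# The four Dogon-10K classes from the post.
LABELS = ["NORMAL", "DRAIN_STAR", "MIXING_CHAIN", "COORDINATED_CLUSTER"]

def build_messages(image_path):
    """Build a Qwen2-VL chat-template message: one image plus a
    classification instruction. Prompt wording is illustrative."""
    return [{
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text",
             "text": "Classify this transaction graph as one of: "
                     + ", ".join(LABELS) + ". Answer with the label only."},
        ],
    }]

messages = build_messages("drain_star.png")
```

At inference time you'd pass this through the model's `AutoProcessor` chat template, with the adapter loaded on top of the base checkpoint via `PeftModel.from_pretrained`.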

The full source code for the inference engine and the Dogon dataset generator is currently being cleaned up. 🔗 GitHub: [Under Construction]

I’m particularly interested in hearing if anyone else is using VLMs for visual anomaly detection in abstract data structures (like graphs or network logs).

submitted by /u/Any_Good_2682