STIX Storage for Developers: Memory, Files, and Databases

After following my last tutorial, you’ve printed your first SDO to the console - congratulations. Now you need to keep things: reload them tomorrow, link them, query them, and ship bundles without turning your workstation into a JSON graveyard.

This post is the practical guide I wish I’d had on day two: how to persist and query STIX using the stix2 datastore APIs, when to use MemoryStore, FileSystemStore, or a database backend like ArangoDB, and how to design a setup that scales beyond your laptop.

Choosing where to store your STIX data

Let’s start with the three main options developers actually use.

1. MemoryStore

Use when: running tests, prototyping, or building fast ETL pipelines.
Pros: ultra-fast, no setup, perfect for temporary datasets.
Cons: volatile — data disappears when your script exits.

Best suited for:

CI pipelines that generate and validate STIX on the fly.
Transient transformations (e.g. parsing feeds into bundles).

2. FileSystemStore

Use when: you need persistent, local storage — but not a full database.
Pros: version-aware folder structure, portable, simple to sync or ship.
Cons: slower with very large datasets; no advanced graph querying.

Best suited for:

Developer workspaces
Simple pipelines where JSON files are acceptable artifacts

3. Databases (e.g. ArangoDB)

Use when: you want database-backed STIX with powerful graph queries.
Pros: native graph storage, query language (AQL), fast relationship traversal, scales horizontally.
Cons: requires setup and server management.

Best suited for:

Production systems
Threat intel platforms
Environments with large STIX datasets that need complex querying

1. MemoryStore: Fast, Lightweight, and Disposable

When you just want to experiment, run tests, or build quick pipelines, MemoryStore is your friend.

It keeps everything in memory — no files, no databases, no setup.

When to use it:

Unit tests and CI pipelines
ETL or enrichment scripts that build STIX bundles on the fly
Quick validation or conversion utilities

Think of it as a scratchpad for STIX.

Example: Creating and querying objects

memory_store_example.py

# python3 memory_store_example.py
from stix2 import MemoryStore, AttackPattern, ThreatActor, Relationship, TLP_GREEN

# Initialize a MemoryStore (everything is stored in memory)
ms = MemoryStore()

# Create STIX objects
ta = ThreatActor(
    name="APT-404",
    threat_actor_types=["state-sponsored"],
    created_by_ref="identity--9779a2db-f98c-5f4b-8d08-8ee04e02dbb5",
    object_marking_refs=[TLP_GREEN],
)

ap = AttackPattern(
    name="Spear Phishing",
    created_by_ref="identity--9779a2db-f98c-5f4b-8d08-8ee04e02dbb5",
    object_marking_refs=[TLP_GREEN],
)

rel = Relationship(relationship_type="uses", source_ref=ta, target_ref=ap)

# Add all objects to the MemoryStore
ms.add([ta, ap, rel])

# Query objects
from stix2 import Filter
relationships = ms.query([Filter("type", "=", "relationship")])
print(relationships[0]["relationship_type"])  # uses

What happens:

Objects are created and stored entirely in RAM.
Once your script exits, they’re gone — but that’s often all you need.

💡 Tip: You can easily convert MemoryStore contents into a STIX Bundle for export:

from stix2 import Bundle
bundle = Bundle(ms.query([]))
print(bundle.serialize(pretty=True))

2. FileSystemStore: Durable, Local, and Developer-Friendly

When your STIX data needs to persist, but you don’t want the overhead of a full database, use the FileSystemStore.

It stores STIX objects on disk in a structured, version-aware folder layout — perfect for local projects or air-gapped systems.

When to use it

Developer workspaces and prototypes
Offline environments
Simple STIX repositories
Sharing datasets between systems or people

Example: Writing and reading from disk

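# python3 filesystem_store_example.py
import os

from stix2 import Identity, FileSystemStore

# The root directory must already exist (FileSystemStore raises an error otherwise);
# the per-type subfolders are created automatically when objects are added.
os.makedirs("tmp/stix2_store", exist_ok=True)
fs = FileSystemStore("tmp/stix2_store")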

identity = Identity(
    identity_class="organization",
    name="Example Corp.",
)

fs.add([identity])

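After running this, your store will look something like this (your ID and timestamp will differ; each file is named after the object’s modified timestamp with the punctuation stripped out):

tmp/stix2_store/
└── identity/
    └── identity--f1d3e7e1-1234-4b6c-8aaf-00bb55ee7d12/
        └── 20250623090000000.json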

Example: Querying by property

# python3 query_filesystem.py
from stix2 import FileSystemStore, Filter

fs = FileSystemStore("tmp/stix2_store")
filter_by_name = Filter("name", "=", "Example Corp.")
results = fs.query([filter_by_name])

for obj in results:
    print(obj["type"], obj["name"])

Example: Following relationships

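Like the other stix2 datastores, FileSystemStore exposes a simple graph traversal API. related_to() takes an object (or just its ID) and returns the objects one relationship hop away: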

related = fs.related_to("attack-pattern--b2c77df1-7aac-4b02-bdf1-6e71cb023d61")
for obj in related:
    print(obj["type"], obj.get("name", obj.get("relationship_type")))

Why developers love it

It’s just files — you can git commit, scp, or rsync your entire store.
Each version of an object gets its own JSON file.
Easy to inspect manually or diff in version control.

💡 Tip: When you’re ready to share, bundle the objects into one JSON:

from stix2 import Bundle
bundle = Bundle(fs.query([]))
with open("bundle.json", "w") as f:
    f.write(bundle.serialize(pretty=True))

3. Databases: Scalable, Queryable, and Production-Ready

When you need real querying power — cross-object relationships, graph traversal, and high performance — it’s time to move to a database.

ArangoDB is a perfect fit for STIX because STIX data is fundamentally graph-shaped. Threat intel isn’t just flat objects—it’s relationships: Malware uses Infrastructure, Threat Actor targets Sector, Indicator indicates Campaign, and so on.

Most traditional databases struggle to query and traverse these relationships efficiently, but ArangoDB is a native multi-model database built for this kind of connected data. With STIX, every object already has globally unique IDs and relationships (source_ref, target_ref) that map naturally into graph edges. ArangoDB treats these not as awkward JOINs, but first-class graph queries using AQL.

On top of that, STIX data benefits from flexible schema storage, and ArangoDB’s JSON-first document model is a good match. STIX objects evolve over time (versioning, extensions, added properties), which breaks rigid SQL schemas. With ArangoDB you can store each STIX SDO/SCO/SRO as-is, with powerful filtering, indexing, and deduplication support. Need to pivot through an IP → Domain → Malware → Threat Actor → Campaign chain in milliseconds? That’s why graph databases exist, and why ArangoDB handles STIX like a native language.
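
To make the "relationships map naturally into graph edges" point concrete, here is a rough sketch of how a STIX relationship could be turned into an ArangoDB edge document. ArangoDB edges carry _from and _to handles of the form collection/key. The collection name stix_vertices, the use of the bare STIX ID as the document key, and the sample IDs are all assumptions made for this illustration; stix2arango’s actual key scheme may differ.

```python
def sro_to_edge(sro, vertex_collection="stix_vertices"):
    """Return a copy of a STIX relationship dict with ArangoDB edge handles added.

    Assumes (for this sketch only) that vertices are keyed by their STIX ID.
    """
    return {
        **sro,
        "_from": f"{vertex_collection}/{sro['source_ref']}",
        "_to": f"{vertex_collection}/{sro['target_ref']}",
    }


# A hand-written relationship with placeholder UUIDs.
relationship = {
    "type": "relationship",
    "id": "relationship--00000000-0000-4000-8000-000000000001",
    "relationship_type": "uses",
    "source_ref": "threat-actor--00000000-0000-4000-8000-000000000002",
    "target_ref": "malware--00000000-0000-4000-8000-000000000003",
}

edge = sro_to_edge(relationship)
print(edge["_from"])  # stix_vertices/threat-actor--00000000-0000-4000-8000-000000000002
print(edge["_to"])    # stix_vertices/malware--00000000-0000-4000-8000-000000000003
```

The STIX object itself is stored untouched; the two extra attributes are all the database needs to treat it as a traversable edge.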

When to use it

You’re building a threat intelligence platform or enrichment engine.
You need to run queries across linked data (e.g., “which threat actors use this malware?”).
You’re managing millions of STIX objects and relationships.

Example: Storing STIX objects in ArangoDB

stix2arango is a command line tool we built as a proof-of-concept to insert STIX bundles into ArangoDB.

You can read the installation guide for all the available options, but to illustrate how it works, here is an example command to import the MITRE ATT&CK Enterprise 15.1 dataset:

python3 stix2arango.py \
    --file enterprise-attack-15_1.json \
    --database stix2arango_demo \
    --collection mitre_attack_enterprise \
    --stix2arango_note v15.1 \
    --ignore_embedded_relationships false \
    --is_large_file

The following AQL query would return all current MITRE ATT&CK Technique STIX objects:

FOR doc IN mitre_attack_enterprise_vertex_collection
  FILTER doc._stix2arango_note == "v15.1"
    AND doc.type == "attack-pattern"
    AND doc.x_mitre_is_subtechnique != true
    AND doc.x_mitre_deprecated != true
    AND doc.revoked != true
    AND LENGTH(doc.external_references[* FILTER
          CURRENT.source_name == "mitre-attack"
          AND STARTS_WITH(CURRENT.external_id, "T")]) > 0
  RETURN {
    id: doc.id,
    name: doc.name,
    description: doc.description,
    external_ids: (
      FOR r IN doc.external_references
        FILTER r.source_name == "mitre-attack" AND STARTS_WITH(r.external_id, "T")
        RETURN r.external_id
    )
  }

Where it becomes really useful is in traversing relationships between objects.

For example, you might want to uncover all objects linked to a particular ATT&CK Group (in this case G0016):

FOR doc IN mitre_attack_enterprise_vertex_collection
  FILTER doc._stix2arango_note == "v15.1"
    AND doc.type == "intrusion-set"
    AND doc.x_mitre_deprecated != true
    AND doc.revoked != true
    AND LENGTH(doc.external_references[* FILTER
          CURRENT.source_name == "mitre-attack"
          AND CURRENT.external_id == "G0016"]) > 0
  FOR v, e, p IN 1..2 ANY doc GRAPH "mitre_attack_enterprise_graph"
    RETURN {
      intrusion_set: doc.name,
      related: v.name,
      relationship: e.relationship_type,
      source: e.source_ref,
      target: e.target_ref
    }
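
If you would rather run this kind of traversal from Python than from the ArangoDB web UI, bind variables keep the query reusable across groups and versions. The sketch below assumes db is a python-arango database handle (anything exposing db.aql.execute(query, bind_vars=...)); the function name find_group_links and its defaults are hypothetical, and the graph and collection names are simply the ones used in the queries above.

```python
# AQL with bind variables: @@vertices is a collection, @note and @group_id are values.
GROUP_LINKS_AQL = """
FOR doc IN @@vertices
  FILTER doc._stix2arango_note == @note
    AND doc.type == "intrusion-set"
    AND LENGTH(doc.external_references[* FILTER
          CURRENT.source_name == "mitre-attack"
          AND CURRENT.external_id == @group_id]) > 0
  FOR v, e IN 1..2 ANY doc GRAPH "mitre_attack_enterprise_graph"
    RETURN {related: v.name, relationship: e.relationship_type}
"""


def find_group_links(db, group_id, note="v15.1",
                     vertices="mitre_attack_enterprise_vertex_collection"):
    """Return objects within two hops of an ATT&CK group (hypothetical helper)."""
    cursor = db.aql.execute(
        GROUP_LINKS_AQL,
        bind_vars={"@vertices": vertices, "note": note, "group_id": group_id},
    )
    # The cursor is iterable; materialise it so callers get a plain list.
    return list(cursor)
```

With python-arango installed, db would typically come from ArangoClient(hosts=...).db(...), with the host, database name, and credentials depending on your deployment. Calling find_group_links(db, "G0016") then returns one dict per traversed vertex.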

Why it’s worth it

Full graph traversal across SDOs, SCOs, and SROs
Complex queries via ArangoDB’s AQL
Horizontal scaling and indexing
Integrates seamlessly with existing stix2 objects

💡 Tip: For clustered production deployments, look at ArangoDB’s SmartGraphs or EnterpriseGraphs (both Enterprise Edition features), which shard graph data so that traversals across large datasets stay fast.

Choosing the Right Store

Use Case	Best Option	Why
Unit tests / quick scripts	MemoryStore	Zero setup, fast, disposable
Local or offline use	FileSystemStore	Durable and portable
Production systems	Database (ArangoDB)	Graph-native, scalable, queryable
Feed ingestion or ETL	Database (ArangoDB)	Handles large parallel inserts
Research or prototyping	FileSystemStore	Simplicity wins

TL;DR

MemoryStore — temporary, lightweight, great for testing and automation.
FileSystemStore — local, durable, human-readable, ideal for small-medium datasets.
ArangoDB (via stix2arango) — database-backed, graph-powered, built for scale.

Start simple with MemoryStore.

Persist with FileSystemStore.

Scale with stix2arango.
