After following my last tutorial, you’ve printed your first SDO to the console - congratulations. Now you need to keep things: reload them tomorrow, link them, query them, and ship bundles without turning your workstation into a JSON graveyard.
This post is the practical guide I wish I’d had on day two: how to persist and query STIX using the stix2 datastore APIs, when to use MemoryStore, FileSystemStore, or a database backend like ArangoDB, and how to design a setup that scales beyond your laptop.
Choosing where to store your STIX data
Let’s start with the three main options developers actually use.
1. MemoryStore
- Use when: running tests, prototyping, or building fast ETL pipelines.
- Pros: ultra-fast, no setup, perfect for temporary datasets.
- Cons: volatile — data disappears when your script exits.
Best suited for:
- CI pipelines that generate and validate STIX on the fly.
- Transient transformations (e.g. parsing feeds into bundles).
2. FileSystemStore
- Use when: you need persistent, local storage — but not a full database.
- Pros: version-aware folder structure, portable, simple to sync or ship.
- Cons: slower with very large datasets; no advanced graph querying.
Best suited for:
- Developer workspaces
- Simple pipelines where JSON files are acceptable artifacts
3. Databases (e.g. ArangoDB)
- Use when: you want database-backed STIX with powerful graph queries.
- Pros: native graph storage, query language (AQL), fast relationship traversal, scales horizontally.
- Cons: requires setup and server management.
Best suited for:
- Production systems
- Threat intel platforms
- Environments with large STIX datasets that need complex querying
1. MemoryStore: Fast, Lightweight, and Disposable
When you just want to experiment, run tests, or build quick pipelines, MemoryStore is your friend.
It keeps everything in memory — no files, no databases, no setup.
When to use it:
- Unit tests and CI pipelines
- ETL or enrichment scripts that build STIX bundles on the fly
- Quick validation or conversion utilities
Think of it as a scratchpad for STIX.
Example: Creating and querying objects
memory_store_example.py
# python3 memory_store_example.py
from stix2 import MemoryStore, AttackPattern, ThreatActor, Relationship, TLP_GREEN
# Initialize a MemoryStore (everything is stored in memory)
ms = MemoryStore()
# Create STIX objects
ta = ThreatActor(
    name="APT-404",
    threat_actor_types=["state-sponsored"],
    created_by_ref="identity--9779a2db-f98c-5f4b-8d08-8ee04e02dbb5",
    object_marking_refs=[TLP_GREEN],
)
ap = AttackPattern(
    name="Spear Phishing",
    created_by_ref="identity--9779a2db-f98c-5f4b-8d08-8ee04e02dbb5",
    object_marking_refs=[TLP_GREEN],
)
rel = Relationship(relationship_type="uses", source_ref=ta, target_ref=ap)
# Add all objects to the MemoryStore
ms.add([ta, ap, rel])
# Query objects
from stix2 import Filter
relationships = ms.query([Filter("type", "=", "relationship")])
print(relationships[0]["relationship_type"]) # uses
What happens:
- Objects are created and stored entirely in RAM.
- Once your script exits, they’re gone — but that’s often all you need.
💡 Tip: You can easily convert MemoryStore contents into a STIX Bundle for export:
from stix2 import Bundle
bundle = Bundle(ms.query([]))
print(bundle.serialize(pretty=True))
2. FileSystemStore: Durable, Local, and Developer-Friendly
When your STIX data needs to persist, but you don’t want the overhead of a full database, use the FileSystemStore.
It stores STIX objects on disk in a structured, version-aware folder layout — perfect for local projects or air-gapped systems.
When to use it
- Developer workspaces and prototypes
- Offline environments
- Simple STIX repositories
- Sharing datasets between systems or people
Example: Writing and reading from disk
# python3 filesystem_store_example.py
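import os
from stix2 import Identity, FileSystemStore
# The store's root directory must exist before the store is created;
# the per-type subfolders are then created automatically
os.makedirs("tmp/stix2_store", exist_ok=True)
fs = FileSystemStore("tmp/stix2_store")
identity = Identity(
    identity_class="organization",
    name="Example Corp.",
)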
fs.add([identity])
After running this, your store will look like this:
tmp/stix2_store/
└── identity/
    └── identity--f1d3e7e1-1234-4b6c-8aaf-00bb55ee7d12/
        └── 2025-06-23T09:00:00.000Z.json
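Because the layout is plain folders, you can inspect it without the stix2 library at all. The following is a minimal sketch that assumes every object follows the three-level `<type>/<object id>/<version>.json` layout shown above; anything stored differently would not be picked up. It builds a tiny stand-in store in a temp directory, with placeholder file names, so that it runs on its own.

```python
import json
import tempfile
from pathlib import Path


def list_versions(store_root):
    """Map each object ID to its version files, assuming the
    <type>/<object id>/<version>.json layout."""
    versions = {}
    for path in sorted(Path(store_root).glob("*/*/*.json")):
        versions.setdefault(path.parent.name, []).append(path.name)
    return versions


# Build a tiny stand-in store so the sketch is self-contained.
root = Path(tempfile.mkdtemp())
obj_id = "identity--f1d3e7e1-1234-4b6c-8aaf-00bb55ee7d12"
obj_dir = root / "identity" / obj_id
obj_dir.mkdir(parents=True)
for version in ("v1", "v2"):  # placeholder file names, not real timestamps
    (obj_dir / f"{version}.json").write_text(
        json.dumps({"type": "identity", "id": obj_id, "name": "Example Corp."})
    )

for object_id, files in list_versions(root).items():
    print(object_id, files)
```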
Example: Querying by property
# python3 query_filesystem.py
from stix2 import FileSystemStore, Filter
fs = FileSystemStore("tmp/stix2_store")
filter_by_name = Filter("name", "=", "Example Corp.")
results = fs.query([filter_by_name])
for obj in results:
    print(obj["type"], obj["name"])
Example: Following relationships
The FileSystemStore supports a simple graph traversal API:
related = fs.related_to("attack-pattern--b2c77df1-7aac-4b02-bdf1-6e71cb023d61")
for obj in related:
    print(obj["type"], obj.get("name", obj.get("relationship_type")))
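Conceptually, a one-hop lookup like this amounts to scanning relationship objects for the given ID on either end and collecting whatever sits on the other end. The sketch below shows that idea over plain dicts. It is an illustration, not the library's implementation, and the IDs are made up.

```python
def related_ids(objects, object_id):
    """Return the IDs connected to object_id by any relationship object."""
    found = set()
    for obj in objects:
        if obj.get("type") != "relationship":
            continue
        if obj["source_ref"] == object_id:
            found.add(obj["target_ref"])
        elif obj["target_ref"] == object_id:
            found.add(obj["source_ref"])
    return found


# Made-up objects, trimmed to the properties the sketch needs.
objects = [
    {"type": "threat-actor", "id": "threat-actor--1"},
    {"type": "attack-pattern", "id": "attack-pattern--1"},
    {"type": "malware", "id": "malware--1"},
    {"type": "relationship", "id": "relationship--1", "relationship_type": "uses",
     "source_ref": "threat-actor--1", "target_ref": "attack-pattern--1"},
    {"type": "relationship", "id": "relationship--2", "relationship_type": "uses",
     "source_ref": "threat-actor--1", "target_ref": "malware--1"},
]

print(sorted(related_ids(objects, "threat-actor--1")))
print(sorted(related_ids(objects, "malware--1")))
```

A linear scan like this is fine for a small local store. It is also the reason large datasets are better served by a backend that indexes relationships.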
Why developers love it
- It’s just files: you can git commit, scp, or rsync your entire store.
- Each version of an object gets its own JSON file.
- Easy to inspect manually or diff in version control.
💡 Tip: When you’re ready to share, bundle the objects into one JSON:
from stix2 import Bundle
bundle = Bundle(fs.query([]))
with open("bundle.json", "w") as f:
    f.write(bundle.serialize(pretty=True))
3. Databases: Scalable, Queryable, and Production-Ready
When you need real querying power — cross-object relationships, graph traversal, and high performance — it’s time to move to a database.
ArangoDB is a perfect fit for STIX because STIX data is fundamentally graph-shaped. Threat intel isn’t just flat objects—it’s relationships: Malware uses Infrastructure, Threat Actor targets Sector, Indicator indicates Campaign, and so on.
Most traditional databases struggle to query and traverse these relationships efficiently, but ArangoDB is a native multi-model database built for this kind of connected data. With STIX, every object already has globally unique IDs and relationships (source_ref, target_ref) that map naturally into graph edges. ArangoDB treats these not as awkward JOINs, but first-class graph queries using AQL.
On top of that, STIX data also benefits from flexible schema storage, and ArangoDB’s JSON-first document model is ideal. STIX objects evolve over time (versioning, extensions, added properties), which breaks rigid SQL schemas. With ArangoDB you can store each STIX SDO/SCO/SRO as-is, with powerful filtering, indexing, and deduplication support. Need to pivot from an IP → Domain → Malware → Threat Actor → Campaign chain in milliseconds? That’s why graph databases exist, and why ArangoDB handles STIX like a native language.
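To make the mapping concrete: ArangoDB edge documents carry `_from` and `_to` attributes that point at vertex documents using `collection/key` handles. Here is a minimal sketch of turning a STIX relationship into such an edge. It assumes each vertex is keyed by its STIX ID, which is a simplification for illustration and not necessarily how stix2arango keys its documents. The function name and the sample IDs are hypothetical.

```python
def to_edge_document(relationship, vertex_collection):
    """Build an ArangoDB-style edge document from a STIX relationship dict.

    Assumption: vertices are stored with their STIX ID as the document key.
    """
    return {
        "_key": relationship["id"],
        "_from": f"{vertex_collection}/{relationship['source_ref']}",
        "_to": f"{vertex_collection}/{relationship['target_ref']}",
        "relationship_type": relationship["relationship_type"],
    }


# A made-up relationship, trimmed to the properties the sketch needs.
sro = {
    "type": "relationship",
    "id": "relationship--11111111-1111-4111-8111-111111111111",
    "relationship_type": "uses",
    "source_ref": "threat-actor--22222222-2222-4222-8222-222222222222",
    "target_ref": "malware--33333333-3333-4333-8333-333333333333",
}

edge = to_edge_document(sro, "stix_vertices")  # hypothetical collection name
print(edge["_from"], "->", edge["_to"])
```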
When to use it
- You’re building a threat intelligence platform or enrichment engine.
- You need to run queries across linked data (e.g., “which threat actors use this malware?”).
- You’re managing millions of STIX objects and relationships.
Example: Storing STIX objects in ArangoDB
stix2arango is a command line tool we built as a proof-of-concept to insert STIX bundles into ArangoDB.
You can read the installation guide for all the available options, but to illustrate how it works, here is an example command to import the MITRE ATT&CK Enterprise 15.1 dataset:
python3 stix2arango.py \
--file enterprise-attack-15_1.json \
--database stix2arango_demo \
--collection mitre_attack_enterprise \
--stix2arango_note v15.1 \
--ignore_embedded_relationships false \
--is_large_file
The following AQL query would return all current MITRE ATT&CK Technique STIX objects:
FOR doc IN mitre_attack_enterprise_vertex_collection
  FILTER doc._stix2arango_note == "v15.1"
    AND doc.type == "attack-pattern"
    AND doc.x_mitre_is_subtechnique != true
    AND doc.x_mitre_deprecated != true
    AND doc.revoked != true
    AND LENGTH(doc.external_references[* FILTER
      CURRENT.source_name == "mitre-attack" AND STARTS_WITH(CURRENT.external_id, "T")
    ]) > 0
  RETURN {
    id: doc.id,
    name: doc.name,
    description: doc.description,
    external_ids: (
      FOR r IN doc.external_references
        FILTER r.source_name == "mitre-attack" AND STARTS_WITH(r.external_id, "T")
        RETURN r.external_id
    )
  }
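If you want to sanity-check a query like this against the raw bundle, the same conditions are easy to express in plain Python. The sketch below mirrors the filter (current, top-level techniques with a `mitre-attack` external ID starting with "T") over a heavily trimmed sample bundle. The "Old Technique" entry and its ID are invented for the example.

```python
def current_techniques(bundle):
    """Return (external_id, name) pairs for current top-level techniques."""
    results = []
    for obj in bundle.get("objects", []):
        if obj.get("type") != "attack-pattern":
            continue
        # Mirror the `!= true` checks: a missing property counts as "not set".
        if (obj.get("x_mitre_is_subtechnique") is True
                or obj.get("x_mitre_deprecated") is True
                or obj.get("revoked") is True):
            continue
        for ref in obj.get("external_references", []):
            if (ref.get("source_name") == "mitre-attack"
                    and ref.get("external_id", "").startswith("T")):
                results.append((ref["external_id"], obj["name"]))
    return results


# A trimmed sample bundle for illustration only.
sample_bundle = {
    "type": "bundle",
    "objects": [
        {"type": "attack-pattern", "name": "Active Scanning",
         "external_references": [{"source_name": "mitre-attack", "external_id": "T1595"}]},
        {"type": "attack-pattern", "name": "Old Technique", "revoked": True,
         "external_references": [{"source_name": "mitre-attack", "external_id": "T0000"}]},
        {"type": "intrusion-set", "name": "APT29",
         "external_references": [{"source_name": "mitre-attack", "external_id": "G0016"}]},
    ],
}

print(current_techniques(sample_bundle))
```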
Where it becomes useful is in traversing relationships between objects.
For example, you might want to uncover all objects linked to a particular ATT&CK Group (in this case G0016):
FOR doc IN mitre_attack_enterprise_vertex_collection
  FILTER doc._stix2arango_note == "v15.1"
    AND doc.type == "intrusion-set"
    AND doc.x_mitre_deprecated != true
    AND doc.revoked != true
    AND LENGTH(doc.external_references[* FILTER
      CURRENT.source_name == "mitre-attack" AND CURRENT.external_id == "G0016"
    ]) > 0
  FOR v, e, p IN 1..2 ANY doc GRAPH "mitre_attack_enterprise_graph"
    RETURN {
      intrusion_set: doc.name,
      related: v.name,
      relationship: e.relationship_type,
      source: e.source_ref,
      target: e.target_ref
    }
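When you run lookups like this from code, it is usually safer to pass the group ID and collection as AQL bind parameters than to paste them into the query string. The following sketch covers only the vertex lookup part of the traversal above. The helper name, the default collection name, and the idea of returning a `(query, bind_vars)` pair are assumptions made for illustration.

```python
GROUP_LOOKUP_AQL = """
FOR doc IN @@collection
  FILTER doc._stix2arango_note == @note
    AND doc.type == "intrusion-set"
    AND LENGTH(doc.external_references[* FILTER
      CURRENT.source_name == "mitre-attack" AND CURRENT.external_id == @group_id
    ]) > 0
  RETURN doc.id
"""


def group_lookup(group_id, note,
                 collection="mitre_attack_enterprise_vertex_collection"):
    """Return the query text plus bind variables for one ATT&CK group ID."""
    bind_vars = {
        # Collection bind parameters are written @@name in the query
        # and keyed as "@name" in the bind variables.
        "@collection": collection,
        "note": note,
        "group_id": group_id,
    }
    return GROUP_LOOKUP_AQL, bind_vars


query, bind_vars = group_lookup("G0016", "v15.1")
print(bind_vars)
```

With the python-arango driver, a pair like this could be passed as `db.aql.execute(query, bind_vars=bind_vars)`, where `db` is a database handle you have already opened.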
Why it’s worth it
- Full graph traversal across SDOs, SCOs, and SROs
- Complex queries via ArangoDB’s AQL
- Horizontal scaling and indexing
- Integrates seamlessly with existing stix2 objects
💡 Tip: For production systems, use ArangoDB’s SmartGraph or Enterprise Graph features for even faster lookups across large datasets.
Choosing the Right Store
| Use Case | Best Option | Why |
|---|---|---|
| Unit tests / quick scripts | MemoryStore | Zero setup, fast, disposable |
| Local or offline use | FileSystemStore | Durable and portable |
| Production systems | Database (ArangoDB) | Graph-native, scalable, queryable |
| Feed ingestion or ETL | Database (ArangoDB) | Handles large parallel inserts |
| Research or prototyping | FileSystemStore | Simplicity wins |
TL;DR
- MemoryStore — temporary, lightweight, great for testing and automation.
- FileSystemStore — local, durable, human-readable, ideal for small-medium datasets.
- ArangoDB (via stix2arango) — database-backed, graph-powered, built for scale.
Start simple with MemoryStore.
Persist with FileSystemStore.
Scale with stix2arango.
