tl;dr

Take CVEs in STIX format. Take ATT&CK data in STIX format. Join them using STIX SROs. Search for CVEs by ATT&CK IDs.

Overview

I have previously described how CTI Butler joins data from different knowledge bases.

One of the features I’ve recently been working on is linking the 200k+ CVE objects in CTI Butler to ATT&CK (it already links them to Sigma Rules, CWEs and CPEs).

Imagine being able to begin a hunt by asking: which CVEs published in the last month are using the ATT&CK Enterprise Technique Content Injection (T1659)? Which CVEs are using PACEMAKER (S1109)? Or any of the other object types reported by ATT&CK.

Of course this not only allows you to pivot on CVEs using ATT&CK, but also any other information you might hold that is linked into MITRE ATT&CK (e.g. you might be tracking actors using ATT&CK tools, etc.).

The problem is: how do you take a CVE and link it to ATT&CK?

Getting CVE data

CTI Butler already holds all published CVEs in STIX 2.1 format (generated by cve2stix).

For this use case we only care about the Vulnerability objects (because these contain the most descriptive information about each CVE).

For example, to count all CVEs published in 2020:

FOR doc IN nvd_cve_vertex_collection
    FILTER doc.type == "vulnerability"
    AND doc.revoked != true
    AND doc.created >= "2020-01-01T00:00:00.000Z" AND doc.created <= "2020-12-31T23:59:59.999Z"
    COLLECT WITH COUNT INTO length
    RETURN length
[
  18304
]

For demo purposes, let’s look at a smaller subset: the descriptions of CVEs published in January 2020.

FOR doc IN nvd_cve_vertex_collection
    FILTER doc.type == "vulnerability"
    AND doc.revoked != true
    AND doc.created >= "2020-01-01T00:00:00.000Z" AND doc.created <= "2020-01-31T23:59:59.999Z"
    RETURN {
        id: doc.id,
        cve: doc.name,
        published: doc.created,
        modified: doc.modified,
        description: doc.description
        }

(This query matches more than 1,000 documents. Only 1,000 are returned per request, so you need to page through the results.)

[
  {
    "id": "vulnerability--7256211c-e7f8-521b-af2e-9575690af8f1",
    "cve": "CVE-2019-15985",
    "published": "2020-01-06T08:15:11.503Z",
    "modified": "2020-01-08T20:55:54.203Z",
    "description": "Multiple vulnerabilities in the REST and SOAP API endpoints of Cisco Data Center Network Manager (DCNM) could allow an authenticated, remote attacker to execute arbitrary SQL commands on an affected device. To exploit these vulnerabilities, an attacker would need administrative privileges on the DCNM application. For more information about these vulnerabilities, see the Details section of this advisory. Note: The severity of these vulnerabilities is aggravated by the vulnerabilities described in the Cisco Data Center Network Manager Authentication Bypass Vulnerabilities advisory, published simultaneously with this one."
  },
  {
    "id": "vulnerability--620f4cb1-67db-57bb-b7b0-81f6c1b45519",
    "cve": "CVE-2016-6587",
    "published": "2020-01-08T18:15:10.213Z",
    "modified": "2020-01-13T19:50:15.777Z",
    "description": "An Information Disclosure vulnerability exists in the mid.dat file stored on the SD card in Symantec Norton Mobile Security for Android before 3.16, which could let a local malicious user obtain sensitive information."
  },
  {
    "id": "vulnerability--a68f7180-2a48-520c-b027-19a4404e3eaa",
    "cve": "CVE-2015-5952",
    "published": "2020-01-15T17:15:13.537Z",
    "modified": "2020-01-22T15:28:32.637Z",
    "description": "Directory traversal vulnerability in Thomson Reuters for FATCA before 5.2 allows remote attackers to execute arbitrary files via the item parameter."
  },

Side note: you might be wondering why a CVE with the ID CVE-2015-5952 was published in 2020. I am not too sure either, but this is the published date reported by the NVD CVE API, which cve2stix uses to set the created time of the Vulnerability object.

curl --location 'https://services.nvd.nist.gov/rest/json/cves/2.0?cveId=CVE-2015-5952'
{
    "resultsPerPage": 1,
    "startIndex": 0,
    "totalResults": 1,
    "format": "NVD_CVE",
    "version": "2.0",
    "timestamp": "2024-06-15T15:07:46.420",
    "vulnerabilities": [
        {
            "cve": {
                "id": "CVE-2015-5952",
                "sourceIdentifier": "[email protected]",
                "published": "2020-01-15T17:15:13.537",
                "lastModified": "2020-01-22T15:28:32.637",

Back to the subject at hand…

Using an AI model to map descriptions to ATT&CK objects

Most description values are very short, though I am still confident a well-trained AI model will be more than capable of mapping them.

As a POC I’ll start with a generic model, GPT-4o, and the following prompt:

<CVE ID>
<CVE DESCRIPTION>

What MITRE ATT&CK concepts are being described in this text?

For each ATT&CK concept identified, print your response as only JSON in the following structure:

{
    "CVE_ID": {
        "detected_objects": [
            {
                "attack_id": "ID",
                "attack_name": "NAME",
                "confidence_score": "SCORE"
            },
            {
                "attack_id": "ID",
                "attack_name": "NAME",
                "confidence_score": "SCORE"
            }
        ]
    }
}

Where confidence score defines how sure you are this technique or subtechnique is being described in the text (between 0 [lowest] and 1 [highest]).
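Filling in the placeholders by hand gets old quickly, so here is a rough sketch of how the prompt could be built from a record returned by the CVE query. The template is abbreviated to a single example object, and the build_prompt name is my own for illustration. It assumes the record uses the cve and description keys from the query's RETURN.

```python
from string import Template

# Abbreviated version of the prompt (one example object instead of two).
PROMPT = Template("""$cve_id
$description

What MITRE ATT&CK concepts are being described in this text?

For each ATT&CK concept identified, print your response as only JSON in the following structure:

{
    "CVE_ID": {
        "detected_objects": [
            {
                "attack_id": "ID",
                "attack_name": "NAME",
                "confidence_score": "SCORE"
            }
        ]
    }
}

Where confidence score defines how sure you are this technique or subtechnique is being described in the text (between 0 [lowest] and 1 [highest]).""")


def build_prompt(cve: dict) -> str:
    """Fill the template from a record with `cve` and `description` keys."""
    return PROMPT.substitute(cve_id=cve["cve"], description=cve["description"])


cve = {
    "cve": "CVE-2016-6587",
    "description": (
        "An Information Disclosure vulnerability exists in the mid.dat file "
        "stored on the SD card in Symantec Norton Mobile Security for Android "
        "before 3.16, which could let a local malicious user obtain sensitive "
        "information."
    ),
}
example_prompt = build_prompt(cve)
```

string.Template is used instead of str.format so the JSON braces in the prompt do not need escaping.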

Let’s try the above prompt with the description of CVE-2016-6587:

An Information Disclosure vulnerability exists in the mid.dat file stored on the SD card in Symantec Norton Mobile Security for Android before 3.16, which could let a local malicious user obtain sensitive information.

GPT-4o returns:

{
    "CVE-2016-6587": {
        "detected_objects": [
        {
            "attack_id": "T1005",
            "attack_name": "Data from Local System",
            "confidence_score": 0.9
        },
        {
            "attack_id": "T1586",
            "attack_name": "Compromise Accounts",
            "confidence_score": 0.7
        },
        {
            "attack_id": "T1589",
            "attack_name": "Gather Victim Identity Information",
            "confidence_score": 0.7
        }
        ]
    }
}

At this point I’m not looking for model accuracy because the model can be swapped out easily as required – we have some much better locally trained models on ATT&CK data. There are also off-the-shelf models available to you, e.g. those that ship with TRAM.

The point is that the general concept works, and it also lets me set a confidence threshold in my code so that only high-confidence ATT&CK matches are allowed.
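As a rough idea of what that threshold check could look like, here is a small Python sketch that parses a model response in the structure requested by the prompt and keeps only the ATT&CK IDs at or above a cut-off. The function name and the default threshold are assumptions for illustration. It also assumes the response is valid JSON, which is not guaranteed with a generic model.

```python
import json

CONFIDENCE_THRESHOLD = 0.7  # assumed cut-off; tune it for your own model


def high_confidence_ids(raw_response: str,
                        threshold: float = CONFIDENCE_THRESHOLD) -> dict:
    """Map each CVE ID in the response to the ATT&CK IDs scoring >= threshold."""
    parsed = json.loads(raw_response)
    result = {}
    for cve_id, body in parsed.items():
        result[cve_id] = [
            obj["attack_id"]
            for obj in body.get("detected_objects", [])
            # float() tolerates scores returned as strings such as "0.7"
            if float(obj["confidence_score"]) >= threshold
        ]
    return result


response = json.dumps({
    "CVE-2016-6587": {
        "detected_objects": [
            {"attack_id": "T1005", "attack_name": "Data from Local System", "confidence_score": 0.9},
            {"attack_id": "T1586", "attack_name": "Compromise Accounts", "confidence_score": 0.7},
            {"attack_id": "T1589", "attack_name": "Gather Victim Identity Information", "confidence_score": 0.7},
        ]
    }
})

kept_default = high_confidence_ids(response)       # all three IDs pass at 0.7
kept_strict = high_confidence_ids(response, 0.8)   # only T1005 passes at 0.8
```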

If you want to make a public model more accurate

HOWEVER, I do appreciate that many reading this won’t have the resources, datasets, or time to train a model. This is where retrieval-augmented generation (RAG) can help.

To do this you can use CTI Butler to grab a list of all the relevant ATT&CK objects you want the AI to be aware of:

FOR doc IN mitre_attack_enterprise_vertex_collection
  FILTER doc._stix2arango_note != "automatically imported on collection creation"
  AND doc._stix2arango_note == "v15.1"
  AND (
    doc.type == "x-mitre-tactic" OR 
    doc.type == "attack-pattern" OR 
    doc.type == "course-of-action" OR 
    doc.type == "intrusion-set" OR 
    doc.type == "malware" OR 
    doc.type == "tool" OR 
    doc.type == "campaign" OR 
    doc.type == "x-mitre-data-source"
  )
  LET attack_id = (
    FOR extRef IN (FOR ref IN doc.external_references RETURN ref)
    FILTER extRef.source_name == "mitre-attack"
    LIMIT 1
    RETURN extRef.external_id
  )[0]
  SORT attack_id
  RETURN {
    attack_id: attack_id,
    name: doc.name,
    description: doc.description
  }

Returns;

[
  {
    "attack_id": "C0001",
    "name": "Frankenstein",
    "description": "[Frankenstein](https://attack.mitre.org/campaigns/C0001) was described by security researchers as a highly-targeted campaign conducted by moderately sophisticated and highly resourceful threat actors in early 2019. The unidentified actors primarily relied on open source tools, including [Empire](https://attack.mitre.org/software/S0363). The campaign name refers to the actors' ability to piece together several unrelated open-source tool components.(Citation: Talos Frankenstein June 2019)"
  },
  {
    "attack_id": "C0002",
    "name": "Night Dragon",
    "description": "[Night Dragon](https://attack.mitre.org/campaigns/C0002) was a cyber espionage campaign that targeted oil, energy, and petrochemical companies, along with individuals and executives in Kazakhstan, Taiwan, Greece, and the United States. The unidentified threat actors searched for information related to oil and gas field production systems, financials, and collected data from SCADA systems. Based on the observed techniques, tools, and network activities, security researchers assessed the campaign involved a threat group based in China.(Citation: McAfee Night Dragon)"
  },
  {
    "attack_id": "C0004",
    "name": "CostaRicto",
    "description": "[CostaRicto](https://attack.mitre.org/campaigns/C0004) was a suspected hacker-for-hire cyber espionage campaign that targeted multiple industries worldwide, with a large number being financial institutions. [CostaRicto](https://attack.mitre.org/campaigns/C0004) actors targeted organizations in Europe, the Americas, Asia, Australia, and Africa, with a large concentration in South Asia (especially India, Bangladesh, and Singapore), using custom malware, open source tools, and a complex network of proxies and SSH tunnels.(Citation: BlackBerry CostaRicto November 2020)"
  },
  {
    "attack_id": "C0005",
    "name": "Operation Spalax",
    "description": "[Operation Spalax](https://attack.mitre.org/campaigns/C0005) was a campaign that primarily targeted Colombian government organizations and private companies, particularly those associated with the energy and metallurgical industries. The [Operation Spalax](https://attack.mitre.org/campaigns/C0005) threat actors distributed commodity malware and tools using generic phishing topics related to COVID-19, banking, and law enforcement action. Security researchers noted indicators of compromise and some infrastructure overlaps with other campaigns dating back to April 2018, including at least one separately attributed to [APT-C-36](https://attack.mitre.org/groups/G0099), however identified enough differences to report this as separate, unattributed activity.(Citation: ESET Operation Spalax Jan 2021)  "
  },

In total there are 1991 ATT&CK objects returned by this query. You can of course trim it to only include the objects you care about, e.g. filtering on only doc.type == "attack-pattern" will return just Techniques and Subtechniques:

FOR doc IN mitre_attack_enterprise_vertex_collection
  FILTER doc._stix2arango_note != "automatically imported on collection creation"
  AND doc._stix2arango_note == "v15.1"
  AND (
    doc.type == "attack-pattern"
  )
  LET attack_id = (
    FOR extRef IN (FOR ref IN doc.external_references RETURN ref)
    FILTER extRef.source_name == "mitre-attack"
    LIMIT 1
    RETURN extRef.external_id
  )[0]
  SORT attack_id
  COLLECT WITH COUNT INTO length
    RETURN length
[
  780
]

You could then pass these records to the model as context, using a prompt along the lines of:

Here is a list of MITRE ATT&CK technique and sub-technique objects with their ID, name, and description.

<BATCH OF CTI BUTLER RECORDS>
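With 780 Techniques and Subtechniques, the records will probably need to be split across several prompts. Here is a minimal sketch of one way to batch them and render the context text. The batch size, the pipe-separated line format, and the function names are my own assumptions, and the descriptions are placeholders rather than real ATT&CK text.

```python
from typing import Iterator, List

HEADER = ("Here is a list of MITRE ATT&CK technique and sub-technique objects "
          "with their ID, name, and description.")


def batched(records: List[dict], size: int) -> Iterator[List[dict]]:
    """Split the records into consecutive batches of at most `size` items."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def render_context(batch: List[dict]) -> str:
    """Render one batch as the header followed by one line per record."""
    lines = [f'{r["attack_id"]} | {r["name"]} | {r["description"]}' for r in batch]
    return HEADER + "\n\n" + "\n".join(lines)


# Records shaped like the output of the CTI Butler query (placeholder descriptions).
records = [
    {"attack_id": "T1005", "name": "Data from Local System", "description": "(description)"},
    {"attack_id": "T1586", "name": "Compromise Accounts", "description": "(description)"},
    {"attack_id": "T1589", "name": "Gather Victim Identity Information", "description": "(description)"},
]

context_prompts = [render_context(b) for b in batched(records, size=2)]
```

In practice the batch size would be driven by the model's context window rather than a fixed record count.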

Linking AI response with MITRE ATT&CK STIX objects

So, to continue my proof of concept, let’s assume my confidence threshold is >= 0.7 (so all the entries returned earlier pass my filter). Using this information I can now look up the STIX objects for the detected ATT&CK IDs.

I can use a CTI Butler query to do this:

LET ATTACK_IDS = [
    "T1005",
    "T1586",
    "T1589"
]

LET enterprise_results = (
    FOR doc IN mitre_attack_enterprise_vertex_collection
        FILTER doc._stix2arango_note != "automatically imported on collection creation"
        AND doc._stix2arango_note == "v15.1"
        AND (doc.type != "x-mitre-collection" AND doc.type != "marking-definition" AND doc.type != "identity" AND doc.type != "x-mitre-matrix" AND doc.type != "x-mitre-data-component")
        AND doc.external_references != null AND IS_ARRAY(doc.external_references)
        FOR extRef IN doc.external_references
            FILTER extRef.external_id IN ATTACK_IDS
            AND extRef.source_name == "mitre-attack"
            RETURN {
                id: doc.id,
                attack_id: extRef.external_id,
                name: doc.name,
                type: doc.type,
                collection: "enterprise"
            }
)

LET ics_results = (
    FOR doc IN mitre_attack_ics_vertex_collection
        FILTER doc._stix2arango_note != "automatically imported on collection creation"
        AND doc._stix2arango_note == "v15.1"
        AND (doc.type != "x-mitre-collection" AND doc.type != "marking-definition" AND doc.type != "identity" AND doc.type != "x-mitre-matrix" AND doc.type != "x-mitre-data-component")
        AND doc.external_references != null AND IS_ARRAY(doc.external_references)
        FOR extRef IN doc.external_references
            FILTER extRef.external_id IN ATTACK_IDS
            AND extRef.source_name == "mitre-attack"
            RETURN {
                id: doc.id,
                attack_id: extRef.external_id,
                name: doc.name,
                type: doc.type,
                collection: "ics"
            }
)

LET mobile_results = (
    FOR doc IN mitre_attack_mobile_vertex_collection
        FILTER doc._stix2arango_note != "automatically imported on collection creation"
        AND doc._stix2arango_note == "v15.1"
        AND (doc.type != "x-mitre-collection" AND doc.type != "marking-definition" AND doc.type != "identity" AND doc.type != "x-mitre-matrix" AND doc.type != "x-mitre-data-component")
        AND doc.external_references != null AND IS_ARRAY(doc.external_references)
        FOR extRef IN doc.external_references
            FILTER extRef.external_id IN ATTACK_IDS
            AND extRef.source_name == "mitre-attack"
            RETURN {
                id: doc.id,
                attack_id: extRef.external_id,
                name: doc.name,
                type: doc.type,
                collection: "mobile"
            }
)

RETURN UNION_DISTINCT(
    enterprise_results,
    ics_results,
    mobile_results
)
[
  [
    {
      "id": "attack-pattern--5282dd9a-d26d-4e16-88b7-7c0f4553daf4",
      "attack_id": "T1589",
      "name": "Gather Victim Identity Information",
      "type": "attack-pattern",
      "collection": "enterprise"
    },
    {
      "id": "attack-pattern--3c4a2599-71ee-4405-ba1e-0e28414b4bc5",
      "attack_id": "T1005",
      "name": "Data from Local System",
      "type": "attack-pattern",
      "collection": "enterprise"
    },
    {
      "id": "course-of-action--7ee0879d-ce4f-4f54-a96b-c532dfb98ffd",
      "attack_id": "T1005",
      "name": "Data from Local System Mitigation",
      "type": "course-of-action",
      "collection": "enterprise"
    },
    {
      "id": "attack-pattern--81033c3b-16a4-46e4-8fed-9b030dd03c4a",
      "attack_id": "T1586",
      "name": "Compromise Accounts",
      "type": "attack-pattern",
      "collection": "enterprise"
    }
  ]
]

Note: you might want to exclude the course-of-action (COA) object returned. It appears because it shares the external ID T1005 with the Technique.
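If you would rather drop it in code than in the query, a filter on the STIX type is enough. A minimal sketch follows, assuming the results have been loaded into a list of dictionaries shaped like the query output (the techniques_only name is just for illustration):

```python
def techniques_only(matches: list) -> list:
    """Keep only Technique and Subtechnique objects (STIX type attack-pattern)."""
    return [m for m in matches if m["type"] == "attack-pattern"]


matches = [
    {"id": "attack-pattern--5282dd9a-d26d-4e16-88b7-7c0f4553daf4", "attack_id": "T1589", "type": "attack-pattern"},
    {"id": "attack-pattern--3c4a2599-71ee-4405-ba1e-0e28414b4bc5", "attack_id": "T1005", "type": "attack-pattern"},
    {"id": "course-of-action--7ee0879d-ce4f-4f54-a96b-c532dfb98ffd", "attack_id": "T1005", "type": "course-of-action"},
    {"id": "attack-pattern--81033c3b-16a4-46e4-8fed-9b030dd03c4a", "attack_id": "T1586", "type": "attack-pattern"},
]

techniques = techniques_only(matches)
```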

I now have enough information to link the CVE to each matched ATT&CK object. I will do this with STIX relationship objects modelled as follows:

{
    "type": "relationship",
    "spec_version": "2.1",
    "id": "relationship--<UUID V5>",
    "created_by_ref": "identity--<ARANGO_CTI_PROCESSOR_ID>",
    "created": "<VULNERABILITY CREATED TIME>",
    "modified": "<VULNERABILITY MODIFIED TIME>",
    "relationship_type": "mitre-attack",
    "source_ref": "vulnerability--<ID>",
    "target_ref": "<MITRE ATTACK OBJECT ID>",
    "object_marking_refs": [
        "marking-definition--94868c89-83c2-464b-929b-a1a8aa3c8487",
        "marking-definition--<ARANGO_CTI_PROCESSOR_ID>"
    ]
}

e.g. for CVE-2016-6587 you’d have three STIX relationship objects created.

All 3 would have a source_ref==vulnerability--620f4cb1-67db-57bb-b7b0-81f6c1b45519

And target_refs for each would be:

  1. attack-pattern--81033c3b-16a4-46e4-8fed-9b030dd03c4a
  2. attack-pattern--3c4a2599-71ee-4405-ba1e-0e28414b4bc5
  3. attack-pattern--5282dd9a-d26d-4e16-88b7-7c0f4553daf4
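To make the shape of those three objects concrete, here is a hedged Python sketch that builds them as plain dictionaries. The UUIDv5 namespace and the string that gets hashed are placeholders I made up for illustration; the post does not state the real ones. The identity and marking-definition placeholders are copied from the template as-is.

```python
import uuid

# Hypothetical namespace: the real UUIDv5 namespace is not given in this post.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/cve-attack-poc")
CREATED_BY_REF = "identity--<ARANGO_CTI_PROCESSOR_ID>"  # placeholder from the template
MARKING_REFS = [
    "marking-definition--94868c89-83c2-464b-929b-a1a8aa3c8487",
    "marking-definition--<ARANGO_CTI_PROCESSOR_ID>",  # placeholder from the template
]


def build_relationship(vuln: dict, target_ref: str) -> dict:
    """Build one relationship linking a Vulnerability to an ATT&CK object."""
    # Assumed ID recipe: hash the type, source and target so re-runs give the same ID.
    name = f"mitre-attack+{vuln['id']}+{target_ref}"
    return {
        "type": "relationship",
        "spec_version": "2.1",
        "id": f"relationship--{uuid.uuid5(NAMESPACE, name)}",
        "created_by_ref": CREATED_BY_REF,
        "created": vuln["published"],
        "modified": vuln["modified"],
        "relationship_type": "mitre-attack",
        "source_ref": vuln["id"],
        "target_ref": target_ref,
        "object_marking_refs": list(MARKING_REFS),
    }


vuln = {
    "id": "vulnerability--620f4cb1-67db-57bb-b7b0-81f6c1b45519",
    "published": "2020-01-08T18:15:10.213Z",
    "modified": "2020-01-13T19:50:15.777Z",
}
targets = [
    "attack-pattern--81033c3b-16a4-46e4-8fed-9b030dd03c4a",
    "attack-pattern--3c4a2599-71ee-4405-ba1e-0e28414b4bc5",
    "attack-pattern--5282dd9a-d26d-4e16-88b7-7c0f4553daf4",
]
relationships = [build_relationship(vuln, t) for t in targets]
```

Using a UUIDv5 means the same CVE and ATT&CK pair always produces the same relationship ID, so re-running the process updates existing objects instead of creating duplicates.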

In summary

Now I can search CTI Butler for CVEs using ATT&CK objects as the source.

For example, if I wanted to retrieve all Vulnerabilities related to the ATT&CK Technique, Data from Local System (T1005 / attack-pattern--3c4a2599-71ee-4405-ba1e-0e28414b4bc5), I could run the query…

FOR edge_doc IN mitre_attack_enterprise_edge_collection
    FILTER edge_doc.target_ref == "attack-pattern--3c4a2599-71ee-4405-ba1e-0e28414b4bc5"
    AND edge_doc.relationship_type == "mitre-attack"
    AND edge_doc.source_ref LIKE "vulnerability--%"
    FOR vertex_doc IN mitre_attack_enterprise_vertex_collection
        FILTER vertex_doc.id == edge_doc.source_ref
        // drop internal attributes (names starting with an underscore) from the output
        LET cleaned_doc = KEEP(vertex_doc, ATTRIBUTES(vertex_doc)[* FILTER LEFT(CURRENT, 1) != "_"])
        RETURN cleaned_doc

Posted by:

David Greenwood

David Greenwood, Do Only Good Everyday



