Commit 830f5af

Merge pull request #635 from oasis-open/fix-email-message-test
fix email message test
2 parents faface0 + 854e0c4 commit 830f5af

8 files changed, +232 -141 lines changed

USING_NEO4J.md

Lines changed: 9 additions & 9 deletions
@@ -1,11 +1,11 @@
11
# Experimenting with the Neo4j graph database Python STIX DataStore
22

3-
The Neo4j graph database Python STIX DataStore is a proof-of-concept implementation to show how to store STIX content in a graph database.
3+
The Neo4j graph database Python STIX DataStore is a proof-of-concept implementation to show how to store STIX content in a graph database.
44

55
## Limitations:
66

7-
As a proof-of-concept it has minimal functionality.
8-
7+
As a proof-of-concept it has minimal functionality.
8+
99
## Installing Neo4j
1010

1111
See https://neo4j.com/docs/desktop-manual/current/installation
@@ -18,18 +18,18 @@ The python neo4j library used is py2neo, available in pypi at https://pypi.org/p
1818

1919
## Implementation Details
2020

21-
We would like to that the folks at JHU/APL for their implementation of [STIX2NEO4J.py](https://github.com/opencybersecurityalliance/oca-iob/tree/main/STIX2NEO4J%20Converter), which this code is based on.
21+
We would like to that the folks at JHU/APL for their implementation of [STIX2NEO4J.py](https://github.com/opencybersecurityalliance/oca-iob/tree/main/STIX2NEO4J%20Converter), which this code is based on.
2222

2323
Only the DataSink (for storing STIX data) part of the DataStore object has been implemented. The DataSource part is implemented as a stub. However, the graph database can be queried using the neo4j cypher langauge within
2424
the neo4j browser.
2525

26-
The main concept behind any graphs is nodes and edges. STIX data is similar as it contains relationship objects (SROs) and node objects (SDOs, SCOs and SMOs). Additional edges are provided by STIX embedded relationships, which are expressed as properties in STIX node objects. This organization of data in STIX is a natural fit for graph models, such as neo4j.
26+
The main concept behind any graphs is nodes and edges. STIX data is similar as it contains relationship objects (SROs) and node objects (SDOs, SCOs and SMOs). Additional edges are provided by STIX embedded relationships, which are expressed as properties in STIX node objects. This organization of data in STIX is a natural fit for graph models, such as neo4j.
2727

28-
The order in which STIX objects are added to the graph database is arbitrary. Therefore, when an SRO or embedded relationship is added via the DataStore, the nodes that it connects may not be present in the database, so the relationship is not added to the database, but remembered by the DataStore code as an unconnected relationship. Whenever a new node is
29-
added to the database, the unconnected relationships must be reviewed to determine if both nodes of a relationship can now be represented using an edge in the graph database.
28+
The order in which STIX objects are added to the graph database is arbitrary. Therefore, when an SRO or embedded relationship is added via the DataStore, the nodes that it connects may not be present in the database, so the relationship is not added to the database, but remembered by the DataStore code as an unconnected relationship. Whenever a new node is
29+
added to the database, the unconnected relationships must be reviewed to determine if both nodes of a relationship can now be represented using an edge in the graph database.
3030

31-
Note that unless both the source and target nodes are eventually added,
32-
the relationship will not be added either.
31+
Note that unless both the source and target nodes are eventually added,
32+
the relationship will not be added either.
3333
How to address this issue in the implementation has not been determined.
3434

3535
## Demonstrating a neo4j database for STIX
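Reviewer's note on the USING_NEO4J.md text above: the deferred-relationship behavior it describes (hold a relationship until both of its nodes exist, then re-check whenever a node is added) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the DataStore's actual code: `PendingEdgeBuffer` and its attributes are hypothetical names, and an in-memory set stands in for the graph database.

```python
class PendingEdgeBuffer:
    """Hypothetical helper: holds relationships until both endpoints exist."""

    def __init__(self):
        self.node_ids = set()  # ids of nodes already stored
        self.edges = []        # relationships that have been created
        self.pending = []      # relationships still missing an endpoint

    def _both_present(self, source_ref, target_ref):
        return source_ref in self.node_ids and target_ref in self.node_ids

    def add_relationship(self, source_ref, target_ref, relationship_type):
        rel = (source_ref, relationship_type, target_ref)
        if self._both_present(source_ref, target_ref):
            self.edges.append(rel)
        else:
            # Remember it as an unconnected relationship.
            self.pending.append(rel)

    def add_node(self, node_id):
        self.node_ids.add(node_id)
        # Re-check every unconnected relationship after each new node.
        still_pending = []
        for rel in self.pending:
            source_ref, _, target_ref = rel
            if self._both_present(source_ref, target_ref):
                self.edges.append(rel)
            else:
                still_pending.append(rel)
        self.pending = still_pending


buffer = PendingEdgeBuffer()
buffer.add_relationship("indicator--1", "malware--1", "indicates")
buffer.add_node("indicator--1")  # target still missing, stays pending
buffer.add_node("malware--1")    # both endpoints present, edge created
```

As the README text says, a relationship whose source or target never arrives simply stays in the pending list; the sketch makes no attempt to resolve that open question.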
Lines changed: 16 additions & 11 deletions
@@ -3,11 +3,11 @@
33
# Code developed by JHU/APL - First Draft December 2021
44

55
# DISCLAIMER
6-
# The script developed by JHU/APL for the demonstration are not “turn key” and are
6+
# The script developed by JHU/APL for the demonstration are not “turn key” and are
77
# not safe for deployment without being tailored to production infrastructure. These
88
# files are not being delivered as software and are not appropriate for direct use on any
99
# production networks. JHU/APL assumes no liability for the direct use of these files and
10-
# they are provided strictly as a reference implementation.
10+
# they are provided strictly as a reference implementation.
1111
#
1212
# NO WARRANTY, NO LIABILITY. THIS MATERIAL IS PROVIDED “AS IS.” JHU/APL MAKES NO
1313
# REPRESENTATION OR WARRANTY WITH RESPECT TO THE PERFORMANCE OF THE MATERIALS, INCLUDING
@@ -20,11 +20,12 @@
2020
# CONSEQUENTIAL, SPECIAL OR OTHER DAMAGES ARISING FROM THE USE OF, OR INABILITY TO USE,
2121
# THE MATERIAL, INCLUDING, BUT NOT LIMITED TO, ANY DAMAGES FOR LOST PROFITS.
2222

23+
from getpass import getpass
2324
## Import python modules for this script
2425
import json
2526
from typing import List
27+
2628
from py2neo import Graph, Node
27-
from getpass import getpass
2829
from tqdm import tqdm
2930

3031
#Import variables
@@ -44,10 +45,12 @@ def __init__(self):
4445
self.nodes_with_object_ref = list()
4546
self.nodes = list()
4647
self.bundlename = BundleName
47-
self.infer_relation = {"parent_ref": "parent_of",
48+
self.infer_relation = {
49+
"parent_ref": "parent_of",
4850
"created_by_ref": "created_by",
4951
"src_ref": "source_of",
50-
"dst_ref": "destination_of"}
52+
"dst_ref": "destination_of",
53+
}
5154
self.__load_json(JSONFILE)
5255

5356
def __load_json(self, fd):
@@ -85,16 +88,18 @@ def make_nodes(self):
8588
node_contents[key] = apobj[key]
8689
# Make the Bundle ID a property
8790
# use dictionary expansion as keywork for optional node properties
88-
node = Node(apobj["type"],
89-
name=node_name,
90-
bundlesource=self.bundlename,
91-
**node_contents)
91+
node = Node(
92+
apobj["type"],
93+
name=node_name,
94+
bundlesource=self.bundlename,
95+
**node_contents,
96+
)
9297
# if node needs new created_by relation, create the node and then the relationship
9398
self.sgraph.create(node)
9499
# save off these nodes for additional relationship creating
95100
if 'object_refs' in keys:
96101
self.nodes_with_object_ref.append(apobj)
97-
102+
98103
# create relationships that exist outside of relationship objects
99104
# such as Created_by and Parent_Of
100105
def __make_inferred_relations(self):
@@ -112,7 +117,7 @@ def __make_inferred_relations(self):
112117
else:
113118
ref_list = apobj[k]
114119
for ref in ref_list:
115-
# The "b to a" relationship is reversed in this cypher query to ensure the correct relationship direction in the graph
120+
# The "b to a" relationship is reversed in this cypher query to ensure the correct relationship direction in the graph
116121
cypher_string = f'MATCH (a),(b) WHERE a.bundlesource="{self.bundlename}" AND b.bundlesource="{self.bundlename}" AND a.ap_id="{str(ref)}" AND b.ap_id="{str(apobj["id"])}" CREATE (b)-[r:{rel_type}]->(a) RETURN a,b'
117122
try:
118123
self.sgraph.run(cypher_string)

stix2/datastore/neo4j/demo.py

Lines changed: 7 additions & 6 deletions
@@ -1,16 +1,17 @@
11

2-
import sys
32
import json
3+
import sys
4+
5+
from identity_contact_information import \
6+
identity_contact_information # noqa F401
7+
# needed so the relational db code knows to create tables for this
8+
from incident import event, impact, incident, task # noqa F401
9+
from observed_string import observed_string # noqa F401
410

511
import stix2
612
from stix2.datastore.neo4j.neo4j import Neo4jStore
713
import stix2.properties
814

9-
# needed so the relational db code knows to create tables for this
10-
from incident import incident, event, task, impact
11-
from identity_contact_information import identity_contact_information
12-
from observed_string import observed_string
13-
1415

1516
def main():
1617
with open(sys.argv[1], "r") as f:

stix2/datastore/neo4j/neo4j.py

Lines changed: 49 additions & 37 deletions
@@ -1,23 +1,24 @@
1-
import json
1+
import re
22

33
from py2neo import Graph, Node, Relationship
4-
import re
54

65
import stix2
76
from stix2.base import _STIXBase
8-
from stix2.datastore import (
9-
DataSink, DataSource, DataStoreMixin,
10-
)
7+
from stix2.datastore import DataSink, DataSource, DataStoreMixin
118
from stix2.parsing import parse
129

1310

1411
def convert_camel_case_to_snake_case(name):
1512
return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()
13+
14+
1615
def remove_sro_from_list(sro, sro_list):
1716
for rel in sro_list:
18-
if (rel["source_ref"] == sro["source_ref"] and
19-
rel["target_ref"] == sro["target_ref"] and
20-
rel["relationship_type"] == sro["relationship_type"]):
17+
if (
18+
rel["source_ref"] == sro["source_ref"] and
19+
rel["target_ref"] == sro["target_ref"] and
20+
rel["relationship_type"] == sro["relationship_type"]
21+
):
2122
sro_list.remove(rel)
2223
break
2324
return sro_list
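Reviewer's note on `convert_camel_case_to_snake_case` in the hunk above: the regex inserts an underscore before every uppercase letter that is not at the start of the string, then lowercases the result. The function body below is copied from the diff; the sample names are illustrative assumptions, chosen to show that runs of capitals are split letter by letter.

```python
import re


def convert_camel_case_to_snake_case(name):
    # Zero-width match before each uppercase letter, except at position 0.
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()


# Illustrative inputs only; they are not taken from the commit.
samples = ["EmailMessage", "NetworkTraffic", "HTTPRequest"]
converted = [convert_camel_case_to_snake_case(s) for s in samples]
print(converted)
```

Acronyms such as `HTTPRequest` come out as `h_t_t_p_request`, which may or may not be what callers expect.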
@@ -29,6 +30,7 @@ def hash_dict_as_string(hash_dict):
2930
hashes.append(f'{hash_type}:{hash}')
3031
return ",".join(hashes)
3132

33+
3234
def _add(sink, stix_data, allow_custom=True, version="2.1"):
3335
"""Add STIX objects to MemoryStore/Sink.
3436
@@ -73,23 +75,25 @@ class Neo4jStore(DataStoreMixin):
7375

7476
default_neo4j_connection = "bolt://neo4j:password@localhost:7687"
7577

76-
def __init__(self, host=default_host, username=default_username, password=default_password, allow_custom=True, version=None,
77-
clear_database=True):
78+
def __init__(
79+
self, host=default_host, username=default_username, password=default_password, allow_custom=True, version=None,
80+
clear_database=True,
81+
):
7882
self.sgraph = Graph(host=host, auth=(username, password))
7983
super().__init__(
80-
source = Neo4jSource(
84+
source=Neo4jSource(
8185
sgraph=self.sgraph,
8286
allow_custom=allow_custom,
8387

8488
),
85-
sink = Neo4jSink(
89+
sink=Neo4jSink(
8690
sgraph=self.sgraph,
8791
allow_custom=allow_custom,
8892
version=version,
8993
clear_database=clear_database,
9094

9195

92-
)
96+
),
9397
)
9498

9599

@@ -119,7 +123,7 @@ def __init__(self, sgraph, allow_custom=True, version=None, clear_database=False
119123
self.relationships_to_recheck = list()
120124
self.sub_object_relationships = list()
121125
self.counter = 1
122-
self.allow_custom=allow_custom
126+
self.allow_custom = allow_custom
123127
if clear_database:
124128
self.sgraph.delete_all()
125129

@@ -175,10 +179,12 @@ def _insert_sdo_sco_smo(self, obj, type_name):
175179
self.sub_object_relationships.append((key, obj[key]))
176180
# Make the Bundle ID a property
177181
# use dictionary expansion as keyword for optional node properties
178-
node = Node(type_name,
179-
name=node_name,
180-
# bundlesource=self.bundlename,
181-
**node_contents)
182+
node = Node(
183+
type_name,
184+
name=node_name,
185+
# bundlesource=self.bundlename,
186+
**node_contents,
187+
)
182188
# if node needs new created_by relation, create the node and then the relationship
183189
self.sgraph.create(node)
184190
# check to see if the addition of this node makes it possible to create a relationship
@@ -206,10 +212,12 @@ def _insert_sub_object(self, sub_prop, sub_obj, parent_node):
206212
node_contents[key] = value
207213
else:
208214
self.sub_object_relationships.append((key, value))
209-
node = Node(sub_prop,
210-
name=sub_prop + "_" + self.next_id(),
211-
# bundlesource=self.bundlename,
212-
**node_contents)
215+
node = Node(
216+
sub_prop,
217+
name=sub_prop + "_" + self.next_id(),
218+
# bundlesource=self.bundlename,
219+
**node_contents,
220+
)
213221
self.sgraph.create(node)
214222
relationship = Relationship(parent_node, sub_prop, node)
215223
self.sgraph.create(relationship)
@@ -230,10 +238,12 @@ def _insert_external_references(self, refs, parent_node):
230238
node_contents[key] = value
231239
else:
232240
self.sub_object_relationships.append((key, value))
233-
node = Node("external_reference",
234-
name="external_reference" + "_" + self.next_id(),
235-
# bundlesource=self.bundlename,
236-
**node_contents)
241+
node = Node(
242+
"external_reference",
243+
name="external_reference" + "_" + self.next_id(),
244+
# bundlesource=self.bundlename,
245+
**node_contents,
246+
)
237247
relationship = Relationship(parent_node, "external_reference", node)
238248
self.sgraph.create(relationship)
239249

@@ -254,15 +264,17 @@ def _insert_extensions(self, extensions, parent_node):
254264
node_contents[key] = hash_dict_as_string(value)
255265
else:
256266
node_contents[key] = value
257-
node = Node(type_name,
258-
name=type_name + "_" + self.next_id(),
259-
# bundlesource=self.bundlename,
260-
**node_contents)
267+
node = Node(
268+
type_name,
269+
name=type_name + "_" + self.next_id(),
270+
# bundlesource=self.bundlename,
271+
**node_contents,
272+
)
261273
relationship = Relationship(parent_node, type_name, node)
262274
self.sgraph.create(relationship)
263275
self._insert_embedded_relationships(ext, parent_node["id"])
264276

265-
def _is_node_available(self, id,):
277+
def _is_node_available(self, id):
266278
cypher_string = f'OPTIONAL MATCH (a) WHERE a.id="{str(id)}" UNWIND [a] AS list_rows RETURN list_rows'
267279
cursor = self.sgraph.run(cypher_string).data()
268280
return cursor[0]["list_rows"]
@@ -290,7 +302,7 @@ def _insert_embedded_relationships(self, obj, id, recheck=False):
290302
k_tokens = k.split("_")
291303
# find refs, but ignore external_references since they aren't objects
292304
if "ref" in k_tokens[len(k_tokens) - 1] and k_tokens[len(k_tokens) - 1] != "references":
293-
rel_type = "_".join(k_tokens[: -1])
305+
rel_type = "_".join(k_tokens[: -1]) # noqa F841
294306
ref_list = []
295307
# refs are lists, push singular ref into list to make it iterable for loop
296308
if not type(obj[k]).__name__ == "list":
@@ -307,9 +319,9 @@ def _insert_embedded_relationships(self, obj, id, recheck=False):
307319
remove_sro_from_list(obj, self.relationships_to_recheck)
308320
else:
309321
if not recheck:
310-
embedded_relationship = {"source_ref": id,
311-
"target_ref": ref,
312-
"relationship_type": k}
322+
embedded_relationship = {
323+
"source_ref": id,
324+
"target_ref": ref,
325+
"relationship_type": k,
326+
}
313327
self.relationships_to_recheck.append(embedded_relationship)
314-
315-

stix2/datastore/neo4j/neo4j_testing.py

Lines changed: 0 additions & 1 deletion
@@ -1,7 +1,6 @@
11
import datetime as dt
22
import os # noqa: F401
33

4-
54
import pytz
65

76
import stix2

stix2/datastore/relational_db/input_creation.py

Lines changed: 1 addition & 1 deletion
@@ -102,7 +102,7 @@ def generate_insert_information(self, dictionary_name, stix_object, **kwargs):
102102
table_child = data_sink.tables_dictionary[
103103
canonicalize_table_name(table_name + "_" + dictionary_name + "_" + "values", schema_name)
104104
]
105-
child_table_inserts = generate_insert_for_dictionary_list(table_child, next_id, value, data_sink, contained_type)
105+
child_table_inserts.extend(generate_insert_for_dictionary_list(table_child, next_id, value, data_sink, contained_type))
106106
value = next_id
107107
stix_type = IntegerProperty()
108108
else:
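Reviewer's note on the one-line change above: replacing the assignment with `extend` matters if the statement runs more than once for the same `child_table_inserts` list. The hunk does not show the enclosing loop, so the sketch below assumes one pass per dictionary entry; `make_rows` is a hypothetical stand-in for `generate_insert_for_dictionary_list`, and the header data is borrowed from the test object added in this commit.

```python
def make_rows(key, values):
    # Hypothetical stand-in: one row per list element.
    return [(key, v) for v in values]


header_fields = {
    "Content-Disposition": ["inline"],
    "X-Mailer": ["Mutt/1.5.23"],
}

# Assignment: each iteration discards the rows built for earlier keys.
overwritten = []
for key, values in header_fields.items():
    overwritten = make_rows(key, values)

# extend: rows from every key accumulate in the same list.
accumulated = []
for key, values in header_fields.items():
    accumulated.extend(make_rows(key, values))
```

Under that assumption, the old code would keep only the rows for the last list-valued entry, which is consistent with a test object such as `additional_header_fields` exposing the problem.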

stix2/datastore/relational_db/relational_db_testing.py

Lines changed: 45 additions & 2 deletions
@@ -288,11 +288,52 @@ def test_dictionary():
288288
)
289289

290290

291+
multipart_email_msg_dict = {
292+
"type": "email-message",
293+
"spec_version": "2.1",
294+
"id": "email-message--ef9b4b7f-14c8-5955-8065-020e0316b559",
295+
"is_multipart": True,
296+
"received_lines": [
297+
"from mail.example.com ([198.51.100.3]) by smtp.gmail.com with ESMTPSA id \
298+
q23sm23309939wme.17.2016.07.19.07.20.32 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 \
299+
bits=128/128); Tue, 19 Jul 2016 07:20:40 -0700 (PDT)",
300+
],
301+
"content_type": "multipart/mixed",
302+
"date": "2016-06-19T14:20:40.000Z",
303+
"from_ref": "email-addr--89f52ea8-d6ef-51e9-8fce-6a29236436ed",
304+
"to_refs": ["email-addr--d1b3bf0c-f02a-51a1-8102-11aba7959868"],
305+
"cc_refs": ["email-addr--e4ee5301-b52d-59cd-a8fa-8036738c7194"],
306+
"subject": "Check out this picture of a cat!",
307+
"additional_header_fields": {
308+
"Content-Disposition": ["inline"],
309+
"X-Mailer": ["Mutt/1.5.23"],
310+
"X-Originating-IP": ["198.51.100.3"],
311+
},
312+
"body_multipart": [
313+
{
314+
"content_type": "text/plain; charset=utf-8",
315+
"content_disposition": "inline",
316+
"body": "Cats are funny!",
317+
},
318+
{
319+
"content_type": "image/png",
320+
"content_disposition": "attachment; filename=\"tabby.png\"",
321+
"body_raw_ref": "artifact--4cce66f8-6eaa-53cb-85d5-3a85fca3a6c5",
322+
},
323+
{
324+
"content_type": "application/zip",
325+
"content_disposition": "attachment; filename=\"tabby_pics.zip\"",
326+
"body_raw_ref": "file--6ce09d9c-0ad3-5ebf-900c-e3cb288955b5",
327+
},
328+
],
329+
}
330+
331+
291332
def main():
292333
store = RelationalDBStore(
293-
MariaDBBackend("mariadb+pymysql://admin:admin@127.0.0.1:3306/rdb", force_recreate=True),
334+
# MariaDBBackend("mariadb+pymysql://admin:admin@127.0.0.1:3306/rdb", force_recreate=True),
294335
# PostgresBackend("postgresql://localhost/stix-data-sink", force_recreate=True),
295-
# SQLiteBackend("sqlite:///stix-data-sink.db", force_recreate=True),
336+
SQLiteBackend("sqlite:///stix-data-sink.db", force_recreate=True),
296337

297338
True,
298339
None,
@@ -340,6 +381,8 @@ def main():
340381
malware = malware_with_all_required_properties()
341382
store.add(malware)
342383

384+
store.add(stix2.parse(multipart_email_msg_dict))
385+
343386
# read_obj = store.get(directory_stix_object.id)
344387
# print(read_obj)
345388
else:
