Cyber Threat Intelligence Storing and Sharing using LLMs and OpenAI GPT-3.5 Turbo

As existing and emerging smart cities continue to expand their IoT- and AI-enabled platforms, they introduce novel and complex dimensions to the threat intelligence landscape, linked with identifying, responding to, and sharing data related to attack vectors aimed at emerging IoT and AI technologies. The IRIS vision is to integrate and demonstrate a single platform, addressed to CERTs/CSIRTs and Critical Infrastructure Operators, for assessing, detecting, responding to, and sharing information regarding threats and vulnerabilities of IoT- and AI-driven ICT systems.

To achieve this, one perspective of the IRIS platform is the collection, analysis, storage, correlation, and sharing of information about threats, attacks, and vulnerabilities from both internal and external sources. Within the IRIS context, the value of the CTI Sharing and Storage tool lies in providing the sharing functionalities that support the privacy, disclosure, and incident-response requirements of threat intelligence collaboration. Advanced filtering techniques are implemented to meet the requirements of each organisation. These techniques ensure that IRIS delivers effective threat intelligence collaboration, providing rich and actionable threat intelligence while protecting sensitive organisational, system, personal, or classified data from disclosure. In this regard, IRIS leverages the CTI Sharing and Storage tool and its submodules to generate dynamic taxonomies and ontologies. More precisely, on the one hand, taxonomies are generated; on the other hand, the generated taxonomies are used to update existing threat taxonomies (e.g., MISP Taxonomies), using taxonomy terms identified by NER, BERTopic, and pattern matching as the main algorithms and techniques. In addition, a promising technique introduced recently, which aims to produce more stable and meaningful results for generating taxonomies and ontologies, is Large Language Models (LLMs)[1]. Because such Language Models (LMs)[2] can predict word-sequence probabilities or create new text from provided input, they have the ability to understand and generate human language. Despite these advances, however, LMs face several challenges, such as the risk of overfitting and difficulty in accurately capturing complex linguistic phenomena[3].
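As a minimal illustration of one of the techniques mentioned above, the sketch below shows pattern matching used to surface candidate taxonomy terms from a free-text threat report. The regular expressions and the category names are illustrative assumptions, not the exact patterns used by the IRIS tool (which additionally applies NER and BERTopic):

```python
import re

# Hypothetical patterns for candidate taxonomy terms; the real tool's
# pattern set and categories are assumptions for illustration only.
PATTERNS = {
    "cve": re.compile(r"CVE-\d{4}-\d{4,7}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "malware_family": re.compile(r"\b(?:Emotet|Mirai|TrickBot)\b", re.IGNORECASE),
}

def extract_terms(report: str) -> dict:
    """Return candidate taxonomy terms found in a report, grouped by category."""
    return {name: sorted(set(p.findall(report))) for name, p in PATTERNS.items()}

report = "Mirai variant exploiting CVE-2023-1389 observed from 203.0.113.7."
print(extract_terms(report))
```

Terms extracted this way can then feed the taxonomy-update step, e.g. as new entries for an existing MISP taxonomy namespace.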

LLMs, exemplified by GPT-3[4], InstructGPT[5], and GPT-4[6], are characterized by their extensive parameter sizes and advanced learning capabilities. A notable feature of LLMs is their capacity for in-context learning, where they generate text based on specific contexts or prompts, enhancing their relevance and coherence in interactive uses.

To build taxonomies, we currently use Llama 2[7], an open-source collection of pre-trained and fine-tuned LLMs ranging in scale from 7 billion to 70 billion parameters, which is reported to outperform other open-source models. To interact with the model, we leverage prompt engineering[8]: users craft specific prompts (i.e., input text) to steer LLMs towards generating targeted responses or performing particular tasks. During prompt engineering, we employ a few-shot inference process, in which, before presenting the desired generation description to the LLM, we also provide it with several example descriptions along with their completions. This instructs the LLM on what kind of output to generate.
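The few-shot process above can be sketched as a prompt-assembly step. The example descriptions, the machine-tag-style completions, and the downstream model call (e.g., via Hugging Face transformers or llama.cpp) are illustrative assumptions, not the exact IRIS prompts:

```python
# Hypothetical (description, completion) pairs shown to the LLM before the
# target description, so it infers the expected taxonomy-tag format.
FEW_SHOT_EXAMPLES = [
    ("Botnet traffic from compromised IP cameras",
     'iot-threat:botnet="camera-compromise"'),
    ("Phishing campaign targeting smart-grid operators",
     'ci-threat:phishing="smart-grid"'),
]

def build_few_shot_prompt(description: str) -> str:
    """Assemble a few-shot prompt: worked examples first, then the
    target description with its completion left open for the model."""
    parts = [
        f"Description: {desc}\nTaxonomy tag: {completion}"
        for desc, completion in FEW_SHOT_EXAMPLES
    ]
    parts.append(f"Description: {description}\nTaxonomy tag:")
    return "\n\n".join(parts)

print(build_few_shot_prompt("DDoS attack against a municipal traffic system"))
```

The resulting string would then be passed to the Llama 2 model as its input text; the model completes the final `Taxonomy tag:` line following the pattern of the examples.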

Further to this, OpenAI’s GPT-3.5 Turbo is used for building cybersecurity taxonomies and identifying the relationships between their terms. This model offers advanced natural language processing capabilities and can provide highly accurate, contextually relevant results. A prompt instructs the model to extract specific cybersecurity-related entities from a given text and present them in JSON format.
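A minimal sketch of such a prompt is shown below as a chat-message payload for GPT-3.5 Turbo. The instruction wording and the entity categories are illustrative assumptions, since the exact IRIS prompt is not reproduced here; the messages would be sent with the official OpenAI Python client (e.g., `client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)`):

```python
import json

def build_extraction_messages(text: str) -> list:
    """Build chat messages instructing GPT-3.5 Turbo to extract
    cybersecurity entities from `text` and return them as JSON."""
    system = (
        "Extract cybersecurity-related entities (threat_actors, malware, "
        "vulnerabilities, attack_techniques) from the user's text. "
        "Respond with a single JSON object using exactly those keys."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]

messages = build_extraction_messages(
    "APT28 deployed X-Agent malware exploiting CVE-2017-0144."
)
print(json.dumps(messages, indent=2))
```

The JSON object returned by the model can then be parsed directly into taxonomy entries and relationships.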

In summary, the CTI Sharing and Storage tool developed within the context of IRIS provides, among other functionalities, the generation of dynamic taxonomies and ontologies using promising new techniques such as LLMs (Llama 2) and OpenAI’s GPT-3.5 Turbo.

 

[1] Gao, J., & Lin, C. Y. (2004). Introduction to the special issue on statistical language modeling. ACM Transactions on Asian Language Information Processing (TALIP), 3(2), 87-93.

[2] Radford, A., Narasimhan, K., Salimans, T. & Sutskever, I. Improving language understanding by generative pre-training. https://openai.com/research/language-unsupervised (2018).

[3] Chang, Y., Wang, X., Wang, J., Wu, Y., Zhu, K., Chen, H., … & Xie, X. (2023). A survey on evaluation of large language models. arXiv preprint arXiv:2307.03109.

[4] Floridi, L., & Chiriatti, M. (2020). GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30, 681-694.

[5] Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730-27744.

[6] OpenAI (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774.

[7] Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., & Scialom, T. (2023). Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.

[8] Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2022). Large language models are human-level prompt engineers. arXiv preprint arXiv:2211.01910.