manuka | A modular OSINT honeypot for blue teamers | Security library
kandi X-RAY | manuka Summary
Manuka is an Open-source intelligence (OSINT) honeypot that monitors reconnaissance attempts by threat actors and generates actionable intelligence for Blue Teamers. It creates a simulated environment consisting of staged OSINT sources, such as social media profiles and leaked credentials, and tracks signs of adversary interest, closely aligning to MITRE’s PRE-ATT&CK framework. Manuka gives Blue Teams additional visibility of the pre-attack reconnaissance phase and generates early-warning signals for defenders.

Although they vary in scale and sophistication, most traditional honeypots focus on networks. These honeypots uncover attackers at Stage 2 (Weaponization) to 7 (Actions on Objectives) of the cyber kill chain, with the assumption that attackers are already probing the network. Manuka conducts OSINT threat detection at Stage 1 (Reconnaissance) of the cyber kill chain.

Despite investing millions of dollars into network defenses, organisations can be easily compromised through a single Google search. One recent example is hackers exposing corporate meetings, therapy sessions, and college classes through Zoom calls left on the open Web. Enterprises need to detect these OSINT threats on their perimeter but lack the tools to do so.

Manuka is built to scale. Users can easily add new listener modules and plug them into the Dockerized environment. They can coordinate multiple campaigns and honeypots simultaneously to broaden the honeypot surface. Furthermore, users can quickly customize and deploy Manuka to match different use cases. Manuka’s data is designed to be easily ported to other third-party analysis and visualization tools in an organisation’s workflow.

Designing an OSINT honeypot presents a novel challenge due to the complexity and wide range of OSINT techniques. However, such a tool would allow Blue Teamers to “shift left” in their cyber threat intelligence strategy.
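The listener-module idea above can be pictured as a small plug-in pattern. The sketch below is a hypothetical illustration only, not Manuka's actual code or API: the Listener base class, the HoneytokenHit record, the DecoyEmailListener module, and the report_hit callback are invented names used to show how a new OSINT listener might be slotted into a campaign and forward sightings to a central collector.

```python
# Hypothetical sketch of a pluggable OSINT listener; names and interfaces are
# illustrative only and do not reflect Manuka's actual implementation.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class HoneytokenHit:
    """A single sighting of a staged OSINT asset (e.g. a decoy credential)."""
    campaign: str
    source: str
    detail: str
    seen_at: datetime


class Listener(ABC):
    """Base class a new listener module would implement."""

    def __init__(self, campaign: str, report_hit):
        self.campaign = campaign
        self.report_hit = report_hit  # callback to the central collector

    @abstractmethod
    def poll(self) -> None:
        """Check the staged source and report any new adversary interest."""


class DecoyEmailListener(Listener):
    """Example module: watches a decoy inbox tied to a fake employee profile."""

    def poll(self) -> None:
        # Placeholder logic; a real module would query a mailbox or API here.
        self.report_hit(HoneytokenHit(
            campaign=self.campaign,
            source="decoy-email",
            detail="login attempt against staged credential",
            seen_at=datetime.now(timezone.utc),
        ))


if __name__ == "__main__":
    # Wire the module into a "campaign" and print hits instead of storing them.
    listener = DecoyEmailListener("acme-q3-recon", report_hit=print)
    listener.poll()
```

In the real tool, modules of this kind run inside the Dockerized environment and report into a shared campaign, which is what allows multiple honeypots and campaigns to be coordinated at once.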
manuka Key Features
manuka Examples and Code Snippets
Community Discussions
Trending Discussions on manuka
QUESTION
Thank you for taking the time to read my question! Most likely I formulated my question wrong, so sorry for any confusion. It also might be a basic JavaScript question for some of you, but I cannot wrap my head around it. I will try my best to explain what I am doing.
My data looks like this:
...ANSWER
Answered 2021-Feb-19 at 09:42: Here is how you can split a text into a list.
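The code from this answer is not captured in the excerpt, and the original thread concerns JavaScript. Purely as an illustration of the idea of splitting a text into a list, here is a minimal Python sketch with a made-up sample string (in JavaScript the equivalent would be String.prototype.split):

```python
# Made-up sample text; the asker's actual data is not shown in this excerpt.
text = "manuka honey, clover honey, wildflower honey"

# Split on a delimiter and strip surrounding whitespace to get a clean list.
items = [part.strip() for part in text.split(",")]
print(items)  # ['manuka honey', 'clover honey', 'wildflower honey']
```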
QUESTION
I have a dataset similar to the following:
...ANSWER
Answered 2020-Jul-23 at 00:21: Create a list comprehension object m that compares values to .upper() to get all uppercase letters and .isalpha() to make sure you are not bringing in strings / numbers where .upper() doesn't do anything to them. Then, simply create new columns that utilize the list comprehension with .apply(m).
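The thread's dataset and exact code are not shown here, so the frame and column names below are placeholders. This is a minimal pandas sketch of the described approach: a list-comprehension helper that keeps characters which are alphabetic and unchanged by .upper(), applied to build a new column.

```python
import pandas as pd

# Placeholder data; the thread's actual dataset is not shown in this excerpt.
df = pd.DataFrame({"code": ["aB3cD", "xyZ", "12ab", "QRst"]})

# List-comprehension helper in the spirit of the answer: keep characters that
# are alphabetic and already equal to their .upper() form (i.e. uppercase letters).
m = lambda s: [ch for ch in s if ch.isalpha() and ch == ch.upper()]

# Build a new column from the helper with .apply().
df["upper_letters"] = df["code"].apply(m)
print(df)
```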
QUESTION
I have the following dataset
...ANSWER
Answered 2020-Jun-30 at 00:19: define your function with
QUESTION
I have two datasets. One dataset has about ~30k rows, and the second dataset has ~60k rows. The smaller dataset (df1) has a unique identifier (upc), which is critical to my analysis. The larger dataset (df2) does not have this unique identifier, but it does have a descriptive variable (product_title) that can be matched with a similar description variable in df1 and used to infer the unique identifier. I am trying to keep things simple, so I used expand.grid.
ANSWER
Answered 2018-Nov-27 at 19:26: Your idea is good. One realization of it then would be
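The answer's R code is not included in this excerpt. As a rough Python analogue of the overall idea (compare every df2 title against every df1 title and take the best match to infer the upc), with tiny made-up frames standing in for the thread's data and difflib as a stand-in similarity measure:

```python
import difflib
import pandas as pd

# Tiny made-up frames standing in for the thread's df1 (has upc) and df2 (no upc).
df1 = pd.DataFrame({
    "upc": [111111, 222222],
    "product_title": ["acme widget 10oz", "acme gadget 5oz"],
})
df2 = pd.DataFrame({"product_title": ["Acme Widget 10 oz", "Acme Gadget 5 oz"]})

def best_upc(title):
    # Score this df2 title against every df1 title and return the upc of the closest one.
    scores = df1["product_title"].apply(
        lambda ref: difflib.SequenceMatcher(None, title.lower(), ref.lower()).ratio()
    )
    return df1.loc[scores.idxmax(), "upc"]

# Infer the missing identifier for each row of the larger dataset.
df2["upc_inferred"] = df2["product_title"].apply(best_upc)
print(df2)
```

Like an expand.grid cross join, this scores every pair of titles, so with ~30k by ~60k rows it is worth blocking or indexing the comparisons rather than scoring all pairs.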
QUESTION
I am having a few issues with scaling a text matching program. I am using text2vec which provides very good and fast results.
The main problem I am having is manipulating a large matrix which is returned by the text2vec::sim2() function.
First, some details of my hardware / OS setup: Windows 7 with 12 cores at about 3.5 GHz and 128 GB of memory. It's a pretty good machine.
Second, some basic details of what my R program is trying to achieve.
We have a database of 10 million unique canonical addresses, one for every house / business. These reference addresses also have latitude and longitude information for each entry.
I am trying to match these reference addresses to customer addresses in our database. We have about 600,000 customer addresses. The quality of these customer addresses is not good. Not good at all! They are stored as a single string field with absolutely zero checks on input.
The technical strategy to match these addresses is quite simple: create two document-term matrices (DTMs) of the customer addresses and reference addresses, and use cosine similarity to find the reference address that is most similar to a specific customer address. Some customer addresses are so poor that they will result in a very low cosine similarity -- so, for these addresses, a "no match" would be assigned.
Despite being a pretty simple solution, the results obtained are very encouraging.
But, I am having problems scaling things....? And I am wondering if anyone has any suggestions.
There is a copy of my code below. It's pretty simple. Obviously, I cannot include real data, but it should provide readers a clear idea of what I am trying to do.
SECTION A - Works very well even on the full 600,000 * 10 million input data set.
SECTION B - The text2vec::sim2() function causes RStudio to shut down when the vocabulary exceeds about 140,000 tokens (i.e. columns). To avoid this, I process the customer addresses in chunks of about 200.
SECTION C - This is the most expensive section. When processing addresses in chunks of 200, SECTION A and SECTION B take about 2 minutes. But SECTION C, using what I would have thought to be super quick functions, takes about 5 minutes to process a 10 million row * 200 column matrix.
Combined, SECTIONS A to C take about 7 minutes to process 200 addresses. As there are 600,000 addresses to process, this will take about 14 days.
Are there any ideas to make this code run faster?
...ANSWER
Answered 2018-Feb-16 at 16:40: The issue in step C is that mat_sim is sparse and all the apply calls do column/row subsetting, which is super slow (and converts sparse vectors to dense).
There could be several solutions:
- if mat_sim is not very huge, convert it to dense with as.matrix and then use apply
- better, you can convert mat_sim to a sparse matrix in triplet format with as(mat_sim, "TsparseMatrix") and then use data.table to get the indices of the max elements. Here is an example:
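The R example promised at the end of the answer (using as(mat_sim, "TsparseMatrix") and data.table) is not captured in this excerpt. As a rough Python analogue of the same triplet-format idea, using scipy.sparse and pandas on a randomly generated stand-in for mat_sim:

```python
import pandas as pd
from scipy import sparse

# Placeholder similarity matrix standing in for mat_sim (reference rows x customer
# columns); the real matrix from the thread is not available in this excerpt.
mat_sim = sparse.random(1000, 200, density=0.01, format="csr", random_state=0)

# Triplet (COO) view exposes row index, column index, and value for stored entries only.
coo = mat_sim.tocoo()
triplets = pd.DataFrame({"row": coo.row, "col": coo.col, "sim": coo.data})

# For each customer column, keep the reference row with the highest similarity,
# touching only non-zero entries instead of densifying whole columns with apply().
best = triplets.loc[triplets.groupby("col")["sim"].idxmax()]
print(best.sort_values("col").head())
```

Working on the (row, col, value) triplets means only the stored non-zero similarities are touched, which is what avoids the slow dense column subsetting the answer describes.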
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install manuka
Support