Zeek | Python distributed web scraper and dynamic crawler | Crawler library

by Diastro | Python | Version: Current | License: MIT

kandi X-RAY | Zeek Summary

Zeek is a Python library typically used in automation, crawler, and Selenium applications. It has no reported vulnerabilities, a permissive license, and low support. However, Zeek has 1 bug and no build file. You can download it from GitHub.

Python distributed web crawler / web scraper. This is the first version of my distributed web crawler. It isn’t perfect yet, but I’m sharing it because the end result is far better than I expected and it can easily be adapted to your needs. Feel free to improve it, fork it, or report issues. I’m planning to continue working on it and will probably release an updated version in the future, but I’m not sure when yet.

            Support

              Zeek has a low active ecosystem.
              It has 116 star(s) with 48 fork(s). There are 10 watchers for this library.
              It had no major release in the last 6 months.
              There are 7 open issues and 32 have been closed. On average issues are closed in 170 days. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of Zeek is current.

            Quality

              Zeek has 1 bug (1 blocker, 0 critical, 0 major, 0 minor) and 133 code smells.

            Security

              Zeek has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              Zeek code analysis shows 0 unresolved vulnerabilities.
              There are 8 security hotspots that need review.

            License

              Zeek is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              Zeek releases are not available. You will need to build from source code and install.
              Zeek has no build file. You will need to create one yourself to build the component from source.
              Zeek saves you 374 person hours of effort in developing the same functionality from scratch.
              It has 891 lines of code, 57 functions and 10 files.
              It has high code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed Zeek and identified the functions below as its top functions. The list is intended to give you a quick overview of the functionality Zeek implements and help you decide whether it suits your requirements.
            • This method is called when a connection is started
            • Start the thread
            • Disconnect from the server
            • Send configuration to the server
            • Read configuration from server
            • Read data from the socket
            • Write obj to socket
            • Config parser
            • Read a file
            • Reads a static URL file
            • Main thread
            • Dispatch incoming packet
            • Setup the connection
            • Listen to the client
            • Process incoming packets
            • Dispatch a packet
            • Write session data to a file
            • Log a message to the log
            • The main loop
            • Disconnects all connected clients
            • Loop through the output queue
            • Print out the ending URL
            • Connect to host and port
            • Start the crawler
            • Disconnect all connected clients

            Zeek Key Features

            No Key Features are available at this moment for Zeek.

            Zeek Examples and Code Snippets

            No Code Snippets are available at this moment for Zeek.

            Community Discussions

            QUESTION

            reading line delimited json file in python
            Asked 2022-Jan-10 at 16:41

            I want to read specific values out of a line-delimited JSON file. The lines in the JSON file look like this:

            ...

            ANSWER

            Answered 2022-Jan-10 at 16:40
            import json

            # Parse each non-empty line of the file as its own JSON document.
            with open('path/to/file') as f:
                dicts = [json.loads(line) for line in f if line.strip()]

            Source https://stackoverflow.com/questions/70655829
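
The answer above parses every line into a dict. To pull out only specific values, as the question asks, one option is to stream the lines and keep just the requested keys. This is a minimal sketch: the helper name `extract_fields` and the field names `ts` and `id.orig_h` are hypothetical placeholders, since the question's sample lines are elided.

```python
from __future__ import annotations

import json
from typing import Any, Iterable, Iterator


def extract_fields(lines: Iterable[str], keys: list[str]) -> Iterator[dict[str, Any]]:
    """Yield one dict per JSON line, keeping only the requested keys."""
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines instead of failing in json.loads
            continue
        record = json.loads(line)
        # .get() yields None for keys that a record does not contain
        yield {key: record.get(key) for key in keys}


if __name__ == "__main__":
    # Hypothetical sample data; pass an open file object to read a real file.
    sample = [
        '{"ts": 1641832860.0, "id.orig_h": "10.0.0.5", "proto": "tcp"}',
        "",
        '{"ts": 1641832861.5, "proto": "udp"}',
    ]
    for row in extract_fields(sample, ["ts", "id.orig_h"]):
        print(row)
```

Because an open file is itself an iterable of lines, `extract_fields(open('path/to/file'), [...])` should work without loading the whole file into memory.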

            QUESTION

            Configuring connectors for multiple topics on Kafka Connect Distributed Mode
            Asked 2021-Nov-08 at 20:02

            We have producers that are sending the following to Kafka:

            • topic=syslog, ~25,000 events per day
            • topic=nginx, ~5,000 events per day
            • topic=zeek.xxx.log, ~100,000 events per day (total). In this last case there are 20 distinct zeek topics, such as zeek.conn.log and zeek.http.log

            kafka-connect-elasticsearch instances function as consumers to ship data from Kafka to Elasticsearch. The hello-world Sink configuration for kafka-connect-elasticsearch might look like this:

            ...

            ANSWER

            Answered 2021-Nov-08 at 20:02

            In distributed mode, would I still want to submit just a single elasticsearch.properties through a single API call?

            It'd be a JSON file, but yes.

            what dictates the number of workers?

            Up to you. JVM usage is one factor that you can monitor and scale on

            Not really any documentation that I am aware of

            Source https://stackoverflow.com/questions/69888199
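
To illustrate the "JSON file" the answer refers to, here is a rough sketch of building the request body that distributed-mode Kafka Connect accepts on its REST interface (`POST /connectors`). The connector name, Elasticsearch URL, and regex are hypothetical values, and the helper `build_sink_payload` is not part of any library. The config keys shown are standard kafka-connect-elasticsearch settings, but check them against the version you run.

```python
import json


def build_sink_payload(name, topics_regex, es_url, tasks_max=1):
    """Return the JSON body for creating one Elasticsearch sink connector."""
    payload = {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
            "tasks.max": str(tasks_max),   # Connect config values are strings
            "topics.regex": topics_regex,  # one connector can cover many topics
            "connection.url": es_url,
            "key.ignore": "true",
            "schema.ignore": "true",
        },
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    # Hypothetical values: one connector for all zeek.*.log topics.
    print(build_sink_payload("zeek-es-sink", r"zeek\..*\.log", "http://localhost:9200", tasks_max=2))
```

Under these assumptions, the syslog and nginx topics would each get their own payload (or share one via the `topics` key), and each payload would be submitted with a separate API call.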

            QUESTION

            Zeek is not storing files, even after script was loaded. What am I missing?
            Asked 2021-May-12 at 19:20

            I'm trying to configure Zeek in order to store files (every file) on disc, but without any success. OS I'm using: Debian 10.

            What I did so far:

            I can see the scripts are loaded, after checking loaded_scripts.log

            I'm a beginner on Zeek, and I'd like to learn how to enable zeek to save files (that is traversing the network) and store on disk. The only sort of files that is being stored: HTTP and SSL.

            I'm sure I'm making many mistakes, but I'm not able to find the correct way.

            EDIT

            Zeek version I'm using: zeek version 4.1.0-dev.545.

            I'm processing traffic. I haven't tried anything with pcap, but I'll try what you've suggested with "zeek -r the.pcap policy/scripts/frameworks/files/extract-all-files.zeek".

            On Zeek server, I've installed (in order to test) a FTP and a HTTP server. At html folder, I created a pdf file (so I can download it later). I've put two files (a pdf and a plain text file), and I downloaded (using a browser on another computer in the local network) that pdf file. As a result, I can see (looking at ftp.log and http.log) all the files that I mentioned, but those files aren't stored on disc. My doubt is: should they be stored by Zeek?

            ...

            ANSWER

            Answered 2021-May-12 at 19:20

            A common problem when running traffic through Zeek is that packets may have invalid checksums. Zeek by default skips such packets, so the net result is missing logs/files/artifacts that the user expects to be there. Often those invalid checksums are caused by checksum offloading, where the packet capture process grabs transmitted packets before the NIC had a chance to fix the checksums.

            Zeek normally warns when it encounters invalid checksums. Look for something resembling the following on stderr or in reporter.log:

            Your trace file likely has invalid TCP checksums, most likely from NIC checksum offloading. By default, packets with invalid checksums are discarded by Zeek unless using the -C command-line option or toggling the 'ignore_checksums' variable.

            (This is from find-checksum-offloading.zeek, which is included in Zeek's default configuration.)

            You have many options here. You can:

            • run Zeek with -C, as per the above
            • say redef ignore_checksums=T; in a script (usually local.zeek)
            • add the redef at the command line: zeek -r the.pcap ... ignore_checksums=T
            • fix the checksums in the pcap, e.g. with tcprewrite -C -i input.pcap -o fixed.pcap (tcprewrite ships with tcpreplay) -- this is best if others will consume your pcap too.

            Source https://stackoverflow.com/questions/67492567
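
If Zeek is driven from a Python script, the first option above (`-C`) can be wired in when assembling the command line. This is a small sketch that only builds the argument list. The helper name `build_zeek_command` is hypothetical, and the only flags it uses are `-C` and `-r`, both taken from the answer above.

```python
def build_zeek_command(pcap_path, scripts=(), ignore_checksums=True):
    """Build the argv list for reading a pcap file with Zeek."""
    argv = ["zeek"]
    if ignore_checksums:
        argv.append("-C")  # do not discard packets with invalid checksums
    argv += ["-r", pcap_path]
    argv += list(scripts)  # any scripts to load go after the pcap
    return argv


if __name__ == "__main__":
    print(" ".join(build_zeek_command("the.pcap")))
```

The resulting list can be passed to `subprocess.run(...)` on a host where Zeek is installed. Returning a list rather than a shell string avoids quoting problems with unusual file names.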

            QUESTION

            SYSLOG-NG: Sending same log to two different index in elasticsearch
            Asked 2021-Feb-12 at 15:21

            I'm trying to send the same log flow to two different Elasticsearch indexes, because each index is used by users with different roles.

            I use a file for destination too. Here is a sample:

            ...

            ANSWER

            Answered 2021-Feb-12 at 15:21

            You can check the exact error message in the journal logs, as it is suggested by systemctl:

            See "systemctl status syslog-ng.service" and "journalctl -xe" for details.

            Alternatively, you can start syslog-ng in the foreground:

            $ syslog-ng -F --stderr

            You probably have a persist-name collision due to the matching elasticsearch-http() URLs. Please try adding the persist-name() option with 2 unique names, for example:

            Source https://stackoverflow.com/questions/66172511

            QUESTION

            AttributeError: 'Client' object has no attribute 'command' Line 45
            Asked 2021-Feb-01 at 17:12

            I was trying to incorporate Reddit into my bot but every time I run it keeps on giving me this error.

            "Traceback (most recent call last): File "main.py", line 45, in @client.command() AttributeError: 'Client' object has no attribute 'command'"

            My Code:

            ...

            ANSWER

            Answered 2021-Feb-01 at 17:12

            Your problem is that you are mixing up bot and client. Those are two different things.
            A bot is simpler: it just receives commands from you and handles them, meaning it does something. If you want to write in the chat and do more things, you need a client.

            Besides that, you can't create a Bot like this: client = commands.Bot(command_prefix=bot_prefix). The right way would be bot = Bot(command_prefix='$'), since you have already imported Bot.

            The way to go for you would be to stick with the client and instead of commands use:

            Source https://stackoverflow.com/questions/65996399

            QUESTION

            Zeek Workers cannot communicate with Zeek Proxy/manager
            Asked 2020-Apr-29 at 01:14

            I set up a small zeek cluster and had it working fine. Here's my rough setup:

            ...

            ANSWER

            Answered 2020-Apr-29 at 01:14

            Seth Hall nailed it. I messed up the rules without knowing. Thankfully an easy fix. Thanks.

            Source https://stackoverflow.com/questions/61394037

            QUESTION

            How to set mmdb_dir in Zeek/Bro
            Asked 2020-Apr-27 at 09:09

            I am trying to use the GeoIP functionality in Bro/Zeek.

            From the official Zeek Documentation:

            If you see an error message similar to “Failed to open GeoIP location database”, then you may need to either rename or move your GeoIP location database file. If the mmdb_dir value is set to a directory pathname (it is not set by default), then Zeek looks for location database files in that directory.

            Ok, mmdb_dir is not set:

            ...

            ANSWER

            Answered 2020-Apr-23 at 01:15

            The variable is defined (with an empty string value) as a redef'able constant in the init-bare.zeek file that comes with the distribution. So just say

            Source https://stackoverflow.com/questions/61348208

            QUESTION

            Zeek cluster fails with pcap_error: socket: Operation not permitted (pcap_activate)
            Asked 2020-Apr-22 at 07:41

            I'm trying to set up a Zeek IDS cluster (v.3.2.0-dev.271) on 3 Ubuntu 18.04 LTS hosts, to no avail. Running the zeek deploy command fails with the following output:

            ...

            ANSWER

            Answered 2020-Apr-20 at 20:33

            I was experiencing the same error for my standalone setup. Found this question from googling it. More googling the error brought me to a few blogs including one in which the comments mentioned the same error. The author mentioned giving the binaries permissions using setcap:

            Source https://stackoverflow.com/questions/61017158

            QUESTION

            Is it possible to inspect TCP reserved bits with Zeek?
            Asked 2020-Apr-08 at 19:24

            I'm testing Zeek/Bro capabilities in terms of detecting different types of steganography. After working with the ICMP protocol now I am trying to inspect the TCP protocol. I want to detect if the reserved bits in TCP are changed with help of TCP events. Unfortunately without success.

            Is it possible to inspect TCP reserved bits with Zeek?

            ...

            ANSWER

            Answered 2020-Mar-24 at 01:33

            Not out of the box, no. One way to add it would be to expand the TCP_Flags class in your local build so it captures the TCP header's th_x2 field bits as well. Then, use the tcp_packet event, which reports the flags.

            This would be quite slow, though, as it'd be packet-level analysis.

            Source https://stackoverflow.com/questions/60822134

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            In Zeek Network Security Monitor (formerly known as Bro) before 2.6.2, a NULL pointer dereference in the Kerberos (aka KRB) protocol parser leads to DoS because a case-type index is mishandled.

            Install Zeek

            You can download it from GitHub.
            You can use Zeek like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
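
As a sketch of the virtual-environment recommendation, the standard-library `venv` module can create an isolated environment from Python itself. The helper name `create_env` and the directory name `zeek-env` are hypothetical. `with_pip=False` is used only to keep the example quick and offline.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir):
    """Create a virtual environment at env_dir and return its config file path."""
    # with_pip=False keeps the sketch fast; pass with_pip=True to have pip
    # bootstrapped into the new environment.
    venv.create(env_dir, with_pip=False)
    return Path(env_dir) / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "zeek-env")  # hypothetical directory name
        print(cfg.exists())
```

Since the project ships no build file, a plausible workflow is to activate such an environment, clone the repository, and run the code from the source tree.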

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, search for or ask them on Stack Overflow.
            CLONE
          • HTTPS

            https://github.com/Diastro/Zeek.git

          • CLI

            gh repo clone Diastro/Zeek

          • SSH

            git@github.com:Diastro/Zeek.git


            Consider Popular Crawler Libraries

            • scrapy, by scrapy
            • cheerio, by cheeriojs
            • winston, by winstonjs
            • pyspider, by binux
            • colly, by gocolly

            Try Top Libraries by Diastro

            • github-colors, by Diastro (CSS)
            • LBE-OperatingSystem, by Diastro (C++)
            • cat, by Diastro (JavaScript)