tweegraph | adapted BFS twitter crawler that collects relationships

by PGryllos | Python | Version: Current | License: No License

kandi X-RAY | tweegraph Summary

tweegraph is a Python library typically used in Telecommunications, Media, Advertising, and Marketing applications. It has no reported bugs or vulnerabilities, a build file is available, and it has low support. You can download it from GitHub.

tweegraph is a collection of functions, a class and some scripts that use tweepy to provide an interface for crawling Twitter relationships. It is aimed at small research tasks that study the topological traits of a network (sub-graphs of Twitter) involving up to some hundreds of thousands of ids (e.g. crawling ~100000 ids with 5 API keys can be done in under an hour), user data (collecting 2 GB of timeline data usually takes me 5-6 hours), or both. It can be used by small research groups that cannot afford the money and / or time to set up large clusters for distributed crawling on AWS or Azure and want to easily set up a crawling procedure.

The crawler takes advantage of multiple API tokens, if provided, but also works with one. Adding more than one speeds up the crawling significantly.

For crawling user ids you need tweepy and pandas. For crawling timelines you also need MongoDB with pymongo. If you want to use the plot_graph.py script to visually plot the collected graph you additionally need networkx. If you want all of the above, just pip install -r requirements, then install MongoDB, and you will have everything that the project uses.

The graph searching algorithm is a fixed BFS. Fixed means that you have to provide a specific breadth up to which the neighbors of a node will be discovered. This helps the crawler discover larger parts of the network faster and keeps it from getting stuck on nodes with extremely high in-out degree (e.g. the crawler would spend more than 9 days collecting all the followers of Kanye West and would end up creating a single star-shaped graph instead of a highly interconnected network whose properties can be studied).

The crawler needs an initial seed that must be a list of ids (one or more). Choose a good set of initial nodes (ids) for a faster start-up and more meaningful crawling. Usually nodes with high in-out degree are a better choice, but of course that depends on the type of research you are using the crawler for.

The crawler is fault tolerant. You can initiate a crawl and leave it running for a day or more (note that I have never left it running for more than a single day).

The graph traversing mechanism is implemented in /tweegraph/traverser.py by the TwitterGraphTraverser class, which provides an interface for using the mechanism. All connections are stored in links.csv in the form (follower, node). That also means that the crawler treats the network as a directed graph. The collect_graph.py script showcases how the crawler can be used for collecting user relations, and collect_timelines.py uses links.csv as a seed for starting multiple crawls of timeline data for the unique nodes in the file.

The project README shows a simple example of how to initiate a crawling process, as well as a snapshot of a network with 23104 edges collected in about 5-10 minutes.

For the collect_timelines.py script I have used MongoDB to store the results, but the handling of the results can easily be modified by anyone to fit their needs.

I haven't yet added a LICENSE to the project, but keep in mind that the code comes with ABSOLUTELY NO WARRANTY. I have been using the crawler as a tool for my diploma dissertation and I would be more than happy if it could be useful to other people or research groups. Feel free to contact me or open an issue for feedback, possible extensions or problems.

Author: Gryllos Prokopios (gryllosprokopis@gmail.com).
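
To make the fixed-breadth idea concrete, here is a minimal sketch of a breadth-first crawl that takes at most `breadth` followers from each node and records every connection as a (follower, node) pair. It is not the code from /tweegraph/traverser.py: the function name `crawl_fixed_bfs`, its parameters, the `get_followers` callable and the in-memory `FOLLOWERS` table are all hypothetical stand-ins for the Twitter API, and the sketch assumes only followers are traversed.

```python
import csv
import io
from collections import deque
from itertools import islice


def crawl_fixed_bfs(seed, get_followers, breadth, max_nodes):
    """Breadth-first crawl keeping at most `breadth` followers per node.

    `get_followers` is a hypothetical callable that maps a node id to an
    iterable of follower ids. Returns a list of (follower, node) edges.
    """
    queue = deque(seed)
    seen = set(seed)
    edges = []
    explored = 0
    while queue and explored < max_nodes:
        node = queue.popleft()
        explored += 1
        # Cap the neighbours taken from each node so that a hub with a
        # huge follower list cannot dominate the crawl.
        for follower in islice(get_followers(node), breadth):
            edges.append((follower, node))
            if follower not in seen:
                seen.add(follower)
                queue.append(follower)
    return edges


# Hypothetical toy network standing in for real follower lists.
FOLLOWERS = {
    1: [2, 3, 4, 5],
    2: [1, 3],
    3: [4],
    4: [],
    5: [1],
}

edges = crawl_fixed_bfs(
    seed=[1],
    get_followers=lambda node: FOLLOWERS.get(node, []),
    breadth=2,
    max_nodes=10,
)

# Write the edges in the same (follower, node) column order as links.csv.
buffer = io.StringIO()
csv.writer(buffer).writerows(edges)
```

With `breadth=2`, node 1 contributes only its first two followers, so node 5 is never reached in this toy network. That is the trade-off the fixed breadth makes: wider coverage of the network at the cost of incomplete neighbor lists for high-degree nodes.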

            Support

              tweegraph has a low active ecosystem.
              It has 13 stars and 2 forks. There are no watchers for this library.
              It had no major release in the last 6 months.
              There is 1 open issue and 0 have been closed. On average issues are closed in 60 days. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of tweegraph is current.

            Quality

              tweegraph has 0 bugs and 0 code smells.

            Security

              tweegraph has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              tweegraph code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              tweegraph does not have a standard license declared.
              Check the repository for any license declaration and review the terms closely.
              Without a license, all rights are reserved, and you cannot use the library in your applications.

            Reuse

              tweegraph releases are not available. You will need to build from source code and install.
              Build file is available. You can build the component from source.
              Installation instructions are not available. Examples and code snippets are available.
              It has 677 lines of code, 47 functions and 15 files.
              It has high code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed tweegraph and identified the functions below as its top functions. This is intended to give you quick insight into the functionality tweegraph implements and help you decide whether it suits your requirements.
            • Explore the graph
            • Returns the number of nodes in the pool
            • Request data from query
            • Crawl timeline
            • Create a logger for logging
            • Start the crawlers
            • Compute the mean of common hash tags
            • Determine the sentiment between two users
            • Get the common hash tags
            • Get a list of user timeline objects
            • Store timeline in database
            • Decorator for API calls
            • Create a tweepy api instance
            • Calculate the similarity between two sentences
            • Determine the disagreement between two users
            • Calculate the sentiment adamic adamic between two users
            • Calculates the inverse similarity of two users
            • Computes the relationship between two ids
            • Exports the selected nodes
            • Return the similarity between two users
            • Compute the similarity between two users

            tweegraph Key Features

            No Key Features are available at this moment for tweegraph.

            tweegraph Examples and Code Snippets

            No Code Snippets are available at this moment for tweegraph.

            Community Discussions

            No Community Discussions are available at this moment for tweegraph. Refer to the Stack Overflow page for discussions.

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install tweegraph

            You can download it from GitHub.
            You can use tweegraph like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on the Stack Overflow community page.

            CLONE
          • HTTPS

            https://github.com/PGryllos/tweegraph.git

          • CLI

            gh repo clone PGryllos/tweegraph

          • sshUrl

            git@github.com:PGryllos/tweegraph.git
