twarc | command line tool and Python library
kandi X-RAY | twarc Summary
twarc is a command line tool and Python library for collecting and archiving Twitter JSON data via the Twitter API. It has separate commands (twarc and twarc2) for working with the older v1.1 API and the newer v2 API and Academic Access (respectively).
Top functions reviewed by kandi - BETA
- Performs Twitter search.
- Return an argument parser.
- Perform Twitter search.
- Main entry point for twitter.
- Wait for a job to finish.
- Performs a handshake.
- Extract related data from a response.
- Generate a timeline.
- Decorator for the rate limit.
- Compute tweets for a given query.
twarc Examples and Code Snippets
twarc2 csv --help
Usage: twarc2 csv [OPTIONS] [INFILE] [OUTFILE]
Convert tweets to CSV.
Options:
--input-data-type [tweets|users|counts|compliance]
Input data type - you can turn "tweets",
sudo apt-get install twarc jq wget
pip3 install twarc
git clone https://github.com/netarchivesuite/webarchive-discovery.git
pushd webarchive-discovery/
git checkout solrconfig
cp -r warc-indexer/src/main/solr/solr7/ ../so-me_solr7_config
git checko
twarc-report/ # local clone
projects/
assets/ # copy of twarc-report/assets/
projectA/
data/ # created by harvest.py
tweets/ # populated with tweet*.json files by harvest.py
metadata.json
Community Discussions
Trending Discussions on twarc
QUESTION
I am using the command line tool twarc to download Twitter data as a csv. I have set up my twarc commands and they successfully execute on the command line without issue. Example command:
twarc dosomething > outputfile.jsonl
While I would like to carry out a collection process over an extended period of time, the output files become a bit too large (10+GB) after running for more than a day.
I would like to run a bash script that executes the twarc command, runs until the output file reaches a certain limit, and then starts a new file.
These questions are related...
...although I've had little luck with the translation.
Could anyone provide some insight on setting up a basic bash script to execute a command, wait until a file grows to X size, and then start again on a new file? Could take it from there...
ANSWER
Answered 2020-Oct-02 at 20:16
The tool you're looking for is aptly named split.
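The answer's original snippet is not preserved here. With GNU split, the usual approach is to pipe the command's output into split with a size limit such as --line-bytes, so that files break on line boundaries. As a rough illustration of the same idea in Python (not the answerer's code), the sketch below copies lines from a stream into numbered files and starts a new file whenever the next line would push the current one past a byte limit. The function name, file prefix, and .jsonl suffix are all assumptions made for this example.

```python
def rotate_lines(stream, prefix, max_bytes):
    """Copy lines from `stream` into numbered files.

    A new file is started whenever writing the next line would push the
    current file past `max_bytes`. Lines are never split, so each output
    file stays valid line-oriented JSON. Returns the list of paths written.
    """
    paths = []
    out = None
    written = 0
    for line in stream:
        size = len(line.encode("utf-8"))
        # Open the first file, or roll over once the limit would be exceeded.
        if out is None or (written > 0 and written + size > max_bytes):
            if out is not None:
                out.close()
            path = f"{prefix}{len(paths):04d}.jsonl"  # hypothetical naming scheme
            out = open(path, "w", encoding="utf-8", newline="")
            paths.append(path)
            written = 0
        out.write(line)
        written += size
    if out is not None:
        out.close()
    return paths
```

One way to use it would be to pipe the twarc command into a small script that calls rotate_lines(sys.stdin, "outputfile_", 10**9). A single line larger than the limit still gets written whole, into a file of its own.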
QUESTION
I have a question about rehydrating tweet text. Any help would be appreciated.
This is the source of my data, which is about corona tweets:
I downloaded a data set from it (named 01-feb-2020).
Then I filtered this data to show only the tweets from 'GB', which is almost 24,000 tweets.
I used twarc to hydrate the tweets' text as follows:
First, install twarc using pip.
Then, type this on the command line: twarc configure
Then, enter the consumer key and secret key.
Then, run a command:
ANSWER
Answered 2020-Aug-05 at 18:24
The Tweet ID collection method (which was copy-pasting) was not correct. After writing proper code to save the tweet IDs into a text file, the problem was solved.
Andy Piper also mentioned the same thing in the comments, which I copy and paste here:
How are you getting from JSON format downloaded, into a CSV format? I'm wondering whether the Tweet ID values are valid. – Andy Piper 5 hours ago
I've managed to reproduce this now, and I believe that in the process of converting your JSON input to CSV / Excel to a list of Tweet IDs to hydrate, you are probably using JavaScript (?) and the Tweet IDs are losing their accuracy. The clue was when I noticed all of the Tweet IDs ending in 0000 in my Excel column. You'll need to use a more precise method of getting the Tweet IDs into twarc.
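The precision loss described in that comment can be reproduced in a few lines. The sketch below is illustrative only: the sample ID in the usage note is made up, both function names are hypothetical, and the input is assumed to be JSONL records with an id field (and optionally id_str), which may not match the data set in the question. It keeps IDs as text from start to finish, and includes a small check showing why a round trip through a 64-bit float (the number type used by JavaScript and, with even fewer digits, by spreadsheets) corrupts them.

```python
import json


def tweet_ids_as_text(jsonl_lines):
    """Extract tweet IDs from JSONL records without ever using floats.

    Python's json module parses large integers exactly, so str() of the
    parsed value is safe. The string form is preferred when the record
    provides one.
    """
    ids = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        ids.append(str(record.get("id_str") or record["id"]))
    return ids


def survives_float(tweet_id):
    """Return True if the ID is unchanged after a round trip through float."""
    return int(float(tweet_id)) == int(tweet_id)
```

For a 19-digit ID such as the made-up 1223395535882784771, survives_float returns False, because a double cannot represent every integer of that size. Writing the output of tweet_ids_as_text to a text file, one ID per line, avoids the problem.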
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install twarc
You can use twarc like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
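As a minimal sketch of the virtual-environment advice above, the following uses Python's standard venv module to create an isolated environment and build the pip command that would install twarc into it. The helper names and the environment directory are assumptions for illustration. The install command is only constructed, not executed, since running it requires network access.

```python
import os
import venv


def make_env(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its interpreter."""
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere.
    sub = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, sub, exe)


def install_command(env_python, package="twarc"):
    """Build (but do not run) the pip command for that environment."""
    return [env_python, "-m", "pip", "install", "--upgrade", package]
```

Passing the result of install_command(make_env("twarc-env")) to subprocess.run with check=True would perform the actual installation inside the new environment, leaving the system Python untouched.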