edgar | A crawler to get company filing data from XBRL filings | Data Visualization library
kandi X-RAY | edgar Summary
A crawler to get company filing data from XBRL filings. The fetcher parses the HTML pages, extracts data based on the XBRL tags it finds, and collects it into filing data arranged by filing date.
Top functions reviewed by kandi - BETA
- setData provides a function to set data
- validate a financial report
- mapReports extracts the report number from a page
- lookupDocType looks up the document type
- failingPageParser parses a failing page
- parseHyperLinkTag parses a hyperlink tag
- parseTableRow parses a table row
- normalize a number
- get missing documents
- parseTableHeading parses a table heading
Community Discussions
Trending Discussions on edgar
QUESTION
So, I'm a very amateur Python programmer, but I hope everything I explain makes sense.
I want to scrape a type of financial document called a "10-K". I'm only interested in a small part of the whole document. An example of the URL I'm trying to scrape is: https://www.sec.gov/Archives/edgar/data/320193/0000320193-20-000096.txt
Now, if I download this document as a .txt, it "only" weighs 12 MB, so in my ignorance it doesn't make much sense that it takes 1-2 minutes to .read() (even though I have a decent PC).
The original code I was using:
...ANSWER
Answered 2021-Jun-13 at 18:07 The time it takes to read a document over the internet is really not related to the speed of your computer, at least in most cases. The most important determinant is the speed of your internet connection. Another important determinant is the speed with which the remote server responds to your request, which will depend in part on how many other requests the remote server is currently trying to handle.
It's also possible that the slow-down is not due to either of the above causes, but rather to measures taken by the remote server to limit scraping or to avoid congestion. It's very common for servers to deliberately reduce responsiveness to clients which make frequent requests, to deny the requests entirely, or to throttle data transmission for everyone, which is another way of controlling server load. In that case, there's not much you're going to be able to do to speed up reading the requests.
From my machine, it takes a bit under 30 seconds to download the 12MB document. Since I'm in Perú it's possible that the speed of the internet connection is a factor, but I suspect that it's not the only issue. However, the data transmission does start reasonably quickly.
If the problem were related to the speed of data transfer between your machine and the server, you could speed things up by using a streaming parser (a phrase you can search for). A streaming parser reads its input in small chunks and assembles them on the fly into tokens, which is basically what you are trying to do. But the streaming parser will deal transparently with the most difficult part, which is to avoid tokens being split between two chunks. However, the nature of the SEC document, which taken as a whole is not very pure HTML, might make it difficult to use standard tools.
Since the part of the document you want to analyse is well past the middle, at least in the example you presented, you won't be able to reduce the download time by much. But that might still be worthwhile.
The basic approach you describe is workable, but you'll need to change it a bit in order to cope with the search strings being split between chunks, as you noted. The basic idea is to append successive chunks until you find the string, rather than just looking at them one at a time.
I'd suggest first identifying the entire document and then deciding whether it's the document you want. That reduces the search issue to a single string, the document terminator (\n</DOCUMENT>\n; the newlines are added to reduce the possibility of false matches).
Here's a very crude implementation, which I suggest you take as an example rather than just copying it into your program. The function docs yields successive complete documents from a URL; the caller can use that to select the one they want. (In the sample code, the first matching document is used, although there are actually two matches in the complete file. If you want all matches, then you will have to read the entire input, in which case you won't have any speed-up at all, although you might still have some savings from not having to parse everything.)
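The implementation itself did not survive the page extraction, so here is a minimal sketch of what such a docs generator could look like. It assumes the requests library, that documents in the filing are delimited by \n</DOCUMENT>\n (the delimiter used in SEC full-text submission files), and a placeholder User-Agent string (SEC.gov rejects requests without one):

```python
import requests

DOC_END = "\n</DOCUMENT>\n"  # delimiter between documents in SEC full-text filings

def docs(url, chunk_size=64 * 1024):
    """Yield successive complete documents from a filing URL, reading the
    response in chunks instead of loading the whole file at once."""
    buffer = ""
    # Placeholder User-Agent; SEC.gov rejects requests without one.
    with requests.get(url, stream=True,
                      headers={"User-Agent": "research-script example@example.com"}) as resp:
        resp.raise_for_status()
        resp.encoding = resp.encoding or "utf-8"
        for chunk in resp.iter_content(chunk_size=chunk_size, decode_unicode=True):
            buffer += chunk
            # The terminator may straddle two chunks; searching the growing
            # buffer rather than each chunk alone handles that case.
            while (end := buffer.find(DOC_END)) != -1:
                yield buffer[:end + len(DOC_END)]
                buffer = buffer[end + len(DOC_END):]

# Stop at the first matching document without reading the rest of the file.
for document in docs("https://www.sec.gov/Archives/edgar/data/320193/0000320193-20-000096.txt"):
    if "<TYPE>10-K" in document:
        print(document[:200])
        break
```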
QUESTION
I'm trying to scrape a bunch of websites. All of them have one particular table with some variations. For example, if you check this URL, it has the attribute value href="#icaec13e17ee4432d9971f5e4b3d32ba1_265" and refers to a tag elsewhere on the page. So I'll only have the attribute value icaec13e17ee4432d9971f5e4b3d32ba1_265. The tag name and the attribute name vary. How can I find the tag given only the attribute value?
...ANSWER
Answered 2021-May-31 at 08:52 You could define a filter function that checks whether an HTML tag has an attribute whose value equals value:
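Since the actual snippet was lost in extraction, here is a minimal sketch of such a filter function with bs4, using a made-up HTML fragment built around the attribute value from the question:

```python
from bs4 import BeautifulSoup

value = "icaec13e17ee4432d9971f5e4b3d32ba1_265"

# Hypothetical fragment: one tag links to the value, another carries it as an id.
html = """
<a href="#icaec13e17ee4432d9971f5e4b3d32ba1_265">link</a>
<span id="icaec13e17ee4432d9971f5e4b3d32ba1_265">target</span>
"""

def has_attr_value(tag):
    # Match any tag, whatever its name, that has some attribute whose
    # value equals the string we are looking for.
    return any(v == value for v in tag.attrs.values())

soup = BeautifulSoup(html, "html.parser")
for tag in soup.find_all(has_attr_value):
    print(tag.name, tag.attrs)  # prints only the <span>; the href has a "#" prefix
```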
QUESTION
Link to url I'm working with: https://www.sec.gov/Archives/edgar/data/789019/000106299321002323/0001062993-21-002323.txt
I can access the text/values contained in some tags, but not in others.
Setup (how I got to the BS soup object):
...ANSWER
Answered 2021-Apr-28 at 22:00 You need to use lxml's XML parser.
For HTML:
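The answer's code blocks were stripped, but the gist is the parser argument to BeautifulSoup; a small sketch of the difference, assuming lxml is installed and using a placeholder User-Agent (SEC.gov rejects requests without one):

```python
from bs4 import BeautifulSoup
import requests

url = ("https://www.sec.gov/Archives/edgar/data/789019/"
       "000106299321002323/0001062993-21-002323.txt")
text = requests.get(url, headers={"User-Agent": "research-script example@example.com"}).text

# "lxml" alone selects the HTML parser, which lowercases tag names and can
# mangle namespaced tags, so some of the filing's tags become unreachable.
html_soup = BeautifulSoup(text, "lxml")

# "xml" selects lxml's XML parser, which preserves case and namespaces,
# so searches against the filing's XML tags behave as expected.
xml_soup = BeautifulSoup(text, "xml")
```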
QUESTION
...ANSWER
Answered 2021-Apr-11 at 17:46 You could convert your date column to datetime, and then use pd.Grouper with groupby, as per below:
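The snippet itself is missing above; a minimal sketch of the idea with illustrative column names (not taken from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2021-01-04", "2021-01-20", "2021-02-03", "2021-02-17"],
    "filings": [3, 1, 4, 2],
})

df["date"] = pd.to_datetime(df["date"])  # plain strings won't group by period
monthly = df.groupby(pd.Grouper(key="date", freq="M"))["filings"].sum()
print(monthly)  # one row per month-end
```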
QUESTION
I am using the pandas_read_xml package for reading and processing XML files into a pandas dataframe. The package works absolutely fine for my purposes in the vast majority of cases. However, the dataframe output is off when reading a URL with just a single tag. Let me illustrate this with the following two examples.
...ANSWER
Answered 2021-Apr-07 at 13:55 First of all, thanks for the feedback! I wrote pandas-read-xml because pandas did not have a pd.read_xml() implementation. You (and the rest of us) will be pleased to know that there is a dev version of pandas read_xml which should be coming soon! (https://pandas.pydata.org/docs/dev/reference/api/pandas.read_xml.html)
As for your current conundrum, this is a result of (and one of my many gripes with) the structure of XML. Unlike JSON, where a single element can still be returned inside a list, XML just has the one tag, which is interpreted as a single value rather than a list.
Essentially, if there is only one "row" tag, it is read as a single value rather than a list of rows, so its "column" tags end up being treated as the rows... I'm not making much sense, am I? Let me explain with your examples.
Here is how I suggest you use it:
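The suggested usage did not survive extraction. As a side note, the pd.read_xml the author mentions has since shipped (pandas 1.3+), and it handles the single-row case uniformly; a small sketch with a made-up document:

```python
import io
import pandas as pd  # requires pandas >= 1.3 for read_xml

xml = """<data>
  <row><name>Acme</name><value>1</value></row>
</data>"""

# Even with a single <row> element, read_xml still returns a one-row
# DataFrame, sidestepping the one-tag-versus-list ambiguity described above.
df = pd.read_xml(io.StringIO(xml))
print(df)
```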
QUESTION
I'm using Google Apps Script
...ANSWER
Answered 2021-Apr-04 at 01:58 Although, unfortunately, I cannot replicate your situation, given that the execution fails only once in a while with this response, how about retrying the request as follows?
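The Apps Script snippet was lost in extraction; the same retry-with-backoff idea, sketched in Python for consistency with the other examples on this page (the function name and limits are illustrative):

```python
import time
import requests

def fetch_with_retries(url, attempts=3, backoff=2.0):
    """Retry a flaky request a few times, waiting longer after each failure."""
    for attempt in range(attempts):
        try:
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()
            return resp
        except requests.RequestException:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(backoff * (attempt + 1))
```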
QUESTION
I have two ArrayLists of ClassRoom objects, and below is the ClassRoom class:
...ANSWER
Answered 2021-Mar-27 at 11:30 You can do it like this:
QUESTION
I'm updating some code that used to use Xml.parse to parse this page: https://www.sec.gov/cgi-bin/browse-edgar?company=&CIK=&type=8-k&owner=exclude&count=100&action=getcurrent
The old code uses Xml to get the table like this:
...ANSWER
Answered 2021-Mar-27 at 01:14 I believe your goal is as follows.
- You want to retrieve the values from entry of the XML data and put them into the spreadsheet using Google Apps Script.
- When I looked at the data from the URL https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&CIK=&type=8-k&company=&dateb=&owner=include&start=0&count=40&output=atom, I confirmed that it is XML data.
- When I looked at your script, it seems that entry is not retrieved.
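The answer's Apps Script code is elided above, but the crux, reading the feed's namespaced entry elements, can be sketched in Python (kept in this page's dominant language; the namespace URI is the standard Atom one, and the User-Agent is a placeholder):

```python
import requests
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
url = ("https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&CIK="
       "&type=8-k&company=&dateb=&owner=include&start=0&count=40&output=atom")

feed = ET.fromstring(requests.get(
    url, headers={"User-Agent": "research-script example@example.com"}).content)

# Atom elements live in a namespace, which is why an unqualified
# find("entry") silently returns nothing.
for entry in feed.findall(f"{ATOM}entry"):
    print(entry.findtext(f"{ATOM}updated"), entry.findtext(f"{ATOM}title"))
```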
QUESTION
I am using an R package, edgarWebR, to parse SEC filings, such as https://www.sec.gov/Archives/edgar/data/1060224/000090480206000008/sa10k306.htm. It returns a dataframe, of which one column, called "raw", is HTML. It breaks the HTML page up into paragraphs, one row per paragraph:

other columns | raw (the paragraph as HTML) | text (the same paragraph as plain text)
First row | We had a net loss of $1.55 million for the year ended December 31, 2016 and have an accumulated deficit of $61.5 million as of December 31, 2016. To achieve sustainable profitability, we must generate increased revenue.
Second row | We have a history of losses, and we cannot assure you that we will achieve profitability.
You can easily replicate an example dataframe by running
...ANSWER
Answered 2021-Mar-23 at 20:30 I've read your related questions here on SO. Interesting work! I believe the solution is somewhere along the lines of:
1: Extract the relevant words from the HTML by doing what you're already doing
QUESTION
I am unable to load the Groceries data set in R.
Can anyone help?
...ANSWER
Answered 2021-Mar-18 at 10:25 Groceries is in the arules package.
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported