edgar | A crawler to get company filing data from XBRL filings

by palafrank | Go | Version: Current | License: Apache-2.0

kandi X-RAY | edgar Summary

edgar is a Go library typically used in Analytics and Data Visualization applications. It has no reported bugs or vulnerabilities, a permissive license, and low support activity. You can download it from GitHub.

A crawler to get company filing data from XBRL filings. The fetcher parses the HTML pages, extracts data based on the XBRL tags it finds, and collects it into filing data arranged by filing date.

Support

edgar has a low active ecosystem.
It has 10 stars, 2 forks, and 3 watchers.
It had no major release in the last 6 months.
There are 2 open issues and 2 have been closed. There are no pull requests.
It has a neutral sentiment in the developer community.
The latest version of edgar is current.

Quality

              edgar has no bugs reported.

Security

              edgar has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

              edgar is licensed under the Apache-2.0 License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

              edgar releases are not available. You will need to build from source code and install.

            Top functions reviewed by kandi - BETA

kandi has reviewed edgar and discovered the following top functions. This is intended to give you an instant insight into the functionality edgar implements, and to help you decide if it suits your requirements.
• setData provides a function to set data
• validate a financial report
• mapReports extracts the report number from a page
• lookupDocType looks up the document type
• failingPageParser parses a filing page
• parseHyperLinkTag parses a hyperlink tag
• parseTableRow parses a table row
• normalize a number
• get missing documents
• parseTableHeading parses a table heading

            edgar Key Features

            No Key Features are available at this moment for edgar.

            edgar Examples and Code Snippets

            No Code Snippets are available at this moment for edgar.

            Community Discussions

            QUESTION

            How to properly read large html in chunks with .iter_content?
            Asked 2021-Jun-13 at 19:35

So, I'm a very amateur Python programmer, but I hope everything I explain makes sense.

I want to scrape a type of financial document called a "10-K". I'm just interested in a small part of the whole document. An example of the URL I try to scrape is: https://www.sec.gov/Archives/edgar/data/320193/0000320193-20-000096.txt

Now, if I download this document as a .txt, it "only" weighs 12 MB. So, in my ignorance, it doesn't make much sense that it takes 1-2 minutes to .read() (even though I have a decent PC).

            The original code I was using:

            ...

            ANSWER

            Answered 2021-Jun-13 at 18:07

            The time it takes to read a document over the internet is really not related to the speed of your computer, at least in most cases. The most important determinant is the speed of your internet connection. Another important determinant is the speed with which the remote server responds to your request, which will depend in part on how many other requests the remote server is currently trying to handle.

            It's also possible that the slow-down is not due to either of the above causes, but rather to measures taken by the remote server to limit scraping or to avoid congestion. It's very common for servers to deliberately reduce responsiveness to clients which make frequent requests, or even to deny the requests entirely. Or to reduce the speed of data transmission to everyone, which is another way of controlling server load. In that case, there's not much you're going to be able to do to speed up reading the requests.

            From my machine, it takes a bit under 30 seconds to download the 12MB document. Since I'm in Perú it's possible that the speed of the internet connection is a factor, but I suspect that it's not the only issue. However, the data transmission does start reasonably quickly.

            If the problem were related to the speed of data transfer between your machine and the server, you could speed things up by using a streaming parser (a phrase you can search for). A streaming parser reads its input in small chunks and assembles them on the fly into tokens, which is basically what you are trying to do. But the streaming parser will deal transparently with the most difficult part, which is to avoid tokens being split between two chunks. However, the nature of the SEC document, which taken as a whole is not very pure HTML, might make it difficult to use standard tools.

            Since the part of the document you want to analyse is well past the middle, at least in the example you presented, you won't be able to reduce the download time by much. But that might still be worthwhile.

            The basic approach you describe is workable, but you'll need to change it a bit in order to cope with the search strings being split between chunks, as you noted. The basic idea is to append successive chunks until you find the string, rather than just looking at them one at a time.

I'd suggest first identifying the entire document and then deciding whether it's the document you want. That reduces the search issue to a single string, the document terminator (\n</DOCUMENT>\n; the newlines are added to reduce the possibility of false matches).

            Here's a very crude implementation, which I suggest you take as an example rather than just copying it into your program. The function docs yields successive complete documents from a url; the caller can use that to select the one they want. (In the sample code, the first matching document is used, although there are actually two matches in the complete file. If you want all matches, then you will have to read the entire input, in which case you won't have any speed-up at all, although you might still have some savings from not having to parse everything.)
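The crude implementation itself isn't reproduced above. The following is a minimal sketch of the approach, assuming the standard requests library and the \n</DOCUMENT>\n terminator described above; the function name, chunk size, and User-Agent value are illustrative, not the answer's exact code.

import requests

def docs(url, chunk_size=64 * 1024):
    # Yield successive complete <DOCUMENT>...</DOCUMENT> blocks from url.
    terminator = "\n</DOCUMENT>\n"
    buffer = ""
    with requests.get(url, stream=True,
                      headers={"User-Agent": "example@example.com"}) as resp:
        resp.encoding = resp.encoding or "utf-8"
        for chunk in resp.iter_content(chunk_size=chunk_size, decode_unicode=True):
            # Accumulate chunks so a terminator split across two chunks
            # is still found once both halves have arrived.
            buffer += chunk
            end = buffer.find(terminator)
            while end != -1:
                yield buffer[:end + len(terminator)]
                buffer = buffer[end + len(terminator):]
                end = buffer.find(terminator)

# Usage: stop downloading as soon as the wanted document has been seen.
for doc in docs("https://www.sec.gov/Archives/edgar/data/320193/0000320193-20-000096.txt"):
    if "<TYPE>10-K" in doc:
        print(doc[:200])
        break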

            Source https://stackoverflow.com/questions/67958718

            QUESTION

            Beautifulsoup: Is it possible to get tag name and attribute name by its value?
            Asked 2021-May-31 at 08:55

I'm trying to scrape a bunch of websites. All of them have one particular table with some changes. For example, in one URL an element has the attribute value href="#icaec13e17ee4432d9971f5e4b3d32ba1_265" and refers to another tag in the page. However, in another URL the same thing is denoted differently, so all I reliably have is the attribute value icaec13e17ee4432d9971f5e4b3d32ba1_265. The tag name and the attribute name vary. How can I get them from the attribute value?

            ...

            ANSWER

            Answered 2021-May-31 at 08:52

            You could define a filter function that checks if there is one HTML tag with a attribute value equal to value:
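The answer's code is elided above. A minimal sketch of such a filter function might look like this (the HTML fragment is invented for illustration; the attribute value is the one from the question):

from bs4 import BeautifulSoup

html = """
<html><body>
  <a href="#icaec13e17ee4432d9971f5e4b3d32ba1_265">link text</a>
  <p id="unrelated">other text</p>
</body></html>
"""

def has_attr_value(value):
    # Build a filter that matches any tag carrying `value` among its attribute values.
    def matcher(tag):
        return any(v == value or (isinstance(v, list) and value in v)
                   for v in tag.attrs.values())
    return matcher

soup = BeautifulSoup(html, "html.parser")
tag = soup.find(has_attr_value("#icaec13e17ee4432d9971f5e4b3d32ba1_265"))
if tag is not None:
    print(tag.name, tag.attrs)  # reveals which tag and attribute carried the value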

            Source https://stackoverflow.com/questions/67770019

            QUESTION

            How to navigate to certain tags in BeautifulSoup object?
            Asked 2021-Apr-28 at 22:00

            Link to url I'm working with: https://www.sec.gov/Archives/edgar/data/789019/000106299321002323/0001062993-21-002323.txt

            I can access the text/values contained in some tags, but not in others.

            Setup (how I got to the BS soup object):

            ...

            ANSWER

            Answered 2021-Apr-28 at 22:00

            You need to use lxml's XML parser.

            For HTML:
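The answer's snippets are elided above. A hedged sketch of both setups (the User-Agent value is a placeholder; SEC endpoints generally expect one to be set):

import requests
from bs4 import BeautifulSoup

url = ("https://www.sec.gov/Archives/edgar/data/789019/"
       "000106299321002323/0001062993-21-002323.txt")
resp = requests.get(url, headers={"User-Agent": "example@example.com"})

# XML parsing requires lxml to be installed; features="xml" selects lxml's XML parser.
soup_xml = BeautifulSoup(resp.text, features="xml")

# For HTML, lxml's HTML parser is the usual choice instead:
soup_html = BeautifulSoup(resp.text, "lxml")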

            Source https://stackoverflow.com/questions/67292833

            QUESTION

            How to aggregate rows of one column based on time intervals in another column?
            Asked 2021-Apr-11 at 17:46

I have a dataset containing Reddit data; more specifically, all posts made in the subreddit GME that mention "GME". See below for what this looks like:

            For reproduction purposes, here is the dictionary of the first 25 rows:

            ...

            ANSWER

            Answered 2021-Apr-11 at 17:46

            You could convert your date column to datetime, and then use pd.Grouper with groupby, as per below:
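The answer's snippet is elided above. A minimal sketch of the idea, with invented column names since the question's sample rows aren't reproduced here:

import pandas as pd

df = pd.DataFrame({
    "created": ["2021-01-28 10:00:05", "2021-01-28 10:42:51", "2021-01-28 11:15:02"],
    "title":   ["GME to the moon", "Still holding GME", "GME earnings thread"],
})

# Convert the date column to datetime, then group rows into fixed time intervals.
df["created"] = pd.to_datetime(df["created"])
hourly = (df.groupby(pd.Grouper(key="created", freq="H"))["title"]
            .agg(list)
            .reset_index())
print(hourly)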

            Source https://stackoverflow.com/questions/67045748

            QUESTION

            Pandas read xml not working properly for single tag xml
            Asked 2021-Apr-07 at 13:55

I am using the pandas_read_xml package for reading and processing XML files into a pandas dataframe. The package works absolutely fine for my purpose in the vast majority of cases. However, the dataframe output is kind of off when reading a URL with just a single tag. Let me illustrate this with the following two examples.

            ...

            ANSWER

            Answered 2021-Apr-07 at 13:55

            First of all, thanks for the feedback! I wrote pandas-read-xml because pandas did not have a pd.read_xml() implementation. You (and the rest of us) will be pleased to know that there is a dev version of pandas read_xml which should be coming soon! (https://pandas.pydata.org/docs/dev/reference/api/pandas.read_xml.html)

As for your current conundrum, this is a result of (and one of my many gripes with) the structure of XML. Unlike JSON, where single elements can be returned within a list, the XML structure just has one XML tag, which is interpreted as a single value rather than a list.

Essentially, if there is only one "row" tag, then the "column" tags are treated as the row tags... I'm not making much sense, am I? Let me explain with your examples.

            Here is how I suggest you use it:
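The author's suggested usage is elided above. As a hedged illustration of the single-row case using the since-released pandas read_xml (pandas >= 1.3 with lxml; the sample XML is invented for the example):

import io
import pandas as pd

single_row_xml = """<data>
  <row><name>edgar</name><lang>Go</lang></row>
</data>"""

# Each matched <row> element becomes one DataFrame row, even when only one exists.
df = pd.read_xml(io.StringIO(single_row_xml), xpath=".//row")
print(df)
#     name lang
# 0  edgar   Go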

            Source https://stackoverflow.com/questions/66710039

            QUESTION

            Google Apps Script: 403 error for UrlFetchApp
            Asked 2021-Apr-04 at 01:58

I'm using Google Apps Script.

            ...

            ANSWER

            Answered 2021-Apr-04 at 01:58

Although, unfortunately, I cannot replicate your situation, once in a while I also get a failed execution with this response. As a workaround, how about retrying the request, as follows?

            Sample script:
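The Google Apps Script sample itself is elided above. The retry idea it describes looks roughly like this when sketched in Python (function name, retry count, and backoff are illustrative):

import time
import requests

def fetch_with_retry(url, retries=3, backoff=2.0):
    # Retry a request a few times, waiting longer after each failed attempt.
    for attempt in range(retries):
        resp = requests.get(url, headers={"User-Agent": "example@example.com"})
        if resp.status_code == 200:
            return resp
        time.sleep(backoff * (attempt + 1))
    resp.raise_for_status()  # give up and surface the last error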

            Source https://stackoverflow.com/questions/66936814

            QUESTION

            Add or Update elements of an ArrayList based on another ArrayList
            Asked 2021-Mar-27 at 13:37

I have two ArrayLists of ClassRoom objects; the ClassRoom class is shown below:

            ...

            ANSWER

            Answered 2021-Mar-27 at 11:30

            QUESTION

            google apps script Xmlservice
            Asked 2021-Mar-27 at 01:14

            I'm updating some code that used to use Xml.parse to parse this page https://www.sec.gov/cgi-bin/browse-edgar?company=&CIK=&type=8-k&owner=exclude&count=100&action=getcurrent

            The old code uses Xml to get the table like... this

            ...

            ANSWER

            Answered 2021-Mar-27 at 01:14

I believe your goal is as follows.

• You want to retrieve the values from entry of the XML data and put them into the Spreadsheet using Google Apps Script.
            Modification points:
            • When I saw the data from the URL of https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&CIK=&type=8-k&company=&dateb=&owner=include&start=0&count=40&output=atom, I confirmed that the data is the XML data.
            • When I saw your script, it seems that entry is not retrieved.
            Modified script:
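The modified Apps Script is elided above. As a rough Python illustration of the same idea, retrieving the entry elements from the Atom feed (namespace handling is the key step; the User-Agent value is a placeholder):

import urllib.request
import xml.etree.ElementTree as ET

url = ("https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&CIK=&type=8-k"
       "&company=&dateb=&owner=include&start=0&count=40&output=atom")
req = urllib.request.Request(url, headers={"User-Agent": "example@example.com"})
with urllib.request.urlopen(req) as resp:
    root = ET.fromstring(resp.read())

# Atom elements live in the Atom namespace; each filing is an <entry>.
ns = {"atom": "http://www.w3.org/2005/Atom"}
rows = [(e.findtext("atom:title", namespaces=ns),
         e.findtext("atom:updated", namespaces=ns))
        for e in root.findall("atom:entry", ns)]
print(rows[:3])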

            Source https://stackoverflow.com/questions/66825692

            QUESTION

            Using rvest (or another R package) to detect when the start of an HTML paragraph is a different format (e.g. emboldened)
            Asked 2021-Mar-25 at 03:31

            I am using an R package, edgarWebR, to parse SEC filings, such as https://www.sec.gov/Archives/edgar/data/1060224/000090480206000008/sa10k306.htm. It returns a dataframe, of which one column - called "raw" - is HTML. It breaks up the HTML page into paragraphs, one row per paragraph:

other columns | raw (HTML) | text

First row | We had a net loss of $1.55 million for the year ended December 31, 2016 and have an accumulated deficit of $61.5 million as of December 31, 2016. To achieve sustainable profitability, we must generate increased revenue. | the same paragraph as plain text

Second row | We have a history of losses, and we cannot assure you that we will achieve profitability. | the same paragraph as plain text

            You can easily replicate an example dataframe by running

            ...

            ANSWER

            Answered 2021-Mar-23 at 20:30

            I've read your related questions here on SO. Interesting work! I believe the solution is somewhere along the lines of:

            1: Extract the relevant words from the HTML by doing what you're already doing
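The rest of the answer is truncated above. Since the "raw" column holds per-paragraph HTML, the detection step it builds toward might look like this sketch (in Python with BeautifulSoup rather than the question's R/rvest; the function name and sample fragment are invented):

from bs4 import BeautifulSoup

raw = "<p><b>We have a history of losses,</b> and we cannot assure you ...</p>"

def starts_emboldened(raw_paragraph):
    # True if the paragraph's first non-whitespace child is a <b>/<strong> tag.
    p = BeautifulSoup(raw_paragraph, "html.parser").find("p")
    if p is None:
        return False
    for child in p.children:
        if isinstance(child, str):
            if child.strip():
                return False  # leading plain text, so the paragraph doesn't start bold
            continue
        return child.name in ("b", "strong")
    return False

print(starts_emboldened(raw))  # True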

            Source https://stackoverflow.com/questions/66681710

            QUESTION

            Why am I unable to load "Groceries" data set in R?
            Asked 2021-Mar-18 at 10:25

            I am unable to load Groceries data set in R.

            Can anyone help?

            ...

            ANSWER

            Answered 2021-Mar-18 at 10:25

            Groceries is in the arules package.

            Source https://stackoverflow.com/questions/66689053

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install edgar

            You can download it from GitHub.
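In a Go module project, a typical way to fetch the library is via go get (assuming a standard Go toolchain; the repository does not document an official install command):

go get github.com/palafrank/edgar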

            Support

For any new features, suggestions, or bugs, create an issue on GitHub. If you have any questions, check and ask questions on the Stack Overflow community page.
            CLONE
          • HTTPS

            https://github.com/palafrank/edgar.git

          • CLI

            gh repo clone palafrank/edgar

• SSH

            git@github.com:palafrank/edgar.git
