ipproxy | Proxy IP extraction tool - A simple tool to crawl proxy IPs | Proxy library
kandi X-RAY | ipproxy Summary
A simple tool to crawl proxy ip.
Top functions reviewed by kandi - BETA
- Get all IPs
- Returns BeautifulSoup object
- Generate set of IP addresses from a page
- Validate ip list
- Validate an IP address
- Return a set of all proxy ips
- Scans the soup
- Returns a set of all IPs
- Returns list of ip addresses
- Returns a set of ips
- Returns a set of ip addresses
- Get a set of IP addresses
- Scrape IP66
- Return a random proxy IP address
- Sort the proxy IP
- Save to csv
- Parse arguments
- Write proxies to csv
- Set logging level
- Run the process
- Read a csv file
ipproxy Key Features
ipproxy Examples and Code Snippets
Community Discussions
Trending Discussions on ipproxy
QUESTION
I have a huge CSV file online and I want to read it line by line without downloading it. But this file is behind a proxy. I wrote this code:
...ANSWER
Answered 2020-Mar-10 at 14:52
By default, the requests.get call will fetch the whole file anyway. You'd need to implement your own HTTP code, down to the socket level, to be able to process the content as it comes in with a plain HTTP GET.
The only way to get partial results and slice the download is to add HTTP "Range" request headers, if the server providing the file supports them (requests lets you set these headers).
The good news is that requests can do that for you under the hood: you can set the stream=True parameter when calling requests, and it will even let you iterate over the contents line by line. Check the documentation on that part.
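A minimal sketch of the streaming approach, assuming the `session` argument is a `requests.Session`. The function name `iter_csv_rows`, the URL, and the proxy address are hypothetical and used only for illustration.

```python
import csv


def iter_csv_rows(session, url, proxies=None, encoding="utf-8"):
    """Yield parsed CSV rows while the response body is still downloading.

    `session` is assumed to be a requests.Session; `proxies` is the usual
    requests mapping such as {"https": "http://host:port"}.
    """
    # stream=True defers the body download until we iterate over it.
    with session.get(url, stream=True, proxies=proxies, timeout=30) as response:
        response.raise_for_status()
        # iter_lines() yields bytes by default, so decode each line explicitly.
        text_lines = (raw.decode(encoding) for raw in response.iter_lines())
        yield from csv.reader(text_lines)


# Usage (hypothetical URL and proxy address):
#
# import requests
# proxies = {"http": "http://proxy.example:3128",
#            "https": "http://proxy.example:3128"}
# with requests.Session() as session:
#     for row in iter_csv_rows(session, "https://example.com/huge.csv", proxies):
#         print(row)
```

Because `iter_lines()` strips line endings, a quoted CSV field that contains an embedded newline would not be reconstructed faithfully by this sketch.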
Here is more or less what requests does under the hood so that you can get your contents line by line:
It will fetch reasonably sized chunks of your data, but certainly not request one line at a time (think ~80 bytes versus 100,000 bytes), because otherwise it would need a new HTTP request for each line, and the overhead of each request is not trivial, even when made over the same TCP connection.
Anyway, since CSV is a text format, neither requests nor any other software can know the size of the lines, let alone the exact size of the "next" line to be read, before setting the range headers accordingly.
So, for this to work, there has to be Python code to:
- accept a request for a "new line" of the CSV; if there are buffered text lines, yield the next line,
- otherwise make an HTTP request for the next 100 KB or so,
- concatenate the downloaded data to the remainder of the last downloaded line,
- split the downloaded data at the last line feed in the binary data,
- save the remainder of the last line,
- convert your binary buffer to text (you'd have to take care of multi-byte character boundaries in a multi-byte encoding like UTF-8, but cutting at newlines may save you that),
- yield the next text line
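The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code requests itself runs: `make_range_fetcher` and `iter_remote_lines` are hypothetical names, `session` is assumed to be a `requests.Session`, and the server is assumed to honour `Range` headers by answering with status 206.

```python
def make_range_fetcher(session, url, proxies=None):
    """Return a callable that downloads bytes start..end (inclusive) of url."""
    def fetch_range(start, end):
        headers = {"Range": f"bytes={start}-{end}"}
        response = session.get(url, headers=headers, proxies=proxies, timeout=30)
        if response.status_code == 416:
            # The requested range starts past the end of the file.
            return b""
        response.raise_for_status()
        if response.status_code != 206:
            raise RuntimeError("server ignored the Range header")
        return response.content
    return fetch_range


def iter_remote_lines(fetch_range, chunk_size=100_000, encoding="utf-8"):
    """Yield text lines, downloading the file one chunk at a time."""
    offset = 0
    remainder = b""  # incomplete last line carried over from the previous chunk
    while True:
        chunk = fetch_range(offset, offset + chunk_size - 1)
        if not chunk:
            break
        offset += len(chunk)
        buffer = remainder + chunk
        # Cut at the last line feed; in UTF-8 the byte 0x0A never occurs
        # inside a multi-byte character, so decoding the cut part is safe.
        cut = buffer.rfind(b"\n")
        if cut == -1:
            remainder = buffer  # no complete line yet, keep buffering
        else:
            complete, remainder = buffer[:cut], buffer[cut + 1:]
            for line in complete.decode(encoding).split("\n"):
                yield line.rstrip("\r")
        if len(chunk) < chunk_size:
            break  # a short chunk means the end of the file was reached
    if remainder:
        # The file did not end with a newline; emit the final partial line.
        yield remainder.decode(encoding).rstrip("\r")


# Usage (hypothetical URL and proxy address):
#
# import requests
# with requests.Session() as session:
#     fetch = make_range_fetcher(session, "https://example.com/huge.csv",
#                                proxies={"https": "http://proxy.example:3128"})
#     for line in iter_remote_lines(fetch):
#         print(line)
```

Keeping the download behind a `fetch_range` callable separates the buffering logic from the HTTP details, so the line splitting can be exercised against in-memory bytes without any network access.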
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install ipproxy
Support