IPProxy | IP proxies needed by crawlers: scrapes proxy IPs from nine websites, checks/cleans/stores/updates them, and adds a calling interface | Crawler library
kandi X-RAY | IPProxy Summary
Ubuntu Server 16.04.1 LTS 64-bit, Python 3.5.
Top functions reviewed by kandi - BETA
- Initialize the database
- Get proxies
- Returns a list of all available IPs
- Refreshes the database
- Gets all IP addresses
- Adds the given IP list to the IP list
- Get page content
- Removes all items from the database
IPProxy Key Features
IPProxy Examples and Code Snippets
Community Discussions
Trending Discussions on IPProxy
QUESTION
I have a huge CSV online and I want to read it line by line without downloading it. But this file is behind a proxy. I wrote this code:
...ANSWER
Answered 2020-Mar-10 at 14:52
The requests.get call will get you the whole file anyway. You'd need to implement your own HTTP code, down to the socket level, to be able to process the content as it comes in, in a plain HTTP GET request.
The only way of getting partial results and slicing the download is to add HTTP "Range" request headers, if the server providing the file supports them. (requests can let you set these headers.)
The good news is that requests can do that for you under the hood: you can set the stream=True parameter when calling requests, and it will even let you iterate over the contents line by line. Check the documentation on that part.
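As a rough illustration of the stream=True approach, here is a minimal sketch. The function name iter_remote_lines, the UTF-8 default, the timeout, and the injectable `get` parameter are assumptions made for this example, not part of the original answer; `get` exists only so the logic can be exercised without a network and defaults to requests.get.

```python
def iter_remote_lines(url, proxies=None, encoding="utf-8", get=None):
    """Yield decoded text lines from `url` without loading the whole body.

    `proxies` uses the requests format, e.g. {"http": "http://host:port"}.
    `get` is a hypothetical hook for testing; it defaults to requests.get.
    """
    if get is None:
        import requests  # third-party package: pip install requests
        get = requests.get

    # stream=True defers the body download until we iterate over it.
    with get(url, proxies=proxies, stream=True, timeout=30) as response:
        response.raise_for_status()
        # iter_lines() reads the body in chunks and splits it on line breaks.
        for raw_line in response.iter_lines():
            yield raw_line.decode(encoding)
```

A caller would loop with something like `for line in iter_remote_lines(url, proxies=my_proxies): ...`, where `my_proxies` is whatever proxy mapping the file sits behind.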
Here is more or less what requests does under the hood so that you can get your contents line by line:
It will get reasonably sized chunks of your data, but it will certainly not request one line at a time (think ~80 bytes versus 100,000 bytes), because otherwise it would need a new HTTP request for each line, and the overhead of each request is not trivial, even if made over the same TCP connection.
Anyway, since CSV is a text format, neither requests nor any other software can know the size of the lines, let alone the exact size of the "next" line to be read, before setting the range headers accordingly.
So, for this to work, there has to be Python code to:
- accept a request for a "new line" of the CSV; if there are buffered text lines, yield the next line,
- otherwise make an HTTP request for the next 100 KB or so,
- concatenate the downloaded data to the remainder of the last downloaded line,
- split the downloaded data at the last line feed in the binary data,
- save the remainder of the last line,
- convert your binary buffer to text (you'd have to take care of multi-byte character boundaries in a multi-byte encoding like UTF-8, but cutting at newlines may save you that),
- yield the next text line
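The steps above can be sketched as a generator. This is only an illustration under stated assumptions: iter_lines_by_range, make_http_fetcher, and the fetch_range(start, end) callable are hypothetical names introduced here, and the server is assumed to honour Range headers (status 206) and to answer 416 once the requested range starts past the end of the file.

```python
def iter_lines_by_range(fetch_range, chunk_size=100_000, encoding="utf-8"):
    """Yield text lines, downloading the resource one byte range at a time.

    `fetch_range(start, end)` is a hypothetical callable that returns the
    bytes at offsets start..end (inclusive), or b"" past the end of the file.
    """
    offset = 0
    remainder = b""  # incomplete last line carried over from the previous chunk
    while True:
        chunk = fetch_range(offset, offset + chunk_size - 1)
        if not chunk:
            break
        offset += len(chunk)
        buffer = remainder + chunk
        # Cut at the last line feed. In UTF-8 the byte 0x0A never occurs inside
        # a multi-byte character, so this cut cannot split a character.
        head, newline, tail = buffer.rpartition(b"\n")
        if not newline:
            remainder = buffer  # no complete line yet, keep accumulating
            continue
        remainder = tail
        for raw_line in head.split(b"\n"):
            yield raw_line.decode(encoding).rstrip("\r")
    if remainder:
        yield remainder.decode(encoding).rstrip("\r")


def make_http_fetcher(url, proxies=None):
    """Build a fetch_range callable backed by requests (assumed installed)."""
    import requests  # third-party package: pip install requests

    session = requests.Session()

    def fetch_range(start, end):
        headers = {"Range": f"bytes={start}-{end}"}
        response = session.get(url, headers=headers, proxies=proxies, timeout=30)
        if response.status_code == 416:  # range starts past the end of the file
            return b""
        if response.status_code != 206:  # server ignored the Range header
            raise RuntimeError("server did not honour the Range request")
        return response.content

    return fetch_range
```

Usage would look like `for line in iter_lines_by_range(make_http_fetcher(url, proxies)): ...`. Keeping the download behind a plain callable is a design choice that lets the buffering logic be checked against in-memory bytes.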
QUESTION
I asked this question: Wrap packets in connect requests until reach the last proxy
And I learnt that to create a chain of proxies I have to:
- create a socket
- connect the socket to proxy A
- create a tunnel via A to proxy B, either with the HTTP or SOCKS protocol
- similarly, create a tunnel via [A,B] to proxy C
- similarly, create a tunnel via [A,B,C] to D
- ... until your last proxy is instructed to build the tunnel to the final target T
I understand what I have to do up to the second point, because I think I just have to add the "CONNECT" header to the HTTP request to proxy A. But my question is, in this example HTTP request:
...ANSWER
Answered 2017-Sep-05 at 21:51
There is no Host header with CONNECT. I.e. to request HTTP proxy A to create a tunnel to HTTP proxy B you just use:
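A minimal Python sketch of this idea follows, assuming HTTP/1.0-style CONNECT requests with no extra headers and proxies that need no authentication. The function names, the `hops` list, and the injectable `connect` parameter are hypothetical and exist only for illustration; SOCKS hops are not covered.

```python
import socket


def build_connect_request(host, port):
    """Build a bare CONNECT request line followed by the blank line."""
    return f"CONNECT {host}:{port} HTTP/1.0\r\n\r\n".encode("ascii")


def read_response_head(sock):
    """Read the proxy's reply up to the blank line that ends its headers."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(1)  # one byte at a time, so no tunnelled data is consumed
        if not chunk:
            raise ConnectionError("proxy closed the connection during CONNECT")
        data += chunk
    return data


def open_chain(hops, connect=socket.create_connection):
    """Connect to hops[0], then ask each hop in turn to tunnel to the next.

    `hops` is a list of (host, port) pairs: the proxies in order, followed by
    the final target. Returns a socket that is tunnelled to the last entry.
    """
    sock = connect(hops[0])
    for host, port in hops[1:]:
        # This request travels through the tunnels built so far, so it is
        # answered by the most recently reached proxy.
        sock.sendall(build_connect_request(host, port))
        status_line = read_response_head(sock).split(b"\r\n", 1)[0]
        parts = status_line.split()
        if len(parts) < 2 or parts[1] != b"200":
            sock.close()
            raise ConnectionError(f"CONNECT to {host}:{port} failed: {status_line!r}")
    return sock
```

For example, `open_chain([proxy_a, proxy_b, target])` would connect to A, ask A for a tunnel to B, and then ask B (through A) for a tunnel to the target; the returned socket can then carry the actual request to the target.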
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install IPProxy
You can use IPProxy like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.