wrk2 | constant throughput, correct latency recording variant
kandi X-RAY | wrk2 Summary
wrk2 is wrk modified to produce a constant throughput load, and accurate latency details to the high 9s (i.e. it can produce an accurate 99.9999%'ile when run long enough). In addition to wrk's arguments, wrk2 takes a throughput argument (in total requests per second) via either the --rate or -R parameters (default is 1000).

CRITICAL NOTE: Before going further, I'd like to make it clear that this work is in no way intended to be an attack on or a disparagement of the great work that Will Glozer has done with wrk. I enjoyed working with his code, and I sincerely hope that some of the changes I have made might be considered for inclusion back into wrk. As those of you who may be familiar with my latency-related talks and rants will know, the latency measurement issues that I focused on fixing with wrk2 are extremely common in load generators and in monitoring code. I do not ascribe any lack of skill or intelligence to people whose creations repeat them. I was once (as recently as 2-3 years ago) just as oblivious to the effects of Coordinated Omission as the rest of the world still is.

wrk2 replaces wrk's individual request sample buffers with HdrHistograms. wrk2 maintains wrk's Lua API, including its presentation of the stats objects (latency and requests). The stats objects are "emulated" using HdrHistograms. E.g. a request for a raw sample value at index i (see latency[i] below) will return the value at the associated percentile (100.0 * i / __len).

As a result of using HdrHistograms for full (lossless) recording, constant throughput load generation, and accurate tracking of response latency (from the point in time where a request was supposed to be sent per the "plan" to the time that it actually arrived), wrk2's latency reporting is significantly more accurate (as in "correct") than that of wrk's current (Nov. 2014) execution model.

It is important to note that in wrk2's current constant-throughput implementation, measured latencies are [only] accurate to a +/- ~1 msec granularity, due to OS sleep time behavior.

wrk2 is currently in experimental/development mode, and may well be merged into wrk in the future if others see fit to adopt its changes.

The remaining part of the README is wrk's, with minor changes to reflect the additional parameter and output. There is an important and detailed note at the end about wrk2's latency measurement technique, including a discussion of Coordinated Omission, how wrk2 avoids it, and detailed output that demonstrates it.

wrk2 (as is wrk) is a modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue. An optional LuaJIT script can perform HTTP request generation, response processing, and custom reporting. Several example scripts are located in scripts/.
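To make that measurement model concrete, here is a minimal Python sketch (not wrk2's actual C implementation) of constant-rate load generation that records latency from the intended send time on the "plan" rather than the actual send time, which is how wrk2 avoids Coordinated Omission; the send_request() callable is a hypothetical stand-in for issuing one HTTP request.

```python
# Minimal sketch, assuming a blocking send_request() callable. Latency is taken
# from the *intended* send time on the fixed-rate plan, so any stall in the
# client or server shows up in the recorded values instead of being hidden.
import time

def run_constant_rate(send_request, rate_per_sec=1000, duration_sec=10):
    interval = 1.0 / rate_per_sec
    start = time.monotonic()
    latencies = []
    n = 0
    while True:
        intended_send = start + n * interval           # when the plan says to send
        if intended_send - start >= duration_sec:
            break
        now = time.monotonic()
        if intended_send > now:
            time.sleep(intended_send - now)            # wait until it is time to send
        send_request()                                  # blocking request (simplified)
        done = time.monotonic()
        # Key point: measure from the intended send time, not the actual one,
        # so queueing delay behind earlier slow responses is included.
        latencies.append(done - intended_send)
        n += 1
    return latencies
```

In wrk2 itself the equivalent per-request values are recorded into an HdrHistogram, from which high percentiles such as the 99.9999th are reported.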
Trending Discussions on wrk2
QUESTION
I have:
- ML model (PyTorch) that vectorizes data and makes a prediction in ~3.5ms (median ≈ mean)
- HTTP API (FastAPI + uvicorn) that serves simple requests in ~2ms
But when I combine them, the median response time becomes almost 200ms.
What can be the reason for such degradation?
Note that:
- I also tried aiohttp alone, aiohttp + gunicorn and Flask development server for serving - same result
- I tried to send 2, 20 and 100 requests per second - same result
- I do realize that parallel requests can lead to increased latency, but not 30 times!
- CPU load is only ~7%
Here's how I measured model performance (I measured the median time separately, it's nearly the same as the mean time):
...
ANSWER
Answered 2021-Feb-02 at 23:41
For endpoints that do highly intensive calculations, and which presumably take longer than the other endpoints, use a non-coroutine handler.
When you use def instead of async def, by default FastAPI will use run_in_threadpool from Starlette, which in turn uses loop.run_in_executor underneath. run_in_executor will execute the function in the default loop's executor, i.e. in a separate thread. You might also want to check options like ProcessPoolExecutor and ThreadPoolExecutor if you are doing highly CPU-intensive work.
This simple math helps a lot when working with coroutines.
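A minimal sketch of the distinction the answer draws, assuming a FastAPI app; the endpoint paths and the predict_sync() placeholder are illustrative, not taken from the question.

```python
# Hedged sketch: predict_sync() stands in for the CPU-bound vectorization +
# PyTorch inference step; the routes are hypothetical.
from fastapi import FastAPI

app = FastAPI()

def predict_sync(payload: dict) -> dict:
    # Placeholder for the CPU-bound model prediction.
    return {"prediction": 0.0}

# Plain `def` handler: FastAPI runs it via run_in_threadpool, so the CPU-bound
# work happens off the event loop and other requests keep being served.
@app.post("/predict")
def predict(payload: dict):
    return predict_sync(payload)

# An `async def` handler would run predict_sync() directly on the event loop,
# blocking every other request for the full duration of the inference call.
@app.post("/predict-async")
async def predict_async(payload: dict):
    return predict_sync(payload)
```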
QUESTION
Thanks in advance for any pointers or help.
Basically I was expecting the async version of this to perform far better than the sync version, but instead the sync version performs equivalently or somehow even better.
Am I doing something wrong, what gives? I tried without Javalin in case something in the framework was creating a problem, and it seemed to give similar results. I also tried this with just Netty (too long to post the code) and experienced similar results.
I wrote the following code: (javalin-3.12.0 and jetty-9.4.31.v20200723)
...
ANSWER
Answered 2020-Dec-09 at 13:59
Since Jetty 9+ is 100% async from the get-go, this lack of difference makes sense. (In fact, in Jetty 9+ there is extra work done to pretend to be synchronous when using synchronous APIs like InputStream.read() or OutputStream.write().)
Also, your load testing workloads are not realistic.
- You want many more client machines to do the testing with. No single software client alone is capable of stressing a Jetty server; you'll hit system resource limits well before you hit any kind of Jetty serving limits.
- You want at least a 4 to 1 ratio (we test with an 8 to 1 ratio) of client machines to server machines to generate enough load to stress Jetty.
- You want many concurrent connections to the server (think 40,000+).
- Or you want HTTP/2 in the picture (which also stresses the server resources in its own unique ways).
- You want lots of data returned (something that would take multiple network buffers to return).
- You want to sprinkle in a few client connections that are slow to read as well (which on a synchronous server can impact the rest of the connections, the ones that are not slow, simply by consuming too many resources); a minimal sketch of such a slow-reading client follows this list.
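As a hedged illustration of that last point, here is a minimal slow-reading client using only the Python standard library; the host, port, path, chunk size, and pause duration are assumptions for the sketch, not values from the answer.

```python
# Sketch of a deliberately slow reader: it issues one GET request and then
# drains the response a few bytes at a time with long pauses, forcing the
# server to keep the connection and its buffers alive far longer than a
# well-behaved client would.
import socket
import time

def slow_reader(host="localhost", port=8080, path="/", pause_sec=1.0):
    with socket.create_connection((host, port)) as sock:
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        while True:
            chunk = sock.recv(16)      # read in tiny chunks
            if not chunk:
                break
            time.sleep(pause_sec)      # stall between reads

if __name__ == "__main__":
    slow_reader()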
Community Discussions, Code Snippets contain sources that include Stack Exchange Network