sse | HTML5 Server-Sent-Events for Go | Websocket library
kandi X-RAY | sse Summary
HTML5 Server-Sent Events for Go.
Top functions reviewed by kandi - BETA
- SendBytes sends an event to the channel
- format returns the event string.
- New returns a new Streamer
sse Key Features
sse Examples and Code Snippets
@GetMapping("/stream-sse-mvc")
public SseEmitter streamSseMvc() {
    SseEmitter emitter = new SseEmitter();
    ExecutorService sseMvcExecutor = Executors.newSingleThreadExecutor();
    sseMvcExecutor.execute(() -> {
        // send events to the emitter here, then call emitter.complete()
    });
    return emitter;
}
public static void main(String... args) throws Exception {
    Client client = ClientBuilder.newClient();
    WebTarget target = client.target(url);
    try (SseEventSource eventSource = SseEventSource.target(target).build()) {
        // register an event consumer and open the SSE connection here
    }
}
public static String simpleSSEHeader() throws InterruptedException {
Client client = ClientBuilder.newBuilder()
.register(AddHeaderOnRequestFilter.class)
.build();
WebTarget webTarget = client.target(T
Community Discussions
Trending Discussions on sse
QUESTION
I'm trying to make sure gcc vectorizes my loops. It turns out that by using -march=znver1 (or -march=native) gcc skips some loops even though they can be vectorized. Why does this happen?
In this code, the second loop, which multiplies each element by a scalar, is not vectorised:
...ANSWER
Answered 2022-Apr-10 at 02:47
The default -mtune=generic has -mprefer-vector-width=256, and -mavx2 doesn't change that.
znver1 implies -mprefer-vector-width=128, because that's the full native width of the HW. An instruction using 32-byte YMM vectors decodes to at least 2 uops, more if it's a lane-crossing shuffle. For simple vertical SIMD like this, 32-byte vectors would be ok; the pipeline handles 2-uop instructions efficiently. (And I think the front-end is 6 uops wide but only 5 instructions wide, so max front-end throughput isn't available using only 1-uop instructions.) But when vectorization would require shuffling, e.g. with arrays of different element widths, GCC code-gen can get messier with 256-bit or wider.
And vmovdqa ymm0, ymm1 mov-elimination only works on the low 128-bit half on Zen1. Also, normally using 256-bit vectors would imply one should use vzeroupper afterwards, to avoid performance problems on other CPUs (but not Zen1).
I don't know how Zen1 handles misaligned 32-byte loads/stores where each 16-byte half is aligned but in separate cache lines. If that performs well, GCC might want to consider increasing the znver1 -mprefer-vector-width to 256. But wider vectors mean more cleanup code if the size isn't known to be a multiple of the vector width.
Ideally GCC would be able to detect easy cases like this and use 256-bit vectors there. (Pure vertical, no mixing of element widths, constant size that's a multiple of 32 bytes.) At least on CPUs where that's fine: znver1, but not bdver2 for example, where 256-bit stores are always slow due to a CPU design bug.
You can see the result of this choice in the way it vectorizes your first loop, the memset-like loop, with a vmovdqu [rdx], xmm0. https://godbolt.org/z/E5Tq7Gfzc
So given that GCC has decided to only use 128-bit vectors, which can only hold two uint64_t elements, it (rightly or wrongly) decides it wouldn't be worth using vpsllq / vpaddd to implement qword *5 as (v<<2) + v, vs. doing it with integer in one LEA instruction.
Almost certainly wrongly in this case, since it still requires a separate load and store for every element or pair of elements. (And loop overhead, since GCC's default is not to unroll except with PGO, -fprofile-use. SIMD is like loop unrolling, especially on a CPU that handles 256-bit vectors as 2 separate uops.)
I'm not sure exactly what GCC means by "not vectorized: unsupported data-type". x86 doesn't have a SIMD uint64_t multiply instruction until AVX-512, so perhaps GCC assigns it a cost based on the general case of having to emulate it with multiple 32x32 => 64-bit pmuludq instructions and a bunch of shuffles. And it's only after it gets over that hump that it realizes that it's actually quite cheap for a constant like 5 with only 2 set bits?
That would explain GCC's decision-making process here, but I'm not sure it's exactly the right explanation. Still, these kinds of factors are what happen in a complex piece of machinery like a compiler. A skilled human can easily make smarter choices, but compilers just do sequences of optimization passes that don't always consider the big picture and all the details at the same time.
-mprefer-vector-width=256 doesn't help:
Not vectorizing uint64_t *= 5 seems to be a GCC9 regression.
(The benchmarks in the question confirm that an actual Zen1 CPU gets a nearly 2x speedup, as expected from doing 2x uint64 in 6 uops vs. 1x in 5 uops with scalar. Or 4x uint64_t in 10 uops with 256-bit vectors, including two 128-bit stores which will be the throughput bottleneck along with the front-end.)
Even with -march=znver1 -O3 -mprefer-vector-width=256, we don't get the *= 5 loop vectorized with GCC9, 10, or 11, or current trunk. As you say, we do with -march=znver2. https://godbolt.org/z/dMTh7Wxcq
We do get vectorization with those options for uint32_t (even leaving the vector width at 128-bit). Scalar would cost 4 operations per vector uop (not instruction), regardless of 128 or 256-bit vectorization on Zen1, so this doesn't tell us whether *= is what makes the cost-model decide not to vectorize, or just the 2 vs. 4 elements per 128-bit internal uop.
With uint64_t, changing to arr[i] += arr[i]<<2; still doesn't vectorize, but arr[i] <<= 1; does. (https://godbolt.org/z/6PMn93Y5G). Even arr[i] <<= 2; and arr[i] += 123 in the same loop vectorize, to the same instructions that GCC thinks aren't worth it for vectorizing *= 5, just different operands, constant instead of the original vector again. (Scalar could still use one LEA.) So clearly the cost-model isn't looking as far as final x86 asm machine instructions, but I don't know why arr[i] += arr[i] would be considered more expensive than arr[i] <<= 1;, which is exactly the same thing.
GCC8 does vectorize your loop, even with 128-bit vector width: https://godbolt.org/z/5o6qjc7f6
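For reference, here is a minimal, self-contained reconstruction (mine, not the asker's exact code) of the kind of loop this answer is about; pasting it into Godbolt with -O3 -march=znver1 vs. -march=znver2 shows the difference discussed above:

#include <stdint.h>
#include <stddef.h>

/* Second loop of the question, reconstructed: multiply every uint64_t by 5.
   GCC 9-11 with -O3 -march=znver1 keep this scalar (one LEA per element),
   while -march=znver2 (or GCC 8) vectorize it, as described above. */
void mul5(uint64_t *arr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        arr[i] *= 5;
}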
QUESTION
My understanding is that AMD64 was invented by AMD as a 64-bit version of x86.
New instructions are added by both AMD and Intel (because those are the only two companies that implement AMD64). In effect, there is no central standard like there is in C++.
When new instructions are added, they are usually part of a "set" like SSE or AVX.
In my research, the designation for some instructions is inconsistent, i.e. it's not always clear which set an instruction belongs to.
What defines the instruction sets? Is there universal agreement on which instructions are in which sets, or is it decided by convention?
...ANSWER
Answered 2022-Mar-10 at 03:14
There is no such thing, and I cannot imagine how there would be.
One or more people at Intel define their instruction sets for their products, period. AMD happens to have been able to make legal clones (which they have), and as part of that agreement, or perhaps even outside it but with some penalty, they add additional instructions/features. First off, it is on them to do it and keep some sense of compatibility, if they even want to be compatible. Second, if they want to add extensions and can get away with it, it is purely one or more engineers within AMD. Then if Intel goes and makes some new instructions, it is one or more Intel engineers. As history played out, you then have other completely disconnected parties like the gnu tools folks, the microsoft tools folks and a list of others, as well as operating system folks that use tools and make their products, choosing directly or indirectly what instructions get used. And as history plays out, some of these Intel-only or AMD-only instructions may be favored by one party or another. And if that party happens to have influence (Microsoft Windows, Linux, etc.), to the point that it puts pressure on Intel or AMD to lean one way or another, then it is their management and engineering that does that, within their company. They can choose to not go with what the users want and try to push users in their direction. Simple sales of one product line or another may dictate the success or failure of each party's decisions.
I cannot think of a standard (or many) that folks actually agree on, even though they might have representatives that wear shirts with the same logo on them who participate in the standards bodies. From PCIe to Java to C++, etc. (C and C++ being really bad, since they were written first and only standardized later, with the standards being patches that leave too much to individual compiler authors' choices of interpretation). You want to win at business, you differentiate yourself from the others: I have an x86 clone that is much cheaper but performs 95% as well as Intel's. Plus I added my own stuff that Intel does not have, which I pay employees to add to open source tools, making those open source things optional to gain that feature/performance boost. That differentiates me from the competition, and for some users locks me in as their only choice.
Instruction sets for an architecture (x86 has a long line of architectures over time, arm does too and they are more organized about it imo, etc) are defined by that individual or teams within that company. End of story. At best they may have to avoid patents (yep there have been patents you have to avoid, making it hard to make a new instruction set). If two competing and compatible architectures like intel and amd (or intel team a vs intel team b, amd team a vs ...) happen to adopt each others features/instructions it is more market driven not some standards body.
Basically go look at itanium vs amd64 and how that played out.
The x86 history is a bit of a nightmare and I still cannot fathom why it still even exists (has nothing to do with the quality of the instruction set but instead how the business works), and as such attempting to put labels on things and organize them into individual boxes, really does not add any value and creates some chaos. Generation r of intel has this, generation m of amd has that, my tool supports gen r of this and gen m of that. Next year I will personally choose if I want to support the next gen of each or not. Repeat forever until the products die. You also have to choose if you want to support an older generation as those may have the same instructions but with different features/side effects despite in theory being compatible.
QUESTION
I've been doing a good amount of research on AMD64 (x86-64) instructions, and it's been kind of confusing. A lot of the time, official CPU documentation doesn't designate an instruction as part of a specific set, and the internet is sometimes split on which instruction set a specific instruction belongs to. One example of this is SFENCE, with some sources claiming that it's part of EMMX and others claiming it's part of SSE.
I'm trying to organize all of them in a spreadsheet to help with learning, but these inconsistencies are incredibly frustrating in a field that is famously technical and precise.
...ANSWER
Answered 2022-Mar-03 at 18:00
EMMX is a subset of SSE, and sfence is part of both of them.
AMD did not immediately support all SSE instructions, but at first took a subset of it that did not require the new XMM registers (see near the bottom of the PDF), which became known as EMMX. That included, for example, pavgb mm0, mm1 (but not pavgb xmm0, xmm1), and also sfence.
All instructions that are in EMMX are also in SSE, so processors that support SSE can execute EMMX code regardless of whether they "explicitly" support EMMX (which has a dedicated CPUID feature flag). The Zen 1 aka Summit Ridge you linked supports EMMX implicitly: it does not have the corresponding feature flag set, but since it supports SSE, it also ends up supporting EMMX. Before Zen, AMD processors with SSE used to set the EMMX feature flag as well.
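Since EMMX support is implied by SSE, runtime detection normally just tests the SSE feature flag. A minimal sketch (my illustration, not from the answer) using GCC/Clang's __builtin_cpu_supports, which reads the CPUID feature bits:

#include <stdio.h>

int main(void)
{
    /* __builtin_cpu_supports queries the CPUID feature bits at runtime. */
    if (__builtin_cpu_supports("sse"))
        printf("SSE present, so the EMMX subset (including sfence) is usable\n");
    else
        printf("SSE not supported\n");
    return 0;
}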
QUESTION
I'm trying to build a Server-Sent Events endpoint with FastAPI but I'm unsure if what I'm trying to accomplish is possible or how I would go about doing it.
Introduction to the problem
Basically, let's say I have a run_task(limit, task) async function that sends an async request, makes a transaction, or something similar. Let's say that for each task, run_task can return some JSON data.
I'd like to run multiple tasks (multiple run_task(limit, task)) asynchronously; to do so I'm using trio and nurseries like so:
ANSWER
Answered 2022-Jan-16 at 23:27
I decided to ultimately go with websockets rather than SSE, as I realised I needed to pass an object as data to my endpoint, and while SSE can accept query params, dealing with objects as query parameters was too much of a hassle.
Websockets with FastAPI are based on Starlette and are pretty easy to use; applying them to the problem above can be done like so:
QUESTION
I'm trying to compile my Rust code on my M1 Mac for an x86_64 Linux target. I use Docker to achieve that.
My Dockerfile:
...ANSWER
Answered 2022-Jan-18 at 17:25
It looks like the executable is actually named x86_64-linux-gnu-gcc, see https://packages.debian.org/bullseye/arm64/gcc-x86-64-linux-gnu/filelist.
QUESTION
I wrote a small program to explore out-of-bounds reads vulnerabilities in C to better understand them; this program is intentionally buggy and has vulnerabilities:
...ANSWER
Answered 2021-Dec-31 at 23:21
Since stdout is line buffered, putchar doesn't write to the terminal directly; it puts the character into a buffer, which is flushed when a newline is encountered. And the buffer for stdout happens to be located on the heap following your heap_book allocation.
So at some point in your copy, you putchar all the characters of your secretinfo method. They are now in the output buffer. A little later, heap_book[i] is within the stdout buffer itself, so you encounter the copy of secretinfo that is there. When you putchar it, you effectively create another copy a little further along in the buffer, and the process repeats.
You can verify this in your debugger. The address of the stdout buffer, on glibc, can be found with p stdout->_IO_buf_base. In my test it's exactly 160 bytes past heap_book.
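A small standalone sketch (my own, not from the answer) that makes this heap layout visible: allocate before any output, trigger the first write so glibc allocates stdout's buffer, then allocate again. The exact gap depends on the allocator and glibc version.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *heap_book = malloc(64);   /* allocated before stdout's buffer exists */

    putchar('x');                   /* first output: glibc mallocs stdout's buffer now */

    char *after = malloc(64);       /* allocated after the stdout buffer */

    /* The gap between the two allocations is roughly the stdout buffer size
       plus allocator overhead, showing the output buffer sits between them. */
    fprintf(stderr, "heap_book = %p\n", (void *)heap_book);
    fprintf(stderr, "after     = %p\n", (void *)after);
    fprintf(stderr, "gap       = %td bytes\n", after - heap_book);

    free(after);
    free(heap_book);
    return 0;
}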
QUESTION
This is what I know about SIMD. Single-instruction-multiple-data is a way of processing data that performs the same instruction over vectors of multiple values. SIMD is implemented at different levels depending on the processor of the machine (SSE, SSE2, NEON...), and every level provides a different instruction set.
We can use these instruction sets by including immintrin.h. What I haven't really understood is: when actually developing something with SIMD, should we care about checking which instruction sets are supported? What are the best practices when developing such programs? What should we do if, for example, an instruction set is not supported; should we provide a non-SIMD alternative, or will the compiler un-vectorise the whole thing for us?
ANSWER
Answered 2021-Dec-19 at 11:10
Of course we need to take care which ISA is supported, because if we use an unknown instruction then the program will be killed with an illegal-instruction signal. Besides, it allows us to optimize for each architecture: for example, on CPUs with AVX-512 we can use AVX-512 for better performance, but on an older CPU we can fall back to the appropriate version for that architecture.
What are the best practices when developing such programs?
There are no general best practices. It depends on each situation, because each compiler has different tools for this:
- If your compiler doesn't support dynamic dispatching, then you need to write separate code for each ISA and call the corresponding version for the current platform.
- Some compilers automatically dispatch to the version optimized for the running platform; for example, ICC can compile a hot loop to separate versions for SSE/AVX/AVX-512 and jump to the correct version for maximum performance.
- Some other compilers support compiling separate versions of a single function and automatically dispatching, but you need to specify which function you want to optimize. For example, in GCC, Clang and ICC you can use the attributes target and target_clones (a minimal sketch follows below). See Building backward compatible binaries with newer CPU instructions support.
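As an illustration of the target_clones approach mentioned in the last bullet (the function name and target list are mine, purely an example):

#include <stddef.h>

/* GCC (and recent Clang) emit one clone per listed target plus a resolver
   that picks the best one at load time based on the CPU's feature flags.
   "default" must always be included. */
__attribute__((target_clones("avx2", "sse4.2", "default")))
void scale(float *dst, const float *src, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;   /* plain loop; each clone is auto-vectorized differently */
}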
QUESTION
- What is the purpose or intention of a MoveMask?
- What's the best place to learn how to use x86/x86-64 assembly/SSE/AVX?
- Could I have written my code more efficiently?
I have a function written in F# for .NET that uses SSE2. I've written the same thing using AVX2, but the underlying question is the same. What is the intended purpose of a MoveMask? I know that it works for my purposes; I want to know why.
I am iterating through two 64-bit float arrays, a and b, testing that all of their values match. I am using the CompareEqual method (which I believe is wrapping a call to __m128d _mm_cmpeq_pd) to compare several values at a time. I then compare that result with a Vector128 of 0.0 64-bit floats. My reasoning is that the result of CompareEqual will give a 0.0 value in the cases where the values don't match. Up to this point, it makes sense.
I then use the Sse2.MoveMask method on the result of the comparison with the zero vector. I've previously worked on using SSE and AVX for matching, and I saw examples of people using MoveMask for the purpose of testing for non-zero values. I believe this method is using the int _mm_movemask_epi8 Intel intrinsic. I have included the F# code and the assembly that is JITed.
Is this really the intention of a MoveMask, or is it just a happy coincidence that it works for these purposes? I know my code works; I want to know WHY it works.
ANSWER
Answered 2021-Nov-08 at 05:02
MoveMask just extracts the high bit of each element into an integer bitmap. You have 3 element-size options: movmskpd (64-bit), movmskps (32-bit), and pmovmskb (8-bit).
This works well with SIMD compares, which produce an output with all-zero bits in elements where the predicate is false and all-one bits in elements where the predicate is true. All-ones is a bit-pattern for -QNaN if interpreted as an IEEE-FP floating-point value, but normally you don't do that. Instead you movemask, or AND (or AND / ANDN / OR, or _mm_blend_pd), or things like that with a compare result.
movemask(v) != 0, movemask(v) == 0x3, or movemask(v) == 0 is how you check conditions like at least one element in a compare matched, or all matched, or none matched, respectively, where v is the result of _mm_cmpeq_pd or whatever. (Or just to extract signs directly without a compare.)
For other element sizes, 0xf or 0xffff to match all four or all 16 bits. Or for AVX 256-bit vectors, twice as many bits, up to filling a whole 32-bit integer with vpmovmskb eax, ymm0.
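In C intrinsics, the "all elements matched" check described above (compare, then test the movemask against the all-set value) looks something like this minimal sketch; the same idea carries over to the .NET Sse2.CompareEqual / Sse2.MoveMask wrappers:

#include <immintrin.h>
#include <stdbool.h>

/* True if both 64-bit lanes of a and b compare equal.
   _mm_cmpeq_pd sets a lane to all-ones where equal, all-zeros where not;
   _mm_movemask_pd packs the two lane sign bits into bits 0..1, so an
   all-matched result gives mask == 0x3. */
static bool both_equal(__m128d a, __m128d b)
{
    __m128d eq = _mm_cmpeq_pd(a, b);
    return _mm_movemask_pd(eq) == 0x3;
}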
What you're doing is really weird, using a 0.0 / NaN compare result as the input to another compare with vcmpeqpd xmm1, xmm1, xmm2 / vcmpeqpd xmm1, xmm1, xmm0. For the 2nd comparison, that can only be true for elements that are == 0.0 (i.e. +-0.0), because x == NaN is false for every x.
If the second vector is a constant zero (let zeroTest = Sse2.CompareEqual (comparison, zeroVector)), that's pointless: you're just inverting the compare result, which you could have done by checking a different integer condition or comparing against a different constant, not doing runtime comparisons. (0.0 == 0.0 is true, producing an all-ones output; 0.0 == -NaN is false, producing an all-zero output.)
To learn more about intrinsics and SIMD, see for example Agner Fog's optimization guide; his asm guide has a chapter on SIMD. Also, his VectorClass library for C++ has some useful wrappers, and for learning purposes seeing how those wrapper functions implement some basic things could be useful.
To learn what things actually do, see Intel's intrinsics guide. You can search by asm instruction or C++ intrinsic name.
I think MS has docs for their C# System.Runtime.Intrinsics.X86, and I assume F# uses the same intrinsics, but I don't use either language myself.
Related re: comparisons:
- Get the last line separator - pcmpeqb -> pmovmskb -> bsr to find the position of the last matching element in a vector of compare results. Bit-scan reverse on the compare mask. Often you want to scan forward to find the first match (or invert and find the first mismatch, like for memcmp), e.g. Compare 16 byte strings with SSE.
- Or popcount them if you're counting occurrences by matching against a loop-invariant vector of a broadcasted character: How can I count the occurrence of a byte in array using SIMD? - instead of movemask, use the compare result as integer 0 / -1 and SIMD-subtract it from a vector accumulator in the inner loop, then horizontal-sum the integer elements in an outer loop.
- SIMD instructions for floating point equality comparison (with NaN == NaN) - a useful exercise in understanding how NaNs work.
QUESTION
I've started working with Puppeteer and for some reason I cannot get it to work on my box. This error seems to be a common problem (SO1, SO2), but none of the solutions resolve this error for me. I have tested it with a clean node package (see reproduction) and I have taken the example from the official Puppeteer 'Getting started' webpage.
How can I resolve this error?
Versions and hardware ...ANSWER
Answered 2021-Nov-24 at 18:42
There's too much for me to put this in a comment, so I will summarize here. Maybe it will help you, or someone else. I should also mention this is for RHEL EC2 instances behind a corporate proxy (not Arch Linux), but I still feel like it may help. I had to do the following to get Puppeteer working. This is straight from my docs, but I had to hand-jam the contents because my docs are on an intranet.
I had to install all of these libraries manually. I also don't know what the Arch Linux equivalents are. Some are duplicates from your question, but I don't think they all are:
pango
libXcomposite
libXcursor
libXdamage
libXext
libXi
libXtst
cups-libs
libXScrnSaver
libXrandr
GConf2
alsa-lib
atk
gtk3
ipa-gothic-fonts
xorg-x11-fonts-100dpi
xorg-x11-fonts-75dpi
xorg-x11-utils
xorg-x11-fonts-cyrillic
xorg-x11-fonts-Type1
xorg-x11-fonts-misc
liberation-mono-fonts
liberation-narrow-fonts
liberation-sans-fonts
liberation-serif-fonts
glib2
If Arch Linux uses SELinux, you may also have to run this:
setsebool -P unconfined_chrome_sandbox_transition 0
It is also worth adding dumpio: true to your options to debug; it should give you more detailed output from Puppeteer, instead of the generic error. As I mentioned in my comment, I have the option ignoreDefaultArgs: ['--disable-extensions']. I can't tell you why because I don't remember; I think it is related to this issue, but it could also be related to my corporate proxy.
QUESTION
As far as I understand, some objects in the "data" section sometimes need alignment in x86 assembly.
An example I've come across is when using movaps in x86 SSE: I need to load a special constant for later xors into an XMM register. The XMM register is 128 bits wide, and I need to load a 128-bit memory operand into it, which also has to be aligned to 128 bits.
With trial and error, I've deduced that the code I'm looking for is:
...ANSWER
Answered 2021-Nov-18 at 18:01
In which assembly flavors do I use .align instead of align?
Most notably the GNU assembler (GAS) uses .align, but every assembler can have its own syntax. You should check the manual of whatever assembler you are actually using.
Do I need to write this keyword/instruction before every data object or is there a way to write it just once?
You don't need to write it before each object if you can keep track of the alignment as you go. For instance, in your example, you wrote align 16 and then assembled 4 dwords of data, which is 16 bytes. So following that data, the current address is again aligned to 16 and another align 16 would be unnecessary (though of course harmless). You could write something like
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install sse
Support