ieee754 | Simple JavaScript-based IEEE 754 Encoders and Decoders

 by rtoal · JavaScript · Version: Current · License: No License

kandi X-RAY | ieee754 Summary

ieee754 is a JavaScript library. It has no bugs, no vulnerabilities, and low support. You can download it from GitHub.

A single-page web application containing both an IEEE 754 Encoder (for encoding a decimal value into its IEEE 754 single- and double-precision representations) and an IEEE 754 Decoder (for converting a 32- or 64-bit hexadecimal representation into a decimal value). The application works entirely in JavaScript; there is no need for a server. You will need a modern browser, as the JavaScript code uses Uint8Array and friends. This application incorporates Michael Mclaughlin's big.js library.

            kandi-support Support

              ieee754 has a low active ecosystem.
              It has 8 stars with 3 forks. There are 3 watchers for this library.
              It had no major release in the last 6 months.
              ieee754 has no issues reported. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of ieee754 is current.

            kandi-Quality Quality

              ieee754 has no bugs reported.

            kandi-Security Security

              ieee754 has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            kandi-License License

              ieee754 does not have a standard license declared.
              Check the repository for any license declaration and review the terms closely.
              Without a license, all rights are reserved, and you cannot use the library in your applications.

            kandi-Reuse Reuse

              ieee754 releases are not available. You will need to build from source code and install.


            ieee754 Key Features

            No Key Features are available at this moment for ieee754.

            ieee754 Examples and Code Snippets

            Check if x is closed.
            Python · 89 lines of code · License: Non-SPDX (Apache License 2.0)

            def embed_check_integer_casting_closed(x,
                                                   target_dtype,
                                                   assert_nonnegative=True,
                                                   name="embed_check_casting_closed"):
              """Ensures int ..."""  # docstring truncated in the source snippet

            Community Discussions

            QUESTION

            For IEEE-754 floating point arithmetic, is the mantissa in [0.5, 1) or in [1, 2)?
            Asked 2021-Jun-02 at 10:47

            I was looking at several textbooks, including Numerical Linear Algebra by Trefethen and Bau, and in the section on floating point arithmetic, they seem to say that in IEEE-754, normalized floating point numbers take the form 0.1… × 2^e. That is, the mantissa is assumed to be between 0.5 and 1.

            However, in this popular online floating point calculator, it is explained that normalized floating point numbers have a mantissa between 1 and 2.

            Could someone please tell me which is the correct way?

            ...

            ANSWER

            Answered 2021-Jun-02 at 10:47

            The following sets are identical:

            • { (−1)^s · f · 2^e | s ∈ {0, 1}, f is the value of a 24-bit binary numeral with a radix point after the first digit, and e is an integer such that −126 ≤ e ≤ 127 }.
            • { (−1)^s · f · 2^e | s ∈ {0, 1}, f is the value of a 24-bit binary numeral with a radix point before the first digit, and e is an integer such that −125 ≤ e ≤ 128 }.
            • { (−1)^s · f · 2^e | s ∈ {0, 1}, f is the value of a 24-bit binary numeral with a radix point after the last digit, and e is an integer such that −149 ≤ e ≤ 104 }.
            • { f · 2^e | f is an integer such that |f| < 2^24, and e is an integer such that −149 ≤ e ≤ 104 }.

            In other words, we may put the radix point anywhere in the significand we want, simply by adjusting the range of the exponent to compensate. Which form to use may be chosen for convenience or preference.

            The third form scales the significand so it is an integer, and the fourth form incorporates the sign into the significand. This form is convenient for using number theory to analyze floating-point behavior.

            IEEE 754 generally uses the first form. It refers to this as “a scientific form,” reflecting the fact that, in scientific notation, we commonly write numbers with a radix point just after the first digit, as in the mass of the Earth is about 5.9722 × 10^24 kg. In clause 3.3, IEEE 754-2008 mentions “It is also convenient for some purposes to view the significand as an integer; in which case the finite floating-point numbers are described thus:”, followed by text equivalent to the third form above except that it is generalized (the base and other parameters are arbitrary values for any floating-point format rather than the constants I used above specifically for the binary32 format).
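
            To make the equivalence concrete in JavaScript (which uses binary64 rather than binary32, though the same reasoning applies), here is a minimal sketch; the helper name decompose is ours, not from any library:

            function decompose(x) {
              const view = new DataView(new ArrayBuffer(8));
              view.setFloat64(0, x);                      // big-endian by default
              const hi = view.getUint32(0);
              const lo = view.getUint32(4);
              const sign = hi >>> 31;
              const e = ((hi >>> 20) & 0x7ff) - 1023;     // unbiased exponent (normal numbers)
              const frac = ((hi & 0xfffff) * 2 ** 32 + lo) / 2 ** 52;
              return { sign, e, m: 1 + frac };            // significand m in [1, 2)
            }

            const { sign, e, m } = decompose(0.15625);
            console.log(`(-1)^${sign} * ${m} * 2^${e}`);          // 1.25 * 2^-3
            console.log(`(-1)^${sign} * ${m / 2} * 2^${e + 1}`);  // 0.625 * 2^-2, the same value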

            Source https://stackoverflow.com/questions/67797756

            QUESTION

            Why does BitConverter seemingly return incorrect results when converting floats and bytes?
            Asked 2021-May-09 at 21:48

            I'm working in C# and attempting to pack four bytes into a float (the context is game development, where an RGBA color is packed into a single value). To do this, I'm using BitConverter, but certain conversions seem to result in incorrect bytes. Take the following example (using bytes 0, 0, 129, 255):

            ...

            ANSWER

            Answered 2021-May-09 at 21:48

            After research, experimentation, and discussion with friends, the root cause of this behavior (bytes changing when converted to and from a float) seems to be signaling vs. quiet NaNs (as Hans Passant also pointed out in a comment). I'm no expert on signaling and quiet NaNs, but from what I understand, quiet NaNs have the highest-order bit of the mantissa set to one, while signaling NaNs have that bit set to zero. See the following image (taken from https://www.h-schmidt.net/FloatConverter/IEEE754.html) for reference. I've drawn four colored boxes around each group of eight bits, as well as an arrow pointing to the highest-order mantissa bit.

            Of course, the question I posted wasn't about floating-point bit layout or signaling vs. quiet NaNs, but simply asking why my encoded bytes were seemingly modified. The answer is that the C# runtime (or at least I assume it's the C# runtime) internally converts all signaling NaNs to quiet, meaning that the byte encoded at that position has its second bit swapped from zero to one.

            For example, the bytes 0, 0, 129, 255 (encoded in the reverse order, I think due to endianness) puts the value 129 in the second byte (the green box). 129 in binary is 10000001, so flipping its second bit gives 11000001, which is 193 (exactly what I saw in my original example). This same pattern (the encoded byte having its value changed) applies to all bytes in the range 129-191 inclusive. Bytes 128 and lower aren't NaNs, while bytes 192 and higher are NaNs, but don't have their value modified because their second bit (placed at the highest-order mantissa bit) is already one.

            So that answers why this behavior occurs, but in my mind, there are two questions remaining:

            1. Is it possible to disable this behavior (converting signaling NaNs to quiet) in C#?
            2. If not, what's the workaround?

            The answer to the first question seems to be no (I'll amend this answer if I learn otherwise). However, it's important to note that this behavior doesn't appear consistent across all .NET versions. On my computer, NaNs are converted (i.e. my encoded bytes changed) on every .NET Framework version I tried (starting with 4.8.0, then working back down). NaNs appear not to be converted (i.e. my encoded bytes did not change) in .NET Core 3 and .NET 5 (I didn't test every available version). In addition, a friend was able to run the same sample code on .NET Framework 4.7.2, and surprisingly, the bytes were not modified on his machine. The internals of different C# runtimes aren't my area of expertise, but suffice it to say there's variance among versions and computers.

            The answer to the second question is to, as others have suggested, simply avoid the float conversion entirely. Instead, each set of four bytes (representing RGBA colors in my case) can either be encoded in an integer or added to a byte array directly.
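
            The quieting can also be observed from JavaScript, where reading a float32 element forces a conversion through a double; whether the NaN payload survives is platform-dependent, so treat this as a sketch rather than a guarantee:

            // Bytes 0, 0, 129, 255 little-endian form the signaling NaN 0xFF810000.
            const bytes = new Uint8Array([0, 0, 129, 255]);
            const f = new Float32Array(bytes.buffer)[0];      // float32 -> double conversion
            const back = new Uint8Array(new Float32Array([f]).buffer);
            console.log(Array.from(back)); // on many platforms: [0, 0, 193, 255] - quiet bit set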

            Source https://stackoverflow.com/questions/67453428

            QUESTION

            Why is 5726718050568503296 truncated in JS
            Asked 2021-Apr-23 at 11:15

            As per the standard, ES implements numbers as IEEE 754 doubles.

            And per https://www.binaryconvert.com/result_double.html?decimal=053055050054055049056048053048053054056053048051050057054 and other programming languages https://play.golang.org/p/5QyT7iPHNim it looks like the 5726718050568503296 value can be represented exactly without losing precision.

            Why does it lose 3 significant digits in JS (reproduced in the latest stable Google Chrome and Firefox)?

            This question was triggered initially by replicate javascript unsafe numbers in golang.

            The value is definitely representable in IEEE 754 double precision; see how the raw bits are converted to a float64 in Go: https://play.golang.org/p/zMspidoIh2w

            ...

            ANSWER

            Answered 2021-Apr-23 at 11:15

            The default rule for JavaScript when converting a Number value to a decimal numeral is to use just enough digits to distinguish the Number value. Specifically, this arises from step 5 in clause 7.1.12.1 of the ECMAScript 2017 Language Specification, per the linked answer. (It is 6.1.6.1.20 in the 2020 version.)

            So while 5,726,718,050,568,503,296 is representable, printing it yields “5726718050568503000” because that suffices to distinguish it from the neighboring representable values, 5,726,718,050,568,502,272 and 5,726,718,050,568,504,320.

            You can request more precision in the conversion to string with .toPrecision, as in x.toPrecision(21).
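
            A quick sketch in a browser console or Node confirms both behaviors:

            const x = 5726718050568503296;
            console.log(String(x));            // "5726718050568503000" - shortest distinguishing digits
            console.log(x.toPrecision(21));    // "5726718050568503296.00" - full value on request
            console.log(x === 5726718050568503000); // true: both literals denote the same double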

            Source https://stackoverflow.com/questions/67222143

            QUESTION

            Why implement lround specifically using integer math?
            Asked 2021-Apr-20 at 00:10

            I noticed that the C++ standard library has separate functions for round and lround rather than just having you use long(round(x)) for the latter.

            Looking into the implementation in glibc, I find that indeed, for platforms using IEEE754 floating point, the version that returns an integer will directly manipulate the bits from within the floating point representation, and not do the rounding using floating point operations (e.g. adding ±0.5).

            What is the benefit of having a distinct implementation when you want the result as an integer type? Is this supposed to be faster, or more accurate? If it is better to use integer math on the underlying representation, why not just always do it that way even if returning the result as a double?

            ...

            ANSWER

            Answered 2021-Apr-20 at 00:10

            One reason is that adding .5 is insufficient. Let’s say you add .5 and then truncate to an integer. (How? Is there an instruction for that? Or are you doing more work?) If x is ½ − 2^−54 (the greatest representable value less than ½), adding .5 yields 1, because the mathematical sum, 1 − 2^−54, is exactly halfway between the nearest two representable values, 1 − 2^−53 and 1, and the common default rounding mode, round-to-nearest-ties-to-even, rounds that to 1. But the correct result for lround(x) is 0.

            And, of course, lround is specified to round ties away from zero, regardless of the current rounding mode. You could set the rounding mode, do some arithmetic, and restore the rounding mode, but there are problems with this.

            One is that changing the rounding mode is typically a time-consuming operation. The rounding mode is a global state that affects most floating-point instructions. So the processor has to ensure all pending instructions complete with the prior mode, change the global state, and ensure all later instructions start after that change.

            If you are lucky, you might have a processor with per-instruction rounding modes or something similar, and then you can use any rounding mode you like without time penalty. Hewlett Packard has some processors like that. However, “round away from zero” is an uncommon mode. Most processors have round-to-nearest-ties-to-even, round toward zero, round down (toward −∞), and round up (toward +∞), and round-to-odd is becoming popular for its value in avoiding double-rounding errors. But round away from zero is rare.

            Another reason is that doing floating-point instructions alters the floating-point status flags and may generate traps, but it is desired that library routines behave as single operations. For example, if we add .5 and rounding occurs, the inexact flag will be raised, since the floating-point addition with .5 produced a result different from the mathematical sum. But to the user of lround, no inexact condition ever occurs; lround is defined to return a value rounded to an integer, and it always does so—within the long range, it never returns a computed result different from its ideal mathematical definition. So if lround(x) raised the inexact flag, that would be incorrect behavior. To avoid it, an implementation that used floating-point instructions would have to save the current floating-point flags, do its work, and restore the flags before returning.
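
            The insufficiency of adding .5 is easy to reproduce in JavaScript (binary64 throughout):

            const x = 0.5 - 2 ** -54;          // greatest double below 0.5 (0.49999999999999994)
            console.log(x + 0.5);              // 1 - the sum is a tie and rounds to even, i.e. exactly 1.0
            console.log(Math.trunc(x + 0.5));  // 1 - so add-then-truncate mis-rounds
            console.log(Math.trunc(x));        // 0 - which is also the correct round-to-nearest result here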

            Source https://stackoverflow.com/questions/67166815

            QUESTION

            Can multiplication of non-zero floats yield exactly zero?
            Asked 2021-Apr-07 at 22:53

            Suppose I have a series of small random floating point numbers in either double or float format which are guaranteed to be non-zero, in a CPU which follows the IEEE754 standard, and I make multiplications between two of these small numbers.

            If both numbers are different from zero but very small (below machine epsilon), is it possible that a multiplication result would yield zero or negative zero, such that if I interpret the result as a C++ boolean, it would translate into false?

            ...

            ANSWER

            Answered 2021-Apr-07 at 22:53

            Yes. You can demonstrate that by experiment:
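
            The demonstration code is not reproduced here, but an equivalent sketch in JavaScript (binary64) shows the underflow just as well:

            const a = 1e-200, b = 1e-200;        // both strictly nonzero
            const p = a * b;                     // exact result 1e-400 is below the smallest subnormal
            console.log(a !== 0 && b !== 0);     // true
            console.log(p, Object.is(p, 0));     // 0 true - the product underflows to +0
            console.log(Object.is(-a * b, -0));  // true - with opposite signs you get negative zero
            console.log(Boolean(p));             // false, as the question feared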

            Source https://stackoverflow.com/questions/66994997

            QUESTION

            Why does a calculation error occur when this double value is multiplied by 2 in C++?
            Asked 2021-Mar-18 at 14:52
            #include <cstdint>
            #include <cstdio>
            #include <iostream>
            #include <iomanip>
            #include <sstream>
            #include <bitset>
            using namespace std;
            int main()
            {
                uint64_t int_value = 0b0000000000001101000011110100001011000000110111111010111101110011;
                double double_value = (*((double *)((void *)&int_value)));
                printf("double initiate value: %e\n", double_value);
                cout << "sign " << setw(11) << "exp"
                     << " " << setw(52) << "frac" << endl;
                for (int i = 0; i < 10; i++)
                {
                    stringstream ss;
                    ss << bitset<64>((*((uint64_t *)((void *)&double_value))));
                    auto str = ss.str();
                    cout << setw(4) << str.substr(0, 1) << " " << setw(11) << str.substr(1, 11) << " " << str.substr(12, 52) << endl;
                    double_value *= 2;
                }
            }
            
            ...

            ANSWER

            Answered 2021-Mar-18 at 11:14

            You are running into denormalised numbers. When the exponent is zero, but the mantissa is not, then the mantissa is used as is without an implicit leading 1 digit. This is done so that the representation can handle very small numbers that are smaller than what the smallest exponent could represent. So in the first two rows of your example:
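
            The answer's worked example is elided above, but the effect is easy to visualize in JavaScript using the question's own sign/exp/frac layout (a sketch; it assumes DataView.getBigUint64 is available):

            let x = Number.MIN_VALUE;                       // 5e-324, the smallest subnormal double
            for (let i = 0; i < 3; i++) {
              const v = new DataView(new ArrayBuffer(8));
              v.setFloat64(0, x);
              const bits = v.getBigUint64(0).toString(2).padStart(64, "0");
              console.log(bits.slice(0, 1), bits.slice(1, 12), bits.slice(12)); // sign exp frac
              x *= 2;  // doubling just shifts the fraction left; the exponent stays all zero
            }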

            Source https://stackoverflow.com/questions/66687103

            QUESTION

            math operations on currencies (crypto) stored in sqlite as bigint
            Asked 2021-Mar-12 at 17:29

            I'm trying to store cryptocurrency values inside a sqlite database. I read that it is not correct to store those values as float nor double because of the loss of precision caused by IEEE 754. For this reason I saved these values as big integers in my database. (And I multiply or divide by 10^8 or 10^(-8) in my app before reading or storing the values.)

            ...

            ANSWER

            Answered 2021-Mar-12 at 17:29

            The key is to just multiply the dividend instead of multiplying the result.

            If both total_fiat_amount-commission and crypto_fiat_price are monetary values with a maximum of two digits after the decimal point, you don't need to multiply both by 10^8 but only by 10^2.

            In that case, the result would be accurate to 0 decimal places after the decimal point.

            If you want 8 decimal places of precision, you can multiply the dividend by 10^8 before running the division.

            If you store total_fiat_amount, commission and crypto_fiat_price in cents, you could use this:
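
            As a sketch of that last suggestion, with hypothetical values and JavaScript BigInt so the integer math cannot silently lose precision:

            const totalFiatCents   = 105000n;   // 1050.00, stored in cents
            const commissionCents  = 1050n;     // 10.50
            const cryptoPriceCents = 3500000n;  // 35000.00 per coin
            const SCALE = 10n ** 8n;            // 8 decimal places of precision

            // Scale the dividend BEFORE dividing, as described above.
            const qty = ((totalFiatCents - commissionCents) * SCALE) / cryptoPriceCents;
            console.log(qty); // 2970000n, i.e. 0.02970000 coins after dividing by 10^8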

            Source https://stackoverflow.com/questions/66602415

            QUESTION

            Delphi and MSVC do not compare +NAN with zero the same way
            Asked 2021-Feb-01 at 13:24

            I am porting C code to Delphi and found an issue in the way the compilers (Delphi 10.4.1 and MSVC 2019, both targeting the 32-bit x86 platform) handle comparison of +NAN to zero. Both compilers use the IEEE 754 representation for double floating point values. I found the issue because the C code I am porting to Delphi is delivered with a bunch of data to validate the code's correctness.

            The original source code is complex but I was able to produce a minimal reproducible application in both Delphi and C.

            C-Code:

            ...

            ANSWER

            Answered 2021-Feb-01 at 13:24

            First of all, your Delphi program does not behave as you describe, at least on the Delphi version readily available to me, XE7. When your program is run, an invalid operation floating point exception is raised. I'm going to assume that you have actually masked floating point exceptions.

            Update: It turns out that at some time between XE7 and 10.3, Delphi 32-bit codegen switched from fcom to fucom, which explains why XE7 sets the IA floating point exception, but 10.3 does not.

            Your Delphi code is very far from minimal. Let's try to make a truly minimal example. And let's look at other comparison operators.

            Source https://stackoverflow.com/questions/65991239

            QUESTION

            Converting from IEEE-754 to Fixed Point with nearest rounding
            Asked 2021-Jan-30 at 23:46

            I am implementing a converter from 32-bit IEEE 754 to S15.16 fixed point in an FPGA. The IEEE-754 standard represents a normal number as:

            (−1)^s × 1.m × 2^(exp − 127)

            Where s represents the sign, exp is the biased (denormalized) exponent, and m is the mantissa. All these values are separately represented in fixed point.

            Well, the simplest way is to take the IEEE-754 value and multiply it by 2**16, then round to nearest to minimize the truncation error.

            Problem: I'm doing this in an FPGA device, so I can't do it that way.

            Solution: Use the binary representations of the values to perform the conversion via bitwise operations.

            From the previous expression, and given that the exponent and mantissa are in fixed point, logic tells me that I can proceed like this:

            Because multiplications by powers of two are shifts in fixed point, it is possible to rewrite the expression as (in Verilog notation):

            ...

            ANSWER

            Answered 2021-Jan-30 at 23:46

            The ISO-C99 code below demonstrates one possible way of doing the conversion. The significand (mantissa) bits of the binary32 argument form the bits of the s15.16 result. The exponent bits tell us whether we need to shift these bits right or left to move the least significant integer bit to bit 16. If a left shift is required, rounding is not needed. If a right shift is required, we need to capture any less significant bits discarded. The most significant discarded bit is the round bit, all others collectively represent the sticky bit. Using the literal definition of the rounding mode, we need to round up if (1) either the round bit and the sticky bit are set, or (2) the round bit is set and the sticky bit clear (i.e., we have a tie case), but the least significant bit of the intermediate result is odd.

            Note that real hardware implementations often deviate from such a literal application of the rounding-mode logic. One common scheme is to first increment the result when the round bit is set. Then, if such an increment occurred, clear the least significant bit of the result if the sticky bit is not set. It is easy to see that this achieves the same effect by enumerating all possible combinations of round bit, sticky bit, and result LSB.
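
            The ISO-C99 code is not reproduced here; the following JavaScript sketch implements the same round/sticky logic (function and variable names are ours; overflow and NaN handling are omitted):

            // binary32 bits (an unsigned 32-bit integer) -> s15.16, round-to-nearest-even
            function f32ToS15_16(bits) {
              const sign = bits >>> 31;
              const exp = (bits >>> 23) & 0xff;
              const frac = bits & 0x7fffff;
              const sig = BigInt(exp === 0 ? frac : frac | 0x800000); // implicit 1 for normals
              // Result is round(sig * 2^(exp-127-23+16)) = round(sig * 2^(exp-134)).
              const shift = (exp === 0 ? 1 : exp) - 134;  // subnormals behave as exp = 1
              let mag;
              if (shift >= 0) {
                mag = sig << BigInt(shift);               // left shift: exact, no rounding needed
              } else {
                const n = BigInt(-shift);
                mag = sig >> n;
                const round = (sig >> (n - 1n)) & 1n;     // most significant discarded bit
                const sticky = (sig & ((1n << (n - 1n)) - 1n)) !== 0n; // all lower discarded bits
                if (round && (sticky || (mag & 1n))) mag += 1n; // round to nearest, ties to even
              }
              return sign ? -mag : mag;                   // s15.16 value, scaled by 2^16
            }

            console.log(f32ToS15_16(0x3FA00000)); // 81920n, i.e. 1.25 * 2^16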

            Source https://stackoverflow.com/questions/65970751

            QUESTION

            C++ Convert 4 Hex Values to Float
            Asked 2021-Jan-30 at 12:42

            I am trying to convert 4 hex values to a float. The hex values are, for example, 3F A0 00 00. In binary representation they would correspond to 00111111 10100000 00000000 00000000. If these 4 binary values are interpreted as one 32-bit float (according to IEEE 754), the decimal value of the float should be 1.25. However, I am struggling to automatically make the conversion from hex values to a decimal float in C++ (I am using Qt as the framework).

            Can anybody help me please?

            Thanks!

            ...

            ANSWER

            Answered 2021-Jan-27 at 13:47
            #include <iostream>
            #include <cstring>
            #include <iterator>  // std::size
            using namespace std;
            
            int main()
            {
                unsigned char bytes[] = {0x3F, 0xA0, 0x00, 0x00};
                
                uint32_t x = bytes[0];
                for (int i = 1; i < std::size(bytes); ++i) x = (x << 8) | bytes[i];
            
                static_assert(sizeof(float) == sizeof(uint32_t), "Float and uint32_t sizes don't match. Check another int type");
                
                float f{};
                memcpy(&f, &x, sizeof(x));
                // or since C++20 if available: float f = std::bit_cast<float>(x)
                
                cout << "f = " << f << endl;
            }
            

            Source https://stackoverflow.com/questions/65920107

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install ieee754

            You can download it from GitHub.

            Support

            For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask them on the Stack Overflow community page.
            CLONE
          • HTTPS

            https://github.com/rtoal/ieee754.git

          • CLI

            gh repo clone rtoal/ieee754

          • sshUrl

            git@github.com:rtoal/ieee754.git

