WIP: `json_encode`, and `vim_snprintf`, loses floating point precision when using `%g` #15902

S0AndS0 · 2024-10-20T16:49:05Z

⚠️ This breaks echo and messes with other Vim Ex mode commands, ex;

echo json_encode({'value':0.5670403838157654})

Results could be;

{"value":5.068898e-310}

{"value":1.237282e-313}

{"value":0.0}

:echo json_decode('{"value":0.5670403838157654}')

Results, regardless of float size, always be;

{'value': 0.0}

... Side note; printf within Vim Ex mode also has inconsistent behaviors.

Questions

Is this worth pursuing? And if so;
- What causes the inconsistencies?
- Where else do I need to apply updates?
- Is there a convenient way to avoid/expand e notation?
- Any pointers on adding some unit tests?

Story time

I'm writing a plugin, aimed at Vim8 compatibility, that makes calls to an API that I do not control outputs of. And the API returns JSON that contains large floats, example;

{
  "embedding": [
    0.5670403838157654,
    0.009260174818336964,
    0.23178744316101074,
    -0.2916173040866852,
    -0.8924556970596313,
    0.8785552978515625,
    -0.34576427936553955,
    0.5742510557174683,
    -0.04222835972905159,
    -0.137906014919281
  ]
}

... As of Vim version 9.1.1-785 packaged by Arch, I use Arch (BTW™), these values are truncated to about five digits past the decimal point, eg.

:echo json_decode('{"embedding":[0.5670403838157654]}')

Result: {'embedding': [0.56704]}

Oddly it looks like this might be a echo bug, because wrapping with printf and using a large limit mostly preserves precision of input floats... mostly

:echo printf('%.25g', json_decode('{"embedding":[0.5670403838157654]}').embedding[0])

Result: 0.5670403838157653808593750

Notice, we lost the last 4 and picked-up a bunch of extra 3808593750 🤷

Warning: this breaks `json_decode`, and does less than good things to `json_encode` Vim functions! ```vim echo json_encode({'value':0.5670403838157654}) ``` > Results could be; > > - `{"value":5.068898e-310}` > - `{"value":1.237282e-313}` > - `{"value":0.0}` ```vim :echo json_decode('{"value":0.5670403838157654}') ``` > Results, regardless of float size, always be; > > - `{'value': 0.0}`

tonymec · 2024-10-21T16:21:45Z

I can't find it back in the Vim help, but IIRC :echo has always displayed Floats with some standardized default precision of 6 significant digits or so regardless of what the hardware could handle. With printf(), OTOH, it is possible to specify the precision, including a better precision than what the hardware affords, in which case "anything" will be displayed beyond the hardware limit. For instance if you try to use printf() to display 1.0 / 3.0 with tremendous precision, you'll notice that at some point it ceases to be a string of repetitions of the digit 3.

chrisbra · 2024-10-21T16:22:52Z

It currently is at :h floating-point-precision

tonymec · 2024-10-21T16:25:20Z

It currently is at :h floating-point-precision

Ah, thanks. Yes, this is what I meant.

chrisbra · 2024-10-21T18:59:25Z

Isn't that just a display issue only? In which case I think we don't even need to worry about?

S0AndS0 · 2024-10-21T20:29:34Z

I agree that echo be a non-issue to worry about, and the main thing I'm currently concerned with is preserving precision whenever possible; especially in regards to JSON encode/decode operations.

Double-checking myself on Vim version 9.1.1-785, after learning that echo for sure cannot be trusted, it looks like the decoding is fine, ex.

let json_stringy = '{ "value": "0.009260174818336964" }'
let json_floater = '{ "value": 0.009260174818336964 }'

let parsed_stringy = printf('%s', json_decode(json_stringy).value)
let parsed_floater = printf('%.18g', json_decode(json_floater).value)

echo parsed_stringy == parsed_floater
"> 1
echo len(parsed_stringy)
"> 20
echo len(parsed_floater)
"> 20

... But things get funky when re-encoding;

echo json_encode(json_decode(json_stringy))
"> {"value":"0.009260174
8000
818336964"}

echo json_encode(json_decode(json_floater))
"> {"value":0.00926}

So I may need to revert previously proposed changes and instead fixate on what json_encode is doing to floats.

Update

I reverted changes, because it looks like the %g is the spot where floats are losing precision;

File: src/json.c
Function: json_encode_item
Line: 428 (as of commit c73fc86bf8fe318aab41900dd92e380143211cda)

		vim_snprintf((char *)numbuf, NUMBUFLEN, "%g",
							   val->vval.v_float);

... And indeed hard-coding an explicitly long length gets me closer to joy, eg.

-		vim_snprintf((char *)numbuf, NUMBUFLEN, "%g",
+		vim_snprintf((char *)numbuf, NUMBUFLEN, "%.20g",

After re-compiling, tests show only the last digit is failing to retain precision!!!

let json_floater = '{ "value": 0.009260174818336964 }'
echo json_encode(json_decode(json_floater))
"> {"value":0.00926017481833696365}

Digging deeper into the source it looks like %g, and friends, behaviors are defined in;

File: src/strings.c
Function: vim_vsnprintf_typval
Lines: 3691 through 3852

... But at this moment I'm not feeling clever enough for that.

Side note

For those following along at home, here are the configuration and make commands I've been using;

cd ~/git/hub/vim/vim

./configure --enable-gui=gtc3\
            --enable-luainterp=yes\
            --enable-pythoninterp=yes\
            --with-python-config-dir="$(python-config --configdir)"\
            --enable-python3interp=yes\
            --with-python3-config-dir="$(python3-config --configdir)"\
            --enable-perlinterp=yes\
            --enable-rubyinterp=yes\
            --enable-cscope\
            --enable-farsi\
            --enable-fontset\
            --enable-gpm\
            --enable-largefile\
            --enable-multibyte\
            --enable-rightleft\
            --enable-terminal\
            --enable-xim\
            --with-features=huge\
            --with-compiledby='S0AndS0'\
            --prefix='/usr/local'

_last_version="$(git tag --list | tail -n1)"
_date_time="$(date +%Y%m%d_%H%M%S)"
_new_version="${_last_version}-${_date_time}"
make -j$(nproc) VIMRUNTIMEDIR="/usr/local/share/vim/vim_${_new_version}"

I think at this point the big questions I have are;

What would be the preferred way of dynamically defining %g length based on input?
Is it possible to get that last bit of floating precision, or is 18 characters past the decimal too much?

Update 2

I figured it'd be easier to play around with possible dynamic precision via a separate file;

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define NUMBUFLEN 65
#define MIN(x, y) ( (x < y) ? x : y )

/**
 * Attributions:
 * - https://stackoverflow.com/questions/3437404/min-and-max-in-c
 * - https://stackoverflow.com/questions/1745726/how-to-store-printf-into-a-variable
 * - https://stackoverflow.com/questions/4824/string-indexof-function-in-c
 */
int main() {
	// Assume we have conversions to and fro
	double f = 0.41968;
	char *string = "0.41968";

	// Get the number of characters beyond decimal point
	char *ptr = strchr(string, '.');
	int index = ptr - string;
	int float_size = strlen(string) - (index + 1);

	// Create a printf compatible format string
	int precision = MIN(NUMBUFLEN, float_size);
	char *buf;
	size_t sz;
	sz = snprintf(NULL, 0, "%%.%ig", precision);
	buf = (char *)malloc(sz + 1);
	snprintf(buf, sz+1, "%%.%ig", precision);

	// Debugging stuff commented out
	/* printf("Index: %i\nFloat size: %i\n", index, float_size); */
	/* printf("Precision: %i\n", precision); */
	/* printf("Format: %s\n", buf); */

	// Show we got the float and no more!
	printf(buf, f);
	printf("\n");

	return 0;
}

... And though it works, I'm not sure it really sparks joy from a future maintenance perspective 🤔

Update 3

After much fiddling about on my end it looks as though dynamically defining %.<size>g is gonna be a losing battle if the underlying type of double isn't also changed :-(

Also the whole dynamic sizing thing is likely to be a point of future frustration, for various reasons, so at this point I'm thinking a data-structure may be the safest way of preventing loss of precision; though likely at the cost of more memory.

Manual testing shows this seems to _mostly_ work... _mostly_... Mainly it fails when echoing a dictionary; ```vim echo {'floater': 0.419} ``` > `{'floater': 0.00000000000000000000}` ... But, partially succeeds when accessing a value!? ```vim echo {'floater': 0.419}.floater ``` > `6.95323265041282803632e-310` On a positive note, we can now use `%L` with `f`, `g`, and friends! ```vim echo printf('%.18Lg', {'value':0.00926017481833696365}.value) ``` > `0.009260174818336964` ... Which proves some of these changes function as intended X-D

zeertzjq · 2024-10-24T23:45:49Z

Please do not change version.c in a PR

S0AndS0 · 2024-10-24T23:47:29Z

I think I got most of the bits for allowing long double (AKA long floats) usage!

Only bit that is being a stinker are print/echo-ing dictionaries as a whole, eg.

echo {'floater': 0.419}
"> `{'floater': 0.00000000000000000000}`
echo {'floater': 0.419}.floater
"> `6.95323265041282803632e-310`

... But if feels close to doable, closer at least, and it seems printf is now properly passing the long variants of floats;

echo printf('%.18Lg', {'value':0.00926017481833696365}.value)
"> `0.009260174818336964`

TLDR I'd much appreciate a pointer to where I'm messing thing up when a dictionary''

S0AndS0 · 2024-10-24T23:48:05Z

Please do not change version.c in a PR

Uh-oh 🤦 ... I'll try to get that reverted soon

Edit

Okay I think after a bit of git reflog and git reset --hard HEAD@{<number>} I was able to recover from that mistake! Things be building without any new issues.

src/vim9execute.c

@@ -6891,7 +6891,16 @@
 				   get_var_special_name(iptr->isn_arg.number));
 		break;
 	    case ISN_PUSHF:
-		smsg("%s%4d PUSHF %g", pfx, current, iptr->isn_arg.fnumber);
+#if defined(__LP64__)
+		smsg("%s%4d PUSHF %.*Lg", FLOAT_PRECISION, pfx, current,


chrisbra marked this pull request as draft October 21, 2024 06:52

S0AndS0 changed the title ~~WIP: Conditionally support long double type for float_T~~ WIP: json_encode, and vim_snprintf, loses floating point precision when using %g Oct 22, 2024

S0AndS0 added 3 commits October 24, 2024 16:24

Define a precision for floats

4fa2be6

Add long double AKA _long float_ type

7b09d09

S0AndS0 force-pushed the wip_json-floats branch 2 times, most recently from 0d6b7c1 to dc05bdb Compare October 25, 2024 00:52

github-advanced-security bot found potential problems Oct 25, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

WIP: `json_encode`, and `vim_snprintf`, loses floating point precision when using `%g` #15902

WIP: `json_encode`, and `vim_snprintf`, loses floating point precision when using `%g` #15902

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Check failure

Uh oh!

Uh oh!

WIP: json_encode, and vim_snprintf, loses floating point precision when using %g #15902

Are you sure you want to change the base?

WIP: json_encode, and vim_snprintf, loses floating point precision when using %g #15902

Uh oh!

Conversation

Questions

Story time

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Update

Side note

Update 2

Update 3

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Edit

Uh oh!

Check failure

Uh oh!

WIP: `json_encode`, and `vim_snprintf`, loses floating point precision when using `%g` #15902

WIP: `json_encode`, and `vim_snprintf`, loses floating point precision when using `%g` #15902