8000 WIP: `json_encode`, and `vim_snprintf`, loses floating point precision when using `%g` by S0AndS0 · Pull Request #15902 · vim/vim · GitHub
[go: up one dir, main page]

Skip to content

WIP: json_encode, and vim_snprintf, loses floating point precision when using %g #15902

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

S0AndS0
Copy link
Contributor
@S0AndS0 S0AndS0 commented Oct 20, 2024

⚠️ This breaks echo and messes with other Vim Ex mode commands, ex;

echo json_encode({'value':0.5670403838157654})

Results could be;

  • {"value":5.068898e-310}
  • {"value":1.237282e-313}
  • {"value":0.0}
:echo json_decode('{"value":0.5670403838157654}')

Results, regardless of float size, always be;

  • {'value': 0.0}

... Side note; printf within Vim Ex mode also has inconsistent behaviors.

Questions

  • Is this worth pursuing? And if so;
    • What causes the inconsistencies?
    • Where else do I need to apply updates?
    • Is there a convenient way to avoid/expand e notation?
    • Any pointers on adding some unit tests?

Story time

I'm writing a plugin, aimed at Vim8 compatibility, that makes calls to an API that I do not control outputs of. And the API returns JSON that contains large floats, example;

{
  "embedding": [
    0.5670403838157654,
    0.009260174818336964,
    0.23178744316101074,
    -0.2916173040866852,
    -0.8924556970596313,
    0.8785552978515625,
    -0.34576427936553955,
    0.5742510557174683,
    -0.04222835972905159,
    -0.137906014919281
  ]
}

... As of Vim version 9.1.1-785 packaged by Arch, I use Arch (BTW™), these values are truncated to about five digits past the decimal point, eg.

:echo json_decode('{"embedding":[0.5670403838157654]}')

Result: {'embedding': [0.56704]}

Oddly it looks like this might be a echo bug, because wrapping with printf and using a large limit mostly preserves precision of input floats... mostly

:echo printf('%.25g', json_decode('{"embedding":[0.5670403838157654]}').embedding[0])

Result: 0.5670403838157653808593750

Notice, we lost the last 4 and picked-up a bunch of extra 3808593750 🤷

Warning: this breaks `json_decode`, and does less than good things to
`json_encode` Vim functions!

```vim
echo json_encode({'value':0.5670403838157654})
```
> Results could be;
>
> - `{"value":5.068898e-310}`
> - `{"value":1.237282e-313}`
> - `{"value":0.0}`

```vim
:echo json_decode('{"value":0.5670403838157654}')
```
> Results, regardless of float size, always be;
>
> - `{'value': 0.0}`
@chrisbra chrisbra marked this pull request as draft October 21, 2024 06:52
@tonymec
Copy link
tonymec commented Oct 21, 2024

I can't find it back in the Vim help, but IIRC :echo has always displayed Floats with some standardized default precision of 6 significant digits or so regardless of what the hardware could handle. With printf(), OTOH, it is possible to specify the precision, including a better precision than what the hardware affords, in which case "anything" will be displayed beyond the hardware limit. For instance if you try to use printf() to display 1.0 / 3.0 with tremendous precision, you'll notice that at some point it ceases to be a string of repetitions of the digit 3.

@chrisbra
Copy link
Member

It currently is at :h floating-point-precision

@tonymec
Copy link
tonymec commented Oct 21, 2024

It currently is at :h floating-point-precision

Ah, thanks. Yes, this is what I meant.

@chrisbra
Copy link
Member

Isn't that just a display issue only? In which case I think we don't even need to worry about?

@S0AndS0
Copy link
Contributor Author
S0AndS0 commented Oct 21, 2024

I agree that echo be a non-issue to worry about, and the main thing I'm currently concerned with is preserving precision whenever possible; especially in regards to JSON encode/decode operations.

Double-checking myself on Vim version 9.1.1-785, after learning that echo for sure cannot be trusted, it looks like the decoding is fine, ex.

let json_stringy = '{ "value": "0.009260174818336964" }'
let json_floater = '{ "value": 0.009260174818336964 }'

let parsed_stringy = printf('%s', json_decode(json_stringy).value)
let parsed_floater = printf('%.18g', json_decode(json_floater).value)

echo parsed_stringy == parsed_floater
"> 1
echo len(parsed_stringy)
"> 20
echo len(parsed_floater)
"> 20

... But things get funky when re-encoding;

echo json_encode(json_decode(json_stringy))
"> {"value":"0.009260174
8000
818336964"}

echo json_encode(json_decode(json_floater))
"> {"value":0.00926}

So I may need to revert previously proposed changes and instead fixate on what json_encode is doing to floats.


Update

I reverted changes, because it looks like the %g is the spot where floats are losing precision;

  • File: src/json.c
  • Function: json_encode_item
  • Line: 428 (as of commit c73fc86bf8fe318aab41900dd92e380143211cda)
		vim_snprintf((char *)numbuf, NUMBUFLEN, "%g",
							   val->vval.v_float);

... And indeed hard-coding an explicitly long length gets me closer to joy, eg.

-		vim_snprintf((char *)numbuf, NUMBUFLEN, "%g",
+		vim_snprintf((char *)numbuf, NUMBUFLEN, "%.20g",

After re-compiling, tests show only the last digit is failing to retain precision!!!

let json_floater = '{ "value": 0.009260174818336964 }'
echo json_encode(json_decode(json_floater))
"> {"value":0.00926017481833696365}

Digging deeper into the source it looks like %g, and friends, behaviors are defined in;

  • File: src/strings.c
  • Function: vim_vsnprintf_typval
  • Lines: 3691 through 3852

... But at this moment I'm not feeling clever enough for that.


Side note

For those following along at home, here are the configuration and make commands I've been using;

cd ~/git/hub/vim/vim

./configure --enable-gui=gtc3\
            --enable-luainterp=yes\
            --enable-pythoninterp=yes\
            --with-python-config-dir="$(python-config --configdir)"\
            --enable-python3interp=yes\
            --with-python3-config-dir="$(python3-config --configdir)"\
            --enable-perlinterp=yes\
            --enable-rubyinterp=yes\
            --enable-cscope\
            --enable-farsi\
            --enable-fontset\
            --enable-gpm\
            --enable-largefile\
            --enable-multibyte\
            --enable-rightleft\
            --enable-terminal\
            --enable-xim\
            --with-features=huge\
            --with-compiledby='S0AndS0'\
            --prefix='/usr/local'
_last_version="$(git tag --list | tail -n1)"
_date_time="$(date +%Y%m%d_%H%M%S)"
_new_version="${_last_version}-${_date_time}"
make -j$(nproc) VIMRUNTIMEDIR="/usr/local/share/vim/vim_${_new_version}"

I think at this point the big questions I have are;

  • What would be the preferred way of dynamically defining %g length based on input?
  • Is it possible to get that last bit of floating precision, or is 18 characters past the decimal too much?

Update 2

I figured it'd be easier to play around with possible dynamic precision via a separate file;

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define NUMBUFLEN 65
#define MIN(x, y) ( (x < y) ? x : y )

/**
 * Attributions:
 * - https://stackoverflow.com/questions/3437404/min-and-max-in-c
 * - https://stackoverflow.com/questions/1745726/how-to-store-printf-into-a-variable
 * - https://stackoverflow.com/questions/4824/string-indexof-function-in-c
 */
int main() {
	// Assume we have conversions to and fro
	double f = 0.41968;
	char *string = "0.41968";

	// Get the number of characters beyond decimal point
	char *ptr = strchr(string, '.');
	int index = ptr - string;
	int float_size = strlen(string) - (index + 1);

	// Create a printf compatible format string
	int precision = MIN(NUMBUFLEN, float_size);
	char *buf;
	size_t sz;
	sz = snprintf(NULL, 0, "%%.%ig", precision);
	buf = (char *)malloc(sz + 1);
	snprintf(buf, sz+1, "%%.%ig", precision);

	// Debugging stuff commented out
	/* printf("Index: %i\nFloat size: %i\n", index, float_size); */
	/* printf("Precision: %i\n", precision); */
	/* printf("Format: %s\n", buf); */

	// Show we got the float and no more!
	printf(buf, f);
	printf("\n");

	return 0;
}

... And though it works, I'm not sure it really sparks joy from a future maintenance perspective 🤔


Update 3

After much fiddling about on my end it looks as though dynamically defining %.<size>g is gonna be a losing battle if the underlying type of double isn't also changed :-(

Also the whole dynamic sizing thing is likely to be a point of future frustration, for various reasons, so at this point I'm thinking a data-structure may be the safest way of preventing loss of precision; though likely at the cost of more memory.

@S0AndS0 S0AndS0 changed the title WIP: Conditionally support long double type for float_T WIP: json_encode, and vim_snprintf, loses floating point precision when using %g Oct 22, 2024
Manual testing shows this seems to _mostly_ work...  _mostly_...

Mainly it fails when echoing a dictionary;

```vim
echo {'floater': 0.419}
```
> `{'floater': 0.00000000000000000000}`

...  But, partially succeeds when accessing a value!?

```vim
echo {'floater': 0.419}.floater
```
> `6.95323265041282803632e-310`

On a positive note, we can now use `%L` with `f`, `g`, and friends!

```vim
echo printf('%.18Lg', {'value':0.00926017481833696365}.value)
```
> `0.009260174818336964`

...  Which proves some of these changes function as intended X-D
@zeertzjq
Copy link
Member
zeertzjq commented Oct 24, 2024

Please do not change version.c in a PR

@S0AndS0
Copy link
Contributor Author
S0AndS0 commented Oct 24, 2024

I think I got most of the bits for allowing long double (AKA long floats) usage!

Only bit that is being a stinker are print/echo-ing dictionaries as a whole, eg.

echo {'floater': 0.419}
"> `{'floater': 0.00000000000000000000}`
echo {'floater': 0.419}.floater
"> `6.95323265041282803632e-310`

... But if feels close to doable, closer at least, and it seems printf is now properly passing the long variants of floats;

echo printf('%.18Lg', {'value':0.00926017481833696365}.value)
"> `0.009260174818336964`

TLDR I'd much appreciate a pointer to where I'm messing thing up when a dictionary''

@S0AndS0
Copy link
Contributor Author
S0AndS0 commented Oct 24, 2024

Please do not change version.c in a PR

Uh-oh 🤦 ... I'll try to get that reverted soon

Edit

Okay I think after a bit of git reflog and git reset --hard HEAD@{<number>} I was able to recover from that mistake! Things be building without any new issues.

@S0AndS0 S0AndS0 force-pushed the wip_json-floats branch 2 times, most recently from 0d6b7c1 to dc05bdb Compare October 25, 2024 00:52
@@ -6891,7 +6891,16 @@
get_var_special_name(iptr->isn_arg.number));
break;
case ISN_PUSHF:
smsg("%s%4d PUSHF %g", pfx, current, iptr->isn_arg.fnumber);
#if defined(__LP64__)
smsg("%s%4d PUSHF %.*Lg", FLOAT_PRECISION, pfx, current,

Check failure

Code scanning / CodeQL

Wrong type of arguments to formatting function High

This format specifier for type 'int' does not match the argument type 'char *'.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0