Conversation
|
I understand the need, but my intuition is that this is not a good idea: for one, it leaks an implementation detail, but more to the point each issue mentioned above revolves around the possibility of reading from a I would rather first discuss whether we agree to extend the API of |
|
@nojb I'm afraid I disagree on all points :).
|
|
Related: #6538, #2672, and also #6975 (comment). While I like the idea of exposing a more powerful API to turn Buffer.t into proper mutable/extensible strings, I think we should indeed avoid exposing its internal implementation. It could for instance be considered in the future to switch to char bigarrays (esp. now that bigarray is part of the stdlib) which might be better suited than |
|
@alainfrisch honestly, even moving to bigarray seems dubious to me. It's clear that allocating a lot of small buffers is fine, and people do it (even in, say, I think the backup buffer is an implementation detail, but the main one isn't. It's been around for decades so the implementation details have irreversibly leaked. relevant xkcd if one wanted to modify the implementation. |
|
Of course, I should add that I've regularly reimplemented my own crappy buffer because I needed access to the internals after some IO reads. Even calling |
Which is not incompatible with automatically switching to char bigarray when resizing above a certain limit... |
|
Another remark:
how do you do that if you don't have access to the internal buffer? 🙂 . Or are you advocating for having these builtin in the stdlib? To me that's a different, specialized type, of IO buffer with a long lifetime and guaranteed bigarray backing. You don't want your writev to fail because it's actually a small Why not just make |
Why don't you work out a design and an implementation of the "high-performance, C-compatible, IO buffer" or whatever you have in mind as a separate library, or as part of one of the alternate standard libraries around there ? The OCaml stdlib is not the place for experiments, but can benefit from such experiments. |
|
I'm not sure I understand your angle, @xavierleroy. I'm advocating to commit to the current design (which I think is arguably fixed in stone at this point), and provide access to the buffer's data as it is. On the other hand, my understanding is that @alainfrisch is advocating for keeping the internal buffer private so it could change in the future (and perhaps be the foundation for bigstring-backed IOs); I think this ship has sailed for a module that is 23 years old. |
|
I probably quoted the wrong part of the discussion, sorry. It's the part where you mention " I've regularly reimplemented my own crappy buffer because I needed access to the internals after some IO reads" that made me think you could first design and implement your own non-crappy buffer that does what you want, then show it to the world. I would hope something could come out of this effort that would be nicer than just "let's expose the internal state of the existing abstraction". |
|
I don't have a nicer abstraction, I don't think there can exist one unless we had rust's immutable slices, in which case it'd be A few examples of where I just redid my thing:
As for more general IO buffers, we have biniou which long powered Yojson (although this should change with 2.0), and ocamlnet's netbuffer that is linked in a previous issue. |
I'm not sure to understand this missing part where |
|
Some discussion of reading integers, floats etc. from a Buffer without copying to a string here, and elsewhere in that issue: |
these slices:
so I don't think it's a valid counterpoint. |
|
Just to clarify my position: I'm not trying to push for a change of representation, but I don't think it's a good idea to prevent any such change in the future. Abstraction is useful! I think that extending |
|
Buffer is a high-performance data structure, and realistically its implementation is extremely unlikely to change. If we ever want a BigArray-backed Buffer, we'll have to make a new type. Just think of having to test the performance impact of that kind of change in the OCaml universe: it's realistically never ever going to happen. I therefore see nothing wrong with exposing the internals so as to let the data structure do what it was meant to do, which is to allow for high performance operations. |
|
I agree with @bluddy. Also, abstraction is useful, but not for such low-level components; one also expects @alainfrisch could you sketch what a realistic change to I should have opened this years ago and I'm really frustrated by the pushback. I never wrote |
|
As for the "missing functions": the de-facto blocking IO interfaces in OCaml are: val read: bytes -> int -> int -> int
val write: bytes -> int -> int -> unitYou cannot add to |
|
I'm a relatively new user to OCaml (since 2018 part-time, 2020 full-time) but I really like this PR. One of my growing pains has been to choose between A |
|
BTW, is there any reason this is called |
|
The reason is twofold:
|
|
I'd also prefer to keep |
|
… Am I making no arguments here? There's been no plan to change it FOR THE 23 YEARS IT HAS EXISTED. no one has a suggestion of how it might change (the bigstring one would break memory behavior for so many programs it's not funny). This is seriously the last time I even consider contributing to the stdlib. |
|
Since the proposed function is clearly labeled "unsafe" and that it would solve a problem, I think it would be perfectly reasonable to include it now and if in some distant future the underlying implementation changed, which is unlikely but not impossible, the function could become a harmless alias of |
Something like: type buf = Bytes of bytes | BaString of (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t
type t =
{mutable buffer : buf;
mutable position : int;
mutable length : int;
initial_buffer : buf}(with an unboxing annotation onf Are you concerned by the cost of the extra pattern-matching indirection on every access? |
You can make that opt-in at buffer creation time.
That one would break memory behaviour for so many programs it's not funny :–) In general the story on how we do bytes – For example it would be nice to be able to choose the backing store of a buffer abstraction so that buffer values can be used to encode to either bigbytes or bytes without client even knowing to what they are encoding. I'm not sure these kind of things can or should be retrofitted in |
I don't have the patience, energy, or motivation to work on that, especially if it's to face 2 years of nitpicking on the PR afterwards. So, no, fuck it. |
|
I'll do my own buffer somewhere else, probably in containers, and let the stdlib do its thing. |
|
@c-cube wrote:
It's just that in the 24th year, OCaml 5.0 happens to have non-relocating |
given |
|
@avsm |
|
I experimented a bit with what @alainfrisch was suggesting, and here's a super basic comparison of
I'm really not an assembly expert. It doesn't look that bad, but it's still a likely slowdown for code that calls edit: a bigger problem is probably the conditional jump in |
|
I also did a quick benchmark: bench.ml.txt I cannot observe any difference between the two versions (note the inlining annotation on nth and add_char), and this doesn't even include a possible future optimization which would avoid an extra memory load in the new version (unboxing of the sum type, using the fact that bytes and bigarrays have different tags). (Side note: I doubt that performance-critical code would rely a lot on But my point is not so much about this possible change, but about keeping the possibility to consider changes in the future. This could be a radically different implementation when targetting Javascript, or keeping list of chunks above a certain size (which can avoid fragmentation when there is no compaction, and avoid copying large strings if compaction exists). |
Maybe, maybe not. It depends whether we add a compactor or not. Certainly, there's a lot of work to do to figure it out (some of which is happening in https://github.com/ocaml-multicore/eio). I'm not in a rush to break abstraction in |
|
That's alright, I'm done with this issue anyway.
It's clear that Buffer isn't designed for my use cases beyond "append*; contents". I'll write my own with stronger guarantees, fragmentation notwithstanding.
I'd have loved to propose a standard vector implementation at some point, but I feel like yesterday: not going to happen.
|
This is something I've wanted to do for the past 8 years or so. Currently, Buffer.t is useful as a way to produce values of type
stringefficiently, but it falls short in other areas where access to the underlying byte buffer is required, because it provides no escape hatch at all. Functions likeoutput_buffer, oradd_int64_nehave to be hardcoded, and anything else will require an extra copy, which is problematic when the use of a buffer in the first place is to reduce allocation. If I wanted to read an int64 from a buffer, I'd have to extract tobytesfirst. If I want to use aLwt_iofunction to write bytes to some channel, or use gzip bindings to compress the content of a buffer, or any other form of C bindings, or search for a delimiter (like\n), I have to copy/sub.This is "unsafe" in the sense of "I know what I'm doing", but it doesn't violate any OCaml invariant. It's less risky than
Bytes.unsafe_to_string, so I'm not entirely sure it deserves the prefix.