Trying to give better guidance regarding large files. (#594) · JavaScriptExpert/simdjson@d1eef24 · GitHub


Commit d1eef24

Trying to give better guidance regarding large files. (simdjson#594)
1 parent ceee00b commit d1eef24

File tree

1 file changed

+19
-3
lines changed


README.md

Lines changed: 19 additions & 3 deletions
@@ -146,11 +146,27 @@ The json stream parser is threaded, using exactly two threads.
 
 ## Large files
 
-If you are processing large files (e.g., 100 MB), it is likely that the performance of simdjson will be limited by page misses and/or page allocation. [On some systems, memory allocation runs far slower than we can parse (e.g., 1.4GB/s).](https://lemire.me/blog/2020/01/14/how-fast-can-you-allocate-a-large-block-of-memory-in-c/)
+If you are processing large files (e.g., 100 MB), it is possible that the performance of simdjson will be limited by page misses and/or page allocation. [On some systems, memory allocation runs far slower than we can parse (e.g., 1.4GB/s).](https://lemire.me/blog/2020/01/14/how-fast-can-you-allocate-a-large-block-of-memory-in-c/)
 
-You will get best performance with large or huge pages. Under Linux, you can enable transparent huge pages with a command like `echo always > /sys/kernel/mm/transparent_hugepage/enabled` (root access may be required). We recommend that you report performance numbers with and without huge pages.
+A viable strategy is to amortize the cost of page allocation by reusing the same `parser` object over several files:
+
+```C++
+// create one parser
+simdjson::document::parser parser;
+...
+// the parser is going to pay a memory allocation price
+auto [doc1, error1] = parser.parse(largestring1);
+...
+// use again the same parser, it will be faster
+auto [doc2, error2] = parser.parse(largestring2);
+...
+auto [doc3, error3] = parser.load("largefilename");
+```
+
+If you cannot reuse the same parser instance, maybe because your application just processes one large document once, you will get best performance with large or huge pages. Under Linux, you can enable transparent huge pages with a command like `echo always > /sys/kernel/mm/transparent_hugepage/enabled` (root access may be required). It may be more difficult to achieve the same result under other systems like macOS or Windows.
+
+In general, when running benchmarks over large files, we recommend that you report performance numbers with and without huge pages if possible. Furthermore, you should amortize the parsing (e.g., by parsing several large files) to distinguish the time spent parsing from the time spent allocating memory.
 
-Another strategy is to reuse pre-allocated buffers. That is, you avoid reallocating memory. You just allocate memory once and reuse the blocks of memory.
 
 ## Including simdjson
0 commit comments
