Make benchfeatures work again #857

jkeiser · 2020-05-02T20:49:29Z

This was checked in a while ago, but I didn't bring it up to date with warnings and cmake. For reintroduction, it runs the parser against a set of painstakingly crafted generated files, each designed to differ from others in only one respect (for example, blocks with utf-8 and without). We run the parser against both of them, and count the difference as the cost of that feature.

The files are generated by genfeaturejson.rb, and are all exactly 640KiB. We divide this into 64-byte blocks, each of which has a given set of features in it.

Structural Density: files that measure our speed at a given # of structurals
- 0-structurals-full.json: "ab" followed by 640K of zeroes. This is the "Base" number in the result.
- 7-structurals-full.json: ,"ab","ab",{} repeated every 64 bytes for 640K.
- 15-structurals-full.json: ,"ab","ab","ab","ab","ab","ab",{} repeated every 64 bytes for 640K.
- 23-structurals-full.json: ,"ab","ab","ab","ab","ab","ab","ab","ab","ab","ab",{} repeated every 64 bytes for 640K.
String Features: measure our speed at particular string features.
- utf-8-full.json: ',"֏","֏",{} repeated every 64 bytes for 640K.
- escape-full.json: ,"\\"","\\"",{} repeated every 64 bytes for 640K.
Miss Cost: we also test the cost of a missed branch for all the above features (since each of them takes a different branch). Like, not only how much does it cost to validate UTF-8, how much extra does it cost when we mispredict that we have to validate UTF-8?

The methodology: we first generate a file that starts with 320K of the feature (50%), and ends with 320K of a baseline (something that differs only in that it will not trigger that branch). Then we generate a second file, also filled with exactly 50% of the feature and baseline, but intermixed and alternated according to a pseudorandom (stable) sequence defined in miss-templates/64.txt or miss-templates/128.txt. This sequence was designed to generate a 25% miss rate on my machine; YMMV. We then take the difference between the two files, assume the entire difference results from branch misses, and divide that by 4 (25%) to get the per-block miss cost.

To reiterate: it's counting differential costs here, so the expected ns/block for a file with with 7 structurals with UTF-8 in them totals Base + 7 Struct + UTF-8 nanoseconds/block. (The UTF-8 file is based on 7 structurals.)

Here are some numbers from my machine:

Stage	Platform	Base	7 Struct	UTF-8	Escape	15 Str.	16+ Str.	7 Struct Miss	UTF-8 Miss	Escape Miss	15 Str. Miss	16+ Str. Miss
Stage 1	g++ 9.3.0 (WSL)	6.63 8000	2.6	4.8	0	2.68	3.74	1.08	3.4	0	3.64	4.96
Stage 1	clang 9.0.0 (WSL)	6.87	2.54	4.68	0	1.96	3.92	2.1	6.52	0	5.04	5.32
Stage 1	ClangCL 9.0.0	6.89	9.42	5.06	0.32	5.34	10.5	3.32	6.68	-0.68	5.08	6.56
Stage 1	MSVC 19.25.28614.0	20.5	10.1	10	-0.14	5.22	6	0.84	1.92	0.2	1.76	1.12
Stage 2	g++ 9.3.0 (WSL)	0.09	12.9	-0.02	5.66	16.6	16.6	0.12	-0.04	11.5	0.36	0.28
Stage 2	clang 9.0.0 (WSL)	0.09	14	0	4.78	16.6	16.8	0.2	0	9.96	0.44	0.2
Stage 2	ClangCL 9.0.0	0.35	25.9	0.04	4.26	22.4	33.5	0.2	-0.36	8.52	1.32	-0.88
Stage 2	MSVC 19.25.28614.0	0.36	32.9	0.08	4.18	33.4	41	0.38	-0.24	12.1	-0.28	1
All	g++ 9.3.0 (WSL)	6.75	15.5	4.78	5.64	19.3	20.3	1.26	3.4	11.5	4	5.4
All	clang 9.0.0 (WSL)	6.99	16.6	4.78	4.78	18.6	20.7	2.3	6.48	9.92	5.52	6
All	ClangCL 9.0.0	10.6	36.2	8.3	6.04	29.9	43	2.9	4.36	9.52	6.88	5.8
All	MSVC 19.25.28614.0	23.6	44.2	10.5	4.68	38.2	48.6	1.26	-0.2	14.2	4.16	1.08

jsonexamples/generated/genfeaturejson.rb

jsonexamples/generated/CMakeLists.txt

jsonexamples/CMakeLists.txt

benchmark/benchfeatures.cpp

lemire · 2020-05-05T00:16:54Z

jsonexamples/generated/CMakeLists.txt

@@ -0,0 +1,14 @@
+set(generated_files


I know this works, obviously, but I find it confusing. Nothing else in the project seems to depend on generated_files. But benchmark/benchfeatures.cpp requires these files... Gosh. How does this work? And what happens when ruby is missing... how can benchfeatures work?

When ruby is missing, benchfeatures will fail :) This is all manual right now: you have to make benchfeatures generated-data and run it yourself. Ultimately, I'd like to write the generator in something other than Ruby (probably C++, just because that's what we have available) and make it a full target that builds and runs.

When ruby is missing, benchfeatures will fail :)

This is not a request on my part... but wouldn't the nice thing be to make it so that the benchfeatures is disabled if ruby is missing? (I am not asking for changes, just inquiring.)

I thought about that, but I kinda want to keep it compiling in CI if we can. I'd rather it not rot again.

lemire

I see nothing to object to in this PR. I am a bit puzzled by how it all comes together, but I am not concerned.

jkeiser · 2020-05-05T16:15:28Z

Feel free to continue asking questions, I'll magnanimously bestow the magnificent gift of my 2 weeks worth of cmake knowledge! (Seriously, ask and I'll do my best and bring in the @furkanusta big guns if I can't, even though they probably would prefer not to be considered the "big guns" :)

jkeiser · 2020-05-05T16:15:43Z

I'm going to remove the -a ARCH parameter and merge.

jkeiser · 2020-05-05T18:54:10Z

Just discovered benchfeatures.rb didn't work in Ruby 1.9, and some of Appveyor's CI machines are on that version, so I tweaked it a tiny bit to deal with that.

lemire · 2020-05-06T00:56:03Z

+1

jkeiser force-pushed the jkeiser/benchfeatures branch from caf4bc0 to 537c265 Compare May 2, 2020 20:56

This was referenced May 2, 2020

MSVC simdjson twice as slow as ClangCL #847

Closed

ClangCL simdjson 15% slower than Clang on WSL #848

Open

jkeiser force-pushed the jkeiser/benchfeatures branch from 9a8da2b to 1ed57da Compare May 3, 2020 16:14

jkeiser requested a review from lemire May 3, 2020 17:13

lemire reviewed May 4, 2020

View reviewed changes

jsonexamples/generated/genfeaturejson.rb Show resolved Hide resolved

lemire reviewed May 5, 2020

View reviewed changes

jsonexamples/generated/CMakeLists.txt Show resolved Hide resolved

lemire reviewed May 5, 2020

View reviewed changes

jsonexamples/CMakeLists.txt Show resolved Hide resolved

lemire reviewed May 5, 2020

View reviewed changes

benchmark/benchfeatures.cpp Show resolved Hide resolved

lemire reviewed May 5, 2020

View reviewed changes

lemire self-requested a review May 5, 2020 00:46

lemire approved these changes May 5, 2020

View reviewed changes

jkeiser force-pushed the jkeiser/benchfeatures branch from 1ed57da to eac8e61 Compare May 5, 2020 16:18

jkeiser added 2 commits May 5, 2020 09:39

Make benchfeatures work again

8c600ca

Generate test data when tests are built

aa53d87

jkeiser force-pushed the jkeiser/benchfeatures branch from eac8e61 to aa53d87 Compare May 5, 2020 18:30

Fix benchfeatures.rb on Ruby 1.9

1e8a54a

jkeiser merged commit a64d2f4 into master May 5, 2020

jkeiser deleted the jkeiser/benchfeatures branch May 5, 2020 19:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make benchfeatures work again #857

Make benchfeatures work again #857

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Make benchfeatures work again #857

Make benchfeatures work again #857

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!