Make benchfeatures work again #857
Conversation
Force-pushed from caf4bc0 to 537c265.
Force-pushed from 9a8da2b to 1ed57da.
@@ -0,0 +1,14 @@
set(generated_files
I know this works, obviously, but I find it confusing. Nothing else in the project seems to depend on generated_files. But benchmark/benchfeatures.cpp requires these files... Gosh. How does this work? And what happens when ruby is missing... how can benchfeatures work?
When ruby is missing, benchfeatures will fail :) This is all manual right now: you have to `make benchfeatures generated-data` and run it yourself. Ultimately, I'd like to write the generator in something other than Ruby (probably C++, just because that's what we have available) and make it a full target that builds and runs.
> When ruby is missing, benchfeatures will fail :)

This is not a request on my part... but wouldn't the nice thing be to make it so that benchfeatures is disabled if ruby is missing? (I am not asking for changes, just inquiring.)
I thought about that, but I kinda want to keep it compiling in CI if we can. I'd rather it not rot again.
I see nothing to object to in this PR. I am a bit puzzled by how it all comes together, but I am not concerned.
Feel free to continue asking questions; I'll magnanimously bestow the magnificent gift of my 2 weeks' worth of cmake knowledge! (Seriously, ask and I'll do my best and bring in the @furkanusta big guns if I can't, even though they probably would prefer not to be considered the "big guns" :)
I'm going to remove the -a ARCH parameter and merge.
Force-pushed from 1ed57da to eac8e61.
Force-pushed from eac8e61 to aa53d87.
Just discovered benchfeatures.rb didn't work in Ruby 1.9, and some of Appveyor's CI machines are on that version, so I tweaked it a tiny bit to deal with that.
+1
This was checked in a while ago, but I didn't bring it up to date with warnings and cmake. For reintroduction: it runs the parser against a set of painstakingly crafted generated files, each designed to differ from the others in only one respect (for example, blocks with utf-8 and without). We run the parser against both files in such a pair, and count the difference as the cost of that feature.
The files are generated by genfeaturejson.rb, and are all exactly 640KiB. We divide this into 64-byte blocks, each of which has a given set of features in it.
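As a concrete illustration of that layout, here is a minimal C++ sketch (the actual generator is genfeaturejson.rb; the filename, block contents, and space padding below are assumptions for illustration): pad one block to exactly 64 bytes and repeat it until the file reaches 640KiB.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical sketch of the layout described above, not the real generator:
// pad a block to exactly 64 bytes and repeat it until the file is 640KiB.
static void write_feature_file(const std::string &path, std::string block) {
  constexpr std::size_t block_size = 64;
  constexpr std::size_t file_size = 640 * 1024; // 640KiB = 10240 blocks
  block.resize(block_size, ' ');                // padding character is an assumption
  std::ofstream out(path, std::ios::binary);
  for (std::size_t written = 0; written < file_size; written += block_size) {
    out.write(block.data(), block_size);
  }
}

int main() {
  // Hypothetical file name; the block matches one of the Structural Density
  // entries listed below.
  write_feature_file("struct7.json", R"(,"ab","ab",{})");
}
```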
Structural Density: files that measure our speed at a given # of structurals.
- `"ab"` followed by 640K of zeroes. This is the "Base" number in the result.
- `,"ab","ab",{}` repeated every 64 bytes for 640K.
- `,"ab","ab","ab","ab","ab","ab",{}` repeated every 64 bytes for 640K.
- `,"ab","ab","ab","ab","ab","ab","ab","ab","ab","ab",{}` repeated every 64 bytes for 640K.

String Features: files that measure our speed at particular string features.
- `,"֏","֏",{}` repeated every 64 bytes for 640K.
- `,"\"","\"",{}` repeated every 64 bytes for 640K.

Miss Cost: we also test the cost of a missed branch for all the above features (since each of them takes a different branch). Like, not only how much does it cost to validate UTF-8, but how much extra does it cost when we mispredict that we have to validate UTF-8?
The methodology: we first generate a file that starts with 320K of the feature (50%), and ends with 320K of a baseline (something that differs only in that it will not trigger that branch). Then we generate a second file, also filled with exactly 50% of the feature and baseline, but intermixed and alternated according to a pseudorandom (stable) sequence defined in miss-templates/64.txt or miss-templates/128.txt. This sequence was designed to generate a 25% miss rate on my machine; YMMV. We then take the difference between the two files, assume the entire difference results from branch misses, and divide that by the 25% miss rate (i.e., multiply by 4) to get the per-block miss cost.
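Read as arithmetic, that works out as in the sketch below (made-up timings, not the actual benchmark/benchfeatures.cpp code):

```cpp
#include <cstdio>

// Sketch of the miss-cost arithmetic described above, using made-up numbers.
int main() {
  double contiguous_ns_per_block = 12.0;  // 320K of feature then 320K of baseline: almost no misses
  double interleaved_ns_per_block = 13.5; // same 50/50 mix, interleaved per the miss-templates sequence
  double miss_rate = 0.25;                // fraction of blocks expected to mispredict

  // Attribute the entire slowdown to branch misses, then scale by the miss
  // rate to estimate what a single miss costs per block.
  double miss_cost = (interleaved_ns_per_block - contiguous_ns_per_block) / miss_rate;
  std::printf("estimated per-block miss cost: %.2f ns\n", miss_cost); // 6.00 ns with these numbers
}
```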
To reiterate: it's counting differential costs here, so the expected ns/block for a file with 7 structurals and UTF-8 in them totals Base + 7 Struct + UTF-8 nanoseconds/block. (The UTF-8 file is based on 7 structurals.)
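To make that arithmetic concrete (with made-up numbers, not measurements): if the Base file came in at 10 ns/block and the 7-structural file at 13 ns/block, then 7 Struct = 3 ns/block; if the UTF-8 file (which also has 7 structurals per block) came in at 15 ns/block, then UTF-8 = 15 - 13 = 2 ns/block, and the model predicts Base + 7 Struct + UTF-8 = 10 + 3 + 2 = 15 ns/block for a block with both.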
Here are some numbers from my machine: