8000 Fix --x-initial and --x-assign random stability (#2662) by toddstrader · Pull Request #5958 · verilator/verilator · GitHub
[go: up one dir, main page]

Skip to content

Fix --x-initial and --x-assign random stability (#2662) #5958

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
May 16, 2025

Conversation

toddstrader
Copy link
Member

We recently noticed that seeds were not portable between builds w/ and w/o --trace. This is because the two have different numbers of VL_RAND_RESET_*() calls. This is a proof of concept which, while not fully fleshed out, does resolve the issue for us (at least in limited testing).

We can work around this by setting --x-initial and --x-assign to 0. However, this is obviously not ideal.

The basic idea is that since VL_RAND_RESET_*() call may be optimized away (or differently) between builds, we don't let these calls advance the PRNG stream. Instead we calculate salt values during verilation (based on the AST or RTL path or something stable) and run the PRNG algorithm on that without modifying its state.

Open items:

  • in *rand64(), if reset is true return the next value per the algorithm after the state has been salted, but do not update the sate (instead of the simple XOR which is there now)
  • compute salts for various places which emit hard-coded salts currently (should be stable between verilations and w/ and w/o --trace
  • tests

I recognize that this isn't foolproof since one could:

int value;
always_ff @(posedge clk) value = $random();

And if value isn't used then I assume Verilator can prune it when we're running w/o --trace. However, this scenario feels a lot more obvious / straightforward to deal with whereas x randomization is very pervasive and implicit.

Anyway, I wanted to get initial thoughts on this before I went further. Please advise.

@wsnyder wsnyder changed the title POC: VL_RAND_RESET PRNG stability Fix --x-initial and --x-assign random stability (#2662) Apr 25, 2025
Copy link
Member
@wsnyder wsnyder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes sense, similar idea to #2662, thanks for working on this.

Please add a test that checks the randomization stability, e.g. that is calls compile and execute twice, and prints two random integers, then checks that the print outputs are the same.

Comment on lines 306 to 307
uint64_t VlRNG::rand64(bool reset, uint64_t salt) VL_MT_UNSAFE {
if (reset) { return (m_state[0] + m_state[1]) ^ salt; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears a misuse of VlRNG::rand64. If you want a new seed you should be using srandom(...) instead. I presume this is because you used VL_RANDOM, I think instead we should remove use of VL_RANDOM and make a new function.

This will also be faster as existing VL_RANDOM outside of init won't need to initialize and hit the new branch you added here.

Copy link
Member Author
@toddstrader toddstrader Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll lose the reset flag and just make a different method. But do be clear, I'm not trying to set the seed here, I'm trying to piggy back on the seed that was set and get a psuedo-random value that is a function of the seed and the variable (by way of salt) without disturbing the state of VlRNG.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. I think we should rely on only the original seed and hash of signal hierarchy.

Another thing to add to test is inset a new signal first in the second test.

@toddstrader
Copy link
Member Author

Re: testing, is there a way to create multiple VlTest objects in a single t_foo.py? I'm trying to do something along these lines:

for suffix, flags in [
    ("", []),
    ("_trace", ["--trace"]),
    ("_added", ["-DADD_SIGNAL"]),
    ("_trace_added", ["--trace", "-DADD_SIGNAL"]),
]:
    test.obj_dir = obj_dir + suffix

    verilator_flags2 = ["--x-initial unique"] + flags
    test.compile(verilator_flags2=verilator_flags2)

Doing this fails because the suffixed obj dirs don't exist. If I work around that I run into other problems because -Mdir is wrong. It seems like what I really want to do is create new VlTest objects with different Args.obj_suffix. But I don't see this pattern anywhere and the bootstrapping code continues to confuse me. Is there a different way to go about this?

@wsnyder
Copy link
Member
wsnyder commented Apr 28, 2025

There are a couple of tests that compile/execute within a single test more than once, but generally this is to be avoided as is hard to debug and won't parallelize. Instead please make 4 separate .py tests referring to the same Verilog file.

@toddstrader
Copy link
Member Author

There are a couple of tests that compile/execute within a single test more than once, but generally this is to be avoided as is hard to debug and won't parallelize. Instead please make 4 separate .py tests referring to the same Verilog file.

Gotcha. I was hoping to avoid the golden files and just compare logs with each other. But yeah, that'll work too.

@toddstrader toddstrader force-pushed the rand_reset_stability branch from 2609ac3 to 3acdb15 Compare April 28, 2025 20:40
@toddstrader
Copy link
Member Author

Still a WIP, but the first commit is a test which can be run to demonstrate the issue.

Question: when I use AstVar::prettyName() in t_x_rand_stability.py I get fully scoped names (e.g. test.uninitialized) but when I add a verilator public in t_x_rand_stability_add.py prettyName() now returns just the variable name (uninitialized). Why is this and is there something else I should be doing to get the fully qualified variable name?

@toddstrader
Copy link
Member Author

I see. It's an inlining difference. So I guess that's the question here. Should the random reset value be based on the longest possible Verilog path for the given variable? Or should it be based on just the variable name?

If we go with the former, then inlining changes will perturb the randomization like I'm seeing here. But if we go with the latter, there's less entropy on these reset values. Thoughts?

@wsnyder
Copy link
Member
wsnyder commented Apr 28, 2025

Maybe a hash of the module's scope, combined with a hash of the variable's name within module (not full scope). Accept that if inlining changes that may make it inconsistent; generally it'll take a pretty large change for that to flip.

@toddstrader
Copy link
Member Author

Sorry, doesn't module's scope have the same maybe-inlined-maybe-not problem? I don't believe AstNodeModule knows or can know this information (aside from maybe someInstanceName()). E.g. if I instantiate a non-inlined sub module twice I get:

    1: MODULE 0x5555570886c0 <e1247> {f44ai}  sub  L3 [1ps]
    1:2: VAR 0x5555570887e0 <e1255> {f45ar} @dt=0x55555708b080@(G/w8)  no_init [VSTATIC]  VAR
    1:2: SCOPE 0x5555570ba1c0 <e1642#> {f39aj}  TOP.t__DOT__the_sub_1 [abovep=0x5555570ba0e0] [cellp=0x555557088480] [modp=0x5555570886c0]
    1:2:1: VARSCOPE 0x55555708be00 <e1644#> {f45ar} @dt=0x55555708b080@(G/w8)  TOP.t__DOT__the_sub_1->no_init [scopep=0x5555570ba1c0] -> VAR 0x5555570887e0 <e1255> {f45ar} @dt=0x55555708b080@(G/w8)  no_init [VSTATIC]  VAR
    1:2: SCOPE 0x5555570ba2a0 <e1646#> {f40aj}  TOP.t__DOT__the_sub_2 [abovep=0x5555570ba0e0] [cellp=0x5555570885a0] [modp=0x5555570886c0]
    1:2:1: VARSCOPE 0x55555708bec0 <e1648#> {f45ar} @dt=0x55555708b080@(G/w8)  TOP.t__DOT__the_sub_2->no_init [scopep=0x5555570ba2a0] -> VAR 0x5555570887e0 <e1255> {f45ar} @dt=0x55555708b080@(G/w8)  no_init [VSTATIC]  VAR

But there's only one VAR and one VL_RAND_RESET_I(). So what would the module's scope be here?

@wsnyder
Copy link
Member
wsnyder commented Apr 29, 2025

The scope's hash needs to be computed at runtime as "Hash(vlSelf->name())" this then gets combined with the verilation-time computed "Hash(AstVar->name())" that is presumably passed to runtime through VL_RESET_WITH_HASHED_NEW_WHATEVER_I.

BTW the other reason to do at runtime is there might be multiple instantiations of the verilated module, and each should hash differently.

So generated code is something like

    modhash = hash_func(vlSelf->name())
     ...
    sig1 = VL_RESET_HASHED(0x1234, modhash, RNG);
    sig2 = VL_RESET_HASHED(0x4567, modhash, RNG);

    where 0x1234 is compiletime from Hash(AstVar->name())
    and VL_RESET_HASH combines 0x1234 and modhash value with RNG

@toddstrader
Copy link
Member Author

Ah, OK. Thanks, I was missing the runtime part. I believe I'm with you now.

@toddstrader
Copy link
Member Author

Further along now but am seeking some advice on this:

Vt_class_packed_sub__03a__03aWrReqQ__Vclpkg__DepSet_hdfbb1e03__0.cpp: In member function âvoid Vt_class_packed_sub__03a__03aWrReqQ::_ctor_var_reset(Vt_class_packed__Syms*)â:
Vt_class_packed_sub__03a__03aWrReqQ__Vclpkg__DepSet_hdfbb1e03__0.cpp:19:54: error: âvlSelfâ was not declared in this scope
   19 |     uint64_t __VscopeHash = std::hash<std::string>{}(vlSelf->name());
      |                                                      ^~~~~~

Basically, VlClass doesn't know the scope in which an object of said class is constructed. I think I can have VlClassRef pass along a VerilatedModule* its m_objp but am not sure if there are any gotchas there. E.g. is it possible to construct a VlClass object without a VlClassRef?

Anyway, I'll try this next but any guidance is appreciated.

@toddstrader
Copy link
Member Author

Actually, this is short sighted ^. If I am able to get the scope name into VlClass for hashing, that only gets us up to the scope where the new() happens. And since the new() is an expression and not a var, there's no name to hash. So:

Cls obj1 = new();
Cls obj2 = new();

would get the same x randomization.

I'm not immediately sure what to do about this but will continue poking at it.

@wsnyder
Copy link
Member
wsnyder commented May 3, 2025

Probably the "right" model is for it to be seeded similar to how IEEE specifies the RNG key be passed as classes are created (see 18.14).

However that might be a pain, it also requires a RNG in each class object which slows things down.

An alternative is to use the class package's name() this is static for the class (what $display %m uses). The downside is it will result in the same random setting for every class created.

Another complication is we'd like thread stability, new() could be called from different threads and want same randomization.

Perhaps then it's easiest to use the static class name and accept the same randomization for every object of a class. People that care about this should really be using either different seeds, or "rand" tags. Also the rand-reset stuff is really more for design randomization, classes are usually for the verif world where that design randomization doesn't apply and again people should be using "rand" and constraints for such classes.

@toddstrader toddstrader marked this pull request as ready for review May 5, 2025 17:46
@toddstrader
Copy link
Member Author

I made the class --x-initial change as suggested above. Some other notes:

I didn't touch vlReadMem::setData() which also uses VL_RAND_RESET_I(). This one feels more self-inflicted because you have to provide a memory file with x's or z's for it to matter. I guess we could have this take in the scope and compute a stable random value based off of that and maybe the character offset in the file? But I'm inclined not to mess with this one for now.

I removed m_reset from AstRand and instead have EmitCFunc::emitVarResetRecurse() detect whether the variable is XTEMP or not. I think it's fine to do the --x-assign randomization during reset because there shouldn't really be a notion of --x-initial for verilator-constructed variables. In fact, we were doing both x initial and x assign randomization for these variables, so this gets rid of one of them.

Beyond that, everything is passing for me locally and I've run out of immediate things to do here. So I believe this is in a review-able state.

There are a couple concepts in here at this point. Please let me know if it's desirable to chop any of this up.

// Evaluate one clock posedge
top->clk = 0;
top->eval();
top->clk = 1;
top->eval();

#if defined(T_X_ASSIGN_UNIQUE_0) || defined(T_X_ASSIGN_UNIQUE_1)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this test was previously expecting a "random" value during randReset() of 0 or 1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure perhaps mis-copied from another test.

@@ -2096,6 +2096,7 @@ V3Options::V3Options() {
m_makeDir = "obj_dir";
m_unusedRegexp = "*unused*";
m_xAssign = "fast";
m_xInitial = "unique";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed that this parameter did not have the advertised default value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Looks like the code using it though treated "" as "unique".

@toddstrader toddstrader force-pushed the rand_reset_stability branch from 1db7971 to b40500a Compare May 5, 2025 19:38
Copy link
Member
@wsnyder wsnyder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, getting close.

I think the tests for stability need something to make sure that the outputs are compatible. My concern is we'll change something and update the golden files and not realize we've lost the similarity between them.

Comment on lines 410 to 415
if (Verilated::threadContextp()->randReset() == 0) return 0;
if (Verilated::threadContextp()->randReset() == 1) return ~0;
state[1] ^= state[0];
state[0] = (((state[0] << 55) | (state[0] >> 9)) ^ state[1] ^ (state[1] << 14));
state[1] = (state[1] << 36) | (state[1] >> 28);
return state[0] + state[1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the VlRNG class instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I switched to using VlRNG::rand64() which then makes VL_SCOPED_RAND_RESET_*() VL_MT_UNSAFE. I'm unsure of the implications here. Is this the right way to go or do I need to make something akin to VlRNG::vl_thread_rng() but for the global seed and which doesn't disturb the seed epoch?

out += ", " + varNameProtected + suffix;
if (!zeroit) {
emitVarResetScopeHash();
uint64_t salt = std::hash<std::string>{}(varp->prettyName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uint64_t salt = std::hash<std::string>{}(varp->prettyName());
const uint64_t salt = std::hash<std::string>{}(varp->prettyName());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ "Hash functions are only required to produce the same result for the same input within a single execution of a program". So use VString functions instead, making a new one if needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a bummer.

I dropped a hash function in V3String and also the runtime library. Is there something better to do here? I don't see any hash functions already associated with VString. There is a SHA256 in there, but it speaks about 64B blocks and I don't think that's what we want for this application. There's also V3Hash but that is intrinsically 32b. Also, is there a reasonable way to share code between the compiler and the runtime? I'm not seeing any examples.

@toddstrader
Copy link
Member Author

Thanks, getting close.

I think the tests for stability need something to make sure that the outputs are compatible. My concern is we'll change something and update the golden files and not realize we've lost the similarity between them.

Yeah, see t_x_rand{,_mt}_stability.py. It already compares the golden files between the different flavors of this test (modulo the _zero one which is clearly not going to be the same).

One thing to note is that the single and multi-threaded versions do not compare. The x randomization values do, but the $random() values do not. I'm guessing this is known / expected. In any case, I'm then not comparing the results between the single and multi-threaded tests (even though I could do so by stripping out the $random() values) because the entire point of this exercise is to preserve PRNG behavior and that's apparently already not going to happen between these two sets of tests.

@toddstrader
Copy link
Member Author

@wsnyder just circling back on this one. Anything else to do here?

Copy link
Member
@wsnyder wsnyder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, missed this was ready. A few minor things then can merge.

@toddstrader toddstrader force-pushed the rand_reset_stability branch from 2cbc7d8 to 8efb24b Compare May 16, 2025 15:18
@wsnyder wsnyder merged commit 4581023 into verilator:master May 16, 2025
36 checks passed
@toddstrader toddstrader deleted the rand_reset_stability branch May 16, 2025 18:07
wsnyder added a commit that referenced this pull request May 18, 2025
toddstrader added a commit to toddstrader/verilator that referenced this pull request May 19, 2025
toddstrader added a commit to toddstrader/verilator that referenced this pull request May 21, 2025
toddstrader added a commit to toddstrader/verilator that referenced this pull request May 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0