Initial stuff by charris · Pull Request #1 · numpy/numpy-dtypes · GitHub

This repository was archived by the owner on Nov 25, 2022. It is now read-only.

Initial stuff #1

Merged
merged 70 commits into from
May 7, 2012

Conversation

charris
Member
@charris charris commented May 7, 2012

Copy .gitignore from numpy
Bring in the rational type from G. Irving
Make some initial code cleanups of the rational code.

So far, the code claims that for bet = 0, the Nash equilibrium has

    A = 0.0801971

which is (if true) interestingly far lower than the result for
deterministic post-flop outcomes (which is about 1/4).
The Nash equilibrium computation has rather large errors, which I've yet
to diagnose.  It's possible the errors mean additional strategies need
to be added, or that cvxopt uses error-prone solvers.

To get a pretty plot of equity vs. bet, run

     ./heads-up -b 20 -n 1000 --plot
We now iteratively add pure strategies to our list of strategies to
mix over until the optimal responses are within the list.
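The loop described above is a column-generation scheme: solve the game restricted to the current strategy list, find each player's best pure response, and stop once no new response appears. A minimal sketch of that loop, where `solve_mix` and `best_response` are hypothetical callbacks standing in for heads-up's LP solve and exact best-response search:

```python
def iterate_strategies(strategies, solve_mix, best_response):
    """Grow the strategy list until the best response to the current
    equilibrium mix is already in the list, then return that mix."""
    while True:
        mix = solve_mix(strategies)      # equilibrium of the restricted game
        reply = best_response(mix)       # optimal pure response to that mix
        if reply in strategies:          # nothing new to add: converged
            return mix
        strategies.append(reply)         # otherwise enlarge the game and repeat
```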
1. Use -march=core2.
2. Fix a conceptual but not actual bug in the SCORE macro, and
   align shifts to a 64-bit boundary in case that helps.
3. Compute interesting_permutations array outside compare_hands.
I spent a significant amount of time trying to decouple suit from card in
order to produce a faster computation, but it's so much easier (and more
reliable, bug-wise) to loop over suited cards that I eventually gave up.
The new suited score_hand function passes unit tests, and the exact
probabilities differ from pokerstove.com's approximate probabilities
by only 5e-7 (i.e., identical to within their 6 significant digits).
These are expensive enough (about 45 minutes on my laptop) to warrant saving.
Storing all cards in the same suit adjacent to one another lets us
fit score_t comfortably into 32 bits instead of 128, and has the
added benefit of parallelizing flush detection.
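With all of a suit's ranks stored in one contiguous bit field, flush detection reduces to a popcount per field. A small Python sketch of the idea (the exact field layout here, 13 bits per suit, is an assumption for illustration, not the repository's actual encoding):

```python
def flush_suit(cards):
    """Given a 52-bit hand mask whose suits occupy contiguous 13-bit
    fields (bits 13*s .. 13*s + 12), return the index of a suit holding
    five or more cards, or -1 if there is no flush."""
    for s in range(4):
        field = (cards >> (13 * s)) & 0x1FFF  # isolate one suit's ranks
        if bin(field).count("1") >= 5:        # popcount of that suit
            return s
    return -1
```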

This commit provides conversion from the new scoring system to the
old one so that the regression tests don't break.  The next commit
will remove conversion functions and update the regressions.
After this fix, I regenerated exact.txt and got identical results,
so we can be fairly confident that exact.txt is correct.

For future reference, here are the timings on my laptop:

    real 42m32.790s
    user 275m4.439s
    sys 0m37.587s

The machine wasn't otherwise idle, so it's actually a bit faster than this.
Initial results are underwhelming.  With 8 OpenMP threads, ./exact some takes

    real 0m1.871s
    user 0m13.054s
    sys 0m0.034s

On my AMD Radeon HD 6750M, OpenCL takes

    real 0m1.559s
    user 0m0.047s
    sys 0m0.059s

Maybe too much branching is the problem?
This anticipates vectorizing score_hand, since then we'll be using
OpenCL intrinsics that don't exist in C++.
All branching is now replaced by the appropriate OpenCL relational intrinsics
(select, isequal, etc.).  This speeds up ./exact test from 1.873 s to 1.471 s.
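OpenCL's relational intrinsics return all-ones masks rather than booleans, so a branch becomes a bitwise blend. A pure-Python sketch of the pattern (scalar 32-bit masks here; the real kernels operate on OpenCL vector types):

```python
M32 = 0xFFFFFFFF

def isequal(a, b):
    # OpenCL-style relational result: all-ones mask if true, zero if false
    return M32 if a == b else 0

def select(a, b, mask):
    # OpenCL-style select: take bits of b where mask is set, else bits of a
    return (b & mask) | (a & ~mask & M32)
```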
This speeds up exact test from 1.471 s to 1.222 s.
exact all now takes 15 m 45 s.
This speeds up exact all from 15:45 to 7:55.
Using both CPU and GPU speeds up exact all from 7:55 to 6:02.
We appear to be spending all of our time in read back, so the timing
has already paid off.  Now to fix...
Since OpenCL doesn't provide vector operations for use on the various cl_
types, a previous change removed support for native hand evaluation on
the host.  As a consequence, we had to make two kernel invocations per
compare_cards_opencl call: one to do most of the work and one to fill in
the last few entries.  This second call turns out to be incredibly expensive.

To get around this, I moved most of the score.cl code into score.h, and
added typedefs so the host gets a nonvectorized version of score_hands.
This change speeds up exact all from 6:02 to 5:09.
For some reason, writing even a moderate number of lines to the terminal
is a significant performance hit.  For example, piping the output of
exact some 100 to a file speeds it up from 4.346 s to 3.042 s.  With
output on, my timing code claims it's spending 1.325 s reading back
results, which decreases to 0.120 s when piped to a file.  If I disable
the read, the extra time shifts to the write, free, or compute phases, so
it's clearly not actually the read's fault.

Piping the output to a file and skipping tee speeds up exact all from
5:09 to 4:51.
girving and others added 28 commits December 1, 2011 23:51
nash.py implements a pure python, dtype-agnostic version of the simplex
method, which means the output is exact if the inputs are fractions.
heads-up doesn't take advantage of this yet, but it will soon.
As a consequence, we no longer depend on cvxopt.
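The dtype-agnosticism matters because the simplex method is nothing but pivots, and a pivot over exact entries introduces no rounding error. A minimal sketch of one exact pivot step over `fractions.Fraction` (illustrating the property nash.py relies on, not its actual implementation):

```python
from fractions import Fraction

def pivot(tableau, r, c):
    """One simplex pivot on a list-of-lists tableau of Fractions:
    normalize row r by the pivot element, then eliminate column c
    from every other row.  With exact entries, the result is exact."""
    piv = tableau[r][c]
    tableau[r] = [x / piv for x in tableau[r]]
    for i, row in enumerate(tableau):
        if i != r and row[c]:
            factor = row[c]
            tableau[i] = [x - factor * y for x, y in zip(row, tableau[r])]
    return tableau
```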
The "rational" module adds a new "rational" dtype to numpy stored as the
ratio of two 64-bit integers.  Overflow exceptions are thrown if any
inexactness is detected.  This allows us to take advantage of exact
arithmetic (and the exact probabilities provided by the "exact" program)
without sacrificing the convenience and speed of numpy.
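The semantics of the dtype can be sketched in pure Python: every intermediate numerator and denominator is range-checked against 64-bit bounds, and any excursion raises rather than silently wrapping. This is an illustrative model of the behavior described above, not the dtype's actual C implementation:

```python
from math import gcd

INT64_MAX = 2**63 - 1
INT64_MIN = -2**63

def checked(x):
    # Mimic the C dtype: any value leaving the 64-bit range raises
    if not INT64_MIN <= x <= INT64_MAX:
        raise OverflowError("rational overflow")
    return x

def rational_add(n1, d1, n2, d2):
    """Add n1/d1 + n2/d2 exactly, failing loudly on 64-bit overflow,
    and return the result in lowest terms."""
    n = checked(n1 * d2 + n2 * d1)
    d = checked(d1 * d2)
    g = gcd(n, d)
    return n // g, d // g
```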
1. Remove an unnecessary loop
2. Cache parsing of exact.txt
To get a curve with 2**k samples, run

    ./heads-up --donkey -k
Specifically, we no longer have variables for Alice and Bob folding,
since these are determined from probabilities summing to one.
assert disappears with -O, though I've never seen anyone actually use -O.
* tmprat: (68 commits)
  Change Python assert to numpy.testing.assert_
  Reformat declarations and blocks to match numpy
  Remove C++ style comments and change inline to NPY_INLINE
  Disable GIL acquiring by default
  Convert from C++ to C
  Strip away everything but rational numbers
  Add .DS_Store to .gitignore
  Add one plot output for variable bet equilibria
  Cache variable bet nash equilibria to allow fast reruns
  README.md: Add a note about cvxopt being GPL
  heads-up: Replace C and c with explicit eye/1
  heads-up: Remove unnecessary variables from variable bet code
  heads-up: Allow --bets without --variable
  Add variable bet Nash equilibrium computation
  rational: Fix a nondeterministic bug in rational parsing
  heads-up: Add computation of the limit equity curve vs. always all-in
  heads-up: Switch to sparse matrices for --donkey
  heads-up: Add a -p/--profile option for use with --donkey
  heads-up --donkey: Optionally show worst call hand
  heads-up: Make -d/--deterministic a clean option
  ...
There remains a lot of code cleanup to do.
taldcroft added a commit that referenced this pull request May 7, 2012
@taldcroft taldcroft merged commit a773d06 into numpy:master May 7, 2012