8000 bpo-44946: Streamline operators and creation of ints for common case of single 'digit'. by markshannon · Pull Request #27832 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

bpo-44946: Streamline operators and creation of ints for common case of single 'digit'. #27832

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Aug 25, 2021
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Implement PyLong_FromLong separately from _PyLong_FromSTwoDigits to a…
…llow for 15 bit digits on 64 bit machines.
  • Loading branch information
markshannon committed Aug 23, 2021
commit 1f2d47c5b2d9c5c300d16443d33f58af75873db6
51 changes: 45 additions & 6 deletions Objects/longobject.c
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,8 @@ medium_value(PyLongObject *x)
#define IS_SMALL_INT(ival) (-NSMALLNEGINTS <= (ival) && (ival) < NSMALLPOSINTS)
#define IS_SMALL_UINT(ival) ((ival) < NSMALLPOSINTS)

/* To be valid the type of x must cover -PyLong_BASE to +PyLong_BASE.
int, long, Py_ssize_t are all ok */
#define IS_MEDIUM_INT(x) (((twodigits)x)+PyLong_MASK <= 2*PyLong_MASK)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to have a comment clarifying what range of values this macro can safely be used for. I'm assuming it should be enough that it's valid for values in the range (-PyLong_BASE**2, PyLong_BASE**2).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think I was unclear. The (twodigits)x cast potentially loses information if x is large enough, leading to the possibility of false positives for IS_MEDIUM_INT. For example, that will happen on Windows with a large Py_ssize_t value and 15-bit digits - in that case, Py_ssize_t is much larger than unsigned long.

So there's some restriction on the value of x for which this test is valid. "Fits in stwodigits" would probably be enough, but I don't think we use this macro for values outside the range (-PyLong_BASE**2, PyLong_BASE**2).

Copy link
Member
@mdickinson mdickinson Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More generally, C's integer-handling rules make this sort of thing horribly messy to reason about: for example in the 15-bit digit case the addition is an addition of an unsigned long to a (signed!) int, since the integer promotions will promote the unsigned short PyLong_MASK to an int (though even that part is not guaranteed by the standard - there's nothing preventing short and int having the same precision, in which case PyLong_MASK will be promoted to unsigned int instead of int). So now we have to consult the rules for unsigned + signed addition in the "usual arithmetic conversions", which eventually say that because long has greater rank than int (even if it has the same precision), both operands will be treated as unsigned long for the addition.

The 2 * PyLong_MASK is another case that could end up being either signed or unsigned depending on ranks, types, etc; it's probably better spelled as 2U * PyLong_MASK; that way we can at least be sure that it's performed as an unsigned multiplication and that the final comparison is unsigned-to-unsigned.

I'd suggest the addition of an extra cast around the result of the addition, just to reduce the number of mental hoops one has to jump through to establish that this really does give the right result: that is,

((twodigits)((twodigits)(x)+PyLong_MASK) <= 2U*PyLong_MASK)

We should also add extra parentheses around the x, in case someone tries to use IS_MEDIUM_INT on an expression more complicated than a single name.

Copy link
Member Author
@markshannon markshannon Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wholeheartedly agree that C's integer handling is a pain to think about 😞

For clarity I think this is best to use an inline function that makes all casts super explicit.
That way that it makes the cast explicit (if called with something other than stwodigits or sdigits, the caller is responsible.

static inline int is_medium_int(stwodigits x)
{
    /* We have to take care here to make sure that we are
     * comparing unsigned values. */
    twodigits x_plus_mask = ((twodigits)x) + PyLong_MASK;
    return x_plus_mask < ((twodigits)PyLong_MASK) + PyLong_BASE;
}

Does that seem sensible?


static PyObject *
Expand Down Expand Up @@ -195,9 +197,10 @@ _PyLong_FromLarge(stwodigits ival)
abs_ival = (twodigits)ival;
sign = 1;
}
/* Loop to determine number of digits */
twodigits t = abs_ival;
Py_ssize_t ndigits = 0;
/* Must be at least two digits */
assert(abs_ival >> PyLong_SHIFT != 0);
twodigits t = abs_ival >> (PyLong_SHIFT *2);
Py_ssize_t ndigits = 2;
while (t) {
++ndigits;
t >>= PyLong_SHIFT;
Expand Down Expand Up @@ -251,8 +254,44 @@ _PyLong_Negate(PyLongObject **x_p)
PyObject *
PyLong_FromLong(long ival)
{
Py_BUILD_ASSERT(sizeof(stwodigits) >= sizeof(long));
return _PyLong_FromSTwoDigits(ival);
if (IS_SMALL_INT(ival)) {
return get_small_int((sdigit)ival);
}
unsigned long abs_ival;
int sign;
if (ival < 0) {
/* negate: can't write this as abs_ival = -ival since that
invokes undefined behaviour when ival is LONG_MIN */
abs_ival = 0U-(twodigits)ival;
Copy link
Member
@mdickinson mdickinson Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not have been changed. There's no guarantee that an unsigned long fits in something of type twodigits. I'll open a bug report and make a PR when I get the chance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #30496. We seem to be okay on current platforms because from longintrepr.h, twodigits has type either unsigned long or uint64_t (depending on PYLONG_BITS_IN_DIGIT), and no platform we currently care about has a long larger than uint64_t.

sign = -1;
}
else {
abs_ival = (unsigned long)ival;
sign = 1;
}
/* Fast path for single-digit ints */
if (!(abs_ival >> PyLong_SHIFT)) {
return _PyLong_FromMedium((sdigit)ival);
}
/* Must be at least two digits */
unsigned long t = abs_ival >> (PyLong_SHIFT *2);
Py_ssize_t ndigits = 2;
while (t) {
++ndigits;
t >>= PyLong_SHIFT;
}
PyLongObject *v = _PyLong_New(ndigits);
if (v != NULL) {
digit *p = v->ob_digit;
Py_SET_SIZE(v, ndigits * sign);
t = abs_ival;
while (t) {
*p++ = Py_SAFE_DOWNCAST(
t & PyLong_MASK, unsigned long, digit);
t >>= PyLong_SHIFT;
}
}
return (PyObject *)v;
}

#define PYLONG_FROM_UINT(INT_TYPE, ival) \
Expand Down Expand Up @@ -3554,7 +3593,7 @@ long_mul(PyLongObject *a, PyLongObject *b)
/* fast path for single-digit multiplication */
if (IS_MEDIUM_VALUE(a) && IS_MEDIUM_VALUE(b)) {
stwodigits v = medium_value(a) * medium_value(b);
return PyLong_FromLongLong((long long)v);
return _PyLong_FromSTwoDigits(v);
}

z = k_mul(a, b);
Expand Down
0