8000 bpo-1635741: Port _datetime module to multiphase initialization (PEP 489) by phsilva · Pull Request #19122 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

bpo-1635741: Port _datetime module to multiphase initialization (PEP 489) #19122

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Port _datetime module to multiphase initialization (:pep:`489`). Patch by
Paulo Henrique Silva.
137 changes: 80 additions & 57 deletions Modules/_datetimemodule.c
Original file line number Diff line number Diff line change
Expand Up @@ -6335,7 +6335,7 @@ static PyTypeObject PyDateTime_DateTimeType = {
* Module methods and initialization.
*/

static PyMethodDef module_methods[] = {
static PyMethodDef datetimemodule_methods[] = {
{NULL, NULL}
};

Expand All @@ -6360,32 +6360,13 @@ static PyDateTime_CAPI CAPI = {
new_time_ex2
};



static struct PyModuleDef datetimemodule = {
PyModuleDef_HEAD_INIT,
"_datetime",
"Fast implementation of the datetime type.",
-1,
module_methods,
NULL,
NULL,
NULL,
NULL
};

PyMODINIT_FUNC
PyInit__datetime(void)
static int
datetimemodule_exec(PyObject *module)
{
PyObject *m; /* a module object */
PyObject *d; /* its dict */
PyObject *x;
PyObject *delta;

m = PyModule_Create(&datetimemodule);
if (m == NULL)
return NULL;

PyTypeObject *types[] = {
&PyDateTime_DateType,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static types are not really compatible with having two instances of the same C extension module.

Moreover, this function sets attributes of the type dictionary each time datetimemodule_exec() is called. Problem: type attributes are cached. If you run test_datetime twice, each run creates a fresh _datetime module: ./python -m test -v test_datetime test_datetime. But many tests fail at the second run. A workaround is to call explicitly PyType_ClearCache(); at the function exit wih a comment explaining that it's a workaround for static types.

Note: when running test_datetime, test_divide_and_round() fails: see https://bugs.python.org/issue40058 (it's not a regression of this PR).

Another way to see the bug:

import sys
for i in range(1, 3):
    print(f"== run #{i} ==")
    import _datetime
    print("resolution", _datetime.timedelta.resolution)
    print("min", _datetime.timedelta.min)
    print("max", _datetime.timedelta.max)
    _datetime = None
    del sys.modules['_datetime']

Output:

== run #1 ==
resolution 0:00:00.000001
min -999999999 days, 0:00:00
max 999999999 days, 23:59:59.999999
== run #2 ==
resolution -999999999 days, 0:00:00
min 999999999 days, 23:59:59.999999
max 0001-01-01

When a second instance of _datetime is created, _datetime.timedelta.max becomes a date object instead of a timedelta object. It's because the type cache isn't updated when the dict is modified directly.

8000
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added PyType_ClearCache workaround, will add the conversion to use heap-allocated PyTypeObjects to future PR.

&PyDateTime_DateTimeType,
Expand All @@ -6396,8 +6377,8 @@ PyInit__datetime(void)
};

for (size_t i = 0; i < Py_ARRAY_LENGTH(types); i++) {
if (PyModule_AddType(m, types[i]) < 0) {
return NULL;
if (PyModule_AddType(module, types[i]) < 0) {
return -1;
}
}

Expand All @@ -6406,83 +6387,83 @@ PyInit__datetime(void)

x = new_delta(0, 0, 1, 0);
if (x == NULL || PyDict_SetItemString(d, "resolution", x) < 0)
return NULL;
return -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8000

This code path leaks x reference if PyDict_SetItemString() fails. I propose to not fix this bug in this PR. Let's move step by step.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, not fixing for now. Will add to the summary of future refactors.

Py_DECREF(x);

x = new_delta(-MAX_DELTA_DAYS, 0, 0, 0);
if (x == NULL || PyDict_SetItemString(d, "min", x) < 0)
return NULL;
return -1;
Py_DECREF(x);

x = new_delta(MAX_DELTA_DAYS, 24*3600-1, 1000000-1, 0);
if (x == NULL || PyDict_SetItemString(d, "max", x) < 0)
return NULL;
return -1;
Py_DECREF(x);

/* date values */
d = PyDateTime_DateType.tp_dict;

x = new_date(1, 1, 1);
if (x == NULL || PyDict_SetItemString(d, "min", x) < 0)
return NULL;
return -1;
Py_DECREF(x);

x = new_date(MAXYEAR, 12, 31);
if (x == NULL || PyDict_SetItemString(d, "max", x) < 0)
return NULL;
return -1;
Py_DECREF(x);

x = new_delta(1, 0, 0, 0);
if (x == NULL || PyDict_SetItemString(d, "resolution", x) < 0)
return NULL;
return -1;
Py_DECREF(x);

/* time values */
d = PyDateTime_TimeType.tp_dict;

x = new_time(0, 0, 0, 0, Py_None, 0);
if (x == NULL || PyDict_SetItemString(d, "min", x) < 0)
return NULL;
return -1;
Py_DECREF(x);

x = new_time(23, 59, 59, 999999, Py_None, 0);
if (x == NULL || PyDict_SetItemString(d, "max", x) < 0)
return NULL;
return -1;
Py_DECREF(x);

x = new_delta(0, 0, 1, 0);
if (x == NULL || PyDict_SetItemString(d, "resolution", x) < 0)
return NULL;
return -1;
Py_DECREF(x);

/* datetime values */
d = PyDateTime_DateTimeType.tp_dict;

x = new_datetime(1, 1, 1, 0, 0, 0, 0, Py_None, 0);
if (x == NULL || PyDict_SetItemString(d, "min", x) < 0)
return NULL;
return -1;
Py_DECREF(x);

x = new_datetime(MAXYEAR, 12, 31, 23, 59, 59, 999999, Py_None, 0);
if (x == NULL || PyDict_SetItemString(d, "max", x) < 0)
return NULL;
return -1;
Py_DECREF(x);

x = new_delta(0, 0, 1, 0);
if (x == NULL || PyDict_SetItemString(d, "resolution", x) < 0)
return NULL;
return -1;
Py_DECREF(x);

/* timezone values */
d = PyDateTime_TimeZoneType.tp_dict;

delta = new_delta(0, 0, 0, 0);
if (delta == NULL)
return NULL;
return -1;
x = create_timezone(delta, NULL);
Py_DECREF(delta);
if (x == NULL || PyDict_SetItemString(d, "utc", x) < 0)
return NULL;
return -1;
PyDateTime_TimeZone_UTC = x;
CAPI.TimeZone_UTC = PyDateTime_TimeZone_UTC;

Expand All @@ -6491,36 +6472,36 @@ PyInit__datetime(void)
* values. This may change in the future.*/
delta = new_delta(-1, 60, 0, 1); /* -23:59 */
if (delta == NULL)
return NULL;
return -1;
x = create_timezone(delta, NULL);
Py_DECREF(delta);
if (x == NULL || PyDict_SetItemString(d, "min", x) < 0)
return NULL;
return -1;
Py_DECREF(x);

delta = new_delta(0, (23 * 60 + 59) * 60, 0, 0); /* +23:59 */
if (delta == NULL)
return NULL;
return -1;
x = create_timezone(delta, NULL);
Py_DECREF(delta);
if (x == NULL || PyDict_SetItemString(d, "max", x) < 0)
return NULL;
return -1;
Py_DECREF(x);

/* Epoch */
PyDateTime_Epoch = new_datetime(1970, 1, 1, 0, 0, 0, 0,
PyDateTime_TimeZone_UTC, 0);
if (PyDateTime_Epoch == NULL)
return NULL;
return -1;

/* module initialization */
PyModule_AddIntMacro(m, MINYEAR);
PyModule_AddIntMacro(m, MAXYEAR);
PyModule_AddIntMacro(module, MINYEAR);
PyModule_AddIntMacro(module, MAXYEAR);

x = PyCapsule_New(&CAPI, PyDateTime_CAPSULE_NAME, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that it's correct to override the capsule if PyInit__datetime() is called twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CAPI is not fully static, there is a heap-allocated type that is created on exec. Not sure we can change this right now.

9a42428#diff-0090febf24974047330d8a02d4dbdc79L6486-L6487

Copy link
< 9E81 /details>
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we change it?

If it's important that we have two separate PyCapsule_New objects, doesn't that mean that if you have a C extension running in two subinterpreters that references this capsule, they'll get a different value for PyDateTimeAPI depending on when they look up the reference rather than depending on which subinterpreter they are running from?

Not sure I see a way around this without a breaking change to the API.

if (x == NULL)
return NULL;
PyModule_AddObject(m, "datetime_CAPI", x);
return -1;
PyModule_AddObject(module, "datetime_CAPI", x);

/* A 4-year cycle has an extra leap day over what we'd get from
* pasting together 4 single years.
Expand All @@ -6540,23 +6521,65 @@ PyInit__datetime(void)
Py_BUILD_ASSERT(DI100Y == 25 * DI4Y - 1);
assert(DI100Y == days_before_year(100+1));

us_per_ms = PyLong_FromLong(1000);
us_per_second = PyLong_FromLong(1000000);
us_per_minute = PyLong_FromLong(60000000);
seconds_per_day = PyLong_FromLong(24 * 3600);
if (!us_per_ms) {
us_per_ms = PyLong_FromLong(1000);
}
if (!us_per_second) {
us_per_second = PyLong_FromLong(1000000);
}
if (!us_per_minute) {
us_per_minute = PyLong_FromLong(60000000);
}
if (!seconds_per_day) {
seconds_per_day = PyLong_FromLong(24 * 3600);
}
if (us_per_ms == NULL || us_per_second == NULL ||
us_per_minute == NULL || seconds_per_day == NULL)
return NULL;
return -1;

/* The rest are too big for 32-bit ints, but even
* us_per_week fits in 40 bits, so doubles should be exact.
*/
us_per_hour = PyLong_FromDouble(3600000000.0);
us_per_day = PyLong_FromDouble(86400000000.0);
us_per_week = PyLong_FromDouble(604800000000.0);
if (!us_per_hour) {
us_per_hour = PyLong_FromDouble(3600000000.0);
}
if (!us_per_day) {
us_per_day = PyLong_FromDouble(86400000000.0);
}
if (!us_per_week) {
us_per_week = PyLong_FromDouble(604800000000.0);
}
if (us_per_hour == NULL || us_per_day == NULL || us_per_week == NULL)
return NULL;
return m;
return -1;

// Workaround to allow static PyTypeObjects to have they dict redefined.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Workaround to allow static PyTypeObjects to have they dict redefined.
// Workaround to allow static PyTypeObjects to have their dict redefined.

// XXX: to be removed once _datetime moves to heap allocated PyTypeObjects.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there negative consequences to this if we don't move to heap allocated PyTypeObjects?

Are there any downsides to using heap allocated types?

I don't want us to be in a situation where we have some workaround with negative consequences that we end up not being able to remove because we can't convert to heap allocated types.

PyType_ClearCache();

return 0;
}

static PyModuleDef_Slot datetimemodule_slots[] = {
{Py_mod_exec, datetimemodule_exec},
{0, NULL}
};

static struct PyModuleDef datetimemodule = {
PyModuleDef_HEAD_INIT,
"_datetime",
"Fast implementation of the datetime type.",
0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that this implies that the module is safe for subinterpreters, but this module still has several process-wide globals. Is it actually appropriate to set m_size to 0 in this case?

(Yes, I realize that it cannot be -1 if you want multi-phase initialization).

datetimemodule_methods,
datetimemodule_slots,
NULL,
NULL,
NULL
};

PyMODINIT_FUNC
PyInit__datetime(void)
{
return PyModuleDef_Init(&datetimemodule);
}

/* ---------------------------------------------------------------------------
Expand Down
0