Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
10.5, 10.6, 10.11, 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 11.0(EOL)
Description
mleich produced an rr replay trace of an execution in which InnoDB crashes shortly after being started up on a freshly initialized database. No crash recovery is involved; the crash occurs soon after an undo tablespace was truncated:
10.5 5300c0fb7632e7e49a22997297bf731e691d3c24
Thread 5 hit Breakpoint 7, sql_print_information (format=format@entry=0x56404df92008 "InnoDB: %s") at /data/Server/bb-10.5-MDEV-30657/sql/log.cc:9283
9283 in /data/Server/bb-10.5-MDEV-30657/sql/log.cc
1: x/i $pc
=> 0x56404d8a3e20 <sql_print_information(char const*, ...)>: endbr64
(rr)
Continuing.
2023-02-15 19:11:55 0 [Note] InnoDB: Truncated .//undo002

Thread 14 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 3065874.3070558]
__memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:497
497 ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
1: x/i $pc
=> 0x7f90c8b77955 <__memmove_avx_unaligned_erms+741>: vmovdqu (%rsi),%ymm0
(rr) bt
#0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:497
#1 0x000056404db25250 in memcpy (__len=<optimized out>, __src=0x7f90bb9123f7, __dest=0x7f8f8bf92078) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:34
#2 mem_heap_dup (len=<optimized out>, data=0x7f90bb9123f7, heap=0x7f90a40643a0) at /data/Server/bb-10.5-MDEV-30657/storage/innobase/include/mem0mem.h:240
#3 trx_undo_rec_copy (heap=0x7f90a40643a0, undo_rec=0x7f90bb9123f7 "") at /data/Server/bb-10.5-MDEV-30657/storage/innobase/include/trx0rec.inl:70
#4 trx_undo_get_undo_rec_low (roll_ptr=<optimized out>, heap=0x7f90a40643a0) at /data/Server/bb-10.5-MDEV-30657/storage/innobase/trx/trx0rec.cc:1998
#5 0x000056404db27513 in trx_undo_get_undo_rec (undo_rec=<synthetic pointer>, name=..., trx_id=<optimized out>, heap=<optimized out>, roll_ptr=19984723349668855)
    at /data/Server/bb-10.5-MDEV-30657/storage/innobase/trx/trx0rec.cc:2030
#6 trx_undo_prev_version_build (index_rec=index_rec@entry=0x7f90bbd98088 "\200", index_mtr=index_mtr@entry=0x7f909c074370, rec=rec@entry=0x7f90bbd98088 "\200", index=index@entry=0x7f90a805a198,
    offsets=0x7f909c073d30, heap=heap@entry=0x7f90a40643a0, old_vers=0x7f909c073b98, v_heap=0x0, vrow=0x0, v_status=0) at /data/Server/bb-10.5-MDEV-30657/storage/innobase/trx/trx0rec.cc:2116
#7 0x000056404db0ce3c in row_vers_build_for_consistent_read (rec=0x7f90bbd98088 "\200", mtr=0x7f909c074370, index=0x7f90a805a198, offsets=0x7f909c073cf8, view=0x7f90c803e470, offset_heap=0x7f909c073cf0,
    in_heap=0x7f90b407df70, old_vers=0x7f909c073f90, vrow=0x0) at /data/Server/bb-10.5-MDEV-30657/storage/innobase/row/row0vers.cc:1161
#8 0x000056404db016dc in row_search_mvcc (buf=0x7f9094059680 "\377\377\377", mode=PAGE_CUR_G, mode@entry=PAGE_CUR_UNSUPP, prebuilt=0x7f9094071b28, match_mode=<optimized out>, direction=1)
    at /data/Server/bb-10.5-MDEV-30657/storage/innobase/row/row0sel.cc:5220
After extensive debugging, I concluded that undo tablespace truncation waits only until no active transactions (which could potentially be rolled back) remain in any rollback segment stored in the undo tablespace that is being considered for truncation. It does not wait for the committed history in those rollback segments to be purged. The check in trx_purge_truncate_history() looks like this:
for (ulint i= 0; i < TRX_SYS_N_RSEGS; ++i)
{
  trx_rseg_t *rseg= trx_sys.rseg_array[i];
  if (!rseg || rseg->space != &space)
    continue;
  mutex_enter(&rseg->mutex);
  ut_ad(rseg->skip_allocation);
  ut_ad(rseg->is_persistent());
  if (rseg->trx_ref_count)
  {
not_free:
    mutex_exit(&rseg->mutex);
    return;
  }
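At this point, rseg->needs_purge, which was added in MDEV-13536, can still be set. The transaction reference count is decremented in trx_t::commit_in_memory(), so a zero count only shows that no transaction is still attached to the rollback segment; it does not show that the history has been purged.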
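The gap between the two conditions can be illustrated with a small stand-alone model. This is only a sketch: RsegModel, commit(), may_truncate_old() and may_truncate_strict() are hypothetical names, not InnoDB code, and the model assumes that a commit releases the reference while leaving needs_purge set until purge has processed the history.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for trx_rseg_t. Only the two fields
// discussed in the text are modeled.
struct RsegModel
{
  uint32_t trx_ref_count; // transactions currently attached to the rseg
  bool needs_purge;       // committed history that purge has not processed
};

// The check as quoted in the excerpt: only attached transactions
// prevent truncation.
constexpr bool may_truncate_old(const RsegModel &rseg)
{
  return rseg.trx_ref_count == 0;
}

// A stricter predicate: unpurged history also prevents truncation.
constexpr bool may_truncate_strict(const RsegModel &rseg)
{
  return !rseg.needs_purge && rseg.trx_ref_count == 0;
}

// Model of a commit (assumption): the reference is released, but the
// history written by the transaction remains until it is purged.
inline void commit(RsegModel &rseg)
{
  assert(rseg.trx_ref_count > 0);
  --rseg.trx_ref_count;
  rseg.needs_purge= true;
}
```

In this model, after the only attached transaction commits, may_truncate_old() returns true while may_truncate_strict() still returns false. That is the window in which a reader could follow a roll pointer into a truncated undo tablespace.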
At the time of the crash, the clustered index record whose history we are attempting to fetch has DB_TRX_ID=0x2aea. That transaction ID is explicitly listed in purge_sys.view, meaning that its history must not be purged yet:
(rr) f 7
#7 0x000056404db0ce3c in row_vers_build_for_consistent_read (rec=0x7f90bbd98088 "\200", mtr=0x7f909c074370, index=0x7f90a805a198, offsets=0x7f909c073cf8, view=0x7f90c803e470, offset_heap=0x7f909c073cf0,
    in_heap=0x7f90b407df70, old_vers=0x7f909c073f90, vrow=0x0) at /data/Server/bb-10.5-MDEV-30657/storage/innobase/row/row0vers.cc:1161
1161 /data/Server/bb-10.5-MDEV-30657/storage/innobase/row/row0vers.cc: No such file or directory.
(rr) p/x *rec@17
$38 = {0x80, 0x0, 0x5, 0x37, 0x0, 0x0, 0x0, 0x0, 0x2a, 0xea, 0x47, 0x0, 0x0, 0x0, 0x31, 0x3, 0xf7}
(rr) p/x purge_sys.view
$39 = {m_low_limit_id = 0x2b26, m_up_limit_id = 0x2ad5, m_ids = std::vector of length 8, capacity 41 = {0x2ad5, 0x2aea, 0x2b0a, 0x2b0f, 0x2b1b, 0x2b22, 0x2b24, 0x2b25}, m_low_limit_no = 0x2b26}
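The bytes printed above can be decoded by hand. The following sketch does so under stated assumptions: the record starts with a 4-byte key column (the table definition is not shown), followed by the 6-byte DB_TRX_ID and the 7-byte DB_ROLL_PTR, both big-endian. The roll pointer is split as insert flag (bit 55), rollback segment ID (bits 48 to 54), page number (bits 16 to 47) and page offset (bits 0 to 15). All names here (read_be, RollPtr, decode_roll_ptr, ReadViewModel, changes_visible) are illustrative, and the visibility test is a simplified model of a read view, not the server's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read len bytes (at most 8) as a big-endian unsigned integer.
inline uint64_t read_be(const unsigned char *p, size_t len)
{
  assert(len <= 8);
  uint64_t v= 0;
  for (size_t i= 0; i < len; i++)
    v= (v << 8) | p[i];
  return v;
}

struct RollPtr
{
  bool is_insert;   // bit 55
  unsigned rseg_id; // bits 48..54
  uint32_t page_no; // bits 16..47
  uint16_t offset;  // bits 0..15
};

inline RollPtr decode_roll_ptr(uint64_t roll_ptr)
{
  RollPtr r;
  r.is_insert= ((roll_ptr >> 55) & 1) != 0;
  r.rseg_id= unsigned((roll_ptr >> 48) & 0x7f);
  r.page_no= uint32_t(roll_ptr >> 16); // keeps the low 32 bits
  r.offset= uint16_t(roll_ptr);        // keeps the low 16 bits
  return r;
}

// Simplified model of a read view; ids must be sorted.
struct ReadViewModel
{
  uint64_t low_limit_id;
  uint64_t up_limit_id;
  std::vector<uint64_t> ids;
};

// True if changes made by transaction id are visible in the view.
// For the purge view, "not visible" means the history must be kept.
inline bool changes_visible(const ReadViewModel &view, uint64_t id)
{
  if (id < view.up_limit_id)
    return true;
  if (id >= view.low_limit_id)
    return false;
  return !std::binary_search(view.ids.begin(), view.ids.end(), id);
}

// Values copied from the rr session above.
static const unsigned char rec[17]= {0x80, 0x00, 0x05, 0x37,
  0x00, 0x00, 0x00, 0x00, 0x2a, 0xea,
  0x47, 0x00, 0x00, 0x00, 0x31, 0x03, 0xf7};
static const ReadViewModel purge_view= {0x2b26, 0x2ad5,
  {0x2ad5, 0x2aea, 0x2b0a, 0x2b0f, 0x2b1b, 0x2b22, 0x2b24, 0x2b25}};
```

With these inputs, read_be(rec + 4, 6) yields 0x2aea, and read_be(rec + 10, 7) yields 19984723349668855, which is the roll_ptr argument shown in frame #5 of the backtrace. It decodes to rollback segment 71, page 49, offset 0x3f7, in agreement with the low bits of undo_rec=0x7f90bb9123f7. changes_visible(purge_view, 0x2aea) is false, so the undo record behind that roll pointer should still have existed.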
I think that the following could fix the bug:
diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc
index 4d84f295c0b..39095013457 100644
--- a/storage/innobase/trx/trx0purge.cc
+++ b/storage/innobase/trx/trx0purge.cc
@@ -627,7 +627,7 @@ static void trx_purge_truncate_history()
       mutex_enter(&rseg->mutex);
       ut_ad(rseg->skip_allocation);
       ut_ad(rseg->is_persistent());
-      if (rseg->trx_ref_count)
+      if (rseg->needs_purge || rseg->trx_ref_count)
       {
 not_free:
         mutex_exit(&rseg->mutex);
@@ -753,6 +753,8 @@ static void trx_purge_truncate_history()
 
       ut_ad(rseg->id == i);
       ut_ad(rseg->is_persistent());
+      ut_ad(!rseg->trx_ref_count);
+      ut_ad(!rseg->needs_purge);
       ut_d(const auto old_page= rseg->page_no);
 
       buf_block_t *rblock= trx_rseg_header_create(&space, i,
Issue Links
- blocks
  - MDEV-29593 Purge misses a chance to free not-yet-reused undo pages (Closed)
- causes
  - MDEV-30863 Server freeze, all threads in trx_assign_rseg_low (Closed)
  - MDEV-31234 InnoDB does not free UNDO after the fix of MDEV-30671, thus shared tablespace (ibdata1) may grow indefinitely for no good reason (Closed)
- relates to
  - MDEV-30753 Possible corruption due to trx_purge_free_segment() (Closed)
  - MDEV-31355 innodb_undo_log_truncate=ON fails to wait for purge of enough transaction history (Closed)
  - MDEV-11802 innodb.innodb_bug14676111 fails in buildbot due to InnoDB purge failing to start when there is work to do (Closed)
  - MDEV-15608 Crash during transaction rollback when using optimistic parallel replication, few threads, non-durable configuration. (Closed)
  - MDEV-22718 InnoDB: purge_sys.low_limit_no() is not protected (Stalled)