Fix systemd and possibly logrotate around the wmf-pt-kill service for multi-instance wikireplicas
Closed, Resolved (Public)

Assigned To

Authored By

	• Bstorm
	Feb 7 2021, 12:19 AM

Description

Pages went off on Sun Feb 7 00:05:02 UTC 2021 because the wmf-pt-kill package still installs a logrotate job (in addition to the systemd service that the package installs). When the logrotate schedule triggered that job, its postrotate script failed against wmf-pt-kill.service, which left logrotate.service in a failed state in systemd.

Jan 17 00:00:02 clouddb1016 logrotate[6515]: Job for wmf-pt-kill.service failed because the control process exited with error code.
Jan 17 00:00:02 clouddb1016 logrotate[6515]: See "systemctl status wmf-pt-kill.service" and "journalctl -xe" for details.
Jan 17 00:00:02 clouddb1016 logrotate[6515]: error: error running shared postrotate script for '/var/log/wmf-pt-kill/wmf-pt-kill.log '
Jan 17 00:00:02 clouddb1016 systemd[1]: logrotate.service: Main process exited, code=exited, status=1/FAILURE
Jan 17 00:00:02 clouddb1016 systemd[1]: logrotate.service: Failed with result 'exit-code'.
Jan 17 00:00:02 clouddb1016 systemd[1]: Failed to start Rotate log files.

Get Puppet to clean up the package's logrotate job for wmf-pt-kill, add logrotate scripts for the multi-socket services, and perhaps mask the service that is installed by the package so that it stops "failing" when something tries to run it.
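To illustrate what per-instance logrotate configuration could look like, here is a minimal sketch that renders one stanza per section. It is not the content of the Puppet patch: the function name `render_logrotate_stanza`, the `daily` and `rotate 7` settings, and the `s2` section are assumptions for illustration only. The log path pattern follows the `wmf-pt-kill-s7.log` file seen on clouddb1014, and `copytruncate` is used so that no restart of the per-instance service is attempted from a postrotate script.

```python
def render_logrotate_stanza(section: str,
                            log_dir: str = "/var/log/wmf-pt-kill") -> str:
    """Render a logrotate stanza for one wmf-pt-kill instance.

    The retention settings below are illustrative assumptions,
    not values taken from the actual Puppet change.
    """
    log_path = f"{log_dir}/wmf-pt-kill-{section}.log"
    lines = [
        f"{log_path} {{",
        "    daily",
        "    rotate 7",        # assumed retention
        "    missingok",
        "    notifempty",
        "    compress",
        "    copytruncate",    # rotate in place, no service restart
        "}",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # "s7" appears in this task; "s2" is a hypothetical second instance.
    for section in ("s2", "s7"):
        print(render_logrotate_stanza(section))
```

Masking the packaged unit, if that route is taken, would be a separate step (for example `systemctl mask wmf-pt-kill.service`), ideally also managed through Puppet.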

Details

	Subject	Repo	Branch	Lines +/-
	wikireplicas: adjust logrotate for multiinstance on wmf-pt-kill	operations/puppet	production	+17 -2


Related Objects

Status	Assigned	Task
Resolved	Marostegui	T233766 labsdb1011 mariadb crashed
		Restricted Task
		Restricted Task
Open	None	T204950 Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users
Open	None	T215858 Plan a replacement for wiki replicas that is better suited to typical OLAP use cases than the MediaWiki OLTP schema
Resolved	fnegri	T280152 Mitigate breaking changes from the new Wiki Replicas architecture
		Unknown Object (Task)
Resolved	RobH	T260441 (Need By: ASAP) rack/setup/install clouddb10[13-20]
Resolved	• Bstorm	T260389 Redesign and rebuild the wikireplicas service using a multi-instance architecture
Resolved	• Bstorm	T260511 Parametrize wmf-pt-kill so it can connect to different sockets
Resolved	• Bstorm	T274044 Fix systemd and possibly logrotate around the wmf-pt-kill service for multi-instance wikireplicas

Event Timeline

• Bstorm triaged this task as Medium priority. Feb 7 2021, 12:19 AM

• Bstorm created this task.

Restricted Application added a subscriber: Aklapper. Feb 7 2021, 12:19 AM

• Bstorm moved this task from Backlog to Wiki replicas on the Data-Services board. Feb 7 2021, 12:19 AM

• Bstorm added a parent task: T260511: Parametrize wmf-pt-kill so it can connect to different sockets.

Change 662797 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: adjust logrotate for multiinstance on wmf-pt-kill

https://gerrit.wikimedia.org/r/662797

gerritbot added a project: Patch-For-Review. Feb 9 2021, 12:34 AM

Made some comments on the patchset regarding the current situation with two processes accessing the same file.

Marostegui added a project: Data-Persistence (work done). Feb 9 2021, 7:09 AM

Change 662797 merged by Bstorm:
[operations/puppet@production] wikireplicas: adjust logrotate for multiinstance on wmf-pt-kill

https://gerrit.wikimedia.org/r/662797

jcrespo awarded a token. Feb 10 2021, 6:05 PM

Maintenance_bot removed a project: Patch-For-Review. Feb 10 2021, 6:11 PM

I think this should be good now. We'll know if it continues to log things after logrotate runs. If it doesn't, then copytruncate wasn't sufficient and it does need a restart.
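As a rough illustration of why copytruncate can work without a restart, the sketch below simulates it in Python: the log is copied aside and then truncated in place while a writer keeps its file handle open. It assumes the writer opened the log in append mode, which has not been verified for wmf-pt-kill; the helper names `copytruncate` and `demo` are made up for this example. Note that real copytruncate can lose lines written between the copy and the truncate.

```python
import os
import shutil
import tempfile


def copytruncate(path: str, rotated_path: str) -> None:
    """Copy the log aside, then truncate the original in place."""
    shutil.copyfile(path, rotated_path)
    os.truncate(path, 0)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "wmf-pt-kill-s7.log")
        rotated = log + ".1"

        # Long-lived writer, standing in for the daemon.
        # Assumption: it opens the log in append mode.
        writer = open(log, "a")
        writer.write("before rotation\n")
        writer.flush()

        copytruncate(log, rotated)

        # Same handle, no reopen: appends land at the new end of file.
        writer.write("after rotation\n")
        writer.flush()
        writer.close()

        with open(rotated) as f:
            old = f.read()
        with open(log) as f:
            new = f.read()
        return old, new


if __name__ == "__main__":
    old, new = demo()
    print("rotated file:", repr(old))
    print("live file:", repr(new))
```

If the daemon instead wrote at a remembered file offset rather than appending, truncation would not be enough and a restart or reopen would be needed, which is the case the comment above is checking for.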

For the record, I have generated a query that got logged. Once the log is rotated, we can generate another one and check whether it gets logged to the new file:

root@clouddb1014:/var/log# cat wmf-pt-kill/wmf-pt-kill-s7.log
# 2021-02-15T08:22:29 KILL 4939549 (Query 309 sec) select sleep(600)

This is done, I believe.
