Ability to alert when we get a sudden increase in bad passwords for privileged accounts
Open, Medium, Public

Assigned To

None

Authored By

	• csteipp
	Jan 11 2016, 5:20 PM

Description

Login failures are stored in various places. We should be able to alert when the number of failures suddenly increases, as we would typically see for password brute forcing.

Failed password attempts for privileged accounts are logged in Elasticsearch. Yelp uses Elasticsearch and ElastAlert (https://github.com/yelp/elastalert) to detect brute forcing; we could do something similar.

In response to the alert, we can start with alerting the security team / ops. If the alerts look reliable, we can add alerting for the account being brute forced. If that appears to reliably detect brute-forcing, we could in the future automatically block the IP from logging in for a short period of time.

Details

	Subject	Repo	Branch	Lines +/-
	Move auth logging to different channels for easier counting	operations/mediawiki-config	master	+33 -38

Related Objects

Mentioned In: T213933: PoC alert/notification functionality with Elastic Stack
T150300: icinga notification if elevated writing to badpass.log
T140942: Tracking: Monitoring and alerts for "business" metrics
Mentioned Here: T213933: PoC alert/notification functionality with Elastic Stack
T150300: icinga notification if elevated writing to badpass.log
T193769: Thousands of failed login attempts (wrong password)
T140942: Tracking: Monitoring and alerts for "business" metrics

Event Timeline

• csteipp created this task. Jan 11 2016, 5:20 PM

• csteipp raised the priority of this task from to Needs Triage.

• csteipp updated the task description.

• csteipp added a project: Security-Team.

• csteipp subscribed.

Restricted Application added subscribers: StudiesWorld, Aklapper. Jan 11 2016, 5:20 PM

Krenair subscribed. Jan 11 2016, 5:21 PM

ori subscribed. Apr 24 2016, 6:00 PM

• Tgr mentioned this in T140942: Tracking: Monitoring and alerts for "business" metrics. Sep 16 2016, 9:40 PM

T193769 may be a good example

Bawolff edited projects, added acl*security, Security-team-backlog; removed Security-Team. Sep 4 2018, 3:46 PM

• chasemp mentioned this in T150300: icinga notification if elevated writing to badpass.log. Oct 2 2018, 12:50 PM

Change 464077 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[operations/mediawiki-config@master] Move auth logging to different channels for easier counting

https://gerrit.wikimedia.org/r/464077

gerritbot added a project: Patch-For-Review. Oct 2 2018, 11:06 PM

Change 464077 merged by jenkins-bot:
[operations/mediawiki-config@master] Move auth logging to different channels for easier counting

https://gerrit.wikimedia.org/r/464077

Mentioned in SAL (#wikimedia-operations) [2018-11-01T00:05:47Z] <tgr@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:464077|Move auth logging to different channels for easier counting (T150300, T123243)]] (duration: 00m 53s)

Mentioned in SAL (#wikimedia-operations) [2018-11-01T00:07:13Z] <tgr@deploy1001> Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:464077|Move auth logging to different channels for easier counting (T150300, T123243)]] (duration: 00m 53s)

Framawiki subscribed. Dec 9 2018, 9:42 PM

• chasemp renamed this task from Ability to alert when we get a sudden increase in bad passwords for privileged accounts, to possibly detect password brute-forcing to Ability to alert when we get a sudden increase in bad passwords for privileged accounts. Dec 20 2018, 8:46 PM

• chasemp edited projects, added User-chasemp; removed Patch-For-Review.

fgiunchedi subscribed. Jan 11 2019, 11:26 AM

• chasemp mentioned this in T213933: PoC alert/notification functionality with Elastic Stack. Jan 16 2019, 3:25 PM

• chasemp triaged this task as Medium priority. Dec 9 2019, 5:22 PM

• chasemp edited projects, added Security-Team; removed Security-team-backlog. Dec 23 2019, 5:12 PM

• chasemp moved this task from Incoming to Back Orders on the Security-Team board.

• chasemp added a project: Security. Feb 10 2020, 10:59 PM

• chasemp removed a project: acl*security. Feb 20 2020, 8:07 PM

Reedy removed a project: Security-Team. Nov 3 2021, 7:22 PM

Aklapper added a project: observability. Nov 27 2021, 4:04 PM

We have prometheus-es-exporter available that will turn the result of ES queries into Prometheus metrics. Alertmanager can easily turn these metrics into alerts.

Is there a Kibana dashboard or saved search we could reference?

Not that I know. The relevant searches are type:mediawiki AND channel:badpass (all failed login attempts) and type:mediawiki AND channel:badpass-priv (failed login attempts into admin and similar accounts).
(Except the first search will match both channel names, not sure what's the right syntax there.)

(There's also goodpass / goodpass-priv if you want to make it a ratio instead of an absolute number.)
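To illustrate the ratio idea, here is a minimal sketch that turns a pair of counts into a failure ratio. The function name and the choice to return None for an empty window are assumptions for illustration; the counts are presumed to come from the badpass-priv and goodpass-priv channels over the same time window.

```python
def failure_ratio(bad_count, good_count):
    """Fraction of login attempts that failed.

    bad_count / good_count: event counts from the badpass(-priv) and
    goodpass(-priv) channels over the same window (assumed inputs).
    Returns None when there were no attempts at all, so that an idle
    window is not reported as either 0% or 100% failures.
    """
    total = bad_count + good_count
    if total == 0:
        return None
    return bad_count / total


# Example: 30 failed and 10 successful logins in the window.
print(failure_ratio(30, 10))  # 0.75
```

A ratio stays comparable between busy and quiet periods, but it can swing widely when the total number of attempts is small, so it may need a minimum-volume condition alongside it.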

There's also a login throttle hit dashboard, though it is probably less useful, as a careful attacker could spread out their attempts and avoid being throttled.

Thanks @Tgr

I'm assuming -priv suffix means privileged accounts.

Looking at the data, it seems msgname.keyword:wrongpassword AND channel.keyword:badpass-priv gets us a histogram of bad password attempts for privileged accounts, minute by minute. The data is pretty inconsistent though. It would be difficult for me to find an alarm threshold that is useful and actionable.
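For reference, the same search could be expressed as an Elasticsearch query body along these lines. This is only a sketch: the msgname.keyword and channel.keyword fields and their values are taken from the search above, while the @timestamp field name, the one-hour window, the aggregation name and the helper function are assumptions. The fixed_interval parameter also assumes a reasonably recent Elasticsearch version.

```python
import json


def build_badpass_priv_query(window="now-1h", interval="1m"):
    """Return a search body counting wrong-password events per interval.

    The @timestamp field name and the default window are assumptions,
    not something confirmed for this logging setup.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    # Exact matches on the keyword sub-fields, so that
                    # "badpass-priv" is not confused with "badpass".
                    {"term": {"msgname.keyword": "wrongpassword"}},
                    {"term": {"channel.keyword": "badpass-priv"}},
                    {"range": {"@timestamp": {"gte": window}}},
                ]
            }
        },
        "aggs": {
            "per_interval": {
                "date_histogram": {
                    "field": "@timestamp",
                    "fixed_interval": interval,
                }
            }
        },
    }


print(json.dumps(build_badpass_priv_query(), indent=2))
```

Using term filters on the .keyword fields should also sidestep the earlier concern about a search for badpass matching both channel names, assuming those sub-fields hold the unanalyzed values.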

Could someone who wants these alerts have a look and identify a sensible alert threshold? I'm not sure we can break it down much further given PII and metrics cardinality concerns.

@sbassett any thoughts? This is a very old task, not sure how relevant it is to the Security team's current thinking.

The dashboard does show some apparent attacks (e.g.) but as long as it's just some troll manually fooling around, that probably shouldn't be alerted on. So maybe growth of bad password volume by an order of magnitude or two?
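A relative threshold of that kind could be sketched as follows. This is an illustration only: the function name, the use of the median as the baseline, the default factor of 10 and the minimum baseline are all assumptions, and real values would need tuning against the actual data.

```python
from statistics import median


def is_spike(counts, factor=10.0, min_baseline=1.0):
    """Return True when the latest count is at least `factor` times the baseline.

    counts: per-interval failure counts, oldest first. The last entry is
    the interval being evaluated; the preceding entries form the baseline.
    min_baseline keeps a run of near-zero intervals from turning a handful
    of manual attempts into an alert.
    """
    if len(counts) < 2:
        return False  # not enough history to form a baseline
    *history, latest = counts
    baseline = max(median(history), min_baseline)
    return latest >= factor * baseline


print(is_spike([2, 3, 1, 2, 40]))  # True: 40 is at least 10x the median of 2
print(is_spike([2, 3, 1, 2, 8]))   # False: elevated, but under 10x
```

The median is used here rather than the mean so that a single earlier burst in the history does not inflate the baseline and mask a later one.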

@colewhite @Tgr -

Thanks for the ping on this. Given the age and dormancy of this task, I'll re-triage it for our team's clinic next Monday. I think @Dsharpe might have more insight into what functionality, if any, is currently desired for this variety of monitoring, as he is closer to the Security-Team's incident response and mitigation policies. I know there was some related work in T213933, which was eventually declined in favor of a potential alternative approach.

Anyhow, my more personal thoughts are that these types of things can be very difficult to monitor in any meaningful sense, especially given an environment like Wikimedia production, where there are issues regarding both large volumes of data and large volumes of noise. Determining various thresholds and rates for what might constitute an actual event of concern can be more art than science. That being said, there may very well be some value in monitoring the -priv channels as mentioned above, so the Security-Team can re-evaluate this and hopefully provide some guidance soon.

sbassett moved this task from Back Orders to Incoming on the Security-Team board. Dec 3 2021, 6:35 PM

We want to investigate and deploy sound, actionable detection and alerting around identity in general, but I am not sure alerting on spikes on 100% failed login attempts will get us very far down that road.

I know this is asking a lot, but if we had some way to add on some detection around the most privileged accounts to detect higher risk behavior or maybe major deviations from normal activity from a particular account (e.g. a login from a place the account owner would never log in from), that would be useful.

In T123243#7556836, @Dsharpe wrote:

I know this is asking a lot, but if we had some way to add on some detection around the most privileged accounts to detect higher risk behavior or maybe major deviations from normal activity from a particular account (e.g. a login from a place the account owner would never log in from), that would be useful.

IIUC, it sounds like you're asking for Anomaly Detection and/or IDS-like capability. Observability can feed data into a system that does it, but we don't have this capability ourselves at this time.

In T123243#7554665, @Dsharpe wrote:

We want to investigate and deploy sound, actionable detection and alerting around identity in general, but I am not sure alerting on spikes on 100% failed login attempts will get us very far down that road.

By "100% failed" do you mean throttled? An alarm on badpass-priv volume would alert on spikes of any kind of failed password-based login attempts.
I think that could be useful for detecting mass dictionary or "pwned passwords" attempts. Most of those would be detected anyway, by LoginNotify and the notified users reaching out, but an attacker could be sneaky about it, and make a large number of attempts, each to a different user.
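As a rough sketch of what detecting that sneaky pattern might look like: count how many distinct accounts saw only a few failures each within a window. Everything here is hypothetical, including the function name and thresholds, and it assumes the targeted username is available per failed attempt, which may conflict with the PII concerns raised earlier.

```python
from collections import Counter


def looks_like_spread_attack(usernames, min_accounts=20, max_per_account=3):
    """Flag a window with many accounts that each saw only a few failures.

    usernames: one entry per failed login attempt in the window, naming
    the targeted account (a hypothetical input).
    An attacker trying one or two passwords against each of many accounts
    produces many low-count accounts, whereas a single forgetful user
    produces one high-count account.
    """
    per_account = Counter(usernames)
    low_volume_accounts = sum(
        1 for attempts in per_account.values() if attempts <= max_per_account
    )
    return low_volume_accounts >= min_accounts


# 25 different accounts, one failed attempt each.
print(looks_like_spread_attack([f"user{i}" for i in range(25)]))  # True
# One account failing 100 times.
print(looks_like_spread_attack(["admin"] * 100))  # False
```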

In T123243#7556836, @Dsharpe wrote:

I know this is asking a lot, but if we had some way to add on some detection around the most privileged accounts to detect higher risk behavior

badpass-priv is already reasonably privileged (admins and higher). Having something for even higher privileges (ifadmin/checkuser/oversighter/steward I assume?) would not be too hard either. If you mean privileged actions as opposed to login attempts, I don't think that can be detected by volume: an attack will not result in an unusually high number of, say, permission changes or JS page edits; the attacker only needs to do one or two.

or maybe major deviations from normal activity from a particular account (e.g. a login from a place the account owner would never log in from)

MediaWiki does not track login locations (the SecureSessions extension does that, but it's long unmaintained + was never deployed on Wikimedia), so it would have to be updated (which seems nontrivial) or that would have to happen in some external system that aggregates location information from logs. (IP logging to logstash should probably be fixed before that; currently most logged events send the reverse proxy's IP, not the actual client IP.)

• Dsharpe moved this task from Incoming to Watching on the Security-Team board. Dec 13 2021, 2:52 AM

herron moved this task from Inbox to Radar on the observability board. Jan 12 2022, 5:42 PM
