MinT for Readers: Implement instrumentation for key events
Open, In Progress, Medium, Public

Description

As part of the work on MinT for Wikipedia Readers MVP (T359072), a schema for instrumentation has been defined using the new data platform. This ticket proposes implementing the instrumentation for a set of key events. You can check the documentation for more details on how to use the new platform in the implementation.

From the designed schema (T341185), the key events are the following:

  • session initiation
  • user searches for a topic
  • user selects an article
  • user clicks to view the automatic translation
  • user clicks to view human-generated content
  • user closes the automatic translation view

Event Timeline

Hey @KCVelaga! I have some questions regarding the instrumentation of MinT for Wikipedia Readers MVP:

  1. could you please provide me the stream name for these events?
  2. given that the action_context should be a string, I suppose that the auto_translation_card case should be a string in this format: sourceLanguage;targetLanguage, where sourceLanguage and targetLanguage are the values of the source and target language codes, separated by a semi-colon. Is my assumption correct?
  3. The "user searches for a topic" event should be a "click" action according to the spec. My assumption would be that this event will be triggered when the user actually types a query inside the search input. Am I missing something here?
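To make question 2 concrete, here is a minimal sketch of how the proposed sourceLanguage;targetLanguage string could be built and read back. The helper names and the LanguagePair type are hypothetical and not part of ContentTranslation; only the semicolon-separated format comes from the question above.

```typescript
// Hypothetical helpers; the names are illustrative only.
interface LanguagePair {
  sourceLanguage: string;
  targetLanguage: string;
}

// Builds the "sourceLanguage;targetLanguage" string proposed for action_context.
function encodeLanguagePair(pair: LanguagePair): string {
  return `${pair.sourceLanguage};${pair.targetLanguage}`;
}

// Parses the string back; returns null unless it holds exactly two non-empty codes.
function decodeLanguagePair(actionContext: string): LanguagePair | null {
  const parts = actionContext.split(";");
  const [sourceLanguage = "", targetLanguage = ""] = parts;
  if (parts.length !== 2 || sourceLanguage === "" || targetLanguage === "") {
    return null;
  }
  return { sourceLanguage, targetLanguage };
}
```

Having to parse the string again at analysis time is the cost of this format, which is relevant to the structured alternative discussed further down.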

Change #1029238 had a related patch set uploaded (by Nik Gkountas; author: Nik Gkountas):

[mediawiki/extensions/ContentTranslation@master] MinT MVP: Add basic instrumentation

https://gerrit.wikimedia.org/r/1029238

ngkountas changed the task status from Open to In Progress. May 9 2024, 8:17 AM
ngkountas claimed this task.

Hi @ngkountas

  1. Stream name: mediawiki.mint_for_readers
  2. Regarding action_context for auto_translation_card: yes, you are right, it should be language codes separated by a semi-colon. In the spec document, @phuedx proposed using a page object (a new schema fragment?), which can also capture page_id if required, as something like, source_page: {lang: 'en', 'id': 1234}. Sam: can you explain more about that here?
  3. For the user searching for a topic: you are right, this should be triggered when the user types something in the search input. Thinking about this again, click doesn't make sense here; instead, we can restructure it as action: search

Sure.

The MP Web base schema has a top-level page object, which has a number of auto-fillable properties about the current page. My question is/was: Are we only talking about a target language or are we actually talking about a target page? If so, perhaps we could create a schema fragment with a single property, target_page and, perhaps, create an API for filling out that property cleanly? For example:

// Proposed API sketch (not an existing method); Page stands for the base schema's page object.
namespace mw {
  interface eventLog {
    createPage( title: mw.Title ): Partial<Page>;
    createPage( title: mw.Title, additionalProperties: Partial<Page> ): Partial<Page>;
  }
}

Thanks @phuedx !

Thinking beyond just this event, this is a use case that will pop up across various schemas related to the Language team. The ability to capture both the language code and the page id for both source and target will be beneficial in the longer term. The approach you suggested sounds good. It will also make analysis easier, compared to extracting values from a string in action_context.

Thanks for your input @phuedx! To clarify, for the auto_translation_card action_source it makes sense to use a source_page object for the source language and the source page id. As for the target page, I don't believe that logging the target page id adds much value, as any translation can be uniquely identified by the source language, the target language and the source page id (or source page title). The same is true for translations in the Content and Section Translation applications that are developed/maintained by the Language Team. It's also worth noting that for some events inside the Content/Section Translation app (e.g. the dashboard_open event) only the source-target language pair is needed, and no page id is used.

Finally, could you provide an example about how a page object can be used with mw.eventLog.submitInteraction or mw.eventLog.submitClick?

@KCVelaga_WMF thank you for your answers. About the "user searches for a topic" event: if we change the action to search, we will also need a schemaId to be used with the mw.eventLog.submitInteraction method. Do you have any idea what value should be used as schemaId?

@ngkountas I am not sure about schemaId, I will check and get back to you.
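For illustration, the two calls under discussion might look like the sketch below. It is not the merged implementation. The EventLogLike interface and the recorder are stand-ins so the snippet runs outside MediaWiki, and the argument order is an assumption to check against the Metrics Platform documentation. The schema ID is also an assumption: it is the value that appears in the QA comments later in this thread, not a confirmed answer to the question above.

```typescript
// Stand-in for the two mw.eventLog methods mentioned in this thread. In MediaWiki
// the real object comes from the EventLogging extension; this interface and the
// recorder below are assumptions made only so the sketch runs on its own.
interface InteractionData {
  action_source?: string;
  action_context?: string;
}

interface EventLogLike {
  submitInteraction(streamName: string, schemaId: string, action: string, data?: InteractionData): void;
  submitClick(streamName: string, data: InteractionData): void;
}

interface RecordedCall {
  method: "submitInteraction" | "submitClick";
  streamName: string;
  schemaId?: string;
  action: string;
  data: InteractionData;
}

// Records calls instead of sending events anywhere.
function createRecorder(): { log: EventLogLike; calls: RecordedCall[] } {
  const calls: RecordedCall[] = [];
  const log: EventLogLike = {
    submitInteraction(streamName, schemaId, action, data = {}) {
      calls.push({ method: "submitInteraction", streamName, schemaId, action, data });
    },
    submitClick(streamName, data) {
      calls.push({ method: "submitClick", streamName, action: "click", data });
    },
  };
  return { log, calls };
}

const STREAM_NAME = "mediawiki.mint_for_readers"; // stream name given earlier in the thread
const SCHEMA_ID = "/analytics/product_metrics/web/base/1.2.0"; // assumed value, see the note above

// "user searches for a topic", restructured as action: search.
function logSearch(log: EventLogLike): void {
  log.submitInteraction(STREAM_NAME, SCHEMA_ID, "search");
}

// "user clicks to view the automatic translation", with the language pair as a string.
function logAutoTranslationCardClick(log: EventLogLike, sourceLanguage: string, targetLanguage: string): void {
  log.submitClick(STREAM_NAME, {
    action_source: "auto_translation_card",
    action_context: `${sourceLanguage};${targetLanguage}`,
  });
}
```

Passing the logger in as a parameter keeps the instrumentation testable without a browser, which is the only reason for the indirection here.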

Change #1029238 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] MinT MVP: Add basic instrumentation

https://gerrit.wikimedia.org/r/1029238

Update: @ngkountas and I met with the Metrics Platform team (thank you @Sfaci for the walkthrough). Here is a summary:

  • For consistency with other streams, the stream name should be mediawiki.product_metrics.mint_for_readers
  • Stream configuration and registration should be added to https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/%2B/master/wmf-config/ext-EventStreamConfig.php
  • As the feature will be a Special: page, Nik rightly pointed out that capturing page_id and other page_-related information will not be of much use, as it will be the same for all events. We are already capturing the actual article with the interaction data. In addition to the core fields, the following fields will be helpful (to be added to provide_values in the configuration):
    • agent
      • client_platform
      • client_platform_family
    • performer
      • is_logged_in
      • performer_name
      • session_id
      • groups
      • is_bot
      • registration_dt
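As a rough illustration of the list above, the grouped fields can be flattened into the names that would go into provide_values. The `<group>_<field>` naming pattern is an assumption here and should be checked against the Metrics Platform documentation. The streamConfig object is a hypothetical stand-in too, since the real stream configuration is written in PHP in ext-EventStreamConfig.php.

```typescript
// Fields listed in the summary above, grouped as in the list.
const requestedFields: { [group: string]: string[] } = {
  agent: ["client_platform", "client_platform_family"],
  performer: ["is_logged_in", "performer_name", "session_id", "groups", "is_bot", "registration_dt"],
};

// Flattens the groups into "<group>_<field>" names. This naming pattern is an
// assumption; entries that already carry the group prefix are left unchanged.
function toProvideValues(fields: { [group: string]: string[] }): string[] {
  const names: string[] = [];
  for (const group of Object.keys(fields)) {
    const prefix = group + "_";
    for (const field of fields[group] ?? []) {
      names.push(field.indexOf(prefix) === 0 ? field : prefix + field);
    }
  }
  return names;
}

// Hypothetical shape of the stream entry; the real configuration lives in PHP.
const streamConfig = {
  stream: "mediawiki.product_metrics.mint_for_readers",
  provide_values: toProvideValues(requestedFields),
};
```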

I will create a separate sub-task for stream configuration and registration.

@ngkountas I have submitted a patch for stream configuration and registration. As per the discussion with MP team yesterday, please change the stream name in instrumentation to mediawiki.product_metrics.mint_for_readers

Change #1048400 had a related patch set uploaded (by Nik Gkountas; author: Nik Gkountas):

[mediawiki/extensions/ContentTranslation@master] AX instrumentation: Update stream name

https://gerrit.wikimedia.org/r/1048400

Thank you @KCVelaga_WMF! I just submitted the patch for updating the stream name. Do you need me to review your patch, too?

@ngkountas Yes, I have added you and engineers from the Metrics Platform team as reviewers for that.

Change #1048400 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] AX instrumentation: Update stream name

https://gerrit.wikimedia.org/r/1048400

Based on the QA comments (copied below) there are a couple of aspects to polish for the instrumentation to work as expected. Moving the ticket back to in development:

  • Approximately 560 events have had validation errors so far. The error is the same for all of them: '.performer' should NOT have additional properties
    • When I checked the raw events, these events were using a different schema version than the rest, i.e. /analytics/product_metrics/web/base/1.0.0 instead of /analytics/product_metrics/web/base/1.2.0
  • Also, it seems like all the errored events were users clicking the automatic translation card. If a user selected the automatic translation of an article, there should be a preceding article selection event (which is one of the key events), and that seems to be missing. Can you please check whether this is being logged, and also the human translation selection?
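The "should NOT have additional properties" message is what a JSON Schema validator reports when an object carries a key that the schema version it is checked against does not declare. The sketch below shows how the same performer object can fail against an older property list and pass against a newer one. Both property lists are hypothetical placeholders, not the actual contents of versions 1.0.0 and 1.2.0.

```typescript
// Hypothetical property lists; the real ones are defined by the schema versions.
const performerPropsOlder: readonly string[] = ["is_logged_in", "session_id"];
const performerPropsNewer: readonly string[] = ["is_logged_in", "session_id", "is_bot", "groups"];

// Returns the keys of `value` that the given schema version does not declare.
function findAdditionalProperties(value: { [key: string]: unknown }, declared: readonly string[]): string[] {
  return Object.keys(value).filter((key) => declared.indexOf(key) === -1);
}

// An event populated with the fields that the newer version expects.
const performer = { is_logged_in: true, session_id: "abc", is_bot: false };

const rejectedByOlder = findAdditionalProperties(performer, performerPropsOlder);
const rejectedByNewer = findAdditionalProperties(performer, performerPropsNewer);
```

Under these placeholder lists, rejectedByOlder contains is_bot while rejectedByNewer is empty, which mirrors events being validated against an older version than the one they were built for.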

These validation errors are not caused by our implementation, but by a bug in the mw.eventLog.submitClick method of the Metrics Platform repository. As far as I can tell, all non-click events (e.g. search, session_init) are logged properly. A pull request fixing the method has been submitted and, according to the plan, it will be backported on Monday.

I'm moving this task to the "Waiting for deployment" column. @KCVelaga_WMF do you think you could QA this task once the PR is merged and deployed?

The change has been deployed. I'm monitoring https://logstash.wikimedia.org/goto/504f7423ac1d4643e285e8cf4b80cb33.

Thank you @phuedx. I don't see any more validation errors related to the schema version. The data also has click events for automatic and human translation selections.

@ngkountas Can you double-check whether the event that should precede the automatic or human translation selection, i.e. the user selecting an article from search (the 3rd event in the list of key events), is being logged?

I can confirm that click events for the user selecting an article are being logged. We have 30+ events so far.

However, we have 1000+ events for users clicking the auto_translation_card, so I am wondering whether users can click the automatic translation card without first selecting an article.

Not sure about the instrumentation implementation details, but in terms of the user workflow there are multiple possible paths.
The general workflow consists of the following steps: Home -> Search -> Confirm -> Translation View
Users can directly access any of them through the URL. Entry points take them directly to the Confirm step, where the article and language pair have already been selected based on the context the user comes from.
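One way to check this against the logged data is to look for sessions that contain an auto_translation_card click with no earlier search_result click. The sketch below assumes events are available in chronological order with a session id, an action and an action source. The LoggedEvent shape and its field names are hypothetical simplifications, not the actual schema.

```typescript
// Hypothetical, simplified view of a logged event.
interface LoggedEvent {
  sessionId: string;
  action: string;
  actionSource?: string;
}

// Returns the ids of sessions where the automatic translation card was clicked
// without a preceding article selection. Events must be in chronological order.
function sessionsWithoutArticleSelection(events: LoggedEvent[]): string[] {
  const selected: { [sessionId: string]: boolean } = {};
  const flagged: string[] = [];
  for (const event of events) {
    if (event.action !== "click") {
      continue;
    }
    if (event.actionSource === "search_result") {
      selected[event.sessionId] = true;
    } else if (
      event.actionSource === "auto_translation_card" &&
      !selected[event.sessionId] &&
      flagged.indexOf(event.sessionId) === -1
    ) {
      flagged.push(event.sessionId);
    }
  }
  return flagged;
}

// s1 follows Home -> Search -> Confirm; s2 lands on Confirm through an entry point.
const sampleEvents: LoggedEvent[] = [
  { sessionId: "s1", action: "session_init" },
  { sessionId: "s1", action: "search" },
  { sessionId: "s1", action: "click", actionSource: "search_result" },
  { sessionId: "s1", action: "click", actionSource: "auto_translation_card" },
  { sessionId: "s2", action: "session_init" },
  { sessionId: "s2", action: "click", actionSource: "auto_translation_card" },
];
```

With the sample data, only s2 is flagged, which is the pattern expected when most users arrive through entry points rather than through search.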

QA Results

The following events have been verified to be properly logged:

  • session_init (session initiation)
  • search (user searches for a topic)
  • "search_result" click (user selects an article)
  • "auto_translation_card" click (user clicks to view the automatic translation)
  • "human_translation_card" click (user clicks to view human-generated content)

Missing piece: The only event not yet supported for this task is the "close" click event (user closes the automatic translation view). Moving this task back to priority log until this is also fixed.

PWaigi-WMF renamed this task from MinT MVP: Implement instrumentation for key events to MinT for Readers: Implement instrumentation for key events. Tue, Aug 20, 8:57 AM

Change #1064047 had a related patch set uploaded (by Nik Gkountas; author: Nik Gkountas):

[mediawiki/extensions/ContentTranslation@master] MinT for Readers: Log "view" event after content has been loaded

https://gerrit.wikimedia.org/r/1064047

Change #1064048 had a related patch set uploaded (by Nik Gkountas; author: Nik Gkountas):

[mediawiki/extensions/ContentTranslation@master] MinT for Readers: Instrument "close automatic translation" event

https://gerrit.wikimedia.org/r/1064048

Change #1064047 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] MinT for Readers: Log "view" event after content has been loaded

https://gerrit.wikimedia.org/r/1064047

Change #1064048 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] MinT for Readers: Instrument "close automatic translation" event

https://gerrit.wikimedia.org/r/1064048