`blockquote` message entity api-7.0. #4038

aelkheir · 2023-12-30T06:49:01Z

Scope: Block Quotation

Added support for “blockquote” entities in received messages. @aelkheir
Added support for “blockquote” entity parsing in “MarkdownV2” and “HTML” parse modes @aelkheir
Allowed to explicitly specify “blockquote” entities in formatted texts. @aelkheir

Check-list for PRs

Added .. versionadded:: NEXT.VERSION, .. versionchanged:: NEXT.VERSION or .. deprecated:: NEXT.VERSION to the docstrings for user facing changes (for methods/class descriptions, arguments and attributes)
Created new or adapted existing unit tests
Documented code changes according to the CSI standard <https://standards.mousepawmedia.com/en/stable/csi.html>__
Added new classes & modules to the docs and all suitable __all__ s
Checked the Stability Policy <https://docs.python-telegram-bot.org/stability_policy.html>_ in case of deprecations or changes to documented behavior
If relevant:
- Added new constants at :mod:telegram.constants and shortcuts to them as class variables
- Added or updated documentation for the changed class(es) and/or method(s)

Bibo-Joshi

Thanks for the PR! The changes so far look good. Can you please also update Message.(text|caption)_(html|markdown).* and the corresponding tests?

aelkheir · 2023-12-30T21:59:27Z

Roger!

Bibo-Joshi

Thanks for the updates :) just two comments

telegram/_message.py

Bibo-Joshi · 2024-01-01T13:39:44Z

telegram/_message.py

+                        raise ValueError(
+                            "Blockquote entities are not supported for Markdown version 1"
+                        )
+                    insert = f">{escaped_text}"


needs multiline handling as already being discussed in https://t.me/pythontelegrambotdev/965. Please update the tests to cover these cases :)

Sorry for the late reply.
So, I came up with this, which I think would fit most cases

insert = "\n>" + escaped_text.replace("\n", "\n>") + "\n"

I'm assuming that the output of _parse_(html, markdown*) should produce the same visual structure as the original message if/when used like message.reply_text(message.text_markdown/html)

these are the reasons for the extra \ns

each quote line/paragraph, including the first, must be in its own new line. So trying to send ">line 1>line2" or "abc >def" as text with pars_mode=mdv2, will raise telegram.error.BadRequest: Can't parse entities: character '>' ...

the parsed quote must be separated from the following entities/text with a line break . This is to prevent the last line of the quote from including the rest of the message text as part of the quote.

Issues am facing is when the message starts (or ends) with a blockquote entity. There will be an extra unneeded blank lines.
I guess I can trim them at message.(text/caption)_(html/markdown)*, but don't like the repetition. would appreciate any ideas, or even a completely different approach

Not to worry. Thanks for your elaborations! I haven't looked into it deeply yet, but a spontanuous idea would be that one should be able to use MessageEntity.{offset, length} to determine if the quoted text is at the start/end of the message. Also I'm wondering if escaped_text.replace("\n", "\n>") works as intended if the quote-text itself contains "\n", i.e. something like

This is \n a quote

🤔

one should be able to use MessageEntity.{offset, length} to ...

Oh nice, problem solved then.

Also I'm wondering if escaped_text.replace("\n", "\n>") works as intended

Hmm, apparently it does. Telegram escapes \ns when its meant literally. and surprisingly string.replace("\n", "...") will not match escaped \\ns

$ python Python 3.11.4 ... >>> """line 1 ... line \\n 2 ... line 3""".replace("\n", ' ** ') 'line 1 ** line \\n 2 ** line 3' >>>

okay, that's good, then :) adding this case to the tests would still be sane IMO :D

Bibo-Joshi

Thanks for the updates! I did some manual testing and unfortunately it seems that the situation is even more involved than I though :/ My findings so far:

inserting \n everywhere introuduces unwanted linebreaks
in markdown > must always be the first character in the line to work for the quote block. Hence, whether or not we need to insert an \n depends on the text before the entity. E.g. "ABC<blockquote>DEF</blockquote>" works fine, but "ABC>DEF" does not - we'd have to use "ABC\n>DEF". This renders in the same way, but the text of the message is in fact different (by a \n)
if the block quote also contains a nested entity, then something in the handling for that un-escapes the line breaks leading to \n in the text_md_v2 instead of the original \\n
One thing that MD apparently just can not represent is 2 block quotes following each other directly without blank line. "<blockquote>ABC</blockquote><blockquote>DEF</blockquote>" works for HTML, but ">ABC\n>DEF" renders as one two-line quote, while ">ABC\n\n>DEF"` rendors as two one-line quotes separated by a blank line. This also needs checkinf if two block quote entities are following each other directly and TBH af first glance I think the most sane thing to do is to raise an exception and not produce any output.

I'm not sure how we can proceed here. But then again, it's late and I might see more clearly tomorrow …

For reference: I tried to test by checking if sending a message with the produced text_* has the same message entities & text as the original message. MWE:

from telegram import Message, Update
from telegram.ext import CallbackContext, Application, MessageHandler, filters


async def handle(update: Update, context: CallbackContext):
    message = update.effective_message

    print(repr(message.text))
    print(repr(message.entities))
    print(repr(message.text_html))
    print(repr(message.text_markdown_v2))

    reply_md = await message.reply_markdown_v2(message.text_markdown_v2)
    reply_html = await message.reply_html(message.text_html)
    for reply, name in ((reply_md, "markdown"), (reply_html, "HTML")):
        print("-" * 80)
        print(f"Replying with {name}:")
        print(reply.text == message.text, repr(reply.text), repr(message.text), sep="\n")
        print(
            reply.entities == message.entities,
            repr(reply.entities),
            repr(message.entities),
            sep="\n",
        )
        print(
            reply.text_html == message.text_html,
            repr(reply.text_html),
            repr(message.text_html),
            sep="\n",
        )
        print(
            reply.text_markdown_v2 == message.text_markdown_v2,
            repr(reply.text_markdown_v2),
            repr(message.text_markdown_v2),
            sep="\n",
        )


if __name__ == "__main__":
    application = (
        Application.builder()
        .token("TOKEN")
        .build()
    )
    application.add_handler(MessageHandler(filters.ALL, handle))
    application.run_polling()

Test messages that I used:

Edit:

So far insert = f">" + "\n>".join(escaped_text.splitlines()) worked best for me
Reworking the tests to check if sending the text_html/md results in the same text & entities is something that we should do IMO as it's more precise than asserting that we get the expected formatting entities in the string text. That's beyond the scope of this PR, though

Bibo-Joshi · 2024-01-05T22:13:34Z

telegram/_message.py

+                            "Blockquote entities are not supported for Markdown version 1"
+                        )
+                    prefix = ">" if entity.offset == 0 else "\n>"
+                    message_length = len(str(message_text, encoding="utf-16-le"))  # type: ignore


len(message_text.decode("utf-16-le")) would do the trick - maybe that also circumvents the type: ignore? one could save that decoded text in a variable at the top of the function

…rocess

Bibo-Joshi · 2024-01-06T16:38:00Z

I dug around more, ended up refactoring a bit of code took the liberty to push my changes to your branch. Managed to fix the nested entities stuff (and I'm wondering why the if offset == 0 never caused problems so far).

What's still missing is throwing an exception for two consequcitve block quotes in MarkdownV2.

Moreover, even though block quote parsing seems to be working correctly now, I think there is a bug on TG side. Sending a text of the form ">A\nB" leads to TG detecting a Message entity of length 2 instead of 1, which includes the "\n". You can reproduce this via

https://api.telegram.org/botTOKEN/sendMessage?chat_id=CHAT_ID&text=%3EA%0AB&parse_mode=MarkdownV2

If you try to compute the text_md_v2 for the resulting message, the B will be moved into the block quote.

For completeness, let me also share my renewed testing script:

import asyncio
import json
from pathlib import Path
from pprint import pprint

from telegram import Message, Update, Bot
from telegram.constants import ParseMode
from telegram.ext import CallbackContext, Application, MessageHandler, filters


def get_message(bot: Bot) -> Message:
    return Update.de_json(json.loads(Path("response_3.json").read_bytes()), bot).effective_message


async def main():
    async with Bot("TOKEN") as bot:
        message = get_message(bot)
        await run_tests(message)


def get_formatted_text(message: Message, parse_mode: ParseMode) -> str:
    return getattr(message, f"text_{parse_mode.name.lower()}")


async def run_tests(message: Message) -> None:
    print(repr(message.text))
    pprint(repr(message.entities))

    for parse_mode in (ParseMode.HTML, ParseMode.MARKDOWN_V2):
        print("-" * 80)
        print(f"Replying with {parse_mode}:")
        await message.get_bot().send_message(
            chat_id=1145108092,
            text=f"============ Replying with {parse_mode}:",
        )
        formatted_text = get_formatted_text(message, parse_mode)
        print(repr(formatted_text))
        reply = await message.get_bot().send_message(
            chat_id=1145108092,
            text=formatted_text,
            parse_mode=parse_mode,
        )
        print(reply.text == message.text, repr(reply.text), rep
8000
r(message.text), sep="\n")
        print(
            reply.entities == message.entities,
            repr(reply.entities),
            repr(message.entities),
            sep="\n",
        )
        pprint(reply.parse_entities())
        print(
            reply.text_html == message.text_html,
            repr(reply.text_html),
            repr(message.text_html),
            sep="\n",
        )
        print(
            reply.text_markdown_v2 == message.text_markdown_v2,
            (reply.text_markdown_v2),
            (message.text_markdown_v2),
            sep="\n====\n",
        )
        next_reply = await reply.reply_text(
            get_formatted_text(reply, parse_mode), parse_mode=parse_mode, quote=True
        )
        await next_reply.reply_text(
            get_formatted_text(next_reply, parse_mode), parse_mode=parse_mode, quote=True
        )


if __name__ == "__main__":
    asyncio.run(main())

response.json:

{
    "update_id": 219020534,
    "message": {
        "message_id": 31843,
        "from": {
            "id": 1145108092,
            "is_bot": false,
            "first_name": "Hinrich",
            "last_name": "Mahler (@Bibo-Joshi)",
            "username": "BiboJoshi",
            "language_code": "de"
        },
        "chat": {
            "id": 1145108092,
            "first_name": "Hinrich",
            "last_name": "Mahler (@Bibo-Joshi)",
            "username": "BiboJoshi",
            "type": "private"
        },
        "date": 1704553973,
        "text": "block \\n quote\nABC\nblock \\n quote\nwith \\n line break\nDEF\nblock \\n quote\nwith \\n line breakp",
        "entities": [
            {
                "offset": 0,
                "length": 14,
                "type": "blockquote"
            },
            {
                "offset": 9,
                "length": 5,
                "type": "underline"
            },
            {
                "offset": 19,
                "length": 33,
                "type": "blockquote"
            },
            {
                "offset": 28,
                "length": 5,
                "type": "italic"
            },
            {
                "offset": 28,
                "length": 5,
                "type": "strikethrough"
            },
            {
                "offset": 57,
                "length": 34,
                "type": "blockquote"
            },
            {
                "offset": 66,
                "length": 5,
                "type": "spoiler"
            }
        ]
    }
}

Bibo-Joshi · 2024-01-06T17:30:39Z

Ah, what I forgot is handling of cases that result from things like ABC<blockquote>DEF</blockquote>GHI, where we need to insert newline characters.

For the potential bug in TG sidr mentioned above, I guess we can try to work around it but checking if the text if the block quote ends on a newline character. Funny things may still happen if the quote is located at the end of the message 🤔 in any case I think its worth pointing oit this behavior to TG.

In fact one could also point them to the first past of this comment and ask them to unify the handling of these cases, always breaking lines before and after block quotes ...

Bibo-Joshi · 2024-01-07T13:32:38Z

I opened tdlib/telegram-bot-api#515

Bibo-Joshi · 2024-01-08T19:23:27Z

A first reply on tdlib/telegram-bot-api#515 is there. Punshlines:

We're discuraged from using mechanism like text_hmtl/markdown that convert MessageEntities back into markup syntax and the newline behavior
code blocks and block quotes start start & end of new lines, but in the end the rendering is client side decision

While somewhat expected, this reply is still a bummer for me and I see the need for an internal discussion on how to proceed with text_html/markdown.

For this PR, I will revert to the stage that "just" added the MessageEntity extensions and open a new WIP-PR with the current state for reference & further discussion. Sorry that my request to implement text_html/markdown took this turn @aelkheir . I want to thank you for the contribution in any case!

…in the process" This reverts commit 0dd0529.

This reverts commit ba9cfcf.

This reverts commit 0af70b3.

…uote'" This reverts commit 1d5c3b1.

aelkheir · 2024-01-09T11:40:59Z

No problem! I'd be happy to help with the outcome solution of the discussions. As always, thank you.

Add blockquote message entity.

7fdea9f

Bibo-Joshi requested changes Dec 30, 2023

View reviewed changes

aelkheir added 3 commits December 31, 2023 19:03

Merge remote-tracking branch 'upstream/api-7.0' into blockquote-api-7.0

12c19aa

Update Message._parse(html/markdown) to account for 'blockquote'

1d5c3b1

Adapt tests to include blockquote message entity

0af70b3

aelkheir requested a review from Bibo-Joshi December 31, 2023 17:07

Bibo-Joshi requested changes Jan 1, 2024

View reviewed changes

aelkheir added 2 commits January 5, 2024 21:55

Add support for parsing multi-line blockquotes

ba9cfcf

Merge remote-tracking branch 'upstream/api-7.0' into blockquote-api-7.0

877d6d6

Bibo-Joshi reviewed Jan 5, 2024

View reviewed changes

Try fixing blockquote things and clean up Message._parse_* in the p…

0dd0529

…rocess

Bibo-Joshi mentioned this pull request Jan 7, 2024

Inconsistencies in handling of block quotes and pre-formatted code blocks tdlib/telegram-bot-api#515

Closed

harshil21 added the ⚙️ bot-api affected functionality: bot-api label Jan 8, 2024

Bibo-Joshi mentioned this pull request Jan 8, 2024

Blockquote api 7.0 branched #4061

Merged

1 task

Bibo-Joshi added 4 commits January 8, 2024 20:28

Revert "Try fixing blockquote things and clean up Message._parse_* …

dd8f9c5

…in the process" This reverts commit 0dd0529.

Revert "Add support for parsing multi-line blockquotes"

b0d2ef5

This reverts commit ba9cfcf.

Revert "Adapt tests to include blockquote message entity"

71024d2

This reverts commit 0af70b3.

Revert "Update Message._parse(html/markdown) to account for 'blockq…

961d271

…uote'" This reverts commit 1d5c3b1.

Bibo-Joshi merged commit f682d17 into python-telegram-bot:api-7.0 Jan 8, 2024

Bibo-Joshi mentioned this pull request Jan 8, 2024

[FEATURE] API 7.0 #4033

Closed

34 tasks

aelkheir deleted the blockquote-api-7.0 branch January 10, 2024 21:24

github-actions bot locked and limited conversation to collaborators Jan 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

`blockquote` message entity api-7.0. #4038

`blockquote` message entity api-7.0. #4038

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

blockquote message entity api-7.0. #4038

blockquote message entity api-7.0. #4038

Uh oh!

Conversation

Uh oh!

Scope: Block Quotation

Check-list for PRs

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

`blockquote` message entity api-7.0. #4038

`blockquote` message entity api-7.0. #4038