-
Notifications
You must be signed in to change notification settings - Fork 5.7k
blockquote
message entity api-7.0.
#4038
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
blockquote
message entity api-7.0.
#4038
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! The changes so far look good. Can you please also update Message.(text|caption)_(html|markdown).*
and the corresponding tests?
Roger! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates :) just two comments
telegram/_message.py
Outdated
raise ValueError( | ||
"Blockquote entities are not supported for Markdown version 1" | ||
) | ||
insert = f">{escaped_text}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs multiline handling as already being discussed in https://t.me/pythontelegrambotdev/965. Please update the tests to cover these cases :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply.
So, I came up with this, which I think would fit most cases
insert = "\n>" + escaped_text.replace("\n", "\n>") + "\n"
I'm assuming that the output of _parse_(html, markdown*)
should produce the same visual structure as the original message if/when used like message.reply_text(message.text_markdown/html)
these are the reasons for the extra \n
s
- each quote line/paragraph, including the first, must be in its own new line. So trying to send
">line 1>line2"
or"abc >def"
as text withpars_mode=mdv2
, will raisetelegram.error.BadRequest: Can't parse entities: character '>' ...
- the parsed quote must be separated from the following entities/text with a line break . This is to prevent the last line of the quote from including the rest of the message text as part of the quote.
Issues am facing is when the message starts (or ends) with a blockquote entity. There will be an extra unneeded blank lines.
I guess I can trim them at message.(text/caption)_(html/markdown)*
, but don't like the repetition. would appreciate any ideas, or even a completely different approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not to worry. Thanks for your elaborations! I haven't looked into it deeply yet, but a spontanuous idea would be that one should be able to use MessageEntity.{offset, length}
to determine if the quoted text is at the start/end of the message. Also I'm wondering if escaped_text.replace("\n", "\n>")
works as intended if the quote-text itself contains "\n", i.e. something like
This is \n a quote
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one should be able to use
MessageEntity.{offset, length} to ...
Oh nice, problem solved then.
Also I'm wondering if
escaped_text.replace("\n", "\n>")
works as intended
Hmm, apparently it does. Telegram escapes \n
s when its meant literally. and surprisingly string.replace("\n", "...")
will not match escaped \\n
s
$ python
Python 3.11.4 ...
>>> """line 1
... line \\n 2
... line 3""".replace("\n", ' ** ')
'line 1 ** line \\n 2 ** line 3'
>>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, that's good, then :) adding this case to the tests would still be sane IMO :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates! I did some manual testing and unfortunately it seems that the situation is even more involved than I though :/ My findings so far:
- inserting
\n
everywhere introuduces unwanted linebreaks - in markdown
>
must always be the first character in the line to work for the quote block. Hence, whether or not we need to insert an\n
depends on the text before the entity. E.g."ABC<blockquote>DEF</blockquote>"
works fine, but"ABC>DEF"
does not - we'd have to use"ABC\n>DEF"
. This renders in the same way, but the text of the message is in fact different (by a\n
) - if the block quote also contains a nested entity, then something in the handling for that un-escapes the line breaks leading to
\n
in thetext_md_v2
instead of the original\\n
- One thing that MD apparently just can not represent is 2 block quotes following each other directly without blank line.
"<blockquote>ABC</blockquote><blockquote>DEF</blockquote>"
works for HTML, but">ABC\n>DEF" renders as one two-line quote, while
">ABC\n\n>DEF"` rendors as two one-line quotes separated by a blank line. This also needs checkinf if two block quote entities are following each other directly and TBH af first glance I think the most sane thing to do is to raise an exception and not produce any output.
I'm not sure how we can proceed here. But then again, it's late and I might see more clearly tomorrow …
For reference: I tried to test by checking if sending a message with the produced text_*
has the same message entities & text as the original message. MWE:
from telegram import Message, Update
from telegram.ext import CallbackContext, Application, MessageHandler, filters
async def handle(update: Update, context: CallbackContext):
message = update.effective_message
print(repr(message.text))
print(repr(message.entities))
print(repr(message.text_html))
print(repr(message.text_markdown_v2))
reply_md = await message.reply_markdown_v2(message.text_markdown_v2)
reply_html = await message.reply_html(message.text_html)
for reply, name in ((reply_md, "markdown"), (reply_html, "HTML")):
print("-" * 80)
print(f"Replying with {name}:")
print(reply.text == message.text, repr(reply.text), repr(message.text), sep="\n")
print(
reply.entities == message.entities,
repr(reply.entities),
repr(message.entities),
sep="\n",
)
print(
reply.text_html == message.text_html,
repr(reply.text_html),
repr(message.text_html),
sep="\n",
)
print(
reply.text_markdown_v2 == message.text_markdown_v2,
repr(reply.text_markdown_v2),
repr(message.text_markdown_v2),
sep="\n",
)
if __name__ == "__main__":
application = (
Application.builder()
.token("TOKEN")
.build()
)
application.add_handler(MessageHandler(filters.ALL, handle))
application.run_polling()
Test messages that I used:
Edit:
- So far
insert = f">" + "\n>".join(escaped_text.splitlines())
worked best for me - Reworking the tests to check if sending the
text_html/md
results in the same text & entities is something that we should do IMO as it's more precise than asserting that we get the expected formatting entities in the string text. That's beyond the scope of this PR, though
telegram/_message.py
Outdated
"Blockquote entities are not supported for Markdown version 1" | ||
) | ||
prefix = ">" if entity.offset == 0 else "\n>" | ||
message_length = len(str(message_text, encoding="utf-16-le")) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
len(message_text.decode("utf-16-le"))
would do the trick - maybe that also circumvents the type: ignore
? one could save that decoded text in a variable at the top of the function
I dug around more, ended up refactoring a bit of code took the liberty to push my changes to your branch. Managed to fix the nested entities stuff (and I'm wondering why the What's still missing is throwing an exception for two consequcitve block quotes in MarkdownV2. Moreover, even though block quote parsing seems to be working correctly now, I think there is a bug on TG side. Sending a text of the form ">A\nB" leads to TG detecting a Message entity of length 2 instead of 1, which includes the "\n". You can reproduce this via https://api.telegram.org/botTOKEN/sendMessage?chat_id=CHAT_ID&text=%3EA%0AB&parse_mode=MarkdownV2 If you try to compute the For completeness, let me also share my renewed testing script: import asyncio
import json
from pathlib import Path
from pprint import pprint
from telegram import Message, Update, Bot
from telegram.constants import ParseMode
from telegram.ext import CallbackContext, Application, MessageHandler, filters
def get_message(bot: Bot) -> Message:
return Update.de_json(json.loads(Path("response_3.json").read_bytes()), bot).effective_message
async def main():
async with Bot("TOKEN") as bot:
message = get_message(bot)
await run_tests(message)
def get_formatted_text(message: Message, parse_mode: ParseMode) -> str:
return getattr(message, f"text_{parse_mode.name.lower()}")
async def run_tests(message: Message) -> None:
print(repr(message.text))
pprint(repr(message.entities))
for parse_mode in (ParseMode.HTML, ParseMode.MARKDOWN_V2):
print("-" * 80)
print(f"Replying with {parse_mode}:")
await message.get_bot().send_message(
chat_id=1145108092,
text=f"============ Replying with {parse_mode}:",
)
formatted_text = get_formatted_text(message, parse_mode)
print(repr(formatted_text))
reply = await message.get_bot().send_message(
chat_id=1145108092,
text=formatted_text,
parse_mode=parse_mode,
)
print(reply.text == message.text, repr(reply.text), rep
8000
r(message.text), sep="\n")
print(
reply.entities == message.entities,
repr(reply.entities),
repr(message.entities),
sep="\n",
)
pprint(reply.parse_entities())
print(
reply.text_html == message.text_html,
repr(reply.text_html),
repr(message.text_html),
sep="\n",
)
print(
reply.text_markdown_v2 == message.text_markdown_v2,
(reply.text_markdown_v2),
(message.text_markdown_v2),
sep="\n====\n",
)
next_reply = await reply.reply_text(
get_formatted_text(reply, parse_mode), parse_mode=parse_mode, quote=True
)
await next_reply.reply_text(
get_formatted_text(next_reply, parse_mode), parse_mode=parse_mode, quote=True
)
if __name__ == "__main__":
asyncio.run(main()) response.json: {
"update_id": 219020534,
"message": {
"message_id": 31843,
"from": {
"id": 1145108092,
"is_bot": false,
"first_name": "Hinrich",
"last_name": "Mahler (@Bibo-Joshi)",
"username": "BiboJoshi",
"language_code": "de"
},
"chat": {
"id": 1145108092,
"first_name": "Hinrich",
"last_name": "Mahler (@Bibo-Joshi)",
"username": "BiboJoshi",
"type": "private"
},
"date": 1704553973,
"text": "block \\n quote\nABC\nblock \\n quote\nwith \\n line break\nDEF\nblock \\n quote\nwith \\n line breakp",
"entities": [
{
"offset": 0,
"length": 14,
"type": "blockquote"
},
{
"offset": 9,
"length": 5,
"type": "underline"
},
{
"offset": 19,
"length": 33,
"type": "blockquote"
},
{
"offset": 28,
"length": 5,
"type": "italic"
},
{
"offset": 28,
"length": 5,
"type": "strikethrough"
},
{
"offset": 57,
"length": 34,
"type": "blockquote"
},
{
"offset": 66,
"length": 5,
"type": "spoiler"
}
]
}
} |
Ah, what I forgot is handling of cases that result from things like For the potential bug in TG sidr mentioned above, I guess we can try to work around it but checking if the text if the block quote ends on a newline character. Funny things may still happen if the quote is located at the end of the message 🤔 in any case I think its worth pointing oit this behavior to TG. In fact one could also point them to the first past of this comment and ask them to unify the handling of these cases, always breaking lines before and after block quotes ... |
I opened tdlib/telegram-bot-api#515 |
A first reply on tdlib/telegram-bot-api#515 is there. Punshlines:
While somewhat expected, this reply is still a bummer for me and I see the need for an internal discussion on how to proceed with For this PR, I will revert to the stage that "just" added the |
No problem! I'd be happy to help with the outcome solution of the discussions. As always, thank you. |
Closes #4033
Scope: Block Quotation
Check-list for PRs
Added
.. versionadded:: NEXT.VERSION
,.. versionchanged:: NEXT.VERSION
or.. deprecated:: NEXT.VERSION
to the docstrings for user facing changes (for methods/class descriptions, arguments and attributes)Created new or adapted existing unit tests
Documented code changes according to the
CSI standard <https://standards.mousepawmedia.com/en/stable/csi.html>
__Added new classes & modules to the docs and all suitable
__all__
sChecked the
Stability Policy <https://docs.python-telegram-bot.org/stability_policy.html>
_ in case of deprecations or changes to documented behaviorIf relevant:
telegram.constants
and shortcuts to them as class variables