Add Bootstrapping Logic to `Application.run_*` by Bibo-Joshi · Pull Request #4673 · python-telegram-bot/python-telegram-bot · GitHub

Merged
merged 6 commits into master from feature/app-bootstrapping on Mar 1, 2025

Conversation

@Bibo-Joshi Bibo-Joshi commented Feb 7, 2025

When ready, closes #4657.

This also changes the default value for bootstrap_retries for start/run_polling to 0. Previously the logic was roughly

async def run_main(bootstrap_retries="indefinite"):
    delete_webhook_and_drop_pending_updates(retries=bootstrap_retries)
    poll_for_updates(retries="indefinite")

I now changed this to

async def run_main(bootstrap_retries=None):
    delete_webhook_and_drop_pending_updates(retries=bootstrap_retries)
    poll_for_updates(retries="indefinite")

I think this is saner, and I'm not even sure if indefinite retries during the bootstrapping phase were ever actually intended …
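As an aside, the integer convention behind a parameter like `bootstrap_retries` can be made concrete with a small sketch. This is a hypothetical helper for illustration only (not part of the library); it assumes the convention visible in the retry loop below, where a negative value means indefinite retries:

```python
def describe_bootstrap_retries(bootstrap_retries: int) -> str:
    """Illustrative only: how an integer retry parameter is interpreted
    (negative = retry indefinitely, 0 = no retries, n > 0 = up to n retries)."""
    if bootstrap_retries < 0:
        return "retry indefinitely"
    if bootstrap_retries == 0:
        return "no retries"
    return f"up to {bootstrap_retries} retries"


print(describe_bootstrap_retries(-1))  # retry indefinitely
print(describe_bootstrap_retries(0))   # no retries
print(describe_bootstrap_retries(3))   # up to 3 retries
```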

ToDo

  • Update existing tests to new logs
  • Add new tests for bootstrapping in Application

Also changes the default value for `bootstrap_retries` on polling
@Bibo-Joshi Bibo-Joshi added 🛠 refactor change type: refactor 🔌 enhancement pr description: enhancement labels Feb 7, 2025
Member
@Poolitzer Poolitzer left a comment

I'm just asking questions at this point

@@ -409,12 +414,14 @@ def default_error_callback(exc: TelegramError) -> None:
             # updates from Telegram and inserts them in the update queue of the
             # Application.
             self.__polling_task = asyncio.create_task(
-                self._network_loop_retry(
+                network_retry_loop(
+                    is_running=lambda: self.running,
Member

why is that a lambda?

Member Author

while effective_is_running():

this line expects a callable. If we simply pass is_running=self.running and edit the line above to

while is_running:

then we have an infinite loop, because setting self.running = False within Updater won't change the value seen by the network retry loop :) There are probably a billion other ways to do this (pass an object with a running attribute to check, pass a dict with a corresponding entry, pass an event …). This one seemed rather straightforward to me, but I'm open to changing it. Accepting an instance of

from typing import Protocol

class HasRunning(Protocol):
    running: bool

would sound like the next best thing to me at first glance
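The late-binding behavior being discussed can be demonstrated in isolation. This is a toy class, not the real Updater: a bare attribute read snapshots the value once, while a lambda re-reads it on every check.

```python
class ToyUpdater:
    """Toy stand-in for Updater, just holding a running flag."""

    def __init__(self) -> None:
        self.running = True


updater = ToyUpdater()

# Passing the bare value freezes it at the moment it is read:
is_running_value = updater.running
# Passing a callable defers the read until each loop iteration:
is_running_callable = lambda: updater.running

updater.running = False  # e.g. what happens when the updater is stopped

print(is_running_value)       # True  -> a `while is_running:` loop would never exit
print(is_running_callable())  # False -> a `while is_running():` loop exits as expected
```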

@tsnoam
Member
tsnoam commented Feb 9, 2025

@Eldinnie
here's my vague memory for the original changes:

  1. infinite retries were really intended.
  2. the idea was to find some common ground (i.e. defaults) for most of the user base (we understood that not all can be pleased)
  3. there is a bootstrapping phase in which certain actions had to be made (are they still needed? I don't know... I haven't been following the bot API in the past years).
  4. what happens if these operations fail? do we resume with normal execution of the bot? do we abort (and let the user run the bot again)? or do we just retry until a success?
  5. we chose the option of indefinite retries (and afair, with appropriate logging). that way we make sure that the bot is properly started and will be started eventually. aborting seemed like too much, after all, the user will just need to restart the bot and come to the same "problematic" part of the initialization. so what's the point in that?

I hope that helps...

@tsnoam
Member
tsnoam commented Feb 9, 2025

and why did i tag Eldinnie instead of @Bibo-Joshi ?

it's my memory playing tricks on me. lol.

@Bibo-Joshi
Member Author

Thanks very much for your insights @tsnoam !

The bootstrapping phase is indeed still needed. It deletes the previous webhook if you run polling, sets a new webhook if you run in webhook mode, and drops pending updates if requested.

Do you recall why you chose indefinite retries over a limited number of retries (say, 3)? Even though we have logging, indefinite retries can get you into a situation where the process is running in a healthy way (eagerly retrying to set the webhook) but the bot doesn't react at all. That sounds undesirable to me. A limited number of retries OTOH would be a sane effort at getting things going while still preventing the process from getting stuck in an unusable state without explicit indication to the user.

Moreover I noticed that for webhook mode, the retries were kept at "none". I suspect that this was an oversight: The docstring was updated, but not the signature: https://github.com/python-telegram-bot/python-telegram-bot/pull/1018/files#r1948035215

@tsnoam
Member
tsnoam commented Feb 9, 2025

@Bibo-Joshi

Do you recall why you chose indefinite retries over a limited number of retries (say, 3)?

as stated in bullet 5:

we chose the option of indefinite retries (and afair, with appropriate logging). that way we make sure that the bot is properly started and will be started eventually. aborting seemed like too much, after all, the user will just need to restart the bot and come to the same "problematic" part of the initialization. so what's the point in that?

A limited number of retries OTOH would be a sane effort at getting things going while still preventing the process from getting stuck in an unusable state without explicit indication to the user.

I'm not sure about it. The initialization phase is there for a reason. Skipping it might lead to a limbo state in which you don't know exactly what happens.

Moreover I noticed that for webhook mode, the retries were kept at "none". I suspect that this was an oversight

Could be an oversight. That much I don't remember.

@Bibo-Joshi
Member Author

Skipping it might lead to a limbo state in which you don't know exactly what happens.

I would not skip it, but rather abort if the bootstrapping phase fails. Meaning that after n < ∞ retries you either have a bot that's able to handle updates, or the process has shut down.

But okay, I get your reasoning and now have to decide what to make of it 😅 Thanks for the input!

@tsnoam
Member
tsnoam commented Feb 9, 2025

as long as you can get to an expected state, any solution is good.

@septatrix
Contributor

I think a finite number of retries is more than enough, as long as the delay between them is large enough to rule out intermittent connectivity problems as the source of failures. E.g., in my issue report the problem was that the service was brought up before network/DNS was online; this state is usually resolved within the first 10-20 seconds after boot.

Having an infinite number of retries by default sounds dangerous and may instead hide problems which the user wants to be notified about (by the service exiting).

@Bibo-Joshi
Member Author

as long as the delay between them is large enough to eliminate intermittent connectivity problems as the source of potential problems

So far I have set the initial retry interval to 0 seconds for Application.initialize, but I see that the bootstrapping within Updater uses an initial interval of 1 second - I can use that for Application as well:

bootstrap_interval: float = 1,

Note that the retry-loop retries immediately on timeout errors. On other errors, the interval is increased step by step up to 30 seconds:

    try:
        if not await do_action():
            break
    except RetryAfter as exc:
        slack_time = 0.5
        _LOGGER.info(
            "%s %s. Adding %s seconds to the specified time.", log_prefix, exc, slack_time
        )
        cur_interval = slack_time + exc.retry_after
    except TimedOut as toe:
        _LOGGER.debug("%s Timed out: %s. Retrying immediately.", log_prefix, toe)
        # If failure is due to timeout, we should retry asap.
        cur_interval = 0
    except InvalidToken:
        _LOGGER.exception("%s Invalid token. Aborting retry loop.", log_prefix)
        raise
    except TelegramError as telegram_exc:
        if on_err_cb:
            on_err_cb(telegram_exc)
        if max_retries < 0 or retries < max_retries:
            _LOGGER.debug(
                "%s Failed run number %s of %s. Retrying.", log_prefix, retries, max_retries
            )
        else:
            _LOGGER.exception(
                "%s Failed run number %s of %s. Aborting.", log_prefix, retries, max_retries
            )
            raise
        # increase waiting times on subsequent errors up to 30 secs
        cur_interval = 1 if cur_interval == 0 else min(30, 1.5 * cur_interval)
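For reference, the interval growth produced by that last line can be sketched in isolation (assuming the 1.5× factor and the 30-second cap quoted above): the interval jumps to 1 second after the first non-timeout error, then grows geometrically until it saturates at 30 seconds.

```python
def next_interval(cur_interval: float) -> float:
    # Mirrors the quoted line: jump to 1s after the first error,
    # then grow by a factor of 1.5 per subsequent error, capped at 30s.
    return 1 if cur_interval == 0 else min(30, 1.5 * cur_interval)


cur = 0.0
intervals = []
for _ in range(10):
    cur = next_interval(cur)
    intervals.append(cur)

# Sequence starts 1, 1.5, 2.25, 3.375, 5.0625, ... and caps at 30.
print(intervals)
```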

Specifying the retry-interval is currently not exposed to the user in the Updater.start_* methods and for starters that would seem like a bit of overkill, TBH.

@septatrix would this + (customizable) finite number of retries be enough for you?

@septatrix
Contributor

as long as the delay between them is large enough to eliminate intermittent connectivity problems as the source of potential problems

So far I have set the initial retry interval to 0 seconds for Application.initialize [...] but I see that the bootstrapping within Updater uses an initial interval of 1 second - I can use that for Application as well:

I think having a default 1s interval is better than 0s - or at least something non-zero. Depending on how far along the network stack is, the connection attempts would fire instantly instead of after a timeout, so adding a small delay (such as the 1s) on our side is better.

Note that the retry-loop retries immediately on timeout errors. On other errors, the interval is increased step by step up to 30 seconds:

On timeout errors this should not be a problem because the timeout itself will result in some delay (though it also wouldn't hurt to wait, it's just not necessary).

Specifying the retry-interval is currently not exposed to the user in the Updater.start_* methods and for starters that would seem like a bit of overkill, TBH.

I agree. As long as the default interval is non-zero it should be fine for most people.

@septatrix would this + (customizable) finite number of retries be enough for you?

Yes, as far as I understand how this is hooked up now this would be enough for me.


Small aside: while peeking at the code I found a small curiosity, nothing urgent. I noticed that the slack added to RetryAfter exceptions (introduced in a68cf8d) differs from the one in AIORateLimiter. Since you already extract it into a variable, maybe put it in some shared location? However, this is completely independent from the rest of this PR, so feel free to disregard it.

codecov bot commented Feb 17, 2025

❌ 2 Tests Failed:

Tests completed: 6531 · Failed: 2 · Passed: 6529 · Skipped: 746
View the top 2 failed test(s) by shortest run time
tests.test_bot.TestBotWithRequest::test_send_close_date_default_tz[Europe/Berlin-ZoneInfo]
Stack Traces | 7.35s run time
self = <tests.test_bot.TestBotWithRequest object at 0x000001CE6EFE70D0>
tz_bot = PytestExtBot[token=690091347:AAFLmR5pAB5Ycpe_mOh7zM4JFBOh0z3T0To]
super_group_id = '-1001279600026'

    async def test_send_close_date_default_tz(self, tz_bot, super_group_id):
        question = "Is this a test?"
        answers = ["Yes", "No", "Maybe"]
        reply_markup = InlineKeyboardMarkup.from_button(
            InlineKeyboardButton(text="text", callback_data="data")
        )
    
        aware_close_date = dtm.datetime.now(tz=tz_bot.defaults.tzinfo) + dtm.timedelta(seconds=5)
        close_date = aware_close_date.replace(tzinfo=None)
    
        msg = await tz_bot.send_poll(  # The timezone returned from this is always converted to UTC
            chat_id=super_group_id,
            question=question,
            options=answers,
            close_date=close_date,
            read_timeout=60,
        )
        msg.poll._unfreeze()
        # Sometimes there can be a few seconds delay, so don't let the test fail due to that-
>       msg.poll.close_date = msg.poll.close_date.astimezone(aware_close_date.tzinfo)
E       AttributeError: 'NoneType' object has no attribute 'astimezone'

tests\test_bot.py:2843: AttributeError
tests.test_bot.TestBotWithRequest::test_send_close_date_default_tz[Asia/Singapore-timezone]
Stack Traces | 7.37s run time
self = <tests.test_bot.TestBotWithRequest object at 0x00000223F0A048D0>
tz_bot = PytestExtBot[token=690091347:AAFLmR5pAB5Ycpe_mOh7zM4JFBOh0z3T0To]
super_group_id = '-1001279600026'

    async def test_send_close_date_default_tz(self, tz_bot, super_group_id):
        question = "Is this a test?"
        answers = ["Yes", "No", "Maybe"]
        reply_markup = InlineKeyboardMarkup.from_button(
            InlineKeyboardButton(text="text", callback_data="data")
        )
    
        aware_close_date = dtm.datetime.now(tz=tz_bot.defaults.tzinfo) + dtm.timedelta(seconds=5)
        close_date = aware_close_date.replace(tzinfo=None)
    
        msg = await tz_bot.send_poll(  # The timezone returned from this is always converted to UTC
            chat_id=super_group_id,
            question=question,
            options=answers,
            close_date=close_date,
            read_timeout=60,
        )
        msg.poll._unfreeze()
        # Sometimes there can be a few seconds delay, so don't let the test fail due to that-
>       msg.poll.close_date = msg.poll.close_date.astimezone(aware_close_date.tzinfo)
E       AttributeError: 'NoneType' object has no attribute 'astimezone'

tests\test_bot.py:2843: AttributeError


@Bibo-Joshi Bibo-Joshi marked this pull request as ready for review February 17, 2025 17:20
@Bibo-Joshi
Member Author

updated the tests, ready for review :)

@harshil21 harshil21 requested a review from Copilot February 26, 2025 18:18
@Copilot Copilot AI left a comment

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

@Bibo-Joshi Bibo-Joshi merged commit b0d22ac into master Mar 1, 2025
25 of 26 checks passed
@Bibo-Joshi Bibo-Joshi deleted the feature/app-bootstrapping branch March 1, 2025 10:13
@github-actions github-actions bot locked and limited conversation to collaborators Mar 9, 2025
Labels
🔌 enhancement pr description: enhancement 🛠 refactor change type: refactor
Development

Successfully merging this pull request may close these issues.

Add Bootstrapping Logic to Application.initialize when Called within Application.run_*
4 participants