[go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

making the option --limit in pull usable. #165

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

Lattitude75
Copy link

This, basically, makes the --limit option functional.

Copy link
Owner
@gauteh gauteh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will only work once. Next pulls you have to skip the limit option.

@Lattitude75
Copy link
Author

This will only work once. Next pulls you have to skip the limit option.

Yes, it is unnecessary after the first pull. Don't you think so? After the first pull, limit option does not make sense as only the new mail are pulled over. It also removes the need to associate it with any other option.

@gauteh
Copy link
Owner
gauteh commented Jul 15, 2020 via email

@Lattitude75
Copy link
Author

No, it is the full_pull function where the --limit is implemented as shown in the excerpt from it below.

  def full_pull (self):
    total = 1

    self.bar_create (leave = True, total = total, desc = 'fetching messages')

    # NOTE:
    # this list might grow gigantic for large quantities of e-mail, not really sure
    # about how much memory this will take. this is just a list of some
    # simple metadata like message ids.
    message_gids = []
    last_id      = self.remote.get_current_history_id (self.local.state.last_historyId)

    for mset in self.remote.all_messages ():
      (total, gids) = mset

      self.bar.total = total
      self.bar_update (len(gids))

      for m in gids:
        message_gids.append (m['id'])

      if self.limit is not None and len(message_gids) >= self.limit:
        break

It is the mset value which is limited in the full_pull. And since this is the very function which is called in case of a forced pull or if the historyId is too old, it should not cause any inconsistencies.

Further, a full_pull with the --limit option is considered as a successful full pull by the partial_pull function and I have been able to use it for the past few days now without any inconsistency or a problem. Thus, the --limit option works, if the --limit variable can be made to be a global variable in the .gmailieer.json file (as I have added to this PR). Do let me know if what I said seems right.

@Lattitude75
Copy link
Author

I do not think that any full pull with a limit variable defined is going to pull all the email. Looking at the full_pull function:

 def full_pull (self):
    total = 1

    self.bar_create (leave = True, total = total, desc = 'fetching messages')
.... 
for mset in self.remote.all_messages ():
      (total, gids) = mset

      self.bar.total = total
      self.bar_update (len(gids))

      for m in gids:
        message_gids.append (m['id'])

      if self.limit is not None and len(message_gids) >= self.limit:
        break

The variable mset is limited using the limit variable. So, if we make sure that limit is added to the set variables in .gmailieer.json, the full pull should never cause a problem. Even the scenarios that you listed, both forced and a very old historyId, call the very full_pull function. I have added the commit to add limit to the full_pull variables in this PR. Thus, the end user can just use the gmi set --limit 500 to limit the number of mails to 500 in all the subsequent pull and sync commands. This should not be causing any inconsistencies. Do let me know if I missed something or If I am wrong.

@Lattitude75
Copy link
Author

Sorry, closed it by mistake!

@Lattitude75 Lattitude75 reopened this Jul 15, 2020
@gauteh
Copy link
Owner
gauteh commented Jul 15, 2020 via email

@Lattitude75
Copy link
Author

The last commit I added should fix the problem with the partial pull. It makes sure that the number of messages in the database is always below the limit set using gmi set --limit. I created an extra function nm_messages_to_gids, even though I am aware of messages_to_gids, to avoid the unnecessary processing of messages each time it is run. Lastly, I removed the option --limit in any other individual calls of pull and push, it is now only possible to set it using gmi set. Do take a look, and let me know if it works.

@@ -549,6 +539,17 @@ def remove_from_list (lst, m):

changed = True

#limiting the number of messages in the database to local.config.limit parameter
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might not have the same order as the list from gmail.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, that is the least we can do to make sure that the oldest messages are discarded to maintain the limit. One way might be to chose a different way to sort other than NEWEST_FIRST.

Copy link
Author
@Lattitude75 Lattitude75 Jul 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, now I see the problem. Maybe, we can go for an oldest thread. That way we only delete a set of messages which did not show up in the front in quite sometime.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if there is a reliable way to make notmuch sort messages the same way as gmail does.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it does come pretty close. The only warning the users should face is that the threads at the very end might be incomplete because of the message limit. But, what we can do is to make sure that threads remain complete and thus remove the old threads as a whole until the sum of the messages is below the limit. What do you prefer for this limit?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the recent commit, I decided to keep the threads intact and thus deciding on the number of messages to be deleted based on threads. And, I also added a progress bar to indicate that older messages are being removed.

lieer/local.py Outdated
@@ -413,6 +419,24 @@ def messages_to_gids (self, msgs):

return (messages, gids)

def nm_messages_to_gids (self, msgs):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the existing function, or share code in some way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, using the existing function might affect the performance if the number of new messages is large. Then, making a new array of messages is completely unnecessary and can add to the processing load. Should I still resort to code directly than to make a new function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have still used the function that I have added, let me know if you want me to include the code directly in gmailieer.py instead.

lieer/gmailieer.py Outdated Show resolved Hide resolved
@@ -344,8 +333,8 @@ def _got_msgs (ms):
actions = [ a for a in actions if a ]

# limit
if self.limit is not None and len(actions) >= self.limit:
actions = actions[:self.limit]
if self.local.config.limit is not None and len(actions) >= self.local.config.limit:
Copy link
Owner
@gauteh gauteh Jul 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>= ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not checked this part of the code actually. This does not seem necessary now anyways, if the number of messages itself is limited.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actions is not the same as number of messages. i think if this PR is going to be considered for merging it needs to be thoroughly tested, over a longer time, and shown to work well. many users rely on lieer to work correctly, and we have to make sure to not break setups or cause data loss. it also needs to be tested with different limits and also with no limit. I do not guarantee that it will be merged in the end.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the >= comment refers to mismatching use of sometimes > and sometimes >=.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I did go through the mismatching operations. I did not change any of them and it is your call as they remain the same as in the original repository. On that note in the push function, I do not think the if loops to limit the messages and actions are necessary now. They can be removed. Should I do that in the next commit?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actions is not the same as number of messages. i think if this PR is going to be considered for merging it needs to be thoroughly tested, over a longer time, and shown to work well. many users rely on lieer to work correctly, and we have to make sure to not break setups or cause data loss. it also needs to be tested with different limits and also with no limit. I do not guarantee that it will be merged in the end.

And, yes I will test it for the said scenarios to see if it works well.

@Lattitude75
Copy link
Author

In the final commit, I made the necessary changes like removing the extra function I defined and adding indicators where the limiting action was being taken. I also tested it for the following scenarios:

  • Without setting any limit
  • Setting the limit and then performing multiple pulls
  • Start a full pull without setting the limit, then set the limit to some number below the total messages in my email

And it worked as expected without any issues.

The users should only be informed of one particular use case. When a user has limited the number of messages to, for example, 400 messages and after a while decided to increase the limit to like 1000, the user needs to make a gmi pull -f. Thus, after any increase in the limit parameter set would require a forced pull. In any other case, the partial_pull function would take care to enforce the limit.

@gauteh
Copy link
Owner
gauteh commented Jul 17, 2020 via email

@Lattitude75
Copy link
Author
Lattitude75 commented Jul 17, 2020

But one thought I had was that we should probably limit the number of messages requested from gmail. This would also fix the progressbar. Please check if such an option is possible in the gmail API.

Though the limit in self.remote.all_messages(limit=None) can be used for this purpose and it does work, I do not think it would be a good idea to set the total messages for the progress bar for a use case where the number of messages is less than the limit set. The progress bar then remains incomplete, which is not very good. I think it is better to leave it here as it serves as a counter. In the latest commit, the output looks like this:

pull: full synchronization (forced)
Limit parameter set, number of messages that will be fetched: 400
fetching messages: 400it [00:02, 150.17it/s]
removing deleted: 0it [00:00, ?it/s]
receiving content: 100%|█████████████████████████████████████████████████████████████████████████████| 300/300 [00:19<00:00, 15.25it/s]
receiving metadata: 100%|████████████████████████████████████████████████████████████████████████████| 100/100 [00:02<00:00, 48.12it/s]

This way, the counter fetched messages serves as a final count to the number of messages to be fetched irrespective of whether it is less or more than the limit set.

Edit: By the way, I will be using lieer with this commit for three of my emails with Astroid on a regular basis. This should be enough for testing in my opinion. I'll put in a comment if I notice something fishy.

@Lattitude75
Copy link
Author

Added --unset-limit to gmi set and added the option to Readme.md.

@aspiers
Copy link
Contributor
aspiers commented Sep 22, 2024

Please could some kind soul explain what this PR actually does? It sounds like something which could help solve the problems I'm experiencing with syncing a massive account.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants