[RFC] [Meta] Design Reconsiderations #670

sigmavirus24 · 2017-01-21T17:06:28Z

So I'm beginning to have some doubts about the current direction of github3.py's design, specifically around:

Empty
NullObject

It seems far too many people want to deal with something that's not a "NullObject" or want to treat it like None. The way it's currently implemented, it actually does behave like None. You can still do

import github3

gh = github3.login(...)
real_organization = gh.organization('github3py')
fake_organization = gh.organization('non-existent-org')

if real_organization:
    print('github3py exists')

if fake_organization:
    print('non-existent-org exists')

And see what you expect

That said, people seem to prefer None to NullObject and that's fine, but, I'm wondering how appropriate None is. We do return it carefully from methods. In the example above, the request for non-existent-org will return a 404. That is the only time we return NullObject (or None). With that in mind, I think using None there is valid.

That brings us to using None for attribute values parsed from the API. Previously, we were using None for attributes returned as null as well as attributes that didn't exist in the response. Why did we do this? Because I believed (and still do believe) that having an attribute is better than having an AttributeError, especially when on one instance of an object you get a value and on another you might get an AttributeError.

This + a false idea of DRY led me to think that having something like Empty would be the best option for users. I'm now convinced that I was wrong.

Instead, I think we should approach this differently. For some resources there are two different (although intersecting) representations. For example, for users there is a short user listing and a long one. The short listing is returned when you list all users, e.g., gh.all_users(). The long one is returned when you request a specific one, e.g., gh.user('itsmemattchung').

Thus, I'm proposing a new way of dealing with this:

Let's create separate classes (that don't inherit from each other) for resources that have these separate representations. For example (names are up for debate) ShortUser and User.

The Short<X> classes should understand their relation to X classes so that the Short<X> classes can return an instance of X when a user calls refresh(). This is because currently, users can do the following to get full information on users:
```
for user in gh.all_users():
    user = user.refresh()
    # ...
```
This will just mean that we introduce something that inherits from the base class for the short classes and overrides the refresh method and looks for a class attribute that defines the longer class.
Remove Empty and its usage.

After we've done a few of these classes, determine the best way to reduce duplication. I suspect what we'll end up doing is something like this:

class _User(models.GitHubObject):
    def _update_attributes(self, json):
        # ... Set up attributes for ShortUser

    # define all methods already on a User


class User(_User):
    """ Docstring with examples """

    def _update_attributes(self, json):
        super(User, self)._update_attributes(json)
        # finish setting up attributes that are present for /users/:user_name


class ShortUser(models.ShortRefresh, _User):
    """ Docstring with examples """
    long_form = User

Is this more convoluted than using Empty? From an implementation stand-point, yes. From a user stand-point it offers the explicit that was missing. It also eliminates the confusion that users seem to be encountering.

Work Items

Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.

The text was updated successfully, but these errors were encountered:

sigmavirus24 · 2017-01-28T00:19:28Z

I'm going to interpret no response as tacit approval. =P

After a couple weeks of users trying this out, it has revealed itself to be far too confusing. Users do not enjoy it and it makes them scratch their heads. This still needs to also be excised from our testsuite. Related-to gh-670

Now that we've excised the usage of Empty and its presence in github3.py, let's remove all references to it in our testsuite. Related-to #670

Now that we've removed Empty from our code and tests, remove it from our documentation. Related-to gh-670

As part of on going work to improve the overall design and simplicity, we're removing the NullObject and it's usage. At the present time, it causes a lot of confusion for users (not as much as Empty, but still up there) and there are simpler ways of designing the library than this. Related-to #670

esacteksab · 2017-01-30T16:40:45Z

Apologies for the lack of response.

Life got a little chaotic.
I haven't touched this repo in quite some time so felt my voice shouldn't matter.
I haven't touched python in any real capacity in over two years, so I'm a dead fish.

Here's the only thing I want to add. Don't hide Github's API responses. They annoy me because if you search for a repo that does exist, but you're not logged in, they return a 404 rather than a 401/403. I think this is some security by obscurity and a script kidding will stop with a 404, but a 401/403 says try harder. /tangent /rant

Anyways, I want to know that Github returned that 404, so don't obscure/obfuscate their response.

My $.03 cents.

sigmavirus24 · 2017-01-31T00:11:38Z

Don't hide Github's API responses.

So let's approach this with you being a user. What would you want back to not hide/obscure it? An exception? A different object that's a little like a nullobject? You can still rely on it being falsey, but it has extra data and doesn't pretend to have every method imaginable? I'm not sure what the right thing is here that won't obscure important information and won't confuse users.

esacteksab · 2017-01-31T00:31:29Z

My stupid is gonna shine through, please allow some ignorant pseudo code

Scenario:
I'm not logged-in/auth'd. "my_repo" is a private repo.

repo = gh3.get_repo("my_repo")

I'd expect something like

repo.response() or repo.reponse_text()

And it pukes "404. LOLNOPE" << this being the response from GH's API.

Don't hide that response.

guyzmo · 2017-01-31T14:09:40Z

if I may add $0.02 to @esacteksab to make it $0.05:

it'd be nice to offer the option of raising an error if the call does not succeed and we've passed some sort of raise_errors attribute to github3.py. That way, you might offer a general use case where it returns None if there's no result, but get the full HTTP response (as a NullObject in Exceptions.args or something like that) if you really want to discriminate the error.

sigmavirus24 · 2017-01-31T14:23:09Z

@guyzmo so presently all exceptions contain the response (or if there is no response, the original exception).

Here's my only counter-point to @esacteksab's thoughts: The only time we return None from a method is a 404, and there's nothing in the 404 response that has any value for someone trying to debug the issue.

I'm also firmly of the opinion that libraries like this should choose one design and should not be configurable to change fundamental behaviour.

We can absolutely default to raising 404s as exceptions if people would find that useful, but I still fail to see the value in forcing users to litter their code with

try:
    gh.<something>(...)
except github3.exceptions.NotFound:
    # handle it, or not

guyzmo · 2017-01-31T14:38:22Z

Your library is by far the most comfortable one to use to control a git service (compared to gitlab's ones or bitbucket's ones), so whatever you decide, I'll be happy using it ☺ That said:

[…] I still fail to see the value in forcing users to litter their code […]

it depends on the use case, the real advantage with exceptions is to let them flow and catch them up in the stack:

def do_something_with_issue(…):
        gh = …
        repository = gh.repository(…)
        issue = repository.issue(…)
        use(issue)

def main():
    try:
        do_something_with_issues(…)
        …
    except github3.exceptions.NotFound:
        # nice error
    except github3.exceptions.NotAuthorized:
        # nice error

there you don't need to clutter your code with neither:

if not repository:
    # return or raise?

you could even safely write:

gh.repository(…).issue(…).owner

and get a proper error instead of an AttributeError.

sigmavirus24 · 2017-01-31T15:27:28Z

True. All that said, it's rather easy to flip this switch right now. I'm hacking on splitting up the User object right now. I think I'll tack this onto the list.

This will simplify our internals too. If doing self._json(response, 200) raises an exception, then we don't have to worry about it returning something that isn't json to pass into an object. the code then looks like

url = self._build_url(...)
params = {...}
response = self._request(VERB, url, params=params)
json = self._json(response, SUCCESS_STATUS_CODE)
return Object(json, self)

guyzmo · 2017-01-31T15:51:06Z

Generally speaking, I think the most pythonic strategy with errors is 'fail early, fail loudly' (explicit etc.).

I'd say, though, it'd be nice to have in the exception in information about the github object that was queried and triggered the exception. In my example earlier, it'd be great to print an error with print("Not found happened when pulling {} from github.".format(err.args[0]).

sigmavirus24 · 2017-01-31T16:46:33Z

I'd say, though, it'd be nice to have in the exception in information about the github object that was queried and triggered the exception.

As in the object/object type and method where the exception came from? Yeah no. That will never happen. A few reasons why:

That will mean that objects will have longer lifetimes, including objects that hold onto sessions (which hold onto sockets, etc.) and which will cause no end of headaches for anyone using this library in a long-running application.
To do that, we would have to lift the exception raising pieces up out of where they would live and add a ton more information. That's far too much work to do for every method that could raise a 404.
Introspection about the current type is flimsy and may (read: probably) will break if further updates aren't carefully reviewed. That's far too much pressure for an entirely volunteer team.

guyzmo · 2017-01-31T16:50:03Z

well, of course to ①, ② and ③!

I was just thinking of something like a non-instanciated Repository class, or Issue class, so we know what we could have had but failed to get.

An alternative would be to have in the exception message the name of the thing we tried to get, but failed to. But I guess, we can also get that by looking at the Response object carried by the exception.

itsmemattchung · 2017-01-31T17:03:40Z

Not much comment, only a side note: Sorry for lagging so hard—a lot of free time chewed up in the last couple weeks.

sigmavirus24 · 2017-01-31T18:28:48Z

I was just thinking of something like a non-instanciated Repository class, or Issue class, so we know what we could have had but failed to get.

Nah. I don't like that at all. I know I'll get bug reports in that case because people will have a repo object but no data.

So far we have two Repository objects (ShortRepository, Repository). In looking at API responses, it seems GitHub has removed many Repository attributes altogether. I'll need to investigate this further. Related-to: gh-670

Julian · 2017-03-09T15:22:02Z

I'm quite late to the party, and quite underinformed overall, just a new user :), so take the following with a grain of salt, but:

To me a few things are pretty clear:

As a user, I'd definitely prefer an exception over both a null object and an error object like None.

The null object pattern is fantastic, but I think it's the wrong thing here. This is pretty standard Python seemingly -- an error condition has happened, and as a user I want immediate notification that something has gone wrong. A data-less Repository object (or any other) just delays that inevitable fact, and None, or any other status response, just isn't rich enough.

As a specific example, the current app I'm porting over wants to know what scopes the token that is being used was configured with, with the knowledge that the data we need requires the repo scope. The GitHub API will respond back with an X-OAuth-Scopes header in any 404 response with that info, and my app would love to pull that out to say "you misconfigured your token" instead of simply reporting "there's no such repo". With a None, there's no such option.

Having the option to make github3.py abstract even more of that away from users, like doing the work to pull out the X-OAuth-Scopes header itself and possibly turn it into its own richer exception, seems tempting, but I suspect will involve quite a bit of work and maintenance to keep up with API changes.

As for splitting objects up: that also seems to make sense to me, but I'm skeptical about how much that seems to be being exposed to end-users of the library. I'd have to look closer, but my intuition tells me that the fact that GitHub returns partial objects is something that can be much more hidden from end users -- e.g. you have some _IncompleteResponse object which encapsulates the partial response and also a way to make it reify into a complete one, but for users, that reification is transparent, it happens just if a user tries to use things that were not present in the original partial response.

I think this probably shouldn't require individual implementations for each model object, it'd just require each model to be a bit more transparent about the data it needs to be whole, but I could be wrong.

On the other hand, I've already been bitten by github3.py making network requests at times I did not expect it to, so potentially users should want that implementation detail to be more obvious, but even so, I suspect it could be done in a way that did not require individual implementations for each model.

Would love to discuss any of this further, especially if I'm way off here.

sigmavirus24 · 2017-03-09T15:55:53Z

As a specific example, the current app I'm porting over wants to know what scopes the token that is being used was configured with, with the knowledge that the data we need requires the repo scope. The GitHub API will respond back with an X-OAuth-Scopes header in any 404 response with that info, and my app would love to pull that out to say "you misconfigured your token" instead of simply reporting "there's no such repo". With a None, there's no such option.

So, every exception includes a response attribute. That should allow you to introspect that once we start raising exceptions instead of returning None.

On the other hand, I've already been bitten by github3.py making network requests at times I did not expect it to

That's concerning. We've tried to make that completely transparent sans adding to every method that it does/does not make a network request.

Can you demonstrate the code (in a separate issue) that surprised you?

This creates 3 new objects, one for iterations (Short), one for direct GETs (Organization), and one for events (EventOrganization). Most attributes can now be directly assigned, aside from a few that are only returned in the API if they are set on the server. Default those to None. Many test cassettes needed to be updated to pick up attributes added to orgs since the cassette was recorded (yay for additive REST APIs...). Introduce a 'auto_login' method to handle either token auth or username/password auth. Use it in the places I needed to update the cassette. Fix a few POST calls that were sending malformed (according to GitHub API) json data. Specifically an empty permission is not accepted. Related-to: sigmavirus24#670 Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

Clean them up as done in sigmavirus24#680 for users.py Related-to: sigmavirus24#670 Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

This creates 3 new objects, one for iterations (Short), one for direct GETs (Organization), and one for events (EventOrganization). Most attributes can now be directly assigned, aside from a few that are only returned in the API if they are set on the server. Default those to None. Many test cassettes needed to be updated to pick up attributes added to orgs since the cassette was recorded (yay for additive REST APIs...). Introduce a 'auto_login' method to handle either token auth or username/password auth. Use it in the places I needed to update the cassette. Fix a few POST calls that were sending malformed (according to GitHub API) json data. Specifically an empty permission is not accepted. Related-to: sigmavirus24#670 Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

Related-to sigmavirus24#670 Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

This creates 3 classes; one for iterations (Short), one for direct GETs (PullRequest), and one for events (EventPullRequest). Most attributes are directly assigned, except where otherwise commented. A couple cassettes needed to be updated, and the sample json for pull_request needed to be updated as well. Updating that one required updating the tests for the now current data. Related-to sigmavirus24#670 Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

This creates 3 classes; one for iterations (Short), one for direct GETs (Issue), and one for events (EventIssue). Most attributes are directly assigned, except where otherwise commented. Some cassettes needed to be updated for the relatively new 'assignees' attribute that now comes back. A sample json needed to be updated as well. Cassettes that needed updated that were associated with authenticated calls got the calls updated to use auto_login as well. The search tests needed to be updated as well, label name changed and syntax changed slightly. Related-to sigmavirus24#670 Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

This creates 3 classes; one for iterations (Short), one for direct GETs (Issue), and one for events (EventIssue). Most attributes are directly assigned, except where otherwise commented. Some cassettes needed to be updated for the relatively new 'assignees' attribute that now comes back. A sample json needed to be updated as well. Cassettes that needed updated that were associated with authenticated calls got the calls updated to use auto_login as well. Some cassettes will break between older requests and newer, so tag those tests accordingly. Allow the tests to work in older requests land for now. The search tests needed to be updated as well, label name changed and syntax changed slightly. Related-to sigmavirus24#670 Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

The Gist and GistFile objects have both short and regular representations. Further, we needed a separate representation for Gist forks that are part of the Gist representation. Refs gh-670

This started as an effort to break up Release classes for "Short" versions. However I discovered that there is no difference in data when iterating over a Release or an Asset vs a direct GET of them. I still cleaned up the classes to be more in line with other efforts. Remove custom header as Releases are out of prerelease. Update the example data sets with real release data from our own repository. Account for upload url differences from real data. Related: sigmavirus24#670 Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

omgjlk · 2018-01-24T21:22:25Z

So @sigmavirus24 , where do we stand on this? I haven't gone digging to see if there is more to revamp, but is there an overall test or something we can use to see if our coverage is right? What else needs to be done to consider this ready, and to consider a release?

sigmavirus24 · 2018-01-27T13:12:49Z

Sorry @omgjlk there's a lot to parse in this thread. I added two more items that I think help summarize what I understand to be left. I'm kind of covering the last item in my WIP Pr.

omgjlk · 2018-01-27T16:55:20Z

Thanks. We should probably add the open PR (I'll be fixing it this weekend) to make session non-optional.

Also raise exceptions on any 400 or greater status code as discussed in gh-670 As it stands today, it's hard for people to determine whether None is returned because that's the API's response or because None is what we return in the event of a 404 or some other "error"-like status code. This change undoes that ambiguity and allows people to handle missing or permission-related problems using exceptions.

This adds logic so that Short* classes can return the fuller representation. For example: >>> import github3 >>> users = [(u, u.refresh()) for u in github3.all_users(10)] >>> for short_user, user in users: ... assert isinstance(short_user, github3.users.ShortUser) ... assert isinstance(user, github3.users.User) Will run without exceptions. Related to gh-670

…ion). See sigmavirus24#730 for some background information. This still fails 1 test because the cassette Repository_collaborators.json needs to be updated with the permissions as it somehow does not contain that information. The GitHub API does say that it should be there, but the recording is from 2014 so perhaps it wasn't returned back then. Note that I have currently put the named parameter `affiliation` at the front of the arguments in `Repository.collaborators`. I did this to be in line with the other similar iterators. This can be moved to the back, but I assume some backwards-incompatible changes are fine for 1.0.0? This uses separate classes for the specific User object uses as I understood the idea was from sigmavirus24#670.

…ion). See sigmavirus24#730 for some background information. Note that I have currently put the named parameter `affiliation` at the front of the arguments in `Repository.collaborators`. I did this to be in line with the other similar iterators. This can be moved to the back, but I assume some backwards-incompatible changes are fine for 1.0.0? This uses separate classes for the specific User object uses as I understood the idea was from sigmavirus24#670.

sigmavirus24 assigned esacteksab and itsmemattchung Jan 21, 2017

sigmavirus24 added the Request For Comments label Jan 21, 2017

sigmavirus24 added this to the 1.0 milestone Jan 21, 2017

sigmavirus24 mentioned this issue Jan 21, 2017

how to check if no resource was found? #665

Closed

sigmavirus24 mentioned this issue Jan 29, 2017

Excise github3.empty.Empty #676

Merged

sigmavirus24 added a commit that referenced this issue Jan 29, 2017

Remove tests that affirm existence and use of Empty

6109f33

Now that we've excised the usage of Empty and its presence in github3.py, let's remove all references to it in our testsuite. Related-to #670

sigmavirus24 added a commit that referenced this issue Jan 29, 2017

Remove Empty from our documentation

10d625b

Now that we've removed Empty from our code and tests, remove it from our documentation. Related-to gh-670

sigmavirus24 mentioned this issue Jan 29, 2017

Docs show a function issues_on for GitHub object, but I am getting attribute error #677

Closed

sigmavirus24 mentioned this issue Jan 29, 2017

Excise NullObject from github3.py #678

Merged

sigmavirus24 mentioned this issue Feb 2, 2017

Split User object into more accurate objects #680

Merged

sigmavirus24 mentioned this issue Feb 4, 2017

Split up Repository object #681

Merged

sigmavirus24 pushed a commit to omgjlk/github3.py that referenced this issue Dec 18, 2017

Clean up docstrings in github3.orgs

5fb0817

Clean them up as done in sigmavirus24#680 for users.py Related-to: sigmavirus24#670 Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

omgjlk added a commit to omgjlk/github3.py that referenced this issue Dec 19, 2017

Update docstrings for pulls.py

511675f

Related-to sigmavirus24#670 Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

omgjlk mentioned this issue Dec 19, 2017

Implement PullRequest refactor #749

Merged

omgjlk mentioned this issue Dec 20, 2017

Split up Issue objects #751

Merged

sigmavirus24 mentioned this issue Dec 26, 2017

Split-up Gist-related objects as necessary #753

Merged

omgjlk mentioned this issue Jan 2, 2018

GitHub.repository objs returned by gh.iter_all_repos() missing created_at (and other) fields #760

Closed

omgjlk mentioned this issue Jan 4, 2018

Clean up and enhance Release objects #764

Merged

omgjlk unassigned itsmemattchung Jan 24, 2018

sigmavirus24 mentioned this issue Mar 13, 2018

Raise exceptions for unexpected status codes #789

Merged

sigmavirus24 mentioned this issue Mar 13, 2018

Allow .refresh to return the longer object #790

Merged

sigmavirus24 closed this as completed Mar 13, 2018

bkruger99 mentioned this issue Jul 24, 2018

github3.null is gone jlevy/ghizmo#4

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[RFC] [Meta] Design Reconsiderations #670

[RFC] [Meta] Design Reconsiderations #670

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!