[go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find better examples #9

Open
mkrd opened this issue Oct 16, 2022 · 1 comment
Open

Find better examples #9

mkrd opened this issue Oct 16, 2022 · 1 comment

Comments

@mkrd
Copy link
Owner
mkrd commented Oct 16, 2022

First, I appreciate the work you've put into this. But, frankly, I don't know if this is something I could see myself using. The main reason being that it introduces a lot of non-obvious syntax for something as primitive as dict manipulation. I would consider it a burden on whoever is trying to understand my code because this is such a low-level thing to have a customized implementation for. The only trade-off being that I save some time reading KeyError guards or if-statements.

And on the other hand, I'm always unconvinced to utilize a library when examples seem to be constructed strawmen to pit their new implementation against. It's kind of like seeing someone pick a fight with a blind guy; it's obvious you're going to win. The code in your examples is questionably written in my opinion:

for user_name in db:
user_music = None
if user_name in db:
if "interests" in db[user_name]:
if "music" in db[user_name]["interests"]:
user_music = db[user_name]["interests"]["music"]
print(user_music)

out: None
out: ["pop", "alternative"]
Why are we iterating over the keys of db and then checking to see if the keys are in the iterable we're iterating over? That's a given.

Why are we initializing user_music as None? It seems like an attempt to make this code seem even longer and more cumbersome than it really needs to be.

Why are we not utilizing db.items()?

In my opinion, this a more realistic representation of code you'd actually come across in this situation:

for user, user_dict in db.items():
for interest, interest_list in user_dict['interests'].items():
print(interest_list if interest == 'music' else None)

out: None
out: ["pop", "alternative"]
Half the work I do is data analysis and it's also uncommon to encounter data in the form of records with random missing keys (interests I mean). If you find yourself the point where you're handling record data and you don't know if your first nested key is going to exist, let alone the second nested key... It seems like the solution isn't to have more capable dict-handling syntax but to reassess why your data looks like that to begin with and fix that.

I understand that in this context-free example we don't know if interests will be a key for each user. But in reality that would mean an entire column of your records data doesn't exist for an entry. And that's just poor data at that point.

Again, thanks for sharing and I hope this doesn't come off as harsh or discouraging. I just thought I'd share what stopping me and maybe other developers from pip installing your package.

@mkrd
Copy link
Owner Author
mkrd commented Oct 16, 2022

Thanks for the comment! I think you are absolutely right, the example is just bad. I opened an issue to find a better one.

Especially in the academic field, theres often spotty json data you have to work with. I had a project once, where I got terabytes of json files from a crawler, where some attributes where sometimes present, and sometimes not. In this case, it helped a lot to be able to just work with data that is there, and skip if it is None.

That's the thing with json data, that is does not have to adhere to a table schema with rows and columns. I'm not saying that you should structure your data like that, but if you get data that looks like this, then it makes easier to work with it.

PathDict is not perfect by no means, and I think it still has a long way to go, but I know of at least one instance where it is used in production level code and provides a benefit there. I just have to dig up some better examples: Heres one:

crawler_output = {
"crawl_time": "2021-09-01 12:00:00",
"posts": [
{
"id": 1,
"title": "Hello World",
"meta": {
"deleted": True,
"author": "u1"
}
},
{
"id": 2,
"title": "Hello aswell",
"meta": {
"author": "u2"
}
}
]
}

Get all deleted Posts:

deleted_posts = []
for post in crawler_output["posts"]:
if "meta" in post:
if post["meta"].get("deleted", False):
deleted_posts.append(post)

Or

deleted_posts = [post for post in crawler_output["posts"] if post.get("meta", {}).get("deleted", False)]

Remove all deleted posts

db["posts"] = [post for post in crawler_output["posts"] if not post.get("meta", {}).get("deleted", False)]

print("Deleted Posts:")
print(deleted_posts)

PD version get deleted posts

pd = PathDict(crawler_output)
deleted_posts = pd.filtered("posts", lambda x: x["meta", "deleted"])["posts"]
print(deleted_posts)

PD version remove deleted posts

pd.filter("posts", f=lambda x: not x["meta", "deleted"])
In this example we get the output of a crawler which contains a list of posts crawler from a social network. Sometimes it contains a "posts" attribute if there were new posts, and if there are none then it does not exist. Posts can have a deleted flag, in a meta attribute. The meta attribute only exists if there is metadata, and the deleted flag only exists if it is true. Here using a PathDict produces cleaner code. Not even to mention that the regular dict version crashes if "posts" does not exist.

Do you think this is a better example?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant