8000 Redesign of make_request and the returned response object. And use of xmltodict. by elcolumbio · Pull Request #75 · python-amazon-mws/python-amazon-mws · GitHub
[go: up one dir, main page]

Skip to content
This repository was archived by the owner on Jan 4, 2025. It is now read-only.

Conversation

@elcolumbio
Copy link
Contributor
@elcolumbio elcolumbio commented May 27, 2018
  • deleted Datawrapper see below
  • renamed Dictwrapper to Datawrapper introduced functionality for text (reports)
  • use xmltodict
  • introduced DotDict which replaces the ObjectDict. Difference it's by design recursive. So you can use methods on complete nested structures. Like you can at any point of your parsing convert it to a 100 % python data structure build up of dicts and lists. DotDict has useful functionality enable deepcopy for DictWrapper #63, get method, use pprint as default printout, handles cdata correctly.
  • removed some of the implicit try except confusing parts
  • use chardet to guess encoding (same how requests does, but better :) )
  • correctly handles cdata in xml:
    Example old for actual cdata: 'Length': {'Units': {'value': 'inches'}, 'value': '9.00'},
    Example new for actual cdata: 'Length': {'#text': '2.80', '@Units': 'inches'}
    The other 'value' tags are removed.

What should be the response?
sensible defaults are in the parsed property (it seems the name parsed is confusing people):
for XML: DotDict
for Text: decoded text

reverted pull requests #72 #68 those expected a functionality which i have implemented by default.
in case of #72 its better to be explicit, this is possible right now.
check if parsed_response.parsed is None and then fallback to parsed_response.response.content
For XML reports it makes sense to always fallback to the dot_dict as default. I introduced a flag for this.

#68 assumes we would parse or encode or unzip text which we have never done. In this pull request its implemented by design, via the request text method.

The now unified DataWrapper Object allows you to access:

  • a python dictionary only for XML: pydict
  • the headers: headers
  • the request.response object: original

Try it out :)

Also i am happy that i found this as logical base for the DotDict.

@elcolumbio
Copy link
Contributor Author
elcolumbio commented May 28, 2018

removed not used namespaces

If we use xmltodict we can make our codebase much cleaner.
Like i would remove everything about namespaces.
We parse them from the xml and so can everyone who wants to parse the xml himself.
I don't understand namespaces, but i question if we ever helped anybody with those Constants?

Right now we removed all namespaces, without any marker. It looks like it worked, so we can exclude naming cullusions and the need of namespaces.

now we parse them from the response
before often the first thing we did was to remove the namespace without any marker
@elcolumbio elcolumbio changed the title first version of xmltodict replace parsing of xml with the library xmltodict May 28, 2018
@elcolumbio elcolumbio changed the title replace parsing of xml with the library xmltodict Replace parsing of xml with the library xmltodict May 28, 2018
@codecov-io
Copy link
codecov-io commented May 28, 2018

Codecov Report

❗ No coverage uploaded for pull request base (develop@0300af9). Click here to learn what that means.
The diff coverage is 68.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##             develop      #75   +/-   ##
==========================================
  Coverage           ?   81.16%           
==========================================
  Files              ?       18           
  Lines              ?      924           
  Branches           ?       94           
==========================================
  Hits               ?      750           
  Misses             ?      165           
  Partials           ?        9
Impacted Files Coverage Δ
mws/apis/products.py 100% <ø> (ø)
mws/apis/orders.py 100% <ø> (ø)
mws/apis/inbound_shipments.py 100% <ø> (ø)
mws/apis/inventory.py 100% <ø> (ø)
mws/apis/recommendations.py 100% <ø> (ø)
mws/apis/sellers.py 100% <ø> (ø)
mws/apis/subscriptions.py 23.18% <0%> (ø)
mws/apis/reports.py 100% <100%> (ø)
mws/utils.py 82.14% <54.54%> (ø)
mws/mws.py 78.91% <88.05%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0300af9...d37de8c. Read the comment docs.

Florian Benkö added 11 commits May 28, 2018 10:23
just to get an overview
it seems to me we often did not parse anything
i dont understand the hashfunction just by inference i could see the docs and design are just wrong.
- it takes not a string, it takes a bytes object
- you don't need a class for that one string
if you think its too ugly. there a bunch of other ways to do it. I think its ok.
@nateshmbhat
Copy link

sorry if its a basic question. How do I clone this pull request ?

@GriceTurrble
Copy link
Member

Changes merged from #84 .

Florian Benkö added 3 commits December 18, 2018 10:43
it now uses the incremental reading
the old version needed some refactoring anyway.
https://chardet.readthedocs.io/en/latest/usage.html#basic-usage
if this is too slow we can use the old approach besides we use detection on all lines at once.
@elcolumbio
Copy link
Contributor Author
elcolumbio commented Dec 18, 2018

Just looked over everything again.
I only changed the guess_encoding function and some comments.
Now it is nearly impossible to run into encoding errors, i tested even some bytestrings with mixed encodings (malformatted).

Now I am quite happy with the code. But most likely i dont see the critical things.

@elcolumbio
Copy link
Contributor Author

Should i make a new clean pull request?

@GriceTurrble
Copy link
Member

No need, I just need to find the time to get into it. I'll set time aside this week and get cracking.

description=short_description,
long_description=long_description,
packages=['mws'],
packages=['mws', 'mws.apis'],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixing a bug since we introduced the additional folder apis.
It is recommended to do it like this for simple projects, see for e.g. here https://setuptools.readthedocs.io/en/latest/python3.html, its also done with a point instead of a backslash.

@elcolumbio elcolumbio changed the title Very much needed redesign of make_request and the returned response object. And use of xmltodict. Redesign of make_request and the returned response object. And use of xmltodict. Dec 20, 2019
@Bobspadger
Copy link
Member

So just doing a bit of testing by importing it into my product code.

I've only spent 15 minutes so far, (and i'm about to leave the office for Christmas!) but this seems to have broken then exected behaviour of being able to do object.ASIN etc as it now returns a DotDict which no longer returns attributes as before.

I recovered this by using the .parsed method to get the indiviudual objects.

I've also tried with the .pydict option and this too is now nested in a GetCompetitivePricingForASINResult when calling the GetCompetitivePricing api.

I think the old interface was much more user friendly - are we able to refactor this to allow access in the old way so as not to break too much for v1.0 ?

Other than that good work , Merry Christmas!

class DictWrapper(object):
"""
Main class that converts XML data to a parsed response object as a tree of ObjectDicts,
stored in the .parsed property.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was supposed to be the .parsed property before too. And the syntax: response.parsed reads nice.

Copy link
Member
@Bobspadger Bobspadger Dec 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here's a quick example of what I mean:

import mws

SECRET_KEY = ""
AWS_ACCESS_KEY = ""
SELLER_ID = ""
MARKETPLACE_ID = ""

# Setup - the same for both
products_api = mws.Products(access_key=AWS_ACCESS_KEY, secret_key=SECRET_KEY, account_id=SELLER_ID, region='UK')
asin = 'B0107YJU8Q'

competitive_prices = products_api.get_competitive_pricing_for_asin(
    asins=[asin], marketplace_id=MARKETPLACE_ID)


product = competitive_prices.parsed

# For the current version, including the develop branch, this will return the ASIN of the product by using the dot
# notation, the .get() method and a deeper dot notation
# Access the ASIN:
try:
    print(product.ASIN)
except KeyError as exc:
    print('Cannot access this by dot notation')
print(product.get('ASIN')) # this returns the ASIN
print(product.Product.Identifiers.MarketplaceASIN.ASIN) # This returns the ASIN


# xmltodict version
# Sadly, this version does not return the ASIN as it is now @ASIN 
try:
    print(product.ASIN) # This is returning None, should retrun the ASIN
except Exception as exc:
    print('Cannot access the .ASIN')

print(product.get('ASIN')) # This returns None, should retrun the ASIN
print(product.Product.Identifiers.MarketplaceASIN.ASIN)
print(product.get('@ASIN')) # Returns the ASIN, its not very 'clean'

## There are other areas that the same @ symbol is prefixing keys, such as offer listings etc

We are now getting elements returned prefiex with @ which breaks the API.
I feel it should keep the consistent dot notation so it works with all nodes, not just some.

so you should be able to access any node using the dot notation, and not have to change back to the .get('@ASIN) , .get('@belongsToRequester) etc:

{'CompetitivePrice': {'@belongsToRequester': 'false', '@condition': 'New', '@subcondition': 'New', '@xmlns': OrderedDict([('ns2', 'http://mws.amazonservices.com/schema/Products/2011-10-01/default.xsd')]), 'CompetitivePriceId': '1', 'Price': {'LandedPrice': {'CurrencyCode': 'GBP', 'Amount': '99.00'}, 'ListingPrice': {'CurrencyCode': 'GBP', 'Amount': '99.00'}, 'Shipping': {'CurrencyCode': 'GBP', 'Amount': '0.00'}}}}

I think that the dot notation either works fully (so you can get access to whichever bit of the returned data you want, or gets dropped completely and the dictionary method is used (so .get() and data['key'] etc)

Does that make sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a bit of digging and it looks like the xmltodict library won't let us achieve this:

https://github.com/martinblech/xmltodict/blob/master/xmltodict.py#L204

Which is a shame.

I still think its worth a discussion on if this is a breaking change we are happy to use.

@GriceTurrble GriceTurrble changed the base branch from develop to feature-xmltodict-revamp February 25, 2020 18:52
@GriceTurrble
Copy link
Member

Hi folks,

I've been holding on this one for a long while now, but I'd like to bring it into the larger project for further testing and tweaking.

Internal branch feature-xmltodict-revamp is a recent copy of develop, so it serves as a clean starting point we can build on.

@elcolumbio , thank you for the initial contribution here. You are welcome to keep contributing on the internal feature branch so we can get a good final candidate before merging into develop.

@GriceTurrble GriceTurrble merged commit 3f12b76 into python-amazon-mws:feature-xmltodict-revamp Feb 25, 2020
@Bobspadger
Copy link
Member

Hi folks,

I've been holding on this one for a long while now, but I'd like to bring it into the larger project for further testing and tweaking.

Internal branch feature-xmltodict-revamp is a recent copy of develop, so it serves as a clean starting point we can build on.

@elcolumbio , thank you for the initial contribution here. You are welcome to keep contributing on the internal feature branch so we can get a good final candidate before merging into develop.

I think this makes perfect sense.

I think we need to think how the xml2dict should work - as the new implementation has some fairly big changes (making some of the dot access almost pointless / redundant)

@Mitalee
Copy link
Mitalee commented Feb 27, 2020

(making some of the dot access almost pointless / redundant)
Could you elaborate?

I have noticed different responses for single object versus multiple object xml responses - in the parsed version, the single object response is converted to a simple dict, whereas the multi object response is converted to a list. So I had to code in different actions to both events (whether the parsed response was a dict or a list). Maybe this could be taken into account while implementing a new format?

@Bobspadger
Copy link
Member

(making some of the dot access almost pointless / redundant)
Could you elaborate?

I have noticed different responses for single object versus multiple object xml responses - in the parsed version, the single object response is converted to a simple dict, whereas the multi object response is converted to a list. So I had to code in different actions to both events (whether the parsed response was a dict or a list). Maybe this could be taken into account while implementing a new format?

This is something we have discussed about the original implementation a few times.
My in house solution was to make EVERYTHING into a list, even if it was only 1 item long, meaning the rest of my code dealt with the response correctly.

I believe treating the results this way in the module would be beneficial and save people hours of their life testing :)

As for the dot access redundant, on the new xmltodict system, the response.ASIN is no longer accessible via the dot notation as it is response.@ASIN which is not accessible.

I feel this is a really important function to have, as when you are retrieving long lists of results of unknown products, being able to quickly and easily access the ASIN is incredibly important.

Just my feelings on the subject :)

@elcolumbio
Copy link
Contributor Author

That is fine, it was my first bigger contribution and i still have to learn a lot of things. Like not changing too many different things or make the process easier to follow.
@Bobspadger I very much agree with you on the dot accessibility. If i remember correctly those are used because it could be a list and that is not achievable with a normal dictionary. Those conflicts also appear some levels down. Most of the time it works, but there are definetely cases in our production code where it fails (i found: finance api and the different transaction types, the lists gets overwritten by unique dictionary keys).
But i believe there is a pragmatic solution for dot access. With one additional rule like Bobspadger mentioned.

@Bobspadger
Copy link
Member

So we have two discussions going on here it looks like.

  1. Dot access to the root of the response for things like .ASIN
  2. Always returning a list of items

Does that sound fair?

@Bobspadger
Copy link
Member

That is fine, it was my first bigger contribution and i still have to learn a lot of things. Like not changing too many different things or make the process easier to follow.
@Bobspadger I very much agree with you on the dot accessibility. If i remember correctly those are used because it could be a list and that is not achievable with a normal dictionary. Those conflicts also appear some levels down. Most of the time it works, but there are definetely cases in our production code where it fails (i found: finance api and the different transaction types, the lists gets overwritten by unique dictionary keys).
But i believe there is a pragmatic solution for dot access. With one additional rule like Bobspadger mentioned.

Its all good @elcolumbio its fantastic works , keep it up.

As long as we can all agree on the route forward thats the best for everyone then everything will be awesome!

https://youtu.be/StTqXEQ2l-Y

@GriceTurrble GriceTurrble mentioned this pull request Aug 28, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants

0