8000 Support for arbitrary range types · Issue #66 · jinja2cpp/Jinja2Cpp · GitHub
[go: up one dir, main page]

Skip to content

Support for arbitrary range types #66

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Manu343726 opened this issue Oct 7, 2018 · 23 comments
Closed

Support for arbitrary range types #66

Manu343726 opened this issue Oct 7, 2018 · 23 comments
Assignees
Labels
die hard Difficult to fix/implement bug or feature enhancement New feature or request
Milestone

Comments

@Manu343726
Copy link
Contributor
Manu343726 commented Oct 7, 2018

first of all, tag this as "feature-request" "discussion" etc, for the first time I'm not coming here with a bug report :P

I've been thinking about reflecting any kind of iterable range type. By range I mean any type where calls to std::begin() and std::end() are valid.

I see two advantages of implementing that interface:

  • Out of the box support for any container: As long as you're doing a foreach loop (which is the case in most of the template loops) it works. Of course we could provide a fake size() method that does std::distance(), for the uncommon case an user applies a length filter on the range.

  • Lazy ranges: This could be a game changer. By accepting arbitrary range types users could pass lazy ranges to the template, which means true zero-copy API binding. In my case, with @foonathan's cppast, a coroutine range implementing the AST visit comes to my mind.

@flexferrum flexferrum added the enhancement New feature or request label Oct 7, 2018
@flexferrum
Copy link
Collaborator

I suppose, you can do it in the same way as ContainerReflector implemented:
https://github.com/flexferrum/Jinja2Cpp/blob/a146f5bdae19d6fe2eff165f03ccf20cde4ddce8/include/jinja2cpp/reflected_value.h#L114

Actually, I thought about generic range reflectors. But I postponed it till the main Jinja2 features would be implemented and debugged. And you should always keep in mind objects lifetime and prevent situation you run across recently. With arbitrary/generator that's even easier than with temporary containers.

The second point: good implementation of this feature requires refactoring of the current for statement implementation and GenericList data type. It should be closer to the bare Jinja2 when loop 'knows' is it necessary to calculate size of the iterable range and so on. And iterates it not by the index, but by the iterator.

@Manu343726
Copy link
Contributor Author
Manu343726 commented Oct 8, 2018

I suppose, you can do it in the same way as ContainerReflector implemented

I'm not sure, since you're handling the range there by index. The point of lazy ranges is that the iterator is visited only once, with the iteration going from begin to end. I would instead write a generic range for iteration adapter (For the {% for item in range %} loops). Something like the boost::range::any_range, but that type-erases the value type too.

@Manu343726
Copy link
Contributor Author

Just figured out a way: With range-v3 we could use a transform view to take any range value and transform it into a jinja2::Value:

template<typename Range>
ranges::v3::any_view<jinja2::Value> make_type_erased_range(const Range& range)
{
    return {range | ranges::v3::transform([](const auto& item)
    {
        return jinja2::Reflect(item);
    })};
}

Now you can wrap a ranges::v3::any_view<jinja2::Value> in a generic range interface:

@flexferrum
Copy link
Collaborator

I'm not sure, since you're handling the range there by index.

I see no easy way to avoid this till Jinja2 lists/tuples should support both index-based and range-based access concept. From another point of view, lists in bare Jinja2 could be generator-based (which can cause tricky errors in loops according to their issues tracker). The one of possible ways to allow index-based access only if underlying collection is support it.

Now you can wrap a ranges::v3::any_viewjinja2::Value in a generic range interface:

You missed the interface definition. :) Personally I tends to implement .Net-based collection access style. I.e. introduce something like IEnumerable/IEnumerator interface which is (usually) quite simple. And then implement it for any type of collections/ranges.

@Manu343726
Copy link
Contributor Author

Here comes the full example of what I think Jinja2Cpp could be capable of with this generic range support:

template<typename AstEntity>
using coro_t = boost::coroutine2::coroutine<type_safe::object_ref<const AstEntity>>;

template<typename AstEntity>
using yield_t = coro_t<AstEntity>::push_type;

template<typename AstEntity, typename Predicate>
void visit(yield_t<AstEntity>& yield, const cppast::cpp_entity& root, Predicate predicate)
{
    cppast::visit(root, [&yield, predicate](const cppast::cpp_entity& entity, const cppast::cpp_visitor_info& info)
    {
        if(entity.kind() == AstEntity::kind() && predicate(root, entity, info))
        {
            yield(type_safe::cref(entity));
        }
    });
}

for(const auto& class_ : visit<cppast::cpp_class>(parser.parse("foo.hpp"), [](const auto& root, const auto& entity, const auto& info)
{
    return entity.name() == "MyClass";
}))
{
    std::cout << "Another class named \"MyClass\" found!\n";
}

template<>
struct TypeReflection<cppast::cpp_file>
{
    ...
    {"classes", [](const cppast::cpp_file& file) { return jinja2::Reflect(visit<cppast::cpp_class>(file, ...)); }}
    ...
};

@Manu343726
Copy link
Contributor Author
Manu343726 commented Oct 8, 2018

You missed the interface definition. :)

I think that's my main problem, that I've tried to see the full example of how a generic interface could be implemented and used to do TypeReflection for my custom container/range type. I think once I see it in full we will agree 100%.

Personally I tends to implement .Net-based collection access style. I.e. introduce something like IEnumerable/IEnumerator interface which is (usually) quite simple. And then implement it for any type of collections/ranges.

That's perfectly fine for me, since it's basically a different POV of the same problem. Whether the range is accessed by a pair of iterators or by a Iterable interface is not the issue I think (Both could be translated from one to the other by an adapter).

@Manu343726
Copy link
Contributor Author

and used to do TypeReflection for my custom container/range type. I think once I see it in full we will agree 100%.

After reading your response in #67, now I see that the solution is not TypeReflection but implementing the container interface and specializing the reflector in jinja2::detail namespace to use that interface to access the ranges.

@flexferrum
Copy link
Collaborator

Yep.

@flexferrum
Copy link
Collaborator

That's perfectly fine for me, since it's basically a different POV of the same problem. Whether the range is accessed by a pair of iterators or by a Iterable interface is not the issue I think (Both could be translated from one to the other by an adapter).

I keep in mind this feature, but due to other activities (and issues) I'm planning to start implementing it not early than the beginning of November. Have to prepare for the conference talk.

@Manu343726
Copy link
Contributor Author

Conference talk? Where and when?

@flexferrum
Copy link
Collaborator
flexferrum commented Oct 8, 2018

In Minsk, at the Corehard++. Will introduce my Sutter's metaclasses out-of-compiler implementation.

@Manu343726
Copy link
Contributor Author

Will introduce my Sutter's metaclasses out-of-compiler implementation

Whoa, that's awesome! It's one of the experiments I wanted to do with tinyrefl. Do you have link to the project or something?

@flexferrum
Copy link
Collaborator

Implementation prototype is here: https://github.com/flexferrum/autoprogrammer/tree/metaclasses/src/generators/metaclasses

And here the usage sample:
https://github.com/flexferrum/autoprogrammer/tree/metaclasses/test/metaclasses

@Manu343726
Copy link
Contributor Author

I've been reading your sources in depth to try to implement this myself, but I've found that the for evaluation is much more complex than I thought :(

@flexferrum
Copy link
Collaborator

Honestly, I took this particular part (statements evaluator) from clang constexpr evaluation module. But only this. And as far as I can understand (and clang guys wrote in the docs) it's impossible to implement such kind of interpreter over the libclang-c API. You need access to the full AST, not only reflected (inside libclang-c) and erasured subset.

@Manu343726
Copy link
Contributor Author

In case I would like to implement arbitrary range support myself, could you please provide me with a todo list or something? For me, the module/part most difficult to understand is how the template for statements are evaluated. A full explanation of the process would help a lot :)

@flexferrum
Copy link
Collaborator

Hm... Ok. I'll try. 'From the stove', as we use to say. :)

User-provided lists are described buy the two types: ValuesList, which is simple vector of Value and GenericList which covers all reflection-related cases. The interface of the GenericList defined as follows:

struct ListItemAccessor
{
    virtual ~ListItemAccessor() {}

    virtual size_t GetSize() const = 0;
    virtual Value GetValueByIndex(int64_t idx) const = 0;
};

I.e. simple (primitive) index-based enumerator. After params preprocessing inside Jinja2Cpp both ValuesList and GenericList are converted to the ListAdapter generic list type:

class ListAdapter
{
public:
    ListAdapter() {}
    explicit ListAdapter(ListAccessorProvider prov) : m_accessorProvider(std::move(prov)) {}
    ListAdapter(const ListAdapter&) = default;
    ListAdapter(ListAdapter&&) = default;

    static ListAdapter CreateAdapter(InternalValueList&& values);
    static ListAdapter CreateAdapter(const GenericList& values);
    static ListAdapter CreateAdapter(const ValuesList& values);
    static ListAdapter CreateAdapter(GenericList&& values);
    static ListAdapter CreateAdapter(ValuesList&& values);

    ListAdapter& operator = (const ListAdapter&) = default;
    ListAdapter& operator = (ListAdapter&&) = default;

    size_t GetSize() const
    {
        if (m_accessorProvider && m_accessorProvider())
        {
            return m_accessorProvider()->GetSize();
        }

        return 0;
    }
    InternalValue GetValueByIndex(int64_t idx) const;
    bool ShouldExtendLifetime() const
    {
        if (m_accessorProvider && m_accessorProvider())
        {
            return m_accessorProvider()->ShouldExtendLifetime();
        }

        return false;
    }

    ListAdapter ToSubscriptedList(const InternalValue& subscript, bool asRef = false) const;
    InternalValueList ToValueList() const;

    class Iterator;

    Iterator begin() const;
    Iterator end() const;

private:
    ListAccessorProvider m_accessorProvider;
};
https://github.com/flexferrum/Jinja2Cpp/blob/a146f5bdae19d6fe2eff165f03ccf20cde4ddce8/src/internal_value.h#L180

This type has iterator-based access and used in every place where list processing is involved (`for` loop, filters etc.). Type is actually kind of type-erasure and interacts with the underlying list via implementation of the `IListAccessor` interface which is similar to `ListItemAccessor`:

```c++
struct IListAccessor
{
    virtual ~IListAccessor() {}

    virtual size_t GetSize() const = 0;
    virtual InternalValue GetValueByIndex(int64_t idx) const = 0;
    virtual bool ShouldExtendLifetime() const = 0;
};

All desired wrappers created here: https://github.com/flexferrum/Jinja2Cpp/blob/a146f5bdae19d6fe2eff165f03ccf20cde4ddce8/src/internal_value.cpp#L193

There is also ConvertToList utility function which can convert provided arbitrary Value/InternalValue to list. If it's possible: https://github.com/flexferrum/Jinja2Cpp/blob/a146f5bdae19d6fe2eff165f03ccf20cde4ddce8/src/internal_value.cpp#L114

So, closer to for loop. The basic logic is simple:

  1. Take value
  2. Convert it to list
  3. Apply loop body for every list item

This is the basic logic. But actual behavior of for loop is more complex:

  1. There is a loop var with different fields such as revindex, revindex0, length etc. Calculation of this fields involves knowledge of actual input list size.
  2. else statement which should applied if no items was rendered during the loop processing
  3. if part which causes conditionally items rendering (but all indexing, length etc. should reflect the actually rendered items number)

So, the full execution of for loop is following:

In this line: https://github.com/flexferrum/Jinja2Cpp/blob/a146f5bdae19d6fe2eff165f03ccf20cde4ddce8/src/statements.cpp#L25
loop.loop() callable is constructed. It allows to reenter the loop body in case of recursive call.

Here:
https://github.com/flexferrum/Jinja2Cpp/blob/a146f5bdae19d6fe2eff165f03ccf20cde4ddce8/src/statements.cpp#L45

input value is converted to the list. If failed, the else branch is rendered.

Here:
https://github.com/flexferrum/Jinja2Cpp/blob/a146f5bdae19d6fe2eff165f03ccf20cde4ddce8/src/statements.cpp#L55

the input list is preliminary filtered with if expression if present. It allows to keep result list indexes (and size) consistent with the number of filtered items.

Here:
https://github.com/flexferrum/Jinja2Cpp/blob/a146f5bdae19d6fe2eff165f03ccf20cde4ddce8/src/statements.cpp#L77
loop var is initialized and list is iterated by index. In case of loops like this: {% for k, v in items %} the corresponding fields are extracted from the current list item and passed to the body executor. Overwise list item passed as-is.

Here:
https://github.com/flexferrum/Jinja2Cpp/blob/a146f5bdae19d6fe2eff165f03ccf20cde4ddce8/src/statements.cpp#L106

the else statement is executed if no items were rendered.

That is how I was implemented it. As you see, there could be some problems with moving it to generic iterator/enumerator-based list processing. But in pandor/jinja these problems are solved. What can we do here I'll describe later.

@Manu343726
Copy link
Contributor Author
Manu343726 commented Oct 16, 2018

There is a loop var with different fields such as revindex, revindex0, length etc. Calculation of this fields involves knowledge of actual input list size.

I see that these are part of Jinja2 standard variables available inside a for block. I will copy the full table here for reference:

Variable Description Implemented by Jinja2Cpp Semantic equivalence in non-sized ranges with the least restricted iterator (forward iterator)
loop.index The current iteration of the loop. (1 indexed)
  • [ yes ]
number of iterations
loop.index0 The current iteration of the loop. (0 indexed)
  • [ yes ]
number of iterations - 1
loop.revindex The number of iterations from the end of the loop (1 indexed)
  • [ no ]
loop.revindex0
  • [ no ]
loop.first True if first iteration.
  • [ yes ]
true if first iteration
loop.last True if last iteration.
  • [ yes ]
true if std::advance(it) == end()
loop.length The number of items in the sequence.
  • [ yes ]
loop.cycle A helper function to cycle between a list of sequences. See the explanation below.
  • [ yes ]
loop.depth Indicates how deep in a recursive loop the rendering currently is. Starts at level 1
  • [ no ]
loop.depth0 Indicates how deep in a recursive loop the rendering currently is. Starts at level 0
  • [ no ]
loop.previtem The item from the previous iteration of the loop. Undefined during the first iteration.
  • [ yes ]
loop.nextitem The item from the following iteration of the loop. Undefined during the last iteration.
  • [ yes ]
loop.changed(*val) True if previously called with a different value (or not called at all).
  • [ no ]

@flexferrum
Copy link
Collaborator
flexferrum commented Oct 16, 2018

(updated your comment and put marks into the third column)

Ok, let's continue. As far, as I can understand, here is the python implementation of the for loop;
https://github.com/pallets/jinja/blob/7a6704db55fcd9bbe5a90c3868e03f5a5fcf176a/jinja2/runtime.py#L346

In particularly, look at this piece of code:

def length(self):
        if self._length is None:
            # if was not possible to get the length of the iterator when
            # the loop context was created (ie: iterating over a generator)
            # we have to convert the iterable into a sequence and use the
            # length of that + the number of iterations so far.
            iterable = tuple(self._iterator)
            self._iterator = iter(iterable)
            iterations_done = self.index0 + 2
            self._length = len(iterable) + iterations_done
        return self._length

I. e. length is a dynamic property and calculated only if needed by the parsed template. But the length calculation causes the whole source list copying (this makes sense).

Actually, we can reproduce this functionality, but this will cause a lot of changes in the library core. It needs to:

  1. Improve the ListItemAccessor interface in the following way:
struct IndexBasedAccessor
{
     virtual Value GetValueByIndex(int64_t idx) const = 0;
};

struct Enumerator
{
    virtual void Reset() = 0;
    
    virtual bool MoveNext() = 0;
    virtual Value GetCurrent() = 0;
};

struct ListItemAccessor
{
    virtual ~ListItemAccessor() {}

    virtual IndexBasedAccessor* GetIndexer() = 0;
    virtual Enumerator* GetEnumerator() = 0;
    virtual nonstd::optional<size_t> GetSize() = 0;
};

In this case ListItemAccessor returns IndexedBasedAccessor if and only if it supports such kind of access. GetSize return materialized value if underlying range supports it as well. But Enumerator should be returned in any case because represents the concept of forward-only iterator.

  1. GenericList, ListAdapter and all connected interfaces should be refactored respectively. It shouldn't significantly affect the recent code because of most of it use iterators instead of index-based access. But it needs to be checked.

  2. ListAdapter should completely hide details of underlying list. It means that index operation, access first and last elements, size etc. should be emulated if underlying list doesn't provide such facilities. Possible corresponding pitfalls are acceptable.

  3. Dynamic properties. It should be kind of callable which can be accessed without ().

  4. Refactor of the for loop implementation according to the changes above which means:
    4.1 Replace if part calculation with list generator
    4.2 Replace length, revindex etc. with dynamic props
    4.3 Replace index-based iteration with iterator-based

That are the minimum set of changes that I see as implementation of the generic ranges.

@flexferrum flexferrum added the die hard Difficult to fix/implement bug or feature label Oct 16, 2018
@Manu343726
Copy link
Contributor Author
Manu343726 commented Oct 17, 2018

some ideas:

  • Write some traits to check if the C++ side ranges support size, indexing, access by key, etc
  • Write (or rewrite the existing) generic interface to access ranges and their properties (Indexing, access by key, size, begin, end, first item, last item, etc). Depending on the above traits, the implementation may return dummy results, throw, etc.
  • Rewrite the for evaluation code so that if is implemented as a filtered range:
    void evaluate_for(for_statement, for_range)
    {
        if(for_statement.has_if)
        {
            evaluate_for(for_statement.without_if, filtered_range(for_range, if_statement))
            return
        }
    
        // evaluate for here
    }
    so that evaluating an if in the for does not require copies, but the filtering is done on demand under the hood.
  • Expose the traits of the range in the template side as special variables in the for body, so users could check if it's safe to read index0, access by index, access by key, etc (In 99% cases there's no use of variables and indexing, or at least users only require index/index0).
  • Maybe add a --strict-range-variables option to the template engine so that if users request the evaluation of a property that is not possible with the underlying range (Such as access by index in an infinite range) the template engine issues an error like "Requested access by index in range 'foo', which is a <Range concept here, took from the traits>". This error may appear as a warning if the option is not set, and the template returns the dummy values.

I like this approach because every container/range (infinite ranges, vectors, maps, lists, etc) is handled as a generic range, which reduces special casing in the engine code.
An array is just a range with support of size() and indexing by integer
A map is just a range with support of size() and indexing by key
A lazy range is just a range with no size() and no indexing.

etc, etc

@flexferrum
Copy link
Collaborator

@Manu343726 , I agree with you in generally, but disagree in particularly. Unfortunately, form the prospective of the original Jinja2 (and Python as well) dictionaries aren't sequences. For instance, this template: {% set testMap={"key1": "val1", "key2": "val2"} %} {{ testMap | last | pprint }}, {{ testMap["key1"] | pprint }} reports the error: TypeError: argument to reversed() must be a sequence because of the last item calculated as: return next(iter(reversed(seq))). Meanwhile this template: {% set testMap={"key1": "val1", "key2": "val2"} %} {% for i in testMap %} {{i | pprint}} {% endfor %} produces the following result: key2 key1.You can get the similar result if apply 'join' filter to the dictionary. Only keys will be joined in some order.

So, I think, we can't mix list's 'indexing' traits and map's 'indexing' traits, and lists and maps as well. It's better to keep then distinguish. That's not good news but it allows us to keep the compatibility with python Jinja2.

Completely agree with the "string-range-variables" option. It will help to debug difficult cases.

@Manu343726
Copy link
Contributor Author

Unfortunately, form the prospective of the original Jinja2 (and Python as well) dictionaries aren't sequences

Oh, didn't know about that. I was trying to make integration of std and other containers as smooth as possible. So ranges only as iterable sequences, forget maps.

@flexferrum
Copy link
Collaborator

Oh, didn't know about that.

Me too. Just checked it today. But we can have another option here. This will be a complicated path and (possible) error-prone, but more C++-ish. We can integrate such containers in case of unset 'python-jinja2-compatilibity-mode' option.

@flexferrum flexferrum added this to the Release 0.9.2 milestone Feb 24, 2019
@flexferrum flexferrum modified the milestones: Release 0.9.2, Release 1.0 May 11, 2019
@flexferrum flexferrum modified the milestones: Release 1.0, Release 0.9.3 Jun 19, 2019
@flexferrum flexferrum self-assigned this Jun 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
die hard Difficult to fix/implement bug or feature enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants
0