Simplify type bounds. #2417

paulp · 2013-04-19T22:12:06Z

I started out looking to limit the noise from empty type
bounds, i.e. the endless repetition of

  class A[T >: _root_.scala.Nothing <: _root_.scala.Any]

This led me to be reminded of all the unnecessary and
in fact damaging overreaches which are performed during parsing.
Why should a type parameter for which no bounds are
specified be immediately encoded with this giant tree:

  TypeBounds(
    Select(Select(Ident(nme.ROOTPKG), tpnme.scala_), tpnme.Nothing),
    Select(Select(Ident(nme.ROOTPKG), tpnme.scala_), tpnme.Any)
  )

...which must then be manually recognized as empty type bounds?
Truly, this is madness.

- It deftly eliminates the possibility of recognizing
  whether the user wrote "class A[T]" or "class A[T >: Nothing]"
  or "class A[T <: Any]" or specified both bounds. The fact
  that these work out the same internally does not imply the
  information should be exterminated even before parsing completes.
- It burdens everyone who must recognize type bounds trees,
  such as this author.
- It is far less efficient than the obvious encoding.
- It offers literally no advantage whatsoever.

Encode empty type bounds as

  TypeBounds(EmptyTree, EmptyTree)

What could be simpler.
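A minimal sketch of the difference, using a toy tree model. The case classes below are simplified, hypothetical stand-ins for the compiler's trees (names are plain strings rather than nme/tpnme), not the real scala.reflect API. The sketch contrasts the old encoding with TypeBounds(EmptyTree, EmptyTree) and shows what each demands of a consumer that wants to know whether any bounds were written.

```scala
// Hypothetical toy model; not the compiler's actual tree classes.
object EmptyBoundsSketch {
  sealed trait Tree
  case object EmptyTree extends Tree
  final case class Ident(name: String) extends Tree
  final case class Select(qual: Tree, name: String) extends Tree
  final case class TypeBounds(lo: Tree, hi: Tree) extends Tree

  // _root_.scala.<name>, the way the old parser spelled out default bounds.
  private def rootScala(name: String): Tree =
    Select(Select(Ident("_root_"), "scala"), name)

  // Old encoding: absent bounds are invented as Nothing and Any.
  val oldUnbounded: TypeBounds = TypeBounds(rootScala("Nothing"), rootScala("Any"))

  // Proposed encoding: absent bounds are simply absent.
  val newUnbounded: TypeBounds = TypeBounds(EmptyTree, EmptyTree)

  // With the new encoding, "no bounds were written" is a structural check.
  def isEmptyBounds(tb: TypeBounds): Boolean =
    tb.lo == EmptyTree && tb.hi == EmptyTree

  // With the old encoding, a consumer must match the invented trees, and
  // still cannot tell them apart from bounds the user actually wrote.
  def looksLikeOldEmptyBounds(tb: TypeBounds): Boolean =
    tb.lo == rootScala("Nothing") && tb.hi == rootScala("Any")
}
```

With the empty encoding the question is answered by comparing against EmptyTree; with the old one, the answer is a guess based on the shape of trees the parser made up.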

paulp · 2013-04-20T17:58:45Z

Review by @retronym

retronym · 2013-04-20T20:31:03Z

Looks like a winner to me, but I might run it past the Scala meeting on Tuesday to see if there are any arguments in defence of the status quo.

Blaisorblade · 2013-04-20T21:24:15Z

src/compiler/scala/tools/nsc/ast/parser/Parsers.scala

+      if (defined.nonEmpty)
+        t setPos wrappingPos(defined)
+      else
+        t setPos o2p(in.offset)


What's the difference between start, end and in.offset? I guess in.offset changes, but still, why are start and end saved and not used?

Just a minor nitpick: is saving three characters worth picking lo/hi over low/high?

Or put another way, is adding three characters worth gratuitously making up different names for constructs which already have names? "lo" and "hi" are what they are called.

start and end are residue from the great deal of flailing it took before I managed to get this tree past the position validator. I'll remove them.
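With start and end gone, the positioning logic in the diff reduces to a single decision. A minimal sketch, assuming `defined` holds only the bound trees the user actually wrote; RangePos, OffsetPos, wrappingPos and boundsPos below are hypothetical simplifications of the compiler's Position, wrappingPos and o2p, not their real signatures.

```scala
// Hypothetical, simplified positions; not the compiler's Position API.
object BoundsPositionSketch {
  sealed trait Pos
  final case class RangePos(start: Int, end: Int) extends Pos // [start, end)
  final case class OffsetPos(point: Int) extends Pos          // a single offset

  // Smallest range covering all given ranges (stand-in for wrappingPos).
  // Callers must pass a non-empty list.
  def wrappingPos(ranges: List[RangePos]): RangePos =
    RangePos(ranges.map(_.start).min, ranges.map(_.end).max)

  // Mirrors the shape of the diff: wrap whichever bounds were written,
  // otherwise fall back to an offset position at the current token.
  // A bound is modelled as Option[RangePos]; None means "not written".
  def boundsPos(lo: Option[RangePos], hi: Option[RangePos], currentOffset: Int): Pos = {
    val defined = List(lo, hi).flatten
    if (defined.nonEmpty) wrappingPos(defined)
    else OffsetPos(currentOffset)
  }
}
```

An unbounded parameter gets a zero-width offset position, since there is no source text for a range to cover.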

After losing start and end, the ='s could be aligned further left if there were a 2-letter form for "defined". My brother is an avid Scrabble player and keeps a list of meaningful 2-letter combinations, with surprising results.

I wonder if that means you don't know about this... http://www.scrabble-assoc.com/ratings/state/co.html

(You have to scroll all the way down to #4.)

Then there's no excuse.

I would expect all the ='s to line up following seven-letter identifiers.

Blaisorblade · 2013-04-20T21:37:43Z

Just to be sure: you still canonicalize empty bounds, but only from Typers on, right? In general, the phases before canonicalization need to handle more equivalent or almost-equivalent trees, so there is somewhat more risk of bugs. OTOH, GHC takes this even further for the sake of better error reporting: it performs type inference before desugaring precisely so that it never needs to show the user code the user didn't write.

paulp · 2013-04-20T22:23:34Z

@Blaisorblade This is the parser. It is not performing canonicalization; if you enter "A >: Nothing <: Any" that is what will be encoded. It is simply not inventing bounds.
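To illustrate "it is simply not inventing bounds", here is a toy parser for a single type parameter. It is a hypothetical sketch over whitespace-separated tokens, not the real Parsers.scala: it accepts only simple identifiers as bounds, with the lower bound first as in Scala's grammar, and records exactly what was written.

```scala
// Hypothetical toy parser; not the scalac parser.
object ParseBoundsSketch {
  sealed trait Tree
  case object EmptyTree extends Tree
  final case class Ident(name: String) extends Tree
  final case class TypeBounds(lo: Tree, hi: Tree) extends Tree
  final case class TypeDef(name: String, bounds: TypeBounds) extends Tree

  // Parses "T", "T >: L", "T <: U" or "T >: L <: U".
  // A bound that was not written becomes EmptyTree; nothing is invented.
  def parseTypeParam(src: String): TypeDef = {
    val toks = src.trim.split("\\s+").toList

    // Consumes "op tpe" if present, else leaves the tokens untouched.
    def bound(op: String, rest: List[String]): (Tree, List[String]) = rest match {
      case `op` :: tpe :: more => (Ident(tpe), more)
      case _                   => (EmptyTree, rest)
    }

    toks match {
      case name :: rest =>
        val (lo, afterLo) = bound(">:", rest)
        val (hi, afterHi) = bound("<:", afterLo)
        require(afterHi.isEmpty, s"unexpected tokens: ${afterHi.mkString(" ")}")
        TypeDef(name, TypeBounds(lo, hi))
      case Nil =>
        throw new IllegalArgumentException("empty type parameter")
    }
  }
}
```

The four spellings from the PR description ("T", "T >: Nothing", "T <: Any", "T >: Nothing <: Any") yield four distinct trees, so a later phase can still tell them apart.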


Blaisorblade · 2013-04-20T22:47:16Z

@paulp
I think I didn't make myself clear. Reading the code more closely, I see that indeed, with your change, the parser stopped "inventing bounds", but this is now done by the typer. Now, "inventing bounds" is the canonicalization I was talking about, and it's a canonicalization since it's a pure function which turns inputs which are different but equivalent into the same output.

And my point is that whenever you change the parser to canonicalize fewer equivalent constructs in the source, which is something you want to do, later phases have to be more careful, since they'll have to deal with trees that were impossible before.
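The canonicalization described here can be pictured as a pure function from bounds trees to resolved bounds. A hypothetical sketch: ResolvedBounds and canonicalize are illustrative names, and real resolution (imports, shadowing, aliases) is deliberately left out, with Ident names taken at face value.

```scala
// Hypothetical sketch of canonicalizing bounds after parsing.
object CanonicalBoundsSketch {
  sealed trait Tree
  case object EmptyTree extends Tree
  final case class Ident(name: String) extends Tree
  final case class TypeBounds(lo: Tree, hi: Tree) extends Tree

  // A resolved view of the bounds: just the name of each bounding type.
  final case class ResolvedBounds(lo: String, hi: String)

  // Absent bounds default to Nothing and Any; written bounds are kept as-is.
  // Different-but-equivalent inputs map to the same output.
  def canonicalize(tb: TypeBounds): ResolvedBounds = {
    def resolve(t: Tree, default: String): String = t match {
      case EmptyTree   => default
      case Ident(name) => name
      case other       => sys.error(s"unexpected bound tree: $other")
    }
    ResolvedBounds(resolve(tb.lo, "Nothing"), resolve(tb.hi, "Any"))
  }

  // Four syntactically different spellings of "no meaningful bounds".
  val spellings: List[TypeBounds] = List(
    TypeBounds(EmptyTree, EmptyTree),
    TypeBounds(Ident("Nothing"), EmptyTree),
    TypeBounds(EmptyTree, Ident("Any")),
    TypeBounds(Ident("Nothing"), Ident("Any"))
  )
}
```

Every phase that runs before this function must accept all four input shapes; every phase after it sees only one.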

Probably the extra cost is still worth the more accurate error reporting, but I guess that's not for me to say since I don't know the code well enough.

paulp · 2013-04-20T23:07:41Z

Oh, my comment from my first version of this PR #2405 didn't make the jump. "This will impact anyone who parses TypeBoundsTrees and never expects to see an EmptyTree in there. I don't know what can be done about that beyond alerting/preparing the usual parties." So I understand it makes trees possible which weren't possible before.

paulp · 2013-04-20T23:13:17Z

I'm not sure anyone can call the typer a "pure function" with too straight a face. BTW you already had to deal with a potentially wide range of untyped trees, which I'm sure nobody did, because they had to look for (at minimum) all of these:

Select(Select(Ident(nme.ROOTPKG), tpnme.scala_), tpnme.Nothing)
Select(Ident(tpnme.scala_), tpnme.Nothing)
Ident(tpnme.Nothing)

And you can't even know if that Nothing is the real Nothing and not some other type called Nothing until it's typed. Not to mention type aliases, etc. etc. etc. You just don't know what the bounds are until it is typed.
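A sketch of the purely syntactic check a consumer of untyped trees would need in order to recognize the three spellings of Nothing listed above. The tree classes are hypothetical stand-ins with string names in place of nme/tpnme. As noted, a match proves nothing: a user-defined type named Nothing would also match, and an alias such as `type Bottom = Nothing` would not.

```scala
// Hypothetical toy trees; names are plain strings.
object NothingShapesSketch {
  sealed trait Tree
  final case class Ident(name: String) extends Tree
  final case class Select(qual: Tree, name: String) extends Tree

  // Syntactic recognition only: says nothing about what the name resolves to.
  def looksLikeNothing(t: Tree): Boolean = t match {
    case Select(Select(Ident("_root_"), "scala"), "Nothing") => true
    case Select(Ident("scala"), "Nothing")                   => true
    case Ident("Nothing")                                    => true
    case _                                                   => false
  }
}
```

Only after typing can the bound's symbol be compared against the real scala.Nothing, which is why the check belongs in the typer rather than the parser.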


Blaisorblade · 2013-04-20T23:42:42Z

OK, so the parser was really not canonicalizing so much in a useful way.
Well, I'll leave you to work :-)

retronym · 2013-04-23T19:39:13Z

LGTM, and it looked good to the Scala meeting today as well.

Simplify type bounds.

Blaisorblade reviewed Apr 20, 2013

retronym added a commit that referenced this pull request Apr 23, 2013

Merge pull request #2417 from paulp/pr/empty-type-bounds

cd148d9

Simplify type bounds.

retronym merged commit cd148d9 into scala:master Apr 23, 2013
