rel-urls Parsing Issues

Section 1.4 of the microformats2 parsing specification outlines how to parse link elements (<a>, <link>, etc.) for rel values and defines the JSON output structure.

The rels structure is reasonably straightforward and maps one-to-one with matched elements:

<a rel="author" href="http://example.com/a">author a</a>
<a rel="author" href="http://example.com/b">author b</a>
<a rel="in-reply-to" href="http://example.com/1">post 1</a>
<a rel="in-reply-to" href="http://example.com/2">post 2</a>
<a rel="alternate home"
   href="http://example.com/fr"
   media="handheld"
   hreflang="fr">French mobile homepage</a>

…results in…

{
  "rels": { 
    "author": [ "http://example.com/a", "http://example.com/b" ],
    "in-reply-to": [ "http://example.com/1", "http://example.com/2" ],
    "alternate": [ "http://example.com/fr" ],
    "home": [ "http://example.com/fr" ]
  }
}

The parsing rules break down slightly when compiling results for the rel-urls structure. For each unique URL, the resulting JSON hash should include a key rels whose value is an array of the rel values found across all matched link elements with that URL. The spec also defines rules for parsing various attributes (hreflang, media, title, and type) and the node's text value. These extended attributes are specified as strings (not arrays), so when several elements share a URL, only one value per attribute survives. The result is data loss and a seemingly inconsistent parsing pattern.
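To make the asymmetry concrete, here is a minimal Python sketch of the aggregation as I read the rules described above. It is not taken from any of the parsers below. The function name collect_rels and the pre-extracted link dictionaries are hypothetical, and the "first value wins" handling of extended attributes and the alphabetical sorting of rels are my reading of the spec, not something confirmed here.

```python
def collect_rels(links):
    """Build "rels" and "rel-urls" from pre-extracted link data (hypothetical input shape)."""
    rels = {}
    rel_urls = {}
    for link in links:
        rel_values = link.get("rel", "").split()
        if not rel_values:
            continue  # elements with an empty rel attribute are skipped
        url = link["href"]

        # "rels": every rel value maps to an array of unique URLs.
        for rel in rel_values:
            urls = rels.setdefault(rel, [])
            if url not in urls:
                urls.append(url)

        entry = rel_urls.setdefault(url, {})
        # Extended attributes are single strings. Assumption: the first
        # element seen for a URL wins, later values are dropped.
        for attr in ("hreflang", "media", "title", "type"):
            if attr in link and attr not in entry:
                entry[attr] = link[attr]
        entry.setdefault("text", link.get("text", ""))
        # rel values, by contrast, are merged across all elements for the URL.
        entry["rels"] = sorted(set(entry.get("rels", [])) | set(rel_values))
    return {"rels": rels, "rel-urls": rel_urls}


# The links from the first example above, already reduced to plain data.
links = [
    {"rel": "author", "href": "http://example.com/a", "text": "author a"},
    {"rel": "author", "href": "http://example.com/b", "text": "author b"},
    {"rel": "in-reply-to", "href": "http://example.com/1", "text": "post 1"},
    {"rel": "in-reply-to", "href": "http://example.com/2", "text": "post 2"},
    {"rel": "alternate home", "href": "http://example.com/fr",
     "media": "handheld", "hreflang": "fr", "text": "French mobile homepage"},
]
result = collect_rels(links)
```

Only the rels key is merged across elements. Every other key in a rel-urls entry keeps whatever the first matching element supplied.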

Parser Results

Parser developers have implemented this feature with differing results.

Given the markup:

<link rel="me" href="https://sixtwothree.org">

<a rel="me" href="https://sixtwothree.org">Jason Garber</a>
<a rel="home" href="https://sixtwothree.org">Go back home</a>

…the parsers provide differing result JSON.

Go

{
  "items": [],
  "rels": {
    "home": ["https://sixtwothree.org"],
    "me": ["https://sixtwothree.org"]
  },
  "rel-urls": {
    "https://sixtwothree.org": {
      "rels": ["me"]
    }
  }
}

PHP

{
  "items": [],
  "rels": {
    "me": ["https://sixtwothree.org"],
    "home": ["https://sixtwothree.org"]
  },
  "rel-urls": {
    "https://sixtwothree.org": {
      "text": "Jason Garber",
      "rels": ["home", "me"]
    }
  }
}

Python

{
  "items": [],
  "rels": {
    "me": ["https://sixtwothree.org"],
    "home": ["https://sixtwothree.org"]
  },
  "rel-urls": {
    "https://sixtwothree.org": {
      "text": "",
      "rels": ["home", "me"]
    }
  }
}

Ruby

{
  "items": [],
  "rels": {
    "me": ["https://sixtwothree.org"],
    "home": ["https://sixtwothree.org"]
  },
  "rel-urls": {
    "https://sixtwothree.org": {
      "rels": ["home"],
      "text": "Jason Garber"
    }
  }
}

Note: The Node parser on microformats.io appears to be offline.

So…

The test suite's rel tests appear to conform to the spec as it's written today. What I'd like help sorting out is what seems like an arbitrary (or at least undocumented) decision to aggregate only rel attribute values in the rel-urls result structure. The extended attributes are, per the spec, worth capturing, but not worth capturing as arrays. That seems strange.
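For illustration only, here is a rough Python sketch of what array-valued extended attributes might look like for the markup above. This is not what the spec says today and not what any parser listed here does. The function name collect_rel_urls_as_arrays and the input dictionaries are hypothetical.

```python
def collect_rel_urls_as_arrays(links):
    """Hypothetical variant: keep every distinct value per URL as an array."""
    rel_urls = {}
    for link in links:
        rel_values = link.get("rel", "").split()
        if not rel_values:
            continue  # nothing to record for an empty rel attribute
        entry = rel_urls.setdefault(link["href"], {"rels": []})
        entry["rels"] = sorted(set(entry["rels"]) | set(rel_values))
        # Treat text and the extended attributes the same way as rels:
        # collect each non-empty, not-yet-seen value in document order.
        for key in ("hreflang", "media", "title", "type", "text"):
            value = link.get(key)
            if value and value not in entry.setdefault(key, []):
                entry[key].append(value)
    return rel_urls


# The three elements from the "Parser Results" markup, reduced to plain data.
links = [
    {"rel": "me", "href": "https://sixtwothree.org", "text": ""},
    {"rel": "me", "href": "https://sixtwothree.org", "text": "Jason Garber"},
    {"rel": "home", "href": "https://sixtwothree.org", "text": "Go back home"},
]
rel_urls = collect_rel_urls_as_arrays(links)
```

With this shape, both "Jason Garber" and "Go back home" would survive for the single URL instead of one of them being dropped.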

Can someone shed some light on the subject and/or can we update the spec to be more clear or to change behavior?

Edit 1: #39 is tangentially related to this, as well.

Edit 2: #32 is also related to this.
