`rel-urls` Parsing Issues · Issue #50 · microformats/microformats2-parsing · GitHub
rel-urls Parsing Issues #50
Open
@jgarber623

Description

Section 1.4 of the microformats2 parsing specification outlines how to parse link elements (<a>, <link>, etc.) for rel values and defines the JSON output structure.

The rels structure is reasonably straightforward and maps one-to-one with matched elements:

<a rel="author" href="http://example.com/a">author a</a>
<a rel="author" href="http://example.com/b">author b</a>
<a rel="in-reply-to" href="http://example.com/1">post 1</a>
<a rel="in-reply-to" href="http://example.com/2">post 2</a>
<a rel="alternate home"
   href="http://example.com/fr"
   media="handheld"
   hreflang="fr">French mobile homepage</a>

…results in…

{
  "rels": { 
    "author": [ "http://example.com/a", "http://example.com/b" ],
    "in-reply-to": [ "http://example.com/1", "http://example.com/2" ],
    "alternate": [ "http://example.com/fr" ],
    "home": [ "http://example.com/fr" ]
  }
}
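The rels aggregation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any parser's actual implementation: the names `RelCollector` and `parse_rels` are hypothetical, only `<a>`, `<area>`, and `<link>` elements with an href are considered, and relative URL resolution is skipped entirely.

```python
from html.parser import HTMLParser


class RelCollector(HTMLParser):
    """Hypothetical helper: gathers href values per rel value."""

    def __init__(self):
        super().__init__()
        self.rels = {}

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "area", "link"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        rel = attributes.get("rel")
        if href is None or not rel:
            return
        # A rel attribute is a space-separated list; each value gets its own key.
        for value in rel.split():
            urls = self.rels.setdefault(value, [])
            if href not in urls:  # keep each URL once per rel value
                urls.append(href)


def parse_rels(html):
    """Return a dict shaped like the "rels" structure (assumes absolute URLs)."""
    collector = RelCollector()
    collector.feed(html)
    return collector.rels


markup = """
<a rel="author" href="http://example.com/a">author a</a>
<a rel="author" href="http://example.com/b">author b</a>
<a rel="alternate home"
   href="http://example.com/fr"
   media="handheld"
   hreflang="fr">French mobile homepage</a>
"""
rels = parse_rels(markup)
```

Every value here is an array, so nothing is lost when several elements share a rel value.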

The parsing rules break down slightly when compiling results for the rel-urls structure. For each unique URL, the resulting JSON hash should include a rels key whose value is an array of the rel values found across all matched link elements pointing at that URL. The spec also defines rules for parsing various attributes (hreflang, media, title, and type) and the node's text value. These extended attributes are specified as strings (not arrays), so when several elements share a URL only one value per attribute can be kept. The result is data loss and a seemingly inconsistent parsing pattern.

Parser Results

Parser developers have implemented this feature with differing results.

Given the markup:

<link rel="me" href="https://sixtwothree.org">

<a rel="me" href="https://sixtwothree.org">Jason Garber</a>
<a rel="home" href="https://sixtwothree.org">Go back home</a>

…the parsers produce different JSON results.

Go

{
  "items": [],
  "rels": {
    "home": ["https://sixtwothree.org"],
    "me": ["https://sixtwothree.org"]
  },
  "rel-urls": {
    "https://sixtwothree.org": {
      "rels": ["me"]
    }
  }
}

PHP

{
  "items": [],
  "rels": {
    "me": ["https://sixtwothree.org"],
    "home": ["https://sixtwothree.org"]
  },
  "rel-urls": {
    "https://sixtwothree.org": {
      "text": "Jason Garber",
      "rels": ["home", "me"]
    }
  }
}

Python

{
  "items": [],
  "rels": {
    "me": ["https://sixtwothree.org"],
    "home": ["https://sixtwothree.org"]
  },
  "rel-urls": {
    "https://sixtwothree.org": {
      "text": "",
      "rels": ["home", "me"]
    }
  }
}

Ruby

{
  "items": [],
  "rels": {
    "me": ["https://sixtwothree.org"],
    "home": ["https://sixtwothree.org"]
  },
  "rel-urls": {
    "https://sixtwothree.org": {
      "rels": ["home"],
      "text": "Jason Garber"
    }
  }
}

Note: The Node parser on microformats.io appears to be offline.
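To make the discrepancy concrete, here is a minimal sketch of one possible reading of the rel-urls rules. It assumes that rel values are de-duplicated and sorted, and that each scalar attribute (including text) keeps the first value encountered for a given URL. The function name `build_rel_urls` and the pre-extracted `links` records are hypothetical, and representing the `<link>` element's text as an empty string is also an assumption.

```python
def build_rel_urls(links):
    """Build a rel-urls-like dict from pre-extracted link records.

    Assumption: scalar keys are first-value-wins; later values are dropped.
    """
    rel_urls = {}
    for link in links:
        entry = rel_urls.setdefault(link["href"], {"rels": []})
        for value in link["rel"].split():
            if value not in entry["rels"]:
                entry["rels"].append(value)
        entry["rels"].sort()
        for key in ("hreflang", "media", "title", "type", "text"):
            # Only set a scalar key once; this is where data is lost.
            if key in link and key not in entry:
                entry[key] = link[key]
    return rel_urls


# Records mirroring the three elements in the markup above.
links = [
    {"href": "https://sixtwothree.org", "rel": "me", "text": ""},
    {"href": "https://sixtwothree.org", "rel": "me", "text": "Jason Garber"},
    {"href": "https://sixtwothree.org", "rel": "home", "text": "Go back home"},
]
rel_urls = build_rel_urls(links)
```

Under these assumptions the single entry for https://sixtwothree.org ends up with rels of "home" and "me" and an empty text, and both "Jason Garber" and "Go back home" are discarded. A parser that ignores empty text, or that lets later elements overwrite earlier ones, would report a different text value for the same markup.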

So…

The test suite's rel tests appear to conform to the spec as it's written today. What I'd like help sorting out is what seems like an arbitrary (or, at least, undocumented) decision to aggregate only rel attribute values in the rel-urls result structure. The extended attributes are, per the spec, worth capturing, but not worth capturing as arrays. That seems strange.

Can someone shed some light on the subject and/or can we update the spec to be more clear or to change behavior?

Edit 1: #39 is tangentially related to this, as well.

Edit 2: #32 is also related to this.
