hlink: add Hyperlink.fragment and .url · python-openxml/python-docx@96f80f7

hlink: add Hyperlink.fragment and .url
1 parent 129dd83 commit 96f80f7

7 files changed (+310, -121 lines)

    docs/dev/analysis/features/text/hyperlink.rst

    Lines changed: 142 additions & 111 deletions
    Original file line number | Diff line number | Diff line change
    @@ -2,31 +2,45 @@
    22
    Hyperlink
    33
    =========
    44

    5-
    Word allows hyperlinks to be placed in a document wherever paragraphs can appear.
    5+
    Word allows a hyperlink to be placed in a document wherever a paragraph can appear. The
    6+
    actual hyperlink element is a peer of |Run|.
    67

    7-
    The target (URL) of a hyperlink may be external, such as a web site, or internal, to
    8-
    another location in the document.
    8+
    The link may be to an external resource such as a web site, or internal, to another
    9+
    location in the document. The link may also be a `mailto:` URI or a reference to a file
    10+
    on an accessible local or network filesystem.
    911

    1012
    The visible text of a hyperlink is held in one or more runs. Technically a hyperlink can
    1113
    have zero runs, but this occurs only in contrived cases (otherwise there would be
    1214
    nothing to click on). As usual, each run can have its own distinct text formatting
    1315
    (font), so for example one word in the hyperlink can be bold, etc. By default, Word
    14-
    applies the built-in `Hyperlink` character style to a newly inserted hyperlink.
    16+
    applies the built-in `Hyperlink` character style to a newly inserted hyperlink. Like
    17+
    other text, the hyperlink text may often be broken into multiple runs as a result of
    18+
    edits in different "revision-save" editing sessions (between "Save" commands).
    1519

    1620
    Note that rendered page-breaks can occur in the middle of a hyperlink.
    1721

    1822
    A |Hyperlink| is a child of |Paragraph|, a peer of |Run|.
    1923

    2024

    25+
    TODO: What about URL-encoding/decoding (like %20) behaviors, if any?
    26+
    27+
    2128
    Candidate protocol
    2229
    ------------------
    2330

    2431
    An external hyperlink has an address and an optional anchor. An internal hyperlink has
    25-
    only an anchor. An anchor is also known as a *URI fragment* and follows a hash mark
    26-
    ("#").
    32+
    only an anchor. An anchor is more precisely known as a *URI fragment* in a web URL and
    33+
    follows a hash mark ("#"). The fragment-separator hash character is not stored in the
    34+
    XML.
    35+
    36+
    Note that the anchor and address are stored in two distinct attributes, so you need to
    37+
    concatenate `.address` and `.anchor` like `f"{address}#{anchor}"` if you want the whole
    38+
    thing.
    2739

    28-
    Note that the anchor and URL are stored in two distinct attributes, so you need to
    29-
    concatenate `.address` and `.anchor` if you want the whole thing.
    40+
    Also note that Word does not rigorously separate a fragment in a web URI so it may
    41+
    appear as part of the address or separately in the anchor attribute, depending on how
    42+
    the hyperlink was authored. Hyperlinks inserted using the dialog-box seem to separate it
    43+
    and addresses typed into the document directly don't, based on my limited experience.
    3044

    3145
    .. highlight:: python
    3246

    @@ -49,6 +63,16 @@ concatenate `.address` and `.anchor` if you want the whole thing.
    4963
    >>> hyperlink.address
    5064
    'https://google.com/'
    5165

    66+
    **Access hyperlink fragment**::
    67+
    68+
    >>> hyperlink.fragment
    69+
    'introduction'
    70+
    71+
    **Access hyperlink history (visited or not, True means not visited yet)**::
    72+
    73+
    >>> hyperlink.history
    74+
    True
    75+
    5276
    **Access hyperlinks runs**::
    5377

    5478
    >>> hyperlink.runs
    @@ -58,6 +82,11 @@ concatenate `.address` and `.anchor` if you want the whole thing.
    5882
    <docx.text.run.Run at 0x7f...>
    5983
    ]
    6084

    85+
    **Access hyperlink URL**::
    86+
    87+
    >>> hyperlink.url
    88+
    'https://us.com#introduction'
    89+
    6190
    **Determine whether a hyperlink contains a rendered page-break**::
    6291

    6392
    >>> hyperlink.contains_page_break
    @@ -68,29 +97,31 @@ concatenate `.address` and `.anchor` if you want the whole thing.
    6897
    >>> hyperlink.text
    6998
    'an excellent Wikipedia article on ferrets'
    7099

    71-
    **Add an external hyperlink**::
    100+
    **Add an external hyperlink** (not yet implemented)::
    72101

    73102
    >>> hyperlink = paragraph.add_hyperlink(
    74-
    'About', address='http://us.com', anchor='about'
    75-
    )
    103+
    ... 'About', address='http://us.com', fragment='about'
    104+
    ... )
    76105
    >>> hyperlink
    77106
    <docx.text.hyperlink.Hyperlink at 0x7f...>
    78107
    >>> hyperlink.text
    79108
    'About'
    80109
    >>> hyperlink.address
    81110
    'http://us.com'
    82-
    >>> hyperlink.anchor
    111+
    >>> hyperlink.fragment
    83112
    'about'
    113+
    >>> hyperlink.url
    114+
    'http://us.com#about'
    84115

    85116
    **Add an internal hyperlink (to a bookmark)**::
    86117

    87-
    >>> hyperlink = paragraph.add_hyperlink('Section 1', anchor='Section_1')
    118+
    >>> hyperlink = paragraph.add_hyperlink('Section 1', fragment='Section_1')
    88119
    >>> hyperlink.text
    89120
    'Section 1'
    90-
    >>> hyperlink.anchor
    121+
    >>> hyperlink.fragment
    91122
    'Section_1'
    92123
    >>> hyperlink.address
    93-
    None
    124+
    ''
    94125

    95126
    **Modify hyperlink properties**::
    96127

    @@ -183,8 +214,8 @@ file, keyed by the w:hyperlink@r:id attribute::
    183214
    <Relationship Id="rId4" Mode="External" Type="http://..." Target="http://google.com/"/>
    184215
    </Relationships>
    185216

    186-
    A hyperlink can contain multiple runs of text (and a whole lot of other
    187-
    stuff, including nested hyperlinks, at least as far as the schema indicates)::
    217+
    A hyperlink can contain multiple runs of text (and a whole lot of other stuff, at least
    218+
    as far as the schema indicates)::
    188219

    189220
    <w:p>
    190221
    <w:hyperlink r:id="rId2">
    @@ -256,97 +287,97 @@ Schema excerpt
    256287

    257288
    ::
    258289

    259-
    <xsd:complexType name="CT_P">
    260-
    <xsd:sequence>
    261-
    <xsd:element name="pPr" type="CT_PPr" minOccurs="0"/>
    262-
    <xsd:group ref="EG_PContent" minOccurs="0" maxOccurs="unbounded"/>
    263-
    </xsd:sequence>
    264-
    <xsd:attribute name="rsidRPr" type="ST_LongHexNumber"/>
    265-
    <xsd:attribute name="rsidR" type="ST_LongHexNumber"/>
    266-
    <xsd:attribute name="rsidDel" type="ST_LongHexNumber"/>
    267-
    <xsd:attribute name="rsidP" type="ST_LongHexNumber"/>
    268-
    <xsd:attribute name="rsidRDefault" type="ST_LongHexNumber"/>
    269-
    </xsd:complexType>
    270-
    271-
    <xsd:group name="EG_PContent"> <!-- denormalized -->
    272-
    <xsd:choice>
    273-
    <xsd:element name="r" type="CT_R"/>
    274-
    <xsd:element name="hyperlink" type="CT_Hyperlink"/>
    275-
    <xsd:element name="fldSimple" type="CT_SimpleField"/>
    276-
    <xsd:element name="sdt" type="CT_SdtRun"/>
    277-
    <xsd:element name="customXml" type="CT_CustomXmlRun"/>
    278-
    <xsd:element name="smartTag" type="CT_SmartTagRun"/>
    279-
    <xsd:element name="dir" type="CT_DirContentRun"/>
    280-
    <xsd:element name="bdo" type="CT_BdoContentRun"/>
    281-
    <xsd:element name="subDoc" type="CT_Rel"/>
    282-
    <xsd:group ref="EG_RunLevelElts"/>
    283-
    </xsd:choice>
    284-
    </xsd:group>
    285-
    286-
    <xsd:complexType name="CT_Hyperlink">
    287-
    <xsd:group ref="EG_PContent" minOccurs="0" maxOccurs="unbounded"/>
    288-
    <xsd:attribute name="tgtFrame" type="s:ST_String"/>
    289-
    <xsd:attribute name="tooltip" type="s:ST_String"/>
    290-
    <xsd:attribute name="docLocation" type="s:ST_String"/>
    291-
    <xsd:attribute name="history" type="s:ST_OnOff"/>
    292-
    <xsd:attribute name="anchor" type="s:ST_String"/>
    293-
    <xsd:attribute ref="r:id"/>
    294-
    </xsd:complexType>
    295-
    296-
    <xsd:group name="EG_RunLevelElts">
    297-
    <xsd:choice>
    298-
    <xsd:element name="proofErr" type="CT_ProofErr"/>
    299-
    <xsd:element name="permStart" type="CT_PermStart"/>
    300-
    <xsd:element name="permEnd" type="CT_Perm"/>
    301-
    <xsd:element name="bookmarkStart" type="CT_Bookmark"/>
    302-
    <xsd:element name="bookmarkEnd" type="CT_MarkupRange"/>
    303-
    <xsd:element name="moveFromRangeStart" type="CT_MoveBookmark"/>
    304-
    <xsd:element name="moveFromRangeEnd" type="CT_MarkupRange"/>
    305-
    <xsd:element name="moveToRangeStart" type="CT_MoveBookmark"/>
    306-
    <xsd:element name="moveToRangeEnd" type="CT_MarkupRange"/>
    307-
    <xsd:element name="commentRangeStart" type="CT_MarkupRange"/>
    308-
    <xsd:element name="commentRangeEnd" type="CT_MarkupRange"/>
    309-
    <xsd:element name="customXmlInsRangeStart" type="CT_TrackChange"/>
    310-
    <xsd:element name="customXmlInsRangeEnd" type="CT_Markup"/>
    311-
    <xsd:element name="customXmlDelRangeStart" type="CT_TrackChange"/>
    312-
    <xsd:element name="customXmlDelRangeEnd" type="CT_Markup"/>
    313-
    <xsd:element name="customXmlMoveFromRangeStart" type="CT_TrackChange"/>
    314-
    <xsd:element name="customXmlMoveFromRangeEnd" type="CT_Markup"/>
    315-
    <xsd:element name="customXmlMoveToRangeStart" type="CT_TrackChange"/>
    316-
    <xsd:element name="customXmlMoveToRangeEnd" type="CT_Markup"/>
    317-
    <xsd:element name="ins" type="CT_RunTrackChange"/>
    318-
    <xsd:element name="del" type="CT_RunTrackChange"/>
    319-
    <xsd:element name="moveFrom" type="CT_RunTrackChange"/>
    320-
    <xsd:element name="moveTo" type="CT_RunTrackChange"/>
    321-
    <xsd:group ref="EG_MathContent" minOccurs="0" maxOccurs="unbounded"/>
    322-
    </xsd:choice>
    323-
    </xsd:group>
    324-
    325-
    <xsd:complexType name="CT_R">
    326-
    <xsd:sequence>
    327-
    <xsd:group ref="EG_RPr" minOccurs="0"/>
    328-
    <xsd:group ref="EG_RunInnerContent" minOccurs="0" maxOccurs="unbounded"/>
    329-
    </xsd:sequence>
    330-
    <xsd:attribute name="rsidRPr" type="ST_LongHexNumber"/>
    331-
    <xsd:attribute name="rsidDel" type="ST_LongHexNumber"/>
    332-
    <xsd:attribute name="rsidR" type="ST_LongHexNumber"/>
    333-
    </xsd:complexType>
    334-
    335-
    <xsd:simpleType name="ST_OnOff">
    336-
    <xsd:union memberTypes="xsd:boolean ST_OnOff1"/>
    337-
    </xsd:simpleType>
    338-
    339-
    <xsd:simpleType name="ST_OnOff1">
    340-
    <xsd:restriction base="xsd:string">
    341-
    <xsd:enumeration value="on"/>
    342-
    <xsd:enumeration value="off"/>
    343-
    </xsd:restriction>
    344-
    </xsd:simpleType>
    345-
    346-
    <xsd:simpleType name="ST_RelationshipId">
    347-
    <xsd:restriction base="xsd:string"/>
    348-
    </xsd:simpleType>
    349-
    350-
    <xsd:simpleType name="ST_String">
    351-
    <xsd:restriction base="xsd:string"/>
    352-
    </xsd:simpleType>
    290+
    <xsd:complexType name="CT_P">
    291+
    <xsd:sequence>
    292+
    <xsd:element name="pPr" type="CT_PPr" minOccurs="0"/>
    293+
    <xsd:group ref="EG_PContent" minOccurs="0" maxOccurs="unbounded"/>
    294+
    </xsd:sequence>
    295+
    <xsd:attribute name="rsidRPr" type="ST_LongHexNumber"/>
    296+
    <xsd:attribute name="rsidR" type="ST_LongHexNumber"/>
    297+
    <xsd:attribute name="rsidDel" type="ST_LongHexNumber"/>
    298+
    <xsd:attribute name="rsidP" type="ST_LongHexNumber"/>
    299+
    <xsd:attribute name="rsidRDefault" type="ST_LongHexNumber"/>
    300+
    </xsd:complexType>
    301+
    302+
    <xsd:group name="EG_PContent"> <!-- denormalized -->
    303+
    <xsd:choice>
    304+
    <xsd:element name="r" type="CT_R"/>
    305+
    <xsd:element name="hyperlink" type="CT_Hyperlink"/>
    306+
    <xsd:element name="fldSimple" type="CT_SimpleField"/>
    307+
    <xsd:element name="sdt" type="CT_SdtRun"/>
    308+
    <xsd:element name="customXml" type="CT_CustomXmlRun"/>
    309+
    <xsd:element name="smartTag" type="CT_SmartTagRun"/>
    310+
    <xsd:element name="dir" type="CT_DirContentRun"/>
    311+
    <xsd:element name="bdo" type="CT_BdoContentRun"/>
    312+
    <xsd:element name="subDoc" type="CT_Rel"/>
    313+
    <xsd:group ref="EG_RunLevelElts"/>
    314+
    </xsd:choice>
    315+
    </xsd:group>
    316+
    317+
    <xsd:complexType name="CT_Hyperlink">
    318+
    <xsd:group ref="EG_PContent" minOccurs="0" maxOccurs="unbounded"/>
    319+
    <xsd:attribute name="tgtFrame" type="s:ST_String"/>
    320+
    <xsd:attribute name="tooltip" type="s:ST_String"/>
    321+
    <xsd:attribute name="docLocation" type="s:ST_String"/>
    322+
    <xsd:attribute name="history" type="s:ST_OnOff"/>
    323+
    <xsd:attribute name="anchor" type="s:ST_String"/>
    324+
    <xsd:attribute ref="r:id"/>
    325+
    </xsd:complexType>
    326+
    327+
    <xsd:group name="EG_RunLevelElts">
    328+
    <xsd:choice>
    329+
    <xsd:element name="proofErr" type="CT_ProofErr"/>
    330+
    <xsd:element name="permStart" type="CT_PermStart"/>
    331+
    <xsd:element name="permEnd" type="CT_Perm"/>
    332+
    <xsd:element name="bookmarkStart" type="CT_Bookmark"/>
    333+
    <xsd:element name="bookmarkEnd" type="CT_MarkupRange"/>
    334+
    <xsd:element name="moveFromRangeStart" type="CT_MoveBookmark"/>
    335+
    <xsd:element name="moveFromRangeEnd" type="CT_MarkupRange"/>
    336+
    <xsd:element name="moveToRangeStart" type="CT_MoveBookmark"/>
    337+
    <xsd:element name="moveToRangeEnd" type="CT_MarkupRange"/>
    338+
    <xsd:element name="commentRangeStart" type="CT_MarkupRange"/>
    339+
    <xsd:element name="commentRangeEnd" type="CT_MarkupRange"/>
    340+
    <xsd:element name="customXmlInsRangeStart" type="CT_TrackChange"/>
    341+
    <xsd:element name="customXmlInsRangeEnd" type="CT_Markup"/>
    342+
    <xsd:element name="customXmlDelRangeStart" type="CT_TrackChange"/>
    343+
    <xsd:element name="customXmlDelRangeEnd" type="CT_Markup"/>
    344+
    <xsd:element name="customXmlMoveFromRangeStart" type="CT_TrackChange"/>
    345+
    <xsd:element name="customXmlMoveFromRangeEnd" type="CT_Markup"/>
    346+
    <xsd:element name="customXmlMoveToRangeStart" type="CT_TrackChange"/>
    347+
    <xsd:element name="customXmlMoveToRangeEnd" type="CT_Markup"/>
    348+
    <xsd:element name="ins" type="CT_RunTrackChange"/>
    349+
    <xsd:element name="del" type="CT_RunTrackChange"/>
    350+
    <xsd:element name="moveFrom" type="CT_RunTrackChange"/>
    351+
    <xsd:element name="moveTo" type="CT_RunTrackChange"/>
    352+
    <xsd:group ref="EG_MathContent" minOccurs="0" maxOccurs="unbounded"/>
    353+
    </xsd:choice>
    354+
    </xsd:group>
    355+
    356+
    <xsd:complexType name="CT_R">
    357+
    <xsd:sequence>
    358+
    <xsd:group ref="EG_RPr" minOccurs="0"/>
    359+
    <xsd:group ref="EG_RunInnerContent" minOccurs="0" maxOccurs="unbounded"/>
    360+
    </xsd:sequence>
    361+
    <xsd:attribute name="rsidRPr" type="ST_LongHexNumber"/>
    362+
    <xsd:attribute name="rsidDel" type="ST_LongHexNumber"/>
    363+
    <xsd:attribute name="rsidR" type="ST_LongHexNumber"/>
    364+
    </xsd:complexType>
    365+
    366+
    <xsd:simpleType name="ST_OnOff">
    367+
    <xsd:union memberTypes="xsd:boolean ST_OnOff1"/>
    368+
    </xsd:simpleType>
    369+
    370+
    <xsd:simpleType name="ST_OnOff1">
    371+
    <xsd:restriction base="xsd:string">
    372+
    <xsd:enumeration value="on"/>
    373+
    <xsd:enumeration value="off"/>
    374+
    </xsd:restriction>
    375+
    </xsd:simpleType>
    376+
    377+
    <xsd:simpleType name="ST_RelationshipId">
    378+
    <xsd:restriction base="xsd:string"/>
    379+
    </xsd:simpleType>
    380+
    381+
    <xsd:simpleType name="ST_String">
    382+
    <xsd:restriction base="xsd:string"/>
    383+
    </xsd:simpleType>
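
The design notes in hyperlink.rst above observe that Word does not rigorously separate a fragment from the address, so the fragment may arrive embedded in the address or in the separate anchor attribute. A minimal sketch of how a caller might normalize the two cases is shown here. The helper name `split_address` and its precedence rule (an explicit anchor wins over an embedded fragment) are assumptions for illustration and are not part of python-docx; only the standard-library `urllib.parse.urldefrag` is used.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_address(address: str, anchor: str = "") -> Tuple[str, str]:
    """Return (address_without_fragment, fragment).

    Hypothetical helper, not a python-docx API. If `anchor` is given it is
    used as the fragment; otherwise any fragment embedded in `address`
    after a "#" is used.
    """
    # urldefrag splits "base#frag" into (base, frag); frag is "" if absent.
    base, embedded = urldefrag(address)
    return base, anchor or embedded


# Fragment typed directly into the address.
print(split_address("https://us.com/page#introduction"))
# Fragment stored separately in the anchor attribute.
print(split_address("https://us.com/page", "introduction"))
```

Both calls yield the same pair, which is the point of normalizing: downstream code then does not need to care how the hyperlink was authored.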

    features/hlk-props.feature

    Lines changed: 19 additions & 0 deletions
    @@ -19,6 +19,11 @@ Feature: Access hyperlink properties
    1919
    | one | True |
    2020

    2121

    22+
    Scenario: Hyperlink.fragment has the URI fragment of the hyperlink
    23+
    Given a hyperlink having a URI fragment
    24+
    Then hyperlink.fragment is the URI fragment of the hyperlink
    25+
    26+
    2227
    Scenario Outline: Hyperlink.runs contains Run for each run in hyperlink
    2328
    Given a hyperlink having <zero-or-more> runs
    2429
    Then hyperlink.runs has length <value>
    @@ -33,3 +38,17 @@ Feature: Access hyperlink properties
    3338
    Scenario: Hyperlink.text has the visible text of the hyperlink
    3439
    Given a hyperlink
    3540
    Then hyperlink.text is the visible text of the hyperlink
    41+
    42+
    43+
    Scenario Outline: Hyperlink.url is the full URL of an internet hyperlink
    44+
    Given a hyperlink having address <address> and fragment <fragment>
    45+
    Then hyperlink.url is <url>
    46+
    47+
    Examples: Hyperlink.url cases
    48+
    | address | fragment | url |
    49+
    | '' | linkedBookmark | '' |
    50+
    | https://foo.com | '' | https://foo.com |
    51+
    | https://foo.com?q=bar | '' | https://foo.com?q=bar |
    52+
    | http://foo.com/ | intro | http://foo.com/#intro |
    53+
    | https://foo.com?q=bar#baz | '' | https://foo.com?q=bar#baz |
    54+
    | court-exif.jpg | '' | court-exif.jpg |
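
The `Hyperlink.url` examples table above implies a simple composition rule: an empty address gives an empty URL (the link is internal to the document), and otherwise the fragment, when present, is appended after a "#". The sketch below restates that rule as a standalone function so the table rows can be checked by hand. The name `compose_url` is hypothetical, and this is not the implementation from the commit, only a reading of the scenario outline.

```python
def compose_url(address: str, fragment: str) -> str:
    """Combine an address and a fragment the way the examples table suggests.

    Hypothetical helper inferred from the feature-file examples; it is not
    taken from the python-docx source.
    """
    # An internal (bookmark-only) link has no address, so it has no URL.
    if not address:
        return ""
    # The "#" separator is not stored in the XML, so add it when joining.
    return f"{address}#{fragment}" if fragment else address


# Rows from the examples table: (address, fragment, expected url).
cases = [
    ("", "linkedBookmark", ""),
    ("https://foo.com", "", "https://foo.com"),
    ("https://foo.com?q=bar", "", "https://foo.com?q=bar"),
    ("http://foo.com/", "intro", "http://foo.com/#intro"),
    ("https://foo.com?q=bar#baz", "", "https://foo.com?q=bar#baz"),
    ("court-exif.jpg", "", "court-exif.jpg"),
]
for address, fragment, expected in cases:
    assert compose_url(address, fragment) == expected
```

Note that an address which already embeds a fragment (the `#baz` row) passes through unchanged because its separate fragment is empty.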
