hlink: add Hyperlink.fragment and .url · python-openxml/python-docx@96f80f7 · GitHub

Commit 96f80f7

hlink: add Hyperlink.fragment and .url
1 parent 129dd83 commit 96f80f7

7 files changed

+310
-121
lines changed

docs/dev/analysis/features/text/hyperlink.rst

Lines changed: 142 additions & 111 deletions
@@ -2,31 +2,45 @@
 Hyperlink
 =========

-Word allows hyperlinks to be placed in a document wherever paragraphs can appear.
+Word allows a hyperlink to be placed in a document wherever a paragraph can appear. The
+actual hyperlink element is a peer of |Run|.

-The target (URL) of a hyperlink may be external, such as a web site, or internal, to
-another location in the document.
+The link may be to an external resource such as a web site, or internal, to another
+location in the document. The link may also be a `mailto:` URI or a reference to a file
+on an accessible local or network filesystem.

 The visible text of a hyperlink is held in one or more runs. Technically a hyperlink can
 have zero runs, but this occurs only in contrived cases (otherwise there would be
 nothing to click on). As usual, each run can have its own distinct text formatting
 (font), so for example one word in the hyperlink can be bold, etc. By default, Word
-applies the built-in `Hyperlink` character style to a newly inserted hyperlink.
+applies the built-in `Hyperlink` character style to a newly inserted hyperlink. Like
+other text, the hyperlink text may often be broken into multiple runs as a result of
+edits in different "revision-save" editing sessions (between "Save" commands).

 Note that rendered page-breaks can occur in the middle of a hyperlink.

 A |Hyperlink| is a child of |Paragraph|, a peer of |Run|.


+TODO: What about URL-encoding/decoding (like %20) behaviors, if any?
+
+
 Candidate protocol
 ------------------

 An external hyperlink has an address and an optional anchor. An internal hyperlink has
-only an anchor. An anchor is also known as a *URI fragment* and follows a hash mark
-("#").
+only an anchor. An anchor is more precisely known as a *URI fragment* in a web URL and
+follows a hash mark ("#"). The fragment-separator hash character is not stored in the
+XML.
+
+Note that the anchor and address are stored in two distinct attributes, so you need to
+concatenate `.address` and `.anchor` like `f"{address}#{anchor}"` if you want the whole
+thing.

-Note that the anchor and URL are stored in two distinct attributes, so you need to
-concatenate `.address` and `.anchor` if you want the whole thing.
+Also note that Word does not rigorously separate a fragment in a web URI so it may
+appear as part of the address or separately in the anchor attribute, depending on how
+the hyperlink was authored. Hyperlinks inserted using the dialog-box seem to separate it
+and addresses typed into the document directly don't, based on my limited experience.

 .. highlight:: python

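The note in this hunk says Word may leave the fragment embedded in the address instead of storing it in the anchor attribute. A minimal sketch of how a reader of the XML could normalize the two cases, using only the standard library. `split_address` is a hypothetical helper name, not part of python-docx, and letting an explicitly stored anchor win over an embedded fragment is an assumption.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_address(address: str, anchor: str = "") -> Tuple[str, str]:
    """Return (address, fragment) with any embedded "#fragment" pulled out of `address`.

    Hypothetical helper for illustration. An explicitly stored `anchor`
    takes precedence over a fragment found inside the address (an assumption).
    """
    # urldefrag() splits "base#frag" into ("base", "frag"); without a "#"
    # it returns the URL unchanged and an empty fragment.
    base, embedded = urldefrag(address)
    return base, anchor or embedded


# Fragment typed directly into the address.
print(split_address("https://foo.com?q=bar#baz"))
# Fragment stored separately in the anchor attribute.
print(split_address("http://us.com", "about"))
# Internal hyperlink: no address, only an anchor.
print(split_address("", "Section_1"))
```

Once the parts are normalized this way, the `f"{address}#{anchor}"` concatenation from the note applies whenever the fragment is non-empty.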
@@ -49,6 +63,16 @@ concatenate `.address` and `.anchor` if you want the whole thing.
 >>> hyperlink.address
 'https://google.com/'

+**Access hyperlink fragment**::
+
+>>> hyperlink.fragment
+'introduction'
+
+**Access hyperlink history (visited or not, True means not visited yet)**::
+
+>>> hyperlink.history
+True
+
 **Access hyperlinks runs**::

 >>> hyperlink.runs
@@ -58,6 +82,11 @@ concatenate `.address` and `.anchor` if you want the whole thing.
 <docx.text.run.Run at 0x7f...>
 ]

+**Access hyperlink URL**::
+
+>>> hyperlink.url
+'https://us.com#introduction'
+
 **Determine whether a hyperlink contains a rendered page-break**::

 >>> hyperlink.contains_page_break
@@ -68,29 +97,31 @@ concatenate `.address` and `.anchor` if you want the whole thing.
 >>> hyperlink.text
 'an excellent Wikipedia article on ferrets'

-**Add an external hyperlink**::
+**Add an external hyperlink** (not yet implemented)::

 >>> hyperlink = paragraph.add_hyperlink(
-'About', address='http://us.com', anchor='about'
-)
+... 'About', address='http://us.com', fragment='about'
+... )
 >>> hyperlink
 <docx.text.hyperlink.Hyperlink at 0x7f...>
 >>> hyperlink.text
 'About'
 >>> hyperlink.address
 'http://us.com'
->>> hyperlink.anchor
+>>> hyperlink.fragment
 'about'
+>>> hyperlink.url
+'http://us.com#about'

 **Add an internal hyperlink (to a bookmark)**::

->>> hyperlink = paragraph.add_hyperlink('Section 1', anchor='Section_1')
+>>> hyperlink = paragraph.add_hyperlink('Section 1', fragment='Section_1')
 >>> hyperlink.text
 'Section 1'
->>> hyperlink.anchor
+>>> hyperlink.fragment
 'Section_1'
 >>> hyperlink.address
-None
+''

 **Modify hyperlink properties**::

@@ -183,8 +214,8 @@ file, keyed by the w:hyperlink@r:id attribute::
 <Relationship Id="rId4" Mode="External" Type="http://..." Target="http://google.com/"/>
 </Relationships>

-A hyperlink can contain multiple runs of text (and a whole lot of other
-stuff, including nested hyperlinks, at least as far as the schema indicates)::
+A hyperlink can contain multiple runs of text (and a whole lot of other stuff, at least
+as far as the schema indicates)::

 <w:p>
 <w:hyperlink r:id="rId2">
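This hunk touches two mechanics: the address lives in a relationship keyed by `w:hyperlink@r:id`, and the visible text can be spread over several runs. A rough sketch of both lookups with the standard library's `xml.etree.ElementTree`. The sample XML, the `example.com` target, and the `hyperlink_summary` name are all hypothetical and chosen for illustration; this is not how python-docx itself is implemented.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_RELS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hypothetical sample data, not taken from a real document.
RELS_XML = f"""\
<Relationships xmlns="{PKG_RELS}">
  <Relationship Id="rId2" Mode="External" Type="{R}/hyperlink" Target="http://example.com/"/>
</Relationships>"""

PARAGRAPH_XML = f"""\
<w:p xmlns:w="{W}" xmlns:r="{R}">
  <w:hyperlink r:id="rId2" w:anchor="intro">
    <w:r><w:t>an excellent </w:t></w:r>
    <w:r><w:rPr><w:b/></w:rPr><w:t>article</w:t></w:r>
  </w:hyperlink>
</w:p>"""


def hyperlink_summary(paragraph_xml, rels_xml):
    """Return (address, fragment, text) for the first hyperlink in the paragraph."""
    # Map relationship ids to their targets.
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in ET.fromstring(rels_xml).iter(f"{{{PKG_RELS}}}Relationship")
    }
    hlink = ET.fromstring(paragraph_xml).find(f"{{{W}}}hyperlink")
    # An internal hyperlink has no r:id, so the address falls back to "".
    address = targets.get(hlink.get(f"{{{R}}}id"), "")
    fragment = hlink.get(f"{{{W}}}anchor", "")
    # The visible text is the concatenation of the text in every run.
    text = "".join(t.text or "" for t in hlink.iter(f"{{{W}}}t"))
    return address, fragment, text


print(hyperlink_summary(PARAGRAPH_XML, RELS_XML))
```

The second run carries its own bold formatting, which is why the text has to be reassembled from the individual runs.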
@@ -256,97 +287,97 @@ Schema excerpt

 ::

-<xsd:complexType name="CT_P">
-<xsd:sequence>
-<xsd:element name="pPr" type="CT_PPr" minOccurs="0"/>
-<xsd:group ref="EG_PContent" minOccurs="0" maxOccurs="unbounded"/>
-</xsd:sequence>
-<xsd:attribute name="rsidRPr" type="ST_LongHexNumber"/>
-<xsd:attribute name="rsidR" type="ST_LongHexNumber"/>
-<xsd:attribute name="rsidDel" type="ST_LongHexNumber"/>
-<xsd:attribute name="rsidP" type="ST_LongHexNumber"/>
-<xsd:attribute name="rsidRDefault" type="ST_LongHexNumber"/>
-</xsd:complexType>
-
-<xsd:group name="EG_PContent"> <!-- denormalized -->
-<xsd:choice>
-<xsd:element name="r" type="CT_R"/>
-<xsd:element name="hyperlink" type="CT_Hyperlink"/>
-<xsd:element name="fldSimple" type="CT_SimpleField"/>
-<xsd:element name="sdt" type="CT_SdtRun"/>
-<xsd:element name="customXml" type="CT_CustomXmlRun"/>
-<xsd:element name="smartTag" type="CT_SmartTagRun"/>
-<xsd:element name="dir" type="CT_DirContentRun"/>
-<xsd:element name="bdo" type="CT_BdoContentRun"/>
-<xsd:element name="subDoc" type="CT_Rel"/>
-<xsd:group ref="EG_RunLevelElts"/>
-</xsd:choice>
-</xsd:group>
-
-<xsd:complexType name="CT_Hyperlink">
-<xsd:group ref="EG_PContent" minOccurs="0" maxOccurs="unbounded"/>
-<xsd:attribute name="tgtFrame" type="s:ST_String"/>
-<xsd:attribute name="tooltip" type="s:ST_String"/>
-<xsd:attribute name="docLocation" type="s:ST_String"/>
-<xsd:attribute name="history" type="s:ST_OnOff"/>
-<xsd:attribute name="anchor" type="s:ST_String"/>
-<xsd:attribute ref="r:id"/>
-</xsd:complexType>
-
-<xsd:group name="EG_RunLevelElts">
-<xsd:choice>
-<xsd:element name="proofErr" type="CT_ProofErr"/>
-<xsd:element name="permStart" type="CT_PermStart"/>
-<xsd:element name="permEnd" type="CT_Perm"/>
-<xsd:element name="bookmarkStart" type="CT_Bookmark"/>
-<xsd:element name="bookmarkEnd" type="CT_MarkupRange"/>
-<xsd:element name="moveFromRangeStart" type="CT_MoveBookmark"/>
-<xsd:element name="moveFromRangeEnd" type="CT_MarkupRange"/>
-<xsd:element name="moveToRangeStart" type="CT_MoveBookmark"/>
-<xsd:element name="moveToRangeEnd" type="CT_MarkupRange"/>
-<xsd:element name="commentRangeStart" type="CT_MarkupRange"/>
-<xsd:element name="commentRangeEnd" type="CT_MarkupRange"/>
-<xsd:element name="customXmlInsRangeStart" type="CT_TrackChange"/>
-<xsd:element name="customXmlInsRangeEnd" type="CT_Markup"/>
-<xsd:element name="customXmlDelRangeStart" type="CT_TrackChange"/>
-<xsd:element name="customXmlDelRangeEnd" type="CT_Markup"/>
-<xsd:element name="customXmlMoveFromRangeStart" type="CT_TrackChange"/>
-<xsd:element name="customXmlMoveFromRangeEnd" type="CT_Markup"/>
-<xsd:element name="customXmlMoveToRangeStart" type="CT_TrackChange"/>
-<xsd:element name="customXmlMoveToRangeEnd" type="CT_Markup"/>
-<xsd:element name="ins" type="CT_RunTrackChange"/>
-<xsd:element name="del" type="CT_RunTrackChange"/>
-<xsd:element name="moveFrom" type="CT_RunTrackChange"/>
-<xsd:element name="moveTo" type="CT_RunTrackChange"/>
-<xsd:group ref="EG_MathContent" minOccurs="0" maxOccurs="unbounded"/>
-</xsd:choice>
-</xsd:group>
-
-<xsd:complexType name="CT_R">
-<xsd:sequence>
-<xsd:group ref="EG_RPr" minOccurs="0"/>
-<xsd:group ref="EG_RunInnerContent" minOccurs="0" maxOccurs="unbounded"/>
-</xsd:sequence>
-<xsd:attribute name="rsidRPr" type="ST_LongHexNumber"/>
-<xsd:attribute name="rsidDel" type="ST_LongHexNumber"/>
-<xsd:attribute name="rsidR" type="ST_LongHexNumber"/>
-</xsd:complexType>
-
-<xsd:simpleType name="ST_OnOff">
-<xsd:union memberTypes="xsd:boolean ST_OnOff1"/>
-</xsd:simpleType>
-
-<xsd:simpleType name="ST_OnOff1">
-<xsd:restriction base="xsd:string">
-<xsd:enumeration value="on"/>
-<xsd:enumeration value="off"/>
-</xsd:restriction>
-</xsd:simpleType>
-
-<xsd:simpleType name="ST_RelationshipId">
-<xsd:restriction base="xsd:string"/>
-</xsd:simpleType>
-
-<xsd:simpleType name="ST_String">
-<xsd:restriction base="xsd:string"/>
-</xsd:simpleType>
+<xsd:complexType name="CT_P">
+<xsd:sequence>
+<xsd:element name="pPr" type="CT_PPr" minOccurs="0"/>
+<xsd:group ref="EG_PContent" minOccurs="0" maxOccurs="unbounded"/>
+</xsd:sequence>
+<xsd:attribute name="rsidRPr" type="ST_LongHexNumber"/>
+<xsd:attribute name="rsidR" type="ST_LongHexNumber"/>
+<xsd:attribute name="rsidDel" type="ST_LongHexNumber"/>
+<xsd:attribute name="rsidP" type="ST_LongHexNumber"/>
+<xsd:attribute name="rsidRDefault" type="ST_LongHexNumber"/>
+</xsd:complexType>
+
+<xsd:group name="EG_PContent"> <!-- denormalized -->
+<xsd:choice>
+<xsd:element name="r" type="CT_R"/>
+<xsd:element name="hyperlink" type="CT_Hyperlink"/>
+<xsd:element name="fldSimple" type="CT_SimpleField"/>
+<xsd:element name="sdt" type="CT_SdtRun"/>
+<xsd:element name="customXml" type="CT_CustomXmlRun"/>
+<xsd:element name="smartTag" type="CT_SmartTagRun"/>
+<xsd:element name="dir" type="CT_DirContentRun"/>
+<xsd:element name="bdo" type="CT_BdoContentRun"/>
+<xsd:element name="subDoc" type="CT_Rel"/>
+<xsd:group ref="EG_RunLevelElts"/>
+</xsd:choice>
+</xsd:group>
+
+<xsd:complexType name="CT_Hyperlink">
+<xsd:group ref="EG_PContent" minOccurs="0" maxOccurs="unbounded"/>
+<xsd:attribute name="tgtFrame" type="s:ST_String"/>
+<xsd:attribute name="tooltip" type="s:ST_String"/>
+<xsd:attribute name="docLocation" type="s:ST_String"/>
+<xsd:attribute name="history" type="s:ST_OnOff"/>
+<xsd:attribute name="anchor" type="s:ST_String"/>
+<xsd:attribute ref="r:id"/>
+</xsd:complexType>
+
+<xsd:group name="EG_RunLevelElts">
+<xsd:choice>
+<xsd:element name="proofErr" type="CT_ProofErr"/>
+<xsd:element name="permStart" type="CT_PermStart"/>
+<xsd:element name="permEnd" type="CT_Perm"/>
+<xsd:element name="bookmarkStart" type="CT_Bookmark"/>
+<xsd:element name="bookmarkEnd" type="CT_MarkupRange"/>
+<xsd:element name="moveFromRangeStart" type="CT_MoveBookmark"/>
+<xsd:element name="moveFromRangeEnd" type="CT_MarkupRange"/>
+<xsd:element name="moveToRangeStart" type="CT_MoveBookmark"/>
+<xsd:element name="moveToRangeEnd" type="CT_MarkupRange"/>
+<xsd:element name="commentRangeStart" type="CT_MarkupRange"/>
+<xsd:element name="commentRangeEnd" type="CT_MarkupRange"/>
+<xsd:element name="customXmlInsRangeStart" type="CT_TrackChange"/>
+<xsd:element name="customXmlInsRangeEnd" type="CT_Markup"/>
+<xsd:element name="customXmlDelRangeStart" type="CT_TrackChange"/>
+<xsd:element name="customXmlDelRangeEnd" type="CT_Markup"/>
+<xsd:element name="customXmlMoveFromRangeStart" type="CT_TrackChange"/>
+<xsd:element name="customXmlMoveFromRangeEnd" type="CT_Markup"/>
+<xsd:element name="customXmlMoveToRangeStart" type="CT_TrackChange"/>
+<xsd:element name="customXmlMoveToRangeEnd" type="CT_Markup"/>
+<xsd:element name="ins" type="CT_RunTrackChange"/>
+<xsd:element name="del" type="CT_RunTrackChange"/>
+<xsd:element name="moveFrom" type="CT_RunTrackChange"/>
+<xsd:element name="moveTo" type="CT_RunTrackChange"/>
+<xsd:group ref="EG_MathContent" minOccurs="0" maxOccurs="unbounded"/>
+</xsd:choice>
+</xsd:group>
+
+<xsd:complexType name="CT_R">
+<xsd:sequence>
+<xsd:group ref="EG_RPr" minOccurs="0"/>
+<xsd:group ref="EG_RunInnerContent" minOccurs="0" maxOccurs="unbounded"/>
+</xsd:sequence>
+<xsd:attribute name="rsidRPr" type="ST_LongHexNumber"/>
+<xsd:attribute name="rsidDel" type="ST_LongHexNumber"/>
+<xsd:attribute name="rsidR" type="ST_LongHexNumber"/>
+</xsd:complexType>
+
+<xsd:simpleType name="ST_OnOff">
+<xsd:union memberTypes="xsd:boolean ST_OnOff1"/>
+</xsd:simpleType>
+
+<xsd:simpleType name="ST_OnOff1">
+<xsd:restriction base="xsd:string">
+<xsd:enumeration value="on"/>
+<xsd:enumeration value="off"/>
+</xsd:restriction>
+</xsd:simpleType>
+
+<xsd:simpleType name="ST_RelationshipId">
+<xsd:restriction base="xsd:string"/>
+</xsd:simpleType>
+
+<xsd:simpleType name="ST_String">
+<xsd:restriction base="xsd:string"/>
+</xsd:simpleType>
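The schema excerpt types the `history` attribute as `ST_OnOff`, a union of `xsd:boolean` and the `on`/`off` enumeration. A small sketch of how such a value could be mapped to a Python `bool`. `parse_st_onoff` is a hypothetical name and not the python-docx implementation; rejecting any other string with `ValueError` is an assumption about the desired error handling.

```python
# Lexical forms allowed by xsd:boolean plus the ST_OnOff1 enumeration.
_TRUE_VALUES = frozenset({"1", "true", "on"})
_FALSE_VALUES = frozenset({"0", "false", "off"})


def parse_st_onoff(value: str) -> bool:
    """Convert an ST_OnOff attribute string to a bool (hypothetical helper)."""
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    # Anything else is outside the union described by the schema excerpt.
    raise ValueError(f"not a valid ST_OnOff value: {value!r}")


for raw in ("1", "on", "true", "0", "off", "false"):
    print(raw, parse_st_onoff(raw))
```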

features/hlk-props.feature

Lines changed: 19 additions & 0 deletions
@@ -19,6 +19,11 @@ Feature: Access hyperlink properties
 | one | True |


+Scenario: Hyperlink.fragment has the URI fragment of the hyperlink
+Given a hyperlink having a URI fragment
+Then hyperlink.fragment is the URI fragment of the hyperlink
+
+
 Scenario Outline: Hyperlink.runs contains Run for each run in hyperlink
 Given a hyperlink having <zero-or-more> runs
 Then hyperlink.runs has length <value>
@@ -33,3 +38,17 @@ Feature: Access hyperlink properties
 Scenario: Hyperlink.text has the visible text of the hyperlink
 Given a hyperlink
 Then hyperlink.text is the visible text of the hyperlink
+
+
+Scenario Outline: Hyperlink.url is the full URL of an internet hyperlink
+Given a hyperlink having address <address> and fragment <fragment>
+Then hyperlink.url is <url>
+
+Examples: Hyperlink.url cases
+| address                   | fragment       | url                       |
+| ''                        | linkedBookmark | ''                        |
+| https://foo.com           | ''             | https://foo.com           |
+| https://foo.com?q=bar     | ''             | https://foo.com?q=bar     |
+| http://foo.com/           | intro          | http://foo.com/#intro     |
+| https://foo.com?q=bar#baz | ''             | https://foo.com?q=bar#baz |
+| court-exif.jpg            | ''             | court-exif.jpg            |
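The examples table pins down how `url` relates to `address` and `fragment`: an empty address yields an empty URL even when a fragment is present, and the `#` separator is added only when both parts are non-empty. One way to express those rules, as a sketch that reproduces the table rows. `compose_url` is a hypothetical standalone function, not the property added by this commit.

```python
def compose_url(address: str, fragment: str) -> str:
    """Join address and fragment following the rows of the examples table."""
    # An internal hyperlink (no address) has no URL, even with a fragment.
    if not address:
        return ""
    # The "#" separator is not stored in the XML, so it is added here.
    return f"{address}#{fragment}" if fragment else address


# (address, fragment, expected url) rows copied from the Examples table.
CASES = [
    ("", "linkedBookmark", ""),
    ("https://foo.com", "", "https://foo.com"),
    ("https://foo.com?q=bar", "", "https://foo.com?q=bar"),
    ("http://foo.com/", "intro", "http://foo.com/#intro"),
    ("https://foo.com?q=bar#baz", "", "https://foo.com?q=bar#baz"),
    ("court-exif.jpg", "", "court-exif.jpg"),
]

for address, fragment, expected in CASES:
    assert compose_url(address, fragment) == expected
```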
