3 files changed, +8 -3 lines changed

@@ -13,14 +13,17 @@ Based on:
  - Ruby port by starrhorne and iterationlabs
  - Python port by gfxmonk (https://github.com/gfxmonk/python-readability, based on BeautifulSoup)
  - Decruft effort to move to lxml (http://www.minvolai.com/blog/decruft-arc90s-readability-in-python/)
+ - "BR to P" fix from readability.js which improves quality for smaller texts.
+ - Github users contributions.
 
 Usage:
 
+    from readability.readability import Document
     import urllib
     html = urllib.urlopen(url).read()
     readable_article = Document(html).summary()
     readable_title = Document(html).short_title()
 
 Command-line usage:
 
-    python -m readability.readability -u http://yoursite.com/yourpage
+    python -m readability.readability -u http://pypi.python.org/pypi/readability-lxml

@@ -120,7 +120,9 @@ def summary(self):
                         continue
                     else:
                         logging.debug("Ruthless and lenient parsing did not work. Returning raw html")
-                        article = self.html.find('body') or self.html
+                        article = self.html.find('body')
+                        if article is None:
+                            article = self.html
 
                 cleaned_article = self.sanitize(article, candidates)
                 of_acceptable_length = len(cleaned_article or '') >= (self.options['retry_length'] or self.RETRY_LENGTH)
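
The `summary()` hunk replaces `find('body') or self.html` with an explicit `is None` check. A likely motivation is that lxml elements have historically been falsy when they have no children, so the `or` form would discard an empty `<body>` that was actually found. The sketch below illustrates the difference with a hypothetical `FakeElement` class standing in for an lxml element; it is not lxml's API, and the helper names are made up for illustration.

```python
class FakeElement:
    """Hypothetical stand-in for an element whose truth value follows its child count."""

    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = list(children)

    def __len__(self):
        # With no __bool__ defined, Python uses __len__ for truthiness,
        # so a childless element evaluates as False.
        return len(self.children)

    def find(self, tag):
        # Return the first direct child with a matching tag, or None.
        for child in self.children:
            if child.tag == tag:
                return child
        return None


def pick_with_or(root):
    # Old form: a found-but-empty <body> is falsy, so the root is returned instead.
    return root.find("body") or root


def pick_with_none_check(root):
    # New form: fall back to the root only when no <body> exists at all.
    article = root.find("body")
    if article is None:
        article = root
    return article


empty_body = FakeElement("body")
html_with_body = FakeElement("html", [FakeElement("head"), empty_body])
html_without_body = FakeElement("html", [FakeElement("head")])
```

With these stand-ins, the two helpers agree when `<body>` is missing and differ only when it is present but empty, which is the case the hunk appears to target.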

@@ -3,7 +3,7 @@
 
 setup(
     name="readability-lxml",
-    version="0.1dev",
+    version="0.2",
     author="Yuri Baburov",
     author_email="burchik@gmail.com",
     description="python port of arc90's readability bookmarklet",