3 files changed: +62 -63 lines changed
This file was deleted.
Original file line number Diff line number Diff line change
1
+ python-readability
2
+ ==================
3
+
4
+ Given a html document, it pulls out the main body text and cleans it up.
5
+
6
+ This is a python port of a ruby port of `arc90's readability
7
+ project <http://lab.arc90.com/experiments/readability/>`__.
8
+
9
+ Installation
10
+ ------------
11
+
12
+ It's easy using ``pip``, just run:
13
+
14
+ ::
15
+
16
+ $ pip install readability-lxml
17
+
18
+ Usage
19
+ -----
20
+
21
+ ::
22
+
23
+ >> import requests
24
+ >> from readability import Document
25
+ >>
26
+ >> response = requests.get('http://example.com')
27
+ >> doc = Document(response.text)
28
+ >> doc.title()
29
+ >> 'Example Domain'
30
+
31
+ Change Log
32
+ ----------
33
+
34
+ - 0.3 Added Document.encoding, positive\_keywords and
35
+ negative\_keywords
36
+ - 0.4 Added Videos loading and allowed more images per paragraph
37
+ - 0.5 Preparing a release to support Python versions 2.6, 2.7, 3.3 and
38
+ 3.4
39
+ - 0.6 Finally a release which supports Python versions 2.6, 2.7, 3.3
40
+ and 3.4
41
+
42
+ Licensing
43
+ =========
44
+
45
+ This code is under `the Apache License
46
+ 2.0 <http://www.apache.org/licenses/LICENSE-2.0>`__ license.
47
+
48
+ Thanks to
49
+ ---------
50
+
51
+ - Latest
52
+ `readability.js <https://github.com/MHordecki/readability-redux/blob/master/readability/readability.js>`__
53
+ - Ruby port by starrhorne and iterationlabs
54
+ - `Python port <https://github.com/gfxmonk/python-readability>`__ by
55
+ gfxmonk
56
+ - `Decruft
57
+ effort <http://www.minvolai.com/blog/decruft-arc90s-readability-in-python/>`__
58
+ to move to lxml
59
+ - "BR to P" fix from readability.js which improves quality for smaller
60
+ texts
61
+ - Github users contributions.
Original file line number Diff line number Diff line change
19
19
author_email="burchik@gmail.com",
20
20
description="fast html to text parser (article readability tool) with python3 support",
21
21
test_suite="tests.test_article_only",
22
- long_description=open("README").read(),
22
+ long_description=open("README.rst").read(),
23
23
license="Apache License 2.0",
24
24
url="http://github.com/buriy/python-readability",
25
25
packages=['readability', 'readability.compat'],
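Review note (a possible follow-up, not part of this commit): `open("README.rst").read()` resolves the path against the current working directory, relies on the platform default encoding, and leaves the file handle to be closed by the garbage collector. A minimal sketch of a more defensive reader is below. The helper name `read_long_description` and its `base_dir` parameter are hypothetical and do not appear in the repository; `io.open` is used on the assumption that the Python 2.6/2.7 support mentioned in the change log still matters.

```python
import io
import os


def read_long_description(filename="README.rst", base_dir=None):
    """Return the text of `filename` found in `base_dir`.

    Hypothetical helper for illustration only. When `base_dir` is omitted,
    the directory containing this script is used, so the lookup does not
    depend on the directory the installer happens to run from.
    """
    if base_dir is None:
        base_dir = os.path.dirname(os.path.abspath(__file__))
    path = os.path.join(base_dir, filename)
    # io.open accepts an explicit encoding on both Python 2 and Python 3,
    # and the with-block closes the file deterministically.
    with io.open(path, encoding="utf-8") as handle:
        return handle.read()
```

In `setup.py` this could then be used as `long_description=read_long_description(),`, assuming the README is stored as UTF-8.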