Messing around with link scraping my blog. · devopsrudr/practice-python@8ccce90 · GitHub

Commit 8ccce90

Messing around with link scraping my blog.
1 parent 313bfd2 commit 8ccce90

1 file changed: +22 -0 lines changed

experiments/blog_posts.py

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
from bs4 import BeautifulSoup
import requests


def main():
    url = 'https://googleyasheck.com'

    req = requests.get(url)
    if req.status_code == requests.codes.ok:
        html = req.text

        soup = BeautifulSoup(html, 'html.parser')
        for page in soup.find_all('article', 'post'):
            href = page.h2.a['href']
            title = page.h2.text.strip()
            date = page.footer.time.text.strip()

            print('{title} ({date}): {domain}{url}'.format(title=title, url=href, date=date, domain=url))


if __name__ == '__main__':
    main()
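
The print call in this commit builds each link by concatenating the base URL with the scraped href. That works only when every href is a root-relative path starting with `/`. A minimal sketch of a more forgiving approach, assuming the standard library's `urllib.parse.urljoin` is acceptable here; the helper name `absolute_link` and the sample hrefs are hypothetical and not part of the commit:

```python
from urllib.parse import urljoin


def absolute_link(base_url, href):
    """Resolve an href scraped from a page against that page's base URL."""
    # urljoin leaves absolute hrefs alone and inserts the missing slash
    # when the href is a relative path.
    return urljoin(base_url, href)


if __name__ == '__main__':
    base = 'https://googleyasheck.com'
    # Hypothetical hrefs for illustration; not links taken from the blog.
    for href in ('/some-post/', 'some-post/', 'https://example.com/elsewhere/'):
        print(absolute_link(base, href))
```

In `main()`, this would replace the `{domain}{url}` part of the format string with a single already-resolved link, so an absolute href is not prefixed with the blog's domain a second time.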
