Peter Vacho | ea03f0cf75 | Don't download non-html content | 2024-11-25 10:55:44 +01:00
Peter Vacho | fd563ef46c | Use dataframe based caching | 2024-11-25 10:55:43 +01:00
Peter Vacho | 56947296b5 | Handle parsing errors | 2024-11-24 22:09:52 +01:00
Peter Vacho | 422b0d5880 | Improve comments in pagerank algo | 2024-11-24 22:09:24 +01:00
Peter Vacho | 2fdb600c50 | Show page ranks in scientific notation | 2024-11-24 19:53:23 +01:00
Peter Vacho | bdb9529b77 | Add caching support | 2024-11-24 19:53:08 +01:00
Peter Vacho | 299350a90a | Print total amt of found urls | 2024-11-24 19:46:45 +01:00
Peter Vacho | 726b60eb82 | Print top 50 URLs & their scores | 2024-11-24 19:46:37 +01:00
Peter Vacho | e7f0b5ce4e | Add pagerank algorithm | 2024-11-24 19:04:21 +01:00
Peter Vacho | e853747cdd | Add logging & better exc handling | 2024-11-24 18:11:38 +01:00
Peter Vacho | 16373bc014 | Fix depth handling | 2024-11-24 17:23:31 +01:00
Peter Vacho | bd5347c299 | Support following redirects | 2024-11-19 20:26:15 +01:00
Peter Vacho | 9dfac02aab | Allow exception suppressing | 2024-11-19 20:16:58 +01:00
Peter Vacho | 7f9798ed28 | Use regex for filter condition | 2024-11-19 20:14:52 +01:00
Peter Vacho | 47c9a9f555 | Basic link scraper | 2024-11-19 19:51:44 +01:00
Peter Vacho | b1e815e588 | Initial commit | 2024-11-18 14:13:51 +01:00