Commit graph

19 commits

Author       SHA1        Message                                   Date
Peter Vacho  ac0c2a8c21  Store ranking into the cache              2024-11-25 11:50:06 +01:00
Peter Vacho  50543bd22a  Handle empty dataframes                   2024-11-25 11:37:12 +01:00
Peter Vacho  82178b4c6e  Use a more readable name for cache files  2024-11-25 10:58:54 +01:00
Peter Vacho  ea03f0cf75  Don't download non-html content           2024-11-25 10:55:44 +01:00
Peter Vacho  fd563ef46c  Use dataframe based caching               2024-11-25 10:55:43 +01:00
Peter Vacho  56947296b5  Handle parsing errors                     2024-11-24 22:09:52 +01:00
Peter Vacho  422b0d5880  Improve comments in pagerank algo         2024-11-24 22:09:24 +01:00
Peter Vacho  2fdb600c50  Show page ranks in scientific notation    2024-11-24 19:53:23 +01:00
Peter Vacho  bdb9529b77  Add caching support                       2024-11-24 19:53:08 +01:00
Peter Vacho  299350a90a  Print total amt of found urls             2024-11-24 19:46:45 +01:00
Peter Vacho  726b60eb82  Print top 50 URLs & their scores          2024-11-24 19:46:37 +01:00
Peter Vacho  e7f0b5ce4e  Add pagerank algorithm                    2024-11-24 19:04:21 +01:00
Peter Vacho  e853747cdd  Add logging & better exc handling         2024-11-24 18:11:38 +01:00
Peter Vacho  16373bc014  Fix depth handling                        2024-11-24 17:23:31 +01:00
Peter Vacho  bd5347c299  Support following redirects               2024-11-19 20:26:15 +01:00
Peter Vacho  9dfac02aab  Allow exception suppressing               2024-11-19 20:16:58 +01:00
Peter Vacho  7f9798ed28  Use regex for filter condition            2024-11-19 20:14:52 +01:00
Peter Vacho  47c9a9f555  Basic link scraper                        2024-11-19 19:51:44 +01:00
Peter Vacho  b1e815e588  Initial commit                            2024-11-18 14:13:51 +01:00