Python: how to write a Simple Web Crawler http://xahlee.info/perl-python/python_simple_web_crawler.html

Comments and improvements welcome.
Here's how to write a simple web crawler in Python.

# -*- coding: utf-8 -*-
# python 2

# crawl a website, list all urls under a specific given path

inputURL = "http://ergoemacs.github.io/ergoemacs-mode/"

resultUrl = {inputURL:False}
# key is a url we want. value is True or False.
 
moreToCrawl() is called twice.

while True:
    toCrawl = moreToCrawl()
    if not toCrawl:
        break
    processOneUrl(toCrawl)
    ...

might be better
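
For context, the loop above can be run end to end with stand-in helpers. This is only a sketch: the `fakeLinks` table and the bodies of `moreToCrawl` and `processOneUrl` are assumptions made for illustration (the original script fetches real pages), and it is written for Python 3 rather than Python 2.

```python
# Hypothetical in-memory "site": each url maps to the urls it links to.
fakeLinks = {
    "/a/": ["/a/x", "/a/y"],
    "/a/x": ["/a/y", "/a/"],
    "/a/y": [],
}

# key is a url we want. value is True once it has been crawled.
resultUrl = {"/a/": False}

def moreToCrawl():
    """Return one url not yet crawled, or False if none is left."""
    for url, crawled in resultUrl.items():
        if not crawled:
            return url
    return False

def processOneUrl(url):
    """Mark url as crawled and record any newly seen links (assumed behavior)."""
    resultUrl[url] = True
    for link in fakeLinks.get(url, []):
        resultUrl.setdefault(link, False)

while True:
    toCrawl = moreToCrawl()  # called once per pass
    if not toCrawl:
        break
    processOneUrl(toCrawl)

print(sorted(resultUrl))  # → ['/a/', '/a/x', '/a/y']
```

Note that `moreToCrawl` returns before the dict is modified, so the dict never changes size while it is being iterated.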
Xah Lee
 
+Sorawee Porncharoenwase super. Thanks a lot. I updated.

That part had me working on it for a while. Originally, I was trying to figure out how to loop over a hash table that gets updated during the loop; I still haven't figured that part out. Any idea if that's possible? I was thinking it should be possible using views.
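
For reference on looping over a hash table while it grows: in CPython, adding keys to a dict while iterating over it raises `RuntimeError`, and dictionary views do not avoid this because they reflect the live dict. One common workaround is to keep a separate queue of pending urls next to the dict. A minimal sketch, assuming a hypothetical `getLinks` function and link table in place of real page fetching (Python 3):

```python
from collections import deque

# Hypothetical link table standing in for real page fetches.
fakeLinks = {
    "/a/": ["/a/x", "/a/y"],
    "/a/x": ["/a/"],
    "/a/y": ["/a/z"],
    "/a/z": [],
}

def getLinks(url):
    """Assumed helper: return the urls linked from url."""
    return fakeLinks.get(url, [])

# Adding keys while iterating a dict (or its keys view) fails.
d = {"/a/": False}
try:
    for url in d.keys():
        d[url + "new"] = False
    grewDuringLoop = True
except RuntimeError:
    grewDuringLoop = False  # "dictionary changed size during iteration"

def crawl(startUrl):
    """Visit every reachable url once, using a queue instead of
    iterating the dict that is being updated."""
    seen = {startUrl: False}      # url -> True once crawled
    pending = deque([startUrl])   # urls still to crawl
    while pending:
        url = pending.popleft()
        seen[url] = True
        for link in getLinks(url):
            if link not in seen:
                seen[link] = False
                pending.append(link)
    return seen

result = crawl("/a/")
print(sorted(result))  # → ['/a/', '/a/x', '/a/y', '/a/z']
```

The queue only ever grows at one end and shrinks at the other, so nothing is iterated while it is being modified.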
Xah Lee
 
+Craig Addyman that seems odd. I can't figure out why that would happen. Were you running this script too often, or without sleep, so that GitHub temporarily banned your IP?
 
No, I tried it on my own site and thought I must have done something wrong, so I then copied your code completely and still got only the one result.