Now that I have the files I need to start digging all over the internet for statistical data, I will probably need to start with a domain name to display the information on. I have already bought a bunch of domain names but cannot make up my mind yet; I will let you know as soon as I do.
I would like the name to reflect my intention to collect statistics and provide a history for every statistic. As I said, I will let you know when I come up with a nice, available name.
So, the kind of questions I want answered: for a nameserver, which networks has the nameserver machine moved to over time, and which domain names has it served over time? Which nameservers has a domain used over its lifetime? How many websites (domains) is a given web server serving, and how has that number changed from year to year? And other things like that.
Designing such a system has proved far more challenging than I thought. The interconnections between everything are a bit complicated (taking efficiency into account): every host name can have multiple IPs, and every IP can have multiple names. An example of a question that needs to be answered before the design is settled: if a website changed nameservers but the new NS has the same IP as the old one (the same physical server on the same network), has it really changed nameservers? And no, this is not like the tree that falls in the woods; these are real questions.
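To make the many-to-many relationships and the nameserver question above concrete, here is a minimal Python sketch. It assumes (hypothetically) that each crawl stores a dated snapshot of a domain's nameserver host names and the IPs they resolved to at that time; `NsSnapshot`, `classify_change` and `build_indexes` are illustrative names, and the labels returned are just one possible policy for answering "has it really changed nameservers?".

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NsSnapshot:
    """Hypothetical record: one dated observation of a domain's delegation."""
    seen: date
    ns_names: frozenset  # nameserver host names at that date
    ns_ips: frozenset    # IPs those host names resolved to at that date


def classify_change(old: NsSnapshot, new: NsSnapshot) -> str:
    """One possible policy: compare by name and by IP separately."""
    names_changed = old.ns_names != new.ns_names
    ips_changed = old.ns_ips != new.ns_ips
    if not names_changed and not ips_changed:
        return "unchanged"
    if names_changed and not ips_changed:
        return "renamed"   # new NS names, same machines (same IPs)
    if ips_changed and not names_changed:
        return "moved"     # same NS names, now answering from different IPs
    return "changed"       # both the names and the IPs differ


def build_indexes(pairs):
    """Index (host name, IP) pairs in both directions, since the relation is many-to-many."""
    name_to_ips = defaultdict(set)
    ip_to_names = defaultdict(set)
    for name, ip in pairs:
        name_to_ips[name].add(ip)
        ip_to_names[ip].add(name)
    return name_to_ips, ip_to_names
```

For example, a domain moving from `ns1.example.com` to `ns1.example.net` while both resolve to `192.0.2.10` would be classified as `"renamed"` rather than `"changed"`. Keeping the name comparison and the IP comparison separate lets the statistics report either view instead of baking one answer into the stored data.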
So the statistics that need to be collected and used are complicated, but there are other issues I must deal with as well, to protect the internet and keep it safe:
1- Spammers: somehow, the statistics MUST be protected from spammers (and other types of crooks).
2- Scalability: the machines where the terabytes of data are stored and where the processing is done should not be the same as the web server that serves that data, and the web server that serves the text data should not be the same as the one that serves the graphics, etc. Cheaper networks should be used for some tasks and more expensive, faster networks for others; processing-intensive operations should be done in one place and data fetch and display operations in another; and synchronisation between them should be automatic.
For the above, these are the guidelines I have come up with so far:
* For any client IP, a captcha image must appear after x page views of pages where domain names are displayed.
* Google and other spiders must be able to crawl and index the website and serve relevant, useful pages to their visitors, but must not provide a way for spammers to mine the data. That is, search engines must NOT cache (or display cached copies of) the pages where domain names appear.
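Both guidelines can be sketched together in Python. This is a minimal in-memory illustration, assuming a hypothetical threshold of 20 for "x" and hypothetical names (`ViewLimiter`, `robots_headers`); a real deployment would need shared storage across servers and counters that expire over time. The `X-Robots-Tag: noarchive` HTTP header (or the equivalent `<meta name="robots" content="noarchive">` tag) is a standard way to ask search engines not to show cached copies while still letting them index the page; it is a request honoured by compliant crawlers, not an enforcement mechanism.

```python
from collections import defaultdict

CAPTCHA_AFTER = 20  # hypothetical value for "x" page views


class ViewLimiter:
    """Counts domain-listing page views per client IP (in memory only)."""

    def __init__(self, threshold: int = CAPTCHA_AFTER):
        self.threshold = threshold
        self.views = defaultdict(int)

    def needs_captcha(self, client_ip: str) -> bool:
        """Record one view; True once this IP has gone past the threshold."""
        self.views[client_ip] += 1
        return self.views[client_ip] > self.threshold

    def captcha_solved(self, client_ip: str) -> None:
        """Reset the counter after a correct captcha answer."""
        self.views[client_ip] = 0


def robots_headers(lists_domains: bool) -> dict:
    """Ask compliant search engines not to keep cached copies of domain listings."""
    return {"X-Robots-Tag": "noarchive"} if lists_domains else {}
```

With `ViewLimiter(threshold=2)`, the first two views from an IP return `False` and the third returns `True`, so the captcha page would be shown from that point until `captcha_solved` is called. Pages without domain names get no extra header, so they stay fully cacheable.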
I will keep everyone posted right here on all such issues.
For now, I need to go and create an open-source, FAST IP-2-Country system for use with this system, along with a way to get help building it (where people report their geographic location).
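A common way to make IP-to-country lookups fast is to keep the address ranges sorted by their start address and binary-search them, so each lookup costs O(log n). The sketch below assumes IPv4 only, non-overlapping ranges, and a made-up input format of (first IP, last IP, country code); `IpToCountry` is a hypothetical name, and the sample data mentioned afterwards uses documentation address blocks with placeholder codes, not real assignments.

```python
import bisect
import ipaddress


class IpToCountry:
    """Sorted, non-overlapping IPv4 ranges searched with binary search."""

    def __init__(self, ranges):
        # ranges: iterable of (first_ip, last_ip, country_code) strings
        rows = sorted(
            (int(ipaddress.IPv4Address(first)), int(ipaddress.IPv4Address(last)), code)
            for first, last, code in ranges
        )
        self.starts = [row[0] for row in rows]
        self.ends = [row[1] for row in rows]
        self.codes = [row[2] for row in rows]

    def lookup(self, ip: str):
        """Return the country code for ip, or None if no range covers it."""
        n = int(ipaddress.IPv4Address(ip))
        # Index of the last range whose start is <= n
        i = bisect.bisect_right(self.starts, n) - 1
        if i >= 0 and n <= self.ends[i]:
            return self.codes[i]
        return None
```

For instance, a table built from `("192.0.2.0", "192.0.2.255", "XA")` and `("198.51.100.0", "198.51.100.255", "XB")` returns `"XA"` for `192.0.2.77` and `None` for an address outside both ranges. Storing starts, ends and codes in parallel lists keeps the search to a single `bisect` call; user-reported locations could then be merged in as new or corrected ranges.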
Stay tuned for some serious internet statistics.