November 7, 2000
Faults With Counting Internet & Website Traffic
by Sidd Mukherjee, PhD

as I, and others, have said ad nauseam: 'calculating the number of viewers from webserver logs is like a radio station trying to measure the number of listeners from the power broadcast by the antenna'

1 a) that they don't sample all viewers is certainly true... but if you do good sampling and have a nice 'smooth' population, good statistics will save you. but see below

1 b) they say they are sampling from the most visited sites... and then they trot out their own numbers to prove it. but I think that in the 1 Megahit+ a day class they probably are sampling at the servers at Exodus and the rest of the colocation areas, and may even have an idea

this is because bandwidth in the 1 Megahit/day class is still not widely available... right now they may have enough sampling sites at major ISPs where this kinda bandwidth is available ... but large bandwidth is coming, and as it approaches their numbers will get further away from reality ... unless they put a sampling box on every machine in the world

and, we won't talk about distributed services like Gnutella, Napster, and Freenet, to which these methods cannot apply

but, they are almost certainly undersampling the sites that get 0.1 Megahits/day .. like us
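a back-of-envelope sketch of the undersampling point, assuming the raters use a simple random panel of users (an assumption; their actual methods are not described here). under that assumption, the relative error of a panel estimate for a site reached by a fraction p of users is roughly sqrt((1 - p) / (n * p)) for a panel of n users. the panel size and reach figures below are made up for illustration.

```python
import math

def relative_standard_error(reach, panel_size):
    """Relative standard error of a panel-based estimate of a site's reach.

    Assumes a simple random sample of `panel_size` users, each of whom
    visits the site independently with probability `reach` (binomial model).
    """
    if not 0 < reach <= 1:
        raise ValueError("reach must be in (0, 1]")
    if panel_size <= 0:
        raise ValueError("panel_size must be positive")
    return math.sqrt((1 - reach) / (panel_size * reach))

# Hypothetical numbers, for illustration only.
PANEL_SIZE = 50_000
for label, reach in [("big generalist site", 0.20), ("small niche site", 0.0002)]:
    rse = relative_standard_error(reach, PANEL_SIZE)
    print(f"{label}: reach={reach:.4%}, relative error ~ {rse:.1%}")
```

with these made-up inputs the big site comes out with an error of about 1 percent and the small one with about 30 percent: the same panel that is fine for the giants is nearly useless for the small fry.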

but, I don't think that 1 a) holds .. the internet user population is not smooth ... it is rather a class of disparate groups, like on Usenet, with limited overlapping interests. so while their numbers might hold for generalist sites like Yahoo, they don't apply to sites like ours .. which are a collection of disparate interest groups and micromanaged traffic -- in fact our sites are really more like Usenet newsgroups or email than the web
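a toy illustration of why a lumpy population breaks the numbers, assuming the panel is scaled up by a single population-to-panel ratio (an assumption, not a description of any real rating service). the group names, sizes, and visit rates are all hypothetical. expected values are used instead of random draws, so the gap shown is pure bias, not sampling noise.

```python
def panel_estimate(groups, site_audience):
    """Scale panel visits up to a whole-population audience estimate.

    groups: {name: (population, panelists)}
    site_audience: {name: fraction of that group visiting the site}
    Returns (true_audience, estimated_audience).
    """
    total_pop = sum(pop for pop, _ in groups.values())
    total_panel = sum(panel for _, panel in groups.values())
    true_audience = sum(
        pop * site_audience.get(name, 0.0) for name, (pop, _) in groups.items()
    )
    panel_visitors = sum(
        panel * site_audience.get(name, 0.0) for name, (_, panel) in groups.items()
    )
    # One global scale factor, as if the population were 'smooth'.
    estimate = panel_visitors * total_pop / total_panel
    return true_audience, estimate

# Hypothetical groups: the panel over-represents home users.
GROUPS = {
    "home_isp": (6_000_000, 9_000),
    "university": (3_000_000, 900),
    "corporate": (1_000_000, 100),
}
generalist = {"home_isp": 0.3, "university": 0.3, "corporate": 0.3}
niche = {"university": 0.05}  # audience lives entirely in one group

for label, audience in [("generalist", generalist), ("niche", niche)]:
    true_aud, est = panel_estimate(GROUPS, audience)
    print(f"{label}: true={true_aud:,.0f} estimated={est:,.0f}")
```

the generalist site is estimated correctly because every group visits it at the same rate, while the niche site, whose audience sits in a group the panel barely covers, comes out several times too small.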

the Web itself is not a smooth homogeneous structure; it has blobs and tendrils and a rich structure -- almost biological in complexity. there are huge sites inside corporations and universities that transfer petabytes of data daily and are not even mentioned by the net ratings gods

so, you have a fragmented medium and a fragmented user base... you have little chance of estimating viewership from server logs. what works is microtargeting, as we well know

but, all this is just not relevant.. the web is a fine thing, but the killer app is email

the web is a broadcast medium where the cost of entering the market is near zero, so... you will eventually have as many websites as users. e.g. the number of porn sites is growing faster than the number of porn viewers

heehee... take that and put it in your models.

old broadcast models developed for tv and radio need not apply

email is not a broadcast medium (misused as one, we call it spamming); it is targeted, contextual, and many times more effective than a website

email, if you think about it, is the logical extension of the fragmentation of the web, except that it came first!

so, I think I have expanded on points 1 and 2 ... I leave you to deal with 3... cookies, java and udder attempts to track actual users, and so on....

