By Stephanie Pappas
Translated by Zhou Zhen
The Internet is a busy place. Every second, approximately 6,000 tweets are posted, more than 40,000 Google searches are run, and more than 2 million emails are sent, according to Internet Live Stats, a website of the international Real Time Statistics Project. (Translator’s note: These figures come from http://www.internetlivestats.com/one-second/. The article was written in 2016, so the numbers are no longer accurate.)
[2] But these statistics only hint at the size of the Web. As of September 2014, there were 1 billion websites on the Internet, a number that fluctuates by the minute as sites go defunct and others are born. And beneath this constantly changing (but sort of quantifiable) Internet that’s familiar to most people lies the “Deep Web,” which includes things Google and other search engines don’t index. Deep Web content can be as innocuous as the results of a search of an online database or as secretive as black-market forums accessible only to those with special Tor software. (Tor isn’t only for illegal activity, though; it’s used wherever people might have reason to go anonymous online.) (Translator’s notes: The term “Deep Web” borrows from the distinction between shallow and deep seas. Tor is short for The Onion Router, an implementation of second-generation onion routing; it lets users guard against traffic filtering and sniffing analysis and communicate anonymously on the Internet.)
[3] Combine the constant change in the “surface” Web with the unquantifiability of the Deep Web, and it’s easy to see why estimating the size of the Internet is a difficult task. However, analysts say the Web is big and getting bigger.
[4] With about 1 billion websites, the Web is home to many more individual Web pages. One of these pages, www.worldwidewebsize.com, seeks to quantify the number using research by Internet consultant Maurice de Kunder. De Kunder and his colleagues published their methodology in February 2016 in the journal Scientometrics. To come to an estimate, the researchers sent a batch of 50 common words to be searched by Google and Bing. (Yahoo Search and Ask.com used to be included but are not anymore because they no longer show the total results.) The researchers knew how frequently these words have appeared in print in general, allowing them to extrapolate the total number of pages out there based on how many contain the reference words. Search engines overlap in the pages they index, so the method also requires estimating and subtracting the likely overlap. (Translator’s note: Scientometrics is an academic journal published by Springer that focuses on quantitative methods and features in science and scientific research.)
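The extrapolation and overlap steps described above can be sketched roughly as follows. This is a minimal illustration, not de Kunder’s actual procedure: the function names, the word frequencies, the hit counts, the overlap fraction, and the use of a median to combine per-word estimates are all assumptions made for the example.

```python
from statistics import median


def estimate_index_size(hit_counts, doc_frequencies):
    """Estimate how many pages one search engine indexes.

    hit_counts: word -> number of pages the engine reports for that word.
    doc_frequencies: word -> assumed fraction of documents containing the word.
    Each word yields one estimate (hits / frequency); the median of those
    estimates is returned to damp outliers (an assumption of this sketch).
    """
    estimates = [hit_counts[w] / doc_frequencies[w] for w in hit_counts]
    return median(estimates)


def combined_size(size_a, size_b, overlap_fraction_of_b):
    """Add two index sizes, removing the share of B assumed to be in A already."""
    return size_a + size_b * (1 - overlap_fraction_of_b)


# Hypothetical inputs, for illustration only.
doc_freq = {"the": 0.60, "and": 0.50, "of": 0.40}
engine_a_hits = {"the": 2.4e9, "and": 2.0e9, "of": 1.6e9}
engine_b_hits = {"the": 0.6e9, "and": 0.5e9, "of": 0.4e9}

size_a = estimate_index_size(engine_a_hits, doc_freq)
size_b = estimate_index_size(engine_b_hits, doc_freq)
total = combined_size(size_a, size_b, overlap_fraction_of_b=0.5)
print(f"Estimated indexed pages: {total:.2e}")
```

With these made-up inputs, each engine’s words agree on a single index size, so the only interesting step is the overlap correction; real hit counts would disagree from word to word, which is why some way of combining the per-word estimates is needed.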
[5] According to these calculations, there were at least 4.66 billion Web pages online as of mid-March 2016. This calculation covers only the searchable Web, however, not the Deep Web.
[6] So how much information does the Internet hold? There are three ways to look at that question, said Martin Hilbert, a professor of communications at the University of California, Davis.
[7] “The Internet stores information, the Internet communicates information and the Internet computes information,” Hilbert said. The communication capacity of the Internet can be measured by how much information it can transfer, or how much information it does transfer at any given time, he said.
[8] In 2014, researchers published a study in the journal Supercomputing Frontiers and Innovations estimating the storage capacity of the Internet at 10^24 bytes, or 1 million exabytes. A byte is a data unit comprising 8 bits, and is equal to a single character in one of the words you’re reading now. An exabyte is 1 billion billion bytes. (Translator’s note: For reference, the byte units convert as follows: 1 B (byte) = 8 bits; 1 KB (kilobyte) = 1,024 B; 1 MB (megabyte) = 1,024 KB; 1 GB (gigabyte) = 1,024 MB; 1 TB (terabyte) = 1,024 GB; 1 PB (petabyte) = 1,024 TB; 1 EB (exabyte) = 1,024 PB; 1 ZB (zettabyte) = 1,024 EB; 1 YB (yottabyte) = 1,024 ZB.)
[9] One way to estimate the communication capacity of the Internet is to measure the traffic moving through it. According to Cisco’s Visual Networking Index initiative, the Internet is now in the “zettabyte era.” A zettabyte equals 1 sextillion bytes, or 1,000 exabytes. By the end of 2016, global Internet traffic will reach 1.1 zettabytes per year, according to Cisco, and by 2019, global traffic is expected to hit 2 zettabytes per year. (Translator’s note: One sextillion is 10^21, the value of the SI prefix zetta.)
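The unit arithmetic in the two paragraphs above can be checked with a few lines of Python. This sketch uses the decimal definitions given in the article text (an exabyte as 1 billion billion bytes, a zettabyte as 1 sextillion bytes); the variable names and the 365-day year are assumptions of the example.

```python
EXABYTE = 10**18    # 1 billion billion bytes, as defined in the article
ZETTABYTE = 10**21  # 1 sextillion bytes

# Storage estimate from the 2014 study, expressed in exabytes.
storage_bytes = 10**24
storage_exabytes = storage_bytes // EXABYTE

# How many exabytes fit in one zettabyte.
exabytes_per_zettabyte = ZETTABYTE // EXABYTE

# Average rate implied by 1.1 zettabytes per year (365-day year assumed).
seconds_per_year = 365 * 24 * 60 * 60
avg_bytes_per_second = 1.1 * ZETTABYTE / seconds_per_year

print(storage_exabytes, exabytes_per_zettabyte)
print(f"{avg_bytes_per_second:.3e} bytes per second on average")
```

Averaged over a year, 1.1 zettabytes works out to a few tens of terabytes every second.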
[10] One zettabyte is the equivalent of 36,000 years of high-definition video, which, in turn, is the equivalent of streaming Netflix’s entire catalog 3,177 times, Thomas Barnett Jr., Cisco’s director of thought leadership, wrote in a 2011 blog post about the company’s findings. (Translator’s note: Netflix is the world’s largest paid online television and film service.)
[11] In 2011, Hilbert and his colleagues published a paper in the journal Science estimating the communication capacity of the Internet at 3 × 10^12 kilobits per second, a measure of bandwidth. This was based on hardware capacity, and not on how much information was actually being transferred at any moment.
[12] In one particularly offbeat study, an anonymous hacker measured the size of the Internet by counting how many IP (Internet Protocol) addresses were in use. IP addresses are the wayposts of the Internet through which data travels, and each device online has at least one IP address. According to the hacker’s estimate, there were 1.3 billion IP addresses used online in 2012.
[13] The Internet has vastly altered the data landscape. In 2000, before Internet use became ubiquitous, telecommunications capacity was 2.2 optimally compressed exabytes, Hilbert and his colleagues found. In 2007, the number was 65. This capacity includes phone networks and voice calls as well as access to the enormous information reservoir that is the Internet. However, data traffic over mobile networks was already outpacing voice traffic in 2007, the researchers found.
[14] If all of these bits and bytes feel a little abstract, don’t worry: In 2015, researchers tried to put the Internet’s size in physical terms. The researchers estimated that it would take 2 percent of the Amazon rainforest to make the paper to print out the entire Web (including the Dark Web), they reported in the Journal of Interdisciplinary Science Topics. For that study, they made some big assumptions about the amount of text online by estimating that an average Web page would require 30 pages of A4 paper (8.27 by 11.69 inches). With this assumption, the text on the Internet would require 1.36 × 10^11 pages to print a hard copy. (A Washington Post reporter later aimed for a better estimate and determined that the average length of a Web page was closer to 6.5 printed pages, yielding an estimate of 305.5 billion pages to print the whole Internet.)
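Dividing each printed-page total by its pages-per-Web-page figure shows how many Web pages each estimate implicitly assumes. The sketch below does only that division; the implied Web page counts are back-calculated here and are not stated in the article, and the variable names are illustrative.

```python
# Figures quoted in the article.
study_printed_pages = 1.36e11   # printed A4 pages, 2015 study
study_sheets_per_page = 30      # A4 pages per Web page, 2015 study
post_printed_pages = 305.5e9    # printed pages, Washington Post estimate
post_sheets_per_page = 6.5      # printed pages per Web page, Washington Post

# Back-calculated number of Web pages behind each estimate.
study_web_pages = study_printed_pages / study_sheets_per_page
post_web_pages = post_printed_pages / post_sheets_per_page

print(f"2015 study implies about {study_web_pages:.2e} Web pages")
print(f"Washington Post estimate implies about {post_web_pages:.2e} Web pages")
```

The two totals differ not only in pages per Web page but also in the number of Web pages behind them: roughly 4.5 billion for the study versus roughly 47 billion for the later estimate, which is why the shorter page length still produces the larger total.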
[15] Of course, printing out the Internet in text form wouldn’t include the massive amount of nontext data hosted online. According to Cisco’s research, 8,000 petabytes per month of IP traffic was dedicated to video in 2015, compared with about 3,000 petabytes per month for Web, email and data transfer. (A petabyte is a million gigabytes, or roughly 2^50 bytes.) All told, the company estimated that video accounted for most Internet traffic that year, at 34,000 petabytes. File sharing came in second, at 14,000 petabytes.
[16] Hilbert and his colleagues took their own stab at visualizing the world’s information. In their 2011 Science paper, they calculated that the information capacity of the world’s analog and digital storage was 295 optimally compressed exabytes. To store 295 exabytes on CD-ROMs would require a stack of discs reaching to the moon (238,900 miles, or 384,400 kilometers), and then a quarter of the distance from the Earth to the moon again, the researchers wrote. That’s a total distance of 298,625 miles (480,590 km). By 2007, 94 percent of information was digital, meaning that the world’s digital information alone would overshoot the moon if stored on CD-ROM. It would stretch 280,707.5 miles (451,755 km).
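The distances in this paragraph follow from two multiplications, which can be reproduced as a quick check. The sketch assumes the standard conversion of 1.609344 kilometers per mile; the variable names are illustrative.

```python
MOON_DISTANCE_MILES = 238_900  # Earth-to-moon distance quoted in the article
KM_PER_MILE = 1.609344         # standard mile-to-kilometer conversion
DIGITAL_SHARE = 0.94           # share of information that was digital by 2007

# Full stack: one Earth-moon distance plus a quarter of it again.
stack_miles = MOON_DISTANCE_MILES * 1.25
stack_km = stack_miles * KM_PER_MILE

# Portion of the stack holding digital information only.
digital_miles = stack_miles * DIGITAL_SHARE
digital_km = digital_miles * KM_PER_MILE

print(round(stack_miles), round(stack_km))
print(round(digital_miles, 1), round(digital_km))
```

The kilometer figures in the article match a conversion from the mile totals rather than 1.25 times the 384,400 km figure, which would give 480,500 km.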
[17] The Internet’s size is a moving target, Hilbert said, but it’s growing by leaps and bounds. There’s just one saving grace when it comes to this deluge of information: Our computing capacity is growing even faster than the amount of data we store.
[18] While world storage capacity doubles every three years, world computing capacity doubles every year and a half, Hilbert said. In 2011, humanity could carry out 6.4 × 10^18 instructions per second with all of its computers, similar to the number of nerve impulses per second in the human brain. Five years later, computational power is up in the ballpark of about eight human brains. That doesn’t mean, of course, that eight people in a room could outthink the world’s computers. In many ways, artificial intelligence already outperforms human cognitive capacity (though A.I. is still far from mimicking general, humanlike intelligence). Online, artificial intelligence determines which Facebook posts you see, what comes up in a Google search and even 80 percent of stock market transactions. The expansion of computing power is the only thing making the explosion of data online useful, Hilbert said.
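The doubling times quoted above translate into growth factors with a one-line formula, 2 raised to the elapsed time divided by the doubling time. The sketch below applies it to the article’s figures; the function name is illustrative, and steady exponential growth is an assumption.

```python
def growth_factor(years, doubling_time_years):
    """Factor by which a quantity grows, assuming steady exponential doubling."""
    return 2 ** (years / doubling_time_years)


# Doubling times quoted by Hilbert.
storage_growth_5y = growth_factor(5, 3.0)   # storage doubles every three years
compute_growth_5y = growth_factor(5, 1.5)   # computing doubles every 1.5 years

# Exactly three doublings (4.5 years) gives the "eight brains" figure.
compute_growth_3_doublings = growth_factor(4.5, 1.5)

print(f"Storage after 5 years:   x{storage_growth_5y:.2f}")
print(f"Computing after 5 years: x{compute_growth_5y:.2f}")
print(f"Computing after 4.5 years: x{compute_growth_3_doublings:.0f}")
```

Under these assumptions computing capacity grows roughly tenfold in five years while storage grows about threefold, which is in the same ballpark as the “about eight human brains” figure and illustrates why computing outpaces stored data.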
[19] “We’re going from an information age to a knowledge age,” he said. ■