How Much Information? (UC Berkeley)

Good data is available for the worldwide production of each storage medium, providing an upper bound for the potential production of original information and copies. There are often good estimates of how much original content is produced in each of these storage formats, particularly for the advanced economies that produce the most information. Detailed source information and the inferences used to produce these calculations are presented in the sections on Paper, Film, Magnetic, and Optical.

Upper estimates assume the information is digitally scanned; lower estimates assume the digital content has been compressed.

There has been dramatic growth in the storage of new information over the past two years in every storage medium except film. Film-based content, especially photographs, is migrating to digital media, both optical and magnetic. A tree can produce about 80,[…] sheets of paper, so it takes about […] million trees to produce the world's annual paper supply. But paper consumption is not equal: each inhabitant of North America consumes about 11,[…] sheets of paper (24 reams) a year, and each inhabitant of the European Union about 7,[…] sheets (15 reams).

At least half of this paper is used in printers and copiers to produce office documents. The vast majority of this increase comes from the creation of office documents, largely output from computer printers. Office documents are a larger proportion of print in the U.S. Also noteworthy is the increase in simultaneous publication of printed information in digital format, such as online newspapers and journals. There appears to be an increase in newspaper production in developing countries, although this may reflect better statistical reporting.
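The per-capita paper figures above are quoted both in sheets and in reams. A minimal sketch of the conversions follows, assuming a standard 500-sheet ream (our assumption; the conversion factor is not stated here). The sheets-per-tree yield and the world's annual paper supply are elided in this copy, so the hypothetical helper `trees_required` takes them as inputs rather than guessing them.

```python
# Hypothetical helpers for the paper arithmetic; 500 sheets per ream is an assumption.
SHEETS_PER_REAM = 500


def reams_to_sheets(reams: float) -> float:
    """Convert a count of reams into individual sheets."""
    return reams * SHEETS_PER_REAM


def trees_required(annual_sheets: float, sheets_per_tree: float) -> float:
    """Trees needed to supply `annual_sheets` when each tree yields `sheets_per_tree` sheets."""
    if sheets_per_tree <= 0:
        raise ValueError("sheets_per_tree must be positive")
    return annual_sheets / sheets_per_tree


# Per-capita annual consumption quoted in reams in the text above
print(reams_to_sheets(24))  # North America: 12000 sheets
print(reams_to_sheets(15))  # European Union: 7500 sheets
```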

For details on this data, our sources and calculations, see Paper.

The U.S. […]. About half of all postal mail in the United States is currently first class and about half is junk mail. If we assume 2 pages per piece of mail, digitized at 15 kilobytes per page, U.S. mail would require […] of storage. This represents an increase of about one-half of a petabyte over […] estimates.

Film is a storage medium for analog images that is evolving towards digital images stored on magnetic and optical media.

For details on this data, our sources and calculations, see Film.

For details on this data, our sources and calculations, see Magnetic.

Optical storage media are the medium of choice for distributing software, data, cinema, and music, although they hold only a small proportion of digital information overall. For details on this data, our sources and calculations, see Optical.

Communication flows through four electronic channels: radio broadcasting, television broadcasting, telephone calls, and the Internet.

Each channel requires access to a form of information technology: radios, television sets, telephones, and computers. Thus, like storage media, information flows are distributed unequally around the world. Information stored on paper, film, optical, and magnetic media totals about 5 exabytes of new information each year; this is less than one third of the new information communicated through electronic information flows (telephone, radio and TV, and the Internet), which total about […]. The striking finding here is that most of the volume of new information flows comes from voice telephone traffic, most of which is unique content.

The second-largest component of information flows is the Internet.

World radio stations produce […] million hours of broadcasting a year, which would require 16,[…] terabytes to store; we estimate that 70 million hours are original programming, requiring about 3,[…] terabytes of storage annually.

World television stations produce about […] million hours of programming in total; we estimate that about 31 million hours are original programming, requiring about 70,[…] terabytes of storage.
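The radio and TV totals can be turned into an implied storage cost per hour of original programming. The trailing digits of both storage totals are missing in this copy, so the sketch below plugs in 3,000 TB and 70,000 TB, the smallest values consistent with the surviving leading digits, which makes both results lower bounds. Decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) are our assumption, and `implied_mb_per_hour` is a hypothetical helper.

```python
# Decimal units are an assumption: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
BYTES_PER_TB = 10**12
BYTES_PER_MB = 10**6


def implied_mb_per_hour(storage_tb: float, hours: float) -> float:
    """Average storage per broadcast hour, in MB, implied by a TB total and an hour count."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return storage_tb * BYTES_PER_TB / hours / BYTES_PER_MB


# Original programming: 70 million radio hours and 31 million TV hours (from the text).
# 3,000 TB and 70,000 TB are the smallest totals consistent with the surviving
# leading digits, so both results are lower bounds.
radio = implied_mb_per_hour(3_000, 70e6)
tv = implied_mb_per_hour(70_000, 31e6)
print(f"radio >= {radio:.1f} MB/hour, TV >= {tv:,.0f} MB/hour")  # radio >= 42.9 MB/hour, TV >= 2,258 MB/hour
```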

In the United States, there are 13,[…] radio stations producing […]. As of […], there were […] broadcast TV stations in the United States producing about […]. For details on this data, our sources and calculations, see Broadcast.

There are 1.[…]. There are […] million main telephone lines in the U.S. It would take 9.[…].

The number of landline phones in the U.S. […]. Mobile phones used more than […] billion minutes in […], the equivalent of 2.[…]. For details on this data, our sources and calculations, see Telephony.

Although the Internet is the newest medium for information flows, it is the fastest-growing new medium of all time, becoming the information medium of first resort for its users. Note that the Web consists of the surface web (fixed web pages) and what BrightPlanet calls the deep web (database-driven websites that create web pages on demand).

For details on this data, our sources and calculations, see Internet. In […] we estimated the volume of information on the public Web at 20 to 50 terabytes; in […] we measured it at […] terabytes, at least triple the earlier amount.

As of summer […], the surface web is about […] terabytes; BrightPlanet estimates the deep web to be […] to […] times larger, which puts it between 66,[…] and 91,[…] terabytes.
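The deep-web range above is the surface-web size scaled by BrightPlanet's low and high multiples. Because both the surface-web size and the multiples are elided in this copy, the sketch uses round placeholder inputs chosen purely for illustration; `deep_web_range` is a hypothetical helper.

```python
def deep_web_range(surface_tb: float, low_multiple: float, high_multiple: float) -> tuple[float, float]:
    """Return (low, high) deep-web size in TB by scaling the surface-web size."""
    if not 0 < low_multiple <= high_multiple:
        raise ValueError("need 0 < low_multiple <= high_multiple")
    return surface_tb * low_multiple, surface_tb * high_multiple


# Placeholder inputs for illustration only, not figures from this report
low_tb, high_tb = deep_web_range(100.0, 10, 20)
print(low_tb, high_tb)  # 1000.0 2000.0
```

One consequence of this form: the ratio of the high bound to the low bound equals the ratio of the two multiples, whatever the surface-web size.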

About 31 billion emails are sent daily, on the Internet and elsewhere, a figure expected to double by […] (source: International Data Corporation, IDC). The average email is about 59 kilobytes in size, so the annual flow of email worldwide is about […] terabytes.

Peer-to-peer (P2P) file-sharing networks are a significant new means of storing, creating, and exchanging media and data on the Internet.

KaZaA, the most popular of these applications, has recently passed […] million downloads worldwide, with an average of 2 million more per week (source: Download). KaZaA users share almost 5,[…] terabytes of information in over […] million files, and on average over 3 million users are active at any given time (source: KaZaA).

In our sample, we found 1,[…],[…] files totaling […] GB; files ranged in size from 1 byte to 1.[…]. Using this sample, we were able to describe how P2P users consume information.
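The annual email volume quoted earlier in this section follows from three numbers in the text: 31 billion messages a day, 59 kilobytes per message, and 365 days a year. The sketch below assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes); under that assumption the product comes to about 667,585 TB a year. `annual_email_tb` is a hypothetical helper, and the report's own unit conventions may differ.

```python
# Decimal units are an assumption: 1 KB = 1,000 bytes, 1 TB = 10**12 bytes.
BYTES_PER_KB = 1_000
BYTES_PER_TB = 10**12


def annual_email_tb(emails_per_day: float, avg_kb: float, days: int = 365) -> float:
    """Total yearly email volume in TB from a daily count and an average message size in KB."""
    return emails_per_day * avg_kb * BYTES_PER_KB * days / BYTES_PER_TB


volume = annual_email_tb(31e9, 59)
print(f"{volume:,.0f} TB per year")  # 667,585 TB per year
```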

We have had to make various working assumptions to construct these estimates, and some data sources are contradictory or simply unavailable, so our estimates are often rough.

Here we list some of the most serious methodological qualifications, each of which offers interesting challenges for anyone seeking to refine these estimates. Our documentary research methodology is to estimate yearly U.S. […]. The data supporting these estimates are often difficult to find or do not exist at all, and key questions often cannot be answered because no data are collected (e.g., […]).

Estimates are marked with three question marks [???]. For these reasons, we have documented our sources in these reports and defined the working assumptions we made in producing these estimates, in the hope that readers will help us identify better sources and improve our working assumptions.

It is very difficult to distinguish "copies" from "original" information. There is also a lot of duplication within each medium: many newspapers reproduce stock prices, wire stories, advertisements, and so on.

Ideally, we would like to measure the storage required for the unique content in a newspaper, but that number is very hard to measure. As indicated above, the duplication issue is particularly serious for digital storage, since little of what is stored on individual hard drives is unique. We've tried to adjust for this as best we can, and have documented our assumptions in the detailed treatment of each medium.

Unlike print or film, digital information has no unambiguous measure of size. A […] dot-per-inch scanned image of text can be compressed to about one hundredth of its original size.
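The gap between the upper (scanned) and lower (compressed) estimates described earlier comes down to a compression ratio. A minimal sketch, assuming a uniform 100:1 ratio taken from the "one hundredth" figure above; real ratios vary by content and format, and `storage_bounds` is a hypothetical helper.

```python
def storage_bounds(scanned_bytes: float, compression_ratio: float = 100.0) -> tuple[float, float]:
    """Return (upper, lower) storage estimates in bytes.

    upper: size of the uncompressed scanned image
    lower: size after compressing by `compression_ratio` (assumed uniform)
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return scanned_bytes, scanned_bytes / compression_ratio


# A hypothetical 1 MB scanned page (illustrative only, not a figure from this report)
upper, lower = storage_bounds(1_000_000)
print(upper, lower)  # 1000000 10000.0
```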

A DVD version of a movie can be […] times smaller than the original digital image. The fact that digital storage can be compressed to different degrees, depending on need, is a significant advantage of digital over analog storage.

Should information stored as "backup" be included in the total? This question arises for microfilm, rewritable CD-ROMs, and even print, but digital magnetic tape is the most difficult case.

Industry rules of thumb suggest that there is about 10 times as much storage on tape as on hard drives, though this ratio has been falling as more and more data is stored on arrays of hard drives, which are much more convenient to use. Given this backup question, we've omitted most tape storage. However, vast quantities of original scientific data are stored in tape libraries; we describe a few such repositories in the detailed treatment of magnetic storage.

World and US production. We don't have good data on magnetic storage, but it seems plausible that the US produces at least half of the content stored on magnetic media. We've used numbers for world production when available, but in some cases have had to extrapolate from US production. Little data is available about information production in the Third World.

Growth rates. The production of unique content in books, photos, and CDs is barely growing. DVD content is growing rapidly, but that's because it is a new medium and a significant amount of legacy content is being converted. By contrast, shipments of digital magnetic storage are essentially doubling every year.
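To see how quickly annual doubling compounds compared with media that are barely growing, a minimal projection sketch follows. It assumes a constant growth factor per year (2.0 for doubling, 1.0 for flat); the starting value is an arbitrary unit, not a figure from this report, and `project_growth` is a hypothetical helper.

```python
def project_growth(initial: float, years: int, annual_factor: float = 2.0) -> float:
    """Value after `years` of compounding by `annual_factor` per year (2.0 means doubling)."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return initial * annual_factor ** years


# One arbitrary unit of shipped magnetic capacity, doubling yearly,
# versus a medium that is essentially flat (factor 1.0).
for year in range(4):
    print(year, project_growth(1.0, year), project_growth(1.0, year, annual_factor=1.0))
```

After ten years of doubling, the same quantity has grown by a factor of 1,024 (2^10).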

TV and Radio. Original TV content produced each year is generally stored on magnetic camcorder tapes, and so is counted in that category of storage media. Much radio content is simply broadcast music, which we have already captured with the CD statistics. See Table 3 for information on how much storage it would take to back up all TV and radio broadcasts, with minimal adjustment for duplication.

The world's total production of information amounts to about […] megabytes for each man, woman, and child on earth. It is clear that we are all drowning in a sea of information. The challenge is to learn to swim in that sea rather than drown in it. Better understanding and better tools are desperately needed if we are to take full advantage of the ever-increasing supply of information described in this report.

About this Report. Financial support for this study was provided by EMC.

We view this report as a "living document" and intend to revise it based on comments, corrections, and suggestions.

This study was produced by faculty and students at the School of Information Management and Systems at the University of California at Berkeley.