Sunday, March 30, 2008
World Wide Web
From Wikipedia, the free encyclopedia
The World Wide Web (commonly shortened to the Web) is a system of interlinked hypertext documents accessed via the Internet. With a web browser, a user views web pages that may contain text, images, videos, and other multimedia and navigates between them using hyperlinks. The World Wide Web was created in 1989 by Sir Tim Berners-Lee from the United Kingdom and Robert Cailliau from Belgium, working at CERN in Geneva, Switzerland. Since then, Berners-Lee has played an active role in guiding the development of web standards (such as the markup languages in which web pages are composed), and in recent years has advocated his vision of a Semantic Web.
How the Web works
Viewing a web page on the World Wide Web normally begins either by typing the URL of the page into a web browser, or by following a hypertext link to that page or resource. The web browser then begins a series of communications, behind the scenes, in order to fetch and display it.
First, the server-name portion of the URL is resolved into an IP address using the global, distributed Internet database known as the domain name system, or DNS. This IP address is necessary to contact and send data packets to the web server.
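The resolution step can be sketched with Python's standard library. The helper names `server_name` and `resolve_host` are illustrative assumptions, and a real browser uses its own resolver and caching layers rather than this exact call sequence.

```python
import socket
from urllib.parse import urlsplit


def server_name(url: str) -> str:
    """Return the server-name (host) portion of a URL, lowercased."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    return host


def resolve_host(host: str, port: int = 80) -> list[str]:
    """Ask the system resolver (which consults DNS) for the host's IP addresses."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    addresses: list[str] = []
    for info in infos:
        # Each entry is (family, type, proto, canonname, sockaddr);
        # the first element of sockaddr is the IP address as text.
        ip = info[4][0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Calling `resolve_host(server_name(url))` for a page URL yields the addresses the browser could then contact. An IP literal such as `127.0.0.1` is returned unchanged, without a DNS lookup.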
The browser then requests the resource by sending an HTTP request to the web server at that particular address. In the case of a typical web page, the HTML text of the page is requested first and parsed immediately by the web browser, which will then make additional requests for images and any other files that form a part of the page. Statistics measuring a website's popularity are usually based on the number of 'page views' or associated server 'hits', or file requests, which take place.
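As a rough sketch of this step, the Python below builds a minimal HTTP/1.1 GET request and then scans the returned HTML for the images, scripts, and stylesheets that would trigger further requests. The names `build_get_request`, `ResourceCollector`, and `subresources` are hypothetical, and real browsers send many more headers and recognise many more resource types.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


def build_get_request(url: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Connection: close",
        "",  # blank line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of images, scripts and stylesheets in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.resources: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in ("img", "script") and attr.get("src"):
            self.resources.append(urljoin(self.base_url, attr["src"]))
        elif (
            tag == "link"
            and (attr.get("rel") or "").lower() == "stylesheet"
            and attr.get("href")
        ):
            self.resources.append(urljoin(self.base_url, attr["href"]))


def subresources(base_url: str, html: str) -> list[str]:
    """Return the additional files a browser would request for this page."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.resources
```

Under this simplified model, one page view of a document with three such resources corresponds to four server hits: one for the HTML and one for each additional file.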
Having received the required files from the web server, the browser then renders the page onto the screen as specified by its HTML, CSS, and other web languages. Any images and other resources are incorporated to produce the on-screen web page that the user sees.
Most web pages will themselves contain hyperlinks to other related pages and perhaps to downloads, source documents, definitions and other web resources. Such a collection of useful, related resources, interconnected via hypertext links, is what was dubbed a "web" of information. Making it available on the Internet created what Tim Berners-Lee first called the WorldWideWeb (note the original name's use of CamelCase, subsequently discarded) in 1990.[1]
Caching
If a user revisits a web page after only a short interval, the page data may not need to be re-obtained from the source web server. Almost all web browsers cache recently-obtained data, usually on the local hard drive. HTTP requests sent by a browser will usually only ask for data that has changed since the last download. If the locally-cached data is still current, it will be reused.
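One way to picture this is a conditional request: the browser sends validators it remembered from the last download, and the server answers 304 Not Modified if nothing has changed. The sketch below uses the real HTTP header names `If-None-Match`, `If-Modified-Since`, `ETag`, and `Last-Modified`. The `CacheEntry` class and both functions are assumptions made for illustration, not how any particular browser stores its cache.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    """A locally cached copy of one resource (hypothetical structure)."""
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(entry: Optional[CacheEntry]) -> dict[str, str]:
    """Headers asking the server to send the body only if it has changed."""
    headers: dict[str, str] = {}
    if entry is None:
        return headers  # nothing cached: plain, unconditional request
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers


def apply_response(
    entry: Optional[CacheEntry],
    status: int,
    headers: dict[str, str],
    body: bytes,
) -> CacheEntry:
    """Reuse the cached copy on 304, otherwise store the new download."""
    if status == 304:
        if entry is None:
            raise ValueError("304 received without a cached copy")
        return entry  # cached data is still current
    return CacheEntry(
        body=body,
        etag=headers.get("ETag"),
        last_modified=headers.get("Last-Modified"),
    )
```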
Caching helps reduce the amount of web traffic on the Internet. The decision about expiration can be made independently for each downloaded file, whether image, stylesheet, JavaScript, HTML, or whatever other content the site may provide. Thus even on sites with highly dynamic content, many of the basic resources may only need to be refreshed once every few sessions. Web site designers may find it worthwhile to collate shared resources such as CSS data and JavaScript into a few site-wide files so that they can be cached efficiently. This helps reduce page download times and lowers demands on the web server.
There are other components of the Internet that can also cache web content. In practice, the most widely-used caches are built into corporate and academic firewalls which cache web resources requested by one user for the benefit of all. (See also Caching proxy server.) Some search engines, such as Google or Yahoo!, also store cached content from web sites.
Apart from the facilities built into web servers that can determine when files have been updated, designers of dynamically-generated web pages can control the HTTP headers sent back to requesting users, so that transient or sensitive pages are not cached. Internet banking and news sites frequently use these facilities.
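A minimal sketch of how a dynamically generated page might choose such headers is shown below. The function name `caching_headers` and the one-hour default are assumptions. The header names and directives are standard HTTP, but the right combination depends on the site and on the caches it needs to support.

```python
def caching_headers(sensitive: bool, max_age_seconds: int = 3600) -> dict[str, str]:
    """Pick response headers that forbid or allow caching of a page."""
    if sensitive:
        # Transient or sensitive content, e.g. an account statement:
        # tell browsers and intermediate caches not to keep a copy.
        return {
            "Cache-Control": "no-store, no-cache, must-revalidate",
            "Pragma": "no-cache",  # for older HTTP/1.0 caches
            "Expires": "0",
        }
    # Shared, slowly changing content such as a site-wide stylesheet.
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}
```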
Data requested with an HTTP 'GET' is likely to be cached if other conditions are met, whereas data obtained via a 'POST' command is assumed to be dependent on the data that was POSTed and so will not be cached.
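The rule of thumb above can be written as a small predicate. This is a deliberately simplified sketch: `is_cacheable` is a hypothetical helper, it assumes header names are given with their canonical capitalisation, and real caches apply the full HTTP caching rules, which cover more status codes and directives.

```python
def is_cacheable(method: str, status: int, response_headers: dict[str, str]) -> bool:
    """Simplified check of whether a response may be stored in a cache."""
    if method.upper() != "GET":
        return False  # POST results depend on the submitted data
    if status != 200:
        return False  # simplification: only plain successful responses
    cache_control = response_headers.get("Cache-Control", "").lower()
    directives = {d.strip() for d in cache_control.split(",")}
    return "no-store" not in directives
```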
History
[Image: the NeXTcube used by Berners-Lee at CERN, which became the first Web server.]
The concept of a home-based global information system goes back at least as far as Isaac Asimov's short story "Anniversary" (Amazing Stories, March 1959), in which the characters look up information on a home computer called a "Multivac outlet", which was connected by a "planetwide network of circuits" to a mile-long "super-computer" somewhere in the bowels of the Earth. One character is thinking of installing a Multivac, Jr. model for his kids.
The story was set in the far distant future when commercial space travel was commonplace, and yet the machine "prints the answer on a slip of tape" that comes out of a slot (there is no video display), and the owner of the home computer says that he doesn't spend the kind of money to get a Multivac outlet that talks.
The underlying ideas of the Web can be traced as far back as 1980, when, at CERN in Switzerland, Tim Berners-Lee built ENQUIRE (referring to Enquire Within Upon Everything, a book he recalled from his youth). While it was rather different from the system in use today, it contained many of the same core ideas (and even some of the ideas of Berners-Lee's next project after the World Wide Web, the Semantic Web).
In March 1989, Tim Berners-Lee wrote a proposal,[2] which referenced ENQUIRE and described a more elaborate information management system. With help from Robert Cailliau, he published a more formal proposal for the World Wide Web on November 12, 1990.[3]
A NeXTcube was used by Berners-Lee as the world's first web server and also to write the first web browser, WorldWideWeb, in 1990. By Christmas 1990, Berners-Lee had built all the tools necessary for a working Web:[4] the first web browser (which was a web editor as well), the first web server, and the first web pages[5] which described the project itself.
On August 6, 1991, he posted a short summary of the World Wide Web project on the alt.hypertext newsgroup.[6] This date also marked the debut of the Web as a publicly available service on the Internet.
The crucial underlying concept of hypertext originated with older projects from the 1960s, such as Ted Nelson's Project Xanadu and Douglas Engelbart's oN-Line System (NLS). Both Nelson and Engelbart were in turn inspired by Vannevar Bush's microfilm-based "memex," which was described in the 1945 essay "As We May Think."
Berners-Lee's breakthrough was to marry hypertext to the Internet. In his book Weaving The Web, he explains that he had repeatedly suggested that a marriage between the two technologies was possible to members of both technical communities, but when no one took up his invitation, he finally tackled the project himself. In the process, he developed a system of globally unique identifiers for resources on the Web and elsewhere: the Uniform Resource Identifier.
The World Wide Web had a number of differences from other hypertext systems that were then available. The Web required only unidirectional links rather than bidirectional ones. This made it possible for someone to link to another resource without action by the owner of that resource. It also significantly reduced the difficulty of implementing web servers and browsers (in comparison to earlier systems), but in turn presented the chronic problem of link rot. Unlike predecessors such as HyperCard, the World Wide Web was non-proprietary, making it possible to develop servers and clients independently and to add extensions without licensing restrictions.
On April 30, 1993, CERN announced[7] that the World Wide Web would be free to anyone, with no fees due. Coming two months after the announcement that the Gopher protocol was no longer free to use, this produced a rapid shift away from Gopher and towards the Web. An early popular web browser was ViolaWWW, which was modeled on HyperCard.
Scholars generally agree, however, that the turning point for the World Wide Web was the introduction[8] of the Mosaic web browser[9] in 1993, a graphical browser developed by a team at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign (NCSA-UIUC), led by Marc Andreessen. Funding for Mosaic came from the High-Performance Computing and Communications Initiative, a funding program initiated by then-Senator Al Gore's High Performance Computing and Communication Act of 1991, also known as the Gore Bill.[10] (See Al Gore's contributions to the Internet and technology for more information.) Prior to the release of Mosaic, graphics were not commonly mixed with text in web pages, and the Web was less popular than older protocols in use over the Internet, such as Gopher and Wide Area Information Servers (WAIS). Mosaic's graphical user interface allowed the Web to become, by far, the most popular Internet protocol.
Standards
Many formal standards and other technical specifications define the operation of different aspects of the World Wide Web, the Internet, and computer information exchange. Many of the documents are the work of the World Wide Web Consortium (W3C), headed by Berners-Lee, but some are produced by the Internet Engineering Task Force (IETF) and other organizations.
Usually, when web standards are discussed, the following publications are seen as foundational:
Recommendations for markup languages, especially HTML and XHTML, from the W3C. These define the structure and interpretation of hypertext documents.
Recommendations for stylesheets, especially CSS, from the W3C.
Standards for ECMAScript, a.k.a. JavaScript, from Ecma International.
Recommendations for the Document Object Model, from W3C.
Additional publications provide definitions of other essential technologies for the World Wide Web, including, but not limited to, the following:
Uniform Resource Identifier (URI), which is a universal system for referencing resources on the Internet, such as hypertext documents and images. URIs, often called URLs, are defined by the IETF's RFC 3986 / STD 66: Uniform Resource Identifier (URI): Generic Syntax, as well as its predecessors and numerous URI scheme-defining RFCs;
HyperText Transfer Protocol (HTTP), especially as defined by RFC 2616: HTTP/1.1 and RFC 2617: HTTP Authentication, which specify how the browser and server communicate with each other.
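To illustrate the generic URI syntax, the sketch below splits a URI into the five components named in RFC 3986 (scheme, authority, path, query, fragment) using Python's standard `urllib.parse.urlsplit`. The wrapper `describe_uri` and the example URIs are assumptions made for this illustration.

```python
from urllib.parse import urlsplit


def describe_uri(uri: str) -> dict[str, str]:
    """Split a URI into the five generic components of RFC 3986."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }
```

For `http://www.example.org:8080/docs/page.html?lang=en#history` the authority is `www.example.org:8080` and the fragment is `history`. A URI such as `mailto:someone@example.org` has a scheme and a path but no authority, which shows that not every URI names a network location.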
Java and JavaScript
A significant advance in Web technology was Sun Microsystems' Java platform. It enables web pages to embed small programs (called applets) directly into the view. These applets run on the end-user's computer, providing a richer user interface than simple web pages. For a variety of reasons, Java client-side applets never gained the popularity that Sun had hoped for. These reasons include a lack of integration with other content (applets were confined to small boxes within the rendered page) and the fact that many computers at the time were supplied to end users without a suitably installed Java Virtual Machine, so the user had to download one before applets would appear. Adobe Flash now performs many of the functions that were originally envisioned for Java applets, including the playing of video content, animation, and some rich UI features. Java itself has become more widely used as a platform and language for server-side and other programming.
JavaScript, on the other hand, is a scripting language that was initially developed for use within web pages. The standardized version is ECMAScript. While its name is similar to Java, JavaScript was developed by Netscape and it has almost nothing to do with Java, although, like Java, its syntax is derived from the C programming language. In conjunction with a web page's Document Object Model, JavaScript has become a much more powerful technology than its creators originally envisioned. The manipulation of a page's Document Object Model after the page is delivered to the client has been called Dynamic HTML (DHTML), to emphasize a shift away from static HTML displays.
In simple cases, all the optional information and actions available on a JavaScript-enhanced web page will have been downloaded when the page was first delivered. Ajax ("Asynchronous JavaScript And XML") is a JavaScript-based technology that provides a method whereby parts within a web page may be updated, using new information obtained over the network at a later time in response to user actions. This allows the page to be more responsive, interactive and interesting, without the user having to wait for whole-page reloads. Ajax is seen as an important aspect of what is being called Web 2.0. Examples of Ajax techniques currently in use can be seen in Gmail, Google Maps, and other dynamic web applications.
Publishing web pages
Web pages are available to individuals outside mass media. In order to publish a web page, one does not have to go through a publisher or other media institution, and potential readers could be found in all corners of the globe.
Unlike books and other documents, hypertext does not need to have a linear order from beginning to end. It is not necessarily broken down into the hierarchy of chapters, sections, subsections, and so on.
Many different kinds of information are now available on the Web, and it has become easier for those who wish to learn about other societies, cultures, and peoples to do so. When traveling in a foreign country or a remote town, one might be able to find some information about the place on the Web, especially if the place is in one of the developed countries. Local newspapers, government publications, and other materials are easier to access, and therefore the variety of information obtainable with the same effort may be said to have increased for the users of the Internet.
Although some web sites are available in multiple languages, many are in the local language only. Additionally, not all software supports all special characters and right-to-left (RTL) languages. These factors would challenge the notion that the World Wide Web will bring a unity to the world.
The increased opportunity to publish materials is certainly observable in the countless personal pages, as well as pages by families, small shops, etc., facilitated by the emergence of free web hosting services.
Statistics
According to a 2001 study, there were more than 550 billion documents on the Web, mostly in the "invisible web", or deep web.[11] A 2002 survey of 2,024 million web pages[12] determined that by far the most web content was in English: 56.4%; next were pages in German (7.7%), French (5.6%), and Japanese (4.9%). A more recent study, which used web searches in 75 different languages to sample the Web, determined that there were over 11.5 billion web pages in the publicly indexable web as of the end of January 2005.[13]
Speed issues
Frustration over congestion in the Internet infrastructure, and over the high latency that results in slow browsing, has led to an alternative, pejorative name for the World Wide Web: the World Wide Wait. How to speed up the Internet is the subject of ongoing discussion about the use of peering and QoS technologies. Other approaches to reducing the World Wide Wait are described by the W3C.
Standard guidelines for ideal web response times are (Nielsen 1999, page 42):
0.1 second (one tenth of a second). Ideal response time. The user doesn't sense any interruption.
1 second. Highest acceptable response time. Download times above 1 second interrupt the user experience.
10 seconds. Unacceptable response time. The user experience is interrupted and the user is likely to leave the site or system.
These numbers are useful for planning server capacity.
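The thresholds above can be turned into a simple classifier, for example to label measured response times when sizing a server. The function name and the category labels are illustrative assumptions. Only the 0.1, 1, and 10 second boundaries come from the guidelines quoted above.

```python
def classify_response_time(seconds: float) -> str:
    """Label a response time using the 0.1 s / 1 s / 10 s guidelines."""
    if seconds < 0:
        raise ValueError("response time cannot be negative")
    if seconds <= 0.1:
        return "ideal"  # user senses no interruption
    if seconds <= 1.0:
        return "acceptable"  # highest acceptable response time
    if seconds < 10.0:
        return "interrupting"  # above 1 s the experience is interrupted
    return "unacceptable"  # user is likely to leave
```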