An internet primer: web addresses and web pages

URLs

A Uniform Resource Locator (URL), colloquially termed a web address, is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identifier (URI), although many people use the two terms interchangeably. URLs most commonly reference web pages, which are the focus here, but they are also used for many other applications.

Most web browsers display the URL of a web page above the page in an address bar.

A typical URL has the form http://www.example.com/index.html, which indicates a scheme (http), a hostname (www.example.com) and a path (/index.html). It may also end with a query (introduced by ?), a fragment (introduced by #), or both.
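To make these parts concrete, here is a minimal sketch using Python's standard urllib.parse module (Python is used for illustration only; browsers do their own parsing). The query and fragment in the sample URL are added for illustration.

```python
from urllib.parse import urlsplit

# Split an example URL into its scheme, host, path, query and fragment.
parts = urlsplit("http://www.example.com/index.html?key1=value1&key2=value2#section-2")

print(parts.scheme)    # http
print(parts.hostname)  # www.example.com
print(parts.path)      # /index.html
print(parts.query)     # key1=value1&key2=value2
print(parts.fragment)  # section-2
```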

Although schemes are case-insensitive, the canonical form is lowercase, and documents that specify schemes must do so with lowercase letters. The scheme name is followed by a colon. Some schemes (including http) require two slashes after the colon; others (such as mailto) do not. URI schemes should be registered with the Internet Assigned Numbers Authority (IANA), which also manages the root zone of the Domain Name System, although unregistered schemes are used in practice.

Common schemes include:

  • http://     a web page
  • https://    a secure (encrypted) web page
  • mailto:     an email address
  • ftp://      a file transfer address
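The difference between schemes that use two slashes and those that do not shows up when a URL is parsed. The sketch below assumes Python's standard urllib.parse module; the helper name split_scheme is hypothetical. Note that urlsplit lowercases the scheme, matching the canonical form.

```python
from urllib.parse import urlsplit

def split_scheme(url: str) -> tuple[str, str, str]:
    """Hypothetical helper: return (scheme, authority, path) for a URL."""
    parts = urlsplit(url)
    # netloc holds the authority that follows "//"; it is empty for
    # schemes such as mailto that have no "//".
    return parts.scheme, parts.netloc, parts.path

for url in ["HTTPS://www.example.com/",
            "mailto:someone@example.com",
            "ftp://ftp.example.com/pub/file.txt"]:
    print(split_scheme(url))
```

Here the mailto address ends up in the path, because a mailto URL has no host part.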

A hostname must be either a registered domain name or an IP address. IPv4 addresses must be in dot-decimal notation; IPv6 addresses must be enclosed in square brackets.
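One way to tell the kinds of host apart is sketched below, assuming Python's standard urllib.parse and ipaddress modules. The helper name host_kind is hypothetical, and the sample addresses come from the ranges reserved for documentation.

```python
import ipaddress
from urllib.parse import urlsplit

def host_kind(url: str) -> str:
    """Hypothetical helper: classify a URL's host as IPv4, IPv6 or a domain name."""
    host = urlsplit(url).hostname  # square brackets around IPv6 are stripped here
    if host is None:
        raise ValueError(f"URL has no host: {url}")
    try:
        # ip_address accepts dot-decimal IPv4 or IPv6 text, else raises ValueError
        return f"IPv{ipaddress.ip_address(host).version}"
    except ValueError:
        return "domain name"

print(host_kind("http://www.example.com/"))     # domain name
print(host_kind("http://192.0.2.10/"))          # IPv4
print(host_kind("http://[2001:db8::1]:8080/"))  # IPv6
```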

A path, usually organised in hierarchical form, is a sequence of segments separated by slashes.
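Because the path is hierarchical, relative references can be resolved against it much like directory paths. A minimal sketch, assuming Python's standard urllib.parse module and an illustrative URL:

```python
from urllib.parse import urljoin, urlsplit

url = "http://www.example.com/docs/guides/intro.html"

# Split the path into its slash-separated segments, dropping empty ones.
segments = [s for s in urlsplit(url).path.split("/") if s]
print(segments)  # ['docs', 'guides', 'intro.html']

# ".." moves up one level in the hierarchy, as with directories.
print(urljoin(url, "../faq.html"))  # http://www.example.com/docs/faq.html
```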

By convention, a query string (starting ?) is most often a sequence of attribute–value pairs separated by a delimiter, usually &. For example: ?key1=value1&key2=value2
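Python's standard urllib.parse module can decode such a query string into pairs and build one from a mapping. This is an illustrative sketch; note that urlencode encodes spaces as + by default.

```python
from urllib.parse import parse_qsl, urlencode

# Decode a query string (without the leading "?") into attribute-value pairs.
pairs = parse_qsl("key1=value1&key2=value2")
print(pairs)  # [('key1', 'value1'), ('key2', 'value2')]

# Encode pairs back into a query string, escaping characters as needed.
query = urlencode({"q": "web pages", "page": 2})
print(query)  # q=web+pages&page=2
```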

A fragment, starting #, references a secondary resource, such as a section heading within a page.

A web browser usually dereferences a URL by sending an HTTP request to the specified host. URLs using the HTTPS scheme require that requests and responses be exchanged over a secure (encrypted) connection to the website.
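As a rough illustration of what such a request contains, the sketch below builds the text of a minimal HTTP/1.1 GET request from a URL without sending it. The function name build_get_request is hypothetical, and real browsers send many more headers. For an https URL the same request would be sent inside an encrypted TLS connection (by default on port 443 rather than 80).

```python
from urllib.parse import urlsplit

def build_get_request(url: str) -> str:
    """Hypothetical helper: return the text of a minimal GET request for url."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP(S) URL: {url}")
    # The request target is the path plus any query; the fragment is not sent.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",  # includes the port if one is given
        "Connection: close",
        "",  # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)

print(build_get_request("http://www.example.com/index.html?key1=value1#top"))
```

The fragment is dropped because browsers use it locally and do not send it to the server.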

Web pages

A web page is a document that can be displayed in a web browser. Browsers receive documents coded in HTML (see below) from a web server or from local storage and render them into web pages.

A so-called “static” web page is delivered to the user’s web browser exactly as stored, ie as HTML: the user requests the page and simply views it and the information on it. A “dynamic” web page, on the other hand, is generated in response to different contexts or conditions, on the server side, on the client (user) side, or both.

Most web pages are now dynamic, and many are managed and delivered by content management systems (CMSs), the most widely used of which is WordPress. Server-side languages and frameworks such as PHP and ASP are used to build the requested pages on the fly.

Server-side responses are typically determined by parameters in the URL or data posted from a web form, but also by other factors such as the type of browser, the current time, or the state of a database or server.
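To illustrate the difference between static and dynamic pages, here is a hypothetical server-side handler written in Python, standing in for PHP or similar. It builds the page from a query parameter; the parameter name name and the function render_page are assumptions for illustration, and a real site would run such code inside a web server or CMS.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

# A static page: the same stored HTML is returned for every request.
STATIC_PAGE = "<html><body><p>Hello, visitor!</p></body></html>"

def render_page(url: str) -> str:
    """Hypothetical dynamic handler: build HTML from the URL's query string."""
    params = parse_qs(urlsplit(url).query)
    name = params.get("name", ["visitor"])[0]
    # escape() stops user-supplied text from being interpreted as markup
    return f"<html><body><p>Hello, {escape(name)}!</p></body></html>"

print(render_page("http://www.example.com/greet?name=Ada"))
print(render_page("http://www.example.com/greet"))
```

Escaping user-supplied text before inserting it into HTML, as html.escape does here, prevents it from being interpreted as markup.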

Client-side scripting languages such as JavaScript are used to change interface behaviour within a page in response to user actions or timed events. Typical examples are pop-up dialogues and drop-down menus.

HTML

HyperText Markup Language (HTML) is the standard markup language that web browsers use to interpret and compose text, images, and other material into web pages.

HTML describes the structure of a web page semantically. Browsers define a default presentation for every HTML element, but in practice style sheets written in CSS are generally used to define the look and layout of content. As mentioned above, HTML pages can also include embedded programs written in a scripting language such as JavaScript, which affect their behaviour and content.

Tim Berners-Lee proposed an internet-based hypertext system in 1989, and in late 1990 he specified HTML and wrote the first browser and server software. The first publicly available description of HTML, a document called “HTML Tags”, listed 18 elements representing headings, paragraphs, lists, links, quotes and other items.

HTML elements are delineated by “tags”, written using angle brackets. Elements such as <p>…</p> surround document text, provide information about it, and may contain other elements. Browsers do not display the tags themselves but use them to interpret the content of the page. Other tags, such as <img /> and <input />, introduce content into the page directly.
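A browser's parser reads these tags to work out how elements nest. As a sketch of the idea, Python's standard html.parser module can report the tags in a fragment of markup; the TagLister class and the sample markup are illustrative assumptions.

```python
from html.parser import HTMLParser

class TagLister(HTMLParser):
    """Record each tag the parser meets, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Called for self-closing tags such as <img />
        self.events.append(("empty", tag))

lister = TagLister()
lister.feed('<p>See <a href="/docs">the docs</a>.</p><img src="logo.png" /><input type="text" />')
lister.close()
print(lister.events)
```

The <a> element opens and closes inside <p>, showing one element nested within another, while <img /> and <input /> stand alone.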

Many of the text elements are found in the 1988 ISO technical report on Techniques for using the Standard Generalised Markup Language (SGML); Berners-Lee considered HTML to be an application of SGML. The SGML concept of generalised markup is based on elements (nested annotated ranges with attributes), describing structure rather than print or presentational effects. HTML has moved progressively in this direction, with presentation increasingly handed over to CSS.

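From 1996 the HTML specifications were maintained by the World Wide Web Consortium (W3C). Since 2019 the authoritative HTML specification has been the HTML Living Standard, maintained by the Web Hypertext Application Technology Working Group (WHATWG).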

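In 2000 the W3C published Extensible HyperText Markup Language (XHTML) 1.0 as a Recommendation. XHTML is more restrictive than HTML, requiring documents to be “well-formed” XML. For example, every XHTML element must be closed, either with an end tag or, for empty elements, with self-closing syntax such as <br />, whereas HTML allows some end tags to be omitted.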
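Well-formedness can be checked with any XML parser. Below is a minimal sketch using Python's standard xml.etree.ElementTree, where is_well_formed is a hypothetical helper. It tests XML well-formedness only, not validity against the XHTML specification, and named entities such as &nbsp; would be rejected because no DTD is loaded.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Hypothetical helper: True if markup parses as a single XML element tree."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<p>First<br />second</p>"))   # True: every element is closed
print(is_well_formed("<p>First<br>second</p>"))     # False: <br> is never closed
print(is_well_formed("<ul><li>One<li>Two</ul>"))    # False: <li> end tags omitted
```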

CSS

Cascading Style Sheets (CSS) is a style sheet language used for describing the presentation of a document written in a markup language. Although most often used to set the visual style of web pages and user interfaces written in HTML, CSS can be applied to any XML document.

CSS is designed primarily to enable the separation of presentation and content, including aspects such as the layout, colours and fonts. It enables the same page to be presented in different styles for different rendering methods, such as on-screen, in print, by voice (via speech-based browser or screen reader), and on Braille-based tactile devices. Importantly, for on-screen viewing, it enables the display to be optimised for different screen sizes. Readers can also specify a different style sheet, such as a CSS file stored on their own computer, to override the one the author specified.

Changes to the graphic design of a document (or hundreds of documents) can be applied quickly and easily, by editing a few lines in the CSS file they use, rather than by changing markup in the documents.

If more than one rule matches a particular element, a series of rules (the “cascade”) determines which style applies. The cascade considers origin and importance (for normal declarations, author > user > browser; for !important declarations this order is reversed), then specificity (inline > ID > class > element), then the order of the code itself (the later declaration takes precedence).
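The specificity and source-order steps can be sketched in Python as below. This is a deliberately simplified model rather than the full algorithm from the CSS specification: it ignores origin, importance and inline styles, and counts only ID, class and element-type selectors (attribute selectors, pseudo-classes and pseudo-elements are not handled). The helper names specificity and winning_declaration are hypothetical.

```python
import re

def specificity(selector: str) -> tuple[int, int, int]:
    """Simplified specificity as (IDs, classes, element types)."""
    ids = classes = types = 0
    # Split on descendant, child and sibling combinators into compound selectors.
    for compound in re.split(r"[\s>+~]+", selector.strip()):
        ids += len(re.findall(r"#[\w-]+", compound))
        classes += len(re.findall(r"\.[\w-]+", compound))
        # A leading name (not starting with # or .) is an element type selector.
        if re.match(r"[a-zA-Z][\w-]*", compound):
            types += 1
    return (ids, classes, types)

def winning_declaration(rules: list[tuple[str, str]]) -> str:
    """Pick the declaration that applies when all rules match the same element.

    rules are (selector, declaration) pairs in source order. Higher
    specificity wins; on a tie, the later rule wins.
    """
    best = max(range(len(rules)), key=lambda i: (specificity(rules[i][0]), i))
    return rules[best][1]

rules = [
    ("#nav a", "color: red"),
    ("a.active", "color: blue"),
    ("a", "color: green"),
]
print(specificity("#nav a"))       # (1, 0, 1)
print(winning_declaration(rules))  # color: red
print(winning_declaration([("p", "color: black"), ("p", "color: grey")]))  # color: grey
```

Comparing (IDs, classes, types) tuples left to right mirrors the ID > class > element ordering, and using the rule's index as a tie-breaker models "latest takes precedence".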

The CSS specifications are maintained by the W3C, which also operates a free CSS validation service.

JavaScript

JavaScript (JS) is a scripting language used to create and control dynamic in-page website content, ie anything that refreshes, moves or otherwise changes on your screen without requiring you to reload the web page. Everyday examples include pop-ups and drop-downs, type-ahead suggestions and automatically updating social media timelines.

JavaScript is regarded as one of the core technologies of the web; the vast majority of websites use it for client-side page behaviour and all major web browsers have a dedicated JavaScript engine.

JavaScript was first developed by Netscape for its browser in 1995. Netscape submitted JavaScript to Ecma International for adoption and development as a standard specification for all browsers. The resulting standard, ECMAScript (the official name of the standardised language), is now developed in the open on GitHub by Ecma's TC39 committee, and a new edition is published every year.

Learning more

Start at Wikipedia: Web page.

Dive into W3Schools.

Image cc by-sa Fitriani2 via Wikimedia Commons.