In February 2010, tech web magazine Read Write Web published an article entitled "Facebook Wants to Be Your One True Login". The article was about a new partnership between Facebook and AOL aimed at making it easier for users to log in to Facebook, which is quite ironic considering what happened next.
Soon after it was published, the article somehow managed to reach number one in Google for the search term Facebook log in. So when thousands of people started landing on this Read Write Web page, believing it was the Facebook log in page, they proceeded to log in! Of course, they couldn't log in because it wasn't the Facebook site they were on! Comments at the time were extraordinary, with users incredulous that their access details didn't work on the Read Write Web log-in page.
Who's at fault?
What's wrong with this situation? Whose fault is it that they searched for Facebook log in and arrived on a site that wasn't the Facebook log in page? Some argued that Google was to blame: these people were used to reaching Facebook this way, and when the Google results changed they had no way of knowing better. Others argued that the blame lies with the individuals in question, because they never carried out the most basic security check: checking the URL.
They didn't check the URL?
Evidently not. Hello? Phishing scams?
What is a URL?
URL stands for Uniform Resource Locator. In other words, it is the web address of an online resource, such as a web site or document.
Web browsers display the URL in the address bar
Every website you visit has a URL. The following URL points to this article: https://doepud.co.uk/anatomy-of-a-url
The URL is created in one of two ways:
- after clicking a link in a web page, bookmark or email
- by typing the URL directly into the address bar
♫ The protocol's connected to the domain name and the domain name's connected to the file path... ♫
Using the URL of this article as an example, the three basic parts of a URL you should understand are the protocol, the domain name and the path.
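If you'd like to see those three parts pulled apart for real, here's a minimal sketch using Python's standard urllib.parse module. The variable names are my own, and note that Python calls the protocol the "scheme" and the domain name the "hostname".

```python
from urllib.parse import urlsplit

# The URL of this article, as quoted above.
url = "https://doepud.co.uk/anatomy-of-a-url"

parts = urlsplit(url)

protocol = parts.scheme       # the part before "://"
domain_name = parts.hostname  # the part between "://" and the first "/"
path = parts.path             # everything from the first "/" onwards

print("Protocol:   ", protocol)
print("Domain name:", domain_name)
print("Path:       ", path)
```

Checking the domain name this way is the programmatic equivalent of glancing at the address bar before typing a password.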
And, based on the example URL from Matt Cutts' URL definitions, here's an example of a more complex-looking URL:
URL anatomy explained
- Protocol
- The protocol declares how your web browser should communicate with a web server when sending or fetching a web page or document. The most common protocol is http, which stands for Hypertext Transfer Protocol.
- Another common protocol is https, which stands for Hypertext Transfer Protocol Secure. You'll see this on secure pages, like shopping sites and log in pages. If you're visiting a site where you need to enter sensitive information, like bank details and passwords, make sure the protocol is declared as https. This means your web browser encrypts any information you provide so it can't be understood by anyone who tries to intercept it during transfer.
- Some protocols you're less likely to see include ftp (File Transfer Protocol), pop (Post Office Protocol), smtp (Simple Mail Transfer Protocol) and imap (Internet Message Access Protocol). If you want to know more here's a list of protocols.
- Subdomain
- A subdomain is a subdivision of the main domain name. For example, mail.doepud.com and calendar.doepud.com are subdomains of the domain name doepud.com.
- Domain name
- A domain name is a unique reference that identifies a web site on the internet, for example doepud.co.uk. A domain name always includes the top-level domain (TLD), which in Doepud's case is uk. The co part is shorthand for commercial, and the combined .co.uk is called a second-level domain (SLD).
- Port
- The port number is rarely visible in URLs, but one is always used. When declared in a URL it comes right after the domain name (that is, after the TLD), separated by a colon. When it's not declared, the default for the protocol is used: port 80 for http requests and port 443 for https (secure) requests.
- Read more about port numbers in URLs.
- Path
- The path typically refers to a file or directory on the web server, e.g. /directory/file.php.
- Sometimes the file name won't be specified, e.g. https://doepud.co.uk/blog/, so the web server will typically look inside the /blog/ folder for a file called index or default. If neither can be found, a 404 Not Found error will usually be returned by the server.
- Query
- A query is commonly found in the URL of dynamic pages (ones which are generated from database or user-generated content) and is represented by a question mark followed by one or more parameters. The query directly follows the domain name, path or port number.
- For example, have a look at this URL, which was generated by Google when doing a search for anatomy of a url: http://www.google.co.uk/search?q=anatomy+of+a+url&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a
- The query part is ?q=anatomy+of+a+url&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a
- Parameters
- Parameters are snippets of information found in the query string of a URL. With reference to the Google query above, the parameters follow the question mark and are separated by an ampersand (&) character so they can be understood individually and used to display content on that page. The parameters are:
q=anatomy+of+a+url
ie=utf-8
oe=utf-8
aq=t
rls=org.mozilla:en-GB:official
client=firefox-a
- Fragment
- A fragment is an internal page reference, sometimes called a named anchor. It usually appears at the end of a URL and begins with a hash (#) character followed by an identifier. It refers to a section within a web page.
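To tie the list above together, here's a rough sketch that breaks a URL into the same parts using Python's standard urllib.parse module. The describe function, the DEFAULT_PORTS table and the mail.example.com URL are all made up for illustration; only the Google URL comes from this article. Python doesn't separate the subdomain from the domain name, so both appear together as the host.

```python
from urllib.parse import urlsplit, parse_qsl

# Default ports for the two protocols discussed above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe(url):
    """Break a URL into the parts named in the list above (illustrative helper)."""
    parts = urlsplit(url)
    # parts.port is None when the URL does not declare a port,
    # so fall back to the default for the protocol.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,  # subdomain and domain name together
        "port": port,
        "path": parts.path,
        # parse_qsl splits the query on "&" and decodes "+" as a space.
        "parameters": parse_qsl(parts.query),
        "fragment": parts.fragment,
    }


# The Google search URL quoted in the Query entry above.
google = describe(
    "http://www.google.co.uk/search?q=anatomy+of+a+url&ie=utf-8&oe=utf-8"
    "&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a"
)

# A made-up URL that declares a port and a fragment explicitly.
made_up = describe("https://mail.example.com:8443/directory/file.php?id=7#top")

for label, info in (("Google URL", google), ("Made-up URL", made_up)):
    print(label)
    for name, value in info.items():
        print(f"  {name}: {value}")
```

The Google URL declares no port, so the sketch falls back to 80 because the protocol is http. The made-up URL declares 8443 after the colon, so that value is used instead.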
Further reading
- Uniform Resource Locator - Wikipedia reference
- URL shortening - making URLs shorter