Searching for a Needle in a Haystack

Tim Fountain explains how to get your Web site indexed on the popular search engines

After several weeks of slaving over a (insert the name of your favourite text editor here) window full of HTML, you've finally finished off your Web site and uploaded it to your Web space. You've now got a potential audience of hundreds of millions of people around the world, but how do you get them to end up at your site rather than at one of the other millions of sites out there?

About 85% of Web surfers find what they're looking for by using one of the top search engines. They enter a search term, the search engine looks through its database for the addresses of pages that mention that term, and then presents the results in ranked order. Getting your site listed on search engines isn't very tricky; it's getting it listed highly that's the hard bit. If your site is listed 200th in the search results, it is unlikely many people will get to you; whereas, if your site is listed first, it will probably be the first site that the searcher will visit. Research has shown that the top ten results receive 78% more traffic than those in positions 11 to 30, and the top thirty results get over 90% of the search traffic.

Firstly, there are two types of search services: search directories and search engines. The boundaries between the two are getting more and more blurred, as most search engines also have directory listings, and vice versa; but they operate quite differently, so it's an important distinction to make. Search directories (such as Yahoo) have a large database of Web sites, all split into categories. Directories are usually compiled by humans, whereas search engines (such as AltaVista) are largely automated.

To confuse the matter further, some ISPs also have their own search engines, which are often re-badged versions of other services. These are just as important as conventional search engines, as they often act as portals for users of that ISP. A few months ago, one of my Web sites suddenly started getting several hundred hits a week from AOL users. I quickly discovered that this was because the site had appeared in AOL's search service (AOL Find), and was displayed second in the list of results when someone searched for "Sim City".

Search engine ranking

The key to getting listed highly on search engines is understanding how the ranking system works and tweaking your Web site accordingly. Because of the size of the Web and the diversity of its content, it is quite unlikely that your site is the only one on the topic, so you have to convince the search engine that your site is more relevant than the others.

There are three main factors that affect your search engine ranking (although obviously this varies from search engine to search engine):

1. Word positioning

If you were searching for a site on the same topic as yours, what words would you use? These are your keywords. These words should appear in key locations on your pages, especially the page title (<TITLE>Page title</TITLE>), and headings (<H1>...</H1>, <H2>...</H2>...). This is why you often see sites that will have titles such as "Acorn Arcade - the RISC OS gaming paradise" rather than just "Acorn Arcade". Sneaky, but it does make a difference.

Try to include keywords in headings and sub-headings; for example, rather than saying "Introduction", say "An introduction to stamp collecting". Also be aware that search engines can't read text within graphics, so if you have a huge title image saying "Stamp collecting" across the top of all your pages, it won't make any difference to your search engine position. Many search engines can read images' ALT text, though, so it's important to put it in.
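As a rough sketch of the advice above, here is how the key locations might be filled in for a hypothetical stamp-collecting site. The site name, the image file name and its dimensions are invented for illustration:

```html
<HTML>
<HEAD>
  <!-- Keywords in the page title, not just the (hypothetical) site name -->
  <TITLE>Stamp World - a beginner's guide to stamp collecting</TITLE>
</HEAD>
<BODY>
  <!-- Search engines cannot read the words drawn inside title.gif;
       the ALT text is the only part some of them can index -->
  <IMG SRC="title.gif" ALT="Stamp collecting" WIDTH="400" HEIGHT="80">

  <!-- A descriptive heading rather than a bare "Introduction" -->
  <H1>An introduction to stamp collecting</H1>
  <P>...</P>
</BODY>
</HTML>
```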

2. Word frequency

The more times the keywords appear on your page, the higher your page will be listed when that keyword is used as a search term. However, don't think that by repeating the keyword several hundred times at the bottom of your page you can cheat the system! If a search engine thinks you may be 'keyword spamming' (whether you're repeating a keyword several times, displaying it 'invisibly' by setting the text colour to be the same as the page background colour, or displaying it in very small text) then it will apply negative weighting or may even remove you from its index completely.

It also makes a difference whether the keyword is near the top of the page or near the bottom, as search engines assume that a word is likely to be more relevant if it is near the top of the page. Remember that search engines are reading the page source, not what you see in a browser; so if your page consists of a large table with the main text in a right-hand column, it appears much lower down in the page source.

3. Popularity

More and more search engines are using link popularity as an important factor in ranking. This means that sites which are linked to by lots of other sites on the Web will appear higher up in the list. For this reason it is a good idea to find a few other sites on the same topic as yours and ask them if they would link to you.

To find out how many sites are linking to yours, go to AltaVista and do a search for link:yoursiteaddress.com. For example, to see how many sites currently link to Acorn Arcade, you would search for link:acornarcade.com.

Meta tags

Meta tags aren't the be-all and end-all of getting your Web site ranked highly. Many Webmasters put META keyword tags on their Web site and then wonder why, when they go to a search engine and type in one of the keywords, their site isn't top of the list.

Most search engines give meta keywords no more weight than the main text of a page, and some search engines (Excite, Google, Lycos and Northern Light) ignore meta keywords completely.

Meta descriptions are a little more useful. Search engines that support them will display your meta description on their search results page instead of the first sentence or two on the page (which is often image ALT text). For example, if The Icon Bar didn't have a meta description, it would appear on a search engine results page a little like this:

  1. The Icon Bar
    onepointnought About | Staff | Jobs Random pearl of wisdom: Using ADJUST...
    URL: http://www.iconbar.com/

Because it does have a meta description, though, it appears more like this:

  1. The Icon Bar
    News and resources for all things RISC OS
    URL: http://www.iconbar.com/

To include meta tags on your site, incorporate the following lines of HTML into the <HEAD>...</HEAD> section of your HTML:
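A minimal sketch of such lines, reusing The Icon Bar's description from the results example above. The keyword list is an illustrative assumption, not the site's actual one:

```html
<HEAD>
  <TITLE>The Icon Bar</TITLE>
  <!-- Shown on the results page by search engines that support meta descriptions -->
  <META NAME="description" CONTENT="News and resources for all things RISC OS">
  <!-- Hypothetical keyword list: lower case, comma-separated, no repeats -->
  <META NAME="keywords" CONTENT="risc os, acorn, news, software, resources">
</HEAD>
```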

Keywords are case-insensitive so it's easiest just to use lower case. Don't repeat keywords varying the case (e.g. RISC OS, risc os, Risc Os), as some search engines may regard this as keyword spamming. Don't include keywords that are not relevant to your site content (you'd be surprised how many sites contain the keyword "mp3") as you will be penalised for this.

Frames

One of the main problems with frames is that they are not well supported by search engines. Currently, the only search engines that will follow <FRAME> links are AltaVista, Google, Lycos and Northern Light. All the other search engines will only index the <NOFRAMES> content, so if you don't have any, your Web site will effectively be invisible to many search engines.

Quite often, when you're searching for something you'll see "sorry, you need a frames-compatible browser to view this site" listed as the description of many of the search results, as this is the default <NOFRAMES> content generated by Microsoft FrontPage.
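One way to avoid this is to treat the <NOFRAMES> section as a small page in its own right. The sketch below assumes a hypothetical two-frame site; the file names menu.html and main.html and all of the text are invented for illustration:

```html
<HTML>
<HEAD>
  <TITLE>Stamp World - a beginner's guide to stamp collecting</TITLE>
</HEAD>
<FRAMESET COLS="150,*">
  <FRAME SRC="menu.html" NAME="menu">
  <FRAME SRC="main.html" NAME="main">
  <NOFRAMES>
    <BODY>
      <!-- Real content instead of "you need a frames-compatible browser" -->
      <H1>An introduction to stamp collecting</H1>
      <P>Guides, news and links for stamp collectors.</P>
      <!-- Ordinary links give spiders that ignore FRAME a way into the site -->
      <P><A HREF="main.html">Main page</A> | <A HREF="menu.html">Site menu</A></P>
    </BODY>
  </NOFRAMES>
</FRAMESET>
</HTML>
```

Search engines that only index the <NOFRAMES> content then have both a sensible description to display and links to follow.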

Registering with search engines

Okay, now for the easy bit. Below is a list of direct links to the 'submit URL' pages of various search engines (correct at the time of writing). There are services on the Internet that will automatically submit your site to lots of search engines at once, but it's far better to do each one manually. It doesn't take very long anyway.

Once you've filled in the relevant details and hit the Submit button, the first thing the search engine will do is check that your site exists, so don't submit your site before you've uploaded it. For the search engines, at some point after this initial check (anything from a day to two months), a spider will visit your site and build up a list of keywords from that page. The spider will later return and follow the links on that page and index those pages. For the search directories, an editor will visit your site and see whether your site content matches the category you submitted your site for. If he/she has any doubts whatsoever, your application will be rejected.

On the list above, I'd say that the three most important to get registered with are AltaVista, Yahoo and the Open Directory Project (ODP). Yahoo is still one of the most popular search services, but is also probably the most difficult one to get registered on as they only have a few hundred editors. The Open Directory Project has somewhere in the region of 15,000 to 20,000 editors, each with an interest in the topic they are responsible for, so getting registered should be pretty easy provided your site has a lot of content. The ODP provides directory content for about 500 other search services (including AltaVista, Google, Netscape search and HotBot), so if you can get on the ODP you'll soon find yourself getting hits from all sorts of other search services, too.

Other things to remember

Not all search engines can read pages generated via CGI or a database, and quite a few of them have problems with unusual URLs (especially ones containing the ? symbol).

If your Web site moves, simply resubmit the old URL (which the search engine will check and, when it gets a 404 error, remove from its database), and then submit the new one. If you want to tell people of the new address, don't create a page which automatically forwards visitors to the new URL, as most search engines can't follow these links. Instead, just create a page with a normal link to the new address (an online example).
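A "we have moved" page along these lines might look like the sketch below. The site name and the example.com address are placeholders, and it assumes the automatic forwarding to avoid is something like a META refresh:

```html
<HTML>
<HEAD>
  <TITLE>Stamp World has moved</TITLE>
  <!-- Deliberately no automatic forwarding (such as a META refresh) here -->
</HEAD>
<BODY>
  <H1>Stamp World has moved</H1>
  <!-- A normal link that both visitors and spiders can follow -->
  <P>The site is now at
     <A HREF="http://www.example.com/stamps/">http://www.example.com/stamps/</A>.
     Please update your bookmarks and links.</P>
</BODY>
</HTML>
```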

Registering your site with search engines can be frustrating, as you can wait and wait for your site to appear in a search engine like Yahoo and never see anything. However, a regular influx of new visitors is very useful (especially for business sites), so if at first you don't succeed, try and try again...