Some people believe that they should create different pages for different search engines, each
page optimized for one keyword and for one search engine. Now, while I don't recommend that
people create different pages for different search engines, if you do decide to create such
pages, there is one issue that you need to be aware of.
These pages, although optimized for different search engines, often turn out to be pretty
similar to each other. The search engines now have the ability to detect when a site has created
such similar-looking pages and are penalizing or even banning such sites. In order to prevent
your site from being penalized for spamming, you need to prevent each search engine's spider from
indexing pages which are not meant for it, i.e. you need to prevent AltaVista from indexing pages
meant for Google and vice versa. The best way to do that is to use a robots.txt file.
You should create a robots.txt file using a text editor like Windows Notepad. Don't use your
word processor to create such a file.
Here is the basic syntax of the robots.txt file:
User-Agent: [Spider Name]
Disallow: [File Name]
For instance, to tell AltaVista's spider, Scooter, not to spider the file named myfile1.html
residing in the root directory of the server, you would write:
User-Agent: Scooter
Disallow: /myfile1.html
To tell Google's spider, called Googlebot,
not to spider the files myfile2.html and myfile3.html, you would write:
User-Agent: Googlebot
Disallow: /myfile2.html
Disallow: /myfile3.html
You can, of course, put multiple User-Agent statements in the same robots.txt file. Hence, to
tell AltaVista not to spider the file named myfile1.html, and to tell Google not to spider the
files myfile2.html and myfile3.html, you would write:
User-Agent: Scooter
Disallow: /myfile1.html

User-Agent: Googlebot
Disallow: /myfile2.html
Disallow: /myfile3.html
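To see how a spider might read these rules, here is a minimal sketch in Python using the standard library's urllib.robotparser module. The helper name is_allowed is my own, chosen for illustration, and Python's parser follows its own matching rules, so treat the result as a rough check rather than a guarantee of how AltaVista or Google will behave.

```python
from urllib.robotparser import RobotFileParser

# The two-record robots.txt file from the example above.
ROBOTS_TXT = """\
User-Agent: Scooter
Disallow: /myfile1.html

User-Agent: Googlebot
Disallow: /myfile2.html
Disallow: /myfile3.html
"""


def is_allowed(robots_txt: str, spider: str, path: str) -> bool:
    """Return True if `spider` may fetch `path` under the given rules.

    `is_allowed` is a hypothetical helper, not part of any search engine's API.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(spider, path)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "Scooter", "/myfile1.html"))    # False
    print(is_allowed(ROBOTS_TXT, "Googlebot", "/myfile1.html"))  # True
    print(is_allowed(ROBOTS_TXT, "Googlebot", "/myfile2.html"))  # False
```

Each spider is only bound by the record that names it, which is why Googlebot may still fetch myfile1.html here.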
If you want to prevent all robots from spidering the file named myfile4.html, you can use the *
wildcard character in the User-Agent line, i.e. you would write:
User-Agent: *
Disallow: /myfile4.html
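However, the original robots.txt standard does not define a wildcard character for the Disallow
line. Some search engines accept one there as an extension, but you should not rely on it.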
Once you have created the robots.txt file, you should upload it to the root directory of your
domain. Uploading it to any sub-directory won't work - the robots.txt file needs to be in the
root directory.
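As a small illustration of the root-directory rule, the following Python sketch works out where a spider would look for the robots.txt file, given the address of any page on the site. The function name robots_txt_url and the example.com addresses are assumptions made for this example.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site hosting `page_url`.

    `robots_txt_url` is a hypothetical helper used only for illustration.
    """
    # A leading slash makes urljoin discard any sub-directory in page_url.
    return urljoin(page_url, "/robots.txt")


if __name__ == "__main__":
    print(robots_txt_url("http://www.example.com/travel/index.html"))
    # http://www.example.com/robots.txt
```

Whatever sub-directory the page lives in, the result always points at the root of the domain.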
For more info on robots.txt files, please visit
robotstxt.org.
Now we come to how the robots.txt file can be used to prevent your site from being penalized for
spamming in case you are creating different pages for different search engines. What you need to
do is to prevent each search engine from spidering pages which are not meant for it.
For simplicity, let's assume that you are targeting only two keywords: "tourism in Australia"
and "travel to Australia". Also, let's assume that you are targeting only three of the major
search engines: AltaVista, HotBot and Google.
Now, suppose you have used the following convention for naming the files: each page is named
by joining the individual words of the keyword for which the page is being optimized with
hyphens. To this is added the first two letters of the name of the search engine for which the
page is being optimized.
Hence, the files for AltaVista are:
tourism-in-australia-al.html
travel-to-australia-al.html
The files for HotBot are:
tourism-in-australia-ho.html
travel-to-australia-ho.html
The files for Google are:
tourism-in-australia-go.html
travel-to-australia-go.html
As I noted earlier, AltaVista's spider is called Scooter and Google's spider is called
Googlebot. A list of spiders for the major search engines can be found here.
Now, we know that HotBot uses Inktomi and from this list, we find that Inktomi's spider is
called Slurp.
Using this knowledge, here's what the robots.txt file should contain:
User-Agent: Scooter
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Slurp
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Googlebot
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
When you put the above lines in the robots.txt file, you instruct each search engine not to
spider the files meant for the other search engines.
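Typing such a file by hand gets tedious as the number of keywords and search engines grows, so you may prefer to generate it. Here is a rough Python sketch that builds the same file from the naming convention described above. The names KEYWORDS, SPIDERS, page_name and build_robots_txt are my own, and the sketch assumes every page sits in the root directory.

```python
# Keywords and spiders taken from the example in this article.
KEYWORDS = ["tourism in Australia", "travel to Australia"]
SPIDERS = {"AltaVista": "Scooter", "HotBot": "Slurp", "Google": "Googlebot"}


def page_name(keyword: str, engine: str) -> str:
    """Hyphenate the keyword and append the engine's first two letters."""
    slug = "-".join(keyword.lower().split())
    return f"{slug}-{engine[:2].lower()}.html"


def build_robots_txt(keywords, spiders) -> str:
    """Build one record per spider that disallows every other engine's pages."""
    records = []
    for engine, spider in spiders.items():
        lines = [f"User-Agent: {spider}"]
        for other_engine in spiders:
            if other_engine == engine:
                continue  # a spider must stay free to read its own pages
            for keyword in keywords:
                lines.append(f"Disallow: /{page_name(keyword, other_engine)}")
        records.append("\n".join(lines))
    # Records are separated by a blank line.
    return "\n\n".join(records) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(KEYWORDS, SPIDERS), end="")
```

Adding a keyword or a search engine then only means adding one entry to the lists at the top.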
When you have finished creating the robots.txt file, double-check to ensure that you have not
made any errors anywhere in it. A small error can have disastrous consequences - a search engine
may spider files which are not meant for it, in which case it can penalize your site for
spamming, or, it may not spider any files at all, in which case you won't get top rankings in
that search engine.
This website has a useful tool to check the syntax of your robots.txt file. While it will help
you correct syntactical errors in the robots.txt file, it won't help you correct any logical
errors, for which you will still need to go through the robots.txt file thoroughly, as mentioned
above.
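Part of that check for logical errors can be automated. The Python sketch below, built on the standard library's urllib.robotparser module, reports any spider that is blocked from its own pages or allowed into another engine's pages. The names find_logic_errors and PAGES_BY_SPIDER are assumptions made for this example, and Python's parser may not match the rules exactly as a real search engine does, so it does not replace reading the file yourself.

```python
from urllib.robotparser import RobotFileParser

# Pages each spider is meant to read, following the article's naming convention.
PAGES_BY_SPIDER = {
    "Scooter": ["/tourism-in-australia-al.html", "/travel-to-australia-al.html"],
    "Slurp": ["/tourism-in-australia-ho.html", "/travel-to-australia-ho.html"],
    "Googlebot": ["/tourism-in-australia-go.html", "/travel-to-australia-go.html"],
}

ROBOTS_TXT = """\
User-Agent: Scooter
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Slurp
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Googlebot
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
"""


def find_logic_errors(robots_txt: str, pages_by_spider: dict) -> list:
    """List every spider/page pair whose access differs from what is intended."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for spider in pages_by_spider:
        for owner, pages in pages_by_spider.items():
            for page in pages:
                allowed = parser.can_fetch(spider, page)
                if owner == spider and not allowed:
                    problems.append(f"{spider} is blocked from its own page {page}")
                elif owner != spider and allowed:
                    problems.append(f"{spider} may spider {page}, which is meant for {owner}")
    return problems


if __name__ == "__main__":
    for problem in find_logic_errors(ROBOTS_TXT, PAGES_BY_SPIDER):
        print(problem)
```

With the file shown in this article the script prints nothing. If you delete one of the Disallow lines, it names the spider and the page affected.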
This article may be re-published as long as the following resource box is included at the end of
the article and as long as you link to the email address and the URL mentioned in the resource
box:
Article by Sumantra Roy. Sumantra is one of the most respected and recognized search engine positioning specialists on the Internet. For more articles on search engine placement, subscribe to his 1st Search Ranking Newsletter by sending a blank email to 1stSearchRanking.999.99@optinpro.com or by going to http://www.1stSearchRanking.net