|
When optimizing a web site, most webmasters don't consider using the robots.txt
file. This is a very important file for your site. It lets spiders and crawlers
know what they can and cannot crawl and index. This is helpful for keeping them
out of folders that you do not want indexed, like the admin or stats folder, or
away from content that they cannot index.
Here is a list of the fields that you can include in a robots.txt file and their
meaning:
1) User-agent: In this field you name the specific robot that the access policy
applies to, or use * for all robots. The examples below show both.
2) Disallow: In this field you specify the files and folders not to include in
the crawl.
3) #: The number sign starts a comment.
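The three fields above are enough to assemble a simple file by hand or from a script. The following is a minimal sketch in Python, not a standard tool: the function name build_robots_txt, its arguments, and the sample comment text are all made up for illustration.

```python
def build_robots_txt(groups, comment=None):
    """Return robots.txt text built from (user_agent, disallowed_paths) pairs.

    Hypothetical helper for illustration only.
    """
    lines = []
    if comment:
        # A line starting with the number sign is a comment.
        lines.append(f"# {comment}")
    for user_agent, disallowed_paths in groups:
        lines.append(f"User-agent: {user_agent}")
        if not disallowed_paths:
            # An empty Disallow blocks nothing for this robot.
            lines.append("Disallow:")
        for path in disallowed_paths:
            lines.append(f"Disallow: {path}")
        # Blank line to separate one group of rules from the next.
        lines.append("")
    return "\n".join(lines)


robots_text = build_robots_txt(
    [("*", ["/admin/", "/stats/"])],
    comment="Keep robots out of private folders",
)
print(robots_text)
```

The printed text can be saved as robots.txt in the top-level folder of the site.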
Here are some examples of a robots.txt file for redball.com:
User-agent: *
Disallow:
The above would let all spiders index all content, because an empty Disallow
blocks nothing.
Here is another:
User-agent: *
Disallow: /cgi-bin/
The above would block all spiders from indexing the cgi-bin directory.
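One way to check a rule like this before uploading it is to run it through Python's standard urllib.robotparser module. This is only a sketch: the helper name is_allowed, the robot name ExampleBot, and the page paths are assumptions made for the example, and real spiders may interpret rules slightly differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no download is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# ExampleBot and both paths are hypothetical.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://redball.com/cgi-bin/search.pl"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://redball.com/index.html"))
```

The first call should report that the cgi-bin page is blocked, and the second that the home page is open.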
Here is an example that gives one robot its own rules:
User-agent: googlebot
Disallow:
User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/
In the above example googlebot can index everything, while all other spiders
cannot index admin.php or the cgi-bin, admin, and stats directories. Notice that
you can block single files like admin.php.
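The difference between googlebot and every other spider can be checked the same way with Python's standard urllib.robotparser module. Treat this as a rough sketch: the robot name OtherBot and the page paths are invented for the example, and this parser is only an approximation of how a real spider reads the file.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/
"""

parser = RobotFileParser()
# Parse the rules from text instead of fetching them from the site.
parser.parse(ROBOTS_TXT.splitlines())

# OtherBot and the page paths are hypothetical.
for agent in ("googlebot", "OtherBot"):
    for path in ("/admin.php", "/stats/report.html", "/index.html"):
        url = "http://redball.com" + path
        print(agent, path, parser.can_fetch(agent, url))
```

googlebot should come back as allowed for every path, while OtherBot should be allowed only for /index.html.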
About the Author:
Jimmy Whisenhunt is the owner of VIP Enterprises |