Wed Dec 26 14:03:24 PST 2018
You do not need to use a robots.txt file. By default, search engines will index your
site. If you do create a robots.txt file, place it at the root level of your domain and
name it robots.txt.
This allows all robots to index everything:
User-agent: *
Disallow:
This disallows all robots from your entire site:
User-agent: *
Disallow: /
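As a quick sanity check, the two rule sets above can be tested with Python's standard-library robots.txt parser (a sketch; example.com is a stand-in domain):

```python
from urllib.robotparser import RobotFileParser

# Allow-all: an empty Disallow value blocks nothing.
allow_all = RobotFileParser()
allow_all.parse(["User-agent: *", "Disallow:"])

# Disallow-all: "/" is a prefix of every path on the site.
deny_all = RobotFileParser()
deny_all.parse(["User-agent: *", "Disallow: /"])

print(allow_all.can_fetch("*", "http://example.com/page.html"))  # True
print(deny_all.can_fetch("*", "http://example.com/page.html"))   # False
```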
You also can disallow a folder or a single file in the robots.txt file. This disallows a
folder:
User-agent: *
Disallow: /projects/
This disallows a file:
User-agent: *
Disallow: /cheese/please.html
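The folder and single-file rules above can be verified the same way with `urllib.robotparser` (example.com and the sibling file other.html are hypothetical):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /projects/",          # blocks the whole folder
    "Disallow: /cheese/please.html", # blocks just this one file
])

print(rp.can_fetch("*", "http://example.com/projects/site.html"))  # False
print(rp.can_fetch("*", "http://example.com/cheese/please.html"))  # False
print(rp.can_fetch("*", "http://example.com/cheese/other.html"))   # True
```

Note that the folder rule blocks every URL under /projects/, while the file rule leaves the rest of /cheese/ crawlable.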
If you create a robots.txt user-agent section for a specific search engine (e.g.,
User-agent: Googlebot), that search engine will ignore the more general rules
located in the section for all search engines (User-agent: *).
One problem many dynamic sites have is sending search engines multiple URLs
with nearly identical content. If you offer products in different sizes and colors, or
with other small variations, you can easily generate lots of near-duplicate
content, which will prevent search engines from fully indexing your site.
If you place your variables at the start of your URLs' query strings, then you can
easily block all of the sorting options using only a few Disallow lines. For example,
the following would block search engines from indexing any URL that starts with
/cart.php?size or /cart.php?color:
User-agent: *
Disallow: /cart.php?size
Disallow: /cart.php?color
Notice that there is no trailing slash at the end of the above Disallow lines. Without
one, the engines will not index any URL that starts with that string. If there
were a trailing slash, search engines would only block that specific folder.
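This prefix-matching behavior means the two rules above cover every size and color variation without listing each one. A sketch of that, again using Python's `urllib.robotparser` with hypothetical cart URLs:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /cart.php?size",
    "Disallow: /cart.php?color",
])

# Any URL beginning with one of the disallowed prefixes is blocked...
print(rp.can_fetch("*", "http://example.com/cart.php?size=large"))  # False
print(rp.can_fetch("*", "http://example.com/cart.php?color=red"))   # False
# ...but the bare page, which matches neither prefix, stays crawlable.
print(rp.can_fetch("*", "http://example.com/cart.php"))             # True
```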
If the sort options were at the end of the URL, you would either need to create an
exceptionally long robots.txt file or place robots noindex meta tags inside the
sort pages. You also can target any specific user agent, such as Googlebot, instead
of using the asterisk wildcard. Keep in mind that many bad bots will ignore your
robots.txt file and/or harvest the blocked information, so do not rely on robots.txt
to keep confidential information from being found.