
robots.txt

 
taylorjo




(Msg. 1) Posted: Sat Jan 22, 2005 10:35 pm
Post subject: robots.txt
Archived from groups: alt.apache.configuration

Both URLs are served from /var/www/html. I want /var/www/html/robots.txt to apply to ccl.flsh but not to compcanlit. Is this doable? Should I put the file elsewhere and create an alias or a redirect?

<VirtualHost *>
ServerName ccl.flsh.usherbrooke.ca
DocumentRoot /var/www/html
</VirtualHost>
<VirtualHost *>
ServerName compcanlit.usherbrooke.ca
DocumentRoot /var/www/html
</VirtualHost>

--
John Taylor-Johnston
-----------------------------------------------------------------------------
°v° Bibliography of Comparative Studies in Canadian, Québec and Foreign Literatures
/(_)\ Université de Sherbrooke
^ ^ http://compcanlit.ca/

fatkinson




(Msg. 2) Posted: Sat Jan 22, 2005 11:35 pm
Post subject: Re: robots.txt
Archived from groups: per prev. post

If I understand you correctly, this is what I think you want
to do. You want to have two domains going to the same subdirectory on
your Webserver. You want to have two separate robots.txt files, one
for each domain (http://www.domain1.com and http://www.domain2.com).

  Well, you can't do it. The robots.txt file for each domain
must be in the main directory for the site. There can't be two
robots.txt files in the same directory.

  What you can do is have one file that blocks each subdirectory
separately. Example: http://www.domain1.com/myfirstpage and
http://www.domain2.com/mysecondpage.

  Make these entries in your robots.txt file and you will
prevent both pages from being scanned by the spiders:

   User-agent: *
   Disallow: /myfirstpage/
   Disallow: /mysecondpage/

  If I've misunderstood you, please understand it's late and I'm
a little tired.

  If this helps you, then I'm glad.

  There is a lot of information about robots.txt at
http://www.robotstxt.org.

  Regards,


    Fred

kd6lvw




(Msg. 3) Posted: Sun Jan 23, 2005 10:35 pm
Post subject: Re: robots.txt
Archived from groups: per prev. post

On Sun, 23 Jan 2005, John Taylor-Johnston wrote:
 > Hi,
  > > If I understand you correctly, ... , you can't do it. The robots.txt file for each domain
  > > must be in the main directory for the site.
 >
 > Maybe I was not clear enough?
 >
 > Would Alias not work if I placed robots.txt in /var/www/elsewhere? Please see the example below.
 > My thinking was that if robots.txt does not reside in /var/www/html, compcanlit crawlers will not see it or be affected by it, but ccl.flsh crawlers will be forced to obey it.
 >
 > <VirtualHost *>
 > ServerName ccl.flsh.usherbrooke.ca
 > DocumentRoot /var/www/html
 > Alias /robots.txt /var/www/elsewhere/robots.txt
 > </VirtualHost>
 > <VirtualHost *>
 > ServerName compcanlit.usherbrooke.ca
 > DocumentRoot /var/www/html
 > </VirtualHost>
 >
 > (I know how to program robots.txt pretty much.)

That could work.
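
For reference, a minimal sketch of the quoted Alias setup with the access rule it may also need. It assumes that no robots.txt exists in /var/www/html itself (so compcanlit answers 404 for /robots.txt, which crawlers generally treat as "no restrictions") and that the server denies access outside the DocumentRoot by default. /var/www/elsewhere is the placeholder path from the quoted post, and the Order/Allow lines are the Apache 1.3/2.0/2.2 syntax; on Apache 2.4 the equivalent would be "Require all granted".

```apache
<VirtualHost *>
    ServerName ccl.flsh.usherbrooke.ca
    DocumentRoot /var/www/html
    # Only this host maps /robots.txt to the file kept outside the DocumentRoot.
    Alias /robots.txt /var/www/elsewhere/robots.txt
</VirtualHost>

<VirtualHost *>
    ServerName compcanlit.usherbrooke.ca
    DocumentRoot /var/www/html
    # No Alias here: /robots.txt resolves to /var/www/html/robots.txt,
    # which is assumed not to exist.
</VirtualHost>

# Assumption: access to this directory is denied by default and has to
# be opened explicitly (Apache 1.3/2.0/2.2 syntax).
<Directory /var/www/elsewhere>
    Order allow,deny
    Allow from all
</Directory>
```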

You could also make your robots.txt a dynamic file (i.e. a CGI result) if you
know how to program it directly (and do a redirect to a CGI-type file like
PHP).
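
One way the dynamic variant might be wired up, as a sketch only: ScriptAlias (an internal mapping, not an HTTP redirect) can point a single URL at a script. The script path /var/www/cgi-bin/robots.cgi is hypothetical; it would have to print a "Content-Type: text/plain" header followed by whatever rules fit the requested host, for example by checking the HTTP_HOST environment variable. This assumes mod_alias and mod_cgi are loaded.

```apache
<VirtualHost *>
    ServerName ccl.flsh.usherbrooke.ca
    DocumentRoot /var/www/html
    # Hypothetical script path: every request for /robots.txt on this
    # host is answered by the CGI program instead of a static file.
    ScriptAlias /robots.txt /var/www/cgi-bin/robots.cgi
</VirtualHost>
```

Placing the same ScriptAlias line in both virtual hosts would let one script serve different rules per host.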

However, I don't really see the point. Many if not most systems will only
disallow the cgi-bin directory(-ies) and perhaps some internals (e.g. the error
pages).
All times are Pacific Time (US & Canada).