Keith writes:
This might be of interest in relation to your post earlier regarding the National Newspapers of Ireland taking issue with search engines ‘pirating’ their content.
All the major search engine crawlers will obey a file called robots.txt where you can set rules as to what they are allowed to index.
All of the major newspapers can make use of this file and you will be surprised to know all the major Irish newspapers explicitly allow search engines to index their content.
If they don’t want links to their content showing up in search results all they have to do is add the following lines to the file and all will be sorted:
User-agent: *
Disallow: /
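For anyone curious how a well-behaved crawler reads those two lines, here is a minimal sketch using Python's standard-library robots.txt parser. The example.com URL and the "Googlebot" user-agent string are placeholders for illustration, not anything taken from the newspapers' sites.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule set quoted in the post.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL and crawler name, for illustration only.
print(is_allowed(BLOCK_ALL, "Googlebot", "http://www.example.com/news/story.html"))
```

With that rule in place the check comes back False for every path on the site, which is all a crawler that honours robots.txt needs in order to stay away.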
But why would they do so as most of their traffic is from search engines?
Oh.
Earlier: The Dead Tree Trolls

I work in SEO and it baffles me that they wouldn’t want to be crawled by Google, especially as they’re closing off a large revenue stream in the process.
OK class…. typo test.
How many typos in this sentence?
“All of the major newspaper make use this file and you will be surprised to know that the all explicitly allow search engines to index their content.”
3?
Should be:
“All of the major newspaper[s] make use [of] this file and you will be surprised to know that the[y] all explicitly allow search engines to index their content.”
Sorry. Fixed now. Thanks.
I like it better when posts appear as if they were made while on the sauce. :D
Clearly the original post was from an Irish journalist; see any of our papers for hundreds of further examples. Do we not have sub-editors in this country?
Nope, we don’t have sub-editors anymore. We also don’t appear to have folks that can use a spell checker either. Bad journos, bad.
Should it not be, “surprised to learn” as we either know or don’t know something but we can learn things? Just sayin’
This is true and anyone can test it. Just add /robots.txt to the end of most site addresses, e.g.
http://www.irishtimes.com/robots.txt
There’s very often some interesting information disclosure in those files, which can be used for “security testing” purposes.
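The robots.txt file sits at the root of a site, so the check above can be done from any page address. A rough sketch of working out where to look, assuming a made-up article path on the Irish Times domain:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Build the robots.txt address for whatever site page_url belongs to."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, swap the path for /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The article path here is hypothetical; only the domain comes from the thread.
print(robots_url("http://www.irishtimes.com/news/some-article?page=2"))
```

Paste the printed address into a browser to read the file for yourself.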
I’ve never quite understood why the print interests, when publishing their content online, haven’t bothered to fund a service along the lines of turnitin.com, but modified so that it would ensure that the original source creator was always the first to appear in search results, meaning that no one could then ‘steal’ their content or the traffic associated with it. Or could it be because a considerable amount of the content in the press is lifted from elsewhere, whether online or simply from less prominent sources?
american mainstream media just make stuff up. more profitable. and no one cares anymore.
Such as the following?
http://www.irishexaminer.com/robots.txt
User-agent: *
Disallow: /admin/
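To spell out what that rule does and doesn't do, here is a small sketch with Python's standard-library parser. The two paths being checked are invented for illustration, and "Googlebot" is just a stand-in crawler name.

```python
from urllib.robotparser import RobotFileParser

# The rules as quoted in the comment above.
ADMIN_ONLY = """\
User-agent: *
Disallow: /admin/
"""


def check_paths(robots_txt: str, base: str, paths: list[str]) -> dict[str, bool]:
    """Map each path to whether a crawler obeying robots_txt may fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch("Googlebot", base + path) for path in paths}


# Hypothetical paths; only the domain appears in the thread.
result = check_paths(ADMIN_ONLY, "http://www.irishexaminer.com",
                     ["/admin/login", "/news/today"])
print(result)
```

Only the /admin/ area is off limits to crawlers; the news pages stay open to indexing. The flip side is that the file tells every reader that an /admin/ area exists.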
It’s not linking in the sense of creating a hyperlink to a site. The issue seems to be that the papers don’t want aggregators taking the text from their sites and using it elsewhere. Such as a page that doesn’t have their ads on it.
In my head that’s the same as them wanting to charge people for reading the headline of a print newspaper in a queue at Tescos.
If I’m interested, I’ll buy it. If not, I’ll leave it.
News sites seem to want to charge us for not being interested enough in their content to actually want to read it.
The printed press does have a legitimate issue: with more and more people able to get free news online, unless they put their content out online for free no one will read it, but if it is there for free then they have little or no revenue stream. So long as someone decent is willing to publish news content online for free, all the others have to follow suit, and that’s killing them. It does cost money to generate the content; the question is, do we really need as much coverage as there is when so much of it is so similar?
Google News doesn’t have any ads on it so it’s not like they are stealing ad revenue from the papers.
Exactly, No Fun. I was in Easons earlier and browsed through lots of magazines while I was there. Next step, CCTV cameras totting up what you have read and giving you a bill at the door.
I have been given out to by the odd cranky newsagent, ‘This is not a library.’
I think it would be great if online newspapers stopped being linked to by Google.
Provided of course the newspapers in kind stopped using blog and forum posts as the inspiration and source quotes for their articles.
And that f**king ‘best of YouTube’ segment on Sky News.
Great post Broadsheet. Murdoch complains about Google all the time, similarly to most news orgs. Last year, News Int blocked their papers from appearing in Google News. Once they realised how much the traffic to their sites fell, they unblocked them all. It’s a stupid argument from the papers. Just more crying because they don’t make the large profits of the boom years. Without the use of news aggregators, I would not know about, nor visit, the majority of news sites that are out there.
I would not know about, nor visit, the majority of news sites that are out there.
this sentence is clearly a lie. do you spend your time looking at porn? most people use the internet as an information source and the best information is from news sites. if you don’t use news sites then what do you look at? Porn perhaps?
Porn news sites it is then. I’m off to peruse page 3 now. All the page 3s.
That’s only part of the sentence. You left out “without the use of news aggregators”. They help to quickly find news you want to read.
You should read the comment properly first before commenting. Though thanks for letting us know what you look at on the net when you have time on your hands!
Hang on there’s a copy of the S*n in that picture.
That’s not even a newspaper.