Sunday, September 09, 2007

Did Ancestry Violate Copyright Law? . . . . Part 2 of 4

Before we get to the heart of the legal analysis, here are some additional facts which may be legally significant. They were provided in the Comments to yesterday's post by Janice Brown of Cow Hampshire. Janice first called my attention to this issue in late August.

Ancestry also provided an option (to subscribers only, and even after IBC became "free") to click and save the cached page to their "Shoebox"--a holding area of documents that subscribers are interested in.

Also, the initial Ancestry.com source description calls the IBC a "database-online," not a search engine . . . .

Janice is correct about these additional facts and we will analyze their legal significance.

Janice also writes:

Also, there were several people who argued in commentary on various blogs and message boards that we, as bloggers and web sites owners, should have known that Ancestry would be doing this, due to various announcements and press releases they made, and the burden was on each of us to place a robots.txt file or some sort of HTML coding to prevent Ancestry.com from caching our sites. Is the burden truly on the blogger or web site owner, even if they are not commercial (i.e., the "mom and pop" web sites and blogs).

We'll explore what the courts have to say about this issue as well. At the end of the series, I'll have some suggestions for copyright owners.

I should point out to all readers that this remains an unsettled and evolving area of law; this ride may prove a bit frustrating at times. Now on with the show . . . .

Field v. Google, Inc., 412 FSupp2d 1106 (D.Nev. 2006) [the link is to a PDF version of the court's Order], is the case that was cited by most commentators and bloggers concerning the Ancestry IBC issue. They opined that the outcome of that case likely would dictate the rule of law applicable to the IBC issue. My preliminary reaction was twofold: first, since Field is a decision of a trial court, the lowest level of the federal judiciary, no other court is obligated to follow it; and second, there are some unique facts in this case that may have had an influence on the outcome.

Blake Field is a lawyer in Nevada. He's also a poet. Field was familiar with Google's search and caching processes. With this knowledge, according to the court, "Field decided to manufacture a claim for copyright infringement against Google in the hopes of making money from Google's standard practice." [412 FSupp2d at 1113]. In January 2004, Field created fifty-one works and put them on a website, accessible for free. He also created a "robots.txt" file for his site because he wanted search engines to visit his site and include the site within their search results. The court notes that "Field knew that if he used the 'no-archive' meta-tag on the pages of his site, Google would not provide 'Cached' links for the pages containing his works." [412 FSupp2d at 1114] So, he consciously chose not to use the "no-archive" meta-tag on his Web site.

As Field intended and expected, the "Googlebot" visited his site, and indexed and cached its pages. Thereafter, each of Field's pages was retrieved from Google's cache by some individual or individuals.

Field sued Google for copyright infringement. "Field allege[d] that Google directly infringed his copyrights when a Google user clicked on a 'Cached' link to the Web pages containing Field's copyrighted works and downloaded a copy of those pages from Google's computers." [412 FSupp2d at 1115; emphasis added] Field did not allege that Google infringed his copyrights when the Googlebot initially copied his pages and stored them in the system cache.

Following established legal precedent, the court pointed out that for copyright infringement, a plaintiff must show ownership by the plaintiff, and copying by the defendant. Furthermore, the copying must be the result of a volitional act on the part of the defendant. [CoStar Group, Inc. v. LoopNet, Inc., 373 F.3d 544, 555 (4th Cir.2004)].

Applying the law to the facts, the court ruled in favor of Google. The court said, "[W]hen a user requests a Web page contained in the Google cache by clicking on a 'Cached' link, it is the user, not Google, who creates and downloads a copy of the cached Web page. Google is passive in this process." [412 FSupp2d at 1115] In other words, the court found no volitional act on the part of Google when a user accesses its system cache.

There's more to the Field case, certainly. And certainly, it doesn't answer questions such as whether the user can be sued for copyright infringement; whether Google is liable for infringement for the actions of its bot; and others. But let's stop here for a moment and examine how the law would apply to Ancestry.

Presumably, the path leads in the same direction. That is, when a user clicked on the relevant link in the IBC, Ancestry would be "passive" in that process and thus there would be no infringement by Ancestry when users requested information from the IBC.

But a couple of facts seemed important to the court in reaching this conclusion. First, the court pointed out that pages retrieved from Google's cache contain a "conspicuous" disclaimer that the cached page is not the "original" and that there are two separate links to the current page. It is not clear, or certainly was not clear at the outset, that Ancestry's IBC would operate in that manner. Second, the court examined the purposes of Google's cache. For example, "Google's 'Cached' links allow users to view pages that the user cannot, for whatever reason, access directly." As to the IBC, while it was behind Ancestry's paid subscription wall, this was true only for paid subscribers. Additionally, Google's cache enables users to determine how a Web page may have been altered over time as well as to determine more quickly whether and where a search query appears and thus whether the page is germane to the user's query. It is not at all clear that Ancestry's IBC would operate in this manner. Recall that Ancestry began calling it a "search engine" only after the negative initial response. We do not now know Ancestry's true intent at the outset of this project or what would have happened had they chosen to press ahead despite the negative reaction. [These are matters that we might be able to discover through various procedures if litigation had been commenced].

Back to the Field Case: Google asserted several defenses to Field's claim. First, Google asserted that Field had granted it an implied license to use his content. The law on this matter is that a copyright owner may grant a nonexclusive license expressly or impliedly through conduct. Melville B. Nimmer & David Nimmer, Nimmer On Copyright, vol. 3, section 10.03[A] (1989). An implied license can be found where the copyright holder engages in conduct from which the other party may properly infer that the owner consents to his use. The United States Supreme Court endorsed this rule in the 1927 patent case of De Forest Radio Telegraph & Telephone Co. v. United States, 273 U.S. 236. Consent to use a copyrighted work may be based on the copyright holder's silence where the copyright holder knows of the use and encourages it.

Recall that Field knew that had he placed a "no archive" meta-tag on the pages of his Web site, Google would have known not to display "Cached" links to his pages. Nonetheless, Field specifically chose not to include the no-archive meta-tag on his site, knowing that Google would interpret this absence as permission to allow access to the pages via "Cached" links. The court said: "Thus, with knowledge of how Google would use the copyrighted works he placed on those pages, and with knowledge that he could prevent such use, Field instead made a conscious decision to permit it. His conduct is reasonably interpreted as the grant of a license to Google for that use." [412 FSupp2d at 1116]

Does this ruling in the Field case mean the burden is always on the copyright holder to preemptively fend off those crawling or scavenging the Web for copyrighted material? Consider that the inclusion of a "no archive" meta-tag or the appropriate "robots.txt" file is relatively simple for the content owner, while, as the court said, "Given the breadth of the Internet, it is not possible for Google (or other search engines) to personally contact every Web site owner to determine whether the owner wants the pages in its site listed in search results or accessible through 'Cached' links." [412 FSupp2d at 1112]
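To make concrete how little the "robots.txt" route asks of a content owner, here is a minimal sketch in Python. It is an illustration only, not anything drawn from the court's Order. The user-agent token "MyFamilyBot" is an assumption taken from the crawler name mentioned in the comments below, the example.com address is a placeholder, and the helper name may_crawl is hypothetical. The sketch uses Python's standard urllib.robotparser module to read a two-rule file the way a well-behaved crawler might: one named crawler is turned away and every other crawler is allowed in.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might place at the root of a site.
# ASSUMPTION: "MyFamilyBot" is the token the crawler identifies itself with.
ROBOTS_TXT = """\
User-agent: MyFamilyBot
Disallow: /

User-agent: *
Disallow:
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/family-history.html"  # placeholder address
    for agent in ("MyFamilyBot", "Googlebot"):
        print(agent, "may crawl:", may_crawl(ROBOTS_TXT, agent, page))
```

A file like this only works if the crawler chooses to honor it, and only if the site owner can place a file at the root of the site, which is not always possible on a hosted blog.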

On the other hand, a copyright owner should have the right to choose which "distributors" or search engines the copyright owner wishes to grant a license. This would require knowledge of the use the other party intended to make of the copyright holder's content, as the Field court said. In the case of Ancestry's IBC, no content owner knew in advance that Ancestry would make such use of their content.

On this last point, some have referred to The Generations Network's Terms and Conditions, specifically this provision:

User provided content
Portions of the Service will contain user provided content, to which you may contribute appropriate content. For this content, Ancestry is a distributor only. By submitting content to Ancestry, you grant MyFamily.com, Inc., the corporate host of the Service, a license to the content to use, host, distribute that Content and allow hosting and distribution of that Content, to the extent and in that form or context we deem appropriate. Should you contribute content to the site, you understand that it will be seen and used by others under the license described herein. You should submit only content which belongs to you and will not violate the property or other rights of other people or organizations. MyFamily.com, Inc. is sensitive to the copyright of others.

In my view, nothing in that provision puts one on notice that Ancestry.com would use robots to crawl the Web in a manner similar to Google or other search engines. Indeed, the choice of the verbs "submit" and "contribute" suggests more than a passive or silent consent to use content.

Recall that Mr. Field set out to get Google to use his content so he could sue them for infringement!

But, one additional point on the responsibility of content owners to protect their content: the court points out that the use of meta-tags has been an industry standard "for years." I can see a court in a future case using this fact to hold Web publishers responsible to protect their content by communicating their preferences to Web crawlers.
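For readers who have never seen one, the meta-tag the court calls "no-archive" is usually written in a page's HTML as a robots meta-tag with the value "noarchive". The Python sketch below is only an illustration of how a crawler might read that preference before offering a cached copy. The class and function names are hypothetical, the two sample pages are invented, and nothing here is claimed to be how Google or Ancestry actually implemented the check.

```python
from html.parser import HTMLParser

# Invented sample pages: one states the preference, one is silent.
PAGE_WITH_TAG = (
    '<html><head><meta name="robots" content="noarchive">'
    "<title>Poems</title></head><body>...</body></html>"
)
PAGE_WITHOUT_TAG = "<html><head><title>Poems</title></head><body>...</body></html>"


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when written without a value.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def allows_cached_copy(html: str) -> bool:
    """Return False if the page asks crawlers not to archive (cache) it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives


if __name__ == "__main__":
    print("Page with tag may be cached:", allows_cached_copy(PAGE_WITH_TAG))
    print("Page without tag may be cached:", allows_cached_copy(PAGE_WITHOUT_TAG))
```

Note what the second sample page shows: a crawler following this convention treats silence as permission, which is the very assumption the implied license and estoppel arguments turn on.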

The "Estoppel" Defense: Google put forth (successfully) a defense to copyright infringement known as "estoppel." This means that: (1) the content owner knew of the allegedly infringing conduct; (2) the content owner intended that the alleged infringer should rely on the content owner's conduct or acted in such a way that the alleged infringer had a right to believe it was so intended; (3) the alleged infringer was ignorant of the true facts; and (4) the alleged infringer relied on the content owner’s conduct to its detriment.

Put plainly, this means, for example, that the content owner acted in a manner to lead the alleged infringer to believe that the content owner did not object to the alleged infringing conduct and in reliance on that, the alleged infringer went ahead with the conduct.

In the Field case, the success of this defense has much to do with Mr. Field's (dishonest) conduct. But this defense could succeed where there is no dishonest conduct. For example, this morning, I discovered a rather new site called Blogoholix. It purports to be a "blog search engine." There is a note on the main page which says "es.blogoholix.com is a blog search engine in development. The tech and design work is still in progress, so please send an e-mail to info@blogoholix.com if you have any suggestions on how to improve the site." I found GeneaBlogie on that site. Suppose with that knowledge and the knowledge that I can prevent my blog from showing there, I do nothing, and the owner of that site continues to crawl my blog. I think a court following the Field reasoning would say that my silence is conduct that they are entitled to rely upon.

Well, that may be enough law for today. Tomorrow in Part 3, we'll explain fair use and the Digital Millennium Copyright Act, and we'll take a very specific look at Ancestry's IBC. After that, we wrap this up with Part 4 and some conclusions and suggestions.

Part 1 can be found here.

TOMORROW: Fair Use and The Digital Millennium Copyright Act Meet Ancestry.com

Notice: The information in this writing is intended for educational use only and is not intended nor should it be construed as legal advice. If you have a legal problem, consult a lawyer admitted to practice in your state of residence. I am an active member of the bar of the State of California and am admitted to practice before the United States Supreme Court and various other federal courts. I am not licensed to practice in any other state. I am not presently soliciting or accepting new clients in the matters discussed above.

14 comments:

Terry Thornton said...

Craig, Thanks for your attempt to bring us all up to speed on this important issue. I look forward to your "Fair Use" discussion tomorrow.

It is shameful that a large for-profit organization was taking work from others without asking and without notice and making a profit from that sleazy activity. And it is even more shameful that when challenged, they started offering those stolen goodies "free" to all without so much as a thank you to the individuals whose creative efforts had put it together.

We know that the wolf is already in the chickenhouse --- but what to do? I appreciate your calm and educated analysis of this situation; I'm learning more than I ever needed to know about property rights in the modern age.

Thanks for this series of articles.

Terry Thornton
Hill Country of Monroe County Mississippi

Anonymous said...

Craig,

I stumbled onto your blog by complete accident tonight. In fact, I think Google sent me a "News alert" related to "copyrights". It was like an anonymous "true confession" e-mail from my point of view.

I am very involved with the same subject matter you write about each and every day ... and I'm not a lawyer. Unless you count the school of hard knocks, that is!

In fact, my small graphic arts content development company here in Virginia (www.imageline2.com) spends more time trying to stop the infringement of our property than we are able to spend developing new content and selling. It is a crying shame!

The Field case is a bad example because it appears this gentleman set out to deceive folks to make his point. However, the defenses used by Google in this case, and others, are a complete joke, and I, for one, am shocked that these federal judges (even at the trial level) would accept such nonsense.

Google's image search engine directly infringes copyrights routinely and willfully. I'm not talking about indirect infringement as Perfect 10 tried to claim. I'm talking about DIRECT infringement.

Google makes a copy of our copyright-registered clip art illustrations all the time. And we have never given them permission ... direct or implied.

In fact, even after infringing web site publishers remove the infringing images after notice from Imageline, they often remain on the Google servers. They are displayed, accessed, copied, and delivered from the Google servers, not anyone else's, as Google tries to falsely proclaim.

The infringed images end up on web sites, in e-mails, as backgrounds, for screensavers, and as icons ... all over the world ... and all because Google willfully infringes them.

"That is the business we are in, Google. Not you. Don't give me any of this nonsense about crawlers, meta-tags, spiders, and such .. save that for your engineering meetings and gourmet lunch talk."

Google even continues to sell advertising on the infringing web sites long after notice has been given. They refer people to this advertising by using the images they have pirated from Imageline as an enticement.

As long as the courts don't hold companies like Google more accountable, no small copyright owner stands a chance with the way things are now going.

Can't wait to read your "Fair Use" and DMCA blogs tomorrow. I have never in my life seen anyone try to mislead the judiciary and the general public the way Google is now doing on these two subjects either.

Imageline, for one, intends to do something about this. Keep your eye on our web site over the next few months.

And keep up the awareness. You are obviously a very sharp guy.

George P. Riddick, III
Chairman/CEO
Imageline, Inc.

griddick@imageline2.com

Anonymous said...

Wow, you people are amazing you just can't stop bloggin about this.
Google does this every day and we all love them for it. Ancestry.com does it and there is a firestorm of contempt.
The perplexing part is this - when I get google search results I have to use caution with every link I click - genealogy or not. If not careful you could end up on a site you don't want to be on - horrible content.
With Ancestry they are narrowing your potential search results - giving you more likelihood of finding what you are looking for and helping eliminate so many 'other' things that could be way off the mark. Seriously folks Ancestry was doing a good thing!
Kendall - bring it back!

NevadaGenealogist said...

Thank you for your thoughtful and informative analysis of this vital topic. I too look forward to the next posts.

Janice said...

Craig,

Thank you for Part II, and for your patience with my additional questions.

So far your presentation is very clear, and finally I am starting to understand the Field vs Google ruling that has been so greatly but erroneously used on various message boards to justify Ancestry.com's actions.

I'm anxiously awaiting Part III.

Janice

Anonymous said...

Craig, I have noticed in many blogs and message boards comments similar to the one that Terry Thornton posted to yours, i.e., that The Generations Network (TGN) provided no notice that it was caching Internet content. That is incorrect.

Back on 21 October 2006 (almost a year ago), a blogger at a site called Cleverhack noted that TGN had a bot crawling the web. http://cleverhack.com/2006/10/21/575/

How did Cleverhack find this out? Cleverhack found it out from his server log, which he excerpts in his blog post. From this, I assume, as is standard industry practice, that every time the bot crawled a website, it left behind, in the server log, a notice that the site had been crawled, and the URL for where to go to for more information. Cleverhack provided this URL in his blog post, and the URL is still live: http://www.ancestry.com/learn/bot.aspx

Note that this webpage is on the Ancestry.com site, and it provides the following information:
[quote]
The MyFamilyBot Information Page:

What is MyFamilyBot? Why is it accessing my files?:
MyFamily is creating an index based on a powerful person-based biographical ranking engine that gives superior results over searches done using the more general purpose internet search engines. Ancestry.com indexes the biographic text and provides a search service that points users back to the originating website.

MyFamilyBot is the name of a web crawler (a.k.a. robot, spider) used by MyFamily.com to find biographical text on the internet in connection with this engine. The crawler works by deeply crawling sites that contain biographical text. We have constructed the bot to limit its affect on site usage to be within the range of that of the large commercial search engines. Sites that do not contain biographical text are examined in a superficial manner.

How do I prevent MyFamilyBot from crawling my site?
MyFamilyBot supports the internet standard protocols for restricting spiders from crawling web sites. These protocols are described here:
http://www.robotstxt.org/wc/exclusion.html

How can I contact someone concerning MyFamilyBot?
Please send questions and concerns about MyFamilyBot to SearchBot@MyFamilyInc.com.
[/quote]
One of the most widely read genealogical bloggers, Chris Dunham (aka The Genealogue) picked up on Cleverhack's information. As I recall Dick Eastman and Leland Metzler posted links to Dunham's blog on their blogs. A Jewish genealogy website also noted the existence of the TGN bot, although I am not sure of the chain of causation in their case. Just do a Google search for "Genealogue bot" and you will see all the links.

The point I am trying to make is that, contrary to popular opinion, TGN provided clear notice in the server logs of each site that it crawled what it was doing and what the site operator needed to do in order to not have his/her site crawled. Providing such notice via server logs is the standard industry practice for how such notice is to be provided. I suspect those who are crying the loudest that they were not informed are those who have never in their life ever looked at their server logs. It is well known that doing so is an important thing to do regularly: http://websecrets.biz/page-305.html

So what about those people who posted genealogical information to a site for which they did not have access to the root directory and thus could not set up a robots.txt file? Well, from my point of view, no one HAD to post the information to a site that they did not have administrative rights to. It is extremely cheap to have your own site. Just as with a house, if you are renting, you cannot do a lot of the things that you can do if you own. People who posted content to sites they were just "renting" should have known that they would not have the ability to have total control over that information thereafter.

Anonymous said...

I concur with the others who have commented. Your analysis is concise and easy to understand. A most excellent read on a timely subject. I think this would make a most excellent article in a genealogical magazine. Some very interesting reading indeed.

Bob

Anonymous said...

Craig, you might find this web page on Ancestry.com's corporate website useful to your analysis:
http://www.tgn.com/default.aspx?html=copyright

Craig Manson said...

Thanks for your info about TGN's webcrawling bot. This is very important to know because as I've said, the legal outcomes depend on the facts.

Terry Thornton said...

Craig, Am I reading correctly "Anon's" statement that the sending of a bot/spider/crawling thingamodo into someone's copyrighted work is "giving notice" --- hogwash!

If materials collected through the back door by a spying-like device sent to collect data without the up-front permission of the copyright holder is "fair use" then god help us all. The wolf is beyond the chickenhouse --- he is attacking the very fabric of property rights!

Terry Thornton
Hill Country of Monroe County, Mississippi

Charley "Apple" Grabowski said...

Craig, Thanks for your hard work on this and for keeping it easy enough for me to follow. As for the comment about the cleverhack blog and the reference to his server log - many of us use blogger or other blog hosts. Do we have access to this server log? If not does that make a difference?

Anonymous said...

Craig, one point I hope you can clarify, recent stories in the popular press lead me to believe that courts have ruled that merely placing a copyrighted work, in cases involving musical recordings, on a web site with free access constitutes copyright infringement. So there seems to be an additional test regarding volition, and I suspect it speaks to a difference between Google and Ancestry. Google is very analogous to an ISP in that it is not in the content business. Ancestry on the other hand is taking content for the purpose of publishing it on their own site. One is a side effect of providing search services and has other uses, the other is appropriation for the purpose of publishing. The discussion on "transformational" use in Field seems to me to be quite key to understanding the difference in the two uses.

Anonymous said...

Lindsay is misinformed in saying that Google is not in the Content business. They have digitized and placed online on their site thousands of books and hundreds of videos.