Friday, September 14, 2007

Some Final Thoughts on "Did Ancestry Violate Copyright Law?"

I think an analysis of the statutory "fair use" factors can lead to the conclusion that Ancestry's "Internet Biographical Collection," as it was initially set up, did not constitute a fair use of the copyrighted material collected and used.

I think that Ancestry's IBC probably does not qualify for the system caching "safe harbor" from infringement liability in the Digital Millennium Copyright Act. Some of Ancestry's statements about the IBC suggest a certain permanence, not the "intermediate and temporary" caching that the law protects.

I think that issues of "fair use" and DMCA "safe harbor" as they relate to search engine caching have not yet had a full examination by the courts. I think that reliance on the few decided cases leaves a great deal of uncertainty, no matter how much the industry puts an optimistic face on it. I think that the United States Supreme Court will eventually decide these issues. The law may yet turn out to be as the industry wants it, but that's not where we are today.

The few decided cases on this matter seem to suggest that the copyright holder must take steps to keep protected content from being collected by the various bots that roam the Internet. In Field v. Google, for example, the court held that Field's failure to use a "no-archive" meta-tag was a basis for finding that Field had given Google an "implied license" to use his copyrighted content. This holding either turns the law on its head, shows how different copyright law is from other law, or is an example of judicial value imposition.

Here's what I mean by the last sentence above. To say that a web publisher gives an implied license to anyone who wants to take protected content if the publisher fails to use certain meta-tags or other technical means is akin to saying that one gives an implied license to a burglar if one's door isn't locked. That turns the law on its head. Or perhaps that's just how different copyright law is.

Actually, I think it's the third thing: judicially imposed values. By this I mean that judges have decided that there are salutary purposes served by the practices of companies like Google and that to enjoin them would constrain the economic growth of the Internet. This can be seen in the courts' recitation of the "socially important purposes" served by Google, for example. [See Field v. Google, Inc., 412 F. Supp. 2d 1106, 1119 (D. Nev. 2006)] In the nineteenth century, courts took a doctrinally similar approach to the development of law concerning railroads. Philosophically, I may agree with the notion that the law shouldn't hinder the development of the Internet, but some of the questions about how the law operates with respect to the Internet are for Congress, not judges, to decide. That's especially so when it appears that the judges got the intent of Congress wrong in the first place. [See the discussion of the DMCA in the post yesterday.]

===>So did Ancestry's IBC violate copyright law? Well, lawyers are infamously cautious . . . . If I were advising Ancestry [which I am not] and sticking to a careful reading of the law, I would tell them to take the IBC back to the drawing board, because I would not be comfortable with the infringement risk that they took. It looks to me like they collected, archived, and made available to third parties content owned by other publishers. I would tell them not to rely on the Field case, the Parker case, or the Kelly case. I might tell Google the same thing were I advising them [which I'm not]. Lindsay said in the comments the other day:

Google is very analogous to an ISP in that it is not in the content business. Ancestry on the other hand is taking content for the purpose of publishing it on their own site. One is a side effect of providing search services and has other uses, the other is appropriation for the purpose of publishing.

As to Google's search engine, I think this is true.

====>Must content owners use meta-tags and "robots.txt" files to avoid giving an implied license? [What follows is opinion, not legal advice] I know that an unlocked door is no defense for a burglar, but I still lock my doors. The problem is that the burden of using these technical tools is, in many cases, fairly minimal, whereas the courts believe the burden on a service provider like Google to communicate with each publisher is substantial, if not insurmountable. Some content owners do not have access to the page source to insert such code; they may have to ask their web-hosting services to help them out. If the host won't, get another host. I happen to believe that the Supreme Court may modify this emerging rule to some extent, but until it does, perhaps "safe rather than sorry" is a good idea.
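To make the meta-tag option concrete: the "noarchive" directive is conventionally placed in a page's head as <meta name="robots" content="noarchive">. The sketch below is a hypothetical crawler-side check, written with Python's standard html.parser, showing what a well-behaved archiver would look for before keeping a copy of a page. It assumes the crawler also honors a meta tag named after its own user-agent (some search engines do this for their own bots, e.g. name="googlebot"); whether MyFamilyBot honored either form is not known, and the function and class names are illustrative only.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="<bot>"> tags."""

    def __init__(self, bot_name):
        super().__init__()
        self.bot_name = bot_name.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Valueless attributes come through as None; treat them as empty strings.
        attr = {k: (v or "") for k, v in attrs}
        name = attr.get("name", "").lower()
        # A "robots" tag applies to every crawler; a tag named after the bot
        # itself (an assumed convention for this hypothetical bot) applies only to it.
        if name in ("robots", self.bot_name):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def may_archive(html, bot_name):
    """Return False if the page asks this bot not to keep a cached copy."""
    parser = RobotsMetaParser(bot_name)
    parser.feed(html)
    return "noarchive" not in parser.directives
```

For example, may_archive('<meta name="robots" content="noarchive">', "MyFamilyBot") returns False, while a page with no such tag returns True. Note that under this convention the burden falls on the publisher: silence is read as permission.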

====>Does Ancestry's Terms and Conditions of Use protect it in this matter? I think not.

====>What is the significance of "notice" of the MyFamilyBot? At some point, MyFamily added the following page to its site:

The MyFamilyBot Information Page:

What is MyFamilyBot? Why is it accessing my files?:

MyFamily is creating an index based on a powerful person-based biographical ranking engine that gives superior results over searches done using the more general purpose internet search engines. MyFamily indexes the biographic text and provides a search service that points users back to the originating website.

MyFamilyBot is the name of a web crawler (a.k.a. robot, spider) used by MyFamily to find biographical text on the Internet in connection with this engine. The crawler works by deeply crawling sites that contain biographical text. We have constructed the bot to limit its effect on site usage to be within the range of that of the large commercial search engines. Sites that do not contain biographical text are examined in a superficial manner.

How do I prevent MyFamilyBot from crawling my site?
MyFamilyBot supports the Internet standard protocols for restricting spiders from crawling web sites. These protocols are described here:

How can I contact someone concerning MyFamilyBot?
Please send questions and concerns about MyFamilyBot to

I have no idea when this page was added, and I'm still not sure how to access it on the site. (Ironically, I used Google's cache to find it.) Some may believe that this page constitutes some sort of notice to Web publishers, who thereupon should have put into place the well-known protocols for preventing "MyFamilyBot" from crawling their sites. I don't agree with this for a variety of reasons. First, I'm concerned about the adequacy of the notice. Second, the reasons I gave above apply here. This would continue to turn property law upside down.
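The "well-known protocols" the page refers to are chiefly the robots.txt convention (the Robots Exclusion Protocol). As a hedged sketch, the hypothetical robots.txt below shuts out a crawler identifying itself as MyFamilyBot while leaving the site open to every other bot, and Python's standard urllib.robotparser is used to check how a compliant crawler would read it. The exact user-agent string MyFamilyBot sent and the example URL are assumptions, and the file protects nothing unless the bot chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served at the site root (e.g. /robots.txt).
# The first group blocks only MyFamilyBot; the "*" group allows everyone else
# (an empty Disallow line means "nothing is disallowed").
ROBOTS_TXT = """\
User-agent: MyFamilyBot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(robots_txt):
    """Parse robots.txt text the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    page = "https://example.com/family/biography.html"  # hypothetical URL
    for agent in ("MyFamilyBot", "Googlebot"):
        print(agent, "allowed:", rules.can_fetch(agent, page))
```

Run as a script, this prints "MyFamilyBot allowed: False" and "Googlebot allowed: True". Because the rules are grouped by user-agent, a publisher can treat one crawler differently from another, though only if the publisher knows the bot's name in advance, which is where the adequacy of notice matters.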

CONCLUSION: Ancestry did the right thing by removing the IBC. They were in a fog of legal uncertainty. More importantly, the rather surreptitious manner in which they established the IBC breached faith with their membership and the rest of the genealogical community. The legal issues ultimately will be resolved by the United States Supreme Court. The ethical and social issues can only be worked out if Ancestry reaches out to the community and engages it in a genuine effort to close the breach. They've got some work to do on that issue. Now's the time to start.

Part 1
Part 2
Part 3
Part 4

Notice: The information in this writing is intended for educational use only and is not intended nor should it be construed as legal advice. If you have a legal problem, consult a lawyer admitted to practice in your state of residence. I am an active member of the bar of the State of California and am admitted to practice before the United States Supreme Court and various other federal courts. I am not licensed to practice in any other state. I am not presently soliciting or accepting new clients in the matters discussed above.


Terry Thornton said...

Well said!

Thank you Craig for all the time and effort that this series must have required. The hard work and effort shows --- and I am better equipped to form an opinion of this/these issue(s) as a result of your fine teaching. And I know that those within the blogging community who read your words will join me in saying, "Well Done!"

Thank you.
Terry Thornton
Hill Country of Monroe County, Mississippi

Anonymous said...

Hi Craig,
I'm too rushed to read it all again at the moment, but when I read the Field case it seemed to me that the judge said that Field had created a robots.txt file that said allow *, i.e. he explicitly gave permission to crawl. Also, that he knew he could add the no-cache meta tag and had the ability to do so and deliberately chose not to. We don't know that a judge faced with a case where the complainant didn't create a robots.txt, or couldn't, would still decide the same way.

The idea that the meta tag might be necessary is particularly disturbing because the meta tag does not allow you to treat different search engines differently. You may want to allow Google to cache, knowing their mode of operation, you may even be a Google partner, but you may at the same time not want Ancestry to do the same because they are your competitor. I am infinitely less qualified than you to interpret the findings, but I didn't read as much into this one aspect as you have.

Thanks for a very informative series of posts, the world needs more bloggers like yourself to provide expert calm reasoned examination of issues.

Craig Manson said...

I agree with you, Lindsay. The facts in the Field case are unique, and we don't know how a case with different facts might turn out. I understand what you're saying about the meta-tag. Maybe there's a technical solution to that.

Ambar said...


A couple of comments from someone who has long been involved with search engines and the Internet, but is a newcomer to genealogy.

First, it is indeed possible to use robots.txt to block some bots while allowing others. Here is the example from

User-agent: BadBot
Disallow: /

Second, people complaining about having their web pages copied by other entities with the intent to maintain a permanent archive should be familiar with, which does just that, albeit as a non-profit.