Where Einstein Meets Edison

Semantic Technologies Series: Interview with Jamie Taylor of Metaweb

Jul 23, 2010

I talked to Jamie Taylor, Minister of Information at Metaweb, the company behind Freebase. Freebase is a community-writable semantic database, spanning millions of topics in thousands of categories.

Jamie, tell us what sparked your interest in semantic technologies: you earned a PhD in behavioral economics, and at some point you started San Francisco’s first ISP. How did you become the Minister of Information at Metaweb?

I think I was very fortunate as an academic. I started my graduate studies in the mid 80s as a person interested in behavior, torn between AI and behavioral studies; looking at the state of AI, the field was making some pretty big promises that seemed a little sketchy, but I had a deep interest. I went on to the Harvard Psychology Department, pursuing a very pure behavioral approach with animal models, which led me to anthropology. At the time, I was rooming with an economist from NYU, Robin Cowan. He was my tutor in economics; our evening conversations led to the underpinnings of economics, and I realized I was much more interested in the kinds of analysis the economists were doing than in general behavior. To test fundamental questions, like price-taking, I built a computer-based trading system. The internet was available, so I could set up hundreds of traders (sophomores earning 5 dollars an hour), and I learned a lot about internet development. And I saw some real opportunities that were about to emerge. So I started an ISP, because I needed a fast connection at home and I needed a way to pay for that connection.

My career took me from running an ISP to working with pretty large companies that were very early on the internet. I met a bunch of people, and together with an advisor whom I knew quite well in the valley, we started a company that in theory would trade very large capital leases (my behavioral economics background combined with my computer skills), and that evolved into a large enterprise contract management system. So I started dealing with Fortune 1000 companies, and the models I wanted for managing these transactions were semantic; it turned out the tools weren’t quite there.

That was in 2000?

That was in late 1999, and the company [Determine Software] was acquired by Selectica in 2005. Around that time, my friend Robert Cook had been working with Applied Minds, where they were building very large graph-based systems. They spun Metaweb off from Applied Minds, and through a lot of conversations with him, the desire to have these sorts of tools and make them available to the wider developer community drew me in.

So what’s your job description as Minister of Information?

I work on the data team and on different data projects, frequently interacting with partners. Then I do a lot of evangelism: semantic technologies have been described in ways that make them sound very complicated and very hard, but the truth of the matter is that the models are pretty simple to understand, and it is easy for a developer to get started with them. Once you get started, you get value back, and if you spend more time you can build up higher and higher value propositions. Too frequently, however, we make the case that you have to jump into the deep end, that there is going to be a lag before you get utility out of it. So part of my evangelism is helping people see that there is a nice slope you can follow.

Do you see an increased effort by the community to make semantic technologies more accessible, more mainstream?

Just watching this conference [Semantic Technologies Conference 2010], but also other conferences, the sheer size has increased pretty dramatically, as has the diversity of people who actually have commercial offerings: tools that matter to your typical webmaster, your typical content manager. Content management turns out to need a strong semantic underpinning, which it has not had in the past. The WSJ is sort of our flagship for doing this. They have a lot of content, and their content management system told them what section a piece was published in, when, and who wrote it, but not how it was connected to other pieces of content within their system or across the internet.

Where today we have “You might also be interested in the following articles”…

It certainly falls into that category, but it also applies if you are looking at an entity in an article’s context and want to learn more about it. By tying entities in a given article to Freebase concepts, they can run a query and find, say, a feature article at a sister property. That’s one of the areas where we see real opportunities emerging, and it has direct monetization opportunities behind it.

So what are some of the business models that you have seen that work for semantic technologies?

I see semantic technologies as a way to pull data together; the idea that you can aggregate is something very novel: all of a sudden my data is not limited to my data silo. From an enterprise standpoint, that is actually very fascinating. A lot of organizations are managing data that they need to drive their core business. But it’s not that core data we should look at; it’s the secondary, contextual data (for example, geo data): they have to manage it because there is no way to obtain it from other sources. What semantic technologies allow is, in some sense, to outsource that important but non-core data to the community for maintenance.

Related to the previous point, and considering the progress you have seen over the past five years, would you say there has been a strong shift from semantic technologies as a purely academic discussion toward a business driver?

I think there is clearly an academic track to all of this, where people are pushing very hard on the formal analysis and inference aspects. But it’s very interesting that semantic technologies have met the Web 2.0, lightweight, user-contribution model, and as you add semantics into these types of systems (fairly lightweight semantics), all of a sudden they start delivering much greater benefit. One of the things semantics does is let you mash up data very easily and very quickly, and that has been the premise of where Web 2.0-type value comes from. It turns out that if you start using semantic technologies, these things get easier; you can expose more data and have a higher value proposition.

So your definition of the Semantic Web would be rather lightweight then?

I am absolutely a lightweight semantics guy. To me, RDF [Note: Resource Description Framework, a Semantic Web data standard] is just a serialization of the graph. When I think about semantic technologies in general, one of the core stories is the graph representation of data and the idea that you have addressable entities with attributes. RDF is a wonderful, interoperable way to exchange that data, but it’s actually just one of many.
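To make the point concrete, here is a minimal sketch (in Python, with made-up example.org URIs) of the distinction Taylor draws: the fact itself lives as a plain (subject, predicate, object) triple in an in-memory graph, and RDF N-Triples is just one textual serialization of that same triple.

```python
# The graph itself: a set of (subject, predicate, object) triples.
# These URIs are invented for illustration.
graph = {
    ("http://example.org/jamie", "http://example.org/worksFor",
     "http://example.org/metaweb"),
}

def to_ntriples(triples):
    """Serialize (s, p, o) triples to N-Triples, one line per triple."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))

print(to_ntriples(graph))
```

The same set of tuples could just as well be emitted as RDF/XML, Turtle, or a JSON structure; the graph is the model, and each format is only an exchange syntax.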

So, talking about Freebase: last time I checked, you had somewhere around 10 million topics. Can you share some of the latest stats with us?

It’s now over 12 million topics! It’s fun looking back at some of my slide decks; the numbers go from “oh, we think it’s 5 million” to 9 million, and now it’s over 12 million. It’s pretty exciting. And we have about 400 million relationships between them.

What are some of the most exciting use cases you have seen lately?

I think what’s very exciting about Freebase is that it serves needs at all levels of the community. We have people who go in and organize very specific pieces of information that they understand and want to connect. There is a group that came in very early and added annotations to the human genome, and another group expanded that to include annotations of viruses, tagging literature against Freebase identifiers. That was very simple; they got a lot of value out of it immediately, all with only four lines of JavaScript.

Then another group came in and built a bio venture portal (BioVentures) on top of the Freebase database. And then you get the very large [companies]; for example, Powerset is now a big part of Bing, and you definitely see Freebase data appearing on Bing.

You mentioned Microsoft’s Bing, what’s your take on Google’s and Facebook’s approach with respect to the semantic web and semantic search?

Microdata [Note: an extension of the existing microformat idea] in HTML5 is a big deal. I think microdata is simple enough that truly every developer can understand it. If you are a PHP developer, you will have no problem understanding the data model behind it.

With respect to Facebook, the Open Graph protocol is actually a fantastic development. Where linked open data hands you back the whole graph and you have to figure out how to digest it, they give you back a very simple JSON [Note: JavaScript Object Notation, a popular data interchange standard] structure. To me, that’s very exciting; this is semantic technologies finally at the point where anybody can use them.
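The contrast Taylor draws can be sketched in a few lines: an Open Graph page carries a handful of og: meta tags, and a consumer can flatten them into a simple JSON object instead of digesting a full graph. (The page content below is invented, and the regex stands in for a real HTML parser.)

```python
import json
import re

# A fragment of a page annotated with Open Graph meta tags.
page = """
<meta property="og:title" content="Where Einstein Meets Edison" />
<meta property="og:type" content="article" />
"""

# Collapse the og: properties into a flat dictionary, then emit JSON,
# the "very simple structure" a consumer gets back.
og = dict(re.findall(r'property="og:([^"]+)" content="([^"]+)"', page))
print(json.dumps(og))
```

A flat key/value object like this is trivially usable from any language with a JSON library, which is the accessibility point being made.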

And with respect to [Google’s] rich snippets: certainly Yahoo started the trend with SearchMonkey [Note: SearchMonkey is Yahoo’s developer platform, which allows the use of structured data to make search results more useful and relevant], but when Google got on board, saying they would use rich snippets in the same way, a surprising number of pages got marked up that way. Google is a big value driver, so when they say they will actually make use of whatever semantics you provide, all of a sudden people get on board to reap the benefits. Frankly, this value case hasn’t been well articulated by the [Semantic Web] community.

So, with respect to the evolution of search: when will we see semantic search as our preferred search method? Would you go as far as some people who say that current keyword search will die and we will be guided entirely by recommendations and results coming from our social network?

I think the two actually go hand in hand. To me, the exciting thing about semantic technologies is that, especially in the hands of developers, and with users starting to appear on the other end, you get feedback on what actually matters and what’s valuable. I don’t think we have done enough to expose those things. The good thing about Freebase is that it’s so lightweight that thousands of applications have been written against it, many by independent developers just trying something out. That’s the level of investment where many flowers surface, and you can see which ones will actually bloom.

We are going to start to see more and more people identify the things they have information about. I have a feeling that with more entities exposed, the possibility of doing real social recommendations becomes greater.