Free download: 10 terabytes of patents and trademarks
Wednesday, June 2, 2010
When we launched Google Patent Search in 2006, we wanted to make it easier for people to understand the world of inventions, whether they were browsing for curious patents or researching serious engineering. Recently, we’ve also worked on a number of public data search features, as well as experimental features like the Public Data Explorer.
There are many places to search for individual patents -- the US Patent and Trademark Office and Google Patent Search are two examples. But sometimes that’s not enough. If you’re trying to identify trends in innovation over time or analyze all the patents relevant to your invention, it helps to have all the patent data on hand. For example, the non-profit Cambia’s PatentLens creates topical analyses of patent information, and they can only do this with a comprehensive data set. Others have experimented with a variety of online mashups of the data, such as an interactive map showing the most innovative states.
The trouble is, that’s a lot of information -- terabytes of it -- and in the past the only way to deliver that information was on DVDs and other physical media. The USPTO will ship them to you, and over the last decade Cambia alone has spent hundreds of thousands of dollars on this data. But with high-bandwidth connections on the rise, both the USPTO and Google think it’s time to help people download the bulk data directly.
That’s why we’re proud to announce that the USPTO and Google are making this data available for free at http://www.google.com/googlebooks/uspto.html. This includes all granted patents and trademarks, and published applications -- with both full text and images. In the future, we will make more data available, including file histories and related data.
We look forward to continuing to work with the USPTO and other public organizations to expand access to public data. You can read the official press release from the USPTO here.
How long until Google launches fully functional searches of trademark records?
10 terabytes? Oh my god, that's kinda crazy. I love it though, thanks Google!
I'm working on a trademark search site. It's still very much in an alpha state, but it has the trademark databases of the US, Canada, Germany, Austria, Australia, Ireland, and New Zealand. The UK and some other databases are coming soon.
The site is trade.mar.cx. Clever domain, eh?
Crazy cool - thanks! Any plans for adding a feed for these files or some other way to auto-download when new ones become available?
I actually work for the company that converts the patents into digital format for the PTO.
No wonder the data runs to terabytes. Even for small inventions, people write tens of pages of obscure legal language. I wonder what the common man can use that for if it's not understandable to him.
Good move, but hardly of any use. Check http://patentabsurdity.com/
As a patent agent, searchable access and bulk downloads of patent file wrapper data would be very helpful.
How do I see these files?
The size after unpacking the text is so huge, I can't even see the XMLs. Of course, I am doing this from home. I guess this isn't aimed at Joes at home.
Thanks for sharing, this gives some food for thought.
This is very good news. I have a couple of suggestions though...
Split the huge download pages (for example, http://www.google.com/googlebooks/uspto-patents-grants.html) into one for each year. Loading that huge page maxes out the CPU for quite a while even on a reasonably modern machine.
Also, now that you have the actual/"master" data, is there any chance of replacing all the broken patent images on the Google Patents site? As anyone who has used Google Patents to view patent images will know, some drawings/diagrams for many patents are broken. Presumably that was an unintended side-effect of Google's OCR process, but it's been like that for years now.