TMCnet News
Tamr Awarded Patent for Enterprise-Scale Data Unification System

Tamr, Inc. today announced that it has been issued a patent (US9,542,412) from the United States Patent and Trademark Office covering the principles underlying its enterprise-scale data unification platform. The patent, titled Method and System for Large Scale Data Curation, describes a comprehensive approach for integrating a large number of data sources by normalizing, cleaning, integrating, and deduplicating them using machine learning techniques supplemented by human expertise. While the challenge of data unification is decades old, the award of this patent recognizes and protects the uniqueness of the innovations embedded in Tamr's software platform.

"When my co-inventors and I began work at MIT CSAIL on what is now Tamr, we believed that traditional approaches to data integration had outlived their usefulness," said Mike Stonebraker, co-founder & CTO of Tamr. "Our goal was to build an end-to-end system for enterprise-scale data curation that leveraged modern machine learning techniques to radically reduce the time and cost of producing clean, unified data sets. Tamr's growth has proven the commercial value of the many innovations in our software, and this patent now confirms the uniqueness of our invention."

Tamr's patent describes several features and advantages implemented in the company's software, including: the techniques used to obtain training data for the machine learning algorithms; a unified methodology for linking attributes and database records in a holistic fashion; multiple methods for pruning the large space of candidate matches to achieve scalability at high data volumes; and novel ways to generate highly relevant questions for experts across all stages of the data curation lifecycle.

Other characteristics of Tamr's unique data unification system covered by the patent include:
2. Data cleaning. Enterprise data sources inevitably include raw data that is dirty, noisy, or both. Attribute data may be incorrect, inaccurate, or missing, which calls for an automated solution that draws on human help only when necessary.

3. Non-programmer orientation. Current Extract, Transform, and Load (ETL) systems have scripting languages that are appropriate for professional programmers. The scale of today's problems requires that less skilled employees (e.g., system operators) be able to perform data integration tasks.

4. Incremental data integration and data curation. New data sources must be integrated incrementally as they are uncovered. The data integration task is never considered finished.

"I'm incredibly proud of the work that Tamr has done to bring this invention from the lab at MIT to the data centers of our customers," said Andy Palmer, co-founder and CEO of Tamr. "Our company and our customers owe a particular debt of gratitude to the Tamr employees named on this patent: Nik Bates-Haus, George Beskales, Dan Bruckner, Ihab Ilyas, Alex Pagan, and Mike Stonebraker. This patent, and the others that we've filed, confirms what we've known all along: their work was, and continues to be, groundbreaking."
View source version on businesswire.com: http://www.businesswire.com/news/home/20170209005327/en/