This is a guest post from Oren Falkowitz, who will be presenting in more detail on this topic on July 15th at a free web seminar.
Business leaders are embracing the growing volume of data available to them. Whether the goal is to increase profits, bolster supply chain efficiency, deploy more precise marketing tactics, or improve security operations, they see opportunity in it. As it has become cheaper to store, analyze, and share data, more data is coming together than ever before. Simple data integration has proven ineffective, and the need for data curation has become ever more important. Data curation goes beyond data integration because it does more than transport, clean, transform, and de-duplicate. Data curation is about discovery: it reveals new features and helps with entity disambiguation.
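To make entity disambiguation concrete, here is a minimal sketch of the idea, not a description of any particular product. It groups differently spelled names that probably refer to the same entity. The function names, the 0.85 similarity threshold, and the sample records are all hypothetical, and greedy string matching with Python's standard `difflib` is a deliberately simple stand-in for the richer signals a real curation tool would draw on.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """True when the normalized names are close enough (threshold is an assumption)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def group_entities(names, threshold: float = 0.85):
    """Greedy grouping: a name joins the first group whose first member it resembles."""
    groups = []
    for name in names:
        for group in groups:
            if similar(name, group[0], threshold):
                group.append(name)
                break
        else:
            # No existing group matched, so this name starts a new one.
            groups.append([name])
    return groups


# Hypothetical records, as they might arrive from different source systems.
records = ["Acme Corp.", "ACME Corp", "acme corp", "Globex Inc", "Globex, Inc."]
groups = group_entities(records)
for group in groups:
    print(group)
```

De-duplication alone would leave all five spellings in place, since no two strings are identical. Grouping by similarity is what lets records from different sources be treated as the same entity.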
As we’ve come to appreciate, data attracts data, and when this happens at scale and with diversity, the quality and context of the results increase markedly. “What we’ve found is that data that reveals insights on human behavior benefits from scale itself rather than hoping for annotations in data that isn’t available. We will never have uniform inputs, instead we must embrace complexity and make use of the best ally we have.” The unreasonable effectiveness of data allows us to pursue insights into human behavior that were not possible with elaborate models based on less data.
Modern data curation tools have never been more valuable than they are today in the computer security industry. Countering Advanced Persistent Threats (APTs), which have recently been widely reported in major breaches at Target, Google, and across the U.S. Government, reveals the need to move beyond signatures and the stale structure of data fed into Security Information and Event Management (SIEM) solutions. Winners and losers in the infosec community will be determined by their ability to prevent the behaviors that security experts refer to as Tactics, Techniques, and Procedures (TTPs), which means folding in broader and richer data for curation from outside traditional data sources. This matters most if we are to prevent malicious events rather than detect them ex post facto.