Simply put, Dark Data is stored, largely non-inventoried, unstructured data that is not currently used for data science but is nevertheless maintained on a "just in case" basis, either to meet regulatory requirements or in the hope that it will prove useful for research at some point in the future. "Gathering dust" in archives, Dark Data is, as less simply put by CIO and industry pundit Isaac Sacolick, "data and content that exists and is stored, but is not leveraged and analyzed for intelligence or used in forward looking decisions. It includes data that is in physical locations or formats that make analysis complex or too costly, or data that has significant data quality issues. It also includes data that is currently stored and can be connected to other data sources for analysis, but the business has not dedicated sufficient resources to analyze and leverage." Add to this unstructured data for which sufficiently robust or accurate analysis tools have not yet been invented, and some data (notably most log files) that will simply never be of use and will never yield useful Business Intelligence (BI).
In Dark Data & Dark Social, Lars Nielsen explores the nature of Dark Data; how to discern genuinely useful Dark Data amid the far larger mass of useless data debris with which most enterprises are swamped; how to build a data science team to accomplish this task and leverage Dark Data to its utmost potential; how to safely and irrevocably dispose of unusable data debris; and how to exploit some of the darkest of dark data: "Dark Social," the hard-to-track but incredibly valuable real-time data tied to largely anonymous second-party referrals to web sites (as opposed to direct click-throughs). Throughout the book, Nielsen presents information in a user-friendly, jargon-free manner that assumes little technical background.