Big data can be considered data that exceeds the storage and processing capacity of conventional database systems.
Following are a few definitions from various sources:
- Wikipedia defines Big Data as follows: “Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate”.
- In 2012, Gartner updated its definition for Big Data as follows: “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”
The Gartner definition can be split into two parts. The first part says that Big Data is data that is high in volume, high in velocity, and/or high in variety; Volume, Velocity and Variety are usually called the 3 Vs of Big Data. The second part says that such data requires new forms of processing to enable enhanced decision making, insight discovery and process optimization.
Based on the 3 Vs of Big Data, we can describe Big Data as “data that is very large, that comes in very fast for processing (e.g. fast continuous streaming data) and that may be very diverse (e.g. structured, unstructured, NoSQL database data, etc.)”.
Many researchers have introduced additional Vs to the list, arguing that the 3 Vs are not enough. We will see them in detail in another note.
EXAMPLES OF BIG DATA APPLICATIONS
Big Data opens up great opportunities, and following are some of the popular application areas.
- Consumer examples
- Recommendation engines provided by most shopping sites, like Amazon, that suggest products based on what you have bought or even searched for before.
- Siri, part of Apple Inc.’s iOS, works as an intelligent personal assistant and knowledge navigator. It uses a natural language user interface to answer questions, make recommendations, and perform actions.
- Business examples
- Search suggestions that appear as you start typing on search engines like Google.
- Ad targeting, through providers like Google AdWords, which shows you ads based on your previous searches and other online activity.
- Predictive marketing, which identifies target audiences or trends based on factors such as consumer behavior and demographic information like age and salary, which is readily available or may also be purchased.
- Fraud detection, especially for credit and debit card or online usage, based on point of sale, geo-location and IP address, login time, and even behavioral biometrics such as the timing of mouse movements.
- Research examples
- Google Flu Trends, which uses aggregated Google search data to estimate flu activity.
- NASA’s Kepler, a space observatory launched to discover Earth-like planets orbiting other stars, continuously transmits data to Earth; the data is then analyzed to detect the periodic dimming caused by extrasolar planets crossing in front of their host stars.
KEY CHARACTERISTICS OF BIG DATA TECHNOLOGIES
Big data technologies are technologies that help us process big data effectively.
Two of the key characteristics of big data technologies are that:
- Data may be distributed across several nodes. This improves performance through parallel processing, and also improves availability through replication.
- Processing (or applications) is distributed to the data nodes, rather than fetching data from all nodes and processing it on a single server node. This improves performance, as multiple systems can process the data in parallel, and there is less data transfer during processing.
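The idea of sending the processing to the data partitions, rather than pulling all data to one place, can be sketched as follows. This is purely an illustration: the partitions, records, and `count_errors` function are hypothetical stand-ins for blocks stored on separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data partitions, each standing in for a block of
# records stored on a separate node of a distributed file system.
partitions = [
    ["error: disk full", "info: started", "error: timeout"],
    ["info: heartbeat", "error: disk full"],
    ["info: stopped"],
]

def count_errors(partition):
    # The function (the "processing") is applied to each partition
    # where it lives; only the small per-partition count travels
    # back, not the records themselves.
    return sum(1 for line in partition if line.startswith("error"))

# Each partition is processed in parallel, and only the tiny
# partial results are combined at the end.
with ThreadPoolExecutor() as pool:
    total_errors = sum(pool.map(count_errors, partitions))

print(total_errors)  # 3
```

In a real system such as Hadoop, the framework handles shipping the code to the nodes and combining the partial results; the sketch above only mimics that pattern on one machine.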
Big data technologies may also have additional characteristics such as:
- If it is required to fetch or process data from other nodes, big data systems tend to use nodes that are as close as possible.
- Blocks of data are usually read sequentially and filtered in main memory, rather than using random reads as in traditional databases.
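The sequential-read-then-filter pattern can be sketched like this. The block contents and the age filter are made-up examples; the point is that the whole block is scanned once and the filtering happens in memory, instead of issuing a random read per matching record.

```python
import io

# A small stand-in for a large block of records stored contiguously;
# real systems read blocks like this from disk in one sequential pass.
block = io.StringIO(
    "alice,30\n"
    "bob,45\n"
    "carol,28\n"
)

# Read the block sequentially and filter in main memory.
over_40 = [line for line in block if int(line.split(",")[1]) >= 40]

print(over_40)  # ['bob,45\n']
```

Traditional databases instead use indexes to jump directly to matching rows, which means many small random reads; for full scans over very large data, the sequential approach is usually faster on disk-based storage.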
REFERENCES
- Lynda.com’s Techniques and Concepts of Big Data with Barton Poulson.
- Book: Pro Apache Hadoop by Sameer Wadkar, Madhu Siddalingaiah and Jason Venner
- Wikipedia pages for all products listed here (if available).
- Note: Volume, Velocity and Variety, usually called the 3 Vs of Big Data, were introduced by META Group (now Gartner) analyst Doug Laney in a 2001 research report and related lectures.