    TEN WAYS BIG DATA IS DIFFERENT FROM SMALL DATA

    We have already seen what big data is. Looking at how big data differs from small data gives a better understanding of how it compares with traditional data sets. The book “Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information” by Jules J. Berman lists 10 ways in which big data differs from small data. Here is a quick summary of them.

    1. Goals
      1. Small data usually has a specific goal or purpose.
      2. Big data may start with one goal or purpose, but may take unexpected turns later.
    2. Location
      1. Small data usually resides in one file or one place, such as a single computer.
      2. Big data may be spread across multiple files or computers, and even multiple geographical locations.
    3. Data structure and contents
      1. Small data is usually structured, as in an RDBMS table or an Excel spreadsheet.
      2. Big data may be unstructured and may come in different file formats.
    4. Data preparation
      1. Small data is usually prepared by the end user for their own use.
      2. Big data may be prepared, analyzed, and used by different groups of people.
    5. Longevity (life expectancy)
      1. Small data is usually kept for a specific period of time and may be deleted or archived after that time period.
      2. Big data is usually kept for longer periods, and new data may be added to the existing data set.
    6. Measurements
      1. Small data is usually measured with the same protocol or unit of measurement.
      2. Big data may be measured with different protocols or units of measurement, and may also involve some conversions to make the units consistent for analysis.
    7. Reproducibility
      1. Small data can usually be reproduced in its entirety if something goes wrong in the process, since it typically comes from a single source and is easy to recreate.
      2. Big data may come from various sources and hence may not be reproducible in its entirety.
    8. Stakes
      1. If something goes wrong with a small data set, the cost is limited.
      2. If something goes wrong with big data, the cost can be very high, even to the extent of affecting the researcher and the organization.
    9. Introspection
      1. Small data may carry some additional data (e.g. a description element tag) that describes the content, to allow easier introspection.
      2. Big data might not carry descriptive data for all content, and may contain unidentifiable, unlocatable, and meaningless data.
        1. Note that big data and related processing might not be suitable for all data scenarios because of these limitations.
    10. Analysis
      1. Small data is easy to analyze, as it usually resides on a single computer.
      2. Since big data may be spread across many computers, its analysis may involve many tasks, such as abstraction, reviewing, reducing, and finally aggregating results.
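
The points on measurements (6) and analysis (10) can be illustrated with a small sketch. This is not taken from the book; it is a minimal, single-process example that assumes each inner list stands in for data held on a separate computer, and that records arrive as hypothetical (item, value, unit) tuples. The `TO_KG` table and all function names are made up for illustration.

```python
from collections import defaultdict

# Assumed conversion factors to a common unit (kilograms).
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def normalize(record):
    """Convert one (item, value, unit) record to (item, value_in_kg)."""
    item, value, unit = record
    return item, value * TO_KG[unit]


def reduce_partition(records):
    """Reduce one partition to per-item [sum, count] pairs."""
    partial = defaultdict(lambda: [0.0, 0])
    for record in records:
        item, kg = normalize(record)
        partial[item][0] += kg
        partial[item][1] += 1
    return partial


def aggregate(partials):
    """Combine the partial results from all partitions into per-item means."""
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for item, (total, count) in partial.items():
            totals[item][0] += total
            totals[item][1] += count
    return {item: total / count for item, (total, count) in totals.items()}


# Each inner list plays the role of data stored on a different computer,
# recorded with different units of measurement.
partitions = [
    [("apples", 2.0, "kg"), ("pears", 500.0, "g")],
    [("apples", 4000.0, "g"), ("pears", 1.5, "kg")],
]

means = aggregate(reduce_partition(p) for p in partitions)
print(means)
```

Each partition is reduced on its own before the small partial results are combined, so the raw records never have to be gathered in one place. In a real big data system the per-partition step would typically run on the machine that holds the data.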

    REFERENCES: 

    Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information by Jules J. Berman.
