INFS2030: Digital Business Management
Week 6 – The Data-Driven Organisation: Big Data, Analytics and Decision-Making
Uri Gal
Rm 4067/H70
uri.gal@sydney.edu.au
Consultation: By appointment
 Some admin issues…
Mid-term exam April 1
– 10 multiple-choice questions
– Covers material from weeks 1-5
Exercise 1: What is Big Data?
What is big data?
–   “Big”, “lots”, “exponentially growing” data – so there is just massive volume!
–   Human or machine-generated
–   New types of data incl. unstructured data (clickstream, pictures, video, social media
    text, etc.) – variety
–   Speed (velocity) of data
What is its relevance for decision-making in business?
–   Relevant, valuable for business/marketing
–   Or “potentially valuable” (hence we hoard it for its potential for future analyses)
The 3 Vs of Big Data
“The 3 Vs” of Big Data:
–   Volume – massive amount of data (quantity)
–   Variety – unstructured & structured data; different sources (variations)
–   Velocity – high frequency (speed)
1) Human-Generated Big Data
Intentionally created:
–   Text messages
–   Posts on social media
–   Photos, video, audio
–   “Likes” or “helpful” votes
–   Web searches
–   Webpage bookmarks
–   Emails
–   Phone calls
–   Online purchases
1) Human-Generated Big Data
Unintentionally created:
–   Photo metadata: Time, GPS location,
    Direction of phone
–   Phone call metadata: Time, location (cell
    tower), length
–   Email metadata: From/to/cc/timestamp
    etc.
–   Twitter metadata: Header includes
    location, creation date of account,
    application sent with…
– Metadata can be much larger than
  the actual user data
     – May not be user readable but is
       machine readable
2) Machine-Generated Data
–   Cell phone <> cell tower
–   Satellite > GPS device
–   RFID readings
–   Medical device readings
–   Spam bots vs. spam filters!
– The Internet of Things
   – Smart homes, sensors etc.
Uses of Internet of Things
–   Monitoring production lines
–   Metering utilities
–   Tracking animals
–   Measuring temperature/environment (e.g., earthquake, tsunami warnings)
–   Automating home/building
–   Managing energy
–   Directing industrial appliances
–   Managing infrastructure
–   Monitoring health
– Content: machine-readable, often high volume/velocity but low variety
Some Examples
– Fraud detection in banking industry:
    – Identifying patterns in customer behavior
    – Detecting any deviations from known patterns
– Healthcare:
    – Drug development and new treatments
    – Personalized medicine
– Public policy
    – Leveraging data from healthcare, finance, and
      education provides governments with the insights
      needed to create policies
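
The two fraud-detection steps above (identify a customer's pattern, then detect deviations from it) can be sketched roughly as follows. This is a minimal illustration only, not how any particular bank does it: the function name `flag_deviations`, the 3-standard-deviation threshold and the transaction amounts are all assumptions made up for the example.

```python
from statistics import mean, stdev

def flag_deviations(history, new_amounts, threshold=3.0):
    """Return the new amounts that deviate strongly from a known pattern.

    history: past transaction amounts for one customer (the "known pattern").
    threshold: how many standard deviations count as unusual (assumed value).
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past amount is identical: anything different is a deviation.
        return [a for a in new_amounts if a != mu]
    return [a for a in new_amounts if abs(a - mu) / sigma > threshold]

# Hypothetical customer who usually spends roughly $40-60 per purchase.
past = [42.0, 55.5, 48.0, 61.0, 39.5, 52.0, 47.5, 58.0]
print(flag_deviations(past, [50.0, 63.0, 900.0]))  # prints [900.0]
```

Real systems typically use many variables (location, merchant, time of day) rather than a single amount, but the logic of "learn the pattern, flag the deviation" is the same.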
Some Examples
– Starbucks:
    – Concern over the taste of a new coffee product
    – Social Media Sentiment Analysis revealed the taste was
      fine, but price was too high
– Chevron:
    – Each drilling miss can cost $100M
    – Up to 50 terabytes of seismic survey data is analyzed
      before drilling
    – Odds of a successful drill have improved from 1 in 5 to 1 in 3
– U.S. Xpress:
    – In-cab system generates more than 950 pieces of data
    – Data indicates the truck’s location, where it has been,
      whether it is idle or moving, and which customer is
      being served
    – Data is used in real-time to re-route the fleet
    – Reduced idle time has saved millions
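
To get a feel for what the Chevron numbers above could mean financially, here is a rough worked calculation. It rests on assumptions that are not in the slide: that "1 in 5" and "1 in 3" are the chances of a successful well, that drilling attempts are independent, and that every miss costs the full $100M. Under those assumptions the number of misses before the first success follows a geometric distribution with mean (1 - p) / p.

```python
from fractions import Fraction

COST_PER_MISS_MUSD = 100  # from the slide: each drilling miss can cost $100M

def expected_miss_cost(success_probability):
    """Expected cost of misses (in $M) before the first successful well.

    Assumes independent attempts with a fixed success probability, so the
    expected number of misses before a success is (1 - p) / p.
    """
    p = Fraction(success_probability)
    expected_misses = (1 - p) / p
    return expected_misses * COST_PER_MISS_MUSD

before = expected_miss_cost(Fraction(1, 5))  # 4 expected misses
after = expected_miss_cost(Fraction(1, 3))   # 2 expected misses
print(before, after, before - after)         # prints 400 200 200
```

On these simplified assumptions, the improved odds would halve the expected cost of misses per successful well.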
Three types of Big Data Analytics
– Descriptive
    – What happened?
    – Dashboards
    – Business intelligence
    – Team A is 30% below quarterly sales quota
– Predictive
    – What will happen?
    – Forecasting performance
    – Predicting failure of machines
    – Estimating risk (credit scores)
    – Team A will finish 5% under quota
– Prescriptive
    – What should we do?
    – Optimization (achieve the best outcome)
    – Minimize risk
    – Make best use of resources, minimize wastage
    – Change team A’s incentive structure
Why Big Data Analytics? → The promises
– Data abundance
– Predictive power
– Unbiased decisions
1) The data abundance argument:
– More data coming from more sources
    – Social media, sensor data, machine-generated data, RFID,
      image, audio, GPS
    – A one-terabyte data warehouse used to be considered large;
      today many are over a petabyte in size
– The argument then goes something like this:
    –   We have more and more data at our disposal
    –   There is little value in simply storing the data
    –   We need to analyse the data
    –   Imagine what we could learn from all this data
– Core assumption: The more data we have the more we can learn (more =
  better)
But: Are claims about BIG data justified?
– Does data speak for itself?
    – All possible correlations between 1,000 variables in a data set lead to 2^1000
      possible combinations, far more than the number of particles in the Universe
      (10^80).
    – Claims that data will simply speak to us are nonsensical
         • We must decide what variables to focus on, what questions to ask, which correlations
           are most informative, which data to ignore!
– Is more data actually better?
    – Any analysis will create spurious or wrong correlations (“false positives”)
    – The more data we have, the higher the likelihood of false positives
    – Volume by itself increases noise; data must be relevant, understood, timely
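
Both points above can be checked with a few lines of arithmetic. The sketch below reads the slide's figure of 2 to the power 1,000 as the number of ways to pick any subset of 1,000 variables, which is one possible interpretation. For the false-positive point it assumes the variables are in fact unrelated and that every pair is tested at a 5% significance level; the function name `expected_false_positives` is made up for this example.

```python
from math import comb

n_variables = 1000

# Ways to choose any subset of the variables: 2 to the power 1,000.
subsets = 2 ** n_variables
particles_in_universe = 10 ** 80   # rough estimate quoted on the slide
print(subsets > particles_in_universe)   # prints True
print(len(str(subsets)))                 # prints 302 (decimal digits)

# Even pairwise correlations alone are numerous.
print(comb(n_variables, 2))              # prints 499500

def expected_false_positives(n_vars, alpha=0.05):
    """Expected number of 'significant' pairwise correlations when the
    variables are actually unrelated and each pair is tested at level alpha."""
    return comb(n_vars, 2) * alpha

for n in (10, 100, 1000):
    print(n, round(expected_false_positives(n), 1))
```

With a fixed significance level, the expected number of spurious "findings" grows with the number of pairs tested, that is, roughly with the square of the number of variables.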
2) The prediction argument:
– “In the next two decades, we will be able to predict huge areas of the
  future with far greater accuracy than ever before in human history, including
  events long thought to be beyond the realm of human inference.”
    –   (Patrick Tucker, “The Naked Future”)
– Causality is dead: “Petabytes allow us to say: ‘Correlation is enough.’”
   – Using machine learning and predictive analytics we are able to simply
      predict what will happen next – there is no longer a need to understand
      ‘why’
– The end of theory and science?
    –   “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”, Chris Anderson, Wired Magazine, June 23, 2008
– Assumption: The future can be predicted
  on the basis of the past
But: Are predictions realistic?
– Extrapolation is not prediction
    – Only works when the future resembles the past
    – The behavior of dynamic complex systems is an emergent property
      of the interconnections between their parts
    – Potential for error increases with prediction time-scale
– Self-fulfilling prophecy
    – We help create the future we predict
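
The point that error grows with the prediction time-scale can be illustrated with a toy example. The sketch below assumes a hypothetical process that grows by 10% per period (compounding), and an analyst who fits a straight-line trend to the first ten periods and extends it forward. The process, the numbers and the function names are all assumptions made for illustration.

```python
def true_value(t, start=100.0, growth=0.10):
    """Hypothetical 'real' process: compounding growth of 10% per period."""
    return start * (1.0 + growth) ** t

def fit_line(ts, ys):
    """Ordinary least-squares straight line through the points (ts, ys)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
             / sum((t - t_mean) ** 2 for t in ts))
    intercept = y_mean - slope * t_mean
    return slope, intercept

# The "past": ten observed periods, t = 0..9.
observed_t = list(range(10))
observed_y = [true_value(t) for t in observed_t]
slope, intercept = fit_line(observed_t, observed_y)

def extrapolation_error(horizon):
    """Absolute gap between the straight-line forecast and the true value,
    'horizon' periods after the last observation."""
    t = observed_t[-1] + horizon
    return abs(true_value(t) - (intercept + slope * t))

errors = {h: extrapolation_error(h) for h in (1, 5, 10, 20)}
for h, err in errors.items():
    print(f"{h:>2} periods ahead: error = {err:.1f}")
```

The straight line describes the observed periods reasonably well, yet the gap between forecast and reality widens the further ahead it is extended, because the future of this process does not resemble a linear continuation of its past.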
3) The decision-making argument
– “Analytics brings rigorous techniques to decision making; big data is at once
  simpler and more powerful.”
– “It’s a simple formula: Using big data leads to better predictions, and better
  predictions yield better decisions.”
– “Leaders will either embrace this fact or be replaced by others who do.
  Companies that figure out how to combine domain expertise with data
  science will pull away from their rivals.”
– Assumption: Data is objective, based on facts and
  analytics (not intuition) and will therefore lead to correct and
  better decisions (Data is unbiased!)
Does Big Data analytics lead to objective decisions?
– Reducing complex (qualitative) phenomena to their (quantitative) digital residues
– Using simplistic proxies for multi-faceted traits or skills
– Working with Big Data requires making subjective (human) decisions
– Numbers are not neutral
 Next week
– AI or Design thinking?