While the origins of the term are elusive, and even debated, big data is one of those concepts that many know about, yet it defies a simple definition. At the heart of big data, as the term directly suggests, is an extremely large volume of data. This is often drawn from diverse sources and can span different types of data, all of which is then crunched using advanced analytic techniques in the hope of picking out patterns that lead to useful conclusions.
Big data is also commonly described by the three Vs: Volume, Variety and Velocity. Volume refers to the size of the data, variety indicates that the datasets are heterogeneous, and velocity is the speed at which the data arrives and is analyzed, often with the goal of achieving real-time analysis.
The datasets involved are indeed seriously large – we’re talking terabytes to zettabytes (for the curious, 1ZB is equivalent to 1,000,000,000TB, or about 909,494,701TiB when the terabyte is counted in binary units). In addition to the size of these datasets, the data can be of different types (structured, semi-structured and unstructured), and it can be drawn from multiple sources.
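For readers who want to check the arithmetic, here is a minimal Python sketch of the conversion. It assumes the 909,494,701 figure comes from dividing a decimal zettabyte (10^21 bytes) by a binary terabyte, also called a tebibyte (2^40 bytes). The helper name zettabytes_to is made up for this illustration.

```python
# Unit sizes in bytes. ZB and TB are decimal (SI) units; TIB is the binary tebibyte.
ZB = 10**21
TB = 10**12
TIB = 2**40


def zettabytes_to(unit_bytes: int, zettabytes: int = 1) -> int:
    """Return how many whole units of size `unit_bytes` fit in the given zettabytes."""
    return zettabytes * ZB // unit_bytes


zb_in_tb = zettabytes_to(TB)    # decimal terabytes per zettabyte
zb_in_tib = zettabytes_to(TIB)  # binary tebibytes per zettabyte (rounded down)

print(f"1 ZB = {zb_in_tb:,} TB")
print(f"1 ZB = {zb_in_tib:,} TiB (rounded down)")
```

Using the same decimal convention on both sides gives a clean one billion; the smaller number only appears when decimal and binary units are mixed.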
This raises the question of where all this data comes from. It is generated in all types of places, including the web, social media, networks, log files, video files, sensors, and mobile devices.
Mobile devices are particularly important, as most of us keep our phones with us and switched on 24/7, and they carry an array of sensors, including GPS, cameras, a microphone, and a motion sensor. Furthermore, the majority of smartphone use is not voice communication but other activities, including email, games, web browsing, and social apps, which ultimately means that mobile apps account for 90% of use. This mobile data, generated at a breakneck pace, is a large driver of big data.
But data without any analysis is hardly worth much, and analysis is the other part of the big data process. This analysis is referred to as data mining, and it searches for patterns and anomalies within these large datasets. These patterns then yield information that is used for a variety of purposes, such as improving marketing campaigns, increasing sales or cutting costs. The big data and data mining approach does not merely have the power to transform entire industries; it has already done so.
For example, Trainline is a leading European independent train ticket retailer, selling domestic and cross-border tickets in 173 countries, with approximately 127,000 journeys taken daily by customers. The company utilized big data to modernize its approach to travel, with a focus on improving the customer experience via innovation through its app.
As a result, customers now receive enhanced disruption notifications through the app. More than just notifications of delays, these enhanced notifications are specific to each traveler’s journey, a first for the UK rail industry. The firm has also innovated with predictive pricing, which predicts when advance fares will rise from the initial discounted rate, allowing passengers to buy before the price goes up.
Big data has also been used in restaurants, and in particular the fast food industry. McDonald’s is the world’s largest restaurant chain by revenue, and serves over 69 million customers daily at over 36,900 locations in over 100 countries.
That sheer volume alone generates tons of data, so McDonald’s has adopted a data-driven culture. The aim is to improve its understanding of each individual location and, through that, to build a better chain of restaurants overall.
Through big data, McDonald’s has optimized its drive-through experience, for example by taking note of the size of the cars coming through and preparing for a spike in demand when larger cars join the queue.
Another big data innovation has been those digital menu displays that can flexibly show menu items based on a real-time analysis of the data. The menus shift the highlighted items based on data including the time of day and the weather outside, specifically promoting cold drinks when it is hot outside, and more comfort foods on cooler days. This approach has boosted sales at Canadian locations by a reported 3% to 3.5%.
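To make the idea concrete, here is a minimal Python sketch of how a display might choose what to highlight from the time of day and the outside temperature. This is not McDonald’s actual system, which has not been described in that detail here; the temperature thresholds, the breakfast cut-off and the function name highlighted_items are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conditions:
    hour: int             # local hour of day, 0-23
    temperature_c: float  # outside temperature in degrees Celsius


# Hypothetical thresholds, chosen only for illustration.
HOT_THRESHOLD_C = 25.0
COOL_THRESHOLD_C = 10.0
BREAKFAST_END_HOUR = 11


def highlighted_items(conditions: Conditions) -> list[str]:
    """Return the menu categories to promote under the given conditions."""
    items = []
    # Time-of-day rule: promote breakfast in the morning (assumed cut-off).
    if conditions.hour < BREAKFAST_END_HOUR:
        items.append("breakfast")
    # Weather rule: cold drinks when hot, comfort foods when cool.
    if conditions.temperature_c >= HOT_THRESHOLD_C:
        items.append("cold drinks")
    elif conditions.temperature_c <= COOL_THRESHOLD_C:
        items.append("comfort foods")
    return items


hot_afternoon = highlighted_items(Conditions(hour=14, temperature_c=30.0))
cold_morning = highlighted_items(Conditions(hour=8, temperature_c=5.0))
mild_afternoon = highlighted_items(Conditions(hour=15, temperature_c=18.0))
```

A production system would presumably learn such rules from sales data rather than hard-code them, but the inputs and outputs would look much the same.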
This big data approach has also been applied to healthcare. An obvious example is the major shift away from ‘pen and paper’ charting, where your physician’s data is locked away in a filing cabinet in the office, to Electronic Health Records (EHR), where all patient information is entered into a computer database, ready to be mined.
This approach promises to be disruptive, with a recent publication in the European Heart Journal promising the “potential to improve our understanding of disease causation and classification relevant for early translation and to contribute actionable analytics to improve health and healthcare”.
The benefits of big data in healthcare will go beyond data mining the EHR. A significant challenge for hospitals is staffing, which has to be adequate at all times, with the potential to ramp up during peak periods.
A group of four Paris hospitals belonging to the Assistance Publique-Hôpitaux de Paris (AP-HP) is looking to improve flexibility in staffing. The team used a dataset of 10 years of hospital admission records, granular down to the number of admissions per day and per hour of the day, and combined it with weather data, flu patterns, and public holidays.
Using machine learning, they then honed their algorithms to predict the number of upcoming admissions for different days and times. The result is an easy-to-use, browser-based interface that lets hospital administrators and clinical staff forecast admission rates over the next 15 days, so that extra staff can be brought in when a larger number of admissions is anticipated.
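The general shape of such a forecast can be sketched in a few lines of Python. Note that this is not the hospitals’ machine learning model: it swaps in a much simpler baseline that averages past admissions for each weekday, hour and holiday flag, and it leaves out the weather and flu inputs entirely. The function names and the tiny made-up history are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def fit_baseline(history):
    """Average past admissions per (weekday, hour, is_holiday) slot.

    `history` is an iterable of (timestamp, is_holiday, admissions) tuples.
    """
    buckets = defaultdict(list)
    for ts, is_holiday, admissions in history:
        buckets[(ts.weekday(), ts.hour, is_holiday)].append(admissions)
    return {key: mean(values) for key, values in buckets.items()}


def forecast(model, start, hours, holidays=frozenset()):
    """Predict admissions for each of the next `hours` hours from `start`.

    Slots never seen in the history yield None rather than a guess.
    """
    predictions = []
    for step in range(hours):
        ts = start + timedelta(hours=step)
        key = (ts.weekday(), ts.hour, ts.date() in holidays)
        predictions.append((ts, model.get(key)))
    return predictions


# Made-up history: two Mondays at 09:00 and one Monday at 10:00.
history = [
    (datetime(2024, 1, 8, 9), False, 12),
    (datetime(2024, 1, 15, 9), False, 16),
    (datetime(2024, 1, 15, 10), False, 20),
]
model = fit_baseline(history)
next_monday = forecast(model, datetime(2024, 1, 22, 9), hours=3)

for ts, expected in next_monday:
    print(ts.isoformat(), expected)
```

A 15-day horizon like the one described above would correspond to calling forecast with hours=15 * 24, given enough history to cover every slot.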
With data, and in particular mobile data, being generated at an ever faster rate, the big data approach is needed to turn this massive heap of information into actionable intelligence. In the examples we’ve cited above, the challenge has been met, and as even more data is collected, there will be more opportunities to increase quality and efficiency across a number of diverse industries through faster and better analysis of these disparate, sprawling datasets.