2018/07/17
Why Stanby moved big data analysis from Amazon Web Services to Google Cloud Platform
Stefan Meier is Lead Engineer at Stanby - the search engine division at BizReach, Inc. Today, he explains why Stanby is moving from Amazon Web Services (AWS) to Google Cloud Platform (GCP) for data analysis.
What is the challenge with big data? “Big data is data sets that are so voluminous and complex that traditional data-processing application software [is] inadequate to deal with them."1
One of the most fundamental principles lies in the discrimination between streaming and batch processing. In batch processing, a set of events is collected over time and then processed later, whereas in the streaming model, data is processed in near real-time. Also, when dealing with streaming data new challenges appear, such as handling unordered data with unknown delays (see graph below). Indeed, the collected data events might originate from users with slow Internet connections, users accessing the site from multiple devices or there might be just network delays. Therefore, the gap between the time the event occurred (event time) and the time the event is observed by the system (processing time) ranges from a few milliseconds up to several hours. In order to combine and group related data, methods such as windowing are used.