Algorithms for Big Data
Full course description
Algorithms for Big Data presents an algorithmic toolkit for efficiently dealing with the challenges posed by the ever-growing amount of data. For example, the data may no longer fit into main memory, and caching from a hard drive becomes a new bottleneck that needs to be addressed. Similarly, algorithms with super-linear running time simply take too long on very large inputs. Simple sensory devices can observe large amounts of data over time but cannot store all of the observed information due to insufficient storage, so an immediate decision about what to store and compute must be made. Standard algorithmic solutions do not address these challenges, and new algorithmic techniques are needed.

This course looks at a number of algorithmic responses to these problems, such as algorithms with linear or sublinear running time, algorithms where the data arrives in a stream, and computational models where memory is organized hierarchically (with larger storage units, such as hard drives, being slower to access than smaller, faster storage such as CPU cache memory). It also covers a number of topics from classical algorithm design that have undiminished relevance in the era of big data, such as approximation algorithms (obtaining a suboptimal solution with a mathematically rigorous guarantee of proximity to optimality), on-line algorithms, multivariate algorithmic analysis (where the running time can be described by parameters of the input other than purely its size), and multi-core computation.

After completing this course, students will be familiar with the complexities and difficulties of dealing with very large datasets and will be able to address these issues with a variety of suitable tools.
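As a concrete taste of the streaming setting described above, where an algorithm must decide immediately what to keep from data it cannot store in full, consider reservoir sampling: maintaining a uniform random sample of k items from a stream of unknown length using only O(k) memory. This is a minimal illustrative sketch, not part of the official course materials:

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of unknown
    length, using O(k) memory and a one-pass, immediate keep/discard
    decision per item."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # The first k items fill the reservoir.
            reservoir.append(item)
        else:
            # Item i (0-indexed) replaces a random reservoir slot with
            # probability k / (i + 1), which keeps the sample uniform.
            j = random.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir

# Example: sample 10 items from a stream of a million integers.
sample = reservoir_sample(range(1_000_000), 10)
```

Each item is inspected exactly once and either stored or discarded on the spot, which is precisely the constraint faced by the sensory devices mentioned above.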