MySpace to open source data processing
Social network unveils Qizmt, a distributed computation framework developed by its data mining team. Managing big data is the new black.
Qizmt is based on MapReduce, the distributed processing framework best known as a core part of Google's search indexing infrastructure. Qizmt, however, runs on large clusters of Microsoft Windows servers, an interesting sidebar to a computing style we most commonly associate with commodity Linux machines.
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
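To make the map/reduce split concrete, here is a minimal single-process Python sketch of the model, using the classic word-count example. It is not Qizmt's API, which isn't described here. The names `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical. The in-memory grouping dictionary stands in for the shuffle step that a real framework performs across many machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical map function: takes one input key/value pair (document id, text)
# and emits intermediate key/value pairs (word, 1).
def map_fn(doc_id: str, text: str) -> Iterable[Tuple[str, int]]:
    for word in text.lower().split():
        yield word, 1

# Hypothetical reduce function: merges all intermediate values for one key.
def reduce_fn(word: str, counts: List[int]) -> int:
    return sum(counts)

def run_mapreduce(
    inputs: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    # Map phase, plus grouping by intermediate key. This stands in for the
    # distributed shuffle a real cluster would perform.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in mapper(key, value):
            intermediate[ikey].append(ivalue)
    # Reduce phase: one reducer call per distinct intermediate key, in key order.
    return {ikey: reducer(ikey, vals) for ikey, vals in sorted(intermediate.items())}

docs = [("doc1", "big data is the new black"), ("doc2", "Big data big clusters")]
result = run_mapreduce(docs, map_fn, reduce_fn)
print(result)
```

Because each map call depends only on its own input pair, and each reduce call only on one key's values, both phases can be spread across many machines. That independence is what makes the model attractive for large clusters.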
I spoke with Java architect and distributed systems expert Eugene Ciurana about MapReduce, and he contends that "indexing large amounts of unstructured data is a difficult task regardless of the technologies involved. MapReduce provides a simple, elegant solution for data processing in parallelized systems."
As more sites move to manage large data sets, the uptake of frameworks like MapReduce and projects like Hadoop is sure to grow. As the data grows, so does the market opportunity. Open source is a great way to widen the adoption curve while users figure out how best to use these new tools.
Qizmt is currently being used in the MySpace "People You May Know" feature, and will soon expand to user recommendations and other new areas.
Follow me on Twitter @daveofdoom.