MIT researchers have devised a fast, efficient algorithm that compresses incoming data into smaller sets called coresets as it is processed. The approach, which they applied to a GPS application, can also be used by other algorithms.
Computers process data by performing a series of sequential steps in their calculations. Such a series of steps is called an algorithm. Algorithms let computers decide how to treat data without consulting the user.
But how does an algorithm work? Consider a simple algorithm that checks whether a user is old enough to register on a website. First the program asks the user to enter a birthday. The computer then calculates the user's current age from that birthday. If the result is below the allowed age, the computer informs the user that he or she is too young to register for the site.
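A minimal Python sketch of that age check might look like the following; the cutoff of 13 years and the function name are illustrative assumptions, since the example above does not specify them.

```python
from datetime import date

MINIMUM_AGE = 13  # hypothetical minimum age; the example above does not specify one

def old_enough_to_register(birthday, today=None):
    """Return True if the user is at least MINIMUM_AGE years old."""
    today = today or date.today()
    # Whole years elapsed, minus one if this year's birthday hasn't happened yet.
    age = today.year - birthday.year - (
        (today.month, today.day) < (birthday.month, birthday.day)
    )
    return age >= MINIMUM_AGE

if __name__ == "__main__":
    if not old_enough_to_register(date(2015, 6, 1)):
        print("Sorry, you are too young to register for the site.")
```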
The example above is just a simple straightforward algorithm. In the present day, there are highly complex algorithms engineered to predict such things as stock market trends, weather forecasting, and even consumer buying habits.
Garbage In, Garbage Out
What matters most to an algorithm is the data available to it. If the data is insufficient, inaccurate, or even late, the algorithm will not work well. Some algorithms also depend on speed of access, as in stock market trading, where fresh data must be available within microseconds for the algorithm to be effective.
There are even algorithms in place to ensure the integrity of the data.
In theory, the more data available to an algorithm, the more flexible and accurate it can be. But sometimes data arrives in such massive amounts, much of it unneeded and inconsequential to the algorithm, that it simply weighs down the computers.
In the sample age algorithm above, the location, name, race, or even occupation of the user is irrelevant.
Streamlining The Data
Using a smartphone GPS program, researchers at MIT devised an algorithm to process the data faster and more efficiently.
Daniela Rus, a professor of computer science and engineering and director of MIT's Computer Science and Artificial Intelligence Laboratory, explains that GPS receivers take position readings every 10 seconds, which adds up to a gigabyte of data each day. A computer system trying to process GPS data from tens of thousands of cars in order to infer traffic patterns could quickly be overwhelmed.
Using linear simplification and signal clustering, Rus and her team constructed an algorithm that analyzes the trajectory of a moving vehicle. The intuition behind linear simplification is that a straight road stays straight: pinpointing the vehicle's exact position along such a stretch adds little, since there are no turns available to it. What matters to the algorithm are the points where a left or right turn becomes possible.
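The following is a rough Python sketch of the line-simplification idea, not the authors' actual construction: points that lie close to the straight line between their neighbors are dropped, while points where the path bends are kept. The tolerance value and function name are illustrative assumptions.

```python
from math import hypot

def simplify_trajectory(points, tolerance=5.0):
    """Keep only points where the path bends by more than `tolerance`.

    `points` is a list of (x, y) positions in a local metric frame.
    Points that lie (nearly) on the straight line between the last kept
    point and the next reading carry no new information and are dropped.
    """
    if len(points) < 3:
        return list(points)

    kept = [points[0]]
    anchor = points[0]
    for i in range(1, len(points) - 1):
        ax, ay = anchor
        bx, by = points[i + 1]
        px, py = points[i]
        # Perpendicular distance from the candidate point to the line
        # through the last kept point and the next reading.
        seg_len = hypot(bx - ax, by - ay) or 1e-12
        dist = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / seg_len
        if dist > tolerance:  # the vehicle actually turned here
            kept.append(points[i])
            anchor = points[i]
    kept.append(points[-1])
    return kept
```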
In signal clustering, the algorithm analyzes the behavior of all vehicles at a specific location. If the data shows that a majority of vehicles stop at a certain point, it may mean there is a traffic light or stop sign in the vicinity.
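As a hypothetical illustration of this kind of clustering (not necessarily the method used in the paper), one could bin vehicles' stop positions into a coarse grid and flag cells where many different vehicles come to a halt; the cell size and vehicle threshold below are assumptions.

```python
from collections import defaultdict

def likely_stop_locations(stop_points, cell_size=20.0, min_vehicles=10):
    """Group stop events into a coarse grid and report busy cells.

    `stop_points` is a list of (vehicle_id, x, y) tuples recorded wherever
    a vehicle's speed dropped to near zero.  Cells where many different
    vehicles stop are likely traffic lights or stop signs.
    """
    cells = defaultdict(set)
    for vehicle_id, x, y in stop_points:
        cell = (int(x // cell_size), int(y // cell_size))
        cells[cell].add(vehicle_id)

    return [
        ((cx * cell_size + cell_size / 2, cy * cell_size + cell_size / 2),
         len(vehicles))
        for (cx, cy), vehicles in cells.items()
        if len(vehicles) >= min_vehicles
    ]
```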
Compression
An important feature of the algorithm is that it compresses data on the fly, as it arrives. Dan Feldman, a postdoc in Rus's group and lead author on the new paper, explains that their algorithm can compress the data one megabyte at a time as it is received from the vehicle, and the resulting compressed data is almost the same as it would be if the algorithm waited for the complete batch to arrive before compressing.
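The streaming behavior can be sketched with a merge-and-reduce pattern, a common way to build such summaries incrementally; this is an assumed illustration rather than the paper's exact scheme. Each call to add_chunk corresponds to one incoming batch of readings from the vehicle.

```python
class StreamingCompressor:
    """Compress GPS points chunk by chunk as they arrive.

    Hypothetical sketch of the streaming idea: each incoming batch is
    compressed immediately, merged with the running summary, and the
    merged result is re-compressed, so memory stays bounded even though
    the raw stream is unbounded.
    """

    def __init__(self, compress, max_summary_size=1000):
        self.compress = compress            # any function that shrinks a list of points
        self.max_summary_size = max_summary_size
        self.summary = []

    def add_chunk(self, chunk):
        self.summary.extend(self.compress(chunk))
        if len(self.summary) > self.max_summary_size:
            # Re-compress the merged summary so it never grows unbounded.
            self.summary = self.compress(self.summary)
        return self.summary
```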
In their paper, "The Single Pixel GPS: Learning Big Data Signals from Tiny Coresets," they note that "Even for small datasets when storage of the signals is not an issue, analyzing the signals usually involves non-trivial optimization problems. Applying the algorithms on small semantic representations of the signals reduces their running time, which usually depends on the input signal size. This idea will be formalized as coresets..."
A coreset is a reduced data set that can be processed as if it were the larger set.
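As a toy illustration of the idea (not the construction used in the paper), a weighted random sample can stand in for a much larger data set in simple computations such as a mean; real coresets are built so that the approximation error is provably bounded, which a plain uniform sample does not guarantee in general.

```python
import random

def build_coreset(values, size=100):
    """Toy 'coreset': a uniform random sample with weights chosen so it
    can stand in for the full data set in weighted computations."""
    sample = random.sample(values, size)
    weight = len(values) / size   # each kept point represents this many originals
    return [(v, weight) for v in sample]

def weighted_mean(coreset):
    total_weight = sum(w for _, w in coreset)
    return sum(v * w for v, w in coreset) / total_weight

if __name__ == "__main__":
    data = [random.gauss(50, 10) for _ in range(1_000_000)]
    coreset = build_coreset(data, size=500)
    exact = sum(data) / len(data)
    approx = weighted_mean(coreset)
    print(f"exact mean {exact:.3f}, coreset estimate {approx:.3f}")
```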
Satisfaction Guaranteed
During compression, some information is lost, but Feldman, Rus, and graduate student Cynthia Sung, the paper’s third author, were able to provide strict mathematical guarantees that the error introduced will stay beneath a low threshold. In many big-data contexts, a slightly erroneous approximation is much better than a calculation that can’t be performed at all.
According to Alexandre Bayen, an associate professor of systems engineering at the University of California, Berkeley, the MIT researchers’ new paper “pioneers the field” of “extracting repeated patterns from a GPS signal and using this data to produce maps for streaming GPS data.”
“The coreset is a good solution to big-data problems because they extract efficiently the semantically important parts of the signal and use only this information for processing,” Bayen says. “These important parts are selected such that running the algorithm on the coreset data is only a little bit worse than running the algorithm on the entire data set, and this error has guaranteed bounds.”
RELATED LINKS
Speeding algorithms by shrinking data
Learning Big Data Signals from Tiny Coresets
Distributed Robotics Laboratory
Association for Computing Machinery
Algorithm Developed To Trace Source of Internet Rumor, Epidemic, or Terrorist Attack Within A Network
Algorithm Developed For Resampling Framework Of Point Set Surfaces With Gaussian Spheres
The Stock Market - High Frequency Trading, the Algorithms and the Science Behind It
MIT News: Algorithm Developed For Determining Trajectory For Robot Planes Without GPS
MIT News: Algorithm Developed To Allow Cars To Connect To Wi-Fi Network
MIT News: Researchers Develop Algorithm That Allows Robots To Learn And Understand
MIT News: Sometimes The Quickest Path Is Not A Straight Line
MIT News: The faster-than-fast Fourier transform
Computer Systems: Introduction to Siebel