Keywords: GraphX, Spark, Air Traffic, Graph Analytics
Spark is a general-purpose cluster computing system with APIs in Java, Scala, Python and R. It expresses multistep data pipelines as directed acyclic graphs (DAGs). Because it shares data in memory, Spark is often faster than systems that exchange intermediate data through disk. Overall, Spark provides a unified framework for Big Data processing over data sets that differ both in nature and in source (batch and real-time streaming). GraphX is the component of the Apache Spark platform for graph analytics. Dedicated graph-parallel systems are typically faster and simpler for pure graph computation, but real-world graph analytics involves both graphs and tables, which conventionally requires separate data-parallel and graph-parallel systems. Since GraphX is built on Spark, it can handle tables and graphs in the same system, which reduces processing time and the data loss that can occur when moving data between systems. GraphX treats tables and graphs as different views of the same data. In this study, US flight data from the Bureau of Transportation Statistics are analyzed using GraphX. Table and graph operators are tested on the data set.
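The idea that tables and graphs are different views of the same data can be illustrated with a minimal sketch. It is written in plain Python rather than with the GraphX API, and the airport codes and distances are hypothetical placeholders, not values from the Bureau of Transportation Statistics data. The same list of flight records is first filtered as a table and then read as a directed graph whose vertices are airports and whose edges are routes.

```python
from collections import Counter

# Hypothetical flight records: (origin, destination, distance in miles).
# These values are placeholders for illustration only.
flights = [
    ("AAA", "BBB", 100),
    ("AAA", "CCC", 250),
    ("BBB", "CCC", 180),
    ("CCC", "AAA", 250),
]

# Table view: a row-wise filter, as a data-parallel system would apply it.
long_flights = [row for row in flights if row[2] > 150]

# Graph view of the same records: airports become vertices, routes become edges.
vertices = sorted({airport for o, d, _ in flights for airport in (o, d)})
edges = [(o, d) for o, d, _ in flights]

# A simple graph operator: per-airport out-degree and in-degree.
out_degree = Counter(o for o, _ in edges)
in_degree = Counter(d for _, d in edges)
```

Both views are derived from the single `flights` collection, so no data is copied or exported between the table step and the graph step.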