Data Mining on Amazon Product Reviews Dataset
This work was part of the final project for the Computational Tools for Big Data course offered by DTU in the academic year 2016/17. Using big data technologies such as Apache Spark, Neo4j, Pandas DataFrames, and SQL, the goal was to put these tools to work on an extensive analysis of an inconveniently large dataset, while dealing with the difficulties that arise in storing, handling, and processing data at that scale.
Links
- Source code: GitHub repository
- Project report