Amazon Elastic MapReduce (EMR)


  • EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
  • The clusters can be made of hundreds of EC2 instances
  • EMR comes bundled with Apache Spark, HBase, Presto, Flink, …
  • EMR takes care of all the provisioning and configuration
  • Supports auto-scaling and integrates with Spot Instances
  • Use cases: data processing, ML, web indexing, big data, …

Node types & Purchasing

  • Master Node: Manages the cluster, coordinates work, and monitors health - long-running
  • Core Node: Runs tasks and stores data - long-running
  • Task Node (optional): Only runs tasks - usually Spot
  • Purchasing options
    • On-Demand: Reliable, predictable, won't be terminated
    • Reserved (min 1 year): Cost savings (EMR will automatically use Reserved Instances if available)
    • Spot Instances: Cheaper, can be terminated, less reliable
  • Can have a long-running cluster, or transient (temporary) cluster
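
As a rough illustration of how the node types and purchasing options above fit together, the sketch below builds a request for boto3's EMR `run_job_flow` call: On-Demand master and core groups, an optional Spot task group, and a flag that switches between a transient and a long-running cluster. The helper names (`build_cluster_request`, `launch`), the instance types and counts, the release label, and the use of the default EMR roles are all assumptions made for this example, not recommendations.

```python
from typing import Any, Dict, List


def build_cluster_request(name: str, task_nodes: int = 2,
                          transient: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Hypothetical helper: instance types, counts, and the release label
    are illustrative placeholders.
    """
    instance_groups: List[Dict[str, Any]] = [
        # Master node: manages the cluster, so keep it On-Demand.
        {"Name": "Master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": "m5.xlarge", "InstanceCount": 1},
        # Core nodes: run tasks and store data, so keep them On-Demand too.
        {"Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": "m5.xlarge", "InstanceCount": 2},
    ]
    if task_nodes > 0:
        # Task nodes are optional and only run tasks, so Spot is a common fit.
        instance_groups.append(
            {"Name": "Task", "InstanceRole": "TASK", "Market": "SPOT",
             "InstanceType": "m5.xlarge", "InstanceCount": task_nodes})

    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",          # assumed release label
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": instance_groups,
            # False: the cluster shuts down once it has no steps left (transient).
            # True: the cluster stays up waiting for work (long-running).
            "KeepJobFlowAliveWhenNoSteps": not transient,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumes the default roles exist
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request: Dict[str, Any]) -> str:
    """Submit the request; needs boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3
    emr = boto3.client("emr")
    return emr.run_job_flow(**request)["JobFlowId"]
```

A transient cluster would normally also pass a "Steps" list so that it has work to run before it terminates. Calling `launch(build_cluster_request("nightly-etl"))` would start real, billable instances, so the builder is kept separate and can be inspected on its own.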
