Redshift


  • Redshift is based on PostgreSQL, but itโ€™s not used for OLTP
  • Itโ€™s OLAP - Online Analytical Processing (Analytics and Data warehousing)
  • 10x better performance than other data warehouses, scale to PBs of data
  • Columnar storage of data (instead of row-based) & parallel query engine
  • Pay as you go based on the instances provisioned
  • Has a SQL interface for performing the queries
  • BI tools such as Amazon Quicksight or Tableau integrate with it
  • vs Athena: faster queries/joins/aggregations thanks to indexes

Redshift Cluster

  • Leader node: for query planning, results aggregation
  • Compute node: for performing the queries, send results to the leader
  • You provision the node size in advance
  • You can use Reserved Instances for cost savings

Snapshots & DR

  • Redshift has Multi-AZ mode for some clusters
  • Snapshots are point-in-time backups of a cluster, stored internally in S3
  • Snapshots are incremental (only what has changed is saved)
  • You can restore a snapshot into a new cluster
  • Automated: every 8 hours, every 5 GB, or on a schedule. Set retention
  • Manual: snapshot is retained until you delete it
  • You can configure Redshift to automatically copy snapshots (automated or manual) of a cluster to another Region

Redshift Spectrum

  • Query data that is already in S3 without loading it
  • Must have a Redshift cluster available to start the query
  • The query is then submitted to thousands of Redshift Spectrum nodes

References