shou2017.com

AWS Certification, My Memo (Database Edition)

Sat Jul 11, 2020
Sat Jun 8, 2024
AWS

Amazon RDS

The official name is Amazon Relational Database Service, abbreviated as RDS.

It’s Amazon’s managed relational database service. It supports engines such as MySQL and PostgreSQL, and personally I’m also interested in trying Amazon Aurora. Amazon provides all sorts of convenient database services, though I don’t know all the details yet.

Storage Types

RDS uses EBS for data storage. The available EBS storage types are:

  • General Purpose SSD
  • Provisioned IOPS SSD
  • Magnetic

You can expand storage online, but performance may drop while the expansion is in progress.

Multi-AZ Configuration

When creating a DB instance in RDS, you can simply select Multi-AZ configuration, and AWS will automatically set up the environment needed for DB redundancy.

AZ stands for Availability Zone. There are two or more availability zones in a single region.

The reason for having multiple availability zones is to design for failure. Even if one availability zone goes down due to a disaster, another can take over, making the system highly resilient.

So, having multiple availability zones = being in multiple zones = Multi-AZ configuration. Just remember it that way.

However, there are some drawbacks to this convenient Multi-AZ configuration in RDS:

  • Write speed becomes slower
    • Because data is synchronized between two AZs, write and commit times are longer than with a single AZ configuration.
  • Failover takes time
    • If a failure occurs, there is a wait while the DB endpoint is switched over to point at the standby instance.
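
As a rough sketch of what "simply select Multi-AZ" looks like outside the console, the snippet below builds a DB instance creation request with the `MultiAZ` flag set. The parameter names follow boto3's `create_db_instance`; the identifier, instance class, and storage size are placeholder values I made up, and the actual API call is left as a comment so the snippet runs without AWS credentials.

```python
def build_db_instance_request(identifier: str, multi_az: bool = True) -> dict:
    """Build keyword arguments for an RDS create_db_instance request."""
    return {
        "DBInstanceIdentifier": identifier,  # hypothetical instance name
        "Engine": "mysql",
        "DBInstanceClass": "db.t3.micro",    # placeholder instance class
        "AllocatedStorage": 20,              # GiB, placeholder size
        "StorageType": "gp2",                # General Purpose SSD
        "MultiAZ": multi_az,                 # True = standby in another AZ
    }


request = build_db_instance_request("memo-db")
print(request["MultiAZ"])

# With credentials configured, the request would be sent roughly like this:
#   import boto3
#   boto3.client("rds").create_db_instance(
#       MasterUsername="admin", MasterUserPassword="<secret>", **request)
```

Everything else about the redundancy (the standby instance and the synchronization between AZs) is handled by AWS once the flag is on.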

Read Replica

A Read Replica is a feature that lets you create a DB instance dedicated to read operations, separate from the main RDS instance.

Data synchronization between the master and read replica is asynchronous, so depending on timing, updates to the master may not be immediately reflected in the read replica. However, unlike Multi-AZ, it doesn’t affect the performance of the master DB.
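
To make the "asynchronous" part concrete, here is a toy model in plain Python. It is not how RDS is implemented; the `Primary` and `ReadReplica` classes are made up for illustration. The point is only that a write returns before the replica has applied it, so a read in between sees old data.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # changes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value         # the write completes immediately
        self.log.append((key, value))  # replication happens later


class ReadReplica:
    def __init__(self):
        self.data = {}

    def apply(self, primary):
        """Apply all pending changes from the primary (the async step)."""
        for key, value in primary.log:
            self.data[key] = value
        primary.log.clear()

    def read(self, key):
        return self.data.get(key)


primary, replica = Primary(), ReadReplica()
primary.write("stock", 10)
stale = replica.read("stock")  # replication has not run yet
replica.apply(primary)
fresh = replica.read("stock")
print(stale, fresh)  # prints: None 10
```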

Backup and Restore

  • Automated Backups
    • This feature automatically takes a backup (DB snapshot) once a day. To restore the DB, you select the snapshot you want from the list.
  • Manual Snapshots
    • You can manually take a backup (DB snapshot) of the RDB at any time.
  • Data Restore
    • You can restore data just by selecting the snapshot you want to revert to from the snapshot list.
  • Point-in-Time Recovery
    • This feature creates a new RDS instance restored to a point in time you choose, anywhere from about 5 minutes ago back to a maximum of 35 days ago (the upper limit of the backup retention period).

Security

  • Network Security
    • Connections to the DB instance can be encrypted using SSL.
  • Data Encryption
    • Data related to RDS, such as storage, snapshots, and logs, is kept in an encrypted state. Encryption has to be chosen when the DB instance is created; it cannot be turned on later for an existing instance.
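
Since encryption cannot be switched on for a running instance, the usual workaround is to go through a snapshot: take a snapshot, copy it with a KMS key, and restore a new instance from the encrypted copy. The sketch below only lists those steps as data. The operation and parameter names follow boto3's RDS client as far as I know; the identifiers and the helper function itself are made up.

```python
def encryption_migration_plan(instance_id: str, kms_key_id: str) -> list:
    """List the RDS API calls needed to get an encrypted copy of an instance."""
    plain = f"{instance_id}-plain"          # hypothetical snapshot names
    encrypted = f"{instance_id}-encrypted"
    return [
        ("create_db_snapshot", {
            "DBInstanceIdentifier": instance_id,
            "DBSnapshotIdentifier": plain,
        }),
        ("copy_db_snapshot", {
            "SourceDBSnapshotIdentifier": plain,
            "TargetDBSnapshotIdentifier": encrypted,
            "KmsKeyId": kms_key_id,          # specifying a key encrypts the copy
        }),
        ("restore_db_instance_from_db_snapshot", {
            "DBInstanceIdentifier": f"{instance_id}-enc",
            "DBSnapshotIdentifier": encrypted,
        }),
    ]


for operation, params in encryption_migration_plan("memo-db", "alias/aws/rds"):
    print(operation, params)
```

Each tuple would map to one call on `boto3.client("rds")`, with a wait for the snapshot to become available between steps.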

Amazon Aurora

Features

  • You don’t need to specify the capacity when creating a cluster volume; it automatically scales up or down.
  • Unlike the other RDS engines, there is no separate Multi-AZ configuration option. However, you can create read replicas. If the primary instance fails, a replica instance is promoted to primary.

Endpoints

When you create an Aurora cluster, three types of endpoints (FQDNs) are created: the cluster endpoint, the reader endpoint, and instance endpoints. A fourth type, the custom endpoint, is one you define yourself.

  • Cluster Endpoint
    • Endpoint for connecting to the primary instance. It allows all operations on the database, such as read, create, update, delete, and schema changes.
  • Reader Endpoint
    • Endpoint for connecting to the replica instances. It only accepts read operations on the database, and connections are spread across the available replicas.
  • Instance Endpoint
    • Endpoint for connecting to each DB instance constituting the Aurora cluster. If connected to the primary, all operations are allowed; if connected to a replica instance, only read operations are allowed.
  • Custom Endpoint
    • A user-defined endpoint that routes to a chosen subset of the DB instances in the cluster. It’s suitable for directing high-load processing to specific instances.
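
In application code, the difference between the cluster endpoint and the reader endpoint usually shows up as a small routing decision. Here is a minimal sketch; the hostnames are invented examples in the general shape Aurora uses, and the prefix check is deliberately naive (for instance, it would send `SELECT ... FOR UPDATE` to a replica, which a real application should not do).

```python
# Hypothetical endpoints for a cluster named "memo"
CLUSTER_ENDPOINT = "memo.cluster-abc123.ap-northeast-1.rds.amazonaws.com"
READER_ENDPOINT = "memo.cluster-ro-abc123.ap-northeast-1.rds.amazonaws.com"

READ_ONLY_PREFIXES = ("select", "show")


def pick_endpoint(sql: str) -> str:
    """Send read-only statements to the reader endpoint, the rest to the cluster endpoint."""
    statement = sql.lstrip().lower()
    if statement.startswith(READ_ONLY_PREFIXES):
        return READER_ENDPOINT
    return CLUSTER_ENDPOINT


print(pick_endpoint("SELECT * FROM users"))
print(pick_endpoint("UPDATE users SET name = 'x' WHERE id = 1"))
```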

Redshift

Amazon Redshift is the data warehouse service provided by AWS. A data warehouse is, as the name suggests, a warehouse for data: you accumulate data chronologically and then analyze it.

Redshift Configuration

Redshift consists of a cluster, a leader node, and compute nodes. The key to mastering Redshift is designing a data distribution that lets processing complete without spanning multiple compute nodes.

  • Redshift Cluster
    • The group of nodes that makes up a Redshift deployment.
  • Leader Node
    • There is only one leader node per cluster, and as the name suggests, it acts as a command center.
  • Compute Node
    • Nodes that receive instructions from the leader node and process the query.
  • Node Slice
    • The smallest unit of Redshift’s distributed parallel processing.
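
The idea of "a distribution structure that avoids spanning nodes" can be sketched in plain Python. Rows are assigned to slices by hashing a distribution key; if two tables share the same key, matching rows land on the same slice and can be joined locally. The hash function (`zlib.crc32`), the slice count, and the sample tables are my own choices for illustration, not Redshift's actual internals.

```python
import zlib
from collections import defaultdict


def slice_for(key, slice_count: int) -> int:
    """Map a distribution key value to a slice using a stable hash."""
    return zlib.crc32(str(key).encode()) % slice_count


def distribute(rows, dist_key: str, slice_count: int):
    """Group rows by the slice their distribution key hashes to."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[dist_key], slice_count)].append(row)
    return slices


SLICES = 4
orders = [{"customer_id": c, "amount": a} for c, a in [(1, 50), (2, 20), (1, 30), (3, 70)]]
customers = [{"customer_id": c, "name": n} for c, n in [(1, "A"), (2, "B"), (3, "C")]]

order_slices = distribute(orders, "customer_id", SLICES)
customer_slices = distribute(customers, "customer_id", SLICES)


def local_join(slice_id: int):
    """Join orders to customers using only the rows stored on one slice."""
    names = {c["customer_id"]: c["name"] for c in customer_slices[slice_id]}
    return [(names[o["customer_id"]], o["amount"]) for o in order_slices[slice_id]]


joined = [pair for s in range(SLICES) for pair in local_join(s)]
print(sorted(joined))
```

Because both tables are distributed on `customer_id`, every order finds its customer on its own slice and no rows have to move between slices.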

Features of Redshift

  • Columnar database
  • Supports many compression encoding methods
  • Zone maps
  • Flexible scalability
  • Workload management features
  • Redshift Spectrum

DynamoDB

A highly scalable Key-Value type database. Personally, I’m also paying attention to it, and the fact that it requires less operational overhead is attractive.

Features of DynamoDB

  • High availability design
  • Throughput capacity
    • Throughput capacity refers to read and write processing capacity. It can be scaled automatically as the load changes.
  • Data partitioning
    • Data is distributed and stored in units called partitions.
  • Automatic maintenance of expired data (Time to Live, TTL)
  • DynamoDB Streams
  • Consistent Read
    • By default, reads are eventually consistent, so a read right after a write may return older data. When you need the latest value, you can request a strongly consistent read.
  • DynamoDB Accelerator (DAX)
    • An extended service that configures a cache cluster in front of DynamoDB.
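
The role of a cache placed in front of a table, as DAX is, can be sketched as a read-through cache. This is a generic pattern in plain Python, not the DAX client API; the class name and the dictionary standing in for a table are made up.

```python
class ReadThroughCache:
    """Serve repeated reads from memory, falling back to the table on a miss."""

    def __init__(self, load_from_table):
        self._load = load_from_table  # function that reads the backing table
        self._items = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self._load(key)       # only misses reach the table
        self._items[key] = value
        return value


table = {"user#1": {"name": "Alice"}}  # stand-in for a DynamoDB table
cache = ReadThroughCache(table.get)
cache.get("user#1")
cache.get("user#1")
print(cache.hits, cache.misses)  # prints: 1 1
```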

Primary Key and Index

I haven’t designed a DynamoDB table yet, but this is the most important part.

  • Primary Key
    • An attribute, or a pair of attributes (partition key and sort key), used to uniquely identify a data item. It is also used as an index.
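
To get a feel for how a primary key doubles as an index, here is a toy in-memory table keyed by a partition key plus a sort key. The `Table` class and its method names only loosely mirror DynamoDB's operations and are assumptions for illustration.

```python
from collections import defaultdict


class Table:
    """Toy key-value table addressed by (partition key, sort key)."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # pk -> {sk: item}

    def put_item(self, pk, sk, attributes):
        # Writing the same (pk, sk) again replaces the item: the key is unique.
        self._partitions[pk][sk] = {"pk": pk, "sk": sk, **attributes}

    def get_item(self, pk, sk):
        return self._partitions.get(pk, {}).get(sk)

    def query(self, pk):
        """Return all items in one partition, ordered by sort key."""
        items = self._partitions.get(pk, {})
        return [items[sk] for sk in sorted(items)]


orders = Table()
orders.put_item("user#1", "2024-06-02", {"total": 30})
orders.put_item("user#1", "2024-06-01", {"total": 50})
orders.put_item("user#2", "2024-06-01", {"total": 20})
print([item["sk"] for item in orders.query("user#1")])
```

Lookups by the full key or by partition key are cheap; anything else (say, "all orders over 40") has no index to lean on, which is where secondary indexes come in.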

DynamoDB also has local secondary indexes and global secondary indexes, but these are not the essence of how to use a Key-Value type database, so in some cases, it might be better to consider switching to an RDB.

ElastiCache

An in-memory database service provided by AWS. If you’re at the level of the AWS Solutions Architect certification, you can probably get by just remembering that ElastiCache is an in-memory database service.

What is an in-memory database service?

An in-memory database is a type of specialized database. Unlike databases that store data on disks or SSDs, they primarily rely on memory for data storage. In-memory databases are designed to achieve minimal response times by eliminating the need to access disks. Since all data is stored and managed only in the main memory, there is a risk of data loss due to processing or server failures. However, in-memory databases can persist data by logging all operations or taking snapshots.

In-memory databases are ideal for applications that require microsecond response times and may experience traffic spikes at any time, such as game leaderboards, session stores, and real-time analytics.

Source: AWS, “What is an in-memory database?”

As always, it’s full of technical terms and hard to understand, but the bottom line is that it’s faster because it relies on memory instead of SSDs or other storage.
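
One of the use cases mentioned above, the game leaderboard, is small enough to sketch. This version keeps everything in a Python dictionary, which is the "data lives in memory" idea in miniature; with ElastiCache for Redis the same thing would typically be done with a sorted set. The `Leaderboard` class is made up for illustration.

```python
class Leaderboard:
    """Keep player scores in memory and return the top entries."""

    def __init__(self):
        self._scores = {}

    def add_score(self, player: str, points: int) -> None:
        self._scores[player] = self._scores.get(player, 0) + points

    def top(self, n: int):
        # Highest score first; ties are broken by player name.
        ranked = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


board = Leaderboard()
board.add_score("alice", 10)
board.add_score("bob", 30)
board.add_score("alice", 25)
print(board.top(2))  # prints: [('alice', 35), ('bob', 30)]
```

Like any purely in-memory structure, the scores are gone if the process dies, which is the data-loss risk the quote above mentions.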

Amazon EMR

A managed cluster platform for processing and analyzing large amounts of data using big data frameworks like Apache Hadoop and Apache Spark.

When you see the term big data, just think of Amazon EMR.

Amazon Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so you don’t have to manage any infrastructure.

When you see words like S3, SQL, or query, just think of Amazon Athena.
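
For a sense of what "SQL against S3" looks like from code, the sketch below builds the request for an Athena query. The parameter names follow boto3's `start_query_execution`; the database name, table name, and bucket are invented, and the actual call is left as a comment so the snippet runs without AWS credentials.

```python
def build_athena_request(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for an Athena start_query_execution request."""
    if not output_location.startswith("s3://"):
        raise ValueError("query results must be written to an S3 location")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


request = build_athena_request(
    "SELECT status, COUNT(*) FROM access_logs GROUP BY status",  # hypothetical table
    database="memo_db",                                          # hypothetical database
    output_location="s3://memo-athena-results/",                 # hypothetical bucket
)
print(request["QueryExecutionContext"])

# With credentials configured, the query would be started roughly like this:
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
```

There is no cluster or instance anywhere in the request, which is the "serverless" part: you only say what to query and where to put the results.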

References

AWS provides ample documentation, so you might not need books. However, when it comes to exam preparation, paper books that you can open anywhere are convenient, so it’s worth buying some reference books.

AWS認定 クラウドプラクティショナー 改訂第3版

AWS認定資格試験テキスト AWS認定ソリューションアーキテクト - アソシエイト 改訂第3版
