Today marks a significant advancement in data management with the general availability of Amazon Aurora PostgreSQL-Compatible Edition and Amazon DynamoDB zero-ETL integrations with Amazon Redshift. This development is a game-changer for businesses seeking to streamline their data processing workflows. The zero-ETL integration, a concept that eliminates the need for the traditional extract, transform, and load (ETL) processes, allows businesses to make their transactional or operational data readily available in Amazon Redshift for analytics and machine learning. This seamless integration automates data replication, ensuring that data is consistently updated and available for analytical processing.
With zero-ETL integrations, businesses can run comprehensive analytics on data from various applications without the hassle of constructing and managing multiple data pipelines. This integration simplifies the process of consolidating data from diverse relational and non-relational sources into a singular data warehouse. In this article, we’ll delve into how you can get started with Amazon Aurora PostgreSQL and Amazon DynamoDB zero-ETL integrations with Amazon Redshift.
Understanding Zero-ETL Integration
To implement a zero-ETL integration, you need to define a source and designate Amazon Redshift as the target. The integration automatically replicates data from the source to the target data warehouse, ensuring a seamless data flow into Amazon Redshift. It also monitors the health of the pipeline to ensure data integrity and availability.
Let’s break down the process of setting up these integrations, focusing on how to create zero-ETL integrations that replicate data from Aurora PostgreSQL and DynamoDB to Amazon Redshift. We’ll also explore how to choose multiple tables or databases from Aurora PostgreSQL to replicate data to a single Amazon Redshift cluster. This integration offers flexibility without the burden of managing numerous ETL pipelines.
Getting Started with Aurora PostgreSQL Zero-ETL Integration
Before setting up your database, it’s crucial to create a custom cluster parameter group since Aurora PostgreSQL zero-ETL integration with Amazon Redshift requires specific parameters. In the Amazon RDS console, navigate to the Parameter groups section and create a new parameter group. Name it custom-pg-aurora-postgres-zero-etl, select Aurora PostgreSQL for the Engine type, and choose aurora-postgresql16 for the Parameter group family. This integration is compatible with PostgreSQL versions 16.4 and above.
Once the parameter group is created, modify it by selecting it from the Parameter groups page. Click on Actions and choose Edit to set the following parameters:
- rds.logical_replication=1
- aurora.enhanced_logical_replication=1
- aurora.logical_replication_backup=0
- aurora.logical_replication_globaldb=0
After saving these changes, proceed to create an Aurora PostgreSQL database. During the setup, choose Aurora PostgreSQL (compatible with PostgreSQL 16.4 or above) from the available versions and select the custom cluster parameter group (custom-pg-aurora-postgres-zero-etl) in the Additional configuration section.
Once your database is ready, connect to the Aurora PostgreSQL cluster, create a database named books, and a table called book_catalog. Insert some sample data to use with the zero-ETL integration.
To initiate a zero-ETL integration, use an existing Amazon Redshift data warehouse. For assistance with creating and managing Amazon Redshift resources, refer to the Amazon Redshift Getting Started Guide.
In the Amazon RDS console, navigate to the Zero-ETL integrations tab and select Create zero-ETL integration. Enter postgres-redshift-zero-etl as the Integration identifier and provide a description. Follow the prompts to select your source database and apply data filtering options using the database.schema.table pattern. Include your book_catalog table from the Aurora PostgreSQL books database. The * wildcard in filters will replicate all book_catalog tables across all schemas within the books database.
Next, browse Redshift data warehouses and select your target Amazon Redshift data warehouse. Specify authorized principals and integration source on the target to allow Amazon Aurora to replicate data into the warehouse. You can choose to let Amazon RDS handle these configurations automatically or configure them manually in Amazon Redshift.
After configuring the case sensitivity parameter and resource policy for the data warehouse, proceed to add tags and encryption. Review your setup and create the zero-ETL integration.
Once the integration succeeds, access the integration details and create a database from the integration. Enter zeroetl_aurorapg as the Destination database name and finalize the setup.
When the database is created, return to the Aurora PostgreSQL integration page to query data. Running a select query in the zeroetl_aurorapg database should show that the data in the book_catalog table is successfully replicated to Amazon Redshift.
By selecting multiple tables or databases from the Aurora PostgreSQL source database, you can replicate additional data to the same Amazon Redshift cluster. To add more tables, update the Data filtering options to include new tables using the database.schema.table format. For example, you can create another table named publisher in the Aurora PostgreSQL cluster and insert sample data. Modify the Data filtering options to include the publisher table for replication.
Switch to the Amazon Redshift Query editor to verify that the new publisher table and its records are replicated to the data warehouse.
Getting Started with DynamoDB Zero-ETL Integration
To create an Amazon DynamoDB zero-ETL integration, use an existing DynamoDB table named Book_Catalog. In the Amazon Redshift console, navigate to Zero-ETL integrations, select Create zero-ETL integration, and choose Create DynamoDB integration. Provide a name and description for the integration and proceed to the next steps.
Choose your DynamoDB table and specify a resource policy with authorized principals and integration sources. Enable point-in-time recovery (PITR) on the source table before creating the integration. Amazon DynamoDB can apply the necessary configurations for you, or you can do it manually.
Select your existing Amazon Redshift Serverless data warehouse as the target and proceed to add tags and encryption. Review your settings and create the DynamoDB integration.
Create a database from the integration by entering zeroetl_dynamodb as the Destination database name. Once the database is created, query the data to confirm that the data from the DynamoDB Book_Catalog table is replicated to Amazon Redshift.
Insert another entry into the DynamoDB Book_Catalog table and verify that the new record is replicated to the data warehouse by refreshing the select query in Amazon Redshift.
Benefits and Availability
Zero-ETL integrations between Aurora PostgreSQL and DynamoDB with Amazon Redshift unify data from multiple database clusters, unlocking valuable insights. Amazon Redshift supports cross-database queries and materialized views based on multiple tables, simplifying analytics, improving operational efficiency, and optimizing costs. With zero-ETL, there’s no need to manage complex ETL pipelines.
Aurora PostgreSQL zero-ETL integration with Amazon Redshift is now available in several AWS Regions, including US East (N. Virginia), US East (Ohio), US West (Oregon), and others in Asia Pacific and Europe. Amazon DynamoDB zero-ETL integration is available in all commercial, China, and GovCloud AWS Regions.
For pricing details, refer to the Amazon Aurora and Amazon DynamoDB pricing pages. To learn more about these integrations, consult the Working with Aurora zero-ETL integrations with Amazon Redshift and Amazon Redshift Zero-ETL integrations documentation.
For more Information, Refer to this article.