Amazon Redshift is a data warehouse cloud platform provided in AWS. This document outlines one of many possible migration paths to Hydra, geared to be accessible and demonstrate key considerations.
Redshift is a heavily modified version of PostgresSQL, making migration from Redshift to Hydra easier. However, there are some key differences which need to be addressed in the migration.
In this guide, we use a dataset which has been simplified. There are no foreign keys or other constraints (because Redshift doesn't use them). There is also no security model present. A production database will need to those address these concerns during migration. The tools chosen for this are meant to be accessible to any level of user. This document only uses a text editor of your choice, DBeaver, and the Redshift console to accomplish the migration. Command-line tools, purpose-built migration tools, or migration code in Python and Spark can be very helpful for migrating production databases. We suggest you familiarize yourself with these options before migrating.
You can read about setting up the 'tickit' database for Redshift in the AWS documentation. A free trial evaluation of Hydra is available here. Simply turning both on for the first time is all that is needed for this exercise. You will also need an S3 bucket with the permissions to talk to it from both Redshift and Hydra.
Setup a Redshift workgroup and Setting up the 'tickit' database is explained in the AWS documentation.
Start a trial of Hydra if you haven't already done so.
The data model is the structure all of the data is poured into. Our first task is to move over the structure, which will require some minor modifications.
To retrieve the DDL as a whole we will use DBeaver. As noted in the considerations, there are some parts of the Redshift DDL which will not be compatible with Hydra. You will need to edit the output from DBeaver in a text editor of your choice, but that will be described later.
In the RedShift query tool, use the following code on each table you want to extract. (change "Category" for each table name on both unload-from and to-s3 lines.)
2. Simplify the data types
- All INTs and BOOLEAN to int. If you wish, you can shorten or lengthen the ints to different sized ints which will affect memory usage. You will need to re-convert the 0-1 boolean ints back to boolean. This can be done during the INSERT.
- DATE and TIMESTAMP can be left alone
- All VARCHAR and other strings to simply text
Example output:
In case there is any type casting that needs to be done, you will need to explicitly convert the fields. For example, below we convert from int to boolean:
At this point, we have performed the main steps and considerations for migrating from Redshift. As noted in the beginning, this documentation has migrated the schema from Redshift and migrated data through a layover in S3. If more help is needed to modify code, security access models, or other items not in this documentation feel free to reach out to Hydra for professional assistance.