User Guide For Powercenter: Informatica Powerexchange For Amazon Redshift (Version 10.0)
User Guide For Powercenter: Informatica Powerexchange For Amazon Redshift (Version 10.0)
Version 10.0
November 2015
This software and documentation contain proprietary information of Informatica LLC and are provided under a license agreement containing restrictions on use and
disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any
form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica LLC. This Software may be protected by U.S. and/or
international Patents and other Patents Pending.
Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as
provided in DFARS 227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013©(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14
(ALT III), as applicable.
The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us
in writing.
Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange,
PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange Informatica
On Demand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Processing, Ultra Messaging and
Informatica Master Data Management are trademarks or registered trademarks of Informatica LLC in the United States and in jurisdictions throughout the world. All
other company and product names may be trade names or trademarks of their respective owners.
Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights
reserved. Copyright © Sun Microsystems. All rights reserved. Copyright © RSA Security Inc. All Rights Reserved. Copyright © Ordinal Technology Corp. All rights
reserved.Copyright © Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright Isomorphic Software. All rights reserved. Copyright © Meta
Integration Technology, Inc. All rights reserved. Copyright © Intalio. All rights reserved. Copyright © Oracle. All rights reserved. Copyright © Adobe Systems
Incorporated. All rights reserved. Copyright © DataArt, Inc. All rights reserved. Copyright © ComponentSource. All rights reserved. Copyright © Microsoft Corporation. All
rights reserved. Copyright © Rogue Wave Software, Inc. All rights reserved. Copyright © Teradata Corporation. All rights reserved. Copyright © Yahoo! Inc. All rights
reserved. Copyright © Glyph & Cog, LLC. All rights reserved. Copyright © Thinkmap, Inc. All rights reserved. Copyright © Clearpace Software Limited. All rights
reserved. Copyright © Information Builders, Inc. All rights reserved. Copyright © OSS Nokalva, Inc. All rights reserved. Copyright Edifecs, Inc. All rights reserved.
Copyright Cleo Communications, Inc. All rights reserved. Copyright © International Organization for Standardization 1986. All rights reserved. Copyright © ej-
technologies GmbH. All rights reserved. Copyright © Jaspersoft Corporation. All rights reserved. Copyright © International Business Machines Corporation. All rights
reserved. Copyright © yWorks GmbH. All rights reserved. Copyright © Lucent Technologies. All rights reserved. Copyright (c) University of Toronto. All rights reserved.
Copyright © Daniel Veillard. All rights reserved. Copyright © Unicode, Inc. Copyright IBM Corp. All rights reserved. Copyright © MicroQuill Software Publishing, Inc. All
rights reserved. Copyright © PassMark Software Pty Ltd. All rights reserved. Copyright © LogiXML, Inc. All rights reserved. Copyright © 2003-2010 Lorenzi Davide, All
rights reserved. Copyright © Red Hat, Inc. All rights reserved. Copyright © The Board of Trustees of the Leland Stanford Junior University. All rights reserved. Copyright
© EMC Corporation. All rights reserved. Copyright © Flexera Software. All rights reserved. Copyright © Jinfonet Software. All rights reserved. Copyright © Apple Inc. All
rights reserved. Copyright © Telerik Inc. All rights reserved. Copyright © BEA Systems. All rights reserved. Copyright © PDFlib GmbH. All rights reserved. Copyright ©
Orientation in Objects GmbH. All rights reserved. Copyright © Tanuki Software, Ltd. All rights reserved. Copyright © Ricebridge. All rights reserved. Copyright © Sencha,
Inc. All rights reserved. Copyright © Scalable Systems, Inc. All rights reserved. Copyright © jQWidgets. All rights reserved. Copyright © Tableau Software, Inc. All rights
reserved. Copyright© MaxMind, Inc. All Rights Reserved. Copyright © TMate Software s.r.o. All rights reserved. Copyright © MapR Technologies Inc. All rights reserved.
Copyright © Amazon Corporate LLC. All rights reserved. Copyright © Highsoft. All rights reserved. Copyright © Python Software Foundation. All rights reserved.
Copyright © BeOpen.com. All rights reserved. Copyright © CNRI. All rights reserved.
This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and/or other software which is licensed under various versions
of the Apache License (the "License"). You may obtain a copy of these Licenses at http://www.apache.org/licenses/. Unless required by applicable law or agreed to in
writing, software distributed under these Licenses is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied. See the Licenses for the specific language governing permissions and limitations under the Licenses.
This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software
copyright © 1999-2006 by Bruno Lowagie and Paulo Soares and other software which is licensed under various versions of the GNU Lesser General Public License
Agreement, which may be found at http:// www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any
kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose.
The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California,
Irvine, and Vanderbilt University, Copyright (©) 1993-2006, all rights reserved.
This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and
redistribution of this software is subject to terms available at http://www.openssl.org and http://www.openssl.org/source/license.html.
This product includes Curl software which is Copyright 1996-2013, Daniel Stenberg, <daniel@haxx.se>. All Rights Reserved. Permissions and limitations regarding this
software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or
without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.
The product includes software copyright 2001-2005 (©) MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms
available at http://www.dom4j.org/ license.html.
The product includes software copyright © 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to
terms available at http://dojotoolkit.org/license.
This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations
regarding this software are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html.
This product includes software copyright © 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at
http:// www.gnu.org/software/ kawa/Software-License.html.
This product includes OSSP UUID software which is Copyright © 2002 Ralf S. Engelschall, Copyright © 2002 The OSSP Project Copyright © 2002 Cable & Wireless
Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php.
This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license. Permissions and limitations regarding this software are
subject to terms available at http:/ /www.boost.org/LICENSE_1_0.txt.
This product includes software copyright © 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at
http:// www.pcre.org/license.txt.
This product includes software copyright © 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms
available at http:// www.eclipse.org/org/documents/epl-v10.php and at http://www.eclipse.org/org/documents/edl-v10.php.
This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License, http://
www.stlport.org/doc/ license.html, http://asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html, http://
httpunit.sourceforge.net/doc/ license.html, http://jung.sourceforge.net/license.txt , http://www.gzip.org/zlib/zlib_license.html, http://www.openldap.org/software/release/
license.html, http://www.libssh2.org, http://slf4j.org/license.html, http://www.sente.ch/software/OpenSourceLicense.html, http://fusesource.com/downloads/license-
agreements/fuse-message-broker-v-5-3- license-agreement; http://antlr.org/license.html; http://aopalliance.sourceforge.net/; http://www.bouncycastle.org/licence.html;
http://www.jgraph.com/jgraphdownload.html; http://www.jcraft.com/jsch/LICENSE.txt; http://jotm.objectweb.org/bsd_license.html; . http://www.w3.org/Consortium/Legal/
2002/copyright-software-20021231; http://www.slf4j.org/license.html; http://nanoxml.sourceforge.net/orig/copyright.html; http://www.json.org/license.html; http://
forge.ow2.org/projects/javaservice/, http://www.postgresql.org/about/licence.html, http://www.sqlite.org/copyright.html, http://www.tcl.tk/software/tcltk/license.html, http://
www.jaxen.org/faq.html, http://www.jdom.org/docs/faq.html, http://www.slf4j.org/license.html; http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/License; http://
www.keplerproject.org/md5/license.html; http://www.toedter.com/en/jcalendar/license.html; http://www.edankert.com/bounce/index.html; http://www.net-snmp.org/about/
license.html; http://www.openmdx.org/#FAQ; http://www.php.net/license/3_01.txt; http://srp.stanford.edu/license.txt; http://www.schneier.com/blowfish.html; http://
www.jmock.org/license.html; http://xsom.java.net; http://benalman.com/about/license/; https://github.com/CreateJS/EaselJS/blob/master/src/easeljs/display/Bitmap.js;
http://www.h2database.com/html/license.html#summary; http://jsoncpp.sourceforge.net/LICENSE; http://jdbc.postgresql.org/license.html; http://
protobuf.googlecode.com/svn/trunk/src/google/protobuf/descriptor.proto; https://github.com/rantav/hector/blob/master/LICENSE; http://web.mit.edu/Kerberos/krb5-
current/doc/mitK5license.html; http://jibx.sourceforge.net/jibx-license.html; https://github.com/lyokato/libgeohash/blob/master/LICENSE; https://github.com/hjiang/jsonxx/
blob/master/LICENSE; https://code.google.com/p/lz4/; https://github.com/jedisct1/libsodium/blob/master/LICENSE; http://one-jar.sourceforge.net/index.php?
page=documents&file=license; https://github.com/EsotericSoftware/kryo/blob/master/license.txt; http://www.scala-lang.org/license.html; https://github.com/tinkerpop/
blueprints/blob/master/LICENSE.txt; http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html; https://aws.amazon.com/asl/; https://github.com/
twbs/bootstrap/blob/master/LICENSE; https://sourceforge.net/p/xmlunit/code/HEAD/tree/trunk/LICENSE.txt; https://github.com/documentcloud/underscore-contrib/blob/
master/LICENSE, and https://github.com/apache/hbase/blob/master/LICENSE.txt.
This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution
License (http://www.opensource.org/licenses/cddl1.php) the Common Public License (http://www.opensource.org/licenses/cpl1.0.php), the Sun Binary Code License
Agreement Supplemental License Terms, the BSD License (http:// www.opensource.org/licenses/bsd-license.php), the new BSD License (http://opensource.org/
licenses/BSD-3-Clause), the MIT License (http://www.opensource.org/licenses/mit-license.php), the Artistic License (http://www.opensource.org/licenses/artistic-
license-1.0) and the Initial Developer’s Public License Version 1.0 (http://www.firebirdsql.org/en/initial-developer-s-public-license-version-1-0/).
This product includes software copyright © 2003-2006 Joe WaInes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this
software are subject to terms available at http://xstream.codehaus.org/license.html. This product includes software developed by the Indiana University Extreme! Lab.
For further information please visit http://www.extreme.indiana.edu/.
This product includes software Copyright (c) 2013 Frank Balluffi and Markus Moeller. All rights reserved. Permissions and limitations regarding this software are subject
to terms of the MIT license.
DISCLAIMER: Informatica LLC provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied
warranties of noninfringement, merchantability, or use for a particular purpose. Informatica LLC does not warrant that this software or documentation is error free. The
information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is
subject to change at any time without notice.
NOTICES
This Informatica product (the "Software") includes certain drivers (the "DataDirect Drivers") from DataDirect Technologies, an operating company of Progress Software
Corporation ("DataDirect") which are subject to the following terms and conditions:
1. THE DATADIRECT DRIVERS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT
INFORMED OF THE POSSIBILITIES OF DAMAGES IN ADVANCE. THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT
LIMITATION, BREACH OF CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS.
4 Table of Contents
Identity Columns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Amazon Redshift Source Session Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Amazon Redshift Target Sessions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Amazon Redshift Staging Directory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Vacuum Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Amazon Redshift Target Session Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Configuring HTTP Proxy Options at Design-Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Configuring HTTP Proxy Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Success and Error Files. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Success Files. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Error Files. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Table of Contents 5
Preface
The Informatica PowerExchange for Amazon Redshift User Guide for PowerCenter describes how to read
data from and write data to an Amazon Redshift target. The guide is written for database administrators and
developers who are responsible for moving data from a source to an Amazon Redshift target, and from an
Amazon Redshift source to a target. This guide assumes that you have knowledge of database engines,
Amazon Redshift, and PowerCenter.
Informatica Resources
Informatica Documentation
The Informatica Documentation team makes every effort to create accurate, usable documentation. If you
have questions, comments, or ideas about this documentation, contact the Informatica Documentation team
through email at infa_documentation@informatica.com. We will use your feedback to improve our
documentation. Let us know if we can contact you regarding your comments.
The Documentation team updates documentation as needed. To get the latest documentation for your
product, navigate to Product Documentation from https://mysupport.informatica.com.
6
Informatica Web Site
You can access the Informatica corporate web site at https://www.informatica.com. The site contains
information about Informatica, its background, upcoming events, and sales offices. You will also find product
and partner information. The services area of the site includes important information about technical support,
training and education, and implementation services.
Informatica Marketplace
The Informatica Marketplace is a forum where developers and partners can share solutions that augment,
extend, or enhance data integration implementations. By leveraging any of the hundreds of solutions
available on the Marketplace, you can improve your productivity and speed up time to implementation on
your projects. You can access Informatica Marketplace at http://www.informaticamarketplace.com.
Informatica Velocity
You can access Informatica Velocity at https://mysupport.informatica.com. Developed from the real-world
experience of hundreds of data management projects, Informatica Velocity represents the collective
knowledge of our consultants who have worked with organizations from around the world to plan, develop,
deploy, and maintain successful data management solutions. If you have questions, comments, or ideas
about Informatica Velocity, contact Informatica Professional Services at ips@informatica.com.
Online Support requires a user name and password. You can request a user name and password at
http://mysupport.informatica.com.
Preface 7
The telephone numbers for Informatica Global Customer Support are available from the Informatica web site
at http://www.informatica.com/us/services-and-training/support-services/global-support-centers/.
8 Preface
CHAPTER 1
Introduction to PowerExchange
for Amazon Redshift
This chapter includes the following topics:
Amazon Redshift views contain information about the functioning of the Amazon Redshift system. You can
run a query on views like you run a query on database tables.
You can use Amazon Redshift objects as sources and targets in mappings. When you use Amazon Redshift
objects in mappings, you must configure properties specific to Amazon Redshift.
You can configure HTTPS proxy to connect to Amazon Redshift. You can also configure an SSL connection
to connect to Amazon Redshift.
The PowerCenter Integration Service uses the Amazon driver to communicate with Amazon Redshift.
Example
You work for an organization that stores purchase order details, such as customer ID, item codes, and item
quantity in an on-premise MySQL database. You need to analyze purchase order details and move data from
the on-premise MySQL database to an affordable cloud-based environment. Create a mapping to read all the
purchase records from the MySQL database and write them to an Amazon Redshift target for data analysis.
9
PowerCenter Integration Service and Amazon
Redshift Integration
The PowerCenter Integration Service uses the Amazon Redshift connection to connect to Amazon Redshift.
The following image shows how PowerCenter connects to Amazon Redshift to read data:
When you run the Amazon Redshift session, the PowerCenter Integration Service reads data from Amazon
Redshift based on the workflow and Amazon Redshift connection configuration. The PowerCenter Integration
Service connects and reads data from Amazon Simple Storage Service (Amazon S3) through a TCP/IP
network. The PowerCenter Integration Service then stores data in a staging directory on the PowerCenter
machine. Amazon S3 is a storage service in which you can copy data from source and simultaneously move
data to any target. The PowerCenter Integration Service issues a copy command that copies data from
Amazon S3 to the target.
The following image shows how PowerCenter connects to Amazon Redshift to write data:
When you run the Amazon Redshift session, the PowerCenter Integration Service writes data to Amazon
Redshift based on the workflow and Amazon Redshift connection configuration. The PowerCenter Integration
Service stores data in a staging directory on the PowerCenter machine. The PowerCenter Integration Service
then connects and writes data to Amazon Simple Storage Service (Amazon S3) through a TCP/IP network.
Amazon S3 is a storage service in which you can copy data from source and simultaneously move data to
Amazon Redshift clusters. The PowerCenter Integration Service issues a copy command that copies data
from Amazon S3 to the Amazon Redshift target table.
Amazon Redshift uses columnar data storage, parallel processing, and data compression to store data and to
achieve fast query execution. Amazon Redshift uses a cluster-based architecture that consists of a leader
node and compute nodes. The leader node manages the compute nodes and communicates with the external
client programs. The leader node interacts with the client applications and communicates with compute
nodes. A compute node stores data and runs queries for the leader node. Any client that uses a PostgreSQL
driver can communicate with Amazon Redshift.
When you install the PowerExchange for Amazon Redshift server component, you enable the PowerCenter
Integration Service to read data from or write data to Amazon Redshift.
Prerequisites
Before you can use PowerExchange for Amazon Redshift, perform the following tasks:
12
4. Verify that you have read, write, and execute permissions on the following directories:
<Informatica Installation Directory>/server/bin
If you configure the PowerCenter Integration Service or PowerCenter Repository Service to run on primary
and backup nodes, install the PowerExchange for Amazon Redshift server component on each node
configured to run the PowerCenter Integration Service or PowerCenter Repository Service.
If you configure the PowerCenter Integration Service to run on a grid, install the PowerCenter for Amazon
Redshift server component on each node configured to run on the grid. If you cannot install the PowerCenter
for Amazon Redshift server component on each node on the grid, create a resource in the domain and assign
it to each node where you installed the PowerCenter for Amazon Redshift server component. When you
create a session, configure the session to use the resource.
1. Unzip the installation archive and navigate to the root directory of the extracted installer files.
2. Run the install.bat script file.
The Welcome page appears.
3. Click Next.
The Installation Directory page appears.
4. Enter the absolute path to the Informatica installation directory. Click Browse to find the directory or use
the default directory.
By default, the PowerCenter client is installed in the following location:
C:\Informatica\<version folder>
5. Click Next.
The Pre-Installation Summary page appears.
6. Verify that all installation requirements are met and click Install.
The installer shows the progress of the installation. When the installation is complete, the Post-
Installation Summary page displays the status of the installation.
7. Click Done to close the installer.
For more information about the tasks performed by the installer, view the installation log files.
A plug-in is an XML file that defines the functionality of PowerExchange for Amazon Redshift. To register the
plug-in, the repository must be running in exclusive mode. Use the Administrator tool or the pmrep
RegisterPlugin command to register the plug-in.
The plug-in file for PowerExchange for Amazon Redshift is AmazonRSCloudAdapterPlugin.xml. When you
install PowerExchange for Amazon Redshift, the installer copies the AmazonRSCloudAdapterPlugin.xml file
to the following directory:
Note: If you do not have the correct privileges to register the plug-in, contact the user who manages the
PowerCenter Repository Service.
1. Start PowerCenter Designer, and connect to a PowerCenter repository configured with an Amazon
Redshift instance.
2. Open a source or target folder.
3. Select Source Analyzer or Target Designer.
4. Click Sources or Targets, and then click Import from AmazonRSCloud Adapter.
15
The Establish Connection dialog box appears.
5. Specify the following information and click Connect.
AWS Secret Access Key Amazon S3 bucket secret access key ID.
6. Click Next.
7. Select the table that you want to import, and then click Finish. If you want to see the table metadata,
select the table, and click the table name.
The PowerCenter Integration Service writes the data to a staging directory and then to an Amazon S3 bucket
before it writes the data to Amazon Redshift. You must specify the location of the staging directory in the
session properties. You must also specify an Amazon S3 bucket name in the session properties. You must
have write access to the Amazon S3 bucket.
18
Amazon Redshift Connection Properties
When you configure an Amazon Redshift connection, you define the connection attributes that the
PowerCenter Integration Service uses to connect to the Amazon Redshift database.
Property Description
Property Description
AWS Secret Access Key Amazon S3 bucket secret access key ID.
You can encrypt data, specify the location of the staging directory, and securely unload the results of a query
to files on Amazon Redshift.
Client-side Encryption
Client-side encryption is a technique to encrypt data while writing the data to Amazon S3.
When you enable client-side encryption, you must provide a Master Symmetric Key in the connection
properties. The PowerCenter Integration Service encrypts the data by using the Master Symmetric Key. The
Master Symmetric Key is a 256-bit AES encryption key in the Base64 format. PowerExchange for Amazon
Redshift uploads the data to the Amazon S3 server by using the Master Symmetric Key and then loads the
data by using the copy command with the Encrypted option and a private encryption key for additional
security.
To support encryption with maximum security, you must update the security policy .jar files
local_policy.jar and US_export_policy.jar. The .jar files are located at <Informatica Installation
Directory>\ava\jre\lib\security.
Identity Columns
An identity column contains unique values that are automatically generated.
S3 Bucket Name Amazon S3 bucket name for the Amazon Redshift source data.
Use an S3 bucket in the same region as your Amazon Redshift cluster.
Enable Compression Compresses staged files before writing the files to Amazon Redshift.
Task performance improves when the PowerCenter Integration Service compresses
the staged files.
Default is selected.
Turn on S3 Client Side Indicates that the PowerCenter Integration Service encrypts data before writing the
Encryption data to Amazon S3 by using a private encryption key.
You can perform insert, update, delete, and upsert operations on an Amazon Redshift target.
The PowerCenter Integration Service deletes the staged files from the staging directory after it writes the
data to Amazon S3. Specify a staging directory in the session properties with an appropriate amount of disk
space for the volume of data that you want to process. Specify a directory on the machine that hosts the
PowerCenter Integration Service.
The PowerCenter Integration Service creates subdirectories in the staging directory. Subdirectories use the
following naming convention:
<staging directory>/infaRedShiftStaging<MMDDHHmmSS>
Vacuum Tables
You can use vacuum tables to recover disk space and sorts rows in a specified table or all tables in the
database.
After you run bulk operations, such as delete or load, or after you run incremental updates, you must clean
the database tables to recover disk space and to improve query performance on Amazon Redshift. Amazon
Redshift does not reclaim and reuse free space when you delete and update rows.
You can configure vacuum table recovery options in the session properties. You can choose to recover disk
space for the entire database or for individual tables in a database. Vacuum databases or tables often to
maintain consistent query performance. You must run vacuum when you expect minimal activity on the
database or during designated database administration schedules. Long durations of vacuum might impact
database operations. Run vacuum often because large unsorted regions result in longer vacuum times.
You can enable the vacuum tables option when you configure a session that writes to Amazon Redshift. You
can select the following recovery options:
None
Does not sort rows or recover disk space.
Full
Sorts the specified table or all tables in the database and recovers disk space occupied by rows marked
for deletion by previous update and delete operations.
Sort Only
Sorts the specified table or all tables in the database without recovering space freed by deleted rows.
Delete Only
Recovers disk space occupied by rows marked for deletion by previous update and delete operations,
and compresses the table to free up used space.
Property Description
S3 Bucket Name Amazon S3 bucket name for the Amazon Redshift target data.
Use an S3 bucket in the same region as your Amazon Redshift cluster.
Enable Compression Compresses staged files before writing the files to Amazon Redshift.
Session performance improves when the PowerCenter Integration Service compresses
the staged files.
Default is selected.
Batch Size Minimum number of rows in a batch. Enter a number greater than 0.
Default is 1000.
Max Redshift Errors Number of errors within a batch that causes a batch to fail. Enter a positive integer.
per Upload Batch for If the number of errors is equal to or greater than the property value, the PowerCenter
INSERT Integration Service writes the entire batch to the error file.
Default is 1.
Truncate Target Truncates an Amazon Redshift target before writing data to the target.
Table Before Data
Load
Null value for CHAR String value used to represent null values in CHAR and VARCHAR fields in Amazon
and VARCHAR data Redshift targets, such as NULL or a space character.
types Default is an empty string.
Wait time in seconds Number of seconds to wait for the PowerCenter Integration Service to make the staged
for file consistency files consistent with the list of files available on Amazon S3.
on S3 Default is 5.
Turn on S3 Server Indicates that Amazon S3 encrypts data during upload and decrypts data at the time of
Side Encryption access.
Turn on S3 Client Indicates that the PowerCenter Integration Service encrypts data by using a private
Side Encryption encryption key.
If you enable both server side and client side encryption, the PowerCenter Integration
Service ignores the server side encryption.
Vacuum Target Recovers disk space and sorts rows in a specified table or all tables in the database.
Table You can select the following recovery options:
None
Full
Sorts the specified table or all tables in the database and recovers disk space
occupied by rows marked for deletion by previous update and delete operations.
Sort Only
Sorts the specified table or all tables in the database without recovering space freed
by deleted rows.
Delete Only
Recovers disk space occupied by rows marked for deletion by previous update and
delete operations, and compresses the table to free up used space.
Default is None.
INSERT If enabled, the PowerCenter Integration Service inserts all rows flagged for insert. If
disabled, the PowerCenter Integration Service rejects the rows flagged for insert.
By default, the insert operation is enabled.
DELETE If enabled, the PowerCenter Integration Service deletes all rows flagged for delete. If
disabled, the PowerCenter Integration Service rejects all rows flagged for delete.
UPDATE Performs update and upsert operations. To perform an update operation, you must map
the primary key column and at least one column other than primary key column.
You can select the following session target attributes:
Update as Update
The PowerCenter Integration Service updates existing rows and inserts other rows as
if marked for insert.
Error File Directory Directory for the Amazon Redshift error file.
Specify a directory on the machine that hosts the PowerCenter Integration Service.
• Ensure that you enable the proxy server settings from your web browser.
• Access the jvmoptions.ini file from the following location: <Informatica Installation Location>
\clients\PowerCenterClient\client\bin
• Add the following properties to the jvmoptions.ini file:
Property Description
-Dhttp.proxyUser= Authenticated user name for the HTTP proxy server. This is required if the proxy
server requires authentication.
-Dhttp.proxyPassword= Password for the authenticated user. This is required if the proxy server requires
authentication.
Note: The password is in plain text and not encrypted.
The PowerCenter Integration Service does not overwrite success or error files. You can manually delete the
files that you no longer need.
Success Files
The PowerCenter Integration Service generates a success file after you run a session. The success file
contains an entry for each record that successfully writes into Amazon Redshift. Each entry contains the
values that are written for all the fields of the record. Use this file to understand the data that the
PowerCenter Integration Service writes to the Amazon S3 bucket and then to the Amazon Redshift target.
If you want the PowerCenter Integration Service to generate a success file, specify a directory for the
success file in the session properties. Specify a directory on the machine that hosts the PowerCenter
Integration Service.
• If you do not specify a directory for the success file as a session property, the PowerCenter Integration
Service writes an empty file to the following directory: $PMTargetFileDir.
• The PowerCenter Integration Service generates the success file with the following naming convention:
<sessionName>_<timestamp>_success.csv.
• If you do not map a primary key column in a mapping, the insert, update, upsert, and delete operations
fail.
Error Files
The error file contains an entry for each data error. Each entry in the file contains the values for all fields of
the record and the error message. Use the error file to understand why the PowerCenter Integration Service
does not write data to the Amazon Redshift target.
If you want the PowerCenter Integration Service to generate an error file, specify a directory for the error file
in the session properties. You must not specify a network path or shared path.
Consider the following guidelines when you configure the session properties for error files:
• If you do not specify a directory for the error file as a session property, the PowerCenter Integration
Service writes a blank file to the following directory: $PMBadFileDir.
• The PowerCenter Integration Service generates errors file with the following naming convention:
<sessionName>_<timestamp>_error.csv.
For example, SMALLINT is the Amazon Redshift data type name. The data type is stored as a 2-byte integer.
Here, SMALLINT is the Amazon Redshift data type name and INT2 is the Amazon Redshift data type alias.
The following table compares the Amazon Redshift data types and the transformation data types:
27
Amazon Redshift Amazon Redshift Data Description Transformation Data
Data Type Type Aliases Type
TIMESTAMP TIMESTAMP WITHOUT Date and time (without time zone). Timestamp
TIME ZONE
A P
Amazon Redshift plug-ins
connection 18, 19 registering PowerExchange for Redshift 14
sessions overview 18 prerequisites 12
source session properties 21
target session properties 22
Amazon Redshift data types
comparing with transformation data types 27 R
overview 27 rules and guidelines
success files
error files 25
C
Configure
Amazon Redshift connection 19 S
source details 15
success file 25
E
error file 26
I
installing
PowerExchange for Amazon Redshift 13
29