
Teradata Studio 14.02 provides a Smart Loader for Hadoop feature that lets users transfer data between Teradata and Hadoop in either direction. The Hadoop Smart Loader uses the Teradata Connector for Hadoop MapReduce Java classes as its underlying data movement technology, and it requires the HCatalog metadata layer to import Hadoop objects. Currently, the Smart Loader for Hadoop in Teradata Studio 14.02 is certified with version 1.0.6 of the Teradata Connector for Hadoop and with the Hortonworks distribution of Hadoop. Now that Cloudera CDH 4.2 includes HCatalog, certification testing with Cloudera is planned.
With bi-directional data loading, users can easily perform ad hoc data transfers between their Teradata and Hadoop systems. The Hadoop Smart Loader can be invoked by dragging and dropping a table between the two systems, or from the Import from Hadoop and Export to Hadoop menu options.
Enable Hadoop Smart Loader
To use the Hadoop Smart Loader, you must first enable the Hadoop Transfer perspective within Teradata Studio. From the Window>Preferences menu option, open the Data Transfer Preferences page and check the 'Enable Hadoop Views' checkbox.
Then click the Open Perspective button in the upper toolbar, select Other..., then select Hadoop Transfer and click OK.
This will open the Hadoop Transfer perspective, providing the Hadoop View, Transfer Progress View, and Transfer History View in your Teradata Studio display.
Create Hadoop Connection Profile
Now you are ready to create a Hadoop connection profile and transfer data. Click the 'Add a Hadoop profile' button in the Hadoop View to open the Hadoop Profile dialog. Enter a name for your Hadoop connection profile, the HCatalog hostname, and the system username and password.
The Hadoop View connects to your Hadoop system and displays the list of Hadoop schemas. Open the Tables folder to view the list of tables. Right-click a table and select the Table Properties option to see its properties, such as location, file size, file type, and column names and types. To transfer data, drag a table from Hadoop and drop it on your Teradata Database, or invoke the Import and Export wizards from the Data Source Explorer.
Import a Table from Hadoop
You can invoke the Import from Hadoop wizard from the Teradata Studio Data Source Explorer. Connect to your Teradata Database and locate the Tables folder for the database you want to import into. Right-click the folder and choose the Teradata>Import from Hadoop... menu option.
Select the Hadoop connection profile, database, and table you want to import and click Next.
The next screen lets you edit the table name, the 'No Primary Index' option, and the column data types.
NOTE: Columns defined as String in Hadoop are given the Teradata column type VARCHAR with a default length of 2048. You should edit these columns to give each VARCHAR column a more appropriate size. Click the ellipsis (...) button to edit the column type.
Click OK to create the table in your Teradata Database and start the data transfer from Hadoop.
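To make the type mapping and the 'No Primary Index' option concrete, here is a minimal Python sketch of the kind of Teradata DDL the import step amounts to. It is not the statement Teradata Studio actually issues. Only the String-to-VARCHAR(2048) default comes from this article; the other type mappings and the helper names (`build_teradata_ddl`, `suggest_varchar_length`, `Column`) are hypothetical. The sizing helper shows one possible way to pick a tighter VARCHAR length from sample values before you edit the column.

```python
import math
from dataclasses import dataclass

# Hypothetical Hadoop (HCatalog) -> Teradata type mapping.
# Only "string" -> VARCHAR(2048) is documented for the Smart Loader;
# the other entries are illustrative assumptions.
DEFAULT_TYPE_MAP = {
    "string": "VARCHAR(2048)",
    "int": "INTEGER",
    "bigint": "BIGINT",
    "double": "FLOAT",
}


@dataclass
class Column:
    name: str
    hadoop_type: str


def suggest_varchar_length(samples, headroom=1.25, minimum=1):
    """Suggest a VARCHAR length: the longest sample value plus some headroom."""
    longest = max((len(s) for s in samples), default=0)
    return max(minimum, math.ceil(longest * headroom))


def build_teradata_ddl(database, table, columns, no_primary_index=True, overrides=None):
    """Build a CREATE TABLE statement; overrides maps column name -> Teradata type."""
    overrides = overrides or {}
    col_defs = []
    for col in columns:
        td_type = overrides.get(col.name) or DEFAULT_TYPE_MAP.get(col.hadoop_type.lower())
        if td_type is None:
            raise ValueError(f"no mapping for Hadoop type {col.hadoop_type!r}")
        col_defs.append(f"{col.name} {td_type}")
    # MULTISET is used because Teradata NoPI tables must be MULTISET.
    ddl = f"CREATE MULTISET TABLE {database}.{table} (\n  " + ",\n  ".join(col_defs) + "\n)"
    if no_primary_index:
        ddl += " NO PRIMARY INDEX"
    return ddl + ";"


if __name__ == "__main__":
    cols = [Column("id", "bigint"), Column("city", "string")]
    # Hypothetical sample values; in practice you would inspect the Hadoop data.
    city_len = suggest_varchar_length(["Dayton", "San Diego"])
    print(build_teradata_ddl("sales", "customers", cols,
                             overrides={"city": f"VARCHAR({city_len})"}))
```

With these sample values, the longest string is 9 characters, so the sketch sizes `city` as VARCHAR(12) instead of the 2048-character default.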
Export a Teradata Table to Hadoop
You can invoke the Export To Hadoop wizard from the Teradata Studio Data Source Explorer. Connect to your Teradata Database and locate the table you want to export to the Hadoop system. Right-click the table and choose the Data>Export To Hadoop... option.
Choose the Hadoop connection profile and database to export the table to and click Next.
Verify the column types created for the Hadoop table. You can choose either the RC (Record Columnar) or the Text file format for the transfer. Click Finish to perform the transfer.
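In Hive terms, the RC/Text choice is the storage format of the table created on the Hadoop side. The minimal Python sketch below illustrates how the two options differ only in the table's storage clause: RCFile is Hive's columnar format, while TextFile stores plain delimited text. It assumes a hypothetical helper, `build_hive_ddl`, and an illustrative Teradata-to-Hive type mapping; neither is taken from Teradata Studio, whose generated DDL may differ.

```python
# Hypothetical Teradata -> Hive type mapping (illustrative assumptions,
# not the mapping Teradata Studio actually applies).
TD_TO_HIVE = {
    "VARCHAR": "string",
    "CHAR": "string",
    "INTEGER": "int",
    "BIGINT": "bigint",
    "FLOAT": "double",
}

# Hive storage clauses for the two file formats offered by the Export wizard.
STORAGE_CLAUSE = {
    "rc": "STORED AS RCFILE",
    "text": "STORED AS TEXTFILE",
}


def build_hive_ddl(database, table, columns, file_format="rc"):
    """columns is a list of (name, teradata_type) pairs, e.g. ("city", "VARCHAR(40)")."""
    fmt = file_format.lower()
    if fmt not in STORAGE_CLAUSE:
        raise ValueError(f"unsupported file format {file_format!r}")
    col_defs = []
    for name, td_type in columns:
        # Drop any length or precision, e.g. VARCHAR(40) -> VARCHAR.
        base = td_type.split("(")[0].strip().upper()
        hive_type = TD_TO_HIVE.get(base)
        if hive_type is None:
            raise ValueError(f"no Hive mapping for Teradata type {td_type!r}")
        col_defs.append(f"{name} {hive_type}")
    return f"CREATE TABLE {database}.{table} ({', '.join(col_defs)}) {STORAGE_CLAUSE[fmt]};"


if __name__ == "__main__":
    cols = [("id", "BIGINT"), ("city", "VARCHAR(40)")]
    print(build_hive_ddl("analytics", "customers", cols, file_format="rc"))
    print(build_hive_ddl("analytics", "customers", cols, file_format="text"))
```

Keeping the storage clause in a lookup table makes the point that the column definitions are the same for both formats; only the final clause changes.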
Hadoop Data Transfer Job
A transfer job is created to move the data between Teradata and Hadoop. You can monitor the job's progress in the Transfer Progress View of the Hadoop Transfer perspective. When the job completes, an entry is added to the Transfer History and displayed in the Transfer History View.
Select the entry in the Transfer History View and click the Show Job Output toolbar button to view the output from the Hadoop transfer job.
Help
Teradata Studio provides online Help. Click Help>Help Contents in the main menu.
Conclusion
The Teradata Studio Hadoop Smart Loader is an ad hoc data movement tool for transferring data between Teradata and Hadoop. It provides a point-and-click GUI that requires no scripting. You can download Teradata Studio from the Teradata Download site. For more information about other Teradata Studio features, refer to the article Teradata Studio.