
Try Teradata Loom

Teradata Loom® is an integrated big data solution for effective data and metadata management on Hadoop, enabling rapid analyst productivity by making it easy to find, access and understand data. Loom is the complete solution for getting the most from your data lake. Loom provides access to Hadoop data and metadata through an open REST API. Built on the API, the Loom Workbench is a simple browser-based UI for working with data and metadata in Hadoop.
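To give a flavour of what that looks like in practice, here is a minimal sketch of hitting such a REST API with curl. The host, port and endpoint path are purely illustrative placeholders, not Loom's documented API; consult the Loom REST documentation for the real paths:

# Purely illustrative: host, port and endpoint are hypothetical
# placeholders, not Loom's documented REST API.
curl -s -H "Accept: application/json" http://localhost:8080/api/v1/datasets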

Agile Data Preparation

Agile data preparation reduces the time required to prepare data for analysis and enables analysts to work more effectively with “big data”. Agile data preparation allows analysts to engage with the data up front and then iterate to the right schema or shape to meet the analytic requirement. Although Hadoop provides the fundamental data storage and processing engine, agile data preparation also calls for new kinds of tools for working with data.

Teradata Loom’s built-in Weaver tool provides fine-grained data wrangling and preparation capabilities. Weaver enables highly exploratory, iterative interactions with datasets to quickly prepare the data for meaningful statistical analysis. Analysts and data scientists today spend 80% of their time finding and preparing data, time that would ideally be spent on the analysis itself. With Teradata Loom’s Weaver capabilities, these data professionals can spend more time analyzing the data rather than preparing it, dramatically increasing their productivity.

Loom Weaver

Loom Weaver is designed as an interactive solution for working with big data. Built on Loom’s metadata management capabilities, Weaver helps analysts prepare Hadoop-based data for analysis. Weaver allows users to sample tables registered in Loom, then execute any number of operations on the sample. Weaver supports operations for string, numeric and date-time columns. Once all of the operations have been specified on the sample data, users submit them to be executed over the full data in HDFS. Weaver generates the MapReduce jobs required to execute the operations. As part of Loom, the lineage of all Weaver transformations is automatically tracked and incorporated into the lineage graph. Weaver increases the productivity of analysts working in Hadoop by making it easier to find and prepare the right data faster.

Hadoop Metadata Management

The Hadoop ecosystem includes powerful tools for processing and analyzing large amounts of data. However, its capabilities for managing metadata are limited. The result is that analysts find it difficult to find and understand the data, and enterprises are left to watch the data lake become a data swamp. This carries substantial risks: without a way to determine data origins, context, semantic consistency and lineage, it is very difficult for data analysts and the wider audience to find and work with the data in a seamless fashion, and organizations struggle to trust, understand and use the data effectively for their unique purposes. It is critical to have an integrated data management solution in place that ensures rapid access to high-quality, high-integrity data. Teradata Loom’s Hadoop management capabilities empower all stakeholders to maximize the return on investment from a lower-cost, more powerful data platform.

Try Loom

OK, enough marketing; let's get started with Loom.

The first thing you are going to need is Hadoop, either direct from Apache or from one of the various vendors (Hortonworks, Cloudera, Teradata, MapR).

For the work below I downloaded the Hortonworks Sandbox for VirtualBox and got it running according to their instructions for Mac. Once I had imported the sandbox appliance and let it start up, I was able to determine my basic SSH parameters, which I captured in a file called connection.properties as follows:

# connection.properties file

loom.hostname=127.0.0.1
ssh.port=2222
root.username=root
root.password=hadoop

If you want to manually execute the various commands that we will cover below, you will need to use the information in connection.properties to create an SSH connection to your Hadoop environment, and for this Sandbox that would be:

ssh root@127.0.0.1 -p 2222
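If you prefer to script the connection, here is a minimal sketch that pulls the values out of connection.properties and builds the same command. The property names contain dots, so the file cannot simply be sourced as shell variables; grep and cut are one way around that:

#!/bin/bash
# Extract connection settings from connection.properties
HOST=$(grep '^loom.hostname=' connection.properties | cut -d= -f2)
PORT=$(grep '^ssh.port=' connection.properties | cut -d= -f2)
USER=$(grep '^root.username=' connection.properties | cut -d= -f2)

# Open the SSH session to the sandbox
ssh -p "$PORT" "$USER@$HOST"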

If you are more DevOps focused and are looking for build automation examples, then start with an IDE such as Eclipse (I always go for the latest JEE version), create a workspace and then a new Java Project, as we are using Apache Ant, which is Java-based, as our build engine. I won't go into the mechanics, but when you are done you should have a Loom Project directory structure like the one sketched below.

[Image: Loom Project directory structure]
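For reference, this is roughly the layout the project ends up with; build.xml is my assumption, implied by the choice of Ant as the build engine, and your workspace may differ slightly:

Loom Project/
├── build.xml                  (Ant build file, assumed)
└── src/
    ├── config/
    │   ├── connection.properties
    │   └── loom.properties
    └── scripts/
        ├── assertFunctions.sh
        ├── hadoop.sh
        └── hive.sh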

We will fill in the various files and directory structures as we progress. Clearly the connection.properties file from above goes into the src/config directory, while the loom.properties file from the same directory is as follows (for loom-2.3.0-rc6-community.zip, which is my work-in-progress file):

# loom Properties file

LOOM_VERSION=2.3.0
LOOM_BUILD=rc6
LOOM_BASE=/opt/loom-$LOOM_VERSION-$LOOM_BUILD
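Unlike the dotted names in connection.properties, these are valid shell identifiers, so scripts can source the file directly. As a sketch of the kind of install step the build will eventually wrap, assuming the package is simply pushed to the sandbox and unzipped under /opt to produce LOOM_BASE (the official install procedure comes with the release):

#!/bin/bash
# Sketch only: the copy destination and unzip step are assumptions
# pending the official Loom release notes.
source src/config/loom.properties

# Push the Loom package to the sandbox and unpack it under /opt
scp -P 2222 loom-$LOOM_VERSION-$LOOM_BUILD-community.zip root@127.0.0.1:/opt/
ssh root@127.0.0.1 -p 2222 "cd /opt && unzip loom-$LOOM_VERSION-$LOOM_BUILD-community.zip"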

Now let's walk through the src/scripts directory, and the first file we see is assertFunctions.sh, as follows:

#-------------------------------------------------
# source into a bigger .sh program
#-------------------------------------------------
# Begin Functions
#-------------------------------------------------
function assert {

    # Capture the exit status of the command run just before assert was called
    STATUS=$?

    if [ $STATUS -ne 0 ]; then
        msg "ERROR: $1 failed (Status: $STATUS)."
        msg "UPDATE TERMINATED DUE TO FAILURE!"
        exit 1
    else
        msg "$1 successful."
    fi
}

function msg {
    echo `date` "$1"
}

#-------------------------------------------------
# End Functions
#-------------------------------------------------

These just act as helper functions to pick up build/installation errors, so simply place the file in the src/scripts directory and let the build pick it up if you choose to go that route.
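To see the helpers in action, here is a minimal sketch of a calling script; the hdfs command is just an example of a step you might want to verify, run on the sandbox where hdfs is available:

#!/bin/bash
# Pull in the assert/msg helpers
source src/scripts/assertFunctions.sh

# Run a step, then assert on its exit status
hdfs dfs -mkdir -p /user/loom/demo
assert "Create demo directory"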

There are a couple of support files, namely hadoop.sh and hive.sh, that make sure your environment variables are set up whenever you log in or SSH. They look as follows:

hadoop.sh: copied to /etc/profile.d

export HADOOP_HOME=/usr/lib/hadoop

hive.sh: copied to /etc/profile.d

export HIVE_HOME=/usr/lib/hive
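Putting these in place is a one-liner each; a sketch, assuming they live in src/scripts alongside assertFunctions.sh:

# Copy the environment scripts to the sandbox's /etc/profile.d
scp -P 2222 src/scripts/hadoop.sh src/scripts/hive.sh root@127.0.0.1:/etc/profile.d/

# Log in again and confirm the variables are set
ssh root@127.0.0.1 -p 2222
echo $HADOOP_HOME $HIVE_HOME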

This is the point where we have gone as far as we can prior to the official release of loom-2.3.0-beta1-community.zip, as you need this to proceed any further, but the countdown has commenced and we are working to get this to you on the 15th of October 2014.

Work In Progress
