
Loom provides the core capabilities enterprises need to successfully deploy a Hadoop data lake. It makes the first phase of the analytic workflow more efficient by enabling analysts to quickly find, understand, and prepare data in the data lake, leaving them more time to develop analytics. As a result, business insights are developed faster, which is the main driver of ROI for the data lake.
The primary purpose of Loom is to serve as a “workbench” for an analyst working in the Hadoop data lake, helping them to:
- Find Data – Search or browse for data in the data lake through Loom’s Workbench
- Explore Data – View data previews and navigate between related datasets
- Understand Data – In addition to data previews, Loom provides metadata that adds context to the data and helps a user understand its meaning, including statistics, business metadata, and lineage
- Prepare Data – Execute transformations that convert data from its original form into the form required for analysis, including cleaning and merging tables
Loom includes an automation framework, called Activescan, to assist with many of the underlying data management tasks necessary to support the capabilities above, including cataloging and profiling data.
Setup
If you have not already come through one of the Try Loom pages (Try Loom, Try Loom with HDP, or Try Loom with CDH), navigate to the Loom Download page to download the Loom Community edition.
There you will find two options for downloading Loom:
- Loom software-only
- Loom HDP VM
If you have an existing Hadoop cluster where you would like to install Loom, download the software and follow the installation instructions below. If you do not have an existing Hadoop cluster or you want to try Loom out before installing on your cluster, download the HDP VM to run Loom on your local machine with a single-node Hortonworks cluster.
Please be sure you meet these basic requirements:
- Software
  - Hadoop cluster: HDP 2.1, TDH 2.1, CDH 5.1
  - Browser: Chrome, Firefox
  - OS: Ubuntu, CentOS, SLES, RHEL
- Hardware
  - CPU: 2+
  - RAM: 4 GB+
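On a Linux host, the hardware minimums above can be checked from the shell before installing. The following is a minimal sketch and not part of the Loom distribution: the function name `meets_minimum` is hypothetical, and it assumes that "CPU: 2+" means two or more processing units as reported by `nproc`. Memory is rounded to the nearest GB because `/proc/meminfo` reports slightly less than the installed amount.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Loom): succeeds when the given
# CPU count and memory size (in kB) meet the minimums listed above.
# Usage: meets_minimum CPU_COUNT MEM_KB
meets_minimum() {
    _cpus=$1
    _mem_kb=$2
    # Round kB to the nearest whole GB (1 GB = 1048576 kB).
    _mem_gb=$(( (_mem_kb + 524288) / 1048576 ))
    [ "$_cpus" -ge 2 ] && [ "$_mem_gb" -ge 4 ]
}

# Read the real values on Linux: nproc prints the number of available
# processing units; MemTotal in /proc/meminfo is the usable RAM in kB.
if [ -r /proc/meminfo ] && command -v nproc >/dev/null 2>&1; then
    cpus=$(nproc)
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    if meets_minimum "$cpus" "$mem_kb"; then
        echo "OK: $cpus CPUs, $mem_kb kB RAM"
    else
        echo "Below minimum: $cpus CPUs, $mem_kb kB RAM"
    fi
fi
```

The check only covers hardware; the supported Hadoop distributions and browsers still need to be confirmed by hand.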
HDP VM Setup
Download the virtual machine and open it with VirtualBox. Loom should automatically start when the VM starts. Point your browser to:
127.0.0.1:8080
The Loom Workbench should load and prompt you to log in or register a new user. Because you are opening Loom for the first time, register a username for yourself by clicking the "Register" tab.
That's it!
Some notes on the VM:
- Loom sometimes takes 2-3 minutes to start after the VM has finished booting up. If the Workbench does not load immediately, wait a few minutes and try again.
- Activescan source scanning is enabled: any Hive databases or HDFS files (specifically, those in /user/hue/) are registered automatically. Scans run every 5 minutes. See the Loom Installation Guide for details on how to configure Activescan.
- Some sample data has already been registered in Loom to make it easier to get started.
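Because Loom can take a few minutes to start after the VM boots, it may be convenient to poll the Workbench address from a terminal instead of reloading the browser. This is a sketch only: the function name `wait_for_url` and its defaults are hypothetical, and it assumes the Workbench answers plain HTTP requests with a non-error status once it is ready.

```shell
#!/bin/sh
# Hypothetical helper: polls a URL until it answers or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
    url=$1
    attempts=${2:-30}
    delay=${3:-10}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -s hides progress output, -f treats HTTP errors (400 and above)
        # as failures, -o /dev/null discards the page body.
        if curl -sf -o /dev/null "$url"; then
            echo "Workbench is responding at $url"
            return 0
        fi
        # Pause between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "No response from $url after $attempts attempts" >&2
    return 1
}
```

For example, `wait_for_url http://127.0.0.1:8080 30 10` retries for roughly five minutes before giving up.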
Existing Cluster Setup
This is the basic process for installing Loom on a generic Hadoop cluster:
1. Download and unzip the Loom distribution
2. Set environment variables
export HADOOP_HOME=/usr/lib/hadoop
export HIVE_HOME=/usr/lib/hive
export HCAT_HOME=/usr/lib/hive-hcatalog
3. Change to the Loom distribution folder (e.g. loom-1.2.3)
4. Run the check setup script:
bin/check-setup.sh
Refer to the Loom Installation Guide if any checks fail.
5. Start the server:
bin/loom-server.sh
6. Open your browser and navigate to 'localhost:8080'
If the server has started successfully, you will see the login page for the Loom Workbench. Click the 'Register' tab and create a username and password.
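The paths in step 2 are examples and may differ on your cluster. As a quick sanity check before running the check setup script, the sketch below verifies that each variable is set and points to an existing directory. The helper name `check_home_dirs` is hypothetical and is not part of Loom; it does not replace `bin/check-setup.sh`.

```shell
#!/bin/sh
# Hypothetical pre-flight helper: verifies that each named environment
# variable is set and points to an existing directory.
# Usage: check_home_dirs VAR_NAME...
check_home_dirs() {
    status=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value" >&2
            status=1
        fi
    done
    return "$status"
}

# Keep any values already set; otherwise fall back to the paths from step 2.
export HADOOP_HOME="${HADOOP_HOME:-/usr/lib/hadoop}"
export HIVE_HOME="${HIVE_HOME:-/usr/lib/hive}"
export HCAT_HOME="${HCAT_HOME:-/usr/lib/hive-hcatalog}"

if check_home_dirs HADOOP_HOME HIVE_HOME HCAT_HOME; then
    echo "Environment variables look consistent; continue with bin/check-setup.sh"
fi
```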
Refer to the Loom Installation Guide for more detailed installation and configuration instructions.
Visit the forums on Teradata Developer Exchange for support.
Getting Started
This section of the Quick Start Guide describes how to get started working with data in Loom.
Before working with Loom, download sample data and load it into HDFS. Here is a link to download some data that has been tested with Loom:
Sample data available from: http://www2.informatik.uni-freiburg.de/~cziegler/BX/
Direct file download: http://www2.informatik.uni-freiburg.de/~cziegler/BX/BX-CSV-Dump.zip
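Loading the sample data into HDFS might look like the sketch below. The function name `load_sample_data` and the target directory are hypothetical, and the sketch assumes that `curl`, `unzip`, and the `hdfs` client are on your PATH and that the archive unpacks to CSV files at its top level.

```shell
#!/bin/sh
# Hypothetical helper: downloads the sample archive, unpacks it locally,
# and copies the CSV files into an HDFS directory.
# Usage: load_sample_data HDFS_TARGET_DIR
load_sample_data() {
    target=$1
    url=http://www2.informatik.uni-freiburg.de/~cziegler/BX/BX-CSV-Dump.zip
    workdir=$(mktemp -d) || return 1

    # -f fails on HTTP errors, -L follows redirects, -o names the output file.
    curl -fL -o "$workdir/BX-CSV-Dump.zip" "$url" || return 1
    unzip -q "$workdir/BX-CSV-Dump.zip" -d "$workdir/bx" || return 1

    # Create the target directory (and any parents), then upload the files.
    hdfs dfs -mkdir -p "$target" || return 1
    hdfs dfs -put "$workdir"/bx/*.csv "$target" || return 1

    # List the uploaded files as a quick check.
    hdfs dfs -ls "$target"
}
```

A call such as `load_sample_data /user/hue/books` would place the files under a directory of your choosing; the path shown here is only an illustration. On the HDP VM, files under /user/hue/ are picked up by Activescan source scanning.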
Create a Source
- Select the “Create a Source” option from the “Sources” menu along the top-level navigation bar
- Click the “Location” field and select the directory containing the files
- Click the magnifying glass icon next to each file to preview the data and change formatting options. You may need to check the “Has a header row?” box.
- Click “Save”
This should take you to the “Source Browser” screen, where the Source you created for the books data should be listed. Click on the books Source to explore the data and metadata.
A Source allows you to preview the data, view the schema extracted from the underlying data, and add metadata about the data. In this case, you have manually created the Source, but Sources can also be generated automatically by Activescan, in which case the key pieces of metadata (data location, format, structure) are inferred by Activescan.
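When choosing formatting options in the Source preview (such as the “Has a header row?” box), it can help to look at a local copy of the raw file first. The helpers below are a hypothetical sketch, not Loom functionality: `preview_file` prints the first lines of a file, and `guess_delimiter` reports which of comma, semicolon, or pipe occurs most often in the first line.

```shell
#!/bin/sh
# Hypothetical helper: prints the first lines of a local file.
# Usage: preview_file FILE [LINES]
preview_file() {
    head -n "${2:-3}" "$1"
}

# Hypothetical helper: guesses the field delimiter from the first line by
# counting candidate characters. Quoted fields that contain a candidate
# character can skew the count, so treat the result as a hint only.
# Usage: guess_delimiter FILE
guess_delimiter() {
    first_line=$(head -n 1 "$1")
    best=","
    best_count=0
    for d in "," ";" "|"; do
        # Delete every character except the candidate, then count what is left.
        count=$(( $(printf '%s' "$first_line" | tr -cd "$d" | wc -c) ))
        if [ "$count" -gt "$best_count" ]; then
            best=$d
            best_count=$count
        fi
    done
    printf '%s\n' "$best"
}
```

If the first line printed by `preview_file` contains column names rather than data values, the file has a header row.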
The next step is to create a Dataset from the Source, which will enable you to execute transformations against the data.
Create a Dataset
- Navigate to the books Source
- Click the “Create Dataset” button in the top right corner of the screen
- Click the “Edit” button next to each table to preview the data and schema
- Click “Save”
This should take you to the “Dataset Browser” screen, where the Dataset you created from the books Source should be listed. It will initially have a status of “pending” while the data is being processed. Once the Dataset becomes “active”, it is available for transformations.
A Dataset provides similar capabilities to a Source: a user can preview the data in the Tables of the Dataset and view or add metadata. A Dataset differs from a Source in that it is structured (a schema is now attached to the data) and it can be transformed. On the page for a Dataset, you will see a “Use in Transform” button in the top right corner, where a Source has a “Create Dataset” button. The “Use in Transform” button lets you choose “HiveQL” or “Weaver”. Each Table in a Dataset also has a “Statistics” tab, which is absent for Sources. You can generate statistics for a Table by clicking the “Compute Statistics” button or by configuring Activescan to calculate them automatically.
Now that you have a Dataset, you can try using Hive and Weaver to execute Transforms and create new Tables.
Refer to the Loom User Guide for more detailed information about how Loom is used and the capabilities it provides.
Documentation
This quick start guide is intended to help a user install the product and run a simple scenario with some sample data, in order to introduce them to the basic functionality of the tool. Please see these documents for more detail in other areas:
- Loom Installation Guide
- Loom User Guide
- Loom API Reference
The ‘docs’ folder in the Loom distribution also includes helpful technical documentation that covers more advanced configuration options.