Cloudera Data Management
Important Notice
Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service
names or slogans contained in this document are trademarks of Cloudera and its
suppliers or licensors, and may not be copied, imitated or used, in whole or in part,
without the prior written permission of Cloudera or the applicable trademark holder.
Hadoop and the Hadoop elephant logo are trademarks of the Apache Software
Foundation. All other trademarks, registered trademarks, product names and
company names or logos mentioned in this document are the property of their
respective owners. Reference to any products, services, processes or other
information, by trade name, trademark, manufacturer, supplier or otherwise does
not constitute or imply endorsement, sponsorship or recommendation thereof by
us.
Complying with all applicable copyright laws is the responsibility of the user. Without
limiting the rights under copyright, no part of this document may be reproduced,
stored in or introduced into a retrieval system, or transmitted in any form or by any
means (electronic, mechanical, photocopying, recording, or otherwise), or for any
purpose, without the express written permission of Cloudera.
Cloudera, Inc.
1001 Page Mill Road Bldg 2
Palo Alto, CA 94304
info@cloudera.com
US: 1-888-789-1488
Intl: 1-650-362-0488
www.cloudera.com
Release Information
Auditing.......................................................................................................................6
Audit Log Properties....................................................................................................................................6
Service Auditing Properties........................................................................................................................8
Auditing Impala Operations...................................................................................................................................12
Audit Events and Audit Reports...............................................................................................................13
Viewing Audit Events.............................................................................................................................................14
Filtering Audit Events.............................................................................................................................................14
Creating Audit Event Reports................................................................................................................................15
Editing Audit Event Reports..................................................................................................................................15
Downloading Audit Event Reports.......................................................................................................................15
Audit Event Fields...................................................................................................................................................16
Downloading HDFS Directory Access Permission Reports...................................................................19
Metadata...................................................................................................................20
Metadata Search........................................................................................................................................22
Search Syntax..........................................................................................................................................................22
Search Properties...................................................................................................................................................22
Accessing Metadata..................................................................................................................................25
Navigator Metadata UI...........................................................................................................................................25
Navigator API..........................................................................................................................................................28
Modifying Business Metadata.................................................................................................................28
Policies......................................................................................................................34
Policy Expressions.....................................................................................................................................35
Lineage Diagrams....................................................................................................43
Displaying a Template Lineage Diagram.................................................................................................45
Displaying an Instance Lineage Diagram................................................................................................47
Displaying the Template Lineage Diagram for an Instance Lineage Diagram....................................48
Downloading a Lineage File......................................................................................................................48
Impala Lineage Properties........................................................................................................................60
Schema.......................................................................................................................................................61
Displaying Hive, Sqoop, and Impala Table Schema.............................................................................................61
Displaying Pig Table Schema.................................................................................................................................61
Displaying HDFS Dataset Schema........................................................................................................................62
About Cloudera Data Management
Important: This feature is available only with a Cloudera Enterprise license; it is not available in
Cloudera Express. For information on Cloudera Enterprise licenses, see Managing Licenses.
Cloudera Navigator is a fully integrated data management tool for the Hadoop platform. Data management
capabilities are critical for enterprise customers that are in highly regulated industries and have stringent
compliance requirements.
Cloudera Navigator provides two categories of functionality:
• Auditing data access and verifying access privileges - The goal of auditing is to capture a complete and
immutable record of all activity within a system. While Hadoop has historically lacked centralized
cross-component audit capabilities, products such as Cloudera Navigator add secured, real-time audit
components to key data and access frameworks. Cloudera Navigator allows administrators to configure,
collect, and view audit events, to understand who accessed what data and how. Cloudera Navigator also
allows administrators to generate reports that list the HDFS access permissions granted to groups.
Cloudera Navigator tracks access permissions and actual accesses to all entities in HDFS, Hive, HBase, Impala,
and Sentry to help answer questions such as: who has access to which entities, which entities were accessed
by a user, when was an entity accessed and by whom, what entities were accessed using a service, and which
device was used for access. Cloudera Navigator auditing supports tracking access to:
• HDFS data accessed through HDFS, Hive, HBase, Cloudera Impala, and Cloudera Search services
• HBase and Impala operations
• Hive metadata
• Sentry access
• Solr access
• Cloudera Navigator Metadata Server access
• Searching metadata and visualizing lineage - Cloudera Navigator metadata management features allow
DBAs, data modelers, business analysts, and data scientists to search for, amend the properties of, and tag
data entities.
In addition, to satisfy risk and compliance audits and data retention policies, it supports the ability to answer
questions such as: where did the data come from, where is it used, and what are the consequences of purging
or modifying a set of data entities. Cloudera Navigator supports tracking the lineage of HDFS files, datasets,
and directories, Hive tables and columns, MapReduce and YARN jobs, Hive queries, Impala queries, Pig scripts,
Oozie workflows, Spark jobs, and Sqoop jobs.
Auditing
Cloudera Navigator auditing provides data auditing and access features. The Cloudera Navigator auditing
architecture is illustrated below.
When Cloudera Navigator auditing is configured, plug-ins that enable collection and filtering of audit events are
added to the HDFS, HBase, and Hive (that is, the HiveServer2 and Beeswax servers) services. The plug-ins write
the audit events to an audit log on the local filesystem. Cloudera Impala and Sentry collect and filter audit
events themselves and write them directly to an audit log file.
The Cloudera Manager Agent monitors the audit log files and sends these events to the Navigator Audit Server.
The Cloudera Manager Agent retries any event that it fails to transmit. Because no in-memory transient buffer
is involved, once the audit events are written to the audit log file, they are guaranteed to be delivered (as long
as the filesystem is available). The Cloudera Manager Agent tracks the offset of the last audit event in the audit
log that it has successfully transmitted, so after any crash or restart it picks up from the last successfully sent
position and resumes. Audit logs are rotated, and the Cloudera Manager Agent follows the rotation of the log.
The Agent also purges old audit logs once they have been successfully transmitted to the Navigator Audit
Server. If a plug-in fails to write an audit event to the audit log file, it either drops the event or shuts down the
process in which it is running (depending on the configured queue policy).
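The following minimal Java sketch illustrates this at-least-once, offset-tracking delivery scheme; all class and
method names are hypothetical, since the Agent's actual implementation is not exposed:

import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch: follow an audit log from the last acknowledged offset so that,
// after a crash or restart, delivery resumes without losing events.
public class AuditLogFollower {
    private long offset;                                // persisted position (stubbed here)

    public void follow(String path) throws IOException {
        try (RandomAccessFile log = new RandomAccessFile(path, "r")) {
            log.seek(offset);                           // resume from the last successful send
            String event;
            while ((event = log.readLine()) != null) {
                while (!send(event)) { }                // retry until the server acknowledges
                offset = log.getFilePointer();
                saveOffset(offset);                     // only advance the offset after success
            }
        }
    }

    private boolean send(String event) { return true; } // stub: transmit to the Navigator Audit Server
    private void saveOffset(long o) { }                 // stub: persist the offset durably
}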
The Navigator Audit Server performs the following functions:
• Tracking and coalescing events
• Storing events to the audit DB
Audit Log Properties
• Audit Log Directory - The directory in which audit log files are written. By default, this property is not set if
Cloudera Navigator is not installed.
Note: If the value of this property is changed and the service is restarted, the Cloudera Manager
Agent starts monitoring the new log directory for audit events. In this case it is possible that
not all events from the old audit log directory are published. To avoid loss of audit events when
this property is changed, perform the following steps:
1. Stop the service.
2. Copy audit log files and (for Impala only) the impalad_audit_wal file from the old audit log
directory to the new audit log directory. This needs to be done on all the hosts where Impala
Daemons are running.
3. Start the service.
• Maximum Audit Log File Size - The maximum size of the audit log file before a new file is created. The unit
of the file size is service dependent:
– HDFS, HBase, Hive, Navigator Metadata Server, Sentry, Solr - MiB
– Impala - lines (queries)
• Number of Audit Logs to Retain - Maximum number of rolled over audit logs to retain. The logs will not be
deleted if they contain audit events that have not yet been propagated to the Audit Server.
Note: If the queue policy is Shutdown, the Impala service is shut down only if Impala is unable to
write to the audit log file. It is possible that an event is written to the audit log but does not appear
in the audit database because of an error in transfer to the Cloudera Manager Agent or the database.
In such cases Impala does not shut down and keeps writing to the log file. When the transfer problem
is fixed, the events are transferred to the database.
This property is not supported for the Cloudera Navigator Metadata Server.
Service Auditing Properties
The Audit Event Filter and Audit Event Tracker rules for filtering and coalescing events are expressed as JSON
objects, which you can edit using a rule editor. For information on the structure of the objects and the properties
for which you can set filters, display the description on the configuration page as follows:
1. In the Cloudera Manager Admin Console, go to a service that supports auditing.
2. Click the Configuration tab.
3. Select Scope > Service (Service-Wide).
4. Select Category > Cloudera Navigator category.
5. In the Audit Event Tracker row, click the help icon. For example, the Hive properties are (see the illustrative
filter after this list):
• userName: the user performing the action.
• ipAddress: the IP from where the request originated.
• operation: the Hive operation being performed.
• databaseName: the databaseName for the operation.
• tableName: the tableName for the operation.
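An illustrative Audit Event Filter sketch using these properties (the exact JSON schema is described on the
configuration page; this example assumes the common defaultAction/rules/fields structure) that discards Hive
query events generated by the hdfs user:

{
  "comment" : "Discard Hive QUERY events generated by the hdfs user",
  "defaultAction" : "accept",
  "rules" : [
    {
      "action" : "discard",
      "fields" : [
        { "name" : "userName", "match" : "hdfs" },
        { "name" : "operation", "match" : "QUERY" }
      ]
    }
  ]
}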
Required Role:
Follow this procedure for all cluster services that support auditing. In addition, for Impala and Solr auditing,
perform the steps in Configuring Impala Daemon Logging on page 10 and Enabling Solr Auditing on page 10.
1. Go to a service that supports auditing.
2. Click the Configuration tab.
3. Select Scope > Service (Service-Wide).
4. Select Category > Cloudera Navigator category.
5. Edit the properties.
6. Click Save Changes to commit the changes.
7. Restart the service.
Configuring Impala Daemon Logging
Required Role:
To control whether the Impala Daemon role logs to the audit log:
1. Click the Impala service.
2. Click the Configuration tab.
3. Select Scope > Impala Daemon.
4. Select Category > Logs.
5. Edit the Enable Impala Audit Event Generation property.
6. Click Save Changes to commit the changes.
7. Restart the service.
To set the log file size:
1. Click the Impala service.
2. Select Scope > Impala Daemon.
3. Select Category > Logs.
4. Set the Impala Daemon Maximum Audit Log File Size property.
5. Click Save Changes to commit the changes.
6. Restart the service.
Enabling Solr Auditing
Required Role:
Solr auditing is disabled by default. To enable auditing:
1. Enable Sentry authorization for Solr following the procedure in Enabling Sentry Authorization for Solr.
2. Go to the Solr service.
3. Click the Configuration tab.
4. Select Scope > Solr Service (Service-Wide).
5. Select Category > Policy File Based Sentry category.
6. Select or deselect the Enable Sentry Authorization checkbox.
7. Select Category > Cloudera Navigator category.
8. Select or deselect the Enable Audit Collection checkbox. See Audit Log Properties on page 6.
9. Click Save Changes to commit the changes.
10. Restart the service.
Required Role:
Navigator Metadata Server auditing is enabled by default. To enable or disable auditing:
Required Role:
The Audit Server logs all audit records into a Log4j logger called auditStream. The log messages are logged at
the TRACE level, with the attributes of the audit records. By default, the auditStream logger is inactive because
the logger level is set to FATAL. It is also connected to a NullAppender, and does not forward to other appenders
(additivity set to false).
To record the audit stream, configure the auditStream logger with the desired appender. For example, the
standard SyslogAppender allows you to send the audit records to a remote syslog.
The Log4j SyslogAppender supports only UDP. An example syslog configuration would be:
$ModLoad imudp
$UDPServerRun 514
# Accept everything (even DEBUG messages)
local2.* /my/audit/trail.log
It is also possible to attach other appenders to the auditStream to provide other integration behaviors.
You can audit events to syslog in two formats: JSON and RSA EnVision. To configure audit logging to syslog, do
the following:
1. Do one of the following:
• Select Clusters > Cloudera Management Service > Cloudera Management Service.
• On the Status tab of the Home page, in Cloudera Management Service table, click the Cloudera Management
Service link.
2. Click the Configuration tab.
3. Locate the Navigator Audit Server Logging Advanced Configuration Snippet property by typing
its name in the Search box.
4. Depending on the format type, enter:
log4j.logger.auditStream = TRACE,SYSLOG
log4j.appender.SYSLOG = org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.SyslogHost = hostname
log4j.appender.SYSLOG.Facility = Local2
log4j.appender.SYSLOG.FacilityPrinting = true
For the JSON format, also set:
log4j.additivity.auditStream = false
If a particular field is not applicable for that audit event, it is omitted from the message.
Auditing Impala Operations
For example, for an INSERT ... SELECT statement, a query operation occurs on the source table and an insert
operation occurs on the destination table. The audit log for a query against a view records the base table accessed
by the view, or multiple base tables in the case of a view that includes a join query. Every Impala operation that
corresponds to a SQL statement is recorded in the audit logs, whether the operation succeeds or fails. Impala
records more information for a successful operation than for a failed one, because an unauthorized query is
stopped immediately, before all the query planning is completed.
The information logged for each query includes:
• Client session state:
– Session ID
– User name
– Network address of the client connection
• SQL statement details:
– Query ID
– Statement Type - DML, DDL, and so on
– SQL statement text
– Execution start time, in local time
– Execution Status - Details on any errors that were encountered
– Target Catalog Objects:
– Object Type - Table, View, or Database
– Fully qualified object name
– Privilege - How the object is being used (SELECT, INSERT, CREATE, and so on)
Filtering Audit Events
Adding a Filter
1. Do one of the following:
• Click the icon that displays next to a field when you hover over one of the event entries.
• Click the Filters link. The Filters pane displays.
1. Click Add New Filter to add a filter.
2. Choose a field in the drop-down list. You can search by fields such as username, service name, or
operation. The fields vary depending on the service or role. The service name of the Navigator Metadata
Server is Navigator.
3. Choose an operator in the operator drop-down list.
4. Type a field value in the value text field. To match a substring, use the like operator and specify %
around the string. For example, to see all the audit events for files created in the folder /user/joe/out
specify Source like %/user/joe/out%.
A filter control with field, operation, and value fields is added to the list of filters.
2. Click Apply. A field, operation, and value breadcrumb is added above the list of audit events and the list of
events displays all events that match the filter criteria.
Removing a Filter
1. Do one of the following:
• Click the x next to the filter above the list of events. The list of events displays all events that match the
filter criteria.
• Click the Filters link. The Filters pane displays.
1. Click the x at the right of the filter.
2. Click Apply. The filter is removed from above the list of audit events and the list of events displays all
events that match the remaining filter criteria.
For example, the following API call retrieves Hive audit events in a given time window:

curl http://Navigator_Metadata_Server_host:port/api/v5/audits/?query=service%3D%3Dhive&startTime=1431025200000&endTime=1431032400000\
&limit=5&offset=0&format=JSON&attachment=false -X GET -u username:password
startTime and endTime are required parameters and must be specified in epoch time in milliseconds.
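For example, a sketch that retrieves the last hour of Hive events, computing the window with GNU date (an
assumption about the client environment; epoch seconds are multiplied by 1000):

START=$(($(date -d '1 hour ago' +%s) * 1000))
END=$(($(date +%s) * 1000))
curl "http://Navigator_Metadata_Server_host:port/api/v5/audits/?query=service%3D%3Dhive&startTime=$START&endTime=$END&limit=5&offset=0&format=JSON&attachment=false" \
-X GET -u username:password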
[ {
"timestamp" : "2015-05-07T20:34:39.923Z",
"service" : "hive",
"username" : "hdfs",
"ipAddress" : "12.20.199.170",
"command" : "QUERY",
"resource" : "default:sample_08",
"operationText" : "INSERT OVERWRITE \n TABLE sample_09 \nSELECT \n
sample_07.code,sample_08.description \n FROM sample_07 \n JOIN sample_08 \n WHERE
sample_08.code = sample_07.code",
"allowed" : true,
"serviceValues" : {
"object_type" : "TABLE",
"database_name" : "default",
"operation_text" : "INSERT OVERWRITE \n TABLE sample_09 \nSELECT \n
sample_07.code,sample_08.description \n FROM sample_07 \n JOIN sample_08 \n WHERE
sample_08.code = sample_07.code",
"resource_path" : "/user/hive/warehouse/sample_08",
"table_name" : "sample_08"
}
}, {
"timestamp" : "2015-05-07T20:33:50.287Z",
"service" : "hive",
"username" : "hdfs",
"ipAddress" : "12.20.199.170",
"command" : "SWITCHDATABASE",
"resource" : "default:",
"operationText" : "USE default",
"allowed" : true,
"serviceValues" : {
"object_type" : "DATABASE",
"database_name" : "default",
"operation_text" : "USE default",
"resource_path" : "/user/hive/warehouse",
"table_name" : ""
}
}, {
"timestamp" : "2015-05-07T20:33:23.792Z",
"service" : "hive",
"username" : "hdfs",
"ipAddress" : "12.20.199.170",
"command" : "CREATETABLE",
"resource" : "default:",
"operationText" : "CREATE TABLE sample_09 (code string,description string) ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\\t' STORED AS TextFile",
"allowed" : true,
"serviceValues" : {
"object_type" : "DATABASE",
"database_name" : "default",
"operation_text" : "CREATE TABLE sample_09 (code string,description string) ROW
FORMAT DELIMITED FIELDS TERMINATED BY '\\t' STORED AS TextFile",
"resource_path" : "/user/hive/warehouse",
"table_name" : ""
}
} ]
Audit Event Fields
The following fields can appear in an audit event:
• IP Address (ipAddress) - The IP address of the host where the action occurred.
• Name (name) - Name of a policy, saved search, or audit report in Navigator Metadata Server.
• Object Type (object_type) - For Sentry, Hive, and Impala, the type of the object (TABLE, VIEW, DATABASE)
on which the operation was performed.
• Operation (command) - The action performed.
– HBase - createTable, deleteTable, modifyTable, addColumn, modifyColumn, deleteColumn, enableTable,
disableTable, move, assign, unassign, balance, balanceSwitch, shutdown, stopMaster, flush, split, compact,
compactSelection, getClosestRowBefore, get, exists, put, delete, checkAndPut, checkAndDelete,
incrementColumnValue, append, increment, scannerOpen, grant, revoke
– HDFS - setPermission, setOwner, open, concat, setTimes, createSymlink, setReplication, create, append,
rename, delete, getfileinfo, mkdirs, listStatus, fsck, listSnapshottableDirectory
– Hive - EXPLAIN, LOAD, EXPORT, IMPORT, CREATEDATABASE, DROPDATABASE, SWITCHDATABASE,
DROPTABLE, DESCTABLE, DESCFUNCTION, MSCK, ALTERTABLE_ADDCOLS, and so on
• Operation Params (operation_params) - Solr query or update parameters used when performing the action.
• Operation Text (operation_text) - For Sentry, Hive, and Impala, the SQL query that was executed by the user.
• Permissions (permissions) - HDFS permissions of the file or directory on which the HDFS operation was
performed.
• Privilege (privilege) - Privilege needed to perform an Impala operation.
• Qualifier (qualifier) - HBase column qualifier.
• Query ID (query_id) - The query ID for an Impala operation.
• Resource (resource) - A service-dependent combination of multiple fields generated during fetch. This field
is not supported for filtering as it is not persisted.
• Table Name (table_name) - For Sentry, HBase, Hive, and Impala, the name of the table on which the action
was performed.
• Username (username) - The name of the user who performed the action.
Metadata
Cloudera Navigator metadata features provide data discovery and data lineage functions. The Cloudera Navigator
metadata architecture is illustrated below.
Metadata Extraction
The Navigator Metadata Server extracts metadata for the following resource types from the listed servers:
• HDFS - Extracts HDFS metadata at the next scheduled extraction run after an HDFS checkpoint. However,
if you have high availability enabled, metadata is extracted as soon as it is written to the JournalNodes.
• Hive - Extracts database and table metadata from the Hive Metastore Server.
• Impala - Extracts database and table metadata from the Hive Metastore Server. Extracts query metadata
from the Impala Daemon lineage logs.
• MapReduce - Extracts job metadata from the JobTracker. The default setting in Cloudera Manager retains a
maximum of five jobs; if you run more than five jobs between Navigator extractions, the Navigator Metadata
Server extracts only the five most recent jobs.
• Oozie - Extracts Oozie workflows from the Oozie Server.
• Pig - Extracts Pig script runs from the JobTracker or Job History Server.
• Sqoop 1 - Extracts database and table metadata from the Hive Metastore Server. Extracts job runs from the
JobTracker or Job History Server.
• YARN - Extracts job metadata from the Job History Server.
If an entity is created at time t0 in the system, that entity is extracted and linked in Navigator after the
extraction poll period (default 10 minutes) plus a service-specific interval, as follows (a worked example appears
after this list):
• HDFS: t0 + extraction poll period + HDFS checkpoint interval (default 1 hour)
• HDFS + HA: t0 + extraction poll period
• Hive: t0 + extraction poll period + Hive maximum wait time (default 60 minutes)
• Impala: t0 + extraction poll period
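For example, with the default settings, a Hive table created at 10:00 is extracted and linked by approximately
11:10 at the latest (10:00 plus the 10-minute poll period plus the 60-minute Hive maximum wait time), whereas
on an HA-enabled HDFS deployment a new file appears within roughly 10 minutes of creation.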
Metadata Indexing
After metadata is extracted, it is indexed and made available for searching by an embedded Solr engine. The Solr
schema indexes two types of metadata: entity properties and relationships between entities.
You can search entity metadata using the Navigator UI. Relationship metadata is implicitly visible in lineage
diagrams and explicitly available in a lineage file.
Metadata Search
Search is implemented by an embedded Solr engine that supports the syntax described in LuceneQParserPlugin.
Search Syntax
You construct search strings by specifying the value of a default property, property name-value pairs, or
user-defined name-value pairs using the syntax:
• Property name-value pairs - propertyName:value, where
– propertyName is one of the properties listed in Search Properties on page 22.
– value is a single value or range of values specified as [value1 TO value2]. In a value, * is a wildcard.
In property name-value pairs you must escape special characters :, -, and * with the backslash character
\. For example, fileSystemPath:/tmp/hbase\-staging.
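For example, the following are all valid search strings (illustrative values):
• sample_07 - matches entities with a default property, such as name or originalName, equal to sample_07
• type:file - matches entities whose type is file
• fileSystemPath:/tmp/hbase\-staging - the escaped-path example above
• (sourceType:hdfs) and (created:[2015-01-01T00:00:00Z TO NOW]) - HDFS entities created since January 1, 2015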
Note: When viewing MapReduce jobs in the Cloudera Manager Activities page, the string that appears
in a job's Name column equates to the originalName property. Therefore, to specify a MapReduce
job's name in a search, use the following string: (sourceType:mapreduce) and
(originalName:jobName), where jobName is the value in the job's Name column.
Search Properties
A reference for the search schema properties.
Default Properties
The following properties can be searched by simply specifying a property value: type, fileSystemPath, inputs,
jobId, mapper, mimeType, name, originalName, outputs, owner, principal, reducer, tags.
Common Properties
Query
• queryText (string) - The text of a Hive or Sqoop query.
Source
• clusterName (string) - The name of the cluster in which the entity is stored.
• sourceId (string) - The ID of the source type.
• sourceType (caseInsensitiveText) - The source type of the entity: hdfs, hive, impala, mapreduce, oozie, pig,
sqoop, yarn.
Timestamps
The available timestamp fields vary by the source type:
• hdfs - lastModified, lastAccessed
• hive - created, lastAccessed
• impala, mapreduce, pig, sqoop, and yarn - started, ended
Timestamps are specified in the Solr Date Format. For example:
• lastAccessed:[* TO NOW]
• created:[1976-03-06T23:59:59.999Z TO *]
• started:[1995-12-31T23:59:59.999Z TO 2007-03-06T00:00:00Z]
• ended:[NOW-1YEAR/DAY TO NOW/DAY+1DAY]
• created:[1976-03-06T23:59:59.999Z TO 1976-03-06T23:59:59.999Z+1YEAR]
• lastAccessed:[1976-03-06T23:59:59.999Z/YEAR TO 1976-03-06T23:59:59.999Z]
HDFS Properties
Operation Properties
Hive Properties
Oozie Properties
Pig Properties
Sqoop Properties
Accessing Metadata
Required Role:
You can access metadata through the Navigator UI or through the Navigator API.
Navigator Metadata UI
Searching Metadata
1. Start and log into the Navigator UI.
Search Results
The Search Results pane displays the number of matching entries in pages listing 25
entities per page. You can view the pages using the page control at the bottom of each
page.
Each entry in the result list contains:
• Source type
• Name - the name is a link to a page that displays the entity property editor and lineage diagram
• Properties
• If Hue is running, a link at the far right labeled View in Hue that opens the Hue browser for the entity:
– HDFS directories and files - File Browser
– Hive database and tables - Metastore Manager
– MapReduce, YARN, Pig - Job Browser
– If a property has no values, click add a new value, click the text box and select from the populated values
in the drop-down list or type a value.
• Timestamp - Timestamps are used for started, ended, created, last accessed, and last modified properties.
The server stores the timestamp in UTC and the UI displays the timestamp converted to the local timezone.
Select one of the timestamp options:
– A Last XXX day(s) link.
– The Last checkbox; type a value and select minutes, hours, or days using the spinner control.
– The Custom period checkbox; specify the start and end date:
– Date - Click the down arrow to display a calendar and select a date, or click a field and use the
spinner arrows or the up and down arrow keys.
– Time - Click the hour, minute, and AM/PM fields and use the spinner arrows or the up and down
arrow keys to specify the value.
– Move between fields using the right and left arrow keys.
To remove filter values, click the x in the breadcrumb or deselect the checkbox.
When you select a specific source type value, additional properties that apply to that source type display. For
example, HDFS has size, created, and group properties. The number in parentheses (facet count) after a property
value is the number of extracted entities that have that property value.
When you type values, the value is enclosed in quotes; the value inside the quotes must exactly match the
metadata. For example, typing "sample_*" in the originalName property returns only entities whose names
match that exact string. To perform a wildcard search, type the wildcard string in the Search box. For example,
typing the string "sample_*" in the Search box returns all entities with "sample_" at the beginning of their original
name.
When you construct search strings with filters, multiple values of a given property are combined with the OR
operator, and multiple properties are combined with the AND operator. To specify different operators, for example
to OR properties, explicitly type the search string containing OR'd properties in the Search box.
Saving Searches
1. Specify a search string or set of filters.
2. Select Actions > Save As....
3. Specify a name and click OK.
Navigator API
The Navigator API allows you to search entity metadata using a REST API. For information about the API, see
Cloudera Navigator API.
Modifying Business Metadata
Required Role:
You can specify special characters (for example, ".", " ") in the name, but doing so makes searching for the entity
more difficult because some characters collide with special characters in the search syntax.
5. Click Save. The new metadata appears in the metadata pane.
Metadata files are JSON files with the following structure:
{
"name" : "aName",
"description" : "a description",
"properties" : {
"prop1" : "value1", "prop2" : "value2"
},
"tags" : [ "tag1" ]
}
To add metadata files to files and directories, create a metadata file with the extension .navigator, naming
the files as follows:
• File - The path of the metadata file must be .filename.navigator. For example, to apply properties to the
file /user/test/file1.txt, the metadata file path is /user/test/.file1.txt.navigator.
• Directory - The path of the metadata file must be dirpath/.navigator. For example, to apply properties to
the directory /user, the metadata path must be /user/.navigator.
The metadata file is applied to the entity metadata when the extractor runs.
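For example, a sketch (assuming a shell with an HDFS client and the JSON structure shown above) that attaches
a tag to the file /user/test/file1.txt:

cat > .file1.txt.navigator <<'EOF'
{ "name" : "aName", "description" : "a description", "tags" : [ "tag1" ] }
EOF
hdfs dfs -put .file1.txt.navigator /user/test/.file1.txt.navigator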
Modifying HDFS and Hive Business Metadata Using the Navigator API
You can use the Cloudera Navigator API to modify the metadata of HDFS or Hive entities whether or not the
entities have been extracted. If an entity has been extracted at the time the API is called, the metadata will be
applied immediately. If the entity has not been extracted, you can preregister metadata which is then applied
once the entity is extracted. Metadata is saved regardless of whether or not a matching entity is extracted, and
Navigator does not perform any cleanup of unused metadata.
If you call the API before the entity is extracted, the metadata is stored with the entity’s identity, source ID,
metadata fields (name, description, tags, properties), and the fields relevant to the identifier. The rest of the
entity fields (such as type) will not be present. To view all stored metadata, you can use the API to search for
entities without an internal type:
curl http://Navigator_Metadata_Server_host:port/api/v5/entities/?query=-internalType:*
-u username:password -X GET
The metadata provided via the API overwrites existing metadata. If, for example, you call the API with an empty
name and description, empty array for tags, and empty dictionary for properties, the call removes this metadata.
If you leave out the tags or properties fields, the existing values remain unchanged.
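For example, the following sketch (reusing the PUT syntax shown below) clears the name, description, tags, and
properties of an entity; omitting tags and properties entirely would instead leave those values unchanged:

curl http://Navigator_Metadata_Server_host:port/api/v5/entities/identity -u
username:password -X PUT -H\
"Content-Type: application/json" -d '{"name":"","description":"","tags":[],"properties":{}}'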
Modifying metadata using HDFS metadata files and the metadata API at the same time is not supported. You
must use one or the other, because the two methods behave slightly differently. Metadata specified in files is
merged with existing metadata whereas the API overwrites metadata. Also, the updates provided by metadata
files wait in a queue before being merged, but API changes are committed immediately. This means there may
be some inconsistency if a metadata file is being merged around the same time the API is in use.
You modify metadata using either the PUT or POST method: use the PUT method if the entity has been extracted,
and the POST method to preregister metadata. The syntax of the methods is:
• PUT
curl http://Navigator_Metadata_Server_host:port/api/v5/entities/identity -u
username:password -X PUT -H\
"Content-Type: application/json" -d '{properties}'
All existing naming rules apply, and if any value is invalid, the entire request will be denied.
• POST
curl http://Navigator_Metadata_Server_host:port/api/v5/entities/ -u
username:password -X POST -H\
"Content-Type: application/json" -d '{properties}'
curl http://Navigator_Metadata_Server_host:port/api/v5/entities/?query=type:SOURCE
-u username:password -X GET
For example:
[ ...
{
"identity" : "a09b0233cc58ff7d601eaa68673a20c6",
"originalName" : "HDFS-1",
"sourceId" : null,
"firstClassParentId" : null,
"parentPath" : null,
"extractorRunId" : null,
"name" : "HDFS-1",
"description" : null,
"tags" : null,
"properties" : null,
"clusterName" : "Cluster 1",
"sourceUrl" : "hdfs://hostname:8020",
"sourceType" : "HDFS",
"sourceExtractIteration" : 4935,
"type" : "SOURCE",
"internalType" : "source"
}, ...
If you have multiple services of a given type, you must specify the source ID of the service that contains the
entity you expect to match.
– parentPath: The path of the parent entity, defined as:
– HDFS file or directory: fileSystemPath of the parent directory (do not provide this field if the entity
being affected is the root directory). Example parentPath for /user/admin/input_dir: /user/admin.
If you add metadata to a directory, the metadata does not propagate to any files and folders in that
directory.
– Hive database: If you are updating database metadata, you do not specify this field.
– Hive table or view: The name of database containing the table or view. Example for a table in the
default database: default.
– Hive column: database name/table name/view name. Example for a column in the sample_07 table:
default/sample_07.
All existing naming rules apply, and if any value is invalid, the entire request will be denied.
HDFS PUT Example for /user/admin/input_dir Directory
curl
http://Navigator_Metadata_Server_host:port/api/v5/entities/e461de8de38511a3ac6740dd7d51b8d0
-u username:password -X PUT -H "Content-Type: application/json"\
-d '{"name":"my_name","description":"My description",
"tags":["tag1","tag2"],"properties":{"property1":"value1","property2":"value2"}}'
Policies
A policy defines a set of actions performed when a class of entities is extracted. The following actions are
supported:
• Adding business metadata such as tags and properties.
• Sending a message to a JMS message queue. The JSON format message contains the metadata of the entity
to which the policy applies and the message text specified in the policy.
For each action, certain properties support specifying a value using a policy expression.
Viewing Policies
Required Role:
1. Start and log into the Navigator UI.
2. Click the Policies tab.
3. In the left pane, click a policy.
Creating Policies
Required Role:
1. Start and log into the Navigator UI.
2. Depending on the starting point, do one of the following:
Action: Policies page
Procedure:
1. Click the Policies tab.
2. Click Create a New Policy.
Action: Assign Metadata
Procedure:
1. Specify the business metadata. Optionally check the Expression checkbox and specify a policy expression
for the indicated fields.
Action: Send Notification to JMS
Procedure:
1. If not already configured, configure a JMS server and queue.
2. Specify the queue name and message. Optionally check the Expression checkbox and specify a policy
expression for the message.
Required Role:
1. Start and log into the Navigator UI.
2. Click the Policies tab.
3. In the left pane, click a policy.
4. Click Clone Policy or Edit Policy.
5. Edit the policy name, search query, or policy actions.
6. Click Save.
Deleting Policies
Required Role:
1. Start and log into the Navigator UI.
2. Click the Policies tab.
3. In the left pane, click a policy.
4. Click Delete and click OK to confirm.
Policy Expressions
Policy expressions allow certain policy properties to be specified programmatically using Java expressions instead
of string literals.
Policy expressions are not enabled by default. To enable policy expressions, follow the procedure in Enabling
and Disabling Policy Expression Input.
The supported policy properties are entity name and description, key-value pairs, and JMS notification message.
To retrieve the value of an entity property within an expression, use:
entity.get(XXProperties.Property, return_type)
If you don't need to specify a return type, use Object.class as the return type. However, if you want to do
type-specific operations with the result, set the return type to the type in the comment in the enum property
reference. For example, in FSEntityProperties, the return type of the ORIGINAL_NAME property is
java.lang.String. If you use String.class as the return type, you can use the String method toLowerCase()
to modify the returned value: entity.get(FSEntityProperties.ORIGINAL_NAME,
String.class).toLowerCase().
Expression Examples
• Set a filesystem entity name to the original name concatenated with the entity type (see the sketch after
this item):
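A sketch of such an expression, using only the documented entity.get method and FSEntityProperties values
from the reference below (ORIGINAL_NAME and TYPE both return java.lang.String):

entity.get(FSEntityProperties.ORIGINAL_NAME, String.class) + " " +
entity.get(FSEntityProperties.TYPE, String.class)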
Import Statements:
import com.cloudera.nav.hdfs.model.FSEntityProperties;
Entity Property Enum Reference
com.cloudera.nav.hdfs.model.FSEntityProperties
public enum FSEntityProperties implements PropertyEnum {
PERMISSIONS, // Return type: java.lang.String
TYPE, // Return type: java.lang.String
SIZE, // Return type: java.lang.Long
OWNER, // Return type: java.lang.String
LAST_MODIFIED, // Return type: org.joda.time.Instant
SOURCE_TYPE, // Return type: java.lang.String
DELETED, // Return type: java.lang.Boolean
FILE_SYSTEM_PATH, // Return type: java.lang.String
CREATED, // Return type: org.joda.time.Instant
LAST_ACCESSED, // Return type: org.joda.time.Instant
GROUP, // Return type: java.lang.String
MIME_TYPE, // Return type: java.lang.String
DELETE_TIME, // Return type: java.lang.Long
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.hive.model.HiveColumnProperties
public enum HiveColumnProperties implements PropertyEnum {
TYPE, // Return type: java.lang.String
SOURCE_TYPE, // Return type: java.lang.String
DELETED, // Return type: java.lang.Boolean
DATA_TYPE, // Return type: java.lang.String
ORIGINAL_DESCRIPTION, // Return type: java.lang.String
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.hive.model.HiveDatabaseProperties
public enum HiveDatabaseProperties implements PropertyEnum {
TYPE, // Return type: java.lang.String
ORIGINAL_DESCRIPTION, // Return type: java.lang.String
SOURCE_TYPE, // Return type: java.lang.String
DELETED, // Return type: java.lang.Boolean
FILE_SYSTEM_PATH, // Return type: java.lang.String
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.hive.model.HivePartitionProperties
public enum HivePartitionProperties implements PropertyEnum {
TYPE, // Return type: java.lang.String
SOURCE_TYPE, // Return type: java.lang.String
DELETED, // Return type: java.lang.Boolean
FILE_SYSTEM_PATH, // Return type: java.lang.String
CREATED, // Return type: org.joda.time.Instant
LAST_ACCESSED, // Return type: org.joda.time.Instant
COL_VALUES, // Return type: java.util.List
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.hive.model.HiveQueryExecutionProperties
public enum HiveQueryExecutionProperties implements PropertyEnum {
SOURCE_TYPE, // Return type: java.lang.String
TYPE, // Return type: java.lang.String
ENDED, // Return type: org.joda.time.Instant
INPUTS, // Return type: java.util.Collection
OUTPUTS, // Return type: java.util.Collection
STARTED, // Return type: org.joda.time.Instant
PRINCIPAL, // Return type: java.lang.String
WF_INST_ID, // Return type: java.lang.String
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.hive.model.HiveQueryPartProperties
public enum HiveQueryPartProperties implements PropertyEnum {
TYPE, // Return type: java.lang.String
SOURCE_TYPE, // Return type: java.lang.String
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.hive.model.HiveQueryProperties
public enum HiveQueryProperties implements PropertyEnum {
SOURCE_TYPE, // Return type: java.lang.String
INPUTS, // Return type: java.util.Collection
OUTPUTS, // Return type: java.util.Collection
QUERY_TEXT, // Return type: java.lang.String
TYPE, // Return type: java.lang.String
WF_IDS, // Return type: java.util.Collection
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.hive.model.HiveTableProperties
public enum HiveTableProperties implements PropertyEnum {
OWNER, // Return type: java.lang.String
INPUT_FORMAT, // Return type: java.lang.String
OUTPUT_FORMAT, // Return type: java.lang.String
DELETED, // Return type: java.lang.Boolean
FILE_SYSTEM_PATH, // Return type: java.lang.String
COMPRESSED, // Return type: java.lang.Boolean
PARTITION_COL_NAMES, // Return type: java.util.List
CLUSTERED_BY_COL_NAMES, // Return type: java.util.List
SORT_BY_COL_NAMES, // Return type: java.util.List
SER_DE_NAME, // Return type: java.lang.String
SER_DE_LIB_NAME, // Return type: java.lang.String
TYPE, // Return type: java.lang.String
SOURCE_TYPE, // Return type: java.lang.String
CREATED, // Return type: org.joda.time.Instant
LAST_ACCESSED, // Return type: org.joda.time.Instant
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.hive.model.HiveViewProperties
public enum HiveViewProperties implements PropertyEnum {
DELETED, // Return type: java.lang.Boolean
QUERY_TEXT, // Return type: java.lang.String
TYPE, // Return type: java.lang.String
SOURCE_TYPE, // Return type: java.lang.String
CREATED, // Return type: org.joda.time.Instant
LAST_ACCESSED, // Return type: org.joda.time.Instant
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.mapreduce.model.JobExecutionProperties
public enum JobExecutionProperties implements PropertyEnum {
SOURCE_TYPE, // Return type: java.lang.String
JOB_ID, // Return type: java.lang.String
ENDED, // Return type: org.joda.time.Instant
INPUT_RECURSIVE, // Return type: boolean
TYPE, // Return type: java.lang.String
INPUTS, // Return type: java.util.Collection
OUTPUTS, // Return type: java.util.Collection
STARTED, // Return type: org.joda.time.Instant
PRINCIPAL, // Return type: java.lang.String
WF_INST_ID, // Return type: java.lang.String
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.mapreduce.model.JobProperties
public enum JobProperties implements PropertyEnum {
ORIGINAL_NAME, // Return type: java.lang.String
INPUT_FORMAT, // Return type: java.lang.String
OUTPUT_FORMAT, // Return type: java.lang.String
OUTPUT_KEY, // Return type: java.lang.String
OUTPUT_VALUE, // Return type: java.lang.String
MAPPER, // Return type: java.lang.String
REDUCER, // Return type: java.lang.String
SOURCE_TYPE, // Return type: java.lang.String
TYPE, // Return type: java.lang.String
WF_IDS, // Return type: java.util.Collection
NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.oozie.model.WorkflowInstanceProperties
public enum WorkflowInstanceProperties implements PropertyEnum {
TYPE, // Return type: java.lang.String
SOURCE_TYPE, // Return type: java.lang.String
CREATED, // Return type: org.joda.time.Instant
JOB_ID, // Return type: java.lang.String
STATUS, // Return type: java.lang.String
ENDED, // Return type: org.joda.time.Instant
INPUTS, // Return type: java.util.Collection
OUTPUTS, // Return type: java.util.Collection
STARTED, // Return type: org.joda.time.Instant
PRINCIPAL, // Return type: java.lang.String
WF_INST_ID, // Return type: java.lang.String
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.oozie.model.WorkflowProperties
public enum WorkflowProperties implements PropertyEnum {
TYPE, // Return type: java.lang.String
SOURCE_TYPE, // Return type: java.lang.String
WF_IDS, // Return type: java.util.Collection
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.pig.model.PigFieldProperties
public enum PigFieldProperties implements PropertyEnum {
TYPE, // Return type: java.lang.String
INDEX, // Return type: int
SOURCE_TYPE, // Return type: java.lang.String
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.pig.model.PigOperationExecutionProperties
public enum PigOperationExecutionProperties implements PropertyEnum {
SOURCE_TYPE, // Return type: java.lang.String
TYPE, // Return type: java.lang.String
ENDED, // Return type: org.joda.time.Instant
INPUTS, // Return type: java.util.Collection
OUTPUTS, // Return type: java.util.Collection
STARTED, // Return type: org.joda.time.Instant
PRINCIPAL, // Return type: java.lang.String
WF_INST_ID, // Return type: java.lang.String
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.pig.model.PigOperationProperties
public enum PigOperationProperties implements PropertyEnum {
SOURCE_TYPE, // Return type: java.lang.String
OPERATION_TYPE, // Return type: java.lang.String
SCRIPT_ID, // Return type: java.lang.String
TYPE, // Return type: java.lang.String
WF_IDS, // Return type: java.util.Collection
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.pig.model.PigRelationProperties
public enum PigRelationProperties implements PropertyEnum {
TYPE, // Return type: java.lang.String
SOURCE_TYPE, // Return type: java.lang.String
FILE_SYSTEM_PATH, // Return type: java.lang.String
SCRIPT_ID, // Return type: java.lang.String
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.sqoop.model.SqoopExportSubOperationProperties
public enum SqoopExportSubOperationProperties implements PropertyEnum {
TYPE, // Return type: java.lang.String
SOURCE_TYPE, // Return type: java.lang.String
INPUTS, // Return type: java.util.Collection
FIELD_INDEX, // Return type: int
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.sqoop.model.SqoopImportSubOperationProperties
public enum SqoopImportSubOperationProperties implements PropertyEnum {
DB_COLUMN_EXPRESSION, // Return type: java.lang.String
TYPE, // Return type: java.lang.String
SOURCE_TYPE, // Return type: java.lang.String
INPUTS, // Return type: java.util.Collection
FIELD_INDEX, // Return type: int
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.sqoop.model.SqoopOperationExecutionProperties
public enum SqoopOperationExecutionProperties implements PropertyEnum {
SOURCE_TYPE, // Return type: java.lang.String
TYPE, // Return type: java.lang.String
ENDED, // Return type: org.joda.time.Instant
INPUTS, // Return type: java.util.Collection
OUTPUTS, // Return type: java.util.Collection
STARTED, // Return type: org.joda.time.Instant
PRINCIPAL, // Return type: java.lang.String
WF_INST_ID, // Return type: java.lang.String
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.sqoop.model.SqoopQueryOperationProperties
public enum SqoopQueryOperationProperties implements PropertyEnum {
SOURCE_TYPE, // Return type: java.lang.String
INPUTS, // Return type: java.util.Collection
QUERY_TEXT, // Return type: java.lang.String
DB_USER, // Return type: java.lang.String
DB_URL, // Return type: java.lang.String
OPERATION_TYPE, // Return type: java.lang.String
TYPE, // Return type: java.lang.String
WF_IDS, // Return type: java.util.Collection
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.sqoop.model.SqoopTableExportOperationProperties
public enum SqoopTableExportOperationProperties implements PropertyEnum {
DB_TABLE, // Return type: java.lang.String
SOURCE_TYPE, // Return type: java.lang.String
DB_USER, // Return type: java.lang.String
DB_URL, // Return type: java.lang.String
OPERATION_TYPE, // Return type: java.lang.String
TYPE, // Return type: java.lang.String
WF_IDS, // Return type: java.util.Collection
NAME, // Return type: java.lang.String
ORIGINAL_NAME, // Return type: java.lang.String
USER_ENTITY, // Return type: boolean
SOURCE_ID, // Return type: java.lang.String
EXTRACTOR_RUN_ID, // Return type: java.lang.String
PARENT_PATH; // Return type: java.lang.String
}
com.cloudera.nav.sqoop.model.SqoopTableImportOperationProperties
public enum SqoopTableImportOperationProperties implements PropertyEnum {
Lineage Diagrams
Required Role:
A lineage diagram is a directed graph that depicts an entity and its relations with other entities. A lineage diagram
is limited to 3000 entities.
There are two types of lineage diagrams:
• Template - represents an entity that is a model for other entities
• Instance - represents an instance or execution of a template
Entities
In a lineage diagram, entity types are represented by icons:
• HDFS - File, Directory
• Hive - Table
• Pig - Pig script, Pig script execution
• Oozie - Job template, Job execution
Note: Tables created by Impala queries and Sqoop jobs are represented as Hive entities.
Parent entities are represented by a white box enclosing other entities. The following lineage diagram illustrates
the relations between the YARN job script.pig and Pig script script.pig invoked by the parent Oozie workflow
pig-app-hue-script and its source file midsummer.txt and destination folder upperout:
Note: In the following circumstances the entity type icon appears as a generic icon:
• The entity has not yet been extracted. In this case the generic icon is eventually replaced with
the correct entity icon after the entity is extracted and linked in Navigator. For information on
how long it takes for newly created entities to be extracted, see Metadata Extraction on page 21.
• The Hive entity was deleted from the system before it could be extracted by Navigator.
Relations
Relations between the entities are represented graphically by gray lines, with arrows indicating the direction of
the data flow. There are the following types of relations:
For lines connecting database columns, a dashed line indicates that the column is in the where clause; a solid
line indicates that the column is in the select clause.
• To improve the layout of a lineage diagram you can drag and drop entities (in this case midsummer.txt and
upperout) located outside a parent box.
• You can use the mouse scroll wheel to zoom the lineage diagram in and out.
• You can move the lineage diagram in the lineage pane by pressing the mouse button and dragging it.
When you click an entity name link, the Search screen is replaced with a page that displays the entity property
sheet on the left and the lineage diagram on the right. When you click each entity icon, columns and lines
connecting the source and destination columns display. If you hover over a part, the source and destination
columns are highlighted.
2. Click the Instances tab, which contains a list of links to instances of the template.
3. Click a link to display an instance lineage diagram. For the preceding template diagram, the job instance
job_1426651548889_0004 replaces the word count job template.
{
"entities": {
"01043ab3a019a68f37f3d33efa122f0f": {
"level": 1,
"physical": [],
"logical": [],
"aliasOf": [],
"aliases": [],
"instances": [],
"children": [],
"workflows": [],
"identity": "01043ab3a019a68f37f3d33efa122f0f",
"originalName": "part-r-00001",
"sourceId": "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId": null,
"parentPath": "/user/hdfs/out1",
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"name": "part-r-00001",
"description": null,
"tags": null,
"fileSystemPath": "/user/hdfs/out1/part-r-00001",
"type": "FILE",
"size": 8,
"created": "2015-03-27T17:44:20.639Z",
"lastModified": "2015-03-27T17:44:20.639Z",
"lastAccessed": "2015-03-27T17:44:16.832Z",
"permissions": "rw-r--r--",
"owner": "hdfs",
"group": "supergroup",
"blockSize": null,
"mimeType": "application/octet-stream",
"replication": null,
"userEntity": false,
"deleted": false,
"sourceType": "HDFS",
"internalType": "fselement",
"nameField": "originalName",
"sourceName": "HDFS-1",
"hueLink":
"http://tcdn2-1.ent.cloudera.com:8888/filebrowser/view/user/hdfs/out1/part-r-00001",
"isScript": false,
"hasUpstream": true,
"parent": "89612c409b76f7bdf00036df9c3cb717",
"activeChildren": []
},
"72c31f8dbe14a520bd46a747d1382d89": {
"level": 1,
"physical": [],
"logical": [],
"aliasOf": [],
"aliases": [],
"instances": [],
"children": [
"f2eca1680ecca38fa514dc191613c7b4",
"f3929c0b9b2a16490ee57e0a498eee5e"
],
"workflows": [],
"identity": "72c31f8dbe14a520bd46a747d1382d89",
"originalName": "input",
"sourceId": "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId": null,
"parentPath": "/user/hdfs",
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1370",
"name": "input",
"description": null,
"tags": null,
"fileSystemPath": "/user/hdfs/input",
"type": "DIRECTORY",
"size": null,
"created": "2015-03-27T17:40:43.665Z",
"lastModified": "2015-03-27T17:41:06.825Z",
"lastAccessed": null,
"permissions": "rwxr-xr-x",
"owner": "hdfs",
"group": "supergroup",
"blockSize": null,
"mimeType": null,
"replication": null,
"userEntity": false,
"deleted": false,
"sourceType": "HDFS",
"internalType": "fselement",
"nameField": "originalName",
"sourceName": "HDFS-1",
"hueLink":
"http://tcdn2-1.ent.cloudera.com:8888/filebrowser/view/user/hdfs/input",
"isScript": false,
"hasDownstream": true,
"column": -1,
"renderOrdinal": 1,
"activeChildren": [
{
"level": 1,
"physical": [],
"logical": [],
"aliasOf": [],
"aliases": [],
"instances": [],
"children": [],
"workflows": [],
"identity": "f3929c0b9b2a16490ee57e0a498eee5e",
"originalName": "test.txt",
"sourceId": "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId": null,
"parentPath": "/user/hdfs/input",
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1370",
"name": "test.txt",
"description": null,
"tags": null,
"fileSystemPath": "/user/hdfs/input/test.txt",
"type": "FILE",
"size": 6,
"created": "2015-03-27T17:41:06.825Z",
"lastModified": "2015-03-27T17:41:06.825Z",
"lastAccessed": "2015-03-27T17:41:06.405Z",
"permissions": "rw-r--r--",
"owner": "hdfs",
"group": "supergroup",
"blockSize": null,
"mimeType": "application/octet-stream",
"replication": null,
"userEntity": false,
"deleted": false,
"sourceType": "HDFS",
"internalType": "fselement",
"nameField": "originalName",
"sourceName": "HDFS-1",
"hueLink":
"http://tcdn2-1.ent.cloudera.com:8888/filebrowser/view/user/hdfs/input/test.txt",
"isScript": false,
"hasDownstream": true,
"parent": "72c31f8dbe14a520bd46a747d1382d89",
"activeChildren": []
}
],
"x": -222.4375,
"y": -52
},
"f2eca1680ecca38fa514dc191613c7b4": {
"level": 1,
"physical": [],
"logical": [],
"aliasOf": [],
"aliases": [],
"instances": [],
"children": [],
"workflows": [],
"identity": "f2eca1680ecca38fa514dc191613c7b4",
"originalName": "test.txt._COPYING_",
"sourceId": "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId": null,
"parentPath": "/user/hdfs/input",
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1370",
"name": "test.txt._COPYING_",
"description": null,
"tags": null,
"fileSystemPath": "/user/hdfs/input/test.txt._COPYING_",
"type": "FILE",
"size": 6,
"created": "2015-03-27T17:41:06.405Z",
"lastModified": "2015-03-27T17:41:06.405Z",
"lastAccessed": "2015-03-27T17:41:06.405Z",
"permissions": "rw-r--r--",
"owner": "hdfs",
"group": "supergroup",
"blockSize": null,
"mimeType": "application/octet-stream",
"replication": null,
"userEntity": false,
"deleted": true,
"sourceType": "HDFS",
"internalType": "fselement",
"nameField": "originalName",
"sourceName": "HDFS-1",
"hueLink":
"http://tcdn2-1.ent.cloudera.com:8888/filebrowser/view/user/hdfs/input/test.txt._COPYING_",
"isScript": false,
"parent": "72c31f8dbe14a520bd46a747d1382d89",
"activeChildren": []
},
"16b093b257033463bab26bba4c707450": {
"level": 1,
"physical": [],
"logical": [],
"aliasOf": [],
"aliases": [],
"instances": [],
"children": [],
"workflows": [],
"identity": "16b093b257033463bab26bba4c707450",
"originalName": "_temporary",
"sourceId": "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId": null,
"parentPath": "/user/hdfs/out1",
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"name": "_temporary",
"description": null,
"tags": null,
"fileSystemPath": "/user/hdfs/out1/_temporary",
"type": "DIRECTORY",
"size": null,
"created": "2015-03-27T17:41:32.486Z",
"lastModified": "2015-03-27T17:41:32.486Z",
"lastAccessed": null,
"permissions": "rwxr-xr-x",
"owner": "hdfs",
"group": "supergroup",
"blockSize": null,
"mimeType": null,
"replication": null,
"userEntity": false,
"deleted": false,
"sourceType": "HDFS",
"internalType": "fselement",
"nameField": "originalName",
"sourceName": "HDFS-1",
"hueLink":
"http://tcdn2-1.ent.cloudera.com:8888/filebrowser/view/user/hdfs/out1/_temporary",
"isScript": false,
"parent": "89612c409b76f7bdf00036df9c3cb717",
"activeChildren": []
},
"89612c409b76f7bdf00036df9c3cb717": {
"level": 1,
"physical": [],
"logical": [],
"aliasOf": [],
"aliases": [],
"instances": [],
"children": [
"fcd80476d5a968e29e86411b4a67af87",
"01043ab3a019a68f37f3d33efa122f0f",
"16b093b257033463bab26bba4c707450",
"75470b40586cde9e092a01d37798d921"
],
"workflows": [],
"identity": "89612c409b76f7bdf00036df9c3cb717",
"originalName": "out1",
"sourceId": "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId": null,
"parentPath": "/user/hdfs",
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"name": "out1",
"description": null,
"tags": null,
"fileSystemPath": "/user/hdfs/out1",
"type": "DIRECTORY",
"size": null,
"created": "2015-03-27T17:41:32.486Z",
"lastModified": "2015-03-27T17:44:20.848Z",
"lastAccessed": null,
"permissions": "rwxr-xr-x",
"owner": "hdfs",
"group": "supergroup",
"blockSize": null,
"mimeType": null,
"replication": null,
"userEntity": false,
"deleted": false,
"sourceType": "HDFS",
"internalType": "fselement",
"nameField": "originalName",
"sourceName": "HDFS-1",
"hueLink":
"http://tcdn2-1.ent.cloudera.com:8888/filebrowser/view/user/hdfs/out1",
"isScript": false,
"hasUpstream": true,
"column": 1,
"renderOrdinal": 2,
"activeChildren": [
{
"level": 1,
"physical": [],
"logical": [],
"aliasOf": [],
"aliases": [],
"instances": [],
"children": [],
"workflows": [],
"identity": "fcd80476d5a968e29e86411b4a67af87",
"originalName": "_SUCCESS",
"sourceId": "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId": null,
"parentPath": "/user/hdfs/out1",
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"name": "_SUCCESS",
"description": null,
"tags": null,
"fileSystemPath": "/user/hdfs/out1/_SUCCESS",
"type": "FILE",
"size": 0,
"created": "2015-03-27T17:44:20.848Z",
"lastModified": "2015-03-27T17:44:20.848Z",
"lastAccessed": "2015-03-27T17:44:20.848Z",
"permissions": "rw-r--r--",
"owner": "hdfs",
"group": "supergroup",
"blockSize": null,
"mimeType": "application/octet-stream",
"replication": null,
"userEntity": false,
"deleted": false,
"sourceType": "HDFS",
"internalType": "fselement",
"nameField": "originalName",
"sourceName": "HDFS-1",
"hueLink":
"http://tcdn2-1.ent.cloudera.com:8888/filebrowser/view/user/hdfs/out1/_SUCCESS",
"isScript": false,
"parent": "89612c409b76f7bdf00036df9c3cb717",
"hasUpstream": true,
"activeChildren": []
},
{
"level": 1,
"physical": [],
"logical": [],
"aliasOf": [],
"aliases": [],
"instances": [],
"children": [],
"workflows": [],
"identity": "75470b40586cde9e092a01d37798d921",
"originalName": "part-r-00000",
"sourceId": "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId": null,
"parentPath": "/user/hdfs/out1",
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"name": "part-r-00000",
"description": null,
"tags": null,
"fileSystemPath": "/user/hdfs/out1/part-r-00000",
"type": "FILE",
"size": 0,
"created": "2015-03-27T17:44:20.576Z",
"lastModified": "2015-03-27T17:44:20.576Z",
"lastAccessed": "2015-03-27T17:44:16.831Z",
"permissions": "rw-r--r--",
"owner": "hdfs",
"group": "supergroup",
"blockSize": null,
"mimeType": "application/octet-stream",
"replication": null,
"userEntity": false,
"deleted": false,
"sourceType": "HDFS",
"internalType": "fselement",
"nameField": "originalName",
"sourceName": "HDFS-1",
"hueLink":
"http://tcdn2-1.ent.cloudera.com:8888/filebrowser/view/user/hdfs/out1/part-r-00000",
"isScript": false,
"hasUpstream": true,
"parent": "89612c409b76f7bdf00036df9c3cb717",
"activeChildren": []
},
{
"level": 1,
"physical": [],
"logical": [],
"aliasOf": [],
"aliases": [],
"instances": [],
"children": [],
"workflows": [],
"identity": "01043ab3a019a68f37f3d33efa122f0f",
"originalName": "part-r-00001",
"sourceId": "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId": null,
"parentPath": "/user/hdfs/out1",
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"name": "part-r-00001",
"description": null,
"tags": null,
"fileSystemPath": "/user/hdfs/out1/part-r-00001",
"type": "FILE",
"size": 8,
"created": "2015-03-27T17:44:20.639Z",
"lastModified": "2015-03-27T17:44:20.639Z",
"lastAccessed": "2015-03-27T17:44:16.832Z",
"permissions": "rw-r--r--",
"owner": "hdfs",
"group": "supergroup",
"blockSize": null,
"mimeType": "application/octet-stream",
"replication": null,
"userEntity": false,
"deleted": false,
"sourceType": "HDFS",
"internalType": "fselement",
"nameField": "originalName",
"sourceName": "HDFS-1",
"hueLink":
"http://tcdn2-1.ent.cloudera.com:8888/filebrowser/view/user/hdfs/out1/part-r-00001",
"isScript": false,
"hasUpstream": true,
"parent": "89612c409b76f7bdf00036df9c3cb717",
"activeChildren": []
}
],
"x": 222.4375,
"y": -52
},
"a3ac8013155effa2f96e9de0f177eeb5": {
"level": 1,
"physical": [],
"logical": [],
"aliasOf": [],
"aliases": [],
"instances": [
"69b79a8c0c7701f316dd86894b97fe58"
],
"children": [],
"workflows": [],
"identity": "a3ac8013155effa2f96e9de0f177eeb5",
"originalName": "word count",
"sourceId": "a063e69e6c0660353dc378c836837935",
"firstClassParentId": null,
"parentPath": null,
"extractorRunId": "a063e69e6c0660353dc378c836837935##1381",
"name": "word count",
"description": null,
"tags": null,
"wfIds": null,
"inputFormat": null,
"outputFormat": null,
"outputKey": "org.apache.hadoop.io.Text",
"outputValue": "org.apache.hadoop.io.IntWritable",
"mapper": "org.apache.hadoop.examples.WordCount$TokenizerMapper",
"reducer": "org.apache.hadoop.examples.WordCount$IntSumReducer",
"sourceType": "YARN",
"type": "OPERATION",
"userEntity": false,
"deleted": null,
"internalType": "mrjobspec",
"nameField": "name",
"sourceName": "YARN-1",
"isScript": false
},
"69b79a8c0c7701f316dd86894b97fe58": {
"level": 1,
"physical": [],
"logical": [],
"aliasOf": [],
"aliases": [],
"instances": [],
"children": [],
"workflows": [],
"identity": "69b79a8c0c7701f316dd86894b97fe58",
"originalName": "job_1426651548889_0004",
"sourceId": "a063e69e6c0660353dc378c836837935",
"firstClassParentId": null,
"parentPath": null,
"extractorRunId": "a063e69e6c0660353dc378c836837935##1381",
"name": "job_1426651548889_0004",
"description": null,
"tags": null,
"started": "2015-03-27T17:41:20.896Z",
"ended": "2015-03-27T17:44:21.969Z",
"principal": "hdfs",
"inputs": [
"hdfs://tcdn2-1.ent.cloudera.com:8020/user/hdfs/input"
],
"outputs": [
"hdfs://tcdn2-1.ent.cloudera.com:8020/user/hdfs/out1"
],
"wfInstId": null,
"jobID": "job_1426651548889_0004",
"sourceType": "YARN",
"inputRecursive": false,
"type": "OPERATION_EXECUTION",
"userEntity": false,
"deleted": null,
"internalType": "mrjobinstance",
"nameField": "originalName",
"sourceName": "YARN-1",
"hueLink":
"http://tcdn2-1.ent.cloudera.com:8888/jobbrowser/jobs/application_1426651548889_0004",
"isScript": false,
"hasDownstream": true,
"hasUpstream": true,
"template": "a3ac8013155effa2f96e9de0f177eeb5",
"active": true,
"column": 0,
"renderOrdinal": 0,
"activeChildren": [],
"x": 0,
"y": -52
},
"75470b40586cde9e092a01d37798d921": {
"level": 1,
"physical": [],
"logical": [],
"aliasOf": [],
"aliases": [],
"instances": [],
"children": [],
"workflows": [],
"identity": "75470b40586cde9e092a01d37798d921",
"originalName": "part-r-00000",
"sourceId": "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId": null,
"parentPath": "/user/hdfs/out1",
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"name": "part-r-00000",
"description": null,
"tags": null,
"fileSystemPath": "/user/hdfs/out1/part-r-00000",
"type": "FILE",
"size": 0,
"created": "2015-03-27T17:44:20.576Z",
"lastModified": "2015-03-27T17:44:20.576Z",
"lastAccessed": "2015-03-27T17:44:16.831Z",
"permissions": "rw-r--r--",
"owner": "hdfs",
"group": "supergroup",
"blockSize": null,
"mimeType": "application/octet-stream",
"replication": null,
"userEntity": false,
"deleted": false,
"sourceType": "HDFS",
"internalType": "fselement",
"nameField": "originalName",
"sourceName": "HDFS-1",
"hueLink":
"http://tcdn2-1.ent.cloudera.com:8888/filebrowser/view/user/hdfs/out1/part-r-00000",
"isScript": false,
"hasUpstream": true,
"parent": "89612c409b76f7bdf00036df9c3cb717",
"activeChildren": []
},
"fcd80476d5a968e29e86411b4a67af87": {
"level": 1,
"physical": [],
"logical": [],
"aliasOf": [],
"aliases": [],
"instances": [],
"children": [],
"workflows": [],
"identity": "fcd80476d5a968e29e86411b4a67af87",
"originalName": "_SUCCESS",
"sourceId": "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId": null,
"parentPath": "/user/hdfs/out1",
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"name": "_SUCCESS",
"description": null,
"tags": null,
"fileSystemPath": "/user/hdfs/out1/_SUCCESS",
"type": "FILE",
"size": 0,
"created": "2015-03-27T17:44:20.848Z",
"lastModified": "2015-03-27T17:44:20.848Z",
"lastAccessed": "2015-03-27T17:44:20.848Z",
"permissions": "rw-r--r--",
"owner": "hdfs",
"group": "supergroup",
"blockSize": null,
"mimeType": "application/octet-stream",
"replication": null,
"userEntity": false,
"deleted": false,
"sourceType": "HDFS",
"internalType": "fselement",
"nameField": "originalName",
"sourceName": "HDFS-1",
"hueLink":
"http://tcdn2-1.ent.cloudera.com:8888/filebrowser/view/user/hdfs/out1/_SUCCESS",
"isScript": false,
"parent": "89612c409b76f7bdf00036df9c3cb717",
"hasUpstream": true,
"activeChildren": []
},
"f3929c0b9b2a16490ee57e0a498eee5e": {
"level": 1,
"physical": [],
"logical": [],
"aliasOf": [],
"aliases": [],
"instances": [],
"children": [],
"workflows": [],
"identity": "f3929c0b9b2a16490ee57e0a498eee5e",
"originalName": "test.txt",
"sourceId": "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId": null,
"parentPath": "/user/hdfs/input",
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1370",
"name": "test.txt",
"description": null,
"tags": null,
"fileSystemPath": "/user/hdfs/input/test.txt",
"type": "FILE",
"size": 6,
"created": "2015-03-27T17:41:06.825Z",
"lastModified": "2015-03-27T17:41:06.825Z",
"lastAccessed": "2015-03-27T17:41:06.405Z",
"permissions": "rw-r--r--",
"owner": "hdfs",
"group": "supergroup",
"blockSize": null,
"mimeType": "application/octet-stream",
"replication": null,
"userEntity": false,
"deleted": false,
"sourceType": "HDFS",
"internalType": "fselement",
"nameField": "originalName",
"sourceName": "HDFS-1",
"hueLink":
"http://tcdn2-1.ent.cloudera.com:8888/filebrowser/view/user/hdfs/input/test.txt",
"isScript": false,
"hasDownstream": true,
"parent": "72c31f8dbe14a520bd46a747d1382d89",
"activeChildren": []
}
},
"relations": {
"bd3fe737364968a8fbc1831fc9915dca": {
"identity": "bd3fe737364968a8fbc1831fc9915dca",
"type": "DATA_FLOW",
"propagatorId": "268fc2fbba566558b83abd0f0fb680a1",
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"sources": {
"entityIds": [
"69b79a8c0c7701f316dd86894b97fe58"
]
},
"targets": {
"entityIds": [
"01043ab3a019a68f37f3d33efa122f0f",
"75470b40586cde9e092a01d37798d921"
]
},
"propagatable": false,
"unlinked": false,
"userSpecified": false
},
"33535116782b0baff207851f9e637cf2": {
"identity": "33535116782b0baff207851f9e637cf2",
"type": "DATA_FLOW",
"propagatorId": "217788ca1d4de53a4071cf026299744f",
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"sources": {
"entityIds": [
"f3929c0b9b2a16490ee57e0a498eee5e"
]
},
"targets": {
"entityIds": [
"69b79a8c0c7701f316dd86894b97fe58"
]
},
"propagatable": false,
"unlinked": false,
"userSpecified": false
},
"646e2547f1f1371e99259069f3bbd4db": {
"identity": "646e2547f1f1371e99259069f3bbd4db",
"type": "PARENT_CHILD",
"propagatorId": null,
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1370",
"children": {
"entityIds": [
"f2eca1680ecca38fa514dc191613c7b4"
]
},
"parent": {
"entityId": "72c31f8dbe14a520bd46a747d1382d89"
},
"propagatable": false,
"unlinked": false,
"userSpecified": false
},
"da3e6b9ccbc9e39de59e85ea6d89fdd7": {
"identity": "da3e6b9ccbc9e39de59e85ea6d89fdd7",
"type": "PARENT_CHILD",
"propagatorId": null,
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"children": {
"entityIds": [
"fcd80476d5a968e29e86411b4a67af87"
]
},
"parent": {
"entityId": "89612c409b76f7bdf00036df9c3cb717"
},
"propagatable": false,
"unlinked": false,
"userSpecified": false
},
"15816a23933df14590026425fc0e8d85": {
"identity": "15816a23933df14590026425fc0e8d85",
"type": "PARENT_CHILD",
"propagatorId": null,
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"children": {
"entityIds": [
"01043ab3a019a68f37f3d33efa122f0f"
]
},
"parent": {
"entityId": "89612c409b76f7bdf00036df9c3cb717"
},
"propagatable": false,
"unlinked": false,
"userSpecified": false
},
"f8f31d2c2638c22f17600a32631c5639": {
"identity": "f8f31d2c2638c22f17600a32631c5639",
"type": "PARENT_CHILD",
"propagatorId": null,
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1370",
"children": {
"entityIds": [
"16b093b257033463bab26bba4c707450"
]
},
"parent": {
"entityId": "89612c409b76f7bdf00036df9c3cb717"
},
"propagatable": false,
"unlinked": false,
"userSpecified": false
},
"3dcd15d16d13786480052adbac5e7f7f": {
"identity": "3dcd15d16d13786480052adbac5e7f7f",
"type": "DATA_FLOW",
"propagatorId": "268fc2fbba566558b83abd0f0fb680a1",
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"sources": {
"entityIds": [
"69b79a8c0c7701f316dd86894b97fe58"
]
},
"targets": {
"entityIds": [
"75470b40586cde9e092a01d37798d921",
"01043ab3a019a68f37f3d33efa122f0f",
"fcd80476d5a968e29e86411b4a67af87"
]
},
"propagatable": false,
"unlinked": false,
"userSpecified": false
},
"268fc2fbba566558b83abd0f0fb680a1": {
"identity": "268fc2fbba566558b83abd0f0fb680a1",
"type": "DATA_FLOW",
"propagatorId": null,
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"sources": {
"entityIds": [
"69b79a8c0c7701f316dd86894b97fe58"
]
},
"targets": {
"entityIds": [
"89612c409b76f7bdf00036df9c3cb717"
]
},
"propagatable": false,
"unlinked": false,
"userSpecified": false
},
"dd299c827ecde0a1c721b396903cc97d": {
"identity": "dd299c827ecde0a1c721b396903cc97d",
"type": "PARENT_CHILD",
"propagatorId": null,
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1370",
"children": {
"entityIds": [
"f3929c0b9b2a16490ee57e0a498eee5e"
]
},
"parent": {
"entityId": "72c31f8dbe14a520bd46a747d1382d89"
},
"propagatable": false,
"unlinked": false,
"userSpecified": false
},
"217788ca1d4de53a4071cf026299744f": {
"identity": "217788ca1d4de53a4071cf026299744f",
"type": "DATA_FLOW",
"propagatorId": null,
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"sources": {
"entityIds": [
"72c31f8dbe14a520bd46a747d1382d89"
]
},
"targets": {
"entityIds": [
"69b79a8c0c7701f316dd86894b97fe58"
]
},
"propagatable": false,
"unlinked": false,
"userSpecified": false
},
"e0680ada742c6fa1ad3a6192bc2a9274": {
"identity": "e0680ada742c6fa1ad3a6192bc2a9274",
"type": "PARENT_CHILD",
"propagatorId": null,
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"children": {
"entityIds": [
"75470b40586cde9e092a01d37798d921"
]
},
"parent": {
"entityId": "89612c409b76f7bdf00036df9c3cb717"
},
"propagatable": false,
"unlinked": false,
"userSpecified": false
},
"234107762623c89b811fd0be8a96676a": {
"identity": "234107762623c89b811fd0be8a96676a",
"type": "INSTANCE_OF",
"propagatorId": null,
"extractorRunId": "a063e69e6c0660353dc378c836837935##1381",
"instances": {
"entityIds": [
"69b79a8c0c7701f316dd86894b97fe58"
]
},
"template": {
"entityId": "a3ac8013155effa2f96e9de0f177eeb5"
},
"propagatable": false,
"unlinked": false,
"userSpecified": false
},
"38358d90c0c9675c76626148732a63a4": {
"identity": "38358d90c0c9675c76626148732a63a4",
"type": "DATA_FLOW",
"propagatorId": "268fc2fbba566558b83abd0f0fb680a1",
"extractorRunId": "a09b0233cc58ff7d601eaa68673a20c6##1372",
"sources": {
"entityIds": [
"69b79a8c0c7701f316dd86894b97fe58"
]
},
"targets": {
"entityIds": [
"01043ab3a019a68f37f3d33efa122f0f",
"75470b40586cde9e092a01d37798d921"
]
},
"propagatable": false,
"unlinked": false,
"userSpecified": false
}
}
}
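Because lineage data is plain JSON, it can be processed with ordinary tools. The following Python sketch is
illustrative only (the filename lineage.json is an assumption); it loads a lineage file like the one above and
prints each data flow edge, labeling entities by file system path where available:

import json

# Load a lineage file like the one shown above (the filename is hypothetical).
with open("lineage.json") as f:
    lineage = json.load(f)

entities = lineage["entities"]

def label(entity_id):
    # Prefer the file system path when present; otherwise use the field
    # named by "nameField" (either "name" or "originalName").
    entity = entities[entity_id]
    return entity.get("fileSystemPath") or entity[entity["nameField"]]

# Walk the relations map and print each DATA_FLOW edge as source -> target.
for relation in lineage["relations"].values():
    if relation["type"] != "DATA_FLOW":
        continue
    for source_id in relation["sources"]["entityIds"]:
        for target_id in relation["targets"]["entityIds"]:
            print(label(source_id), "->", label(target_id))

For the lineage above, this prints edges such as /user/hdfs/input -> job_1426651548889_0004 and
job_1426651548889_0004 -> /user/hdfs/out1.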
Note: If the value of this property is changed and the service is restarted, the Cloudera Manager
Agent starts monitoring the new log directory. In this case it is possible that not all events from
the old directory are published. To avoid losing lineage data when this property is changed, perform
the following steps:
1. Stop the service.
2. Copy the lineage log files and (for Impala only) the impalad_lineage_wal file from the old log
directory to the new log directory. This must be done on every host where an Impala daemon is
running; a sketch of this step appears after the property list below.
3. Start the service.
• Impala Daemon Maximum Lineage Log File Size - The maximum size, in number of queries, of the lineage log
file before a new file is created.
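A minimal sketch of the log-copying step described in the note above, to be run on every host where an Impala
daemon runs. The directory paths are assumptions; only the impalad_lineage_wal filename comes from the text:

import glob
import os
import shutil

OLD_DIR = "/var/log/impalad/lineage"  # hypothetical old lineage log directory
NEW_DIR = "/data/impalad/lineage"     # hypothetical new lineage log directory

os.makedirs(NEW_DIR, exist_ok=True)

# Copy every file in the old directory, which includes the lineage log
# files and (for Impala) the impalad_lineage_wal file.
for path in glob.glob(os.path.join(OLD_DIR, "*")):
    if os.path.isfile(path):
        shutil.copy2(path, NEW_DIR)

Run the script while the service is stopped (between steps 1 and 3).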
Required Role:
1. Go to the Impala service.
2. Click the Configuration tab.
3. Select Scope > Impala Daemon.
4. Type lineage in the Search box.
5. Edit the lineage log properties.
6. Click Save Changes to commit the changes.
7. Restart the service.
Schema
Required Role:
A table schema contains information about the names and types of the columns of a table.
An HDFS schema contains information about the names and types of the fields in an HDFS Avro or Parquet file.
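For example, the following Avro schema, generated by Kite, describes a Stocks record; each field's type is a
union with "null", so every field is nullable: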
{
"type" : "record",
"name" : "Stocks",
"namespace" : "com.example.stocks",
"doc" : "Schema generated by Kite",
"fields" : [ {
"name" : "Symbol",
"type" : [ "null", "string" ],
"doc" : "Type inferred from 'AAIT'"
}, {
"name" : "Date",
"type" : [ "null", "string" ],
"doc" : "Type inferred from '28-Oct-2014'"
}, {
"name" : "Open",
"type" : [ "null", "double" ],
"doc" : "Type inferred from '33.1'"
}, {
"name" : "High",
"type" : [ "null", "double" ],
"doc" : "Type inferred from '33.13'"
}, {
"name" : "Low",
"type" : [ "null", "double" ],
"doc" : "Type inferred from '33.1'"
}, {
"name" : "Close",
"type" : [ "null", "double" ],
"doc" : "Type inferred from '33.13'"
}, {
"name" : "Volume",
"type" : [ "null", "long" ],
"doc" : "Type inferred from '400'"
} ]
}
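Because an Avro schema is itself JSON, it can be inspected without Avro-specific libraries. A minimal Python
sketch (the filename stocks.avsc is an assumption) that prints each field with its non-null type:

import json

# Load the Avro schema shown above (the filename is hypothetical).
with open("stocks.avsc") as f:
    schema = json.load(f)

print("record", schema["namespace"] + "." + schema["name"])
for field in schema["fields"]:
    # Each field type is a union such as ["null", "double"], i.e. nullable.
    non_null = [t for t in field["type"] if t != "null"]
    print(" ", field["name"] + ":", non_null[0], "(nullable)")

For the Stocks schema this prints, for example, Symbol: string (nullable) and Volume: long (nullable).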