HDFS (Hadoop Distributed File System) is Hadoop's primary storage layer. It holds the
data that distributed processing jobs read and write. Here are a few fundamental commands:
Upload a file to HDFS:
hdfs dfs -put <local-file> <hdfs-destination>
Example:
hdfs dfs -put local-file.txt /user/hadoop/data/
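By default -put refuses to overwrite a file that already exists at the destination, and the target directory may not exist yet. The following is a minimal sketch of a wrapper that handles both cases. The helper name hdfs_upload and the HDFS_CMD variable are assumptions made for illustration, not part of Hadoop; setting HDFS_CMD=echo prints the commands instead of running them.

```shell
# HDFS_CMD is a hypothetical override: set HDFS_CMD=echo for a dry run
# that prints the commands instead of contacting a cluster.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# hdfs_upload is an illustrative helper name, not a Hadoop command.
hdfs_upload() {
  local src="$1" dest_dir="$2"
  # -p creates missing parent directories and succeeds if the directory exists.
  "$HDFS_CMD" dfs -mkdir -p "$dest_dir" || return 1
  # -f overwrites the destination file if it is already present.
  "$HDFS_CMD" dfs -put -f "$src" "$dest_dir"
}
```

Usage would look like: hdfs_upload local-file.txt /user/hadoop/data/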
List files and directories in HDFS:
hdfs dfs -ls <hdfs-path>
Example:
hdfs dfs -ls /user/hadoop/data/
Create a directory in HDFS:
hdfs dfs -mkdir <hdfs-directory>
Example:
hdfs dfs -mkdir /user/hadoop/output/
Copy a file from HDFS to the local filesystem:
hdfs dfs -copyToLocal <hdfs-source> <local-destination>
Example:
hdfs dfs -copyToLocal /user/hadoop/data/local-file.txt local-copy/
Move files within HDFS:
hdfs dfs -mv <hdfs-source> <hdfs-destination>
Example:
hdfs dfs -mv /user/hadoop/data/local-file.txt /user/hadoop/archive/
Remove files from HDFS (add -r to remove a directory and its contents):
hdfs dfs -rm <hdfs-path>
Example:
hdfs dfs -rm /user/hadoop/data/unwanted-file.txt
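On its own, -rm only deletes files; a directory needs -r. When trash is enabled on the cluster, removed items are moved to the trash unless -skipTrash is given. As a rough sketch, the helper below checks that the path exists before deleting it recursively. The name hdfs_remove and the HDFS_CMD dry-run variable are hypothetical, introduced here only for illustration.

```shell
# HDFS_CMD is a hypothetical override: set HDFS_CMD=echo for a dry run.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# hdfs_remove is an illustrative helper name, not a Hadoop command.
hdfs_remove() {
  local target="$1"
  # -test -e exits with status 0 when the path exists.
  if "$HDFS_CMD" dfs -test -e "$target"; then
    # -r deletes directories recursively; plain files are removed as usual.
    "$HDFS_CMD" dfs -rm -r "$target"
  else
    echo "nothing to remove: $target" >&2
  fi
}
```

Usage would look like: hdfs_remove /user/hadoop/output/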
View the content of a file in HDFS:
hdfs dfs -cat <hdfs-file>
Example:
hdfs dfs -cat /user/hadoop/data/sample.txt
Get a summary of the space used by a file or directory in HDFS:
hdfs dfs -du -s -h <hdfs-path>
Example:
hdfs dfs -du -s -h /user/hadoop/data/
Change the replication factor of a file in HDFS:
hdfs dfs -setrep -w <replication-factor> <hdfs-file>
Example:
hdfs dfs -setrep -w 3 /user/hadoop/data/high-replication-file.txt
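Because -w waits until the new replication factor is reached, it can be worth checking the current value first. The sketch below reads it with -stat %r and only calls -setrep when the value differs. The helper name ensure_replication and the HDFS_CMD dry-run variable are assumptions made for this example.

```shell
# HDFS_CMD is a hypothetical override, useful for substituting a stub in tests.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# ensure_replication is an illustrative helper name, not a Hadoop command.
ensure_replication() {
  local file="$1" wanted="$2" current
  # -stat %r prints the file's current replication factor.
  current="$("$HDFS_CMD" dfs -stat %r "$file")" || return 1
  if [ "$current" != "$wanted" ]; then
    # -w waits until the new replication factor has been reached.
    "$HDFS_CMD" dfs -setrep -w "$wanted" "$file"
  fi
}
```

Usage would look like: ensure_replication /user/hadoop/data/high-replication-file.txt 3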
Check the available disk space on HDFS:
hdfs dfsadmin -report
Remember to replace placeholders such as <local-file>, <hdfs-destination>, and
<hdfs-path> with actual paths and filenames.
These basic commands should help you get started with managing files and directories in
HDFS. Hadoop HDFS commands often follow a similar structure to traditional Unix commands,
so if you're familiar with Unix/Linux commands, you'll find it relatively easy to work with HDFS.
Check Hadoop version:
hadoop version
View Hadoop cluster information (hadoop dfsadmin is deprecated in favor of hdfs dfsadmin):
hdfs dfsadmin -report
Submit a MapReduce job:
hadoop jar <jar-file> <main-class> <input-path> <output-path>
Example:
hadoop jar myjob.jar com.example.WordCount /user/hadoop/input /user/hadoop/output
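Jobs that use the standard file-based output formats typically fail at startup if the output directory already exists. A minimal sketch of a wrapper that clears the output path before submitting is shown below. The helper name run_job and the HADOOP_CMD dry-run variable are hypothetical. Note that it deletes any previous results in the output path.

```shell
# HADOOP_CMD is a hypothetical override: set HADOOP_CMD=echo for a dry run.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# run_job is an illustrative helper name, not a Hadoop command.
run_job() {
  local jar="$1" main_class="$2" input="$3" output="$4"
  # WARNING: removes earlier results. -f suppresses the error when the
  # output directory does not exist yet.
  "$HADOOP_CMD" fs -rm -r -f "$output" || return 1
  "$HADOOP_CMD" jar "$jar" "$main_class" "$input" "$output"
}
```

Usage would look like: run_job myjob.jar com.example.WordCount /user/hadoop/input /user/hadoop/output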
View the status of MapReduce jobs (hadoop job is deprecated in favor of mapred job):
mapred job -list
Kill a MapReduce job:
mapred job -kill <job-id>
Example:
mapred job -kill job_123456789_0001
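To avoid copying job IDs by hand, the IDs can be pulled out of the listing text. The sketch below assumes only that IDs follow the job_<number>_<number> pattern seen in the example above; it does not rely on the column layout of the listing. The helper name extract_job_ids is hypothetical.

```shell
# extract_job_ids is an illustrative helper name. It reads listing text on
# stdin and prints each distinct job ID on its own line.
extract_job_ids() {
  grep -oE 'job_[0-9]+_[0-9]+' | sort -u
}
```

For instance, piping the output of mapred job -list (the current form of hadoop job -list) into extract_job_ids prints one ID per line, and each ID can then be passed to the -kill command after review.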
View the content of a file in HDFS:
hadoop fs -cat <hdfs-file>
Example:
hadoop fs -cat /user/hadoop/data/sample.txt
List files and directories in HDFS:
hadoop fs -ls <hdfs-path>
Example:
hadoop fs -ls /user/hadoop/data/
Create a directory in HDFS:
hadoop fs -mkdir <hdfs-directory>
Example:
hadoop fs -mkdir /user/hadoop/output/
Copy a file from HDFS to the local filesystem:
hadoop fs -copyToLocal <hdfs-source> <local-destination>
Example:
hadoop fs -copyToLocal /user/hadoop/data/local-file.txt local-copy/
Remove files from HDFS (add -r to remove a directory and its contents):
hadoop fs -rm <hdfs-path>
Example:
hadoop fs -rm /user/hadoop/data/unwanted-file.txt
View available disk space on HDFS:
hadoop fs -df -h
Remember to replace placeholders like <jar-file>, <main-class>, <input-path>,
<output-path>, <hdfs-file>, <hdfs-path>, etc., with actual paths, filenames, and IDs.