HDFS - Hadoop Distributed File System


Who splits the file into blocks?


9 Comments

Sir, please clarify this.


Hi, Sonal.

If you wanted to split a file into blocks yourself, you would have to write a program for it. In practice, when a file is written into HDFS, HDFS divides the file into blocks and takes care of its replication. This behaviour is controlled through the configuration files.

Parameters such as the block size (which determines how many blocks a file is split into), the replication factor, and the number of mappers and reducers are set there.
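For example, the block size and replication factor are usually set cluster-wide in hdfs-site.xml. A minimal sketch (dfs.blocksize and dfs.replication are the standard Hadoop 2.x+ property names; the values are only illustrative):

<configuration>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>   <!-- 128 MB blocks -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>           <!-- each block kept on 3 DataNodes -->
  </property>
</configuration>

A client can still override these per file or per command when it writes data.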

This will be explained in the coming sessions!

All the best!


Thank you so much!


Hi,
I think splitting the file into blocks is done by HDFS, but splitting the data into InputSplits is done by the client program.


Hi Manoj,

The client writing to HDFS splits the file; the client is aware of the block size. A temporary file of one block in size is created on the local disk and is then transferred to HDFS.
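To make that concrete, here is a minimal sketch using the Java HDFS client API. It relies on the FileSystem.create overload that takes a replication factor and a block size; the destination path, the 64 MB block size and the sample data are just illustrative assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();          // picks up core-site.xml / hdfs-site.xml if on the classpath
        FileSystem fs = FileSystem.get(conf);              // client-side handle to HDFS

        Path dst = new Path("/tmp/example.txt");           // hypothetical destination path
        long blockSize = 64L * 1024 * 1024;                // override the default block size: 64 MB
        short replication = 3;

        // The client passes the block size to create(); as data is written, the
        // client-side output stream cuts it into blocks of this size and streams
        // them to the DataNodes that the NameNode allocates.
        FSDataOutputStream out = fs.create(dst, true,
                conf.getInt("io.file.buffer.size", 4096),  // write buffer size
                replication, blockSize);
        out.writeBytes("hello hdfs\n");
        out.close();
        fs.close();
    }
}

The point is that the block size travels with the client's create() call; the NameNode only records it and hands out block locations, while the DataNodes store whatever chunks the client stream sends.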

Hope this helps.


Hi Abhinav,
I think you misunderstood my question, or maybe I communicated it badly :). Anyway, I am speaking of two different concepts: the block (the physical division of the file) and the InputSplit (the logical division that is fed as input to each task). In your reply, can you please explain what you mean by "the client divides"? Will a client process be started when someone writes data to HDFS?

And can you please explain who decides the InputSplit size, and how?

/Manoj


Hi Manoj,

When a user writes to HDFS, they use a client program or library, for example the "hadoop fs ..." command from the Unix command line.
This program interacts with the NameNode and the DataNodes while writing the file. It is this client program or library that splits the file into blocks, or chunks, of 128 MB (64 MB in earlier versions) while writing them out to the various DataNodes.
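For instance, a user could write a file with a non-default block size straight from the command line (assuming the standard FsShell generic -D option; the file and path names are hypothetical): hadoop fs -D dfs.blocksize=67108864 -put bigfile.csv /user/manoj/bigfile.csv. Here 67108864 bytes is 64 MB, and it is the client-side library performing the upload that chops the file into blocks of that size.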

Regarding the InputSplits:
InputSplits are formed out of HDFS blocks during the processing phase. HDFS blocks are raw, fixed-size chunks of data regardless of the file format, while InputSplits are logical groups of records. During a MapReduce job, the raw data of the HDFS blocks is converted into InputSplits (that is, bunches of records), and this conversion is done by the InputFormat class.

So, depending on the data and our processing requirements, we choose an appropriate input format. The same file can be loaded using different InputFormats, and thus different kinds of InputSplits can be produced from one file.
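On your follow-up question about who decides the InputSplit size: for the common FileInputFormat-based formats it is computed on the job-submission (client) side as max(minSplitSize, min(maxSplitSize, blockSize)), which by default gives one split per block. A small sketch of tuning it from a driver program (the job name and input path are hypothetical; setMinInputSplitSize and setMaxInputSplitSize are helpers on org.apache.hadoop.mapreduce.lib.input.FileInputFormat):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-size-demo");          // hypothetical job name

        // The InputFormat decides how HDFS blocks are turned into InputSplits.
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path("/data/input"));  // hypothetical input directory

        // Effective split size = max(minSize, min(maxSize, blockSize)), so capping
        // maxSize below the block size yields more, smaller splits (and more map tasks).
        FileInputFormat.setMinInputSplitSize(job, 1L);
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
    }
}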


Can you please explain all the functions the client program performs?


Can someone answer? I have the same question too.
