Hive

Hive - Data Types




Many relational databases, such as MySQL and PostgreSQL, can be used as the metastore in Hive. To accommodate the different data types found in relational databases, Hive provides an extensive list of data types.

Let's go through data types in Hive.

  • TINYINT - Represents 1-byte signed integer
  • SMALLINT - Represents 2-byte signed integer
  • INT - Represents 4-byte signed integer
  • BIGINT - Represents 8-byte signed integer
  • FLOAT - Represents 4-byte single-precision floating-point number
  • DOUBLE - Represents 8-byte double-precision floating-point number
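  • DECIMAL - Represents a fixed-point decimal number with user-defined precision and scale
  • TIMESTAMP - Represents a point in time; supports the traditional UNIX timestamp with optional nanosecond precision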
  • DATE - describes a particular year, month and day
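To see how these types look in a table definition, here is a minimal HiveQL sketch. The table name `type_demo` and all column names are hypothetical and chosen only for illustration; the parameterized `DECIMAL(10, 2)` form assumes a reasonably recent Hive version (0.13 or later).

```sql
-- Hypothetical table: one column per numeric / date-time type listed above.
CREATE TABLE type_demo (
  tiny_col   TINYINT,         -- 1 byte:  -128 to 127
  small_col  SMALLINT,        -- 2 bytes: -32,768 to 32,767
  int_col    INT,             -- 4 bytes: -2,147,483,648 to 2,147,483,647
  big_col    BIGINT,          -- 8 bytes: -2^63 to 2^63 - 1
  float_col  FLOAT,           -- 4-byte single precision
  double_col DOUBLE,          -- 8-byte double precision
  price      DECIMAL(10, 2),  -- up to 10 digits in total, 2 after the decimal point
  created_at TIMESTAMP,       -- date and time of day
  birth_date DATE             -- year, month, and day only
);
```

Picking the smallest integer type that covers the expected range keeps the schema self-documenting, while DECIMAL is usually the better fit than FLOAT or DOUBLE when exact values such as currency amounts matter.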

Hive provides STRING, VARCHAR, and CHAR as string data types. BOOLEAN stores true/false values, and BINARY stores arrays of bytes.

Hive also supports complex data types. An array holds an ordered list of values of the same type, similar to a list in Java. A map stores key-value pairs, similar to a Java Map. A struct is similar to a struct in C: it groups a fixed set of named fields, each with its own data type, and is used to store complex data. A union is a collection of heterogeneous data types, but for any given record it can hold a value of only one of its specified types at a time.
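The following HiveQL sketch shows how the string, boolean, and binary types might be declared, and how array, map, and struct values can be built with Hive's constructor functions `array`, `map`, and `named_struct`. The table name, column names, and sample values are hypothetical; CHAR and a `SELECT` without a `FROM` clause assume Hive 0.13 or later.

```sql
-- Hypothetical table showing the string, boolean, and binary types.
CREATE TABLE string_demo (
  country_code CHAR(2),      -- fixed length
  city         VARCHAR(50),  -- variable length, at most 50 characters
  notes        STRING,       -- variable length, no declared maximum
  is_active    BOOLEAN,      -- TRUE or FALSE
  photo        BINARY        -- raw bytes
);

-- Building complex values with constructor functions (sample values are made up).
SELECT array('alice', 'bob')                       AS an_array,
       map('tax', 0.2, 'insurance', 0.05)          AS a_map,
       named_struct('city', 'Pune', 'zip', 411001) AS a_struct;
```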

Let's understand how we will use various data types in the employees table.

The name can be of the VARCHAR, CHAR, or STRING data type. Salary is a FLOAT. Subordinates is an array holding the list of subordinate names or ids. Deductions is a map. The address is of the struct data type. A user can sign up using a social provider such as Facebook or Google, or using an email id, so the auth column is of the union data type. Note that for any given user the auth column holds only one of the Facebook id, Google id, or email, because a user signs up with exactly one of them.
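Putting this together, one possible definition of the employees table is sketched below. Everything beyond the column names mentioned above is an assumption made for illustration: the element, key, and value types, the fields of the address struct, the `'tax'` key, and the choice to wrap each auth alternative in a single-field struct so that the three options stay distinguishable by name. Also keep in mind that Hive's support for UNIONTYPE is incomplete, so queries that use the auth column in JOIN, WHERE, or GROUP BY clauses may fail.

```sql
-- A sketch of the employees table; types and struct fields are assumed.
CREATE TABLE employees (
  name         STRING,
  salary       FLOAT,
  subordinates ARRAY<STRING>,                -- list of subordinate names
  deductions   MAP<STRING, FLOAT>,           -- deduction name -> amount
  address      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>,
  auth         UNIONTYPE<                    -- exactly one alternative per row
                 STRUCT<facebook_id:BIGINT>,
                 STRUCT<google_id:BIGINT>,
                 STRUCT<email:STRING>
               >
);

-- Reading individual elements out of the complex columns.
SELECT name,
       subordinates[0]   AS first_subordinate,  -- array element by index (0-based)
       deductions['tax'] AS tax_deduction,      -- map value by key
       address.city      AS city                -- struct field by name
FROM employees;
```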
