Indexing in Hive

Indexing in Hive

What is Index?

An Index acts as a reference (pointer) to the records.
Instead of searching all the records, we can refer to the index to search for a record.
Indexes to search for a record with minimum overheads.
Indexes also speed up the searching of records.

Why use Indexing in Hive?

Hive is a data warehousing tool present on the top of Hadoop, which provides the SQL kind of interface to perform queries on large datasets.
Since Hive deals with Big Data, the size of files is naturally large and can span up to TeraBytes, PetaBytes or even more.
In a Hive table, there are many numbers of rows & columns. If we want to perform queries only on some columns without Indexing, it will take a large amount of time because queries will be executed on all the columns present in the table.
The major advantage of using indexing is whenever we perform a query on a table that has an index, there is no need for the query to scan all the rows in the table. Further, it checks the index first and then goes to the particular column and perform the operation.

Type of Indexes in Hive:-
1. Compact Indexing
2. Bitmap Indexing

Compact Indexing stores the pair of indexed columns value & its block id.

Bitmap Indexing stores the combination of indexed column value & list of rows as a bitmap.

Difference Between Compact & Bitmap Indexing:-
The main difference is the storing of mapped values of the rows in the different blocks.

To Create the Compact Index:-
Syntax: - CREATE INDEX IndexName
ON TABLE TableName(Columns..)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
Here org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler specifies that a built in CompactIndexHandler will act on the created index, which means we are creating a Compact Index for the table and WITH DEFERRED REBUILD statement should be present in the created index to alter the index in later stages.

hive (hivedb)> create index nyCompactIndex on table ny_cause_of_death_new(year)
             > as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
             > with DEFERRED REBUILD;
OK
Time taken: 0.597 seconds

To View the Compact Index:-
Syntax: - SHOW INDEX on IndexedTableName

hive (hivedb)> show index on ny_cause_of_death_new;
OK
nycompactindex          ny_cause_of_death_new    year                    hivedb__ny_cause_of_death_new_nycompactindex__    compact                 
Time taken: 0.152 seconds, Fetched: 1 row(s)

To Drop the Compact Index:-
Syntax: - DROP INDEX IndexName on IndexedTableName

hive (hivedb)> drop index nyCompactIndex on ny_cause_of_death_new;
OK
Time taken: 0.54 seconds

hive (hivedb)> show index on ny_cause_of_death_new;
OK
Time taken: 0.113 seconds

To Create the BITMAP Index:-
Syntax: - CREATE INDEX IndexName
ON TABLE TableName(Columns..)
AS 'BITMAP'
WITH DEFERRED REBUILD;
Here, BITMAP defines the type of index as BITMAP.

hive (hivedb)> create index nyBitmapIndex on table ny_cause_of_death_new(year)
             > as 'BITMAP'
             > with DEFERRED REBUILD;
OK
Time taken: 0.599 seconds

To View the Bitmap Index:-
Syntax: - SHOW INDEX on IndexedTableName

hive (hivedb)> show index on ny_cause_of_death_new;
OK
nybitmapindex           ny_cause_of_death_new    year                    hivedb__ny_cause_of_death_new_nybitmapindex__    bitmap                  
Time taken: 0.105 seconds, Fetched: 1 row(s)

To Drop the Bitmap Index:-
Syntax: - DROP INDEX IndexName on IndexedTableName

hive (hivedb)> drop index nyBitmapIndex on ny_cause_of_death_new;
OK
Time taken: 0.397 seconds

hive (hivedb)> show index on ny_cause_of_death_new;
OK
Time taken: 0.126 seconds

Comments

Mahesh Chandra Maharana4 February 2017 at 19:12
Good Explanation...
Really helpful for exploring additional information on HIVE..
Unknown4 February 2017 at 20:42
Good Snehal... Really helpfull
తెలుగు అప్డేట్స్ 15 February 2017 at 07:07
Good sneha ....!very clear explanation ....!
Unknown5 July 2017 at 23:48
Really Good one
and very clear explanations!!!
Tejuteju17 April 2018 at 01:24
Excellent article. Very interesting to read. I really love to read such a nice article. Thanks! keep rocking.Hadoop Admin Online Training India
Unknown8 July 2018 at 05:08
thakyou it vry nice blog for beginners
https://www.emexotechnologies.com/courses/big-data-analytics-training/big-data-hadoop-training/
Unknown28 December 2018 at 01:44
It is really a great work and the way in which you are sharing the knowledge is excellent.

big data company in chennai
Alfred Avina29 January 2020 at 01:18
As we know, AWS big data consultant is the future of the industries these days, this article helps me to figure out which language I need to learn to pursue the future in this field.

veera4 July 2020 at 00:20
Very nice post with very unique content.
keep updating More Blogs.
big data hadoop course
big data hadoop training
big data online training
big data online course

Thank You..
Anonymous19 April 2022 at 22:19
Indexing In Hive >>>>> Download Now

>>>>> Download Full

Indexing In Hive >>>>> Download LINK

>>>>> Download Now

Indexing In Hive >>>>> Download Full

>>>>> Download LINK gF

Snehal's Big Data Spark Hadoop Zone

Search This Blog

Indexing in Hive

Comments

Post a Comment