Indexing in Hive
What is Index?
- An Index acts as a reference (pointer) to the records.
- Instead of searching all the records, we can refer to the index to search for a record.
- Indexes to search for a record with minimum overheads.
- Indexes also speed up the searching of records.
- Hive is a data warehousing tool present on the top of Hadoop, which provides the SQL kind of interface to perform queries on large datasets.
- Since Hive deals with Big Data, the size of files is naturally large and can span up to TeraBytes, PetaBytes or even more.
- In a Hive table, there are many numbers of rows & columns. If we want to perform queries only on some columns without Indexing, it will take a large amount of time because queries will be executed on all the columns present in the table.
- The major advantage of using indexing is whenever we perform a query on a table that has an index, there is no need for the query to scan all the rows in the table. Further, it checks the index first and then goes to the particular column and perform the operation.
1. Compact Indexing
2. Bitmap Indexing
Compact Indexing stores the pair of indexed columns value & its block id.
Bitmap Indexing stores the combination of indexed column value & list of rows as a bitmap.
Difference Between Compact & Bitmap Indexing:-
The main difference is the storing of mapped values of the rows in the different blocks.
To Create the Compact Index:-
Syntax: - CREATE INDEX IndexName
ON TABLE TableName(Columns..)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
Here org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler specifies that a built in CompactIndexHandler will act on the created index, which means we are creating a Compact Index for the table and WITH DEFERRED REBUILD statement should be present in the created index to alter the index in later stages.
hive (hivedb)> create index nyCompactIndex on table ny_cause_of_death_new(year)
> as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
> with DEFERRED REBUILD;
OK
Time taken: 0.597 seconds
To View the Compact Index:-
Syntax: - SHOW INDEX on IndexedTableName
hive (hivedb)> show index on ny_cause_of_death_new;
OK
nycompactindex ny_cause_of_death_new year hivedb__ny_cause_of_death_new_nycompactindex__ compact
Time taken: 0.152 seconds, Fetched: 1 row(s)
To Drop the Compact Index:-Syntax: - DROP INDEX IndexName on IndexedTableName
hive (hivedb)> drop index nyCompactIndex on ny_cause_of_death_new;
OK
Time taken: 0.54 seconds
hive (hivedb)> show index on ny_cause_of_death_new;
OK
Time taken: 0.113 seconds
To Create the BITMAP Index:-
Syntax: - CREATE INDEX IndexName
ON TABLE TableName(Columns..)
AS 'BITMAP'
WITH DEFERRED REBUILD;
Here, BITMAP defines the type of index as BITMAP.
hive (hivedb)> create index nyBitmapIndex on table ny_cause_of_death_new(year)
> as 'BITMAP'
> with DEFERRED REBUILD;
OK
Time taken: 0.599 seconds
To View the Bitmap Index:-
Syntax: - SHOW INDEX on IndexedTableName
hive (hivedb)> show index on ny_cause_of_death_new;
OK
nybitmapindex ny_cause_of_death_new year hivedb__ny_cause_of_death_new_nybitmapindex__ bitmap
Time taken: 0.105 seconds, Fetched: 1 row(s)
To Drop the Bitmap Index:-
Syntax: - DROP INDEX IndexName on IndexedTableName
hive (hivedb)> drop index nyBitmapIndex on ny_cause_of_death_new;
OK
Time taken: 0.397 seconds
hive (hivedb)> show index on ny_cause_of_death_new;
OK
Time taken: 0.126 seconds
Good Explanation...
ReplyDeleteReally helpful for exploring additional information on HIVE..
Indexing In Hive >>>>> Download Now
Delete>>>>> Download Full
Indexing In Hive >>>>> Download LINK
>>>>> Download Now
Indexing In Hive >>>>> Download Full
>>>>> Download LINK TI
Good Snehal... Really helpfull
ReplyDeleteGood sneha ....!very clear explanation ....!
ReplyDeleteReally Good one
ReplyDeleteand very clear explanations!!!
Excellent article. Very interesting to read. I really love to read such a nice article. Thanks! keep rocking.Hadoop Admin Online Training India
ReplyDeletethakyou it vry nice blog for beginners
ReplyDeletehttps://www.emexotechnologies.com/courses/big-data-analytics-training/big-data-hadoop-training/
It is really a great work and the way in which you are sharing the knowledge is excellent.
ReplyDeletebig data company in chennai
As we know, AWS big data consultant is the future of the industries these days, this article helps me to figure out which language I need to learn to pursue the future in this field.
ReplyDeleteVery nice post with very unique content.
ReplyDeletekeep updating More Blogs.
big data hadoop course
big data hadoop training
big data online training
big data online course
Thank You..
Indexing In Hive >>>>> Download Now
ReplyDelete>>>>> Download Full
Indexing In Hive >>>>> Download LINK
>>>>> Download Now
Indexing In Hive >>>>> Download Full
>>>>> Download LINK gF