Web Log Analysis using Hive with Regex
A server log is a log file that is automatically created and maintained by a server and consists of a list of the activities it performed. A typical example is a web server log, which maintains a history of page requests.
The W3C maintains a standard format (the Common Log Format) for web server log files. Information about each request, including the client IP address, request date/time, page requested, HTTP status code, bytes served, user agent, and referrer, is typically recorded. This data can be combined into a single file or separated into distinct logs, such as an access log, error log, or referrer log.
Hive is a data warehouse infrastructure that provides data summarization and ad-hoc querying. It provides an SQL dialect, called Hive Query Language (HQL), for querying data stored in a Hadoop cluster.
Hive’s data model provides a high-level, table-like structure on top of HDFS.
Web Log Format:-
The sample log entry below is used to explain each field of the format.
64.242.88.10 - - [07/Mar/2014:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
IP address %h : (64.242.88.10) The IP address (or hostname) of the client that made the request.
Logname %l : (-) The remote logname of the client. The hyphen in the output indicates that the requested piece of information is not available.
Userid %u : (-) The user ID of the person requesting the document, as determined by HTTP authentication. A hyphen indicates that the information is not available.
Timestamp %t : [07/Mar/2014:16:20:55 -0800] The time at which the server finished processing the request.
The format is
[day/month/year:hour:minute:second zone]
day = 2*digit
month = 3*letter
year = 4*digit
hour = 2*digit
minute = 2*digit
second = 2*digit
zone = (`+' | `-') 4*digit
Request %r : "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" The request line sent by the client, in double quotes. It contains the method (GET), the requested resource, and the protocol (HTTP/1.1).
Status code %s : 200 is the HTTP status code returned to the client.
2xx is a successful response,
3xx a redirection,
4xx a client error,
5xx a server error.
Size of Object %b : 5253 is the size of the object returned to the client, measured in bytes.
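Every field arrives as a string, so analysis usually starts by converting the timestamp, status code, and size into typed values. The following is a minimal Python sketch of that conversion, not part of the Hive script. The helper names (parse_timestamp, status_class, parse_size) are hypothetical, and an English or C locale is assumed for the month abbreviation.

```python
from datetime import datetime

# Sample field values taken from the log entry discussed above.
raw_time = "[07/Mar/2014:16:20:55 -0800]"
raw_status = "200"
raw_size = "5253"


def parse_timestamp(value):
    """Convert '[day/month/year:hour:minute:second zone]' to a datetime."""
    # %b matches the abbreviated month name and depends on the locale;
    # an English/C locale is assumed here.
    return datetime.strptime(value.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")


def status_class(code):
    """Map an HTTP status code to the class named in the list above."""
    classes = {
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(int(code) // 100, "other")


def parse_size(value):
    """Return the size in bytes, or None when the log shows a hyphen."""
    return None if value == "-" else int(value)


timestamp = parse_timestamp(raw_time)
print(timestamp.isoformat())
print(status_class(raw_status))
print(parse_size(raw_size))
```

Once the timestamp is a datetime object, its month, weekday, and UTC offset are available as attributes, which makes date-based filtering straightforward.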
Regular Expression Regex:-
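To learn about regular expressions, refer to the Regular Expression tutorial.
We will use the regular expression below for log pattern matching. It is shown exactly as it is reported in the input.regex property of the table in the output further down, including its backslash escaping:
([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)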
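As a quick check outside Hive, the pattern can be tried against the sample log entry. The sketch below uses Python rather than HiveQL. It takes the pattern reported in the table's input.regex property further down and reduces the doubled backslashes and escaped quotes to a plain regex, which is an assumption about how the escaped form should be read. The field names follow the column names of the log_processing table.

```python
import re

# Pattern from the table's input.regex property with the string escaping
# removed (an assumption about how the escaped form maps to a plain regex).
LOG_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") (-|[0-9]*) (-|[0-9]*)'
)

# Column names as listed for the log_processing table.
FIELDS = ("ipaddress", "logname", "userid", "time", "request", "status", "size")

line = (
    '64.242.88.10 - - [07/Mar/2014:16:20:55 -0800] '
    '"GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253'
)

# fullmatch requires the whole line to fit the pattern, one capture
# group per column.
match = LOG_PATTERN.fullmatch(line)
record = dict(zip(FIELDS, match.groups())) if match else None

if record is None:
    print("line does not match the pattern")
else:
    for name in FIELDS:
        print(f"{name:10} {record[name]}")
```

Each of the seven capture groups corresponds to one column. A line that does not fit the pattern yields no match at all, so it is worth testing the pattern on a few real lines before loading a large file.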
Download Input log_access file
The HiveQL script logprochiveregex.hql uses the RegexSerDe class to process the log file with the help of the above regular expression.
Execution of the Hive script logprochiveregex.hql:
[hduser@localhost bin]$ hive -f /home/hduser/HIVE/logprochiveregex.hql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.0/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/local/hadoop-2.6.0/hive/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
OK
Time taken: 8.815 seconds
OK
Time taken: 1.496 seconds
Loading data to table default.log_processing
OK
Time taken: 2.1 seconds
OK
# col_name data_type comment
ipaddress string
logname string
userid string
time string
request string
status string
size string
# Detailed Table Information
Database: default
Owner: hduser
CreateTime: Fri Mar 24 09:32:53 PDT 2017
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://localhost:9000/user/hive/warehouse/log_processing
Table Type: MANAGED_TABLE
Table Parameters:
numFiles 1
numRows 0
rawDataSize 0
totalSize 174447
transient_lastDdlTime 1490373175
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.RegexSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
input.regex ([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)
output.format.string %1$s %2$s %3$s %4$s %5$s %6$s %7$s
serialization.format 1
Time taken: 0.861 seconds, Fetched: 37 row(s)
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hduser_20170324093243_e4985fc8-d3d5-4eda-8b47-80a7e1911b63
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1490354948174_0003, Tracking URL = http://localhost:8088/proxy/application_1490354948174_0003/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1490354948174_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-03-24 09:36:07,239 Stage-1 map = 0%, reduce = 0%
2017-03-24 09:37:07,594 Stage-1 map = 0%, reduce = 0%
2017-03-24 09:38:08,660 Stage-1 map = 0%, reduce = 0%
2017-03-24 09:39:03,514 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 41.82 sec
2017-03-24 09:39:05,651 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 42.37 sec
2017-03-24 09:40:06,470 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 45.91 sec
MapReduce Total cumulative CPU time: 45 seconds 910 msec
Ended Job = job_1490354948174_0003
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 45.91 sec HDFS Read: 184082 HDFS Write: 1423 SUCCESS
Total MapReduce CPU Time Spent: 45 seconds 910 msec
OK
64.242.88.10 - - [07/Mar/2014:16:05:49 -0800] "GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846
64.242.88.10 - - [07/Mar/2014:16:06:51 -0800] "GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1" 200 4523
64.242.88.10 - - [07/Mar/2014:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Mar/2014:16:11:58 -0800] "GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 200 7352
64.242.88.10 - - [07/Mar/2014:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
64.242.88.10 - - [07/Mar/2014:16:23:12 -0800] "GET /twiki/bin/oops/TWiki/AppendixFileSystem?template=oopsmore&param1=1.12&param2=1.12 HTTP/1.1" 200 11382
64.242.88.10 - - [07/Mar/2014:16:24:16 -0800] "GET /twiki/bin/view/Main/PeterThoeny HTTP/1.1" 200 4924
64.242.88.10 - - [07/Mar/2014:16:29:16 -0800] "GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2014:16:30:29 -0800] "GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2014:16:31:48 -0800] "GET /twiki/bin/view/TWiki/WebTopicEditTemplate HTTP/1.1" 200 3732
Time taken: 432.103 seconds, Fetched: 10 row(s)
Is this only possible with Hive, HQL, and Hadoop? What if I am using Hibernate or another ORM with a different SQL database?
I am not familiar with Hibernate or other ORMs, but you can also process your log file using Python.
There are many entries like [07/Mar/2014:16:20:55 -0800] over a year, right?
How can I apply a query to retrieve only the data for March, or for a particular day of the week?
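One way to approach the question above, outside Hive, is to parse the time column and filter on the month or weekday. The following is a minimal Python sketch under stated assumptions: the helper names are hypothetical, the April value is a made-up entry added only to show a non-matching case, and an English or C locale is assumed for the month abbreviation.

```python
from datetime import datetime

# Values of the time column. The first two appear in the query output
# above; the April entry is hypothetical, added only for illustration.
times = [
    "[07/Mar/2014:16:05:49 -0800]",
    "[07/Mar/2014:16:20:55 -0800]",
    "[12/Apr/2014:09:00:00 -0800]",
]


def parse_time(value):
    """Parse '[day/month/year:hour:minute:second zone]' into a datetime."""
    return datetime.strptime(value.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")


def in_month(value, month):
    """True when the entry falls in the given month (1 = January)."""
    return parse_time(value).month == month


def on_weekday(value, weekday):
    """True when the entry falls on the given weekday (0 = Monday)."""
    return parse_time(value).weekday() == weekday


march_only = [t for t in times if in_month(t, 3)]
fridays_only = [t for t in times if on_weekday(t, 4)]

print(march_only)
print(fridays_only)
```

The same idea applies inside a query engine: convert the time string to a timestamp first, then filter on its month or day-of-week component.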