Case Study: GitHub
How do you satisfy the search needs of GitHub’s 4 million users while simultaneously providing tactical
operational insights that help you iteratively improve customer service?
The Solution:
By using Elasticsearch to index over 8 million code repositories, along with critical event data.
“Search is at the core of GitHub,” says Tim Pease, an Operations Engineer at GitHub. “If you go to GitHub.com/search you can
search through repositories, users, issues, pull requests, and source code.”
One goal of GitHub’s Elasticsearch implementation is to index everything that is publicly available on GitHub.com and make
it easy to find. Of course, full-text searching is fully supported, but searching based on a wide variety of criteria is also possible
and dead simple.
“You can search for a project that uses Clojure as the primary language, and has had activity over the past month, and all this
functionality is powered by Elasticsearch,” says Pease.
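A sketch of what such a query might look like in the Elasticsearch query DSL, sent from Python with the requests library; the index name ("repositories") and the field names ("language", "pushed_at") are assumptions rather than GitHub's actual schema:

import json
import requests

# Repositories whose primary language is Clojure and that have seen a push
# within the past month (date math: now minus one month, rounded to the day).
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"language": "Clojure"}},
                {"range": {"pushed_at": {"gte": "now-1M/d"}}},
            ]
        }
    },
    "size": 10,
}

resp = requests.post(
    "http://localhost:9200/repositories/_search",
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_id"])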
Elasticsearch’s flexible storage and retrieval formats, which permit both highly structured and loosely structured data to co-exist
in search storage, along with Elasticsearch’s extensive set of search primitives, made search implementation straightforward.
“You can do lots of queries on that data using Elasticsearch that a standard SQL database won’t support,” notes Pease.
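As an illustration of that flexibility (not GitHub's actual design), the sketch below indexes a tightly structured repository record and a loosely structured event into the same hypothetical index; Elasticsearch's dynamic mapping picks up new fields as they appear, with no schema migration:

import requests

base = "http://localhost:9200/mixed-data"   # hypothetical index name

# A highly structured document...
requests.put(f"{base}/_doc/1", json={
    "repo_id": 42,
    "language": "Clojure",
    "stars": 310,
    "pushed_at": "2013-01-15",
})

# ...and a loosely structured event can live alongside it; its new fields
# are mapped dynamically when the document is indexed.
requests.put(f"{base}/_doc/2", json={
    "event": "push",
    "actor": "octocat",
    "payload": {"commits": 3, "branch": "main"},
})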
“Using Elasticsearch queries, we can quickly see every action the user has done,” says Pease. “This is a great way to see whether
an account has been stolen, hijacked, or whether the user has done something naughty.”
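A sketch of an audit-style query in that spirit, assuming a hypothetical events index with actor, action, and @timestamp fields:

import json
import requests

# Most recent actions by a single user, newest first.
query = {
    "query": {"term": {"actor": "some-user"}},
    "sort": [{"@timestamp": {"order": "desc"}}],
    "size": 50,
}

resp = requests.post(
    "http://localhost:9200/events/_search",
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)
for hit in resp.json()["hits"]["hits"]:
    src = hit["_source"]
    print(src["@timestamp"], src["action"])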
GitHub uses Elasticsearch’s histogram facet query capability, as well as other statistical facets, to track increases in the rate of
specific types of code exceptions. That process reveals bugs in their software systems.
“Elasticsearch’s histogram facet query capability performs extremely well. We’re looking to expand its use in that particular
application,” says Rodgers.
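The facet API referenced here was later superseded by Elasticsearch aggregations; on recent releases the same exception-rate histogram could be expressed as a date_histogram aggregation, sketched below with a hypothetical exceptions index and field names:

import json
import requests

# Hourly counts of one exception class; size 0 returns only the histogram buckets.
query = {
    "size": 0,
    "query": {"term": {"exception_class": "Timeout::Error"}},
    "aggs": {
        "exceptions_per_hour": {
            "date_histogram": {"field": "@timestamp", "calendar_interval": "hour"}
        }
    },
}

resp = requests.post(
    "http://localhost:9200/exceptions/_search",
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)
for bucket in resp.json()["aggregations"]["exceptions_per_hour"]["buckets"]:
    print(bucket["key_as_string"], bucket["doc_count"])

A spike in one bucket relative to its neighbors is the kind of rate increase described above.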
Faced with the choice of sharding its own data in Solr to handle the load or moving to Elasticsearch, GitHub found the decision easy. “We decided to move to Elasticsearch because we figured they could shard things much better than we could,” says Pease.
Elasticsearch offers automatic shard rebalancing to increase performance and handle failover conditions. Replica shards are
automatically distributed to new nodes in a cluster and, in the case of node failure, shards are automatically migrated from failed
nodes to good nodes.
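A minimal sketch of the settings behind that behavior: an index is created with a primary shard count and a replica count, and Elasticsearch decides which nodes host each copy and re-allocates them when a node fails. The index name and numbers here are illustrative, not GitHub's configuration:

import requests

# Create an index; shard placement and failover are handled by the cluster.
requests.put("http://localhost:9200/code-search", json={
    "settings": {
        "number_of_shards": 4,      # primary shards, fixed at index creation
        "number_of_replicas": 1,    # one copy of each primary on a different node
    }
})

# The replica count can be raised or lowered on a live index at any time.
requests.put("http://localhost:9200/code-search/_settings", json={
    "index": {"number_of_replicas": 2}
})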
GitHub uses Elasticsearch to index new code as soon as users push it to a repository on GitHub. The new code becomes searchable almost immediately, with results returned for public repositories and, for logged-in users, any private repositories they can access.
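A sketch of that flow under the same assumptions as above (a hypothetical code-search index and document shape); by default a newly indexed document becomes searchable after the next refresh, roughly one second later:

import json
import time
import requests

base = "http://localhost:9200/code-search"

# Index a file's contents as soon as the push has been processed.
requests.put(f"{base}/_doc/42-README.md", json={
    "repo_id": 42,
    "path": "README.md",
    "content": "Hello, Elasticsearch!",
    "public": True,
})

# Wait out the default one-second refresh interval, then search.
time.sleep(1)
resp = requests.post(
    f"{base}/_search",
    headers={"Content-Type": "application/json"},
    data=json.dumps({"query": {"match": {"content": "Elasticsearch"}}}),
)
print(resp.json()["hits"]["total"])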
To optimize access to search data, GitHub uses sharding extensively. In GitHub’s main Elasticsearch cluster, they have about 128 shards, with each shard storing about 120 gigabytes.
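Shard counts and per-shard store sizes of a running cluster can be inspected with the _cat/shards API; a sketch:

import requests

# One row per shard: index name, shard number, primary or replica, doc count, on-disk size.
resp = requests.get(
    "http://localhost:9200/_cat/shards",
    params={"v": "true", "h": "index,shard,prirep,docs,store"},
)
print(resp.text)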
To optimize search within a single repository, GitHub uses the Elasticsearch routing parameter based on the repository ID. “That
allows us to put all the source code for a single repository on one shard,” says Pease. “If you’re on just a single repository page,
and you do a search there, that search actually hits just one shard. Those queries are about twice as fast as searches from the
main GitHub search page.”
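A sketch of that routing scheme using the standard routing parameter on both indexing and search requests, with the index name and document fields as assumptions:

import json
import requests

base = "http://localhost:9200/code-search"   # hypothetical index name
repo_id = "42"

# Route every document belonging to one repository to the same shard...
requests.put(
    f"{base}/_doc/42-src-core.clj",
    params={"routing": repo_id},
    json={"repo_id": 42, "path": "src/core.clj", "content": "(defn hello [] :world)"},
)

# ...then pass the same routing value at search time so only that shard is queried.
resp = requests.post(
    f"{base}/_search",
    params={"routing": repo_id},
    headers={"Content-Type": "application/json"},
    data=json.dumps({"query": {"match": {"content": "hello"}}}),
)
print(resp.json()["hits"]["total"])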
Elastic believes getting immediate, actionable insight from data matters. As the company behind the three open source projects — Elasticsearch, Logstash,
and Kibana — designed to take data from any source and search, analyze, and visualize it in real time, Elastic is helping people make sense of data. From stock
quotes to Twitter streams, Apache logs to WordPress blogs, our products are extending what’s possible with data, delivering on the promise that good things
come from connecting the dots.