Spark Streaming
State of the Union and Beyond
Tathagata "TD" Das
@tathadas
Feb 19, 2015
Who am I?
Project Management Committee (PMC) member of Spark
Lead developer of Spark Streaming
Formerly in AMPLab, UC Berkeley
Software developer at Databricks
What is Databricks?
Founded by the creators of Spark in 2013
Largest organization contributing to Spark
End-to-end hosted service, Databricks Cloud
What is Spark Streaming?
Spark Streaming
Scalable, fault-tolerant stream processing system
High-level API: joins, windows, and more, often 5x less code (see the sketch below)
Fault-tolerant: exactly-once semantics, even for stateful ops
Integration: integrates with MLlib, SQL, DataFrames, GraphX
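For instance, a windowed count is a one-liner. A minimal sketch, assuming words is a DStream of strings as in the word count example later in this deck:

// count words over the last 30 seconds, recomputed every 10 seconds
val windowedCounts = words.map(w => (w, 1))
  .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))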
[Diagram: input sources (Kafka, Flume, Kinesis, Twitter, HDFS/S3, file systems) → Spark Streaming → outputs (file systems, databases, dashboards)]
What can you use it for?
Real-time fraud detection in transactions
React to anomalies in sensors in real-time
Find cat videos in tweets as soon as they go viral
How does it work?
Data streams are chopped up into batches
Each batch is processed in Spark
Results pushed out in batches
[Diagram: data streams → receivers → Spark Streaming (batches of input) → Spark (processing) → results]
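The batch size is fixed when the StreamingContext is created. A minimal sketch (the app name and 1-second interval are illustrative):

val conf = new SparkConf().setAppName("Demo")
val ssc = new StreamingContext(conf, Seconds(1))  // chop the stream into 1-second batches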
Streaming Word Count

// create DStream from data over a socket
val lines = ssc.socketTextStream("localhost", 9999)

// split lines into words
val words = lines.flatMap(_.split(" "))

// count the words
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)

// print some counts on screen
wordCounts.print()

// start processing the stream
ssc.start()
Word Count: Spark Streaming vs. Storm

Spark Streaming (Scala):

object NetworkWordCount {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("NetworkWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(1))
    val lines = ssc.socketTextStream("localhost", 9999)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}

Storm (Java):

public class WordCountTopology {
  public static class SplitSentence extends ShellBolt implements IRichBolt {
    public SplitSentence() {
      super("python", "splitsentence.py");
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      declarer.declare(new Fields("word"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
      return null;
    }
  }

  public static class WordCount extends BaseBasicBolt {
    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
      String word = tuple.getString(0);
      Integer count = counts.get(word);
      if (count == null)
        count = 0;
      count++;
      counts.put(word, count);
      collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      declarer.declare(new Fields("word", "count"));
    }
  }

  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout", new RandomSentenceSpout(), 5);
    builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
    builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

    Config conf = new Config();
    conf.setDebug(true);

    if (args != null && args.length > 0) {
      conf.setNumWorkers(3);
      StormSubmitter.submitTopologyWithProgressBar(args[0], conf, builder.createTopology());
    } else {
      conf.setMaxTaskParallelism(3);
      LocalCluster cluster = new LocalCluster();
      cluster.submitTopology("word-count", conf, builder.createTopology());
      Thread.sleep(10000);
      cluster.shutdown();
    }
  }
}
Languages
Can natively use Scala, Java, and Python
Can use any other language by using pipe() (see the sketch below)
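A minimal sketch of the pipe() route, assuming lines is a DStream of strings; the external command is illustrative, and any executable that reads stdin and writes stdout works:

// pipe each partition of each batch through an external process
val piped = lines.transform { rdd => rdd.pipe("wc -l") }
piped.print()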
Integrates with Spark Ecosystem
[Stack diagram: Spark SQL, Spark Streaming, MLlib, and GraphX, all built on Spark Core]
Combine batch and streaming processing
Join data streams with static data sets
// Create data set from Hadoop file
val dataset = sparkContext.hadoopFile(file)

// Join each batch in stream with the data set
kafkaStream.transform { batchRDD =>
  batchRDD.join(dataset).filter( ... )
}
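A self-contained sketch of the same pattern, with hypothetical names: events is assumed to be a DStream of (userId, event) pairs, and the profiles path is made up.

// static data set of (userId, profile) pairs
val profiles = ssc.sparkContext.textFile("hdfs:///profiles")
  .map { line => val Array(id, p) = line.split(","); (id, p) }

// join every micro-batch against the static data set
val enriched = events.transform { batchRDD =>
  batchRDD.join(profiles)
}
enriched.print()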
Combine machine learning with streaming
Learn models offline, apply them online

// Learn model offline
val model = KMeans.train(dataset, ...)

// Apply model online on stream
kafkaStream.map { event =>
  model.predict(event.feature)
}
Combine SQL with streaming
Interactively query streaming data with SQL
// Register each batch in stream as a table
kafkaStream.foreachRDD { batchRDD =>
  batchRDD.registerTempTable("latestEvents")
}

// Interactively query the table
sqlContext.sql("select * from latestEvents")
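Note: this re-registers the latest micro-batch under the same table name on every interval, so an interactive query always sees the most recent data. In this Spark 1.2-era sketch, batchRDD is assumed to be (or to have been converted to) a SchemaRDD, since registerTempTable comes from Spark SQL.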
History
Late 2011: research idea at AMPLab, UC Berkeley
("We need to make Spark faster." "Okay... umm, how??!?!")
Q2 2012: prototype. Rewrote large parts of Spark core; smallest job went from 900 ms to <50 ms
Q3 2012: Spark core improvements open sourced in Spark 0.6
Feb 2013: alpha release. 7.7k lines, merged in 7 days; released with Spark 0.7
Jan 2014: stable release. Graduated with Spark 0.9
Current state of Spark Streaming
Development
Adoption
Roadmap
What have we added in the last year?
Python API
Core functionality in Spark 1.2, with sockets and files as sources
Kafka support coming in Spark 1.3
lines = ssc.socketTextStream("localhost", 9999)
counts = lines.flatMap(lambda line: line.split(" ")) \
              .map(lambda word: (word, 1)) \
              .reduceByKey(lambda a, b: a + b)
counts.pprint()
Other sources coming in future
Streaming MLlib algorithms
Continuous learning and prediction on streaming data

StreamingLinearRegression in Spark 1.1

val model = new StreamingKMeans()
  .setK(args(3).toInt)
  .setDecayFactor(1.0)
  .setRandomCenters(args(4).toInt, 0.0)

// Apply model to DStreams
model.trainOn(trainingDStream)
model.predictOnValues(testDStream.map { lp =>
  (lp.label, lp.features) }).print()
StreamingKMeans in Spark 1.2
https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html
Other library additions
Amazon Kinesis integration [Spark 1.1]
More fault-tolerant Flume integration [Spark 1.1]
New Kafka API for more native integration [Spark 1.3] (see the sketch below)
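A minimal sketch of the Spark 1.3 direct Kafka API, assuming an ssc StreamingContext; the broker address and topic name are illustrative:

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events"))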
System Infrastructure
Automated driver fault-tolerance [Spark 1.0]
Graceful shutdown [Spark 1.0]
Write Ahead Logs for zero data loss [Spark 1.2]
(all three features sketched below)
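A minimal sketch of these three features, assuming sparkConf and ssc as in the word count example; the checkpoint path is illustrative, and createContext is a hypothetical function that builds the StreamingContext and its DStreams:

// automated driver fault-tolerance: recover the context from a checkpoint on restart
val ssc = StreamingContext.getOrCreate("hdfs:///checkpoint", createContext _)

// Write Ahead Log for zero data loss (receiver-based sources)
sparkConf.set("spark.streaming.receiver.writeAheadLog.enable", "true")

// graceful shutdown: drain received data before stopping
ssc.stop(stopSparkContext = true, stopGracefully = true)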
Contributors to Streaming
[Bar chart: contributors to Spark Streaming per release, growing steadily from Spark 0.9 to Spark 1.2 (scale 0 to 40)]
Contributors - Full Picture
[Bar chart, scale 0 to 120: contributors per release from Spark 0.9 to Spark 1.2, for Streaming alone and for Core + Streaming (without SQL, MLlib)]
All contributions to core Spark directly improve Spark Streaming
Spark Packages
More contributions from the community in spark-packages
Alternate Kafka receiver
Apache Camel receiver
Cassandra examples
http://spark-packages.org/
Who is using Spark Streaming?
Spark Summit 2014 Survey
40% of Spark users were using Spark Streaming in production or prototyping; another 39% were evaluating it.
[Pie chart: Production 9%, Prototyping 31%, Evaluating 39%, Not using 21%]
80+ known deployments
Intel China builds big data solutions for large enterprises
Multiple streaming applications for different businesses
Real-time risk analysis for a top online payment company
Real-time deal and flow metric reporting for a top online shopping company
Complicated stream processing
SQL queries on streams
Join streams with large historical datasets
> 1TB/day passing through Spark Streaming
[Pipeline: Kafka / RocketMQ → Spark Streaming on YARN → HBase]
One of the largest publishing and education companies wants to accelerate its push into digital learning
Needed to combine student activities and domain events to continuously update the learning model of each student
Earlier implementation was in Storm, but has since moved to Spark Streaming
Chose Spark Streaming because Spark combines batch, streaming, machine learning, and graph processing
[Pipeline: Kafka → Spark Streaming on YARN → Cassandra, Apache Blur]
More information: http://dbricks.co/1BnFZZ8
Leading advertising automation company with an exchange
platform for in-feed ads
Process clickstream data for optimizing real-time bidding for ads
[Pipeline: Kinesis, RabbitMQ, SQS → Spark Streaming on Mesos+Marathon → MySQL, Redis]
http://techblog.netflix.com/2015/02/whats-trending-on-netflix.html
http://goo.gl/mJNf8X
Neuroscience @ Freeman Lab, Janelia Farm
Spark Streaming and MLlib to analyze neural activities
Laser microscope scans zebrafish brain → Spark Streaming → interactive visualization → laser ZAP to kill neurons!
http://www.jeremyfreeman.net/share/talks/spark-summit-2014/
Neuroscience @ Freeman Lab, Janelia Farm
Streaming machine learning algorithms on time series data of every neuron
2 TB/hour and increasing with brain size
80 HPC nodes
Why are they adopting Spark Streaming?
Easy, high-level API
Unified API across batch and streaming
Integration with Spark SQL and MLlib
Ease of operations
What's coming next?
Beyond Spark 1.3
Libraries
Streaming machine learning algorithms
A/B testing
Online Latent Dirichlet Allocation (LDA)
More streaming linear algorithms
Streaming + SQL, Streaming + DataFrames
Beyond Spark 1.3
Operational Ease
Better flow control
Elastic scaling
Cross-version upgradability
Improved support for non-Hadoop environments
Beyond Spark 1.3
Performance
Higher throughput, especially of stateful operations
Lower latencies
Easy deployment of streaming apps in Databricks Cloud!
You can help!
Roadmaps are heavily driven by community feedback
We have listened to community demands over the last year
Write Ahead Logs for zero data loss
New Kafka integration for stronger semantics
Let us know what you want to see in Spark Streaming
Spark user mailing list, or tweet it to me @tathadas
Takeaways
Spark Streaming is a scalable, fault-tolerant stream processing system with a high-level API and a rich set of libraries
80+ known deployments in industry
More libraries and operational ease on the roadmap
Backup slides
Typesafe survey of Spark users
2,136 developers, data scientists, and other tech professionals
65% of Spark users are interested in Spark Streaming
2/3 of Spark users want to process event streams
http://java.dzone.com/articles/apache-spark-survey-typesafe-0
More use cases
Big data solution provider for enterprises
Multiple applications for different businesses
- Monitoring + optimizing online services of a Tier-1 bank
- Fraudulent transaction detection for a Tier-2 bank
Kafka → Spark Streaming → Cassandra, MongoDB
Built their own Stratio Streaming platform on Spark Streaming, Kafka, Cassandra, MongoDB
Provides data analytics solutions for Communication Service Providers
- 4 of 5 top mobile operators, 3 of 4 top internet backbone providers
- Processes >50% of all US mobile traffic
Multiple applications for different businesses
- Real-time anomaly detection in cell tower traffic
- Real-time call quality optimizations
Kafka → Spark Streaming
http://spark-summit.org/2014/talk/building-big-data-operational-intelligence-platform-with-apache-spark
Runs claims processing applications for healthcare providers
Predictive models can look for claims that are likely to be held up for approval
Spark Streaming allows model scoring in seconds instead of hours
http://searchbusinessanalytics.techtarget.com/feature/Spark-Streaming-project-looks-to-shed-new-light-on-medical-claims