Samples Revamp: Streaming by bamurtaugh · Pull Request #319 · dotnet/spark · GitHub

Samples Revamp: Streaming #319

Merged 42 commits on Nov 9, 2019
Changes from 1 commit
Commits
c07f36d
Merge pull request #1 from dotnet/master
bamurtaugh Oct 9, 2019
4813b69
Merge pull request #2 from dotnet/master
bamurtaugh Oct 28, 2019
b0b2fee
Add readmes
bamurtaugh Oct 29, 2019
1944dc8
Update links in readme
bamurtaugh Oct 29, 2019
f0fe4f2
Moving general readmes to other branch
bamurtaugh Oct 29, 2019
e5a19e8
Move general readmes to other branch
bamurtaugh Oct 29, 2019
90262b4
Update streaming links and content
bamurtaugh Oct 29, 2019
c22feba
Update links
bamurtaugh Oct 29, 2019
375ae95
Merge branch 'master' into newsamples-stream
bamurtaugh Oct 29, 2019
c67f5be
Formatting for class/method names
bamurtaugh Oct 31, 2019
7b5b0cc
Add specific return type instead of var
bamurtaugh Oct 31, 2019
3c0021e
Move period inside asterisks
bamurtaugh Oct 31, 2019
ecc9191
Merge branch 'master' into newsamples-stream
bamurtaugh Nov 1, 2019
f78a74a
Merge branch 'master' into newsamples-stream
bamurtaugh Nov 4, 2019
dc7a2b2
Fix spacing
bamurtaugh Nov 7, 2019
4ed5cb9
Spacing
bamurtaugh Nov 7, 2019
4c81eaf
Merge branch 'master' into newsamples-stream
bamurtaugh Nov 7, 2019
45e76cb
Spacing
bamurtaugh Nov 7, 2019
5d17e49
Fix spacing in code snippets
bamurtaugh Nov 7, 2019
50b51e4
Add modified word count example
bamurtaugh Nov 7, 2019
dbac77f
Update udf explanation
bamurtaugh Nov 7, 2019
15f6060
Explain all samples
bamurtaugh Nov 7, 2019
7a76488
netcat context
bamurtaugh Nov 7, 2019
d2ec62f
Update code snippets, explanations
bamurtaugh Nov 7, 2019
0f23bb2
Wording
bamurtaugh Nov 7, 2019
f5de3fd
Update spark-submit
bamurtaugh Nov 7, 2019
2b64ae2
Relate back to udf streaming example
bamurtaugh Nov 7, 2019
a8e11ab
Fix indentation
bamurtaugh Nov 7, 2019
2ed31b8
Grammar
bamurtaugh Nov 7, 2019
10da44f
Update name of new sample
bamurtaugh Nov 7, 2019
d38854e
Update file name
bamurtaugh Nov 7, 2019
1371572
Add Sql.Streaming using
bamurtaugh Nov 7, 2019
8d59ab6
Merge branch 'newsamples-stream' of https://github.com/bamurtaugh/spa…
bamurtaugh Nov 7, 2019
13a58e2
Improve string output
bamurtaugh Nov 7, 2019
5586732
Fix readme code spacing
bamurtaugh Nov 7, 2019
e03d2d7
Shorten class ref in readme
bamurtaugh Nov 7, 2019
b7ec62c
Sort usings, add missing bracket
bamurtaugh Nov 7, 2019
4e9987c
Update comment
bamurtaugh Nov 7, 2019
38d4b08
Change netcat command to powershell id
bamurtaugh Nov 7, 2019
44fbf3d
Update output picture
bamurtaugh Nov 8, 2019
e268444
Merge branch 'master' into newsamples-stream
bamurtaugh Nov 8, 2019
cbdf3b9
Update spark-submit command
bamurtaugh Nov 8, 2019
Update code snippets, explanations
bamurtaugh authored Nov 7, 2019
commit d2ec62f8afbe9c7cbb0256716aaadd08f087e1a5
11 changes: 6 additions & 5 deletions examples/Microsoft.Spark.CSharp.Examples/Sql/Streaming/README.md
@@ -80,21 +80,22 @@ For example, entering *Hello world* in the terminal would produce an array where

This is just an example of how you can use UDFs to further modify and analyze your data, even live as it's being streamed in!

-### 4. Use Spark SQL
+### 4. Use SparkSQL

-Next, we'll use Spark SQL to make SQL calls on our data. It's common to combine UDFs and Spark SQL so that we can apply a UDF to each
-row of our DataFrame.
+Next, we'll use SparkSQL to perform various functions on the data stored in our DataFrame. It's common to combine UDFs and SparkSQL so that we can apply a UDF to each row of our DataFrame.

```CSharp
-DataFrame sqlDf = spark.Sql("SELECT WordsEdit.value, MyUDF(WordsEdit.value) FROM WordsEdit");
+DataFrame arrayDF = lines.Select(Explode(udfArray(lines["value"])));
```
+
+In the above code snippet from [StructuredNetworkWordCountUDF.cs](StructuredNetworkWordCountUDF.cs), we apply *udfArray* to each value in our DataFrame (which represents each string read in from our netcat terminal). We then apply the SparkSQL method `Explode` to put each entry of our array in its own row. Finally, we use `Select` to place the columns we've produced in the new DataFrame *arrayDF.*
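
To picture what "put each entry of our array in its own row" means, here is a minimal sketch in plain C# that mimics the behavior with LINQ instead of the Spark API. It is an analogy only: `ToArray` is a hypothetical stand-in for *udfArray* (its exact output is an assumption, not taken from the sample), and `SelectMany` plays the role that `Explode` plays on a DataFrame column.

```CSharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExplodeSketch
{
    // Hypothetical stand-in for udfArray: turns one input string into an array.
    // The real UDF in the sample may produce different entries.
    public static string[] ToArray(string line) =>
        new[] { line, $"{line} {line.Length}" };

    // LINQ analogue of Select(Explode(udfArray(...))):
    // every array element becomes its own output row.
    public static List<string> Explode(IEnumerable<string> lines) =>
        lines.SelectMany(ToArray).ToList();

    public static void Main()
    {
        foreach (string row in Explode(new[] { "Hello world", "Spark" }))
        {
            Console.WriteLine(row);
        }
        // Prints:
        // Hello world
        // Hello world 11
        // Spark
        // Spark 5
    }
}
```

Two input strings yield four rows here because each array has two entries. In the streaming sample the same flattening happens per micro-batch, on a DataFrame column rather than an in-memory list.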

### 5. Display Your Stream

We can use `DataFrame.WriteStream()` to establish characteristics of our output, such as printing our results to the console and only displaying the most recent output and not all of our previous output as well.

```CSharp
-Spark.Sql.Streaming.StreamingQuery query = sqlDf
+Spark.Sql.Streaming.StreamingQuery query = arrayDf
.WriteStream()
.Format("console")
.Start();