Updates the Github documentation for v1.0 release. by Niharikadutta · Pull Request #712 · dotnet/spark · GitHub

Updates the Github documentation for v1.0 release. #712

Open
wants to merge 59 commits into
base: main
59 commits
03b7939
Adding section for UDF serialization
Niharikadutta Apr 20, 2020
4ef693d
removing guides from master
Niharikadutta Apr 20, 2020
81145ca
Merge latest from master
Niharikadutta May 6, 2020
e4b81af
merging latest from master
Niharikadutta May 7, 2020
4c32173
Merge remote-tracking branch 'upstream/master'
Niharikadutta Jun 2, 2020
4987a09
Merge remote-tracking branch 'upstream/master'
Niharikadutta Jun 14, 2020
ca9612e
Merge remote-tracking branch 'upstream/master'
Niharikadutta Jun 16, 2020
f581c86
Merge remote-tracking branch 'upstream/master'
Niharikadutta Jun 20, 2020
086b325
Merge remote-tracking branch 'upstream/master'
Niharikadutta Jun 23, 2020
2f72907
Merge remote-tracking branch 'upstream/master'
Niharikadutta Jul 25, 2020
6bab996
CountVectorizer
Jul 27, 2020
e2a566b
moving private methods to bottom
Jul 27, 2020
5f682a6
changing wrap method
Jul 28, 2020
31371db
setting min version required
Jul 31, 2020
60eb82f
undoing csproj change
Jul 31, 2020
ed36375
member doesnt need to be internal
Jul 31, 2020
c7baf72
too many lines
Jul 31, 2020
d13303c
removing whitespace change
Jul 31, 2020
f5b477c
removing whitespace change
Jul 31, 2020
73db52b
ionide
Jul 31, 2020
98f5e4d
Merge remote-tracking branch 'upstream/master'
Niharikadutta Aug 7, 2020
4c5d502
Merge remote-tracking branch 'upstream/master'
Niharikadutta Aug 10, 2020
a766146
Merge branch 'master' into ml/countvectorizer
GoEddie Aug 12, 2020
ad6bced
Merge branch 'ml/countvectorizer' of https://github.com/GoEddie/spark
Niharikadutta Aug 13, 2020
8e1685c
Revert "Merge branch 'master' into ml/countvectorizer"
Niharikadutta Aug 13, 2020
255515e
Revert "Merge branch 'ml/countvectorizer' of https://github.com/GoEdd…
Niharikadutta Aug 13, 2020
a44c882
Merge remote-tracking branch 'upstream/master'
Niharikadutta Aug 14, 2020
3c2c936
fixing merge errors
Niharikadutta Aug 14, 2020
88e834d
removing ionid
Niharikadutta Aug 20, 2020
59e7299
Merge remote-tracking branch 'upstream/master'
Niharikadutta Aug 20, 2020
a13de2d
Merge branch 'master' of github.com:Niharikadutta/spark
Niharikadutta Aug 21, 2020
13d0e4a
Merge remote-tracking branch 'upstream/master'
Niharikadutta Aug 24, 2020
595b141
Merge remote-tracking branch 'upstream/master'
Niharikadutta Aug 29, 2020
decfa48
Merge remote-tracking branch 'upstream/master'
Niharikadutta Sep 2, 2020
ce694ff
Merge remote-tracking branch 'upstream/master'
Niharikadutta Sep 8, 2020
8128ba0
Merge remote-tracking branch 'upstream/master'
Niharikadutta Sep 12, 2020
52f0a74
Merge remote-tracking branch 'upstream/master'
Niharikadutta Sep 19, 2020
6a89f01
Merge remote-tracking branch 'upstream/master'
Niharikadutta Sep 24, 2020
4b1de41
Merge remote-tracking branch 'upstream/master'
Niharikadutta Sep 25, 2020
929d8e2
Merge remote-tracking branch 'upstream/master'
Niharikadutta Sep 26, 2020
ffa0a4d
Merge remote-tracking branch 'upstream/master'
Niharikadutta Oct 2, 2020
2579faa
Merge remote-tracking branch 'upstream/master'
Niharikadutta Oct 5, 2020
39b3950
first draft
Niharikadutta Oct 5, 2020
2297add
Merge remote-tracking branch 'upstream/master'
Niharikadutta Oct 6, 2020
daade7a
Merge remote-tracking branch 'upstream/master'
Niharikadutta Oct 8, 2020
cb6aa7a
Merge remote-tracking branch 'upstream/master'
Niharikadutta Oct 12, 2020
cbe6e50
Merge branch 'master' of github.com:Niharikadutta/spark
Niharikadutta Oct 12, 2020
3a04b19
Merge remote-tracking branch 'upstream/master'
Niharikadutta Oct 12, 2020
9377692
removing duplicate docs
Niharikadutta Oct 12, 2020
1295934
reverting table changes
Niharikadutta Oct 12, 2020
7497bd7
changes
Niharikadutta Oct 12, 2020
2c498dc
Merge remote-tracking branch 'upstream/master'
Niharikadutta Oct 13, 2020
d19cfb6
Merge remote-tracking branch 'upstream/master'
Niharikadutta Oct 16, 2020
d34188e
Merge branch 'master' of github.com:Niharikadutta/spark
Niharikadutta Oct 16, 2020
5457ffb
Merge remote-tracking branch 'upstream/master'
Niharikadutta Oct 26, 2020
2a44453
Merge remote-tracking branch 'upstream/master'
Niharikadutta Nov 4, 2020
3ec9756
Merge remote-tracking branch 'upstream/master'
Niharikadutta Nov 12, 2020
144233b
Merge remote-tracking branch 'upstream/master'
Niharikadutta Nov 18, 2020
ff38239
resolving merge conflicts
Niharikadutta Nov 18, 2020
first draft
Niharikadutta committed Oct 5, 2020
commit 39b3950f50db5d96037a666e375fe1af198e967d
26 changes: 21 additions & 5 deletions README.md
@@ -8,7 +8,7 @@

.NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write .NET code allowing you to reuse all the knowledge, skills, code, and libraries you already have as a .NET developer.

-.NET for Apache Spark runs on Windows, Linux, and macOS using .NET Core, or Windows using .NET Framework. It also runs on all major cloud providers including [Azure HDInsight Spark](deployment/README.md#azure-hdinsight-spark), [Amazon EMR Spark](deployment/README.md#amazon-emr-spark), [AWS](deployment/README.md#databricks) & [Azure](deployment/README.md#databricks) Databricks.
+.NET for Apache Spark runs on Windows, Linux, and macOS using .NET Core, or Windows using .NET Framework. It also runs on all major cloud providers including [Azure HDInsight Spark](deployment/README.md#azure-hdinsight-spark), [Amazon EMR Spark](deployment/README.md#amazon-emr-spark), [AWS](deployment/README.md#databricks), [Azure Databricks](deployment/README.md#databricks) & [Azure Synapse Analytics](https://azure.microsoft.com/en-us/services/synapse-analytics/).

**Note**: We currently have a Spark Project Improvement Proposal JIRA at [SPIP: .NET bindings for Apache Spark](https://issues.apache.org/jira/browse/SPARK-27006) to work with the community towards getting .NET support by default into Apache Spark. We highly encourage you to participate in the discussion.

@@ -39,7 +39,7 @@
<tbody align="center">
<tr>
<td >2.3.*</td>
-<td rowspan=6><a href="https://github.com/dotnet/spark/releases/tag/v0.12.1">v0.12.1</a></td>
+<td rowspan=6><a href="https://github.com/dotnet/spark/releases/tag/v1.0">v1.0</a></td>
</tr>
<tr>
<td>2.4.0</td>
@@ -56,6 +56,18 @@
<tr>
<td>2.4.5</td>
</tr>
+<tr>
+<td>2.4.6</td>
+</tr>
+<tr>
+<td>2.4.7</td>
+</tr>
+<tr>
+<td>3.0.0</td>
+</tr>
+<tr>
+<td>3.0.1</td>
+</tr>
<tr>
<td>2.4.2</td>
<td><a href="https://github.com/dotnet/spark/issues/60">Not supported</a></td>
@@ -69,9 +81,9 @@

## Get Started
These instructions will show you how to run a .NET for Apache Spark app using .NET Core.
-- [Windows Instructions](docs/getting-started/windows-instructions.md)
-- [Ubuntu Instructions](docs/getting-started/ubuntu-instructions.md)
-- [MacOs Instructions](docs/getting-started/macos-instructions.md)
+- [Windows Instructions](https://docs.microsoft.com/en-us/dotnet/spark/tutorials/get-started?tabs=windows)
+- [Ubuntu Instructions](https://docs.microsoft.com/en-us/dotnet/spark/tutorials/get-started?tabs=linux)
+- [MacOs Instructions](https://docs.microsoft.com/en-us/dotnet/spark/tutorials/get-started?tabs=linux)

## Build Status

@@ -155,6 +167,10 @@ We welcome contributions to both categories!
</tr>
</table>

+## Learn More
+
+To learn more about some features of .NET for Apache Spark, please visit [this compilation of How-To guides](docs/how-to-guides.md).
+
## Contributing

We welcome contributions! Please review our [contribution guide](CONTRIBUTING.md).
92 changes: 0 additions & 92 deletions docs/broadcast-guide.md

This file was deleted.

25 changes: 13 additions & 12 deletions docs/building/ubuntu-instructions.md
@@ -3,7 +3,7 @@ Building Spark .NET on Ubuntu 18.04

# Table of Contents
- [Open Issues](#open-issues)
-- [Pre-requisites](#pre-requisites)
+- [Prerequisites](#prerequisites)
- [Building](#building)
- [Building Spark .NET Scala Extensions Layer](#building-spark-net-scala-extensions-layer)
- [Building .NET Sample Applications using .NET Core CLI](#building-net-sample-applications-using-net-core-cli)
@@ -12,17 +12,17 @@ Building Spark .NET on Ubuntu 18.04
# Open Issues:
- [Building through Visual Studio Code]()

-# Pre-requisites:
+# Prerequisites:

-If you already have all the pre-requisites, skip to the [build](ubuntu-instructions.md#building) steps below.
+If you already have all the prerequisites, skip to the [build](ubuntu-instructions.md#building) steps below.

1. Download and install **[.NET Core 3.1 SDK](https://dotnet.microsoft.com/download/dotnet-core/3.1)** - installing the SDK will add the `dotnet` toolchain to your path.
-2. Install **[OpenJDK 8](https://openjdk.java.net/install/)**
+2. Install **[OpenJDK 8](https://openjdk.java.net/install/)** .
- You can use the following command:
```bash
sudo apt install openjdk-8-jdk
```
-- Verify you are able to run `java` from your command-line
+- Verify you are able to run `java` from your command-line.
<details>
<summary>&#x1F4D9; Click to see sample java -version output</summary>

@@ -49,7 +49,7 @@ If you already have all the pre-requisites, skip to the [build](ubuntu-instructi
```

Note that these environment variables will be lost when you close your terminal. If you want the changes to be permanent, add the `export` lines to your `~/.bashrc` file.
-- Verify you are able to run `mvn` from your command-line
+- Verify you are able to run `mvn` from your command-line.
<details>
<summary>&#x1F4D9; Click to see sample mvn -version output</summary>

@@ -61,8 +61,8 @@ If you already have all the pre-requisites, skip to the [build](ubuntu-instructi
OS name: "linux", version: "4.4.0-142-generic", arch: "amd64", family: "unix"
```
4. Install **[Apache Spark 2.3+](https://spark.apache.org/downloads.html)**
-- Download [Apache Spark 2.3+](https://spark.apache.org/downloads.html) and extract it into a local folder (e.g., `~/bin/spark-2.3.2-bin-hadoop2.7`)
-- Add the necessary [environment variables](https://www.java.com/en/download/help/path.xml) `SPARK_HOME` e.g., `~/bin/spark-2.3.2-bin-hadoop2.7/`
+- Download [Apache Spark 2.3+](https://spark.apache.org/downloads.html) and extract it into a local folder (e.g., `~/bin/spark-2.3.2-bin-hadoop2.7`).
+- Add the necessary [environment variables](https://www.java.com/en/download/help/path.xml) `SPARK_HOME` to point to the local directory where you installed Apache Spark e.g., `~/bin/spark-2.3.2-bin-hadoop2.7/`.
```bash
export SPARK_HOME=~/bin/spark-2.3.2-bin-hadoop2.7
export PATH="$SPARK_HOME/bin:$PATH"
@@ -96,15 +96,15 @@ Please make sure you are able to run `dotnet`, `java`, `mvn`, `spark-shell` from

# Building

-For the rest of the section, it is assumed that you have cloned Spark .NET repo into your machine e.g., `~/dotnet.spark/`
+For the rest of the section, it is assumed that you have cloned Spark .NET repo into your machine e.g., `~/dotnet.spark/`.

```
git clone https://github.com/dotnet/spark.git ~/dotnet.spark
```

## Building Spark .NET Scala Extensions Layer

-When you submit a .NET application, Spark .NET has the necessary logic written in Scala that inform Apache Spark how to handle your requests (e.g., request to create a new Spark Session, request to transfer data from .NET side to JVM side etc.). This logic can be found in the [Spark .NET Scala Source Code](../../src/scala).
+When you submit a .NET application, Spark .NET has the necessary logic written in Scala that informs Apache Spark how to handle your requests (e.g., request to create a new Spark Session, request to transfer data from .NET side to JVM side etc.). This logic can be found in the [Spark .NET Scala Source Code](../../src/scala).

Let us now build the Spark .NET Scala extension layer. This is easy to do:

@@ -164,14 +164,15 @@ You should see JARs created for the supported Spark versions:

# Run Samples

-Once you build the samples, you can use `spark-submit` to submit your .NET Core apps. Make sure you have followed the [pre-requisites](#pre-requisites) section and installed Apache Spark.
+Once you build the samples, you can use `spark-submit` to submit your .NET Core apps. Make sure you have followed the [prerequisites](#prerequisites) section and installed Apache Spark.
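
As a quick preflight before submitting, the sketch below reports which of the tools this guide expects (`dotnet`, `java`, `mvn`, `spark-shell`) cannot be found on `PATH`. The helper name `missing_tools` is hypothetical and is not part of the dotnet/spark repository; treat this as one possible check rather than an official step.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of dotnet/spark): print each named tool
# that cannot be found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools this guide expects to be runnable from the command line.
missing="$(missing_tools dotnet java mvn spark-shell)"
if [ -n "$missing" ]; then
  printf 'Missing from PATH:\n%s\n' "$missing"
else
  echo "All prerequisite tools were found on PATH."
fi
```

An empty result from `missing_tools` means every listed tool resolved; it does not confirm that the installed versions match the ones this guide was written against.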

1. Set the `DOTNET_WORKER_DIR` or `PATH` environment variable to include the path where the `Microsoft.Spark.Worker` binary has been generated (e.g., `~/dotnet.spark/artifacts/bin/Microsoft.Spark.Worker/Debug/netcoreapp3.1/linux-x64/publish`)
2. Open a terminal and go to the directory where your app binary has been generated (e.g., `~/dotnet.spark/artifacts/bin/Microsoft.Spark.CSharp.Examples/Debug/netcoreapp3.1/linux-x64/publish`)
3. Running your app follows the basic structure:
```bash
spark-submit \
[--jars <any-jars-your-app-is-dependent-on>] \
--conf <any-custom-spark-config>=<config-value> \
--class org.apache.spark.deploy.dotnet.DotnetRunner \
--master local \
<path-to-microsoft-spark-jar> \
@@ -214,4 +215,4 @@ Once you build the samples, you can use `spark-submit` to submit your .NET Core
./Microsoft.Spark.CSharp.Examples Sql.Streaming.StructuredKafkaWordCount localhost:9092 subscribe test
```

-Feel this experience is complicated? Help us by taking up [Simplify User Experience for Running an App](https://github.com/dotnet/spark/issues/6)
+Feel this experience is complicated? Help us by taking up [Simplify User Experience for Running an App](https://github.com/dotnet/spark/issues/6).
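
To make the `spark-submit` structure from the Run Samples steps concrete, here is a minimal dry-run sketch that only prints the command line instead of executing it. The function name `build_submit_cmd` and the jar file name `microsoft-spark.jar` are hypothetical placeholders, and the sketch assumes the app binary and its arguments follow the jar, as the `StructuredKafkaWordCount` example suggests. The worker path is the example path from step 1 and may differ on your machine.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of dotnet/spark): print the spark-submit
# command line for a .NET app without running it.
# Usage: build_submit_cmd <microsoft-spark-jar> <app-binary> [app-args...]
build_submit_cmd() {
  local jar="$1" app="$2"
  shift 2
  local cmd=(spark-submit
    --class org.apache.spark.deploy.dotnet.DotnetRunner
    --master local
    "$jar" "$app" "$@")
  # Join with spaces for inspection only; quoting of arguments is not preserved.
  echo "${cmd[*]}"
}

# Worker location taken from the example path in step 1 (an assumption about your layout).
export DOTNET_WORKER_DIR="$HOME/dotnet.spark/artifacts/bin/Microsoft.Spark.Worker/Debug/netcoreapp3.1/linux-x64/publish"

# Dry run: microsoft-spark.jar is a placeholder, not a real artifact name.
build_submit_cmd microsoft-spark.jar ./Microsoft.Spark.CSharp.Examples \
  Sql.Streaming.StructuredKafkaWordCount localhost:9092 subscribe test
```

Printing the command first makes it easy to review the jar path and app arguments before running the real `spark-submit`.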