forked from JohnSnowLabs/spark-nlp
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathcontribute.html
196 lines (192 loc) · 10.7 KB
/
contribute.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
<!DOCTYPE html>
<!--[if IE 8]>
<html lang="en" class="ie8"> <![endif]-->
<!--[if IE 9]>
<html lang="en" class="ie9"> <![endif]-->
<!--[if !IE]><!-->
<html lang="en"> <!--<![endif]-->
<head>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-70312582-2"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() {
dataLayer.push(arguments);
}
gtag('js', new Date());
gtag('config', 'UA-70312582-2');
</script>
<title>Spark NLP - Contribute</title>
<!-- Meta -->
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="">
<meta name="author" content="">
<link rel="shortcut icon" href="fav.ico">
<link href='https://fonts.googleapis.com/css?family=Open+Sans:300italic,400italic,600italic,700italic,800italic,400,300,600,700,800'
rel='stylesheet' type='text/css'>
<!-- Global CSS -->
<link rel="stylesheet" href="assets/plugins/bootstrap/css/bootstrap.min.css">
<!-- Plugins CSS -->
<link rel="stylesheet" href="assets/plugins/font-awesome/css/font-awesome.css">
<link rel="stylesheet" href="assets/plugins/prism/prism.css">
<link rel="stylesheet" href="assets/plugins/lightbox/dist/ekko-lightbox.min.css">
<link rel="stylesheet" href="assets/plugins/elegant_font/css/style.css">
<!-- Theme CSS -->
<link id="theme-style" rel="stylesheet" href="assets/css/styles.css">
<!-- HTML5 shim and Respond.js IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<script type="text/javascript" src="assets/plugins/jquery-1.12.3.min.js"></script>
<script>
function getTasks() {
$.get("https://api.github.com/repos/JohnSnowLabs/spark-nlp/commits?path=docs/contribute.html",
function (data) {
var dateObject = new Date(data[0].commit.author.date);
$(".wrapper").html(dateObject.toDateString());
});
}
getTasks();
</script>
</head>
<body class="body-purple">
<script>
$(function () {
$("#includedHeader").load("header.html");
$("#includedFooter").load("footer.html");
});
</script>
<div class="page-wrapper">
<header id="header" class="header">
<div class="container">
<div id="includedHeader"></div>
<ol class="breadcrumb">
<li><a href="index.html">Home</a></li>
<li class="active">Contribute</li>
</ol>
</div>
</header>
<div style="border:1px solid #e7e7e7;"></div>
<div class="doc-wrapper">
<div class="container">
<div id="doc-header" class="doc-header text-center">
<h1 class="doc-title"><span aria-hidden="true" class="icon icon_datareport_alt"></span> SparkNLP - Contribute
</h1>
<div class="meta"><i class="fa fa-clock-o"></i> Last updated: <span class="wrapper"></span></div>
</div><!--//doc-header-->
<div class="doc-body">
<div class="doc-content">
<div class="content-inner">
<section id="features" class="doc-section">
<h2 class="section-title">Contribute</h2>
<div class="section-block">
<div class="answer">
<p>Refer to <b>our GitHub</b> page to take a look at the <a
href="https://github.com/JohnSnowLabs/spark-nlp/issues">GH Issues</a>, as
the project is yet small. You can create in there your own issues to either work
on them yourself or simply propose them.<br><br>
Feel free to clone the repository locally and submit pull requests so we can
review them and work together.<br>
<ul>
<li>feedback, ideas and bug reports</li>
<li>testing and development</li>
<li>training and testing nlp corpora</li>
<li>documentation and research</li>
</ul>
Help is always welcome, for any further questions, contact <a
href="mailto:nlp@johnsnowlabs.com" target="_blank">nlp@johnsnowlabs.com</a>.</p>
</div>
</div>
</section>
<section id="create-annotator-section" class="doc-section">
<h2 class="section-title">Your own annotator model</h2>
<div class="section-block">
<p>
Creating your first annotator transformer should not be hard,
here are a few guidelines to get you started.
Lets assume we want a wrapper annotator, which puts a character
surrounding tokens provided by a Tokenizer
<p>
</div>
<div id="anno-create-example" class="section-block">
<div class="code-block">
<h3 class="block-title">WordWrapper</h3>
<p>
uid is utilized for transformer serialization,
AnnotatorModel[MyAnnotator] will contain the common annotator logic
We need to use standard constructor for java and python compatibility
</p>
<pre><code class="language-python">class WordWrapper(override val uid: String) extends AnnotatorModel[WordWrapper] {
def this() = this(Identifiable.randomUID("WORD_WRAPPER"))
}</code></pre>
</div>
<div class="code-block">
<h3 class="block-title">Annotator attributes</h3>
<p>
We need to establish the type of our annotator, which means its identifier when
used by another annotator
And also need to define which other annotators we require, for this example,
we will work with tokens
</p>
<pre><code class="language-python">import com.johnsnowlabs.nlp.AnnotatorType._
override val annotatorType: AnnotatorType = TOKEN
override val requiredAnnotatorTyp
6D4C
es: Array[AnnotatorType] = Array[AnnotatorType](TOKEN)</code></pre>
</div>
<div class="code-block">
<h3 class="block-title">Annotator parameters</h3>
<p>
This annotator is not flexible if we don't provide parameters
</p>
<pre><code class="language-python">protected val character: Param[String] = new Param(this, "character", "this is the character used to wrap a token")
def setCharacter(value: String): this.type = set(pattern, value)
def getCharacter: String = $(pattern)
setDefault(character, "@")
</code></pre>
</div>
<div class="code-block">
<h3 class="block-title">Annotator logic</h3>
<p>
Here is how we act, annotations will automatically provide our required
annotations
We generally use annotatorType for metadata keys
</p>
<pre><code class="language-python">override def annotate(annotations: Seq[Annotation]): Seq[Annotation] = {
annotations.map(annotation => {
Annotation(
annotatorType,
annotation.begin,
annotation.end,
Map(annotatorType -> $(character) + annotation.result + $(character))
})
}</code></pre>
</div>
</div>
</section>
</div>
</div>
<div class="doc-sidebar">
<nav id="doc-nav">
<ul id="doc-menu" class="nav doc-menu hidden-xs" data-spy="affix">
<li><a class="scrollto" href="#features">Contribute</a></li>
</ul>
</nav>
</div>
</div>
</div>
</div>
</div>
<footer id="footer" class="footer text-center">
<div id="includedFooter"></div>
</footer>
<!-- Main Javascript -->
<script type="text/javascript" src="assets/plugins/bootstrap/js/bootstrap.min.js"></script>
<script type="text/javascript" src="assets/plugins/prism/prism.js"></script>
<script type="text/javascript" src="assets/plugins/jquery-scrollTo/jquery.scrollTo.min.js"></script>
<script type="text/javascript" src="assets/plugins/lightbox/dist/ekko-lightbox.min.js"></script>
<script type="text/javascript" src="assets/plugins/jquery-match-height/jquery.matchHeight-min.js"></script>
<script type="text/javascript" src="assets/js/main.js"></script>
</body>
</html>