This is a Neo4j unmanaged extension used for document and text classification.
- Scores ~90% accuracy on the Cornell Movie Review dataset using logistic regression.
- Scores ~80% accuracy on the Stanford Large Movie Review dataset using logistic regression.
The compiled extension is available from the `bin` directory.
- To build it:

  ```bash
  cd src/extension
  mvn assembly:assembly -DdescriptorId=jar-with-dependencies
  ```
- Copy `src/extension/target/graphify-1.0.0-jar-with-dependencies.jar` to the `plugins/` directory of your Neo4j server.
- Configure Neo4j by adding a line to `conf/neo4j-server.properties`:

  ```
  org.neo4j.server.thirdparty_jaxrs_classes=org.neo4j.nlp.ext=/service
  ```
- Start the Neo4j server.
- Query it over HTTP.
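If you script the installation, the configuration step above can be made idempotent so that re-running the script does not add the property twice. The Python sketch below is not part of the extension: the function name `ensure_extension_configured` is hypothetical, and it assumes you pass the path of your server's `conf/neo4j-server.properties`. It only checks for an identical line, so it does not merge with an existing `thirdparty_jaxrs_classes` entry that mounts another extension.

```python
from pathlib import Path

# Property line quoted from the setup steps: mounts the extension's JAX-RS
# classes (package org.neo4j.nlp.ext) under the /service path.
EXTENSION_PROPERTY = (
    "org.neo4j.server.thirdparty_jaxrs_classes=org.neo4j.nlp.ext=/service"
)


def ensure_extension_configured(conf_path: Path) -> bool:
    """Append EXTENSION_PROPERTY to conf_path unless an identical line exists.

    Hypothetical helper. Returns True if the file was modified and False if
    it was already configured.
    """
    existing = conf_path.read_text(encoding="utf-8") if conf_path.exists() else ""
    if EXTENSION_PROPERTY in (line.strip() for line in existing.splitlines()):
        return False  # already configured; leave the file untouched
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with conf_path.open("a", encoding="utf-8") as f:
        f.write(prefix + EXTENSION_PROPERTY + "\n")
    return True
```

Run it before starting the server; a `False` return means no change was needed.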
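#### Get similar labels:

```bash
curl http://localhost:7474/service/graphify/similar/{label}
```

Replace `{label}` with the URL-encoded label name, for example `Document%20classification`.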
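Labels containing spaces or other reserved characters have to be percent-encoded in the path, as in the `Document%20classification` request. A minimal Python sketch, assuming the default `localhost:7474` address and the `/service` mount point from the configuration step; `BASE_URL` and `similar_url` are illustrative names, not part of the extension:

```python
from urllib.parse import quote

# Assumption: default host/port and the /service mount from the config step.
BASE_URL = "http://localhost:7474/service/graphify"


def similar_url(label: str) -> str:
    """Build the GET URL for labels similar to `label` (hypothetical helper).

    quote(..., safe="") encodes spaces as %20 and also encodes '/', keeping
    the label in a single path segment; whether the server accepts labels
    containing '/' is not covered by this README.
    """
    return f"{BASE_URL}/similar/{quote(label, safe='')}"
```

Pass the result to `curl` or to `urllib.request.urlopen`.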
#### Train the natural language recognition model on text about 'Document classification':

```bash
curl -H "Content-Type: application/json" \
  -d '{"label": ["Document classification"], "text": ["Documents may be classified according to their subjects or according to other attributes (such as document type, author, printing year etc.). In the rest of this article only subject classification is considered. There are two main philosophies of subject classification of documents: The content based approach and the request based approach."]}' \
  http://localhost:7474/service/graphify/training
```
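The training call can also be issued from code. This Python sketch builds the same request as the `curl` example using only the standard library; `BASE_URL` and `training_request` are hypothetical names and the host and port are assumed defaults. How the server pairs several labels with several texts is not described here, so the sketch simply forwards both lists as JSON arrays, matching the example body.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:7474/service/graphify"  # assumed default host/port


def training_request(labels: list[str], texts: list[str]) -> Request:
    """Build (but do not send) a POST to /training (hypothetical helper).

    Mirrors the curl example: both "label" and "text" are JSON arrays.
    """
    body = json.dumps({"label": labels, "text": texts}).encode("utf-8")
    return Request(
        f"{BASE_URL}/training",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(training_request(["Document classification"], [text]))` against a running server.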
#### Classify an unlabeled text:

```bash
curl -H "Content-Type: application/json" \
  -d '{"text": "A document is a written or drawn representation of thoughts. Originating from the Latin Documentum meaning lesson - the verb means to teach, and is pronounced similarly, in the past it was usually used as a term for a written proof used as evidence."}' \
  http://localhost:7474/service/graphify/classify
```
#### Get a list of the extracted semantic features matching a text:

```bash
curl -H "Content-Type: application/json" \
  -d '{"text": "A document is a written or drawn representation of thoughts. Originating from the Latin Documentum meaning lesson - the verb means to teach, and is pronounced similarly, in the past it was usually used as a term for a written proof used as evidence."}' \
  http://localhost:7474/service/graphify/extractfeatures
```
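The `classify` and `extractfeatures` calls share one request shape: a JSON object whose `text` field is a single string, unlike `training`, which takes arrays. The Python sketch below builds that request with the standard library. `BASE_URL`, `text_request`, and `post_text` are hypothetical names, the host and port are assumed defaults, and the README does not document the response fields for these endpoints, so `post_text` just returns the decoded JSON.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:7474/service/graphify"  # assumed default host/port


def text_request(endpoint: str, text: str) -> Request:
    """Build (but do not send) a POST whose body is {"text": <string>}."""
    if endpoint not in ("classify", "extractfeatures"):
        raise ValueError(f"unexpected endpoint: {endpoint!r}")
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        f"{BASE_URL}/{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_text(endpoint: str, text: str):
    """Send the request to a running server and decode the JSON reply.

    The response structure is not specified in this README.
    """
    with urlopen(text_request(endpoint, text)) as resp:
        return json.load(resp)
```

For example, `post_text("classify", "A document is a written or drawn representation of thoughts.")` requires the server to be running.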
#### Get a sorted list of labels that are most related to the label 'Document classification':

```bash
curl http://localhost:7474/service/graphify/similar/Document%20classification
```
##### Example response:

```json
{
  "classes": [
    {
      "class": "Document",
      "similarity": 0.19563160874988336
    },
    {
      "class": "Intelligence",
      "similarity": 0.1778887274627789
    },
    {
      "class": "Machine learning",
      "similarity": 0.14800216450227222
    },
    {
      "class": "Data",
      "similarity": 0.1467923282078174
    },
    {
      "class": "Memory",
      "similarity": 0.14600346713601134
    }
  ]
}
```
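A client that only needs the strongest matches can rank the `classes` array itself. This Python sketch assumes the response shape in the example (a `classes` list of objects with `class` and `similarity` keys); `top_classes` is a hypothetical helper. The endpoint already returns a sorted list, so the re-sort here is purely defensive.

```python
import json


def top_classes(response_body: str, n: int = 3) -> list[tuple[str, float]]:
    """Return the n most similar (label, similarity) pairs from a /similar reply.

    Sorts by descending similarity rather than trusting the server's order.
    """
    classes = json.loads(response_body)["classes"]
    ranked = sorted(classes, key=lambda c: c["similarity"], reverse=True)
    return [(c["class"], c["similarity"]) for c in ranked[:n]]
```

With the example response, `top_classes(body, 2)` yields the `Document` and `Intelligence` entries.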