[go: up one dir, main page]

0% found this document useful (0 votes)
32 views2 pages

Data Science Assessment Task

The document outlines an assessment task for data science involving the development of a model to predict semantic similarity between pairs of text paragraphs, with a scoring range from 0 to 1. Candidates are required to build and deploy this model as a Server API Endpoint, providing specific request and response formats. The final submission must include the live API endpoint, complete code, a short report, and an updated resume, with a completion deadline of three days from receipt of the task.

Uploaded by

Anmol
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
32 views2 pages

Data Science Assessment Task

The document outlines an assessment task for data science involving the development of a model to predict semantic similarity between pairs of text paragraphs, with a scoring range from 0 to 1. Candidates are required to build and deploy this model as a Server API Endpoint, providing specific request and response formats. The final submission must include the live API endpoint, complete code, a short report, and an updated resume, with a completion deadline of three days from receipt of the task.

Uploaded by

Anmol
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 2

ASSESSMENT FOR DATA SCIENCE

PROBLEM STATEMENT

Dataset (attached with the task): The data contains a pair of paragraphs. These text paragraphs are
randomly sampled from a raw dataset. Each pair of sentences may or may not be semantically similar.
The candidate is to predict a value between 0-1 indicating the similarity between the pair of text paras.
A sample of a similar dataset will be used as test data, therefore it’s crucial to the model solution using
provided dataset.

Part A

Build an algorithm/model that can quantify the degree of similarity between the two text-based on
Semantic similarity. Semantic Textual Similarity (STS) assesses the degree to which two sentences are
semantically equivalent to each other.
1 means highly similar

0 means highly dissimilar

Part B

Deploy the Algorithm/Model built-in Part A in any cloud service provider. Your final algorithm should be
exposed as a Server API Endpoint. In order to test this API make sure you hit a request to the server to
get the result as a response to the API. The request-response body should be in the following format:

Request body: {“text1”: ”nuclear body seeks new tech .......”, ”text2”: ”terror suspects face arrest ......”}
Response body: {“similarity score”: 0.2 }
Note: “text1”, “text2”, and “similarity score” keys should be kept as it is, without any change.

THE FINAL SUBMISSION MUST INCLUDE THE FOLLOWING -

• - Live API endpoint(IP Address of hosted app) of the Algorithm Deployed on the Server
• - Complete Code for Part A and Part B (.py files)
• - 1-2 page short Report explaining only the core approach taken in Part A and Part B.
• - Your updated resume with contact number

INSTRUCTIONS

• - Use only Python programming language


• - The correctness of similarity scores on test data will be evaluated from the results obtained

from the Server Response.

• - Task evaluation is equally based on both Part A and Part B. Finally delivery of task A is

through task B itself. Therefore it’s mandatory to attempt both parts.

• - Please ensure the structure of the API endpoint is as per requirement.


• - Code must be well commented
dataneuron.ai | mail@dataneuron.ai |
• - Use any approach to solve algorithms using Statistical models Machine Learning or Deep

Learning

• - Use any cloud service providers to deploy solutions eg. Azure, GCP, AWS, Heroku, etc.
• - Candidates will be judged on three criteria namely the Model/Algo approach, Successfully

deployed API, and API response results on test data.

• - Time duration: 3 days from the day of receiving the task.

NOTE:

1. The given dataset does not contain any label. Therefore, can be treated as an unsupervised
learning problem. However, this does not imply that supervised techniques/algorithms are not
applicable. The candidate is free to use any technique.
2. Please attach your updated resume and contact information with the submission mail.
3. Your time should start from when this task was sent to you.
4. If you intend to take more than 3 days, you may do so without permission. However, it would be
appreciated if you state the reasons for the delay in your report.
5. Every step in the task is self-explanatory to the best of our knowledge. If any part is unclear, use
your best judgment and mention it in your report.
6. Your project will not be used for the benefit of the company in any manner. The intention of this
task is ONLY to evaluate your skills.
7. Your submission will showcase your skills and knowledge of the said field and help us evaluate
your candidature in a better manner, so kindly try to keep the work as original as possible.
8. The deployed server can be closed after the final results are announced. We recommend that
candidates should use freely available resources only to deploy their APIs.
9. Final submission must be sent at aviral.saw@dataneuron.ai and cc mail@dataneuron.ai
Submissions via any other platform will not be considered.
10. We wish you all the best!

You might also like