
DATA SCIENCE

INTERVIEW PREPARATION
(30 Days of Interview
Preparation)
# Day-25

Q1. What is WRN?
Answer:
WRN stands for Wide Residual Networks. By widening the Residual Network (ResNet), the network can be made shallower while keeping the same accuracy or even improving it. A wider network means:

 The number of layers can be reduced.

 Training time can be shorter, as well.

Problems with Residual Networks (ResNet)


Circuit Complexity Theory
The authors of residual networks (ResNet) tried to make them as thin as possible in favor of increasing their depth and having fewer parameters, and even introduced a «bottleneck» block, which makes ResNet blocks thinner still.
Diminishing Feature Reuse
However, as the gradient flows through the network, there is nothing to force it to pass through the residual block weights, so a block can avoid learning anything during training. As a result, there may be either only a few blocks that learn useful representations, or many blocks that share very little information and contribute little to the final goal. This problem was formulated as diminishing feature reuse.
WRNs (Wide Residual Networks)
In WRNs, many design parameters are explored, such as the design of the ResNet block, how deep the network is (deepening factor l), and how wide it is (widening factor k) within the ResNet block.

When k=1, the network has the same width as the original ResNet; when k>1, it is k times wider than ResNet.
WRN-d-k denotes a WRN with depth d and widening factor k.

 Pre-Activation ResNet is used for the CIFAR-10, CIFAR-100, and SVHN datasets, while the original ResNet is used for the ImageNet dataset.
 The significant difference is that Pre-Activation ResNet performs batch norm and ReLU before the convolution (i.e., BN-ReLU-Conv), while the original ResNet has the Conv-BN-ReLU structure. Pre-Activation ResNet is generally better than the original one, but it shows no visible improvement on ImageNet when the number of layers is only around 100.
The design of the ResNet block

 B(3;3): the original «basic» block (figure a above).

 B(3;1;3): with one extra 1×1 layer in between the two 3×3 layers.

 B(1;3;1): with the same dimensionality for all convolutions (a bottleneck block).

 B(1;3): the network has alternating 1×1, 3×3 convolutions.

 B(3;1): the network has alternating 3×3, 1×1 convolutions.

 B(3;1;1): a Network-in-Network style block.

B(3;3) has the smallest error rate (5.73%).

Note: The number of layers differs between configurations in order to keep the number of parameters close to each other.
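For intuition, here is a minimal PyTorch-style sketch of a pre-activation «basic» B(3;3) residual block with a widening factor k. This is an illustration, not the paper's code; the class name and the 16·k base-channel choice follow the common WRN convention but are assumptions here.

```python
import torch
import torch.nn as nn

class WideBasicBlock(nn.Module):
    """Pre-activation B(3;3) block: BN-ReLU-Conv3x3 -> BN-ReLU-Conv3x3, plus shortcut."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, stride=1, padding=1, bias=False)
        # 1x1 projection shortcut when the shape changes
        self.shortcut = (nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False)
                         if stride != 1 or in_channels != out_channels else nn.Identity())

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        return out + self.shortcut(x)

# Widening factor k: a WRN stage simply uses k times more channels than the ResNet baseline (16, 32, 64).
k = 4
block = WideBasicBlock(in_channels=16, out_channels=16 * k)
print(block(torch.randn(1, 16, 32, 32)).shape)  # -> torch.Size([1, 64, 32, 32])
```

Widening the block multiplies the channel count, which is what lets the network stay shallower for the same capacity.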

Q2. What is SIMCO: SIMilarity-based Object Counting?


Answer:
Most approaches for counting similar objects in images assume a single object class; when this is not the case, ad-hoc learning is necessary. None of them is genuinely agnostic and multi-class, i.e., able to capture repeated patterns of different types without any tuning. Most counting approaches are based on density or regression estimation; here, we focus on counting by detection, so the counted objects are individually detected first.

Research on agnostic counting is valuable in many fields. It serves visual question answering, where counting questions may concern entities that are too specific and fall outside the semantic span of the available classes (e.g., "What is the most occurrent thing?" in the figure below). In representation learning, unsupervised counting of visual primitives (i.e., visible "things") is crucial to obtain a rich image representation. Counting is also a hot topic in cognitive robotics, where autonomous agents learn by separating sensory input into a finite number of classes (without a precise semantics), building a classification system that counts instances of each of them.

Application-wise, agnostic counting may help with the manual tagging of training images, providing a starting guess for the annotator on single- or multi-spectral images. Inpainting filters may benefit from a magic wand that captures repeated instances to remove.

Figure: SIMCO on visual question answering: what is the most occurrent object? SIMCO finds 47 LEGO heads.

In this paper, we present the SIMCO (SIMilarity-based object COunting) approach, which is entirely agnostic, i.e., it needs no ad-hoc class-specific fine-tuning, and multi-class, i.e., it finds different types of repeated patterns. Two main ideas characterize SIMCO.

First, every object to be counted is considered a specialization of a basic 2D shape: this is particularly true with many small objects (see the figure above: LEGO heads can be approximated as circles). SIMCO incorporates this idea by building on a novel Mask-RCNN-based classifier, fine-tuned just once on a novel synthetic shape dataset, InShape.

The second idea is that, by leveraging the 2D shape approximation of objects, one can naturally perform unsupervised grouping of the detected objects (grouping circles with circles, etc.), discovering different types of repeated entities without resorting to a particular set of classes. SIMCO realizes this with a head branch in the network architecture that implements a triplet loss, providing a 64-dim embedding that maps objects close together if they share the same shape class plus some appearance attributes. Affinity propagation clustering then finds groups over this embedding.
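As a hedged sketch of the grouping step only (not the paper's code): given 64-dim embeddings for the detected objects, Affinity Propagation from scikit-learn can cluster them into repeated-entity groups without fixing the number of clusters in advance. The variable names and dummy data below are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Dummy stand-in for the 64-dim embeddings produced by the triplet-loss head
# (here: two synthetic "shape/appearance" groups).
rng = np.random.default_rng(0)
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(40, 64)),   # e.g., circle-like detections
    rng.normal(loc=1.0, scale=0.1, size=(25, 64)),   # e.g., square-like detections
])

# Affinity Propagation discovers the number of groups on its own.
clusterer = AffinityPropagation(random_state=0).fit(embeddings)
labels = clusterer.labels_

# Per-group counts play the role of "how many repeated objects of each type".
groups, counts = np.unique(labels, return_counts=True)
for g, c in zip(groups, counts):
    print(f"group {g}: {c} objects")
```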

Q3. What is Voice-Face Cross-modal Matching and Retrieval?


Answer:
Studies in biology and neuroscience have shown that humans' appearances are associated with their voices. Both the facial features and the voice-controlling organs of individuals are affected by hormones and genetic information, and human beings can recognize this association. For example, when answering a phone call, we can guess the gender and approximate age of the person on the other end of the line; when watching a muted TV show, we can imagine an approximate voice by observing the face movements of the protagonist. With the recent advances of deep learning, face recognition models and speaker recognition models have achieved exceptionally high precision. Can the associations between voices and faces be discovered algorithmically by machines? Research on this problem can benefit many applications, such as synchronizing video faces with the talking sound and generating faces according to voices. In recent years, much research attention has been paid to voice-face cross-modal learning tasks, which have shown the feasibility of recognizing voice-face associations. The problem is generally formulated as a voice-face matching task and a voice-face retrieval task, as shown in Figure 1. Given a set of voice audios and faces, voice-face matching is to tell which face produced the voice when the machine hears a voice recording. Voice-face retrieval is to return a sequence of faces sorted by the estimated degree of match to a query voice recording.

SVHF is the pioneering work on voice-face cross-modal learning, studying the performance of CNN-based deep networks on this problem. A human baseline for the voice-face matching task is also proposed in that paper. Both the "voice to face" and the "face to voice" matching tasks are studied in Pins and in Horiguchi's work, which exhibit similar performance on the two tasks. A curriculum learning schedule is introduced in Pins for hard negative mining. Various visualizations of the embedding vectors are presented to show the learned audio-visual associations in Kim's work. DIMNet learns common representations for faces and voices by leveraging their relationship with covariates such as gender and nationality. DIMNet obtains an accuracy of 84.12% on 1:2 matching, which exceeds the human level.
Research on this problem is still at an early stage. Datasets used by previous research are always small, which cannot sufficiently evaluate the generalization ability of models, and traditional test schemes based on random tuple mining tend to have low confidence. A benchmark for this problem needs to be established. This paper presents a voice-face cross-modal matching and retrieval framework, a dataset of Chinese speakers, and a data collection tool. In the framework, cross-modal embeddings are learned with CNN-based networks and a triplet loss in a voice-anchored metric space with an L2-norm constraint. An identity-based example sampling method is adopted to improve training efficiency. The proposed framework achieves state-of-the-art performance on multiple tasks. For example, the result of 1:2 matching tested on 10 million triplets (thousands of people) reaches 84.48%, which is also higher than DIMNet tested on 189 people. We evaluate the various modules of the CNN-based framework and provide our recommendations. Matching and retrieval based on averaging multiple voices and multiple faces are also attempted, which can further improve performance. This task is the simplest way of analyzing video data. Large-scale datasets are used to ensure the generalization ability required in real applications. The cross-language transfer capability of the model is studied on the voice-face dataset of Chinese speakers we constructed. A series of performance metrics are reported on these tasks through extensive experiments. The source code of the paper and the dataset collection tool will be published along with the article.
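A minimal sketch of the general idea (not the authors' code): voice and face embeddings are L2-normalized, and a triplet loss is applied with the voice as the anchor, pulling the matching face closer than a non-matching face by some margin. The margin value and toy dimensions below are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

def voice_anchored_triplet_loss(voice_emb, pos_face_emb, neg_face_emb, margin=0.2):
    # L2-norm constraint: project all embeddings onto the unit hypersphere.
    v = F.normalize(voice_emb, dim=-1)
    fp = F.normalize(pos_face_emb, dim=-1)
    fn = F.normalize(neg_face_emb, dim=-1)
    # Voice is the anchor: its matching face must be closer than the non-matching face.
    d_pos = (v - fp).pow(2).sum(dim=-1)
    d_neg = (v - fn).pow(2).sum(dim=-1)
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage with random 128-dim embeddings for a batch of 8 (voice, face+, face-) triplets.
loss = voice_anchored_triplet_loss(torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```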

Q4. What is CenterNet: Object Detection with Keypoint Triplets?


Answer:
Object detection has been significantly improved and advanced with the help of deep learning, especially convolutional neural networks (CNNs). Currently, one of the most popular pipelines is anchor-based: it places a set of rectangles with pre-defined sizes and regresses them to the desired locations with the help of ground-truth objects. These approaches often need a large number of anchors to ensure a sufficiently high IoU (intersection over union) with the ground-truth objects, and the size and aspect ratio of each anchor box need to be designed manually. Also, anchors are usually not aligned with the ground-truth boxes, which is not conducive to the bounding-box classification task.
To overcome the drawbacks of anchor-based approaches, a keypoint-based object detection pipeline named CornerNet was proposed. It represents each object by a pair of corner keypoints, which bypasses the need for anchor boxes and achieved state-of-the-art one-stage object detection accuracy. Nevertheless, the performance of CornerNet is still restricted by its relatively weak ability to refer to the global information of an object. That is to say, since each object is constructed from a pair of corners, the algorithm is sensitive to detecting object boundaries while not being aware of which pairs of keypoints should be grouped into objects. Consequently, as shown in Figure a, it often generates some incorrect bounding boxes, most of which could be easily filtered out with complementary information, e.g., the aspect ratio.

To address this issue, we equip CornerNet with the ability to perceive the visual patterns within each proposed region, so that it can identify the correctness of each bounding box by itself. In this paper, we present a low-cost yet effective solution named CenterNet, which explores the central part of a proposal, i.e., the region close to the geometric center, with one extra keypoint. Our intuition is that if a predicted bounding box has a high IoU with the ground-truth box, then the probability that the center keypoint in its central region is predicted as the same class is high, and vice versa. Thus, during inference, after a proposal is generated as a pair of corner keypoints, we determine whether the proposal is indeed an object by checking whether there is a center keypoint of the same class falling within its central region. The idea, as shown in Figure a, is to use a triplet instead of a pair of keypoints to represent each object.

Accordingly, to better detect the center keypoints and the corners, we propose two strategies to enrich center and corner information, respectively. The first strategy is named center pooling and is used in the branch that predicts the center keypoints. Center pooling helps the center keypoints obtain more recognizable visual patterns within objects, which makes the central part of a proposal easier to perceive. We achieve this by taking, for each location on the feature map used for predicting center keypoints, the summed maximum responses along its horizontal and vertical directions. The second strategy is named cascade corner pooling, which equips the original corner pooling module with the ability to perceive internal information. We achieve this by taking the summed maximum responses along both the boundary and the inner directions of objects on the feature map used for predicting corners. Empirically, we verify that such a two-directional pooling method is more stable, i.e., more robust to feature-level noise, which contributes to improvements in both precision and recall.
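A simplified sketch of center pooling follows, under the assumption (mine, for illustration) that it can be approximated by adding, at every location, the maximum response of that location's row and the maximum response of its column; the paper builds the operation from corner-pooling-style layers, which are omitted here.

```python
import torch

def center_pooling(features):
    """Approximate center pooling: for every location, add the maximum response
    found along its horizontal direction (its row) and along its vertical
    direction (its column). Input shape: (batch, channels, height, width)."""
    row_max = features.max(dim=3, keepdim=True).values   # max over width  -> (B, C, H, 1)
    col_max = features.max(dim=2, keepdim=True).values   # max over height -> (B, C, 1, W)
    # Broadcasting restores the (B, C, H, W) shape.
    return row_max + col_max

feat = torch.randn(1, 64, 128, 128)
print(center_pooling(feat).shape)  # torch.Size([1, 64, 128, 128])
```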

We evaluate the proposed CenterNet on the MS-COCO dataset, one of the most popular benchmarks for large-scale object detection. CenterNet, with both center pooling and cascade corner pooling incorporated, reports an AP of 47.0% on the test-dev set, which outperforms all existing one-stage detectors by a large margin. With an average inference time of 270 ms per image using a 52-layer hourglass backbone and 340 ms using a 104-layer hourglass backbone, CenterNet is quite efficient yet closely matches the state-of-the-art performance of two-stage detectors.

Q5. What is Task2Vec: Task Embedding for Meta-Learning?


Answer:
The success of deep learning hinges in part on the fact that models learned for one task can be reused on other related tasks. Yet no general framework exists to describe and learn relations between tasks. We introduce the task2vec embedding, a technique to represent tasks as elements of a vector space based on the Fisher Information Matrix. The norm of the embedding correlates with the complexity of the task, while the distance between embeddings captures semantic similarities between tasks (Fig. 1). When other natural distances are available, such as the taxonomical distance in biological classification, we find that the embedding distance correlates positively with it (Fig. 2). Moreover, we introduce an asymmetric distance on tasks that correlates with the transferability between tasks.

Computation of the embedding leverages the duality between network parameters (weights) and outputs (activations) in a deep neural network (DNN): just as the activations of a DNN trained on a complex visual recognition task are a rich representation of the input images, we show that the gradients of the weights relative to a task-specific loss are a rich representation of the task itself. Specifically, given a task defined by a dataset D = {(x_i, y_i)}_{i=1}^{N} of labeled samples, we feed the data through a pre-trained reference convolutional neural network, which we call the "probe network", and compute the diagonal Fisher Information Matrix (FIM) of the network filter parameters to capture the structure of the task. Since the architecture and weights of the probe network are fixed, the FIM provides a fixed-dimensional representation of the task. We show this embedding encodes the "difficulty" of the task, characteristics of the input domain, and which features of the probe network are useful to solve it.
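As a hedged illustration of the core computation only: the diagonal of the (empirical) Fisher Information Matrix can be approximated by averaging squared gradients of the task loss with respect to the frozen probe-network weights. The probe model, data, and loss below are placeholder assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

def diagonal_fisher(probe_net, data_loader, loss_fn):
    """Approximate diag(FIM): average of (d loss / d w)^2 per parameter of the probe network."""
    fisher = {name: torch.zeros_like(p) for name, p in probe_net.named_parameters()}
    n_batches = 0
    for inputs, targets in data_loader:
        probe_net.zero_grad()
        loss_fn(probe_net(inputs), targets).backward()
        for name, p in probe_net.named_parameters():
            if p.grad is not None:
                fisher[name] += p.grad.detach() ** 2
        n_batches += 1
    # The concatenated diagonal is the fixed-dimensional task embedding.
    return torch.cat([v.flatten() / max(n_batches, 1) for v in fisher.values()])

# Toy usage with a tiny stand-in probe network and random data.
probe = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
data = [(torch.randn(16, 10), torch.randint(0, 3, (16,))) for _ in range(4)]
embedding = diagonal_fisher(probe, data, nn.CrossEntropyLoss())
print(embedding.shape)
```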

Our task embedding can be used to reason about the space of tasks and to solve meta-tasks. As a motivating example, we study the problem of selecting the best pre-trained feature extractor for a new task. This can be particularly valuable when there is insufficient data to train or fine-tune a generic model and the transfer of knowledge is essential. task2vec depends solely on the task and ignores interactions with the model, which may, however, play an essential role. To address this, we learn a joint task and model embedding, called model2vec, such that models whose embeddings are close to a task exhibit excellent performance on that task. We use this to select an expert from a given collection, improving performance relative to fine-tuning a generic model trained on ImageNet and obtaining a selection close to the ground-truth optimum.

Q6. What is GLMNet: Graph Learning-Matching Networks for Feature Matching?


Answer:
Many problems of interest in computer vision and pattern recognition can be formulated as finding consistent correspondences between two sets of features, which is known as the feature matching problem. A feature set that incorporates pairwise constraints can be represented by an attributed graph whose nodes represent the unary descriptors of feature points and whose edges encode the pairwise relationships among different feature points. Based on this graph representation, feature matching can then be reformulated as a graph node matching problem.
Graph matching generally first operates on node and edge affinities that encode the similarities between node and edge descriptors in the two graphs. It can then be formulated mathematically as an Integer Quadratic Programming (IQP) problem with a permutation constraint on the solution to encode the one-to-one matching constraints. This problem is known to be NP-hard, so many methods solve it approximately by relaxing the discrete permutation constraint and finding locally optimal solutions. Also, to obtain better node/edge affinities, learning methods have been investigated to determine more optimal parameters for node/edge affinity computation. Recently, deep learning methods have also been developed for matching problems. The main benefit of deep learning matching methods is that they can conduct visual feature representation, node/edge affinity learning, and matching optimization together in an end-to-end manner. Zanfir et al. propose an end-to-end graph matching model, which makes it possible to learn all the parameters of the graph matching process. Wang et al. recently explore graph convolutional networks (GCNs) for graph matching, conducting graph node embedding and matching simultaneously in a unified system.
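To make the IQP formulation concrete, here is a small hedged sketch: with an affinity matrix M over candidate node correspondences, graph matching scores an assignment x (a flattened permutation-matrix indicator) by x^T M x and seeks the permutation maximizing it. The toy affinity matrix below is random and only illustrates the objective; it is not part of GLMNet.

```python
import numpy as np

def matching_score(assignment, affinity):
    """Quadratic assignment objective x^T M x for a binary assignment matrix x."""
    x = assignment.reshape(-1).astype(float)
    return float(x @ affinity @ x)

n = 3                                   # 3 nodes in each graph
rng = np.random.default_rng(0)
M = rng.random((n * n, n * n))          # toy node/edge affinity matrix
M = (M + M.T) / 2                       # symmetric, as affinities usually are

# One-to-one matching constraint: the assignment matrix must be a permutation matrix.
identity_match = np.eye(n)              # node i in graph 1 <-> node i in graph 2
swapped_match = np.eye(n)[[1, 0, 2]]    # swap the first two correspondences
print(matching_score(identity_match, M), matching_score(swapped_match, M))
```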

Inspired by recent deep graph matching methods, in this paper we propose a novel Graph Learning-Matching Network (GLMNet) for the graph matching problem. Overall, the main contributions of this paper are threefold.

First, a critical aspect of (feature) graph matching is the construction of the two matching graphs. Existing deep graph matching models generally use fixed-structure graphs, such as k-NN or Delaunay graphs, which are therefore not guaranteed to best serve the matching task. To address this issue, we propose to adaptively learn a pair of optimal graphs for the matching task and to integrate graph learning and graph matching simultaneously in a unified end-to-end network architecture.

Second, existing GCN-based graph matching models adopt a general smoothing-based graph convolution operation for graph node embedding, which encourages the feature embedding of each node to become more similar to those of its neighboring nodes. This is desirable for graph node labeling or classification tasks, but undesirable for the matching task, because excessive smoothing may dilute the discriminative information. To alleviate this effect, we propose to incorporate a Laplacian sharpening based graph convolution operation for the graph node embedding and matching tasks. Laplacian sharpening can be regarded as the counterpart of Laplacian smoothing: it encourages the embedding of each node to move farther away from those of its neighbors.
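The contrast between smoothing and sharpening can be sketched as follows. This is a simplified illustration under my own formulation (move toward vs. away from the neighborhood mean); GLMNet's actual convolution layers also involve learned weight matrices and nonlinearities, omitted here.

```python
import torch

def laplacian_smoothing(H, A, gamma=0.5):
    """x_i <- x_i + gamma * (neighborhood mean - x_i): embeddings move toward neighbors."""
    deg = A.sum(dim=1, keepdim=True).clamp(min=1.0)
    neighbor_mean = (A @ H) / deg
    return H + gamma * (neighbor_mean - H)

def laplacian_sharpening(H, A, gamma=0.5):
    """x_i <- x_i - gamma * (neighborhood mean - x_i): embeddings move away from neighbors,
    which helps keep node features discriminative for matching."""
    deg = A.sum(dim=1, keepdim=True).clamp(min=1.0)
    neighbor_mean = (A @ H) / deg
    return H - gamma * (neighbor_mean - H)

# Toy graph: 4 nodes on a path, random 8-dim node features.
A = torch.tensor([[0., 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
H = torch.randn(4, 8)
print(laplacian_smoothing(H, A).shape, laplacian_sharpening(H, A).shape)
```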

Third, existing deep graph matching methods generally utilize a doubly-stochastic normalization for the final matching prediction, which usually ignores the discrete one-to-one matching constraints in matching optimization/prediction. To overcome this issue, we develop a novel constraint regularized loss to further incorporate the one-to-one matching constraints into the matching prediction.

Experimental results, including ablation studies, demonstrate the effectiveness of GLMNet and the advantages of the devised components: the graph learning-matching architecture, the Laplacian sharpening convolution for discriminative embedding, and the constraint regularized loss that encodes one-to-one matching constraints.

Q7. What is SSAP: Single-Shot Instance Segmentation With Affinity Pyramid?


Answer:
The rapid development of convolutional networks has revolutionized various vision tasks, enabling us to move towards a more fine-grained understanding of images. Instead of classic bounding-box level object detection or class-level semantic segmentation, instance segmentation provides in-depth knowledge by segmenting all the objects and distinguishing between different object instances. Researchers have shown increasing interest in instance segmentation recently.

Current state-of-the-art solutions to this challenging problem can be classified into proposal-based and proposal-free approaches. Proposal-based methods regard the problem as an extension of the classic object detection task: after localizing each object with a bounding box, a foreground mask is predicted within each bounding-box proposal. However, the performance of proposal-based methods is highly limited by the quality of the bounding-box predictions, and the two-stage pipeline also limits the speed of such systems. By contrast, the proposal-free approach has the advantage of an efficient and straightforward design. This work also focuses on the proposal-free paradigm.

Proposal-free methods mostly start by producing instance-agnostic pixel-level semantic class labels, followed by clustering them into different object instances with specially designed instance-aware features. However, previous methods mainly treat the two sub-processes as two separate stages and employ multiple modules, which is suboptimal. The mutual benefits between the two sub-tasks can be exploited, which would further improve instance segmentation performance. Moreover, employing multiple modules may result in additional computational costs for real-world applications.

To cope with the above issues, this work proposes a single-shot proposal-free instance segmentation method, which jointly learns pixel-level semantic class segmentation and object instance differentiation in a unified model with a single backbone network, as shown in Fig. 1. Specifically, to distinguish different object instances, an affinity pyramid is proposed, which can be jointly learned with the labeling of semantic classes. The pixel-pair affinity computes the probability that two pixels belong to the same instance. In this work, short-range relationships between pixels close to each other are derived with dense, small learning windows. At the same time, long-range connections between pixels distant from each other are also required to group objects with large scales or non-adjacent parts. Instead of enlarging the windows, the multi-range relationships are decoupled, and long-range connections are sparsely derived from instance maps at lower resolutions. After that, we propose learning the affinity pyramid at multiple scales along the hierarchy of a U-shape network, where short-range and long-range affinities are effectively learned from the feature levels with higher and lower resolutions, respectively. Experiments in Table 3 show that pixel-level semantic segmentation and the pixel-pair affinity pyramid based grouping indeed mutually benefit from the proposed joint learning scheme. The overall instance segmentation is thus further improved.
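To make the pixel-pair affinity concrete, here is a small illustrative sketch (one reasonable formulation assumed for this example, not the paper's code) that derives short-range affinity targets from a ground-truth instance-ID map: for each pixel, the affinity to each neighbor inside a small window is 1 if the two pixels carry the same instance ID and 0 otherwise.

```python
import numpy as np

def short_range_affinity_targets(instance_ids, window=3):
    """instance_ids: (H, W) integer map of ground-truth instance IDs.
    Returns a (window*window, H, W) binary map: affinity of each pixel to each
    neighbor offset inside the window (1 = same instance, 0 = different)."""
    h, w = instance_ids.shape
    r = window // 2
    padded = np.pad(instance_ids, r, mode="edge")
    targets = np.zeros((window * window, h, w), dtype=np.float32)
    offsets = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    for k, (dy, dx) in enumerate(offsets):
        shifted = padded[r + dy: r + dy + h, r + dx: r + dx + w]
        targets[k] = (shifted == instance_ids).astype(np.float32)
    return targets

# Toy 4x4 map with two instances (IDs 1 and 2).
ids = np.array([[1, 1, 2, 2],
                [1, 1, 2, 2],
                [1, 1, 2, 2],
                [1, 1, 2, 2]])
print(short_range_affinity_targets(ids).shape)  # (9, 4, 4)
```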

Then, to utilize cues from global context reasoning, this work employs a graph partition method to derive instances from the learned affinities. Unlike previous time-consuming methods, a cascaded graph partition module is presented to incorporate the graph partition process into the hierarchical structure of the affinity pyramid, which provides both acceleration and performance improvements. Concretely, with the learned pixel-pair affinity pyramid, a graph is constructed by regarding each pixel as a node and transforming affinities into edge scores. Graph partition is then employed progressively, from the higher-level lower-resolution layers to the lower-level higher-resolution layers. Instance segmentation predictions from the lower resolutions produce confident proposals, which significantly reduce the number of nodes at higher resolutions, so the whole process is accelerated.

Q8. What is TENER: Adapting Transformer Encoder for Named Entity Recognition?


Answer:
Named entity recognition (NER) is the task of finding the start and end of an entity in a sentence and assigning a class to this entity. NER has been widely studied in the field of natural language processing (NLP) because of its potential assistance in question generation Zhou et al. (2017), relation extraction Miwa and Bansal (2016), and coreference resolution Fragkou (2017). Since Collobert et al. (2011), various neural models have been introduced to avoid hand-crafted features Huang et al. (2015); Ma and Hovy (2016); Lample et al.

NER is usually viewed as a sequence labeling task, and the neural models typically contain three components: a word embedding layer, a context encoder layer, and a decoder layer Huang et al. (2015); Ma and Hovy (2016); Lample et al. (2016); Chiu and Nichols (2016); Chen et al.; Zhang et al. (2018); Gui et al. (2019b). The difference between various NER models mainly lies in the variation of these components.

Recurrent Neural Networks (RNNs) are widely employed in NLP tasks due to their sequential nature, which aligns well with language. Specifically, the bidirectional long short-term memory network (BiLSTM) Hochreiter and Schmidhuber (1997) is one of the most widely used RNN structures. Huang et al. (2015) were the first to apply the BiLSTM and Conditional Random Fields (CRF) Lafferty et al. (2001) to sequence labeling tasks. Owing to BiLSTM's strong ability to learn contextual representations of words, it has been adopted by the majority of NER models as the encoder Ma and Hovy (2016); Lample et al. (2016); Zhang et al. (2018); Gui et al.

Recently, the Transformer Vaswani et al. (2017) began to prevail in various NLP tasks, like machine translation Vaswani et al. (2017), language modeling Radford et al. (2018), and pretrained models Devlin et al. (2018). The Transformer encoder adopts a fully-connected self-attention structure to model long-range context, which is the weakness of RNNs. Moreover, the Transformer has better parallelism than RNNs. However, in the NER task, the Transformer encoder has been reported to perform poorly Guo et al. (2019), and our experiments also confirm this result. Therefore, it is intriguing to explore why the Transformer does not work well for NER.

The first reason is that the sinusoidal position embedding used in the vanilla Transformer is aware of relative distance but direction-agnostic, and even the distance-awareness is lost inside the vanilla Transformer's attention. However, both direction and relative distance information are essential in the NER task. For example, words after "in" are more likely to be a location or time than words before it, and words before "Inc." are most likely to be of the entity type "ORG." Besides, an entity is a continuous span of words, so awareness of relative distance might help a word better recognize its neighbors. To endow the Transformer with directionality and relative distance awareness, we adopt direction-aware attention with relative positional encoding Shaw et al. (2018); Huang et al. (2019); Dai et al. (2019). We propose a revised relative positional encoding that uses fewer parameters and performs better.

The second is an empirical finding. The attention distribution of the vanilla Transformer is scaled and smooth, but for NER, sparse attention is more suitable, since not all words need to be attended to: given the current word, a few contextual words are enough to judge its label, and smooth attention could include noisy information. Therefore, we abandon the scaling factor of the dot-product attention and use un-scaled, sharp attention.
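As an illustration of the second change only (the scaling factor), here is a minimal sketch contrasting scaled and un-scaled dot-product attention; TENER additionally adds relative positional and direction terms inside the attention score, which are omitted here for brevity.

```python
import math
import torch

def dot_product_attention(q, k, v, scaled=True):
    """q, k, v: (batch, seq_len, d_model). With scaled=False the attention
    distribution becomes sharper, which TENER argues suits NER better."""
    scores = q @ k.transpose(-2, -1)              # (batch, seq_len, seq_len)
    if scaled:
        scores = scores / math.sqrt(q.size(-1))   # vanilla Transformer scaling
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

q = k = v = torch.randn(2, 5, 64)
out_scaled = dot_product_attention(q, k, v, scaled=True)
out_sharp = dot_product_attention(q, k, v, scaled=False)
print(out_scaled.shape, out_sharp.shape)  # torch.Size([2, 5, 64]) twice
```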

With the above improvements, we can significantly boost the performance of the Transformer encoder
for NER.

Q9. What is Subword ELMo?


Answer:
Recently, pre-trained language representations have been shown to be useful for improving many NLP tasks. Embeddings from Language Models (ELMo) is one of the most outstanding works; it uses a character-aware language model to augment word representations.

An essential challenge in training word-based language models is how to control the vocabulary size for better rare-word representation. No matter how large the vocabulary is, rare words are always insufficiently trained, and an extensive vocabulary takes too much time and too many computational resources for the model to converge. Whereas, if the vocabulary is too small, the out-of-vocabulary (OOV) issue will harm the model's performance. To obtain effective word representations, Jozefowicz et al. (2016) introduce character-driven word embeddings using a convolutional neural network (CNN).

However, there is a potential insufficiency when modeling words from characters, which carry little linguistic sense, especially morphological information: only 86 characters are used in English writing, making the input too coarse for embedding learning. As we argue, for a better representation at a refined granularity, the word is too large a unit and the character is too small, so it is natural to consider subword units between the character and word levels.

Splitting a word into subwords and using them to augment the word representation may recover latent syntactic or semantic information. For example, uselessness could be divided into subwords such as use, less, and ness. Previous work usually considers linguistic knowledge-based methods to tokenize each word into subwords (namely, morphemes). However, such treatment may encounter the following main inconveniences. First, the subwords derived from linguistic knowledge, typically including the morphological suffix, prefix, and stem, may not be suitable for the targeted NLP task Banerjee and Bhattacharyya, or may mislead the representation of some words; for example, the meaning of understanding cannot be formed by under and stand. Second, linguistic knowledge, including related annotated lexicons or corpora, may not even be available for a specific low-resource language. Due to these limitations, we focus on computationally motivated subword tokenization approaches in this work.
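As a toy, hedged illustration of a computationally motivated (rather than linguistically motivated) subword segmentation: a greedy longest-match tokenizer over a small hypothetical subword vocabulary, in the spirit of BPE/WordPiece segmentation. The vocabulary and function here are invented for this example only.

```python
def greedy_subword_tokenize(word, vocab):
    """Greedy longest-match segmentation: repeatedly take the longest vocabulary
    entry that prefixes the remaining string, falling back to single characters."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):          # try the longest candidate first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:                                       # no match: emit one character
            pieces.append(word[i])
            i += 1
    return pieces

# Hypothetical subword vocabulary learned from data (not from a morphological lexicon).
vocab = {"use", "less", "ness", "under", "stand", "ing"}
print(greedy_subword_tokenize("uselessness", vocab))    # ['use', 'less', 'ness']
print(greedy_subword_tokenize("understanding", vocab))  # ['under', 'stand', 'ing']
```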

In this paper, we propose Embeddings from Subword-aware Language Models (ESuLMo), which takes subwords as input to augment word representation, and we release a sizeable pre-trained language model to the research community. Evaluations show that the pre-trained language models of ESuLMo outperform all RNN-based language models, including ELMo, in terms of perplexity (PPL), and ESuLMo beats state-of-the-art results on three of four downstream NLP tasks.

------------------------------------------------------------------------------------------------------------------------------
