MACHINE LEARNING DESIGN
INTERVIEW
KHANG PHAM
CALIFORNIA, 2022
Machine Learning Design Interview
Copyright 2022 Khang Pham
All rights reserved under International and Pan-American Copyright
Conventions
No part of this book may be reproduced, stored in a retrieval system or
transmitted in any way by any means, electronic, mechanical, photocopy,
recording or otherwise without the prior permission of the author except as
provided by USA copyright law.
ISBN-13: 979-8-8130-3157-1 [paperback]
Success Stories

Name | Company | Offer Positions
Victor | Facebook | Facebook Senior Manager (E7) and others
Conor (Stanford) | Facebook, Amazon | Facebook ML Manager (E6/M0) and 5 other offers
Ted (Vice) | Amazon, Facebook | Amazon DS (L5) and Facebook DS (E5)
Mike (Bloomberg) | Facebook, Spotify | Facebook MLE (E5), Spotify MLE (Senior) and others
Jerry | Google, Apple, Facebook, Cruise | Google MLE (L5), Cruise MLE (L5), Apple MLE (Senior), FB MLE (E5) and others
Steven, Chris | Google | Google MLE (L4) and others
Adam | Google | Google MLE (L5)
Patrick | Amazon, Wish | Amazon DS (L5), Wish DS
Bolton Chandra | Intuit | Intuit MLE (Senior)
David Nguyen | NVIDIA, NBCUniversal | Intern
Daniel | Series B startup | Senior MLE
Steven | AccuWeather | SWE, ML
Sanchez | Pinterest, Citadel | Senior SWE, ML
Ben | Amazon | Applied Scientist
Mary | Twitter, Apple | Senior MLE
Mark | Facebook, TikTok, other | Senior MLE (E5)
Ariana | Intuit, Bloomberg | Data Scientist
Michael Wong | DocuSign | Senior SWE/ML
Teo | LinkedIn, Google, TikTok | Senior SWE/AI, Google (L4)
Nick | Facebook, Google | MLE (E5), Google (L4)
Quinton | Microsoft, VMware | Staff SWE/ML
Lex | Facebook, Google | Google Brain (L5) and Facebook (E5)
Strange | Facebook | Senior SWE/ML (E5)
Jim | Amazon | Amazon Applied Scientist (L4)
Shawn | Amazon, Google, Uber | Amazon Applied Scientist, Google Senior SWE/ML (L5) and Uber L4
I wanted to update you that I just signed my offer with Amazon. Thank you so much for your help!
Ted, Amazon Senior Data Scientist
I’m done with interviews. FB called with an initial offer, now I need to start negotiating
Michael, Facebook Senior SWE/ML
I got the offer from Intuit. Thank you so much, it would not be possible without your help.
Bolton, Intuit Senior SWE/ML
Hi Khang, thank you for your help in the past few months. I have received an offer from Docusign.
Michael Wong, DocuSign Senior SWE/ML
Thanks to your Github repo, I got a research internship at Samsung. Your notes are so helpful. Many
thanks.
Dave Newton, Samsung MLE Intern
I received an offer from Facebook (exciting!!!)
Victor, Facebook Senior ML Manager
I’m very happy that I could crack the Google interview. So finally I could pass Uber, Google and
Amazon. Amazon was an applied scientist while the other two were ML engineers. Lots of hard work
and effort to get to this state. Thank you for staying with me all this time.
Shawn, Google Senior SWE/ML
Thanks to ML system design on , I just wanted to say I thought the course was super helpful! I got
offers from google, fb, apple and tesla.
Jerry, Google Senior SWE/ML
Hi Khang, I got the offer from the company. THANK YOU for your coaching!!!
Daniel, Unicorn startup Senior SWE/ML
Hi Khang, I got an offer from Apple and Twitter. Thanks for your support.
Mary, Apple Senior SWE/ML
Hi Khang, I got an offer from Intuit today. Thank you so much for all your help.
Ariana, Intuit Data Scientist
Hi Khang, I want to let you know that I got offers from FB and Google. I decided to take the FB offer
in the end. Thank you so much for guiding me through the whole interviewing process. Best!
Mark, Facebook Senior SWE/ML
Hey Khang, Received an E5 offer :). Thanks for helping me throughout this process.
Strange, Facebook Senior SWE/ML
This book is for my wife,
my son, my mom and my dad.
Preface
Machine learning design is a difficult topic. It requires knowledge from
multiple disciplines, such as big data processing, linear algebra, machine
learning and deep learning. This is the first book that focuses on applied
machine learning and deep learning in production. It covers all the design
patterns and state-of-the-art techniques from top companies, such as Google,
Facebook, Amazon, etc.
Who should read this book?
Data scientists, software engineers, or data engineers who have a background in
machine learning but have never worked on machine learning at scale will find this
book helpful.
How to read this book?
Sections 1.1.1 to 1.1.4 help you review machine learning fundamentals.
If you have a lot of experience in machine learning, you can skip these
sections.
Section 1.1.5 is very important at big tech companies, especially
Facebook.
Chapter 2 helps you review important topics in Recommendation
Systems.
Chapters 3 to 8 explain the end-to-end design of the most popular
machine learning systems at big tech companies.
Chapter 9 helps you test your understanding.
I’d like to acknowledge my friends Steven Tartakovsky, Tommy Nguyen and
Jocelyn Huang for helping me proofread this book.
Keep up to date with Machine Learning Engineer: mlengineer.io. All the quiz
solutions can be found at: https://rebrand.ly/bookerrata.
Khang Pham
California
April 2022
Machine Learning Primer
In this chapter, we will review machine learning techniques commonly used
in industry. We focus on the application of machine learning techniques;
readers should already know most of these concepts in theory.
Feature Selection and Feature Engineering
One Hot Encoding
One hot encoding is very popular when you have to deal with categorical
features of medium cardinality.
In this example, when we have one column with four unique values, we
create three extra columns after one hot encoding. If a column has
thousands of unique values, one hot encoding will create thousands of new
columns.
One Hot Encoding. Source: mlengineer.io
Common Problems
Tree-based models, such as decision trees, random forests, and boosted trees,
don’t perform well with one hot encodings, especially when the categorical
attribute has many levels (unique values). This is because they pick the
feature to split on based on how well splitting the data on that feature will
“purify” it. If we have many levels, only a small fraction of the data will
usually belong to any given level, so the one hot encoded columns will be
mostly zeros. Since splitting on such a column will only produce a small gain,
tree-based algorithms typically ignore the information in favor of other
columns. This problem persists regardless of the volume of data you actually
have. Linear models or deep learning models do not have this problem.
Expensive computation and high memory consumption: many unique values
will create high-dimensional feature vectors. For example, if a column has a
million unique values, it produces feature vectors, each with a dimensionality
of one million.
Best Practices
When levels (categories) are not important, we can group them together into an
“Other” class. Make sure that the pipeline can handle unseen values in the test
set. In Python, you can use pandas.get_dummies or sklearn’s OneHotEncoder.
However, pandas.get_dummies does not “remember” the encoding used during
training, so if the test data has new values, it can lead to inconsistent
mappings. OneHotEncoder in scikit-learn has the advantage that you can use
fit/transform/fit_transform; therefore, you can persist it and use it together
with a Pipeline.
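As a minimal sketch (column and category values are hypothetical; sparse_output assumes scikit-learn 1.2+, older versions use sparse=False), fitting OneHotEncoder on the training data and reusing it at test time avoids the inconsistent-mapping problem:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
test = pd.DataFrame({"color": ["green", "purple"]})  # "purple" is unseen

# handle_unknown="ignore" encodes unseen values as all zeros instead of failing.
enc = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
enc.fit(train[["color"]])
print(enc.transform(test[["color"]]))  # the "purple" row becomes [0, 0, 0]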
One Hot Encoding in Tech Companies
One hot encoding is used a lot in tech companies. For example, at Uber, one
hot encoding is applied to features before training some of their production
XGBoost models. However, when there are a large number of categorical
values, such as in the tens of thousands, it becomes impractical to use one
hot encoding. Another technique, actually used at Instacart in their models,
is called mean encoding.
Mean Encoding
Take the Adult income data set as an example. We have data about the income of
50,000 people with different demographics: age, gender, education
background, etc. Let’s assume we want to handle the Age data as categorical.
There can be 80-90 unique values for this column. If we apply one hot
encoding, it will create a lot of new columns for this small data set.
Adult income dataset
Age Income
18 60,000
18 50,000
18 40,000
19 66,000
19 51,000
19 42,000
We can convert the Age feature into a continuous variable by taking the average
income for each Age value. For example, we can create a new column
Age_mean_enc that represents the mean income for a specific Age.
The benefit is that we can use this new column as a continuous variable.
Mean Encoding for Income Data
Age Income Age_mean_enc
18 60,000 50,000
18 50,000 50,000
18 40,000 50,000
19 66,000 53,000
19 51,000 53,000
19 42,000 53,000
If we compute the mean encoding on the same data we use for training, it will
lead to label leakage. So it’s important to use separate data for computing the
mean encoding. To make mean encoding even more robust, we can also apply
Additive Smoothing1 or Cross Validation methods.
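A minimal pandas sketch of mean encoding (the tiny frames mirror the table above); the means are computed on a separate split and then mapped onto the training data to avoid label leakage:

import pandas as pd

encode_df = pd.DataFrame({"Age": [18, 18, 18, 19, 19, 19],
                          "Income": [60000, 50000, 40000, 66000, 51000, 42000]})
train_df = pd.DataFrame({"Age": [18, 19, 19]})

# Mean income per Age, computed on held-out data only.
age_means = encode_df.groupby("Age")["Income"].mean()
train_df["Age_mean_enc"] = train_df["Age"].map(age_means)
print(train_df)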
Feature Hashing
Feature hashing, or the hashing trick, converts text data or categorical attributes
with high cardinalities into a feature vector of arbitrary dimensionality. In
some AdTech companies (Twitter, Pinterest, etc.), it’s not uncommon for a
model to have thousands of raw features.
Feature Hashing
Benefits
Feature hashing is very useful for features with very high cardinality:
hundreds, and sometimes thousands, of unique values. The hashing trick
reduces the growth in dimensionality and memory footprint by allowing
multiple values to be encoded as the same value.
Feature Hashing Example
First, you decide on the desired dimensionality of your feature vectors. Then,
using a hash function, you first convert all values of your categorical attribute
(or all tokens in your collection of documents) into a number, and then
convert this number into an index of your feature vector. The process is
illustrated in figure 1.1.
Let’s illustrate how it would work for converting the text “the quick brown
fox” into a feature vector. Let us have a hash function h that takes a string as
input and outputs a non-negative integer, and let the desired dimensionality
be 5. Applying the hash function to each word and taking the result modulo 5
to obtain the index of the word, we get:

h(the) mod 5 = 0
h(quick) mod 5 = 4
h(brown) mod 5 = 4
h(fox) mod 5 = 3

Then we build the feature vector [1, 0, 0, 1, 2]. Indeed, h(the) mod
5 = 0 means that we have one word in dimension 0 of the feature vector;
h(quick) mod 5 = 4 and h(brown) mod 5 = 4 mean that we have two
words in dimension 4 of the feature vector, and so on.
As you can see, there is a collision between the words “quick” and “brown”:
they are both represented by dimension 4. The lower the desired
dimensionality, the higher the chance of collision. This is the trade-off
between speed and quality of learning.
Commonly used hash functions are MurmurHash3, Jenkins, CityHash, and
MD5.
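A minimal scikit-learn sketch with FeatureHasher, using the toy 5-dimensional hash space from the example above (production systems typically use 2^18 or more dimensions):

from sklearn.feature_extraction import FeatureHasher

# Hash raw string tokens into a fixed 5-dimensional vector.
hasher = FeatureHasher(n_features=5, input_type="string")
X = hasher.transform([["the", "quick", "brown", "fox"]])
print(X.toarray())  # entries may be signed (+/-1) to reduce collision bias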
Feature Hashing in Tech Companies
Feature hashing is widely popular in a lot of tech companies such as
Booking, Facebook (Semantic Hashing using Tags and Topic Modeling,
2013), Yahoo, Yandex, Avazu, and Criteo.
One problem with hashing is collisions. If the hash size is too small, more
collisions will happen and negatively affect model performance. On the other
hand, the larger the hash size, the more memory it consumes. With many
collisions, the model won’t be able to differentiate coefficients between
feature values. For example, the coefficients for “User login” and “User
logout” might end up being the same, which makes no sense.
Feature Hashing: Hash Size vs Logloss. Source: booking.com
Depending on the application, you can choose the number of bits for feature
hashing that provide the right balance between model accuracy and
computing cost.
Cross Feature
A cross feature, or conjunction, between two categorical variables of
cardinality $c_1$ and $c_2$ is just another categorical variable of cardinality
$c_1 \times c_2$. If $c_1$ and $c_2$ are large, the conjunction feature has
high cardinality, and the use of the hashing trick is even more crucial in this
case. Cross features are usually used with the hashing trick to reduce the high
dimensionality. As an example, suppose we have Uber pickup data with latitude
and longitude stored in a database, and we want to predict demand at a
certain location. If we only use the feature latitude for learning, the model
might learn that city blocks at particular latitudes are more likely to have
higher demand than others. Likewise for the feature longitude. However, if
we cross longitude with latitude, the cross feature represents a well-defined city
block and allows the model to learn more accurately.
What would happen if we don’t create a cross feature? In this example, we
have two classes: orange and blue. Each point has two features: x1 and x2.
Can we draw a line to separate them? Can a linear model learn to
separate these classes? To solve this problem, we can introduce a new
feature x3 = x1 * x2. Now we can learn a linear model with three features:
x1, x2, and x1*x2.
Cross feature. Source: developers.google.com
Cross features are also very common in recommendation systems. In
practice, we can also use wide and deep architecture to combine many dense
features and sparse features. You can see one concrete example in section
Wide and Deep [sec-wide-and-deep].
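A minimal sketch of a hashed cross feature: two bucketized coordinates (the bucket names here are hypothetical) are concatenated and hashed into a fixed number of buckets:

import hashlib

def hashed_cross(lat_bucket: str, lng_bucket: str, num_buckets: int = 2 ** 16) -> int:
    # Concatenate the two categorical values, then hash into a fixed bucket space.
    key = f"{lat_bucket}_x_{lng_bucket}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % num_buckets

# Each (latitude bucket, longitude bucket) pair maps to one "city block" id.
print(hashed_cross("lat_37.77", "lng_-122.42"))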
Embedding
Both one hot encoding and feature hashing can represent features in multiple
dimensions. However, these representations do not usually preserve the
semantic meaning of each feature. For example, one hot encoding can’t
guarantee that the words ‘cat’ and ‘animal’ are close to each other in
the feature space, or that user ‘Kanye West’ is close to ‘rap music’ in YouTube
data. The proximity here can be interpreted from a semantic perspective or an
engagement perspective. This is an important distinction and has implications
for how we train embeddings.
How to Train Embedding
In practice, there are two ways to train embeddings: pre-trained (e.g.,
word2vec2 style) or co-trained (e.g., YouTube video embedding).
In a word2vec representation, we want a vector representation for each
word such that if vector(word1) is close to vector(word2), then the words are
somewhat semantically similar. We can achieve this by using the surrounding
words to predict the middle word in a sentence, or by using one word to predict
the surrounding words. We can see an example in the section below.
Embedding
As an example, each word can be represented as a $d$-dimensional vector, and
we can train our supervised model. We then use the outputs of one of the
fully connected layers near the output layer of the neural network as
embeddings of the input object. In this example, the embedding for ‘cat’ is
represented as the vector [1.2, -0.1, 4.3, 3.2].
There are two ways to formulate the problems: Continuous Bag of Words
(CBOW) and Skip-gram. For CBOW, we want to predict one word based on
the surrounding words. For example, if we are given: word1 word2 word3
word4 word5, we want to use (word1, word2, word4, word5) to predict
word3.
CBOW. Source: Exploiting Similarities Among Languages for Machine Translation
In the skip-gram model, we use ’word3’ to predict all surrounding words
’word1, word2, word4, word5’.
Skipgram. Source: Exploiting Similarities Among Languages for Machine Translation
Word2vec example

Word2vec CBOW example

Model Input | Label
the, cat, on, the | sat
cat, sat, the, orange | on
sat, on, orange, tree | the
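A minimal gensim sketch (assuming gensim 4.x) that trains a CBOW model on the toy sentence behind the table above:

from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "orange", "tree"]]

# sg=0 selects CBOW: the surrounding words predict the middle word.
model = Word2Vec(sentences, vector_size=4, window=2, min_count=1, sg=0, epochs=50)
print(model.wv["cat"])                       # the learned 4-dimensional embedding
print(model.wv.most_similar("cat", topn=2))  # nearest words in embedding space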
Instagram uses this type of embedding to provide personalized
recommendations for its users, while Pinterest uses it as part of its Ads
Ranking model. In practice, for apps like Pinterest and Instagram where
the user’s intention is strong, we can use word2vec style embedding training.
How Does Instagram Train User Embedding?
Within one session, Instagram user A sees user B’s photos, then user C’s,
and so on. If we assume user A is currently interested in certain topics, we
can also assume user B’s and user C’s photos might be relevant to those topics
of interest. For each user session, we have a collection of actions like the
diagram below:

user A → sees user B’s photos → sees user C’s photos
Sequence Embedding
We can formulate each session as a sentence and each user’s photos as
words. This is suitable for users who are exploring photos or accounts in a
similar context during a specific session.
How Does DoorDash Train Store Embedding?
DoorDash uses this approach for store embedding. For each session, we
assume users have a certain type of food in mind as they view store A,
store B, etc. We can assume these stores are somewhat similar with respect
to the user’s interests.

store 1 → store 2 → store 3
Store Embedding. Source: Doordash
We can train a model to classify whether a given pair of stores shows up in the
same user session. Next, we will see another way to train embeddings, one
that optimizes the embedding for some engagement metric.
How Does YouTube Train Embedding in Retrieval?
A recommendation system usually consists of three stages: Retrieval, Ranking,
and Re-ranking (read Chapter [rec-sys]). In this example, we will cover how
YouTube builds the Retrieval (Candidate Generation) component using a two-
tower architecture.
Figure 1.2 provides an illustration of the two-tower model architecture (read
the Common Deep Learning 1.5 section), where the left tower encodes (user,
context) and the right tower encodes the item. Intuitively, we can treat this
problem as a multi-class classification problem. We have two towers3: the left
tower takes (user, context) as input and the right tower takes movies as input.
The two-tower deep neural network4 is generalized from the multi-class
classification neural network, a multi-layer perceptron (MLP) model,
where the right tower of Figure 1.2 is simplified to a single layer with
item embeddings.
Given input x (user, context), we want to pick candidate y (videos) from
all available videos.
A common choice is to use the softmax function:
$$P(y \mid x; \theta) = \frac{e^{s(x, y)}}{\sum_{i=1}^m e^{s(x, y_i)}}$$
Loss function: use the log-likelihood:
$$L = -\frac{1}{T} \sum_{i=1}^T \log P(y_i \mid x_i; \theta)$$
As a result, the two-tower model architecture is capable of modeling the
situation where the label has structure or content features.
The StringLookup API maps string features to integer indices.
The Embedding layer API turns positive integers (indices) into dense
vectors of fixed size.
Two-tower architecture. Source: Sampling-Bias-Corrected Neural Modeling for Large
Corpus Item Recommendations
The following are key questions we need to consider:
For multi-class classification where the video repository is huge, how feasible
is this approach? Solution: for each mini-batch, we sample data from our
video corpus as negative samples. One example is to use a power-law
distribution for sampling.
When sampling, it’s possible that popular videos are overly penalized as
negative samples in a batch. Does it introduce bias in our training data?
One solution is to “correct” the logit output: $s^c(x_i, y_j) = s(x_i, y_j) - \log(p_j)$,
where $p_j$ is the probability of selecting video $j$.
What if an average user only watches 2% of the videos completely and
watches just a few seconds of the other 98%? Is it a good idea to
consider all engaged videos equally important? We can introduce a
continuous reward $r$ to reflect the degree of engagement, for example,
watch time.
Why do we need to use the dot product? Can we use other operators?
How many dimensions should the embeddings have?
Does the movie embedding dimension need to be the same as the user
embedding dimension?
Why do we use ReLU? Can we use other activation functions?
Facebook open-sourced their Deep Learning Recommendation Model5 with a
similar architecture.
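A minimal TensorFlow/Keras sketch of a two-tower retrieval model trained with in-batch sampled softmax; the vocabulary sizes, layer widths, and 32-dimensional embeddings are hypothetical choices, not YouTube's actual configuration:

import tensorflow as tf

num_users, num_videos, dim = 10_000, 100_000, 32

# Left tower: (user, context) -> embedding; right tower: video -> embedding.
user_tower = tf.keras.Sequential([
    tf.keras.layers.Embedding(num_users, 64),
    tf.keras.layers.Dense(dim),
])
video_tower = tf.keras.Sequential([
    tf.keras.layers.Embedding(num_videos, 64),
    tf.keras.layers.Dense(dim),
])

def in_batch_softmax_loss(user_ids, video_ids):
    u = tf.nn.l2_normalize(user_tower(user_ids), axis=-1)
    v = tf.nn.l2_normalize(video_tower(video_ids), axis=-1)
    logits = tf.matmul(u, v, transpose_b=True)   # all pair scores in the batch
    labels = tf.range(tf.shape(logits)[0])       # diagonal pairs are positives
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

Every other example in the batch acts as a negative; subtracting log(p_j) from the logits, as described above, corrects the bias against popular videos.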
How Does LinkedIn Train Embedding?
Pyramid two-tower network architecture. Source: LinkedIn engineering blog

LinkedIn used a reverse pyramid architecture, in which the hidden layers
grow in number of activations as we go deeper.
LinkedIn used the Hadamard product between the member embedding and the
job embedding.
The final prediction is a logistic regression on the Hadamard product
between each seeker and job posting pair.
Example of the Hadamard product:
$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \odot \begin{bmatrix} 5 & 3 \\ 2 & 6 \end{bmatrix} = \begin{bmatrix} 5 & 6 \\ 6 & 24 \end{bmatrix}$$
We chose the Hadamard product over more common
functions, like cosine similarity, to give the model flexibility to
learn its own distance function, while avoiding a fully
connected layer to reduce scoring latency in our online
recommendation systems.
How Does Pinterest Learn Visual Embedding
Take the Pinterest Visual Search6 example. When a user searches with a specific
image, Pinterest uses the input pin’s visual embedding to search for similar pins.
How do we generate visual embeddings? Pinterest takes an image recognition
deep learning architecture (e.g., VGG16, ResNet152, GoogLeNet) and fine-tunes
it on the Pinterest dataset. The learned features are then used as
embeddings for pins. You can see an example in Chapter 10 with the Airbnb
room classification use case.
We can also use collaborative filtering. Read the Collaborative Filtering
[collaborative-filtering] section.
Application of Embedding in Tech Companies
Twitter uses embedding for UserIDs, and embeddings are widely used in
different use cases at Twitter, such as recommendation, nearest neighbor
search, and transfer learning.
Pinterest Ads ranking uses word2vec style embeddings, where each user
session can be viewed as: pin A → pin B → pin C, then co-trained with
multitask modeling.
Instagram’s personalized recommendation model uses word2vec style
embeddings, where each user session can be viewed as: account 1 →
account 2 → account 3, to predict accounts with which a person is likely
to interact within a given session.
YouTube recommendations use two-tower model embeddings co-trained
with a multihead model architecture. (Read about multitask learning in
section Common Deep Learning 1.5).
DoorDash’s personalized store feed uses word2vec style embeddings, where
each user session can be viewed as: restaurant 1 → restaurant 2 →
restaurant 3. This Store2Vec model can be trained to predict whether
restaurants were visited in the same session using the CBOW algorithm.

Quiz About Two-Tower Embedding

Recall that in two-tower user/movie embedding, we take the last layer of each
tower as the embedding. When I build a network, I decide to set the user
embedding dimension to 32 and the movie embedding dimension to 64. Will
this architecture work? Answer:
[A] Yes, as long as the model learns, we can set any dimensions we want.
[B] No, a movie has too many embedding dimensions, and we will run out of
memory when serving millions of movies.
[C] No, there is a shape mismatch between the user embedding and the movie
embedding.
How Do We Evaluate the Quality of the Embedding?
There is no easy answer to this question. We have two approaches:
Apply the embedding to downstream tasks and measure their model
performance. For certain applications, like natural language processing
(NLP), we can also visualize embeddings using t-SNE (t-distributed
stochastic neighbor embedding) or UMAP (Uniform Manifold Approximation
and Projection for Dimension Reduction). We can look for clusters in two to
three dimensions and validate whether they match our intuition. In practice,
most embeddings built with engagement optimization do not show any clear
structure.
Apply clustering (k-means, k-nearest neighbors) on the embedding data and
see if it forms meaningful clusters.

In the TensorFlow documentation, they recommend the rule of
thumb $d = \sqrt[4]{D}$, where $D$ is the number of categories. Another
way is to treat the dimension $d$ as a hyperparameter and tune it on a
downstream task. In large-scale production, embedding features are usually
pre-computed and stored in key/value storage to reduce inference latency.
How Do We Measure Similarity?
To determine the degree of similarity, most recommendation systems rely on
one or more of the following.
Cosine: the cosine of the angle between the two vectors,
$s(q, x) = \cos(q, x)$.
Dot product: $s(q, x) = \sum_{i=1}^d q_i x_i$. You
will also see how LinkedIn uses the Hadamard product in their embedding
model (read section Embedding [subsec-embedding]).
Euclidean distance: $s(q, x) = \left[\sum_{i=1}^d (q_i - x_i)^2\right]^{\frac{1}{2}}$.
The smaller the value, the higher the similarity.
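A minimal numpy sketch of the three similarity measures:

import numpy as np

def cosine(q, x):
    return np.dot(q, x) / (np.linalg.norm(q) * np.linalg.norm(x))

def dot_product(q, x):
    return np.dot(q, x)

def euclidean(q, x):
    return np.linalg.norm(q - x)   # smaller means more similar

q, x = np.array([1.0, 2.0, 3.0]), np.array([2.0, 2.0, 4.0])
print(cosine(q, x), dot_product(q, x), euclidean(q, x))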
Important Considerations
The dot product tends to favor embeddings with a high norm. It’s more
sensitive to the embedding norm than the other methods, which has some
consequences:
Popular content tends to have higher norms and hence ends up
dominating the recommendations. How do you fix this? Can you
think of a parameterized dot product metric?
If we use bad initialization in our network and rare content is
initialized with large values, we might end up recommending rare
content over popular content more frequently.
Numeric Features
Normalization
For numeric features, normalization rescales values to have mean 0 and range
[-1, 1]. There are some cases where you want to normalize data to the range
[0, 1]:
$$v' = \frac{v - \min(v)}{\max(v) - \min(v)}$$
where $v$ is the feature value, $\min(v)$ is the minimum of the feature value,
and $\max(v)$ is the maximum of the feature value.
Standardization
If the feature distribution resembles a normal distribution, we can apply a
standardizing transformation:
$$v' = \frac{v - \text{mean}(v)}{\text{std}(v)}$$
where mean(v) is the mean of the feature value and std(v) is the standard
deviation of the feature value.
If the feature distribution resembles a power law, we can transform it using
the formula: $$\log\left(\frac{1 + v}{1 + \text{median}(v)}\right)$$
In practice, normalization can cause an issue
because the min and max values are usually outliers. One possible solution
is “clipping”, where we pick a “reasonable” value for min and max.
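A minimal numpy sketch of the transformations above, including clipping for outliers:

import numpy as np

v = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # 100 is an outlier

minmax = (v - v.min()) / (v.max() - v.min())          # range [0, 1]
standardized = (v - v.mean()) / v.std()               # mean 0, unit variance
log_scaled = np.log((1 + v) / (1 + np.median(v)))     # for power-law features
clipped = np.clip(v, 0, 10)                           # cap outliers before scaling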
Netflix uses a raw, continuous timestamp indicating the time
when the user played a video in the past, along with the current
time when making a prediction. They observed a 30% increase
in offline metrics. This leads to another challenge, since the
production model will always use the current timestamp,
which was never observed in the training data. To handle this
situation, production models are regularly retrained.
Summary
We learned how to handle numerical features: normalization and
standardization. In practice, we can often apply a log transformation when
feature values are large with very high variance.
For sparse features, there are multiple ways to handle them: one hot
encoding, feature hashing, and entity embedding.
With entity embedding, there are two popular ways to train the embedding:
pre-trained and co-trained. The common technique is to use engagement
data and train a two-tower network model. One interesting challenge is
how to select label data when training entity embeddings. We will
explore some solutions in the later chapters.
Feature Selection and Feature Engineering Quiz
We have a table with columns UserID, CountryID, CityID,
Zipcode, Age. Which of the following feature engineering approaches is
suitable to represent the data for a machine learning algorithm?
Answer:
[A] Apply one hot encoding for all columns.
[B] Apply embedding for CountryID, CityID; one hot
encoding for UserID, Zipcode; and normalization for Age.
[C] Apply embedding for CountryID, CityID, UserID, Zipcode
and normalization for Age.
Training Pipeline
The training pipeline needs to handle large volumes of data at a low cost. One
common solution is to store data in a column-oriented format like Parquet,
Avro, or ORC. These data formats enable high throughput for ML and
analytics use cases because they are column-based. In other use cases, the
TFRecord format (TensorFlow’s format for storing a sequence of binary records)
is widely used in the TensorFlow ecosystem.
Data Partitioning
Parquet and ORC files usually get partitioned by time for efficiency, so that
we can avoid scanning through the whole dataset. It’s also beneficial for
parallel training and distributed training. In this example, we partition data by
year, then by month. In practice, the most common services on AWS,
Redshift (Amazon’s fully managed, petabyte-scale data warehouse service in
the cloud) and Athena (an interactive query service that makes it easy to
analyze data in Amazon S3 using standard SQL), support Parquet and ORC.
Compared to other formats like CSV, Parquet can speed up queries by a
factor of 30, save 99% of the cost, and reduce the data scanned by 99%.
Partition Data. Source: mlengineer.io
Data is partitioned by year and month
Within each partition (year = 2020, month = 02), we have all data stored
in parquet format.
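A minimal pandas/pyarrow sketch (paths and columns are hypothetical) showing how partitioning enables partition pruning at read time:

import pandas as pd

df = pd.DataFrame({"year": [2020, 2020], "month": [1, 2], "clicks": [10, 20]})

# Writes dataset/year=2020/month=1/... directories in Parquet format.
df.to_parquet("dataset/", engine="pyarrow", partition_cols=["year", "month"])

# Partition pruning: only the year=2020/month=2 files are scanned.
feb = pd.read_parquet("dataset/", filters=[("year", "=", 2020), ("month", "=", 2)])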
Handle Imbalance Class Distribution
In machine learning use cases like fraud detection, click prediction, or spam
detection, it’s common to have imbalanced labels. For example, in ad click
prediction, it’s very common to have a 0.2% conversion rate: if there are 1,000
clicks, only two clicks lead to a desired action, such as installing the app
or buying the product. Why is this a problem? With too few positive
examples compared to negative examples, your model spends most of the
time learning about negative examples.
There are a few strategies to handle this.
Use class weights in the loss function. For example, in spam detection
problems, where non-spam data might account for 95% of the data and spam
data only 5%, we want to weight errors on the rare class more heavily. In this
case, we can modify the cross entropy loss function using weights:

// w0 is the weight for class 0,
// w1 is the weight for class 1
loss = -w0 * y * log(p) - w1 * (1 - y) * log(1 - p)

Use naive resampling: resample the major class at a certain rate to reduce the
imbalance in the training set. It’s important to keep the validation data and test
data intact (no resampling).
Use synthetic resampling: the synthetic minority over-sampling technique
(SMOTE) consists of synthesizing elements for the minority class based on
those that already exist. It works by randomly picking a point from the
minority class and computing the k-nearest neighbors of this point. The
synthetic points are added between the chosen point and its neighbors. For
practical reasons, SMOTE is not as widely used as other methods, especially
for large-scale applications.
Resample Data. Source: imbalanced-learn.org
Common Resampling Use Cases
Due to huge data sizes, it’s more common for big companies like
Facebook and Google to downsample the dominant class. For the
training pipeline, if your feature store has a SQL interface, you can use the
built-in rand() function for downsampling your dataset.
For deep learning models, we can sometimes downsample the majority
class examples and then upweight them. This helps the model train faster and
keeps the model well calibrated with the true distribution:
example_weight = original_weight * downsampling_factor
//sampling 10% of the data, source: nqbao.medium.com
SELECT
d.*
FROM dataset d
WHERE RAND() < 0.1
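A minimal scikit-learn sketch of class weighting (the data here is synthetic and purely illustrative); the class_weight dict plays the same role as the w0/w1 weights in the loss above:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy imbalanced data: roughly 95% negatives, 5% positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.05).astype(int)

# Upweight the rare positive class in the loss.
clf = LogisticRegression(class_weight={0: 1.0, 1: 19.0}).fit(X, y)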
Quiz about Weight for Positive Class

If a dataset contains 100 positive and 300 negative examples
of a single class, what should be the weight for the positive
class? Answer:
[A] Positive weight is 300/100 = 3.
[B] Positive weight is 100/300 = 0.333.
[C] It depends; can’t tell.
Data Generation Strategy
When we first start a new problem that requires machine learning, especially
when supervised learning is suitable, we have to answer the question:
"How do we get label data?"
LinkedIn feed ranking: we can generate label data by first ordering feeds
chronologically to collect data.
Facebook place recommendation: we can use places people like as
positive labels. For negative labels, we can either sample all other places
as negative samples or pick all places that users saw but didn’t like as
negative samples.
How LinkedIn Generates Data for Course Recommendation
Design Machine Learning solution for Course Recommendations on
LinkedIn Learning.
Problem
In the beginning, the main goal of course recommendations is to acquire
new learners by showing highly relevant courses. There are a few
challenges:
Lack of label data: if we have user activities (browse, click) available,
we can use these signals as implicit labels to train a supervised model. As
we’re building this LinkedIn Learning system from scratch, we don’t have any
engagement signals yet. This is also called the cold start problem.
One way to deal with it is to rely on a user survey during the
onboarding process, e.g., ask learners which skills they want to
learn or improve. In practice, this is usually insufficient.
Let’s take a look at one example: given learner Khang Pham with the skills
BigData, Database, and Data Analysis in his LinkedIn profile, and two
courses, Data Engineering and Accounting 101, should we recommend the
Data Engineering or the Accounting course? Clearly, Data
Engineering would be a better recommendation because it’s more relevant to
this user’s skill set. This leads us to one idea: we can use skills as a way to
measure relevance. If we can map learners to skills and courses to skills,
we can measure and rank relevance accordingly.
Skill-based Model. Source: LinkedIn
Course to Skill: Cold Start Model
There are various techniques to build the mapping from scratch.
Manual tagging using taxonomy (A): all LinkedIn Learning courses are
tagged with categories. We ask taxonomists to perform a mapping from
categories to skills. This approach gives us a high-precision,
human-generated course-to-skill mapping. On the other hand, it doesn’t
scale (i.e., low coverage).
Leverage LinkedIn skill taggers (B): leverage the LinkedIn Skill Taggers
feature to extract skill tags from course data.
Use a supervised model: train a classification model such that for a given
pair (course, skill), it returns 1 if the pair is relevant and 0 otherwise.
Label data: collect samples from A and B as positive training data.
We then randomly sample from our data to create negative labels.
We want our training dataset to be balanced.
Features: course data (title, description, categories, section names,
video names). We also leverage skill-to-skill similarity mapping
features.
Disadvantages: a) it relies heavily on the quality of the skill taggers; b)
one single logistic regression model might not be able to capture
the per-skill level effects.
Use semi-supervised learning.
We learn a different model for each skill, as opposed to one
common model for all (course, skill) pairs.
Data augmentation: leverage the skill-correlation graph to add more
positive label data. For example, if SQL is highly relevant to the Data
Analysis skill, then we can add Data Analysis to the training data as a
positive label.
Evaluation: offline metrics.
Skill coverage: measures how many LinkedIn standardized skills are
present in the mapping.
Precision and recall: we treat the human course-to-skill mapping
as ground truth. We can evaluate our classification models using
precision and recall.
Member to Skill
Member to skill via profile: LinkedIn users can add skills to their profile
by entering free-form text or choosing existing standardized skills. This
mapping is usually noisy and needs to be standardized. In practice, the
coverage is not high, since not many users provide this data. We also
train a supervised model p(user_free_form_skill, standardized_skill)
to provide a score for the mapping.
Member to skill using title and industry: in order to increase the
coverage, we can use a cohort-level mapping. For example, suppose user Khang
Pham works in the Ad Tech industry with the title Machine Learning Engineer
and didn’t provide any skills in his profile. We can rely on the cohort
of Machine Learning Engineers in Ad Tech to infer this user’s skills. We
then combine the profile-based mapping with the cohort-based mapping
using a weighted combination.
Member to skill

Skill | Profile-based mapping | Cohort-based mapping | Final mapping
SQL | 0.01 | 0.5 | 0.01*w1 + 0.5*w2
Database | 0.3 | 0.2 | 0.3*w1 + 0.2*w2

where w1 is the weight of the profile-based mapping and w2 is the weight of
the cohort-based mapping.
Further reading: Learning to be Relevant8
How to Split Train/Test Data
This consideration is often overlooked but very important in a production
environment.
In forecasting or any time-dependent use case, it’s important to respect the
chronological order when you split train and test data.
For example, it doesn’t make sense to use data from the future to “forecast”
data in the past.
In a sales forecast use case, we want to forecast sales for each store. If
we randomly split the data by storeID, the train data might not have data
for some stores, and the model can’t forecast for those stores. In
practice, we need to split the data so that each storeID appears in the
train data as well as the test data.
Uber Forecast Model Evaluation. Source: Uber
Sliding Window
First, we select data from day 0 to day 60 as the train set and day 61 to
day 90 as the test set.
Then, we select data from day 10 to day 70 as the train set and day 71 to
day 100 as the test set.
Expanding Window
First, we select data from day 0 to day 60 as the train set and day 61 to day
90 as the test set.
Then we select data from day 0 to day 70 as the train set and day 71 to day
100 as the test set.
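A minimal sketch of expanding-window splits with scikit-learn's TimeSeriesSplit (assuming 100 days of data, one row per day; the test_size argument requires scikit-learn 0.24+):

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)   # one row per day, day 0 .. day 99

tscv = TimeSeriesSplit(n_splits=4, test_size=10)
for train_idx, test_idx in tscv.split(X):
    # Train always starts at day 0 and expands; test is the next 10 days.
    print(train_idx[0], train_idx[-1], "->", test_idx[0], test_idx[-1])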
Retraining Requirements
Retraining is a requirement in many tech companies. In practice, the data
distribution is a nonstationary process, so the model does not perform well
without retraining.
In AdTech and recommendation/personalization use cases, it’s important to
retrain models to capture changes in users’ behavior and trending
topics. The machine learning engineers need to make the training pipeline
run fast and scale well with big data. When you design such a system, you
need to balance model complexity against training time.
The common design pattern is to have a scheduler retrain the model on a
regular basis, usually many times per day.
Four Levels of Retraining
Level 0: train and forget. Train the model once and never retrain it
again. This is appropriate for ‘stationary’ problems.
Level 1: cold-start retraining. Periodically retrain the whole model on a
batch dataset.
Level 2: near-line retraining. Similar to level 3, but we retrain the per-key
components individually and asynchronously, nearline, on streaming
data.
Level 3: warm-start retraining. If the model has personalized per-key
components, retrain only these in bulk on data specific to each key (e.g.,
all impressions of an advertiser’s ads) once enough data has
accumulated.
Four Levels of Model Retraining - High Level. Source: LinkedIn
Loss Function and Metrics Evaluation
In this section, we will focus on regression and classification use cases.
Choosing loss functions and determining which metrics to track is one of the
most important parts of machine learning products and services.
Regression Loss
Mean Square Error and Mean Absolute Error
Mean Square Error is one of the most common loss metrics in regression
problems:
$$MSE = \frac{1}{N} \sum_{i=1}^N (\text{target}_i - \text{prediction}_i)^2$$
Mean Absolute Error:
$$MAE = \frac{1}{N} \sum_{i=1}^N |\text{target}_i - \text{prediction}_i|$$
MSE table

Actual | Prediction | Absolute Error | Square Error
30 | 30 | 0 | 0
32 | 29 | 3 | 9
31 | 33 | 2 | 4
35 | 36.8 | 1.8 | 3.24

In this example, MAE is 1.7 (6.8/4) and MSE is 4.06 (16.24/4).
MAE table

Actual | Prediction | Absolute Error | Square Error
30 | 30 | 0 | 0
32 | 32 | 0 | 0
31 | 30 | 1 | 1
50 | 35 | 15 | 225

In this example, MAE is 4 (16/4) and MSE is 56.5 (226/4). With one outlier
value (50), the MSE error increases significantly.
In practice, we always need to look for outliers. If we have an outlier in
our data, an MSE-loss model will give more weight to the outlier
than an MAE-loss model. In that case, using MAE loss is more intuitive, since
it’s more robust to outliers.
Huber Loss
Huber loss fixes the outlier-sensitivity problem of MSE, and it’s also
differentiable at 0 (whereas MAE’s gradient is not continuous there). The idea is
pretty simple: if the error is not too big, Huber loss uses MSE; otherwise, it’s
just MAE with some penalty:
$$L_\delta = \frac{1}{2}(\text{target} - \text{prediction})^2 \quad \text{if } |\text{target} - \text{prediction}| \le \delta$$
$$L_\delta = \delta\,|\text{target} - \text{prediction}| - \frac{1}{2}\delta^2 \quad \text{otherwise}$$
The problem with Huber loss is that we need to tune the hyperparameter
$\delta$.
Quantile Loss
In certain applications, we value underestimation and overestimation
differently. If you build a model to estimate arrival time, you don’t want to
overestimate; otherwise, customers might not make orders or requests.
Quantile loss can give more weight to positive errors or to negative errors:
$$L = \sum_{y_i \ge p_i} \lambda\,|y_i - p_i| + \sum_{y_i < p_i} (1 - \lambda)\,|y_i - p_i|$$
If you set $\lambda$ to 0.5, it reduces to (a scaled) MAE.

Uber uses pseudo-Huber loss and log-cosh loss to approximate Huber loss
and mean absolute error in their distributed XGBoost training. DoorDash’s
estimated time of arrival models used MSE and then moved to quantile loss
and a custom asymmetric MSE.
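A minimal numpy sketch of the quantile (pinball) loss, where lam controls the asymmetry between under- and over-prediction:

import numpy as np

def quantile_loss(y_true, y_pred, lam=0.9):
    # Under-predictions (y_true > y_pred) are weighted by lam,
    # over-predictions by (1 - lam).
    diff = y_true - y_pred
    return np.mean(np.where(diff >= 0, lam * diff, (lam - 1) * diff))

y, p = np.array([10.0, 10.0]), np.array([8.0, 12.0])
print(quantile_loss(y, p, lam=0.9))  # penalizes the under-prediction more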
Which loss function to use depends on the use case. For
binary classification, the most popular one is cross entropy. For the ad click
prediction problem, Facebook uses Normalized Cross Entropy (based on log
loss) to make the loss less sensitive to the background conversion rate.
How Does Facebook Use Normalized Cross Entropy for Ad Click
Prediction?
Problem: suppose we build a machine learning model to predict click/not-
click for an ads system. We build two models: a fixed prediction model and a
fancy model.
The fixed prediction model always predicts probability(click) = 0.2. The
fancy model has slightly ‘better’ intuition: for positive labels, it predicts 0.3,
and for negative labels, it predicts 0.1, which is intuitive and better than a
random guess.
Intuitively, the fancy model should perform better because it doesn’t
predict(click) with a constant value.
Fixed Prediction Model Cross Entropy Loss

Actual | Predicted | Cross Entropy Loss
1 | 0.2 | 1.6094373
-1 | 0.2 | 0.22314353
-1 | 0.2 | 0.22314353
-1 | 0.2 | 0.22314353
-1 | 0.2 | 0.22314353
-1 | 0.2 | 0.22314353
-1 | 0.2 | 0.22314353
-1 | 0.2 | 0.22314353
-1 | 0.2 | 0.22314353
-1 | 0.2 | 0.22314353

The click-through rate is 1/10. The overall cross entropy loss is 0.36177295.
Fancy Model Cross Entropy Loss

Actual | Predicted | Cross Entropy Loss
1 | 0.3 | 1.2039728
1 | 0.3 | 1.2039728
1 | 0.3 | 1.2039728
1 | 0.3 | 1.2039728
1 | 0.3 | 1.2039728
-1 | 0.1 | 0.105360545
-1 | 0.1 | 0.105360545
-1 | 0.1 | 0.105360545
-1 | 0.1 | 0.105360545
-1 | 0.1 | 0.105360545

The click-through rate is 1/2 and the cross entropy is 0.65466666.
Given the smaller cross entropy loss, does the fixed prediction model perform
better than the fancy model? Between the two training data sets, the difference is
that we have different underlying CTRs. This is why Facebook and other big
tech companies favor Normalized Cross Entropy9 (NCE):
$$NCE = \frac{\text{logloss(model)}}{\text{logloss(rate)}}$$
Properties of NCE:
Always non-negative.
Only 0 if your predictions match the labels perfectly.
Unbounded; can grow arbitrarily large.
Intuitive scale: NCE < 1 means the model has learned something; NCE > 1
means the model is less accurate than always predicting the average.
Assume a given training data set has $N$ examples with labels $y_i \in \{-1, +1\}$
and estimated probability of click $p_i$, where $i = 1, 2, ..., N$. Denote
the average empirical CTR as $p$:
$$NCE = \frac{-\frac{1}{N} \sum_{i=1}^N \left(\frac{1+y_i}{2} \log(p_i) + \frac{1-y_i}{2}\log(1-p_i)\right)}{-(p\log(p) + (1-p)\log(1-p))}$$
The lower the value, the better the model’s prediction.
The reason for this normalization is that the closer the background CTR
is to either 0 or 1, the easier it is to achieve a better log loss.
Dividing by the entropy of the background CTR makes the NCE
insensitive to the background CTR.
In the above example, model 1 has NCE = 0.36177295 / 0.325083 = 1.11 and
model 2 has NCE = 0.65466666 / 0.6931472 = 0.945.
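A minimal numpy sketch of NCE; the toy arrays reproduce the fixed prediction model above (one click out of ten impressions at a constant 0.2 prediction):

import numpy as np

def normalized_cross_entropy(y, p):
    # y in {0, 1}; p is the predicted click probability.
    ctr = y.mean()
    logloss_model = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    logloss_rate = -(ctr * np.log(ctr) + (1 - ctr) * np.log(1 - ctr))
    return logloss_model / logloss_rate

y = np.array([1] + [0] * 9, dtype=float)
print(normalized_cross_entropy(y, np.full(10, 0.2)))  # ~1.11: worse than baseline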
Forecast Metrics
In forecast problems, the most common metrics are Mean Absolute
Percentage Error (MAPE) and Symmetric Mean Absolute Percentage Error
(SMAPE). For MAPE, one needs to pay attention to whether the target value is
skewed (i.e., either too big or too small). On the other hand, despite its name,
SMAPE is not symmetric: it treats under-forecasts and over-forecasts differently.
Mean Absolute Percentage Error
$$M = \frac{1}{n} \sum_{t=1}^n \left|\frac{A_t - F_t}{A_t}\right|$$
where,
$M$ = mean absolute percentage error
$n$ = number of samples
$A_t$ = actual value
$F_t$ = forecast value
Mean Absolute Percentage Error

Actual | Predicted | Absolute Percentage Error
0.5 | 0.3 | 0.4
0.1 | 0.9 | 8.0
0.4 | 0.2 | 0.5
0.15 | 0.2 | 0.334

In the second row, since the prediction is too high, we have a percentage
error of 8.0. When we calculate the mean of all the errors, the MAPE metric
value becomes too high and hard to interpret.
Advantages
Expressed as a percentage, which is scale-independent and can be
used for comparing forecasts on different scales. We should
remember, though, that MAPE values may exceed 100%.
Easy to explain to stakeholders.
Disadvantages
MAPE is undefined when the actual value is zero, which can occur,
for example, in demand forecasting. Additionally, it takes extreme
values when the actual is very close to zero.
MAPE is asymmetric: it puts a heavier penalty on negative
errors (when forecasts are higher than actuals) than on positive errors.
This is caused by the fact that the percentage error cannot exceed
100% for forecasts that are too low, while there is no upper limit for
forecasts that are too high. As a result, MAPE favors models
that under-forecast rather than over-forecast.
Symmetric Mean Absolute Percentage Error
$$\text{SMAPE} = \frac{100\%}{n} \sum_{t=1}^n \frac{|F_t - A_t|}{(|A_t| + |F_t|)/2}$$
Advantages
Fixes a shortcoming of the original MAPE: it has both a
lower bound (0%) and an upper bound (200%).
Disadvantages
Unstable when both the true value and the forecast are very close to
zero. When that happens, we deal with division by a number very
close to zero.
SMAPE can take negative values (in variants that omit the absolute
values), so the interpretation as an "absolute percentage error" can be
misleading.
The range of 0% to 200% is not that intuitive to interpret;
therefore, the division by 2 in the denominator of the SMAPE
formula is often omitted.
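Minimal numpy sketches of both metrics (SMAPE shown with the division by 2 kept in the denominator):

import numpy as np

def mape(actual, forecast):
    return np.mean(np.abs((actual - forecast) / actual))

def smape(actual, forecast):
    denom = (np.abs(actual) + np.abs(forecast)) / 2
    return 100 * np.mean(np.abs(forecast - actual) / denom)

a = np.array([0.5, 0.1, 0.4, 0.15])
f = np.array([0.3, 0.9, 0.2, 0.2])
print(mape(a, f))   # ~2.31, dominated by the second row's 8.0 error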
Companies also use machine learning and deep learning for forecast
problems. For example, Uber uses algorithms like recurrent neural
networks (RNNs), gradient boosted trees, and support vector regressors for
various problems, including marketplace forecasting, hardware
capacity planning, and marketing.
Classification Loss
In this section, we will focus on the less popular metrics: focal loss and
hinge loss.
Focal Loss
When training on an imbalanced class distribution, there are
easy samples and hard samples. How can we make the model focus
more on the hard examples? Focal loss10 addresses this by adding a weighting
factor such that if a sample is easy, its loss value is small, and vice versa.
If we set $\gamma$ to 0, it becomes traditional cross entropy:
$$FL(p_t) = -(1 - p_t)^\gamma \log(p_t)$$
When do we use it? Focal loss makes it easier for the model to learn the hard
examples. It’s popular in object detection.
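A minimal numpy sketch of focal loss for binary labels (gamma = 2 is the value commonly cited from the original paper):

import numpy as np

def focal_loss(y, p, gamma=2.0):
    # p is the predicted probability of the positive class; y in {0, 1}.
    p_t = np.where(y == 1, p, 1 - p)   # probability assigned to the true class
    return np.mean(-((1 - p_t) ** gamma) * np.log(p_t))

y = np.array([1, 0, 0])
p = np.array([0.9, 0.1, 0.6])          # the last example is "hard"
print(focal_loss(y, p))                # the loss is dominated by the hard example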
been a dare-devil fellow. But a landless Trevanion will be a sorry
sight."
"There never has been one yet."
"And if thou art the first, 'twill be a sorry business."
I felt more uncomfortable, so I swallowed a large bumper of wine to
keep my spirits up.
Presently we sat down to play. I won, I remember, freely at first, and
was in high good humour.
"Luck seems with thee to-night," said old Peter Trevisa. "After all, it
seems thou'st done well to come here rather than go a-dancing with
the maidens yonder."
As he spoke the music ceased, and on looking up I saw Ned
Prideaux, the fellow who had stolen Amelia Boscawen from me,
come into the room.
I don't know that I felt any enmity toward him; the only wrong
feeling I had for him was on account of my pride. That he should
have been preferred before me wounded my vanity.
Old Peter Trevisa knew of the business, and laughed as he came up.
"Thou didst beat him in courting, lad," he said to Prideaux, "let's see
if thou canst beat him at playing."
This he said like one who had been drinking a good deal. And
although I had not seen him making free with wine, I fancied he
must be fairly drunk; consequently I did not resent his words.
Besides, I was in high good humour because of my winnings.
"I'll take a hand with pleasure," answered Prideaux. He wiped his
brow, for he had been dancing, and sat down opposite me.
I broke a fresh bottle of wine, and we commenced playing. Fool that
I was, I drank freely throughout the evening, and presently I
became so excited that I hardly knew what I was doing. Several
fellows gathered around to watch us, and the stakes were high. I
had not been playing with Prideaux long before my luck turned. I
began to lose all I had gained. Old Peter Trevisa chuckled as he saw
that the cards were against me.
"Give it up, Roger," he said in a sneering kind of way; "Trevanion
can't stand bad luck, lad."
This wounded my pride. "Trevanion can stand as much as I care to
let it stand," I replied, and I laid my last guinea on the table.
Presently Mr. Hendy, the old family lawyer, came to my side.
"Be careful, Mr. Trevanion," he whispered, "this is no time for ducks
and drakes."
But I answered him with an oath, for I was in no humour to be
corrected. Besides, wild and lawless as I had been for several years,
I remembered that I was a Trevanion, and resented the family
attorney daring to try to check me in public.
"He won't listen to reason, Hendy," sneered old Peter Trevisa. "Ah,
these young men! Hot blood, Hendy, hot blood; we can't stop a
Trevanion."
I had now lost all my money, but I would not stop. Old Trevisa
standing at my elbow offering sage advice maddened me. I blurted
out what at another time I would not have had mentioned on any
consideration.
"You have a stake in Trevanion, Trevisa," I cried angrily.
"Nonsense, nonsense, Roger," whispered the old man, yet so loudly
that all could hear.
"You have," I cried, "you know you have. If I paid you all you lent
my father, there would be little left. How much would the remnant
be?"
"We'll not speak of that," laughed the old man.
"But we will," I said defiantly, for what with wine, and bad luck, and
the irritation of the old man's presence I was beside myself. "What
more would you lend on the estate?"
He named a sum.
"I'll play you for that sum, Prideaux," I cried.
"No," replied Prideaux; "no, Trevanion, you've lost enough."
"But I will!" I replied angrily.
"No," said Prideaux, "I'm not a gamester of that order. I only play for
such sums as have been laid on the table."
"But you shall!" I cried with an oath; "you dare not as a gentleman
refuse me. You've won five hundred guineas from me this very night.
You must give me a chance of winning it back."
"Luck is against you, Trevanion," replied Prideaux. "It shall never be
said of me that I won a man's homestead from him. I refuse to
play."
"Prideaux has won a maid from you!" laughed old Trevisa with a
drunken hiccup. "Be careful or he'll take Trevanion, too."
"I'll never play for the land," cried Prideaux again.
"But you shall," I protested. "If you refuse you are no gentleman,
and you will act like a coward to boot."
"Very well," replied Prideaux coolly, "it shall be as you say."
We arranged our terms and commenced playing again.
Half an hour later I had lost the sum which old Peter Trevisa said he
could further advance on Trevanion. I do not think I revealed my
sensations when I realized that I had lost my all, but a cold feeling
came into my heart nevertheless.
"Trevanion," said Prideaux, "we'll not regard the last half-hour's play
as anything. It was only fun."
"That will not do," I replied. "We have played, and I have lost; that
is all."
"But I shall not take——"
"You will," I cried. "You have played fairly, and it is yours. I will see
to it at once that the amount shall be handed to you."
"I will not take it," cried Prideaux. "I absolutely refuse."
I know I was mad; my blood felt like streams of molten fire in my
veins, but I was outwardly cool. The excitement I had previously
shown was gone. Perhaps despair helped me to appear calm.
"Look you, Peter Trevisa," I said; "you give Prideaux a draft for that
money."
"Roger, Roger," said the old man coaxingly, "take Prideaux's offer. He
won your maid; don't let him win Trevanion too. You'll cut a sorry
figure as a landless Trevanion."
I seized a pen which lay near, and wrote some words on a piece of
paper.
"There," I said to Prideaux as I threw it to him, "it shall not be said
that a Trevanion ever owed a Prideaux anything, not even a gaming
debt. Gentlemen, I wish you good-night."
I left the room as I spoke and ordered my horse. I was able to walk
straight, although I felt slightly giddy. I scarcely realized what I had
done, although I had a vague impression that I was now homeless
and friendless. A ten-mile journey lay before me, but I thought
nothing of it. What time I arrived at Trevanion I know not. My horse
was taken from me by an old servant, and without speaking a word
to any one I went straight to bed.
CHAPTER II.
PETER TREVISA'S OFFER.
The next morning I awoke with terrible pains in my head, while my
heart lay like lead within me. For some time I could not realize what
had happened; indeed, I hardly knew where I was. It was broad
daylight, but I could not tell what the hour was. Presently a clock
began to strike, and then I realized that I lay in my own bed at
Trevanion and that the clock stood in the turret of my own stables. I
counted the strokes. It stopped at eleven. No sooner had it ceased
than all that had happened the previous night flashed through my
mind. I jumped out of bed and looked out of the window. Never had
the place seemed so fair to look upon, never had the trees looked so
large and stately. And I was burdened with the dread remembrance
that it was no longer mine. When I had dressed I tried to face the
matter fairly. I tried to understand what I had done. The more I
thought about it the more I cursed myself for being a fool. For I felt
how insane I had been. I had drunk too much wine, I had allowed
myself to become angry at old Peter Trevisa's words. I had blurted
out truths which under other circumstances I would rather have
bitten my tongue in two than have told. I had acted like a madman.
Wild, foolish as I had been in the past, that night was the climax of
my folly. Why had old Peter Trevisa's presence and words aroused
me so?
The more I thought the sadder I became, the darker did my
prospects appear. I had given Prideaux a written guarantee for the
money I had been unable to pay. That piece of paper meant my
ruin, if he took advantage of it. Would he do this? Yes, I would see
that he did. In extremities as I was, I would rather sacrifice the land
than violate our old code of honour.
I heard a knock at the door, and a servant entered.
"From Mr. Trevisa of Treviscoe, sir," he said.
I am afraid my hand trembled slightly as I took the letter.
"Who brought it, Daniel?" I asked.
"A servant, sir."
"Let breakfast be ready in ten minutes, Daniel; I'll be down by that
time."
"Yes, sir."
I broke the seal of the letter and read it. I soon discovered that it
was written by young Peter Trevisa. For, first of all, it was written in
a clear hand and correctly spelt, and I knew that old Peter's writing
was crabbed and ill-shapen; besides which, the old man had not
learnt the secret of stringing words together with anything like ease.
The contents of the epistle, too, revealed the fact that the son, and
not the father, acted as scribe. The following is an exact transcript
thereof:
"Treviscoe the 25th day of March in the year 1745.
"To Roger Trevanion, Esq., of Trevanion.
"Dear Sir:—The events of last night having altered their
complexion somewhat after you left the house of Geoffry
Luxmore, Esq., and the writing which you gave to Mr. Edward
Prideaux having changed hands, with that gentleman's consent,
it has become necessary for you to visit Treviscoe without delay.
My father has therefore instructed me to write (instead of
employing our attorney, who has up to the present conducted
all correspondence relating to my father's connections with
Trevanion) urging your presence here. I am also asked to
impress upon you the fact that it will be greatly to your
advantage to journey here immediately, while your delay will be
perilous to yourself. We shall therefore expect you here within
two hours from the delivery of this letter.
"Peter Trevisa."
This communication certainly looked ominous, and I felt in no very
pleasant frame of mind as I entered the room beneath, where my
breakfast had been placed for me.
"Where is the fellow who brought this, Daniel?" I asked of my old
serving-man.
"He is standin' outside, sur. He wudden cum in. He seemed in a
terble 'urry."
I went to the door and saw a horse which had evidently been hard
ridden. It was covered with mud and sweat. The man who stood by
the animal's side touched his hat when he saw me.
"Go into the kitchen, my man, and get something to eat and drink,"
I said.
"I must not, sur," was the reply. "My master told me to ride hard,
and to return immediately I got your answer."
"Anything wrong at Treviscoe?"
"Not as I know ov, sur."
I had no hope of anything good from old Peter, and I felt like defying
him. My two years' possession of Trevanion had brought but little
joy. Every day I was pinched for money, and to have an old house to
maintain without a sufficient income galled me. The man who is
poor and proud is in no enviable position. Added to this, the desire
to hide my poverty had made me reckless, extravagant, dissolute.
Sometimes I had been driven to desperation, and, while I had never
forgotten the Trevanion's code of honour, I had become feared and
disliked by many people. Let me here say that the Trevanion code of
honour might be summed up in the following way: "Never betray a
woman. Never break a promise. Never leave an insult unavenged.
Suffer any privation rather than owe money to any man. Support the
church, and honour the king."
Having obeyed these dictates, a Trevanion might feel himself free to
do what else he liked. He could be a drunkard, a gamester, a
swashbuckler, and many other things little to be desired. I speak
now for my own branch of the family, for I had but little to do with
others of my name. In the course of years the estates had been
much divided, and my father's patrimony was never great. True,
there were many hundreds of acres of land, but, even although all of
it were free from embarrassment, it was not enough to make its
owner wealthy. My father had also quarrelled with those who bore
our name, partly, I expect, because they treated him with but little
courtesy. Perhaps this was one reason why he had been recklessly
extravagant, and why he had taken no pains to make me careful.
Anyhow I am afraid that while I was feared by many I was beloved
by few. I had had many quarrels, and the law of my county being
something lax, I had done deeds which had by no means endeared
me to my neighbours.
My pride was great, my temper was of the shortest, my tastes and
habits were expensive, and my income being small, I was weary of
keeping up a position for which I had not the means.
Consequently, as I read young Peter Trevisa's letter, I felt like
refusing to obey his bidding. I had been true to the Trevanion code
of honour. I had given Prideaux a written promise that the gaming
debt should be paid. Let them do their worst. I was young, as strong
as a horse, scarcely knew the meaning of fatigue, and I loved
adventure. I was the last of my branch of the family, so there was
no one that I feared grieving. Very well, then, I would seek my
fortune elsewhere. There were treasures in India, there were
quarrels nearer home, and strong men were needed. There were
many careers open to me; I would leave Trevanion and go to lands
beyond the seas.
I was about to tell the man to inform his master that I refused to go
to Treviscoe, when I was influenced to change my mind. I was
curious to know what old Peter had to say. I was careless as to what
he intended doing in relation to the moneys I owed him, but I
wondered what schemes the old man had in his mind. Why did he
want to see me? It would do no harm to ride to his house. I wanted
occupation, excitement, and the ride would be enjoyable.
"Very well," I said, "if I do not see your master before you do, tell
him I will follow you directly."
"Yes, sur," and without another word the man mounted the horse
and rode away.
I ate a hearty breakfast, and before long felt in a gay mood. True
the old home was dear to me, but the thought of being free from
anxious care as to how I might meet my creditors was pleasant. I
made plans as to where I should go, and what steps I should first
take in winning a fortune. The spirit of adventure was upon me, and
I laughed aloud. In a few days Cornwall should know me no more. I
would go to London; when there nothing should be impossible to a
man of thirty-two.
I spoke pleasantly to Daniel, the old serving-man, and my laughter
became infectious. A few seconds later the kitchen maids had
caught my humour. Then my mood changed, for I felt a twinge of
pain at telling them they must leave the old place. Some of them
had lived there long years, and they would ill-brook the thought of
seeking new service. They had served the family faithfully too, and
ought to be pensioned liberally instead of being sent penniless into
the world.
A little later I was riding furiously toward Treviscoe. The place was a
good many miles from Trevanion, but I reached it in a little more
than an hour. I found old Peter and his son eagerly awaiting me.
"Glad to see you, Roger, glad to see you," said the old man.
"Why did you send for me?" I asked.
"I'll tell you directly. John, take some wine in the library."
The servant departed to do his bidding, and I followed the two
Trevisas into the library.
"Sit down by the fire, Roger, lad; that's it. First of all we'll drink each
other's health in the best wine I have in my cellar. This is a special
occasion, Roger."
"Doubtless, a special occasion," I replied; "but no wine for me at
present. I want to keep my head cool in talking with such as you.
What do you want of me?"
"Let's not be hasty, Roger," said old Peter, eyeing me keenly, while
young Peter drew his chair to a spot where his face was shaded, but
from which he could see me plainly. "Let's be friendly."
"I'm in no humour to be friendly," was my rejoinder. "Tell me why
you have wished me to come to you?"
"I would have come to you, but I had a twinge of gout this morning,
and was not able to travel. I wanted to see you on an important
matter, my dear lad."
"Will you drop all such honeyed phrases, Peter Trevisa," I said
angrily. "I know you lent money to my father on Trevanion. I know I
have been a fool since I came into possession. Last night I lost my
head. Well, Prideaux shall be paid, and you will take the rest. I quite
expect this, and am prepared for it."
"Prideaux has been paid," laughed the old man.
"In cash?"
"Aye, that he has."
"Who paid him?"
"I did."
"Oh, I see. You wanted the bone all to yourself, did you," I cried
angrily. "Well, some dogs are like that. But it makes no difference to
me. Do your worst."
"You remember this," he said, holding up the piece of paper I had
given to Prideaux the night before.
"I was mad when I wrote it," I replied, "but I remember it well. How
did it come into your hands?"
"Prideaux has very fine notions about honour," remarked old Peter.
"He did not like taking advantage of it, and yet he knew that you as
a Trevanion would insist on his doing so."
"Well?"
"Well, Roger lad, seeing I have the Trevanion deeds, I thought I
might as well have this too. So I offered him money down, and he
was pleased to arrange the matter that way. He has made the thing
over to me."
"Let's see it—his writing ought to be on it to that effect."
"It is; aye, it is."
"Then let me look at it."
"No, Roger. This paper is very precious to me. I dare not let you
have it. You might destroy it then."
"Peter Trevisa," I cried, "did ever a Trevanion do a trick like that?"
"No, but you are in a tight corner, and——"
"Listen, you chattering old fool," I cried angrily. "If I wished, I could
squeeze the life out of the bodies of both of you and take the paper
from you before any one could come to your aid. But that's not my
way; give it me."
"I'll trust you, Roger; here it is."
I looked at the paper. I saw my own promise and signature;
underneath it was stated that the money had been paid by Peter
Trevisa, and signed "Edward Prideaux."
I flung it at him. "There," I said, "you've forged the last link in your
chain now. I am quite prepared for what I have no doubt you will
do. Trevanion is yours. Well, have it; may it bring you as much joy as
it has brought me."
"You misjudge me," cried old Peter. "You misjudge both me and my
son. True, Trevanion would be a fine place for my lad, but then I
should not like to drive you away from your old home. All the
Trevanions would turn in their graves if any one else lived there. I
want to be your friend. I desire to help you on to your feet again."
"Wind!" I cried. "Trust you to help any man!"
"Listen to what my father has to say," cried young Peter. "You will
see that we both wish to be friendly."
His face was partly hidden; nevertheless I saw the curious light
shining from his eyes. He was undersized, this young Peter, just as
his father was. A foxy expression was on his face, and his mouth
betrayed his nature. He was cunning and sensual. His was not unlike
a monkey's face. His forehead receded, his lips were thick, his ears
large.
"Roger Trevanion, my lad, there is no reason why you should have to
leave your old home. Nay, there is no reason why you should not be
better off than you have been. That is why I got this paper from
Edward Prideaux."
Old Peter spoke slowly, looking at me from the corner of his eyes.
"You want me to do something," I said after a minute's silence.
"Ah, Roger," laughed the old man, "how quickly you jump at
conclusions."
"It will not do, Peter Trevisa," I cried. "You have Trevanion. Well,
make the most of it. I shall not be sorry to be away from the county.
The thought that everything has really belonged to you has hung
like a millstone around my neck. I am not going to fetch and carry
for you."
"But if you had the deeds back. If I burnt this paper. If the estate
were unencumbered. What then?"
"You know it will not be. Trust you to give up your pound of flesh."
"You do me an injustice," replied old Peter, with a semblance of
righteous indignation. "What right have you to say this? Have I been
hard on you? Have I dunned you for your money?"
"No; but you have lost no opportunity of letting me know that the
place belongs to you."
"That was natural, very natural. I wanted to put a check on your
extravagance."
I laughed in his face, for I knew this to be a lie.
"Roger Trevanion," cried young Peter, "my father is a merciful man.
He has your welfare at heart. He is old too. Is it manly to mock old
age."
"Let there be an end of this," I cried. "I begin to see why you have
brought me here. I knew you had some deep-laid plans or I would
not have come. It is always interesting to know what such as you
think. Well, let's know what it is."
For the moment I seemed master of the situation. An outsider would
have imagined them in my power instead of I being in theirs.
Especially did young Peter look anxious.
"I am sure we can trust Roger," said the old man. "When a
Trevanion gives his word he has never been known to break it."
"But they are learning to be careful how to give their word," I
retorted.
Peter looked uneasy. "But if I ask you to keep what I tell you a
secret, you will promise, Roger?"
"I ask for no confidences," I replied.
"You said just now that we wanted you to do something," said
young Peter. "You guessed rightly. If you do not feel inclined to do
what we ask you, you will of course respect anything we may tell
you?"
"That is but fair," was my answer.
"You promise, then?" cried old Peter.
"If I honourably can," I replied.
For a few seconds both men were silent; then old Peter began to
speak again.
"Roger Trevanion," he said, "you know that I hold the deeds of
Trevanion; you know that you are entirely at my mercy."
"Well enough."
"You would like to remain at Trevanion? You, a Trevanion, would not
like to be an outcast, a mere vagrant, a landless gipsy."
"I don't care much," I replied. "I should be free; and I would rather
be landless than be supposed to own the land, while everything
practically belonged to you. I've told you this before. Why make me
say it again?"
"But you would like the deeds back. You would like to live at the old
home with plenty of money?"
"You know I would. Why mock me?"
"You would do a great deal in order that this might come to pass."
"What do you want?"
We had come back to the same point again, and again old Peter
hesitated.
"You know Restormel?" he said at length.
"Restormel Castle, up by Lostwithiel?" I asked.
"No; Restormel in the parish of St. Miriam, a few miles north from
here?"
"Oh, yes, I know."
"What do you know?"
Both old Peter and young Peter spoke in the same breath; both
spoke eagerly, too—anxiously in fact.
"What is rumoured by certain gossips," I replied. "I expect there is
no truth in it."
"But what have you heard?"
"It is said that the estate belongs to a chit of a maid," I replied;
"that the maid's mother died at her birth, and that her father,
Godfrey Molesworth, did not long survive her. That he was broken-
hearted. That everything was left to a mere baby."
"But what became of the baby?"
"I know not. I have heard that she has never been seen on the
place, although her father has been dead wellnigh twenty years.
That the rents are paid to Colman Killigrew who lives at Endellion
Castle, and who is a godless old savage. Rumour says that he claims
to be the maid's guardian. But of this I am ignorant. He lives full fifty
miles from here, and I know nothing of him."
"That is all you have heard?"
"That is all I can remember at present."
"You have never seen the maid?"
"No. Who has? Stay; I have heard she was placed in a convent
school. Old Killigrew is a Catholic, I suppose."
"I'll tell you more, Roger Trevanion. Colman Killigrew has been
fattening on the Restormel lands for wellnigh twenty years. He hath
kept the maid, Nancy Molesworth, a prisoner. In a few months she
will be twenty-one. He intends marrying her to one of his sons. She
hates the whole tribe of Killigrews, but he cares nothing for that. He
is determined; you can guess why."
"Yes, such things are common. But what is that to me? I know
nothing of the maid, Nancy Molesworth; I do not care. Let the
Killigrews marry her; let them possess Restormel."
"My son Peter hath seen the maid, Roger."
"Ah! How?"
"He had to pay a visit in the neighbourhood of Endellion Castle, and
he saw her by chance."
"Spoke he to her?"
"No, he did not; she did not see him. She is kept a close prisoner,
but my Peter hath lost his heart."
I turned and looked at young Peter, and his face looked more
monkeyish than ever. A simpering smile played around his protruding
mouth. His eyes shone like those of a weazel.
"Well," I said, "what is this to me?"
"This, Roger Trevanion. I want that maid, Nancy Molesworth,
brought here to Treviscoe. I want to save her from those Papist
savages who would bring ruin upon the maid and upon the country."
"That's nothing to me," I replied; "I avoid women. They are all alike
—all cruel, all selfish, all false as hell. Why tell your plans to me?"
Success Stories

| Name | Company | Offer Positions |
| --- | --- | --- |
| Victor | Facebook | Facebook Senior Manager (E7) and others |
| Stanford Conor | Facebook, Amazon | Facebook ML Manager (E6/M0) and 5 other offers |
| Ted Vice | Amazon, Facebook | Amazon DS (L5) and Facebook DS (E5) |
| Mike Bloomberg | Facebook, Spotify | Facebook MLE (E5), Spotify MLE (Senior) and others |
| Jerry | Google, Apple, Facebook, Cruise | Google MLE (L5), Cruise MLE (L5), Apple MLE (Senior), FB MLE (E5) and others |
| Steven Chris | Google | Google MLE (L4) and others |
| Adam | Google | Google MLE (L5) |
| Patrick | Amazon, Wish | Amazon DS (L5), Wish DS |
| Bolton Chandra | Intuit | Intuit MLE (Senior) |
| David Nguyen | NVIDIA, NBCUniversal | Intern |
| Daniel | Series B startup | Senior MLE |
| Steven | AccuWeather | SWE, ML |
| Sanchez | Pinterest, Citadel | Senior SWE, ML |
| Ben | Amazon | Applied Scientist |
| Mary | Twitter, Apple | Senior MLE |
| Mark | Facebook, Tiktok, other | Senior MLE (E5) |
| Ariana | Intuit, Bloomberg | Data Scientist |
| Michael Wong | Docusign | Senior SWE/ML |
| Teo | LinkedIn, Google, Tiktok | Senior SWE/AI, Google (L4) |
| Nick | Facebook, Google | MLE (E5), Google (L4) |
| Quinton | Microsoft, Vmware | Staff SWE/ML |
| Lex | Facebook, Google | Google Brain (L5) and Facebook (E5) |
| Strange | Facebook | Senior SWE/ML (E5) |
| Jim | Amazon | Amazon Applied Scientist (L4) |
| Shawn | Amazon, Google, Uber | Amazon Applied Scientist, Google Senior SWE/ML (L5) and Uber L4 |
"I wanted to update you that I just signed my offer with Amazon. Thank you so much for your help!"
Ted, Amazon Senior Data Scientist

"I'm done with interviews. FB called with an initial offer, now I need to start negotiating."
Michael, Facebook Senior SWE/ML

"I got the offer from Intuit. Thank you so much, it would not be possible without your help."
Bolton, Intuit Senior SWE/ML

"Hi Khang, thank you for your help in the past few months. I have received an offer from Docusign."
Michael Wong, DocuSign Senior SWE/ML

"Thanks to your Github repo, I got a research internship at Samsung. Your notes are so helpful. Many thanks."
Dave Newton, Samsung MLE Intern

"I received an offer from Facebook (exciting!!!)"
Victor, Facebook Senior ML Manager

"I'm very happy that I could crack the Google interview. So finally I could pass Uber, Google and Amazon. Amazon was an applied scientist while the other two were ML engineers. Lots of hard work and effort to get to this state. Thank you for staying with me all this time."
Shawn, Google Senior SWE/ML

"Thanks to ML system design, I just wanted to say I thought the course was super helpful! I got offers from Google, FB, Apple and Tesla."
Jerry, Google Senior SWE/ML

"Hi Khang, I got the offer from the company. THANK YOU for your coaching!!!"
Daniel, Unicorn startup Senior SWE/ML

"Hi Khang, I got an offer from Apple and Twitter. Thanks for your support."
Mary, Apple Senior SWE/ML

"Hi Khang, I got an offer from Intuit today. Thank you so much for all your help."
Ariana, Intuit Data Scientist
"Hi Khang, I want to let you know that I got offers from FB and Google. I decided to take the FB offer in the end. Thank you so much for guiding me through the whole interviewing process. Best!"
Mark, Facebook Senior SWE/ML

"Hey Khang, received an E5 offer :). Thanks for helping me throughout this process."
Strange, Facebook Senior SWE/ML
This book is for my wife, my son, my mom and my dad.
Preface

Machine learning design is a difficult topic. It requires knowledge from multiple disciplines, such as big data processing, linear algebra, machine learning, and deep learning. This is the first book that focuses on applied machine learning and deep learning in production. It covers all the design patterns and state-of-the-art techniques from top companies, such as Google, Facebook, Amazon, etc.

Who should read this book?

Data scientists, software engineers, or data engineers who have a background in machine learning but have never worked on machine learning at scale will find this book helpful.

How to read this book?

Sections 1.1.1 to 1.1.4 help you review machine learning fundamentals. If you have a lot of experience in machine learning, you can skip these sections. Section 1.1.5 is very important at big tech companies, especially Facebook. Chapter 2 helps you review important topics in recommendation systems. Chapters 3 to 8 explain the end-to-end design of the most popular machine learning systems at big tech companies. Chapter 9 helps you test your understanding.

I'd like to acknowledge my friends Steven Tartakovsky, Tommy Nguyen and Jocelyn Huang for helping me proofread this book.
Keep up to date with Machine Learning Engineer: mlengineer.io. All the quiz solutions can be found at: https://rebrand.ly/bookerrata.

Khang Pham
California
April 2022
Machine Learning Primer

In this chapter, we will review machine learning techniques commonly used in industry. We focus on the application of machine learning techniques; readers should already know most of these concepts in theory.
Feature Selection and Feature Engineering

One Hot Encoding

One hot encoding is very popular when you have to deal with categorical features of medium cardinality. In this example, when we have one column with four unique values, we create three extra columns after one hot encoding. If one column has thousands of unique values, one hot encoding will create thousands of new columns.

One Hot Encoding. Source: mlengineer.io

Common Problems

Tree-based models, such as decision trees, random forests, and boosted trees, don't perform well with one hot encodings, especially when the categorical attribute has many values (levels). This is because they pick the feature to split on based on how well splitting the data on that feature will "purify" it. If we have many levels, only a small fraction of the data will usually belong to any given level, so the one hot encoded columns will be mostly zeros. Since splitting on such a column will only produce a small gain, tree-based algorithms typically ignore it in favor of other columns. This problem persists regardless of the volume of data you actually have. Linear models and deep learning models do not have this problem.

Expensive computation and high memory consumption: many unique values create high-dimensional feature vectors. For example, if a column has a million unique values, it produces feature vectors each with a dimensionality of one million.

Best Practices

When levels (categories) are not important, we can group them together into an "Other" class.

Make sure that the pipeline can handle unseen data in the test set. In Python, you can use pandas.get_dummies or sklearn's OneHotEncoder. However, pandas.get_dummies does not "remember" the encoding during training, and if the testing data has new values, it can lead to an inconsistent mapping. OneHotEncoder in scikit-learn has the advantage that you can use fit/transform/fit_transform, so you can persist it and use it together with a Pipeline.
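A minimal sketch of the OneHotEncoder workflow from the best practices above, assuming scikit-learn 1.2+ and pandas; the column name and category values are made up for illustration:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data with a medium-cardinality categorical column.
train = pd.DataFrame({"device": ["ios", "android", "web", "ios"]})
test = pd.DataFrame({"device": ["android", "roku"]})  # "roku" is unseen

# handle_unknown="ignore" encodes unseen test values as all zeros
# instead of raising an error, so the pipeline survives new categories.
encoder = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
encoder.fit(train[["device"]])

print(encoder.transform(test[["device"]]))
# [[1. 0. 0.]
#  [0. 0. 0.]]   <- the unseen value "roku" maps to all zeros
```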
One Hot Encoding in Tech Companies

One hot encoding is used a lot in tech companies. For example, at Uber, one hot encoding is applied to features before training some of their production XGBoost models. However, when there are a large number of categorical values, such as in the tens of thousands, it becomes impractical to use one hot encoding. Another technique, used at Instacart on their models, is called mean encoding.

Mean Encoding

Take the Adult income data set as an example. We have data about the income of 50,000 people with different demographics: age, gender, education background, etc. Let's assume we want to handle the Age data as categorical. There can be 80-90 unique values for this column. If we apply one hot encoding, it will create a lot of new columns for this small data set.

Adult income dataset

| Age | Income |
| --- | --- |
| 18 | 60,000 |
| 18 | 50,000 |
| 18 | 40,000 |
| 19 | 66,000 |
| 19 | 51,000 |
| 19 | 42,000 |

We can turn the Age feature into a continuous variable by taking the average income for each Age value. For example, we can create a new column Age_mean_enc that represents the mean income for a specific Age. The benefit is that we can use this new column as a continuous variable.

Mean Encoding for Income Data

| Age | Income | Age_mean_enc |
| --- | --- | --- |
| 18 | 60,000 | 50,000 |
| 18 | 50,000 | 50,000 |
| 18 | 40,000 | 50,000 |
| 19 | 66,000 | 53,000 |
| 19 | 51,000 | 53,000 |
| 19 | 42,000 | 53,000 |

If we compute this encoding on the same data we use for training, it will lead to label leakage, so it is important to use separate data for computing the mean encoding. To make mean encoding even more robust, we can also apply Additive Smoothing or Cross Validation methods.
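A minimal sketch of mean encoding with pandas, using the toy Age/Income table above; as noted, in practice the means should be computed on a separate fold to avoid label leakage:

```python
import pandas as pd

df = pd.DataFrame({
    "Age":    [18, 18, 18, 19, 19, 19],
    "Income": [60_000, 50_000, 40_000, 66_000, 51_000, 42_000],
})

# Mean of the target (Income) per category value (Age).
means = df.groupby("Age")["Income"].mean()

# Replace the categorical value with its target mean.
df["Age_mean_enc"] = df["Age"].map(means)
print(df)
#    Age  Income  Age_mean_enc
# 0   18   60000       50000.0
# 3   19   66000       53000.0  (and so on)
```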
Feature Hashing

Feature hashing, or the hashing trick, converts text data or categorical attributes with high cardinalities into a feature vector of arbitrary dimensionality. In some AdTech companies (Twitter, Pinterest, etc.), it's not uncommon for a model to have thousands of raw features.

Feature Hashing Benefits

Feature hashing is very useful for features with very high cardinality: hundreds, and sometimes thousands, of unique values. The hashing trick reduces the increase in dimension and memory footprint by allowing multiple values to be present/encoded as the same value.

Feature Hashing Example

First, you decide on the desired dimensionality of your feature vectors. Then, using a hash function, you convert all values of your categorical attribute (or all tokens in your collection of documents) into a number, and then convert this number into an index of your feature vector. The process is illustrated in figure 1.1.

Let's illustrate how it would work for converting the text "the quick brown fox" into a feature vector. Let us have a hash function h that takes a string as input and outputs a non-negative integer, and let the desired dimensionality be 5. By applying the hash function to each word and applying modulo 5 to obtain the index of the word, we get:

h(the) mod 5 = 0
h(quick) mod 5 = 4
h(brown) mod 5 = 4
h(fox) mod 5 = 3

Then we build the feature vector as [1, 0, 0, 1, 2]. Indeed, h(the) mod 5 = 0 means that we have one word in dimension 0 of the feature vector; h(quick) mod 5 = 4 and h(brown) mod 5 = 4 mean that we have two words in dimension 4 of the feature vector, and so on. As you can see, there is a collision between the words "quick" and "brown": they are both represented by dimension 4. The lower the desired dimensionality, the higher the chances of collision. This is the trade-off between speed and quality of learning. Commonly used hash functions are MurmurHash3, Jenkins, CityHash, and MD5.

Feature Hashing in Tech Companies

Feature hashing is widely popular in a lot of tech companies such as Booking, Facebook (Semantic Hashing using Tags and Topic Modeling, 2013), Yahoo, Yandex, Avazu, and Criteo.
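A minimal sketch of the hashing trick on the example above, assuming Python's built-in hashlib for a stable hash; the bucket count of 5 matches the illustration, and any real hash function will produce its own collision pattern:

```python
import hashlib

def h(token: str) -> int:
    """Stable non-negative integer hash of a string (MD5-based)."""
    return int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)

def hash_features(text: str, dim: int = 5) -> list[int]:
    """Count how many tokens land in each of `dim` hash buckets."""
    vec = [0] * dim
    for token in text.split():
        vec[h(token) % dim] += 1
    return vec

print(hash_features("the quick brown fox"))
# A 5-dimensional count vector; colliding words share a bucket.
```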
One problem with hashing is collisions. If the hash size is too small, more collisions will happen and negatively affect model performance. On the other hand, the larger the hash size, the more memory it consumes. With many collisions, the model won't be able to learn separate coefficients for distinct feature values. For example, the coefficients for "User login" and "User logout" might end up being the same, which makes no sense.

Feature Hashing: Hash Size vs Logloss. Source: booking.com

Depending on the application, you can choose the number of bits for feature hashing that provides the right balance between model accuracy and computing cost.

Cross Feature

A cross feature, or conjunction, between two categorical variables of cardinality c1 and c2 is just another categorical variable of cardinality c1 × c2. If c1 and c2 are large, the conjunction feature has high cardinality, and the use of the hashing trick is even more crucial in this case. Cross features are usually combined with the hashing trick to reduce the high dimensionality.

As an example, suppose we have Uber pick-up data with latitude and longitude stored in a database, and we want to predict demand at a certain location. If we only use the feature latitude for learning, the model might learn that city blocks at particular latitudes are more likely to have higher demand than others; likewise for the feature longitude. However, if we cross longitude with latitude, the cross feature represents a well-defined city block and allows the model to learn more accurately. A small cross-feature sketch follows below.

What would happen if we don't create a cross feature? In this example, we have two classes: orange and blue. Each point has two features: x1 and x2. Can we draw a line to separate them? Can we use a linear model to learn to separate these classes? To solve this problem, we can introduce a new feature: x3 = x1 * x2. Now we can learn a linear model with three features: x1, x2 and x1*x2.

Cross feature. Source: developers.google.com

Cross features are also very common in recommendation systems. In practice, we can also use the wide and deep architecture to combine many dense features and sparse features. You can see one concrete example in the Wide and Deep section.
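A minimal sketch of crossing two bucketized coordinates into one hashed feature, mirroring the latitude × longitude example above; the bucket size and hash dimension are made-up values for illustration:

```python
import hashlib

def bucket(value: float, bucket_size: float) -> int:
    """Discretize a continuous coordinate into an integer bucket."""
    return int(value // bucket_size)

def cross_feature(lat: float, lng: float, hash_dim: int = 10_000) -> int:
    """Cross latitude and longitude buckets, then hash the conjunction
    into a fixed-size index (hashing trick) to bound the cardinality."""
    key = f"{bucket(lat, 0.01)}_x_{bucket(lng, 0.01)}"
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % hash_dim

# Two nearby pick-up points fall into the same "city block" feature.
print(cross_feature(37.7749, -122.4194), cross_feature(37.7751, -122.4190))
```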
Embedding

Both one hot encoding and feature hashing can represent features in multiple dimensions. However, these representations do not usually preserve the semantic meaning of each feature. For example, one hot encoding can't guarantee that the words 'cat' and 'animal' are close to each other in the feature space, or that user 'Kanye West' is close to 'rap music' in YouTube data. The proximity here can be interpreted from the semantic perspective or the engagement perspective. This is an important distinction and has implications for how we train embeddings.

How to Train Embedding

In practice, there are two ways to train embeddings: pre-trained (i.e., word2vec style) or co-trained (i.e., YouTube video embedding). In the word2vec representation, we want a vector representation for each word such that if vector(word1) is close to vector(word2), then the words are somewhat semantically similar. We can achieve this by using the surrounding words to predict the middle word in the sentence, or by using one word to predict the surrounding words. We can see an example in the section below.

As an example, each word can be represented as a d-dimensional vector, and we can train a supervised model. We then use the outputs of one of the fully connected layers near the output layer of the neural network as embeddings of the input object. In this example, the embedding for 'cat' is represented as the vector [1.2, -0.1, 4.3, 3.2].

There are two ways to formulate the problem: Continuous Bag of Words (CBOW) and Skip-gram. For CBOW, we want to predict one word based on the surrounding words. For example, if we are given: word1 word2 word3 word4 word5, we want to use (word1, word2, word4, word5) to predict word3.

CBOW. Source: Exploiting Similarities Among Languages for Machine Translation

In the skip-gram model, we use 'word3' to predict all surrounding words 'word1, word2, word4, word5'.

Skip-gram. Source: Exploiting Similarities Among Languages for Machine Translation
Word2Vec example

Word2vec CBOW example (for the sentence "the cat sat on the orange tree"):

| Input | Label |
| --- | --- |
| the, cat, on, the | sat |
| cat, sat, the, orange | on |
| sat, on, orange, tree | the |

Instagram uses this type of embedding to provide personalized recommendations for their users, while Pinterest uses it as part of their Ads Ranking model. In practice, for apps like Pinterest and Instagram where the user's intention is strong, we can use word2vec style embedding training.

How Does Instagram Train User Embedding?

Within one session, Instagram user A sees user B's photos, then user C's, and so on. If we assume user A is currently interested in certain topics, we can also assume user B's and user C's photos might be relevant to those topics of interest. For each user session, we have a collection of actions like the diagram below:

User A → sees user B's photos → sees user C's photos
Sequence Embedding

We can formulate each session as a sentence and each user's photos as words. This is suitable for users who are exploring photos or accounts in a similar context during a specific session.

How Does DoorDash Train Store Embedding?

DoorDash uses this approach for store embedding. For each session, we assume users may have a certain type of food in mind as they view store A, store B, etc. We can assume these stores are somewhat similar with respect to the user's interests.

Store 1 → Store 2 → Store 3

Store Embedding. Source: DoorDash
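A minimal sketch of this session-as-sentence formulation, assuming the gensim library (4.x API); the session contents are made up, and sg=0 selects the CBOW objective mentioned in the DoorDash description:

```python
from gensim.models import Word2Vec

# Each session is a "sentence"; each viewed store ID is a "word".
sessions = [
    ["store_12", "store_7", "store_33"],
    ["store_7", "store_33", "store_90"],
    ["store_5", "store_12", "store_7"],
]

# sg=0 -> CBOW: predict a store from its surrounding stores in the session.
model = Word2Vec(sentences=sessions, vector_size=16, window=2,
                 min_count=1, sg=0, epochs=50)

# Stores that co-occur in sessions end up close in embedding space.
print(model.wv.most_similar("store_7", topn=2))
```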
We can train a model to classify whether a given pair of stores shows up in the same user session. Next, we will see another way to train embeddings: co-training the embedding to optimize for some engagement metric.

How Does YouTube Train Embedding in Retrieval?

A recommendation system usually consists of three stages: Retrieval, Ranking, and Re-ranking (read the Recommendation System chapter). In this example, we will cover how YouTube builds the Retrieval (Candidate Generation) component using the two-tower architecture. Figure 1.2 provides an illustration of the two-tower model architecture (read the Common Deep Learning section 1.5), where the left and right towers encode (user, context) and item respectively. Intuitively, we can treat this problem as a multi-class classification problem. We have two towers: the left tower takes (user, context) as input and the right tower takes movies as input. The two-tower deep neural network generalizes the multi-class classification neural network, a multi-layer perceptron (MLP) model, where the right tower of Figure 1.2 is simplified to a single layer with item embeddings.

Given input x (user, context), we want to pick candidate y (videos) from all available videos. A common choice is the softmax function:

P(y | x; \theta) = \frac{e^{s(x, y)}}{\sum_{i=1}^{m} e^{s(x, y_i)}}

Loss function: use the log-likelihood

L = -\frac{1}{T} \sum_{i=1}^{T} \log P(y_i | x_i; \theta)

As a result, the two-tower model architecture is capable of modeling situations where the label has structure or content features.

The StringLookup API maps string features to integer indices. The Embedding layer API turns positive integers (indices) into dense vectors of fixed size.

Two-tower architecture. Source: Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations
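To make the architecture concrete, here is a minimal two-tower sketch in Keras under the assumptions above (toy vocabularies, dot-product scoring); it is an illustration of the idea, not YouTube's actual implementation:

```python
import tensorflow as tf

# Toy vocabularies; in a real system these come from the data pipeline.
user_ids = ["u1", "u2", "u3"]
video_ids = ["v1", "v2", "v3", "v4"]
emb_dim = 32

# Left (query) tower: user id -> index -> embedding -> MLP -> user vector.
user_in = tf.keras.Input(shape=(), dtype=tf.string, name="user_id")
u = tf.keras.layers.StringLookup(vocabulary=user_ids)(user_in)
u = tf.keras.layers.Embedding(input_dim=len(user_ids) + 2, output_dim=64)(u)
u = tf.keras.layers.Dense(64, activation="relu")(u)
user_vec = tf.keras.layers.Dense(emb_dim)(u)

# Right (item) tower: video id -> index -> embedding -> MLP -> video vector.
video_in = tf.keras.Input(shape=(), dtype=tf.string, name="video_id")
v = tf.keras.layers.StringLookup(vocabulary=video_ids)(video_in)
v = tf.keras.layers.Embedding(input_dim=len(video_ids) + 2, output_dim=64)(v)
v = tf.keras.layers.Dense(64, activation="relu")(v)
video_vec = tf.keras.layers.Dense(emb_dim)(v)

# Score s(x, y): dot product between the towers' outputs; the softmax over
# sampled candidates is applied in the loss during training.
score = tf.keras.layers.Dot(axes=1)([user_vec, video_vec])
model = tf.keras.Model(inputs=[user_in, video_in], outputs=score)
model.summary()
```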
The following are key questions we need to consider:

- For multi-class classification where the video repository is huge, how feasible is this approach? Solution: for each mini-batch, we sample data from our video corpus as negative samples. One example is to use a power-law distribution for sampling.
- When sampling, it's possible that popular videos are overly penalized as negative samples in a batch. Does it introduce bias into our training data? One solution is to "correct" the logit output: s^c(x_i, y_j) = s(x_i, y_j) - \log(p_j). Here p_j is the probability of selecting video j (see the sketch below).
- What if an average user only watches 2% of the videos completely and just watches a few seconds of the other 98%? Is it a good idea to consider all engaged videos equally important? We can introduce a continuous reward r to reflect the degree of engagement, for example, watch time.
- Why do we need to use the dot product? Can we use other operators?
- How many dimensions for the embeddings? Does the movie embedding dimension need to be the same as the user embedding dimension?
- Why do we use ReLU? Can we use other activation functions?

Facebook open-sourced their Deep Learning Recommendation Model with a similar architecture.
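A minimal sketch of the logit correction above, assuming TensorFlow and a hypothetical tensor of per-batch sampling probabilities:

```python
import tensorflow as tf

# logits[i, j]: score s(x_i, y_j) for user i against sampled video j.
logits = tf.random.normal([4, 8])

# sampling_prob[j]: hypothetical probability p_j that video j was sampled
# into the batch (popular videos have larger p_j).
sampling_prob = tf.constant([0.30, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05, 0.05])

# Correct the logits so popular videos are not overly penalized:
# s^c(x_i, y_j) = s(x_i, y_j) - log(p_j)
corrected = logits - tf.math.log(sampling_prob)
```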
  • 31. How Does LinkedIn Train Embedding? Pyramid two-tower network architecture. Source: LinkedIn engineering blog LinkedIn uses a reverse-pyramid architecture, in which the hidden layers grow in number of activations as we go deeper. LinkedIn combines the member embedding and the job embedding with a Hadamard product; the final prediction is a logistic regression on the Hadamard product between each seeker and job-posting pair. Example of a Hadamard product: $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \odot \begin{bmatrix} 5 & 3 \\ 2 & 6 \end{bmatrix} = \begin{bmatrix} 5 & 6 \\ 6 & 24 \end{bmatrix}$ From the LinkedIn engineering blog: "We chose the Hadamard product over more common functions, like cosine similarity, to give the model flexibility to learn its own distance function, while avoiding a fully connected layer to reduce scoring latency in our online recommendation systems."
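A small numerical sketch of this scoring function; the learned weights and bias below are made up.

# Hadamard (elementwise) product scoring: a logistic regression over the
# elementwise product of the two embeddings, which lets the model learn
# its own distance function. Vectors reproduce the example above.
import numpy as np

member_emb = np.array([1.0, 2.0, 3.0, 4.0])
job_emb = np.array([5.0, 3.0, 2.0, 6.0])

hadamard = member_emb * job_emb              # [5, 6, 6, 24]
w = np.array([0.1, 0.2, 0.1, 0.05])          # learned weights (illustrative)
b = -1.0                                     # learned bias (illustrative)
score = 1.0 / (1.0 + np.exp(-(hadamard @ w + b)))  # sigmoid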
  • 32. How Does Pinterest Learn Visual Embedding? Take Pinterest Visual Search as an example. When a user searches with a specific image, Pinterest uses the input pin's visual embedding to search for similar pins. How do we generate the visual embedding? Pinterest takes an image-recognition deep learning architecture, e.g., VGG16, ResNet152, or GoogLeNet, and fine-tunes it on the Pinterest dataset. The learned features are then used as the embedding for pins. You can see an example in Chapter 10 with the Airbnb room classification use case. We can also use collaborative filtering; read the Collaborative Filtering [collaborative-filtering] section. Quiz About Two-Tower Embedding Recall that in two-tower user/movie embedding, we take the last layer of each tower as the embedding. Suppose that when building the network, I set the user embedding dimension to 32 and the movie embedding dimension to 64. Will this architecture work? Answer: [A] Yes; as long as the model learns, we can set any dimensions we want. [B] No; the movie embedding has too many dimensions, and we will run out of memory when serving millions of movies. [C] No; there is a shape mismatch between the user embedding and the movie embedding. Application of Embedding in Tech Companies Twitter uses embedding for user IDs, and it's widely used in different
  • 33. use cases at Twitter, such as recommendation, nearest-neighbor search, and transfer learning. Pinterest Ads ranking uses a word2vec style where each user session can be viewed as: pin A → pin B → pin C, then co-trained with multitask modeling. Instagram's personalized recommendation model uses a word2vec style where each user session can be viewed as: account 1 → account 2 → account 3, to predict accounts with which a person is likely to interact within a given session. YouTube recommendations use two-tower model embedding co-trained with a multihead model architecture (read about multitask learning in section Common Deep Learning 1.5). DoorDash's personalized store feed uses a word2vec style where each user session can be viewed as: restaurant 1 → restaurant 2 → restaurant 3. This Store2Vec model is trained to predict whether two restaurants were visited in the same session using the CBOW algorithm. In the TensorFlow documentation, a "rule of thumb" for choosing the embedding dimension is $d = \sqrt[4]{D}$, where $D$ is the number of categories; another way is to treat $d$ as a hyperparameter tuned on a downstream task. In large-scale production, embedding features are usually precomputed and stored in key/value storage to reduce inference latency. How Do We Evaluate the Quality of the Embedding? There is no easy answer to this question. We have two approaches: Apply the embedding to downstream tasks and measure their model
  • 34. performance. For certain applications, like natural language processing (NLP), we can also visualize embeddings in 2-3 dimensions using t-SNE (t-distributed stochastic neighbor embedding) or UMAP (Uniform Manifold Approximation and Projection) and check whether the clusters match our intuition. In practice, most embeddings built by optimizing engagement do not show any clear structure. Apply clustering (e.g., k-means) or k-nearest-neighbor lookups on the embedding data and see if it forms meaningful clusters. How Do We Measure Similarity? To determine the degree of similarity, most recommendation systems rely on one or more of the following. Cosine: the cosine of the angle between the two vectors, $s(q, x) = \cos(q, x)$. Dot product: $s(q, x) = \sum_{i=1}^{d} q_i x_i$. You will also see how LinkedIn uses the Hadamard product in their embedding model (read section Embedding [subsec-embedding]). Euclidean distance: $s(q, x) = \left[\sum_{i=1}^{d} (q_i - x_i)^2\right]^{\frac{1}{2}}$. The smaller the value, the higher the similarity.
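The three measures side by side on illustrative vectors:

# Dot product, cosine, and Euclidean distance between two vectors.
import numpy as np

q = np.array([1.0, 2.0, 3.0])
x = np.array([2.0, 0.0, 1.0])

dot = q @ x                                              # favors large norms
cosine = dot / (np.linalg.norm(q) * np.linalg.norm(x))   # angle-based, norm-invariant
euclidean = np.linalg.norm(q - x)                        # smaller = more similar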
  • 35. Important Considerations The dot product tends to favor embeddings with a high norm; it is more sensitive to the embedding norm than the other measures. This can have consequences: Popular content tends to have higher norms and can end up dominating the recommendations. How do you fix this? Can you think of a parameterized dot-product metric? If we use bad initialization in our network and rare content is initialized with large values, we might end up recommending rare content over popular content more frequently. Numeric Features Normalization For numeric features, normalization rescales values to a standard range such as $[-1, 1]$ or $[0, 1]$. The min-max formula for $[0, 1]$ is: $v' = \frac{v - \min(v)}{\max(v) - \min(v)}$ where $v$ is the feature value and $\min(v)$, $\max(v)$ are the minimum and maximum feature values. Standardization If the feature distribution resembles a normal distribution, we can apply a standardizing transformation: $v' = \frac{v - \text{mean}(v)}{\text{std}(v)}$ where $\text{mean}(v)$ is the mean and $\text{std}(v)$ the standard deviation of the feature value. If the feature distribution resembles a power law, we can transform it using $\log\left(\frac{1 + v}{1 + \text{median}(v)}\right)$. In practice, normalization can cause issues because the min and max values are often outliers. One possible solution is "clipping", where we pick "reasonable" values for min and max. Netflix uses a raw, continuous timestamp indicating when the user played a video in the past, along with the current time when making a prediction; they observed a 30% increase in offline metrics. This leads to another challenge, since the production model will always use the current timestamp, which was never observed in the training data. To handle this situation, production models are regularly retrained.
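A minimal sketch of the transformations above, including clipping; the data and clipping bounds are illustrative.

# Min-max normalization, standardization, log transform for power-law
# features, and clipping to tame outliers.
import numpy as np

v = np.array([1.0, 5.0, 10.0, 500.0])  # note the outlier

minmax = (v - v.min()) / (v.max() - v.min())         # scale to [0, 1]
standard = (v - v.mean()) / v.std()                  # mean 0, unit variance
log_scaled = np.log((1 + v) / (1 + np.median(v)))    # power-law transform

# Clipping: pick "reasonable" bounds instead of the raw min/max.
clipped = np.clip(v, a_min=0.0, a_max=100.0)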
  • 36. Summary We learned how to handle numeric features with normalization and standardization. In practice, we often apply a log transformation when feature values are large with very high variance. For sparse features, there are multiple options: one-hot encoding, feature hashing, and entity embedding. With entity embedding, there are two popular ways to train the embedding: pre-trained and co-trained. The common technique is to use engagement data and train a two-tower network model. One interesting challenge is how to select label data when training entity embeddings; we will explore some solutions in later chapters. Feature Selection and Feature Engineering Quiz We have a table with columns UserID, CountryID, CityID, Zipcode, Age. Which of the following feature engineering choices is suitable to present this data to a machine learning algorithm? Answer: [A] Apply one-hot encoding to all columns. [B] Apply embedding to CountryID and CityID; one-hot encoding to UserID and Zipcode; and normalization to Age. [C] Apply embedding to CountryID, CityID, UserID, and Zipcode, and normalization to Age.
  • 37. Training Pipeline The training pipeline needs to handle large volumes of data at low cost. One common solution is to store data in a column-oriented format like Parquet, Avro, or ORC. These formats enable high throughput for ML and analytics use cases because they are column-based. In other use cases, the tfrecord format (TensorFlow's format for storing a sequence of binary records) is widely used in the TensorFlow ecosystem. Data Partitioning Parquet and ORC files usually get partitioned by time for efficiency, so that we can avoid scanning the whole dataset. Partitioning is also beneficial for parallel and distributed training. In this example, we partition data by year, then by month: within each partition (year = 2020, month = 02), all data is stored in Parquet format. In practice, the most common services on AWS, Redshift (Amazon's fully managed, petabyte-scale cloud data warehouse) and Athena (an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL), support Parquet and ORC. Compared to formats like CSV, Parquet can speed up queries by a factor of 30, save 99% of cost, and reduce data scanned by 99%. Partition Data. Source: mlengineer.io
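A minimal sketch of writing such a partitioned dataset, assuming pandas with the pyarrow engine is available; the path and columns are illustrative.

# Write a dataset partitioned by year and month.
import pandas as pd

df = pd.DataFrame({
    "year": [2020, 2020, 2021],
    "month": [1, 2, 2],
    "feature": [0.3, 0.7, 0.5],
})

# Produces year=2020/month=1/..., year=2020/month=2/..., etc.
df.to_parquet("training_data", partition_cols=["year", "month"])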
  • 38. Handle Imbalanced Class Distribution In machine learning use cases like fraud detection, click prediction, or spam detection, imbalanced labels are common. For example, in ad click prediction, a 0.2% conversion rate is typical: out of 1,000 clicks, only two lead to the desired action, such as installing the app or buying the product. Why is this a problem? With too few positive examples compared to negative examples, the model spends most of its time learning about negative examples. There are a few strategies to handle this. Use class weights in the loss function: in a spam detection problem where non-spam accounts for 95% of the data and spam for only 5%, we want to penalize mistakes on the minority class more heavily. We can weight the cross-entropy loss accordingly: # w0 is the weight for class 0, w1 is the weight for class 1 loss = -w1 * y * log(p) - w0 * (1 - y) * log(1 - p) Use naive resampling: downsample the majority class at a certain rate to reduce the imbalance in the training set. It's important to keep the validation and test data intact (no resampling). Use synthetic resampling: the synthetic minority oversampling technique (SMOTE) synthesizes elements for the minority class based on those that already exist. It works by randomly picking a point from the minority class and computing the k-nearest neighbors of this point; synthetic points are added between the chosen point and its neighbors. For practical reasons, SMOTE is not as widely used as the other methods, especially in large-scale applications.
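A sketch of the resampling strategy, assuming the imbalanced-learn package; the dataset is synthetic and illustrative.

# SMOTE: synthesize minority-class points between real neighbors.
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Illustrative imbalanced dataset: roughly 5% positives.
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)
print(Counter(y))  # e.g., {0: ~950, 1: ~50}

X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print(Counter(y_res))  # balanced classes after oversampling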
  • 39. Resample Data. Source: imbalanced-learn.org Common Resampling Use Cases Due to huge data sizes, it's common for big companies like Facebook and Google to downsample the dominant class. In the training pipeline, if your feature store has a SQL interface, you can use the built-in rand() function to downsample your dataset: -- sampling 10% of the data (source: nqbao.medium.com) SELECT d.* FROM dataset d WHERE RAND() < 0.1 For deep learning models, we can downsample the majority class and then upweight the kept examples. This helps the model train faster and keeps it well calibrated with respect to the true distribution: example_weight = original_weight × downsampling_factor
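A minimal pandas sketch of downsampling the negative class and upweighting the kept rows; the 10× factor is illustrative.

# Downsample the dominant (negative) class, then upweight the kept
# examples so the model stays calibrated to the true distribution.
import numpy as np
import pandas as pd

df = pd.DataFrame({"label": np.r_[np.ones(20), np.zeros(980)]})

downsampling_factor = 10
neg = df[df.label == 0].sample(frac=1 / downsampling_factor, random_state=0)
pos = df[df.label == 1]
train = pd.concat([pos, neg])

# example_weight = original_weight * downsampling_factor
train["weight"] = np.where(train.label == 0, 1.0 * downsampling_factor, 1.0)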
  • 40. Quiz About Weight for the Positive Class If a dataset contains 100 positive and 300 negative examples, what should be the weight for the positive class? Answer: [A] The positive weight is 300/100 = 3. [B] The positive weight is 100/300 ≈ 0.33. [C] It depends; can't tell. Data Generation Strategy When we first start a new problem that requires machine learning, especially when supervised learning is suitable, we have to answer the question: "How do we get label data?" LinkedIn feed ranking: we can order feeds chronologically first to collect label data. Facebook place recommendation: we can use places people liked as positive labels. For negative labels, we can either sample all other places as negative samples or use the places that users saw but didn't like. How LinkedIn Generates Data for Course Recommendation Design a machine learning solution for course recommendations on LinkedIn Learning. Problem At the beginning, the main goal of course recommendations is to acquire new learners by showing highly relevant courses. There are a few challenges:
  • 41. Lack of label data: if we had user activities (browse, click) available, we could use these signals as implicit labels to train a supervised model. As we're building this LinkedIn Learning system, we don't have any engagement signals yet. This is called the cold start problem. One way to deal with it is to rely on a user survey during onboarding, i.e., ask learners which skills they want to learn or improve; in practice, this is usually insufficient. Let's look at an example: learner Khang Pham has the skills BigData, Database, and Data Analysis in his LinkedIn profile. Given two courses, Data Engineering and Accounting 101, which should we recommend? Clearly Data Engineering is the better recommendation because it's more relevant to this user's skill set. This leads us to an idea: we can use skills to measure relevance. If we can map learners to skills and courses to skills, we can measure and rank relevance accordingly. Skill-based Model. Source: LinkedIn Course to Skill: Cold Start Model There are various techniques to build the mapping from scratch. Manual tagging using taxonomy (A): all LinkedIn Learning courses are tagged with categories, and taxonomists map categories to skills. This approach yields a high-precision, human-generated course-to-skill mapping. On the other hand, it doesn't
  • 42. scale, i.e., it has low coverage. Leverage LinkedIn skill taggers (B): use the LinkedIn skill tagger features to extract skill tags from course data. Use a supervised model: train a classification model such that for a given pair (course, skill) it returns 1 if the pair is relevant and 0 otherwise. Label data: collect samples from A and B as positive training data, then randomly sample from our data to create negative labels; we want the training dataset to be balanced. Features: course data (title, description, categories, section names, video names), plus skill-to-skill similarity features. Disadvantages: (a) it relies heavily on the quality of the skill taggers, and (b) a single logistic regression model might not capture per-skill-level effects. Use semi-supervised learning: learn a different model for each skill, as opposed to one common model for all (course, skill) pairs. Data augmentation: leverage the skill-correlation graph to add more positive label data. For example, if SQL is highly relevant to the Data Analysis skill, we can add Data Analysis to the training data as a positive label. Evaluation: offline metrics. Skill coverage: measure how many LinkedIn standardized skills are present in the mapping. Precision and recall: we treat the human course-to-skill mapping as ground truth and evaluate our classification models using precision and recall.
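A loose sketch of such a (course, skill) classifier, assuming TF-IDF text features plus one skill-similarity feature; the data, features, and model are illustrative, not LinkedIn's actual pipeline.

# Classify whether a (course, skill) pair is relevant.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from scipy.sparse import hstack, csr_matrix

courses = ["intro to sql and data engineering", "accounting 101 basics"]
skill_similarity = [[0.9], [0.1]]   # similarity of the paired skill (illustrative)
labels = [1, 0]                     # 1 = (course, skill) pair is relevant

text_features = TfidfVectorizer().fit_transform(courses)
X = hstack([text_features, csr_matrix(skill_similarity)])
clf = LogisticRegression().fit(X, labels)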
  • 43. Member to Skill Member to skill via profile: LinkedIn users can add skills to their profile by entering free-form text or choosing existing standardized skills. This mapping is usually noisy and needs to be standardized, and in practice coverage is not high, since not many users provide this data. We also train a supervised model $p(\text{user\_free\_form\_skill}, \text{standardized\_skill})$ to provide a score for the mapping. Member to skill using title and industry: to increase coverage we can use cohort-level mappings. For example, suppose user Khang Pham works in the ad tech industry with the title Machine Learning Engineer but didn't provide any skills in his profile. We can rely on the cohort of machine learning engineers in ad tech to infer this user's skills. We then combine the profile-based mapping with the cohort-based mapping using a weighted combination. Member to skill:

Skill      Profile-based  Cohort-based  Weights   Final mapping
SQL        0.01           0.5           w1, w2    0.01*w1 + 0.5*w2
Database   0.3            0.2           w1, w2    0.3*w1 + 0.2*w2
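A tiny sketch of the weighted combination in the table above; the weights w1 and w2 are made up (in practice they would be tuned or learned).

# Combine profile-based and cohort-based skill scores.
w1, w2 = 0.7, 0.3  # illustrative weights

profile_based = {"SQL": 0.01, "Database": 0.3}
cohort_based = {"SQL": 0.5, "Database": 0.2}

final_mapping = {
    skill: w1 * profile_based[skill] + w2 * cohort_based[skill]
    for skill in profile_based
}
print(final_mapping)  # approximately {'SQL': 0.157, 'Database': 0.27}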
  • 44. Further reading: Learning to be Relevant. How to Split Train/Test Data This consideration is often overlooked but is very important in a production environment. In forecasting, or any time-dependent use case, it's important to respect chronological order when splitting train and test data: it doesn't make sense to use data from the future to "forecast" the past. In a sales forecast use case, we want to forecast sales for each store. If we randomly split the data by store ID, the training data might contain no data for some stores, and the model won't be able to forecast for those stores. In practice, we need to split the data so that each store ID appears in both the train and test data. Uber Forecast Model Evaluation. Source: Uber Sliding Window First, we select data from day 0 to day 60 as the train set and day 61 to day 90 as the test set. Then we select data from day 10 to day 70 as the train set and day 71 to day 100 as the test set. Expanding Window First, we select data from day 0 to day 60 as the train set and day 61 to day 90 as the test set. Then we select data from day 0 to day 70 as the train set and day 71 to day 100 as the test set.
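A minimal sketch of both split schemes over a synthetic daily dataset; the day ranges follow the examples above.

# Sliding vs. expanding window splits over a 'day' column.
import pandas as pd

df = pd.DataFrame({"day": range(101), "sales": range(101)})

def window_split(df, train_start, train_end, test_end):
    train = df[(df.day >= train_start) & (df.day <= train_end)]
    test = df[(df.day > train_end) & (df.day <= test_end)]
    return train, test

# Sliding: the train window moves forward with a fixed length.
fold1 = window_split(df, 0, 60, 90)
fold2 = window_split(df, 10, 70, 100)

# Expanding: the train window always starts at day 0 and grows.
fold3 = window_split(df, 0, 60, 90)
fold4 = window_split(df, 0, 70, 100)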
  • 45. Retraining Requirements Retraining is a requirement in many tech companies. In practice, the data distribution is a nonstationary process, so the model does not perform well without retraining. In ad tech and recommendation/personalization use cases, it's important to retrain models to capture changes in user behavior and trending topics, so machine learning engineers need to make the training pipeline run fast and scale well with big data. When you design such a system, you need to balance model complexity against training time. The common design pattern is to have a scheduler retrain the model on a regular basis, usually many times per day. Four Levels of Retraining Level 0: train and forget. Train the model once and never retrain it; appropriate for "stationary" problems. Level 1: cold-start retraining. Periodically retrain the whole model on a batch dataset. Level 2: nearline retraining. Similar to warm-start retraining (level 3), but the per-key components are retrained individually and asynchronously, nearline on streaming data. Level 3: warm-start retraining. If the model has personalized per-key components, retrain only these in bulk on data specific to each key (e.g., all impressions of an advertiser's ads) once enough data has accumulated.
  • 46. Four Levels of Model Retraining - High Level. Source: LinkedIn
  • 47. Loss Function and Metrics Evaluation In this section, we focus on regression and classification use cases. Choosing loss functions and deciding which metrics to track is one of the most important parts of building machine learning products and services. Regression Loss Mean Square Error and Mean Absolute Error Mean square error is one of the most common loss metrics in regression problems: $MSE = \frac{1}{N}\sum_{i=1}^{N} (\text{target}_i - \text{prediction}_i)^2$ Mean absolute error: $MAE = \frac{1}{N}\sum_{i=1}^{N} |\text{target}_i - \text{prediction}_i|$ MSE table:

Actual  Prediction  Absolute error  Squared error
30      30          0               0
32      29          3               9
31      33          2               4
35      36.8        1.8             3.24

In this example, MAE is 1.7 (6.8/4) and MSE is 4.06 (16.24/4). MAE table:

Actual  Prediction  Absolute error  Squared error
30      30          0               0
  • 48.
32      32          0               0
31      30          1               1
50      35          15              225

In this example, MAE is 4 (16/4) and MSE is 56.5 (226/4). With one outlier value (50), the MSE increases significantly. In practice, we always need to look out for outliers: an outlier in the data makes an MSE-trained model give more weight to that point than an MAE-trained model would. In that case, MAE loss is the more intuitive choice, since it's more robust to outliers. Huber Loss Huber loss fixes the outlier sensitivity of MSE and is also differentiable at 0 (MAE's gradient is not continuous there). The idea is simple: if the error is not too big, Huber loss behaves like MSE; otherwise it behaves like MAE with a penalty: $L_\delta = \frac{1}{2}(\text{target} - \text{prediction})^2$ if $|\text{target} - \text{prediction}| \le \delta$, and $L_\delta = \delta\,|\text{target} - \text{prediction}| - \frac{1}{2}\delta^2$ otherwise. The drawback of Huber loss is that we need to tune the hyperparameter $\delta$. Quantile Loss In certain applications, we value underestimation and overestimation differently. If you build a model to estimate arrival time, you don't want to overestimate; otherwise customers might not place orders or requests. Quantile loss can weight positive errors and negative errors differently: $L = \sum_{y < p} (1 - \lambda)\,|y - p| + \sum_{y \ge p} \lambda\,|y - p|$ If you set $\lambda$ to 0.5, this reduces to (a scaled) MAE.
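Minimal NumPy sketches of both losses; the hyperparameter values are illustrative.

# Huber and quantile losses.
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    error = np.abs(y_true - y_pred)
    quadratic = 0.5 * error ** 2                  # MSE-like for small errors
    linear = delta * error - 0.5 * delta ** 2     # MAE-like for large errors
    return np.where(error <= delta, quadratic, linear).mean()

def quantile_loss(y_true, y_pred, lam=0.5):
    error = y_true - y_pred
    # Underestimates (error > 0) are weighted by lam,
    # overestimates (error < 0) by (1 - lam); lam=0.5 recovers 0.5 * MAE.
    return np.mean(np.maximum(lam * error, (lam - 1) * error))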
  • 49. It depends on the use case to decide when to use which loss function. Uber uses pseudo-Huber loss and log-cosh loss to approximate Huber loss and mean absolute error in their distributed XGBoost training. DoorDash's estimated-time-of-arrival models used MSE, then moved to quantile loss and a custom asymmetric MSE. For binary classification, the most popular loss is cross entropy. In the ad click prediction problem, Facebook uses normalized cross entropy (log loss normalized by the entropy of the background conversion rate) to make the loss less sensitive to the background conversion rate. How Does Facebook Use Normalized Cross Entropy for Ad Click Prediction? Problem: suppose we build a machine learning model to predict click/no-click for an ads system. We build two models: a fixed-prediction model and a "fancy" model. The fixed-prediction model always predicts probability(click) = 0.2. The fancy model has slightly better intuition: for positive labels it predicts 0.3, and for negative labels it predicts 0.1, which is better than a random guess. Intuitively, the fancy model should perform better because it doesn't predict the click probability with a constant value. Fixed-Prediction Model Cross Entropy Loss
  • 50.
Actual  Predicted  Cross entropy loss
 1      0.2        1.6094373
-1      0.2        0.22314353
-1      0.2        0.22314353
-1      0.2        0.22314353
-1      0.2        0.22314353
-1      0.2        0.22314353
-1      0.2        0.22314353
-1      0.2        0.22314353
-1      0.2        0.22314353
-1      0.2        0.22314353

The click-through rate is 1/10. The overall cross entropy loss is 0.36177295. Fancy Model Cross Entropy Loss

Actual  Predicted  Cross entropy loss
 1      0.3        1.2039728
 1      0.3        1.2039728
 1      0.3        1.2039728
 1      0.3        1.2039728
 1      0.3        1.2039728
-1      0.1        0.105360545
-1      0.1        0.105360545
-1      0.1        0.105360545
-1      0.1        0.105360545
  • 51.
-1      0.1        0.105360545

The click-through rate is 1/2 and the cross entropy is 0.65466666. Given the smaller cross entropy loss, does the fixed-prediction model perform better than the fancy model? The difference between the two training datasets is the underlying CTR. This is why Facebook and other big tech companies favor normalized cross entropy (NCE): $NCE = \frac{\text{logloss(model)}}{\text{logloss(background rate)}}$ Properties of NCE: Always non-negative; it is only 0 if the predictions match the labels perfectly. Unbounded; it can grow arbitrarily large. Intuitive scale: NCE < 1 means the model has learned something; NCE > 1 means the model is less accurate than always predicting the average. Assume a training dataset with $N$ examples, labels $y_i \in \{-1, +1\}$, and estimated click probabilities $p_i$ for $i = 1, 2, \ldots, N$. Let the average empirical CTR be $p$. Then $NCE = \frac{-\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1+y_i}{2}\log(p_i) + \frac{1-y_i}{2}\log(1-p_i)\right)}{-\left(p\log(p) + (1-p)\log(1-p)\right)}$ The lower the value, the better the model's prediction. The reason for this normalization is that the closer the background CTR is to either 0 or 1, the easier it is to achieve a low log loss.
  • 52. Dividing by the entropy of the background CTR makes NCE insensitive to the background CTR. In the example above, model 1 has NCE = 0.36177295/0.325083 = 1.11 and model 2 has NCE = 0.65466666/0.6931472 = 0.945.
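A small sketch reproducing these numbers, using labels in {0, 1} rather than the {-1, +1} convention above.

# Normalized cross entropy: log loss of the model divided by the log
# loss of always predicting the background CTR.
import numpy as np

def nce(y, p):
    """y in {0, 1}, p = predicted click probabilities."""
    logloss_model = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    ctr = y.mean()
    logloss_background = -(ctr * np.log(ctr) + (1 - ctr) * np.log(1 - ctr))
    return logloss_model / logloss_background

# Fixed model: one positive out of ten, always predicts 0.2.
y1 = np.array([1] + [0] * 9); p1 = np.full(10, 0.2)
# Fancy model: five positives, predicts 0.3 / 0.1.
y2 = np.array([1] * 5 + [0] * 5); p2 = np.array([0.3] * 5 + [0.1] * 5)

print(nce(y1, p1))  # ~1.11: worse than predicting the average CTR
print(nce(y2, p2))  # ~0.94: the model has learned something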
  • 53. Forecast Metrics In forecasting problems, the most common metrics are mean absolute percentage error (MAPE) and symmetric mean absolute percentage error (SMAPE). For MAPE, pay attention to whether your target value is skewed (i.e., either too big or too small). Despite its name, SMAPE is not symmetric: it treats under-forecasts and over-forecasts differently. Mean Absolute Percentage Error $M = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right|$ where $M$ is the mean absolute percentage error, $n$ the number of samples, $A_t$ the actual value, and $F_t$ the forecast value.

Actual  Predicted  Absolute percentage error
0.5     0.3        0.4
0.1     0.9        8.0
0.4     0.2        0.5
0.15    0.2        0.334

In the second row, since the prediction is far too high, the percentage error is 8.0. When we average all the errors, the MAPE value becomes too high and hard to interpret. Advantages: Expressed as a percentage, which is scale-independent and can be used to compare forecasts on different scales (remember, though, that MAPE values may exceed 100%). Easy to explain to stakeholders. Disadvantages: MAPE is undefined when there are zero actual values, which can occur in, e.g., demand forecasting, and it takes extreme values when the actual is very close to zero. MAPE is asymmetric: it puts a heavier penalty on negative errors (when forecasts are higher than actuals) than on positive errors. This is because the percentage error cannot exceed 100% for forecasts that are too low, while there is no upper limit for forecasts that are too high. As a result, MAPE favors models that under-forecast rather than over-forecast.
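A quick check of the table above with NumPy:

# MAPE on the values from the table.
import numpy as np

actual = np.array([0.5, 0.1, 0.4, 0.15])
forecast = np.array([0.3, 0.9, 0.2, 0.2])

ape = np.abs((actual - forecast) / actual)  # [0.4, 8.0, 0.5, 0.333]
mape = ape.mean()                           # ~2.31: dominated by row 2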
  • 54. Symmetric Mean Absolute Percentage Error $SMAPE = \frac{100\%}{n}\sum_{t=1}^{n}\frac{|F_t - A_t|}{(|A_t| + |F_t|)/2}$ Advantages: It fixes the shortcoming of the original MAPE: it has both a lower bound (0%) and an upper bound (200%). Disadvantages: Unstable when both the true value and the forecast are very close to zero, in which case we divide by a number very close to zero. SMAPE can take negative values (in formulations without absolute values), so the interpretation of an "absolute percentage error" can be misleading. The range of 0% to 200% is not intuitive to interpret; for this reason, the division by 2 in the denominator of the SMAPE formula is often omitted. Other companies also use machine learning and deep learning for forecasting problems. For example, Uber uses algorithms like recurrent neural networks (RNNs), gradient-boosted trees, and support vector regressors for problems including marketplace forecasting, hardware capacity planning, and marketing. Classification Loss In this section, we focus on the less popular metrics: focal loss and hinge loss. Focal Loss When handling class imbalance during training, there are easy samples and hard samples. How can we make the model focus more on the hard examples? Focal loss addresses this by adding a weight such that the loss is small for easy samples and large for hard ones. If we set $\gamma$ to 0, it becomes traditional cross entropy: $FL(p_t) = -(1-p_t)^\gamma \log(p_t)$ When do we use it? Focal loss down-weights easy, well-classified examples so the model focuses on hard ones. It's popular in object detection.
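A minimal NumPy sketch of binary focal loss; the data is illustrative.

# Binary focal loss: gamma=0 recovers cross entropy; larger gamma
# down-weights easy, well-classified examples.
import numpy as np

def focal_loss(y, p, gamma=2.0):
    p_t = np.where(y == 1, p, 1 - p)            # probability of the true class
    return np.mean(-((1 - p_t) ** gamma) * np.log(p_t))

y = np.array([1, 1, 0, 0])
p = np.array([0.9, 0.6, 0.2, 0.4])   # predicted P(class = 1)
print(focal_loss(y, p, gamma=0.0))   # plain cross entropy
print(focal_loss(y, p, gamma=2.0))   # easy examples contribute less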
  • 55. Random documents with unrelated content Scribd suggests to you:
  • 59. The Project Gutenberg eBook of Mistress Nancy Molesworth: A Tale of Adventure
  • 60. This ebook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this ebook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook. Title: Mistress Nancy Molesworth: A Tale of Adventure Author: Joseph Hocking Release date: February 26, 2017 [eBook #54239] Most recently updated: October 23, 2024 Language: English Credits: Produced by Martin Pettit and the Online Distributed Proofreading Team at http://www.pgdp.net (This file was produced from images generously made available by The Internet Archive) *** START OF THE PROJECT GUTENBERG EBOOK MISTRESS NANCY MOLESWORTH: A TALE OF ADVENTURE ***
  • 61. Transcriber's Note: Obvious typographic errors have been corrected.
  • 63. MISTRESS NANCY MOLESWORTH A TALE OF ADVENTURE BY Joseph Hocking Author of "The Birthright," etc. NEW YORK DOUBLEDAY & McCLURE CO. 1898 Copyright, 1898, by DOUBLEDAY & McCLURE CO. Press of J. J. Little & Co. Astor Place, New York
  • 65. Contents CHAPTER PAGE I.—Trevanion, 1 II.—Peter Trevisa's Offer, 10 III.—Crossing the Rubicon, 24 IV.—My Journey to Endellion, 37 V.—My First Night at Endellion, 51 VI.—The Uses of a Serving-Maid, 67 VII.—On the Roof of Endellion Castle, 82 VIII.—Otho Discovers My Name, 95 IX.—Benet Killigrew as a Wrestler, 111 X.—The Escape from Endellion, 125 XI.—My Fight with Benet Killigrew, 139 XII.—Roche Rock, 153 XIII.—The Wisdom of Gossiping with an Innkeeper, 168 XIV.—The Haunted Chapel of St. Mawgan, 181 XV.—The Scene at a Wayside Inn, 195 XVI.—Why I Took Nancy to Treviscoe, 210 XVII.—The Charge of Treason, 224 XVIII.—Otho Killigrew's Victory, 239 XIX.—Launceston Castle, 251 XX.—I Escape from the Witch's Tower, 267 XXI.—Describes My Journey from Launceston Castle to a Lonely Mansion Accompanied by Two Women, 285 XXII.—Mistress Nancy Tells Me Many Things, 301 XXIII. —In Which it is Shown that Uncle Anthony Was More than a Droll, 315
  • 66. XXIV.—Otho Killigrew Uses an Old Proverb, 330 XXV.—How January Changed to June, 344 XXVI.—I Fall Into Otho Killigrew's Hands, 358 XXVII. —How Benet Killigrew and I Fought in the Light of the Beacon Fire, 371 XXVIII.—Otho Killigrew's Last Move, 386 XXIX.—The King's Gratitude, 400 XXX.—In Which Uncle Anthony Plays His Harp, 414 MISTRESS NANCY MOLESWORTH
  • 67. CHAPTER I. TREVANION. The only part of my history which I regard as worthy of placing on record is confined to a few months. I was thirty-two years of age at the time, and had thus entered into the very summer of my life. At that age a man's position ought to be assured; at any rate his career should be marked out with tolerable plainness. Such, however, was not my fortune. Although I bear one of the best known and most honoured names in my native country, I, Roger Trevanion, was in sore straits at the time of which I write. And this not altogether because of my own faults. I did not come into the possession of my heritage until I was thirty, my father having retained absolute control of his estate until his death. Up to that time I knew nothing of his money matters. Neither, indeed, did I care. I had enough for my own use; I possessed good horses and was able to enjoy what festivities the county provided, to the full. Ever since my mother's death, which took place when I was fourteen, my father paid me but little attention. He saw to it that I was taught to ride, fence, shoot, with other accomplishments befitting my station, and then allowed me to follow my own inclinations. As a consequence I became a gay fellow, being guilty, I am afraid, of most of the misdemeanours common to young men. I remembered that I was a Trevanion, however, and while I did not belong to the most important branch of the family, I held to the code of honour to which for many generations we had been true. I knew that my father gambled freely, and had many relations with people which were beyond my comprehension. I did not trouble about this, however. Very few restraints were placed upon me, and I was content.
  • 68. When my father died, I discovered that I was a poor man. I had still the semblance of wealth. I lived in the old house, and was supposed to own the lands surrounding it. The old servants still called me master, and the farmers paid their rents to me as they had paid them to my fathers. In reality, however, everything was mortgaged for nearly all it was worth. True, the lawyer told me that if I would discharge a number of superfluous servants, get rid of a number of useless horses, and consent to the sale of a quantity of timber, I could by practicing the strictest economy for ten years, place everything on a satisfactory footing. "That will mean that I must give up hunting, racing, drinking, betting, besides closing the house and living like a hermit, I suppose?" I said to him. "That does not suit me. Is there no other way?" "Yes, there is one," he replied. "And that?" "A suitable marriage." I shrugged my shoulders. "Women are not in my way, Mr. Hendy," I said. The truth was, I had fancied myself in love when I was twenty, with the daughter of John Boscawen, a distant relation of the famous Boscawens. She had led me on until I was mad about her. I was her slave for several months, and she treated me as though I were a dog of the fetch-and-carry breed. Presently a young fellow from a place near Penzance, Prideaux by name, came to her father's place, and no sooner did he start a-courting her than she sent me about my business, drove me away in fact, as though I were a cur. Since that time I had hated women, and I grew angry at the thought of ever being expected to put confidence in one. "The state of your affairs is not generally known," persisted the lawyer, "and a wife with a handsome dowry would mean getting
  • 69. back the deeds." "No petticoats for me," I replied angrily. "But if the petticoats mean comfort and freedom from money cares, would you not be wise to put aside your prejudice against them?" "Anything but that," I cried, remembering Amelia Boscawen. "Retrenchment or a wife," persisted the lawyer. "Neither," I cried, angry that directly I came into my heritage I should find myself in such a fix. The lawyer sighed. "From whom did my father borrow?" I asked presently. "Peter Trevisa," he replied. I knew the man slightly. A little, shrivelled-up, old creature who had married late in life, and who had one son whom we called "Young Peter," because he was so much like his father. Young Peter was not so old as I, and I had never been friendly with him. In fact I had despised him as a ferrety kind of fellow, with whom I had nothing in common. "He holds you like that," said the lawyer, putting out his hand and clasping it. A great deal more was said, but to no purpose, and I went on as I had gone before. True, I discharged one or two of the younger servants and sold a quantity of timber, but I did not retrench as the lawyer advised. Thus at the end of two years I was, if possible, in a worse position than when my father died. One day—and here my story really begins—I rode off to a fox hunt. I still held my head high, and rode the best horse in the field. I was careful, too, to be well dressed, and I prided myself that in spite of my poverty I was inferior to none. I was young, regarded as
  • 70. handsome, stood over six feet in my stockings, and was well set up. As usual I avoided women, although there were many at the meet. Although one of the heaviest men there, I kept well ahead through the day, and in spite of the weight of my debts I was in at the death. After the hunt I went to Geoffry Luxmore's ball, which was a part of the day's programme, but I did not join the dancers. I wanted to be free from women, and therefore accepted an invitation to take part in a game of cards. While sitting at dinner I saw old Peter Trevisa. He nodded to me in a friendly way. Afterward he came to me and caught me by the arm. "And how are matters going at Trevanion, eh, lad?" he asked. "Grandly," I replied gaily, for I was heated with good wine and I felt no cares. "Thou shouldst be in the dancing-room, lad," he said. "There's many a fine maid there; many with a big dowry. Geoffry Luxmore's daughter should suit thee well, Roger." "No women for me," I cried. "No; dost a hate them so?" I shrugged my shoulders. "Then my Peter'll be getting Trevanion, Roger?" he said with a leer. In spite of my excitement I felt uneasy as I looked at his eyes. "I've been thinking about calling in my mortgage," he said. "Do," I replied. "Ah, sits the wind in that quarter, eh? Well, Roger, thou hast always been a dare-devil fellow. But a landless Trevanion will be a sorry sight." "There never has been one yet."
  • 71. "And if thou art the first, 'twill be a sorry business." I felt more uncomfortable, so I swallowed a large bumper of wine to keep my spirits up. Presently we sat down to play. I won, I remember, freely at first, and was in high good humour. "Luck seems with thee to-night," said old Peter Trevisa. "After all, it seems thou'st done well to come here rather than go a-dancing with the maidens yonder." As he spoke the music ceased, and on looking up I saw Ned Prideaux, the fellow who had stolen Amelia Boscawen from me, come into the room. I don't know that I felt any enmity toward him; the only wrong feeling I had for him was on account of my pride. That he should have been preferred before me wounded my vanity. Old Peter Trevisa knew of the business, and laughed as he came up. "Thou didst beat him in courting, lad," he said to Prideaux, "let's see if thou canst beat him at playing." This he said like one who had been drinking a good deal. And although I had not seen him making free with wine, I fancied he must be fairly drunk; consequently I did not resent his words. Besides, I was in high good humour because of my winnings. "I'll take a hand with pleasure," answered Prideaux. He wiped his brow, for he had been dancing, and sat down opposite me. I broke a fresh bottle of wine, and we commenced playing. Fool that I was, I drank freely throughout the evening, and presently I became so excited that I hardly knew what I was doing. Several fellows gathered around to watch us, and the stakes were high. I had not been playing with Prideaux long before my luck turned. I
  • 72. began to lose all I had gained. Old Peter Trevisa chuckled as he saw that the cards were against me. "Give it up, Roger," he said in a sneering kind of way; "Trevanion can't stand bad luck, lad." This wounded my pride. "Trevanion can stand as much as I care to let it stand," I replied, and I laid my last guinea on the table. Presently Mr. Hendy, the old family lawyer, came to my side. "Be careful, Mr. Trevanion," he whispered, "this is no time for ducks and drakes." But I answered him with an oath, for I was in no humour to be corrected. Besides, wild and lawless as I had been for several years, I remembered that I was a Trevanion, and resented the family attorney daring to try to check me in public. "He won't listen to reason, Hendy," sneered old Peter Trevisa. "Ah, these young men! Hot blood, Hendy, hot blood; we can't stop a Trevanion." I had now lost all my money, but I would not stop. Old Trevisa standing at my elbow offering sage advice maddened me. I blurted out what at another time I would not have had mentioned on any consideration. "You have a stake in Trevanion, Trevisa," I cried angrily. "Nonsense, nonsense, Roger," whispered the old man, yet so loudly that all could hear. "You have," I cried, "you know you have. If I paid you all you lent my father, there would be little left. How much would the remnant be?" "We'll not speak of that," laughed the old man.
  • 73. "But we will," I said defiantly, for what with wine, and bad luck, and the irritation of the old man's presence I was beside myself. "What more would you lend on the estate?" He named a sum. "I'll play you for that sum, Prideaux," I cried. "No," replied Prideaux; "no, Trevanion, you've lost enough." "But I will!" I replied angrily. "No," said Prideaux, "I'm not a gamester of that order. I only play for such sums as have been laid on the table." "But you shall!" I cried with an oath; "you dare not as a gentleman refuse me. You've won five hundred guineas from me this very night. You must give me a chance of winning it back." "Luck is against you, Trevanion," replied Prideaux. "It shall never be said of me that I won a man's homestead from him. I refuse to play." "Prideaux has won a maid from you!" laughed old Trevisa with a drunken hiccup. "Be careful or he'll take Trevanion, too." "I'll never play for the land," cried Prideaux again. "But you shall," I protested. "If you refuse you are no gentleman, and you will act like a coward to boot." "Very well," replied Prideaux coolly, "it shall be as you say." We arranged our terms and commenced playing again. Half an hour later I had lost the sum which old Peter Trevisa said he could further advance on Trevanion. I do not think I revealed my sensations when I realized that I had lost my all, but a cold feeling came into my heart nevertheless.
  • 74. "Trevanion," said Prideaux, "we'll not regard the last half-hour's play as anything. It was only fun." "That will not do," I replied. "We have played, and I have lost; that is all." "But I shall not take——" "You will," I cried. "You have played fairly, and it is yours. I will see to it at once that the amount shall be handed to you." "I will not take it," cried Prideaux. "I absolutely refuse." I know I was mad; my blood felt like streams of molten fire in my veins, but I was outwardly cool. The excitement I had previously shown was gone. Perhaps despair helped me to appear calm. "Look you, Peter Trevisa," I said; "you give Prideaux a draft for that money." "Roger, Roger," said the old man coaxingly, "take Prideaux's offer. He won your maid; don't let him win Trevanion too. You'll cut a sorry figure as a landless Trevanion." I seized a pen which lay near, and wrote some words on a piece of paper. "There," I said to Prideaux as I threw it to him, "it shall not be said that a Trevanion ever owed a Prideaux anything, not even a gaming debt. Gentlemen, I wish you good-night." I left the room as I spoke and ordered my horse. I was able to walk straight, although I felt slightly giddy. I scarcely realized what I had done, although I had a vague impression that I was now homeless and friendless. A ten-mile journey lay before me, but I thought nothing of it. What time I arrived at Trevanion I know not. My horse was taken from me by an old servant, and without speaking a word to any one I went straight to bed.
  • 76. CHAPTER II. PETER TREVISA'S OFFER. The next morning I awoke with terrible pains in my head, while my heart lay like lead within me. For some time I could not realize what had happened; indeed, I hardly knew where I was. It was broad daylight, but I could not tell what the hour was. Presently a clock began to strike, and then I realized that I lay in my own bed at Trevanion and that the clock stood in the turret of my own stables. I counted the strokes. It stopped at eleven. No sooner had it ceased than all that had happened the previous night flashed through my mind. I jumped out of bed and looked out of the window. Never had the place seemed so fair to look upon, never had the trees looked so large and stately. And I was burdened with the dread remembrance that it was no longer mine. When I had dressed I tried to face the matter fairly. I tried to understand what I had done. The more I thought about it the more I cursed myself for being a fool. For I felt how insane I had been. I had drunk too much wine, I had allowed myself to become angry at old Peter Trevisa's words. I had blurted out truths which under other circumstances I would rather have bitten my tongue in two than have told. I had acted like a madman. Wild, foolish as I had been in the past, that night was the climax of my folly. Why had old Peter Trevisa's presence and words aroused me so? The more I thought the sadder I became, the darker did my prospects appear. I had given Prideaux a written guarantee for the money I had been unable to pay. That piece of paper meant my ruin, if he took advantage of it. Would he do this? Yes, I would see that he did. In extremities as I was, I would rather sacrifice the land than violate our old code of honour. I heard a knock at the door, and a servant entered.
  • 77. "From Mr. Trevisa of Treviscoe, sir," he said. I am afraid my hand trembled slightly as I took the letter. "Who brought it, Daniel?" I asked. "A servant, sir." "Let breakfast be ready in ten minutes, Daniel; I'll be down by that time." "Yes, sir." I broke the seal of the letter and read it. I soon discovered that it was written by young Peter Trevisa. For, first of all, it was written in a clear hand and correctly spelt, and I knew that old Peter's writing was crabbed and ill-shapen; besides which, the old man had not learnt the secret of stringing words together with anything like ease. The contents of the epistle, too, revealed the fact that the son, and not the father, acted as scribe. The following is an exact transcript thereof: "Treviscoe the 25th day of March in the year 1745. "To Roger Trevanion, Esq., of Trevanion. "Dear Sir:—The events of last night having altered their complexion somewhat after you left the house of Geoffry Luxmore, Esq., and the writing which you gave to Mr. Edward Prideaux having changed hands, with that gentleman's consent, it has become necessary for you to visit Treviscoe without delay. My father has therefore instructed me to write (instead of employing our attorney, who has up to the present conducted all correspondence relating to my father's connections with Trevanion) urging your presence here. I am also asked to impress upon you the fact that it will be greatly to your advantage to journey here immediately, while your delay will be perilous to yourself. We shall therefore expect you here within two hours from the delivery of this letter.
  • 78. "Peter Trevisa." This communication certainly looked ominous, and I felt in no very pleasant frame of mind as I entered the room beneath, where my breakfast had been placed for me. "Where is the fellow who brought this, Daniel?" I asked of my old serving-man. "He is standin' outside, sur. He wudden cum in. He seemed in a terble 'urry." I went to the door and saw a horse which had evidently been hard ridden. It was covered with mud and sweat. The man who stood by the animal's side touched his hat when he saw me. "Go into the kitchen, my man, and get something to eat and drink," I said. "I must not, sur," was the reply. "My master told me to ride hard, and to return immediately I got your answer." "Anything wrong at Treviscoe?" "Not as I know ov, sur." I had no hope of anything good from old Peter, and I felt like defying him. My two years' possession of Trevanion had brought but little joy. Every day I was pinched for money, and to have an old house to maintain without a sufficient income galled me. The man who is poor and proud is in no enviable position. Added to this, the desire to hide my poverty had made me reckless, extravagant, dissolute. Sometimes I had been driven to desperation, and, while I had never forgotten the Trevanion's code of honour, I had become feared and disliked by many people. Let me here say that the Trevanion code of honour might be summed up in the following way: "Never betray a woman. Never break a promise. Never leave an insult unavenged. Suffer any privation rather than owe money to any man. Support the church, and honour the king."
  • 79. Having obeyed these dictates, a Trevanion might feel himself free to do what else he liked. He could be a drunkard, a gamester, a swashbuckler, and many other things little to be desired. I speak now for my own branch of the family, for I had but little to do with others of my name. In the course of years the estates had been much divided, and my father's patrimony was never great. True, there were many hundreds of acres of land, but, even although all of it were free from embarrassment, it was not enough to make its owner wealthy. My father had also quarrelled with those who bore our name, partly, I expect, because they treated him with but little courtesy. Perhaps this was one reason why he had been recklessly extravagant, and why he had taken no pains to make me careful. Anyhow I am afraid that while I was feared by many I was beloved by few. I had had many quarrels, and the law of my county being something lax, I had done deeds which had by no means endeared me to my neighbours. My pride was great, my temper was of the shortest, my tastes and habits were expensive, and my income being small, I was weary of keeping up a position for which I had not the means. Consequently, as I read young Peter Trevisa's letter, I felt like refusing to obey his bidding. I had been true to the Trevanion code of honour. I had given Prideaux a written promise that the gaming debt should be paid. Let them do their worst. I was young, as strong as a horse, scarcely knew the meaning of fatigue, and I loved adventure. I was the last of my branch of the family, so there was no one that I feared grieving. Very well, then, I would seek my fortune elsewhere. There were treasures in India, there were quarrels nearer home, and strong men were needed. There were many careers open to me; I would leave Trevanion and go to lands beyond the seas. I was about to tell the man to inform his master that I refused to go to Treviscoe, when I was influenced to change my mind. I was curious to know what old Peter had to say. I was careless as to what he intended doing in relation to the moneys I owed him, but I
  • 80. wondered what schemes the old man had in his mind. Why did he want to see me? It would do no harm to ride to his house. I wanted occupation, excitement, and the ride would be enjoyable. "Very well," I said, "if I do not see your master before you do, tell him I will follow you directly." "Yes, sur," and without another word the man mounted the horse and rode away. I ate a hearty breakfast, and before long felt in a gay mood. True the old home was dear to me, but the thought of being free from anxious care as to how I might meet my creditors was pleasant. I made plans as to where I should go, and what steps I should first take in winning a fortune. The spirit of adventure was upon me, and I laughed aloud. In a few days Cornwall should know me no more. I would go to London; when there nothing should be impossible to a man of thirty-two. I spoke pleasantly to Daniel, the old serving-man, and my laughter became infectious. A few seconds later the kitchen maids had caught my humour. Then my mood changed, for I felt a twinge of pain at telling them they must leave the old place. Some of them had lived there long years, and they would ill-brook the thought of seeking new service. They had served the family faithfully too, and ought to be pensioned liberally instead of being sent penniless into the world. A little later I was riding furiously toward Treviscoe. The place was a good many miles from Trevanion, but I reached it in a little more than an hour. I found old Peter and his son eagerly awaiting me. "Glad to see you, Roger, glad to see you," said the old man. "Why did you send for me?" I asked. "I'll tell you directly. John, take some wine in the library."
  • 81. The servant departed to do his bidding, and I followed the two Trevisas into the library. "Sit down by the fire, Roger, lad; that's it. First of all we'll drink each other's health in the best wine I have in my cellar. This is a special occasion, Roger." "Doubtless, a special occasion," I replied; "but no wine for me at present. I want to keep my head cool in talking with such as you. What do you want of me?" "Let's not be hasty, Roger," said old Peter, eyeing me keenly, while young Peter drew his chair to a spot where his face was shaded, but from which he could see me plainly. "Let's be friendly." "I'm in no humour to be friendly," was my rejoinder. "Tell me why you have wished me to come to you?" "I would have come to you, but I had a twinge of gout this morning, and was not able to travel. I wanted to see you on an important matter, my dear lad." "Will you drop all such honeyed phrases, Peter Trevisa," I said angrily. "I know you lent money to my father on Trevanion. I know I have been a fool since I came into possession. Last night I lost my head. Well, Prideaux shall be paid, and you will take the rest. I quite expect this, and am prepared for it." "Prideaux has been paid," laughed the old man. "In cash?" "Aye, that he has." "Who paid him?" "I did." "Oh, I see. You wanted the bone all to yourself, did you," I cried angrily. "Well, some dogs are like that. But it makes no difference to
  • 82. me. Do your worst." "You remember this," he said, holding up the piece of paper I had given to Prideaux the night before. "I was mad when I wrote it," I replied, "but I remember it well. How did it come into your hands?" "Prideaux has very fine notions about honour," remarked old Peter. "He did not like taking advantage of it, and yet he knew that you as a Trevanion would insist on his doing so." "Well?" "Well, Roger lad, seeing I have the Trevanion deeds, I thought I might as well have this too. So I offered him money down, and he was pleased to arrange the matter that way. He has made the thing over to me." "Let's see it—his writing ought to be on it to that effect." "It is; aye, it is." "Then let me look at it." "No, Roger. This paper is very precious to me. I dare not let you have it. You might destroy it then." "Peter Trevisa," I cried, "did ever a Trevanion do a trick like that?" "No, but you are in a tight corner, and——" "Listen, you chattering old fool," I cried angrily. "If I wished, I could squeeze the life out of the bodies of both of you and take the paper from you before any one could come to your aid. But that's not my way; give it me." "I'll trust you, Roger; here it is." I looked at the paper. I saw my own promise and signature; underneath it was stated that the money had been paid by Peter
  • 83. Trevisa, and signed "Edward Prideaux." I flung it at him. "There," I said, "you've forged the last link in your chain now. I am quite prepared for what I have no doubt you will do. Trevanion is yours. Well, have it; may it bring you as much joy as it has brought me." "You misjudge me," cried old Peter. "You misjudge both me and my son. True, Trevanion would be a fine place for my lad, but then I should not like to drive you away from your old home. All the Trevanions would turn in their graves if any one else lived there. I want to be your friend. I desire to help you on to your feet again." "Wind!" I cried. "Trust you to help any man!" "Listen to what my father has to say," cried young Peter. "You will see that we both wish to be friendly." His face was partly hidden; nevertheless I saw the curious light shining from his eyes. He was undersized, this young Peter, just as his father was. A foxy expression was on his face, and his mouth betrayed his nature. He was cunning and sensual. His was not unlike a monkey's face. His forehead receded, his lips were thick, his ears large. "Roger Trevanion, my lad, there is no reason why you should have to leave your old home. Nay, there is no reason why you should not be better off than you have been. That is why I got this paper from Edward Prideaux." Old Peter spoke slowly, looking at me from the corner of his eyes. "You want me to do something," I said after a minute's silence. "Ah, Roger," laughed the old man, "how quickly you jump at conclusions." "It will not do, Peter Trevisa," I cried. "You have Trevanion. Well, make the most of it. I shall not be sorry to be away from the county.
  • 84. The thought that everything has really belonged to you has hung like a millstone around my neck. I am not going to fetch and carry for you." "But if you had the deeds back. If I burnt this paper. If the estate were unencumbered. What then?" "You know it will not be. Trust you to give up your pound of flesh." "You do me an injustice," replied old Peter, with a semblance of righteous indignation. "What right have you to say this? Have I been hard on you. Have I dunned you for your money." "No; but you have lost no opportunity of letting me know that the place belongs to you." "That was natural, very natural. I wanted to put a check on your extravagance." I laughed in his face, for I knew this to be a lie. "Roger Trevanion," cried young Peter, "my father is a merciful man. He has your welfare at heart. He is old too. Is it manly to mock old age." "Let there be an end of this," I cried. "I begin to see why you have brought me here. I knew you had some deep-laid plans or I would not have come. It is always interesting to know what such as you think. Well, let's know what it is." For the moment I seemed master of the situation. An outsider would have imagined them in my power instead of I being in theirs. Especially did young Peter look anxious. "I am sure we can trust Roger," said the old man. "When a Trevanion gives his word he has never been known to break it." "But they are learning to be careful how to give their word," I retorted.
Peter looked uneasy. "But if I ask you to keep what I tell you a secret, you will promise, Roger?"

"I ask for no confidences," I replied.

"You said just now that we wanted you to do something," said young Peter. "You guessed rightly. If you do not feel inclined to do what we ask you, you will of course respect anything we may tell you?"

"That is but fair," was my answer.

"You promise, then?" cried old Peter.

"If I honourably can," I replied.

For a few seconds both men were silent; then old Peter began to speak again.

"Roger Trevanion," he said, "you know that I hold the deeds of Trevanion; you know that you are entirely at my mercy."

"Well enough."

"You would like to remain at Trevanion? You, a Trevanion, would not like to be an outcast, a mere vagrant, a landless gipsy."

"I don't care much," I replied. "I should be free; and I would rather be landless than be supposed to own the land, while everything practically belonged to you. I've told you this before. Why make me say it again?"

"But you would like the deeds back. You would like to live at the old home with plenty of money?"

"You know I would. Why mock me?"

"You would do a great deal in order that this might come to pass."

"What do you want?"
We had come back to the same point again, and again old Peter hesitated.

"You know Restormel?" he said at length.

"Restormel Castle, up by Lostwithiel?" I asked.

"No; Restormel in the parish of St. Miriam, a few miles north from here?"

"Oh, yes, I know."

"What do you know?" Both old Peter and young Peter spoke in the same breath; both spoke eagerly, too—anxiously in fact.

"What is rumoured by certain gossips," I replied. "I expect there is no truth in it."

"But what have you heard?"

"It is said that the estate belongs to a chit of a maid," I replied; "that the maid's mother died at her birth, and that her father, Godfrey Molesworth, did not long survive her. That he was broken-hearted. That everything was left to a mere baby."

"But what became of the baby?"

"I know not. I have heard that she has never been seen on the place, although her father has been dead wellnigh twenty years. That the rents are paid to Colman Killigrew, who lives at Endellion Castle, and who is a godless old savage. Rumour says that he claims to be the maid's guardian. But of this I am ignorant. He lives full fifty miles from here, and I know nothing of him."

"That is all you have heard?"

"That is all I can remember at present."

"You have never seen the maid?"
  • 87. "No. Who has? Stay; I have heard she was placed in a convent school. Old Killigrew is a Catholic, I suppose." "I'll tell you more, Roger Trevanion. Colman Killigrew has been fattening on the Restormel lands for wellnigh twenty years. He hath kept the maid, Nancy Molesworth, a prisoner. In a few months she will be twenty-one. He intends marrying her to one of his sons. She hates the whole tribe of Killigrews, but he cares nothing for that. He is determined; you can guess why." "Yes, such things are common. But what is that to me? I know nothing of the maid, Nancy Molesworth; I do not care. Let the Killigrews marry her; let them possess Restormel." "My son Peter hath seen the maid, Roger." "Ah! How?" "He had to pay a visit in the neighbourhood of Endellion Castle, and he saw her by chance." "Spoke he to her?" "No, he did not; she did not see him. She is kept a close prisoner, but my Peter hath lost his heart." I turned and looked at young Peter, and his face looked more monkeyish than ever. A simpering smile played around his protruding mouth. His eyes shone like those of a weazel. "Well," I said, "what is this to me?" "This, Roger Trevanion. I want that maid, Nancy Molesworth, brought here to Treviscoe. I want to save her from those Papist savages who would bring ruin upon the maid and upon the country." "That's nothing to me," I replied; "I avoid women. They are all alike —all cruel, all selfish, all false as hell. Why tell your plans to me?"