DEEP CODERS
ECKOVATION MACHINE LEARNING
Members
Nitin Khatkar :01711503116
Sourav Tiwari :03011503116
Gulshan :01211503116
Shrey Achreja :41311503116
What cuisine is this recipe?
Picture yourself strolling through
your local, open-air market... What
do you see? What do you smell?
What will you make for dinner
tonight?
We want to thank Yummly for providing this unique dataset.
Data Description
▫ In the dataset, we include the
recipe id, the type of cuisine,
and the list of ingredients of
each recipe (of variable length).
The data is stored in JSON
format.
▫ An example of a recipe node in
train.json is shown alongside:
“We will predict the cuisine for
each recipe in the test set.”
STEPS FOLLOWED TO SOLVE THE GIVEN PROBLEM
STEP 1
First, we will perform EDA and
remove all redundant data from
the given dataset.
STEP 2
Then we will form our feature
matrix as well as the target
matrix.
STEP 3
Finally, we will apply the
candidate algorithms and identify
the one that performs best.
PRE-PROCESSING
TOP 10 INGREDIENTS ACCORDING TO THE CUISINE
ALGORITHM FOR FINDING THE TOP 10 INGREDIENTS GIVEN ON NEXT SLIDE ->
ALGORITHM FOR FINDING THE TOP 10 INGREDIENTS
▫ First, make a dictionary (dic) with the different cuisines
as keys and the ingredients present in each as values.
▫ Then, with the help of the above dictionary, make a new
dictionary (count_dictionary) containing the counts of the
ingredients present in each cuisine.
▫ At last, make a pie chart of the top 10 ingredients with
the help of the above two dictionaries (code given on the
next slide).
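The code slide that follows was an image in the original deck; as a stand-in, here is a minimal sketch of the three steps above, using a few toy recipes in place of the real train.json (the names dic and count_dictionary match the ones used above):

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Toy recipes standing in for the Yummly train.json file.
recipes = [
    {"id": 1, "cuisine": "italian", "ingredients": ["salt", "olive oil", "basil"]},
    {"id": 2, "cuisine": "italian", "ingredients": ["salt", "olive oil", "tomato"]},
    {"id": 3, "cuisine": "mexican", "ingredients": ["salt", "corn", "chili"]},
]

# Step 1: cuisine -> list of every ingredient seen in that cuisine (dic).
dic = {}
for r in recipes:
    dic.setdefault(r["cuisine"], []).extend(r["ingredients"])

# Step 2: cuisine -> ingredient counts (count_dictionary).
count_dictionary = {c: Counter(ings) for c, ings in dic.items()}

# Step 3: pie chart of the 10 most common ingredients for one cuisine.
top10 = count_dictionary["italian"].most_common(10)
labels, sizes = zip(*top10)
plt.pie(sizes, labels=labels, autopct="%1.0f%%")
plt.title("Top ingredients: italian")
plt.savefig("top10_italian.png")
```

With the real dataset, the same three steps run unchanged after loading the recipes with `json.load`.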
CODE FOR PLOTTING TOP 10 INGREDIENTS
APPLYING
ML TO IT
FIRST, GENERATING X AND Y BEFORE
APPLYING ANY ALGORITHM TO IT.
GENERATING X AND Y
▫ Create empty lists y and
total_ingredients.
▫ Append every unique ingredient to
the list total_ingredients.
▫ Create a zero matrix with numpy and
name it x (number of rows equal to
the length of y, number of columns
equal to the length of
total_ingredients).
▫ For every ingredient a recipe
contains, set the matching cell to 1.
▫ Our feature matrix x and target y
are ready.
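The steps above can be sketched as follows, with a couple of toy recipes standing in for the real train.json:

```python
import numpy as np

# Toy recipes standing in for train.json.
recipes = [
    {"cuisine": "italian", "ingredients": ["salt", "basil"]},
    {"cuisine": "mexican", "ingredients": ["salt", "chili"]},
]

# y: one cuisine label per recipe; total_ingredients: the unique vocabulary.
y = [r["cuisine"] for r in recipes]
total_ingredients = sorted({i for r in recipes for i in r["ingredients"]})
index = {ing: j for j, ing in enumerate(total_ingredients)}

# x: zero matrix, one row per recipe, one column per unique ingredient;
# set a cell to 1 when that recipe contains that ingredient.
x = np.zeros((len(y), len(total_ingredients)))
for i, r in enumerate(recipes):
    for ing in r["ingredients"]:
        x[i, index[ing]] = 1
```

Each row of x is then a binary "bag of ingredients" vector that any scikit-learn classifier can consume directly.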
DIFFERENT ALGORITHMS USED
FIRST
We started with a decision tree. The
outcome was not at all fruitful, as it
achieved a score of only 0.30.
SECOND
Then we tested a random forest and got
a satisfactory score of 0.72.
THIRD
Out of curiosity, we also applied Naïve
Bayes, but again got a very low score
of 0.36.
FOURTH
Finally, we went for a final test with
deep learning, but could not run it on
the full data due to low-end hardware
specifications.
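The three scikit-learn runs above can be sketched in one loop. A synthetic binary ingredient matrix stands in for the real one here, so the accuracies will not match the deck's scores:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB

# Synthetic stand-in: 300 recipes, 50 binary ingredient columns,
# 3 cuisine labels.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(300, 50)).astype(float)
y = rng.integers(0, 3, size=300)

x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=0)

scores = {}
for name, clf in [
    ("decision tree", DecisionTreeClassifier(random_state=0)),
    ("random forest", RandomForestClassifier(random_state=0)),
    ("naive bayes", MultinomialNB()),
]:
    clf.fit(x_tr, y_tr)
    scores[name] = clf.score(x_te, y_te)  # mean accuracy on held-out split
```

`score` here is plain held-out accuracy; cross-validation would give a more stable comparison, at the cost of extra training runs.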
ALGORITHM 1
▫ APPLYING A DECISION
TREE TO IT.
▫ GOT A SCORE OF 0.30
ALGORITHM 2
▫ APPLYING RANDOM
FOREST TO IT.
▫ GOT A SCORE OF 0.72
ALGORITHM 3
▫ APPLYING NAÏVE BAYES
TO IT.
▫ GOT A SCORE OF 0.36
ALGORITHM 4
▫ APPLYING DEEP
LEARNING TO IT.
▫ GOT A SCORE OF 0.64 ON
ONLY 10,000 SAMPLES
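The deck does not show the network itself; as an illustrative stand-in, scikit-learn's MLPClassifier trained on a slice of the data mimics the "subset only" run forced by the hardware limits (sizes here are toy values, not the real 10,000):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the ingredient matrix.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(500, 50)).astype(float)
y = rng.integers(0, 3, size=500)

# Train on a subset only, as the deck did with its first 10,000 recipes.
x_small, y_small = x[:200], y[:200]
x_tr, x_te, y_tr, y_te = train_test_split(x_small, y_small, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=200, random_state=0)
net.fit(x_tr, y_tr)
score = net.score(x_te, y_te)  # held-out accuracy on the subset
```

Training on a subset trades accuracy for memory; a model fit this way usually underperforms the same model fit on the full data, which is consistent with the 0.64 reported.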
SHORTCOMINGS
Deep learning could not be
applied to the whole dataset due to
low-end hardware specifications, and
SVM could not be applied at all due
to a memory error.
RESULT
Random Forest is the best algorithm,
but if deep learning were run on the
full dataset the conclusion might
differ.
DATA COMPARISON
ALGORITHM       SCORE
DECISION TREE   0.30
NAÏVE BAYES     0.36
RANDOM FOREST   0.72
DEEP LEARNING   0.64
0.72
Final score achieved
(the highest achieved was 0.82)
Forest Cover Type Prediction
▫ The study area includes four wilderness areas located in the
Roosevelt National Forest of northern Colorado. Each observation
is a 30m x 30m patch. We are asked to predict an integer
classification for the forest cover type.
Data Description
▫ The seven types are:
▫ 1 - Spruce/Fir
2 - Lodgepole Pine
3 - Ponderosa Pine
4 - Cottonwood/Willow
5 - Aspen
6 - Douglas-fir
7 - Krummholz
▫ The training set (15120 observations)
contains both features and the Cover_Type.
The test set contains only the features. You
must predict the Cover_Type for every
row in the test set (565892 observations).
“We will predict the forest-cover
type from the given parameter
values.”
STEPS FOLLOWED TO SOLVE THE GIVEN PROBLEM
STEP 1
First, we will perform EDA and
remove all redundant data from
the given dataset.
STEP 2
Then we will form our feature
matrix as well as the target
matrix.
STEP 3
Finally, we will apply the
candidate algorithms and identify
the one that performs best.
PRE-PROCESSING
PLOTTING PARAMETERS WITH RESPECT TO FOREST-COVER TYPE
SAMPLE CODE FOR PLOTTING GRAPH
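The plotting code on the original slide was an image and did not survive extraction. A plausible sketch of one such plot, a boxplot of Elevation (a real column in the Kaggle data) against Cover_Type, using synthetic values:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy frame standing in for train.csv; Elevation and Cover_Type are real
# column names in the Kaggle data, but these values are synthetic.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Elevation": rng.normal(2800, 300, size=210),
    "Cover_Type": rng.integers(1, 8, size=210),
})

# One box per cover type shows how the parameter varies across classes.
groups = [g.values for _, g in df.groupby("Cover_Type")["Elevation"]]
plt.boxplot(groups)
plt.xticks(range(1, len(groups) + 1), sorted(df["Cover_Type"].unique()))
plt.xlabel("Cover_Type")
plt.ylabel("Elevation")
plt.savefig("elevation_by_cover_type.png")
```

Repeating this for each numeric column gives a quick visual check of which parameters actually separate the seven cover types.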
APPLYING
ML TO IT
FIRST, GENERATING X AND Y BEFORE
APPLYING ANY ALGORITHM TO IT.
GENERATING X AND Y
▫ Assign the values in the
Cover_Type column to the target
vector (y).
▫ Then, after removing Cover_Type
from the data frame, assign the
remaining values to x.
▫ Remove the redundant columns.
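These three steps can be sketched in pandas, on a few toy rows standing in for train.csv (the real file has 56 columns; treating Id as the redundant column here is an assumption):

```python
import pandas as pd

# Toy rows standing in for train.csv.
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Elevation": [2596, 2590, 2804],
    "Slope": [3, 2, 9],
    "Cover_Type": [5, 5, 2],
})

y = df["Cover_Type"]                  # target vector
x = df.drop(columns=["Cover_Type"])   # features: everything else
x = x.drop(columns=["Id"])            # drop the redundant identifier column
```

Keeping the drop of Cover_Type and the drop of redundant columns as separate steps mirrors the order described above, though both could be done in one `drop` call.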
DIFFERENT ALGORITHMS USED
FIRST
We started with a decision tree. The
outcome was not at all fruitful, as it
achieved a score of only 0.60.
SECOND
Then we tested a random forest and got
a satisfactory score of 0.84.
THIRD
Out of curiosity, we also applied Naïve
Bayes and SVM, but got low scores of
0.58 and 0.14 respectively.
FOURTH
Finally, we went ahead with deep
learning and got a score of 0.80, close
to the random forest.
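A hedged sketch of this five-way comparison, on synthetic continuous features. One addition beyond the deck: the SVM is wrapped with StandardScaler, since SVMs are sensitive to unscaled features, which may partly explain the low 0.14 score reported:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the cover-type features: continuous columns,
# labels 1..7 like the real Cover_Type.
rng = np.random.default_rng(0)
x = rng.normal(size=(300, 10))
y = rng.integers(1, 8, size=300)

x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=0)

scores = {}
for name, clf in [
    ("decision tree", DecisionTreeClassifier(random_state=0)),
    ("random forest", RandomForestClassifier(random_state=0)),
    ("naive bayes", GaussianNB()),
    ("svm", make_pipeline(StandardScaler(), SVC())),  # scale before SVM
    ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                          random_state=0)),
]:
    clf.fit(x_tr, y_tr)
    scores[name] = clf.score(x_te, y_te)  # mean accuracy on held-out split
```

GaussianNB replaces the MultinomialNB used for the recipe problem because these features are continuous rather than binary counts.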
ALGORITHM 1
▫ Applying a decision tree to
our problem.
▫ We got a score of 0.60
ALGORITHM 2
▫ Applying a random forest to
our problem.
▫ We got a score of 0.84
ALGORITHM 3
▫ Applied Naïve Bayes and
SVM to it.
▫ But did not get fruitful
results, as the problem is not
well suited to a probabilistic
model.
ALGORITHM 4
▫ At last, applying deep
learning to our problem.
▫ We got a score of 0.80
DATA COMPARISON
ALGORITHM       SCORE
DECISION TREE   0.60
NAÏVE BAYES     0.58
RANDOM FOREST   0.84
SVM             0.14
DEEP LEARNING   0.80
0.84
Final score achieved
THANKS!
