Oleksandr Makoveychuk. Topic: "Science is the new black, or Theoretical Physics and Computer Vision"

The document discusses how applying concepts from diverse fields like theoretical physics, evolution theory, special relativity, and graph theory can help solve problems in computer vision and data analysis. It provides examples of how representing data as multidimensional spaces, space-time cubes, and graphs allows for better understanding of topics like facial feature analysis, object counting, image segmentation, and food recognition. The author advocates drawing inspiration from different areas and finding optimally beautiful solutions.

Oleksandr Makoveychuk 1
Science is the new black, or
Theoretical Physics and Computer Vision
Why we need math geeks in IT

Oleksandr Makoveychuk 2
My Experience
• Theoretical physicist by education
• PhD in Engineering Sciences (military equipment with focus on satellite imagery)
• 25 years of overall experience in software engineering
• 15 years in R&D in the field of computer vision technologies
• 8 marathons - 7 full marathons and 1 ultramarathon

Oleksandr Makoveychuk 3
The Art of Science
Clarke's Third Law:
Any sufficiently advanced technology is indistinguishable from magic.
…It is also a kind of art.
Image source: Jay Mark Johnson

Oleksandr Makoveychuk 4
How we approach unfamiliar problems
• Make sense of data!
• Look for inspiration in other fields – from evolution to special relativity to graphs.
• The most beautiful solution is probably the best one.

Oleksandr Makoveychuk 5
How Evolution Theory helps us understand
multidimensional data

Oleksandr Makoveychuk 6
Understanding multidimensional data
Problem: multidimensional tabular data is hard to understand
Idea: use familiar multidimensional spaces

Oleksandr Makoveychuk 7
Experiment: Body measurement and making sense
of countless numbers
Person Height Neck Chest Waist Hips Sleeve Leg …
OM 180 41 103 103 108 59 86 …
RS 192 41 109 96 110 66 80 …
AK 179 45 132 135 134 55 75 …
VK 176 34 91 78 95 59 73 …
DY 165 40 107 104 115 50 75 …
AG 179 36 96 77 97 60 87 …
TK 189 38 98 83 99 62 79 …
IK 184 40 97 82 100 50 80 …
SD 178 37 101 86 99 59 76 …
… … … … … … … … …

Oleksandr Makoveychuk 8
Star diagram representation
Each feature is a radial line in a data point's star diagram
• Hard to track features, especially of different scales
• Hard to describe shapes in a meaningful way => hard to reason about them

Oleksandr Makoveychuk 9
Mapping feature space to a new and better one
Old feature space: Measurements
• Height
• Neck
• Chest
• Waist
• Hips
• Sleeve
• Leg
• …
New feature space: Facial features
• Size of face
• Shape of forehead
• Shape of jaw
• Width between eyes
• Vertical position of eyes
• Height of eyes
• Width of eyes
• …

Oleksandr Makoveychuk 10
Chernoff’s faces – the ‘face’ space
• Provides a meaningful ‘cast’ for a multidimensional data point
• Easy to spot outliers
• Easy to spot common features and clusters
• Easier to compare and discuss
• Face visualization instantly makes much more sense

Oleksandr Makoveychuk 11
The beauty of optimality

Oleksandr Makoveychuk 12
How Special Relativity Theory helps us
with counting things

Oleksandr Makoveychuk 13
Experiment: Counting buns on bakery conveyor belt
Buns are moving on a belt,
crossing a certain ‘counting’
line.

Oleksandr Makoveychuk 14
Thinking in space-time paradigm
Problem: count moving objects
• Tracking objects is hard
• Computational cost is high
• Different types of objects require different solutions – monetary cost is high, too
Idea: instead of frame-by-frame representation
use space-time cube representation

Oleksandr Makoveychuk 15
Conveyor belt:
Frame-by-frame representation
A montage of sequential frames that must be processed in frame-by-frame order.
Tracking separate buns is not easy.

Oleksandr Makoveychuk 16
Conveyor belt:
Space-time cube representation
• Video frames stacked into a space-time cube
• Counting cuts through the cube along the time dimension
(Figure axis: time)

Oleksandr Makoveychuk 17
Conveyor belt: Space-time cube representation
• Video frames stacked into a space-time cube
• Counting cuts through the cube along the time dimension
(Figure axis: time)

Oleksandr Makoveychuk 18
Cutting time in slices
(Figure axis: time)

Oleksandr Makoveychuk 19
Counting the buns
Total buns count: 30

Oleksandr Makoveychuk 20
What else can we count?
• People
• Cars
• Anything that crosses the line!

Oleksandr Makoveychuk 21
Experiment: Counting vehicles
Panels: resulting picture, background, vehicles in foreground

Oleksandr Makoveychuk 22
Counting vehicles: results
Total vehicles count: 9

Oleksandr Makoveychuk 23
How Graph Theory helps us
with image segmentation

Oleksandr Makoveychuk 24
Graph theory in Computer Vision
Problem: image segmentation
• Used everywhere in computer vision applications
• Hard to achieve precise segmentation
Idea: treat the matrix of pixels (the image) as a graph and apply
algorithms from graph theory

Oleksandr Makoveychuk 25
Hypercube (or Tesseract)

Oleksandr Makoveychuk 26
What is a graph
Graph representation of a hypercube
(cube in 4D)

Oleksandr Makoveychuk 27
Image segmentation and background removal

Oleksandr Makoveychuk 28
Experiment: Counting what you eat
Problem: qualitative (what) and quantitative (how much) analysis of food on the plate
• Determining food type is hard
• Determining food amount is even harder
Idea: combine both synthetic (whole-image recognition) and analytic (segment-based recognition) approaches
• Segment the input image into different ‘ingredients’
• Train an ingredient type determination model
• Then train an ingredient volume determination model

Oleksandr Makoveychuk 29
Food database examples

Oleksandr Makoveychuk 30
Learning
1. Load source image
2. Detect plate, exclude background
3. Split image into clusters
4. Merge similar clusters
5. Annotate cluster ingredients
6. Build descriptors
7. Train neural network
Segment labels: Meat, Plate, Potato
Descriptors: Meat, Plate, Potatoes

Oleksandr Makoveychuk 31
Predicting
1. Load source image
2. Detect plate, exclude background
3. Split into clusters
4. Build descriptors for clusters
5. Predict ingredient type with neural network
6. Filter out plate and low confidence segments
7. Estimate volumes, calculate nutritional value for predicted clusters
Predicted segment labels (confidence):
• White Meat (91%)
• Plate (98%)
• Plate (84%)
• Rice (32%)
• Potato (87%)
• Plate (83%)

Oleksandr Makoveychuk 32
Experiment: Background removal
Video source: Dans Surf Videos https://youtu.be/eDFd3tjDN9M

Oleksandr Makoveychuk 33
Experiment: Background removal

Oleksandr Makoveychuk 34
Conclusions
You never know if the best solution
is hiding on the other side of your bookshelf.
Every part of your school knowledge is useful and important.
The Universe is beautiful.
Science is a perfect instrument to see this beauty.

Oleksandr Makoveychuk 35
Thank you!

Editor's Notes

#2: Hello, my name is Oleksandr Makoveychuk, and I am Head of Science at Abto Software, Lviv.
#3: By education I am a theoretical physicist, and I’m very proud of it. As you may expect, this also influences everything I do at work. =) I graduated from Lviv National Ivan Franko University and now hold a doctoral degree in Engineering Sciences. My thesis was about satellite image enhancement. I’ve been involved in applied software development for more than two decades, and the last 15 years have been dedicated to all sorts of computer vision projects. My preferred way of thinking about problems is running. I run a lot: I have finished 4 full-length marathons and 1 ultra, and my next long run, in Kyiv, is just a week from now. All right. Here is what I would like to talk about today.
#4: Clarke's Third Law says that "Any sufficiently advanced technology is indistinguishable from magic". I would rather say that any advanced technology is a kind of art. There are two parts to this idea. The first part of the art lies, of course, in the beauty of the results; that’s the side Clarke was talking about. The second part relates to the process of solving a particular technological problem: it can be seen in the way beautiful solutions are born from a holistic view of the universe and human knowledge. I am very lucky to work in a field where both principles are easy to observe. I very much like this work by Jay Mark Johnson. It combines art and science. A similar picture once inspired me to look at an old task in an absolutely new way. Later you will see what I mean. Together with my team, I mainly work on scientifically demanding projects within the R&D department of our company. We do a lot of computer vision, image and video processing, and of course some machine learning. You’ve heard a lot about ML during the last couple of days, so I’d like to talk about computer vision. We’ve been working on CV projects for almost a decade now, and I want to share with you some of the insights we’ve gathered over this time.
#5: We have seen and worked on many clients’ projects, and we also experiment a lot. Almost all R&D projects are a kind of experiment where you often don’t know exactly where you are going or whether there is a solution. So how do we approach this sort of unfamiliar problem? - First, when dealing with complex multidimensional data, you have to make sense of it. You can spend a couple of hours trying to understand the data and save yourself a couple of days (or even weeks) of fruitless development work. - The second idea is that you should look for beauty. The most beautiful solution to a problem is surprisingly often the best one. - Another idea that really helps us is remembering to look for inspiration in other fields of human knowledge. You never know where the spark may come from. I want to show you some examples based on our actual experiments, where ideas from evolution theory, special relativity and graph theory have been used effectively to solve video processing tasks. So how can Charles Darwin, Albert Einstein and Leonhard Euler help us with computer vision problems?
#6: The first example is about Evolution theory and the difficulties with understanding multidimensional data.
#7: The majority of modern problems involve a huge amount of data. Not only may the number of data points in your data set be huge, but the number of features describing each data point may be huge, too. This is what is called multidimensional data. You can imagine it as a giant spreadsheet where rows are data points, columns are features, and the number of columns is very large. The problem is that this sort of multidimensional tabular data is hard to understand. If we want to make quick progress with the task at hand, we want to be able to see the structure in the data, to notice regularities (or irregularities) in it, maybe to see outliers or clusters forming among the data points. Of course you can start with hard math, build models, and so on, and that’s fine. But we can also just try to see it, literally. Visual thinking is a very powerful tool for a scientist! So here’s an idea: instead of spreadsheets, we can use a visual representation of a multidimensional space that is very familiar to us.
#8: Let’s look at the following experiment. We want to be able to estimate a person’s body measurements from a short video clip of that person plus one or two simple measurements, such as height and waist circumference. How can this be done? We can gather data (go and measure real people) and then train a model using some ML approach. This sounds easy. The problem is that our model may become overfitted to the data we have, especially if we have outliers: people like Tyrion Lannister or Gregor Clegane. Here we have a spreadsheet with the body measurements of a group of people. The first row is my data; you can see my silhouette to the left of the table. Then we have other people’s measurements: height, neck, chest, and so on. Can you spot outliers in this data without using any math?
#9: Here’s one way to do it. What you see here are called star diagrams. They are frequently used for quick visual data assessment. Here each feature (each particular measurement from the table on the previous slide) is a radial line, a spoke, in a data point’s star-like diagram. The numerical value of the feature corresponds to the length of the spoke. This representation already shows us that there is serious variance in the data set. Some stars are much larger (or smaller) than others, and the overall shape is clearly not uniform. We are making progress with understanding the data, right? But this representation isn’t without problems. First, it’s hard to track features, especially ones of different scales. All spokes look alike, and it’s hard to compare a person’s height to their neck circumference. Data normalization can’t completely eliminate this problem. Also, it is hard to describe the shapes in a meaningful way, which means it’s hard to reason about them, especially when the number of features is high, even just more than five. Can we use something more familiar to us, more meaningful than spokes in a star diagram?
#10: Luckily, evolution has provided us with an incredibly powerful visual analyzer that is fine-tuned to recognize the features of human faces. The American mathematician Herman Chernoff used this fact to develop a famous system for multidimensional data visualization in which data dimensions are mapped to facial features. Instead of stars, let’s draw faces, remapping measurements to facial features as shown in the slide. So height becomes the size of the face, neck circumference corresponds to the shape of the forehead, and so on. <look at slide>
#11: So how will our data look in the Chernoff faces representation? Here’s how. <look at slide> What are the benefits here? It is easy to spot outliers: the central face, for example. It’s easy to spot common features and how they form clusters, by face size or shape, for example. Other benefits of Chernoff’s representation: it provides a meaningful ‘cast’ for a multidimensional data point. Face visualization instantly makes much more sense. It is easier to compare and discuss the data. We don’t exactly understand why or how this works on a biological level, but it does. Darwin’s theory suggests that, as a result of millions of years of evolution, the human brain acquired an extraordinary ability to process the features of human faces because this was needed for living in society. So what’s the lesson here? It’s possible to visualize and make sense of multidimensional data. You just have to find a proper representation, preferably one supported by your brain’s ‘software’ and ‘hardware’.
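The remapping from measurements to facial features can be sketched in a few lines. This is only an illustration, not the code used in the talk: the rows are copied from the slide table, the height and neck assignments follow the notes, while the chest and waist assignments, the parameter names and the min-max scaling to the 0..1 range are assumptions made for the example.

```python
# Measurements copied from the slide table (first four rows, first four columns).
people = {
    "OM": {"height": 180, "neck": 41, "chest": 103, "waist": 103},
    "RS": {"height": 192, "neck": 41, "chest": 109, "waist": 96},
    "AK": {"height": 179, "neck": 45, "chest": 132, "waist": 135},
    "VK": {"height": 176, "neck": 34, "chest": 91, "waist": 78},
}

# Height and neck follow the notes; the last two pairings and all parameter
# names are hypothetical, chosen only to follow the order of the slide lists.
FACE_MAP = {
    "height": "face_size",
    "neck": "forehead_shape",
    "chest": "jaw_shape",
    "waist": "eye_spacing",
}


def to_face_space(table, face_map):
    """Rescale every measurement to 0..1 and rename it to a facial parameter."""
    features = list(face_map)
    lo = {f: min(row[f] for row in table.values()) for f in features}
    hi = {f: max(row[f] for row in table.values()) for f in features}
    faces = {}
    for name, row in table.items():
        faces[name] = {
            # A constant column carries no information, so it maps to the middle.
            face_map[f]: (row[f] - lo[f]) / (hi[f] - lo[f]) if hi[f] > lo[f] else 0.5
            for f in features
        }
    return faces


faces = to_face_space(people, FACE_MAP)
for name, params in faces.items():
    print(name, {k: round(v, 2) for k, v in params.items()})
```

A drawing routine would then turn each parameter dictionary into a face; an outlier shows up as a face whose parameters sit at the extremes of the 0..1 range.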
#13: Here’s another example of how other domains of knowledge come to help with CV related tasks. As a theoretical physicist, I especially like this example, and you’ll see why.
#14: This experiment is about counting objects. <look at the slide> Imagine a conveyor belt in a bakery: buns are moving on the belt, and we need to count them with a certain accuracy. Our solution will, of course, be based on computer vision. We need a camera to shoot video of the buns as they cross a predefined line. The slide shows how it’s usually done: we obtain a video, then we track the buns and count their trajectories as they cross the counting line. And this method works. Well, for the most part. What are the issues?
#15: The method depends on accurate object tracking, and tracking objects in time from frame to frame is hard. The computational cost of this task is high, which increases the monetary cost. Additionally, we would really like our solution to be universal. But different types of objects require different solutions, so the need to develop all these solutions makes the monetary cost even higher. Here’s where Einstein’s ideas from special relativity (SRT) come to help: instead of a frame-by-frame representation, let’s use a space-time cube representation.
#16: The traditional approach, where every frame is analyzed individually, requires identifying and tracking every bun or car, which is very computationally demanding and brings down the final counting accuracy.
#17: On the other hand, we can count these objects on the side of the hypercube after cutting it along the counting line. This allows us to move from counting in 3D, as in the traditional approach, to counting in 2D. This significantly simplifies the task and increases accuracy. Computationally, it also means that for every frame we have to process only one row of pixels instead of the whole image. This reduces the computational cost by a factor of about one thousand.
#18: We can think of the video clip not as a sequence of 2D frames following each other as time flies but, in the spirit of SRT, as a hypercube composed of 2D frames stacked one above the other along the third dimension, time, much like Post-it notes stacked in a block. This idea gives us a surprisingly effective algorithm for counting moving objects that cross a certain line.
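A minimal sketch of the cut through the space-time cube, assuming the counting line is a single horizontal pixel row and frames are plain nested lists. The synthetic frames and the names `make_frames` and `space_time_slice` are invented for the illustration; they are not from the talk.

```python
def make_frames(n_frames=8, height=6, width=5, column=2):
    """Synthetic video: one two-pixel-tall object moving down a single column."""
    frames = []
    for t in range(n_frames):
        frame = [[0] * width for _ in range(height)]
        for r in (t - 1, t):  # rows occupied by the object at time t
            if 0 <= r < height:
                frame[r][column] = 1
        frames.append(frame)
    return frames


def space_time_slice(frames, line_row):
    """Keep only the counting-line row of every frame.

    The result is a 2D image: one row per frame (time), one column per pixel
    on the counting line.
    """
    return [list(frame[line_row]) for frame in frames]


frames = make_frames()
slice_image = space_time_slice(frames, line_row=3)
for row in slice_image:
    print("".join("#" if p else "." for p in row))

# Per frame, only `width` pixels are read instead of `height * width`.
saving_factor = (len(frames[0]) * len(frames[0][0])) // len(frames[0][0])
print("pixels read per frame reduced by a factor of", saving_factor)
```

In this toy setup the saving per frame equals the frame height, which is the same reasoning as processing one row of pixels instead of the whole image.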
#19: As you can see, we no longer need to track every bun. Instead of many frames we now have just one long image with all the buns at once. Those of you who have seen the movie Arrival, or read the Ted Chiang story it is based on, may recognize the similarity between our new representation and the holistic way in which the heptapods, the aliens, perceive our world.
#20: Of course, we still have to count buns, but that part is easy.
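One simple way to do the counting on the slice image, sketched under the assumption that the slice has already been turned into a binary mask: count 4-connected blobs with a breadth-first flood fill. The talk does not say which counting method was used; the function name, the toy mask and the `min_area` parameter are hypothetical.

```python
from collections import deque


def count_blobs(mask, min_area=1):
    """Count 4-connected groups of non-zero pixels with at least min_area pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one blob and measure its area.
            area = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                count += 1
    return count


# Toy mask: three blobs with areas 4, 2 and 1.
toy_mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
print(count_blobs(toy_mask))              # all blobs
print(count_blobs(toy_mask, min_area=2))  # ignore blobs smaller than 2 pixels
```

The `min_area` threshold is one way to drop objects that are too small to be of interest.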
#21: How universal is this approach? What else can we count with it? Well, anything that moves and crosses the line! People, cars, bikes. Here’s how the algorithm can be applied to counting vehicles crossing an intersection exit. All we need is to define a counting line <the red line on the slide>. It doesn’t even have to be parallel to the edges of the frame.
#22: Here on the left is the output of the algorithm, and you can see how similar it looks to the buns on the belt. Even a much more complex background isn’t a problem for us: it can be estimated and later removed by simply averaging pixels over time. You can also see how smaller objects such as people or dogs can be filtered out by their size, so only vehicles are left in the rightmost picture. Also note how, almost in the spirit of SRT, the length of an object in the picture doesn’t accurately correspond to its physical length: fast-moving objects shrink along the time coordinate.
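Background removal by averaging over time can be sketched as follows, assuming a grayscale slice image whose rows are time steps and whose columns are pixels on the counting line. The threshold value and the toy data are assumptions chosen for the example, not values from the talk.

```python
def estimate_background(slice_image):
    """Average every counting-line pixel over time (over the rows of the slice)."""
    n_times = len(slice_image)
    n_pixels = len(slice_image[0])
    return [sum(row[c] for row in slice_image) / n_times for c in range(n_pixels)]


def foreground_mask(slice_image, threshold):
    """Mark pixels that differ from the time-averaged background by more than threshold."""
    background = estimate_background(slice_image)
    return [
        [1 if abs(pixel - bg) > threshold else 0 for pixel, bg in zip(row, background)]
        for row in slice_image
    ]


# Toy slice: a static background of 10 / 50 / 90 and one bright object
# crossing the middle pixel at time step 2.
toy_slice = [
    [10, 50, 90],
    [10, 50, 90],
    [10, 200, 90],
    [10, 50, 90],
    [10, 50, 90],
    [10, 50, 90],
]
mask = foreground_mask(toy_slice, threshold=40)
for row in mask:
    print(row)
```

The resulting mask can then be passed to a blob counter, with a minimum-area filter to drop small objects such as people or dogs.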
#23: Now we can easily count the vehicles. So once again, this approach is rather universal. It’s fast, it’s computationally very cheap, and it can be used to count anything that moves. You just have to think of the video clip as a hypercube composed of 2D frames stacked one above the other. By the way, you can see how technology and art get entangled. This same approach was used by New York artist Jay Mark Johnson to produce the beautiful picture of the dancer that I showed you at the beginning of the talk.
#24: OK, now let’s move on to the last section of the talk. At the beginning I mentioned Euler. Now I’ll show how his ideas about graphs can be applied in computer vision for efficient image segmentation and background removal.
#25: Segmentation is the process of finding segments within an image. The parts of the image within a segment are similar in a certain way: they may have a similar color or a common boundary. Segmentation is a fundamental part of computer vision. Recall how, in the previous example, we segmented the hypercube slices to count the buns or vehicles. There are a great many methods of image segmentation. The problem with them is that the fastest algorithms are inaccurate, while the most accurate ones are really slow. We, of course, want the best of both worlds. To achieve this we need to look at the image from a different perspective: instead of treating the image as a matrix of pixels, we look at it as a graph.
#27: Let’s talk real quick about what a graph is. A graph is a structure built out of a number of objects, usually called nodes. Nodes are connected pairwise by edges. Edges represent relations between the nodes and sometimes have numerical values, weights, assigned to them. Here’s a graph representation of a 4-dimensional hypercube, the tesseract. It’s not easy to imagine a 4-dimensional object, but graphs still let us build a model of it. Now how is this useful?
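The tesseract graph is easy to build in code, which shows how a graph can model an object that is hard to picture. In this sketch each vertex is labelled by a 4-bit number (one bit per coordinate), and two vertices share an edge when their labels differ in exactly one bit; the function name is made up for the example.

```python
from itertools import combinations


def hypercube_graph(dim):
    """Return (nodes, edges) of the dim-dimensional hypercube graph."""
    nodes = list(range(2 ** dim))
    edges = [
        (a, b)
        for a, b in combinations(nodes, 2)
        if bin(a ^ b).count("1") == 1  # labels differ in exactly one coordinate
    ]
    return nodes, edges


nodes, edges = hypercube_graph(4)

# Every vertex of a 4D hypercube has exactly four neighbours.
degree = {n: 0 for n in nodes}
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

print(len(nodes), "nodes,", len(edges), "edges")
```

For the tesseract this gives 16 nodes and 32 edges.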
#28: Let’s get back to the matrix of pixels. Consider each pixel to be a node of the graph, connected to its neighbors by edges. For segmentation purposes we look at foreground and background in the picture. Each pixel can belong to either the foreground or the background, with some probability. On the other hand, we can look at each pixel’s affinity to its neighbors. These two observations let us treat the picture as a graph and reformulate the segmentation task in terms of graph theory. This problem is well known and has an effective solution based on the max-flow min-cut theorem. Image segmentation now becomes finding a way to cut the corresponding graph. In the slide you can see the rough steps of the process: we move from image pixels to nodes, then we find the cut, and then we obtain a well-segmented image. Binary foreground-background segmentation can be generalized to multiple types of segments. The next experiment demonstrates the usefulness of segment-based processing as opposed to whole-image processing.
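To make the graph-cut idea concrete, here is a toy sketch on a single row of pixels. It is not the implementation from the talk: the foreground scores (0..100), the smoothness weight and the choice of the Edmonds-Karp max-flow algorithm are assumptions made to keep the example short. Each pixel is linked to a source "S" (foreground) and a sink "T" (background) by its scores, and to its neighbours by the smoothness weight; after the maximum flow is found, the pixels still reachable from the source form the foreground.

```python
from collections import deque


def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp max flow. Returns (flow value, nodes on the source side of the cut)."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edges
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow, set(parent)  # no path left: parent holds the source side
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def segment_row(fg_scores, smoothness):
    """Label each pixel 1 (foreground) or 0 (background) with a minimum cut."""
    capacity = {"S": {}, "T": {}}
    for i, score in enumerate(fg_scores):
        capacity["S"][i] = score            # cut when pixel i becomes background
        capacity[i] = {"T": 100 - score}    # cut when pixel i becomes foreground
    for i in range(len(fg_scores) - 1):
        capacity[i][i + 1] = smoothness     # cut when neighbours get different labels
        capacity[i + 1][i] = smoothness
    cut_cost, source_side = max_flow_min_cut(capacity, "S", "T")
    return cut_cost, [1 if i in source_side else 0 for i in range(len(fg_scores))]


# Pixel 2 has a weak score (40) but sits between strong foreground pixels.
scores = [90, 80, 40, 85, 10, 20]
cut_cost, labels = segment_row(scores, smoothness=30)
print(cut_cost, labels)
```

A plain per-pixel threshold at 50 would label pixel 2 as background; with the neighbour edges in place, the cheapest cut keeps it in the foreground, which is the kind of clean segment boundary the graph formulation is meant to give.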
#29: Suppose we want to calculate the nutritional value of the meal on a plate. The only thing we have is, of course, a picture of the plate, for example one taken with a smartphone. The problem here is twofold. First, we need to recognize the type of food, and there may be several types on the plate at once. This is the ‘what’ question: what’s on the plate? Second, we have to somehow determine the volume of each type of food. This is the ‘how much’ question: how much of the food is on the plate? Both questions are really hard. You have probably noticed the progress made in recent years in recognizing types of objects in images. This is done mostly with synthetic approaches such as deep learning, and the results are very promising. So for the most part we can answer the first, qualitative, question (‘what is on the plate’), especially when the number of food classes is limited. But synthetic approaches are not helpful in the quantitative analysis of the food. They are very good at classification, but they cannot produce a value that describes the physical volume of the food. The idea here is to combine synthetic and analytic approaches. Let’s first segment the image to obtain the areas of the ingredients; this is the analytic part. Then we can recognize these ingredients; this is the synthetic part. Now, knowing the types of ingredients and their areas on the plate, we can estimate their volumes using another pre-trained model, and finally estimate the total nutritional value of the food. Accurate segmentation is crucial here; that’s why we rely on the help of graph theory.
#30: Here are examples of actual photos that we learned to recognize and count calories for. On the right side you can see a plate of food. At the very top there are mashed potatoes with meat. Next is the meat segment, below it the potato segment, and finally, at the bottom, the plate with both food segments removed.
#31: <PREPARE TRANSITIONS> Ok so here is how it works in practice. First the learning part. 1. Load source image 2. Detect plate, exclude background 3. Split image into clusters 4. Merge similar clusters 5. Annotate cluster ingredients 6. Build descriptors 7. Train neural network
#32: Now the prediction part. The first steps are the same as in the learning part. 1. We load the source image 2. We detect the plate and exclude the background 3. We split the image into clusters 4. And merge similar clusters. Then 5. We predict ingredient types using the neural network 6. We filter out the plate and segments with low prediction confidence 7. We estimate the physical volume of each segment using some reference object (a coin, a credit card or the plate itself), and finally calculate their nutritional values and sum them up.
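The filtering step (step 6) can be sketched with the labels and confidences shown on the Predicting slide. The confidence threshold of 0.5 and the function name are assumptions; the talk does not state the threshold that was actually used.

```python
# (label, confidence) pairs as shown on the Predicting slide.
predictions = [
    ("White Meat", 0.91),
    ("Plate", 0.98),
    ("Plate", 0.84),
    ("Rice", 0.32),
    ("Potato", 0.87),
    ("Plate", 0.83),
]


def keep_food_segments(predictions, min_confidence=0.5, ignored=("Plate",)):
    """Drop plate segments and segments predicted with low confidence."""
    return [
        (label, confidence)
        for label, confidence in predictions
        if label not in ignored and confidence >= min_confidence
    ]


kept = keep_food_segments(predictions)
print(kept)
```

Only the segments that survive this filter go on to volume estimation and the nutritional value calculation.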
#33: All right. Here’s another experiment demonstrating the power of graph-based image segmentation. This time we work with video. Here’s a frame from a video clip of a surfer in the waves, and we want to separate her from the background. Let me play the video.
#34: You can see how the water and foam in the background are completely removed in every frame. Note how the surfer’s shadow becomes part of the foreground. It can be removed too, of course.
#35: OK. So what are the conclusions? When dealing with a hard problem, you should take a step back and look at it from the perspective of other domains of human knowledge. This may give results that are effective and beautiful at the same time. You never know if the best solution is hiding on the other side of your bookshelf. Another conclusion is that we must not forget the importance of a holistic way of looking at the world around us. In recent years I’ve heard many times the notion that most of what you are taught in school is useless and that you’ll never apply it in your life. I strongly disagree with this idea. Quite the contrary, I think that every part of your school knowledge is useful and important, and that you can and should apply this knowledge to your problems. The Universe around us is just beautiful. And math, science, is a perfect instrument that helps us truly see this beauty. Thank you for your time.
