Andy Bosyi: Few-shot learning as a trade-off between software development and data science

mindcraft.ai
Few-Shot Learning
History:
1995 - Internet
2000 - Software
2005 - Web
2010 - Machine Learning
2016 - Deep Learning
2018 - Transformers
2020 - Large Models
2022 - Transfer Learning
sudeep.co

Transfer Learning in LLM
- what drove the impact on few-shot learning
- no-code: text instructions
- text embeddings
- data translation
- federated learning
medium.com

Fine-Tuning
- image classification
- object detection
- semantic and instance segmentation
- style transfer
- text classification, NER
COCO dataset

Dataset Generation
- generate or augment problem-specific data using LLMs, diffusion models, etc.
- select a model from HuggingFace
- fine-tune it on the dataset
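
A minimal sketch of the first step, assuming a handful of labeled seed texts per class. `generate_variants` is a hypothetical hook where an LLM (or diffusion model) call would go, and `fake_generator` is an offline placeholder so the sketch runs. Deduplication keeps near-copies from inflating the dataset.

```python
from __future__ import annotations

from typing import Callable


def build_augmented_dataset(
    seeds: dict[str, list[str]],
    generate_variants: Callable[[str, int], list[str]],
    n_per_seed: int = 3,
) -> list[tuple[str, str]]:
    """Expand a few labeled seed texts into a larger labeled dataset."""
    dataset: list[tuple[str, str]] = []
    seen: set[str] = set()
    for label, examples in seeds.items():
        for text in examples:
            # Keep the seed itself plus whatever the generator returns for it.
            for candidate in [text, *generate_variants(text, n_per_seed)]:
                key = candidate.strip().lower()
                if key and key not in seen:  # skip empty strings and duplicates
                    seen.add(key)
                    dataset.append((candidate.strip(), label))
    return dataset


def fake_generator(text: str, n: int) -> list[str]:
    # Hypothetical offline stand-in for an LLM paraphrasing call.
    templates = ["{}", "Please note: {}", "FYI, {}", "Reminder: {}"]
    return [t.format(text) for t in templates[:n]]


seeds = {
    "invoice": ["Payment is due in 30 days"],
    "contact": ["Call me tomorrow"],
}
dataset = build_augmented_dataset(seeds, fake_generator)
print(len(dataset))  # 6: each seed plus two new variants (the plain copy is deduplicated)
```

The resulting `(text, label)` pairs are what a model selected from HuggingFace would then be fine-tuned on.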

Zero Shot Learning
- it is not unsupervised
- train on some classes, then predict new, unseen ones
- multimodal (image and text embeddings)
- human-language prompt to a large model
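
The idea behind embedding-based zero-shot classification is to embed both the input and a plain-language description of each class, then pick the most similar description, so classes absent from training can still be predicted. A minimal sketch: `embed` is a toy bag-of-words stand-in (an assumption) for a pretrained encoder such as CLIP, and the label descriptions are made up.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a pretrained text/image encoder: a bag-of-words vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def zero_shot_classify(text: str, label_descriptions: dict[str, str]) -> str:
    # No labeled examples for these classes: only their descriptions are embedded.
    query = embed(text)
    return max(label_descriptions, key=lambda lbl: cosine(query, embed(label_descriptions[lbl])))


labels = {
    "invoice": "an invoice asking to pay an amount due",
    "complaint": "a customer complaint about a broken or late delivery",
}
print(zero_shot_classify("The delivery was late and the box was broken", labels))  # complaint
```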

One Shot Learning
- template matching
- clustering and finding the closest centroid
- triplet loss and face recognition
- human-language prompt + one example
medium.com/@crimy
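
One-shot face recognition typically pairs a network trained with triplet loss with a nearest-reference lookup, where each person is enrolled with a single embedding. A sketch assuming the embeddings already come from such a network (the vectors below are toy values); `margin` and `threshold` are illustrative hyperparameters.

```python
from __future__ import annotations


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    # Zero once the anchor is closer to the same-identity sample than to the
    # other identity by at least `margin` (squared distances, FaceNet-style).
    return max(sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin, 0.0)


def one_shot_identify(query, references: dict[str, list[float]], threshold: float = 1.0) -> str | None:
    # One reference embedding per identity: return the closest one, or None
    # when even the best match is farther than `threshold` (unknown person).
    name, ref = min(references.items(), key=lambda item: sq_dist(query, item[1]))
    return name if sq_dist(query, ref) <= threshold else None


refs = {"alice": [0.0, 0.0], "bob": [3.0, 3.0]}
print(one_shot_identify([0.1, 0.2], refs))    # alice
print(one_shot_identify([10.0, 10.0], refs))  # None
```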

Few Shot Learning
- recommendation systems
- prototypical networks
- chat models - prompt + examples
- LLM fine-tuning
towardsdatascience.com
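
Prototypical networks classify by distance to per-class prototypes, each prototype being the mean embedding of that class's few support examples. A minimal sketch on hand-made 2-D vectors, assuming the embedding network is already trained; the class names and values are illustrative.

```python
from __future__ import annotations


def prototypes(support: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    # A class prototype is the mean embedding of its few support examples.
    return {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in support.items()
    }


def classify(query: list[float], protos: dict[str, list[float]]) -> str:
    # Assign the query to the class with the nearest prototype (squared Euclidean).
    return min(protos, key=lambda lbl: sum((q - p) ** 2 for q, p in zip(query, protos[lbl])))


support = {
    "cat": [[0.9, 0.1], [1.1, -0.1], [1.0, 0.0]],
    "dog": [[-1.0, 0.2], [-0.8, 0.0], [-1.2, -0.2]],
}
protos = prototypes(support)
print(classify([0.7, 0.3], protos))  # cat
```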

Reinforcement Learning
- translate images into language tokens
- using an autoregressive Transformer to learn a world model
- trained on 2 hours of gameplay
- outperformed humans in 10 of 26 games
ICLR 2023, Transformers are Sample-Efficient World Models
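
In the cited ICLR 2023 work, frames become tokens through a discrete autoencoder before the Transformer sees them. The core lookup is vector quantization: each encoder output vector is replaced by the index of its nearest codebook entry. A sketch with a hand-written codebook and feature vectors (both assumptions; in practice the encoder and codebook are learned).

```python
from __future__ import annotations


def quantize(features: list[list[float]], codebook: list[list[float]]) -> list[int]:
    """Map each encoder output vector to the index of its nearest codebook entry.

    The resulting integer sequence is what an autoregressive Transformer can
    model like a sequence of language tokens.
    """
    tokens = []
    for f in features:
        distances = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frame_features = [[0.1, 0.0], [0.9, 1.1], [0.1, 0.8], [0.0, 0.2]]  # e.g. 4 patches of one frame
print(quantize(frame_features, codebook))  # [0, 1, 2, 0]
```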

Document Classification
- text classification into ~80 categories
- multiple languages
- used BERT and Ada embeddings
- dataset augmentation with GPT-3.5
- simple NN for the classification task
- planning to add fine-tuning
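
A "simple NN" on top of fixed embeddings can be as small as one softmax layer trained with cross-entropy. A pure-Python sketch, assuming the BERT/Ada embeddings are already computed; the 2-D toy vectors and the `epochs`/`lr` values are illustrative stand-ins, and a real setup would more likely use a framework such as PyTorch.

```python
from __future__ import annotations

import math
import random


def softmax(z: list[float]) -> list[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def train(X, y, n_classes, epochs=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(X[0])
    W = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, target in zip(X, y):  # plain SGD, one example at a time
            p = softmax([sum(w * xi for w, xi in zip(W[k], x)) + b[k] for k in range(n_classes)])
            for k in range(n_classes):
                g = p[k] - (1.0 if k == target else 0.0)  # d(cross-entropy)/d(logit_k)
                for j in range(dim):
                    W[k][j] -= lr * g * x[j]
                b[k] -= lr * g
    return W, b


def predict(W, b, x) -> int:
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return logits.index(max(logits))


X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0], [-0.9, -1.1]]
y = [0, 0, 1, 1, 2, 2]
W, b = train(X, y, n_classes=3)
print([predict(W, b, x) for x in X])
```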

Anomaly Detection
- checking financial declarations
- zero shot learning approach
- catches only obvious anomalies
- requires historical and snapshot clustering
open.ai
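
The clustering side can be sketched as a distance check against historical data: items in the current snapshot that lie unusually far from the historical centroid are flagged for review. This is a single-cluster simplification with an assumed threshold of the mean plus `k` standard deviations of the historical distances, not the method from the talk; a fuller version would cluster the history into several groups and compare each item with the nearest one.

```python
from __future__ import annotations

import math
import statistics


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flag_anomalies(history, snapshot, k=3.0):
    """Return indices of snapshot items far from the historical centroid.

    `history` and `snapshot` hold feature vectors (e.g. embeddings or numeric
    fields of declarations); the mean + k * stdev threshold is illustrative.
    """
    c = centroid(history)
    hist_d = [dist(v, c) for v in history]
    threshold = statistics.mean(hist_d) + k * statistics.pstdev(hist_d)
    return [i for i, v in enumerate(snapshot) if dist(v, c) > threshold]


history = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 0.9], [1.0, 1.1]]
snapshot = [[1.05, 1.0], [5.0, 5.0]]
print(flag_anomalies(history, snapshot))  # [1]
```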

Custom Assistant
- replacing a categorization bot with a human-language one
- collected a dataset of ~2k Q&A pairs
- fine-tuned OpenAI Davinci
- spent $300 on OpenAI
- such a system can only gently suggest, not decide
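
The ~2k Q&A pairs have to be serialized before fine-tuning. A sketch assuming the prompt/completion JSONL layout (one JSON object per line) used by OpenAI's legacy fine-tuning of base models such as Davinci; the `SEPARATOR` and `STOP` strings are illustrative conventions, so the current OpenAI documentation should be checked before uploading anything.

```python
from __future__ import annotations

import json

SEPARATOR = "\n\n###\n\n"  # marks where the prompt ends (illustrative choice)
STOP = " END"              # marks where the completion ends (illustrative choice)


def to_jsonl(pairs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs as prompt/completion JSONL lines."""
    lines = []
    for question, answer in pairs:
        record = {
            "prompt": question.strip() + SEPARATOR,
            # Completion starts with a space and ends with a fixed stop marker.
            "completion": " " + answer.strip() + STOP,
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


print(to_jsonl([("How do I reset my password?", "Use the 'Forgot password' link.")]))
```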

NER
- NEs are collected by pattern search (RegEx)
- validated if possible
- used GPT-3 for context matching
- alternatively, used GPT-3.5 for direct search
open.ai
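
Pattern search plus validation works best when the entity carries a checksum. A sketch using IBANs as an assumed example entity type (the talk does not say which entities were extracted): a regex collects candidates, and the standard IBAN mod-97 check discards false positives. IBANs written with spaces are not handled here.

```python
from __future__ import annotations

import re

# Candidate IBANs: country code, two check digits, 11-30 alphanumerics (no spaces).
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")


def iban_is_valid(iban: str) -> bool:
    # Move the first four characters to the end, map letters to numbers
    # (A=10 ... Z=35) and require the resulting integer mod 97 to equal 1.
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def extract_ibans(text: str) -> list[str]:
    # Pattern search first, then keep only candidates that pass validation.
    return [m for m in IBAN_RE.findall(text) if iban_is_valid(m)]


text = "Pay GB82WEST12345698765432 or DE89370400440532013001 by Friday."
print(extract_ibans(text))  # ['GB82WEST12345698765432']
```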

Document Scope
- factorize the document using few-shot learning on GPT-3.5
- create a template by clustering Ada embeddings
- assign names with zero-shot learning on GPT-3.5
- check document portions against the template
open.ai
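
The template step can be sketched with k-means: cluster the embeddings of portions taken from many documents, treat each centroid as one template section, then report which sections a new document lacks. Plain Python with deterministic first-k initialization; the 2-D vectors stand in for Ada embeddings, `max_dist` is an illustrative threshold, and the GPT-3.5 naming step is not shown.

```python
from __future__ import annotations


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    # First-k initialization keeps the sketch reproducible; real use would
    # prefer k-means++ or a library implementation.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; keep it if the cluster is empty.
        centroids = [
            [sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


def missing_sections(template, portions, max_dist):
    # A section counts as present if some portion lies within max_dist
    # (squared distance) of its centroid.
    return [
        i for i, c in enumerate(template)
        if not any(sq_dist(p, c) <= max_dist for p in portions)
    ]


points = [[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [0.2, 0.1], [5.1, 4.9],
          [0.1, 5.2], [-0.1, 0.0], [4.9, 5.1], [0.0, 4.8]]
template = kmeans(points, k=3)
print(missing_sections(template, [[0.1, 0.0], [5.0, 5.1]], max_dist=1.0))  # [2]
```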

Fixing UNSPSC
- old rule-based system
- collecting embeddings (BERT, Ada)
- XGBoost for classification
- 400+ classes, almost 80% accuracy

Address Normalization and Deduplication
- few-shot learning on GPT-3.5
- similarity check with Levenshtein distance
- moving to a model from HuggingFace
- saved 2 months of work
open.ai
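
The similarity check can be sketched with a two-row dynamic-programming Levenshtein distance and a greedy pass that keeps the first of any group of near-identical addresses. This assumes the addresses were already normalized (the GPT-3.5 step) and uses an illustrative `max_ratio` of 0.2 relative to the longer string.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    # Dynamic programming over two rows; insert, delete and substitute cost 1.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def dedupe(addresses: list[str], max_ratio: float = 0.2) -> list[str]:
    """Keep the first of any group of near-identical normalized addresses."""
    kept: list[str] = []
    for addr in addresses:
        key = addr.lower().strip()
        if not any(
            levenshtein(key, k.lower().strip()) <= max_ratio * max(len(key), len(k))
            for k in kept
        ):
            kept.append(addr)
    return kept


print(dedupe(["12 Main Street, Springfield", "12 Main Stret, Springfield",
              "40 Oak Avenue, Shelbyville"]))
# ['12 Main Street, Springfield', '40 Oak Avenue, Shelbyville']
```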

Future of Our Job
- Data preparation
- Fine-Tuning
- Edge Models
- Prompt Engineering
promptbase.com

This is MindCraft
Decision-making Engines for Data-driven Businesses, especially:
- Document and Web Page Classification and Capture (NLP, CNN, CV, NER)
- Price Prediction (DNN, Regression, Prognosis)
- Command Centers for IoT systems (RNN, Time Series, Anomaly Detection)
- Computer Vision and Object Detection
- Data Analysis and Generation
