Classification Fine-Tuning (Multi-Class & Multi-Label)

Classification Fine-tuning
Team 3
Samsung SDS
2024.11.15

Agenda
Overview
Dataset
Model & Test
Demo
Result & Conclusion

Overview
Project Goal & Background
01
AI
Experts

Dataset
Stack Exchange
Stack Overflow
Tag Clustering
02

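02 Dataset - Scenario
The existing taxonomy distinguishes only #WORK and #LIFE
> Use the Stack Exchange Dataset (reflecting feedback from the mid-term presentation)
For software troubleshooting questions within #WORK
> Use the Stack Overflow Dataset
This lets us test, validate, and apply the approach in a comparable setting without using internal company data directly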
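Read together with the Stage 1 and Stage 2 slides, the scenario amounts to a two-stage cascade: a Stage 1 model picks exactly one domain, and only software questions go on to a Stage 2 multi-label model. A minimal sketch, assuming the cascade routes only questions labeled `tech` to Stage 2; `toy_stage1` and `toy_stage2` are hypothetical keyword stand-ins, not the fine-tuned models evaluated in this project.

```python
from typing import Callable, List, Tuple

# Hypothetical model interfaces; a real system would call the fine-tuned models.
Stage1 = Callable[[str], str]          # returns exactly one domain class
Stage2 = Callable[[str], List[str]]    # returns zero or more cluster labels


def classify(text: str, stage1: Stage1, stage2: Stage2) -> Tuple[str, List[str]]:
    """Run Stage 1; forward only 'tech' questions to Stage 2 (assumed routing rule)."""
    domain = stage1(text)
    if domain != "tech":
        return domain, []
    return domain, stage2(text)


def toy_stage1(text: str) -> str:
    # Keyword rule for illustration only.
    t = text.lower()
    return "tech" if "python" in t or "sql" in t else "cooking"


def toy_stage2(text: str) -> List[str]:
    # Keyword rule for illustration only.
    t = text.lower()
    labels = []
    if "sql" in t:
        labels.append("Databases")
    if "python" in t:
        labels.append("Data Science")
    return labels


print(classify("python: How to fix this ValueError?", toy_stage1, toy_stage2))
print(classify("What is the difference between baking soda and baking powder?", toy_stage1, toy_stage2))
```

Keeping the two stages as separate callables means either model can be swapped (for example, a different Stage 2 model per business unit) without touching the routing logic.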

02 Stage 1 Classification - Stack Exchange
A Q&A platform for asking and answering questions on a wide range of topics
The Stack Exchange network includes Stack Overflow

02 Stage 1 Classification - Stack Exchange Data Sample (Multi-Class Classification)
Exactly one label is selected from the 5 classes ['biology', 'cooking', 'diy', 'travel', 'tech']
Text | Label
How is RNAse contamination in RNA based experiments prevented? | biology
Are there ways to determine if a wall is load bearing? | diy
What is the difference between baking soda and baking powder? | cooking
python: How to fix this "ValueError"? | tech (relabeled from Stack Overflow)
What are some Caribbean cruises for October? | travel
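In multi-class classification the model scores all five classes, a softmax turns the scores into probabilities that sum to 1, and the single highest-probability class is the prediction. A minimal sketch; the logits are illustrative values, not outputs of any model in this study.

```python
import math
from typing import List

CLASSES = ["biology", "cooking", "diy", "travel", "tech"]


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_single_label(logits: List[float]) -> str:
    """Probabilities sum to 1 and exactly one class is chosen (argmax)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best]


print(predict_single_label([0.1, 2.3, 0.4, -1.0, 0.7]))  # cooking
```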

02 Stage 2 Classification - Stack Overflow
The world's largest developer community, with more than 59 million questions and answers

02 Exploring a Fix for the Original Dataset's Problem - Tag Clustering
K-means clustering: choose K based on the Silhouette Score, then re-run the clustering
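The slide does not say how tags were turned into vectors or which range of K was searched. The sketch below assumes each tag is already a numeric vector (for example a co-occurrence or embedding vector) and uses toy 2-D points. For each candidate K it runs a small deterministic k-means and keeps the K with the highest mean silhouette score s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the point's own cluster and b(i) the mean distance to the nearest other cluster. scikit-learn's `KMeans` and `silhouette_score` would do the same job in practice.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels: Optional[List[int]] = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def best_k(points: List[Point], k_values: range) -> Tuple[int, Dict[int, float]]:
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy "tag vectors": three well-separated groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
k, scores = best_k(pts, range(2, 6))
print(k)  # 3
```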

02 Dataset - Tag Clustering Result: Re-grouped into 9 Clusters
Cluster | Tags | Cluster Name | Description
0 | ['sql', 'dataframe', 'sql-server', 'database', 'firebase', 'postgresql', 'mysql', 'sqlite', 'oracle'] | Databases | Tags related to database management and data processing
1 | ['css', 'html', 'javascript', 'jquery', 'json', 'angularjs', 'angular', 'twitter-bootstrap', 'reactjs', 'vue.js', 'typescript', 'ecmascript-6', 'webpack', 'ajax'] | Frontend | Frontend web development and related frameworks
2 | ['visual-studio-code', 'visual-studio', 'android-studio', 'xcode', 'docker', 'git', 'windows', 'vb.net', 'c#'] | Dev Tools | Tags related to development tools and environments
3 | ['bash', 'shell', 'linux', 'file', 'perl', 'pointers', 'c', 'date', 'datetime', 'winforms', 'spring', 'spring-boot', 'selenium', 'android', 'vba'] | Systems | Tags related to systems programming, scripting, and automation
4 | ['ios', 'react-native', 'objective-c', 'macos', 'swift'] | iOS/macOS | Tags related to iOS and macOS application development
5 | ['tensorflow', 'machine-learning', 'python-3.x', 'pandas', 'apache-spark', 'numpy', 'arrays', 'r', 'dictionary', 'dataframe', 'excel', 'csv', 'python', 'python-2.7'] | Data Science | Tags related to data science, machine learning, and data processing
6 | ['scala', 'kotlin', 'go', 'dart', 'flutter', 'node.js', 'npm', 'android-layout'] | Mobile | Mobile and cross-platform development languages and frameworks
7 | ['php', 'laravel', 'amazon-web-services', 'wordpress', 'asp.net', 'asp.net-mvc', 'asp.net-core', '.net-core', 'ruby', 'ruby-on-rails', 'django', '.net'] | Backend | Tags related to backend and web frameworks
8 | ['algorithm', 'sorting', 'loops', 'function', 'for-loop', 'c++11', 'java', 'multithreading', 'regex', 'list', 'class', 'string', 'if-statement', 'xml', 'c++'] | Algorithms | Tags related to algorithms and data structures
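One plausible way to turn this table into multi-label targets is to take, for each question, the union of the clusters of its Stack Overflow tags and encode it as a 0/1 vector over the 9 cluster names. A minimal sketch, assuming that union rule; `TAG_TO_CLUSTERS` is a hypothetical subset of the table above, and the example tags are illustrative rather than the actual tags of any sample.

```python
from typing import Dict, List, Set

# Hypothetical subset of the cluster table. 'dataframe' appears under both
# Databases and Data Science in the table, so a tag may map to several clusters.
TAG_TO_CLUSTERS: Dict[str, Set[str]] = {
    "sql": {"Databases"},
    "dataframe": {"Databases", "Data Science"},
    "r": {"Data Science"},
    "algorithm": {"Algorithms"},
    "android": {"Systems"},
    "node.js": {"Mobile"},
    "javascript": {"Frontend"},
}

LABELS = ["Databases", "Frontend", "Dev Tools", "Systems", "iOS/macOS",
          "Data Science", "Mobile", "Backend", "Algorithms"]


def tags_to_multi_hot(tags: List[str]) -> List[int]:
    """Union of the clusters of all known tags, as a 0/1 vector over LABELS."""
    clusters: Set[str] = set()
    for tag in tags:
        clusters |= TAG_TO_CLUSTERS.get(tag, set())  # unknown tags are ignored
    return [1 if label in clusters else 0 for label in LABELS]


print(tags_to_multi_hot(["r", "algorithm"]))  # [0, 0, 0, 0, 0, 1, 0, 0, 1]
```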

02 Stage 2 Classification - Stack Overflow Data Sample (Multi-label Classification)
One or more labels are selected from the 9 labels (the Tag Clustering result)
Text | Label (Clustering)
How to convert a float into int using round method in ms sql?I tried with <code> select ROUND(1235.53) --(It can contain "n" digit of scale)</code>But got error | Databases
R: find missing data and add it with a zero I have the following set of data:<a href="https://i.stack.imgur.com/wYwVy.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/wYwVy.png" alt="enter image | Data Science, Algorithms
Get value from Bottom Sheet Dialog FragmentI'm starting bottomSheetDialogFragment from a fragment A.I want to select the date from that bottomSheetDialogFragment then set it in the fragment A.The select date is already done, I just want to get it in the fragment A to set it in some fields.How can I get the value?Any suggestions how to do it? | Systems
node.js How recieve field value?with node.js i create server. Than i build http form on the specific adress(i gona do diferent form for diferent adress). I want to recieve data from the user in specific field(i give them diferent ID). function auth(res) { res.writeHead(200, { 'Content-Type': 'text/html', }); var body = ''; body= '<form action="/" method="post">'+ '<thead>Connection details </thead>' + ' '+ ' | Frontend, Mobile
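Multi-label classification replaces the single softmax with an independent sigmoid per label, so any number of labels can be switched on at once; such heads are typically trained with a per-label binary cross-entropy loss. A minimal sketch, assuming a 0.5 decision threshold (the slides do not state the threshold used); the logits are illustrative values chosen to reproduce the last sample's "Frontend, Mobile" labels.

```python
import math
from typing import List

LABELS = ["Databases", "Frontend", "Dev Tools", "Systems", "iOS/macOS",
          "Data Science", "Mobile", "Backend", "Algorithms"]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_multi_label(logits: List[float], threshold: float = 0.5) -> List[str]:
    """Each label is an independent yes/no decision; keep every label above the threshold."""
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Illustrative logits, not real model outputs.
logits = [-3.0, 1.2, -2.5, -1.0, -4.0, -0.5, 0.8, -2.0, -3.5]
print(predict_multi_label(logits))  # ['Frontend', 'Mobile']
```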

Model & Test
Model
Test
Result
03

03 “Medium” LMs of Code in the Era of LLMs: Lessons From StackOverflow
SOBertBase
When a small language model pre-trained on domain-specific data
is fine-tuned on a downstream task,
it performs comparably to, or better than, larger models
125M Parameters
A BERT model pre-trained on 27B tokens from Stack Overflow
Reference: Mukherjee, M., & Hellendoorn, V. J. (2023). "Medium" LMs of Code in the Era of LLMs: Lessons From StackOverflow. arXiv preprint arXiv:2306.03268.

03 Models
Llama 3.2
Mamba-2
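Large-scale training data: pre-trained on large natural-language datasets, enabling fine-grained language understanding and generation
Multitask learning: applicable not only to text generation but also to text classification, summarization, translation, and other tasks
Scalability: can be optimized for a specific domain through fine-tuning and transfer learning
An SSM (state space model)-based model rather than a Transformer
S6: Structured State Space sequence model (S4) + Selective Scan
Strengths: fast inference (5× higher throughput than Transformers) and linear scaling in sequence length
Lightweight model: parameter optimization supports fast, efficient data processing
Modular structure: can be configured for different tasks and datasets
Low memory usage: flexible to deploy even on low-spec hardware, well suited to real-time data processing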
Reference: Gu, A., & Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752.
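To make "selective scan" and "linear scaling in sequence length" concrete: a selective SSM processes the sequence with one recurrence step per token, and its step size depends on the current input. The sketch below is a deliberately tiny scalar version under stated assumptions (one input channel, one state dimension, hand-picked A, B, C, and the simplified discretisation exp(delta·A) for A and delta·B for B). It is not the actual Mamba or Mamba-2 layer, which uses multi-dimensional states and hardware-aware GPU kernels.

```python
import math
from typing import List


def softplus(z: float) -> float:
    # Numerically stable log(1 + e^z).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs: List[float], A: float = -1.0, B: float = 1.0, C: float = 1.0,
                   w_delta: float = 1.0) -> List[float]:
    """Toy 1-D selective SSM: a single pass, O(L) time and O(1) state.

    The step size delta_t depends on the input x_t (the "selection");
    a larger delta_t lets x_t overwrite more of the state.
    """
    h = 0.0
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)   # input-dependent step size (> 0)
        a_bar = math.exp(delta * A)     # A < 0 keeps the recurrence stable
        b_bar = delta * B               # simplified discretisation of B
        h = a_bar * h + b_bar * x       # state update
        ys.append(C * h)                # readout
    return ys


print(selective_scan([1.0, 0.0, -1.0, 2.0]))
```

Because each step only touches the previous state, cost grows linearly with sequence length, whereas self-attention compares every pair of tokens.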

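03 LLM Embedding: Rethinking Lightweight LLM’s Genuine Function in Text Classification
A multi-LLM architecture that improves performance by extracting and fusing text embeddings from several lightweight LLMs
Its optimized training procedure allows fast training, and it reaches competitive accuracy on major benchmarks
Fusing embeddings extracted from different backbones and depths improves robustness and generalization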
Reference: Liu, C., Zhang, H., Zhao, K., Ju, X., & Yang, L. (2024). LLMEmbed: Rethinking Lightweight LLM's Genuine Function in Text Classification. arXiv preprint arXiv:2406.03725.

03 LLM Embedding - Extracting text embeddings efficiently with lightweight LLMs
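The core idea of fusing embeddings "from different backbones and depths" can be sketched in a few lines: mean-pool the token vectors of selected layers, average over those layers, and concatenate the per-backbone vectors into one feature vector for a small classifier. This is a simplified illustration under assumptions, not the exact LLMEmbed fusion; the backbone names, layer choices, and tiny hidden states below are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def mean_pool(token_vectors: List[Vector]) -> Vector:
    """Average the token embeddings of one layer into a single sentence vector."""
    n = len(token_vectors)
    return [sum(col) / n for col in zip(*token_vectors)]


def fuse(hidden_states: Dict[str, List[List[Vector]]], layers: Dict[str, List[int]]) -> Vector:
    """Mean-pool the selected layers of each backbone, average over depth,
    then concatenate the per-backbone vectors in a fixed (sorted) order."""
    fused: Vector = []
    for name in sorted(hidden_states):
        pooled = [mean_pool(hidden_states[name][i]) for i in layers[name]]
        depth_avg = [sum(col) / len(pooled) for col in zip(*pooled)]
        fused.extend(depth_avg)
    return fused


# hidden_states[backbone][layer] = list of token vectors (hypothetical values).
states = {
    "backbone_a": [[[1.0, 1.0], [3.0, 3.0]], [[0.0, 0.0], [2.0, 2.0]]],
    "backbone_b": [[[4.0, 0.0, 2.0]]],
}
print(fuse(states, {"backbone_a": [0, 1], "backbone_b": [0]}))  # [1.5, 1.5, 4.0, 0.0, 2.0]
```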

03 Test 1 - Stage 1 Classification (Multi-Class Classification) (Validation Dataset, 5 epochs)

03 Result: Stage 1 Classification (Multi-Class Classification)
Model | # of parameters | Accuracy | Precision | Recall | F1 | Training Time
GPT-4o mini | - | 0.9070 | 0.9070 | 0.9070 | 0.9070 | -
RoBERTa with LoRA | 355M | 0.9670 | 0.9670 | 0.9670 | 0.9670 | 21:01
Llama 3.2 1B with LoRA | 1.23B | 0.9780 | 0.9780 | 0.9780 | 0.9780 | 12:47
SOBertBase | 125M | 0.8990 | 0.8990 | 0.8990 | 0.8990 | 4:09
mamba2-130m | 130M | 0.9680 | 0.9680 | 0.9680 | 0.9680 | 19:02
mamba2-780m | 780M | 0.9770 | 0.9770 | 0.9770 | 0.9770 | 40:05
Embedded LLM (Llama2 7B) | 7B + 355M + 110M | 0.9810 | 0.9810 | 0.9810 | 0.9810 | 3:43
Embedded LLM (Llama3.2 1B) | 1B + 355M + 110M | 0.9530 | 0.9530 | 0.9530 | 0.9530 | 2:19
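In every row of this table, Accuracy, Precision, Recall, and F1 are identical. The slides do not state the averaging mode, but this is exactly what happens if the metrics are micro-averaged over a single-label task: each wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so all three scores collapse to accuracy. A minimal sketch with illustrative labels, assuming micro-averaging.

```python
from collections import Counter
from typing import List, Tuple


def micro_prf(y_true: List[str], y_pred: List[str]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 from per-class TP/FP/FN counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gets a false positive
            fn[t] += 1  # the true class gets a false negative
    TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = ["tech", "cooking", "diy", "travel", "tech"]
y_pred = ["tech", "cooking", "tech", "travel", "biology"]
print(accuracy(y_true, y_pred), micro_prf(y_true, y_pred))  # all four values are 0.6 (up to rounding)
```

Macro-averaging (per-class scores averaged with equal weight) would generally give different Precision, Recall, and F1 values.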

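03 Result: Stage 1 Classification (Multi-Class Classification)
An easy task for GPT-4o mini, even without any training
Small models need fine-tuning to handle this task
SOBertBase, a model specialized for the tech domain,
performs relatively poorly on Stage 1 classification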

03 Test 2 - Stage 2 Classification (Multi-label Classification) (Validation Dataset, 5 epochs)

03 Result: Stage 2 Classification (Multi-label Classification)
Model | # of parameters | Accuracy | Partial Accuracy | Precision | Recall | F1 | Training Time
GPT-4o mini | - | 0.3603 | 0.8551 | 0.4784 | 0.4915 | 0.3852 | -
RoBERTa with LoRA | 355M | 0.6307 | 0.9473 | 0.6918 | 0.8188 | 0.7256 | 0:47:54
Llama 3.2 1B with LoRA | 1.23B | 0.7103 | 0.9618 | 0.7322 | 0.8922 | 0.7855 | 4:06:26
SOBertBase | 125M | 0.6926 | 0.9584 | 0.7247 | 0.8941 | 0.7849 | 1:35:19
mamba2-130m | 130M | 0.6970 | 0.9593 | 0.7268 | 0.8889 | 0.7814 | 0:43:12
mamba2-780m | 780M | 0.7213 | 0.9631 | 0.7501 | 0.8944 | 0.8010 | 4:15:47
Embedded LLM (Llama2 7B) | 7B + 355M + 110M | 0.6863 | 0.9576 | 0.7470 | 0.8717 | 0.7924 | 0:39:33
Embedded LLM (Llama3.2 1B) | 1B + 355M + 110M | 0.6507 | 0.9514 | 0.7076 | 0.8379 | 0.7504 | 0:24:58
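In every row, Accuracy is far below Partial Accuracy. The slides do not define the two columns; one plausible reading is exact-match (subset) accuracy, where a sample counts only if all 9 labels are right, versus per-label (Hamming-style) accuracy, where each of the 9 yes/no decisions is scored separately. A minimal sketch of both under that assumption, with illustrative label vectors.

```python
from typing import List

Vector = List[int]


def exact_match_accuracy(y_true: List[Vector], y_pred: List[Vector]) -> float:
    """A sample counts as correct only if every label matches."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def hamming_accuracy(y_true: List[Vector], y_pred: List[Vector]) -> float:
    """Fraction of individual (sample, label) decisions that are correct."""
    correct = sum(a == b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    total = sum(len(t) for t in y_true)
    return correct / total


# Three samples over the 9 cluster labels (illustrative values only).
y_true = [[1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 1], [0, 1, 0, 0, 0, 0, 1, 0, 0]]
y_pred = [[1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0, 1, 0, 0]]
print(exact_match_accuracy(y_true, y_pred))  # 2/3: one missed label fails the whole sample
print(hamming_accuracy(y_true, y_pred))      # 26/27: only one of 27 decisions is wrong
```

Because most labels are 0 for most questions, the per-label score stays high even when many samples miss a label, which is consistent with the gap between the two columns.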

03 Result: Stage 2 Classification (Multi-label Classification)
A difficult task even for GPT-4o mini
Fine-tuning improves performance substantially

03 LLM Size vs. Training Time
The Embedded model trains fastest in every training run
LoRA training speeds up training
(Llama 3.2 1B, RoBERTa)
Training time grows with model size
(mamba2-130m vs. mamba2-780m)
Transformer training is faster for single-label classification,
while Mamba-2 training is faster for multi-label classification
(SOBertBase vs. mamba2-130m)
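LoRA keeps the pretrained weight W frozen and trains only a low-rank update, so the layer computes h = W·x + (alpha / r)·B·(A·x), with A of shape r × d_in and B of shape d_out × r. Only A and B receive gradients, which is why LoRA needs far less optimizer memory. A minimal pure-Python sketch of that forward pass and of the trainable-parameter count, assuming a 1024 × 1024 projection (the hidden size of RoBERTa-large) and a rank r = 8 chosen for illustration; the slides do not report the LoRA rank or scaling used.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(M: Matrix, v: List[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: List[float],
                 alpha: float, r: int) -> List[float]:
    """h = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    base = matvec(W, x)               # frozen pretrained path, d_out x d_in
    update = matvec(B, matvec(A, x))  # low-rank path: A is r x d_in, B is d_out x r
    scale = alpha / r
    return [h + scale * u for h, u in zip(base, update)]


def trainable_params(d_out: int, d_in: int, r: int) -> Tuple[int, int]:
    """(full fine-tuning parameters, LoRA parameters) for one weight matrix."""
    return d_out * d_in, r * (d_in + d_out)


print(trainable_params(1024, 1024, 8))  # (1048576, 16384)
print(lora_forward([[1, 0], [0, 1]], [[1, 1]], [[1], [0]], [1, 2], alpha=2.0, r=1))  # [7.0, 2.0]
```

LoRA usually initializes B to zeros, so at the start of training the layer behaves exactly like the frozen pretrained layer.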

03 Resource - GPU Usage by Model
The Embedded LLM barely uses the GPU during training (training is optimized)
LoRA fine-tuning tends to show higher GPU utilization while using less memory
(* overall training time decreases)

Demo
Stage 1 classification: LLM Embed, SOBert
Stage 2 classification: Mamba-2, SOBert
04

Conclusion
Quantitative Methods
05

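05 Discussion
Compared with recent lightweight models (Llama3.2 1B, Mamba2), the Embedded LLM is not competitive in terms of performance
On multi-label classification, which this study attempted for the first time, its performance was somewhat lower.
The embedding models are not needed during training, but the full model is needed for real-time inference
The Embedded LLM architecture can be studied and used for a variety of purposes
Precomputing and storing the models' embeddings makes training a lightweight downstream model fast
Subsequent model training remains possible even in resource-constrained environments (e.g., research, experiments, prototyping, education, technical demos, distributed computing environments)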
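The "precompute once, train a light head many times" pattern behind these points can be sketched as follows: the expensive frozen encoder runs exactly once per text, and every training epoch afterwards reads only the cached vectors. This is a toy under stated assumptions: `expensive_embed` is a hypothetical keyword stand-in for the frozen LLM encoders, and a multi-class perceptron stands in for LLMEmbed's actual downstream network.

```python
from typing import Dict, List, Tuple

CALLS = {"embed": 0}  # counts how often the "expensive" encoder runs


def expensive_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a frozen LLM encoder (the costly, GPU-bound step)."""
    CALLS["embed"] += 1
    t = text.lower()
    return [float("sql" in t), float("python" in t), float("css" in t), 1.0]  # last entry: bias


def build_cache(texts: List[str]) -> Dict[str, List[float]]:
    # Run the frozen encoder once per text and store the vectors.
    return {t: expensive_embed(t) for t in texts}


def predict(weights: Dict[str, List[float]], x: List[float]) -> str:
    return max(weights, key=lambda lab: sum(w * v for w, v in zip(weights[lab], x)))


def train_perceptron(cache: Dict[str, List[float]], data: List[Tuple[str, str]],
                     labels: List[str], epochs: int = 10) -> Dict[str, List[float]]:
    """Multi-class perceptron over cached embeddings; the encoder is never called here."""
    dim = len(next(iter(cache.values())))
    weights = {lab: [0.0] * dim for lab in labels}
    for _ in range(epochs):
        for text, gold in data:
            x = cache[text]
            pred = predict(weights, x)
            if pred != gold:
                weights[gold] = [w + v for w, v in zip(weights[gold], x)]
                weights[pred] = [w - v for w, v in zip(weights[pred], x)]
    return weights


data = [("select rows in sql", "Databases"),
        ("python numpy arrays", "Data Science"),
        ("center a div with css", "Frontend")]
cache = build_cache([t for t, _ in data])
weights = train_perceptron(cache, data, ["Databases", "Data Science", "Frontend"], epochs=10)
print(CALLS["embed"])  # 3: one encoder call per text, regardless of the number of epochs
print([predict(weights, cache[t]) for t, _ in data])
```

This also illustrates the trade-off noted above: training touches only the cache, but serving a new, unseen question still requires running the full encoder.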
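Mamba's performance and potential on simple tasks
Mamba performed well on classification, and its inference was also fast.
Training speed needs to improve: as a relatively new model, it does not yet appear to have an openly available PEFT method.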

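05 Conclusion
The right model should be chosen according to the business environment
For classification only: small models such as Mamba or SOBERT are suitable
When adding classification to an environment that already runs an LLM: the Embedded model is suitable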

Samsung SDS
Thank you :)
2024.11.15
