THE STAT PROJECT: Requirement Analysis (Milestone 1 Report)
QUESTIONS: To design a framework, how many variations do we need to protect against? How many functionalities do we need to provide to support all these variations?
Variations for importing datasets (file sources)
Variations for importing datasets (file formats)
Variations for importing datasets (schemas): Even if we only consider datasets in XML, each dataset may have its own schema.
Reuters dataset example
Simplified approach
Observation: for the sake of comparison, researchers usually work with a few well-known datasets (e.g., Reuters, RCV-1).
One approach: high-level reader classes, one per dataset:
- ReutersReader
- RCV1Reader
Once written, a reader can be shared by the community.
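The reader idea above might look like the following minimal sketch. Only the class name `ReutersReader` comes from the slides; the `Reader` interface, its `read` method, the map-of-strings stand-in for a document, and the toy tab-separated input format are assumptions made for illustration and do not reflect the real Reuters schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one high-level reader class per well-known dataset.
class ReaderSketch {

    // Assumed interface; the slides do not define its methods.
    interface Reader {
        List<Map<String, String>> read(String rawText);
    }

    // Toy stand-in for a dataset-specific reader.
    // Invented input format: one document per line, "title<TAB>body".
    static class ReutersReader implements Reader {
        @Override
        public List<Map<String, String>> read(String rawText) {
            List<Map<String, String>> docs = new ArrayList<>();
            for (String line : rawText.split("\n")) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] parts = line.split("\t", 2);
                Map<String, String> doc = new LinkedHashMap<>();
                doc.put("title", parts[0]);
                doc.put("body", parts.length > 1 ? parts[1] : "");
                docs.add(doc);
            }
            return docs;
        }
    }

    public static void main(String[] args) {
        Reader reader = new ReutersReader();
        List<Map<String, String>> docs =
                reader.read("Grain exports\tWheat shipments rose.\nOil\tPrices fell.");
        System.out.println(docs.size() + " documents read");
    }
}
```

Callers depend only on the `Reader` interface, so a reader for another dataset can be swapped in without touching the rest of the pipeline.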
Able to persist and read back in-memory objects
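One possible way to meet this requirement is Java's built-in object serialization. The slides do not say which mechanism STAT uses, so the choice of `java.io` serialization here is an assumption, and `PersistSketch`, `SavedDocument`, `persist`, and `readBack` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch: round-tripping an in-memory object through bytes.
class PersistSketch {

    // Illustrative object to persist; not a class from the slides.
    static class SavedDocument implements Serializable {
        private static final long serialVersionUID = 1L;
        final String title;
        final String body;

        SavedDocument(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    // Serialize any Serializable object to a byte array.
    static byte[] persist(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object from the bytes produced by persist().
    static Object readBack(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not read object back", e);
        }
    }

    public static void main(String[] args) {
        byte[] stored = persist(new SavedDocument("Grain exports", "Wheat shipments rose."));
        SavedDocument restored = (SavedDocument) readBack(stored);
        System.out.println(restored.title);
    }
}
```

The same two methods would work with a file stream in place of the byte array.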
Able to visualize in-memory objects
STAT (brief) Domain Model. Note: We omit the text on connectors for brevity. Some connections are not drawn because of space limitations.
STAT framework sample code (conceptual)
 
Domain Concept: RawCorpus
A collection of RawDocument, supporting collection operations:
- Add a new RawDocument element
- Remove an existing RawDocument element
- Access elements in the collection
- …
Domain Concept: RawCorpus
import java.util.List;

// A collection of RawDocument elements.
abstract class RawCorpus {
    List<RawDocument> rawDocuments;

    abstract RawDocument getDocument(int index);
    abstract void setDocument(int index, RawDocument doc);
    abstract void removeDocument(int index);
}
Domain Concept: RawDocument
An object with one or more string fields, serving as a non-processed, in-memory representation of a document unit.
- Like a Java bean, with getters and setters
- All fields must be of string type, even for numbers
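Domain Concept: RawDocument
// Base type for raw documents; subclasses declare the string fields.
abstract class RawDocument {
    public RawDocument() {}
}

// Example subclass: every field is a String, even numeric ones such as numOfClicks.
class MyRawDocument extends RawDocument {
    String title;
    String author;
    String body;
    String date;
    String numOfClicks;
    String topicType;
    // …
}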
Domain Concept: Processor
An object that processes a RawCorpus and produces a Corpus.
- Linguistic: Tokenizer, Stemmer, StopRemover, PosTagger, …
- Machine learning: feature-specific, document-specific
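The linguistic processors listed above could be chained as in this minimal sketch. It is simplified to work on one document's text rather than a whole RawCorpus; the `TokenProcessor` interface and the `tokenize` and `process` methods are assumptions, and only the name `StopRemover` appears in the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: tokenization followed by a chain of token-level steps.
class ProcessorSketch {

    // Assumed simplification: a linguistic step maps a token list to a token list.
    interface TokenProcessor {
        List<String> apply(List<String> tokens);
    }

    // Lower-case the text and split it on runs of non-word characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Drops every token found in the given stop-word set.
    static class StopRemover implements TokenProcessor {
        private final Set<String> stopWords;

        StopRemover(Set<String> stopWords) {
            this.stopWords = stopWords;
        }

        @Override
        public List<String> apply(List<String> tokens) {
            List<String> kept = new ArrayList<>();
            for (String token : tokens) {
                if (!stopWords.contains(token)) {
                    kept.add(token);
                }
            }
            return kept;
        }
    }

    // Tokenize, then run each step in order on the result of the previous one.
    static List<String> process(String text, List<TokenProcessor> steps) {
        List<String> tokens = tokenize(text);
        for (TokenProcessor step : steps) {
            tokens = step.apply(tokens);
        }
        return tokens;
    }
}
```

A stemmer or part-of-speech tagger would be another `TokenProcessor` added to the same list.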
Domain Concept: Corpus
An object representing a collection of Document for use by the machine learning side of the framework. It provides a notion of splits, which is commonly used (e.g., train, test).
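A minimal sketch of a corpus with named splits follows. Documents are reduced to plain strings here, and the `add`, `getSplit`, and `size` methods are assumptions; the slides describe the notion of splits but not an API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a document collection that remembers which split each document is in.
class CorpusSketch {

    static class Corpus {
        // Documents are plain strings in this sketch.
        private final List<String> documents = new ArrayList<>();
        // Split name (e.g., "train", "test") -> indices into documents.
        private final Map<String, List<Integer>> splits = new LinkedHashMap<>();

        // Add a document and assign it to the named split.
        void add(String document, String split) {
            documents.add(document);
            splits.computeIfAbsent(split, k -> new ArrayList<>()).add(documents.size() - 1);
        }

        // Return the documents of one split, in insertion order; empty if the split is unknown.
        List<String> getSplit(String split) {
            List<String> result = new ArrayList<>();
            for (int index : splits.getOrDefault(split, Collections.emptyList())) {
                result.add(documents.get(index));
            }
            return result;
        }

        int size() {
            return documents.size();
        }
    }
}
```

Storing indices rather than copies keeps a single document list while still allowing several views of it.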
Domain Concept: Trainer
A representation of a machine learning algorithm, which can learn from a Corpus and produce a Model.
Domain Concept: Model
The object that a machine learning algorithm (i.e., Trainer) creates to store the parameters that are "learned" from the data (i.e., Corpus).
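The Trainer and Model relationship can be sketched with a deliberately trivial learner: a majority-label baseline whose "learned parameters" are just label counts. The baseline itself, the `train` signature, and the fields of `Model` are assumptions chosen for brevity, and the corpus is reduced to its list of labels.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a Trainer learns from data and produces a Model.
class TrainerSketch {

    // The Model stores what was learned: per-label counts and the most frequent label.
    static class Model {
        final Map<String, Integer> labelCounts;
        final String majorityLabel;

        Model(Map<String, Integer> labelCounts, String majorityLabel) {
            this.labelCounts = labelCounts;
            this.majorityLabel = majorityLabel;
        }
    }

    // Assumed interface; the corpus is reduced to its training labels here.
    interface Trainer {
        Model train(List<String> labels);
    }

    // Baseline learner: remembers the most frequent label (first seen wins on ties).
    static class MajorityTrainer implements Trainer {
        @Override
        public Model train(List<String> labels) {
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (String label : labels) {
                counts.merge(label, 1, Integer::sum);
            }
            String best = null;
            int bestCount = 0;
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                if (entry.getValue() > bestCount) {
                    best = entry.getKey();
                    bestCount = entry.getValue();
                }
            }
            return new Model(counts, best);
        }
    }
}
```

A real learner would replace `MajorityTrainer` while keeping the same Trainer-produces-Model shape.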
Domain Concept: Classifier
An object that maps Documents to target values (label, number, probability). It takes a Corpus and a Model as inputs, and produces a Prediction associated with the Corpus according to the Model.
Domain Concept: Prediction
A collection of target values (label, number, probability) associated with a Corpus, i.e., with a collection of Document.
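The Classifier contract described above (Corpus and Model in, Prediction out) is sketched below with a toy keyword model. The keyword lookup, the `classify` method, and the fields of `Model` and `Prediction` are assumptions for illustration; documents are plain strings and target values are labels only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: a Classifier applies a Model to a Corpus and yields a Prediction.
class ClassifierSketch {

    // Toy model: a keyword-to-label table plus a fallback label.
    static class Model {
        final Map<String, String> keywordToLabel;
        final String defaultLabel;

        Model(Map<String, String> keywordToLabel, String defaultLabel) {
            this.keywordToLabel = keywordToLabel;
            this.defaultLabel = defaultLabel;
        }
    }

    // One predicted label per document, in corpus order.
    static class Prediction {
        final List<String> labels;

        Prediction(List<String> labels) {
            this.labels = labels;
        }
    }

    // Label each document with the label of its first known keyword, else the default.
    static Prediction classify(List<String> corpus, Model model) {
        List<String> labels = new ArrayList<>();
        for (String document : corpus) {
            String label = model.defaultLabel;
            for (String token : document.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (model.keywordToLabel.containsKey(token)) {
                    label = model.keywordToLabel.get(token);
                    break;
                }
            }
            labels.add(label);
        }
        return new Prediction(labels);
    }
}
```

Keeping the Prediction aligned with corpus order is what lets a later step compare it against the corpus labels.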
Domain Concept: Evaluator
An object that compares a Prediction against its associated Corpus and generates an Evaluation.
Domain Concept: Evaluation
A summarized representation of the evaluation result produced by an Evaluator.
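As a minimal sketch of the Evaluator and Evaluation pair, the example below summarizes a comparison as accuracy. Accuracy is only one possible summary and is an assumption here, as are the `evaluate` method and the fields of `Evaluation`; the prediction and the corpus are reduced to lists of labels.

```java
import java.util.List;

// Hypothetical sketch: an Evaluator compares predicted labels with gold labels.
class EvaluatorSketch {

    // Summarized result of one comparison.
    static class Evaluation {
        final int correct;
        final int total;

        Evaluation(int correct, int total) {
            this.correct = correct;
            this.total = total;
        }

        // Fraction of matching labels; defined as 0.0 for an empty comparison.
        double accuracy() {
            return total == 0 ? 0.0 : (double) correct / total;
        }
    }

    // Count positions where the predicted label equals the gold label.
    static Evaluation evaluate(List<String> predicted, List<String> gold) {
        if (predicted.size() != gold.size()) {
            throw new IllegalArgumentException("prediction and corpus sizes differ");
        }
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (predicted.get(i).equals(gold.get(i))) {
                correct++;
            }
        }
        return new Evaluation(correct, gold.size());
    }
}
```

Other summaries, such as per-label precision and recall, could be added to `Evaluation` without changing the Evaluator's inputs.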
THE STAT PROJECT Thanks
STAT (brief) Domain Model. Note: We omit the text on connectors for brevity. Some connections are not drawn because of space limitations. Concepts shown: Corpus, Reader, Processor, RawCorpus, Trainer, Model, Classifier, Prediction, Evaluator, Evaluation, Writer, Vocabulary.
STAT Domain Model. Note: We omit the text above lines for brevity. Concepts shown: Corpus, Reader, Processor, RawCorpus, Trainer, Model, Classifier, Prediction, Evaluator, Evaluation, Writer.
STAT Domain Model. Note: We omit the text above lines for brevity. Concepts shown: Corpus, Reader, Processor, RawCorpus, Trainer, Model, Classifier, Prediction, Evaluator, Evaluation, Document, RawDocument.

STAT Requirement Analysis
