Classification and analysis of unstructured data
Project Mentor :-
Dr. Navanath Saharia
Radhe Raman Tiwari
Roll No :- 17010115
Aviral Prakash
Roll No :- 17010122
CONTRIBUTION:-
Radhe Raman Tiwari        Aviral Prakash
50%                       50%
Windows Tool              Ubuntu Tool
UI for Windows            UI for Ubuntu
MongoDB Query             MongoDB Query
Frontend                  Frontend
Backend                   Backend
Implementation (Logic)    Implementation (Logic)
Unified Modeling Language (UML) Diagram

File Class
- Name:String
- Type:String
- Creation:Timestamp
- Access:Timestamp
- Modify:Timestamp

FileType Class
- ID:String
- Name:String
- Type:String
- Creation:Timestamp
- Access:Timestamp
- Modify:Timestamp
+ insert_one()
+ insert_many()

DataBase Class
- Collection:String
+ createCollection()

Multiplicities shown on the associations: 1, 1, 1, *
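
The class diagram can be read roughly as the following Python sketch. It is an illustration only, not the project's code: the names FileRecord, to_document and InMemoryDatabase are hypothetical, and the in-memory store merely stands in for MongoDB so that the example runs without a server.

```python
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass
class FileRecord:
    """Mirrors the File class of the diagram (attribute names lower-cased)."""
    name: str
    type: str
    creation: datetime
    access: datetime
    modify: datetime

    def to_document(self):
        # Plain dict, the shape a document store would accept.
        return asdict(self)


class InMemoryDatabase:
    """Hypothetical stand-in for the DataBase class: one list per collection."""

    def __init__(self):
        self.collections = {}

    def create_collection(self, name):
        # Create the collection on first use, then return it.
        return self.collections.setdefault(name, [])


db = InMemoryDatabase()
stamp = datetime(2019, 1, 1, 12, 0, 0)
record = FileRecord("song.mp3", "mp3", stamp, stamp, stamp)
db.create_collection("MP3 FILES").append(record.to_document())
```

Keeping the record as a dataclass and converting it to a dict only at the storage boundary is one possible design; the slides do not say how the tools represent a file internally.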
Schema:-

Examples:-

MP3 FILES
ID
File Name
File Creation Date/Time
File Access Date/Time
File Modify Date/Time
FileType

DOCX FILES
ID
File Name
File Creation Date/Time
File Access Date/Time
File Modify Date/Time
FileType

CPP FILES
ID
File Name
File Creation Date/Time
File Access Date/Time
File Modify Date/Time
FileType

Format:-

#FILETYPE FILES
ID
File Name
File Creation Date/Time
File Access Date/Time
File Modify Date/Time
FileType
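
The "#FILETYPE FILES" format suggests one group of records per file type. A minimal sketch of that grouping follows; the helper names collection_name and group_by_collection are hypothetical, and treating each group as a separate collection is an assumption about how the schema is meant to be used.

```python
from collections import defaultdict


def collection_name(filetype):
    """Build a name in the '#FILETYPE FILES' format, e.g. 'mp3' -> 'MP3 FILES'."""
    return f"{filetype.lstrip('.').upper()} FILES"


def group_by_collection(records):
    """Map each collection name to the list of records that belong in it."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[collection_name(rec["FileType"])].append(rec)
    return dict(grouped)


sample = [
    {"File Name": "a.mp3", "FileType": "mp3"},
    {"File Name": "b.docx", "FileType": ".docx"},
    {"File Name": "c.mp3", "FileType": "MP3"},
]
groups = group_by_collection(sample)
```

Normalising the extension (dropping a leading dot, upper-casing) keeps "mp3", ".mp3" and "MP3" in the same group.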
Data Dictionary :-

FOR WINDOWS TOOL:
Column          DataType     Description
_id             String       Object Id, NOT NULL
File Name       String       Name of File, NOT NULL
FileType        String       Type of File, NOT NULL
Creation        Timestamp    File Creation Date/Time, NOT NULL
Access          Timestamp    Last File Access Date/Time, NOT NULL
Modify          Timestamp    Last File Modify Date/Time, NOT NULL

FOR UBUNTU TOOL:
Column          DataType     Description
_id             String       Object Id, NOT NULL
filename        String       Name of File, NOT NULL
filetype        String       Type of File, NOT NULL
fileage         Timestamp    File Creation Date/Time, NOT NULL
lastaccess      Timestamp    Last File Access Date/Time, NOT NULL
lastmodified    Timestamp    Last File Modify Date/Time, NOT NULL
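
As an illustration of the data dictionary, the sketch below fills the lower-case columns (filename, filetype, fileage, lastaccess, lastmodified) from os.stat and checks the NOT NULL rule. The function names build_document and missing_fields are hypothetical. The _id column is left out on the assumption that MongoDB assigns it on insert, which it does when the field is absent.

```python
import os
import tempfile
from datetime import datetime

REQUIRED = ("filename", "filetype", "fileage", "lastaccess", "lastmodified")


def build_document(path):
    """Collect the data-dictionary fields for one file using os.stat."""
    info = os.stat(path)
    return {
        "filename": os.path.basename(path),
        "filetype": os.path.splitext(path)[1].lstrip(".").lower(),
        # st_ctime is creation time on Windows but metadata-change time on Linux.
        "fileage": datetime.fromtimestamp(info.st_ctime),
        "lastaccess": datetime.fromtimestamp(info.st_atime),
        "lastmodified": datetime.fromtimestamp(info.st_mtime),
    }


def missing_fields(doc):
    """Return the required fields that are absent, None or empty (NOT NULL check)."""
    return [key for key in REQUIRED if doc.get(key) in (None, "")]


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.docx")
    with open(path, "w") as handle:
        handle.write("demo")
    doc = build_document(path)
```

Note the platform difference in the comment: on Linux, st_ctime is not a true creation time, so a tool that needs the creation date there would have to obtain it another way.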
Description of the modules: Windows version
• FileData() :- It takes the path of a directory and returns all files from that directory as a list.
• MetaData() :- It takes the name of a file with its path and returns the metadata of the file as a list.
• FileExtension() :- It takes a file name and returns its extension.
• MongoData() :- It takes the metadata of a file and returns only the needed information as a list.
• MongoConnect() :- It takes only the needed information of a file and stores it in MongoDB.
• SetDataSet() :- It takes the needed information of a file and returns it in JSON format.
• StringSplit() :- It takes information, splits it and returns it according to the requirement.
• Main() :- It handles all modules based on the requirement.
• Caller() :- It calls all modules according to the program logic.
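
The module list above can be sketched as a small pipeline. This is an assumption-laden illustration, not the Windows tool's code: the snake_case functions only imitate the roles of FileData(), FileExtension(), MongoData() and Caller(), and FakeCollection is a hypothetical test double that exposes insert_many(), the method name used by a pymongo collection, so the example runs without a database.

```python
import os
import tempfile


def file_data(directory):
    """Like FileData(): return the full paths of the files directly in a directory."""
    return sorted(
        entry.path for entry in os.scandir(directory) if entry.is_file()
    )


def file_extension(filename):
    """Like FileExtension(): return the extension without the dot, lower-cased."""
    return os.path.splitext(filename)[1].lstrip(".").lower()


def mongo_data(path):
    """Like MongoData(): keep only the fields that are stored."""
    info = os.stat(path)
    return {
        "File Name": os.path.basename(path),
        "FileType": file_extension(path),
        "Creation": info.st_ctime,  # creation time on Windows
        "Access": info.st_atime,
        "Modify": info.st_mtime,
    }


class FakeCollection:
    """Test double with the same insert_many() method name as a pymongo collection."""

    def __init__(self):
        self.docs = []

    def insert_many(self, documents):
        self.docs.extend(documents)


def caller(directory, collection):
    """Like Caller(): scan, extract and store in one pass; returns the count stored."""
    documents = [mongo_data(path) for path in file_data(directory)]
    if documents:  # pymongo's insert_many rejects an empty list
        collection.insert_many(documents)
    return len(documents)


with tempfile.TemporaryDirectory() as folder:
    for name in ("a.MP3", "b.cpp"):
        open(os.path.join(folder, name), "w").close()
    store = FakeCollection()
    stored = caller(folder, store)
```

Passing the collection in as a parameter keeps the scanning logic testable; with a real database, a pymongo collection object could be passed instead of FakeCollection.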
Description of the modules: Ubuntu version
• abcd() :- It takes the path of a directory, extracts the metadata of all the files under it, stores the metadata in a list and returns that list.
• sort() :- It takes a list, sorts the metadata in it according to file type and returns the sorted list.
• MongoConnect() :- It takes the metadata of a file in dictionary form and stores it in MongoDB.
• SetDataSet() :- It takes the list of metadata, converts the metadata of each file in the list to a dictionary for later storage in MongoDB, and returns the original list.
• __init__() :- It defines the variables and lists used globally.
• setupUi() :- It handles the design and behaviour of the tool's UI, including the buttons, search bar and output window.
• retranslateUi() :- It sets the window title, tool icon, tool name and button names.
• GetterType() :- It invokes abcd() and gives the metadata of the files sorted by file type.
• GetterCreate() :- It invokes abcd() and gives the metadata of the files sorted by creation date.
• GetterModify() :- It invokes abcd() and gives the metadata of the files sorted by last modified date.
• GetterAccess() :- It invokes abcd() and gives the metadata of the files sorted by last accessed date.
• Caller() :- It calls all modules according to the program logic.
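
The four getters differ only in the field they sort by, which can be sketched with a single helper. This is a minimal illustration under assumptions: sorted_metadata and SORT_KEYS are hypothetical names, the records use the field names from the data dictionary, and the sample values are invented for the example.

```python
from datetime import datetime
from operator import itemgetter

# Maps each ordering to the metadata field it sorts on.
SORT_KEYS = {
    "type": "filetype",
    "create": "fileage",
    "modify": "lastmodified",
    "access": "lastaccess",
}


def sorted_metadata(records, order):
    """Return a new list of records sorted by the field chosen via `order`."""
    return sorted(records, key=itemgetter(SORT_KEYS[order]))


records = [
    {"filename": "b.mp3", "filetype": "mp3", "fileage": datetime(2019, 3, 1),
     "lastaccess": datetime(2019, 3, 5), "lastmodified": datetime(2019, 3, 2)},
    {"filename": "a.cpp", "filetype": "cpp", "fileage": datetime(2019, 4, 1),
     "lastaccess": datetime(2019, 4, 2), "lastmodified": datetime(2019, 4, 1)},
    {"filename": "c.docx", "filetype": "docx", "fileage": datetime(2019, 1, 1),
     "lastaccess": datetime(2019, 6, 1), "lastmodified": datetime(2019, 5, 1)},
]
by_type = [r["filename"] for r in sorted_metadata(records, "type")]
by_create = [r["filename"] for r in sorted_metadata(records, "create")]
```

Because sorted() returns a new list, the original metadata list is left unchanged, so each getter can be called repeatedly on the same scan result.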
Declaration:
The content (such as description, source code, and diagram)
presented and submitted to the instructor by Radhe Raman Tiwari,
roll no. 17010115, and Aviral Prakash, roll no. 17010122, is our own
creation (except system libraries/procedures). If anything is found
to be plagiarised, we, Radhe Raman Tiwari and Aviral Prakash, will
accept zero marks for the submitted project. We also allow the
academic section, IIIT Senapati, Manipur, to deduct ten marks from
our final score in the CS 240 or CS 241 course as punishment.


