Project Documentation Structure
1. Title Page
• Project Title
• Your Name / Team Members
• Institution Name
• Guide/Advisor Name
• Date
2. Abstract
Automated Spammers Detection in Social Media
ABSTRACT
Twitter is one of the most popular microblogging services and is generally
used to share news and updates through short messages restricted to 280
characters. However, its open nature and large user base are frequently
exploited by automated spammers, content polluters, and other ill-intended
users to commit various cybercrimes, such as cyberbullying, trolling, rumour
dissemination, and stalking. Accordingly, researchers have proposed a number
of approaches to address these problems. However, most of these approaches
are based on user characterization and completely disregard mutual
interactions. In this paper, we present a hybrid approach for detecting
automated spammers by amalgamating community-based features with other
feature categories, namely metadata-, content-, and interaction-based
features. The novelty of the proposed approach lies in the characterization
of users based on their interactions with their followers, given that a user
can evade features that are related to his/her own activities, but evading
those based on the followers is difficult. Nineteen different features,
including six newly defined features and two redefined features, are
identified for learning three classifiers, namely, random forest, decision
tree, and Bayesian network, on a real dataset that comprises benign users and
spammers. The discrimination power of different feature categories is also
analyzed, and interaction- and community-based features are determined to be
the most effective for spam detection, whereas metadata-based features are
proven to be the least effective.
3. Introduction
INTRODUCTION
TWITTER, a microblogging service, is considered a popular online social
network (OSN) with a large user base and is attracting users from different
walks of life and age groups. OSNs enable users to keep in touch with friends,
relatives, family members, and people with similar interests, profession, and
objectives. In addition, they allow users to interact with one another and form
communities. A user can become a member of an OSN by registering and
providing details, such as name, birthday, gender, and other contact information.
Although a large number of OSNs exist on the web, Facebook and Twitter are
among the most popular and are included in the list of the top 10 websites
worldwide.
A. OSN and the Social Spam Problem
Twitter, which was founded in 2006, allows its users to post their views, express
their thoughts, and share news and other information in the form of tweets that
are restricted to 280 characters. Twitter allows users to follow their favourite
politicians, athletes, celebrities, and news channels, and to subscribe to their
content without any hindrance. By following an account, a follower receives
the status updates of the subscribed account. Although Twitter and other OSNs
are mainly used for various benign purposes, their open nature, huge user base,
and real-time message proliferation have made them lucrative targets for cyber
criminals and social bots. OSNs have proven to be incubators for a new
breed of complex and sophisticated attacks and threats, such as cyberbullying,
misinformation diffusion, stalking, identity deception, radicalization, and other
illicit activities, in addition to classical cyber attacks, such as spamming,
phishing, and drive-by download [1], [2]. Over the years, classical attacks have
evolved into sophisticated attacks to evade detection mechanisms. A report
submitted to the US Securities and Exchange Commission in August
2014 indicates that approximately 14% of Twitter accounts are actually
spambots and approximately 9.3% of all tweets are spam. In social networks,
spambots are also known as socialbots, which mimic human behaviour to gain trust
in a network and then exploit it for malicious activities [3]. Such reports and
findings demonstrate the extent of cyber crimes committed by spambots and
how OSNs are proving to be a haven for these bots. Although spammers are
fewer than benign users, they are capable of affecting network structure and trust
for various illicit purposes.
The main contributions of this study can be summarized as follows.
• A novel study that uses community-based features with other feature
categories, including metadata, content, and interaction, for detecting
automated spammers.
• Six new features are introduced and two existing features are redefined to
design a feature set with improved discriminative power for segregating benign
users and spammers. Among the six new features, one is content-based, three
are interaction-based, and the remaining two are community-based. Meanwhile,
both redefined features are content-based. When defining interaction-based
features, the focus is on the followers of a user, rather than on the users
he/she is following.
• A detailed analysis of the working behaviour of automated spammers and
benign users with respect to the newly defined features. In addition, a
two-tailed Z-test statistical significance analysis is performed to answer the
following question: "Is the difference between the working behaviour of
spammers and benign users in terms of newly defined features due to random
chance?"
• A thorough analysis of the discriminating power of each feature category in
segregating automated spammers from benign users.
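As a rough illustration of the two-tailed Z-test mentioned in the contributions,
the following Java sketch compares the mean of a single feature across two
groups. The class name, method names, and sample values are hypothetical and are
not taken from the study's dataset, and the 1.96 threshold assumes a 5%
significance level. A Z-test is normally applied to large samples; the tiny
arrays here only keep the example short.

```java
// Hypothetical sketch: two-sample, two-tailed Z-test on one feature.
final class ZTestSketch {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Unbiased sample variance (divides by n - 1).
    static double variance(double[] xs) {
        double m = mean(xs);
        double squares = 0.0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return squares / (xs.length - 1);
    }

    // z = (mean_a - mean_b) / sqrt(var_a / n_a + var_b / n_b)
    static double zStatistic(double[] a, double[] b) {
        double standardError = Math.sqrt(variance(a) / a.length + variance(b) / b.length);
        return (mean(a) - mean(b)) / standardError;
    }

    // Two-tailed decision at the 5% level (assumed): reject when |z| > 1.96.
    static boolean isSignificant(double z) {
        return Math.abs(z) > 1.96;
    }

    public static void main(String[] args) {
        // Made-up feature values for illustration only.
        double[] spammers = {0.91, 0.85, 0.88, 0.93, 0.90};
        double[] benign = {0.42, 0.55, 0.47, 0.51, 0.38};
        double z = zStatistic(spammers, benign);
        System.out.println("z = " + z + ", significant at 5%: " + isSignificant(z));
    }
}
```

If |z| exceeds the threshold, the difference between the two groups for that
feature is unlikely to be a matter of random chance at the chosen level.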
2. SYSTEM STUDY
2.1 FEASIBILITY STUDY
The feasibility of the project is analyzed in this phase, and a
business proposal is put forth with a very general plan for the project and some
cost estimates. During system analysis, the feasibility study of the proposed
system is carried out to ensure that the proposed system is not a
burden to the company. For feasibility analysis, some understanding of the
major requirements for the system is essential.
Three key considerations involved in the feasibility analysis are:
• ECONOMICAL FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY
ECONOMICAL FEASIBILITY
This study is carried out to check the economic impact that the system
will have on the organization. The amount of funds that the company can pour
into the research and development of the system is limited, so the expenditures
must be justified. The developed system is well within the budget, which
was achieved because most of the technologies used are freely available. Only
the customized products had to be purchased.
TECHNICAL FEASIBILITY
This study is carried out to check the technical feasibility, that is, the
technical requirements of the system. Any system developed must not place a
high demand on the available technical resources, because this would in turn
place high demands on the client. The developed system must have modest
requirements, as only minimal or no changes are required for implementing
this system.
SOCIAL FEASIBILITY
This aspect of the study checks the level of acceptance of the system by
the user. This includes the process of training the user to use the system
efficiently. The user must not feel threatened by the system, but must instead
accept it as a necessity. The level of acceptance by the users depends solely on
the methods that are employed to educate the user about the system and to
make the user familiar with it. The user's level of confidence must be raised so
that the user is also able to offer constructive criticism, which is welcomed, as
the user is the final user of the system.
5. Feasibility Study
PRELIMINARY INVESTIGATION
The first strategy for the development of a project starts from the thought of
designing a mail-enabled platform for a small firm in which it is easy and convenient
to send and receive messages, with a search engine, an address book, and
some entertaining games. When it is approved by the organization and our project guide, the
first activity, i.e. preliminary investigation, begins. The activity has three parts:
• Request Clarification
• Feasibility Study
• Request Approval
REQUEST CLARIFICATION
After the request is approved by the organization and the project guide, and with an
investigation being considered, the project request must be examined to determine precisely
what the system requires.
Here our project is basically meant for users within the company whose
systems can be interconnected by a Local Area Network (LAN). With today's busy schedules,
people need everything to be provided in a ready-made manner. Taking into
consideration the vast use of the net in day-to-day life, the corresponding development of
the portal came into existence.
FEASIBILITY ANALYSIS
An important outcome of the preliminary investigation is the determination that the
system request is feasible. This is possible only if it is feasible within limited resources and
time. The different feasibilities that have to be analyzed are:
• Operational Feasibility
• Economic Feasibility
• Technical Feasibility
Operational Feasibility
Operational Feasibility deals with the study of the prospects of the system to be
developed. This system operationally eliminates all the tensions of the Admin and helps him
in effectively tracking the project progress. This kind of automation will surely reduce the
time and energy that were previously consumed in manual work. Based on the study, the system
is proved to be operationally feasible.
Economic Feasibility
Economic Feasibility, or cost-benefit analysis, is an assessment of the economic justification
for a computer-based project. As the hardware was installed from the beginning and serves many
purposes, the hardware cost of the project is low. Since the system is network based,
any number of employees connected to the LAN within the organization can use this tool
at any time. The Virtual Private Network is to be developed using the existing resources
of the organization. So the project is economically feasible.
Technical Feasibility
According to Roger S. Pressman, Technical Feasibility is the assessment of the
technical resources of the organization. The organization needs IBM-compatible machines
with a graphical web browser connected to the Internet and Intranet. The system is
developed for a platform-independent environment. Java Server Pages, JavaScript, HTML, SQL
Server, and WebLogic Server are used to develop the system. The technical feasibility study has
been carried out. The system is technically feasible for development and can be developed
with the existing facility.
REQUEST APPROVAL
Not all requested projects are desirable or feasible. Some organizations receive so many
project requests from client users that only a few of them are pursued. However, those
projects that are both feasible and desirable should be put into the schedule. After a project
request is approved, its cost, priority, completion time, and personnel requirements are
estimated and used to determine where to add it to any project list. Once
these factors are approved, development work can be launched.
SYSTEM DESIGN AND DEVELOPMENT
INPUT DESIGN
Input design plays a vital role in the life cycle of software development and requires
very careful attention from developers. The aim of input design is to feed data to the application as
accurately as possible. Inputs should therefore be designed effectively so that the errors
occurring while feeding data are minimized. According to Software Engineering Concepts, the
input forms or screens are designed to provide validation control over the input
limit, range, and other related validations.
This system has input screens in almost all the modules. Error messages are
developed to alert the user whenever he makes a mistake and to guide him in the right
way so that invalid entries are not made. This is discussed in more detail under module design.
Input design is the process of converting user-created input into a computer-based
format. The goal of input design is to make data entry logical and free from
errors. Errors in the input are controlled by the input design. The application has been
developed in a user-friendly manner. The forms have been designed in such a way that, during
processing, the cursor is placed in the position where data must be entered. In certain cases,
the user is also provided with an option to select an appropriate input from various
alternatives related to the field.
Validations are required for each data item entered. Whenever a user enters erroneous
data, an error message is displayed, and the user can move on to the subsequent pages only after
completing all the entries in the current page.
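The validation behaviour described above can be sketched in Java as follows.
This is only a minimal illustration: the field names (user name and age), the
length limit, and the numeric range are assumptions made for the example and do
not come from the actual input screens of the system.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of limit and range validation for one input page.
final class InputValidationSketch {

    // Returns the error messages for the page; an empty list means the
    // user may move on to the next page.
    static List<String> validate(String userName, String ageText) {
        List<String> errors = new ArrayList<>();

        // Limit check on a text field (30 characters is an assumed limit).
        if (userName == null || userName.trim().isEmpty()) {
            errors.add("User name is required.");
        } else if (userName.length() > 30) {
            errors.add("User name must be at most 30 characters.");
        }

        // Range check on a numeric field (18 to 100 is an assumed range).
        if (ageText == null) {
            errors.add("Age is required.");
        } else {
            try {
                int age = Integer.parseInt(ageText.trim());
                if (age < 18 || age > 100) {
                    errors.add("Age must be between 18 and 100.");
                }
            } catch (NumberFormatException e) {
                errors.add("Age must be a whole number.");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate("alice", "25"));
        System.out.println(validate("", "abc"));
    }
}
```

Collecting all messages before returning lets the form show every problem on
the page at once instead of stopping at the first invalid field.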
OUTPUT DESIGN
The output from the computer is required mainly to create an efficient method of
communication within the company, primarily between the project leader and his team
members, in other words, the administrator and the clients. The output of the VPN is the system
which allows the project leader to manage his clients in terms of creating new clients and
assigning new projects to them, maintaining a record of the project validity, and providing
folder-level access to each client on the user side depending on the projects allotted to him.
After completion of a project, a new project may be assigned to the client. User
authentication procedures are maintained from the initial stages. A new user may be
created by the administrator, or a user can register himself as a new user, but the task
of assigning projects and validating a new user rests with the administrator only.
The application starts running when it is executed for the first time. The server has to be
started, and then Internet Explorer is used as the browser. The project will run on the
local area network, so the server machine will serve as the administrator while the other
connected systems act as the clients. The developed system is highly user friendly and
can be easily understood by anyone using it, even for the first time.
6. Software Environment
Software Environment
Java Technology
Java technology is both a programming language and a platform.
The Java Programming Language
The Java programming language is a high-level language that can be
characterized by all of the following buzzwords:
• Simple
• Architecture neutral
• Object oriented
• Portable
• Distributed
• High performance
• Interpreted
• Multithreaded
• Robust
• Dynamic
• Secure
With most programming languages, you either compile or interpret a
program so that you can run it on your computer. The Java programming
language is unusual in that a program is both compiled and interpreted. With the
compiler, you first translate a program into an intermediate language called
Java byte codes, the platform-independent codes interpreted by the interpreter
on the Java platform. The interpreter parses and runs each Java byte code
instruction on the computer. Compilation happens just once; interpretation
occurs each time the program is executed. The following figure illustrates how
this works.
You can think of Java byte codes as the machine code instructions for the
Java Virtual Machine (Java VM). Every Java interpreter, whether it's a
development tool or a Web browser that can run applets, is an implementation
of the Java VM. Java byte codes help make "write once, run anywhere"
possible. You can compile your program into byte codes on any platform that
has a Java compiler. The byte codes can then be run on any implementation of
the Java VM. That means that as long as a computer has a Java VM, the same
program written in the Java programming language can run on Windows 2000,
a Solaris workstation, or an iMac.
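To make the compile-once, run-anywhere idea concrete, the small program below
reports the operating system and the Java VM it happens to be running on. The
class name is hypothetical; the two system property keys, os.name and
java.vm.name, are standard Java system properties.

```java
// Hypothetical sketch: the same compiled class reports whichever
// platform and Java VM it is executed on.
final class WhereAmI {

    static String describe() {
        return "Running on " + System.getProperty("os.name")
                + " with " + System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Assuming the source is saved as WhereAmI.java, it is compiled once with
"javac WhereAmI.java", and the resulting WhereAmI.class file can then be run
with "java WhereAmI" on any machine that has a Java VM, without recompiling.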
The Java Platform
A platform is the hardware or software environment in which a
program runs. We've already mentioned some of the most popular
platforms like Windows 2000, Linux, Solaris, and MacOS. Most platforms
can be described as a combination of the operating system and
hardware. The Java platform differs from most other platforms in that it's
a software-only platform that runs on top of other hardware-based
platforms.
The Java platform has two components:
• The Java Virtual Machine (Java VM)
• The Java Application Programming Interface (Java API)
You've already been introduced to the Java VM. It's the base for the
Java platform and is ported onto various hardware-based platforms.
The Java API is a large collection of ready-made software components
that provide many useful capabilities, such as graphical user interface
(GUI) widgets. The Java API is grouped into libraries of related classes
and interfaces; these libraries are known as packages. The next section,
What Can Java Technology Do?, highlights what functionality some of
the packages in the Java API provide.
The following figure depicts a program that's running on the Java
platform. As the figure shows, the Java API and the virtual machine
insulate the program from the hardware.
Native code is code that, once compiled, runs on a specific hardware
platform. As a platform-independent
environment, the Java platform can be a bit slower than native code.
However, smart compilers, well-tuned interpreters, and just-in-time byte
code compilers can bring performance close to that of native code
without threatening portability.
What Can Java Technology Do?
The most common types of programs written in the Java programming
language are applets and applications. If you've surfed the Web, you're
probably already familiar with applets. An applet is a program that
adheres to certain conventions that allow it to run within a Java-enabled
browser.
However, the Java programming language is not just for writing cute,
entertaining applets for the Web. The general-purpose, high-level Java
programming language is also a powerful software platform. Using the
generous API, you can write many types of programs.
An application is a standalone program that runs directly on the Java
platform. A special kind of application known as a server serves and
supports clients on a network. Examples of servers are Web servers,
proxy servers, mail servers, and print servers. Another specialized
program is a servlet. A servlet can almost be thought of as an applet that
runs on the server side. Java Servlets are a popular choice for building
interactive web applications, replacing the use of CGI scripts. Servlets are
similar to applets in that they are runtime extensions of applications.
Instead of working in browsers, though, servlets run within Java Web
servers, configuring or tailoring the server.
How does the API support all these kinds of programs? It does so with
packages of software components that provide a wide range of
functionality. Every full implementation of the Java platform gives you
the following features:
• The essentials: Objects, strings, threads, numbers, input and
output, data structures, system properties, date and time, and so
on.
• Applets: The set of conventions used by applets.
• Networking: URLs, TCP (Transmission Control Protocol), UDP (User
Datagram Protocol) sockets, and IP (Internet Protocol) addresses.
• Internationalization: Help for writing programs that can be
localized for users worldwide. Programs can automatically adapt
to specific locales and be displayed in the appropriate language.
• Security: Both low level and high level, including electronic
signatures, public and private key management, access control,
and certificates.
• Software components: Known as JavaBeans™, these can plug into
existing component architectures.
• Object serialization: Allows lightweight persistence and
communication via Remote Method Invocation (RMI).
• Java Database Connectivity (JDBC™): Provides uniform access to a
wide range of relational databases.
The Java platform also has APIs for 2D and 3D graphics, accessibility,
servers, collaboration, telephony, speech, animation, and more. The
following figure depicts what is included in the Java 2 SDK.
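As one example of the features listed above, the sketch below shows object
serialization: an object is written to a byte array and then read back as an
equal copy. The UserProfile class and its fields are hypothetical and exist only
for this illustration; the stream classes are the standard ones from java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationSketch {

    // Hypothetical record of a user account; not a type from the project.
    static final class UserProfile implements Serializable {
        private static final long serialVersionUID = 1L;
        final String screenName;
        final int followers;

        UserProfile(String screenName, int followers) {
            this.screenName = screenName;
            this.followers = followers;
        }
    }

    // Serializes the object into memory and deserializes it again.
    static UserProfile roundTrip(UserProfile original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (UserProfile) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        UserProfile copy = roundTrip(new UserProfile("example_user", 42));
        System.out.println(copy.screenName + " has " + copy.followers + " followers");
    }
}
```

The same byte stream could be written to a file for lightweight persistence
instead of an in-memory buffer.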
How Will Java Technology Change My Life?
We can't promise you fame, fortune, or even a job if you learn the
Java programming language. Still, it is likely to make your programs
better and to require less effort than other languages. We believe that Java
technology will help you do the following:
• Get started quickly: Although the Java programming language is a
powerful object-oriented language, it's easy to learn, especially for
programmers already familiar with C or C++.
• Write less code: Comparisons of program metrics (class counts,
method counts, and so on) suggest that a program written in the
Java programming language can be four times smaller than the
same program in C++.
• Write better code: The Java programming language encourages
good coding practices, and its garbage collection helps you avoid
memory leaks. Its object orientation, its JavaBeans component
architecture, and its wide-ranging, easily extendible API let you
reuse other people's tested code and introduce fewer bugs.
• Develop programs more quickly: Your development may be
as much as twice as fast as writing the same program in C++.
Why? You write fewer lines of code, and it is a simpler
programming language than C++.
• Avoid platform dependencies with 100% Pure Java: You can keep
your program portable by avoiding the use of libraries written in
other languages. The 100% Pure Java™ Product Certification
Program has a repository of historical process manuals, white
papers, brochures, and similar materials online.
• Write once, run anywhere: Because 100% Pure Java programs are
compiled into machine-independent byte codes, they run
consistently on any Java platform.
• Distribute software more easily: You can upgrade applets easily
from a central server. Applets take advantage of the feature of
allowing new classes to be loaded "on the fly," without
recompiling the entire program.
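The ability to load classes "on the fly" is not limited to applets. As a
minimal sketch, the following program loads a class by its name at run time and
creates an instance of it. The class and method names of the sketch are
hypothetical; Class.forName and getDeclaredConstructor are standard reflection
calls.

```java
// Hypothetical sketch: loading a class by name at run time.
final class DynamicLoadSketch {

    // Loads the named class and creates an instance through its
    // public no-argument constructor.
    static Object instantiate(String className) {
        try {
            Class<?> loaded = Class.forName(className);
            return loaded.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name is only known as a string when the program runs.
        Object list = instantiate("java.util.ArrayList");
        System.out.println(list.getClass().getName());
    }
}
```

Because the class name is an ordinary string, it could come from a
configuration file, so new implementations can be added without recompiling
the code that uses them.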
ODBC
Microsoft Open Database Connectivity (ODBC) is a standard
programming interface for application developers and database systems
providers. Before ODBC became a de facto standard for Windows programs to
interface with database systems, programmers had to use proprietary languages
for each database they wanted to connect to. Now, ODBC has made the choice
of the database system almost irrelevant from a coding perspective, which is as
it should be. Application developers have much more important things to worry
about than the syntax that is needed to port their program from one database to
another when business needs suddenly change.
Through the ODBC Administrator in Control Panel, you can specify the
particular database that is associated with a data source that an ODBC
application program is written to use. Think of an ODBC data source as a door
with a name on it. Each door will lead you to a particular database. For
example, the data source named Sales Figures might be a SQL Server database,
whereas the Accounts Payable data source could refer to an Access database.
The physical database referred to by a data source can reside anywhere on the
LAN.
The ODBC system files are not installed on your system by Windows 95.
Rather, they are installed when you set up a separate database application, such
as SQL Server Client or Visual Basic 4.0. When the ODBC icon is installed in
Control Panel, it uses a file called ODBCINST.DLL. It is also possible to
administer your ODBC data sources through a stand-alone program called
ODBCADM.EXE. There are 16-bit and 32-bit versions of this program, and each
maintains a separate list of ODBC data sources.
From a programming perspective, the beauty of ODBC is that the
application can be written to use the same set of function calls to interface with
any data source, regardless of the database vendor. The source code of the
application doesn't change whether it talks to Oracle or SQL Server. We
mention these two only as an example. There are ODBC drivers available for several
dozen popular database systems. Even Excel spreadsheets and plain text files
can be turned into data sources. The operating system uses the Registry
information written by ODBC Administrator to determine which low-level
ODBC drivers are needed to talk to the data source (such as the interface to
Oracle or SQL Server). The loading of the ODBC drivers is transparent to the
ODBC application program. In a client/server environment, the ODBC API
even handles many of the network issues for the application programmer.
The advantages of this scheme are so numerous that you are probably
thinking there must be some catch. The only disadvantage of ODBC is that it
isn't as efficient as talking directly to the native database interface. Many
detractors have charged that ODBC is too slow. Microsoft has always
claimed that the critical factor in performance is the quality of the driver
software that is used. In our humble opinion, this is true. The availability of
good ODBC drivers has improved a great deal recently. And anyway, the
criticism about performance is somewhat analogous to the claim that
compilers would never match the speed of pure assembly language. Maybe not,
but the compiler (or ODBC) gives you the opportunity to write cleaner
programs, which means you finish sooner. Meanwhile, computers get faster
every year.
JDBC
In an effort to set an independent database standard API for Java, Sun
Microsystems developed Java Database Connectivity, or JDBC. JDBC offers a
generic SQL database access mechanism that provides a consistent interface to
a variety of RDBMSs. This consistent interface is achieved through the use of
"plug-in" database connectivity modules, or drivers. If a database vendor
wishes to have JDBC support, the vendor must provide the driver for each
platform that the database and Java run on.
To gain wider acceptance of JDBC, Sun based JDBC's framework on
ODBC. As you discovered earlier in this chapter, ODBC has widespread
support on a variety of platforms. Basing JDBC on ODBC allows vendors to
bring JDBC drivers to market much faster than developing a completely new
connectivity solution.
JDBC was announced in March of 1996. It was released for a 90 day
public review that ended June 8, 1996. Because of user input, the final JDBC
v1.0 specification was released soon after.
The remainder of this section will cover enough information about JDBC for
you to know what it is about and how to use it effectively. This is by no means a
complete overview of JDBC. That would fill an entire book.
JDBC Goals
Few software packages are designed without goals in mind. JDBC is no
exception: its many goals drove the development of the API. These goals,
in conjunction with early reviewer feedback, have shaped the JDBC class
library into a solid framework for building database applications in Java.
The goals that were set for JDBC are important. They will give you some
insight as to why certain classes and functionalities behave the way they do. The
design goals for JDBC listed here are as follows:
1. SQL Level API
The designers felt that their main goal was to define a SQL interface for
Java. Although not the lowest database interface level possible, it is at a low
enough level for higher-level tools and APIs to be created. Conversely, it is at
a high enough level for application programmers to use it confidently.
Attaining this goal allows future tool vendors to "generate" JDBC code
and to hide many of JDBC's complexities from the end user.
2. SQL Conformance
SQL syntax varies as you move from database vendor to database
vendor. In an effort to support a wide variety of vendors, JDBC will allow any
query statement to be passed through it to the underlying database driver.
This allows the connectivity module to handle non-standard functionality in
a manner that is suitable for its users.
3. JDBC must be implementable on top of common database interfaces
The JDBC SQL API must "sit" on top of other common SQL-level APIs.
This goal allows JDBC to use existing ODBC-level drivers by the use of a
software interface. This interface would translate JDBC calls to ODBC and
vice versa.
4. Provide a Java interface that is consistent with the rest of the Java
system
Because of Java's acceptance in the user community thus far, the
designers feel that they should not stray from the current design of the core
Java system.
5. Keep it simple
This goal probably appears in all software design goal listings. JDBC is no
exception. Sun felt that the design of JDBC should be very simple, allowing
for only one method of completing a task per mechanism. Allowing
duplicate functionality only serves to confuse the users of the API.
6. Use strong, static typing wherever possible
Strong typing allows more error checking to be done at compile time,
so fewer errors appear at runtime.
7. Keep the common cases simple
Because, more often than not, the usual SQL calls used by the
programmer are simple SELECTs, INSERTs, DELETEs, and UPDATEs,
these queries should be simple to perform with JDBC. However, more
complex SQL statements should also be possible.
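To illustrate how simple the common case is meant to be, the sketch below runs
a parameterized SELECT through the standard java.sql interfaces. The table name
(accounts), the column name (label), and the JDBC URL are placeholders assumed
for this example; a real program needs the URL and driver of the actual
database.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class JdbcSketch {

    // Hypothetical table and column names, used only for illustration.
    static final String COUNT_SQL = "SELECT COUNT(*) FROM accounts WHERE label = ?";

    // Runs a simple parameterized SELECT on an open connection.
    static int countByLabel(Connection connection, String label) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(COUNT_SQL)) {
            statement.setString(1, label);
            try (ResultSet rows = statement.executeQuery()) {
                return rows.next() ? rows.getInt(1) : 0;
            }
        }
    }

    // Opens a connection for the given JDBC URL and runs the query.
    // Returns -1 when no connection can be made, for example because no
    // driver for the URL is on the class path.
    static int tryCount(String jdbcUrl, String label) {
        try (Connection connection = DriverManager.getConnection(jdbcUrl)) {
            return countByLabel(connection, label);
        } catch (SQLException e) {
            System.err.println("Database unavailable: " + e.getMessage());
            return -1;
        }
    }

    public static void main(String[] args) {
        // Placeholder URL; replace it with the URL of a real data source.
        System.out.println(tryCount("jdbc:hypothetical:spamdb", "spammer"));
    }
}
```

The query code depends only on the java.sql interfaces, so switching to another
database should require changing only the URL and the driver, which matches the
plug-in driver design described earlier.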
Finally, we decided to proceed with the implementation using Java
Networking.
For dynamically updating the cache table, we use an MS Access
database.
Java is two things: a programming language and a platform.
Java is a high-level programming language that is all of the
following:
• Simple
• Architecture-neutral
• Object-oriented
• Portable
• Distributed
• High-performance
• Interpreted
• Multithreaded
• Robust
• Dynamic
• Secure
Java is also unusual in that each Java program is both compiled and
interpreted. With a compiler, you translate a Java program into an
intermediate language called Java byte codes, the platform-independent
code; each instruction is then parsed and run on the computer by the
interpreter.
[Figure: Java Program -> Compiler -> Interpreter -> My Program]
Compilation happens just once; interpretation occurs each time the
program is executed. The figure illustrates how this works.
You can think of Java byte codes as the machine code instructions for
the Java Virtual Machine (Java VM). Every Java interpreter, whether
it's a Java development tool or a Web browser that can run Java applets,
is an implementation of the Java VM. The Java VM can also be
implemented in hardware.
Java byte codes help make "write once, run anywhere" possible. You
can compile your Java program into byte codes on any platform that has
a Java compiler. The byte codes can then be run on any implementation of
the Java VM. For example, the same Java program can run on Windows
NT, Solaris, and Macintosh.
Networking
TCP/IP stack
The TCP/IP stack is shorter than the OSI one:
TCP is a connection-oriented protocol; UDP (User
Datagram Protocol) is a connectionless protocol.
IP datagrams
The IP layer provides a connectionless and unreliable delivery
system. It considers each datagram independently of the others. Any
association between datagrams must be supplied by the higher layers.
The IP layer supplies a checksum that covers its own header. The
header includes the source and destination addresses. The IP layer
handles routing through an Internet. It is also responsible for breaking
up large datagrams into smaller ones for transmission and reassembling
them at the other end.
UDP
UDP is also connectionless and unreliable. What it adds to IP is a
checksum for the contents of the datagram and port numbers. These are
used to give a client/server model - see later.
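As a minimal sketch of UDP's connectionless style, the following Java program
sends one datagram to a port on the local machine and reads it back. The class
and method names are hypothetical; the loopback address and the
system-assigned port are assumptions chosen so that the example needs no
external network.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: one UDP datagram sent and received on the loopback address.
final class UdpSketch {

    static String sendAndReceive(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the system to pick any free port for the receiver.
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // UDP is unreliable, so do not wait forever

            // No connection is set up: the datagram itself carries the
            // destination address and port number.
            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(payload, payload.length,
                    loopback, receiver.getLocalPort()));

            byte[] buffer = new byte[1024];
            DatagramPacket received = new DatagramPacket(buffer, buffer.length);
            receiver.receive(received);
            return new String(received.getData(), 0, received.getLength(),
                    StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAndReceive("ping"));
    }
}
```

The receive timeout is there because nothing in UDP guarantees that a datagram
arrives; a real application has to decide what to do when one is lost.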
TCP
TCP supplies logic to give a reliable connection-oriented protocol
above IP. It provides a virtual circuit that two processes can use to
communicate.
Internet addresses
In order to use a service, you must be able to find it. The Internet
uses an address scheme for machines so that they can be located. The
address is a 32-bit integer which gives the IP address. This encodes a
network ID and more addressing. The network ID falls into various
classes according to the size of the network address.
Network address
Class A uses 8 bits for the network address with 24 bits left over for
other addressing. Class B uses 16 bit network addressing. Class C uses
24 bit network addressing and class D uses all 32.
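The class boundaries above can be read straight off the first octet of an address. The following is a minimal sketch of that rule, not part of the project code; the class name AddressClassSketch and its method names are hypothetical, and only the historical classful scheme described in this section is assumed.

```java
// Hypothetical helper illustrating the classful addressing scheme.
final class AddressClassSketch {

    /** Returns the address class letter for a dotted-quad IPv4 address. */
    static char classOf(String dottedQuad) {
        int firstOctet = Integer.parseInt(dottedQuad.split("\\.")[0]);
        if (firstOctet < 0 || firstOctet > 255) {
            throw new IllegalArgumentException("octet out of range: " + firstOctet);
        }
        if (firstOctet < 128) return 'A'; // leading bit 0
        if (firstOctet < 192) return 'B'; // leading bits 10
        if (firstOctet < 224) return 'C'; // leading bits 110
        if (firstOctet < 240) return 'D'; // leading bits 1110
        return 'E';                       // leading bits 1111
    }

    /** Number of bits used for the network address in classes A, B and C. */
    static int networkBits(char addressClass) {
        switch (addressClass) {
            case 'A': return 8;
            case 'B': return 16;
            case 'C': return 24;
            default:
                throw new IllegalArgumentException("no network/host split modelled for class " + addressClass);
        }
    }
}
```

For example, classOf("172.16.0.1") yields 'B', so networkBits reports 16 bits of network address and 16 bits are left for other addressing.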
Subnet address
Internally, the UNIX network is divided into sub networks. Building
11 is currently on one sub network and uses 10-bit addressing, allowing
1024 different hosts.
Host address
8 bits are finally used for host addresses within our subnet.
This places a limit of 256 machines that can be on the subnet.
Total address
The 32 bit address is usually written as 4 integers separated by dots.
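As an illustration of the dotted notation, the sketch below converts between a 32 bit integer and its four-integer form. The class and method names are hypothetical and chosen only for this example.

```java
// Hypothetical helper: converts between a 32 bit address and dotted notation.
final class DottedQuadSketch {

    /** Formats a 32 bit address as four integers separated by dots. */
    static String toDottedQuad(int address) {
        return ((address >>> 24) & 0xFF) + "."
             + ((address >>> 16) & 0xFF) + "."
             + ((address >>> 8) & 0xFF) + "."
             + (address & 0xFF);
    }

    /** Parses dotted notation back into a 32 bit address. */
    static int fromDottedQuad(String text) {
        String[] parts = text.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected four octets: " + text);
        }
        int address = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            address = (address << 8) | octet; // shift earlier octets up, append this one
        }
        return address;
    }
}
```

Each of the four integers is one 8 bit slice of the address, which is why every value lies between 0 and 255.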
Port addresses
A service exists on a host, and is identified by its port. This is a 16
bit number. To send a message to a server, you send it to the port for
that service of the host that it is running on. This is not location
transparency! Certain of these ports are "well known".
Sockets
A socket is a data structure maintained by the system to handle
network connections. A socket is created using the call socket. It
returns an integer that is like a file descriptor. In fact, under Windows,
this handle can be used with the ReadFile and WriteFile
functions.
#include <sys/types.h>
#include <sys/socket.h>
int socket(int family, int type, int protocol);
Here "family" will be AF_INET for IP communications,
protocol will be zero, and type will depend on whether TCP or
UDP is used. Two processes wishing to communicate over a network
create a socket each. These are similar to two ends of a pipe - but the
actual pipe does not yet exist.
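Since this project is written in Java, the same idea can be sketched with the standard java.net classes instead of the C call above. This is an illustrative sketch only: the class name LoopbackEchoSketch and its methods are hypothetical, and it assumes the loopback interface is available. One process-side socket listens, the other connects, and the "pipe" exists only once the connection is accepted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch: a TCP client and a one-shot server on the loopback address.
final class LoopbackEchoSketch {

    /** Sends one line to a throwaway loopback server and returns the echoed line. */
    static String echoOnce(String line) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the system to pick any free port.
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            Thread server = new Thread(() -> serveOne(listener));
            server.setDaemon(true);
            server.start();

            // The client socket is the other end of the connection.
            try (Socket socket = new Socket(loopback, listener.getLocalPort());
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream()))) {
                out.println(line);
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Accepts a single connection and echoes back the first line received. */
    private static void serveOne(ServerSocket listener) {
        try (Socket peer = listener.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(peer.getInputStream()));
             PrintWriter out = new PrintWriter(peer.getOutputStream(), true)) {
            out.println(in.readLine());
        } catch (IOException e) {
            // Sketch only: a real server would log or report this failure.
        }
    }
}
```

Socket and ServerSocket use TCP, so the connection is reliable; a UDP variant would use DatagramSocket instead.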
JFreeChart
JFreeChart is a free 100% Java chart library that makes it easy for
developers to display professional quality charts in their applications.
JFreeChart's extensive feature set includes:
A consistent and well-documented API, supporting a wide range of
chart types;
A flexible design that is easy to extend, and targets both server-
side and client-side applications;
Support for many output types, including Swing components,
image files (including PNG and JPEG), and vector graphics file formats
(including PDF, EPS and SVG);
JFreeChart is "open source" or, more specifically, free software. It
is distributed under the terms of the GNU Lesser General Public Licence
(LGPL), which permits use in proprietary applications.
1. Map Visualizations
Charts showing values that relate to geographical areas. Some
examples include: (a) population density in each state of the United
States, (b) income per capita for each country in Europe, (c) life
expectancy in each country of the world. The tasks in this project
include:
Sourcing freely redistributable vector outlines for the countries of
the world, states/provinces in particular countries (USA in particular, but
also other areas);
Creating an appropriate dataset interface (plus default
implementation), a renderer, and integrating this with the existing
XYPlot class in JFreeChart;
Testing, documenting, testing some more, documenting some
more.
2. Time Series Chart Interactivity
Implement a new (to JFreeChart) feature for interactive time
series charts --- to display a separate control that shows a small
version of ALL the time series data, with a sliding "view" rectangle
that allows you to select the subset of the time series data to
display in the main chart.
3. Dashboards
There is currently a lot of interest in dashboard displays. Create a flexible
dashboard mechanism that supports a subset of JFreeChart chart types
(dials, pies, thermometers, bars, and lines/time series) that can be delivered
easily via both Java Web Start and an applet.
4. Property Editors
The property editor mechanism in JFreeChart only handles a small
subset of the properties that can be set for charts. Extend (or
reimplement) this mechanism to provide greater end-user control over
the appearance of the charts.
J2ME (Java 2 Micro Edition)
Sun Microsystems defines J2ME as "a highly optimized Java run-time
environment targeting a wide range of consumer products, including pagers,
cellular phones, screen-phones, digital set-top boxes and car navigation
systems." Announced in June 1999 at the JavaOne Developer Conference, J2ME
brings the cross-platform functionality of the Java language to smaller devices,
allowing mobile wireless devices to share applications. With J2ME, Sun has
adapted the Java platform for consumer products that incorporate or are based
on small computing devices.
1. General J2ME architecture
J2ME uses configurations and profiles to customize the Java Runtime
Environment (JRE). As a complete JRE, J2ME comprises a configuration,
which determines the JVM used, and a profile, which defines the application by
adding domain-specific classes. The configuration defines the basic run-time
environment as a set of core classes and a specific JVM that run on specific
types of devices. The profile defines
the application; specifically, it adds domain-specific classes to the J2ME
configuration to define certain uses for devices. Configurations and profiles are
each covered in more depth in their own sections. The following graphic depicts the relationship between the different virtual
machines, configurations, and profiles. It also draws a parallel with the J2SE API
and its Java virtual machine. While the J2SE virtual machine is generally
referred to as a JVM, the J2ME virtual machines, KVM and CVM, are subsets of
JVM. Both KVM and CVM can be thought of as a kind of Java virtual machine;
they are simply shrunken versions of the J2SE JVM and are specific to
J2ME.
2. Developing J2ME applications
In this section, we will go over some considerations you need to
keep in mind when developing applications for smaller devices. We'll take a
look at the way the compiler is invoked when using J2SE to compile J2ME
applications. Finally, we'll explore packaging and deployment and the role
preverification plays in this process.
3. Design considerations for small devices
Developing applications for small devices requires you to keep certain
strategies in mind during the design phase. It is best to strategically design an
application for a small device before you begin coding. Correcting the code
because you failed to consider all of the "gotchas" before developing the
application can be a painful process. Here are some design strategies to
consider:
* Keep it simple. Remove unnecessary features, possibly making those features
a separate, secondary application.
* Smaller is better. This consideration should be a "no brainer" for all
developers. Smaller applications use less memory on the device and require
shorter installation times. Consider packaging your Java applications as
compressed Java Archive (jar) files.
* Minimize run-time memory use. To minimize the amount of memory used at
run time, use scalar types in place of object types. Also, do not depend on the
garbage collector. You should manage the memory efficiently yourself by
setting object references to null when you are finished with them. Another way
to reduce run-time memory is to use lazy instantiation, only allocating objects
on an as-needed basis. Other ways of reducing overall and peak memory use
on small devices are to release resources quickly, reuse objects, and avoid
exceptions.
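The memory advice above can be illustrated with a small sketch. It is not taken from the project; the class LazyBufferSketch and its members are hypothetical. It uses a scalar array rather than a collection of objects, allocates the array only on first use (lazy instantiation), and drops the reference as soon as it is no longer needed.

```java
// Hypothetical sketch of lazy instantiation and early release on a small device.
final class LazyBufferSketch {

    private int[] samples;       // scalar storage, not allocated until first use
    private int allocations = 0; // counts how often the buffer was actually created

    /** Returns the buffer, creating it only when it is first requested. */
    int[] samples() {
        if (samples == null) {
            samples = new int[64];
            allocations++;
        }
        return samples;
    }

    /** Drops the reference so the memory can be reclaimed. */
    void release() {
        samples = null;
    }

    boolean isAllocated() {
        return samples != null;
    }

    int allocations() {
        return allocations;
    }
}
```

An object of this class costs almost nothing until samples() is called, and repeated calls reuse the same array rather than allocating a new one.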
4. Configurations overview
The configuration defines the basic run-time environment as a set of core
classes and a specific JVM that run on specific types of devices. Currently, two
configurations exist for J2ME, though others may be defined in the future:
* Connected Limited Device Configuration (CLDC) is used specifically with the
KVM for 16-bit or 32-bit devices with limited amounts of memory. This is the
configuration (and the virtual machine) used for developing small J2ME
applications. Its size limitations make CLDC more interesting and challenging
(from a development point of view) than CDC. CLDC is also the configuration
that we will use for developing our drawing tool application. An example of a
small wireless device running small applications is a Palm hand-held computer.
* Connected Device Configuration (CDC) is used with the C virtual machine
(CVM) and is used for 32-bit architectures requiring more than 2 MB of
memory. An example of such a device is a Net TV box.
5. J2ME profiles
What is a J2ME profile?
As we mentioned earlier in this tutorial, a profile defines the type of device
supported. The Mobile Information Device Profile (MIDP), for example, defines
classes for cellular phones. It adds domain-specific classes to the J2ME
configuration to define uses for similar devices. Two profiles have been defined
for J2ME and are built upon CLDC: KJava and MIDP. Both KJava and MIDP are
associated with CLDC and smaller devices. Profiles are built on top of
configurations. Because profiles are specific to the size of the device (amount
of memory) on which an application runs, certain profiles are associated with
certain configurations.
A skeleton profile upon which you can create your own profile, the Foundation
Profile, is available for CDC.
Profile 1: KJava
KJava is Sun's proprietary profile and contains the KJava API. The KJava profile is
built on top of the CLDC configuration. The KJava virtual machine, KVM,
accepts the same byte codes and class file format as the classic J2SE virtual
machine. KJava contains a Sun-specific API that runs on the Palm OS. The KJava
API has a great deal in common with the J2SE Abstract Windowing Toolkit
(AWT). However, because it is not a standard J2ME package, its main package is
com.sun.kjava. We'll learn more about the KJava API later in this tutorial when
we develop some sample applications.
Profile 2: MIDP
MIDP is geared toward mobile devices such as cellular phones and pagers. The
MIDP, like KJava, is built upon CLDC and provides a standard run-time
environment that allows new applications and services to be deployed
dynamically on end user devices. MIDP is a common, industry-standard profile
for mobile devices that is not dependent on a specific vendor. It is a complete
and supported foundation for mobile application
development. MIDP contains the following packages, the first four of which
are core CLDC packages, followed by three MIDP-specific packages.
* java.lang
* java.io
* java.util
* javax.microedition.io
* javax.microedition.lcdui
* javax.microedition.midlet
* javax.microedition.rms
Client Server
Overview:
Among the many topics in the field of computing, client-server is one that has generated
more heat than light, and more hype than reality. The technology has acquired a certain critical mass of
attention, with dedicated conferences and magazines. Major computer vendors such as IBM and DEC
have declared that client-server is their main future market. A survey by DBMS magazine revealed that
76% of its readers were actively looking at client-server solutions, alongside growth in the client-server
development tools market from $200 million in 1992 to more than $1.2 billion in 1996.
Client-server implementations are complex, but the underlying concept is simple and powerful. A client is
an application running with local resources but able to request database and related services from a
separate remote server. The software mediating this client-server interaction is often referred to as
MIDDLEWARE.
The typical client is either a PC or a workstation connected through a network to a more powerful PC,
workstation, midrange or mainframe server, usually capable of handling requests from more than one
client. However, in some configurations a server may also act as a client: a server may need to access other
servers in order to process the original client request.
The key client-server idea is that the client, as user, is essentially insulated from the physical location and
formats of the data needed for the application. With the proper middleware, a client input form or report
can transparently access and manipulate both local databases on the client machine and remote databases
on one or more servers. An added bonus is that client-server opens the door to multi-vendor database
access, including heterogeneous table joins.
What is a Client Server
Two prominent systems in existence are client server and file server systems. It is essential to distinguish
between them. Both provide shared network access to data, but the
comparison ends there! The file server simply provides a remote disk drive that can be accessed by LAN
applications on a file-by-file basis. The client server offers full relational database services such as SQL
access, record modification, insert and delete with full relational integrity, backup/restore, and performance for
high volumes of transactions. The client server middleware provides a flexible interface between client
and server: who does what, when and to whom.
Why Client Server
Client server has evolved to solve a problem that has been around since the earliest days of computing:
how best to distribute your computing, data generation and data storage resources in order to obtain
efficient, cost-effective departmental and enterprise-wide data processing. During the mainframe era, choices
were quite limited. A central machine housed both the CPU and the data (cards, tapes, drums and later
disks). Access to these resources was initially confined to batched runs that produced departmental
reports at the appropriate intervals. A strong central information service department ruled the
corporation. The role of the rest of the corporation was limited to requesting new or more frequent reports
and to providing handwritten forms from which the central data banks were created and updated. The
earliest client server solutions therefore could best be characterized as "SLAVE-MASTER".
Time-sharing changed the picture. Remote terminals could view and even change the
central data, subject to access permissions. And, as the central data banks evolved
into sophisticated relational databases with non-programmer query languages,
online users could formulate ad hoc queries and produce local reports without
adding to the MIS applications software backlog. However, remote access was
through dumb terminals, and the client-server relationship remained a
"slave-master" one.
Front end or User Interface Design
The entire user interface is planned to be developed in a browser-specific environment
with a touch of intranet-based architecture for achieving the distributed concept.
The browser-specific components are designed using the HTML standards, and
their dynamic behaviour is designed using the constructs of Java Server
Pages.
Communication or Database Connectivity Tier
The communication architecture is designed around the standards of
Servlets and Enterprise Java Beans. The database connectivity is established
using Java Database Connectivity.
The standards of three-tier architecture are given major attention in order to maintain
high cohesion and limited coupling for effective operation.
Features of The Language Used
In my project, I have chosen the Java language for developing the code.
About Java
Initially the language was called "Oak", but it was renamed "Java" in 1995. The primary motivation for
this language was the need for a platform-independent (i.e., architecture-neutral) language that could be
used to create software to be embedded in various consumer electronic devices.
* Java is a programmer's language.
* Java is cohesive and consistent.
* Except for those constraints imposed by the Internet environment, Java gives the
programmer full control.
Finally, Java is to Internet programming what C was to system programming.
Importance of Java to the Internet
Java has had a profound effect on the Internet. This is because Java expands the universe of objects that
can move about freely in cyberspace. In a network, two categories of objects are transmitted between
the server and the personal computer: passive information and dynamic, active programs. The
dynamic, self-executing programs cause serious problems in the areas of security and portability. But
Java addresses those concerns and, by doing so, has opened the door to an exciting new form of program
called the applet.
Java can be used to create two types of programs
Applications and Applets: An application is a program that runs on our computer under the operating
system of that computer. It is more or less like one created using C or C++. Java's ability to create applets
makes it important. An applet is an application designed to be transmitted over the Internet and executed
by a Java-compatible web browser. An applet is actually a tiny Java program, dynamically downloaded
across the network, just like an image. But the difference is that it is an intelligent program, not just a media
file. It can react to user input and change dynamically.
Features Of Java
Security
Every time that you download a "normal" program, you are risking a viral
infection. Prior to Java, most users did not download executable programs
frequently, and those who did scanned them for viruses prior to execution. Most
users still worried about the possibility of infecting their systems with a virus. In
addition, another type of malicious program exists that must be guarded against.
This type of program can gather private information, such as credit card numbers,
bank account balances, and passwords. Java answers both these concerns by
providing a "firewall" between a network application and your computer.
When you use a Java-compatible Web browser, you can safely download Java applets
without fear of virus infection or malicious intent.
Portability
For programs to be dynamically downloaded to all the various types of platforms
connected to the Internet, some means of generating portable executable code is
needed. As you will see, the same mechanism that helps ensure security also helps
create portability. Indeed, Java's solution to these two problems is both elegant and
efficient.
The Byte code
The key that allows Java to solve the security and portability problems is that
the output of the Java compiler is byte code. Byte code is a highly optimized set of
instructions designed to be executed by the Java run-time system, which is called
the Java Virtual Machine (JVM). That is, in its standard form, the JVM is an
interpreter for byte code.
Translating a Java program into byte code makes it much easier to run a
program in a wide variety of environments. The reason is that, once the run-time
package exists for a given system, any Java program can run on it.
Although Java was designed for interpretation, there is technically nothing about Java that prevents
on-the-fly compilation of byte code into native code. Sun provides a Just In Time (JIT) compiler for
byte code. When the JIT compiler is a part of the JVM, it compiles byte code into executable code in real time,
on a piece-by-piece, demand basis. It is not possible to compile an entire Java program into executable
code all at once, because Java performs various checks that can be done only at run time. The JIT
compiles code as it is needed, during execution.
Java Virtual Machine (JVM)
Beyond the language, there is the Java virtual machine. The Java virtual machine is an important element
of the Java technology. The virtual machine can be embedded within a web browser or an operating
system. Once a piece of Java code is loaded onto a machine, it is verified. As part of the loading process, a
class loader is invoked and performs byte code verification, which makes sure that the code
generated by the compiler will not corrupt the machine that it is loaded on. Byte code verification takes
place after compilation, when the class is loaded, to make sure that the byte code is accurate and correct. So byte code
verification is integral to the compiling and executing of Java code.
Overall Description
Picture showing the development process of a Java program
Java programming involves producing byte codes and executing them. The first box indicates that the Java
source code is located in a .java file that is processed with a Java compiler called javac. The Java compiler
produces a file called a .class file, which contains the byte code. The .class file is then loaded across the
network or loaded locally on your machine into the execution environment, the Java virtual machine,
which interprets and executes the byte code.
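One way to see that a .class file really is a compiler-produced binary is to inspect its first four bytes, which the class file format fixes to the value 0xCAFEBABE. The sketch below is illustrative only; ClassFileHeaderSketch and magicOf are hypothetical names, and the example assumes the class file of java.lang.Object can be read as a resource from the running JVM.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch: reads the 4-byte magic number at the start of a .class file.
final class ClassFileHeaderSketch {

    /** Returns the first four bytes of the named class file as an int. */
    static int magicOf(String resourcePath) {
        try (InputStream raw = ClassLoader.getSystemResourceAsStream(resourcePath)) {
            if (raw == null) {
                throw new IllegalArgumentException("class file not found: " + resourcePath);
            }
            // Class files store multi-byte values big-endian, as readInt expects.
            return new DataInputStream(raw).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int magic = magicOf("java/lang/Object.class");
        System.out.println(Integer.toHexString(magic));
    }
}
```

Everything after this header (constant pool, methods, byte code instructions) is what the virtual machine verifies and then executes.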
Java Architecture
Java architecture provides a portable, robust, high-performing environment for development. Java
provides portability by compiling source code to byte codes for the Java Virtual Machine, which are then interpreted
on each platform by the run-time environment. Java is a dynamic system, able to load code when needed
from a machine in the same room or across the planet.
Compilation of code
When you compile the code, the Java compiler creates machine code (called byte code) for a hypothetical
machine called the Java Virtual Machine (JVM). The JVM executes the byte code and was
created to overcome the issue of portability. The code is written and compiled once and
interpreted on all machines.
Compiling and interpreting Java Source Code
During run-time the Java interpreter tricks the byte code file into thinking that it is running on a Java
Virtual Machine. In reality this could be an Intel Pentium running Windows 95, a Sun SPARCstation running Solaris,
or an Apple Macintosh running its own operating system, and all could receive code from any computer through the Internet and
run the applets.
Simple
Java was designed to be easy for the professional programmer to learn and to use effectively. If you are an
experienced C++ programmer, learning Java will be even easier, because Java inherits the C/C++ syntax
and many of the object-oriented features of C++. Most of the confusing concepts from C++ are either left
out of Java or implemented in a cleaner, more approachable manner. In Java there are a small number of
clearly defined ways to accomplish a given task.
Object-Oriented
Java was not designed to be source-code compatible with any other language. This allowed the Java team
the freedom to design with a blank slate. One outcome of this was a clean usable, pragmatic approach to
objects. The object model in Java is simple and easy to extend, while simple types, such as integers, are
kept as high-performance non-objects.
Robust
The multi-platform environment of the Web places extraordinary demands on a program, because the
program must execute reliably in a variety of systems. The ability to create robust programs was given a
high priority in the design of Java. Java is a strictly typed language; it checks your code at compile time and
at run time.
Picture showing platform-specific compilers (PC, Macintosh, SPARC) producing platform-independent Java byte code from source code, which the Java interpreters for each platform (PC, Macintosh, SPARC) then execute.
Java virtually eliminates the problems of memory management and deallocation, which is completely
automatic. In a well-written Java program, all run-time errors can, and should, be managed by your
program.
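As a small illustration of managing run-time errors inside the program, the sketch below turns two common failures (malformed numeric input and an out-of-range array index) into fallback values instead of letting them terminate the program. The class and method names are hypothetical and are not part of the project code.

```java
// Hypothetical sketch: run-time errors are caught and handled by the program itself.
final class RobustParsingSketch {

    /** Parses an integer, returning the fallback if the text is missing or malformed. */
    static int parseOrDefault(String text, int fallback) {
        if (text == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return fallback; // the run-time check failed; recover instead of crashing
        }
    }

    /** Reads an array element, returning the fallback if the index is out of range. */
    static int elementOrDefault(int[] values, int index, int fallback) {
        try {
            return values[index]; // Java checks the index at run time
        } catch (ArrayIndexOutOfBoundsException e) {
            return fallback;
        }
    }
}
```

The compile-time half of the story is the type system: passing a String where an int[] is expected is rejected by the compiler before the program ever runs.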
JAVASCRIPT
JavaScript is a script-based programming language that was developed by Netscape
Communications Corporation. JavaScript was originally called LiveScript and
renamed as JavaScript to indicate its relationship with Java. JavaScript supports the
development of both client and server components of Web-based applications. On
the client side, it can be used to write programs that are executed by a Web browser
within the context of a Web page. On the server side, it can be used to write Web
server programs that can process information submitted by a Web browser and then
update the browser's display accordingly.
Even though JavaScript supports both client and server Web programming, we
prefer JavaScript for client-side programming since most browsers support it.
JavaScript is almost as easy to learn as HTML, and JavaScript statements can be
included in HTML documents by enclosing the statements between a pair of scripting
tags
<SCRIPT>..</SCRIPT>.
<SCRIPT LANGUAGE = "JavaScript">
JavaScript statements
</SCRIPT>
Here are a few things we can do with JavaScript:
* Validate the contents of a form and make calculations.
* Add scrolling or changing messages to the browser's status line.
* Animate images or rotate images that change when we move the
mouse over them.
* Detect the browser in use and display different content for different
browsers.
* Detect installed plug-ins and notify the user if a plug-in is required.
We can do much more with JavaScript, including creating entire applications.
JavaScript Vs Java
JavaScript and Java are entirely different languages. A few of the most glaring
differences are:
* Java applets are generally displayed in a box within the web document;
JavaScript can affect any part of the Web document itself.
* JavaScript is best suited to simple applications and adding
interactive features to Web pages, while Java can be used for incredibly
complex applications.
There are many other differences, but the important thing to remember is that
JavaScript and Java are separate languages. They are both useful for different
things; in fact they can be used together to combine their advantages.
ADVANTAGES
* JavaScript can be used for server-side and client-side scripting.
* It is more flexible than VBScript.
* JavaScript is the default scripting language on the client side since all
browsers support it.
Hyper Text Markup Language
Hypertext Markup Language (HTML), the language of the World Wide Web (WWW),
allows users to produce Web pages that include text, graphics and pointers to other
Web pages (hyperlinks).
HTML is not a programming language; it is an application of ISO Standard 8879,
SGML (Standard Generalized Markup Language), specialized to hypertext and
adapted to the Web. The idea behind hypertext is that instead of reading text in a
rigid linear structure, we can easily jump from one point to another. We can
navigate through the information based on our interest and preference. A markup
language is simply a series of elements, each delimited with special characters that
define how text or other items enclosed within the elements should be displayed.
Hyperlinks are underlined or emphasized words that lead to other documents or
to some portion of the same document.
HTML can be used to display any type of document on the host computer, which can
be geographically at a different location. It is a versatile language and can be used
on any platform or desktop.
HTML provides tags (special codes) to make the document look attractive. HTML
tags are not case-sensitive. Using graphics, fonts, different sizes, color, etc., can
enhance the presentation of the document. Anything that is not a tag is part of the
document itself.
Basic HTML Tags:
<!-- -->                 Specifies comments
<A>...</A>               Creates hypertext links
<B>...</B>               Formats text as bold
<BIG>...</BIG>           Formats text in large font
<BODY>...</BODY>         Contains all tags and text in the HTML document
<CENTER>...</CENTER>     Centers text
<DD>...</DD>             Definition of a term
<DL>...</DL>             Creates definition list
<FONT>...</FONT>         Formats text with a particular font
<FORM>...</FORM>         Encloses a fill-out form
<FRAME>                  Defines a particular frame in a set of frames
<H#>...</H#>             Creates headings of different levels
<HEAD>...</HEAD>         Contains tags that specify information about a document
<HR>                     Creates a horizontal rule
<HTML>...</HTML>         Contains all other HTML tags
<META>                   Provides meta-information about a document
<SCRIPT>...</SCRIPT>     Contains client-side or server-side script
<TABLE>...</TABLE>       Creates a table
<TD>...</TD>             Indicates table data in a table
<TR>...</TR>             Designates a table row
<TH>...</TH>             Creates a heading in a table
ADVANTAGES
* An HTML document is small and hence easy to send over the net. It is
small because it does not include formatted information.
* HTML is platform independent.
* HTML tags are not case-sensitive.
Java Database Connectivity
What Is JDBC?
JDBC is a Java API for executing SQL statements. (As a point of interest, JDBC is a
trademarked name and is not an acronym; nevertheless, JDBC is often thought of as
standing for Java Database Connectivity.) It consists of a set of classes and interfaces
written in the Java programming language. JDBC provides a standard API for
tool/database developers and makes it possible to write database applications using
a pure Java API.
Using JDBC, it is easy to send SQL statements to virtually any relational database.
One can write a single program using the JDBC API, and the program will be able to
send SQL statements to the appropriate database. The combination of Java and
JDBC lets a programmer write it once and run it anywhere.
What Does JDBC Do?
Simply put, JDBC makes it possible to do three things:
* Establish a connection with a database
* Send SQL statements
* Process the results.
JDBC versus ODBC and other APIs
At this point, Microsoft's ODBC (Open Database Connectivity) API is probably
the most widely used programming interface for accessing relational databases. It
offers the ability to connect to almost all databases on almost all platforms.
So why not just use ODBC from Java? The answer is that you can use ODBC from Java, but this is best
done with the help of JDBC in the form of the JDBC-ODBC Bridge, which we will cover shortly. The
question now becomes "Why do you need JDBC?" There are several answers to this question:
1. ODBC is not appropriate for direct use from Java because it uses a C
interface. Calls from Java to native C code have a number of drawbacks in the
security, implementation, robustness, and automatic portability of
applications.
2. A literal translation of the ODBC C API into a Java API would not be desirable.
For example, Java has no pointers, and ODBC makes copious use of them,
including the notoriously error-prone generic pointer "void *". You can think
of JDBC as ODBC translated into an object-oriented interface that is natural
for Java programmers.
3. ODBC is hard to learn. It mixes simple and advanced features together, and it
has complex options even for simple queries. JDBC, on the other hand, was
designed to keep simple things simple while allowing more advanced
capabilities where required.
4. A Java API like JDBC is needed in order to enable a "pure Java" solution.
When ODBC is used, the ODBC driver manager and drivers must be manually
installed on every client machine. When the JDBC driver is written completely
in Java, however, JDBC code is automatically installable, portable, and secure
on all Java platforms from network computers to mainframes.
Two-tier and Three-tier Models
The JDBC API supports both two-tier and three-tier models for
database access.
In the two-tier model, a Java applet or application talks directly to the database.
This requires a JDBC driver that can communicate with the particular database
management system being accessed. A user's SQL statements are delivered to the
database, and the results of those statements are sent back to the user. The
database may be located on another machine to which the user is connected via a
network. This is referred to as a client/server configuration, with the user's machine
as the client, and the machine housing the database as the server. The network can
be an Intranet, which, for example, connects employees within a corporation, or it
can be the Internet.
Picture showing the two-tier model: a Java application and JDBC on the client machine, communicating with the database over a DBMS-proprietary protocol.
In the three-tier model, commands are sent to a "middle tier" of services, which
then send SQL statements to the database. The database processes the SQL
statements and sends the results back to the middle tier, which then sends them to
the user. MIS directors find the three-tier model very attractive because the middle
tier makes it possible to maintain control over access and the kinds of updates that
can be made to corporate data. Another advantage is that when there is a middle
tier, the user can employ an easy-to-use higher-level API which is translated by the
middle tier into the appropriate low-level calls. Finally, in many cases the three-tier
architecture can provide performance advantages.
Until now the middle tier has typically been written in languages such as C or
C++, which offer fast performance. However, with the introduction of optimizing
compilers that translate Java byte code into efficient machine-specific code, it is
becoming practical to implement the middle tier in Java. This is a big plus, making it
possible to take advantage of Java's robustness, multithreading, and security
features. JDBC is important to allow database access from a Java middle tier.
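As a rough sketch of the three-tier idea, the example below puts a middle-tier service between the caller and the data tier, so that access control lives in one place and the caller uses a higher-level call. All names are hypothetical, and an in-memory map stands in for the database; in a real system the data tier would issue SQL through JDBC.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ThreeTierSketch {
    /** Data tier: in a real system this would issue SQL through JDBC. */
    interface UserStore {
        boolean setBlocked(String userName, boolean blocked);
        boolean isBlocked(String userName);
    }

    /** In-memory stand-in for the database, used only for illustration. */
    static final class InMemoryUserStore implements UserStore {
        private final Map<String, Boolean> blockedByUser = new HashMap<>();

        InMemoryUserStore(List<String> userNames) {
            for (String name : userNames) blockedByUser.put(name, false);
        }

        @Override public boolean setBlocked(String userName, boolean blocked) {
            // replace() only updates existing keys, so unknown users are left alone.
            return blockedByUser.replace(userName, blocked) != null;
        }

        @Override public boolean isBlocked(String userName) {
            return blockedByUser.getOrDefault(userName, false);
        }
    }

    /** Middle tier: the only path from clients to the data tier. */
    static final class AccountService {
        private final UserStore store;
        private final Set<String> admins;

        AccountService(UserStore store, Set<String> admins) {
            this.store = store;
            this.admins = admins;
        }

        /** Higher-level call offered to clients; only admins may change data. */
        boolean blockAccount(String caller, String target) {
            if (!admins.contains(caller)) {
                throw new SecurityException(caller + " may not block accounts");
            }
            return store.setBlocked(target, true);
        }
    }
}
```

Because clients never touch UserStore directly, the rule about who may update data is enforced in the middle tier rather than in every client.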
JDBC Driver Types
The JDBC drivers that we are aware of at this time fit into one of four
categories:
• JDBC-ODBC bridge plus ODBC driver
• Native-API partly-Java driver
• JDBC-Net pure Java driver
• Native-protocol pure Java driver
[Figure: three-tier model. A Java applet or HTML browser on the client machine (GUI) makes HTTP, RMI, or CORBA calls to a Java application server on the server machine (business logic), which uses JDBC to reach the DBMS on the database server over a DBMS-proprietary protocol.]
JDBC-ODBC Bridge
If possible, use a pure Java JDBC driver instead of the Bridge and an ODBC
driver. This completely eliminates the client configuration required by ODBC. It also
eliminates the potential that the Java VM could be corrupted by an error in the
native code brought in by the Bridge (that is, the Bridge native library, the ODBC
driver manager library, the ODBC driver library, and the database client library).
What Is the JDBC-ODBC Bridge?
The JDBC-ODBC Bridge is a JDBC driver that implements JDBC operations
by translating them into ODBC operations. To ODBC it appears as a normal
application program. The Bridge implements JDBC for any database for which an
ODBC driver is available. The Bridge is implemented as the
sun.jdbc.odbc Java package and contains a native library used to access
ODBC. The Bridge is a joint development of Intersolv and JavaSoft.
Java Server Pages (JSP)
Java Server Pages is a simple, yet powerful technology for creating and
maintaining dynamic-content web pages. Based on the Java programming language,
Java Server Pages offers proven portability, open standards, and a mature re-usable
component model. The Java Server Pages architecture enables the separation of
content generation from content presentation. This separation not only eases
maintenance headaches, it also allows web team members to focus on their areas of
expertise. Web page designers can concentrate on layout, and web application
designers on programming, with minimal concern about impacting each other's
work.
Features of JSP
Portability:
Java Server Pages files can be run on any web server or web-enabled
application server that provides support for them. Dubbed the JSP engine, this
support involves recognition, translation, and management of the Java Server Page
lifecycle and its interaction components.
Components
It was mentioned earlier that the Java Server Pages architecture can include
reusable Java components. The architecture also allows for the embedding of a
scripting language directly into the Java Server Pages file. The components currently
supported include JavaBeans and Servlets.
Processing
A Java Server Pages file is essentially an HTML document with JSP scripting or tags.
The file has a .jsp extension, which identifies it to the server as a Java Server Pages
file. Before the page is served, the Java Server Pages syntax is parsed and
processed into a Servlet on the server side. The generated Servlet outputs
real content in straight HTML in response to the client.
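To make the translation step more concrete, here is a toy translator that turns a JSP-like page into the statements a generated servlet's service method might contain. It is a simplified illustration and not how any real JSP engine is implemented: it handles only template text, expression tags (<%= ... %>), and scriptlets (<% ... %>); directives, declarations, and comments are not handled; and the writer variable "out" and the class name are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class ToyJspTranslator {
    /** Translates a JSP-like page into the statements of a service method body. */
    static List<String> translate(String page) {
        List<String> body = new ArrayList<>();
        int pos = 0;
        while (pos < page.length()) {
            int open = page.indexOf("<%", pos);
            if (open < 0) {                         // no more tags: the rest is template text
                emitText(body, page.substring(pos));
                break;
            }
            emitText(body, page.substring(pos, open));
            int close = page.indexOf("%>", open + 2);
            if (close < 0) {
                throw new IllegalArgumentException("unterminated <% tag at index " + open);
            }
            String inner = page.substring(open + 2, close);
            if (inner.startsWith("=")) {            // expression tag: print its value
                body.add("out.print(" + inner.substring(1).trim() + ");");
            } else {                                // scriptlet: copy the Java code as-is
                body.add(inner.trim());
            }
            pos = close + 2;
        }
        return body;
    }

    /** Template text becomes a write of a string literal. */
    private static void emitText(List<String> body, String text) {
        if (!text.isEmpty()) {
            body.add("out.write(\"" + escape(text) + "\");");
        }
    }

    /** Escapes the characters that would break a Java string literal. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }
}
```

For the page `<p>Hello <%= name %></p>` the translator produces three statements: a write of the leading HTML, `out.print(name);`, and a write of the closing tag. This mirrors the idea that static HTML is passed through while embedded Java is executed on the server.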
Access Models:
A Java Server Pages file may be accessed in at least two different ways. In one
scenario, a client's request comes directly into a Java Server Page. Suppose the page
accesses reusable JavaBean components that perform particular well-defined
computations, such as accessing a database. The results of the Bean's computations,
called result sets, are stored within the Bean as properties. The page uses such Beans
to generate dynamic content and present it back to the client.
In both of the above cases, the page could also contain any valid Java code. Java
Server Pages architecture encourages separation of content from presentation.
Steps in the execution of a JSP Application:
1. The client sends a request to the web server for a JSP file by giving the name
of the JSP file within the form tag of an HTML page.
2. This request is transferred to the JavaWebServer. On the server side, the
JavaWebServer receives the request and, if it is a request for a JSP file, passes
it to the JSP engine.
3. The JSP engine is a program that understands the tags of the JSP file. It
converts those tags into a Servlet program, which is stored on the server side.
This Servlet is loaded into memory and executed, and the result is
given back to the JavaWebServer and then transferred back to the
client.
JDBC connectivity
JDBC provides database-independent connectivity between the J2EE platform
and a wide range of tabular data sources. JDBC technology allows an Application
Component Provider to:
• Perform connection and authentication to a database server
• Manage transactions
• Move SQL statements to a database engine for preprocessing and
execution
• Execute stored procedures
• Inspect and modify the results from Select statements.
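The transaction-management capability listed above can be sketched as follows. The schema (tables "users" and "block_log") and the method name are hypothetical; only standard java.sql calls are used. The method takes an already open Connection, so it works with any JDBC driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class BlockAccountTransaction {
    /**
     * Marks an account as blocked and records the reason as one unit of work.
     * Hypothetical schema: users(user_name, blocked) and block_log(user_name, reason).
     */
    static void blockWithReason(Connection conn, String userName, String reason)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);                 // start an explicit transaction
        try (PreparedStatement update = conn.prepareStatement(
                     "UPDATE users SET blocked = 1 WHERE user_name = ?");
             PreparedStatement insert = conn.prepareStatement(
                     "INSERT INTO block_log (user_name, reason) VALUES (?, ?)")) {
            update.setString(1, userName);
            update.executeUpdate();
            insert.setString(1, userName);
            insert.setString(2, reason);
            insert.executeUpdate();
            conn.commit();                         // both statements take effect together
        } catch (SQLException e) {
            conn.rollback();                       // or neither does
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit); // restore the caller's setting
        }
    }
}
```

Prepared statements also cover the "preprocessing" point in the list: the SQL text is sent to the database engine once and the parameters are bound separately.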
Tomcat 6.0 web server
Tomcat is an open source web server developed by the Apache Software Foundation. Apache
Tomcat is the servlet container that is used in the official Reference
Implementation for the Java Servlet and Java Server Pages technologies. The
Java Servlet and Java Server Pages specifications are developed by Sun under
the Java Community Process. Web servers like Apache Tomcat support only web
components, while an application server supports web components as well as
business components (BEA's WebLogic is one of the popular application servers).
To develop a web application with JSP/servlets, install a web server such as JRun or
Tomcat to run your application.
7. Modules
IMPLEMENTATION
• Tweet Admin
In this module, the Admin has to log in using a valid user name and
password. After a successful login, the Admin can perform the following operations:
o View users and authorize them (each user name links to the user's profile)
o View all users' friend requests and responses
o Add spam filter names
o View all spamming accounts with profile details, and block them
o View the details of all users requesting to be unblocked, in decision tree
format, and unblock a user by clicking the user name
o View all users' tweet topics with interactions and scores
o View all spam accounts (based on virus or malware) and normal accounts,
with reasons based on the random forest tree
o View all spamming and normal behaviors based on interactions, by filter
name, with a link to a chart showing the number of both kinds of users
o View all spamming and normal behaviors based on tweet metadata, by filter
name, with a link to a chart showing the number of both kinds of users
o View the number of spamming accounts and normal accounts in a chart
Friend Request & Response
In this module, the admin can view all the friend requests and responses.
All the requests and responses are displayed with their tags, such
as Id, requested user photo, requested user name, user name request to,
status, and time & date. If the user accepts the request, the status
changes to accepted; otherwise the status remains as waiting.
• User
In this module, there are n users. A user should
register before performing any operations. Once a user registers, their
details are stored in the database. After successful registration, the user has
to log in using an authorized user name and password. Once login is
successful, the user can perform the following operations:
o View Your Profile with community
o Search Friends based on community
o View Friend Request and Response
o View My Friends based on community
o Create Tweet Topic with tweet_postname, TAbout, TUses, tcontent desc, Browse
MetaData_desc, TweetURL, TDate and Time, TOwner, and add TImage
o Search Tweet Topic by keyword, give your interactions (the score increases
while viewing), and view the URL to see the web page
o View all your tweet topics with others' interactions and scores
o View all your friends' tweet topics with others' interactions and scores, and
give your own interactions
o View all similar friends' tweet topics
o Show the topics of all friends with spamming behaviors, with profile
Searching Users to make friends
In this module, the user searches for users in the same network and in other
networks and sends friend requests to them. The user can search for users
in other networks to make friends only if they have permission.
8. Architecture Diagram
[Figure: architecture diagram. The Remote User and the Admin (Tweet Admin) interact with the WEB Database. Labels in the figure: "Accepting all user Information", "View user data details", "Store and retrievals", "Process all user queries", "Authorize the Admin", "Registering the User". The Tweet Admin and Remote User boxes list the operations described under Modules.]
9. Class Diagram
• UML diagram representing object classes and their relationships.
[Figure: class diagram. Classes recoverable from the figure:
Login: members User Name, Password; methods Login(), Reset(), Register().
Register: members User Name, Password, E-mail, Mobile, Address, DOB, Gender, Pin code, Image; methods Register(), Reset().
Tweet Admin: members tweet_postname, TAbout, TUses, tcontent desc, MetaData_desc, TweetURL, TDate and Time, TOwner, TImage; methods are the Tweet Admin operations listed under Modules.
End User: the same tweet members as Tweet Admin; methods are the End User operations listed under Modules.]
10. Use Case Diagram
• Overview of the system's functional requirements.
[Figure: use case diagram. Actors: Tweet Admin and End User. Use cases shown include: View Users and Authorize; View all Users' Friend Requests and Responses; Add Spam Filter name; View All spamming accounts; View All Unblock request users' details; View all Users' Tweet Topics with Interactions; Search Friends based on community; View Friends; Create Tweet Topic; View all your Tweet Topics with other Interactions and scores; View all your Friends' Tweet Topics.]
11. Sequence Diagram
• Flow of operations with time-ordering between objects.
[Figure: sequence diagram. Participants: User, Web Server, Tweet Admin. Messages shown include: Register and Login; View all Users and authorize; Add Spam Filter name; Search Tweet Topic; View Friend Request and Response; View My Friends based on community; View All spamming accounts; View All Unblock request users' details; View All Spam Accounts (based on virus, malware) and Normal Accounts; View All Spamming and Normal Behaviors based on Interactions and on Tweet Meta Data by Filter Name; View Number of Spamming and Normal Accounts in Chart; show all Spamming behaviors friends' Topics with profile.]
12. Flowchart
• Logical flow of operations/processes in the system.
[Figure: flow chart (User). Start, User Register, Login, then a Yes/No check of the user name and password; a failed check leads to "Username & Password Wrong", and a successful one leads to the operations available after login.]
[Figure: flow chart (Admin). Start, Admin Login, then a Yes/No login check; a failed check leads to "Username & Password Wrong", and a successful one leads to the operations available after login, ending with Log Out.]
13. Data Flow Diagram (DFD)
[Figure: data flow diagram. The Admin and the End User each register and log in to the System and exchange Request and Response flows with it. Admin-side processes include View Users and Authorize, Add Spam Filter name, View All spamming accounts, View All Unblock request users' details, and the views of spamming and normal behaviors by filter name. End User-side processes include Register with the system, Create Tweet Topic, View Their Own Details, Search Friends based on community, Search Tweet Topic, View Friend Request and Response, View My Friends based on community, and View all your Tweet Topics with other Interactions and scores.]
14. System Testing
6. SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying
to discover every conceivable fault or weakness in a work product. It provides a
way to check the functionality of components, sub-assemblies, assemblies,
and/or a finished product. It is the process of exercising software with the intent
of ensuring that the software system meets its requirements and user
expectations and does not fail in an unacceptable manner. There are various
types of tests. Each test type addresses a specific testing requirement.
TYPES OF TESTS
Unit testing
Unit testing involves the design of test cases that validate that the
internal program logic is functioning properly, and that program inputs produce
valid outputs. All decision branches and internal code flow should be validated.
It is the testing of individual software units of the application. It is done after
the completion of an individual unit, before integration. This is structural
testing that relies on knowledge of the unit's construction and is invasive. Unit tests
perform basic tests at component level and test a specific business process,
application, and/or system configuration. Unit tests ensure that each unique
path of a business process performs accurately to the documented
specifications and contains clearly defined inputs and expected results.
Integration testing
Integration tests are designed to test integrated software components to
determine if they actually run as one program. Testing is event driven and is
more concerned with the basic outcome of screens or fields. Integration tests
demonstrate that although the components were individually satisfactory, as
shown by successful unit testing, the combination of components is correct
and consistent. Integration testing is specifically aimed at exposing the
problems that arise from the combination of components.
Functional test
Functional tests provide systematic demonstrations that functions tested
are available as specified by the business and technical requirements, system
documentation, and user manuals.
Functional testing is centered on the following items:
Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be
exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements,
key functions, or special test cases. In addition, systematic coverage of
identified business process flows, data fields, predefined processes, and
successive processes must be considered for testing. Before functional testing
is complete, additional tests are identified and the effective value of current
tests is determined.
System Test
System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results.
An example of system testing is the configuration oriented system integration
test. System testing is based on process descriptions and flows, emphasizing
pre-driven process links and integration points.
White Box Testing
White Box Testing is testing in which the software tester has
knowledge of the inner workings, structure, and language of the software, or at
least its purpose. It is used to test areas that cannot be reached
from a black box level.
Black Box Testing
Black Box Testing is testing the software without any knowledge of the
inner workings, structure, or language of the module being tested. Black box
tests, like most other kinds of tests, must be written from a definitive source
document, such as a specification or requirements document. It is testing in
which the software under test is treated as a black box: you cannot "see" into
it. The test provides inputs and responds to outputs without considering how
the software works.
6.1 Unit Testing:
Unit testing is usually conducted as part of a combined code and unit
test phase of the software lifecycle, although it is not uncommon for coding
and unit testing to be conducted as two distinct phases.
Test strategy and approach
Field testing will be performed manually and functional tests will be
written in detail.
Test objectives
• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages, and responses must not be delayed.
Features to be tested
• Verify that the entries are of the correct format.
• No duplicate entries should be allowed.
• All links should take the user to the correct page.
6.2 Integration Testing
Software integration testing is the incremental integration testing of two
or more integrated software components on a single platform to expose
failures caused by interface defects.
The task of the integration test is to check that components or software
applications (for example, components in a software system or, one step up,
software applications at the company level) interact without error.
Test Results: All the test cases mentioned above passed successfully. No
defects encountered.
6.3 Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires
significant participation by the end user. It also ensures that the system meets
the functional requirements.
Test Results: All the test cases mentioned above passed successfully. No
defects encountered.
SYSTEM TESTING
TESTING METHODOLOGIES
The following are the Testing Methodologies:
o Unit Testing.
o Integration Testing.
o User Acceptance Testing.
o Output Testing.
o Validation Testing.
Unit Testing
Unit testing focuses verification effort on the smallest unit of software design,
that is, the module. Unit testing exercises specific paths in a module's control structure to
ensure complete coverage and maximum error detection. This test focuses on each module
individually, ensuring that it functions properly as a unit. Hence the name Unit Testing.
During this testing, each module is tested individually and the module
interfaces are verified for consistency with the design specification. All important processing
paths are tested for the expected results. All error handling paths are also tested.
Integration Testing
Integration testing addresses the issues associated with the dual problems of
verification and program construction. After the software has been integrated a set of high
order tests are conducted. The main objective in this testing process is to take unit-tested
modules and build a program structure that has been dictated by design.
The following are the types of Integration Testing:
1. Top-down Integration
This method is an incremental approach to the construction of program structure.
Modules are integrated by moving downward through the control hierarchy, beginning with
the main program module. The modules subordinate to the main program module are
incorporated into the structure in either a depth-first or breadth-first manner.
In this method, the software is tested from the main module, and individual stubs
are replaced as the test proceeds downwards.
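A small sketch of the stub idea in top-down integration: the main module is tested first, with a stub standing in for a subordinate module that is not yet integrated. The module names and the stub's canned rule are hypothetical and chosen only to fit this project's domain.

```java
final class TopDownIntegration {
    /** Subordinate module that the main module depends on. */
    interface SpamClassifier {
        boolean isSpammer(String userName);
    }

    /** Main (top-level) module under test. */
    static final class AdminDashboard {
        private final SpamClassifier classifier;

        AdminDashboard(SpamClassifier classifier) {
            this.classifier = classifier;
        }

        String statusLine(String userName) {
            return userName + ": " + (classifier.isSpammer(userName) ? "SPAM" : "NORMAL");
        }
    }

    /** Stub: canned answers standing in for the unfinished classifier. */
    static final class StubClassifier implements SpamClassifier {
        @Override public boolean isSpammer(String userName) {
            return userName.startsWith("bot");   // fixed rule, not real detection logic
        }
    }
}
```

Once the real classifier is ready, it replaces StubClassifier behind the same interface and the same tests of AdminDashboard are run again.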
2. Bottom-up Integration
This method begins the construction and testing with the modules at the lowest level
in the program structure. Since the modules are integrated from the bottom up, processing
required for modules subordinate to a given level is always available and the need for stubs is
eliminated. The bottom up integration strategy may be implemented with the following steps:
• The low-level modules are combined into clusters that perform a
specific software sub-function.
• A driver (i.e., the control program for testing) is written to coordinate test case input
and output.
• The cluster is tested.
• Drivers are removed and clusters are combined moving upward in the program
structure.
The bottom-up approach tests each module individually, and then each module
is integrated with a main module and tested for functionality.
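The bottom-up steps above can be sketched as two low-level modules, a cluster that combines them, and a driver that feeds test cases to the cluster. The link-counting rule and all names are hypothetical; they only illustrate the structure.

```java
import java.util.ArrayList;
import java.util.List;

final class BottomUpIntegration {
    /** Low-level module: counts the links in one tweet (hypothetical rule). */
    static int linkCount(String tweet) {
        int count = 0;
        for (String word : tweet.split("\\s+")) {
            if (word.startsWith("http://") || word.startsWith("https://")) count++;
        }
        return count;
    }

    /** Low-level module: averages a list of scores; an empty list gives 0. */
    static double average(List<Integer> scores) {
        if (scores.isEmpty()) return 0.0;
        int sum = 0;
        for (int s : scores) sum += s;
        return (double) sum / scores.size();
    }

    /** Cluster: the two low-level modules combined into one sub-function. */
    static double averageLinksPerTweet(List<String> tweets) {
        List<Integer> counts = new ArrayList<>();
        for (String t : tweets) counts.add(linkCount(t));
        return average(counts);
    }

    /** Driver: coordinates test case input and output, returns the failure count. */
    static int runDriver() {
        int failures = 0;
        if (averageLinksPerTweet(List.of("hello world")) != 0.0) failures++;
        if (averageLinksPerTweet(List.of(
                "buy http://a.example now",
                "see https://b.example http://c.example")) != 1.5) failures++;
        if (averageLinksPerTweet(List.of()) != 0.0) failures++;
        return failures;
    }
}
```

When the cluster is later called by a higher-level module, runDriver is no longer needed and can be removed, which corresponds to the last step in the list.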
OTHER TESTING METHODOLOGIES
User Acceptance Testing
User Acceptance of a system is the key factor for the success of any system. The
system under consideration is tested for user acceptance by constantly keeping in touch with
the prospective system users at the time of developing and making changes wherever
required. The system developed provides a friendly user interface that can easily be
understood even by a person who is new to the system.
Output Testing
After performing the validation testing, the next step is output testing of the
proposed system, since no system could be useful if it does not produce the required output
in the specified format. The outputs generated or displayed by the system under
consideration are tested by asking the users about the format they require. Hence the output
format is considered in two ways: on screen and in printed format.
Validation Checking
Validation checks are performed on the following fields.
Text Field:
The text field can contain only a number of characters less than or equal to its
size. The text fields are alphanumeric in some tables and alphabetic in other tables. An incorrect
entry always flashes an error message.
Numeric Field:
The numeric field can contain only numbers from 0 to 9. An entry of any other character
flashes an error message.
The individual modules are checked for accuracy and for what they have
to perform. Each module is subjected to a test run along with sample data. The individually
tested modules are integrated into a single system. Testing involves executing the program
with real data; the existence of any program defect is inferred from the
output. The testing should be planned so that all the requirements are individually tested.
A successful test is one that brings out the defects for inappropriate data and
produces an output revealing the errors in the system.
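The text-field and numeric-field checks described in this section could look roughly like the sketch below. The class and method names are hypothetical, only the alphanumeric variant of the text check is shown, and rejecting empty input is an assumption that the source does not state.

```java
final class FieldValidator {
    /** Text field: at most maxLength characters, letters and digits only. */
    static boolean isValidText(String value, int maxLength) {
        // Assumption: an empty or missing entry counts as incorrect.
        if (value == null || value.isEmpty() || value.length() > maxLength) return false;
        for (int i = 0; i < value.length(); i++) {
            if (!Character.isLetterOrDigit(value.charAt(i))) return false;
        }
        return true;
    }

    /** Numeric field: one or more of the digits 0 to 9, nothing else. */
    static boolean isValidNumeric(String value) {
        if (value == null || value.isEmpty()) return false;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < '0' || c > '9') return false;
        }
        return true;
    }
}
```

A form handler would call these before storing the entry and show the error message when either check returns false.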
Preparation of Test Data
The above testing is done using various kinds of test data. Preparation of test
data plays a vital role in system testing. After the test data is prepared, the system under
study is tested using that data. While testing the system with test data, errors are again
uncovered and corrected using the above testing steps, and the corrections are noted for future
use.
Using Live Test Data:
Live test data are those that are actually extracted from organization files. After a
system is partially constructed, programmers or analysts often ask users to key in a set of
data from their normal activities. Then, the systems person uses this data as a way to
partially test the system. In other instances, programmers or analysts extract a set of live
data from the files and have them entered themselves.
It is difficult to obtain live data in sufficient amounts to conduct extensive testing.
And, although it is realistic data that will show how the system will perform for the typical
processing requirement, assuming that the live data entered are in fact typical, such data
generally will not test all combinations or formats that can enter the system. This bias toward
typical values then does not provide a true systems test and in fact ignores the cases most
likely to cause system failure.
Using Artificial Test Data:
Artificial test data are created solely for test purposes, since they can be generated to test all
combinations of formats and values. In other words, the artificial data, which can quickly be
prepared by a data generating utility program in the information systems department, make
possible the testing of all logic and control paths through the program.
The most effective test programs use artificial test data generated by persons other than
those who wrote the programs. Often, an independent team of testers formulates a testing
plan, using the systems specifications.
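As an illustration of generating artificial test data that covers all combinations of values, the sketch below builds the Cartesian product of candidate values for each input field. It is a hypothetical stand-in for the "data generating utility program" mentioned above, not a description of any particular tool.

```java
import java.util.ArrayList;
import java.util.List;

final class ArtificialTestData {
    /** Builds every combination of the candidate values, one value per field. */
    static List<List<String>> allCombinations(List<List<String>> valuesPerField) {
        List<List<String>> rows = new ArrayList<>();
        rows.add(new ArrayList<>());                 // start with one empty row
        for (List<String> candidates : valuesPerField) {
            List<List<String>> extended = new ArrayList<>();
            for (List<String> row : rows) {
                for (String value : candidates) {
                    List<String> next = new ArrayList<>(row);
                    next.add(value);                 // extend the row with this field's value
                    extended.add(next);
                }
            }
            rows = extended;
        }
        return rows;
    }
}
```

For example, two candidate user names (valid and empty) and three candidate passwords (valid, empty, too short) yield six login test cases, including the untypical ones that live data rarely contains.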
The package "Virtual Private Network" has satisfied all the requirements specified as per
software requirement specification and was accepted.
USER TRAINING
Whenever a new system is developed, user training is required to educate them about the
working of the system so that it can be put to efficient use by those for whom the system
has been primarily designed. For this purpose the normal working of the project was
demonstrated to the prospective users. Its working is easily understandable and since the
expected users are people who have good knowledge of computers, the use of this system is
very easy.
MAINTENANCE
This covers a wide range of activities including correcting code and design errors. To reduce
the need for maintenance in the long run, we have more accurately defined the user's
requirements during the process of system development. Depending on the requirements,
this system has been developed to satisfy the needs to the largest possible extent. With
development in technology, it may be possible to add many more features based on the
requirements in future. The coding and designing is simple and easy to understand which
will make maintenance easier.
TESTING STRATEGY :
A strategy for system testing integrates system test cases and design techniques into a
well-planned series of steps that results in the successful construction of software. The testing
strategy must incorporate test planning, test case design, test execution, and the resultant
data collection and evaluation. A strategy for software testing must accommodate low-level
tests that are necessary to verify that a small source code segment has been correctly
implemented as well as high-level tests that validate major system functions against
user requirements.
Software testing is a critical element of software quality assurance and represents the
ultimate review of specification, design, and coding. Testing represents an interesting
anomaly for the software. Thus, a series of tests is performed for the proposed
system before the system is ready for user acceptance testing.
SYSTEM TESTING:
Software once validated must be combined with other system elements (e.g. Hardware,
people, database). System testing verifies that all the elements are proper and that overall
system function performance is achieved. It also tests to find discrepancies between the
system and its original objective, current specifications and system documentation.
UNIT TESTING:
In unit testing, different modules are tested against the specifications produced during
the design of the modules. Unit testing is essential for verification of the code produced
during the coding phase, and hence the goal is to test the internal logic of the modules.
Using the detailed design description as a guide, important control paths are tested to
uncover errors within the boundary of the modules. This testing is carried out during the
programming stage itself. In this testing step, each module was found to be working
satisfactorily with regard to the expected output from the module.
In Due Course, latest technology advancements will be taken into consideration. As
part of technical build-up many components of the networking system will be generic in
nature so that future projects can either use or interact with this. The future holds a lot to
offer to the development and refinement of this project.
16. Conclusion
CONCLUSION
In this paper, we have proposed a hybrid approach that combines community-based
features with metadata-, content-, and interaction-based features for detecting
automated spammers on Twitter. Spammers are generally planted in OSNs for
varied purposes, but the absence of a real-life identity hinders them from joining
the trust network of benign users. Therefore, spammers randomly follow a number of
users but are rarely followed back by them, which results in low edge density
among their followers and followings. This type of spammer interaction pattern
can be exploited for the development of effective spammer detection systems.
Unlike existing approaches that characterize spammers based on their own
profiles, the novelty of the proposed approach lies in the characterization of a
spammer based on its neighbouring nodes (especially the followers) and their
interaction network. This is mainly because users can evade features
that are related to their own activities, but it is difficult to evade those
that are based on their followers. On analysis,
metadata-based features are found to be the least effective, as they can be easily
evaded by sophisticated spammers using random number generator
algorithms. On the other hand, both interaction- and community-based features
are found to be the most discriminative for spammer detection.
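The notion of low edge density among a user's neighbours can be made concrete with a small sketch. It assumes one common definition of directed edge density (observed edges divided by the n(n-1) possible ones); the exact feature definitions used in this work may differ, and the toy graphs below are invented purely for illustration.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (follower, followed)


def edge_density(nodes: Set[str], edges: Iterable[Edge]) -> float:
    """Density of the directed subgraph induced by `nodes`.

    Counts distinct edges whose endpoints both lie in `nodes` (self-loops
    excluded) and divides by the n * (n - 1) possible directed edges.
    """
    n = len(nodes)
    if n < 2:
        return 0.0
    induced = {(u, v) for u, v in edges if u in nodes and v in nodes and u != v}
    return len(induced) / (n * (n - 1))


# Toy data (assumed for illustration, not from the project's dataset).
# Neighbours of a benign user tend to follow one another.
benign_neighbours = {"a", "b", "c"}
benign_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("a", "c")]

# Neighbours of a spammer are picked at random and are rarely connected.
spammer_neighbours = {"x", "y", "z", "w"}
spammer_edges = [("x", "y")]

print(edge_density(benign_neighbours, benign_edges))    # 5 of 6 possible edges
print(edge_density(spammer_neighbours, spammer_edges))  # 1 of 12 possible edges
```

In this toy example the spammer's neighbourhood is far sparser than the benign user's, which is the kind of signal that community-based features aim to capture.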
Attaining perfect accuracy in spammer detection is extremely difficult, and
accordingly no feature set can ever be considered complete and sound, as
spammers keep changing their operating behaviour to evade detection
mechanisms.
Therefore, in addition to profile-based characterization, complete logs of
spammers, starting from their entry into the network to their detection, need to be
analyzed to model the evolutionary behaviour and phases of the life-cycles of
spammers. However, spammers are generally detected when they are at a very
advanced stage, and it is difficult to obtain their past log data. Moreover, a
user may operate in the network as a benign user and later start
illicit activities for whatever reason, and thus come to be considered a spammer. In
this circumstance, even analyzing log data may lead to wrong characterization.
Analysis of the spammers' network to unearth different types of coordinated spam
campaigns run by spambots seems to be one of the promising future directions of
research. Moreover, analyzing the temporal evolution of spammers' followers
may reveal some interesting patterns that can be utilized for spammer
characterization at different levels of granularity.
17. References
REFERENCES
[1] M. Tsikerdekis, “Identity deception prevention using common contribution
network data,” IEEE Trans. Inf. Forensics Security, vol. 12, no. 1, pp. 188–199,
Jan. 2017.
[2] T. Anwar and M. Abulaish, “Ranking radically influential Web forum users,”
IEEE Trans. Inf. Forensics Security, vol. 10, no. 6, pp. 1289–1298, Jun. 2015.
[3] Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu, “Design and
analysis of social botnet,” Comput. Netw., vol. 57, no. 2, pp. 556–578, 2013.
[4] D. Fletcher, “A brief history of spam,” TIME, Nov. 2, 2009. [Online].
Available: http://www.time.com/time/business/article/0,8599,1933796,00.html
[5] Y. Boshmaf, M. Ripeanu, K. Beznosov, and E. Santos-Neto, “Thwarting
fake OSN accounts by predicting their victims,” in Proc. AISec, Denver, CO,
USA, 2015, pp. 81–89.
[6] A. A. Amleshwaram, N. Reddy, S. Yadav, G. Gu, and C. Yang, “CATS:
Characterizing automation of Twitter spammers,” in Proc. COMSNETS,
Bengaluru, India, Jan. 2013, pp. 1–10.
[7] K. Lee, J. C. Lee, and S. Webb, “Uncovering social spammers:
Social honeypots + machine learning,” in Proc. SIGIR, Geneva, Switzerland, Jul.
2010, pp. 435–442.
[8] G. Stringhini, C. Kruegel, and G. Vigna, “Detecting spammers on
social networks,” in Proc. ACSAC, Austin, TX, USA, 2010, pp. 1–9.
[9] H. Yu, M. Kaminsky, P. B. Gibbons, and A. D. Flaxman,
“SybilGuard: Defending against sybil attacks via social networks,” IEEE/ACM
Trans. Netw., vol. 16, no. 3, pp. 576–589, Jun. 2008.
[10] H. Gao, J. Hu, C. Wilson, Z. Li, Y. Chen, and B. Y. Zhao, “Detecting and
characterizing social spam campaigns,” in Proc. IMC, Melbourne, VIC,
Australia, 2010, pp. 35–47.
[11] W. Wei, F. Xu, C. C. Tan, and Q. Li, “Sybildefender: Defend against sybil
attacks in large social networks,” in Proc. INFOCOM, Orlando, FL, USA, Mar.
2012, pp. 1951–1959.
[12] C. Yang, R. C. Harkreader, and G. Gu, “Die free or live hard? Empirical
evaluation and new design for fighting evolving Twitter spammers,” in Proc.
RAID, Menlo Park, CA, USA, 2011, pp. 318–337.
[13] S. Lee and J. Kim, “WarningBird: A near real-time detection system
for suspicious URLs in Twitter stream,” IEEE Trans. Depend. Sec. Comput., vol.
10, no. 3, pp. 183–195, May 2013.
[14] M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz, “A
Bayesian approach to filtering junk e-mail,” in Proc. Workshop Learn. Text
Categorization, Madison, WI, USA, 1998, pp. 98–105.
[15] C. Schäfer, “Detection of compromised email accounts used by a spam
botnet with country counting and theoretical geographical travelling speed
extracted from metadata,” in Proc. ISSREW, Naples, Italy, Nov. 2014, pp. 329–334.
[16] C. Schäfer, “Detection of compromised email accounts used for spamming
in correlation with origin-destination delivery notification extracted from
metadata,” in Proc. ISDFS, Tirgu Mures, Romania, Apr. 2017, pp. 1–6.
[17] A. H. Wang, “Detecting spam bots in online social networking sites: A
machine learning approach,” in Proc. DBSec, Rome, Italy, 2010, pp. 335–342.
[18] F. Ahmed and M. Abulaish, “A generic statistical approach for spam
detection in online social networks,” Comput. Commun., vol. 36, nos. 10–11, pp.
1120–1129, 2013.
[19] C. Yang, R. Harkreader, and G. Gu, “Empirical evaluation and new design
for fighting evolving Twitter spammers,” IEEE Trans. Inf. Forensics Security,
vol. 8, no. 8, pp. 1280–1293, Aug. 2013.
[20] Y. Zhu, X. Wang, E. Zhong, N. N. Liu, H. Li, and Q. Yang, “Discovering
spammers in social networks,” in Proc. AAAI, Toronto, ON, Canada, 2012, pp.
52–58.
[21] E. Tan, L. Guo, S. Chen, X. Zhang, and Y. Zhao, “Spammer
behavior analysis and detection in user generated content on social networks,” in
Proc. ICDCS, Macau, China, Jun. 2012, pp. 305–314.
[22] S. Y. Bhat and M. Abulaish, “Community-based features for identifying
spammers in online social networks,” in Proc. Int. Conf. Adv. Social Netw. Anal.
Mining, Niagara Falls, ON, Canada, Aug. 2013, pp. 100–107.
[23] L. M. Aiello, M. Deplano, R. Schifanella, and G. Ruffo, “People are
strange when you’re a stranger: Impact and influence of bots on
social networks,” in Proc. AAAI, Dublin, Ireland, 2012, pp. 10–17.
[24] S. Cresci, R. D. Pietro, M. Petrocchi, A. Spognardi, and M. Tesconi, “The
paradigm-shift of social spambots: Evidence, theories, and tools for the arms
race,” in Proc. WWW, Perth, WA, Australia, 2017, pp.