Cluster Validation Indexes for Spatio Temporal Data
ABSTRACT
One of the fundamental challenges of clustering is how to evaluate results without auxiliary information. A common approach to evaluating clustering results is to use validity indexes. Clustering validity approaches can use three criteria: external criteria (evaluate the result with respect to a pre-specified structure), internal criteria (evaluate the result with respect to information intrinsic to the data alone), and relative criteria (evaluate the result by comparing it with other clustering schemes).
Consequently, different types of indexes are used to solve different types of problems, and index selection depends on the kind of information available. That is why this paper presents a comparison between external and internal indexes. The results obtained in this study indicate that internal indexes are more accurate at determining the groups in a given clustering structure.
Six internal indexes (BIC, CH, DB, SIL, NIVA and DUNN) and four external indexes (F-measure, NMI, Entropy and Purity) were used in this study. The clusters were obtained with the K-means clustering algorithm.
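As an illustration of how external indexes compare a clustering against a pre-specified structure, the following minimal Python sketch computes Purity and NMI from two label lists. It is not the code used in this study. The function names are hypothetical, and NMI is normalised here by the arithmetic mean of the two label entropies, which is only one of several conventions in use.

```python
import math
from collections import Counter


def purity(true_labels, cluster_labels):
    """Fraction of points that belong to the majority class of their cluster."""
    per_cluster = {}
    for t, c in zip(true_labels, cluster_labels):
        per_cluster.setdefault(c, Counter())[t] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


def label_entropy(labels):
    """Shannon entropy (natural log) of a label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(true_labels, cluster_labels):
    """Mutual information divided by the mean of the two label entropies.

    The choice of normalisation is an assumption of this sketch.
    """
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_labels))
    true_counts = Counter(true_labels)
    cluster_counts = Counter(cluster_labels)
    mi = 0.0
    for (t, c), n_tc in joint.items():
        mi += (n_tc / n) * math.log(n * n_tc / (true_counts[t] * cluster_counts[c]))
    h_mean = (label_entropy(true_labels) + label_entropy(cluster_labels)) / 2
    # Both partitions trivial (a single label each): treat them as identical.
    return mi / h_mean if h_mean > 0 else 1.0


# Made-up labels, for illustration only.
truth = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]    # same grouping, cluster ids swapped
imperfect = [0, 0, 1, 1, 1, 1]  # one point assigned to the wrong group
print(purity(truth, perfect), nmi(truth, perfect))
print(purity(truth, imperfect), nmi(truth, imperfect))
```

Both measures depend only on how the points are grouped, not on the cluster identifiers, so the relabelled but otherwise identical clustering receives the best possible score.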
Almost every clustering algorithm depends on the characteristics of the dataset and on the input parameters. Incorrect input parameters may lead to clusters that deviate from those actually present in the dataset. Clustering validity indexes are used to determine the input parameters that lead to the clusters that best fit a given dataset, and they are usually defined by combining compactness and separability. Several clustering validation techniques have been proposed. The result of a clustering validation index depends mainly on parameters such as the number of clusters and the clustering algorithm used (for example, k-means or a fuzzy clustering algorithm). The number of clusters is therefore also an important factor in clustering validation. The validation indexes considered here are the Davies-Bouldin index, the Dunn index, the Calinski-Harabasz index, the Silhouette index and the Point-Biserial index.
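To make the notions of compactness and separability concrete, here is a minimal sketch of the Davies-Bouldin and Dunn indexes in plain Python. It assumes Euclidean distance, clusters given as lists of coordinate tuples, and the simplest variants of both indexes (mean distance to the centroid as scatter, closest pair of points as separation, largest pairwise distance as diameter). The function names are illustrative and are not taken from the project source.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def davies_bouldin(clusters):
    """Average worst-case (scatter_i + scatter_j) / centroid distance. Lower is better."""
    centers = [centroid(c) for c in clusters]
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


def dunn(clusters):
    """Smallest between-cluster distance / largest cluster diameter. Higher is better.

    Assumes at least one cluster has two distinct points (non-zero diameter).
    """
    min_separation = min(
        math.dist(p, q)
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
        for p in a
        for q in b
    )
    max_diameter = max(math.dist(p, q) for c in clusters for p in c for q in c)
    return min_separation / max_diameter


# Made-up data: the same four points grouped well and grouped badly.
good = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
bad = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]
print(davies_bouldin(good), dunn(good))
print(davies_bouldin(bad), dunn(bad))
```

On this toy data the compact, well-separated grouping gets the lower Davies-Bouldin value and the higher Dunn value, which is the direction in which each index is meant to be read.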
In the proposed system the validation indexes are implemented for spatio-temporal data. No new index algorithm is implemented. By using these validation indexes we can obtain results for spatio-temporal data. Through the existing system I can learn about clustering and also gain some knowledge of these validation techniques. In the existing system all validation techniques are implemented for textual data; in the proposed system we implement the cluster validation techniques for spatio-temporal data.
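The text does not say how spatial and temporal components are combined, so the following is only a sketch of one possible approach: an existing index (here the Silhouette index) is kept unchanged and only the distance function is replaced. It assumes that each observation is a tuple (x, y, t) and that time is folded into a Euclidean-style distance through a hypothetical `time_weight` parameter; neither assumption comes from the project itself.

```python
import math


def st_distance(a, b, time_weight=1.0):
    """Distance between two (x, y, t) observations.

    Spatial and weighted temporal differences are combined as in a
    Euclidean norm; this particular combination is an assumption.
    """
    spatial = math.hypot(a[0] - b[0], a[1] - b[1])
    temporal = abs(a[2] - b[2])
    return math.hypot(spatial, time_weight * temporal)


def silhouette(points, labels, time_weight=1.0):
    """Mean silhouette width. Requires at least two clusters."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Distances from p to every other point, grouped by cluster label.
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(
                    st_distance(p, q, time_weight)
                )
        if own not in by_cluster:
            scores.append(0.0)  # singleton cluster: score defined as 0
            continue
        a = sum(by_cluster[own]) / len(by_cluster[own])
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Made-up data: two events at the same place, far apart in time.
points = [(0, 0, 0), (0, 1, 0), (0, 0, 100), (0, 1, 100)]
labels = [0, 0, 1, 1]
print(silhouette(points, labels, time_weight=1.0))  # time taken into account
print(silhouette(points, labels, time_weight=0.0))  # time ignored
```

With the temporal component included, the two groups are scored as well separated; with `time_weight=0.0` the same labelling is scored poorly, because the groups overlap in space. This shows why an index written for non-temporal data needs a suitable distance before it can be applied to spatio-temporal data.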
A preliminary investigation examines project feasibility, that is, the likelihood that the system will be useful to the organization. The main objective of the feasibility study is to test the technical, operational and economic feasibility of adding new modules and debugging the old running system. Any system is feasible given unlimited resources and infinite time.
System analysis is conducted with the following objectives
There are three aspects in the feasibility study portion of the preliminary investigation: technical, operational and economic feasibility.
Technical feasibility is the most difficult area to assess because objectives, functions and performance are somewhat hazy; anything seems possible if the right assumptions are made. The considerations that are normally associated with technical feasibility include
The proposed system will generate many kinds of reports depending on the requirements. By automating all these activities the work is done effectively and in time. There is also quick and good response for each operation.
The proposed project is beneficial only if it can be turned into an information system that meets the organization's operating requirements. Simply stated, this test of feasibility asks whether the system will work when it is developed and installed. Are there major barriers to implementation? The following questions help test the operational feasibility of a project. Is there sufficient support for the project from management and from users? If the current system is well liked and used to the extent that people will not be able to see reasons for change, there may be resistance.
Economic feasibility is generally the bottom-line consideration for most systems. Computerizing the project is economically advantageous for two reasons. Firstly, it will increase efficiency and decrease the man-hours required to achieve the necessary result. Secondly, it will provide timely and up-to-date information to the administrative and individual departments. Since all the information is available within a few seconds, system performance will be substantially increased.
Data Mining Uses:
Data mining is used for a variety of purposes in both the private and public sectors.
Retailers can use information collected through affinity programs (e.g., shoppers’ club cards, frequent flyer points, contests) to assess the effectiveness of product selection, placement decisions and coupon offers, and to identify which products are often purchased together.
Unit testing focuses verification effort on the smallest unit of software, i.e. the module. Using the detailed design and the process specifications, testing is done to uncover errors within the boundary of the module. All modules must pass the unit test before integration testing begins.
In this project, “Evaluation of Employee Performance”, each service can be thought of as a module. There are many modules, such as Executive, Debit Card, Credit Cards, Performance and Bills. Each module was tested with different sets of inputs (for example, a wrong debit card number or executive code), both during development and after development was finished, so that each module works without any error. Inputs are validated when they are accepted from the user.
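The kind of unit test described above, where a module is fed both correct and deliberately wrong inputs, can be sketched with Python's standard `unittest` module. The validator `is_valid_card_number` and its rule (exactly 16 ASCII digits) are hypothetical stand-ins; the project's real validation rules are not given in the text.

```python
import unittest


def is_valid_card_number(text):
    """Hypothetical rule for illustration: exactly 16 ASCII digits."""
    return (
        isinstance(text, str)
        and len(text) == 16
        and text.isascii()
        and text.isdigit()
    )


class CardNumberTests(unittest.TestCase):
    def test_accepts_well_formed_number(self):
        self.assertTrue(is_valid_card_number("1234567812345678"))

    def test_rejects_wrong_inputs(self):
        # Each wrong input is reported separately if it slips through.
        wrong_inputs = ["", "1234", "1234-5678-1234-5678", "12345678abcdefgh", None]
        for bad in wrong_inputs:
            with self.subTest(bad=bad):
                self.assertFalse(is_valid_card_number(bad))


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CardNumberTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Keeping the validator free of any user-interface code is what makes it testable in isolation, before the module is integrated with the others.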
After the unit testing we have to perform integration testing. The goal here is to see if modules can be integrated properly, the emphasis being on testing interfaces between modules. This testing activity can be considered as testing the design and hence the emphasis on testing module interactions.
In this project, ‘Evaluation of Employee Performance’, the main system is formed by integrating all the modules. While integrating the modules, I checked whether the integration affects the working of any of the services by giving different combinations of inputs with which the services ran perfectly before integration.
Here the entire software system is tested. The reference document for this process is the requirements document, and the goal is to see whether the software meets its requirements. The entire ‘Evaluation of Employee Performance’ system has been tested against the project requirements to check whether all of them have been satisfied.
Acceptance Test is performed with realistic data of the client to demonstrate that the software is working satisfactorily. Testing here is focused on external behavior of the system; the internal logic of program is not emphasized.
In this project, ‘Evaluation of Employee Performance’, I have collected some data and tested whether the project works correctly. Test cases should be selected so that the largest number of attributes of an equivalence class is exercised at once. The testing phase is an important part of software development. It is the process of finding errors and missing operations, and also a complete verification to determine whether the objectives are met and the user requirements are satisfied.
White Box Testing:
This is a unit testing method in which one unit is taken at a time and tested thoroughly at the statement level to find the maximum possible number of errors. I tested every piece of code step by step, taking care that every statement in the code is executed at least once. White box testing is also called glass box testing. I generated a list of test cases and sample data, which is used to check the possible combinations of execution paths through the code at every module level.
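The requirement that every statement be executed at least once can be checked mechanically. The sketch below uses Python's `sys.settrace` hook to record which lines of a function run for a given input; the function `classify` and its thresholds are made up for the example, and a real project would more likely rely on a dedicated coverage tool.

```python
import sys


def classify(score):
    if score < 0 or score > 100:
        return "invalid"
    if score >= 50:
        return "pass"
    return "fail"


def executed_lines(func, *args):
    """Return the line offsets (relative to the def line) executed by func(*args)."""
    hit = set()
    code = func.__code__

    def tracer(frame, event, arg):
        # Record only 'line' events that belong to the function under test.
        if frame.f_code is code and event == "line":
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore the previous trace hook
    return hit


# One test input per execution path through classify().
covered = set()
for score in (-1, 10, 50):
    covered |= executed_lines(classify, score)
print(sorted(covered))
```

A single input such as 50 leaves the "invalid" and "fail" statements unexecuted; only the three inputs together reach every statement, which is the sense in which the test cases were chosen to cover the execution paths.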
Black Box Testing:
This testing method considers a module as a single unit and checks the unit at its interface and in its communication with other modules, rather than getting into details at the statement level. Here the module is treated as a black box that takes some input and generates output. The output for a given set of input combinations is forwarded to other modules.