Friday, October 9, 2009

What is Data Mining?

Data Mining is a set of processes for analysing and discovering useful or actionable knowledge buried deep within large volumes of data in datasets or data stores.

Knowledge discovery involves finding patterns or behaviour within the data that lead to some profitable business action. Data Mining requires a large amount of data, including historical as well as current data, in which to explore for knowledge. Once the required amount of data has been accumulated from the various sources, it is cleansed, validated and prepared for storage in the data warehouse or data mart. BI reporting tools capture the required facts from this data to be used by the knowledge discovery process.



Data Mining can be accomplished by one or more of the traditional knowledge discovery techniques, such as Market Basket Analysis, Clustering, Memory-Based Reasoning, Link Analysis, Neural Networks and so on.



Data Mining Life Cycle Model




Find Out the Business Problem: Suppose a company's current-year sales have dropped by some percentage compared with last year. Using OLAP tools, the exact figures can be determined across several dimensions such as region, time, product and so on.

Knowledge Discovery: Given the business problem, the various possible reasons for the decrease in sales have to be analyzed using one or more data mining techniques. Causes may include poor quality of the product or services, flaws in marketing schemes, lower demand for the product, seasonal changes, regulations enforced by the government, competitive pressure and so on. The exact causes and solutions have to be found in order to resolve this sales drop; this is what we call Knowledge Discovery here.

Implement the Knowledge: Based on the discovery, proper actions should be taken in order to overcome this business problem.

Analyze the Results: Once the actions have been implemented, the results need to be monitored and measured to find out the outcome of the actions.
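The first step of the life cycle, locating a sales drop along dimensions such as region or product, can be sketched roughly as below. This is only an illustration of the idea of an OLAP-style roll-up, not how any particular OLAP tool works; the fact rows, the figures and the function names are all made up for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, sales_amount).
sales_facts = [
    (2008, "North", "Widget", 120.0),
    (2008, "South", "Widget", 80.0),
    (2008, "North", "Gadget", 100.0),
    (2009, "North", "Widget", 90.0),
    (2009, "South", "Widget", 82.0),
    (2009, "North", "Gadget", 70.0),
]

def roll_up(facts, dimension_index):
    """Sum sales per (dimension value, year), like a simple roll-up."""
    totals = defaultdict(float)
    for row in facts:
        year, amount = row[0], row[3]
        totals[(row[dimension_index], year)] += amount
    return totals

def percent_change(totals, last_year, current_year):
    """Percentage change in sales per dimension value between two years."""
    changes = {}
    for (value, year) in totals:
        if year != last_year:
            continue
        previous = totals[(value, last_year)]
        current = totals.get((value, current_year), 0.0)
        changes[value] = (current - previous) / previous * 100.0
    return changes

# Index 1 is the region column, index 2 is the product column.
by_region = percent_change(roll_up(sales_facts, 1), 2008, 2009)
by_product = percent_change(roll_up(sales_facts, 2), 2008, 2009)
```

With this sample data the drop is concentrated in the North region and in the Gadget product, which is the kind of fact the knowledge discovery step would then try to explain.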

OLAP Vs. Data Mining

OLAP helps an organization to find out measures like sales drop, productivity, service response time and inventory in hand. In simple terms, OLAP tells us 'What has happened?' and Data Mining helps us to find 'Why has it happened?'. Data Mining is also used to predict 'What will happen in the future?' with the help of the data patterns available within the organization or in publicly available data.

For example, suppose a borrower with a bad credit and employment history applies for a mortgage loan. The application may be denied by the mortgage lender, since the borrower may default if the loan is approved. The mortgage lender would come to this decision based on mined historical data for borrowers following a similar pattern.
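As a very rough sketch of the lender's reasoning, the snippet below looks up how often past borrowers with the same pattern defaulted and decides from that. The historical records, the 0.5 threshold and the "refer" outcome are assumptions made up for illustration; a real lender would use far richer data and models.

```python
# Hypothetical historical loans: (credit, employment, defaulted).
history = [
    ("bad", "unstable", True),
    ("bad", "unstable", True),
    ("bad", "unstable", False),
    ("good", "stable", False),
    ("good", "stable", False),
    ("good", "unstable", True),
    ("good", "stable", False),
]

def default_rate(history, credit, employment):
    """Share of past borrowers with the same pattern who defaulted."""
    outcomes = [d for c, e, d in history if c == credit and e == employment]
    if not outcomes:
        return None  # no similar borrowers in the history
    return sum(outcomes) / len(outcomes)

def decide(history, credit, employment, threshold=0.5):
    """Deny when the historical default rate exceeds an assumed threshold."""
    rate = default_rate(history, credit, employment)
    if rate is None:
        return "refer"  # no evidence either way; leave it to a person
    return "deny" if rate > threshold else "approve"

decision = decide(history, "bad", "unstable")
```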

Thursday, October 8, 2009

Quick overview of OLTP Vs. OLAP

Just click on the image below to look at it more closely and clearly.



What is Database Normalization?

Database Normalization is the process of organizing the data in a database. This includes creating tables and establishing relationships among them according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.

Redundant data wastes disk space and creates maintenance problems. If the same data exists in more than one place, then whenever we need to update or change it, we have to make those changes in all of those places.
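The update problem can be seen in a small sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are invented for illustration: one design repeats the customer's city on every order, the other stores it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Redundant design: the customer's city is repeated on every order row.
conn.execute(
    "CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY, customer TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Asha", "Pune"), (3, "Ravi", "Delhi")],
)

# Normalized design: the city is stored once per customer,
# and each order only references the customer.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(customer_id))"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi")],
)
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])

# Asha moves to another city: count how many rows each design must change.
flat_rows = conn.execute(
    "UPDATE orders_flat SET city = 'Mumbai' WHERE customer = 'Asha'"
).rowcount
normalized_rows = conn.execute(
    "UPDATE customers SET city = 'Mumbai' WHERE name = 'Asha'"
).rowcount
```

In the redundant table every one of the customer's orders has to be changed, and missing one leaves the data inconsistent; in the normalized tables a single row is changed.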

Now, what is inconsistent dependency? In very simple terms, the database is built in such a way, or with such relationships, that we are not able to find or access the relevant data because the path to find that data is missing.

(Source: http://support.microsoft.com/kb/283878 )

What is Data Modeling?



Data Modeling is a way to structure and organize data so that it can be easily used by databases. Unstructured data can be found in word processing documents, email messages, audio or video files, and design programs. Data modeling does not want this "ugly data"; rather, data modeling wants data that is all made up in a nice, neat package for processing by a database. So, in a way, data modeling is concerned with how the data looks.

Data Modeling is routinely used in conjunction with a database management system. Data that has been modeled and made ready for this system can be identified in various ways, according to what it represents and how it relates to other data.

The idea is to make the data as presentable as possible, so that analysis and integration can be done with as little effort as possible.

We can also think of data modeling as the instructions for building a database. Concentrate on the word Model, and you will get what we're going after here. To make a "pretty" database, you will want to follow a model as a means towards your desired end.

For example:

If you want to analyze how many people in a given congressional district voted in the last election, you will naturally want to include a column for which party each person voted for. That analysis will be very valuable to a political leader. And it's the kind of detail that you can build into a database while creating it from the ground up, by instructing the database management system to include that column of information in the resulting database. If you wanted to analyze that information specifically but didn't include a column for it in your database, you'd spend a lot of time collecting the data, an effort that would not be necessary if you had the data model in the first place.
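A minimal sketch of that idea, using Python's built-in sqlite3 module: the party column is part of the model from the start, so the per-party analysis becomes a single query. The table name, column names, district codes and party labels are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The model decides up front that each vote record carries a party column.
conn.execute(
    "CREATE TABLE votes ("
    " voter_id INTEGER PRIMARY KEY,"
    " district TEXT NOT NULL,"
    " party TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO votes (voter_id, district, party) VALUES (?, ?, ?)",
    [
        (1, "D-01", "Party A"),
        (2, "D-01", "Party B"),
        (3, "D-01", "Party A"),
        (4, "D-02", "Party B"),
    ],
)

def votes_by_party(conn, district):
    """Count the votes per party within one district."""
    rows = conn.execute(
        "SELECT party, COUNT(*) FROM votes"
        " WHERE district = ? GROUP BY party ORDER BY party",
        (district,),
    )
    return dict(rows.fetchall())

result = votes_by_party(conn, "D-01")
```

Without the party column in the model, the same question could not be answered from the database at all.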

So, Data Modeling is a very important skill to apply when building a database.

(Source: http://www.youtube.com/watch?v=b2n4d-L6Qfg )

How are Data Warehouse and Data Mart related?

Ans:

In simple and logical terms, a Data Warehouse is filled up with Data Marts, and each Data Mart is a collection of data oriented to a particular subject.




What is ETL? Explain it.

Ans:

ETL stands for Extract, Transform and Load. These are three database functions that are combined into one tool to pull data out of one database and place it into another database.

Extract: The process of reading the data from one database.

Transform: The process of converting the extracted data from its previous form into the form it needs to be in so that it can be placed into another database. Transformation occurs by using rules or lookup tables, or by combining the data with other data.

Load: The process of writing the data to the target database.

ETL is used to migrate data from one database to another, to form Data Marts and Data Warehouses, and also to convert a database from one type to another.
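The three steps can be sketched with Python's built-in sqlite3 module, using two in-memory databases as the source and the target. This is only a toy illustration, not a real ETL tool; the table names, columns, lookup table and sample rows are all assumptions.

```python
import sqlite3

# Lookup table used by the transform step (values are illustrative).
COUNTRY_NAMES = {"IN": "India", "US": "United States"}

def extract(source):
    """Extract: read the rows from the source database."""
    return source.execute(
        "SELECT name, country_code, amount_cents FROM sales"
    ).fetchall()

def transform(rows):
    """Transform: apply a lookup table and conversion rules to each row."""
    return [
        (name.strip().title(), COUNTRY_NAMES.get(code, "Unknown"), cents / 100.0)
        for name, code, cents in rows
    ]

def load(target, rows):
    """Load: write the transformed rows to the target database."""
    target.executemany("INSERT INTO sales_mart VALUES (?, ?, ?)", rows)
    target.commit()

# Hypothetical source database with raw, untidy rows.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE sales (name TEXT, country_code TEXT, amount_cents INTEGER)"
)
source.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(" asha ", "IN", 1250), ("bob", "US", 990)],
)

# Hypothetical target database with the shape the data mart expects.
target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE sales_mart (customer TEXT, country TEXT, amount REAL)"
)

load(target, transform(extract(source)))
loaded = target.execute(
    "SELECT customer, country, amount FROM sales_mart ORDER BY customer"
).fetchall()
```

Keeping the three steps as separate functions mirrors the definition above: each one can be changed (a different source, new rules, another target) without touching the other two.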

What is the architecture of the Data Warehouse?


Ans:
Just click the link below and have a look at it as a whole; you will get a better idea.


What should be the characteristics of the Data Warehouse?

Answer: The nature of the Data Warehouse should be:
- > Subject-Oriented
- > Integrated
- > Non-Volatile
- > Time-Variant

What is a Data Warehouse?

The term Data Warehouse was first coined by Bill Inmon in 1990, and he defined it this way: a Data Warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.

Each Term in Detail...

What is meant by Subject-Oriented?
- > Data that gives information about a particular subject instead of about the company's ongoing processes.

What is meant by Integrated?
- > Data that is gathered into the warehouse from different sources and merged into a coherent whole.

What is meant by Time-Variant?
- > All the data in the data warehouse is identified with a particular time period.

What is meant by Non-Volatile?
- > Data is stable in the data warehouse. More data is added, but data is never removed from it. This gives management a consistent picture of the business.


This definition remains reasonably accurate even now. However, a single-subject data warehouse is typically referred to as a data mart, while a data warehouse generally keeps the whole enterprise in scope. Also, a data warehouse can be volatile. Due to the large amount of storage required for a data warehouse, only a certain number of periods of history are kept in the warehouse.

For example, if it is decided that 3 years of data are kept in the warehouse, then every month the oldest month will be rolled off the database and the newest month will be added.
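The rolling 3-year window can be sketched with a bounded queue. This is only a model of the bookkeeping, assuming one entry per month; the starting month and the function names are invented, and a real warehouse would drop and add partitions or rows rather than tuples.

```python
from collections import deque

def month_range(start_year, start_month, count):
    """Yield `count` consecutive (year, month) pairs."""
    year, month = start_year, start_month
    for _ in range(count):
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1

# Keep 3 years of history: at most 36 monthly entries.
warehouse_months = deque(month_range(2006, 10, 36), maxlen=36)

def load_new_month(window, new_month):
    """Add the newest month; return the month rolled off, if any."""
    rolled_off = window[0] if len(window) == window.maxlen else None
    window.append(new_month)  # a full deque drops its oldest entry
    return rolled_off

rolled = load_new_month(warehouse_months, (2009, 10))
```

Here the window initially covers October 2006 to September 2009; loading October 2009 rolls off October 2006, and the window still holds 36 months.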

(Source: "What is a Data Warehouse?" W.H. Inmon, Prism, Volume 1, Number 1, 1995).