Saturday, March 26, 2011

Set theory

Set theory is the branch of mathematics that studies sets, which are collections of distinct objects.

Member (∈): If o is a member (or element) of A, we write o ∈ A. The key relation between sets is membership, which holds when one object (possibly itself a set) is an element of a set. If a is a member of B, this is denoted a ∈ B, while if c is not a member of B then c ∉ B. For example, with respect to the sets A = {1,2,3,4}, B = {blue, white, red}, and F = {n² − 4 : n is an integer and 0 ≤ n ≤ 19},
4 ∈ A and 285 ∈ F; but
9 ∉ F and green ∉ B.
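The membership examples can be checked with Python's built-in set type. This is only a small sketch: the variable names mirror the sets A, B and F from the text, and the colours are represented as strings, which is an assumption of the sketch.

```python
# The three example sets from the text.
A = {1, 2, 3, 4}
B = {"blue", "white", "red"}
# F = {n^2 - 4 : n is an integer and 0 <= n <= 19}
F = {n**2 - 4 for n in range(0, 20)}

# "in" tests membership, "not in" tests non-membership.
print(4 in A)            # 4 is a member of A
print(285 in F)          # 17**2 - 4 = 285, so 285 is a member of F
print(9 not in F)        # no integer n gives n**2 - 4 = 9
print("green" not in B)  # green is not one of the listed colours
```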
Subset relation or set inclusion (⊆): A derived binary relation between two sets is the subset relation, also called set inclusion. If all the members of set A are also members of set B, then A is a subset of B, denoted A ⊆ B (also read as "A is contained in B"). For example, {1,2} is a subset of {1,2,3}, but {1,4} is not.
Equivalently, we can write B ⊇ A, read as B is a superset of A, B includes A, or B contains A. The relationship between sets established by ⊆ is called inclusion or containment.
If A is a subset of, but not equal to, B, then A is called a proper subset of B, written A ⊊ B (A is a proper subset of B) or B ⊋ A (B is a proper superset of A).
Note that the expressions A ⊂ B and B ⊃ A are used differently by different authors; some authors use them to mean the same as A ⊆ B (respectively B ⊇ A), whereas others use them to mean the same as A ⊊ B (respectively B ⊋ A).
Example: The set of all men is a proper subset of the set of all people.
{1, 3} ⊊ {1, 2, 3, 4}.
{1, 2, 3, 4} ⊆ {1, 2, 3, 4}.
The empty set is a subset of every set and every set is a subset of itself:
∅ ⊆ A.
A ⊆ A.
An obvious but useful identity, which can often be used to show that two seemingly different sets are equal:
A = B if and only if A ⊆ B and B ⊆ A.
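As a quick illustration, Python's set type supports these comparisons directly. The helper name sets_equal is hypothetical and exists only to spell out the "subset in both directions" identity.

```python
small = {1, 2}
big = {1, 2, 3}

print(small <= big)    # subset test: True
print(small < big)     # proper subset test: True
print(big >= small)    # superset test: True
print({1, 4} <= big)   # False, because 4 is not in big
print(set() <= big)    # the empty set is a subset of every set: True
print(big <= big)      # every set is a subset of itself: True
print(big < big)       # but never a proper subset of itself: False


def sets_equal(x, y):
    """Hypothetical helper: x = y if and only if each is a subset of the other."""
    return x <= y and y <= x


print(sets_equal({1, 2, 3}, {3, 2, 1}))  # True
```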

Union (∪): Union of the sets A and B, denoted A ∪ B, is the set of all objects that are a member of A, or B, or both. The union of {1, 2, 3} and {2, 3, 4} is the set {1, 2, 3, 4}.
Intersection (∩): Intersection of the sets A and B, denoted A ∩ B, is the set of all objects that are members of both A and B. The intersection of {1, 2, 3} and {2, 3, 4} is the set {2, 3} .
Complement: Complement of set A relative to set U, denoted Aᶜ, is the set of all members of U that are not members of A. This terminology is most commonly employed when U is a universal set, as in the study of Venn diagrams. This operation is also called the set difference of U and A, denoted U \ A. The complement of A = {1,2,3} relative to U = {2,3,4} is {4}, while, conversely, the complement of {2,3,4} relative to {1,2,3} is {1}.
Symmetric difference (△): Symmetric difference of sets A and B, denoted A △ B, is the set of all objects that are a member of exactly one of A and B (elements which are in one of the sets, but not in both). For instance, for the sets {1,2,3} and {2,3,4}, the symmetric difference set is {1,4}. It is the set difference of the union and the intersection, (A ∪ B) \ (A ∩ B).
Cartesian product (×): Cartesian product of A and B, denoted A × B, is the set whose members are all possible ordered pairs (a,b) where a is a member of A and b is a member of B.
Power set: Power set of a set A is the set whose members are all possible subsets of A. For example, the power set of {1, 2} is { {}, {1}, {2}, {1,2} }.
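The operations above map closely onto Python's set operators and the standard itertools module. The sketch below reuses the example sets {1, 2, 3} and {2, 3, 4}; the function name power_set is hypothetical, and subsets are stored as frozensets because ordinary Python sets cannot contain other sets.

```python
from itertools import combinations, product

A = {1, 2, 3}
B = {2, 3, 4}

union = A | B               # {1, 2, 3, 4}
intersection = A & B        # {2, 3}
b_minus_a = B - A           # complement of A relative to B: {4}
a_minus_b = A - B           # complement of B relative to A: {1}
sym_diff = A ^ B            # {1, 4}

# Cartesian product: every ordered pair (a, b) with a in A and b in B.
cartesian = set(product(A, B))


def power_set(s):
    """Hypothetical helper: return the set of all subsets of s."""
    items = list(s)
    return {
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    }


print(union, intersection, b_minus_a, a_minus_b, sym_diff)
print(len(cartesian))       # 3 * 3 ordered pairs
print(power_set({1, 2}))
```

A set with n members has 2^n subsets, so the power set grows quickly and this helper is only practical for small sets.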


Tuesday, March 15, 2011

Weighted Average

The ordinary average we all know from school mathematics is quite simple to find. Say we have 10 numbers: we add them all up and divide the sum by the number of numbers, in this case 10, and we have the average.

With a weighted average, the only difference is that at least one of the 10 numbers is given a different weight from the rest, say a weight of 2. That means this number counts double compared to the other 9 numbers. So our calculation is that number times 2, plus the other 9 numbers, all divided by 2 + 9 = 11.
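A minimal Python sketch of this calculation follows. The ten numbers are made up for illustration, and the function name weighted_average is hypothetical; the only thing taken from the text is that one number gets weight 2 and the other nine get weight 1.

```python
def weighted_average(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical data: ten numbers, the first one counted double.
numbers = [4, 8, 6, 5, 7, 9, 3, 6, 5, 7]
weights = [2] + [1] * 9          # 2 + 9 = 11 in the denominator

plain = sum(numbers) / len(numbers)
weighted = weighted_average(numbers, weights)

print(plain)      # ordinary average: 60 / 10
print(weighted)   # weighted average: (60 + 4) / 11
```

When every weight is equal, the weighted average reduces to the ordinary average.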

The idea of a weighted average is not difficult to understand, as the following examples show.

One example: suppose that there are two classes. One is for boys and has a strength of 12; the other is for girls and has 6 students in it. An exam takes place, and the scores of the boys are 8, 6, 7, 4, 9, 5, 8, 2, 7, 6, 5, 7. Calculate the average score of the boys and you will find that the average score per boy is (8+6+7+4+9+5+8+2+7+6+5+7)/12 = 6.17. Now turn to the girls, who scored 8, 9, 7, 9, 10, 6. Their average is (8+9+7+9+10+6)/6 = 8.17.

If we now calculate the combined average of both classes by adding the two averages and dividing by 2, the result is (6.17 + 8.17)/2 = 7.17. This is where the average differs from the weighted average. Many people will not like this result, since it gives a misleading picture of the students' performance: the average of the majority of the students, the boys, is much lower than this combined figure.

In this case a pragmatic assessor will try to bring theory closer to reality by assigning a weight to each class. He multiplies the average of the boys by their strength, i.e. 12, multiplies the average of the girls by the strength of their section, i.e. 6, adds the two products, and then divides the sum by the sum of the weights he assigned: {12(6.17) + 6(8.17)} / (12 + 6). Using the unrounded averages 74/12 and 49/6, this is (74 + 49)/18 = 6.83, which is simply the total of all 18 scores divided by 18. The answer is now noticeably different from 7.17.
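The class example can be recomputed directly from the listed scores. The sketch below is illustrative only; the variable names are made up, and the figures are computed from the scores themselves rather than from rounded averages.

```python
boys = [8, 6, 7, 4, 9, 5, 8, 2, 7, 6, 5, 7]
girls = [8, 9, 7, 9, 10, 6]

boys_avg = sum(boys) / len(boys)       # 74 / 12
girls_avg = sum(girls) / len(girls)    # 49 / 6

# Simple average of the two class averages (ignores class size).
mean_of_means = (boys_avg + girls_avg) / 2

# Weighted average: each class average is weighted by its strength.
weighted = (len(boys) * boys_avg + len(girls) * girls_avg) / (len(boys) + len(girls))

# Average over all 18 students pooled together.
pooled = sum(boys + girls) / len(boys + girls)

print(round(boys_avg, 2), round(girls_avg, 2))   # 6.17 8.17
print(round(mean_of_means, 2))                   # 7.17
print(round(weighted, 2), round(pooled, 2))      # 6.83 6.83
```

Weighting each class average by its number of students gives the same figure as averaging all 18 scores together, which is why the weighted result is the more realistic one here.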

Another example: in 2 regions the average weights of the people are 150 and 140. Using this information alone, we cannot obtain the average weight of the people in the two regions combined, because we do not know how many people live in each region.

So let me set up the calculation. In region 1 there are 40 people, whose average weight is 150, and in region 2 there are 50 people, whose average weight is 140. For these 2 groups the weighted average calculation is: (40 * 150 + 50 * 140) / (40 + 50) = 144.4
So we are able to obtain the average weight of the people in both groups combined, which is about 144.4.
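To see why the group sizes matter, the sketch below computes the combined average twice: once with the group sizes given above (40 and 50), and once with a made-up split of 80 and 10 that is purely hypothetical. The function name combined_average is also an assumption of this sketch.

```python
def combined_average(groups):
    """groups is a list of (count, average) pairs, one pair per group."""
    total_count = sum(count for count, _ in groups)
    return sum(count * average for count, average in groups) / total_count


# Group sizes from the example: 40 people at 150 and 50 people at 140.
given = combined_average([(40, 150), (50, 140)])

# Hypothetical alternative: same two averages, different group sizes.
alternative = combined_average([(80, 150), (10, 140)])

print(round(given, 1))        # 144.4
print(round(alternative, 1))  # 148.9
```

The two group averages are identical in both cases, yet the combined average changes, so the averages alone do not determine it.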

http://weightedaverage.org/

Sunday, March 13, 2011

List of Organizations Working on Data Mining

Tom Khabaza
Tom helps organisations improve their marketing and customer processes, as well as their efficiency, risk analysis and fraud detection, through new knowledge and predictive capabilities extracted from their data.
http://www.khabaza.com/

The Modeling Agency (TMA)
The Modeling Agency (TMA) provides training, mentorship and solutions to the data-rich, yet information-poor. More than a hundred highly recognizable organizations have called upon TMA for strategic direction in predictive modeling and tactical support in data mining. http://www.the-modeling-agency.com/

CRoss Industry Standard Process for Data Mining
The CRISP-DM project developed an industry- and tool-neutral data mining process model. Starting from the embryonic knowledge discovery processes used in early data mining projects and responding directly to user requirements, this project defined and validated a data mining process that is applicable in diverse industry sectors. This methodology makes large data mining projects faster, cheaper, more reliable and more manageable. Even small scale data mining investigations benefit from using CRISP-DM. http://www.crisp-dm.org/index.htm

Saturday, March 12, 2011

Architecture of a typical data mining system

What motivated data mining?

We have huge amounts of data and now need to find valuable information or knowledge in it. We can use data mining technology for fraud detection, customer retention, loan recovery, etc.

What is data mining ?

Data mining is the process of finding patterns, or discovering knowledge, in large amounts of data. The process consists of the following sequential steps:


Step 1: Data cleaning (remove noisy and inconsistent data)
Step 2: Data integration (combine multiple data sources)
Step 3: Data selection (fetch the data relevant to the analysis)
Step 4: Data transformation (transform or consolidate the data into a form suitable for mining, by performing summary or aggregation operations)
Step 5: Data mining (apply intelligent methods to find data patterns)
Step 6: Pattern evaluation (identify the truly interesting patterns, based on interestingness measures)
Step 7: Knowledge presentation (the presentation layer, where the mined knowledge is shown to the user)
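
The seven steps can be pictured with a toy Python sketch. Everything in it is hypothetical: the purchase records, the two "stores", the use of simple item frequency as the "mining" method, and the MIN_SUPPORT threshold are all made up for illustration and are not taken from the reference below. A real system would use proper data sources and mining algorithms.

```python
from collections import Counter

# Hypothetical (customer, item) records from two sources; None marks noisy data.
store_a = [("alice", "bread"), ("bob", None), ("alice", "milk"), ("carol", "bread")]
store_b = [("dave", "bread"), ("erin", "eggs"), (None, "eggs"), ("dave", "milk")]


def clean(records):
    """Step 1: drop records that have a missing field."""
    return [r for r in records if all(field is not None for field in r)]


# Step 2: integrate the cleaned sources into one data set.
integrated = clean(store_a) + clean(store_b)

# Step 3: select the data relevant to the analysis (here, only the item).
items = [item for _, item in integrated]

# Step 4: transform by aggregating the items into counts.
counts = Counter(items)

# Step 5: "mine" a very simple pattern: the share of records containing each item.
support = {item: n / len(items) for item, n in counts.items()}

# Step 6: evaluate the patterns against a hypothetical interestingness threshold.
MIN_SUPPORT = 0.3
interesting = {item: s for item, s in support.items() if s >= MIN_SUPPORT}

# Step 7: present the result to the user.
for item, s in sorted(interesting.items()):
    print(f"{item}: appears in {s:.0%} of records")
```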


References:
Jiawei Han and Micheline Kamber
Data Mining: Concepts and Techniques, 2nd ed.
The Morgan Kaufmann Series in Data Management Systems, Jim Gray, Series Editor
Morgan Kaufmann Publishers, March 2006. ISBN 1-55860-901-6

Web Address for further study
1. http://en.wikipedia.org/wiki/Data_mining