Some thoughts on Ted Codd's papers. Most of this is borrowed from the excellent series of articles by C. J. Date called "The Birth of the Relational Model". http://www.aisintl.com/case/library/Date_Birth%20of%20the%20Relational%20Model-1.html Codd had another paper called "Derivability, Redundancy, and Consistency of Relations stored in Large Data Banks", published in 1969. C. J. Date discusses the differences between the two papers in the series of articles above (some mentioned below). Everything in quotes is from Codd's papers. - Note the title of the first paper: he was investigating notions of "redundancy" and "consistency" using the relational model. - "..provides a means of describing data with its natural structure only -- that is without superimposing any additional structure for machine representation purposes..". - Three types of dependence: - ordering dependence - indexing dependence - indexes are redundant information and we should be able to create/destroy anytime. - access path dependence - "relations" vs "relationships" - the notion of relationship here has nothing to do with relationships from ER model. - the difference between the two is that "relations" care about positions, "relationships" don't - relational model as we use today doesn't care about positions - related to the confusion between "domains" and "attributes" - Codd doesn't clearly distinguish between domains and "attributes". This leads to much later confusion. - Made database management into a science by putting the field on a solid scientific footing by providing a theoretical framework - Importance of distinction between model and implementation - Physical data independence: independence bteween application programs and machine data represetnation - Relation values vs relation variables: A relation is simply a value and cannot be "time-varying"; relation variables are. Todd's use of the term "time-varying relations" is somewhat confusing. - Foreign key: wasn't allowed to be a primary key as well. This is unnecessarily restrictive. (e.g. won't allow "Person(SSN, Name, Address), Account(SSN, balance)", where SSN is used as a primary key for the second relation as well, but is also a foreign key). - 1969 paper considered query language based on "second-order predicate calculus". Domains were allowed to be relations. 1970 paper basically said all relations should be normalized, and noted that first-order predicate calculus would be sufficient. - 1969 paper: "Some deletions may be triggered by others if deletion dependencies ... are declared" ("referntial integrity"). - Operations: - Several original operations like tie, composition, restriction didn't make it to the algebra. - Join: The definition of Join was very different from what we have today. Its not constructive. Definition of "natural join" persists though. The notion of "join" required that: if joining R(A, B) and S(B, C), \pi_B(R) must be equal to \pi_B(S). The definition of "natural join" doesn't seem to have required that (according to Date). The concept of "joinability" has similarities to "nonloss decomposition". - He didn't have "select" (\sigma) operation -- "tie" can be written trivially using \sigma. - "named" relations vs "stored" relations vs "expressible" relations - The "named" set of relations could contain redundant relations (that are derivable from stored relations). 1969 paper: "If the traffic on some unnamed but expressible relation grows to signficant proportions, then it should be given a name and thereby included in the named set". In other words, "views". - Storage: - As arrays - Communication advantages: - suitably compressed, devoid of pointers, no hash addressing, no indices or ordering lists - "Connection Trap": A(supplier, part) B(part, project) Then, joining A and B on part, and projecting on {supplier, project} gives us the projects to which the supplier supplies. NOT TRUE. Consider: A B supp1 part1 part1 project1 supp2 part1 The above operation gives us: Result supp1 project1 supp2 project1 So which supplier supplies to the that project ? Both, neither ? In fact, the information is not modeled anywhere in the schema. See http://db.grussell.org/section005.html for some more details (search for "connection trap"). - Strong redundancy and consistency: Beginning of what was to become integrity part of the relational model. Consider: EMP(emp-number, dept-number, budget) and say the following two FDs exist: emp-number --> dept-number dept-number --> budget The above relation is not in 3NF or BCNF and has redundancy. But it is not "strongly redundant" according to Codd's definition. Weak redundancy didn't lead anywhere. See Date's comments in the articles above.