CS 610 Spring 2019Performance of Skip ListsCourse Project Option 1Due May 3The goal of this project is to analyze the performance of Skip List implementation of Ordered Dictionary Chapter 7 Assignment Due: To Be Determined by Class Vote by the End of the First Week of Teaching The penalty for late hand-in of coursework is 10% of the total marks per working day. No credit will be given after more than 5 working days. A working day is a 24-hour period starting from the original hand-in deadline, but skipping days on which the School is closed. 7.1 Individual Assignment (40%) We will study the so-called South-African Heart-Disease (SAHD) data. Necessary background is in [3, Section 4.4.2]. Briefly, observations on n = 462 individuals are available. On each individual, the data consists of: a binary indicator (value 0 or 1) of the presence of Coronary Heart Disease (1=have disease, 0=does not have disease), viewed as a response; and the following raw inputs: systolic blood pressure; tobacco use; LDL, also known as “bad cholesterol”; (binary) indicator of family history of heart disease; (measure of) obesity; (measure of) alcohol use; and age. In data file SAHD.mat, the response variable is chd, and the inputs are sbp, tobacco, ldl, famhist, obesity, alcohol, age respectively. We want to regress the response on inputs, where an input may be a raw input or a function of it (e.g., a quadratic function). The main tool is a logit (logistic) regression of the form Yi ~ Bernoulli(μi) and independent, where logit(μi) = X p j=1 Xj,iβj (7.1) where logit(μ) = log(μ/(1μ)), and (Yi , X1,i, . . . , Xp,i) is the i-th observation of the response and the inputs. Another possibility is probit regression of the form Yi ~ Bernoulli(μi) and independent, where μi = Φ(X p j=1 Xj,iβj ), (7.2) where Φ() is the Normal(0,1) cdf. [3, Section 4.4.2] fit logit models of the form (7.1). In [3, Section 5.2.2, pages 146-148], a more advanced model in equation (5.6) there captures the effect of a raw input Xj on the response by a spline (piecewise-polynomial) function hj (Xj ); the (estimated) functions are shown in Figure 5.4 for selected inputs. These findings suggest the following: ? Variables sbp, tobacco, and obesity may appear in (7.1) in quadratic form (b1x + b2x 2 , where x is the raw input) or in cubic form (b1x + b2x 2 + b3x 3 ). To capture the effect of age, which does not seem to resemble a low-order polynomial, consider as inputs the binary indicators of categorized age, similar to what is done in [5, Example 10]. For example, the sample deciles of age, namely 18, 28, 34, 40, 45, 49, 54, 58, 61, and 64, define the intervals (0, 18], (18, 28], (28, 34], ..., (61, 64], which give 10 categories of age. 47 Standard tasks (model estimation; testing the significance a certain coefficient; computing the AIC; etc.) may be done via sahd.m Part 1, which is similar to sco.m (lab 3, Section 6.2.2). (Part 2 supports the group assignment.) A very thorough exploration could involve transformed raw inputs beyond those seen there. Those wishing to have matlab on a non-University computer should start at software.soton.ac.uk. Install at least matlab (the basic product); and the toolboxes on statistics and optimization. 7.1.1 Deliverables and Marking Scheme Submit a typed report to the Faculty Office, with the usual cover about academic integrity. The audience is a hypothetical manager or analyst familiar with our notes, the brief and the analyzes in [3] cited above. Discuss results, and analysis as necessary, on the following: 1. Develop a preferred model of the response. A better (more parsimonious) model is often found by removing an input Xj if the hypothesis “βj = 0” cannot be rejected at some level α (the corresponding t value satisfies |t| First define an interface for Ordered Dictionary ADT with the required methods and then provide the Skip list implementation. You should give your own implementation of Skip list (not borrowed from any web site or from anyone else).Second part of the project requires you to write a program to test the performance of this implementation for findElement, insertElement, removeElement, closestKeyAfter operations. Specifically, you should generate randomly a sequence of unique integers in a range (say 1 to 100000) to be inserted as keys starting from an empty ADT. You should intersperse insert operations with equal number of findElement, removeElement and closestKeyAfter operations in the sequence. You should make sure there will be sufficient successful and unsuccessful searches and valid delete operations. One approach is as follows : Suppose you have a “findElement” operation as m-th operation in the sequence and let … be the keys inserted until then pick a number “t” randomly between 1 and 2m+1 and use key if t = 2l, key if t = 2l-1, for 1 ≤ l ≤ m and key + 1 if t = 2m+1. You can use same approach for “closestKeyAfter” operation. For “removeElement”, if … be the keys inserted until then pick a number “t” randomly between 1 and m and choose key if t = l.You should run this sequence of these operations many times and also keep statistics to measure average time for the four ADT operations as a function of number of keys in the tree at the time when operation was executed. This will enable you to plot the average time as a function of input size and make conclusions about the asymptotic growth of its performance (big-oh notation).You should implement it in either C++ or Java (preferable). For Java, submit a zip file containing (a) source code, (b) Jar file containing .class files, (c) a Windows command file that I can run to test the first part of the project by entering a sequence of ADT operations with key values when prompted and observing the skip list configuration printed by the program, (d) a Windows command file that I can run to execute the test program of second part of the project and see the performance results in a tabular form and (e) one-page summary of your findings regarding the asymptotic complexity analysis of for the various ADT operations along with the performance graphs to support your conclusions. For C++, instead of (b), submit the executables that can be run on an AFS UNIX host and for (c) and (d), submit Unix shell scripts to run the test programs.转自:http://www.7daixie.com/2019043020198071.html











网友评论