# Survival Analysis Using SAS

ISBN-10: 1629605212

Notice there is one row per subject, with one variable coding the time to event, lenfol. A second way to structure the data, which only proc phreg accepts, is the "counting process" style of input, which allows multiple rows of data per subject.

Ordinary least squares regression methods fall short for survival data because the time to event is typically not normally distributed, and the model cannot handle censoring, which is very common in survival data, without modification.

The following is part of a SAS macro for plotting a Kaplan-Meier curve (the remaining statements of the macro are not shown):

```sas
ods rtf file="D:\SUG07\graphs\G_&row._&test..rtf" bodytitle;
ods graphics on;
ods noproctitle;
proc lifetest data=&test.data noprint plots=(s) method=KM;
```

For the WHAS500 data, the time statement is `time lenfol*fstat(0);`, which names lenfol as the time variable and identifies fstat=0 as a censored observation.

To smooth the hazard function, its estimate at any time point is a weighted average of the differences between adjacent cumulative hazard estimates that fall within a window of time around that point; the width of this window is known as the bandwidth.

The confidence band is calculated for the entire survival function, so at any given time it must be wider than the pointwise confidence interval (the confidence interval around a single time point). It is constructed so that, with 95% confidence, the true survival function lies inside the band over the whole range of times simultaneously.

Here is the typical set of steps to obtain survival plots by group:

Let's get survival curves (cumulative hazard curves are also available) for males and females at the mean age of 69.845947 in the manner we just described.

Notice in the Analysis of Maximum Likelihood Estimates table above that the Hazard Ratio entries for terms involved in interactions are left empty, because the hazard ratio for such a term depends on the value of the covariate it interacts with. Finally, we see that the hazard ratio describing a 5-unit increase in bmi, $$\frac{h(t|bmi+5)}{h(t|bmi)}$$, increases with bmi.

The likelihood displacement score quantifies how much the likelihood of the model, which is affected by all coefficients, changes when the observation is left out.
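To make the kernel-smoothing idea concrete, here is a minimal Python sketch (not the proc lifetest implementation) that estimates the hazard at time `t` as a kernel-weighted average of the Nelson-Aalen increments $$d_i/n_i$$ lying within one bandwidth of `t`. It assumes an Epanechnikov kernel and applies no boundary correction, so its values may differ from SAS output near the start and end of follow-up; the function names are hypothetical.

```python
from collections import Counter


def nelson_aalen_increments(times, events):
    """Sorted (event_time, d/n) pairs: d failures at that time, n still at risk."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    increments = []
    for t in sorted(deaths):
        at_risk = sum(1 for s in times if s >= t)
        increments.append((t, deaths[t] / at_risk))
    return increments


def epanechnikov(u):
    """Epanechnikov kernel: positive on [-1, 1], zero outside."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def smoothed_hazard(t, times, events, bandwidth):
    """Kernel-weighted average of the increments within +/- bandwidth of t."""
    total = 0.0
    for tj, dh in nelson_aalen_increments(times, events):
        total += epanechnikov((t - tj) / bandwidth) * dh
    return total / bandwidth


# One failure at time 2: the estimate peaks at t = 2 and fades to 0 at t = 2 +/- b.
for b in (1.0, 2.0):
    print(b, [smoothed_hazard(t, [2], [1], b) for t in (1.0, 2.0, 2.5, 3.0)])
```

Doubling the bandwidth halves the peak around an isolated failure and spreads it over a wider window: the smoothing gained by averaging more differences is paid for by masking local changes.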
On Aug 1, 2011, N. E. Rosenberg and others published Survival Analysis Using SAS: A Practical Guide.

Before we dive into survival analysis, we will create and apply a format to the gender variable that will be used later in the seminar.

Because this likelihood makes no assumptions about the baseline hazard function, it is actually a partial likelihood, not a full likelihood, but the resulting $$\beta$$ estimates have the same distributional properties as those derived from the full likelihood.

Similarly, because we included a bmi*bmi interaction term in our model, the bmi term is interpreted as the effect of bmi when bmi is 0.

SAS provides built-in methods for evaluating the functional form of covariates through its assess statement. Therneau and colleagues (1990) show that the smooth of a scatter plot of the martingale residuals from a null model (no covariates at all) versus each covariate individually will often approximate the correct functional form of a covariate. None of the graphs look particularly alarming (the SAS documentation example for the assess statement shows what an alarming graph looks like).

Here are the steps we use to assess the influence of each observation on our regression coefficients:

The dfbetas for age and hr look small compared to the regression coefficients themselves ($$\hat{\beta}_{age}=0.07086$$ and $$\hat{\beta}_{hr}=0.01277$$) for the most part, but id=89 has a rather large, negative dfbeta for hr.

This is reinforced by the three significant tests of equality.

From these equations we can see that the cumulative hazard function $$H(t)$$ and the survival function $$S(t)$$ have a simple monotonic relationship: when the survival function is at its maximum at the beginning of analysis time, the cumulative hazard function is at its minimum.

The Kaplan-Meier survival function estimator is calculated as:

$$\hat S(t)=\prod_{t_i\leq t}\frac{n_i - d_i}{n_i},$$

where $$n_i$$ is the number of subjects at risk and $$d_i$$ is the number of failures at time $$t_i$$.
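The product-limit formula translates almost line for line into code. The following Python sketch is an illustration, not how proc lifetest is implemented; it treats a subject censored at a failure time as still at risk at that time (the usual convention), and `kaplan_meier` is a hypothetical name.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) at each distinct failure time.

    times:  follow-up time for each subject
    events: 1 if the subject failed at that time, 0 if censored
    Returns (t_i, S_hat(t_i)) pairs in increasing time order.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)  # d_i per time
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        n_i = sum(1 for s in times if s >= t)  # number still at risk at t_i
        d_i = deaths[t]
        surv *= (n_i - d_i) / n_i              # conditional survival through t_i
        curve.append((t, surv))
    return curve


# Five subjects; the one followed longest (time 4) is censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Each factor is a conditional probability of surviving past $$t_i$$ given survival to $$t_i$$; the running product is the unconditional survival probability. Here the curve ends at 0.3 rather than 0 because the longest follow-up time is censored.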
The survival function estimate of the unconditional probability of survival beyond time $$t$$ (the probability of survival beyond time $$t$$ from the onset of risk) is then obtained by multiplying together these conditional probabilities up to time $$t$$. The red curve representing the lowest BMI category is truncated on the right because the last person in that group died long before the end of follow-up time. Thus, by 200 days, a patient has accumulated quite a bit of risk, which accumulates more slowly after this point.

For such studies, a semi-parametric model, in which we estimate regression parameters as covariate effects but leave the dependence on time unspecified, is appropriate.

```sas
proc phreg data = whas500;
  model lenfol*fstat(0) = gender|age bmi|bmi hr;
run;
```

Thus, it appears that when bmi=0, the hazard rate decreases as bmi increases, but this negative slope flattens and becomes more positive as bmi increases. At this stage we might be interested in expanding the model with more predictor effects.

The nursing home data used for the exponential regression analysis are read in as follows:

```sas
data nurshome;
  infile 'nurshome.dat';
  input los age rx gender married health fail;
  label los='Length of stay' rx='Treatment' married='Marriage status';
run;
```

See also Applied Survival Analysis (Hosmer, Lemeshow, and May, 2008).

Within proc phreg, a programming statement such as `hrtime = hr*lenfol;` creates a time-varying covariate, the product of hr and follow-up time.

To specify a Cox model with start and stop times for each interval, as needed when using time-varying covariates, we specify the start and stop times in the model statement:

```sas
model (start, stop)*status(0) = in_hosp;
```

If the data come prepared with one row of data per subject each time a covariate changes value, then the researcher does not need to expand the data any further.
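Splitting a record at the time a covariate changes is the core of building counting-process data. As a hedged Python sketch, assume a hypothetical 0/1 covariate that equals 1 from time 0 until `change_time` (for example, an in-hospital indicator) and 0 afterwards; the function `split_at_change` and its `(start, stop, covariate, status)` row layout are illustrative, not the seminar's SAS code.

```python
def split_at_change(followup, status, change_time):
    """Turn one subject's (followup, status) record into counting-process rows.

    Hypothetical covariate: 1 on (0, change_time], 0 afterwards.
    Each row is (start, stop, covariate, status). Only the final row can
    carry the event; earlier intervals end censored (status 0).
    """
    if change_time <= 0:
        return [(0, followup, 0, status)]
    if change_time >= followup:
        return [(0, followup, 1, status)]
    return [(0, change_time, 1, 0), (change_time, followup, 0, status)]


# A subject who died at day 2,178 and had covariate = 1 for the first 100 days
print(split_at_change(2178, 1, 100))
# A subject censored at day 50, before the covariate changed
print(split_at_change(50, 0, 100))
```

The resulting rows are what a `model (start, stop)*status(0) = ...` statement expects: each interval is its own row, and the subject is censored at the end of every interval except possibly the last.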
SAS/STAT has two procedures for regression analysis of survival data: PROC LIFEREG, which fits parametric models, and PROC PHREG, which fits Cox proportional hazards models. Nonparametric estimates come from PROC LIFETEST.

Now let's look at the model with both linear and quadratic effects for bmi. At the same time, we found that the gender effect seems to disappear after accounting for age, but we may suspect that the effect of age is different for each gender.

The functional form of the continuous covariates is checked with the assess statement:

```sas
assess var=(age bmi hr) / resample;
```

We can thus evaluate model specification by comparing the observed distribution of cumulative sums of martingale residuals to the expected distribution of the residuals under the null hypothesis that the model is correctly specified.

The scaled Schoenfeld residuals approximately satisfy

$$E(s^\star_{kp}) + \hat{\beta}_p \approx \beta_p(t_k)$$

In the relation above, $$s^\star_{kp}$$ is the scaled Schoenfeld residual for covariate $$p$$ at time $$k$$, $$\hat{\beta}_p$$ is the estimated time-invariant coefficient, and $$\beta_p(t_k)$$ is the time-variant coefficient.

For observation $$j$$, $$df\beta_j$$ approximates the change in a coefficient when that observation is deleted. There is a set of $$df\beta_j$$ values for each coefficient in the model, and they are written to the output dataset in the order in which the coefficients appear in the parameter table "Analysis of Maximum Likelihood Estimates" (see above).

The primary focus of survival analysis is typically to model the hazard rate, which has the following relationship with $$f(t)$$ and $$S(t)$$:

$$h(t) = \frac{f(t)}{S(t)}$$

The hazard function, then, describes the relative likelihood of the event occurring at time $$t$$ ($$f(t)$$), conditional on the subject's survival up to that time $$t$$ ($$S(t)$$).

A common way to address both issues is to parameterize the hazard function as:

$$h(t|x) = \exp(\beta_0 + \beta_1 x)$$

In this parameterization, $$h(t|x)$$ is constrained to be strictly positive, as the exponential function always evaluates to a positive value, while $$\beta_0$$ and $$\beta_1$$ are allowed to take on any value. Notice, however, that $$t$$ does not appear in the formula for the hazard function, implying that in this parameterization we do not model the hazard rate's dependence on time.
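A quick numeric check of the two hazard relationships, written as a Python sketch. It assumes exponentially distributed survival times with an illustrative rate `lam`, for which $$f(t)/S(t)$$ should be the same at every $$t$$, and it uses made-up coefficients to show that $$h(t|x) = \exp(\beta_0 + \beta_1 x)$$ stays positive while the hazard ratio for a one-unit increase in $$x$$ is $$\exp(\beta_1)$$, with no $$t$$ involved.

```python
import math


def exp_pdf(t, lam):
    """Density of an exponential survival time with rate lam."""
    return lam * math.exp(-lam * t)


def exp_survival(t, lam):
    """Survival function S(t) of the same exponential distribution."""
    return math.exp(-lam * t)


def hazard_from_pdf(t, lam):
    """h(t) = f(t) / S(t)."""
    return exp_pdf(t, lam) / exp_survival(t, lam)


def param_hazard(x, beta0, beta1):
    """h(t|x) = exp(beta0 + beta1 * x); t does not enter the formula."""
    return math.exp(beta0 + beta1 * x)


lam = 0.002  # illustrative rate per day
print([round(hazard_from_pdf(t, lam), 6) for t in (0, 100, 1000)])  # constant in t

beta0, beta1 = -6.0, -0.8  # hypothetical coefficients; negative values are allowed
print(param_hazard(1, beta0, beta1) > 0)
print(param_hazard(1, beta0, beta1) / param_hazard(0, beta0, beta1), math.exp(beta1))
```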
Easy to read and comprehensive, Survival Analysis Using SAS: A Practical Guide, Second Edition, by Paul D. Allison, is an accessible, data-based introduction to methods of survival analysis.

That is, for some subjects we do not know when they died after heart attack, but we do know at least how many days they survived. (Technically, because there are no times less than 0, there should be no graph to the left of LENFOL=0.) Censored observations are represented by vertical ticks on the graph.

As we know, each subject in the WHAS500 dataset is represented by one row of data, so the dataset is not ready for modeling time-varying covariates. A model with gender and age as predictors is specified as:

```sas
model lenfol*fstat(0) = gender age;
```

The log-rank and Wilcoxon tests in the output table differ in the weights $$w_j$$ used. The "-2Log(LR)" likelihood ratio test is a parametric test assuming exponentially distributed survival times and will not be further discussed in this nonparametric section.

The assess statement with the ph option provides an easy method to assess the proportional hazards assumption both graphically and numerically for many covariates at once.

Only as many residuals are output as names are supplied on the

We should check for non-linear relationships with time, so we include a

As before with checking functional forms, we list all the variables for which we would like to assess the proportional hazards assumption after the

Hosmer, D. W., Lemeshow, S., and May, S. (2008). Applied Survival Analysis.

Let $$T \geq 0$$ have a pdf $$f(t)$$ and cdf $$F(t)$$. The pdf is the derivative of the cdf, $$f(t) = \frac{dF(t)}{dt}$$. Here we see the estimated pdf of survival times in the whas500 set, from which all censored observations were removed to aid presentation and explanation.
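To see numerically that $$f(t) = dF(t)/dt$$, the following Python sketch compares a central-difference derivative of a cdf with the corresponding pdf. It assumes Weibull-distributed survival times with a hypothetical shape `k` and scale `s`; any differentiable survival distribution would serve.

```python
import math


def weibull_cdf(t, k, s):
    """F(t) = 1 - exp(-(t/s)^k)."""
    return 1.0 - math.exp(-((t / s) ** k))


def weibull_pdf(t, k, s):
    """f(t) = (k/s) (t/s)^(k-1) exp(-(t/s)^k)."""
    return (k / s) * (t / s) ** (k - 1) * math.exp(-((t / s) ** k))


def numeric_derivative(func, t, h=1e-3):
    """Central difference (F(t+h) - F(t-h)) / (2h)."""
    return (func(t + h) - func(t - h)) / (2 * h)


k, s = 1.5, 1000.0  # hypothetical shape and scale (days)
for t in (100.0, 500.0, 2000.0):
    approx = numeric_derivative(lambda u: weibull_cdf(u, k, s), t)
    print(t, approx, weibull_pdf(t, k, s))
```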
Note: The terms event and failure are used interchangeably in this seminar, as are time to event and failure time.

Written for the reader with a modest statistical background and minimal knowledge of SAS software, Survival Analysis Using SAS: A Practical Guide teaches many aspects of data input and manipulation.

Survival data can be structured in two ways. First, there may be one row of data per subject, with one outcome variable representing the time to event, one variable that codes for whether the event occurred or not (censored), and explanatory variables of interest, each with fixed values across follow-up time. For example, patients in the WHAS500 dataset are in the hospital at the beginning of follow-up time, which is defined by hospital admission after heart attack.

We compare 2 models, one with just a linear effect of bmi and one with both a linear and quadratic effect of bmi (in addition to our other covariates). The model with the quadratic effect is:

```sas
proc phreg data = whas500;
  class gender;
  model lenfol*fstat(0) = gender|age bmi|bmi hr;
run;
```

One caveat is that this method for determining functional form is less reliable when covariates are correlated. A solid line that falls significantly outside the boundaries set up collectively by the dotted lines suggests that our model residuals do not conform to the residuals expected under a correctly specified model.

For more detail, see Stokes, Davis, and Koch (2012), Categorical Data Analysis Using SAS, 3rd ed.

Because the observation with the longest follow-up is censored, the survival function will not reach 0. We see that beyond 1,671 days, 50% of the population is expected to have failed.

Standard nonparametric techniques do not typically estimate the hazard function directly. Expressing the relationship between the hazard and the cumulative hazard as $$\frac{d}{dt}H(t) = h(t)$$, we see that the hazard function describes the rate at which hazards are accumulated over time. Since $$S(t) = \exp(-H(t))$$, the cumulative distribution function is

$$F(t) = 1 - \exp(-H(t))$$
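The relationship $$F(t) = 1 - \exp(-H(t))$$ can be checked with a nonparametric estimate of $$H(t)$$. This Python sketch uses the Nelson-Aalen estimator $$\hat H(t) = \sum_{t_i \le t} d_i/n_i$$ on a small made-up sample; the choice of estimator is an assumption for illustration, and the function name is hypothetical.

```python
import math
from collections import Counter


def nelson_aalen(times, events):
    """Cumulative hazard H(t_i) = sum over t_j <= t_i of d_j / n_j."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    cum_haz = 0.0
    result = []
    for t in sorted(deaths):
        n_j = sum(1 for s in times if s >= t)  # number at risk at t_j
        cum_haz += deaths[t] / n_j
        result.append((t, cum_haz))
    return result


times, events = [1, 2, 2, 3, 4], [1, 1, 0, 1, 0]
for t, H in nelson_aalen(times, events):
    S = math.exp(-H)  # survival implied by the cumulative hazard
    F = 1.0 - S       # cumulative probability of failure by time t
    print(t, round(H, 3), round(S, 3), round(F, 3))
```

As $$\hat H$$ rises, $$\exp(-\hat H)$$ falls and $$1 - \exp(-\hat H)$$ rises, which is the monotonic relationship between the cumulative hazard and survival functions.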
The function that describes the likelihood of observing $$Time$$ at time $$t$$ relative to all other survival times is known as the probability density function (pdf), or $$f(t)$$.

It is important to note that the survival probabilities listed in the Survival column are unconditional, and are to be interpreted as the probability of surviving from the beginning of follow-up time up to the number of days in the LENFOL column.

However, widening the bandwidth will also mask changes in the hazard function, as local changes in the hazard function are drowned out by the larger number of values that are being averaged together.

However, often we are interested in modeling the effects of a covariate whose values may change during the course of follow-up time.

The background necessary to explain the mathematical definition of a martingale residual is beyond the scope of this seminar, but interested readers may consult Therneau (1990).

We thus calculate the coefficient with the observation, call it $$\beta$$, and then the coefficient when observation $$j$$ is deleted, call it $$\beta_j$$, and take the difference to obtain $$df\beta_j$$. Indeed, exclusion of these two outliers increases the magnitude of $$\hat{\beta}_{bmi}$$ by about 70%, from -0.23323 to -0.39619.

Within proc phreg, the hazard ratio between genders at several ages is requested with a hazardratio statement, and the two influential observations are printed afterwards:

```sas
hazardratio 'Effect of gender across ages' gender / at(age=(0 20 40 60 80));
run;

proc print data = whas500(where=(id=112 or id=89));
run;
```

In other words, if all strata have the same survival function, then we expect the same proportion to die in each interval. In a nutshell, these statistics sum the weighted differences between the observed number of failures and the expected number of failures for each stratum at each time point, assuming the same survival function for each stratum.
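To make "the weighted sum of observed minus expected failures" concrete, here is a Python sketch of a two-group weighted log-rank test. It assumes the standard hypergeometric variance, $$w_j = 1$$ for the log-rank test and $$w_j = n_j$$ (the number at risk, Gehan's form) for the Wilcoxon test; proc lifetest may differ in details, and `weighted_logrank` is a hypothetical name.

```python
import math


def weighted_logrank(times, events, groups, weight="logrank"):
    """Two-group weighted log-rank chi-square statistic (1 df) and p-value.

    groups[k] is 1 for subjects in the first group and 0 otherwise.
    weight="logrank" uses w_j = 1; weight="wilcoxon" uses w_j = n_j.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    diff_sum = 0.0  # sum of w_j * (observed - expected) failures in group 1
    var_sum = 0.0   # sum of w_j^2 * hypergeometric variance
    for tj in event_times:
        at_risk = [k for k, t in enumerate(times) if t >= tj]
        n = len(at_risk)
        n1 = sum(groups[k] for k in at_risk)
        failed = [k for k in at_risk if times[k] == tj and events[k] == 1]
        d = len(failed)
        d1 = sum(groups[k] for k in failed)
        w = 1.0 if weight == "logrank" else float(n)
        expected1 = d * n1 / n  # expected group-1 failures if survival is equal
        diff_sum += w * (d1 - expected1)
        if n > 1:
            var_sum += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var_sum == 0.0:
        raise ValueError("no information to compare the groups")
    chi2 = diff_sum ** 2 / var_sum
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value


times = [5, 8, 12, 15, 20, 3, 4, 7, 9, 11]
events = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
groups = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(weighted_logrank(times, events, groups, "logrank"))
print(weighted_logrank(times, events, groups, "wilcoxon"))
```

Because the Wilcoxon weights are larger early on, when many subjects are at risk, that test is more sensitive to differences near the start of follow-up, while the log-rank test weights every failure time equally.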
The PHREG procedure performs semi-parametric regression analysis using partial likelihood estimation; we request Cox regression through proc phreg.

Let's take a look at later survival times in the table: from LENFOL=368 to 376, there are several records where it appears no events occurred. These rows are censored observations.

As an example, imagine that subject 1 in the table above, who died at 2,178 days, was in a treatment group of interest for the first 100 days after hospital admission. Some data management will be required to ensure that everyone is properly censored in each interval.

Widening the bandwidth smooths the function by averaging more differences together.

The null distribution of the cumulative martingale residuals can be simulated through zero-mean Gaussian processes. Significant supremum tests suggest that our residuals do not conform to their expected distribution under a correctly specified model. The graph for bmi at top right looks better behaved now, with smaller residuals.

In the second table, we see that the hazard ratio between genders, $$\frac{h(t|gender=1)}{h(t|gender=0)}$$, decreases with age: it is significantly different from 1 at age = 0 and age = 20, but becomes non-significant by age 40.
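Hazard ratios for terms involved in interactions must be evaluated at specific covariate values. This Python sketch computes two of them for a model whose log hazard contains $$\beta_g\,gender + \beta_{ga}\,gender \times age$$ and $$\beta_b\,bmi + \beta_{bb}\,bmi^2$$: the gender hazard ratio at a given age, $$\exp(\beta_g + \beta_{ga}\,age)$$, and the ratio for a 5-unit bmi increase, $$\exp(5\beta_b + \beta_{bb}[(bmi+5)^2 - bmi^2])$$. The coefficient values are hypothetical placeholders, not estimates from WHAS500.

```python
import math


def gender_hr(age, b_gender, b_gender_age):
    """h(t | gender=1, age) / h(t | gender=0, age) with a gender*age interaction."""
    return math.exp(b_gender + b_gender_age * age)


def bmi_plus5_hr(bmi, b_bmi, b_bmi2):
    """h(t | bmi+5) / h(t | bmi) with linear and quadratic bmi terms."""
    return math.exp(5 * b_bmi + b_bmi2 * ((bmi + 5) ** 2 - bmi ** 2))


# Hypothetical coefficients, chosen only to illustrate the shapes
b_gender, b_gender_age = 2.0, -0.03
b_bmi, b_bmi2 = -0.2, 0.004

for age in (0, 20, 40, 60, 80):
    print("age", age, "gender HR", round(gender_hr(age, b_gender, b_gender_age), 3))
for bmi in (15, 25, 35):
    print("bmi", bmi, "HR for +5", round(bmi_plus5_hr(bmi, b_bmi, b_bmi2), 3))
```

With a negative interaction coefficient the gender hazard ratio shrinks as age increases, and with a positive quadratic coefficient the 5-unit bmi ratio grows with bmi; the age main effect cancels out of the gender ratio, which is why it does not appear.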
The first row of the table covers the interval from 0 days to just before 1 day.

The log-rank, or Mantel-Haenszel, test uses $$w_j = 1$$.

Scatterplot smooths of the scaled Schoenfeld residuals against time can be used to explore whether a coefficient changes over time.

References: Allison, P. D. (1995). Survival Analysis Using SAS: A Practical Guide. Therneau, T. M., Grambsch, P. M., and Fleming, T. R. (1990).

The dfbeta statistic, $$df\beta$$, quantifies how much an observation influences the regression coefficients in the model. Two observations, id=89 and id=112, have very low but not unreasonable bmi scores, 15.9 and 14.8.
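proc phreg's dfbeta values approximate the change in a coefficient when an observation is deleted. To show the quantity being approximated, this Python sketch computes it exactly by refitting, using a much simpler stand-in model: an intercept-only exponential model, whose maximum likelihood estimate of the log hazard is $$\hat\beta = \log(D/T)$$, with $$D$$ the number of failures and $$T$$ the total follow-up time. The model choice, the sign convention $$\hat\beta - \hat\beta_{(j)}$$, and the function names are assumptions for illustration.

```python
import math


def log_rate(times, events):
    """MLE of log(hazard) in an intercept-only exponential model: log(D / T)."""
    d = sum(events)
    if d == 0:
        raise ValueError("need at least one failure")
    return math.log(d / sum(times))


def exact_dfbeta(times, events):
    """beta_hat - beta_hat_(j) for each observation j, by refitting without j."""
    full = log_rate(times, events)
    out = []
    for j in range(len(times)):
        rest_t = times[:j] + times[j + 1:]
        rest_e = events[:j] + events[j + 1:]
        out.append(full - log_rate(rest_t, rest_e))
    return out


print([round(v, 4) for v in exact_dfbeta([1, 2, 3], [1, 1, 0])])
```

Removing the censored subject with the longest follow-up changes the estimate the most, because that subject contributes the most exposure time without a failure.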
Parametric survival models for censored data are fit by maximum likelihood with proc lifereg; proc lifetest is used for nonparametric analyses such as Kaplan-Meier estimation. Stratifying by a categorical covariate works naturally. Martingale residuals provide a method for determining the functional form of covariates in multiplicative hazard models.
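For the martingale-residual approach to functional form, the first step is the residual from a null model. With no covariates, the residual for subject $$i$$ is $$\delta_i - \hat H(t_i)$$, the event indicator minus the estimated cumulative hazard at that subject's follow-up time. This Python sketch uses the Nelson-Aalen estimate for $$\hat H$$ (an assumption for illustration); the residuals would then be plotted and smoothed against each covariate.

```python
from collections import Counter


def null_martingale_residuals(times, events):
    """M_i = delta_i - H_hat(t_i) for a model with no covariates,
    where H_hat is the Nelson-Aalen cumulative hazard."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    increments = {u: d / sum(1 for s in times if s >= u) for u, d in deaths.items()}

    def cum_hazard(t):
        return sum(inc for u, inc in increments.items() if u <= t)

    return [e - cum_hazard(t) for t, e in zip(times, events)]


resid = null_martingale_residuals([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print([round(r, 2) for r in resid])
print(abs(sum(resid)) < 1e-12)  # the residuals sum to zero
```

Each residual is at most 1, censored subjects always get negative residuals, and the residuals sum to zero; a smooth of them against a covariate hints at that covariate's functional form.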