/*** TIP 00388 ***/

Create 3 separate totals for previous time periods for each time period 


I have been requested to develop a program that will calculate a
return rate for our products. The Rates are bucketed into 3 groups.

1-Initial (0-6 Months)
2-Yearly (7-18 Months)
3-Long Term (>18 Mnths)

There is no overlap of month with regards to the counting rules.
I had thought about using a macro and proc summary, but I am guessing
that this would be better addressed with a do loop etc.. but I have
little experience with these..

Can any recommend an approach? This file will grow monthly and will
need to be generated often. (Note; the month field is created;
month=put(date,yymmdd8.).)

Because I will be running the report in the current month for the
previous month, the code would also have to be smart enough to look at
the current month -1.

Any Ideas? A search has revealed problems not unlike this one however;
it seems to be somewhat unique. If you graciously provide sample code,
i'd ask that you please comment each step so that I may re-use this in
the future.

Based upon the data below I would want the output to show;

MONTH   IRR_QTY YRR_QTY LTR_QTY
2003-10 2812    12434   5927
2003-09 2939    12476   5358
2003-08 3043    12625   4790
2003-07 2668    12861   4242
etc..

data bogus;
input MONTH $ QTY_SHIPPED;
cards;
2003-10 400
2003-09 315
2003-08 687
2003-07 425
2003-06 398
2003-05 587
2003-04 527
2003-03 419
2003-02 312
2003-01 389
2002-12 897
2002-11 568
2002-10 521
2002-09 678
2002-08 745
2002-07 954
2002-06 875
2002-05 982
2002-04 951
2002-03 972
2002-02 875
2002-01 632
2001-12 589
2001-11 548
2001-10 569
2001-09 568
2001-08 548
2001-07 785
2001-06 658
2001-05 719
2001-04 555
2001-03 524
2001-02 512
2001-01 489
2001-03 524
2001-02 512
2001-01 489
;


Solution # 1 by Jack Hamilton The code below is one way to create the first three columsn of your report. I didn't create LTR_QTY because your description didn't match your example. I recommend that you don't store your dates as character strings; instead, store them as SAS date values and use formats. ===== data bogus; input @1 year 4. @6 month 2. @9 QTY_SHIPPED; month_shipped = mdy(month, 1, year); drop month year; format month_shipped yymmd7.; cards; 2003-10 400 2003-09 315 2003-08 687 2003-07 425 2003-06 398 2003-05 587 2003-04 527 2003-03 419 2003-02 312 2003-01 389 2002-12 897 2002-11 568 2002-10 521 2002-09 678 2002-08 745 2002-07 954 2002-06 875 2002-05 982 2002-04 951 2002-03 972 2002-02 875 2002-01 632 2001-12 589 2001-11 548 2001-10 569 2001-09 568 2001-08 548 2001-07 785 2001-06 658 2001-05 719 2001-04 555 2001-03 524 2001-02 512 2001-01 489 2001-03 524 2001-02 512 2001-01 489 ; run; proc sort data=bogus; by descending month_shipped; run; data reshape (keep=report_month qty_shipped bucket month_shipped); set bogus end=end; format report_month yymmd7.; bucket = 'IRR_QTY'; report_month = month_shipped; do while(report_month LT intnx('month', month_shipped, 6)); output; report_month = intnx('month', report_month, 1); end; bucket = 'YRR_QTY'; report_month = intnx('month', month_shipped, 6); do while(report_month LT intnx('month', month_shipped, 24)); output; report_month = intnx('month', report_month, 1); end; retain max_shipping_month; max_shipping_month = max(month_shipped, max_shipping_month); if end then call symput('MAX_SHIPPING_MONTH', put(max_shipping_month, 5.0)); run; proc report data=reshape nowindows; where report_month LE &MAX_SHIPPING_MONTH; column report_month bucket, qty_shipped; define report_month / group width=12 descending; define bucket / across order=data; define qty_shipped / analysis sum; run; Jack Hamilton Manager, Technical Development Metrics Department, First Health West Sacramento, California USA
Solution #2 by Paul Dorfman As Jack has noted, there is something out of kilter with the LTR_QTY variable in your sample output. As far as the uniqueness of the requested algorithm is concerned, I would take issue with that: On the contrary, it is quite common in CS. Although there is no way to produce the result in a single pass over the input data values (counting both disk and memory passes), it requires only two passes through the input. In the first pass, the rolling sum is accumulated; the second pass slides fixed-size queues over the rolling sums array, dropping the trailing sum in each iteration. Below, the array is made large enough to accommodate all possible months (increase the upper bound to 999999 if not sufficent :-) and the array is initialized in such a way that the algorithm always holds without the need to go through quirky conditionals. REAR and FRONT refer to (surprise, surprise!) the rear and the front of the queue: %let run_month = -1 ; *get PREVIOUS month ; %let hb = 99999 ; data roll ( keep = month irr_qty yrr_qty ltr_qty ) ; array mnth [ 0 : &hb ] $7 _temporary_ ; array roll [ 0 : &hb ] _temporary_ (100000 * 0) ; do seq = 1 to front ; set bogus nobs = front ; if month = put (intnx ('mon', date(), &run_month), yymmd7.) then rear = seq ; if rear then qty_sum ++ qty_shipped ; mnth [seq] = month ; roll [seq] = qty_sum ; end ; do seq = front + 1 to &hb ; roll [seq] = qty_sum ; end ; do seq = rear to front ; month = mnth [seq] ; sub = seq - 1 ; irr_qty = roll [sub + 6] - roll [sub + 0] ; yrr_qty = roll [sub + 24] - roll [sub + 6] ; ltr_qty = roll [sub + front] - roll [sub + 24] ; output ; end ; run ; Kind regards, ================= Paul M. Dorfman Jacksonville, FL
Solution #3 by Howard Schreier Indeed it can be done in one SQL statement: create table rolled as select b.month_shipped, sum(case when 0 <= intck('month',x.month_shipped,b.month_shipped) < 6 then x.qty_shipped else 0 end ) as irr_qty, sum(case when 6 <= intck('month',x.month_shipped,b.month_shipped) < 18 then x.qty_shipped else 0 end ) as yrr_qty, sum(case when 18 <= intck('month',x.month_shipped,b.month_shipped) then x.qty_shipped else 0 end ) as ltr_qty from bogus as b, bogus as x group by b.month_shipped order by b.month_shipped descending ;
Solution #4 by Charles Patridge data bogus; input MONTH $ QTY_SHIPPED; date = mdy( input( substr(month,6,2), 2.), 01, input( substr(month,1,4),4.) ); cards; 2003-10 400 2003-09 315 2003-08 687 2003-07 425 2003-06 398 2003-05 587 2003-04 527 2003-03 419 2003-02 312 2003-01 389 2002-12 897 2002-11 568 2002-10 521 2002-09 678 2002-08 745 2002-07 954 2002-06 875 2002-05 982 2002-04 951 2002-03 972 2002-02 875 2002-01 632 2001-12 589 2001-11 548 2001-10 569 2001-09 568 2001-08 548 2001-07 785 2001-06 658 2001-05 719 2001-04 555 2001-03 524 2001-02 512 2001-01 489 2000-03 524 /*** changed dates - assumed typed incorrectly ***/ 2000-02 512 /*** changed dates - assumed typed incorrectly ***/ 2000-01 489 /*** changed dates - assumed typed incorrectly ***/ ;;;; run; /*** 1 assumption made is there are no gaps in your data - that is missing months - not necessarily a good assumption - can easily remedy this with some additional coding ***/ /*** need to find range of dates for macro variables ***/ proc summary data=bogus nway missing; var date; output out=daterange(drop=_type_ _freq_) min=mindate max=maxdate; run; data _null_; set daterange; call symput('mindate', put(mindate,monyy7.)); /*** create minimum date macro ***/ call symput('maxdate', put(maxdate,monyy7.)); /*** create maximum date macro ***/ run; proc summary data=bogus nway missing; /*** collapse data to 1 record per date ***/ classes date; var qty_shipped; output out=summary (drop=_type_ _freq_) SUM=; RUN; %emptyyn(work.summary); /*** a macro to determine number of OBS in a Dataset ***/ %let numobs = %cmpres(&numobs); /*** how many time periods are there - compress blanks ***/ /*** flip the data to do array processing for just 1 record ***/ proc transpose data=summary out=tranpose; id date; format date monyy7.; var qty_shipped; run; /*** now calculate 3 separate summaries ***/ data tranpose1 (drop=i k); set tranpose; array perds (&numobs) &mindate -- &maxdate ; array tmp (&numobs) _temporary_ ; output; /*** put original data out for documentation ***/ /*** save raw data in temporary array ***/ do i = 1 to dim(perds); tmp(i) = perds(i); perds(i) = .; end; do i = &numobs to 6 by -1; /*** 0 to 6 most recent entries ***/ perds(i) = tmp(i)+tmp(i-1)+tmp(i-2)+tmp(i-3)+tmp(i-4)+tmp(i-5); end; _name_ = "IRR_QTY"; output; /*** re-initialize to missing ***/ do i = 1 to dim(perds); perds(i) = .; end; do i = &numobs to &numobs-18 by -1; /*** 7 to 18 next most recent entries ***/ perds(i) = tmp(i-6 )+tmp(i-7 )+tmp(i-8 )+tmp(i-9 )+tmp(i-10)+tmp(i-11) + tmp(i-12)+tmp(i-13)+tmp(i-14)+tmp(i-15)+tmp(i-16)+tmp(i-17); end; _name_ = "YRR_QTY"; output; /*** re-initialize to missing ***/ do i = 1 to dim(perds); perds(i) = .; end; do k = &numobs to &numobs-18 by -1; /*** 19 plus next most recent entries to end ***/ do i = k-18 to 1 by -1; perds(k) + tmp(i ); end; end; _name_ = "LTR_QTY"; output; run; /*** flip processed data back to previous structure with 3 summaries and original data ***/ proc transpose data=tranpose1 out=tranpose2; id _name_; var &mindate -- &maxdate; run; /*** create a sas date value just to have it in case needed such as sorting output ***/ data tranpose2; set tranpose2; date = input( _name_, monyy7. ); format date monyy.; run; proc printto file="c:\download\bogus.txt" new; run; proc print data=tranpose2; run; proc printto; run; The SAS System 08:06 Friday, October 17, 2003 2 QTY_ Obs _NAME_ SHIPPED IRR_QTY YRR_QTY LTR_QTY date 1 JAN2000 489 . . . JAN00 2 FEB2000 512 . . . FEB00 3 MAR2000 524 . . . MAR00 4 JAN2001 489 . . . JAN01 5 FEB2001 512 . . . FEB01 6 MAR2001 524 3050 . . MAR01 7 APR2001 555 3116 . . APR01 8 MAY2001 719 3323 . . MAY01 9 JUN2001 658 3457 . . JUN01 10 JUL2001 785 3753 . . JUL01 11 AUG2001 548 3789 . . AUG01 12 SEP2001 568 3833 . . SEP01 13 OCT2001 569 3847 . . OCT01 14 NOV2001 548 3676 . . NOV01 15 DEC2001 589 3607 . . DEC01 16 JAN2002 632 3454 . . JAN02 17 FEB2002 875 3781 . . FEB02 18 MAR2002 972 4185 . . MAR02 19 APR2002 951 4567 6963 489 APR02 20 MAY2002 982 5001 6999 1001 MAY02 21 JUN2002 875 5287 7064 1525 JUN02 22 JUL2002 954 5609 7207 2014 JUL02 23 AUG2002 745 5479 7570 2526 AUG02 24 SEP2002 678 5185 8018 3050 SEP02 25 OCT2002 521 4755 8414 3605 OCT02 26 NOV2002 568 4341 8677 4324 NOV02 27 DEC2002 897 4363 8894 4982 DEC02 28 JAN2003 389 3798 9063 5767 JAN03 29 FEB2003 312 3365 9260 6315 FEB03 30 MAR2003 419 3106 9370 6883 MAR03 31 APR2003 527 3112 9322 7452 APR03 32 MAY2003 587 3131 9342 8000 MAY03 33 JUN2003 398 2632 9650 8589 JUN03 34 JUL2003 425 2668 9407 9221 JUL03 35 AUG2003 687 3043 8844 10096 AUG03 36 SEP2003 315 2939 8291 11068 SEP03 37 OCT2003 400 2812 7867 12019 OCT03 /*** end of tip 00388 ***/