|
|	ssin.sa 3.3 7/29/91
|
|	The entry point sSIN computes the sine of an input argument,
|	sCOS computes the cosine, and sSINCOS computes both. The
|	corresponding entry points with a "d" suffix compute the same
|	function values for denormalized inputs.
|
|	Input: Double-extended number X in the location pointed to
|		by address register a0.
|
|	Output: The function value sin(X) or cos(X) returned in Fp0 if SIN or
|		COS is requested. Otherwise, for SINCOS, sin(X) is returned
|		in Fp0, and cos(X) is returned in Fp1.
|
|	Modifies: Fp0 for SIN or COS; both Fp0 and Fp1 for SINCOS.
|
|	Accuracy and Monotonicity: The returned result is within 1 ulp in
|		64 significant bits, i.e. within 0.5001 ulp to 53 bits if the
|		result is subsequently rounded to double precision. The
|		result is provably monotonic in double precision.
|
|	Speed: The programs sSIN and sCOS take approximately 150 cycles for
|		input argument X such that |X| < 15Pi, which is the usual
|		situation. sSINCOS takes approximately 190 cycles.
|
|	Algorithm:
|
|	SIN and COS:
|	1. If SIN is invoked, set AdjN := 0; otherwise, set AdjN := 1.
|
|	2. If |X| >= 15Pi or |X| < 2**(-40), go to 7.
|
|	3. Decompose X as X = N(Pi/2) + r where |r| <= Pi/4. Let
|	k = N mod 4, so in particular, k = 0, 1, 2, or 3. Overwrite
|	k by k := k + AdjN.
|
|	4. If k is even, go to 6.
|
|	5. (k is odd) Set j := (k-1)/2, sgn := (-1)**j. Return sgn*cos(r)
|	where cos(r) is approximated by an even polynomial in r,
|	1 + r*r*(B1+s*(B2+ ... + s*B8)), s = r*r.
|	Exit.
|
|	6. (k is even) Set j := k/2, sgn := (-1)**j. Return sgn*sin(r)
|	where sin(r) is approximated by an odd polynomial in r,
|	r + r*s*(A1+s*(A2+ ... + s*A7)), s = r*r.
|	Exit.
|
|	7. If |X| > 1, go to 9.
|
|	8. (|X|<2**(-40)) If SIN is invoked, return X; otherwise return 1.
|
|	9. Overwrite X by X := X rem 2Pi, so that |X| <= Pi, and go back to 3.
|
|	SINCOS:
|	1. If |X| >= 15Pi or |X| < 2**(-40), go to 6.
|
|	2. Decompose X as X = N(Pi/2) + r where |r| <= Pi/4. Let
|	k = N mod 4, so in particular, k = 0, 1, 2, or 3.
|
|	3. If k is even, go to 5.
|
|	4. (k is odd) Set j1 := (k-1)/2, j2 := j1 (EOR) (k mod 2), i.e.
|	j1 exclusive or with the l.s.b. of k.
|	sgn1 := (-1)**j1, sgn2 := (-1)**j2.
|	SIN(X) = sgn1*cos(r) and COS(X) = sgn2*sin(r) where
|	sin(r) and cos(r) are computed as odd and even polynomials
|	in r, respectively. Exit.
|
|	5. (k is even) Set j1 := k/2, sgn1 := (-1)**j1.
|	SIN(X) = sgn1*sin(r) and COS(X) = sgn1*cos(r) where
|	sin(r) and cos(r) are computed as odd and even polynomials
|	in r, respectively. Exit.
|
|	6. If |X| > 1, go to 8.
|
|	7. (|X|<2**(-40)) SIN(X) = X and COS(X) = 1. Exit.
|
|	8. Overwrite X by X := X rem 2Pi, so that |X| <= Pi, and go back to 2.
|

|		Copyright (C) Motorola, Inc. 1990
|			All Rights Reserved
|
|	For details on the license for this file, please see the
|	file, README, in this same directory.
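
|	Sketch (C, for illustration only; not assembled and not part of
|	this package): the quadrant and sign selection of steps 3 to 6 of
|	SIN and COS above. It assumes a naive one-step reduction
|	r = x - N*(pi/2) in double precision in place of the two-piece
|	table lookup used below, uses the libm sin() and cos() in place of
|	the polynomials, and relies on the POSIX constants M_2_PI and
|	M_PI_2. The name sin_or_cos is hypothetical.
|
|	#include <math.h>
|
|	static double sin_or_cos(double x, int adjn)	/* adjn: 0 = SIN, 1 = COS */
|	{
|		double n = nearbyint(x * M_2_PI);	/* N = round(X*2/Pi) */
|		double r = x - n * M_PI_2;		/* |r| <= Pi/4, roughly */
|		unsigned long k = (unsigned long)((long)n + adjn) & 3u;
|		double sgn = (k & 2u) ? -1.0 : 1.0;	/* (-1)**j, j = bit 1 of k */
|
|		return (k & 1u) ? sgn * cos(r) : sgn * sin(r);
|	}
|
|	With adjn = 1 the routine returns cos(x), because adding 1 to N is
|	the same as evaluating sin(x + Pi/2); this mirrors how scos sets
|	ADJN to 1 before sharing the SIN code path.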

|SSIN	idnt	2,1 | Motorola 040 Floating Point Software Package

	|section	8

#include "fpsp.h"

BOUNDS1:	.long 0x3FD78000,0x4004BC7E
TWOBYPI:	.long 0x3FE45F30,0x6DC9C883

SINA7:	.long 0xBD6AAA77,0xCCC994F5
SINA6:	.long 0x3DE61209,0x7AAE8DA1

SINA5:	.long 0xBE5AE645,0x2A118AE4
SINA4:	.long 0x3EC71DE3,0xA5341531

SINA3:	.long 0xBF2A01A0,0x1A018B59,0x00000000,0x00000000

SINA2:	.long 0x3FF80000,0x88888888,0x888859AF,0x00000000

SINA1:	.long 0xBFFC0000,0xAAAAAAAA,0xAAAAAA99,0x00000000

COSB8:	.long 0x3D2AC4D0,0xD6011EE3
COSB7:	.long 0xBDA9396F,0x9F45AC19

COSB6:	.long 0x3E21EED9,0x0612C972
COSB5:	.long 0xBE927E4F,0xB79D9FCF

COSB4:	.long 0x3EFA01A0,0x1A01D423,0x00000000,0x00000000

COSB3:	.long 0xBFF50000,0xB60B60B6,0x0B61D438,0x00000000

COSB2:	.long 0x3FFA0000,0xAAAAAAAA,0xAAAAAB5E
COSB1:	.long 0xBF000000

INVTWOPI: .long 0x3FFC0000,0xA2F9836E,0x4E44152A

TWOPI1:	.long 0x40010000,0xC90FDAA2,0x00000000,0x00000000
TWOPI2:	.long 0x3FDF0000,0x85A308D4,0x00000000,0x00000000
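
|	Sketch (C, for illustration only; not assembled): SINPOLY below
|	evaluates r + r*s*(A1 + s*(A2 + ... + s*A7)) as two interleaved
|	Horner chains in t = s*s, one over A1, A3, A5, A7 and one over
|	A2, A4, A6, so the two chains can presumably overlap in the FPU
|	pipeline (see the HIDE comments there). The name sin_poly_split is
|	hypothetical, and the caller supplies a[0..6] = A1..A7; which values
|	are passed is an assumption, since the package itself uses the
|	constants SINA1..SINA7 above.
|
|	static double sin_poly_split(double r, const double a[7])
|	{
|		double s = r * r;
|		double t = s * s;
|		double odd = a[0] + t * (a[2] + t * (a[4] + t * a[6]));
|		double even = s * (a[1] + t * (a[3] + t * a[5]));
|
|		return r + r * s * (odd + even);	/* r + r*s*(A1 + s*A2 + ...) */
|	}
|
|	COSPOLY uses the same split with B1..B8, except that the result is
|	SGN + S'*(...) instead of R' + R'*S*(...).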

	|xref	PITBL

	.set	INARG,FP_SCR4

	.set	X,FP_SCR5
	.set	XDCARE,X+2
	.set	XFRAC,X+4

	.set	RPRIME,FP_SCR1
	.set	SPRIME,FP_SCR2

	.set	POSNEG1,L_SCR1
	.set	TWOTO63,L_SCR1

	.set	ENDFLAG,L_SCR2
	.set	N,L_SCR2

	.set	ADJN,L_SCR3

	|xref	t_frcinx
	|xref	t_extdnrm
	|xref	sto_cos

	.global	ssind
ssind:
|--SIN(X) = X FOR DENORMALIZED X
	bra	t_extdnrm

	.global	scosd
scosd:
|--COS(X) = 1 FOR DENORMALIZED X

	fmoves	#0x3F800000,%fp0
|
|	9D25B Fix: Sometimes the previous fmove.s sets fpsr bits
|
	fmovel	#0,%fpsr
|
	bra	t_frcinx

	.global	ssin
ssin:
|--SET ADJN TO 0
	movel	#0,ADJN(%a6)
	bras	SINBGN

	.global	scos
scos:
|--SET ADJN TO 1
	movel	#1,ADJN(%a6)

SINBGN:
|--SAVE FPCR, FP1. CHECK IF |X| IS TOO SMALL OR LARGE

	fmovex	(%a0),%fp0		| ...LOAD INPUT

	movel	(%a0),%d0
	movew	4(%a0),%d0
	fmovex	%fp0,X(%a6)
	andil	#0x7FFFFFFF,%d0		| ...COMPACTIFY X

	cmpil	#0x3FD78000,%d0		| ...|X| >= 2**(-40)?
	bges	SOK1
	bra	SINSM

SOK1:
	cmpil	#0x4004BC7E,%d0		| ...|X| < 15 PI?
	blts	SINMAIN
	bra	REDUCEX

SINMAIN:
|--THIS IS THE USUAL CASE, |X| < 15 PI.
|--THE ARGUMENT REDUCTION IS DONE BY TABLE LOOK UP.
	fmovex	%fp0,%fp1
	fmuld	TWOBYPI,%fp1		| ...X*2/PI

|--HIDE THE NEXT THREE INSTRUCTIONS
	lea	PITBL+0x200,%a1		| ...TABLE OF N*PI/2, N = -32,...,32


|--FP1 IS NOW READY
	fmovel	%fp1,N(%a6)		| ...CONVERT TO INTEGER

	movel	N(%a6),%d0
	asll	#4,%d0
	addal	%d0,%a1			| ...A1 IS THE ADDRESS OF N*PIBY2
|					...WHICH IS IN TWO PIECES Y1 & Y2

	fsubx	(%a1)+,%fp0		| ...X-Y1
|--HIDE THE NEXT ONE
	fsubs	(%a1),%fp0		| ...FP0 IS R = (X-Y1)-Y2

SINCONT:
|--continuation from REDUCEX

|--GET N+ADJN AND SEE IF SIN(R) OR COS(R) IS NEEDED
	movel	N(%a6),%d0
	addl	ADJN(%a6),%d0		| ...SEE IF D0 IS ODD OR EVEN
	rorl	#1,%d0			| ...D0 WAS ODD IFF D0 IS NEGATIVE
	cmpil	#0,%d0
	blt	COSPOLY

SINPOLY:
|--LET J BE THE LEAST SIG. BIT OF D0, LET SGN := (-1)**J.
|--THEN WE RETURN SGN*SIN(R). SGN*SIN(R) IS COMPUTED BY
|--R' + R'*S*(A1 + S(A2 + S(A3 + S(A4 + ... + SA7)))), WHERE
|--R' = SGN*R, S=R*R. THIS CAN BE REWRITTEN AS
|--R' + R'*S*( [A1+T(A3+T(A5+TA7))] + [S(A2+T(A4+TA6))])
|--WHERE T=S*S.
|--NOTE THAT A3 THROUGH A7 ARE STORED IN DOUBLE PRECISION
|--WHILE A1 AND A2 ARE IN DOUBLE-EXTENDED FORMAT.
	fmovex	%fp0,X(%a6)		| ...X IS R
	fmulx	%fp0,%fp0		| ...FP0 IS S
|---HIDE THE NEXT TWO WHILE WAITING FOR FP0
	fmoved	SINA7,%fp3
	fmoved	SINA6,%fp2
|--FP0 IS NOW READY
	fmovex	%fp0,%fp1
	fmulx	%fp1,%fp1		| ...FP1 IS T
|--HIDE THE NEXT TWO WHILE WAITING FOR FP1

	rorl	#1,%d0
	andil	#0x80000000,%d0
|					...LEAST SIG. BIT OF D0 IN SIGN POSITION
	eorl	%d0,X(%a6)		| ...X IS NOW R'= SGN*R

	fmulx	%fp1,%fp3		| ...TA7
	fmulx	%fp1,%fp2		| ...TA6

	faddd	SINA5,%fp3		| ...A5+TA7
	faddd	SINA4,%fp2		| ...A4+TA6

	fmulx	%fp1,%fp3		| ...T(A5+TA7)
	fmulx	%fp1,%fp2		| ...T(A4+TA6)

	faddd	SINA3,%fp3		| ...A3+T(A5+TA7)
	faddx	SINA2,%fp2		| ...A2+T(A4+TA6)

	fmulx	%fp3,%fp1		| ...T(A3+T(A5+TA7))

	fmulx	%fp0,%fp2		| ...S(A2+T(A4+TA6))
	faddx	SINA1,%fp1		| ...A1+T(A3+T(A5+TA7))
	fmulx	X(%a6),%fp0		| ...R'*S

	faddx	%fp2,%fp1		| ...[A1+T(A3+T(A5+TA7))]+[S(A2+T(A4+TA6))]
|--FP3 RELEASED, RESTORE NOW AND TAKE SOME ADVANTAGE OF HIDING
|--FP2 RELEASED, RESTORE NOW AND TAKE FULL ADVANTAGE OF HIDING


	fmulx	%fp1,%fp0		| ...SIN(R')-R'
|--FP1 RELEASED.

	fmovel	%d1,%FPCR		|restore users exceptions
	faddx	X(%a6),%fp0		|last inst - possible exception set
	bra	t_frcinx


COSPOLY:
|--LET J BE THE LEAST SIG. BIT OF D0, LET SGN := (-1)**J.
|--THEN WE RETURN SGN*COS(R). SGN*COS(R) IS COMPUTED BY
|--SGN + S'*(B1 + S(B2 + S(B3 + S(B4 + ... + SB8)))), WHERE
|--S=R*R AND S'=SGN*S. THIS CAN BE REWRITTEN AS
|--SGN + S'*([B1+T(B3+T(B5+TB7))] + [S(B2+T(B4+T(B6+TB8)))])
|--WHERE T=S*S.
|--NOTE THAT B4 THROUGH B8 ARE STORED IN DOUBLE PRECISION
|--WHILE B2 AND B3 ARE IN DOUBLE-EXTENDED FORMAT, B1 IS -1/2
|--AND IS THEREFORE STORED AS SINGLE PRECISION.

	fmulx	%fp0,%fp0		| ...FP0 IS S
|---HIDE THE NEXT TWO WHILE WAITING FOR FP0
	fmoved	COSB8,%fp2
	fmoved	COSB7,%fp3
|--FP0 IS NOW READY
	fmovex	%fp0,%fp1
	fmulx	%fp1,%fp1		| ...FP1 IS T
|--HIDE THE NEXT TWO WHILE WAITING FOR FP1
	fmovex	%fp0,X(%a6)		| ...X IS S
	rorl	#1,%d0
	andil	#0x80000000,%d0
|					...LEAST SIG. BIT OF D0 IN SIGN POSITION

	fmulx	%fp1,%fp2		| ...TB8
|--HIDE THE NEXT TWO WHILE WAITING FOR THE XU
	eorl	%d0,X(%a6)		| ...X IS NOW S'= SGN*S
	andil	#0x80000000,%d0

	fmulx	%fp1,%fp3		| ...TB7
|--HIDE THE NEXT TWO WHILE WAITING FOR THE XU
	oril	#0x3F800000,%d0		| ...D0 IS SGN IN SINGLE
	movel	%d0,POSNEG1(%a6)

	faddd	COSB6,%fp2		| ...B6+TB8
	faddd	COSB5,%fp3		| ...B5+TB7

	fmulx	%fp1,%fp2		| ...T(B6+TB8)
	fmulx	%fp1,%fp3		| ...T(B5+TB7)

	faddd	COSB4,%fp2		| ...B4+T(B6+TB8)
	faddx	COSB3,%fp3		| ...B3+T(B5+TB7)

	fmulx	%fp1,%fp2		| ...T(B4+T(B6+TB8))
	fmulx	%fp3,%fp1		| ...T(B3+T(B5+TB7))

	faddx	COSB2,%fp2		| ...B2+T(B4+T(B6+TB8))
	fadds	COSB1,%fp1		| ...B1+T(B3+T(B5+TB7))

	fmulx	%fp2,%fp0		| ...S(B2+T(B4+T(B6+TB8)))
|--FP3 RELEASED, RESTORE NOW AND TAKE SOME ADVANTAGE OF HIDING
|--FP2 RELEASED.


	faddx	%fp1,%fp0
|--FP1 RELEASED

	fmulx	X(%a6),%fp0

	fmovel	%d1,%FPCR		|restore users exceptions
	fadds	POSNEG1(%a6),%fp0	|last inst - possible exception set
	bra	t_frcinx


SINBORS:
|--IF |X| > 15PI, WE USE THE GENERAL ARGUMENT REDUCTION.
|--IF |X| < 2**(-40), RETURN X OR 1.
	cmpil	#0x3FFF8000,%d0
	bgts	REDUCEX


SINSM:
	movel	ADJN(%a6),%d0
	cmpil	#0,%d0
	bgts	COSTINY

SINTINY:
	movew	#0x0000,XDCARE(%a6)	| ...JUST IN CASE
	fmovel	%d1,%FPCR		|restore users exceptions
	fmovex	X(%a6),%fp0		|last inst - possible exception set
	bra	t_frcinx


COSTINY:
	fmoves	#0x3F800000,%fp0

	fmovel	%d1,%FPCR		|restore users exceptions
	fsubs	#0x00800000,%fp0	|last inst - possible exception set
	bra	t_frcinx


REDUCEX:
|--WHEN REDUCEX IS USED, THE CODE WILL INEVITABLY BE SLOW.
|--THIS REDUCTION METHOD, HOWEVER, IS MUCH FASTER THAN USING
|--THE REMAINDER INSTRUCTION, WHICH IS NOW EMULATED IN SOFTWARE.

	fmovemx	%fp2-%fp5,-(%a7)	| ...save FP2 through FP5
	movel	%d2,-(%a7)
	fmoves	#0x00000000,%fp1
|--If the compact form of abs(arg) in d0 is $7ffeffff, the argument is so
|--large that there is a danger of unwanted overflow in the first LOOP
|--iteration. In this case, reduce the argument by one remainder step to
|--make the subsequent reduction safe.
	cmpil	#0x7ffeffff,%d0		|is argument dangerously large?
	bnes	LOOP
	movel	#0x7ffe0000,FP_SCR2(%a6)	|yes
|					;create 2**16383*PI/2
	movel	#0xc90fdaa2,FP_SCR2+4(%a6)
	clrl	FP_SCR2+8(%a6)
	ftstx	%fp0			|test sign of argument
	movel	#0x7fdc0000,FP_SCR3(%a6)	|create low half of 2**16383*
|					;PI/2 at FP_SCR3
	movel	#0x85a308d3,FP_SCR3+4(%a6)
	clrl	FP_SCR3+8(%a6)
	fblt	red_neg
	orw	#0x8000,FP_SCR2(%a6)	|positive arg
	orw	#0x8000,FP_SCR3(%a6)
red_neg:
	faddx	FP_SCR2(%a6),%fp0	|high part of reduction is exact
	fmovex	%fp0,%fp1		|save high result in fp1
	faddx	FP_SCR3(%a6),%fp0	|low part of reduction
	fsubx	%fp0,%fp1		|determine low component of result
	faddx	FP_SCR3(%a6),%fp1	|fp0/fp1 are reduced argument.
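
|	Sketch (C, for illustration only; not assembled): one pass of
|	LOOP/WORK below, written in double instead of extended precision.
|	(R, rr) is the current remainder as a head/tail pair, n is the
|	integer quotient already held as a floating-point value, and p1 + p2
|	stands for 2**L * (Pi/2), assumed split so that n*p1 is exact. The
|	name reduce_step and its parameters are hypothetical.
|
|	static void reduce_step(double *R, double *rr, double n,
|				double p1, double p2)
|	{
|		double W = n * p1, w = n * p2;	/* W = N*P1, w = N*P2 */
|		double P = W + w;		/* head of W + w */
|		double p = (W - P) + w;		/* tail: P + p = W + w if |W| >= |w| */
|		double A = *R - P;		/* A := R - P */
|		double a = *rr - p;		/* a := r - p */
|
|		*R = A + a;			/* new R := A + a */
|		*rr = (A - *R) + a;		/* new r := (A - R) + a */
|	}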

|--ON ENTRY, FP0 IS X; ON RETURN, FP0 IS X REM PI/2, |X| <= PI/4.
|--The integer quotient will be stored in N.
|--The intermediate remainder is 66 bits long; (R,r) is held in (FP0,FP1).

LOOP:
	fmovex	%fp0,INARG(%a6)		| ...+-2**K * F, 1 <= F < 2
	movew	INARG(%a6),%d0
	movel	%d0,%a1			| ...save a copy of D0
	andil	#0x00007FFF,%d0
	subil	#0x00003FFF,%d0		| ...D0 IS K
	cmpil	#28,%d0
	bles	LASTLOOP
CONTLOOP:
	subil	#27,%d0			| ...D0 IS L := K-27
	movel	#0,ENDFLAG(%a6)
	bras	WORK
LASTLOOP:
	clrl	%d0			| ...D0 IS L := 0
	movel	#1,ENDFLAG(%a6)

WORK:
|--FIND THE REMAINDER OF (R,r) W.R.T. 2**L * (PI/2). L IS SO CHOSEN
|--THAT INT( X * (2/PI) / 2**(L) ) < 2**29.

|--CREATE 2**(-L) * (2/PI), SIGN(INARG)*2**(63),
|--2**L * (PIby2_1), 2**L * (PIby2_2)

	movel	#0x00003FFE,%d2		| ...BIASED EXPO OF 2/PI
	subl	%d0,%d2			| ...BIASED EXPO OF 2**(-L)*(2/PI)

	movel	#0xA2F9836E,FP_SCR1+4(%a6)
	movel	#0x4E44152A,FP_SCR1+8(%a6)
	movew	%d2,FP_SCR1(%a6)	| ...FP_SCR1 is 2**(-L)*(2/PI)

	fmovex	%fp0,%fp2
	fmulx	FP_SCR1(%a6),%fp2
|--WE MUST NOW FIND INT(FP2). SINCE WE NEED THIS VALUE IN
|--FLOATING POINT FORMAT, A ROUND TRIP THROUGH AN INTEGER
|--(FMOVE.L FP --> N, THEN FMOVE.L N --> FP) WOULD BE TOO
|--INEFFICIENT. THE WAY AROUND IT IS THAT
|--(SIGN(INARG)*2**63 + FP2) - SIGN(INARG)*2**63 WILL GIVE
|--US THE DESIRED VALUE IN FLOATING POINT.
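
|	Sketch (C, for illustration only; not assembled): the same rounding
|	trick in IEEE double, whose significand has 53 bits, so the magic
|	constant is 2**52 instead of 2**63. It assumes |y| < 2**52, the
|	default round-to-nearest mode, evaluation in plain double
|	(FLT_EVAL_METHOD == 0, so not x87 excess precision), and a compiler
|	that does not fold (y + c) - c into y (for example, no -ffast-math).
|	The name round_to_int is hypothetical.
|
|	#include <math.h>
|
|	static double round_to_int(double y)
|	{
|		double c = copysign(0x1p52, y);	/* SIGN(y) * 2**52 */
|
|		return (y + c) - c;		/* y rounded to an integer */
|	}
|
|	Under those assumptions, round_to_int(-2.7) yields -3.0 and
|	round_to_int(2.5) yields 2.0, since ties round to even.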

|--HIDE SIX CYCLES OF INSTRUCTION
	movel	%a1,%d2
	swap	%d2
	andil	#0x80000000,%d2
	oril	#0x5F000000,%d2		| ...D2 IS SIGN(INARG)*2**63 IN SGL
	movel	%d2,TWOTO63(%a6)

	movel	%d0,%d2
	addil	#0x00003FFF,%d2		| ...BIASED EXPO OF 2**L * (PI/2)

|--FP2 IS READY
	fadds	TWOTO63(%a6),%fp2	| ...THE FRACTIONAL PART OF FP2 IS ROUNDED

|--HIDE 4 CYCLES OF INSTRUCTION; creating 2**(L)*Piby2_1 and 2**(L)*Piby2_2
	movew	%d2,FP_SCR2(%a6)
	clrw	FP_SCR2+2(%a6)
	movel	#0xC90FDAA2,FP_SCR2+4(%a6)
	clrl	FP_SCR2+8(%a6)		| ...FP_SCR2 is 2**(L) * Piby2_1

|--FP2 IS READY
	fsubs	TWOTO63(%a6),%fp2	| ...FP2 is N

	addil	#0x00003FDD,%d0
	movew	%d0,FP_SCR3(%a6)
	clrw	FP_SCR3+2(%a6)
	movel	#0x85A308D3,FP_SCR3+4(%a6)
	clrl	FP_SCR3+8(%a6)		| ...FP_SCR3 is 2**(L) * Piby2_2

	movel	ENDFLAG(%a6),%d0

|--We are now ready to perform (R+r) - N*P1 - N*P2, P1 = 2**(L) * Piby2_1 and
|--P2 = 2**(L) * Piby2_2
	fmovex	%fp2,%fp4
	fmulx	FP_SCR2(%a6),%fp4	| ...W = N*P1
	fmovex	%fp2,%fp5
	fmulx	FP_SCR3(%a6),%fp5	| ...w = N*P2
	fmovex	%fp4,%fp3
|--we want P+p = W+w but |p| <= half ulp of P
|--Then, we need to compute A := R-P and a := r-p
	faddx	%fp5,%fp3		| ...FP3 is P
	fsubx	%fp3,%fp4		| ...W-P

	fsubx	%fp3,%fp0		| ...FP0 is A := R - P
	faddx	%fp5,%fp4		| ...FP4 is p = (W-P)+w

	fmovex	%fp0,%fp3		| ...FP3 is A
	fsubx	%fp4,%fp1		| ...FP1 is a := r - p

|--Now we need to normalize (A,a) to "new (R,r)" where R+r = A+a but
|--|r| <= half ulp of R.
	faddx	%fp1,%fp0		| ...FP0 is R := A+a
|--No need to calculate r if this is the last loop
	cmpil	#0,%d0
	bgt	RESTORE

|--Need to calculate r
	fsubx	%fp0,%fp3		| ...A-R
	faddx	%fp3,%fp1		| ...FP1 is r := (A-R)+a
	bra	LOOP

RESTORE:
	fmovel	%fp2,N(%a6)
	movel	(%a7)+,%d2
	fmovemx	(%a7)+,%fp2-%fp5


	movel	ADJN(%a6),%d0
	cmpil	#4,%d0

	blt	SINCONT
	bras	SCCONT

	.global	ssincosd
ssincosd:
|--SIN AND COS OF X FOR DENORMALIZED X

	fmoves	#0x3F800000,%fp1
	bsr	sto_cos			|store cosine result
	bra	t_extdnrm

	.global	ssincos
ssincos:
|--SET ADJN TO 4
	movel	#4,ADJN(%a6)

	fmovex	(%a0),%fp0		| ...LOAD INPUT

	movel	(%a0),%d0
	movew	4(%a0),%d0
	fmovex	%fp0,X(%a6)
	andil	#0x7FFFFFFF,%d0		| ...COMPACTIFY X

	cmpil	#0x3FD78000,%d0		| ...|X| >= 2**(-40)?
	bges	SCOK1
	bra	SCSM

SCOK1:
	cmpil	#0x4004BC7E,%d0		| ...|X| < 15 PI?
	blts	SCMAIN
	bra	REDUCEX


SCMAIN:
|--THIS IS THE USUAL CASE, |X| < 15 PI.
|--THE ARGUMENT REDUCTION IS DONE BY TABLE LOOK UP.
	fmovex	%fp0,%fp1
	fmuld	TWOBYPI,%fp1		| ...X*2/PI

|--HIDE THE NEXT THREE INSTRUCTIONS
	lea	PITBL+0x200,%a1		| ...TABLE OF N*PI/2, N = -32,...,32


|--FP1 IS NOW READY
	fmovel	%fp1,N(%a6)		| ...CONVERT TO INTEGER

	movel	N(%a6),%d0
	asll	#4,%d0
	addal	%d0,%a1			| ...ADDRESS OF N*PIBY2, IN Y1, Y2

	fsubx	(%a1)+,%fp0		| ...X-Y1
	fsubs	(%a1),%fp0		| ...FP0 IS R = (X-Y1)-Y2

SCCONT:
|--continuation point from REDUCEX

|--HIDE THE NEXT TWO
	movel	N(%a6),%d0
	rorl	#1,%d0

	cmpil	#0,%d0			| ...D0 < 0 IFF N IS ODD
	bge	NEVEN

NODD:
|--REGISTERS SAVED SO FAR: D0, A0, FP2.

	fmovex	%fp0,RPRIME(%a6)
	fmulx	%fp0,%fp0		| ...FP0 IS S = R*R
	fmoved	SINA7,%fp1		| ...A7
	fmoved	COSB8,%fp2		| ...B8
	fmulx	%fp0,%fp1		| ...SA7
	movel	%d2,-(%a7)
	movel	%d0,%d2
	fmulx	%fp0,%fp2		| ...SB8
	rorl	#1,%d2
	andil	#0x80000000,%d2

	faddd	SINA6,%fp1		| ...A6+SA7
	eorl	%d0,%d2
	andil	#0x80000000,%d2
	faddd	COSB7,%fp2		| ...B7+SB8

	fmulx	%fp0,%fp1		| ...S(A6+SA7)
	eorl	%d2,RPRIME(%a6)
	movel	(%a7)+,%d2
	fmulx	%fp0,%fp2		| ...S(B7+SB8)
	rorl	#1,%d0
	andil	#0x80000000,%d0

	faddd	SINA5,%fp1		| ...A5+S(A6+SA7)
	movel	#0x3F800000,POSNEG1(%a6)
	eorl	%d0,POSNEG1(%a6)
	faddd	COSB6,%fp2		| ...B6+S(B7+SB8)

	fmulx	%fp0,%fp1		| ...S(A5+S(A6+SA7))
	fmulx	%fp0,%fp2		| ...S(B6+S(B7+SB8))
	fmovex	%fp0,SPRIME(%a6)

	faddd	SINA4,%fp1		| ...A4+S(A5+S(A6+SA7))
	eorl	%d0,SPRIME(%a6)
	faddd	COSB5,%fp2		| ...B5+S(B6+S(B7+SB8))

	fmulx	%fp0,%fp1		| ...S(A4+...)
	fmulx	%fp0,%fp2		| ...S(B5+...)

	faddd	SINA3,%fp1		| ...A3+S(A4+...)
	faddd	COSB4,%fp2		| ...B4+S(B5+...)

	fmulx	%fp0,%fp1		| ...S(A3+...)
	fmulx	%fp0,%fp2		| ...S(B4+...)

	faddx	SINA2,%fp1		| ...A2+S(A3+...)
	faddx	COSB3,%fp2		| ...B3+S(B4+...)

	fmulx	%fp0,%fp1		| ...S(A2+...)
	fmulx	%fp0,%fp2		| ...S(B3+...)

	faddx	SINA1,%fp1		| ...A1+S(A2+...)
	faddx	COSB2,%fp2		| ...B2+S(B3+...)

	fmulx	%fp0,%fp1		| ...S(A1+...)
	fmulx	%fp2,%fp0		| ...S(B2+...)



	fmulx	RPRIME(%a6),%fp1	| ...R'S(A1+...)
	fadds	COSB1,%fp0		| ...B1+S(B2...)
	fmulx	SPRIME(%a6),%fp0	| ...S'(B1+S(B2+...))

	movel	%d1,-(%sp)		|save users mode & precision
	andil	#0xff,%d1		|mask off all exceptions
	fmovel	%d1,%FPCR
	faddx	RPRIME(%a6),%fp1	| ...COS(X)
	bsr	sto_cos			|store cosine result
	fmovel	(%sp)+,%FPCR		|restore users exceptions
	fadds	POSNEG1(%a6),%fp0	| ...SIN(X)

	bra	t_frcinx


NEVEN:
|--REGISTERS SAVED SO FAR: FP2.

	fmovex		%fp0,RPRIME(%a6)
	fmulx		%fp0,%fp0	| ...FP0 IS S = R*R
	fmoved		COSB8,%fp1	| ...B8
	fmoved		SINA7,%fp2	| ...A7
	fmulx		%fp0,%fp1	| ...SB8
	fmovex		%fp0,SPRIME(%a6)
	fmulx		%fp0,%fp2	| ...SA7
	rorl		#1,%d0
	andil		#0x80000000,%d0
	faddd		COSB7,%fp1	| ...B7+SB8
	faddd		SINA6,%fp2	| ...A6+SA7
	eorl		%d0,RPRIME(%a6)
	eorl		%d0,SPRIME(%a6)
	fmulx		%fp0,%fp1	| ...S(B7+SB8)
	oril		#0x3F800000,%d0
	movel		%d0,POSNEG1(%a6)
	fmulx		%fp0,%fp2	| ...S(A6+SA7)

	faddd		COSB6,%fp1	| ...B6+S(B7+SB8)
	faddd		SINA5,%fp2	| ...A5+S(A6+SA7)

	fmulx		%fp0,%fp1	| ...S(B6+S(B7+SB8))
	fmulx		%fp0,%fp2	| ...S(A5+S(A6+SA7))

	faddd		COSB5,%fp1	| ...B5+S(B6+S(B7+SB8))
	faddd		SINA4,%fp2	| ...A4+S(A5+S(A6+SA7))

	fmulx		%fp0,%fp1	| ...S(B5+...)
	fmulx		%fp0,%fp2	| ...S(A4+...)

	faddd		COSB4,%fp1	| ...B4+S(B5+...)
	faddd		SINA3,%fp2	| ...A3+S(A4+...)

	fmulx		%fp0,%fp1	| ...S(B4+...)
	fmulx		%fp0,%fp2	| ...S(A3+...)

	faddx		COSB3,%fp1	| ...B3+S(B4+...)
	faddx		SINA2,%fp2	| ...A2+S(A3+...)

	fmulx		%fp0,%fp1	| ...S(B3+...)
	fmulx		%fp0,%fp2	| ...S(A2+...)

	faddx		COSB2,%fp1	| ...B2+S(B3+...)
	faddx		SINA1,%fp2	| ...A1+S(A2+...)

	fmulx		%fp0,%fp1	| ...S(B2+...)
	fmulx		%fp2,%fp0	| ...S(A1+...)

	fadds		COSB1,%fp1	| ...B1+S(B2+...)
	fmulx		RPRIME(%a6),%fp0	| ...R'S(A1+...)
	fmulx		SPRIME(%a6),%fp1	| ...S'(B1+S(B2+...))

	movel		%d1,-(%sp)	|save user's mode & precision
	andil		#0xff,%d1	|mask off all exceptions
	fmovel		%d1,%FPCR
	fadds		POSNEG1(%a6),%fp1	| ...COS(X)
	bsr		sto_cos		|store cosine result
	fmovel		(%sp)+,%FPCR	|restore user's exceptions
	faddx		RPRIME(%a6),%fp0	| ...SIN(X)

	bra		t_frcinx

SCBORS:
	cmpil		#0x3FFF8000,%d0
	bgt		REDUCEX


SCSM:
	movew		#0x0000,XDCARE(%a6)
	fmoves		#0x3F800000,%fp1

	movel		%d1,-(%sp)	|save user's mode & precision
	andil		#0xff,%d1	|mask off all exceptions
	fmovel		%d1,%FPCR
	fsubs		#0x00800000,%fp1
	bsr		sto_cos		|store cosine result
	fmovel		(%sp)+,%FPCR	|restore user's exceptions
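|--Small-argument path: SCBORS falls through to SCSM only when |X| <= 1,
|  which (given the earlier range test) means |X| < 2**(-40). There
|  sin(X) = X and cos(X) = 1 to within extended-precision rounding.
|  The cosine above is formed as 1.0 - 2**(-126) (0x00800000 is the
|  smallest normalized single) under the user's rounding mode and
|  precision, presumably so that inexact is signaled and the directed
|  rounding modes return the value just below 1. The sine is simply X: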
	fmovex		X(%a6),%fp0
	bra		t_frcinx

|end