The goal of this thesis was to investigate the task of creating a computer generated animation and the computational demands of rendering. Autodesk's 3D Studio Version 4 package was used to draw, animate, and render the animation. I created a computer generated animation short using 3D Studio and incorporated techniques to refine the animation process. The final product contains audio effects and has a running time of approximately four minutes. Areas investigated include: (1) creating a "virtual set" using 3D Studio, (2) developing a screenplay, (3) modeling characters and objects, (4) animating the scenes, (5) timing the movements of the characters and objects, (6) rendering techniques, (7) converting and compressing animation files, (8) adding audio effects, and (9) computational requirements of rendering. In determining the computational needs of rendering, a ray-tracing program was profiled. Testing of the program involved replacing functions with macro equivalents in an effort to minimize computational overhead due to parameter passing and function calling.
Computer animation is one of the hottest areas of computer science today. With advances in computer architectures and modeling software, computer animation is becoming a quick, reliable means of visualizing ideas. Computer animation is used in various fields such as engineering, medicine, physics, architecture, and computer science. Simply by turning on the television, one can see how computer animation and computer generated graphics have become a useful means to both educate as well as entertain us.
With Pixar's release of Toy Story, the first computer generated motion picture which was 4 years in the making, computer generated movies have made their mark in the entertainment industry. Other films, such as Jurassic Park and Twister, have brought computers into modern motion picture making by allowing for special effects never before possible.
Computer animation is much more efficient than standard cartooning in that once a computer model is created, movement and rotation are easily done by the computer. This has a definite advantage over traditional cartoon making in that a cartoonist has to physically draw and color each individual cel whereas computer animation simplifies the process by automatically "tweening" between key frames of the animation. Furthermore, by using texture mapping (the process of applying an image to a computer model), photo-realistic results can be achieved using computer animation.
While computer animation simplifies the process of creating an
animation, it is still not an easy process. This thesis dealt
with both the process of creating a computer generated animation
using Autodesk's 3D Studio modeling package as well as the computational
demands of rendering. The process itself can be divided into the
following sections:
(1) creating a screenplay
(2) creating a 3D virtual world
(3) animating 3D objects
(4) rendering the animation
(5) adding audio and sound effects to the rendered movie.
This paper is divided into chapters that provide information on the process used to create a computer generated animation short as well as give background information on related topics. These chapters are described as follows:
Animation: Literature Review
This chapter is a synopsis of work that has been done in the area
of computer animation and computer generated images. This chapter
also contains information on the current state of computer animation
in modern film making.
Comparison: Computer Animation vs. Cartooning
This chapter describes the differences between 2-D and 3D computer
generated animation and traditional cartooning.
Modeling Packages
A description of the features found in most popular commercial
and freeware modeling packages, as well as a background in terminology,
are found in this chapter.
Creating the Magic
This chapter explains the procedure used to create a computer generated animation using 3D Studio. An explanation of each of the stages of the "animation pipeline" as well as programs within 3D Studio that aid in animation and 3D modeling is included. Tips and tricks found during the study are also covered in this section.
Network Rendering
This chapter describes the potential usefulness of network rendering
and explains the differences between single machine versus network
rendering.
Profiling Results for a Minimal Ray Tracer
To illustrate the computing power needed for rendering, a ray-tracing
program was profiled to give an idea of the processing "muscle"
that is required during rendering. Though 3D Studio uses polygonal
shading, an evaluation of ray-tracing will serve to demonstrate
the demands of rendering.
Problems Encountered
This chapter contains a discussion of the problems encountered
and their possible solutions.
Discussions and Future Work
This chapter describes the difficulty of this project and provides
a "wish list" of things that I would like to study
further.
Conclusion
A summary of my results is found in this chapter.
Due to the overwhelming popularity and usefulness of computer animation there is definitely no shortage of books relating to the topic. Computer animation has evolved from traditional cartooning and is a cutting edge tool that has no limitations as to its potential. This chapter describes some of the books and materials that pertain to computer animation. There are two areas of emphasis that this thesis dealt with which are (1) animating a story, and (2) the computational demands needed for rendering the animation.
Thanks to movies such as Jurassic Park and Toy Story, there has been no shortage of information dealing with computer animation. One of the best sources of information about traditional animation, computer animation, and the rigors of rendering is Toy Story: The Making of the Animated Film (Lasseter, 1995). This book details the famous Disney approach of story telling and how they applied it to computer animation. Also in this book is a description of how the traditional cartoonists and the computer animators worked together by using story boards to describe each scene. Ironically, it turned out that the computer animators were relying too heavily on the story boards when they were creating the animation!
Before Toy Story there was Jurassic Park -- Steven Spielberg's dinosaur extravaganza. While Toy Story had a "cartoon-ish" look, the goal of Jurassic Park's animators was to create realistic virtual dinosaurs. The video, "The Making of Jurassic Park" (Michenaud, 1995) details how the animators combined live action with computer animation to achieve believable imagery to captivate the audience.
A wonderful book on traditional cartooning techniques is The Animation Book (Laybourne, 1979). This book explains the traditional method of creating cartoons where each individual frame of an animation must be hand drawn. Some of the topics covered in this book include cel animation, flip-books, and how to create an entire cartoon and put it on film. This information is especially useful later in this document when traditional cartooning is compared to modern computer animation.
During the course of this thesis I constantly referred to the 3D Studio Version 4 manuals (Autodesk, 3D Studio, 1994). These manuals describe some of the basics of computer animation and provide information on how to use 3D Studio Version 4. Along with these manuals I obtained a demo CD of 3D Studio Max (Kinetix, 1996) which provides technical information about the latest version of 3D Studio, as well as providing an instructive demonstration of the potential of computer animation.
A useful reference dealing with 2D computer animation is Autodesk's Animator Pro manuals (Autodesk, Animator Pro, 1994). These manuals not only contain information as to the use of Animator Pro, but also serve as a source of computer animation terminology. These manuals do a good job of explaining such concepts as "tweening" and keyframing. This information is particularly useful when describing the differences in traditional cartooning and modern 2-dimensional computer animation.
The most rigorous part of computer animation is rendering the animation. In the appendices I've included my results from profiling a ray-tracing program. Real-time rendering is becoming a more attainable goal through the research of Dr. Raghu Karinthi and his Z-Buffer Rendering paper (Karinthi, 1993). The goal of his research was to attain real-time rendering on a PC. Another excellent source of information on the demands of rendering is Andrew Glassner's "An Introduction to Ray-tracing" (Glassner, 1989) that fully explains the stages of ray-tracing and all of the computations that are required.
The book that has taught me more about graphics than any other is Introduction to Computer Graphics (Foley, 1994), which was the textbook for my first graphics course, taught by Dr. Raghu Karinthi. This book explains the various shading models in great detail and describes the basic matrix operations for scaling, translating, and rotating, all of which are used by 3D Studio during rendering.
There are many resources that can be found on the web dealing with computer animation and rendering. The largest collection of public domain modeling and rendering tools can be found at Viewpoint DataLab's Avalon web site, http://www.viewpoint.com. This site also contains many utilities for 3D Studio as well as a large collection of public domain models.
Ray-tracing has been an obsession for many people on the web and the home of ray-tracing is at http://www.povray.org. This site is dedicated to a freeware ray-tracing package called POV-Ray. There are sections of the site devoted to explaining what ray-tracing is, a file section where you can download the latest version of POV-Ray, a nice gallery of POV-Ray-traced images that artists have submitted, and help files on every aspect of ray-tracing. One of the sections of this site is a programming competition where programmers try to write enhancements to the POV-Ray program, or receive a challenge to write a minimal ray-tracer. Contained in this thesis are the results of profiling one such minimal ray-tracer. The goal of the competition was to write a ray-tracing program that contained the fewest lines of code. The program that I profiled using a DLX simulator was minray.c. This program was a combination of the best submissions compiled by Dr. Paul Heckbert at Carnegie Mellon University. Dr. Heckbert is a computer science instructor who specializes in computer graphics. His web page contains a vast collection of information pertaining to computer animation and graphics and is located at http://www.cs.cmu.edu/afs/cs/user/ph/www/heckbert.html.
Further information on Disney's approach to traditional cartooning can be found in Christopher Finch's book The Art of Walt Disney : From Mickey Mouse to the Magic Kingdoms (Finch, 1995). Another source of information on Disney's approach to cartooning is Bob Thomas's book Disney's Art of Animation : From Mickey Mouse to Beauty and the Beast (Thomas, 1991). These two books give a detailed description of the process Disney cartoonists use to animate a story from story-boarding to the big screen.
The premier company in modern motion picture special effects is George Lucas's Industrial Light & Magic. Lucas formed this company to work on his classic Star Wars trilogy. When the company was founded in the 1970's the average age of the employees was 23! Lucas recruited the best people he could find from high school and college drama clubs to work at the small start-up company. Mark Cotta Vaz's book Industrial Light & Magic : Into the Digital Realm (Cotta Vaz, 1996) describes the work done by the special effects company. The most notable work that ILM has done is the Star Wars trilogy, Jurassic Park, Twister, Star Trek Generations and most recently, Star Trek First Contact. With the right blend of computer animation, engineering, and art, ILM has become the undisputed leader in cutting edge imagery in Hollywood.
Persistence of vision is the term used here to describe how many pictures the human eye must see each second to perceive smooth motion. In order to achieve the illusion of motion for the human eye, 24 frames per second are necessary. This means that a traditional cartoonist has to hand draw 24 separate images for a single second of smooth flowing animation. Imagine a feature film such as Walt Disney's "Bambi" (Finch) that has a running time of 69 minutes -- that translates to 99,360 separate animation frames, or cels, that had to be hand drawn by cartoonists. To put that number into perspective, that's one frame of animation per character in this entire document! Typically, key-frames of an animation are drawn and act as the story board. Next, a team of artists known as "tweeners" draw the images that fill in the gaps. Imagine how much time is spent drawing and redrawing the same image by hand with only a minor difference between the two images to allow for the natural progression of the animation. If this "tweening" process could be automated so that a computer could draw and redraw the cels of an animation then the turnaround time for an animation would be greatly improved. With computer animation packages, it is now possible to create an animation far more quickly than with conventional cartooning. Consider the following example.
Imagine a cartoon that consists of a circle and a large square, and in the animation it is desired to have the circle move past the square. Furthermore, it is desired that the animation last for 5 seconds. That is, from the time the circle starts moving to the time that it passes the square, five seconds will have elapsed. The first approach examined will be the traditional cartooning method followed by the more modern computerized method.
To first get a feel for how involved the animation is going to be, it is desirable to calculate how many frames are required for the 5 second animation. Knowing that to achieve persistence of vision it is necessary to have 24 frames per second of animation, the 5 second animation will require 120 separate frames of animation. Now, the task for the cartoonist is to draw each frame by hand and combine them all into an animation sequence. The best way for the cartoonist to accomplish this is to draw the key frames of the animation, the first and last frames for instance, and then draw the necessary cels that will create a smooth transition from the first frame to the final frame (Laybourne).
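The frame arithmetic used here is simple enough to sketch in a few lines of Python. This is only an illustration of the calculation; the function name frames_needed is hypothetical and is not part of any animation package discussed in this thesis.

```python
def frames_needed(duration_seconds, frames_per_second=24):
    """Return the number of frames required to fill the given running time."""
    return round(duration_seconds * frames_per_second)


# The 5 second circle-and-square animation at 24 frames per second.
short_clip = frames_needed(5)

# A 69 minute feature film at the same frame rate.
feature_film = frames_needed(69 * 60)

print(short_clip, feature_film)  # prints: 120 99360
```

The same function covers other playback speeds by passing a different frames_per_second value.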


Using an animation package, such as Autodesk's Animator Pro, a
computer animator can use a built-in tool to draw a circle and
a square. Then by specifying the key frames of the animation,
the first and last frames for example, it is possible for the
computer animator to quickly animate the sequence (Autodesk, Animator
Pro). This can be done by simply placing the two objects in the
first frame and then moving to the final frame, frame 120, and
placing the circle some distance past the square. Once the
objects are placed at these key frames, the computer can perform the
tweening process by drawing the intermediate frames between the
first and last frames to achieve a smooth animation.
It is easy to see how the computer can be used to create computer
animation from this simple example. In order to create the computer
animation the animator needed to only draw the circle and the
square, position the objects at the key frames, and finally let
the computer do the tweening. The entire process using the computer
takes a matter of seconds. The traditional cartoonist follows
a similar process in that the circle and square are drawn at the
key frames, but then each in-between frame has to be hand drawn, thus increasing
the overall time required. In effect, the computer animator had
to draw only two separate frames, while the traditional cartoonist
had to draw 120 separate frames!
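To make the tweening step concrete, the following Python sketch fills in the circle's position for every frame between two key frames. It assumes simple linear interpolation and made-up screen coordinates; the source does not state which interpolation Animator Pro actually uses, and the names lerp and tween are hypothetical.

```python
def lerp(start, end, t):
    """Linearly interpolate between start and end for t in [0, 1]."""
    return start + (end - start) * t


def tween(key_start, key_end, first_frame, last_frame):
    """Return {frame: (x, y)} for every frame from first_frame to last_frame.

    key_start and key_end are the (x, y) positions set by the animator
    at the two key frames; everything in between is computed.
    """
    span = last_frame - first_frame
    frames = {}
    for frame in range(first_frame, last_frame + 1):
        t = (frame - first_frame) / span
        frames[frame] = (lerp(key_start[0], key_end[0], t),
                         lerp(key_start[1], key_end[1], t))
    return frames


# Assumed coordinates: the circle starts left of the square and ends past it.
circle_path = tween((0.0, 50.0), (300.0, 50.0), first_frame=1, last_frame=120)
print(len(circle_path), circle_path[1], circle_path[120])
```

The animator supplies only the two key positions; the remaining 118 positions are generated.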
The term "three-dimensional computer graphics" refers to how mesh objects in the image relate to one another in terms of size, location, appearance, and orientation (Foley). 3D graphics allow the artist to create a "virtual world" similar to a child playing with Lincoln logs. Much in the same way that the child builds a log house, the computer animator can create a virtual log house using tubes, cylinders, and boxes. As the play scene becomes more elaborate, the child may want to create a small town by building more log buildings and placing them in different locations. The 3D animator follows a similar process by creating more buildings and placing them relative to the original building. Since the child is confined by the laws of physics, he has already been bound by gravity and more importantly, the floor. In the "virtual world" there are no set bounds for the floor so the artist may choose to create a flat prairie or a mountainous region to place the buildings. Now the child chooses to add some characters to his play set and he places them inside some of the buildings, in the street, and in a field. The animator does the same thing as the child except the animator has more flexibility in the choice of characters and their appearance. By placing the characters at desired positions, both the child and the animator are ready to create a story. The child elects to have some bad guys rob the local saloon and run out of the building into the crowded street.
In the 3D world, the animator defines the movements of the characters
in the street as well as the bad guys running out of the saloon.
By using the concept of keyframing, the animator can simply
design important, or key, frames of the animation that allow the
computer to generate the images between these key frames for a
smooth animation. Key frames for this example may be the people
walking from one end of the street to the other. By defining how
many frames it takes for the characters to walk from one end of
the street to the other, key frames can be established. Assuming
that the playback speed is going to be 30 frames per second and
taking into account how fast the people are to walk, it is possible
to determine the total number of frames required. Assume that
it should take 20 seconds for the characters to walk from one
end of the street to the other. That translates to 30*20, or 600
frames to make the journey. Key frames are then decided to be
at frame 0 and frame 600. At frame 0 the artist places characters
at opposite ends of the street so that it appears to be a busy,
crowded street. Next, the artist goes to frame 600 and moves each
character to the opposite end of the street. The computer will
now perform the "tweening" process to produce the corresponding
frames between these two key frames when the scene is rendered.
In effect, the animator only placed the objects where they needed
to be at certain points in the animation and let the computer
do the rest of the work.
A good understanding of the various modeling and rendering packages
available is a necessity in working with 3D graphics. The focus
of this chapter is to provide a background in terminology (Foley).
The first thing that is necessary is to distinguish between
a modeling package and a rendering package. A modeling package
allows a user to create and manipulate 3-dimensional objects.
Rendering packages allow the user to create a ray-traced or polygonal-shaded
image from a 3-dimensional model created by a modeling package.
Later in this document is an analysis of a ray-tracing program
which details the computational intensity of rendering. There
are many variations between modeling packages as well as rendering
packages, and the differences are very important in order to achieve
a desired image or animation appearance.
3D modeling packages are only as good as their tools and functions are. A good modeling package should offer the ability to create simple 3D objects, or primitives, such as spheres, cubes, tubes, and cylinders plus allow for proper placement of the objects. The ability to create simple 3D objects allows for more complex objects to be made by building them from the simpler objects. A useful feature is Boolean Operations that allows for the creation of more complex objects by combining simpler objects. An example of where Boolean operations can be used is in creating a cube with a hole through it as shown below.




In this example, I drew a cube and a cylinder in 3D Studio using the 3D Editor facility. I then placed the cylinder inside the cube where I wanted to create the hole. Basically, I want to create a new object by subtracting the cylinder's volume from the cube. An easy way to imagine this is to think of the cylinder as a drill bit and the cube as a block of wood. I want to remove the part of the cube that overlaps with the cylinder, i.e. drill the cube with my virtual drill bit.
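The idea of subtracting one volume from another can be sketched as a point-membership test: a point belongs to the drilled cube if it is inside the cube and not inside the cylinder. The Python below is only a conceptual sketch with assumed dimensions and hypothetical function names; 3D Studio performs Boolean operations on mesh geometry, and its internal method is not described here.

```python
def inside_cube(point, half_size=1.0):
    """True if the point lies in an axis-aligned cube centered at the origin."""
    x, y, z = point
    return abs(x) <= half_size and abs(y) <= half_size and abs(z) <= half_size


def inside_cylinder(point, radius=0.5):
    """True if the point lies in a cylinder running along the z axis.

    The cylinder is treated as infinitely long so that it passes
    completely through the cube, like a drill bit.
    """
    x, y, _ = point
    return x * x + y * y <= radius * radius


def inside_drilled_cube(point):
    """Boolean subtraction: in the cube AND NOT in the cylinder."""
    return inside_cube(point) and not inside_cylinder(point)


print(inside_drilled_cube((0.9, 0.9, 0.0)))  # solid corner region of the cube
print(inside_drilled_cube((0.0, 0.0, 0.0)))  # center of the hole
```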
The cube looks very nice as it is but what if I wanted to make it look like an actual piece of wood? The feature that allows you to add visual realism to an object is called texture mapping which is simply applying a bitmap image to a mesh object. By using texture mapping it is possible to change many features of an object depending on the bitmap or "material" that you choose to use. Another feature of texture mapping is that it allows you to add a bumpy look to an object, for example make the cube out of concrete.
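At its simplest, applying a bitmap to a surface means looking up a pixel of the bitmap for each point on the surface. The Python sketch below assumes the surface point has already been given mapping coordinates (u, v) between 0 and 1 and uses nearest-pixel lookup; the tiny "wood" bitmap and the function name sample_texture are made up for illustration and do not reflect how 3D Studio stores materials.

```python
def sample_texture(bitmap, u, v):
    """Return the bitmap pixel nearest to mapping coordinates (u, v).

    bitmap is a list of rows; u runs across the columns and v down the
    rows, both in the range 0.0 to 1.0.
    """
    height = len(bitmap)
    width = len(bitmap[0])
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return bitmap[row][col]


# A made-up 2x2 bitmap standing in for a wood grain image.
wood = [["dark", "light"],
        ["light", "dark"]]

print(sample_texture(wood, 0.0, 0.0), sample_texture(wood, 0.75, 0.0))
```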


A very useful feature for animating an object is known as Inverse
Kinematics which allows for a child object to move while properly
moving the parent object. A good example of this can be seen in
the following images:


The goal was to move the woman's hand above her head. There are several different ways that an animator could do this but none are as simple as using Inverse Kinematics. By using a pre-defined object hierarchy (discussed later in this thesis) and using a Kinematic Chain, the animation can be done by selecting the hand and dragging it above her head. With Inverse Kinematics it is also possible to define joint parameters such as type of joint and range of motion in order to get realistic movements so that the model behaves as the real object would.
For a more exciting use of Inverse Kinematics I chose to have
the individual in a kneeling position. The only work involved
in making this image was to drag her feet to the proper position
and then move her arms. The rest of the model was properly positioned
in relation to the movement of the feet and hands.

In contrast to the ease of animating using Inverse Kinematics is the manual process of rotating or moving each object in a hierarchy. This method takes a very long time but good detail and control can still be achieved. If an animator chooses not to use Inverse Kinematics then in order to move the character's hand over her head, the upper arm must first be rotated, then the lower arm, and finally the hand and its fingers. From this comparison it should be easy to see that using Inverse Kinematics can greatly enhance the animation process.
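The difference between the two approaches can be illustrated with a two-segment arm in a plane. In the manual approach the animator chooses the shoulder and elbow angles and the hand position follows (forward kinematics); with inverse kinematics the animator chooses the hand position and the angles are solved for. The Python sketch below uses the standard closed-form solution for a two-link chain. It is an assumption made for illustration only; the source does not describe the algorithm used by the 3D Studio plug-in, and the function names are hypothetical.

```python
import math


def forward(upper_len, lower_len, shoulder, elbow):
    """Hand position for given joint angles (radians), shoulder at the origin."""
    elbow_x = upper_len * math.cos(shoulder)
    elbow_y = upper_len * math.sin(shoulder)
    hand_x = elbow_x + lower_len * math.cos(shoulder + elbow)
    hand_y = elbow_y + lower_len * math.sin(shoulder + elbow)
    return hand_x, hand_y


def solve_two_link(upper_len, lower_len, target_x, target_y):
    """Return (shoulder, elbow) angles that place the hand on the target."""
    dist_sq = target_x ** 2 + target_y ** 2
    cos_elbow = (dist_sq - upper_len ** 2 - lower_len ** 2) / (
        2 * upper_len * lower_len)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("target is out of reach")
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(target_y, target_x) - math.atan2(
        lower_len * math.sin(elbow), upper_len + lower_len * math.cos(elbow))
    return shoulder, elbow


# Drag the hand to a point above the shoulder and let the solver do the rest.
shoulder, elbow = solve_two_link(1.0, 1.0, 0.5, 1.5)
print(forward(1.0, 1.0, shoulder, elbow))
```

Feeding the solved angles back into forward reproduces the requested hand position, which is a convenient check on the solver.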
The following sections of this paper will describe the process
I used to create the computer generated animation. The sections
are divided as follows:
Creating a 3-D Virtual World
In this section I will explain how I created the characters and objects used in the movie. Before I was able to use the characters and the objects in the story, they had to be created, or modeled, in a 3-dimensional space. Some models are easy to create while others can be quite complex depending on several factors such as visual appearance, movement, etc. Simple models include the grassy field, whereas complex models include characters such as the boy or the egret bird.
A simple model used in the story was the ground. In order to model the ground I created a very large, thin box using the 3D Editor. Then by applying mapping coordinates and using texture mapping to apply a "grassy" look to the box, I had created a large field. This object was quite simple to create and gives the proper appearance of a grassy field. While this one was simple, there are others that can become quite complex and involved.
An example of complex models that I created are the stick figures or "skeleton" figures that I used in the story. By using a stick figure I can later apply a more complex mesh to it and have the stick figure behave more like a skeleton for the more complex model. A simple example would be that if I animate a stick figure moving its arms about, then I apply a realistic looking model of a little boy to the stick figure, the model of the little boy will move as does the stick figure "skeleton". By taking this approach I can apply any high quality mesh to the stick figure model. More information can be found in the "Problems Encountered" section of this paper.
The goal was to create a stick figure that behaves like a real human skeleton. That is, to try to get the proper proportions and, more importantly, to properly define the joints of the model. The steps I used to create a good, working model are as follows:
In order to draw a good representation of the human body I had to determine what I needed to represent and ultimately how I represented it. The important concept to realize here is that detail does not matter. The most important thing to concentrate on is proper representation of the major parts of the body. The parts I chose to represent in my model are the head, neck, upper torso, lower torso, hips, upper arms, lower arms, hands, upper legs, lower legs, and feet. Please note that the detail on the hands and feet is not important right now and they are represented simply as blocks.
Using the 3D Editor I simply created boxes of different dimensions
to represent most of the body parts. The only body part that is
not represented as a box is the upper torso, for which I chose to use
a 3 sided box, i.e., a triangular box, as can be seen in the following
images:




Notice in the pictures how the figure is in different positions.
This is very easy to do once the proper object hierarchies are
in effect as well as the proper joint definitions.
Imagine how each of your body parts moves in relation to one another.
Take a look at your feet and your lower leg and consider how each
of these moves with respect to the other. Now establish a parent/child
hierarchy for these two body parts. Try moving your foot without
moving your lower leg. Pretty easy to do. Now, try moving your
lower leg without moving your foot. This is impossible. The reason
this is impossible is that the foot is a child of the lower leg.
That is, whenever the lower leg is moved, the foot must move as
well. Perhaps the foot itself does not move but its location in
3 space does change due to the movement of your lower leg. Using
this as a model I had to establish a parent/child hierarchy for
the parts of the model's body that would behave most like the
human body. The following chart shows the hierarchies I used for
the model.

From this chart it is seen that the Right Foot is the child of the Right Lower Leg, or that the Hips are the child of the Lower Torso. In order to implement this hierarchy I linked the objects together using the Keyframer Hierarchy command.
The steps involved were:
Choose "Link Objects" from the Hierarchy menu. Here you must first specify the child object and then the parent object. Referring back to the hierarchy chart, I started with the Right Foot as my first child object. I then selected the Right Lower Leg as its parent. I then selected the Right Lower Leg as the child object and the Right Upper Leg as the parent object. I repeated this process for the Left leg objects. I then selected the Right Upper Leg as the child and selected the Hips as the parent object. I repeated this for the Left Upper Leg. At this point, if anything is done to the Hips, such as movement or rotation, every child, grandchild, and subsequent offspring will be affected. Continuing, I selected the Right Hand as the child and then selected the Right Lower Arm as its parent. I then selected the Right Lower Arm as the child and the Right Upper Arm as its parent. I repeated this process for the Left Hand, Left Lower Arm, and Left Upper Arm.
Now I needed to establish a link between the two Upper Arms and the Upper Torso. To do this, I selected the Right Upper Arm as the child object and the Upper Torso as the parent. I then repeated this process for the Left Upper Arm. At this point the arms are linked to the upper torso and the legs are attached to the hips. I then combined the upper and lower halves of the model. Remembering that the Upper Torso is the parent object to all other objects I established a link between the Hips and the Upper Torso. This was done using the Lower Torso. I selected the Hips to be a child object and the Lower Torso to be the parent object. To finish linking the body together I selected the Lower Torso as the child and the Upper Torso as the parent. At this point, any change to the Upper Torso (such as movement) will have an effect on all parts of the model.
The model is still not finished as it is missing a head and a
neck. This part was confusing for me when I first started defining
the hierarchies. In my mind I felt that the head had more control
than did the neck, thus making the head the neck's parent. I found
out however that if I selected the neck as the child and the head
as the parent, I could not link the head and neck to the rest
of the body. With this I had to declare the Head as the child
and the Neck as the parent. In order to then complete the model
I selected the Neck as the child object and the Upper Torso as
the parent object. With the conclusion of this phase, the model
is now ready to have the joints properly defined.
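The links described above can be written down as a simple child-to-parent table, which makes it easy to see which objects are affected when a given object is moved. The Python sketch below records exactly the links listed in the text; the table layout and the helper names descendants and ancestors are illustrative assumptions and have nothing to do with how the Keyframer stores its hierarchy.

```python
# Each entry maps a child object to its parent, as linked in the Keyframer.
PARENT = {
    "Right Foot": "Right Lower Leg",
    "Right Lower Leg": "Right Upper Leg",
    "Right Upper Leg": "Hips",
    "Left Foot": "Left Lower Leg",
    "Left Lower Leg": "Left Upper Leg",
    "Left Upper Leg": "Hips",
    "Hips": "Lower Torso",
    "Lower Torso": "Upper Torso",
    "Right Hand": "Right Lower Arm",
    "Right Lower Arm": "Right Upper Arm",
    "Right Upper Arm": "Upper Torso",
    "Left Hand": "Left Lower Arm",
    "Left Lower Arm": "Left Upper Arm",
    "Left Upper Arm": "Upper Torso",
    "Head": "Neck",
    "Neck": "Upper Torso",
}


def descendants(name):
    """Every object that moves when the named object is moved."""
    found = []
    for child, parent in PARENT.items():
        if parent == name:
            found.append(child)
            found.extend(descendants(child))
    return found


def ancestors(name):
    """The chain of parents from the named object up to the root."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain


print(sorted(descendants("Hips")))
print(ancestors("Right Foot"))
```

Moving the Hips affects only the six leg objects, while moving the Upper Torso affects every other object in the model.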
Now that the objects are all linked together it is necessary to properly position the pivot points for all the objects. This is important so that whenever an object is rotated it will rotate about the proper pivot point. Another major reason that this is important is for establishing proper joint freedoms later on. This section will explain how to use the Keyframer to correctly place the pivot point for each object.
To place an object's pivot point, it is essential to have already
established an object hierarchy for the model. By placing the
object pivot points I am specifying the center of rotation for each
object. When modifying or placing an object's pivot point, the
object as well as its parent object are displayed in 4 separate
viewports, which aids in correctly placing the pivot point.
I will explain how I placed the object pivot points for the objects.
The process is fairly straightforward so I will only detail one
specific example for the Right Foot. The remaining object pivot
points can be placed by simply repeating the following process:
In the Keyframer program, I used the "Object Pivot" command under the Hierarchy menu and selected the Right Foot of the model. Upon selecting the foot, I now see 4 viewports that contain the Right Foot, its parent object, and a black X. The X marks the current location of the object's pivot point. By using the four viewports which I chose to be a top view, front view, left view, and a user defined view, I could easily adjust the object pivot point by placing the X at the desired locations in the 4 viewports. Once the pivot point is in the correct position for the different viewpoints then I know that it is in the correct position for the object. The proper location for the pivot point when modeling a human joint is in the center of where the two objects, child and parent, meet.
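The effect of the pivot point can be seen in a small two-dimensional sketch: rotating a point about a pivot means measuring the point relative to the pivot, rotating that offset, and adding the pivot back. The Python below is only an illustration with assumed coordinates and a hypothetical function name; the Keyframer works in three dimensions.

```python
import math


def rotate_about_pivot(point, pivot, degrees):
    """Rotate a 2D point counterclockwise about a pivot by the given angle."""
    angle = math.radians(degrees)
    dx = point[0] - pivot[0]
    dy = point[1] - pivot[1]
    return (pivot[0] + dx * math.cos(angle) - dy * math.sin(angle),
            pivot[1] + dx * math.sin(angle) + dy * math.cos(angle))


# Assumed layout: the ankle (pivot) is at (1, 1) and the toe is at (2, 1).
ankle = (1.0, 1.0)
toe = (2.0, 1.0)
print(rotate_about_pivot(toe, ankle, 90))
```

A point located exactly at the pivot does not move under any rotation, which is why the pivot belongs where the child and parent objects meet.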
At this point, the Right Foot can be rotated and will move about
the pivot point. However, even though the object rotates about
the proper point there are no constraints on the joint. That is,
the foot can be rotated in abnormal directions which are not possible
for humans! Also, the Right Foot can intersect the Right Lower
Leg since the objects are not rigid and more importantly the joint
parameters have not yet been sufficiently defined.


Of the above images, the first shows the "natural" or original
position of the foot, and the second shows the foot rotated about
its pivot point in a way that is not a natural human rotation.
Inverse Kinematics is a plug-in utility for 3D Studio that allows an animator to create a more natural animation by simply moving a leaf object rather than moving its ancestor objects (Autodesk, 3D Studio). The main advantage of using Inverse Kinematics is that lifelike animation sequences can be created in far less time than by manually moving an ancestor object and then each of its children. For example, to animate a person dribbling a basketball, use the Inverse Kinematics plug-in to have the hand follow the ball, and IK will solve for the positions of all objects in the hierarchy of the body. Likewise, if the left hand is to be raised over the head, simply move the left hand above the head and the lower arm and upper arm movements will be taken care of automatically. Before Inverse Kinematics can be used to create an animation sequence, the objects must be prepared for use. The IK feature requires that an object hierarchy already be established and that the object pivot points have been set. The next step is to define the restrictions on the joints of the human model.
The Inverse Kinematics plug-in for 3D Studio is a KXP program and must be run from within the Keyframer. After starting the Inverse Kinematics program, choose Pick Objects and then select any part of the model hierarchy in the Keyframer. Next, select the Joint Parameters button. The resulting screen is where you define the type of joint (sliding versus revolving) and then define its range of motion in terms of X, Y, Z.
The following is a table of typical joint settings (Autodesk, 3D Studio).
| Object | X From | X To | Y From | Y To | Z From | Z To |
| --- | --- | --- | --- | --- | --- | --- |
| L. Foot | 40 | -25 | -15 | 15 | -10 | 10 |
| L. Shin | -135 | 0 | 0 | 0 | 0 | 0 |
| L. Thigh | 80 | -80 | 0 | 0 | 10 | -10 |
| R. Foot | 40 | -25 | -15 | 15 | -10 | 10 |
| R. Shin | -135 | 0 | 0 | 0 | 0 | 0 |
| R. Thigh | 80 | -80 | 0 | 0 | 10 | -10 |
| Pelvis | 0 | 0 | 0 | 0 | 0 | 0 |
| Chest | -75 | 20 | 35 | -35 | 35 | -35 |
| Neck | 45 | -35 | 15 | -15 | 25 | -25 |
| Head | 45 | -45 | 80 | -80 | 30 | -30 |
| L. Hand | 85 | -65 | 80 | -110 | 10 | -30 |
| L. Forearm | 140 | 0 | 10 | -10 | 0 | 0 |
| L. Upr Arm | -45 | 180 | 10 | -10 | 150 | -5 |
| R. Hand | 85 | -65 | 80 | -110 | 10 | -30 |
| R. Forearm | 140 | 0 | 10 | -10 | 0 | 0 |
| R. Upr Arm | -45 | 180 | 10 | -10 | 150 | -5 |
A helpful hint in defining the joint parameters is to make them all revolving joints and then define the range of motion possible in each of the X, Y, and Z axes. I will now demonstrate how the chart is used in defining joint behavior. For example, move your left forearm. The design of the elbow joint is such that there is a large range of motion along the X axis, a much smaller range along the Y axis, and no freedom along the Z axis. The joint information used here is simply a suggested guideline and is supplied in the manual for the Inverse Kinematics utility.
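As a rough illustration of how such a table constrains a revolving joint, the sketch below limits a requested rotation to the range listed for one object and axis. Only the numbers come from the table above; the dictionary layout and the function name `clamp_rotation` are assumptions made for this example and are not part of the Inverse Kinematics plug-in. Because some rows list the larger value first, the sketch orders each pair before clamping.

```python
# Joint ranges copied from the table above, as (from, to) pairs per axis.
# The data layout and function name are illustrative, not 3D Studio's API.
JOINT_LIMITS = {
    "L. Forearm": {"X": (140, 0), "Y": (10, -10), "Z": (0, 0)},
    "L. Foot": {"X": (40, -25), "Y": (-15, 15), "Z": (-10, 10)},
}


def clamp_rotation(obj, axis, angle):
    """Return the angle limited to the range listed for this object and axis."""
    start, end = JOINT_LIMITS[obj][axis]
    low, high = min(start, end), max(start, end)
    return max(low, min(high, angle))


if __name__ == "__main__":
    print(clamp_rotation("L. Forearm", "X", 170))  # beyond the listed range
    print(clamp_rotation("L. Forearm", "Z", 25))   # no freedom on this axis
    print(clamp_rotation("L. Foot", "Y", 5))       # already inside the range
```

A request outside the listed range is pulled back to the nearest limit, and an axis whose range is 0 to 0 cannot rotate at all, which matches the description of the elbow.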
Once all of the joint parameters are defined, it is then possible to use the settings to manipulate the model in a very quick and efficient manner. There is a problem with the version of Inverse Kinematics that is supplied with 3D Studio Version 4 in that only objects that are linked in a hierarchy can be imported into the Inverse Kinematics plug-in. This restriction to linked objects makes it difficult, if not impossible, to create an animation involving two separate entities. The example I will use deals with the hero of the story trying to sit under the shade tree. I have already defined the hierarchy and joint parameters of the hero and now I want to use the power of Inverse Kinematics to make the character sit under the tree. Herein lies the problem: I can only import objects that are linked together. Obviously the hero is not in any way attached to a tree, so I am unable to manipulate the character with respect to the tree in the IK program. If I linked the hero and the tree, then I would have a pivot point between the two objects, so whenever I tried to move the parent of all other objects, the entire hierarchy would move. In other words, if I linked all objects to the Upper Torso of the hero, whenever I moved the Upper Torso the tree would also move. I am hopeful that 3D Studio Max resolves this limitation.
Since I was unable to use the power of Inverse Kinematics I had to manually rotate each of the limbs of the characters using the Keyframer. This takes a much longer time to do but without manually moving each limb I would be unable to see how the characters and objects interact with one another during the animation process.
Creating a screenplay is basically taking a story and creating the environment in a 3-dimensional virtual world (for my thesis I animated excerpts from a Bengali comic book (Ray, 1972)). The procedure for doing the transformation is to take the clean, distinct breaks in the story and define or build scenes from this information. The model used for this thesis is as follows:
When a character is introduced to or exits from the main set and the flow of the story allows for a distinct "break", then a scene can be developed. By knowing the dialogue in this scene, character placement, actions, and movement can be developed. Another key factor in creating a screenplay is to define the proper environment or setting. An advantage of using virtual sets as opposed to physical sets is that the computer allows an animator to create any set or environment that is necessary at a very low cost in comparison to the physical construction of the set. Another advantage of the virtual set is that it may be impossible to create the desired set in the real world.
In this study, the screenplay was developed from rough ideas of environment, characters needed, character placement, and finally character movements and actions. Other features can be added to enhance the overall effect of the movie but are not critical at the time of creating the animation. An example of this may be a flock of birds flying around trees or a twinkling sun.
The first step in creating the screenplay was to create the set.
In the opening scene of the story, the little boy is sitting under
a shade tree and is sweating. When he reaches for a handkerchief
sitting next to him to wipe off his sweat, he discovers that it
has transformed into a cat. By taking this information, a rough
screenplay can be developed. I chose the setting to be
a flat, grassy field with some palm trees and a blue sky with
clouds. The logic behind these choices is simple - the green grass
and blue sky give a feeling of a nice, summer or spring day. In
order to develop the idea of a hot day, I chose to use palm trees
as opposed to oak or other seasonal trees. The distinct advantage
of using the palm tree is for effect as well as for rendering/computational
complexity. The palm tree has a much simpler geometry than do
the seasonal trees. The seasonal tree models have roughly 20,000
faces each and require much more computational resources in terms
of time and memory than do the much simpler palm tree models.

Since I am creating the screenplay from scratch I have the liberty
of adding certain visual effects that will enhance the overall
scene. One such enhancement was the notion of a yard. This was
achieved by using white picket fencing to mark off a perimeter.
I also felt that the screenplay would benefit by having the boy
walk from inside of his house to the shade tree. This gave a little
more background familiarity for the character by giving the viewer
something to relate to with the character. I chose a simple model
of a house since it is not a key factor in the progression of
the story. The underlying philosophy I employed during the entire
process is that "the simpler the models, the less computational
time required to render the animation."
Now that the set has been created, the next important phase is to animate the objects in the scene. To start the opening scene I wanted to have the character walk from inside of his house to the main tree. For me to do this, several things needed to be considered. I had to determine how far the tree was from the house and also imagine how fast the character would be traveling to get there - i.e. is he walking or running? A more technical area that needed to be addressed was that of timing. Since my target frame rate at playback is 15 frames per second (30 - 60 is optimal but for time and space considerations I'm using 15 fps) and I chose to have the character walk, I had to calculate how many frames will have elapsed while walking from the house to the tree. By guessing that it would take roughly 4 seconds to walk from the house to the tree, I can calculate that I will then need 60 frames to work with. Now I have my first two key frames - frame 0 and frame 60. Note that this is actually 61 frames, but in order to maintain simplicity I tried to place my key frames at frame numbers that are multiples of 5. This bookkeeping methodology is helpful when "debugging" an animation sequence in that the key frames are easier to remember and I can quickly traverse through the sequence and fix any problems. There's no reason to write down the key frames either since 3D Studio will color the mesh parts differently depending on when the part was last moved. For example, if I am at frame 20 and I rotate a character's arm, the rotated part of the mesh, the arm in this case, would then be displayed as a white wire frame and the rest of the mesh would be displayed using a black wire frame.
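The timing arithmetic above can be written out as a short sketch. The function names are invented for this example; the 4 second estimate, the 15 frames per second playback rate, and the habit of keeping key frames on multiples of 5 are taken from the text.

```python
def frames_needed(seconds, fps):
    """Number of frames that elapse in the given time at the given playback rate."""
    return round(seconds * fps)


def snap_to_multiple(frame, step=5):
    """Round a frame number to the nearest multiple of step."""
    return step * round(frame / step)


if __name__ == "__main__":
    last = frames_needed(4, 15)
    print(last)                  # 60, so the key frames are frame 0 and frame 60
    print(snap_to_multiple(58))  # 60
```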
To begin the animation I enter the keyframer program in 3D Studio and set my animation counter to frame 0. Here I set initial positions of all the objects and characters in the scene. At frame 0 I will place the character inside of the house with the door already open. I have to pay careful attention so that the character and all the objects are in the proper position with respect to the ground and other objects. In the animation world, objects are not "rigid" and can overlap or intersect other objects. I can properly place the objects by using the 4 different viewports in the keyframer program. By adjusting the viewports I can get any angle or view of the scene or a particular object that I need. The most common views are already defined, such as top, left, right, front, etc. but there is also the option of creating a user defined viewport by rotating the axes with the mouse. Once the proper perspective is achieved it may be necessary to zoom in on a particular area of interest, say, the character's feet with respect to the ground.
By looking at the location of the character's feet with respect to the ground, it is then possible to properly place the character so that the feet are on the ground. To move the character, you must move the object that is the parent to all other objects. In my case, to move the character I have to move the Upper Torso. If I simply move the legs or the feet, the legs will then detach from the rest of the model. I would like to clarify that by moving an object, I do not mean to simply rotate it but rather to actually change the object's location with respect to the other objects in the scene. Once the character is properly positioned with respect to the ground, simply repeat the process for each character or object in the scene.
Now it is time to animate the character by making him walk from the house to the tree. Since I'm using a 15 frames per second target playback speed, I set the current frame to 50. Move the character from the house to the tree by moving the Upper Torso from the house to the tree. Doing so will also move the children of the Upper Torso so that the entire character moves. If you were then to play back the 50 frame segment, the character would move from the house to the tree.

Now that the character is standing next to the tree, it is desirable to make the character sit under the tree using a kneeling approach. At frame 55, rotate the character's left leg back, left calf back, right leg forward, and move the entire character so that the feet are always touching the ground. What is taking place is that we are making the character start to kneel.

At frame 60, the character should be in the kneeling position by having the character's left knee on the ground and the right thigh should be parallel to the ground with the right calf perpendicular to the ground.

It is now time to have the character rotate the left leg so that
the left calf is under the right knee. The right calf should be
rotated towards the character and the right thigh should be rotated
upwards. Keep in mind that the entire model should be moved so
that the feet are touching the ground.

The final step in having the character kneel is to rotate the right leg and left leg and position the character so that the feet and the character's hips are on the ground.

As the story progresses, the little boy reaches for his handkerchief and discovers that it has transformed into a cat. To create this effect I reduced the scale of the handkerchief and increased the scale of the cat. By reducing the scale of the handkerchief I was able to make the handkerchief disappear over several frames. I chose to have the handkerchief transform into a cat over frames 80 to 150.

Once the handkerchief is small enough that it cannot be seen, I then increase the scale of the cat (which I originally altered so that the cat would not appear until needed).
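The scale change described above is an instance of tweening between key frames. The following is a minimal sketch of the idea using straight linear interpolation, which is a simplification and not necessarily the interpolation the Keyframer performs. The frame range 80 to 150 comes from the text; the hand-over at frame 115 and the scale values 1.0 and 0.0 are assumptions made for the example.

```python
def tween(frame, key_a, key_b):
    """Linearly interpolate between two (frame, value) key frames."""
    (fa, va), (fb, vb) = key_a, key_b
    if frame <= fa:
        return va
    if frame >= fb:
        return vb
    t = (frame - fa) / (fb - fa)
    return va + t * (vb - va)


# Hypothetical key frames: the split at frame 115 is an assumption.
HANDKERCHIEF = ((80, 1.0), (115, 0.0))
CAT = ((115, 0.0), (150, 1.0))

if __name__ == "__main__":
    for frame in (80, 100, 115, 130, 150):
        print(frame,
              round(tween(frame, *HANDKERCHIEF), 2),
              round(tween(frame, *CAT), 2))
```

Only the two key frames per object are specified by hand; every in-between scale value is computed.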

Once the cat appears it is necessary to have interaction between the characters. The little boy is quite surprised that the cat is there and is having trouble believing what has taken place. The next frame I will describe is frame 250 which has the main character covering his eyes in disbelief and the cat moving to stand in front of him. In order to have the boy stand, I used the same process that I described for the sitting effect in that the legs are properly rotated (with respect to how real human legs bend) and the feet must always be on the ground. All movement is done using limb rotation and by actually placing the character in proper position with respect to the ground.

At frame 320 I wanted to have the boy reach out to the cat to convince himself that he was seeing a talking cat. By rotating the boy's limbs, similar to the kneeling process, in the Keyframer, I am able to have him timidly reach out for the cat.

Later in the story the cat leaves and a bird comes to talk to the boy. To make the cat leave I moved it off of the set so that it would no longer be visible by the camera. At frame 615 the boy kneels to talk with the bird.

By following the steps that I have discussed in this chapter it is possible to see how to animate an entire scene. The process of using limb rotation is an effective alternative to using Inverse Kinematics but does take more time for the animator to achieve realistic results. As discussed in other chapters of this document I found that the Inverse Kinematics plug-in for 3D Studio Version 4 is not useful for character/object interaction. It should be noted however that Inverse Kinematics is a powerful tool used by many high-end modeling programs.
As the scene stands right now, the viewpoint is not very impressive.
Though it gives the entire scope of the scene, it also gives too
much of a view for the entire performance of the story. As the
story progresses it is important that the camera move with the
story so that the audience gets the view that is best suited for
each moment of the story. Before proper camera placement could
be achieved, it was imperative that the character placement be
developed and implemented. With this task completed it is now
simple for the animator to place cameras and their targets in
order to get a good view of the action in the scene. The method
I chose was to dolly the camera and target to give various panning
views since time does not allow me to do cut scenes for each shot.
The rendering phase of creating a computer generated movie is one of the most important steps. Here, all the animation and texture mapping are combined to make the scene look realistic - or however the animator wants it to look. This section will discuss the rendering techniques employed in this thesis and will provide information on rendering times for some complex scenes. Rendering times can vary from a few seconds to several days per image, depending on the scene complexity and the rendering model used.
The rendering part of 3D Studio allows the animator to control
many aspects of the rendering technique by simply pressing a mouse
button. The most general rendering options are available as buttons
while more in-depth control can be achieved by altering the default
values. I will first explain the general options before explaining
the more advanced topics. The general layout of the rendering screen
is as follows.
Shading Limit Flat Gouraud Phong Metal
Anti-Aliasing ON OFF
Filter Maps ON OFF
Shadows ON OFF
Mapping ON OFF
Auto-Reflect ON OFF
Force 2-Sided ON OFF
Force Wire ON OFF
Hidden Geometry SHOW HIDE
Background RESCALE TILE
Configure
File Type Targa
Driver Vibrant
Resolution 0 x 0 x 0.00
Options
Video Color Check OFF
Pixel Size 1.10
Render Alpha NO
Gamma Correction ON
Output
Display No Display Hardcopy Disk
Net ASAP Net Queue
Render
Cancel
This is a representation of the rendering screen within 3D Studio. The bold face items are buttons that are either toggled on/off or will allow you to alter the settings below the button. For example, toggle buttons are the ON OFF buttons which set whether the parameter is set to on or off, such as Anti-Aliasing. An example of a button that allows for setting other parameters is the Configure button which will bring up a separate screen in order to set the File Type, Driver, and the Resolution. An explanation of the basic options follows (Autodesk, 3D Studio).
Shading Limit
3D Studio allows the animator to choose from Flat, Gouraud, Phong, and Metal.
Flat
This is the simplest shading model for a polygon and is often referred to as constant shading. This shading model applies a single color to each face of the object depending on the location of the light source. It is the fastest shading model and yields a faceted look to a rendered object. Notice the specular highlight, or white spot, in the first 2 figures. By taking a closer look at the highlight on the sphere it is possible to see the faceted effect of flat shading.



Gouraud
This shading model will give a more realistic look to a rendered
image than will the flat shading model. A characteristic of the
Gouraud shading model is that it tends to blur the specular highlight
since it interpolates color along each face based on the 3 vertices
of the face. A closer look at the specular highlight on the sphere
demonstrates the difference between Gouraud shading and Flat shading.



Phong
This shading model yields a much more realistic effect than does the Gouraud model in that it interpolates the surface normal at each pixel based on the normals at each vertex. The result is that each pixel can be given a unique color, thus achieving a much more realistic looking image. A good example of this can be seen in the specular highlights. Comparing the various shading models' treatment of specular highlights is a good demonstration of their differences and abilities.



Metal
Metal shading produces a metallic effect and is similar to Phong shading. The difference between Metal and Phong shading is that Metal shading mixes ambient and diffuse colors differently, giving an increase in contrast of the specular highlight. This is a new shading model found in 3D Studio. It adds a swirl effect to an image.
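The difference between the Flat, Gouraud, and Phong models can be sketched for a single point inside one triangular face. This is an illustration under stated assumptions, not 3D Studio's renderer: the `highlight` function is a simplified stand-in for a specular term (the cosine between normal and light raised to a power), and the vertex normals, light direction, and function names are all invented for the example.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def highlight(normal, light, shininess=20):
    """Simplified highlight term: (N . L) raised to a power, clamped at zero."""
    n, l = normalize(normal), normalize(light)
    return max(0.0, sum(a * b for a, b in zip(n, l))) ** shininess


def flat(face_normal, light):
    """One value for the whole face, computed from the face normal."""
    return highlight(face_normal, light)


def gouraud(vertex_normals, weights, light):
    """Shade the three vertices first, then interpolate the shaded values."""
    shaded = [highlight(n, light) for n in vertex_normals]
    return sum(w * s for w, s in zip(weights, shaded))


def phong(vertex_normals, weights, light):
    """Interpolate the normal first, then shade with the interpolated normal."""
    n = tuple(sum(w * vn[k] for w, vn in zip(weights, vertex_normals))
              for k in range(3))
    return highlight(n, light)


if __name__ == "__main__":
    # Hypothetical face whose vertex normals all tilt away from the light,
    # while the interpolated normal at the centre points straight at it.
    normals = [(0.3, 0.0, 1.0), (-0.15, 0.26, 1.0), (-0.15, -0.26, 1.0)]
    centre = (1 / 3, 1 / 3, 1 / 3)  # barycentric weights of the centroid
    light = (0.0, 0.0, 1.0)
    print("gouraud:", round(gouraud(normals, centre, light), 3))
    print("phong:  ", round(phong(normals, centre, light), 3))
```

With these made-up normals the highlight falls in the middle of the face. Gouraud shading only ever sees the dimmer values at the vertices and so blurs the highlight, whereas Phong shading recovers it at the centre, which is the behavior described above.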



Anti-Aliasing
Aliasing is the term used to describe the jagged appearance of
an edge and is commonly known as "staircasing". This
is caused when an "all or nothing" approach is taken
to directly translate a scan line to a pixel, where each pixel
in the line is replaced with the line's actual color or left unchanged.
An example of aliasing can be seen in the following simple example.


The first image shows a relatively nice circle with smooth edges.
If a closer look were taken and we actually zoom in on the circle,
it is possible to see the effects of aliasing. The second image
is a zoom of part of the circle. In this picture it is possible
to see that the circle is comprised of small pixels. The arrangement
of the pixels is such that for each point along the circle's computed
edge, the pixel is either black or white.






In the above pictures, the images on the left are the original,
aliased images and the images on the right were rendered using
anti-aliasing. Notice the jagged appearance of the edge of the
ball and the rings in the images on the left. In the full size
images on the right (all are 640x480 full size) the staircase
effect, or aliasing, is greatly reduced, which shows the effect
that anti-aliasing can have on an image. It should be noted that,
due to printer resolution, the aliasing may be exaggerated in all
the images, but it is still possible to see the difference between
aliasing and anti-aliasing.
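The "all or nothing" decision and one common remedy can be sketched on a small grid. The example below uses supersampling (averaging several sub-samples per pixel), which is one standard anti-aliasing technique; the text does not say which method 3D Studio uses internally, so this is an assumption. The circle size, grid size, and function names are also invented for the example.

```python
def inside_circle(x, y, cx=8.0, cy=8.0, radius=6.0):
    return (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2


def aliased_pixel(px, py):
    """All or nothing: test only the pixel centre, giving 0.0 or 1.0."""
    return 1.0 if inside_circle(px + 0.5, py + 0.5) else 0.0


def antialiased_pixel(px, py, n=4):
    """Average an n x n grid of sub-samples, giving a coverage between 0 and 1."""
    hits = 0
    for i in range(n):
        for j in range(n):
            if inside_circle(px + (i + 0.5) / n, py + (j + 0.5) / n):
                hits += 1
    return hits / (n * n)


if __name__ == "__main__":
    row = 3  # a scan line that crosses the edge of the circle
    print([aliased_pixel(x, row) for x in range(16)])
    print([antialiased_pixel(x, row) for x in range(16)])
```

Along the chosen scan line the first list jumps directly between 0 and 1, while the second contains intermediate values at the edge pixels. Displayed as shades of gray, those intermediate values are what soften the staircase.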
Filter Maps
Toggles the filtering of mapped materials. Typically this option
should be left ON unless it is desirable to have extremely sharp
textures in the background of the scene.
Shadows
If this option is OFF then there will be no cast shadows in a
scene. This will speed up the rendering but will not give a realistic
image due to the lack of shadows.


The difference in rendering times between the two options is
significant. For the image with the shadows OFF, the rendering
time was 14 seconds. For the image with the shadows ON, the rendering
time was 27 seconds.
Mapping
If this option is OFF, the material mapping information will be
ignored; however, the image will render faster. This is useful
when test rendering an image.
Auto-Reflect
If this option is OFF, the auto-reflection maps will be ignored.
It will speed up the rendering if it is off. Auto-reflection
is when an object can be seen on another object just like the
reflection on a pane of glass or shiny object.
Force 2-Sided
If this option is OFF, only the outer, or visible side of a face
will be rendered. If this option is ON, both sides of the face
will be rendered.


With Two Sided OFF the image took 22 seconds to render. With Two
Sided ON the image took 25 seconds to render. From the above images
it is easy to see the effect that the Two Sided option has on
an image.
Force Wire
If this option is ON and Anti-Aliasing is ON, the object will
be rendered using single-pixel-width lines.

Hidden Geometry
If this option is SHOW, all objects, even hidden ones, will be
rendered. A hidden object is an object that the user hid in the
scene so that it would not be displayed while working on creating
the image.
Background
If this option is set to TILE then the background image will be tiled. If the option is RESCALE, the bitmap image that is to be the background will be re-scaled to the rendering resolution.


File Type
This is the desired output file type. The available file types
are Targa, JPEG, GIF, FLIC.
Driver
This is the current video display driver and should not be changed
unless an updated driver is available or the video card is changed.
Resolution
The output resolution of the image or animation. The format is
as follows: # pixels in the X direction x # pixels in the Y direction
x aspect ratio. Typical resolutions are 320 x 200 x .83, 640 x
480 x 1, 1024 x 768 x 1.
Video Color Check
If this option is on the colors are checked against a threshold
of valid colors. If this option is off, certain colors will tend
to blur when displayed on the computer screen.
Pixel Size
This option will smooth the edges of objects without blurring
the object. The range for Pixel Size is 1.0 to 1.5, where higher
values result in better quality rendering.
Render Alpha
If this option is on it is then possible to have a 32-bit Targa
image.
Gamma Correction
Allows the user to calibrate 3D Studio's color map to the monitor. This is often necessary when a rendered image is too dark to be easily seen on the monitor.
Output
This is where you tell the program where to send the rendering
job. The following is an explanation of each of the available
selections.
Display - this will display the rendering to the computer screen
No Display - this option will not display the rendering to the computer screen
Hardcopy - this option will send the rendering to a printer
Disk - save the rendering to a file.
Net ASAP - send the rendering job to the network for immediate rendering
Net Queue - send the rendering job to the network queue for later rendering
Render
This button starts the rendering process.
Cancel
This button cancels the rendering setup screen.
When I was ready to render the final animation, the settings I used were Phong shading, Anti-aliasing ON, Filter Maps ON, Shadows ON, Mapping ON, Auto-reflect OFF, Force 2-sided ON, Force Wire OFF, Hidden Geometry HIDE, and Background RESCALE. The resolution I chose was a 640 x 480 and the File Type was a FLIC. The remaining options were left at the default values.
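The third number in the resolution triples listed under Resolution (.83, 1, and 1) can be reproduced with a small calculation if it is read as the pixel aspect ratio needed for the image to fill a 4:3 display. That reading is an assumption on my part, as is the function name; it is offered only because it agrees with all three typical resolutions given earlier.

```python
from fractions import Fraction


def pixel_aspect(width, height, display=Fraction(4, 3)):
    """Pixel aspect ratio at which width x height pixels fill the display shape."""
    return display / Fraction(width, height)


if __name__ == "__main__":
    for w, h in ((320, 200), (640, 480), (1024, 768)):
        print(w, "x", h, "x", round(float(pixel_aspect(w, h)), 2))
```

For 320 x 200 the ratio works out to 5/6, or about .83, while 640 x 480 and 1024 x 768 already have a 4:3 shape and give 1.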
As a performance note, I rendered the animation using a Pentium
100 with 32 megabytes of RAM and a 486 DX4-100 with 16 megabytes
of RAM. I rendered 100 frame segments at a time due to the large
size of the FLICs. The Pentium averaged 30 seconds per frame and
the 486 averaged 3 minutes per frame. I am convinced that the
major difference in rendering time is due to the difference in
RAM. The 486 was swapping with the hard drive while the Pentium
did not. 3D Studio reported that the total memory necessary to
render the animation was approximately 18 megabytes. In that event,
the Pentium would never have to use a swap file whereas the 486
would.
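To put these per-frame averages in perspective, the sketch below estimates the total rendering time for the whole short on each machine. It assumes the approximate four minute running time and the 15 frames per second playback rate stated elsewhere in this thesis, and it treats the averages above as constant across all frames, which is a simplification. The function name is invented for the example.

```python
def total_render_hours(running_seconds, fps, seconds_per_frame):
    """Estimated hours to render every frame at a constant per-frame cost."""
    frames = running_seconds * fps
    return frames * seconds_per_frame / 3600


if __name__ == "__main__":
    running = 4 * 60  # approximately four minutes of animation
    print(total_render_hours(running, 15, 30))   # Pentium: 30.0 hours
    print(total_render_hours(running, 15, 180))  # 486: 180.0 hours
```

Under these assumptions the 3,600 frames would occupy the Pentium for roughly 30 hours and the 486 for roughly 180 hours.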
Adding audio to the animation was a fairly straightforward process. Once I had a rendered scene that I wanted to add audio to I had to first convert the scene from an Autodesk FLIC format to a Windows AVI format. Once the rendered scene was converted using VidEdit, I then used the Sound Recorder program that comes with Windows 95 to add dialogue. The third step was to then merge the animation and the audio together which is accomplished by a program called AviEdit and is part of the Video For Windows Developer's Kit.
After I rendered a specific scene that I wanted to add audio to, the output file from 3D Studio was an Autodesk FLIC. A FLIC file is the type of animation file output by 3D Studio and stores the animation as a series of frames played in succession. Because the format relies only on simple run-length and frame-difference encoding, a FLIC of a rendered animation is a very large file. By using the program VidEdit, I was able not only to convert the large FLIC file to an AVI file but also to compress the animation using the CINEPAK compression option. Though this process takes some time to perform, it is well worth the time spent in that it yields much smaller files.
Due to the nature of the Autodesk FLIC format, it is not possible to embed audio into a FLIC file. It was necessary to convert the FLIC files to the Windows AVI format since the AVI file can have audio embedded in it and can be compressed. By using CINEPAK compression on an AVI file, it was possible to have an AVI file 1/3 the size of the FLIC file.
Now that I have a compressed AVI file of the animation, I can use the Sound Recorder to add audio to the animation. I found that the best way to add audio to an animation is to add it to short animation sequences, perhaps a segment consisting of 30 seconds of animation. This way, if I don't like the merging of the audio and the video, I can quickly do it again until I get the desired result.
The method I used to properly synchronize the audio and video was to play the animation and simultaneously record the dialogue for it using the Sound Recorder utility. Using this method, I could then play the animation and replay the audio file to see if I properly synchronized the audio and video before merging the two files together. This is why I suggested adding audio to short animation sequences so that in the event that the audio and video do not properly synchronize, the time required to try again is minimized. Another reason to use the short sequences is that once the audio is added to the video and the file is saved which contains both audio and video, the file can possibly be very large. The size of the file depends greatly on the recording quality used for the audio. Low sampling rates will yield smaller files, whereas higher sampling rates will yield much larger files.
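The effect of recording quality on file size can be estimated with simple arithmetic for uncompressed audio. The two quality settings below are common PCM settings chosen as examples; they are assumptions, not the settings used for this thesis, and the function name is invented.

```python
def pcm_bytes(seconds, sample_rate, bits_per_sample, channels):
    """Size in bytes of uncompressed PCM audio, ignoring file headers."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels


if __name__ == "__main__":
    low = pcm_bytes(30, 11025, 8, 1)    # 11.025 kHz, 8-bit, mono
    high = pcm_bytes(30, 44100, 16, 2)  # 44.1 kHz, 16-bit, stereo
    print(low, high, high // low)       # 330750 5292000 16
```

For a 30 second segment, the higher quality setting in this example produces an audio track sixteen times larger than the lower one.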
Finally, to merge the AVI animation and the audio, I used AviEdit.
I first opened the animation file and then merged it with the
audio file. The result is an AVI file that has audio embedded
in it.
Networks can be a very useful tool for rendering large animations. 3D Studio has a network rendering feature which allows rendering jobs to be placed in a network queue where a machine on the network will take a job and render it. The drawback I found with 3D Studio's network rendering feature is that I was unable to tell multiple machines to render a job and output it as a complete animation, or FLIC. However, it is possible to manually tell the computers which segment of frames to render and then manually cut and paste the output FLICs into one FLIC. Ideally, the network rendering feature should divide the work for a single rendering job over all computers that are available to render. The version of 3D Studio that I used for my thesis, 3D Studio Version 4, does not fully utilize the power of network rendering but rather uses it as a means to allow multiple machines networked together to be used to render separate rendering jobs. The primary reason that the network rendering feature is available in the release is the hardware lock that is needed to run 3D Studio. With network rendering it is possible to have multiple machines render separate jobs, but the network must be configured using a Master/Slave relationship between the machines. In this sense, the Master machine is the computer with the hardware lock installed and the Slave machines are those computers that do not have the hardware lock installed on them. This setup is beneficial in situations where there are a large number of rendering jobs to be done and it is desirable to use multiple computers to render them. The primary difference between Master and Slave mode is that a computer running in 3D Studio Slave mode is unable to do anything other than render network jobs. A computer in Master mode is capable of performing all features of 3D Studio.
Basically, in this release of 3D Studio the network rendering feature exists to allow a network of computers with the 3D Studio software installed on them to be used as a rendering farm, but only the machine that has the hardware lock installed on it is fully capable of using 3D Studio. Slave mode is a special mode of 3D Studio that only allows the computer to render network rendering jobs. It should be noted that Autodesk recommends that the hardware lock be insured against theft since it is very expensive and without it 3D Studio will not run.
Ideally, the network rendering feature should take a network rendering job and divide the work among the available networked machines. It should take into account the speed of the various machines in the network and then distribute portions of the rendering job to the different machines. In a sense, the network rendering feature should be modeled after a parallel system where each computer is a node and the nodes communicate with one another to efficiently render the entire job.
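A minimal sketch of the kind of speed-aware division described above follows. It is a hypothetical scheme, not a feature of 3D Studio: each machine receives a contiguous block of frames in proportion to its measured speed. The speeds used in the demonstration are derived from the per-frame averages reported earlier for the Pentium 100 (30 seconds per frame, or 120 frames per hour) and the 486DX4-100 (3 minutes per frame, or 20 frames per hour); the function name and data layout are assumptions.

```python
def split_frames(first, last, frames_per_hour):
    """Divide frames first..last into contiguous blocks sized by relative speed."""
    total = last - first + 1
    weight = sum(frames_per_hour.values())
    names = list(frames_per_hour)
    blocks = {}
    start = first
    for i, name in enumerate(names):
        if i == len(names) - 1:
            count = last - start + 1  # the last machine takes the remainder
        else:
            count = round(total * frames_per_hour[name] / weight)
        blocks[name] = (start, start + count - 1)
        start += count
    return blocks


if __name__ == "__main__":
    speeds = {"Pentium 100": 120, "486DX4-100": 20}
    print(split_frames(0, 99, speeds))
```

For a 100 frame segment this gives frames 0 to 85 to the Pentium and frames 86 to 99 to the 486, so that both machines finish at nearly the same time (about 43 and 42 minutes respectively at the averages quoted).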
Using a network consisting of a Pentium 100, a Pentium 75, and a 486DX4-100, I experimented with the network rendering feature to find its capabilities. With the Pentium 100 set up as the Master and the other two computers as Slave machines, I found some interesting problems with the network rendering feature. The first problem was that if I assign a rendering job to be output as a FLIC format file, the work cannot be distributed across multiple machines on the network and only one machine can be used to render it. What I did discover, though, is that if the output of the rendering job is a series of GIFs, then the network rendering feature will assign frames to the computers on a first-come, first-served basis. The second problem was that once I place a job in the network queue, I cannot use the Master computer to participate in the rendering. Compared with the network configuration I used, a more effective use of resources would be to make the slowest computer the Master and the faster computers the Slaves. Since the Master computer can only tell the Slave computers which frames need to be rendered, the Slaves are the only computers capable of automated rendering using the 3D Studio network rendering feature. It is possible to have the Master computer help render an animation job, but I had to manually tell the Master which frames to render, say frames 0 to 50, and then submit a network rendering job for the remaining frames, from frame 51 on. This way, the Slave computers divide the work of the segment of frames that I submitted to them, and the Master computer renders frames 0 to 50 by itself.
Once all the frames of the animation are rendered, the result using the network rendering feature is a series of GIFs stored on the Master computer's hard drive. It is now necessary to manually create a FLIC file from these GIFs. It should be noted that a FLIC is essentially a single large file containing a succession of frames. Unfortunately, the tools that I found to convert a series of GIFs into a FLIC are limited and can only create a low resolution FLIC of 320x200. This entire episode was an exercise in futility, and it serves to show the limitations of the network rendering provided in 3D Studio.
Once the animation is ready to be rendered, 3D Studio offers 4 polygonal based shading models to choose from. To learn how computationally intensive rendering is, I profiled a minimal ray-tracing program. Although ray-tracing and polygonal shading are fundamentally different, it is still an interesting and worthwhile investigation. The program, minray.c, was a combined effort by Paul Heckbert, Darwyn Peachey, and Joe Cychosz to write a minimal ray tracer. A copy of the original source code and header ray.h, as well as the profiled functions, are located in the appendices.
The program minray.c was divided into 6 individual functions
that were each tested on the DLX simulator. The appendices contain
the data obtained from profiling each of the original functions.
Also included are the results from altering the functions in an
attempt to increase the level of parallelism and decrease overall
execution time. This chapter will also identify the bottlenecks
in the ray-tracing pipeline as observed from the minray.c program.
Here is a table illustrating the number of times each function
is called when minray is executed with the given ray.h file.
| Function | Frequency |
| vdot | 120978 |
| vcomb | 99408 |
| vunit | 15946 |
| intersect | 9011 |
| trace | 5998 |
| main | 1 |
Now that the frequency of each program function is known it will be beneficial to examine each of these functions using the DLX simulator to see how costly in terms of CPU resources each function is.
Since the vdot function is called the greatest number of times, it is worth examining several forms of the function to determine which implementation is faster. The methods used in examining vdot were (1) testing the original function and (2) testing a version of the function implemented as a macro. The macro version of vdot ran about 2.2 times as fast as the function version (51 cycles versus 113). By implementing a macro in place of a function, it is possible to avoid costly branches and other overhead associated with function calls by allowing the compiler to straight-line the code. At a glance the differences can be summarized as follows.
| Program version | # float stalls | # integer operations | # float operations | total # operations | total # cycles |
| function | 9 | 87 | 5 | 92 | 113 |
| macro | 9 | 37 | 5 | 42 | 51 |
A more in-depth look at the atomic differences between the function
version and the macro version reveals that for the integer operations
alone, the macro implementation has fewer adds, loads, moves,
and stores which result in a faster piece of code than its function
counterpart.
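The two forms being compared can be sketched side by side as follows. This is a minimal illustration and not the exact code that was profiled, which appears in the appendices: the names vdot_fn, VDOT, and vdot_self_check are assumptions made here, and the macro is written as a parenthesized expression rather than as an assignment.

```c
#include <assert.h>

typedef struct { double x, y, z; } vec;

/* Function form: a call typically passes copies of both structures,
 * jumps to the function body, and returns the result. */
double vdot_fn(vec A, vec B)
{
    return A.x * B.x + A.y * B.y + A.z * B.z;
}

/* Macro form: the preprocessor pastes the expression at the point of
 * use, so no call or return is generated. Each argument is evaluated
 * three times, so arguments must be free of side effects. */
#define VDOT(A, B) ((A).x * (B).x + (A).y * (B).y + (A).z * (B).z)

/* Both forms compute the same value; only the generated code differs. */
void vdot_self_check(void)
{
    vec p = {1.0, 0.0, 2.0};
    vec q = {3.0, 5.0, 0.5};
    assert(vdot_fn(p, q) == VDOT(p, q));
}
```

The expression form can be used anywhere the function could, for example `d = VDOT(a, b);`, at the cost of repeating the expansion at every use.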
The same approach used in profiling vdot was used in profiling vcomb with one exception - the original vcomb function uses a C structure data type. An added test was performed to determine the cost difference between using structures instead of using more variables. It has been suggested that when substituting more variables instead of structures, most compilers will treat each case the same and output similar machine code. With this in mind, the tests that were performed were (1) testing the original vcomb function, (2) testing a macro version of the function, and (3) testing the function version of vcomb without structures to compare with test (1). Summarized here are the results obtained from these tests.
| Program version | # float stalls | # integer operations | # float operations | total # operations | total # cycles |
| original function | 9 | 122 | 6 | 128 | 155 |
| macro | 9 | 42 | 6 | 48 | 51 |
| original w/o structures | 9 | 116 | 6 | 122 | 131 |
macro vs. original function:
A closer look at how the integer operations break down shows that the macro version requires half of the adds and nops, and only a small fraction of the loads and stores required by the function implementation.
structures vs. more variables:
After a closer examination of the integer operation distribution, the version using more variables instead of structures requires a similar number of adds, no jumps, and nearly a third fewer loads and stores. However, the original version had nearly 20% fewer nops.
The vunit function was tested somewhat differently from the previous
two because vunit calls both vdot and vcomb. Two versions of vunit
were tested: (1) the original function version and (2) a version
that uses only macros. The following table summarizes the data
obtained from the simulator.
| Program version | # float stalls | # integer operations | # float operations | total # operations | total # cycles |
| function | 36 | 246 | 13 | 259 | 331 |
| macro | 66 | 131 | 17 | 148 | 220 |
Investigating the differences in the integer operations between the 2 versions of vunit reveals that in the macro version there are fewer adds, and far fewer loads, stores and nops. The macro version does have more moves and traps than does the function version.
Inspecting the differences in the floating point operations between the versions shows that the macro implementation has more divides and more conversions from integer to floating point, which may be attributable to the testing environment.
Testing for intersect consisted of profiling the original code.
The following table summarizes the findings.
| Program version | # float stalls | # integer operations | # float operations | total # operations | total # cycles |
| original | 185 | 1990 | 130 | 2120 | 2547 |
Testing for trace consisted of profiling the original code. Below
is a summary of the results obtained from the simulator.
| Program version | # float stalls | # integer operations | # float operations | total # operations | total # cycles |
| original | 0 | 122 | 0 | 122 | 141 |
When this code was profiled, the parameter corresponding to the level was passed as a 1.
The main function was tested in the following ways: (1) test the
original main function, and (2) test the main function along with
implementing the vdot macro which was chosen due to the high frequency
with which it is called during program execution. This table provides
the results from the tests.
| Program version | # float stalls | # integer operations | # float operations | total # operations | total # cycles |
| functions | 46080 | 549971 | 20480 | 570451 | 698456 |
| vdot macro | 46080 | 493651 | 20480 | 514131 | 628824 |
One alteration is common to both forms of main: the call to the tan() function is replaced with the constant it evaluates to. The overhead of repeating this loop-invariant calculation has therefore been eliminated from the while loop in the main function of both test cases.
Given the number of times vdot is referenced and the efficiency of the macro implementation of vdot, it is surprising to see how small the improvement is in this testing scenario: total cycles fell from 698456 to 628824, a reduction of about 10%. Future testing will be done to implement other macros into the testing of main to determine their effect on program execution speed.
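One way to reason about the modest overall gain is Amdahl's law: if a fraction f of the run time is accelerated by a factor s, the overall speedup is 1 / ((1 - f) + f / s). The sketch below applies the law to the cycle counts reported in the tables. It assumes that the speedup measured for vdot in isolation (113 cycles versus 51) carries over unchanged when vdot is used inside main, which the measurements here do not establish, and the function names are hypothetical.

```c
#include <assert.h>

/* Overall speedup predicted by Amdahl's law when a fraction f of the
 * run time is accelerated by a factor s. */
double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}

/* Inverse of the law: the fraction of the original run time that the
 * accelerated code would have to occupy to explain an observed
 * overall speedup, given its local speedup s. */
double implied_fraction(double overall, double s)
{
    assert(overall >= 1.0 && s > 1.0);
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s);
}
```

Calling `implied_fraction(698456.0 / 628824.0, 113.0 / 51.0)` gives roughly 0.18. Under the stated assumption, vdot would account for only about 18% of the cycles in the profiled run of main, which would limit the benefit of speeding up vdot alone.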
The compiler for the DLX simulator is not an optimizing compiler. Further enhancements could have been made to the profiled cases had the machine code generated by the compiler been altered by hand. However, altering this machine code would have improved performance only on the DLX simulator. The goal to strive for in enhancing the execution speed of this program is software pipelining. Techniques such as loop unrolling, or straight-line code in place of branching, produce larger basic blocks and should improve performance over the existing code.
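As a small illustration of loop unrolling, the sketch below shows a dot product written as a loop and as straight-line code for three-component vectors. It is not taken from minray.c; the function names are assumptions, and whether the unrolled form is faster on the DLX simulator was not measured here.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled form: one compare and one branch for every element. */
double dot_rolled(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Fully unrolled for vectors of length 3: straight-line code with no
 * loop counter and no branch, forming a single basic block. */
double dot3_unrolled(const double *a, const double *b)
{
    assert(a != NULL && b != NULL);
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```

Because the vectors in a ray tracer of this kind always have three components, the loop bound is known in advance and the unrolled form loses no generality.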
Though 3D Studio uses polygonal rendering, it is still beneficial to see the computational overhead associated with ray-tracing.
During the course of this thesis I encountered many problems. Most of them dealt with the immense computational power and storage space required by computer graphics. In the appendices, I've included work I've done on profiling a ray-tracing algorithm. There were also problems with freeware programs that did not do what I expected or hoped they would do. This section of the thesis describes the range of problems I had and explains my solution to each. There is no easy way to divide the problems into specific categories, so I will simply discuss each one in no particular order.
One of the most common problems was storing and transporting the large animation files between CERC and my home. At home I used a 486 DX4-100 and a Pentium 100, and at CERC I used a Pentium 100. When I needed to render a scene that would take a very long time, on the order of several days of constant rendering, I would take the job home and render it on my 486. Once the rendering was complete I would then want to take it to CERC in order to demonstrate it to my advisors. The first and foremost problem with this idea was that I had to figure out how to get a file that was well over 100 megabytes from my house to CERC. At home I have a 33.6 kbps modem that I could have used to upload the large file to my CERC account and then download it once I was at CERC. Unfortunately, at 33.6 kbps, I'd be a very old graduate student by the time it finished uploading! Another problem with the modem idea is that I have limited space available on my CERC account (15 megabytes maximum). The solution I used was to buy an Iomega Zip drive, which can store 100 megabytes per disk. The drive is portable and easy to install on any PC. Another solution I could have used to transport the large animation file was to render smaller chunks of the entire animation. This, however, would not be a good solution because there is still over 100 megabytes of data to be transferred. Even if they are smaller files, they all still need to be sent to the PC at CERC.
Rendering the animation sequences requires a large amount of hard drive space for storage as well as for memory swap space. A useful technique I used at home was to run 3D Studio under Windows 95 and use the networking feature of Windows to map a network drive on one of the other PCs in the apartment, and thus store the large animation files on a network drive instead of on the local hard drive of the rendering machine. This allows the rendering computer to use all of its memory resources for rendering. The hard drive space on the rendering PC can then be used for a very large swap file if the animation requires it. The only drawback is that 3D Studio is neither entirely stable nor very fast under Windows 95. 3D Studio Version 4 was not meant to be a Windows application but rather a DOS Protected Mode application.
When I began this thesis, I developed a rough set consisting of a fenced-in area containing 7 deciduous trees, a person standing amongst the trees, and some birds flying around the trees, and I thought that I had a very simple scene description to render. I was using a Pentium 100 with 32 megabytes of RAM and was rendering a scene that was 40 frames long. I figured that it would take roughly 30 seconds to render each frame, as I was rendering the animation using the lowest possible detail settings with a screen resolution of 640x480. I had anti-aliasing turned off, flat shading, no shadows, two-sided turned off, etc. Much to my surprise, the scene took an entire weekend to render! I found out that not only did the scene require all 32 megabytes of system memory, it had also created a 50 megabyte swap file. I determined that the cause of the large swap file was the trees. Each tree consisted of 20,000 faces, or 140,000 faces for the 7 trees! In comparison, the trees that I used in place of the 7 deciduous trees were 4 palm trees that consisted of only 1400 faces.
The problem encountered in converting the FLIC file to an AVI file was that I kept running out of memory on a Pentium 100 with 32 megabytes of RAM and a maximum swap file of 700 megabytes. The original size of the FLIC file was 100 megabytes, and I could configure neither Windows 95 nor the program VidEdit to convert the large animation file. VidEdit is a utility that is available in the Video for Windows Developer's Kit. I repeatedly got the error message that the system was dangerously low on system resources! The solution was to render the entire animation sequence again, this time in 100-frame segments. The resulting FLIC files were 20 megabytes each, which allowed me to convert them to AVI files using the VidEdit utility. The resulting AVI files were 1/3 the size of the original 20 megabyte FLIC files. These AVI files are only animation files and do not have audio embedded in them yet.
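The bookkeeping for rendering an animation in fixed-size segments can be sketched as follows. This is a hypothetical helper and not a feature of 3D Studio or VidEdit: the function names are assumptions, and the 450-frame animation used in the example afterwards is made up for illustration.

```c
#include <assert.h>

/* Number of segments needed to cover frames first..last (inclusive)
 * when each segment holds at most seg_len frames. */
int segment_count(int first, int last, int seg_len)
{
    assert(seg_len > 0 && last >= first);
    return (last - first + seg_len) / seg_len;   /* ceiling division */
}

/* Frame range of segment k, counting from 0. The final segment is
 * shortened so that it ends at the last frame. */
void segment_range(int first, int last, int seg_len, int k,
                   int *start, int *end)
{
    assert(k >= 0 && k < segment_count(first, last, seg_len));
    *start = first + k * seg_len;
    *end = *start + seg_len - 1;
    if (*end > last)
        *end = last;
}
```

For a made-up animation of frames 0 to 449 and 100-frame segments, this yields five segments, the first covering frames 0 to 99 and the last covering frames 400 to 449. Each range would then be rendered to its own FLIC file and converted separately.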
Another software based problem was in using a routine, or program, known as an IPAS routine in 3D Studio. The routine I was trying to use is a freeware IPAS routine called Bones, which should allow me to animate a skeleton frame and then apply a high detail mesh object that moves as the skeleton does. I found several problems with this IPAS, including the inability to select a certain portion of a high detail mesh and instruct it to move as the corresponding portion of the skeleton does. An example would be to build a skeleton model of an arm, animate the skeletal arm, and then select the arm of a high detail mesh of a human and have those faces move with the skeletal arm. Unfortunately, the IPAS I was using does not allow me to do this. I am only allowed to select the parent object of the entire skeleton hierarchy and then select the high detail model that I want to bend. The results I obtained were not impressive. In the resulting animation the human model looked like it had bones and joints in the wrong places, and it looked very painful! The solution for this problem is 3D Studio Max with the Character Studio plug-in. This package was ordered well after the thesis was started, after I unexpectedly saw a demo. Unfortunately, 3D Studio Max has not yet arrived and there are fewer than three weeks until my defense. I will talk more about the potential of 3D Studio Max and the Character Studio plug-in in the Future Work chapter of this paper.
After working on this thesis for almost a year, I've only been able to scratch the surface of the potential of computer animation. My only regret from working on my thesis is that the days simply are not long enough! Every day I thought of new things I wanted to try and only got to work on some of them. Luckily, while working on my thesis I had the opportunity to create animation sequences for some other projects which allowed me to try a few of these ideas. This chapter contains a discussion about my work as well as some ideas of future work to build on what I've done.
One of the regrets that I have is that I was unable to get the characters to walk. Unfortunately, with 3D Studio Version 4 there is no easy way to do this and I was forced to have my characters slide instead of walk. For future work I would like to enhance the work I've done by using 3D Studio Max with the Character Studio plug-in. With the Character Studio plug-in I will be able to use Footstep Driven Keyframe Animation which allows me to place the footprints for the characters and the program will then construct the key frames to provide me with a rough sketch of their movement. I will then be able to add body swaying, strutting, skipping, dancing, or jumping with the click of a button.
Perhaps the most important feature in creating realistic computer animation is Inverse Kinematics, and in this version of 3D Studio the feature did not live up to my expectations or needs. I was hoping to be able to use Inverse Kinematics to quickly and easily create a realistic animation. Unfortunately, the version I was using did not allow me to import more than one object hierarchy into the Inverse Kinematics program. This prevented me from utilizing the potential of Inverse Kinematics. The good news for future work is that the Character Studio program allows for Advanced Inverse Kinematics, which will allow me to dynamically attach and detach an object to and from a character's hands, as when throwing a Frisbee or catching a ball. It will also be possible to have multiple object hierarchies interact so that I can animate the objects with respect to each other. The potential here is easily seen from the example of wanting the main character in the story to sit under the tree. It will be possible to use Inverse Kinematics to animate the scene and have the character sit under the tree in a fraction of the time that manually rotating and moving each limb of the character took. This is possible since both object hierarchies will be visible from within the Inverse Kinematics program.
Perhaps the most noticeable feature in my animation is that the humans are simple stick figures. I would like to use the Character Studio to be able to take the skeletons and apply a high detail mesh as well as be able to add muscle bulging and tendon effects to my characters.
Animation is a powerful medium to convey thoughts and ideas but adding audio to the animation is a much more powerful means to convey those thoughts and ideas. My method for adding audio to the animation is elementary but effective. There are several good products on the market that offer a much broader base for adding audio. The two products that I ordered to add audio tracks to the animation are 3D Studio Max and Autodesk's Animator Studio. Both packages have an audio track editor so the audio can be properly fit to the animation. By using repetition, or slightly changing the audio properties by changing the playback speed, the audio can then be fit to the animation. Unfortunately these products have not yet arrived as of 2 weeks prior to my defense.
This thesis dealt with the process of creating a computer generated animation by exploring areas such as the computational aspects of rendering, the use of a modeling package and the tools needed to create an animation, various shading models, memory requirements, video conversion and compression, and how to add audio effects to an animation. While developing my thesis I tried to convey as many aspects as possible dealing with the entire process of creating an animation. By showing these aspects, it is easy to see that creating a computer animation, or multimedia product, is a complex process that is hungry for computational power and memory as I have demonstrated by profiling the ray-tracing program. The exciting part of the future of computer animation is that the current state of computer animation is ready for tomorrow's processors. There is no limit in sight for the potential of computer animation.
As a mechanical engineer, I can see even greater potential for the use of computer animation in the area of accident reconstruction. By creating a 3D model of the scene of an accident combined with information gathered from witnesses, and then incorporating physical laws into creating the reenactment, we will be able to relive almost any accident.
Computer animation has become a powerful tool for expressing ideas and information. We see stunning visual effects in modern movies, three-dimensional weather maps on the local news, and video games - none of which would be possible without computer generated animation. I have truly enjoyed working on my thesis and sincerely hope to continue working with computer animation.
/* minimal ray tracer, hybrid version - 888 tokens
* Paul Heckbert, ucbvax!pixar!ph, 13 Jun 87
* Using tricks from Darwyn Peachey and Joe Cychosz. */
#define TOL 1e-7
#define AMBIENT vec U, black, amb
#define SPHERE struct sphere {vec cen, color; double rad, kd, ks, kt, kl, ir} \
*s, *best, sph[]
typedef struct {double x, y, z} vec;
#include "ray.h"
yx;
double u, b, tmin, sqrt(), tan();
double vdot(A, B)
vec A, B;
{
return A.x*B.x + A.y*B.y + A.z*B.z;
}
vec vcomb(a, A, B) /* aA+B */
double a;
vec A, B;
{
B.x += a*A.x;
B.y += a*A.y;
B.z += a*A.z;
return B;
}
vec vunit(A)
vec A;
{
return vcomb(1./sqrt(vdot(A, A)), A, black);
}
struct sphere *intersect(P, D)
vec P, D;
{
best = 0;
tmin = 1e30;
s = sph+NSPHERE;
while (s-->sph)
b = vdot(D, U = vcomb(-1., P, s->cen)),
u = b*b-vdot(U, U)+s->rad*s->rad,
u = u>0 ? sqrt(u) : 1e31,
u = b-u>TOL ? b-u : b+u,
tmin = u>=TOL && u<tmin ?
best = s, u : tmin;
return best;
}
vec trace(level, P, D)
vec P, D;
{
double d, eta, e;
vec N, color;
struct sphere *s, *l;
if (!level--) return black;
if (s = intersect(P, D));
else return amb;
color = amb;
eta = s->ir;
d = -vdot(D, N = vunit(vcomb(-1., P = vcomb(tmin, D, P), s->cen)));
if (d<0)
N = vcomb(-1., N, black),
eta = 1/eta,
d = -d;
l = sph+NSPHERE;
while (l-->sph)
if ((e = l->kl*vdot(N, U = vunit(vcomb(-1., P, l->cen)))) > 0 &&
intersect(P, U)==l)
color = vcomb(e, l->color, color);
U = s->color;
color.x *= U.x;
color.y *= U.y;
color.z *= U.z;
e = 1-eta*eta*(1-d*d);
/* the following is non-portable: we assume right to left arg evaluation.
* (use U before call to trace, which modifies U) */
return vcomb(s->kt,
e>0 ? trace(level, P, vcomb(eta, D, vcomb(eta*d-sqrt(e), N, black)))
: black,
vcomb(s->ks, trace(level, P, vcomb(2*d, N, D)),
vcomb(s->kd, color, vcomb(s->kl, U, black))));
}
main()
{
printf("%d %d\n", SIZE, SIZE);
while (yx<SIZE*SIZE)
U.x = yx%SIZE-SIZE/2,
U.z = SIZE/2-yx++/SIZE,
U.y = SIZE/2/tan(AOV/114.5915590261), /* 360/PI~=114 */
U = vcomb(255., trace(DEPTH, black, vunit(U)), black),
printf("%.0f %.0f %.0f\n", U); /* yowsa! non-portable! */
}
/* ray.h for test1, first test scene */
#define DEPTH 3 /* max ray tree depth */
#define SIZE 8 /* resolution of picture in x and y */
#define AOV 25 /* total angle of view in degrees */
#define NSPHERE 1 /* number of spheres */
AMBIENT = {.02, .02, .02}; /* ambient light color */
/* sphere: x y z r g b rad kd ks kt kl ir */
SPHERE = {
0., 6., .5, 1., 1., 1., .9, .05, .2, .85, 0., 1.7,
-1., 8., -.5, 1., .5, .2, 1., .7, .3, 0., .05, 1.2,
1., 8., -.5, .1, .8, .8, 1., .3, .7, 0., 0., 1.2,
3., -6., 15., 1., .8, 1., 7., 0., 0., 0., .6, 1.5,
-3., -3., 12., .8, 1., 1., 5., 0., 0., 0., .5, 1.5,
};
vdot
/* vdot_macro.c */
typedef struct {double x,y,z;} vec;
#define vdot(result,A,B) result = A.x*B.x + A.y*B.y + A.z*B.z
main()
{
vec A,B;
double result;
vdot(result,A,B);
}
vcomb
/* vcomb_profile_macro.c */
typedef struct {double x,y,z;} vec;
#define vcomb(a, A, B) \
B.x += a*A.x; \
B.y += a*A.y; \
B.z += a*A.z
main()
{
vec A,B;
double a;
vcomb(a,A,B);
}
vunit
/* vunit_profile_macro.c */
typedef struct {double x,y,z;} vec;
#define vdot(result_vdot,A,B) result_vdot = A.x*B.x + A.y*B.y + A.z*B.z
#define vcomb(a, A, B) \
B.x += a*A.x; \
B.y += a*A.y; \
B.z += a*A.z
#define vunit(result_vunit,A) \
vdot(result_vdot,A,A); \
vcomb(1.0/sqrt(result_vdot),A,A); \
result_vunit = A
main()
{
vec result_vunit,data;
double result_vdot;
vunit(result_vunit,data);
}
main profile with vdot as a macro
/* main_profile_vdot_macro.c */
#define TOL 1e-7
#define AMBIENT vec U, black, amb
#define vdot(vdot_result,A,B) vdot_result = A.x*B.x + A.y*B.y + A.z*B.z
#define SPHERE struct sphere {vec cen, color; double rad, kd, ks, kt, kl, ir;}*s, *best, sph[]
typedef struct {double x, y, z;} vec;
#include "ray.h"
yx;
#include <math.h>
double u, b, tmin, sqrt(), tan(), vdot_result;
int vdot_counter=0, vunit_counter=0, inter_counter=0, trace_counter=0,
vcomb_counter=0;
vec vcomb(a, A, B) /* aA+B */
double a;
vec A, B;
{
vcomb_counter++;
B.x += a*A.x;
B.y += a*A.y;
B.z += a*A.z;
return B;
}
vec vunit(A)
vec A;
{
vunit_counter++;
vdot(vdot_result, A, A);
return vcomb(1./sqrt(vdot_result), A, black);
}
struct sphere *intersect(P, D)
vec P, D;
{
inter_counter++;
best = 0;
tmin = 1e30;
s = sph+NSPHERE;
while (s-->sph)
U = vcomb(-1., P, s->cen),
vdot(vdot_result, D, U),
b = vdot_result,
vdot(vdot_result, U, U),
u = b*b-vdot_result+s->rad*s->rad,
u = u>0 ? sqrt(u) : 1e31,
u = b-u>TOL ? b-u : b+u,
tmin = u>=TOL && u<tmin ?
best = s, u : tmin;
return best;
}
vec trace(level, P, D)
vec P, D;
{
double d, eta, e;
vec N, color;
struct sphere *s, *l;
trace_counter++;
if (!level--) return black;
if (s = intersect(P, D));
else return amb;
color = amb;
eta = s->ir;
P = vcomb(tmin, D, P);
N = vunit(vcomb(-1.,P, s->cen));
vdot(vdot_result, D, N);
d = -vdot_result;
if (d<0)
N = vcomb(-1., N, black),
eta = 1/eta,
d = -d;
l = sph+NSPHERE;
while (l-->sph)
{
U = vunit(vcomb(-1., P, l->cen));
vdot(vdot_result, N, U);
/* shadow test for this light; it must run once per loop iteration */
if ((e = l->kl*vdot_result) > 0 && intersect(P, U)==l)
color = vcomb(e, l->color, color);
}
U = s->color;
color.x *= U.x;
color.y *= U.y;
color.z *= U.z;
e = 1-eta*eta*(1-d*d);
/* the following is non-portable: we assume right to left arg evaluation.
* (use U before call to trace, which modifies U) */
return vcomb(s->kt,
e>0 ? trace(level, P, vcomb(eta, D, vcomb(eta*d-sqrt(e), N, black)))
: black,
vcomb(s->ks, trace(level, P, vcomb(2*d, N, D)),
vcomb(s->kd, color, vcomb(s->kl, U, black))));
}
main()
{
vec result_vunit, result_trace;
while (yx<SIZE*SIZE)
{
U.x = yx%SIZE-SIZE/2;
U.z = SIZE/2-yx++/SIZE;
U.y = SIZE/0.4456;
result_vunit = vunit(U);
result_trace = trace(DEPTH, black, result_vunit);
U = vcomb(255.,result_trace, black);
/* printf("%.0f %.0f %.0f\n",U); */
/*
U = vcomb(255., trace(DEPTH, black, vunit(U)), black);
*/
}
printf("vunit_counter = %d\nvdot_counter = %d\nvcomb_counter = %i\n",vunit_counter, vdot_counter, vcomb_counter);
printf("trace_counter = %d\ninter_counter = %d\n",trace_counter, inter_counter);
}
vdot.c
/* vdot.c */
typedef struct {double x,y,z;} vec;
double vdot(A,B)
vec A,B;
{
return A.x*B.x + A.y*B.y + A.z*B.z;
}
main()
{
double result;
vec A,B;
result = vdot(A,B);
}
vcomb.c
/* vcomb.c */
typedef struct {double x,y,z;} vec;
vec vcomb(a, A, B)
double a;
vec A, B;
{
B.x += a*A.x;
B.y += a*A.y;
B.z += a*A.z;
return B;
}
main()
{
vec result,A,B;
result = vcomb(1.0,A,B);
}
vcomb without structures
/* vcomb_profile_nostruct.c */
void vcomb(a,ax,ay,az,bx,by,bz)
double a,ax,ay,az,bx,by,bz;
{
bx += a*ax;
by += a*ay;
bz += a*az;
}
main()
{
double bx,by,bz;
vcomb(1.0,2.0,3.0,4.0,bx,by,bz);
}
vunit_profile.c
/* vunit_profile.c */
typedef struct {double x,y,z;} vec;
double vdot(A,B)
vec A,B;
{
return A.x*B.x + A.y*B.y + A.z*B.z;
}
vec vcomb(a, A, B)
double a;
vec A, B;
{
B.x += a*A.x;
B.y += a*A.y;
B.z += a*A.z;
return B;
}
vec vunit(A)
vec A;
{
return vcomb(1./sqrt(vdot(A,A)),A,A);
}
main()
{
vec result, data;
result = vunit(data);
}
trace_profile.c
/* trace_profile.c */
#define TOL 1e-7
#define AMBIENT vec U, black, amb
#define SPHERE struct sphere {vec cen, color; double rad, kd, ks, kt, kl, ir;}*s, *best, sph[]
typedef struct {double x, y, z;} vec;
#include "ray.h"
double u, b, tmin, sqrt(), tan();
double vdot(A, B)
vec A, B;
{
return A.x*B.x + A.y*B.y + A.z*B.z;
}
vec vcomb(a, A, B) /* aA+B */
double a;
vec A, B;
{
B.x += a*A.x;
B.y += a*A.y;
B.z += a*A.z;
return B;
}
vec vunit(A)
vec A;
{
return vcomb(1./sqrt(vdot(A, A)), A, black);
}
struct sphere *intersect(P, D)
vec P, D;
{
best = 0;
tmin = 1e30;
s = sph+NSPHERE;
while (s-->sph)
b = vdot(D, U = vcomb(-1., P, s->cen)),
u = b*b-vdot(U, U)+s->rad*s->rad,
u = u>0 ? sqrt(u) : 1e31,
u = b-u>TOL ? b-u : b+u,
tmin = u>=TOL && u<tmin ?
best = s, u : tmin;
return best;
}
vec trace(level, P, D)
vec P, D;
{
double d, eta, e;
vec N, color;
struct sphere *s, *l;
if (!level--) return black;
if (s = intersect(P, D));
else return amb;
color = amb;
eta = s->ir;
d = -vdot(D, N = vunit(vcomb(-1., P = vcomb(tmin, D, P), s->cen)));
if (d<0)
N = vcomb(-1., N, black),
eta = 1/eta,
d = -d;
l = sph+NSPHERE;
while (l-->sph)
if ((e = l->kl*vdot(N, U = vunit(vcomb(-1., P, l->cen)))) > 0 &&
intersect(P, U)==l)
color = vcomb(e, l->color, color);
U = s->color;
color.x *= U.x;
color.y *= U.y;
color.z *= U.z;
e = 1-eta*eta*(1-d*d);
/* the following is non-portable: we assume right to left arg evaluation.
* (use U before call to trace, which modifies U) */
return vcomb(s->kt,
e>0 ? trace(level, P, vcomb(eta, D, vcomb(eta*d-sqrt(e), N, black)))
: black,
vcomb(s->ks, trace(level, P, vcomb(2*d, N, D)),
vcomb(s->kd, color, vcomb(s->kl, U, black))));
}
main()
{
vec A,B,result;
int level = 1;
result = trace(1,A,B);
}
sphere intersect profile
/* intersect_profile.c */
#define TOL 1e-7
#define AMBIENT vec U, black, amb
#define SPHERE struct sphere {vec cen, color; double rad, kd, ks, kt, kl, ir;}*s, *best, sph[]
typedef struct {double x, y, z;} vec;
#include "ray.h"
double u, b, tmin, sqrt(), tan();
double vdot(A, B)
vec A, B;
{
return A.x*B.x + A.y*B.y + A.z*B.z;
}
vec vcomb(a, A, B) /* aA+B */
double a;
vec A, B;
{
B.x += a*A.x;
B.y += a*A.y;
B.z += a*A.z;
return B;
}
vec vunit(A)
vec A;
{
return vcomb(1./sqrt(vdot(A, A)), A, black);
}
struct sphere *intersect(P, D)
vec P, D;
{
best = 0;
tmin = 1e30;
s = sph+NSPHERE;
while (s-->sph)
b = vdot(D, U = vcomb(-1., P, s->cen)),
u = b*b-vdot(U, U)+s->rad*s->rad,
u = u>0 ? sqrt(u) : 1e31,
u = b-u>TOL ? b-u : b+u,
tmin = u>=TOL && u<tmin ?
best = s, u : tmin;
return best;
}
main()
{
struct sphere *result;
vec A,B;
result = intersect(A,B);
}
main_profile.c
/* main_profile.c */
#define TOL 1e-7
#define AMBIENT vec U, black, amb
#define SPHERE struct sphere {vec cen, color; double rad, kd, ks, kt, kl, ir;}*s, *best, sph[]
typedef struct {double x, y, z;} vec;
#include "ray.h"
int yx; /* pixel index; declared with implicit int in the original */
#include <stdio.h> /* printf */
#include <math.h>
double u, b, tmin, sqrt(), tan();
int vdot_counter=0, vunit_counter=0, inter_counter=0, trace_counter=0,
vcomb_counter=0;
double vdot(A, B)
vec A, B;
{
vdot_counter++;
return A.x*B.x + A.y*B.y + A.z*B.z;
}
vec vcomb(a, A, B) /* aA+B */
double a;
vec A, B;
{
vcomb_counter++;
B.x += a*A.x;
B.y += a*A.y;
B.z += a*A.z;
return B;
}
vec vunit(A)
vec A;
{
vunit_counter++;
return vcomb(1./sqrt(vdot(A, A)), A, black);
}
struct sphere *intersect(P, D)
vec P, D;
{
inter_counter++;
best = 0;
tmin = 1e30;
s = sph+NSPHERE;
while (s-->sph)
b = vdot(D, U = vcomb(-1., P, s->cen)),
u = b*b-vdot(U, U)+s->rad*s->rad,
u = u>0 ? sqrt(u) : 1e31,
u = b-u>TOL ? b-u : b+u,
tmin = u>=TOL && u<tmin ?
best = s, u : tmin;
return best;
}
vec trace(level, P, D)
vec P, D;
{
double d, eta, e;
vec N, color;
struct sphere *s, *l;
trace_counter++;
if (!level--) return black;
if (s = intersect(P, D));
else return amb;
color = amb;
eta = s->ir;
d = -vdot(D, N = vunit(vcomb(-1., P = vcomb(tmin, D, P), s->cen)));
if (d<0)
N = vcomb(-1., N, black),
eta = 1/eta,
d = -d;
l = sph+NSPHERE;
while (l-->sph)
if ((e = l->kl*vdot(N, U = vunit(vcomb(-1., P, l->cen)))) > 0 &&
intersect(P, U)==l)
color = vcomb(e, l->color, color);
U = s->color;
color.x *= U.x;
color.y *= U.y;
color.z *= U.z;
e = 1-eta*eta*(1-d*d);
/* the following is non-portable: we assume right to left arg evaluation.
* (use U before call to trace, which modifies U) */
return vcomb(s->kt,
e>0 ? trace(level, P, vcomb(eta, D, vcomb(eta*d-sqrt(e), N, black)))
: black,
vcomb(s->ks, trace(level, P, vcomb(2*d, N, D)),
vcomb(s->kd, color, vcomb(s->kl, U, black))));
}
main()
{
vec result_vunit, result_trace;
while (yx<SIZE*SIZE)
{
U.x = yx%SIZE-SIZE/2;
U.z = SIZE/2-yx++/SIZE;
U.y = SIZE/0.4456;
result_vunit = vunit(U);
result_trace = trace(DEPTH, black, result_vunit);
U = vcomb(255.,result_trace, black);
/* printf("%.0f %.0f %.0f\n",U); */
/*
U = vcomb(255., trace(DEPTH, black, vunit(U)), black);
*/
}
printf("vunit_counter = %d\nvdot_counter = %d\nvcomb_counter = %i\n",
vunit_counter, vdot_counter, vcomb_counter);
printf("trace_counter = %d\ninter_counter = %d\n",trace_counter, inter_counter);
}
vdot macro profile
(dlxsim) load vdot_profile_macro.s
Heap (for malloc) begins at 0x39C
(dlxsim) go _main
TRAP #0 received
(dlxsim) stats
Memory size: 65536 bytes.
Floating Point Hardware Configuration
1 add/subtract units, latency = 2 cycles
1 divide units, latency = 19 cycles
1 multiply units, latency = 5 cycles
Load Stalls = 0
Floating Point Stalls = 9
No branch instructions executed.
Pending Floating Point Operations:
none.
INTEGER OPERATIONS
==================
ADD 3 ADDI 1 ADDU 0 ADDUI 1
AND 0 ANDI 0 BEQZ 0 BFPF 0
BFPT 0 BNEZ 0 J 0 JAL 1
JALR 0 JR 0 LB 0 LBU 0
LD 9 LF 0 LH 0 LHI 1
LHU 0 LW 2 MOVD 0 MOVF 0
MOVFP2I 0 MOVI2FP 0 MOVI2S 0 MOVS2I 0
OR 0 ORI 0 RFE 0 SB 0
SD 4 SEQ 0 SEQI 0 SEQU 0
SEQUI 0 SF 0 SGE 0 SGEI 0
SGEU 0 SGEUI 0 SGT 0 SGTI 0
SGTU 0 SGTUI 0 SH 0 SLE 0
SLEI 0 SLEU 0 SLEUI 0 SLL 0
SLLI 0 SLT 0 SLTI 0 SLTU 0
SLTUI 0 SNE 0 SNEI 0 SNEU 0
SNEUI 0 SRA 0 SRAI 0 SRL 0
SRLI 0 SUB 0 SUBI 0 SUBU 0
SUBUI 0 SW 2 TRAP 1 XOR 0
XORI 0 NOP 12
Total integer operations = 37
FLOATING POINT OPERATIONS
=========================
ADDD 2 ADDF 0 CVTD2F 0 CVTD2I 0
CVTF2D 0 CVTF2I 0 CVTI2D 0 CVTI2F 0
DIV 0 DIVD 0 DIVF 0 DIVU 0
EQD 0 EQF 0 GED 0 GEF 0
GTD 0 GTF 0 LED 0 LEF 0
LTD 0 LTF 0 MULT 0 MULTD 3
MULTF 0 MULTU 0 NED 0 NEF 0
SUBD 0 SUBF 0
Total floating point operations = 5
Total operations = 42
Total cycles = 51
vcomb macro profile
(dlxsim) load vcomb_profile_macro.s
Heap (for malloc) begins at 0x3B4
(dlxsim) go _main
TRAP #0 received
(dlxsim) stats
Memory size: 65536 bytes.
Floating Point Hardware Configuration
1 add/subtract units, latency = 2 cycles
1 divide units, latency = 19 cycles
1 multiply units, latency = 5 cycles
Load Stalls = 0
Floating Point Stalls = 9
No branch instructions executed.
Pending Floating Point Operations:
none.
INTEGER OPERATIONS
==================
ADD 3 ADDI 1 ADDU 0 ADDUI 1
AND 0 ANDI 0 BEQZ 0 BFPF 0
BFPT 0 BNEZ 0 J 0 JAL 1
JALR 0 JR 0 LB 0 LBU 0
LD 11 LF 0 LH 0 LHI 1
LHU 0 LW 2 MOVD 0 MOVF 0
MOVFP2I 0 MOVI2FP 0 MOVI2S 0 MOVS2I 0
OR 0 ORI 0 RFE 0 SB 0
SD 5 SEQ 0 SEQI 0 SEQU 0
SEQUI 0 SF 0 SGE 0 SGEI 0
SGEU 0 SGEUI 0 SGT 0 SGTI 0
SGTU 0 SGTUI 0 SH 0 SLE 0
SLEI 0 SLEU 0 SLEUI 0 SLL 0
SLLI 0 SLT 0 SLTI 0 SLTU 0
SLTUI 0 SNE 0 SNEI 0 SNEU 0
SNEUI 0 SRA 0 SRAI 0 SRL 0
SRLI 0 SUB 0 SUBI 0 SUBU 0
SUBUI 0 SW 2 TRAP 1 XOR 0
XORI 0 NOP 14
Total integer operations = 42
FLOATING POINT OPERATIONS
=========================
ADDD 3 ADDF 0 CVTD2F 0 CVTD2I 0
CVTF2D 0 CVTF2I 0 CVTI2D 0 CVTI2F 0
DIV 0 DIVD 0 DIVF 0 DIVU 0
EQD 0 EQF 0 GED 0 GEF 0
GTD 0 GTF 0 LED 0 LEF 0
LTD 0 LTF 0 MULT 0 MULTD 3
MULTF 0 MULTU 0 NED 0 NEF 0
SUBD 0 SUBF 0
Total floating point operations = 6
Total operations = 48
Total cycles = 57
vunit macro profile
(dlxsim) load vunit_profile_macro.s
Heap (for malloc) begins at 0x530
(dlxsim) go _main
TRAP #0 received
(dlxsim) stats
Memory size: 65536 bytes.
Floating Point Hardware Configuration
1 add/subtract units, latency = 2 cycles
1 divide units, latency = 19 cycles
1 multiply units, latency = 5 cycles
Load Stalls = 6
Floating Point Stalls = 66
No branch instructions executed.
Pending Floating Point Operations:
none.
INTEGER OPERATIONS
==================
ADD 6 ADDI 2 ADDU 0 ADDUI 4
AND 0 ANDI 0 BEQZ 0 BFPF 0
BFPT 0 BNEZ 0 J 0 JAL 4
JALR 0 JR 3 LB 0 LBU 0
LD 18 LF 0 LH 0 LHI 4
LHU 0 LW 18 MOVD 0 MOVF 0
MOVFP2I 0 MOVI2FP 3 MOVI2S 0 MOVS2I 0
OR 0 ORI 0 RFE 0 SB 0
SD 7 SEQ 0 SEQI 0 SEQU 0
SEQUI 0 SF 0 SGE 0 SGEI 0
SGEU 0 SGEUI 0 SGT 0 SGTI 0
SGTU 0 SGTUI 0 SH 0 SLE 0
SLEI 0 SLEU 0 SLEUI 0 SLL 0
SLLI 0 SLT 0 SLTI 0 SLTU 0
SLTUI 0 SNE 0 SNEI 0 SNEU 0
SNEUI 0 SRA 0 SRAI 0 SRL 0
SRLI 0 SUB 0 SUBI 3 SUBU 0
SUBUI 0 SW 18 TRAP 4 XOR 0
XORI 0 NOP 37
Total integer operations = 131
FLOATING POINT OPERATIONS
=========================
ADDD 5 ADDF 0 CVTD2F 0 CVTD2I 0
CVTF2D 0 CVTF2I 0 CVTI2D 3 CVTI2F 0
DIV 0 DIVD 3 DIVF 0 DIVU 0
EQD 0 EQF 0 GED 0 GEF 0
GTD 0 GTF 0 LED 0 LEF 0
LTD 0 LTF 0 MULT 0 MULTD 6
MULTF 0 MULTU 0 NED 0 NEF 0
SUBD 0 SUBF 0
Total floating point operations = 17
Total operations = 148
Total cycles = 220
main profile with vdot as a macro
(dlxsim) load main_profile_vdot_macro.s
Heap (for malloc) begins at 0x2744
(dlxsim) go _main
TRAP #0 received
(dlxsim) stats
Memory size: 65536 bytes.
Floating Point Hardware Configuration
1 add/subtract units, latency = 2 cycles
1 divide units, latency = 19 cycles
1 multiply units, latency = 5 cycles
Load Stalls = 68613
Floating Point Stalls = 46080
Branches: total 4097, taken 2049 (50.01%), untaken 2048 (49.99%)
Pending Floating Point Operations:
none.
INTEGER OPERATIONS
==================
ADD 18435 ADDI 23555 ADDU 0 ADDUI 21513
AND 0 ANDI 0 BEQZ 0 BFPF 0
BFPT 0 BNEZ 4097 J 5120 JAL 5123
JALR 0 JR 5122 LB 0 LBU 0
LD 35841 LF 0 LH 0 LHI 21513
LHU 0 LW 115726 MOVD 0 MOVF 0
MOVFP2I 0 MOVI2FP 4096 MOVI2S 0 MOVS2I 0
OR 0 ORI 0 RFE 0 SB 0
SD 20481 SEQ 0 SEQI 0 SEQU 0
SEQUI 0 SF 0 SGE 2048 SGEI 0
SGEU 0 SGEUI 0 SGT 1025 SGTI 0
SGTU 0 SGTUI 0 SH 0 SLE 0
SLEI 0 SLEU 0 SLEUI 0 SLL 0
SLLI 1024 SLT 0 SLTI 0 SLTU 0
SLTUI 0 SNE 1024 SNEI 0 SNEU 0
SNEUI 0 SRA 0 SRAI 2048 SRL 0
SRLI 0 SUB 2048 SUBI 5122 SUBU 0
SUBUI 0 SW 107535 TRAP 1027 XOR 0
XORI 0 NOP 90128
Total integer operations = 493651
FLOATING POINT OPERATIONS
=========================
ADDD 8192 ADDF 0 CVTD2F 0 CVTD2I 0
CVTF2D 0 CVTF2I 0 CVTI2D 2048 CVTI2F 0
DIV 0 DIVD 1024 DIVF 0 DIVU 0
EQD 0 EQF 0 GED 0 GEF 0
GTD 0 GTF 0 LED 0 LEF 0
LTD 0 LTF 0 MULT 0 MULTD 9216
MULTF 0 MULTU 0 NED 0 NEF 0
SUBD 0 SUBF 0
Total floating point operations = 20480
Total operations = 514131
Total cycles = 628824
vdot profile
(dlxsim) load vdot_profile.s
Heap (for malloc) begins at 0x464
(dlxsim) go _main
TRAP #0 received
(dlxsim) stats
Memory size: 65536 bytes.
Floating Point Hardware Configuration
1 add/subtract units, latency = 2 cycles
1 divide units, latency = 19 cycles
1 multiply units, latency = 5 cycles
Load Stalls = 12
Floating Point Stalls = 9
No branch instructions executed.
Pending Floating Point Operations:
none.
INTEGER OPERATIONS
==================
ADD 5 ADDI 3 ADDU 0 ADDUI 1
AND 0 ANDI 0 BEQZ 0 BFPF 0
BFPT 0 BNEZ 0 J 1 JAL 2
JALR 0 JR 1 LB 0 LBU 0
LD 10 LF 0 LH 0 LHI 1
LHU 0 LW 17 MOVD 0 MOVF 0
MOVFP2I 2 MOVI2FP 0 MOVI2S 0 MOVS2I 0
OR 0 ORI 0 RFE 0 SB 0
SD 4 SEQ 0 SEQI 0 SEQU 0
SEQUI 0 SF 0 SGE 0 SGEI 0
SGEU 0 SGEUI 0 SGT 0 SGTI 0
SGTU 0 SGTUI 0 SH 0 SLE 0
SLEI 0 SLEU 0 SLEUI 0 SLL 0
SLLI 0 SLT 0 SLTI 0 SLTU 0
SLTUI 0 SNE 0 SNEI 0 SNEU 0
SNEUI 0 SRA 0 SRAI 0 SRL 0
SRLI 0 SUB 0 SUBI 1 SUBU 0
SUBUI 0 SW 19 TRAP 1 XOR 0
XORI 0 NOP 19
Total integer operations = 87
FLOATING POINT OPERATIONS
=========================
ADDD 2 ADDF 0 CVTD2F 0 CVTD2I 0
CVTF2D 0 CVTF2I 0 CVTI2D 0 CVTI2F 0
DIV 0 DIVD 0 DIVF 0 DIVU 0
EQD 0 EQF 0 GED 0 GEF 0
GTD 0 GTF 0 LED 0 LEF 0
LTD 0 LTF 0 MULT 0 MULTD 3
MULTF 0 MULTU 0 NED 0 NEF 0
SUBD 0 SUBF 0
Total floating point operations = 5
Total operations = 92
Total cycles = 113
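The vdot macro profile earlier in this appendix reports 51 total cycles, against 113 for the function version profiled here. The source of vdot_profile_macro.c is not reproduced in the appendix, so the following is only a minimal sketch of what a macro equivalent of vdot might look like. The name VDOT and its exact form are assumptions for illustration, not the code that was profiled.

```c
typedef struct { double x, y, z; } vec;

/* Function form, same arithmetic as the listings (written with an
 * ANSI prototype here). Both structures are copied for the call. */
double vdot(vec A, vec B)
{
    return A.x*B.x + A.y*B.y + A.z*B.z;
}

/* Hypothetical macro form (the name VDOT is an assumption).
 * It expands inline, so no arguments are copied and no call or
 * return is executed. Each argument appears three times in the
 * expansion, so it must not have side effects. */
#define VDOT(A, B) ((A).x*(B).x + (A).y*(B).y + (A).z*(B).z)
```

A call site such as vdot(D, U = vcomb(-1., P, s->cen)) in intersect would need the assignment to U moved out of the argument list before switching to a macro of this form, since the expansion would otherwise repeat the assignment.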
vcomb profile
(dlxsim) load vcomb_profile.s
Heap (for malloc) begins at 0x5FC
(dlxsim) go _main
TRAP #0 received
(dlxsim) stats
Memory size: 65536 bytes.
Floating Point Hardware Configuration
1 add/subtract units, latency = 2 cycles
1 divide units, latency = 19 cycles
1 multiply units, latency = 5 cycles
Load Stalls = 18
Floating Point Stalls = 9
No branch instructions executed.
Pending Floating Point Operations:
none.
INTEGER OPERATIONS
==================
ADD 6 ADDI 4 ADDU 0 ADDUI 2
AND 0 ANDI 0 BEQZ 0 BFPF 0
BFPT 0 BNEZ 0 J 1 JAL 2
JALR 0 JR 1 LB 0 LBU 0
LD 11 LF 0 LH 0 LHI 2
LHU 0 LW 30 MOVD 0 MOVF 0
MOVFP2I 0 MOVI2FP 0 MOVI2S 0 MOVS2I 0
OR 0 ORI 0 RFE 0 SB 0
SD 5 SEQ 0 SEQI 0 SEQU 0
SEQUI 0 SF 0 SGE 0 SGEI 0
SGEU 0 SGEUI 0 SGT 0 SGTI 0
SGTU 0 SGTUI 0 SH 0 SLE 0
SLEI 0 SLEU 0 SLEUI 0 SLL 0
SLLI 0 SLT 0 SLTI 0 SLTU 0
SLTUI 0 SNE 0 SNEI 0 SNEU 0
SNEUI 0 SRA 0 SRAI 0 SRL 0
SRLI 0 SUB 0 SUBI 1 SUBU 0
SUBUI 0 SW 30 TRAP 1 XOR 0
XORI 0 NOP 26
Total integer operations = 122
FLOATING POINT OPERATIONS
=========================
ADDD 3 ADDF 0 CVTD2F 0 CVTD2I 0
CVTF2D 0 CVTF2I 0 CVTI2D 0 CVTI2F 0
DIV 0 DIVD 0 DIVF 0 DIVU 0
EQD 0 EQF 0 GED 0 GEF 0
GTD 0 GTF 0 LED 0 LEF 0
LTD 0 LTF 0 MULT 0 MULTD 3
MULTF 0 MULTU 0 NED 0 NEF 0
SUBD 0 SUBF 0
Total floating point operations = 6
Total operations = 128
Total cycles = 155
vcomb without structures profile
(dlxsim) load vcomb_profile_nostruct.s
Heap (for malloc) begins at 0x83C
(dlxsim) go _main
TRAP #0 received
(dlxsim) stats
Memory size: 65536 bytes.
Floating Point Hardware Configuration
1 add/subtract units, latency = 2 cycles
1 divide units, latency = 19 cycles
1 multiply units, latency = 5 cycles
Load Stalls = 0
Floating Point Stalls = 9
No branch instructions executed.
Pending Floating Point Operations:
none.
INTEGER OPERATIONS
==================
ADD 5 ADDI 3 ADDU 0 ADDUI 5
AND 0 ANDI 0 BEQZ 0 BFPF 0
BFPT 0 BNEZ 0 J 0 JAL 2
JALR 0 JR 1 LB 0 LBU 0
LD 11 LF 0 LH 0 LHI 5
LHU 0 LW 21 MOVD 0 MOVF 0
MOVFP2I 0 MOVI2FP 0 MOVI2S 0 MOVS2I 0
OR 0 ORI 0 RFE 0 SB 0
SD 5 SEQ 0 SEQI 0 SEQU 0
SEQUI 0 SF 0 SGE 0 SGEI 0
SGEU 0 SGEUI 0 SGT 0 SGTI 0
SGTU 0 SGTUI 0 SH 0 SLE 0
SLEI 0 SLEU 0 SLEUI 0 SLL 0
SLLI 0 SLT 0 SLTI 0 SLTU 0
SLTUI 0 SNE 0 SNEI 0 SNEU 0
SNEUI 0 SRA 0 SRAI 0 SRL 0
SRLI 0 SUB 0 SUBI 1 SUBU 0
SUBUI 0 SW 21 TRAP 1 XOR 0
XORI 0 NOP 35
Total integer operations = 116
FLOATING POINT OPERATIONS
=========================
ADDD 3 ADDF 0 CVTD2F 0 CVTD2I 0
CVTF2D 0 CVTF2I 0 CVTI2D 0 CVTI2F 0
DIV 0 DIVD 0 DIVF 0 DIVU 0
EQD 0 EQF 0 GED 0 GEF 0
GTD 0 GTF 0 LED 0 LEF 0
LTD 0 LTF 0 MULT 0 MULTD 3
MULTF 0 MULTU 0 NED 0 NEF 0
SUBD 0 SUBF 0
Total floating point operations = 6
Total operations = 122
Total cycles = 131
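The source of vcomb_profile_nostruct.c is not listed in this appendix. As a rough illustration of what "without structures" could mean, the sketch below places the structure-passing vcomb next to a version that takes its components as plain doubles. The function name vcomb_nostruct, its parameter list, and the use of output pointers are all assumptions, and the profiled program may have been written differently.

```c
typedef struct { double x, y, z; } vec;

/* Structure-passing form, same arithmetic as the listings (ANSI
 * prototype style): both vectors are copied in and the result is
 * copied back out. */
vec vcomb(double a, vec A, vec B)   /* aA+B */
{
    B.x += a*A.x;
    B.y += a*A.y;
    B.z += a*A.z;
    return B;
}

/* Hypothetical structure-free form (name and signature are
 * assumptions): the components travel as plain doubles and the
 * result is written through three output pointers. */
void vcomb_nostruct(double a,
                    double ax, double ay, double az,
                    double bx, double by, double bz,
                    double *rx, double *ry, double *rz)
{
    *rx = a*ax + bx;
    *ry = a*ay + by;
    *rz = a*az + bz;
}
```

Both forms perform the same three multiplies and three adds, which matches the identical floating point counts (ADDD 3, MULTD 3) in the two vcomb profiles. The difference reported by dlxsim is in the integer loads, stores, and stalls around the call.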
vunit profile
(dlxsim) load vunit_profile.s
Heap (for malloc) begins at 0x750
(dlxsim) go _main
TRAP #0 received
(dlxsim) stats
Memory size: 65536 bytes.
Floating Point Hardware Configuration
1 add/subtract units, latency = 2 cycles
1 divide units, latency = 19 cycles
1 multiply units, latency = 5 cycles
Load Stalls = 36
Floating Point Stalls = 36
No branch instructions executed.
Pending Floating Point Operations:
none.
INTEGER OPERATIONS
==================
ADD 14 ADDI 9 ADDU 0 ADDUI 2
AND 0 ANDI 0 BEQZ 0 BFPF 0
BFPT 0 BNEZ 0 J 3 JAL 5
JALR 0 JR 4 LB 0 LBU 0
LD 24 LF 0 LH 0 LHI 2
LHU 0 LW 54 MOVD 0 MOVF 0
MOVFP2I 2 MOVI2FP 1 MOVI2S 0 MOVS2I 0
OR 0 ORI 0 RFE 0 SB 0
SD 12 SEQ 0 SEQI 0 SEQU 0
SEQUI 0 SF 0 SGE 0 SGEI 0
SGEU 0 SGEUI 0 SGT 0 SGTI 0
SGTU 0 SGTUI 0 SH 0 SLE 0
SLEI 0 SLEU 0 SLEUI 0 SLL 0
SLLI 0 SLT 0 SLTI 0 SLTU 0
SLTUI 0 SNE 0 SNEI 0 SNEU 0
SNEUI 0 SRA 0 SRAI 0 SRL 0
SRLI 0 SUB 0 SUBI 4 SUBU 0
SUBUI 0 SW 56 TRAP 2 XOR 0
XORI 0 NOP 52
Total integer operations = 246
FLOATING POINT OPERATIONS
=========================
ADDD 5 ADDF 0 CVTD2F 0 CVTD2I 0
CVTF2D 0 CVTF2I 0 CVTI2D 1 CVTI2F 0
DIV 0 DIVD 1 DIVF 0 DIVU 0
EQD 0 EQF 0 GED 0 GEF 0
GTD 0 GTF 0 LED 0 LEF 0
LTD 0 LTF 0 MULT 0 MULTD 6
MULTF 0 MULTU 0 NED 0 NEF 0
SUBD 0 SUBF 0
Total floating point operations = 13
Total operations = 259
Total cycles = 331
trace profile
(dlxsim) load trace_profile.s
Heap (for malloc) begins at 0x2278
(dlxsim) go _main
TRAP #0 received
(dlxsim) stats
Memory size: 65536 bytes.
Floating Point Hardware Configuration
1 add/subtract units, latency = 2 cycles
1 divide units, latency = 19 cycles
1 multiply units, latency = 5 cycles
Load Stalls = 19
Floating Point Stalls = 0
Branches: total 1, taken 0 (0.00%), untaken 1 (100.00%)
Pending Floating Point Operations:
none.
INTEGER OPERATIONS
==================
ADD 6 ADDI 8 ADDU 0 ADDUI 2
AND 0 ANDI 0 BEQZ 0 BFPF 0
BFPT 0 BNEZ 1 J 1 JAL 2
JALR 0 JR 1 LB 0 LBU 0
LD 3 LF 0 LH 0 LHI 2
LHU 0 LW 34 MOVD 0 MOVF 0
MOVFP2I 0 MOVI2FP 0 MOVI2S 0 MOVS2I 0
OR 0 ORI 0 RFE 0 SB 0
SD 3 SEQ 0 SEQI 0 SEQU 0
SEQUI 0 SF 0 SGE 0 SGEI 0
SGEU 0 SGEUI 0 SGT 0 SGTI 0
SGTU 0 SGTUI 0 SH 0 SLE 0
SLEI 0 SLEU 0 SLEUI 0 SLL 0
SLLI 0 SLT 0 SLTI 0 SLTU 0
SLTUI 0 SNE 1 SNEI 0 SNEU 0
SNEUI 0 SRA 0 SRAI 0 SRL 0
SRLI 0 SUB 0 SUBI 1 SUBU 0
SUBUI 0 SW 35 TRAP 1 XOR 0
XORI 0 NOP 21
Total integer operations = 122
FLOATING POINT OPERATIONS
=========================
ADDD 0 ADDF 0 CVTD2F 0 CVTD2I 0
CVTF2D 0 CVTF2I 0 CVTI2D 0 CVTI2F 0
DIV 0 DIVD 0 DIVF 0 DIVU 0
EQD 0 EQF 0 GED 0 GEF 0
GTD 0 GTF 0 LED 0 LEF 0
LTD 0 LTF 0 MULT 0 MULTD 0
MULTF 0 MULTU 0 NED 0 NEF 0
SUBD 0 SUBF 0
Total floating point operations = 0
Total operations = 122
Total cycles = 141
sphere intersect profile
(dlxsim) load intersect_profile.s
Heap (for malloc) begins at 0x1458
(dlxsim) go _main
TRAP #0 received
(dlxsim) stats
Memory size: 65536 bytes.
Floating Point Hardware Configuration
1 add/subtract units, latency = 2 cycles
1 divide units, latency = 19 cycles
1 multiply units, latency = 5 cycles
Load Stalls = 242
Floating Point Stalls = 185
Branches: total 26, taken 8 (30.77%), untaken 18 (69.23%)
Pending Floating Point Operations:
none.
INTEGER OPERATIONS
==================
ADD 50 ADDI 51 ADDU 0 ADDUI 160
AND 0 ANDI 0 BEQZ 0 BFPF 5
BFPT 15 BNEZ 6 J 29 JAL 22
JALR 0 JR 21 LB 0 LBU 0
LD 233 LF 0 LH 0 LHI 160
LHU 0 LW 341 MOVD 0 MOVF 0
MOVFP2I 20 MOVI2FP 10 MOVI2S 0 MOVS2I 0
OR 0 ORI 0 RFE 0 SB 0
SD 78 SEQ 0 SEQI 0 SEQU 0
SEQUI 0 SF 0 SGE 0 SGEI 0
SGEU 0 SGEUI 0 SGT 0 SGTI 0
SGTU 0 SGTUI 0 SH 0 SLE 0
SLEI 0 SLEU 6 SLEUI 0 SLL 0
SLLI 0 SLT 0 SLTI 0 SLTU 0
SLTUI 0 SNE 0 SNEI 0 SNEU 0
SNEUI 0 SRA 0 SRAI 0 SRL 0
SRLI 0 SUB 0 SUBI 21 SUBU 0
SUBUI 0 SW 342 TRAP 6 XOR 0
XORI 0 NOP 414
Total integer operations = 1990
FLOATING POINT OPERATIONS
=========================
ADDD 45 ADDF 0 CVTD2F 0 CVTD2I 0
CVTF2D 0 CVTF2I 0 CVTI2D 0 CVTI2F 0
DIV 0 DIVD 0 DIVF 0 DIVU 0
EQD 0 EQF 0 GED 0 GEF 0
GTD 0 GTF 0 LED 10 LEF 0
LTD 10 LTF 0 MULT 0 MULTD 55
MULTF 0 MULTU 0 NED 0 NEF 0
SUBD 10 SUBF 0
Total floating point operations = 130
Total operations = 2120
Total cycles = 2547
main profile
(dlxsim) load main_profile.s
Heap (for malloc) begins at 0x283C
(dlxsim) go _main
TRAP #0 received
(dlxsim) stats
Memory size: 65536 bytes.
Floating Point Hardware Configuration
1 add/subtract units, latency = 2 cycles
1 divide units, latency = 19 cycles
1 multiply units, latency = 5 cycles
Load Stalls = 81925
Floating Point Stalls = 46080
Branches: total 4097, taken 2049 (50.01%), untaken 2048 (49.99%)
Pending Floating Point Operations:
none.
INTEGER OPERATIONS
==================
ADD 23555 ADDI 26627 ADDU 0 ADDUI 20489
AND 0 ANDI 0 BEQZ 0 BFPF 0
BFPT 0 BNEZ 4097 J 6144 JAL 6147
JALR 0 JR 6146 LB 0 LBU 0
LD 38913 LF 0 LH 0 LHI 20489
LHU 0 LW 131086 MOVD 0 MOVF 0
MOVFP2I 2048 MOVI2FP 4096 MOVI2S 0 MOVS2I 0
OR 0 ORI 0 RFE 0 SB 0
SD 22529 SEQ 0 SEQI 0 SEQU 0
SEQUI 0 SF 0 SGE 2048 SGEI 0
SGEU 0 SGEUI 0 SGT 1025 SGTI 0
SGTU 0 SGTUI 0 SH 0 SLE 0
SLEI 0 SLEU 0 SLEUI 0 SLL 0
SLLI 1024 SLT 0 SLTI 0 SLTU 0
SLTUI 0 SNE 1024 SNEI 0 SNEU 0
SNEUI 0 SRA 0 SRAI 2048 SRL 0
SRLI 0 SUB 2048 SUBI 6146 SUBU 0
SUBUI 0 SW 123919 TRAP 1027 XOR 0
XORI 0 NOP 97296
Total integer operations = 549971
FLOATING POINT OPERATIONS
=========================
ADDD 8192 ADDF 0 CVTD2F 0 CVTD2I 0
CVTF2D 0 CVTF2I 0 CVTI2D 2048 CVTI2F 0
DIV 0 DIVD 1024 DIVF 0 DIVU 0
EQD 0 EQF 0 GED 0 GEF 0
GTD 0 GTF 0 LED 0 LEF 0
LTD 0 LTF 0 MULT 0 MULTD 9216
MULTF 0 MULTU 0 NED 0 NEF 0
SUBD 0 SUBF 0
Total floating point operations = 20480
Total operations = 570451
Total cycles = 698456
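The "Total cycles" figures in the profiles above can be compared pairwise to see how much each macro substitution saved. The following is a small sketch of that arithmetic. The cycle counts are copied from the dlxsim output in this appendix, while the helper names (struct run, percent_saved, print_savings) are introduced here only for illustration. The runs are compared exactly as reported, without checking that each pair executed the same amount of work.

```c
#include <stdio.h>

struct run {
    const char *name;
    long func_cycles;   /* "Total cycles" with the routine as a function */
    long macro_cycles;  /* "Total cycles" with the routine as a macro */
};

/* Cycle totals copied from the dlxsim runs in this appendix. */
static const struct run runs[] = {
    { "vdot",                 113L,    51L     },
    { "vcomb",                155L,    57L     },
    { "vunit",                331L,    220L    },
    { "main (vdot as macro)", 698456L, 628824L }
};

/* Percentage of cycles removed when going from `before` to `after`. */
double percent_saved(long before, long after)
{
    return 100.0 * (double)(before - after) / (double)before;
}

/* Print one summary line per pair of runs. */
void print_savings(void)
{
    size_t i;
    for (i = 0; i < sizeof runs / sizeof runs[0]; i++)
        printf("%-22s %7ld -> %7ld cycles (%.1f%% fewer)\n",
               runs[i].name, runs[i].func_cycles, runs[i].macro_cycles,
               percent_saved(runs[i].func_cycles, runs[i].macro_cycles));
}
```

Calling print_savings() from a main function prints the four comparisons. For the full program, replacing only vdot with a macro removes 69632 of 698456 cycles, a little under 10 percent.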