Abstract

The goal of this thesis was to investigate the task of creating a computer generated animation and the computational demands of rendering. Autodesk's 3D Studio Version 4 package was used to draw, animate, and render the animation. I created a computer generated animation short using 3D Studio and incorporated techniques to refine the animation process. The final product contains audio effects and has a running time of approximately four minutes. Areas investigated include: (1) creating a "virtual set" using 3D Studio, (2) developing a screenplay, (3) modeling characters and objects, (4) animating the scenes, (5) timing the movements of the characters and objects, (6) rendering techniques, (7) converting and compressing animation files, (8) adding audio effects, and (9) computational requirements of rendering. In determining the computational needs of rendering, a ray-tracing program was profiled. Testing of the program involved replacing functions with macro equivalents in an effort to minimize computational overhead due to parameter passing and function calling.


Introduction

Computer animation is one of the hottest areas of computer science today. With advances in computer architectures and modeling software, computer animation is becoming a quick, reliable means of visualizing ideas. Computer animation is used in various fields such as engineering, medicine, physics, architecture, and computer science. Simply by turning on the television, one can see how computer animation and computer generated graphics have become a useful means to both educate as well as entertain us.

With Pixar's release of Toy Story, the first fully computer generated feature film, which was four years in the making, computer generated movies have made their mark in the entertainment industry. Other films, such as Jurassic Park and Twister, have brought computers into modern motion picture making by allowing for special effects never before possible.

Computer animation is much more efficient than standard cartooning in that once a computer model is created, movement and rotation are easily done by the computer. This has a definite advantage over traditional cartoon making in that a cartoonist has to physically draw and color each individual cel whereas computer animation simplifies the process by automatically "tweening" between key frames of the animation. Furthermore, by using texture mapping (the process of applying an image to a computer model), photo-realistic results can be achieved using computer animation.

While computer animation simplifies the process of creating an animation, it is still not an easy process. This thesis dealt with both the process of creating a computer generated animation using Autodesk's 3D Studio modeling package as well as the computational demands of rendering. The process itself can be divided into the following sections:

(1) creating a screenplay

(2) creating a 3D virtual world

(3) animating 3D objects

(4) rendering the animation

(5) adding audio and sound effects to the rendered movie.

This paper is divided into chapters that provide information on the process used to create a computer generated animation short as well as give background information on related topics. These chapters are described as follows:

Animation: Literature Review

This chapter is a synopsis of work that has been done in the area of computer animation and computer generated images. This chapter also contains information on the current state of computer animation in modern film making.

Comparison: Computer Animation vs. Cartooning

This chapter describes the differences between 2D and 3D computer generated animation and traditional cartooning.

Modeling Packages

This chapter describes the features found in the most popular commercial and freeware modeling packages and provides a background in terminology.

Creating the Magic

This chapter explains the procedure used to create a computer generated animation using 3D Studio. An explanation of each of the stages of the "animation pipeline" as well as programs within 3D Studio that aid in animation and 3D modeling is included. Tips and tricks found during the study are also covered in this section.

Network Rendering

This chapter describes the potential usefulness of network rendering and explains the differences between single-machine rendering and network rendering.

Profiling Results for a Minimal Ray Tracer

To illustrate the computing power needed for rendering, a ray-tracing program was profiled to give an idea of the processing "muscle" that is required during rendering. Though 3D Studio uses polygonal shading, an evaluation of ray-tracing will serve to demonstrate the demands of rendering.

Problems Encountered

This chapter contains a discussion of the problems encountered and their possible solutions.

Discussions and Future Work

This chapter describes the difficulty of this project and provides a "wish list" of things that I would like to study further.

Conclusion

A summary of my results is found in this chapter.


Animation: Literature Review

Due to the overwhelming popularity and usefulness of computer animation there is definitely no shortage of books relating to the topic. Computer animation has evolved from traditional cartooning and is a cutting edge tool that has no limitations as to its potential. This chapter describes some of the books and materials that pertain to computer animation. This thesis emphasized two areas: (1) animating a story, and (2) the computational demands of rendering the animation.

Thanks to movies such as Jurassic Park and Toy Story, there has been no shortage of information dealing with computer animation. One of the best sources of information about traditional animation, computer animation, and the rigors of rendering is Toy Story: The Making of the Animated Film (Lasseter, 1995). This book details the famous Disney approach of story telling and how they applied it to computer animation. Also in this book is a description of how the traditional cartoonists and the computer animators worked together by using story boards to describe each scene. Ironically, it turned out that the computer animators were relying too heavily on the story boards when they were creating the animation!

Before Toy Story there was Jurassic Park, Steven Spielberg's dinosaur extravaganza. While Toy Story had a "cartoon-ish" look, the goal of Jurassic Park's animators was to create realistic virtual dinosaurs. The video "The Making of Jurassic Park" (Michenaud, 1995) details how the animators combined live action with computer animation to achieve believable imagery that captivates the audience.

A wonderful book on traditional cartooning techniques is The Animation Book (Laybourne, 1979). This book explains the traditional method of creating cartoons where each individual frame of an animation must be hand drawn. Some of the topics covered in this book include cel animation, flip-books, and how to create an entire cartoon and put it on film. This information is especially useful later in this document when traditional cartooning is compared to modern computer animation.

During the course of this thesis I constantly referred to the 3D Studio Version 4 manuals (Autodesk, 3D Studio, 1994). These manuals describe some of the basics of computer animation and provide information on how to use 3D Studio Version 4. Along with these manuals I obtained a demo CD of 3D Studio Max (Kinetix, 1996) which provides technical information about the latest version of 3D Studio, as well as providing an instructive demonstration of the potential of computer animation.

A useful reference dealing with 2D computer animation is Autodesk's Animator Pro manuals (Autodesk, Animator Pro, 1994). These manuals not only contain information as to the use of Animator Pro, but also serve as a source of computer animation terminology. These manuals do a good job of explaining such concepts as "tweening" and keyframing. This information is particularly useful when describing the differences in traditional cartooning and modern 2-dimensional computer animation.

The most rigorous part of computer animation is rendering the animation. In the appendices I've included my results from profiling a ray-tracing program. Real-time rendering is becoming a more attainable goal through the research of Dr. Raghu Karinthi and his Z-Buffer Rendering paper (Karinthi, 1993). The goal of his research was to attain real-time rendering on a PC. Another excellent source of information on the demands of rendering is Andrew Glassner's "An Introduction to Ray-tracing" (Glassner, 1989) that fully explains the stages of ray-tracing and all of the computations that are required.

The book that has taught me more about graphics than any other is Introduction to Computer Graphics (Foley, 1994), which was the textbook for my first graphics course, taught by Dr. Raghu Karinthi. This book explains the various shading models in great detail and describes the basic graphic matrix operations for scaling, transforming, and rotating, all of which are used by 3D Studio during rendering.

There are many resources that can be found on the web dealing with computer animation and rendering. The largest collection of public domain modeling and rendering tools can be found at Viewpoint DataLab's Avalon web site, http://www.viewpoint.com. This site also contains many utilities for 3D Studio as well as a large collection of public domain models.

Ray-tracing has been an obsession for many people on the web and the home of ray-tracing is at http://www.povray.org. This site is dedicated to a freeware ray-tracing package called POV-Ray. There are sections of the site devoted to explaining what ray-tracing is, a file section where you can download the latest version of POV-Ray, a nice gallery of POV-Ray-traced images that artists have submitted, and help files on every aspect of ray-tracing. One of the sections of this site is a programming competition where programmers try to write enhancements to the POV-Ray program, or receive a challenge to write a minimal ray-tracer. Contained in this thesis are the results of profiling one such minimal ray-tracer. The goal of the competition was to write a ray-tracing program that contained the fewest lines of code. The program that I profiled using a DLX simulator was minray.c. This program was a combination of the best submissions compiled by Dr. Paul Heckbert at Carnegie Mellon University. Dr. Heckbert is a computer science instructor and has specialized in computer graphics. His web page contains a vast collection of information pertaining to computer animation and graphics and is located at http://www.cs.cmu.edu/afs/cs/user/ph/www/heckbert.html.

Further information on Disney's approach to traditional cartooning can be found in Christopher Finch's book The Art of Walt Disney : From Mickey Mouse to the Magic Kingdoms (Finch, 1995). Another source of information on Disney's approach to cartooning is Bob Thomas's book Disney's Art of Animation : From Mickey Mouse to Beauty and the Beast (Thomas, 1991). These two books give a detailed description of the process Disney cartoonists use to animate a story from story-boarding to the big screen.

The premier company in modern motion picture special effects is George Lucas's Industrial Light & Magic. Lucas formed this company to work on his classic Star Wars trilogy. When the company was founded in the 1970s the average age of the employees was 23! Lucas recruited the best people he could find from high school and college drama clubs to work at the small start-up company. Mark Cotta Vaz's book Industrial Light & Magic: Into the Digital Realm (Cotta Vaz, 1996) describes the work done by the special effects company. The most notable work that ILM has done is the Star Wars trilogy, Jurassic Park, Twister, Star Trek Generations and, most recently, Star Trek First Contact. With the right blend of computer animation, engineering, and art, ILM has become the undisputed leader in cutting edge imagery in Hollywood.


Comparison: Computer Animation vs. Cartooning

Persistence of vision is the term used to describe how the human eye retains an image for a brief moment, so that a rapid sequence of still pictures is perceived as smooth motion. Motion picture film conventionally achieves this illusion at 24 frames per second. This means that a traditional cartoonist has to hand draw 24 separate images for a single second of smooth flowing animation. Imagine a feature film such as Walt Disney's "Bambi" (Finch) that has a running time of 69 minutes: 69 minutes x 60 seconds x 24 frames translates to 99,360 separate animation frames, or cels, that had to be hand drawn by cartoonists. To put that number into perspective, that's one frame of animation per character in this entire document! Typically, key frames of an animation are drawn and act as the story board. Next, a team of artists known as "tweeners" draws the images that fill in the gaps. Imagine how much time is spent drawing and redrawing the same image by hand with only a minor difference between the two images to allow for the natural progression of the animation. If this "tweening" process could be automated so that a computer could draw and redraw the cels of an animation, then the turnaround time for an animation would be greatly improved. With computer animation packages, it is now possible to create an animation far more quickly than with conventional cartooning. Consider the following example.

Imagine a cartoon that consists of a circle and a large square, and in the animation it is desired to have the circle move past the square. Furthermore, it is desired that the animation last for 5 seconds. That is, from the time the circle starts moving to the time that it passes the square, five seconds will have elapsed. The first approach examined will be the traditional cartooning method followed by the more modern computerized method.

Traditional Cartooning

To first get a feel for how involved the animation is going to be, it is desirable to calculate how many frames are required for the 5 second animation. At the 24 frames per second used above, the 5 second animation will require 5 x 24 = 120 separate frames of animation. Now, the task for the cartoonist is to draw each frame by hand and combine them all into an animation sequence. The best way for the cartoonist to accomplish this is to draw the key frames of the animation, the first and last frames for instance, and then draw the necessary cels that will create a smooth transition from the first frame to the final frame (Laybourne).

Figure 1: Traditional cartooning, Keyframe 0

Figure 2: Traditional cartooning, Keyframe 120
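The frame counts quoted in this chapter follow from a single multiplication, and a short Python sketch makes the arithmetic easy to repeat for other running times. The function name frames_needed is hypothetical and is used here for illustration only; the 24 frames per second default is the rate assumed in the text.

```python
def frames_needed(seconds: float, fps: int = 24) -> int:
    """Return the number of frames for a clip of the given length."""
    return round(seconds * fps)


# The 5 second circle-and-square example.
short_clip = frames_needed(5)

# A 69 minute feature film, converted to seconds first.
feature_film = frames_needed(69 * 60)

print(short_clip, feature_film)
```

Every one of these frames must be drawn by hand in the traditional method, which is why the count matters to the cartoonist.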

Computer Animation

Using an animation package, such as Autodesk's Animator Pro, a computer animator can use a built-in tool to draw a circle and a square. Then, by specifying the key frames of the animation, the first and last frames for example, the computer animator can quickly animate the sequence (Autodesk, Animator Pro). This is done by placing the two objects in the first frame, moving to the final frame, frame 120, and placing the circle some distance past the square. With the objects placed at these key frames, the computer can then perform the tweening process by drawing the frames between the first and last frames to achieve a smooth animation.
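The tweening step can be illustrated with a minimal Python sketch. It assumes simple linear interpolation between two key frames; the source does not state which interpolation Animator Pro actually uses, and the function name tween and the screen coordinates below are hypothetical.

```python
def tween(start, end, first_frame, last_frame, frame):
    """Linearly interpolate a 2D position between two key frames."""
    # t runs from 0.0 at the first key frame to 1.0 at the last key frame.
    t = (frame - first_frame) / (last_frame - first_frame)
    return tuple(a + (b - a) * t for a, b in zip(start, end))


# Hypothetical positions of the circle at the two key frames.
circle_start = (0.0, 50.0)    # key frame 0
circle_end = (240.0, 50.0)    # key frame 120

# The animator supplies two positions; the computer fills in the rest.
positions = [tween(circle_start, circle_end, 0, 120, f) for f in range(121)]
```

Only circle_start and circle_end come from the animator. Every other entry in positions is computed, which is the work the "tweeners" would otherwise do by hand.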

Comparison

This simple example shows how readily the computer can be used to create an animation. To create the computer animation, the animator needed only to draw the circle and the square, position the objects at the key frames, and let the computer do the tweening. The entire process using the computer takes a matter of seconds. The traditional cartoonist follows a similar process in that the circle and square are drawn at the key frames, but every intermediate frame must then be drawn by hand, which increases the overall time required. In effect, the computer animator had to draw only two separate frames, while the traditional cartoonist had to draw 120 separate frames!

A 3-Dimensional Approach

The term "three-dimensional computer graphics" refers to how mesh objects in the image relate to one another in terms of size, location, appearance, and orientation (Foley). 3D graphics allow the artist to create a "virtual world" similar to a child playing with Lincoln logs. Much in the same way that the child builds a log house, the computer animator can create a virtual log house using tubes, cylinders, and boxes. As the play scene becomes more elaborate, the child may want to create a small town by building more log buildings and placing them in different locations. The 3D animator follows a similar process by creating more buildings and placing them relative to the original building. Since the child is confined by the laws of physics, he has already been bound by gravity and more importantly, the floor. In the "virtual world" there are no set bounds for the floor so the artist may choose to create a flat prairie or a mountainous region to place the buildings. Now the child chooses to add some characters to his play set and he places them inside some of the buildings, in the street, and in a field. The animator does the same thing as the child except the animator has more flexibility in the choice of characters and their appearance. By placing the characters at desired positions, both the child and the animator are ready to create a story. The child elects to have some bad guys rob the local saloon and run out of the building into the crowded street.

In the 3D world, the animator defines the movements of the characters in the street as well as the bad guys running out of the saloon. By using the concept of keyframing, the animator can simply design important, or key, frames of the animation that allow the computer to generate the images between these key frames for a smooth animation. Key frames for this example may be the people walking from one end of the street to the other. By defining how many frames it takes for the characters to walk from one end of the street to the other, key frames can be established. Assuming that the playback speed is going to be 30 frames per second and taking into account how fast the people are to walk, it is possible to determine the total number of frames required. Assume that it should take 20 seconds for the characters to walk from one end of the street to the other. That translates to 30*20, or 600 frames to make the journey. Key frames are then decided to be at frame 0 and frame 600. At frame 0 the artist places characters at opposite ends of the street so that it appears to be a busy, crowded street. Next, the artist goes to frame 600 and moves each character to the opposite end of the street. The computer will now perform the "tweening" process to produce the corresponding frames between these two key frames when the scene is rendered. In effect, the animator only placed the objects where they needed to be at certain points in the animation and let the computer do the rest of the work.


Modeling Packages

A good understanding of the various modeling and rendering packages available is a necessity in working with 3D graphics. The focus of this chapter is to provide a background in terminology (Foley). The first step is to distinguish between a modeling package and a rendering package. A modeling package allows a user to create and manipulate 3-dimensional objects. A rendering package allows the user to create a ray-traced or polygonal-shaded image from a 3-dimensional model created by a modeling package. Later in this document is an analysis of a ray-tracing program which details the computational intensity of rendering. There are many variations between modeling packages as well as rendering packages, and the differences are very important in order to achieve a desired image or animation appearance.

3-D Modeling

3D modeling packages are only as good as their tools and functions. A good modeling package should offer the ability to create simple 3D objects, or primitives, such as spheres, cubes, tubes, and cylinders, and should allow for proper placement of the objects. More complex objects can then be built from these simpler objects. A useful feature is Boolean operations, which create a more complex object by combining simpler objects. An example of where Boolean operations can be used is in creating a cube with a hole through it as shown below.

Figure 3: Rendered primitives

Figure 4: Primitive placement for Boolean Operation

Figure 5: Top View of primitive placement

Figure 6: Rendered Boolean Object

In this example, I drew a cube and a cylinder in 3D Studio using the 3D Editor facility. I then placed the cylinder inside the cube where I wanted to create the hole. Basically, I want to create a new object by subtracting the cylinder's volume from the cube. An easy way to imagine this is to think of the cylinder as a drill bit and the cube as a block of wood. I want to remove the part of the cube that overlaps with the cylinder, i.e. drill the cube with my virtual drill bit.
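The subtraction can be sketched in Python as a point-membership test: a point belongs to the drilled cube if it is inside the cube and not inside the cylinder. This is an illustration of the idea only. 3D Studio operates on mesh objects, and its internal method is not described in the source; the function names and dimensions below are hypothetical.

```python
def in_cube(p, half=1.0):
    """True if point p lies inside an axis-aligned cube centred at the origin."""
    return all(abs(c) <= half for c in p)


def in_cylinder(p, radius=0.5):
    """True if p lies inside an infinitely long cylinder along the z axis."""
    x, y, _ = p
    return x * x + y * y <= radius * radius


def in_drilled_cube(p):
    """Boolean subtraction: inside the cube AND NOT inside the cylinder."""
    return in_cube(p) and not in_cylinder(p)


print(in_drilled_cube((0.9, 0.9, 0.0)))   # near a corner of the cube
print(in_drilled_cube((0.0, 0.0, 0.0)))   # on the axis of the "drill bit"
```

A point near the corner of the cube survives the operation, while a point on the cylinder's axis is removed, just as the drill bit removes wood from the block.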

The cube looks very nice as it is but what if I wanted to make it look like an actual piece of wood? The feature that allows you to add visual realism to an object is called texture mapping which is simply applying a bitmap image to a mesh object. By using texture mapping it is possible to change many features of an object depending on the bitmap or "material" that you choose to use. Another feature of texture mapping is that it allows you to add a bumpy look to an object, for example make the cube out of concrete.

Figure 7: Wood grain texture mapping

Figure 8: Concrete texture mapping
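At its simplest, applying a bitmap to a mesh means looking up a pixel of the bitmap for each point on the surface. The Python sketch below assumes that each surface point carries mapping coordinates u and v between 0 and 1 and uses a nearest-neighbour lookup. The tiny 2 x 2 "wood" bitmap and the function name sample_texture are hypothetical, and a real renderer would normally filter between neighbouring pixels.

```python
def sample_texture(texture, u, v):
    """Nearest-neighbour lookup of a texel for mapping coordinates in [0, 1]."""
    height = len(texture)
    width = len(texture[0])
    # Scale u and v to pixel indices, clamping so that 1.0 stays in range.
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return texture[row][col]


# Hypothetical 2 x 2 bitmap standing in for a wood grain image.
wood = [["dark", "light"],
        ["light", "dark"]]

print(sample_texture(wood, 0.75, 0.25))
```

Swapping the bitmap for a different image changes the apparent material of the object without changing its geometry.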

A very useful feature for animating an object is known as Inverse Kinematics which allows for a child object to move while properly moving the parent object. A good example of this can be seen in the following images:

Figure 9: Initial position of model

Figure 10: After using Inverse Kinematics

The goal was to move the woman's hand above her head. There are several different ways that an animator could do this but none are as simple as using Inverse Kinematics. By using a pre-defined object hierarchy (discussed later in this thesis) and using a Kinematic Chain, the animation can be done by selecting the hand and dragging it above her head. With Inverse Kinematics it is also possible to define joint parameters such as type of joint and range of motion in order to get realistic movements so that the model behaves as the real object would.

For a more exciting use of Inverse Kinematics I chose to have the individual in a kneeling position. The only work involved in making this image was to drag her feet to the proper position and then move her arms. The rest of the model was properly positioned in relation to the movement of the feet and hands.

Figure 11: Using Inverse Kinematics to kneel

In contrast to the ease of animating using Inverse Kinematics is the manual process of rotating or moving each object in a hierarchy. This method takes a very long time but good detail and control can still be achieved. If an animator chooses not to use Inverse Kinematics then in order to move the character's hand over her head, the upper arm must first be rotated, then the lower arm, and finally the hand and its fingers. From this comparison it should be easy to see that using Inverse Kinematics can greatly enhance the animation process.
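The difference between the two approaches can be sketched for a simplified arm made of two rigid segments moving in a plane. The Python example below is not the algorithm used by 3D Studio, which the source does not describe. It is the standard closed-form solution for a two-segment chain, and the function names and unit segment lengths are assumptions made for illustration.

```python
import math


def two_link_ik(x, y, upper=1.0, lower=1.0):
    """Return (shoulder, elbow) angles in radians that place the hand at (x, y)."""
    dist = math.hypot(x, y)
    if dist > upper + lower or dist < abs(upper - lower):
        raise ValueError("target out of reach")
    # Law of cosines gives the bend at the elbow.
    cos_elbow = (dist ** 2 - upper ** 2 - lower ** 2) / (2 * upper * lower)
    elbow = math.acos(max(-1.0, min(1.0, cos_elbow)))
    # Aim the shoulder at the target, corrected for the bent lower arm.
    shoulder = math.atan2(y, x) - math.atan2(lower * math.sin(elbow),
                                             upper + lower * math.cos(elbow))
    return shoulder, elbow


def hand_position(shoulder, elbow, upper=1.0, lower=1.0):
    """Forward kinematics: where the hand ends up for the given joint angles."""
    elbow_x = upper * math.cos(shoulder)
    elbow_y = upper * math.sin(shoulder)
    return (elbow_x + lower * math.cos(shoulder + elbow),
            elbow_y + lower * math.sin(shoulder + elbow))


# Drag the hand to a point above the shoulder and let the solver find the angles.
angles = two_link_ik(0.5, 1.5)
print(hand_position(*angles))
```

With the manual method the animator chooses the shoulder and elbow angles and observes where the hand lands (hand_position). With Inverse Kinematics the animator chooses where the hand should be and the angles are solved for (two_link_ik).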

Creating the Magic

The following sections of this paper will describe the process I used to create the computer generated animation. The sections are divided as follows:

  1. Creating a 3D Virtual World
  2. Developing a Rough Screenplay
  3. Animating 3D Objects
  4. Camera Placement
  5. Rendering the Animation
  6. Adding Audio to the Rendered Animation

Creating a 3-D Virtual World

In this section I will explain how I created the characters and objects used in the movie. Before I was able to use the characters and the objects in the story, they had to be created, or modeled, in a 3-dimensional space. Some models are easy to create while others can be quite complex, depending on factors such as visual appearance and movement. Simple models include the grassy field, whereas complex models include characters such as the boy or the egret.

A simple model used in the story was the ground. In order to model the ground I created a very large, thin box using the 3D Editor. Then by applying mapping coordinates and using texture mapping to apply a "grassy" look to the box, I had created a large field. This object was quite simple to create and gives the proper appearance of a grassy field. While this one was simple, there are others that can become quite complex and involved.

Examples of complex models that I created are the stick figures, or "skeleton" figures, that I used in the story. By using a stick figure I can later apply a more complex mesh to it and have the stick figure act as a skeleton for the more complex model. For example, if I animate a stick figure moving its arms about and then apply a realistic looking model of a little boy to the stick figure, the model of the little boy will move as the stick figure "skeleton" does. By taking this approach I can apply any high quality mesh to the stick figure model. More information can be found in the "Problems Encountered" section of this paper.

The goal was to create a stick figure that behaves like a real human skeleton. That is, the figure needed the proper proportions and, more importantly, properly defined joints. The steps I used to create a good, working model are as follows:

  1. Draw the head, neck, upper and lower torso, and limbs in the 3D Editor
  2. Use the Keyframer's Hierarchy Link tool to define child/parent relationships of the various parts
  3. Use the Keyframer Hierarchy Object Pivot tool to define object rotation points with respect to the object's parent object
  4. Use the Inverse Kinematics utility to define joint degrees of freedom and the type of joint (i.e., hinge, ball and socket, etc.)

Drawing the Body

In order to draw a good representation of the human body I had to determine what I needed to represent and ultimately how to represent it. The important concept to realize here is that detail does not matter. The most important thing to concentrate on is proper representation of the major parts of the body. The parts I chose to represent in my model are the head, neck, upper torso, lower torso, hips, upper arms, lower arms, hands, upper legs, lower legs, and feet. Please note that the detail on the hands and feet is not important right now and they are represented simply as blocks.

Using the 3D Editor I simply created boxes of different dimensions to represent most of the body parts. The only body part that is not represented as a box is the upper torso, for which I chose a 3 sided box, i.e., a triangular box, as can be seen in the following images:

Figure 12: Various views of character skeleton

Notice in the pictures how the figure is in different positions. This is very easy to do once the proper object hierarchies are in effect as well as the proper joint definitions.

Linking It All Together

Imagine how each of your body parts moves in relation to one another. Take a look at your feet and your lower leg and consider how each of these moves with respect to the other. Now establish a parent/child hierarchy for these two body parts. Try moving your foot without moving your lower leg. Pretty easy to do. Now, try moving your lower leg without moving your foot. This is impossible. The reason this is impossible is that the foot is a child of the lower leg. That is, whenever the lower leg is moved, the foot must move as well. Perhaps the foot itself does not move but its location in 3 space does change due to the movement of your lower leg. Using this as a model I had to establish a parent/child hierarchy for the parts of the model's body that would behave most like the human body. The following chart shows the hierarchies I used for the model.

Figure 13: Hierarchy Chart

From this chart it is seen that the Right Foot is the child of the Right Lower Leg, or that the Hips are the child of the Lower Torso. In order to implement this hierarchy I linked the objects together using the Keyframer Hierarchy command.

The steps involved were:

Choose "Link Objects" from the Hierarchy menu. Here you must first specify the child object and then the parent object. Referring back to the hierarchy chart, I started with the Right Foot as my first child object. I then selected the Right Lower Leg as its parent. I then selected the Right Lower Leg as the child object and the Right Upper Leg as the parent object. I repeated this process for the Left leg objects. I then selected the Right Upper Leg as the child and selected the Hips as the parent object. I repeated this for the Left Upper Leg. At this point, if anything is done to the Hips, such as movement or rotation, every child, grandchild, and subsequent offspring will be affected. Continuing, I selected the Right Hand as the child and then selected the Right Lower Arm as its parent. I then selected the Right Lower Arm as the child and the Right Upper Arm as its parent. I repeated this process for the Left Hand, Left Lower Arm, and Left Upper Arm.

Now I needed to establish a link between the two Upper Arms and the Upper Torso. To do this, I selected the Right Upper Arm as the child object and the Upper Torso as the parent. I then repeated this process for the Left Upper Arm. At this point the arms are linked to the upper torso and the legs are attached to the hips. I then combined the upper and lower halves of the model. Remembering that the Upper Torso is the parent object to all other objects I established a link between the Hips and the Upper Torso. This was done using the Lower Torso. I selected the Hips to be a child object and the Lower Torso to be the parent object. To finish linking the body together I selected the Lower Torso as the child and the Upper Torso as the parent. At this point, any change to the Upper Torso (such as movement) will have an effect on all parts of the model.

The model is still not finished as it is missing a head and a neck. This part was confusing for me when I first started defining the hierarchies. In my mind I felt that the head had more control than did the neck, thus making the head the neck's parent. I found out however that if I selected the neck as the child and the head as the parent, I could not link the head and neck to the rest of the body. With this I had to declare the Head as the child and the Neck as the parent. In order to then complete the model I selected the Neck as the child object and the Upper Torso as the parent object. With the conclusion of this phase, the model is now ready to have the joints properly defined.
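The links described above can be written down as a simple table from each child object to its parent. The Python sketch below is only a way of checking the hierarchy on paper and is not part of 3D Studio. The names PARENT and descendants are hypothetical, while the object names and links are the ones given in the text.

```python
# Each entry maps a child object to its parent object.
PARENT = {
    "Right Foot": "Right Lower Leg",
    "Right Lower Leg": "Right Upper Leg",
    "Right Upper Leg": "Hips",
    "Left Foot": "Left Lower Leg",
    "Left Lower Leg": "Left Upper Leg",
    "Left Upper Leg": "Hips",
    "Hips": "Lower Torso",
    "Lower Torso": "Upper Torso",
    "Right Hand": "Right Lower Arm",
    "Right Lower Arm": "Right Upper Arm",
    "Right Upper Arm": "Upper Torso",
    "Left Hand": "Left Lower Arm",
    "Left Lower Arm": "Left Upper Arm",
    "Left Upper Arm": "Upper Torso",
    "Head": "Neck",
    "Neck": "Upper Torso",
}


def descendants(name):
    """List every object that is affected when `name` is moved or rotated."""
    found = []
    for child, parent in PARENT.items():
        if parent == name:
            found.append(child)
            found.extend(descendants(child))
    return found


print(descendants("Hips"))
print(len(descendants("Upper Torso")))
```

Asking for the descendants of the Hips returns the six leg objects, and asking for the descendants of the Upper Torso returns every other object in the model, which matches the behavior described above.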

Placing Object Pivot Points

Now that the objects are all linked together it is necessary to properly position the pivot points for all the objects. This is important so that whenever an object is rotated it will rotate about the proper pivot point. Another major reason that this is important is for establishing proper joint freedoms later on. This section will explain how to use the Keyframer to correctly place the pivot point for each object.

To place an object's pivot point, it is essential to have already established an object hierarchy for the model. By placing the object pivot points I am specifying the center of rotation for each object. When modifying or placing an object's pivot point, the object as well as its parent object are displayed in 4 separate viewports, which aids in correctly placing the pivot point. The process is fairly straightforward, so I will only detail one specific example, for the Right Foot. The remaining object pivot points can be placed by simply repeating the following process:

  1. Within the Keyframer program, use the Hierarchy option "Object Pivot"
  2. Select the object that will be modified
  3. Using the mouse and the four tiled viewports, properly place the pivot point
  4. Repeat this process until all object pivot points have been placed

In the Keyframer program, I used the "Object Pivot" command under the Hierarchy menu and selected the Right Foot of the model. Upon selecting the foot, I now see 4 viewports that contain the Right Foot, its parent object, and a black X. The X marks the current location of the object's pivot point. By using the four viewports which I chose to be a top view, front view, left view, and a user defined view, I could easily adjust the object pivot point by placing the X at the desired locations in the 4 viewports. Once the pivot point is in the correct position for the different viewpoints then I know that it is in the correct position for the object. The proper location for the pivot point when modeling a human joint is in the center of where the two objects, child and parent, meet.

At this point, the Right Foot can be rotated and will move about the pivot point. However, even though the object rotates about the proper point there are no constraints on the joint. That is, the foot can be rotated in abnormal directions which are not possible for humans! Also, the Right Foot can intersect the Right Lower Leg since the objects are not rigid and more importantly the joint parameters have not yet been sufficiently defined.

Figure 14: "Natural" foot position

Figure 15: "Unnatural" foot position

The first image above shows the "natural" or original position of the foot. The second image shows the foot after it has been rotated about its pivot point in a way that is not a natural human rotation.
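The effect of a pivot point can be illustrated with a small calculation. This is a minimal two-dimensional sketch, not how 3D Studio is implemented: the function name rotate_about_pivot and the ankle and toe coordinates are hypothetical. A point is rotated by expressing it relative to the pivot, rotating that offset, and adding the pivot back.

```python
import math

def rotate_about_pivot(point, pivot, degrees):
    """Rotate a 2D point about a pivot by the given angle (counter-clockwise)."""
    angle = math.radians(degrees)
    dx, dy = point[0] - pivot[0], point[1] - pivot[1]
    x = pivot[0] + dx * math.cos(angle) - dy * math.sin(angle)
    y = pivot[1] + dx * math.sin(angle) + dy * math.cos(angle)
    return (x, y)

# Hypothetical side view: ankle pivot at (0, 1), toe tip at (2, 1).
ankle, toe = (0.0, 1.0), (2.0, 1.0)
print(rotate_about_pivot(toe, ankle, 90))   # toe swings to roughly (0, 3)
print(rotate_about_pivot(toe, ankle, 200))  # nothing prevents an unnatural angle
```

Note that the function accepts any angle at all, which mirrors the situation in the figures: until joint limits are defined, the foot can be rotated into positions no human ankle allows.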

Using Inverse Kinematics

Inverse Kinematics is a plug-in utility for 3D Studio that allows an animator to create a more natural animation by simply moving a leaf object rather than moving its ancestor objects (Autodesk, 3D Studio). The main advantage of using Inverse Kinematics is that lifelike animation sequences can be created in far less time than by manually moving an ancestor object and then each of its children. For example, to animate a person dribbling a basketball, use the Inverse Kinematics plug-in to have the hand follow the ball, and IK will solve for all object positions in the hierarchy of the body. If the left hand is to be moved over the head, simply move the left hand above the head and the lower arm and upper arm movements will be taken care of automatically. Before Inverse Kinematics can be used to create an animation sequence, the objects must be prepared for use. The IK feature requires that an object hierarchy already be established and that the object pivot points have been set. The next step is to define the restrictions on the joints of the human model.
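To give a feel for what "solving" means here, the following is a minimal sketch of the textbook two-link planar case (an upper arm and a forearm reaching for a hand position). It is an assumption chosen for illustration and is not the algorithm used by the 3D Studio plug-in; the function names and the unit limb lengths are hypothetical.

```python
import math

def two_link_ik(x, y, upper, lower):
    """Return (shoulder, elbow) angles in radians that place the hand at (x, y)."""
    dist_sq = x * x + y * y
    cos_elbow = (dist_sq - upper**2 - lower**2) / (2 * upper * lower)
    cos_elbow = max(-1.0, min(1.0, cos_elbow))   # clamp unreachable targets
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(lower * math.sin(elbow),
                                             upper + lower * math.cos(elbow))
    return shoulder, elbow

def hand_position(shoulder, elbow, upper, lower):
    """Forward kinematics: where the hand ends up for the given joint angles."""
    ex, ey = upper * math.cos(shoulder), upper * math.sin(shoulder)
    return (ex + lower * math.cos(shoulder + elbow),
            ey + lower * math.sin(shoulder + elbow))

angles = two_link_ik(1.0, 1.0, upper=1.0, lower=1.0)
print(hand_position(*angles, upper=1.0, lower=1.0))  # close to (1.0, 1.0)
```

The animator supplies only the hand position; the shoulder and elbow angles fall out of the calculation, which is the convenience that Inverse Kinematics offers over rotating each limb by hand.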

The Inverse Kinematics plug-in for 3D Studio is a KXP program and must be run from within the Keyframer. After starting the Inverse Kinematics program, choose Pick Objects and then select any part of the model hierarchy in the Keyframer. Next, select the Joint Parameters button. The resulting screen is where you define the type of joint (sliding versus revolving) and then define its range of motion in terms of X, Y, Z.

The following is a table of typical joint settings (Autodesk, 3D Studio).

Object        X From   X To   Y From   Y To   Z From   Z To
L. Foot           40    -25      -15     15      -10     10
L. Shin         -135      0        0      0        0      0
L. Thigh          80    -80        0      0       10    -10
R. Foot           40    -25      -15     15      -10     10
R. Shin         -135      0        0      0        0      0
R. Thigh          80    -80        0      0       10    -10
Pelvis             0      0        0      0        0      0
Chest            -75     20       35    -35       35    -35
Neck              45    -35       15    -15       25    -25
Head              45    -45       80    -80       30    -30
L. Hand           85    -65       80   -110       10    -30
L. Forearm       140      0       10    -10        0      0
L. Upr Arm       -45    180       10    -10      150     -5
R. Hand           85    -65       80   -110       10    -30
R. Forearm       140      0       10    -10        0      0
R. Upr Arm       -45    180       10    -10      150     -5

Figure 16: Joint Parameters

A helpful hint in defining the joint parameters is to make them all revolving joints and then define the range of motion possible in each of the X, Y, and Z axes. I will now demonstrate how the chart is used in defining joint behavior. For example, move your left forearm. The design of the elbow joint is such that there is a large range of motion along the X axis, a much smaller range along the Y axis, and no freedom along the Z axis. The joint information used here is simply a suggested guideline and is supplied in the manual for the Inverse Kinematics utility.
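The way a range of motion restricts a joint can be sketched as a simple clamp. This is an illustrative assumption about how limits behave, not a description of the plug-in's internals; the names FOREARM_LIMITS and clamp_rotation are hypothetical, and the forearm values are taken from the joint table above as read from that row.

```python
# Forearm limits in degrees, read from the joint table: (from, to) per axis.
FOREARM_LIMITS = {"x": (140, 0), "y": (10, -10), "z": (0, 0)}

def clamp_rotation(requested, limits):
    """Clamp a requested rotation (degrees per axis) to the joint's allowed range."""
    clamped = {}
    for axis, angle in requested.items():
        low, high = sorted(limits[axis])
        clamped[axis] = max(low, min(high, angle))
    return clamped

print(clamp_rotation({"x": 160, "y": -25, "z": 30}, FOREARM_LIMITS))
# {'x': 140, 'y': -10, 'z': 0}
```

A request that bends the elbow too far on X is cut back to 140 degrees, the small Y range is enforced, and any Z rotation is removed entirely, matching the elbow behavior described above.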

Once all of the joint parameters are defined, it is then possible to use the settings to manipulate the model in a very quick and efficient manner. There is a problem, however, with the version of Inverse Kinematics that is supplied with 3D Studio Version 4: only objects that are linked in a hierarchy can be imported into the Inverse Kinematics plug-in. This restriction to linked objects makes it difficult, if not impossible, to create an animation involving two separate entities. The example I will use deals with the hero of the story trying to sit under the shade tree. I've already defined the hierarchy and joint parameters of the hero and now I want to use the power of Inverse Kinematics to make the character sit under the tree. Herein lies the problem: I can only import objects that are linked together. The hero is not in any way attached to the tree, so I am unable to manipulate the character with respect to the tree in the IK program. If I linked the hero and the tree, then I would have a pivot point between the two objects, so whenever I tried to move the parent of all other objects, the entire hierarchy would move. In other words, if I linked all objects to the Upper Torso of the hero, whenever I move the Upper Torso the tree will also move. I am hopeful that 3D Studio Max removes this limitation.

Since I was unable to use the power of Inverse Kinematics I had to manually rotate each of the limbs of the characters using the Keyframer. This takes a much longer time to do but without manually moving each limb I would be unable to see how the characters and objects interact with one another during the animation process.

Developing a Rough Screenplay

Creating a screenplay is basically taking a story and creating its environment in a 3-dimensional virtual world (for my thesis I animated excerpts from a Bengali comic book (Ray, 1972)). The procedure for doing the transformation is to take the clean, distinct breaks in the story and define or build scenes from this information. The model used for this thesis is as follows:

When a character is introduced to or exits from the main set and the flow of the story allows for a distinct "break", then a scene can be developed. By knowing the dialogue in this scene, character placement, actions, and movement can be developed. Another key factor in creating a screenplay is to define the proper environment or setting. An advantage of using virtual sets as opposed to physical sets is that the computer allows an animator to create any set or environment that is necessary at a very low cost in comparison to the physical construction of the set. Another advantage of the virtual set is that it may be impossible to create the desired set in the real world.

In this study, the screenplay was developed from rough ideas of environment, characters needed, character placement, and finally character movements and actions. Other features can be added to enhance the overall effect of the movie but are not critical at the time of creating the animation. An example of this may be a flock of birds flying around trees or a twinkling sun.

The first step in creating the screenplay was to create the set. In the opening scene of the story, the little boy is sitting under a shade tree and is sweating. When he reaches for a handkerchief sitting next to him to wipe off his sweat, he discovers that it has transformed into a cat. By taking this information, a rough screenplay can be developed. I chose to have the setting to be a flat, grassy field with some palm trees and a blue sky with clouds. The logic behind these choices is simple - the green grass and blue sky give a feeling of a nice, summer or spring day. In order to develop the idea of a hot day, I chose to use palm trees as opposed to oak or other seasonal trees. The distinct advantage of using the palm tree is for effect as well as for rendering/computational complexity. The palm tree has a much simpler geometry than do the seasonal trees. The seasonal tree models have roughly 20,000 faces each and require much more computational resources in terms of time and memory than do the much simpler palm tree models.

Figure 17: Developing the set

Since I am creating the screenplay from scratch I have the liberty of adding certain visual effects that will enhance the overall scene. One such enhancement was the notion of a yard. This was achieved by using white picket fencing to mark off a perimeter. I also felt that the screenplay would benefit by having the boy walk from inside of his house to the shade tree. This gave a little more background familiarity for the character by giving the viewer something to relate to with the character. I chose a simple model of a house since it is not a key factor in the progression of the story. The underlying philosophy I employed during the entire process is that "the simpler the models, the less computational time required to render the animation."

Animating 3-D Objects

Now that the set has been created, the next important phase is to animate the objects in the scene. To start the opening scene I wanted to have the character walk from inside of his house to the main tree. For me to do this, several things needed to be considered. I had to determine how far the tree was from the house and also imagine how fast the character would be traveling to get there - i.e. is he walking or running? A more technical area that needed to be addressed was that of timing. Since my target frame rate at playback is 15 frames per second (30 - 60 is optimal but for time and space considerations I'm using 15 fps) and I chose to have the character walk, I had to calculate how many frames would elapse while walking from the house to the tree. By guessing that it would take roughly 4 seconds to walk from the house to the tree, I can calculate that I will need 4 x 15 = 60 frames to work with. Now I have my first two key frames - frame 0 and frame 60. Note that this is actually 61 frames, but to keep things simple I tried to place my key frames at frame numbers that are multiples of 5. This bookkeeping methodology is helpful when "debugging" an animation sequence in that the key frames are easier to remember and I can quickly traverse the sequence and fix any problems. There's no reason to write down the key frames either, since 3D Studio will color the mesh parts differently depending on when the part was last moved. For example, if I am at frame 20 and I rotate a character's arm, the rotated part of the mesh (the arm in this case) would then be displayed as a white wire frame and the rest of the mesh would be displayed as a black wire frame.
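The timing arithmetic above can be written down as a short sketch. The helper names frames_needed and snap_to_multiple are hypothetical and exist only for illustration; the 15 fps rate and the 4 second estimate are the values given in the text.

```python
def frames_needed(seconds, fps=15):
    """Number of frames that elapse over `seconds` of playback at `fps`."""
    return round(seconds * fps)

def snap_to_multiple(frame, step=5):
    """Round a frame number to the nearest multiple of `step`."""
    return step * round(frame / step)

print(frames_needed(4))          # 60
print(frames_needed(4, fps=30))  # 120
print(snap_to_multiple(63))      # 65
```

The second call shows the cost of a higher playback rate: the same 4 second walk at 30 fps would require twice as many frames to be rendered and stored.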

To begin the animation I enter the keyframer program in 3D Studio and set my animation counter to frame 0. Here I set initial positions of all the objects and characters in the scene. At frame 0 I will place the character inside of the house with the door already open. I have to pay careful attention so that the character and all the objects are in the proper position with respect to the ground and other objects. In the animation world, objects are not "rigid" and can overlap or intersect other objects. I can properly place the objects by using the 4 different viewports in the keyframer program. By adjusting the viewports I can get any angle or view of the scene or a particular object that I need. The most common views are already defined, such as top, left, right, front, etc. but there is also the option of creating a user defined viewport by rotating the axes with the mouse. Once the proper perspective is achieved it may be necessary to zoom in on a particular area of interest, say, the character's feet with respect to the ground.

By looking at the location of the character's feet with respect to the ground, it is then possible to properly place the character so that the feet are on the ground. To move the character, you must move the object that is the parent to all other objects. In my case, to move the character I have to move the Upper Torso. If I simply move the legs or the feet, the legs will then detach from the rest of the model. I would like to clarify that by moving an object, I do not mean to simply rotate it but rather to actually change the object's location with respect to the other objects in the scene. Once the character is properly positioned with respect to the ground, simply repeat the process for each character or object in the scene.

Now it is time to animate the character by making him walk from the house to the tree. Since I'm using a target playback speed of 15 frames per second, I set the current frame to 50. Move the character from the house to the tree by moving the Upper Torso from the house to the tree. Doing so will also move the children of the Upper Torso so that the entire character moves. If you were to then play back the 50 frame segment, the character would move from the house to the tree.

Figure 18: Frame 50, Character is standing

Now that the character is standing next to the tree, it is desirable to make the character sit under the tree using a kneeling approach. At frame 55, rotate the character's left leg back, left calf back, right leg forward, and move the entire character so that the feet are always touching the ground. What is taking place is that we are making the character start to kneel.

Figure 19: Frame 55, Character starting to kneel

At frame 60, the character should be in the kneeling position: the left knee is on the ground, the right thigh is parallel to the ground, and the right calf is perpendicular to the ground.

Figure 20: Frame 60, Character is kneeling

It is now time to have the character rotate the left leg so that the left calf is under the right knee. The right calf should be rotated towards the character and the right thigh should be rotated upwards. Keep in mind that the entire model should be moved so that the feet are touching the ground.

Figure 21: Frame 70, Getting into sitting position

The final step in having the character sit is to rotate the right leg and left leg and position the character so that the feet and the character's hips are on the ground.

Figure 22: Frame 75, Character is sitting

As the story progresses, the little boy reaches for his handkerchief and discovers that it has transformed into a cat. To create this effect I reduced the scale of the handkerchief and increased the scale of the cat. By reducing the scale of the handkerchief I was able to make the handkerchief disappear over several frames. I chose to have the handkerchief transform into a cat over frames 80 to 150.

Figure 23: Character reaches for handkerchief

Once the handkerchief is small enough that it cannot be seen, I then increase the scale of the cat (which I originally altered so that the cat would not appear until needed).
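The scale change between two key frames can be sketched as follows. This is a simplified illustration that assumes straight linear interpolation; the interpolation 3D Studio actually applies between keys may differ. The split of the 80 to 150 frame range at frame 115 and the scale values of 1.0 and 0.0 are hypothetical choices for the example.

```python
def lerp(start, end, t):
    """Linear interpolation between start and end for t in [0, 1]."""
    return start + (end - start) * t

def scale_at(frame, key_a, key_b):
    """Scale at `frame`, given two (frame, scale) keys; held constant outside them."""
    (fa, sa), (fb, sb) = key_a, key_b
    if frame <= fa:
        return sa
    if frame >= fb:
        return sb
    return lerp(sa, sb, (frame - fa) / (fb - fa))

# Hypothetical keys: the handkerchief shrinks over frames 80-115,
# then the cat grows over frames 115-150.
for frame in (80, 100, 115, 130, 150):
    hanky = scale_at(frame, (80, 1.0), (115, 0.0))
    cat = scale_at(frame, (115, 0.0), (150, 1.0))
    print(frame, round(hanky, 2), round(cat, 2))
```

Only the key frames are set by hand; every in-between value is computed, which is the "tweening" that makes the transformation appear gradual.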

Figure 24: Handkerchief transforms into a cat

Once the cat appears it is necessary to have interaction between the characters. The little boy is quite surprised that the cat is there and is having trouble believing what has taken place. The next frame I will describe is frame 250 which has the main character covering his eyes in disbelief and the cat moving to stand in front of him. In order to have the boy stand, I used the same process that I described for the sitting effect in that the legs are properly rotated (with respect to how real human legs bend) and the feet must always be on the ground. All movement is done using limb rotation and by actually placing the character in proper position with respect to the ground.

Figure 25: Example of limb rotation

At frame 320 I wanted to have the boy reach out to the cat to convince himself that he was seeing a talking cat. By rotating the boy's limbs, similar to the kneeling process, in the Keyframer, I am able to have him timidly reach out for the cat.

Figure 26: Repeating kneeling process

Later in the story the cat leaves and a bird comes to talk to the boy. To make the cat leave I moved it off of the set so that it would no longer be visible by the camera. At frame 615 the boy kneels to talk with the bird.

Figure 27: Boy kneels to talk to bird

By following the steps that I have discussed in this chapter it is possible to see how to animate an entire scene. The process of using limb rotation is an effective alternative to using Inverse Kinematics but does take more time for the animator to achieve realistic results. As discussed in other chapters of this document I found that the Inverse Kinematics plug-in for 3D Studio Version 4 is not useful for character/object interaction. It should be noted however that Inverse Kinematics is a powerful tool used by many high-end modeling programs.

Camera Placement

As the scene stands right now, the viewpoint is not very impressive. Though it gives the entire scope of the scene, it also gives too much of a view for the entire performance of the story. As the story progresses it is important that the camera move with the story so that the audience gets the view that is best suited for each moment of the story. Before proper camera placement could be achieved, it was imperative that the character placement be developed and implemented. With this task completed it is now simple for the animator to place cameras and their targets in order to get a good view of the action in the scene. The method I chose was to dolly the camera and target to give various panning views since time does not allow me to do cut scenes for each shot.

Render the Animation

The rendering phase of creating a computer generated movie is one of the most important steps. Here, all the animation and texture mapping are combined to make the scene look realistic - or however the animator wants it to look. This section will discuss the rendering techniques employed in this thesis and will provide information on rendering times for some complex scenes. Depending on the scene complexity and the rendering model used, rendering times can vary from a few seconds to several days per image.

The rendering part of 3D Studio allows the animator to control many aspects of the rendering technique by simply pressing a mouse button. The most general rendering options are available as buttons, while more in-depth control can be achieved by altering the default values. I will first explain the general options before explaining the more advanced topics. The general layout of the rendering screen is as follows.

Shading Limit       Flat   Gouraud   Phong   Metal
Anti-Aliasing       ON   OFF
Filter Maps         ON   OFF
Shadows             ON   OFF
Mapping             ON   OFF
Auto-Reflect        ON   OFF
Force 2-Sided       ON   OFF
Force Wire          ON   OFF
Hidden Geometry     SHOW   HIDE
Background          RESCALE   TILE

Configure
    File Type           Targa
    Driver              Vibrant
    Resolution          0 x 0 x 0.00

Options
    Video Color Check   OFF
    Pixel Size          1.10
    Render Alpha        NO
    Gamma Correction    ON

Output
    Display   No Display   Hardcopy   Disk
    Net ASAP   Net Queue

Render

Cancel

This is a representation of the rendering screen within 3D Studio. The boldface items are buttons that either toggle a setting or allow you to alter the settings listed below the button. For example, the ON and OFF buttons beside Anti-Aliasing toggle that parameter on or off, while the Configure button brings up a separate screen for setting the File Type, Driver, and Resolution. An explanation of the basic options follows (Autodesk, 3D Studio).

Shading Limit

3D Studio allows the animator to choose from Flat, Gouraud, Phong, and Metal.

Flat

This is the simplest shading model for a polygon and is often referred to as constant shading. This shading model applies a single color to each face of the object depending on the location of the light source. It is the fastest shading model and yields a faceted look to a rendered object. Notice the specular highlight, or white spot, in the two figures below. By taking a closer look at the highlight on the sphere it is possible to see the faceted effect of flat shading.

Figure 28: Flat Shading

Figure 29: Close-up of specular highlight

Gouraud

This shading model will give a more realistic look to a rendered image than will the flat shading model. A characteristic of the Gouraud shading model is that it tends to blur the specular highlight since it interpolates color along each face based on the 3 vertices of the face. A closer look at the specular highlight on the sphere demonstrates the difference between Gouraud shading and Flat shading.

Figure 30: Gouraud Shading

Figure 31: Close-up of specular highlight for Gouraud

Phong

This shading model yields a much more realistic effect than does the Gouraud model in that it interpolates the surface normal at each pixel based on the normals at each vertex. The result is that each pixel can be given a unique color, thus achieving a much more realistic looking image. A good example of this can be seen in the specular highlights. Comparing how the various shading models treat specular highlights is a good demonstration of their differences and abilities.
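The difference between interpolating colors (Gouraud) and interpolating normals (Phong) can be shown with a small numeric sketch. The setup below is an assumption made for illustration: a single triangular face whose three vertex normals tilt away from its center, a light and viewer directly above it, and a simple half-vector specular term with a hypothetical shininess of 50. It is not 3D Studio's renderer.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def specular(normal, half_vector, shininess=50):
    """Specular term: cosine between normal and half vector, raised to a power."""
    cos_angle = max(0.0, sum(n * h for n, h in zip(normal, half_vector)))
    return cos_angle ** shininess

# Three vertex normals tilted away from the face centre (hypothetical values).
normals = [normalize((0.5 * math.cos(a), 0.5 * math.sin(a), 1.0))
           for a in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)]
half = (0.0, 0.0, 1.0)           # light and viewer both straight above the face
weights = (1 / 3, 1 / 3, 1 / 3)  # barycentric weights of the face centre

# Gouraud: shade the vertices, then interpolate the resulting intensities.
gouraud = sum(w * specular(n, half) for w, n in zip(weights, normals))

# Phong: interpolate the normal, renormalize, then shade the pixel.
mixed = normalize(tuple(sum(w * n[i] for w, n in zip(weights, normals))
                        for i in range(3)))
phong = specular(mixed, half)

print(round(gouraud, 4), round(phong, 4))
```

With these values the Gouraud result at the center of the face stays close to zero because none of the vertices sees the highlight, while the Phong result reaches full intensity. This is why a highlight that falls inside a face is blurred or lost under Gouraud shading but preserved under Phong shading.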

Figure 32: Phong Shading

Figure 33: Close-up of specular highlight for Phong

Metal

Metal shading produces a metallic effect and is similar to Phong shading. The difference between Metal and Phong shading is that Metal shading mixes ambient and diffuse colors differently, giving an increase in contrast of the specular highlight. This is a new shading model found in 3D Studio. It adds a swirl effect to an image.


Figure 34: Metal Shading


Figure 35: Close-up of specular highlight for Metal

Anti-Aliasing

Aliasing is the term used to describe the jagged appearance of an edge and is commonly known as "staircasing". This is caused when an "all or nothing" approach is taken to directly translate a scan line to a pixel, where each pixel in the line is replaced with the line's actual color or left unchanged. An example of aliasing can be seen in the following simple example.


Figure 36: Smooth circle, and closeup of circle

The first image shows a relatively nice circle with smooth edges. If a closer look were taken and we actually zoom in on the circle, it is possible to see the effects of aliasing. The second image is a zoom of part of the circle. In this picture it is possible to see that the circle is comprised of small pixels. The arrangement of the pixels is such that for each point along the circle's computed edge, the pixel is either black or white.



Figure 37: Aliased and Anti-aliased images


Figure 38: Zoom of Aliased and Anti-aliased rings

In the above pictures, the images on the left are the original, aliased images and the images on the right were rendered using anti-aliasing. Notice the jagged appearance of the edge of the ball and the rings in the images on the left. In the full size anti-aliased image (all are 640x480 at full size) the staircase effect, or aliasing, is greatly reduced. It should be noted that, due to printer resolution, the aliasing may appear exaggerated in all the images, but it is still possible to see the difference between the aliased and anti-aliased versions.
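One common way to anti-alias an edge is supersampling: testing several points inside each pixel and using the fraction that is covered as the pixel's intensity. The sketch below illustrates that idea on the edge of a disc. It is an assumption for illustration only; the text does not state which anti-aliasing technique 3D Studio uses, and the function name coverage, the radius, and the chosen pixel row are hypothetical.

```python
def coverage(px, py, radius, samples):
    """Fraction of a samples x samples grid inside pixel (px, py) that lies in the disc."""
    inside = 0
    for i in range(samples):
        for j in range(samples):
            x = px + (i + 0.5) / samples
            y = py + (j + 0.5) / samples
            if x * x + y * y <= radius * radius:
                inside += 1
    return inside / (samples * samples)

radius = 4.0
row = 3  # a row of pixels that crosses the disc's edge
aliased = [coverage(px, row, radius, 1) for px in range(6)]
smoothed = [coverage(px, row, radius, 4) for px in range(6)]
print(aliased)   # only 0.0 or 1.0: each pixel is fully on or fully off
print(smoothed)  # fractional values along the edge soften the staircase
```

With one sample per pixel the result is the "all or nothing" decision described above. With sixteen samples per pixel, the pixels that straddle the edge receive intermediate values, which is what makes the edge look smooth at normal viewing size.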

Filter Maps

Toggles the filtering of mapped materials. Typically this option should be left ON unless it is desirable to have extremely sharp textures in the background of the scene.

Shadows

If this option is OFF then there will be no cast shadows in a scene. This will speed up the rendering but will not give a realistic image due to the lack of shadows.


Figure 39:Shadows ON and OFF

The difference in rendering time between the two options is significant. For the image with the shadows OFF, the rendering time was 14 seconds. For the image with the shadows ON, the rendering time was 27 seconds, nearly twice as long.

Mapping

If this option is OFF, the material mapping information will be ignored, but the image will render faster. This is useful when test rendering an image.

Auto-Reflect

If this option is OFF, the auto-reflection maps will be ignored. It will speed up the rendering if it is off. Auto-reflection is when an object can be seen on another object just like the reflection on a pane of glass or shiny object.

Force 2-Sided

If this option is OFF, only the outer, or visible side of a face will be rendered. If this option is ON, both sides of the face will be rendered.


Figure 40: Two Sided ON and OFF

With Two Sided OFF the image took 22 seconds to render. With Two Sided ON the image took 25 seconds to render. From the above images it is easy to see the effect that the Two Sided option has on an image.

Force Wire

If this option is ON and Anti-Aliasing is ON, the object will be rendered using lines a single pixel wide.


Figure 41: Force Wire ON

Hidden Geometry

If this option is SHOW, all objects, even hidden ones, will be rendered. A hidden object is an object that the user hid in the scene so that it would not be displayed while working on creating the image.

Background

If this option is set to TILE then the background image will be tiled. If the option is RESCALE, the bitmap image that is to be the background will be re-scaled to the rendering resolution.


Figure 42: Background set to TILE and RESCALE

File Type

This is the desired output file type. The available file types are Targa, JPEG, GIF, FLIC.

Driver

This is the current video display driver and should not be changed unless an updated driver is available or the video card is changed.

Resolution

The output resolution of the image or animation. The format is as follows: # pixels in the X direction x # pixels in the Y direction x aspect ratio. Typical resolutions are 320 x 200 x .83, 640 x 480 x 1, 1024 x 768 x 1.

Video Color Check

If this option is on the colors are checked against a threshold of valid colors. If this option is off, certain colors will tend to blur when displayed on the computer screen.

Pixel Size

This option will smooth the edges of objects without blurring the object. The range for Pixel Size is 1.0 to 1.5, where higher values result in better quality rendering.

Render Alpha

If this option is on it is then possible to have a 32-bit Targa image.

Gamma Correction

Allows the user to calibrate 3D Studio's color map to the monitor. This is often necessary when a rendered image is too dark to be easily seen on the monitor.

Output

This is where you tell the program where to send the rendering job. The following is an explanation of each of the available selections.

Display - this will display the rendering to the computer screen

No Display - this option will not display the rendering to the computer screen

Hardcopy - this option will send the rendering to a printer

Disk - this option will save the rendering to a file

Net ASAP - send the rendering job to the network for immediate rendering

Net Queue - send the rendering job to the network queue for later rendering

Render

This button starts the rendering process.

Cancel

This button cancels the rendering setup screen.

When I was ready to render the final animation, the settings I used were Phong shading, Anti-aliasing ON, Filter Maps ON, Shadows ON, Mapping ON, Auto-reflect OFF, Force 2-sided ON, Force Wire OFF, Hidden Geometry HIDE, and Background RESCALE. The resolution I chose was 640 x 480 and the File Type was FLIC. The remaining options were left at the default values.

As a performance note, I rendered the animation using a Pentium 100 with 32 megabytes of RAM and a 486 DX4-100 with 16 megabytes of RAM. I rendered 100-frame segments at a time due to the large size of the FLICs. The Pentium averaged 30 seconds per frame and the 486 averaged 3 minutes per frame, six times longer. I am convinced that the major difference in rendering time is due to the difference in RAM. The 486 was swapping with the hard drive while the Pentium was not. 3D Studio reported that the total memory necessary to render the animation was approximately 18 megabytes. With 32 megabytes the Pentium would never have to use a swap file, whereas the 486 with 16 megabytes would.
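These per-frame averages translate into total rendering time as sketched below. The calculation is a rough estimate under stated assumptions: it takes the running time of approximately four minutes and the 15 fps target rate, and assumes every frame is rendered on a single machine at the average rate. The function name total_render_hours is hypothetical.

```python
def total_render_hours(minutes_of_animation, fps, seconds_per_frame):
    """Estimated wall-clock hours to render every frame on one machine."""
    frames = minutes_of_animation * 60 * fps
    return frames * seconds_per_frame / 3600

# Assumes the full four-minute short is rendered at the 15 fps target rate.
print(total_render_hours(4, 15, 30))    # Pentium 100: 30.0 hours
print(total_render_hours(4, 15, 180))   # 486 DX4-100: 180.0 hours
```

Under these assumptions the full short is 3600 frames, so even the faster machine needs more than a day of continuous rendering.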


Adding Audio and Sound Effects to the Rendered Animation

Adding audio to the animation was a fairly straightforward process. Once I had a rendered scene that I wanted to add audio to I had to first convert the scene from an Autodesk FLIC format to a Windows AVI format. Once the rendered scene was converted using VidEdit, I then used the Sound Recorder program that comes with Windows 95 to add dialogue. The third step was to then merge the animation and the audio together which is accomplished by a program called AviEdit and is part of the Video For Windows Developer's Kit.

After I rendered a specific scene that I wanted to add audio to, the output file from 3D Studio was an Autodesk FLIC. A FLIC file is the type of animation file output by 3D Studio and stores the animation as a series of 256-color frames played in succession. Because the FLIC format relies only on simple lossless compression (run-length and frame-differencing encoding), a long animation at full resolution is still a very large file. By using the program VidEdit, I am able not only to convert the large FLIC file to an AVI file but also to compress the animation using the CINEPAK compression option. Though this process takes some time to perform, it is well worth the time spent in that it yields much smaller files.

Due to the nature of the Autodesk FLIC format, it is not possible to embed audio into a FLIC file. It was necessary to convert the FLIC files to the Windows AVI format since the AVI file can have audio embedded in it and can be compressed. By using CINEPAK compression on an AVI file, it was possible to have an AVI file 1/3 the size of the FLIC file.

Now that I have a compressed AVI file of the animation, I can use the Sound Recorder to add audio to the animation. I found that the best way to add audio to an animation is to add it to short animation sequences, perhaps a segment consisting of 30 seconds of animation. This way, if I don't like the merging of the audio and the video, I can quickly do it again until I get the desired result.

The method I used to properly synchronize the audio and video was to play the animation and simultaneously record the dialogue for it using the Sound Recorder utility. Using this method, I could then play the animation and replay the audio file to see if I properly synchronized the audio and video before merging the two files together. This is why I suggested adding audio to short animation sequences so that in the event that the audio and video do not properly synchronize, the time required to try again is minimized. Another reason to use the short sequences is that once the audio is added to the video and the file is saved which contains both audio and video, the file can possibly be very large. The size of the file depends greatly on the recording quality used for the audio. Low sampling rates will yield smaller files, whereas higher sampling rates will yield much larger files.
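The effect of recording quality on file size can be estimated with a short calculation. This sketch assumes uncompressed PCM audio and ignores the file header; the function name pcm_bytes and the two quality settings are illustrative assumptions, not the settings used for the thesis.

```python
def pcm_bytes(seconds, sample_rate, bits_per_sample, channels):
    """Size in bytes of uncompressed PCM audio (header not included)."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels

# Illustrative settings, not the ones used in the thesis.
print(pcm_bytes(30, 11025, 8, 1))    # 330750
print(pcm_bytes(30, 44100, 16, 2))   # 5292000
```

For the same 30 second segment, the higher quality setting produces a file sixteen times larger, which is one more reason to work in short sequences.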

Finally, to merge the AVI animation and the audio, I used AviEdit. I first opened the animation file and then merged it with the audio file. The result is an AVI file that has audio embedded in it.

Network Rendering

Networks can be a very useful tool for rendering large animations. 3D Studio has a network rendering feature which allows rendering jobs to be placed in a network queue, where a machine on the network will take a job and render it. The drawback I found with 3D Studio's network rendering feature is that I was unable to tell multiple machines to render a single job and output it as one complete animation, or FLIC. It is possible, however, to manually tell each computer which segment of frames to render and then manually cut and paste the output FLICs into one FLIC. Ideally, the network rendering feature should divide the work for a single rendering job over all computers that are available to render. The version of 3D Studio that I used for my thesis, 3D Studio Version 4, does not fully utilize the power of network rendering but rather uses it as a means to allow multiple networked machines to render separate rendering jobs.

The primary reason that the network rendering feature is available in this release is the hardware lock that is needed to run 3D Studio. With network rendering it is possible to have multiple machines render separate jobs, but the network must be configured using a Master/Slave relationship between the machines. The Master machine is the computer with the hardware lock installed and the Slave machines are those computers that do not have the hardware lock installed. A computer in Master mode is capable of performing all features of 3D Studio, whereas a computer running in Slave mode is unable to do anything other than render network jobs. This setup is beneficial in situations where there are a large number of rendering jobs to be done and it is desirable to use multiple computers to render them. In effect, a network of computers with the 3D Studio software installed can be used as a rendering farm, but only the machine with the hardware lock is fully capable of using 3D Studio. It should be noted that Autodesk recommends that the hardware lock be insured against theft since it is very expensive and without it 3D Studio will not run.

Ideally, the network rendering feature should take a network rendering job and divide the work among the available networked machines. It should take into account the speed of the various machines in the network and then distribute portions of the rendering job to the different machines. In a sense, the network rendering feature should be modeled after a parallel system where each computer is a node and the nodes communicate with one another to efficiently render the entire job.
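The ideal division of work described above can be sketched in C. This is only an illustration under stated assumptions, not how 3D Studio behaves: the function name split_frames, the speed weights, and the frame range are all hypothetical. Each machine receives a contiguous block of frames in proportion to its relative speed, and the last machine takes whatever remains.

```c
#include <stddef.h>

/* Hypothetical helper: divide frames [first, last] among n machines in
 * proportion to their relative speeds. start[i] and count[i] receive the
 * first frame and the number of frames assigned to machine i.
 * Returns 0 on success, -1 if the inputs are invalid. */
int split_frames(int first, int last, const int *speed, size_t n,
                 int *start, int *count)
{
    long total_speed = 0;
    int total_frames = last - first + 1;
    int next = first;
    size_t i;

    if (n == 0 || total_frames <= 0)
        return -1;
    for (i = 0; i < n; i++) {
        if (speed[i] <= 0)
            return -1;
        total_speed += speed[i];
    }
    for (i = 0; i < n; i++) {
        int share;
        if (i == n - 1)
            share = last - next + 1;   /* last machine takes the remainder */
        else
            share = (int)((long)total_frames * speed[i] / total_speed);
        start[i] = next;
        count[i] = share;
        next += share;
    }
    return 0;
}
```

With hypothetical weights of 100, 75, and 100 and frames 0 to 274, the three machines would receive 100, 75, and 100 frames respectively. A real scheduler would also have to measure the speeds and cope with a machine dropping out, which this sketch ignores.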

Using a network consisting of a Pentium 100, a Pentium 75, and a 486DX4-100, I experimented with the network rendering feature to find its capabilities. With the Pentium 100 set up as the Master and the other two computers as Slaves, I found two problems. The first was that if a rendering job is to be output as a FLIC file, the work cannot be distributed across multiple machines on the network; only one machine can be used to render it. What I did discover, though, is that if the output of the job is a series of GIFs, the network rendering feature assigns tasks to the computers on a first-come, first-served basis. The second problem was that once I placed a job in the network queue, the Master computer could not participate in the rendering. Because the Master can only tell the Slaves which frames need to be rendered, the Slaves are the only computers capable of automated rendering, so a more effective use of resources than the configuration I used would be to make the slowest computer the Master and the faster computers the Slaves. It is possible to have the Master help render an animation, but I had to arrange this manually: I told the Master to render a range of frames, say 0 to 50, and then submitted a network rendering job for the remaining frames, 51 onward. The Slaves then divide the submitted segment among themselves while the Master renders frames 0 to 50 by itself.

Once all the frames of the animation are rendered, the result of using the network rendering feature is a series of GIFs stored on the Master computer's hard drive. It is then necessary to manually create a FLIC file from these GIFs. Conceptually, a FLIC is simply one large file containing the succession of frames. Unfortunately, the tools that I found to convert a series of GIFs into a FLIC are limited and can only create a low resolution FLIC of 320x200. This entire episode was an exercise in futility, and it is described here to show the limitations of the network rendering provided in 3D Studio.

Profiling a Minimal Ray Tracer

Once the animation is ready to be rendered, 3D Studio offers 4 polygonal based shading models to choose from. To learn how computationally intensive rendering is, I profiled a minimal ray-tracing program. Although ray tracing and polygonal shading are fundamentally different, it is still an interesting and worthwhile investigation. The program, minray.c, was a combined effort by Paul Heckbert, Darwyn Peachey, and Joe Cychosz to write a minimal ray tracer. A copy of the original source code and the header ray.h, as well as the profiled functions, is located in the appendices.

The program minray.c was divided into 6 individual functions that were each tested on the DLX simulator. The appendices contain the data obtained from profiling each of the original functions. Also included are the results from altering the functions in an attempt to increase the level of parallelism and decrease overall execution time. This chapter will also identify the bottlenecks in the ray-tracing pipeline as observed from the minray.c program.

Profiled Functions of minray.c

Here is a table illustrating the number of times each function is called when minray is executed with the given ray.h file.




Function     Frequency
vdot         120978
vcomb        99408
vunit        15946
intersect    9011
trace        5998
main         1

Figure 43: Module Frequency

Now that the frequency of each program function is known it will be beneficial to examine each of these functions using the DLX simulator to see how costly in terms of CPU resources each function is.
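Call counts like those in the table above can be gathered by incrementing a global counter at the top of each function, which is the approach the instrumented listing in the appendices takes. A minimal sketch for vdot alone follows. It is written with ANSI C prototypes rather than the K&R style of minray.c and is an illustration, not the code that produced the figures.

```c
typedef struct { double x, y, z; } vec;

long vdot_counter = 0;          /* number of calls made to vdot so far */

double vdot(vec A, vec B)
{
    vdot_counter++;             /* count the call before doing the work */
    return A.x*B.x + A.y*B.y + A.z*B.z;
}
```

A counter of this kind has to move into the macro body when the function is replaced by a macro; otherwise it stays at zero because the function is never entered.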

Observations

vdot()

Since vdot is called the greatest number of times, it is worth examining several forms of the function to determine which implementation is faster. Two forms were tested: (1) the original function and (2) a version implemented as a macro. The macro version of vdot ran more than twice as fast as the function version, taking 51 cycles instead of 113. Implementing a macro in place of a function avoids the costly branches and other overhead associated with function calls, because the compiler can emit straight-line code. At a glance the differences can be summarized as follows.

Program version   # float stalls   # integer operations   # float operations   total # operations   total # cycles
function          9                87                     5                    92                   113
macro             9                37                     5                    42                   51

Figure 44: vdot performance

A more in-depth look at the atomic differences between the function version and the macro version reveals that for the integer operations alone, the macro implementation has fewer adds, loads, moves, and stores which result in a faster piece of code than its function counterpart.
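The two forms of vdot compared above can be placed side by side. The sketch below uses ANSI C prototypes and the hypothetical names vdot_fn and VDOT so that both can coexist in one file; the appendix versions are the ones that were actually profiled. The macro here is written as a parenthesized expression, a slight variation on the appendix macro, which assigns to a result variable.

```c
typedef struct { double x, y, z; } vec;

/* Function form: both structures are passed by value, so they are copied
 * for every call, and the call and return add further overhead. */
double vdot_fn(vec A, vec B)
{
    return A.x*B.x + A.y*B.y + A.z*B.z;
}

/* Macro form: the expression is expanded in place, so there is no call,
 * no argument copying, and no return. The arguments are parenthesized so
 * that expressions such as *p can be passed safely. */
#define VDOT(A, B) ((A).x*(B).x + (A).y*(B).y + (A).z*(B).z)
```

Both forms compute the same value; only the generated code differs. On a compiler that can inline small functions, the gap between the two may be much smaller than on the non-optimizing DLX compiler.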

vcomb()

The same approach used in profiling vdot was used in profiling vcomb with one exception - the original vcomb function uses a C structure data type. An added test was performed to determine the cost difference between using structures instead of using more variables. It has been suggested that when substituting more variables instead of structures, most compilers will treat each case the same and output similar machine code. With this in mind, the tests that were performed were (1) testing the original vcomb function, (2) testing a macro version of the function, and (3) testing the function version of vcomb without structures to compare with test (1). Summarized here are the results obtained from these tests.

Program version           # float stalls   # integer operations   # float operations   total # operations   total # cycles
original function         9                122                    6                    128                  155
macro                     9                42                     6                    48                   51
original w/o structures   9                116                    6                    122                  131

Figure 45: vcomb performance


macro vs. original function:

A closer look at how the integer operations break down shows that the macro version requires half of the adds and nops, and only a small fraction of the loads and stores required by the function implementation.

structures vs. more variables

After a closer examination of the integer operation distribution, the version using more variables instead of structures requires a similar number of adds, no jumps, and nearly a third fewer loads and stores. However, the original version had nearly 20% fewer nops.
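One caveat applies when comparing the two function versions. The structure-free vcomb listed in the appendices takes its arguments by value and returns nothing, so the sums it computes are discarded when the function returns. A version that does the same work as the structure-based function would have to hand its results back, for example through pointers. A sketch of that alternative follows; the name vcomb_scalars is hypothetical and this variant was not profiled.

```c
/* Computes aA + B component by component, like the structure-based vcomb,
 * but writes the result through pointers so that the caller can see it. */
void vcomb_scalars(double a,
                   double ax, double ay, double az,
                   double *bx, double *by, double *bz)
{
    *bx += a * ax;
    *by += a * ay;
    *bz += a * az;
}
```

The pointer dereferences add loads and stores of their own, so the measured difference between structures and separate variables could change with this form.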

vunit()

This function was tested somewhat differently from the previous two, in that vunit calls both vdot and vcomb. Two versions of vunit were tested: (1) the original function version and (2) a version that uses only macros. The following table summarizes the data obtained from the simulator.

Program version   # float stalls   # integer operations   # float operations   total # operations   total # cycles
function          36               246                    13                   259                  331
macro             66               131                    17                   148                  220

Figure 46: vunit performance

Investigating the differences in the integer operations between the 2 versions of vunit reveals that in the macro version there are fewer adds, and far fewer loads, stores and nops. The macro version does have more moves and traps than does the function version.

Inspecting the differences in the floating point operations between the versions shows that the macro implementation has more divides and more conversions from integer to floating point, which may be attributed to the testing environment.
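One possible contributor to the extra divides, offered here as an assumption rather than something the simulator data confirms, is macro argument substitution. In the appendix listing, vunit passes the expression 1.0/sqrt(result_vdot) to the vcomb macro, and the preprocessor pastes that expression into each of the three statements, so it is evaluated three times. The sketch below shows the effect with a hypothetical function scale() standing in for the costly argument, together with a variant that evaluates the argument once. The names VCOMB_NAIVE, VCOMB_ONCE, scale, and evals are all illustrative.

```c
typedef struct { double x, y, z; } vec;

int evals = 0;                       /* how many times scale() has run */

/* Hypothetical stand-in for a costly argument such as 1.0/sqrt(...). */
double scale(void)
{
    evals++;
    return 0.5;
}

/* Same shape as the appendix vcomb macro: `a` is pasted in three times. */
#define VCOMB_NAIVE(a, A, B) \
    (B).x += (a)*(A).x;      \
    (B).y += (a)*(A).y;      \
    (B).z += (a)*(A).z

/* Evaluates `a` once into a temporary. The do { } while (0) wrapper makes
 * the macro behave as one statement, for example after an unbraced if. */
#define VCOMB_ONCE(a, A, B)          \
    do {                             \
        double a_once = (a);         \
        (B).x += a_once*(A).x;       \
        (B).y += a_once*(A).y;       \
        (B).z += a_once*(A).z;       \
    } while (0)
```

Calling VCOMB_NAIVE(scale(), A, B) runs scale() three times, while VCOMB_ONCE(scale(), A, B) runs it once and produces the same sums.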

intersect()

Testing for intersect consisted of profiling the original code. The following table summarizes the findings.

Program version   # float stalls   # integer operations   # float operations   total # operations   total # cycles
original          185              1990                   130                  2120                 2547

Figure 47: intersect performance

trace()

Testing for trace consisted of profiling the original code. Below is a summary of the results obtained from the simulator.

Program version   # float stalls   # integer operations   # float operations   total # operations   total # cycles
original          0                122                    0                    122                  141

Figure 48: trace performance

When this code was profiled, the parameter corresponding to the level was passed as a 1.

main()

The main function was tested in the following ways: (1) test the original main function, and (2) test the main function along with implementing the vdot macro which was chosen due to the high frequency with which it is called during program execution. This table provides the results from the tests.

Program version   # float stalls   # integer operations   # float operations   total # operations   total # cycles
functions         46080            549971                 20480                570451               698456
vdot macro        46080            493651                 20480                514131               628824

Figure 49: main performance

One alteration is common to both forms of main: the call to the tan() function was replaced with the constant it evaluates to. The overhead of repeating this loop-invariant calculation has therefore been eliminated from the while loop in the main function of both test cases.

Given how often vdot is referenced and how efficient its macro implementation is, it is surprising to see how small the improvement is in this testing scenario: about 10% fewer cycles (628,824 versus 698,456). Future testing will implement other macros in main to determine their effect on program execution speed.

What Can Be Learned From This Data

The compiler for the DLX simulator is not an optimizing compiler. Further enhancements could have been made to the profiled cases had the machine code generated by the compiler been altered by hand. Altering this machine code, however, would have improved performance only on the DLX simulator. The goal to strive for in enhancing the execution speed of this program is software pipelining. Techniques such as loop unrolling, or straight-line code in place of branching, produce larger basic blocks, which should improve performance over the existing code.
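Loop unrolling, one of the techniques named above, can be illustrated with a generic dot product over arrays. This sketch is not taken from minray.c, and the function names are hypothetical. The unrolled version performs four multiply-adds per iteration, so the loop test and branch execute about a quarter as often and each basic block is larger.

```c
#include <stddef.h>

/* Straightforward loop: one multiply-add and one branch per element. */
double dot_rolled(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Unrolled by four: fewer branches and a larger basic block per iteration.
 * A cleanup loop handles the elements left over when n is not a multiple
 * of four. */
double dot_unrolled(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += a[i]     * b[i];
        sum += a[i + 1] * b[i + 1];
        sum += a[i + 2] * b[i + 2];
        sum += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Both functions add the products in the same order, so they return identical results. Whether the unrolled form is actually faster depends on the compiler and the pipeline it targets.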

Though 3D Studio uses polygonal rendering, it is still beneficial to see the computational overhead associated with ray-tracing.

Problems Encountered

During the course of this thesis I encountered many problems. Most of them dealt with the immense computational power and storage space required by computer graphics. In the appendices, I've included the work I've done on profiling a ray-tracing algorithm. There were also problems with freeware programs that did not do what I expected or hoped they would do. This section describes the range of problems I had and explains my solution to each. There is no easy way to divide the problems into categories, so I will simply discuss each one in no particular order.

One of the most common problems was storing and transporting the large animation files between CERC and my home. At home I used a 486 DX4-100 and a Pentium 100, and at CERC I used a Pentium 100. When I needed to render a scene that would take a very long time, on the order of several days of constant rendering, I would take the job home and render it on my 486. Once the rendering was complete I would then want to take it to CERC in order to demonstrate it to my advisors. The first and foremost problem with this idea was getting a file that was well over 100 megabytes from my house to CERC. At home I have a 33.6 kbps modem that I could have used to upload the file to my CERC account and then download it once I was at CERC. Unfortunately, at 33.6 kbps, I'd be a very old graduate student by the time it finished uploading! Another problem with the modem idea is that I have limited space available on my CERC account (15 megabytes maximum). The solution I used was to buy an Iomega Zip drive, which can store 100 megabytes per disk. The drive is portable and easy to install on any PC. Another option would have been to render the animation in smaller chunks. This, however, would not be a good solution because there would still be over 100 megabytes of data to transfer: even as smaller files, they all still need to reach the PC at CERC.
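A rough estimate shows why the modem was impractical. The sketch below assumes an ideal link with no protocol overhead or compression and takes one megabyte to be 1,000,000 bytes; the function name is hypothetical.

```c
/* Estimated time in seconds to send `bytes` bytes over a link that
 * carries `bits_per_second` bits each second, ignoring all overhead. */
double transfer_seconds(double bytes, double bits_per_second)
{
    return bytes * 8.0 / bits_per_second;
}
```

For a 100 megabyte file at 33,600 bits per second this gives roughly 23,800 seconds, or about 6.6 hours of uninterrupted uploading, and the same again to download, before the 15 megabyte account limit is even considered.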

In rendering the animation sequences, it is necessary to have a large amount of hard drive space for storage as well as for memory swap space. A useful technique I used at home was to run 3D Studio under Windows 95 and use the networking feature of Windows to map a network drive on one of the other PCs in the apartment, storing the large animation files on that network drive instead of on the local hard drive of the rendering machine. This allows the rendering computer to use all of its memory resources for rendering, and its hard drive space can be used for a very large swap file if the animation requires it. The only drawback is that 3D Studio is neither entirely stable nor very fast under Windows 95. 3D Studio version 4 was not meant to be a Windows application but rather a DOS Protected Mode application.

When I began this thesis, I developed a rough set consisting of a fenced-in area containing 7 deciduous trees, a person standing amongst the trees, and some birds flying around the trees, and I thought that I had a very simple scene description to render. I was using a Pentium 100 with 32 megabytes of RAM and was rendering a scene that was 40 frames long. I figured that it would take roughly 30 seconds to render each frame, as I was rendering the animation using the lowest possible detail settings at a screen resolution of 640x480: anti-aliasing off, flat shading, no shadows, two-sided rendering off, and so on. Much to my surprise, the scene took an entire weekend to render! I found out that not only did the scene require all 32 megabytes of system memory, it had also created a 50 megabyte swap file. I determined that the cause of the large swap file was the trees. Each tree consisted of 20,000 faces! In comparison, the 4 palm trees that I used in place of the 7 deciduous trees consisted of only 1400 faces.

The problem encountered in converting the FLIC file to an AVI file was that I kept running out of memory on a Pentium 100 with 32 megabytes of RAM and a maximum swap file of 700 megabytes. The original FLIC file was 100 megabytes, and I could not configure either Windows 95 or VidEdit, a utility available in the Video for Windows Developer's Kit, to convert such a large animation file. I repeatedly got an error message saying that the system was dangerously low on system resources! The solution was to render the entire animation sequence again, this time in 100-frame segments. The resulting FLIC files were 20 megabytes each, which allowed me to convert them to AVI files using VidEdit. The resulting AVI files were one third the size of the 20 megabyte FLIC files. These AVI files contain only animation and do not have audio embedded in them yet.
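Working out the frame range for each 100-frame segment is simple bookkeeping, sketched below. The function name and the convention that frames are numbered from 0 are assumptions made for illustration; the total frame count used in the note that follows is also hypothetical.

```c
/* Fills in the first and last frame of segment `index` when `total` frames
 * (numbered from 0) are rendered in chunks of `chunk` frames. Returns 1 if
 * the segment exists and 0 otherwise. The final segment may be shorter. */
int segment_range(int total, int chunk, int index, int *first, int *last)
{
    long begin;

    if (total <= 0 || chunk <= 0 || index < 0)
        return 0;
    begin = (long)index * chunk;
    if (begin >= total)
        return 0;
    *first = (int)begin;
    *last = (begin + chunk > total) ? total - 1 : (int)(begin + chunk - 1);
    return 1;
}
```

For a hypothetical 450-frame sequence rendered in 100-frame segments, segment 0 covers frames 0 to 99 and segment 4 covers frames 400 to 449.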

Another software-based problem was in using a routine, or program, known as an IPAS routine in 3D Studio. The routine I was trying to use is a freeware IPAS routine called Bones, which should allow me to animate a skeleton frame and have a high detail mesh object move as the skeleton does. I found several problems with this IPAS routine, including the inability to select a portion of a high detail mesh and instruct it to follow the corresponding portion of the skeleton. An example would be to animate the arm of a skeleton model, then select the arm of a high detail mesh of a human and have those faces move with the skeletal arm. Unfortunately, the IPAS routine I was using does not allow this. I am only allowed to select the parent object of the entire skeleton hierarchy and then select the high detail model that I want to bend. The results I obtained were not impressive: the human model looked as if it had bones and joints in the wrong places, and it looked very painful! The solution for this problem is 3D Studio Max with the Character Studio plug-in. This package was ordered well after the thesis was started, after I unexpectedly saw a demo. Unfortunately, 3D Studio Max has not yet arrived and there are less than three weeks until my defense. I will talk more about the potential of 3D Studio Max and the Character Studio plug-in in the Future Work chapter of this paper.

Discussions and Future Work

After working on this thesis for almost a year, I've only been able to scratch the surface of the potential of computer animation. My only regret from working on my thesis is that the days simply are not long enough! Every day I thought of new things I wanted to try and only got to work on some of them. Luckily, while working on my thesis I had the opportunity to create animation sequences for some other projects which allowed me to try a few of these ideas. This chapter contains a discussion about my work as well as some ideas of future work to build on what I've done.

One of the regrets that I have is that I was unable to get the characters to walk. Unfortunately, with 3D Studio Version 4 there is no easy way to do this and I was forced to have my characters slide instead of walk. For future work I would like to enhance the work I've done by using 3D Studio Max with the Character Studio plug-in. With the Character Studio plug-in I will be able to use Footstep Driven Keyframe Animation which allows me to place the footprints for the characters and the program will then construct the key frames to provide me with a rough sketch of their movement. I will then be able to add body swaying, strutting, skipping, dancing, or jumping with the click of a button.

Perhaps the most important feature in creating realistic computer animation is Inverse Kinematics, which in this version of 3D Studio did not live up to my expectations or needs. I was hoping to use Inverse Kinematics to quickly and easily create a realistic animation. Unfortunately, the version I was using did not allow me to import more than one object hierarchy into the Inverse Kinematics program, which prevented me from utilizing its potential. The good news for future work is that Character Studio provides Advanced Inverse Kinematics, which will allow me to dynamically attach and detach an object to a character's hands, such as when throwing a Frisbee or catching a ball. It will also be possible to have multiple object hierarchies interact so that I can animate the objects with respect to each other. The potential is easily seen in the example of having the main character in the story sit under the tree. It will be possible to use Inverse Kinematics to animate the scene and have the character sit under the tree in a fraction of the time that manually rotating and moving each limb of the character took. This is possible because both object hierarchies will be visible from within the Inverse Kinematics program.

Perhaps the most noticeable feature in my animation is that the humans are simple stick figures. I would like to use the Character Studio to be able to take the skeletons and apply a high detail mesh as well as be able to add muscle bulging and tendon effects to my characters.

Animation is a powerful medium to convey thoughts and ideas but adding audio to the animation is a much more powerful means to convey those thoughts and ideas. My method for adding audio to the animation is elementary but effective. There are several good products on the market that offer a much broader base for adding audio. The two products that I ordered to add audio tracks to the animation are 3D Studio Max and Autodesk's Animator Studio. Both packages have an audio track editor so the audio can be properly fit to the animation. By using repetition, or slightly changing the audio properties by changing the playback speed, the audio can then be fit to the animation. Unfortunately these products have not yet arrived as of 2 weeks prior to my defense.

Conclusion

This thesis dealt with the process of creating a computer generated animation by exploring areas such as the computational aspects of rendering, the use of a modeling package and the tools needed to create an animation, various shading models, memory requirements, video conversion and compression, and how to add audio effects to an animation. While developing my thesis I tried to convey as many aspects as possible dealing with the entire process of creating an animation. By showing these aspects, it is easy to see that creating a computer animation, or multimedia product, is a complex process that is hungry for computational power and memory as I have demonstrated by profiling the ray-tracing program. The exciting part of the future of computer animation is that the current state of computer animation is ready for tomorrow's processors. There is no limit in sight for the potential of computer animation.

As a mechanical engineer, I can see even greater potential for the use of computer animation in the area of accident reconstruction. By creating a 3D model of the scene of an accident combined with information gathered from witnesses, and then incorporating physical laws into creating the reenactment, we will be able to relive almost any accident.

Computer animation has become a powerful tool for expressing ideas and information. We see stunning visual effects in modern movies, three-dimensional weather maps on the local news, and video games, none of which would be possible without computer generated animation. I have truly enjoyed working on my thesis and sincerely hope to continue working with computer animation.

References

Autodesk. 3D Studio Version 4 Manuals. Autodesk, Inc., 1994.

Autodesk. Animator Pro Manuals. Autodesk, Inc., 1994.

Cotta Vaz, Mark. Industrial Light & Magic: Into the Digital Realm. Del-Rey, 1996.

Daly, Steve and Lasseter, John. Toy Story: The Art and Making of the Animated Film. New York: Hyperion, 1995.

Finch, Christopher. The Art of Walt Disney from Mickey Mouse to the Magic Kingdoms. Harry N. Abrams, Inc., 1995.

Foley, James D. Introduction to Computer Graphics. Addison-Wesley, 1994.

Glassner, Andrew. An Introduction to Ray-tracing. Academic Press, 1989.

Karinthi, Raghu. "Accurate Z-Buffer Rendering." Graphics Gems V. Academic Press, 1995.

Kinetix. 3D Studio Max and Character Studio [CD-Rom]. Autodesk, Inc., 1996.

Laybourne, Kit. The Animation Book. Crown Publishers, Inc., 1979.

Michenaud, Jean-Michael, et al. (Producer), & Schultz, John (Director). (1995). The Making of Jurassic Park [Film]. Universal City, CA: MCA/Universal Home Video.

Ray, Sukumar. "Ha Ja Ba Ra La", (in Bengali). Cygnet Press, Calcutta, India. 8th Cygnet edition, 1972.

Thomas, Bob. Disney's Art of Animation from Mickey Mouse to Beauty and the Beast. Hyperion, 1991.

Appendices

Source Code

Minray.c Original Source Code

/* minimal ray tracer, hybrid version - 888 tokens

* Paul Heckbert, ucbvax!pixar!ph, 13 Jun 87

* Using tricks from Darwyn Peachey and Joe Cychosz. */

#define TOL 1e-7

#define AMBIENT vec U, black, amb

#define SPHERE struct sphere {vec cen, color; double rad, kd, ks, kt, kl, ir} \

*s, *best, sph[]

typedef struct {double x, y, z} vec;

#include "ray.h"

yx;

double u, b, tmin, sqrt(), tan();

double vdot(A, B)

vec A, B;

{

return A.x*B.x + A.y*B.y + A.z*B.z;

}

vec vcomb(a, A, B) /* aA+B */

double a;

vec A, B;

{

B.x += a*A.x;

B.y += a*A.y;

B.z += a*A.z;

return B;

}

vec vunit(A)

vec A;

{

return vcomb(1./sqrt(vdot(A, A)), A, black);

}

struct sphere *intersect(P, D)

vec P, D;

{

best = 0;

tmin = 1e30;

s = sph+NSPHERE;

while (s-->sph)

b = vdot(D, U = vcomb(-1., P, s->cen)),

u = b*b-vdot(U, U)+s->rad*s->rad,

u = u>0 ? sqrt(u) : 1e31,

u = b-u>TOL ? b-u : b+u,

tmin = u>=TOL && u<tmin ?

best = s, u : tmin;

return best;

}

vec trace(level, P, D)

vec P, D;

{

double d, eta, e;

vec N, color;

struct sphere *s, *l;

if (!level--) return black;

if (s = intersect(P, D));

else return amb;

color = amb;

eta = s->ir;

d = -vdot(D, N = vunit(vcomb(-1., P = vcomb(tmin, D, P), s->cen)));

if (d<0)

N = vcomb(-1., N, black),

eta = 1/eta,

d = -d;

l = sph+NSPHERE;

while (l-->sph)

if ((e = l->kl*vdot(N, U = vunit(vcomb(-1., P, l->cen)))) > 0 &&

intersect(P, U)==l)

color = vcomb(e, l->color, color);

U = s->color;

color.x *= U.x;

color.y *= U.y;

color.z *= U.z;

e = 1-eta*eta*(1-d*d);

/* the following is non-portable: we assume right to left arg evaluation.

* (use U before call to trace, which modifies U) */

return vcomb(s->kt,

e>0 ? trace(level, P, vcomb(eta, D, vcomb(eta*d-sqrt(e), N, black)))

: black,

vcomb(s->ks, trace(level, P, vcomb(2*d, N, D)),

vcomb(s->kd, color, vcomb(s->kl, U, black))));

}

main()

{

printf("%d %d\n", SIZE, SIZE);

while (yx<SIZE*SIZE)

U.x = yx%SIZE-SIZE/2,

U.z = SIZE/2-yx++/SIZE,

U.y = SIZE/2/tan(AOV/114.5915590261), /* 360/PI~=114 */

U = vcomb(255., trace(DEPTH, black, vunit(U)), black),

printf("%.0f %.0f %.0f\n", U); /* yowsa! non-portable! */

}

Ray.h

/* ray.h for test1, first test scene */

#define DEPTH 3 /* max ray tree depth */

#define SIZE 8 /* resolution of picture in x and y */

#define AOV 25 /* total angle of view in degrees */

#define NSPHERE 1 /* number of spheres */

AMBIENT = {.02, .02, .02}; /* ambient light color */

/* sphere: x y z r g b rad kd ks kt kl ir */

SPHERE = {

0., 6., .5, 1., 1., 1., .9, .05, .2, .85, 0., 1.7,

-1., 8., -.5, 1., .5, .2, 1., .7, .3, 0., .05, 1.2,

1., 8., -.5, .1, .8, .8, 1., .3, .7, 0., 0., 1.2,

3., -6., 15., 1., .8, 1., 7., 0., 0., 0., .6, 1.5,

-3., -3., 12., .8, 1., 1., 5., 0., 0., 0., .5, 1.5,

};

Macro Based Modules

vdot

/* vdot_macro.c */

typedef struct {double x,y,z;} vec;

#define vdot(result,A,B) result = A.x*B.x + A.y*B.y + A.z*B.z

main()

{

vec A,B;

double result;

vdot(result,A,B);

}

vcomb

/* vcomb_profile_macro.c */

typedef struct {double x,y,z;} vec;

#define vcomb(a, A, B) \

B.x += a*A.x; \

B.y += a*A.y; \

B.z += a*A.z

main()

{

vec A,B;

double a;

vcomb(a,A,B);

}

vunit

/* vunit_profile_macro.c */

typedef struct {double x,y,z;} vec;

#define vdot(result_vdot,A,B) result_vdot = A.x*B.x + A.y*B.y + A.z*B.z

#define vcomb(a, A, B) \

B.x += a*A.x; \

B.y += a*A.y; \

B.z += a*A.z

#define vunit(result_vunit,A) \

vdot(result_vdot,A,A); \

vcomb(1.0/sqrt(result_vdot),A,A); \

result_vunit = A

main()

{

vec result_vunit,data;

double result_vdot;

vunit(result_vunit,data);

}

main profile with vdot as a macro

/* main_profile_vdot_macro.c */

#define TOL 1e-7

#define AMBIENT vec U, black, amb

#define vdot(vdot_result,A,B) vdot_result = A.x*B.x + A.y*B.y + A.z*B.z

#define SPHERE struct sphere {vec cen, color; double rad, kd, ks, kt, kl, ir;}*s, *best, sph[]

typedef struct {double x, y, z;} vec;

#include "ray.h"

yx;

#include <math.h>

double u, b, tmin, sqrt(), tan(), vdot_result;

int vdot_counter=0, vunit_counter=0, inter_counter=0, trace_counter=0, vcomb_counter=0;

vec vcomb(a, A, B) /* aA+B */

double a;

vec A, B;

{

vcomb_counter++;

B.x += a*A.x;

B.y += a*A.y;

B.z += a*A.z;

return B;

}

vec vunit(A)

vec A;

{

vunit_counter++;

vdot(vdot_result, A, A);

return vcomb(1./sqrt(vdot_result), A, black);

}

struct sphere *intersect(P, D)

vec P, D;

{

inter_counter++;

best = 0;

tmin = 1e30;

s = sph+NSPHERE;

while (s-->sph)

U = vcomb(-1., P, s->cen),

vdot(vdot_result, D, U),

b = vdot_result,

vdot(vdot_result, U, U),

u = b*b-vdot_result+s->rad*s->rad,

u = u>0 ? sqrt(u) : 1e31,

u = b-u>TOL ? b-u : b+u,

tmin = u>=TOL && u<tmin ?

best = s, u : tmin;

return best;

}

vec trace(level, P, D)

vec P, D;

{

double d, eta, e;

vec N, color;

struct sphere *s, *l;

trace_counter++;

if (!level--) return black;

if (s = intersect(P, D));

else return amb;

color = amb;

eta = s->ir;

P = vcomb(tmin, D, P);

N = vunit(vcomb(-1.,P, s->cen));

vdot(vdot_result, D, N);

d = -vdot_result;

if (d<0)

N = vcomb(-1., N, black),

eta = 1/eta,

d = -d;

l = sph+NSPHERE;

while (l-->sph)

{

U = vunit(vcomb(-1., P, l->cen));

vdot(vdot_result, N, U);

/* braces keep the shadow test inside the loop, as in the original minray.c */

if ((e = l->kl*vdot_result) > 0 && intersect(P, U)==l)

color = vcomb(e, l->color, color);

}

U = s->color;

color.x *= U.x;

color.y *= U.y;

color.z *= U.z;

e = 1-eta*eta*(1-d*d);

/* the following is non-portable: we assume right to left arg evaluation.

* (use U before call to trace, which modifies U) */

return vcomb(s->kt,

e>0 ? trace(level, P, vcomb(eta, D, vcomb(eta*d-sqrt(e), N, black)))

: black,

vcomb(s->ks, trace(level, P, vcomb(2*d, N, D)),

vcomb(s->kd, color, vcomb(s->kl, U, black))));

}

main()

{

vec result_vunit, result_trace;

while (yx<SIZE*SIZE)

{

U.x = yx%SIZE-SIZE/2;

U.z = SIZE/2-yx++/SIZE;

U.y = SIZE/0.4456;

result_vunit = vunit(U);

result_trace = trace(DEPTH, black, result_vunit);

U = vcomb(255.,result_trace, black);

/* printf("%.0f %.0f %.0f\n",U); */

/*

U = vcomb(255., trace(DEPTH, black, vunit(U)), black);

*/

}

printf("vunit_counter = %d\nvdot_counter = %d\nvcomb_counter = %i\n",vunit_counter, vdot_counter, vcomb_counter);

printf("trace_counter = %d\ninter_counter = %d\n",trace_counter, inter_counter);

}






Non-Macro Based Modules

vdot.c

/* vdot.c */

typedef struct {double x,y,z;} vec;

double vdot(A,B)

vec A,B;

{

return A.x*B.x + A.y*B.y + A.z*B.z;

}

main()

{

double result;

vec A,B;

result = vdot(A,B);

}

vcomb.c

/* vcomb.c */

typedef struct {double x,y,z;} vec;

vec vcomb(a, A, B)

double a;

vec A, B;

{

B.x += a*A.x;

B.y += a*A.y;

B.z += a*A.z;

return B;

}

main()

{

vec result,A,B;

result = vcomb(1.0,A,B);

}

vcomb without structures

/* vcomb_profile_nostruct.c */

void vcomb(a,ax,ay,az,bx,by,bz)

double a,ax,ay,az,bx,by,bz;

{

bx += a*ax;

by += a*ay;

bz += a*az;

}

main()

{

double bx,by,bz;

vcomb(1.0,2.0,3.0,4.0,bx,by,bz);

}

vunit_profile.c

/* vunit_profile.c */

typedef struct {double x,y,z;} vec;

double vdot(A,B)

vec A,B;

{

return A.x*B.x + A.y*B.y + A.z*B.z;

}

vec vcomb(a, A, B)

double a;

vec A, B;

{

B.x += a*A.x;

B.y += a*A.y;

B.z += a*A.z;

return B;

}

vec vunit(A)

vec A;

{

return vcomb(1./sqrt(vdot(A,A)),A,A);

}

main()

{

vec result, data;

result = vunit(data);

}

trace_profile.c

/* trace_profile.c */

#define TOL 1e-7

#define AMBIENT vec U, black, amb

#define SPHERE struct sphere {vec cen, color; double rad, kd, ks, kt, kl, ir;}*s, *best, sph[]

typedef struct {double x, y, z;} vec;

#include "ray.h"

double u, b, tmin, sqrt(), tan();

double vdot(A, B)

vec A, B;

{

return A.x*B.x + A.y*B.y + A.z*B.z;

}

vec vcomb(a, A, B) /* aA+B */

double a;

vec A, B;

{

B.x += a*A.x;

B.y += a*A.y;

B.z += a*A.z;

return B;

}

vec vunit(A)

vec A;

{

return vcomb(1./sqrt(vdot(A, A)), A, black);

}

struct sphere *intersect(P, D)

vec P, D;

{

best = 0;

tmin = 1e30;

s = sph+NSPHERE;

while (s-->sph)

b = vdot(D, U = vcomb(-1., P, s->cen)),

u = b*b-vdot(U, U)+s->rad*s->rad,

u = u>0 ? sqrt(u) : 1e31,

u = b-u>TOL ? b-u : b+u,

tmin = u>=TOL && u<tmin ?

best = s, u : tmin;

return best;

}

vec trace(level, P, D)

vec P, D;

{

double d, eta, e;

vec N, color;

struct sphere *s, *l;

if (!level--) return black;

if (s = intersect(P, D));

else return amb;

color = amb;

eta = s->ir;

d = -vdot(D, N = vunit(vcomb(-1., P = vcomb(tmin, D, P), s->cen)));

if (d<0)

N = vcomb(-1., N, black),

eta = 1/eta,

d = -d;

l = sph+NSPHERE;

while (l-->sph)

if ((e = l->kl*vdot(N, U = vunit(vcomb(-1., P, l->cen)))) > 0 &&

intersect(P, U)==l)

color = vcomb(e, l->color, color);

U = s->color;

color.x *= U.x;

color.y *= U.y;

color.z *= U.z;

e = 1-eta*eta*(1-d*d);

/* the following is non-portable: we assume right to left arg evaluation.

* (use U before call to trace, which modifies U) */

return vcomb(s->kt,

e>0 ? trace(level, P, vcomb(eta, D, vcomb(eta*d-sqrt(e), N, black)))

: black,

vcomb(s->ks, trace(level, P, vcomb(2*d, N, D)),

vcomb(s->kd, color, vcomb(s->kl, U, black))));

}

main()

{

vec A,B,result;

int level = 1;

result = trace(1,A,B);

}

sphere intersect profile

/* intersect_profile.c */

#define TOL 1e-7

#define AMBIENT vec U, black, amb

#define SPHERE struct sphere {vec cen, color; double rad, kd, ks, kt, kl, ir;}*s, *best, sph[]

typedef struct {double x, y, z;} vec;

#include "ray.h"

double u, b, tmin, sqrt(), tan();

double vdot(A, B)

vec A, B;

{

return A.x*B.x + A.y*B.y + A.z*B.z;

}

vec vcomb(a, A, B) /* aA+B */

double a;

vec A, B;

{

B.x += a*A.x;

B.y += a*A.y;

B.z += a*A.z;

return B;

}

vec vunit(A)

vec A;

{

return vcomb(1./sqrt(vdot(A, A)), A, black);

}

struct sphere *intersect(P, D)

vec P, D;

{

best = 0;

tmin = 1e30;

s = sph+NSPHERE;

while (s-->sph)

b = vdot(D, U = vcomb(-1., P, s->cen)),

u = b*b-vdot(U, U)+s->rad*s->rad,

u = u>0 ? sqrt(u) : 1e31,

u = b-u>TOL ? b-u : b+u,

tmin = u>=TOL && u<tmin ?

best = s, u : tmin;

return best;

}

main()

{

struct sphere *result;

vec A,B;

result = intersect(A,B);

}

main_profile.c

/* main_profile.c */

#define TOL 1e-7

#define AMBIENT vec U, black, amb

#define SPHERE struct sphere {vec cen, color; double rad, kd, ks, kt, kl, ir;}*s, *best, sph[]

typedef struct {double x, y, z;} vec;

#include "ray.h"

yx;

#include <stdio.h>

#include <math.h>

double u, b, tmin, sqrt(), tan();

int vdot_counter=0, vunit_counter=0, inter_counter=0, trace_counter=0, vcomb_counter=0;

double vdot(A, B)

vec A, B;

{

vdot_counter++;

return A.x*B.x + A.y*B.y + A.z*B.z;

}

vec vcomb(a, A, B) /* aA+B */

double a;

vec A, B;

{

vcomb_counter++;

B.x += a*A.x;

B.y += a*A.y;

B.z += a*A.z;

return B;

}

vec vunit(A)

vec A;

{

vunit_counter++;

return vcomb(1./sqrt(vdot(A, A)), A, black);

}

struct sphere *intersect(P, D)

vec P, D;

{

inter_counter++;

best = 0;

tmin = 1e30;

s = sph+NSPHERE;

while (s-->sph)

b = vdot(D, U = vcomb(-1., P, s->cen)),

u = b*b-vdot(U, U)+s->rad*s->rad,

u = u>0 ? sqrt(u) : 1e31,

u = b-u>TOL ? b-u : b+u,

tmin = u>=TOL && u<tmin ?

best = s, u : tmin;

return best;

}

vec trace(level, P, D)

vec P, D;

{

double d, eta, e;

vec N, color;

struct sphere *s, *l;

trace_counter++;

if (!level--) return black;

if (s = intersect(P, D));

else return amb;

color = amb;

eta = s->ir;

d = -vdot(D, N = vunit(vcomb(-1., P = vcomb(tmin, D, P), s->cen)));

if (d<0)

N = vcomb(-1., N, black),

eta = 1/eta,

d = -d;

l = sph+NSPHERE;

while (l-->sph)

if ((e = l->kl*vdot(N, U = vunit(vcomb(-1., P, l->cen)))) > 0 &&

intersect(P, U)==l)

color = vcomb(e, l->color, color);

U = s->color;

color.x *= U.x;

color.y *= U.y;

color.z *= U.z;

e = 1-eta*eta*(1-d*d);

/* the following is non-portable: we assume right to left arg evaluation.

* (use U before call to trace, which modifies U) */

return vcomb(s->kt,

e>0 ? trace(level, P, vcomb(eta, D, vcomb(eta*d-sqrt(e), N, black)))

: black,

vcomb(s->ks, trace(level, P, vcomb(2*d, N, D)),

vcomb(s->kd, color, vcomb(s->kl, U, black))));

}

main()

{

vec result_vunit, result_trace;

while (yx<SIZE*SIZE)

{

U.x = yx%SIZE-SIZE/2;

U.z = SIZE/2-yx++/SIZE;

U.y = SIZE/0.4456;

result_vunit = vunit(U);

result_trace = trace(DEPTH, black, result_vunit);

U = vcomb(255.,result_trace, black);

/* printf("%.0f %.0f %.0f\n",U); */

/*

U = vcomb(255., trace(DEPTH, black, vunit(U)), black);

*/

}

printf("vunit_counter = %d\nvdot_counter = %d\nvcomb_counter = %i\n",vunit_counter, vdot_counter, vcomb_counter);

printf("trace_counter = %d\ninter_counter = %d\n",trace_counter, inter_counter);

}

Profiling Data

Results for Macro Based Modules

vdot macro profile

(dlxsim) load vdot_profile_macro.s

Heap (for malloc) begins at 0x39C

(dlxsim) go _main

TRAP #0 received

(dlxsim) stats

Memory size: 65536 bytes.

Floating Point Hardware Configuration

1 add/subtract units, latency = 2 cycles

1 divide units, latency = 19 cycles

1 multiply units, latency = 5 cycles

Load Stalls = 0

Floating Point Stalls = 9

No branch instructions executed.

Pending Floating Point Operations:

none.

INTEGER OPERATIONS

==================

ADD 3 ADDI 1 ADDU 0 ADDUI 1

AND 0 ANDI 0 BEQZ 0 BFPF 0

BFPT 0 BNEZ 0 J 0 JAL 1

JALR 0 JR 0 LB 0 LBU 0

LD 9 LF 0 LH 0 LHI 1

LHU 0 LW 2 MOVD 0 MOVF 0

MOVFP2I 0 MOVI2FP 0 MOVI2S 0 MOVS2I 0

OR 0 ORI 0 RFE 0 SB 0

SD 4 SEQ 0 SEQI 0 SEQU 0

SEQUI 0 SF 0 SGE 0 SGEI 0

SGEU 0 SGEUI 0 SGT 0 SGTI 0

SGTU 0 SGTUI 0 SH 0 SLE 0

SLEI 0 SLEU 0 SLEUI 0 SLL 0

SLLI 0 SLT 0 SLTI 0 SLTU 0

SLTUI 0 SNE 0 SNEI 0 SNEU 0

SNEUI 0 SRA 0 SRAI 0 SRL 0

SRLI 0 SUB 0 SUBI 0 SUBU 0

SUBUI 0 SW 2 TRAP 1 XOR 0

XORI 0 NOP 12

Total integer operations = 37

FLOATING POINT OPERATIONS

=========================

ADDD 2 ADDF 0 CVTD2F 0 CVTD2I 0

CVTF2D 0 CVTF2I 0 CVTI2D 0 CVTI2F 0

DIV 0 DIVD 0 DIVF 0 DIVU 0

EQD 0 EQF 0 GED 0 GEF 0

GTD 0 GTF 0 LED 0 LEF 0

LTD 0 LTF 0 MULT 0 MULTD 3

MULTF 0 MULTU 0 NED 0 NEF 0

SUBD 0 SUBF 0

Total floating point operations = 5

Total operations = 42

Total cycles = 51

vcomb macro profile

(dlxsim) load vcomb_profile_macro.s

Heap (for malloc) begins at 0x3B4

(dlxsim) go _main

TRAP #0 received

(dlxsim) stats

Memory size: 65536 bytes.

Floating Point Hardware Configuration

1 add/subtract units, latency = 2 cycles

1 divide units, latency = 19 cycles

1 multiply units, latency = 5 cycles

Load Stalls = 0

Floating Point Stalls = 9

No branch instructions executed.

Pending Floating Point Operations:

none.

INTEGER OPERATIONS

==================

ADD 3 ADDI 1 ADDU 0 ADDUI 1

AND 0 ANDI 0 BEQZ 0 BFPF 0

BFPT 0 BNEZ 0 J 0 JAL 1

JALR 0 JR 0 LB 0 LBU 0

LD 11 LF 0 LH 0 LHI 1

LHU 0 LW 2 MOVD 0 MOVF 0

MOVFP2I 0 MOVI2FP 0 MOVI2S 0 MOVS2I 0

OR 0 ORI 0 RFE 0 SB 0

SD 5 SEQ 0 SEQI 0 SEQU 0

SEQUI 0 SF 0 SGE 0 SGEI 0

SGEU 0 SGEUI 0 SGT 0 SGTI 0

SGTU 0 SGTUI 0 SH 0 SLE 0

SLEI 0 SLEU 0 SLEUI 0 SLL 0

SLLI 0 SLT 0 SLTI 0 SLTU 0

SLTUI 0 SNE 0 SNEI 0 SNEU 0

SNEUI 0 SRA 0 SRAI 0 SRL 0

SRLI 0 SUB 0 SUBI 0 SUBU 0

SUBUI 0 SW 2 TRAP 1 XOR 0

XORI 0 NOP 14

Total integer operations = 42

FLOATING POINT OPERATIONS

=========================

ADDD 3 ADDF 0 CVTD2F 0 CVTD2I 0

CVTF2D 0 CVTF2I 0 CVTI2D 0 CVTI2F 0

DIV 0 DIVD 0 DIVF 0 DIVU 0

EQD 0 EQF 0 GED 0 GEF 0

GTD 0 GTF 0 LED 0 LEF 0

LTD 0 LTF 0 MULT 0 MULTD 3

MULTF 0 MULTU 0 NED 0 NEF 0

SUBD 0 SUBF 0

Total floating point operations = 6

Total operations = 48

Total cycles = 57

vunit macro profile

(dlxsim) load vunit_profile_macro.s

Heap (for malloc) begins at 0x530

(dlxsim) go _main

TRAP #0 received

(dlxsim) stats

Memory size: 65536 bytes.

Floating Point Hardware Configuration

1 add/subtract units, latency = 2 cycles

1 divide units, latency = 19 cycles

1 multiply units, latency = 5 cycles

Load Stalls = 6

Floating Point Stalls = 66

No branch instructions executed.

Pending Floating Point Operations:

none.

INTEGER OPERATIONS

==================

ADD 6 ADDI 2 ADDU 0 ADDUI 4

AND 0 ANDI 0 BEQZ 0 BFPF 0

BFPT 0 BNEZ 0 J 0 JAL 4

JALR 0 JR 3 LB 0 LBU 0

LD 18 LF 0 LH 0 LHI 4

LHU 0 LW 18 MOVD 0 MOVF 0

MOVFP2I 0 MOVI2FP 3 MOVI2S 0 MOVS2I 0

OR 0 ORI 0 RFE 0 SB 0

SD 7 SEQ 0 SEQI 0 SEQU 0

SEQUI 0 SF 0 SGE 0 SGEI 0

SGEU 0 SGEUI 0 SGT 0 SGTI 0

SGTU 0 SGTUI 0 SH 0 SLE 0

SLEI 0 SLEU 0 SLEUI 0 SLL 0

SLLI 0 SLT 0 SLTI 0 SLTU 0

SLTUI 0 SNE 0 SNEI 0 SNEU 0

SNEUI 0 SRA 0 SRAI 0 SRL 0

SRLI 0 SUB 0 SUBI 3 SUBU 0

SUBUI 0 SW 18 TRAP 4 XOR 0

XORI 0 NOP 37

Total integer operations = 131

FLOATING POINT OPERATIONS

=========================

ADDD 5 ADDF 0 CVTD2F 0 CVTD2I 0

CVTF2D 0 CVTF2I 0 CVTI2D 3 CVTI2F 0

DIV 0 DIVD 3 DIVF 0 DIVU 0

EQD 0 EQF 0 GED 0 GEF 0

GTD 0 GTF 0 LED 0 LEF 0

LTD 0 LTF 0 MULT 0 MULTD 6

MULTF 0 MULTU 0 NED 0 NEF 0

SUBD 0 SUBF 0

Total floating point operations = 17

Total operations = 148

Total cycles = 220

main profile with vdot as a macro

(dlxsim) load main_profile_vdot_macro.s

Heap (for malloc) begins at 0x2744

(dlxsim) go _main

TRAP #0 received

(dlxsim) stats

Memory size: 65536 bytes.

Floating Point Hardware Configuration

1 add/subtract units, latency = 2 cycles

1 divide units, latency = 19 cycles

1 multiply units, latency = 5 cycles

Load Stalls = 68613

Floating Point Stalls = 46080

Branches: total 4097, taken 2049 (50.01%), untaken 2048 (49.99%)

Pending Floating Point Operations:

none.

INTEGER OPERATIONS

==================

ADD 18435 ADDI 23555 ADDU 0 ADDUI 21513

AND 0 ANDI 0 BEQZ 0 BFPF 0

BFPT 0 BNEZ 4097 J 5120 JAL 5123

JALR 0 JR 5122 LB 0 LBU 0

LD 35841 LF 0 LH 0 LHI 21513

LHU 0 LW 115726 MOVD 0 MOVF 0

MOVFP2I 0 MOVI2FP 4096 MOVI2S 0 MOVS2I 0

OR 0 ORI 0 RFE 0 SB 0

SD 20481 SEQ 0 SEQI 0 SEQU 0

SEQUI 0 SF 0 SGE 2048 SGEI 0

SGEU 0 SGEUI 0 SGT 1025 SGTI 0

SGTU 0 SGTUI 0 SH 0 SLE 0

SLEI 0 SLEU 0 SLEUI 0 SLL 0

SLLI 1024 SLT 0 SLTI 0 SLTU 0

SLTUI 0 SNE 1024 SNEI 0 SNEU 0

SNEUI 0 SRA 0 SRAI 2048 SRL 0

SRLI 0 SUB 2048 SUBI 5122 SUBU 0

SUBUI 0 SW 107535 TRAP 1027 XOR 0

XORI 0 NOP 90128

Total integer operations = 493651

FLOATING POINT OPERATIONS

=========================

ADDD 8192 ADDF 0 CVTD2F 0 CVTD2I 0

CVTF2D 0 CVTF2I 0 CVTI2D 2048 CVTI2F 0

DIV 0 DIVD 1024 DIVF 0 DIVU 0

EQD 0 EQF 0 GED 0 GEF 0

GTD 0 GTF 0 LED 0 LEF 0

LTD 0 LTF 0 MULT 0 MULTD 9216

MULTF 0 MULTU 0 NED 0 NEF 0

SUBD 0 SUBF 0

Total floating point operations = 20480

Total operations = 514131

Total cycles = 628824

Results for Non-Macro Based Modules

vdot profile

(dlxsim) load vdot_profile.s

Heap (for malloc) begins at 0x464

(dlxsim) go _main

TRAP #0 received

(dlxsim) stats

Memory size: 65536 bytes.

Floating Point Hardware Configuration

1 add/subtract units, latency = 2 cycles

1 divide units, latency = 19 cycles

1 multiply units, latency = 5 cycles

Load Stalls = 12

Floating Point Stalls = 9

No branch instructions executed.

Pending Floating Point Operations:

none.

INTEGER OPERATIONS

==================

ADD 5 ADDI 3 ADDU 0 ADDUI 1

AND 0 ANDI 0 BEQZ 0 BFPF 0

BFPT 0 BNEZ 0 J 1 JAL 2

JALR 0 JR 1 LB 0 LBU 0

LD 10 LF 0 LH 0 LHI 1

LHU 0 LW 17 MOVD 0 MOVF 0

MOVFP2I 2 MOVI2FP 0 MOVI2S 0 MOVS2I 0

OR 0 ORI 0 RFE 0 SB 0

SD 4 SEQ 0 SEQI 0 SEQU 0

SEQUI 0 SF 0 SGE 0 SGEI 0

SGEU 0 SGEUI 0 SGT 0 SGTI 0

SGTU 0 SGTUI 0 SH 0 SLE 0

SLEI 0 SLEU 0 SLEUI 0 SLL 0

SLLI 0 SLT 0 SLTI 0 SLTU 0

SLTUI 0 SNE 0 SNEI 0 SNEU 0

SNEUI 0 SRA 0 SRAI 0 SRL 0

SRLI 0 SUB 0 SUBI 1 SUBU 0

SUBUI 0 SW 19 TRAP 1 XOR 0

XORI 0 NOP 19

Total integer operations = 87

FLOATING POINT OPERATIONS

=========================

ADDD 2 ADDF 0 CVTD2F 0 CVTD2I 0

CVTF2D 0 CVTF2I 0 CVTI2D 0 CVTI2F 0

DIV 0 DIVD 0 DIVF 0 DIVU 0

EQD 0 EQF 0 GED 0 GEF 0

GTD 0 GTF 0 LED 0 LEF 0

LTD 0 LTF 0 MULT 0 MULTD 3

MULTF 0 MULTU 0 NED 0 NEF 0

SUBD 0 SUBF 0

Total floating point operations = 5

Total operations = 92

Total cycles = 113

vcomb profile

(dlxsim) load vcomb_profile.s

Heap (for malloc) begins at 0x5FC

(dlxsim) go _main

TRAP #0 received

(dlxsim) stats

Memory size: 65536 bytes.

Floating Point Hardware Configuration

1 add/subtract units, latency = 2 cycles

1 divide units, latency = 19 cycles

1 multiply units, latency = 5 cycles

Load Stalls = 18

Floating Point Stalls = 9

No branch instructions executed.

Pending Floating Point Operations:

none.

INTEGER OPERATIONS

==================

ADD 6 ADDI 4 ADDU 0 ADDUI 2

AND 0 ANDI 0 BEQZ 0 BFPF 0

BFPT 0 BNEZ 0 J 1 JAL 2

JALR 0 JR 1 LB 0 LBU 0

LD 11 LF 0 LH 0 LHI 2

LHU 0 LW 30 MOVD 0 MOVF 0

MOVFP2I 0 MOVI2FP 0 MOVI2S 0 MOVS2I 0

OR 0 ORI 0 RFE 0 SB 0

SD 5 SEQ 0 SEQI 0 SEQU 0

SEQUI 0 SF 0 SGE 0 SGEI 0

SGEU 0 SGEUI 0 SGT 0 SGTI 0

SGTU 0 SGTUI 0 SH 0 SLE 0

SLEI 0 SLEU 0 SLEUI 0 SLL 0

SLLI 0 SLT 0 SLTI 0 SLTU 0

SLTUI 0 SNE 0 SNEI 0 SNEU 0

SNEUI 0 SRA 0 SRAI 0 SRL 0

SRLI 0 SUB 0 SUBI 1 SUBU 0

SUBUI 0 SW 30 TRAP 1 XOR 0

XORI 0 NOP 26

Total integer operations = 122

FLOATING POINT OPERATIONS

=========================

ADDD 3 ADDF 0 CVTD2F 0 CVTD2I 0

CVTF2D 0 CVTF2I 0 CVTI2D 0 CVTI2F 0

DIV 0 DIVD 0 DIVF 0 DIVU 0

EQD 0 EQF 0 GED 0 GEF 0

GTD 0 GTF 0 LED 0 LEF 0

LTD 0 LTF 0 MULT 0 MULTD 3

MULTF 0 MULTU 0 NED 0 NEF 0

SUBD 0 SUBF 0

Total floating point operations = 6

Total operations = 128

Total cycles = 155

vcomb without structures profile

(dlxsim) load vcomb_profile_nostruct.s

Heap (for malloc) begins at 0x83C

(dlxsim) go _main

TRAP #0 received

(dlxsim) stats

Memory size: 65536 bytes.

Floating Point Hardware Configuration

1 add/subtract units, latency = 2 cycles

1 divide units, latency = 19 cycles

1 multiply units, latency = 5 cycles

Load Stalls = 0

Floating Point Stalls = 9

No branch instructions executed.

Pending Floating Point Operations:

none.

INTEGER OPERATIONS

==================

ADD 5 ADDI 3 ADDU 0 ADDUI 5

AND 0 ANDI 0 BEQZ 0 BFPF 0

BFPT 0 BNEZ 0 J 0 JAL 2

JALR 0 JR 1 LB 0 LBU 0

LD 11 LF 0 LH 0 LHI 5

LHU 0 LW 21 MOVD 0 MOVF 0

MOVFP2I 0 MOVI2FP 0 MOVI2S 0 MOVS2I 0

OR 0 ORI 0 RFE 0 SB 0

SD 5 SEQ 0 SEQI 0 SEQU 0

SEQUI 0 SF 0 SGE 0 SGEI 0

SGEU 0 SGEUI 0 SGT 0 SGTI 0

SGTU 0 SGTUI 0 SH 0 SLE 0

SLEI 0 SLEU 0 SLEUI 0 SLL 0

SLLI 0 SLT 0 SLTI 0 SLTU 0

SLTUI 0 SNE 0 SNEI 0 SNEU 0

SNEUI 0 SRA 0 SRAI 0 SRL 0

SRLI 0 SUB 0 SUBI 1 SUBU 0

SUBUI 0 SW 21 TRAP 1 XOR 0

XORI 0 NOP 35

Total integer operations = 116

FLOATING POINT OPERATIONS

=========================

ADDD 3 ADDF 0 CVTD2F 0 CVTD2I 0

CVTF2D 0 CVTF2I 0 CVTI2D 0 CVTI2F 0

DIV 0 DIVD 0 DIVF 0 DIVU 0

EQD 0 EQF 0 GED 0 GEF 0

GTD 0 GTF 0 LED 0 LEF 0

LTD 0 LTF 0 MULT 0 MULTD 3

MULTF 0 MULTU 0 NED 0 NEF 0

SUBD 0 SUBF 0

Total floating point operations = 6

Total operations = 122

Total cycles = 131

vunit profile

(dlxsim) load vunit_profile.s

Heap (for malloc) begins at 0x750

(dlxsim) go _main

TRAP #0 received

(dlxsim) stats

Memory size: 65536 bytes.

Floating Point Hardware Configuration

1 add/subtract units, latency = 2 cycles

1 divide units, latency = 19 cycles

1 multiply units, latency = 5 cycles

Load Stalls = 36

Floating Point Stalls = 36

No branch instructions executed.

Pending Floating Point Operations:

none.

INTEGER OPERATIONS

==================

ADD 14 ADDI 9 ADDU 0 ADDUI 2

AND 0 ANDI 0 BEQZ 0 BFPF 0

BFPT 0 BNEZ 0 J 3 JAL 5

JALR 0 JR 4 LB 0 LBU 0

LD 24 LF 0 LH 0 LHI 2

LHU 0 LW 54 MOVD 0 MOVF 0

MOVFP2I 2 MOVI2FP 1 MOVI2S 0 MOVS2I 0

OR 0 ORI 0 RFE 0 SB 0

SD 12 SEQ 0 SEQI 0 SEQU 0

SEQUI 0 SF 0 SGE 0 SGEI 0

SGEU 0 SGEUI 0 SGT 0 SGTI 0

SGTU 0 SGTUI 0 SH 0 SLE 0

SLEI 0 SLEU 0 SLEUI 0 SLL 0

SLLI 0 SLT 0 SLTI 0 SLTU 0

SLTUI 0 SNE 0 SNEI 0 SNEU 0

SNEUI 0 SRA 0 SRAI 0 SRL 0

SRLI 0 SUB 0 SUBI 4 SUBU 0

SUBUI 0 SW 56 TRAP 2 XOR 0

XORI 0 NOP 52

Total integer operations = 246

FLOATING POINT OPERATIONS

=========================

ADDD 5 ADDF 0 CVTD2F 0 CVTD2I 0

CVTF2D 0 CVTF2I 0 CVTI2D 1 CVTI2F 0

DIV 0 DIVD 1 DIVF 0 DIVU 0

EQD 0 EQF 0 GED 0 GEF 0

GTD 0 GTF 0 LED 0 LEF 0

LTD 0 LTF 0 MULT 0 MULTD 6

MULTF 0 MULTU 0 NED 0 NEF 0

SUBD 0 SUBF 0

Total floating point operations = 13

Total operations = 259

Total cycles = 331

trace profile

(dlxsim) load trace_profile.s

Heap (for malloc) begins at 0x2278

(dlxsim) go _main

TRAP #0 received

(dlxsim) stats

Memory size: 65536 bytes.

Floating Point Hardware Configuration

1 add/subtract units, latency = 2 cycles

1 divide units, latency = 19 cycles

1 multiply units, latency = 5 cycles

Load Stalls = 19

Floating Point Stalls = 0

Branches: total 1, taken 0 (0.00%), untaken 1 (100.00%)

Pending Floating Point Operations:

none.

INTEGER OPERATIONS

==================

ADD 6 ADDI 8 ADDU 0 ADDUI 2

AND 0 ANDI 0 BEQZ 0 BFPF 0

BFPT 0 BNEZ 1 J 1 JAL 2

JALR 0 JR 1 LB 0 LBU 0

LD 3 LF 0 LH 0 LHI 2

LHU 0 LW 34 MOVD 0 MOVF 0

MOVFP2I 0 MOVI2FP 0 MOVI2S 0 MOVS2I 0

OR 0 ORI 0 RFE 0 SB 0

SD 3 SEQ 0 SEQI 0 SEQU 0

SEQUI 0 SF 0 SGE 0 SGEI 0

SGEU 0 SGEUI 0 SGT 0 SGTI 0

SGTU 0 SGTUI 0 SH 0 SLE 0

SLEI 0 SLEU 0 SLEUI 0 SLL 0

SLLI 0 SLT 0 SLTI 0 SLTU 0

SLTUI 0 SNE 1 SNEI 0 SNEU 0

SNEUI 0 SRA 0 SRAI 0 SRL 0

SRLI 0 SUB 0 SUBI 1 SUBU 0

SUBUI 0 SW 35 TRAP 1 XOR 0

XORI 0 NOP 21

Total integer operations = 122

FLOATING POINT OPERATIONS

=========================

ADDD 0 ADDF 0 CVTD2F 0 CVTD2I 0

CVTF2D 0 CVTF2I 0 CVTI2D 0 CVTI2F 0

DIV 0 DIVD 0 DIVF 0 DIVU 0

EQD 0 EQF 0 GED 0 GEF 0

GTD 0 GTF 0 LED 0 LEF 0

LTD 0 LTF 0 MULT 0 MULTD 0

MULTF 0 MULTU 0 NED 0 NEF 0

SUBD 0 SUBF 0

Total floating point operations = 0

Total operations = 122

Total cycles = 141

sphere intersect profile

(dlxsim) load intersect_profile.s

Heap (for malloc) begins at 0x1458

(dlxsim) go _main

TRAP #0 received

(dlxsim) stats

Memory size: 65536 bytes.

Floating Point Hardware Configuration

1 add/subtract units, latency = 2 cycles

1 divide units, latency = 19 cycles

1 multiply units, latency = 5 cycles

Load Stalls = 242

Floating Point Stalls = 185

Branches: total 26, taken 8 (30.77%), untaken 18 (69.23%)

Pending Floating Point Operations:

none.

INTEGER OPERATIONS

==================

ADD 50 ADDI 51 ADDU 0 ADDUI 160

AND 0 ANDI 0 BEQZ 0 BFPF 5

BFPT 15 BNEZ 6 J 29 JAL 22

JALR 0 JR 21 LB 0 LBU 0

LD 233 LF 0 LH 0 LHI 160

LHU 0 LW 341 MOVD 0 MOVF 0

MOVFP2I 20 MOVI2FP 10 MOVI2S 0 MOVS2I 0

OR 0 ORI 0 RFE 0 SB 0

SD 78 SEQ 0 SEQI 0 SEQU 0

SEQUI 0 SF 0 SGE 0 SGEI 0

SGEU 0 SGEUI 0 SGT 0 SGTI 0

SGTU 0 SGTUI 0 SH 0 SLE 0

SLEI 0 SLEU 6 SLEUI 0 SLL 0

SLLI 0 SLT 0 SLTI 0 SLTU 0

SLTUI 0 SNE 0 SNEI 0 SNEU 0

SNEUI 0 SRA 0 SRAI 0 SRL 0

SRLI 0 SUB 0 SUBI 21 SUBU 0

SUBUI 0 SW 342 TRAP 6 XOR 0

XORI 0 NOP 414

Total integer operations = 1990

FLOATING POINT OPERATIONS

=========================

ADDD 45 ADDF 0 CVTD2F 0 CVTD2I 0

CVTF2D 0 CVTF2I 0 CVTI2D 0 CVTI2F 0

DIV 0 DIVD 0 DIVF 0 DIVU 0

EQD 0 EQF 0 GED 0 GEF 0

GTD 0 GTF 0 LED 10 LEF 0

LTD 10 LTF 0 MULT 0 MULTD 55

MULTF 0 MULTU 0 NED 0 NEF 0

SUBD 10 SUBF 0

Total floating point operations = 130

Total operations = 2120

Total cycles = 2547

main profile

(dlxsim) load main_profile.s

Heap (for malloc) begins at 0x283C

(dlxsim) go _main

TRAP #0 received

(dlxsim) stats

Memory size: 65536 bytes.

Floating Point Hardware Configuration

1 add/subtract units, latency = 2 cycles

1 divide units, latency = 19 cycles

1 multiply units, latency = 5 cycles

Load Stalls = 81925

Floating Point Stalls = 46080

Branches: total 4097, taken 2049 (50.01%), untaken 2048 (49.99%)

Pending Floating Point Operations:

none.

INTEGER OPERATIONS

==================

ADD 23555 ADDI 26627 ADDU 0 ADDUI 20489

AND 0 ANDI 0 BEQZ 0 BFPF 0

BFPT 0 BNEZ 4097 J 6144 JAL 6147

JALR 0 JR 6146 LB 0 LBU 0

LD 38913 LF 0 LH 0 LHI 20489

LHU 0 LW 131086 MOVD 0 MOVF 0

MOVFP2I 2048 MOVI2FP 4096 MOVI2S 0 MOVS2I 0

OR 0 ORI 0 RFE 0 SB 0

SD 22529 SEQ 0 SEQI 0 SEQU 0

SEQUI 0 SF 0 SGE 2048 SGEI 0

SGEU 0 SGEUI 0 SGT 1025 SGTI 0

SGTU 0 SGTUI 0 SH 0 SLE 0

SLEI 0 SLEU 0 SLEUI 0 SLL 0

SLLI 1024 SLT 0 SLTI 0 SLTU 0

SLTUI 0 SNE 1024 SNEI 0 SNEU 0

SNEUI 0 SRA 0 SRAI 2048 SRL 0

SRLI 0 SUB 2048 SUBI 6146 SUBU 0

SUBUI 0 SW 123919 TRAP 1027 XOR 0

XORI 0 NOP 97296

Total integer operations = 549971

FLOATING POINT OPERATIONS

=========================

ADDD 8192 ADDF 0 CVTD2F 0 CVTD2I 0

CVTF2D 0 CVTF2I 0 CVTI2D 2048 CVTI2F 0

DIV 0 DIVD 1024 DIVF 0 DIVU 0

EQD 0 EQF 0 GED 0 GEF 0

GTD 0 GTF 0 LED 0 LEF 0

LTD 0 LTF 0 MULT 0 MULTD 9216

MULTF 0 MULTU 0 NED 0 NEF 0

SUBD 0 SUBF 0

Total floating point operations = 20480

Total operations = 570451

Total cycles = 698456