Simulations of Relativistic Extragalactic Jets
G. Comer Duncan  Philip A. Hughes  Mark A. Miller
Proposal Abstract
The major focus of the present proposal for OSC resources is the simulation
of relativistic jets using our 3D, Adaptive Mesh Refinement hydrodynamic
code. Over the last two years we have performed numerous tests on the code
with the goal of making it as efficient as the underlying algorithms permit.
The code is now ready to be used to perform research-level simulations on the
OSC Cray T94 and the Origin 2000 machines. The computational requirements of
such simulations are significant in both memory and wall clock hours, and the
3D simulations requested herein require facilities of the caliber of those
at the OSC, principally due to the memory requirements. The present proposal
requests OSC computing and visualization resources to study a number of
challenging relativistic jet problems: the nonlinear development of
instabilities, the nature of jet internal structure, and the evolution of
morphology and dynamics of both deflected and precessing 3D relativistic
jets. With the results of these simulations we can make major strides toward
an understanding of the structure evident in bent jets, assess their
stability, and through comparison with observational data, probe the internal
physical conditions of such flows.
Description of Project Problem
Over the last decade the quantity and quality of images of extragalactic
relativistic jets have increased to a point where it has become evident
that curvature of light-year-scale flows is the norm, not the exception
(e.g., Wardle et al. 1994). Some of this curvature may be `apparent', resulting
from a more modestly curved flow seen close to the line of sight. Nevertheless,
such flows must possess some intrinsic curvature, and thus their 3D morphology
must be addressed.
This raises both questions and possibilities: how can highly relativistic
flows suffer significant curvature, and yet retain their integrity? It
has been argued (Begelman, Blandford & Rees 1984) that sub-light-year-scale
flows are highly dissipative, and should radiate a significant fraction
of their flow energy if subjected to a perturbation such as bending. How
do transverse shocks propagate along a curved flow? Will the shock plane
rotate with respect to the flow axis; will the shock strengthen or weaken?
What role does flow curvature have to play in explaining phenomena such
as stationary knots between which superluminal components propagate (Shaffer
et al. 1987), knots which brighten after an initial fading (Mutel et al.
1990), and changing component speed (Lobanov & Zensus 1996)? Numerous
lines of evidence point convincingly to the occurrence of oblique shocks
(e.g., Heinz & Begelman 1997; Lister & Marscher 1999; Marscher
et al. 1997; Polatidis & Wilkinson 1998); how do they form and evolve?
All these issues are amenable to study through hydrodynamic simulation,
but all demand that such simulations be 3D.
Furthermore, it has become evident over the last five years that such
highly energetic flows are also found in Galactic objects with stellar
mass `engines'. In the Galactic superluminals GRS 1915+105 (Mirabel &
Rodríguez 1994, 1995) and GRO J1655-40 (Hjellming & Rupen 1995;
Tingay et al. 1995) the observed motions indicate jet flow speeds up to
92% of the speed of light. There is compelling evidence from the observation
of optical afterglows that gamma-ray bursts are of cosmological origin, and
whether produced by the mergers of compact objects or by accretion-induced
collapse (AIC), simple relativistic fireball models seem ruled out, with the
data strongly favoring highly relativistic jets (Dar 1998). Thus a detailed
understanding of the
dynamics of collimated relativistic flows has wide application in astrophysics.
This grant application requests resources to perform an initial exploration
of the key issues pertaining to the uniquely 3D dynamics of relativistic
astrophysical flows.
Rationale for Machine Use
Since 1993 we have used machines at OSC and elsewhere in a continuing
project to study the dynamics and radiation properties of relativistic
extragalactic jets. Our first results (Duncan & Hughes 1994) constituted
the first published high-resolution axisymmetric simulations of these objects.
That work demonstrated the stability of highly relativistic flows, and
suggested an interpretation of a longknown difference between two classes
of astrophysical flows in terms of flow speed-dependent stability. That
work was performed on a combination of computers, including workstations
at BGSU, the University of Michigan, and the OSC Cray and OVL Onyx machines.
In addition, OSC personnel helped us make movies of the 2D simulations
which proved very useful in both the assessment of the science and the
communication of the work.
Some representative graphics illustrating the jet dynamics are shown
in the following figures.
That work has subsequently developed along four parallel tracks:

A detailed exploration of the morphology and dynamics of these flows within
the parameter space of flow speed, Mach number, and gas temperature. This
work has enabled us to understand the difference in character between flows
with and without relativistic speed. See, e.g., Rosen, Hughes, Duncan &
Hardee (1999). The work was done on workstations at the Universities of
Michigan and Alabama, in addition to OSC machines.

A comparison of the results of linear stability analyses with those
of numerical simulation, for both relativistic and nonrelativistic axisymmetric
flows. We can understand the instabilities that are seen in the numerical
simulations if account is taken of the coupling between available modes
and the disturbances that may excite them. See, e.g., Hardee, Rosen, Hughes
& Duncan (1998). This part of the work was done on workstations at
the Universities of Michigan and Alabama, in addition to OSC machines.

Radiation transfer calculations to predict the appearance of these
flows for comparison with observations with the latest and next generation
telescopes. See, e.g., Mioduszewski, Hughes & Duncan (1997). This work
was done on workstations at the University of Michigan and BGSU.

The development and first use of a 3D version of our original 2D
code, in recognition of the many effects that demand a fully 3D study.
We have been developing the 3D version on workstations at Washington University,
St. Louis, the University of Michigan, BGSU, and on the OSC Cray and Origin
2000 over the last two years. Some additional testing has been done on
other machines (Potsdam, NCSA).
The major focus of the present proposal for OSC resources is the 3D relativistic
jet simulations using our Adaptive Mesh Refinement code. Over the last
two years we have performed numerous tests on the code and have endeavored
to make it as efficient as the underlying algorithms permit. Following
this extensive development and testing, the code is ready to be used to
perform research-level simulations on the OSC Cray T94 and the Origin 2000.
The computational requirements of such simulations are significant in both
memory and wall clock hours, and the 3D simulations requested herein require
facilities of the caliber of those at the OSC, principally due to the memory
requirements. The fast, large memory machines such as the Cray T94 and
the Origin 2000 are ideal for our purpose, because currently available
workstations provide insufficient memory for us to achieve anything like
adequate resolution while, as noted below, the use of massively parallel
technology still presents serious challenges to AMR-type codes.
Justification for Amount of Resource Units
Resource Units Per Run
The following is a list of specific projects to be done along with estimates
of the number of RU needed for each part. These estimates are based on
testing done on the OSC Origin 2000 and on the NCSA Origin 2000 cluster.
Specifically, this project plans to

Forge a link with earlier, 2D axisymmetric hydrodynamic simulations, by
exploring for a small subset of earlier 2D simulations those modes of
instability that are suppressed in 2D, but which are expected on the basis
of linear stability analysis to dominate the formation of internal jet
structure in 3D; see the next section for further comments. For these
axisymmetric relativistic jets in 3D Cartesian coordinates we request
a total of three such runs at 150 RU per run. The total requested is 450
RU for this part.

Study the nonlinear development of instability for a small
subset of earlier 2D simulations by perturbing the flow with a periodic
driving force designed to excite the most unstable modes suggested by linear
stability analysis. This project will study the nonlinear development of
instability in Cartesian coordinates of some of the axisymmetric simulations.
We request a total of three such runs at 150 RU per run. The total requested
is 450 RU for this part.

Study the development of internal jet structure, and its subsequent
evolution, due to the interaction of a jet with an ambient density
inhomogeneity, an idealization of the interaction of the jet with an interstellar cloud
or gradient in the diffuse interstellar medium. The deflection of initially
axisymmetric jets in Cartesian coordinates will begin to address the important
issue of which mechanisms are most viable for deflection, help us to understand
the jet's response to bending, the possible maintenance of the jet's integrity
and the development of its internal structure. We request a total of three
such runs at 150 RU per run. The total requested is 450 RU for this
part.

Study the development of internal jet structure and its subsequent
evolution due to precession of the jet inflow, a process strongly
suggested by the S-symmetric kiloparsec-scale sources, and likely to occur
in situations with multiple galactic nuclei. We hope to extract information
on the kinematics of high brightness structures in the complex evolving
flow. We have made some preliminary runs with coarse resolution of the
evolving precessing jet, the results of which are alluded to in the Number
of Runs section of this proposal. Due to the complexity of the flow induced
by the precession this will be a single high resolution run of the code.
We estimate that 500 RU will be needed for this.
The total requested RU for the project is thus: 1850 RU.
Number of Runs
The total number of runs from all four parts of the proposed studies is
of the order of ten, with the exact number determined in part by earlier
results: novel initial results might compel us to invest more time in a few
high-resolution runs, or in a broader exploration of parameter space. Either
way, trial runs on the Origin 2000s at both OSC and NCSA provide us with
a dependable estimate of the resources required.
We note the very important point that because of our use of Adaptive
Mesh Refinement, low resolution 3D runs can be made on workstation class
machines with 128Mb of memory. Such runs have insufficient resolution to
address the questions set out above, but provide invaluable insight into
which initial conditions and parts of parameter space are most appropriate
choices for production runs at OSC. We thus expect to need little or no
experimentation, and will make efficient use of OSC facilities.
Runs will require typically 1-2 Gb of memory (which is available through
a queue, Q95.2g36000, set up at our request in October 1998). Fig. 2
in the graphics link shows the results of an exploratory run with a precessing
jet; this run required 0.7 Gb on the Origin 2000, and employs 2 levels of
refinement on a base grid. Although interesting, and indeed capable of giving
insight into the dynamics of the flow, the resolution is clearly insufficient.
An additional level of refinement alone would lead to untenable memory
requirements, but an additional level in conjunction with more selective
refinement criteria, restricting refinement to key portions of the flow,
would keep the increase in the memory required modest.
Performance Report
Algorithmic Description
Our current calculations assume an inviscid and compressible gas, and an
ideal equation of state with constant adiabatic index. We use a Godunov-type
solver which is a relativistic generalization of the method due to Harten,
Lax, & Van Leer (1983), and Einfeldt (1988), in which the full solution
to the Riemann problem is approximated by two waves separated by a piecewise
constant state. We evolve the mass density $R$, the three components of
the momentum density $M_x$, $M_y$, and $M_z$, and the total energy density
$E$, all measured relative to the laboratory frame. Earlier, 2D axisymmetric
calculations computed the axial and radial components of momentum density,
$M_z$ and $M_r$, in a cylindrical coordinate system. Defining the vector
$U = (R, M_x, M_y, M_z, E)^T$ and the three flux vectors, of which the first is
$$F = \left(R v_x,\; M_x v_x + p,\; M_y v_x,\; M_z v_x,\; (E + p) v_x\right)^T,$$
with $G$ and $H$ defined analogously for the $y$ and $z$ directions,
the conservative form of the relativistic Euler equations is
$$\frac{\partial U}{\partial t} + \frac{\partial F}{\partial x}
+ \frac{\partial G}{\partial y} + \frac{\partial H}{\partial z} = 0.$$
The pressure is given by the ideal gas equation of state
$$p = (\Gamma - 1)(e - n),$$
where $\Gamma$ is the adiabatic index and $n$ and $e$ are the rest-frame mass
and total energy densities, in units with $c = 1$.
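For concreteness, the mapping from rest-frame primitives to the laboratory-frame conserved variables can be sketched as follows (illustrative Python, not our production Fortran 90 code):

```python
import math

# Illustrative sketch (not the production code): lab-frame conserved
# variables (R, Mx, My, Mz, E) from the rest-frame primitives n (mass
# density), e (total energy density, including rest mass) and velocity
# (vx, vy, vz), in units with c = 1.
def primitives_to_conserved(n, e, vx, vy, vz, Gamma=5.0/3.0):
    v2 = vx*vx + vy*vy + vz*vz
    gamma = 1.0/math.sqrt(1.0 - v2)      # Lorentz factor
    p = (Gamma - 1.0)*(e - n)            # ideal-gas equation of state
    w = e + p                            # enthalpy density
    R = gamma*n
    Mx = gamma*gamma*w*vx
    My = gamma*gamma*w*vy
    Mz = gamma*gamma*w*vz
    E = gamma*gamma*w - p
    return R, Mx, My, Mz, E
```

Note that for $v = 0$ the lab-frame and rest-frame quantities coincide, as they must.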
The Godunov-type solvers are well known for their capability as robust,
conservative flow solvers with excellent shock-capturing features. In this
family of solvers one reduces the problem of updating the components of
the vector $U$, averaged over a cell, to the computation of fluxes
at the cell interfaces. In one spatial dimension the part of the update
due to advection of the vector $U$ may be written as
$$U_i^{n+1} = U_i^n - \frac{\Delta t}{\Delta x}
\left(F_{i+1/2} - F_{i-1/2}\right).$$
In the scheme originally devised by Godunov (1959), a fundamental emphasis
is placed on the strategy of decomposing the problem into many local Riemann
problems, one for each pair of values $U_i$ and $U_{i+1}$,
to yield values which allow the computation of the local interface fluxes
$F_{i+1/2}$. In general, an initial discontinuity at $x_{i+1/2}$
due to $U_i$ and $U_{i+1}$
will evolve into four piecewise constant states separated by three waves.
The leftmost and rightmost waves may be either shocks or rarefaction
waves, while the middle wave is always a contact discontinuity. The determination
of these four piecewise constant states can, in general, be achieved only
by iteratively solving nonlinear equations. Thus the computation of the
fluxes necessitates a step which can be computationally expensive. For
this reason much attention has been given to approximate, but sufficiently
accurate, techniques. One notable method is that due to Harten, Lax, &
Van Leer (1983; HLL), in which the middle wave, and the two constant states
that it separates, are replaced by a single piecewise constant state. One
benefit of this approximation, which smears the contact discontinuity somewhat,
is to eliminate the iterative step, thus significantly improving efficiency.
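The two-wave approximation just described leads to the standard HLL interface flux; a minimal sketch (illustrative Python, not our production Fortran 90) is:

```python
# Illustrative sketch of the two-wave HLL/HLLE interface flux. UL, UR are
# the conserved-variable vectors to the left and right of an interface,
# FL, FR the corresponding physical fluxes, and bminus, bplus estimates of
# the fastest left- and right-moving wave speeds.
def hlle_flux(UL, UR, FL, FR, bminus, bplus):
    bm = min(bminus, 0.0)   # clipping makes the formula reduce to pure
    bp = max(bplus, 0.0)    # upwinding when all waves move one way
    return [(bp*fl - bm*fr + bp*bm*(ur - ul)) / (bp - bm)
            for ul, ur, fl, fr in zip(UL, UR, FL, FR)]
```

When both signal speeds are positive the flux reduces to the left (upwind) flux, with no iteration required anywhere.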
However, the HLL method requires accurate estimates of the wave speeds
for the left- and right-moving waves. Einfeldt (1988) analyzed the HLL
method and found good estimates for the wave speeds. The resulting method,
combining the original HLL method with Einfeldt's improvements (the HLLE
method), has been taken as the starting point for our simulations. In our
implementation we use wave speed estimates based on a simple application
of the relativistic addition of velocities formula for the individual components
of the velocities, and the relativistic sound speed
$c_s = [\Gamma p/(e + p)]^{1/2}$, assuming that the waves can be decomposed
into components moving perpendicular to the coordinate directions.
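A sketch of these wave speed estimates, as we read the prescription above (illustrative Python, units with c = 1; not the production code):

```python
import math

# Relativistic sound speed for the constant-Gamma ideal gas, with e the
# rest-frame total energy density (an assumed form consistent with the
# equation of state in the text).
def sound_speed(p, e, Gamma=5.0/3.0):
    return math.sqrt(Gamma*p/(e + p))

# Fastest left/right signal speeds from the relativistic addition of the
# (collinear) flow speed component and plus/minus the sound speed.
def signal_speeds(v, cs):
    bplus = (v + cs)/(1.0 + v*cs)
    bminus = (v - cs)/(1.0 - v*cs)
    return bminus, bplus
```

Both estimates remain causal ($|b_\pm| < 1$) for any $|v| < 1$ and $c_s < 1$, which is the point of using the relativistic addition formula rather than the Newtonian $v \pm c_s$.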
In order to compute the pressure $p$ and sound speed $c_s$
we need the rest frame mass density $n$ and energy density $e$.
However, these quantities are nonlinearly coupled to the components of
the velocity as well as to the laboratory frame variables via the Lorentz
transformation:
$$R = \gamma n, \qquad M_i = \gamma^2 (e + p)\, v_i, \qquad
E = \gamma^2 (e + p) - p,$$
where $\gamma = (1 - v^2)^{-1/2}$
is the Lorentz factor and $v^2 = v_x^2 + v_y^2 + v_z^2$.
When the adiabatic index is constant it is possible to reduce the computation
of $n$, $e$, $v_x$, $v_y$, and $v_z$
to the solution of the following quartic equation:
$$\left[(\Gamma - 1) R\right]^2 v^2 (1 - v^2)
= \left[\Gamma E v - (\Gamma - 1) M v^2 - M\right]^2,$$
where $M = (M_x^2 + M_y^2 + M_z^2)^{1/2}$.
This quartic is solved at each cell several times during the update of
a given mesh using Newton-Raphson iteration.
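This inversion can be sketched as follows (illustrative Python, not the production Fortran 90; the residual f(v) below, when squared, gives the quartic, and Newton-Raphson is applied to it directly):

```python
import math

# Illustrative sketch: recover the flow speed v = |v| from the lab-frame
# quantities R, M = |M|, E by Newton-Raphson, then back out the
# rest-frame n, e and pressure p.  Units with c = 1.
def recover_primitives(R, M, E, Gamma=5.0/3.0, tol=1e-12):
    def f(v):
        return (Gamma*E*v - (Gamma - 1.0)*M*v*v - M
                - (Gamma - 1.0)*R*v*math.sqrt(1.0 - v*v))

    def df(v):
        s = math.sqrt(1.0 - v*v)
        return (Gamma*E - 2.0*(Gamma - 1.0)*M*v
                - (Gamma - 1.0)*R*(s - v*v/s))

    v = min(M / E, 1.0 - 1e-10)          # causal initial guess, v < 1
    for _ in range(50):
        dv = f(v) / df(v)
        v = min(max(v - dv, 0.0), 1.0 - 1e-12)
        if abs(dv) < tol:
            break
    e = E - M*v                          # rest-frame total energy density
    n = R*math.sqrt(1.0 - v*v)           # rest-frame mass density
    p = (Gamma - 1.0)*(e - n)            # ideal-gas equation of state
    return v, n, e, p
```

The guess $v_0 = M/E$ is always causal, since $M < E$ for any physical state, and in practice a handful of iterations suffices.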
Our scheme is generally of second order accuracy, which is achieved
by taking the state variables as piecewise linear in each cell, and computing
fluxes at the half-time step. However, in estimating the laboratory frame
values on each cell boundary, it is possible that through discretization
the lab frame quantities become unphysical, corresponding to rest frame
values with v > 1 or p < 0. At each point where a transformation is needed,
we check that certain conditions on M/E and R/E
are satisfied, and if not, recompute cell interface values in the piecewise
constant approximation. We find that such `fall back to first order' rarely
occurs.
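The fall-back logic can be sketched as follows (illustrative Python; the minmod limiter here is an assumption, since the text above does not name a particular limiter, and the physicality test is left as a caller-supplied predicate):

```python
# Illustrative sketch of the 'fall back to first order' safeguard.
# Interface values come from a piecewise-linear (minmod-limited)
# reconstruction; if the resulting lab-frame state fails the physicality
# test (e.g. conditions on M/E and R/E), the cell reverts to its
# piecewise-constant value.
def minmod(a, b):
    if a*b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b

def interface_state(U_left, U_mid, U_right, physical):
    slope = [minmod(m - l, r - m)
             for l, m, r in zip(U_left, U_mid, U_right)]
    U_face = [m + 0.5*s for m, s in zip(U_mid, slope)]
    # first-order fallback when the reconstructed state is unphysical
    return U_face if physical(U_face) else list(U_mid)
```

The scheme thus stays second-order wherever the reconstruction is safe, degrading only at isolated points.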
The relativistic HLLE (RHLLE) method discussed above constitutes the
basic flow integration scheme on a single mesh. Our past and planned work
also utilizes adaptive mesh refinement (AMR) in order to gain spatial and
temporal resolution.
The AMR algorithm used is a general purpose mesh refinement scheme which
is an outgrowth of original work by Berger (1982) and Berger and Colella
(1989). The AMR method uses a hierarchical collection of grids consisting
of embedded meshes to discretize the flow domain. We have used a scheme
which subdivides the domain into logically rectangular meshes with uniform
spacing in the three coordinate directions, and a fixed refinement ratio.
The AMR algorithm orchestrates i) the flagging of cells which need further
refinement, assembling collections of such cells into meshes; ii) the construction
of boundary zones so that a given mesh is a selfcontained entity consisting
of the interior cells and the needed boundary information; iii) mechanisms
for sweeping over all levels of refinement and over each mesh in a given
level to update the physical variables on each such mesh; and iv) the transfer
of data between various meshes in the hierarchy, with the eventual completed
update of all variables on all meshes to the same final time level. The
adaption process is dynamic so that the AMR algorithm places further resolution
where and when it is needed, as well as removing resolution when it is
no longer required. Adaption occurs in time, as well as in space: the time
step on a refined grid is less than that on the coarser grid, by the refinement
factor for the spatial dimension. More time steps are taken on finer grids,
and the advance of the flow solution is synchronized by interleaving the
integrations at different levels. This helps prevent any interlevel mismatches
that could adversely affect the accuracy of the simulation.
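The interleaved, subcycled time stepping can be sketched structurally (illustrative Python; `step` and `synchronize` are hypothetical placeholders for the single-mesh RHLLE update and the fine-to-coarse data transfer, and the refinement ratio of 2 is an assumed value):

```python
# Structural sketch of subcycled AMR time stepping: each refined level
# takes `ratio` smaller steps per coarse step, so all levels reach the
# same final time before data are synchronized back to the parent level.
def advance_level(hierarchy, level, dt, step, synchronize, ratio=2):
    for mesh in hierarchy[level]:
        step(mesh, dt)                    # single-mesh update at this level
    if level + 1 < len(hierarchy):
        for _ in range(ratio):            # subcycle the finer level
            advance_level(hierarchy, level + 1, dt/ratio,
                          step, synchronize, ratio)
        synchronize(hierarchy, level)     # transfer fine data to coarse
```

The recursion makes the interleaving explicit: a coarse step completes only after all finer levels have been advanced to the same time.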
In order for the AMR method to sense where further refinement is needed,
some monitoring function is required. We have used a combination of the
gradient of the laboratory frame mass density, a test that recognizes the
presence of contact surfaces, and a measure of the cell-to-cell shear in
the flow, the choice of which functions to use being determined by which
part of the flow is of most significance in a given study. Since the tracking
of shock waves is of paramount importance, a buffer ensures the flagging
of extra cells at the edge of meshes, ensuring that important flow structures
do not `leak' out of meshes during the update of the hydrodynamic solution.
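One possible form of such a monitoring function with a buffer is sketched below (illustrative 1D Python; the gradient monitor, threshold, and buffer width are assumed values, not those of our code):

```python
# Illustrative 1D sketch of cell flagging for refinement: flag cells
# whose normalized density gradient exceeds a threshold, then grow the
# flagged set by `buffer` cells so that important flow structures cannot
# leak out of a fine mesh between regridding operations.
def flag_cells(density, threshold=0.1, buffer=2):
    n = len(density)
    flags = [False]*n
    for i in range(1, n - 1):
        grad = abs(density[i+1] - density[i-1]) / (2.0*max(density[i], 1e-30))
        if grad > threshold:
            flags[i] = True
    flagged = [i for i, f in enumerate(flags) if f]
    for i in flagged:                     # buffer zone around each flag
        for j in range(max(0, i - buffer), min(n, i + buffer + 1)):
            flags[j] = True
    return flags
```

Contiguous runs of flagged cells are then assembled into the logically rectangular meshes described above.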
The combined effect of using the RHLLE single mesh solver and the AMR algorithm
results in a very efficient scheme. Where the RHLLE method is unable to
give adequate resolution on a single coarse mesh the AMR algorithm places
more cells, resulting in an excellent overall coverage of the computational
domain.
Vectorization or Parallelization Implementation
While there exist many implementations of distributed Adaptive Mesh Refinement
in 3D (e.g. DAGH, PYRAMID), the problems associated with implementing
an efficient method remain largely unsolved. Specifically, the issues of
load balancing and data locality present large obstacles in constructing
an efficient AMR implementation in a distributed memory environment. We
have based our AMR implementation on a simple Fortran 90 derived data type.
We plan to eventually employ multiple processor technology at two different
levels. First, we plan to utilize recent advances in High Performance Fortran
(HPF) compiler technology in our code. This will enable us to access distributed
memory architectures. Second, we plan to implement the shared memory parallelism
offered by the SGI Origin 2000.
Performance Analysis Reports
Code compilation has been performed with extensive but safe optimization
(typically -O2). We have avoided certain options (e.g., the
-OPT:IEEE_arithmetic=3 option of the Origin 2000) to minimize potential
rounding-error-induced problems. The code makes few calls to intrinsic math functions
and performs no standard tasks such as linear equation solving or fast
Fourier transformation. We have thus not called upon libraries with optimized
functions or packages.
Profiling has been used to very good effect: in an early version of
the code, profiling revealed that interpolation to populate newly generated
grids (in particular, to provide boundary values during advance of the
solution to a common time on all grid levels) took 58.5% of the processor
time. Recoding to perform this interpolation on a grid-by-grid, rather
than point-by-point, basis reduced the time spent in this task to 6.5% in
a fiducial problem, with 77% of the time spent on core calculations pertaining
to the calculation of fluxes and updating conserved variables.
perfex data from a run on the Origin 2000 (a machine well-suited
to the current project) showed a graduated instructions-per-cycle value
of 0.78, somewhat, but not substantially, below 1.0. Graduated loads (and
stores) per issued loads (and stores) were close to 1.0; both L1 and L2
data cache hit rates were above 0.93. In general, perfex indicates
that the code suffers no significant inefficiencies.
Bibliography

1. Begelman, M. C., Blandford, R. D., & Rees, M. J. 1984, Rev. Mod. Phys., 56, 255
2. Berger, M. 1982, Dissertation, Stanford University
3. Berger, M., & Colella, P. 1989, J. Comp. Phys., 82, 67
4. Dar, A. 1998, ApJ (Letters), 500, L93
5. Duncan, G. C., & Hughes, P. A. 1994, ApJ, 436, L119
6. Einfeldt, B. 1988, SIAM J. Numerical Analysis, 25, 294
7. Godunov, S. K. 1959, Mat. Sb., 47, 271
8. Hardee, P. E., Rosen, A., Hughes, P. A., & Duncan, G. C. 1998, ApJ, 500, 599
9. Harten, A., Lax, P., & Van Leer, B. 1983, SIAM Rev., 25, 35
10. Heinz, S., & Begelman, M. C. 1997, ApJ, 490, 653
11. Hjellming, R. M., & Rupen, M. P. 1995, Nature, 375, 464
12. Lister, M. L., & Marscher, A. P. 1999, ApJ, in press
13. Lobanov, A. P., & Zensus, J. A. 1996, in Energy Transport in Radio Galaxies and Quasars, ed. P. E. Hardee, A. H. Bridle, & J. A. Zensus (San Francisco: ASP), 109
14. Marscher, A. P., et al. 1997, BAAS, 190, 5106
15. Mioduszewski, A. J., Hughes, P. A., & Duncan, G. C. 1997, ApJ, 476, 649
16. Mirabel, I. F., & Rodríguez, L. F. 1994, Nature, 371, 46
17. Mirabel, I. F., & Rodríguez, L. F. 1995, Annals of the New York Academy of Sciences, ed. H. Böhringer, G. E. Morfil, & J. Trümper, 759, 21
18. Mutel, R. L., Phillips, R. B., Su, B., & Bucciferro, R. R. 1990, ApJ, 352, 81
19. Polatidis, A. G., & Wilkinson, P. N. 1998, MNRAS, 294, 327
20. Rosen, A., Hughes, P. A., Duncan, G. C., & Hardee, P. E. 1999, ApJ, in press
21. Shaffer, D. B., Marscher, A. P., Marcaide, J., & Romney, J. D. 1987, ApJ, 314, L1
22. Tingay, S. J., et al. 1995, Nature, 374, 141
23. Wardle, J. F. C., Cawthorne, T. V., Roberts, D. H., & Brown, L. F. 1994, ApJ, 437, 122
List of Publications and Reprints of Work Done on OSC Machines

1. Duncan, G. C., & Hughes, P. A. 1994, ApJ, 436, L119
2. Duncan, G. C., Hughes, P. A., & Opperman, J. 1996, in Energy Transport in Radio Galaxies and Quasars, ed. P. E. Hardee, A. H. Bridle, & J. A. Zensus (ASP Conference Series), p. 143
3. Hardee, P. E., Rosen, A., Hughes, P. A., & Duncan, G. C. 1998, ApJ, 500, 599
4. Hughes, P. A., Duncan, G. C., & Mioduszewski, A. J. 1996, in Energy Transport in Radio Galaxies and Quasars, ed. P. E. Hardee, A. H. Bridle, & J. A. Zensus (ASP Conference Series), p. 137
5. Mioduszewski, A. J., Hughes, P. A., & Duncan, G. C. 1997, ApJ, 476, 649
6. Rosen, A., Hughes, P. A., Duncan, G. C., & Hardee, P. E. 1999, ApJ, in press
G. Comer Duncan
gcd@chandra.bgsu.edu
Wednesday, April 7, 1999