# Real-time ocean simulation

Ryan Welch / June, 2019


As my final project for my BSc in Computer Science I attempted to recreate leading industry techniques for simulating water in real-time. The submitted paper has been converted with some minor editorial adjustments and can be read below.

# Introduction

## Context

William Fetter was the first to use the term computer graphics in 1961 (Khronos Group, 2004). Since then computer graphics has developed rapidly, from 2D sprites and images moving on the screen to 3D models with photo-realistic lighting. However, we are still limited by hardware. Simulating real-world lighting models in full is usually too slow for real-time applications, so we must rely on techniques and tricks that produce a reasonable approximation for the human eye.

Three-dimensional rendering is the process of turning 3D models into 2D images for display on screen. Rendering falls into two main categories: real-time and non-real-time. In real-time rendering, visual quality is often sacrificed for speed, and tricks are used to produce an image that is acceptable to the viewer and looks approximately like real life. Real-time rendering is mainly used for interactive media such as video games or interactive simulations. In contrast, non-real-time rendering is used mainly for videos or films where much higher quality is desired. Its techniques, such as ray tracing, are therefore usually closer to real-life light transport, but each frame can take minutes or even hours to render.

Ocean simulation and rendering have been done fairly effectively with traditional non-real-time methods, but real-time rendering remains a challenge because of the complexity of the lighting and the scale of the simulation. Many simple implementations of oceans or water are just 2D animated textures combined with lighting tricks such as normal mapping (Valve, 2010).

## Motivation

Ocean simulation is a technically challenging problem because full fluid simulations cannot run in real time; even offline, it is not possible to simulate every particle interaction. Most simulations aim either for physical accuracy or, as in games, for a convincing visual approximation. Water has complex lighting properties as well as complex behaviour, which at large scales can only be approximated. Water and oceans are often overlooked in applications, yet, like good terrain, a good water surface can improve the quality of a scene dramatically.

## Aims

The aim was to create a visually acceptable ocean with wave movement and adjustable simulation parameters. This entails an animated ocean with realistic lighting that runs in real time with simulated motion. The simulation should be configurable and controllable at run time, for example switching between calm and stormy conditions, or between waves in a shallow lake and in a vast ocean. This provides a feature-rich simulation for use in various applications, increasing their visual fidelity and the immersion of the observer.

To fully realise and understand these techniques, and to implement modern lighting and shading first-hand, the implementation is written using low-level system libraries rather than an existing rendering engine.

# Background

## C++

C++ is one of the most popular languages for real-time graphics because of the need to optimise memory layout and to communicate with hardware such as the GPU. C++ was therefore chosen for the implementation.

CMake (Cedilnik, Hoffman, King, Martin, & Neundorf, 2000) is a build tool for C++ that makes it easy to work with dependencies and different platforms. It generates platform-specific build files, such as a Visual Studio project on Windows or a Makefile on Linux. The project is primarily developed on Windows in Visual Studio but aims to be compatible with Linux too.

## OpenGL

OpenGL (Khronos Group, 1992) is a popular cross-platform graphics API (Application Programming Interface) used extensively throughout the industry for scientific visualisation, simulation, computer-aided design and video games. The project uses the open-source glad (Herberth, 2013) OpenGL loader library, which provides C bindings and loads the OpenGL functions at runtime. The GLFW (Geelnard & Lowy, 2002) library is also used for cross-platform windowing: it creates and manages the platform's native window and abstracts the interface to it. GLFW makes it simple to create a new OpenGL context and also provides window input such as mouse and keyboard events.

There are alternatives to OpenGL, such as Microsoft's proprietary DirectX and OpenGL's more recent successor, Vulkan (Khronos Group, 2016b). OpenGL was chosen because it is the most widely supported and many existing implementations use it successfully, although most rendering engines support multiple back-ends.

## Graphics Pipeline

A graphics pipeline is the sequence of stages performed to display a mesh or model on screen. This typically involves transforming and projecting the mesh to a 2D image whilst applying lighting and other effects. In older versions of OpenGL and on older graphics hardware, the stages of turning a 3D model into a 2D image were predefined in a fixed-function pipeline: geometry was sent to the GPU and built-in functions applied transformation, lighting and shading. Modern graphics hardware instead exposes a programmable pipeline in which many stages can be controlled by shaders (programs written in GLSL that run on the GPU).

The advantage is that custom effects can now use the hardware to perform their computations much more quickly. Many instances of a shader run in parallel on the GPU, using all of its cores, which allows fast calculations per pixel or per vertex.

## Rendering

### Level of Detail

To keep performance high, we cannot render the whole scene at full detail. A common technique for increasing both the detail of the geometry and the number of objects we can render is to use multiple levels of detail. In general, the further an object is from the camera, the less detail it needs. Ideally, geometry primitives map to roughly equal sizes in screen space, so that all parts of the final image have similar detail and resolution; there is no point in having a large amount of geometry that covers only a few pixels. The distribution can also be biased to put more detail in the foreground, where it will be noticed, and less in the distance.
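As a minimal sketch of distance-based level of detail, assume each level halves the geometric detail and is used over twice the distance range of the level before it (a common choice, not necessarily the one used in this project). `lodLevel`, `baseDistance` and `maxLevel` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: pick a level of detail from the distance to the camera.
// Level 0 (full detail) is used up to baseDistance; level l covers the range
// (baseDistance * 2^(l-1), baseDistance * 2^l], each with half the detail of
// the previous level.
int lodLevel(float distance, float baseDistance, int maxLevel) {
    assert(baseDistance > 0.0f && maxLevel >= 0);
    if (distance <= baseDistance) {
        return 0;
    }
    // log2(distance / baseDistance) grows by one each time the distance doubles.
    int level = static_cast<int>(std::ceil(std::log2(distance / baseDistance)));
    return std::min(level, maxLevel);
}
```

For example, with a base distance of 10 units an object 30 units away falls into level 2, and anything very far away is clamped to `maxLevel`.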

### Lighting

#### Rendering Equation

The rendering equation (Kajiya, 1986) models the physical interaction of light with a surface: it describes how much light leaves a point on the surface in a given direction. It forms the basis of rendering, and we can simplify it for general-purpose use. The amount and color of light can be approximated in various ways and then combined.

$L(x, v) = \int_\Omega f(x, l, v) \, L(x, l) \cos(\theta) \, d\omega$

The equation gives the outgoing light (radiance) $L(x, v)$ as an integral over the hemisphere of incoming directions, where $f(x, l, v)$ is the BRDF and $L(x, l)$ is the incoming radiance. $x$ denotes the position on the surface, $v$ the outgoing (view) direction and $l$ the incoming light direction. The $\cos(\theta)$ term accounts for the angle of incidence between the light and the surface normal $n$; with $l$ pointing from the surface towards the light it is $n \cdot l$ (equivalently $-l \cdot n$ if $l$ is taken as the direction in which the light travels). In practice, for a small number of point or directional lights the integral reduces to a sum over the lights, and we calculate the diffuse and specular BRDF components separately.
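For a handful of punctual (point or directional) lights, the integral collapses to a sum over the lights. The sketch below assumes that simplification and uses a constant Lambertian BRDF, albedo $/ \pi$, as a placeholder for $f$; `Vec3`, `Light` and `shade` are hypothetical names, and each light direction is assumed to point from the surface towards the light, so $\cos\theta = n \cdot l$.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// A punctual light: unit direction from the surface towards the light, and its radiance.
struct Light {
    Vec3 toLight;
    float radiance;
};

// Outgoing radiance L(x, v) for a surface with unit normal n, evaluated as a sum
// over lights instead of an integral over the hemisphere. The BRDF here is a
// constant Lambertian term albedo / pi, so the view direction does not matter.
float shade(const Vec3& n, float albedo, const std::vector<Light>& lights) {
    assert(albedo >= 0.0f && albedo <= 1.0f);
    const float kPi = 3.14159265f;
    const float brdf = albedo / kPi;
    float result = 0.0f;
    for (const Light& light : lights) {
        // cos(theta) term; lights below the surface contribute nothing.
        float cosTheta = std::max(dot(n, light.toLight), 0.0f);
        result += brdf * light.radiance * cosTheta;
    }
    return result;
}
```

A white surface lit head-on by a single light of radiance 1 therefore reflects $1/\pi \approx 0.318$ towards every viewer, which is the energy-conserving normalisation of the Lambertian model.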

#### Physically Based Rendering

Physically based rendering (Karis, 2013) (Burley, 2012) is a more modern approach to shading than simplified models such as Blinn-Phong (Blinn, 1977). It has become popular because modern hardware can afford more detailed models and approximations of lighting. Physically based models define a material, and its interaction with light at the surface, using physically motivated parameters. Energy conservation is also important: a surface cannot reflect more light than it receives, so a more reflective surface should have a weaker diffuse component and hence appear darker.

We chose to model the surface with a few physical parameters: diffuse color, roughness, metalness and bumpiness. This covers a good range of materials. The ocean surface has a few extra shading differences, including a strong emissive color that simulates light scattering within the water instead of using transparency.

#### Bidirectional Reflectance Distribution Function

A BRDF gives the ratio of light reflected towards the viewer to the light received from a given direction; it can be thought of as describing how likely the surface is to reflect a ray arriving from $l$ into the direction $v$. There are various models that can be used. Below are the ones we chose to implement, based mainly on the findings of Epic Games in Real Shading in Unreal Engine 4 (Karis, 2013).

For our diffuse BRDF we use the Lambertian model, which is very cheap to compute and gives good results.

$f(l, v) = \frac{\text{diffuse}}{\pi}$

Lambertian diffuse model

For specular shading we use the Cook-Torrance microfacet specular shading model (Burley, 2012) (Hoffman, n.d.). It represents the BRDF for the specular component of light. The diffuse and specular are later combined in the simplified rendering equation.

$f(l, v) = \frac{D(h) F(v, h) G(l, v, h)}{4 (n \cdot l) (n \cdot v)}$

Cook-Torrance microfacet specular shading model

$D(h)$ is the normal distribution function (NDF); we use GGX (Walter, Marschner, Li, & Torrance, 2007). It statistically describes the distribution of normals on the microsurface, that is, how many microfacets are oriented along the half vector $h$ and can therefore reflect light from $l$ towards the viewer. A smooth surface has most microfacets facing a similar direction, so it reflects light into a narrow range of directions and produces a small, sharp specular highlight. A rougher surface scatters the reflected light more widely and produces a broader highlight.

$D(h) = \frac{\alpha^2}{\pi ((n \cdot h)^2 (\alpha^2 - 1) + 1)^2}$

GGX normal distribution function
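A direct C++ transcription of the GGX distribution, as a sketch: `distributionGGX` is a hypothetical name, `nDotH` is assumed to be clamped to $[0, 1]$, and `alpha` is $\alpha$ (Real Shading in Unreal Engine 4 uses $\alpha = \text{roughness}^2$). At $n \cdot h = 1$ the function peaks at $1 / (\pi \alpha^2)$, so smoother surfaces produce taller, narrower highlights.

```cpp
#include <cassert>

// GGX / Trowbridge-Reitz normal distribution function D(h).
// nDotH: cosine between the surface normal n and the half vector h.
// alpha: roughness parameter (alpha > 0); small values give a tight highlight.
float distributionGGX(float nDotH, float alpha) {
    assert(alpha > 0.0f);
    const float kPi = 3.14159265f;
    float a2 = alpha * alpha;
    float d = nDotH * nDotH * (a2 - 1.0f) + 1.0f;
    return a2 / (kPi * d * d);
}
```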

$G(l, v, h)$ is the specular geometric attenuation term. We use the Schlick model (Schlick, 1994) with modifications, which closely matches the Smith model for GGX (Walter, Marschner, Li, & Torrance, 2007). It models the probability that a ray is not shadowed or masked by the microsurface and can therefore reflect back towards the viewer. The constant $k$ is derived from the surface roughness.

$G_1(v) = \frac{n \cdot v}{(n \cdot v)(1 − k) + k}$
$G(l, v, h) = G_1(l) G_1(v)$

Schlick model
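A sketch of the geometry term. The text above does not fix $k$; this example assumes the remapping from Real Shading in Unreal Engine 4 for analytic lights, $k = (\text{roughness} + 1)^2 / 8$. The function names are hypothetical and the cosines are assumed to be clamped to $[0, 1]$.

```cpp
#include <cassert>

// Schlick's approximation of one Smith shadowing-masking term, G1.
// nDotX: cosine between the normal and either l or v, assumed in [0, 1].
float geometrySchlick(float nDotX, float k) {
    return nDotX / (nDotX * (1.0f - k) + k);
}

// G(l, v, h) = G1(l) * G1(v). k is derived from roughness using the
// Unreal Engine 4 remapping for analytic lights (an assumption here).
float geometrySmith(float nDotL, float nDotV, float roughness) {
    assert(roughness >= 0.0f && roughness <= 1.0f);
    float k = (roughness + 1.0f) * (roughness + 1.0f) / 8.0f;
    return geometrySchlick(nDotL, k) * geometrySchlick(nDotV, k);
}
```

Since $k \geq 1/8$ the denominator never reaches zero. The term is 1 when both directions are aligned with the normal and falls to 0 at grazing angles, where microfacets hide each other.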

Finally, $F(v, h)$ is the Fresnel factor, which uses the Schlick approximation described in the next section.

#### Fresnel and Reflections

Fresnel reflectance describes how much light a material reflects rather than refracts, depending on the viewing angle. For example, when looking straight down at water, most of the light is refracted and you can see into or through the water. At a grazing angle, however, water behaves much more like a mirror and reflects most of the light. Fresnel is therefore very important for a convincing water surface, and reflections have a huge impact on the realism of the surface.

Fresnel is commonly approximated using the Schlick approximation (Schlick, 1994). It is inexpensive to compute, produces good results and is widely accepted as the standard.

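$R(\theta) = R_0 + (1 - R_0)(1 - \cos\theta)^5$

Fresnel using the Schlick approximation, where $R_0$ is the reflectance at normal incidence and $\theta$ is the angle between the view direction and the normal (or half vector)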
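A sketch of Schlick's approximation together with the usual way of obtaining $R_0$ from indices of refraction, $R_0 = \left(\frac{n_1 - n_2}{n_1 + n_2}\right)^2$. For air to water ($n_2 \approx 1.33$) this gives $R_0 \approx 0.02$: looking straight down, only about 2% of the light is reflected, rising towards 100% at grazing angles. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Reflectance at normal incidence from two indices of refraction:
// R0 = ((n1 - n2) / (n1 + n2))^2.
float reflectanceAtNormal(float n1, float n2) {
    float r = (n1 - n2) / (n1 + n2);
    return r * r;
}

// Schlick's approximation R(theta) = R0 + (1 - R0)(1 - cos(theta))^5.
// cosTheta: cosine of the angle between the view direction and the normal
// (or half vector), assumed in [0, 1].
float fresnelSchlick(float cosTheta, float r0) {
    assert(cosTheta >= 0.0f && cosTheta <= 1.0f);
    return r0 + (1.0f - r0) * std::pow(1.0f - cosTheta, 5.0f);
}
```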

#### Normal Mapping

Normal mapping simulates bumps on a surface by perturbing the surface normals used in the lighting calculations, so the bumps are shaded as though the surface had more depth and a higher resolution than the actual mesh. This adds extra detail, which is key to realism, and it is very cheap because it only changes the input to the lighting calculations rather than adding geometry.

Normal maps are stored in a standard texture where the red, green and blue channels represent the x, y and z components of the normal vector, remapped from $[-1, 1]$ to $[0, 1]$. The normals are stored in tangent space, so before lighting they are transformed into model or world space using the surface's tangent, bitangent and normal (TBN) basis.
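A sketch of how a shader typically decodes a normal-map texel and moves it from tangent space into model or world space with the TBN basis, written in C++ for illustration (in practice this runs in GLSL). `Vec3`, `decodeNormal` and `tangentToWorld` are hypothetical names, and the tangent basis is assumed to be orthonormal.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 normalize(const Vec3& v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);
    return {v.x / len, v.y / len, v.z / len};
}

// Convert an RGB texel in [0, 1] to a tangent-space normal in [-1, 1].
Vec3 decodeNormal(float r, float g, float b) {
    return normalize({r * 2.0f - 1.0f, g * 2.0f - 1.0f, b * 2.0f - 1.0f});
}

// Transform a tangent-space normal tn into model/world space using the
// tangent (t), bitangent (b) and surface normal (n): t * tn.x + b * tn.y + n * tn.z.
Vec3 tangentToWorld(const Vec3& t, const Vec3& b, const Vec3& n, const Vec3& tn) {
    return normalize({t.x * tn.x + b.x * tn.y + n.x * tn.z,
                      t.y * tn.x + b.y * tn.y + n.y * tn.z,
                      t.z * tn.x + b.z * tn.y + n.z * tn.z});
}
```

The "flat" texel $(0.5, 0.5, 1)$, the typical light blue of a normal map, decodes to $(0, 0, 1)$ in tangent space and so leaves the surface normal unchanged.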

## Procedural Generation

Procedural generation creates data algorithmically: instead of storing the content, we define a function that produces it. Procedural generation is usually deterministic, meaning the same input parameters always produce the same output. For terrain, or more generally a height map, given input coordinates $(x, z)$ and any other parameters, the function should return the same height every time it is called. This makes the function very versatile: content can be generated on the fly because it is reproducible without any previous state.
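To illustrate this determinism, here is a sketch of hash-based value noise, one common procedural height function (not necessarily what this project uses). `latticeValue` and `valueNoise` are hypothetical names, and the hash constants are arbitrary mixing values.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hash integer lattice coordinates and a seed to a value in [0, 1].
// Unsigned arithmetic wraps, which is well defined and what a hash wants.
float latticeValue(std::int32_t x, std::int32_t z, std::uint32_t seed) {
    std::uint32_t h = seed;
    h ^= static_cast<std::uint32_t>(x) * 0x27d4eb2du;
    h ^= static_cast<std::uint32_t>(z) * 0x165667b1u;
    h ^= h >> 15;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return static_cast<float>(h & 0xffffffu) / 16777215.0f;
}

// Value noise: smoothly interpolate the hashed values at the four surrounding
// lattice points. The result depends only on (x, z, seed), so it can be
// evaluated anywhere, in any order, with no stored state.
float valueNoise(float x, float z, std::uint32_t seed) {
    assert(std::isfinite(x) && std::isfinite(z));
    float fx = std::floor(x), fz = std::floor(z);
    auto ix = static_cast<std::int32_t>(fx);
    auto iz = static_cast<std::int32_t>(fz);
    float tx = x - fx, tz = z - fz;
    // Smoothstep fade so the surface has no visible creases at lattice lines.
    tx = tx * tx * (3.0f - 2.0f * tx);
    tz = tz * tz * (3.0f - 2.0f * tz);
    float a = latticeValue(ix, iz, seed), b = latticeValue(ix + 1, iz, seed);
    float c = latticeValue(ix, iz + 1, seed), d = latticeValue(ix + 1, iz + 1, seed);
    float top = a + (b - a) * tx;
    float bottom = c + (d - c) * tx;
    return top + (bottom - top) * tz;
}
```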

One downside of procedural generation is the lack of contextual data. A point cannot depend on the generated data of its neighbours, since each neighbour would in turn depend on its own neighbours, and so on, creating an unbounded chain of dependencies. For example, terrain erosion cannot be computed procedurally, and neither can water modelled as interacting particles, because no state is carried forward.

# Related Work

## Wave Models

There are various ways of approximating water waves (NVIDIA, 2007). Since we are aiming for real time, a fully realistic fluid simulation is not possible. However, there are plenty of good real-time solutions that approximate waves with a very high level of realism, especially for the casual observer.

### Particle Simulation

There are various physically based models of water as particles and the interactions between them (Braley & Sandu, n.d.) (Zadick, Kenwright, & Mitchell, 2016). These produce very convincing effects and realistic simulations, but they do not scale well: modelling an entire ocean as particles is infeasible, and even for small bodies of water particle simulations become expensive to compute. A compromise would be to model only some wave interactions as particles, but this increases the complexity enormously. Water spray, however, is a common phenomenon that could be recreated with particle simulations and effects.

### Sum of Sinusoids

The simplest model is a sum of sine waves travelling in different directions. It is very fast and easy to implement but not very realistic, especially for high waves at sea. It is successfully used for small bodies of water or flowing rivers with small ripples, as it is good at approximating high frequencies. It is also useful for normal maps, which tend to represent the higher-frequency detail of the waves; this technique was used in the award-winning game Portal 2 (Valve, 2010).

The height at a given point $(x, z)$ in this model is given by:

$H(x, z, t) = \sum_{i=1}^{n} A_i \sin(w_i D_i \cdot (x, z) + t \phi_i)$

where, for each wave $i$, $A_i$ is its amplitude, $D_i$ its 2D horizontal direction, $w_i$ its frequency and $\phi_i$ its phase constant, and $t$ is the time.
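The equation translates almost directly into code. The sketch below is a CPU-side C++ version for illustration; `SineWave` and `sumOfSinesHeight` are hypothetical names, and each direction is assumed to be unit length.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Parameters of one sine wave, named after the symbols in the equation above.
struct SineWave {
    float amplitude;   // A_i
    float dirX, dirZ;  // D_i, a unit-length horizontal direction
    float frequency;   // w_i
    float phase;       // phi_i, multiplied by time t
};

// H(x, z, t) = sum_i A_i * sin(w_i * dot(D_i, (x, z)) + t * phi_i)
float sumOfSinesHeight(const std::vector<SineWave>& waves, float x, float z, float t) {
    float height = 0.0f;
    for (const SineWave& w : waves) {
        assert(w.frequency > 0.0f);
        float d = w.dirX * x + w.dirZ * z;  // position projected onto the wave direction
        height += w.amplitude * std::sin(w.frequency * d + t * w.phase);
    }
    return height;
}
```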

The problem with a plain sum of sines is that the waves are too shallow and flat. To make the peaks steeper and more realistic, the sine function can be modified: it is first offset and scaled to lie between 0 and 1 and then raised to a power, which produces steeper peaks and wider troughs, as shown in the sine functions figure.
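One way to write this, following the form given in GPU Gems (NVIDIA, 2007), is $W_i = 2A_i \left(\frac{\sin(w_i D_i \cdot (x, z) + t \phi_i) + 1}{2}\right)^k$, where the exponent $k$ controls the sharpness. A sketch, with `steepSine` as a hypothetical name and `angle` standing for the sine's argument:

```cpp
#include <cassert>
#include <cmath>

// Steepened sine wave: the sine is remapped to [0, 1], raised to the power k
// and scaled by 2A so the peak-to-trough height stays 2A.
// k = 1 gives an ordinary (shifted) sine; larger k gives sharper peaks and wider troughs.
float steepSine(float amplitude, float k, float angle) {
    assert(k >= 1.0f);
    float s = (std::sin(angle) + 1.0f) * 0.5f;  // now in [0, 1]
    return 2.0f * amplitude * std::pow(s, k);
}
```

Note that the remapped wave sits entirely above zero, so in practice the water plane would be offset accordingly.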

### Gerstner Waves

Gerstner waves are an improved approximation of gravity waves, similar to the sum of sines with the steepness enhancement (NVIDIA, 2007). The difference is that each vertex is also moved horizontally, which produces sharper crests and flatter troughs and makes the waves look much more realistic. They are an approximate solution to the fluid dynamics equations.

The wave surface of the Gerstner model is given by the equation:

$P(x, z, t) = \begin{pmatrix} x + \sum_i Q_i A_i D_i.x \cos(w_i D_i \cdot (x, z) + t \phi_i) \\ \sum_i A_i \sin(w_i D_i \cdot (x, z) + t \phi_i) \\ z + \sum_i Q_i A_i D_i.z \cos(w_i D_i \cdot (x, z) + t \phi_i) \end{pmatrix}$

As above, for each wave $i$, $A_i$ is the amplitude, $D_i$ the 2D horizontal direction, $w_i$ the frequency, $t$ the time and $\phi_i$ the phase constant; $Q_i$ controls the steepness of the wave. The height component is the same as in the sum of sinusoids model, but each vertex is now also displaced horizontally, which bunches vertices towards the wave crests.

Normals are important for lighting and we can use the following formula to find the surface normal at a point based on the transformed vertex position $P$.

$N(P, t) = \begin{pmatrix} -\sum_i D_i.x \, w_i A_i \cos(w_i D_i \cdot P + t \phi_i) \\ 1 - \sum_i Q_i w_i A_i \sin(w_i D_i \cdot P + t \phi_i) \\ -\sum_i D_i.z \, w_i A_i \cos(w_i D_i \cdot P + t \phi_i) \end{pmatrix}$
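A CPU-side C++ sketch of both Gerstner equations (the project evaluates them in GLSL). `GerstnerWave`, `gerstnerPosition` and `gerstnerNormal` are hypothetical names, and $D_i \cdot P$ is taken over the horizontal components of $P$. The normal is returned unnormalised, so a caller would normalise it before lighting. As a sanity check, with $Q_i = 0$ the horizontal displacement disappears and the height reduces to the sum of sinusoids.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// One Gerstner wave, using the symbols from the equations above.
struct GerstnerWave {
    float amplitude;   // A_i
    float dirX, dirZ;  // D_i, unit-length horizontal direction
    float frequency;   // w_i
    float phase;       // phi_i
    float steepness;   // Q_i; Q_i = 0 reduces to a plain sine wave
};

// Displaced position P(x, z, t) of the flat grid point (x, 0, z).
Vec3 gerstnerPosition(const std::vector<GerstnerWave>& waves, float x, float z, float t) {
    Vec3 p{x, 0.0f, z};
    for (const GerstnerWave& w : waves) {
        // Per wave, Q_i * w_i * A_i above 1 folds the crest into a loop.
        assert(w.steepness * w.frequency * w.amplitude <= 1.0f);
        float theta = w.frequency * (w.dirX * x + w.dirZ * z) + t * w.phase;
        float c = std::cos(theta);
        p.x += w.steepness * w.amplitude * w.dirX * c;
        p.y += w.amplitude * std::sin(theta);
        p.z += w.steepness * w.amplitude * w.dirZ * c;
    }
    return p;
}

// Surface normal N(P, t) evaluated at a displaced position P (not normalised).
Vec3 gerstnerNormal(const std::vector<GerstnerWave>& waves, const Vec3& p, float t) {
    Vec3 n{0.0f, 1.0f, 0.0f};
    for (const GerstnerWave& w : waves) {
        float wa = w.frequency * w.amplitude;
        float theta = w.frequency * (w.dirX * p.x + w.dirZ * p.z) + t * w.phase;
        n.x -= w.dirX * wa * std::cos(theta);
        n.y -= w.steepness * wa * std::sin(theta);
        n.z -= w.dirZ * wa * std::cos(theta);
    }
    return n;
}
```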

### Fast Fourier Transform

A realistic wave surface requires many octaves of Gerstner waves: since they are essentially sine waves, a wide range of frequencies is needed for the surface to look realistic. The Fast Fourier Transform (FFT) is an efficient way of summing a large number of sinusoids whose amplitudes are given by a frequency spectrum, and it has been used successfully in many applications (Tessendorf, 2001).
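To show why the FFT fits here: a height field built from $N$ evenly spaced frequencies is an inverse discrete Fourier transform of their complex amplitudes. The naive sketch below evaluates that sum directly in $O(N^2)$ in 1D, whereas an FFT computes the same values in $O(N \log N)$. It deliberately omits Tessendorf's wave spectrum and time evolution, and `inverseDft` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive 1D inverse DFT: h[x] = sum_k H[k] * exp(2*pi*i*k*x / N).
// Each frequency bin k contributes one sinusoid; the real part is the height.
std::vector<float> inverseDft(const std::vector<std::complex<float>>& spectrum) {
    const std::size_t n = spectrum.size();
    assert(n > 0);
    const float kTwoPi = 2.0f * std::acos(-1.0f);
    std::vector<float> heights(n);
    for (std::size_t x = 0; x < n; ++x) {
        std::complex<float> sum{0.0f, 0.0f};
        for (std::size_t k = 0; k < n; ++k) {
            // (k * x) mod n keeps the angle small; the exponential is periodic in n.
            float angle = kTwoPi * static_cast<float>((k * x) % n) / static_cast<float>(n);
            sum += spectrum[k] * std::polar(1.0f, angle);
        }
        heights[x] = sum.real();
    }
    return heights;
}
```

With a single non-zero bin the output is exactly one cosine wave; filling many bins from a wave spectrum is what gives the FFT model its rich, less repetitive surface.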

## Rendering

### Raycasting

Raycasting is most frequently used in offline applications because it gives more accurate lighting: it is the closest to real-life physics, where light is modelled as rays. However, in many cases it is too slow even for a modern computer. A clever alternative is to cast rays from screen space into world space; if the geometry is defined as a mathematical function, the intersection can be found by ray marching directly into the scene (Toman, 2009). However, if we want to run sophisticated water simulations or use the surface for physics, we would still need to convert it into a standard triangle mesh, which may be even slower.
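A sketch of ray marching against a height function $y = H(x, z)$: step along the ray until it passes below the surface, then refine the crossing with a bisection search. The fixed step size, the iteration counts and the names `Ray`, `HeightFn` and `marchHeightField` are assumptions; a step that is too large can miss thin features.

```cpp
#include <cassert>
#include <functional>
#include <optional>

struct Ray {
    float ox, oy, oz;  // origin
    float dx, dy, dz;  // direction (unit length)
};

using HeightFn = std::function<float(float, float)>;

// Returns the distance along the ray to the surface y = height(x, z), if it is hit.
std::optional<float> marchHeightField(const Ray& r, const HeightFn& height,
                                      float step, int maxSteps) {
    assert(step > 0.0f && maxSteps > 0);
    // Positive when the point at distance t is above the surface.
    auto above = [&](float t) {
        return r.oy + r.dy * t - height(r.ox + r.dx * t, r.oz + r.dz * t);
    };
    float prev = 0.0f;
    for (int i = 1; i <= maxSteps; ++i) {
        float t = step * static_cast<float>(i);
        if (above(t) < 0.0f) {
            // The ray crossed the surface between prev and t: refine by bisection.
            float lo = prev, hi = t;
            for (int j = 0; j < 20; ++j) {
                float mid = 0.5f * (lo + hi);
                if (above(mid) < 0.0f) hi = mid; else lo = mid;
            }
            return 0.5f * (lo + hi);
        }
        prev = t;
    }
    return std::nullopt;  // no intersection within maxSteps * step
}
```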

### Geometry

Rendering terrain has many similarities to rendering oceans: both are large, mostly flat surfaces that vary mainly in elevation. Large terrain is usually represented as a height map, where each texel gives the height of the terrain at a point on a regular grid. The height map is then used to create the geometry. Not all of the geometry has to be created at once; parts of it can be generated depending on where the camera is looking.
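A minimal sketch of turning a height map into a triangle grid, assuming row-major heights and 32-bit indices; `Vertex`, `Mesh` and `meshFromHeightMap` are hypothetical names. Triangles are wound counter-clockwise when viewed from above, matching OpenGL's default front face.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };

struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;  // three per triangle
};

// Build a triangle mesh from a width x depth height map stored row by row.
// Each texel becomes a vertex `spacing` apart; each grid cell becomes two triangles.
Mesh meshFromHeightMap(const std::vector<float>& heights, int width, int depth, float spacing) {
    assert(width >= 2 && depth >= 2);
    assert(heights.size() == static_cast<std::size_t>(width) * depth);
    Mesh mesh;
    for (int z = 0; z < depth; ++z)
        for (int x = 0; x < width; ++x)
            mesh.vertices.push_back({x * spacing, heights[z * width + x], z * spacing});
    for (int z = 0; z + 1 < depth; ++z) {
        for (int x = 0; x + 1 < width; ++x) {
            std::uint32_t i0 = z * width + x;  // corner (x, z)
            std::uint32_t i1 = i0 + 1;         // corner (x + 1, z)
            std::uint32_t i2 = i0 + width;     // corner (x, z + 1)
            std::uint32_t i3 = i2 + 1;         // corner (x + 1, z + 1)
            mesh.indices.insert(mesh.indices.end(), {i0, i2, i1, i1, i2, i3});
        }
    }
    return mesh;
}
```

A $3 \times 3$ height map, for instance, yields 9 vertices and 8 triangles (24 indices).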

Most terrain in real-time 3D applications is modelled as varying elevation over a flat plane. This makes it easier to render but rules out overhangs and caves, which would have to be added later as hand-placed 3D models. Likewise, if we model the ocean surface as a height map, it becomes harder to add shore details such as breaking waves, since they need additional 3D geometry. An ocean is mostly a flat surface displaced by waves, so we can use the simpler model rather than storing full 3D data, as breaking waves make up a relatively small part of the ocean.

Some water renderers displace the vertices of a static mesh in real time on the GPU using a height map or a procedural function. Bandwidth between the CPU and GPU is often a bottleneck, and sending terrain vertices every frame can use up much of it. Displacing on the GPU therefore improves on generating the mesh from the height map on the CPU, because no new geometry has to be sent every frame: only the changed height map (if needed) and a static mesh are uploaded. The downside is that the displaced mesh is not available on the CPU for calculations such as physics, so a low-resolution version may still need to be computed there.

#### Chunked terrain

One method of rendering terrain is to break it up into ‘chunks’, for which the CPU generates meshes when the player is nearby. This puts most of the work on the CPU but gives a lot of control over how the chunks are created. It is a simple system and works effectively, but it is quite slow because sending new meshes to the GPU is a major overhead, especially if a mesh changes constantly and has to be re-sent. The system is best suited to static terrain, such as land, that changes little, whereas water requires constant animated motion. The technique is used by the famous game Minecraft (Mojang, 2011), where the terrain data is stored as a 3D grid and rendered when the camera is nearby, but is rarely updated once generated.
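A sketch of the bookkeeping a chunk system needs: working out which chunk the camera is in and which chunks surround it. Chunks entering this set would be generated and uploaded, and chunks leaving it released. `ChunkCoord` and `chunksNear` are hypothetical names, and a square neighbourhood is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct ChunkCoord { int x, z; };

// Return the coordinates of all chunks within `radius` chunks of the camera
// (a square neighbourhood). std::floor keeps negative positions in the
// correct chunk, e.g. x = -0.5 belongs to chunk -1, not chunk 0.
std::vector<ChunkCoord> chunksNear(float camX, float camZ, float chunkSize, int radius) {
    assert(chunkSize > 0.0f && radius >= 0);
    int cx = static_cast<int>(std::floor(camX / chunkSize));
    int cz = static_cast<int>(std::floor(camZ / chunkSize));
    std::vector<ChunkCoord> result;
    for (int dz = -radius; dz <= radius; ++dz)
        for (int dx = -radius; dx <= radius; ++dx)
            result.push_back({cx + dx, cz + dz});
    return result;
}
```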

#### Clipmaps

Clipmaps (NVIDIA, 2005) are a method for increasing the rendering distance of terrain; the main goal is to avoid spending geometry at distances where the detail cannot be seen. The problem with rendering an evenly spaced mesh is that, once projected onto the screen, distant detail becomes very compact, so much more work is spent on pixels far from the viewer. Clipmaps provide level of detail with a set of nested grids centred on the viewer, where the innermost grid is the most finely tessellated. Each level of the clipmap has the same number of vertices but covers a progressively larger area, and therefore has a lower level of detail.

At the edge of each level of detail the mesh must transition properly to avoid holes or cracks appearing. There is usually additional or special geometry needed to transition, these are often called seams and must be handled explicitly otherwise artefacts may appear such as flickering or gaps in the terrain.
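A sketch of the per-level layout: every level uses the same $n \times n$ grid, but level $l$ has vertex spacing $s \cdot 2^l$, so each level covers twice the extent of the one inside it. Snapping each level's centre to a multiple of twice its spacing keeps vertices from sliding over the surface as the camera moves and lines them up with the next coarser level; this is a common choice in clipmap implementations but an assumption here. `ClipmapLevel` and `clipmapLevel` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct ClipmapLevel {
    float spacing;  // distance between neighbouring vertices
    float originX;  // world position of the grid's minimum corner
    float originZ;
    float extent;   // side length of the square covered by this level
};

// Layout of clipmap level `level` for a grid of gridSize x gridSize vertices.
ClipmapLevel clipmapLevel(int level, int gridSize, float baseSpacing, float camX, float camZ) {
    assert(level >= 0 && gridSize >= 2 && baseSpacing > 0.0f);
    ClipmapLevel out;
    out.spacing = baseSpacing * std::ldexp(1.0f, level);  // baseSpacing * 2^level
    out.extent = out.spacing * static_cast<float>(gridSize - 1);
    // Snap the centre to a multiple of twice the spacing so it lines up with
    // the vertices of the next coarser level and does not swim with the camera.
    float snap = 2.0f * out.spacing;
    float centreX = std::floor(camX / snap) * snap;
    float centreZ = std::floor(camZ / snap) * snap;
    out.originX = centreX - 0.5f * out.extent;
    out.originZ = centreZ - 0.5f * out.extent;
    return out;
}
```

With an odd vertex count such as 65, the snapped centre falls exactly on a vertex of the grid.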

#### GPU Clipmaps

GPU clipmaps move some of the processing involved in regular clipmaps to the GPU. There are two ways to do this. The one best suited to us is to send a static, flat, precomputed clipmap mesh whose vertices are then displaced on the GPU.

A further way to reduce CPU work is to generate the entire clipmap on the GPU using the tessellation stages introduced in OpenGL 4.0. A simple mesh submitted from the CPU is tessellated on the GPU and then fed through the displacement shaders.

### Reflections

Water is very reflective and becomes almost mirror-like in a calm body of water such as a still lake or puddle. Reflections are therefore a very important part of rendering water, and a good technique must be used. There are three main approaches.

We can probe a point in the world and take a full 360-degree snapshot of its surroundings; this is known as a cubemap or environment map (the snapshot can also be stored as a sphere map). Environment maps can be computed in advance if the scene will not change: for example, if we only want global reflections of a static sky, a precomputed environment map is enough. A more dynamic approach is needed if objects move or change (such as a procedural sky), but it requires rendering all six faces of the cube every frame to rebuild the environment map. This can be quite costly, but the realism that reflections provide is often worth it. The cost can be reduced by rendering into a small cubemap instead of at full resolution, which usually still gives sufficiently detailed reflections.
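To sample an environment map, the view direction is reflected about the surface normal and the result is used as the lookup direction. GLSL provides this as the built-in `reflect(I, N)`, which computes $I - 2 (N \cdot I) N$. The C++ sketch below mirrors it and also shows how a cubemap lookup picks a face from the direction's largest component, which the GPU normally does in hardware. `Vec3`, `CubeFace` and `cubeFace` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Mirror the incident direction i about the unit normal n: r = i - 2 (n . i) n.
// Same formula as GLSL's built-in reflect().
Vec3 reflect(const Vec3& i, const Vec3& n) {
    float d = 2.0f * dot(n, i);
    return {i.x - d * n.x, i.y - d * n.y, i.z - d * n.z};
}

enum class CubeFace { PosX, NegX, PosY, NegY, PosZ, NegZ };

// A cubemap lookup selects the face by the largest absolute component of the
// direction (done by the GPU in hardware; shown here for illustration).
CubeFace cubeFace(const Vec3& r) {
    float ax = std::fabs(r.x), ay = std::fabs(r.y), az = std::fabs(r.z);
    assert(ax + ay + az > 0.0f);
    if (ax >= ay && ax >= az) return r.x >= 0.0f ? CubeFace::PosX : CubeFace::NegX;
    if (ay >= az) return r.y >= 0.0f ? CubeFace::PosY : CubeFace::NegY;
    return r.z >= 0.0f ? CubeFace::PosZ : CubeFace::NegZ;
}
```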

Planar reflections are very accurate but costly. They work by mirroring the camera position below the reflective surface and rendering the scene from there; the resulting image is then flipped and drawn on top of the reflective surface. They are therefore limited to more or less flat surfaces, but they give very good results and are used quite successfully. They can be made more efficient by using the GPU's stencil buffer to draw only the parts that will appear on a reflective surface (Kilgard, 1999).
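For a horizontal water plane at height $h$, mirroring the camera amounts to reflecting its height, $y' = 2h - y$, and negating its pitch. The sketch assumes a simple position, pitch and yaw camera; `Camera` and `reflectCamera` are hypothetical names.

```cpp
#include <cassert>

struct Camera {
    float x, y, z;  // position
    float pitch;    // rotation about the horizontal axis, radians (positive looks up)
    float yaw;      // rotation about the vertical axis, radians
};

// Mirror a camera below a horizontal water plane at height waterY.
// The position is reflected (y' = 2 * waterY - y) and the pitch is negated so
// the camera looks up at the scene from beneath the surface; yaw is unchanged.
Camera reflectCamera(const Camera& cam, float waterY) {
    assert(cam.y >= waterY);  // assumes the camera is above the water
    Camera mirrored = cam;
    mirrored.y = 2.0f * waterY - cam.y;
    mirrored.pitch = -cam.pitch;
    return mirrored;
}
```

In practice the reflection pass also discards geometry below the water plane, for example with an OpenGL user clip plane (`gl_ClipDistance` in the shader with `GL_CLIP_DISTANCE0` enabled), so underwater objects do not appear in the reflection.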

Finally, screen-space reflections are very cheap but less reliable. The previous two methods can suffer from artefacts at certain angles or positions, or from misaligned reflections. Screen-space reflections work by ray casting in screen space, hence their name. They are performed as a post-processing effect: the normal and depth buffers are used to compute the reflected ray and trace it back into the scene, and if the ray hits something within the visible scene, the colour buffer is sampled at that point and blended into the output colour. This works very well until the ray leaves the screen, at which point there is no data for it because everything happens in screen space. In practice many reflections do fall within the visible scene, and the accurate alignment makes for a convincing look. It is still best to combine this method with one of the previous ones, such as a lower-resolution environment map, as a fallback so that reflections outside the rendered scene can still be approximated.

# Design and Implementation

## Code Structure

📦ocean-simulation
┣ 📂build
┣ 📂data
┣ 📂lib
┃ ┗ 📜index.js
┣ 📂release
┃ ┗ 📜OceanDemo.exe
┣ 📂source
┃ ┣ 📂assets
┃ ┣ 📂components
┃ ┣ 📂ecs
┃ ┣ 📂graphics
┃ ┣ 📂input
┃ ┣ 📂shaders
┃ ┣ 📂terrain
┃ ┣ 📂ui
┃ ┗ 📂util
┗ 📜CMakeLists.txt


Folder structure of the project source code: the overall structure, with the contents of the source folder expanded.

The file structure of the project is shown above. The entire ocean-simulation folder is tracked in version control to review changes and make development easier. The project is written in C++ and uses the CMake build system, so a CMakeLists.txt file in the main folder defines and generates the appropriate project files. The build/ folder holds the generated CMake files; this is known as an out-of-source build and keeps generated build files out of the source folder. The source/ folder contains the code written for this project, including the C++ source and the GLSL shader source. The lib/ folder contains the source of all the third-party libraries used. They are mostly included as git submodules so they can be updated easily, although in the distribution they are pre-packaged without git. The data/ directory contains any engine data required, such as textures and models. The release binary is copied to the release/ folder along with a copy of data/ and source/shaders/ to provide a fully working pre-built release.

## Engine

The engine controls the various systems of the real-time simulation. It consists of: a renderer, which draws meshes with OpenGL using the appropriate shaders and parameters; an entity system to place and manage objects in the world and track the camera; an asset manager to load shaders, textures and other resources; an input processing system; and other miscellaneous utilities.

As well as handling the different subsystems such as the renderer, the engine also handles the time loop. The time loop runs repeatedly whilst the program is running and updates and renders at set intervals. The loop was inspired by "Fix Your Timestep!" (Fiedler, 2004): updates are performed at fixed intervals, which ensures consistency in calculations that are sensitive to large variations in the time step, such as physics, networking code and complex simulations. Interpolating between render states has not been implemented.
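A minimal sketch of the fixed-timestep accumulator described in that article, not the project's actual loop; `FixedStepLoop` and `advance` are hypothetical names. The 0.25 s clamp on long frames follows the article's example.

```cpp
#include <cassert>

// Hypothetical fixed-timestep driver in the style of "Fix Your Timestep!".
// Real frame durations are accumulated and the simulation is advanced in
// whole steps of dt, so update() always receives the same time delta.
class FixedStepLoop {
public:
    explicit FixedStepLoop(double dt) : dt_(dt) { assert(dt > 0.0); }

    // Call once per rendered frame with the real time elapsed since the last
    // frame; returns how many fixed updates were run.
    template <typename UpdateFn>
    int advance(double frameTime, UpdateFn&& update) {
        // Clamp very long frames (e.g. after a breakpoint) so the simulation
        // cannot fall into a "spiral of death" where it never catches up.
        if (frameTime > 0.25) frameTime = 0.25;
        accumulator_ += frameTime;
        int steps = 0;
        while (accumulator_ >= dt_) {
            update(dt_);
            accumulator_ -= dt_;
            ++steps;
        }
        return steps;  // the leftover accumulator_ could drive render interpolation
    }

private:
    double dt_;
    double accumulator_ = 0.0;
};
```

A 64 Hz simulation driven by 16 Hz frames, for example, runs four updates per frame; a frame shorter than one step runs none and carries its time over to the next frame.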

The framework is designed modularly, as shown in the system overview figure, with the Engine as the main entry point and frame coordinator that calls and manages the appropriate submodules. Submodules can access other submodules through the engine instance or via events. For example, the ECS submodule emits events when entities are added or removed, and any interested module can listen and react to them.
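A sketch of the event mechanism described above, assuming one channel per event type with callbacks invoked in registration order. `EventChannel`, `EntityAdded` and `EntityId` are hypothetical names, not the project's API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using EntityId = std::uint32_t;

// Event emitted by the ECS when an entity is created (hypothetical type).
struct EntityAdded { EntityId id; };

// Minimal single-event-type dispatcher: modules subscribe a callback and the
// owner emits events to every subscriber in registration order.
template <typename Event>
class EventChannel {
public:
    using Listener = std::function<void(const Event&)>;

    void subscribe(Listener listener) {
        assert(static_cast<bool>(listener));  // reject empty std::function objects
        listeners_.push_back(std::move(listener));
    }

    void emit(const Event& event) const {
        for (const Listener& listener : listeners_) listener(event);
    }

private:
    std::vector<Listener> listeners_;
};
```

The ECS would own an `EventChannel<EntityAdded>` and call `emit`, while a module such as the renderer subscribes without the ECS needing to know it exists.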

The entity component system (ECS) is a design in which entities are built by attaching components that add behaviour to them, favouring composition over inheritance and providing a versatile system. Entities are updated and simulated by systems, which iterate over subsets of entities. For example, an important system is the Render System, whose responsibility is to submit entities to the Renderer every frame; the Renderer handles the actual multi-stage rendering.
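A deliberately tiny illustration of composition in an ECS, with each component type stored in its own hash map keyed by entity. Production ECS designs (possibly including this project's) often use packed arrays instead; `Registry`, `Transform`, `Model` and `gatherRenderable` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

using Entity = std::uint32_t;

// Hypothetical components: plain data attached to entities.
struct Transform { float x = 0, y = 0, z = 0; };
struct Model { std::string mesh; };

// A tiny registry storing each component type in its own map keyed by entity.
struct Registry {
    Entity next = 0;
    std::unordered_map<Entity, Transform> transforms;
    std::unordered_map<Entity, Model> models;

    Entity create() {
        assert(next < std::numeric_limits<Entity>::max());  // guard against id wrap-around
        return next++;
    }
};

// A "render system": visits only entities that have both a Transform and a
// Model and collects them for submission to the renderer.
std::vector<Entity> gatherRenderable(const Registry& reg) {
    std::vector<Entity> out;
    for (const auto& entry : reg.transforms) {
        if (reg.models.count(entry.first) != 0) out.push_back(entry.first);
    }
    return out;
}
```

A camera entity with only a `Transform` is skipped by this system, while a boat with both components is collected.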

Asset loading lets the engine use external resources such as textures and models. It uses the assimp library (Gessler, Schulze, & Kulling, n.d.), which supports loading many model and scene formats, and the SOIL library (Dummer, n.d.), which loads and parses many different texture formats.

The renderer wraps many OpenGL concepts, such as textures, framebuffers and shaders, behind a mostly object-oriented interface that is easier to use. It integrates with the entity component system: every frame, all entities with an associated model are gathered and submitted to the renderer, which renders them to the screen in multiple passes, such as an environment map pass, a main pass and post-processing passes. Every model is associated with a material, which defines the shader used to render the mesh and its parameters.

These abstractions, whilst time-consuming to create, provide the basis for a strong and scalable graphics framework and make more complex features such as deferred rendering and multi-pass rendering possible.

## Wave Models

For our wave model we had a set of requirements: it should be fast enough to run on a range of modern devices; it should appear realistic and have a wide range of wave frequencies; and the waves should be controllable so that the user or artist can fine-tune the look.

### Chunk Based

The initial design was a system that procedurally generates meshes in chunks when needed. Each chunk covers a fixed area of the world, and only the chunks near the viewer are loaded, which implements a simple form of level of detail: chunks further away have a lower level of detail than those closer to the viewer. Chunks with different levels of detail also require a seam between them. In the early implementation the ocean was divided into chunks generated in real time on the CPU; however, this caused a noticeable stutter when moving around, as new chunks were generated in the background.

The chunks shown in the figure above are generated at run time using a simple sum of sinusoids model based on the current position in the world. Only one octave of sine waves is shown, and the chunks still lack seams between them. The following code fragment shows the implementation of the sine wave model.

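```cpp
#include <cmath>

// Height y of the chunk surface at world position (x, z).
// amplitude and offset are members of Chunk.
float Chunk::getHeight(float x, float z) {
  const float scale = 0.5f;  // spatial frequency of the waves
  return amplitude * std::cos(scale * x) * std::cos(scale * z) + offset;
}
```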


Code fragment returns the height $y$ at the given world position $(x, z)$

Whilst this looked promising, there were several issues with this approach, which shaped the choice of models and rendering for later implementations. On the positive side, we had full control over the mesh and could even attempt more sophisticated models such as full particle models, possibly even in 3D. However, it quickly became apparent that this method was too slow because of the bandwidth and CPU time needed to update and send the mesh every frame. We needed to generate many vertices in parallel, which is better suited to the GPU. We also wanted a more sophisticated level of detail system than chunks, one that would be easier to manage.

### Gerstner Waves

Gerstner waves, whilst not as realistic as the Fast Fourier Transform (FFT) model, provide a reasonable level of realism at a relatively low cost. An important aspect is that they are defined by a closed-form procedural function and therefore have no state: given the same inputs $(x, z, t)$, they produce the same output. This is very useful for reproducibility, for example in multi-client simulations, where synchronising the inputs leads to the same visible ocean on every client.

The Fast Fourier Transform model requires transformations based on previous iterations, which adds memory overhead and loses the deterministic property. However, for applications where these matter less, FFT provides a considerable gain in realism and believability because of the more complex surface it creates. FFT surfaces also appear less repetitive and can even be based on a real wave spectrum such as the JONSWAP spectrum (Hasselmann et al., 1973).

This design performs all wave generation on the GPU, so the CPU only sends parameters and a static mesh. This moves the heavy lifting to the GPU, which is designed to process many millions of vertices and does so far more efficiently. Equations \ref{eq:gerstner_position} and \ref{eq:gerstner_normal} were implemented in the OpenGL Shading Language (GLSL). The position function shown in figure \ref{lst:gerstner_position_code} takes an input position $(x, z)$ and outputs the transformed position. The corresponding normal function, shown in figure \ref{lst:gerstner_normal_code}, outputs the normal at the position transformed by the previous function. Both functions iterate over the list of predefined wave parameters sent from the CPU.
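The equations themselves are not reproduced in this section, so the following is a CPU-side C++ sketch that assumes the Gerstner formulation popularised by GPU Gems (NVIDIA, 2007), rewritten with $y$ as the up axis. It is not the project's GLSL. The `GerstnerWave` layout is hypothetical, the phase speed is passed in per wave, and instead of the approximate normal formula often used in shaders, the sketch computes the exact normal as the cross product of the partial derivatives of the position. Because the functions depend only on their arguments, evaluating them twice with the same $(x, z, t)$ gives identical results, which is the stateless property described above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 {
  float x, y, z;
};

// One Gerstner wave (hypothetical parameter layout).
struct GerstnerWave {
  float dirX, dirZ;  // unit direction of travel on the XZ plane
  float amplitude;   // A, metres
  float wavelength;  // L, metres
  float steepness;   // Q: 0 gives a plain sine wave; crests sharpen as Q*k*A nears 1
  float speed;       // phase speed c, metres per second
};

constexpr float kTwoPi = 6.28318530718f;

// Phase of a wave at grid point (x, z) and time t: k (D . x) - k c t.
float wavePhase(const GerstnerWave& w, float x, float z, float t) {
  assert(w.wavelength > 0.0f);
  const float k = kTwoPi / w.wavelength;  // wave number
  return k * (w.dirX * x + w.dirZ * z) - k * w.speed * t;
}

// Displaced surface point for the undisplaced grid point (x, 0, z) at time t.
Vec3 gerstnerPosition(const std::vector<GerstnerWave>& waves, float x, float z, float t) {
  Vec3 p{x, 0.0f, z};
  for (const GerstnerWave& w : waves) {
    const float theta = wavePhase(w, x, z, t);
    const float qa = w.steepness * w.amplitude;
    p.x += qa * w.dirX * std::cos(theta);  // horizontal motion bunches points at crests
    p.z += qa * w.dirZ * std::cos(theta);
    p.y += w.amplitude * std::sin(theta);
  }
  return p;
}

// Unit normal at the displaced point: cross product of dP/dz and dP/dx.
Vec3 gerstnerNormal(const std::vector<GerstnerWave>& waves, float x, float z, float t) {
  Vec3 dx{1.0f, 0.0f, 0.0f};  // dP/dx
  Vec3 dz{0.0f, 0.0f, 1.0f};  // dP/dz
  for (const GerstnerWave& w : waves) {
    const float theta = wavePhase(w, x, z, t);
    const float ka = kTwoPi / w.wavelength * w.amplitude;
    const float s = std::sin(theta), c = std::cos(theta);
    dx.x -= w.steepness * ka * w.dirX * w.dirX * s;
    dx.y += ka * w.dirX * c;
    dx.z -= w.steepness * ka * w.dirX * w.dirZ * s;
    dz.x -= w.steepness * ka * w.dirX * w.dirZ * s;
    dz.y += ka * w.dirZ * c;
    dz.z -= w.steepness * ka * w.dirZ * w.dirZ * s;
  }
  // dz x dx points along +y for a flat surface.
  const Vec3 n{dz.y * dx.z - dz.z * dx.y,
               dz.z * dx.x - dz.x * dx.z,
               dz.x * dx.y - dz.y * dx.x};
  const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
  return {n.x / len, n.y / len, n.z / len};
}
```

In a shader the same loop would run per vertex over the uniform array of wave parameters; computing the normal from the same phase keeps lighting consistent with the displaced geometry.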

The Gerstner waves are faded in and out over time to give variation and to allow the sea conditions to change; we cannot simply modify the amplitude or frequency of a wave on the fly, as this deforms the waves and makes them jump about. Waves are generated from ranges defined by the user together with some physical properties of real gravity waves; for example, speed and frequency are calculated from the wavelength. Each wave gets a random wavelength around a median wavelength, and its amplitude is chosen so that its ratio of amplitude to wavelength matches that of the median amplitude to the median wavelength. This means realistic waves can be generated by tweaking a few simple parameters rather than choosing individual properties per wave, which quickly becomes confusing. The implementation for choosing parameters is shown in figure \ref{lst:wave_generation} and gives a better idea of how the parameters are created.

```cpp
// Assumes mDistribution yields uniform values in [0, 1) and mWindDirection is in degrees.
// Random angle in radians within +/- 3/4 PI of the wind direction
float radians = static_cast<float>((mDistribution(mGenerator) * 1.5 * M_PI) - (3.0 * M_PI / 4.0) +
                                   mWindDirection * (M_PI / 180.0));
// Unit direction of travel on the horizontal plane
glm::vec2 direction = glm::vec2(std::cos(radians), std::sin(radians));
// Shared scale for amplitude and wavelength, 0.5 to 2.0, so their ratio stays constant
float ratio = mDistribution(mGenerator) * 1.5f + 0.5f;
mWaves.push_back({
  direction,                      // Direction, unit 2D vector
  ratio * mWaveMedianAmplitude,   // Amplitude, 0.5 to 2.0 * medianAmplitude
  ratio * mWaveMedianWaveLength,  // Wavelength, 0.5 to 2.0 * medianWaveLength
  mDistribution(mGenerator) * 0.1f - 0.05f + mWaveChoppy * 0.95f,  // Choppiness, 0.95 * mWaveChoppy +/- 0.05
  mWaveMinLifetime + mDistribution(mGenerator) * (mWaveMaxLifetime - mWaveMinLifetime),  // Lifetime, between min and max
  0                               // Last field starts at zero
});
```


Code fragment that generates new waves and parameters based on user input
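The text above states that speed and frequency are derived from the wavelength and that waves fade in and out over their lifetime, but it does not give the formulas. The sketch below assumes the standard deep-water dispersion relation for gravity waves, $\omega^2 = g k$ with $k = 2\pi / L$, and a hypothetical linear fade over `fadeTime` seconds at each end of a wave's lifetime. The fade weight would scale a wave's amplitude so it appears and disappears smoothly instead of jumping. All function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

constexpr float kGravity = 9.81f;  // m/s^2
constexpr float kTwoPi = 6.28318530718f;

// Deep-water gravity waves: omega^2 = g * k, with wave number k = 2*pi / L.
float angularFrequency(float wavelength) {
  assert(wavelength > 0.0f);
  return std::sqrt(kGravity * kTwoPi / wavelength);
}

// Phase speed c = omega / k = sqrt(g * L / (2*pi)).
float phaseSpeed(float wavelength) {
  return angularFrequency(wavelength) * wavelength / kTwoPi;
}

// Weight in [0, 1]: ramps up over the first fadeTime seconds of a wave's life,
// holds at 1, then ramps back down to 0 at the end of its lifetime.
float lifetimeFade(float age, float lifetime, float fadeTime) {
  assert(fadeTime > 0.0f);
  const float fadeIn = age / fadeTime;
  const float fadeOut = (lifetime - age) / fadeTime;
  return std::min(std::max(std::min(fadeIn, fadeOut), 0.0f), 1.0f);
}
```

For a 10 m wavelength this gives a phase speed of about 3.95 m/s; longer waves travel faster, which is what makes swells outrun short chop.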

### Clipmap

Although the Gerstner waves are evaluated on the GPU, we still need to generate the geometry. A clipmap is generated on the CPU and stored in Vertex and Index Buffer Objects, which cache the geometry in the GPU's memory. This makes rendering very inexpensive for the CPU: it is simply a single draw call with the buffers and our wave shader program. The shader program performs the wave calculations for each vertex of the clipmap and displaces the mesh every frame. Our CPU code for generating the clipmap is not very efficient and takes a few seconds to initialise when the engine starts; since start-up time is not a concern for our application, we have not yet optimised this with tessellation shaders as mentioned in section \ref{gpu_clipmaps}. A comparison of before and after using clipmaps is shown in figure \ref{fig:gerstner_clipmap}; the clipmap gives much better tessellation.
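A minimal sketch of the CPU side, assuming a single flat grid rather than the full clipmap: it builds the vertex positions and triangle indices once, which would then be uploaded a single time (for example with `glBufferData`) into the vertex and index buffers and displaced by the wave shader every frame. `GridMesh` and `makeGrid` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// CPU-side mesh ready to be copied once into a vertex buffer and an index buffer.
struct GridMesh {
  std::vector<float> positions;        // x, y, z per vertex; y = 0 (flat)
  std::vector<std::uint32_t> indices;  // three indices per triangle
};

// Flat grid of cells x cells quads, `spacing` metres apart, centred on the origin.
// Triangles are wound counter-clockwise when seen from above (+y).
GridMesh makeGrid(int cells, float spacing) {
  assert(cells > 0 && spacing > 0.0f);
  GridMesh mesh;
  const int n = cells + 1;  // vertices per side
  const float half = 0.5f * cells * spacing;
  for (int j = 0; j < n; ++j) {
    for (int i = 0; i < n; ++i) {
      mesh.positions.push_back(i * spacing - half);
      mesh.positions.push_back(0.0f);
      mesh.positions.push_back(j * spacing - half);
    }
  }
  for (int j = 0; j < cells; ++j) {
    for (int i = 0; i < cells; ++i) {
      const std::uint32_t a = j * n + i;  // corner (i, j)
      const std::uint32_t b = a + 1;      // corner (i + 1, j)
      const std::uint32_t c = a + n;      // corner (i, j + 1)
      const std::uint32_t d = c + 1;      // corner (i + 1, j + 1)
      mesh.indices.insert(mesh.indices.end(), {a, c, b, b, c, d});
    }
  }
  return mesh;
}
```

Because the mesh never changes after upload, each frame only needs the wave parameters and a time uniform from the CPU.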

The clipmap was implemented by generating a square grid per level with the inside cut out where the previous level lies. At the outer border of each level, special care is taken to avoid T-junctions where the coarser neighbouring level has fewer vertices. The code snippet in figure \ref{lst:clipmapcode} outlines the main algorithm, which is based on (McGuire, 2014).
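The nesting of levels can be illustrated with the following C++ sketch, which loosely follows the nested-grid idea in (McGuire, 2014) rather than reproducing the project's listing. It assumes every level is an $N \times N$ grid of cells centred on the origin with the cell size doubling per level, so the finer level covers exactly the middle $N/2 \times N/2$ cells of the next one, which are therefore skipped. Moving the levels with the camera and the T-junction stitching at the borders are omitted; `Quad` and `clipmapLevel` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One square cell of a clipmap level: lower-left corner on the XZ plane and side length.
struct Quad {
  float x0, z0, size;
};

// Cells of clipmap `level`: an N x N grid centred on the origin whose cells are
// baseSpacing * 2^level wide. Every level after the first skips its middle
// N/2 x N/2 cells, exactly the area covered by the finer level inside it.
std::vector<Quad> clipmapLevel(int level, int cellsPerSide, float baseSpacing) {
  assert(level >= 0 && cellsPerSide > 0 && cellsPerSide % 4 == 0);
  const float size = baseSpacing * std::ldexp(1.0f, level);
  const float origin = -0.5f * cellsPerSide * size;
  const int holeMin = cellsPerSide / 4;
  const int holeMax = 3 * cellsPerSide / 4;
  std::vector<Quad> quads;
  for (int j = 0; j < cellsPerSide; ++j) {
    for (int i = 0; i < cellsPerSide; ++i) {
      const bool inHole = level > 0 && i >= holeMin && i < holeMax && j >= holeMin && j < holeMax;
      if (!inHole) quads.push_back({origin + i * size, origin + j * size, size});
    }
  }
  return quads;
}
```

With 8 cells per side, level 0 has 64 cells and each outer level keeps 48, so the vertex count grows only linearly with the number of levels while the covered area grows by a factor of four per level.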

## Rendering

The renderer is a standard forward renderer. Each model has an associated material containing a shader, the parameters applied to that shader and any textures it requires. The shader draws the mesh directly to the output frame buffer to be displayed on screen, and therefore also calculates any lighting required.

Our scene is rendered multiple times in different passes, as this is needed for different perspectives, lighting conditions and effects. Frame buffers in OpenGL allow us to bind multiple textures, or render targets, as the output of the render pipeline, so our shader programs can write to a frame buffer instead of the back buffer (which is itself just a special frame buffer). These textures can then be used as inputs to later shaders; our current renderer combines them in a final compositing pass where post-processing effects such as fog, contrast, brightness and exposure are applied. An example capture of some of the buffers is shown in figure \ref{fig:buffer_capture}.

A technique called ping-ponging is useful when running a chain of post-processing shaders, each of which modifies the output of the previous one. Since a shader cannot read from the same texture it is writing to, we use two frame buffers and swap the one being read from and the one being written to after every post-processing shader. This is essentially what the graphics driver does with the back and front buffers: the back buffer is written to while the front buffer is displayed, and after every frame the GPU is told to swap them so that the back buffer is displayed. Ping-pong buffers are used for blurring, for example, where the blur is split into multiple passes (e.g. horizontal then vertical) because a separable blur is more efficient than a single two-dimensional pass.
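The following CPU-side C++ sketch mimics the ping-pong scheme with two image buffers standing in for two frame buffers: each pass reads one buffer and writes the other, then the roles swap, and the blur is split into a horizontal and a vertical pass. The 3-tap box filter, the clamp-to-edge sampling and the `Image`, `blurPass` and `pingPongBlur` names are assumptions for illustration; a bloom blur would normally use a wider Gaussian kernel.

```cpp
#include <array>
#include <cassert>
#include <vector>

// A single-channel image stored row by row (hypothetical stand-in for a render target).
struct Image {
  int width, height;
  std::vector<float> pixels;

  // Clamp-to-edge sampling, like GL_CLAMP_TO_EDGE.
  float at(int x, int y) const {
    x = x < 0 ? 0 : (x >= width ? width - 1 : x);
    y = y < 0 ? 0 : (y >= height ? height - 1 : y);
    return pixels[y * width + x];
  }
};

// One 3-tap box blur pass along (dx, dy), reading src and writing dst.
void blurPass(const Image& src, Image& dst, int dx, int dy) {
  assert(src.width == dst.width && src.height == dst.height);
  for (int y = 0; y < src.height; ++y)
    for (int x = 0; x < src.width; ++x)
      dst.pixels[y * src.width + x] =
          (src.at(x - dx, y - dy) + src.at(x, y) + src.at(x + dx, y + dy)) / 3.0f;
}

// Separable blur using two ping-pong buffers: every pass writes into the
// buffer it is not reading from, then the two swap roles.
Image pingPongBlur(const Image& input, int iterations) {
  std::array<Image, 2> buffers{input, input};
  int read = 0;
  for (int i = 0; i < iterations; ++i) {
    blurPass(buffers[read], buffers[1 - read], 1, 0);  // horizontal pass
    read = 1 - read;
    blurPass(buffers[read], buffers[1 - read], 0, 1);  // vertical pass
    read = 1 - read;
  }
  return buffers[read];
}
```

As a small check, one iteration spreads a single pixel of value 9 in the centre of a 3 × 3 image into a uniform value of 1.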

An improvement to the renderer would be deferred rendering, which can handle more lights because the geometry pass outputs the information needed for lighting, which is then computed in screen space. However, transparent materials cannot be handled in deferred rendering, as the frame buffer can store at most one surface per pixel (we cannot see the material behind it). Since we are rendering water we chose to stay with forward rendering, although at present the water surface is modelled as opaque because we assume a deep sea.

### Physically Based Rendering

Physically based rendering (Khronos Group, 2016a) helped considerably with realistic lighting and made the surface look glossy. However, it still appeared too plastic, most likely due to the lack of reflections, and overly dark due to the absence of ambient light. Water was modelled as an opaque object as we did not have the capability to do real-time refraction, and there was no ocean floor to render. A good way to approximate light scattering in the water is to make the material slightly emissive: light entering the water is scattered in almost every direction, so we add a small amount of emissive light to the material to fake this effect.
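As a CPU-side C++ illustration of these shading terms, under assumptions: the Fresnel term below is Schlick's approximation (Schlick, 1994), as used in common PBR models such as the glTF reference, and is not taken from the project's shader. The normal-incidence reflectance of water is derived from an assumed refractive index of about 1.33. The scatter colour and strength are hypothetical tuning values standing in for the small amount of emissive light described above.

```cpp
#include <cassert>
#include <cmath>

struct Rgb {
  float r, g, b;
};

// Reflectance at normal incidence for an air-to-medium interface: ((n - 1) / (n + 1))^2.
float f0FromIor(float ior) {
  assert(ior > 0.0f);
  const float q = (ior - 1.0f) / (ior + 1.0f);
  return q * q;
}

// Schlick's approximation of Fresnel reflectance; cosTheta is the cosine of the
// angle between the view direction and the surface normal.
float fresnelSchlick(float cosTheta, float f0) {
  const float m = 1.0f - cosTheta;
  return f0 + (1.0f - f0) * m * m * m * m * m;
}

// Lit colour (e.g. from the PBR BRDF) plus a small constant emissive term
// that fakes light scattered back out of the water body.
Rgb addScatterEmission(Rgb lit, Rgb scatterColour, float strength) {
  return {lit.r + scatterColour.r * strength,
          lit.g + scatterColour.g * strength,
          lit.b + scatterColour.b * strength};
}
```

For water, `f0FromIor(1.33f)` is about 0.02, so looking straight down the surface reflects only about 2% of the light, while at grazing angles the reflectance approaches 1; the emissive term keeps the steep-angle view from going black.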

### Water Surface

Foam is rendered over the top of the waves to add to the realism. A combination of the normal and height of the wave is used to estimate areas of turbulence where foam would form, such as the crests of waves. This works well, gives the ocean a less repetitive look and adds a lot to the realism of the scene. Tiling textures tend to show repeating patterns, especially from far away, although this is partially mitigated by a noise function. A more accurate way to find areas of high turbulence would be to compute the derivatives of the wave displacement, for example through its Jacobian determinant, but in practice the extra computation did not seem worth the improved foam placement.
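The text does not give the exact foam formula, so the following is a hypothetical C++ version of the heuristic it describes: a height term (near the crest) multiplied by a steepness term (normal tilted away from vertical) and a noise value. The thresholds `0.5 * maxHeight` and `0.3` are made-up tuning values, and `smoothstep` reimplements the GLSL built-in of the same name.

```cpp
#include <algorithm>
#include <cassert>

// GLSL-style smoothstep: 0 below edge0, 1 above edge1, smooth Hermite ramp in between.
float smoothstep(float edge0, float edge1, float x) {
  assert(edge0 < edge1);
  const float t = std::min(std::max((x - edge0) / (edge1 - edge0), 0.0f), 1.0f);
  return t * t * (3.0f - 2.0f * t);
}

// Foam amount in [0, 1] from the displaced height and the y component of the
// unit surface normal: high points whose normal tilts away from vertical
// (steep crests) get foam. `noise` in [0, 1] breaks up the tiling pattern.
float foamAmount(float height, float normalY, float maxHeight, float noise) {
  const float crest = smoothstep(0.5f * maxHeight, maxHeight, height);
  const float steep = smoothstep(0.0f, 0.3f, 1.0f - normalY);
  return crest * steep * noise;
}
```

A flat surface (`normalY == 1`) or a trough never receives foam, so foam stays on the crests regardless of the noise value.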

Texture coordinates for the waves are calculated from the world position of the vertex, tiling the texture repeatedly in world space at a certain scale; this works fairly well and appears realistic. To add movement, the texture is offset over time. A more detailed approach could move it in a random direction or in a predefined direction such as the wind direction. We could also warp the coordinates so that tiling becomes less obvious further away.

Similar to foam, we apply a static normal map over the surface for higher-frequency detail. Since the displacement waves have a limited resolution due to the underlying mesh, normal mapping can be used for smaller details. The normal texture is overlaid twice, with the two layers moving in opposing directions at different speeds, and the combined normals perturb the surface normal, i.e. they are applied relative to the direction of the surface normal.
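A C++ sketch of the two-layer normal mapping described above, under assumptions: texture coordinates come from world $x, z$ divided by a tile size and scrolled over time (a caller would pass opposite directions and different speeds for the two layers); the two tangent-space samples are combined with a "whiteout"-style blend; and the tangent frame is built from the world $x$ axis projected onto the surface. The blend choice and all names are assumptions, not the project's shader.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 {
  float x, y;
};
struct Vec3 {
  float x, y, z;
};

Vec3 normalize(Vec3 v) {
  const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
  return {v.x / len, v.y / len, v.z / len};
}

// World-space texture coordinates: tile every tileSize metres and scroll
// along dir at speed metres per second.
Vec2 scrollingUv(float worldX, float worldZ, float tileSize, Vec2 dir, float speed, float t) {
  assert(tileSize > 0.0f);
  return {(worldX + dir.x * speed * t) / tileSize, (worldZ + dir.y * speed * t) / tileSize};
}

// Combine two tangent-space samples a and b (z = out of the surface) and tilt the
// geometric normal n by the result. Assumes n is unit length and not parallel to x.
Vec3 perturbNormal(Vec3 n, Vec3 a, Vec3 b) {
  const Vec3 d = normalize({a.x + b.x, a.y + b.y, a.z * b.z});  // whiteout blend
  // Tangent: world +x with its component along n removed; bitangent = t x n.
  const Vec3 t = normalize({1.0f - n.x * n.x, -n.x * n.y, -n.x * n.z});
  const Vec3 bt{t.y * n.z - t.z * n.y, t.z * n.x - t.x * n.z, t.x * n.y - t.y * n.x};
  return normalize({t.x * d.x + bt.x * d.y + n.x * d.z,
                    t.y * d.x + bt.y * d.y + n.y * d.z,
                    t.z * d.x + bt.z * d.y + n.z * d.z});
}
```

With a flat surface and flat samples the normal is unchanged; a sample tilted towards tangent-space $x$ tilts the result towards world $x$.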

The normal map is pre-rendered; an example is shown in figure \ref{fig:normalmap}. A further enhancement would be to render Gerstner waves to a texture and apply it later in the pipeline instead of the static normal map. While this might add more realism, we did not explore it fully due to time constraints and the additional overhead, as well as the extra restrictions needed on the Gerstner equations to make sure the normal map tiles seamlessly.

### Reflections

Reflections were first done using a dynamic environment cubemap centred on the camera. This works well, but the resolution of the cubemap is a problem, especially at a distance. Using the camera position as the probe position for the cubemap also causes problems, especially near other reflective surfaces. An example of the cubemap displayed as a panoramic view is shown in figure \ref{fig:cubemap_reflection}.

The cubemap is created by capturing all six faces of the cube independently with a 90° field of view. Redrawing the scene six times plus the final render is quite intensive, but since the cubemap is rendered at a very low resolution it is not a large performance hit. An improvement would be to render the cubemap in a single pass by expanding the vertices into six different layers in a geometry shader, reducing the number of draw calls and context switches.
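For reference, a C++ sketch of the six capture directions, assuming OpenGL's cube map face order and the up vectors commonly paired with it; each face would be rendered with a look-at matrix from the probe position along `forward`. The `CubeFace` and `kCubeFaces` names are hypothetical. The 90° field of view and square viewport are what make the six faces meet without gaps or overlap.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 {
  float x, y, z;
};

struct CubeFace {
  Vec3 forward;  // direction the capture camera looks along
  Vec3 up;       // camera up vector for that face
};

// Faces in OpenGL order (GL_TEXTURE_CUBE_MAP_POSITIVE_X + i): +X, -X, +Y, -Y, +Z, -Z.
constexpr std::array<CubeFace, 6> kCubeFaces{{
    {{1.0f, 0.0f, 0.0f}, {0.0f, -1.0f, 0.0f}},
    {{-1.0f, 0.0f, 0.0f}, {0.0f, -1.0f, 0.0f}},
    {{0.0f, 1.0f, 0.0f}, {0.0f, 0.0f, 1.0f}},
    {{0.0f, -1.0f, 0.0f}, {0.0f, 0.0f, -1.0f}},
    {{0.0f, 0.0f, 1.0f}, {0.0f, -1.0f, 0.0f}},
    {{0.0f, 0.0f, -1.0f}, {0.0f, -1.0f, 0.0f}},
}};

// Perspective scale 1 / tan(fov / 2). For a 90 degree field of view it is 1,
// so with a square viewport each face spans exactly one side of the cube.
float projectionScale(float fovRadians) {
  assert(fovRadians > 0.0f && fovRadians < 3.14159265f);
  return 1.0f / std::tan(0.5f * fovRadians);
}
```

Rendering at a low face resolution, as described above, only changes the viewport size; the six view directions and the projection stay the same.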

An improved reflection system was implemented based on screen space reflections. The main problem with screen space reflections is the lack of information in screen space, so a good fallback such as a cubemap is needed. In our scene we mainly needed the sky reflection, and since the sky is procedural we can generate its colour for any reflection angle. We therefore used this function to produce our reflections instead of ray marching into the scene, which avoids the lack of information entirely and gives perfect sky reflections in any direction. The major downside of not including true screen space reflections is that we lose local reflections, such as objects above the water surface. Most applications benefit from local reflections, so a more complete SSR solution would be required.
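The idea of replacing the screen space ray march with a direct sky lookup can be sketched in C++ as below. `reflect` matches the GLSL built-in of the same name; `skyColour` is a hypothetical gradient standing in for the project's procedural Rayleigh and Mie sky, which would be evaluated along the reflected direction in the same way.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 {
  float x, y, z;
};

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Mirror the incident direction i about the unit normal n: i - 2 (n . i) n.
Vec3 reflect(Vec3 i, Vec3 n) {
  const float d = 2.0f * dot(n, i);
  return {i.x - d * n.x, i.y - d * n.y, i.z - d * n.z};
}

// Placeholder sky: blend from a horizon colour to a zenith colour with elevation.
// Directions below the horizon are given the horizon colour.
Vec3 skyColour(Vec3 dir) {
  const float up = std::max(dir.y, 0.0f);
  const Vec3 horizon{0.8f, 0.85f, 0.9f};
  const Vec3 zenith{0.2f, 0.4f, 0.8f};
  return {horizon.x + (zenith.x - horizon.x) * up,
          horizon.y + (zenith.y - horizon.y) * up,
          horizon.z + (zenith.z - horizon.z) * up};
}

// Reflection colour for a water pixel: look the reflected view ray up in the sky.
Vec3 skyReflection(Vec3 viewDir, Vec3 normal) {
  assert(std::fabs(dot(normal, normal) - 1.0f) < 1e-3f);  // normal must be unit length
  return skyColour(reflect(viewDir, normal));
}
```

Since the lookup never reads the depth or colour buffers, it cannot miss at the screen edges, which is the advantage described above; the cost is that nothing except the sky is reflected.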

### Sky dome

The sky is generated procedurally with Rayleigh and Mie scattering (Terrell, 2016). The sun position is animated over time to simulate a day cycle. The sky is projected onto a sky dome created from an icosphere that is always centered on the viewer and drawn behind all other geometry.
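One common way to keep a dome centred on the viewer and behind everything else is sketched below; the text does not say which method the project uses, so this is an assumption. The view matrix is used without its translation, so the dome rotates with the camera but never moves relative to it, and the vertex shader outputs clip-space $z = w$ (in GLSL, `gl_Position = pos.xyww`) so the dome lands on the far plane; this needs a depth test of `GL_LEQUAL` rather than the default `GL_LESS`. `Mat4`, `Vec4`, `stripTranslation` and `toFarPlane` are hypothetical names.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<float, 16>;  // column-major, the layout OpenGL and GLM use

struct Vec4 {
  float x, y, z, w;
};

// Zero the translation column (elements 12-14) of a column-major view matrix,
// keeping only the camera's rotation.
Mat4 stripTranslation(Mat4 view) {
  view[12] = view[13] = view[14] = 0.0f;
  return view;
}

// Replace clip-space z with w, so after the perspective divide the depth is
// z / w = 1, the far plane.
Vec4 toFarPlane(Vec4 clip) {
  assert(clip.w > 0.0f);  // only meaningful for points in front of the camera
  return {clip.x, clip.y, clip.w, clip.w};
}
```

Drawing the dome this way also lets it be rendered last, so fragments hidden behind the ocean or other geometry fail the depth test and are never shaded.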

### Post Processing

Post-processing is done by rendering a full-screen quad, a quad drawn in screen space whose coordinates range from $(-1, -1)$ to $(1, 1)$ so that it covers the whole screen. We can then map these coordinates to UV coordinates from $(0, 0)$ to $(1, 1)$ and use them to sample the texture buffers from the geometry and lighting phases. Since we have multiple buffers with different information, such as depth, normals and colour, we can apply more sophisticated post effects such as screen space reflections and depth-based fog.
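The coordinate mapping and a depth-based fog term can be written out as follows. This is a C++ sketch under assumptions: depth is read from a standard OpenGL perspective depth buffer in $[0, 1]$ with near and far planes $n$ and $f$, and the fog is a simple exponential falloff with a hypothetical `density`; the project's actual fog curve is not specified.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 {
  float x, y;
};

// Full-screen quad: normalised device coordinates in [-1, 1] map to UVs in [0, 1].
Vec2 ndcToUv(Vec2 ndc) { return {ndc.x * 0.5f + 0.5f, ndc.y * 0.5f + 0.5f}; }

// View-space depth from a [0, 1] depth-buffer value for a standard OpenGL
// perspective projection with near plane n and far plane f.
float linearDepth(float depth, float n, float f) {
  assert(n > 0.0f && f > n);
  const float zNdc = depth * 2.0f - 1.0f;
  return 2.0f * n * f / (f + n - zNdc * (f - n));
}

// Exponential fog: the fraction of the fog colour to blend in at distance d.
float fogFactor(float d, float density) { return 1.0f - std::exp(-density * d); }
```

The depth buffer has to be linearised first because perspective depth is strongly non-linear: most of its precision is spent close to the near plane.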

Various post-processing effects are applied: screen space reflections as mentioned above; a bloom effect produced by a high-pass filter followed by several blurs; and a lens flare effect. A final master compositing pass combines the effects, for example mixing bloom with the main colour, tone mapping from High Dynamic Range (HDR) to Low Dynamic Range (LDR) and applying a subtle vignette. The renderer is fully HDR, using 16-bit floating-point precision in all frame buffers, and converts to standard LDR sRGB when outputting to the screen buffer. Tone mapping is done using the popular Uncharted 4 method (Naughty Dog, 2016).
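The compositing maths can be sketched in C++ as below, with one loud caveat: the curve is John Hable's widely published filmic operator from Uncharted 2 with its usual constants, used here as a stand-in because the exact Uncharted 4 variant cited above is not reproduced in this section. The exposure and white point values are assumptions, and the final step applies the standard sRGB transfer function (IEC 61966-2-1) for output.

```cpp
#include <cassert>
#include <cmath>

// Hable's filmic curve ("Uncharted 2" operator) with its commonly quoted constants.
float hableCurve(float x) {
  const float A = 0.15f, B = 0.50f, C = 0.10f, D = 0.20f, E = 0.02f, F = 0.30f;
  return ((x * (A * x + C * B) + D * E) / (x * (A * x + B) + D * F)) - E / F;
}

// Map a linear HDR value to [0, 1]: apply exposure, then normalise so that the
// linear white point maps to exactly 1.
float tonemap(float hdr, float exposure) {
  assert(exposure > 0.0f);
  const float whitePoint = 11.2f;
  return hableCurve(exposure * hdr) / hableCurve(whitePoint);
}

// Encode a linear value in [0, 1] with the sRGB transfer function.
float linearToSrgb(float c) {
  return c <= 0.0031308f ? 12.92f * c : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}
```

In the compositing shader these would be applied per colour channel after bloom is mixed in and before the vignette, matching the order of operations listed above.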

# Evaluation

Evaluation is performed in two stages: the first is a subjective comparison against real-life examples and popular games, and the second is an objective measurement of performance. Games are a useful point of comparison, as water is prevalent in games and they share the requirement of real-time simulation.

## Subjective

The project reached a level of detail and quality at which it could realistically be used in an application under certain conditions. It would require more work to reach the standard of triple-A games or real-time cinematography; however, given the complexity and scale of the work required, the project should be considered a success in implementing and experimenting with ocean rendering techniques.

The framework built for the scene is also versatile enough to allow different configurations and effects to be tested.

The figure shows a real-life reference photo in (a) and our replication of the ocean in (b). Our image approximates the ocean closely: the small-scale waves with subtle movement and the reflections help sell the water look. The sky and the overall colours differ between the images, but the surfaces are fairly similar.

Figure \ref{fig:ocean_game_ref} compares two triple-A games against our water surface. All three show a nice surface pattern in which the waves are clearly visible and appear glossy, and all show distortion of the reflections on the water surface. There are some subtle resolution issues close up in our ocean render, which could be improved with higher-resolution normal maps or a better interpolation method. The lowest quality water in the comparison appears to be that of PlayerUnknown's Battlegrounds (PUBG), which looks too flat, with many artefacts and a lack of displacement in the waves. It appears to use screen space reflections, as reflections are missing at the edges of the screen.

Refraction helps make the water surface more believable, as is visible in images (a) and (b) of figure \ref{fig:ocean_game_ref}. Our render shows no refraction because of the assumption that we are in the deep sea, where there would be no visible refraction. This assumption simplifies our implementation, but adding refraction would make shallow water in our ocean look much better.

## Objective

The project was developed primarily on a mid-range computer, on which it performed more than adequately. A scaled-down version with lower graphical fidelity was run on a modern laptop with integrated graphics and performed as expected; although it struggled at times, it remained usable.

Frames per second (FPS) were recorded at varying clipmap sizes on a mid-range computer comparable to most modern computers and gaming consoles; the data is shown in table \ref{tab:fps_table}. Parameters such as the base resolution and vertices per level were mostly chosen through trial and error by inspecting the results visually; these parameters gave the best level of detail both close up and when hovering several metres above the surface, which are typical viewing positions in many games.

The data shows that the demo achieves a high average frame rate even without optimisations. We consider anything over 60 FPS acceptable, as this is what most modern consoles and games currently achieve and what most monitors can display. The high maximum FPS is likely from moments when the camera faces only the sky, as the ocean mesh is then clipped out of the frame, significantly reducing the frame cost. The minimum FPS stays fairly consistent (between about 76 and 100 FPS) as the number of clipmap levels increases, and the average FPS drops by only about 11 FPS per additional clipmap level.

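| Base resolution | Vertices per level | Levels | Min FPS | Max FPS | Average FPS |
|---|---|---|---|---|---|
| 0.5 m | 256 | 7 | 88.500 | 424.367 | 137.355 |
| 0.5 m | 256 | 6 | 99.590 | 413.216 | 151.250 |
| 0.5 m | 256 | 5 | 86.378 | 454.828 | 167.460 |
| 0.5 m | 256 | 4 | 76.494 | 512.330 | 170.990 |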

FPS at different clipmap sizes using 16 waves, 1920x1080 resolution, i7 4770k, GTX 970.
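As a quick check of the per-level drop in average FPS mentioned above, the change in average frame rate between 4 and 7 clipmap levels in table \ref{tab:fps_table} works out as:

$$
\frac{170.990 - 137.355}{7 - 4} = \frac{33.635}{3} \approx 11.2\ \text{FPS per additional level}
$$

The drops between consecutive rows are uneven (3.5, 16.2 and 13.9 FPS), so this is only an average.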

In a real-time application the focus is usually on some other aspect, so frame time has to be shared. Water or ocean effects may not be allocated much time, especially if the application also requires complex physics or terrain simulation. It is therefore important that the ocean performs above 60 FPS in the average case, as this leaves computing time for other systems.
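To put the 60 FPS target in terms of frame time, treating the average frame rates in table \ref{tab:fps_table} as average frame times (an approximation, since the mean of FPS samples is not exactly the reciprocal of the mean frame time):

$$
t_{\text{budget}} = \frac{1000\ \text{ms}}{60} \approx 16.7\ \text{ms}, \qquad
t_{\text{7 levels}} \approx \frac{1000\ \text{ms}}{137.355} \approx 7.28\ \text{ms}
$$

So even the largest clipmap configuration, whose measured frame includes the whole demo scene rather than the ocean alone, leaves roughly 9.4 ms per frame for other systems on the test hardware.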

The simulation works best when viewed from close to the surface, as is the case in most scenarios; from higher viewpoints the surface appears too repetitive and some graphical glitches appear due to the clipmap. These could be mitigated by fading out higher wave frequencies at greater distances, but if the ocean will only be viewed from a distance it may be best to use a different approach.

The clipmap causes the waves to appear to jump when moving. This is due to high-frequency waves whose wavelength is shorter than the grid can resolve. When the grid does not have enough tessellation to represent a wave, the vertices behave oddly, especially when moving, as their positions noticeably snap to different points on the wave. Possible solutions are to use a different representation of the surface at a distance, such as a flat surface shaded with a pure BRDF solution, or to use a screen space grid projected into world space.
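The fading of high frequencies at distance suggested above can be tied to the grid resolution with a simple sampling argument: a grid with spacing $\Delta$ cannot represent wavelengths shorter than $2\Delta$ (the Nyquist limit), so such waves should be faded out before they alias. The C++ sketch below assumes the clipmap spacing doubles per level from a base spacing and that the shader knows the level or spacing of each vertex; the fade window between 2 and 4 cells per wavelength and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Grid spacing of a clipmap level, assuming each level doubles the previous one.
float gridSpacingAtLevel(float baseSpacing, int level) {
  assert(baseSpacing > 0.0f && level >= 0);
  return baseSpacing * std::ldexp(1.0f, level);  // baseSpacing * 2^level
}

// Amplitude weight for a wave on a grid with the given spacing: 0 at or below
// 2 cells per wavelength (the Nyquist limit), rising linearly to 1 at 4 cells.
float waveFade(float wavelength, float spacing) {
  assert(spacing > 0.0f);
  const float cellsPerWave = wavelength / spacing;
  return std::min(std::max((cellsPerWave - 2.0f) / 2.0f, 0.0f), 1.0f);
}
```

For example, taking the 0.5 m base resolution from the table as the base spacing, level 3 has a 4 m spacing, so waves shorter than 8 m are removed there entirely and waves of 16 m or more are unaffected.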

## Further Work

Various improvements would be beneficial before using the ocean in an application, mainly in wave generation. Since there is a large variety of approaches there are many areas to expand on; until we have the computing power to ray trace and simulate many particles, we will most likely stick with variations of these techniques to keep render times reasonable.

The Fast Fourier Transform with a realistic wave spectrum would greatly improve the realism and shape of the waves, although it requires much more computation. Another issue is tuning the parameters for different heights above the water surface, as there are visible artefacts at higher altitudes due to repetition and high-frequency detail; repetition becomes much more obvious the higher the viewer is.

Spray is another very complex effect that definitely contributes to the realism of the scene especially when the water surface is interacting with another surface. It could possibly be approximated by particle systems based on wave shapes however this was out of the scope of the project. It would also be an improvement to work on shore interactions as in many cases this is desirable and there are different conditions in shore waves.

To be more generally useful, the model and renderer could be extended to shallow water. The main changes would be making the wave behaviour adapt to the water depth, such as reducing high frequencies, and ensuring waves do not appear to clip through the ground. Refraction would also need to be implemented in the renderer, by first rendering the scene without water and then adding the water in another pass.

Finally, in the near future, clever techniques such as ray marching should make it feasible to ray trace the waves in screen space in real time. This is currently quite intensive, but it results in much better effects such as real subsurface scattering.

# Conclusion

The project evaluated techniques for simulating and rendering an ocean in real time and developed a framework that supports these techniques extensively.

Gerstner waves were chosen as the wave model, as they are a large improvement over a sum of sines in terms of realism. A further improvement for more advanced and detailed oceans would be the Fast Fourier Transform based on a wave spectrum captured from the real ocean; however, this requires more computing power and may not be worth the additional cost, depending on the type of real-time application. The choice of Gerstner waves is sufficient to provide a convincing effect.

Although it does not hold up as well as the most recent triple-A games and films, it shows that a convincing ocean can be implemented at reasonable cost and that it performs on a wide range of hardware. Water simulation is very challenging for modern hardware, but with reasonable approximations and artistic choices a believable surface can be achieved. When used in an application, the quality need not be outstanding to sell the scene.

In conclusion, the implementation outlined creates a reasonable surface that, with minor tweaks, could be embedded in an application. There is no single best wave model, and each involves trade-offs between performance, realism and feasibility. Creating a framework in C++ that supports modern graphics techniques is also a large task, and is usually not worthwhile unless there are custom requirements. Overall, the demo can be considered a successful implementation of an ocean surface.

# References

1. Blinn, J. F. (1977). Models of light reflection for computer synthesized pictures.
2. Braley, C., & Sandu, A. (n.d.). Fluid Simulation For Computer Graphics: A Tutorial in Grid Based and Particle Based Methods.
3. Burley, B. (2012). Physically-Based Shading at Disney.
4. Cedilnik, A., Hoffman, B., King, B., Martin, K., & Neundorf, A. (2000). CMake. Retrieved from https://cmake.org/
5. Dummer, J. (n.d.). Simple OpenGL Image Library. Retrieved from https://www.lonesock.net/soil.html
6. Fiedler, G. (2004). Fix your timestep. Retrieved from https://gafferongames.com/post/fix_your_timestep/
7. Geelnard, M., & Lowy, C. (2002). GLFW. Retrieved from http://www.glfw.org/
8. Gessler, A., Schulze, T., & Kulling, K. (n.d.). Open Asset Import Library. Retrieved from https://github.com/assimp/assimp
9. Hasselmann, K., Barnett, T., Bouws, E., Carlson, H., Cartwright, D., Enke, K., … others. (1973). Measurements of wind-wave growth and swell decay during the Joint North Sea Wave Project (JONSWAP). Deutsches Hydrographisches Institut.
10. Herberth, D. (2013). glad. Retrieved from https://github.com/Dav1dde/glad
11. Hoffman, N. (n.d.). Background: Physics and Math of Shading. Retrieved from http://blog.selfshadow.com/publications/s2013-shading-course/hoffman/s2013_pbs_physics_math_slides.pdf
12. Kajiya, J. T. (1986). The rendering equation (Vol. 20, No. 4, pp. 143–150).
13. Karis, B. (2013). Real Shading in Unreal Engine 4.
14. Khronos Group. (1992). OpenGL. Retrieved from https://www.opengl.org/
15. Khronos Group. (2004). A Brief History of Computer Graphics. Retrieved from http://www.comphist.org/computing_history/new_page_6.htm
16. Khronos Group. (2010). OpenGL 4 Update.
17. Khronos Group. (2016a). Physically-Based Rendering in glTF 2.0 using WebGL. Retrieved from https://github.com/KhronosGroup/glTF-WebGL-PBR
18. Khronos Group. (2016b). Vulkan. Retrieved from https://www.khronos.org/vulkan/
19. Kilgard, M. J. (1999). Creating reflections and shadows using stencil buffers.
20. McGuire, M. (2014). Fast Terrain Rendering with Continuous Detail on a Modern GPU. Retrieved from http://casual-effects.blogspot.co.uk/2014/04/fast-terrain-rendering-with-continuous.html
21. Mojang. (2011). Minecraft. Retrieved from https://minecraft.net/
22. Naughty Dog. (2012). Water Technology of Uncharted. Retrieved from https://www.gdcvault.com/play/1015309/Water-Technology-of
23. Naughty Dog. (2016). The Technical Art of Uncharted 4. Retrieved from http://advances.realtimerendering.com/other/2016/naughty_dog/NaughtyDog_TechArt_Final.pdf
24. NVIDIA. (2005). Terrain Rendering Using GPU-Based Geometry Clipmaps. Retrieved from https://developer.nvidia.com/gpugems/GPUGems2/gpugems2_chapter02.html
25. NVIDIA. (2007). GPU Gems Effective Water Simulation from Physical Models. Retrieved from https://developer.nvidia.com/gpugems/GPUGems/gpugems_ch01.html
26. PUBG Corporation. (2017). PlayerUnknown's Battlegrounds. Retrieved from http://playbattlegrounds.com/main.pu
27. Schlick, C. (1994). An Inexpensive BRDF Model for Physically-based Rendering.
28. Terrell, R. (2016). glsl-atmosphere.
29. Tessendorf, J. (2001). Simulating Ocean Water. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.161.9102&rep=rep1&type=pdf
30. Toman, W. (2009). Rendering Water as a Post-process Effect.
31. Valve. (2010). Water flow in Portal 2. Retrieved from http://www.valvesoftware.com/publications/2010/siggraph2010_vlachos_waterflow.pdf
32. Walter, B., Marschner, S. R., Li, H., & Torrance, K. E. (2007). Microfacet models for refraction through rough surfaces.
33. Zadick, J., Kenwright, B., & Mitchell, K. (2016). Integrating Real-Time Fluid Simulation with a Voxel Engine. https://doi.org/10.1007/s40869-016-0020-5

© 2020 Ryan Welch. All rights reserved.