This article was originally published on ImmersiveShooter.com
VR filmmaking and photography have exploded in the last year, and with a bevy of both professional and consumer 360 cameras hitting the market, this new technology has put a new level of immersive image capture at everyone's fingertips. A couple of years ago, my friend and I got into VR filmmaking by creating an interactive horror film inspired by immersive theater. We shot that project, The Presence, on a consumer-grade camera that gave us monoscopic 360 video. Our second project was an interactive music video that changes seamlessly based on the user’s gaze. For this project, The Cooties - VR, we were able to upgrade to shooting in stereoscopic 360, which gives the viewer a real sense of depth while watching through a VR headset. I was very impressed with the stereo 360 images we were getting, and I assumed that was the best that video was capable of. But lately I’ve been experimenting with an emerging technology called 6DoF, recently revealed to the public by Adobe’s Project Sidewinder and Facebook’s Surround360 6DoF demo. 6DoF brings a whole new dimension of possibilities to VR image capture.
What is 6DoF?
6DoF, or six degrees of freedom, refers to how much of your movement is tracked in VR. Headsets such as the Samsung Gear VR and Google Cardboard are three-degrees-of-freedom (3DoF) devices: they track rotational orientation around the x, y, and z axes, but they do not track position. A 6DoF headset tracks both rotation and position, so when you lean, crouch, or step forward, the virtual world responds accordingly.
Looking around inside a textured sphere.
360 photos and videos are, essentially, images wrapped around a sphere, and in VR you are placed at the center of that sphere. Imagine you are in a big spherical room painted with a photorealistic nature scene. While you stand in the center of the room and look around, it might look convincing. But as you walk closer to the wall, you would start to notice that the scene is just a flat image on a curved surface, with no real dimension.
Moving around inside a textured sphere.
Even in stereoscopic 360, you still don’t actually have any 3D shapes to move around; you just have two images textured on a sphere, a slightly different one for each eye. But that can all change with depth maps.
Deep into Depth Maps
Depth maps are images that use shades of grey to represent distances from the camera. Black areas represent the farthest points, and white areas are the closest.
A 6DoF image with equirectangular color image on top and depth map on bottom.
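As a minimal sketch of that grayscale-to-distance mapping: the `near` and `far` values below are made-up calibration numbers, not from any particular camera, and real rigs often encode inverse depth rather than linear distance.

```python
def shade_to_distance(shade, near=0.5, far=100.0):
    """Map a grayscale depth value in [0, 1] to a distance in meters.

    Follows the convention above: white (1.0) is nearest,
    black (0.0) is farthest. `near`/`far` are hypothetical values.
    """
    return far + shade * (near - far)

print(shade_to_distance(1.0))   # brightest pixel -> 0.5 (closest)
print(shade_to_distance(0.0))   # darkest pixel -> 100.0 (farthest)
```

Every 6DoF technique below starts from some mapping like this one.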
Some high-end VR camera systems such as Google Jump, Kandao Obsidian, and Nokia Ozo already export depth maps out-of-the-box. We shot our second project on a Google Jump camera, and when I saw the depth maps it was capable of exporting, I was immediately curious about how they could be used. I had become quite familiar with the Unity game engine, having used it for both of our previous projects, and so I started to investigate what is possible with these depth maps in Unity. After a lot of experimentation, I was able to write a custom shader that displaces each vertex of the sphere on which the video was textured by the shade of grey recorded in the depth map.
Moving around before and after displacing the sphere using its depth map.
And this just works! The program takes video information and, in real time, turns it into a textured 3-dimensional mesh that you can walk up to and look around. Of course, it’s not perfect. All of the distances depend on the accuracy of the depth map, which is in turn derived by comparing the raw captured video images using complex computer vision algorithms. These algorithms have trouble with reflections and repeating patterns, so the depths are not always accurate. But it is early days, and they’ll improve with time.
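The core of that per-vertex displacement can be sketched outside the shader in plain Python. This is an illustrative reconstruction, not the actual shader: the equirectangular sampling, the white-is-near convention, and the near/far range are all assumptions.

```python
import math

def sample_depth(depth_map, direction):
    """Sample an equirectangular depth map (a 2D list of shades in
    [0, 1]) using a unit direction from the sphere's center."""
    h, w = len(depth_map), len(depth_map[0])
    lon = math.atan2(direction[0], direction[2])        # -pi .. pi
    lat = math.asin(max(-1.0, min(1.0, direction[1])))  # -pi/2 .. pi/2
    u = (lon + math.pi) / (2.0 * math.pi)
    v = 0.5 - lat / math.pi
    return depth_map[min(h - 1, int(v * h))][min(w - 1, int(u * w))]

def displace_vertex(direction, depth_map, near=0.5, far=100.0):
    """Push a sphere vertex out along its own direction to the
    distance the depth map records for that direction."""
    shade = sample_depth(depth_map, direction)
    distance = far + shade * (near - far)  # white = near, black = far
    return tuple(c * distance for c in direction)

# A 1x1 all-white "depth map" pulls a forward-facing vertex in
# to the near distance:
print(displace_vertex((0.0, 0.0, 1.0), [[1.0]]))  # -> (0.0, 0.0, 0.5)
```

In the real shader this runs on the GPU for every vertex of the sphere, every frame, which is what makes the mesh update live with the video.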
Some 6DoF techniques do not involve 360 video/photo capture and depth maps at all; instead, they use many cameras pointed inward at a single subject (outside-in). This method captures the subject from all angles and uses photogrammetry to process it into a 3D object. It has the advantage of letting the viewer walk around the subject and examine it from all sides. But it does not capture the environment, has trouble with complex or multiple subjects, and requires a large setup and lots of processing.
Left: Microsoft Mixed Reality Capture is an outside-in 6DoF capture solution that requires numerous cameras in many positions.
Right: A Google Jump camera on the set of our music video. It captures video from a single camera position.
By contrast, depth map 6DoF is captured entirely from a single position (inside-out). Using depth maps has the advantage of capturing an entire space with a single 360 camera setup. However, because everything is seen from one point, no information is captured “behind” objects. If, for example, a car drives through your scene, the camera cannot capture anything behind the car. This can cause some major distortions in the 3D mesh, which become more noticeable the further you move from the origin point.
A car drives by, and the sphere is displaced. Notice that no information is recorded behind the car.
This is why demos such as Adobe’s Project Sidewinder and Facebook’s Surround360 6DoF demo limit the viewer to relatively small movements: the further you move from the origin, the more distorted the picture gets. These distortions may be reduced or even solved by future capture or processing techniques, but for now they’re here to stay.
Point Cloud City
Point clouds are pretty much exactly what they sound like: a big mass of points in 3D space. Each point has a position and a color, and it is not uncommon to be dealing with hundreds of thousands or even millions of points.
So how can we go from a photo or video to a point cloud? If you take an equirectangular 360 picture with a resolution of 2880x1440, what you have is a list of just over 4 million pixels, or colors, since each pixel represents one color (here’s the math: 2880 × 1440 = 4,147,200). If you then take an equirectangular 360 depth map of the same resolution, you have a list of 4 million distances from a single point in space. A little math later, and you’ve got 4 million colors with 4 million corresponding positions in space, which is exactly what a point cloud represents.
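That “little math” is a spherical-to-Cartesian conversion. Here is a sketch in Python; the axis convention (y-up, z-forward) and the function name are my own choices, not from any particular tool.

```python
import math

def equirect_pixel_to_point(x, y, width, height, distance):
    """Turn one equirectangular pixel plus its depth-map distance
    into a 3D position: one point of the cloud."""
    # Longitude spans the full 360 degrees across the image width;
    # latitude spans 180 degrees down the image height.
    lon = (x / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (y / height) * math.pi
    # Spherical -> Cartesian, y-up.
    return (distance * math.cos(lat) * math.sin(lon),
            distance * math.sin(lat),
            distance * math.cos(lat) * math.cos(lon))

# The pixel at the exact center of a 2880x1440 image, 5 units away,
# lands straight ahead on the z axis:
print(equirect_pixel_to_point(1440, 720, 2880, 1440, 5.0))
# -> (0.0, 0.0, 5.0)
```

Pair each such position with that pixel’s color, repeat 4 million times, and you have the full point cloud.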
Adjusting point size and moving around a point cloud.
Point clouds don’t have the same distortion issues as the displacement technique, but they trade those distortions for empty gaps. Point clouds are fun to play with, but right now they don’t offer many advantages over displacement. In the future, however, point clouds could be the path toward more interesting 6DoF techniques.
You may be wondering, with all these distortions and empty gaps, how is 6DoF any better than 3DoF? When watching a stereo 360 video in 3DoF, you have to remain sitting upright, unable to lean forward or to the side. If you do, the whole virtual world will move with you, which can lead to nausea. Even tilting your head sideways a little can break the stereo effect and cause you to feel cross-eyed. But in 6DoF, you are free to lean and move and tilt. You can sit and move naturally while maintaining full immersion with minimal distortion (within a limited range).
If you want to experience 6DoF right now, I have just released Pseudoscience 6DoF Viewer, an app that lets you view and experiment with depth-based 6DoF photos and videos. The app features displacement and point cloud modes, real-time lighting effects, and much more. Pseudoscience 6DoF Viewer is free and available for Windows Mixed Reality and HTC Vive, and it will be coming soon to Oculus Rift. Non-VR desktop versions are also available for Mac and PC. And check out r/6DoF on Reddit, where you can find and share 6DoF content as well as keep updated about changes in the medium.