VR filmmaking and photography have exploded in the last year, and with a bevy of professional and consumer 360 cameras hitting the market, the technology has put a new level of immersive image capture at everyone's fingertips. A couple of years ago, my friend and I got into VR filmmaking by creating an interactive horror film inspired by immersive theater. We shot that project, The Presence, on a consumer-grade camera that gave us monoscopic 360 video. Our second project, The Cooties - VR, was an interactive music video that changes seamlessly based on the user’s gaze. For that project, we were able to upgrade to shooting in stereoscopic 360, which gives the viewer a real sense of depth when watching through a VR headset. I was very impressed with the stereo 360 images we were getting, and I assumed that was the best that video was capable of. But lately I’ve been experimenting with an emerging technology called 6DoF, recently revealed to the public by Adobe’s Project Sidewinder and Facebook’s Surround360 6DoF demo. 6DoF brings a whole new dimension of possibilities to VR image capture.
What is 6DoF?
6DoF, or 6-Degrees-of-Freedom, refers to the amount of movement that is tracked in VR. Headsets such as the Samsung Gear VR and Google Cardboard are 3-degrees-of-freedom (3DoF) devices, because they track rotation around the x, y, and z axes, but they do not track position. A 6DoF device tracks both rotation and position, so your physical movement through space is reflected in the virtual world.
Looking around inside a textured sphere.
360 photos and videos are, essentially, images wrapped around a sphere, and in VR, you are placed in the middle of that sphere. Imagine you are in a big spherical room, and the room is painted with a photorealistic nature scene. While you stand in the center of the room and look around, it might look convincing. But as you walk closer to the wall, you would start to notice that the painted surface, however detailed, is flat and lacks any real dimension.
Moving around inside a textured sphere.
Even in stereoscopic 360, you still don’t actually have any 3D shapes to move around; you just have two images textured on a sphere, a slightly different one for each eye. But that can all change with depth maps.
Deep into Depth Maps
Depth maps are images that use shades of grey to represent distances from the camera. Black areas represent the farthest points, and white areas are the closest.
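As a rough sketch of how that greyscale encoding might be decoded (the near/far range and the linear mapping here are assumptions for illustration; real camera software defines its own scale, often non-linearly):

```python
def grey_to_depth(grey, near=0.5, far=100.0):
    """Map a normalized grey value (0.0-1.0) to a distance.

    Per the convention above: white (1.0) is the closest point (near)
    and black (0.0) is the farthest (far). The near/far range here is
    an assumed example, not any specific camera's specification.
    """
    return far - grey * (far - near)

print(grey_to_depth(1.0))  # pure white -> 0.5 (closest)
print(grey_to_depth(0.0))  # pure black -> 100.0 (farthest)
```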
A 6DoF image with equirectangular color image on top and depth map on bottom.
Some high-end VR camera systems such as Google Jump, Kandao Obsidian, and Nokia Ozo already export depth maps out-of-the-box. We shot our second project on a Google Jump camera, and when I saw the depth maps it was capable of exporting, I was immediately curious about how they could be used. I had become quite familiar with the Unity game engine, having used it for both of our previous projects, and so I started to investigate what is possible with these depth maps in Unity. After a lot of experimentation, I was able to write a custom shader that displaces each vertex of the sphere on which the video was textured by the shade of grey recorded in the depth map.
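The per-vertex math behind that displacement is simple: each vertex sits on a unit sphere, and you push it out along its own direction by the distance sampled from the depth map. Here is a CPU-side Python sketch of the idea (the actual shader runs on the GPU in Unity; the coordinate conventions here are assumed for illustration, not the shipped code):

```python
import math

def displace_vertex(theta, phi, depth):
    """Push a unit-sphere vertex out along its own direction.

    theta: longitude in radians (0..2*pi); phi: latitude (-pi/2..pi/2).
    depth: the distance sampled from the depth map at the matching
    equirectangular coordinate. Returns the displaced (x, y, z).
    """
    # Unit direction from the sphere's center for this lat/long.
    x = math.cos(phi) * math.sin(theta)
    y = math.sin(phi)
    z = math.cos(phi) * math.cos(theta)
    # Scale the unit direction by the sampled depth.
    return (x * depth, y * depth, z * depth)
```

A vertex straight ahead (`theta = 0, phi = 0`) with a sampled depth of 5 ends up at `(0, 0, 5)`; the whole sphere deforms into a rough 3D relief of the scene.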
Moving around before and after displacing the sphere using its depth map.
And this just works! The program takes video information and, in real time, turns it into a textured 3-dimensional mesh that you can walk up to and look around. Of course, it’s not perfect. All of the distances depend on the accuracy of the depth map, which is in turn derived by comparing all of the raw captured video images using complex computer vision algorithms. These algorithms have trouble with reflections and repeating patterns, so the depths are not always accurate. But it is early days, and they’ll improve with time.
Some 6DoF techniques do not involve 360 video/photo capture and depth maps, and instead use many cameras pointed inward at a single subject (outside-in). This method captures the subject from all angles and uses photogrammetry to process it into a 3D object. This technique has the advantage of letting the viewer walk around the subject and examine it from all sides. But it does not capture the environment, has trouble with complex or multiple subjects, and requires a large setup and lots of processing.
Left: Microsoft Mixed Reality Capture is an outside-in 6DoF capture solution that requires numerous cameras in many positions.
Right: A Google Jump camera on the set of our music video. It captures video from a single camera position.
By contrast, depth map 6DoF is all captured from a single position (inside-out). Using depth maps has the advantage of being able to capture an entire space with a single 360 camera setup. However, because of this set-up, no information is captured “behind” objects. If, for example, a car drives through your scene, the camera will not be able to capture anything that is behind the car. This can cause some major distortions in the 3D mesh, which are more noticeable the further you move away from the origin point.
A car drives by, and the sphere is displaced. Notice that no information is recorded behind the car.
This is why demos such as Adobe’s Project Sidewinder and Facebook’s Surround360 6DoF demo limit the viewer to relatively small movements: the further you move away from the origin, the more distorted the picture gets. These distortions may be reduced or even eliminated by future capture and processing techniques, but for now they’re unavoidable.
Point Cloud City
Point clouds are pretty much exactly what they sound like: a big mass of points in 3D space. Each point has a position and a color, and it is not uncommon to be dealing with hundreds of thousands or even millions of points.
So how can we go from a photo or video to a point cloud? Well, if you take an equirectangular 360 picture that has a resolution of 2880x1440, then what you have is a list of just over 4 million pixels, or colors, since each pixel represents one color (Here’s the math: 2880*1440 = 4,147,200). If you then take an equirectangular 360 depth map of the same resolution, what you have is a list of 4 million distances from a single point in space. A little math later, and you’ve got 4 million colors and 4 million corresponding positions in space, which is perfect for being represented as a point cloud.
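That conversion can be sketched in a few lines of Python. The spherical coordinate conventions below are assumed for illustration, and `depths` holds distances already decoded from the depth map's grey values:

```python
import math

def equirect_to_points(colors, depths, width, height):
    """Build a point cloud from matching equirectangular color and
    depth images, each given as a row-major list of per-pixel values.

    colors: list of (r, g, b) tuples; depths: list of distances.
    Returns a list of ((x, y, z), (r, g, b)) pairs.
    """
    points = []
    for row in range(height):
        for col in range(width):
            i = row * width + col
            # Map pixel coordinates to spherical angles.
            theta = (col / width) * 2.0 * math.pi   # longitude
            phi = (0.5 - row / height) * math.pi    # latitude (top = up)
            d = depths[i]
            # Position = unit direction scaled by distance.
            x = math.cos(phi) * math.sin(theta) * d
            y = math.sin(phi) * d
            z = math.cos(phi) * math.cos(theta) * d
            points.append(((x, y, z), colors[i]))
    return points

# A 2880x1440 image yields 2880 * 1440 = 4,147,200 points.
```

In practice, pure-Python loops over four million pixels would be slow; a real implementation would vectorize this with something like NumPy or do it on the GPU, but the math is the same.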
Adjusting point size and moving around a point cloud.
Point clouds don’t have the same distortion issues that the displacement technique does, but they replace those distortions with empty gaps. Point clouds are fun to play with, but they don’t really offer many advantages over the displacement technique right now. In the future, however, point clouds could be the path towards some more interesting 6DoF techniques.
You may be wondering, with all these distortions and empty gaps, how is 6DoF any better than 3DoF? When watching a stereo 360 video in 3DoF, you have to remain sitting upright, unable to lean forward or to the side. If you do, the whole virtual world will move with you, which can lead to nausea. Even tilting your head sideways a little can break the stereo effect and cause you to feel cross-eyed. But in 6DoF, you are free to lean and move and tilt. You can sit and move naturally while maintaining full immersion with minimal distortion (within a limited range).
If you want to experience 6DoF right now, I have just released Pseudoscience 6DoF Viewer, an app that lets you view and experiment with depth-based 6DoF photos and videos. The app features displacement and point cloud modes, real-time lighting effects, and much more. Pseudoscience 6DoF Viewer is free and available for Windows Mixed Reality and HTC Vive, and will be coming soon to Oculus Rift. Non-VR desktop versions are also available for Mac and PC. And check out r/6DoF on Reddit, where you can find and share 6DoF content as well as keep updated about changes in the medium.
Note: This writeup was created when we originally released The Presence, but was never published. Much of the information is outdated, but it is being posted here for posterity.
VR is dead. Long live VR.
Anytime I see the words "virtual reality," I'm immediately taken back to my childhood and the promise of early-90s technology. I remember the advertisements for Nintendo’s Virtual Boy: just glimpsing inside the futuristic-looking visor would transport you into the digital world. VR was not just sold as a gaming device; it was in books, television shows, magazines, and movies. It was the inevitable medium of the future.
What a letdown that vision ended up being; the human imagination was far greater than the technology would allow. It felt like we had collectively decided VR was never going to be a real thing, and time marched on.
I hadn’t really thought much about VR in the 20 years that followed; it never came up in film school or in any facet of my work within the film industry. That changed when I was given an early version of the Gear VR to try. It was very much a work-in-progress, but still, the possibilities were apparent.
Nonetheless, VR would probably have remained just a curiosity for me had I not happened to mention the experience to my friend and future Pseudoscience co-founder Josh Gladstone. Unbeknownst to me, Josh had been a lifelong fan of virtual reality; as a young kid he got to experience Disney’s early foray into VR, and it had made a tremendous impact on him. So when I told him what Oculus and Samsung had developed, a light went on. In short order, Josh had done a thorough study of the VR landscape, and it seemed to him that no one had quite cracked how to tell stories effectively. Thus began a long conversation about the merits and pitfalls of storytelling in this new medium.
Curiosity becomes obsession.
I liken those early discussions and discoveries to what the Lumière Brothers, Eisenstein, and Kuleshov must have gone through: the notion that the moving picture could be more than just a train running through a theater, that the juxtaposition of images could create a feeling, a real emotion, and that those emotions layered within a plot could create an immersive story that transcends the technology. What we saw in those early VR demos was only light projected on a wall – but if films could grow to be more than spectacle, couldn’t there be more to virtual reality than just a 360° image? Couldn’t VR be an entirely new language of visual storytelling?
Most VR content we found seemed to just apply traditional 2-dimensional editing to 360° films. We even sat in on a 'VR discussion' for working film editors here in Los Angeles, but to our disappointment it was more of the same: treat the footage like every other piece of media. It seemed to us that no one was even considering what it meant for a viewer to have 360 degrees of information. Josh and I felt like we were the only ones seeing past the fourth wall; either that, or maybe we just didn't get it.
The kernel of our thinking began with the idea that in a traditional film the director knows where the audience is going to be looking, but in 360° a person is not only able to, but invited to, explore the whole space. So rather than forcing that person’s attention away with an unanticipated cut, why not use their curiosity to let the story unfold on its own? We followed that logic to its natural end and realized there might be too much variation, too many possibilities of what someone could be looking at – it’d be impossible! But then again, what could be the harm in trying?
Testing the waters.
In looking at the broader market (in mid-2015), it seemed to us that there were only a handful of options for 360° cameras, and that given our budget, nothing would look particularly impressive. Which is why we had initially settled on the Ricoh Theta S as our camera. The consumer pricing met our budget, and as far as we could tell, the resolution, 1920 x 1080, seemed reasonable (we would be wrong, but more on that later).
We started to shoot very basic lighting tests, and Josh, who had a 3D printer, began experimenting with various camera rigs that would allow us to shoot in 360° without having to worry about removing the tripod. Our mindset was "this is a test, let's make it easy on ourselves."
Josh had proven himself the better programmer, having already experimented with coding for Arduinos (mostly creating camera shutters for old cameras). We decided the easiest way to incorporate interactivity would be inside of a game engine. We quickly overcame the first of many hurdles by finding a plugin for Unity, Easy Movie Texture, that allows videos to play on spherical objects on any mobile device. Josh then built a platform that could run on the Gear VR, and we began to build the game from there.
We wanted to tell a simple story that would allow us to showcase multiple perspectives and angles. We landed upon a horror concept that seemed to fit the bill. We knew we wanted a small cast and crew, both for budgetary reasons and also because we didn’t know if this whole thing was going to be a disaster, and frankly, the fewer people involved, the better.
So when it came time for casting, I knew I wanted to work with people we would be comfortable "failing in front of." Which is why we were so lucky to get the cast that we did.
At the end of the day, our four-person crew consisted of Josh and myself – alternating duties on direction, camera, sound, grip, and gaffing – as well as a very crucial makeup/SFX artist and a last-minute PA.
We brought all of our cast and crew in for individual sit-downs, not just to talk about the complexities of the shoot, but also to show them our demos, so that everyone who participated would understand what we were trying to accomplish. And though there was some hesitation, the overall enthusiasm was enough to paint the picture.
Two weeks before the shoot, Josh and I made a last-minute decision to drop our original camera choice (the Ricoh Theta S) and instead use two back-to-back Kodak Pixpro360 4Ks, which had just come out. The main reason was the resolution increase (3840 x 1920), but with the improved resolution came a number of unforeseen problems. Josh had to quickly create a new hanging rig to accommodate the added weight of the new cameras.
Quiet on set.
Because we only had one camera rig, the basic methodology for our two-night shoot was that we’d place the rig in the center of a room, have our actors perform the entire movie, and then reset the camera elsewhere and have our actors perform the whole movie again, and again, and again, and again. We knew this was going to be unusual, but we hadn’t anticipated all the other problems that would come about...
Any shoot, large or small, is going to have unanticipated setbacks that you just have to roll with – but shooting in 360° makes everything so much more complicated. First and foremost, you can’t have lights, cables, or crew visible. So how do you go about trying to film something with any sort of production value? Well, we tried our best. For the most part we adopted a Dogme 95 style, using only practical/natural lights or the occasional well-hidden 1K. With no boom mic or sound crew, we mic'd all of our actors up and hid additional lavs where we could; this allowed us to capture enough audio to create a compelling mix in post.
We ran into another major issue after we shot our first scene, the “elevator” scene. When we downloaded the footage, we noticed a weird green glow around our actors. After some quick analysis, we discovered that despite all of our prior camera tests, we hadn’t really tested how the Pixpro’s sensor would react to low lighting. To our great dismay, the green artifact was there no matter what we did. So we made the decision to continue shooting and hope that we might be able to minimize the visual artifact in post.
As we moved along, directing proved to be more difficult than we had anticipated – the system didn’t allow us to preview the footage until we downloaded it later. We had discussed hiding a second camera in each room so that we could see the performances without physically being in the space – but our hastily made solution ran into technical problems in our third scene, and we were forced to abandon it as we fell behind schedule. This meant we had to ask a lot of our actors, who were already outside of their comfort zone (performing to a vague voyeur hanging from a stick on the ceiling), asking them to also be our eyes in case something happened. Thankfully everyone pulled together and the footage turned out great. For this, Josh and I owe a great debt of gratitude to the professionalism of our actors.
My favorite shot in the movie is the scene in the bathroom where one of the characters looks in the mirror to discover blood pouring from his head. There are a couple of technical things going on here that took a while to figure out. The first issue was a practical one – unlike the other rooms, space in the bathroom is limited, and there’s also a mirror. So we were forced to paint out the camera – what we discovered is that we could reuse one camera's footage, warp it, and cut it into the shape of the mirror to make it look seamless. The other issue was a classic film-school problem: how to get blood to pour from your actor's head. Initially our makeup/SFX artist came up with a ‘gravity well’ hidden beneath his beanie – but it didn't work. So the backup solution involved hiding someone in the bathroom with a pump, activated by hand. But don't forget – tiny space and a camera that sees everything – so our solution was to paint her and the tube out with VFX.
Another problem was the constant resetting of props and makeup. Throughout the course of the movie wine is poured into four glasses, lighting effects are triggered, two people are covered in blood, and a third is thrown against a greenscreen. Trying to replicate these actions on every camera setup proved taxing, so some scenes were broken up into two parts (but only in rooms where no actors were visible). All in all, we were able to shoot about 80% of the “active” version on day 1, and the remaining 20%, as well as all the closeup shots utilized in the “reactive” version, were completed on day 2. So around sunrise on the third day, we successfully wrapped and sent our actors home. Josh and I then began the arduous task of post-production and discovered a horrible new word that would become the bane of our existence: stitching.
Post-production from hell.
Josh and I both have backgrounds in editing, so initially we weren't too concerned about our post-production schedule. But stitching wasn’t a process we were very familiar with – in our earlier tests we shot only on the Ricoh, and one of the nice things about that camera is that it doesn’t require additional stitching beyond what its own software exports. The Pixpros, however, don’t work as seamlessly. They do provide software that does a rough stitch, but we found the stitch line to be obnoxiously noticeable, so we decided to stitch ourselves.
The workflow we ended up with was a hodgepodge of different bits of software that took a tremendous amount of time in the long run. First we’d use Kodak’s proprietary program to export an unstitched equirectangular MP4 video from each camera. We’d then convert that video to ProRes and bring it into After Effects. In After Effects we’d painstakingly mesh, distort, and feather the two images until the stitch lines disappeared, then we’d leave it to render. Each scene was at least six minutes long, and at 4K resolution our tiny MacBooks took ages to render. Once the footage was rendered, we’d bring the files into Final Cut Pro for coloring. Josh created a simple program in Unity we called Juxtoposition that would pull screengrabs and display them in a sphere, which allowed us to see the final look and color of each scene in 360°.
Once all this was done, we were able to do some of the more traditional aspects of post – dialogue editing, scoring, mixing. We sent rough cuts to our composer, and he delivered a beautifully haunting score that we then broke apart and incorporated into the various shots and game elements, so they would activate when the story merited them. We also brought other dialogue and sound effects directly into Unity so we could use spatial sound: depending on where you’re looking, you’ll hear the dialogue, sound effects, and score differently.
All this just got us to the finish line of the “Active” version; the “Reactive” version required an additional marathon effort from Josh. Unlike the “Active” version, which only required the videos to be synced together, the “Reactive” version required windows of time for each video to engage the viewer before jumping to another perspective. In order to do this, we had to anticipate every possible experience someone could encounter. It was an editor’s nightmare: cutting and recutting based on momentary decisions, where every version had to exist and mix with every other. And because we were building this program from the ground up, it was all done in very dense code – it's like editing in five dimensions, blind. I think the end result is impressive, but it's all to Josh's credit, as it was a behemoth to accomplish.
Getting it to you.
When we set out to create this experiment, we hadn’t really thought about how we’d distribute it to other people. To remind you: we were filmmakers, not app developers. We knew there were a few video distribution apps like Samsung VR (formerly MilkVR), YouTube, and even Facebook. But none of these platforms could really support the interactive element of The Presence. So the only other choice was to publish it ourselves.
Considering we had little to no programming experience, the idea of now learning how to distribute an app seemed like a totally different ballgame. But by this point we had already accomplished so much; what was one more hoop to jump through? We targeted three stores: the Oculus Store on the Gear VR, Apple’s App Store, and Google’s Play Store. Each store presented different problems, but the main one they all shared was that none could host our video files.
In their final state, the video files that made up the six-minute short film came in at just under 10 GB of data. That means every time someone wanted to download the app, they'd need 10 gigs of space, and some server would need to provide 10 gigs of bandwidth. In order to make an app that people could download in a reasonable amount of time, we had to find a new compression method as well as acquire a reliable hosting server. After a lot of trial and error, we ended up using the HEVC (H.265) codec for Android devices and Apple’s M4V encoding for the App Store. This meant a substantial loss of quality, but there was no alternative. Hosting was another issue; after failing to find a partner to help host the files, we decided to pay for server space ourselves. It’s a strange thing to invest so much time and money in something you’re giving away for free, but there’s no infrastructure yet, and it was the only solution we saw.
The final steps of app development are their own story, but suffice it to say: if production was a marathon and post-production an ultramarathon, then the app approval process was like doing both, backwards, on one foot, uphill through the snow.
What does this all mean? Only time will tell. It’s way too early to know what people will make of our little experiment. Hopefully we can take what we learned on this project and challenge ourselves to dream bigger on the next one. But for Josh and myself, I can say it was well worth the experience. Once you truly embrace the idea of a fully interactive and immersive world, the rules for storytelling get rewritten, and there's something thrilling about that shift. Don’t get me wrong, we still love movies. But if we’re right and VR isn’t just a passing gimmick, then it really is a new world out there, with unique and compelling stories to be told, and frankly we’re excited to be on the forefront of this new thing – whatever it’ll end up being.
1 I use terms like VR, 360°, 3D, 2D, and immersive filmmaking somewhat interchangeably. I'm well aware that people can be quite vocal about their specific definitions of what is and is not virtual reality. For the sake of diplomacy, here's where I stand: I do believe that 3D 360° live-action is much more immersive than its 2D counterpart, though I would argue neither is truly VR, because they aren't interactive. For this reason, I believe that our experiment, while not stereoscopic, is much more immersive than other content being labelled as VR. Sure, we'd like the opportunity to shoot an interactive film using stereoscopic cameras, but until then I'll happily defend The Presence as a fully immersive VR experience. Though if we really want to get picky, only CG-rendered games, where the user can truly interact in a sandbox environment, are probably the closest thing to true 'virtual reality.'
2 All of our crew members' contributions were so great that I doubt we would have finished shooting without their involvement, and they certainly deserve as much credit as we can give them.
3 Each video is around six minutes, and the project has 24 of them – so truthfully, our six-minute film is closer in file size to a two-and-a-half-hour 4K film.
4 An unfortunate anecdote about the server: when we soft-launched on Wednesday, June 29th, 2016, the server held up fine and we started to get some positive reviews. We went to sleep feeling good about all our hard work. When we awoke on Thursday, our actual launch day, we discovered a series of damning reviews: "doesn't even work," "black screen wtf." It turns out our server had crashed overnight and simply couldn't handle the demand. We had purchased a low-end virtual server, designed as a low-cost solution for long-term hosting; we never expected such huge demand overnight. We were able to get the server back up, but to our shock and surprise, the number of downloads continued to exceed anything we were prepared for – and furthermore, our server was charging us per GB. Some crude math told me that we were literally going to be paying hundreds of dollars a day to share this project for free. I quickly had to find a new solution, and ended up paying upfront for a more robust machine with huge bandwidth, as well as adding a CDN – all things a developer would likely have known to do, but we had to learn the hard way. Our early reviews and overall rating reflect people's vitriol when something doesn't immediately work. C'est la vie.