PC Pro

One of Microsoft's key Mixed Reality partners reveals how it creates professional videos of the HoloLens experience, complete with holograms.

edd@fracturereality.io

Mixed reality (MR) is a new platform that's emerged from the fields of augmented and virtual reality. MR blends the physical and digital worlds into a single space, using a combination of cutting-edge optical hardware and AI software. This powerful combination makes it just as suited to enterprises as it is to consumers, and offers a new way for people to interact with an increasingly data-rich world. The device leading the current wave in the MR revolution is the Microsoft HoloLens.

Mixed reality is a compelling experience for those using it, but onlookers simply see people gazing into thin air. To counter this, the HoloLens has built-in functionality for creating video recordings from the user's perspective, called mixed reality capture (MRC). However, the quality from the HoloLens' forward-facing camera doesn't capture the magic of a mixed reality experience, where holograms fill the room around the user. MRC also has the downside of being computationally expensive, stealing performance from the application itself. The result is an imperfect experience, both for the active and passive viewer.

You may then wonder how Microsoft produces its great demo videos, and the answer is a custom setup based on a Red Dragon camera, costing tens of thousands of dollars. Microsoft also offers a more affordable "spectator view" add-on as a way of producing high-quality videos, but this is limited to static shots only. Within weeks of getting our first few HoloLens units in early 2016, we identified this as a problem we'd like to solve, both to share our own dynamic MR content and to help our clients and early adopters of this ground-breaking new technology.

Refining our solution

At Fracture, our solution has been refined and improved over time, but the basic principle remains the same. At the core is a Unity framework developed for recording a HoloLens session on the device itself.

Think of it as a replay system. During a filming session, we record all the relevant information about what the user is doing, what objects have been placed where, which buttons have been tapped. This data is then saved to a file that can later be imported into the Unity editor. This allows us to play back everything the user did and render it out, from the perspective of the real-world camera, and composite it on top of the footage.
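To make the idea concrete, here's a minimal sketch of the sort of recorder this implies. It isn't our production framework: the ReplayEvent, ReplayLog and ReplayRecorder names, and the JSON-on-device format, are illustrative assumptions about how a timestamped event log could be captured and saved on the HoloLens.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using UnityEngine;

// Illustrative only: one possible shape for a timestamped replay log.
[Serializable]
public class ReplayEvent
{
    public float time;        // seconds since the take started
    public string objectId;   // which tagged object the event refers to
    public string action;     // e.g. "placed", "tapped", "moved"
    public Vector3 position;
    public Quaternion rotation;
}

[Serializable]
public class ReplayLog
{
    public List<ReplayEvent> events = new List<ReplayEvent>();
}

public class ReplayRecorder : MonoBehaviour
{
    private readonly ReplayLog log = new ReplayLog();
    private float startTime;

    public void BeginTake()
    {
        startTime = Time.time;
        log.events.Clear();
    }

    public void Record(string objectId, string action, Transform t)
    {
        log.events.Add(new ReplayEvent {
            time = Time.time - startTime,
            objectId = objectId,
            action = action,
            position = t.position,
            rotation = t.rotation
        });
    }

    // Save the take to the device so it can later be imported into the Unity editor.
    public void EndTake(string takeName)
    {
        string path = Path.Combine(Application.persistentDataPath, takeName + ".json");
        File.WriteAllText(path, JsonUtility.ToJson(log));
    }
}
```

Keeping the data as plain timestamped events is what makes deterministic playback in the editor possible later on.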

The key is knowing what the user has done and when. Our early implementations streamed the data live over the network to the Unity editor and rendered it out in real-time, but this required bespoke networking code for any app we wished to capture – even if it was only a single-user experience. The much simpler and more robust solution was simply to record the data locally onto the HoloLens, which removed much complexity during development and throughout the shoot.

The process of preparing a scene for recording involves tagging up only the objects that respond to user input, so the system knows what to record. This lightweight approach keeps memory and performance overheads low and means that we can prepare a scene – depending on complexity – in less than a day. That may sound like a long time, but trust me, it's a far quicker approach than fully networking a HoloLens app!
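In practice, the tagging can be as simple as a small marker component. The sketch below builds on the hypothetical ReplayRecorder above; ReplayTag and its fields are illustrative names, not the actual components we use.

```csharp
using UnityEngine;

// Illustrative marker: attach to the handful of objects that respond to
// user input so the recorder knows to sample them. Untagged scenery is
// ignored, keeping memory and per-frame overhead low.
public class ReplayTag : MonoBehaviour
{
    public string objectId;                 // stable ID used in the replay log
    public bool recordTransformEveryFrame;  // e.g. objects the user can drag around

    private ReplayRecorder recorder;

    void Start()
    {
        recorder = FindObjectOfType<ReplayRecorder>();
        if (string.IsNullOrEmpty(objectId)) objectId = gameObject.name;
    }

    void LateUpdate()
    {
        if (recordTransformEveryFrame && recorder != null)
            recorder.Record(objectId, "moved", transform);
    }

    // Called from the app's existing tap handler.
    public void OnTapped()
    {
        if (recorder != null) recorder.Record(objectId, "tapped", transform);
    }
}
```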

With the app prepared, we're ready to start filming. Starting and stopping the HoloLens session recording on location is as simple as issuing a voice command for each take. In addition to the replay data, we also save out the spatial mesh of the room that the HoloLens creates using its spatial mapping technology. This allows us to have a 3D representation of the real-world filming location in our video production package.
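Unity's KeywordRecognizer makes this kind of hands-free control straightforward on HoloLens. The snippet below is a simplified illustration wired to the hypothetical ReplayRecorder from earlier; the exact phrases and take-naming scheme are assumptions, not our actual setup.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Illustrative: start and stop a take by voice so no one has to touch the
// device mid-shoot.
public class TakeVoiceControl : MonoBehaviour
{
    public ReplayRecorder recorder;
    private KeywordRecognizer keywords;
    private int takeNumber;

    void Start()
    {
        keywords = new KeywordRecognizer(new[] { "start recording", "stop recording" });
        keywords.OnPhraseRecognized += OnPhrase;
        keywords.Start();
    }

    void OnPhrase(PhraseRecognizedEventArgs args)
    {
        if (args.text == "start recording")
        {
            takeNumber++;
            recorder.BeginTake();
        }
        else if (args.text == "stop recording")
        {
            recorder.EndTake("take_" + takeNumber);
        }
    }

    void OnDestroy()
    {
        if (keywords != null) keywords.Dispose();
    }
}
```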

Filming a HoloLens session

The filming itself is straightforward, but there are a few things we do to ensure things run smoothly. We use a GoPro camera, since it has a nice wide-angle lens that allows us to keep both the holograms and the user in shot – without needing to be too far away. It also allows us to film in smaller locations when necessary.

We don't want to do too much rotoscoping (a movie-production technique that involves manually masking out sections of video layers), so we try to avoid the user moving in front of the content. We like to use quite minimalist spaces as backdrops, so there's less noise behind the holograms. This can make it difficult to track the camera later, so we'll often add tracking markers to the space.

“MR is a compelling experience for those using it, but onlookers simply see people gazing into thin air”

The Fracture team uses camera tracking to extract the camera's movement from feature points in the image, so that computer-generated content can be composited into it. The key part of this process is to ensure we have real-world measurements of an object in the shot, such as a table or a doorway. This ensures we produce a camera track at the same scale as the real world. Using 3DEqualizer, the motion of the camera is then exported as a computer animation file, along with a proxy of the known object from the shot, which can then be imported into Unity.

Rendering out from Unity

With our HoloLens session recording and our real-world camera track, we can start to render out the footage from Unity. There are three key aspects here worth highlighting.

First is the matching up of the HoloLens and imported camera coordinate systems. We align the known-object proxy from 3DEqualizer with the spatial mesh saved during filming. This gives us a rock-solid match.
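In rough terms, the alignment boils down to finding the rigid transform that carries the proxy onto the same object as it sits against the spatial mesh, and applying it to everything imported with the camera track. The sketch below is illustrative only; trackSpaceRoot, proxyInTrack and objectInSession are hypothetical names.

```csharp
using UnityEngine;

// Illustrative: line up the camera-track coordinate system with the HoloLens
// session by matching the known-object proxy (from 3DEqualizer) to the same
// object's pose against the saved spatial mesh.
public class CoordinateAligner : MonoBehaviour
{
    public Transform trackSpaceRoot;   // parent of the imported camera and proxy
    public Transform proxyInTrack;     // e.g. the table proxy exported with the track
    public Transform objectInSession;  // the same table as located against the spatial mesh

    public void Align()
    {
        // Rotation that carries the proxy's orientation onto the session object's.
        Quaternion deltaRot = objectInSession.rotation * Quaternion.Inverse(proxyInTrack.rotation);
        trackSpaceRoot.rotation = deltaRot * trackSpaceRoot.rotation;

        // The child transforms have updated, so translate the (rotated) proxy
        // onto the session object to finish the match.
        Vector3 deltaPos = objectInSession.position - proxyInTrack.position;
        trackSpaceRoot.position += deltaPos;
    }
}
```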

Second is the lens distortion. The GoPro has a distinctive look, and part of this look is created by lens distortion. The problem is that Unity doesn't do distorted rendering out of the box, and even more of a problem is that the GoPro distortion isn't your standard fish-eye or barrel distortion; it's rather complex.

Fortunately, if you provide 3DEqualizer with an image sequence and choose the right lens profile, it can apply the lens distortion for you. However, a distorted lens captures more than just a rectangular-shaped window onto the world. The "undistortion" process unwraps the original, making the checkerboard lines straight. In doing so, the image is stretched outside the original image resolution. This is called "overscan".

To do the reverse (distort our renders), we need to have Unity render out with overscan. This is achieved by increasing both the resolution and the field of view being rendered.
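As a minimal illustration of what that means in Unity terms, the helper below widens a camera's vertical field of view and allocates a correspondingly larger render target. It assumes a single uniform overscan factor; a real lens profile may need different padding per axis.

```csharp
using UnityEngine;

// Illustrative overscan helper: render a larger frame than the delivery
// resolution so the distortion pass has pixels to pull in from the edges.
public static class Overscan
{
    // Vertical FOV (degrees) that covers 'factor' times the original frame,
    // e.g. factor = 1.1f for 10% overscan.
    public static float ExpandFov(float verticalFovDegrees, float factor)
    {
        float halfRad = 0.5f * verticalFovDegrees * Mathf.Deg2Rad;
        return 2f * Mathf.Atan(factor * Mathf.Tan(halfRad)) * Mathf.Rad2Deg;
    }

    // Enlarged render target at the same pixel density as the base frame.
    public static RenderTexture CreateTarget(int baseWidth, int baseHeight, float factor)
    {
        int w = Mathf.RoundToInt(baseWidth * factor);
        int h = Mathf.RoundToInt(baseHeight * factor);
        return new RenderTexture(w, h, 24); // 24-bit depth buffer
    }

    public static void Apply(Camera cam, int baseWidth, int baseHeight, float factor)
    {
        cam.fieldOfView = ExpandFov(cam.fieldOfView, factor);
        cam.targetTexture = CreateTarget(baseWidth, baseHeight, factor);
    }
}
```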

3DEqualizer does a good job of all this, but depending on shot length, it could take up to 30 minutes to process for each shot. To speed this up, we now apply the lens distortion at the time of rendering in Unity. Rather than implementing the complex formulas 3DEqualizer uses to create this distortion, we wrote an integration pipeline between 3DEqualizer and Unity that exports the lens distortion profile as a warped mesh that Unity can then use to render out the images with the correct distortion. The benefit of this is that if we use a different camera/lens, which requires a different formula to produce the distortion, all we need to do is export a new lens profile from 3DEqualizer.
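One simple way to picture the warped-mesh approach: a second camera sees nothing but a grid whose UVs encode the lens profile, and that grid samples the overscanned render, reproducing the GoPro distortion in the final frame. The component below is a rough sketch of that wiring; the warp-grid export from 3DEqualizer itself isn't shown, and all the names are illustrative.

```csharp
using UnityEngine;

// Illustrative wiring for a warped-mesh distortion pass. The warp grid is a
// subdivided quad whose UVs were displaced according to the lens profile
// exported from 3DEqualizer (export not shown). A separate output camera
// renders only this grid, so the distortion is baked into every frame.
public class LensDistortionPass : MonoBehaviour
{
    public Camera sceneCamera;           // renders the holograms with overscan
    public Renderer warpGrid;            // quad grid carrying the warped UVs
    public int outputWidth = 1920;
    public int outputHeight = 1080;
    public float overscanFactor = 1.2f;  // assumed padding; see the overscan sketch above

    private RenderTexture overscanRT;

    void Start()
    {
        // The scene camera draws into an enlarged texture...
        overscanRT = new RenderTexture(
            Mathf.RoundToInt(outputWidth * overscanFactor),
            Mathf.RoundToInt(outputHeight * overscanFactor), 24);
        sceneCamera.targetTexture = overscanRT;

        // ...and the warp grid samples that texture through its distorted UVs,
        // so whatever the output camera captures is already lens-distorted.
        warpGrid.material.mainTexture = overscanRT;
    }

    void OnDestroy()
    {
        if (overscanRT != null) overscanRT.Release();
    }
}
```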

The third challenge is render speed. In early versions of this system, we were rendering out 3,450 x 1,992 images at 60fps as PNG files. We used PNG so that we had the alpha channel and some compression for more manageable file sizes. But the Unity implementation of PNG encoding is slow. Render times were 30 minutes just for a 30-second shot, even on a high-spec PC.
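For context, this is roughly what a naive per-frame PNG capture looks like in Unity (a generic sketch, not our actual code): ReadPixels and EncodeToPNG keep the alpha channel, but both run on the main thread and the managed PNG encoder is the bottleneck.

```csharp
using System.Collections;
using System.IO;
using UnityEngine;

// Generic sketch of per-frame PNG capture, the kind of approach that leads
// to the 30-minutes-per-shot render times described above.
public class SlowPngCapture : MonoBehaviour
{
    public string outputFolder = "Frames";  // hypothetical output path
    private int frameIndex;

    IEnumerator Start()
    {
        Directory.CreateDirectory(outputFolder);
        var frame = new Texture2D(Screen.width, Screen.height, TextureFormat.RGBA32, false);

        while (true)
        {
            // Wait until the frame has finished rendering before reading it back.
            yield return new WaitForEndOfFrame();
            frame.ReadPixels(new Rect(0, 0, Screen.width, Screen.height), 0, 0);
            frame.Apply();

            // EncodeToPNG preserves alpha and compresses, but it is slow in
            // managed code and blocks the main thread every frame.
            byte[] png = frame.EncodeToPNG();
            File.WriteAllBytes(
                Path.Combine(outputFolder, "frame_" + frameIndex.ToString("D5") + ".png"), png);
            frameIndex++;
        }
    }
}
```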

We looked at a few solutions on the asset store, but only one ticked all the boxes: AVPro Movie Capture by RenderHeads. We use a number of RenderHeads plugins and they're all superbly fast native implementations – and this one is no exception. It will render out 1080p at 60fps using a lossless codec supporting alpha in near-real-time. So by combining both the lens distortion and the rendering improvements, we've taken a processing time of 30 minutes per 30-second shot down to about 30 seconds. The saved time allows us to iterate more, which ultimately results in higher-quality output.

Final thoughts

The uptake of MR has begun and we're already creating applications for forward-looking enterprises. By helping them understand the benefits, efficiencies and new, more powerful ways of collaborating that MR offers, we're giving those early adopters a significant head start over competitors. Sharing these experiences in an accessible format such as video is a key piece of the puzzle in evangelising the value of both individual applications and the medium itself.

Edd is head of creative tech at Fracture Reality, which creates AR and MR apps for organisations that want to exploit HoloLens. edd@fracturereality.io

RIGHT All relevant information, including what the user is doing and what objects have been placed where, is captured during a filming session

BELOW Real-world measurements of an object such as a table in the shot are necessary to ensure the scale is correct

ABOVE The tech is now fast enough to stream live over the network in real-time – handy for demos!
