Wednesday, November 13, 2019
Title: Challenges and Practices of Video Technology in the Age of AI
Speaker: Dr. Xian-Sheng Hua
Session chair: André Kaup
Abstract: With the rapid progress of machine learning, big data, and cloud computing technologies, artificial intelligence has become an integral part of almost all video applications. Digital entertainment, e-commerce, video conferencing, video surveillance, AR and VR, self-driving cars, etc., all rely on deep learning technologies to process the video signals they receive, for tasks such as indexing, search, object recognition, and segmentation. Meanwhile, advances in video coding technologies have mainly focused on pushing video coding efficiency in order to reduce operating costs such as bandwidth and storage. As we look into the future, should video coding technologies consider a wider range of features when a new video codec is designed, so that it becomes easier to integrate with machine learning technologies? In addition, as conventional video codec design has relied on metrics such as PSNR, SSIM, and MOS scores from subjective tests for video quality measurement, should additional metrics be designed for video applications that serve machines rather than humans? This talk will provide an overview of the AI technologies commonly used in a range of video applications, review the challenges they face, and put forward some suggested future directions for video coding.
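As a concrete illustration of the quality metrics the abstract mentions, a minimal PSNR computation for 8-bit images might look like the sketch below (the toy image values are invented for the example; real evaluations would use actual reference/decoded frame pairs):

```python
import numpy as np

def psnr(ref: np.ndarray, dist: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a reference and a distorted image."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no distortion
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example: a flat gray image vs. the same image shifted by 16 gray levels.
ref = np.full((64, 64), 128, dtype=np.uint8)
dist = np.full((64, 64), 144, dtype=np.uint8)
```

For a uniform error of 16 levels the MSE is 256, giving roughly 24 dB; metrics like SSIM and learned perceptual measures aim to correlate better with MOS scores than this purely pixel-wise quantity.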
Xian-Sheng Hua is now a VP/Distinguished Engineer of Alibaba Group, leading the Artificial Intelligence Center of Alibaba Cloud and the City Brain Lab of DAMO Academy. Dr. Hua is an IEEE Fellow and ACM Distinguished Scientist. He received his B.S. degree in 1996 and his Ph.D. degree in applied mathematics in 2001, both from Peking University. He joined Microsoft Research Asia in 2001 as a Researcher. He was a Principal Research and Development Lead in Multimedia Search for the Microsoft search engine, Bing, in the USA, from 2011 to 2013. He was a Senior Researcher with Microsoft Research Redmond from 2013 to 2015.
He has authored or coauthored more than 200 research papers and holds more than 60 granted patents. His research interests include big multimedia data search, advertising, understanding, and mining, as well as pattern recognition and machine learning. Dr. Hua served or is now serving as an Associate Editor for the IEEE Trans. on Multimedia and ACM Transactions on Intelligent Systems and Technology. He served as a Program Co-Chair for IEEE ICME 2013, ACM Multimedia 2012, and IEEE ICME 2012. He was one of the recipients of the 2008 MIT Technology Review TR35 Young Innovator Award for his outstanding contributions to video search. He received the Best Paper Award at ACM Multimedia 2007 and the Best Paper Award of the IEEE Trans. on CSVT in 2014. Dr. Hua will serve as General Co-Chair of ACM Multimedia 2020.
Thursday, November 14, 2019
Title: Inflated 3D pixels, aka point clouds, also need compression
Speaker: Marius Preda
Session chair: Jörn Ostermann
Abstract: Point clouds are typically represented by extremely large amounts of data, which is a significant barrier for mass market applications. However, the relative ease of capturing and rendering spatial information, compared to other volumetric video representations, makes point clouds increasingly popular for presenting immersive volumetric data.
This talk introduces the technologies developed during the MPEG standardization process for defining an international standard for point cloud compression. The diversity of point clouds in terms of density led to the design of two approaches: the first one, called V-PCC (Video-based Point Cloud Compression), projects the 3D space onto a set of 2D patches and encodes them using traditional video technologies. The second one, called G-PCC (Geometry-based Point Cloud Compression), traverses the 3D space directly in order to create the predictors.
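The geometry-based approach starts by quantizing points onto a cubic voxel grid, whose occupancy an octree then codes level by level. A purely illustrative sketch of that first voxelization step (the point coordinates are invented; this is not the G-PCC reference implementation):

```python
import numpy as np

def voxelize(points: np.ndarray, depth: int) -> set:
    """Map points in the unit cube to integer voxel coordinates at 2**depth resolution."""
    grid = np.clip((points * (2 ** depth)).astype(int), 0, 2 ** depth - 1)
    return {tuple(v) for v in grid}

# Three points: the first two are close enough to share a voxel at depth 3.
pts = np.array([[0.10, 0.20, 0.30],
                [0.11, 0.21, 0.31],
                [0.90, 0.90, 0.90]])
occupied = voxelize(pts, depth=3)  # two occupied voxels out of 8**3
```

Coding only the occupied/empty pattern of each octree node, rather than raw coordinates, is what makes the geometric traversal compressible.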
With the current V-PCC encoder implementation providing a compression ratio of 125:1, a dynamic point cloud of 1 million points can be encoded at 8 Mbit/s with good perceptual quality. For the second approach, the current implementation of a lossless, intra-frame G-PCC encoder provides a compression ratio of up to 10:1, and lossy coding with acceptable quality reaches ratios of up to 35:1.
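The quoted V-PCC figures can be sanity-checked with back-of-envelope arithmetic; the frame rate below is an assumption (30 fps is typical for the MPEG test sequences but is not stated in the abstract):

```python
# Implied raw data rate behind the quoted V-PCC numbers.
compressed_bps = 8e6   # 8 Mbit/s compressed, from the abstract
ratio = 125            # 125:1 compression, from the abstract
points = 1e6           # 1 million points per frame
fps = 30               # assumed frame rate, not stated in the abstract

raw_bps = compressed_bps * ratio           # implied raw rate: 1 Gbit/s
bits_per_point = raw_bps / fps / points    # ~33 bits per raw point
```

About 33 bits per point is plausible for 10-bit per-axis geometry plus a few bits of attribute data, which suggests the quoted figures are internally consistent under that frame-rate assumption.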
By providing high-level immersiveness at currently available bandwidths, the two MPEG standards are expected to enable several applications such as six Degrees of Freedom (6 DoF) immersive media, virtual reality (VR) / augmented reality (AR), immersive real-time communication, autonomous driving, cultural heritage, and a mix of individual point cloud objects with background 2D/360-degree video.
Marius Preda is associate professor at "Institut MINES-Télécom" and Chairman of the 3D Graphics group of ISO MPEG. He contributed to various ISO standards with technologies in the fields of 3D graphics, virtual worlds and augmented reality and has received several ISO Certifications of Appreciation. Academically, he received a Degree in Engineering from Politehnica Bucharest, a PhD in Mathematics and Informatics from University Paris V and an eMBA from IMT Business School, Paris.
Friday, November 15, 2019
Title: Neural Image Compression: Recent Developments and Opportunities
Speaker: Dr. George Toderici
Session chair: Touradj Ebrahimi
Abstract: Compressing images with neural networks has been attempted since the late 1980s, but only recently has it been shown that it is possible to obtain compression rates comparable to those of image codecs such as JPEG, JPEG 2000, and, more recently, H.265. I will discuss how neural networks were used in an attempt to bypass the need for an arithmetic coder, and how they were later adapted to directly optimize for the rate-distortion tradeoff. Lossless image compression methods have also been developed using similar techniques, and I will briefly present some recent work in the area. I will then conclude the talk by looking at the future of neural image compression and beyond.
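The rate-distortion tradeoff mentioned above is conventionally optimized as a Lagrangian objective, L = R + λ·D, where R is the estimated bitrate of the quantized latents and D the reconstruction distortion. A toy sketch of the two ingredients (hard rounding stands in for the differentiable quantization proxies used in training; this is not any particular published architecture):

```python
import numpy as np

def quantize(latents: np.ndarray) -> np.ndarray:
    """Hard rounding; training typically substitutes additive uniform noise
    or a straight-through estimator so gradients can flow."""
    return np.round(latents)

def rd_loss(rate_bits: float, distortion_mse: float, lam: float = 0.01) -> float:
    """Lagrangian rate-distortion objective L = R + lambda * D.
    A larger lambda trades more bits for lower distortion."""
    return rate_bits + lam * distortion_mse

loss = rd_loss(rate_bits=100.0, distortion_mse=50.0, lam=0.1)  # 100 + 0.1*50
```

Sweeping λ traces out the codec's rate-distortion curve, one trained model (or one conditioning value) per operating point.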
Dr. George Toderici is a research scientist who joined Google in 2008, where he has worked on a multitude of research areas. His current work at Google Research is focused on multimedia compression using neural networks. His past projects at Google include the design of neural-network architectures and classical approaches for video classification, action recognition, YouTube channel recommendations, and video enhancement. He is one of the co-organizers of the Challenge on Learned Image Compression (CVPR 2018, CVPR 2019), the THUMOS-2014 and YouTube-8M (CVPR 2017, ECCV 2018, ICCV 2019) video classification challenges, and contributed to the design of the Sports-1M dataset. He has also served as Area Chair for the ACM Multimedia Conference in 2014, and is a regular reviewer for CVPR, ICCV, and NIPS.