University of Toronto St. George Campus - Mathematics
Senior Staff at AMD
Mehdi Saeedi
Toronto, Canada Area
Mehdi Saeedi is a Senior Staff member at AMD and currently a member of the Architecture team, responsible for innovative, forward-looking solutions for different SW and ASICs.
Since joining AMD, Mehdi has worked on many HW and SW projects with applications in Video Processing, Video Encode, and GFX Design. His research and innovations have resulted in several granted patents and over a dozen pending patent applications. He has also authored and co-authored more than 50 peer-reviewed conference and journal papers.
Mehdi holds a PhD in Computer Engineering (2010) from Tehran Polytechnic and completed a postdoc in Electrical & Computer Engineering (2013) at the University of Southern California (USC). He was elevated to IEEE Senior Member in 2015.
Mehdi's research interests include Deep Learning, Hardware Architecture and Accelerators, Algorithm Design, and Emerging Technologies.
Specialties: Deep learning accelerator, System-level design, RTL/Digital design, ASIC & FPGA-based data processing systems, Algorithm Design, Large-Scale Optimization, Programming, Video, Hardware Architecture
PhD
Computer Engineering
Master of Science
Computer Engineering
Postdoc
Computer Science
Applied algorithms in graph theory and VLSI design for synthesis/physical design of new technologies
● Contributed to a multi-university Integrated Design Environment (IDE) for a quantum computing system
Bachelor of Science
Computer Engineering
Research Assistant
● Implemented the data link and coding/synchronization sublayers of the CCSDS standard in C++ for performance evaluation of a space data system. Used multi-threading, socket programming, and kernel programming in C++ to handle different complexities of the system in software. Developed a Qt-based GUI on Linux for parameter handling.
● Designed and implemented a congestion estimation and reduction algorithm during placement in C++ for standard cells in ASIC design
Lecturer
Taught several undergraduate courses including Electric Circuit II, Logic Design, Electronic Circuits, and Digital Electronics
Quantum Information Processing
While a couple of impressive quantum technologies have been proposed, they have several intrinsic limitations which must be considered by circuit designers to produce realizable circuits. Limited interaction distance between gate qubits is one of the most common limitations. In this paper, we suggest extensions of the existing synthesis flow aimed to realize circuits for quantum architectures with linear nearest neighbor (LNN) interaction. To this end, a template matching optimization, an exact synthesis approach, and two reordering strategies are introduced. The proposed methods are combined as an integrated synthesis flow. Experiments show that by using the suggested flow, quantum cost can be improved by more than 50% on average.
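The interaction-distance limitation above can be made concrete with a small sketch. It assumes qubits sit on a line in index order and that each two-qubit gate is given as a pair of qubit indices; both are assumptions for illustration, and this is a naive baseline, not the synthesis flow from the paper. The function name `naive_swap_overhead` is hypothetical.

```python
def naive_swap_overhead(gates):
    """Count SWAPs needed to run each two-qubit gate on a linear chain.

    Assumes qubit i sits at position i, and that after every gate the
    qubits are swapped back to their original positions (a naive scheme,
    not the paper's method).
    """
    total = 0
    for control, target in gates:
        distance = abs(control - target)
        if distance == 0:
            raise ValueError("a two-qubit gate needs two distinct qubits")
        # Moving one qubit next to the other takes (distance - 1) SWAPs,
        # and the same number again to restore the original order.
        total += 2 * (distance - 1)
    return total


# Hypothetical circuit: gates acting on qubit pairs (0,1), (0,3), (2,4).
example = [(0, 1), (0, 3), (2, 4)]
print(naive_swap_overhead(example))  # prints 6
```

Reordering strategies such as those the abstract mentions aim to reduce this kind of overhead by choosing a better qubit placement than the fixed one assumed here.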
MMSys '19 Proceedings of the 10th ACM Multimedia Systems Conference
Cloud gaming allows users with thin-clients to play complex games on their end devices as the bulk of processing is offloaded to remote servers. A thin-client is only required to have basic decoding capabilities which exist on most modern devices. The result of the remote processing is an encoded video that gets streamed to the client. As modern games are complex in terms of graphics and motion, the encoded video requires high bandwidth to provide acceptable Quality of Experience (QoE) to end users. The cost incurred by the cloud gaming service provider to stream the encoded video at such high bandwidth grows rapidly with the increase in the number of users. In this paper, we present a content-aware video encoding method for cloud gaming (referred to as CAVE) to improve the perceptual quality of the streamed video frames with comparable bandwidth requirements. This is a challenging task because of the stringent requirements on latency in cloud gaming, which impose additional restrictions on frame sizes as well as processing time to limit the total latency perceived by clients. Unlike many of the previous works, the proposed method is suitable for the state-of-the-art High Efficiency Video Coding (HEVC) encoder, which by itself offers substantial bitrate savings compared to prior encoders. The proposed method leverages information from the game such as the Regions-of-Interest (ROIs), and optimizes the quality by allocating different amounts of bits to various areas in the video frames. Through actual implementation in an open-source cloud gaming platform, we show that the proposed method achieves quality gains in ROIs that can be translated to bitrate savings between 21% and 46% against the baseline HEVC encoder and between 12% and 89% against the closest work in the literature.
Quantum Information and Computation
Reversible circuits for modular multiplication Cx%M with x
Signal, Image and Video Processing
Bitrate reduction with little to no degradation in visual perception is a long-standing challenge in video coding. This paper targets this challenge by adaptively filtering the content in a preprocessing stage prior to video compression. This is done by applying a bilateral filter whose parameters are selected according to regional content complexity and estimated visual importance, in addition to bitrate and quality requirements. A multi-scale metric based on the 2D gradient is employed to determine the bandwidth requirements of different regions. A random forest regression model is trained to predict the distortion and bit requirements for a block if it is filtered and encoded at a given quality. The predicted distortion and bit requirements are used to select filter parameters according to a cost function. The proposed approach is applied to both H.264 and HEVC encoders, with different GOP structures. The results show up to 60% bitrate reduction in terms of BD-Rate (about 20% on average) for the attempted test cases with little to no noticeable quality degradation.
IEEE International Conference on Multimedia and Expo Workshop (ICMEW)
Aiming at improved rate-distortion (R-D) performance, this paper presents a machine-learning-based solution for the run-time video resolution adaptation problem. The proposed approach utilizes neural networks that leverage a complexity feature extracted from the video frames to predict a quantization parameter (QP) for downscaled video targeting the same bitrate as the native video. The peak signal to noise ratio (PSNR) is also predicted for both the native and downscaled resolutions, and the one that leads to the highest PSNR is selected.
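The final selection step described above can be sketched as follows. This is a minimal illustration only: the paper's predictions come from neural networks, whereas the predicted PSNR values here are made-up placeholders, and the names `psnr` and `pick_resolution` are hypothetical.

```python
import math


def psnr(mse, max_value=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse <= 0:
        return math.inf  # identical frames have no error
    return 10.0 * math.log10(max_value ** 2 / mse)


def pick_resolution(predicted_psnr):
    """Return the resolution label with the highest predicted PSNR."""
    return max(predicted_psnr, key=predicted_psnr.get)


# Hypothetical predictions (in dB) at the same target bitrate.
predictions = {"native": 33.1, "downscaled": 34.6}
print(pick_resolution(predictions))  # prints downscaled
```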
ACM Journal of Emerging Technologies in Computing Systems
Improving circuit realization of known quantum algorithms by CAD techniques has benefits for quantum experimentalists. In this article, the problem of synthesizing a given function on a set of ancillae is addressed. The proposed approach benefits from extensive sharing of cofactors among cubes that appear on function outputs. Accordingly, it can be considered a multilevel logic optimization technique for reversible circuits. In particular, the suggested approach can efficiently implement any n-input, m-output lookup table (LUT) by a reversible circuit. This problem has interesting applications in Shor's number-factoring algorithm and in quantum walk on sparse graphs. Simulation results reveal that the proposed cofactor-sharing synthesis algorithm has a significant impact on reducing the size of modular exponentiation circuits for Shor's quantum factoring algorithm, oracle circuits in quantum walk on sparse graphs, and the well-known MCNC benchmarks.
ACM Journal of Emerging Technologies in Computing Systems
Reversible logic has applications in various research areas including signal processing, cryptography, and quantum computation. In this paper, direct NCT-based synthesis of a given k-cycle in a cycle-based synthesis scenario is examined. To this end, a set of seven building blocks is proposed that reveals the potential of direct synthesis of a given permutation to reduce both quantum cost and average runtime. To synthesize a given large cycle, we propose a decomposition algorithm to extract the suggested building blocks from the input specification. Then, a synthesis method is introduced which uses the building blocks and the decomposition algorithm. Finally, a hybrid synthesis framework is suggested which uses the proposed cycle-based synthesis method in conjunction with a recent NCT-based synthesis approach based on Reed-Muller (RM) spectra. The time complexity and the effectiveness of the proposed synthesis approach are analyzed in detail. Our analyses show that the proposed hybrid framework leads to a better quantum cost in the worst-case scenario compared to the previously presented methods. The proposed framework always converges and typically synthesizes a given specification very fast compared to the available synthesis algorithms. In addition, the quantum costs of benchmark functions are improved by about 20% on average (55% in the best case).
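Cycle-based synthesis treats a reversible function as a permutation of its truth-table rows. As a minimal sketch of that starting point, the Python below splits a permutation into disjoint cycles. It is the standard textbook decomposition, not the paper's algorithm for extracting its seven building blocks, and the function name `disjoint_cycles` is hypothetical.

```python
def disjoint_cycles(perm):
    """Split a permutation into disjoint cycles.

    perm is a list where perm[i] is the image of i. Fixed points
    (cycles of length 1) are omitted from the result.
    """
    seen = [False] * len(perm)
    cycles = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        cycle = []
        current = start
        # Follow images until the walk returns to an already visited element.
        while not seen[current]:
            seen[current] = True
            cycle.append(current)
            current = perm[current]
        if len(cycle) > 1:
            cycles.append(tuple(cycle))
    return cycles


# A 3-bit Toffoli gate as a truth-table permutation: it swaps rows 6 and 7.
toffoli = [0, 1, 2, 3, 4, 5, 7, 6]
print(disjoint_cycles(toffoli))  # prints [(6, 7)]
```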
Quantum Information Processing
While a couple of impressive quantum technologies have been proposed, they have several intrinsic limitations which must be considered by circuit designers to produce realizable circuits. Limited interaction distance between gate qubits is one of the most common limitations. In this paper, we suggest extensions of the existing synthesis flow aimed to realize circuits for quantum architectures with linear nearest neighbor (LNN) interaction. To this end, a template matching optimization, an exact synthesis approach, and two reordering strategies are introduced. The proposed methods are combined as an integrated synthesis flow. Experiments show that by using the suggested flow, quantum cost can be improved by more than 50% on average.
MMSys '19 Proceedings of the 10th ACM Multimedia Systems Conference
Cloud gaming allows users with thin-clients to play complex games on their end devices as the bulk of processing is offloaded to remote servers. A thin-client is only required to have basic decoding capabilities which exist on most modern devices. The result of the remote processing is an encoded video that gets streamed to the client. As modern games are complex in terms of graphics and motion, the encoded video requires high bandwidth to provide acceptable Quality of Experience (QoE) to end users. The cost incurred by the cloud gaming service provider to stream the encoded video at such high bandwidth grows rapidly with the increase in the number of users. In this paper, we present a content-aware video encoding method for cloud gaming (referred to as CAVE) to improve the perceptual quality of the streamed video frames with comparable bandwidth requirements. This is a challenging task because of the stringent requirements on latency in cloud gaming, which impose additional restrictions on frame sizes as well as processing time to limit the total latency perceived by clients. Unlike many of the previous works, the proposed method is suitable for the state-of-the-art High Efficiency Video Coding (HEVC) encoder, which by itself offers substantial bitrate savings compared to prior encoders. The proposed method leverages information from the game such as the Regions-of-Interest (ROIs), and optimizes the quality by allocating different amounts of bits to various areas in the video frames. Through actual implementation in an open-source cloud gaming platform, we show that the proposed method achieves quality gains in ROIs that can be translated to bitrate savings between 21% and 46% against the baseline HEVC encoder and between 12% and 89% against the closest work in the literature.
Quantum Information and Computation
Reversible circuits for modular multiplication Cx%M with x
Signal, Image and Video Processing
Bitrate reduction with little to no degradation in visual perception is a long-standing challenge in video coding. This paper targets this challenge by adaptively filtering the content prior to video compression and in the preprocessing stage. This is done by applying a bilateral filter where the filter parameters are selected according to regional content complexity and estimated visual importance besides bitrate and quality requirements. A multi-scale metric based on 2D gradient is employed to determine bandwidth requirements of different regions. A random forest regression model is trained to predict distortion and bit requirements for a block, if it is filtered and encoded at a given quality. The predicted distortion and bit requirements are used to select filter parameters considering a cost function. The proposed approach is applied to both H.264 and HEVC encoders, with different GOP structures. The results show up to 60% bitrate reduction in terms of BD-Rate (about 20% on average) for the attempted test cases with little to no noticeable quality degradation.
IEEE International Conference on Multimedia and Expo Workshop (ICMEW)
Aiming at improved rate-distortion (R-D) performance, this paper presents a machine-learning-based solution for the run-time video resolution adaptation problem. The proposed approach utilizes neural networks that leverage a complexity feature extracted from the video frames to predict a quantization parameter (QP) for downscaled video targeting the same bitrate as the native video. The peak signal to noise ratio (PSNR) is also predicted for both the native and downscaled resolutions, and the one that leads to the highest PSNR is selected.
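The final selection step described above can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `psnr` and `pick_resolution` are hypothetical, and the PSNR values passed in stand for predictions that the paper obtains from neural networks, which are not modeled here.

```python
import math

def psnr(mse, max_value=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)

def pick_resolution(predicted_psnr):
    """Return the label whose predicted PSNR is highest.

    predicted_psnr maps a resolution label to a PSNR value in dB.
    """
    return max(predicted_psnr, key=predicted_psnr.get)

# Placeholder numbers for illustration only, not results from the paper.
choice = pick_resolution({"native": 36.2, "downscaled": 37.0})
```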
ACM Journal of Emerging Technologies in Computing Systems
Improving circuit realization of known quantum algorithms by CAD techniques has benefits for quantum experimentalists. In this article, the problem of synthesizing a given function on a set of ancillae is addressed. The proposed approach benefits from extensive sharing of cofactors among cubes that appear on function outputs. Accordingly, it can be considered a multilevel logic optimization technique for reversible circuits. In particular, the suggested approach can efficiently implement any n-input, m-output lookup table (LUT) by a reversible circuit. This problem has interesting applications in Shor's number-factoring algorithm and in quantum walk on sparse graphs. Simulation results reveal that the proposed cofactor-sharing synthesis algorithm has a significant impact on reducing the size of modular exponentiation circuits for Shor's quantum factoring algorithm, oracle circuits in quantum walk on sparse graphs, and the well-known MCNC benchmarks.
ACM Journal of Emerging Technologies in Computing Systems
Reversible logic has applications in various research areas including signal processing, cryptography and quantum computation. In this paper, direct NCT-based synthesis of a given k-cycle in a cycle-based synthesis scenario is examined. To this end, a set of seven building blocks is proposed that reveals the potential of direct synthesis of a given permutation to reduce both quantum cost and average runtime. To synthesize a given large cycle, we propose a decomposition algorithm to extract the suggested building blocks from the input specification. Then, a synthesis method is introduced which uses the building blocks and the decomposition algorithm. Finally, a hybrid synthesis framework is suggested which uses the proposed cycle-based synthesis method in conjunction with a recent NCT-based synthesis approach based on Reed-Muller (RM) spectra. The time complexity and the effectiveness of the proposed synthesis approach are analyzed in detail. Our analyses show that the proposed hybrid framework leads to a better quantum cost in the worst-case scenario compared to the previously presented methods. The proposed framework always converges and typically synthesizes a given specification much faster than the available synthesis algorithms. In addition, the quantum costs of benchmark functions are improved by about 20% on average (55% in the best case).
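Cycle-based synthesis starts from the fact that a reversible function is a permutation of its input patterns and can be written as disjoint cycles. The sketch below shows only that first step; the function name is an assumption, and the paper's seven building blocks and its decomposition algorithm are not reproduced.

```python
def disjoint_cycles(perm):
    """Split a permutation into disjoint cycles of length two or more.

    perm[i] is the output pattern (as an integer) for input pattern i.
    Fixed points are omitted because they need no gates.
    """
    seen = [False] * len(perm)
    cycles = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        cycle = []
        i = start
        while not seen[i]:
            seen[i] = True
            cycle.append(i)
            i = perm[i]
        if len(cycle) > 1:
            cycles.append(tuple(cycle))
    return cycles

# A 3-bit Toffoli gate with the target on the least significant bit
# swaps patterns 110 and 111 and leaves every other pattern fixed.
toffoli_cycles = disjoint_cycles([0, 1, 2, 3, 4, 5, 7, 6])
```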
Physical Review A
A major obstacle to implementing Shor's quantum number-factoring algorithm is the large size of modular-exponentiation circuits. We reduce this bottleneck by customizing reversible circuits for modular multiplication to individual runs of Shor's algorithm. Our circuit-synthesis procedure exploits spectral properties of multiplication operators and constructs optimized circuits from the traces of the execution of an appropriate GCD algorithm. Empirically, gate counts are reduced by 4-5 times, and circuit latency is reduced by larger factors.
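Modular multiplication can be realized reversibly because, for a multiplier C coprime to the modulus M, the map x -> Cx mod M permutes the residues 0..M-1. The sketch below only checks this property on a small example. The function names are assumptions, and the circuit-synthesis procedure from the paper is not reproduced.

```python
from math import gcd

def modmul_permutation(c, m):
    """Return the list [c*x mod m for x in 0..m-1], which is a permutation
    of 0..m-1 whenever gcd(c, m) == 1."""
    if gcd(c, m) != 1:
        raise ValueError("c must be coprime to m for the map to be reversible")
    return [(c * x) % m for x in range(m)]

def inverse_multiplier(c, m):
    """Multiplier that undoes x -> c*x mod m (requires Python 3.8+)."""
    return pow(c, -1, m)

perm = modmul_permutation(7, 15)
inv = inverse_multiplier(7, 15)
# Multiplying by inv maps every output back to its input.
restored = [(inv * y) % 15 for y in perm]
```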
Quantum Information Processing
While a couple of impressive quantum technologies have been proposed, they have several intrinsic limitations which must be considered by circuit designers to produce realizable circuits. Limited interaction distance between gate qubits is one of the most common limitations. In this paper, we suggest extensions of the existing synthesis flow aimed at realizing circuits for quantum architectures with linear nearest neighbor (LNN) interaction. To this end, a template matching optimization, an exact synthesis approach, and two reordering strategies are introduced. The proposed methods are combined into an integrated synthesis flow. Experiments show that by using the suggested flow, quantum cost can be improved by more than 50% on average.
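To make the LNN restriction concrete, the sketch below computes a simple nearest-neighbor cost for a list of two-qubit gates placed on a line: each gate contributes the number of qubits lying between its control and target, which is also the number of adjacent SWAPs needed to bring the two qubits next to each other. This is a commonly used illustrative metric with a hypothetical function name, and it is not claimed to be the cost function used in the paper.

```python
def nearest_neighbor_cost(gates):
    """Sum of |control - target| - 1 over all two-qubit gates.

    gates is an iterable of (control, target) positions on a line,
    with control != target. A cost of 0 means every gate already
    acts on adjacent qubits.
    """
    total = 0
    for control, target in gates:
        if control == target:
            raise ValueError("control and target must differ")
        total += abs(control - target) - 1
    return total

# Gate (0, 1) is adjacent, (0, 3) spans two qubits, (2, 0) spans one.
cost = nearest_neighbor_cost([(0, 1), (0, 3), (2, 0)])
```

Reordering the qubits on the line changes the positions in each pair, so a reordering strategy can be compared against another by evaluating this cost before and after.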
ACM Computing Surveys
Reversible logic circuits have been historically motivated by theoretical research in low-power electronics as well as practical improvement of bit-manipulation transforms in cryptography and computer graphics. Recently, reversible circuits have attracted interest as components of quantum algorithms, as well as in photonic and nano-computing technologies where some switching devices offer no signal gain. Research in generating reversible logic distinguishes between circuit synthesis, post-synthesis optimization, and technology mapping. In this survey, we review algorithmic paradigms (search-based, cycle-based, transformation-based, and BDD-based) as well as specific algorithms for reversible synthesis, both exact and heuristic. We conclude the survey by outlining key open challenges in the synthesis of reversible and quantum logic, as well as the most common misconceptions.
IEEE Trans. on Computer-Aided Design
For years, the quantum/reversible circuit community has been convinced that: 1) the addition of auxiliary quantum bits (qubits) is instrumental in constructing a smaller quantum circuit, and 2) the introduction of quantum gates inside reversible circuits may result in more efficient designs. This paper presents a systematic approach to optimizing reversible (and quantum) circuits via the introduction of auxiliary qubits and quantum gates inside circuit designs. This advances our understanding of what may be achieved with 1) and 2).
Physical Review A
We design a circuit structure with linear depth to implement an n-qubit Toffoli gate. The proposed construction uses a quadratic-size circuit consisting of elementary 2-qubit controlled-rotation gates around the x axis and uses no ancilla qubit. Circuit depth remains linear in quantum technologies with finite-distance interactions between qubits. The suggested construction is related to the long-standing construction by Barenco et al. (Phys. Rev. A, 52: 3457-3467, 1995, arXiv:quant-ph/9503016), which uses a quadratic-size, quadratic-depth quantum circuit for an n-qubit Toffoli gate.