Intel® DevCloud for oneAPI

1. Connect to DevCloud

Sign in to connect to DevCloud using SSH clients.

2. Hello World!

Get started by running a simple sample on DevCloud.

3. Run OpenCL for FPGA development on DevCloud

Explore the samples already installed in Step 2.

Browse Available Samples

Getting Started

This FPGA tutorial introduces how to compile DPC++ for FPGA through a simple vector addition example. If you are new to DPC++ for FPGA, start here!

Separating Host and Device Code Compilation

This FPGA tutorial demonstrates how to separate the compilation of a program's host code and device code to save development time. It's recommended to read the 'fpga_compile' code sample before this one.

Reference Designs

CRR Binomial Tree Model for Option Pricing

An FPGA-optimized reference design computing the Cox-Ross-Rubinstein (CRR) binomial tree model with Greeks for American exercise options.
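
The CRR recursion behind this design can be sketched in plain C++. This is a minimal host-side illustration under stated assumptions, not the reference design's FPGA code: the function name `CrrAmericanPrice` and its parameters are hypothetical, dividends are ignored, and the Greeks are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not taken from the reference design): prices an
// American option on a non-dividend-paying asset with a CRR binomial tree.
double CrrAmericanPrice(double spot, double strike, double rate, double sigma,
                        double maturity, int steps, bool is_call) {
  const double dt = maturity / steps;
  const double up = std::exp(sigma * std::sqrt(dt));  // up factor u
  const double down = 1.0 / up;                       // down factor d = 1/u
  const double disc = std::exp(-rate * dt);           // one-step discount factor
  const double p = (std::exp(rate * dt) - down) / (up - down);  // risk-neutral up probability

  auto payoff = [&](double s) {
    return is_call ? std::max(s - strike, 0.0) : std::max(strike - s, 0.0);
  };

  // Option values at expiry: node j has j up-moves and (steps - j) down-moves.
  std::vector<double> value(steps + 1);
  for (int j = 0; j <= steps; ++j) {
    value[j] = payoff(spot * std::pow(up, j) * std::pow(down, steps - j));
  }

  // Backward induction with an early-exercise check at every node.
  // Updating in place is safe because value[j + 1] is read before it is overwritten.
  for (int i = steps - 1; i >= 0; --i) {
    for (int j = 0; j <= i; ++j) {
      const double hold = disc * (p * value[j + 1] + (1.0 - p) * value[j]);
      const double s = spot * std::pow(up, j) * std::pow(down, i - j);
      value[j] = std::max(hold, payoff(s));
    }
  }
  return value[0];
}
```

For example, `CrrAmericanPrice(100.0, 100.0, 0.05, 0.2, 1.0, 500, false)` prices an at-the-money one-year American put; the result should exceed the corresponding European put value because of the early-exercise check.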

Database Query Acceleration

This reference design demonstrates how to use an FPGA to accelerate database queries for a data-warehouse schema derived from TPC-H.

GZIP Compression

Reference design demonstrating high-performance GZIP compression on FPGA.

Merge Sort

This DPC++ reference design demonstrates a highly parameterizable merge sort algorithm on an FPGA.

MVDR Beamforming

This reference design demonstrates IO streaming in DPC++ on an FPGA for a large system. The IO streaming is 'faked' using data from the host.

QR Decomposition of Matrices

This DPC++ reference design demonstrates high-performance QR decomposition of complex matrices on an FPGA.

Design Pattern Tutorials

Buffered Host-Device Streaming

This tutorial demonstrates how to create a high-performance, full-system CPU-FPGA design using SYCL USM.

Using Compute Units To Duplicate Kernels

This FPGA tutorial showcases a design pattern that allows you to make multiple copies of a kernel, called compute units.

Double Buffering to Overlap Kernel Execution with Buffer Transfers and Host Processing

This FPGA tutorial demonstrates how to parallelize host-side processing and buffer transfers between host and device with kernel execution, which can improve overall application performance.

Explicit Data Movement

An FPGA tutorial demonstrating an alternative coding style, SYCL Unified Shared Memory (USM) device allocations, in which data movement between host and device is controlled explicitly by the code author.

IO Streaming

An FPGA code sample describing how to use DPC++ IO pipes to stream data through the FPGA's IO.

Removing Loop Carried Dependencies

This tutorial demonstrates how to remove a loop-carried dependency to improve the performance of the FPGA device code.
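
One common form of this rewrite can be sketched in plain C++. The example below is an illustration under stated assumptions, not the tutorial's kernel: the function names are hypothetical, and the data is a flattened `n` x `n` matrix `a` plus a length-`n` vector `b`. In the first version a single accumulator is updated in both loops; in the second, the inner loop accumulates into a local variable, so it no longer depends on the outer running total.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Baseline: `sum` is updated in both loops, so each inner-loop iteration
// depends on the running total carried across the whole loop nest.
// Precondition: a.size() == n * n and b.size() == n.
long SumWithCarriedDependency(const std::vector<int>& a,
                              const std::vector<int>& b, std::size_t n) {
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      sum += a[i * n + j];
    }
    sum += b[i];
  }
  return sum;
}

// Restructured: the inner loop accumulates into `row_sum`, which is reset
// for every row. Only one update per outer iteration touches `sum`.
long SumWithLocalAccumulator(const std::vector<int>& a,
                             const std::vector<int>& b, std::size_t n) {
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    long row_sum = 0;
    for (std::size_t j = 0; j < n; ++j) {
      row_sum += a[i * n + j];
    }
    sum += row_sum + b[i];
  }
  return sum;
}
```

Both functions return the same value; the difference lies only in which variable the inner loop depends on.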

N-Way Buffering to Overlap Kernel Execution with Buffer Transfers and Host Processing

This FPGA tutorial demonstrates how to parallelize host-side processing and buffer transfers between host and device with kernel execution to improve overall application performance. It is a generalization of the 'double buffering' technique and can be used to perform this overlap even when the host-processing time exceeds kernel execution time.

Caching On-Chip Memory to Improve Loop Performance

This FPGA tutorial demonstrates how to build a simple cache (implemented in FPGA registers) to store recently-accessed memory locations so that the compiler can achieve II=1 on critical loops in task kernels.
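
The idea of a small cache of recently accessed locations can be sketched in plain C++ with a histogram, a typical read-modify-write loop. This is a hypothetical illustration, not the tutorial's kernel: `HistogramWithCache`, `kCacheDepth`, and the modulo binning are assumptions. On a CPU the cache changes nothing functionally; the sketch only shows the bookkeeping, in which the loop prefers a recently written value over the value read from memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kCacheDepth = 4;  // assumed depth; would be tuned per design

// Hypothetical sketch: histogram update that consults a small cache of the
// most recent writes before trusting the value read from `hist`.
// Precondition: bins > 0.
std::vector<unsigned> HistogramWithCache(const std::vector<unsigned>& input,
                                         std::size_t bins) {
  std::vector<unsigned> hist(bins, 0);
  std::array<unsigned, kCacheDepth> cache_value{};
  std::array<std::size_t, kCacheDepth> cache_tag;
  cache_tag.fill(bins);  // invalid tag: no real bin has index == bins

  for (unsigned x : input) {
    const std::size_t bin = x % bins;
    unsigned current = hist[bin];  // memory read (could be stale in hardware)
    // Entries are ordered oldest to newest, so the last match wins.
    for (int k = 0; k < kCacheDepth; ++k) {
      if (cache_tag[k] == bin) current = cache_value[k];
    }
    const unsigned updated = current + 1;
    hist[bin] = updated;
    // Shift the cache by one slot and record this write as the newest entry.
    for (int k = 0; k < kCacheDepth - 1; ++k) {
      cache_value[k] = cache_value[k + 1];
      cache_tag[k] = cache_tag[k + 1];
    }
    cache_value[kCacheDepth - 1] = updated;
    cache_tag[kCacheDepth - 1] = bin;
  }
  return hist;
}
```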

Optimizing Inner Loop Throughput

This FPGA tutorial discusses optimizing the throughput of an inner loop with a low trip count.

Data Transfers Using Pipe Arrays

This FPGA tutorial showcases a design pattern that makes it possible to create arrays of pipes.

Shannonization to Improve fMAX/II

This tutorial describes the process of Shannonization (named after Claude Shannon) for a simple FPGA design. This optimization improves the fMAX/II of a design by precomputing operations in a loop to remove them from the critical path.
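
The precomputation idea can be sketched in plain C++ on a toy loop. This is a hypothetical example, not the design used in the tutorial: the function names and the wrap-counting problem are assumptions, and inputs are assumed to satisfy `0 <= x < limit`. In the baseline, the carried value `sum` goes through an add, a compare, and a subtract in sequence. In the rewritten version, everything that does not depend on `sum` is computed first, both possible outcomes are formed side by side, and a single select picks one.

```cpp
#include <cassert>
#include <vector>

// Baseline: add, then compare, then (maybe) subtract, all on the carried `sum`.
int CountWrapsBaseline(const std::vector<int>& data, int limit) {
  int sum = 0;
  int wraps = 0;
  for (int x : data) {
    sum += x;
    if (sum >= limit) {
      sum -= limit;
      ++wraps;
    }
  }
  return wraps;
}

// Rewritten: terms independent of `sum` are precomputed, and both candidate
// results are formed before the select.
int CountWrapsShannonized(const std::vector<int>& data, int limit) {
  int sum = 0;
  int wraps = 0;
  for (int x : data) {
    // These two values do not depend on `sum`, so they are off the carried path.
    const int x_minus_limit = x - limit;
    const int threshold = limit - x;
    // Both outcomes and the condition use the old `sum` directly.
    const int no_wrap = sum + x;
    const int wrap = sum + x_minus_limit;
    const bool wraps_now = sum >= threshold;  // same as (sum + x >= limit)
    sum = wraps_now ? wrap : no_wrap;
    wraps += wraps_now ? 1 : 0;
  }
  return wraps;
}
```

The two functions compute the same result; the second trades a little extra arithmetic for a shorter chain of dependent operations on the carried variable.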

Simple Host-Device Streaming

This tutorial demonstrates how to use SYCL Unified Shared Memory (USM) to stream data between the host and FPGA device and achieve low latency while maintaining throughput.

Triangular Loop Optimization

This FPGA tutorial demonstrates an advanced technique to improve the performance of nested triangular loops with loop-carried dependencies in single-task kernels.
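
One way to restructure a triangular nest is to merge it into a single loop and pad short rows with dummy iterations, so that every row runs for at least a fixed number of iterations. The plain C++ sketch below illustrates only the index bookkeeping. It is written under assumptions and is not the tutorial's kernel: the function names, the `min_inner` parameter, and the sum-of-products workload are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reference: nested triangular loop over all pairs (x, y) with y >= x.
long TriangularNested(const std::vector<int>& v) {
  const std::size_t n = v.size();
  long total = 0;
  for (std::size_t x = 0; x < n; ++x) {
    for (std::size_t y = x; y < n; ++y) {
      total += static_cast<long>(v[x]) * v[y];
    }
  }
  return total;
}

// Flattened: a single loop in which each row x runs for
// max(n - x, min_inner) iterations. Iterations with y >= n do no work.
long TriangularFlattened(const std::vector<int>& v, std::size_t min_inner) {
  const std::size_t n = v.size();
  std::size_t trip_count = 0;
  for (std::size_t x = 0; x < n; ++x) {
    trip_count += std::max(n - x, min_inner);
  }

  long total = 0;
  std::size_t x = 0;
  std::size_t y = 0;
  for (std::size_t it = 0; it < trip_count; ++it) {
    if (y < n) {
      total += static_cast<long>(v[x]) * v[y];  // real iteration
    }
    ++y;
    // The row ends once it has run max(n - x, min_inner) iterations.
    if (y == x + std::max(n - x, min_inner)) {
      ++x;
      y = x;
    }
  }
  return total;
}
```

With `min_inner` set to 0 the flattened loop performs exactly the same iterations as the nested one; larger values add dummy iterations to the short rows near the end of the triangle.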

Zero-copy Data Transfer

This tutorial demonstrates how to use zero-copy host memory via SYCL Unified Shared Memory (USM) to improve your FPGA design's performance.

Features Tutorials

Explicit Pipeline Register Insertion with fpga_reg

This FPGA tutorial demonstrates how a power user can apply the DPC++ extension ext::intel::fpga_reg to tweak the hardware generated by the compiler.

Avoiding Aliasing of Kernel Arguments

This tutorial explains the kernel_args_restrict attribute and its effect on the performance of FPGA kernels.

Coalescing Nested Loops

This FPGA tutorial demonstrates applying the loop_coalesce attribute to a nested loop in a task kernel to reduce the area overhead.
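
What coalescing means for a loop nest can be sketched by hand in plain C++. The example is an illustration only: it does not use the loop_coalesce attribute itself, and the function names and the fill pattern are hypothetical. The second function expresses the same iteration space as one loop with manually maintained row and column counters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Nested form: two loops, each with its own loop control.
std::vector<int> FillNested(std::size_t rows, std::size_t cols) {
  std::vector<int> out(rows * cols);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      out[i * cols + j] = static_cast<int>(i * 100 + j);
    }
  }
  return out;
}

// Coalesced form: one loop over rows * cols iterations. The original
// indices i and j are tracked with counters instead of division and modulo.
std::vector<int> FillCoalesced(std::size_t rows, std::size_t cols) {
  std::vector<int> out(rows * cols);
  std::size_t i = 0;
  std::size_t j = 0;
  for (std::size_t k = 0; k < rows * cols; ++k) {
    out[k] = static_cast<int>(i * 100 + j);
    if (++j == cols) {  // end of a row: wrap the column counter
      j = 0;
      ++i;
    }
  }
  return out;
}
```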

Loop initiation_interval Attribute

This FPGA tutorial demonstrates how to use the intel::initiation_interval attribute to change the initiation interval (II) of a loop in scenarios where doing so improves performance.

Loop ivdep Attribute

This FPGA tutorial demonstrates how to apply the ivdep attribute to a loop to aid the compiler's loop dependence analysis.

Unrolling Loops

This FPGA tutorial demonstrates a simple example of unrolling loops to improve a DPC++ FPGA program's throughput.
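
The effect of unrolling can be sketched by hand in plain C++. This is an illustration under assumptions rather than the tutorial's code: in DPC++ the compiler would normally be asked to unroll via a pragma or attribute, whereas here a hypothetical `SumUnrolledBy4` function writes out four copies of the loop body with independent accumulators and handles the leftover elements separately.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical manual 4x unroll of a summation loop.
long SumUnrolledBy4(const std::vector<int>& v) {
  const std::size_t n = v.size();
  long acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
  std::size_t i = 0;
  // Main body: four elements per iteration, each with its own accumulator,
  // so the four additions do not depend on one another.
  for (; i + 4 <= n; i += 4) {
    acc0 += v[i];
    acc1 += v[i + 1];
    acc2 += v[i + 2];
    acc3 += v[i + 3];
  }
  long total = acc0 + acc1 + acc2 + acc3;
  // Remainder loop for sizes that are not a multiple of four.
  for (; i < n; ++i) {
    total += v[i];
  }
  return total;
}
```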

LSU Control

This FPGA tutorial demonstrates how to configure the load-store units (LSU) in your DPC++ program using the LSU controls extension.

Maximum Interleaving of a Loop

This FPGA tutorial explains how to use the max_interleaving attribute for loops.

On-Chip Memory Attributes

This FPGA tutorial demonstrates how to use on-chip memory attributes to control memory structures in your DPC++ program.

Data Transfers Using Pipes

This FPGA tutorial shows how to use pipes to transfer data between kernels.

Private Copies

This FPGA tutorial explains how to use the private_copies attribute to trade off the on-chip memory use and the throughput of a DPC++ FPGA program.

Speculated Iterations of a Loop

This FPGA tutorial demonstrates applying the speculated_iterations attribute to a loop in a task kernel to enable more efficient loop pipelining.

Reducing Latency of Computations

This FPGA tutorial demonstrates how to use the use_stall_enable_clusters attribute to reduce the area and latency of your FPGA kernels. This attribute may reduce the fMAX of your kernels.
