Scale Your EDA Flows: How Google Cloud Enables Faster Verification

  • aster_cloud
  • December 23, 2020
  • 4 minute read

Companies modernize their infrastructure in the cloud for three main reasons: 1) to accelerate product delivery, 2) to reduce system downtime, and 3) to enable innovation. Chip designers with Electronic Design Automation (EDA) workloads share these goals, and can benefit greatly from using the cloud.

Chip design and manufacturing involves many tools across the flow, with varied compute and memory footprints. Register Transfer Level (RTL) design and modeling is one of the most time-consuming steps in the design process, accounting for more than half the time needed in the entire design cycle. RTL designers use Hardware Description Languages (HDL) such as SystemVerilog and VHDL to create a design, which then goes through a series of tools. Mature RTL verification flows include static analysis (checks for design integrity without the use of test vectors), formal property verification (mathematically proving or falsifying design properties), dynamic simulation (test vector-based simulation of actual designs) and emulation (a complex system that imitates the behavior of the final chip, especially useful for validating the functionality of the software stack).


Dynamic simulation arguably takes up the most compute in any design team’s data center. We wanted to create a simple setup using Google Cloud technologies and open-source designs and solutions to showcase three key points:

  1. How simulation can accelerate with more compute
  2. How verification teams can benefit from auto-scaling cloud clusters
  3. How organizations can effectively leverage the elasticity of cloud to build highly utilized technology infrastructure
Figure: OpenPiton Tile architecture (a) and Chipset architecture (b). Source: OpenPiton: An Open Source Manycore Research Framework, Balkind et al.

We did this using a variety of tools: the OpenPiton design verification scripts, the Icarus Verilog simulator, the SLURM workload management solution, and standard Google Cloud compute configurations.

  • OpenPiton is the world’s first open-source, general-purpose, multithreaded manycore processor and framework. Developed at Princeton University, it is scalable and portable, supporting configurations of up to 500 million cores. It is widely popular within the research community and comes with scripts for the typical steps in the design flow, including dynamic simulation, logic synthesis and physical synthesis.
  • Icarus Verilog, sometimes known as iverilog, is an open-source Verilog simulation and synthesis tool.

Simple Linux Utility for Resource Management, or SLURM, is an open-source, fault-tolerant and highly scalable cluster management and job scheduling system for Linux clusters. SLURM manages user access to compute nodes, maintains a queue of pending work, and provides a framework for starting and monitoring jobs. Auto-scaling of a SLURM cluster refers to the capability of the cluster manager to spin up nodes on demand and shut them down automatically after jobs are completed.

Figure: SLURM components. Source: slurm.schedmd.com/quickstart.html
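Under SLURM, each simulation can be submitted as an independent batch job. As a rough illustration, a hand-written job script for a single run might look like the sketch below; the script name and directives are assumptions for illustration, not part of the OpenPiton bundle, and the ‘sims’ script used later generates its own submissions when invoked with -slurm.

```shell
# Write a minimal, hypothetical SLURM batch script for one simulation job.
cat > sim_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=tile1_sim
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2        # matches an n1-standard-2 node
#SBATCH --output=sim_%j.log      # per-job log file (%j = job ID)
sims -sim_type=icv -group=tile1_mini
EOF
# On a live cluster this would be submitted with: sbatch sim_job.sh
```

SLURM queues the job, starts it on a free node (spinning one up if the cluster auto-scales), and captures its output in the per-job log file.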

 

Setup

We used a basic reference architecture for the underlying infrastructure. While simple, it was sufficient to achieve our goals. We used standard N1 machines (n1-standard-2, with 2 vCPUs and 7.5 GB of memory) and set up the SLURM cluster to auto-scale to 10 compute nodes. The reference architecture is shown below. All required scripts are provided in the GitHub repo.

Figure: Reference architecture.

 

Running the OpenPiton regression

The first step in running the OpenPiton regression is to bring up the cluster by following the steps outlined in the GitHub repo.

The next step is to download the design and verification files; instructions are provided in the GitHub repo. Once downloaded, there are three simple setup tasks to perform:

  1. Set the PITON_ROOT environment variable (%export PITON_ROOT=<location of root of OpenPiton extracted files>)
  2. Set the simulator home (%export ICARUS_HOME=/usr). The scripts provided in the GitHub repo already take care of installing Icarus on the provisioned machines, which shows yet another advantage of cloud: simplified machine configuration.
  3. Finally, source the required settings (%source $PITON_ROOT/piton/piton_settings.bash)
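The three setup tasks above can be sketched as a single snippet; the checkout location under $HOME is an assumption, so adjust it to wherever you extracted the OpenPiton files.

```shell
# Assumed extraction location for the OpenPiton files (adjust to yours).
export PITON_ROOT="${PITON_ROOT:-$HOME/openpiton}"
# Icarus Verilog is installed under /usr by the provisioning scripts.
export ICARUS_HOME="${ICARUS_HOME:-/usr}"
# Pull in the OpenPiton environment if the checkout is present.
if [ -f "$PITON_ROOT/piton/piton_settings.bash" ]; then
    source "$PITON_ROOT/piton/piton_settings.bash"
fi
```

Guarding the source with a file check lets the snippet live in a shared login script without failing on machines that lack the checkout.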

For the verification run, we used the single tile setup for OpenPiton, the regression script ‘sims’ provided in the OpenPiton bundle and the ‘tile1_mini’ regression. We tried two runs—sequential and parallel. The parallel runs were managed by SLURM.


We invoked the sequential run using the following command:

%sims -sim_type=icv -group=tile1_mini

And the distributed run using this command:

%sims -sim_type=icv -group=tile1_mini -slurm -sim_q_command=sbatch

 

Results

The ‘tile1_mini’ regression has 46 tests. Running all 46 tests sequentially took an average of 120 minutes. The parallel run with 10 auto-scaled SLURM nodes completed in 21 minutes, nearly a 6X improvement!
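The reported improvement follows directly from the wall-clock numbers, as a quick back-of-the-envelope check shows:

```shell
# Sanity-check the speedup from the reported wall-clock times.
SEQ_MIN=120   # sequential run, minutes
PAR_MIN=21    # parallel run on 10 auto-scaled nodes, minutes
awk -v s="$SEQ_MIN" -v p="$PAR_MIN" 'BEGIN { printf "speedup: %.1fx\n", s/p }'
# prints "speedup: 5.7x"
```

The speedup is below 10X despite 10 nodes because the longest single test bounds the parallel wall-clock time, and node spin-up adds some overhead.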

Figure: VM instances on the GCP console; node instances edafarm-compute0-<0-9> are created when the regression is launched.
Figure: VM instances on the GCP console as the regression winds down; note that the number of nodes has decreased.

We also wanted to highlight the value of autoscaling. The SLURM cluster was set up with two static nodes and 10 dynamic nodes. The dynamic nodes were up and active soon after the distributed run was invoked. Since nodes are shut down when they have no jobs, the cluster auto-scaled back down to zero dynamic nodes after the run was complete. The additional cost of the dynamic nodes for the duration of the simulation was $8.46.

Figure: Compute utilization report for the SLURM nodes; note the high utilization of the top five nodes.
Figure: The cost of the extra compute can also be viewed easily in the existing reports on the GCP console.

The above example shows a simple regression run on very standard machines. Scaling to more than 10 machines would improve turnaround time further. In real life, it is common for project teams to run millions of simulations. With access to elastic compute capacity, you can dramatically shorten the verification cycle and shave months off verification sign-off.

Other considerations

Typical simulation environments use commercial simulators that extensively leverage multi-core machines and large compute farms. On Google Cloud infrastructure, it is possible to build many different machine types (often referred to as “shapes”) with various numbers of cores, disk types, and memory. Further, while a simulation run can only tell you whether the simulator completed successfully, verification teams have the subsequent task of validating the results of each simulation. Infrastructure that captures simulation results across runs, and drives follow-up tasks based on the findings, is an integral part of the overall verification process. You can use Google Cloud solutions such as Cloud SQL and Bigtable to create a high-performance, highly scalable and fault-tolerant simulation and verification environment, and solutions such as AutoML Tables to infuse ML into your verification flows.
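As a small illustration of capturing results across runs, per-test outcomes can be scraped from simulation logs into a CSV that a database such as Cloud SQL could ingest. The log format here (“<test> PASS|FAIL”) is an assumption for illustration, not the actual output of the ‘sims’ script:

```shell
# Stand-in for real simulation logs, using an assumed "<test> PASS|FAIL" format.
printf 'test_a PASS\ntest_b FAIL\n' > regress.log
# Convert the log into a CSV with a header row.
echo "test,result" > results.csv
awk '{ print $1 "," $2 }' regress.log >> results.csv
cat results.csv
```

From here, a scheduled job could load results.csv into a database table and flag failing tests for rerun or triage.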


Interested? Try it out!

All the required scripts are publicly available; no cloud experience is necessary to try them out. Google Cloud provides everything you need, including free Google Cloud credits to get you up and running. Click here to learn more about high performance computing (HPC) on Google Cloud.

 

Source: Google Cloud, by Sashi Obilisetty, Chief Architect, Silicon Solutions, and Mark Mims, Solutions Architect, Big Data

