Advances in chip manufacturing processes continue to increase the number of vertices that must be handled in Optical Proximity Correction (OPC) and Mask Data Preparation (MDP). To keep processing time manageable, OPC at sub-nanometer nodes now requires tens of thousands of CPU cores, and a slight mishap can force an entire job to be re-run. The increased complexity of the computing environment, together with the length of time a job spends in it, makes a job more susceptible to failures, while the re-run penalty grows with design complexity. Checkpointing is a technique that periodically saves the state of a continuously evolving system, so that an interrupted job can later be restarted from the saved state. With checkpointing, if a running job is terminated before normal completion, whether by a hardware failure or by human intervention to free resources for an urgent job, it can resume from the checkpoint closest to the point of termination and continue to completion. Applications of checkpointing include 1) a resume flow, where a terminated job is resumed from a point close to the termination, 2) a repair flow, which fixes OPC hotspot areas without reprocessing the entire chip, and 3) cloud usage, where job migration may be necessary. In this paper, we study the runtime and storage impact of checkpointing and demonstrate virtually no runtime impact and only a minimal increase in filer storage. Furthermore, we show how checkpointing can significantly reduce runtime in OPC hotspot repair applications.
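The abstract does not describe the implementation, but the checkpoint/resume idea can be illustrated with a minimal sketch. It assumes a tile-based OPC job whose completed-tile set can be serialized; the names (`process_tile`, `checkpoint.pkl`, `tile.id`) are hypothetical stand-ins, not the paper's actual interfaces.

```python
import os
import pickle

CHECKPOINT = "checkpoint.pkl"  # hypothetical checkpoint file on the filer

def load_checkpoint():
    """Return the set of tile ids already completed, if a checkpoint exists."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return set()

def save_checkpoint(done):
    """Persist progress atomically so a killed job never sees a partial file."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(done, f)
    os.replace(tmp, CHECKPOINT)  # atomic rename

def run_job(tiles, process_tile):
    done = load_checkpoint()      # resume flow: skip tiles finished earlier
    for tile in tiles:
        if tile.id in done:
            continue              # already corrected before the interruption
        process_tile(tile)        # OPC work for this tile (placeholder)
        done.add(tile.id)
        save_checkpoint(done)     # small, incremental state on the filer
```

Under these assumptions, the repair flow falls out naturally: restarting with only the hotspot tiles marked as not done reprocesses those areas without touching the rest of the chip.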
As the semiconductor manufacturing industry moves toward advanced nodes with increasingly complex designs, the runtimes of mask data processing steps grow enormously. This is especially true for Mask Process Correction (MPC) and Mask Data Preparation (MDP, or fracture). Conventionally, in a high-volume production flow, these steps execute sequentially: each step waits for the preceding step to complete. For example, if MPC and fracture are both run on a design, fracture commences only once MPC has finished on the entire design. When the next step does not require the whole design to be processed before it can begin, the steps can instead be pipelined to reduce total turn-around time. An integrated, pipelined flow hides the runtime of downstream fracture processing within that of MPC, up to the capacity of the computing resources. The integration is achieved with a task-based pipeline that produces and consumes data without intermediate file exchange; processing stays in memory, eliminating intermediate disk I/O and opening room for further optimization. This paper presents a comprehensive study of an integrated Curvilinear MPC (CLMPC) + IMS Multibeam Fracture (MBW) flow and demonstrates its runtime benefit, with minimal impact on accuracy and file size, over a conventional sequential flow [1,2]. We explore the runtime advantage of pipelining on a set of representative curvilinear and rectilinear designs by comparing sequential versus pipelined execution of CLMPC and curvilinear fracture. Eleven designs are investigated, illustrating a substantial reduction in runtime while maintaining high-quality results.
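The task-based, in-memory hand-off can be sketched with a bounded producer-consumer queue, where each MPC-corrected tile flows directly to fracture with no intermediate file exchange. This is a minimal sketch under assumed interfaces; `mpc_correct` and `fracture` are illustrative placeholders, not the actual CLMPC/MBW APIs.

```python
import queue
import threading

SENTINEL = None  # signals end of the tile stream

def mpc_stage(tiles, out_q, mpc_correct):
    """Producer: push each corrected tile downstream as soon as it is done."""
    for tile in tiles:
        out_q.put(mpc_correct(tile))   # in-memory hand-off, no disk I/O
    out_q.put(SENTINEL)

def fracture_stage(in_q, fracture):
    """Consumer: fracture tiles while MPC is still running on later tiles."""
    while (tile := in_q.get()) is not SENTINEL:
        fracture(tile)

def run_pipeline(tiles, mpc_correct, fracture):
    q = queue.Queue(maxsize=64)        # bounded: caps in-flight memory use
    producer = threading.Thread(target=mpc_stage,
                                args=(tiles, q, mpc_correct))
    producer.start()
    fracture_stage(q, fracture)        # overlaps fracture with MPC
    producer.join()
```

The bounded queue is the key design choice in such a pipeline: it lets fracture start as soon as the first tile is corrected while preventing a fast producer from buffering the whole design in memory.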
As the computational requirements of post-tapeout (PTO) flows increase at the 7nm and below technology nodes, the scalability of the computational tools must grow correspondingly to reduce the turn-around time (TAT) of the flows. Exploiting design hierarchy has been one proven method of providing sufficient partitioning to enable PTO processing. However, as data moves through the PTO flow, its effective hierarchy is reduced, a reduction necessary to achieve the desired accuracy. Moreover, the sequential nature of the PTO flow is inherently non-scalable. To address these limitations, we propose a quasi-hierarchical solution that combines multiple levels of parallelism to increase the scalability of the entire PTO flow. In this paper, we describe the system and present experimental results demonstrating the runtime reduction achieved through scalable processing with thousands of computational cores.
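The abstract leaves the architecture unspecified, but the quasi-hierarchical idea of combining levels of parallelism can be sketched as follows: unique cells are processed once and reused across instances, while flattened regions (where hierarchy was sacrificed for accuracy) are tiled and fanned out across a worker pool. All names here are hypothetical; this is one plausible reading, not the paper's system.

```python
from concurrent.futures import ProcessPoolExecutor

def process_layout(cells, flat_tiles, correct_cell, correct_tile, workers=8):
    """Two levels of parallelism: unique-cell reuse plus tile-level fan-out."""
    results = {}
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Level 1: each unique cell is corrected once; all of its placements
        # reuse the result, preserving the hierarchy that still exists.
        cell_futs = {name: pool.submit(correct_cell, cell)
                     for name, cell in cells.items()}
        # Level 2: flattened regions are split into independent tiles and
        # processed in parallel, so lost hierarchy does not serialize the flow.
        tile_futs = [pool.submit(correct_tile, tile) for tile in flat_tiles]
        results["cells"] = {n: f.result() for n, f in cell_futs.items()}
        results["tiles"] = [f.result() for f in tile_futs]
    return results
```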