Design and implementation of a programming language on virtual machines

Regardless of the target runtime, a language is first specified [1] with a set of formal syntax and semantics, usually known as the grammar of the language. The most important goal of a language specification is to create a mutual ground for understanding, discussing, and implementing the language. Accordingly, a language implementation is essentially another software system built to run the programs expressed according to the specification [2]. There are two general schemes in this regard, namely translation and interpretation. In translation, the program is translated directly into code the underlying machine understands, while in interpretation the program is essentially run statement by statement through a runtime environment. Some languages, such as Java, take advantage of both methods; this is where the concept of a virtual machine comes in [3].

Classically, a language comes with a set of tools including a compiler or an interpreter. The programmer writes a program and then compiles it into directly executable machine code. In the case of VM languages, however, in the first pass the programmer translates (compiles) the code into an intermediate bytecode [4] understandable by the virtual machine. In the second pass, the virtual machine is responsible for running the bytecode on the underlying platform, which is a form of interpretation. The first pass is the same and produces the same bytecode on all platforms, so the programming language becomes platform-independent. For the language designers, however, this creates the additional task of implementing a virtual machine for every target platform and OS.

VM languages such as Java have by now created such a good platform that they have become a primary target in the design and implementation of new programming languages. That is, instead of compiling the new language directly to machine code, designers tend to produce output that is compatible with the JVM, so the JVM will take care of executing the program. Two different approaches arise in this regard:

  1. Directly create bytecode for JVM
  2. Create Java sources and then compile them into bytecode for JVM

Case Study: Groovy

Groovy is a dynamic language that supports and favors the functional programming paradigm and is now widely used in domain-specific applications. It is interesting to take a look at its design and implementation. Groovy makes good use of ANTLR [5], a set of tools for language processing. For instance, the Groovy grammar and syntax are specified using ANTLR's grammar language. Based on this grammar, ANTLR can produce various tools, including a lexer and a parser; both are fundamental elements required to write a compiler for any language. Another advantage of ANTLR is that it lets you choose what to produce from the grammar provided. One of the options is the abstract syntax tree (AST) [6], an intermediate data structure that is very useful during the parsing and compilation of source code [7]. After the Groovy compiler has scanned the source code of a program, it receives an AST instance for the program through a parser plugin generated by ANTLR. Briefly, the compilation of a source code unit in Groovy proceeds as follows:

  1. Scanning the source and parsing it using the parser plugin generated by ANTLR based on the grammar
  2. Obtaining the AST instance for the source code unit
  3. Applying additional phases such as code optimization, code semantic analysis and verification
  4. Generating output

The whole process is heavily based on the Visitor pattern [8]. In step 4, the Groovy compiler uses the Visitor pattern together with a library called ASM [9] to generate bytecode for the JVM. ASM is a library that helps manipulate or dynamically generate bytecode, i.e. the ".class" files that hold instructions for the JVM. In the Groovy compiler, the AST instance is visited throughout; since each node of the tree is known to have a specific representation in JVM bytecode, after all the nodes have been visited, the root of the tree can collect the bytecode for the whole source unit. This is a very neat and modular way of creating a language on top of the Java Virtual Machine. In addition, the Groovy compiler also has the option of generating Java source alongside the bytecode: once the AST instance is at hand, there are a number of things that can easily be done with it.
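The visitor-based collection step can be sketched in plain Java. The AST classes below are hypothetical and greatly simplified, not Groovy's actual node types; the point is only the shape of the idea: each node kind maps to a known instruction sequence, and visiting the tree from the root collects the instructions for the whole unit.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal, hypothetical AST -- not Groovy's real classes.
interface AstNode {
    void accept(BytecodeVisitor v);
}

class ConstantNode implements AstNode {
    final int value;
    ConstantNode(int value) { this.value = value; }
    public void accept(BytecodeVisitor v) { v.visitConstant(this); }
}

class AddNode implements AstNode {
    final AstNode left, right;
    AddNode(AstNode left, AstNode right) { this.left = left; this.right = right; }
    public void accept(BytecodeVisitor v) { v.visitAdd(this); }
}

// The visitor walks the tree; each node type emits its bytecode-like instructions.
class BytecodeVisitor {
    final List<String> instructions = new ArrayList<>();

    void visitConstant(ConstantNode n) {
        instructions.add("LDC " + n.value);  // push the constant on the operand stack
    }

    void visitAdd(AddNode n) {
        n.left.accept(this);                 // emit code for both operands first
        n.right.accept(this);
        instructions.add("IADD");            // then the add instruction itself
    }
}

public class AstDemo {
    public static void main(String[] args) {
        AstNode expr = new AddNode(new ConstantNode(1), new ConstantNode(2)); // 1 + 2
        BytecodeVisitor visitor = new BytecodeVisitor();
        expr.accept(visitor);
        System.out.println(String.join("\n", visitor.instructions)); // LDC 1 / LDC 2 / IADD
    }
}
```

The real Groovy compiler emits actual `.class` bytes through ASM's `ClassWriter` instead of strings, but the traversal structure is the same.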

Case Study: Scala

Scala is a statically typed, functional, scalable language that, like Groovy, runs on the JVM. Speaking of language implementation, however, Scala takes another interesting approach. Being a functional language itself, Scala takes advantage of this in compiling source units: it introduces its own parser, actually written in Scala. The parser declares all the syntax rules defined in the language; there is also a repository of parser rules for Scala [10]. On top of this, Scala provides both a compiler and an interpreter for the language.

Finally, the case studies show that, when implementing a language on top of a virtual machine such as the JVM, there are decisions to make beyond the classical ones. Groovy and Scala each take a different approach; while different, each has shown its own advantages and applications.

References

  [1]: http://en.wikipedia.org/wiki/Programming_language_specification
  [2]: http://en.wikipedia.org/wiki/Programming_language_implementation
  [3]: http://en.wikipedia.org/wiki/Virtual_machine
  [4]: http://en.wikipedia.org/wiki/Bytecode
  [5]: http://www.antlr.org/
  [6]: http://en.wikipedia.org/wiki/Abstract_syntax_tree
  [7]: http://www.antlr.org/wiki/display/ANTLR3/Interfacing+AST+with+Java
  [8]: http://en.wikipedia.org/wiki/Visitor_pattern
  [9]: http://asm.ow2.org/
  [10]: http://code.google.com/p/scala-rules/

How does the JVM map a Java thread to a native thread?

I have recently been studying programming language design on multicore platforms. To design such a language, studying other platforms such as Java or C++ helps in understanding the concepts. There are a few questions it is interesting to have answers for:

  1. How does the JVM map a Java thread to a native thread to be executed by the underlying operating system?
  2. How are language design and implementation affected by thread scheduling and management in the operating system?

Starting with the first one: Java threads actually have two faces. One is what is seen in the Java programming language and used by the programmer. The other is the native implementation that is provided by the Java platform and managed by the JVM. Java introduces JNI [1], the Java Native Interface. Through JNI, a programmer can have a class that is partly written in Java and partly in some other language such as C++. JNI is used to implement parts of the Java Thread class in C++; this implementation provides the methods and services required by Java Thread features but not implemented in Java itself.
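This split is observable from the Java side with reflection: several methods of `java.lang.Thread` are declared `native`, meaning their bodies live in the JVM's native sources rather than in Java. A small sketch (the exact list of names varies between JDK versions):

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class NativeMethodsDemo {
    public static void main(String[] args) {
        // List the methods of java.lang.Thread whose implementation is native,
        // i.e. provided by the JVM through JNI rather than written in Java.
        for (Method m : Thread.class.getDeclaredMethods()) {
            if (Modifier.isNative(m.getModifiers())) {
                System.out.println(m.getName());
            }
        }
    }
}
```

Running this prints a handful of method names, confirming that part of the Thread class's behavior is delegated to native code.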

While JNI brings some of the implementation of Java Thread into C++, the source of the JDK shows that there is also an abstraction of threads, written entirely in C++, providing the functionality required to work with threads. This abstraction is used by a JVM instance. As the JVM is implemented in a platform-specific way, each platform provides a set of APIs and libraries for working with OS threads. Thus, the abstraction is used by the JVM to bridge the Java Thread model to an OS-specific model for execution. For the same reason, a JVM implementation comes with platform-specific sources that handle the variations in CPU and OS. For instance, there is a `linux_os_cpu.cpp` source file in the JVM whose name denotes an implementation specific to a Linux-based operating system and its CPU requirements.

So, let's fix the configuration to a multicore processor on a Linux-based operating system with a corresponding JVM implementation. On one level, a JVM instance handles a Java Thread and converts it into a native OS thread to be executed. On another level, Linux provides a high-level abstraction of threads for use by applications. The modern library that handles this is the Native POSIX Thread Library (NPTL) [2], a C library that enables the Linux kernel to efficiently run threads written against the POSIX Threads standard (PThreads) [3]. Thus, the JVM implementation actually takes advantage of a PThreads implementation, NPTL, and maps a Java Thread to a thread instance that is understandable and executable by the Linux kernel.

Speaking of the kernel: the Linux kernel understands a schedulable entity called a task, which is managed and executed according to the scheduling algorithm used in the kernel. So here we have the problem of mapping an application thread to a kernel schedulable entity. There are three models that can be discussed: kernel-level threading (1:1), user-level threading (N:1), and hybrid threading (N:M) [4]. In kernel-level threading, each thread in the application and user space corresponds to exactly one schedulable entity in the kernel space. In user-level threading, all threads of an application are mapped onto a single kernel thread. Hybrid threading is a mixture of both. Browsing through OS implementation history reveals that most implementations have converged on the 1:1 model, since it utilizes the processing power well and relieves languages and libraries of the task of scheduling. It is also argued that implementing the N:M model is costly and complex, and that the operating system usually provides better-optimized services such as scheduling and context switching. The Linux kernel uses a 1:1 mapping model, and since the JVM simply maps each Java Thread to one native OS thread, the JVM follows the 1:1 model as well. So, if a Java program is written to utilize several processors, it is guaranteed that the Linux kernel will maximize the simultaneous use of the different cores as far as possible. Since Java 1.2, the JVM has put aside the concept of Green Threads [5] and instead uses native thread features to map Java Threads to native operating system threads.
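Because each `java.lang.Thread` becomes exactly one kernel task under the 1:1 model, simply starting several threads is enough to let the Linux scheduler spread the work over the cores. A minimal sketch, where the way the sum is partitioned across workers is purely illustrative:

```java
public class CoreDemo {
    // Split the sum 0..limit-1 over `threads` Java threads; under the 1:1 model
    // each worker is one kernel task that the scheduler can place on its own core.
    static long parallelSum(long limit, int threads) throws InterruptedException {
        long[] partial = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                long sum = 0;
                // This worker's slice: every threads-th number starting at id.
                for (long k = id; k < limit; k += threads) sum += k;
                partial[id] = sum;
            });
            workers[i].start();
        }
        long total = 0;
        for (int i = 0; i < threads; i++) {
            workers[i].join();     // wait for each kernel task to finish
            total += partial[i];
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(parallelSum(10_000_000L, cores)); // prints 49999995000000
    }
}
```

On a multicore Linux box, a thread dump or `top -H` during the run shows one native thread per worker, which is the 1:1 mapping in action.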

Throughout the discussion we have mentioned the scheduling of kernel entities (tasks) for execution. As of Linux kernel 2.6.23, the kernel uses a scheduling algorithm called the Completely Fair Scheduler (CFS) [6], which keeps the most important scheduling operations very cheap [7]. It also performs load balancing among the processors, redistributing the workload when necessary [7]. This strengthens the case against the N:M model even further, as the kernel's own features promise both more functionality and more efficiency. On the other hand, when a process is created it will hold a number of threads of execution. Each process, and its child threads, is given a processor affinity [8], a simple map showing how likely this process (or thread) is to be executed on a specific core. This is done to minimize costs when a thread is reactivated on the same core while some of its data is still nearby in the cache.

To conclude, since the native operating systems are doing quite well on multicore in terms of abstraction and performance, it is worth considering these facts when designing and implementing a language that targets the challenges of programming on multicore platforms. Since the mapping of application threads to kernel threads is handled so neatly, it might not be wise to meddle with such problems in the language design.

Accordingly, I also posted two questions ([9] and [10]) on Stackoverflow.com and received good answers.

  [1]: http://en.wikipedia.org/wiki/Java_Native_Interface
  [2]: http://en.wikipedia.org/wiki/Native_POSIX_Thread_Library
  [3]: http://en.wikipedia.org/wiki/POSIX_Threads
  [4]: http://en.wikipedia.org/wiki/Thread_%28computer_science%29#Models
  [5]: http://en.wikipedia.org/wiki/Green_threads
  [6]: http://en.wikipedia.org/wiki/Completely_Fair_Scheduler
  [7]: http://www.ibm.com/developerworks/linux/library/l-scheduler/
  [8]: http://en.wikipedia.org/wiki/Processor_affinity
  [9]: http://stackoverflow.com/questions/4203021/jvm-implementation-of-thread-work-distribution-and-multicore
  [10]: http://stackoverflow.com/questions/4249124/hybrid-thread-model-mn-implementation

Multicore Programming

While searching for a master's thesis topic, I quite accidentally came across a subject called multicore programming, which has been a hot research topic in programming languages and software development for several years now. What makes the story interesting is that the ideas of parallel programming were proposed years ago, but at the time, for lack of the necessary hardware and processing power, programming methods and languages grew up around the view of the processor as a single unit. Since around 2000, as hardware has advanced dramatically in processing power and a multicore processor has become entirely ordinary, this old topic has come back to the table, and many are asking how software should be prepared for this era.

There is a similar story from the past: when the object-oriented paradigm was introduced and the software world moved toward it, history shows that part of the shift was about establishing a culture of using that paradigm. That is, it took time first for programmers to be convinced that it was better than the older, structured approaches, and then for the transformation to happen whereby nearly all programmers, from the moment they start thinking about and writing a piece of software, express their ideas and solutions in that framework. For multicore programming, the same hesitation currently seems to exist in the programming community, and it will take time before developers approach problems and write software with this mindset.

On the other hand, a survey of existing solutions shows that a great deal of effort has gone into this area; these efforts mostly take the form either of new languages built on top of object-oriented languages or of supplementary libraries for a target language. New approaches have also been proposed, such as the Actor model or Software Transactional Memory, each with its own strengths and weaknesses. The declarative versus imperative viewpoints are another source of variation among the research efforts in this area. The interesting point is that nearly all research on this topic unanimously favors functional programming.

One of the challenges in this field is the migration of the existing systems of large companies onto multicore platforms. The challenge matters because many are looking for ways to make this transition either without changing the software or with minimal changes. For this reason, for example, very extensive work has been done on the Java platform; the topic has been studied in a language such as Scala using Actors, and in a library such as Multiverse through Software Transactional Memory. In the same vein, it is also interesting at which level programming languages support this technology, from individual instructions up to program objects, each level having its own applications.

In this area, The Art of Multiprocessor Programming is a good book with suitable content for getting deeply acquainted with these topics.

The Difference Between Heap and Stack Memory in Java

According to the JVM specification, whenever a thread starts, it is allocated a region of memory called the stack. This memory holds the thread's local variables and intermediate results, and it is also used in method invocation. Its size is set with the -Xss flag. When a new thread is starting, or a thread's computation needs more stack space than the limit allows, the error thrown is a StackOverflowError.

On the other side, whenever a JVM starts, it claims a region of memory in which objects are instantiated from classes: the heap. This memory is shared among all the threads of the JVM. Its size is configured with the -Xms and -Xmx flags; the separate permanent generation, which holds class metadata, is sized with -XX:PermSize and -XX:MaxPermSize. If a computation needs more heap than the limit and the allocation cannot be satisfied, the error raised is an OutOfMemoryError.
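The stack side of this difference is easy to demonstrate: a recursion with no base case keeps pushing frames onto the current thread's stack until it overflows. A small sketch (the flags mentioned in the comments are standard HotSpot options):

```java
public class MemoryDemo {
    static int depth = 0;

    // Each call pushes one frame onto the current thread's stack; with no base
    // case, the per-thread stack (sized with -Xss) eventually overflows.
    static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("stack overflowed after " + depth + " frames");
        }
        // Objects created with `new` live on the shared heap, bounded by -Xms/-Xmx;
        // exhausting it raises OutOfMemoryError, not StackOverflowError.
        byte[] onHeap = new byte[1024];
        System.out.println(onHeap.length); // prints 1024
    }
}
```

Running the same program with a smaller stack, e.g. `java -Xss256k MemoryDemo`, makes the overflow happen after noticeably fewer frames.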

نرم‌نویس

I wrote my first program in high school, on a pack of loose-leaf binder paper, in Pascal. Here I write about anything related to software.