Lua Source Code Reading (Part 2)

Lua source code series:
Lua Source Code Reading Plan
Lua Source Code Reading (Part 1)
Lua Source Code Reading (Part 2)
Lua Source Code Reading (Part 3)
Lua Source Code Reading (Part 4)
Lua Source Code Reading (Part 5)
Lua Source Code Reading (Part 6)

After reading Yun Feng’s section on Lua strings, I have a better understanding of how Lua handles strings.

String

Internally, Lua divides strings into short strings and long strings. The distinction is invisible to Lua code; it is purely an optimization inside the interpreter.

Addendum¹: The maximum length of a short string is set by the LUAI_MAXSHORTLEN macro (40 by default). Longer strings are created directly by createstrobj, without going through the intern table.

Long and short strings are handled differently during string creation and comparison.

When a string is created, a short string is interned immediately, and the extra field of the TString structure marks whether it is a reserved word. A long string is simply copied into newly allocated memory, and its extra field records whether its hash has been computed yet, so hashing is deferred until the hash is actually needed, for example when the string is used as a table key.
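The creation path described above can be sketched in plain C. This is a simplified model under stated assumptions, not Lua's actual code: ToyString, toy_newlstr, toy_gethash, the fixed bucket count, and the toy hash function are all hypothetical names and choices made for illustration, and MAXSHORTLEN merely mirrors the default value of LUAI_MAXSHORTLEN.

```c
#include <stdlib.h>
#include <string.h>

#define MAXSHORTLEN 40  /* assumption: mirrors the default LUAI_MAXSHORTLEN */
#define NBUCKETS 64     /* assumption: fixed-size intern table, never resized */

/* Hypothetical, simplified stand-in for Lua's TString. */
typedef struct ToyString {
    struct ToyString *next;  /* chain inside one intern-table bucket */
    int is_short;            /* 1 for short strings, 0 for long strings */
    int extra;               /* short: reserved-word flag; long: "hash computed" flag */
    unsigned int hash;
    size_t len;
    char data[];             /* string bytes follow the header */
} ToyString;

static ToyString *buckets[NBUCKETS];

/* Toy hash for illustration only; not guaranteed to match luaS_hash. */
static unsigned int toy_hash(const char *s, size_t len) {
    unsigned int h = (unsigned int)len;
    for (size_t i = 0; i < len; i++)
        h ^= (h << 5) + (h >> 2) + (unsigned char)s[i];
    return h;
}

static ToyString *toy_alloc(const char *s, size_t len, int is_short, unsigned int h) {
    ToyString *ts = malloc(sizeof(ToyString) + len + 1);
    if (ts == NULL) return NULL;
    ts->next = NULL;
    ts->is_short = is_short;
    ts->extra = 0;
    ts->hash = h;
    ts->len = len;
    memcpy(ts->data, s, len);
    ts->data[len] = '\0';
    return ts;
}

/* Short strings are interned; long strings are copied with hashing deferred. */
ToyString *toy_newlstr(const char *s, size_t len) {
    if (len > MAXSHORTLEN)
        return toy_alloc(s, len, 0, 0);  /* long: extra == 0 means "no hash yet" */
    unsigned int h = toy_hash(s, len);
    ToyString **slot = &buckets[h % NBUCKETS];
    for (ToyString *p = *slot; p != NULL; p = p->next)
        if (p->len == len && memcmp(p->data, s, len) == 0)
            return p;                    /* already interned: reuse the object */
    ToyString *ts = toy_alloc(s, len, 1, h);
    if (ts == NULL) return NULL;
    ts->next = *slot;                    /* link the new string into its bucket */
    *slot = ts;
    return ts;
}

/* Lazy hashing: a long string's hash is computed on first request only. */
unsigned int toy_gethash(ToyString *ts) {
    if (!ts->is_short && ts->extra == 0) {
        ts->hash = toy_hash(ts->data, ts->len);
        ts->extra = 1;
    }
    return ts->hash;
}
```

In this model, creating the same short string twice returns the same pointer, while creating the same long string twice returns two separate objects whose hashes stay uncomputed until toy_gethash is first called.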

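When comparing strings, two short strings are equal exactly when their pointers are equal, because each short string is interned only once. Two long strings are equal if they are the same object, or if their lengths match and their contents compare equal byte by byte.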
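The comparison rule can be illustrated with a small C sketch. ToyStr, toy_eqlngstr, and toy_eqstr are hypothetical names for this example, and the struct is a deliberately reduced model rather than Lua's TString; it assumes that short strings have already been interned, so one short string value corresponds to one object.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced model of a Lua string object. */
typedef struct ToyStr {
    int is_short;      /* 1: short (assumed interned), 0: long */
    size_t len;
    const char *data;
} ToyStr;

/* Long strings: same object, or same length and same bytes. */
int toy_eqlngstr(const ToyStr *a, const ToyStr *b) {
    return a == b ||
           (a->len == b->len && memcmp(a->data, b->data, a->len) == 0);
}

/* A short and a long string are never equal; otherwise dispatch on the kind. */
int toy_eqstr(const ToyStr *a, const ToyStr *b) {
    if (a->is_short != b->is_short)
        return 0;
    /* Short strings: pointer identity is enough because of interning. */
    return a->is_short ? (a == b) : toy_eqlngstr(a, b);
}
```

The pointer comparison for short strings is only valid because interning guarantees uniqueness; two separately built short-string objects with equal contents, which cannot occur in a real intern table, would compare as different in this model.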

Addendum²: The long/short string optimization was added starting from Lua 5.2.1. Before that, all strings were stored in a global hash table.

Addendum³: Building a string with repeated .. is inefficient, because each concatenation allocates a new string and copies both operands into it. When joining many pieces, prefer table.concat⁴ or string.format⁵.
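To make the cost difference concrete, here is a minimal C sketch that counts how many bytes are copied by each strategy. Both functions are hypothetical illustrations: concat_naive imitates what repeated .. does (a fresh allocation and a full copy per step), and concat_buffered is a simplification standing in for a buffer-based approach such as table.concat, not a copy of its implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Repeated concatenation: each step allocates a new result and re-copies
   everything accumulated so far. */
char *concat_naive(const char **parts, size_t n, size_t *bytes_copied) {
    char *acc = malloc(1);
    if (acc == NULL) return NULL;
    acc[0] = '\0';
    size_t acc_len = 0;
    *bytes_copied = 0;
    for (size_t i = 0; i < n; i++) {
        size_t plen = strlen(parts[i]);
        char *next = malloc(acc_len + plen + 1);
        if (next == NULL) { free(acc); return NULL; }
        memcpy(next, acc, acc_len);                  /* copy the old result again */
        memcpy(next + acc_len, parts[i], plen + 1);  /* append the piece and its NUL */
        *bytes_copied += acc_len + plen;
        free(acc);                                   /* the old result becomes garbage */
        acc = next;
        acc_len += plen;
    }
    return acc;
}

/* Buffered approach (simplified): measure first, allocate once, copy each piece once. */
char *concat_buffered(const char **parts, size_t n, size_t *bytes_copied) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += strlen(parts[i]);
    char *out = malloc(total + 1);
    if (out == NULL) return NULL;
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        size_t plen = strlen(parts[i]);
        memcpy(out + pos, parts[i], plen);
        pos += plen;
    }
    out[pos] = '\0';
    *bytes_copied = pos;
    return out;
}
```

For the four one-byte pieces "a", "b", "c", "d", the naive version copies 1 + 2 + 3 + 4 = 10 bytes while the buffered version copies 4; with n equal-sized pieces the naive count grows quadratically in n.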

userdata

Lua also has the userdata type, represented internally by the Udata structure. It is mainly used where Lua interacts with C and C++, and it lets C data live in memory managed by Lua’s GC. In C, lua_newuserdata() allocates a block of the requested size inside Lua and returns a pointer to it. This is similar to calling malloc, except that you never call free yourself; Lua’s GC reclaims the block once it is unreachable. A good example is Lua’s io library, which stores the C FILE * in a userdata. Because it implements a __gc metamethod, the file handle is closed automatically when the userdata is collected.
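The ownership pattern can be sketched without the Lua headers. The code below is a toy model only: toy_newuserdata, toy_collect_all, and toy_newfile are hypothetical names, the "collector" simply treats every block as unreachable, and the finalizer callback plays the role that a __gc metamethod plays in real Lua. It is not the Lua C API and not how liolib is written.

```c
#include <stdio.h>
#include <stdlib.h>

typedef void (*ToyFinalizer)(void *block);

/* Hypothetical header placed in front of every block handed to the caller. */
typedef struct ToyUdata {
    struct ToyUdata *next;  /* list of all live blocks, owned by the toy collector */
    ToyFinalizer gc;        /* stand-in for a __gc metamethod; may be NULL */
    size_t size;
} ToyUdata;

static ToyUdata *all_udata = NULL;
static int files_closed = 0;

/* Like malloc, but the collector (not the caller) is responsible for freeing. */
void *toy_newuserdata(size_t size, ToyFinalizer gc) {
    ToyUdata *u = malloc(sizeof(ToyUdata) + size);
    if (u == NULL) return NULL;
    u->gc = gc;
    u->size = size;
    u->next = all_udata;
    all_udata = u;
    return u + 1;           /* the caller sees only the payload after the header */
}

/* Stand-in for a GC cycle in which every block turned out to be unreachable. */
size_t toy_collect_all(void) {
    size_t n = 0;
    while (all_udata != NULL) {
        ToyUdata *u = all_udata;
        all_udata = u->next;
        if (u->gc != NULL)
            u->gc(u + 1);   /* run the finalizer before releasing the memory */
        free(u);
        n++;
    }
    return n;
}

/* Finalizer for a block holding a FILE *: close the handle if it is still open. */
static void file_gc(void *block) {
    FILE **pf = block;
    if (*pf != NULL) {
        fclose(*pf);
        *pf = NULL;
        files_closed++;
    }
}

/* Store a FILE * in a collector-managed block, as the io library example suggests. */
FILE **toy_newfile(void) {
    FILE **pf = toy_newuserdata(sizeof(FILE *), file_gc);
    if (pf != NULL)
        *pf = tmpfile();    /* may be NULL if no temporary file can be created */
    return pf;
}
```

The caller of toy_newfile never calls fclose or free; both happen inside toy_collect_all, which is the division of responsibility the paragraph above describes for real userdata.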

https://github.com/xiaocang/lua-5.2.2_with_comments/releases/tag/lua_string_02

References:

Source Code Implementation of String Type in Lua

Added 2018-11-01 ↩︎
Added 2018-11-01 ↩︎
Added 2018-11-01 ↩︎
table.concat does not apply .. once per element in Lua code; it is implemented in C on top of luaL_Buffer, which reduces the number of concatenations and intermediate strings, and therefore the GC work. The cited post describes a stack-based scheme (this matches the luaL_Buffer of Lua 5.1; Lua 5.2 instead grows a single buffer). Pieces are kept on a stack, and each newly pushed piece is compared in length with the one below it. If the new piece is longer, the two are concatenated into one string that replaces them, and the check continues downward until a longer string or the bottom of the stack is reached. This keeps the longest string at the bottom, so the stack is shaped like a pyramid, and at the end everything left on the stack is concatenated into the final string. Author: AaronChanFighting. Source: CSDN. Original: https://blog.csdn.net/qq_26958473/article/details/79392222 ↩︎
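string.format formats each item into a fixed 512-byte buffer (MAX_ITEM in lstrlib.c), which limits the size of a single formatted item. A plain %s with no precision is an exception: a string of 100 bytes or more is appended as is, without passing through that buffer. ↩︎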
