Compositor is doing a ton of string manipulation.
The core model class that does most of it, SourceModel (mentioned here), is still based around Foundation’s NSMutableString.
With the recent improvements to Swift’s native string type, I plan to move the core model code away from NSMutableString, towards Swift native strings, to take advantage of the more compact UTF-8 in-memory representation (as compared to NSString’s UTF-16 in-memory format) and benefit from certain performance flags (e.g., isASCII) and corresponding optimized code paths.
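To make the size difference concrete, here is a minimal sketch that compares code-unit counts for the same text in both encodings. The helper name storageEstimate and the sample strings are made up for illustration, and the byte figures are plain code-unit arithmetic, not measurements of actual heap usage.

```swift
import Foundation

/// Hypothetical helper: estimates payload bytes for UTF-8 and UTF-16 storage
/// from code-unit counts (object headers and other overhead are ignored).
func storageEstimate(for text: String) -> (utf8Bytes: Int, utf16Bytes: Int, isASCII: Bool) {
    let utf8Bytes = text.utf8.count                     // 1 byte per UTF-8 code unit
    let utf16Bytes = NSString(string: text).length * 2  // 2 bytes per UTF-16 code unit
    let isASCII = text.utf8.allSatisfy { $0 < 0x80 }    // every byte below 0x80
    return (utf8Bytes, utf16Bytes, isASCII)
}

let ascii = storageEstimate(for: "Hello")
// "Grüße", spelled with escapes so the source file's normalization does not matter.
let german = storageEstimate(for: "Gr\u{FC}\u{DF}e")
print(ascii, german)
```

For pure ASCII text the UTF-8 payload is half the size of the UTF-16 one; for text with many non-ASCII characters the gap narrows.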
I quickly realized I didn’t fully understand what happens under the hood when using NSStrings and native Swift strings in code, so I took a closer look at how the Swift compiler translates code that involves strings. I learned some techniques along the way that may be helpful for other applications, too, so I wrote up what I learned.
Examining what the Swift Compiler emits
Let’s throw some code at the Swift compiler!
- Go to the awesome Compiler Explorer
- Select Swift from the language drop-down
- Select the x86-64 swiftc 5 compiler
- Set the optimization to -Onone
Paste the following snippet into Compiler Explorer:
import Foundation

func testCastSwiftStringToNSString() {
    let swiftString = "Hello" // #1
    let nsstring = swiftString as NSString // #2
}
Note: I’m using Xcode 10.2.1 in what follows.
Creating a Swift string from a string literal (#1)
Let’s dissect this. The line
let swiftString = "Hello" // #1
basically translates to
call ($sSS21_builtinStringLiteral17utf8CodeUnitCount7isASCIISSBp_BwBi1_tcfC)@PLT
Some notes on reading Compiler Explorer’s output:
- Swift-mangled names begin with a $s prefix
- The fC suffix denotes an allocating constructor
- The numbers are the lengths of the identifiers:
  - 21: _builtinStringLiteral
  - 17: utf8CodeUnitCount
  - 7: isASCII
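The length-prefix convention from the notes above can be sketched in a few lines of Swift. This is a deliberately naive scanner, not a demangler: the function name lengthPrefixedIdentifiers is made up, and it only handles a fragment that consists of plain length/identifier runs. Fed the full mangled name, it would also trip over digits that belong to type codes (such as the 1 in Bi1_), so a real tool like swift-demangle remains the right choice for anything serious.

```swift
/// Naive sketch: extracts `<length><identifier>` runs from a fragment of a
/// mangled Swift name. Characters that are not part of such a run are skipped.
func lengthPrefixedIdentifiers(in fragment: String) -> [String] {
    let chars = Array(fragment)
    var identifiers: [String] = []
    var i = 0
    while i < chars.count {
        // Read a run of ASCII digits as the decimal length prefix.
        var length = 0
        var sawDigit = false
        while i < chars.count, chars[i].isASCII, let digit = chars[i].wholeNumberValue {
            length = length * 10 + digit
            sawDigit = true
            i += 1
        }
        if !sawDigit {
            i += 1 // not a length prefix, skip this character
            continue
        }
        // Take the next `length` characters (or what is left) as the identifier.
        let end = min(i + length, chars.count)
        if end > i {
            identifiers.append(String(chars[i..<end]))
        }
        i = end
    }
    return identifiers
}

let labels = lengthPrefixedIdentifiers(
    in: "21_builtinStringLiteral17utf8CodeUnitCount7isASCII")
print(labels)
```

Note that the 8 in utf8CodeUnitCount causes no trouble, because it sits inside an identifier that has already been consumed by its length.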
Let’s see what this does at runtime. In Xcode, set a breakpoint on this line
let swiftString = "Hello" // #1
and launch the target in the debugger (I’m using a unit test target to run the code). When execution stops at this breakpoint, we’ll drop into Xcode’s console and continue on the lldb level.
Now, how do we set a breakpoint for the $sSS21_builtinStringLiteral17utf8CodeUnitCount7isASCIISSBp_BwBi1_tcfC call (which is presumably an initializer invocation)?
First, we use lldb’s image lookup command with a regular expression for the name, passing a regex that describes what we are looking for (i.e., _builtinStringLiteral.*utf8CodeUnitCount.*isASCII):
(lldb) image lookup -rn "_builtinStringLiteral.*utf8CodeUnitCount.*isASCII"
5 matches found in /Users/ktraunmueller/Library/Developer/Xcode/DerivedData/Strings-afzvxwjhkbfwrjbkghdpnalizovg/Build/Products/Debug-iphonesimulator/StringTests.xctest/Frameworks/libswiftCore.dylib:
Address: libswiftCore.dylib[0x0000000000177060] (libswiftCore.dylib.__TEXT.__text + 1523184)
Summary: libswiftCore.dylib`protocol witness for Swift._ExpressibleByBuiltinStringLiteral.init(_builtinStringLiteral: Builtin.RawPointer, utf8CodeUnitCount: Builtin.Word, isASCII: Builtin.Int1) -> A in conformance Swift.StaticString : Swift._ExpressibleByBuiltinStringLiteral in Swift
Address: libswiftCore.dylib[0x00000000001836f0] (libswiftCore.dylib.__TEXT.__text + 1574016)
Summary: libswiftCore.dylib`protocol witness for Swift._ExpressibleByBuiltinStringLiteral.init(_builtinStringLiteral: Builtin.RawPointer, utf8CodeUnitCount: Builtin.Word, isASCII: Builtin.Int1) -> A in conformance Swift.String : Swift._ExpressibleByBuiltinStringLiteral in Swift
Address: libswiftCore.dylib[0x000000000000d190] (libswiftCore.dylib.__TEXT.__text + 40736)
Summary: libswiftCore.dylib`Swift.String.init(_builtinStringLiteral: Builtin.RawPointer, utf8CodeUnitCount: Builtin.Word, isASCII: Builtin.Int1) -> Swift.String
Address: libswiftCore.dylib[0x00000000000077e0] (libswiftCore.dylib.__TEXT.__text + 17776)
Summary: libswiftCore.dylib`Swift.StaticString.init(_builtinStringLiteral: Builtin.RawPointer, utf8CodeUnitCount: Builtin.Word, isASCII: Builtin.Int1) -> Swift.StaticString
Address: libswiftCore.dylib[0x000000000029b000] (libswiftCore.dylib.__TEXT.__text + 2719120)
Summary: libswiftCore.dylib`dispatch thunk of Swift._ExpressibleByBuiltinStringLiteral.init(_builtinStringLiteral: Builtin.RawPointer, utf8CodeUnitCount: Builtin.Word, isASCII: Builtin.Int1) -> A
Ignoring the protocol witness and dispatch thunk results, this boils down to
libswiftCore.dylib`Swift.String.init(
_builtinStringLiteral: Builtin.RawPointer,
utf8CodeUnitCount: Builtin.Word,
isASCII: Builtin.Int1) -> Swift.String
and
libswiftCore.dylib`Swift.StaticString.init(
_builtinStringLiteral: Builtin.RawPointer,
utf8CodeUnitCount: Builtin.Word,
isASCII: Builtin.Int1) -> Swift.StaticString
Let’s set breakpoints on both of these, again using an lldb regular expression-capable breakpoint command:
(lldb) rb "Swift.String.init\(_builtinStringLiteral: Builtin.RawPointer, utf8CodeUnitCount: Builtin.Word, isASCII: Builtin.Int1\)"
Breakpoint 2: where = libswiftCore.dylib`Swift.String.init(_builtinStringLiteral: Builtin.RawPointer, utf8CodeUnitCount: Builtin.Word, isASCII: Builtin.Int1) -> Swift.String, address = 0x000000011c0f7190
(lldb) rb "Swift.StaticString.init\(_builtinStringLiteral: Builtin.RawPointer, utf8CodeUnitCount: Builtin.Word, isASCII: Builtin.Int1\)"
Breakpoint 3: where = libswiftCore.dylib`Swift.StaticString.init(_builtinStringLiteral: Builtin.RawPointer, utf8CodeUnitCount: Builtin.Word, isASCII: Builtin.Int1) -> Swift.StaticString, address = 0x000000011c0f17e0
Note that we need to escape the parentheses, because they have a special meaning in regular expressions.
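To see why the escaping matters, here is a small Swift sketch using Foundation’s NSRegularExpression. One caveat: lldb uses its own regex engine, so this only illustrates the general behavior of parentheses as grouping operators, not lldb’s exact syntax. The helper name matches is made up for this example.

```swift
import Foundation

// The demangled symbol name we want to match literally.
let symbol = "Swift.String.init(_builtinStringLiteral: Builtin.RawPointer, utf8CodeUnitCount: Builtin.Word, isASCII: Builtin.Int1)"

/// Returns true if `pattern`, compiled as a regex, matches somewhere in `text`.
func matches(_ pattern: String, in text: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return false }
    let range = NSRange(text.startIndex..., in: text)
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

// Unescaped, the parentheses form a group, so the pattern expects
// "init" to be followed directly by "_builtinStringLiteral".
let unescapedMatches = matches(symbol, in: symbol)

// Escaped, the parentheses (and dots) are matched literally.
let escaped = NSRegularExpression.escapedPattern(for: symbol)
let escapedMatches = matches(escaped, in: symbol)

print(unescapedMatches, escapedMatches)
```

Used as an unescaped pattern, the symbol name does not even match itself, which is exactly the trap the backslashes in the rb commands avoid.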
Ok, the breakpoints were successfully set up, so let’s continue:
(lldb) continue
We’re hitting this breakpoint:
libswiftCore.dylib`Swift.String.init(_builtinStringLiteral: Builtin.RawPointer, utf8CodeUnitCount: Builtin.Word, isASCII: Builtin.Int1) -> Swift.String:
-> 0x11c0f7190 <+0>: pushq %rbp
0x11c0f7191 <+1>: movq %rsp, %rbp
...
So the string literal translates to the creation of a regular Swift String instance. Who would have thought.
StaticString, the other candidate, is a type designed for a very narrow set of use cases where runtime modification or interpolation must be prevented. It is used in the os.os_log() API, for example.
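A minimal sketch of that restriction, assuming a made-up helper named describe: a parameter of type StaticString accepts a literal, while an interpolated string is rejected at compile time. The properties used here mirror two of the initializer arguments we saw in the lookup results.

```swift
/// Hypothetical helper that only accepts compile-time string literals.
func describe(_ message: StaticString) -> String {
    return "\(message.utf8CodeUnitCount) UTF-8 code units, ASCII: \(message.isASCII)"
}

let summary = describe("Hello")
print(summary)

// A string built at runtime is rejected by the type checker:
// let name = "world"
// describe("Hello, \(name)")   // does not compile
```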
Note: Here are some more useful lldb commands:
- List all current breakpoints:
(lldb) br list
Current breakpoints:
1: file = '/Users/ktraunmueller/Documents/sandbox/misc/Strings/StringTests/StringTests.swift', line = 16, exact_match = 0, locations = 1, resolved = 1, hit count = 1
1.1: where = StringTests`StringTests.StringTests.testCastSwiftStringToNSString() -> () + 27 at StringTests.swift:16:27, address = 0x000000011a4540eb, resolved, hit count = 1
2: regex = 'Swift.String.init\(_builtinStringLiteral:', locations = 1, resolved = 1, hit count = 1
2.1: where = libswiftCore.dylib`Swift.String.init(_builtinStringLiteral: Builtin.RawPointer, utf8CodeUnitCount: Builtin.Word, isASCII: Builtin.Int1) -> Swift.String, address = 0x000000011c0f7190, resolved, hit count = 1
3: regex = 'Swift.StaticString.init\(_builtinStringLiteral:', locations = 1, resolved = 1, hit count = 0
3.1: where = libswiftCore.dylib`Swift.StaticString.init(_builtinStringLiteral: Builtin.RawPointer, utf8CodeUnitCount: Builtin.Word, isASCII: Builtin.Int1) -> Swift.StaticString, address = 0x000000011c0f17e0, resolved, hit count = 0
- Clear all breakpoints set for a debugging session:
(lldb) br delete -f
All breakpoints removed. (4 breakpoints)
- Note that Xcode’s console offers completion for lldb commands, and there’s also a help option for all commands, e.g.
(lldb) br help
Commands for operating on breakpoints (see 'help b' for shorthand.)
...
Ok, let’s continue with our example.
Casting a Swift string to a Foundation string (#2)
The line
let nsstring = swiftString as NSString // #2
basically translates to
call ($sSS10FoundationE19_bridgeToObjectiveCAA8NSStringCyF)@PLT
Notes
- The trailing F says that _bridgeToObjectiveC() is a function (in module Foundation).
To find the function signature for setting a breakpoint, we again perform a regular expression name search:
(lldb) image lookup -rn _bridgeToObjectiveC.*NSString
2 matches found in /Users/ktraunmueller/Library/Developer/Xcode/DerivedData/Strings-afzvxwjhkbfwrjbkghdpnalizovg/Build/Products/Debug-iphonesimulator/StringTests.xctest/Frameworks/libswiftFoundation.dylib:
Address: libswiftFoundation.dylib[0x0000000000015ac0] (libswiftFoundation.dylib.__TEXT.__text + 68256)
Summary: libswiftFoundation.dylib`(extension in Foundation):Swift.String._bridgeToObjectiveC() -> __C.NSString
Address: libswiftFoundation.dylib[0x00000000000e30b0] (libswiftFoundation.dylib.__TEXT.__text + 909456)
Summary: libswiftFoundation.dylib`(extension in Foundation):Swift.Substring._bridgeToObjectiveC() -> __C.NSString
Since our code involves no Substring-producing API, it should be enough to consider the first candidate.
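For contrast, here is a short sketch of what Substring-producing code looks like. The variable names are arbitrary; the point is only that slicing APIs such as prefix(_:) and split(separator:) hand back Substring values, which would make the second candidate relevant.

```swift
let greeting = "Hello, world"

let head = greeting.prefix(5)                 // a Substring sharing storage with `greeting`
let fields = greeting.split(separator: ",")   // an array of Substring values
let owned = String(head)                      // copies the slice into a standalone String

print(type(of: head), type(of: fields), type(of: owned))
```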
In this case, we can actually look up the source code for Swift.String. This seems to be what gets invoked:
extension String : _ObjectiveCBridgeable {
    @_semantics("convertToObjectiveC")
    public func _bridgeToObjectiveC() -> NSString {
        // This method should not do anything extra except calling into the
        // implementation inside core. (These two entry points should be
        // equivalent.)
        return unsafeBitCast(_bridgeToObjectiveCImpl() as AnyObject, to: NSString.self)
    }
    // (rest of the extension omitted)
}
Checking with another name lookup for _bridgeToObjectiveCImpl:
(lldb) image lookup -n _bridgeToObjectiveCImpl
4 matches found in /Users/ktraunmueller/Library/Developer/Xcode/DerivedData/Strings-afzvxwjhkbfwrjbkghdpnalizovg/Build/Products/Debug-iphonesimulator/StringTests.xctest/Frameworks/libswiftCore.dylib:
Address: libswiftCore.dylib[0x0000000000089950] (libswiftCore.dylib.__TEXT.__text + 550624)
Summary: libswiftCore.dylib`Swift.Dictionary._bridgeToObjectiveCImpl() -> Swift.AnyObject
Address: libswiftCore.dylib[0x0000000000097070] (libswiftCore.dylib.__TEXT.__text + 605696)
Summary: libswiftCore.dylib`Swift.String._bridgeToObjectiveCImpl() -> Swift.AnyObject
Address: libswiftCore.dylib[0x000000000001df00] (libswiftCore.dylib.__TEXT.__text + 109712)
Summary: libswiftCore.dylib`Swift.Array._bridgeToObjectiveCImpl() -> Swift.AnyObject
Address: libswiftCore.dylib[0x0000000000168cd0] (libswiftCore.dylib.__TEXT.__text + 1464928)
Summary: libswiftCore.dylib`Swift.Set._bridgeToObjectiveCImpl() -> Swift.AnyObject
returns Swift.String._bridgeToObjectiveCImpl(), as expected.
This should correspond to the code in StringBridge.swift:
extension String {
    @_effects(releasenone)
    public // SPI(Foundation)
    func _bridgeToObjectiveCImpl() -> AnyObject {
        if _guts.isSmall {
            return _guts.asSmall.withUTF8 { bufPtr in
                return _createCFString(
                    bufPtr.baseAddress._unsafelyUnwrappedUnchecked,
                    bufPtr.count,
                    kCFStringEncodingUTF8
                )
            }
        }
        if _guts._object.isImmortal {
            // TODO: We'd rather emit a valid ObjC object statically than create a
            // shared string class instance.
            let gutsCountAndFlags = _guts._object._countAndFlags
            return __SharedStringStorage(
                immortal: _guts._object.fastUTF8.baseAddress!,
                countAndFlags: _StringObject.CountAndFlags(
                    sharedCount: _guts.count, isASCII: gutsCountAndFlags.isASCII))
        }
        _internalInvariant(_guts._object.hasObjCBridgeableObject,
            "Unknown non-bridgeable object case")
        return _guts._object.objCBridgeableObject
    }
}
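The isSmall check in the first branch is internal to the standard library, so we cannot call it ourselves. As a rough illustration only, the sketch below relies on my understanding that, on 64-bit platforms, a String value occupies 16 bytes and its small (inline) form holds at most 15 UTF-8 code units. The constant assumedSmallCapacity and the helper likelySmall are made up, and the result is a heuristic: a short string that was bridged from elsewhere may still use a different representation.

```swift
// Size of a String value itself (not of any heap storage it may point to).
let stringSize = MemoryLayout<String>.size

/// Assumption: the inline form holds at most 15 UTF-8 code units on 64-bit platforms.
let assumedSmallCapacity = 15

/// Heuristic only: true if `text` would fit into the assumed inline capacity.
func likelySmall(_ text: String) -> Bool {
    return text.utf8.count <= assumedSmallCapacity
}

print(stringSize, likelySmall("Hello"), likelySmall("A considerably longer string literal"))
```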
Since we’re casting a small string, I’d expect the first if to be entered, and the _createCFString() function to be called:
@_effects(releasenone)
private func _createCFString(
    _ ptr: UnsafePointer<UInt8>,
    _ count: Int,
    _ encoding: UInt32
) -> AnyObject {
    return _swift_stdlib_CFStringCreateWithBytes(
        nil, //ignored in the shim for perf reasons
        ptr,
        count,
        kCFStringEncodingUTF8,
        0
    ) as AnyObject
}
_createCFString() calls into _swift_stdlib_CFStringCreateWithBytes:
_swift_shims_CFStringRef
swift::_swift_stdlib_CFStringCreateWithBytes(
    const void *unused, const uint8_t *bytes,
    _swift_shims_CFIndex numBytes, _swift_shims_CFStringEncoding encoding,
    _swift_shims_Boolean isExternalRepresentation) {
  assert(unused == NULL);
  return cast(CFStringCreateWithBytes(kCFAllocatorSystemDefault, bytes, numBytes,
                                      cast(encoding),
                                      isExternalRepresentation));
}
which finally calls CoreFoundation’s CFStringCreateWithBytes().
Let’s set a breakpoint for CFStringCreateWithBytes() and see if we hit it. Continuing (twice) confirms our guess:
(lldb) rb CFStringCreateWithBytes
Breakpoint 2: 6 locations.
(lldb) continue
Process 47888 resuming
(lldb) continue
Process 47888 resuming
(lldb) thread backtrace
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
* frame #0: 0x000000010372dfb0 CoreFoundation`CFStringCreateWithBytes
frame #1: 0x000000011a9d315e libswiftCore.dylib`function signature specialization <Arg[0] = Exploded> of Swift.String._bridgeToObjectiveCImpl() -> Swift.AnyObject + 110
frame #2: 0x000000011a801079 libswiftCore.dylib`Swift.String._bridgeToObjectiveCImpl() -> Swift.AnyObject + 9
frame #3: 0x000000011ae87ac9 libswiftFoundation.dylib`(extension in Foundation):Swift.String._bridgeToObjectiveC() -> __C.NSString + 9
frame #4: 0x0000000118ae612c StringTests`StringTests.testCastSwiftStringToNSString(self=0x00007f9817c0cea0) at StringTests.swift:17:24
So line #2 in our example allocates a new instance of a CFString, which is bitcast to NSString by String’s _bridgeToObjectiveC() method:
unsafeBitCast(_bridgeToObjectiveCImpl() as AnyObject, to: NSString.self)
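In case the term is unfamiliar: a bitcast reinterprets existing bits as another type of the same size and does no conversion work of its own. The sketch below shows this on a value type, by analogy only. In the bridging code the same idea applies to an object reference, so the cast itself creates nothing; the allocation happened earlier in CFStringCreateWithBytes().

```swift
let one: Float = 1.0

// Reinterpretation: the same 32 bits, viewed as an unsigned integer.
let bits = unsafeBitCast(one, to: UInt32.self)

// Conversion: a new value computed from the old one.
let converted = UInt32(one)

print(String(bits, radix: 16), converted)
```

For floating-point values, the bitPattern property is the safe way to get the same result; unsafeBitCast is used here only to mirror the call in the standard library source.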
Wrapping Up
Ok, that’s quite a lengthy analysis of two lines of Swift code, but it helped me get a better understanding of what happens under the hood when writing Swift code that deals with strings. I hope you find it useful, too!
References
- Swift name mangling
- Mike Ash’s Friday Q&A 2014-08-08: Swift Name Mangling
- The Complete Guide to Function Mangling in iOS