What is Real-Time Text (RTT)?

Real-Time Text

Before discussing 'Real-Time Text', let us consider what 'real-time' means.

Real-time means that something occurs within a fraction of a second. For example, a voice conversation between two or more people happens in real-time: the audio is sent and received by the participants immediately. Another example of real-time is a computer game, where the actions of the player are immediately shown on the computer screen.

Real-Time Text is text transmitted while it is being typed or created, with the characters being sent immediately (within a fraction of a second) once typed, and also displayed immediately to the receiving person(s). This means that the receiving person(s) can read the newly created text while the sender is still typing it. In this way Real-Time Text has the same conversational directness and interactivity as voice.

The International Telecommunication Union (ITU) has defined 'real-time' in ITU-T F.700 Section 2.1.2.1 and 'Real-Time Text' in ITU-T F.700 Annex A.3 and ITU-T F.703 Section 5.3.2.3.

Real-Time Text is of particular importance for people who are Deaf or Hard of Hearing, for whom it is a replacement for voice telephony rather than a complementary technology. However, it is expected that mainstream users will adopt Real-Time Text as well. In particular, it is a natural extension of other real-time conversational services such as voice telephony, for example in noisy environments or when you want to communicate during a meeting where voice is not appropriate. It is also very useful during a voice call for conveying information where exact spelling is important, e.g. booking numbers, street addresses, or words that are hard to perceive because of different dialects.

Consistency with other real-time multimedia

Real-Time Text can be used as a stand-alone feature or it can be used in conjunction with other real-time features such as voice telephony and video conferencing. Real-Time Text can be used to enhance the effectiveness of these other features.

In order to achieve this, any implementation of Real-Time Text should be consistent with the implementation of these other features. For instance, the means of control and the method of transport of Real-Time Text should be as similar as possible to those used for voice and video.

The presentation of Real-Time Text simultaneously with voice and video is called Total Conversation, which the ITU-T has defined in F.703 Section 7.2. 3GPP has defined a set of protocols to implement Total Conversation in an IP environment: 3GPP TS 26.114 "IMS, Multimedia Telephony, Media handling and interaction" describes how to implement real-time voice, video and text services.

Use of Real-Time Text

Real-Time Text provides a quite different user experience from Instant Messaging (IM). Real-Time Text and IM (and email) are complementary text services with different capabilities. Real-Time Text allows new services to be created and existing services to be improved. Real-Time Text can be used:

  • in conjunction with voice and/or video in a multimedia communication or on its own, on fixed or mobile accesses,
  • by people who want a fast and really interactive means of conversing,
  • in noisy environments where it may be hard to hear,
  • in environments where other people are nearby but where communications privacy is required,
  • to transfer information such as numbers and addresses, where exactness is necessary,
  • by people who are Deaf or Hard of Hearing or have a speech impairment to communicate with other people, including people who can hear and speak,
  • for relay services that offer real-time conversion between different modes of communication to people who are Deaf or Hard of Hearing or have a speech impairment, e.g. real-time captioning of a voice conversation for people who are Hard of Hearing,
  • to provide all voice callers with a convenient means to accurately pass numbers, addresses and other detailed information in text,
  • to allow people who are Deaf or Hard of Hearing or have a speech impairment to use the emergency services without limitations,
  • to offer remote interpreter and transcribing (note taking) services for every user who needs them.

ToIP (RFC 4103) as the primary Real-Time Text standard for IP networks

The IETF has defined a Framework for Real-Time Text in RFC 5194. At the core of this framework are SIP control and Real-Time Text transport using RTP as described in RFC 4103. This technology is broadly supported for VoIP and other multimedia applications in an IP environment, and is also consistent with the Total Conversation concept. Real-Time Text that operates over IP and is based on RFC 4103 is often referred to as Text-over-IP or ToIP.

Features of Text over IP (ToIP)

ToIP is designed around the ITU-T T.140 Real-Time Text presentation layer protocol. T.140 allows real-time editing of text, e.g. by using 'backspace' and retyping. T.140 is based on the ISO 10646-1 character set, which is used by most IP text specifications, and uses the UTF-8 encoding.

Transport of ToIP uses the same Real-time Transport Protocol (RTP) as VoIP and Video-over-IP. The text is encoded according to IETF RFC 4103, titled "RTP Payload for Text Conversation". RFC 4103 supports an optional error correction scheme based on redundant transmission (using RFC 2198). This results in very low end-to-end character loss, even across IP transmission links that have moderately high packet loss. To improve efficiency, text is buffered for 300 to 500 milliseconds before it is sent, which still meets the delay requirements.
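The effect of redundant transmission can be illustrated with a small simulation. This is a simplified sketch, not the RFC 2198 wire format: the function names, the tuple layout and the choice of two redundant generations are assumptions made for illustration. Each packet repeats the text blocks of the previous packets, so the receiver can fill a gap from the next packet that arrives. Where the gap is too large, the sketch inserts U+FFFD as a marker for missing text.

```python
from collections import deque

REDUNDANCY = 2  # assumed number of earlier text blocks repeated in each packet


def make_packets(blocks, redundancy=REDUNDANCY):
    """Return (seq, generations) tuples; generations are oldest first, new block last."""
    history = deque(maxlen=redundancy)
    packets = []
    for seq, block in enumerate(blocks):
        packets.append((seq, list(history) + [block]))
        history.append(block)
    return packets


def receive(packets):
    """Rebuild the text from the packets that arrived, filling gaps from redundant copies."""
    text = []
    next_seq = 0
    for seq, generations in packets:
        if seq < next_seq:
            continue  # duplicate or late packet, already handled
        missing = seq - next_seq                       # packets lost just before this one
        recoverable = min(missing, len(generations) - 1)
        if missing > recoverable:
            text.append("\ufffd")                      # some text could not be recovered
        # Take the recoverable older blocks plus the new (primary) block.
        text.extend(generations[len(generations) - 1 - recoverable:])
        next_seq = seq + 1
    return "".join(text)


blocks = ["He", "llo", " wo", "rld", "!"]
sent = make_packets(blocks)
arrived = [sent[0], sent[3], sent[4]]   # packets 1 and 2 were lost in transit
print(receive(arrived))
```

With two redundant generations, two consecutive lost packets are fully recovered; a third consecutive loss would leave a gap that is marked rather than silently skipped.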

RTP is usually transported over the User Datagram Protocol (UDP). However, because 2.5G mobile/cellular phones supported the Transmission Control Protocol (TCP) but did not consistently support APIs for UDP, some implementations of ToIP over mobile/cellular networks use TCP, internally optimized for such networks. 3G mobile/cellular networks can support UDP, which means that such networks can use ToIP natively.

The protocol stack for ToIP is:

T.140
RFC4103
RTP
UDP
IP
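To make the layering concrete, the following minimal sketch wraps a piece of T.140 text (UTF-8) in a basic 12-byte RTP header, ready to be handed to UDP. It omits redundancy, and the function name, the dynamic payload type 98 and the SSRC value are assumptions chosen for illustration.

```python
import struct


def build_rtp_text_packet(text, seq, timestamp, ssrc, payload_type=98):
    """Pack a fixed 12-byte RTP header followed by UTF-8 encoded text (no redundancy)."""
    first_byte = 2 << 6                  # RTP version 2, no padding, no extension, no CSRCs
    second_byte = payload_type & 0x7F    # marker bit clear, 7-bit payload type
    header = struct.pack(
        "!BBHII",                        # network byte order: 1 + 1 + 2 + 4 + 4 bytes
        first_byte,
        second_byte,
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + text.encode("utf-8")


packet = build_rtp_text_packet("Hi", seq=1, timestamp=1000, ssrc=0x1234)
print(len(packet), packet.hex())
```

The resulting bytes would then be sent as the payload of a UDP datagram, for example with `socket.sendto` on a `SOCK_DGRAM` socket, and IP carries that datagram to the receiver.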

Although ToIP streams text character by character, it uses very little bandwidth compared to VoIP.

Typing 30 characters per second results in a traffic load between 2 and 3 kbit/s depending on the language used (including overheads for RFC 4103 with the maximum level of redundancy, RTP, UDP and IP).
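The figure above can be checked with a rough calculation. The sketch below is an estimate under stated assumptions, not a measurement: it assumes an IPv4 header without options, one packet every 300 ms, two redundant generations, a 4-byte redundancy header per redundant block plus 1 byte for the primary block, and 1 to 2 bytes per character in UTF-8 depending on the language. The function and constant names are illustrative.

```python
IP_HEADER = 20            # bytes, IPv4 without options (assumption)
UDP_HEADER = 8            # bytes
RTP_HEADER = 12           # bytes, fixed RTP header
RED_HEADER_REDUNDANT = 4  # bytes per redundant block header
RED_HEADER_PRIMARY = 1    # bytes for the primary block header


def toip_bitrate(chars_per_second, bytes_per_char, interval_s=0.3, redundancy=2):
    """Estimate the ToIP traffic load in bit/s for a steady typing rate."""
    primary_bytes = chars_per_second * interval_s * bytes_per_char
    payload = primary_bytes * (1 + redundancy)       # new text plus repeated older text
    headers = (IP_HEADER + UDP_HEADER + RTP_HEADER
               + RED_HEADER_REDUNDANT * redundancy + RED_HEADER_PRIMARY)
    return (headers + payload) * 8 / interval_s


for bytes_per_char in (1, 2):
    rate = toip_bitrate(30, bytes_per_char)
    print(f"{bytes_per_char} byte(s) per character: {rate / 1000:.1f} kbit/s")
```

Under these assumptions the estimate comes out at roughly 2.0 kbit/s for one byte per character and roughly 2.7 kbit/s for two bytes per character, which is consistent with the range quoted above.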

Control of ToIP sessions is defined using the standard Session Initiation Protocol (SIP, RFC 3261) and Session Description Protocol (SDP, RFC 4566).

  • SIP is used without any alteration.
  • Real-Time Text encoding is identified by using the SDP media definition 'm=text'.
  • The details of SDP for ToIP are described in RFC 4103 and its errata.
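As an illustration of the 'm=text' media definition, the sketch below generates an SDP media section for Real-Time Text with redundancy, modelled on the style of the examples in RFC 4103. The function name, the port and the dynamic payload type numbers (98 and 100) are placeholder assumptions; real values are negotiated per session.

```python
def text_media_description(port, t140_pt=98, red_pt=100, redundancy=2):
    """Build SDP lines offering T.140 text, plain and with redundancy."""
    # The fmtp line lists the primary payload type followed by one entry
    # per redundant generation, separated by slashes.
    generations = "/".join([str(t140_pt)] * (redundancy + 1))
    lines = [
        f"m=text {port} RTP/AVP {t140_pt} {red_pt}",
        f"a=rtpmap:{t140_pt} t140/1000",   # T.140 text with a 1000 Hz RTP clock
        f"a=rtpmap:{red_pt} red/1000",     # redundancy payload format
        f"a=fmtp:{red_pt} {generations}",
    ]
    return "\r\n".join(lines) + "\r\n"     # SDP lines end with CRLF


print(text_media_description(11000))
```

An endpoint that does not support text would reject this media line in its answer, while voice and video media lines in the same session description remain unaffected.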

Text over IP developments

Next Generation Networks (NGNs) are a concept developed by telecommunication service providers and their suppliers. The aim is to create a true multi-service network based on IP technology.

ToIP has been specified for inclusion in the 3GPP IP Multimedia Subsystem (IMS) (in 3GPP TS 26.114 "IMS, Multimedia Telephony, Media handling and interaction"). IMS is being used to implement NGNs in many fixed and mobile networks.

Support of ToIP is being seriously considered for multimedia emergency Public Safety Answering Points (PSAPs) in Europe (112) and in the USA and Canada (911).

The IETF ECRIT working group defines ToIP as one medium for access to emergency services.

ToIP can provide a 'low impact' solution to meeting national regulatory requirements to provide 'equivalent service' to the telephone service for people who have hearing or speech impairments.