String Utilities

This module contains utilities for manipulating strings, producing readable output, and handling CLI input and output. See morimekta.net/utils for release procedures.

Getting Started

To add the dependency with Maven:

<dependency>
    <groupId>net.morimekta.utils</groupId>
    <artifactId>strings</artifactId>
    <version>4.5.0</version>
</dependency>

To add the dependency with Gradle:

implementation 'net.morimekta.utils:strings:4.5.0'

Core Utilities

  • ConsoleUtil: Contains a few methods related to the visibility of characters on the console.
  • EscapeUtil: Escape and unescape strings using the same escape sequences as string literals in Java code.
  • NamingUtil: Reformat names using naming rules.
  • ReaderUtil: Utilities to read or skip content from Reader.
  • StringUtil: Inspect properties of strings and modify them using additional utilities from the library. Also has a utility for formatting any object as a string consistently.
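
As an illustration of what escaping "like strings in Java code" means, here is a minimal JDK-only sketch. It is not the library's EscapeUtil; the class name JavaEscapeSketch and the choice to emit a u-escape for remaining control characters are assumptions made for this example.

```java
/** Hypothetical helper for illustration only; not part of the library. */
final class JavaEscapeSketch {
    private JavaEscapeSketch() {}

    /** Escapes text as it would appear inside a Java string literal, without the surrounding quotes. */
    static String escape(CharSequence in) {
        StringBuilder out = new StringBuilder(in.length() + 8);
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            switch (c) {
                case '\b': out.append("\\b"); break;
                case '\t': out.append("\\t"); break;
                case '\n': out.append("\\n"); break;
                case '\f': out.append("\\f"); break;
                case '\r': out.append("\\r"); break;
                case '"':  out.append("\\\""); break;
                case '\'': out.append("\\'"); break;
                case '\\': out.append("\\\\"); break;
                default:
                    if (c < 0x20 || c == 0x7f) {
                        // Assumption: other control characters become a 4-digit hex u-escape.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

Calling JavaEscapeSketch.escape on a string containing a tab and a newline yields text with the two-character sequences for tab and newline in their place, which can be pasted back into Java source.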

And interfaces

  • Displayable: Simple interface with a displayString() method, and utility methods to make readable strings of standard Java utility classes.
  • Stringable: Simple interface with an asString() method, and utility methods to make toString()-like strings from standard Java types.
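
To show how the two string forms might differ in practice, here is a small sketch. The interfaces below are local stand-ins declared for this example: only the method names displayString() and asString() come from the descriptions above, and the real interfaces in the library may declare more. FileSize is a hypothetical type.

```java
// Local stand-ins, NOT the library's interfaces: only the method names come from the text above.
interface Displayable {
    String displayString();
}

interface Stringable {
    String asString();
}

/** Hypothetical value type used only for illustration. */
final class FileSize implements Displayable, Stringable {
    private final long bytes;

    FileSize(long bytes) {
        this.bytes = bytes;
    }

    /** Short, human-oriented text, e.g. for CLI output. */
    @Override
    public String displayString() {
        return (bytes / 1024) + " KiB";
    }

    /** Unambiguous text, e.g. for logs and debugging. */
    @Override
    public String asString() {
        return "FileSize{bytes=" + bytes + "}";
    }

    @Override
    public String toString() {
        return asString();
    }
}
```

The design idea sketched here is that one object can offer a reader-friendly form and a precise form without overloading toString() with both jobs.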

Character

  • Char, and implementations Color, Control, Unicode: Wrappers around single keystrokes, control sequences, terminal colors and Unicode chars. When handling terminal input, each can represent one keystroke; when updating a terminal, they can also control visible colors, move the cursor around, etc.
  • CharReader: A reader that can read Char objects from an input stream.
  • CharStream (and CharSplitterator): Makes a stream of chars from a string or input stream.
  • CharUtil: Utilities for making chars from meaningful input, or bytes from a list of chars.
  • CharSlice: Make an immutable sliced view of a char sequence. Operates as a CharSequence, but unlike a string, will never copy the underlying data on view operations.
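
The "sliced view" idea can be sketched with plain JDK types. This is not the library's CharSlice; SliceSketch and its factory method are hypothetical names. The point it illustrates is that subSequence() only adjusts an offset and a length over the same backing array.

```java
/** Hypothetical illustration of a non-copying char sequence view; not the library's CharSlice. */
final class SliceSketch implements CharSequence {
    private final char[] data;   // shared backing array, never modified after construction
    private final int offset;
    private final int length;

    private SliceSketch(char[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    /** Copies the string once; all later views share that one array. */
    static SliceSketch of(String text) {
        return new SliceSketch(text.toCharArray(), 0, text.length());
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        return data[offset + index];
    }

    /** Returns a narrower view over the same array; no characters are copied. */
    @Override
    public SliceSketch subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("range " + start + ".." + end + ", length " + length);
        }
        return new SliceSketch(data, offset + start, end - start);
    }

    /** Materializing a String is the only operation here that copies. */
    @Override
    public String toString() {
        return new String(data, offset, length);
    }
}
```

By contrast, String.subSequence() in current JDKs allocates a new string for each call, which adds up when splitting large inputs into many lines.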

Diff

  • DiffStringUtil: Utilities used when handling diffs. Splitting by line into CharSlice, and prefix, suffix and overlap comparisons of char sequences.
  • PatchUtil: Get a line-by-line diff of two strings, and make patch strings from a list of changes.
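
Prefix and suffix comparisons are a common first step in diffing, since shared ends can be trimmed before the expensive comparison. A minimal sketch of that step follows; DiffSketch and its method names are hypothetical and are not the DiffStringUtil API.

```java
/** Hypothetical helper for illustration only; not the library's DiffStringUtil. */
final class DiffSketch {
    private DiffSketch() {}

    /** Number of leading chars the two sequences have in common. */
    static int commonPrefix(CharSequence a, CharSequence b) {
        int max = Math.min(a.length(), b.length());
        for (int i = 0; i < max; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return max;
    }

    /** Number of trailing chars the two sequences have in common. */
    static int commonSuffix(CharSequence a, CharSequence b) {
        int aLen = a.length();
        int bLen = b.length();
        int max = Math.min(aLen, bLen);
        for (int i = 1; i <= max; i++) {
            if (a.charAt(aLen - i) != b.charAt(bLen - i)) {
                return i - 1;
            }
        }
        return max;
    }
}
```

Because both methods take CharSequence, they work equally on strings and on non-copying views of a larger text.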

Encoding

  • GSMCharset: Charset used in GSM (mobile) encoding. Uses only 7 bits of each byte when encoding, so the output can be bit-packed afterwards.
  • T61Charset: Charset used in TELEX (old terminal control exchange format). Is mostly a subset of ASCII plus its own extended characters.
  • TBCDCharset: Charset used in SS7 and MAP messages (core telco systems used by GSM systems since 1986). Encodes number sequences plus * and # and the letters a-c. Has a special variant for handling an odd number of digits.
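
To make the TBCD layout concrete, here is a JDK-only sketch of the encoding as commonly defined for MAP (two symbols per byte, first symbol in the low nibble, 0xF filler when the count is odd). It is an assumption that the library's TBCDCharset follows exactly this layout; TbcdSketch is a hypothetical name and not the library's API.

```java
/** Hypothetical illustration of TBCD packing; not the library's TBCDCharset. */
final class TbcdSketch {
    // Nibble value of each symbol is its index: digits 0-9, then * # a b c (0xA to 0xE).
    private static final String SYMBOLS = "0123456789*#abc";

    private TbcdSketch() {}

    static byte[] encode(String digits) {
        byte[] out = new byte[(digits.length() + 1) / 2];
        for (int i = 0; i < digits.length(); i++) {
            int nibble = SYMBOLS.indexOf(digits.charAt(i));
            if (nibble < 0) {
                throw new IllegalArgumentException("Not a TBCD symbol: " + digits.charAt(i));
            }
            if ((i & 1) == 0) {
                out[i / 2] = (byte) nibble;          // first symbol of the pair: low nibble
            } else {
                out[i / 2] |= (byte) (nibble << 4);  // second symbol of the pair: high nibble
            }
        }
        if ((digits.length() & 1) == 1) {
            out[out.length - 1] |= (byte) 0xF0;      // odd count: fill the unused high nibble with 0xF
        }
        return out;
    }
}
```

With this layout, "12345" packs into the three bytes 0x21 0x43 0xF5, where the 0xF in the last byte marks the missing sixth digit.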

IO

  • LineBufferedReader: Read from a wrapped reader, buffering only one line at a time. Never reads the next line until it is required.
  • Utf8Stream(Reader|Writer): Read or write characters to a stream using proper UTF-8 encoding, meaning UTF-16 / UCS-2 surrogate pairs are properly re-encoded to UTF-8 sequences. Also does no caching beyond what is needed to handle the extended Unicode chars.
  • IndentedPrintWriter: Write while remembering and applying the ongoing indent. Indents can be stacked, so properly indented output can be generated from simpler code.
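
The point about surrogate pairs can be shown with a short JDK-only sketch: a character outside the Basic Multilingual Plane is two Java chars, and proper UTF-8 encodes the combined code point as one 4-byte sequence rather than encoding each surrogate separately. Utf8Sketch is a hypothetical name and is not the library's Utf8StreamWriter.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical illustration of code-point-aware UTF-8 encoding; not the library's writer. */
final class Utf8Sketch {
    private Utf8Sketch() {}

    static byte[] encode(CharSequence text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); ) {
            // Combines a high + low surrogate pair into a single code point.
            int cp = Character.codePointAt(text, i);
            i += Character.charCount(cp);
            if (cp < 0x80) {
                out.write(cp);
            } else if (cp < 0x800) {
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                // Note: an unpaired surrogate would land here; a real encoder should reject or replace it.
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
        return out.toByteArray();
    }
}
```

The only state such an encoder needs to hold between chars is a pending high surrogate, which matches the description of caching nothing beyond the extended characters.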