Unicage FAQs


1. Why is processing so fast?

Each command is written in C, and its input/output buffering, memory handling and calculation algorithms are designed for high-speed processing.
The shell calls kernel functions directly. Removing the middleware layer also removes its processing overhead.
The Unicage shell-script programming methodology avoids slower variable-based and functional programming styles. Instead, it follows a data-flow style that takes advantage of the processing speed of each command.
In Unicage shell programming, data is organized in advance to increase performance, and Unicage has developed high-speed commands for complex sorting.
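As a rough illustration of the data-flow style, the sketch below chains standard POSIX tools (sort, awk) as stand-ins for Unicage's own proprietary commands, whose syntax is not shown in this FAQ. It assumes a hypothetical space-separated sales file with the fields store, item and quantity. The data is sorted once up front, so the aggregation step can run in a single streaming pass.

```shell
#!/bin/sh
# Data-flow sketch using standard POSIX tools as stand-ins for Unicage commands.
# Assumed (hypothetical) input format, one record per line:
#   store item quantity

# Sum the quantity per store in one streaming pass.
# The input is sorted by store first, so awk only needs to remember the
# current key and its running total, never the whole file.
sum_by_store() {
  sort -k1,1 "$1" |
  awk '
    $1 != key { if (NR > 1) print key, total; key = $1; total = 0 }
    { total += $3 }
    END { if (NR > 0) print key, total }
  '
}

tmp=$(mktemp)
printf '%s\n' 'S02 A001 5' 'S01 A001 3' 'S01 A002 4' 'S02 A003 1' > "$tmp"
sum_by_store "$tmp"   # prints "S01 7" then "S02 6"
rm -f "$tmp"
```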

2. Comparison with mainstream technologies

Performance benchmark vs mainstream data storage and data processing frameworks (Spark, SparkSQL, Kudu/HDFS, Hadoop)
Research studies at universities in the United States (MIT), Japan (Kanazawa University) and Europe (IST Lisbon) report Unicage running from 3 to 50 times faster.

Gains in development productivity
Unicage significantly reduces the number of lines of code. The ratio depends on the language being converted; for example, rewriting a COBOL application achieves a 20:1 ratio. Unicage code is also easy to read and understand, and system usage is easy to audit and measure.

3. Unicage format vs DB architecture


A. When programs update records in a table, the DBMS locks access to related records in other tables to avoid inconsistencies, which consumes time for locking and releasing. When a large number of records is manipulated, significant time is spent locking and unlocking.
When adding or deleting records, a typical DBMS must allocate and release memory (malloc) for each record, which also consumes significant time and resources.


B. In Unicage, consistency is ensured once all changes made by a process are complete. Records are not locked, even when a large number of records must be changed.

The total amount of memory required to add or delete records is allocated and released at once, requiring only one memory allocation (malloc).
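One common UNIX way to make changes visible only once a process has completed, without locking individual records, is to write the complete new version of a file to a temporary file and then rename it over the old one. A rename within one file system is atomic, so readers see either the old file or the new one, never a partial one. This sketch assumes that pattern; it is not necessarily how Unicage's own commands implement it, and the file names are hypothetical.

```shell
#!/bin/sh
# Replace a data file atomically: readers see either the complete old
# version or the complete new version, never a half-written file.
# Usage: atomic_update TARGET COMMAND [ARGS...]
#   COMMAND reads the current TARGET on stdin and writes the new version
#   to stdout.
atomic_update() {
  target=$1; shift
  # Create the temp file in the same directory so that mv is a rename
  # within one file system, which is atomic.
  # Note: mktemp creates the file with mode 0600; adjust with chmod if
  # other users need to read the result.
  tmp=$(mktemp "$(dirname "$target")/.tmp.XXXXXX") || return 1
  if "$@" < "$target" > "$tmp"; then
    mv -f "$tmp" "$target"
  else
    rm -f "$tmp"   # on failure the original file is left untouched
    return 1
  fi
}

dir=$(mktemp -d)
printf '%s\n' 'A001 100' 'A002 200' > "$dir/PRICE"
# Delete the A001 record by rewriting the whole file in one step.
atomic_update "$dir/PRICE" awk '$1 != "A001"'
cat "$dir/PRICE"   # prints "A002 200"
rm -rf "$dir"
```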

4. Data organization and distribution

How is the data organized at the storage level (column, row)?
Data is stored in a UNIX file system as regular flat text files. The Unicage methodology defines how data is organized and then processed by our proprietary commands. The uniqueness of the solution lies in its ability to use flat text files instead of requiring middleware or relational database engines, which slow down execution.
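To make the storage model concrete, here is a hypothetical flat text file with one record per line and space-separated fields, with columns and rows selected using awk. awk is only a stand-in here; Unicage's own field-selection commands are not shown because their syntax is not given in this FAQ.

```shell
#!/bin/sh
# A flat text "table": one record per line, fields separated by a space.
# Hypothetical layout: item_code item_name unit_price

# Column selection: print item_code and unit_price (fields 1 and 3).
price_list() { awk '{ print $1, $3 }' "$1"; }

# Row selection: print records whose unit_price (field 3) is at least $2.
items_at_least() { awk -v min="$2" '$3 >= min + 0' "$1"; }

dir=$(mktemp -d)
cat > "$dir/ITEM_MASTER" <<'EOF'
A001 apple 100
A002 banana 80
A003 cherry 300
EOF

price_list "$dir/ITEM_MASTER"
# prints:
# A001 100
# A002 80
# A003 300

items_at_least "$dir/ITEM_MASTER" 100
# prints:
# A001 apple 100
# A003 cherry 300
rm -rf "$dir"
```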

How is the data distributed across data units?
Parallel processing follows a master-slave model: flat text files are divided across all available nodes according to the logic of the script, each node executes the script on its portion, and the partial results are finally merged and consolidated into a new text file containing the result of the execution.
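The divide, execute and merge flow can be imitated on a single machine with standard tools. In this sketch the "nodes" are simply background processes, the partitioning rule (line number modulo N) and the per-part work (sorting) are assumptions chosen for illustration, and the final merge uses sort -m, which merges inputs that are already sorted. Unicage's cluster commands are not used here.

```shell
#!/bin/sh
# Simulate master/worker parallel processing on one machine:
#   1. divide the input file into N parts
#   2. process each part in its own background process (stand-in for a node)
#   3. merge the partial results into one output file
# Usage: parallel_sort INPUT OUTPUT N
parallel_sort() {
  input=$1; output=$2; n=$3
  work=$(mktemp -d) || return 1

  # 1. Divide: record number k goes to part (k mod N), a hypothetical rule.
  awk -v n="$n" -v dir="$work" '{ print > (dir "/part." (NR % n)) }' "$input"

  set -- "$work"/part.*
  if [ ! -e "$1" ]; then        # empty input: no parts were created
    : > "$output"; rm -rf "$work"; return 0
  fi

  # 2. Execute: sort every part in parallel, then wait for all of them
  #    (wait with no arguments waits for every background job of the shell).
  for part in "$@"; do
    sort "$part" > "$part.sorted" &
  done
  wait

  # 3. Merge: sort -m merges inputs that are already sorted.
  sort -m "$work"/part.*.sorted > "$output"
  rm -rf "$work"
}

# Example (hypothetical paths): parallel_sort DATA/LV1/SALES /tmp/SALES.sorted 4
```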

5. Data file management (levels)

Unicage organizes the data files of each business unit into five levels (‘OSAHO’, Unicage’s etiquette for data file management):

  • Level 1 (event data)
  • Level 2 (confirmed data)
  • Level 3 (organized data)
  • Level 4 (application reference data)
  • Level 5 (application output data)

Data files and programs are managed in a predefined folder structure:

  • DATA/LV1, DATA/LV2, DATA/LV3, DATA/LV4, DATA/LV5: Level 1-5 data files
  • SYS: shell scripts
  • LOG: run logs of the shell scripts
  • LAYOUT: layout files
  • SEMAPHORE: semaphore files
  • RCV: files received from external systems
  • SND: files to be sent to external systems
  • BACKUP: backups of shell scripts

Unicage does not require any special file management software.

6. Data file relation (join commands)

Unicage joins multiple files using join commands such as join1 (inner join) and join2 (outer join).
For example, it can generate a new file that combines data from a sales data file and an item master file.
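The syntax of join1 and join2 is not shown in this FAQ, so the sketch below uses the standard POSIX join command to illustrate the same inner-join idea: combining a sales data file with an item master file on the item code. The file layouts are hypothetical. Unlike a database join, POSIX join requires both inputs to be sorted on the join field.

```shell
#!/bin/sh
# Inner join of sales data with an item master on the item code (field 1).
# Hypothetical space-separated layouts:
#   SALES:       item_code date quantity
#   ITEM_MASTER: item_code item_name
# Usage: join_sales SALES ITEM_MASTER
join_sales() {
  tmpd=$(mktemp -d) || return 1
  # POSIX join requires both inputs sorted on the join field.
  sort -k1,1 "$1" > "$tmpd/sales"
  sort -k1,1 "$2" > "$tmpd/master"
  # Output: item_code, remaining SALES fields, then item_name.
  # Sales lines without a master record are dropped (inner join);
  # "join -a 1" would keep them as well (left outer join).
  join "$tmpd/sales" "$tmpd/master"
  rm -rf "$tmpd"
}

dir=$(mktemp -d)
printf '%s\n' 'A002 20240101 3' 'A001 20240101 5' 'A009 20240102 1' > "$dir/SALES"
printf '%s\n' 'A001 apple' 'A002 banana' > "$dir/ITEM_MASTER"
join_sales "$dir/SALES" "$dir/ITEM_MASTER"
# prints:
# A001 20240101 5 apple
# A002 20240101 3 banana
rm -rf "$dir"
```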


7. Data Integrity

A typical DBMS allows only one record to be inserted per key. Data integrity and centralized management are governed by the immutability of records and keys.
This underlying DBMS functionality therefore eliminates mismatches between the results of multiple processes within monthly or daily batch processing.

Unicage, on the other hand, is built on the concept that “original data is the most important”, rather than on a requirement for “physically unique data”.
Original data is stored as Level 1 files, and the results of the various processes are derived from them, which ensures the integrity of both the original data and its results. If the results of multiple processes are inconsistent, Unicage lets you correct them by reprocessing, which repairs the mismatch easily.
Because Unicage’s data integrity relies on storing the correct “original data”, Unicage protects your system with copies and backups so that the original data is never compromised.
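Because derived results can always be regenerated from the Level 1 originals, repairing a mismatch by reprocessing can be as simple as running the same pipeline again. The sketch assumes hypothetical file names and a simple per-item total as the derived result; it writes through a temporary file so that a failed run does not leave a half-written result.

```shell
#!/bin/sh
# Regenerate a derived (Level 3 style) file from the original Level 1 files.
# Hypothetical LV1 record layout: item_code quantity
# Re-running it on the same originals always yields the same result, so a
# damaged or inconsistent output is repaired simply by running it again.
# Usage: rebuild_totals LV1_DIR OUTPUT_FILE
rebuild_totals() {
  lv1=$1; out=$2
  tmp=$(mktemp "$out.XXXXXX") || return 1
  # Total the quantity per item over every original record; sort fixes
  # the output order, since awk's "for (k in ...)" order is unspecified.
  if cat "$lv1"/* |
     awk '{ total[$1] += $2 } END { for (k in total) print k, total[k] }' |
     sort -k1,1 > "$tmp"
  then
    mv -f "$tmp" "$out"     # replace the old result in one step
  else
    rm -f "$tmp"; return 1
  fi
}

# Example (hypothetical paths): rebuild_totals DATA/LV1 DATA/LV3/ITEM_TOTAL
```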

8. Access control of data files

In a database, the DBMS automatically locks the physical record for exclusive access control. In Unicage, by contrast, you explicitly specify the range you want to process exclusively, using the ulock command in your program.
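The ulock command's syntax is not shown in this FAQ, so as a stand-in this sketch uses a well-known portable technique to mark an explicit exclusive range: mkdir is atomic, so only one process can succeed in creating a given lock directory. The lock path and the one-second retry interval are hypothetical choices.

```shell
#!/bin/sh
# Explicit exclusive section using mkdir as a lock: mkdir either creates
# the directory (lock acquired) or fails because it already exists.
# Limitation: if the process is killed inside the exclusive range, the
# lock directory remains and must be removed manually.
# Usage: with_lock LOCKDIR COMMAND [ARGS...]
with_lock() {
  lockdir=$1; shift
  # Wait until the lock can be acquired, retrying once per second.
  until mkdir "$lockdir" 2>/dev/null; do
    sleep 1
  done
  # --- exclusive range starts ---
  "$@"
  status=$?
  # --- exclusive range ends: release the lock ---
  rmdir "$lockdir"
  return $status
}

# Example (hypothetical names):
# with_lock SEMAPHORE/SALES.lock sh SYS/UPDATE_SALES
```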


9. Changes rollback


10. Backup of data files

Standard Linux backup commands such as tar and cpio can be used.
For example, to back up the directory user1 in /home to a tape device:
# cd /home
# tar cvf /dev/nst0 user1
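If no tape drive is available, tar can write the archive to a file instead. This sketch (archive naming and paths are hypothetical) creates a dated, compressed archive and then lists it back as a basic readability check. The z option is supported by GNU and BSD tar but is not part of POSIX.

```shell
#!/bin/sh
# Back up a directory into a dated, compressed tar archive and verify
# that the archive can be read back. Prints the archive path on success.
# Usage: backup_dir PARENT_DIR NAME DEST_DIR
backup_dir() {
  parent=$1; name=$2; dest=$3
  archive="$dest/$name.$(date +%Y%m%d).tar.gz"
  # -C changes into PARENT_DIR first, like "cd /home" in the example above.
  tar -czf "$archive" -C "$parent" "$name" || return 1
  # Listing the archive fails if it is truncated or corrupt.
  tar -tzf "$archive" > /dev/null || return 1
  echo "$archive"
}

# Example (hypothetical paths): backup_dir /home user1 /backup
```

Restoring is the reverse operation: tar -xzf ARCHIVE -C PARENT_DIR.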


11. Interoperability, locking, compression and memory management

Interoperability
Unicage is based on UNIX fundamentals, so it can interoperate through frameworks such as Tivoli (IBM) or JP1 (Hitachi).

Locking
Unicage does not require locking unless processes access the same data concurrently. For those cases, the “ulock” command is available.

Compression
Unicage is compatible with multiple compression tools; for example, the .gz and .Z formats can be used for data compression.

Memory Management
Unicage processes data as a stream, which reduces memory usage compared with technologies such as Java or Python. Memory management is handled by the UNIX commands themselves.
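Compression and streaming combine naturally: a compressed file can be fed straight into a pipeline, so it is decompressed and processed line by line without being expanded on disk or held in memory as a whole. In this sketch gzip -dc (decompress to standard output) feeds awk, which keeps only a running total; the file name and field layout are hypothetical. gzip -dc can also read .Z files produced by compress.

```shell
#!/bin/sh
# Sum field 2 of a compressed flat file as a stream.
# Memory use stays constant: each line is read, added, and discarded.
# Usage: sum_gz FILE.gz
sum_gz() {
  gzip -dc "$1" | awk '{ s += $2 } END { printf "%.0f\n", s }'
}

dir=$(mktemp -d)
# Generate 100,000 records "id value" (value always 1) and compress them.
awk 'BEGIN { for (i = 1; i <= 100000; i++) print i, 1 }' | gzip > "$dir/DATA.gz"
sum_gz "$dir/DATA.gz"   # prints 100000
rm -rf "$dir"
```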

12. Security (authentication, authorization and encryption)

Unicage utilizes segregation mechanisms of the underlying UNIX Operating System: filesystem permissions, memory stack protection and role-based access controls.
Encryption can be applied at several levels: native filesystem encryption mechanisms (F2FS on Linux or ZFS on BSD/UNIX) or third-party mechanisms offered by a number of vendors. Self-encrypting disks are also an option, because Unicage simply uses the operating system’s POSIX infrastructure to access data storage.
Security can be further increased with file checksumming tools (native OS capabilities or products such as Tripwire) and rule-based firewalls.

A typical Linux/UNIX node running Unicage only needs SSH as an open port. This can be ensured by service minimization (disabling or uninstalling unnecessary services) and enforced by a host-based firewall that permits access only from known hosts. SSH access is restricted to a set of known users (no administrator/superuser access is granted).
Essential configuration files are then protected by checksumming, so that any alteration of their content is detected.
File systems can be encrypted to prevent data loss and processes will run only with needed privileges inside protected memory.
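A minimal sketch of checksum-based change detection for configuration files, using sha256sum from GNU coreutils. A dedicated tool such as Tripwire adds scheduling, reporting and protection of the baseline itself. The baseline location is a hypothetical example; in practice it should be stored where an attacker cannot modify it.

```shell
#!/bin/sh
# Record a checksum baseline for a set of files.
# Usage: make_baseline BASELINE FILE...
make_baseline() {
  baseline=$1; shift
  sha256sum -- "$@" > "$baseline"
}

# Verify the files against the baseline. Prints only mismatches and
# returns non-zero if any file was altered or is missing.
# Usage: check_baseline BASELINE
check_baseline() {
  sha256sum --quiet -c "$1"
}

# Example (hypothetical paths):
# make_baseline /root/config.sha256 /etc/ssh/sshd_config /etc/passwd
# check_baseline /root/config.sha256 || echo "configuration changed" >&2
```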

13. Security of applications and data

1. Security of applications
Applications (access tools) can be developed with access rights by using the “getpermission” command, which reads the permission settings by referring to a permission table.

2. Security of data
Non-developers cannot directly access Unicage text files without specific permission. Developers can also be restricted from access by changing OS settings (for example, SELinux settings). Likewise, access by the system administrator (the person who set up the OS) can be restricted by encrypting those files so that the security administrator’s password is required.
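At the OS level, the most basic form of this restriction is ordinary file ownership and permission bits. The sketch below makes the data files readable and writable only by their owner and readable by one group, with no access for anyone else; the group name and paths are hypothetical. SELinux policies and file encryption, as described above, go beyond what chmod alone can enforce.

```shell
#!/bin/sh
# Restrict a data directory: owner may read/write, group may read,
# everyone else has no access. Directories also need the x bit so that
# permitted users can enter them.
# Usage: restrict_data DIR
restrict_data() {
  find "$1" -type d -exec chmod 750 {} + &&
  find "$1" -type f -exec chmod 640 {} +
}

# Optionally assign the files to a dedicated group first (hypothetical name):
# chgrp -R unicage_dev DATA
# restrict_data DATA
```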


14. Data file relation (join commands)


15. UNIX parallel processing

UNIX is a multi-user, multi-tasking OS. You can run multiple jobs for multiple users at the same time.

Commands used for parallel processing:
  • Append “&” (background) when starting a job to run it in parallel
  • “bg” and “fg” switch a job between background (parallel) and foreground (sequential) execution
  • “nice” changes the priority of a job
  • “stop” or “kill” interrupts or terminates a job
  • “jobs”, “ps” or “pstree” monitor the parallel processing

With the job control commands above, a shell script can run its processing in parallel across any number of processes.
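A minimal sketch of that pattern: each job is started in the background with &, at lower priority with nice, and wait then collects every job's exit status. The per-job work (sorting one file) and the output naming are hypothetical.

```shell
#!/bin/sh
# Run one job per input file in parallel, then wait for all of them.
# Returns non-zero if any job failed.
# Usage: run_parallel OUTDIR FILE...
run_parallel() {
  outdir=$1; shift
  pids=
  for f in "$@"; do
    # & starts the job in the background; nice -n 10 lowers its priority
    # so that other work on the same server is not starved.
    nice -n 10 sort "$f" > "$outdir/$(basename "$f").sorted" &
    pids="$pids $!"          # remember each job's process ID
  done
  rc=0
  for p in $pids; do
    wait "$p" || rc=1        # collect every job's exit status
  done
  return $rc
}

# Example (hypothetical paths): run_parallel /tmp/out DATA/LV1/SALES.*
```

While the jobs are running, ps (or jobs, in an interactive shell) shows them.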

16. Partitioning, indexing and concurrency

Partitioning
There are no special partitioning requirements. Nodes can be independent servers or virtualized (e.g. Docker). Our only recommendation is to keep 10% of disk space free for data operations.
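One way to follow the 10% recommendation is to check free space before starting a large data operation. This sketch parses the portable output format of df -P, in which the fifth column of the data line is the percentage of capacity used; the function name and default threshold argument are assumptions.

```shell
#!/bin/sh
# Succeed if the file system holding PATH has at least MIN_FREE percent free.
# df -P (POSIX output): line 2, field 5 is "Capacity", e.g. "73%".
# Usage: has_free_space PATH [MIN_FREE]   (MIN_FREE defaults to 10)
has_free_space() {
  min_free=${2:-10}
  used=$(df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  [ -n "$used" ] && [ $((100 - used)) -ge "$min_free" ]
}

# Example: refuse to start a batch if less than 10% is free (hypothetical path).
# has_free_space /home/app/DATA || { echo "disk almost full" >&2; exit 1; }
```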

Indexing
There are no special requirements for indexing as everything is executed on the UNIX file system as a text file.

Concurrency
Unicage is based on UNIX fundamentals, which allow multithreading as well as concurrent and exclusive processes. Our command suite includes blocking commands as well as atomic writes.

17. Scalability and workload management support

Scalability
Unicage scales almost linearly with additional hardware. The cluster versions of the commands perform the map and reduce steps automatically.

Workload Management support
Multiple frameworks can control UNIX processes. For general workload management, commands such as “ulimit” or “nice” can be used; finer-grained enforcement is usually handled by OS-level facilities such as cgroups (Linux) or jails (BSD).
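A sketch of per-job limits using only the general commands mentioned above. A subshell applies ulimit (here a CPU-time limit via -t, which common shells such as bash and dash support although POSIX does not require it) and nice before running the job, so the limits affect only that job. The limit values are arbitrary examples; cgroup or jail configuration is not shown.

```shell
#!/bin/sh
# Run a job with a CPU-time limit and a lower scheduling priority.
# The ( ... ) function body is a subshell, so the limits do not leak
# into the calling shell.
# Usage: run_limited CPU_SECONDS NICENESS COMMAND [ARGS...]
run_limited() (
  cpu=$1; niceness=$2; shift 2
  ulimit -t "$cpu" || exit 1   # the kernel stops the job after this much CPU time
  exec nice -n "$niceness" "$@"
)

# Example (hypothetical script): run_limited 3600 10 sh SYS/MONTHLY_BATCH
```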

18. Handling of image files

Use the image processing commands available in UNIX/Linux distributions (examples using ImageMagick):
• Image format conversion
convert <original file.extension> <converted file.extension> (e.g. to convert from JPG to PPM: convert test.jpg test.ppm)
• Monochrome conversion
When converting the image format, a color image can be converted to a monochrome (grayscale) format (e.g. convert test.jpg test.pgm)
• Image scaling
Use the -scale option of convert: convert -scale 30% test.jpg test.ppm
• Trimming
convert -crop <width>x<height>+<x of upper-left corner>+<y of upper-left corner> filename newfilename
• Create an animated GIF from multiple images
convert image1 image2 image3 … output_animation.gif
• Combine multiple images
Create a catalog (contact sheet) of multiple images: montage image1 image2 image3 … output_catalog_file