Unicage FAQs

1. Why is Processing Speed fast?

Each command is written in plain C, and its input/output buffering, memory handling and calculation algorithms are designed for high-speed processing.
The shell uses kernel functions directly. Because there is no middleware and no unnecessary processing, the system is inherently fast.
The Unicage shell programming method avoids slow variable-based and function-based programming, and instead follows a data-flow style that takes advantage of the processing speed of each command.
In Unicage shell programming, data is sorted in advance to avoid unnecessary sorting. We have also developed high-speed sorting commands: 10 million records can be sorted in about one second.
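
The data-flow style described above can be illustrated with standard UNIX tools. The following is a minimal sketch and not Unicage code: sort and awk stand in for the proprietary Unicage commands, and the file name sales.txt and its two-field layout (item code, amount) are assumptions made for the example. The data is sorted once, and the later step relies on that order instead of sorting again.

```shell
#!/bin/sh
# Hypothetical sample data: item code and amount, space-separated.
work=$(mktemp -d)
cat > "$work/sales.txt" <<'EOF'
B002 300
A001 100
B002 50
A001 250
C003 75
EOF

# Sort once by key; every later step relies on this order.
sort -k1,1 "$work/sales.txt" > "$work/sales.sorted"

# Stream the sorted records and sum the amounts per key.
# Because the input is sorted, only the current key and its running
# total are held in memory at any time.
totals=$(awk '
  $1 != key { if (NR > 1) print key, sum; key = $1; sum = 0 }
            { sum += $2 }
  END       { if (NR > 0) print key, sum }
' "$work/sales.sorted")

echo "$totals"
rm -rf "$work"
```

Each stage reads a stream and writes a stream, so stages can be chained with pipes without holding the whole data set in shell variables.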

2. Comparison with mainstream technologies

Performance benchmarks against mainstream data storage and data processing frameworks (Spark, SparkSQL, Kudu/HDFS, Hadoop)
Research studies at higher education institutions in the United States (MIT), Japan (Kanazawa University) and Europe (IST Lisbon) show Unicage running 3 to 50 times faster, depending on the technology compared.

Gains in development productivity
Unicage significantly reduces the number of lines of code; the ratio depends on the language being converted (for example, rewriting a COBOL application gives a 20:1 ratio). Unicage code is also easy to read and understand, and usage of the system can be audited easily.

3. Unicage format vs DB architecture

A. When a program updates records in a table, a DBMS locks access to the related records in other tables to avoid inconsistencies, which takes time for locking and releasing. When large numbers of records are manipulated, a significant amount of time is spent locking and unlocking.
When adding or deleting records, a typical DBMS needs to allocate and release memory (malloc) for each record, which also consumes significant time and resources.

B. In Unicage, consistency is ensured when the change process completes. Records are not locked, even when a large number of records are changed.

The total amount of memory required for adding or deleting records is allocated and released at once, requiring only one memory allocation (malloc).

4. Data organization and distribution

How is the data organized at the storage level (columnar, row, etc.)?
Data is stored in a UNIX file system as regular flat text files. The Unicage methodology defines how data is organized and then processed by our proprietary commands. The uniqueness of the solution lies in its ability to use flat text files instead of requiring middleware or relational database engines, which slow down execution.

How is the data distributed across data units?
Parallel processing follows a master-slave model in which flat text files are divided across all available nodes according to the logic of the script. The nodes execute the script, and their outputs are then merged and consolidated into a new text file containing the result of the execution.
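
The divide, execute and merge steps can be sketched on a single machine with standard tools. This is only an illustration under stated assumptions: each chunk stands in for the share of the file sent to one node, background processes stand in for the nodes, and the file names and the "key value" layout are hypothetical. It does not show the Unicage cluster commands themselves.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical input: 1000 records of "key value".
awk 'BEGIN { for (i = 1; i <= 1000; i++) printf "%04d %d\n", i % 10, i }' > input.txt

# Divide the flat file into chunks of 250 lines (chunk.aa, chunk.ab, ...).
split -l 250 input.txt chunk.

# Run the same processing step on every chunk in parallel.
for f in chunk.??; do
  awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' "$f" > "$f.out" &
done
wait  # block until all background workers have finished

# Merge and consolidate the partial results into one result file.
cat chunk.??.out |
  awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' |
  sort -k1,1 > result.txt

keys=$(wc -l < result.txt | tr -d ' ')
grand_total=$(awk '{ t += $2 } END { print t }' result.txt)
echo "keys=$keys total=$grand_total"
cd / && rm -rf "$work"
```

The per-chunk step and the consolidation step use the same aggregation, which works here because summing is associative; other kinds of processing may need a different merge step.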

5. Data file management (levels)

Unicage organizes data files into business units along five levels (‘OSAHO’ – Unicage’s etiquette of data file management)

  • Level 1 (event data)
  • Level 2 (confirmed data)
  • Level 3 (organized data)
  • Level 4 (application reference data)
  • Level 5 (application output data)

Data files and programs are managed in a pre-defined folder structure

  • DATA/LV1, DATA/LV2, DATA/LV3, DATA/LV4, DATA/LV5: Level 1-5 data files
  • SYS: Shell script
  • LOG: Running log of the shell script
  • LAYOUT: Layout file
  • SEMAPHORE: Semaphore file
  • RCV: Receive files from an external system
  • SND: Send files to an external system
  • BACKUP: Backup of shell scripts

Unicage does not require the existence of special file management software
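
Because the structure consists of ordinary directories, it can be created with mkdir alone. The sketch below assumes a hypothetical application root called myapp under a temporary directory; DATA/LV4 is included on the assumption that the "Level 1-5" wording covers it.

```shell
#!/bin/sh
# Hypothetical application root; the directory names follow the list above.
base=$(mktemp -d)
app_root="$base/myapp"

for d in DATA/LV1 DATA/LV2 DATA/LV3 DATA/LV4 DATA/LV5 \
         SYS LOG LAYOUT SEMAPHORE RCV SND BACKUP; do
  mkdir -p "$app_root/$d"
done

# List the resulting tree relative to the root.
created=$(cd "$app_root" && find . -type d | sort)
echo "$created"
rm -rf "$base"
```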

6. Data file relation (join commands)

You can join multiple files on a key with the join commands, such as join1 (inner join) and join2 (outer join).
For example, joining a sales data file with an item master file generates a new file in which the item information is added to each sales record.
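
The difference between the two kinds of join can be sketched with the standard POSIX join utility. This is an analogy only: join1 and join2 are Unicage commands whose options and output layout are not shown here, and the file names, field layout and the "_" filler are assumptions made for the example. Both input files must be sorted on the key.

```shell
#!/bin/sh
work=$(mktemp -d)

# Hypothetical item master: item code, item name (sorted by item code).
cat > "$work/item_master.txt" <<'EOF'
A001 apple
B002 banana
EOF

# Hypothetical sales data: item code, quantity (sorted by item code).
cat > "$work/sales.txt" <<'EOF'
A001 3
B002 5
C003 7
EOF

# Inner join: only sales rows whose key exists in the master.
inner=$(join "$work/sales.txt" "$work/item_master.txt")

# Outer join: keep every sales row; "_" fills the missing item name.
outer=$(join -a 1 -e _ -o 0,1.2,2.2 "$work/sales.txt" "$work/item_master.txt")

echo "inner:"; echo "$inner"
echo "outer:"; echo "$outer"
rm -rf "$work"
```

The sales row for C003 has no entry in the master, so it is dropped by the inner join and kept with a filler by the outer join.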

7. Data Integrity

A typical DBMS allows only one data item to be stored for each key. The integrity and centralized management of data is governed by this “data is physically unique” principle.
As a result, if, for example, the results of several processes such as daily and monthly data processing do not match, the system has no way to resolve the inconsistency.

Unicage, on the other hand, is built on the concept that “original data is the most important” and does not require “physically unique data”.
In other words, all data that represents a fact is stored as a Level 1 file, and the result of each process is kept unique, which assures the integrity of both the original data and the results. If the results of several processes are inconsistent, all you need to do is correct the processing and run it again, which repairs the mismatch.
Because data integrity in Unicage relies on storing the correct “original data”, we enhance the safety of the system with copies and backups so that the original data is not compromised.
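
The idea of keeping the original data untouched and regenerating every result from it can be sketched as follows. The file names, the date/amount layout, and the choice of DATA/LV3 for the derived files are assumptions made for the example; standard awk and sort stand in for Unicage commands.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/DATA/LV1" "$work/DATA/LV3" "$work/BACKUP"

# Hypothetical Level 1 event file: date and amount, recorded as facts.
cat > "$work/DATA/LV1/events.txt" <<'EOF'
20240101 100
20240101 40
20240102 60
EOF
chmod a-w "$work/DATA/LV1/events.txt"              # original data is never edited in place
cp -p "$work/DATA/LV1/events.txt" "$work/BACKUP/"  # keep a copy of the original

# Derived results are always regenerated from Level 1, never patched by hand.
make_daily()   { awk '{ s[$1] += $2 } END { for (d in s) print d, s[d] }' "$1" | sort; }
make_monthly() { awk '{ s[substr($1, 1, 6)] += $2 } END { for (m in s) print m, s[m] }' "$1" | sort; }

make_daily   "$work/DATA/LV1/events.txt" > "$work/DATA/LV3/daily.txt"
make_monthly "$work/DATA/LV1/events.txt" > "$work/DATA/LV3/monthly.txt"

# Consistency check: the daily totals must add up to the monthly total.
daily_sum=$(awk '{ t += $2 } END { print t }' "$work/DATA/LV3/daily.txt")
monthly_sum=$(awk '{ t += $2 } END { print t }' "$work/DATA/LV3/monthly.txt")
echo "daily=$daily_sum monthly=$monthly_sum"
rm -rf "$work"
```

If the two sums ever disagree, the fix is to correct make_daily or make_monthly and run them again on the same Level 1 file.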

8. Access control of data files

In a database, the DBMS automatically locks the physical record for exclusive access control. In Unicage, on the other hand, you explicitly specify the range that you want to process exclusively by using the ulock command in your program.
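
Explicit locking of a chosen range of a script can be sketched with a portable shell idiom. This is not the Unicage lock command, whose options are not shown here; as a stand-in the example uses mkdir, which either creates the directory or fails in one atomic step. The lock path and counter file are hypothetical.

```shell
#!/bin/sh
work=$(mktemp -d)
lock="$work/SEMAPHORE.lock"   # hypothetical lock path
counter="$work/counter.txt"
echo 0 > "$counter"

increment() {
  until mkdir "$lock" 2>/dev/null; do sleep 1; done   # acquire the lock
  n=$(cat "$counter")
  echo $((n + 1)) > "$counter"                        # exclusive section
  rmdir "$lock"                                       # release the lock
}

# Five concurrent writers update the same file.
for i in 1 2 3 4 5; do
  increment &
done
wait

final=$(cat "$counter")
echo "final=$final"
rm -rf "$work"
```

Only the read-modify-write step is serialized; everything outside the locked range continues to run in parallel.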

9. Changes rollback

scheme
scheme

10. Backup of data files

Linux backup commands such as tar and cpio can be used.
For example, to back up the directory user1 under /home to a tape device:
# cd /home
# tar cvf /dev/nst0 user1
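
The same procedure can be tried without a tape drive. The sketch below is an assumption-laden variant of the commands above: it writes the archive to a regular file inside a temporary directory, lists the archive to verify it, and restores it to a separate location. All paths are hypothetical.

```shell
#!/bin/sh
# Throwaway directory tree; the archive goes to a file, not a tape.
work=$(mktemp -d)
mkdir -p "$work/home/user1"
echo "sample" > "$work/home/user1/data.txt"

cd "$work/home" || exit 1
tar cf "$work/user1_backup.tar" user1          # create the archive
listing=$(tar tf "$work/user1_backup.tar")     # list the contents to verify

# Restore into a separate directory and read the file back.
mkdir "$work/restore"
(cd "$work/restore" && tar xf "$work/user1_backup.tar")
restored=$(cat "$work/restore/user1/data.txt")

echo "$listing"
cd / && rm -rf "$work"
```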

11. Interoperability, locking, compression and memory management

Interoperability
Unicage is based on UNIX fundamentals. Interoperability is achieved through frameworks such as Tivoli (IBM) or JP1 (Hitachi).

Locking
Unicage does not require locking unless a concurrency situation arises. For that purpose, the “ulock” command is available.

Compression
Unicage is compatible with multiple compression tools. For example, the .gz and .Z formats are used for data compression.
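
Because the data files are plain text, a compressed file can be fed directly into a pipeline. The following is a minimal sketch using gzip and awk; the file name and layout are assumptions made for the example.

```shell
#!/bin/sh
work=$(mktemp -d)
printf '%s\n' 'A001 100' 'B002 300' 'A001 250' > "$work/sales.txt"

gzip "$work/sales.txt"      # replaces sales.txt with sales.txt.gz

# Decompress to standard output and feed the stream straight into the
# next command; no uncompressed copy is written to disk.
total=$(gzip -dc "$work/sales.txt.gz" | awk '{ t += $2 } END { print t }')
echo "total=$total"
rm -rf "$work"
```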

Memory Management
Unicage processes data as a stream, which reduces memory usage compared with technologies such as Java or Python. As explained in the answer on workload management support, memory can be managed through UNIX commands.

12. Security (authentication, authorization, encryption, …)

Unicage relies on the segregation mechanisms of the underlying UNIX operating system: filesystem permissions, memory stack protection and role-based access control.
Encryption can be achieved at several levels: either native filesystem encryption mechanisms (F2FS in Linux or ZFS in BSD/UNIX) or third-party mechanisms offered by a number of vendors. Self-encrypting disks are also an option, as Unicage uses only the operating system's POSIX infrastructure to access data storage.
Security can be increased further through file checksumming tools (native capabilities or products such as Tripwire), host-based firewall rules, etc.
Prevention techniques are also used: the platforms used with Unicage require only a minimal installation and can be minimized further at deployment time, which reduces the attack footprint dramatically (especially compared with Windows-based servers or “vanilla” Linux-based appliances).

A typical Linux/UNIX node running Unicage needs only SSH as an open port. This can be ensured by service minimization (disabling and/or uninstalling unnecessary services), complemented by a host-based firewall that permits access only from known hosts. SSH access is restricted to a number of known users (no administrator/superuser access is granted).
Essential configuration files are then protected by checksumming to ensure that no one alters their content.
The filesystem can be encrypted to prevent data loss, and processes run only with the privileges they need and inside a protected memory stack.
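
Checksum protection of a configuration file can be sketched with the POSIX cksum utility. This is an illustration only: the file name app.conf is hypothetical, and cksum computes a CRC, which detects accidental changes but is not a cryptographic hash, so a real deployment would more likely use a cryptographic checksum tool or a product such as Tripwire.

```shell
#!/bin/sh
work=$(mktemp -d)
conf="$work/app.conf"          # hypothetical configuration file
echo "port=22" > "$conf"
chmod 600 "$conf"              # owner-only read/write

# Record a baseline checksum.
cksum "$conf" > "$work/baseline.cksum"

verify() {
  # Recompute and compare with the baseline; succeed only if unchanged.
  [ "$(cksum "$conf")" = "$(cat "$work/baseline.cksum")" ]
}

verify && before=unchanged || before=modified
echo "port=2222" >> "$conf"    # simulate an unauthorized edit
verify && after=unchanged || after=modified

echo "before=$before after=$after"
rm -rf "$work"
```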

13. Security of application and data

1. Security of application
You can develop an application (access tool) with access rights by using the “getpermission” command, which determines the permissions by referring to a permission table.

2. Security of data
Non-developers cannot directly access the Unicage text files. If you do not want developers to access those files, you can change the OS settings (for example, use SELinux settings). If you do not want the system administrator (the person who configures the OS) to access those files, encrypt them so that the security administrator’s password is required.
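
At the simplest level, restricting access to the text files comes down to ordinary UNIX permissions. The sketch below shows that base layer only, under assumed file and directory names; SELinux policies and encryption are separate mechanisms and are not shown.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir "$work/DATA"

# Files created from here on are readable and writable by the owner only.
umask 077
echo "A001 100" > "$work/DATA/sales.txt"

# Restrict the directory itself so other accounts cannot list or enter it.
chmod 700 "$work/DATA"

file_mode=$(ls -l "$work/DATA/sales.txt" | cut -c1-10)
dir_mode=$(ls -ld "$work/DATA" | cut -c1-10)
echo "$file_mode $dir_mode"
rm -rf "$work"
```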

14. Data file relation (join commands)

scheme

15. UNIX parallel processing

UNIX is a multi-user, multi-tasking OS. You can run multiple jobs for multiple users at the same time.

Parallel processing commands used:

  • Specify “&” (background) when the job starts to run it in parallel
  • The “bg” and “fg” commands switch a job between background (parallel) and foreground (sequential) execution
  • The “nice” command changes the priority of a job
  • The “stop” and “kill” commands suspend or terminate a job
  • The “jobs”, “ps” and “pstree” commands allow monitoring of parallel processing

The job control commands above make it possible to write a shell script that performs parallel processing across any number of processes.
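
A minimal sketch of such a script is shown below. The two jobs and their log files are hypothetical placeholders for real processing steps; the point is only how “&”, “nice” and wait fit together.

```shell
#!/bin/sh
work=$(mktemp -d)

# A stand-in job: writes its name when it completes.
job() { sleep 1; echo "$1 done" > "$work/$1.log"; }

job a &                 # "&" starts the job in the background
pid_a=$!

# nice runs an external command, so the second job is wrapped in sh -c.
nice -n 10 sh -c "sleep 1; echo 'b done' > '$work/b.log'" &
pid_b=$!

# Both jobs are now running concurrently; wait for each and collect its status.
wait "$pid_a"; status_a=$?
wait "$pid_b"; status_b=$?

result=$(cat "$work/a.log" "$work/b.log")
echo "$result"
rm -rf "$work"
```

Waiting on each process ID separately lets the script check the exit status of every job before it continues.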

16. Partitioning, indexing and concurrency

Partitioning
There are no special requirements on partitioning. Nodes can be independent servers or virtualized (e.g. Docker). The only recommendation we provide is to leave 10% of the disk space free for our data operations.

Indexing
There are no special requirements for indexing as everything is executed on the UNIX file system as a text file.

Concurrency
Unicage is based on UNIX fundamentals, which allow multithreading as well as concurrent and exclusive processes. Our suite of commands includes blocking commands as well as atomic writing.
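
Atomic writing can be illustrated with a general UNIX technique: write the new version to a temporary file and then rename it over the target. This is a sketch of that common idiom under assumed file names, not a description of how the Unicage commands implement it.

```shell
#!/bin/sh
work=$(mktemp -d)
target="$work/master.txt"
echo "old contents" > "$target"

# Write the new version to a temporary file in the same directory,
# so that it is on the same filesystem as the target.
tmp=$(mktemp "$work/master.txt.XXXXXX")
printf '%s\n' 'A001 apple' 'B002 banana' > "$tmp"

# Rename it over the target. A rename within one filesystem is atomic,
# so a concurrent reader sees either the old file or the new one,
# never a partially written file.
mv "$tmp" "$target"

lines=$(wc -l < "$target" | tr -d ' ')
first=$(head -n 1 "$target")
echo "lines=$lines first=$first"
rm -rf "$work"
```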

17. Scalability and workload management support

Scalability
Unicage scales quasi-linearly with extra hardware. The cluster version commands provide an automatic map and reduce process.

Workload Management support
There are multiple frameworks that control UNIX processes. For general workload management, commands such as “ulimit” or “nice” can be used. Finer-grained management is usually handled by OS-level facilities such as “cgroup” or “jail”.
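
As a small sketch of the general-purpose commands, the example below caps the number of open files for one job and runs it at lower priority. The limit of 64 and the niceness of 10 are arbitrary values chosen for illustration; the -n option of ulimit is widely supported by common shells but is an extension beyond the POSIX minimum.

```shell
#!/bin/sh
# The limit is set inside a command substitution (a subshell),
# so it does not affect the calling shell.
out=$(
  ulimit -n 64                  # cap open file descriptors for this subshell
  nice -n 10 sh -c 'ulimit -n'  # the child inherits the cap and runs at lower priority
)
echo "open-file limit seen by the job: $out"
```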

18. Handling of image files

Use the image processing commands available in UNIX/Linux distributions (the examples below use ImageMagick):
• Conversion of the image format
convert <original file.extension> <file name after conversion.extension> (e.g. to convert from JPG to PPM, use convert test.jpg test.ppm)
• Conversion to monochrome
When converting the image format, a color image can be converted to a monochrome format (e.g. convert test.jpg test.pgm)
• Image scaling
Use the “convert -scale” command: convert -scale 30% test.jpg test.ppm
• Trimming
convert -crop <width>x<height>+<x of upper-left>+<y of upper-left> filename newfilename
• Creating an animated GIF from multiple images
convert image1 image2 image3 … output_animation.gif
• Combining multiple images
Create a catalog sheet of multiple images: montage Image1 Image2 Image3 … Output_catalog_file