FreeBSD/Linux/UNIX General Commands Manual
PERLEBCDIC(1)          Perl Programmers Reference Guide          PERLEBCDIC(1)

NAME
       perlebcdic - Considerations for running Perl on EBCDIC platforms

DESCRIPTION
       An exploration of some of the issues facing Perl programmers on EBCDIC
       based computers. We do not cover localization, internationalization,
       or multi byte character set issues other than some discussion of UTF-8
       and UTF-EBCDIC.

       Portions that are still incomplete are marked with XXX.

COMMON CHARACTER CODE SETS
   ASCII
       The American Standard Code for Information Interchange is a set of
       integers running from 0 to 127 (decimal) that imply character
       interpretation by the display and other system(s) of computers. The
       range 0..127 can be covered by setting the bits in a 7-bit binary
       number, hence the set is sometimes referred to as "7-bit ASCII".
       ASCII was described by the American National Standards Institute
       document ANSI X3.4-1986. It was also described by ISO 646:1991 (with
       localization for currency symbols). The full ASCII set is given in
       the table below as the first 128 elements. Languages that can be
       written adequately with the characters in ASCII include English,
       Hawaiian, Indonesian, Swahili and some Native American languages.

       There are many character sets that extend the range of integers from
       0..2**7-1 up to 2**8-1, or 8 bit bytes (octets if you prefer). A
       common one is the ISO 8859-1 character set.

   ISO 8859
       The ISO 8859-$n are a collection of character code sets from the
       International Organization for Standardization (ISO), each of which
       adds characters to the ASCII set that are typically found in European
       languages, many of which are based on the Roman, or Latin, alphabet.

   Latin 1 (ISO 8859-1)
       A particular 8-bit extension to ASCII that includes grave and acute
       accented Latin characters. Languages that can employ ISO 8859-1
       include all the languages covered by ASCII as well as Afrikaans,
       Albanian, Basque, Catalan, Danish, Faroese, Finnish, Norwegian,
       Portuguese, Spanish, and Swedish.
       Dutch is covered albeit without the ij ligature. French is covered too
       but without the oe ligature. German can use ISO 8859-1 but must do so
       without German-style quotation marks. This set is based on Western
       European extensions to ASCII and is commonly encountered in world wide
       web work. In IBM character code set identification terminology ISO
       8859-1 is also known as CCSID 819 (or sometimes 0819 or even 00819).

   EBCDIC
       The Extended Binary Coded Decimal Interchange Code refers to a large
       collection of slightly different single and multi byte coded character
       sets that are different from ASCII or ISO 8859-1 and typically run on
       host computers. The EBCDIC encodings derive from 8 bit byte extensions
       of Hollerith punched card encodings. The layout on the cards was such
       that high bits were set for the upper and lower case alphabet
       characters [a-z] and [A-Z], but there were gaps within each Latin
       alphabet range.

       Some IBM EBCDIC character sets may be known by character code set
       identification numbers (CCSID numbers) or code page numbers. Leading
       zero digits in CCSID numbers within this document are insignificant.
       E.g. CCSID 0037 may be referred to as 37 in places.

   13 variant characters
       Among IBM EBCDIC character code sets there are 13 characters that are
       often mapped to different integer values. Those characters are known
       as the 13 "variant" characters and are:

           \ [ ] { } ^ ~ ! # | $ @ `

   0037
       Character code set ID 0037 is a mapping of the ASCII plus Latin-1
       characters (i.e. ISO 8859-1) to an EBCDIC set. 0037 is used in North
       American English locales on the OS/400 operating system that runs on
       AS/400 computers. CCSID 37 differs from ISO 8859-1 in 237 places, in
       other words they agree on only 19 code point values.

   1047
       Character code set ID 1047 is also a mapping of the ASCII plus Latin-1
       characters (i.e. ISO 8859-1) to an EBCDIC set. 1047 is used under Unix
       System Services for OS/390 or z/OS, and OpenEdition for VM/ESA.
       CCSID 1047 differs from CCSID 0037 in eight places.

   POSIX-BC
       The EBCDIC code page in use on Siemens' BS2000 system is distinct from
       1047 and 0037. It is identified below as the POSIX-BC set.

   Unicode code points versus EBCDIC code points
       In Unicode terminology a code point is the number assigned to a
       character: for example, in EBCDIC the character "A" is usually
       assigned the number 193. In Unicode the character "A" is assigned the
       number 65. This causes a problem with the semantics of the pack/unpack
       "U", which are supposed to pack Unicode code points to characters and
       back to numbers. The problem is: which code points to use for code
       points less than 256? (For 256 and over there's no problem: Unicode
       code points are used.) In EBCDIC, for the low 256 the EBCDIC code
       points are used. This means that the equivalences

           pack("U", ord($character)) eq $character
           unpack("U", $character) == ord $character

       will hold. (If Unicode code points were applied consistently over all
       the possible code points, pack("U",ord("A")) would in EBCDIC equal A
       with acute or chr(101), and unpack("U", "A") would equal 65, or
       non-breaking space, not 193, or ord "A".)

   Remaining Perl Unicode problems in EBCDIC
       o   Many of the remaining problems seem to be related to
           case-insensitive matching: for example, "/[\x{131}]/" (LATIN SMALL
           LETTER DOTLESS I) does not match "I" case-insensitively, as it
           should under Unicode. (The match succeeds on ASCII-derived
           platforms.)

       o   The extensions Unicode::Collate and Unicode::Normalize are not
           supported under EBCDIC, likewise for the encoding pragma.

   Unicode and UTF
       UTF is a Unicode Transformation Format. UTF-8 is a Unicode conforming
       representation of the Unicode standard that looks very much like
       ASCII. UTF-EBCDIC is an attempt to represent Unicode characters in an
       EBCDIC transparent manner.
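The code point discussion above can be tried out on any platform. The following is a minimal sketch, not part of the original manual: it assumes that the installed Encode module provides the cp37, cp1047 and posix-bc tables (the standard distribution normally does), and the helper name ord_in is made up for this illustration. It prints the ordinal that "A" has in each code set and then checks the pack/unpack "U" equivalences on whatever platform it runs on.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: ordinal of a single character after encoding it
# into the named single-octet code set.
sub ord_in {
    my ($encoding, $char) = @_;
    return ord(encode($encoding, $char));
}

# Compare the ordinal of "A" in Latin-1 and in three EBCDIC code sets.
for my $encoding (qw(latin1 cp37 cp1047 posix-bc)) {
    printf "%-9s A = %d\n", $encoding, ord_in($encoding, 'A');
}

# The two equivalences quoted above, checked for one character in the
# native character set of the platform running this script.
my $char = 'A';
if (pack("U", ord($char)) eq $char && unpack("U", $char) == ord($char)) {
    print "pack/unpack \"U\" round trip holds for $char\n";
}
```

The latin1 row reports 65 and the three EBCDIC rows report 193, the two values quoted in the text above for "A".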
   Using Encode
       Starting from Perl 5.8 you can use the new standard module Encode to
       translate from EBCDIC to Latin-1 code points

           use Encode 'from_to';

           my %ebcdic = ( 176 => 'cp37', 95 => 'cp1047', 106 => 'posix-bc' );

           # $a is in EBCDIC code points
           from_to($a, $ebcdic{ord '^'}, 'latin1');
           # $a is ISO 8859-1 code points

       and from Latin-1 code points to EBCDIC code points

           use Encode 'from_to';

           my %ebcdic = ( 176 => 'cp37', 95 => 'cp1047', 106 => 'posix-bc' );

           # $a is ISO 8859-1 code points
           from_to($a, 'latin1', $ebcdic{ord '^'});
           # $a is in EBCDIC code points

       For doing I/O it is suggested that you use the autotranslating
       features of PerlIO, see perluniintro.

       Since version 5.8 Perl uses the new PerlIO I/O library. This enables
       you to use different encodings per IO channel. For example you may use

           use Encode;
           open($f, ">:encoding(ascii)", "test.ascii");
           print $f "Hello World!\n";
           open($f, ">:encoding(cp37)", "test.ebcdic");
           print $f "Hello World!\n";
           open($f, ">:encoding(latin1)", "test.latin1");
           print $f "Hello World!\n";
           open($f, ">:encoding(utf8)", "test.utf8");
           print $f "Hello World!\n";

       to get four files containing "Hello World!\n" in ASCII, CP 37 EBCDIC,
       ISO 8859-1 (Latin-1) (in this example identical to ASCII) and
       UTF-EBCDIC (in this example identical to normal EBCDIC),
       respectively. See the documentation of Encode::PerlIO for details.

       As the PerlIO layer uses raw IO (bytes) internally, all this totally
       ignores things like the type of your filesystem (ASCII or EBCDIC).

SINGLE OCTET TABLES
       The following tables list the ASCII and Latin 1 ordered sets including
       the subsets: C0 controls (0..31), ASCII graphics (32..7e), delete
       (7f), C1 controls (80..9f), and Latin-1 (a.k.a. ISO 8859-1) (a0..ff).
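A small slice of a table of this kind can be regenerated with Encode, which may help when checking individual entries. This is only a sketch under assumptions: it relies on the cp37, cp1047 and posix-bc tables being available in the installed Encode, the helper name ordinals_for and the choice of sample characters are invented for the illustration, and the column layout is not meant to reproduce the manual's own table.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Column order used for the output below.
my @encodings = qw(latin1 cp37 cp1047 posix-bc);

# Hypothetical helper: the ordinals of one character in each code set,
# returned in the same order as @encodings.
sub ordinals_for {
    my ($char) = @_;
    my @ordinals;
    for my $encoding (@encodings) {
        push @ordinals, ord(encode($encoding, $char));
    }
    return @ordinals;
}

# Header row, then one row per sample character. The last three samples
# are taken from the list of 13 variant characters.
printf "%-5s%-9s%-9s%-9s%s\n", 'chr', @encodings;
for my $char ('A', 'a', '0', '^', '[', ']') {
    printf "%-5s%-9d%-9d%-9d%d\n", $char, ordinals_for($char);
}
```

The row for "^" is the one the %ebcdic hash in the Using Encode section keys on: 176 under cp37, 95 under cp1047 and 106 under posix-bc.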
       In the table non-printing control character names as well as the Latin
       1 extensions to ASCII have been labelled with character names roughly
       corresponding to The Unicode Standard, Version 3.0 albeit with
       substitutions such as s/LATIN// and s/VULGAR// in all cases,
       s/CAPITAL LETTER// in some cases, and s/SMALL LETTER ([A-Z])/\l$1/ in
       some other cases (the "charnames" pragma names unfortunately do not
       list explicit names for the C0 or C1 control characters). The "names"
       of the C1 control set (128..159 in ISO 8859-1) listed here are
       somewhat arbitrary. The differences between the 0037 and 1047 sets are
       flagged with ***. The differences between the 1047 and POSIX-BC sets
       are flagged with ###. All ord() numbers listed are decimal. If you
       would rather see this table listing octal values then run the table
       (that is, the pod version of this document since this recipe may not
       work with a pod2_other_format translation) through:

       recipe 0

           perl -ne 'if(/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \
            -e '{printf("%s%-9o%-9o%-9o%o\n",$1,$2,$3,$4,$5)}' perlebcdic.pod

       If you want to retain the UTF-x code points then in script form you
       might want to write:

       recipe 1

           open(FH," |