Shawn Lin's blog

GNU libextractor

Shawn Lin's picture


GNU libextractor is GNU’s library for extracting meta data from files. Meta data includes format information (such as mime type, image dimensions, color depth, recording frequency), content descriptions (such as document title or document description) and copyright information (such as license, author and contributors). Currently, libextractor supports the following formats: HTML, PDF, PS, OLE2 (DOC, XLS, PPT), OpenOffice (sxw), StarOffice (sdw), DVI, MAN, FLAC, MP3 (ID3v1 and ID3v2), NSF(E) (NES music), SID (C64 music), OGG, WAV, EXIV2, JPEG, GIF, PNG, TIFF, DEB, RPM, TAR(.GZ), ZIP, ELF, S3M (Scream Tracker 3), XM (eXtended Module), IT (Impulse Tracker), FLV, REAL, RIFF (AVI), MPEG, QT and ASF. Also, various additional MIME types are detected.

Libextractor is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License. GNU libextractor uses plugins to handle various file formats. Technically a plugin can support multiple file formats; however, most plugins only support one particular format. By default, GNU libextractor will use all plugins that are available and found in the plugin installation directory. Applications can request the use of only specific plugins or the exclusion of certain plugins.
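The plugin idea can be illustrated with a small Python sketch. This is not libextractor's API: the names `PLUGINS`, `extract`, `only`, and `exclude` are hypothetical and exist only to show how "use every plugin by default, or include/exclude specific ones" might look. Each toy plugin recognizes a single format by its magic bytes.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical plugin type: takes raw file bytes, returns any metadata it found.
Plugin = Callable[[bytes], Dict[str, str]]


def png_plugin(data: bytes) -> Dict[str, str]:
    # PNG files start with a fixed 8-byte signature.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return {"mimetype": "image/png"}
    return {}


def gif_plugin(data: bytes) -> Dict[str, str]:
    # GIF files start with "GIF87a" or "GIF89a".
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return {"mimetype": "image/gif"}
    return {}


# Stand-in for "all plugins found in the plugin installation directory".
PLUGINS: Dict[str, Plugin] = {"png": png_plugin, "gif": gif_plugin}


def extract(
    data: bytes,
    only: Optional[Iterable[str]] = None,
    exclude: Iterable[str] = (),
) -> Dict[str, str]:
    """Run the selected plugins over data and merge their results."""
    names = list(only) if only is not None else list(PLUGINS)
    excluded = set(exclude)
    metadata: Dict[str, str] = {}
    for name in names:
        if name in excluded:
            continue
        metadata.update(PLUGINS[name](data))
    return metadata


gif_bytes = b"GIF89a" + b"\x00" * 10
print(extract(gif_bytes))                   # all plugins
print(extract(gif_bytes, only=["png"]))     # only the PNG plugin
print(extract(gif_bytes, exclude=["gif"]))  # everything except the GIF plugin
```

With all plugins enabled the GIF data is recognized; restricting the run to the PNG plugin, or excluding the GIF plugin, yields no metadata for the same bytes.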

Protocol Buffers



  • flexible, efficient, automated mechanism for serializing structured data.
  • think XML, but smaller, faster, and simpler.
  • use special generated source code to easily write and read your structured data.
  • update your data structure without breaking deployed programs that are compiled against the "old" format.

Why not just use XML?
Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

* are simpler
* are 3 to 10 times smaller
* are 20 to 100 times faster
* are less ambiguous
* generate data access classes that are easier to use programmatically
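To get a feel for the size difference, here is a minimal Python sketch that hand-encodes a single integer field the way the Protocol Buffers wire format does (a key byte followed by a base-128 varint). It does not use the protobuf library or generated classes; the helper names `encode_varint` and `encode_int_field`, and the XML element `<a>`, are made up for this illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: key = (field_number << 3) | wire_type."""
    wire_type_varint = 0
    key = (field_number << 3) | wire_type_varint
    return encode_varint(key) + encode_varint(value)


message = encode_int_field(1, 150)
xml = b"<a>150</a>"  # a hypothetical XML rendering of the same value

print(message.hex())           # 089601
print(len(message), len(xml))  # 3 10
```

In this tiny case the binary form takes 3 bytes against 10 for the XML text. Real ratios depend on the data, so treat this as an illustration of where the savings come from rather than a benchmark.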

Parallel programming language Erlang!


Telecommunication companies like Nortel Networks and T-Mobile develop their systems with Erlang to achieve concurrency and fault tolerance. Beyond that, multi-core and Hyper-Threading (HT) processors are very good environments for the Erlang language.

Erlang solves one of the most pressing problems facing developers today: how to write reliable, concurrent, high-performance systems. It's used worldwide by companies who need to produce reliable, efficient, and scalable applications.

Moore's Law is the observation that the amount you can do on a single chip doubles every two years, but Moore's Law is taking a detour. Rather than producing faster and faster processors, companies such as Intel and AMD are producing multi-core devices: single chips containing two, four, or more processors. If your programs aren't concurrent, they'll only run on a single processor at a time. Your users will think that your code is slow.

Erlang is a programming language designed for building highly parallel, distributed, fault-tolerant systems. It has been used commercially for many years to build massive fault-tolerant systems that run for years with minimal failures.

Erlang programs run seamlessly on multi-core computers: this means your Erlang program should run a lot faster on a quad-core processor than on a single core processor, all without you having to change a line of code.

Developing systems with Erlang has the following benefits:

  • Write a program once, move it to a multi-core environment, and it will naturally run faster (it may even achieve linear speedup: n cores giving an n-fold improvement).
  • You can write fault-tolerant systems that restart and recover after a crash.
  • You can write systems with "hot code swapping": you can upgrade your code while it is running, without suspending it.
  • Programs are remarkably concise.
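The "restart after a crash" idea can be sketched in a few lines. The example below is written in Python rather than Erlang, and it is only an analogy: `supervise` and `make_flaky_task` are hypothetical names, and real Erlang/OTP supervisors manage separate processes with far richer restart strategies. The sketch simply lets a task crash and starts it again, up to a restart limit.

```python
from typing import Callable, Tuple


def supervise(task: Callable[[], str], max_restarts: int = 3) -> Tuple[str, int]:
    """Run task; if it crashes, restart it, at most max_restarts times.

    Returns the task's result and the number of restarts that were needed.
    Re-raises the last error once the restart limit is exhausted.
    """
    restarts = 0
    while True:
        try:
            return task(), restarts
        except Exception:
            if restarts >= max_restarts:
                raise
            restarts += 1  # let it crash, then start it again


def make_flaky_task(failures: int) -> Callable[[], str]:
    """Build a task that crashes `failures` times before succeeding."""
    remaining = {"count": failures}

    def task() -> str:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise RuntimeError("simulated crash")
        return "ok"

    return task


result, restarts = supervise(make_flaky_task(2))
print(result, restarts)  # ok 2
```

The task itself contains no error handling; recovery is the supervisor's job. That separation is the part of the design this sketch is meant to show.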

Erlang's Mnesia provides a database management system (DBMS). Mnesia is integrated with Erlang and can be accessed very quickly. It can be configured to replicate data across a number of separate nodes to provide fault-tolerant operation.
In addition to Mnesia, you will almost always use the OTP libraries when developing systems with Erlang. OTP is a set of Erlang libraries and open source programs that help Erlang programs become industrial-grade applications. OTP is Erlang’s source of power; with OTP it is quite easy to write a solid server.

Amazing Graphical Scripting Language


Sikuli is a visual technology to automate and test graphical user interfaces (GUI) using images (screenshots). Sikuli includes Sikuli Script, a visual scripting API for Python, and Sikuli IDE, an integrated development environment for easily writing visual scripts with screenshots. Sikuli Script automates anything you see on the screen without needing internal API support. You can programmatically control a web page, a desktop application running on Windows/Linux/Mac OS X, or even an iPhone application running in an emulator.

Sikuli, which reads much like a Japanese name, is in fact an innovative programming language. A student at MIT (originally from Taiwan) and his friends spent more than three years researching and building it.

It is a new concept: using image recognition to automate many complex instructions.
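To make "find it on the screen by its picture" concrete, here is a deliberately naive Python sketch. It is not how Sikuli works internally and it does not use the Sikuli Script API: `find_pattern` is a hypothetical helper that looks for an exact copy of a small pixel grid inside a larger one, whereas a real tool has to tolerate small visual differences between the screenshot and the screen.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of pixel values; a toy stand-in for a screenshot


def find_pattern(screen: Image, pattern: Image) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first exact match of pattern in screen, or None."""
    pattern_h, pattern_w = len(pattern), len(pattern[0])
    screen_h, screen_w = len(screen), len(screen[0])
    for row in range(screen_h - pattern_h + 1):
        for col in range(screen_w - pattern_w + 1):
            # Compare the pattern row by row against this window of the screen.
            if all(
                screen[row + i][col:col + pattern_w] == pattern[i]
                for i in range(pattern_h)
            ):
                return (row, col)
    return None


screen = [
    [0, 0, 0, 0, 0],
    [0, 0, 7, 8, 0],
    [0, 0, 9, 6, 0],
    [0, 0, 0, 0, 0],
]
button = [
    [7, 8],
    [9, 6],
]

print(find_pattern(screen, button))  # (1, 2)
```

Once the location of the "button" is known, an automation tool can direct a click or keystroke there, which is why no knowledge of the application's internals is needed.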

As Vgod said: “The most important revolution of Sikuli is code readability and ease of use. With screenshots placed directly inside the code, people can directly ‘see’ what they want to control, which no one had thought of before. Previously, only programmers were able to write programs, using mysterious alien languages.”

From the point of view of automation tools, Sikuli is not so unique in what it does, but it is unique in the method it uses. Programming languages are fairly mature tools, and programmers have grown used to the idea that they are difficult to use, or have talked themselves into remarks such as ".NET has become more useful!" or "Wow! Delphi has added a super useful component for the Windows API."

Before Sikuli, no one had overturned the old concepts and come up with such a new and creative way of programming. Sikuli is a real innovation. Screenshots take the place of objects, so you can control window components without knowing the Windows API libraries. Although it has not yet developed to the point where one can write stand-alone applications, it can be used as a desktop automation tool, and it points programmers around the world toward a new way of designing programs.

First, of course, you must download Sikuli and install the Java Runtime Environment (JRE) on your computer. You can then follow the method to easily customize your own operations!