Blogs

Node.js

Griffith's picture

Node.js is an evented I/O framework built on top of Google’s V8 JavaScript engine; its design is influenced by systems like Ruby’s EventMachine and Python’s Twisted. Node’s goal is to provide an easy way to build high-performance, real-time, scalable web applications.

JavaScript has traditionally run only in the web browser. In recent years, projects such as CommonJS, Jaxer and Narwhal have reflected considerable interest in bringing JavaScript to the server side as well. In contrast to concurrency models that rely on OS threads, Node is event-based rather than thread-based. The thread-based model often scales poorly with the many long-lived connections that real-time applications need, becoming relatively inefficient and complex.
Node takes an alternative approach: it tells the OS that it should be notified when a new connection is made, and then it goes to sleep. When a new connection arrives, the callback is executed; each connection is only a small heap allocation. This results in much better memory efficiency under high load than systems that allocate a 2 MB thread stack for each connection. Furthermore, Node is free of locks: almost no function in Node directly performs I/O, so the process never blocks, and programmers do not need to worry about deadlocking the process.

Node’s advantage comes from the fact that most thread-based models spend the majority of their time waiting for I/O operations, which are much slower than memory operations. Node’s I/O operations are asynchronous, which means that it can continue to process incoming requests while an I/O operation is taking place.
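To make the non-blocking behaviour concrete, here is a minimal sketch in which a timer stands in for a slow I/O operation. The names `handleRequest` and `events` and the 10 ms delay are illustrative assumptions, not part of Node's API.

```javascript
// A timer stands in for a slow I/O operation (disk read, database query, ...).
const events = [];

function handleRequest(id) {
  events.push('request ' + id + ' accepted');
  // Register a callback and return immediately instead of waiting.
  setTimeout(function () {
    events.push('request ' + id + ' I/O finished');
    if (id === 2) console.log(events.join('\n'));
  }, 10);
}

handleRequest(1);
handleRequest(2);
events.push('event loop is free again');
```

Both requests are accepted, and control returns to the caller, before either simulated I/O operation completes; the callbacks run later, when their timers fire.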

The following is an example of a web server written in Node which responds with “Hello World” for every request.

var http = require('http');

http.createServer(function (request, response) {
  response.writeHead(200, {'Content-Type': 'text/plain'});
  response.end('Hello World\n');
}).listen(8124);

console.log('Server running at http://127.0.0.1:8124');

This simple script imports the http module and creates an HTTP server. The anonymous function passed to http.createServer is called every time a request is made to the server.

Node is a very exciting technology built on top of another powerful technology, V8. It has gathered a lot of attention within the technology community, and with its great module system, there are many third party modules available for just about everything.

Opera Unite: Your Browser is Now a Web Server

Ricky Wu's picture

Opera Unite, a technology introduced with the new version of the Opera browser, blurs the boundary between client and server. By walking through a few simple setup steps, you can have your own dedicated web server. With Opera Unite you no longer need to find web hosting; you can share your files, documents, videos and pictures with anyone you permit to access your host. Of course, you must keep Opera Unite running and your computer awake for these services to stay available.

Opera Unite lets your PC act as either a client or a server. Unlike a traditional web server installation, it simplifies the setup so that users can configure their own server more easily; for example, it removes the need to set up port forwarding as in a traditional network configuration. It is also cross-platform and built on an open network architecture, which reduces the complexity of developing web services.


Opera Unite comes with six basic services: File Sharing, Fridge, Media Player, Photo Sharing, The Lounge and Web Server. The File Sharing service lets you share files of any type with friends, no matter how big they are, and you can define your own access rules to keep file sharing private. With the Unite Media Player you can browse and play your music directly from anywhere; the only requirement is an internet connection. The Lounge creates a chat room hosted on your own computer, and Fridge lets friends and family leave virtual sticky notes. Together they provide a more private and secure platform for passing instant messages without installing any instant messaging application.


Looking ahead, Opera Unite provides an open platform that breaks the old-fashioned client-server rule. Its ease of use may offer online users an alternative peer-to-peer model, and it might even replace today's centralized peer-to-peer communities as the network develops. In this architecture, users manage their private information on their own host, so personal information can be shared with others in a safer and more reliable way. For developers, the technology lowers the cost of building a development environment and shortens development cycles for network services. The open network architecture is also flexible enough to support a wide variety of services. For example, if a recent online game such as ‘Happy Farm’ were rebuilt without a centralized model, I think it would become smaller and grow in various ways.

Reference:

unite.opera.com

ReadWriteWeb: Your Browser is Now a Web Server: Opera Includes Opera Unite in Opera 10.10

Cassandra Introduction -- data model

Introduction:

As the number of insertions and queries against a database grows, we may need to scale out the architecture by adding machines to handle the volume of data. With a traditional MySQL database, however, adding a new machine takes a lot of work (i.e. sharding, where we partition the data across different machines). Sometimes only key-value queries are needed and JOIN operations are not. This makes us wonder whether there is an alternative solution for database scalability. A search on the internet shows that many distributed key-value databases have been developed for this situation. Among them, Cassandra is a Java-based distributed key-value database created by Facebook. Unlike MySQL, which offers the JOIN operation, Cassandra is good at dealing with distributed data. You can view the whole cluster as one big hash table in which fault tolerance and data partitioning are handled for you. It provides "incremental scalability" (which means you can increase throughput by adding new nodes). Cassandra also supports a "Column" feature, which makes it more convenient than plain key-value database systems.

Let me show you the key elements of Cassandra:


Basic key-value database:

Table['key1'] = value1

With Column feature:


Table['key1']['column family1']['column1'] = value1


Data Model:

Cassandra's data model can be thought of as a four- or five-dimensional hash table. From top to bottom, the hierarchy is: Key Space, Column Family, row key, Super Column (optional), Column.



So the query will look like this:

get <ksp>.<cf>['<key>']['<col>']               Get a column value.
get <ksp>.<cf>['<key>']['<super>']['<col>']    Get a sub column value.
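The nested hash table view can be sketched with plain JavaScript objects. This is only an illustration of the hierarchy, not how Cassandra stores data; the key space, column family and row names ("Blog", "Users", "Posts", "alice") are made up.

```javascript
// Key space -> column family -> row key -> column name -> {value, timestamp}.
// All names below are made up for illustration.
const keyspaces = {
  Blog: {
    Users: {
      alice: {
        email: { value: 'alice@example.com', timestamp: 1 },
        city: { value: 'Taipei', timestamp: 2 }
      }
    },
    // A super column adds one more level between the row key and the column.
    Posts: {
      alice: {
        post1: { title: { value: 'Hello', timestamp: 3 } }
      }
    }
  }
};

// Walk the nested maps; returns undefined if any level is missing.
function get(ksp, cf, key, ...path) {
  let node = keyspaces[ksp] && keyspaces[ksp][cf] && keyspaces[ksp][cf][key];
  for (const name of path) {
    if (node === undefined) return undefined;
    node = node[name];
  }
  return node === undefined ? undefined : node.value;
}

console.log(get('Blog', 'Users', 'alice', 'email'));          // a column value
console.log(get('Blog', 'Posts', 'alice', 'post1', 'title')); // a sub column value
```

The two calls mirror the two query forms: four lookups reach a column, and five reach a column inside a super column.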



Key Space:

    In Cassandra you can define many Key Spaces. A Key Space is roughly comparable to a database (schema) in MySQL rather than to a single table. It contains a list of {Row, [ColumnFamily]}. Normally there is one Key Space per application.


Row:

    Each row key gives access to the data stored under it in the related Column Families. The data in each Column Family is sorted by row key. A row key does not need to have data in every Column Family.

        
Column Family:

    A Column Family contains a list of Columns or a list of Super Columns. You must define it in the configuration before Cassandra starts. Each Column Family is stored in a separate file, and the number of Columns in a Column Family is unlimited.


Column:

    A Column is the smallest element of data; it contains only a name, a value and a timestamp. You can add or delete Columns at any time.


Super Column:

    A Super Column is a container that holds Columns.


Architecture:

Cassandra uses consistent hashing for key distribution and partitioning. Each node in a Cassandra cluster takes a token (0 < token < 2^32) on the ring, whose size is 2^32. When a key arrives, Cassandra computes the MD5 hash of the key and finds the smallest token that is larger than the hash. The key is mapped to the node that owns that token, so the data is stored on that node.

In the following example, the key is inserted into node 2.


Replicate method:

If you want to store two replicas of the data in a Cassandra cluster, it stores the data on the next two nodes on the ring.
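The lookup and replica placement described above can be sketched as follows. This is a simplified illustration, not Cassandra's implementation: the node names and token values are made up, and taking the first 4 bytes of the MD5 digest as the ring position is an assumption chosen to fit the 2^32 ring in the text.

```javascript
const crypto = require('crypto');

const RING_SIZE = 2 ** 32;

// Hypothetical cluster: made-up node names and tokens, sorted by token.
const ring = [
  { token: 1000000000, node: 'node1' },
  { token: 2000000000, node: 'node2' },
  { token: 3000000000, node: 'node3' }
];

// Map a key to a ring position (assumption: first 4 bytes of the MD5 digest).
function positionOf(key) {
  return crypto.createHash('md5').update(key).digest().readUInt32BE(0);
}

// Index of the first node whose token is larger than the position;
// positions past the last token wrap around to the first node.
function ownerIndex(position) {
  const i = ring.findIndex(function (entry) { return entry.token > position; });
  return i === -1 ? 0 : i;
}

// The copies go to the next nodes on the ring, starting with the owner.
function nodesFor(position, copies) {
  const start = ownerIndex(position);
  const result = [];
  for (let k = 0; k < Math.min(copies, ring.length); k++) {
    result.push(ring[(start + k) % ring.length].node);
  }
  return result;
}

const pos = positionOf('user:42');
console.log(pos < RING_SIZE, nodesFor(pos, 2));
```

With these tokens, a key hashing to position 1500000000 lands on node2, and its two copies go to node2 and node3.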

Adding a new node:

With consistent hashing, adding a new node affects only its neighbors, so we do not need to rehash all the data. The new node chooses a token at random and finds its location on the ring accordingly. In this case, some of the data stored on node 1 is now stored on the new node 4.
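A small simulation shows why only one neighbor is affected. It is a sketch under simplifying assumptions: the ring is shrunk to positions 0 to 3999, the tokens are made up, and the new node's token is fixed at 500 rather than chosen at random.

```javascript
// Owner of a position on a ring of {token, node} entries sorted by token:
// the first node with a larger token, wrapping around to the first node.
function ownerOf(ring, position) {
  const entry = ring.find(function (e) { return e.token > position; });
  return (entry || ring[0]).node;
}

const before = [
  { token: 1000, node: 'node1' },
  { token: 2000, node: 'node2' },
  { token: 3000, node: 'node3' }
];

// node4 joins with the made-up token 500.
const after = before
  .concat([{ token: 500, node: 'node4' }])
  .sort(function (a, b) { return a.token - b.token; });

// Sample positions on the small ring and record which ones change owner.
const moved = [];
for (let position = 0; position < 4000; position += 100) {
  const oldOwner = ownerOf(before, position);
  const newOwner = ownerOf(after, position);
  if (oldOwner !== newOwner) {
    moved.push({ position: position, from: oldOwner, to: newOwner });
  }
}

console.log(moved.length + ' of 40 sampled positions moved');
```

Every position that changes owner moves from node1 to node4; the data on node2 and node3 stays where it is.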


Login with Facebook and retrieve user identities

Hank Chen's picture

Facebook, abbreviated as FB, is a social networking website, and many websites integrate FB for promotion. How do you integrate FB into your own website? The steps are as follows.
1.Your options for the API:
The official Facebook API can be divided into two parts: web programming, which is implemented in JavaScript or PHP, and mobile programming for iOS or Android. Because FB is so popular, APIs for FB have also been developed for many other platforms and languages, such as Silverlight, Flash, .NET and Java. This article uses JavaScript as the example.
2.Before developing:
2.1.Sign up for a Facebook account.
2.2.Log in to your Facebook account and sign up as a “Developer”. You can get more information from the following URL:
http://sofree.cc/fb-app-1/
3.Start creating your Facebook app:
3.1.After completing Step 2, click the “Create New APP” button to get your App ID and App Secret. Then fill out the “Basic Info” form. For example, our test website is “me.cellopoint.com”, so fill in “me.cellopoint.com” in the “App Domain” field and “http://me.cellopoint.com” in the “URL” field. Finally, click “Save” to finish.

3.2.In the “me.cellopoint.com” folder, add two new files, “channel.html” and “index.html”, then open “channel.html” and add the following JavaScript code:

<script src="//connect.facebook.net/en_US/all.js"></script>
3.3.Open “index.html”, paste the following code into it, and edit the two lines that set the App ID and the channel file URL (see the comments in the code). Note that the JavaScript code must be placed after the opening <body> tag.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:fb="http://www.facebook.com/2008/fbml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Login</title>
</head>
<body>
<div id="fb-root"></div>
<script src="//connect.facebook.net/en_US/all.js"></script>
<script>
window.fbAsyncInit = function() {
FB.init({
appId : 'xxxxxxxxxxx',
// App ID: replace it with the App ID you obtained in Step 3.1
channelUrl : '//me.cellopoint.com/channel.html',
// Replace it with the real URL of the channel.html file you set up in Step 3.2

status : true, // check login status
cookie : true, // enable cookies to allow the server to access the session
oauth : true, // enable OAuth 2.0
xfbml : true // parse XFBML
});
};
</script>
</body>
</html>
3.4.Log in and retrieve the member’s personal information
Add the following JavaScript code. The parts of the callback are described below:
a.userID: the ID of the logged-in user, read from response.authResponse.userID. If your website integrates FB for login, you can identify your member by this ID.
b.To retrieve the user’s email, you must request the email permission by passing {scope: 'email'} to FB.login.
c.You can retrieve the user’s name and email by calling FB.api with “/me”.
function FBLogin() {
  FB.login(function (response) {
    if (response.authResponse) {
      alert('Success');
      alert('UserID' + response.authResponse.userID);
      FB.api('/me', function (response) {
        alert('UserName' + response.name);
        alert('UserEmail' + response.email);
      });
    } else {
      alert('Failed');
    }
  }, {scope: 'email'});
}
3.5.Logout: add the following JavaScript code.
function FBLoginOut() {
  FB.logout(function (response) {
    alert('Logged out.');
  });
}
4.To remove this application, find the “Apps” option on your personal page, locate the application, and remove it.
5.Reference: http://developers.facebook.com/docs/reference/javascript/
 

Libgtop

David Lee's picture

How can we get the resource usage of a Linux system, such as memory and CPU utilization, while a process is running? We can read the system file /proc/<process id>/stat, or we can use the “top” command in the shell; however, both approaches require extra effort because the file or the command output must be parsed before we can use it. Here is another way to get resource usage information for the entire system or for a specific process: Libgtop, an open source library written in C.
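To show the kind of parsing effort the /proc approach involves, here is a minimal sketch that extracts a few fields from one line in the /proc/<process id>/stat format. The sample line and its values are made up, and `parseStat` is a hypothetical helper; the field positions follow the proc(5) manual page.

```javascript
// Sample line in the /proc/<pid>/stat format; the values are made up.
const sample = '1234 (my app) S 1 1234 1234 0 -1 4194304 250 0 0 0 ' +
               '37 12 0 0 20 0 1 0 5000 10485760 300';

function parseStat(line) {
  // The command name is wrapped in parentheses and may itself contain
  // spaces or parentheses, so split around the last ')'.
  const open = line.indexOf('(');
  const close = line.lastIndexOf(')');
  const rest = line.slice(close + 2).split(' ');
  return {
    pid: Number(line.slice(0, open - 1)),
    comm: line.slice(open + 1, close),
    state: rest[0],          // field 3
    utime: Number(rest[11]), // field 14: user-mode time in clock ticks
    stime: Number(rest[12])  // field 15: kernel-mode time in clock ticks
  };
}

console.log(parseStat(sample));
```

Even this small parser has to handle the command name specially, which is the kind of extra work a library interface avoids.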

Libgtop is a library of the GNOME project, used to implement the “top” functionality of the desktop environment. It depends on GLib, another GNOME library. The latest version of Libgtop is 2.28. Note that GLib 2.6.0 and Intltool 0.35.0, or later versions, must be installed before installing Libgtop.

In general, CPU utilization is calculated from the time the CPU spends in different modes, which are usually divided into user mode, nice mode, system (kernel) mode and idle mode. We can use the Libgtop API to get the CPU time (in clock ticks) spent in each mode since system boot. For example, the source code below can be used to calculate the CPU utilization.

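#include <stdio.h>
#include <unistd.h>
#include <glibtop.h>
#include <glibtop/cpu.h>

int main(void)
{
    double cpu_rate, dt, du, dn, ds;
    glibtop_cpu cpu_begin, cpu_end;

    glibtop_init();
    glibtop_get_cpu(&cpu_begin);   /* first sample */
    sleep(1);
    glibtop_get_cpu(&cpu_end);     /* second sample, one second later */

    /* clock ticks spent in each mode between the two samples */
    dt = (double)(cpu_end.total - cpu_begin.total);
    du = (double)(cpu_end.user - cpu_begin.user);
    dn = (double)(cpu_end.nice - cpu_begin.nice);
    ds = (double)(cpu_end.sys - cpu_begin.sys);
    cpu_rate = (dt > 0) ? 100.0 * (du + dn + ds) / dt : 0.0;
    printf("CPU utilization: %.1f%%\n", cpu_rate);
    return 0;
}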

Note that we need the clock tick count at two different points in time, so glibtop_get_cpu is called twice. Monitoring memory utilization, on the other hand, is much simpler:

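#include <stdio.h>
#include <glibtop.h>
#include <glibtop/mem.h>

int main(void)
{
    double mem_rate;
    glibtop_mem memory;
    glibtop_init();
    glibtop_get_mem(&memory);
    mem_rate = 100.0 * memory.used / memory.total;
    printf("Memory utilization: %.1f%%\n", mem_rate);
    return 0;
}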

Libgtop can monitor many types of resources. In addition to the system-wide CPU and memory utilization described above, these include the CPU and memory utilization of a specific process, swap, file systems, network interfaces, and so on. For the detailed API and data structures, refer to GNOME’s official website: http://developer.gnome.org/libgtop/

Introduction of Google File System

David Lee's picture

Why can Google dominate the search engine market? One important reason is the excellent performance of its file system. Google designed a unique distributed file system, known as the Google File System (GFS), to meet its huge storage demand. Google did not release GFS as open source software, but it did publish some technical details, including an official paper.

There are two main differences between GFS and traditional distributed file systems. First, component failures are the norm rather than the exception. Failures can be caused by application bugs, operating system bugs, human errors, and even hardware or network problems. Since even expensive hard disks cannot completely rule out failures, Google simply builds its storage machines from many inexpensive commodity parts and guards against failures by integrating constant monitoring, error detection, fault tolerance and automatic recovery into GFS.

Second, most files are mutated by appending new data rather than by overwriting or removing existing data. Once written, data usually only needs to be read, not written again. Most read operations are “large streaming reads”, in which individual operations typically read hundreds of KBs, and more commonly 1 MB or more. Note that the system stores a modest number of large files, each typically 100 MB or larger. GFS supports small files but is not optimized for them.

The architecture of GFS follows a supernode (master) and distributed nodes (chunkservers) approach. The real data is stored on the chunkservers, which periodically report their state to the master. When a client wants to read a file, it asks the master which chunkservers hold the target chunk, and the master replies with their locations. The client can then request the chunk data directly from a chunkserver.
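The read path can be sketched with in-memory objects standing in for the master and the chunkservers. This is a toy model, not Google's code: the file path, chunk handles and server names are made up, and the chunk size is shrunk to 4 bytes so the example stays readable (GFS uses 64 MB chunks).

```javascript
const CHUNK_SIZE = 4; // bytes; tiny for the demo (GFS uses 64 MB chunks)

// Chunkservers hold the actual chunk data, keyed by chunk handle.
const chunkservers = {
  cs1: { h1: 'Hell', h2: 'o GF' },
  cs2: { h2: 'o GF', h3: 'S!' }
};

// The master holds only metadata: file -> chunk handles, handle -> locations.
const master = {
  files: { '/logs/a.txt': ['h1', 'h2', 'h3'] },
  locations: { h1: ['cs1'], h2: ['cs1', 'cs2'], h3: ['cs2'] },
  lookup: function (path, chunkIndex) {
    const handle = this.files[path][chunkIndex];
    return { handle: handle, servers: this.locations[handle] };
  }
};

// Read from the given byte offset to the end of the chunk that contains it.
function read(path, offset) {
  const chunkIndex = Math.floor(offset / CHUNK_SIZE);
  const found = master.lookup(path, chunkIndex);             // metadata only
  const chunk = chunkservers[found.servers[0]][found.handle]; // data from a chunkserver
  return chunk.slice(offset % CHUNK_SIZE);
}

console.log(read('/logs/a.txt', 5));
```

The master never touches file contents in this sketch; it only answers "which chunk, on which servers", and the data itself comes from a chunkserver.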

GFS supports the large data volumes and traffic of the Google search engine. In addition, Bigtable, a database system used by a number of Google applications such as Gmail, Google Maps, YouTube and other cloud services, is also built on GFS. We can say that GFS is a killer technology of the cloud generation.

For more details, refer to the GFS paper: http://labs.google.com/papers/gfs.html

Semaphore functions in PHP

Ruby Lin's picture

A semaphore is a variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment.

A useful way to think of a semaphore is as a record of how many units of a particular resource are available, coupled with operations to safely (i.e. without race conditions) adjust that record as units are required or become free, and if necessary wait until a unit of the resource becomes available.

Semaphores are a useful tool in the prevention of race conditions and deadlocks; however, their use is by no means a guarantee that a program is free from these problems. Semaphores which allow an arbitrary resource count are called counting semaphores, whilst semaphores which are restricted to the values 0 and 1 (or locked/unlocked, unavailable/available) are called binary semaphores.
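The "record of available units" idea can be sketched as a small counting semaphore. This illustrates the concept only and is not the PHP API; the class name `CountingSemaphore` and the callback-based `acquire` are assumptions made for the example. Creating it with one unit gives a binary semaphore.

```javascript
class CountingSemaphore {
  constructor(units) {
    this.available = units; // units currently free
    this.waiting = [];      // tasks waiting for a unit
  }

  // Run the task as soon as a unit is free; queue it otherwise.
  acquire(task) {
    if (this.available > 0) {
      this.available -= 1;
      task();
    } else {
      this.waiting.push(task);
    }
  }

  // Hand the unit to the next waiting task, or mark it free.
  release() {
    const next = this.waiting.shift();
    if (next) {
      next();
    } else {
      this.available += 1;
    }
  }
}

const log = [];
const sem = new CountingSemaphore(2);
sem.acquire(function () { log.push('A runs'); });
sem.acquire(function () { log.push('B runs'); });
sem.acquire(function () { log.push('C runs'); }); // no unit left: C waits
log.push('A releases');
sem.release();                                     // C gets A's unit
console.log(log.join(', '));
```

With two units, A and B run immediately, while C runs only after a unit is released.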

The following are the semaphore and shared memory functions in PHP:
int ftok (string $pathname, string $proj) - Convert a pathname and a project identifier to a System V IPC key.

bool sem_acquire (resource $sem_identifier) - Acquire a semaphore.

resource sem_get (int $key [,int $max_acquire = 1 [,int $perm = 0666 [,int $auto_release = 1]]]) - Get a semaphore id.

bool sem_release (resource $sem_identifier) - Release a semaphore.

bool sem_remove (resource $sem_identifier) - Remove a semaphore.

resource shm_attach (int $key [, int $memsize [, int $perm]]) - Creates or opens a shared memory segment.

bool shm_detach (resource $shm_identifier) - Disconnects from shared memory segment.

mixed shm_get_var (resource $shm_identifier, int $variable_key) - Returns a variable from shared memory.

bool shm_has_var (resource $shm_identifier, int $variable_key) - Check whether a specific entry exists.

bool shm_put_var (resource $shm_identifier, int $variable_key, mixed $variable) - Inserts or updates a variable in shared memory.

bool shm_remove_var (resource $shm_identifier, int $variable_key) - Removes a variable from shared memory.

bool shm_remove (resource $shm_identifier) - Removes shared memory from Unix systems.

http://php.net/manual/en/book.sem.php

GNU libextractor

Shawn Lin's picture

Introduction

GNU libextractor is GNU’s library for extracting meta data from files. Meta data includes format information (such as mime type, image dimensions, color depth, recording frequency), content descriptions (such as document title or document description) and copyright information (such as license, author and contributors). Currently, libextractor supports the following formats: HTML, PDF, PS, OLE2 (DOC, XLS, PPT), OpenOffice (sxw), StarOffice (sdw), DVI, MAN, FLAC, MP3 (ID3v1 and ID3v2), NSF(E) (NES music), SID (C64 music), OGG, WAV, EXIV2, JPEG, GIF, PNG, TIFF, DEB, RPM, TAR(.GZ), ZIP, ELF, S3M (Scream Tracker 3), XM (eXtended Module), IT (Impulse Tracker), FLV, REAL, RIFF (AVI), MPEG, QT and ASF. Also, various additional MIME types are detected.

Libextractor is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License. GNU libextractor uses plugins to handle various file formats. Technically a plugin can support multiple file formats; however, most plugins only support one particular format. By default, GNU libextractor will use all plugins that are available and found in the plugin installation directory. Applications can request the use of only specific plugins or the exclusion of certain plugins.

Example for using dynamic library

// hello.c
#include <extractor.h>
int main()
{
  struct EXTRACTOR_PluginList *el;
  el = EXTRACTOR_plugin_add_defaults (EXTRACTOR_OPTION_DEFAULT_POLICY);
  // ...
  EXTRACTOR_plugin_remove_all (el);
  return 0;
}

You can then compile the example using
$ gcc -I/Library/Frameworks/Extractor.framework/Versions/Current/include \
  -o hello hello.c \
  -L/Library/Frameworks/Extractor.framework/Versions/Current/lib \
  -lextractor

Plugin management

C Struct: EXTRACTOR_PluginList

A plugin list represents a set of GNU libextractor plugins.

Function: void EXTRACTOR_plugin_remove_all (struct EXTRACTOR_PluginList *plugins)

Unload all of the plugins in the given list.

Function: struct EXTRACTOR_PluginList * EXTRACTOR_plugin_remove (struct EXTRACTOR_PluginList *plugins, const char*name)

Unloads a particular plugin. The given name should be the short name of the plugin, for example “mime” for the mime-type extractor or “mpeg” for the MPEG extractor.

Function: struct EXTRACTOR_PluginList * EXTRACTOR_plugin_add (struct EXTRACTOR_PluginList *plugins, const char *name, const char *options, enum EXTRACTOR_Options flags)

Loads a particular plugin.

Function: struct EXTRACTOR_PluginList * EXTRACTOR_plugin_add_config (struct EXTRACTOR_PluginList *plugins, const char *config, enum EXTRACTOR_Options flags)

Loads and unloads plugins based on a configuration string, modifying the existing list, which can be NULL. The string has the format “[-]NAME(OPTIONS){:[-]NAME(OPTIONS)}*”. Prefixing the plugin name with a “-” means that the plugin should be unloaded.
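To illustrate the configuration string format, here is a small sketch that splits such a string into load and unload entries. It is not libextractor's parser: `parseConfig` is a hypothetical helper, the plugin names and the "fast" option are made up, and it assumes that "(OPTIONS)" may be omitted and that options contain no ":" or parentheses.

```javascript
// Split "[-]NAME(OPTIONS){:[-]NAME(OPTIONS)}*" into one entry per plugin.
function parseConfig(config) {
  return config.split(':').map(function (entry) {
    const match = /^(-?)([^()]+)(?:\((.*)\))?$/.exec(entry);
    if (!match) {
      throw new Error('bad entry: ' + entry);
    }
    return {
      action: match[1] === '-' ? 'unload' : 'load',
      name: match[2],
      options: match[3] === undefined ? null : match[3]
    };
  });
}

const parsed = parseConfig('mime:-html:pdf(fast)');
console.log(parsed);
```

Here "mime" and "pdf" would be loaded (the latter with an option string), while "html" would be unloaded.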

Function: struct EXTRACTOR_PluginList * EXTRACTOR_plugin_add_defaults (enum EXTRACTOR_Options flags)

Loads all of the plugins in the plugin directory.

Example for a minimal extract method

The following example shows how a plugin can return the mime type of a file.

#include <string.h>
#include <extractor.h>

int
EXTRACTOR_mymime_extract (const char *data,
                          size_t data_size,
                          EXTRACTOR_MetaDataProcessor proc,
                          void *proc_cls,
                          const char *options)
{
  if (data_size < 4)
    return 0;                    /* too short to hold the ELF magic */
  if (0 != memcmp (data, "\177ELF", 4))
    return 0;                    /* not an ELF file */
  if (0 != proc (proc_cls,
                 "mymime",
                 EXTRACTOR_METATYPE_MIMETYPE,
                 EXTRACTOR_METAFORMAT_UTF8,
                 "text/plain",
                 "application/x-executable",
                 1 + strlen ("application/x-executable")))
    return 1;                    /* the processor asked to stop */
  /* more calls to 'proc' here as needed */
  return 0;
}

Internal utility functions

‘convert_numeric.h’ defines various conversion functions for numbers (in particular, byte-order conversion for floating point numbers).

‘unzip.h’ defines an API for accessing compressed files.

‘pack.h’ provides an interpreter for unpacking structs of integer numbers from streams and converting from big or little endian to host byte order at the same time.

‘convert.h’ provides a function for character set conversion described below.

Function: char * EXTRACTOR_common_convert_to_utf8(const char *input, size_t len, const char * charset)
which can be used to easily convert text from any character set to UTF-8.

Conclusion

In short, for those of us who analyze messages frequently, GNU libextractor not only helps to identify the correct file format; it also lets us write a variety of plugins that analyze various text formats and find the corresponding MIME type. For email processing, it can be said to be an indispensable tool.

What is memcache?

HH Tu's picture

Today I will introduce a useful technique for speeding up database access: Memcache. It is a distributed memory caching system with which we can build a highly efficient cloud system. The basic concept is to use a key-based structure to store data in memory and fetch it back. The original idea comes from Brad Fitzpatrick, who used it to speed up LiveJournal.com (2003). Many websites use it, including LiveJournal, Wikipedia, Flickr, Twitter, YouTube, Digg and WordPress.com. It can greatly reduce database load, and because the database is queried only when a Memcache miss happens, database access and resource utilization improve. It offers a key-based, distributed memory object cache, but authentication and access control are left to the user.

It pays to keep frequently used information close at hand to reduce the need to retrieve it again. The simplest example is web browsing: most of a website's content is downloaded into a local folder, which speeds things up the next time you visit the same website. Memcache takes the same view. It sets aside part of your computer's memory for faster access and can be deployed and accessed from anywhere over a network. You can create as many caches as you want (provided you have enough memory), and, even better, it treats all the caches as one single node, which means you can combine the memory of several computers and use it together. What a wonderful mechanism! All operations should run in O(1) time.

Here is a simple example to illustrate the usefulness of combining memory. We fetch data from a server every day and want to speed this up, so we add one more server. How do we use it efficiently?

In pic (1), we have two servers, each with its own memory. To ensure the same result no matter which server stores or retrieves the data, every item must be copied into the other server's memory, which wastes time and memory; this is not a good setup. In pic (2), we use Memcache, so every web server stores to and retrieves from the same location and no inconsistency can occur. More memory, more cache!
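One common way to make every web server agree on "the same location" is to derive the cache server from the key itself. The sketch below is an assumption about how a client might do this, not a description of any particular Memcache client: the server addresses are placeholders and the MD5-modulo scheme is chosen only for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical cache servers; the addresses are placeholders.
const servers = ['10.0.0.1:11211', '10.0.0.2:11211'];

// Every web server runs the same function, so a given key always maps to
// the same cache server and nothing has to be copied between them.
function serverFor(key) {
  const hash = crypto.createHash('md5').update(key).digest().readUInt32BE(0);
  return servers[hash % servers.length];
}

console.log(serverFor('user:42'), serverFor('user:43'));
```

Because the mapping depends only on the key and the server list, each item is stored once, and adding servers adds usable cache memory.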

When should we use Memcache? When you run lots of “SELECT * from XXX” queries against the database and the same results are very likely to be used again and again, you can use it happily. As a quick analysis, consider the following when you start to use Memcache: 1. how often the data is queried; 2. how often the cache is hit; 3. how long a cached entry stays valid. Of course, handling this takes a little extra work, which should be taken into account.

Here is a brief procedure for using Memcache together with your database, assuming you have many servers that share their memory. A simple example flow: your clients ask the servers for data, and the servers ask Memcache first. If Memcache does not have the data, the server fetches it from the database. Once you have the new data, remember to store it in Memcache to increase the hit rate next time.
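The flow above can be sketched as follows. The `cache` Map and the `database` object are in-memory stand-ins (assumptions) for Memcache and the real database, `fetchUser` is a hypothetical helper, and the current time is passed in explicitly so the expiry behaviour is easy to follow.

```javascript
// In-memory stand-ins: `cache` plays the role of Memcache and `database`
// the role of the real database.
const cache = new Map();
const database = { 'user:42': { name: 'Alice' } };
let databaseReads = 0;

function fetchUser(key, ttlMs, now) {
  const hit = cache.get(key);
  if (hit && hit.expires > now) {
    return hit.value;    // cache hit: no database access
  }
  databaseReads += 1;    // cache miss: fall back to the database
  const value = database[key];
  cache.set(key, { value: value, expires: now + ttlMs });
  return value;
}

fetchUser('user:42', 1000, 0);    // miss: reads the database
fetchUser('user:42', 1000, 500);  // hit: served from the cache
fetchUser('user:42', 1000, 1500); // entry expired: reads the database again
console.log(databaseReads);
```

The expiry time is the "how long does an entry stay valid" decision: a longer lifetime means fewer database reads but staler data.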

As the above example suggests, Memcache is implemented as a network daemon. Most people use PHP or C/C++ to communicate with Memcache. I use it on Linux; if you want to use it with C/C++, you need to install some basic packages on your Linux system: 1. libevent 2. Memcache 3. libmemcache.

The other details are all on the official website: http://memcached.org/

What is Node.js?

Paul Chien's picture

JavaScript has traditionally only run in the web browser, but recently there has been considerable interest in bringing it to the server side as well, thanks to the CommonJS project. Other server-side JavaScript environments include Jaxer and Narwhal. However, Node.js is a bit different from these solutions, because it is event-based rather than thread based. Web servers like Apache that are used to serve PHP and other CGI scripts are thread based because they spawn a system thread for every incoming request. While this is fine for many applications, the thread based model does not scale well with many long-lived connections like you would need in order to serve real-time applications like Friendfeed or Google Wave.

Node.js uses an event loop instead of threads, and is able to scale to millions of concurrent connections. It takes advantage of the fact that servers spend most of their time waiting for I/O operations, like reading a file from a hard drive, accessing an external web service or waiting for a file to finish being uploaded, because these operations are much slower than in-memory operations. Every I/O operation in Node.js is asynchronous, meaning that the server can continue to process incoming requests while the I/O operation is taking place. JavaScript is extremely well suited to event-based programming because it has anonymous functions and closures, which make defining inline callbacks a cinch, and JavaScript developers already know how to program in this way. This event-based model makes Node.js very fast, and makes scaling real-time applications very easy.
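The point about anonymous functions and closures can be illustrated with Node's built-in events module. In this minimal sketch a plain EventEmitter stands in for a network server, and the names `track`, `seen` and the 'request' event are made up for the example.

```javascript
const EventEmitter = require('events');

// A plain EventEmitter stands in for a network server here.
const server = new EventEmitter();
const seen = [];

// Each call creates a closure that remembers its own `name` and `count`.
function track(name) {
  let count = 0;
  server.on('request', function (path) {
    count += 1;
    seen.push(name + ' #' + count + ' ' + path);
  });
}

track('logger');
server.emit('request', '/index.html');
server.emit('request', '/about.html');
console.log(seen);
```

The inline callback keeps its own state between events without any shared global variable, which is what makes this style convenient for event-based code.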

http://docs.pylonsproject.org/projects/pyramid/1.0/narr/introduction.html