Why is Hadoop on Cygwin a bad idea?
Cygwin consists of a DLL (cygwin1.dll), which acts as a Linux API emulation layer providing substantial Linux API functionality, and a collection of tools that provide a Linux look and feel. Although Cygwin is a really nice emulation layer, it is not ready for 24x7 operation. Running Hadoop on Cygwin on production servers is a bad idea for the following reasons:
- First of all, Hadoop on Cygwin is officially "for development purposes only".
- It can be quite tricky to install Cygwin and SSHD components on all of your servers.
- Like any other software, Cygwin has its own bugs, and these bugs are added to the bugs you already have in Hadoop. Sometimes you will end up with something like:
2010-xx-xx xx:xx:xx,430 WARN mapred.TaskTracker - Error initializing attempt_201001280757_0129_m_000002_0:
org.apache.hadoop.util.Shell$ExitCodeException: assertion "root_idx != -1" failed: file "/ext/build/netrel/src/cygwin-1.7.1-1/winsup/cygwin/mount.cc", line 363, function: void mount_info::init()
Stack trace:
Frame Function Args
00289984 77461184 (00000084, 0000EA60, 00000000, 00289AA8)
00289998 77461138 (00000084, 0000EA60, 000000A4, 00289A8C)
...
End of stack trace
- Process startup on Windows is slow compared to Linux, and Hadoop does some of its work by running shell commands (measuring disk usage and file sizes, starting mappers and reducers). What works well on Linux results in poor performance on Windows.
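To illustrate the process-startup point, here is a minimal Java sketch of the shell-out pattern, assuming a Unix-like environment (or Cygwin on Windows) with du on the PATH. The class ShellDuSketch is hypothetical and is not Hadoop's code; it only shows that every size query launches a brand-new du process, which is exactly the cost that is higher on Windows.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: query folder size by launching an external
 * "du -sk <dir>" process, i.e. one new OS process per query.
 */
public class ShellDuSketch {

    /** Runs "du -sk dir" and returns the size in kilobytes that du reports; throws if du fails. */
    public static long duKilobytes(String dir) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("du", "-sk", dir);
        // Send du's warnings straight to our stderr so stdout only holds the summary line.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start(); // process creation: the step that is expensive on Windows

        String lastLine = null;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lastLine = line; // drain all output so du never blocks on a full pipe
            }
        }
        int exit = p.waitFor();
        if (exit != 0 || lastLine == null) {
            throw new IOException("du exited with code " + exit + " for " + dir);
        }
        // Output format is "<kilobytes><whitespace><path>"; keep only the number.
        return Long.parseLong(lastLine.trim().split("\\s+")[0]);
    }

    public static void main(String[] args) throws Exception {
        String dir = args.length > 0 ? args[0] : ".";
        int runs = 20;
        long kb = 0;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            kb = duKilobytes(dir); // each iteration spawns a fresh process
        }
        long avgMicros = (System.nanoTime() - start) / runs / 1000;
        System.out.println(dir + ": " + kb + " KB, about " + avgMicros + " us per du call");
    }
}
```

Running main against a directory prints the average wall-clock time per du call on that machine, which gives a rough feel for the per-process overhead that accumulates when a daemon shells out repeatedly.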
Cluster Setup
In this article I assume that you are installing Hadoop on a single machine. For a multi-server setup, repeat all the steps in this document on each of your servers.
First, download Hadoop 0.20.2 from an Apache mirror site and configure it. Note that you should use the Windows path separator "\" in paths to files or folders on the local filesystem.
Next, you'll need the patched Hadoop, Windows shell scripts and Java Service Wrapper configuration files to run the JobTracker, NameNode, TaskTracker and DataNode as Windows services. You can download all of these components from the Hadoop Jira: download the file Hadoop-0.20.2-patched.zip. If you want to build Hadoop yourself, read the Building Patched Hadoop From Source section of this document.
Unpack the downloaded archive to a directory of your choice and copy:
- the hadoop-0.20.2-core.jar file and the service folder to the root of your Hadoop installation
- the cpappend.bat and hadoop.bat files from the bin folder to the bin folder of your Hadoop installation
- commons-compress-1.0.jar, jna-3.2.2.jar and commons-io-1.4.jar from the lib folder to the lib folder of your Hadoop installation
Start the Windows command shell and go to the service\bin folder in your Hadoop installation. If you are installing on Windows 7 or Windows Server 2008, start the command shell as an administrator. Run the commands
InstallService.bat ..\conf\JobTracker.conf
InstallService.bat ..\conf\NameNode.conf
InstallService.bat ..\conf\TaskTracker.conf
InstallService.bat ..\conf\DataNode.conf
You will be asked to enter the password for the account set in the HADOOP_USER environment variable, and you should see the following output:
wrapper | Hadoop XXXXXXX installed.
Finally, format the DFS filesystem. To do this, go to the bin folder in the root of your Hadoop installation and run the shell command
hadoop.bat namenode -format
Now you are ready to start Hadoop. Open Services (services.msc) and start the services in the following order:
- Hadoop NameNode
- Hadoop DataNode
- Hadoop JobTracker
- Hadoop TaskTracker
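If you prefer scripting the start-up over clicking through services.msc, the built-in Windows net start command accepts a service's display name. The following is a minimal Java sketch, assuming the display names match the ones listed above (for example "Hadoop NameNode") and that it is run from an administrator command shell; the class name StartHadoopServices is hypothetical and not part of the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: starts the four Hadoop services in the documented order
 * with the built-in Windows "net start" command instead of services.msc.
 */
public class StartHadoopServices {

    /** Service display names (assumed to match services.msc), in the required start order. */
    public static final List<String> START_ORDER = Arrays.asList(
            "Hadoop NameNode", "Hadoop DataNode", "Hadoop JobTracker", "Hadoop TaskTracker");

    /** Builds one "net start <display name>" command per service, in order. */
    public static List<List<String>> startCommands() {
        List<List<String>> commands = new ArrayList<>();
        for (String service : START_ORDER) {
            commands.add(Arrays.asList("net", "start", service));
        }
        return commands;
    }

    public static void main(String[] args) throws Exception {
        // Dry run by default; pass --run on the Windows host to actually start the services.
        boolean run = args.length > 0 && args[0].equals("--run");
        for (List<String> cmd : startCommands()) {
            System.out.println("net start \"" + cmd.get(2) + "\"");
            if (!run) {
                continue;
            }
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            int exit = p.waitFor();
            if (exit != 0) {
                // Do not start later services if an earlier one failed.
                System.err.println(cmd.get(2) + " failed to start (exit code " + exit + ")");
                System.exit(exit);
            }
        }
    }
}
```

By default it only prints the commands; with --run it executes them one by one and stops at the first failure, so a DataNode is never started after the NameNode failed to come up.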
Cluster Deinstallation
To remove the services, go to the service\bin directory of your Hadoop installation and run the shell commands:
UninstallService.bat ..\conf\JobTracker.conf
UninstallService.bat ..\conf\NameNode.conf
UninstallService.bat ..\conf\TaskTracker.conf
UninstallService.bat ..\conf\DataNode.conf
These commands stop all Hadoop Windows services and remove them.
How does it work?
Hadoop uses Linux shell commands to accomplish some of its tasks. For example, it uses the Linux du command to measure folder size and the df command to get filesystem disk space usage. We reimplemented this functionality with the help of JNA, which gives us access to the native shared libraries Kernel32.dll and Advapi32.dll.
Building Patched Hadoop From Source
You can build Hadoop on both Windows and Linux. To build Hadoop on Windows you will need Cygwin. First, check out the Hadoop 0.20.2 source code and download our patch from the Hadoop Jira. Put the patch in the folder where you checked out Hadoop and apply it by issuing
patch -p0 < HADOOP-6767.patch
Now simply build Hadoop:
ant clean jar
The built Hadoop will be located in the build folder.
Shortcomings
Although we tested our patch as thoroughly as we could, there may still be a number of bugs in it. Here is a list of known shortcomings of the patch:
- We haven't tested the patched Hadoop with contributed modules
- The JNA library is distributed under the LGPL 2.1 license, which is not fully compatible with Hadoop's license (Apache License 2.0)
- I have only patched Hadoop 0.20.2, but I plan to provide patches for Hadoop 0.18 and for the current trunk later
- JNA is not the best choice for accessing Windows native API functions
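The patch relies on JNA to reach Kernel32.dll and Advapi32.dll. For the df and du cases only, one JNA-free alternative is to stay inside the JDK: java.io.File.getTotalSpace/getUsableSpace (Java 6+) for df-like figures and java.nio.file.Files.walkFileTree (Java 7+) for a du-like total. This is a minimal sketch of that swapped-in technique, not what the patch does, and the class PureJavaDiskUsage is hypothetical. Note that du reports allocated disk blocks, while this sketch sums logical file sizes, so the numbers will not match exactly.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/**
 * Hypothetical sketch: in-process stand-ins for the "df" and "du" shell calls,
 * using only the JDK (no JNA, no Cygwin, no child processes).
 */
public class PureJavaDiskUsage {

    /** df-like: {capacity, usable bytes} of the volume holding the given path (0s if it does not exist). */
    public static long[] df(File path) {
        return new long[] { path.getTotalSpace(), path.getUsableSpace() };
    }

    /** du-like: sum of the logical sizes of all regular files under dir (symlinks are not followed). */
    public static long du(Path dir) throws IOException {
        final long[] total = {0};
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    total[0] += attrs.size();
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException e) {
                // Skip unreadable entries instead of aborting the whole walk.
                return FileVisitResult.CONTINUE;
            }
        });
        return total[0];
    }

    public static void main(String[] args) throws IOException {
        File target = new File(args.length > 0 ? args[0] : ".");
        long[] space = df(target);
        System.out.println("capacity=" + space[0] + " usable=" + space[1]);
        System.out.println("du=" + du(target.toPath()) + " bytes");
    }
}
```

Because nothing here spawns a process, the per-call cost stays inside the JVM on both Windows and Linux; the trade-off is that lower-level details (allocated blocks, ACL-based checks) are out of reach without native calls.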
I'm dying to hear about your progress on this task. I read in your last comment on the Jira issue that you are going to change your patch a little. I'm also facing issues deploying Hadoop on a large number of Windows machines because of Cygwin. I wish I could help, but I'm not so good at Java.
I've updated the issue. Batch scripts and scripts for running Hadoop as Windows services have been added. However, these changes don't allow running Hadoop without Cygwin.
Can you tell me whether we need Cygwin installed even though we are not running Hadoop on it? I saw that hadoop.bat adds the path to Cygwin.
When it asks for the password, it prepends ".\" to the username set in HADOOP_USER. For example, with HADOOP_USER=user:
Please input the password for account '.\user':
Yes, you need to grant the user the right to log on as a service.
Hello Orlov,
Can I also install Hadoop 2.203 with the same patch? I really appreciate the help.
Thanks
Babu
Hi
I am getting this error when trying to start the NameNode on Windows XP:
The Hadoop NameNode service is starting..
The Hadoop NameNode service could not be started.
A service specific error occurred: 4294967295.
More help is available by typing NET HELPMSG 3547.
How can I resolve this issue?
I'm running Sun Java 1.6.0_35-b10 with the Windows port of Hadoop 0.20.2, and when I run 'hadoop namenode -format', I get:
The java class could not be loaded. java.lang.UnsupportedClassVersionError: (org/apache/hadoop/hdfs/server/namenode/NameNode) bad major version at offset=6
I also cannot start the Windows services. I get: "Windows could not start Hadoop NameNode on Local Computer. For more information, review the system event log. If this is a non-Microsoft service, contact the service vendor, and refer to service-specific error code 1."
Thanks for your efforts.