Hadoop 2.0.3-alpha Release Notes
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
Changes since Hadoop 2.0.2
- YARN-372.
Minor task reported by Siddharth Seth and fixed by Siddharth Seth
Move InlineDispatcher from hadoop-yarn-server-resourcemanager to hadoop-yarn-common
InlineDispatcher is a utility used in unit tests. It belongs in hadoop-yarn-common instead of hadoop-yarn-server-resourcemanager.
- YARN-364.
Major bug reported by Jason Lowe and fixed by Jason Lowe
AggregatedLogDeletionService can take too long to delete logs
AggregatedLogDeletionService uses the yarn.log-aggregation.retain-seconds property to determine which logs should be deleted, but it uses the same value to determine how often to check for old logs. This means logs could actually linger up to twice as long as configured.
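The doubling follows from simple arithmetic. The sketch below is a minimal illustration, assuming a hypothetical deletion pass that runs every checkIntervalSeconds and removes logs older than retainSeconds. The class and method names are invented for this example and are not the AggregatedLogDeletionService API.

```java
/** Illustrative only: not part of the YARN code base. */
final class LogRetentionSketch {
    /**
     * Upper bound (in seconds) on the age a log can reach before a periodic
     * deletion pass removes it. A log that is just under the retention limit
     * at one pass survives until the next pass, one check interval later.
     */
    static long worstCaseLingerSeconds(long retainSeconds, long checkIntervalSeconds) {
        return retainSeconds + checkIntervalSeconds;
    }

    public static void main(String[] args) {
        long retain = 86_400L; // example value: one day
        // Check interval equal to the retention period: the bound is twice the retention.
        System.out.println(worstCaseLingerSeconds(retain, retain));
        // A shorter check interval (here a tenth of the retention) tightens the bound.
        System.out.println(worstCaseLingerSeconds(retain, retain / 10));
    }
}
```

Decoupling the check interval from the retention period is one way to tighten the bound. Whether the actual fix does this is not stated in this entry.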
- YARN-360.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp
Allow apps to concurrently register tokens for renewal
{{DelegationTokenRenewer#addApplication}} has an unnecessary {{synchronized}} keyword. This serializes job submissions and can add unnecessary latency and/or hang all submissions if there are problems renewing the token.
- YARN-357.
Major bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)
App submission should not be synchronized
MAPREDUCE-2953 fixed a race condition with querying of app status by making {{RMClientService#submitApplication}} synchronously invoke {{RMAppManager#submitApplication}}. However, the {{synchronized}} keyword was also added to {{RMAppManager#submitApplication}} with the comment:
bq. I made the submitApplication synchronized to keep it consistent with the other routines in RMAppManager although I do not believe it needs it since the rmapp datastructure is already a concurrentMap and I don't see anything else that would be an issue.
It's been observed that app submission latency is being unnecessarily impacted.
- YARN-355.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)
RM app submission jams under load
The RM performs a loopback connection to itself to renew its own tokens. If app submissions consume all RPC handlers for {{ClientRMProtocol}}, then app submissions block because it cannot loopback to itself to do the renewal.
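The same kind of starvation can be reproduced with a plain thread pool. This is a sketch under stated assumptions: a generic java.util.concurrent pool stands in for the RPC handlers, and LoopbackStarvationSketch and loopbackCompletes are hypothetical names, not Hadoop classes.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative only: a generic handler pool, not the Hadoop RPC server. */
final class LoopbackStarvationSketch {
    /**
     * Runs one "submission" on a pool of handlerCount threads. The submission
     * calls back into the same pool and waits for the result. Returns true if
     * the loopback call finishes within the timeout, false if it times out.
     */
    static boolean loopbackCompletes(int handlerCount, long timeoutMillis) {
        ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
        try {
            Future<Boolean> submission = handlers.submit(() -> {
                // The handler loops back to its own pool, the way a server
                // might call itself to renew a token.
                Future<String> renewal = handlers.submit(() -> "renewed");
                try {
                    renewal.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false; // no free handler was left for the loopback call
                }
            });
            return submission.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            handlers.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(loopbackCompletes(1, 200L));  // all handlers busy: loopback starves
        System.out.println(loopbackCompletes(2, 5000L)); // a spare handler serves the loopback
    }
}
```

With a single handler the inner call can never be scheduled while the outer call is waiting on it, which mirrors submissions consuming every handler.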
- YARN-354.
Blocker bug reported by Liang Xie and fixed by Liang Xie
WebAppProxyServer exits immediately after startup
Please see HDFS-4426 for details. I found that the YARN WebAppProxyServer is broken by HADOOP-9181 as well. Here is the hot fix, which I verified manually in our test cluster.
I apologize for causing this trouble.
- YARN-343.
Major bug reported by Thomas Graves and fixed by Xuan Gong (capacityscheduler)
Capacity Scheduler maximum-capacity value -1 is invalid
I tried to start the resource manager using the capacity scheduler with a particular queue's maximum-capacity set to -1, which is supposed to disable it according to the docs, but I got the following exception:
java.lang.IllegalArgumentException: Illegal value of maximumCapacity -0.01 used in call to setMaxCapacity for queue foo
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.checkMaxCapacity(CSQueueUtils.java:31)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:220)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:191)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:310)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:325)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:232)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:202)
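The value -0.01 in the message is consistent with the configured -1 being converted from a percentage to a fraction before validation. The sketch below illustrates that reading. It is an assumption, not the CapacityScheduler code: MaxCapacitySketch and its methods are hypothetical, and treating -1 as 100% is only one plausible meaning of "disabled".

```java
/** Illustrative only: hypothetical helper, not the CapacityScheduler code. */
final class MaxCapacitySketch {
    /** Assumed sentinel meaning "no maximum configured". */
    static final float UNDEFINED = -1f;

    /** Naive conversion: a configured -1 becomes -0.01, which a range check rejects. */
    static float naiveFraction(float configuredPercent) {
        return configuredPercent / 100f;
    }

    /** Maps the sentinel to 100% before converting the percentage to a fraction. */
    static float fractionWithSentinel(float configuredPercent) {
        if (configuredPercent == UNDEFINED) {
            return 1.0f;
        }
        return configuredPercent / 100f;
    }

    public static void main(String[] args) {
        System.out.println(naiveFraction(-1f));
        System.out.println(fractionWithSentinel(-1f));
        System.out.println(fractionWithSentinel(50f));
    }
}
```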
- YARN-336.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Fair scheduler FIFO scheduling within a queue only allows 1 app at a time
The fair scheduler allows apps to be scheduled in FIFO fashion within a queue. Currently, when this setting is turned on, the scheduler only allows one app to run at a time. While apps submitted earlier should get first priority for allocations, when there is space remaining, other apps should have a chance to use it.
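The intended behavior can be sketched as a single pass over apps in submission order. This is a simplified model, assuming each app only has a count of containers it still wants. FifoWithinQueueSketch and allocate are hypothetical names, not fair scheduler classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a toy FIFO allocator, not the fair scheduler. */
final class FifoWithinQueueSketch {
    /**
     * Hands out the available containers to apps in submission order.
     * demands.get(i) is the number of containers app i still wants, and
     * index 0 is the earliest-submitted app. Later apps receive whatever
     * the earlier apps leave over.
     */
    static List<Integer> allocate(List<Integer> demands, int available) {
        List<Integer> granted = new ArrayList<>();
        int remaining = available;
        for (int demand : demands) {
            int grant = Math.min(demand, remaining);
            granted.add(grant);
            remaining -= grant;
        }
        return granted;
    }

    public static void main(String[] args) {
        // The first app is fully satisfied; the second gets the leftover capacity.
        System.out.println(allocate(Arrays.asList(3, 4, 5), 5));
    }
}
```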
- YARN-334.
Critical bug reported by Thomas Graves and fixed by Thomas Graves
Maven RAT plugin is not checking all source files
YARN side of HADOOP-9097.
Running 'mvn apache-rat:check' passes, but running RAT by hand (by downloading the JAR) produces some warnings for Java files, amongst others.
- YARN-331.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Fill in missing fair scheduler documentation
In the fair scheduler documentation, a few config options are missing:
locality.threshold.node
locality.threshold.rack
max.assign
aclSubmitApps
minSharePreemptionTimeout
- YARN-330.
Major bug reported by Hitesh Shah and fixed by Sandy Ryza (nodemanager)
Flakey test: TestNodeManagerShutdown#testKillContainersOnShutdown
Seems to be timing related, as the container status RUNNING returned by the ContainerManager does not really indicate that the container task has been launched. A sleep of 5 seconds is not reliable.
Running org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.353 sec <<< FAILURE!
testKillContainersOnShutdown(org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown) Time elapsed: 9283 sec <<< FAILURE!
junit.framework.AssertionFailedError: Did not find sigterm message
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.assertTrue(Assert.java:20)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.testKillContainersOnShutdown(TestNodeManagerShutdown.java:162)
Logs:
2013-01-09 14:13:08,401 INFO [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(835)) - Container container_0_0000_01_000000 transitioned from NEW to LOCALIZING
2013-01-09 14:13:08,412 INFO [AsyncDispatcher event handler] localizer.LocalizedResource (LocalizedResource.java:handle(194)) - Resource file:hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown/tmpDir/scriptFile.sh transitioned from INIT to DOWNLOADING
2013-01-09 14:13:08,412 INFO [AsyncDispatcher event handler] localizer.ResourceLocalizationService (ResourceLocalizationService.java:handle(521)) - Created localizer for container_0_0000_01_000000
2013-01-09 14:13:08,589 INFO [LocalizerRunner for container_0_0000_01_000000] localizer.ResourceLocalizationService (ResourceLocalizationService.java:writeCredentials(895)) - Writing credentials to the nmPrivate file hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown/nm0/nmPrivate/container_0_0000_01_000000.tokens. Credentials list:
2013-01-09 14:13:08,628 INFO [LocalizerRunner for container_0_0000_01_000000] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:createUserCacheDirs(373)) - Initializing user nobody
2013-01-09 14:13:08,709 INFO [main] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:getContainerStatus(538)) - Returning container_id {, app_attempt_id {, application_id {, id: 0, cluster_timestamp: 0, }, attemptId: 1, }, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-01-09 14:13:08,781 INFO [LocalizerRunner for container_0_0000_01_000000] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:startLocalizer(99)) - Copying from hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown/nm0/nmPrivate/container_0_0000_01_000000.tokens to hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown/nm0/usercache/nobody/appcache/application_0_0000/container_0_0000_01_000000.tokens
- YARN-328.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (resourcemanager)
Use token request messages defined in hadoop common
YARN changes related to HADOOP-9192 to reuse the protobuf messages defined in common.
- YARN-325.
Blocker bug reported by Jason Lowe and fixed by Arun C Murthy (capacityscheduler)
RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing
If a client calls getQueueInfo on a parent queue (e.g.: the root queue) and containers are completing then the RM can deadlock. getQueueInfo() locks the ParentQueue and then calls the child queues' getQueueInfo() methods in turn. However when a container completes, it locks the LeafQueue then calls back into the ParentQueue. When the two mix, it's a recipe for deadlock.
Stacktrace to follow.
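The opposite lock ordering described above can be made observable without hanging by using timed lock attempts. The sketch below is illustrative: two ReentrantLock objects stand in for the ParentQueue and LeafQueue monitors, and all names are hypothetical rather than taken from the scheduler.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative only: two locks taken in opposite orders by two threads. */
final class QueueLockOrderSketch {
    /** Returns how many of the two threads failed to acquire their second lock. */
    static int blockedThreads(long timeoutMillis) {
        ReentrantLock parentLock = new ReentrantLock();
        ReentrantLock leafLock = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTriedSecond = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // Query path: parent first, then child.
        Thread queueInfo = new Thread(() -> lockInOrder(
                parentLock, leafLock, bothHoldFirst, bothTriedSecond, blocked, timeoutMillis));
        // Completion path: child first, then parent.
        Thread completion = new Thread(() -> lockInOrder(
                leafLock, parentLock, bothHoldFirst, bothTriedSecond, blocked, timeoutMillis));
        queueInfo.start();
        completion.start();
        try {
            queueInfo.join();
            completion.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked.get();
    }

    private static void lockInOrder(ReentrantLock first, ReentrantLock second,
            CountDownLatch bothHoldFirst, CountDownLatch bothTriedSecond,
            AtomicInteger blocked, long timeoutMillis) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await(); // wait until each thread holds its first lock
            if (second.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet(); // with plain lock() this thread would wait forever
            }
            bothTriedSecond.countDown();
            bothTriedSecond.await(); // keep the first lock until both attempts are over
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(blockedThreads(200L));
    }
}
```

Acquiring the two locks in the same order on both paths is the usual way to remove this kind of cycle. How the actual fix resolves it is not described in this entry.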
- YARN-320.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)
RM should always be able to renew its own tokens
YARN-280 introduced fast-fail for job submissions with bad tokens. Unfortunately, other stack components like oozie and customers are acquiring RM tokens with a hardcoded dummy renewer value. These jobs would fail after 24 hours because the RM token couldn't be renewed, but fast-fail is failing them immediately. The RM should always be able to renew its own tokens submitted with a job. The renewer field may continue to specify an external user who can renew.
- YARN-319.
Major bug reported by shenhong and fixed by shenhong (resourcemanager , scheduler)
Submitting a job to a queue that is not allowed in the fair scheduler causes the client to hang forever.
When the RM uses the fair scheduler and a client submits a job to a queue that does not allow that user to submit jobs, the client hangs forever.
- YARN-315.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas
Use security token protobuf definition from hadoop common
YARN part of HADOOP-9173.
- YARN-302.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Fair scheduler assignmultiple should default to false
The MR1 default was false. When true, it results in overloading some machines with many tasks and underutilizing others.
- YARN-301.
Major bug reported by shenhong and fixed by shenhong (resourcemanager , scheduler)
Fair scheduler throws ConcurrentModificationException when iterating over app's priorities
In my test cluster, the fair scheduler threw a ConcurrentModificationException and the RM crashed. Here is the message:
2012-12-30 17:14:17,171 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:297)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:181)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:780)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:842)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:340)
at java.lang.Thread.run(Thread.java:662)
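The trace shows the fail-fast check of a TreeMap key iterator. The sketch below reproduces that check in a single thread for illustration. The reported crash presumably involved a modification from elsewhere during iteration, and iterating over a snapshot is only one general remedy, not necessarily the fix applied here. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.TreeMap;

/** Illustrative only: demonstrates the fail-fast behavior of TreeMap iterators. */
final class TreeMapIterationSketch {
    private static TreeMap<Integer, String> sampleRequests() {
        TreeMap<Integer, String> requestsByPriority = new TreeMap<>();
        requestsByPriority.put(1, "a");
        requestsByPriority.put(2, "b");
        return requestsByPriority;
    }

    /** Returns true if adding a key while iterating the key set trips the fail-fast check. */
    static boolean modificationDuringIterationFails() {
        TreeMap<Integer, String> requestsByPriority = sampleRequests();
        try {
            for (Integer priority : requestsByPriority.keySet()) {
                // Structural change while the iterator is live.
                requestsByPriority.put(priority + 10, "new");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Iterates over a copy of the keys, so the map can change safely; returns the final size. */
    static int iterateOverSnapshot() {
        TreeMap<Integer, String> requestsByPriority = sampleRequests();
        for (Integer priority : new ArrayList<>(requestsByPriority.keySet())) {
            requestsByPriority.put(priority + 10, "new");
        }
        return requestsByPriority.size();
    }

    public static void main(String[] args) {
        System.out.println(modificationDuringIterationFails());
        System.out.println(iterateOverSnapshot());
    }
}
```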
- YARN-300.
Major bug reported by shenhong and fixed by Sandy Ryza (resourcemanager , scheduler)
After YARN-271, fair scheduler can infinite loop and not schedule any application.
After YARN-271, when yarn.scheduler.fair.max.assign<=0 and a node has been reserved, the fair scheduler will loop infinitely and not schedule any application.
- YARN-293.
Critical bug reported by Devaraj K and fixed by Robert Joseph Evans (nodemanager)
Node Manager leaks LocalizerRunner object for every Container
The Node Manager creates a new LocalizerRunner object for every container and puts it in the ResourceLocalizationService.LocalizerTracker.privLocalizers map, but it never removes it from the map.
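The leak pattern and its remedy can be sketched with a plain map. This is a simplified stand-in: LocalizerTrackerSketch and its methods are hypothetical, and a placeholder Object takes the place of the LocalizerRunner.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: a toy tracker, not ResourceLocalizationService.LocalizerTracker. */
final class LocalizerTrackerSketch {
    private final Map<String, Object> privLocalizers = new ConcurrentHashMap<>();

    /** Registers a runner for the container (a placeholder object in this sketch). */
    void start(String containerId) {
        privLocalizers.put(containerId, new Object());
    }

    /** Without this removal the map grows by one entry per container, indefinitely. */
    void finish(String containerId) {
        privLocalizers.remove(containerId);
    }

    int tracked() {
        return privLocalizers.size();
    }

    public static void main(String[] args) {
        LocalizerTrackerSketch tracker = new LocalizerTrackerSketch();
        tracker.start("container_0_0000_01_000000");
        tracker.finish("container_0_0000_01_000000");
        System.out.println(tracker.tracked());
    }
}
```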
- YARN-288.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Fair scheduler queue doesn't accept any jobs when ACLs are configured.
If a queue is configured with an ACL for who can submit jobs, no jobs are allowed, even if a user on the list tries.
This is caused by the scheduler thinking the user is "yarn", because it calls UserGroupInformation.getCurrentUser() instead of UserGroupInformation.createRemoteUser() with the given user name.
- YARN-286.
Major new feature reported by Tom White and fixed by Tom White (applications)
Add a YARN ApplicationClassLoader
Add a classloader that provides webapp-style class isolation for use by applications. This is the YARN part of MAPREDUCE-1700 (which was already developed in that JIRA).
- YARN-285.
Major improvement reported by Derek Dagit and fixed by Derek Dagit
RM should be able to provide a tracking link for apps that have already been purged
As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of applications remembered by the RM.
When a user clicks the History link for a job, the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed.
In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs.
We would like the RM to provide tracking links that persist so that users are not frustrated by broken links.
- YARN-283.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Fair scheduler fails to get queue info without root prefix
If queue1 exists, and a client calls "mapred queue -info queue1", an exception is thrown. If they use root.queue1, it works correctly.
- YARN-282.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza
Fair scheduler web UI double counts Apps Submitted
Each app submitted is reported twice under "Apps Submitted"
- YARN-280.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)
RM does not reject app submission with invalid tokens
The RM will launch an app with invalid tokens. The tasks will languish with failed connection retries, followed by task reattempts, followed by app reattempts.
- YARN-278.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Fair scheduler maxRunningApps config causes no apps to make progress
This occurs because the scheduler erroneously chooses apps to offer resources to that are not runnable, then later decides they are not runnable, and doesn't try to give the resources to anyone else.
- YARN-277.
Major improvement reported by Bikas Saha and fixed by Bikas Saha
Use AMRMClient in DistributedShell to exemplify the approach
- YARN-272.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Fair scheduler log messages try to print objects without overridden toString methods
A lot of junk gets printed out like this:
2012-12-11 17:31:52,998 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp: Application application_1355270529654_0003 reserved container org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl@324f0f97 on node host: c1416.hal.cloudera.com:46356 #containers=7 available=0 used=8192, currently has 4 at priority org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl@33; currentReservation 4096
- YARN-271.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Fair scheduler hits IllegalStateException trying to reserve different apps on same node
After the fair scheduler reserves a container on a node, it doesn't check for reservations it just made when trying to make more reservations during the same heartbeat.
- YARN-267.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Fix fair scheduler web UI
The fair scheduler web UI was broken by MAPREDUCE-4720. The queues area is not shown, and changes are required to still show the fair share inside the applications table.
- YARN-266.
Critical bug reported by Ravi Prakash and fixed by Ravi Prakash (resourcemanager)
RM and JHS Web UIs are blank because AppsBlock is not escaping string properly
For example, job names containing a line feed ("\n") cause a line feed in the JSON array being written out (since we are only using StringEscapeUtils.escapeHtml()), and the JavaScript parser complains that string quotes are unclosed.
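HTML escaping leaves control characters such as line feeds untouched, so a value embedded in a JavaScript string literal needs a separate escaping step. The sketch below is a hand-rolled helper written for illustration. It is not the fix applied to AppsBlock and does not cover every character that JavaScript string literals require escaping.

```java
/** Illustrative only: minimal escaping for a value placed inside a double-quoted JavaScript string. */
final class JsStringEscapeSketch {
    static String escapeForJsString(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\n': out.append("\\n"); break;   // line feed becomes the two characters \ and n
                case '\r': out.append("\\r"); break;
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The raw line feed is replaced, so the string literal stays on one line.
        System.out.println("\"" + escapeForJsString("my\njob") + "\"");
    }
}
```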
- YARN-264.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla
y.s.rm.DelegationTokenRenewer attempts to renew token even after removing an app
yarn.s.rm.security.DelegationTokenRenewer uses TimerTask/Timer. When such a timer task is canceled, already scheduled tasks run to completion. The task should check for such cancellation before running. Also, delegationTokens needs to be synchronized on all accesses.
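The cancellation check described above might look like the following sketch. CancellableRenewalTask is a hypothetical class, and a counter stands in for the actual token renewal.

```java
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: a TimerTask that checks for cancellation before doing work. */
final class CancellableRenewalTask extends TimerTask {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    private final AtomicInteger renewals = new AtomicInteger();

    @Override
    public void run() {
        if (cancelled.get()) {
            return; // the app was removed, so skip the renewal
        }
        renewals.incrementAndGet(); // stand-in for the actual token renewal
    }

    @Override
    public boolean cancel() {
        cancelled.set(true);
        return super.cancel();
    }

    int renewalCount() {
        return renewals.get();
    }

    public static void main(String[] args) {
        CancellableRenewalTask task = new CancellableRenewalTask();
        task.run();    // renews once
        task.cancel();
        task.run();    // no effect after cancellation
        System.out.println(task.renewalCount());
    }
}
```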
- YARN-258.
Major bug reported by Ravi Prakash and fixed by Ravi Prakash (resourcemanager)
RM web page UI shows Invalid Date for start and finish times
Whenever the number of jobs was greater than 100, two JavaScript arrays were being populated: appsData and appsTableData. appsData was winning out (because it was coming out later), and so renderHadoopDate was trying to render a <br title=""...> string.
- YARN-254.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Update fair scheduler web UI for hierarchical queues
The fair scheduler should have a web UI similar to the capacity scheduler that shows nested queues.
- YARN-253.
Critical bug reported by Tom White and fixed by Tom White (nodemanager)
Container launch may fail if no files were localized
This can be demonstrated with DistributedShell. The containers running the shell do not have any files to localize (if there is no shell script to copy) so if they run on a different NM to the AM (which does localize files), then they will fail since the appcache directory does not exist.
- YARN-251.
Major bug reported by Tom White and fixed by Tom White (resourcemanager)
Proxy URI generation fails for blank tracking URIs
If the URI is an empty string (the default if not set), then a warning is displayed. A null URI displays no such warning. These two cases should be handled in the same way.
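Handling the two cases the same way can be as small as one shared check. This is a hypothetical helper for illustration and is not the proxy URI code itself.

```java
/** Illustrative only: treats null and blank tracking URIs identically. */
final class TrackingUriSketch {
    /** Returns true when no usable tracking URI was supplied. */
    static boolean isUnset(String trackingUri) {
        return trackingUri == null || trackingUri.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isUnset(null));
        System.out.println(isUnset(""));
        System.out.println(isUnset("host:8088/proxy"));
    }
}
```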
- YARN-230.
Major sub-task reported by Bikas Saha and fixed by Bikas Saha (resourcemanager)
Make changes for RM restart phase 1
As described in YARN-128, phase 1 of RM restart puts in place mechanisms to save application state and read it back after restart. Upon restart, the NMs are asked to reboot and the previously running AMs are restarted.
After this is done, RM HA and work-preserving restart can continue in parallel. For more details, please refer to the design document in YARN-128.
- YARN-229.
Major sub-task reported by Bikas Saha and fixed by Bikas Saha (resourcemanager)
Remove old code for restart
Much of the code is dead/commented out and is not executed. Removing it will help with making and understanding new changes.
- YARN-225.
Critical bug reported by Devaraj K and fixed by Devaraj K (resourcemanager)
Proxy Link in RM UI throws NPE in Secure mode
{code:xml}
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:241)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:975)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
{code}
- YARN-224.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza
Fair scheduler logs too many nodeUpdate INFO messages
The RM logs are filled with an INFO message the fair scheduler logs every time it receives a nodeUpdate. It should be taken out or demoted to debug.
- YARN-223.
Critical bug reported by Radim Kolar and fixed by Radim Kolar
Change processTree interface to work better with native code
The problem is that every update of the process tree requires a new object. This is undesirable when working with a processTree implementation in native code.
Replace ProcessTree.getProcessTree() with updateProcessTree(). No new object allocation is needed, and it simplifies application code a bit.
- YARN-222.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Fair scheduler should create queue for each user by default
In MR1 the fair scheduler's default behavior was to create a pool for each user. The YARN fair scheduler has this capability, but it should be turned on by default, for consistency.
- YARN-219.
Critical sub-task reported by Robert Joseph Evans and fixed by Robert Joseph Evans (nodemanager)
NM should aggregate logs when application finishes.
The NM should only aggregate logs when the application finishes. This will reduce the load on the NN, especially with respect to lease renewal.
- YARN-217.
Blocker bug reported by Devaraj K and fixed by Devaraj K (resourcemanager)
yarn rmadmin commands fail in secure cluster
All the rmadmin commands fail in secure mode with the "protocol org.apache.hadoop.yarn.server.nodemanager.api.RMAdminProtocolPB is unauthorized" message in RM logs.
- YARN-216.
Major improvement reported by Todd Lipcon and fixed by Robert Joseph Evans
Remove jquery theming support
As of today we have 9.4MB of JQuery themes in our code tree. In addition to being a waste of space, it's a highly questionable feature. I've never heard anyone complain that the Hadoop interface isn't themeable enough, and there's far more value in consistency across installations than there is in themeability. Let's rip it out.
- YARN-214.
Major bug reported by Jason Lowe and fixed by Jonathan Eagles (resourcemanager)
RMContainerImpl does not handle event EXPIRE at state RUNNING
RMContainerImpl has a race condition where a container can enter the RUNNING state just as the container expires. This results in an invalid event transition error:
{noformat}
2012-11-11 05:31:38,954 [ResourceManager Event Processor] ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: EXPIRE at RUNNING
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:205)
at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:44)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApp.containerCompleted(SchedulerApp.java:203)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1337)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:739)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:659)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:80)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:340)
at java.lang.Thread.run(Thread.java:619)
{noformat}
EXPIRE needs to be handled (well at least ignored) in the RUNNING state to account for this race condition.
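Ignoring an event in a given state amounts to registering a self-transition for it. The sketch below uses a deliberately simplified state set and a hypothetical transition table. It is not the RMContainerImpl state machine or the StateMachineFactory API.

```java
import java.util.EnumMap;
import java.util.Map;

/** Illustrative only: a toy transition table with simplified states and events. */
final class ContainerStateSketch {
    enum State { ACQUIRED, RUNNING, COMPLETED, EXPIRED }
    enum Event { LAUNCHED, FINISHED, EXPIRE }

    private static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);

    static {
        register(State.ACQUIRED, Event.LAUNCHED, State.RUNNING);
        register(State.ACQUIRED, Event.EXPIRE, State.EXPIRED);
        register(State.RUNNING, Event.FINISHED, State.COMPLETED);
        // The race: EXPIRE may arrive just after the container started running.
        // A self-transition ignores it instead of raising an error.
        register(State.RUNNING, Event.EXPIRE, State.RUNNING);
    }

    private static void register(State from, Event on, State to) {
        TRANSITIONS.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    /** Returns the next state, or throws if the event is not registered for the current state. */
    static State next(State current, Event event) {
        Map<Event, State> row = TRANSITIONS.get(current);
        if (row == null || !row.containsKey(event)) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        return row.get(event);
    }

    public static void main(String[] args) {
        System.out.println(next(State.RUNNING, Event.EXPIRE));
    }
}
```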
- YARN-212.
Blocker bug reported by Nathan Roberts and fixed by Nathan Roberts (nodemanager)
NM state machine ignores an APPLICATION_CONTAINER_FINISHED event when it shouldn't
The NM state machines can make the following two invalid state transitions when a speculative attempt is killed shortly after it gets started. When this happens the NM keeps the log aggregation context open for this application and therefore chews up FDs and leases on the NN, eventually running the NN out of FDs and bringing down the entire cluster.
2012-11-07 05:36:33,774 [AsyncDispatcher event handler] WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_CONTAINER_FINISHED at INITING
2012-11-07 05:36:33,775 [AsyncDispatcher event handler] WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [INIT_CONTAINER]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: INIT_CONTAINER at DONE
- YARN-206.
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
TestApplicationCleanup.testContainerCleanup occasionally fails
testContainerCleanup is occasionally failing with the error:
testContainerCleanup(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup): expected:<2> but was:<1>
- YARN-204.
Major bug reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov (applications)
test coverage for org.apache.hadoop.tools
Added some tests for org.apache.hadoop.tools
- YARN-202.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee
Log Aggregation generates a storm of fsync() for namenode
When log aggregation is on, each write to an aggregated container log causes hflush() to be called. For large clusters, this can create a lot of fsync() calls for the namenode.
We have seen 6-7x increase in the average number of fsync operations compared to 1.0.x on a large busy cluster. Over 99% of fsync ops were for log aggregation writing to tmp files.
- YARN-201.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (capacityscheduler)
CapacityScheduler can take a very long time to schedule containers if requests are off cluster
When a user runs a job where one of the input files is a large file on another cluster, the job can create many splits on nodes which are unreachable for computation from the current cluster. The off-switch delay logic in LeafQueue can cause the ResourceManager to allocate containers for the job very slowly. In one case the job was only getting one container every 23 seconds, and the queue had plenty of spare capacity.
- YARN-189.
Blocker bug reported by Thomas Graves and fixed by Thomas Graves (resourcemanager)
deadlock in RM - AMResponse object
We ran into a deadlock in the RM.
=============================
"1128743461@qtp-1252749669-5201":
waiting for ownable synchronizer 0x00002aabbc87b960, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
which is held by "AsyncDispatcher event handler"
"AsyncDispatcher event handler":
waiting to lock monitor 0x00002ab0bba3a370 (object 0x00002aab3d4cd698, a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl),
which is held by "IPC Server handler 36 on 8030"
"IPC Server handler 36 on 8030":
waiting for ownable synchronizer 0x00002aabbc87b960, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
which is held by "AsyncDispatcher event handler"
Java stack information for the threads listed above:
===================================================
"1128743461@qtp-1252749669-5201":
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00002aabbc87b960> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getFinalApplicationStatus(RMAppAttemptImpl.java:295)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus(RMAppImpl.java:222)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:328)
at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaM
...
...
..
"AsyncDispatcher event handler":
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.unregisterAttempt(ApplicationMasterService.java:307)
- waiting to lock <0x00002aab3d4cd698> (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$BaseFinalTransition.transition(RMAppAttemptImpl.java:647)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:809)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:796)
at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
- locked <0x00002aabbb673090> (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:478)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:81)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:436)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:417)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:619)
"IPC Server handler 36 on 8030":
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00002aabbc87b960> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.pullJustFinishedContainers(RMAppAttemptImpl.java:437)
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:285)
- locked <0x00002aab3d4cd698> (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
at org.apache.hadoop.yarn.api.impl.pb.service.AMRMProtocolPBServiceImpl.allocate(AMRMProtocolPBServiceImpl.java:56)
at org.apache.hadoop.yarn.proto.AMRMProtocol$AMRMProtocolService$2.callBlockingMethod(AMRMProtocol.java:87)
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:353)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1528)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1524)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1522)
- YARN-188.
Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov (capacityscheduler)
Coverage fixing for CapacityScheduler
Added some tests for CapacityScheduler.
YARN-188-branch-0.23.patch patch for branch 0.23
YARN-188-branch-2.patch patch for branch 2
YARN-188-trunk.patch patch for trunk
- YARN-187.
Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Add hierarchical queues to the fair scheduler
- YARN-186.
Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov (resourcemanager , scheduler)
Coverage fixing LinuxContainerExecutor
Added some tests for LinuxContainerExecutor
YARN-186-branch-0.23.patch patch for branch-0.23
YARN-186-branch-2.patch patch for branch-2
YARN-186-trunk.patch patch for trunk
- YARN-184.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza
Remove unnecessary locking in fair scheduler, and address findbugs excludes.
In YARN-12, locks were added to all fields of QueueManager to address findbugs. In addition, findbugs exclusions were added in response to MAPREDUCE-4439, without a deep look at the code.
- YARN-183.
Minor improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Clean up fair scheduler code
The fair scheduler code has a bunch of minor stylistic issues.
- YARN-181.
Critical bug reported by Siddharth Seth and fixed by Siddharth Seth (resourcemanager)
capacity-scheduler.xml move breaks Eclipse import
Eclipse doesn't seem to handle "testResources" which resolve to an absolute path. YARN-140 moved capacity-scheduler.cfg a couple of levels up to the hadoop-yarn project.
- YARN-180.
Critical bug reported by Thomas Graves and fixed by Arun C Murthy (capacityscheduler)
Capacity scheduler - containers that get reserved create container token too early
The capacity scheduler has the ability to 'reserve' containers. Unfortunately, before it decides that a container goes to reserved rather than assigned, the Container object is created, which creates a container token that expires in roughly 10 minutes by default.
This means that by the time the NM frees up enough space on that node for the container to move to assigned the container token may have expired.
- YARN-179.
Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (capacityscheduler)
Bunch of test failures on trunk
{{CapacityScheduler.setConf()}} mandates a YarnConfiguration. It doesn't need to: throughout YARN, components depend only on Configuration and rely on the callers to provide the correct configuration.
This is causing multiple tests to fail.
- YARN-178.
Critical bug reported by Radim Kolar and fixed by Radim Kolar
Fix custom ProcessTree instance creation
1. The current pluggable ResourceCalculatorProcessTree does not pass the root process id to the custom implementation, making it unusable.
2. pstree does not extend Configured as it should.
Added constructor with pid argument with testsuite. Also added test that pstree is correctly configured.
- YARN-177.
Critical bug reported by Thomas Graves and fixed by Arun C Murthy (capacityscheduler)
CapacityScheduler - adding a queue while the RM is running has wacky results
Adding a queue to the capacity scheduler while the RM is running and then running a job in the queue added results in very strange behavior. The cluster Total Memory can either decrease or increase. We had a cluster where total memory decreased to almost 1/6th the capacity. Running on a small test cluster resulted in the capacity going up by simply adding a queue and running wordcount.
Looking at the RM logs, used memory can go negative but other logs show the number positive:
2012-10-21 22:56:44,796 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.0375 absoluteUsedCapacity=0.0375 used=memory: 7680 cluster=memory: 204800
2012-10-21 22:56:45,831 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=-0.0225 absoluteUsedCapacity=-0.0225 used=memory: -4608 cluster=memory: 204800
- YARN-170.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)
NodeManager stop() gets called twice on shutdown
The stop method in the NodeManager gets called twice when the NodeManager is shut down via the shutdown hook.
The first is the stop that gets called directly by the shutdown hook. The second occurs when the NodeStatusUpdaterImpl is stopped. The NodeManager responds to the NodeStatusUpdaterImpl stop stateChanged event by stopping itself. This is so that NodeStatusUpdaterImpl can notify the NodeManager to stop, by stopping itself in response to a request from the ResourceManager.
This could be avoided if the NodeStatusUpdaterImpl were to stop the NodeManager by calling its stop method directly.
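The double invocation described above can also be made harmless with an idempotent stop guard. This is a different technique from the direct-call approach suggested in the issue, shown only as a minimal sketch: the class StopOnceSketch and its counter are hypothetical and are not part of the NodeManager code.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a service whose stop() may be reached twice,
// e.g. once from a shutdown hook and once from a child-service callback.
final class StopOnceSketch {
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private final AtomicInteger cleanupRuns = new AtomicInteger();

    // Only the first caller flips the flag and performs the cleanup.
    void stop() {
        if (!stopped.compareAndSet(false, true)) {
            return; // already stopped, nothing to do
        }
        cleanupRuns.incrementAndGet(); // placeholder for real cleanup work
    }

    int cleanupRuns() {
        return cleanupRuns.get();
    }

    public static void main(String[] args) {
        StopOnceSketch service = new StopOnceSketch();
        service.stop(); // direct call from the shutdown hook
        service.stop(); // second call triggered by the state-change event
        System.out.println("cleanup runs: " + service.cleanupRuns());
    }
}
```

The guard does not remove the redundant call path; it only ensures the cleanup body runs once.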
- YARN-169.
Minor improvement reported by Anthony Rojas and fixed by Anthony Rojas (nodemanager)
Update log4j.appender.EventCounter to use org.apache.hadoop.log.metrics.EventCounter
We should update the log4j.appender.EventCounter in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/resources/container-log4j.properties to use *org.apache.hadoop.log.metrics.EventCounter* rather than *org.apache.hadoop.metrics.jvm.EventCounter* to avoid triggering the following warning:
{code}WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files{code}
- YARN-166.
Major bug reported by Thomas Graves and fixed by Thomas Graves (capacityscheduler)
capacity scheduler doesn't allow capacity < 1.0
1.x supports queue capacity < 1, but in 0.23 the capacity scheduler doesn't. This is an issue for us since we have a large cluster running 1.x that currently has a queue with capacity 0.5%.
- YARN-165.
Blocker improvement reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
RM should point tracking URL to RM web page for app when AM fails
Currently when an ApplicationMaster fails the ResourceManager is updating the tracking URL to an empty string, see RMAppAttemptImpl.ContainerFinishedTransition. Unfortunately when the client attempts to follow the proxy URL it results in a web page showing an HTTP 500 error and an ugly backtrace because "http://" isn't a very helpful tracking URL.
It would be much more helpful if the proxy URL redirected to the RM webapp page for the specific application. That page shows the various AM attempts and pointers to their logs which will be useful for debugging the problems that caused the AM attempts to fail.
- YARN-163.
Major bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
Retrieving container log via NM webapp can hang with multibyte characters in log
ContainerLogsBlock.printLogs currently assumes that skipping N bytes in the log file is the same as skipping N characters, but that is not true when the log contains multibyte characters. This can cause the loop that skips a portion of the log to try to skip past the end of the file and loop forever (or until Jetty kills the worker thread).
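The byte/character mismatch can be illustrated with a small sketch. It is not the ContainerLogsBlock code: the class LogSkipSketch and the method skipChars are hypothetical, and the sample string is made up. The point is that a skip loop must stop when the reader makes no progress, because a byte count can exceed the number of characters available.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the actual ContainerLogsBlock code.
final class LogSkipSketch {

    // Skips up to charsToSkip characters of a UTF-8 encoded log and returns
    // how many characters were actually skipped. The loop exits as soon as
    // the reader reports no progress, so it cannot spin at end of stream.
    static long skipChars(byte[] utf8Log, long charsToSkip) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8Log), StandardCharsets.UTF_8)) {
            long skipped = 0;
            while (skipped < charsToSkip) {
                long n = reader.skip(charsToSkip - skipped);
                if (n <= 0) {
                    break; // end of stream reached before the target
                }
                skipped += n;
            }
            return skipped;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two of these characters take two bytes each in UTF-8.
        String log = "h\u00e9llo w\u00f6rld";
        byte[] bytes = log.getBytes(StandardCharsets.UTF_8);
        System.out.println("chars=" + log.length() + " bytes=" + bytes.length);

        // Using the byte count as a character count overshoots the stream.
        System.out.println("skipped=" + skipChars(bytes, bytes.length));
    }
}
```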
- YARN-161.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (api)
Yarn Common has multiple compiler warnings for unchecked operations
The warnings are in classes StateMachineFactory, RecordFactoryProvider, RpcFactoryProvider, and YarnRemoteExceptionFactoryProvider. OpenJDK 1.6.0_24 actually treats these as compilation errors, causing the build to fail.
- YARN-159.
Major bug reported by Thomas Graves and fixed by Thomas Graves (resourcemanager)
RM web ui applications page should be sorted to display last app first
RM web ui applications page should be sorted to display last app first.
It currently sorts with the smallest application id first, which shows the first apps that were submitted. After you have one page worth of apps, it's much more useful to sort such that the biggest appid (the last submitted app) shows up first.
- YARN-151.
Major bug reported by Robert Joseph Evans and fixed by Ravi Prakash
Browser thinks RM main page JS is taking too long
The main RM page with the default settings of 10,000 applications can cause browsers to think that the JS on the page is stuck and ask you if you want to kill it. This is a big usability problem.
- YARN-150.
Major bug reported by Bikas Saha and fixed by Bikas Saha
AppRejectedTransition does not unregister app from master service and scheduler
AttemptStartedTransition() adds the app to the ApplicationMasterService and scheduler. When the scheduler rejects the app, AppRejectedTransition() forgets to unregister it from the ApplicationMasterService.
- YARN-146.
Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager)
Add unit tests for computing fair share in the fair scheduler
MR1 had TestComputeFairShares. This should go into the YARN fair scheduler.
- YARN-145.
Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager)
Add a Web UI to the fair share scheduler
The fair scheduler had a UI in MR1. Port the capacity scheduler web UI and modify appropriately for the fair share scheduler.
- YARN-140.
Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (capacityscheduler)
Add capacity-scheduler-default.xml to provide a default set of configurations for the capacity scheduler.
When setting up the capacity scheduler users are faced with problems like:
{code}
FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
java.lang.IllegalArgumentException: Illegal capacity of -1 for queue root
{code}
This basically arises from missing basic configuration that, in many cases, does not need to be provided explicitly because a default would be sufficient. For example, to address the error above, the user needs to set a capacity of 100 on the root queue.
So we need to add a capacity-scheduler-default.xml to provide the basic set of default configurations required to run the capacity scheduler. The user can still override existing configurations or provide new ones in capacity-scheduler.xml. This is similar to *-default.xml vs *-site.xml for yarn, core, mapred, hdfs, etc.
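The default-versus-site layering described above can be sketched with java.util.Properties, which supports a chain of defaults. This is only an analogy and does not use Hadoop's Configuration class: the class LayeredConfigSketch is hypothetical, and the key yarn.scheduler.capacity.root.capacity is assumed here as the root queue capacity property.

```java
import java.util.Properties;

// Hypothetical analogy for capacity-scheduler-default.xml (defaults) and
// capacity-scheduler.xml (site overrides); not Hadoop's Configuration class.
final class LayeredConfigSketch {

    // Returns a view where site values win and defaults fill the gaps.
    static Properties layered(Properties defaults, Properties site) {
        Properties merged = new Properties(defaults); // fallback chain
        merged.putAll(site);                          // explicit overrides
        return merged;
    }

    public static void main(String[] args) {
        // Assumed key name for the root queue capacity.
        String key = "yarn.scheduler.capacity.root.capacity";

        Properties defaults = new Properties();
        defaults.setProperty(key, "100");

        Properties site = new Properties(); // user supplied nothing

        // Without the default, the lookup would return null.
        System.out.println(layered(defaults, site).getProperty(key));
    }
}
```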
- YARN-139.
Major bug reported by Nathan Roberts and fixed by Vinod Kumar Vavilapalli (api)
Interrupted Exception within AsyncDispatcher leads to user confusion
Successful applications tend to get InterruptedExceptions during shutdown. The exception is harmless but it leads to lots of user confusion and therefore could be cleaned up.
2012-09-28 14:50:12,477 WARN [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Interrupted Exception while stopping
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1143)
at java.lang.Thread.join(Thread.java:1196)
at org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:105)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:437)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:402)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:619)
2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped.
2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.mapreduce.v2.app.MRAppMaster is stopped.
2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Exiting MR AppMaster..GoodBye
- YARN-136.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (resourcemanager)
Make ClientTokenSecretManager part of RMContext
Helps to add it to the context instead of passing it all around as an extra parameter.
- YARN-135.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (resourcemanager)
ClientTokens should be per app-attempt and be unregistered on App-finish.
Two issues:
- ClientTokens are per app-attempt but are created per app.
- Apps don't get unregistered from RMClientTokenSecretManager.
- YARN-134.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
ClientToAMSecretManager creates keys without checking for validity of the appID
- YARN-133.
Major bug reported by Thomas Graves and fixed by Ravi Prakash (resourcemanager)
update web services docs for RM clusterMetrics
Looks like jira https://issues.apache.org/jira/browse/MAPREDUCE-3747 added in more RM cluster metrics but the docs didn't get updated: http://hadoop.apache.org/docs/r0.23.3/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Metrics_API
- YARN-131.
Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (capacityscheduler)
Incorrect ACL properties in capacity scheduler documentation
The CapacityScheduler apt file incorrectly specifies the property names controlling acls for application submission and queue administration.
{{yarn.scheduler.capacity.root.<queue-path>.acl_submit_jobs}}
should be
{{yarn.scheduler.capacity.root.<queue-path>.acl_submit_applications}}
{{yarn.scheduler.capacity.root.<queue-path>.acl_administer_jobs}}
should be
{{yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue}}
Uploading a patch momentarily.
- YARN-129.
Major improvement reported by Tom White and fixed by Tom White (client)
Simplify classpath construction for mini YARN tests
The test classpath includes a special file called 'mrapp-generated-classpath' (or similar in distributed shell) that is constructed at build time, and whose contents are a classpath with all the dependencies needed to run the tests. When the classpath for a container (e.g. the AM) is constructed the contents of mrapp-generated-classpath is read and added to the classpath, and the file itself is then added to the classpath so that later when the AM constructs a classpath for a task container it can propagate the test classpath correctly.
This mechanism can be drastically simplified by propagating the system classpath of the current JVM (read from the java.class.path property) to a launched JVM, but only if running in the context of the mini YARN cluster. Any tests that use the mini YARN cluster will automatically work with this change, although any that explicitly deal with mrapp-generated-classpath can be simplified.
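Propagating the current JVM's classpath can be sketched as follows. This is an illustration only, not the mini YARN cluster code: the class ClasspathSketch, the method childJvmCommand, and the main class name passed to it are hypothetical. Only the java.class.path system property comes from the description above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of handing the parent JVM's classpath to a child JVM.
final class ClasspathSketch {

    // Builds a launch command that reuses this JVM's classpath verbatim.
    static List<String> childJvmCommand(String mainClass) {
        String classpath = System.getProperty("java.class.path");
        return Arrays.asList("java", "-cp", classpath, mainClass);
    }

    public static void main(String[] args) {
        // "example.ChildMain" is a made-up class name for illustration.
        System.out.println(childJvmCommand("example.ChildMain"));
    }
}
```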
- YARN-127.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Move RMAdmin tool to the client package
It clearly belongs to the client package and not the RM.
- YARN-116.
Major bug reported by xieguiming and fixed by xieguiming (resourcemanager)
RM is missing ability to add include/exclude files without a restart
The default value of "yarn.resourcemanager.nodes.include-path" is "", so if we need to add an include file, we must currently restart the RM.
I suggest that adding an include or exclude file should not require restarting the RM; executing the refresh command should be enough. The HDFS NameNode already has this ability.
The fix is to modify how HostsFileReader instances are constructed:
From:
{code}
public HostsFileReader(String inFile,
String exFile)
{code}
To:
{code}
public HostsFileReader(Configuration conf,
String NODES_INCLUDE_FILE_PATH,String DEFAULT_NODES_INCLUDE_FILE_PATH,
String NODES_EXCLUDE_FILE_PATH,String DEFAULT_NODES_EXCLUDE_FILE_PATH)
{code}
And thus, we can read the config file dynamically when a {{refreshNodes}} is invoked and therefore have no need to restart the ResourceManager.
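The idea of resolving the path at refresh time rather than at construction time can be sketched as follows. This is not the HostsFileReader implementation: the class RefreshableHostsSketch is hypothetical, and a plain Map stands in for Hadoop's Configuration. Only the property name yarn.resourcemanager.nodes.include-path comes from the description above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: keep the configuration and the key, not the resolved
// path, so a later refresh can pick up a newly configured include file.
final class RefreshableHostsSketch {
    private final Map<String, String> conf; // stands in for Configuration
    private final String includeKey;
    private final String defaultInclude;
    private volatile String includePath;

    RefreshableHostsSketch(Map<String, String> conf,
                           String includeKey, String defaultInclude) {
        this.conf = conf;
        this.includeKey = includeKey;
        this.defaultInclude = defaultInclude;
        refresh();
    }

    // Re-reads the path from the configuration on every refresh.
    void refresh() {
        includePath = conf.getOrDefault(includeKey, defaultInclude);
    }

    String includePath() {
        return includePath;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new ConcurrentHashMap<>();
        String key = "yarn.resourcemanager.nodes.include-path";
        RefreshableHostsSketch hosts = new RefreshableHostsSketch(conf, key, "");

        conf.put(key, "/etc/hadoop/include.txt"); // made-up path
        hosts.refresh(); // no restart needed to see the new value
        System.out.println(hosts.includePath());
    }
}
```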
- YARN-103.
Major improvement reported by Bikas Saha and fixed by Bikas Saha
Add a yarn AM - RM client module
Add a basic client wrapper library to the AM RM protocol in order to prevent proliferation of code being duplicated everywhere. Provide helper functions to perform reverse mapping of container requests to RM allocation resource request table format.
- YARN-102.
Trivial bug reported by Devaraj K and fixed by Devaraj K (resourcemanager)
Move the apache licence header to the top of the file in MemStore.java
- YARN-94.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Hitesh Shah (applications/distributed-shell)
DistributedShell jar should point to Client as the main class by default
Today, it says so..
{code}
$ $YARN_HOME/bin/yarn jar $YARN_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-$VERSION.jar
RunJar jarFile [mainClass] args...
{code}
- YARN-93.
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
Diagnostics missing from applications that have finished but failed
If an application finishes in the YARN sense but fails in the app framework sense (e.g.: a failed MapReduce job) then diagnostics are missing from the RM web page for the application. The RM should be reporting diagnostic messages even for successful YARN applications.
- YARN-82.
Minor bug reported by Andy Isaacson and fixed by Hemanth Yamijala (nodemanager)
YARN local-dirs defaults to /tmp/nm-local-dir
{{yarn.nodemanager.local-dirs}} defaults to {{/tmp/nm-local-dir}}. It should be {hadoop.tmp.dir}/nm-local-dir or similar. Among other problems, this can prevent multiple test clusters from starting on the same machine.
Thanks to Hemanth Yamijala for reporting this issue.
- YARN-78.
Major bug reported by Bikas Saha and fixed by Bikas Saha (applications)
Change UnmanagedAMLauncher to use YarnClientImpl
YARN-29 added a common client impl to talk to the RM. Use that in the UnmanagedAMLauncher.
- YARN-72.
Major bug reported by Hitesh Shah and fixed by Sandy Ryza (nodemanager)
NM should handle cleaning up containers when it shuts down
Ideally, when the NM gets a shutdown signal it should wait a limited amount of time for existing containers to complete and, if we pick an aggressive approach, kill the containers after this time interval.
For NMs that come up after an unclean shutdown, the NM should look through its directories for existing container.pids and try to kill any existing containers matching the pids found.
- YARN-57.
Major improvement reported by Radim Kolar and fixed by Radim Kolar (nodemanager)
Pluggable process tree
Trunk version of Pluggable process tree. Work based on MAPREDUCE-4204
- YARN-50.
Blocker sub-task reported by Siddharth Seth and fixed by Siddharth Seth
Implement renewal / cancellation of Delegation Tokens
Currently, delegation tokens issued by the RM and History server cannot be renewed or cancelled. This needs to be implemented.
- YARN-43.
Major bug reported by Thomas Graves and fixed by Thomas Graves
TestResourceTrackerService fail intermittently on jdk7
Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.73 sec <<< FAILURE!
testDecommissionWithIncludeHosts(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) Time elapsed: 0.086 sec <<< FAILURE!
junit.framework.AssertionFailedError: expected:<0> but was:<1>
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:283)
at junit.framework.Assert.assertEquals(Assert.java:64)
at junit.framework.Assert.assertEquals(Assert.java:195)
at junit.framework.Assert.assertEquals(Assert.java:201)
at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testDecommissionWithIncludeHosts(TestResourceTrackerService.java:90)
- YARN-40.
Major bug reported by Devaraj K and fixed by Devaraj K (client)
Provide support for missing yarn commands
1. status <app-id>
2. kill <app-id> (an issue already exists for this: MAPREDUCE-3793)
3. list-apps [all]
4. nodes-report
- YARN-33.
Major bug reported by Mayank Bansal and fixed by Mayank Bansal (nodemanager)
LocalDirsHandler should validate the configured local and log dirs
When yarn.nodemanager.log-dirs is configured with a file:// URI, startup of the node manager creates a directory like file:// under the CWD, which should not be there.
- YARN-32.
Major bug reported by Thomas Graves and fixed by Vinod Kumar Vavilapalli
TestApplicationTokens fails intermittently on jdk7
TestApplicationTokens fails intermittently on jdk7.
- YARN-30.
Major bug reported by Thomas Graves and fixed by Thomas Graves
TestNMWebServicesApps, TestRMWebServicesApps and TestRMWebServicesNodes fail on jdk7
It looks like the string changed from "const class" to "constant".
Tests run: 19, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 6.786 sec <<< FAILURE!
testNodeAppsStateInvalid(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps) Time elapsed: 0.248 sec <<< FAILURE!
java.lang.AssertionError: exception message doesn't match, got: No enum constant org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationState.FOO_STATE expected: No enum const class org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationState.FOO_STATE
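One way to make such a check less sensitive to JDK wording is to assert on the constant name rather than the full message. This is a hedged sketch, not the change made to the tests: the class EnumMessageSketch, its State enum, and both helper methods are hypothetical.

```java
// Hypothetical sketch: tolerate JDK-specific wording of the enum lookup
// failure message by checking only for the requested constant name.
final class EnumMessageSketch {
    enum State { NEW, RUNNING }

    // Returns the exception message for an invalid name, or null if valid.
    static String lookupFailureMessage(String name) {
        try {
            State.valueOf(name);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    // Checks only the stable part of the message.
    static boolean mentionsConstant(String message, String name) {
        return message != null && message.contains(name);
    }

    public static void main(String[] args) {
        String message = lookupFailureMessage("FOO_STATE");
        System.out.println(message);
        System.out.println(mentionsConstant(message, "FOO_STATE"));
    }
}
```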
- YARN-28.
Major bug reported by Thomas Graves and fixed by Thomas Graves
TestCompositeService fails on jdk7
test TestCompositeService fails when run with jdk7.
It appears it expects test testCallSequence to be called first and the sequence numbers to start at 0. On jdk7 it's not being called first and the sequence number has already been incremented.
- YARN-23.
Major improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)
FairScheduler: FSQueueSchedulable#updateDemand() - potential redundant aggregation
In FS, FSQueueSchedulable#updateDemand() limits the demand to maxTasks only after iterating through all the pools and computing the final demand.
By checking if the demand has reached maxTasks in every iteration, we can avoid redundant work, at the expense of one condition check every iteration.
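The early-exit aggregation can be sketched as follows. This is a minimal illustration under assumed names, not the FSQueueSchedulable code: the class DemandSketch and the method cappedDemand are hypothetical, and plain integers stand in for the per-pool demands.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of capping aggregated demand with an early exit.
final class DemandSketch {

    // Sums child demands but stops as soon as the cap is reached, trading
    // one comparison per iteration for skipping the remaining children.
    static int cappedDemand(List<Integer> childDemands, int maxTasks) {
        int demand = 0;
        for (int childDemand : childDemands) {
            demand += childDemand;
            if (demand >= maxTasks) {
                return maxTasks; // further aggregation would be redundant
            }
        }
        return demand;
    }

    public static void main(String[] args) {
        // The third child is never visited because the cap is hit earlier.
        System.out.println(cappedDemand(Arrays.asList(3, 4, 5), 6));
    }
}
```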
- YARN-3.
Major sub-task reported by Arun C Murthy and fixed by Andrew Ferguson
Add support for CPU isolation/monitoring of containers
- YARN-2.
Major new feature reported by Arun C Murthy and fixed by Arun C Murthy (capacityscheduler , scheduler)
Enhance CS to schedule accounting for both memory and cpu cores
With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful for the CapacityScheduler to account for both.
- MAPREDUCE-4977.
Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (documentation)
Documentation for pluggable shuffle and pluggable sort
- MAPREDUCE-4971.
Minor improvement reported by Arun C Murthy and fixed by Arun C Murthy
Minor extensibility enhancements
- MAPREDUCE-4969.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)
TestKeyValueTextInputFormat test fails with Open JDK 7
- MAPREDUCE-4953.
Major bug reported by Andy Isaacson and fixed by Andy Isaacson (pipes)
HadoopPipes misuses fprintf
- MAPREDUCE-4949.
Minor improvement reported by Sandy Ryza and fixed by Sandy Ryza (examples)
Enable multiple pi jobs to run in parallel
- MAPREDUCE-4948.
Critical bug reported by Junping Du and fixed by Junping Du (client)
TestYARNRunner.testHistoryServerToken failed on trunk
- MAPREDUCE-4946.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)
Type conversion of map completion events leads to performance problems with large jobs
- MAPREDUCE-4936.
Critical bug reported by Daryn Sharp and fixed by Arun C Murthy (mrv2)
JobImpl uber checks for cpu are wrong
- MAPREDUCE-4934.
Critical bug reported by Thomas Graves and fixed by Thomas Graves (build)
Maven RAT plugin is not checking all source files
- MAPREDUCE-4928.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (applicationmaster , security)
Use token request messages defined in hadoop common
The renewer field of the Protobuf message GetDelegationTokenRequestProto is changed from optional to required. This change is not wire compatible with older releases.
- MAPREDUCE-4925.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (examples)
The pentomino option parser may be buggy
- MAPREDUCE-4924.
Trivial bug reported by Robert Kanter and fixed by Robert Kanter (mrv1)
flakey test: org.apache.hadoop.mapred.TestClusterMRNotification.testMR
- MAPREDUCE-4923.
Minor bug reported by Sandy Ryza and fixed by Sandy Ryza (mrv1 , mrv2 , task)
Add toString method to TaggedInputSplit
- MAPREDUCE-4921.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (client)
JobClient should acquire HS token with RM principal
- MAPREDUCE-4920.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Suresh Srinivas
Use security token protobuf definition from hadoop common
- MAPREDUCE-4913.
Major bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)
TestMRAppMaster#testMRAppMasterMissingStaging occasionally exits
- MAPREDUCE-4907.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (mrv1 , tasktracker)
TrackerDistributedCacheManager issues too many getFileStatus calls
- MAPREDUCE-4905.
Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov
test org.apache.hadoop.mapred.pipes
- MAPREDUCE-4902.
Trivial bug reported by Albert Chu and fixed by Albert Chu
Fix typo "receievd" should be "received" in log output
- MAPREDUCE-4899.
Major improvement reported by Derek Dagit and fixed by Derek Dagit
Provide a plugin to the Yarn Web App Proxy to generate tracking links for M/R applications given the ID
- MAPREDUCE-4895.
Major bug reported by Dennis Y and fixed by Dennis Y
Fix compilation failure of org.apache.hadoop.mapred.gridmix.TestResourceUsageEmulators
- MAPREDUCE-4894.
Blocker bug reported by Siddharth Seth and fixed by Siddharth Seth (jobhistoryserver , mrv2)
Renewal / cancellation of JobHistory tokens
- MAPREDUCE-4893.
Major bug reported by Bikas Saha and fixed by Bikas Saha (applicationmaster)
MR AppMaster can do sub-optimal assignment of containers to map tasks leading to poor node locality
- MAPREDUCE-4890.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)
Invalid TaskImpl state transitions when task fails while speculating
- MAPREDUCE-4884.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (contrib/streaming , test)
streaming tests fail to start MiniMRCluster due to "Queue configuration missing child queue names for root"
- MAPREDUCE-4861.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla
Cleanup: Remove unused mapreduce.security.token.DelegationTokenRenewal
- MAPREDUCE-4856.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (test)
TestJobOutputCommitter uses same directory as TestJobCleanup
- MAPREDUCE-4848.
Major bug reported by Jason Lowe and fixed by Jerry Chen (mr-am)
TaskAttemptContext cast error during AM recovery
- MAPREDUCE-4845.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (client)
ClusterStatus.getMaxMemory() and getUsedMemory() exist in MR1 but not MR2
- MAPREDUCE-4842.
Blocker bug reported by Jason Lowe and fixed by Mariappan Asokan (mrv2)
Shuffle race can hang reducer
- MAPREDUCE-4838.
Major improvement reported by Arun C Murthy and fixed by Zhijie Shen
Add extra info to JH files
- MAPREDUCE-4836.
Major bug reported by Ravi Prakash and fixed by Ravi Prakash
Elapsed time for running tasks on AM web UI tasks page is 0
- MAPREDUCE-4833.
Critical bug reported by Robert Joseph Evans and fixed by Robert Parker (applicationmaster , mrv2)
Task can get stuck in FAIL_CONTAINER_CLEANUP
- MAPREDUCE-4832.
Critical bug reported by Robert Joseph Evans and fixed by Jason Lowe (applicationmaster)
MR AM can get in a split brain situation
- MAPREDUCE-4825.
Major bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)
JobImpl.finished doesn't expect ERROR as a final job state
- MAPREDUCE-4822.
Trivial improvement reported by Robert Joseph Evans and fixed by Chu Tong (jobhistoryserver)
Unnecessary conversions in History Events
- MAPREDUCE-4819.
Blocker bug reported by Jason Lowe and fixed by Bikas Saha (mr-am)
AM can rerun job after reporting final job status to the client
- MAPREDUCE-4817.
Critical bug reported by Jason Lowe and fixed by Thomas Graves (applicationmaster , mr-am)
Hardcoded task ping timeout kills tasks localizing large amounts of data
- MAPREDUCE-4813.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster)
AM timing out during job commit
- MAPREDUCE-4811.
Minor improvement reported by Ravi Prakash and fixed by Ravi Prakash (jobhistoryserver , mrv2)
JobHistoryServer should show when it was started in WebUI About page
- MAPREDUCE-4810.
Minor improvement reported by Jason Lowe and fixed by Jerry Chen (applicationmaster)
Add admin command options for ApplicationMaster
- MAPREDUCE-4809.
Major sub-task reported by Arun C Murthy and fixed by Mariappan Asokan
Change visibility of classes for pluggable sort changes
- MAPREDUCE-4808.
Major new feature reported by Arun C Murthy and fixed by Mariappan Asokan
Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
- MAPREDUCE-4807.
Major sub-task reported by Arun C Murthy and fixed by Mariappan Asokan
Allow MapOutputBuffer to be pluggable
- MAPREDUCE-4803.
Minor test reported by Mariappan Asokan and fixed by Mariappan Asokan (test)
Duplicate copies of TestIndexCache.java
- MAPREDUCE-4802.
Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (mr-am , mrv2 , webapps)
Takes a long time to load the task list on the AM for large jobs
- MAPREDUCE-4801.
Critical bug reported by Jason Lowe and fixed by Jason Lowe
ShuffleHandler can generate large logs due to prematurely closed channels
- MAPREDUCE-4797.
Major bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster)
LocalContainerAllocator can loop forever trying to contact the RM
- MAPREDUCE-4787.
Major bug reported by Ravi Prakash and fixed by Robert Parker (test)
TestJobMonitorAndPrint is broken
- MAPREDUCE-4786.
Major bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)
Job End Notification retry interval is 5 milliseconds by default
- MAPREDUCE-4782.
Blocker bug reported by Mark Fuhs and fixed by Mark Fuhs (client)
NLineInputFormat skips first line of last InputSplit
- MAPREDUCE-4778.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (jobtracker , scheduler)
Fair scheduler event log is only written if directory exists on HDFS
- MAPREDUCE-4777.
Minor improvement reported by Sandy Ryza and fixed by Sandy Ryza
In TestIFile, testIFileReaderWithCodec relies on testIFileWriterWithCodec
- MAPREDUCE-4774.
Major bug reported by Ivan A. Veselovsky and fixed by Jason Lowe (applicationmaster , mrv2)
JobImpl does not handle asynchronous task events in FAILED state
- MAPREDUCE-4772.
Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Fetch failures can take way too long for a map to be restarted
- MAPREDUCE-4771.
Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
KeyFieldBasedPartitioner not partitioning properly when configured
- MAPREDUCE-4764.
Major improvement reported by Ivan A. Veselovsky and fixed by
repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
- MAPREDUCE-4763.
Minor improvement reported by Ivan A. Veselovsky and fixed by
repair test org.apache.hadoop.mapreduce.security.TestUmbilicalProtocolWithJobToken
- MAPREDUCE-4752.
Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Reduce MR AM memory usage through String Interning
- MAPREDUCE-4751.
Major bug reported by Ravi Prakash and fixed by Vinod Kumar Vavilapalli
AM stuck in KILL_WAIT for days
- MAPREDUCE-4748.
Blocker bug reported by Robert Joseph Evans and fixed by Jason Lowe (mrv2)
Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
- MAPREDUCE-4746.
Major bug reported by Robert Parker and fixed by Robert Parker (applicationmaster)
The MR Application Master does not have a config to set environment variables
- MAPREDUCE-4741.
Minor bug reported by Jason Lowe and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)
WARN and ERROR messages logged during normal AM shutdown
- MAPREDUCE-4740.
Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
only .jars can be added to the Distributed Cache classpath
- MAPREDUCE-4736.
Trivial improvement reported by Brandon Li and fixed by Brandon Li (test)
Remove obsolete option [-rootDir] from TestDFSIO
- MAPREDUCE-4733.
Major bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster , mrv2)
Reducer can fail to make progress during shuffle if too many reducers complete consecutively
- MAPREDUCE-4730.
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster , mrv2)
AM crashes due to OOM while serving up map task completion events
- MAPREDUCE-4729.
Major bug reported by Thomas Graves and fixed by Vinod Kumar Vavilapalli (jobhistoryserver)
job history UI not showing all job attempts
- MAPREDUCE-4724.
Major bug reported by Thomas Graves and fixed by Thomas Graves (jobhistoryserver)
job history web ui applications page should be sorted to display last app first
- MAPREDUCE-4723.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza
Fix warnings found by findbugs 2
- MAPREDUCE-4721.
Major bug reported by Ravi Prakash and fixed by Ravi Prakash (jobhistoryserver)
Task startup time in JHS is same as job startup time.
- MAPREDUCE-4720.
Major bug reported by Robert Joseph Evans and fixed by Ravi Prakash
Browser thinks History Server main page JS is taking too long
- MAPREDUCE-4712.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (jobhistoryserver)
mr-jobhistory-daemon.sh doesn't accept --config
- MAPREDUCE-4705.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (jobhistoryserver , mrv2)
Historyserver links expire before the history data does
- MAPREDUCE-4703.
Major improvement reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv1 , mrv2 , test)
Add the ability to start the MiniMRClientCluster using the configurations used before it is being stopped.
- MAPREDUCE-4681.
Major bug reported by Arun C Murthy and fixed by Arun C Murthy
HDFS-3910 broke MR tests
- MAPREDUCE-4678.
Minor bug reported by Chris McConnell and fixed by Chris McConnell (examples)
Running the Pentomino example with defaults throws java.lang.NegativeArraySizeException
- MAPREDUCE-4674.
Minor bug reported by Robert Justice and fixed by Robert Justice
Hadoop examples secondarysort has a typo "secondarysrot" in the usage
- MAPREDUCE-4666.
Minor improvement reported by Jason Lowe and fixed by Jason Lowe (jobhistoryserver)
JVM metrics for history server
- MAPREDUCE-4654.
Critical bug reported by Colin Patrick McCabe and fixed by Sandy Ryza (test)
TestDistCp is @ignored
- MAPREDUCE-4637.
Major bug reported by Tom White and fixed by Mayank Bansal (mrv2)
Killing an unassigned task attempt causes the job to fail
Handle TaskAttempt diagnostic updates while in the NEW and UNASSIGNED states.
- MAPREDUCE-4616.
Minor improvement reported by Tony Burton and fixed by Tony Burton (documentation)
Improvement to MultipleOutputs javadocs
- MAPREDUCE-4607.
Major bug reported by Bikas Saha and fixed by Bikas Saha
Race condition in ReduceTask completion can result in Task being incorrectly failed
- MAPREDUCE-4596.
Major task reported by Siddharth Seth and fixed by Siddharth Seth (applicationmaster , mrv2)
Split StateMachine state from states seen by MRClientProtocol (for Job, Task, TaskAttempt)
- MAPREDUCE-4554.
Major bug reported by Benoy Antony and fixed by Benoy Antony (job submission , security)
Job Credentials are not transmitted if security is turned off
- MAPREDUCE-4521.
Major bug reported by Jason Lowe and fixed by Ravi Prakash (mrv2)
mapreduce.user.classpath.first incompatibility with 0.20/1.x
- MAPREDUCE-4520.
Major new feature reported by Arun C Murthy and fixed by Arun C Murthy
Add experimental support for MR AM to schedule CPUs along-with memory
- MAPREDUCE-4517.
Minor improvement reported by James Kinley and fixed by Jason Lowe (applicationmaster)
Too many INFO messages written out during AM to RM heartbeat
- MAPREDUCE-4479.
Major bug reported by Mariappan Asokan and fixed by Mariappan Asokan (test)
Fix parameter order in assertEquals() in TestCombineInputFileFormat.java
- MAPREDUCE-4458.
Major improvement reported by Robert Joseph Evans and fixed by Robert Parker (mrv2)
Warn if java.library.path is used for AM or Task
- MAPREDUCE-4425.
Critical bug reported by Siddharth Seth and fixed by Jason Lowe (mrv2)
Speculation + Fetch failures can lead to a hung job
- MAPREDUCE-4279.
Major bug reported by Rahul Jain and fixed by Devaraj K (jobtracker)
getClusterStatus() fails with null pointer exception when running jobs in local mode
- MAPREDUCE-4278.
Major bug reported by Araceli Henley and fixed by Sandy Ryza
cannot run two local jobs in parallel from the same gateway.
- MAPREDUCE-4272.
Major bug reported by Luke Lu and fixed by Yu Gao (task)
SortedRanges.Range#compareTo is not spec compliant
- MAPREDUCE-4266.
Major task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
remove Ant remnants from MR
- MAPREDUCE-4229.
Major improvement reported by Todd Lipcon and fixed by Miomir Boljanovic (jobtracker)
Counter names' memory usage can be decreased by interning
- MAPREDUCE-4123.
Critical bug reported by Nishan Shetty and fixed by Devaraj K (mrv2)
./mapred groups gives NoClassDefFoundError
- MAPREDUCE-4049.
Major sub-task reported by Avner BenHanoch and fixed by Avner BenHanoch (performance , task , tasktracker)
plugin for generic shuffle service
Allow ReduceTask to load a third-party plugin for shuffle (and merge) instead of the default shuffle.
- MAPREDUCE-3678.
Major new feature reported by Bejoy KS and fixed by Harsh J (mrv1 , mrv2)
The Map tasks logs should have the value of input split it processed
A map task's syslog now carries basic info on the InputSplit it processed.
- MAPREDUCE-2454.
Minor new feature reported by Mariappan Asokan and fixed by Mariappan Asokan
Allow external sorter plugin for MR
MAPREDUCE-4807 Allow external implementations of the sort phase in a Map task
- MAPREDUCE-2264.
Major bug reported by Adam Kramer and fixed by Devaraj K (jobtracker)
Job status exceeds 100% in some cases
- MAPREDUCE-1806.
Major bug reported by Paul Yang and fixed by Gera Shegalov (harchive)
CombineFileInputFormat does not work with paths not on default FS
- MAPREDUCE-1700.
Major bug reported by Tom White and fixed by Tom White (task)
User supplied dependencies may conflict with MapReduce system JARs
- HDFS-4468.
Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE
Fix TestHDFSCLI and TestQuota for HADOOP-9252
- HDFS-4462.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)
2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
- HDFS-4458.
Major bug reported by wenwupeng and fixed by Binglin Chang (balancer)
start balancer failed with "Failed to create file [/system/balancer.id]" if configure IP on fs.defaultFS
- HDFS-4456.
Major new feature reported by Tsz Wo (Nicholas), SZE and fixed by Plamen Jeliazkov (webhdfs)
Add concat to HttpFS and WebHDFS REST API docs
- HDFS-4452.
Critical bug reported by Konstantin Shvachko and fixed by Konstantin Shvachko (namenode)
getAdditionalBlock() can create multiple blocks if the client times out and retries.
- HDFS-4451.
Major bug reported by Joshua Blatt and fixed by (balancer)
hdfs balancer command returns exit code 1 on success instead of 0
This is an incompatible change from release 2.0.2-alpha and prior releases. The balancer tool previously exited with exit code 1 on success. It now exits with exit code 0 on success; a non-zero exit code indicates failure.
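Scripts that wrap the balancer need to flip their success check to match this change. A minimal sketch in Python, assuming a hypothetical helper named balancer_succeeded and a caller that already knows which release it runs against. Neither is part of Hadoop, and the legacy branch covers only what the note above states:

```python
def balancer_succeeded(exit_code: int, legacy: bool = False) -> bool:
    """Interpret the exit code of the balancer tool.

    legacy=True models 2.0.2-alpha and prior releases, where the tool
    exited with 1 on success. Otherwise 0 means success and any
    non-zero exit code means failure.
    """
    if legacy:
        # Hypothetical switch for old clusters; not a Hadoop option.
        return exit_code == 1
    return exit_code == 0
```

A wrapper could pass in the return code it gets from running the `hdfs balancer` command, for example through Python's subprocess module.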
- HDFS-4445.
Blocker sub-task reported by Vinay and fixed by Vinay
All BKJM ledgers are not checked while tailing, so failover will fail.
- HDFS-4444.
Trivial bug reported by Stephen Chu and fixed by Stephen Chu
Add space between total transaction time and number of transactions in FSEditLog#printStatistics
- HDFS-4443.
Trivial bug reported by Christian Rohling and fixed by Christian Rohling (namenode)
Remove trailing '`' character from HDFS nodelist jsp
- HDFS-4428.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
FsDatasetImpl should disclose what the error is when a rename fails
- HDFS-4426.
Blocker bug reported by Jason Lowe and fixed by Arpit Agarwal (namenode)
Secondary namenode shuts down immediately after startup
- HDFS-4415.
Major bug reported by Robert Kanter and fixed by Robert Kanter
HostnameFilter should handle hostname resolution failures and continue processing
- HDFS-4404.
Critical bug reported by liaowenrui and fixed by Todd Lipcon (ha , hdfs-client)
Create file failure when the machine of first attempted NameNode is down
- HDFS-4403.
Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (hdfs-client)
DFSClient can infer checksum type when not provided by reading first byte
The HDFS implementation of getFileChecksum() can now operate correctly against earlier-version datanodes which do not include the checksum type information in their checksum response. The checksum type is automatically inferred by issuing a read of the first byte of each block.
- HDFS-4393.
Minor improvement reported by Brandon Li and fixed by Brandon Li
Empty request and responses in protocol translators can be static final members
- HDFS-4392.
Trivial improvement reported by Andrew Purtell and fixed by Andrew Purtell (test)
Use NetUtils#getFreeSocketPort in MiniDFSCluster
- HDFS-4385.
Critical bug reported by Thomas Graves and fixed by Thomas Graves (build)
Maven RAT plugin is not checking all source files
- HDFS-4384.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)
test_libhdfs_threaded gets SEGV if JNIEnv cannot be initialized
- HDFS-4381.
Major improvement reported by Jing Zhao and fixed by Jing Zhao (namenode)
Document fsimage format details in FSImageFormat class javadoc
- HDFS-4377.
Trivial bug reported by Eli Collins and fixed by Eli Collins
Some trivial DN comment cleanup
- HDFS-4375.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode , security)
Use token request messages defined in hadoop common
- HDFS-4369.
Blocker bug reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)
GetBlockKeysResponseProto does not handle null response
The member keys of the protobuf message GetBlockKeysResponseProto is changed from required to optional so that null values can be passed over the wire. This is an incompatible wire protocol change; it does not affect API backward compatibility.
- HDFS-4367.
Blocker bug reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)
GetDataEncryptionKeyResponseProto does not handle null response
The member dataEncryptionKey of the protobuf message GetDataEncryptionKeyResponseProto is made optional instead of required. This incompatible change is not likely to affect existing users (those using the HDFS FileSystem and other public APIs).
- HDFS-4364.
Blocker bug reported by Suresh Srinivas and fixed by Suresh Srinivas
GetLinkTargetResponseProto does not handle null path
The member targetPath of the protobuf message GetLinkTargetResponseProto is changed from required to optional so that null values can be passed over the wire. This is an incompatible wire protocol change; it does not affect API backward compatibility.
- HDFS-4363.
Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas
Combine PBHelper and HdfsProtoUtil and remove redundant methods
- HDFS-4362.
Critical bug reported by Suresh Srinivas and fixed by Suresh Srinivas
GetDelegationTokenResponseProto does not handle null token
- HDFS-4359.
Major bug reported by Liang Xie and fixed by Liang Xie (datanode)
remove an unnecessary synchronized keyword in BPOfferService.java
- HDFS-4351.
Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode)
Fix BlockPlacementPolicyDefault#chooseTarget when avoiding stale nodes
- HDFS-4350.
Major bug reported by Andrew Wang and fixed by Andrew Wang
Make enabling of stale marking on read and write paths independent
This patch makes an incompatible configuration change, as described below:
In releases 1.1.0 and other point releases 1.1.x, the configuration parameter "dfs.namenode.check.stale.datanode" could be used to turn on checking for the stale nodes. This configuration is no longer supported in release 1.2.0 onwards and is renamed as "dfs.namenode.avoid.read.stale.datanode".
How the feature works and how to configure it:
As described in the HDFS-3703 release notes, the datanode stale period can be configured using parameter "dfs.namenode.stale.datanode.interval" in seconds (default value is 30 seconds). The NameNode can be configured to use this staleness information for reads using configuration "dfs.namenode.avoid.read.stale.datanode". When this parameter is set to true, the NameNode picks a stale datanode as the last target to read from when returning block locations for reads. Using staleness information for writes is as described in the release notes of HDFS-3912.
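The read-path behaviour described above, where stale datanodes are returned as the last targets, can be sketched as a stable sort. This is an illustration in Python, not the NameNode code; the DataNode class, is_stale, order_for_read, and the avoid_stale flag are hypothetical names standing in for the real implementation and for "dfs.namenode.avoid.read.stale.datanode":

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    name: str
    last_heartbeat: float  # time of the last heartbeat, same unit as `now`


def is_stale(node: DataNode, now: float, stale_interval: float) -> bool:
    """A node is stale when no heartbeat arrived within the interval."""
    return now - node.last_heartbeat > stale_interval


def order_for_read(nodes, now, stale_interval, avoid_stale=True):
    """Return the nodes with stale ones moved to the end.

    sorted() is stable, so the original order is preserved within the
    fresh group and within the stale group.
    """
    if not avoid_stale:
        return list(nodes)
    return sorted(nodes, key=lambda n: is_stale(n, now, stale_interval))
```

With avoid_stale set to False the locations come back in their original order, which models leaving the read-side parameter off.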
- HDFS-4349.
Major test reported by Konstantin Shvachko and fixed by Konstantin Shvachko (namenode , test)
Test reading files from BackupNode
- HDFS-4347.
Major bug reported by Konstantin Shvachko and fixed by Plamen Jeliazkov (namenode , test)
TestBackupNode can go into infinite loop "Waiting checkpoint to complete."
- HDFS-4344.
Major bug reported by tamtam180 and fixed by Andy Isaacson (namenode)
dfshealth.jsp throws NumberFormatException when dfs.hosts/dfs.hosts.exclude includes port number
- HDFS-4326.
Major task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
bump up Tomcat version for HttpFS to 6.0.36
- HDFS-4315.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (datanode)
DNs with multiple BPs can have BPOfferServices fail to start due to unsynchronized map access
- HDFS-4308.
Major bug reported by Konstantin Shvachko and fixed by Plamen Jeliazkov (namenode)
addBlock() should persist file blocks once
- HDFS-4307.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
SocketCache should use monotonic time
- HDFS-4306.
Major bug reported by Binglin Chang and fixed by Binglin Chang
PBHelper.convertLocatedBlock miss convert BlockToken
- HDFS-4302.
Major bug reported by Eugene Koontz and fixed by Eugene Koontz (ha , namenode)
Precondition in EditLogFileInputStream's length() method is checked too early in NameNode startup, causing fatal exception
- HDFS-4295.
Major bug reported by Stephen Chu and fixed by Stephen Chu (security)
Using port 1023 should be valid when starting Secure DataNode
- HDFS-4294.
Major bug reported by Robert Parker and fixed by Robert Parker
Backwards compatibility is not maintained for TestVolumeId
- HDFS-4292.
Minor bug reported by Binglin Chang and fixed by Binglin Chang
Sanity check not correct in RemoteBlockReader2.newBlockReader
- HDFS-4291.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
edit log unit tests leave stray test_edit_log_file around
- HDFS-4288.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (namenode)
NN accepts incremental BR as IBR in safemode
- HDFS-4282.
Major bug reported by Junping Du and fixed by Todd Lipcon (namenode , test)
TestEditLog.testFuzzSequences FAILED in all pre-commit test
- HDFS-4279.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (namenode)
NameNode does not initialize generic conf keys when started with -recover
- HDFS-4274.
Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (datanode)
BlockPoolSliceScanner does not close verification log during shutdown
- HDFS-4270.
Minor bug reported by Derek Dagit and fixed by Derek Dagit (namenode)
Replications of the highest priority should be allowed to choose a source datanode that has reached its max replication limit
- HDFS-4268.
Major bug reported by Konstantin Shvachko and fixed by Konstantin Shvachko (namenode)
Remove redundant enum NNHAStatusHeartbeat.State
- HDFS-4259.
Minor improvement reported by Harsh J and fixed by Harsh J (hdfs-client)
Improve pipeline DN replacement failure message
- HDFS-4247.
Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp (namenode)
saveNamespace should be tolerant of dangling lease
- HDFS-4242.
Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Map.Entry is incorrectly used in LeaseManager
- HDFS-4238.
Major bug reported by Vinay and fixed by Todd Lipcon (ha)
[HA] Standby namenode should not do purging of shared storage edits.
- HDFS-4236.
Blocker bug reported by Allen Wittenauer and fixed by Alejandro Abdelnur
Regression: HDFS-4171 puts artificial limit on username length
- HDFS-4232.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (namenode)
NN fails to write a fsimage with stale leases
- HDFS-4231.
Major improvement reported by Konstantin Shvachko and fixed by Konstantin Shvachko (ha , namenode)
Introduce HAState for BackupNode
- HDFS-4216.
Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Adding symlink should not ignore QuotaExceededException
- HDFS-4214.
Trivial improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (tools)
OfflineEditsViewer should print out the offset at which it encountered an error
- HDFS-4213.
Major new feature reported by Jing Zhao and fixed by Jing Zhao (hdfs-client , namenode)
When the client calls hsync, allows the client to update the file length in the NameNode
- HDFS-4199.
Minor test reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
Provide test for HdfsVolumeId
- HDFS-4186.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)
logSync() is called with the write lock held while releasing lease
- HDFS-4182.
Critical bug reported by Todd Lipcon and fixed by Robert Joseph Evans (namenode)
SecondaryNameNode leaks NameCache entries
- HDFS-4181.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)
LeaseManager tries to double remove and prints extra messages
- HDFS-4179.
Major bug reported by Konstantin Shvachko and fixed by Konstantin Shvachko (namenode)
BackupNode: allow reads, fix checkpointing, safeMode
- HDFS-4178.
Major bug reported by Andy Isaacson and fixed by Andy Isaacson (scripts)
shell scripts should not close stderr
- HDFS-4172.
Minor bug reported by Derek Dagit and fixed by Derek Dagit (namenode)
namenode does not URI-encode parameters when building URI for datanode request
- HDFS-4171.
Major bug reported by Harsh J and fixed by Alejandro Abdelnur
WebHDFS and HttpFs should accept only valid Unix user names
- HDFS-4164.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (fuse-dfs)
fuse_dfs: add -lrt to the compiler command line on Linux
- HDFS-4162.
Minor bug reported by Derek Dagit and fixed by Derek Dagit (datanode)
Some malformed and unquoted HTML strings are returned from datanode web ui
- HDFS-4156.
Major bug reported by Eli Collins and fixed by Eli Reisman
Seeking to a negative position should throw an IOE
- HDFS-4155.
Major improvement reported by Liang Xie and fixed by Liang Xie (libhdfs)
libhdfs implementation of hsync API
- HDFS-4153.
Major improvement reported by Liang Xie and fixed by Liang Xie (journal-node)
Add START_MSG/SHUTDOWN_MSG for JournalNode
- HDFS-4143.
Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Change INodeFile.blocks to private
- HDFS-4140.
Major bug reported by Andy Isaacson and fixed by Colin Patrick McCabe (fuse-dfs)
fuse-dfs handles open(O_TRUNC) poorly
- HDFS-4139.
Major bug reported by Andy Isaacson and fixed by Colin Patrick McCabe (fuse-dfs)
fuse-dfs RO mode still allows file truncation
- HDFS-4132.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)
when libwebhdfs is not enabled, nativeMiniDfsClient frees uninitialized memory
- HDFS-4130.
Major sub-task reported by Han Xiao and fixed by Han Xiao (ha , performance)
BKJM: The reading for editlog at NN starting using bkjm is not efficient
- HDFS-4127.
Minor bug reported by Junping Du and fixed by Junping Du (namenode)
Log message is not correct in case of short of replica
- HDFS-4122.
Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (datanode , hdfs-client , namenode)
Cleanup HDFS logs and reduce the size of logged messages
This jira changes the content of some of the log messages. No log messages are removed; only their content is changed to reduce the size. If you have a tool that depends on the exact content of the log, please look at the patch and make appropriate updates to the tool.
- HDFS-4121.
Minor improvement reported by Binglin Chang and fixed by Binglin Chang
Add namespace declarations in hdfs .proto files for languages other than java
- HDFS-4112.
Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
A few improvements on INodeDirectory
- HDFS-4110.
Trivial improvement reported by Liang Xie and fixed by Liang Xie (journal-node)
Refine JNStorage log
- HDFS-4107.
Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Add utility methods to cast INode to INodeFile and INodeFileUnderConstruction
- HDFS-4106.
Minor bug reported by Jing Zhao and fixed by Jing Zhao (namenode , test)
BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile
- HDFS-4105.
Major bug reported by Arpit Gupta and fixed by Arpit Gupta
the SPNEGO user for secondary namenode should use the web keytab
- HDFS-4104.
Minor bug reported by Andy Isaacson and fixed by Andy Isaacson
dfs -test -d prints inappropriate error on nonexistent directory
- HDFS-4100.
Major sub-task reported by Liang Xie and fixed by Liang Xie (datanode , journal-node , security)
Fix all findbug security warnings
- HDFS-4099.
Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Clean up replication code and add more javadoc
- HDFS-4090.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (hdfs-client)
getFileChecksum() result incompatible when called against zero-byte files.
- HDFS-4088.
Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Remove "throws QuotaExceededException" from an INodeDirectoryWithQuota constructor
- HDFS-4080.
Major bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)
Add a separate logger for block state change logs to enable turning off those logs
- HDFS-4075.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)
Reduce recommissioning overhead
- HDFS-4074.
Trivial improvement reported by Brandon Li and fixed by Brandon Li (namenode)
Remove empty constructors for INode
- HDFS-4073.
Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Jing Zhao (namenode)
Two minor improvements to FSDirectory
- HDFS-4072.
Minor bug reported by Jing Zhao and fixed by Jing Zhao (namenode)
On file deletion remove corresponding blocks pending replication
- HDFS-4068.
Minor improvement reported by Eli Collins and fixed by Eli Collins (datanode)
DatanodeID and DatanodeInfo member should be private
- HDFS-4061.
Major bug reported by Eli Collins and fixed by Eli Collins
TestBalancer and TestUnderReplicatedBlocks need timeouts
- HDFS-4059.
Minor sub-task reported by Jing Zhao and fixed by Jing Zhao (datanode , namenode)
Add number of stale DataNodes to metrics
This jira adds a new metric named "StaleDataNodes", of type Gauge, under the metrics context "dfs". It tracks the number of DataNodes marked as stale. A DataNode is marked stale when its heartbeat message is not received within the configured time "dfs.namenode.stale.datanode.interval".
Please see the hdfs-default.xml documentation for "dfs.namenode.stale.datanode.interval" for more details on how to configure this feature. When this feature is not configured, this metric returns zero.
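What the gauge reports can be sketched as a simple count. This is a hedged illustration in Python, not the metrics code; stale_datanode_count is a hypothetical name, and passing None for the interval is only an assumed way to model the "not configured" case:

```python
def stale_datanode_count(last_heartbeats, now, stale_interval):
    """Count nodes whose last heartbeat is older than stale_interval.

    last_heartbeats maps a node name to the time of its last heartbeat,
    in the same unit as `now` and `stale_interval`. A stale_interval of
    None models the feature being unconfigured, where the gauge is zero.
    """
    if stale_interval is None:
        return 0
    return sum(1 for t in last_heartbeats.values() if now - t > stale_interval)
```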
- HDFS-4058.
Major improvement reported by Eli Collins and fixed by Eli Collins (datanode)
DirectoryScanner may fail with IOOB if the directory scanning threads return out of volume order
- HDFS-4055.
Major bug reported by Binglin Chang and fixed by Binglin Chang
TestAuditLogs is flaky
- HDFS-4049.
Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (datanode , performance)
hflush performance regression due to nagling delays
- HDFS-4048.
Major improvement reported by Stephen Chu and fixed by Stephen Chu
Use ERROR instead of INFO for volume failure logs
- HDFS-4046.
Minor bug reported by Binglin Chang and fixed by Binglin Chang (datanode , namenode)
ChecksumTypeProto use NULL as enum value which is illegal in C/C++
- HDFS-4044.
Major bug reported by Binglin Chang and fixed by Binglin Chang (datanode)
Duplicate ChecksumType definition in HDFS .proto files
- HDFS-4041.
Major improvement reported by Chris Nauroth and fixed by Chris Nauroth (build)
Hadoop HDFS Maven protoc calls must not depend on external sh script
- HDFS-4038.
Minor sub-task reported by Vinay and fixed by Vinay (ha)
Override toString() for BookKeeperEditLogInputStream
- HDFS-4037.
Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Rename the getReplication() method in BlockCollection to getBlockReplication()
- HDFS-4036.
Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Jing Zhao (namenode)
FSDirectory.unprotectedAddFile(..) should not throw UnresolvedLinkException
- HDFS-4035.
Major sub-task reported by Eli Collins and fixed by Eli Collins
LightWeightGSet and LightWeightHashSet increment a volatile without synchronization
- HDFS-4034.
Major sub-task reported by Eli Collins and fixed by Eli Collins
Remove redundant null checks
- HDFS-4033.
Major sub-task reported by Eli Collins and fixed by Eli Collins
Miscellaneous findbugs 2 fixes
- HDFS-4032.
Major sub-task reported by Eli Collins and fixed by Eli Collins
Specify the charset explicitly rather than rely on the default
- HDFS-4031.
Major sub-task reported by Eli Collins and fixed by Eli Collins (namenode)
Update findbugsExcludeFile.xml to include findbugs 2 exclusions
- HDFS-4030.
Major sub-task reported by Eli Collins and fixed by Eli Collins (namenode)
BlockManager excessBlocksCount and postponedMisreplicatedBlocksCount should be AtomicLongs
- HDFS-4029.
Major sub-task reported by Eli Collins and fixed by Eli Collins (namenode)
GenerationStamp should use an AtomicLong
- HDFS-4022.
Blocker bug reported by suja s and fixed by Vinay
Replication not happening for appended block
- HDFS-4021.
Minor bug reported by Colin Patrick McCabe and fixed by Christopher Conner (namenode)
Misleading error message when resources are low on the NameNode
- HDFS-4020.
Major bug reported by Eli Collins and fixed by Eli Collins
TestRBWBlockInvalidation may time out
- HDFS-4018.
Minor bug reported by Eli Collins and fixed by Eli Collins
TestDataNodeMultipleRegistrations#testMiniDFSClusterWithMultipleNN is missing some cluster cleanup
- HDFS-4008.
Minor improvement reported by Eli Collins and fixed by Eli Collins (test)
TestBalancerWithEncryptedTransfer needs a timeout
- HDFS-4007.
Minor test reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (test)
Rehabilitate bit-rotted unit tests under hadoop-hdfs-project/hadoop-hdfs/src/test/unit/
- HDFS-4006.
Major bug reported by Eli Collins and fixed by Todd Lipcon (namenode)
TestCheckpoint#testSecondaryHasVeryOutOfDateImage occasionally fails due to unexpected exit
- HDFS-4000.
Major improvement reported by Eli Collins and fixed by Colin Patrick McCabe
TestParallelLocalRead fails with "input ByteBuffers must be direct buffers"
- HDFS-3999.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
HttpFS OPEN operation expects len parameter, it should be length
- HDFS-3997.
Trivial bug reported by Mithun Radhakrishnan and fixed by Mithun Radhakrishnan (namenode)
OfflineImageViewer incorrectly passes value of imageVersion when visiting IS_COMPRESSED element
- HDFS-3996.
Minor bug reported by Eli Collins and fixed by Eli Collins
Add debug log removed in HDFS-3873 back
- HDFS-3992.
Minor bug reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
Method org.apache.hadoop.hdfs.TestHftpFileSystem.tearDown() sometimes throws NPEs
- HDFS-3990.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (namenode)
NN's health report has severe performance problems
- HDFS-3985.
Major bug reported by Eli Collins and fixed by (test)
Add timeouts to TestMulitipleNNDataBlockScanner
- HDFS-3979.
Major bug reported by Lars Hofhansl and fixed by Lars Hofhansl (datanode)
Fix hsync semantics
- HDFS-3970.
Major bug reported by Vinay and fixed by Andrew Wang (datanode)
BlockPoolSliceStorage#doRollback(..) should use BlockPoolSliceStorage instead of DataStorage to read prev version file.
- HDFS-3964.
Minor bug reported by Eli Collins and fixed by Eli Collins (namenode)
Make NN log of fs.defaultFS debug rather than info
- HDFS-3957.
Minor improvement reported by Andrew Wang and fixed by Andrew Wang
Change MutableQuantiles to use a shared thread for rolling over metrics
- HDFS-3951.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (datanode)
datanode web ui does not work over HTTPS when datanode is started in secure mode
- HDFS-3949.
Minor bug reported by Eli Collins and fixed by Eli Collins (namenode)
NameNodeRpcServer#join should join on both client and server RPC servers
- HDFS-3948.
Minor bug reported by Eli Collins and fixed by Jing Zhao (test)
TestWebHDFS#testNamenodeRestart occasionally fails
- HDFS-3944.
Major task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
Httpfs resolveAuthority() is not resolving host correctly
- HDFS-3939.
Minor improvement reported by Eli Collins and fixed by Eli Collins (namenode)
NN RPC address cleanup
- HDFS-3938.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (documentation)
remove current limitations from HttpFS docs
- HDFS-3936.
Major bug reported by Eli Collins and fixed by Eli Collins
MiniDFSCluster shutdown races with BlocksMap usage
- HDFS-3935.
Major sub-task reported by Eli Collins and fixed by Andy Isaacson
QJM: Add JournalNode to the start / stop scripts
- HDFS-3932.
Major bug reported by Eli Collins and fixed by Eli Collins
NameNode Web UI broken if the rpc-address is set to the wildcard
- HDFS-3931.
Minor bug reported by Eli Collins and fixed by Andy Isaacson (test)
TestDatanodeBlockScanner#testBlockCorruptionPolicy2 is broken
- HDFS-3925.
Minor improvement reported by Andrew Wang and fixed by Andrew Wang
Prettify PipelineAck#toString() for printing to a log
- HDFS-3924.
Major bug reported by Andrew Wang and fixed by Andrew Wang (hdfs-client)
Multi-byte id in HdfsVolumeId
- HDFS-3923.
Major sub-task reported by Jing Zhao and fixed by Jing Zhao
libwebhdfs testing code cleanup
- HDFS-3921.
Major bug reported by Stephen Chu and fixed by Aaron T. Myers
NN will prematurely consider blocks missing when entering active state while still in safe mode
- HDFS-3920.
Major sub-task reported by Jing Zhao and fixed by Jing Zhao
libwebhdfs code cleanup: string processing and using strerror consistently to handle all errors
- HDFS-3919.
Minor bug reported by Andy Isaacson and fixed by Andy Isaacson (test)
MiniDFSCluster:waitClusterUp can hang forever
- HDFS-3916.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (webhdfs)
libwebhdfs (C client) code cleanups
- HDFS-3912.
Major sub-task reported by Jing Zhao and fixed by Jing Zhao
Detecting and avoiding stale datanodes for writing
- HDFS-3910.
Minor improvement reported by Eli Collins and fixed by Eli Collins (test)
DFSTestUtil#waitReplication should timeout
- HDFS-3896.
Minor improvement reported by Jeff Lord and fixed by Jeff Lord
Add descriptions for dfs.namenode.rpc-address and dfs.namenode.servicerpc-address to hdfs-default.xml
- HDFS-3831.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (security)
Failure to renew tokens due to test-sources left in classpath
- HDFS-3829.
Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestHftpURLTimeouts fails intermittently with JDK7
- HDFS-3824.
Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestHftpDelegationToken fails intermittently with JDK7
- HDFS-3813.
Major improvement reported by Stephen Chu and fixed by Stephen Chu (security , webhdfs)
Log error message if security and WebHDFS are enabled but principal/keytab are not configured
- HDFS-3810.
Major sub-task reported by Ivan Kelly and fixed by Ivan Kelly
Implement format() for BKJM
- HDFS-3809.
Major sub-task reported by Ivan Kelly and fixed by Ivan Kelly (namenode)
Make BKJM use protobufs for all serialization with ZK
- HDFS-3804.
Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestHftpFileSystem fails intermittently with JDK7
- HDFS-3789.
Major sub-task reported by Ivan Kelly and fixed by Ivan Kelly (ha , namenode)
JournalManager#format() should be able to throw IOException
- HDFS-3753.
Major bug reported by Eli Collins and fixed by Colin Patrick McCabe (build , test)
Tests don't run with native libraries
- HDFS-3703.
Major improvement reported by nkeywal and fixed by Jing Zhao (datanode , namenode)
Decrease the datanode failure detection time
This jira adds a new DataNode state called "stale" at the NameNode. A DataNode is marked as stale if it does not send a heartbeat message to the NameNode within the timeout configured using the configuration parameter "dfs.namenode.stale.datanode.interval" in seconds (default value is 30 seconds). When returning block locations for reads, the NameNode picks a stale DataNode as the last target to read from.
This feature is turned off by default. To turn it on, set the HDFS configuration "dfs.namenode.check.stale.datanode" to true.
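The read-ordering rule described for HDFS-3703 can be pictured with a small sketch. This is not the NameNode's implementation: the class, the Node type, and the method names are hypothetical, and the interval is expressed in milliseconds purely for the arithmetic in this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: none of these names come from the HDFS code base. */
final class StaleNodeOrdering {

    /** Hypothetical stand-in for a DataNode as tracked by the NameNode. */
    static final class Node {
        final String name;
        final long lastHeartbeatMillis;

        Node(String name, long lastHeartbeatMillis) {
            this.name = name;
            this.lastHeartbeatMillis = lastHeartbeatMillis;
        }
    }

    /** A node is stale when its last heartbeat is older than the interval. */
    static boolean isStale(Node node, long nowMillis, long staleIntervalMillis) {
        return nowMillis - node.lastHeartbeatMillis > staleIntervalMillis;
    }

    /**
     * Returns a copy with non-stale nodes first. The sort is stable, so the
     * relative order inside each group is preserved.
     */
    static List<Node> orderForRead(List<Node> locations, long nowMillis,
                                   long staleIntervalMillis) {
        List<Node> ordered = new ArrayList<>(locations);
        // Boolean.FALSE sorts before Boolean.TRUE, so stale nodes move to the end.
        ordered.sort(Comparator.comparing(
            (Node n) -> isStale(n, nowMillis, staleIntervalMillis)));
        return ordered;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        long interval = 30_000L; // assumed value for this sketch
        List<Node> locations = Arrays.asList(
            new Node("dn1", now - 45_000L), // no heartbeat for 45s: stale
            new Node("dn2", now - 3_000L),
            new Node("dn3", now - 10_000L));
        for (Node n : orderForRead(locations, now, interval)) {
            System.out.println(n.name); // prints dn2, dn3, dn1 on separate lines
        }
    }
}
```

A stable sort keeps whatever ordering the locations already had among the healthy nodes, so only the stale ones change position.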
- HDFS-3695.
Major sub-task reported by Todd Lipcon and fixed by Todd Lipcon (ha , namenode)
Genericize format() to non-file JournalManagers
- HDFS-3682.
Minor improvement reported by Eli Collins and fixed by Todd Lipcon (test)
MiniDFSCluster#init should provide more info when it fails
- HDFS-3680.
Minor improvement reported by Marcelo Vanzin and fixed by Marcelo Vanzin (namenode)
Allow customized audit logging in HDFS FSNamesystem
- HDFS-3678.
Critical bug reported by Todd Lipcon and fixed by Aaron T. Myers (namenode)
Edit log files are never being purged from 2NN
- HDFS-3626.
Blocker bug reported by Todd Lipcon and fixed by Todd Lipcon (namenode)
Creating file with invalid path can corrupt edit log
- HDFS-3623.
Major sub-task reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (namenode)
BKJM: zkLatchWaitTimeout hard coded to 6000. Make use of ZKSessionTimeout instead.
- HDFS-3616.
Major bug reported by Uma Maheswara Rao G and fixed by Jing Zhao (datanode)
TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown
- HDFS-3598.
Major new feature reported by Tsz Wo (Nicholas), SZE and fixed by Plamen Jeliazkov (webhdfs)
WebHDFS: support file concat
- HDFS-3573.
Minor sub-task reported by Todd Lipcon and fixed by Todd Lipcon (namenode)
Supply NamespaceInfo when instantiating JournalManagers
- HDFS-3571.
Major sub-task reported by Todd Lipcon and fixed by Todd Lipcon (ha , namenode)
Allow EditLogFileInputStream to read from a remote URL
- HDFS-3553.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp
Hftp proxy tokens are broken
- HDFS-3510.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
Improve FSEditLog pre-allocation
- HDFS-3507.
Critical bug reported by Vinay and fixed by Vinay (ha)
DFS#isInSafeMode needs to execute only on Active NameNode
- HDFS-3483.
Major improvement reported by Stephen Chu and fixed by Stephen Fritz
Better error message when hdfs fsck is run against a ViewFS config
- HDFS-3429.
Major bug reported by Todd Lipcon and fixed by Todd Lipcon (datanode , performance)
DataNode reads checksums even if client does not need them
- HDFS-3373.
Major bug reported by Todd Lipcon and fixed by John George (hdfs-client)
FileContext HDFS implementation can leak socket caches
- HDFS-3224.
Minor bug reported by Eli Collins and fixed by Jason Lowe
Bug in check for DN re-registration with different storage ID
- HDFS-3077.
Major new feature reported by Todd Lipcon and fixed by Todd Lipcon (ha , namenode)
Quorum-based protocol for reading and writing edit logs
- HDFS-3049.
Minor new feature reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (namenode)
During the normal loading NN startup process, fall back on a different EditLog if we see one that is corrupt
- HDFS-2946.
Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (ha , namenode)
HA: Put a cap on the number of completed edits files retained by the NN
- HDFS-2908.
Minor sub-task reported by Suresh Srinivas and fixed by Brandon Li
Add apache license header for StorageReport.java
- HDFS-2656.
Major improvement reported by Zhanwei.Wang and fixed by Jing Zhao (webhdfs)
Implement a pure C client based on webhdfs
- HDFS-2264.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)
NamenodeProtocol has the wrong value for clientPrincipal in KerberosInfo annotation
- HDFS-1331.
Minor bug reported by Allen Wittenauer and fixed by Andy Isaacson (tools)
dfs -test should work like /bin/test
"test" will not print a warning for non-existent paths when testing for existence
- HDFS-1322.
Major bug reported by Ravi Gummadi and fixed by Colin Patrick McCabe
Document umask in DistributedFileSystem#mkdirs javadocs
- HDFS-1245.
Major new feature reported by Dmytro Molkov and fixed by Konstantin Shvachko (namenode)
Pluggable block id generation
- HADOOP-9289.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
FsShell rm -f fails for non-matching globs
- HADOOP-9278.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (fs)
HarFileSystem may leak file handle
- HADOOP-9276.
Minor improvement reported by Arun C Murthy and fixed by Arun C Murthy
Allow BoundedByteArrayOutputStream to be resettable
- HADOOP-9260.
Critical bug reported by Jerry Chen and fixed by Chris Nauroth
Hadoop version may not be correct when starting name node or data node
- HADOOP-9255.
Critical bug reported by Thomas Graves and fixed by Thomas Graves (scripts)
relnotes.py missing last jira
- HADOOP-9252.
Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (util)
StringUtils.humanReadableInt(..) has a race condition
- HADOOP-9247.
Minor improvement reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
parametrize Clover "generateXxx" properties to make them re-definable via -D in mvn calls
- HADOOP-9231.
Major bug reported by Konstantin Boudnik and fixed by Konstantin Boudnik (build)
Parametrize staging URL for the uniformity of distributionManagement
- HADOOP-9221.
Major bug reported by Andy Isaacson and fixed by Andy Isaacson
Convert remaining xdocs to APT
- HADOOP-9217.
Major test reported by Andrey Klochkov and fixed by Andrey Klochkov (test)
Print thread dumps when hadoop-common tests fail
- HADOOP-9216.
Major improvement reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA (io)
CompressionCodecFactory#getCodecClasses should trim the result of parsing by Configuration.
- HADOOP-9215.
Blocker bug reported by Thomas Graves and fixed by Colin Patrick McCabe
when using cmake-2.6, libhadoop.so doesn't get created (only libhadoop.so.1.0.0)
- HADOOP-9212.
Major bug reported by Tom White and fixed by Tom White (fs)
Potential deadlock in FileSystem.Cache/IPC/UGI
- HADOOP-9203.
Trivial bug reported by Andrew Purtell and fixed by Andrew Purtell (ipc , test)
RPCCallBenchmark should find a random available port
- HADOOP-9193.
Minor bug reported by Jason Lowe and fixed by Andy Isaacson (scripts)
hadoop script can inadvertently expand wildcard arguments when delegating to hdfs script
- HADOOP-9192.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (security)
Move token related request/response messages to common
- HADOOP-9190.
Major bug reported by Thomas Graves and fixed by Andy Isaacson (documentation)
packaging docs is broken
- HADOOP-9183.
Major bug reported by Tom White and fixed by Tom White (ha)
Potential deadlock in ActiveStandbyElector
- HADOOP-9181.
Major bug reported by Liang Xie and fixed by Liang Xie
Set daemon flag for HttpServer's QueuedThreadPool
- HADOOP-9178.
Minor bug reported by Sandy Ryza and fixed by Sandy Ryza
src/main/conf is missing hadoop-policy.xml
- HADOOP-9173.
Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas
Add security token protobuf definition to common and use it in hdfs
- HADOOP-9162.
Minor improvement reported by Binglin Chang and fixed by Binglin Chang (native)
Add utility to check native library availability
- HADOOP-9155.
Minor bug reported by Binglin Chang and fixed by Binglin Chang
FsPermission should have different default value, 777 for directory and 666 for file
- HADOOP-9153.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (viewfs)
Support createNonRecursive in ViewFileSystem
- HADOOP-9152.
Minor bug reported by Brock Noland and fixed by Brock Noland (fs)
HDFS can report negative DFS Used on clusters with very small amounts of data
- HADOOP-9147.
Trivial improvement reported by Jonathan Allen and fixed by Jonathan Allen
Add missing fields to FileStatus.toString
Update FileStatus.toString to include missing fields
- HADOOP-9135.
Trivial bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (security)
JniBasedUnixGroupsMappingWithFallback should log at debug rather than info during fallback
- HADOOP-9127.
Major improvement reported by Daisuke Kobayashi and fixed by Daisuke Kobayashi (documentation)
Update documentation for ZooKeeper Failover Controller
- HADOOP-9119.
Minor test reported by Steve Loughran and fixed by Steve Loughran (fs , test)
Add test to FileSystemContractBaseTest to verify integrity of overwritten files
The patch adds more tests to verify overwrites and more complex operations (write-delete-overwrite). By using differently sized datasets with different data inside, these tests verify that the overwrite really did take place. While HDFS meets all these requirements directly, eventually consistent object stores may not, hence these tests.
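The idea behind the HADOOP-9119 tests can be sketched against the local filesystem with java.nio.file. This is not the code from FileSystemContractBaseTest: the class, the method names, and the dataset sizes are hypothetical and chosen only to show why differing sizes and contents make a stale read detectable.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Illustrative only: a local-filesystem analogue of a write-delete-overwrite check. */
final class OverwriteCheck {

    /** Builds a deterministic dataset; length and starting value differ per round. */
    static byte[] dataset(int length, int offset) {
        byte[] data = new byte[length];
        for (int i = 0; i < length; i++) {
            data[i] = (byte) (offset + i);
        }
        return data;
    }

    /** Writes, overwrites, deletes, and rewrites a file, checking each read-back. */
    static boolean writeDeleteOverwrite(Path file) throws IOException {
        byte[] first = dataset(1024, 0);
        byte[] second = dataset(300, 7);
        byte[] third = dataset(2048, 19);

        Files.write(file, first);
        Files.write(file, second); // overwrite with smaller, different data
        boolean overwriteSeen = Arrays.equals(Files.readAllBytes(file), second);

        Files.delete(file);
        Files.write(file, third); // recreate with larger, different data
        boolean rewriteSeen = Arrays.equals(Files.readAllBytes(file), third);

        return overwriteSeen && rewriteSeen;
    }

    /** Runs the check on a temporary file and cleans up afterwards. */
    static boolean verifyOnTempFile() {
        try {
            Path dir = Files.createTempDirectory("overwrite-check");
            Path file = dir.resolve("data.bin");
            try {
                return writeDeleteOverwrite(file);
            } finally {
                Files.deleteIfExists(file);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("overwrite verified: " + verifyOnTempFile());
    }
}
```

Because each round differs in both length and content, a store that returned the earlier version of the file would fail the comparison rather than pass by coincidence.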
- HADOOP-9118.
Trivial improvement reported by Steve Loughran and fixed by (test)
FileSystemContractBaseTest test data for read/write isn't rigorous enough
Resolved as part of HADOOP-9119: its test data generator creates more bits in every test byte
- HADOOP-9113.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (security , test)
o.a.h.fs.TestDelegationTokenRenewer is failing intermittently
- HADOOP-9106.
Major improvement reported by Todd Lipcon and fixed by Robert Parker (ipc)
Allow configuration of IPC connect timeout
This jira introduces a new configuration parameter "ipc.client.connect.timeout". This configuration defines the Hadoop RPC connection timeout in milliseconds for a client to connect to a server. For details see the description associated with this configuration in core-default.xml.
- HADOOP-9105.
Major bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
FsShell -moveFromLocal erroneously fails
- HADOOP-9103.
Major bug reported by yixiaohua and fixed by Todd Lipcon (io)
UTF8 class does not properly decode Unicode characters outside the basic multilingual plane
- HADOOP-9097.
Critical bug reported by Tom White and fixed by Thomas Graves (build)
Maven RAT plugin is not checking all source files
- HADOOP-9093.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas
Move all the Exception in PathExceptions to o.a.h.fs package
- HADOOP-9090.
Minor new feature reported by Mostafa Elhemali and fixed by Mostafa Elhemali (metrics)
Support on-demand publish of metrics
- HADOOP-9072.
Major bug reported by Robert Parker and fixed by Robert Parker
Hadoop-Common-0.23-Build Fails to build in Jenkins
- HADOOP-9070.
Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
Kerberos SASL server cannot find kerberos key
- HADOOP-9067.
Minor test reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
provide test for method org.apache.hadoop.fs.LocalFileSystem.reportChecksumFailure(Path, FSDataInputStream, long, FSDataInputStream, long)
- HADOOP-9064.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (security)
Augment DelegationTokenRenewer API to cancel the tokens on calls to removeRenewAction
- HADOOP-9054.
Major new feature reported by Robert Kanter and fixed by Robert Kanter (security)
Add AuthenticationHandler that uses Kerberos but allows for an alternate form of authentication for browsers
- HADOOP-9049.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (security)
DelegationTokenRenewer needs to be Singleton and FileSystems should register/deregister to/from.
- HADOOP-9042.
Minor test reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
Add a test for umask in FileSystemContractBaseTest
- HADOOP-9041.
Critical bug reported by Radim Kolar and fixed by Radim Kolar (fs)
FileSystem initialization can go into infinite loop
- HADOOP-9038.
Minor test reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
provide unit-test coverage of class org.apache.hadoop.fs.LocalDirAllocator.AllocatorPerContext.PathIterator
- HADOOP-9035.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (security)
Generalize setup of LoginContext
- HADOOP-9025.
Major bug reported by Robert Joseph Evans and fixed by Jonathan Eagles
org.apache.hadoop.tools.TestCopyListing failing
- HADOOP-9022.
Major bug reported by Haiyang Jiang and fixed by Jonathan Eagles
Hadoop distcp tool fails to copy file if -m 0 specified
- HADOOP-9021.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)
Enforce configured SASL method on the server
- HADOOP-9020.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)
Add a SASL PLAIN server
- HADOOP-9015.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
Standardize creation of SaslRpcServers
- HADOOP-9014.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
Standardize creation of SaslRpcClients
- HADOOP-9013.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (fs , security)
UGI should not hardcode loginUser's authenticationType
- HADOOP-9012.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
IPC Client sends wrong connection context
- HADOOP-9010.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (fs , security)
Map UGI authenticationMethod to RPC authMethod
- HADOOP-9009.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (fs , security)
Add SecurityUtil methods to get/set authentication method
- HADOOP-9004.
Major improvement reported by Stephen Chu and fixed by Stephen Chu (security , test)
Allow security unit tests to use external KDC
- HADOOP-8999.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
SASL negotiation is flawed
The RPC SASL negotiation now always ends with a final response. If the SASL mechanism does not have a final response (GSSAPI, PLAIN), then an empty success response is sent to the client. The client will now always expect a final response to definitively know if negotiation is complete/successful.
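The rule in HADOOP-8999 can be illustrated with a minimal sketch of the server side. It does not use Hadoop's RPC classes or the javax.security.sasl API: Status, Message, Mechanism, and OneStepMechanism are hypothetical types introduced only to show the "always send a final response" behavior.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: these types are hypothetical, not Hadoop's RPC classes. */
final class SaslFinalResponse {

    enum Status { CHALLENGE, SUCCESS }

    /** One server-to-client message in the negotiation. */
    static final class Message {
        final Status status;
        final byte[] token;

        Message(Status status, byte[] token) {
            this.status = status;
            this.token = token;
        }
    }

    /** Hypothetical stand-in for the server side of a SASL mechanism. */
    interface Mechanism {
        /** Returns the next token for the client, or null if there is none. */
        byte[] evaluate(byte[] clientToken);

        boolean isComplete();
    }

    /** A one-step mechanism that, like PLAIN, has no final token of its own. */
    static final class OneStepMechanism implements Mechanism {
        private boolean complete;

        @Override
        public byte[] evaluate(byte[] clientToken) {
            complete = true;
            return null;
        }

        @Override
        public boolean isComplete() {
            return complete;
        }
    }

    /** Always answers; completion without a token becomes an empty SUCCESS. */
    static Message respond(Mechanism mechanism, byte[] clientToken) {
        byte[] reply = mechanism.evaluate(clientToken);
        if (mechanism.isComplete()) {
            return new Message(Status.SUCCESS, reply == null ? new byte[0] : reply);
        }
        return new Message(Status.CHALLENGE, reply);
    }

    public static void main(String[] args) {
        byte[] credentials = "user-credentials".getBytes(StandardCharsets.UTF_8);
        Message last = respond(new OneStepMechanism(), credentials);
        // prints "SUCCESS with 0 token bytes"
        System.out.println(last.status + " with " + last.token.length + " token bytes");
    }
}
```

With this shape the client never has to infer completion from silence: it keeps reading until it sees a SUCCESS message, even when that message carries no token.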
- HADOOP-8998.
Minor improvement reported by Andy Isaacson and fixed by Alejandro Abdelnur
set Cache-Control no-cache header on all dynamic content
- HADOOP-8994.
Minor bug reported by Andy Isaacson and fixed by Andy Isaacson (test)
TestDFSShell creates file named "noFileHere", making further tests hard to understand
- HADOOP-8992.
Minor improvement reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
Enhance unit-test coverage of class HarFileSystem
- HADOOP-8986.
Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (ipc)
Server$Call object is never released after it is sent
- HADOOP-8985.
Minor improvement reported by Binglin Chang and fixed by Binglin Chang (ha , ipc)
Add namespace declarations in .proto files for languages other than java
- HADOOP-8981.
Major bug reported by Chris Nauroth and fixed by Xuan Gong (metrics)
TestMetricsSystemImpl fails on Windows
- HADOOP-8962.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (fs)
RawLocalFileSystem.listStatus fails when a child filename contains a colon
- HADOOP-8951.
Minor improvement reported by Steve Loughran and fixed by Steve Loughran (util)
RunJar to fail with user-comprehensible error message if jar missing
- HADOOP-8948.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
TestFileUtil.testGetDU fails on Windows due to incorrect assumption of line separator
- HADOOP-8932.
Major improvement reported by Kihwal Lee and fixed by Kihwal Lee (security)
JNI-based user-group mapping modules can be too chatty on lookup failures
- HADOOP-8931.
Trivial improvement reported by Eli Collins and fixed by Eli Collins
Add Java version to startup message
- HADOOP-8930.
Major improvement reported by Andrey Klochkov and fixed by Andrey Klochkov (test)
Cumulative code coverage calculation
- HADOOP-8929.
Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (metrics)
Add toString, other improvements for SampleQuantiles
- HADOOP-8926.
Major improvement reported by Gopal V and fixed by Gopal V (util)
hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
Speed up Crc32 by improving the cache hit-ratio of hadoop.util.PureJavaCrc32
- HADOOP-8925.
Minor improvement reported by Eli Collins and fixed by Eli Collins (build)
Remove the packaging
- HADOOP-8922.
Trivial improvement reported by Damien Hardy and fixed by Damien Hardy (metrics)
Provide alternate JSONP output for JMXJsonServlet to allow javascript in browser dashboard
Add an alternative JSONP output for the /jmx HTTP interface so that JavaScript in browsers can poll it.
- HADOOP-8913.
Minor bug reported by Sandy Ryza and fixed by Sandy Ryza (metrics)
hadoop-metrics2.properties should give units in comment for sampling period
- HADOOP-8912.
Major bug reported by Raja Aluri and fixed by Raja Aluri (build)
adding .gitattributes file to prevent CRLF and LF mismatches for source and text files
- HADOOP-8911.
Major bug reported by Raja Aluri and fixed by Raja Aluri (build)
CRLF characters in source and text files
- HADOOP-8909.
Major improvement reported by Chris Nauroth and fixed by Chris Nauroth (build)
Hadoop Common Maven protoc calls must not depend on external sh script
- HADOOP-8906.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
paths with multiple globs are unreliable
- HADOOP-8901.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (native)
GZip and Snappy support may not work without unversioned libraries
- HADOOP-8900.
Major bug reported by Slavik Krassovsky and fixed by Andy Isaacson
BuiltInGzipDecompressor throws IOException - stored gzip size doesn't match decompressed size
- HADOOP-8894.
Major improvement reported by Todd Lipcon and fixed by Todd Lipcon
GenericTestUtils.waitFor should dump thread stacks on timeout
- HADOOP-8889.
Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (build , test)
Upgrade to Surefire 2.12.3
- HADOOP-8883.
Major bug reported by Robert Kanter and fixed by Robert Kanter
Anonymous fallback in KerberosAuthenticator is broken
- HADOOP-8881.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
FileBasedKeyStoresFactory initialization logging should be debug not info
- HADOOP-8878.
Major bug reported by Arpit Gupta and fixed by Arpit Gupta
uppercase namenode hostname causes hadoop dfs calls with webhdfs filesystem and fsck to fail when security is on
- HADOOP-8866.
Minor improvement reported by Andrew Wang and fixed by Andrew Wang
SampleQuantiles#query is O(N^2) instead of O(N)
- HADOOP-8860.
Major task reported by Tom White and fixed by Tom White (documentation)
Split MapReduce and YARN sections in documentation navigation
- HADOOP-8855.
Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (security)
SSL-based image transfer does not work when Kerberos is disabled
- HADOOP-8851.
Minor improvement reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky (test)
Use -XX:+HeapDumpOnOutOfMemoryError JVM option in the forked tests
- HADOOP-8849.
Minor improvement reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
FileUtil#fullyDelete should grant the target directories +rwx permissions before trying to delete them
- HADOOP-8843.
Critical bug reported by Robert Joseph Evans and fixed by Jason Lowe
Old trash directories are never deleted on upgrade from 1.x
- HADOOP-8833.
Major bug reported by Harsh J and fixed by Harsh J (fs)
fs -text should make sure to call inputstream.seek(0) before using input stream
- HADOOP-8822.
Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans
relnotes.py was deleted post mavenization
- HADOOP-8819.
Major bug reported by Brandon Li and fixed by Brandon Li (fs)
Should use && instead of & in a few places in FTPFileSystem,FTPInputStream,S3InputStream,ViewFileSystem,ViewFs
- HADOOP-8816.
Major bug reported by Moritz Moeller and fixed by Moritz Moeller (net)
HTTP Error 413 full HEAD if using kerberos authentication
- HADOOP-8812.
Minor improvement reported by Eli Collins and fixed by Eli Collins
ExitUtil#terminate should print Exception#toString
- HADOOP-8811.
Critical bug reported by Radim Kolar and fixed by Radim Kolar (native)
Compile hadoop native library in FreeBSD
- HADOOP-8806.
Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (build)
libhadoop.so: dlopen should be better at locating libsnappy.so, etc.
- HADOOP-8804.
Minor improvement reported by Eli Collins and fixed by Senthil V Kumar
Improve Web UIs when the wildcard address is used
- HADOOP-8795.
Minor bug reported by Sean Mackrory and fixed by Sean Mackrory (scripts)
BASH tab completion doesn't look in PATH, assumes path to executable is specified
- HADOOP-8791.
Major bug reported by Bertrand Dechoux and fixed by Jing Zhao (documentation)
rm "Only deletes non empty directory and files."
- HADOOP-8789.
Minor improvement reported by Andy Isaacson and fixed by Andy Isaacson (test)
Tests setLevel(Level.OFF) should be Level.ERROR
- HADOOP-8786.
Major bug reported by Todd Lipcon and fixed by Todd Lipcon
HttpServer continues to start even if AuthenticationFilter fails to init
- HADOOP-8784.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)
Improve IPC.Client's token use
- HADOOP-8783.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)
Improve RPC.Server's digest auth
- HADOOP-8780.
Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan
Update DeprecatedProperties apt file
- HADOOP-8756.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (native)
Fix SEGV when libsnappy is in java.library.path but not LD_LIBRARY_PATH
- HADOOP-8755.
Major improvement reported by Andrey Klochkov and fixed by Andrey Klochkov (test)
Print thread dump when tests fail due to timeout
- HADOOP-8736.
Major improvement reported by Brandon Li and fixed by Brandon Li (ipc)
Add Builder for building an RPC server
- HADOOP-8713.
Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestRPCCompatibility fails intermittently with JDK7
- HADOOP-8712.
Minor improvement reported by Robert Parker and fixed by Robert Parker (security)
Change default hadoop.security.group.mapping
The default group mapping policy has been changed to JniBasedUnixGroupsNetgroupMappingWithFallback. This should maintain the same semantics as the prior default for most users.
- HADOOP-8684.
Minor bug reported by Hiroshi Ikeda and fixed by Jing Zhao (io)
Deadlock between WritableComparator and WritableComparable
- HADOOP-8616.
Major bug reported by Eli Collins and fixed by Sandy Ryza (viewfs)
ViewFS configuration requires a trailing slash
- HADOOP-8597.
Major new feature reported by Harsh J and fixed by Ivan Vladimirov Ivanov (fs)
FsShell's Text command should be able to read avro data files
- HADOOP-8589.
Major bug reported by Andrey Klochkov and fixed by Sanjay Radia (fs , test)
ViewFs tests fail when tests and home dirs are nested
- HADOOP-8561.
Major improvement reported by Luke Lu and fixed by Yu Gao (security)
Introduce HADOOP_PROXY_USER for secure impersonation in child hadoop client processes
- HADOOP-8427.
Major task reported by Eli Collins and fixed by Andy Isaacson (documentation)
Convert Forrest docs to APT, incremental
- HADOOP-8418.
Major bug reported by Luke Lu and fixed by Yu Gao (security)
Fix UGI for IBM JDK running on Windows
- HADOOP-7886.
Minor improvement reported by Jakob Homan and fixed by SreeHari
Add toString to FileStatus
- HADOOP-7688.
Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Uma Maheswara Rao G
When a servlet filter throws an exception in init(..), the Jetty server fails silently.
- HADOOP-7115.
Major bug reported by Arun C Murthy and fixed by Alejandro Abdelnur
Add a cache for getpwuid_r and getpwgid_r calls
- HADOOP-6762.
Critical bug reported by sam rash and fixed by sam rash
exception while doing RPC I/O closes channel
- HADOOP-6607.
Minor bug reported by Steve Loughran and fixed by Alejandro Abdelnur (io)
Add different variants of non caching HTTP headers